27 January 2011

Adding PDF indexing to Search Server 2010

By default, Search Server indexes 44 different file types.  PDF is not one of them.

To get your search server to index them you need to do a few things.

  • Add PDF as a crawl file type
  • Install a PDF iFilter
  • Add additional registry entries (if using Adobe)
  • Neaten things up and add a PDF icon.


Adding PDF as a crawl file type.

  • From Central Administration, go to Search Administration
  • Click file types on the left menu pane
  • Click new file type
  • Add pdf
  • OK
  • Start a full crawl
This will now index files with the PDF extension.  But since there is no iFilter installed, the server cannot "read" the files, so only metadata is used.

Searching for pdf at this point will only return the basic available metadata.



Install a PDF iFilter
There are two major players in the PDF iFilter space that you can use.

Adobe iFilter 9 x64
http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025
This is the free option.  Unfortunately this is only a single threaded iFilter.  That means slow.  It also requires additional registry entries to be made to get it to work properly with Search Server.  This is fine if you have a small amount of pdf content you want to crawl and index.

Foxit PDF iFilter 2
http://www.foxitsoftware.com/pdf/ifilter/
This is a paid-for iFilter, but it offers a significant performance improvement over the Adobe filter.  It also just installs and works out of the box.


Whichever you choose, installation is simply a case of downloading the iFilter and installing it on the index server.  Afterwards you need to reset the index and run a full crawl.


Additional registry entries for Adobe.
(Foxit does this during the install)

Add a new key called ".pdf" to

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\Filters\

Then add the following registry values to

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\Filters\.pdf


<REG_SZ> Default = <value not set>
<REG_SZ> Extension = pdf
<REG_DWORD> FileTypeBucket = 1
<REG_SZ> MimeTypes = application/pdf

Add a new key called ".pdf" to

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension

Set the default value to {E8978DA6-047F-4E3D-9C78-CDBE46041603}

This value is the CLSID for the iFilter.  If Adobe ever releases a new version, you can find the correct CLSID by searching the registry for the filter installation path, which in this case is "C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms".
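If you have more than one index server to set up, typing these entries by hand gets tedious.  Here is a rough Python sketch that builds the text of a .reg file containing both keys.  The key paths, value names and CLSID are the ones listed above; the function name build_reg_text is made up for this example, and you should check the generated text against your own server before importing it.

```python
# Sketch only: builds the text of a .reg file for the Adobe PDF iFilter entries.
# Key paths, value names and the CLSID are copied from the post.
# build_reg_text is a hypothetical helper name, not part of any product.

BASE = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup"
ADOBE_IFILTER_9_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"


def build_reg_text(clsid=ADOBE_IFILTER_9_CLSID):
    """Return .reg file text that creates both .pdf keys."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        # First key: the Default value is left unset, so it is not listed.
        "[" + BASE + r"\Filters\.pdf]",
        '"Extension"="pdf"',
        '"FileTypeBucket"=dword:00000001',
        '"MimeTypes"="application/pdf"',
        "",
        # Second key: the default value (written as @) is the iFilter CLSID.
        "[" + BASE + r"\ContentIndexCommon\Filters\Extension\.pdf]",
        '@="' + clsid + '"',
        "",
    ]
    # .reg files use Windows line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_text())
```

Save the output as a .reg file and import it on the index server.  If a newer Adobe iFilter has a different CLSID, pass it in as the clsid argument.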

When the full crawl finishes, you can search for pdf again, and this time you will see that the content of the PDF has been indexed.



Add a PDF icon to your search results.

You can download the official Adobe Icons from http://www.adobe.com/misc/linking.html (yes really)
You of course do not have to use them.

You should get the 16x16 .gif file to match all your other ones.

On every web server copy your .gif file to

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\TEMPLATE\IMAGES

Browse to

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\TEMPLATE\XML

Edit the docicon.xml file.

Add
<Mapping Key="pdf" Value="YourFileName.gif" OpenControl=""/>
before the </ByExtension> tag.
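If you would rather script the edit than open the file by hand on every web server, something like the following Python sketch would do it.  It assumes the mappings sit inside a ByExtension element directly under the root of docicon.xml (the spelling the stock file uses).  The function name add_pdf_mapping and the tiny sample document are invented for illustration.  Note that ElementTree drops XML comments and the XML declaration when it rewrites a file, so back up the original first.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(xml_text, icon_file="YourFileName.gif"):
    """Return docicon.xml text with a pdf Mapping inside <ByExtension>.

    add_pdf_mapping is a hypothetical helper written for this post.
    """
    root = ET.fromstring(xml_text)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")

    for mapping in by_ext.findall("Mapping"):
        # If a pdf mapping already exists, just point it at the new icon.
        if mapping.get("Key", "").lower() == "pdf":
            mapping.set("Value", icon_file)
            break
    else:
        # Otherwise append it as the last child, i.e. just before </ByExtension>.
        ET.SubElement(
            by_ext,
            "Mapping",
            {"Key": "pdf", "Value": icon_file, "OpenControl": ""},
        )
    return ET.tostring(root, encoding="unicode")


# Made-up, cut-down stand-in for the real file, just to show the shape.
SAMPLE = """<DocIcons>
  <ByExtension>
    <Mapping Key="txt" Value="example.gif" OpenControl=""/>
  </ByExtension>
</DocIcons>"""

if __name__ == "__main__":
    print(add_pdf_mapping(SAMPLE))
```

Running it twice is harmless: the second run updates the existing pdf mapping instead of adding a duplicate.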



Do an IIS reset on the web server and it should now include the icon in the results.



Conclusion
Adding PDF support to your Search Server is something you will have to do at some point.  If you want an easy install and fast performance, use Foxit.  If you just want a free solution and performance is not an issue, Adobe is perfectly fine.

I just have one more question - when is Apple going to try and copyright the term I-Filter for their new range of coffee machines?
