27 January 2011

Adding PDF indexing to Search Server 2010

By default Search Server indexes 44 different file types.  PDF is not one of them.

To get your search server to index them you need to do a few things.

  • Add PDF as a crawled file type
  • Install a PDF iFilter
  • Add additional registry entries (if using Adobe)
  • Neaten things up and add a PDF icon.


Adding PDF as a crawl file type.

  • From Central Administration / Search Administration
  • Click file types on the left menu pane
  • Click new file type
  • Add pdf
  • OK
  • Start a full crawl
This will now index files with the PDF extension.  But since there is no iFilter installed the server cannot "read" the files, so only metadata is used.

Searching for pdf now will only return the basic available metadata.



Install a PDF iFilter
There are two major players in the PDF iFilter space that you can use.

Adobe iFilter 9 x64
http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025
This is the free option.  Unfortunately it is a single-threaded iFilter, which means it is slow.  It also requires additional registry entries to work properly with Search Server.  This is fine if you only have a small amount of PDF content to crawl and index.

Foxit PDF iFilter 2
http://www.foxitsoftware.com/pdf/ifilter/
This is a paid-for iFilter, but it performs significantly better than the Adobe filter.  It also just installs and works out of the box.


Whichever one you choose, it is simply a case of downloading and installing it on the index server.  Afterwards you need to reset the index and run a full crawl.


Adobe Additional registry entries.
(Foxit does this during the install)

Add a new key called ".pdf" to

\\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\Filters\

Then add the following registry values to

\\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\Filters\.pdf


<REG_SZ> Default = <value not set>
<REG_SZ> Extension = pdf
<REG_DWORD> FileTypeBucket = 1
<REG_SZ> MimeTypes = application/pdf

Add a new key called ".pdf" to

\\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension

Set the default value to {E8978DA6-047F-4E3D-9C78-CDBE46041603}

This value is the CLSID of the iFilter.  If Adobe ever releases a new version, you can find the correct CLSID by searching the registry for the filter installation path, which in this case is "C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms"
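
If you would rather not click through regedit, the entries above can be captured in a .reg file and imported instead. The following is only a minimal Python sketch that generates the file text. It assumes the key paths, value names and CLSID exactly as listed above; the function name build_reg_file is purely illustrative. Check the output against your own server before importing it.

```python
# Sketch: build the text of a .reg file for the Adobe PDF iFilter entries.
# Key paths and the CLSID are copied from the post; build_reg_file is a
# hypothetical helper name, not part of any Microsoft or Adobe tooling.

FILTERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0"
    r"\Search\Setup\Filters\.pdf"
)
EXTENSION_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0"
    r"\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf"
)
IFILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"


def build_reg_file(clsid: str = IFILTER_CLSID) -> str:
    """Return .reg file text that creates both .pdf keys and their values."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{FILTERS_KEY}]",
        '"Extension"="pdf"',
        '"FileTypeBucket"=dword:00000001',
        '"MimeTypes"="application/pdf"',
        "",
        f"[{EXTENSION_KEY}]",
        f'@="{clsid}"',  # "@" sets the key's default value
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

Passing a different clsid covers the case where a newer Adobe release registers the filter under another CLSID.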

When the full crawl finishes you can search for pdf again and this time you will see that the content of the PDF has been indexed.



Add a PDF icon to your search results.

You can download the official Adobe Icons from http://www.adobe.com/misc/linking.html (yes really)
You of course do not have to use them.

You should get the 16x16 .gif file to match all your other ones.

On every web server copy your .gif file to

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\TEMPLATE\IMAGES

Browse to

C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\TEMPLATE\XML

Edit the docicon.xml file.

Add
<Mapping Key="pdf" Value="YourFileName.gif" OpenControl=""/>
before the </ByExtension> tag.
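
If you have several web servers to update, the edit can be scripted. Here is a minimal Python sketch of the idea. It works on a small stand-in string rather than the real docicon.xml, and the helper name add_icon_mapping and the sample content are assumptions for illustration only. Back up the real file before trying anything like this.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for docicon.xml; the real file has many more entries.
SAMPLE_DOCICON = """<DocIcons>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif"/>
  </ByExtension>
</DocIcons>"""


def add_icon_mapping(docicon_xml: str, extension: str, gif_name: str) -> str:
    """Append a Mapping for `extension` inside <ByExtension> if it is missing."""
    root = ET.fromstring(docicon_xml)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no <ByExtension> element found")
    # Leave the document untouched when the extension is already mapped.
    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            return docicon_xml
    ET.SubElement(
        by_extension,
        "Mapping",
        {"Key": extension, "Value": gif_name, "OpenControl": ""},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_icon_mapping(SAMPLE_DOCICON, "pdf", "YourFileName.gif"))
```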



Do an IIS reset on the web server and it should now include the icon in the results.



Conclusion
Adding PDF support to your Search Server is something you will have to do at some point.  If you want an easy install and fast performance, use Foxit.  If you just want a free solution and performance is not an issue, Adobe is perfectly fine.

I just have one more question - when is Apple going to try and copyright the term I-Filter for their new range of coffee machines?

26 January 2011

Microsoft Search Server Feature Comparison

There are a few flavors of search available from MS at this point in time.  This table should give you an in depth comparison.


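Feature comparison

Column key: SPF = SharePoint Foundation 2010, SSX = Search Server 2010 Express, SS = Search Server 2010, SPS = SharePoint Server 2010, FAST = FAST Search Server 2010 for SharePoint

Feature | SPF | SSX | SS | SPS | FAST
Basic site search | Y | Y | Y | Y | Y
Best Bets | - | Y | Y | Y | Y
Visual Best Bets | - | - | - | - | Y
Similar Results | - | - | - | - | Y
Duplicate Results | Y | Y | Y | Y | Y
Search Scopes | - | Y | Y | Y | Y
RSS Feeds for Search Results | Y | Y | Y | Y | Y
Alerts for Search Results | - | Y* | Y* | Y* | Y*
Advanced Search Page | - | Y | Y | Y | Y
Search Enhancement based on User Contexts | - | - | - | - | Y
Crawled and Managed Properties | - | Y | Y | Y | Y**
Entity Extraction | - | - | - | - | Y
Query Federation | - | Y | Y | Y | Y
Query Suggestions | - | Y | Y | Y | Y
Sort Results on Managed Properties or Rank Profiles | - | - | - | - | Y
Relevancy Tuning by Document or Site Promotions | - | Y | Y | Y | Y**
Shallow Results Refinement | - | Y | Y | Y | Y
Deep Results Refinement | - | - | - | - | Y
Document Preview | - | - | - | - | Y
Windows 7 Federation | - | Y | Y | Y | Y
People Search | - | - | - | Y | Y
Phonetic Name Search*** | - | - | - | Y | Y
Nickname Search*** | - | - | - | Y | Y
Self Search | - | - | - | Y | Y
Social Search | - | - | - | Y | Y
Taxonomy Integration | - | - | - | Y | Y
Multi-Tenant Hosting | - | - | - | Y | -
Rich Web Indexing Support | - | - | - | - | Y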




To figure out what advantages FAST offers, it is easier to just watch the demo video.







So why is it called FAST?  The name comes from Fast Search & Transfer ASA, a Norwegian company founded in 1997 that specialized in enterprise search solutions.  Microsoft acquired FAST in 2008 and has integrated it into SharePoint 2010.

Manage queries and results with Search Server

Search Server has a number of built-in logic features that allow for things like plural searches.  But if you really want to customize search results, you will want to tweak how the queries are handled.

There are a few files that can make a huge difference if you configure them to suit your environment.

  • Stop File
  • Thesaurus file
  • Custom Dictionary file
Stop File
The stop file eliminates certain words from the query.  As an example, if you search for "The blog of choice" the words "The" and "of" will be eliminated because they are too generic to add any value to the search results.  You might also want to add potentially offensive words, your company name, etc.

To add additional words to eliminate you need to:

  • Browse to  %ProgramFiles%\Microsoft Office Servers\14.0\Data\Applications\GUID\Config
  • Edit the noiseenu.txt file
  • Add the additional stop words and save the file.
  • Restart the SharePoint Server Search 14 service
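
To make the effect of the stop file concrete, here is a tiny Python sketch of the idea. It is a simplified model and not how Search Server implements it internally; the function name and the two-word stop list are assumptions for illustration.

```python
def remove_stop_words(query: str, stop_words: set[str]) -> list[str]:
    """Drop every query term that appears in the stop list (case-insensitive)."""
    return [term for term in query.split() if term.lower() not in stop_words]


if __name__ == "__main__":
    # Hypothetical stop list; a real noise file contains many more words.
    stop_words = {"the", "of"}
    print(remove_stop_words("The blog of choice", stop_words))
```

With the two stop words above, only "blog" and "choice" are left to be matched against the index.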

Thesaurus file
This xml file allows you to replace search query terms or to specify synonyms (expansion) to include in a query.  

As an example of a replacement, if you search for "The blog of choice" you can replace the word "blog" with "web site", so the query being sent will turn into "The web site of choice"

As an example of a synonym, you can expand the word "blog" to also include "web site".  The query being sent will then turn into "The blog or web site of choice"

  • Browse to  %ProgramFiles%\Microsoft Office Servers\14.0\Data\Applications\GUID\Config
  • Edit the tsneu.xml file
  • Remove the <!-- Commented out
  • Remove the -->
  • Add your own expansion or replacement
  • Restart the SharePoint Server Search 14 service

Here is an example file to show you the structure.

<XML ID="Microsoft Search Thesaurus">

<!--  Commented out

    <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>Internet Explorer</sub>
            <sub>IE</sub>
            <sub>IE8</sub>
        </expansion>
        <replacement>
            <pat>NT5</pat>
            <pat>W2K</pat>
            <sub>Windows 2000</sub>
        </replacement>
        <expansion>
            <sub>run</sub>
            <sub>jog</sub>
        </expansion>
    </thesaurus>
-->
</XML>

Not too tricky to follow.
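
To show how the two entry types behave, here is a rough Python sketch that reads a thesaurus in the structure above (with the comment markers already removed) and rewrites a single query term. It is a simplified model and not the actual Search Server logic; the function names and the blog/web site expansion are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"ts": "x-schema:tsSchema.xml"}

# Same structure as the example file, with the comment markers removed.
THESAURUS_XML = """<XML ID="Microsoft Search Thesaurus">
    <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>blog</sub>
            <sub>web site</sub>
        </expansion>
        <replacement>
            <pat>NT5</pat>
            <pat>W2K</pat>
            <sub>Windows 2000</sub>
        </replacement>
    </thesaurus>
</XML>"""


def load_thesaurus(xml_text):
    """Return (expansion groups, replacement map) parsed from the XML text."""
    root = ET.fromstring(xml_text)
    expansions = []    # each entry is a list of equivalent terms
    replacements = {}  # lower-cased pattern -> substitute term
    for exp in root.iterfind(".//ts:expansion", NS):
        expansions.append([sub.text for sub in exp.findall("ts:sub", NS)])
    for rep in root.iterfind(".//ts:replacement", NS):
        substitute = rep.find("ts:sub", NS).text
        for pat in rep.findall("ts:pat", NS):
            replacements[pat.text.lower()] = substitute
    return expansions, replacements


def rewrite_term(term, expansions, replacements):
    """Return the terms that would be searched for in place of `term`."""
    key = term.lower()
    if key in replacements:           # replacement: the original term is dropped
        return [replacements[key]]
    for group in expansions:          # expansion: all synonyms are searched
        if key in (item.lower() for item in group):
            return list(group)
    return [term]


if __name__ == "__main__":
    exp, rep = load_thesaurus(THESAURUS_XML)
    for word in ("W2K", "blog", "choice"):
        print(word, "->", rewrite_term(word, exp, rep))
```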

Custom dictionary file
This is probably not the best name for it, but it is what it is.  A dictionary file can be used to prevent the "normal substitution" of special characters by the word breakers.  As an example, if you search for the name "Blog&Go" the query will be broken up and return results for "blog" and "go".  Including "Blog&Go" in the dictionary file prevents this from happening.

There are some rules for dictionary files.  Check out http://technet.microsoft.com/en-us/library/cc262507.aspx for more information on this.

To add a custom dictionary file you need to:

  • Browse to %ProgramFiles%\Microsoft Office Servers\14.0\Bin.
  • Create a text file named Custom0009.lex  (this is the filename for English - see the above link for other language file names)
  • Edit the file and add the dictionary words, one per line.
  • Save the file
  • Stop and start the SharePoint Server Search 14 service
  • Perform a full crawl
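
The "Blog&Go" example can be pictured with a small Python sketch. This is only a toy model of word breaking, assuming that a breaker splits on anything that is not a letter or digit; the real word breakers are more sophisticated, and the function name is made up for illustration.

```python
import re


def break_words(query: str, custom_dictionary: set[str]) -> list[str]:
    """Split a query into tokens, keeping custom dictionary entries whole."""
    protected = {word.lower() for word in custom_dictionary}
    tokens = []
    for chunk in query.split():
        if chunk.lower() in protected:
            tokens.append(chunk)  # dictionary entry: do not break it up
        else:
            # Toy word breaker: split on any non-alphanumeric character.
            tokens.extend(t for t in re.split(r"[^A-Za-z0-9]+", chunk) if t)
    return tokens


if __name__ == "__main__":
    print(break_words("Blog&Go review", set()))
    print(break_words("Blog&Go review", {"Blog&Go"}))
```

Without the dictionary entry the name is split into "Blog" and "Go"; with it, "Blog&Go" stays a single term.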

Bypassing these settings
All the settings and functions provided by the above files can be bypassed by doing a literal search.  As an example, if you want to search for, and only for, "The blog of choice" you can force this by including the quotation marks.


Conclusion
By using all or some of the files above you can fine tune how the queries are submitted and handled.  This in turn will return more relevant results.

For more information about this you can check out the "not so easy to read" TechNet articles at http://technet.microsoft.com/en-us/library/cc179583.aspx


24 January 2011

Getting started with Search Server 2010

Different scenarios exist for which you may want to use search.  Possibly your content is stuck somewhere in your old intranet, maybe the data you are looking for is in a mailbox somewhere, or maybe it is in a file somewhere on the file server.

I primarily want to use Search Server to recover "lost IP" from a very large file share infrastructure that has become so big and clumsy that no one can find anything in it.

Getting search to work is actually not complicated at all.

  • Specify a user account to be used for searching
  • Specify the content sources (file share, mailbox, SharePoint sites etc.)
  • Set up a crawl schedule

Specify a user account
Launch Central Administration Site


I am yet to find a quick way to navigate there but this works. Click on the links as they come up.

Application management
Manage Service Applications
Search Service Application



The first thing you will have to change is "Default Access Account"  Clicking on the name will allow you to change it.

When specifying this account you need to take a few things into consideration.

The account needs adequate rights to crawl the content.
Confidential content should not be readable by this account.
Search results use "access based enumeration" - a user can only see what they have permission to view.

Add content sources

The content that will be crawled and indexed is called a content source.
Under the crawling heading on the left hand menu select "Content sources"
Click new content source

A new window will open where you can specify the details
  • Name
  • Content source
  • Start address
  • Crawl settings
  • Crawl schedules
  • Content source priority
Start full crawl - will crawl the content as soon as you complete the form.


Searching the content

After completing the content source setup you will be back on the content sources screen.  You can see what the crawl status is.

Once your crawl has finished you can search from the search page.
By default this would be  http://yourservername



Specify the term you want to search for.

All the content that contains your search term and that you have read access to will be returned.

One thing I find really nice here is the ability to filter your results based on: 
  • Document type
  • Content Source
  • Author
  • Modified date.
The nice part is you can do this after the initial search has completed. 



This is only a scratch on the surface of what can be done with Search Server but as you can see with very minimal effort you can already start leveraging huge functionality.



Installing Microsoft Search Server 2010

One of the most fundamental differences in the way users use their data is that no one particularly cares where stuff is anymore, as long as they can search for it and find it.

When I first started working with SharePoint years ago I could not quite understand why there was so much emphasis on search.  Now I get it.  As far as users are concerned, if it does not come up in a search result it probably does not exist.

The main problem with SharePoint search was that it was part of SharePoint.  Not everyone had / has the appetite to implement a whole SharePoint infrastructure just to get the ability to search their company information.  In 2007 MS released "SharePoint Server 2007 Search", then "Search Server 2008" and now "Search Server 2010."  Search without SharePoint.

Installing is very straightforward.  I am using Windows Server 2008 R2 and I am installing Search Server 2010 x64.

Open the Splash installer.


 Launching "Install Software Prerequisites" does all the hard work for you.


After a reboot you can select "Install search server" from the splash screen.
Enter your paid-for license key.

Agree to the EULA.
Clicking "Standalone" will auto configure just about everything.  If you want to build a more scalable deployment that has a full SQL back end etc. you will have to use the "Server Farm" option.

Once the install is complete it will start the configuration wizard.

It crunches away setting up the databases, IIS sites, etc.


When it is all done you can launch the Central Administration site.


That's it for installation.  Very straight forward and easy.  I really do appreciate packages like this that will prep a machine from scratch and get all the bits and pieces working together.

21 January 2011

Sophos Application Control Policy - implementation guide

A feature that I have always found very useful as part of the Sophos product is the Application Control.  As the name suggests it allows you to control which applications are allowed to run on a client machine and which ones are not.

Initially I started off just blocking software updates for Java and Adobe Reader.  As I am busy experimenting with policies for Sophos 9.5 I figured I would use it more extensively.




From the Enterprise Console in the Policies pane expand Application Control.
Create a new policy and edit it.

Scanning


"Enable on-access scanning" checks for application execution in real time.
"Detect but allow to run" does just that.  This is very useful for initially establishing an application baseline, since all detected applications will be reported on.
"Enable on-demand and scheduled scanning" looks for the applications on the machine during a scan.  This allows you to detect applications without them being executed.

Messaging


Enabling desktop messaging will pop up the system tray message when an unauthorized application is detected.

You will definitely want to disable this when you have "Detect but allow to run" checked in the scanning options.
This will prevent users from seeing loads of pop-up messages while you are establishing your baseline.

You will definitely want to enable this when you start blocking applications from executing.  If you don't, users will only see OS level messages about access rights to the application.



You can see the different options here.

First is the OS level error.
In the popup message you can see the default description in the red section, while the custom "message text" is displayed in the green section.

Authorization
This is where the actual control element lies.


By selecting the application types (orange) you will see the authorized and blocked windows populated.

All the items in Authorized will be allowed to run.

All the Blocked items will be blocked, reported and/or messaged on.


In my example here I would like browsers that do not meet the relevant criteria to be blocked from executing.

As part of my test I blocked Firefox.  Interestingly I was allowed to download the installation files, but I could not even launch the installer before being notified and blocked.

While going through these application types you will always see an item called "All added by Sophos in the future"
I am assuming the idea here is that Sophos will include potential applications that may be useful to block.  I would have liked the ability to add my own list of applications / executables.

Conclusion
By using application control policies you can get a better idea of your environment and the applications running on your machines.  It also gives you the ability to control their execution.  This covers the gap left by the cleanup and authorization settings for suspicious files, adware, PUAs, etc. in the Antivirus and HIPS policy: it controls legitimate applications that you do not want to allow in your environment.

20 January 2011

Sophos Data Control Rules - Adding South African ID numbers

One of the best new features that came out with Sophos 9 is the data control component.  It allows you to set policies that control or prevent certain types of information from being copied / sent out of your corporate environment.

For this example I am going to add a custom rule to prevent South African ID numbers from being transferred.  This is a typical example of something you don't want to leak, especially since these numbers will most often be included in a file with other sensitive information.

To set up your policy do the following:

  • Open the Sophos Enterprise Console
  • Expand the policies window - Data Control - Your Policy Name
  • Edit the policy
  • Click Manage Rules
  • Click add content rule
  • Specify a name and Rule description
  • Select the desired action from section 4.
  • In section 6. Click "contains"
  • In the Content Control List Manager window click Add
  • Specify a name and description
  • Click on advanced
  • Click create

The Perl5 regular expression is (((\d{2}((0[13578]|1[02])(0[1-9]|[12]\d|3[01])|(0[13456789]|1[012])(0[1-9]|[12]\d|30)|02(0[1-9]|1\d|2[0-8])))|([02468][048]|[13579][26])0229))(( |-)(\d{4})( |-)(\d{3})|(\d{7}))


  • Set the score and maximum count - for testing I have it set to 1
  • Close the windows till you get back to the Content Control List Manager window
  • Check your new definition
  • Click OK
  • Back in the Create Content Rule page in section 6 click "Select Destination"
  • Choose your desired options - click ok
  • Click ok

You can now deploy the policy to your test machine.

On the test machine create a text file with 2 valid SA ID numbers in it.
Attempt to copy it to a removable drive.
Your action should now apply.


The section in red identifies the filter that matched the file content.
The section in green is the custom message you specified in the Data control policy under the Messages tab.


You can specify alternate messages for the confirmation and block actions.

For more South African regular expressions and others check out http://regexlib.com/Search.aspx?k=south+africa  Just remember to test and not just blindly trust the expression.
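
One way to test an expression before deploying it is a few lines of Python. The sketch below uses the expression from this post unchanged. The sample numbers are made-up test values, not real ID numbers, and the helper name is an assumption. Note that the expression only validates the date portion and the overall layout; it does not verify the check digit.

```python
import re

# The expression from the post, unchanged.
SA_ID_PATTERN = re.compile(
    r"(((\d{2}((0[13578]|1[02])(0[1-9]|[12]\d|3[01])"
    r"|(0[13456789]|1[012])(0[1-9]|[12]\d|30)"
    r"|02(0[1-9]|1\d|2[0-8])))"
    r"|([02468][048]|[13579][26])0229))"
    r"(( |-)(\d{4})( |-)(\d{3})|(\d{7}))"
)


def looks_like_sa_id(text: str) -> bool:
    """True when the whole string matches the ID number layout."""
    return SA_ID_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "8001015009087",    # plain 13 digits, valid date
        "800101 5009 087",  # same number with separators
        "8002295009087",    # 29 February in a leap year
        "8013015009087",    # month 13, should be rejected
        "8002305009087",    # 30 February, should be rejected
    ]
    for sample in samples:
        print(sample, looks_like_sa_id(sample))
```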

Thanks to Ryane Cane (Sophos South Africa) for his contribution to this article.

Very sneaky malware technique

I was going through some logs and infection notices and I spotted this one.

From a machine that had an attempted infection of W32/Chir-B I got the following interesting scenario.  According to all the documentation this is not a known technique, but it is one that could arise given the variables in play.
I just recreated a simple example to show the sneakiness of this social engineering method.

For the average user this would very easily cause them to double click this.


A shortcut to a folder with the default "New Folder" name.  "Maybe I should check it out and see what's in it."

Strangely it does not open a folder. "Oh well maybe it was nothing."

If you follow the shortcut it actually points to a file called "New Folder.exe".  But because the default is to hide known extensions, it looks like a folder with a "funny icon".


Even if the extension is shown, the shortcut will still easily fool a user into double clicking it.




Sneaky indeed.

19 January 2011

HTTP 502: The Uniform Resource Locator (URL) does not use a recognized protocol

I recently ran into this interesting problem on some TMG infrastructure that I had deployed for a while already.

Two sites are hosted on the same physical server on the same IP, one on port 80 and the other on port 8080.  These sites are published through TMG to the internal network by a single NIC server.  There are two rules and two separate listeners.

When a user attempts to connect to the published application on port 8080 they receive this:


When tracking this request from the TMG server I see this error message:

Failed Connection Attempt
Status: 12006 The Uniform Resource Locator (URL) does not use a recognized protocol. Either the protocol is not supported or the request was not typed correctly. Confirm that a valid protocol is in use (for example, HTTP for a Web request).
Source: Internal (10.40.46.14:36448)
Destination: 10.40.20.139:8080
Request: GET http://addresss:8080/favicon.ico
Filter information: Req ID: 205ba366; Compression: client=Yes, server=No, compress rate=0% decompress rate=0%
User: anonymous

The problem is that this TMG server also acts as a proxy server.  The proxy is hosted on port 8080, as it should be.  A conflict therefore exists when a non-proxy request is made on the same port.  Since the proxy cannot be restricted to a single IP, it effectively listens on all the NLB IPs.

To work around this issue you can publish your server application on a port other than 8080 and use bridging to bring the request back to where the server expects it to land.

The correct way to fix it is to publish the server application on another IP and/or host header.  This will then allow you to publish on the standard port 80 without causing a similar conflict.

For more information you can always check out this article on TechNet  http://technet.microsoft.com/en-us/library/bb794799.aspx

17 January 2011

Publish Exchange 2010 Outlook anywhere with TMG and UAG

Found a link to this 58 page whitepaper from MS.

It covers loads of info, including which option to choose.  So have a look.

Extract:

When you publish Exchange, Microsoft offers two software-based options: Microsoft Forefront Threat Management Gateway 2010 and Microsoft Forefront Unified Access Gateway 2010. Both options offer publishing wizards and security features to provide secure access to Exchange when it's accessed from outside the safety of the corporate network. This white paper provides detailed information about publishing Microsoft Exchange Server 2010 using Forefront TMG or Forefront UAG, including how to choose between them for different scenarios, and provides specific steps you can take to configure Forefront TMG and Forefront UAG to publish Exchange 2010 while using NTLM authentication for Outlook Anywhere access.




Download this doc from


12 January 2011

Fix my IT system Milestone

Hi everyone

Today marks a bit of a milestone day for the humble little Fix my IT system blog.  When I started off I wanted to give something back to the larger IT community.  I have often thanked my lucky stars for finding an article or two that helped me out of a sticky situation.  I wanted to provide that to some of you out there, without taking things too seriously.

Since the blog started off in August 2010 there have been over 10 000 hits on the site, from more than 115 countries on 6 continents!


Some of the product producers I covered have also tweeted and referenced some of the articles (the good ones).

To mark this milestone I finally caved and registered the fixmyitsystem.com domain.  I have added some pages to let you all know about the professional services that my colleagues and I can provide.

Added a new favicon, cleaned up some of the blogger badging. I also had a logo made by someone with way more artistic experience than me. Thank you Curtis.



To everyone out there, thanks for the support and the return visits.  For the RSS subscribers, I hope that you enjoy checking out the regular posts and find them useful.

Please add a comment (if you have anything good to say)

Thanks again

Etienne Liebetrau

Sophos "product is nearing retirement" notification and how to get rid of it

It is very nice of Sophos to let you know that your product is nearing retirement and that you need to get its pension plan in order.  If however you cannot immediately deploy the newer version, you might want to turn off this notification for your 11 000 machines until you can deploy.



Here is how:

http://www.sophos.com/support/knowledgebase/article/112120.html and
http://www.sophos.com/support/knowledgebase/article/13112.html

The bit that they don't tell you about is how sensitive the creation of the savconf.xml file is.  Note that it is savconf and not savconfIg.

The prescribed method for updating a CID is probably the only one you will want to use.


  • Create a new file and name it savconf.xml
  • Copy and paste the following text into the file and save.


<?xml version="1.0" encoding="utf-8" ?>
<config xmlns="http://www.sophos.com/EE/EESavConfiguration">
<!-- Custom install configuration for SAV2K/XP/2003 -->
<inst:install xmlns:inst="http://www.sophos.com/SAVXP/SavInstallConfiguration"
xmlns="http://www.sophos.com/SAVXP/SavInstallConfiguration">
<endOfLife>
<disableMessaging>true</disableMessaging>
</endOfLife>
</inst:install>
</config>




Make sure you have no empty spaces or carriage returns after the </config>

  • Copy the file into the CID's savxp folder. By default it would be C:\Program Files\Sophos Sweep for NT\SAVSCFXP
  • Run C:\sec311\Tools\configcid.exe "C:\Program Files\Sophos Sweep for NT\SAVSCFXP"
  • Look at the result and make sure the following lines show up.
  1. Adding entry for \savxp\savconf.xml
  2. Adding entry for \savconf.xml
This should now disable the message on the client machines after a successful update.
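
Since the trailing whitespace is so easy to get wrong, it may help to generate the file rather than paste it into an editor. The following is a minimal Python sketch, assuming the XML exactly as shown above; the helper names write_savconf and ends_cleanly are hypothetical. It only writes and checks the file, so you still need to copy it into the CID and run configcid.exe as described.

```python
import tempfile
from pathlib import Path

# The configuration from the post, with nothing after the closing tag.
SAVCONF_XML = """<?xml version="1.0" encoding="utf-8" ?>
<config xmlns="http://www.sophos.com/EE/EESavConfiguration">
<!-- Custom install configuration for SAV2K/XP/2003 -->
<inst:install xmlns:inst="http://www.sophos.com/SAVXP/SavInstallConfiguration"
xmlns="http://www.sophos.com/SAVXP/SavInstallConfiguration">
<endOfLife>
<disableMessaging>true</disableMessaging>
</endOfLife>
</inst:install>
</config>"""


def write_savconf(folder: Path) -> Path:
    """Write savconf.xml into `folder` without any trailing whitespace."""
    target = folder / "savconf.xml"
    target.write_bytes(SAVCONF_XML.strip().encode("utf-8"))
    return target


def ends_cleanly(path: Path) -> bool:
    """True when the last bytes of the file are the closing </config> tag."""
    return path.read_bytes().endswith(b"</config>")


if __name__ == "__main__":
    # Demo in a temporary folder; point it at your own working folder instead.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_savconf(Path(tmp))
        print(path.name, "ends cleanly:", ends_cleanly(path))
```

The ends_cleanly check can also be pointed at a file you created by hand to confirm that nothing follows the closing tag.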

If you still have problems give me a shout.