Developing Filters for Windows Search

[This documentation is preliminary and is subject to change.]

Microsoft Windows Search uses filters to extract the content of items for inclusion in a full-text index. You can extend Windows Search to index new or proprietary file types by writing filters to extract the content and property handlers to extract the properties of files. Filters are associated with file types, as denoted by file extensions, MIME types or class identifiers (CLSIDs). While one filter can handle multiple file types, each type works with only one filter.

This topic supplements the IFilter reference and Using Custom Filters with Indexing Service overview topics with information specific to Windows Search. Please read the aforementioned documentation before you begin.

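This topic contains the following sections:

  • Indexing Service-Specific Implementation Information
  • Windows Search-Specific Implementation Information
  • Ensuring Your Items Get Indexed
  • Registering Filters for Windows Search
  • Legacy Issues
  • Related Topics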

 

Indexing Service-Specific Implementation Information

The following information found in the Indexing Service IFilter documentation is not applicable to writing filters for Windows Search or Windows Vista:

  • Windows Search does not supply the same helper functions as Indexing Service.
  • Filters for Indexing Service run in the protocol host process (cidaemon.exe) under the Local System security context, while filters for Windows Search run in a filter isolation process (SearchFilterHost.exe) under the Local System security context with restricted rights.

In this filter isolation process, a number of rights are removed:

  • Restricted Code
  • Everyone
  • Local
  • Interactive
  • Authenticated Users
  • Built-in Users
  • Users' security identifier (SID)

The removal of these rights means the filter does not have access to the disk system or network or to any user interface or clipboard functions. Furthermore, the isolation process runs under a job object that prevents child processes from being created and imposes a 100 MB limit on the working set. Because third-party IFilters may be implemented incorrectly, running them in the filter host isolation process increases the stability of the indexing platform.

 

Windows Search-Specific Implementation Information

Implementation Information for Windows 7

In Windows 7 and later, there is new behavior when registering an IFilter, property handler, or new extension. When a new property handler and/or IFilter is installed, files with the corresponding extensions are automatically re-indexed.

In Windows 7 it is recommended that an IFilter is installed in conjunction with its corresponding property handlers, and that the IFilter is registered before the property handler. The registration of the property handler initiates immediate re-indexing of previously indexed files without first requiring a reboot, and takes advantage of any previously registered IFilter(s) for the purpose of content indexing.

If only an IFilter is installed, without a corresponding property handler, then automatic re-indexing occurs either after a restart of the indexing service, or a reboot of the system.

For property description flags specific to Windows 7, see the following reference topics:

Implementation Information for Windows Vista

In Windows Vista and earlier, installing an IFilter or property handler does not initiate re-indexing of existing items unless an Independent Software Vendor (ISV) explicitly calls a rebuild or reindex of matching URLs.

There are two major differences between legacy applications like Indexing Service and newer applications like Windows Search that you should be aware of when implementing filters: use of IPersistStream, and use of property handlers.

First, Windows Vista and Windows Search 3.0 and later require that you use IPersistStream for the following reasons:

  • To ensure performance and future compatibility.
  • To help increase security. Filters implemented with IPersistStream are more secure because the context in which the filter runs does not need the rights to open files on the disk or over the network.

While Windows Search uses only IPersistStream, you can also include IPersistFile Interface and/or IPersistStorage Interface implementations in your filters for backward compatibility.

The second major difference is that Windows Vista and Windows Search 3.0 and later have a new Property System that uses property handlers to enumerate properties of items.

However, there are times when you need to implement a filter that handles both content and properties in order to:

  • Support legacy MSSearch implementations
  • Traverse links
  • Preserve language information
  • Recursively filter embedded items

In these situations, you need a full filter implementation, including the IFilter::GetValue method to access property values.

Other Implementation Notes

Native versus Managed Code

Filters must be written in native code due to potential CLR versioning issues with the process that multiple add-ins run in. In Windows 7 and later, filters written in managed code are explicitly blocked.

Property Size Limitations

There are two potential limitations on property size: the maximum size of data that Windows Search accepts per file, and the maximum size per property as defined in the property description file. Currently, Windows Search does not use the defined property size when calculating the amount of data it accepts from a file. Instead, the limit Windows Search uses is the product of the size of the file and the MaxGrowFactor (file size N * MaxGrowFactor), which is read from the registry at HKEY_LOCAL_MACHINE\Software\Microsoft\Windows Search\Gathering Manager\MaxGrowFactor. The default MaxGrowFactor is four (4).

Consequently, if your file type tends to be small in total size but have larger properties, Windows Search may not accept all the property data you want to emit. However, you can increase the MaxGrowFactor to suit your needs.
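
The arithmetic above can be sketched in a few lines. This is only an illustration of the formula (file size N * MaxGrowFactor), not Windows Search code: the function names MaxAcceptedBytes and FitsWithinLimit are hypothetical, and treating the limit as inclusive is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Default taken from the text above: MaxGrowFactor is 4 unless changed
// in the registry.
constexpr std::uint64_t kDefaultMaxGrowFactor = 4;

// Hypothetical helper: the amount of data accepted for one file,
// computed as file size N * MaxGrowFactor.
std::uint64_t MaxAcceptedBytes(std::uint64_t fileSizeBytes,
                               std::uint64_t maxGrowFactor = kDefaultMaxGrowFactor) {
    assert(maxGrowFactor > 0);
    return fileSizeBytes * maxGrowFactor;
}

// Hypothetical helper: reports whether the data a filter emits for a file
// stays within that limit. An inclusive comparison is an assumption.
bool FitsWithinLimit(std::uint64_t fileSizeBytes,
                     std::uint64_t emittedBytes,
                     std::uint64_t maxGrowFactor = kDefaultMaxGrowFactor) {
    return emittedBytes <= MaxAcceptedBytes(fileSizeBytes, maxGrowFactor);
}
```

For example, a 10 KB file (10,240 bytes) with the default factor yields a limit of 40,960 bytes, so 64 KB of emitted property data would not fit; with a MaxGrowFactor of 8 the limit becomes 81,920 bytes and the same data fits.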

Methods Not Implemented

There are a number of methods that filters need not implement:

Interface::Method             Description
IPersistStream::IsDirty       Filters should return E_NOTIMPL
IPersistStream::Save          Filters should return E_NOTIMPL
IPersistStream::GetSizeMax    Filters should return E_NOTIMPL
IFilter::BindRegion           Filters should return E_NOTIMPL

 

IFilter::GetChunk and Locale Code Identifiers (LCIDs)

The locale code identifier (LCID) of text can change within a single file. For example, the text of an instruction manual might alternate between English (en-US) and Spanish (es), or the text may include a single word in a language other than the primary language. In either case, your filter must begin a new chunk each time the LCID changes.

Furthermore, because the LCID is used to choose an appropriate word breaker, it is very important that you correctly identify it. If the filter cannot determine the locale of the text, it should assume the default system locale, available by calling GetSystemDefaultLCID. If you control the file format and it currently does not contain locale information, you should add a user feature to enable proper locale identification. Using a mismatched word breaker can result in a poor query experience for the user. See IFilter::GetChunk.
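
The chunking rule described above can be sketched as follows. This is a simplified, portable illustration rather than a real IFilter::GetChunk implementation: the TextRun and Chunk types and the SplitIntoChunks function are hypothetical, Lcid is a stand-in for the Windows LCID type, and the default locale is passed in as a parameter where a real filter would obtain it from GetSystemDefaultLCID.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Lcid = std::uint32_t;  // stand-in for the Windows LCID type

// Hypothetical input: a run of text as parsed from the file. hasLcid is
// false when the file format gives no locale information for the run.
struct TextRun {
    std::string text;
    bool hasLcid;
    Lcid lcid;
};

// Hypothetical output: one chunk of text that shares a single LCID.
struct Chunk {
    Lcid lcid;
    std::string text;
};

// Merges adjacent runs that share an LCID and starts a new chunk each time
// the LCID changes. Runs with no locale fall back to defaultLcid.
std::vector<Chunk> SplitIntoChunks(const std::vector<TextRun>& runs,
                                   Lcid defaultLcid) {
    std::vector<Chunk> chunks;
    for (const TextRun& run : runs) {
        const Lcid lcid = run.hasLcid ? run.lcid : defaultLcid;
        if (chunks.empty() || chunks.back().lcid != lcid) {
            chunks.push_back(Chunk{lcid, run.text});  // LCID changed
        } else {
            chunks.back().text += run.text;           // same LCID, extend
        }
    }
    return chunks;
}
```

Given runs tagged 0x0409 (en-US), 0x0409, 0x0C0A (Spanish, Spain), and one run with no locale, and a default of 0x0409, the sketch produces three chunks: the two English runs merged, the Spanish run, and the untagged run under the default locale.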

 

Ensuring Your Items Get Indexed

Now that you've implemented your filter, you want to make sure the items your filter is registered for get indexed. You can use the Catalog Manager to initiate re-indexing, and you can also use the Crawl Scope Manager to set up default rules indicating the URLs that you want the Indexer to crawl. Another option is to follow the ReIndex sample in the Windows Search SDK Samples.

For further information, refer to Using the Catalog Manager and Using the Crawl Scope Manager.

 

Registering Filters for Windows Search

You need to make a total of four entries in the registry to register your filter add-in, where:

  • PH_CLSID is the PersistentHandler CLSID for this extension's file type
  • IF_CLSID is the CLSID for the IFilter implementation
  • 89BCB740-6119-101A-BCB7-00DD010655AF is the IFilter interface GUID, which is a constant for all IFilter implementations

First, register the persistent handler for the file extension with the following key and value:

HKEY_CLASSES_ROOT\<.ext>\PersistentHandler
     (Default) = <PH_CLSID>

Second, register the IFilter implementation with the following keys and values:

HKEY_CLASSES_ROOT\CLSID\<IF_CLSID>
     (Default) = <Filter Description>

HKEY_CLASSES_ROOT\CLSID\<IF_CLSID>\InprocServer32
     (Default) = <DLL Install Path>
     ThreadingModel = Both

HKEY_CLASSES_ROOT\CLSID\<PH_CLSID>\PersistentAddinsRegistered\{89BCB740-6119-101A-BCB7-00DD010655AF}
     (Default) = <IF_CLSID>

Note: We recommend registering your filter's threading model as Both. Associating your persistent handler with a specific file extension is the preferred method for registration.
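
As an illustration of how the four entries fit together, the following sketch builds the text of a .reg file from the placeholders above. It assumes you deploy the entries through a .reg file, which is only one option; an installer could write the same keys through the registry APIs instead. The function names (EscapeRegValue, AppendKey, BuildFilterRegFile) are hypothetical, and any extension, CLSIDs, description, and DLL path you pass in are your own values.

```cpp
#include <cassert>
#include <string>

// Escapes backslashes and double quotes, as .reg string values require.
std::string EscapeRegValue(const std::string& value) {
    std::string out;
    for (char c : value) {
        if (c == '\\' || c == '"') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Appends one "[key]" section and its (Default) value to the .reg text.
void AppendKey(std::string& reg, const std::string& key,
               const std::string& defaultValue) {
    reg += "[HKEY_CLASSES_ROOT\\" + key + "]\n";
    reg += "@=\"" + EscapeRegValue(defaultValue) + "\"\n";
}

// Builds the four registry entries described above. ext must include the
// leading dot; phClsid and ifClsid must include their braces.
std::string BuildFilterRegFile(const std::string& ext,
                               const std::string& phClsid,
                               const std::string& ifClsid,
                               const std::string& description,
                               const std::string& dllPath) {
    assert(!ext.empty() && ext[0] == '.');
    // IFilter interface GUID, constant for all IFilter implementations.
    const std::string iidIFilter = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    std::string reg = "Windows Registry Editor Version 5.00\n\n";
    AppendKey(reg, ext + "\\PersistentHandler", phClsid);
    reg += "\n";
    AppendKey(reg, "CLSID\\" + ifClsid, description);
    reg += "\n";
    AppendKey(reg, "CLSID\\" + ifClsid + "\\InprocServer32", dllPath);
    reg += "\"ThreadingModel\"=\"Both\"\n\n";
    AppendKey(reg, "CLSID\\" + phClsid +
                       "\\PersistentAddinsRegistered\\" + iidIFilter,
              ifClsid);
    return reg;
}
```

Calling BuildFilterRegFile with a made-up extension such as ".xyz", two placeholder CLSIDs, a description, and a DLL path returns text with one section per key; backslashes in the DLL path are doubled because the .reg format requires it.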

 

Legacy Issues

As noted earlier, Windows Vista and Windows Search include a new property system that separates an item's properties from its content. This property system does not exist in Microsoft Windows Desktop Search (WDS) 2.x and earlier. If your filter must support other applications as described above, it may need to handle both content and properties.

For more information on developing a compatible filter, see the following:

Related Topics

Community Content

Filter host security context? (alegr1)

"while filters for Windows Search run in a filter isolation process (SearchFilterHost.exe) under the Local System security context with restricted rights"

Do you mean Local Service security context instead?

© 2009 Microsoft Corporation. All rights reserved.