Registering IFilters in SharePoint Portal Server 2003

Microsoft SharePoint Portal Server Search (SharePointPSSearch) uses filters to extract the contents of documents in the document library and external content sources. You can install a special DLL, referred to as an IFilter, to access documents in their native formats. The name IFilter is derived from the name of the COM interface that IFilters implement.

Microsoft Office SharePoint Portal Server 2003 provides IFilters for the following items:

  • Microsoft Office Word (.doc) files
  • Microsoft Office Excel (.xls) files
  • Microsoft Office PowerPoint (.ppt) files
  • MIME
  • Text files
  • XML files
  • HTML files
  • TIFF files

IFilters run in the process of the Filter Daemon and extract the content and properties from specific file types. For example, the HTML filter strips a document of all of the HTML tags and emits body text in addition to properties representing the various HTML elements, such as "Title". IFilters are registered to document types, as identified by file extension, MIME type, content type, and class ID, and are independent of the protocol handler that uses them.
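The HTML example above can be sketched in a few lines. This is only an illustrative model and not the HTML IFilter shipped with the product: a real IFilter is a COM DLL loaded by the Filter Daemon, and the names HtmlFilterSketch and filter_html are hypothetical. The sketch assumes the filter emits visible text plus a "Title" property taken from the title element.

```python
from html.parser import HTMLParser


class HtmlFilterSketch(HTMLParser):
    """Toy stand-in for an HTML filter (hypothetical, not the real IFilter)."""

    def __init__(self):
        super().__init__()
        self.properties = {}    # extracted properties, e.g. {"Title": "..."}
        self.text_chunks = []   # visible text found outside <title>
        self._in_title = False
        self._skip_depth = 0    # greater than zero inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.properties["Title"] = text
        else:
            self.text_chunks.append(text)


def filter_html(document):
    """Return (body_text, properties) for an HTML string."""
    parser = HtmlFilterSketch()
    parser.feed(document)
    parser.close()
    return " ".join(parser.text_chunks), parser.properties
```

Calling filter_html on a page returns the tag-free text for indexing and a dictionary of properties, which mirrors the split between content and properties described above.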

Note  The Filter Daemon process used by SharePoint Portal Server 2003 is Mssdmn.exe.

For additional information about how to create IFilters, see How to Write a Filter for Use by SharePoint Portal Server 2003 and Other Microsoft Search-Based Products. The Microsoft Platform Software Development Kit (SDK) provides additional information about IFilter interfaces.

Registering the IFilter

During self-registration (performed with regsvr32.exe), standard Indexing Service IFilters bind to the file name extensions that they can process. For more information about registering an IFilter through Indexing Service, see Persistent Handlers in the Indexing Service 3.0 SDK.

SharePoint Portal Server provides additional flexibility and capabilities for IFilter registration through the IFilterRegistration and IFilterRegistration2 interfaces.

Methods of the IFilterRegistration and IFilterRegistration2 interfaces are called in the DllRegisterServer and DllUnregisterServer export functions of your custom IFilter. They support the standard mapping available for previous IFilter versions, such as mapping an IFilter to a file name extension. In addition, an IFilter can be mapped to a particular MIME content type and to particular file types for a specified index. The IFilterRegistration and IFilterRegistration2 interfaces also support setting an IFilter as the default filter for a data source.

When SharePointPSSearch crawls a document, it looks for an IFilter registered for that document type, examining the IFilters registered for SharePointPSSearch first and then the IFilters registered globally. If a registered IFilter is found, SharePointPSSearch applies it to the document; otherwise, SharePointPSSearch uses the default IFilter. Binding features specific to that SharePoint Portal Server computer are available only to IFilters that register for these features.
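The lookup order described above can be modeled as a simple fallback chain. This is a minimal sketch under stated assumptions, not the product's implementation: the function resolve_filter, the dictionaries, and the filter names are all hypothetical, and the sketch keys only on file name extension, although real registrations can also use MIME type, content type, and class ID.

```python
def resolve_filter(extension, sps_filters, global_filters, default_filter):
    """Pick a filter name for a file name extension.

    Hypothetical model of the lookup order: filters registered for
    SharePointPSSearch are checked first, then globally registered
    filters, and the default filter is used if neither matches.
    """
    key = extension.lower()
    for registry in (sps_filters, global_filters):
        if key in registry:
            return registry[key]
    return default_filter


# Hypothetical registrations, for illustration only.
sps_filters = {".doc": "sps_office_filter"}
global_filters = {".doc": "global_office_filter", ".htm": "global_html_filter"}
```

With these sample registrations, a .doc file resolves to the SharePointPSSearch registration even though a global one exists, a .htm file falls through to the global registration, and an unregistered extension gets the default filter.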

The location in the registry for SharePoint Portal Server 2003 IFilters is HKLM\software\Microsoft\SPSSearch\ContentIndexCommon\Filters.

The ILoadFilter interface provides additional methods that you can use to allow IFilters to load other IFilters. This functionality is useful if you are constructing IFilters that filter documents with one or more embedded or attached items.
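The idea of one filter loading another for embedded or attached items can be sketched as follows. This does not show the ILoadFilter methods themselves; the load_filter helper, the filter functions, and the dictionary-based document layout are assumptions made purely for illustration.

```python
def filter_plain_text(item, load_filter):
    """Hypothetical leaf filter: emits the item's text as one chunk."""
    return [item["text"]]


def filter_message(item, load_filter):
    """Hypothetical container filter: emits its own body, then delegates
    each attachment to whatever filter the loader returns for it."""
    chunks = [item["body"]]
    for attachment in item["attachments"]:
        inner = load_filter(attachment["extension"])
        if inner is not None:  # attachments without a filter are skipped
            chunks.extend(inner(attachment, load_filter))
    return chunks


# Hypothetical extension-to-filter table used by the loader.
FILTERS = {".txt": filter_plain_text, ".msg": filter_message}


def load_filter(extension):
    """Return the filter function for an extension, or None if unknown."""
    return FILTERS.get(extension.lower())
```

Because the container filter receives the loader as a parameter, nested containers (a message attached to a message) are handled by the same code path without the container needing to know about every embedded format.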
