Filtering Supported File Types
When a cell in an image column contains one of certain types of documents, full-text search uses a filter to interpret the binary data. The filter extracts the textual information from the document and submits it for indexing and subsequent querying.
Microsoft® SQL Server™ 2000 includes filters for these file extensions: .doc, .xls, .ppt, .txt, and .htm.
Many document types can be stored in a single image column. For each document, SQL Server chooses the correct filter based on the file extension. Because the file extension is not visible when the file is stored in an image column, the file extension must be stored in a separate column on the table. This type column can be of any character-based data type and contains the document file extension, such as .doc for a Microsoft Word document. If the type column is NULL, the document is assumed to be a text file (.txt).
Note For full-text indexing, a document must be less than 16 megabytes (MB) in size and must not contain more than 256 kilobytes (KB) of filtered text.
The document-type column is created in these ways:
- In the Full-Text Indexing Wizard, select the image column for indexing, and then specify a Binding column to hold the document type.
- The sp_fulltext_column stored procedure accepts an argument for the column to contain the document types.
To view the document type, use the sp_help_fulltext_columns stored procedure to return the column name and column ID.
After the image column is indexed, it can be queried using the search predicates CONTAINS and FREETEXT.
Note A filter may be able to handle objects embedded in the parent object, depending on its implementation. SQL Server does not configure filters to follow links to other objects.
You can create custom filters for full-text indexing of additional file types. For more information about creating custom filters, search on "custom filters" in the Platform SDK section of the MSDN® Library at Microsoft Web site.