Full-Text Catalogs and Indexes
A Microsoft® SQL Server™ 2000 full-text index provides efficient support for sophisticated word searches in character string data. The full-text index stores information about significant words and their location within a given column. This information is used to quickly complete full-text queries that search for rows with particular words or combinations of words.
Full-text indexes are contained in full-text catalogs. Each database can contain one or more full-text catalogs. A catalog cannot belong to multiple databases and each catalog can contain full-text indexes for one or more tables. A table can only have one full-text index, so each table with a full-text index belongs to only one full-text catalog.
Full-text catalogs and indexes are not stored in the database to which they belong. The catalogs and indexes are managed separately by the Microsoft Search service.
A full-text index must be defined on a base table; it cannot be defined on a view, system table, or temporary table. A full-text index definition includes:
- A column that uniquely identifies each row in the table (primary or candidate key) and does not allow NULLs.
- One or more character string columns covered by the index.
The full-text index is populated with the key values. The entry for each key has information about the significant words (noise-words or stop-words are stripped out) that are associated with the key, the column they are in, and their location in the column.
Formatted text strings, such as Microsoft® Word™ document files or HTML files, cannot be stored in character string or Unicode columns because many of the bytes in these files contain data structures that do not form valid characters. Database applications may still have a need to access this data and apply full-text searches to it. Many sites store this type of data in image columns, because image columns do not require that each byte form a valid character. SQL Server 2000 introduces the ability to perform full-text searches against these types of data stored in image columns. SQL Server 2000 supplies filters that allow it to extract the textual data from Microsoft Office™ files (.doc, .xls, and .ppt files), text files (.txt files), and HTML files (.htm files). When you design the table, in addition to the image column that holds the data, you include a binding column to hold the file extension for the format of data stored in the image column. You can create a full-text index that references both the image column and the binding column to enable full-text searches on the textual information stored in the image column. The SQL Server 2000 full-text search engine uses the file extension information from the binding column to select the proper filter to extract the textual data from the column.
Full-text indexing is the component that implements two Transact-SQL predicates for testing rows against a full-text search condition:
Transact-SQL also has two functions that return a set of rows that match a full-text search condition:
Internally, SQL Server sends the search condition to the Microsoft Search service. The Microsoft Search service finds all the keys that match the full-text search condition and returns them to SQL Server. SQL Server then uses the list of keys to determine which table rows are to be processed.