Export (0) Print
Expand All
Creating Audit Tables, Invoking COM Objects, and More
Exception-handling Techniques
Exploring SQL Server Triggers: Part 2
Deliver User-Friendly Reports from Your Application with SQL Server Reporting Services
SQL Server: Display Your Data Your Way with Custom Renderers for Reporting Services
Updating Data in Linked Servers, Information Schema Views, and More
XML Features in SQL Server 2000
Expand Minimize

Building Search Applications for the Web Using Microsoft SQL Server 2000 Full-Text Search

SQL Server 2000
 

Andrew B. Cencini
Microsoft Corporation

December 2002

Applies to:
    Microsoft® SQL™ Server 2000

Summary: Learn how to best utilize SQL Server 2000's Full-Text Search. This article includes several tips and tricks for maximum throughput and performance. (17 printed pages)

Contents

Introduction
A Little About Full-Text Search
Configuring Full-Text Search
Full-Text Queries
Ranking and Optimizations
Additional Performance Tricks
Conclusion
Appendix A: Implementing Best Bets for Full-Text Searches
Appendix B: A Sample Application Using Best Bets, Results Paging, and Efficient Full-Text Query Logic
Appendix C: Resources

Introduction

The Full-Text Search feature of Microsoft® SQL™ Server 2000 allows you to perform fast and flexible queries against indexes built on unstructured text data. A common use of Full-Text Search is that of the search engine for web sites. There are a number of concepts and abstractions that are useful in understanding the best way to utilize Full-Text Search, as well as several tips and tricks for optimizing your Full-Text indexes and queries for maximum throughput and performance.

A Little About Full-Text Search

Full-Text Search, as a feature, was introduced in SQL Server 7.0. The core engine of Full-Text Search is provided by means of Microsoft Search (MSSearch) technology, which is also used in products such as Microsoft Exchange, and Microsoft SharePoint™ Portal Server.

The functionality exposed in SQL Server 7.0 Full-Text Search provides basic text search abilities, and uses an earlier version of MSSearch. SQL Server 2000's Full-Text Search implementation delivers a robust set of index and query features, and several enhancements in addition to those included with SQL Server 7.0. Among those enhancements are: full support for clustering via Microsoft Clustering Service, the ability to filter and index documents stored in IMAGE columns, improved language support, and performance, scalability and reliability improvements.

MSSearch builds, maintains and queries Full-Text indexes stored in the file system (as opposed to inside of SQL Server). The logical and physical storage unit used for Full-Text indexes by MSSearch is a catalog. A Full-Text catalog contains one or more Full-Text indexes per database—one Full-Text index may be created per table in SQL Server, and you may include one or more columns from that table in the index. Each table may belong to only one catalog, and only one index may be created on each table. We'll get into best practices in terms of organizing your Full-Text catalogs and indexes in a little bit—but first, a little more on how Full-Text Search works.

Configuring Full-Text Search

To create a Full-Text index on your text data stored in SQL Server, there are a few steps you should take to get ready. The first step is to Full-Text enable the database that contains textual data you wish to index (if you have not done so already).

Caution   Executing the following statement will drop and re-create all Full-Text catalogs belonging to the database on which you wish to enable Full-Text Search. Be sure you haven't already created any Full-Text catalogs in the particular database you are enabling unless you want them rebuilt.

Provided you are a member of the sysadmin role or db_owner for that particular database, you can go ahead and issue the following statement:

use Northwind 
exec sp_fulltext_database 'enable'

Next, you will want to create a Full-Text catalog to store your Full-Text indexes. As I said before, the data in this catalog lives in your file system as opposed to within SQL Server, and therefore you should choose carefully when you consider where your Full-Text catalog will be stored. Unless you specify otherwise, your Full-Text catalog will be stored in a subdirectory of the FTDATA directory that resides in your Microsoft SQL Server\MSSQL storage location. Here is how you may go about creating a Full-Text Catalog in a non-default location:

exec sp_fulltext_catalog 'Cat_Desc', 'create', 'f:\ft'

In this case, your Full-Text catalog will be created as a subdirectory of 'f:\ft', and if you take a look at that part of your file system, you will see it there as its own directory. The naming convention used for Full-Text catalogs by MSSearch is:

SQL+dbid+catalogID

Catalog IDs start at 00005 and are incremented by one for each new catalog created.

As a best practice, if possible, Full-Text catalogs should be created on their own physical drive (or drives). Given the process of building a Full-Text index is fairly I/O intensive (on a high level, it consists of reading data from SQL Server, and then writing the index to the file system), you probably want to avoid letting your I/O subsystem become a bottleneck.

So, how big are your Full-Text catalogs? Generally, Full-Text catalogs add about 30% overhead for the amount of data stored in SQL Server that you are Full-Text indexing. This rule of thumb, however, is dependent on the distribution of unique words (or keys) in your data, as well as which words you consider noise words Noise words, or stop words, are terms that are excluded from Full-Text indexes and queries because they are not 'interesting' search terms that have a high rate of occurrence, and only bloat your indexes. Later, we will get into some considerations in terms of noise word selection, and how you can possibly tune your noise words to improve your query performance.

If you have not done so already, create a unique, single-column, non-nullable index on each table upon which you plan to build a Full-Text index. This unique index is used to map each row in your table to a unique, compressible key used internally by MSSearch. Next, you want to let MSSearch know that you wish to create a Full-Text index on your tables. Issuing the following statement for your table will add it to a Full-Text catalog of your choice (in this case, 'Cat_Desc', which we created above):

exec sp_fulltext_table 'Categories', 'create', 'Cat_Desc',
   'PK_Categories'

Adding columns to this Full-Text index is the next step. For each column, you may optionally choose a language, and if the column is of type IMAGE, you must specify another column that will be used to indicate what type of document is stored in each row of your IMAGE column.

There are some important—and not well-documented—considerations for choosing the column language. These considerations relate to how your text is tokenized and then indexed by MSSearch. Text being indexed is fed through a component called a wordbreaker that tokenizes on word boundaries. These word boundaries, in the English language, are typically whitespace or some form of punctuation. In other languages, such as German, words or characters may be combined together; therefore, your choice of a column-level language should represent the language you expect will be stored in rows of that column. If you are unsure, a general best bet is to use the neutral wordbreaker, which performs its tokenization purely on whitespace and punctuation. An additional benefit of your column-level language choice is "stemming". Stemming in Full-Text queries is defined as the process of searching for all stemmed (inflectional) forms of a word in a particular language.

Another consideration in language choice is related to the way in which your data is represented. For non-IMAGE column data, no special filtering is performed. Rather, the text is generally passed through the wordbreaking component as-is. Wordbreakers are designed mainly to process written text. So, if you have any type of markup (such as HTML) on your text, you may not get great linguistic accuracy during indexing and search. In that case, you have two choices—the preferred method is simply to store the text data in an IMAGE column, and to indicate its document type so it may be filtered. If this is not an option, you may consider using the neutral wordbreaker and, if possible, adding markup data (such as 'br' in HTML) to your noise word lists. You will not receive the benefit of any language-based stemming on a column with the neutral language specified, but certain circumstances may dictate this choice.

Now that you know what your column-level options are, try adding a column or two to your Full-Text index, by issuing the following:

exec sp_fulltext_column 'Categories', 'Description', 'add'

You may notice I did not specify any language here—in this case, the default Full-Text language will be used. You may set the default Full-Text language for your server via the system stored procedure "sp_configure".

With all of your columns added to your Full-Text index, you're now ready to start a population. There is much that can be said about the various options you have when it comes to population methods, and I'll avoid going into too much detail—but for this example, you should simply start a full population on your table, and wait for it to complete:

exec sp_fulltext_table 'Categories', 'start_full'

You may wish to monitor your population status using either the FULLTEXTCATALOGPROPERTY or OBJECTPROPERTY functions. To get your catalog population status, you can execute:

select FULLTEXTCATALOGPROPERTY('Cat_Desc', 'Populatestatus')

Typically, if a full population is in progress, the result returned is '1'. For more details on how to use FULLTEXTCATALOGPROPERTY and OBJECTPROPERTY, see SQL Server Books Online.

Full-Text Queries

Querying Full-Text indexes is slightly different than executing standard relational queries in SQL Server. Since your indexes are stored and managed external to SQL Server, Full-Text query processing is in large part handled by MSSearch—therefore, queries that are partially relational in nature, and partially Full-Text based will be processed separately—which can sometimes hurt performance.

Essentially, when you execute a Full-Text query, the query terms are passed to MSSearch, which traverses its internal data structures (the indexes), and it returns a key and rank value to SQL Server. You generally do not see the key or rank values when you execute a CONTAINS or FREETEXT query; however, if you execute a CONTAINSTABLE or FREETEXTTABLE query, you are provided with these values—which are typically then joined against the base table. The process of joining keys against the base table can be quite expensive—shortly, we'll present some clever ways to minimize or completely avoid this join.

If you've been thinking ahead, and know a little bit about how Full-Text queries return data, you may surmise that CONTAINS/FREETEXT queries are simply performing a CONTAINSTABLE/FREETEXTTABLE query and joining against the base table. Your understanding should lead you to avoid using those types of queries unless the cost of not doing so would be greater. In the case of web search applications, using CONTAINSTABLE and FREETEXTTABLE is much better than using the TABLE-less cousins.

So far, you know that Full-Text queries are special ways of accessing data from MSSearch indexes stored outside of SQL Server, and that you can get in a bit of trouble by blindly joining against your base table. Another key thing to know is the actual difference between CONTAINS-style queries and FREETEXT-style queries.

CONTAINS queries perform an exact match for all terms you are searching for. Whether you are simply looking for a single word, or all words beginning with 'orange', you will only be returned results that contain all of your search terms. Along these lines, CONTAINS queries are quite fast, as they typically return fewer results, and generally don't need to perform too much additional processing. Some downsides of CONTAINS queries include the pesky problem of noise word filtering. Experienced developers and DBAs who have worked with Full-Text Search in the past have encountered the dreaded "Your query contains only noise words" error when trying to match words or phrases that include even a single noise word. One way to avoid receiving this error is to filter out noise words prior to executing Full-Text queries. It is not possible to return results to a CONTAINS query containing noise words, as queries of this type are to return exact matches to your entire query string. Since noise words are not Full-Text indexed, no rows may be returned on a CONTAINS query that includes noise words.

FREETEXT queries overcome all of the caveats that occasionally occur in CONTAINS queries. When you issue a FREETEXT query, you are essentially issuing a stemmed any-words query. Therefore, when you search for "root beer", both 'root' and 'beer' are stemmed out to all of their forms (stemming is language specific; the language used is determined by the Full-Text column language specified at indexing time, and must be the same across all the queried columns), and any row that matches at least one of those terms will be returned.

The side effect of FREETEXT queries is that they have a tendency to use more CPU than CONTAINS queries—the stemming and larger possible set of returned results, combined with a more complicated calculation of rank are the culprits. Nevertheless, FREETEXT-based queries are amazingly flexible, still extremely fast, and quite often the best way to go for web-based search applications.

Ranking and Optimizations

I often meet with users of Full-Text Search, and they ask me what the ranking numbers mean, and how they can translate them to some sort of human-understandable value. There is a short answer and a long answer that can be given to this question, but I'll stick to the short one to be brief. Basically, those ranking numbers are not as significant as the order in which the results are returned. What this means is that when you order your results by rank, you are always returning the most relevant results first. The rank values themselves are prone to change—Full-Text Search uses a probabilistic ranking algorithm, which means the relevance of each document you are returning is directly effected by any and all other documents in your Full-Text index.

One trick some people think helps increase the rank of certain rows is to repeat commonly used search keywords in the Full-Text indexed columns of those rows. While this may, to a certain degree, help improve the chances of those rows being returned first for certain keywords, it may backfire in other cases—and also puts you at a moderate risk of hurting query performance for those terms. A better solution would be to implement a "Best Bets" system for your search application (see example below) so you may be guaranteed that certain documents are returned first. The problem with extensive duplication of keywords is that it can bloat your Full-Text indexes for those specific keywords, and cause MSSearch to spend more time than necessary in finding the right rows and calculating rank. If you have a huge amount of Full-Text indexed data, and have tried this tactic, you probably will find that some Full-Text queries take quite a while. If you are able to implement a more lean (and probably more accurate) "Best Bets" system, you are likely to find it will make a world of difference in your query performance.

Another problem related to extensive duplication of data is related to a commonly used trick to combine relational and Full-Text queries. This problem plagues many people using Full-Text Search and is encountered when one attempts to apply some sort of a filter to results returned from a Full-Text query. As I said, Full-Text queries return a key and a rank for each matching row—to garner any more information about those rows, one must perform a join against its base table. Since any number of results may possibly be returned from an unrestricted Full-Text query, that join may become quite costly. One clever way people have found to avoid that join is to simply add the data to be filtered (if possible) to their Full-Text index. In other words, if someone wants to search on the keyword of "Ichiro" from the body text of all articles in a newspaper, but only return articles that were in the sports section of the newspaper, queries are typically expressed similar to:

-- [APPROACH 1:]
-- most expensive: select all, then join and filter
SELECT ARTICLES_TBL.Author, ARTICLES_TBL.Body, ARTICLES_TBL.Dateline, 
   FT_TBL.[rank] 
FROM FREETEXTTABLE(Articles, Body, 'Ichiro') AS FT_TBL
INNER JOIN Articles AS ARTICLES_TBL
ON FT_TBL.[key] = ARTICLES_TBL.ArticleID
WHERE ARTICLES_TBL.Category = 'Sports'

-- [APPROACH 2:]
-- works, but can backfire and become slow or return inaccurate results: 
-- perform filtering via Full-Text and only extract key and rank 
-- (processing done at web server level)
SELECT [key], [rank] 
FROM CONTAINSTABLE(Articles, *, 'FORMSOF(INFLECTIONAL('Ichiro') 
      AND "sports"')

The problem with these queries is that they are either unnecessarily expensive, or run the risk of returning wrong results (in the second query, 'sports' is quite likely to occur in articles of all categories). There are other permutations of these techniques, but these are two very simple mockups. A suggestion I typically give, if it can be afforded, is that of some form of horizontal partitioning of data. In other words, each possible value for your 'categories' column could simply become its own column (or table), and searchable keywords relating to that article would be stored in only that column. Taking this approach, rather than having one single 'Body' column and a 'Category' column, you could get rid of the 'Category' column, and have a 'Body_<category>' column that would store the searchable keywords. An example below:

-- If you can adjust your schema this works great – each category 
-- becomes its own column (or table), and you only hit that 
-- smaller Full-Text index. There are obviously some caveats… 
SELECT [key], [rank] 
FROM FREETEXTTABLE(Articles, Body_Sports, 'Ichiro')

For those systems with a large amount of data that can accommodate this (perhaps major) schema change, a significant and appreciable performance improvement should become immediately apparent. There are obvious limitations when it comes to applying multiple or no filters; there are certainly other ways to work around those problems. From the example above, you can get a picture of one way to abstract some of your search conditions into your schema—in essence 'cheating' the optimizer (more appropriately, 'becoming' the optimizer) as there is little to no native optimization currently possible for Full-Text queries within SQL Server itself.

Additional Performance Tricks

Another request I commonly get from people I speak with is that of the ability to page through Full-Text query results. In other words, if I were to issue a query for "root beer", with the results to be displayed on a web page 40 at a time, I'd want to only return the 40 results for that particular page (if I were on page three, for example, I'd want to return only results 81-120).

There are several approaches I have seen when it comes to paging through results, but none of them have been 100% effective in terms of being very efficient. The approach I suggest lets you minimize the number of Full-Text queries you execute (in effect, only one per set of results to be paged through), and use your web server as a simple cache. On a high-level, what you should do is retrieve one complete rowset of keys and rank values for your Full-Text query (you can fold in Best Bets and abstract common filters to your schema if you desire), and store them in memory on your web server (depending on your application and load, imagine a typical key size of <32 bytes plus a rank size of <4 bytes = <36 bytes multiplied by a typical returned result set <1000 rows is <35K. Assume an actively cached set of <1000 active query result sets at any given time, and you will find that it occupies less than 35MB of RAM on the web server—not too shabby).

In order to page through the results, the process simply becomes traversing an array stored in memory on your web server and issuing a SELECT against your SQL Server for only the rows and columns you wish to display. This also gets back to the concept of only returning keys and ranks for Full-Text queries—a SELECT (even many of them) is many times faster than a Full-Text query. By using SELECT as opposed to joining many rows against the base table, combined with several other tactics, you will be able to reserve more CPU cycles on your SQL Server machines, and get more use of your (less expensive) web farm.

An alternate approach to web server-side caching is to cache result sets in SQL Server itself, and to define various methods for navigating through those results. Though this article focuses primarily on application design at the web server (ASP) level, the programmability features of SQL Server also provide a rich framework for building high-performance search applications for the web.

Conclusion

Microsoft SQL Server 2000's Full-Text Search capabilities present a robust, fast and flexible way to index and query unstructured text data stored in a database. Given the increasing popularity and importance of fast, accurate search capabilities in a variety of applications, it is important to implement your Full-Text Search solution in a way that takes advantage of its speed and precision. By distributing your computational load, and by organizing your data in some clever ways, you will save money on additional hardware and software, and prevent frustration caused by unnecessarily slow queries. While there are always many factors and considerations that play into developing great search applications, I hope that the information and examples I have provided will help you get started in building the best search application for the web using SQL Server 2000.

Appendix A: Implementing Best Bets for Full-Text Searches

One possible way to improve your Full-Text query performance and effectiveness is to implement a "Best Bets" system. This system is a very simple way to ensure that certain rows that match a particular query expression are returned before others. Best Bets, in the absence of some sophisticated preprogrammed logic (as is found in SharePoint Portal Server, for example), are typically handpicked.

For the purposes of this example, Best Bets are handpicked, and the unique key, and a number of keywords are stored in a separate table. A FREETEXTTABLE query is executed against the (vastly smaller) Best Bets table, and any results returned from that query are returned with the results of the FREETEXTTABLE query against the base table. Given these heuristics, all rows that are "Best Bets" will be returned first, followed by rows that are deemed most relevant by MSSearch, in decreasing order.

Below is a very simple sample script that creates a Best Bets system.

use myDb

create table documentTable(ftkey int not null, document ntext)
create unique index DTftkey_idx on documentTable(ftKey)

/*
   Insert documents here
   (all documents to be Full-Text indexed)
*/

-- Build Full-Text Catalog & Indexes for All Documents table
exec sp_fulltext_catalog 'documents_cat', 'create', 'f:\ftCats'
exec sp_fulltext_table 'documentTable', 'create', 'documents_cat', 
      'DTftkey_idx'
exec sp_fulltext_column 'documentTable', 'document', 'add'
exec sp_fulltext_table 'documentTable', 'start_change_tracking'
exec sp_fulltext_table 'documentTable', 'start_background_updateindex'

/*
   Now create best bets table and indexes
   (add documents that should always be returned first)
*/
create table bestBets(ftKey int not null, keywords ntext)
create unique index BBftkey_idx on bestBets(ftKey)

/*
   Insert best bets here
*/

-- Build Full-Text Catalog & Indexes for Best Bets table
exec sp_fulltext_catalog 'bestBets_cat', 'create', 'f:\ftCats'
exec sp_fulltext_table 'bestBets', 'create', 'bestBets_cat', 'BBftkey_idx'
exec sp_fulltext_column 'bestBets', 'keywords', 'add'
exec sp_fulltext_table 'bestBets', 'start_change_tracking'
exec sp_fulltext_table 'bestBets', 'start_background_updateindex'

First, a generic 'All Documents' table is created to store all documents to be Full-Text indexed. Typically, there will be other columns in the documents table, but for the purpose of this article, there are only two columns—the key index, and the document itself. A Full-Text catalog and index are created for the documents table.

Next, a 'Best Bets' table is created to store special documents that are to be returned first for all Full-Text queries. This table simply needs to have a Full-Text key column, and the document itself (a refinement on this strategy to help target certain documents for certain queries, could include adding additional keywords to the document that are not contained in the document itself). A Full-Text catalog and index are created for the best bets table.

The best bets table and documents table can either share documents (a document that is a best bet is also stored in the regular documents table, and they share the same key value), or the two can be mutually exclusive (best bets documents are stored only in the best bets table). For ease of retrieval, it may be easier to keep the best bets table exclusive from the documents table—doing so will eliminate the need to remove shared hits from the best bets and regular search results rowsets that are returned. On the other hand, it may not be practical to maintain documents in this way, in which case logic would need to be added to queries to remove shared documents between the returned rowsets.

Given the above tables, two stored procedures may be created to search against the best bets and documents tables. Logic at the web server level, or an additional stored procedure can be used to cache and present the desired results (see the next section for a complete efficient example of caching, presentation and paging when used with best bets).

First, a stored procedure to retrieve best bets rows, if any:

create procedure BBSearch @searchTerm varchar(1024) as

select [key], [rank] from freetexttable(bestBets, keywords, @searchTerm) order by [rank] desc

Ensure that the incoming search string has been cleaned to prevent arbitrary execution of T-SQL on the server, and ensure the string is enclosed in single quotes. Using FREETEXTTABLE is preferable over using CONTAINSTABLE in this case, as FREETEXTTABLE will utilize stemming and find best bets that match any of the search terms.

Next, a second stored procedure retrieves documents that match regular search criteria, if any:

create procedure FTSearch @searchTerm varchar(1024) as

select [key], [rank] from freetexttable(documentTable, keywords, @searchTerm) order by [rank] desc

Again, ensure the incoming search string has been cleaned, and that it is surrounded by single quotes.

When executing these stored procedures, the same search term should be passed into both, with the best bets search being executed first, followed by the regular Full-Text search. The next section provides a broader overview of how best bets may be used alongside other Full-Text Search techniques in building web search applications.

Appendix B: A Sample Application Using Best Bets, Results Paging, and Efficient Full-Text Query Logic

In this example, we implement a web search application that takes advantage of almost all of the optimizations I have discussed in this article. We use the simple scenario of a search engine for an online retailer's catalog, and assume a high volume of traffic where customers are all expecting their results in a very short response time. The best bets tables and stored procedures from the previous section are used in this example.

This application is just a simple example of some higher-level tactics that may be used to achieve the best possible Full-Text Search performance. This example uses ASP, but ISAPI, ASP.NET, or other platforms may also be used to implement similar solutions with their own set of strengths and weaknesses. Use of the session object is not always recommended for all applications, and if used improperly, can be somewhat dangerous. In this case, we use the session object to implement a quick-and-dirty caching mechanism—there are many other ways to implement that functionality at various levels.

Below is the generic code for the ASP page:

<% @Language = "VBScript" %>
<% Response.buffer = true %>
<html>
   <head>
      <title>FT Test</title></head>
   <body>
<pre>
----------------- Begin Test ------------------

<%

Dim firstRow   ' if paging thru rows, the element to start with
Dim lastRow      ' last row when paging
Dim pageSize   ' size of page (how many rows at a time)
Dim cn      ' connection object
Dim rs      ' resultset for FT key/rank (reused)
Dim useCache   ' use cache or hit FT (0, don't use; 1, use)
Dim alldata      ' results rowset to be cached
Dim bbdata      ' best bets rowset to be cached
Dim connectionString   ' sql connection string

' determine whether or not to pull from cache
' default to false, otherwise take what's coming in
if (request.Form("useCache") <> "") then
   useCache = request.Form("useCache")
elseif (request.QueryString("useCache") <> "") then
   useCache = request.QueryString("useCache")
else
   useCache = 0
end if

' set constants
pageSize = 24
firstRow = 0
lastRow = 23
connectionString = <your connection string here>

'----------------------------------------------------------------'
' displays a simple key/rank pair matching best bets/search term '
'----------------------------------------------------------------'
Private Sub SearchNPage()

   Dim p         ' counter for looping thru rows
   Dim numRows      ' number of rows, total, in cache/resultset

   if (useCache <> "1") then ' pull bestbets/results and cache them

      Dim queryArg   ' incoming query term
      if (request.Form("searchTerm") <> "") then
         queryArg = request.Form("searchTerm")
      elseif (request.QueryString("searchTerm") <> "") then
         queryArg = request.QueryString("searchTerm")
      else
         response.Write("No search term provided" & VbCrLF)
         exit sub
      end if      

      ' ideally we should clean our query term here...
      ' add your custom cleaning logic to prevent arbitrary
      ' sql execution

      ' call CleanString(queryArg)

      ' establish connection to SQL
      Set cn = Server.CreateObject("ADODB.Connection")
      cn.Open connectionString

      ' get best bets matches passing in clean string
      set rs = cn.Execute("exec BBSearch '" & queryArg & "'")

      ' get best bets, if any
      if not(rs.EOF) then
         bbData = rs.GetRows
      end if

      ' now get regular matches passing in clean string
      set rs = cn.Execute("exec FTSearch '" & queryArg & "'")

      ' if we return nothing from anything, bail out
      if (rs.EOF and IsEmpty(bbdata)) then
         response.Write("No rows matched" & VbCrLF)
         call ConnClose
         exit sub
      end if

      ' otherwise, grab rows, if any
      if not(rs.EOF) then
         alldata = rs.GetRows
         Session("results") = alldata
      end if

      call ConnClose

   else ' load up from cache (usecache=1)

      alldata = Session("results")

      ' also get a row range to use here
      if (request.Form("firstRow") <> "") then
         firstRow = request.Form("firstRow")
         lastRow = firstRow+pageSize
      elseif (request.QueryString("firstRow") <> "") then
         firstRow = request.QueryString("firstRow")
         lastRow = firstRow+pageSize
      end if

   end if ' useCache<>TRUE

   ' for this app, we'll just print out all best bets 
      ' (may be larger than pagesize), then page thru regular results
      ' we make the assumption that if we're using the cache we've 
' previously shown best bets-if there are no best bets move on
   if not(IsEmpty(bbdata)) then
      response.Write("BEST BETS:" & VbCrLf)
      for p = 0 to ubound(bbdata, 2)
response.Write(bbData(0,p) & " "  & bbData(1,p) & VbCrLf)
      next
      response.Write(VbCrLf)
   end if

   ' return search results if any (there may only be best bets)
   if not(IsEmpty(alldata)) then
      if uBound(alldata, 2) < lastRow then
         lastRow = uBound(allData, 2)
      end if

      response.Write("SEARCH RESULTS:" & VbCrLf)

      for p = firstRow to lastRow
response.Write(allData(0,p) & " "  & allData(1,p) & VbCrLf)
      next
   end if  ' not(IsEmpty(alldata))

End Sub

'----------------------------------------------------------------'
' close and clean connection objects                  '
'----------------------------------------------------------------'
Private Sub ConnClose
   rs.Close
   Set rs = Nothing
   cn.Close
   Set cn = Nothing
End Sub

call SearchNPage

%>

---------------- Test Complete ----------------

<form action="<this page>" method="post">
<input type=submit value="next <%=pageSize%> rows" NAME="Submit1">
<input type=hidden name="useCache" value="1">
<input type=hidden name="firstRow" value=<%=lastrow+1%>>
</form>

</pre>
   </body>
</html>

A simple HTML form page can drive the above script as such:

<html>
<head><title>Enter search terms</title>
</head>

<body>

<form action="<search asp page>" method="post">
Search Term: <input name="searchTerm">
<p>
<input type="submit" value="Search">
</form>

</body>
</html>

As can be seen by the above two code samples, it doesn't take much effort to create a web application that can execute an efficient Full-Text query (complete with best bets), and cache and page through results. Logic can be added with minimal overhead to provide additional data, enhance the appearance of best bets, and to navigate through search results (It is also highly recommended to implement additional well-thought-out logic for error handling, security and scrubbing incoming data).

Provided the above high-level recommendations and examples, the design and implementation of a fast and scalable web search application using SQL Server 2000 Full-Text Search should be well within reach.

Appendix C: Resources

Full-Text Search Deployment

A good reference for those just getting started with Full-Text Search. Covers population methods, hardware and software requirements, and provides tips, tricks, and additional documentation for working with SQL Server 2000 Full-Text Search.

Full-Text Search Public Newsgroup (microsoft.public.sqlserver.fulltext)

An excellent place to find answers to questions about Full-Text Search, as well as helpful hints and tricks. The Full-Text Search newsgroup is frequented by members of the SQL Server development team and very knowledgeable Microsoft MVPs.

Show:
© 2014 Microsoft