Microsoft Site Server 3.0 Search Database Indexing

 

Mike Cheng
Microsoft Corporation

May 1998


Introduction

Site Server Search database indexing enables full text indexing of Microsoft® SQL Server™ and Access databases using ODBC and Active Server Pages (ASP). Microsoft Site Server 3.0 provides the easy-to-use Create New Catalog Definition Wizard for creating search indexes. This document explains how database indexing works, provides troubleshooting tips, and demonstrates how to accomplish advanced tasks.

Creating the Database Catalog

At the heart of the Site Server search functionality is the search catalog—essentially an index and property store that can be searched. Creating the database catalog is a two-step process. First, the Create New Catalog Definition Wizard collects the following information about the database catalog from the user:

  • ODBC data source name
  • Catalog name
  • Table name
  • Content column name
  • Primary key column name
  • Hyperlink (Title) column name
  • Index flags for columns other than content and title
  • Retrieve flags for columns other than content and title

It then creates the following set of .asp and .htm files and scripts, which are used for either the indexing or searching process:

  • Dir.asp (indexing): The start address for the catalog
  • Row.asp (indexing): The script used to retrieve individual records from the database
  • View.asp (searching): The script used to display individual records in the database
  • Search.htm (searching): The form used to collect search criteria
  • Results.asp (searching): The script used to display results. Links on this page use view.asp

After these files are generated, Site Server Search creates the catalog using dir.asp and row.asp. Dir.asp generates links to records in the database that are fetched by row.asp; both use ActiveX® Data Objects (ADO) to access the ODBC data source. This document focuses on dir.asp, row.asp, and view.asp. For more information on search.htm and results.asp, please refer to the Site Server documentation.

Example

Let's walk through creating a database catalog using the Products table of the AdvWorks database, a sample database included with Site Server. The schema for the Products table is shown in Table 1.

Table 1. The Products table of the AdvWorks sample database

Column name Data type
ProductID COUNTER, Nullable
ProductCode TEXT(10), Nullable
ProductType TEXT(20), Nullable
ProductIntroductionDate DATETIME, Nullable
ProductName TEXT(50), Nullable
ProductDescription TEXT(255), Nullable
ProductSize TEXT(5), Nullable
ProductImageURL TEXT(255), Nullable
UnitPrice DOUBLE, Nullable
OnSale BIT, Required

To create a database catalog project for the AdvWorks database:

  1. Start Site Server Service Admin (HTML).
  2. Click Search.
  3. On the menu frame, click Catalog Build Definitions.
  4. On the Catalog Build Definitions page, click Create.

    The Create New Catalog Definition Wizard appears.

  5. Select A database, and then click Next.
  6. Specify AdvWorks as the ODBC data source and as the catalog name. Leave the user name and password blank.
  7. Click Next.
  8. Select the Products table, and then click Next.
  9. Select ProductDescription as the Content column, ProductID as the Primary key column, and ProductName as the Hyperlink column.
  10. Click Next.
    The columns (except ProductDescription and ProductName) from the Products table are listed.
  11. Select ProductID and ProductType as searchable and as retrievable.
  12. To build the database catalog at this time, select Build the database catalog now.
  13. Click Next.
  14. Click Finish.

The scripts generated by the wizard for dir.asp, row.asp, and view.asp follow.

dir.asp

<Install Directory>\SiteServer\Knowledge\Search\database\Index\AdvWorks\dir.asp
 1: <HTML><HEAD>
 2: <TITLE></TITLE>
 3: <META NAME="ROBOTS" CONTENT="NOINDEX">
 4: </HEAD><BODY>
 5: <% Set r = CreateObject("ADODB.Recordset.1.5") 
 6:  r.open "SELECT ProductID FROM Products", "UID=;PWD=;DSN=AdvWorks" 
 7: sf=0
 8: sf=Request("sf")
 9: i=0
10: if not sf = 0 then r.Move sf
11: while i < 1000 and not r.Eof %>
12: <a href="row.asp?id=<% =r ("ProductID") %>">Link &nbsp;</a>
13: <% i=i+1
14: r.movenext
15: Wend
16: sf = 1000+sf
17: if not r.eof then
18: Response.Write "<a href=dir.asp?sf=" & sf & "> NextPage </A>"
19: End If %>
20: </BODY></HTML>

row.asp

<Install Directory>\SiteServer\Knowledge\Search\database\Index\AdvWorks\row.asp
 1: <% Set r = createobject ("ADODB.Recordset.1.5") %>
 2: <% r.Open "SELECT ProductID, ProductType, ProductName, ProductDescription FROM Products WHERE ProductID = " & Request.QueryString("id"), "UID=;PWD=;DSN=AdvWorks" %>
 3: <% Dim x() %>
 4: <% ReDim x(4) %>
 5: <% x(0) = r ("ProductID") %>
 6: <% x(1) = r ("ProductType") %>
 7: <% x(2) = r ("ProductName") %>
 8: <% x(3) = r ("ProductDescription") %>
 9: <HTML><HEAD><TITLE>
10: <% y=VarType(x(2)) %>
11: <% if y <> 0 And y <> 1 Then %>
12: <% =server.htmlencode (x(2)) %>
13: <% End if %>
14: </TITLE>
15: <% y=VarType(x(0)) %>
16: <% if y <> 0 And y <> 1 Then %>
17: <META NAME="ProductID" content="<% =server.htmlencode (x(0)) %>">
18: <% End if %>
19: <% y=VarType(x(1)) %>
20: <% if y <> 0 And y <> 1 Then %>
21: <META NAME="ProductType" content="<% =server.htmlencode (x(1)) %>">
22: <% End if %>
23: </HEAD><BODY>
24: <% y=VarType(x(3)) %>
25: <% if y <> 0 And y <> 1 Then %>
26: <% =server.htmlencode (x(3)) %>
27: <% End if %>
28: </BODY></HTML>

view.asp

<Install Directory>\SiteServer\Knowledge\Search\database\Search\view.asp
 1: <% Response.expires=0 %>
 2: <!--Copyright 1997, 1998 Microsoft Corporation. All rights reserved.-->
 3: <% Set r = createobject ("ADODB.Recordset.1.5") %>
 4: <% r.Open "SELECT ProductID, ProductCode, ProductType, ProductIntroductionDate, ProductName, ProductDescription, ProductSize, ProductImageURL, UnitPrice, OnSale FROM Products WHERE ProductID = " & Request.QueryString("id"), "UID=;PWD=;DSN=AdvWorks" %>
 5: <% Dim x() %>
 6: <% ReDim x(10) %>
 7: <% x(0) = r ("ProductID") %>
 8: <% x(1) = r ("ProductCode") %>
 9: <% x(2) = r ("ProductType") %>
10: <% x(3) = r ("ProductIntroductionDate") %>
11: <% x(4) = r ("ProductName") %>
12: <% x(5) = r ("ProductDescription") %>
13: <% x(6) = r ("ProductSize") %>
14: <% x(7) = r ("ProductImageURL") %>
15: <% x(8) = r ("UnitPrice") %>
16: <% x(9) = r ("OnSale") %>
17: <HTML>
18: <HEAD>
19: <TITLE>
20: <% y=VarType(x(4)) %>
21: <% if y <> 0 And y <> 1 Then %>
22: <% =server.htmlencode (x(4)) %>
23: <% End if %>
24: </TITLE>
25: </HEAD>
26: <body text="#000000" link="#000000" alink="#000000" vlink="#000000" topmargin=17 leftmargin=17 background="images/RightBkgd.gif">
27: <font face="Arial,Helvetica">
28: <b>
29: <center>
30: Database Record:
31: </center>
32: </b>
33: <b>
34: <center>
35: <% y=VarType(x(4)) %>
36: <% if y <> 0 And y <> 1 Then %>
37: <% =server.htmlencode (x(4)) %>
38: <% End if %>
39: </center>
40: </b>
41: <p>
42: <table border=1 cellpadding=5 align=center>
43: <tr>
44:   <td><font size=2>
45:     ProductName
46:   </font></td>
47:   <td><font size=2>
48: <% y=VarType(x(4)) %>
49: <% if y <> 0 And y <> 1 Then %>
50:     <% =server.htmlencode (x(4)) %>
51: <% Else %>
52:     <% =server.htmlencode ("<n/a>")%>
53: <% End if %>
54:   </font></td>
55: </tr>
56: <tr>
57:   <td><font size=2>
58:     ProductID
59:   </font></td>
60:   <td><font size=2>
61: <% y=VarType(x(0)) %>
62: <% if y <> 0 And y <> 1 Then %>
63:     <% =server.htmlencode (x(0)) %>
64: <% Else %>
65:     <% =server.htmlencode ("<n/a>")%>
66: <% End if %>
67:   </font></td>
68: </tr>
69: <tr>
70:  <td><font size=2>
71:     ProductType
72:   </font></td>
73:   <td><font size=2>
74: <% y=VarType(x(2)) %>
75: <% if y <> 0 And y <> 1 Then %>
76:     <% =server.htmlencode (x(2)) %>
77: <% Else %>
78:     <% =server.htmlencode ("<n/a>")%>
79: <% End if %>
80:   </font></td>
81: </tr>
82: <tr>
83:   <td><font size=2>
84:     ProductDescription
85:   </font></td>
86:   <td><font size=2>
87: <% y=VarType(x(5)) %>
88: <% if y <> 0 And y <> 1 Then %>
89:     <% =server.htmlencode (x(5)) %>
90: <% Else %>
91:     <% =server.htmlencode ("<n/a>")%>
92: <% End if %>
93:   </font></td>
94: </tr>
95: </BODY></HTML>

How It Works

Dir.asp, when executed, creates a record set with all the rows in the specified table of the ODBC data source. This is accomplished by the SQL query "SELECT <Primary key column name> FROM <Table name>" (see line 6 of dir.asp). For each row in the record set, an HTML link to row.asp (line 12 of dir.asp) is created with the primary key value as the parameter. Row.asp takes a column value as a parameter, assuming it to be a value of a primary key. Using this parameter, row.asp fetches the data from the row specified. This is accomplished by the SQL query "SELECT <Column 1>, <Column 2>, … <Column N> FROM <Table name> WHERE <Primary key column name> = <Parameter>", where column 1 through N are columns marked to be indexed or marked as retrievable (line 2 of row.asp). The content column and the title column are indexed and marked as retrievable by default. Row.asp is used to generate an HTML page dynamically. The content of the title column is used as the HTML title (lines 7 and 12 of row.asp). The content of the content column is used as the HTML body (see lines 8 & 26 of row.asp). All other columns marked to be indexed or marked as retrievable are used in HTML meta tag: for example, "<META NAME="<Column M>" content="<content of Column M>">" (lines 5, 17, 6, and 21 of row.asp). When the catalog builder accesses dir.asp, all the links will be followed. Since each link represents a row in the ODBC data source, the Catalog Builder is able to index the ODBC data source.

Mapping

View.asp and row.asp are closely related. The URL mapping automatically created by the wizard allows row.asp to retrieve data and view.asp to display data. For example, http://<HostName>/SiteServer/Knowledge/Search/database/Index/AdvWorks/row.asp is mapped to http://<HostName>/SiteServer/Knowledge/Search/database/Search/AdvWorks/view.asp. Row.asp and view.asp differ in two significant ways. Row.asp selects only columns that are marked to be indexed or retrieved while view.asp selects all columns. Row.asp puts indexed or retrieved columns into HTML meta tags beside the Content and Title columns. Although view.asp selects all columns, it only displays retrievable columns in a table. With all columns available, view.asp can easily be modified to present data in custom ways.

If a database is really large, storage space can be an issue. By shortening the URL for view.asp, storage space can be saved. Assuming a virtual root named "dbsearch" exists on the machine, the mapping can be changed by following the steps listed below. For information on how to create a virtual root, please refer to the Microsoft Internet Information Server (IIS) documentation.

  1. Copy <Install Directory>\SiteServer\Knowledge\Search\database\Search\view.asp to <virtual root dbsearch>\view.asp.
  2. In the Microsoft Management Console (MMC), open the property page of the AdvWorks catalog under Catalog Build Server.
  3. Select the URLs property tab.
  4. Select the existing mapping.
  5. Click Remove.
  6. Click Yes.
  7. Click Add.
  8. Click Next.
  9. Enter http://<HostName>/SiteServer/Knowledge/Search/database/Index/AdvWorks/row.asp as the Access location.
  10. Enter http://<HostName>/dbsearch/view.asp as the Display location.
  11. Click Finish.

Troubleshooting

A database catalog is actually just a crawl catalog, therefore general Site Server search troubleshooting techniques are just as valuable as the ones mentioned here. The following points are specific to the creation of a database catalog (for general Site Server search troubleshooting, please refer to the Site Server documentation):

  • ODBC data source

    If the wizard does not display a specific data source, make sure the ODBC data source is a system data source. To verify a data source, a tool like MS Query can be used. Use the same user name and password supplied to the wizard when verifying the data source. If user name or password turns out to be a problem, both can be changed in the generated dir.asp, row.asp, and view.asp. User name can be found in line 6 of dir.asp, line 2 of row.asp, and line 4 of view.asp (UID=<User name>). Password can be found in line 6 of dir.asp, line 2 of row.asp, and line 4 of view.asp (PWD=<Password>).

  • Dir.asp and row.asp

    Dir.asp and row.asp are critical to the generation of the database catalog. To verify dir.asp, use your browser to open it. If it is working properly, you should see one or more links to row.asp (line 12 of row.asp). Click on the link to verify row.asp. View the HTML source to verify that the data retrieved from the database is indeed what was expected. If the wizard did not report any error while generating the pages and the pages are not modified after generation, but you cannot verify dir.asp and row.asp using your browser, the problem is most likely caused by ASP script timeout.

  • Timeouts

    There are two classes of timeouts that affect the creation of a database catalog. One is the ASP script timeout and the other is the catalog build server timeout. To set the ASP script timeout:

    1. Start the MMC.
    2. Add the IIS Snap-in.
    3. Right-click Default Web Site, and then click Properties.
    4. Select the Home Directory tab.
    5. Click Configuration. . . .
    6. Select the App Options tab.
    7. Change the ASP Script timeout value to the desired value (240 seconds should be sufficient for small databases).
    8. Click OK on the Application Configuration dialog box.
    9. Click OK on the Default Web Site Properties dialog box.

    To set the Catalog Build Server timeout, use the following steps:

    1. Start the MMC.
    2. Add the Search Snap-in.
    3. Right-click Catalog Build Server, and then click Properties.
    4. Select the Timing tab.
    5. Change the Wait for a connection value to the desired value (270 seconds should be sufficient for small databases).
    6. Change the Wait for request acknowledgement value to the desired value (270 seconds should be sufficient for small databases).
    7. Click OK.
  • SQL Query

    If the SQL queries used in dir.asp and row.asp have been modified, use a tool such as Microsoft Query to verify that the queries do indeed produce the desired results. Once the queries are verified, it is a good idea to use a browser to verify the new scripts.

  • Oracle Databases

    The Create New Catalog Definition wizard has not been tested with Oracle databases. With the information given in this document, one should have little or no trouble creating catalogs for Oracle databases. If you have trouble using the wizard on Oracle databases, the following tips can be used. Create a database catalog for AdvWorks as described previously, but give the catalog a different name. This gives you a set of scripts to work with. Modify the queries used in the scripts to retrieve the desired data. Change the connection parameters to the ones needed to connect to the Oracle database. After all the modifications are finished, verify the scripts.

  • Searching

    To search across all the properties, use the query "@all <text>". The title column is indexed as content as well as doctitle. Title specific searches can done by using the query "@doctitle <text>".

Advanced Database Query

The wizard only supports cases that involve a single table and a simple query. In order to use a more advanced query, you need to create all the pages from scratch or modify the pages generated by the wizard. Creating all the pages from scratch is beyond the scope of this document; therefore we will focus on modifying the pages generated by the wizard.

The query used to generate the row set is in line 6 of dir.asp, and the query used to get data from the row set is in line 2 of row.asp. Any query that is supported by ADO 1.5 can be used in place of the generated query. It is a good idea to verify the new query independently before trying to build the catalog. After the query is verified, the browser can be used to confirm that things are working properly. An example of how to incorporate an advanced query in dir.asp is included in the example below.

Limitations

  • ODBC data sources must be system domain service names (DSNs).
  • There is no incremental crawl support. It can be achieved by using triggers and notifications.
  • There is no security on database rows. Once the data is indexed, anyone who has permission to search the catalog can access the data.
  • The wizard is tested with SQL Server and Microsoft Access only. This can be overcome by using the generated scripts as examples and creating custom ASP scripts for other data sources.

Advanced Database Query Example

This example shows how to modify the query used in dir.asp. As example data, a few rows from the Products table in the AdvWorks database are shown in Table 2.

Table 2. Example data from the Products table of the AdvWorks database

ProductID ProductName ProductDescription ProductType
1 North Face Sunspot windproof, water-resistant, even heat distribution, storage sack included, red (re) SleepingBag
2 Polar Star ultralight goose-down insulated alpinist's bag, contour closure with locking zipper, storage sack included, green (gn) SleepingBag
3 Big Sur generously cut sleeping bag, goose down with polyester taffeta, cotton storage sack included, blue (bl) SleepingBag
4 Cascade backpacking sleeping bag, goose down with nylon shell, locking zipper, storage sack included, choice of zipper location, blue (bl) SleepingBag
5 Everglades medium-weight, waterproof leather boots, good traction, brown (br) Boot
7 Rockies lightweight Nubuck leather/nylon, water-resistant boots, polyurethane soles, green/gray (gg) or green/brown (gbn) Boot
13 Starlight 4-person dome tent, roomy, vaulted pole sleeves, black/gray (bg) Tent

If we only want to index sleeping bags, we can modify the dir.asp as follows:

Modified dir.asp

 1: <HTML><HEAD>
 2: <TITLE></TITLE>
 3: <META NAME="ROBOTS" CONTENT="NOINDEX">
 4: </HEAD><BODY>
 5: <% Set r = CreateObject("ADODB.Recordset.1.5") 
 6: r.open "SELECT ProductID FROM Products where ProductType='SleepingBag'", "UID=;PWD=;DSN=AdvWorks" 
 7: sf=0
 8: sf=Request("sf")
 9: i=0
10: if not sf = 0 then r.Move sf
11: while i < 1000 and not r.Eof %>
12: <a href="row.asp?id=<% =r ("ProductID") %>">Link &nbsp;</a>
13: <% i=i+1
14: r.movenext
15: Wend
16: sf = 1000+sf
17: if not r.eof then
18: Response.Write "<a href=dir.asp?sf=" & sf & "> NextPage </A>"
19: End If %>
20: </BODY></HTML>

Only line 6 of dir.asp needs to be changed. When viewed with the browser, the new dir.asp generates only one link for each sleeping bag in the database, resulting in a total of four links. Any valid query supported by ADO 1.5 can be used to customize the catalog being built. Row.asp and view.asp can be modified in a similar fashion.

Appendix: Handling Large Databases with an Updated Wizard

For large databases (half-a-million records or more), decent performance requires a powerful machine (recommended: 200-MHz Dual Pentium Pro processor with 256 MB of RAM) and modifications to the default configuration of IIS and Site Server. ASP script timeout should be set to at least 900 seconds. Both the "Wait for connection" and "Wait for request acknowledgement" timeouts should be set to at least 1000 seconds. Refer to the troubleshooting section for details on how to set the timeout values.

To handle large databases, changing the timeouts alone might not be sufficient. When there are millions of rows in a database, generating links to all the rows in dir.asp can be prohibitively expensive. In order to overcome this issue, a new wizard file (dbwiz6.asp) is available with this article. The original wizard maps the title column to DocTitle and uses the content column as the document content. The updated wizard will do the same, but it also makes the title column and content column available as HTML meta tags. This way, one can perform specific searches on title and content.

To use the new wizard file, follow the following steps:

  1. Rename dbwiz6.asp in <Install Directory>\SiteServer\Admin\Knowledge\Search\ to dbwiz6.bak.
  2. Copy the new dbwiz6.asp to <Install Directory>\SiteServer\Admin\Knowledge\Search\ as dbwiz6.asp.

Assuming the same catalog definition discussed in this document is created using the new wizard, a different dir.asp will be generated, as follows:

<Install Directory>\SiteServer\Knowledge\Search\database\Index\AdvWorks\dir.asp
 1: <HTML><HEAD>
 2: <TITLE></TITLE>
 3: <META NAME="ROBOTS" CONTENT="NOINDEX">
 4: </HEAD><BODY>
 5: <% Set r = CreateObject("ADODB.Recordset.1.5") 
 6:  r.open "SELECT ProductID FROM Products", "UID=;PWD=;DSN=AdvWorks" 
 7: sf=0
 8: sf=Request("sf")
 9: l2 = 0
10: l2 = Request("l2")
11: i=0
12: if not l2 = 0 then
13:  r.Move sf
14:   while i < 10000 and not r.Eof %>
15: <a href="row.asp?id=<% =r ("ProductID") %>">Link &nbsp;</a>
16: <%
17:     i=i+1
18:     r.movenext
19:   wend
20:   sf = 10000+sf
21:   if not r.eof then
22:     Response.Write "<a href=dir.asp?sf=" & sf & "&l2=1 " & "> NextPage </A>"
23:   end if %>
24: <%
25: else
26:   i=0
27:   while not r.Eof
28:     LinkName = "Page " & i & " "
29:     Response.Write "<a href=dir.asp?sf=" & i & "&l2=1 " & "> " & LinkName & "</A>"
30:     i = i+10000
31:     r.Move 10000
32:   wend
33: end if
34: %>
35: </BODY></HTML>

Instead of linking directly to row.asp for each row in the database, the new dir.asp uses a two-level hierarchy. The new dir.asp takes two parameters. When the second parameter does not exist (or is zero), a link (see line 29) back to dir.asp is created for every 10,000 rows in the database. These links use the first parameter to indicate which 10,000 rows the links need to be generated for (see line 13 and 15). With this new structure, links can be generated and processed simultaneously.

Show: