MSDN Magazine > Issues and Downloads > 2002 > April >  SharePoint Portal Server 2001: Search and Acces...
From the April 2002 issue of MSDN Magazine
MSDN Magazine

SharePoint Portal Server 2001

Search and Access Disparate Data Repositories in Your Enterprise

Kayode Dada
This article assumes you're familiar with XML, ADO, ASP, and SQL
Level of Difficulty     1   2   3 
Download the code for this article: SPS.exe (42KB)
SUMMARY The knowledge worker is greatly empowered if she is able to access information across the enterprise from a central access point. With the SharePoint Portal Server 2001 Search Service you can catalogue information stored in Exchange public folders, on the Web, in the file system, and even in Lotus Notes databases. This article discusses the use of ActiveX Data Objects and the Web-based Distributed Authoring and Versioning protocol for creating search solutions based on SharePoint Portal Server 2001.
Microsoft® SharePoint™ Portal Server 2001 combines document management, advanced search capabilities, and a Digital Dashboard portal in a single collaborative server solution. In this article I will discuss how you can incorporate the search capabilities of SharePoint Portal Server into your existing intranet and Internet solutions. I will also discuss the creation of a solution using two different approaches: one with ActiveX® Data Objects (ADO) combined with the OLE DB Provider for Internet Publishing, and the other using the Web-based Distributed Authoring and Versioning (WebDAV) protocol and the ServerXMLHTTP object. SharePoint Portal Server is compliant with the Microsoft Exchange implementation of WebDAV. Both are available at the link at the top of this article.

Solution Architecture Overview

      Let's dive right in. The architecture of my sample solution is shown in Figure 1. The client ASP application (ASearch_Client.asp or WSearch_Client.asp) instantiates the Search component (ASearch_Server or WSearch_Server), which is implemented as a VBScript Class object in the corresponding ASP pages. The Search classes expose the same properties and methods; the difference is in the technology used for the implementation of each object. Clients execute search queries by calling the Find method, passing the search string, the search scope, and the desired page. The Find method returns an XML string containing the result of the search.

Figure 1 Search Architecture

Figure 1

Search Architecture

      The Status property indicates the status of the Find call and is set to HTTPSTATUS_OK or SEARCHSTATUS_OK if the query executes successfully. The ResultCount property is the total number of matches found for the search string. (This is not supported for the ADO solution.) Using this property and the BeginOfResult and EndOfResult properties, client pages can provide result navigation and data on the number of result pages found.

Creating and Configuring the Search Workspace

      To search for information across the different repositories in your enterprise, you will need to create a workspace that will be dedicated to indexing your content sources. With SharePoint Portal Server you can index various file formats that provide IFilter interfaces, as well as non-flat-file content sources such as Microsoft Exchange public folders, Microsoft Content Management Server (MSCMS), other SharePoint Portal Server workspaces, internal and external Web sites, file systems, and Lotus Notes databases.

Figure 2 Describe Content Source

Figure 2

Describe Content Source

      For each repository you want to query, you must create a content source using the Add Content Source Wizard (see Figure 2). You must be a coordinator in the workspace to create content sources. To index an MSCMS 2001 channel, you must create a Web site content source, using the Web site address format http://servername/channel_name (see Figure 3). All the linked pages on this site will then be crawled and their contents indexed.

Figure 3 Specify Content Source

Figure 3

Specify Content Source

      It may be necessary to keep certain subfolders or virtual directories from being indexed. Maybe they contain restricted information, or are still in beta, or maybe you just have your own reason to leave them out. You can set the site path rule in your workspace to exclude these folders (see Figure 4 and Figure 5).

Figure 4 Selecting a Site

Figure 4

Selecting a Site

Figure 5 Preventing Indexing

Figure 5

Preventing Indexing

      If your content source contains HTML files with META tags and you want your users to be able to search for these documents using these META values, you must configure SharePoint Portal Server to do so, since it does not enable this feature by default. The article "Crawling Custom Metadata Using SharePoint Portal Server 2001" describes the use of the Advanced Search Web Part and presents scripts you can use to search custom content metadata.

SharePoint Portal Server SQL

      SharePoint Portal Server provides an extension of standard database SQL syntax so you can search documents more easily. The general format of the SQL is:
SELECT searchable_properties FROM scope_specifier 
WHERE condition ORDER BY searchable_properties
SharePoint Portal Server includes a number of searchable document properties, which represent the column names in your SELECT statement. The property names all have special characters such as ":" and "#" so you must enclose them in double quotes in your SELECT statements, as shown here:
SELECT "DAV:href" 
Refer to the SharePoint Portal Server SDK to become familiar with this syntax. In addition, you should review the information on searchable document properties. Keep in mind that searching across several content sources can negatively impact the performance of your search. Consider specifying the scope of the search in the FROM clause of the SELECT statement to narrow the search to a particular path in a content source. The following query searches all the content sources in the workspace:
SELECT xxx FROM SCOPE()
This next query example uses the DEEP TRAVERSAL keywords to search for documents in the current path (http://www.microsoft.com/sharepoint/downloads), and also in all subfolders contained in that particular path.
SELECT xxx FROM SCOPE ('DEEP TRAVERSAL OF 
"http://www.microsoft.com/sharepoint/downloads" ')
      Rather than continuously executing a DEEP TRAVERSAL search, you could create a Web Storage System (WSS) search folder. Refer to "Creating Search Folders" in the WSS SDK for more information.
      The SHALLOW TRAVERSAL keywords constrain query results to a single directory level. This query searches only documents in the path specified and does not check folders below this level:
SELECT xxx FROM SCOPE ('SHALLOW TRAVERSAL OF 
"http://www.microsoft.com/sharepoint/downloads" ')
      The WHERE clause lets you further specify the search term you want to look for, as in these two examples:
SELECT xxx FROM yyy WHERE FREETEXT('SharePoint')

SELECT xxx FROM yyy WHERE CONTAINS("urn:schemas-microsoft-
   com:office:office#Title", 'SharePoint' )
The FREETEXT predicate is better suited for finding documents containing combinations of the search terms spread throughout the column, while the CONTAINS predicate is better suited for finding exact matches.
      In your query, you can also specify how the rank values are calculated by the SharePoint Portal Server search service and also modify the rank values returned by using the RANK BY clause of the SELECT statement. This kind of ranking is useful if the search term has a higher relevance in some specific columns than it has in others. Here's an example using the RANK BY clause:
SELECT xxx FROM SCOPE() WHERE CONTAINS
   ( "urn:schemas-microsoft-com:office:office#Title", 
      'SharePoint' )
   RANK BY WEIGHT (0.90)

SELECT xxx FROM SCOPE() WHERE CONTAINS 
   ( "urn:schemas-microsoft-com:office:office#Title", 
     'SharePoint' )
   RANK BY COERCION (ABSOLUTE, 1000)

Developing with ADO

      If you're familiar with ADO, you can use the OLE DB Provider for Internet Publishing, which provides a way to use your favorite data access API to access SharePoint Portal Server. ADO is good for read-only tasks, such as search queries, reading file properties, and enumerating folders. Using ADO also allows you to connect to the SharePoint Portal Server from a remote Web server without passing along the authentication credentials for the connection. A drawback to using ADO is that you cannot set the limit for the size of the resultset it returns (the default is 200 rows). However, this is possible using the WebDAV/ServerXMLHTTP approach.
      To use ADO you will need to create an ADO Connection object, a Command object which will contain the query text, and a RecordSet object to hold the results of the query. Retrieving and using search data requires four steps: connecting to SharePoint Portal Server, executing the query, retrieving the column values, and paging through the resultset.

Connecting to SharePoint Portal Server

      To connect to SharePoint Portal Server, you first need to create an ADO Connection object and set the connection string value to the OLE DB provider for Internet Publishing (msdaipp.dso). Then you must set the data source as the URL of the SharePoint Portal Server workspace. This lets you call the Open method, as you see in this example:
Set m_objCn = Server.CreateObject("ADODB.Connection")

m_objCn.ConnectionString = "provider=msdaipp.dso; data source=" & _
"http://myserver/sps_search"

m_objCn.Open

Executing the Query

      To execute a search query, assign the query to the CommandText property of the Command object and open a RecordSet with the Command object as the source (see Figure 6). In the SQL code, m_strScope is the Uniform Resource Identifier (URI) of the path to search. You can specify URIs in a number of popular formats, such as http://servername/vroot and file://servername/share. The search term to look for is m_strSearch. Column aliases ("DAV:displayname" AS DisplayName) can also be used as a convenient shortcut to access the column values in the resultset.

Retrieving Column Values

      Column values should be retrieved by specifying the column name or alias, as you can see in these examples:
m_objRs.Fields("DAV:displayname").Value
m_objRs.Fields("DisplayName").Value 
While the Fields collection supports index-based retrieval, using the ordinal value of the column may lead to unpredictable results because the resultset contains additional columns not specified in the query.

Paging through the Resultset

      Note that the result from the search query can hold as many as 200 rows. You can segment these results into pages and provide some navigation (next page, previous page, for example) for the user to request additional pages.
      The Knowledge Base article "Page Through a Recordset from ASP" details how to page a standard RecordSet. However, because the RecordSet obtained from this query does not have some of the required properties (PageCount, PageSize, AbsolutePage, and RecordCount), I used a workaround to accomplish paging.
      To compute the starting row of the page to display as shown in the following code snippet, I used SEARCH_RANGESIZE as a constant that defines the number of rows that make up a page, and m_intPage as the page number and then set the RecordSet to point to that row.
m_intAbsPosition = ((m_intPage-1) * SEARCH_RANGESIZE) + 1
m_objRs.Move m_intAbsPosition         
The page number is provided by the client application as the user navigates through the search result. Figure 7 shows how to verify that the new record position is not the EOF (end of file) before retrieving rows equal to the SEARCH_RANGESIZE. Note that the RecordCount property is not supported for this RecordSet. Therefore you will not be able to provide feedback to your users as to the number of pages of results found. You will also have to reexecute your query for each page request.

Developing with WebDAV and ServerXMLHTTP

      As I mentioned earlier, search solutions for SharePoint Portal Server can also be built using ServerXMLHTTP and WebDAV. The advantage of this approach is that you can set a limit for the size of the resultset to be returned and use XSL to transform the result to HTML since the result is returned as an XML stream.
      The drawback to this method is that, depending on how the application is set up, a user name and password may be required to connect to SharePoint Portal Server. With an ASP application running on a different server from SharePoint Portal Server, the authentication information will be required, but this is not the case when the ASP page is run from the same server as SharePoint Portal Server. In the next sections, I'll explain how to implement the search request, connect to the server, set limits on the query, execute the request, view the response to a successful request, transform the response to HTML, and finally, page through the result.

Sending the Search Request

      The request for information is sent as XML data to the server (see Figure 8). Your search request XML should begin with the <?xml> tag, followed by a <DAV:searchrequest> tag. To avoid repeating the DAV: prefix throughout the request, you can assign a namespace for it. This namespace, however, does not apply to any SharePoint SQL statements specified in <D:sql> tags.
      Since SharePoint SQL uses XML, special characters in the SQL statements must be properly encoded to avoid any XML parsing errors. For instance, use
"urn:schemas.microsoft.com:fulltextqueryinfo:rank" &gt; 10 
instead of
"urn:schemas.microsoft.com:fulltextqueryinfo:rank" > 10
      The request is then sent using ServerXMLHTTP. When developing server-side applications using XMLHTTP, the ServerXMLHTTP object is recommended. To create an instance of the object, you must create it on the server:
Set m_objXMLHTTP = Server.CreateObject("MSXML2.serverXMLHTTP")
Refer to the Knowledge Base article, "Frequently Asked Questions about ServerXMLHTTP" for more information. Note that specifying multiple <DAV:sql> tags to send multiple queries will result in an error.

Connecting to the Server

      You connect to SharePoint Portal Server using the Open method of the ServerXMLHTTP object with two required parameters: the WebDAV SEARCH verb and the SharePoint Portal Server workspace URL:
m_objXMLHTTP.Open "SEARCH", "http://myserver/sps_search", FALSE, _
m_strUId, m_strPWD
As you can see, there are three optional parameters to the Open call. The third parameter indicates whether the call is asynchronous. The default is FALSE, implying that the method call should get all the results before going on to the next statement, which is required for SharePoint Portal Server search queries. The last two parameters are user ID and password; these may or may not be required depending on your security settings.

Setting Limits on the Query

      One of the advantages of using the ServerXMLHTTP and WebDAV approach is that you can specify the maximum number of rows returned by the search query. This improves query performance by reducing latency when set to return 200 rows or fewer:
m_objXMLHTTP.SetRequestHeader "MS-SEARCH-MAXROWS", 200
      You can also specify the range of rows to return from the matches found, thereby breaking the search results into pages. The ServerXMLHTTP object supports the HTTP Range request header to specify the number of bytes returned. When used with the rows specifier, the WebDAV engine recognizes it as indicating the set of rows to return. For example, if you specify rows=0-9 in the Range header, the first 10 rows will be returned; specifying rows=10-19 returns the next 10 rows, and so on.
m_objXMLHTTP.SetRequestHeader "Range:", "rows=0-25"
By specifying different row ranges and resending the search request you can add page navigation features to your search app. These limits are specified in the request header information using the SetRequestHeader method of the ServerXMLHTTP object.

Executing the Request

      The steps I've outlined are necessary to set up the request to be processed by the server. The request, however, is executed only when the Send method is called:
m_objXMLHTTP.Send m_strRequest
Errors resulting from the execution of the request can be detected by checking the Status and the StatusText properties of the ServerXMLHTTP object.

Viewing the Response

      The response to a successful request from SharePoint Portal Server consists of two parts: the header information and the actual result. You can access the information contained in the response header using the GetAllResponseHeader and GetResponseHeader methods of the ServerXMLHTTP object.
      From the header you can get the total number of matches found and the range of rows in the response as well as an indicator to tell you if there are more matching rows than were requested. The total number of matches found can be used to provide some feedback to the client regarding the number of pages in the resultset. (See the section "Paging through the Result" later in this article.)
      The matching rows are contained as an XML string in the ResponseText property (or the ResponseXML property) of the ServerXMLHTTP object. This string begins with the multistatus tag. All the namespace aliases are defined in this tag. Note that the namespace aliases are randomly assigned from A-Z and may be different for each request. Figure 9 shows a sample response XML, which is the result of the request in Figure 10.
      The response elements, consisting of the href and propstat elements, provide the column information for each row returned by the query. The href element is always returned even when not specified in the query. The propstat element contains the columns returned for the row. Two propstat elements may be returned in a response element if any column specified in the request generates an HTTP status of 404 (resource not found).
      There are two elements inside the propstat element: status and prop. The status element indicates the HTTP status for the columns of the row contained in the prop element immediately following it. The status value can be either "HTTP/1.1 200 OK," which indicates that the columns have data, or "HTTP/1.1 404 Resource Not Found," which indicates that no values were found for the columns.
      The prop element contains the columns of the request and their returned values. The column element name consists of the column's namespace identifier followed by the column name. An empty column element that returns with an HTTP 200 status is an indication that the property was set but has an empty value.

Transforming the Response to HTML

      The response from SharePoint to a search request can be transformed to HTML. In the XSL you create for this transformation, you should define your own namespace aliases for the columns specified in your query, since SharePoint Portal Server randomly assigns namespace aliases. You cannot rely on the namespace aliases that SharePoint Portal Server provides by default:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
     xmlns:MSNS="urn:schemas-microsoft-com:office:office" 
     xmlns:DAV="DAV:" 
     xmlns:MSNSFT="urn:schemas.microsoft.com:fulltextqueryinfo:">
<xsl:template match="/">

Paging through the Result

      Paging is accomplished by requesting a specific row range which corresponds to a page of result data. This range is computed as follows:
m_intRequestRangeStart = ((m_intPage - 1) * SEARCH_RANGESIZE)
m_intRequestRangeEnd = m_intRequestRangeStart + SEARCH_RANGESIZE
m_strRange = "rows=" & m_intRequestRangeStart & _
"-" & m_intRequestRangeEnd
This range is then set in the request header that's sent to the server:
With m_objXMLHTTP
    .Open "SEARCH", "http://myserver/sps_search", False
    .SetRequestHeader "Range:", m_strRange
    .SetRequestHeader "MS-SEARCH-MAXROWS", trim(cstr(SEARCH_MAXROWS))
    .Send m_strRequest
End With
As you can see in the following, after the request is completed, you can use the MS-Search-MoreRows directive to see whether there are more rows available than are encompassed by the range you've specified in your request:
m_blnMoreRows = m_objXMLHTTP.GetResponseHeader("MS-Search-MoreRows")
You can also get the total number of matches found so that you can provide some feedback to your users about how many pages of results they can expect:
m_arrContentRange = Split(m_objXMLHTTP.GetResponseHeader
("Content-Range"),"total=",-1, vbTextCompare)
m_intResultCount = CInt(trim(m_arrContentRange(1)))
Again, for each page request you will need to resend your query.

Conclusion

      In developing your own custom search solutions, deciding whether to use ADO or WebDAV will largely depend on what you are most comfortable with. Using ADO and the OLE DB Provider for Internet Publishing, you can easily integrate your solution into existing ADO applications and bind to data controls that require ADO. This method works best for read-only tasks against the Web store. With WebDAV and ServerXMLHTTP, you can bind directly to the XML response and control the size of the query resultset. You can also use XSL style sheets to format the XML-based query results. Both access methods bypass the Document Management Object provided by SharePoint Portal Server and save the overhead of instantiating the necessary Document Management Objects that bind to the store.
For related articles see:
SharePoint: SharePoint Portal Server Makes Your Intranet More Manageable and Easier to Navigate
Microsoft SharePoint Portal Server 2001 as a Collaborative Solution Platform

For background information see:
SharePoint Portal Server SDK
Microsoft Full-Text Search Technologies
Exchange Store Search Folders

Kayode Dada works for Microsoft Consulting Services. Previously, he was a Solution Architect specializing in Enterprise Collaboration Solutions at Avanade Inc (http://www.avanade.com). He led the development of Avanade's corporate portal and portal offering. He can be reached at kay_dee98@hotmail.com.

Page view tracker