Managing Search Roots

The Crawl Scope Manager (CSM) enables you to add search roots for your data stores to the Windows Search crawl scope and to remove them from it. A search root is a URL to a particular container, for example file:///C:\MyDocuments\.

This topic includes the following subjects:

  • About Search Roots
  • Before You Begin
  • Adding Roots to the Crawl Scope
  • Removing Roots from the Crawl Scope
  • Enumerating Roots in the Crawl Scope
  • Related Topics

 

About Search Roots

A search root identifies the top-level URL for a container, or content store, that can be indexed by the Indexer. It doesn't specify which parts of this store should or shouldn't be indexed; it merely signals that a content store exists and is associated with a registered protocol handler. The syntax for identifying a search root URL includes the protocol, the store or user's security identifier, the path, and optionally a specific item (like a file):

<protocol>://<store or SID>\<path>\[item]
or
<protocol>://<store or SID>/<path>/[item]

The <protocol> segment must be followed by two forward slashes ('/'), except for the file: protocol, which requires three (file:///). The <store or SID> segment identifies either a content store or, if the search root is specific to one user, that user's security identifier. The <path> segment is a sequence of containers, such as directories or folders, and can include the wildcard character '*'. The [item] segment is optional and can also include the wildcard character '*'. If you do not include an item, end the path segment with a slash; otherwise the Indexer assumes that the last sub-container is an item.
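The syntax rules above can be sketched as a small string helper. This is only an illustration under stated assumptions: BuildSearchRootUrl is a hypothetical name, not part of the Windows Search API, and the sketch assumes that the three-slash file:/// form is simply the general form with an empty store segment.

```cpp
#include <string>

// Hypothetical helper (not a Windows Search API): assembles a search root URL
// of the form <protocol>://<store or SID>/<path>/[item].
std::wstring BuildSearchRootUrl(const std::wstring& protocol,
                                const std::wstring& storeOrSid,
                                std::wstring path,
                                const std::wstring& item = L"")
{
    // Reuse whichever separator the path already uses; default to '/'.
    const wchar_t sep = (path.find(L'\\') != std::wstring::npos) ? L'\\' : L'/';

    // Without a trailing separator the last container would be read as an
    // item, so terminate the path explicitly.
    if (!path.empty() && path.back() != L'\\' && path.back() != L'/')
        path += sep;

    // Assumption: an empty store segment produces the three-slash form,
    // for example file:///C:\MyDocuments\.
    return protocol + L"://" + storeOrSid + L"/" + path + item;
}
```

For instance, BuildSearchRootUrl(L"file", L"", L"C:\\MyDocuments") yields file:///C:\MyDocuments\ with the trailing slash added, so the Indexer does not mistake MyDocuments for an item.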

For example, suppose you have implemented a protocol handler (myPH) to handle files of type *.myext for a custom application, and all of these files are located in the folder WorkteamA\ProjectFiles on a local machine. The search root for that container might look like this: myPH:///C:\WorkteamA\ProjectFiles\. If you plan to include only the .myext files from that container in the index, the search root might instead look like this: myPH:///C:\WorkteamA\ProjectFiles\*.myext.

A URL that contains the wildcard character '*' is, strictly speaking, a pattern rather than a URL, although the two terms are often used interchangeably. For example, the pattern file:///C:\ProjectA\*\data\ matches the following locations:

  • C:\ProjectA\version1\data\
  • C:\ProjectA\version2\data\

It does not match C:\ProjectA\version1\temp\data\, because in this example the wildcard stands for exactly one container level.
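The matching behavior in this example can be modeled with a short function. This is a sketch of the semantics the example implies, not the Indexer's actual algorithm: MatchesPattern is a hypothetical name, the comparison is case-sensitive for simplicity, and the rule that '*' never crosses a path separator is an assumption drawn from the example.

```cpp
#include <cstddef>
#include <string>

// Illustrative matcher (assumption, not the Indexer's implementation):
// '*' matches any run of characters inside one container level, so it
// never consumes a '\' or '/' separator.
bool MatchesPattern(const std::wstring& pattern, const std::wstring& url,
                    std::size_t p = 0, std::size_t u = 0)
{
    if (p == pattern.size())
        return u == url.size();

    if (pattern[p] == L'*') {
        // Try letting '*' absorb zero, one, two, ... non-separator characters.
        for (std::size_t k = u; ; ++k) {
            if (MatchesPattern(pattern, url, p + 1, k))
                return true;
            if (k == url.size() || url[k] == L'\\' || url[k] == L'/')
                return false;  // cannot extend '*' past a separator
        }
    }

    // Literal character: must match exactly, then continue with the rest.
    return u < url.size() && pattern[p] == url[u] &&
           MatchesPattern(pattern, url, p + 1, u + 1);
}
```

With this model, the pattern file:///C:\ProjectA\*\data\ matches the version1 and version2 locations but rejects version1\temp\data\, because the extra temp level cannot be absorbed by a single wildcard.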

You should create new search roots only for containers that are not already in the Indexer's crawl scope. If the path C:\ParentScope is already included in the Indexer's crawl scope, you do not need to add a new search root for C:\ParentScope\ChildScope unless you know that the child scope has previously been excluded.

The Windows Search Options user interface, and any other client user interface, displays search roots to users so they can refine the scope rules for their searches. As part of the installation process for your custom protocol handler, container, and/or application, you might define a default crawl scope, with inclusion and exclusion rules. These rules may be presented to end users as locations and check boxes in the Windows Search Options dialog. Users can navigate within the subdirectories of your pre-defined search root and check the ones they want to include in searches or uncheck the ones they want to exclude.

 

Before You Begin

Before using any of the Crawl Scope Manager interfaces, complete the following prerequisite steps:

  1. Create the CSearchManager object and obtain its ISearchManager interface.
  2. Call ISearchManager::GetCatalog for "SystemIndex" to obtain an instance of an ISearchCatalogManager.
  3. Call ISearchCatalogManager::GetCrawlScopeManager() to obtain an instance of ISearchCrawlScopeManager.

After making any changes to the crawl scope through the Crawl Scope Manager, you must call ISearchCrawlScopeManager::SaveAll() to commit them. This method takes no parameters and returns S_OK on success.
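The three prerequisite steps might be sketched in C++ as follows. This is a minimal sketch rather than a definitive implementation: OpenCrawlScopeManager and kCatalogName are illustrative names, error handling is reduced to returning the first failing HRESULT, and the caller is assumed to have initialized COM already (for example with CoInitializeEx). The _WIN32 guard only keeps the listing compilable on other platforms.

```cpp
// Name of the catalog passed to ISearchManager::GetCatalog.
constexpr wchar_t kCatalogName[] = L"SystemIndex";

#ifdef _WIN32
#include <windows.h>
#include <objbase.h>
#include <searchapi.h>  // ISearchManager, ISearchCatalogManager, ISearchCrawlScopeManager

// Illustrative helper: returns the Crawl Scope Manager in *ppCsm.
// The caller must Release() the returned interface when finished.
HRESULT OpenCrawlScopeManager(ISearchCrawlScopeManager** ppCsm)
{
    if (!ppCsm)
        return E_POINTER;
    *ppCsm = nullptr;

    // Step 1: create CSearchManager and obtain ISearchManager.
    // (__uuidof is a compiler extension supported by MSVC.)
    ISearchManager* pManager = nullptr;
    HRESULT hr = CoCreateInstance(__uuidof(CSearchManager), nullptr,
                                  CLSCTX_SERVER, IID_PPV_ARGS(&pManager));
    if (FAILED(hr))
        return hr;

    // Step 2: get the catalog manager for "SystemIndex".
    ISearchCatalogManager* pCatalog = nullptr;
    hr = pManager->GetCatalog(kCatalogName, &pCatalog);
    pManager->Release();
    if (FAILED(hr))
        return hr;

    // Step 3: get the Crawl Scope Manager from the catalog manager.
    hr = pCatalog->GetCrawlScopeManager(ppCsm);
    pCatalog->Release();
    return hr;
}
#endif
```

After changing roots or rules through the returned interface, a caller would invoke SaveAll() on it and then release it.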

 

Adding Roots to the Crawl Scope

The ISearchCrawlScopeManager tells the search engine about containers to crawl and/or watch, and items under those containers to include or exclude. To add a new search root, you instantiate an ISearchRoot object, set root attributes, and then call ISearchCrawlScopeManager::AddRoot with a pointer to your ISearchRoot object. The search root URL takes the same form as the search root described earlier.

The following table describes the relevant "put" methods for ISearchRoot; the interface's other put methods are currently not used by Windows Search. We recommend including the users' security identifiers (SIDs) in all roots for better security. Per-user roots are more secure as queries are run in a per-user process, ensuring that one user cannot see items indexed from another user's inbox, for example.

Method                       Description
put_ProvidesNotifications    Set to TRUE if a protocol handler or other application will notify the search engine about changes to the URLs under the search root, initiating indexing.
put_RootURL                  Sets the root URL of the current search root. The URL takes the search root form described earlier.

 

Note  By default, the Indexer doesn't crawl a given scope when you add a search root. You must also add an include rule for the root.
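Adding a root and its include rule might look like the following sketch. Several points are assumptions rather than documented requirements: the helper names AddRootWithIncludeRule and EndsWithSeparator are hypothetical, the sketch only accepts container roots (URLs that end with a slash), it uses AddDefaultScopeRule with the FF_INDEXCOMPLEXURLS follow flag as one plausible way to add the include rule, and pCsm is assumed to be an ISearchCrawlScopeManager obtained through the steps in Before You Begin. The _WIN32 guard only keeps the listing compilable on other platforms.

```cpp
#include <string>

// Portable check: a container URL should end with a slash, otherwise the
// Indexer treats the last sub-container as an item.
bool EndsWithSeparator(const std::wstring& url)
{
    return !url.empty() && (url.back() == L'\\' || url.back() == L'/');
}

#ifdef _WIN32
#include <windows.h>
#include <objbase.h>
#include <searchapi.h>  // ISearchRoot, CSearchRoot, ISearchCrawlScopeManager

// Hypothetical helper: adds rootUrl as a search root, adds an include rule
// for it, and saves the change.
HRESULT AddRootWithIncludeRule(ISearchCrawlScopeManager* pCsm,
                               const std::wstring& rootUrl,
                               BOOL providesNotifications)
{
    if (!pCsm || !EndsWithSeparator(rootUrl))
        return E_INVALIDARG;

    // Instantiate an ISearchRoot object and set its attributes.
    ISearchRoot* pRoot = nullptr;
    HRESULT hr = CoCreateInstance(__uuidof(CSearchRoot), nullptr, CLSCTX_ALL,
                                  IID_PPV_ARGS(&pRoot));
    if (FAILED(hr))
        return hr;

    hr = pRoot->put_RootURL(rootUrl.c_str());
    if (SUCCEEDED(hr))
        hr = pRoot->put_ProvidesNotifications(providesNotifications);
    if (SUCCEEDED(hr))
        hr = pCsm->AddRoot(pRoot);
    pRoot->Release();

    // The root alone is not crawled; add an include rule for it as well.
    if (SUCCEEDED(hr))
        hr = pCsm->AddDefaultScopeRule(rootUrl.c_str(), TRUE, FF_INDEXCOMPLEXURLS);

    // Commit the changes.
    if (SUCCEEDED(hr))
        hr = pCsm->SaveAll();
    return hr;
}
#endif
```

Validating the trailing slash before calling AddRoot is a design choice of this sketch; a root that names a specific item or an item pattern such as *.myext would need a different check.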

If you want to add a user-specific root for an application, that application should add the appropriate roots for new users on application startup:

  1. Get the user's SID
  2. Enumerate all roots to check if one exists for this SID
  3. Add a new root, using the SID, if necessary
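The first two steps can be sketched as follows. HasRootForSid and GetCurrentUserSid are hypothetical helper names, and the SID values shown in any usage are placeholders. The sketch assumes the root URLs have already been collected into a vector (for example by enumerating roots), and that a root belongs to a user when the user's SID string appears in its URL as a complete token. The _WIN32 guard only keeps the listing compilable on other platforms.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Step 2 (portable part): does any root URL contain this SID?
bool HasRootForSid(const std::vector<std::wstring>& rootUrls,
                   const std::wstring& sid)
{
    if (sid.empty())
        return false;
    for (const std::wstring& url : rootUrls) {
        for (std::size_t pos = url.find(sid); pos != std::wstring::npos;
             pos = url.find(sid, pos + 1)) {
            const std::size_t end = pos + sid.size();
            // Reject partial hits such as S-1-5-21-1 inside S-1-5-21-1000.
            const bool partial = end < url.size() &&
                ((url[end] >= L'0' && url[end] <= L'9') || url[end] == L'-');
            if (!partial)
                return true;
        }
    }
    return false;
}

#ifdef _WIN32
#include <windows.h>
#include <sddl.h>  // ConvertSidToStringSidW

// Step 1: string form of the current user's SID, or empty on failure.
std::wstring GetCurrentUserSid()
{
    std::wstring result;
    HANDLE hToken = nullptr;
    if (!OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &hToken))
        return result;

    DWORD cb = 0;
    GetTokenInformation(hToken, TokenUser, nullptr, 0, &cb);  // query size
    std::vector<BYTE> buffer(cb);
    if (cb != 0 && GetTokenInformation(hToken, TokenUser, buffer.data(), cb, &cb)) {
        PSID sid = reinterpret_cast<TOKEN_USER*>(buffer.data())->User.Sid;
        LPWSTR sidString = nullptr;
        if (ConvertSidToStringSidW(sid, &sidString)) {
            result = sidString;
            LocalFree(sidString);  // buffer is allocated by the system
        }
    }
    CloseHandle(hToken);
    return result;
}
#endif
```

If HasRootForSid returns false for the current user's SID, the application would proceed to step 3 and add a new root whose URL includes that SID.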

 

Removing Roots from the Crawl Scope

You can use ISearchCrawlScopeManager to remove a root from the crawl scope when you no longer want that URL indexed. Removing a root also deletes all scope rules for that URL. For example, suppose you want to uninstall a content store and/or its protocol handler, like Outlook Express, from a system. You can uninstall the application, remove all data, and then remove the search root from the crawl scope; the Crawl Scope Manager then removes the root and all scope rules associated with it.

To remove an existing search root, you call ISearchCrawlScopeManager::RemoveRoot with the search root you want to remove. The search root URL takes the same form as the search root described earlier. This method returns S_FALSE if the root isn't found and an error code if there was an error removing a root that is found.
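Because S_FALSE is a success code, a caller that only tests SUCCEEDED cannot tell a removed root from one that was never found. The following sketch separates the three outcomes. RemoveOutcome, ClassifyRemoveResult, and RemoveRootAndSave are hypothetical names, the numeric HRESULT values are spelled out so the classification compiles anywhere, and pCsm is assumed to come from the steps in Before You Begin. The _WIN32 guard only keeps the listing compilable on other platforms.

```cpp
#include <cstdint>

enum class RemoveOutcome { Removed, NotFound, Failed };

// Classifies the HRESULT returned by RemoveRoot. HRESULT is a signed 32-bit
// value: failures are negative, S_OK is 0, and S_FALSE is 1.
RemoveOutcome ClassifyRemoveResult(std::int32_t hr)
{
    if (hr < 0)
        return RemoveOutcome::Failed;    // error removing a root that exists
    if (hr == 1)
        return RemoveOutcome::NotFound;  // S_FALSE: root was not found
    return RemoveOutcome::Removed;       // S_OK
}

#ifdef _WIN32
#include <windows.h>
#include <searchapi.h>  // ISearchCrawlScopeManager

// Hypothetical helper: removes rootUrl and saves the change if a root was
// actually removed.
RemoveOutcome RemoveRootAndSave(ISearchCrawlScopeManager* pCsm, LPCWSTR rootUrl)
{
    if (!pCsm || !rootUrl)
        return RemoveOutcome::Failed;

    const RemoveOutcome outcome = ClassifyRemoveResult(pCsm->RemoveRoot(rootUrl));
    if (outcome == RemoveOutcome::Removed && FAILED(pCsm->SaveAll()))
        return RemoveOutcome::Failed;
    return outcome;
}
#endif
```

An uninstaller might treat NotFound as harmless, since the end state (no root for that URL) is the one it wanted.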

Removing a search root also removes the URL from the Windows Search Options dialog, so users may not be able to re-add that container or location using the dialog.

It is also possible to remove all user-set overrides of a search root and revert to the original search root and scope rules. Refer to Managing Scope Rules for more information.

Note  On Microsoft Windows Vista, if users are removed through User Profiles in the Control Panel, the CSM removes all rules and roots with their SID and removes their indexed items from the catalog. On Windows XP, you need to remove the users' roots and rules manually.

 

Enumerating Roots in the Crawl Scope

The CSM enumerates search roots using a standard COM-style enumerator interface, IEnumSearchRoots. You can use this interface to enumerate search roots for a number of purposes. For example, you might want to display the entire crawl scope in a user interface, or discover whether a given root or the child of a root is already in the crawl scope.
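The second use case, discovering whether a URL is already covered by an existing root, might be sketched as follows. IsRootOrChild and IsCoveredByExistingRoot are hypothetical names. The sketch assumes that roots are container URLs ending with a slash, compares URLs as case-sensitive prefixes for simplicity, ignores wildcard roots, and expects pCsm to come from the steps in Before You Begin. The _WIN32 guard only keeps the listing compilable on other platforms.

```cpp
#include <string>

// Portable part: true if candidate equals root or lies beneath it.
// Assumes root ends with a separator, so C:\Foo\ does not match C:\FooBar\.
bool IsRootOrChild(const std::wstring& root, const std::wstring& candidate)
{
    return !root.empty() && candidate.compare(0, root.size(), root) == 0;
}

#ifdef _WIN32
#include <windows.h>
#include <objbase.h>
#include <searchapi.h>  // IEnumSearchRoots, ISearchRoot, ISearchCrawlScopeManager

// Hypothetical helper: sets *pCovered to true if url equals, or is a child
// of, a root that is already in the crawl scope.
HRESULT IsCoveredByExistingRoot(ISearchCrawlScopeManager* pCsm,
                                const std::wstring& url, bool* pCovered)
{
    if (!pCsm || !pCovered)
        return E_POINTER;
    *pCovered = false;

    IEnumSearchRoots* pEnum = nullptr;
    HRESULT hr = pCsm->EnumerateRoots(&pEnum);
    if (FAILED(hr))
        return hr;

    // Fetch one root at a time; Next returns S_FALSE when none remain.
    ISearchRoot* pRoot = nullptr;
    ULONG fetched = 0;
    while (!*pCovered && (hr = pEnum->Next(1, &pRoot, &fetched)) == S_OK) {
        LPWSTR rootUrl = nullptr;
        if (SUCCEEDED(pRoot->get_RootURL(&rootUrl)) && rootUrl) {
            *pCovered = IsRootOrChild(rootUrl, url);
            CoTaskMemFree(rootUrl);  // the string is allocated by the callee
        }
        pRoot->Release();
        pRoot = nullptr;
    }
    pEnum->Release();
    return FAILED(hr) ? hr : S_OK;
}
#endif
```

A caller could run this check before adding a root, skipping the add when the container is already covered by a parent root that has not been excluded.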