Notifying the Index of Changes

With Windows Search, components can notify the Windows Search Indexer that data in their store has changed. Typically, you can rely on the Indexer's incremental crawl. However, providing notifications to the Indexer can improve performance by ensuring that the Indexer doesn't crawl the entire store during incremental indexing. For example, this might be recommended if you expect your data store to be exceptionally large or exceptionally busy, like an email data store. Using notification APIs, components can notify the Indexer that an item has been changed, moved, or deleted, and they can add search scopes to the Windows Search Indexer's queue of URLs requiring indexing.

  • Overview of Notifications Process
  • Implementing Indexer-managed Notifications
  • Implementing Provider-managed Notifications
  • Related Topics

Overview of Notifications Process

There are three ways that data from your data store gets indexed: incremental crawls, Indexer-managed notifications, and provider-managed notifications. With the first approach, the Indexer includes your store in its normal incremental crawls, and you do not need to implement any form of notifications. As a background process, the Indexer crawls through its crawl scope, looking for changes and updating the catalog. This approach is recommended for almost all situations.

With Indexer-managed notifications, you implement a notification strategy that sends notifications to the Indexer when data in the data store has changed, and the Indexer tracks the notifications and indexes the data. In this situation, your component (which we'll call a notifications provider) monitors the data store, collects information about changes to the store, and then periodically notifies the Indexer with a list of items that need indexing. The Indexer is responsible for recovering and resolving notifications in case of failure. This approach, which you can think of as the "send it and forget" strategy, reduces the frequency of Indexer crawls. It is recommended only if you expect incremental crawls of your data store to hinder performance significantly.

With provider-managed notifications, you implement a notification strategy that is similar to the second approach, except that your notifications provider must track notifications and is responsible for recovering and resolving notifications in case of failure. In this situation, your notifications provider monitors the data store, collects and maintains information about changes to the store, periodically notifies the Indexer with a list of items that need indexing, receives status updates from the Indexer, and re-sends notifications in case of failure. This option is non-trivial and is not recommended unless you expect incremental crawls of your data store to hinder performance significantly and you require granular control over, or insight into, the indexing status.

Implementing Indexer-managed Notifications

Indexer-managed notifications enable you to control access to your data store while freeing you from maintaining the notifications queue throughout the entire indexing process. Your notifications provider must monitor changes to the data store and create a notifications queue. Periodically, your provider sends a batch of change notifications to the Indexer. When the Indexer receives your notifications, it returns an acknowledgement and you can remove the acknowledged items from your queue. If you do not receive an acknowledgement after a period of time, you can re-send the notifications. In case of failure, the Indexer rebuilds its internal queue of items to crawl or performs an incremental crawl of the store.

To implement Indexer-managed notifications, you need to implement the following:

  • A mechanism for monitoring changes in your data store.
  • A data structure to queue up information (multiple SEARCH_ITEM_PERSISTENT_CHANGE structures) about those changes.
  • Code that uses the ISearchPersistentItemsChangedSink interface to send your notifications to the Indexer and to get notification acknowledgements from the Indexer.

Notifications Queue

You need to monitor and queue up every change in your data store to send to the Indexer as a notification. How many notifications you queue up and how frequently you send them to the Indexer depend on your circumstances. For example, you might send a batch of notifications after every n changes, after some time interval t, or after whichever of the two comes first.

The Indexer expects the notifications to come in an array of SEARCH_ITEM_PERSISTENT_CHANGE structures, so you may choose to implement your queue similarly.
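
The batching policy described above (flush after n changes or after a time interval t) can be sketched as follows. This is a minimal, portable illustration and not part of the Windows Search API: ChangeKind, PendingChange, and ChangeBatcher are hypothetical names, and PendingChange is only a loose stand-in for SEARCH_ITEM_PERSISTENT_CHANGE. The URLs in the usage note are made up.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Windows Search types.
enum class ChangeKind { Add, Modify, Delete, MoveRename };

struct PendingChange {
    ChangeKind kind;
    std::wstring url;      // URL of the changed item
    std::wstring old_url;  // previous URL; only meaningful for MoveRename
};

// Collects changes and reports when a batch should be sent:
// after max_changes items, or once the oldest queued item is max_age old.
class ChangeBatcher {
public:
    using Clock = std::chrono::steady_clock;

    ChangeBatcher(std::size_t max_changes, std::chrono::milliseconds max_age)
        : max_changes_(max_changes), max_age_(max_age) {
        assert(max_changes_ > 0);
    }

    void Enqueue(PendingChange change, Clock::time_point now) {
        if (queue_.empty()) oldest_ = now;  // remember when this batch started
        queue_.push_back(std::move(change));
    }

    bool ShouldFlush(Clock::time_point now) const {
        if (queue_.empty()) return false;
        return queue_.size() >= max_changes_ || (now - oldest_) >= max_age_;
    }

    // Hands the queued changes to the caller and leaves the queue empty.
    std::vector<PendingChange> TakeBatch() {
        std::vector<PendingChange> batch;
        batch.swap(queue_);
        return batch;
    }

    std::size_t size() const { return queue_.size(); }

private:
    std::size_t max_changes_;
    std::chrono::milliseconds max_age_;
    std::vector<PendingChange> queue_;
    Clock::time_point oldest_{};
};
```

A provider might call Enqueue from its change monitor, for example with { ChangeKind::Modify, L"myprot://store/42", L"" }, and poll ShouldFlush on a timer. The caller should keep the batch returned by TakeBatch until the Indexer acknowledges it, so that it can be re-sent if no acknowledgement arrives.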

ISearchPersistentItemsChangedSink

To access this interface, you first create an ISearchManager object and use it to obtain an ISearchCatalogManager object. From that ISearchCatalogManager object, you obtain an ISearchPersistentItemsChangedSink object and notify the Indexer of the data changes with a call to the OnItemsChanged method.

In the call to this method, you include the number of changes being reported and an array of SEARCH_ITEM_PERSISTENT_CHANGE structures. You get back an array of HRESULT completion codes, one per change, indicating whether each URL was accepted for indexing. This is your acknowledgement from the Indexer.
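
The acknowledgement step can be sketched like this: given the batch that was sent and the per-item completion codes that came back, keep only the items that were not accepted. The sketch is portable and uses hypothetical stand-ins: ResultCode models an HRESULT as a signed 32-bit value, Succeeded mirrors the convention that non-negative codes mean success, and SentChange and KeepUnacknowledged are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for HRESULT: negative values indicate failure.
using ResultCode = std::int32_t;

inline bool Succeeded(ResultCode hr) { return hr >= 0; }

// Hypothetical record of one change that was sent to the Indexer.
struct SentChange {
    std::wstring url;
};

// Returns the changes whose completion code reports failure. Accepted
// changes are dropped, which corresponds to removing them from the queue.
std::vector<SentChange> KeepUnacknowledged(const std::vector<SentChange>& batch,
                                           const std::vector<ResultCode>& codes) {
    assert(batch.size() == codes.size());  // one completion code per change
    std::vector<SentChange> retry;
    for (std::size_t i = 0; i < batch.size(); ++i) {
        if (!Succeeded(codes[i])) {
            retry.push_back(batch[i]);
        }
    }
    return retry;
}
```

Whether a rejected item should be retried immediately, retried later, or logged and dropped depends on the failure code and on your store; the sketch only separates accepted items from rejected ones.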

Implementing Provider-managed Notifications

Provider-managed notifications enable you to control access to your data store and to monitor the progress of the Indexer as it updates the Windows Search catalog. Your provider must monitor changes to the data store and create a notifications queue. Periodically, your provider sends a batch of change notifications to the Indexer. When the Indexer receives your notifications, it returns an acknowledgement. If after a period of time you do not receive an acknowledgement, you can re-send the notifications. As the Indexer crawls the data store and updates the Windows Search catalog, it notifies your provider of each catalog update and you can remove the item(s) from your queue. Your provider maintains its notification queue throughout this process so that in case of failure, you can resend notifications to the Indexer.

To implement provider-managed notifications, you need to implement the following:

  • A mechanism for monitoring changes in your data store.
  • A data structure to queue up information (multiple SEARCH_ITEM_CHANGE structures) about those changes.
  • Code that uses the ISearchItemsChangedSink interface to send your notifications to the Indexer and to get notification acknowledgements from the Indexer.
  • ISearchNotifyInlineSite interface to receive updates about the status of indexing.

Notifications Queue

You need to monitor and queue up every change in your data store to send to the Indexer as a notification. How many notifications you queue up and how frequently you send them to the Indexer depend on your circumstances. For example, you might send a batch of notifications after every n changes, after some time interval t, or after whichever of the two comes first.

The Indexer expects the notifications to come in an array of SEARCH_ITEM_CHANGE structures, so you may choose to store change information similarly. However, you also need to be able to match the notifications you send with the acknowledgements and updates returned by the Indexer. You may also want to record how long it takes to get acknowledgements, so you can decide whether and when to resend notifications.
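
One way to keep enough bookkeeping to decide when to resend is to record, per URL, when a notification was sent and whether it has been acknowledged. The following is a minimal portable sketch; SentNotificationLog and its methods are hypothetical names, and keying the log by URL is an assumption made for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical bookkeeping for provider-managed notifications.
class SentNotificationLog {
public:
    using Clock = std::chrono::steady_clock;

    // Call when a notification for this URL is sent (or re-sent).
    void RecordSent(const std::wstring& url, Clock::time_point now) {
        entries_[url] = Entry{now, false};
    }

    // Call when the Indexer's completion code for this URL reports success.
    void RecordAcknowledged(const std::wstring& url) {
        auto it = entries_.find(url);
        if (it != entries_.end()) it->second.acknowledged = true;
    }

    // URLs that were sent at least `timeout` ago and are still unacknowledged.
    std::vector<std::wstring> DueForResend(Clock::time_point now,
                                           std::chrono::milliseconds timeout) const {
        std::vector<std::wstring> due;
        for (const auto& entry : entries_) {
            if (!entry.second.acknowledged && (now - entry.second.sent_at) >= timeout) {
                due.push_back(entry.first);
            }
        }
        return due;
    }

    // Acknowledged entries stay in the log until indexing status says otherwise.
    std::size_t size() const { return entries_.size(); }

private:
    struct Entry {
        Clock::time_point sent_at;
        bool acknowledged;
    };
    std::map<std::wstring, Entry> entries_;  // ordered, so results are deterministic
};
```

Acknowledged entries are deliberately kept in the log: with provider-managed notifications, an item leaves the queue only after the Indexer reports that the catalog has been updated.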

ISearchItemsChangedSink

To access this interface, you first create an ISearchManager object and use it to obtain an ISearchCatalogManager object. From that ISearchCatalogManager object, you obtain an ISearchItemsChangedSink object and notify the Indexer of the data changes with a call to the OnItemsChanged method.

In the call to this method, you include the number of changes being reported and an array of SEARCH_ITEM_CHANGE structures. You get back an array of Indexer-assigned DocIds, one for each change, as well as an array of HRESULT completion codes indicating whether each URL was accepted for indexing. This is your acknowledgement from the Indexer that it has received your notifications and is preparing to index the items.

From then on, the Indexer sends updates using the ISearchNotifyInlineSite interface, discussed next.

ISearchNotifyInlineSite

To get updates about the status of both your items and the catalog, you must register your ISearchNotifyInlineSite interface with the Indexer so it can send you callbacks. Each update sent using ISearchNotifyInlineSite::OnItemIndexedStatusChange identifies the items by DocId and reports the status of each item (SEARCH_ITEM_INDEXING_STATUS) and the indexing phase (SEARCH_INDEXING_PHASE) the items are in.
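
Because status updates identify items by DocId, the provider needs a mapping from DocId back to its own queue entries. The sketch below is portable and hypothetical throughout: IndexingPhase, ResultCode, and DocIdTracker are illustrative stand-ins rather than the Windows Search types, and the policy of retiring an item only when it is reported as successfully persisted is an assumption, not something the Indexer requires.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins; NOT the real SEARCH_INDEXING_PHASE or HRESULT types.
enum class IndexingPhase { Gatherer, Queryable, Persisted };
using ResultCode = std::int32_t;  // negative values indicate failure

class DocIdTracker {
public:
    // Call for each change the Indexer accepted, using the DocId it returned.
    void RecordAccepted(std::uint32_t doc_id, std::wstring url) {
        pending_[doc_id] = std::move(url);
    }

    // Call for each item reported in a status callback.
    // Returns true if the item was retired from the pending set.
    bool OnItemStatus(std::uint32_t doc_id, ResultCode status, IndexingPhase phase) {
        auto it = pending_.find(doc_id);
        if (it == pending_.end()) return false;  // not one of our tracked items
        // Assumed policy: retire only on success in the final phase.
        if (status >= 0 && phase == IndexingPhase::Persisted) {
            pending_.erase(it);
            return true;
        }
        return false;  // still in progress, or failed and left for resending
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::map<std::uint32_t, std::wstring> pending_;  // DocId -> item URL
};
```

Items that remain in the pending set after a failed status are candidates for resending; how aggressively to retry them is a design choice for your provider.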

In addition to the per-item status updates described earlier, you get important information about the status of the catalog when the Indexer calls ISearchNotifyInlineSite::OnCatalogStatusChange. This method includes the following parameters.

Parameter                    Description
guidCatalogResetSignature    A GUID representing the catalog reset. If this GUID changes, all notifications must be resent.
guidCheckPointSignature      A GUID representing a checkpoint.
dwLastCheckPointNumber       A number indicating the last checkpoint saved.

Occasionally, the catalog is reset and everything in the catalog is deleted. Your notifications provider must track the guidCatalogResetSignature in order to detect when the catalog has been reset. When this GUID changes, you can discard your notifications queue, and you must resend notifications for your entire data store.

Periodically, the catalog undergoes a checkpoint, and all notifications sent prior to that checkpoint are safe and recoverable in the event of a service failure. The catalog can be restored with all the relevant data sent prior to that checkpoint. Therefore, your provider needs to track only those notifications sent since the most recent checkpoint.

A catalog checkpoint is represented by both a GUID (guidCheckPointSignature) and a DWORD (dwLastCheckPointNumber). The DWORD represents the most recent checkpoint saved. When this number changes, your provider no longer needs to track the notifications sent prior to that checkpoint. The GUID changes after a catalog restore, when the catalog has been rolled back to a known, saved checkpoint. When this GUID changes, your provider needs to resend all notifications accumulated since the most recent checkpoint.
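
The rules above can be summarized as a small decision function that compares each catalog status update with the last values seen. This is a portable sketch under stated assumptions: Guid, CatalogAction, and CatalogStatusTracker are hypothetical names, a 16-byte array stands in for a GUID, and the precedence (reset first, then restore, then a new checkpoint) is an assumed ordering for the case where several values change at once.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a GUID: 16 raw bytes, comparable with == and !=.
using Guid = std::array<std::uint8_t, 16>;

enum class CatalogAction {
    None,                   // nothing changed
    DropBeforeCheckpoint,   // new checkpoint saved: stop tracking older notifications
    ResendSinceCheckpoint,  // catalog restored: resend what was sent since the checkpoint
    ResendEntireStore       // catalog reset: discard the queue and resend everything
};

class CatalogStatusTracker {
public:
    // Seed with the values known when the provider registered with the Indexer.
    CatalogStatusTracker(const Guid& reset, const Guid& checkpoint, std::uint32_t number)
        : reset_(reset), checkpoint_(checkpoint), number_(number) {}

    // Call with the three values from each catalog status update.
    CatalogAction OnStatus(const Guid& reset, const Guid& checkpoint, std::uint32_t number) {
        CatalogAction action = CatalogAction::None;
        if (reset != reset_) {
            action = CatalogAction::ResendEntireStore;
        } else if (checkpoint != checkpoint_) {
            action = CatalogAction::ResendSinceCheckpoint;
        } else if (number != number_) {
            action = CatalogAction::DropBeforeCheckpoint;
        }
        // Remember the latest values for the next comparison.
        reset_ = reset;
        checkpoint_ = checkpoint;
        number_ = number;
        return action;
    }

private:
    Guid reset_;
    Guid checkpoint_;
    std::uint32_t number_;
};
```

A provider would typically persist the three values alongside its notifications queue, so that it can make the same comparison after its own restart.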