3.1.4 Message Processing Events and Sequencing Rules

The following table summarizes the list of operations as defined by this specification.

| Operation | Description |
|---|---|
| EnumerateFolder | Returns information about the immediate child folders, documents, and pages of the specified folder. |
| GetAttachments | Returns information about all the attachments of the specified list item. |
| GetChanges | Returns the changes that occurred on site collection content from one moment in time to another. |
| GetChangesEx | Returns the changes that occurred on site collection content from one moment in time to another, as well as additional metadata related to the site collection. |
| GetContent | Returns change information (ChangeId) about the content database and site collection, as well as information about the Web application, site collection, site, list, list item, folder, and attachment. |
| GetContentEx | Returns change information (ChangeId) about the content database and site collection, as well as information about the Web application, site collection, site, list, list item, folder, and attachment, and enables better control of the protocol with parameters. |
| GetList | Returns general information, the schema of the fields, and access right permissions for a given list. |
| GetListCollection | Returns general information about all lists of the context site. |
| GetListItems | Returns all or desired fields of all list items satisfying certain criteria in a given list. |
| GetSite | Returns information about the site collection of the context site. |
| GetSiteAndWeb | Returns the URL of the site collection and the site to which the specified URL belongs. |
| GetSiteUrl | Returns the GUID and URL of the site collection to which the specified URL belongs. |
| GetURLSegments | Returns the identifiers of the object pointed to by the specified URL. |
| GetWeb | Returns information about the context site. |

The operations GetSite, GetWeb, EnumerateFolder, GetListCollection, GetList, GetListItems, and GetAttachments are used for full content traversal and indexing. They are referred to as specific traversal operations.

The operations GetSiteAndWeb, GetSiteUrl, and GetURLSegments are auxiliary; they are used to parse URLs.

The operations GetContent, GetContentEx, GetChanges, and GetChangesEx are used for full indexing followed by incremental content indexing.

Note that the GetContent operation is polymorphic: when called with the proper parameters, it can be used instead of the specific traversal operations GetSite, GetWeb, EnumerateFolder, GetListCollection, GetList, GetListItems, and GetAttachments. GetContent fully covers the functionality of the specific traversal operations, although the input parameters and the syntax of the response might differ.

The recommended sequencing of the operations, briefly outlined in section 1.3, is explained here in detail.

Sequence for Establishing Indexing Context

Indexing starts with the indexing service establishing context. All the indexing service needs to know is the URL of a page from the site collection. For example:

 http://www.Fabrikam.com:8080/Subsite/Shared%20Documents/Forms/AllItems.aspx 

The indexing service identifies the site collection of interest and obtains details about its content database and current change timestamp, using the steps shown in the following diagram.


Figure 4: Establishment of indexing context

  1. The indexing service extracts the host name with port number from the URL (for example, www.Fabrikam.com:8080) and sends a GetContent request to that host, passing "VirtualServer" as the value of the objectType parameter.

  2. The protocol server responds with a message that contains GUIDs of all the content databases servicing that Web application.

  3. The indexing service sends a GetContent request, passing "ContentDatabase" as the value of the objectType parameter, and passing one of the identifiers obtained in step 2 as the value of the objectId parameter.

  4. The protocol server sends a response message enumerating all site collections serviced by that content database, with their URLs and identifiers (GUIDs), as well as the current ChangeId in the context of the whole content database. Assume that the response contains an entry such as the following:

     <Site URL="http://www.Fabrikam.com:8080" 
     ID="f945dc15-825e-4c16-8d12-bed6413dc61a" />
    

    Also, assume that this site collection is exactly the one that the indexing service wants to index; in that case, proceed to step 5. Otherwise, if the site collection of interest is not serviced by this content database, repeat from step 3 with another content database identifier until the URL of interest is found.

  5. The indexing service sends a GetChanges request to the site collection with URL http://www.Fabrikam.com:8080, passing the following values: "Site" as the objectType parameter; optionally, the identifier of the content database obtained in step 2 as the contentDatabaseId parameter (it can be left blank); and the current ChangeId (in the context of the content database) obtained in step 4 as the LastChangeId parameter. The CurrentChangeId parameter is left blank.

  6. The protocol server sends a response message containing the LastChangeId (meaning the last known change) in the context of the site collection, namely http://www.Fabrikam.com:8080. This ChangeId (hereafter referred to as the StartChangeId) is required to perform incremental change processing in the future.
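The six steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `client` object, its `get_content` and `get_changes` methods, and the dictionary shapes they return are hypothetical stand-ins for the SOAP exchange and response parsing, which this section does not define. Only the parameter names (objectType, objectId, contentDatabaseId, LastChangeId, CurrentChangeId) come from the text.

```python
from urllib.parse import urlsplit


def establish_context(client, page_url, site_collection_url):
    """Return (content_database_id, start_change_id) for a site collection.

    `client` is hypothetical: its methods and return shapes are assumptions
    made for this sketch, not part of the protocol definition.
    """
    # Step 1: host name with port number, for example "www.Fabrikam.com:8080".
    host = urlsplit(page_url).netloc

    # Steps 1-2: obtain the GUIDs of the content databases of the Web application.
    database_ids = client.get_content(host, objectType="VirtualServer")

    wanted = site_collection_url.lower()
    for database_id in database_ids:
        # Steps 3-4: enumerate the site collections serviced by this database.
        database = client.get_content(
            host, objectType="ContentDatabase", objectId=database_id
        )
        urls = [site["URL"].strip().lower() for site in database["sites"]]
        if wanted not in urls:
            continue  # not serviced here: repeat step 3 with the next GUID

        # Steps 5-6: exchange the database-level ChangeId for the
        # site-collection-level ChangeId (the StartChangeId).
        changes = client.get_changes(
            site_collection_url,
            objectType="Site",
            contentDatabaseId=database_id,
            LastChangeId=database["change_id"],
            CurrentChangeId="",
        )
        return database_id, changes["LastChangeId"]

    raise LookupError("no content database services " + site_collection_url)
```

The returned pair is what a later incremental indexing session needs: the content database identifier and the StartChangeId.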

Now that the context is established, indexing starts with a full traversal, in which the indexing service attempts to capture a snapshot of the whole structure of the site collection and its current content.

Sequence of full traversal

The following diagram depicts the sequence of this protocol’s operations required for full traversal. A brief explanation of each message follows. Note that the diagram shows only one branch of a depth-first [KNUTH] full traversal. For example, having found several lists in the GetListCollection response, the protocol client first explores the list items of the first list. A complete depth-first traversal then explores all the remaining lists, which is not shown.

Note that this protocol does not prescribe depth-first traversal to build a snapshot of the site content. Any systematic traversal sequence, for example, breadth-first [KNUTH], will do; depth-first traversal is simply easier to explain.


Figure 5: Outline of full site traversal scenario

  1. The protocol client sends a GetSiteAndWeb input message, passing as a parameter any URL that refers to the site, to get the URLs of the site collection and of the site to which that URL refers.

  2. In response, the protocol server sends a message containing those URLs.

  3. The protocol client sends a GetWeb request message, targeting the site collection of interest, for example, http://www.fabrikam.com:8080.

  4. The protocol server sends a response message, which among other things contains the list of all subsites (direct child objects) of the site collection targeted by the request. For example, one of them can be http://www.fabrikam.com:8080/subsite.

  5. To perform a depth-first exploration of that subsite, the protocol client sends a GetListCollection request message targeting the first subsite.

  6. The protocol server sends the GetListCollection response message, containing information about all lists, and in particular, their identifiers.

  7. To perform a depth-first exploration of the first list, the protocol client sends a GetList message passing this subsite URL and list identifier as parameters.

  8. The protocol server sends the GetList response message, containing information about the list, including when created, when last modified, permission settings, and list schema.

  9. To obtain information about all the list items, the protocol client sends a GetListItems input message with an empty sQuery parameter (meaning all items).

  10. The protocol server sends a GetListItems response message, enumerating all list items.

The protocol client can now inspect all fields of those list items to build the index. The details of building the index are outside the scope of this protocol.
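A complete depth-first traversal, of which the steps above show one branch, might look like the following Python sketch. The `client` object and the shapes of its return values (a dictionary with a "subsites" entry, a list of list identifiers, a list of items) are assumptions made for illustration. The sketch explores the lists of a site before descending into its subsites, which is one of several valid orders.

```python
def traverse_site(client, web_url, visit):
    """Depth-first walk of a site: its lists and list items, then its subsites.

    `client` and the shapes of its return values are hypothetical.
    `visit` is a callback invoked once per list item.
    """
    web = client.get_web(web_url)  # GetWeb: returns the direct subsites

    # GetListCollection: identifiers of all lists of this site.
    for list_id in client.get_list_collection(web_url):
        # GetList: creation and modification times, permissions, schema.
        list_info = client.get_list(web_url, list_id)
        # GetListItems with an empty sQuery parameter, meaning all items.
        for item in client.get_list_items(web_url, list_id, sQuery=""):
            visit(web_url, list_info, item)

    # Recurse into each direct child site.
    for subsite_url in web["subsites"]:
        traverse_site(client, subsite_url, visit)
```

Replacing the recursion with an explicit queue of site URLs would give a breadth-first traversal without changing the operations used.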

Recommended sequence of incremental indexing

This section specifies the recommended sequence of steps to keep the index up-to-date.

The GetChanges and the GetContent operations of this protocol provide the ability to obtain incremental change reports, rather than repeating full traversal of the site.

This protocol presumes that the protocol server keeps track of all changes to site content in a content database. A content database can service multiple site collections; however, each site collection is supported by a single content database. Although a site collection can comprise multiple sites and subsites, all are considered part of the site collection. The protocol server uses the content database to keep track of all content changes to all sites and subsites in the site collection.

This protocol provides protocol clients with two levels of granularity when obtaining incremental changes: the whole content database level, and the single site collection level. This flexibility means that protocol clients can obtain incremental changes to the entire content database, or incremental changes to just a single site collection.

Protocol clients typically run multiple processes that obtain incremental changes to multiple site collections in parallel. To facilitate this capability, this protocol requires a mechanism to identify the context of any given incremental change. This document and this protocol define the term change tracking space to identify that context: both the site collection and the granularity of the incremental change.

This protocol associates every incremental change with a specific change tracking space. A given change tracking space can specify the whole content database or a single, particular site collection. Subsites do not have their own change tracking spaces; their changes belong to the change tracking space of the site collection.
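One way a protocol client could model change tracking spaces is shown in the following Python sketch. The class names and fields are hypothetical; the protocol itself defines only the concept. Keeping one stored token per space lets parallel indexing processes avoid mixing ChangeId tokens that belong to different spaces.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeTrackingSpace:
    """Hypothetical client-side key for a change tracking space."""

    content_database_id: str
    # None means the space covers the whole content database.
    site_collection_id: Optional[str] = None

    @property
    def granularity(self) -> str:
        if self.site_collection_id is None:
            return "content database"
        return "site collection"


class ChangeIdStore:
    """Remembers the latest ChangeId token separately for each space."""

    def __init__(self) -> None:
        self._tokens: Dict[ChangeTrackingSpace, str] = {}

    def remember(self, space: ChangeTrackingSpace, change_id: str) -> None:
        self._tokens[space] = change_id

    def latest(self, space: ChangeTrackingSpace) -> Optional[str]:
        return self._tokens.get(space)
```

Because subsites have no space of their own, changes to a subsite would be recorded under the key of its site collection.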

The protocol server MUST capture every change to site collection content in an atomic change record and mark each record with an opaque token, called a ChangeId. This ChangeId has the following properties:

  • Every ChangeId is unambiguously associated with a particular change tracking space.

  • All ChangeId tokens that belong to the same change tracking space MUST be ordered in the sequence of change occurrence time. ChangeId tokens that belong to different change tracking spaces might not be comparable.

  • Protocol clients (indexing services) MUST treat ChangeId tokens as opaque strings and SHOULD NOT interpret the ChangeId token structure.

  • Protocol clients (indexing services) MUST pass to the protocol server only ChangeId tokens previously received from the same protocol server.

  • Protocol clients (indexing services) use ChangeId tokens to request the set of changes that have occurred in a particular change tracking space since a specified ChangeId token. ChangeId tokens serve no other purpose.

  • The retention time for change records (how long the change records remain in the content database) is an implementation detail. If the protocol server receives a request for incremental changes that specifies a ChangeId whose change records are no longer retained, it MUST indicate that the ChangeId is too old. In this case, the protocol client SHOULD abandon incremental indexing using any previously obtained ChangeId tokens in that change tracking space, and SHOULD restart with full traversal.

  • The protocol does not limit the number of changes that can occur after a certain ChangeId. To make the dialog round-trip time predictable, the protocol client sets the Timeout value in each request, limiting the time the protocol server can spend preparing the answer. The protocol server builds the change report starting from the oldest changes and, once the timeout expires, returns a report that includes all changes from the LastChangeId token received from the protocol client up to the new LastChangeId token provided by the protocol server in the response. The protocol client MUST continue the process by supplying this new LastChangeId token. The protocol server MUST guarantee that, by following this procedure, the protocol client eventually receives all changes.
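The retention rule above implies a fallback path on the client side, sketched here in Python. How the protocol server signals that a ChangeId is too old is not described in this section, so the `ChangeIdTooOld` exception is a hypothetical stand-in for that signal; `fetch_changes` and `full_traversal` are likewise assumed callables supplied by the caller.

```python
class ChangeIdTooOld(Exception):
    """Hypothetical signal that the server no longer retains the change records."""


def refresh_index(stored_change_id, fetch_changes, full_traversal):
    """Prefer incremental indexing; fall back to full traversal when needed.

    The token is passed through untouched: the client treats it as an
    opaque string and never inspects its structure.
    """
    if stored_change_id is not None:
        try:
            return fetch_changes(stored_change_id)
        except ChangeIdTooOld:
            # Abandon the stale token for this change tracking space.
            pass
    return full_traversal()
```

A client with no stored token, for example on first contact with a site collection, takes the same full traversal path.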

Structural elements

From the change reporting point of view, the site collection is a structure consisting of sites, subsites, and lists. Lists consist of views and list items. Lists can have folders, while list items can have files as attachments. This document uses the term structural elements to refer to all these components of a site. The site collection itself is a structural element of the content database.

Note that document libraries, documents, pages and other content types are just special cases of lists and list items respectively and are treated in the same manner.

The protocol server MUST consider a structural element to have changed only if one or more of the following conditions are satisfied:

  • The structural element itself has changed; for example, it was added, deleted, directly modified, or renamed.

  • One or more components that this element consists of have changed.

  • Permissions to access this element have changed.
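The conditions above can be read as a recursive test, sketched below in Python on the assumption that satisfying any one condition is enough. The `Element` class and its fields are hypothetical and exist only to make the rule concrete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical in-memory model of a structural element."""

    name: str
    modified: bool = False             # added, deleted, modified, or renamed
    permissions_changed: bool = False  # access permissions were altered
    components: List["Element"] = field(default_factory=list)


def has_changed(element: Element) -> bool:
    """An element counts as changed if it, its permissions, or any component changed."""
    return (
        element.modified
        or element.permissions_changed
        or any(has_changed(component) for component in element.components)
    )
```

Under this reading, a single modified list item marks its list, its site, and its site collection as changed, while sibling lists stay unchanged.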

Change notifications

This document uses the term change notification to specify a fragment of a change report, describing the change to a particular structural element. The element that changed is referred to as the subject of this notification.

All change notifications are marked with the following attributes:

  • Change: Type of the change, as defined in section 3.1.4.3.5.1.

  • ItemCount: Number of subordinate structural elements (direct and indirect) whose change notifications are included in that same report.

All change reports, full or partial, contain one, and only one, root change notification. The subject of the root notification is the change tracking space. This change tracking space can specify the whole content database or a particular site collection.

If the root change notification is marked as "Unchanged" with ItemCount = 0, no direct or indirect changes are included in that report. Otherwise, the root change notification MUST contain one or more change notifications for its direct components. These component change notifications typically contain nested change notifications.

The number of change notifications to include in one change report is an implementation detail of the protocol server. However, the change notifications created by the protocol server MUST conform to the following rules:

  • If a change notification about any structural element is included in the report, it is embedded in a change notification addressing the parent of the element, because the parent element is considered changed and MUST have ItemCount greater than zero. In other words, change notifications cannot be orphans; they are always embedded in parent change notifications.

  • No change notifications addressing unchanged elements can be included in the report, with the exception of the root element.

  • Any given changed element, that is, the change notification addressing this element, can only appear in the change report once. All reported changes of its structural components MUST be embedded into that notification.

In other words, the change report MUST constitute a contiguous sub-tree of the tree of all structural elements in the change tracking space.
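The rules above can be checked mechanically once a change report has been parsed into a tree. The following Python sketch assumes a hypothetical `Notification` class. "Unchanged" is the only change type taken from the text; any other value, such as "Update" in the usage note, is a placeholder for the types defined in section 3.1.4.3.5.1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Notification:
    """Hypothetical parsed form of one change notification."""

    subject: str
    change: str
    item_count: int
    children: List["Notification"] = field(default_factory=list)


def count_descendants(node: Notification) -> int:
    """Number of direct and indirect nested notifications."""
    return sum(1 + count_descendants(child) for child in node.children)


def validate_report(root: Notification) -> List[str]:
    """Return a list of rule violations; an empty list means the tree conforms."""
    problems: List[str] = []
    seen = set()

    def walk(node: Notification, is_root: bool) -> None:
        if node.subject in seen:
            problems.append(f"{node.subject}: notified more than once")
        seen.add(node.subject)
        if node.change == "Unchanged":
            if not is_root:
                problems.append(f"{node.subject}: unchanged element in report")
            elif node.children:
                problems.append(f"{node.subject}: unchanged root has nested notifications")
        if node.item_count != count_descendants(node):
            problems.append(f"{node.subject}: ItemCount does not match nested notifications")
        for child in node.children:
            walk(child, False)

    walk(root, True)
    return problems
```

For example, a root with ItemCount 2 that embeds one list notification, which in turn embeds one list item notification, yields no violations.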

Permissions

The user or principal (1) that performs GetContent and GetChanges operations MUST have a special permission, called "Full Read".

Full Read permission is not normally granted to any user, not even to administrators. By default, only the service account used by the built-in indexing agent is granted Full Read permission.

Incremental indexing

The following diagram describes the recommended sequence of operations to perform incremental indexing.


Figure 6: Incremental indexing scenario

The following occurs:

  1. To obtain incremental changes, the protocol client sends a GetChanges message, passing the following input parameters:

    • The content database identifier obtained in step 2 of the Establishment of indexing context scenario, as the value of the contentDatabaseId parameter.

    • The StartChangeId token, obtained in the context establishment phase, as the value of the LastChangeId parameter.

  2. The protocol server sends a GetChanges response message that contains a change report. This change report consists of one or more change notifications. Each change notification specifies a particular change location and the changed element itself. This change location specifies a particular site, subsite, list, or list item. The response also returns the LastChangeId token (meaning that to get further changes, the protocol client provides this LastChangeId value) and a flag called moreChanges, which indicates whether there are more changes to retrieve.

  3. If the moreChanges flag is true, the protocol client sends another GetChanges message, passing the value of the LastChangeId token obtained in step 2.

  4. The protocol server sends a GetChanges response message similar to that in step 2, returning a second change report. If there are no more changes, the protocol server returns false in the moreChanges flag.

The protocol client stores the LastChangeId obtained in step 4 to be used in the next incremental indexing session by repeating this sequence.
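The request and response loop described in steps 1 through 4 can be sketched in Python as follows. The `get_changes` callable and the dictionary it returns are assumptions standing in for the actual message exchange; the parameter names contentDatabaseId, LastChangeId, and Timeout and the moreChanges flag come from the text. The unit and default of the timeout value are also assumptions.

```python
def collect_changes(get_changes, content_database_id, start_change_id, timeout=30):
    """Request change reports until the server reports no more changes.

    Returns the collected reports and the LastChangeId token to store for
    the next incremental indexing session.
    """
    reports = []
    last_change_id = start_change_id
    while True:
        response = get_changes(
            contentDatabaseId=content_database_id,
            LastChangeId=last_change_id,
            Timeout=timeout,  # bounds the server time spent on each report
        )
        reports.append(response["report"])
        # Always continue from the token the server just returned.
        last_change_id = response["LastChangeId"]
        if not response["moreChanges"]:
            return reports, last_change_id
```

Persisting the returned token, rather than the one the session started with, is what allows the next session to resume where this one stopped.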

Full details of the individual operations of this protocol are specified in sections 3.1.4.1 to 3.1.4.14.