Performing Incremental Crawling with the Business Data Catalog in SharePoint Server 2007

Summary: Learn to create a Business Data Catalog application definition file that supports incremental crawling. (4 printed pages)

Joel Krist, Akona Systems

October 2008

Applies to: Microsoft Office SharePoint Server 2007

Contents

  • About the Business Data Catalog

  • Searching Business Data

  • Authoring Required Metadata to Support Business Data Search

  • Configuring the __BdcLastModifiedTimestamp Property and IDEnumerator Method

  • Conclusion

  • Additional Resources

About the Business Data Catalog

The Business Data Catalog is a shared service introduced with Microsoft Office SharePoint Server 2007 that integrates business data from back-end server applications, such as SAP or Siebel, into Office SharePoint Server 2007. After data from back-end line-of-business (LOB) systems is made available through the Business Data Catalog, you can use it in SharePoint features and applications such as lists, Web Parts, and search, as well as in custom Business Data Catalog applications.

The Business Data Catalog is designed to require minimal coding. Instead of code, it uses metadata (see Business Data Catalog: Metadata Model) to describe the APIs of business applications. This metadata defines the business entities that a business application interacts with and the methods that the application exposes for access to its data. Metadata authors write the metadata as XML in Business Data Catalog application definition files, and the Business Data Catalog stores it in the metadata repository.

For more information about the Business Data Catalog architecture, see Business Data Catalog: Architecture. For more information about the development life cycle with the Business Data Catalog, see Business Data Catalog: Roles and Development Life Cycle.

Searching Business Data

Enterprise Search in Microsoft Office SharePoint Server 2007 supports the indexing and searching of LOB data via the Business Data Catalog. For example, registering an LOB database that contains customer data with the Business Data Catalog, and then configuring the Business Data Catalog application as a search content source, enables Enterprise Search to index data in the LOB system and return customer data in search results.

Authoring Required Metadata to Support Business Data Search

The application definition file of a business application must contain the required metadata to enable Enterprise Search to use the Business Data Catalog to index the business application's data. A Business Data Catalog search crawl has two phases:

  1. ID enumeration: In this first phase, all entity instance identifiers are fetched. An entity is a business object, such as a customer or a product. Using database terminology, an entity instance is equivalent to a row in a table. An entity instance identifier is the key that uniquely identifies a specific entity instance; in database terms, this is typically the row's primary key.

  2. Detail fetch: In this second phase, details are selected for entity instances that are represented by the identifiers returned in the first phase.

Business Data Catalog entities have associated methods that are used to select entity instances from the LOB system. The Business Data Catalog metadata model supports a specific type of method named IDEnumerator. The Business Data Catalog uses an entity's IDEnumerator method to retrieve the list of identifiers, or unique keys, of the entity instances that should be searchable. An entity can have at most one IDEnumerator method.
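The detail fetch phase is typically served by a SpecificFinder method instance, which returns one entity instance for a given identifier. The following fragment is a minimal sketch rather than part of the original sample: the Product entity, the Products table, the ProductName column, and the method and instance names are hypothetical, and access control lists are omitted for brevity.

```xml
<!-- Illustrative sketch only: table, column, and method names are assumptions. -->
<Method Name="GetProductByKey">
  <Properties>
    <!-- Selects the details of exactly one row, identified by its key. -->
    <Property Name="RdbCommandText" Type="System.String">SELECT ProductKey, ProductName, LastModifiedOn FROM Products WHERE ProductKey = @ProductKey</Property>
    <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">Text</Property>
  </Properties>
  <Parameters>
    <!-- Input: the identifier obtained during the ID enumeration phase. -->
    <Parameter Direction="In" Name="@ProductKey">
      <TypeDescriptor TypeName="System.Int32" IdentifierName="ProductKey" Name="ProductKey" />
    </Parameter>
    <!-- Output: a reader over the single matching record. -->
    <Parameter Direction="Return" Name="Product">
      <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="ProductDataReader">
        <TypeDescriptors>
          <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" Name="ProductDataRecord">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Int32" IdentifierName="ProductKey" Name="ProductKey" />
              <TypeDescriptor TypeName="System.String" Name="ProductName" />
              <TypeDescriptor TypeName="System.DateTime" Name="LastModifiedOn" />
            </TypeDescriptors>
          </TypeDescriptor>
        </TypeDescriptors>
      </TypeDescriptor>
    </Parameter>
  </Parameters>
  <MethodInstances>
    <MethodInstance Type="SpecificFinder" ReturnParameterName="Product" ReturnTypeDescriptorName="ProductDataReader"
      ReturnTypeDescriptorLevel="0" Name="ProductSpecificFinderInstance" />
  </MethodInstances>
</Method>
```

A method like this can be called once for each identifier that the ID enumeration phase returns, which is why the entity needs both kinds of method to be crawlable.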

Configuring the __BdcLastModifiedTimestamp Property and IDEnumerator Method

An entity's IDEnumerator method allows the data for the entity to be indexed as part of a full crawl. Incremental crawling has two additional requirements. First, one of the fields returned by the entity's IDEnumerator method should represent the date and time that the entity instance was last updated in the LOB application. Second, the entity's __BdcLastModifiedTimestamp property must be set to the name of that field, that is, to the value of the Name attribute of the TypeDescriptor that describes the field in the IDEnumerator method's return parameter.

The following metadata fragment shows how to set the __BdcLastModifiedTimestamp property on an entity named Product.

<Entity EstimatedInstanceCount="10000" Name="Product">
  <Properties>
    <Property Name="Title" Type="System.String">ProductName</Property>
    <Property Name="__BdcLastModifiedTimestamp" Type="System.String">LastModifiedOn</Property>
  </Properties>
  <!-- Remaining child elements of the entity are omitted from this fragment. -->
</Entity>

Notice that the Type attribute of the __BdcLastModifiedTimestamp property must be System.String. The property's value is not a date itself; it is the name of the field, returned by the IDEnumerator method, that contains the last modified date and time of the entity instance.

The following metadata fragment shows the corresponding IDEnumerator method for the Product entity.

<Method Name="ProductIDEnumerator">
  <Properties>
    <Property Name="RdbCommandText" Type="System.String">SELECT ProductKey, LastModifiedOn FROM Products</Property>
    <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=2.0.0.0, 
      Culture=neutral, PublicKeyToken=b77a5c561934e089">Text</Property>
  </Properties>
  <AccessControlList>
    <AccessControlEntry Principal="LITWAREINC\administrator">
      <Right BdcRight="Edit" />
      <Right BdcRight="Execute" />
      <Right BdcRight="SetPermissions" />
      <Right BdcRight="SelectableInClients" />
    </AccessControlEntry>
    <AccessControlEntry Principal="NT AUTHORITY\Authenticated Users">
      <Right BdcRight="Execute" />
      <Right BdcRight="SelectableInClients" />
    </AccessControlEntry>
  </AccessControlList>
  <Parameters>
    <Parameter Direction="Return" Name="Products">
      <TypeDescriptor TypeName="System.Data.IDataReader, System.Data, Version=2.0.3600.0, Culture=neutral, 
        PublicKeyToken=b77a5c561934e089" IsCollection="true" Name="ProductDataReader">
        <TypeDescriptors>
          <TypeDescriptor TypeName="System.Data.IDataRecord, System.Data, Version=2.0.3600.0, Culture=neutral, 
            PublicKeyToken=b77a5c561934e089" Name="ProductDataRecord">
            <TypeDescriptors>
              <TypeDescriptor TypeName="System.Int32" IdentifierName="ProductKey" Name="ProductKey" />
              <TypeDescriptor TypeName="System.DateTime" Name="LastModifiedOn" />
            </TypeDescriptors>
          </TypeDescriptor>
        </TypeDescriptors>
      </TypeDescriptor>
    </Parameter>
  </Parameters>
  <MethodInstances>
    <MethodInstance Type="IdEnumerator" ReturnParameterName="Products" ReturnTypeDescriptorName="ProductDataReader" 
      ReturnTypeDescriptorLevel="0" Name="ProductIDEnumeratorInstance">
      <AccessControlList>
        <AccessControlEntry Principal="LITWAREINC\administrator">
          <Right BdcRight="Edit" />
          <Right BdcRight="Execute" />
          <Right BdcRight="SetPermissions" />
          <Right BdcRight="SelectableInClients" />
        </AccessControlEntry>
        <AccessControlEntry Principal="NT AUTHORITY\Authenticated Users">
          <Right BdcRight="Execute" />
          <Right BdcRight="SelectableInClients" />
        </AccessControlEntry>
      </AccessControlList>
    </MethodInstance>
  </MethodInstances>
</Method>

Notice that the SQL SELECT statement in the method's RdbCommandText property selects two fields: the unique key of a Product entity instance, represented by the ProductKey field, and the last modified date and time of the instance, represented by the LastModifiedOn field, which is a DateTime column in the LOB database.

<Property Name="RdbCommandText" Type="System.String">SELECT ProductKey, LastModifiedOn FROM Products</Property>

In addition, the Products return parameter has a TypeDescriptor with its TypeName attribute set to System.DateTime, and its Name attribute is set to the same value as that specified for the entity's __BdcLastModifiedTimestamp property.

<TypeDescriptor TypeName="System.DateTime" Name="LastModifiedOn" />

After an entity's __BdcLastModifiedTimestamp property and IDEnumerator method are correctly configured, the Business Data Catalog can perform incremental crawls of the entity data by requesting the data for only those entity instances that were modified since the last crawl.
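To see how the pieces fit together, the following skeleton sketches where the property, the identifier, and the methods sit within the Product entity. It is an illustrative outline under stated assumptions: the Identifiers element does not appear in the fragments above but is implied by the IdentifierName="ProductKey" attribute, and the method bodies are elided as comments.

```xml
<!-- Illustrative outline only; method bodies are elided. -->
<Entity EstimatedInstanceCount="10000" Name="Product">
  <Properties>
    <Property Name="Title" Type="System.String">ProductName</Property>
    <!-- Names the field that carries the last modified date and time. -->
    <Property Name="__BdcLastModifiedTimestamp" Type="System.String">LastModifiedOn</Property>
  </Properties>
  <Identifiers>
    <!-- Assumed declaration; it matches IdentifierName="ProductKey" on the TypeDescriptor. -->
    <Identifier TypeName="System.Int32" Name="ProductKey" />
  </Identifiers>
  <Methods>
    <!-- The ProductIDEnumerator method goes here. Its return parameter must include
         a System.DateTime TypeDescriptor named LastModifiedOn. -->
    <!-- The entity's other methods, such as a SpecificFinder for the detail fetch phase, go here. -->
  </Methods>
</Entity>
```

The only link between the entity property and the method is the shared name LastModifiedOn, so a mismatch in spelling or case between the two is worth checking first if incremental crawls do not behave as expected.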

Conclusion

This article describes how the Business Data Catalog supports incremental crawling through the __BdcLastModifiedTimestamp entity property and the IDEnumerator method. The IDEnumerator method returns a field that contains the last modified date and time of each entity instance. The __BdcLastModifiedTimestamp property must have a Type attribute of System.String, and its value must match the Name attribute of the TypeDescriptor for that field.

Additional Resources

For more information, see the following additional resources: