Using the Microsoft Sync Framework Metadata Storage Services

Microsoft Corporation
October 2009

Introduction

Microsoft Sync Framework is a comprehensive synchronization platform enabling collaboration and offline access for applications, services and devices. Developers can build sync ecosystems that integrate any application, any data from any store using any protocol over any network.

Sync Framework does not prescribe the type of data that can be synchronized, but it does require that some common metadata be maintained for each replica and for the data items that need to be synchronized. Solution developers create plug-in components called synchronization providers, which enable synchronization functionality for a given data store or service. Compatible providers can then be used in various combinations to enable end-to-end synchronization of data between the same or different kinds of data stores. All synchronization providers need to be able to store, maintain, and access the common synchronization metadata. However, not all data stores lend themselves naturally to storing this metadata as part of the store itself (for example, file systems, legacy stores, and simple devices), so providers may need an additional store for this metadata.

The metadata storage service makes it easier to write a Sync Framework provider by providing a lightweight store implementation for the common synchronization metadata and an easy-to-use API for accessing this store. It also enables storing additional provider-specific metadata that may be necessary to enable specific synchronization scenarios. Furthermore, it adds value through helper methods and services that make it much easier to implement a synchronization provider (for example, change enumeration and delete detection services).

Overview

Most of the API supporting the metadata storage service is independent of the specific store implementation used to store the metadata. In this release of Sync Framework, the metadata storage service is implemented on a lightweight, compact, in-proc database designed for desktop usage. It provides the reliability, security, and performance benefits of a database solution while being compact enough to build client applications on. In future releases, we may deliver other implementations of the metadata storage service API on other stores, based on customer feedback.

The metadata storage service provides the following features:

  • Support for the Sync Framework–required metadata such as IDs, versions, and synchronization scope metadata, and for cleaning up tombstones.
  • Support for tick count management so that providers can use this to create the correct update versions required by Sync Framework.
  • Support for providers to bridge the gap between change tracking provided by the data store (or lack thereof) and the change tracking and metadata maintenance required by Sync Framework. This includes support for efficient storage and retrieval of provider-specific metadata.
  • Transaction support so that providers can commit or roll back the metadata store depending on the success or failure of data updates.
  • Services that make it easier to implement provider functionality, for example, change enumeration methods and support methods for change application.

For ease of use, both managed and unmanaged implementations of the metadata storage service are available for use by the provider writers. The sample code referenced here is unmanaged, but will map nearly the same way to managed API calls. There is no difference in functionality provided by either implementation, so developers creating managed providers should use the managed API and those creating unmanaged providers should use the native API.

Enabled Scenarios

The metadata storage service is designed with the idea that writing providers for different scenarios becomes less complicated when built-in services handle most of the details of storing and maintaining the required metadata. Some of the scenarios in which using the metadata storage service can expedite solution development are:

  • An application that needs to keep two file system replicas synchronized. This file system synchronization provider needs an efficient place to store the Sync Framework metadata as well as custom metadata to correctly detect local changes to the file system replica, including file deletes. The provider needs to store a snapshot of timestamps, attributes, filename, and other information about files found on a previous synchronization to correctly enumerate creates, deletes, renames, and other changes for the next synchronization. The metadata storage service provides efficient storage for all of this information as well as value-added services on the stored information. The metadata storage service database file can be placed in the file system replica itself.
  • Consider a scenario in which data on a limited-form-factor mobile device needs to be synchronized with a desktop PC. The device itself provides no change tracking for its data and is not capable of storing any extra metadata. In this case, the synchronization provider can store all of the metadata related to the device store in a metadata storage service store on the host PC that is updated every time the device is synchronized with the PC.
  • Another scenario that benefits from the metadata storage service is one in which data from a Web service needs to be synchronized locally for offline access. In this scenario, if the Web service allows extra storage, the metadata store can be stored online, downloaded for a synchronization operation, and then saved back to the service at the end. If the service is read-only, the metadata store can instead be maintained only on the local PC and used during synchronization operations.

Using the Metadata Storage Service

Let's see how the metadata storage service can be used to provide the Sync Framework metadata by looking at it in the context of a simple provider used to synchronize file data. Figure 1 below depicts the various components involved and sample interactions between them, to show how the metadata storage service can be used to implement a synchronization provider.

Figure 1. Using the metadata storage service

Storing Sync Framework Metadata

The metadata storage service provides a clear API for saving the requisite Sync Framework metadata to the store. It enforces the consistency requirements of such metadata (unique identifiers) to help detect errors when the metadata is out of sync. The metadata storage service also supports information at the per-replica level (knowledge and forgotten knowledge).

In addition to providing support for the identifiers, the metadata storage service also supports version information (creation version and current version) and a standard field for tracking whether an item was deleted in the data store. Updating the current version on an item to reflect a local change ensures that the item is included in the batch of changes the next time you enumerate changes from this store.

Storing Custom Metadata

The metadata storage service allows custom-defined metadata to be added to the store. This has the following advantages:

  • Extensibility: The provider can add provider-specific metadata to the store by specifying column information when creating the store for the first time. This metadata can aid the provider in change detection when it can use the snapshot of metadata from a previous synchronization to compare against current metadata.
  • Performance: When adding custom columns to the metadata storage service, the provider can specify which indexes to create on them so that subsequent queries perform well.

This custom metadata can be defined when the store and replica information are created for the first time. Column information for these custom metadata values is constrained by the limitations and requirements of the underlying database. When you create these fields, you can also specify the indexes required to speed up later searches. By default, the metadata storage service has some indexes defined on the Sync Framework data.

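const WCHAR* lastWriteTimeField = L"LastWriteTime";

// One custom UINT64 column that holds each file's last write time.
CUSTOM_FIELD_DEFINITION customField = {
    lastWriteTimeField,
    SYNC_METADATA_FIELD_TYPE_UINT64,
    0 /* not used for UINT64 */
};

// One index over that single column.
CUSTOM_FIELDS_INDEX customFieldsIndex = {
    &lastWriteTimeField,
    1,
    FALSE
};

// Pass the field and index definitions by address, each followed by its count.
pSqlMetadataStore->InitializeReplicaMetadata((BYTE *)&guidReplica, &IdParams,
    &customField, 1, &customFieldsIndex, 1);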

Tick Count Management

Providers are expected to implement a tick count management solution to correctly assign versions to local changes. The metadata storage service helps providers by implementing a solution that returns an incrementing tick count. It also ensures that the knowledge stored in the metadata storage service is up to date by always returning the saved knowledge with the local tick count. This way the providers do not have to explicitly set the current tick count in knowledge and can rely on the metadata API to return the right value.
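
The tick count behavior described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: TickCounter and Version are hypothetical names invented for illustration and are not part of the Sync Framework API, and the real service persists its counter in the metadata store rather than in memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a version: which replica made the change, and when.
struct Version {
    std::uint32_t replicaKey;  // 0 denotes the local replica in this sketch
    std::uint64_t tickCount;
};

// Hypothetical in-memory model of an incrementing tick count.
class TickCounter {
public:
    // Resume from the last tick count that was saved with the store.
    explicit TickCounter(std::uint64_t lastSaved = 0) : current_(lastSaved) {}

    // Returns a value strictly greater than any value returned before.
    std::uint64_t Next() { return ++current_; }

    // Highest tick handed out so far. In this model, the local entry of the
    // saved knowledge would always be reported with this value.
    std::uint64_t Current() const { return current_; }

    // Stamps a local change with a fresh version.
    Version NextLocalVersion() { return Version{0, Next()}; }

private:
    std::uint64_t current_;
};
```

In this model, a counter resumed at 41 hands out 42 and then 43, so two local changes detected in the same pass never share a version.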

Searching the Metadata Store

The metadata storage service supports efficient searching of the Sync Framework metadata. Providers usually need to search by local or global IDs or by values set in the custom metadata. The code snippet below shows how you can search by a custom field; the call returns an enumerator of all items that match the search query.

IItemMetadataEnumerator *pItems = NULL;
IFieldValue *pLastWriteTimeValue = NULL;

// Build the value to search for, then query by the LastWriteTime field.
pReplica->CreateEmptyFieldValue(&pLastWriteTimeValue);
pLastWriteTimeValue->SetUInt64Value(uint64Value);
pReplica->FindItemMetadataByIndexedFields(&lastWriteTimeField, &pLastWriteTimeValue, 1,
    &pItems);

Searches for strings are case-insensitive, so searching for an item as "ABCDE" or "abCde" will yield the same results.
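
One practical consequence of case-insensitive matching is that two names differing only in case resolve to the same search key. The sketch below models that effect with an index keyed on a case-folded string. It is an illustration only: NameIndexModel and FoldCase are hypothetical names, the folding is ASCII-only, and nothing here is claimed about how the underlying database actually compares strings.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Lowercases ASCII letters so that "ABCDE" and "abCde" produce the same key.
inline std::string FoldCase(const std::string& text) {
    std::string folded;
    for (unsigned char ch : text) {
        folded.push_back(static_cast<char>(std::tolower(ch)));
    }
    return folded;
}

// Hypothetical in-memory index from a string field to item identifiers.
class NameIndexModel {
public:
    void Add(const std::string& name, int itemId) {
        index_[FoldCase(name)].push_back(itemId);
    }

    // Returns every item whose name matches, ignoring case.
    std::vector<int> Find(const std::string& name) const {
        auto it = index_.find(FoldCase(name));
        return it == index_.end() ? std::vector<int>() : it->second;
    }

private:
    std::map<std::string, std::vector<int>> index_;
};
```

A provider for a case-sensitive data store may therefore need to decide how to handle items whose string fields differ only in case.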

Change Detection and Delete Detection Services

Providers can compare a snapshot of the metadata taken at the beginning of a change detection pass against the metadata stored on a previous pass. The metadata storage service provides tick count management so that providers can assign versions to the newly detected changes. A simple change detection pass that assigns a version to an updated file on a file system can look like this:

ULONGLONG ullTickCount = 0;
pReplica->GetNextTickCount(&ullTickCount);

// Version to stamp on local changes; replica key 0 denotes the local replica.
SYNC_VERSION version;
version.dwLastUpdatingReplicaKey = 0;
version.ullTickCount = ullTickCount;

FILETIME ftLastWriteTimeInMDS = {0};
FILETIME ftLastWriteTimeOnFS = {0};
GetLastWriteTimeInMDS(pExistingItem, &ftLastWriteTimeInMDS);
GetLastWriteTimeOnFS(L"FileName", &ftLastWriteTimeOnFS);

// CompareFileTime returns 1 when the first time is later than the second,
// that is, when the file was written after the metadata was last saved.
if (1 == CompareFileTime(&ftLastWriteTimeOnFS, &ftLastWriteTimeInMDS))
{
    pExistingItem->SetChangeVersion(&version);
    SetLastWriteTime(pExistingItem, ftLastWriteTimeOnFS);
    pReplica->SaveItemMetadata(pExistingItem);
}

The metadata storage service provides the ability to track the existence of an item in the store by allowing items to be marked as deleted. The metadata storage service also provides the mechanism by which a provider can report all active items and then query the store for unreported items. These unreported items can then be marked as deleted and the deletes can then flow throughout the community.

The code below extends the code above to show how, on a change detection pass, the provider can report every item as active, either explicitly by using the ReportLiveItemByIndexedFields method or implicitly by updating the version (SetChangeVersion) on an item and saving it to denote a change. A subsequent GetUnreportedItems call then returns the unreported items, which can be marked as deleted.

pReplica->ResetReportingWatermark();

WIN32_FIND_DATA findData;
HANDLE hFind = FindFirstFile(wszSearchPath, &findData);
if (INVALID_HANDLE_VALUE != hFind)
{
    LPCWSTR fileNameField = L"FileName";
    do
    {
        IFieldValue* pFileNameValue = NULL;
        pReplica->CreateEmptyFieldValue(&pFileNameValue);
        pFileNameValue->SetStringValue(findData.cFileName);

        IItemMetadata *pExistingItem = NULL;
        pReplica->FindItemMetadataByUniqueIndexedFields(&fileNameField, &pFileNameValue, 1, &pExistingItem);
        if (NULL != pExistingItem)
        {
            if (DetectFileChanged(pExistingItem))
            {
                // Changed: saving the item with a new version denotes the change.
                pExistingItem->SetChangeVersion(&version);
                pReplica->SaveItemMetadata(pExistingItem);
            }
            else
            {
                // Unchanged: report the item as live so that it is not treated as deleted.
                pReplica->ReportLiveItemByIndexedFields(&fileNameField, &pFileNameValue, 1);
            }
        }
        else
        {
            // No item found in the metadata store, so create a new item
            IItemMetadata *pItem = NULL;
            pReplica->CreateNewItemMetadata(&pItem);
            // ... Assign IDs and other custom information to pItem
            pItem->SetChangeVersion(&version);
            pItem->SetCreationVersion(&version);
            pReplica->SaveItemMetadata(pItem);
        }
    }
    while (FindNextFile(hFind, &findData) != 0);

    FindClose(hFind);
}

Now, having reported all active items, we can ask the store for all the items that need to be marked as deleted.

IItemMetadataEnumerator* pItemMetadataEnumerator = NULL;
pReplica->GetUnreportedItems(&pItemMetadataEnumerator);

IItemMetadata *pItemMetadata = NULL;
ULONG cFetched = 0;

// Check the fetch count before touching the item: the enumerator may be empty.
while (SUCCEEDED(pItemMetadataEnumerator->Next(1, &pItemMetadata, &cFetched)) && 1 == cFetched)
{
    pItemMetadata->MarkAsDeleted(&version);
    pReplica->SaveItemMetadata(pItemMetadata);
    pItemMetadata->Release();
    pItemMetadata = NULL;
}

For privacy reasons, once an item has been marked as deleted and saved to the store, the provider must clear out all custom metadata.
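
The report-then-sweep pattern used in this section can also be modeled without the Sync Framework API, which may help when reasoning about edge cases such as an empty directory or an item that is already a tombstone. The following is a minimal sketch under stated assumptions: ReplicaModel, ItemRecord, and DetectDeletes are hypothetical names, and items are keyed by file name only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-item record; only the deleted flag matters here.
struct ItemRecord {
    bool deleted = false;
};

// Hypothetical in-memory model of a replica's item metadata.
class ReplicaModel {
public:
    void AddItem(const std::string& name) { items_[name] = ItemRecord{}; }

    // Start a new detection pass: forget which items were reported.
    void ResetReporting() { reported_.clear(); }

    // Record that the item still exists in the data store.
    void ReportLive(const std::string& name) { reported_.insert(name); }

    // Items that are not already tombstones and were not reported this pass.
    std::vector<std::string> UnreportedItems() const {
        std::vector<std::string> result;
        for (const auto& entry : items_) {
            if (!entry.second.deleted && reported_.count(entry.first) == 0) {
                result.push_back(entry.first);
            }
        }
        return result;
    }

    void MarkAsDeleted(const std::string& name) { items_.at(name).deleted = true; }
    bool IsDeleted(const std::string& name) const { return items_.at(name).deleted; }

private:
    std::map<std::string, ItemRecord> items_;
    std::set<std::string> reported_;
};

// One pass: report everything found on disk, then tombstone the rest.
inline std::vector<std::string> DetectDeletes(
    ReplicaModel& replica, const std::vector<std::string>& filesOnDisk) {
    replica.ResetReporting();
    for (const auto& file : filesOnDisk) {
        replica.ReportLive(file);
    }
    std::vector<std::string> deleted = replica.UnreportedItems();
    for (const auto& name : deleted) {
        replica.MarkAsDeleted(name);
    }
    return deleted;
}
```

In this model a second pass over the same directory reports no new deletes, because items that are already tombstones are skipped.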

Transaction Support

The metadata storage service supports implicit and explicit transactions on the store. For implicit transactions, metadata is flushed to the metadata store at a flush interval determined by the store. The store is also guaranteed to be flushed when all connections are released and the store is closed.

For providers that need a stronger guarantee than this, we provide explicit transaction support. At the end of a transaction, data is flushed to disk immediately. Presently, the metadata storage service does not support distributed transactions, so providers cannot use a single explicit transaction to update their data store and metadata store together. They can, however, use explicit transactions to batch updates to the metadata storage service if they support batched updates to the data store.
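
Because the data store and the metadata store cannot share one transaction, a common arrangement is to stage the metadata updates, apply the data updates, and commit the metadata only if the data updates succeeded. The sketch below models that ordering with a scope guard. It is an assumption-laden illustration: MetadataStoreModel, TransactionScope, and ApplyBatch are hypothetical names and do not correspond to Sync Framework methods.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a metadata store with one open transaction.
class MetadataStoreModel {
public:
    void Begin() {
        assert(!inTransaction_);
        pending_ = committed_;  // work on a copy until commit
        inTransaction_ = true;
    }
    void Save(const std::string& item, std::uint64_t tick) {
        (inTransaction_ ? pending_ : committed_)[item] = tick;
    }
    void Commit() {
        assert(inTransaction_);
        committed_ = std::move(pending_);
        pending_.clear();
        inTransaction_ = false;
    }
    void Rollback() {
        assert(inTransaction_);
        pending_.clear();  // discard everything saved since Begin
        inTransaction_ = false;
    }
    bool Contains(const std::string& item) const { return committed_.count(item) != 0; }

private:
    std::map<std::string, std::uint64_t> committed_;
    std::map<std::string, std::uint64_t> pending_;
    bool inTransaction_ = false;
};

// Rolls back on scope exit unless Commit was called.
class TransactionScope {
public:
    explicit TransactionScope(MetadataStoreModel& store) : store_(store) { store_.Begin(); }
    ~TransactionScope() { if (!done_) store_.Rollback(); }
    TransactionScope(const TransactionScope&) = delete;
    TransactionScope& operator=(const TransactionScope&) = delete;
    void Commit() { store_.Commit(); done_ = true; }

private:
    MetadataStoreModel& store_;
    bool done_ = false;
};

// Stages metadata for a batch, applies the data, and commits only on success.
template <typename ApplyData>
bool ApplyBatch(MetadataStoreModel& store,
                const std::map<std::string, std::uint64_t>& batch,
                ApplyData applyToDataStore) {
    TransactionScope transaction(store);
    for (const auto& entry : batch) {
        store.Save(entry.first, entry.second);
    }
    if (!applyToDataStore()) {
        return false;  // the scope guard rolls the metadata back
    }
    transaction.Commit();
    return true;
}
```

The guard keeps the metadata from getting ahead of the data on any early return. It does not protect against a crash between the data update and the metadata commit, which is the gap that distributed transactions would close.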

Change Enumeration and Change Application Services

The metadata storage service provides an efficient implementation of the GetChangeBatch method that Sync Framework requires a provider to implement. This implementation requires that providers have completed a change detection pass (explicit, or implicit if using a notification service) over their data and have ensured that all versions accurately reflect any local changes to the data.

The metadata storage service GetChangeBatch implementation allows providers to exclude changes from the batch via callbacks. For example, a provider used in a backup scenario might never want to synchronize deletes. Such a provider needs to implement the IChangeBatchCallback interface and pass it as a parameter to the metadata storage service GetChangeBatch method. The provider is called for every change that would be added to the change batch, and it can specify whether the change should be excluded.
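
The interplay of versions, destination knowledge, and an exclusion callback can be sketched in a few lines. This is a simplified model, not the service's implementation: ItemModel, KnowledgeModel, ChangeFilter, SkipDeletes, and BuildChangeBatch are hypothetical names, and knowledge is reduced here to the highest tick count known per replica key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical item metadata: an ID, its change version, and a tombstone flag.
struct ItemModel {
    std::string id;
    std::uint32_t replicaKey;
    std::uint64_t tickCount;
    bool deleted;
};

// Simplified knowledge: highest tick count known for each replica key.
using KnowledgeModel = std::map<std::uint32_t, std::uint64_t>;

// Hypothetical callback that lets a provider veto individual changes.
class ChangeFilter {
public:
    virtual ~ChangeFilter() = default;
    virtual bool ShouldExclude(const ItemModel& item) = 0;
};

// Example filter for a backup-style provider that never sends deletes.
class SkipDeletes : public ChangeFilter {
public:
    bool ShouldExclude(const ItemModel& item) override { return item.deleted; }
};

// True when the destination already knows about this version of the item.
inline bool Contains(const KnowledgeModel& knowledge, const ItemModel& item) {
    auto it = knowledge.find(item.replicaKey);
    return it != knowledge.end() && item.tickCount <= it->second;
}

// Collects up to batchSize changes that the destination has not seen
// and that the optional filter does not exclude.
inline std::vector<ItemModel> BuildChangeBatch(const std::vector<ItemModel>& items,
                                               const KnowledgeModel& destinationKnowledge,
                                               std::size_t batchSize,
                                               ChangeFilter* filter) {
    std::vector<ItemModel> batch;
    for (const auto& item : items) {
        if (batch.size() == batchSize) break;
        if (Contains(destinationKnowledge, item)) continue;
        if (filter != nullptr && filter->ShouldExclude(item)) continue;
        batch.push_back(item);
    }
    return batch;
}
```

The model also shows why an accurate change detection pass matters: an item whose version was never advanced after a local edit looks already known to the destination and is silently skipped.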

The metadata storage service also provides support for change application by providing an implementation of the GetItemBatchVersions method that Sync Framework requires a provider to implement.

Thread-Safety

The metadata storage service API itself is not thread-safe and does not provide guarantees around the consistency of concurrent operations on the metadata store. This means that:

  • All method calls on an interface must be serialized. You cannot call multiple methods on the same interface at the same time.
  • You can call these APIs on any thread.

If multi-threaded access to data in the metadata storage service is required, applications and providers will need to use appropriate thread synchronization mechanisms.
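
One simple way to serialize calls is to route every access through a wrapper that holds a mutex for the duration of the call. The sketch below shows the idea with standard C++ primitives. Serialized and TickStoreModel are hypothetical names used only for illustration; a real provider would wrap its own metadata interfaces and might prefer a Windows critical section.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>

// Hypothetical stand-in for an object whose methods are not thread-safe.
class TickStoreModel {
public:
    std::uint64_t NextTick() { return ++tick_; }

private:
    std::uint64_t tick_ = 0;
};

// Owns a T and only exposes it while holding a lock, so calls cannot overlap.
template <typename T>
class Serialized {
public:
    // Runs fn with exclusive access to the wrapped object and returns its result.
    template <typename Fn>
    auto With(Fn fn) -> decltype(fn(std::declval<T&>())) {
        std::lock_guard<std::mutex> lock(mutex_);
        return fn(inner_);
    }

private:
    std::mutex mutex_;
    T inner_;
};
```

A caller on any thread would write, for example, store.With([](TickStoreModel& s) { return s.NextTick(); }). Grouping several calls inside one callback keeps them atomic with respect to other threads.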

Conclusion

The metadata storage service API was designed to make it easier to develop synchronization providers where the data store is not capable of storing the standard metadata required for synchronization. It is already used by other components (such as the file synchronization provider) that ship with Sync Framework, and we hope that it proves to be useful to other provider writers. For the future, we plan on enhancing the metadata storage service to more intrinsically support other Sync Framework features like change units and tombstone cleanup.