Manage Your Data Effectively With The Microsoft Sync Framework
This article discusses:
- Microsoft Sync Framework
- Synchronization Services
- Metadata storage
- Schema synchronization
This article uses the following technologies:
ADO.NET, Microsoft Sync Framework, SQL Server
It's never a trivial task to synchronize data between different data stores. Writing your own logic to perform the synchronization is a time-consuming and costly process. But armed with the Microsoft Sync Framework and its constituent APIs, building synchronization logic becomes much simpler. The Microsoft Sync Framework helps you integrate application data from any data store using any protocol over any network.
Together with the Microsoft Sync Framework, Microsoft released Microsoft Synchronization Services for ADO.NET, Synchronization Services for File Systems, and Synchronization Services for FeedSync. Each of these enables developers to perform synchronization in a specific environment, whereas the Microsoft Sync Framework itself is for building synchronization logic within your application.
These Sync Services provide a set of tools to help you synchronize data between two database sources, synchronize files between machines, and synchronize your application with an RSS or ATOM feed. With the help of these services, you can focus more of your effort on your application logic and less on data synchronization.
The Microsoft Sync Framework is useful in a number of scenarios. Let's take a look at some common ones.
Download-Only Synchronization: In this scenario, the client PC does not have the right to determine what is stored or how the stored data is tracked. An example of this would be RSS Sync. The RSS provider determines how the information is made available and how frequently it gets updated. The client determines only which of the offered content to download and when.
Offline Data Access: Often, you'll need to give some Windows applications offline data access so that the application can continue to run properly if the user is not connected to the network. An example is an inventory application. You would want your remote salespeople to be able to enter sales data on their laptops and then later sync with your database.
Peer-to-Peer Data Collaboration: Sometimes you'll issue local copies of your database to increase performance on the client. When there are several such local copies floating around, you'll need to update your back-end data repository appropriately so that all changes are accounted for. One typical example of this is a point-of-sale (POS) application. As sales happen, the POS application in each store records the sales transactions and updates the store's inventory. These updates must be reflected in both the central data store and in all the local copies of the database that other shops are using so that salespersons in each shop know the status of the inventory on hand.
These data access scenarios represent the most common uses for Microsoft Sync Framework. Let's now take a look at how it works to solve these offline data access problems.
Behind the Microsoft Sync Framework
The Microsoft Sync Framework uses a session to carry out data synchronization. Within a session, you define a data source, from which the changes originate, and a destination, to which the changes will be replicated. When the synchronization is triggered by a client, the following happens:
- A synchronization session is created.
- The Sync Framework checks the destination data to see if there have been changes from any other sources committed since the local copy was downloaded.
- Any changes found are sent back to the source to have them applied.
- The destination will then request that the source send its changes to the destination store.
- The source will send out the changes, and they will be applied to the destination data store.
In order for this to work, both source and destination need to maintain some metadata so they can identify the changes that have happened between synchronizations. There are a number of ways to do this, each of which depends on the type of data we are synchronizing.
Depending on the type of application and synchronization you are trying to implement, you may need to implement custom change tracking because some data does not include any way of tracking changes. Other data, such as files stored in the file system, already contains create and update metadata. Microsoft Sync Framework supports both one-way synchronization and two-way synchronization. Two-way synchronization is actually implemented as two one-way synchronizations between the source and destination.
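To make the role of that metadata concrete, here is a minimal, self-contained sketch of version-based change detection and of two-way synchronization built from two one-way passes. It is a simplified model rather than the Sync Framework API: the Replica, VersionedItem, and SyncSketch types and all of their members are hypothetical names invented for this illustration, and conflict handling is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model for illustration only; not part of the Sync Framework.
public sealed class VersionedItem
{
    public readonly string Id;
    public readonly string Value;
    public readonly string Origin; // replica that made the change
    public readonly long Tick;     // that replica's logical clock at the time

    public VersionedItem(string id, string value, string origin, long tick)
    {
        Id = id; Value = value; Origin = origin; Tick = tick;
    }
}

public sealed class Replica
{
    private long clock;
    public readonly string Name;
    public readonly Dictionary<string, VersionedItem> Items = new Dictionary<string, VersionedItem>();
    // Metadata: the highest tick this replica has seen from each origin.
    public readonly Dictionary<string, long> Knowledge = new Dictionary<string, long>();

    public Replica(string name) { Name = name; }

    // A local change advances the clock and stamps the item with a new version.
    public void LocalWrite(string id, string value)
    {
        clock++;
        Items[id] = new VersionedItem(id, value, Name, clock);
        Knowledge[Name] = clock;
    }

    // True when this replica has already seen the given version.
    public bool Knows(VersionedItem item)
    {
        long seen;
        return Knowledge.TryGetValue(item.Origin, out seen) && item.Tick <= seen;
    }

    public void Learn(string origin, long tick)
    {
        long seen;
        Knowledge.TryGetValue(origin, out seen); // seen is 0 when the origin is unknown
        Knowledge[origin] = Math.Max(seen, tick);
    }
}

public static class SyncSketch
{
    // One-way pass: send only what the destination's metadata says it lacks.
    public static int OneWay(Replica source, Replica destination)
    {
        List<VersionedItem> changes =
            source.Items.Values.Where(i => !destination.Knows(i)).ToList();
        foreach (VersionedItem change in changes)
            destination.Items[change.Id] = change;
        // The destination also absorbs what the source already knew.
        foreach (KeyValuePair<string, long> entry in source.Knowledge)
            destination.Learn(entry.Key, entry.Value);
        return changes.Count;
    }

    // Two-way synchronization expressed as two one-way passes.
    public static void TwoWay(Replica a, Replica b)
    {
        OneWay(a, b);
        OneWay(b, a);
    }
}
```

In this model, if replica A writes one item and replica B writes a different one, a single call to SyncSketch.TwoWay(a, b) leaves both replicas holding both items, and a further one-way pass finds nothing left to send.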
Microsoft Sync Framework uses the SyncOrchestrator class to manage the data flow between two data sources. You don't need to extend this class in any way unless you want to add custom application logic during the synchronization session.
The direction of the synchronization is controlled by the Direction property of the SyncOrchestrator class. You can choose whether a given SyncOrchestrator instance performs an upload, a download, or both.
Providers are used to communicate with source and destination data stores. You need to tell SyncOrchestrator which provider is used before you trigger the synchronization. Once you do trigger it, SyncOrchestrator will call methods in the provider to retrieve changes, save them, and resolve conflicts. Figure 1 illustrates how the synchronization works.
Figure 1 The SyncOrchestrator Object
The Microsoft Sync Framework uses a provider to communicate with both the source and the destination. Each provider is responsible for communicating with the data store that holds the application data, in a manner compatible with that store. It is also responsible for detecting changes based on information provided by the other side of the sync session, for applying changes received, and for retrieving changes from the data source.
Developing a provider can require significant time and effort. To implement a synchronization provider, you need to derive from the KnowledgeSyncProvider abstract class and implement the IChangeDataRetriever and INotifyingChangeApplierTarget interfaces. These classes and interfaces define a number of methods in which you implement the logic for detecting changes and applying them.
You can implement a proxy provider if there is not a direct connection between the client application and the central data stores. The proxy provider receives commands from and relays those commands to other providers in the synchronization session.
Proxy providers are useful in scenarios such as when a client application communicates with a central data store over the Internet. In this case, you might have the client talking to a proxy through HTTPS and then have the proxy talk to SQL Server. This configuration helps to reduce the exposure of internal servers to the Internet but still allows external users to use Sync Framework to synchronize data.
You implement the proxy provider in the same way you implement any other provider. The only difference between the two is that the proxy provider will not be triggered by the client application directly but by another provider.
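The relay idea can be sketched without any Sync Framework types. In the following minimal model, the IChangeEndpoint interface and the InMemoryEndpoint and RelayEndpoint classes are hypothetical names, and a plain method call stands in for the network hop (for example, HTTPS) that a real proxy would cross.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface for illustration only; not a Sync Framework type.
public interface IChangeEndpoint
{
    IList<string> GetChanges();
    void ApplyChanges(IList<string> changes);
}

// Stands in for the provider that actually talks to the central data store.
public sealed class InMemoryEndpoint : IChangeEndpoint
{
    private readonly List<string> pending = new List<string>();
    public readonly List<string> Applied = new List<string>();

    public void Enqueue(string change) { pending.Add(change); }

    // Hands out the pending changes once, then clears them.
    public IList<string> GetChanges()
    {
        List<string> batch = new List<string>(pending);
        pending.Clear();
        return batch;
    }

    public void ApplyChanges(IList<string> changes) { Applied.AddRange(changes); }
}

// The proxy holds no data of its own; it only forwards each call.
public sealed class RelayEndpoint : IChangeEndpoint
{
    private readonly IChangeEndpoint target;
    public int CallsRelayed { get; private set; }

    public RelayEndpoint(IChangeEndpoint target)
    {
        if (target == null) throw new ArgumentNullException("target");
        this.target = target;
    }

    public IList<string> GetChanges()
    {
        CallsRelayed++;
        return target.GetChanges(); // a real proxy would cross the network here
    }

    public void ApplyChanges(IList<string> changes)
    {
        CallsRelayed++;
        target.ApplyChanges(changes);
    }
}
```

Because the relay exposes the same interface as the endpoint it wraps, the caller does not need to know whether it is talking to the store directly or through an intermediary, which is the property the proxy arrangement relies on.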
Metadata Storage Services in the Sync Framework
In some data sources, you cannot store additional information or implement any kind of change tracking. One good example is the Windows file system. In such a case, instead of requiring the developer to build a metadata repository, the Microsoft Sync Framework provides a lightweight data store called Metadata Storage Services.
Stores you create within Metadata Storage Services can contain two types of data, replica-level data and item-level data. They are used to store information regarding a particular destination (a replica) and an item. Each destination needs a replica ID that uniquely identifies the destination.
ReplicaMetadata, ItemMetadata, and MetadataStore are abstract classes that represent replica metadata, item metadata, and a metadata store. You can manipulate your metadata with the help of these classes.
SqlMetadataStore is the concrete implementation of Metadata Storage Services, which comes with Microsoft Sync Framework to help developers manage metadata at replica level. Unfortunately, the Microsoft Sync Framework does not provide an implementation on the item level for metadata support. Instead, you need to create your own implementation of the ItemMetadata class and create MetadataStore objects if you want to store item-level metadata. SqlMetadataStore uses SQL Server Compact Edition to store the metadata in a file you define.
To work with SqlMetadataStore, you either open a saved store using the OpenStore method or create a new one using the CreateStore method. Usually, one store is sufficient for one application. But if you like, you can also maintain multiple stores. To retrieve metadata from a store, you will use the GetReplicaMetadata method.
The following shows how to retrieve replica metadata from a SqlMetadataStore:
// metadataStore is a SqlMetadataStore obtained from OpenStore or CreateStore.
SyncIdFormatGroup idFormat = new SyncIdFormatGroup();
SyncId syncId = new SyncId(1);
ReplicaMetadata item = metadataStore.GetReplicaMetadata(idFormat, syncId);
When using SqlMetadataStore, you need to specify the schema of the data you'll be storing. You can do so by calling the InitializeReplicaMetadata method of the SqlMetadataStore. This method creates and populates the tables to hold information in the underlying SQL database. For example, the following shows how to populate a metadata store with one field:
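SyncIdFormatGroup idFormat = new SyncIdFormatGroup();
SyncId syncId = new SyncId(1);
// One custom item field; the field name "name" and its length are illustrative.
FieldSchema[] fields = { new FieldSchema("name", typeof(string), 256) };
// No custom indexes in this example.
IndexSchema[] indexes = new IndexSchema[0];
ReplicaMetadata item = metadataStore.InitializeReplicaMetadata(idFormat, syncId, fields, indexes);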
Sync Services in Action
Now, let's see how you can add synchronization support to your application using Sync Services for ADO.NET. With the help of SQL Server Compact Edition 3.5, you can create a lightweight data store in local file systems. The good thing about SQL Server Compact Edition is that it works like SQL Server but uses a file as the repository, eliminating the need to modify the application to work with a local data store.
Synchronization Services for ADO.NET supports the synchronization of data and schema between SQL Server databases including SQL Server Compact Edition databases. Here, I will show you how to enable an application that uses a central SQL Server database for offline access via a replica maintained in a SQL Server Compact Edition database.
The first thing to do when you want to implement Synchronization Services for ADO.NET is to decide how to track changes in your data. In Synchronization Services for ADO.NET, you can leverage SQL Server 2008's Integrated Change Tracking or you can use custom tracking by managing change tracking yourself in the application database.
Within SQL Server 2008, tables that are enabled for integrated change tracking will log every data change in a change tracking log table and Sync Services will look at this log to identify all changes since the last synchronization. This keeps the developer from having to change the database schema of an existing application.
But sometimes, as is the case for applications using SQL Server 2005, integrated change tracking may not be available. In such cases, you should consider using custom change tracking. The custom change tracking approach identifies the insertion and update of records in a table with the help of columns and identifies deletions with the help of a tombstone table. To help Sync Services identify rows that have been updated or inserted since the last update, you will need to add four columns (Update Originator, Update Time, Create Originator, and Create Time) to each table that you wish to track. You also need to create a table called a tombstone table to keep a log of all rows that are removed. Within the table, you need at least two pieces of information: the time of deletion and the row's original primary key.
The tombstone table is where Sync Services finds entries that must be removed from the destination store because they have been removed from the source. All of this information helps Sync Services to identify the changes since last synchronization.
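A minimal in-memory sketch shows how tracking columns and a tombstone table together answer the question "what changed since the last synchronization?" It models a table in plain C# rather than in SQL, and the TrackedRow, Tombstone, ChangeSet, and TrackedTable types, along with the integer clock used in place of real timestamps, are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model; a real application keeps these columns in SQL tables.
public sealed class TrackedRow
{
    public int Id;
    public string Name;
    public long CreateTime;
    public string CreateOriginator;
    public long UpdateTime;
    public string UpdateOriginator;
}

public sealed class Tombstone
{
    public int Id;          // primary key of the removed row
    public long DeleteTime; // when it was removed
}

public sealed class ChangeSet
{
    public List<int> Inserts = new List<int>();
    public List<int> Updates = new List<int>();
    public List<int> Deletes = new List<int>();
}

public sealed class TrackedTable
{
    private long clock; // stands in for a timestamp source
    private readonly Dictionary<int, TrackedRow> rows = new Dictionary<int, TrackedRow>();
    private readonly List<Tombstone> tombstones = new List<Tombstone>();

    // The value a client remembers after a sync and passes back next time.
    public long CurrentAnchor { get { return clock; } }

    public void Insert(int id, string name, string originator)
    {
        clock++;
        rows.Add(id, new TrackedRow
        {
            Id = id, Name = name,
            CreateTime = clock, CreateOriginator = originator,
            UpdateTime = clock, UpdateOriginator = originator
        });
    }

    public void Update(int id, string name, string originator)
    {
        clock++;
        TrackedRow row = rows[id];
        row.Name = name;
        row.UpdateTime = clock;
        row.UpdateOriginator = originator;
    }

    // Deleting a row records its key and deletion time in the tombstone list.
    public void Delete(int id)
    {
        clock++;
        if (rows.Remove(id))
            tombstones.Add(new Tombstone { Id = id, DeleteTime = clock });
    }

    public ChangeSet ChangesSince(long lastAnchor)
    {
        ChangeSet changes = new ChangeSet();
        foreach (TrackedRow row in rows.Values.OrderBy(r => r.Id))
        {
            if (row.CreateTime > lastAnchor) changes.Inserts.Add(row.Id);
            else if (row.UpdateTime > lastAnchor) changes.Updates.Add(row.Id);
        }
        changes.Deletes.AddRange(
            tombstones.Where(t => t.DeleteTime > lastAnchor).Select(t => t.Id));
        return changes;
    }
}
```

For example, if rows 1 and 2 exist when the anchor is recorded, and row 1 is then updated, row 3 inserted, and row 2 deleted, ChangesSince reports row 3 as an insert, row 1 as an update, and row 2 as a delete. Without the tombstone entry, the deletion of row 2 would be invisible, because the row itself no longer exists to carry any tracking columns.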
Synchronizing Data and Schemas
When implementing synchronization between databases, keeping schemas consistent becomes an issue for most applications. As such, Sync Services for ADO.NET also supports synchronizing database schemas. There is a limitation though. Sync Services will create the table in the client database only the first time it performs the sync unless the developer configures it to recreate it every time it synchronizes.
To configure your providers to use Sync Services for ADO.NET, you need to implement the code in the provider to configure the tables to be synchronized and the direction in which you want the synchronization to run. You can choose to synchronize data from the source to the destination, from the destination to the source, or both. You can place the code to configure the synchronization in the constructor of the provider class. You can also separate tables into groups so that you can have more fine-grained control over the individual tables to be synchronized.
With the destination, most of the time you won't really need to explicitly configure the provider because Sync Services for ADO.NET will automatically create the appropriate schemas and load the data. You simply need to specify the location of the destination database. Sync Services for ADO.NET comes with two providers: one is designed to work with SQL Server and one is designed to work with SQL Server Compact Edition. The only real difference is in the underlying library used to communicate with the database engine; the SQL Server provider uses System.Data.SqlClient while the provider for SQL Server Compact Edition uses System.Data.SqlServerCe.
The provider used for communicating with SQL Server is called DbServerSyncProvider, while the one designed for SQL Server Compact Edition is SqlCeClientSyncProvider. When you initialize the provider, you will need a SyncAdapter object. The SyncAdapter contains the information about how the provider will communicate with the underlying SQL Server database, along with the details of change tracking, such as which column is the change originator column and whether custom SQL change tracking is used. Another important piece of information defined in the SyncAdapter is the direction of the synchronization; Sync Services for ADO.NET lets developers define the direction on a per-table basis. Sync Services for ADO.NET also comes with a SqlSyncAdapterBuilder class that creates the required SyncAdapter object and its SQL statements from information you supply, such as the table name, the name of the updater column, the create originator column, and so on. Figure 2 illustrates how the adapter builder helps you create a SyncAdapter.
Figure 2 SyncAdapterBuilder
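// sqlconn is a SqlConnection to the server database, defined elsewhere.
Microsoft.Synchronization.Data.Server.SqlSyncAdapterBuilder sqlSABuilder = new Microsoft.Synchronization.Data.Server.SqlSyncAdapterBuilder(sqlconn);
sqlSABuilder.TableName = "User";
// Restrict the rows to the current user. Concatenating a value into SQL is shown for brevity only; a parameterized filter is safer.
sqlSABuilder.FilterClause = "username='" + System.Threading.Thread.CurrentPrincipal.Identity.Name + "' ";
// Deleted rows are logged in a tombstone table named after the base table.
sqlSABuilder.TombstoneTableName = "Deleted_" + sqlSABuilder.TableName;
sqlSABuilder.SyncDirection = Microsoft.Synchronization.Data.SyncDirection.Bidirectional;
// Columns used for custom change tracking.
sqlSABuilder.CreationTrackingColumn = "ct";
sqlSABuilder.UpdateTrackingColumn = "ut";
sqlSABuilder.DeletionTrackingColumn = "dt";
Microsoft.Synchronization.Data.Server.SyncAdapter sqlSA = sqlSABuilder.ToSyncAdapter(true, true, true, true);
sqlSA.TableName = "User";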
Having said that, most of the time you just need to add logic for populating SyncAdapters on the source provider. You'll need to do so on both providers if you are doing a bidirectional synchronization.
To set up database synchronization, create an instance of the SyncAgent object, which is similar to the SyncOrchestrator object I mentioned earlier but designed to carry out synchronization for Sync Services for ADO.NET. Then assign providers to the RemoteProvider and LocalProvider properties of the object. Next, call the Synchronize method of the SyncAgent, which will trigger the synchronization process and call the respective events and methods that are implemented by the providers. The following illustrates how to configure SyncAgent and start synchronization.
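SyncAgent syncAgent = new SyncAgent();
syncAgent.LocalProvider = new SqlCeClientSyncProvider();
syncAgent.RemoteProvider = new DbServerSyncProvider();
// Start the synchronization; the returned SyncStatistics summarizes the session.
SyncStatistics stats = syncAgent.Synchronize();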
One point to note about Sync Services for ADO.NET is that it's possible to use it with almost any data source supported by ADO.NET, so you're not limited to SQL Server as the source and/or destination database.
Sync Services for ADO.NET uses the SyncTable object to store information about the tables involved in the synchronization. You can use the SyncGroup class to aggregate SyncTables into a synchronization group that Sync Services for ADO.NET will process as a whole, ensuring consistency for all tables in the group. The following code demonstrates how to populate the SyncTable object and configure it with SyncAgent.
SyncGroup lookupSyncGroup = new SyncGroup("LookupTables");
SyncTable syncTable = new SyncTable("Customer");
syncTable.CreationOption = TableCreationOption.DropExistingOrCreateNewTable;
syncTable.SyncDirection = SyncDirection.Bidirectional;
syncTable.SyncGroup = lookupSyncGroup;
// Register the table with the agent (syncAgent is the SyncAgent created earlier).
syncAgent.Configuration.SyncTables.Add(syncTable);
Implementing Custom Synchronization
What if you want to implement synchronization using your own logic or you want to implement synchronization for data stores that are not supported by the Microsoft Sync Framework? In this case, you need to create your own provider and implement logic for retrieving data from the underlying data store.
To do this, create a class that derives from the abstract class KnowledgeSyncProvider and override the GetChangeBatch method to retrieve the changes. This method is triggered by the SyncOrchestrator object when you initiate the synchronization process. All detected changes are stored in an object called ChangeBatch, which is passed to the ProcessChangeBatch method of the destination provider. The destination provider will process the ChangeBatch object and use its own logic to apply the changes to the destination data store.
The FilterInfo object is used to communicate filtering information between source and destination providers. The source provider is responsible for populating this object and attaching it to the ChangeBatch object. The destination provider can then use it to apply changes to only an appropriate subset of data. Once the custom provider is implemented, you can use it with SyncOrchestrator. Keep in mind, however, that synchronization providers are not thread-safe, so you need to use cloned providers in new threads if you want to have multiple synchronizations running on different threads. To do this, create a new provider object and merge it with the existing one by using the Union method. Then configure SyncOrchestrator to use the newly created provider instead of the original one. Otherwise, the synchronization will fail if more than one running synchronization session is associated with that provider.
Conflict resolution is always problematic when implementing synchronization logic in applications. The Microsoft Sync Framework will determine if there is a conflict on the item using the following logic.
1. The destination provider determines whether the destination replica's version information for the item is contained in the source replica's knowledge.
2. If the destination replica's version is not contained in the source replica's knowledge, the object is said to be in conflict.
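The two steps above amount to a containment test, which can be sketched as follows. This is a simplified model, not the Sync Framework's own types: ItemVersion, ReplicaKnowledge, and ConflictSketch are hypothetical names, and knowledge is reduced here to the highest change counter seen from each replica.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model for illustration only; not the Sync Framework's types.
public struct ItemVersion
{
    public readonly string ReplicaId; // replica that made the change
    public readonly long Tick;        // that replica's change counter

    public ItemVersion(string replicaId, long tick)
    {
        ReplicaId = replicaId;
        Tick = tick;
    }
}

public sealed class ReplicaKnowledge
{
    private readonly Dictionary<string, long> highestTick = new Dictionary<string, long>();

    // Record that this replica has seen the given version.
    public void Learn(ItemVersion version)
    {
        long seen;
        highestTick.TryGetValue(version.ReplicaId, out seen); // 0 when unknown
        highestTick[version.ReplicaId] = Math.Max(seen, version.Tick);
    }

    // Step 1: is the version contained in this knowledge?
    public bool Contains(ItemVersion version)
    {
        long seen;
        return highestTick.TryGetValue(version.ReplicaId, out seen) && version.Tick <= seen;
    }
}

public static class ConflictSketch
{
    // Step 2: the item is in conflict when the source has never seen
    // the version currently held by the destination.
    public static bool IsConflict(ItemVersion destinationVersion, ReplicaKnowledge sourceKnowledge)
    {
        return !sourceKnowledge.Contains(destinationVersion);
    }
}
```

The intuition is that if the source already knew about the destination's version when it made its own change, the source's change supersedes it. If the source never saw that version, both sides changed the item independently, and a policy has to decide the outcome.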
If the destination version information is not included in the item itself, the Sync Framework will call the TryGetDestinationVersion method of the class that implements the INotifyingChangeApplierTarget interface for each item. You can then add logic in this method to provide the versioning information to the Sync Framework to aid in conflict resolution.
public bool TryGetDestinationVersion(ItemChange sourceChange, out ItemChange destinationVersion)
Microsoft Sync Framework supports some simple conflict resolution by default. You can configure the conflict resolution policy of the provider and the Sync Framework will perform conflict resolution for you. By default, the Sync Framework supports the following five types of conflict resolution policy.
- Source wins: the source replica always wins if there is a conflict.
- Destination wins: the destination replica always wins if there is a conflict.
- Merge: the changes from both source and destination are merged to form a new version of the item, which is then sent to the destination to be applied.
- Log: conflicts are ignored and the conflicting item information is sent to the SaveConflict method of the class implementing INotifyingChangeApplierTarget.
- Defer: completely ignore the conflicts and the destination store will not receive information regarding the conflicts.
All of these conflict resolution policies are applicable at the item level, but only source wins and destination wins apply at the session level. The policy is specified for the provider by setting the ConflictResolutionPolicy property.
destinationProvider.Configuration.ConflictResolutionPolicy = ConflictResolutionPolicy.DestinationWins;
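The effect of each policy on a single conflicting item can be illustrated with a small, self-contained sketch. The PolicySketch enum and ResolveSketch class are hypothetical and are not the framework's ConflictResolutionPolicy type; strings stand in for item data, and the method returns the value the destination store ends up holding.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names for illustration; not the Sync Framework enumeration.
public enum PolicySketch { SourceWins, DestinationWins, Merge, Log, Defer }

public static class ResolveSketch
{
    // Returns the value the destination holds after the conflict is handled.
    public static string Resolve(
        PolicySketch policy,
        string sourceValue,
        string destinationValue,
        Func<string, string, string> merge,
        IList<string> conflictLog)
    {
        switch (policy)
        {
            case PolicySketch.SourceWins:
                return sourceValue;
            case PolicySketch.DestinationWins:
                return destinationValue;
            case PolicySketch.Merge:
                // Application-supplied logic combines both versions.
                return merge(sourceValue, destinationValue);
            case PolicySketch.Log:
                // Record the conflict and leave the destination untouched.
                conflictLog.Add(sourceValue + " <> " + destinationValue);
                return destinationValue;
            case PolicySketch.Defer:
                // Skip the change entirely; nothing is recorded.
                return destinationValue;
            default:
                throw new ArgumentOutOfRangeException("policy");
        }
    }
}
```

Note that in this model Log and Defer both leave the destination value as it was; the only difference is whether a record of the conflict is kept for later handling.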
You can also customize how the provider reports the progress of the synchronization. By default, the provider reports progress by raising the ProgressChanged event of the associated SyncCallbacks object. You can change this behavior by implementing your own progress reporting. To do this, supply work estimates at the item level using the WorkEstimate property of the ItemChange class, or at the batch level using the BatchWorkEstimate property of the ChangeBatch class. Then, within your provider code, you can call the OnProgressChanged method of the provider to report the change whenever you think that is appropriate.
The Microsoft Sync Framework is an application programming framework designed to help build synchronization into applications. While the framework itself includes a number of features and libraries that simplify the development work of building synchronization between different data sources, it does not really perform the synchronization: it relies on developers to provide the logic for detecting and applying changes and resolving conflicts.
Sync Services is a set of libraries created to help address the need for synchronization among some commonly used data stores, such as SQL Server, the file system, and RSS. With Sync Services, you do not need to spend time developing the logic for the synchronization; you simply need to adapt the library for your application. The result is fast, easy synchronization. By adopting the Microsoft Sync Framework, you can easily bring support for offline data access and collaboration to your own applications.
James Yip, MCT, MCITP, MCPD, PMP, is Managing Consultant of Eventus Limited, a Hong Kong–based technologies consulting company. James works as an architect and project manager. He also teaches MOC classes in both Windows Server Systems Administration and .NET development, works as an SME and TR for MOC courses, and authors courseware.