Introduction to Microsoft Sync Framework
Microsoft Sync Framework is a comprehensive synchronization platform that enables collaboration and offline access for applications, services, and devices. Developers can build synchronization ecosystems that integrate any application and any data from any store, using any protocol over any network. Sync Framework features technologies and tools that enable roaming, sharing, and taking data offline.
A key aspect of Sync Framework is the ability to create custom providers. Providers enable any data source to participate in the Sync Framework synchronization process, allowing peer-to-peer synchronization to occur.
Sync Framework includes a number of providers that support many common data sources. Although their use is not required, developers are encouraged to use these providers wherever possible to minimize development effort. The following are the providers included:
Developers can ultimately use any of the out-of-the-box providers or can create custom providers to exchange information between devices and applications.
The goal of this document is to help you understand how Microsoft Sync Framework enables synchronization. In this document we will outline some key concepts that will form the basis for how to create a provider. For more detailed information on Microsoft Sync Framework please see http://msdn.microsoft.com/sync.
Before discussing the specific components of a provider, we first need to understand the different types of participants that can be supported. A participant is the location where information from the data source is retrieved. A participant could be anything from a web service, to a laptop, to a USB thumb drive.
Based on the capabilities of the device, the way that a provider integrates synchronization will vary. At the very least, we will assume that the device is capable of programmatically returning information when requested. Ultimately, what needs to be determined is if the device can:
It is important to distinguish the types of participants that will be part of the synchronization ecosystem, because the type tells us whether a participant can store the state information required by the provider and whether the provider can be executed directly from the device. Ultimately, the participant model is meant to be generic. As such, a full participant could be configured to be either a partial or simple participant.
Full participants are devices that allow developers to create applications and new data stores directly on the device. Laptops and smartphones are examples of full participants because new applications can be executed directly from the device and new data stores can be created to persist information if required.
Partial participants are devices that have the ability to store data either in the existing data store or another data store on the device. These devices, however, do not have the ability to launch executables directly from the device. Some examples of these participants are thumb drives or SD Cards. These devices act like a hard drive where information can be created, updated or deleted. However, they do not typically give an interface that allows applications to be executed on them directly.
Simple participants are devices that are only capable of providing information when requested. These devices cannot store or manipulate new data and are unable to support the creation of new applications. RSS feeds and web services provided by an external organization such as Amazon or eBay are both examples of simple participants. These organizations may let you call their web services and get results back, but they do not let you create your own data stores or run your own applications on their web servers.
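The three participant types above can be summarized as a decision on two capabilities: whether the device can store new data, and whether it can run applications. The following is a minimal sketch of that decision, not part of the Sync Framework API. The names ParticipantType and classify_participant are hypothetical, and treating a device that can run applications but cannot store data as a simple participant is an assumption, since the text does not describe that case.

```python
from enum import Enum


class ParticipantType(Enum):
    FULL = "full"        # can store data and run applications
    PARTIAL = "partial"  # can store data but cannot run applications
    SIMPLE = "simple"    # can only return information when requested


def classify_participant(can_store_data: bool, can_run_applications: bool) -> ParticipantType:
    """Map the two capabilities described in the text to a participant type."""
    if can_store_data and can_run_applications:
        return ParticipantType.FULL
    if can_store_data:
        return ParticipantType.PARTIAL
    # Assumption: anything that cannot store data is treated as simple.
    return ParticipantType.SIMPLE


laptop = classify_participant(can_store_data=True, can_run_applications=True)
thumb_drive = classify_participant(can_store_data=True, can_run_applications=False)
rss_feed = classify_participant(can_store_data=False, can_run_applications=False)
```

Under these assumptions, the laptop is classified as a full participant, the thumb drive as a partial participant, and the RSS feed as a simple participant.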
Bringing it All Together
Ultimately the goal of Microsoft Sync Framework is to allow any data source to be integrated regardless of the participant type. For this reason, partial participants can synchronize information with full participants and full participants can synchronize information with simple participants. At the very least there needs to be one full participant that has the ability to store information and launch the synchronization process.
Microsoft Synchronization Framework
Before implementing synchronization using Sync Framework, we need to first understand the key components of a provider. The following diagram shows how a provider built using Sync Framework communicates with a data source and retrieves state information from a metadata store. These providers in turn communicate with other providers through a synchronization session.
The data source is the location where all information which needs to be synchronized is stored. A data source could be a relational database, a file, a Web Service or even a custom data source included within a line of business application. As long as you can programmatically access the data, it can participate in synchronization.
A fundamental component of a provider is the ability to store information about the data store and the objects within that data store with respect to state and change information. Metadata can be stored in a file, within a database or within the data source being synchronized. As an optional convenience, Sync Framework offers a complete implementation of a metadata store built on a lightweight database that runs in your process. The metadata for a data store can be broken down into five key components:
For each item that is being synchronized, a small amount of information is stored that describes where and when the item was changed. This metadata is composed of two versions: a creation version and an update version. A version is composed of two components: a tick count assigned by the data store and the replica ID for the data store. As items are updated, the current tick count is applied to that item and the tick count is incremented by the data store. The replica ID is a unique value that identifies a particular data store. The creation version is the same as the update version when the item is created. Subsequent updates to the item modify the update version.
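The versioning scheme described above can be sketched in a few lines. This is an illustration only, not the Sync Framework metadata store. The class names Version, ItemMetadata, and Replica are hypothetical, and starting the tick count at 1 is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Version:
    replica_id: str  # identifies the data store that made the change
    tick: int        # tick count assigned by that data store


@dataclass
class ItemMetadata:
    item_id: str
    creation_version: Version
    update_version: Version


class Replica:
    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.tick_count = 1  # assumption: ticks start at 1
        self.items: Dict[str, ItemMetadata] = {}

    def _stamp(self) -> Version:
        # Apply the current tick count, then increment it for the next change.
        version = Version(self.replica_id, self.tick_count)
        self.tick_count += 1
        return version

    def create_item(self, item_id: str) -> ItemMetadata:
        version = self._stamp()
        # On creation, the creation and update versions are the same.
        self.items[item_id] = ItemMetadata(item_id, version, version)
        return self.items[item_id]

    def update_item(self, item_id: str) -> ItemMetadata:
        # Subsequent updates modify only the update version.
        self.items[item_id].update_version = self._stamp()
        return self.items[item_id]


replica_a = Replica("A")
item = replica_a.create_item("I1")
replica_a.update_item("I1")
```

After these calls, the creation version of I1 is still the version it was given when created, while its update version carries the later tick count.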
The two primary ways that versioning can be implemented are:
All change-tracking must occur at least at the level of items. In other words, every item must have an independent version. In the case of a database, an item might be the entire row within a table. Alternatively, an item might be a column within a row of a table. In the case of file synchronization an item will likely be the file. More granular tracking is highly desirable in some scenarios as it reduces the potential for data conflicts (two users updating the same item on different replicas). The downside is that it increases the amount of change-tracking information stored.
Another key concept that we need to discuss is the notion of knowledge. Knowledge is a compact representation of the changes that the replica is aware of. As version information is updated, so is the knowledge for the data store. Providers use replica knowledge to:
Each replica must also maintain tombstone information for each of the items that are deleted. This is important because, without a tombstone, an item that is no longer there at synchronization time gives the provider no way of telling that it has been deleted, so the deletion cannot be propagated to other providers. A tombstone must contain the following information:
Because the number of tombstones will grow over time, it may be prudent to create a process to clean up this store after a period of time in order to save space. Support for managing tombstone information is provided with Sync Framework.
The replica where synchronization is initiated is called the source, and the replica it connects to is called the destination. The following sections outline the flow of synchronization described in the following diagram. For bidirectional synchronization, this process is executed twice, with source and destination swapped on the second iteration.
Synchronization Session Initiated with Destination
During this phase, the source provider initiates communication to the destination provider. The link between the two providers is called a synchronization session.
Destination Prepares and Sends Knowledge
As discussed previously, each replica stores its own unique knowledge. The knowledge stored in the destination is passed on to the source.
Destination Knowledge used to Determine Changes to be sent
On the source side, the knowledge that was just received is compared to the local item versions to determine the items that the destination does not know about. It is important to note that the versions that are sent are not the actual items but a summary of where the last change was made to each item.
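The change enumeration step can be sketched as follows. This is a simplified model, not the Sync Framework implementation: knowledge is assumed to be a map from replica ID to the highest tick count known for that replica, and the item versions shown are hypothetical values chosen for illustration. The function names are also hypothetical.

```python
from typing import Dict, List, Tuple

Version = Tuple[str, int]   # (replica ID, tick count)
Knowledge = Dict[str, int]  # replica ID -> highest tick count known (simplified)


def knowledge_contains(knowledge: Knowledge, version: Version) -> bool:
    """Return True if the knowledge already covers the given version."""
    replica_id, tick = version
    return knowledge.get(replica_id, 0) >= tick


def enumerate_changes(
    local_versions: Dict[str, Version], destination_knowledge: Knowledge
) -> List[Tuple[str, Version]]:
    """List the (item ID, version) pairs the destination does not know about.

    Only versions are returned, not the item data itself.
    """
    return [
        (item_id, version)
        for item_id, version in sorted(local_versions.items())
        if not knowledge_contains(destination_knowledge, version)
    ]


# Hypothetical update versions held by the source replica.
source_versions = {"I1": ("A", 2), "I2": ("A", 3), "I3": ("A", 5)}

# A destination that has never seen replica A needs every version.
all_changes = enumerate_changes(source_versions, {"B": 4})

# A destination that already knows A up to tick 3 needs only the later change.
later_changes = enumerate_changes(source_versions, {"A": 3, "B": 4})
```

The design point is that the source never has to remember what it sent previously. The destination's knowledge alone determines which versions are missing.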
Change Versions and Source Knowledge sent to Destination
Once the source has prepared the list of change versions required, they are transported to the destination.
Local Version Retrieved for Change Items and Compared against Source Version and Knowledge
The destination uses the versions to prepare a list of items that the source needs to send. The destination also uses this information to detect if there are any constraint conflicts.
Conflicts are Detected and Resolved or Deferred
A conflict is detected when neither replica's knowledge contains the other replica's change version for an item. Fundamentally, a conflict occurs if a change is made to the same item on two replicas between synchronization sessions.
Conflicts specifically occur when the source knowledge does not contain the destination version for an item (it is implied that the destination knowledge does not contain any of the source versions sent).
If the version is contained in the destination's knowledge then the change is considered obsolete.
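The rules above for obsolete changes and conflicts can be expressed as a small decision function. This is a sketch under the same simplifying assumption that knowledge is a map from replica ID to the highest known tick count. The function names and the string results are hypothetical and are not taken from the Sync Framework API.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[str, int]   # (replica ID, tick count)
Knowledge = Dict[str, int]  # replica ID -> highest tick count known (simplified)


def contains(knowledge: Knowledge, version: Version) -> bool:
    replica_id, tick = version
    return knowledge.get(replica_id, 0) >= tick


def classify_change(
    source_version: Version,
    source_knowledge: Knowledge,
    destination_version: Optional[Version],
    destination_knowledge: Knowledge,
) -> str:
    """Decide what the destination should do with one incoming change."""
    # The destination already knows about this change, so it is obsolete.
    if contains(destination_knowledge, source_version):
        return "obsolete"
    # The source made its change without knowing the destination's version.
    if destination_version is not None and not contains(source_knowledge, destination_version):
        return "conflict"
    # Otherwise the source change supersedes what the destination has.
    return "apply"


conflict_case = classify_change(("A", 3), {"A": 3}, ("B", 2), {"B": 2})
obsolete_case = classify_change(("A", 2), {"A": 3}, ("A", 2), {"A": 3, "B": 1})
apply_case = classify_change(("A", 3), {"A": 3, "B": 2}, ("B", 2), {"A": 1, "B": 2})
```

A destination_version of None stands for an item that does not yet exist on the destination, which this sketch always treats as a change to apply.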
Replicas are free to implement a variety of policies for the resolution of items in conflict across the synchronization community. Below are some examples of commonly used resolution policies:
During this phase the destination has determined which items in the source need to be retrieved and communicates this request to the source.
Source Prepares and Sends Item Data
The source takes the item data request and prepares the actual data to be transferred to the destination. If the item being tracked is a row in a database, that row will be sent. If the item is a file in a folder then the file will be transferred.
Items are applied at Destination
The items received are taken and applied at the destination. If there are any errors during this process, such as network failure, the items will be tagged as exceptions and corrected during the next synchronization. The knowledge received from the source is added to the destination knowledge.
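Adding the source knowledge to the destination knowledge can be pictured as a per-replica merge. The sketch below assumes the simplified model in which knowledge maps each replica ID to the highest tick count known, and it assumes every item was applied successfully. It does not model the exceptions mentioned above. The name merge_knowledge is hypothetical.

```python
from typing import Dict

Knowledge = Dict[str, int]  # replica ID -> highest tick count known (simplified)


def merge_knowledge(destination: Knowledge, source: Knowledge) -> Knowledge:
    """Return knowledge covering everything either replica knew."""
    merged = dict(destination)  # copy so the inputs are left unchanged
    for replica_id, tick in source.items():
        merged[replica_id] = max(merged.get(replica_id, 0), tick)
    return merged


# Destination knew only its own changes; source knew only its own.
first_sync = merge_knowledge({"B": 4}, {"A": 5})

# Both replicas already know about each other; the higher tick wins per replica.
later_sync = merge_knowledge({"A": 5, "B": 5}, {"A": 6, "B": 4})
```

Because the merge keeps the highest tick count per replica, applying it twice with the same inputs gives the same result.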
Using the synchronization flow described in the previous section, we will walk through an example of how Sync Framework enumerates changes and ultimately applies item data.
In this example there will be two replicas: Replica A and Replica B. Replica A will initiate synchronization to Replica B (meaning Replica A is the source and Replica B is the destination).
For example, imagine we wanted to synchronize files between these two replicas. A single file in a folder will be the item that is tracked, and it is denoted as In (for example, I1, I2, I3…). When a new file (I1) is created, the metadata associated with that file also needs to be updated as follows:
If that file was updated again, the version table might look as follows:
There are likely multiple files being tracked so let's bring in multiple items. As you can see the version information grows as more and more items are created. Sync Framework does not require previous update versions to be stored. It only needs to know of the most recent update version.
If we take the current state of the items for this replica, we would represent the knowledge of Replica A as:
Replica A Knowledge = A5
A is the replica ID and 5 is the current tick count up to which this replica knows of changes.
On Replica B, there may also be a number of files. This replica looks as follows:
The current knowledge for Replica B is:
At this point we choose to initiate synchronization between the two replicas. Replica A will be the source (the one initiating synchronization) and Replica B will be the destination.
During synchronization the destination sends the source its knowledge. As mentioned earlier, the knowledge for the two replicas looks as follows:
The source (Replica A) receives this knowledge and uses it to determine which versions to send to the destination. Since Replica B is not aware of any of the items in Replica A, the entire contents of Replica A are sent. In this case, it would include the following versions.
The destination receives these versions and enumerates through them to determine which items are to be requested from the source. It also uses this information to determine if there are any conflicts (for example, the same file was updated on both replicas).
Once that is complete the destination requests the source to send the items it is not aware of. In this case, Replica A would send the files that are associated with I1, I2 and I3.
The destination receives these files and adds them to its folder.
At the end of this synchronization session, the process is executed one more time, but this time the source becomes the destination and the destination becomes the source. This allows Replica A to receive any of the files that were created or changed on Replica B (I104 and I105).
At the end of synchronization both replicas have the following updated knowledge.
Extending the previous example, the two replicas are currently synchronized and the item versions look as follows:
Similarly, the knowledge for both replicas is as follows:
At this point both of the replicas decide to update the same file (item I2).
On Replica A the item version table is updated to:
On Replica B the item version table is updated to:
The knowledge for both replicas is also updated to:
Replica A Knowledge = A6, B4
Replica B Knowledge = A5, B5
At this point Replica A initiates synchronization with Replica B. Skipping to the stage where the source sends the item versions and knowledge to the destination, the following steps are performed for item I2.
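The check for item I2 can be sketched with the knowledge values given above. Knowledge is modeled here as a map from replica ID to the highest tick count known, which is a simplification. The update versions for I2 (A6 on Replica A and B5 on Replica B) are assumptions inferred from the knowledge values, and the helper name contains is hypothetical.

```python
# Knowledge values taken from the text.
replica_a_knowledge = {"A": 6, "B": 4}  # source
replica_b_knowledge = {"A": 5, "B": 5}  # destination

# Assumed update versions of item I2 on each replica (inferred, not stated).
source_version_i2 = ("A", 6)
destination_version_i2 = ("B", 5)


def contains(knowledge, version):
    """Return True if the knowledge covers the (replica ID, tick) version."""
    replica_id, tick = version
    return knowledge.get(replica_id, 0) >= tick


# Does the destination already know about the source's change to I2?
destination_knows_source_change = contains(replica_b_knowledge, source_version_i2)

# Did the source know about the destination's change when it made its own?
source_knows_destination_change = contains(replica_a_knowledge, destination_version_i2)

# Neither side knew of the other's change, so the item is in conflict.
is_conflict = not destination_knows_source_change and not source_knows_destination_change
print("I2 in conflict:", is_conflict)
```

Under these assumptions, neither replica's knowledge covers the other's change to I2, so the item is flagged as a conflict and handled by whichever resolution policy the replicas have chosen.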
Microsoft Sync Framework includes everything required to integrate applications into an offline or collaboration-based network, whether by using the included providers or by writing new custom providers. Providers enable any data source to participate in data synchronization regardless of network or device type.
For more information or to get a copy of the Sync Framework SDK, please visit http://msdn.microsoft.com/sync.