How Merge Replication Works
Merge replication is implemented by the Snapshot Agent and Merge Agent. The Snapshot Agent prepares snapshot files containing schema and data of published tables, stores the files in the snapshot folder, and inserts synchronization jobs in the publication database. The Snapshot Agent also creates replication-specific stored procedures, triggers, and system tables.
The Merge Agent applies the initial snapshot jobs held in the publication database tables to the Subscriber. It also merges incremental data changes that occurred at the Publisher or Subscribers after the initial snapshot was created, and reconciles conflicts according to rules you configure or a custom resolver you create.
The role of the Distributor is very limited in merge replication, so implementing the Distributor locally (on the same server as the Publisher) is very common. The Distribution Agent is not used at all during merge replication, and the distribution database on the Distributor stores history and miscellaneous information about merge replication.
Microsoft® SQL Server™ 2000 identifies a unique column for each row in the table being replicated. This allows the row to be identified uniquely across multiple copies of the table. If the table already contains a column with the ROWGUIDCOL property that has a unique index or primary key constraint, SQL Server will use that column automatically as the row identifier for the publishing table.
Otherwise, SQL Server adds a uniqueidentifier column, titled rowguid, which has the ROWGUIDCOL property and an index, to the publishing table. Adding the rowguid column increases the size of the publishing table. The rowguid column and the index are added to the publishing table the first time the Snapshot Agent executes for the publication.
SQL Server then installs triggers that track changes to the data in each row or each column. The triggers capture changes made to the publishing table and record the changes in merge system tables. Tracking triggers on the publishing tables are created while the Snapshot Agent for the publication runs for the first time. Triggers are created at the Subscriber when the snapshot is applied at the Subscriber.
Different triggers are generated for articles that track changes at the row level or the column level. Because SQL Server supports multiple triggers of the same type on the publishing table, merge replication triggers do not interfere with application-defined triggers.
The Snapshot Agent also creates custom stored procedures that update the subscription database. There is one custom stored procedure for INSERT statements, one for UPDATE statements, and one for DELETE statements. When data is updated and the new records need to be entered in the subscription database, the custom stored procedures are used rather than individual INSERT, UPDATE, and DELETE statements. For more information, see Using Custom Stored Procedures in Articles.
SQL Server then adds several system tables to the database to support data tracking, efficient synchronization, and conflict detection, resolution and reporting. For every changed or created row, the table MSmerge_contents contains the generation in which the most recent modification occurred. It also contains the version of the row as a whole and every attribute of the row. MSmerge_tombstone stores DELETEs to the data within a publication. These tables use the rowguid column to join to the publishing table.
The generation column in these tables acts as a logical clock indicating when a row was last updated at a given site. Actual datetime values are not used for marking when changes occur, or deciding conflicts, and there is no dependence on synchronized clocks between sites. This makes the conflict detection and resolution algorithms more resilient to time zone differences and differences between physical clocks on multiple servers. At a given site, the generation numbers correspond to the order in which changes were performed by the Merge Agent or by a user at that site.
MSmerge_genhistory and MSmerge_replinfo allow SQL Server to determine the generations that need to be sent with each merge.
There are several tracking columns added to a merge publication table. If your publishing table has column names reserved for merge processing, you will not be able to generate an initial snapshot because of duplicate column names. Reserved column names are:
Initial Snapshot and the Snapshot Agent
Before a new Subscriber can receive incremental changes from a Publisher, the Subscriber must contain tables with the same schema and data as the tables at the Publisher. Copying the complete current publication from the Publisher to the Subscriber is called applying the initial snapshot. SQL Server will create and apply the snapshot for you, or you can choose to apply the snapshot manually. For more information, see Applying the Initial Snapshot.
Even when creating a subscription for which the snapshot is not applied automatically (sometimes referred to as a nosync subscription), portions of the snapshot are still applied. The necessary tracking triggers and tables are created at the Subscriber, which means that you still need to create and apply a snapshot even when subscriptions specify that the snapshot will not be applied automatically.
Replication of changed data occurs only after merge replication ensures that the Subscriber has the most recent snapshot of the table schema and data that has been generated. When snapshots are distributed and applied to Subscribers, only those Subscribers needing initial snapshots are affected. Subscribers that are already receiving INSERTs, UPDATEs, DELETEs, or other modifications to the published data are unaffected unless the subscription is marked for reinitialization or the publication is marked for a reintialization, in which case all subscriptions corresponding to a given publication are reintialized during the next merge process.
A subscription table can subscribe only to one merge publication at a time. For example, suppose you publish the Customers table in two publications, and then you subscribe to both publications from one Subscriber, indicating the same subscription database will receive data from both publications. One of the Merge Agents will fail during the initial synchronization.
The initial snapshot can be an attached subscription database in snapshot replication, transactional replication, and merge replication. If you use an attachable subscription database, a subscription database and its subscriptions will be copied and you can apply them at another Subscriber. For more information, see Attachable Subscription Databases.
The Snapshot Agent implements the initial snapshot in merge replication using similar steps to the Snapshot Agent in snapshot replication. For more information, see Snapshot Replication .
After the snapshot files have been generated, you can view them in the Snapshot Folder using the Snapshot Explorer. In SQL Server Enterprise Manager, expand the Replication and Publications folders, right-click a publication, and then click Explore the Latest Snapshot Folder. For more information, see Exploring Snapshots.
Dynamic snapshots provide a performance advantage when applying the snapshot of a merge publication with dynamic filters. By using SQL Server 2000 bulk copy programming files to apply data to a specific Subscriber instead of a series of INSERT statements, you will improve the performance of applying the initial snapshot for dynamically filtered merge publications.
For more information, see Dynamic Snapshots.
After the initial snapshot has been applied to a Subscriber, SQL Server triggers will begin tracking INSERT, UPDATE and DELETE statements made at the Publisher and at Subscribers.
Every table that participates in merge replication is assigned a generation slot in the MSmerge_articles table. When a row is updated in a merge publication at the Publisher or at Subscribers, even if they are not connected, a trigger updates the generation column in the MSmerge_contents system table for that row to the appropriate generations slot for the given base table. When the Publisher and Subscriber are reconnected and the Merge Agent runs, the Merge Agent collects all the undelivered row changes (with new generation values) into one or more groups and assigns generation values that are higher than all previous generations. This allows the Merge Agent to batch changes to different tables in separate generations and process these batches to achieve efficiency over slow networks.
The Merge Agent at each site keeps track of the highest generation it has sent to each of the other sites, and the highest generation that each of the other sites has sent to it. These provide starting points, so that each table can be examined without looking at data already shared with the other site. The generations stored in a given row can differ between sites because the numbers at a site reflect the order in which changes were processed at that site.
You can limit the number of merge processes running simultaneously by setting the @max_concurrent_merge parameter of sp_addmergepublication or sp_changemergepublication. If the maximum number of merge processes is already running, any new merge processes will wait in a queue. You can set –StartQueueTimeout on the Merge Agent command line to specify how long the agent should wait for the other merge processes to complete. If the –StartQueueTimeout period is exceeded, and the new merge process is still waiting, it will stop and exit.
Synchronization occurs when Publishers and Subscribers in a merge replication topology reconnect and changes are propagated between sites, and if necessary, conflicts detected and resolved. At the time of synchronization, the Merge Agent sends all changed data to the Subscriber. Data flows from the originator of the change to the site that needs to be updated or synchronized.
The direction of the exchange controls whether the Merge Agent uploads changes from the Subscriber (-ExchangeType='Upload'), downloads changes to the Publisher (-ExchangeType='Download') or executes an upload followed by a download (-ExchangeType='Bidirectional'). If the number of changes applied must be controlled, the Merge Agent command line parameters –MaxUploadChanges and –MaxDownloadChanges can be configured. In this case, the data at the Publisher and Subscribers converges only when all changes are propagated.
At the destination database, updates propagated from other sites are merged with existing values according to conflict detection and resolution rules. A Merge Agent evaluates the arriving and current data values, and any conflicts between new and old values are resolved automatically based on the default resolver, a resolver you specified when creating the publication or a custom resolver. Merge replication in SQL Server 2000 offers many out-of-the-box custom resolvers that will help you implement the business logic.
Changed data values are replicated to other sites and converged with changes made at those sites only when synchronization occurs. Synchronizations can occur minutes, days, or even weeks apart and are defined in the Merge Agent schedule. Data is converged and all sites ultimately end up with the same data values, but for this to happen, you would have to stop all updates and merge between sites a couple of times.
The retention period for subscriptions specified for each publication controls how often the Publisher and Subscribers should synchronize. If subscriptions do not synchronize with the Publisher within the retention period, they are marked as 'expired' and will need to be reinitialized. This is to prevent old Subscriber data from synchronizing and uploading these changes to the Publisher. The default retention period for a publication is 14 days. Because the Merge Agent cleans up the publication and subscription databases based on this value, care must be taken to configure this value appropriate to the application.
Note The merge process requires an entry for the Publisher in the sysservers table on the Subscriber. If the entry does not exist, SQL Server will attempt to add this entry. If the login used by the Merge Agent does not have access to add the entry (such as db_owner of the subscription database), an error will be returned.
Merge replication Subscribers update data based on the original snapshot provided to them unless you mark the subscription for reinitialization. When you mark the subscription for reinitialization, the next time the Merge Agent runs, it will apply a new snapshot to the Subscriber. Optionally, changes made at the Subscriber can be uploaded to the Publisher before the snapshot is reapplied. This ensures that any data changes at the Subscriber are not lost when the subscription is reinitialized.
If you created a subscription and indicated no initial snapshot was to be applied at the Subscriber (the @sync_type parameter set to nosync in sp_addmergesubscription system stored procedure), and you reinitialize the subscription, the snapshot will be reapplied to the Subscriber. This functionality ensures that Subscribers have data and schema identical to data and schema at the Publisher.
If you reinitialize all subscriptions to a merge publication, the subscriptions specified with no initial snapshot synchronization will be reinitialized the same way the subscriptions with synchronization type of 'automatic' are reinitialized. To prevent the reapplication of the snapshot to the Subscriber, drop the subscription specified with no initial snapshot synchronization, and then recreate it after reinitialization.
For more information about synchronization, see Synchronizing Data.
The Merge Agent is a component of SQL Server Agent and can be administered directly by using SQL Server Enterprise Manager. The Snapshot Agent and Merge Agent can also be embedded into applications by using Microsoft ActiveX® controls. The Snapshot Agent executes on the Distributor. The Merge Agent usually executes on the Distributor for push subscriptions and on Subscribers for pull subscriptions. Remote agent activation can be used to offload agent processing to another server. For more information, see Remote Agent Activation.
SQL Server can validate the data at the Subscriber as the replication process is occurring so that you can ensure that data updates applied at the Publisher are applied at Subscribers. For more information, see Validating Replicated Data.
Validating Permissions for a Subscriber
SQL Server 2000 provides the option to validate permissions for a Subscriber to upload data changes to a Publisher. This verifies that the Merge Agent login has the permissions to perform INSERT, UPDATE, and DELETE commands on the publication database. Validating permissions requires that the Merge Agent login be a valid user with the appropriate permissions in the publication database.
This permissions validation is in addition to the verification that the logins used at the Subscriber are in the publication access list (PAL).
Validating permissions for a Subscriber can be set using the @check_permissions property in sp_addmergearticle or by using the CheckPermissions Property in SQL-DMO. For more information, see CheckPermissions Property. You can specify one or more of the following values for the @check_permissions parameter in sp_addmergearticle.
|0 (Default)||Permissions will not be checked.|
|1||Check permissions at the Publisher before INSERTs made at a Subscriber can be uploaded.|
|2||Check permissions at the Publisher before UPDATEs made at a Subscriber can be uploaded.|
|4||Check permissions at the Publisher before DELETEs made at a Subscriber can be uploaded.|
Note If you set the @check_permissions parameter after the initial snapshot has been generated, a new snapshot must be generated and reapplied at the Subscriber in order for permissions to be validated when data changes are merged.
Cleaning Up Merge Replication
When the distribution database is created, SQL Server adds the following tasks automatically to SQL Server Agent to purge the data no longer needed:
- Subscription cleanup at the Publisher
- History cleanup at the Distributor
These tasks help replication to function effectively in a long-running environment; therefore, administrators should plan for this periodic maintenance. The cleanup tasks delete the initial snapshot for each publication and remove history information in the Msmerge_history table.
Merge Meta Data Cleanup
When there is a large amount of merge meta data in the system tables, cleaning up the meta data improves the performance of merge replication. Prior to SQL Server 2000 Service Pack 1 (SP1), meta data could be cleaned up only by running sp_mergecleanupmetadata. However, SQL Server 2000 SP1 and later includes retention-based meta data cleanup, which means that meta data can be automatically deleted from the following system tables:
- Before image tables, if they are present (They are present if the @keep_partition_changes synchronization optimization option is enabled on the publication)
Retention-based meta data cleanup occurs as follows:
- If the –MetadataRetentionCleanup Merge Agent parameter is set to 1, as it is by default, the Merge Agent cleans up the Subscriber and the Publisher that are involved in the merge.
Note: The -MetadataRetentionCleanup 1 parameter is now part of all Merge Agent profiles that ship with SQL Server 2000 SP1 and later.
- If the -MetadataRetentionCleanup parameter is set to 0, automatic cleanup does not occur. In this case, manually initiate retention-based meta data cleanup by executing sp_mergemetadataretentioncleanup. This stored procedure must be executed at every Publisher and Subscriber that should be cleaned up. It is recommended, but not required, that the Publisher and Subscribers be cleaned up at similar points in time (See later section Preventing False Conflicts).
The default retention period for publications is 14 days. If an article belongs to several publications, there might be different retention periods. In that situation, the longest retention period is used to determine the earliest possible time that cleanup can occur.
Important If there are multiple publications on a database, and any one of those publications uses an infinite publication retention period (@retention=0), merge meta data for the database will not automatically be cleaned up. For this reason, use infinite publication retention with caution.
Meta Data Cleanup in Topologies with Different Versions of SQL Server
For automatic retention-based cleanup to occur in a database involved in merge replication, the database and the Merge Agent must both be on servers running SQL Server 2000 SP1 or later. For example:
- A SQL Server 7.0 pull Subscriber will not run cleanup at a SQL Server 2000 SP1 Publisher.
- A SQL Server 2000 SP1 push Merge Agent will not run cleanup in a SQL Server 2000 (without SP1) Subscriber database.
- A SQL Server 2000 SP1 push Merge Agent will run cleanup in a Server 2000 SP1 Publisher database even if it has subscribers that are SQL Server 2000 or earlier.
Automatic cleanup on some servers and not on others will at most cause false conflicts, and those should be rare. For topologies that include versions of SQL Server prior to SQL Server 2000 SP1, you may see performance benefits by running sp_mergemetadatacleanup on all servers that aren't cleaned up automatically.
Preventing False Conflicts
Retention-based meta data cleanup prevents non-convergence and silent overwrites of changes at other nodes. However, false conflicts can occur if:
- The meta data is cleaned up at one node and not another in the topology, and
- A subsequent update at the cleaned-up node occurs on a row whose meta data was deleted.
For example, if meta data is cleaned up at the Publisher but not at the Subscriber, and an update is made at the Publisher, a conflict will occur even though data appears to be synchronized.
- To prevent this conflict, make sure meta data is cleaned up at related nodes at about the same time. If -MetadataRetentionCleanup 1 is used, both the Publisher and Subscriber are cleaned up automatically before the merge starts, thereby ensuring that the nodes are cleaned up at the same time.
- If a conflict occurs, use the merge replication conflict viewer to review the conflict and change the outcome if necessary.
If an article belongs to several publications or is in a republishing scenario, it is possible that the retention periods for a given row at the Publisher and Subscriber are different. To reduce the chance of cleaning up meta data on one side but not the other, it is recommended that those different publications have similar retention periods.
Note If there is a large amount of meta data in the system tables that must be cleaned up, the merge process may take longer to run. Clean up the meta data on a regular basis to prevent this issue.