Enhancing Merge Replication Performance
You can enhance the performance of merge replication in your application and on your network by:
- Using indexes on columns used in subset and join filters.
- Creating a ROWGUIDCOL column prior to generating the initial snapshot.
- Using native mode bcp whenever possible.
- Increasing the batch sizes processed by the Merge Agent.
- Using pull and anonymous subscriptions when there are a large number of Subscribers.
- Limiting the use of text and image columns.
- Considering over-normalization of tables containing text and image columns.
- Using static rather than dynamic partitions when possible.
- Using dynamic snapshots for dynamically filtered publications.
- Limiting complexity of subset filter clauses.
- Reducing publication retention settings.
- Selecting column-level tracking when bandwidth is limited.
- Optimizing synchronization when partitioning data.
- Controlling article processing order if using triggers for referential integrity.
- Using global subscriptions.
- Occasionally re-indexing merge replication system tables.
- Not overusing join filters.
- Modifying database design.
- Limiting or controlling simultaneous agent processing.
- Considering reinitializing the subscription.
Using Indexes on Columns Used in Subset and Join Filters
When you use a filter on a published article, create an index on each of the columns that is used in the filter's WHERE clause. Without an index, Microsoft® SQL Server™ 2000 has to read each row in the table to determine whether the row should be included in the article (that is, in the horizontal partition of the table). With an index, SQL Server 2000 can quickly locate which rows should be included. The fastest processing takes place if SQL Server 2000 can fully resolve the WHERE clause of the filter from just the index.
Indexing all the columns used in JOIN filters is also important. Each time the Merge Agent runs, it searches the base table to determine which rows in the base table and which rows in related tables are included in the article. Creating an index on the JOIN columns saves SQL Server 2000 from having to read each row in the table every time the Merge Agent runs.
For more information, see Filtering Published Data.
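As a sketch (the table, column, and index names here are hypothetical), indexing both a subset filter column and a JOIN filter column might look like this:

```sql
-- Hypothetical article filtered on SalesRegion: an index on the filter
-- column lets SQL Server 2000 resolve the WHERE clause without reading
-- every row in Customers.
CREATE INDEX IX_Customers_SalesRegion
    ON Customers (SalesRegion)

-- If Orders is related to Customers through a JOIN filter on CustID,
-- index the join column as well so the Merge Agent avoids a table scan.
CREATE INDEX IX_Orders_CustID
    ON Orders (CustID)
```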
Create a ROWGUIDCOL Column Prior to Generating the Initial Snapshot
By creating a column that merge replication can use to track changes before you generate the initial snapshot, you avoid the sometimes significant cost in time, disk space, and log activity that results from waiting for the Snapshot Agent to alter the tables for you.
Merge replication requires that each published table have a ROWGUIDCOL column. If a ROWGUIDCOL column does not exist in the table before the Snapshot Agent creates the initial snapshot files, the agent must first add and populate the ROWGUIDCOL column. To gain a performance advantage when generating snapshots during merge replication, create the ROWGUIDCOL column on each table before publishing it with merge replication. The column can have any name (rowguid is used by the Snapshot Agent by default), but must have the following characteristics:
- The uniqueidentifier data type.
- A default of NEWID().
- The ROWGUIDCOL property.
- A unique index on the column.
The ROWGUIDCOL column is used frequently in merge replication during tracking and synchronization of changes made at the Publisher and at Subscribers.
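As a sketch (the table and constraint names are illustrative), the column can be created before publishing as follows:

```sql
-- Add a ROWGUIDCOL column before publishing so the Snapshot Agent
-- does not have to alter and populate the table itself.
ALTER TABLE Customers
    ADD rowguid uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Customers_rowguid DEFAULT NEWID()

-- A unique index on the column.
CREATE UNIQUE INDEX UX_Customers_rowguid
    ON Customers (rowguid)
```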
Increase the Batch Sizes Processed by the Merge Agent
By default, the Merge Agent processes 100 generations in each batch uploaded and downloaded between the Publisher and Subscriber. If you make frequent updates to a single table and update a large number of rows in a single transaction, consider increasing the number of generations in each batch. You can set the -DownloadGenerationsPerBatch and -UploadGenerationsPerBatch parameters in the Merge Agent profile.
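For example, the batch sizes can be supplied on the Merge Agent command line (the server, database, and batch-size values below are placeholders; the same parameters can be set in an agent profile):

```sql
-- Merge Agent command line (not Transact-SQL); values are illustrative.
-- replmerg.exe -Publisher PUBSRV -PublisherDB SalesDB -Publication SalesPub
--     -Subscriber SUBSRV -SubscriberDB SalesDB
--     -UploadGenerationsPerBatch 500 -DownloadGenerationsPerBatch 500
```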
Use Pull and Anonymous Subscriptions When There Are a Large Number of Subscribers
A pull subscription moves the Merge Agent from the Distributor to the Subscriber. Relocating the Merge Agent reduces the amount of processing the Distributor must do for each pull subscription and shifts the processing overhead to the Subscriber. By creating pull subscriptions instead of push subscriptions, you free up more processing capacity at the Distributor for performing other replication or application tasks. By creating anonymous subscriptions, you can further reduce the resource demands on the Distributor because no metadata has to be stored about the Subscriber.
Use Native Mode bcp Whenever Possible
When you create a publication, you have the choice of specifying that one or more Subscribers will be Microsoft Jet 4.0 (Microsoft Access) or SQL Server for Windows® CE databases. Enabling support for these types of Subscribers causes the Snapshot Agent to store the snapshot files in character format instead of native SQL Server 2000 format. Because it takes additional processing time and storage space for SQL Server 2000 to process and store character format files, do not enable SQL Server for Windows CE or Access Subscribers unless you are sure that you will actually have such Subscribers.
Limit the Use of text and image Columns
text and image columns require more storage space and processing than other column data types. Do not include text and image columns in articles unless absolutely necessary for your application.
Consider Over-normalizing Tables Containing text and image Columns
When synchronization occurs, the Merge Agent may need to read and transfer the entire data row from a Publisher or Subscriber. If the row contains text or image columns, this process can require additional memory allocation and can negatively affect performance, even though these columns may not have been updated. To reduce the likelihood of this performance impact, consider putting text and image columns in a separate table with a one-to-one relationship to the rest of the row data.
Reducing Use of Horizontal Filtering
When subset filters or join filters are used to filter the data in a publication, the Merge Agent must determine if rows need to be added to or removed from a subscription database. While this can decrease the amount of data that must be transferred to each Subscriber, it can increase the amount of processing required at the Publisher during each synchronization. If data is not horizontally filtered, all data changes must be sent to each Subscriber and the Merge Agent will begin sending data to Subscribers quickly without having to first evaluate the filter criteria for the publication.
Publish unrelated tables in separate publications if some tables receive a lot of activity and others do not.
Use Static Rather Than Dynamic Partitions When Possible
Dynamic filters and partitions are a powerful feature of SQL Server 2000 replication. However, even with indexes on the filtered columns, SQL Server 2000 must still read each row in the dynamic partition and compare it to the filtered value. Using static filters and partitions reduces the processing time required to complete the merge process.
Using Dynamic Snapshots for Dynamically Filtered Publications
When dynamic filters are used to partition a publication, the Snapshot Agent cannot pre-determine the data required for a Subscriber. As such, the Merge Agent must request inserts for all data specific to its partition after it applies the schema files from the snapshot folder. Processing the initial snapshot for a large volume of data using inserts can be significantly slower than processing the same data using the SQL Server bulk copy utility.
With SQL Server 2000, the Snapshot Agent can be instructed to generate bcp files specific to each Subscriber by creating a dynamic snapshot job. Dynamic snapshots generate bcp files as though a static filter had been applied to the publication. While this requires running the Snapshot Agent in a special mode once for each partition of data to be generated for Subscribers, it can dramatically improve the time it takes the Merge Agent to apply the data when processing the snapshot files.
For more information, see Dynamic Snapshots.
Limiting Complexity of Subset Filter Clauses
When using subset filter clauses to horizontally partition data in a publication, limit the complexity of the filter criteria. The subset filter clause is evaluated frequently to determine which changed rows of published data should be synchronized with each Subscriber. Limiting the complexity of the filtering criteria helps improve performance when the Merge Agent is evaluating row changes to send to Subscribers. Avoid using sub-selects within merge subset filter clauses. Instead, consider using join filters, which are generally more efficient when used to partition data in one table based on the subset filter clause in another table.
Note Do not overuse join filters. Join filters with dozens or more tables will impact performance.
For more information, see Filtering Published Data.
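For example, rather than filtering Orders with a subset filter clause that contains a subselect against Customers, a join filter can relate the two articles directly (the publication, article, and column names below are hypothetical):

```sql
-- Instead of a subset filter clause on Orders such as:
--   CustID IN (SELECT CustID FROM Customers WHERE SalesRegion = 'Northwest')
-- filter Customers with a simple clause and relate Orders to it
-- with a join filter.
EXEC sp_addmergefilter
    @publication       = 'SalesPub',
    @article           = 'Orders',
    @filtername        = 'Orders_Customers',
    @join_articlename  = 'Customers',
    @join_filterclause = 'Customers.CustID = Orders.CustID',
    @join_unique_key   = 1   -- the join is on a unique key of Customers
```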
Reducing Publication Retention Settings
Publication retention determines how long a Subscriber can go without synchronizing incremental changes before that Subscriber is considered to be out of synchronization and requires a new snapshot from the Publisher.
This setting also controls how long some merge tracking meta data is maintained in the publication and subscription databases. You can control the growth of merge tracking meta data and, in some cases, see improved performance while synchronizing changes, if you reduce the publication retention period setting. Select a publication retention setting that is adequate to support Subscribers working offline for extended periods of time.
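For an existing publication, the retention period (in days) can be reduced with sp_changemergepublication; the publication name and value below are placeholders:

```sql
-- Reduce the publication retention period to 14 days.
EXEC sp_changemergepublication
    @publication = 'SalesPub',
    @property    = 'retention',
    @value       = '14'
```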
Selecting Column-level Tracking When Bandwidth is Limited
While business application needs generally drive the selection of row- or column-level tracking for merge publications, there can be a performance benefit to selecting column-level tracking when bandwidth availability is low. Column-level tracking of data changes allows the Merge Agent to send only the changed columns and the rowguid value across the network for changed rows. Conversely, the Merge Agent always sends the entire row when row-level tracking is used. Sending only the changed columns can provide better performance across a network with limited bandwidth when an application frequently changes only a few columns in a table that has many columns.
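Column-level tracking is selected per article when it is added to the publication; as a sketch with hypothetical names:

```sql
-- Enable column-level tracking so only the changed columns
-- (plus the rowguid) are sent for updated rows.
EXEC sp_addmergearticle
    @publication     = 'SalesPub',
    @article         = 'Customers',
    @source_object   = 'Customers',
    @column_tracking = 'true'
```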
Optimizing Synchronization When Partitioning Data
Selecting the @keep_partition_changes option when adding an article to a merge publication can significantly reduce the amount of time it takes the Merge Agent to determine whether recently changed rows should be sent to a Subscriber.
In cases where an application updates a column used in a subset filter or join filter, the Merge Agent must do additional work to determine if that row change requires that rows be added to or removed from the partition for each Subscriber as they synchronize. By maintaining some additional data about the changed rows in the publication database, the Merge Agent can more quickly determine which partition-related row changes are relevant to each Subscriber.
Caution Choosing to maintain this additional information at the Publisher will increase the storage requirements for the merge replication tracking system tables in the publication database. However, if updates to columns used in partitions are common, the performance gains are usually worth the cost of maintaining the additional information.
For more information, see Optimizing Synchronization.
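As a sketch (the publication name is hypothetical), the option is set when the publication is created:

```sql
-- Maintain extra tracking data so the Merge Agent can quickly
-- determine which partition changes apply to each Subscriber.
EXEC sp_addmergepublication
    @publication            = 'SalesPub',
    @keep_partition_changes = 'true'
```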
Controlling Article Processing Order If Using Triggers for Referential Integrity
When publishing tables related to one another via declared foreign key constraints or constraints enforced via triggers, the Merge Agent will need to apply changes to related rows in the correct order to propagate all changes. If you are using declared referential integrity, SQL Server will process articles in order based on the relationships. By processing articles in the optimal order based on the action being performed (for example, inserting parent rows before related child rows), the Merge Agent can avoid additional retry operations when processing articles during synchronizing.
If declared referential integrity is not used, the Merge Agent will, by default, process articles in the order in which they are added to a publication via stored procedures or SQLDMO; article order cannot be controlled through SQL Server Enterprise Manager.
If triggers are used to enforce referential integrity, the Merge Agent will not recognize this as declared referential integrity, and you need to be aware of the processing order of the articles.
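Because the default processing order is the order in which articles are added, add parent articles before child articles when triggers enforce the relationship; for example (all names are hypothetical):

```sql
-- Add the parent table first so its rows are inserted
-- before related child rows during synchronization.
EXEC sp_addmergearticle
    @publication   = 'SalesPub',
    @article       = 'Customers',
    @source_object = 'Customers'

-- Then add the child table that references it.
EXEC sp_addmergearticle
    @publication   = 'SalesPub',
    @article       = 'Orders',
    @source_object = 'Orders'
```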
Using Global Subscriptions
When synchronizing changes for a local or anonymous merge subscription, the Publisher must also synchronize additional system tracking data that would be unnecessary with global subscriptions. Using global subscriptions may improve synchronization performance in cases where Subscribers make frequent updates.
Occasionally Re-index Merge Replication System Tables
As part of maintenance for merge replication, occasionally check the growth of the system tables associated with merge replication: MSmerge_contents, MSmerge_genhistory, and MSmerge_tombstone. Periodically re-index these tables by running DBCC Transact-SQL commands. To re-index these system tables, execute the following commands on the publication database:
- DBCC DBREINDEX ('MSmerge_contents')
- DBCC DBREINDEX ('MSmerge_genhistory')
- DBCC DBREINDEX ('MSmerge_tombstone')
Additionally, you should minimize the size of the merge system tables (specifically MSmerge_history) by using sp_mergecleanupmetadata. For more information, see How Merge Replication Works.
Not Overusing Join Filters
Join filters involving many tables (dozens or hundreds) will seriously impact performance during merge processing. If you are generating join filters of five or more tables, consider other solutions. One strategy is to avoid filtering tables that are primarily lookup tables, smaller tables, and tables that are not subject to change, and instead make those tables part of the publication in their entirety. It is recommended that you use join filters only between tables whose data must be carefully partitioned among Subscribers.
Modify Database Design
The design of the database ultimately determines the complexity and processing resource requirements of the queries used by merge replication, which in turn affects merge performance. A poor database design, or one that does not fit the publication (or filtering) needs of a merge publication, may require structural changes to improve merge performance: for example, adding columns or tables to support dynamic partitioning logic more efficiently, and making sure that the columns used in filtering expressions can take advantage of indexes. The usual rules for optimizing queries for index usage apply. Indexing every filtering column indiscriminately can be counterproductive in terms of index maintenance if the query optimizer does not use the index, because the data is not selective enough or the expression cannot use an index. Sometimes changing a filtering expression allows an existing index to be used where it was not used before.
Limit or Control Simultaneous Agent Processing
Limit or control the number of multiple simultaneous Snapshot Agent or Merge Agent processes, especially with large data sets, complex partitioning logic, and large volumes of merged changes. The @max_concurrent_merge and @max_concurrent_dynamic_snapshots parameters for sp_addmergepublication can help with this.
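For example (the publication name and limits below are placeholders), the limits can be set when the publication is created:

```sql
-- Allow at most five Merge Agent sessions and three dynamic
-- snapshot sessions to run against this publication at once.
EXEC sp_addmergepublication
    @publication                      = 'SalesPub',
    @max_concurrent_merge             = 5,
    @max_concurrent_dynamic_snapshots = 3
```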
Consider Reinitializing the Subscription
When large amounts of changes need to be sent to Subscribers, reinitializing them with a new snapshot may be faster than using merge replication to move the individual changes.