How Transactional Replication Works
Transactional replication is implemented by the Snapshot Agent, Log Reader Agent, and Distribution Agent. The Snapshot Agent prepares snapshot files containing schema and data of published tables and database objects, stores the files in the snapshot folder, and records synchronization jobs in the distribution database on the Distributor.
The Log Reader Agent monitors the transaction log of each database configured for transactional replication and copies the transactions marked for replication from the transaction log into the distribution database. The Distribution Agent moves the initial snapshot jobs and the transactions held in the distribution database tables to Subscribers.
Before a new transactional replication Subscriber can receive incremental changes from a Publisher, the Subscriber must contain tables with the same schema and data as the tables at the Publisher. Copying the complete current publication from the Publisher to the Subscriber is called applying the initial snapshot. Microsoft® SQL Server™ 2000 will create and apply the snapshot for you, or you can choose to apply the snapshot manually. For more information, see Applying the Initial Snapshot.
When snapshots are distributed and applied to Subscribers, only those Subscribers waiting for initial snapshots are affected. Other Subscribers to that publication (those that are already receiving inserts, updates, deletes, or other modifications to the published data) are unaffected.
Concurrent Snapshot Processing
Typically with snapshot generation, SQL Server will place shared locks on all tables published as part of replication for the duration of snapshot generation. This can prevent updates from being made on the publishing tables. Concurrent snapshot processing, available only with transactional replication, does not hold the share locks in place during the entire snapshot generation, therefore, it allows users to continue working uninterrupted while SQL Server 2000 creates initial snapshot files.
When you create a new publication using transactional replication and indicate that all Subscribers will be instances of SQL Server 7.0 or SQL Server 2000, concurrent snapshot processing is available.
After replication begins, the Snapshot Agent places shared locks on the publication tables. The locks prevent changes until a record indicating the start of the snapshot is entered in the log file. After the transaction is received, the shared locks are released and data modifications at the database can continue. The duration for holding the locks is very brief (a few seconds) even if a large amount of data is being copied.
At this point, the Snapshot Agent starts to build the snapshot files. When the snapshot is complete, a second record indicating the end of the snapshot process is written to the log. Any transactions that affect the tables while the snapshot is being generated are captured between these beginning and ending tokens and forwarded to the distribution database by the Log Reader Agent.
When the snapshot is applied at the Subscriber, the Distribution Agent first applies the snapshot files (schema and .bcp files). It then reconciles each captured transaction to see if it has already been delivered to the Subscriber. During this reconciliation process, the tables on the Subscriber are locked. Depending on the number of transactions captured at the Publisher while the snapshot was created, you should expect an increase in the amount of time required to apply the snapshot at the Subscriber. Conceptually, this is similar to the process of recovery that SQL Server uses when it is restarted.
UPDATETEXT statements cannot be performed on data marked for replication while it is being extracted during concurrent snapshot processing. If you initiate an UPDATETEXT statement, you will get an error indicating that the operation is not allowed because of concurrent snapshot processing. After the snapshot is complete, UPDATETEXT statements can be performed again.
As mentioned earlier, use caution when concurrent snapshot processing occurs on systems where business logic is indicated through triggers or constraints on the subscription database. Concurrent snapshot processing uses bulk inserts of tables followed by a series of special INSERT and DELETE statements that bring the table to a consistent state. These operations are performed as one transaction so that database users do not see the data in an inconsistent state; however, constraints at the Subscriber will be executed within the transaction and may evaluate changes that are not based on a consistent set of data. To prevent this, it is generally recommended that you specify the NOT FOR REPLICATION option on all constraints and columns with the IDENTITY property on the Subscriber database. Business logic implemented using custom stored procedures will not be affected because custom stored procedures are not used during concurrent snapshot processing until the Subscriber tables are in a consistent state.
Foreign key constraints, check constraints, and triggers at the Subscriber do not require the NOT FOR REPLICATION option because they will be disabled during the concurrent snapshot generation and will be enabled after the snapshot is generated.
Important The Log Reader Agent must run after the snapshot is generated with concurrent processing. If the Log Reader Agent does not run, the Distribution Agent will continue to return an error stating that the snapshot is not available and will not apply it to Subscribers. The Log Reader Agent needs to propagate all changes that occurred during snapshot generation to the distribution database before the Distribution Agent can apply the snapshot to Subscribers. Usually the Log Reader Agent runs in continuous mode, so it will run automatically soon after the snapshot is generated, but this is not a concern. If you choose not to run the Log Reader Agent in continuous mode, you must run it manually.
Although concurrent snapshot processing allows updates to continue on publishing tables, the performance will be lowered due to the overhead of the snapshot itself. It is recommended that you generate the snapshot during periods of lowest general activity whenever possible (similar to when you would choose to do a database backup).
Important For SQL Server 2000 prior to Service Pack 1: If the publishing table has a primary key or unique constraint not contained within the clustered index, replication could fail if data modifications occur on the clustering key during concurrent snapshot processing. It is recommended that you enable concurrent snapshot processing only when unique and primary key constraints are contained within the clustered index or you ensure that data modifications are not made to the columns of the clustering index while the snapshot is generated. Beginning with SQL Server 2000 Service Pack 1 there are no longer any restrictions on using concurrent snapshot processing.
Concurrent snapshot processing is available only with transactional replication and for Subscribers running instances of SQL Server 7.0 or later on the Microsoft Windows® 98, Microsoft Windows NT® 4.0 and Microsoft Windows 2000 operating systems.
If you are publishing to Subscribers running SQL Server 7.0, the Distributor must be running SQL Server 2000, and you must use push subscriptions to use concurrent snapshot processing. The Distribution Agent runs at the Distributor, and is able to execute the concurrent snapshot processing. If you used a pull subscription, the Distribution Agent would run at the Subscriber on SQL Server 7.0 where concurrent snapshot processing is not available. If you use pull subscriptions with Subscribers running SQL Server 7.0, concurrent snapshot processing must be disabled.
Because of these restrictions, the Create Publication Wizard does not make concurrent snapshot processing the default when you create a transactional publication; however, if your application meets these criteria, it is recommended that you enable this option. To enable concurrent snapshot processing, change the snapshot generation mode. Open Publication Properties, click the Snapshot tab, and then select the Concurrent access during snapshot generation checkbox.
The procedures by which the Snapshot Agent implements the initial snapshot in transactional replication are the same procedures used in snapshot replication (except as outlined earlier with regard to concurrent snapshot processing). After the snapshot files have been generated, you can view them in the Snapshot Folder using the Snapshot Explorer. In SQL Server Enterprise Manager, expand the Replication and Publications folders, right click a publication, and then click Explore the Latest Snapshot Folder. For more information, see Exploring Snapshots.
Modifying Data and the Log Reader Agent
The Log Reader Agent runs either continuously or according to a schedule you establish at the time the publication is created. When executing, the Log Reader Agent first reads the publication transaction log (the same database log used for transaction tracking and recovery during regular SQL Server 2000 operations) and identifies any INSERT, UPDATE, and DELETE statements, or other modifications made to the data transactions that have been marked for replication. Next, the agent batch copies those transactions to the distribution database at the Distributor. The Log Reader Agent uses the internal stored procedure sp_replcmds to get the next set of commands marked for replication from the log. The distribution database then becomes the store-and-forward queue from which changes are sent to Subscribers. Only committed transactions are sent to the distribution database.
There is a one-to-one correspondence between transactions on the Publisher and replication transactions in the distribution database. One transaction stored in MSrepl_transactions can consist of one or more commands and each command can be broken up along a 500-Unicode-character boundary in the MSrepl_commands table. After the entire batch of transactions has been written successfully to the distribution database, it is committed. Following the commit of each batch of commands to the Distributor, the Log Reader Agent calls sp_repldone to mark where replication was last completed. Finally, the agent marks the rows in the transaction log that are ready to be truncated. Rows still waiting to be replicated are not truncated. The transaction log on the Publisher can be dumped without interfering with replication, because only transactions not marked for replication are purged.
Data modifications made at the Subscriber will always be propagated as a series of single row statements, provided they do not modify a uniquely constrained column. If an UPDATE does modify a uniquely constrained column, the UPDATE will be propagated as a series of DELETE statements followed by a series of INSERT statements. A uniquely constrained column is any column participating in a unique index or clustered index, even if the clustered index is not declared as unique. UPDATES made to indexed views or base tables that indexed views are based on will be propagated as DELETE/INSERT pairs.
The Log Reader Agent usually runs under SQL Server Agent at the Distributor and can be administered directly by accessing it in SQL Server Enterprise Manager under Replication Monitor and the Agents folder.
Transaction commands are stored in the distribution database until the Distribution Agent propagates them to all Subscribers or a Distribution Agent at the Subscriber pulls the changes. The distribution database is used only by replication and does not contain any user tables. You should never create other objects in the distribution database. Subscribers will receive transactions in the same order in which they were applied at the Publisher.
The Distribution Agent is a component of SQL Server Agent and can be administered directly by using SQL Server Enterprise Manager. The Snapshot Agent and Distribution Agent can also be embedded into applications by using Microsoft ActiveX® controls. The Snapshot Agent executes on the Distributor. The Distribution Agent usually executes on the Distributor for push subscriptions, or on Subscribers for pull subscriptions, but remote agent activation can be used to offload agent processing to another server. For more information, see Remote Agent Activation.
SQL Server can validate the data being updated at the Subscriber as the replication process is occurring so that you can ensure that data is the same at the Publisher and at the Subscribers. For more information, see Validating Replicated Data.
Skipping Errors in Transactional Replication
The -skiperrors agent command line parameter for transactional replication allows you to specify errors that can be skipped during the distribution process. Typically, when the Log Reader Agent and Distribution Agent are running in continuous mode and one of them encounters an error, the agent, and the distribution process, stops. By specifying expected errors or errors that you do not want to interfere with replication, with the -skiperrors parameter, the Distribution Agent will log the error information and then continue running. For more information, see Handling Agent Errors.
Cleaning Up Transactional Replication
When the distribution database is created, SQL Server adds the following tasks to SQL Server Agent at the Distributor to purge the data no longer required:
- Agent checkup
- Agent history cleanup
- Transaction cleanup
- Distribution cleanup
- History cleanup
- Expired subscription cleanup
After all Subscribers have received transactions, the Distribution Cleanup Agent removes delivered transactions in the distribution database. Delivered transactions are kept in the distribution database for a defined period known as the retention period. Setting a retention period while scheduling backups can ensure that information required to recover a destination database automatically is available within the distribution database.
For example, if a Subscriber has scheduled a transaction log dump of a destination database every 24 hours, you could set the retention period to 48 hours. Even if the Subscriber experiences a failure immediately before a scheduled backup, all transactions necessary to restore the replicated tables automatically will still be available to the distribution process of the Distributor.