.gif)
SQL Server Best Practices Article
Writer:Steffen Krause
Contributors:Sanjay
Mishra, Gopal Ashok, Greg Yvkoff, Rui Wang
Technical Reviewers:Burzin Patel, Denny
Lee, Glenn Berry (SQL Server MVP), Joseph Sack, Lindsey Allen, Michael Redman,
Mike Ruthruff, Paul S. Randal (SQLskills.com)
Published:November 2008
Applies to: SQL Server 2008
Summary:Change data capture is a new
feature in SQL Server 2008 that provides an easy way to capture changes to data
in a set of database tables so these changes can be transferred to a second
system like a data warehouse. This document provides guidance on how to
configure change data capture parameters to maximize data capture performance
while minimizing the performance impact on the production workload. The scope
of this document is limited to the capture of change data and the cleanup
process. Querying the changed data is out of scope for this white paper.
>Introduction
Change data capture is a new feature in Microsoft® SQL
Server® 2008 that provides an easy way to capture changes to data in a set of
database tables so these changes can be transferred to a second system such as
a data warehouse. This document provides guidance on how to configure change
data capture parameters to maximize data capture performance while minimizing
the performance impact on the production workload. The scope of this document
is limited to the capture of change data and the cleanup process. Querying the
changed data is out of scope for this white paper. For an introduction to the
change data capture feature, see Change Data
Capture in SQL Server Books Online.
Notice that there is another new feature named change
tracking in SQL Server 2008 that also enables tracking of table rows changed by
DML commands (insert, update, delete, merge). Change tracking is aimed at the
synchronization of loosely coupled database applications and tracks only the
primary keys of changed rows and optionally which columns changed, but not the
changed data itself. Change Tracking is also out of scope for this document.
For more information about these two features, see Comparing Change
Data Capture and Change Tracking in SQL Server Books Online.
>Benefits of Change Data Capture
Traditionally, detecting changes in a source database to
transfer these changes to a data warehouse required either special columns in
the source tables (time stamps, row versions), triggers that capture changes,
or comparison of the source and the destination system. These methods can have
significant disadvantages: special columns require a change in the source
database schema and in many cases a change in the application logic. Triggers
need to be implemented manually and can put significant additional overhead on
DML commands to the source system. Comparing source and target databases can
put a heavy load on both systems.
Change data capture enables the capture of changes in the
source system by asynchronously reading the transaction log of the source
database. For this, change data capture uses the same log reader that is used
in transactional replication. Because change data capture works on existing
table schemas, the source database or application doesn’t need to be changed to
enable change data capture. Because the log reader job works asynchronously,
DML transactions are far less impacted than with synchronous solutions like
triggers. All changes to the source tables are recorded in special change
tables, so no comparison between source and target system for changes is
needed.
>Basic Change Data Capture Method of Operation
When change data capture is enabled on a database, the
schema “cdc” and a set of metadata tables in this schema are created. Most of
these change data capture tables contain metadata and have very little data
volume and transactional load. The only metadata table that is frequently
written to during change data capture operation and that can grow to a
significant size is the table cdc.lsn_time_mapping,
which records the mapping between log sequence numbers (LSNs) and the date and
time when the transaction happened.
When subsequently a table is enabled for change data capture,
a capture instance is created. A capture instance consists of a change table
(cdc.<capture_instance_name>_CT) and up to two table-valued functions for
querying this change table. There can be up to two capture instances per table.
If there is no log scan job for either transactional replication or another
capture instance on the database, a capture job and a cleanup job are created
in SQL Server Agent with default parameters (For more information about
changing these parameters, see Configuration of
the Capture Job).
The capture job is automatically started.
After the capture job is enabled, it will write one row to
the change table for every row inserted or deleted in the source table, and two
rows for every row updated in the source table (one row containing the values
before the update, one row containing the values after the update). The change
table always contains the values for all captured columns in the source table,
not only for columns that changed their value during the update. Therefore,
data from only the change table is necessary to propagate the changes to a data
warehouse. In addition to the data from the source table, the change table
contains at least 37 bytes (or more, depending on the number of captured
columns; for more information, see cdc.<capture_instance>_CT
in SQL Server Books Online) of additional data for each row of change data.
>Configuration of sys.sp_cdc_enable_table Parameters
The stored procedure sys.sp_cdc_enable_table
has several parameters that we found important for the performance of change
data capture. Parameters that are not relevant for performance are not
discussed here. For more information about the influence of sys.sp_cdc_enable_table on the
workloads we tested, see Influence of sys.sp_cdc_enable_table
Parameters on Change Data Capture Performance.
The parameter @capture_instance
determines the name of the capture instance for this table. The name itself has
no impact on performance. However, this parameter can be used to create a
second capture instance for the same table. The feature of having a second
capture instance is only intended for schema upgrade scenarios where a change
to the schema of the source table is performed. The old capture instance should
be disabled as soon as it is no longer needed, because having two capture
instances active on one table means that twice as much change data needs to be
written. This can significantly impact performance.
When you have a schema change on your source table, you
should do the following:
- Change the schema of your source table.
- Create a new capture instance. That new capture
instance is created with the new (changed) schema of the source table.
- Wait for a change to occur, and then read the
minimum LSN from the new capture instance (using sys.fn_cdc_get_min_lsn(new
capture instance name)).
- Read and process all changes from the old
capture instance up to but not including the first LSN from the new capture
instance.
- Disable the old capture instance.
- Update all references to the old capture
instance name with the new name.
- Continue reading from the new capture instance.
The parameter @captured_column_listdetermines which columns of the
source table are included in the change table and captured by change data
capture. If this parameter is not specified or NULL, all columns of the source
table are included in the change table. We found that the number and size of
captured columns has a significant impact on change data capture performance
and required disk space. Change data capture performance is generally better
when the number of captured columns is smaller because the amount of data that
needs to be written to the change tables is smaller.
The parameter @supports_net_changes determines whether net change queries to the
change table (using cdc.fn_cdc_get_net_changes_<capture_instance>) are possible. Net change queries result in
only one change row for each source row that was changed, independent of the
number of changes. For instance, if stock market prices are tracked per stock
and the price column is updated several times a day, an all changes query would
return every change of every stock price in the queried LSN interval. A net
change query would result in only one change row per stock with the final price
in the data.
When @supports_net_changes is set to 1, an
additional nonclustered index is created on the change table and the net
changes query function is created. Because this index needs to be maintained,
we found that enabling net changes can have negative impact on change data
capture performance.
The parameter @filegroup_name determines the name of the filegroup where the change tables are created.
If this parameter is not specified, the default filegroup will be used. Because
in most cases the default filegroup is PRIMARY and it is a best practice to
keep the PRIMARY filegroup small, a filegroup name should always be specified.
>Configuration of the Capture Job
When change data
capture is enabled on a database that is already configured for transactional
replication, the replication capture job is also used by change data capture.
For this reason, the discussion in this topic does not apply for databases that
have both transactional replication and change data capture enabled.
The capture job
created in SQL Server Agent for change data capture (typically named
cdc.database name_capture) contains only a call to the procedure sys.sp_MScdc_capture_job
without parameters. This procedure determines the scan job parameters and calls
sys.sp_cdc_scan with these parameters. The procedure sys.sp_cdc_scan
does the actual work by scanning the log (using sys.sp_replcmds) and
inserting the data that needs to be captured into the change tables.
There are four
parameters in sys.sp_cdc_scan that determine the behavior of the capture
job.
The first parameter
is continuous (default value 1). It determines whether the capture job runs
continuously (value 1) or exits after one scan phase (one shot mode, value 0).
One shot mode is recommended for testing only, not for production use. The main
reason for this is that the log records stay active until change data capture
has processed them. So, the log will keep growing when the scan job is not
running.
The other three
parameters determine how often and how many transactions are read from the log
and inserted to the change tables.
The parameter maxtrans (default value 500) determines
how many transactions are read from the log and written to the change tables.
This write is done in one transaction. The parameter maxscans (the default value is 10) determines how many of these
scan cycles are attempted before the job is ended (continuous = 0) or before
pausing for an interval (continuous=1). The length of this pause interval is
set by the parameter pollinginterval
(in seconds, with a default value of 5 seconds). WAITFOR is executed when a
scan cycle drains the log completely or when maxscans scan cycles are completed.
Figure 1: Influence of capture job parameters
The scan job
parameters can be changed by using sys.sp_cdc_change_job. Because these
parameters are only read during the initialization of the capture job, the
capture job needs to be stopped using EXEC sys.sp_cdc_stop_job @job_type =
'capture' and subsequently restarted using EXEC sys.sp_cdc_start_job @job_type
= 'capture' so the changed parameters are applied.
>Test Workload and Test Environment
To test change data
capture performance and influence on workloads, we tested against an ISV
application workload. The tests were done on a HP DL 580 (4 dual-core, 32 GB
RAM, x64) connected to a HP EVA SAN with separate disk volumes for data, log,
backup and change data capture filegroups. The application server was a HP
BL460 (2 quad-core, 32 GB RAM, 32bit) connected over Gigabit Ethernet. For more
information about hardware and software environment used in the test, see Appendix A: Test Hardware and Software.
In the application’s
main database, seven tables with a total of 603 columns were enabled for change
data capture. It should be noted that with this workload, change data capture
was used to capture transaction data (such as sales, shipment, audit data), not
master reference data (such as customer, product, sites data). Many customers
will use change data capture only to capture changes to master reference data,
not to transaction data. This would result in significantly lower transaction
rates that need to be captured by change data capture than with this workload.
The application
comprised of a ‘batch’ type workload with many discrete phases. With the
exception of phase 3 (see Table 1), nearly all
transactions to the application database wrote to the tables that were tracked
by change data capture.
Phase | Duration (min) | Average application database transactions per
second (tps) | Average tempdb transactions per second (tps) | Average CPU usage % (Database server) |
Begin | A few seconds | 2 big transactions (200,000
and 20,000 rows updated) | Not applicable | Not applicable |
1 | 7 | 5 | 780 | 6 |
2 | 10 | 10, spikes up to 200 | 40 | 13 |
3 | 17 | 3,000 gradually decreasing to 1800. Of these, about 1,000 writes to tables are
covered by change data capture | 5,000 decreasing to 3,200 | 60 |
4 | 3 | 1,800 – 2,500 | 1,000 | 37 |
5 | 7 | 850 | 2,000 | 62 |
6 | 6 | 3,000 – 4,000 | 12,000 - 1,200 | 45 |
7 | 3 | Varying, spikes up to 14,500 | Varying, low to medium | Varying, low to medium |
Table 1: Application load behavior
In addition to the
application workload, we also wanted to test synthetic workloads (simpler,
artificially scaled workloads for testing specific types of DML at constant
levels) that push change data capture to its limits and determine the behavior
of change data capture with different parameter sets under extremely high load.
Each of these workloads was designed to produce a load as high as possible on
our test system. All workloads ran from a test client with 100 parallel threads
and ran for 500 seconds on the same database that the application workload
used.
Specifics of synthetic workloads:
- Insert test 1: Insert into a wide table (82
columns) with 8 columns filled and 390,000 rows prefilled in the table, one row
inserted per transaction.
- Insert test 2: Insert into a narrower table (21
columns) with 10 columns filled and 666,000 rows prefilled in the table, one
row inserted per transaction.
- Small update test: Update 3 columns in the 82
column table, one row per transaction.
- Insert with trigger: The same insert as in test
1. The only difference is that the INSERT statement fires a trigger that
updates one column of the same table after the insert. This results in three
rows in the change table for every insert (one insert row, one row that
contains the values as they exist before the update, and one row that contains
the values as they exist after the update).
- Mix of insert, update, and select statements.
- Large update: Updates large numbers of rows
(random between 10 and 2,000 rows) per transaction.
- Large insert: 1,000 rows per transaction
inserted into a wide table (82 columns) with 8 columns filled and 390,000 rows prefilled
in the table. This workload ran for 300 seconds.
>Determining Performance and Characteristics of
Change Data Capture
To determine the performance characteristics of change data
capture, three main questions needed to be answered:
- What is the performance of change data capture
itself?
- How does enabling change data capture influence
the performance of the original workload?
- Which parts of the system (CPU, I/O, and so on)
are impacted by change data capture at what amount?
The performance of
change data capture itself is determined by the difference between the time
when the original transaction happened in the database and the time the change
record appears in the change table. This time is called latency. On a system where change data capture can fully keep up
with the workload, latency should not be significantly higher than the polling
interval, and it should not increase over time.
The latency can be determined using the dynamic management
view sys.dm_cdc_log_scan_sessions. For this white paper, this
dynamic management view was used to determine how long the scan sessions took,
how many commands and transactions were processed, and how big the latency is.
The sys.dm_cdc_log_scan_sessions dynamic management view keeps track of only
the last few scan sessions. Because we wanted to monitor the buildup of latency
over the full runtime of the workload, we captured the extended event sqlserver.cdc_session
to a file instead of querying sys.dm_cdc_log_scan_sessions.
Extended Events is a new,
high-performance event capturing mechanism in SQL Server 2008 that allows you
to register targets (in this case, a file) with events (in this case, the sqlserver.cdc_session event). Whenever the event fires (in
this case, for each log scan session) the event data is written to the target.
A big advantage of Extended Events is that they incur a very low overhead on
the server.
The extended event sqlserver.cdc_session contains the same information as the sys.dm_cdc_log_scan_sessions
dynamic management view, but it enabled us to
record every scan session that happened during the workload.
To capture the event, you first have to create an event session that
determines the event(s) to be captured and the target that the event will be
written to:
CREATE EVENT SESSION cdc_session ON SERVER
ADD EVENT sqlserver.cdc_session
ADD TARGET package0.asynchronous_file_target
(SET filename='c:\cdc_session.xel',
metadatafile='c:\cdc_session.xem', max_file_size=10)
Next, before each run of the workload, enable the event session:
ALTER EVENT SESSION cdc_session ON SERVER STATE = start
To record the time it takes
for change data capture to catch up with all transactions of a given workload
and thus to find the maximum latency, we also needed a way to find out whether
change data capture was finished reading the log after the workload finished.
This can be determined by querying the dynamic management view periodically (in
our test, every minute) to see whether the last scan was an empty scan (that
is, a scan that didn’t find any log records that needed to be inserted into the
change tables). The dynamic management view does not list empty scans
individually; instead it lists them as one row with the column empty_scan_count filled with the number
of empty scans. In the case of our workloads, change data capture was finished
catching up as soon as the last scan had a nonzero empty scan count:
SELECT empty_scan_count FROM appdb.sys.dm_cdc_log_scan_sessions WHERE start_time =
(select MAX(start_time) from appdb.sys.dm_cdc_log_scan_sessions)
After both the workload and
change data capture finish, the event session is disabled and the results are
written to a table. Because extended events are recorded in XML format, the
result XML needs to be parsed:
ALTER EVENT SESSION cdc_session ON SERVER STATE = stop
SELECT
CAST (event_data as
xml).value('(/event/data[@name="start_time"]/text/text())[1]','datetime2(2)' )
start_time,
CAST (event_data as
xml).value('(/event/data[@name="end_time"]/text/text())[1]','datetime2(2)' )
end_time,
CAST (event_data as
xml).value('(/event/data[@name="last_commit_cdc_time"]/text/text())[1]','datetime2(2)
' ) last_commit_cdc_time,
CAST (event_data as
xml).value('(/event/data[@name="duration"]/value/text())[1]','int' ) duration,
CAST (event_data as
xml).value('(/event/data[@name="tran_count"]/value/text())[1]','int' ) tran_count,
CAST (event_data as
xml).value('(/event/data[@name="command_count"]/value/text())[1]','int' )
command_count,
CAST (event_data as xml).value('(/event/data[@name="latency"]/value/text())[1]','int'
) latency
FROM sys.fn_xe_file_target_read_file
('C:\cdc_session_*.xel','C:\cdc_session_*.xem', null, null)
This capture results in a
complete table of all log scan sessions that happened during a workload. It
also contains the maximum latency. We also calculated the maximum throughput
from these numbers. (For throughput, we measured commands per seconds, not
transactions per seconds. To calculate change data capture throughput, we only
considered the time period when change data capture ran under full load. This
is the period where latency was > 5 seconds. We calculated the number of
commands processed and divided it by the time from the start of the first scan
that had a latency > 5s to the end of the last scan. For instance, if change
data capture finished exactly 1000 seconds after it reached a significant
latency and processed 1,000,000 commands in that time, the throughput would be
1,000 commands per second. This number is a good indicator on how many commands
change data capture can keep up with a given workload.)
To determine the influence of change data capture on the
workload performance, we simply used the run time of the workload for the
ISV application workload (which is recorded by the application itself in a
table). For the synthetic workloads that have a fixed run time of 500 seconds,
we recorded the number of completed transactions in the workload.
To determine the additional load change data capture incurs
on the system, we recorded a number of performance counters in the Windows®
operating system and SQL Server dynamic management views. The following
performance counters proved useful for measuring the influence of change data
capture on the system:
- Logical Disk:
Average Disk Queue Length for Data Disk (M:), Log Disk (L:) and CDC Filegroup
Disk (P:)
- Databases:
Transactions/second for the application database and for tempdb
- Processor: %
Processor time (Total)
To gather data about how big
the additional load on data and log files was ,we used the dynamic management
view sys.dm_io_virtual_file_stats. For information about whether change data
capture incurs additional waits to the system, we used the dynamic management
view sys.dm_os_wait_stats, which was reset before each workload run by using
the command DBCC SQLPERF("sys.dm_os_wait_stats" , CLEAR). We also
restored the application database and used DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS
to start each test run with the same baseline (by ensuring all caches were
cold).
>Considerations for Change Data Capture Performance
>Influence of Scan Job Parameters on Change Data
Capture Performance
As explained in Configuration
of the Capture Job, the capture job with default parameters (maxscans=10, maxtrans=500, and pollinginterval=5)
cannot, even in theory, process more than 1,000 transactions per second, on
average.
.gif)
Figure 2: Latency buildup
with default scan job parameters (maxscans=10,
maxtrans=500, and pollinginterval=5)
In practice, we found that with the ISV application workload
about 670 transactions per second were processed, with drops when very big
transactions needed to be processed. As a result, the total latency until all
transactions were processed by change data capture was 3123 seconds. The
latency started to build up as soon as the workload reached more than 670 transactions
per second, as shown in Figure 2.
To increase the number of transactions per second change
data capture can handle and to reduce latency you can do one of the following
three things:
- Increase the number of transactions per scan by
increasing maxtrans.
- Increase the number of scans before a pause by
increasing maxscans.
- Decrease the pause between scanning cycles by
decreasing pollinginterval.
Table 2
shows the results achieved by changing these parameters.
Table 2: Change data capture
performance with different scan job parameters (ISV workload)
All results shown in Table 2 were taken using the ISV
application workload with all 603 columns in seven tables captured by change
data capture, net changes disabled, and change tables on the same filegroup as
the application data. We also measured with change tables on a different
filegroup on a different LUN and found no significant difference with this
workload and hardware.
By only modifying scan job parameters, we were able to
reduce the latency from above 52 minutes to about 9 minutes (first green
column) and the throughput to 1,900 commands per second (dark yellow column).
The influence on workload run time was below 10% (first blue column). Notice
that workload run time without change data capture already has differences of
±5%. This can also explain the unexpected good workload run time results in the
tests with a low polling interval, which did not exist in a second test run.
Figure 3 shows Transactions/s and latency for a test with maxtrans=5000, maxscans=100, and pollinginterval=5.
Notice that both the time axis (bottom) and the latency axis (right) have a
different scale than in Figure 2. With these changed parameters, the latency is
much smaller, and latency does not build up until after 35 minutes of workload
run time, because change data capture can now keep up with higher transaction
loads.
.gif)
Figure 3: Latency buildup with changed scan job parameters
(maxscans=100, maxtrans=5000, and pollinginterval=5)
Our results show that change data capture has a very low
impact (below 10% change in workload run time) on the application workload and
can therefore be used without negatively impacting application performance on a
system with sufficient I/O capacity. On systems with a less optimal I/O
subsystem, especially for the log disk, results can be different. In our tests,
average disk queue length for the log disk tripled for most tests, the number
of log bytes written grew to 250% of normal, and the log disk had significant
read activity with change data capture. Other disk-related counters not shown
in the table also grew significantly, such as stall read and write time and
data disk writes.
In most test cases, change data capture incurred only a very
small increase in processor usage. The only exceptions were the tests where pollinginterval was reduced to 0 (so no
waits between scan cycles happened).
The tests prove that even a very big increase of maxtrans has no a significant negative
impact on application performance. We also found that from a performance
perspective, there was little difference between increasing maxscans or maxtrans and decreasing pollinginterval.
Other test series on the application workload showed a slight advantage in
increasing maxtrans over maxscans. But setting maxtrans very high leads to long scan
cycle durations (we measured scan cycles up to 390 seconds for maxtrans = 500,000) and huge numbers of
changed rows written to the change table in one transaction, which may hold a
large number of locks and thus negatively impact reads from the change tables.
On the synthetic workloads, the workload 2 (small inserts to
narrower table) showed that setting maxtrans
too high (50,000 and 500,000) or setting pollinginterval
too low (0) can decrease performance (lower transactions/s for the workload,
and higher latency for change data capture).
.jpg)
Table 3: Change data capture performance for synthetic workload 2
The other synthetic workloads didn’t show similar behavior.
Recommendation:
If change data capture with default parameters cannot keep up with the workload
and latency becomes too high, you can increase maxscans and/or maxtrans
by a factor of 10, or you can reduce pollinginterval
to 1. If your latency decreases but is still too high, you can further increase
maxtrans, but monitor the performance
of your workload, latency and performance of queries to the change tables
closely.
>Influence of Workload Characteristics on Change
Data Capture Performance
For planning a change data capture solution, workloads
characteristics are very important. The main factors to consider are
INSERT/DELETE vs. UPDATE, and whether the DML operations impact one row per
transaction vs. many. To compare INSERT and UPDATE operations, we compared
workload 1, which inserts one row per transaction into a table, with workload
3, which updates one row per transaction in the same table. For more
information about these workloads, see Test Workload and Test Environment.
.jpg)
Table 4: Comparison between INSERT and UPDATE transactions
SQL Server is able to maintain a much higher rate of UPDATE
transactions than INSERT transactions (about 5x transactions per second). For
change data capture, one INSERT only creates one row in the change table. One
UPDATE, however, inserts two rows in the change table. In total this means that
the UPDATE workload has to insert about 10 times the number of rows into the
change table that the INSERT workload does, which leads to the significant
latency that shows in Table 3.
Workload performance (number of transactions in 500 seconds) reduction with
change data capture was between 3% and 14% for the INSERT workload and between
10% and 17% for the UPDATE workload. This is a very low overhead for the amount
of work that needs to be done because of change data capture.
A common scenario in applications is that a row is inserted
to a table and the immediately updated to fill missing fields. This can happen
in the application or by utilizing an INSERT trigger that looks for missing
fields and then updates them if necessary. We tested change data capture in
this scenario. The table that workloads 1 and 4 insert into has an INSERT
trigger that checks if one column is filled and updates it with another
column’s value if it is NULL.
.jpg)
Table 5: Change data capture
performance when inserted rows are immediately updated
In synthetic workload 1, we filled this column in the INSERT
command so no update would happen in the trigger. In synthetic workload 4, the
column value was not filled, so the trigger updated the row immediately after
insert (within the same transaction). This resulted in three rows instead of
one row that needed to be written to the change table for each insert to the
original table. The results show that both workload and change data capture
performance are better when no update happens immediately after insert.
Recommendation: Try to avoid scenarios where a row needs to be
updated immediately after insert.
Workload 6 (large updates, 10-2,000 rows updated per
transaction) showed no significant difference in performance or latency
independent of change data capture parameters. The reason for this behavior is
that with small amounts of big update transactions, change data capture has to
insert two rows to the change table for every row that was updated in the
source table. So with this workload, one transaction leads on average to 2,000
inserts to the change table. The time this large insert takes is so high that
the scan job parameters do not have a significant impact any more. Even with
default parameters, change data capture has to insert 1 million rows to the
change table in one scan cycle. The latency here was only determined by the
amount of data that needed to be inserted to the change table in total. Latency
grew up to 13,000 seconds (with all columns captured) or 7,000 seconds (with
reduced number of columns captured) for an 800-second workload that updated
about 18 million rows.
Recommendation: Try
to avoid using change data capture to capture changes to tables that have
frequent large update transactions.
Workload 7 (large inserts, 1,000 rows inserted per
transaction, about 1.7 million rows inserted in 300 seconds) also showed no
significant difference in performance or latency independent of change data
capture parameters. The latency was at about 250-399 seconds, depending on net
change support, the number of columns captures with change data capture, and
scan job parameters. This means that change data capture took about the same
time to insert the rows into the change table after the workload finished as
the original workload took.
>Influence of sys.sp_cdc_enable_table Parameters on
Change Data Capture Performance
The application workload has
change data capture enabled on seven tables with 603 columns total. When
studying the change tables we found that only 171 columns had more than one
distinct value. The other columns where either NULL or had always the same value.
In a second test run, we decided to capture only the 171 columns that had
different values. We did this by specifying the @captured_column_list parameter in sys.sp_cdc_enable_table. We found that the decreased number of
columns captured made a huge difference in overall change data capture
performance, especially when specifying nondefault scan job parameters.
.jpg)
Table 6: Change data capture
performance in dependency of data amount captured
By reducing the number of columns
captured to the ones that have relevant insertions or updates, we were able to
reduce latency to below 8 minutes for the overall workload and at the same time
reduce the workload runtime and reduce the load on the I/O subsystem incurred
through change data capture. For the synthetic workloads, we saw similar
improvements in Transactions/s and reductions in latency when the number of
captured columns was smaller..gif)
Figure 4: Latency buildup with reduced number of columns
captured (maxscans=100, maxtrans=5000, and pollinginterval=5)
Figure 4, which uses the same scan job parameters as Figure 3,
shows that with a reduced number of columns captured, not only is the latency
lower and the performance is better, but the latency also shrinks between
minute 36 and minute 43.
Recommendation: Always limit the list of columns captured by change data capture
to only the columns you really need to track by specifying the @captured_column_list parameter in sys.sp_cdc_enable_table.
The @supports_net_changes
parameter can have significant influence on change data capture performance.
Especially if change data capture is just able to keep up with a workload, the
additional load that is incurred by maintaining the additional index to support
net changes queries can be enough to prevent change data capture from keeping
up with the workload. The following table shows performance with and without
supporting net changes queries for the synthetic workload 4 (Insert with
trigger that updates the row immediately after insert).
.jpg)
Table 7: Influence of the @supports_net_changes parameter
Recommendation: If
you do not require support for net changes, set @supports_net_changes to 0. If you do require querying for net
changes but change data capture latency grows too big, it can be worthwhile to
turn support for net changes off and do the net change detection later in a
staging database.
>Cleanup Job Considerations
To test cleanup performance, cleanup was run manually after
the application workload finished. In the test, one change table that contained
4,147,855 rows was cleaned completely with varying threshold parameters by
running sys.sp_cdc_cleanup_change_table.
The configurable threshold value limits how many entries are deleted in any
single statement. The test was done after querying the change table (by running
SELECT count(*) FROM change_table_name) to simulate the scenario when a change
table is first read and then cleared.
When the cleanup job was run without additional workload, it
showed that an increase of threshold up to 500,000 could improve overall
cleanup performance. Improving it further to 5,000,000 (so the whole table
would be cleaned up in one transaction) lowered the performance, as shown in
Figure 4.
.gif)
Figure 4: Cleanup job run
time depending on threshold
When the cleanup job was run in parallel with a workload
(workload 5: mix of insert, update, and select transactions), the time to run
cleanup grew significantly. This test was done with the following change data
capture job parameters: maxtrans=5000,
maxscans=100, and pollinginterval=5.
.gif)
Figure 5: Cleanup job run time with concurrent workload
Again, a threshold of 500,000 is a sweet spot for this
workload. Figure 5 also illustrates the average transaction run time while the
cleanup job runs. Notice that especially with big thresholds, a lock escalation
on change tables can happen. This could degrade the application response time,
and it could increase the latency of the change data capture scan job.
Recommendation:
If possible, run cleanup when there is no other workload active. Test
increasing the threshold parameter until you find a sweet spot for your
workload.
>Transaction Log File Considerations
One of the most important things to watch out for with
change data capture is the transaction log I/O subsystem. As stated earlier,
log file I/O significantly grows when change data capture is enabled in a
database. In addition to that, log records stay active until change data
capture has processed them. This means that especially in environments where a
large latency builds up, the log file can grow significantly because the log
space cannot be reused as long as the change data capture scan job has not
processed the log records, even in simple recovery model, or even after a log
backup in full recovery model.
It should be noted that change data capture works with all
recovery models. But when change data capture is enabled, operations that would
normally be minimally logged in simple or bulk-logged recovery models are fully
logged to enable change data capture to capture all changes.
Be aware also that when a log disk becomes full, you cannot
shrink the log file by backing it up and manually shrinking it until change
data capture has processed all transactions. But change data capture cannot
process the transactions when the log disk is full, because change data capture
writes to change tables are logged operations. In this case, the easiest way to
recover from this situation is to temporarily add another log file on a
different disk.
Recommendation: Whenplanning change data capture
architecture, take a significant increase in log size and log volume I/O
operations into account. Depending on the amount of data captured by change
data capture and the time change data capture needs to catch up with changes,
the log file size can grow to 200-300% of the original size, in some cases even
more. Size the log file accordingly. Make sure that the growth in log file size
does not result in a completely full log disk.
>Filegroup Considerations
We found no
difference in our tests between having the change table on the same filegroup as the application table and having them on
different filegroups. This might be different when the data files are located
on an I/O subsystem that is already under heavy load. In this case, putting
change tables in a filegroup that is located on a different set of physical
disks can improve change data capture performance.
Recommendation: To keep the PRIMARY filegroup small and to have a clear distinction
between application data and change data, you should specify @filegroup_name in sys.sp_cdc_enable_table.
In addition to the change tables, the table cdc.lsn_time_mapping can also grow to a
significant size and become a target of many I/O operations. This table is created
on the default filegroup when sys.sp_cdc_enble_db
is executed on a database.
Recommendation:
Consider changing the default filegroup for the database before you execute sys.sp_cdc_enble_db, so that change
data capture metadata and especially cdc.lsn_time_mapping
are located on a different filegroup than PRIMARY. You can change the default
filegroup back after the change data capture metadata tables are created.
>Conclusion
Change data capture provides an easy and high-performing way
to capture changes in a set of tables. There are many ways to tune the
performance of change data capture. Understanding workload characteristics,
system I/O usage, and allowable latency is key to tuning change data capture
performance without negatively impacting the base workload. Tuning the scan job
parameters, sys.sp_cdc_enable_table
parameters, and, if possible, queries in the workload can significantly improve
change data capture performance under load.
Here is a summary for the recommendations in this white
paper.
Storage:
- Whenplanning
change data capture architecture, take a significant increase in log size and
log volume I/O operations into account.
- Consider specifying a filegroup in sys.sp_cdc_enable_table.
- Consider changing the default filegroup for the
database before you execute sys.sp_cdc_enble_db so that change data capture
metadata and especially cdc.lsn_time_mappings are located on a different
filegroup than PRIMARY.
Workload behavior:
- Try to avoid scenarios where a row needs to be
updated immediately after insert.
- Try to avoid using change data capture to
capture changes to tables that have frequent large update transactions.
Change data capture parameters:
- Always reduce the list of columns captured by
change data capture to only the columns you really need to track.
- If you do not require support for net changes,
set @ to 0.
- Use
to see whether change data capture can keep up with your workload.
- If change data capture cannot keep up with your
workload, modify scan job parameters and restart the scan job.
Cleanup:
- If possible, run cleanup when there is no other
workload active.
- Test increasing the threshold parameter until
you find a sweet spot for your workload.
>Appendix A: Test Hardware and Software
>Database Server
HP DL 580
- 4 socket dual core
- Intel Xeon 3,4 GHz
- 32 GB RAM
- Windows Server® 2003 Enterprise x64 Edition with
Service Pack 2 (SP2)
>Application Server (for an ISV Application)
HP BL 460
- 2 socket quad core
- Intel Xeon 2,83 GHz
- 32 GB RAM
- Windows Server 2003 32bit with SP2
>Storage
HP EVA SAN with 2 disk groups:
- Disk group 1 with 152 disks 300 GB each @15,000
RPM, RAID1+0. Disk group 1 contained separate logical volumes for the
following:
- Database data files for the source tables
- Database data files for the change tables
- Data files for the database
- Backups
- Disk group 2 with 88 disks 72 GB each @15,000
RPM, RAID1+0. Disk group 2 contained:
>Software
SQL Server 2008
For More Information:
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
http://www.sqlcat.com/: SQL Server customer advisory team
Did this
paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to
5 (excellent), how would you rate this paper and why have you given it this
rating? For example:
- Are
you rating it high due to having good examples, excellent screen shots, clear
writing, or another reason?
- Are
you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback
will help us improve the quality of white papers we release.
Send feedback.