Designing Your SSIS Packages for Parallelism (SQL Server Video)
Technical notes are selected excerpts from the video.
The following are the tenets of parallel design: partition the problem, eliminate conflicts, and schedule efficiently.
The idea of parallel design is to break a large problem into smaller, independent pieces and spread the work out. For Integration Services, breaking the problem into smaller pieces means partitioning the data to be processed. You’ll want to try to make these partitions equal in size, as this will make optimal scheduling and distribution much easier.
Next, you need to eliminate conflicts between the smaller pieces of work, so that they can happen in parallel without interfering with each other. You’ll want a stateless design, which means each unit of work is self-contained and does not need to coordinate with anything outside to do its job. You’ll also want to reduce contention for external resources.
Finally, you need to take the small pieces of independent work and spread them out to get them done faster. This means scheduling and distributing the work so that your most critical resources are used efficiently. It also means using time wisely and not letting a long task dominate the runtime. If you imagine a Gantt chart of the work to be done, you want the workload balanced so that the entire set of work finishes as early as possible.
Partitioning the Problem
You’ll want to partition your source data into smaller chunks of about the same size. You can do this based on natural ranges in the data, such as time or geography. Or, if you have an identity column, you can take its values modulo the number of partitions to assign rows to partitions of equal size. Otherwise, you can fall back on applying a hash function to key columns to produce the partitions.
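As an illustration, the modulo and hash approaches can be sketched in Python. The function names and partition counts here are illustrative, not part of Integration Services; md5 is used only because it is stable across runs, unlike Python’s built-in hash().

```python
import hashlib

def modulo_partition(identity_value: int, num_partitions: int) -> int:
    """Assign a row to a partition by taking its identity value modulo the partition count."""
    return identity_value % num_partitions

def hash_partition(key: str, num_partitions: int) -> int:
    """Assign a row to a partition by hashing a key column.

    A stable digest (md5) is used so the assignment is the same
    across runs and machines, unlike Python's built-in hash().
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Either scheme spreads rows roughly evenly across partitions; in SQL Server you would express the same logic in the WHERE clause of each partition’s source query.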
In addition to partitioning the source table, you’ll want to partition destination tables to match the partitioning of your source data. The Transact-SQL SWITCH command (ALTER TABLE ... SWITCH) provides a very efficient mechanism for adding partitions to a table and removing them from it.
Eliminating Conflicts
After partitioning the problem, you need to eliminate any potential conflicts between those partitions. Having a stateless design is important to avoid complex interactions and eventual conflicts. To keep the package stateless, pass into it all the information it needs to do its work.
Be sure to avoid lock contention. Many connections inserting into the same table will eventually cause contention. To avoid this, use partitioned tables and take advantage of the SQL SWITCH.
Watch out for contention on any controller hardware. If disk I/O is an issue, move to more or faster drives. If network I/O is an issue, add or upgrade network controllers. If CPU or memory usage is an issue, run on a computer with more processors or more memory, or run across several computers. And remember, Integration Services is designed to run in memory, so be sure each package has enough memory.
Scheduling Efficiently
After partitioning into smaller tasks and eliminating conflicts between those tasks, you’ll want to schedule those tasks to run efficiently. To create an efficient schedule, start by defining a priority queue of the work to be done. A SQL table is great for this purpose.
Then, start multiple copies of the package you have built to do the work. Using the Windows START command to invoke dtexec.exe is a simple and effective way to do this. The number of packages you start determines the degree of parallelism you will use.
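A minimal sketch of starting several identical workers, here in Python with subprocess standing in for the Windows START command (the function name and the package file name in the comment are illustrative):

```python
import subprocess

def start_workers(command: list, degree_of_parallelism: int) -> list:
    """Launch several copies of the same worker command and wait for all of them.

    The number of copies started is the degree of parallelism.
    """
    processes = [subprocess.Popen(command) for _ in range(degree_of_parallelism)]
    for process in processes:
        process.wait()
    return [process.returncode for process in processes]

# For SSIS, the command would be the dtexec invocation, for example:
#   start_workers(["dtexec", "/F", "WorkerPackage.dtsx"], 4)
```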
Each of these packages will then process work from the task queue, based on priority, until there is no work left to be done. A loop inside the package first gets a task from the priority queue, then does the work defined by that task, and then repeats until the task queue is empty.
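The claim-and-repeat loop can be sketched as follows. Threads stand in for the separate package processes, and a queue.PriorityQueue stands in for the SQL task table; all names here are illustrative, not part of SSIS:

```python
import queue
import threading

def worker(tasks: queue.PriorityQueue, done: list, lock: threading.Lock) -> None:
    """Claim the highest-priority task, do its work, and repeat until the queue is empty."""
    while True:
        try:
            priority, task = tasks.get_nowait()
        except queue.Empty:
            return  # no work left to be done
        # The "work" here stands in for running the data flow for one partition.
        with lock:
            done.append(task)

def run_queue(task_list, degree_of_parallelism: int) -> list:
    """Process (priority, task) pairs with a fixed number of parallel workers."""
    tasks = queue.PriorityQueue()
    for item in task_list:
        tasks.put(item)
    done, lock = [], threading.Lock()
    workers = [threading.Thread(target=worker, args=(tasks, done, lock))
               for _ in range(degree_of_parallelism)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return done
```

The degree_of_parallelism argument plays the same role as the number of dtexec copies you start: more workers drain the same queue faster, until a shared resource becomes the bottleneck.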
In the first run of the demo, each task is being processed in sequence by a single package instance.
In the second run of the demo, two processes are working in parallel. The tasks get completed in batches of two, and the total demo run time drops by about half, from about 64 seconds to about 36 seconds.
In the third run of the demo, four processes are working in parallel. The time for individual tasks has risen from 9 seconds to about 14 seconds. And, the total run time has dropped from about 36 seconds to about 28 seconds.
In the last run of the demo, eight processes are working in parallel. As all tasks get worked on simultaneously, the time for each task has risen to about 27 seconds and the total run time is almost the same as with four processes. What’s happened here is that we’ve hit a disk I/O bottleneck as all eight processes contend with each other to read their data files from the disk. To solve this problem, we would want to spread the files across separate disks and controllers, or move to a faster disk technology.