by John Welch, SQL Server MVP and Chief Architect at Mariner. You can view John’s blog at http://agilebi.com/cs/blogs/jwelch/.
Introduction
It is important to have a defined approach for handling
configurations in Integration Services packages. Configurations are important
to making packages portable - being able to deploy them in multiple
environments without making changes to the package file itself. Developing a
consistent approach is also important because it makes maintenance simpler, and
supports multiple developers much more easily.
Considerations for Choosing a Configuration Approach
There are several important aspects in defining an approach
to configurations.
1. Understand the advantages and limitations of each configuration type.
Books Online includes details about each type of configuration (http://msdn.microsoft.com/en-us/library/ms141682(SQL.100).aspx).
One limitation has a direct effect on the configuration approach - Parent
Package Variable configurations are always applied later than the other types.
This means that a Parent Package Variable configuration cannot be used to set
values that other configurations depend on.
2. Some configuration types are excluded by environmental factors. For
example, in some environments, the people responsible for the servers may not
allow new environment variables to be created on them. This eliminates
environment variable configurations from consideration. It also prevents the
use of indirect configurations, since those rely on environment variables as
well. If the deployment environment does not include a SQL Server relational
instance, then SQL Server configurations cannot be used. Sometimes, it may be
worth seeking an exception to such restrictions, as they add complexity to the
configuration solution.
3. How the packages will be run? Will the packages be run as single,
standalone packages? Or will multiple packages be run together as a unit, using
a parent package to call multiple child packages? If the packages are to be run
as single units, the configuration approach needs to avoid the use of parent
package variables. In some cases, the packages may need to be capable of
running both standalone and as a unit, which can require special handling.
4. How will the location of the configurations be specified? With SQL
Server configurations, Integration Services needs to know which database holds
the configuration table. With an XML configuration file, Integration Services needs
to know the path to the XML file. These values are stored in the package, but
they often need to be updated or overridden when the package is deployed.
5. Which properties are going to be grouped in a single configuration file?
An Integration Services configuration container can store multiple property
settings. For example, an XML configuration file can hold connection string
values for two separate connection managers. When the configuration is applied,
SSIS will attempt to set the connection strings for all connection managers
specified in the file. The ConfigurationFilter value serves the same purpose of
grouping property settings together for SQL Server configurations. If you
attempt to apply a configuration to a package, and the configuration contains a
property that does not exist in the current package, a warning will be
generated. For this reason, you should only include multiple property settings
in a configuration file or group when you are confident that any package that
is using the configuration will have all the properties defined.
6. How often will the configurations need to be changed? Will the
actual configuration need to be changed dynamically when the package is
executed, for example, by using the /CONFIG, /CONN, or /SET switch of DTEXEC?
Or will the configuration be consistent whenever the package is executed in a
given environment?
Understanding How Integration Services Applies Configurations
Integration Services applies configurations with a “last in
wins” approach. The last configuration applied will overwrite any previous
values. However, it will only apply the property settings that are in the
configuration container, which means that two configuration files or groups can
overlap. Configurations will always be loaded in the order specified in the
Package Configurations dialog, with the exception of Parent Package Variable
configurations, which are always applied after the other configuration types.
SQL Server 2005 Integration Services
When you execute the package in SQL Server 2005,
configurations are applied in this order:
1. The package file is loaded.
2. The configurations specified in
the package at design time are applied in the order specified in the package
(except for Parent Package Variables).
3. Any options specified from the
command line are applied. Any configurations specified on the command line
overwrite the current values; however, they do not trigger a reload of the
configurations if a changed value impacts a configuration dependency. For
example, if the connection string used for SQL Server configurations is updated
from the command line at run time, the package will still retain any other
values from the design-time SQL Server configuration database.
4. Parent Package Variable
Configurations are applied.
5. The package is run.
This order can create a scenario that is difficult to
resolve with Integration Services configurations. If you use XML - based
configurations, you need to specify the path to the configuration file. If you
are using SQL Server configurations, you need to specify the initial connection
string to the database that holds the configuration table. Ideally, you would
be able to override these values by using the command line switches of DTEXEC.
However, since configurations are applied before values specified on the
command line, this does not work. For example, if you specified a new
connection string for the configuration database at the command line, the SQL
Server configurations would be applied from the original connection string.
Then the connection string from the command line would be applied. However, the
SQL Server configurations would not reload, so there is effectively no way to
override the configurations from the command line.
SQL Server 2008 Integration Services
This behavior has been changed in SQL Server 2008.
Configurations are applied twice - once before the command line options, and
then again after applying the command line options. The configurations are
applied in this order:
1. The package file is loaded.
2. The configurations specified in
the package at design time are applied in the order specified in the package
(except for Parent Package Variables).
3. Command line values are
applied.
4. The configurations specified in
the package at design time are reloaded in the order specified in the package
(except for Parent Package Variables). The configurations will use any updated
values from the command line when they are reloaded, so it is possible to
change the connection string for a SQL Server configuration.
5. Parent Package Variable
Configurations are applied.
6. The package is run.
This is an improvement over the behavior in SQL Server 2005.
However, there is still a potential problem with the new functionality. Since
the design time configuration is applied twice, it can overwrite a value
specified on the command line. This means that you cannot override a configured
value from the command line without also overriding the design time configuration
location.
Parent and Child Packages
Command line options are not passed from a parent package to
a child package. So it is not possible to override the configuration location
on a child package by using a DTEXEC command line switch to override it on the
parent.
Sample Approaches to Configurations
Now that the various aspects of configurations have been
covered, we will review two approaches to configurations. These approaches
should cover the most commonly encountered scenarios.
SQL Server Approach
Advantages and Disadvantages
The first approach uses SQL Server as the primary
configuration location. It uses an environment variable to get the initial
connection string for the SQL Server database. The advantages of this approach
are:
- The package can run independently, as there is
no dependency on parent package variables or command line options to specify
the primary configuration location. The location is handled by the environment
variable.
- This approach can also serve as a base for
parent-child packages. The core configuration items can be handled through the
environment variable and SQL Server configurations, with remaining items passed
as Parent Package Variable configurations. However, the packages can still be
run in a standalone manner easily.
- Multiple configurations can be stored in the
same SQL Server database, so this approach only requires a single environment
variable to contain the connection string. If we were using XML configurations,
and required multiple configurations, we would have to create an environment
variable for each XML configuration file.
The disadvantages of this approach are:
- It requires an instance of SQL Server and an
environment variable. If these are prohibited in the deployment environment,
this approach is not usable.
- Users of the computer are not protected from
seeing the values of environment variables, so this approach should only be
used if you are using Windows Authentication to connect to the SQL Server
database for configurations. Using Windows Authentication will prevent you from
having to store the password in the environment variable.
- Setting up a system environment variable
requires system administrator privileges.
Setting Up the Environment Variables
To set up this approach at design time, follow these steps:
1. Before opening the Business
Intelligence Developer Studio (BIDS) environment, create a system environment
variable named SSIS_ROOT to hold the connection string for the SQL Server
database. Enter the connection string into the value for the variable. Directions
for creating a system environment variable can be found here: (http://technet2.microsoft.com/windowsserver/en/library/4029b464-c281-49af-84d5-6cd092e61a651033.mspx?mfr=true).
This needs to be done before opening BIDS, since it caches the environment
variables when it is first opened. It should also be created as a system
environment variable, so that all users of the computer will have access to it.
2. Open BIDS and create a new
Integration Services project. Open the newly created package.
3. Create an OLE DB connection
manager named Configuration that points to the SQL Server database that will be
used to store configurations in the development environment.
.jpg)
Figure 1: Creating an OLE DB connection manager for SQL Server
configurations
4. Open the Package Configurations
dialog box by choosing Package Configurations from the SSIS menu.
.jpg)
Figure 2: Selecting the Package Configurations option on the SSIS menu
5. Select the Enable package
configurations and add a new Environment Variable configuration that points to
the SSIS_ROOT environment variable. Select Next to continue.
.gif)
Figure 3: Specifying an environment variable in the Package
Configuration Wizard
6. Select the Configuration
connection manager’s ConnectionString property. Select Next to continue.
.gif)
Figure 4: Select the target property in the Package Configuration
Wizard
7. Give the configuration an
appropriate name, and select Finish to complete the wizard.
Adding the Configurations
Now the package is prepared to have SQL Server
configurations added to it. The process below should be repeated for each set
of configurations that you want to store for the package.
1. Add a new SQL Server
configuration that uses the Configuration connection manager. Note that you can
create the configuration table in the database if it does not already exist.
Set the configuration filter to an appropriate value. Select Next to continue.
.gif)
Figure 5: Setting up SQL Server configurations in the Package
Configuration Wizard
2. On the next page, select the
appropriate properties that you would like to configure for your package. The
values selected here will be stored in the SQL Server table, with the same
Configuration filter value, so that they will be treated as a unit. Click next
to continue.
3. Give the configuration an
appropriate name, and select Finish to complete the wizard.
Reviewing the Configurations
The Package Configuration dialog should look similar to the
following image, with the first configuration referencing the SSIS_ROOT
environment variable, and subsequent configurations referencing SQL Server:
.gif)
Figure 6: Review
configurations in the Package Configurations Organizer
Deploying the Package
To deploy this package to a different environment, follow
these steps:
1. Deploy the package files to the
new environment, either by copying the files or by deploying the packages to
the MSDB database in SQL Server.
2. Create the SSIS_ROOT environment
variable in the new environment and set the value to the appropriate connection
string for that environment.
3. For an initial deployment, copy
the configuration table to the configuration database for the new environment.
Update the Configured Values stored in the table to reflect the appropriate
values for the new environment. For subsequent deployments, only new or updated
values will need to be adjusted.
4. At this point, the packages
should be ready to run in the new environment, and to use the configured values
from the new environment.
Please note that while the above approach uses a separate
environment variable configuration to get the initial connection string for the
SQL Server, it is possible to use indirect configurations to achieve the same
goal. However, both approaches are functionally equivalent, and the approach
used above makes it clearer where the initial connection string is coming from.
Parent Package Variables Approach
Advantages and Disadvantages
The parent package variable approach uses the ability of
Integration Services to pass values between packages. It requires a parent
package that retrieves all the configuration values, and then calls the child
packages. The child packages have only Parent Package Variable configurations
defined. The advantages of this approach are:
- It does not require the use of environment
variables, SQL Server, or XML files. Everything is driven from the parent
package.
- The Parent Package Variable approach is often
the most flexible if the packages will be deployed to a large number of
environments, where you cannot be sure what options for configurations might be
available. Changes based on the external environment affect only the parent
package.
- All of the values in the parent package can be
set from the command line. These can then be passed to child packages through
the Parent Package Variable configurations. For example, you could use an XML
configuration (specified at run time through the /CONFIG switch of DTEXEC) to
set all the values in the parent package. The file from the /CONFIG switch
won’t be passed automatically to the child packages, but the values from the
parent package will be passed.
The disadvantages of this approach are:
- The packages are typically not usable in a
stand-alone manner. The package can be run with the default values stored in
the package, but this may not be adequate in multiple developer environments,
or if the packages need to be run standalone in test or production
environments. One option to work around this is to provide a generic parent
package which can be used to execute any child package with the proper values.
- It does not blend well with other configuration
types. Since Parent Package Variables are always applied last, they will
overwrite any values from other configurations. They also cannot be used to
override the initial connections for other configuration types, as they are
applied after the other configurations have already been loaded.
Setting Up the Configurations
To set up this approach at design time, follow these steps:
1. Create a parent package. Define
a variable in the parent package for each value that you want to configure in
the child packages.
.gif)
Figure 7: Setting up variables in the parent package
2. Add the appropriate logic to
the package to set the variable values to the desired values. This could be by
using other configuration types to set the variables, by using a Script Task to
set the values by reading from a non-SQL Server database, or by hard coding the
values into the packages. For this example, we will add an XML configuration
that holds the value for each variable.
.gif)
Figure 8: Setting up XML configurations in the Package Configurations
Organizer
3. Add a connection manager and an
Execute Package Task for each child package that needs to be called.
4. In each child package, enable
package configurations. Add a Parent Package Variable configuration. Specify
the name of the parent variable that you want to use in this package. Click
Next to continue.
.gif)
Figure 9: Setting up Parent Package Variable configurations in the
child packages
5. On the properties page, pick
the appropriate package property that should be set from the parent variable.
Click next to continue.
6. Give the configuration an
appropriate name, and select Finish to complete the wizard.
Deploying the Packages
To deploy these packages to a different environment, follow
these steps:
1. Deploy the package files to the
new environment, either by copying the files or by deploying the packages to
the MSDB database in SQL Server.
2. Copy the XML configuration file
to the new environment. Update the variable values in the XML file to reflect
the appropriate values for the new environment. For subsequent deployments,
only new or updated values will need to be adjusted.
3. Run the parent package from
DTEXEC and use the /CONFIG switch to specify the location of the XML
configuration file in the new environment. The packages should run, and use the
updated values from the configuration.
Recommendations for Specific Scenarios
If the deployment environment supports environment variables
and has a SQL Server database available, the SQL Server approach discussed in
this article should be used. If a SQL Server database is not available, then
XML files with indirect configurations (based on environment variables) can be
used in a very similar way.
If the deployment environment does not allow environment
variables, but a consistent path can be provided in each environment, then XML
configurations can be used. The path should be identical in each environment
(for example, “C:\SSIS_Config\”), so that a local copy of the configuration
file can be deployed to each environment without having to modify the path to
point to a new configuration location.
If environment variables are not allowed, and a consistent
local path cannot be used, the Parent Package Variable approach is often the
best approach, as it isolates any environmental changes to the parent package.
This is not an exhaustive list of the possibilities, but
these are approaches that have worked well for me in a variety of situations.
Conclusion
This article has described two sample approaches to
configurations in SQL Server Integration Services. In addition, it has
presented many of the factors that must be considered before selecting a
configuration approach. The samples here are usable as they are, but in many
cases, you will want to modify or blend multiple approaches. Integration
Services configurations are very flexible, and, used properly, can be a
powerful way to make packages more portable and reusable.
About the author. John Welch is Chief Architect with
Mariner, a consulting firm that specializes in enterprise reporting &
analytics, data warehousing and performance management solutions. John has been
working with business intelligence and data warehousing technologies for 6
years, with a focus on Microsoft products in heterogeneous environments. He is
a Microsoft Most Valued Professional (MVP), an award given due to his
commitment to sharing his knowledge with the IT community. John is an
experienced speaker, and has given presentations at Professional Association
for SQL Server (PASS) conferences, Software Development West (SD West),
Software Management Conference (ASM/SM), and others. John has also been
published in DM Review, SQL Server Professional, and XML Developer.