by Rafael Salas, SQL Server MVP and Senior
Consultant at Mariner. You can view Rafael's blog at:
http://rafael-salas.blogspot.com/.
Introduction
With the 2008 release, SQL Server Integration Services
(SSIS) continues its advance in the enterprise data integration arena.
Integration Services offers an entire architecture that combines the required
elements for building solutions that provide high scalability and performance.
A well-architected data extraction, transformation and load
(ETL) system should be able to respond to changes in the environment or to
other external factors, ideally without editing the source code. Typical
examples of these changing factors are:
- Differences between development and deployment
environments. Typically, you develop SSIS packages on a client machine and then
deploy them to a server. These two environments are not necessarily identical,
as they may use different server names, connection strings, folders, and so
forth.
- Conditional logic that needs to be controlled
externally. For example, a package that performs either a full or incremental
processing of a SQL Server Analysis Services cube based on an unscheduled event
or upon user request.
- Execution of parameterized scripts or tasks. For
example, the WHERE clause in a SQL statement is based on date range that is
driven by variables.
Unfortunately, many times data integration practitioners
fail to bring such flexibility into their solutions, affecting the reliability
and the cost of maintenance and deployment of the system.
The good news is that SSIS comes with a range of options
that help you when building solutions that respond favorably to these
scenarios. These options are:
- The /SET option of the DTExec command prompt
utility.
- Property expressions.
- Package Configurations.
In essence, these three methods let you modify the values of
package properties (like connection strings, variable values, network drive
paths, etc) each time you run the package, without the need to edit the package
in Business Intelligence Development Studio (BIDS). Since the package
configuration is the most versatile, but perhaps the most complex of these
methods, the remainder of this article focuses on explaining the main concepts
and considerations for its implementation.
Package Configurations: What They
Are and What They Aren’t
When you add configurations to a package, you are basically
exposing package properties and allowing them to be updated with a new value
that comes from a file, table, environment variable, registry key or parent
package variable. Package configurations are disabled by default, and have to
be enabled on each package in which you want to use them. You can enable package
configurations by choosing the Package
Configurations option on the SSIS
menu as shown in Figure 1.
.jpg)
Figure 1. The SSIS
menu and the Package Configurations option.
After the Package
Configuration Organizer is visible, you have to check the box next to the Enable Package Configurations option.
See Figure 2 for more details.
.jpg)
Figure 2. The Package
Configurations Organizer.
From the Package
Configurations Organizer, you can launch the Package Configuration Wizard
to create and edit configurations; you can also remove and set the order in
which the configurations should be applied. The functionality of the Package
Configuration Wizard is covered in the following section.
Package configurations are available in Integration Services
in both SQL Server 2005 and 2008. While they offer identical options, they
differ in the order in which configurations are applied to the package at
execution time.
In SQL Server 2005 Integration Services, all package
configurations, except Parent Package Variable configurations, are applied
before applying the options specified with the DTExec command. Parent Package
Variables are applied after applying DTExec options.
In SQL Server 2008 Integration Services, package
configurations are applied twice, before and after applying the options of the
DTExec command prompt utility. This should be seen as an improvement, since you
can now use the /SET, /CONF or /CONN options of the DTExec utility to alter the
original definition of the configurations. For example, you can now use the /CONN
option to alter the connection manager being used in SQL Server configurations
- something that was not possible with the 2005 version.
Package configurations can fail when they are being
applied.When that happens warning messages are generated, the values available
inside of the package (design time values) are used, and the package is
executed. Including logging capabilities in your packages is a good way to
capture failures in package configurations.
You can save some development time by using package templates.
If you anticipate creating a large number of packages that use a common set of
package configurations, you can simply include the required package
configurations in a template, and then use the template when creating new
packages.
The data flow task in SSIS does not allow changes to the
pipeline structure at run time (number of columns, column names, and data
types), and package configurations should not be used to attempt such changes
in dataflow pipeline metadata, since it would cause validation errors in the
package.
Types of Package Configurations
There are five different types of configurations you can use
in SSIS, and while all of them serve the same purpose of updating the value of
package properties, their actual behavior and implementation differ from each
other. The package configuration types
are:
- XML configuration file
- Environment variable
- Registry entry
- Parent package variable
- SQL Server
Each configuration type, except for the Environment
variable, provides a direct and indirect method to provide the location of the
configuration information.
XML File Configurations
With this type of configuration, the configuration value and
the path of the property being configured are saved in a XML file. Here is a sample of a very simple XML configuration
file:
<?xml version="1.0" ?>
- <DTSConfiguration>
- <DTSConfigurationHeading>
<DTSConfigurationFileInfo GeneratedBy="MARINER\rsalas" GeneratedFromPackageName="Test XML Config - MSDN Article" GeneratedFromPackageID="{E108046D-9862-443B-B98A-4DCC342AE239}" GeneratedDate="6/29/2008 11:30:12 AM" />
</DTSConfigurationHeading>
- <Configuration ConfiguredType="Property" Path="\Package.Variables[User::ConfigTarget].Properties[Value]" ValueType="String">
<ConfiguredValue>Run time Value from config file - direct Method</ConfiguredValue>
</Configuration>
-
</DTSConfiguration>
An XML configuration file has two parts. The header contains
metadata about the file itself, like creator, the name and ID of the package
that was used when creating the file, and the creation date and time. The
configuration section is where the path to the properties being updated and the
configuration values to be used are stored.
In the example above, there is only one configuration entry, whose Path
attribute points to the Value property of a variable declared at the package
level called [User:: ConfigTarget]:
Path="\Package.Variables[User::ConfigTarget].Properties[Value]"
And the value of the variable will be updated using the
string contained in the ConfiguredValue element:
<ConfiguredValue>Run time Value from config file - direct Method</ConfiguredValue>
The good news is that you do not need to create the XML file
manually, since the Package Configuration Wizard can create it for you.
Let’s walk through an example to demonstrate how you can use
package configurations to change the value of variable.
First, create a new package, add a variable of string type
called ConfigTargetObject and assign the string “Development time Value” as its
value as shown in Figure 3.
.jpg)
Figure 3. Creating a
new package variable.
Then, enable package
configurations by going to the SSIS
menu, and choosing the Package
Configurations option. Then check the Enable
Package Configurations option after the Package Configurations Organizer opens, and click the Add button to open the Package Configuration Wizard as shown
in figure 4.
.jpg)
Figure 4. The Select
Configuration Type page of the Package Configuration Wizard.
On the first page of the wizard, you can choose the
configuration type, which in this case is an XML configuration file. Then you
have two options for providing the location of the configuration file.
- The first option is known as the direct method,
where the path to the file is stored inside of the package definition. This
option works fine if you don’t have to deploy the package to different
machines, or if the path provided is valid on every machine the package will be
deploy to.
- The second method is called indirect, where you
can pick the name of a Windows environment variable that would hold the path to
the configuration file. This method is ideal when you have to deploy packages
to different servers, and when the location of the XML file is subject to
change, since you would only need to change the value of the environment
variable where the package is deployed. Notice that the environment variable
must exist prior to creating the package configuration, and in some instances
you may have to close and re-open the Business Intelligence Development Studio
(BIDS) session in order to get the name of the environment variable in the
dropdown list.
For this example we have chosen the direct method.
In the next page of the wizard, look for the variable name
we created at the beginning of this example, and check the box next to the
Value property as illustrated in Figure 5.
.jpg)
Figure 5. The Select
Properties to Export page of the Package Configuration Wizard.
On this screen, you have the ability to select multiple
properties. This is because XML configuration files can contain multiple
configuration entries. Click the next button one more time to see a summary of
the configuration entry and provide a name as in Figure 6.
.jpg)
Figure 6. The
Completing the Wizard page of the Package Configuration Wizard.
When you press the Finish button, the configuration file is
created and you should be able to see a new entry in the package configuration
organizer as showed in Figure 7. Notice that the file is created by exporting
the values inside of the package at the time the wizard starts. This means that
you still have to edit the XML file and adjust the configuration values
according to your needs. In this example, we decide to provide “Run-time value”
as new value for the variable being configured. For that we open the file in a
text editor and change the line:
<ConfiguredValue>Development-time Value</ConfiguredValue>
To:
<ConfiguredValue> Run-time value from config file</ConfiguredValue>
.jpg)
Figure 7. The Package
Configurations Organizer after creating an XML configuration file.
Now that you have seen how to create an XML file
configuration, let’s list some important aspects that will help you in getting
the most out of this type of configurations:
- Consider using the indirect method whenever
possible, as you won’t need to edit the package if the location or name of the
configuration file change. With this method, the path and file name are stored
in an environment variable.
- If you find that creating environment variables
is too intrusive for your scenario, consider using the direct method. But be
aware that changes to the file location may trigger changes to the package, or
force you to usether options, such
asthe /CONF switch on DTExec command
line.
- You can override the configuration file path and
name from the DTExec utility command line by using the /CONFIGURATION option.
This option only affects the package being called from the command line; it
does not affect packages executed via an Execute package task from a parent
package.
- You can configure multiple properties inside of
a single XML file.
- Multiple packages can use the same XML
configuration file.
- The package configuration wizard can create the
XML file only when the direct method is used. If you decide to use the indirect
method, consider creating the configuration by using the direct method first,
and once the file is created, change the method to the indirect one.
- If you use the direct method and provide an
existing configuration file, the wizard prompts you to either reuse the
configuration information in the file, or to override it with new configuration
information from the current package. A message similar to the one in figure 8
is displayed.Make sure you understand
this behavior, as choosing the wrong option could change the content of the
configuration file and produce unexpected execution results.
.jpg)
Figure 8. The prompt
seen when reusing an existing configuration file.
- If you chose the indirect method, the wizard
will not display the ‘Select properties to Export’ page. The wizard assumes
that the configuration file already exists and contains all required
configuration information (target properties and values).
- When creating the file, the wizard does not
include any sensitive information (suchas passwords in connection strings) in
the configuration file, and you have to edit the configuration file to add such
information manually..
- Consider the adoption of strong security
policies to restrict access to the configuration files and their content, as
they may include highly sensitive data.
- The Package Configurations Organizer does not
show any value for Target Object and Target Property columns for XML file based
configurations, as the file may contain multiple configurations.
Environment Variable Configurations
With this type of package configuration, you have to create
an environment variable for each package property you intend to update, and
place the configuration value as the value of the environment variable. As you
can see in Figure 9, the indirect method is not available when use this type of
configuration; which it would not make too much sense as the indirect method is
based on an environment variables.
.jpg)
Figure 9. Creating a
new Environment variable configuration.
In the first page of the wizard, you have to choose the
environment variable to be used from a drop down list. Then, the next page,
shown in Figure 10, lets you choose which object property is going to be affected
by the configuration you are creating. Notice that selecting multiple
properties is disallowed for this type of configuration.
.jpg)
Figure 10. Setting a
single value for an environment variable configuration.
Now, let’s review some considerations you should keep in
mind when using this type of package configurations:
- The environment variable(s) must exist on all
machines where the package is going to be deployed.This is because the name of the environment
variable is stored inside of the package.
- You have to create an environment variable for
each configuration value that you need. This is afactor to consider when
setting multiple configurations is required. This type of package configuration
works better when combined with SQL Server configuration or XML file types.
- The configuration wizard does not create the
environment variables, and they are visible to the wizard only if they were
created prior to opening the current BIDS session.
- You can use either system or user environment
variables, but user environment variables are not visible to other users;
hence, the package has to be executed by the user that owns the environment
variable in order for it to work.
- Multiple package configurations from multiple
packages can reference the same system environment variable.
- The environment variable configuration type is
different than the indirect method available with other configurations types
(which are based on environment variables).
Registry Entry Configurations
This configuration type lets you store configuration values
in Windows registry entries in a similar fashion than environment variable
configurations do. Figure 11 shows the first page of the configuration wizard
when you select registry entry as the type to be used.
.jpg)
Figure 11. Creating a
new Registry entry configuration.
After you select a registry key configuration type from the
dropdown list, you have to choose the method to be used. The first option is
the direct method, where the wizard expects a valid registry key name that
exists under the Windows registry HKEY_CURRENT_USER key. The second option is
the indirect method, where you provide the name of an environment variable that
in turns contains the registry key name to be used by the configuration. The
indirect method gives you the flexibility to change the name of the registry
key or to pointto a different one by updating the environment variable value.
Let’s see a couple of examples of the value expected by the
package configuration wizard in the registry entry field. If you want to use a registry
key that exists directly under HKEY_CURRENT_USER, as in Figure 12, the expected
value is:
SSISPkgConfig
.jpg)
Figure 12. A sample
registry entry configuration.
If you create a registry key to be used by the configuration
that is not directly under HKEY_CURRENT_USER key, as shown in Figure 13, then
the wizard expects this value:
SSISPkgConfig\config1
.jpg)
Figure 13. A sample
of a nested registry entry configuration.
In the next page of the wizard, select the property object
you want to update trough the configuration, as shown in figure 14.
.jpg)
Figure 14. Setting a
single value for a registry entry configuration.
There are a few other things you need to keep in mind when
using this type of configurations:
- The wizard does not create or modify the
registry keys; the registry keys must be manually created and edited.
- The registry keys must exist, either directly or
indirectly, under HKEY_CURRENT_USER, and the configuration value has to be
store in an entry called ‘value’.
- Registry keys under HKEY_CURRENT_USER are
visible only to the user that created them, hence, the package can apply the
configurations only when it is executed using the credentials of that user. If
you execute the package via SQL Server Agent, a workaround is to set up a proxy
account in SQL Server Agent. Make sure that you consider all the implications
of this behavior, as this may complicate your deployment scenarios.
- You have to use one registry key per
configuration value. If you are configuring multiple properties, consider using
SQL Server or XML file configurations instead.
- You do not need to close and reopen the BIDS
session each time you create or make changes to the registry keys being used by
the configuration. This is an advantage over environment variable configuration
type.
- You need access to modify the Windows registry
settings on each machine were you intend to deploy the package. This makes this
configuration type risky.
- Be extremely careful when using the registry
editor, as any error could cause serious problems to the system. Always create
a backup of the registry before making changes.
Parent Package Variable Configurations
When you execute a package (the child) from another package
(the parent) via the Execute Package task, you can use Parent Package Variable
configurations in the child package to pass variable values from the parent.
Regardless of its name, this configuration type has to be
set up in the child package. In the Package Configuration Wizard, in the child
package, you have to specify the name of the variable (that exists in the
parent) that holds the desired configuration value, as shown in Figure 15.
.jpg)
Figure 15. Creating a
new Parent package variable configuration.
Notice that the child package is unaware of the existence of
the parent package, and the name of the variable that you enter is not
validated when you create the configuration. When using the direct method, you
have to type the variable name exactly as it appears in the parent package.
Alternatively, you can select an environment variable that contains the name of
the parent package variable, thus adding the flexibility the indirect method
offers.
The next page of the wizard allows you to select the
property to be updated, in the same way as when using registry entry or
environment variable configurations.
Finally, let’s review some considerations and facts that are
relevant when working with parent package variable configurations:
- By design, parent package variables are always
applied last, regardless of where you place them in the configurations organizer.
This is not necessarily a bad thing, but because of this, this type of
configuration may not suitable in every scenario.
- This type of configuration is applied only when
the child package is executed via the Execute Package task.
- SSIS variable names are case-sensitive. Always
make sure that both spelling and casing of the variable name in the
configuration entry are identical to the ones in the parent package. For
example, “Myvariable” and “MyVariable” are considered different variables.
- Standalone execution and unit testing of child
packages using this type of configuration may be challenging when the child
package does not run in the context of the parent. A simple but effective
workaround is to always apply this type of configurations to variables in the
child package, and then use expressions in the objects that need to be updated
(connection managers, server names, and so forth) based on those variables.
This way, if you need to run the child package without the parent, you can
manually edit the variables in the child package in BIDS, or via the /SET
option of the DTExec utility.
- Each package configuration entry can only affect
one property in the package being configured. Only XML file and SQL Server
based package configurations have the ability to hold multiple configuration
property/value pairs.
SQL Server Configurations
This configuration type offers almost the same level of
flexibility and functionality as XML configuration files, with the difference
that configuration information is stored in a SQL Server table. The table can
be created in any database that is accessible by the package at execution time.
You can use the Package Configuration Wizard to create the table. This is the
default structure of the table:
CREATE TABLE [dbo].[SSIS Configurations]
(
ConfigurationFilter NVARCHAR(255) NOT NULL,
ConfiguredValue NVARCHAR(255) NULL,
PackagePath NVARCHAR(255) NOT NULL,
ConfiguredValueType NVARCHAR(20) NOT NULL
)
These fields are used as follows:
- ConfigurationFilter: This field is used by SSIS
to indentify a set of property/values pairs that are part of the same
configuration entry in the Package Configurations Organizer.
- ConfigurationValue: It stores the value that is
used to update the package property specified in PackagePath column.
- PackagePath: The path that point to the property
being configured.
- ConfiguredValueType: the SSIS data type of the
property being configured.
When you create SQL Server package configurations, you first
have to choose which method you would use to provide the connection information
to the configuration table, as the Figure 16 shows.
.jpg)
Figure 16. Creating a
new SQL Server configuration.
With the direct method, the connection information,
configuration table and filter are stored inside of the package. The indirect
method instead allows storing that information in an environment variable.
Notice that both methods use an SSIS connection manager, and its connection
string is hard-coded inside of the package. Therefore, if you use this
configuration type, it is a good practice to ensure that the connection string
in this connection manager can be updated from an external source. A common
approach is to use a separate configuration (XML, Registry Key or environment
variable) to update the connection manager when required.
The next page in the configuration wizard lets you choose
the set of properties to be targeted by the configuration being created. As you
can see in figure 17, selecting multiple properties is allowed.
.jpg)
Figure 17. Setting
multiple values for SQL Server configurations.
Now, let’s go through some important considerations that
will help you to understand this configuration type better, and to avoid common
implementation issues:
- SQL Server configurations are very flexible, as
you can store multiple configuration properties/values in the same table.
- Multiple packages can use the same configuration
table and filter.
- You can have a configuration filter for each
entry in the configuration table, or you can use the same filter to group a set
of entries. If you use the second approach, make sure that all properties under
the selected filter exist in the package being configured. Otherwise warning
messages are generated for those properties in the configuration filter that do
not exist in the target package.
- You can use existing database security and
backup policies to protect the configuration table contents.
- A drawback of this configuration type is that it
relies on an SSIS connection manager to get access to the configuration table,
but does not have built-in support for updating its connection string. In other
words, the connection string that points to the configuration table is
hard-coded inside of the package, and you have to take extra steps in order to
update it (for example, through an extra package configuration placed at the
top of the Package Configurations Organizer, or in SQL Server 2008, by using
the /CONN option of DTExec).
- The /CONN option of DTExec utility can be used
to override the connection string used by the configuration only in SSIS 2008.
This is because SQL Server 2008 Integration Services reloads configurations
after applying DTExec command line options.
- The configuration wizard can create the table
and insert the required rows when they do not exist, but only if the direct
method is chosen. The wizard never includes any sensitive data in the
ConfiguredValue column of the table, and you would need to manually update the
table to add the required sensitive data.
- If you use the direct method, and provide an
existing configuration filter, the wizard will prompt you to either reuse the
configuration information in the table or to override it with the property
values from the current package. Make sure that you understand this behavior,
since choosing the wrong option could change the contents of the configuration
table and produce unexpected execution results.
- When you use the indirect method, the
environment variable must contain the name of the configuration connection
manager, the name of the configuration table and the configuration filter. The
syntax should be:
“ConfigurationManagerName”;”Schema.ConfigurationTableName”;’ConfigurationFilter”
As with the direct method, you have
to account for extra logic if you need to modify the connection string inside
of “ConfigurationManagerName”
Conclusion
Package configuration is the natural way to parameterize
Integration Services packages and to put your ETL solution in a better position
to seamlessly respond to possible changes in the environment. With five types
and two methods available, package configuration is a sophisticated mechanism
that can be combined in a number of ways, and the time invested in
understanding its behavior and the options available is well worth it.
About the author. Rafael Salas is a Senior Consultant at Mariner, a BI focus consulting firm, where
he specializes in helping organizations to improve performance through Business
Intelligence and Data Warehousing solutions. He has been a SQL Server
evangelist since he started using the 2005 CTP. He is a SQL Server MVP, MCTS,
and an active member of the user communities, where he provides guidance on the
use of the SQL Server tools.