Pivot Transformation
The Pivot transformation makes a normalized data set into a less normalized but more compact version by pivoting the input data on a column value. For example, a normalized Orders data set that lists customer name, product, and quantity purchased typically has multiple rows for any customer who purchased multiple products, with each row for that customer showing order details for a different product. By pivoting the data set on the product column, the Pivot transformation can output a data set with a single row per customer. That single row lists all the purchases by the customer, with the product names shown as column names, and the quantity shown as a value in the product column. Because not every customer purchases every product, many columns may contain null values.
When a dataset is pivoted, input columns perform different roles in the pivoting process. A column can participate in the following ways:
-
The column is passed through unchanged to the output. Because many input rows can result only in one output row, the transformation copies only the first input value for the column.
-
The column acts as the key or part of the key that identifies a set of records.
-
The column defines the pivot. The values in this column are associated with columns in the pivoted dataset.
-
The column contains values that are placed in the columns that the pivot creates.
This transformation has one input, one regular output, and one error output.
To pivot data efficiently, which means creating as few records in the output dataset as possible, the input data must be sorted on the pivot column. If the data is not sorted, the Pivot transformation might generate multiple records for each value in the set key, which is the column that defines set membership. For example, if a dataset is pivoted on a Name column but the names are not sorted, the output dataset could have more than one row for each customer, because a pivot occurs every time that the value in Name changes.
The input data might contain duplicate rows, which will cause the Pivot transformation to fail. "Duplicate rows" means rows that have the same values in the set key columns and the pivot columns. To avoid failure, you can either configure the transformation to redirect error rows to an error output or you can pre-aggregate values to ensure there are no duplicate rows.
You configure the pivot operation by setting the options in the Pivot dialog box. To open the Pivot dialog box, add the Pivot transformation to the package in SQL Server Data Tools (SSDT), and then right-click the component and click Edit.
The following list describes the options in the Pivot dialog box.
The following table shows a data set before the data is pivoted on the Year column.
|
Year |
Product Name |
Total |
|---|---|---|
|
2004 |
HL Mountain Tire |
1504884.15 |
|
2003 |
Road Tire Tube |
35920.50 |
|
2004 |
Water Bottle – 30 oz. |
2805.00 |
|
2002 |
Touring Tire |
62364.225 |
The following table shows a data set after the data has been pivoted on the Year column.
|
|
2002 |
2003 |
2004 |
|---|---|---|---|
|
HL Mountain Tire |
141164.10 |
446297.775 |
1504884.15 |
|
Road Tire Tube |
3592.05 |
35920.50 |
89801.25 |
|
Water Bottle – 30 oz. |
NULL |
NULL |
2805.00 |
|
Touring Tire |
62364.225 |
375051.60 |
1041810.00 |
To pivot the data on the Year column, as shown above, the following options are set in the Pivot dialog box.
-
Year is selected in the Pivot Key list box.
-
Product Name is selected in the Set Key list box.
-
Total is selected in the Pivot Value list box.
-
The following values are entered in the Generate pivot output columns from values box.
[2002],[2003],[2004]
For information about how to set the properties of this component, see Set the Properties of a Data Flow Component.