Select Columns Transform
Updated: July 18, 2015
Creates a transformation that selects the same subset of columns as in the given dataset
Category: Data Transformation / Manipulation
You can use the Select Columns Transform module to create a transformation that ensures the same set of columns is always used in downstream operations. This is useful when an operation always requires specific columns and you need to make sure that the column selection does not change, because a change might break the experiment or alter the results.
Note that you can also use Select Columns in Dataset to choose a subset of columns to use in a downstream module. However, Select Columns in Dataset is not always the easiest to configure if there are many columns. Moreover, there are times when the selection of columns in an input dataset might change depending on feature selection or other operations.
For example, suppose you use Filter Based Feature Selection to automatically find the best features in a dataset, and then use the dataset that it outputs as an input to Train Model.
Because Filter Based Feature Selection evaluates the feature importance based on the values in the column, it is impossible to know beforehand which columns to use when scoring. Moreover, if you apply Filter Based Feature Selection to the scoring dataset, it might choose a different set of columns, which would cause the scoring operation to fail.
In this scenario, you can use the Select Columns Transform module to generate a transformation (which you can save as an ITransform interface) to ensure that the same set of columns is used for scoring as was used for training.
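The idea can be illustrated outside the module with a minimal Python sketch. This is not the module's implementation: the function name `make_select_columns_transform`, the dictionary-based dataset representation, and the sample column names are all assumptions made for illustration. The sketch captures the column names of a reference dataset once, then reuses that fixed selection on any later dataset.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a dataset: column name -> list of values.
Dataset = Dict[str, List[float]]


def make_select_columns_transform(reference: Dataset) -> Callable[[Dataset], Dataset]:
    """Build a reusable transform that keeps only the columns found in `reference`."""
    if not reference:
        # Assumed behavior for this sketch: reject an empty reference dataset.
        raise ValueError("Reference dataset must contain at least one column.")

    # The column names (and their order) are frozen when the transform is created.
    columns = list(reference.keys())

    def transform(dataset: Dataset) -> Dataset:
        missing = [c for c in columns if c not in dataset]
        if missing:
            raise KeyError(f"Dataset is missing required columns: {missing}")
        # Return the same columns, in the same order, as the reference dataset.
        return {c: dataset[c] for c in columns}

    return transform


# Columns that survived feature selection on the training data (sample values).
training_selected = {"age": [34, 51], "income": [48000, 72000]}

# Scoring data arrives with extra columns and a different column order.
scoring_raw = {"income": [55000], "zip": [98052], "age": [29]}

select_training_columns = make_select_columns_transform(training_selected)
scoring_selected = select_training_columns(scoring_raw)
```

Because the selection is fixed when the transform is created, the scoring data is reduced to the training columns without rerunning feature selection on it.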
| Name | Type | Description |
|---|---|---|
| Dataset with desired columns | | Dataset containing desired set of columns |
| Name | Type | Description |
|---|---|---|
| Columns selection transformation | ITransform interface | Transformation that selects the same subset of columns as in the given dataset. |
| Exception | Description |
|---|---|
| | Exception occurs if one or more inputs are null or empty. |