Permutation Feature Importance
Updated: April 12, 2016
Computes the permutation feature importance scores of feature variables given a trained model and a test dataset
Category: Feature Selection Modules
You can use the Permutation Feature Importance module to compute a set of feature importance scores for your dataset, based on how much the performance of a trained model changes when the values of each feature are randomly shuffled.
The scores that the module returns represent the change in the performance of a trained model, after permutation. You can configure the module to use any one of several standard performance metrics used for evaluation.
The module requires that you provide a test dataset, as well as an existing trained classification or regression model.
Permutation feature importance works by randomly shuffling the values in each feature column, one column at a time, and then evaluating the input model on the modified data. Important features are usually more sensitive to the shuffling process, and will thus result in higher importance scores.
This article provides a good general overview of permutation feature importance, its theoretical basis, and its applications in machine learning: Permutation feature importance
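For readers who want to see the idea in code, here is a minimal Python sketch of the same loop, assuming a scikit-learn style model with a predict method and a pandas DataFrame of test features (the function and argument names are illustrative, not part of the Studio module):

```python
import numpy as np
from sklearn.metrics import accuracy_score

def permutation_importance_scores(model, X_test, y_test, metric=accuracy_score, seed=42):
    """Return {column: baseline_metric - permuted_metric} for each feature column."""
    rng = np.random.default_rng(seed)
    baseline = metric(y_test, model.predict(X_test))
    scores = {}
    for col in X_test.columns:
        X_shuffled = X_test.copy()
        # Shuffle only this column; every other feature keeps its original values.
        X_shuffled[col] = rng.permutation(X_shuffled[col].to_numpy())
        permuted = metric(y_test, model.predict(X_shuffled))
        # The more the metric degrades, the more the model relied on this feature.
        scores[col] = baseline - permuted
    return scores
```

Features whose shuffling barely moves the metric get scores near zero; a score can even come out slightly negative when shuffling happens to help by chance.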
To generate a set of feature scores, you need an already trained model and a test dataset.
Add the Permutation Feature Importance module to your experiment.
To the left input, connect a trained model.
The model must be a regression model or classification model.
To the right input, connect a dataset, preferably one that is different from the dataset used for training the model. This dataset is used for scoring based on the trained model, and for evaluating the model after feature values have been changed.
For Random seed, type a value to use as seed for randomization. If you specify 0 (the default), a number is generated based on the system clock.
Providing a seed is optional, but you should specify one if you want reproducibility across runs of the same experiment.
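As a quick illustration of why the seed matters, the snippet below (hypothetical values, with NumPy's generator standing in for the module's internal randomization) shows that a fixed seed reproduces the same shuffle on every run, while an unseeded generator does not. The module's convention of treating 0 as "use the system clock" is its own; in NumPy, 0 is just another fixed seed.

```python
import numpy as np

values = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # hypothetical feature column

# Same seed, same permutation: importance scores are reproducible across runs.
print(np.random.default_rng(seed=7).permutation(values))
print(np.random.default_rng(seed=7).permutation(values))

# No seed: the shuffle, and therefore the scores, can change from run to run.
print(np.random.default_rng().permutation(values))
```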
For Metric for measuring performance, select a single metric to use when computing model quality after permutation.
Azure Machine Learning Studio supports the following metrics, depending on whether you are evaluating a classification or regression model:
Classification - Accuracy
Classification - Precision
Classification - Recall
Classification - Average Log Loss
Regression - Mean Absolute Error
Regression - Root Mean Squared Error
Regression - Relative Absolute Error
Regression - Relative Squared Error
Regression - Coefficient of Determination
For a more detailed description of these evaluation metrics, and how they are calculated, see Machine Learning / Evaluate.
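The importance score is always the change in whichever metric you select here, but the direction of that change differs: for accuracy-like metrics an important feature shows up as a drop in the raw metric after shuffling, while for error metrics it shows up as an increase. Below is a small sketch of the error-metric case, using scikit-learn's mean_absolute_error as a stand-in for the module's Regression - Mean Absolute Error (the helper and its arguments are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

def mae_increase_after_shuffle(model, X_test, y_test, column, seed=0):
    """Importance of one feature for a regressor: increase in MAE after shuffling it."""
    rng = np.random.default_rng(seed)
    baseline = mean_absolute_error(y_test, model.predict(X_test))
    X_shuffled = X_test.copy()
    X_shuffled[column] = rng.permutation(X_shuffled[column].to_numpy())
    # A large increase in error means the model depended heavily on this column.
    return mean_absolute_error(y_test, model.predict(X_shuffled)) - baseline
```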
Run the experiment.
The module outputs a list of feature columns and the scores associated with them, ranked in descending order of the scores.
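If you want to reproduce a similar ranked list outside Studio, scikit-learn provides an analogous function, sklearn.inspection.permutation_importance. The sketch below builds a toy regression model just to show the shape of the output; it is not the module's output format.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data and model, standing in for the trained model and test dataset.
X, y = make_regression(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test,
                                scoring="neg_mean_absolute_error",
                                n_repeats=10, random_state=0)

# Rank features by mean importance, descending, like the module's output.
ranking = sorted(enumerate(result.importances_mean), key=lambda t: t[1], reverse=True)
for feature_index, score in ranking:
    print(f"feature_{feature_index}: {score:.4f}")
```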
To see examples of how feature selection is used in machine learning, see these sample experiments in the Model Gallery:
The Permutation Feature Importance sample demonstrates how to use this module to rank feature variables of a dataset in order of permutation importance scores.
The Using the Permutation Feature Importance module sample illustrates the usage of this module in a web service.
The rankings provided by permutation feature importance are often different from the ones you get from Filter Based Feature Selection, which calculates scores before a model is created. This is because permutation feature importance doesn't measure the association between a feature and a target value, but instead captures how much influence each feature has on predictions from the model.
| Name | Type | Description |
|---|---|---|
| Trained model | | A trained classification or regression model |
| Test data | | Test dataset for scoring and evaluating a model after permutation of feature values |
| Name | Type | Range | Optional | Default | Description |
|---|---|---|---|---|---|
| Random seed | Integer | >=0 | Required | 0 | Random number generator seed value |
| Metric for measuring performance | enum:EvaluationMetricType | | Required | Classification - Accuracy | Select the metric to use when evaluating the variability of the model after permutations. |
| Name | Type | Description |
|---|---|---|
| Feature importance | | A dataset containing the feature importance results, based on the selected metric |
| Exception | Description |
|---|---|
| | Exception occurs when attempting to compare two models with different learner types. |
| | Exception occurs if the dataset does not contain a label column. |
| | Thrown when a module definition file defines an unsupported parameter type. |
| | Exception occurs if the number of rows in some of the datasets passed to the module is too small. |