Permutation Feature Importance

 

Updated: April 12, 2016

Computes the permutation feature importance scores of feature variables given a trained model and a test dataset

You can use the Permutation Feature Importance module to compute a set of feature importance scores for your dataset, based on how much the model's performance changes when each feature's values are randomly shuffled.

The scores that the module returns represent the change in the performance of a trained model, after permutation. You can configure the module to use any one of several standard performance metrics used for evaluation.

The module requires that you provide a test dataset, as well as an existing trained classification or regression model.

Permutation feature importance works by randomly shuffling the values of each feature column, one column at a time, and then evaluating the input model. Important features are typically more sensitive to the shuffling process and therefore receive higher importance scores.
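The sketch below is a minimal Python illustration of this shuffling procedure, assuming a NumPy feature matrix and a model with a scikit-learn-style predict method. The function and argument names are hypothetical; this is not the Studio module's implementation.

```python
import numpy as np

def permutation_scores(model, X_test, y_test, metric, seed=None):
    """Illustrative sketch: score each feature by how much the chosen metric
    changes when that feature's column is shuffled in the test data."""
    rng = np.random.default_rng(seed)
    baseline = metric(y_test, model.predict(X_test))
    scores = {}
    for col in range(X_test.shape[1]):
        X_perm = X_test.copy()
        rng.shuffle(X_perm[:, col])                # shuffle one feature column in place
        permuted = metric(y_test, model.predict(X_perm))
        scores[col] = permuted - baseline          # change in the metric after permutation
    return scores
```

Note that the sign of the change depends on the metric: shuffling an important feature lowers a score such as accuracy but raises an error such as root mean squared error.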

This article provides a good general overview of permutation feature importance, its theoretical basis, and its applications in machine learning: Permutation feature importance

Generating a set of feature scores requires an already trained model and a test dataset.

 

  1. Add the Permutation Feature Importance module to your experiment.

  2. To the left input, connect a trained model.

    The model must be a regression model or classification model.

  3. To the right input, connect a dataset, preferably one that is different from the dataset used for training the model. This dataset is used for scoring based on the trained model, and for evaluating the model after feature values have been changed.

  4. For Random seed, type a value to use as the seed for randomization. If you specify 0 (the default), a number is generated based on the system clock.

    Providing a seed is optional, but you should specify one if you want reproducibility across runs of the same experiment.

  5. For Metric for measuring performance, select a single metric to use when computing model quality after permutation.

    Azure Machine Learning Studio supports the following metrics, depending on whether you are evaluating a classification or regression model:

    • Classification - Accuracy

    • Classification - Precision

    • Classification - Recall

    • Classification - Average Log Loss

    • Regression - Mean Absolute Error

    • Regression - Root Mean Squared Error

    • Regression - Relative Absolute Error

    • Regression - Relative Squared Error

    • Regression - Coefficient of Determination

    For a more detailed description of these evaluation metrics, and how they are calculated, see Machine Learning / Evaluate.

  6. Run the experiment.

  7. The module outputs a list of feature columns and the scores associated with them, ranked in descending order of score. (An analogous computation is sketched after these steps.)
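For illustration only, the following sketch performs an analogous computation outside Studio using scikit-learn's permutation_importance function. The dataset, model, and parameter choices are assumptions, and scikit-learn repeats the shuffle several times and averages the results, which may differ from the Studio module's behavior.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A trained classification model (step 2)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(
    model, X_test, y_test,       # test dataset, separate from the training data (step 3)
    scoring="accuracy",          # metric for measuring performance (step 5)
    n_repeats=5,                 # scikit-learn averages several shuffles per feature
    random_state=42,             # fixed seed for reproducibility (step 4)
)

# Rank features by score, descending (step 7)
ranking = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```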

To see examples of how feature selection is used in machine learning, see these sample experiments in the Model Gallery:

The rankings provided by permutation feature importance are often different from those you get from Filter Based Feature Selection, which calculates scores before a model is created. This is because permutation feature importance doesn't measure the association between a feature and a target value; instead, it captures how much influence each feature has on predictions from the model.
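As a hypothetical illustration of that difference (not a Studio experiment), the sketch below computes a filter-style score directly from the data and a permutation score from a trained regression model, using scikit-learn; the two sets of numbers can then be compared feature by feature.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Filter-style score: absolute correlation between each feature and the target,
# computed before any model exists.
filter_scores = {c: abs(np.corrcoef(X_train[c], y_train)[0, 1]) for c in X.columns}

# Permutation score: change in the trained model's error when each feature is shuffled.
model = Ridge().fit(X_train, y_train)
perm = permutation_importance(model, X_test, y_test,
                              scoring="neg_root_mean_squared_error", random_state=0)

for i, c in enumerate(X.columns):
    print(f"{c}: correlation={filter_scores[c]:.3f}  permutation={perm.importances_mean[i]:.3f}")
```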

Expected inputs

Name | Type | Description
Trained model | ILearner interface | A trained classification or regression model
Test data | Data Table | Test dataset for scoring and evaluating a model after permutation of feature values

  

Module parameters

Name | Type | Range | Optional | Default | Description
Random seed | Integer | >=0 | Required | 0 | Random number generator seed value
Metric for measuring performance | enum:EvaluationMetricType | | Required | Classification - Accuracy | Select the metric to use when evaluating the variability of the model after permutations.

Outputs

Name | Type | Description
Feature importance | Data Table | A dataset containing the feature importance results, based on the selected metric

Exceptions

Exception | Description
Error 0062 | Exception occurs when attempting to compare two models with different learner types.
Error 0024 | Exception occurs if the dataset does not contain a label column.
Error 0105 | Exception occurs when a module definition file defines an unsupported parameter type.
Error 0021 | Exception occurs if the number of rows in some of the datasets passed to the module is too small.
