Linear Discriminant Analysis (deprecated)
Updated: July 2, 2015
Identify a linear combination of variables that best separates two or more classes
Category: Feature Selection Modules
You can use Fisher Linear Discriminant Analysis to create a set of scores that identifies the combination of features that best separates two or more classes.
| Warning |
|---|
| This module is provided for backward compatibility with experiments created using the pre-release version of Azure Machine Learning, and will soon be deprecated. We recommend that you modify your experiments to use Fisher Linear Discriminant Analysis instead. |
Linear discriminant analysis is often used for dimensionality reduction, because it projects a set of features onto a smaller feature space while preserving the information that discriminates between classes. This not only reduces computational costs for a given classification task, but can help prevent overfitting.
You provide a set of possible feature columns as inputs, and the algorithm determines the optimal combination of the input columns that linearly separates each group of data while minimizing the distances within each group.
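As a point of reference outside Azure Machine Learning Studio, the following minimal sketch performs the same kind of projection with scikit-learn's LinearDiscriminantAnalysis. This is an analogous open-source implementation, not this module; the dataset and component count are illustrative assumptions.

```python
# Minimal sketch: dimensionality reduction with linear discriminant analysis,
# using scikit-learn as a stand-in for this module (illustrative only).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)            # 4 numeric features, 3 classes

# Project onto at most (number of classes - 1) discriminant directions.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X.shape, "->", X_lda.shape)            # (150, 4) -> (150, 2)
print("explained variance ratio:", lda.explained_variance_ratio_)
```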
Linear discriminant analysis is similar to analysis of variance (ANOVA) in that it works by comparing the means of the variables, and is based on these assumptions:
- Predictors are independent.
- Values are normally distributed.
- Variances among groups are similar.
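One informal way to review these assumptions before running the module is sketched below with scipy; the DataFrame, feature names, and label column are hypothetical and only illustrate the checks.

```python
# Sketch: rough checks of normality and of similar group variances.
# The DataFrame and column names are illustrative assumptions.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "feature_1": [2.1, 2.5, 1.9, 3.2, 3.8, 3.5],
    "feature_2": [0.4, 0.6, 0.5, 1.1, 1.3, 1.2],
    "label":     ["a", "a", "a", "b", "b", "b"],
})

for col in ["feature_1", "feature_2"]:
    groups = [g[col].to_numpy() for _, g in df.groupby("label")]
    _, p_normal = stats.shapiro(df[col])     # normality of the column
    _, p_var = stats.levene(*groups)         # homogeneity of variances
    print(f"{col}: Shapiro p={p_normal:.3f}, Levene p={p_var:.3f}")
```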
Note that Linear Discriminant Analysis is sometimes abbreviated to LDA, but this is easily confused with Latent Dirichlet Allocation. The techniques are completely different, so in this documentation, the acronym LDA will be used only for Latent Dirichlet Allocation.
After connecting your dataset, select a set of numeric feature columns as inputs.
The columns provided as inputs must meet these requirements:
- Your data must be complete (no missing values).
- It is also useful to have fewer predictors than there are samples.
- Because the values are expected to have a normal distribution, you should review the data for outliers.
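A rough pandas sketch of this preparation is shown below: keep only numeric columns, drop rows with missing values, confirm there are fewer predictors than samples, and flag obvious outliers. The frame and column names are hypothetical.

```python
# Sketch: preparing inputs that meet the requirements listed above.
# The DataFrame and column names are illustrative assumptions.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "tumor_size": [12.0, 15.5, np.nan, 30.2, 28.4],
    "deg_malig":  [1, 2, 3, 3, 2],
    "label":      ["benign", "benign", "malignant", "malignant", "malignant"],
})

features = df.select_dtypes(include="number").columns.tolist()  # numeric only
clean = df.dropna(subset=features)                               # no missing values

assert len(features) < len(clean), "expect fewer predictors than samples"

# Quick outlier review: flag values more than 3 standard deviations from the mean.
z = (clean[features] - clean[features].mean()) / clean[features].std()
print(clean[(z.abs() > 3).any(axis=1)])
```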
To see examples of how feature selection is used in machine learning experiments, see these sample experiments in the Model Gallery:
- The Twitter Sentiment Analysis sample uses Filter-Based Feature Selection to improve experiment results.
- The Fisher Linear Discriminant Analysis sample demonstrates how to use this module for dimensionality reduction.
This method works only on continuous variables, not categorical or ordinal variables.
Rows with missing values are ignored when computing the transformation matrix.
The algorithm will examine all numeric columns not designated as labels, to see if there is any correlation. If you want to exclude a numeric column, add a Project Columns module before feature selection to create a view that contains only the columns you wish to analyze.
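In code terms, the effect of inserting a Project Columns module can be approximated by restricting a frame to the continuous columns you actually want analyzed, as in the brief sketch below; all names are illustrative.

```python
# Sketch: keeping only the continuous predictors (plus the label), analogous
# to placing a Project Columns module before this one (names illustrative).
import pandas as pd

df = pd.DataFrame({
    "patient_id": [101, 102, 103],        # numeric, but not a meaningful feature
    "tumor_size": [12.0, 15.5, 30.2],
    "node_caps":  ["yes", "no", "yes"],   # categorical: not usable by this method
    "label":      ["benign", "benign", "malignant"],
})

view = df[["tumor_size", "label"]]        # exclude patient_id and node_caps
print(view.dtypes)
```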
The module has two outputs:
- Feature Extractors: a set of scores (eigenvectors), also called a discrimination matrix.
- Transformed Features: a dataset containing the features that have been transformed using the eigenvectors.
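As a loose analogy outside the module, scikit-learn's eigen solver exposes comparable results: its scalings_ attribute plays the role of the feature extractors (the eigenvectors), and the array returned by fit_transform plays the role of the transformed features. The sketch below relies on that analogy and is not the module's own API.

```python
# Sketch: the two kinds of output, by analogy with scikit-learn.
# scalings_ ~ feature extractors (eigenvectors); X_new ~ transformed features.
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)
lda = LinearDiscriminantAnalysis(solver="eigen", n_components=2)
X_new = lda.fit_transform(X, y)

print("feature extractors (eigenvectors):", lda.scalings_.shape)
print("transformed features:", X_new.shape)
```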
For example, the following table shows the output when Fisher Linear Discriminant Analysis was performed on a subset of features from the Breast Cancer sample dataset:
| E1 | E2 | E3 | E4 |
|---|---|---|---|
| 0.997588 | 0.97843 | 0.982406 | -0.937945 |
| 0.055731 | -0.163812 | -0.104849 | -0.174996 |
| 0.0411289 | -0.115274 | 0.103606 | 0.257335 |
| 0.0046382 | 0.0504354 | -0.114674 | 0.153015 |
These values represent four eigenvectors, with one component for each of the predictors (deg-malig, node-caps, tumor-size, breast-quad), and their associated eigenvalues. The combination of eigenvectors and their values provides information about the shape of the linear transformation.
In general, bigger scores (farther from zero) indicate better predictors. When deciding which combination of variables to keep for the optimal (and smaller) dimensional subspace, you can eliminate the eigenvectors with the lowest values, because they contain the least information about the distribution of the data.
In this example, the values in the eigenvectors all have a similar magnitude, which indicates that the data is projected onto a fairly good feature space.
However, if some of the eigenvalues were much larger than others, you would want to keep only those eigenvectors with the highest values, since they contain more information about the data distribution. That is, values that are closer to 0 are less informative and should be dropped when selecting a new set of features.
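The selection rule described above can be made concrete with a generic Fisher LDA computation in numpy: build the within-class and between-class scatter matrices, solve the eigenproblem, sort by eigenvalue, and keep the leading eigenvectors. This is a textbook sketch, not this module's exact implementation.

```python
# Sketch: ranking discriminant directions by eigenvalue and keeping the largest.
# A generic Fisher LDA computation, not this module's internal code.
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]
overall_mean = X.mean(axis=0)

S_w = np.zeros((n_features, n_features))   # within-class scatter
S_b = np.zeros((n_features, n_features))   # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)
    S_w += (Xc - mean_c).T @ (Xc - mean_c)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_b += len(Xc) * (diff @ diff.T)

# Solve S_w^{-1} S_b v = lambda v, then sort eigenvalues in descending order.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
order = np.argsort(eigvals.real)[::-1]

# Keep the eigenvectors with the largest eigenvalues (most discriminative).
k = 2
W = eigvecs.real[:, order[:k]]
X_projected = X @ W
print("eigenvalues (sorted):", np.round(eigvals.real[order], 3))
print("projected shape:", X_projected.shape)
```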
For more information about how the eigenvalues are calculated, see this paper (PDF):
Eigenvector-based Feature Extraction for Classification. Tsymbal, Puuronen, et al.
Expected inputs

| Name | Type | Description |
|---|---|---|
| Dataset | Data Table | Input dataset |
Module parameters

| Name | Range | Type | Default | Description |
|---|---|---|---|---|
| Class labels column | any | ColumnSelection | None | Select the column that contains the categorical class labels |
Outputs

| Name | Type | Description |
|---|---|---|
| Feature extractors | Data Table | Eigenvectors of the input dataset |
| Transformed features | Data Table | Fisher linear discriminant analysis features transformed to eigenvector space |
Exceptions

| Exception | Description |
|---|---|
| | Exception occurs if one or more specified columns of the dataset could not be found. |
| | Exception occurs if one or more of the inputs are null or empty. |
| | Exception occurs if one or more specified columns have a type that is unsupported by the current module. |
