Data Transformation - Scale and Reduce
Updated: October 6, 2017
This article describes the modules in Azure Machine Learning Studio that help you work with numerical data. Common data preparation tasks for machine learning include clipping, binning, and normalizing numerical values. Other modules in this group support dimensionality reduction.
Normalizing, binning, or redistributing numerical variables is an important part of data preparation. The modules in this group support the following tasks:
- Grouping data into bins of varying sizes or distributions
- Removing outliers or changing their values
- Normalizing a set of numeric values into a specific range
- Creating a compact set of feature columns from a high-dimension dataset
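To make the first three tasks concrete, here is a minimal sketch in plain Python. It is not the Studio modules' implementation. The function names (`clip_values`, `equal_width_bins`, `min_max_normalize`), the sample `ages` column, and the choice of fixed clipping bounds, equal-width bins, and min-max scaling are all assumptions made for illustration. The modules themselves offer other options.

```python
def clip_values(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant column
    # The maximum value lands exactly on the upper edge, so cap it at the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # fall back to 1.0 for a constant column
    return [(v - lo) / span for v in values]

# Hypothetical column: 300 is an obvious outlier.
ages = [18, 22, 25, 31, 40, 47, 52, 300]

clipped = clip_values(ages, lower=18, upper=100)  # outlier becomes 100
bins = equal_width_bins(clipped, n_bins=4)        # one bin index per value
scaled = min_max_normalize(clipped)               # values between 0 and 1
```

Clipping before normalizing matters in this example: with the outlier left at 300, min-max scaling would squeeze every other value into the bottom fifth of the range.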
Related Tasks
In addition to these modules, you might find the following related tools useful for transforming numeric data:
- Selecting the relevant and useful features for use in building the model: Feature Selection or Fisher Linear Discriminant Analysis
- Selecting features based on counts of the values: Learning with Counts
- Removing or replacing missing values: Clean Missing Data
- Replacing categorical values with numerical values derived from calculations: Replace Discrete Values
- Computing a probability distribution for discrete or numerical columns: Evaluate Probability Function
- Filtering and transforming digital signals and waveforms: Filter
This category includes the following modules:
| Module | Description |
|---|---|
| Clip Values | Detects outliers and clips or replaces their values |
| Group Data into Bins | Puts numerical data into bins |
| Normalize Data | Rescales numeric data to constrain dataset values to a standard range |
| Principal Component Analysis | Computes a set of features with reduced dimensionality for more efficient learning |
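As a rough illustration of what dimensionality reduction does, the sketch below estimates the first principal component of a small two-column dataset and projects each row onto it, turning two features into one. It uses power iteration on the sample covariance matrix, which is one generic way to find the leading component. How the Principal Component Analysis module computes its result is not described here. The helper names and the sample `points` are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Estimate the direction of greatest variance with power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        # Multiply by the covariance matrix, then rescale to unit length.
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    return means, vec

def project(rows, means, component):
    """Reduce each row to a single score along the component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(r)))
            for r in rows]

# Hypothetical data in which the second column is roughly twice the first.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]

means, component = first_principal_component(points)
scores = project(points, means, component)  # one feature instead of two
```

Because the two columns are almost perfectly correlated, the single column of `scores` retains nearly all of the variance in the original pair.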