Data Transformation / Scale and Reduce
Updated: April 11, 2016
The modules in this group help you clip, bin, and normalize numerical values, as well as reduce the number of columns in the dataset.
Normalizing, binning, or redistributing numerical variables is an important part of data preparation for many machine learning tasks. The modules in this group help you perform these critical data preparation tasks:
Grouping data into bins of varying sizes or distributions
Removing outliers or changing their values
Normalizing a set of numeric values into a specific range
Creating a compact set of feature columns from a high-dimension dataset
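As a rough illustration of the first three tasks, the following Python sketch clips outliers, assigns values to equal-width bins, and rescales values to the range 0 to 1. It is not the implementation used by the modules in this category. The function names, the fixed clipping thresholds, and the choice of equal-width binning and min-max scaling are assumptions made for this example.

```python
def clip_values(values, lower, upper):
    """Replace values below `lower` or above `upper` with the nearest threshold."""
    return [min(max(v, lower), upper) for v in values]


def bin_equal_width(values, n_bins):
    """Assign each value to one of `n_bins` equal-width bins, numbered from 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin `n_bins`; cap it at the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def normalize_min_max(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no range to rescale.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


raw = [1, 5, 9, 13, 250]             # 250 is an obvious outlier
clipped = clip_values(raw, 0, 20)    # hypothetical thresholds chosen by hand
bins = bin_equal_width(clipped, 4)
scaled = normalize_min_max(clipped)
print(clipped, bins, scaled)
```

Clipping before scaling matters here: if the outlier 250 were left in place, min-max scaling would squeeze the remaining values into a narrow band near 0.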
Related Tasks
In addition to these modules, you might find the following related tools useful for transforming numeric data:
Selecting the relevant and useful features for use in building the model: Feature Selection Modules or Fisher Linear Discriminant Analysis
Selecting the most important features based on counts of the values: Data Transformation / Learning with Counts
Removing or replacing missing values: Clean Missing Data
Replacing categorical values with numerical values derived from calculations: Replace Discrete Values
Computing a probability distribution for discrete or numerical columns: Evaluate Probability Function
Filtering and transforming digital signals and waveforms: Data Transformation / Filter
The category Data Transformation / Scale and Reduce includes these modules:
| Module | Description |
|---|---|
| Clip Values | Detects outliers and clips or replaces their values |
| Group Data into Bins | Puts numerical data into bins |
| Normalize Data | Rescales numeric data to constrain dataset values to a standard range |
| Principal Component Analysis | Computes a set of features with reduced dimensionality for more efficient learning |
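
To make the idea of reduced dimensionality concrete, the sketch below projects multi-column rows onto a single direction of greatest variance (the first principal component). It is an illustration only: the source does not describe the algorithm behind the module, and this example substitutes simple power iteration on the covariance matrix. The function name and the iteration count are assumptions.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the first principal component of `rows`.

    Uses power iteration on the sample covariance matrix. This is a teaching
    sketch, not a production PCA.
    """
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Repeatedly multiply by the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data has no variance along the current direction")
        v = [x / norm for x in w]
    # Project each centered row onto the direction: one feature per row.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


rows = [(1, 2), (2, 4), (3, 6), (4, 8)]   # second column is twice the first
direction, scores = first_principal_component(rows)
print(direction, scores)
```

Because the two columns in this toy dataset are perfectly correlated, a single score per row preserves all of the variation, which is the sense in which the feature set becomes more compact.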