Machine Learning Modules
Updated: October 13, 2015
The typical workflow for machine learning includes many phases:
Identifying a problem to solve and a metric for measuring results
Finding, cleaning, and preparing appropriate data
Identifying the best features and engineering new features
Building, evaluating, and tuning models
Using models to generate predictions, recommendations, and other results
The modules in this section provide tools for the final phases of machine learning, in which you apply an algorithm to data to train a model, generate scores, and then evaluate the accuracy and usefulness of the model.
Initialize Model: Choose from a variety of customizable machine learning algorithms, including clustering, regression, classification, and anomaly detection models.
Train: Provide your data to the configured model so that it can learn patterns and compute statistics that can be used for predictions.
Score: Create predictions using the trained models.
Evaluate: Measure the accuracy of a trained model or compare multiple models.
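To make the four phases concrete, here is a minimal sketch in plain Python. It is not the Azure Machine Learning Studio API. The model (a one-feature least-squares line), the helper names `train`, `score`, and `evaluate`, the RMSE metric, and the data are all hypothetical, chosen only to show how initialize, train, score, and evaluate hand results to each other.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean


@dataclass
class LinearRegressionModel:
    """Initialize: a hypothetical one-feature model with untrained parameters."""
    slope: float = 0.0
    intercept: float = 0.0
    trained: bool = False


def train(model, xs, ys):
    """Train: fit slope and intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    model.slope = sxy / sxx
    model.intercept = y_bar - model.slope * x_bar
    model.trained = True
    return model


def score(model, xs):
    """Score: generate one prediction per input value."""
    if not model.trained:
        raise ValueError("model must be trained before scoring")
    return [model.slope * x + model.intercept for x in xs]


def evaluate(predictions, actuals):
    """Evaluate: root-mean-square error between predictions and actual values."""
    return sqrt(mean((p - a) ** 2 for p, a in zip(predictions, actuals)))


# Hypothetical data that follows y = 2x + 1 exactly.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
test_x, test_y = [5, 6], [11, 13]

model = train(LinearRegressionModel(), train_x, train_y)
preds = score(model, test_x)
print(preds, evaluate(preds, test_y))
```

Keeping each phase in a separate function mirrors the way the modules are connected: you can swap the model or the metric without touching the other steps.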
Tip: For a detailed description of this experimental workflow, see the credit risk solution walkthrough.
Click the links in the table to see a complete list of the Machine Learning modules in each category:
For examples of machine learning in action, see the Model Gallery.
Data Exploration and Data Quality
Part of the machine learning process is making sure your data is of the right kind, in the right quantity, and of the right quality for the algorithm you’ve chosen. Understand how much data you have and how it is distributed. Are there outliers? How were they generated, and what do they mean? Are there any duplicate records?
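The questions above can be answered with a quick profile of each column. The following is a sketch in plain Python, not an Azure module. It assumes records are dictionaries, treats `None` as missing, and flags outliers with the 1.5 × IQR rule, which is one common heuristic among several. The column names and values are hypothetical.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def profile(rows, column):
    """Summarize one numeric column: size, center, spread, and IQR outliers."""
    values = [r[column] for r in rows if r.get(column) is not None]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }


def count_duplicates(rows):
    """Number of records that exactly repeat an earlier record."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return sum(c - 1 for c in counts.values())


rows = [
    {"age": 23, "city": "Oslo"},
    {"age": 25, "city": "Lima"},
    {"age": 31, "city": "Oslo"},
    {"age": 29, "city": "Pune"},
    {"age": 27, "city": "Lima"},
    {"age": 30, "city": "Oslo"},
    {"age": 26, "city": "Pune"},
    {"age": 240, "city": "Lima"},  # suspicious value, possibly a typo
    {"age": 23, "city": "Oslo"},   # exact duplicate of the first record
]

print(profile(rows, "age"))
print(count_duplicates(rows))
```

Note that the mean (about 50) is pulled far from the median (27) by the single outlier, which is exactly the kind of signal worth investigating before training.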
For examples, see Advanced data processing in Azure.
Missing Value Handling
Missing values can affect your results in many ways. For example, many statistical methods discard cases that have missing values. If you want to impute missing values or otherwise correct your data, use Clean Missing Data before training your model.
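To illustrate what imputation does, here is a sketch of a few simple cleaning strategies applied to one numeric column. It is not the implementation of Clean Missing Data; the helper `clean_missing` and its strategy names are hypothetical, and `None` stands in for a missing value.

```python
from statistics import mean, median


def clean_missing(values, strategy="mean", custom=None):
    """Hypothetical helper: handle None entries in one numeric column.

    strategy is "mean", "median", "custom" (use the custom value), or
    "remove" (drop the missing entries entirely).
    """
    present = [v for v in values if v is not None]
    if strategy == "remove":
        return present
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "custom":
        fill = custom
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Keep observed values unchanged; substitute the fill value for gaps.
    return [fill if v is None else v for v in values]


column = [4.0, None, 6.0, 11.0, None]
print(clean_missing(column, "mean"))    # [4.0, 7.0, 6.0, 11.0, 7.0]
print(clean_missing(column, "median"))  # [4.0, 6.0, 6.0, 11.0, 6.0]
print(clean_missing(column, "remove"))  # [4.0, 6.0, 11.0]
```

The median is less sensitive to outliers than the mean, so it is often the safer substitute when a column is skewed.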
By default, Azure Machine Learning follows these rules when it encounters rows with missing values:
If data used to train a model has missing values, any rows with missing values are skipped.
If data used as input when scoring against a model has missing values, the missing values are passed to the model as inputs and nulls are propagated, so the result is usually also a missing value.
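The two default rules can be simulated in a few lines. This sketch only mimics the described behavior and is not how Azure Machine Learning implements it. The model is a hypothetical line through the origin, y = w·x, fit by least squares; training skips incomplete rows, and scoring returns `None` whenever the input is missing.

```python
def drop_incomplete(rows):
    """Training rule: skip any row that contains a missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def train_through_origin(rows):
    """Fit y = w * x by least squares, using complete (x, y) rows only."""
    complete = drop_incomplete(rows)
    return sum(x * y for x, y in complete) / sum(x * x for x, _ in complete)


def score(w, x):
    """Scoring rule: a missing input propagates to a missing prediction."""
    return None if x is None else w * x


training = [(1.0, 2.0), (2.0, None), (None, 5.0), (3.0, 6.0)]
w = train_through_origin(training)             # only (1, 2) and (3, 6) are used
print(w, [score(w, x) for x in [4.0, None]])   # 2.0 [8.0, None]
```

Half of the training rows are silently discarded here, which is why cleaning missing data first can matter more than the choice of model.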
Feature Selection and Dimensionality Reduction
Azure Machine Learning Studio can help you sift through your data to find the most useful attributes.
Use tools such as Fisher Linear Discriminant Analysis or Filter Based Feature Selection to determine which columns of data have the most predictive power, or to identify columns that should be removed because of data leakage.
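As an illustration of the filter approach, the sketch below scores each column by the absolute value of its Pearson correlation with the label and keeps the top-ranked columns. Pearson correlation is one common filter statistic; this is not the Filter Based Feature Selection module itself, and the helper names, column names, and data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_features(features, target, top_k):
    """Score each column by |r| with the target; return the top_k names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "hours_studied": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the target
    "shoe_size": [9.0, 7.0, 10.0, 8.0, 9.0],       # mostly noise
    "absences": [8.0, 6.0, 5.0, 3.0, 1.0],         # falls as the target rises
}

selected, scores = rank_features(features, target, top_k=2)
print(selected, scores)
```

Taking the absolute value keeps strongly negative predictors such as `absences`. A score very close to 1, as for `hours_studied` here, deserves a second look: it may be a genuinely strong predictor or a column that leaks the label.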
Create or engineer features from existing data. Use Normalize Data to standardize the range of numeric values, or Group Data into Bins to create new groupings of data, prior to analysis.
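For intuition about what normalizing and binning do to a column, here is a sketch of two common normalization methods (min-max and z-score) and one binning strategy (equal-width intervals). These are generic techniques written in plain Python, not the code behind Normalize Data or Group Data into Bins; the helper names and the income values are hypothetical.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a 0-based bin index using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min() puts the maximum value in the last bin instead of bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


incomes = [20.0, 35.0, 50.0, 65.0, 80.0]
print(min_max(incomes))              # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(incomes))
print(equal_width_bins(incomes, 3))  # [0, 0, 1, 2, 2]
```

Min-max scaling is sensitive to extreme values, because a single outlier sets the range; z-scores are a common alternative when the column has outliers.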
Reduce dimensionality by grouping categorical values or by using Principal Component Analysis, and reduce the volume of data by sampling.
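To show the idea behind Principal Component Analysis, the sketch below reduces two correlated columns to a single column. It finds only the first principal component, using power iteration on the covariance matrix. This is a simple stand-in chosen for readability, not the algorithm Azure Machine Learning uses. It assumes the starting vector is not orthogonal to the leading eigenvector, and the data is hypothetical.

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample covariance matrix of equal-length numeric rows, plus centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def first_component(cov, iterations=200):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    d = len(cov)
    v = [1.0] * d  # assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Reduce each centered row to its coordinate along one component."""
    return [sum(c * p for c, p in zip(row, component)) for row in centered]


# Two columns that lie close to the line y = 2x.
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
cov, centered = covariance_matrix(rows)
pc1 = first_component(cov)
reduced = project(centered, pc1)
print(pc1, reduced)
```

Because the two columns are nearly collinear, the single projected column keeps almost all of their variance, so little information is lost by dropping the second dimension.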
Choosing an Algorithm
The problem you are trying to solve determines both the data to use in the analysis and the choice of algorithm.
For help in choosing a machine learning algorithm, see How to choose an algorithm in Azure Machine Learning.
