Machine Learning Modules

 

Updated: October 13, 2015

The typical workflow for machine learning includes many phases:

  • Identifying a problem to solve and a metric for measuring results

  • Finding, cleaning, and preparing appropriate data

  • Identifying the best features and engineering new features

  • Building, evaluating, and tuning models

  • Using models to generate predictions, recommendations, and other results

The modules in this section provide tools for the final phases of machine learning, in which you apply an algorithm to data to train a model, generate scores, and then evaluate the accuracy and usefulness of the model.
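The train, score, and evaluate phases can be illustrated outside of Studio with a minimal Python sketch. It is not how the modules themselves are implemented. The toy nearest-centroid model, the function names `train`, `score`, and `evaluate`, and the sample data are all assumptions made for illustration.

```python
from statistics import mean

def train(rows, labels):
    """Train a toy nearest-centroid model: one mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def score(model, rows):
    """Score rows by assigning each one the label of the closest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(model, key=lambda label: dist2(model[label], r)) for r in rows]

def evaluate(predicted, actual):
    """Evaluate with plain accuracy: the fraction of correct predictions."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical training and test data (two numeric features per row).
train_rows = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
train_labels = ["low", "low", "high", "high"]
test_rows = [[0.9, 1.1], [4.1, 3.9], [1.1, 1.0]]
test_labels = ["low", "high", "low"]

model = train(train_rows, train_labels)      # train phase
predictions = score(model, test_rows)        # score phase
accuracy = evaluate(predictions, test_labels)  # evaluate phase
print(predictions, accuracy)
```

Keeping the three phases as separate functions mirrors the separation between the Train, Score, and Evaluate categories listed below.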

Tip

For a detailed description of this experimental workflow, see the credit risk solution walkthrough.

Click the links in the table to see a complete list of the Machine Learning modules in each category:

Category

Machine Learning / Evaluate

Machine Learning / Initialize Model

Machine Learning / Score

Machine Learning / Train

For examples of machine learning in action, see the Model Gallery.

Data Exploration and Data Quality

Part of the machine learning process is making sure that your data is of the right kind, quantity, and quality for the algorithm you’ve chosen. Find out how much data you have and how it is distributed. Are there outliers? How were they generated, and what do they mean? Are there any duplicate records?
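The questions above can be checked with a few lines of code before any model is trained. The following is a rough sketch in plain Python, not a Studio feature: the sample rows are made up, and the 1.5 × IQR rule used to flag outliers is one common convention chosen here as an assumption.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical dataset: (record id, measurement) pairs.
rows = [
    ("a", 10.0), ("b", 12.0), ("c", 11.0), ("d", 13.0),
    ("e", 12.5), ("f", 11.5), ("g", 95.0), ("b", 12.0),
]

# How much data is there?
row_count = len(rows)

# Are there duplicate records? Count exact repeats of whole rows.
counts = Counter(rows)
duplicates = [row for row, n in counts.items() if n > 1]

# Are there outliers? Flag values outside 1.5 * IQR of the quartiles
# (an assumed rule of thumb, not the only possible definition).
values = [value for _, value in rows]
q1, _median, q3 = quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print(row_count, duplicates, outliers)
```

A flagged value is only a candidate for review; whether it is an error or a meaningful extreme still depends on how the data was generated.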

For examples, see Advanced data processing in Azure.

Missing Value Handling

Missing values can affect your results in many ways. For example, many statistical methods discard cases with missing values, which shrinks the usable data and can bias results. To impute missing values or otherwise correct your data, use the Clean Missing Data module before training your model.

By default, Azure Machine Learning follows these rules when it encounters rows with missing values:

  • If data used to train a model has missing values, any rows with missing values are skipped.

  • If data used as input for scoring against a model has missing values, the missing values are passed to the model as inputs and nulls are propagated, which usually means that the result is also a missing value.
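The two default rules above can be mimicked in a small Python sketch. This only imitates the described behavior and is not Azure Machine Learning code: the helper names, the use of `None` for a missing value, and the toy linear scorer with fixed weights are all assumptions.

```python
def has_missing(row):
    """Return True if any value in the row is missing (None in this sketch)."""
    return any(value is None for value in row)

def rows_for_training(rows):
    """Rule 1: rows with missing values are skipped during training."""
    return [row for row in rows if not has_missing(row)]

def score_row(weights, row):
    """Rule 2: a missing input propagates, so the score is missing too."""
    if has_missing(row):
        return None
    return sum(w * x for w, x in zip(weights, row))

# Hypothetical training data; the second row has a missing value.
training_rows = [[1.0, 2.0], [None, 3.0], [4.0, 5.0]]
kept = rows_for_training(training_rows)

# Hypothetical weights standing in for a trained model.
weights = [0.5, 2.0]
scoring_rows = [[2.0, 1.0], [2.0, None]]
scores = [score_row(weights, row) for row in scoring_rows]

print(len(kept), scores)
```

Imputing or removing missing values before both training and scoring avoids silently losing training rows and getting missing results.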

Feature Selection and Dimensionality Reduction

Azure Machine Learning Studio can help you sift through your data to find the most useful attributes.
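One generic way to find useful attributes is a filter-style ranking, in which each feature is scored against the target independently and the top-scoring features are kept. The sketch below ranks features by absolute Pearson correlation. It is an illustration under assumptions, not a description of what Studio does internally, and the feature names and values are invented.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

# Hypothetical feature columns and a numeric target.
features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8]

# Rank features from most to least correlated with the target.
ranked = sorted(
    features,
    key=lambda name: abs(pearson(features[name], target)),
    reverse=True,
)
top_features = ranked[:1]  # keep only the single best feature

print(ranked, top_features)
```

A filter like this looks at one feature at a time, so it can miss features that are only useful in combination.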

Choosing an Algorithm

The problem you are trying to solve determines both the choice of data to use in analysis and the choice of algorithm.

For help in choosing a machine learning algorithm, see How to choose an algorithm in Azure Machine Learning.
