Machine Learning Module Descriptions

Updated: June 9, 2017

This topic provides an overview of all the modules included in Azure Machine Learning Studio, organized by categories such as data transformation and text analytics.

By simply connecting and configuring modules, you can create a workflow that reads data from external sources, prepares it for analysis, applies machine learning algorithms, and generates results.

Tip

Looking for machine learning algorithms? See the Machine Learning category, which contains modules for decision trees, clustering, neural networks, and more. The Train and Evaluate categories include modules to help you train and test your models.

In Azure Machine Learning Studio, a module is a building block for creating experiments.

Each module encapsulates a specific machine learning algorithm, function, or code library that can act on data in your workspace. The modules are designed to accept connections from other modules, to share and modify data.

Studio does not include many general-purpose data integration tools or pipelines, such as those provided by Integration Services or Azure Data Factory. Instead, the modules in Azure Machine Learning provide functionality that is specific to machine learning:

  • Normalization, grouping, and scaling of data
  • Computing statistical distribution of data
  • Conversion to other machine learning formats
  • Import of data used for machine learning experiments and export of results
  • Text analytics, feature selection, dimensionality reduction, and more
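
To make the first item in the list above concrete, the following is a minimal sketch of two common ways to scale a numeric column: min-max scaling and z-score standardization. It is plain Python written for illustration and is not the code that any Studio module runs; the function names `min_max_scale` and `z_score` and the sample values are assumptions.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread, so map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample column.
ages = [20, 30, 40]
scaled = min_max_scale(ages)
standardized = z_score(ages)
```

Min-max scaling keeps the shape of the distribution but is sensitive to outliers, while z-scores express each value as a distance from the mean in standard deviations.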

The code that runs in each module comes from many sources, including open source libraries and languages, algorithms developed by Microsoft Research, and tools for working with Azure and other cloud services.

When an experiment is open in Studio, you can see the complete list of current modules in the navigation pane on the left. You drag these building blocks into your experiment and connect them to create a complete machine learning workflow.

For an example of how to build a complete machine learning experiment, see these tutorials:

Sometimes modules are updated to add new functionality, or to remove older code. When this happens, any experiments that you created using the module will continue to run, but the next time you open the experiment, you will be prompted to upgrade the module, or to use a different module.

To make it easier to find related modules, the machine learning tools in Azure Machine Learning Studio are grouped by these categories.

Data Format Conversions

Use these modules to convert data to one of the formats used by other machine learning tools.

  • Data Input and Output

    Use these modules to read data and models from cloud data sources, including Hadoop clusters, Azure table storage, and Web URLs, or to write results to storage or to a database.

  • Data Transformation

    Use these modules to prepare data for analysis. You can change data types, flag columns as features or labels, generate features, scale or normalize data, and much more.

  • Filter

    Transform numeric data derived from digital signal processing.

  • Learning With Counts

    Use joint probability distributions to build features that compactly describe large datasets.

  • Manipulation

    This group provides a variety of tools for data science: remove or replace missing values, choose a subset of columns, add columns, concatenate two datasets, and so forth.

  • Sample and Split

    Divide a dataset by criteria or by size, to create training and test sets, or to isolate certain rows.

  • Scale and Reduce

    Transform numerical data, for example by scaling or normalizing values or by reducing dimensionality.
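
As an illustration of the Sample and Split idea described in the list above, the following sketch divides a dataset by size into a training set and a test set. It is plain Python, not the implementation of any Studio module; the function name `split_rows`, its parameters, and the sample rows are assumptions made for the example.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then cut it into training and test sets."""
    shuffled = list(rows)
    # A seeded generator makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset: ten rows identified by their index.
rows = list(range(10))
train, test = split_rows(rows, train_fraction=0.7, seed=42)
```

Fixing the seed means that the same rows land in the same set each time, which makes results easier to compare between experiment runs.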

Feature Selection

Use these modules to identify the best features in your data, using widely researched statistical methods.

Machine Learning

This group contains most of the machine learning algorithms supported by Azure Machine Learning.

It also contains modules intended to support the algorithms by training models, generating scores, and evaluating model performance.

OpenCV Library Modules

These modules give you easy access to a popular open source library for image processing and image classification.

R Language Modules

Use these modules to add custom R code to your experiment, or implement a machine learning model based on an R package.

Python Language Modules

Use these modules to add custom Python code to your experiment.

Statistical Functions

Use these modules to calculate probability distributions, create custom calculations, and perform a wide variety of other tasks related to numerical variables.

Text Analytics

Use these modules to perform feature hashing and named entity recognition, or to preprocess text using NLP tools.
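
For readers unfamiliar with feature hashing, the following sketch shows the general technique: each token is hashed to an index in a fixed-length vector, so text of any vocabulary size becomes a numeric feature vector of known width. This is an illustration only. The choice of MD5, the bucket count, and the function name `hash_features` are assumptions for the example and do not describe how the Studio module is implemented.

```python
import hashlib


def hash_features(tokens, num_buckets=16):
    """Map a list of tokens to a fixed-length vector of bucket counts."""
    vector = [0] * num_buckets
    for token in tokens:
        # MD5 is used here only because it is stable across runs and platforms.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % num_buckets
        vector[index] += 1
    return vector


# Hypothetical input text, split on whitespace.
tokens = "the quick brown fox jumps over the lazy dog".split()
vector = hash_features(tokens, num_buckets=16)
```

Different tokens can collide in the same bucket; using more buckets reduces collisions at the cost of a wider feature vector.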

Time Series

Use these modules to assess anomalies in trends, using algorithms specifically designed for time series data.

A-Z Module List
