Machine Learning Module Descriptions
Updated: June 9, 2017
This topic provides an overview of all the modules included in Azure Machine Learning Studio, organized by categories such as data transformation and text analytics.
By simply connecting and configuring modules, you can create a workflow that reads data from external sources, prepares it for analysis, applies machine learning algorithms, and generates results.
Looking for machine learning algorithms? See the Machine Learning category, which contains modules for decision trees, clustering, neural networks, and more. The Train and Evaluate categories include modules to help train and test your models.
In Azure Machine Learning Studio, a module is a building block for creating experiments.
Each module encapsulates a specific machine learning algorithm, function, or code library that can act on data in your workspace. The modules are designed to accept connections from other modules, to share and modify data.
You won't find many general-purpose data integration tools or pipelines, such as those provided by Integration Services or Azure Data Factory. Instead, the modules in Azure Machine Learning provide functionality that is specific to machine learning:
- Normalization, grouping, and scaling of data
- Computing statistical distribution of data
- Conversion to other machine learning formats
- Import of data used for machine learning experiments and export of results
- Text analytics, feature selection, dimensionality reduction, and more
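As an illustration of the first item in the list above, the following is a minimal sketch of two common ways to scale a numeric column: min-max scaling and z-score standardization. It is written in plain Python and is not the code that runs inside any Studio module; the function names and the sample `ages` column are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample column.
ages = [20, 30, 40, 60]
scaled = min_max_scale(ages)
standardized = z_score(ages)
```

Min-max scaling keeps the original ordering and bounds the result, while z-scores are usually preferred when the downstream algorithm assumes roughly centered inputs.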
The code that runs in each module comes from many sources, including open source libraries and languages, algorithms developed by Microsoft Research, and tools for working with Azure and other cloud services.
When an experiment is open in Studio, you can see the complete list of current modules in the navigation pane on the left. You drag these building blocks into your experiment, and then connect them to create a complete machine learning workflow.
For an example of how to build a complete machine learning experiment, see these tutorials:
Sometimes modules are updated to add new functionality, or to remove older code. When this happens, any experiments that you created using the module will continue to run, but the next time you open the experiment, you will be prompted to upgrade the module, or to use a different module.
To make it easier to find related modules, the machine learning tools in Azure Machine Learning Studio are grouped by these categories.
Data Format Conversions
Use these modules to convert data to one of the formats used by other machine learning tools.
Use these modules to read data and models from cloud data sources, including Hadoop clusters, Azure table storage, and Web URLs, or to write results to storage or to a database.
Use these modules to prepare data for analysis. You can change data types, flag columns as features or labels, generate features, scale or normalize data, and much more.
Transform numeric data derived from digital signal processing.
Use joint probability distributions to build features that compactly describe large datasets.
This group provides a variety of tools for data science: remove or replace missing values, choose a subset of columns, add a column, concatenate two datasets, and so forth.
Divide a dataset by criteria or by size, to create training and test sets, or to isolate certain rows.
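The two kinds of split described above (by size and by criteria) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the behavior of a specific Studio module: the helper names, the fixed random seed, and the sample rows are all hypothetical.

```python
import random


def split_by_fraction(rows, fraction, seed=0):
    """Shuffle a copy of the rows, then cut it into two parts by size."""
    shuffled = list(rows)
    # A seeded generator makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def split_by_predicate(rows, predicate):
    """Separate the rows that satisfy a condition from those that do not."""
    matching = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matching, rest


# Hypothetical dataset of ten rows.
rows = [{"id": i, "amount": i * 10} for i in range(10)]

# Split by size: 70% for training, the remainder for testing.
train, test = split_by_fraction(rows, fraction=0.7)

# Split by criteria: isolate the rows with a large amount.
large, small = split_by_predicate(rows, lambda r: r["amount"] >= 50)
```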
Transform numerical data.
Feature Selection
Use these modules to identify the best features in your data, using widely researched statistical methods.
Machine Learning
This group contains most of the machine learning algorithms supported by Azure Machine Learning.
It also contains modules intended to support the algorithms by training models, generating scores, and evaluating model performance.
After you have trained a model, use these tools to measure the model’s accuracy.
These modules provide the machine learning algorithms, which you can customize by setting parameters. The algorithms in this section are grouped by type:
Use these modules to pass new data through the algorithm and generate a set of results for evaluation. You can also use the results of scoring as part of a predictive service.
These modules train an initialized machine learning model on data you provide.
OpenCV Library Modules
These modules give you easy access to a popular open source library for image processing and image classification.
R Language Modules
Use these modules to add custom R code to your experiment, or implement a machine learning model based on an R package.
Python Language Modules
Use these modules to add custom Python code to your experiment.
Statistical Functions
Use these modules to calculate probability distributions, create custom calculations, and perform a wide variety of other tasks related to numerical variables.
Text Analytics
Use these modules to perform feature hashing and named entity recognition, or to preprocess text using NLP tools.
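To make the idea of feature hashing concrete, here is a minimal sketch that maps free text to a fixed-length vector of token counts. It assumes whitespace tokenization and uses SHA-256 from the Python standard library as a stand-in hash; the hash function, the `num_bits` parameter name, and the sample sentence are illustrative assumptions and do not describe how the Studio module is implemented.

```python
import hashlib


def hash_features(text, num_bits=4):
    """Map text to a vector of 2**num_bits token counts using a hash function."""
    size = 2 ** num_bits
    vector = [0] * size
    for token in text.lower().split():
        # Hash the token and fold the result into the fixed vector size.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % size
        vector[index] += 1
    return vector


# Hypothetical input sentence with nine tokens.
vec = hash_features("the quick brown fox jumps over the lazy dog")
```

Because the vector length is fixed in advance, no vocabulary has to be stored; the trade-off is that distinct tokens can collide in the same position, which becomes more likely as `num_bits` gets smaller.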
Time Series
Use these modules to assess anomalies in trends, using algorithms specifically designed for time series data.