Train Clustering Model
Updated: October 6, 2017
Trains a clustering model and assigns data from the training set to clusters
Category: Machine Learning / Train
This article describes how to use the Train Clustering Model module in Azure Machine Learning Studio to train a clustering model.
The module takes an untrained clustering model that you have already configured using the K-Means Clustering module, and trains the model using a labeled or unlabeled data set. The module creates both a trained model that you can use for prediction, and a set of cluster assignments for each case in the training data.
A clustering model cannot be trained using the Train Model module, which is the generic module for creating machine learning models. That is because Train Model works only with supervised learning algorithms, which require labeled data to train from. In contrast, K-means and other clustering algorithms support unsupervised learning: the algorithm finds structure in the data without requiring labeled examples.
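To make the supervised versus unsupervised distinction concrete, the following is a minimal sketch of K-means training (Lloyd's algorithm) in plain Python. It is not the implementation used by Azure Machine Learning Studio; the function names `train_kmeans` and `squared_distance` and the sample data are hypothetical. Note that the training call receives only feature values and no label column.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_kmeans(rows, k, iterations=20, seed=0):
    """Illustrative K-means: returns (centroids, assignments) for unlabeled rows."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random as initial centroids.
    centroids = [list(r) for r in rng.sample(rows, k)]
    assignments = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its current members.
        for c in range(k):
            members = [row for row, a in zip(rows, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Hypothetical unlabeled training data: two well-separated groups of points.
rows = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
centroids, assignments = train_kmeans(rows, k=2)
print(assignments)
```

The two return values loosely mirror the module's two outputs: a trained model (here, just the centroids) and a cluster assignment for each training row.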
Add the Train Clustering Model module to the experiment. You can find the module in Azure Machine Learning Studio, in the Machine Learning category, under Train.
To the left input, connect the K-Means Clustering module, or another custom module that creates a compatible clustering model.
Configure the clustering model.
Attach a training dataset to the right-hand input of Train Clustering Model.
In Column Set, select the columns from the dataset to use in building clusters. The columns you select should be good features -- avoid using IDs or other columns that have unique values, or columns that have all the same values.
If a label is available, you can use it as a feature, or leave it out.
Check for Append or Uncheck for Result Only: When this option is selected, the module outputs the training data together with the new cluster label.
When you deselect this option, only the cluster label is output.
Run the experiment, or click the Train Clustering Model module and select Run Selected.
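The guidance on choosing columns and on the append option can be sketched in plain Python. This is an assumption-laden illustration, not Studio code: the table layout (a dict of column name to list of values), the helper names `screen_columns` and `build_results`, and the output column name `Assignments` are all hypothetical. The screening rule is a rough heuristic: it flags constant columns and non-float columns in which every value is unique, which tends to catch IDs but should be reviewed by hand.

```python
def screen_columns(table):
    """Split column names into (usable, avoid) using a rough heuristic.

    `table` maps column name -> list of values (hypothetical layout).
    """
    usable, avoid = [], []
    for name, values in table.items():
        distinct = len(set(values))
        constant = distinct == 1
        # All-unique values in a non-float column usually indicate an ID.
        id_like = distinct == len(values) and not any(
            isinstance(v, float) for v in values
        )
        (avoid if constant or id_like else usable).append(name)
    return usable, avoid

def build_results(table, assignments, append=True):
    """Mimic the append option: input columns plus assignments, or assignments only."""
    result = dict(table) if append else {}
    result["Assignments"] = list(assignments)  # column name is an assumption
    return result

# Hypothetical training data.
table = {
    "customer_id": [101, 102, 103, 104],  # unique per row: poor feature
    "country": ["US", "US", "US", "US"],  # constant: carries no information
    "age": [23, 35, 35, 52],
    "spend": [10.5, 20.0, 18.2, 40.1],
}
usable, avoid = screen_columns(table)
print(usable, avoid)

appended = build_results(table, [0, 1, 1, 0], append=True)
result_only = build_results(table, [0, 1, 1, 0], append=False)
print(list(appended), list(result_only))
```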
Results
To view the clusters and their separation in a graph, right-click the Results dataset output and select Visualize.
The graph plots the principal components of the clustered data rather than the actual values. See Principal Component Analysis for more information.
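To show why a principal-component plot does not display the original values, here is a minimal sketch that projects rows onto their first two principal components using power iteration with deflation. How Studio computes its visualization is not described here, so treat this only as an illustration of the general technique; all function names and the sample data are hypothetical, and the simple power iteration can fail on degenerate inputs (for example, a start vector orthogonal to the dominant component).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def dominant_eigenvector(matrix, iterations=500):
    """Power iteration: (unit eigenvector, eigenvalue) for the largest eigenvalue."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    length = math.sqrt(dot(v, v))
    v = [x / length for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return v, eigenvalue

def project_to_2d(rows):
    """Return (2-D points, pc1, pc2) for numeric rows with 2 or more columns."""
    centered = center(rows)
    cov = covariance(centered)
    pc1, ev1 = dominant_eigenvector(cov)
    d = len(cov)
    # Deflation: remove the first component so the next one becomes dominant.
    deflated = [[cov[i][j] - ev1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, _ = dominant_eigenvector(deflated)
    points = [(dot(r, pc1), dot(r, pc2)) for r in centered]
    return points, pc1, pc2

# Hypothetical three-column data containing two groups.
rows = [(1.0, 1.1, 0.2), (0.9, 1.0, 0.1), (1.2, 0.8, 0.3),
        (8.0, 8.1, 1.0), (7.9, 8.3, 1.2), (8.2, 7.9, 0.9)]
points, pc1, pc2 = project_to_2d(rows)
for p in points:
    print(p)
```

Each plotted coordinate is a weighted combination of all input columns, which is why the axes of such a graph cannot be read as any single original column.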
To view the values in the dataset, add an instance of the Convert to Dataset module, and connect it to the Results dataset output. Run the Convert to Dataset module to get a copy of the data that you can view or download.
To save the trained model for later re-use, right-click the module, select Trained model, and click Save As Trained Model.
To generate scores from the model, use Assign Data to Clusters.
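Conceptually, scoring new data with a trained K-means model means assigning each new row to its nearest centroid. The sketch below illustrates that idea in plain Python; it is not the Assign Data to Clusters module itself. The function name `assign_to_clusters`, the centroid values, and the choice to return the distance alongside the cluster index are assumptions made for illustration.

```python
import math

def assign_to_clusters(rows, centroids):
    """For each row, return (index of nearest centroid, distance to it)."""
    results = []
    for row in rows:
        distances = [math.dist(row, c) for c in centroids]
        best = min(range(len(centroids)), key=distances.__getitem__)
        results.append((best, distances[best]))
    return results

# Hypothetical centroids from a previously trained two-cluster model.
centroids = [(1.0, 1.0), (8.0, 8.0)]
new_rows = [(0.5, 1.5), (7.0, 9.0), (4.0, 4.0)]

scored = assign_to_clusters(new_rows, centroids)
for row, (cluster, distance) in zip(new_rows, scored):
    print(row, cluster, round(distance, 3))
```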
For an example of how clustering is used in machine learning, see these sample experiments in the Model Gallery:
The Clustering: Find similar Companies sample demonstrates how to use clustering on attributes derived from unstructured text.
The Clustering: Color quantization sample demonstrates how to use clustering to find related colors and reduce the number of bits used in images.
The Clustering: Group iris data sample provides a simple example of clustering based on the iris dataset.

Expected inputs
| Name | Type | Description |
|---|---|---|
| Untrained model | ICluster interface | Untrained clustering model |
| Dataset | Data Table | Input data source |

Module parameters
| Name | Range | Type | Default | Description |
|---|---|---|---|---|
| Column Set | any | ColumnSelection | | Column selection pattern |
| Check for Append or Uncheck for Result Only | any | Boolean | true | If selected, the output dataset contains the input dataset with the assignments column appended; if cleared, it contains only the assignments column |

Outputs
| Name | Type | Description |
|---|---|---|
| Trained model | ICluster interface | Trained clustering model |
| Results dataset | Data Table | Input dataset with the assignments column appended, or the assignments column only |

Exceptions
For a complete list of module errors, see Module Error Codes.
| Exception | Description |
|---|---|
| Error 0003 | Exception occurs if one or more inputs are null or empty. |

See also
A-Z Module List
Train
Assign Data to Clusters
K-Means Clustering