Train Clustering Model
Updated: April 12, 2016
Trains a clustering model and assigns data from the training set to clusters
Category: Machine Learning / Train
The Train Clustering Model module takes an untrained clustering model, such as that produced by K-Means Clustering, and an unlabeled dataset. It trains the model using the specified configuration and returns the trained model together with the training data, including a cluster assignment for each case in the training data.
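
As a rough illustration of what "train, then return the training data with assignments" means, the following is a minimal K-means sketch in plain Python. It is not the module's implementation: the function names `nearest` and `train_clustering`, the sample rows, and the choice to seed the centroids with the first `k` rows are all assumptions made for this example.

```python
import math

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def train_clustering(rows, k, iterations=10):
    """Fit k centroids with Lloyd's algorithm and label every training row."""
    # Assumption: seed with the first k rows so the sketch is deterministic.
    centroids = [list(r) for r in rows[:k]]
    for _ in range(iterations):
        assignments = [nearest(r, centroids) for r in rows]
        for j in range(k):
            members = [r for r, a in zip(rows, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final labels computed against the final centroids.
    assignments = [nearest(r, centroids) for r in rows]
    return centroids, assignments

# Unlabeled training data (hypothetical values).
rows = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9)]
centroids, assignments = train_clustering(rows, k=2)

# Training data returned together with a cluster assignment per case.
labeled = [row + (a,) for row, a in zip(rows, assignments)]
```

Here `centroids` plays the role of the trained model and `labeled` the role of the results dataset.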
Note that Train Clustering Model works differently from Train Model, the generic module for creating trained machine learning models. Train Model works only with supervised learning algorithms, that is, algorithms that require a sample of data with known labels to train from. In contrast, K-means and other clustering algorithms are unsupervised machine learning algorithms: they learn structure from unlabeled data, so you do not have to provide labeled examples.
Add the Train Clustering Model module to the experiment.
To the left input, connect the K-Means Clustering module, or another custom module that creates a clustering model.
Configure the clustering model.
Attach a training dataset to the right-hand input of Train Clustering Model.
In Column Set, select the columns from the dataset to use in building clusters.
Run the experiment.
When the model is trained, right-click the Results dataset output and select Visualize to view the cluster separation in a Principal Components graph.
You can also save the trained model, or use it with the Assign to Clusters (deprecated) module to predict cluster membership for new cases.
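
The last two ideas in the steps above, restricting clustering to a column set and then assigning new cases with a trained model, can be sketched outside the Studio as follows. This is only an analogy under stated assumptions: the centroid values, the column names `x`, `y`, and `id`, and the helper `assign_to_cluster` are hypothetical and are not part of any module's API.

```python
import math

# Hypothetical trained model: one centroid per cluster, over the selected columns.
trained_centroids = [(1.1, 0.9), (8.1, 7.95)]

# Analogue of Column Set: only these columns take part in clustering.
column_set = ["x", "y"]

def assign_to_cluster(record, centroids, columns):
    """Return the index of the nearest centroid, using only the selected columns."""
    point = [record[c] for c in columns]
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# New cases that were not part of the training data; "id" is ignored.
new_cases = [
    {"id": "a", "x": 0.9, "y": 1.3},
    {"id": "b", "x": 7.5, "y": 8.4},
]
new_assignments = [
    assign_to_cluster(case, trained_centroids, column_set) for case in new_cases
]
```

Columns left out of the column set, such as the identifier here, pass through untouched and do not influence the distance calculation.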
For an example of how clustering is used in machine learning, see these sample experiments in the Model Gallery:
The Clustering: Find similar Companies sample demonstrates how to use clustering on attributes derived from unstructured text.
The Clustering: Color quantization sample demonstrates how to use clustering to find related colors and reduce the number of bits used in images.
The Clustering: Group iris data sample provides a simple example of clustering based on the iris dataset.
| Name | Type | Description |
|---|---|---|
| Untrained model | | Untrained clustering model |
| Dataset | | Input data source |
| Name | Range | Type | Default | Description |
|---|---|---|---|---|
| Column Set | any | ColumnSelection | | Column selection pattern |
| Check for Append or Uncheck for Result Only | any | Boolean | true | Whether the output dataset contains the input dataset with an assignments column appended (checked) or the assignments column only (unchecked) |
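
The effect of the append option can be pictured with a small sketch. The function `build_results` and the column name `Assignments` are assumptions chosen for illustration; the records and labels are made-up values.

```python
def build_results(records, assignments, append=True):
    """Return input rows plus an assignments column, or the assignments column only."""
    if append:
        # Checked: each input row is kept and gains one extra column.
        return [{**rec, "Assignments": a} for rec, a in zip(records, assignments)]
    # Unchecked: only the assignments column is returned.
    return [{"Assignments": a} for a in assignments]

records = [{"x": 1.0, "y": 1.0}, {"x": 8.0, "y": 8.0}]
assignments = [0, 1]

appended = build_results(records, assignments, append=True)
result_only = build_results(records, assignments, append=False)
```

With the default (checked), downstream steps can still see the original columns next to each assignment; the result-only form is smaller but has to be joined back to the input by row order.
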
| Name | Type | Description |
|---|---|---|
| Trained model | | Trained clustering model |
| Results dataset | | Input dataset with an assignments column appended, or the assignments column only |
For a complete list of module errors, see Machine Learning Module Error Codes.
| Exception | Description |
|---|---|
| | Exception occurs if one or more of the inputs are null or empty. |
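
If you reproduce this workflow in your own code, an equivalent guard might look like the sketch below. The function `check_inputs`, the use of `ValueError`, and the message strings are assumptions for illustration and do not correspond to the module's actual error codes.

```python
def check_inputs(untrained_model, dataset):
    """Raise ValueError when a required input is missing or has no rows."""
    if untrained_model is None:
        raise ValueError("Untrained model input is null.")
    if dataset is None or len(dataset) == 0:
        raise ValueError("Dataset input is null or empty.")

# An empty dataset is rejected before any training is attempted.
try:
    check_inputs(untrained_model={"k": 2}, dataset=[])
    message = None
except ValueError as err:
    message = str(err)
```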