Machine Learning - Initialize Model - Clustering

 

Updated: October 5, 2016

This article introduces the clustering algorithms provided in Azure Machine Learning Studio.

Clustering, in machine learning, is a method of grouping data points into clusters so that points in the same cluster are more similar to each other than to points in other clusters. It is also called segmentation.

Over the years, many clustering algorithms have been developed, but all such algorithms use features to find similar items. For example, clustering can be applied in text analysis to group pieces of text that contain common words, where the words are the features.
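Here is a minimal sketch of "words as features" in plain Python. It is not an Azure Machine Learning Studio API. Each text becomes a bag-of-words count vector, and two texts are compared with cosine similarity, which is one common similarity measure among several. The helper names `word_counts` and `cosine_similarity` and the sample sentences are hypothetical.

```python
from collections import Counter
import math


def word_counts(text):
    """Turn a piece of text into a bag-of-words feature vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors; 0.0 means no shared words."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stock prices rose sharply today",
]
features = [word_counts(d) for d in docs]
print(cosine_similarity(features[0], features[1]))  # about 0.668: shares "the" and "cat"
print(cosine_similarity(features[0], features[2]))  # 0.0: no words in common
```

A clustering algorithm working on these vectors would tend to place the first two sentences in the same cluster and the third in a different one.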

Clustering can also be applied to data that has no labels, or where you have no preconceptions about how the data might be related, to discover new patterns. For this reason, clustering is often used to explore data before analyzing it with more predictive algorithms.

A clustering algorithm can be used with either labeled or unlabeled data.

  • With unlabeled data, the clustering algorithm determines which data points are closest together and creates clusters around a central point, or centroid. You can then use the cluster ID as a temporary label for that group of data.

  • If the data has labels, you can use the label to drive the number of clusters, or use the label as just another feature.
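The following plain-Python function sketches the centroid-based grouping described above using Lloyd's k-means. It is not the implementation used by Azure Machine Learning Studio. The function names `nearest` and `kmeans`, their parameters, and the toy data are all hypothetical. Each returned cluster ID can serve as the temporary label for its group.

```python
import math
import random


def nearest(point, centroids):
    """Return the index (cluster ID) of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's k-means. Returns (centroids, one cluster ID per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    assignments = None
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        new_assignments = [nearest(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its member points.
        for i in range(k):
            members = [p for p, cid in zip(points, assignments) if cid == i]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments


# Toy unlabeled data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, cluster_ids = kmeans(data, k=2)
print(cluster_ids)  # the first three points share one ID, the last three another
```

When the data has labels, one way to let the labels drive the number of clusters is to set `k` to the number of distinct labels, for example `kmeans(data, k=len(set(labels)))`, where `labels` is a hypothetical list holding one label per point.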

After you have configured the clustering algorithm, you train it on data by using either the Train Clustering Model or Sweep Clustering modules.
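Here is a rough plain-Python analogue of a parameter sweep. It trains one k-means model for each candidate number of clusters and keeps the best-scoring model. It assumes the mean silhouette score is the selection metric, which may differ from the metrics and search strategy the Sweep Clustering module actually uses. The helper names `kmeans`, `silhouette`, and `sweep_k` are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's k-means; returns one cluster ID per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        new = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        if new == assignments:
            break  # converged
        assignments = new
        for i in range(k):
            members = [p for p, cid in zip(points, assignments) if cid == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments


def silhouette(points, assignments):
    """Mean silhouette score in [-1, 1]; higher means tighter, better-separated clusters."""
    clusters = {}
    for p, cid in zip(points, assignments):
        clusters.setdefault(cid, []).append(p)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster; score it neutrally
    scores = []
    for p, cid in zip(points, assignments):
        own = clusters[cid]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # Mean distance to the other members of p's own cluster (self-distance is 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Mean distance to the closest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_id, other in clusters.items() if other_id != cid)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def sweep_k(points, candidate_ks):
    """Train one model per k and return (best k, {k: score})."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidate_ks}
    best_k = max(scores, key=scores.get)  # ties go to the first (smallest) k listed
    return best_k, scores


data = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
best_k, scores = sweep_k(data, [2, 3, 4])
print(best_k, scores)
```

The sweep starts at two clusters because the silhouette score needs at least two clusters to compare against.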

When the model is trained, use it to predict cluster membership for new data points. For example, if you have used clustering to group customers by purchasing behavior, you can use the model to assign new customers to one of those groups and so anticipate their purchasing behavior.
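For a centroid-based model such as K-means, predicting membership for a new point amounts to finding its nearest centroid. The sketch below is a minimal illustration under that assumption. The centroid values, the two customer features (visits per month, average spend), and the helper name `predict_cluster` are all hypothetical.

```python
import math


def predict_cluster(point, centroids):
    """Return the ID (index) of the centroid nearest to point, by Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


# Hypothetical centroids from an earlier training run: (visits per month, average spend).
centroids = [(2.0, 15.0), (12.0, 80.0)]
new_customers = [(3.0, 20.0), (11.0, 95.0)]
print([predict_cluster(c, centroids) for c in new_customers])  # [0, 1]
```

Features on very different scales, such as spend compared with visit counts, can dominate the distance. In practice they are often normalized before training and prediction.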

Wondering which algorithm you need for a task? See these topics:

The category for Initialize/Clustering includes the following modules:

Module | Description
K-Means Clustering | Configures and initializes a K-means clustering model

Related Tasks

To use a different clustering algorithm, or to create a custom clustering model using R, see these topics:

Regression
Classification
Text Analytics
Image classification using OpenCV
