Data Mining Algorithms (Analysis Services - Data Mining)


Updated: March 2, 2016

Applies To: SQL Server 2016

An algorithm in data mining (or machine learning) is a set of heuristics and calculations that creates a model from data. To create a model, the algorithm first analyzes the data you provide, looking for specific types of patterns or trends. The algorithm uses the results of this analysis over many iterations to find the optimal parameters for creating the mining model. These parameters are then applied across the entire data set to extract actionable patterns and detailed statistics.

The mining model that an algorithm creates from your data can take various forms, including:

  • A set of clusters that describe how the cases in a dataset are related.

  • A decision tree that predicts an outcome, and describes how different criteria affect that outcome.

  • A mathematical model that forecasts sales.

  • A set of rules that describe how products are grouped together in a transaction, and the probabilities that products are purchased together.
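As a rough illustration of the last form above, a rule such as "bread implies butter" is commonly summarized by its support (the share of all transactions that contain both items) and its confidence (the share of transactions containing the first item that also contain the second). The sketch below is generic Python, not the Microsoft Association algorithm; the function name `rule_stats` and the sample baskets are hypothetical.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Transactions that contain every item on the left-hand side of the rule.
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    # Of those, the ones that also contain every item on the right-hand side.
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
support, confidence = rule_stats(baskets, {"bread"}, {"butter"})
```

Here two of the four baskets contain both items, and two of the three baskets with bread also contain butter, so the rule has moderate support and fairly high confidence.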

The algorithms provided in SQL Server Data Mining are among the most popular, well-researched methods of deriving patterns from data. To take one example, K-means clustering is one of the oldest clustering algorithms and is widely available in many different tools, with many different implementations and options. However, the particular implementation of K-means clustering used in SQL Server Data Mining was developed by Microsoft Research and then optimized for performance with Analysis Services. All of the Microsoft data mining algorithms can be extensively customized and are fully programmable by using the provided APIs. You can also automate the creation, training, and retraining of models by using the data mining components in Integration Services.
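For readers unfamiliar with K-means, the following is a minimal sketch of the textbook procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It is not the Microsoft Research implementation mentioned above; the function name `kmeans`, its parameters, and the sample points are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Textbook k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable, so the procedure has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

Production implementations typically differ in how they choose starting centroids, handle discrete attributes, and scale to large data sets, which is where tool-specific options come in.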

You can also use third-party algorithms that comply with the OLE DB for Data Mining specification, or develop custom algorithms that can be registered as services and then used within the SQL Server Data Mining framework.

Choosing the best algorithm to use for a specific analytical task can be a challenge. While you can use different algorithms to perform the same business task, each algorithm produces a different result, and some algorithms can produce more than one type of result. For example, you can use the Microsoft Decision Trees algorithm not only for prediction, but also as a way to reduce the number of columns in a dataset, because the decision tree can identify columns that do not affect the final mining model.
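To make the column-reduction idea concrete, the sketch below scores each input column by information gain, one common criterion that tree learners use to choose splits. A column with a gain near zero tells the tree nothing about the outcome and is a candidate for removal. This is a generic calculation and not necessarily the scoring method used by Microsoft Decision Trees; the function names, column names, and rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column, target):
    """Reduction in target entropy obtained by splitting the rows on one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[target])
    # Weighted average entropy of the target within each group after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical cases: home ownership determines the outcome, favorite color does not.
rows = [
    {"owns_home": "yes", "favorite_color": "red", "buyer": "yes"},
    {"owns_home": "yes", "favorite_color": "blue", "buyer": "yes"},
    {"owns_home": "no", "favorite_color": "red", "buyer": "no"},
    {"owns_home": "no", "favorite_color": "blue", "buyer": "no"},
]
gains = {
    col: information_gain(rows, col, "buyer")
    for col in ("owns_home", "favorite_color")
}
```

In this toy data set, `owns_home` receives the maximum possible gain for a two-valued outcome, while `favorite_color` receives none, so only the first column would appear in a tree.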

Choosing an Algorithm by Type

SQL Server Data Mining includes the following algorithm types:

  • Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset.

  • Regression algorithms predict one or more continuous numeric variables, such as profit or loss, based on other attributes in the dataset.

  • Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.

  • Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis.

  • Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a series of clicks on a website, or a series of log events preceding machine maintenance.

However, there is no reason to limit yourself to one algorithm in your solutions. Experienced analysts will sometimes use one algorithm to determine the most effective inputs (that is, variables), and then apply a different algorithm to predict a specific outcome based on that data. SQL Server Data Mining lets you build multiple models on a single mining structure, so within a single data mining solution you could use a clustering algorithm, a decision trees model, and a Naive Bayes model to get different views on your data. You might also use multiple algorithms within a single solution to perform separate tasks: for example, you could use regression to obtain financial forecasts, and use a neural network algorithm to perform an analysis of factors that influence forecasts.

Choosing an Algorithm by Task

To help you select an algorithm for use with a specific task, the following table provides suggestions for the types of tasks for which each algorithm is traditionally used.

| Examples of tasks | Microsoft algorithms to use |
| --- | --- |
| Predicting a discrete attribute: Flag the customers in a prospective buyers list as good or poor prospects. Calculate the probability that a server will fail within the next 6 months. Categorize patient outcomes and explore related factors. | Microsoft Decision Trees Algorithm; Microsoft Naive Bayes Algorithm; Microsoft Clustering Algorithm; Microsoft Neural Network Algorithm |
| Predicting a continuous attribute: Forecast next year's sales. Predict site visitors given past historical and seasonal trends. Generate a risk score given demographics. | Microsoft Decision Trees Algorithm; Microsoft Time Series Algorithm; Microsoft Linear Regression Algorithm |
| Predicting a sequence: Perform clickstream analysis of a company's Web site. Analyze the factors leading to server failure. Capture and analyze sequences of activities during outpatient visits, to formulate best practices around common activities. | Microsoft Sequence Clustering Algorithm |
| Finding groups of common items in transactions: Use market basket analysis to determine product placement. Suggest additional products to a customer for purchase. Analyze survey data from visitors to an event, to find which activities or booths were correlated, to plan future activities. | Microsoft Association Algorithm; Microsoft Decision Trees Algorithm |
| Finding groups of similar items: Create patient risk profile groups based on attributes such as demographics and behaviors. Analyze users by browsing and buying patterns. Identify servers that have similar usage characteristics. | Microsoft Clustering Algorithm; Microsoft Sequence Clustering Algorithm |

The following lists provide links to learning resources for each of the data mining algorithms that are provided in SQL Server Data Mining:

Basic algorithm description: Explains what the algorithm does and how it works, and outlines possible business scenarios where the algorithm might be useful.

  • Microsoft Association Algorithm

  • Microsoft Clustering Algorithm

  • Microsoft Decision Trees Algorithm

  • Microsoft Linear Regression Algorithm

  • Microsoft Logistic Regression Algorithm

  • Microsoft Naive Bayes Algorithm

  • Microsoft Neural Network Algorithm

  • Microsoft Sequence Clustering Algorithm

  • Microsoft Time Series Algorithm

Technical reference: Provides technical detail about the implementation of the algorithm, with academic references as necessary. Lists the parameters that you can set to control the behavior of the algorithm and customize the results in the model. Describes data requirements and provides performance tips if possible.

  • Microsoft Association Algorithm Technical Reference

  • Microsoft Clustering Algorithm Technical Reference

  • Microsoft Decision Trees Algorithm Technical Reference

  • Microsoft Linear Regression Algorithm Technical Reference

  • Microsoft Logistic Regression Algorithm Technical Reference

  • Microsoft Naive Bayes Algorithm Technical Reference

  • Microsoft Neural Network Algorithm Technical Reference

  • Microsoft Sequence Clustering Algorithm Technical Reference

  • Microsoft Time Series Algorithm Technical Reference

Model content: Explains how information is structured within each type of data mining model, and explains how to interpret the information stored in each of the nodes.

  • Mining Model Content for Association Models (Analysis Services - Data Mining)

  • Mining Model Content for Clustering Models (Analysis Services - Data Mining)

  • Mining Model Content for Decision Tree Models (Analysis Services - Data Mining)

  • Mining Model Content for Linear Regression Models (Analysis Services - Data Mining)

  • Mining Model Content for Logistic Regression Models (Analysis Services - Data Mining)

  • Mining Model Content for Naive Bayes Models (Analysis Services - Data Mining)

  • Mining Model Content for Neural Network Models (Analysis Services - Data Mining)

  • Mining Model Content for Sequence Clustering Models (Analysis Services - Data Mining)

  • Mining Model Content for Time Series Models (Analysis Services - Data Mining)

Data mining queries: Provides multiple queries that you can use with each model type. Examples include content queries that let you learn more about the patterns in the model, and prediction queries to help you build predictions based on those patterns.

  • Association Model Query Examples

  • Clustering Model Query Examples

  • Decision Trees Model Query Examples

  • Linear Regression Model Query Examples

  • Logistic Regression Model Query Examples

  • Naive Bayes Model Query Examples

  • Neural Network Model Query Examples

  • Sequence Clustering Model Query Examples

  • Time Series Model Query Examples

  • Determine the algorithm used by a data mining model: Query the Parameters Used to Create a Mining Model

  • Create a Custom Plug-In Algorithm: Plugin Algorithms

  • Explore a model using an algorithm-specific viewer: Data Mining Model Viewers

  • View the content of a model using a generic table format: Browse a Model Using the Microsoft Generic Content Tree Viewer

  • Learn about how to set up your data and use algorithms to create models: Mining Structures (Analysis Services - Data Mining); Mining Models (Analysis Services - Data Mining)

Data Mining Tools
