Detect Categories Video Tutorial (Data Mining Table Analysis Tool)
Applies to:Microsoft SQL Server Analysis Services
Author: Michele Hart, Microsoft Corporation
Narrator: Mary Brennan, Microsoft Corporation
Size: 18,295 KB
Type: WMV file
Hi, my name is Mary Brennan. I’m a technical writer for Microsoft SQL Server.
The Detect Categories tool uses the Microsoft clustering algorithm to automatically find rows in a table that have similar characteristics.
When the tool finishes, it creates a report that lists the categories it found, together with their distinguishing characteristics. By default, it adds a new column to the data table that contains the proposed category for each row of your data. You can then review the categories and rename them.
To begin, we will make a copy of the Table Analysis Tools Sample worksheet.
When you use the Detect Categories tool, it creates a new column in the original data table. If you perform subsequent data mining operations, the presence of this column could influence your results. To protect against this, we’ll first make a copy of the worksheet.
Right-click the Table Analysis Tools Sample tab and select Move or Copy.
Check Create a copy.
Highlight Table Analysis Tools Sample to place the copy before the original.
Select the Table Analysis Tools Sample (2) tab and click anywhere inside the table to activate the Table Analysis Tools.
Under the Table Tools menu select the Analyze tab to open the Table Analysis Tools ribbon.
Click Detect Categories to launch the wizard.
In the wizard dialog, specify the columns to use in analysis. You can deselect columns that have distinct values, such as personal names or record IDs, because these columns might not be useful for analysis. Notice that ID is automatically deselected. We will leave everything else selected.
Optionally, you can also specify the maximum number of categories to create. By default, the tool automatically creates as many categories as it finds. We will leave the default setting.
Check the option to Append a Category column to the original Excel table.
The tool creates a new worksheet, named Categories Report, which contains the list of categories and their characteristics.
The Categories Report contains two tables and a Category Profiles chart. By exploring these results, we can discover patterns in our data that will help us learn about our customers.
The first table lists the new categories by their temporary names and shows the number of rows in the original data that were classified into that category.
The second table, Category Characteristics, shows details about the similarities that were found in the category. Click the Filter button at the top of the Category column to see the characteristics for a different category.
The Relative importance shading bar indicates how important the attribute and value pair is as a distinguishing factor. The longer the bar, the more likely it is that this attribute is representative of this category.
In this example, we see that Category 1 is made up of European customers with low incomes and short commutes. Let’s use this information to give Category 1 a more meaningful name.
In the first table, double-click to select Category 1.
Type Eur Low-Income, Short Commute.
Then widen the column to see the entire label.
The new category label is immediately propagated both to the other chart and to the category assignment column in the source data worksheet. It is not, however, updated on the Category Profiles chart.
The Category Profiles chart at the bottom of the page is an interactive tool that enables you to explore your findings in more detail.
Click anywhere in the chart to display a Pivot Table chart control that enables you to interactively filter and rearrange fields.
The Excel menu ribbon also displays a PivotChart Tools tab that contains various options for reformatting the chart and rearranging, merging, and manipulating the data.
The next thing we want to do is update the chart to display our new Category name.
Click outside the chart and press CTRL-ALT-F5. Our new label displays. Remember to use CTRL-ALT-F5 to update the chart whenever you make a change in one of the tables above or with the Pivot Table control.
Looking at the chart we see that Category 3 is made up of very young customers. The blue bar represents Very Low age and the sliver of red represents Low age.
Let’s look at a different column.
In the PivotTable Field List, select Column.
Remove Age and add Region instead.
Over on the right our Category 1 of European Low-Income Short-Commute displays a large percentage of European customers.
There is much more you can do with this chart. Links to more information on using Pivot Tables is provided above.
Let’s go back to our data.
Click the Table Analysis Tools Sample (2) tab. All the way to the right our new column appears for Category. We can now sort and filter our data by Category.
In the Category dropdown, under Text Filters, put a checkmark next to Eur Low-Income, Short Commute.
Your data is now sorted and filtered by Category.