Highlight Exceptions Video Tutorial (Data Mining Table Analysis Tool)
Applies to:Microsoft SQL Server Analysis Services
Author: Michele Hart, Microsoft Corporation
Narrator: Mary Brennan, Microsoft Corporation
Size: 19,479 KB
Type: WMV file
Hi, my name is Mary Brennan. I'm a Technical Writer for Microsoft SQL Server. This video will help you get started using the Highlight Exceptions tool. The Highlight Exceptions tool uses the Microsoft clustering algorithm and pattern analysis to find values in a dataset that are atypical. They might be out of the range of most other values, missing, or even wrong, and can affect the quality of your analysis. The Highlight Exceptions tool helps you find these values and review them for further action.
The Highlight Exceptions tool can work with the entire range of data in an Excel data table, or you can select only a few columns. You can also adjust a threshold that controls the variability of data, to find more or fewer exceptions.
In this tutorial we will use the Highlight Exceptions tool to identify data that might be abnormal. These values are known as “outliers”. They might be caused by data entry errors, or could be genuinely unusual values that require further analysis. We will then review the outliers and make corrections. Finally, we will adjust the Exception threshold to limit our outliers to only the most significant.
To begin, select the Table Analysis Tools Sample tab and click anywhere inside the table to activate the Table Analysis Tools.
Under the Table Tools menu select the Analyze tab to open the Table Analysis Tools ribbon.
Double-click Highlight Exceptions to launch the Wizard.
In the Column Selection window, select the columns that you want to analyze for exceptions. Deselect the input columns that you know to have poor information or that are unlikely to be useful in creating a pattern. For example, deselect columns that have many missing or zero values and deselect columns that contain distinct values such as names or ID numbers.
Click Run. The reports display in a new worksheet.
The Outliers worksheet displays a summary report of how many outliers were found in each of the columns that you analyzed. The tool also highlights the exceptions in the original data table.
Our example found 34 outliers that were above the exception threshold, most of them in the Children and Cars columns. When we go back to the sample data, we can easily search for these outliers. Dark highlighting indicates that the row needs attention. Bright highlighting indicates that the value in that particular cell was identified as suspect.
In the Table Analysis Tools Samples worksheet, sort the Age column from oldest to youngest.
The 3rd and 6th row both have outliers based on Age. Perhaps it is peculiar for a 78-year-old to be earning $100,000 and to have purchased a bike.
Scroll down to outlier 17657. You realize this is a mistake and that this 30-year-old customer only has two children.
Change the value from 4 to 2. The new value is instantly analyzed and, now that it falls within the expected range, the highlighting is removed.
After reviewing the highlighted cells, return to the summary report and change the Exception threshold value to reduce the number of outliers. The initial value for the Exception threshold is always 75, meaning that the algorithm calculated there is a 75% chance that the highlighted data is wrong. However, we only want to see the most significant outliers.
Increase the Exception threshold to 90. The number of exceptions drops to seven. Back on the source worksheet, only seven rows are now highlighted.