Prediction Calculator Video Tutorial (Data Mining Table Analysis Tool)

Video Summary

In this tutorial we will learn how to use the Prediction Calculator Table Analysis Tool for Excel 2007.

Video Transcript

Introduction

Hi, my name is Mary Brennan. I'm a Technical Writer for Microsoft SQL Server. This video will help you get started using the Prediction Calculator tool. This tool generates both interactive and printable calculators for scoring new data against a desired outcome. For example, it could be used to determine if a loan application should be approved or not based on the scores it computes for customer attributes like occupation and income.

In addition to generating calculators, the Prediction Calculator tool also creates a worksheet that stores all the underlying calculations, so that you can interact with the model and see how different input values affect the final score.

Basically, the Prediction Calculator tool predicts an outcome based on the data.

The Wizard

  1. To begin, select the Table Analysis Tools Sample tab and click anywhere inside the table to activate the Table Analysis Tools.

  2. Under the Table Tools menu, select the Analyze tab to open the Table Analysis Tools ribbon.

  3. Double-click Prediction Calculator to launch the wizard.

  4. Identify the Target to predict. In this case, we select Purchased Bike.

    This will create a score for how likely a particular customer is to purchase a bike

  5. Because Purchased Bike is discrete, select Exactly and choose Yes.

    If the column you selected as the target contains discrete values, this is the only option available.

    If the column you selected as the target contains continuous numeric values, you would select this option and type a number in the box to specify a single number as the target value.

    Note that the Target: In Range option is available only if the column you selected as the target contains a numeric value.

  6. Click Choose columns to be used for analysis.

  7. Select the columns that are most useful for your analysis. To avoid skewing the results, you should also remove columns that have duplicate information. For example, if you have an Income column that contains numeric data, and an Income Group column that contains the labels High, Medium, and Low, you should not include both columns in the same model. Instead, you could create a separate model for each column.For this tutorial, let’s accept the default selections and click OK.

  8. Check both Operational Calculator and Printer-ready Calculator.

    The Operational Calculator is interactive. It contains the results of analysis, together with cells where you can type in values and see the predicted scores.

    The Printer-ready Calculator is formatted for printing in the default Excel style. The report contains the same basic information as the Operational Calculator, with checkboxes that you can fill in and use as a checklist in evaluating individual cases by hand.

  9. When you click Run, the reports display in three new worksheets.

The Reports

Prediction Report for Purchased Bike

Open the report called Prediction Report for Purchased Bike. This report contains the results of analysis, complete with interactive tables and graphs that help you experiment with interactions and profits. The tools on this report help you assess the financial impact of a particular analysis.

This report contains a table where you can specify the costs and profits associated with correctly and incorrectly predicting a value. These costs and profits are needed to compute the optimum score threshold for the calculator. For example, this report shows that the cost of an incorrect positive prediction is $10 and the profit from correctly predicting a bike purchase is $10.

As you enter values in the table, the related graphs update automatically to show you the best point for maximizing profit. For example, our Suggested Threshold to maximize profit shows 559. So the chart on the right will show 559 as the highest point in the line graph. To maximize profits, we should use the top 559 recommendations from the mining model, ordered by probability.

This tool is interactive, so let's make some adjustments to the costs. I'm going to change the cost of a false positive prediction to 15 and the cost of a correct positive prediction to 25. Notice that the new threshold number is 499. With this new data, we should use the top 499 recommendations to maximize profit.

The Score Breakdown table in the lower left of the report shows a detailed breakdown of the values that were detected, and how each value affects the outcome. You cannot change the values in this table; they are displayed to help you understand the prediction. You will be able to change these values in the Prediction Calculator worksheet. For example, this table lists each input column that was used in the model, regardless of whether it affected the model. The values in the Relative Impact column are probabilities, represented as percentages. The cell is shaded to visually represent the impact of this value on the outcomes.

You can see that being married does not affect the likelihood of buying a bike. However, being single is a strong indicator that a customer is likely to buy a bike.

Finally, the bottom chart shows the cost of making the wrong prediction. This chart is especially useful in scenarios where the cost of making the wrong decision significantly outweighs the cost of guessing correctly.

Prediction Calculator for Purchased Bike report

Open the Prediction Calculator for Purchased Bike report. This report is similar to the previous, except that it also allows you to assign new values to attributes to score their impact on bike purchasing.

In this example, the suggested threshold is 561 and the prediction for Bike Purchase is True. Let's see what happens if we change the values for Home Owner and Commute Distance. The calculator assigns a score to each value, in this example when we change Homeowner from No to Yes, the score changes from 5 to 0 and a Commute Distance of 5-10 miles changes the score from 105 to 25. Our bike purchase prediction is now False.

Please note that the scores assigned to each value will vary according to your specific model and data.

With these two reports you no longer have to maintain a connection with SQL Server 2008 to perform predictions. A salesperson, for example, can carry this report on a PC and enter customer profile data to make predictions.

Printable Calculator for Purchased Bike report

Open the Printable Calculator for Purchased Bike report.

The wizard can create a printed version of the worksheet that you can use for offline scoring. You cannot interact with the model as you can with the online Excel workbook, but instead you print this sheet and enter the scores by hand. The printed version provides all the calculations you need to enter values and compute a final score. This is helpful when you don’t have a computer available.

Conclusion

This concludes the Prediction Calculator video tutorial. For additional help with the Table Analysis Tools, I recommend viewing the other Table Analysis Tools video tutorials and the Help documentation included with the Data Mining Add-ins. Thank you for viewing this tutorial.