Testing the Accuracy of the Mining Models (Data Mining Tutorial)
After you have built, processed, and explored the mining models for the targeted mailing scenario, you can test the models to determine how well they perform predictions and to determine whether one of the models performs better than the others.
On the Mining Accuracy Chart tab of Data Mining Designer, you can calculate how well each model predicts and compare the results of each model directly against the results of the others. This comparison is presented as a lift chart. The Mining Accuracy Chart tab uses input data, which is data held apart from the original dataset, to compare predictions against known results. The results of the comparison are then sorted and plotted on a graph. An ideal model, a theoretical model that predicts the result correctly 100 percent of the time, is also plotted on the graph, so that you can see how closely each actual model approaches it. For more information about how lift charts work, see Lift Chart.
The lift chart is important because it helps you distinguish between models in a structure that are almost the same, so that you can determine which model provides the best predictions. Similarly, the lift chart shows which type of algorithm makes the best predictions for a particular situation. For more information about how to use the Mining Accuracy Chart tab, see Validating Data Mining Models.
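The sort-and-plot idea behind a lift chart can be sketched in a few lines of Python. This is only an illustration of the general technique, not the calculation that Data Mining Designer performs internally. The function names `lift_curve` and `ideal_curve`, and the sample values, are assumptions made for this sketch.

```python
from typing import List, Tuple


def lift_curve(actual: List[int], scores: List[float],
               steps: int = 10) -> List[Tuple[float, float]]:
    """Return (percent of population, percent of target cases captured) points.

    actual: 1 if the case really has the target value, else 0.
    scores: the model's predicted probability of the target value.
    """
    total_targets = sum(actual)
    if total_targets == 0:
        raise ValueError("The input data contains no target cases.")
    # Sort cases by predicted probability, highest first.
    ranked = [a for _, a in sorted(zip(scores, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    points = []
    for i in range(1, steps + 1):
        cutoff = round(len(ranked) * i / steps)
        captured = sum(ranked[:cutoff])
        points.append((100.0 * i / steps, 100.0 * captured / total_targets))
    return points


def ideal_curve(actual: List[int], steps: int = 10) -> List[Tuple[float, float]]:
    """Curve for a theoretical model that ranks every target case first."""
    return lift_curve(actual, [float(a) for a in actual], steps)


# Hypothetical known outcomes and model scores for ten input rows.
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

for (pct, model), (_, ideal) in zip(lift_curve(actual, scores, 5),
                                    ideal_curve(actual, 5)):
    print(f"{pct:5.0f}% of cases: model {model:6.1f}%  ideal {ideal:6.1f}%")
```

The closer a model's curve stays to the ideal curve, the earlier in the ranked list it finds the target cases.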
In this topic, you will perform the following tasks:
Mapping the Input Columns
Filtering Input Rows
Selecting the Models, Predictable Columns, and Values
Viewing the Lift Chart
Mapping the Input Columns
The first step in testing the accuracy of your mining models is to map the columns in the mining structure to the columns in the input data. If the column names match, Data Mining Designer creates the relationships automatically.
On the Column Mapping tab of the Mining Accuracy Chart tab in Data Mining Designer, click Select Case Table in the Select Input Table(s) box.
The Select Table dialog box opens. In this dialog box, you select the table that contains the input data, that is, the data used in the prediction queries that determine the accuracy of the models. For this tutorial, you will use the same data that you used to process the models. Ideally, however, the input data would be separate: rows that you held out from the data used to process the models. You would then select that held-out data in the Select Table dialog box.
In the Data Source list, verify that Adventure Works DW is selected.
In the Table/View Name list, select vTargetMail, and then click OK.
The columns in the mining structure are automatically mapped to the columns with the same name in the input table.
A prediction query is generated for each model in the structure based on the column mappings. To delete a mapping between two columns, select the line that links the column in the Mining Structure table to the column in the Select Input Table(s) table, and then press DELETE. You can also manually create mappings by clicking a column in Select Input Table(s) and dragging it onto the corresponding column in Mining Structure.
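The name-based mapping described above can be pictured with a small Python sketch. It is an illustration only: the function `map_columns`, the case-insensitive comparison, and the column names are all assumptions made for this example, not the designer's actual matching rules or the tutorial's schema.

```python
def map_columns(structure_columns, input_columns):
    """Pair each mining structure column with an identically named input column.

    Returns (mapping, unmapped). Names are compared case-insensitively,
    which is an assumption of this sketch.
    """
    available = {name.lower(): name for name in input_columns}
    mapping = {}
    unmapped = []
    for column in structure_columns:
        match = available.get(column.lower())
        if match is None:
            unmapped.append(column)   # needs a manual mapping
        else:
            mapping[column] = match
    return mapping, unmapped


# Hypothetical column names, chosen only to show the behavior.
mapping, unmapped = map_columns(
    ["Age", "Income", "Commute Distance"],
    ["age", "Income", "CommuteDistance"],
)
print(mapping)    # columns that were matched by name
print(unmapped)   # columns that still need a manual mapping

# Equivalent of dragging a column to create a mapping by hand.
mapping["Commute Distance"] = "CommuteDistance"
# Equivalent of selecting a mapping line and pressing DELETE.
del mapping["Age"]
```

Columns that end up without a mapping are the ones you would map manually by dragging.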
Filtering Input Rows
To filter the input data, use the grid under Filter the input data used to generate the lift chart. You can drag columns from Select Input Table(s) to the grid, or you can select values by clicking a column of the grid and choosing from the list of values that appears. For example, to limit the input rows to those in which the Income column is greater than x, select vTargetMail in the Source column and Income in the Field column, and then type >x in the Criteria/Argument column.
Note that you will not filter the data in this tutorial.
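Although you do not filter the data in this tutorial, the effect of a filter row such as Income with the criteria >x can be sketched in Python. This is a minimal sketch, not the query that the designer generates. The function `filter_rows`, the supported operators, the sample rows, and the threshold 50000 standing in for x are all assumptions made for this example.

```python
import operator

# Comparison operators that this sketch understands in a criteria string.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def filter_rows(rows, field, criteria):
    """Keep the rows where row[field] satisfies a criteria string such as '>50000'."""
    criteria = criteria.strip()
    # Check two-character operators first so that '>=' is not read as '>'.
    for symbol in (">=", "<=", ">", "<", "="):
        if criteria.startswith(symbol):
            threshold = float(criteria[len(symbol):])
            compare = OPERATORS[symbol]
            return [row for row in rows if compare(row[field], threshold)]
    raise ValueError(f"Unsupported criteria: {criteria!r}")


# Hypothetical input rows.
rows = [{"Income": 30000}, {"Income": 60000}, {"Income": 90000}]

high_income = filter_rows(rows, "Income", ">50000")
print(len(high_income), "of", len(rows), "rows pass the filter")
```

Only the rows that pass the filter would be used to generate the chart.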
Selecting the Models, Predictable Columns, and Values
The next step is to select the models that you want to include in the lift chart, and to select the predictable column against which to compare the models. By default, all the models in the mining structure are selected. You can choose not to include a model, but for this tutorial leave all the models selected.
You can create two types of accuracy charts. If you select a predictable value, you will see a chart that shows how much lift the model provides. If you do not include a predictable value, the chart will show how accurate the model is.
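The difference between the two chart types can be illustrated with a rough Python sketch. It reflects one reading of the distinction and is not the designer's exact calculation. The functions `lift_at` and `accuracy`, the layout of each case, and the sample probabilities are assumptions made for this example.

```python
def lift_at(cases, target, fraction):
    """With a predictable value: share of target cases found in the top
    `fraction` of cases when ranked by predicted probability of `target`."""
    ranked = sorted(cases, key=lambda c: c["prob"][target], reverse=True)
    cutoff = round(len(ranked) * fraction)
    total = sum(1 for c in cases if c["actual"] == target)
    hits = sum(1 for c in ranked[:cutoff] if c["actual"] == target)
    return hits / total


def accuracy(cases):
    """Without a predictable value: share of cases whose most probable
    predicted state equals the actual state."""
    correct = sum(
        1 for c in cases
        if max(c["prob"], key=c["prob"].get) == c["actual"]
    )
    return correct / len(cases)


# Hypothetical cases: the actual state plus the predicted probability of each state.
cases = [
    {"actual": 1, "prob": {0: 0.20, 1: 0.80}},
    {"actual": 0, "prob": {0: 0.40, 1: 0.60}},
    {"actual": 1, "prob": {0: 0.55, 1: 0.45}},
    {"actual": 0, "prob": {0: 0.90, 1: 0.10}},
]

print("Target cases found in top 75%:", lift_at(cases, target=1, fraction=0.75))
print("Overall accuracy:", accuracy(cases))
```

The first measure asks how well the model ranks cases for one particular value; the second asks how often the model's best guess is right, whatever the value.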
If the Synchronize Prediction Columns and Values check box is selected, the predictable column is synchronized for each mining model in the mining structure.
Note: The mining model columns that are listed in the Predictable Column Name list are restricted to columns that have the usage type set to Predict or Predict Only. The columns also must be based on mining structure columns that have a content type of Discrete or Discretized.
In some advanced scenarios, you may want to generate a lift chart that includes a predictable column in two mining models that are not based on the same mining structure column but that contain the same data. If you clear the Synchronize Prediction Columns and Values check box, you can select any valid predictable column and value. The results are plotted together, regardless of whether they make sense.
Viewing the Lift Chart
To view the lift chart, switch to the Lift Chart tab of the Mining Accuracy Chart. When you click the tab, a prediction query runs against the server and database for the mining structure and the input table. The predicted results are compared to the actual values that are known, and plotted on the graph. For more information about how to use the chart, see Lift Chart.