SystemGetCrossValidationResults (Analysis Services - Data Mining)
Partitions the mining structure into the specified number of cross-sections, trains a model for each partition, and then returns accuracy metrics for each partition.
Note
This stored procedure cannot be used to cross-validate clustering models, or models that are built by using the Microsoft Time Series algorithm or the Microsoft Sequence Clustering algorithm. To cross-validate clustering models, you can use the separate stored procedure, SystemGetClusterCrossValidationResults (Analysis Services - Data Mining).
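The partitioning described at the top of this topic can be pictured with a generic k-fold sketch. This is only an illustration: the function names `split_into_folds` and `train_test_splits` are hypothetical, the round-robin assignment is an assumption, and nothing here reflects how Analysis Services actually assigns cases to partitions internally.

```python
def split_into_folds(case_ids, fold_count):
    """Distribute cases round-robin so fold sizes differ by at most one."""
    folds = [[] for _ in range(fold_count)]
    for position, case_id in enumerate(case_ids):
        folds[position % fold_count].append(case_id)
    return folds


def train_test_splits(folds):
    """Yield (1-based partition index, training cases, held-out test cases)."""
    for index, test_cases in enumerate(folds, start=1):
        training_cases = [
            case_id
            for other_index, fold in enumerate(folds, start=1)
            if other_index != index
            for case_id in fold
        ]
        yield index, training_cases, test_cases


# Assumed data set of 1000 cases split into 2 folds (illustrative only).
folds = split_into_folds(list(range(1000)), 2)
fold_sizes = [len(fold) for fold in folds]
splits = list(train_test_splits(folds))
print(fold_sizes)
```

In this sketch each partition is held out once for testing while the remaining cases are used for training, which is the usual meaning of k-fold cross-validation.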
The rowset that is returned contains scores for each partition in each model.
The following table describes the columns in the rowset.
| Column Name | Description |
|---|---|
| ModelName | The name of the model that was tested. |
| AttributeName | The name of the predictable column. |
| AttributeState | A specified target value in the predictable column. If this value is null, the most probable prediction was used. If this column contains a value, the accuracy of the model is assessed against this value only. |
| PartitionIndex | A 1-based index that identifies the partition to which the results apply. |
| PartitionSize | An integer that indicates how many cases were included in each partition. |
| Test | Category of the test that was performed. For a description of the categories and the tests that are included in each category, see Measures in the Cross-Validation Report. |
| Measure | The name of the measure returned by the test. Measures for each model depend on the type of the predictable value. For a definition of each measure, see Cross-Validation (Analysis Services - Data Mining). For a list of measures returned for each predictable type, see Measures in the Cross-Validation Report. |
| Value | The value of the specified test measure. |
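Because the rowset has one row per model, partition, and measure, client code often regroups it before comparing models. The following is a minimal sketch of that regrouping, assuming the rows have already been fetched into Python tuples in the column order described above. The sample rows and the helper name `pivot_by_partition` are hypothetical and are not output from the stored procedure.

```python
from collections import defaultdict

# Hypothetical rows, not real output. Column order follows the table above:
# ModelName, AttributeName, AttributeState, PartitionIndex, PartitionSize,
# Test, Measure, Value
rows = [
    ("Model A", "Target", 1, 1, 100, "Classification", "True Positive", 40.0),
    ("Model A", "Target", 1, 1, 100, "Classification", "False Positive", 10.0),
    ("Model A", "Target", 1, 2, 100, "Classification", "True Positive", 35.0),
    ("Model B", "Target", 1, 1, 100, "Classification", "True Positive", 38.0),
]


def pivot_by_partition(rows):
    """Map (ModelName, PartitionIndex) to a {Measure: Value} dictionary."""
    pivoted = defaultdict(dict)
    for model, _attribute, _state, partition, _size, _test, measure, value in rows:
        pivoted[(model, partition)][measure] = value
    return dict(pivoted)


scores = pivot_by_partition(rows)
print(scores[("Model A", 1)])
```

Keying on the pair (ModelName, PartitionIndex) keeps the scores for each fold of each model together, so the same measure can be compared across folds or across models.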
To return accuracy metrics for the complete data set, use SystemGetAccuracyResults (Analysis Services - Data Mining).
If the mining model has already been partitioned into folds, you can bypass processing and return only the results of cross-validation by using SystemGetAccuracyResults (Analysis Services - Data Mining).
The following example demonstrates how to partition a mining structure for cross-validation into two folds, and then test two mining models that are associated with the mining structure, [v Target Mail].
Line three of the code lists the mining models that you want to test. If you do not specify the list, all non-clustering models associated with the structure are used. Line four of the code specifies the number of partitions. Because no value is specified for max cases, all cases in the mining structure are used and distributed evenly across the partitions.
Line five specifies the predictable attribute, Bike Buyer, and line six specifies the value to predict, 1 (meaning "yes, will buy").
The NULL value in line seven indicates that there is no minimum probability bar that must be met. Therefore, the first prediction that has a non-zero probability will be used in assessing accuracy.
    CALL SystemGetCrossValidationResults(
    [v Target Mail],
    [Target Mail DT], [Target Mail NB],
    2,
    'Bike Buyer',
    1,
    NULL
    )
Sample results:
| ModelName | AttributeName | AttributeState | PartitionIndex | PartitionSize | Test | Measure | Value |
|---|---|---|---|---|---|---|---|
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | True Positive | 144 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | False Positive | 105 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | True Negative | 186 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | False Negative | 65 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Likelihood | Log Score | -0.619042807138345 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Likelihood | Lift | 0.0740963734002671 |
| Target Mail DT | Bike Buyer | 1 | 1 | 500 | Likelihood | Root Mean Square Error | 0.346946279977653 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | True Positive | 162 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | False Positive | 86 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | True Negative | 165 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | False Negative | 87 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Likelihood | Log Score | -0.654117781086519 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Likelihood | Lift | 0.038997399132084 |
| Target Mail DT | Bike Buyer | 1 | 2 | 500 | Likelihood | Root Mean Square Error | 0.342721344892651 |
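As a worked check on the classification counts in the sample results, the sketch below derives accuracy, precision, and recall for each partition of [Target Mail DT]. These derived ratios are not returned by the stored procedure. They are standard confusion-matrix formulas applied here for illustration, and the helper name `summarize` is hypothetical.

```python
# Classification counts copied from the sample results for [Target Mail DT],
# keyed by PartitionIndex.
counts = {
    1: {"TP": 144, "FP": 105, "TN": 186, "FN": 65},
    2: {"TP": 162, "FP": 86, "TN": 165, "FN": 87},
}


def summarize(c):
    """Derive standard confusion-matrix ratios from one partition's counts."""
    total = c["TP"] + c["FP"] + c["TN"] + c["FN"]
    return {
        "cases": total,
        "accuracy": (c["TP"] + c["TN"]) / total,
        "precision": c["TP"] / (c["TP"] + c["FP"]),
        "recall": c["TP"] / (c["TP"] + c["FN"]),
    }


summary = {partition: summarize(c) for partition, c in counts.items()}
mean_accuracy = sum(s["accuracy"] for s in summary.values()) / len(summary)

for partition, s in summary.items():
    print(partition, s)
print("mean accuracy:", round(mean_accuracy, 3))
```

The four counts in each partition sum to 500, which matches the PartitionSize column. Accuracy works out to 0.66 for partition 1 and 0.654 for partition 2.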
Note