SystemGetCrossValidationResults (Analysis Services - Data Mining)

 

Applies To: SQL Server 2016

Partitions the mining structure into the specified number of cross-sections, trains a model for each partition, and then returns accuracy metrics for each partition.

Note


This stored procedure cannot be used to cross-validate clustering models, or models that are built by using the Microsoft Time Series algorithm or the Microsoft Sequence Clustering algorithm. To cross-validate clustering models, you can use the separate stored procedure, SystemGetClusterCrossValidationResults (Analysis Services - Data Mining).

  
SystemGetCrossValidationResults(  
<mining structure>  
[, <mining model list>]  
,<fold count>  
,<max cases>  
,<target attribute>  
[,<target state>]  
[,<target threshold>]  
[,<test list>])  

mining structure
Name of a mining structure in the current database.

(required)

mining model list
Comma-separated list of mining models to validate.

If a model name contains any characters that are not valid in the name of an identifier, the name must be enclosed in brackets.

If a list of mining models is not specified, cross-validation is performed against all models that are associated with the specified structure and that contain a predictable attribute.

Note


To cross-validate clustering models, you must use a separate stored procedure, SystemGetClusterCrossValidationResults (Analysis Services - Data Mining).

(optional)

fold count
Integer that specifies the number of partitions (folds) into which to separate the data set. The minimum value is 2. The maximum number of folds is the maximum integer value or the number of cases, whichever is lower.

Each partition contains roughly max cases / fold count cases.

There is no default value.

Note


The number of folds greatly affects the time that is required to perform cross-validation. If you select a number that is too high, the query might run for a very long time, and in some cases the server can become unresponsive or time out.

(required)
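The fold arithmetic described above can be sketched as follows. This is an illustrative Python sketch only, not the server's implementation; the function name is hypothetical, and Analysis Services performs the actual partitioning internally.

```python
def fold_sizes(case_count, fold_count):
    """Distribute case_count cases as evenly as possible across
    fold_count partitions (illustrative sketch only)."""
    if fold_count < 2:
        raise ValueError("fold count must be at least 2")
    # The number of folds cannot exceed the number of cases.
    fold_count = min(fold_count, case_count)
    base, remainder = divmod(case_count, fold_count)
    # The first `remainder` folds receive one extra case.
    return [base + (1 if i < remainder else 0) for i in range(fold_count)]
```

For example, 1000 cases split into 2 folds gives two partitions of 500 cases each, matching the sample results later in this topic.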

max cases
Integer that specifies the maximum number of cases that can be tested across all folds.

A value of 0 indicates that all the cases in the data source will be used.

If you specify a value that is greater than the actual number of cases in the data set, all cases in the data source will be used.

There is no default value.

(required)
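The max cases rules above (0 means all cases, and values larger than the data set are clamped) can be sketched like this. The function name is illustrative, not part of any Analysis Services API.

```python
def effective_case_count(total_cases, max_cases):
    """Number of cases actually tested across all folds
    (illustrative sketch of the max cases parameter rules)."""
    # 0 means "use all cases"; larger-than-actual values are clamped.
    if max_cases == 0 or max_cases > total_cases:
        return total_cases
    return max_cases
```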

target attribute
String that contains the name of the predictable attribute. A predictable attribute can be a column, nested table column, or nested table key column of a mining model.

Note


The existence of the target attribute is validated only at run time.

(required)

target state
Formula that specifies the value to predict. If a target value is specified, metrics are collected for the specified value only.

If a value is not specified or is null, the metrics are computed for the most probable state for each prediction.

The default is null.

An error is raised during validation if the specified value is not valid for the specified attribute, or if the formula is not the correct type for the specified attribute.

(optional)

target threshold
Double greater than 0 and less than 1. Indicates the minimum probability score that must be obtained for the prediction of the specified target state to be counted as correct.

A prediction that has a probability less than or equal to this value is considered incorrect.

If no value is specified or is null, the most probable state is used, regardless of its probability score.

The default is null.

Note


Analysis Services will not raise an error if you set target threshold to 0.0, but you should never use this value. In effect, a threshold of 0.0 means that predictions with a 0 percent probability are counted as correct.

(optional)
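One way to picture how target state and target threshold interact when a single case is scored is the sketch below. This is a hedged reading of the parameter descriptions above (a prediction of the target state counts only when its probability strictly exceeds the threshold); the function and return labels are illustrative, not the procedure's internals.

```python
def classify(actual, predicted, probability, target_state, threshold=None):
    """Bucket one case into a confusion-matrix cell for a specified
    target state (illustrative sketch of the scoring rules).

    A prediction of the target state is treated as a positive
    prediction only when its probability exceeds the threshold;
    probability <= threshold is considered incorrect, per the
    target threshold description above.
    """
    predicts_target = (
        predicted == target_state
        and (threshold is None or probability > threshold)
    )
    if predicts_target:
        return "TP" if actual == target_state else "FP"
    return "FN" if actual == target_state else "TN"
```

With threshold = None (the default), any most-probable prediction of the target state counts, regardless of its probability score.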

test list
A string that specifies testing options.

Note This parameter is reserved for future use.

(optional)

The rowset that is returned contains scores for each partition in each model.

The following table describes the columns in the rowset.

ModelName
The name of the model that was tested.

AttributeName
The name of the predictable column.

AttributeState
A specified target value in the predictable column. If this value is null, the most probable prediction was used.

If this column contains a value, the accuracy of the model is assessed against this value only.

PartitionIndex
A 1-based index that identifies the partition to which the results apply.

PartitionSize
An integer that indicates how many cases were included in each partition.

Test
The category of the test that was performed. For a description of the categories and the tests that are included in each category, see Measures in the Cross-Validation Report.

Measure
The name of the measure returned by the test. Measures for each model depend on the type of the predictable value. For a definition of each measure, see Cross-Validation (Analysis Services - Data Mining).

For a list of measures returned for each predictable type, see Measures in the Cross-Validation Report.

Value
The value of the specified test measure.
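Because the rowset is long and narrow (one row per measure), client code often regroups it per model and partition before reporting. A minimal sketch, using plain dictionaries to stand in for rowset rows (the helper name and row literals are hypothetical):

```python
from collections import defaultdict

def pivot_results(rows):
    """Group the long-format rowset (one row per measure) into one
    dict of measures per (ModelName, PartitionIndex) pair."""
    grouped = defaultdict(dict)
    for row in rows:
        key = (row["ModelName"], row["PartitionIndex"])
        grouped[key][row["Measure"]] = row["Value"]
    return dict(grouped)

# Hypothetical rows shaped like the rowset described above:
rows = [
    {"ModelName": "Target Mail DT", "PartitionIndex": 1,
     "Measure": "True Positive", "Value": 144},
    {"ModelName": "Target Mail DT", "PartitionIndex": 1,
     "Measure": "False Positive", "Value": 105},
]
```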

To return accuracy metrics for the complete data set, use SystemGetAccuracyResults (Analysis Services - Data Mining).

If the mining structure has already been partitioned into folds, you can bypass processing and return only the results of cross-validation by using SystemGetAccuracyResults (Analysis Services - Data Mining).

The following example demonstrates how to partition a mining structure for cross-validation into two folds, and then test two mining models that are associated with the mining structure, [v Target Mail].

Line three of the code lists the mining models that you want to test. If you do not specify the list, all non-clustering models associated with the structure are used. Line four of the code specifies the number of partitions. Because no value is specified for max cases, all cases in the mining structure are used and distributed evenly across the partitions.

Line five specifies the predictable attribute, Bike Buyer, and line six specifies the value to predict, 1 (meaning "yes, will buy").

The NULL value in line seven indicates that there is no minimum probability bar that must be met. Therefore, the first prediction that has a non-zero probability will be used in assessing accuracy.

CALL SystemGetCrossValidationResults(  
[v Target Mail],  
[Target Mail DT], [Target Mail NB],  
2,  
'Bike Buyer',  
1,  
NULL  
)  

Sample results:

ModelName | AttributeName | AttributeState | PartitionIndex | PartitionSize | Test | Measure | Value
--- | --- | --- | --- | --- | --- | --- | ---
Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | True Positive | 144
Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | False Positive | 105
Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | True Negative | 186
Target Mail DT | Bike Buyer | 1 | 1 | 500 | Classification | False Negative | 65
Target Mail DT | Bike Buyer | 1 | 1 | 500 | Likelihood | Log Score | -0.619042807138345
Target Mail DT | Bike Buyer | 1 | 1 | 500 | Likelihood | Lift | 0.0740963734002671
Target Mail DT | Bike Buyer | 1 | 1 | 500 | Likelihood | Root Mean Square Error | 0.346946279977653
Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | True Positive | 162
Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | False Positive | 86
Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | True Negative | 165
Target Mail DT | Bike Buyer | 1 | 2 | 500 | Classification | False Negative | 87
Target Mail DT | Bike Buyer | 1 | 2 | 500 | Likelihood | Log Score | -0.654117781086519
Target Mail DT | Bike Buyer | 1 | 2 | 500 | Likelihood | Lift | 0.038997399132084
Target Mail DT | Bike Buyer | 1 | 2 | 500 | Likelihood | Root Mean Square Error | 0.342721344892651
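The Classification counts in the sample results can be combined into familiar summary metrics. A sketch using the partition 1 counts for Target Mail DT (TP 144, FP 105, TN 186, FN 65, which sum to the partition size of 500); the function name is illustrative and these derived metrics are not themselves returned by the procedure:

```python
def summary_metrics(tp, fp, tn, fn):
    """Derive accuracy, precision, and recall from the
    confusion-matrix counts reported for one partition."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # fraction of all cases classified correctly
        "precision": tp / (tp + fp),      # correct positives among predicted positives
        "recall": tp / (tp + fn),         # correct positives among actual positives
    }

# Partition 1 of Target Mail DT from the sample results above:
m = summary_metrics(144, 105, 186, 65)
```

For this partition the model classifies 66 percent of cases correctly ((144 + 186) / 500).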

Cross-validation is available only in SQL Server Enterprise beginning with SQL Server 2008.

See Also
SystemGetAccuracyResults (Analysis Services - Data Mining)
SystemGetClusterCrossValidationResults (Analysis Services - Data Mining)
SystemGetClusterAccuracyResults (Analysis Services - Data Mining)
