Boosted Decision Tree Regression
Updated: June 9, 2016
Creates a regression model using the Boosted Decision Tree algorithm
You can use the Boosted Decision Tree Regression module to create an ensemble of regression trees using boosting. Boosting means that each tree is dependent on prior trees, and learns by fitting the residual of the trees that preceded it. Thus, boosting in a decision tree ensemble tends to improve accuracy with some small risk of less coverage.
This regression method is a supervised learning method, and therefore requires a labeled dataset. The label column must contain numerical values.
You can train the model by providing the model and the labeled dataset as an input to Train Model or Tune Model Hyperparameters. The trained model can then be used to predict values for new input examples.
Use this module only with datasets that use numerical variables.
Want to know more about the trees that were created? After the model has been trained, right-click the output of the Train Model module (or Tune Model Hyperparameters module) and select Visualize to see the tree that was created on each iteration. You can drill down into the splits for each tree and see the rules for each node.
Boosting is one of several classic methods for creating ensemble models, along with bagging, random forests, and so forth. In Azure Machine Learning Studio, boosted decision trees use an efficient implementation of the MART gradient boosting algorithm. Gradient boosting is a machine learning technique for regression problems. It builds each regression tree in a step-wise fashion, using a predefined loss function to measure the error in each step and correct for it in the next. Thus the prediction model is actually an ensemble of weaker prediction models.
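The residual-fitting loop described above can be sketched in a few lines of Python. This is a minimal illustration using one-feature decision stumps as the weak learners, not the MART implementation that Studio uses; the names fit_stump, boost, and predict are hypothetical helpers invented for the sketch.

```python
def fit_stump(x, residuals):
    """Find the threshold split on x that minimizes squared error of the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def predict(xi, base, stumps, learning_rate=0.3):
    """The ensemble's prediction is the base value plus the scaled sum of stump outputs."""
    return base + learning_rate * sum(s(xi) for s in stumps)

def boost(x, y, n_trees=50, learning_rate=0.3):
    """Each new stump is fit to the residual left by the ensemble built so far."""
    base = sum(y) / len(y)   # initial prediction: the mean of the labels
    stumps = []
    for _ in range(n_trees):
        pred = [predict(xi, base, stumps, learning_rate) for xi in x]
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stumps.append(fit_stump(x, residuals))
    return base, stumps

x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 5]
base, stumps = boost(x, y)
print(predict(4.0, base, stumps))   # close to 5
```

Each pass through the loop measures the error of the current ensemble and fits the next tree to correct it, which is the step-wise behavior the paragraph above describes.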
In regression problems, boosting builds a series of trees in a step-wise fashion, and then selects the optimal tree using an arbitrary differentiable loss function.
For additional information, see these articles:
The Wikipedia article on gradient boosting provides background on boosted trees.
Microsoft Research: From RankNet to LambdaRank to LambdaMART: An Overview, by C.J.C. Burges.
The gradient boosting method can also be used for classification problems by reducing them to regression with a suitable loss function. For more information about the boosted trees implementation for classification tasks, see Two-Class Boosted Decision Tree.
Add the Boosted Decision Tree Regression module to your experiment.
Specify how you want the model to be trained, by setting the Create trainer mode option.
If you know how you want to configure the model, you can provide a specific set of values as arguments. You might have learned these values by experimentation or received them as guidance.
If you are not sure of the best parameters, you can find the optimal parameters by specifying multiple values and using a parameter sweep to find the optimal configuration.
Tune Model Hyperparameters will iterate over all possible combinations of the settings you provided and determine the combination of settings that produces the optimal results.
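The exhaustive sweep over setting combinations can be pictured as a grid search. The sketch below shows only the mechanics; evaluate is a stand-in for training a model and measuring validation error, and the parameter names and values are invented for the example.

```python
from itertools import product

# Candidate values for each setting (illustrative values only).
grid = {
    "max_leaves": [8, 20, 32],
    "learning_rate": [0.1, 0.2, 0.4],
    "num_trees": [100, 200],
}

def evaluate(params):
    # Stand-in for train + validate; pretend 20 leaves / 0.2 / 200 trees is best.
    target = {"max_leaves": 20, "learning_rate": 0.2, "num_trees": 200}
    return sum((params[k] - target[k]) ** 2 for k in params)

# Try every combination and keep the one with the lowest validation error.
best = min(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=evaluate,
)
print(best)
```

With three, three, and two candidate values, the sweep evaluates 18 combinations; the cost grows multiplicatively with each parameter you vary.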
For Maximum number of leaves per tree, indicate the maximum number of terminal nodes (leaves) that can be created in any tree.
By increasing this value, you potentially increase the size of the tree and get better precision, at the risk of overfitting and longer training time.
For Minimum number of samples per leaf node, indicate the minimum number of cases required to create any terminal node (leaf) in a tree.
By increasing this value, you increase the threshold for creating new rules. For example, with the default value of 1, even a single case can cause a new rule to be created. If you increase the value to 5, the training data would have to contain at least 5 cases that meet the same conditions.
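The effect of the threshold can be shown with a single split search. In this sketch (an illustration, not Studio's implementation), a higher minimum blocks a split that would isolate one outlier in its own leaf; best_split is a hypothetical helper.

```python
def best_split(x, y, min_samples_leaf):
    """Return the threshold with the lowest squared error whose two leaves each
    contain at least min_samples_leaf cases, or None if no split qualifies."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue  # this rule would be supported by too few cases
        sse = (sum((yi - sum(left) / len(left)) ** 2 for yi in left)
               + sum((yi - sum(right) / len(right)) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, t)
    return None if best is None else best[1]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]   # a single outlier at x = 10

print(best_split(x, y, min_samples_leaf=1))  # 9: the outlier gets its own leaf
print(best_split(x, y, min_samples_leaf=5))  # 5: the outlier split is disallowed
```

With a minimum of 1, the search happily carves out a leaf for the lone outlier; raising the minimum to 5 forces the tree to generalize over it.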
For Learning rate, type a number between 0 and 1 that defines the step size while learning.
The learning rate determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge on the best solution.
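The step-size intuition is not specific to boosting; plain gradient descent on a one-dimensional quadratic shows both failure modes in miniature. This is an analogy for the trade-off described above, not the update rule the module uses.

```python
def descend(lr, steps=50, w=10.0):
    """Gradient descent on f(w) = w**2, whose minimum is at w = 0."""
    for _ in range(steps):
        w -= lr * 2 * w   # the gradient of w**2 is 2w
    return w

print(descend(0.1))   # small step: converges toward the minimum at 0
print(descend(1.1))   # step too large: overshoots and diverges
```

A learning rate small enough to converge but large enough to make progress is the balance the parameter asks you to strike.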
For Number of trees constructed, indicate the total number of decision trees to create in the ensemble. By creating more decision trees, you can potentially get better coverage, but training time will increase.
This value also controls the number of trees displayed when visualizing the trained model. If you want to see or print a single tree, you can set the value to 1; however, this means that only one tree will be produced (the tree with the initial set of parameters) and no further iterations will be performed.
For Random number seed, you can type a non-negative integer to use as the random seed value. Specifying a seed ensures reproducibility across runs that have the same data and parameters.
The random seed is set by default to 0, which means the initial seed value is obtained from the system clock.
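The reproducibility guarantee can be demonstrated with Python's standard random module, standing in for whatever generator the module uses internally.

```python
import random

def draws(seed):
    """Return the first three draws from a generator seeded with the given value."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

print(draws(42) == draws(42))   # True: the same seed reproduces the same run
print(draws(1) == draws(2))     # False: different seeds produce different runs
```

The same principle applies to training: with the seed, data, and parameters held fixed, repeated runs produce the same model.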
Select the Allow unknown categorical levels option to create a group for unknown values in the training and validation sets.
If you deselect this option, the model can accept only the values that are contained in the training data. If you select it, the model might be less precise for known values, but it can provide better predictions for new (unknown) values.
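One common way to realize this behavior is to reserve an extra bucket for values never seen during training. The sketch below shows that idea; Studio's internal encoding may differ, and build_encoding and encode are hypothetical helpers.

```python
def build_encoding(training_values):
    """Map each category seen in training to an integer, plus one extra
    'unknown' bucket for values that only appear at prediction time."""
    levels = {v: i for i, v in enumerate(sorted(set(training_values)))}
    levels["<unknown>"] = len(levels)
    return levels

def encode(value, levels):
    # Unseen values fall into the shared unknown bucket instead of failing.
    return levels.get(value, levels["<unknown>"])

levels = build_encoding(["red", "green", "blue"])
print(encode("red", levels))      # a level known at training time
print(encode("purple", levels))   # unseen at training time -> unknown bucket
```

Without the extra bucket, a value like "purple" would have no valid encoding, which is why deselecting the option restricts the model to training-time values.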
Run the experiment.
For examples of how boosted trees are used in machine learning, see these sample experiments in the Cortana Intelligence Gallery:
In general, decision trees yield better results when features are somewhat related. If features have a large degree of entropy (that is, they are not related), they share little or no mutual information, and ordering them in a tree will not yield a lot of predictive significance.
The ensemble of trees is produced by computing, at each step, a regression tree that approximates the gradient of the loss function, and adding it to the previous tree with coefficients that minimize the loss of the new tree.
The output of the ensemble produced by MART on a given instance is the sum of the tree outputs.
For binary classification problems, the output is converted to a probability by using some form of calibration.
For regression problems, the output is the predicted value of the function.
For ranking problems, the instances are ordered by the output value of the ensemble.
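The three cases above all start from the same raw ensemble score. The sketch below makes that concrete; the fixed logistic used for calibration is only an illustration (in practice the calibration is learned), and the document names are invented.

```python
import math

tree_outputs = [0.9, -0.2, 0.5]   # one output per tree in the ensemble
score = sum(tree_outputs)         # the ensemble's raw output for one instance

# Regression: the raw score is the predicted value.
regression_prediction = score

# Binary classification: calibrate the score into a probability in (0, 1).
probability = 1 / (1 + math.exp(-score))

# Ranking: instances are ordered by their scores.
scores = {"doc_a": 1.1, "doc_b": score, "doc_c": -0.4}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Only the final interpretation of the score changes between the three task types; the ensemble computation itself is the same sum of tree outputs.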
If you pass a parameter range to Train Model, it will use only the first value in the parameter range list.
If you pass a single set of parameter values to the Tune Model Hyperparameters module, when it expects a range of settings for each parameter, it ignores the values and uses the default values for the learner.
If you select the Parameter Range option and enter a single value for any parameter, that single value you specified will be used throughout the sweep, even if other parameters change across a range of values.
Maximum number of leaves per tree
Specify the maximum number of leaves per tree
Minimum number of samples per leaf node
Specify the minimum number of cases required to form a leaf node
Learning rate
Specify the initial learning rate
Total number of trees constructed
Specify the maximum number of trees that can be created during training
Random number seed
Provide a seed for the random number generator used by the model. Leave blank for default.
Allow unknown categorical levels
If true, create an additional level for each categorical column. Levels in the test dataset that are not available in the training dataset are mapped to this additional level.