Points of Significance: Classification and regression trees

Journal name: Nature Methods
Volume: 14
Pages: 757–758
Year published: 2017
DOI: 10.1038/nmeth.4370

Decision trees are a simple but powerful prediction method.

Figures

Figure 1: A classification decision tree is built by partitioning the predictor variable to reduce class mixing at each split.

(a) An n = 60 sample with one predictor variable (X), in which each point belongs to one of three classes (green, dark gray, blue). Three possible splits are shown, at X = 20 (X < 20 and X > 20), X = 38 and X = 46, along with the number of points in the resulting subsets (n1, n2), their breakdown by class (colored numbers), the purity of each subset (Ig(S1), Ig(S2)) and the information gain for the split (IGg), based on the Gini index. The sample's Gini index is Ig = 0.67. (b) The information gain based on the Gini index (IGg), entropy (IGe) and misclassification error (IGc) for all possible first splits. The maxima of IGg, IGe and IGc are at X = 38, 34 and 29–34, respectively. (c) The decision tree classifier of the sample in a, based on IGg. Large text in each node is the number of points, colored by predicted class. Smaller text indicates class membership in each subset. N, no; Y, yes.
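
As a concrete illustration of the split-selection rule in this legend, the short Python sketch below (hypothetical data and helper names, not the article's code) computes the Gini index of a labeled subset and chooses the first split of a single predictor X by maximizing the Gini-based information gain, IGg.

import numpy as np

def gini(labels):
    # Gini index Ig(S) = 1 - sum_k p_k^2, where p_k are the class proportions.
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(x, y, threshold):
    # IGg for splitting into x < threshold and x >= threshold: parent impurity
    # minus the size-weighted impurity of the two resulting subsets.
    left, right = y[x < threshold], y[x >= threshold]
    n = len(y)
    return gini(y) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# Hypothetical n = 60 sample with three classes, standing in for Figure 1a.
rng = np.random.default_rng(0)
x = rng.uniform(0, 60, 60)
y = np.digitize(x + rng.normal(0, 5, 60), [20, 40])   # classes 0, 1, 2

# Evaluate every midpoint between adjacent X values and keep the best split.
candidates = (np.sort(x)[:-1] + np.sort(x)[1:]) / 2
best = max(candidates, key=lambda t: information_gain(x, y, t))
print(f"best first split at X = {best:.1f}, IGg = {information_gain(x, y, best):.2f}")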

Figure 2: Regression trees predict a continuous variable using steps in which the prediction is constant.

(a) A nonlinear function (black) with its prediction (gray) based on a regression tree. (b) Splits in the regression tree minimize the mean square error (MSE), shown here for all possible positions of the first split. (c) The full regression tree for the prediction shown in a. For each split, the absolute MSE and the relative (to the first node) rMSE are shown, along with the difference in successive rMSE values, a. The tree was built with a cutoff of a = 0.01, which terminates its growth at the dashed line.
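
The regression-tree steps in this legend can be sketched in the same style. The Python below (hypothetical data and function names, not the article's implementation) finds the single split of X that minimizes the size-weighted MSE and then applies the relative-improvement cutoff a to decide whether to keep growing the tree.

import numpy as np

def mse(y):
    # Mean square error of predicting the subset mean for every point in y.
    return float(np.mean((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(x, y):
    # Return the threshold minimizing the size-weighted MSE of the two subsets.
    xs = np.sort(np.unique(x))
    best_t, best_err = None, np.inf
    for t in (xs[:-1] + xs[1:]) / 2:
        left, right = y[x < t], y[x >= t]
        err = (len(left) * mse(left) + len(right) * mse(right)) / len(y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Hypothetical nonlinear function standing in for the black curve in Figure 2a.
x = np.linspace(0, 10, 200)
y = np.sin(x) + 0.1 * x

root_mse = mse(y)                          # MSE of the first node
t, split_mse = best_split(x, y)
a = (root_mse - split_mse) / root_mse      # drop in rMSE due to this split
print(f"first split at X = {t:.2f}, a = {a:.3f}")
print("keep splitting" if a >= 0.01 else "stop growing: a < 0.01")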

Figure 3: Decision trees can be applied to many predictor variables.

(a) Classification boundaries for an n = 100 sample with two predictor variables (X, Y) and four categories (colors). (b) Decision tree built with a = 0.01 for the data set in a.
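
With more than one predictor, the same recipe applies: at each node, every candidate split of every variable is tried and the best is kept. The sketch below uses scikit-learn's DecisionTreeClassifier as a stand-in for the CART procedure; the data are hypothetical, and mapping the article's cutoff a to the library's ccp_alpha pruning parameter is an assumption, not an exact equivalence.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical n = 100 sample with two predictors (X, Y) and four classes,
# loosely in the spirit of Figure 3a.
rng = np.random.default_rng(1)
data = rng.uniform(0, 1, size=(100, 2))
classes = 2 * (data[:, 0] > 0.5) + (data[:, 1] > 0.4)

# ccp_alpha prunes splits whose cost-complexity improvement is too small;
# the value 0.01 here only mimics the article's a = 0.01 cutoff (an assumption).
tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01).fit(data, classes)
print(export_text(tree, feature_names=["X", "Y"]))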

Change history

Corrected online 28 July 2017
In the version of this article initially published, the expression (g1, g2) used to describe a sample subset in the Figure 1 legend was incorrect. The correct expression is (Ig(S1), Ig(S2)). The error has been corrected in the HTML and PDF versions of the article.


Author information

Affiliations

  1. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

  2. Naomi Altman is a Professor of Statistics at the Pennsylvania State University.
