Points of Significance: Regularization

Nature Methods 13, 803–804 (2016)
doi:10.1038/nmeth.4014

Constraining the magnitude of parameters of a model can control its complexity

Figures

Figure 1: Regularization controls model complexity by imposing a limit on the magnitude of its parameters.

(a) Complexity of polynomial models of orders 1 to 5 fit to the data points in b, as measured by the sum of squared parameters. The highest-order polynomial is the most complex (blue bar) and drastically overfits, passing through the data exactly (blue trace in b). (b) The effect of λ on ridge regression regularization of a fifth-order polynomial model fit to the six data points. Higher values of λ decrease the magnitude of the parameters, lower model complexity and reduce overfitting. When λ = 0, the model is not regularized (blue trace). A linear fit is shown with a dashed line. (c) The effect of λ on the classification decision boundary of logistic regression applied to two classes (black and white). The regression fits a third-order polynomial to variables X and Y.
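
To make the effect of λ concrete: ridge regression minimizes the sum of the SSE and λ times the sum of squared parameters. The following minimal sketch, assuming scikit-learn is available, fits a ridge-regularized fifth-order polynomial at several values of λ (scikit-learn's alpha). The six data points are synthetic stand-ins, not the figure's actual data.

    # Sketch of Figure 1b: ridge-regularized fifth-order polynomial fit.
    # Synthetic stand-in data; scikit-learn's alpha plays the role of λ.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 6).reshape(-1, 1)                    # six sample points
    y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.2, 6)  # noisy signal

    for lam in [0, 0.01, 1]:
        model = make_pipeline(PolynomialFeatures(degree=5, include_bias=False),
                              Ridge(alpha=lam))
        model.fit(x, y)
        b = model.named_steps["ridge"].coef_
        # Sum of squared parameters: the complexity measure from Figure 1a.
        print(f"lambda = {lam}: sum of squared parameters = {b @ b:.2f}")

With λ = 0 the degree-5 fit passes through all six points and its parameters are large; increasing λ shrinks the printed sum of squared parameters and smooths the fit, as in b.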

Figure 2: Ridge regression (RR) can resolve cases where multiple models yield the same quality fit to the data.

(a) The SSE of a multiple linear regression fit of the model y = β1x1 + β2x2 for various values of β1 and β2. For each parameter pair, 1,000 uniformly distributed and perfectly correlated samples x1 = x2 were used with an underlying model of y = 6x1 + 2x2. The minimum SSE is achieved by all models that fall on the black line β1 + β2 = 8, shown in all panels. (b) The value of the RR regularizer function, β1² + β2², which favors smaller magnitudes for each parameter. (c) A unique model parameter solution (black point) found by minimizing the sum of the SSE and the regularizer function shown in b. As λ is increased, the solution moves along the dotted line toward the origin. Color ramp is logarithmic.
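
This scenario takes only a few lines to reproduce. The sketch below, assuming scikit-learn, regenerates data under the caption's setup (the random seed and λ values are illustrative): with x2 = x1, every model on the line β1 + β2 = 8 fits equally well, and the ridge penalty singles out the smallest-magnitude member, β1 = β2 ≈ 4.

    # Sketch of Figure 2: perfectly correlated predictors (x2 = x1) make all
    # models with β1 + β2 = 8 equally good; the ridge penalty picks the
    # unique smallest-magnitude solution, β1 = β2 = 4.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    x1 = rng.uniform(0, 1, 1000)
    X = np.column_stack([x1, x1])   # x2 = x1: perfectly correlated predictors
    y = 6 * x1 + 2 * x1             # underlying model y = 6x1 + 2x2

    for lam in [0.1, 1, 10, 100]:
        b = Ridge(alpha=lam).fit(X, y).coef_
        print(f"lambda = {lam}: b1 = {b[0]:.2f}, b2 = {b[1]:.2f}")

Both coefficients stay equal and shrink toward the origin as λ grows, tracing the dotted line in c.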

Figure 3: LASSO and elastic net (EN) can remove variables from a model by forcing their parameters to zero.

(a) The number of parameters forced to zero by each method as a function of λ. For each λ, 800 uniformly distributed samples were generated from a 100-variable underlying model with normally distributed parameters. For EN, an equal balance of the LASSO and RR penalties was used. (b) The same scenario as in Figure 2a but with independent and uniformly distributed x1 and x2. Constraint spaces and best solutions are shown for RR (T = 9, solid circle, black point) and LASSO (T = 3, dashed square, white point). The lighter circle and square correspond to RR with T = 25 and LASSO with T = 5. As regularization is relaxed and T is increased, the solutions for each method follow the dotted lines, and both approach the parameter estimates obtained without regularization, β1 = 6 and β2 = 2. (c) RR and LASSO constraint spaces shown as in b for the correlated data scenario in Figure 2a. RR yields a unique solution, while LASSO has multiple solutions that have the same minimum SSE. EN (not shown) also yields a unique solution, and its boundary is a square with slightly bulging edges.
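
The counts in a can be approximated with the sketch below, assuming scikit-learn: alpha stands in for λ, l1_ratio=0.5 encodes the equal LASSO/RR balance, and the data are regenerated, so exact counts will differ from the figure.

    # Sketch of Figure 3a: how many of 100 parameters each method forces to
    # zero as λ grows. Synthetic data mimicking the caption's setup:
    # 800 uniform samples, normally distributed true parameters.
    import numpy as np
    from sklearn.linear_model import ElasticNet, Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(800, 100))
    beta = rng.normal(0, 1, size=100)            # true parameters
    y = X @ beta + rng.normal(0, 0.1, 800)       # small observation noise

    for lam in [0.001, 0.01, 0.1]:
        lasso = Lasso(alpha=lam, max_iter=10_000).fit(X, y)
        en = ElasticNet(alpha=lam, l1_ratio=0.5, max_iter=10_000).fit(X, y)
        rr = Ridge(alpha=lam).fit(X, y)
        print(f"lambda = {lam}: zeroed parameters -- "
              f"LASSO {np.sum(lasso.coef_ == 0)}, "
              f"EN {np.sum(en.coef_ == 0)}, "
              f"RR {np.sum(rr.coef_ == 0)}")

RR only shrinks parameters and never zeroes them exactly, so its count stays at 0; LASSO and EN produce exact zeros because the corners of the L1 constraint region sit on the axes.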

Author information

  1. Jake Lever is a PhD candidate at Canada's Michael Smith Genome Sciences Centre.

  2. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

  3. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

Competing financial interests

The authors declare no competing financial interests.
