Points of Significance: Regression diagnostics

Journal name: Nature Methods
Volume: 13
Pages: 385–386
DOI: 10.1038/nmeth.3854

Residual plots can be used to validate assumptions about the regression model.
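As a rough illustration of what a residual check involves, the sketch below simulates the three weight-versus-height scenarios described in the Figure 1 caption, fits a least-squares line to each and summarizes the residuals numerically. The height range (uniform on 160 to 170), the seed and the helper names `fit_line`, `residuals`, `simulate` and `correlation` are assumptions made for this example and are not taken from the article. The two summary numbers are simple stand-ins for what a residual plot shows visually.

```python
import random
import statistics


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def simulate(scenario, n=40, seed=1):
    """Draw (heights, weights) for one of the three scenarios."""
    rng = random.Random(seed)
    # Assumption: heights are uniform on [160, 170]; the caption gives no range.
    heights = [rng.uniform(160, 170) for _ in range(n)]
    weights = []
    for h in heights:
        mean_w = -45 + 2 * h / 3
        sd = 1.0
        if scenario == "quadratic":
            mean_w -= (h - 165) ** 2 / 15
        elif scenario == "heteroscedastic":
            sd = (h - 160) / 5  # noise SD grows with height
        weights.append(mean_w + rng.gauss(0, sd))
    return heights, weights


def correlation(a, b):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


for scenario in ("linear", "quadratic", "heteroscedastic"):
    H, W = simulate(scenario)
    r = residuals(H, W)
    mean_h = statistics.fmean(H)
    # Residuals that track (H - mean)^2 hint at a missing nonlinear term.
    curvature = correlation(r, [(h - mean_h) ** 2 for h in H])
    # |residuals| that track H hint at nonconstant variance.
    spread = correlation([abs(ri) for ri in r], H)
    print(f"{scenario:16s} mean={statistics.fmean(r):+.3f} "
          f"curvature={curvature:+.2f} spread={spread:+.2f}")
```

The residual mean is zero by construction for any least-squares fit with an intercept, so it cannot reveal a violated assumption on its own. Trends in the residuals or in their spread are the informative part.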

Figures

  1. Figure 1: Residual plots are helpful in assessments of nonlinear trends and heteroscedasticity.

    (a) Fit and residual plot for linear regression of n = 40 observations of weight (W) versus height (H) for three scenarios: the linear model W = −45 + 2H/3 + ε, where ε ~ N(0, 1) (left); the quadratic model W = −45 + 2H/3 − (H − 165)²/15 + ε, where ε ~ N(0, 1) (middle); and the linear model with heteroscedastic noise (nonconstant variance), ε ~ N(0, ((H − 160)/5)²) (right). Shown are the fit line (black line), model (blue line), sample means (dotted lines), 95% confidence interval (dark gray area) and 95% prediction interval (light gray area). (b) Residual plots for the fits, including box plots of residuals and smoothed nonparametric fits (solid lines). When assumptions are met, residuals should have zero mean, constant spread and no global trends (left). Global trends with zero mean can indicate nonlinear terms (middle). For the heteroscedastic noise scenario (right), the absolute value of the residuals is shown, with a mean of 0.68.

  2. Figure 2: Q–Q (normal probability) plots compare two distributions by showing how their quantiles differ.

    (a) Probability plots and box plots for n = 40 noise samples drawn from three noise distributions. All three distributions have a mean of 0 and a variance of 1. (b) Regression fits of n = 40 observations for the model W = −45 + 2H/3 + ε, where the samples from (a) are used for the noise. Variables and plot elements are defined as in Figure 1. (c) Q–Q plots and box plots for the residuals of the fits shown in (b).
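The coordinates behind a normal Q–Q plot can be sketched in a few lines: sort the sample and pair each value with the standard normal quantile at the matching plotting position. The example below is a minimal sketch under stated assumptions. The helper name `qq_points`, the plotting positions (i − 0.5)/n and the skewed example distribution (a shifted exponential with mean 0 and variance 1) are choices made here for illustration; the caption does not name the three noise distributions used in Figure 2.

```python
import random
from statistics import NormalDist


def qq_points(sample):
    """Pair sorted sample values with standard normal quantiles."""
    n = len(sample)
    std_normal = NormalDist(0.0, 1.0)
    # Assumption: plotting positions (i - 0.5) / n, one common convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))


rng = random.Random(2)
n = 40
normal_noise = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical skewed example: exponential(1) shifted to mean 0, variance 1.
skewed_noise = [rng.expovariate(1.0) - 1.0 for _ in range(n)]

for name, noise in (("normal", normal_noise), ("skewed", skewed_noise)):
    points = qq_points(noise)
    # Points near the line y = x suggest the sample is close to N(0, 1).
    worst = max(abs(obs - theo) for theo, obs in points)
    print(f"{name:7s} largest departure from y = x: {worst:.2f}")
```

In practice the same function would be applied to the residuals of a fit, after scaling them to unit variance, so that departures from the line y = x can be read as departures from normality.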

References

  1. Altman, N. & Krzywinski, M. Nat. Methods 12, 999–1000 (2015).
  2. Krzywinski, M. & Altman, N. Nat. Methods 12, 1103–1104 (2015).
  3. Altman, N. & Krzywinski, M. Nat. Methods 13, 281–282 (2016).
  4. Eicker, F. in Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability Vol. 1 (eds. Le Cam, L.M., Neyman, J. & Scott, E.M.) 59–82 (Univ. of California Press, 1967).
  5. Lumley, T. et al. Annu. Rev. Public Health 23, 151–169 (2002).

Author information

Affiliations

  1. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

  2. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

Competing financial interests

The authors declare no competing financial interests.
