Points of Significance: Multiple linear regression

Nature Methods 12, 1103–1104 (2015)
doi:10.1038/nmeth.3665

When multiple variables are associated with a response, the interpretation of a prediction equation is seldom simple.

Figures

  1. Figure 1: The results of multiple linear regression depend on the correlation of the predictors, as measured here by the Pearson correlation coefficient r (ref. 2).

    (a) Simulated values of uncorrelated predictors, r(H,J) = 0. The thick gray line is the regression line, and thin gray lines show the 95% confidence interval of the fit. (b) Regression of weight (W) on height (H) and of weight on jump height (J) for uncorrelated predictors shown in a. Regression slopes are shown (bH = 0.71, bJ = −0.088). (c) Simulated values of correlated predictors, r(H,J) = 0.9. Regression and 95% confidence interval are denoted as in a. (d) Regression (red lines) using correlated predictors shown in c. Light red lines denote the 95% confidence interval. Notice that bJ = 0.097 is now positive. The regression line from b is shown in blue. In all graphs, horizontal and vertical dotted lines show average values.

  2. Figure 2: Results and interpretation of multiple regression change with the sample correlation of the predictors.

    Shown are the values of regression coefficient estimates (bH, bJ, b0) and R2 and the significance of the test used to determine whether the coefficient is zero from 250 simulations at each value of predictor sample correlation −1 < r(H,J) < 1 for each scenario where either H or J or both H and J predictors are fitted in the regression. Thick and thin black curves show the coefficient estimate median and the boundaries of the 10th–90th percentile range, respectively. Histograms show the fraction of estimated P values in different significance ranges, and correlation intervals are highlighted in red where >20% of the P values are >0.01. Actual regression coefficients (βH, βJ, β0) are marked on vertical axes. The decrease in significance for bJ when jump height is the only predictor and r(H,J) is moderate (red arrow) is due to insufficient statistical power (bJ is close to zero). When predictors are uncorrelated, r(H,J) = 0, R2 of individual regressions sum to R2 of multiple regression (0.66 + 0.19 = 0.85). Panels are organized to correspond to Table 1, which shows estimates of a single trial at two different predictor correlations.

  3. Supplementary Fig. 1: Regression coefficients and R2

    The significance and value of regression coefficients and R2 for a model with both regression coefficients positive, W = 0.7H + 0.08J − 46.5 + ε. The format of the figure is the same as that of Figure 2.
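The sign flip described in Figure 1d (the simple regression slope bJ turning positive once the predictors are correlated, even though the true coefficient βJ is negative) can be reproduced with a short simulation. This is an illustrative sketch, not the authors' code: predictors are standardized and the intercept is dropped for simplicity, the coefficients 0.7 and −0.08 follow the βH and βJ used in the figures, and the noise level 0.5 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
b_H, b_J = 0.7, -0.08                  # true coefficients (betaH, betaJ)

def simulate(r):
    """Simulate weight W from predictors H, J with corr(H, J) = r."""
    H = rng.standard_normal(n)
    Z = rng.standard_normal(n)
    J = r * H + np.sqrt(1 - r**2) * Z  # J correlated with H by construction
    W = b_H * H + b_J * J + 0.5 * rng.standard_normal(n)
    return H, J, W

def slope(x, y):
    """Least-squares slope of y regressed on the single predictor x."""
    return np.polyfit(x, y, 1)[0]

def multi_fit(H, J, W):
    """Coefficients (bH, bJ, b0) of the two-predictor regression."""
    X = np.column_stack([H, J, np.ones(n)])
    return np.linalg.lstsq(X, W, rcond=None)[0]

H0, J0, W0 = simulate(0.0)             # uncorrelated predictors (Fig. 1a,b)
H9, J9, W9 = simulate(0.9)             # correlated predictors (Fig. 1c,d)

# Simple regression of W on J alone: the slope flips sign at r = 0.9
# because J acts as a proxy for H (its expected value is b_H*r + b_J).
print(slope(J0, W0))                   # negative, near b_J = -0.08
print(slope(J9, W9))                   # positive, near 0.7*0.9 - 0.08 = 0.55

# The multiple regression recovers bJ near -0.08 in both cases.
print(multi_fit(H0, J0, W0)[1], multi_fit(H9, J9, W9)[1])
```

For standardized predictors the expected simple slope of W on J is bH·r + bJ, which is why a moderate correlation with H is enough to change its sign while the multiple-regression coefficient stays near βJ.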

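Figure 2's observation that, for uncorrelated predictors, the R2 values of the individual regressions sum to the R2 of the multiple regression can also be checked numerically. A minimal sketch (hypothetical code, not from the article): the predictors are explicitly orthogonalized so the sample correlation r(H,J) is exactly zero, which makes the additivity exact up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

def r_squared(y, *predictors):
    """R^2 of the least-squares fit of y on the given predictors (with intercept)."""
    X = np.column_stack(predictors + (np.ones(len(y)),))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

H = rng.standard_normal(n)
J = rng.standard_normal(n)
H -= H.mean()
J -= J.mean()
J -= (H @ J) / (H @ H) * H    # project out H: sample correlation r(H, J) = 0
W = 0.7 * H - 0.08 * J + 0.5 * rng.standard_normal(n)

r2_H = r_squared(W, H)        # R^2 of the simple regression on H
r2_J = r_squared(W, J)        # R^2 of the simple regression on J
r2_HJ = r_squared(W, H, J)    # R^2 of the multiple regression
print(r2_H + r2_J, r2_HJ)     # equal when r(H, J) = 0
```

The specific values 0.66 + 0.19 = 0.85 in Figure 2 come from the article's own simulation; this sketch reproduces only the additivity property itself.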
References

  1. Altman, N. & Krzywinski, M. Nat. Methods 12, 999–1000 (2015).
  2. Altman, N. & Krzywinski, M. Nat. Methods 12, 899–900 (2015).

Author information

Affiliations

  1. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

  2. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Regression coefficients and R2 (226 KB)

    The significance and value of regression coefficients and R2 for a model with both regression coefficients positive, W = 0.7H + 0.08J − 46.5 + ε. The format of the figure is the same as that of Figure 2.
