Points of Significance: Visualizing samples with box plots

Journal name: Nature Methods
Volume: 11
Pages: 119–120
DOI: doi:10.1038/nmeth.2813

Use box plots to illustrate the spread and differences of samples.

Figures

    Figure 1: The construction of a box plot.

    (a) The median (m = −0.19, solid vertical line) and interquartile range (IQR = 1.38, gray shading) are ideal for characterizing asymmetric or irregularly shaped distributions. A skewed normal distribution is shown with mean μ = 0 (dark dotted line) and s.d. σ = 1 (light dotted lines). (b) Box plots for an n = 20 sample from a. The box bounds the IQR divided by the median, and Tukey-style whiskers extend to a maximum of 1.5 × IQR beyond the box. The box width may be scaled by √n, and a notch may be added approximating a 95% confidence interval (CI) for the median. Open circles are sample data points. Dotted lines indicate the lengths or widths of annotated features.

    Figure 2: Box plots reflect sample variability and should be avoided for very small samples (n < 5), with notches shown only when they appear within the IQR.

    Tukey-style box plots for five samples with sample size n = 5, 10, 20 and 50 drawn from the distribution in Figure 1a are shown; notch width is as in Figure 1b. Vertical dotted lines show Q1 (−0.78), median (−0.19), Q3 (0.60) and Q3 + 1.5 × IQR (2.67) values for the distribution.

    Figure 3: Quartiles are more intuitive than the mean and s.d. for samples from skewed distributions.

    Four distributions with the same mean (μ = 0, dark dotted line) and s.d. (σ = 1, light dotted lines) but significantly different medians (m) and IQRs are shown with corresponding Tukey-style box plots for n = 10,000 samples.

    Figure 4: Box plots are a more communicative way to show sample data.

    Data are shown for three n = 20 samples from normal distributions with s.d. σ = 1 and mean μ = 1 (A,B) or 3 (C). (a) Showing sample mean and s.e.m. using bar plots is not recommended. Note how the change of baseline or cutting the y axis affects the comparative heights of the bars. (b) When sample size is sufficiently large (n > 3), scatter plots with s.e.m. or 95% confidence interval (CI) error bars are suitable for comparing central tendency. (c) Box plots may be combined with sample mean and 95% CI error bars to communicate more information about samples in roughly the same amount of space.
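
The quantities named in the captions above (median, IQR, Tukey-style whiskers, notch, and the sample mean with s.e.m.) can be computed directly from a sample. The following is a minimal sketch, not the authors' code. The function name `tukey_box_stats`, the "inclusive" quartile convention, and the notch constant 1.58 (the value commonly used by plotting software for an approximate 95% CI of the median) are all assumptions made for illustration; other tools may use slightly different conventions and give slightly different quartiles.

```python
import math
import statistics


def tukey_box_stats(sample):
    """Summarize a sample with the quantities drawn in a Tukey-style box plot.

    Hypothetical helper for illustration; requires at least two data points.
    """
    data = sorted(sample)
    n = len(data)

    # Quartile conventions differ between tools; "inclusive" is an assumption.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences sit 1.5 x IQR beyond the edges of the box.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers stop at the most extreme data points inside the fences.
    whisker_low = min(x for x in data if x >= lower_fence)
    whisker_high = max(x for x in data if x <= upper_fence)
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    # Notch half-width approximating a 95% CI for the median.
    # The constant 1.58 is an assumed convention, not taken from the captions.
    notch_half_width = 1.58 * iqr / math.sqrt(n)

    # Sample mean and s.e.m., as overlaid on the box plots in Figure 4c.
    mean = statistics.fmean(data)
    sem = statistics.stdev(data) / math.sqrt(n)

    return {
        "n": n,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": whisker_low,
        "whisker_high": whisker_high,
        "outliers": outliers,
        "notch": (median - notch_half_width, median + notch_half_width),
        "mean": mean,
        "sem": sem,
    }


# Made-up sample with one extreme value, purely for demonstration.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
for name, value in tukey_box_stats(example).items():
    print(name, value)
```

The same arithmetic reproduces the upper whisker limit quoted in the Figure 2 caption: with Q1 = −0.78 and Q3 = 0.60, the IQR is 1.38 and Q3 + 1.5 × IQR = 0.60 + 2.07 = 2.67.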

References

  1. McGill, R., Tukey, J.W. & Larsen, W.A. Am. Stat. 32, 12–16 (1978).
  2. Krzywinski, M. & Altman, N. Nat. Methods 10, 921–922 (2013).
  3. Streit, M. & Gehlenborg, N. Nat. Methods 11, 117 (2014).
  4. Spitzer, M. et al. Nat. Methods 11, 121–122 (2014).

Author information

Affiliations

  1. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

  2. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

Competing financial interests

The authors declare no competing financial interests.
