Points of significance: Sources of variation

Journal name:
Nature Methods
Volume:
12,
Pages:
5–6
Year published:
DOI:
doi:10.1038/nmeth.3224
Published online

Abstract

To generalize conclusions to a population, we must sample its variation.

Figures

  1. Figure 1: Internal and external validity relate respectively to how precise and representative the results are of the population of interest.

    (a) Sampling only a part of the population may create precise measurements, but generalizing to the rest of the population can result in bias. (b) Better representation can be achieved by sampling across the population, but this can result in highly variable measurements. (c) Identifying blocks of similar subjects within the population increases the precision (within block) and captures population variability (between blocks).

  2. Figure 2: In the presence of variability, the precision of the sample mean can be improved by increasing the sample size, or the number of replicates in a nested design.

    (a) Increasing the sample size, n, improves the precision of the mean: the s.e.m. shrinks as 1/√n. The 95% confidence interval (CI) is a more intuitive measure of precision: the range of values that are not significantly different at α = 0.05 from the observed mean. The 95% CI shrinks as t*/√n, where t* is the critical value of the Student's t-distribution at two-tailed α = 0.05 and n – 1 degrees of freedom. t* decreases from 4.3 (n = 3) to 2.0 (n = 50). Dotted lines represent constant multiples of the s.e.m. (b) For a nested design with mouse, cell and technical variances of M = 1, C = 4, ε = 0.25 (σ²TOT = 5.25), the variance of the mean decreases with the number of replicates at each layer.

  3. Supplementary Fig. 1: A two-factor nested design scenario with different numbers of replicates and different variances at each layer.

    For a given layer (e.g. mouse), also shown is the variance of the mean of observations for all deeper layers (e.g. cell + technical). Xijk corresponds to the kth measurement of cell j from mouse i. Dots in the subscript correspond to averages over those subscripts (e.g. Xi.. is the average of cell and technical replicates for mouse i).
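
The captions for Figure 2 and Supplementary Fig. 1 can be illustrated with a minimal sketch. It assumes a balanced nested design, for which the variance of the grand mean is the standard sum of each layer's variance divided by the cumulative number of replicates down to that layer. The function names `variance_of_mean` and `ci_half_width_factor` are hypothetical and do not come from the article. The variance components and t* values are those quoted in the Figure 2 caption.

```python
from math import sqrt


def variance_of_mean(variances, replicates):
    """Variance of the grand mean in a balanced nested design (assumed).

    variances  -- variance components, outermost layer first
                  (e.g. mouse, cell, technical)
    replicates -- number of replicates per layer, in the same order
    """
    total = 0.0
    n_cumulative = 1
    for var, n in zip(variances, replicates):
        # Each layer's variance is divided by the number of units
        # sampled at that layer across the whole experiment.
        n_cumulative *= n
        total += var / n_cumulative
    return total


def ci_half_width_factor(t_star, n):
    """Multiple of the sample s.d. giving the 95% CI half-width: t*/sqrt(n)."""
    return t_star / sqrt(n)


# Variance components from Figure 2b: mouse = 1, cell = 4, technical = 0.25.
components = (1.0, 4.0, 0.25)

single = variance_of_mean(components, (1, 1, 1))     # 5.25, the total variance
more_tech = variance_of_mean(components, (1, 1, 4))  # 5.0625
more_mice = variance_of_mean(components, (4, 1, 1))  # 1.3125

# t* values quoted in Figure 2a for n = 3 and n = 50.
factor_n3 = ci_half_width_factor(4.3, 3)
factor_n50 = ci_half_width_factor(2.0, 50)
```

With these values, four technical replicates barely reduce the variance of the mean, because the technical component is the smallest, whereas four mice reduce every component at once.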

References

  1. Krzywinski, M. & Altman, N. Nat. Methods 11, 597–598 (2014).
  2. Blainey, P., Krzywinski, M. & Altman, N. Nat. Methods 11, 879–880 (2014).
  3. Krzywinski, M. & Altman, N. Nat. Methods 11, 699–700 (2014).
  4. Krzywinski, M. & Altman, N. Nat. Methods 10, 809–810 (2013).

Author information

Affiliations

  1. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

  2. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: A two-factor nested design scenario with different numbers of replicates and different variances at each layer. (48 KB)

PDF files

  1. Supplementary Text and Figures (123 KB)

    Supplementary Figure 1 and Supplementary Note.
