Points of Significance: Sampling distributions and the bootstrap

Journal name: Nature Methods
Volume: 12
Pages: 477–478
Year published:
DOI: 10.1038/nmeth.3414
Published online

Abstract

The bootstrap can be used to assess uncertainty of sample estimates.

At a glance

Figures

  1. Sampling distributions of estimators can be used to predict the precision and accuracy of estimates of population characteristics.

    (a) The shape of the distribution of estimates can be used to evaluate the performance of the estimator. The population distribution shown is standard normal (μ = 0, σ = 1). The sampling distribution of the sample mean estimator is shown in red (for a normal population, the sample mean is itself normally distributed, with s.d. σ/√n = 1/√n for sample size n). (b) Precision can be measured by the s.d. of the sampling distribution, which is defined as the standard error (s.e.). Estimators whose distribution is not centered on the true value are biased. Bias can be assessed only if the true value (red point) is available. Error bars show s.d.
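The relationship described in Figure 1 between the sample mean and its standard error can be checked with a short simulation. This is a minimal sketch using only the Python standard library; the sample size, number of replicates, seed, and the function name `sample_means` are illustrative assumptions, not values taken from the figure.

```python
import math
import random
import statistics

def sample_means(n, n_replicates, rng):
    """Draw n_replicates samples of size n from N(0, 1); return each sample's mean."""
    means = []
    for _ in range(n_replicates):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

rng = random.Random(0)   # fixed seed so the run is repeatable (assumed value)
n = 25                   # assumed sample size
means = sample_means(n, 10_000, rng)

# Precision: the s.d. of the sampling distribution is the standard error.
simulated_se = statistics.stdev(means)
theoretical_se = 1.0 / math.sqrt(n)

# Bias: offset between the center of the sampling distribution and the true mean (0).
bias = statistics.fmean(means) - 0.0

print(f"simulated s.e.   = {simulated_se:.3f}")
print(f"theoretical s.e. = {theoretical_se:.3f}")
print(f"estimated bias   = {bias:.3f}")
```

The simulated s.e. should sit close to 1/√n, and the estimated bias close to zero, because the sample mean is an unbiased estimator of the population mean.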

  2. The Luria-Delbrück experiment studied the mechanism by which bacteria acquired mutations that conferred resistance to a virus.

    (a) Bacteria are grown for t generations in the absence of the virus, and N cells are plated onto medium containing the virus. Those with resistance mutations survive. (b) The relationship between the mean and variation in the number of cells in each culture depends on the mutation mechanism. (c) Simulated distributions of cell counts for both processes shown in (a), using 10,000 cultures and mutation rates (0.49 induced, 0.20 spontaneous) that yield equal count means. Induced mutations occur in the medium (at t = 4). Spontaneous mutations can occur at each of the t = 4 generations. Points and error bars are mean and s.d. of simulated distributions (3.92 ± 2.07 spontaneous, 3.92 ± 1.42 induced). For a small number of generations, the induced model distribution is binomial; it approaches Poisson when t is large and the rate is small.
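The contrast in Figure 2 between the two mutation mechanisms can be illustrated with a toy simulation. The sketch below rests on assumptions that are not stated in the caption: a single founder cell, every cell dividing once per generation, a mutation opportunity for each wild-type cell after each division, and all 2^t cells being plated. The function names are hypothetical, and the induced rate is derived so that both models share the same expected count. It does not attempt to reproduce the figure's exact values.

```python
import random
import statistics

def spontaneous_culture(generations, rate, rng):
    """Resistant-cell count when mutations arise during growth and are inherited."""
    wild, mutant = 1, 0
    for _ in range(generations):
        wild, mutant = 2 * wild, 2 * mutant  # every cell divides
        new_mutants = sum(rng.random() < rate for _ in range(wild))
        wild, mutant = wild - new_mutants, mutant + new_mutants
    return mutant

def induced_culture(generations, rate, rng):
    """Resistant-cell count when each plated cell mutates independently on exposure."""
    n_cells = 2 ** generations
    return sum(rng.random() < rate for _ in range(n_cells))

rng = random.Random(0)  # fixed seed (assumed value)
generations, spontaneous_rate, n_cultures = 4, 0.20, 10_000

# A plated cell is resistant if any cell in its lineage mutated, so this
# induced rate gives both models the same expected count (by construction).
induced_rate = 1 - (1 - spontaneous_rate) ** generations

spontaneous = [spontaneous_culture(generations, spontaneous_rate, rng)
               for _ in range(n_cultures)]
induced = [induced_culture(generations, induced_rate, rng)
           for _ in range(n_cultures)]

for name, counts in (("spontaneous", spontaneous), ("induced", induced)):
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)
    print(f"{name:12s} mean = {mean:.2f}  s.d. = {variance ** 0.5:.2f}  "
          f"VMR = {variance / mean:.2f}")
```

Under these assumptions the two models have matching means, but the spontaneous model shows a larger spread because an early mutation is inherited by every descendant of that cell.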

  3. The sampling distribution of complex quantities such as the variance-to-mean ratio (VMR) can be generated from observed data using the bootstrap.

    (a) A source sample (n = 25, mean = 5.48, variance = 55.3, VMR = 10.1), generated from a negative binomial distribution (μ = 5, σ² = 50, VMR = 10), was used to simulate four samples (hollow circles) with the parametric (blue) and nonparametric (red) bootstrap. (b) VMR sampling distributions generated from parametric (blue) and nonparametric (red) bootstrap of 10,000 samples (n = 25) simulated from source samples drawn from two different distributions: negative binomial and bimodal, both with μ = 5 and σ² = 50, shown as black histograms with the source samples shown below. Points and error bars show mean and s.d. of the respective sampling distributions of VMR. Values beside error bars show s.d.
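The two bootstrap variants named in Figure 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the parametric variant fits a negative binomial to the source sample by matching its mean and variance (the caption does not say how the model was fitted), draws are generated through a gamma-Poisson mixture, and the number of bootstrap samples is reduced to 2,000 to keep the pure-Python run short. All function names are hypothetical.

```python
import math
import random
import statistics

def vmr(values):
    """Variance-to-mean ratio, using the sample variance (n - 1 denominator)."""
    mean = statistics.fmean(values)
    if mean == 0:
        return math.nan  # undefined when every count is zero
    return statistics.variance(values) / mean

def draw_negative_binomial(mean, variance, rng):
    """One negative binomial draw via a gamma-Poisson mixture (needs variance > mean)."""
    shape = mean ** 2 / (variance - mean)
    scale = (variance - mean) / mean
    lam = rng.gammavariate(shape, scale)
    # Knuth's multiplication method for Poisson(lam); adequate for modest lam.
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def nonparametric_bootstrap(sample, n_boot, rng):
    """Resample the observed values with replacement and recompute the VMR."""
    n = len(sample)
    return [vmr(rng.choices(sample, k=n)) for _ in range(n_boot)]

def parametric_bootstrap(sample, n_boot, rng):
    """Simulate new samples from a moment-matched negative binomial (assumed fit)."""
    mean = statistics.fmean(sample)
    variance = statistics.variance(sample)
    if variance <= mean:
        raise ValueError("moment fit needs an overdispersed sample (variance > mean)")
    n = len(sample)
    return [vmr([draw_negative_binomial(mean, variance, rng) for _ in range(n)])
            for _ in range(n_boot)]

def summarize(estimates):
    """Mean and s.d. of the bootstrap VMR values, ignoring undefined ones."""
    finite = [v for v in estimates if not math.isnan(v)]
    return statistics.fmean(finite), statistics.stdev(finite)

rng = random.Random(0)  # fixed seed (assumed value)
source = [draw_negative_binomial(5, 50, rng) for _ in range(25)]
print(f"source sample VMR = {vmr(source):.2f}")

for name, method in (("nonparametric", nonparametric_bootstrap),
                     ("parametric", parametric_bootstrap)):
    center, spread = summarize(method(source, 2_000, rng))
    print(f"{name:13s} VMR mean = {center:.2f}, s.d. = {spread:.2f}")
```

The s.d. of each list of bootstrap VMR values plays the role of the error bars in panel (b): it estimates the standard error of the VMR from a single observed sample.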

References

  1. Krzywinski, M. & Altman, N. Nat. Methods 10, 809–810 (2013).
  2. Luria, S.E. & Delbrück, M. Genetics 28, 491–511 (1943).


Author information

Affiliations

  1. Anthony Kulesa is a graduate student in the Department of Biological Engineering at MIT.

  2. Martin Krzywinski is a staff scientist at Canada's Michael Smith Genome Sciences Centre.

  3. Paul Blainey is an Assistant Professor of Biological Engineering at MIT and a Core Member of the Broad Institute.

  4. Naomi Altman is a Professor of Statistics at The Pennsylvania State University.

Competing financial interests

The authors declare no competing financial interests.
