Quantitative evolutionary dynamics using high-resolution lineage tracking

Journal name: Nature
Volume: 519
Pages: 181–186
DOI: doi:10.1038/nature14279

Abstract

Evolution of large asexual cell populations underlies ~30% of deaths worldwide, including those caused by bacteria, fungi, parasites, and cancer. However, the dynamics underlying these evolutionary processes remain poorly understood because they involve many competing beneficial lineages, most of which never rise above extremely low frequencies in the population. To observe these normally hidden evolutionary dynamics, we constructed a sequencing-based ultra high-resolution lineage tracking system in Saccharomyces cerevisiae that allowed us to monitor the relative frequencies of ~500,000 lineages simultaneously. In contrast to some expectations, we found that the spectrum of fitness effects of beneficial mutations is neither exponential nor monotonic. Early adaptation is a predictable consequence of this spectrum and is strikingly reproducible, but the initial small-effect mutations are soon outcompeted by rarer large-effect mutations that result in variability between replicates. These results suggest that early evolutionary dynamics may be deterministic for a period of time before stochastic effects become important.

At a glance

Figures

  1. Lineage tracking with random barcodes.
    Figure 1: Lineage tracking with random barcodes.

    a, Typical lineage trajectories. A small lineage that does not acquire a beneficial mutation (neutral, blue) will fluctuate in size due to drift before eventually being outcompeted. Rarely, a lineage will acquire a beneficial mutation (star) with a fitness effect of s (adaptive, red). In most cases, this beneficial mutation is lost to drift. If the beneficial mutants drift to a size > ~1/s (lower dotted horizontal line), the lineage will begin to grow exponentially at a rate s. Extrapolating the exponential growth back to the time at which the mutation is inferred to have reached a size ~1/s yields the establishment time (τ, dashed vertical line), which roughly corresponds to the time when the mutation occurred, with an uncertainty of ~1/s. At sizes > ~1/Ub (upper dotted horizontal line), where Ub is the total beneficial mutation rate, the lineage will acquire additional beneficial mutations. b, Barcode insertion and sequencing. Left, sequences containing random 20-nucleotide barcodes (colours) are inserted first into a plasmid and then into a specific location in the genome. Bottom, recombination between two partially crippled loxP sites (loxP*) integrates the plasmid into the genome and completes a URA3 selectable marker, resulting in one functional and one crippled loxP site (loxP*). The URA3 marker is interrupted by an artificial intron containing the barcode. Right, to measure relative fitness, cells are passed through growth-bottleneck cycles of ~8 generations. Before each bottleneck, genomic DNA is extracted, lineage barcode tags are amplified using a two-step PCR protocol, and amplicons are sequenced. By inserting unique molecular identifiers (also short random barcodes, grey bars; ref. 49) in early cycles of the PCR, PCR duplicates of the same template molecule (purple) are detected (refs 49, 50).
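
    The extrapolation described in panel a can be illustrated with a minimal Python sketch. It assumes a noise-free lineage that already grows as n(t) = (1/s)·exp(s(t − τ)) and ignores the rising population mean fitness; it is not the authors' inference procedure, and the function name `fit_establishment` and the synthetic numbers are hypothetical.

```python
import numpy as np

def fit_establishment(t, n):
    """Estimate s and tau from the exponential phase of one lineage.

    Assumes ln n(t) = ln(1/s) + s * (t - tau), i.e. constant mean fitness
    and no sampling noise (illustrative assumptions only).
    """
    t = np.asarray(t, dtype=float)
    n = np.asarray(n, dtype=float)
    # Straight-line fit to the log of lineage size: slope is the growth rate s.
    slope, intercept = np.polyfit(t, np.log(n), 1)
    s = slope
    # Extrapolate back to the time at which the fitted line crosses size 1/s.
    tau = (-np.log(s) - intercept) / s
    return s, tau

# Synthetic lineage with s = 0.1 per generation, established at generation 20,
# sampled every 8 generations (hypothetical values).
s_true, tau_true = 0.1, 20.0
t = np.arange(40, 81, 8)
n = (1.0 / s_true) * np.exp(s_true * (t - tau_true))
s_hat, tau_hat = fit_establishment(t, n)
```

    On this synthetic input the fit recovers the values used to generate it; with real read counts, noise and the changing mean fitness would both need to be modelled.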

  2. Inferring the fitnesses and establishment times from lineage trajectories.
    Figure 2: Inferring the fitnesses and establishment times from lineage trajectories.

    a, Selected lineage trajectories from E1 coloured according to the probability that they contain an established beneficial mutation. The decline of adaptive lineages at later times is caused by the increase of the population mean fitness (inset). The population mean fitness is inferred from both the decline of neutral lineages (blue circles) and the growth of beneficial lineages (red line, Supplementary Information section 6.2). Shading indicates the error in mean fitness. b, c, The inferred fitnesses (b) and establishment times (c) from analysis of simulated trajectories correlate strongly with the known simulated values. d, Scatter plot of the fitness of 33 clones picked from E2 at generation 88 inferred by sequencing and pairwise competition (colouring as in (a), with outliers lightened in colour and excluded from correlation). Error bars represent one standard deviation.
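
    As a rough illustration of how the decline of neutral lineages constrains the mean fitness, the sketch below assumes that the total frequency f(t) of neutral lineages obeys d ln f/dt = −x̄(t), with x̄ the population mean fitness, and that drift and sequencing noise are negligible. It uses only the neutral lineages, whereas the caption notes that the growth of beneficial lineages is also used; the function name and numbers are hypothetical.

```python
import numpy as np

def mean_fitness_from_neutrals(t, neutral_freq):
    """Mean fitness in each sampling interval from the decline of neutral lineages.

    Assumes d ln f / dt = -xbar(t) for the neutral frequency f, with no noise
    (an illustrative simplification).
    """
    t = np.asarray(t, dtype=float)
    f = np.asarray(neutral_freq, dtype=float)
    return -np.diff(np.log(f)) / np.diff(t)

# Synthetic neutral frequencies generated from a known mean-fitness trajectory.
t = np.array([0.0, 8.0, 16.0, 24.0])
xbar_true = np.array([0.00, 0.01, 0.03])  # per generation, one value per interval
log_f = np.log(0.5) - np.concatenate(([0.0], np.cumsum(xbar_true * np.diff(t))))
xbar_hat = mean_fitness_from_neutrals(t, np.exp(log_f))
```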

  3. Fitness effects, establishment times, and population dynamics.
    Figure 3: Fitness effects, establishment times, and population dynamics.

    a, Scatter plot of τ and s of all ~25,000 beneficial mutations (circles) identified in E1. Circle area represents the size of the lineage at generation 88. Purple circles indicate lineages with mutations that occurred in the period of common growth (t < 0) that were sampled into, and established in, E1 and E2. Green circles indicate lineages that were identified as adaptive in only one replicate and likely contain mutations that arose after t = 0. Lines indicate the time limits before which mutations must occur in order to establish (large dash) or be observed (small dash). These limits trail the mean fitness (solid line) by ~1/s generations. Inset, the spectrum of mutation rates, μ(s), as a function of fitness effect, s, inferred from mutations that likely occurred after t = 0 (Supplementary Information section 10.2). The y axis is the mutation rate density, so the mutation rate to a range, Δs, is obtained by multiplying this density by Δs. The total beneficial mutation rate to s > 5% is inferred to be ~1 × 10⁻⁶ and is consistent across replicates. The observed spectrum is not exponential (grey line, with the error range shaded). b, The distribution of the number of adaptive cells binned by their fitness over time. As the mean fitness (grey curtain) surpasses the fitness of a subpopulation, cells with that fitness begin to decline in frequency.
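
    The relation between the mutation rate density and the rates quoted in the inset can be written out explicitly. The notation U(·) for the rate to a range of fitness effects is introduced here only for illustration; the value on the right is the total rate stated in the caption.

```latex
\[
U\bigl(s_1 \le s \le s_1 + \Delta s\bigr) \approx \mu(s_1)\,\Delta s,
\qquad
U_b(s > 0.05) = \int_{0.05}^{\infty} \mu(s)\,\mathrm{d}s \approx 1 \times 10^{-6}.
\]
```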

  4. The need for high frequency resolution.
    Figure 4: The need for high frequency resolution.

    The fitness spectrum of adaptive lineages in replicate E1 that could be identified within the first 100 generations at different frequency resolution thresholds.

  5. Total population size over time.
    Extended Data Fig. 1: Total population size over time.

    A single ancestral cell is grown for ~32 generations to ~10¹⁰ cells before barcodes are inserted. Cells that incorporate a barcode are grown for another 16 generations. The population is then divided into two replicates (E1 and E2) at t = 0. Beneficial mutations that occurred before barcoding can be sampled into both replicates.

  6. Inferring the fitnesses and establishment times from lineage trajectories.
    Extended Data Fig. 2: Inferring the fitnesses and establishment times from lineage trajectories.

    a, Selected lineage trajectories and the mean fitness trajectory from replicate E2. b, The distribution of lineage sizes over time, for lineages that begin with ~100 ± 2 cells (vertical line). Adaptive lineages (red) begin to expand above the neutral expectation (black curve) and push neutral lineages to lower cell numbers (blue). c, The posterior probability distribution over s and τ for an adaptive lineage in E2. d, The measured trajectory of this lineage in E1 (unadaptive, blue circles) and E2 (adaptive, red circles) compared with the predicted trajectory with largest probability in E1 (blue line) and E2 (red line).

  7. Fitness effects and establishment times for replicate E2.
    Extended Data Fig. 3: Fitness effects and establishment times for replicate E2.

    a, Scatter plot of τ and s of all ~14,000 beneficial mutations (circles) identified in E2. Circle area represents the size of the lineage at generation 88. Purple circles indicate lineages with mutations that occurred in the period of common growth (t < 0) that were sampled into, and established in, E1 and E2. Green circles indicate lineages that were identified as adaptive in only one replicate and likely contain mutations that arose after t = 0. Lines indicate the time limits before which mutations must occur in order to establish (large dash) or be observed (small dash). These limits trail the mean fitness (solid line) by ~1/s generations. Inset, the spectrum of mutation rates, μ(s), as a function of fitness effect, s, inferred from mutations that likely occurred after t = 0 (Supplementary Information section 10.2). The y axis is the mutation rate density, so the mutation rate to a range, Δs, is obtained by multiplying this density by Δs. The total beneficial mutation rate to s > 5% is inferred to be ~1 × 10⁻⁶ and is consistent across replicates. The observed spectrum is not exponential (grey line, with the error range shaded). b, The distribution of the number of adaptive cells binned by their fitness over time. As the mean fitness (grey curtain) surpasses the fitness of a subpopulation, cells with that fitness begin to decline in frequency.

Change history

Corrected online 11 March 2015
A minor change was made to the Acknowledgements.

References

  1. Kvitek, D. J. & Sherlock, G. Whole genome, whole population sequencing reveals that loss of signaling networks is the major adaptive strategy in a constant environment. PLoS Genet. 9, e1003972 (2013)
  2. Herron, M. D. & Doebeli, M. Parallel evolutionary dynamics of adaptive diversification in Escherichia coli. PLoS Biol. 11, e1001490 (2013)
  3. Lang, G. I. et al. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500, 571–574 (2013)
  4. Lang, G. I., Botstein, D. & Desai, M. M. Genetic variation and the fate of beneficial mutations in asexual populations. Genetics 188, 647–661 (2011)
  5. Ding, L. et al. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature 464, 999–1005 (2010)
  6. Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461, 809–813 (2009)
  7. Mardis, E. R. et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N. Engl. J. Med. 361, 1058–1066 (2009)
  8. International Cancer Genome Consortium et al. International network of cancer genome projects. Nature 464, 993–998 (2010)
  9. Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010)
  10. Weinreich, D. M., Delaney, N. F., DePristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006)
  11. Young, B. C. et al. Evolutionary dynamics of Staphylococcus aureus during progression from carriage to disease. Proc. Natl Acad. Sci. USA 109, 4550–4555 (2012)
  12. Holden, M. T. G. et al. A genomic portrait of the emergence, evolution, and global spread of a methicillin-resistant Staphylococcus aureus pandemic. Genome Res. 23, 653–664 (2013)
  13. Desai, M. M., Walczak, A. M. & Fisher, D. S. Genetic diversity and the structure of genealogies in rapidly adapting populations. Genetics 193, 565–585 (2013)
  14. Neher, R. A. & Hallatschek, O. Genealogies of rapidly adapting populations. Proc. Natl Acad. Sci. USA 110, 437–442 (2013)
  15. Hegreness, M., Shoresh, N., Hartl, D. & Kishony, R. An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615–1617 (2006)
  16. Kao, K. C. & Sherlock, G. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nature Genet. 40, 1499–1504 (2008)
  17. Imhof, M. & Schlötterer, C. Fitness effects of advantageous mutations in evolving Escherichia coli populations. Proc. Natl Acad. Sci. USA 98, 1113–1117 (2001)
  18. Gerrits, A. et al. Cellular barcoding tool for clonal analysis in the hematopoietic system. Blood 115, 2610–2618 (2010)
  19. Desai, M. M. & Fisher, D. S. Beneficial mutation selection balance and the effect of linkage on positive selection. Genetics 176, 1759–1798 (2007)
  20. Charlesworth, B. The good fairy godmother of evolutionary genetics. Curr. Biol. 6, 220 (1996)
  21. Berns, K. et al. A large-scale RNAi screen in human cells identifies new components of the p53 pathway. Nature 428, 431–437 (2004)
  22. Smith, A. M. et al. Quantitative phenotyping via deep barcode sequencing. Genome Res. 19, 1836–1842 (2009)
  23. Lu, R., Neff, N. F., Quake, S. R. & Weissman, I. L. Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nature Biotechnol. 29, 928–933 (2011)
  24. Blundell, J. R. & Levy, S. F. Beyond genome sequencing: lineage tracking with barcodes to study the dynamics of evolution, infection, and cancer. Genomics 104, 417–430 (2014)
  25. Sternberg, N. & Hamilton, D. Bacteriophage P1 site-specific recombination. J. Mol. Biol. 150, 467–486 (1981)
  26. Austin, S., Ziese, M. & Sternberg, N. A novel role for site-specific recombination in maintenance of bacterial replicons. Cell 25, 729–736 (1981)
  27. Gerrish, P. J. & Lenski, R. E. The fate of competing beneficial mutations in an asexual population. Genetica 102–103, 127–144 (1998)
  28. Luria, S. E. & Delbrück, M. Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28, 491–511 (1943)
  29. Lang, G. I. & Murray, A. W. Estimating the per-base-pair mutation rate in the yeast Saccharomyces cerevisiae. Genetics 178, 67–82 (2008)
  30. Lynch, M. et al. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl Acad. Sci. USA 105, 9272–9277 (2008)
  31. Zhu, Y. O., Siegal, M. L., Hall, D. W. & Petrov, D. A. Precise estimates of mutation rate and spectrum in yeast. Proc. Natl Acad. Sci. USA 111, E2310–E2318 (2014)
  32. Joseph, S. B. & Hall, D. W. Spontaneous mutations in diploid Saccharomyces cerevisiae: more beneficial than expected. Genetics 168, 1817–1825 (2004)
  33. Desai, M. M., Fisher, D. S. & Murray, A. W. The speed of evolution and maintenance of variation in asexual populations. Curr. Biol. 17, 385–394 (2007)
  34. Gillespie, J. H. Molecular evolution over the mutational landscape. Evolution 38, 1116–1129 (1984)
  35. Orr, H. A. The distribution of fitness effects among beneficial mutations. Genetics 163, 1519–1526 (2003)
  36. Kassen, R. & Bataillon, T. Distribution of fitness effects among beneficial mutations before selection in experimental populations of bacteria. Nature Genet. 38, 484–488 (2006)
  37. Rokyta, D. R., Joyce, P., Caudle, S. B. & Wichman, H. A. An empirical test of the mutational landscape model of adaptation using a single-stranded DNA virus. Nature Genet. 37, 441–444 (2005)
  38. Rokyta, D. R. et al. Beneficial fitness effects are not exponential for two viruses. J. Mol. Evol. 67, 368–376 (2008)
  39. Gresham, D. et al. The repertoire and dynamics of evolutionary adaptations to controlled nutrient-limited environments in yeast. PLoS Genet. 4, e1000303 (2008)
  40. Good, B. H., Rouzine, I. M., Balick, D. J., Hallatschek, O. & Desai, M. M. Distribution of fixed beneficial mutations and the rate of adaptation in asexual populations. Proc. Natl Acad. Sci. USA 109, 4950–4955 (2012)
  41. Salmon, S. E. & Smith, B. A. Immunoglobulin synthesis and total body tumor cell number in IgG multiple myeloma. J. Clin. Invest. 49, 1114–1121 (1970)
  42. Michaelson, J. S. et al. Predicting the survival of patients with breast carcinoma using tumor size. Cancer 95, 713–723 (2002)
  43. König, C., Simmen, H. P. & Blaser, J. Bacterial concentrations in pus and infected peritoneal fluid–implications for bactericidal activity of antibiotics. J. Antimicrob. Chemother. 42, 227–232 (1998)
  44. Wilson, M. L. & Gaido, L. Laboratory diagnosis of urinary tract infections in adult patients. Clin. Infect. Dis. 38, 1150–1158 (2004)
  45. Thomas, C. E., Ehrhardt, A. & Kay, M. A. Progress and problems with the use of viral vectors for gene therapy. Nature Rev. Genet. 4, 346–358 (2003)
  46. Bushman, F. et al. Genome-wide analysis of retroviral DNA integration. Nature Rev. Microbiol. 3, 848–858 (2005)
  47. Ran, F. A. et al. Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154, 1380–1389 (2013)
  48. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013)
  49. Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nature Methods 9, 72–74 (2011)
  50. Lundberg, D. S., Yourstone, S., Mieczkowski, P., Jones, C. D. & Dangl, J. L. Practical innovations for high-throughput amplicon sequencing. Nature Methods 10, 999–1002 (2013)

Author information

  1. These authors contributed equally to this work.

    • Sasha F. Levy &
    • Jamie R. Blundell

Affiliations

  1. Department of Genetics, Stanford University, Stanford, California 94305-5120, USA

    • Sasha F. Levy &
    • Gavin Sherlock
  2. Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794–5252, USA

    • Sasha F. Levy
  3. Department of Biochemistry and Cellular Biology, Stony Brook University, Stony Brook, New York 11794–5215, USA

    • Sasha F. Levy
  4. Department of Applied Physics, Stanford University, Stanford, California 94305, USA

    • Jamie R. Blundell &
    • Daniel S. Fisher
  5. Department of Biology, Stanford University, Stanford, California 94305, USA

    • Jamie R. Blundell,
    • Sandeep Venkataram,
    • Dmitri A. Petrov &
    • Daniel S. Fisher

Contributions

S.F.L. conceived of the barcoding system. S.F.L. and G.S. designed the barcoding system and evolution experiments. S.F.L., J.R.B., D.A.P., G.S. and D.S.F. developed the project vision. S.F.L. performed the barcoding and evolution experiments. S.V. and D.A.P. designed the pairwise competition assays. S.V. performed the pairwise competition assays. J.R.B. and D.S.F. developed theory and analysed the data. J.R.B. and S.F.L. wrote the paper. All authors edited the paper.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Information (23.2 MB)

    This file contains Supplementary Text and Data, Supplementary Tables 1-2, Supplementary Figures 1-49 and Supplementary references – see contents page for more details.
