Exploring population size changes using SNP frequency spectra

A Corrigendum to this article was published on 27 August 2015

This article has been updated

Abstract

Inferring demographic history is an important task in population genetics. Many existing inference methods are based on predefined simplified population models, which are more suitable for hypothesis testing than exploratory analysis. We developed a novel model-flexible method called stairway plot, which infers changes in population size over time using SNP frequency spectra. This method is applicable to whole-genome sequences of hundreds of individuals. Using extensive simulation, we demonstrate the usefulness of the method for inferring demographic history, especially recent changes in population size. We apply the method to the whole-genome sequence data of nine populations from the 1000 Genomes Project and show a pattern of fluctuations in human populations from 10,000 to 200,000 years ago.
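For readers unfamiliar with the input data, the following is a minimal sketch of how an unfolded SNP frequency spectrum can be tabulated from phased sequences. It is an illustration only, not the authors' implementation: the function name `site_frequency_spectrum`, the 0/1 encoding (0 = ancestral allele, 1 = derived allele) and the toy data are assumptions made for this example.

```python
def site_frequency_spectrum(haplotypes):
    """Tabulate the unfolded SNP frequency spectrum.

    haplotypes: list of n sequences, each a list of 0/1 alleles
                (assumed encoding: 0 = ancestral, 1 = derived).
    Returns a list xi of length n - 1, where xi[i - 1] is the number
    of sites at which the derived allele appears in exactly i sequences.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    spectrum = [0] * (n - 1)
    for site in range(num_sites):
        derived_count = sum(h[site] for h in haplotypes)
        # Only segregating sites (0 < count < n) are SNPs in this sample.
        if 0 < derived_count < n:
            spectrum[derived_count - 1] += 1
    return spectrum


# Hypothetical toy sample: 4 sequences, 5 sites.
toy_sample = [
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
toy_spectrum = site_frequency_spectrum(toy_sample)
print(toy_spectrum)
```

In the toy sample, two sites carry the derived allele in one sequence and one site carries it in two sequences; the monomorphic and fixed sites are excluded because they are not SNPs within the sample.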

Figure 1: Illustration of the multi-epoch model.
Figure 2: Comparing the inferred histories of the stairway plot and the PSMC method using simulated samples on the basis of six different models.
Figure 3: Inferred histories of nine populations.

Change history

  • 14 August 2015

    In the version of this article initially published, the authors neglected to acknowledge one of the funding sources for their study. The acknowledgements should have recognized support from Chinese NSF grant 91231120 in addition to the other funding sources listed. The error has been corrected in the HTML and PDF versions of the article.

  • 27 August 2015

    A Correction to this paper has been published: https://doi.org/10.1038/ng0915-1099a

Acknowledgements

The authors acknowledge the Texas Advanced Computing Center (TACC; https://www.tacc.utexas.edu/) at the University of Texas at Austin for providing high-performance computing resources that have contributed to the research results reported within this paper. The authors acknowledge the 1000 Genomes Project Consortium for generating and providing the DNA sequence resources used in this study. The authors thank S. Barton for copyediting the manuscript. The authors thank S. Xu, Y. Guan, T.J. Maxwell, F. Yu and E. Boerwinkle for helpful discussions and encouragement. This study was supported by US National Institutes of Health grants 1U01HG005728 and 2U54HG003273 and by Chinese NSF grant 91231120.

Author information

Contributions

X.L. designed the study, developed the method, conducted the analyses and wrote the manuscript. Y.-X.F. provided critical advice on methodology development and result interpretation.

Corresponding author

Correspondence to Xiaoming Liu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Comparing the inferred histories of the stairway plot and the PSMC method using simulated samples based on five models.

(a) PSMC “sim-1” model. (b) PSMC “sim-2” model. (c) PSMC “sim-3” model. (d) PSMC “sim-YH” model. (e) Population 1 of the population split model. (f) Population 2 of the population split model. We assumed a mutation rate of 1.2 × 10⁻⁸ mutations per base pair per generation and a generation time of 24 years. Thin black lines, true models. Thick orange lines, medians of the inferred histories of the stairway plot; thin orange lines, 2.5 and 97.5 percentiles of the inferred histories of the stairway plot. Thick green lines, medians of the inferred histories of the PSMC method; thin green lines, 2.5 and 97.5 percentiles of the inferred histories of the PSMC method. n is the number of simulated sequences, and L is the length of the simulated sequences. Dotted lines in e and f indicate the time point at which population 1 and population 2 split.
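The mutation rate and generation time quoted in this caption are what place the plotted histories on axes of individuals and years. The sketch below shows that rescaling under the standard diploid relation θ = 4Nμ, which is an assumption of this example rather than a statement of the authors' exact procedure; the function names and the input values are hypothetical.

```python
MUTATION_RATE = 1.2e-8        # mutations per base pair per generation (from the caption)
GENERATION_TIME_YEARS = 24    # years per generation (from the caption)


def theta_to_effective_size(theta_per_site, mutation_rate=MUTATION_RATE):
    """Convert a per-site theta to an effective population size.

    Assumes the standard diploid relation theta = 4 * N * mu.
    """
    return theta_per_site / (4.0 * mutation_rate)


def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert a time measured in generations to years."""
    return generations * generation_time


# Hypothetical inputs chosen only to illustrate the arithmetic.
example_size = theta_to_effective_size(4.8e-4)
example_years = generations_to_years(5000)
print(round(example_size), example_years)
```

With these illustrative inputs, a per-site θ of 4.8 × 10⁻⁴ corresponds to an effective size of about 10,000, and 5,000 generations correspond to 120,000 years. A different assumed mutation rate or generation time rescales both axes proportionally.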

Supplementary Figure 2 Effects of sequence length, sample size and recombination rate on the inferences of the stairway plot.

We assumed a mutation rate of 1.2 × 10⁻⁸ mutations per base pair per generation and a generation time of 24 years. Sequences were simulated on the basis of the complex model I (see the Supplementary Note for details) assuming different sequence lengths (top row, a–c), sample sizes (middle row, a,d,e) or recombination rates (bottom row, a,f,g). Thin black lines, true models. Thick orange lines, medians of the inferred histories of the stairway plot; thin orange lines, 2.5 and 97.5 percentiles of the inferred histories of the stairway plot. n is the number of simulated sequences, L is the length of the simulated sequences, r is the recombination rate per site and t is the mutation rate per site.

Supplementary Figure 3 Simulation experiments on the unobserved bottlenecks of the 1000 Genomes Project populations.

(a) YRI model. (b) CEU-LWK hybrid model. (c) FIN-LWK hybrid model. (d) FIN-CEU-LWK hybrid model. Thin black lines, true models. Thick orange lines, medians of the inferred histories of the stairway plot; thin orange lines, 2.5 and 97.5 percentiles of the inferred histories of the stairway plot.

Supplementary Figure 4 Stairway plot estimations (medians) of the nine 1000 Genomes Project populations.

The bottlenecks after 200,000 years ago are likely artificial bottlenecks.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4 and Supplementary Note. (PDF 1448 kb)

Cite this article

Liu, X., Fu, YX. Exploring population size changes using SNP frequency spectra. Nat Genet 47, 555–559 (2015). https://doi.org/10.1038/ng.3254

  • DOI: https://doi.org/10.1038/ng.3254
