Technical Report

Inferring human population size and separation history from multiple genome sequences

Nature Genetics volume 46, pages 919–925 (2014)

Abstract

The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model ancestral relationships under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20,000–30,000 years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The multiple sequentially Markovian coalescent (MSMC) analyzes the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago and give information about human population history as recent as 2,000 years ago, including the bottleneck in the peopling of the Americas and separations within Africa, East Asia and Europe.

Acknowledgements

We thank A. Scally for useful comments and discussion, in particular on interpreting population divergence estimates, and the Durbin group for general discussion. S.S. thanks A. Fischer for helpful support with HMM implementation details. We thank J. Kidd, S. Gravel and C. Bustamante for making ancestry tracts for the MXL individuals available to us. S.S. acknowledges grant support from an EMBO (European Molecular Biology Organization) long-term fellowship. This work was funded by Wellcome Trust grant 098051.

Author information

Affiliations

  1. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.

    • Stephan Schiffels
    • Richard Durbin

Contributions

R.D. proposed the basic strategy and designed the overall study. S.S. developed the theory, implemented the algorithm and obtained results. S.S. and R.D. analyzed the results and wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Stephan Schiffels or Richard Durbin.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–10, Supplementary Tables 1–4 and Supplementary Note

About this article

DOI

https://doi.org/10.1038/ng.3015

Further reading