Letter

Bayesian inference of ancient human demography from individual genome sequences

Nature Genetics volume 43, pages 1031–1034 (2011)

Abstract

Whole-genome sequences provide a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters based on the whole-genome sequences of six individuals from diverse human populations. We used a Bayesian, coalescent-based approach to obtain information about ancestral population sizes, divergence times and migration rates from inferred genealogies at many neutrally evolving loci across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San population of southern Africa diverged from other human populations approximately 108–157 thousand years ago, that Eurasians diverged from an ancestral African population 38–64 thousand years ago, and that the effective population size of the ancestors of all modern humans was approximately 9,000.
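The estimates above are reported in years and individuals, whereas coalescent samplers of this kind typically work with mutation-scaled parameters. The sketch below assumes the common convention θ = 4Nμ and τ = Tμ (T in generations), together with a mutation rate of 2.5 × 10⁻⁸ per site per generation and a 25-year generation time; these values, the function names and the example inputs are illustrative assumptions, not taken from this page, and the code is not G-PhoCS. It shows the unit conversion and a quick Monte Carlo check that, under a simple isolation model with no migration, the expected coalescence time of one lineage from each of two populations is τ + θ_A/2.

```python
import math
import random


def scaled_to_absolute(theta, tau, mu, gen_time):
    """Convert mutation-scaled coalescent parameters to absolute units.

    Assumes theta = 4 * Ne * mu and tau = T * mu, where mu is the
    per-site, per-generation mutation rate and T is in generations.
    Returns (effective population size, divergence time in years).
    """
    ne = theta / (4.0 * mu)
    years = (tau / mu) * gen_time
    return ne, years


def simulate_cross_population_tmrca(tau, theta_anc, n, seed=0):
    """Monte Carlo TMRCA (mutation-scaled) for one lineage sampled from
    each of two populations that split at time tau with no gene flow.

    The lineages cannot coalesce before the split; in the ancestral
    population two lineages coalesce at rate 2 / theta_anc per unit of
    mutation-scaled time, so the waiting time is exponential.
    """
    rng = random.Random(seed)
    rate = 2.0 / theta_anc
    return [tau + rng.expovariate(rate) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the arithmetic;
    # they are not estimates reported in the paper.
    mu = 2.5e-8      # assumed mutation rate per site per generation
    gen_time = 25.0  # assumed generation time in years
    theta_anc, tau = 9e-4, 1.2e-4

    ne, years = scaled_to_absolute(theta_anc, tau, mu, gen_time)
    print(f"Ne ~ {ne:,.0f}; split ~ {years:,.0f} years ago")

    samples = simulate_cross_population_tmrca(tau, theta_anc, n=200_000)
    mean_t = sum(samples) / len(samples)
    analytic = tau + theta_anc / 2  # expected TMRCA under isolation
    print(f"simulated E[TMRCA] = {mean_t:.3e}, analytic = {analytic:.3e}")
```

Because both the effective size and the split time scale as 1/μ, and the split time also scales linearly with generation time, converted values are only as reliable as those two assumptions.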

Acknowledgements

This research was supported by a Packard Fellowship (to A.S.), National Science Foundation grant DBI-0644111 and National Institutes of Health training grant T32HD052471 (to C.G.D.). We thank S. Schuster, W. Miller, D. Reich, G. Coop, J. Hey, J. Wall, R.S. Wells, A. Keinan, A.G. Clark, S.C. Choi, C.D. Bustamante, B. Henn and others for helpful discussions and feedback.

Author information

Affiliations

  1. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA.

    • Ilan Gronau
    • Melissa J Hubisz
    • Charles G Danko
    • Adam Siepel

  2. Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA.

    • Brad Gulko

Contributions

A.S. conceived of and designed the study. I.G. implemented G-PhoCS and applied it to both simulated and real data. B.G. implemented BSNP and applied it to the individual genomes. I.G., M.J.H., B.G., C.G.D. and A.S. performed additional statistical analyses. I.G. and A.S. wrote the paper with review and contributions by all authors.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Adam Siepel.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–13, Supplementary Tables 1–7 and Supplementary Note.

About this article

DOI

https://doi.org/10.1038/ng.937