Bayesian inference of ancient human demography from individual genome sequences

Journal name: Nature Genetics
Volume: 43
Pages: 1031–1034
DOI: 10.1038/ng.937

Whole-genome sequences provide a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters based on the whole-genome sequences of six individuals from diverse human populations. We used a Bayesian, coalescent-based approach to obtain information about ancestral population sizes, divergence times and migration rates from inferred genealogies at many neutrally evolving loci across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San population of southern Africa diverged from other human populations approximately 108–157 thousand years ago, that Eurasians diverged from an ancestral African population 38–64 thousand years ago, and that the effective population size of the ancestors of all modern humans was ~9,000.
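The phrase "possible phasings of diploid genotypes" can be made concrete with a small example. An unphased diploid genotype with k heterozygous sites is compatible with 2^(k-1) distinct unordered pairs of haplotypes. The sketch below enumerates them. It only illustrates the combinatorics that a phasing-aware analysis must account for and is not the method used in this study. The genotype encoding (a list of allele pairs per site) and the function name are hypothetical.

```python
from itertools import product


def enumerate_phasings(genotype):
    """List every unordered haplotype pair compatible with an unphased diploid genotype.

    `genotype` is a list of (allele_1, allele_2) tuples, one per site; this encoding
    is hypothetical. With k heterozygous sites there are 2**(k - 1) pairs (k >= 1).
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    if not het_sites:
        # Homozygous everywhere: both haplotypes are identical.
        hap = "".join(a for a, _ in genotype)
        return [(hap, hap)]

    phasings = []
    # Keep the first heterozygous site in its given orientation so that (h1, h2)
    # and (h2, h1) are not both produced; flip the remaining sites in all combinations.
    for flips in product((False, True), repeat=len(het_sites) - 1):
        flipped = {site for site, flip in zip(het_sites[1:], flips) if flip}
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if i in flipped:
                a, b = b, a
            h1.append(a)
            h2.append(b)
        phasings.append(("".join(h1), "".join(h2)))
    return phasings


# Hypothetical genotype with three heterozygous sites (positions 0, 2 and 3).
genotype = [("A", "G"), ("C", "C"), ("T", "A"), ("G", "C")]
for h1, h2 in enumerate_phasings(genotype):
    print(h1, h2)
```

The number of phasings doubles with each additional heterozygous site, which is one reason why explicit enumeration does not scale to long loci.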

Figures

  Figure 1: Population phylogeny and genealogies.

    The population phylogeny assumed in this study, with one diploid genome per population (Table 1) and a haploid chimpanzee outgroup. We included the Yoruban and Bantu individuals in the analysis as alternative African ingroups (denoted X) because their relationship to one another was uncertain (Supplementary Note). The free parameters in our model include the five population divergence times (τ) and the ten effective population sizes (θ), all expressed in units of expected mutations per site. We also considered various 'migration bands' (gray double-headed arrow) to allow for gene flow between populations, treating the (constant) migration rates along bands as free parameters. The two parameters of primary interest were the San (τKHEXS) and African-Eurasian (τKHEX) divergence times. We obtained absolute divergence times (in years) and effective population sizes (in numbers of individuals) by assuming a human-chimpanzee average genomic divergence time in the range 5.6–7.6 Mya, with a point estimate of 6.5 Mya.

  Figure 2: Results of the simulation study.

    Simulations assumed a population tree like the one shown in Figure 1 and plausible divergence times, population sizes and migration scenarios (Supplementary Note). (a) Accuracy of estimated African-Eurasian (τKHEX) and San (τKHEXS) divergence times without migration. Dotted lines indicate the values assumed for the simulations, and each boxplot summarizes the posterior mean estimates from six separate runs of G-PhoCS. Results are shown for correctly phased data (gold) and for integration over unknown phasings (red). A random phasing procedure produced substantially poorer results (Supplementary Fig. 10). Most estimates fall within 10% of the true value, except for the smallest assumed divergence times, where weak information in the data leads to an upward bias. (b) Accuracy of the estimated San divergence time (τKHEXS) and the Yoruban-Bantu population size (θX) in simulations with four levels of constant-rate migration (denoted 0, 1, 2 and 3 in order of increasing strength) from population S to population X. Ratios of the estimated to true values are shown when migration is not allowed (blue) and when it is allowed (red) in the model. Each boxplot summarizes 12 runs. There is a pronounced bias when migration is present but not modeled; this bias is eliminated when migration is added to the model. Simulated and estimated migration rates (measured in expected number of migrants per generation) are shown at right (see Supplementary Figs. 9–11 for the complete results).

  Figure 3: Parameter estimates from real data.

    Estimates of population divergence times (a), migration rates (b) and effective population sizes (c) obtained for various scenarios. In a and c, both mutation-scaled (left) and calibrated (right) y axes are shown (with a calibration of Tdiv = 6.5 Mya). Results are shown for scenarios with either the Yoruban or Bantu ingroup X and with or without a migration band between X and the San ingroup. Panel b shows estimated migration rates for 14 different migration bands. Only the Yoruban-San (Y-S) and Bantu-San (B-S) migration scenarios are strongly supported. In all panels, each bar represents the mean estimate and 95% credible interval (error bars) of a single representative run of the program (see Supplementary Tables 4 and 5 and Supplementary Figs. 12 and 13 for complete results).
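Figures 1 and 3 involve two steps that are straightforward to reproduce from posterior samples: summarizing a run as a posterior mean with a 95% credible interval, and converting mutation-scaled values to years or to numbers of individuals. The following is a minimal Python sketch under stated assumptions. The Gaussian draws are a hypothetical stand-in for MCMC output. The interval is equal-tailed, because the captions do not say which kind of interval is plotted. `tau_hc_avg`, the human-chimpanzee average genomic divergence in expected mutations per site, is a hypothetical input; under the standard coalescent one could take it as τ_HC + θ_HC/2. The generation time used to convert θ = 4Nμ into individuals is also an assumption and is not stated in the captions.

```python
import random
from statistics import fmean, quantiles


def posterior_summary(samples, scale=1.0):
    """Posterior mean and equal-tailed 95% credible interval, optionally rescaled.

    `scale` must be a positive constant, so quantiles can be rescaled after the fact.
    Burn-in removal is assumed to have been done already.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if scale <= 0:
        raise ValueError("scale must be positive")
    cuts = quantiles(samples, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return scale * fmean(samples), (scale * cuts[0], scale * cuts[-1])


def years_per_mutation_unit(tau_hc_avg, t_div_years=6.5e6):
    """Years per unit of mutation-scaled time: T_div / tau_hc_avg (i.e. 1 / mu per year)."""
    return t_div_years / tau_hc_avg


def individuals_per_theta_unit(tau_hc_avg, t_div_years=6.5e6, gen_time_years=25.0):
    """Scale for theta = 4 * N * mu_gen, with mu_gen = (tau_hc_avg / T_div) * generation time.

    The default generation time is an assumption, not a value from the captions.
    """
    mu_per_generation = (tau_hc_avg / t_div_years) * gen_time_years
    return 1.0 / (4.0 * mu_per_generation)


# Hypothetical inputs for illustration only.
rng = random.Random(1)
tau_draws = [rng.gauss(0.00012, 0.00001) for _ in range(5000)]  # stand-in draws of one tau
tau_hc_avg = 0.006  # hypothetical human-chimp average divergence, mutations per site

mean, (lo, hi) = posterior_summary(tau_draws)
print(f"tau: {mean:.6f} (95% CI {lo:.6f} to {hi:.6f})")
for t_div in (5.6e6, 6.5e6, 7.6e6):  # calibration range and point estimate from Figure 1
    m, (a, b) = posterior_summary(tau_draws, scale=years_per_mutation_unit(tau_hc_avg, t_div))
    print(f"T_div = {t_div / 1e6:.1f} Mya: {m / 1e3:.0f} kya (95% CI {a / 1e3:.0f} to {b / 1e3:.0f})")
```

The calibration is a positive linear rescaling, so each calibrated bar and interval is the mutation-scaled one multiplied by a single factor. Absolute dates therefore move in direct proportion to the assumed T_div, which is how the 5.6–7.6 Mya range propagates into the reported ranges.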

Author information

Affiliations

  1. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA.

    • Ilan Gronau,
    • Melissa J Hubisz,
    • Charles G Danko &
    • Adam Siepel
  2. Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA.

    • Brad Gulko

Contributions

A.S. conceived of and designed the study. I.G. implemented G-PhoCS and applied it to both simulated and real data. B.G. implemented BSNP and applied it to the individual genomes. I.G., M.J.H., B.G., C.G.D. and A.S. performed additional statistical analyses. I.G. and A.S. wrote the paper with review and contributions by all authors.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–13, Supplementary Tables 1–7 and Supplementary Note.
