Methods and models for unravelling human evolutionary history

Key Points

  • High-throughput sequencing is enabling very large catalogues of DNA sequence variation to be collected from geographically diverse human populations. Such data sets contain considerable information about human history, but they are complex and require careful analysis.

  • Quality control and exploratory data analyses are critical steps when working with large-scale sequencing data sets; they help to identify features of the data that may complicate downstream inferences.

  • Functional and comparative genomics data (such as sequence conservation, chromatin immunoprecipitation followed by sequencing (ChIP–seq) and DNase I hypersensitivity) can be leveraged to mitigate the confounding effect of natural selection when inferring demographic models.

  • A large number of flexible and sophisticated methods have been developed that allow specific and detailed demographic inferences to be made. The appropriate method to use depends on the specific hypothesis or question being asked, and the underlying assumptions of a given method should be carefully considered.

  • As sample sizes become increasingly large, inferences about specific aspects of breeding structure and demography may be possible. However, these methods are still in their infancy and require substantial theoretical and methodological development.

  • The increasing availability of ancient DNA from both anatomically modern and archaic humans provides exciting new possibilities to refine parameters of human evolutionary history, although new methodological development is needed to fully realize the potential of these data.
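
One common way to act on the functional-genomics key point is to restrict demographic analyses to variants lying well away from annotated functional elements. The sketch below is a minimal illustration, not any specific published pipeline: it assumes the annotations (for example, conserved elements or DNase I hypersensitive sites) are already available as sorted, non-overlapping, half-open (start, end) intervals on one chromosome, and the buffer size is a hypothetical choice.

```python
import bisect

def neutral_positions(positions, intervals, buffer=0):
    """Keep variant positions lying at least `buffer` bp outside every
    annotated functional interval.

    intervals: sorted, non-overlapping, half-open (start, end) tuples
    (hypothetical input, e.g. conserved elements or DNase I sites).
    """
    # Pad each interval by the buffer on both sides; starts and ends
    # remain sorted, so a binary search on starts is sufficient.
    padded = [(max(0, s - buffer), e + buffer) for s, e in intervals]
    starts = [s for s, _ in padded]
    kept = []
    for pos in positions:
        # Index of the last padded interval that starts at or before pos.
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and padded[i][0] <= pos < padded[i][1]:
            continue  # inside, or too close to, a functional element
        kept.append(pos)
    return kept

print(neutral_positions([50, 95, 150, 250, 450, 700],
                        [(100, 200), (500, 600)], buffer=10))
```

Larger buffers reduce the influence of linked selection on the retained sites at the cost of discarding more data, so the buffer is a tunable trade-off rather than a fixed constant.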

Abstract

The genomes of contemporary humans contain considerable information about the history of our species. Although the general contours of human evolutionary history have been defined with increasing resolution over the past several decades, the continuing deluge of very large sequencing data sets presents new opportunities and challenges for understanding human evolutionary history. Here, we review the signatures that demographic history leaves in patterns of DNA sequence variation, the statistical methods that have been developed to leverage the information contained in genome-scale data sets, and the insights gleaned from these studies. We also discuss the importance of using exploratory analyses to assess data quality, the strengths and limitations of commonly used population genomics methods, and factors that confound population genomics inferences.
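
A widely used summary of the demographic signal in sequence variation is the site frequency spectrum (SFS): the number of segregating sites at which the derived allele is carried by i of the n sampled chromosomes. The sketch below assumes a haplotype matrix of 0 (ancestral) and 1 (derived) calls with known ancestral states and no missing data. It compares the observed spectrum with the standard neutral expectation for a constant-size panmictic population, in which the expected count in class i is proportional to 1/i. Recent population growth, for example, produces an excess of rare variants relative to this expectation.

```python
import numpy as np

def unfolded_sfs(haplotypes):
    """haplotypes: (n_haplotypes, n_sites) array of 0 (ancestral) / 1 (derived).
    Returns xi, where xi[i - 1] counts segregating sites with i derived copies,
    for i = 1..n-1."""
    h = np.asarray(haplotypes)
    n = h.shape[0]
    derived_counts = h.sum(axis=0)
    # Sites with 0 or n derived copies are not segregating in the sample.
    seg = derived_counts[(derived_counts > 0) & (derived_counts < n)]
    return np.bincount(seg, minlength=n)[1:n]

def neutral_expected_sfs(n, n_segregating):
    """Standard neutral expectation (constant size, panmixia), scaled to the
    observed number of segregating sites: E[xi_i] is proportional to 1/i."""
    weights = 1.0 / np.arange(1, n)
    return n_segregating * weights / weights.sum()

haps = [[1, 0, 1, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 1],
        [1, 0, 0, 0]]
observed = unfolded_sfs(haps)  # one site each with 1, 2 and 3 derived copies
expected = neutral_expected_sfs(len(haps), observed.sum())
```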


Figure 1: Identifying demographically informative genomic regions.
Figure 2: Inferring population demographic history.
Figure 3: The effect of demographic perturbations on gene genealogies and the site frequency spectrum (SFS).


Acknowledgements

The authors would like to acknowledge Kelley Harris for helpful discussions regarding SMC-based approaches.

Author information

Corresponding authors

Correspondence to Joshua G. Schraiber or Joshua M. Akey.

Ethics declarations

Competing interests

J.M.A. is a paid consultant of Glenview Capital. J.G.S. declares no competing interests.

Supplementary information

Supplementary information S1 (box)

Issues in inferring absolute dates (PDF 141 kb)

Supplementary information S2 (box)

Lambda coalescents (PDF 139 kb)


Glossary

Exploratory data analyses

(EDA). The initial stages of 'digging into' a data set, usually by plotting low-dimensional summaries of the data.
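
As an illustration of the kind of low-dimensional summaries that EDA typically plots, the sketch below computes per-sample missingness and observed heterozygosity and flags samples that sit far from the median, a quick way to spot contamination, low coverage or batch effects. It assumes genotypes coded as 0/1/2 copies of the alternative allele with -1 for a missing call; the coding and the outlier threshold are illustrative choices, not a standard.

```python
import numpy as np

def per_sample_summaries(genotypes, missing=-1):
    """genotypes: (n_samples, n_sites) array coded 0/1/2, with `missing`
    marking no call (an assumed coding). Returns per-sample missingness and
    observed heterozygosity (fraction of called sites that are heterozygous)."""
    g = np.asarray(genotypes)
    called = g != missing
    missingness = 1.0 - called.mean(axis=1)
    het = ((g == 1) & called).sum(axis=1) / np.maximum(called.sum(axis=1), 1)
    return missingness, het

def flag_outliers(values, n_mads=3.0):
    """Flag values more than n_mads median absolute deviations from the median."""
    v = np.asarray(values, dtype=float)
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    if mad == 0:
        return v != med
    return np.abs(v - med) > n_mads * mad

geno = [[0, 1, 2, 1],
        [1, 1, -1, 0],
        [-1, -1, -1, 0]]
miss, het = per_sample_summaries(geno)
```

In practice these per-sample values would be plotted (for example, heterozygosity against missingness) before deciding on any filter.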

Likelihoods

The probabilities of the data given various models and their parameters, thought of as functions of those parameters. The parameter values that maximize the probability of the data in each model are called maximum likelihood estimates.
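
A minimal worked example of a likelihood and its maximum: if k derived alleles are observed among n sampled chromosomes, the binomial probability of that count, viewed as a function of the population frequency p, is the likelihood. The sketch maximizes it over a grid; the analytical maximum likelihood estimate is k/n, which the grid search recovers. The binomial model here is chosen only for illustration.

```python
import math

def binomial_log_likelihood(p, k, n):
    """Log-probability of k derived alleles among n sampled chromosomes,
    viewed as a function of the population allele frequency p."""
    if p <= 0.0 or p >= 1.0:
        # At the boundaries the data have probability 1 or 0.
        if (p <= 0.0 and k == 0) or (p >= 1.0 and k == n):
            return 0.0
        return -math.inf
    return (math.log(math.comb(n, k))
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def grid_mle(k, n, grid_size=1001):
    """Return the grid value of p that maximizes the log-likelihood."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda p: binomial_log_likelihood(p, k, n))

print(grid_mle(30, 100))
```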

Eigenvectors

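Nonzero vectors that, when multiplied by a given square matrix, are only rescaled (stretched, shrunk or reversed) rather than rotated. The scaling factor is the corresponding eigenvalue.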

Covariance matrix

An n×n matrix giving the covariance between each pair of observations (for example, individuals) in a sample of size n.
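
Principal components analysis of genotype data connects the two preceding entries: the leading eigenvectors of the covariance matrix between individuals give each individual's coordinates on the principal components. The sketch below assumes an (individuals × sites) matrix of 0/1/2 genotypes with no missing data and only centres each site; many tools additionally scale each site by its expected standard deviation, which this sketch omits.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an (n_individuals, n_sites) 0/1/2 genotype matrix via the
    eigenvectors of the n x n covariance matrix between individuals."""
    g = np.asarray(genotypes, dtype=float)
    # Centre each site so covariances are measured about the mean genotype.
    x = g - g.mean(axis=0)
    # Covariance between every pair of individuals, averaged over sites.
    cov = x @ x.T / x.shape[1]
    # eigh suits symmetric matrices; it returns eigenvalues in ascending
    # order, so reorder to put the largest components first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]

# Two simulated populations with very different allele frequencies.
rng = np.random.default_rng(0)
pop_a = rng.binomial(2, 0.1, size=(10, 500))
pop_b = rng.binomial(2, 0.9, size=(10, 500))
vals, vecs = genotype_pca(np.vstack([pop_a, pop_b]))
```

With strongly differentiated populations, the first eigenvector places the two groups on opposite sides of zero; the sign of an eigenvector is arbitrary, so only the split is meaningful.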

Panmictic population

A group of individuals among whom random mating occurs.

Linkage disequilibrium

(LD). Nonrandom association between alleles at physically distinct genomic loci. Over time, this will be broken down by recombination.
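
Two standard measures of LD between biallelic loci are D, the difference between the observed frequency of a two-locus haplotype and the product of the single-locus allele frequencies, and r², D scaled by the allele-frequency variances. The sketch assumes phased haplotypes coded 0/1 at each locus.

```python
def ld_stats(hap_a, hap_b):
    """D and r^2 between two biallelic loci from phased haplotypes.
    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per haplotype."""
    n = len(hap_a)
    p_a = sum(hap_a) / n   # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b   # deviation from independent association
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2

print(ld_stats([0, 0, 1, 1], [0, 0, 1, 1]))  # complete association
print(ld_stats([0, 0, 1, 1], [0, 1, 0, 1]))  # no association
```

Under random mating, recombination at rate c between the two loci shrinks D by a factor of (1 - c) each generation, which is the breakdown of LD described above.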

Coalescence times

The times in the past at which lineages sampled at a genomic region trace back to (coalesce in) a common genetic ancestor.
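
Under the standard (Kingman) coalescent for a panmictic population of constant effective size N, measured in units of 2N generations, the waiting time while k lineages remain is exponential with rate k(k - 1)/2. Summing the expected waiting times gives the expected time to the most recent common ancestor (TMRCA) of n lineages, 2(1 - 1/n); for a pair of lineages this is 1 unit, or 2N generations. The sketch computes this value and checks it by simulation.

```python
import random

def expected_tmrca(n):
    """Expected TMRCA of n lineages under the Kingman coalescent,
    in units of 2N generations."""
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))

def simulate_tmrca(n, rng):
    """One draw of the TMRCA: while k lineages remain, the waiting time to
    the next coalescence is exponential with rate k(k-1)/2."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t

rng = random.Random(42)
sims = [simulate_tmrca(10, rng) for _ in range(20000)]
print(expected_tmrca(10), sum(sims) / len(sims))
```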

Isolation by distance

Genetic differentiation between individuals that arises from geographic separation: individuals that are closer geographically tend to be closer genetically.

Overfitting

When too many parameters are added to a model, it begins to fit the noise in the observed data rather than the true underlying mechanism that generated the data. Overfit models generalize poorly to new data sets.

Cross-validation error

The error in predicting the structure of a held-out portion of the data, when a model is trained on a subset of the whole data set. Minimizing cross-validation error is an effective way to choose parameters and hyperparameters.
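
The interplay between overfitting and cross-validation can be shown with a toy model. The sketch uses polynomial regression as a stand-in for a population genetic model (in structure analyses, the analogous choice is the number of ancestral populations, K): each candidate complexity is fitted to k - 1 folds and scored on the held-out fold, and the complexity with the lowest average held-out error is preferred. The data, fold count and candidate degrees are illustrative assumptions.

```python
import numpy as np

def kfold_cv_error(x, y, degree, k=5, seed=0):
    """Mean squared prediction error of a degree-`degree` polynomial,
    estimated by k-fold cross-validation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        coefs = np.polyfit(x[train], y[train], degree)  # fit on training folds
        pred = np.polyval(coefs, x[held_out])           # predict held-out fold
        errors.append(np.mean((y[held_out] - pred) ** 2))
    return float(np.mean(errors))

# Quadratic truth plus a little noise.
x = np.linspace(-1, 1, 60)
noise = np.random.default_rng(3).normal(0.0, 0.1, x.size)
y = 1 + 2 * x - 3 * x ** 2 + noise
cv = {d: kfold_cv_error(x, y, d) for d in (0, 1, 2)}
```

Too simple a model has a large held-out error because it misses real structure; adding parameters beyond what the data support stops reducing, and can increase, the held-out error even though the training error keeps falling.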

Ancestral recombination graphs

Graph structures representing the genealogical history of a sample of recombining genomes. Looking backwards in time, coalescence events merge two lineages and therefore reduce the number of lineages in the graph, whereas recombination events split a lineage into two, which increases the number of lineages in the graph.

Hidden Markov model

(HMM). A statistical model in which a sequence of unobserved (hidden) states is assumed to follow a Markov chain, and each hidden state generates an observed state with some probability.
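
The likelihood of an observed sequence under an HMM is computed efficiently with the forward algorithm, which sums over all hidden paths in time proportional to the sequence length. In the toy sketch below, the two hidden states might loosely represent a "recent" versus an "ancient" local coalescence time and the observations whether a site in a diploid genome is homozygous (0) or heterozygous (1); coalescent HMMs use this general structure with many discretized time states, but every parameter value here is made up for illustration.

```python
def forward_likelihood(obs, init, trans, emit):
    """Probability of an observation sequence under an HMM (forward algorithm).
    init[s]: P(first hidden state = s); trans[r][s]: P(next = s | current = r);
    emit[s][o]: P(observation o | hidden state s)."""
    states = range(len(init))
    alpha = [init[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        # Sum over the previous hidden state, then emit the current observation.
        alpha = [sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
                 for s in states]
    return sum(alpha)

init = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.99, 0.01],   # "recent" state: heterozygous sites are rare
        [0.9, 0.1]]     # "ancient" state: more heterozygous sites
obs = [0, 0, 1, 0, 1]
print(forward_likelihood(obs, init, trans, emit))
```

Long sequences cause numerical underflow, so practical implementations rescale alpha at each step or work in log space.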

Reference panel

A large number of individuals, related to the samples of interest, for which some property (for example, allelic phase) is known.

Effective population size

The size that a theoretical population evolving under a Wright–Fisher model would need to be in order to match aspects of the observed genetic data.
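
The Wright–Fisher model underlying this definition is easy to simulate: each generation, the 2N gene copies of a diploid population are drawn binomially from the previous generation's allele frequency. A standard consequence is that expected heterozygosity decays by a factor of (1 - 1/(2N)) per generation, so an effective size can be defined as the N that reproduces an observed rate of decay. The sketch compares simulation with that formula; the population size, starting frequency and number of generations are arbitrary.

```python
import numpy as np

def wright_fisher_frequencies(p0, n_diploid, generations, replicates, rng):
    """Neutral Wright-Fisher drift: each generation, draw 2N gene copies
    binomially from the current allele frequency."""
    p = np.full(replicates, p0, dtype=float)
    for _ in range(generations):
        p = rng.binomial(2 * n_diploid, p) / (2 * n_diploid)
    return p

def expected_heterozygosity(h0, n_diploid, t):
    """Expected heterozygosity after t generations: H_t = H_0 (1 - 1/(2N))^t."""
    return h0 * (1.0 - 1.0 / (2 * n_diploid)) ** t

rng = np.random.default_rng(7)
p = wright_fisher_frequencies(0.5, 50, 40, 20000, rng)
simulated_h = np.mean(2 * p * (1 - p))
predicted_h = expected_heterozygosity(0.5, 50, 40)
```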

Poisson process

A stochastic process in which events occur independently at a constant rate per unit of time (or per unit of sequence length). It is often used to model the occurrence of mutations.
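
When mutations arise as a Poisson process with rate μ along a branch of length t, the gaps between successive mutations are exponentially distributed and the total number of mutations on the branch is Poisson with mean μt. The sketch places mutations on a branch by drawing exponential gaps; the rate and branch length are illustrative values in arbitrary but consistent units.

```python
import random

def mutation_positions(branch_length, rate, rng):
    """Times (or positions) of mutations along a branch under a homogeneous
    Poisson process: successive gaps are exponential with the given rate."""
    positions = []
    t = rng.expovariate(rate)
    while t < branch_length:
        positions.append(t)
        t += rng.expovariate(rate)
    return positions

rng = random.Random(1)
counts = [len(mutation_positions(100.0, 0.1, rng)) for _ in range(20000)]
mean_count = sum(counts) / len(counts)   # should be close to 0.1 * 100 = 10
```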

Identity by descent

(IBD). Whether a genomic region has descended from an ancestor unchanged. A genomic region in two (or more) individuals is identical by descent if it is inherited from a common ancestor without being broken up by recombination. Some authors require IBD segments to also be identical by state, that is, to also have no mutations in the region.

Identity by state

(IBS). Whether a genomic region has the same sequence as the corresponding region in another individual. A genomic region in two (or more) individuals is identical by state if it contains no mutations that distinguish the individuals. Note that a region of IBS is not necessarily also identical by descent.
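
A simple way to see the relationship between the two concepts is to scan a pair of haplotypes for long runs of identical sites. The sketch reports maximal IBS runs spanning at least a chosen length; long runs are candidates for IBD, whereas short runs can easily be identical by chance. It assumes error-free phased haplotypes at shared, sorted positions, and the length threshold is a hypothetical parameter; practical IBD detection methods are probabilistic and tolerate genotyping and phasing errors.

```python
def ibs_segments(hap1, hap2, positions, min_length):
    """Maximal runs of consecutive identical sites between two haplotypes,
    returned as (start_position, end_position) pairs spanning at least
    `min_length` in position units."""
    segments = []
    start = None
    for i, (a, b) in enumerate(zip(hap1, hap2)):
        if a == b:
            if start is None:
                start = i          # a new identical run begins
        else:
            if start is not None and positions[i - 1] - positions[start] >= min_length:
                segments.append((positions[start], positions[i - 1]))
            start = None           # a differing site ends the run
    last = len(hap1) - 1
    if start is not None and positions[last] - positions[start] >= min_length:
        segments.append((positions[start], positions[last]))
    return segments

h1 = [0, 1, 1, 0, 1, 0, 0, 1]
h2 = [0, 1, 1, 1, 1, 0, 0, 1]
pos = [10, 20, 30, 40, 50, 60, 70, 80]
print(ibs_segments(h1, h2, pos, min_length=25))
```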

SMC'

A modification to the sequential Markov coalescent (SMC) that allows for hidden recombination events that do not change the local genealogy.

Conditionally sampled alleles

Alleles that are sampled from a population given that a set of reference alleles is already in hand.


About this article


Cite this article

Schraiber, J., Akey, J. Methods and models for unravelling human evolutionary history. Nat Rev Genet 16, 727–740 (2015). https://doi.org/10.1038/nrg4005
