Methods and models for unravelling human evolutionary history

Schraiber, Joshua G.; Akey, Joshua M.

doi:10.1038/nrg4005

Review Article
Published: 10 November 2015

Nature Reviews Genetics volume 16, pages 727–740 (2015)

Key Points

High-throughput sequencing is enabling massively large catalogues of DNA sequence variation to be collected in geographically diverse human populations. Such data sets contain considerable information about human history but are complex and require careful analysis.
Quality control and exploratory data analyses are critical in analyses of large-scale sequencing data sets and help to identify features of the data that may complicate downstream inferences.
Functional and comparative genomics data (such as sequence conservation, chromatin immunoprecipitation followed by sequencing (ChIP–seq) and DNase I hypersensitivity) can be leveraged to mitigate the confounding effect of natural selection when inferring demographic models.
A large number of flexible and sophisticated methods have been developed that allow specific and detailed demographic inferences to be made. The appropriate method to use depends on the specific hypothesis or question being asked, and the underlying assumptions of a given method should be carefully considered.
As sample sizes become increasingly large, inferences about specific aspects of breeding structure and demography may be possible. However, these methods are still in their infancy and require substantial theoretical and methodological development.
The increasing availability of ancient DNA from modern and archaic humans provides exciting new possibilities to refine parameters of human evolutionary history, although new methodological development is needed to fully realize the potential of these data.

Abstract

The genomes of contemporary humans contain considerable information about the history of our species. Although the general contours of human evolutionary history have been defined with increasing resolution throughout the past several decades, the continuing deluge of massively large sequencing data sets presents new opportunities and challenges for understanding human evolutionary history. Here, we review the signatures that demographic history imparts on patterns of DNA sequence variation, statistical methods that have been developed to leverage information contained in genome-scale data sets and insights gleaned from these studies. We also discuss the importance of using exploratory analyses to assess data quality, the strengths and limitations of commonly used population genomics methods, and factors that confound population genomics inferences.

**Figure 1: Identifying demographically informative genomic regions.**

**Figure 2: Inferring population demographic history.**

**Figure 3: The effect of demographic perturbations on gene genealogies and the SFS.**

Acknowledgements

The author would like to acknowledge Kelley Harris for helpful discussions regarding SMC-based approaches.

Author information

Authors and Affiliations

Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, box 355065, Seattle, 98195–5065, Washington, USA
Joshua G. Schraiber & Joshua M. Akey

Corresponding authors

Correspondence to Joshua G. Schraiber or Joshua M. Akey.

Ethics declarations

Competing interests

J.M.A. is a paid consultant of Glenview Capital. J.G.S. declares no competing interests.

Supplementary information

Supplementary information S1 (box)

Issues in inferring absolute dates (PDF 141 kb)

Supplementary information S2 (box)

Lambda coalescents (PDF 139 kb)

Glossary

Exploratory data analyses (EDA): The initial stages of 'digging into' a data set, usually by plotting low-dimensional summaries of the data.
Likelihoods: The probabilities of the data given various models and their parameters, thought of as functions of those parameters. The parameter values that maximize the probability of the data in each model are called maximum likelihood estimates.
Eigenvectors: Vectors that, when multiplied by a given matrix, are only rescaled and therefore still lie along the same direction.
Covariance matrix: An n×n matrix whose entries give the covariance between each pair of elements in a sample of size n.
Panmictic population: A group of individuals among whom random mating occurs.
Linkage disequilibrium (LD): Nonrandom association between alleles at physically distinct genomic loci. Over time, this will be broken down by recombination.
Coalescence times: The times in the past when genomic regions shared a common genetic ancestor.
Isolation by distance: Genetic differentiation between individuals induced by geographic separation. Individuals that are closer geographically will be closer genetically.
Overfitting: The tendency of a model, as more parameters are added, to fit the noise in the observed data rather than the true underlying mechanism of data generation. Overfit models generalize poorly to new data sets.
Cross-validation error: The error in predicting the structure of a held-out portion of the data, when a model is trained on a subset of the whole data set. Minimizing cross-validation error is an effective way to choose parameters and hyperparameters.
Ancestral recombination graphs: Graph structures representing the genealogical history of a sample with a recombining genome. In addition to coalescence events (which bring two lineages together and therefore reduce the number of lineages in the graph), recombination events cause splits to occur, which increases the number of lineages in the graph.
Hidden Markov model (HMM): A statistical model in which a set of underlying hidden states are assumed to follow Markov chain dynamics and induce a set of observed states.
Reference panel: A large number of individuals, related to samples of interest, for which some quality is known (for example, allelic phase).
Effective population size: The size that a theoretical population evolving under a Wright–Fisher model would need to be in order to match aspects of the observed genetic data.
Poisson process: A stochastic process in which new events occur at a constant rate per unit of time. Often used to model mutation.
Identity by descent (IBD): Whether a genomic region has descended from an ancestor unchanged. A genomic region in two (or more) individuals is identical by descent if it is inherited from a common ancestor without being broken up by recombination. Some authors require IBD segments to also be identical by state, that is, to also have no mutations in the region.
Identity by state (IBS): Whether a genomic region has the same sequence as the corresponding region in another individual. A genomic region in two (or more) individuals is identical by state if it contains no mutations that distinguish the two individuals. Note that a region of IBS is not necessarily also identical by descent.
SMC': A modification to the sequential Markov coalescent (SMC) that allows for hidden recombination events that do not change the local genealogy.
Conditionally sampled alleles: Alleles that are sampled from a population given that a set of reference alleles is already in hand.
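
Three of the glossary entries above (likelihoods, Poisson process and linkage disequilibrium) can be made concrete with a small numerical sketch. The block below is only an illustration under stated assumptions, not a method taken from the review. The function names, the toy mutation counts and the recombination fraction are all hypothetical. The LD decay formula assumes random mating in a very large population, which is a standard textbook simplification.

```python
import math


def poisson_log_likelihood(rate, counts):
    """Log-probability of the observed event counts, viewed as a function of the rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # log P(k | rate) = k*log(rate) - rate - log(k!)
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


def poisson_mle(counts):
    """Maximum likelihood estimate of a Poisson rate: the sample mean."""
    return sum(counts) / len(counts)


def ld_coefficient(p_ab, p_a, p_b):
    """D = P(AB) - P(A)*P(B); D is zero when the two alleles are independent."""
    return p_ab - p_a * p_b


def ld_after(d0, r, generations):
    """Expected D after some generations, assuming random mating and
    a recombination fraction r between the two loci."""
    return d0 * (1.0 - r) ** generations


# Hypothetical toy data: mutation counts observed at six loci.
mutation_counts = [3, 1, 4, 2, 0, 2]
rate_hat = poisson_mle(mutation_counts)

# The log-likelihood is highest at the estimate and lower on either side of it.
for rate in (rate_hat - 0.5, rate_hat, rate_hat + 0.5):
    print(f"rate={rate:.2f}  logL={poisson_log_likelihood(rate, mutation_counts):.4f}")

# Hypothetical haplotype frequencies with strong initial association.
d0 = ld_coefficient(p_ab=0.5, p_a=0.5, p_b=0.5)
for t in (0, 10, 100):
    print(f"generation {t}: D={ld_after(d0, r=0.01, generations=t):.4f}")
```

Evaluating the log-likelihood at a few rates around the estimate shows what "maximum likelihood" means in practice, and the second loop shows how recombination erodes an initial association between loci over time.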

About this article

Cite this article

Schraiber, J., Akey, J. Methods and models for unravelling human evolutionary history. Nat Rev Genet 16, 727–740 (2015). https://doi.org/10.1038/nrg4005

Issue Date: December 2015
