The genomes of contemporary humans contain considerable information about the history of our species. Although the general contours of human evolutionary history have been defined with increasing resolution over the past several decades, the continuing deluge of very large sequencing data sets presents both new opportunities and new challenges for understanding that history. Here, we review the signatures that demographic history imparts on patterns of DNA sequence variation, the statistical methods that have been developed to leverage the information contained in genome-scale data sets, and the insights gleaned from these studies. We also discuss the importance of using exploratory analyses to assess data quality, the strengths and limitations of commonly used population genomics methods, and factors that confound population genomics inferences.