As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population genomic data sets. Such data hold the potential to resolve long-standing questions in evolutionary biology about the role of gene exchange in species formation. In principle, the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However, taking full advantage of such data poses great challenges, especially when it comes to including recombination in genetic models of the divergence process. Current data, models and methods are considered here, together with the potential pitfalls in using them.