  • Review Article

Understanding the origin of species with genome-scale data: modelling gene flow

Key Points

  • One of the great debates in evolution is about how one species separates into two. The now classical allopatric speciation model has started to be questioned by recent findings that point to divergence in the presence of gene flow.

  • Gene flow is expected to reduce the overall levels of differentiation across the genome. Divergence in the face of gene flow results from the interaction of the opposing forces of gene flow and diversifying selection and the action of recombination.

  • Today, next-generation sequencing (NGS) technologies and assembly tools make it possible to obtain genome-scale data affordably from multiple individuals from closely related populations and/or species, offering the promise of disentangling the complex interplay between selection, gene flow and recombination.

  • One common approach to learning about divergence is to scan the genome using indicators of population differentiation, such as FST. Other statistics are sensitive only to certain aspects of divergence; an example is the ABBA and BABA test (D statistic) for detecting and estimating unidirectional admixture (introgression).

  • Isolation with migration models provide a general theoretical framework for studying speciation. Alternative modes of divergence can be described by alternative isolation with migration models, such as models with no gene flow, secondary contact and migration followed by isolation.

  • A full portrait of the divergence processes can be obtained via the likelihood of a given divergence model. Currently, there are two main families of likelihood-based approaches to studying divergence: allele frequency spectrum (AFS) and genealogy-based approaches.

  • One of the major limitations of current likelihood methods arises when trying to model intermediate levels of recombination explicitly. Great advances in population genomic inference could therefore be achieved with a comprehensive model of recombination and population divergence. These areas are already undergoing active research, especially in the search for good approximations to the likelihoods of complex demographic models.
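
As a rough illustration of the ABBA and BABA test mentioned in the Key Points, the sketch below counts the two site patterns for four aligned sequences (P1, P2, P3 and an outgroup) and computes D = (ABBA - BABA) / (ABBA + BABA). The function name and the input layout are assumptions made for this example, and no significance test (such as a block jackknife) is included.

```python
def d_statistic(sites):
    """Compute the ABBA-BABA D statistic from biallelic sites.

    sites: iterable of (p1, p2, p3, outgroup) alleles. The outgroup allele is
    treated as ancestral ("A"). This input layout is an assumption of the
    sketch, not a format defined in the article.
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if p3 == out:
            continue  # P3 carries the ancestral allele: site is not informative
        if p1 == out and p2 == p3:
            abba += 1  # pattern A B B A: P2 shares the derived allele with P3
        elif p2 == out and p1 == p3:
            baba += 1  # pattern B A B A: P1 shares the derived allele with P3
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / (abba + baba)


# Hypothetical toy alignment: two ABBA sites, one BABA site, one invariant site.
example_sites = [
    ("A", "G", "G", "A"),
    ("C", "T", "T", "C"),
    ("G", "A", "G", "A"),
    ("A", "A", "A", "A"),
]
print(d_statistic(example_sites))
```

Under ancestral polymorphism alone the two patterns are expected to be equally frequent, so D should be close to zero; a consistent excess of one pattern suggests gene flow between P3 and either P1 or P2.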

Abstract

As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population genomic data sets. Such data hold the potential to resolve long-standing questions in evolutionary biology about the role of gene exchange in species formation. In principle, the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However, there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here.

Figure 1: Alternative modes of divergence.
Figure 2: Disentangling ancestral polymorphism from gene flow (ABBA and BABA test).
Figure 3: Allele frequency spectrum under alternative divergence models.
Figure 4: Distinguishing migration events based on linkage disequilibrium block structure.

Acknowledgements

This work was supported by grants from the US National Science Foundation and the US National Institutes of Health to J.H.

Author information

Corresponding author

Correspondence to Jody Hey.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Single-nucleotide polymorphisms

(SNPs). Sites in the DNA in which there is variation across the genomes in a population, usually comprising two alleles that correspond to two different nucleotides.

Ascertainment bias

Systematic bias introduced by the sampling design (for example, criteria used to select individuals and/or genetic markers) that induces a nonrandom sample of observations.

Paired-end libraries

Sequencing from each end of the fragments in a library. The two sequenced ends are typically separated by a gap.

Sympatric speciation

The process of divergence between populations or species occupying the same geographical area, in the presence of gene flow.

Diversifying selection

Natural selection acting towards different alleles (or phenotypes) being favoured in different regions within a single population or among multiple connected populations.

Neutral genes

Genes for which genetic patterns are mostly affected by mutation and demographic factors, such as genetic drift and migration.

Allopatric divergence

The process of divergence between populations or species that are geographically separated, in the absence of gene flow.

Linkage disequilibrium

(LD). The nonrandom association of alleles at different sites or loci.
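
A minimal sketch of how this nonrandom association can be quantified for two biallelic sites, using the standard coefficient D = p_AB - p_A p_B and the squared correlation r². The function name and the 0/1 coding of haplotypes are assumptions of this example.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    haplotypes: list of (a, b) pairs, each coded 0 or 1, one pair per
    phased haplotype (hypothetical input format).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d, d * d / denom


print(linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)]))  # complete association
print(linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)]))  # no association
```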

Islands of differentiation

Genomic regions of elevated differentiation owing to the action of natural selection.

FST

The proportion of the total genetic variability occurring among populations, typically used as a measure of the level of population genetic differentiation.
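
One common way to make this definition concrete is FST = (HT - HS) / HT, where HT is the expected heterozygosity of the pooled populations and HS is the mean expected heterozygosity within populations. The sketch below applies it to a single biallelic locus with equally weighted populations; this heterozygosity-based form and the function name are assumptions of the example, not necessarily the estimator used in the studies reviewed.

```python
def fst_biallelic(freqs):
    """FST for one biallelic locus from per-population allele frequencies.

    freqs: frequency of the same allele in each population; populations are
    weighted equally (a simplifying assumption of this sketch).
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                            # total heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)   # mean within-population
    if h_t == 0:
        raise ValueError("locus is monomorphic across all populations")
    return (h_t - h_s) / h_t


print(fst_biallelic([0.5, 0.5]))  # identical frequencies: no differentiation
print(fst_biallelic([1.0, 0.0]))  # alternative alleles fixed: complete differentiation
```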

Island model

A model introduced by Sewall Wright to study population structure comprising multiple populations connected to each other through migration.

Metapopulation model

In the context of FST-based statistics, this is an idealized model in which several populations diverge without migration from a common ancestral gene pool (or metapopulation).

Nested island model

A hierarchical island model with groups of populations in which migration among populations within the same group is higher than among populations in different groups.

Gene trees

Bifurcating trees that represent the ancestral relationships of homologous haplotypes sampled from a single or multiple populations. A gene tree includes coalescent events and, in models with gene flow, migration events. A gene tree is characterized by a topology, branch lengths, coalescence times and migration times.

Bayesian statistics

Statistical framework in which the parameters of the models are treated as random variables, allowing expression of the probability of parameters, given the data; this is called the posterior. The posterior probability is obtained by Bayes' rule, and it is proportional to the likelihood times the prior.
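
Written out, with θ standing for the model parameters and X for the data (both symbols are notation chosen for this note), Bayes' rule reads:

```latex
P(\theta \mid X)
  = \frac{P(X \mid \theta)\, P(\theta)}{\int P(X \mid \theta')\, P(\theta')\, d\theta'}
  \;\propto\; P(X \mid \theta)\, P(\theta)
```

The denominator does not depend on θ, which is why the posterior is proportional to the likelihood times the prior.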

Allele frequency spectrum

(AFS). A distribution of the counts of single-nucleotide polymorphisms with a given observed frequency in a single or multiple populations.
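
As an illustration for the two-population case, the sketch below tabulates a joint (unfolded) AFS from per-site counts of the derived allele. The function name and input layout are assumptions of this example; real pipelines also have to handle missing data and uncertainty about the ancestral state.

```python
def joint_afs(derived_counts, n1, n2):
    """Tabulate a joint allele frequency spectrum for two populations.

    derived_counts: list of (c1, c2), the number of derived alleles observed
    at each SNP in population 1 and population 2 (hypothetical input format).
    n1, n2: number of sampled gene copies in each population.
    Returns a (n1 + 1) x (n2 + 1) table where entry [i][j] is the number of
    SNPs with i derived copies in population 1 and j in population 2.
    """
    afs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for c1, c2 in derived_counts:
        if not (0 <= c1 <= n1 and 0 <= c2 <= n2):
            raise ValueError(f"count out of range: {(c1, c2)}")
        afs[c1][c2] += 1
    return afs


snps = [(1, 0), (1, 0), (2, 3), (0, 1)]  # toy data
for row in joint_afs(snps, n1=2, n2=3):
    print(row)
```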

Genetic drift

Stochastic changes in gene frequency owing to finite size of populations, resulting from the random sampling of gametes from the parents at each generation.
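
The random sampling of gametes can be sketched with a simple Wright-Fisher simulation, in which each generation's 2N gene copies are drawn at random according to the allele frequency in the parental generation. The function name, parameters and the assumption of a constant-size diploid population without selection, mutation or migration are choices made for this example.

```python
import random


def simulate_drift(p0, pop_size, generations, seed=None):
    """Simulate allele frequency change under pure drift.

    p0: starting allele frequency; pop_size: number of diploid individuals
    (2 * pop_size gene copies); returns the list of frequencies, starting
    with p0. Binomial sampling is done by repeated Bernoulli draws.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    freqs = [p0]
    p = p0
    for _generation in range(generations):
        # Each gene copy in the next generation picks a parent allele at random.
        count = sum(1 for _copy in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


trajectory = simulate_drift(p0=0.5, pop_size=50, generations=20, seed=1)
print(trajectory)
```

Rerunning with different seeds or a smaller `pop_size` shows how trajectories differ by chance alone and how fluctuations grow as population size shrinks.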

Coalescent theory

A theory that describes the distribution of gene trees (and ancestral recombination graphs) under a given demographic model that can be used to compute the probability of a given gene tree.
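
For the simplest case, a single constant-size population without recombination or migration, the standard coalescent gives closed-form expectations: while k lineages remain, the waiting time to the next coalescent event has mean 2 / (k(k - 1)), with time measured in units of 2N generations. The sketch below sums these to obtain the expected time to the most recent common ancestor and the expected total tree length; the function name is an assumption of this example, and the divergence models discussed in the article are considerably more complex.

```python
def expected_coalescent_times(n):
    """Expected TMRCA and total branch length for a sample of n gene copies.

    Assumes the standard neutral coalescent in one constant-size population;
    times are in units of 2N generations (N diploid individuals).
    """
    if n < 2:
        raise ValueError("need at least two sampled gene copies")
    # Mean waiting time while k lineages remain: 1 / C(k, 2) = 2 / (k (k - 1)).
    waits = {k: 2.0 / (k * (k - 1)) for k in range(n, 1, -1)}
    tmrca = sum(waits.values())                              # equals 2 (1 - 1/n)
    total_length = sum(k * t for k, t in waits.items())      # k branches during each interval
    return tmrca, total_length


print(expected_coalescent_times(2))
print(expected_coalescent_times(10))
```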

Generating functions

Statistical technique used to obtain the distribution of sums of random variables, as required in computation of the probability of genealogies given the parameters of an underlying model.

Haplotype

A DNA sequence that is inherited as a single unit in the absence of recombination.

Bottlenecks

Reductions in the size of populations owing to stochastic events or owing to colonization of new areas (founder events).

Ancestral recombination graphs

(ARGs). Graphs that represent the ancestral relationship of homologous DNA sequences sampled from a single or multiple populations. In models with gene flow, an ARG includes coalescent, migration and recombination events.

Identity by descent

(IBD). Two haplotypes are identical by descent if both are copies of the same ancestral haplotype, as is assumed for identical haplotypes shared between individuals within families.

About this article

Cite this article

Sousa, V., Hey, J. Understanding the origin of species with genome-scale data: modelling gene flow. Nat Rev Genet 14, 404–414 (2013). https://doi.org/10.1038/nrg3446

  • DOI: https://doi.org/10.1038/nrg3446
