Computer simulations: tools for population and evolutionary genetics

Key Points

  • Computer simulations of genetic polymorphisms complement analytical approaches for investigating complex evolutionary, epidemiological and ecological scenarios for both applied and theoretical purposes. Current uses include human evolutionary history, understanding the genetic bases of complex diseases, genetic epidemiology, conservation management, plant breeding and infection spread.

  • Dozens of software packages are now available. Some are highly flexible and widely applicable, whereas others address specific tasks, such as case–control studies or continuous populations on dynamic landscapes.

  • Some key features include the ability to: simulate thousands of markers over complex evolutionary histories; create realistic patterns of recombination; model, in detail, a species' life history and mating patterns; integrate selective forces on simple and complex traits; monitor perturbations such as bottlenecks and admixture; and simulate samples from museum specimens and ancient DNA.

  • The uses of simulators can be divided into three categories: making probability-based predictions, making statistical inferences, and validating new methods or statistics. Simulations also fill various other roles, including teaching of genetic concepts, planning of surveys for the collection of genetic samples and post hoc power analysis of data.

  • The steps to follow in implementing a simulation-based study are straightforward, but they do require key user decisions, such as choosing parameters, deciding run length, picking statistics to summarize the data and comparing models to each other or to real data.

  • A crucial step is deciding on an appropriate simulator and, as all simulators have weaknesses, the user must balance selective, demographic, genomic and historical complexity. We provide guidance on these aspects and explain the differences between forward and backward approaches.

  • In general, forward simulators can model life history (such as mating and age structure) and selection (including trait-based or sexual selection) in much more detail, whereas backward simulators are faster and do not require setting initial genetic conditions. A few simulators are currently used most frequently, but the wide range of options and capabilities that are available means that users can and should match a simulator to their study needs.

  • It is important to note that many users will find a simulation package that is 'ready to use' for their analysis, but some simulation projects will require programming skills to integrate the simulator into a bioinformatics pipeline. We also advise that users should carefully plan their simulations, keeping in mind potential limitations or model violations.

  • The future of simulators is certainly bright. Emerging areas of improvement include building increased ecological and landscape realism, connecting genetic simulators to other models, including infection spread or climate change, identifying appropriate and informative summary statistics, recreating biases from next-generation sequence data and increasing efficiency and accessibility.
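
The forward approach mentioned in the key points above can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed packages: the function name, the haploid Wright-Fisher assumption, the single biallelic locus and the absence of mutation and selection are all simplifying assumptions made here for illustration. It shows the two properties the key points attribute to forward simulators: an initial genetic condition (`p0`) must be supplied, and every gene copy in the population is followed each generation.

```python
import random


def forward_wright_fisher(n_copies, p0, generations, rng):
    """Track the frequency of one allele forward in time.

    Hypothetical helper: haploid Wright-Fisher population of `n_copies`
    gene copies, one biallelic locus, no mutation, no selection.
    """
    count = round(p0 * n_copies)           # initial condition chosen by the user
    trajectory = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Every gene copy in the next generation draws its parent at random,
        # so the new allele count is a binomial sample with probability p.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        trajectory.append(count / n_copies)
    return trajectory


if __name__ == "__main__":
    rng = random.Random(2012)              # fixed seed so the run is repeatable
    print(forward_wright_fisher(n_copies=100, p0=0.5, generations=10, rng=rng))
```

Run time grows with population size and the number of generations, which is one reason why backward simulators, which follow only the sampled lineages, tend to be faster.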

Abstract

Computer simulations are excellent tools for understanding the evolutionary and genetic consequences of complex processes whose interactions cannot be analytically predicted. Simulations have traditionally been used in population genetics by a fairly small community with programming expertise, but the recent availability of dozens of sophisticated, customizable software packages for simulation now makes simulation an accessible option for researchers in many fields. The in silico genetic data produced by simulations, along with greater availability of population-genomics data, are transforming genetic epidemiology, anthropology, evolutionary and population genetics and conservation. In this Review of the state-of-the-art of simulation software, we identify applications of simulations, evaluate simulator capabilities, provide a guide for their use and summarize future directions.

Figure 1: Overview of simulation studies.
Figure 2: Designing predictive, inferential and validation simulation studies.
Figure 3: Decision matrix for choosing a simulator.

Acknowledgements

This work was supported by the European Project CONGRESS funded by the European Union under FP7. We also thank E. Anderson and two anonymous referees for very helpful suggestions, as well as all of the software developers (see the full list in the Supplementary information) for their assistance in checking the information presented in Table 1 and in Supplementary information S1 and S2 (tables).

Author information

Corresponding author

Correspondence to Oscar E. Gaggiotti.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary information S1 (table)

Basic information about simulators we examined, input/output information, links, and published journal articles for reference (PDF 228 kb)

Supplementary information S2 (table)

Brief, general descriptions of programs using direct quotes from original journal article and/or program manual (PDF 121 kb)

Related links

FURTHER INFORMATION

Sean Hoban's homepage

Giorgio Bertorelle's homepage

Oscar E. Gaggiotti's homepage

ConGRESS

Nature Reviews Genetics Series on Study Designs

Glossary

Stocking

Human-mediated supplementation of a native population with translocated or captive-bred individuals to increase population size or growth rates.

Stepwise mutation model

A mutation model in which the allelic states produced by mutation depend on the initial state of an allele. The basic version assumes mutations between adjacent states, but other versions allow larger mutational changes. This model is commonly used to model the microsatellite mutation process.
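
A minimal sketch of the basic version of this model, in which a microsatellite allele is represented by its repeat count and a mutation moves it to an adjacent state. The function name, the symmetric choice between gaining and losing a repeat, and the lower bound `min_repeats` are assumptions made for illustration, not features of any particular simulator.

```python
import random


def stepwise_mutate(repeat_count, mu, rng, min_repeats=1):
    """Apply one generation of stepwise mutation to a single allele.

    With probability `mu` the allele gains or loses exactly one repeat,
    so the new state depends on the current state. The floor at
    `min_repeats` is an illustrative assumption.
    """
    if rng.random() >= mu:
        return repeat_count                # no mutation this generation
    step = rng.choice((-1, 1))             # adjacent state, either direction
    return max(min_repeats, repeat_count + step)


if __name__ == "__main__":
    rng = random.Random(7)
    allele = 12
    for _ in range(1000):
        allele = stepwise_mutate(allele, mu=0.01, rng=rng)
    print(allele)
```

Versions of the model that allow larger mutational changes would replace the `(-1, 1)` choice with a draw from a wider distribution of step sizes.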

Infinite alleles model

A model in which each mutational event creates a new allele that is unlike any other that is currently present in the population.

Coalescent theory

A theory describing the genealogy of chromosomes or genes. The genealogy is constructed backwards-in-time, starting with the present-day sample. Lineages coalesce until the most recent common ancestor of the sample is reached.
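
The backward construction can be sketched in a few lines of Python under the standard neutral coalescent for a single non-recombining locus in a population of constant size. The function name and the time scaling are assumptions of this sketch: time is measured in units of N generations for a population of N gene copies, so that each pair of lineages coalesces at rate 1.

```python
import random


def coalescent_tmrca(sample_size, rng):
    """Return the time to the most recent common ancestor of a sample.

    Hypothetical helper: starting from the present-day sample, lineages
    merge pairwise going back in time until one ancestor remains.
    """
    time = 0.0
    lineages = sample_size
    while lineages > 1:
        pairs = lineages * (lineages - 1) / 2   # pairs that could coalesce
        time += rng.expovariate(pairs)          # waiting time to next merger
        lineages -= 1                           # two lineages become one
    return time


if __name__ == "__main__":
    rng = random.Random(1982)
    times = [coalescent_tmrca(10, rng) for _ in range(10000)]
    # Under these assumptions the expected value is 2 * (1 - 1/n).
    print(sum(times) / len(times))
```

Only the sampled lineages are followed, and no initial genetic state of the population has to be specified.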

Parametric bootstrap confidence intervals

These measure the accuracy of sample estimates using a bootstrapping approach where a parametric model is fitted to the data, and samples of parameter values are drawn from this fitted model.

Population viability analysis

(PVA). A probability-based modelling approach for assessing the future potential (such as reproduction and extinction) of populations or species.

FIS

Wright's inbreeding coefficient, measuring the level of correlation between two genes drawn from an individual relative to two genes drawn from the population. Also defined as the probability that two alleles in an individual are both descended from a single allele in an ancestor.
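
In practice the coefficient is often estimated at a locus as one minus the ratio of observed to expected heterozygosity. The sketch below uses that simple estimator without any small-sample correction; the function name and the input layout (one tuple of two alleles per diploid individual) are assumptions made for illustration.

```python
from collections import Counter


def f_is(genotypes):
    """Estimate F_IS for one locus as 1 - H_obs / H_exp.

    `genotypes` is a list of (allele, allele) tuples, one per diploid
    individual. No small-sample correction is applied.
    """
    n = len(genotypes)
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    # Expected heterozygosity under random mating: 1 - sum of squared frequencies.
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F_IS is undefined")
    return 1.0 - h_obs / h_exp


if __name__ == "__main__":
    sample = [("A", "A"), ("A", "B"), ("A", "B"), ("B", "B")]
    print(f_is(sample))
```

Positive values indicate a deficit of heterozygotes relative to random mating, and negative values indicate an excess.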

Summary statistics

Numerical values for summarizing the characteristics of a genetic data set; these often summarize features such as variability (number of alleles) or population differentiation (FST).

Bayesian

A scientific paradigm that uses probability as a means of quantifying the analyst's knowledge or uncertainty concerning the model and/or its parameters, given the data observed. Given a particular model described by a likelihood function, the approach involves choosing a prior distribution and then updating this with the information provided by the observed data.

Most recent common ancestor

In the case of a sample of genes, this is the most recent gene from which all alleles in the sample are directly descended.

Number of segregating sites

The number of polymorphic sites in a sample of homologous DNA sequences. It measures the degree of DNA sequence variation that is present in the sample.
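
As a minimal illustration, the statistic can be computed by counting the alignment columns that contain more than one state. The function name and the representation of the sample as equal-length strings are assumptions of this sketch; gaps and missing data are not treated specially.

```python
def segregating_sites(sequences):
    """Count polymorphic columns in a list of aligned, equal-length sequences."""
    if not sequences:
        return 0
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) iterates over alignment columns.
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


if __name__ == "__main__":
    sample = ["ACGT", "ACGA", "TCGT"]
    print(segregating_sites(sample))
```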

Assignment tests

A broad category of methods whose goal is to determine with a degree of confidence the population of origin of individuals using genetic data.

Panmixia

The random mating of individuals within a breeding population.

Genome scan

Large-scale genotyping (thousands of markers) that is usually used to detect outliers such as regions of the genome under selection.

Prior distributions

The probability distributions of parameter values before observing the data. They reflect the observer's knowledge about what values the model parameters might take before having seen the data.

Posterior distributions

The conditional distributions of the parameter given the observed data. They reflect both the likelihood of the data and the prior distribution. They represent what we know about the model parameters, having observed the data.

Bayes factors

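A measure of how much the observed data shift the odds between two competing hypotheses. Calculated as the ratio of the prior odds of the null hypothesis versus the alternative hypothesis over the corresponding posterior odds, which is equal to the ratio of the probability of the data under the alternative hypothesis to the probability of the data under the null hypothesis.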
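
Written out, with the symbols $H_0$ (null hypothesis), $H_1$ (alternative hypothesis) and $D$ (observed data) introduced here only for illustration, the definition above corresponds to the following identity, which follows from Bayes' theorem:

```latex
\mathrm{BF}_{10}
  = \frac{P(H_0)\,/\,P(H_1)}{P(H_0 \mid D)\,/\,P(H_1 \mid D)}
  = \frac{P(H_1 \mid D)\,/\,P(H_0 \mid D)}{P(H_1)\,/\,P(H_0)}
  = \frac{P(D \mid H_1)}{P(D \mid H_0)}
```

Values greater than 1 mean that the data favour the alternative hypothesis over the null hypothesis.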

Deviance information criteria

(DIC). A method of model comparison or selection in which increased fit owing to addition of terms is balanced by a penalty for each additional term.

Carrying capacities

The maximum population size of a species that a habitat can sustain. It is determined by availability of space and resources.

Admixture

The interbreeding of individuals issued from two or more distinct populations or species.

Wright's island model

A population-genetics model in which all populations are of equal size and contribute equally to a global migrant pool, from which each population draws an equal proportion of immigrants each generation.
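
The migration step of this model can be sketched as a deterministic update of allele frequencies. The sketch deliberately ignores drift, mutation and selection; the function name and the symbol `m` for the proportion of immigrants per generation are assumptions made for illustration.

```python
def island_model_step(freqs, m):
    """Apply one generation of migration under an island model.

    `freqs` holds the frequency of one allele in each population. All
    populations are assumed equal in size, so the migrant pool frequency
    is the unweighted mean, and each population replaces a proportion `m`
    of its gene copies with draws from that pool.
    """
    pool = sum(freqs) / len(freqs)
    return [(1 - m) * p + m * pool for p in freqs]


if __name__ == "__main__":
    freqs = [0.0, 1.0]
    for generation in range(5):
        freqs = island_model_step(freqs, m=0.1)
    print(freqs)
```

Under these assumptions the mean frequency across populations stays constant while the differences between populations shrink by a factor of (1 - m) each generation.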

Hierarchical island model

A variation on Wright's island model in which local sets of populations are connected to each other by a relatively high migration rate and to other local sets of populations by a relatively low rate. This model is well suited to species that are distributed over several continents.

Geographic information system

(GIS). A collection of spatially referenced data, such as geographical and altitudinal coordinates of individuals.

k allele model

A mutation model in which each allele can mutate to any of the other k – 1 possible alleles with equal probability.

Sequential Markov coalescent

A simplified genealogical process that aims to capture the essential features of the full coalescent model with recombination while being scalable in the number of loci. Computation time is saved by only accounting for coalescence between lineages with overlapping ancestral material.

About this article

Cite this article

Hoban, S., Bertorelle, G. & Gaggiotti, O. Computer simulations: tools for population and evolutionary genetics. Nat Rev Genet 13, 110–122 (2012). https://doi.org/10.1038/nrg3130
