Data analysis and method development in population genetics often rely on simulating models that provide in silico ground truth. Although numerous models have been published for many species, simulating them efficiently and reproducibly is far from easy. In an effort to standardize population genetic simulations, the PopSim Consortium developed the stdpopsim resource.

Stdpopsim currently consists of a catalog of widely used population genetic models for six species: Homo sapiens, Pongo abelii, Canis familiaris, Drosophila melanogaster, Arabidopsis thaliana and Escherichia coli. Information including the physical organization of the genome, inferred genetic maps, population-level parameters and demographic models is carefully curated. The simulation engines support both coalescent and forward simulations, with rigorous quality control for reliable execution. Both a command-line interface and a Python API are provided for ease of use.
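
For readers unfamiliar with the term, the idea behind a coalescent simulation can be sketched in a few lines of Python. The example below is a toy illustration only and is not stdpopsim's engine or API: the function names simulate_tmrca and mean_tmrca are hypothetical, and it assumes a single constant-size, randomly mating population with no recombination. It traces sampled lineages backward in time until they share a common ancestor and compares the simulated mean with the standard theoretical expectation.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Return one simulated time to the most recent common ancestor.

    Time is measured in coalescent units (2N generations for a
    diploid population of constant size N). Hypothetical helper,
    not part of stdpopsim.
    """
    if n_samples < 2:
        raise ValueError("need at least two sampled lineages")
    time = 0.0
    # Work backward in time: each event merges two of the k remaining lineages.
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2  # number of lineage pairs that could coalesce
        time += rng.expovariate(rate)  # exponential waiting time to next merger
    return time


def mean_tmrca(n_samples, replicates, seed):
    """Average the simulated time over independent replicates."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    total = sum(simulate_tmrca(n_samples, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    n = 10
    estimate = mean_tmrca(n, replicates=20_000, seed=42)
    expected = 2 * (1 - 1 / n)  # standard result: E[T_MRCA] = 2(1 - 1/n)
    print(f"simulated mean: {estimate:.3f}, theoretical mean: {expected:.3f}")
```

Real demographic models add population size changes, splits, migration and recombination on top of this basic process, which is where hand-written simulation code tends to go wrong and where a curated, quality-controlled catalog becomes useful.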

The authors applied stdpopsim to benchmark several methods for demographic inference. Compared with most previous benchmarking efforts, the standardized strategy deployed by stdpopsim can minimize the chance of errors when simulating complicated models and can enable consistent, fair comparison of methods across a variety of relevant population genetic scenarios.

The PopSim Consortium is adding support for more complex simulation models, including various forms of selection, as well as expanding the catalog of species and demographic models, notes Jerome Kelleher of the University of Oxford. “This project was born out of the need for standardization and community standards in population genetics,” says Andrew Kern of the University of Oregon. “A number of us saw a need to try to actively pull the community together, to establish best practices and communal resources that will enable the next decade(s) of discovery. Simulation is a crucial tool for popgen, so this first product from the PopSim consortium, the stdpopsim library, is a logical initial step.”