
Nationwide genomic atlas of soil-dwelling Listeria reveals effects of selection and population ecology on pangenome evolution


Natural bacterial populations can display enormous genomic diversity, primarily in the form of gene content variation caused by the frequent exchange of DNA with the local environment. However, the ecological drivers of genomic variability and the role of selection remain controversial. Here, we address this gap by developing a nationwide atlas of 1,854 Listeria isolates, collected systematically from soils across the contiguous United States. We found that Listeria was present across a wide range of environmental parameters, being mainly controlled by soil moisture, molybdenum and salinity concentrations. Whole-genome data from 594 representative strains allowed us to decompose Listeria diversity into 12 phylogroups, each with large differences in habitat breadth and endemism. ‘Cosmopolitan’ phylogroups, prevalent across many different habitats, had more open pangenomes and displayed weaker linkage disequilibrium, reflecting higher rates of gene gain and loss, and allele exchange than phylogroups with narrow habitat ranges. Cosmopolitan phylogroups also had a large fraction of genes affected by positive selection. The effect of positive selection was more pronounced in the phylogroup-specific core genome, suggesting that lineage-specific core genes are important drivers of adaptation. These results indicate that genome flexibility and recombination are the consequence of selection to survive in variable environments.


Fig. 1: Nationwide distribution and ecological drivers of Listeria.
Fig. 2: Biogeography, pangenome openness and recombination across Listeria phylogroups.
Fig. 3: Stronger positive selection in more cosmopolitan phylogroups.
Fig. 4: Impact of selection differs between Listeria core genes and phylogroup-specific core genes.

Data availability

Trimmed paired-end reads for the 594 Listeria isolates used in this study were deposited in the National Center for Biotechnology Information’s (NCBI) Sequence Read Archive (SRA), and assembled genomes were deposited in NCBI GenBank under the accession numbers listed in Supplementary Table 2. Metadata collected in this study, including sampling date, latitude, longitude, elevation, soil physicochemical data, climate data and land-use data, are provided in Supplementary Table 1.

Code availability

Codes for sigB allelic typing, negative sample selection, core SNPs detection and Listeria phylogroup detection are available at The code for land-use proportion calculation is available at




This work was supported by the Center for Produce Safety (award number 2018CPS13) through the Florida Department of Agriculture and Consumer Services (under agreement number 024842) (M.W.) and Simons Foundation Collaboration: Principles of Microbial Ecosystems (PriME) award # 542395 (O.X.C.). The contents do not necessarily reflect the views or policies of the Center for Produce Safety or the United States Department of Agriculture, nor does mention of trade names, commercial products, services or organizations imply endorsement by the U.S. Government. We are grateful to A. Harrison, A. Snyder, A. Curtis, A. Shed, A. Alles, A. Andrus, A.a Ho Watson, A. Edwarthy, A. King, A. Roberts, A. Carroll, A. Ferrero, A. Buehler, B. Perkins, C. Casou, C. Ajua, C. Metzger, C. Rock, C. Day, C. Burnham, C. Mauck, C. Cox, C. Fitzegerald, C. Robinson, D. Koziol, D. Bryan, D. Trudelle, D. van De Grift, D. Baumler, D. Murphy, D. Sue, D. Brasill, Debarah Weller, Donald Weller, D. Thomas, E. White, E. Wilson, F. Becker, G. Gandhi, G. Bulnhem, H. Zhang, H. Duong, H. Deng, J. Zimmerman, J. Lunsford, J. Kovac, J. Adams, J. Goettig, J. Moreira, J. Perkins, J. David, J. Eggers, J. Jenkins, J. Steffan, J. Bejeck, K. Kniel, K. Kharel, K. Kruckow, K. Cobb, K. Schmidt, K. Gall, K. Kemp, K. Jordan, L. Sun, L. Carroll, L. Gorski, L. Goddik, L. Adams, L. Nelther, M. Scawartz, M. Schwart, M. Yeung, M. Levandowski, M. Nelther, M. Stasiewicz, M. Watterson, M. Jin, M. Cao, M. Theriault, N. Brassill, R. Nuir, R. Vgas, R. Smith, R. Wallar, S. Cimowsky, S. Beno, S. Murphy, S. Duerr, S. Ahn, S. Morquez, S. Clark, T. Cobb, T. Denes, T. Lan, T. Autry, T. Peters, V. Gund, V. Lappi, V. Chherri, X. Cheng, Y. Han, Y. Hu, Z. Raposo and Z. Xiong for helping with the sample collection.

Author information

Authors and Affiliations



J.L., M.W. and O.X.C. designed the study. J.L., D.L.W. and M.W. designed the sample collection plan. J.L. coordinated the sample collection processes. J.L. and X.G. processed soil samples and performed Listeria isolation and molecular characterization. J.L. analysed the data with input from S.P., D.H.B. and O.X.C. J.L. and O.X.C. wrote the manuscript with input from S.P., D.H.B. and M.W.

Corresponding authors

Correspondence to Martin Wiedmann or Otto X. Cordero.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Density of samples positive and negative for Listeria based on spatial, soil, and climate variables.

“***”, “**”, and “*” indicate a significant difference between positive and negative samples at P levels of 0.001, 0.01, and 0.05, respectively, in two-sided Mann–Whitney tests with a Benjamini and Hochberg (BH) false discovery rate (FDR) correction.
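The BH correction named in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `benjamini_hochberg` and `significance_stars` are hypothetical, the input p-values are made up, and treating the thresholds as strict inequalities is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significance_stars(p_adj):
    """Map an adjusted p-value to the star notation used in the caption.

    Strict '<' comparisons are an assumption of this sketch.
    """
    if p_adj < 0.001:
        return "***"
    if p_adj < 0.01:
        return "**"
    if p_adj < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-variable p-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(p, round(q, 4), significance_stars(q))
```

In practice the raw p-values would come from one two-sided Mann–Whitney test per environmental variable, and the adjustment would be applied across all variables at once.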

Extended Data Fig. 2 Pairwise Pearson correlation between spatial, soil, climate, and land-use variables.

r is the Pearson coefficient. “*” indicates that variables are significantly correlated at two-sided P level of 0.05 with a BH FDR correction. TN: total nitrogen; TC: total carbon; OM: organic matter; Developed A: developed open space (< 20% impervious cover); Developed B: developed open space (>20% impervious cover).

Extended Data Fig. 3 Variable importance in predicting the presence of Listeria based on the Mean Decrease Gini index in a random forest model.

The 24 environmental variables (13 soil, 3 climate, and 8 land-use variables) and 2 spatial variables are sorted in ascending order according to the median Mean Decrease Gini value of 1,000 repetitions. Whisker ends mark the minimum and maximum values; the box spans the lower and upper quartiles, and the short line within the box marks the median. Points above and below the whiskers indicate outliers. Developed A: developed open space (<20% impervious cover); Developed B: developed open space (>20% impervious cover).
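Mean Decrease Gini accumulates, across the trees of a random forest, the drop in Gini impurity produced by splits on a given variable. The sketch below shows only that single-split quantity for binary presence/absence labels. The function names, the example values and the threshold are hypothetical, and this is not the authors' random forest pipeline.

```python
def gini_impurity(labels):
    """Gini impurity of a list of binary labels (1 = positive, 0 = negative)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    # For two classes, 1 - p^2 - (1 - p)^2 simplifies to 2p(1 - p).
    return 2.0 * p * (1.0 - p)


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting samples at `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


if __name__ == "__main__":
    moisture = [0.1, 0.2, 0.6, 0.7]  # hypothetical soil moisture values
    present = [0, 0, 1, 1]           # hypothetical Listeria presence labels
    print(gini_decrease(moisture, present, threshold=0.4))
```

A variable whose splits separate positive from negative samples cleanly yields a large decrease, which is why it ranks high in an importance plot of this kind.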

Extended Data Fig. 4 Clusters in networks of recent horizontal gene transfer among Listeria genomes.

Nodes represent clonal clusters and edges represent the inferred amount of gene flow between them. Node size indicates the size of the clonal cluster, a group of very closely related genomes (<0.035% divergence). Nodes are color-coded by phylogroup.

Extended Data Fig. 5 Distribution of each Listeria phylogroup across the US.

Circles indicate samples positive for each Listeria phylogroup. Polygons indicate the distribution of each phylogroup.

Extended Data Fig. 6 Prevalence of the 12 Listeria phylogroups.

Prevalence is indicated by the proportion of sampling sites positive for one phylogroup among all 1,004 sampling sites. Populations are sorted by the prevalence in descending order. The mean prevalence is 3.35%. The standard deviation is 3.08%.

Extended Data Fig. 7 The accumulation curves of core genomes and pangenomes of Listeria phylogroups.

The blue and red lines show the mean pangenome and core genome sizes, respectively, over 100 repetitions of subsampling an increasing number of genomes. The vertical bars indicate the standard deviations of the 100 repetitions. The power-law function $n_{\mathrm{pan}} = cN^{\gamma}$ predicting the pangenome size is shown, where $c$ is the size of the core genome, $N$ is the number of genomes and $\gamma$ is a scaling exponent between 0 and 1.
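A power law of the form cN^γ becomes a straight line after taking logarithms, so γ can be estimated as the slope of log(pangenome size) against log(N). The sketch below uses ordinary least squares on log-transformed values. The fitting procedure is an assumption (the caption does not say how the curve was fitted), and `fit_power_law` and the synthetic data are hypothetical.

```python
import math


def fit_power_law(n_genomes, pan_sizes):
    """Fit pan_size = c * N**gamma by least squares in log-log space.

    Returns (c, gamma). Assumes all inputs are positive.
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(s) for s in pan_sizes]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                          # slope = scaling exponent
    c = math.exp(mean_y - gamma * mean_x)      # intercept back-transformed
    return c, gamma


if __name__ == "__main__":
    # Synthetic accumulation curve generated from c = 2500, gamma = 0.3.
    n = list(range(1, 21))
    sizes = [2500 * k ** 0.3 for k in n]
    c, gamma = fit_power_law(n, sizes)
    print(round(c, 1), round(gamma, 3))
```

A larger γ means the pangenome keeps growing as genomes are added (a more open pangenome), while γ near 0 indicates that new genomes contribute few new genes.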

Extended Data Fig. 8 Relationship between prevalence and average nucleotide diversity of phylogroup core genes.

R is the two-sided Spearman’s rank correlation coefficient, R² indicates the variability explained by the linear regression model, and the line and the shaded area depict the best-fit trendline and the 95% confidence interval (mean ± 1.96 SEM) of the linear regression, respectively.
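Spearman’s rank correlation is the Pearson correlation of the ranks, and the "mean ± 1.96 SEM" rule gives an approximate 95% interval. The sketch below illustrates both with plain Python. The helper names and the example numbers are hypothetical, and `ci95` applies the rule to a simple sample mean rather than to a regression line as in the figure.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def ci95(values):
    """Approximate 95% interval as mean +/- 1.96 * SEM (needs n >= 2)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    sem = sd / math.sqrt(n)
    return mean - 1.96 * sem, mean + 1.96 * sem


if __name__ == "__main__":
    prevalence = [1, 2, 3, 4]      # hypothetical values
    diversity = [10, 20, 15, 40]   # hypothetical values
    print(spearman(prevalence, diversity))
    print(ci95([1, 2, 3, 4, 5]))
```

Because it works on ranks, the coefficient reflects any monotonic relationship and is less sensitive to outliers than a Pearson correlation on the raw values.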

Extended Data Fig. 9 Spearman’s rank correlation between the average r2N/r2S of core genes of Listeria phylogroups and the variance of soil, climate, and land-use variables for their samples.

R is the two-sided Spearman’s rank correlation coefficient. The line and the shaded area depict the best-fit trendline and the 95% confidence interval (mean ± 1.96 SEM) of the linear regression, respectively.

Extended Data Fig. 10 Spearman’s rank correlation between prevalence and average dN/dS of Listeria core genes (orange) and phylogroup-specific core genes (blue) involved in each COG functional category.

R is the two-sided Spearman’s rank correlation coefficient. The line and the shaded area depict the best-fit trendline and the 95% confidence interval (mean ± 1.96 SEM) of the linear regression, respectively. COG functional categories are described below. C: Energy production and conversion; D: Cell cycle control, cell division, chromosome partitioning; E: Amino acid transport and metabolism; F: Nucleotide transport and metabolism; G: Carbohydrate transport and metabolism; H: Coenzyme transport and metabolism; I: Lipid transport and metabolism; J: Translation, ribosomal structure and biogenesis; K: Transcription; L: Replication, recombination and repair; M: Cell wall/membrane/envelope biogenesis; N: Cell motility; O: Posttranslational modification, protein turnover, chaperones; P: Inorganic ion transport and metabolism; Q: Secondary metabolites biosynthesis, transport and catabolism; S: Function unknown; T: Signal transduction mechanisms; U: Intracellular trafficking, secretion, and vesicular transport; V: Defense mechanisms; Z: Cytoskeleton.

Supplementary information

Supplementary Information

Supplementary Methods, Figs. 1–5 and Tables 3, 5, 7–8.

Reporting Summary

Peer Review File

Supplementary Tables File

Supplementary Table 1, metadata summary of 1,004 soil samples; Supplementary Table 2, metadata summary of 594 Listeria isolates selected for whole-genome sequencing; Supplementary Table 4, core genes of Listeria phylogroups undergoing positive selection based on BUSTED model; and Supplementary Table 6, gene type and functional annotation of orthologous genes.


About this article


Cite this article

Liao, J., Guo, X., Weller, D.L. et al. Nationwide genomic atlas of soil-dwelling Listeria reveals effects of selection and population ecology on pangenome evolution. Nat Microbiol 6, 1021–1030 (2021).
