Natural bacterial populations can display enormous genomic diversity, primarily in the form of gene content variation caused by the frequent exchange of DNA with the local environment. However, the ecological drivers of genomic variability and the role of selection remain controversial. Here, we address this gap by developing a nationwide atlas of 1,854 Listeria isolates, collected systematically from soils across the contiguous United States. We found that Listeria was present across a wide range of environmental parameters, being mainly controlled by soil moisture, molybdenum and salinity concentrations. Whole-genome data from 594 representative strains allowed us to decompose Listeria diversity into 12 phylogroups, each with large differences in habitat breadth and endemism. ‘Cosmopolitan’ phylogroups, prevalent across many different habitats, had more open pangenomes and displayed weaker linkage disequilibrium, reflecting higher rates of gene gain and loss, and allele exchange than phylogroups with narrow habitat ranges. Cosmopolitan phylogroups also had a large fraction of genes affected by positive selection. The effect of positive selection was more pronounced in the phylogroup-specific core genome, suggesting that lineage-specific core genes are important drivers of adaptation. These results indicate that genome flexibility and recombination are the consequence of selection to survive in variable environments.
Trimmed paired-end reads for the 594 Listeria isolates used in this study were deposited in the National Center for Biotechnology Information’s (NCBI) Sequence Read Archive (SRA), and assembled genomes were deposited in NCBI GenBank under the accession numbers listed in Supplementary Table 2. Metadata collected in this study, including sampling date, latitude, longitude, elevation, soil physicochemical data, climate data and land-use data, are provided in Supplementary Table 1.
Code for sigB allelic typing, negative sample selection, core SNP detection and Listeria phylogroup detection is available at https://github.com/JingqiuLiao/listeria-biogeography. The code for land-use proportion calculation is available at https://github.com/wellerd2/Calculating-land-use-land-cover-and-landscape-structure-parameters.
This work was supported by the Center for Produce Safety (award number 2018CPS13) through the Florida Department of Agriculture and Consumer Services (under agreement number 024842) (M.W.) and by the Simons Foundation Collaboration: Principles of Microbial Ecosystems (PriME) award number 542395 (O.X.C.). The contents do not necessarily reflect the views or policies of the Center for Produce Safety or the United States Department of Agriculture, nor does mention of trade names, commercial products, services or organizations imply endorsement by the U.S. Government. We are grateful to A. Harrison, A. Snyder, A. Curtis, A. Shed, A. Alles, A. Andrus, A.a Ho Watson, A. Edwarthy, A. King, A. Roberts, A. Carroll, A. Ferrero, A. Buehler, B. Perkins, C. Casou, C. Ajua, C. Metzger, C. Rock, C. Day, C. Burnham, C. Mauck, C. Cox, C. Fitzegerald, C. Robinson, D. Koziol, D. Bryan, D. Trudelle, D. van De Grift, D. Baumler, D. Murphy, D. Sue, D. Brasill, Debarah Weller, Donald Weller, D. Thomas, E. White, E. Wilson, F. Becker, G. Gandhi, G. Bulnhem, H. Zhang, H. Duong, H. Deng, J. Zimmerman, J. Lunsford, J. Kovac, J. Adams, J. Goettig, J. Moreira, J. Perkins, J. David, J. Eggers, J. Jenkins, J. Steffan, J. Bejeck, K. Kniel, K. Kharel, K. Kruckow, K. Cobb, K. Schmidt, K. Gall, K. Kemp, K. Jordan, L. Sun, L. Carroll, L. Gorski, L. Goddik, L. Adams, L. Nelther, M. Scawartz, M. Schwart, M. Yeung, M. Levandowski, M. Nelther, M. Stasiewicz, M. Watterson, M. Jin, M. Cao, M. Theriault, N. Brassill, R. Nuir, R. Vgas, R. Smith, R. Wallar, S. Cimowsky, S. Beno, S. Murphy, S. Duerr, S. Ahn, S. Morquez, S. Clark, T. Cobb, T. Denes, T. Lan, T. Autry, T. Peters, V. Gund, V. Lappi, V. Chherri, X. Cheng, Y. Han, Y. Hu, Z. Raposo and Z. Xiong for helping with the sample collection.
The authors declare no competing interests.
Peer review information Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Density of samples positive and negative for Listeria based on spatial, soil, and climate variables.
“***”, “**”, and “*” indicate a significant difference between positive and negative samples at P levels of 0.001, 0.01, and 0.05, respectively, in two-sided Mann–Whitney tests with a Benjamini and Hochberg (BH) false discovery rate (FDR) correction.
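The BH correction named in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `benjamini_hochberg` and `significance_stars` are hypothetical, the input P values would come from whatever Mann–Whitney implementation is used, and treating each threshold as a strict "less than" is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(p_values)
    # Indices of the tests, sorted by ascending raw P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so that the
    # adjusted values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_stars(p_adjusted):
    """Map an adjusted P value to the star labels used in the caption.

    Assumption: thresholds are applied as strict inequalities.
    """
    if p_adjusted < 0.001:
        return "***"
    if p_adjusted < 0.01:
        return "**"
    if p_adjusted < 0.05:
        return "*"
    return ""
```

Applied to one raw P value per variable, `benjamini_hochberg` returns adjusted values that can be passed one at a time to `significance_stars`.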
Extended Data Fig. 2 Pairwise Pearson correlation between spatial, soil, climate, and land-use variables.
r is the Pearson coefficient. “*” indicates that variables are significantly correlated at two-sided P level of 0.05 with a BH FDR correction. TN: total nitrogen; TC: total carbon; OM: organic matter; Developed A: developed open space (< 20% impervious cover); Developed B: developed open space (>20% impervious cover).
Extended Data Fig. 3 Variable importance in predicting the presence of Listeria based on the Mean Decrease Gini index in a random forest model.
The 24 environmental variables (13 soil, 3 climate, and 8 land-use variables) and 2 spatial variables are sorted in ascending order according to the median Mean Decrease Gini value of 1,000 repetitions. Whiskers mark the minimum and maximum values; the box spans the lower and upper quartiles, and the short line within the box marks the median. Points above and below the whiskers indicate outliers. Developed A: developed open space (< 20% impervious cover); Developed B: developed open space (>20% impervious cover).
Extended Data Fig. 4 Clusters in networks of recent horizontal gene transfer among Listeria genomes.
Nodes represent clonal clusters and edges represent the inferred amount of gene flow between them. Node size indicates the size of the clonal cluster, that is, a group of very closely related genomes (<0.035% divergence). Nodes are color-coded by phylogroup.
Extended Data Fig. 5
Circles indicate samples positive for each Listeria phylogroup. Polygons indicate the distribution of each phylogroup.
Extended Data Fig. 6
Prevalence is the proportion of sampling sites positive for a given phylogroup among all 1,004 sampling sites. Populations are sorted by prevalence in descending order. The mean prevalence is 3.35% and the standard deviation is 3.08%.
Extended Data Fig. 7 The accumulation curves of core genomes and pangenomes of Listeria phylogroups.
The blue and red lines show the mean pangenome and core genome sizes, respectively, over 100 repetitions of subsampling an increasing number of genomes. The vertical bars indicate the standard deviations of the 100 repetitions. The formula of the power law function $cN^{\gamma}$ predicting the pangenome size ($n_{\mathrm{pan}}$) is shown, where $c$ is the size of the core genome, $N$ is the number of genomes and $\gamma$ is a scaling exponent between 0 and 1.
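As an illustration of how a power law of this form can be fitted to an accumulation curve, the sketch below estimates the coefficient and the exponent by ordinary least squares on log-transformed values. It is not the authors' procedure: the function name `fit_power_law` is hypothetical, the sketch treats the coefficient as a free fitted parameter, and log-log regression is only one of several reasonable fitting choices.

```python
import math


def fit_power_law(n_genomes, pan_sizes):
    """Fit pan_size = c * N**gamma by least squares in log-log space.

    n_genomes: number of genomes at each subsampling step (all > 0).
    pan_sizes: mean pangenome size at each step (all > 0).
    Returns the pair (c, gamma).
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(s) for s in pan_sizes]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Slope of the log-log regression line is the scaling exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    # Intercept of the line is log(c).
    c = math.exp(mean_y - gamma * mean_x)
    return c, gamma
```

A fitted exponent closer to 1 corresponds to a pangenome that keeps growing as genomes are added (more open), while an exponent closer to 0 corresponds to a curve that flattens quickly.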
Extended Data Fig. 8 Relationship between prevalence and average nucleotide diversity of phylogroup core genes.
R is the two-sided Spearman’s rank correlation coefficient, R² indicates the variability explained by the linear regression model, and the line and the shaded area depict the best-fit trendline and the 95% confidence interval (mean ± 1.96 SEM) of the linear regression, respectively.
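For readers unfamiliar with the statistic, Spearman’s rank correlation can be sketched as the Pearson correlation of the ranked values, with tied values sharing their average rank. The helper names below (`average_ranks`, `pearson`, `spearman`) are hypothetical, and the sketch computes only the coefficient, not the associated P value.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards; ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` would hold one value per phylogroup (for example prevalence) and `y` the paired quantity being compared (for example average nucleotide diversity).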
Extended Data Fig. 9 Spearman’s rank correlation between the average r2N/r2S of core genes of Listeria phylogroups and the variance of soil, climate, and land-use variables for their samples.
R is the two-sided Spearman’s rank correlation coefficient. The line and the shaded area depict the best-fit trendline and the 95% confidence interval (mean ± 1.96 SEM) of the linear regression, respectively.
Extended Data Fig. 10 Spearman’s rank correlation between prevalence and average dN/dS of Listeria core genes (orange) and phylogroup-specific core genes (blue) involved in each COG functional category.
R is the two-sided Spearman’s rank correlation coefficient. The line and the shaded area depict the best-fit trendline and the 95% confidence interval (mean ± 1.96 SEM) of the linear regression, respectively. COG functional categories are described below. C: Energy production and conversion; D: Cell cycle control, cell division, chromosome partitioning; E: Amino acid transport and metabolism; F: Nucleotide transport and metabolism; G: Carbohydrate transport and metabolism; H: Coenzyme transport and metabolism; I: Lipid transport and metabolism; J: Translation, ribosomal structure and biogenesis; K: Transcription; L: Replication, recombination and repair; M: Cell wall/membrane/envelope biogenesis; N: Cell motility; O: Posttranslational modification, protein turnover, chaperones; P: Inorganic ion transport and metabolism; Q: Secondary metabolites biosynthesis, transport and catabolism; S: Function unknown; T: Signal transduction mechanisms; U: Intracellular trafficking, secretion, and vesicular transport; V: Defense mechanisms; Z: Cytoskeleton.
Supplementary Methods, Figs. 1–5 and Tables 3, 5, 7–8.
Supplementary Table 1, metadata summary of 1,004 soil samples; Supplementary Table 2, metadata summary of 594 Listeria isolates selected for whole-genome sequencing; Supplementary Table 4, core genes of Listeria phylogroups undergoing positive selection based on BUSTED model; and Supplementary Table 6, gene type and functional annotation of orthologous genes.
About this article
Cite this article
Liao, J., Guo, X., Weller, D.L. et al. Nationwide genomic atlas of soil-dwelling Listeria reveals effects of selection and population ecology on pangenome evolution. Nat Microbiol 6, 1021–1030 (2021). https://doi.org/10.1038/s41564-021-00935-7