Bacterial speciation is a fundamental evolutionary process characterized by diverging genotypic and phenotypic properties. However, the selective forces that affect genetic adaptations and how they relate to the biological changes that underpin the formation of a new bacterial species remain poorly understood. Here, we show that the spore-forming, healthcare-associated enteropathogen Clostridium difficile is actively undergoing speciation. Through large-scale genomic analysis of 906 strains, we demonstrate that the ongoing speciation process is linked to positive selection on core genes in the newly forming species that are involved in sporulation and the metabolism of simple dietary sugars. Functional validation shows that the new C. difficile produces spores that are more resistant and have increased sporulation and host colonization capacity when glucose or fructose is available for metabolism. Thus, we report the formation of an emerging C. difficile species, selected for metabolizing simple dietary sugars and producing high levels of resistant spores, that is adapted for healthcare-mediated transmission.
Genomes have been deposited in the European Nucleotide Archive. Accession codes are listed in Supplementary Table 1. The 13 C. difficile reference isolates (Supplementary Table 2) are publicly available from the NCTC, and the annotations of these genomes are available from the Host-Microbiota Interactions Laboratory (HMIL; www.lawleylab.com), Wellcome Sanger Institute.
No custom code was used.
This work was supported by the Wellcome Trust (098051), the UK Medical Research Council (PF451 and MR/K000511/1), the Australian National Health and Medical Research Council (1091097 and 1159239 to S.F.) and the Victorian Government’s Operational Infrastructure Support Program. The authors thank S. Weese, F. Miyajima, G. Songer, T. Louie, J. Rood and N. M. Brown for C. difficile strains. The authors thank A. Neville, D. Knight and B. Hornung for critical reading and comments. The authors would also like to acknowledge the support of the Wellcome Sanger Institute Pathogen Informatics Team.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
(a) Bar plots show the number of strains by geographical location. (b) Number of strains by source.
Supplementary Figure 2 Pairwise SNP differences between different phylogenetic groups of Clostridium difficile.
Box plots show the distribution of SNP differences calculated between pairs of genomes belonging to different PGs (PG1: n = 108 genomes, PG2: n = 398 genomes, PG3: n = 112 genomes, PG4: n = 288 genomes). Box plots show minimum to maximum values and the median value.
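As an illustration of what a between-group pairwise SNP comparison involves, the following is a minimal sketch and not the procedure used in the study (which reports that no custom code was used). It assumes an already aligned set of sequences of equal length; the strain names, toy sequences and function names are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")

def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in VALID_BASES and y in VALID_BASES and x != y
    )

def between_group_distances(alignment, groups):
    """Collect SNP distances for every pair of genomes from different groups.

    alignment: dict mapping genome name -> aligned sequence
    groups:    dict mapping genome name -> group label (e.g. "PG1")
    Returns a dict mapping a sorted (group, group) tuple -> list of distances.
    """
    distances = {}
    for name_a, name_b in combinations(sorted(alignment), 2):
        group_a, group_b = groups[name_a], groups[name_b]
        if group_a == group_b:
            continue  # only pairs spanning two different groups are kept
        key = tuple(sorted((group_a, group_b)))
        distances.setdefault(key, []).append(
            snp_distance(alignment[name_a], alignment[name_b])
        )
    return distances

# Hypothetical toy alignment; real core-genome alignments are far longer.
toy_alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "TCGTACGA",
    "s4": "TCGNACGA",
}
toy_groups = {"s1": "PG1", "s2": "PG1", "s3": "PG4", "s4": "PG4"}
toy_result = between_group_distances(toy_alignment, toy_groups)
```

The per-pair lists returned for each group combination are the values that a box plot of this kind would summarize by minimum, median and maximum.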
C. difficile strains from distinct clades were plated on YCFA agar plates supplemented with 0.1% sodium taurocholate and incubated for 8 days, after which C. difficile colonies were photographed. Ribotypes RT002, RT027 and RT017 represent PG1, PG2 and PG3, respectively. RT045, RT078 and RT033 represent PG4. The experiment was repeated 3 times with similar results.
Skyline plot of Clostridium difficile PG2 (RT027; n = 44 strains) and PG4 (RT078; n = 97 strains) indicates signals of C. difficile clade A expansion in the year 1595. The black line represents the median estimate, and the purple area represents its 95% highest posterior density interval.
Supplementary Figure 5 Recombination analysis based on whole genome of 906 Clostridium difficile strains.
Phylogenetic groups of C. difficile are shown in circles. The direction of each edge represents the direction of recombination events (donor to recipient). The range of recombination events is shown on the edges.
Supplementary Figure 6 Comparison of accessory genome between 4 phylogenetic groups (PGs) of Clostridium difficile.
(a) Discriminant analysis of principal components using Clusters of Orthologous Groups (COGs) and accessory genome of strains from PG1 (n = 108 genomes), PG2 (n = 398 genomes), PG3 (n = 112 genomes), and PG4 (n = 288 genomes). (b) Functional classification and distribution of enriched genes in the group of PG1, 2 and 3 (n = 618 genomes) as compared to PG4 (n = 288 genomes). Cell motility (including flagella) and mobile elements are the most enriched functions. (c) Functional classification and distribution of enriched genes in PG4 (n = 288 genomes) as compared to the group of PG1, 2 and 3 (n = 618 genomes). Uncharacterized functions and DNA replication and modification functions are the most enriched functions. One-sided Fisher’s exact test with p-value adjusted by Hochberg method.
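For readers unfamiliar with the statistics named in this caption, the sketch below shows one way a one-sided Fisher's exact test followed by Hochberg adjustment can be computed. It is an illustration only, not the authors' pipeline; the function names and the gene counts in the example are hypothetical, and only the group sizes (618 and 288 genomes) come from the caption.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for the 2x2 table

        [[a, b],    a: group 1 genomes with the gene, b: group 1 without
         [c, d]]    c: group 2 genomes with the gene, d: group 2 without

    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total

def hochberg_adjust(pvalues):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order, start=1):
        running_min = min(running_min, rank * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted

# Hypothetical presence counts for three accessory genes:
# (present in the PG1-3 group, present in PG4); group sizes from the caption.
N_GROUP_A, N_GROUP_B = 618, 288
hypothetical_counts = [(600, 10), (300, 140), (50, 5)]
raw_p = [
    fisher_one_sided(pa, N_GROUP_A - pa, pb, N_GROUP_B - pb)
    for pa, pb in hypothetical_counts
]
adjusted_p = hochberg_adjust(raw_p)
```

The test is one-sided because enrichment in one group relative to the other is the only direction of interest; testing enrichment in the opposite direction amounts to swapping the two rows of the table.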
Supplementary Figure 7 High number of pseudogenes in the Clostridium difficile clade A compared to clade B.
The bar plot shows the number of pseudogenes in each phylogenetic group (PG1: n = 108 genomes, PG2: n = 398 genomes, PG3: n = 112 genomes, PG4: n = 288 genomes).
There are 21 sporulation-associated positively selected genes in PG4. All of them are either present in the mature spore proteome or regulated by Spo0A or its sporulation-specific sigma factors. None of these genes is directly involved in producing a spore at any of the sporulation stages.
Supplementary Figure 9 Multiple sequence alignment of the sodA gene from Clostridium difficile clade A and clade B.
A nucleotide consensus sequence for each of the 4 phylogenetic groups (PG1-4) is shown. Three point mutations that are present in all C. difficile clade A genomes and absent in C. difficile clade B genomes are shown in black boxes. The amino acids affected by these mutations are indicated.
Supplementary Figure 10 Schematic diagram showing the metabolic pathway of glucose and fructose metabolism in C. difficile.
Positively selected genes of Clostridium difficile clade A are shown in blue.
Supplementary Figure 11 Functional diversity of carbohydrate-active enzymes in 4 phylogenetic groups (PGs) of Clostridium difficile.
Discriminant analysis of principal components using the carbohydrate-active enzymes (CAZymes) database. Colors indicate the PG to which each strain belongs: PG1 (n = 108 genomes); PG2 (n = 398 genomes); PG3 (n = 112 genomes) and PG4 (n = 288 genomes). One-sided Fisher’s exact test with p-value adjusted by Hochberg method.
Supplementary Figs. 1–11
List of Clostridium difficile strains included in this study.
List of high-quality genomes of Clostridium difficile strains.
List of 1322 single copy core genes present in 906 Clostridium difficile strains.
List of accessory genes enriched in Clostridium difficile clade A (n = 618 genomes). One-sided Fisher’s exact test with p-value adjusted by Hochberg method.
List of accessory genes enriched in Clostridium difficile clade B (n = 288 genomes). One-sided Fisher’s exact test with p-value adjusted by Hochberg method.
List of pseudogenes in Clostridium difficile PG1.
List of pseudogenes in Clostridium difficile PG2.
List of pseudogenes in Clostridium difficile PG3.
List of pseudogenes in Clostridium difficile PG4.
List of pseudogenes that are present in all phylogenetic groups of Clade A but absent in Clade B of Clostridium difficile.
List of pseudogenes that are present in Clade B but absent in Clade A of Clostridium difficile.
List of positively selected genes in Clostridium difficile clade A.
List of positively selected genes in Clostridium difficile clade B.
Presence/absence matrix of carbohydrate-active enzymes in 4 phylogenetic groups of Clostridium difficile.