Adaptation of host transmission cycle during Clostridium difficile speciation


Bacterial speciation is a fundamental evolutionary process characterized by diverging genotypic and phenotypic properties. However, the selective forces that affect genetic adaptations and how they relate to the biological changes that underpin the formation of a new bacterial species remain poorly understood. Here, we show that the spore-forming, healthcare-associated enteropathogen Clostridium difficile is actively undergoing speciation. Through large-scale genomic analysis of 906 strains, we demonstrate that the ongoing speciation process is linked to positive selection on core genes in the newly forming species that are involved in sporulation and the metabolism of simple dietary sugars. Functional validation shows that the new C. difficile produces spores that are more resistant and have increased sporulation and host colonization capacity when glucose or fructose is available for metabolism. Thus, we report the formation of an emerging C. difficile species, selected for metabolizing simple dietary sugars and producing high levels of resistant spores, that is adapted for healthcare-mediated transmission.


Fig. 1: Phylogeny and population structure of Clostridium difficile.
Fig. 2: Adaptation of sporulation and metabolic genes in Clostridium difficile clade A.
Fig. 3: Bacterial speciation is linked to increased host adaptation and transmission ability.

Data availability

Genomes have been deposited in the European Nucleotide Archive. Accession codes are listed in Supplementary Table 1. The 13 C. difficile reference isolates (Supplementary Table 2) are publicly available from the NCTC, and the annotations of these genomes are available from the Host-Microbiota Interactions Laboratory (HMIL), Wellcome Sanger Institute.

Code availability

No custom code was used.




This work was supported by the Wellcome Trust (098051), the UK Medical Research Council (PF451 and MR/K000511/1), the Australian National Health and Medical Research Council (1091097 and 1159239 to S.F.) and the Victorian Government’s Operational Infrastructure Support Program. The authors thank S. Weese, F. Miyajima, G. Songer, T. Louie, J. Rood and N. M. Brown for C. difficile strains. The authors thank A. Neville, D. Knight and B. Hornung for critical reading and comments. The authors would also like to acknowledge the support of the Wellcome Sanger Institute Pathogen Informatics Team.

Author information




N.K. and T.D.L. conceived and managed the study. N.K., S.C.F., E.V., H.P.B. and T.D.L. wrote the manuscript. D.J.F., P.R., M.P., M.RJ.C., M.B.F.J., K.R.H., M.I., L.H.W., C.S., T.N., G.D., T.V.R., E.J.K. and B.W.W. provided critical input and contributed to the editing of the manuscript. N.K. performed the computational analysis. H.P.B. performed genome annotation of reference genomes. D.J.F., P.R., M.P., M.RJ.C., M.B.F.J., K.R.H., M.I., L.H.W., C.S. and T.N. obtained C. difficile strains. E.V., H.P.B., S.C.F. and T.D.L. designed in vitro and in vivo experiments. H.P.B., E.V. and M.S. performed in vitro experiments. E.V., M.D.S., S.C. and K.H. performed in vivo experiments.

Corresponding authors

Correspondence to Nitin Kumar or Trevor D. Lawley.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Breakdown of 906 Clostridium difficile strains based on metadata.

(a) Number of strains based on geographical location is shown in bar-plots. (b) Number of strains based on source.

Supplementary Figure 2 Pairwise SNP differences between phylogenetic groups of Clostridium difficile.

Box plots show the distribution of SNP differences calculated between pairs of genomes belonging to different phylogenetic groups (PG1: n = 108 genomes, PG2: n = 398 genomes, PG3: n = 112 genomes, PG4: n = 288 genomes). Each box plot shows the range from the minimum to the maximum value and the median value.
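As an illustration of how such a between-group comparison could be computed, the following minimal Python sketch counts SNP differences for every pair of aligned sequences drawn from two groups and summarizes them by minimum, median and maximum. The function names, the toy sequences and the choice to skip gap and N positions are assumptions made for illustration; this is not the pipeline used in the study.

```python
from itertools import product
from statistics import median


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in skip and y not in skip
    )


def between_group_distances(group_a, group_b):
    """SNP distances for every pair with one genome from each group."""
    return [snp_distance(a, b) for a, b in product(group_a, group_b)]


# Hypothetical toy alignments standing in for two phylogenetic groups.
pg1 = ["ACGTACGT", "ACGTACGA"]
pg4 = ["ACCTACTT", "ACCTNCTA"]

distances = between_group_distances(pg1, pg4)
summary = (min(distances), median(distances), max(distances))
print(distances, summary)
```

The three summary values correspond to the quantities the box plots report: minimum, median and maximum.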

Supplementary Figure 3 Colony morphology of Clostridium difficile strains.

C. difficile strains from distinct clades were plated on YCFA agar plates supplemented with 0.1% sodium taurocholate and incubated for 8 days, after which colonies were photographed. Ribotypes RT002, RT027 and RT017 represent PG1, PG2 and PG3, respectively. RT045, RT078 and RT033 represent PG4. The experiment was repeated 3 times with similar results.

Supplementary Figure 4 Bayesian skyline plots.

Skyline plots of Clostridium difficile PG2 (RT027; n = 44 strains) and PG4 (RT078; n = 97 strains) indicate signals of C. difficile clade A expansion in the year 1595. The black line represents the median estimate, and the purple area represents its 95% highest posterior density interval.

Supplementary Figure 5 Recombination analysis based on whole genome of 906 Clostridium difficile strains.

Phylogenetic groups of C. difficile are shown as circles. The direction of each edge represents the direction of recombination (donor to recipient). The range of recombination events is shown on each edge.

Supplementary Figure 6 Comparison of accessory genome between 4 phylogenetic groups (PGs) of Clostridium difficile.

(a) Discriminant analysis of principal components using Clusters of Orthologous Groups (COGs) and accessory genome of strains from PG1 (n = 108 genomes), PG2 (n = 398 genomes), PG3 (n = 112 genomes), and PG4 (n = 288 genomes). (b) Functional classification and distribution of enriched genes in the group of PG1, 2 and 3 (n = 618 genomes) as compared to PG4 (n = 288 genomes). Cell motility (including flagella) and mobile elements are the most enriched functions. (c) Functional classification and distribution of enriched genes in PG4 (n = 288 genomes) as compared to the group of PG1, 2 and 3 (n = 618 genomes). Uncharacterized functions and DNA replication and modification functions are the most enriched functions. One-sided Fisher’s exact test with p-value adjusted by Hochberg method.
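To make the statistical procedure named in this caption concrete, the sketch below runs a one-sided Fisher's exact test per gene and then applies Hochberg's step-up adjustment, using only the Python standard library. The group sizes (618 and 288 genomes) come from the caption; the gene names and presence counts are hypothetical, and the code is an illustration under those assumptions rather than the analysis performed by the authors.

```python
from math import comb

GROUP_A, GROUP_B = 618, 288  # genome counts for PG1-3 and PG4, from the caption


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided (greater) Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / total


def hochberg_adjust(pvalues):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    # Largest p-value is multiplied by 1, the next by 2, and so on.
    for rank, idx in enumerate(order, start=1):
        running_min = min(running_min, rank * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts of genomes carrying each gene: (group A, group B).
presence = {"gene_x": (600, 10), "gene_y": (310, 140), "gene_z": (50, 5)}

raw = {
    gene: fisher_one_sided(a, GROUP_A - a, b, GROUP_B - b)
    for gene, (a, b) in presence.items()
}
adjusted = dict(zip(raw, hochberg_adjust(list(raw.values()))))
for gene in presence:
    print(gene, raw[gene], adjusted[gene])
```

Testing enrichment in the opposite direction (PG4 relative to PG1, 2 and 3) would swap the two rows of each table.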

Supplementary Figure 7 High number of pseudogenes in the Clostridium difficile clade A compared to clade B.

The bar-plot shows the number of pseudogenes in each phylogenetic group (PG1: n = 108 genomes, PG2: n = 398 genomes, PG3: n = 112 genomes, PG4: n = 288 genomes).

Supplementary Figure 8 Sporulation-associated genes in Clostridium difficile clade B.

There are 21 sporulation-associated positively selected genes in PG4. These are all either present in the mature spore proteome or they are regulated by Spo0A or its sporulation specific sigma factors. There are no genes directly involved in producing a spore in any of the sporulation stages.

Supplementary Figure 9 Multiple sequence alignment of the sodA gene from Clostridium difficile clade A and clade B.

A nucleotide consensus sequence for each of the 4 phylogenetic groups (PG1-4) is shown. Three point mutations that are present in all C. difficile clade A genomes and absent in C. difficile clade B genomes are shown in black boxes. The amino acids affected by these mutations are indicated.

Supplementary Figure 10 Schematic diagram showing the metabolic pathway of glucose and fructose metabolism in C. difficile.

Positively selected genes of Clostridium difficile clade A are shown in blue.

Supplementary Figure 11 Functional diversity of carbohydrate-active enzymes in 4 phylogenetic groups (PGs) of Clostridium difficile.

Discriminant analysis of principal components using the carbohydrate-active enzymes (CAZymes) database. Each color represents strains from one of the 4 PGs: PG1 (n = 108 genomes); PG2 (n = 398 genomes); PG3 (n = 112 genomes) and PG4 (n = 288 genomes). One-sided Fisher’s exact test with p-value adjusted by Hochberg method.

Supplementary information

Supplementary Information

Supplementary Figs. 1–11

Reporting Summary

Supplementary Table 1

List of Clostridium difficile strains included in this study.

Supplementary Table 2

List of high-quality genomes of Clostridium difficile strains.

Supplementary Table 3

List of 1322 single copy core genes present in 906 Clostridium difficile strains.

Supplementary Table 4

List of accessory genes enriched in Clostridium difficile clade A (n = 618 genomes). One-sided Fisher’s exact test with p-value adjusted by Hochberg method.

Supplementary Table 5

List of accessory genes enriched in Clostridium difficile clade B (n = 288 genomes). One-sided Fisher’s exact test with p-value adjusted by Hochberg method.

Supplementary Table 6

List of pseudogenes in Clostridium difficile PG1.

Supplementary Table 7

List of pseudogenes in Clostridium difficile PG2.

Supplementary Table 8

List of pseudogenes in Clostridium difficile PG3.

Supplementary Table 9

List of pseudogenes in Clostridium difficile PG4.

Supplementary Table 10

List of pseudogenes that are present in all phylogenetic groups of Clade A but absent in Clade B of Clostridium difficile.

Supplementary Table 11

List of pseudogenes that are present in Clade B but absent in Clade A of Clostridium difficile.

Supplementary Table 12

List of positively selected genes in Clostridium difficile clade A.

Supplementary Table 13

List of positively selected genes in Clostridium difficile clade B.

Supplementary Table 14

Presence/absence matrix of carbohydrate-active enzymes in 4 phylogenetic groups of Clostridium difficile.


Cite this article

Kumar, N., Browne, H.P., Viciani, E. et al. Adaptation of host transmission cycle during Clostridium difficile speciation. Nat Genet 51, 1315–1320 (2019).
