Epigenomic characterization of Clostridioides difficile finds a conserved DNA methyltransferase that mediates sporulation and pathogenesis


Clostridioides (formerly Clostridium) difficile is a leading cause of healthcare-associated infections. Although considerable progress has been made in the understanding of its genome, the epigenome of C. difficile and its functional impact has not been systematically explored. Here, we perform a comprehensive DNA methylome analysis of C. difficile using 36 human isolates and observe a high level of epigenomic diversity. We discovered an orphan DNA methyltransferase with a well-defined specificity, the corresponding gene of which is highly conserved across our dataset and in all of the approximately 300 global C. difficile genomes examined. Inactivation of the methyltransferase gene negatively impacts sporulation, a key step in C. difficile disease transmission, and these results are consistently supported by multiomics data, genetic experiments and a mouse colonization model. Further experimental and transcriptomic analyses suggest that epigenetic regulation is associated with cell length, biofilm formation and host colonization. These findings provide a unique epigenetic dimension to characterize medically relevant biological processes in this important pathogen. This study also provides a set of methods for comparative epigenomics and integrative analysis, which we expect to be broadly applicable to bacterial epigenomic studies.

Fig. 1: The methylomes of the 36 strains of C. difficile.
Fig. 2: CamA modulates sporulation levels in C. difficile.
Fig. 3: Abundance, distribution and conservation of CAAAAA motif sites.
Fig. 4: The distribution of non-methylated CAAAAA motif sites and their overlap with TFBSs and TSSs.
Fig. 5: Gene-expression analysis.
Fig. 6: In vivo and additional functional impacts of the ∆camA mutation.

Data availability

Genome assemblies and methylation data are available from NCBI under BioProject ID PRJNA448390. RNA-seq data are available under project ID PRJNA445308. Additional data are available from the corresponding authors on reasonable request.

Code availability

Scripts and a tutorial supporting all of the key analyses of this research are publicly available as a package named Bacterial Epigenome Analysis SuiTe (BEAST) at


References

  1. Smits, W. K., Lyras, D., Lacy, D. B., Wilcox, M. H. & Kuijper, E. J. Clostridium difficile infection. Nat. Rev. Dis. Primers 2, 16020 (2016).
  2. Sebaihia, M. et al. The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome. Nat. Genet. 38, 779–786 (2006).
  3. He, M. et al. Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat. Genet. 45, 109–113 (2013).
  4. Herbert, M., O’Keeffe, T. A., Purdy, D., Elmore, M. & Minton, N. P. Gene transfer into Clostridium difficile CD630 and characterisation of its methylase genes. FEMS Microbiol. Lett. 229, 103–110 (2003).
  5. van Eijk, E. et al. Complete genome sequence of the Clostridium difficile laboratory strain 630Δerm reveals differences from strain 630, including translocation of the mobile element CTn5. BMC Genom. 16, 31 (2015).
  6. Hargreaves, K. R., Thanki, A. M., Jose, B. R., Oggioni, M. R. & Clokie, M. R. Use of single molecule sequencing for comparative genomics of an environmental and a clinical isolate of Clostridium difficile ribotype 078. BMC Genom. 17, 1020 (2016).
  7. Casadesus, J. & Low, D. Epigenetic gene regulation in the bacterial world. Microbiol. Mol. Biol. Rev. 70, 830–856 (2006).
  8. Low, D. A., Weyand, N. J. & Mahan, M. J. Roles of DNA adenine methylation in regulating bacterial gene expression and virulence. Infect. Immun. 69, 7197–7204 (2001).
  9. Cohen, N. R. et al. A role for the bacterial GATC methylome in antibiotic stress survival. Nat. Genet. 48, 581–586 (2016).
  10. Manso, A. S. et al. A random six-phase switch regulates pneumococcal virulence via global epigenetic changes. Nat. Commun. 5, 5055 (2014).
  11. Atack, J. M. et al. A biphasic epigenetic switch controls immunoevasion, virulence and niche adaptation in non-typeable Haemophilus influenzae. Nat. Commun. 6, 7828 (2015).
  12. Wion, D. & Casadesus, J. N6-methyl-adenine: an epigenetic signal for DNA-protein interactions. Nat. Rev. Microbiol. 4, 183–192 (2006).
  13. Oliveira, P. H., Touchon, M. & Rocha, E. P. Regulation of genetic flux between bacteria by restriction-modification systems. Proc. Natl Acad. Sci. USA 113, 5658–5663 (2016).
  14. Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7, 461–465 (2010).
  15. Beaulaurier, J., Schadt, E. E. & Fang, G. Deciphering bacterial epigenomes using modern sequencing technologies. Nat. Rev. Genet. 20, 157–172 (2019).
  16. Fang, G. et al. Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nat. Biotechnol. 30, 1232–1239 (2012).
  17. Murray, I. A. et al. The methylomes of six bacteria. Nucleic Acids Res. 40, 11450–11462 (2012).
  18. Davis, B. M., Chao, M. C. & Waldor, M. K. Entering the era of bacterial epigenomics with single molecule real time DNA sequencing. Curr. Opin. Microbiol. 16, 192–198 (2013).
  19. Smits, W. K. Hype or hypervirulence: a reflection on problematic C. difficile strains. Virulence 4, 592–596 (2013).
  20. Roberts, R. J. et al. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res. 31, 1805–1812 (2003).
  21. Wust, J., Sullivan, N. M., Hardegger, U. & Wilkins, T. D. Investigation of an outbreak of antibiotic-associated colitis by various typing methods. J. Clin. Microbiol. 16, 1096–1101 (1982).
  22. Barra-Carrasco, J. & Paredes-Sabja, D. Clostridium difficile spores: a major threat to the hospital environment. Future Microbiol. 9, 475–486 (2014).
  23. Dembek, M. et al. High-throughput analysis of gene essentiality and sporulation in Clostridium difficile. mBio 6, e02383 (2015).
  24. Donnelly, M. L., Fimlaid, K. A. & Shen, A. Characterization of Clostridium difficile spores lacking either SpoVAC or dipicolinic acid synthetase. J. Bacteriol. 198, 1694–1707 (2016).
  25. Shen, A., Fimlaid, K. A. & Pishdadian, K. Inducing and quantifying Clostridium difficile spore formation. Methods Mol. Biol. 1476, 129–142 (2016).
  26. Schbath, S. & Hoebeke, M. in Advances in Genomic Sequence Analysis and Pattern Discovery Vol. 7 (eds Elnitsk, L. et al.) 25–64 (World Scientific, 2011).
  27. Knijnenburg, T. A. et al. Multiscale representation of genomic signals. Nat. Methods 11, 689–694 (2014).
  28. Huang, D. W. et al. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 35, W169–W175 (2007).
  29. Lim, H. N. & van Oudenaarden, A. A multistep epigenetic switch enables the stable inheritance of DNA methylation states. Nat. Genet. 39, 269–275 (2007).
  30. Ardissone, S. et al. Cell cycle constraints and environmental control of local DNA hypomethylation in α-Proteobacteria. PLoS Genet. 12, e1006499 (2016).
  31. Cota, I. et al. OxyR-dependent formation of DNA methylation patterns in OpvABOFF and OpvABON cell lineages of Salmonella enterica. Nucleic Acids Res. 44, 3595–3609 (2016).
  32. Fimlaid, K. A. et al. Global analysis of the sporulation pathway of Clostridium difficile. PLoS Genet. 9, e1003660 (2013).
  33. Pishdadian, K., Fimlaid, K. A. & Shen, A. SpoIIID-mediated regulation of σK function during Clostridium difficile sporulation. Mol. Microbiol. 95, 189–208 (2015).
  34. Fimlaid, K. A. & Shen, A. Diverse mechanisms regulate sporulation sigma factor activity in the Firmicutes. Curr. Opin. Microbiol. 24, 88–95 (2015).
  35. Saujet, L., Pereira, F. C., Henriques, A. O. & Martin-Verstraete, I. The regulatory network controlling spore formation in Clostridium difficile. FEMS Microbiol. Lett. 358, 1–10 (2014).
  36. Saujet, L. et al. Genome-wide analysis of cell type-specific gene transcription during spore formation in Clostridium difficile. PLoS Genet. 9, e1003756 (2013).
  37. Rosenbusch, K. E., Bakker, D., Kuijper, E. J. & Smits, W. K. C. difficile 630Δerm Spo0A regulates sporulation, but does not contribute to toxin production, by direct high-affinity binding to target DNA. PLoS ONE 7, e48608 (2012).
  38. Fimlaid, K. A., Jensen, O., Donnelly, M. L., Siegrist, M. S. & Shen, A. Regulation of Clostridium difficile spore formation by the SpoIIQ and SpoIIIA proteins. PLoS Genet. 11, e1005562 (2015).
  39. Ribis, J. W., Fimlaid, K. A. & Shen, A. Differential requirements for conserved peptidoglycan remodeling enzymes during Clostridioides difficile spore formation. Mol. Microbiol. 110, 370–389 (2018).
  40. Maldarelli, G. A. et al. Type IV pili promote early biofilm formation by Clostridium difficile. Pathog. Dis. 74, ftw061 (2016).
  41. Jenior, M. L., Leslie, J. L., Young, V. B. & Schloss, P. D. Clostridium difficile colonizes alternative nutrient niches during infection across distinct murine gut microbiomes. mSystems 2, e00063-17 (2017).
  42. Fletcher, J. R., Erwin, S., Lanzas, C. & Theriot, C. M. Shifts in the gut metabolome and Clostridium difficile transcriptome throughout colonization and infection in a mouse model. mSphere 3, e00089-18 (2018).
  43. Lessa, F. C. et al. Burden of Clostridium difficile infection in the United States. N. Engl. J. Med. 372, 825–834 (2015).
  44. Deakin, L. J. et al. The Clostridium difficile spo0A gene is a persistence and transmission factor. Infect. Immun. 80, 2704–2711 (2012).
  45. Lewis, B. B. & Pamer, E. G. Microbiota-based therapies for Clostridium difficile and antibiotic-resistant enteric infections. Annu. Rev. Microbiol. 71, 157–178 (2017).
  46. Abt, M. C., McKenney, P. T. & Pamer, E. G. Clostridium difficile colitis: pathogenesis and host defence. Nat. Rev. Microbiol. 14, 609–620 (2016).
  47. Sanchez-Romero, M. A., Cota, I. & Casadesus, J. DNA methylation in bacteria: from the methyl group to the methylome. Curr. Opin. Microbiol. 25, 9–16 (2015).
  48. Griffiths, D. et al. Multilocus sequence typing of Clostridium difficile. J. Clin. Microbiol. 48, 770–778 (2010).
  49. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  50. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  51. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
  52. Oliveira, P. H., Touchon, M. & Rocha, E. P. The interplay of restriction-modification systems with mobile genetic elements and their prokaryotic hosts. Nucleic Acids Res. 42, 10618–10631 (2014).
  53. Roberts, R. J., Vincze, T., Posfai, J. & Macelis, D. REBASE-a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 43, D298–D299 (2015).
  54. Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
  55. Katoh, K. & Standley, D. M. MAFFT: iterative refinement and additional methods. Methods Mol. Biol. 1079, 131–146 (2014).
  56. Gouy, M., Guindon, S. & Gascuel, O. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221–224 (2010).
  57. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).
  58. Bland, C. et al. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinform. 8, 209 (2007).
  59. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic Acids Res. 31, 371–373 (2003).
  60. Goldfarb, T. et al. BREX is a novel phage resistance system widespread in microbial genomes. EMBO J. 34, 169–183 (2015).
  61. Ofir, G. et al. DISARM is a widespread bacterial defence system with broad anti-phage activities. Nat. Microbiol. 3, 90–98 (2018).
  62. Makarova, K. S., Wolf, Y. I., van der Oost, J. & Koonin, E. V. Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements. Biol. Direct 4, 29 (2009).
  63. Doron, S. et al. Systematic discovery of antiphage defense systems in the microbial pangenome. Science 359, eaar4120 (2018).
  64. Xie, Y. et al. TADB 2.0: an updated database of bacterial type II toxin-antitoxin loci. Nucleic Acids Res. 46, D749–D753 (2018).
  65. Fouts, D. E. Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res. 34, 5839–5851 (2006).
  66. Arndt, D. et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 44, W16–W21 (2016).
  67. Cury, J., Jove, T., Touchon, M., Neron, B. & Rocha, E. P. Identification and analysis of integrons and cassette arrays in bacterial genomes. Nucleic Acids Res. 44, 4539–4550 (2016).
  68. Cury, J., Touchon, M. & Rocha, E. P. C. Integrative and conjugative elements and their hosts: composition, distribution and organization. Nucleic Acids Res. 45, 8943–8956 (2017).
  69. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 5, 113 (2004).
  70. Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).
  71. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
  72. Touchon, M. et al. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 5, e1000344 (2009).
  73. Miele, V., Penel, S. & Duret, L. Ultra-fast sequence clustering from similarity networks with SiLiX. BMC Bioinform. 12, 116 (2011).
  74. Tettelin, H., Riley, D., Cattuto, C. & Medini, D. Comparative genomics: the bacterial pan-genome. Curr. Opin. Microbiol. 11, 472–477 (2008).
  75. Didelot, X. & Wilson, D. J. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput. Biol. 11, e1004041 (2015).
  76. Sawyer, S. Statistical tests for detecting gene conversion. Mol. Biol. Evol. 6, 526–538 (1989).
  77. Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).
  78. Pfeifer, B., Wittelsburger, U., Ramos-Onsins, S. E. & Lercher, M. J. PopGenome: an efficient Swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 31, 1929–1936 (2014).
  79. Csuros, M. Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 26, 1910–1912 (2010).
  80. Ng, Y. K. et al. Expanding the repertoire of gene tools for precise manipulation of the Clostridium difficile genome: allelic exchange using pyrE alleles. PLoS ONE 8, e56051 (2013).
  81. Sorg, J. A. & Dineen, S. S. Laboratory maintenance of Clostridium difficile. Curr. Protoc. Microbiol. 12, 9A.1.1–9A.1.10 (2009).
  82. Cartman, S. T. & Minton, N. P. A mariner-based transposon system for in vivo random mutagenesis of Clostridium difficile. Appl. Environ. Microbiol. 76, 1103–1109 (2010).
  83. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
  84. Donnelly, M. L. et al. A Clostridium difficile-specific, gel-forming protein required for optimal spore germination. mBio 8, e02085-16 (2017).
  85. Ducret, A., Quardokus, E. M. & Brun, Y. V. MicrobeJ, a tool for high throughput bacterial cell detection and quantitative analysis. Nat. Microbiol. 1, 16077 (2016).
  86. Ribis, J. W., Ravichandran, P., Putnam, E. E., Pishdadian, K. & Shen, A. The conserved spore coat protein SpoVM is largely dispensable in Clostridium difficile spore formation. mSphere 2, e00315-17 (2017).
  87. Edwards, A. N. et al. Chemical and stress resistances of Clostridium difficile spores and vegetative cells. Front. Microbiol. 7, 1698 (2016).
  88. Darling, A. E., Mau, B. & Perna, N. T. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5, e11147 (2010).
  89. Rissman, A. I. et al. Reordering contigs of draft genomes using the Mauve aligner. Bioinformatics 25, 2071–2073 (2009).
  90. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
  91. Novichkov, P. S. et al. RegPrecise 3.0—a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genom. 14, 745 (2013).
  92. Bailey, T. L. & Gribskov, M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48–54 (1998).
  93. Mirauta, B., Nicolas, P. & Richard, H. Parseq: reconstruction of microbial transcription landscape from RNA-seq read counts using state-space models. Bioinformatics 30, 1409–1416 (2014).
  94. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
  95. Kopylova, E., Noe, L. & Touzet, H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 28, 3211–3217 (2012).
  96. Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
  97. Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2003).
  98. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
  99. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
  100. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
  101. Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).
  102. Mi, H., Muruganujan, A., Ebert, D., Huang, X. & Thomas, P. D. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. 47, D419–D426 (2019).
  103. Wang, M., Zhao, Y. & Zhang, B. Efficient test and visualization of multi-set intersections. Sci. Rep. 5, 16923 (2015).
  104. Anjuwon-Foster, B. R., Maldonado-Vazquez, N. & Tamayo, R. Characterization of flagellum and toxin phase variation in Clostridioides difficile ribotype 012 isolates. J. Bacteriol. 200, e00056-18 (2018).
  105. Chen, X. et al. A mouse model of Clostridium difficile-associated disease. Gastroenterology 135, 1984–1992 (2008).
  106. McKee, R. W., Aleksanyan, N., Garrett, E. M. & Tamayo, R. Type IV pili promote Clostridium difficile adherence and persistence in a mouse model of infection. Infect. Immun. 86, e00943-17 (2018).
  107. Woods, E. C., Edwards, A. N., Childress, K. O., Jones, J. B. & McBride, S. M. The C. difficile clnRAB operon initiates adaptations to the host environment in response to LL-37. PLoS Pathog. 14, e1007153 (2018).
  108. Purcell, E. B. et al. A nutrient-regulated cyclic diguanylate phosphodiesterase controls Clostridium difficile biofilm and toxin production during stationary phase. Infect. Immun. 85, e00347-17 (2017).
  109. Pereira, F. C. et al. The spore differentiation pathway in the enteric pathogen Clostridium difficile. PLoS Genet. 9, e1003782 (2013).
  110. Serrano, M. et al. A recombination directionality factor controls the cell type-specific activation of σK and the fidelity of spore development in Clostridium difficile. PLoS Genet. 12, e1006312 (2016).
  111. Theriot, C. M. et al. Cefoperazone-treated mice as an experimental platform to assess differential virulence of Clostridium difficile strains. Gut Microbes 2, 326–334 (2011).


Acknowledgements

We thank R. J. Roberts (New England Biolabs) for his help with the prediction of R–M systems and orphan MTases in C. difficile genomes using REBASE Tools and for providing comments. He was originally an author of this manuscript; however, as a staunch supporter of the open access movement, he will not author a paper that is not open access. We also thank E. P. C. Rocha (Institut Pasteur, Paris, France) for reading the manuscript and for providing comments. The research was primarily funded by R01 GM114472 (to G.F.) from the National Institutes of Health and the Icahn Institute for Genomics and Multiscale Biology. The research was also funded by NIH grants R01 AI119145 (to H.v.B. and A.B.), R01 AI22232 (to A.S.), R01 AI107029 (to R.T.) and R35 GM131780 (to A.K.A.), a Hirschl Research Scholar award from the Irma T. Hirschl/Monique Weill-Caulier Trust (to G.F.) and a Pew Scholar in the Biomedical Sciences grant from the Pew Charitable Trusts (to A.S.). G.F. is a Nash Family Research Scholar. A.S. holds an Investigators in the Pathogenesis of Infectious Disease Award from the Burroughs Wellcome Fund. J.W.R. was supported by an NIH training grant 5T32GM007310-42. The participation of R. J. Roberts in this project was funded by New England Biolabs. This research was also supported in part through the computational resources and staff expertise provided by the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai.

Author information




G.F. conceived the hypothesis. A.S. and G.F. supervised the project. P.H.O. and G.F. designed the computational methods. P.H.O., R.T., A.S. and G.F. designed the experiments. P.H.O. performed most of the computational analyses and developed most of the scripts that support the analyses. J.W.R. performed the growth curves, microscopy analyses (fluorescence and phase contrast), analyses of cell length and sporulation stage, and RT–qPCR analyses of sporulation genes, and isolated some of the RNA and processed it for RT–qPCR studies. A.S. constructed the deletion and catalytic ∆camA mutants, performed complementation, isolated and processed the RNA for several of the RNA analyses, and performed many of the sporulation phenotypic assays. E.M.G. and D.T. performed the animal infection experiment and analysed the data under the supervision of R.T. A. Kim and G.F. performed methylation motif discovery and refinement. O.S. and E.A.M. performed RT–qPCR controls for RNA-seq analyses. O.S., E.A.M., G.D., M.L.-S., C.B., N.E.Z., D.R.A., I.O., G.P., F.W., C.H., S.H., R.S., H.v.B. and A.S. contributed to the other experiments. G.D., I.O. and R.S. designed and conducted SMRT-seq. P.H.O., J.W.R., E.M.G., D.T., A. Kim, O.S., T.P., S.Z., E.A.M., M.T., C.B., S.B., A.K.A., A.B., R.T., E.E.S., R.S., H.v.B., A. Kasarskis, A.S. and G.F. analysed the data. P.H.O., R.T., A.S. and G.F. wrote the manuscript with additional input from the other co-authors.

Corresponding authors

Correspondence to Aimee Shen or Gang Fang.

Ethics declarations

Competing interests

A.S. has a consultant role for BioVector, a diagnostic start-up. The other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Multiple defense systems and gene flux control in C. difficile.

(a) Heatmap aggregate depicts: abundance of defense systems (R-M, abortive infection (Abi), average number of spacers per CRISPR, toxin-antitoxin (T-A), and Shedu systems (other)), homologous recombination (HR) events (given by Geneconv and ClonalFrameML (CFML)), horizontal gene transfer (HGT, given by Wagner parsimony), and number of phage-targeting CRISPR spacers (Supplementary Notes). Phages were clustered according to their family (Siphoviridae (S), Myoviridae (M)), and tail type. (b) Cas genes detected in C. difficile. Apart from the complete Type-IB gene cluster (cas1-cas8), we also observed two truncated gene clusters lacking cas1, cas2, and cas4. One of the truncated operons was present across all genomes, while the second was restricted to ST-1 and ST-55. (c) Example of a putative ‘defense island’ detected in CD_020472 harboring: a Druantia-like system, two T-A systems, two solitary MTases, and one Type I R-M system. The Druantia-like system is similar to the previously reported Type II Druantia systems (ref. 63) in the sense that a PF00271 helicase conserved C-terminal domain and DUF1998 (PF09369) are associated with a nearby cytosine methylase. However, it lacks a PF00270 DEAx box helicase. (d) Genomic context of the sduA gene in CD_22456 pertaining to the newly identified Shedu defense system. The gene is located in an integrative conjugative element (ICE) (Supplementary Table 2d). (e) Observed/expected (O/E) ratios for co-localized defense systems (maximum of 10 genes apart). Only the most abundant systems were included in the analysis. Expected values were obtained by multiplying the total number of defense systems by the fraction of co-localized defense systems. P values correspond to the Chi-square test.
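
The O/E calculation in panel (e) can be illustrated with a minimal sketch. This is one possible reading of the caption, not the authors' code: the function name `observed_expected`, its arguments and the toy counts are hypothetical, and the Chi-square test is assumed here to be a two-category goodness-of-fit test (co-localized versus not co-localized) with one degree of freedom.

```python
import math


def observed_expected(n_colocalized, n_total, background_fraction):
    """Return (O/E ratio, chi-square statistic, P value) for one system type.

    n_colocalized: observed number of co-localized systems (hypothetical input)
    n_total: total number of systems of that type
    background_fraction: overall fraction of co-localized defense systems
    """
    if not 0 < background_fraction < 1 or n_total <= 0:
        raise ValueError("need n_total > 0 and 0 < background_fraction < 1")
    expected = n_total * background_fraction
    ratio = n_colocalized / expected
    # Two categories: co-localized and not co-localized.
    observed_rest = n_total - n_colocalized
    expected_rest = n_total - expected
    chi2 = ((n_colocalized - expected) ** 2 / expected
            + (observed_rest - expected_rest) ** 2 / expected_rest)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return ratio, chi2, p_value


# Toy numbers, not taken from the study.
ratio, chi2, p_value = observed_expected(30, 100, 0.2)
print(f"O/E = {ratio:.2f}, chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

An O/E ratio above 1 would indicate more co-localization than expected under the background fraction; the closed-form P value used here only holds for one degree of freedom.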

Extended Data Fig. 2 Relation between gene flux and CRISPR spacer content.

(a) Association between genetic flux (horizontal gene transfer (HGT) and homologous recombination (HR, computed using both ClonalFrameML (CFML) and Geneconv)) and number of CRISPR spacers. The latter was used as a proxy for CRISPR activity. Data were plotted after excluding very similar ST-1 genomes. The criteria for removing these genomes were based on similarities in R-M content and gene flux; that is, all ST-1 genomes except CD_020475, CD_020474 and CD_021026 were removed (n = 26). (b) Same as (a) but considering the complete genome dataset (n = 36). Spearman’s rank correlation coefficients (ρ) and associated P values (two-sided) are shown in each graph.
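
As a minimal sketch of the statistic reported in these panels, Spearman's ρ can be computed as the Pearson correlation of average ranks. The function names and the toy values below are hypothetical, and the two-sided P value is left out; a statistics library such as SciPy (`scipy.stats.spearmanr`) would normally supply both.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Toy values standing in for spacer counts and HGT events per genome.
spacers = [1, 2, 3, 4]
hgt_events = [1, 3, 2, 4]
print(round(spearman_rho(spacers, hgt_events), 3))
```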

Extended Data Fig. 3 Interplay between Type I R-M systems and gene flux in C. difficile.

(a) Observed/expected (O/E) ratios for Type I target recognition motifs in Clostridioides phage genomes. Six phage genomes representative of the Siphoviridae and Myoviridae families and tail types were analyzed (ϕCD111, ϕCDHM11, ϕMMP01, ϕMMP04, ϕC2, ϕCD38). O/E values were obtained with R’MES using Markov chain models that take oligonucleotide composition into consideration. For each motif, we tested whether the median value of the O/E ratio in phage genomes was significantly different from 1. In box plots, the middle line indicates the median value, boxes span the 25th and 75th percentiles, and whiskers indicate 1.5 times the interquartile range. ***P < 10⁻³; **P < 10⁻² (one-sided one-sample t-test). (b) Relation between HGT and O/E ratio for Type I target recognition motifs. For those C. difficile genomes harboring a single Type I R-M system (that is, without the confounding effect of multiple systems), we computed the average values of HGT, and plotted these values against the average O/E ratio for the corresponding target recognition motif in phage genomes. This was only possible for the n = 6 motifs indicated in brackets. Spearman’s rank correlation coefficient (ρ) and the associated P value (two-sided) are shown.

Extended Data Fig. 4 Genomic context and conservation of camA.

(a) CamA protein alignment among Clostridiales (C. mangenotii LM2 (587 aa, 56% identity), C. sordellii (598 aa, 53% identity), C. bifermentans WYM (579 aa, 53% identity), C. dakarense sp. nov (580 aa, 63% identity), Peptostreptococcaceae bacterium VA2) and Fusobacteriales (Psychrilyobacter atlanticus DSM 19335) using ClustalX. The nine conserved motifs (I-VIII and X) typically found in MTases are highlighted. (b) Phylogenetic tree obtained from the MTase alignment. (c) Phylogenetic tree of the 36 C. difficile strains colored by clade (hypervirulent, human/animal (HA) associated) and MLST sequence type (ST). Shown is the genomic context of camA across the entire dataset. (d) Expanded view of the region shown in Fig. 1f. The example shown (including coordinates) refers to the reference genome of C. difficile 630. The + and – signs correspond to the sense and antisense strands, respectively. Vertical bars correspond to the distribution of the CAAAAA motif.

Extended Data Fig. 5 ∆camA construction, purified spore analyses, broth culture growth, and sporulation kinetics.

(a) PCR to distinguish between wild-type camA and ∆camA using flanking primers and primers internal to the deletion. PCRs were performed twice independently. (b) Growth curves comparing wild-type camA, ∆camA, ∆camA-C, and camA/N165A cultures grown in BHIS liquid media. Early stationary-phase cultures were diluted to a starting O.D. of 0.05 in BHIS media and growth was measured over 9 h. Each genotype/time point pair corresponds to the mean of n = 3 independent biological replicates. Error bars correspond to standard deviation. (c) Phase-contrast microscopy analyses of sporulating culture samples prior to and after spore purification on a density gradient. No gross differences in spore morphology were observed between wild type and the MTase mutant. The germination efficiency (G.E.) of purified spores from the indicated strains is shown below. Scale bar represents 5 µm. Microscopy analyses were performed on three independent spore preparations. (d) Chloroform resistance of purified ∆camA spores relative to wild type. Spores were treated with 10% chloroform for 15 min, after which spore viability was measured by plating untreated and chloroform-treated spores on media containing germinant and counting colony-forming units. No significant differences in germination efficiency or chloroform resistance were observed. Data are presented as mean ± standard deviation of four independent biological replicates. (e) Heat-resistance (HRES) efficiencies of sporulating cultures 22 h after sporulation induction were determined relative to wild type. Data are presented as mean ± standard deviation. Three independent biological replicates per group were used. *** P < 10⁻³, one-way ANOVA with Tukey’s test. Source data

Extended Data Fig. 6 CAAAAA exceptionality, core- / pan-genome analyses of C. difficile, and homologous recombination (HR) landscape.

(a) Observed (O) numbers of CAAAAA motifs in the C. difficile chromosome (n = 7,824), intragenic (n = 6,131), extragenic (n = 1,693), and regulatory regions (n = 794, defined as the windows spanning from 100 bp upstream of the start codon to 50 bp downstream) were compared with expected (E) values computed in random sequences showing the same oligonucleotide composition. The significance of the difference between O and E was evaluated by computing a P value based on a Gaussian approximation of motif counts under a Markov model of order 4 (*** P < 10⁻³). (b) Core- and pan-genome sizes of C. difficile. The pan- and core-genomes were used to build gene accumulation curves. These curves describe the number of new genes (pan-genome) and genes in common (core-genome) obtained by adding a new genome to a previous set. The procedure was repeated 1,000 times by randomly modifying the order of integration of the n = 45 genomes in the analysis. Solid lines correspond to the average number of gene families obtained across all permutations, dashed lines indicate standard deviation of the mean, and shaded regions indicate range. The values of the constants obtained after Heaps’ law fitting are 2,887 and 0.271 for k and γ, respectively, thus implying an open pan-genome. (c) Spectrum of frequencies for C. difficile gene repertoires. It represents the number of genomes where the families of the pan-genome can be found, from 1 for strain-specific genes to 45 for core genes. Red indicates accessory genes and blue the genes that are highly persistent in C. difficile. (d) Graphical representation of the recombinational events in the core genome of C. difficile (inferred by ClonalFrameML). The HA and hypervirulent branches of the tree are depicted in colors. Substitutions are represented by vertical lines and recombination events by dark blue horizontal bars. Light blue vertical lines represent the absence of substitutions, and white lines refer to non-homoplasic substitutions. All other colors represent homoplasic substitutions, with increases in homoplasy associated with increases in the degree of redness (from white to red). (e) O/E ratios of orthologous variable CAAAAA motifs (compared to orthologous conserved) in the core-genome (excluding recombination tracts) (n = 770) and recombination tracts (n = 325), or (f) core (n = 1,095) and accessory genome (n = 1,415). P values correspond to the chi-square test.
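The observed/expected comparison in panel (a) can be illustrated with a minimal sketch. For a 6-mer such as CAAAAA, a Markov model of order 4 is the maximal-order model, for which a standard estimate of the expected count is N(w1..w5) × N(w2..w6) / N(w2..w5). The code below is an assumption-laden illustration only: it scans a single strand of a toy sequence, the function names are hypothetical, and it reproduces neither the software used in the study nor the Gaussian approximation behind the reported P values.

```python
def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(word)
    while start != -1:
        count += 1
        start = seq.find(word, start + 1)
    return count

def expected_max_order(seq, word):
    """Expected count of word under a Markov model of order len(word) - 2.

    Uses the plug-in estimate N(prefix) * N(suffix) / N(core), where
    prefix = word[:-1], suffix = word[1:], and core = word[1:-1].
    """
    core = count_overlapping(seq, word[1:-1])
    if core == 0:
        return 0.0
    prefix = count_overlapping(seq, word[:-1])
    suffix = count_overlapping(seq, word[1:])
    return prefix * suffix / core

def observed_expected(seq, word):
    """Return (observed, expected, O/E ratio) for word in seq."""
    observed = count_overlapping(seq, word)
    expected = expected_max_order(seq, word)
    ratio = observed / expected if expected else float("nan")
    return observed, expected, ratio

# Toy sequence for illustration only (not C. difficile data).
toy = "CAAAAAGCAAAAT"
obs, exp, ratio = observed_expected(toy, "CAAAAA")
print(obs, round(exp, 3), round(ratio, 3))
```

An O/E ratio above 1 indicates that the motif occurs more often than its constituent 5-mers and 4-mers would predict. A fuller analysis would also scan the reverse complement and assess significance, which this sketch leaves out.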

Extended Data Fig. 7 Non-methylated CAAAAA motif sites overlapping TFBSs and TSSs.

(a) Interpulse duration ratio (ipdR) density distribution of the terminal adenine of CAAAAA. Motifs were considered non-methylated if the terminal adenine had IPD ratios < 1.5 (stippled line), coverage > 20×, and methylation scores < 20 (gray distribution). Also shown for comparison are the sections delimited by quantiles (Q) 1, 5, 10, and 50. (b) Additional examples of highly conserved non-methylated CAAAAA motif sites (red ovals) and corresponding genetic context. Positions indicated above the graph correspond to the non-methylated base. (c) %CAAAAA motif sites (non-methylated (NM) and methylated (M)) overlapping CodY and XylR TFBSs for each C. difficile isolate excluding ST1 genomes (n = 23). (d) Additional examples of chromosomal regions for which non-methylated CAAAAA motif sites overlap TSSs (shown as arrows). (e) %CAAAAA motif sites (non-methylated and methylated) overlapping TSSs for each C. difficile isolate excluding ST1 genomes (n = 23). For box plots, the middle line indicates the median value, boxes span the 25th and 75th percentiles, and whiskers indicate 1.5 times the interquartile range. * P < 0.05, *** P < 10⁻³ (one-sided Mann-Whitney-Wilcoxon rank sum test with continuity correction).
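The three thresholds given in panel (a) amount to a simple per-site filter. The sketch below assumes a hypothetical record layout (`MotifSite` and its field names are illustrative, not the study's data format) and uses made-up example sites.

```python
from dataclasses import dataclass

@dataclass
class MotifSite:
    position: int      # coordinate of the terminal adenine (hypothetical field)
    ipd_ratio: float   # interpulse duration ratio at that base
    coverage: float    # sequencing coverage at that base
    score: float       # methylation score at that base

def is_non_methylated(site, max_ipd_ratio=1.5, min_coverage=20.0, max_score=20.0):
    """Apply the caption's thresholds: IPD ratio < 1.5, coverage > 20x, score < 20."""
    return (site.ipd_ratio < max_ipd_ratio
            and site.coverage > min_coverage
            and site.score < max_score)

# Made-up example sites, for illustration only.
sites = [
    MotifSite(1200, 1.1, 85.0, 4.0),    # meets all three criteria
    MotifSite(5400, 4.8, 120.0, 96.0),  # high IPD ratio and score: methylated
    MotifSite(9100, 1.2, 12.0, 3.0),    # coverage too low to call
]
non_methylated = [s.position for s in sites if is_non_methylated(s)]
print(non_methylated)
```

The coverage requirement matters because a low IPD ratio at a poorly covered base is weak evidence of absent methylation; such sites are left uncalled rather than counted as non-methylated.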

Extended Data Fig. 8 Principal Component Analysis (PCA) and MA-plots for RNA-seq data.

(a) PCA performed using DESeq2 rlog-normalized RNA-seq data (n = 3 biological replicates for each genotype). (b) MA-plots showing the variation of fold change with mean normalized counts (MNC). Number of genes represented: 3,532 (Exp), 3,426 (9 h), 3,523 (10.5 h), and 3,510 (Stat). Red-colored points have P values < 0.1 (Wald test, Benjamini-Hochberg adjusted). Points that fall outside the plotting window are shown as open triangles pointing either up or down.

Extended Data Fig. 9 DE, gene, and protein expression analyses.

(a) Enrichment of the CAAAAA motif in DE genes compared to non-DE ones either globally (left, n = 3,649 genes) or at each time point studied (right, nEXP = 3,641, nSPO_9 = 3,636, nSPO_10.5 = 3,644, nSTAT = 3,642). For box plots, the middle line indicates the median value, boxes span the 25th and 75th percentiles, and whiskers indicate 1.5 times the interquartile range. * P < 0.05, ** P < 10⁻², *** P < 10⁻³ (one-sided Mann-Whitney-Wilcoxon rank sum test with continuity correction). (b) Time-course change in the expression of genes under the control of the specific sigma factors (σF, σE, σG, and σK) and the master transcriptional activator Spo0A at both 9 and 10.5 h after sporulation induction (n = 121 and n = 124 genes, respectively). (c) Representative immunoblot time-course (from n = 2 independent biological replicates with similar results) comparing the levels of the early sporulation proteins σF, SpoIIQ, σE, and SpoIVA in WT and ∆camA at 8, 10, 12, 14, and 16 h following induction of sporulation. (d) Western blot for TcdA for each C. difficile genotype. (e) RT-qPCR of spoVD and cwp17 genes (n = 3 independent biological replicates) of exponential and stationary phase liquid broth cultures. Data are presented as mean ± standard deviation. * P < 0.05, ** P < 10⁻², two-tailed unpaired Student’s t-test. Source data

Extended Data Fig. 10 Overlap between multiple datasets of differentially expressed (DE) genes.

Comparisons were performed between DE genes called in this study for each time point (blue-shaded, n = 1,537) and those obtained from (a) Jenior et al. (black-shaded, n = 971) and (b) Fletcher et al. (gray-shaded, n = 299). Color intensities of the outermost layer represent the P value significance of the intersections (3,896 genes were used as background). The height of the corresponding bars is proportional to the number of common genes in the intersection (indicated at the top of the bars for pairwise comparisons between the different studies). Significant overlaps were found between our DE dataset and either (a) genes DE during infection in different mouse gut microbiome compositions (P < 10⁻⁶, one-tailed hypergeometric test implemented in SuperExactTest, Bonferroni adjusted), or (b) DE genes obtained from mouse gut isolates at increasing time points after infection (P < 10⁻⁴, one-tailed hypergeometric test implemented in SuperExactTest, Bonferroni adjusted).
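For a pairwise comparison, the one-tailed hypergeometric test asks how likely it is to see at least the observed number of shared genes when two gene sets are drawn from a common background. The sketch below is an illustration under stated assumptions: the set sizes (1,537 and 971) and background (3,896) come from the caption, but the overlap count of 450 is hypothetical, the function name is illustrative, and neither the multi-set extension nor the Bonferroni adjustment applied in the study is reproduced.

```python
from math import comb

def hypergeom_upper_tail(background, set_a, set_b, overlap):
    """P(X >= overlap) for the size of the intersection of two gene sets.

    Models drawing set_b genes without replacement from `background` genes,
    of which set_a belong to the first set. Exact integer arithmetic is used
    for the numerator and denominator, followed by a single division.
    """
    upper = min(set_a, set_b)
    favorable = sum(
        comb(set_a, i) * comb(background - set_a, set_b - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(background, set_b)

# Set sizes and background as stated in the caption.
background, n_this_study, n_other = 3896, 1537, 971
expected_overlap = n_this_study * n_other / background  # overlap expected by chance

# Hypothetical overlap count, for illustration only.
hypothetical_overlap = 450
p_value = hypergeom_upper_tail(background, n_this_study, n_other, hypothetical_overlap)
print(f"expected by chance: {expected_overlap:.1f}, P = {p_value:.3g}")
```

Summing exact binomial coefficients avoids the floating-point underflow that individual probability terms would hit at these set sizes.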

Supplementary information

Supplementary Information

Supplementary Notes.

Reporting Summary

Supplementary Tables 1–8.

Supplementary Data 1

Whole-genome alignment.

Supplementary Data 2

FASTQC files.

Source data

Source Data Extended Data Fig. 5

Unprocessed gel for Extended Data Fig. 5a.

Source Data Extended Data Fig. 9

Unprocessed western blot for Extended Data Fig. 9c: σE, Spo0A, σF, SpoIIQ and SpoIVA.


Cite this article

Oliveira, P.H., Ribis, J.W., Garrett, E.M. et al. Epigenomic characterization of Clostridioides difficile finds a conserved DNA methyltransferase that mediates sporulation and pathogenesis. Nat Microbiol 5, 166–180 (2020).
