Epigenomic characterization of Clostridioides difficile finds a conserved DNA methyltransferase that mediates sporulation and pathogenesis

Abstract

Clostridioides (formerly Clostridium) difficile is a leading cause of healthcare-associated infections. Although considerable progress has been made in the understanding of its genome, the epigenome of C. difficile and its functional impact has not been systematically explored. Here, we perform a comprehensive DNA methylome analysis of C. difficile using 36 human isolates and observe a high level of epigenomic diversity. We discovered an orphan DNA methyltransferase with a well-defined specificity, the corresponding gene of which is highly conserved across our dataset and in all of the approximately 300 global C. difficile genomes examined. Inactivation of the methyltransferase gene negatively impacts sporulation, a key step in C. difficile disease transmission, and these results are consistently supported by multiomics data, genetic experiments and a mouse colonization model. Further experimental and transcriptomic analyses suggest that epigenetic regulation is associated with cell length, biofilm formation and host colonization. These findings provide a unique epigenetic dimension to characterize medically relevant biological processes in this important pathogen. This study also provides a set of methods for comparative epigenomics and integrative analysis, which we expect to be broadly applicable to bacterial epigenomic studies.

Fig. 1: The methylomes of the 36 strains of C. difficile.
Fig. 2: CamA modulates sporulation levels in C. difficile.
Fig. 3: Abundance, distribution and conservation of CAAAAA motif sites.
Fig. 4: The distribution of non-methylated CAAAAA motif sites and their overlap with TFBSs and TSSs.
Fig. 5: Gene-expression analysis.
Fig. 6: In vivo and additional functional impacts of the ∆camA mutation.

Data availability

Genome assemblies and methylation data are available from NCBI under BioProject ID PRJNA448390. RNA-seq data are available under project ID PRJNA445308. Additional data are available from the corresponding authors on reasonable request.

Code availability

Scripts and a tutorial supporting all of the key analyses of this research are publicly available as a package named Bacterial Epigenome Analysis SuiTe (BEAST) at http://github.com/fanglab/.

References

  1.

    Smits, W. K., Lyras, D., Lacy, D. B., Wilcox, M. H. & Kuijper, E. J. Clostridium difficile infection. Nat. Rev. Dis. Primers 2, 16020 (2016).

  2.

    Sebaihia, M. et al. The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome. Nat. Genet. 38, 779–786 (2006).

  3.

    He, M. et al. Emergence and global spread of epidemic healthcare-associated Clostridium difficile. Nat. Genet. 45, 109–113 (2013).

  4.

    Herbert, M., O’Keeffe, T. A., Purdy, D., Elmore, M. & Minton, N. P. Gene transfer into Clostridium difficile CD630 and characterisation of its methylase genes. FEMS Microbiol. Lett. 229, 103–110 (2003).

  5.

    van Eijk, E. et al. Complete genome sequence of the Clostridium difficile laboratory strain 630Δerm reveals differences from strain 630, including translocation of the mobile element CTn5. BMC Genom. 16, 31 (2015).

  6.

    Hargreaves, K. R., Thanki, A. M., Jose, B. R., Oggioni, M. R. & Clokie, M. R. Use of single molecule sequencing for comparative genomics of an environmental and a clinical isolate of Clostridium difficile ribotype 078. BMC Genom. 17, 1020 (2016).

  7.

    Casadesus, J. & Low, D. Epigenetic gene regulation in the bacterial world. Microbiol. Mol. Biol. Rev. 70, 830–856 (2006).

  8.

    Low, D. A., Weyand, N. J. & Mahan, M. J. Roles of DNA adenine methylation in regulating bacterial gene expression and virulence. Infect. Immun. 69, 7197–7204 (2001).

  9.

    Cohen, N. R. et al. A role for the bacterial GATC methylome in antibiotic stress survival. Nat. Genet. 48, 581–586 (2016).

  10.

    Manso, A. S. et al. A random six-phase switch regulates pneumococcal virulence via global epigenetic changes. Nat. Commun. 5, 5055 (2014).

  11.

    Atack, J. M. et al. A biphasic epigenetic switch controls immunoevasion, virulence and niche adaptation in non-typeable Haemophilus influenzae. Nat. Commun. 6, 7828 (2015).

  12.

    Wion, D. & Casadesus, J. N6-methyl-adenine: an epigenetic signal for DNA-protein interactions. Nat. Rev. Microbiol. 4, 183–192 (2006).

  13.

    Oliveira, P. H., Touchon, M. & Rocha, E. P. Regulation of genetic flux between bacteria by restriction-modification systems. Proc. Natl Acad. Sci. USA 113, 5658–5663 (2016).

  14.

    Flusberg, B. A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7, 461–465 (2010).

  15.

    Beaulaurier, J., Schadt, E. E. & Fang, G. Deciphering bacterial epigenomes using modern sequencing technologies. Nat. Rev. Genet. 20, 157–172 (2019).

  16.

    Fang, G. et al. Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nat. Biotechnol. 30, 1232–1239 (2012).

  17.

    Murray, I. A. et al. The methylomes of six bacteria. Nucleic Acids Res. 40, 11450–11462 (2012).

  18.

    Davis, B. M., Chao, M. C. & Waldor, M. K. Entering the era of bacterial epigenomics with single molecule real time DNA sequencing. Curr. Opin. Microbiol. 16, 192–198 (2013).

  19.

    Smits, W. K. Hype or hypervirulence: a reflection on problematic C. difficile strains. Virulence 4, 592–596 (2013).

  20.

    Roberts, R. J. et al. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res. 31, 1805–1812 (2003).

  21.

    Wust, J., Sullivan, N. M., Hardegger, U. & Wilkins, T. D. Investigation of an outbreak of antibiotic-associated colitis by various typing methods. J. Clin. Microbiol. 16, 1096–1101 (1982).

  22.

    Barra-Carrasco, J. & Paredes-Sabja, D. Clostridium difficile spores: a major threat to the hospital environment. Future Microbiol. 9, 475–486 (2014).

  23.

    Dembek, M. et al. High-throughput analysis of gene essentiality and sporulation in Clostridium difficile. mBio 6, e02383 (2015).

  24.

    Donnelly, M. L., Fimlaid, K. A. & Shen, A. Characterization of Clostridium difficile spores lacking either SpoVAC or dipicolinic acid synthetase. J. Bacteriol. 198, 1694–1707 (2016).

  25.

    Shen, A., Fimlaid, K. A. & Pishdadian, K. Inducing and quantifying Clostridium difficile spore formation. Methods Mol. Biol. 1476, 129–142 (2016).

  26.

    Schbath, S. & Hoebeke, M. in Advances in Genomic Sequence Analysis and Pattern Discovery Vol. 7 (eds Elnitski, L. et al.) 25–64 (World Scientific, 2011).

  27.

    Knijnenburg, T. A. et al. Multiscale representation of genomic signals. Nat. Methods 11, 689–694 (2014).

  28.

    Huang, D. W. et al. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 35, W169–W175 (2007).

  29.

    Lim, H. N. & van Oudenaarden, A. A multistep epigenetic switch enables the stable inheritance of DNA methylation states. Nat. Genet. 39, 269–275 (2007).

  30.

    Ardissone, S. et al. Cell cycle constraints and environmental control of local DNA hypomethylation in α-Proteobacteria. PLoS Genet. 12, e1006499 (2016).

  31.

    Cota, I. et al. OxyR-dependent formation of DNA methylation patterns in OpvABOFF and OpvABON cell lineages of Salmonella enterica. Nucleic Acids Res. 44, 3595–3609 (2016).

  32.

    Fimlaid, K. A. et al. Global analysis of the sporulation pathway of Clostridium difficile. PLoS Genet. 9, e1003660 (2013).

  33.

    Pishdadian, K., Fimlaid, K. A. & Shen, A. SpoIIID-mediated regulation of σK function during Clostridium difficile sporulation. Mol. Microbiol. 95, 189–208 (2015).

  34.

    Fimlaid, K. A. & Shen, A. Diverse mechanisms regulate sporulation sigma factor activity in the Firmicutes. Curr. Opin. Microbiol. 24, 88–95 (2015).

  35.

    Saujet, L., Pereira, F. C., Henriques, A. O. & Martin-Verstraete, I. The regulatory network controlling spore formation in Clostridium difficile. FEMS Microbiol. Lett. 358, 1–10 (2014).

  36.

    Saujet, L. et al. Genome-wide analysis of cell type-specific gene transcription during spore formation in Clostridium difficile. PLoS Genet. 9, e1003756 (2013).

  37.

    Rosenbusch, K. E., Bakker, D., Kuijper, E. J. & Smits, W. K. C. difficile 630Δerm Spo0A regulates sporulation, but does not contribute to toxin production, by direct high-affinity binding to target DNA. PLoS ONE 7, e48608 (2012).

  38.

    Fimlaid, K. A., Jensen, O., Donnelly, M. L., Siegrist, M. S. & Shen, A. Regulation of Clostridium difficile spore formation by the SpoIIQ and SpoIIIA proteins. PLoS Genet. 11, e1005562 (2015).

  39.

    Ribis, J. W., Fimlaid, K. A. & Shen, A. Differential requirements for conserved peptidoglycan remodeling enzymes during Clostridioides difficile spore formation. Mol. Microbiol. 110, 370–389 (2018).

  40.

    Maldarelli, G. A. et al. Type IV pili promote early biofilm formation by Clostridium difficile. Pathog. Dis. 74, ftw061 (2016).

  41.

    Jenior, M. L., Leslie, J. L., Young, V. B. & Schloss, P. D. Clostridium difficile colonizes alternative nutrient niches during infection across distinct murine gut microbiomes. mSystems 2, e00063-17 (2017).

  42.

    Fletcher, J. R., Erwin, S., Lanzas, C. & Theriot, C. M. Shifts in the gut metabolome and Clostridium difficile transcriptome throughout colonization and infection in a mouse model. mSphere 3, e00089-18 (2018).

  43.

    Lessa, F. C. et al. Burden of Clostridium difficile infection in the United States. N. Engl. J. Med. 372, 825–834 (2015).

  44.

    Deakin, L. J. et al. The Clostridium difficile spo0A gene is a persistence and transmission factor. Infect. Immun. 80, 2704–2711 (2012).

  45.

    Lewis, B. B. & Pamer, E. G. Microbiota-based therapies for Clostridium difficile and antibiotic-resistant enteric infections. Annu. Rev. Microbiol. 71, 157–178 (2017).

  46.

    Abt, M. C., McKenney, P. T. & Pamer, E. G. Clostridium difficile colitis: pathogenesis and host defence. Nat. Rev. Microbiol. 14, 609–620 (2016).

  47.

    Sanchez-Romero, M. A., Cota, I. & Casadesus, J. DNA methylation in bacteria: from the methyl group to the methylome. Curr. Opin. Microbiol. 25, 9–16 (2015).

  48.

    Griffiths, D. et al. Multilocus sequence typing of Clostridium difficile. J. Clin. Microbiol. 48, 770–778 (2010).

  49.

    Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  50.

    Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  51.

    Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  52.

    Oliveira, P. H., Touchon, M. & Rocha, E. P. The interplay of restriction-modification systems with mobile genetic elements and their prokaryotic hosts. Nucleic Acids Res. 42, 10618–10631 (2014).

  53.

    Roberts, R. J., Vincze, T., Posfai, J. & Macelis, D. REBASE-a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 43, D298–D299 (2015).

  54.

    Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).

  55.

    Katoh, K. & Standley, D. M. MAFFT: iterative refinement and additional methods. Methods Mol. Biol. 1079, 131–146 (2014).

  56.

    Gouy, M., Guindon, S. & Gascuel, O. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221–224 (2010).

  57.

    Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).

  58.

    Bland, C. et al. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinform. 8, 209 (2007).

  59.

    Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic Acids Res. 31, 371–373 (2003).

  60.

    Goldfarb, T. et al. BREX is a novel phage resistance system widespread in microbial genomes. EMBO J. 34, 169–183 (2015).

  61.

    Ofir, G. et al. DISARM is a widespread bacterial defence system with broad anti-phage activities. Nat. Microbiol. 3, 90–98 (2018).

  62.

    Makarova, K. S., Wolf, Y. I., van der Oost, J. & Koonin, E. V. Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements. Biol. Direct 4, 29 (2009).

  63.

    Doron, S. et al. Systematic discovery of antiphage defense systems in the microbial pangenome. Science 359, eaar4120 (2018).

  64.

    Xie, Y. et al. TADB 2.0: an updated database of bacterial type II toxin-antitoxin loci. Nucleic Acids Res. 46, D749–D753 (2018).

  65.

    Fouts, D. E. Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res. 34, 5839–5851 (2006).

  66.

    Arndt, D. et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 44, W16–W21 (2016).

  67.

    Cury, J., Jove, T., Touchon, M., Neron, B. & Rocha, E. P. Identification and analysis of integrons and cassette arrays in bacterial genomes. Nucleic Acids Res. 44, 4539–4550 (2016).

  68.

    Cury, J., Touchon, M. & Rocha, E. P. C. Integrative and conjugative elements and their hosts: composition, distribution and organization. Nucleic Acids Res. 45, 8943–8956 (2017).

  69.

    Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 5, 113 (2004).

  70.

    Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).

  71.

    Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).

  72.

    Touchon, M. et al. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 5, e1000344 (2009).

  73.

    Miele, V., Penel, S. & Duret, L. Ultra-fast sequence clustering from similarity networks with SiLiX. BMC Bioinform. 12, 116 (2011).

  74.

    Tettelin, H., Riley, D., Cattuto, C. & Medini, D. Comparative genomics: the bacterial pan-genome. Curr. Opin. Microbiol. 11, 472–477 (2008).

  75.

    Didelot, X. & Wilson, D. J. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput. Biol. 11, e1004041 (2015).

  76.

    Sawyer, S. Statistical tests for detecting gene conversion. Mol. Biol. Evol. 6, 526–538 (1989).

  77.

    Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).

  78.

    Pfeifer, B., Wittelsburger, U., Ramos-Onsins, S. E. & Lercher, M. J. PopGenome: an efficient Swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 31, 1929–1936 (2014).

  79.

    Csuros, M. Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 26, 1910–1912 (2010).

  80.

    Ng, Y. K. et al. Expanding the repertoire of gene tools for precise manipulation of the Clostridium difficile genome: allelic exchange using pyrE alleles. PLoS ONE 8, e56051 (2013).

  81.

    Sorg, J. A. & Dineen, S. S. Laboratory maintenance of Clostridium difficile. Curr. Protoc. Microbiol. 12, 9A.1.1–9A.1.10 (2009).

  82.

    Cartman, S. T. & Minton, N. P. A mariner-based transposon system for in vivo random mutagenesis of Clostridium difficile. Appl. Environ. Microbiol. 76, 1103–1109 (2010).

  83.

    Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).

  84.

    Donnelly, M. L. et al. A Clostridium difficile-specific, gel-forming protein required for optimal spore germination. mBio 8, e02085-16 (2017).

  85.

    Ducret, A., Quardokus, E. M. & Brun, Y. V. MicrobeJ, a tool for high throughput bacterial cell detection and quantitative analysis. Nat. Microbiol. 1, 16077 (2016).

  86.

    Ribis, J. W., Ravichandran, P., Putnam, E. E., Pishdadian, K. & Shen, A. The conserved spore coat protein SpoVM is largely dispensable in Clostridium difficile spore formation. mSphere 2, e00315-17 (2017).

  87.

    Edwards, A. N. et al. Chemical and stress resistances of Clostridium difficile spores and vegetative cells. Front. Microbiol. 7, 1698 (2016).

  88.

    Darling, A. E., Mau, B. & Perna, N. T. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5, e11147 (2010).

  89.

    Rissman, A. I. et al. Reordering contigs of draft genomes using the Mauve aligner. Bioinformatics 25, 2071–2073 (2009).

  90.

    Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).

  91.

    Novichkov, P. S. et al. RegPrecise 3.0—a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genom. 14, 745 (2013).

  92.

    Bailey, T. L. & Gribskov, M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48–54 (1998).

  93.

    Mirauta, B., Nicolas, P. & Richard, H. Parseq: reconstruction of microbial transcription landscape from RNA-seq read counts using state-space models. Bioinformatics 30, 1409–1416 (2014).

  94.

    Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  95.

    Kopylova, E., Noe, L. & Touzet, H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 28, 3211–3217 (2012).

  96.

    Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).

  97.

    Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2003).

  98.

    Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  99.

    Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).

  100.

    Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  101.

    Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).

  102.

    Mi, H., Muruganujan, A., Ebert, D., Huang, X. & Thomas, P. D. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. 47, D419–D426 (2019).

  103.

    Wang, M., Zhao, Y. & Zhang, B. Efficient test and visualization of multi-set intersections. Sci. Rep. 5, 16923 (2015).

  104.

    Anjuwon-Foster, B. R., Maldonado-Vazquez, N. & Tamayo, R. Characterization of flagellum and toxin phase variation in Clostridioides difficile ribotype 012 isolates. J. Bacteriol. 200, e00056-18 (2018).

  105.

    Chen, X. et al. A mouse model of Clostridium difficile-associated disease. Gastroenterology 135, 1984–1992 (2008).

  106.

    McKee, R. W., Aleksanyan, N., Garrett, E. M. & Tamayo, R. Type IV pili promote Clostridium difficile adherence and persistence in a mouse model of infection. Infect. Immun. 86, e00943-17 (2018).

  107.

    Woods, E. C., Edwards, A. N., Childress, K. O., Jones, J. B. & McBride, S. M. The C. difficile clnRAB operon initiates adaptations to the host environment in response to LL-37. PLoS Pathog. 14, e1007153 (2018).

  108.

    Purcell, E. B. et al. A nutrient-regulated cyclic diguanylate phosphodiesterase controls Clostridium difficile biofilm and toxin production during stationary phase. Infect. Immun. 85, e00347-17 (2017).

  109.

    Pereira, F. C. et al. The spore differentiation pathway in the enteric pathogen Clostridium difficile. PLoS Genet. 9, e1003782 (2013).

  110.

    Serrano, M. et al. A recombination directionality factor controls the cell type-specific activation of σK and the fidelity of spore development in Clostridium difficile. PLoS Genet. 12, e1006312 (2016).

  111.

    Theriot, C. M. et al. Cefoperazone-treated mice as an experimental platform to assess differential virulence of Clostridium difficile strains. Gut Microbes 2, 326–334 (2011).

Acknowledgements

We thank R. J. Roberts (New England Biolabs) for his help with the prediction of R–M systems and orphan MTases in C. difficile genomes using REBASE Tools and for providing comments. He was originally an author of this manuscript; however, as a staunch supporter of the open access movement, he will not author a paper that is not open access. We also thank E. P. C. Rocha (Institut Pasteur, Paris, France) for reading the manuscript and for providing comments. The research was primarily funded by R01 GM114472 (to G.F.) from the National Institutes of Health and Icahn Institute for Genomics and Multiscale Biology. The research was also funded by NIH grants R01 AI119145 (to H.v.B. and A.B.), R01 AI22232 (to A.S.), R01 AI107029 (to R.T.) and R35 GM131780 (to A.K.A.), a Hirschl Research Scholar award from the Irma T. Hirschl/Monique Weill-Caulier Trust (to G.F.) and a Pew Scholar in the Biomedical Sciences grant from the Pew Charitable Trusts (to A.S.). G.F. is a Nash Family Research Scholar. A.S. holds an Investigators in the Pathogenesis of Infectious Disease Award from the Burroughs Wellcome Fund. J.W.R. was supported by NIH training grant 5T32GM007310-42. The participation of R. J. Roberts in this project was funded by New England Biolabs. This research was also supported in part through the computational resources and staff expertise provided by the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai.

Author information

G.F. conceived the hypothesis. A.S. and G.F. supervised the project. P.H.O. and G.F. designed the computational methods. P.H.O., R.T., A.S. and G.F. designed the experiments. P.H.O. performed most of the computational analyses and developed most of the scripts that support the analyses. J.W.R. performed the growth curves, microscopy analyses (fluorescence and phase contrast), analyses of cell length and sporulation stage, and RT–qPCR analyses of sporulation genes, and isolated some of the RNA and processed it for RT–qPCR studies. A.S. constructed the deletion and catalytic ∆camA mutants, performed complementation, isolated and processed the RNA for several of the RNA analyses, and performed many of the sporulation phenotypic assays. E.M.G. and D.T. performed the animal infection experiment and analysed the data under the supervision of R.T. A.Kim and G.F. performed methylation motif discovery and refinement. O.S. and E.A.M. performed RT–qPCR controls for RNA-seq analyses. O.S., E.A.M., G.D., M.L.-S., C.B., N.E.Z., D.R.A., I.O., G.P., F.W., C.H., S.H., R.S., H.v.B. and A.S. contributed to the other experiments. G.D., I.O. and R.S. designed and conducted SMRT-seq. P.H.O., J.W.R., E.M.G., D.T., A.Kim, O.S., T.P., S.Z., E.A.M., M.T., C.B., S.B., A.K.A., A.B., R.T., E.E.S., R.S., H.v.B., A.Kasarskis, A.S. and G.F. analysed the data. P.H.O., R.T., A.S. and G.F. wrote the manuscript with additional input from other co-authors.

Correspondence to Aimee Shen or Gang Fang.

Ethics declarations

Competing interests

A.S. has a consultant role for BioVector, a diagnostic start-up. The other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Multiple defense systems and gene flux control in C. difficile.

(a) Aggregate heatmap depicting the abundance of defense systems (R-M, abortive infection (Abi), average number of spacers per CRISPR, toxin-antitoxin (T-A), and Shedu systems (other)), homologous recombination (HR) events (given by Geneconv and ClonalFrameML (CFML)), horizontal gene transfer (HGT, given by Wagner parsimony), and the number of phage-targeting CRISPR spacers (Supplementary Notes). Phages were clustered according to their family (Siphoviridae (S), Myoviridae (M)) and tail type. (b) Cas genes detected in C. difficile. Apart from the complete Type-IB gene cluster (cas1-cas8), we also observed two truncated gene clusters lacking cas1, cas2, and cas4. One of the truncated operons was present across all genomes, while the second was restricted to ST-1 and ST-55. (c) Example of a putative ‘defense island’ detected in CD_020472 harboring a Druantia-like system, two T-A systems, two solitary MTases, and one Type I R-M system. The Druantia-like system resembles the previously reported Type II Druantia systems (ref. 63) in that a PF00271 helicase conserved C-terminal domain and DUF1998 (PF09369) are associated with a nearby cytosine methylase. However, it lacks a PF00270 DEAx box helicase. (d) Genomic context of the sduA gene in CD_22456, which belongs to the newly identified Shedu defense system. The gene is located in an integrative conjugative element (ICE) (Supplementary Table 2d). (e) Observed/expected (O/E) ratios for co-localized defense systems (a maximum of 10 genes apart). Only the most abundant systems were included in the analysis. Expected values were obtained by multiplying the total number of defense systems by the fraction of co-localized defense systems. P values correspond to the chi-square test.

Extended Data Fig. 2 Relation between gene flux and CRISPR spacer content.

(a) Association between genetic flux (horizontal gene transfer (HGT) and homologous recombination (HR, computed using both ClonalFrameML (CFML) and Geneconv)) and the number of CRISPR spacers. The number of spacers was used as a proxy for CRISPR activity. Data were plotted after excluding very similar ST-1 genomes. Genomes were removed on the basis of similarity in R-M content and gene flux; that is, all ST-1 genomes except CD_020475, CD_020474 and CD_021026 were removed (n = 26). (b) Same as (a) but considering the complete genome dataset (n = 36). Spearman’s rank correlation coefficients (ρ) and associated P values (two-sided) are shown in each graph.
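The rank correlation reported in these panels can be illustrated with a minimal sketch. It computes Spearman's ρ as the Pearson correlation of average ranks, which handles ties; the function names and the toy values are hypothetical, not taken from the study, and the P value calculation is omitted.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data: spacer counts versus an HGT estimate per genome.
spacers = [1, 2, 3, 4, 5]
hgt = [5, 6, 7, 8, 7]
rho = spearman_rho(spacers, hgt)
```

In practice a statistics library would normally be used to obtain the two-sided P value alongside ρ.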

Extended Data Fig. 3 Interplay between Type I R-M systems and gene flux in C. difficile.

(a) Observed/expected (O/E) ratios for Type I target recognition motifs in Clostridioides phage genomes. Six phage genomes representative of the Siphoviridae and Myoviridae families and tail types were analyzed (ϕCD111, ϕCDHM11, ϕMMP01, ϕMMP04, ϕC2, ϕCD38). O/E values were obtained with R’MES using Markov chain models that take oligonucleotide composition into consideration. For each motif, we tested whether the median O/E ratio in phage genomes was significantly different from 1. In box plots, the middle line indicates the median value, boxes span the 25th to 75th percentiles, and whiskers indicate 1.5 times the interquartile range. ***P < 10⁻³; **P < 10⁻² (one-sided one-sample t-test). (b) Relation between HGT and O/E ratio for Type I target recognition motifs. For those C. difficile genomes harboring a single Type I R-M system (that is, without the confounding effect of multiple systems), we computed the average values of HGT and plotted these values against the average O/E ratio for the corresponding target recognition motif in phage genomes. This was only possible for the n = 6 motifs indicated in brackets. Spearman’s rank correlation coefficient (ρ) and the associated P value (two-sided) are shown.

Extended Data Fig. 4 Genomic context and conservation of camA.

(a) CamA protein alignment among Clostridiales (C. mangenotii LM2 (587 aa, 56% identity), C. sordellii (598 aa, 53% identity), C. bifermentans WYM (579 aa, 53% identity), C. dakarense sp. nov (580 aa, 63% identity), Peptostreptococcaceae bacterium VA2) and Fusobacteriales (Psychrilyobacter atlanticus DSM 19335) using ClustalX. The nine conserved motifs (I-VIII and X) typically found in MTases are highlighted. (b) Phylogenetic tree obtained from the MTase alignment. (c) Phylogenetic tree of the 36 C. difficile strains colored by clade (hypervirulent, human/animal (HA) associated) and MLST sequence type (ST). Shown is the genomic context of camA across the entire dataset. (d) Expanded view of the region shown in Fig. 1f. The example shown (including coordinates) refers to the reference genome of C. difficile 630. The + and – signs correspond to the sense and antisense strands, respectively. Vertical bars correspond to the distribution of the CAAAAA motif.
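The strand-specific motif distribution in panel (d) can be sketched as a simple scan for CAAAAA on both strands. This is a minimal illustration, not the authors' pipeline: the function names and the toy sequence are hypothetical, and each site is reported as the 0-based forward-strand coordinate of the terminal adenine of the motif.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif_sites(genome, motif="CAAAAA"):
    """List (position, strand) for every motif occurrence on either strand.

    position is the 0-based forward-strand coordinate of the last base of
    the motif (the terminal adenine for CAAAAA).
    """
    genome = genome.upper()
    k = len(motif)
    motif_rc = reverse_complement(motif)
    sites = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif:
            # Forward strand: the terminal base is the last base of the window.
            sites.append((i + k - 1, "+"))
        if window == motif_rc:
            # Reverse strand: the terminal base pairs with the window's first base.
            sites.append((i, "-"))
    return sites


# Hypothetical toy sequence with one site on each strand.
sites = find_motif_sites("GGCAAAAATTTTTTGCC")
```

Reporting the coordinate of the terminal adenine rather than the motif start makes it straightforward to join each site to per-base methylation calls.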

Extended Data Fig. 5 ∆camA construction, purified spore analyses, broth culture growth, and sporulation kinetics.

(a) PCR to distinguish between wild-type camA and ∆camA using flanking primers and primers internal to the deletion. PCRs were performed twice independently. (b) Growth curves comparing wild-type camA, ∆camA, ΔcamA-C, and camA/N165A cultures grown in BHIS liquid media. Early stationary-phase cultures were diluted to a starting O.D. of 0.05 in BHIS media and growth was measured over 9 h. Each genotype/timepoint pair corresponds to the mean of n = 3 independent biological replicates. Error bars correspond to standard deviation. (c) Phase-contrast microscopy analyses of sporulating culture samples prior to and after spore purification on a density gradient. No gross differences in spore morphology were observed between wild type and the MTase mutant. The germination efficiency (G.E.) of purified spores from the indicated strains is shown below. Scale bar represents 5 µm. Microscopy analyses were performed on three independent spore preparations. (d) Chloroform resistance of purified ∆camA spores relative to wild type. Spores were treated with 10% chloroform for 15 min, after which spore viability was measured by plating untreated and chloroform-treated spores on media containing germinant and counting colony-forming units. No significant differences in germination efficiency or chloroform resistance were observed. Data are presented as mean ± standard deviation of four independent biological replicates. (e) Heat-resistance (HRES) efficiencies of sporulating cultures 22 h after sporulation induction were determined relative to wild type. Data are presented as mean ± standard deviation. Three independent biological replicates per group were used. ***P < 10⁻³, one-way ANOVA with Tukey’s test.

Extended Data Fig. 6 CAAAAA exceptionality, core- / pan-genome analyses of C. difficile, and homologous recombination (HR) landscape.

(a) Observed (O) numbers of CAAAAA motifs in the C. difficile chromosome (n = 7,824), intragenic (n = 6,131), extragenic (n = 1,693), and regulatory regions (n = 794, defined as the windows spanning 100 bp upstream of the start codon to 50 bp downstream) were compared with expected (E) values computed in random sequences showing the same oligonucleotide composition. The significance of the difference between O and E was evaluated by computing a P value based on a Gaussian approximation of motif counts under a Markov model of order 4 (*** P < 10⁻³). (b) Core- and pan-genome sizes of C. difficile. The pan- and core-genomes were used to build gene accumulation curves. These curves describe the number of new genes (pan-genome) and genes in common (core-genome) obtained by adding a new genome to a previous set. The procedure was repeated 1,000 times by randomly modifying the order of integration of the n = 45 genomes in the analysis. Solid lines correspond to the average number of gene families obtained across all permutations, dashed lines indicate standard deviation of the mean, and shaded regions indicate range. The constants obtained after fitting Heaps’ law are k = 2,887 and γ = 0.271, thus implying an open pan-genome. (c) Frequency spectrum of C. difficile gene repertoires. It represents the number of genomes in which each family of the pan-genome is found, from 1 for strain-specific genes to 45 for core genes. Red indicates accessory genes and blue the genes that are highly persistent in C. difficile. (d) Graphical representation of the recombinational events in the core genome of C. difficile (inferred by ClonalFrameML). The HA and hypervirulent branches of the tree are depicted in colors. Substitutions are represented by vertical lines and recombination events by dark blue horizontal bars. Light blue vertical lines represent the absence of substitutions, and white lines refer to non-homoplasic substitutions. All other colors represent homoplasic substitutions, with increases in homoplasy associated with increases in the degree of redness (from white to red). (e) O/E ratios of orthologous variable CAAAAA motifs (compared to orthologous conserved) in the core-genome (excluding recombination tracts) (n = 770) and recombination tracts (n = 325), or (f) core (n = 1,095) and accessory genome (n = 1,415). P values correspond to the Chi-square test.
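The permutation procedure behind the accumulation curves in panel (b) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input format (one set of gene-family identifiers per genome), the function names, and the log-log least-squares fit are assumptions made for illustration. The power-law form n = k·N^γ used for Heaps' law is also an assumption, since the caption reports only the fitted constants.

```python
import math
import random


def accumulation_curves(genomes, n_permutations=1000, seed=0):
    """Mean pan- and core-genome sizes as genomes are added in random order.

    `genomes` is a list of sets of gene-family identifiers (hypothetical input).
    """
    rng = random.Random(seed)
    n = len(genomes)
    pan_totals = [0] * n
    core_totals = [0] * n
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)  # random order of genome integration
        pan, core = set(), None
        for i, families in enumerate(order):
            pan |= families  # union: all families seen so far
            core = set(families) if core is None else core & families
            pan_totals[i] += len(pan)
            core_totals[i] += len(core)
    pan_mean = [t / n_permutations for t in pan_totals]
    core_mean = [t / n_permutations for t in core_totals]
    return pan_mean, core_mean


def fit_heaps(pan_sizes):
    """Fit n = k * N**gamma by least squares on log-transformed values.

    Assumes pan_sizes[i] is the mean pan-genome size after i + 1 genomes.
    """
    xs = [math.log(i + 1) for i in range(len(pan_sizes))]
    ys = [math.log(s) for s in pan_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    gamma = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    k = math.exp(my - gamma * mx)
    return k, gamma
```

Under this assumed form, a positive γ means the pan-genome keeps growing as genomes are added, which is how the reported γ = 0.271 can be read as indicating an open pan-genome.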

Extended Data Fig. 7 Non-methylated CAAAAA motif sites overlapping TFBSs and TSSs.

(a) Interpulse duration ratio (ipdR) density distribution of the terminal adenine of CAAAAA. Motifs were considered as non-methylated if the terminal adenine had IPD ratios < 1.5 (stippled line), coverage > 20×, and methylation scores < 20 (gray distribution). Also shown for comparison are the sections delimited by quantiles (Q) 1, 5, 10, and 50. (b) Additional examples of highly conserved non-methylated CAAAAA motif sites (red ovals) and corresponding genetic context. Positions indicated above the graph correspond to the non-methylated base. (c) Percentage of CAAAAA motif sites (non-methylated (NM) and methylated (M)) overlapping CodY and XylR TFBSs for each C. difficile isolate excluding ST1 genomes (n = 23). (d) Additional examples of chromosomal regions for which non-methylated CAAAAA motif sites overlap TSSs (shown as arrows). (e) Percentage of CAAAAA motif sites (non-methylated and methylated) overlapping TSSs for each C. difficile isolate excluding ST1 genomes (n = 23). For box plots, the middle line indicates the median value, boxes span the 25th to 75th percentiles, and whiskers indicate 1.5 times the interquartile range. * P < 0.05, *** P < 10⁻³ (one-sided Mann-Whitney-Wilcoxon rank sum test with continuity correction).
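The three thresholds in panel (a) amount to a simple filter, sketched below in Python. The `MotifSite` record and its field names are hypothetical; only the cut-offs (IPD ratio < 1.5, coverage > 20×, methylation score < 20) come from the caption.

```python
from dataclasses import dataclass


@dataclass
class MotifSite:
    """Hypothetical record for the terminal adenine of one CAAAAA site."""
    position: int
    ipd_ratio: float
    coverage: float
    score: float


def is_non_methylated(site, max_ipd_ratio=1.5, min_coverage=20, max_score=20):
    """Apply the caption's thresholds; all three conditions must hold."""
    return (
        site.ipd_ratio < max_ipd_ratio
        and site.coverage > min_coverage
        and site.score < max_score
    )


def non_methylated_sites(sites):
    """Return the subset of sites classified as non-methylated."""
    return [s for s in sites if is_non_methylated(s)]
```

Requiring a minimum coverage alongside the low IPD ratio and score helps keep poorly covered sites from being called non-methylated simply because the signal was too weak to measure.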

Extended Data Fig. 8 Principal Component Analysis (PCA) and MA-plots for RNA-seq data.

(a) PCA performed using DESeq2 rlog-normalized RNA-seq data (n = 3 biological replicates for each genotype). (b) MA-plots showing the variation of fold change with mean normalized counts (MNC). Number of genes represented: 3,532 (Exp), 3,426 (9 h), 3,523 (10.5 h), and 3,510 (Stat). Red-colored points have adjusted P values < 0.1 (Wald test, Benjamini-Hochberg adjusted). Points that fall outside the window are plotted as open triangles pointing either up or down.

Extended Data Fig. 9 DE, gene, and protein expression analyses.

(a) Enrichment of the CAAAAA motif in DE genes compared to non-DE ones either globally (left, n = 3,649 genes) or at each time point studied (right, nEXP = 3,641, nSPO_9 = 3,636, nSPO_10.5 = 3,644, nSTAT = 3,642). For box plots, the middle line indicates the median value, boxes span the 25th to 75th percentiles, and whiskers indicate 1.5 times the interquartile range. * P < 0.05, ** P < 10⁻², *** P < 10⁻³ (one-sided Mann-Whitney-Wilcoxon rank sum test with continuity correction). (b) Time-course change in the expression of genes under the control of the specific sigma factors (σF, σE, σG, and σK) and master transcriptional activator Spo0A at both 9 and 10.5 h after sporulation induction (respectively n = 121 and n = 124 genes). (c) Representative immunoblot time-course (from n = 2 independent biological replicates with similar results) comparing the levels of the early sporulation proteins σF, SpoIIQ, σE, and SpoIVA in WT and ∆camA at 8, 10, 12, 14, and 16 h following induction of sporulation. (d) Western blot for TcdA for each C. difficile genotype. (e) RT-qPCR of spoVD and cwp17 genes (n = 3 independent biological replicates) of exponential and stationary phase liquid broth cultures. Data are presented as mean ± standard deviation. * P < 0.05, ** P < 10⁻², two-tailed unpaired Student’s t-test.

Extended Data Fig. 10 Overlap between multiple datasets of differentially expressed (DE) genes.

Comparisons were performed between DE genes called in this study for each time point (blue-shaded, n = 1,537) and those obtained from (a) Jenior et al. (black-shaded, n = 971) and (b) Fletcher et al. (gray-shaded, n = 299). Color intensities of the outermost layer represent the P value significance of the intersections (3,896 genes were used as background). The height of the corresponding bars is proportional to the number of common genes in the intersection (indicated at the top of the bars for pairwise comparisons between the different studies). Significant overlaps were found between our DE dataset and either (a) genes DE during infection in different mouse gut microbiome compositions (P < 10⁻⁶, one-tailed hypergeometric test implemented in SuperExactTest, Bonferroni adjusted), or (b) DE genes obtained from mouse gut isolates at increasing time points after infection (P < 10⁻⁴, one-tailed hypergeometric test implemented in SuperExactTest, Bonferroni adjusted).
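For a pairwise comparison, the one-tailed hypergeometric test asks how likely it is to see an overlap at least as large as the observed one when two gene sets are drawn at random from the same background. The Python sketch below illustrates that calculation with exact integer arithmetic. It is a generic illustration, not the SuperExactTest implementation, and the function and parameter names are assumptions.

```python
from math import comb


def hypergeom_overlap_pvalue(background, size_a, size_b, overlap):
    """P(X >= overlap) for two gene sets drawn from a shared background.

    X is the number of genes shared by a set of size_a and a set of size_b
    when both are sampled from `background` genes.
    """
    upper = min(size_a, size_b)
    favourable = sum(
        comb(size_a, i) * comb(background - size_a, size_b - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, size_b)


def bonferroni(p_value, n_tests):
    """Bonferroni adjustment, capped at 1."""
    return min(1.0, p_value * n_tests)
```

With the figure's numbers, `background` would be 3,896 and the set sizes 1,537 and 971 (or 299). The observed overlap is not given in the caption, so it would have to be read from the intersection bars.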

Supplementary information

Supplementary Information

Supplementary Notes.

Reporting Summary

Supplementary Tables 1–8

Supplementary Tables 1–8.

Supplementary Data 1

Whole-genome alignment.

Supplementary Data 2

FASTQC files.

Source data

Source Data Extended Data Fig. 5

Unprocessed gel for Extended Data Fig. 5a.

Source Data Extended Data Fig. 9

Unprocessed western blot for Extended Data Fig. 9c: σE, Spo0A, σF, SpoIIQ and SpoIVA.

About this article

Cite this article

Oliveira, P.H., Ribis, J.W., Garrett, E.M. et al. Epigenomic characterization of Clostridioides difficile finds a conserved DNA methyltransferase that mediates sporulation and pathogenesis. Nat Microbiol (2019) doi:10.1038/s41564-019-0613-4
