Exoproteomic analysis of two MLST clade 2 strains of Clostridioides difficile from Latin America reveals close similarities

Clostridioides difficile BI/NAP1/ribotype 027 is an epidemic hypervirulent strain found worldwide, including in Latin America. We examined the genomes and exoproteomes of two multilocus sequence type (MLST) clade 2 C. difficile strains considered hypervirulent: ICC-45 (ribotype SLO231/UK[CE]821), isolated in Brazil, and NAP1/027/ST01 (LIBA5756), isolated during a 2010 outbreak in Costa Rica. C. difficile isolates were cultured, and extracellular proteins were analyzed using high-performance liquid chromatography-tandem mass spectrometry. Genomic analysis revealed that the two isolates shared most of their gene content: only 83 genes in ICC-45 and 290 genes in NAP1/027 were considered singletons. Exoproteome analysis identified 197 proteins, of which 192 were found in both strains; the remaining five proteins were exclusive to the ICC-45 strain. These five proteins are involved in catalytic and binding functions and interact indirectly with proteins related to pathogenicity. Most proteins, including TcdA, TcdB, flagellin subunit, and cell surface protein, were overrepresented in the ICC-45 strain; 14 proteins, including mature S-layer protein, were present in higher proportions in LIBA5756. Data are available via ProteomeXchange with identifier PXD026218. These data show close similarity between the genomes and the supernatant proteins of two strains with hypervirulent features isolated in Latin America and underscore the importance of epidemiological surveillance of the transmission and emergence of new strains.

www.nature.com/scientificreports/ and metabolic activity. From that analysis, we can better understand the molecular mechanisms associated with CDI and potentially identify cellular targets for therapeutic purposes 6,8 . Few studies have analyzed the molecular epidemiology of CDI in Latin America. A recent study compared whole-genome sequences of 25 NAP1, RT027, or ST01 C. difficile clinical isolates with 129 isolates of the same genotype collected worldwide. These lineages entered Mexico, Costa Rica, Honduras, and Chile from different geographic areas, suggesting that the BI/NAP1/RT027/ST01 isolates from these countries are susceptible to acquiring distinct single-nucleotide polymorphisms and genes implicated in antibiotic resistance 5 . The epidemic NAP1/027 strain has not been isolated in Brazil to date; however, clade 2 strains have been isolated in two locations, including the one analyzed here.
The aim of this study was to use a proteomic and genomic approach to compare two MLST clade 2 C. difficile strains with hypervirulent features, ICC-45 (ST41) 9 and NAP1/RT027 (ST01) 2,9 , isolated in Brazil and Costa Rica, respectively.

Results and discussion
Homologous proteins of C. difficile strains. Homology analysis of C. difficile ICC-45 revealed a total of 3840 proteins. Of these, 111 were exclusive to this strain: 83 were singletons and 28 were paralogous proteins. C. difficile NAP1/027 (LIBA5756) had 4012 proteins, of which 292 were exclusive to this strain; 290 of these were singletons, and only two were paralogous proteins (Fig. 1).
The numbers of proteins identified in the C. difficile strains in this study were similar to the numbers identified in other strains available in GenBank/NCBI (accession numbers: NZ_CM000658.1, NZ_CM000637.1, NZ_CM000661.1). These strains were isolated from patients diagnosed with severe C. difficile-associated disease in hospital settings and were analyzed in an unpublished comparative genome study. Two of the three strains (NZ_CM000658.1 and NZ_CM000661.1) are NAP1 strains; the third (NZ_CM000637.1) was classified as a NAP2-like strain. Of note, one paralogous group composed of 20 copies of the protein "IS200/IS605 family transposase ISBth17" was found in the C. difficile ICC-45 genome.
We found that 3729 proteins of C. difficile ICC-45 and 3720 proteins of C. difficile NAP1/027 (LIBA5756) fall into 3715 orthologous groups (Fig. 1). C. difficile ICC-45 has nine more orthologous proteins than C. difficile NAP1/027 (LIBA5756) because 17 orthologous groups contained unequal numbers of proteins from the two strains (Supplementary Table S1).
We identified that 3689 groups of the 3715 orthologous groups were shared between the two C. difficile strains and were composed of only one protein from each strain. Furthermore, 99.9% of the proteins that belong to these shared orthologous groups had a high shared sequence similarity (greater than 95%). This result is corroborated by Costa et al. 9 , who showed that both strains had a close phylogenetic relationship and belonged to the same hypervirulent clade (clade 2).
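The counts reported above are internally consistent, and a short bookkeeping script makes the arithmetic explicit. This is a minimal sketch that uses only the numbers stated in this section; the class and field names are hypothetical and do not correspond to any OrthoMCL output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrainCounts:
    """Protein counts reported for one strain in the orthology analysis."""
    name: str
    total: int                  # all predicted proteins
    exclusive_singletons: int   # strain-specific proteins with no paralog
    exclusive_paralogs: int     # strain-specific paralogous proteins

    @property
    def exclusive(self) -> int:
        # Proteins without an ortholog in the other strain.
        return self.exclusive_singletons + self.exclusive_paralogs

    @property
    def in_shared_groups(self) -> int:
        # Every non-exclusive protein belongs to a shared orthologous group.
        return self.total - self.exclusive


icc45 = StrainCounts("ICC-45", total=3840, exclusive_singletons=83, exclusive_paralogs=28)
lib5756 = StrainCounts("NAP1/027 (LIBA5756)", total=4012, exclusive_singletons=290, exclusive_paralogs=2)

SHARED_GROUPS = 3715      # orthologous groups containing proteins from both strains
ONE_TO_ONE_GROUPS = 3689  # groups with exactly one protein from each strain

for s in (icc45, lib5756):
    print(f"{s.name}: {s.exclusive} exclusive, {s.in_shared_groups} in shared groups")

# The ICC-45 surplus must come from groups with unequal membership.
surplus = icc45.in_shared_groups - lib5756.in_shared_groups
not_one_to_one = SHARED_GROUPS - ONE_TO_ONE_GROUPS
print("ICC-45 surplus:", surplus)
print("groups that are not one-to-one:", not_one_to_one)
```

Running it recovers 111 and 292 exclusive proteins, 3729 and 3720 proteins in shared groups, the nine-protein surplus of ICC-45, and 26 shared groups that are not strictly one-to-one (the 17 uneven groups are a subset of these).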
Analysis of proteins identified in culture supernatants of C. difficile strains. A total of 197 proteins were identified in the supernatants of the two C. difficile strains. Of these, 192 were detected in both strains and five only in ICC-45. Supplementary Table S2 lists all proteins identified, their accession numbers, and total spectrum counts.
The orthology analysis showed that the coding sequences for these five proteins were present in the genomes of both strains. Differences in gene regulation might therefore explain why the five proteins were detected only in ICC-45. Dupuy et al. 10 found that several environmental factors affected protein expression in C. difficile. Furthermore, pathogenic organisms tightly regulate the expression of their proteins to survive within the host 11,12 .
Considering the ICC-45 exclusive proteins, phosphoglyceromutase (gpmI) plays an essential role in glycolysis and gluconeogenesis. This enzyme interconverts 3-phosphoglyceric acid and 2-phosphoglyceric acid 13 . Nukui et al. 14 showed that cofactor-independent phosphoglyceromutase is a crucial enzyme for the growth of cells and spores in Bacillus species.
Members of the Rrf2 family of transcriptional regulators are relatively small proteins (12-18 kDa), exemplified by four regulators (CymR, NsrR, RirA, and IscR) 15 . The iron-sulfur cluster biosynthesis regulator IscR harbors a [2Fe-2S] cluster and coordinates the use of iron and cysteine to form Fe/S clusters 16 . In Escherichia coli and other bacteria, the genes involved in this process are regulated by IscR in response to [Fe-S] availability and, consequently, are induced during iron deficiency and oxidative stress 14,15 .
Among the exclusive proteins from the ICC-45 strain, we identified a conserved hypothetical protein of 44 kDa. A comparison against genomic databases showed that this protein is 100% identical to coenzyme F420:γ-glutamyl ligase (FbiB) in C. difficile. Coenzyme F420 refers to a family of redox-active cofactors found mainly in archaea and actinobacteria (including mycobacteria), and FbiB is an enzyme of its biosynthetic pathway 17 . Studies have suggested that coenzyme F420 protects Mycobacterium tuberculosis against oxidative and nitrosative stress during pathogenesis 18,19 .
Of the 192 proteins shared by the two C. difficile strains, 26 were subjectively selected on the basis of current knowledge of their function and role in the bacteria. These 26 proteins were grouped by activity into six categories: 1) pathogenicity (toxins, cell surface proteins, flagellar proteins, cell wall proteins, hydrolases, and proteases), 2) resistance to antimicrobials (beta-lactamases and pyruvate-ferredoxin oxidoreductase), 3) oxidative stress and thermal shock (chaperones), 4) resistance to nitric oxide (nitric oxide reductase flavorubredoxin), 5) metabolism and catalytic activity (trehalose-6-phosphate hydrolase and cysteine desulfurase), and 6) other activities (transcription elongation factor) (Table 2).
In comparison with NAP1/027, the exoproteome of ICC-45 contained higher proportions of proteins involved in CDI pathogenesis, including the S-layer precursor protein (cell surface protein), TcdA, TcdB, cell surface protein Cwp19, cell wall protein Cwp22, cell-wall hydrolase, cell wall binding protein Cwp28, flagellin (FliC), and cysteine protease Cwp84 (Table 2). These proteins contribute to the inflammatory response observed in C. difficile pathogenesis 20 , which might explain the similar increases in myeloperoxidase, proinflammatory cytokines, oxidative stress response, tissue nitrite, and epithelial damage previously reported in an animal model injected with supernatants of these two strains 9 . However, using western blotting, Costa et al. 9 showed that ICC-45 releases less toxin than does NAP1/027. This divergence might result from the different cultivation times used in the two studies (96 h in the previous study, 24 h in the present study). The present study also did not filter the supernatants. In addition, the antibodies used in the previous study were not strain-specific, which could have led to underestimation of the variant TcdB produced by ICC-45.
Our finding that ICC-45 has a higher proportion of cell surface protein (S-layer precursor protein) than does NAP1/027 (LIBA5756) is in accord with previous findings that ST1 NAP1 produces a low proportion of the unprocessed precursor 21 . However, NAP1/027 (LIBA5756) expressed a higher proportion of mature S-layer protein, which is formed when SlpA undergoes proteolytic cleavage by the protease Cwp84. The mature S-layer consists of two subunits, a low-molecular-weight and a high-molecular-weight protein, which form a complex thought to play a role in host cell adhesion 22,23 . Using rabbit antisera, Quesada-Gómez et al. 21 measured levels of SlpA in the exoproteome relative to the corresponding amount in lysates of vegetative cells and reported a low proportion of SlpA in the bacteria-free supernatant of ST1_NAP1. We found that the proportion of S-layer proteins in ICC-45 was even lower than that of the ST1_NAP1 strains used by Quesada-Gómez et al. 21 , although those authors used different strains (ST1_NAP1 5712 and 6656) and a different methodology. Given its higher proportion of mature S-layer protein, NAP1/027 (LIBA5756) might have greater adhesion to host cells or inflammatory capacity than ICC-45; this remains to be tested.
In the group of antimicrobial resistance-related proteins, ICC-45 produced higher proportions of beta-lactamase family proteins, pyruvate-ferredoxin oxidoreductase, and nitroreductase than did NAP1/027 (LIBA5756) (Table 2). Chong et al. 24 also reported high expression of DNA repair proteins, putative nitroreductases, and the ferric uptake regulator (Fur) in strains with reduced susceptibility or resistance to metronidazole, suggesting that these proteins might be involved in metronidazole resistance. As demonstrated previously 9 , ICC-45 was resistant to ceftriaxone and clindamycin but susceptible to metronidazole, vancomycin, and rifampicin and, unlike NAP1/027, susceptible to moxifloxacin and levofloxacin. Some groups of antibiotics, such as cephalosporins, clindamycin, and fluoroquinolones, are associated with an increased risk of developing CDI 25,26 . Therefore, resistance to these antibiotics plays an important role in driving the current epidemiological changes of CDI and, consequently, the appearance of new C. difficile ribotypes 27 .
In comparison with NAP1/027 (LIBA5756), the ICC-45 strain produced almost three times as much cysteine desulfurase, a protein involved in the oxidative stress response. The ICC-45 strain also secreted many more heat shock proteins (chaperones). In turn, the NAP1/027 (LIBA5756) strain secreted more rubrerythrin than did ICC-45 (Table 2). Rubrerythrin, a protein responsive to oxidative stress, was initially described for its role in protecting strictly anaerobic bacteria from oxidative stress. In agreement with our study, a proteomic analysis of C. difficile strain 630 showed an increased level of rubrerythrin in response to thermal stress 28 .
The ICC-45 strain produced almost three times as much nitric oxide reductase flavorubredoxin as did NAP1/027 (LIBA5756) (Table 2). ICC-45 also produced larger amounts of putative nitric oxide reductase flavoprotein than did NAP1/027 (LIBA5756). These proteins are involved in protection against the effects of nitric oxide 24 . Because nitric oxide is important in host defense against pathogens, the increased secretion of putative nitric oxide reductase flavoprotein by ICC-45 might help the strain withstand this defense and thereby contribute to worse CDI outcomes.
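Statements such as "almost three times as much" compare relative abundances between the two exoproteomes. This section does not describe how the proportions were computed from the total spectrum counts in Supplementary Table S2, so the sketch below assumes one common approach: divide each protein's spectrum count by the total spectrum count of its sample, then take the ratio between strains. The counts are hypothetical, not values from Table S2, and replicates are ignored.

```python
from typing import Dict, Optional


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Divide each protein's spectrum count by the total spectrum count of the sample."""
    total = sum(counts.values())
    return {protein: n / total for protein, n in counts.items()}


def compare_strains(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Ratio of normalized counts (a / b) per protein.

    None marks proteins detected only in strain a; 0.0 marks proteins
    detected only in strain b.
    """
    na, nb = normalize(a), normalize(b)
    ratios: Dict[str, Optional[float]] = {}
    for protein in sorted(set(na) | set(nb)):
        pa, pb = na.get(protein, 0.0), nb.get(protein, 0.0)
        ratios[protein] = pa / pb if pb > 0 else None
    return ratios


# HYPOTHETICAL spectrum counts for illustration only (not from Supplementary Table S2).
strain_a = {"TcdA": 30, "flavorubredoxin": 12, "rubrerythrin": 5, "GpmI": 3}
strain_b = {"TcdA": 20, "flavorubredoxin": 4, "rubrerythrin": 16}

ratios = compare_strains(strain_a, strain_b)
only_a = set(strain_a) - set(strain_b)

for protein, r in ratios.items():
    label = "only in strain A" if r is None else f"A/B = {r:.2f}"
    print(f"{protein}: {label}")
```

A ratio above 1 means the protein makes up a larger share of strain A's exoproteome; proteins absent from strain B are flagged rather than assigned an infinite ratio.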
In comparison with NAP1/027 (LIBA5756), ICC-45 expressed more trehalose-6-phosphate hydrolase, phosphoenolpyruvate-protein phosphotransferase, and alanine racemase, all of which are involved in metabolism and catalytic activity (Table 2). Collins et al. 29 showed that dietary trehalose plays a role in the dissemination of two epidemic ribotypes of C. difficile (RT027 and RT078). They also proposed that the introduction of trehalose as a sweetener in the human diet might have played a relevant role in the emergence of these epidemic and hypervirulent strains.

Proteins involved in pathogenicity
Subcellular localization was predicted using PSORTb. Proteins exclusive to the ICC-45 strain were of cytoplasmic (60%) or unknown (40%) origin (Fig. 2A). More than half (58%) of the 26 selected shared proteins were also of cytoplasmic origin; the remainder were localized to the cell wall (19%), extracellular medium (8%), or cytoplasmic membrane (7%), or were of unknown origin (8%) (Fig. 2B). Similarly, Jain et al. 28 showed that 58 of 107 proteins identified in the insoluble subproteome of C. difficile strain 630 were of cytoplasmic origin. Moura et al. 6 performed a proteomic analysis of a commercial culture filtrate of C. difficile and found that most (72%) of the 101 proteins identified were also of cytoplasmic origin.
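Because the localization percentages refer to small sets (5 exclusive and 26 shared proteins), they can be converted back into whole protein counts. The sketch below assumes each percentage was rounded from an integer count; the helper names are hypothetical.

```python
def back_calculate(percentages: dict, n: int) -> dict:
    """Nearest whole number of proteins for each reported percentage of n proteins."""
    return {loc: round(p * n / 100) for loc, p in percentages.items()}


def to_percent(counts: dict) -> dict:
    """Exact percentages (one decimal place) for a dict of counts."""
    total = sum(counts.values())
    return {loc: round(100 * c / total, 1) for loc, c in counts.items()}


shared_pct = {"cytoplasmic": 58, "cell wall": 19, "extracellular": 8,
              "cytoplasmic membrane": 7, "unknown": 8}
exclusive_pct = {"cytoplasmic": 60, "unknown": 40}

shared_counts = back_calculate(shared_pct, 26)
exclusive_counts = back_calculate(exclusive_pct, 5)

print(shared_counts, sum(shared_counts.values()))
print(exclusive_counts, sum(exclusive_counts.values()))
print(to_percent(shared_counts))
```

The back-calculated counts are 15, 5, 2, 2, and 2 shared proteins (summing to 26) and 3 and 2 exclusive proteins. Each 2-protein category corresponds to about 7.7%, so the reported 8% and 7% presumably reflect rounding chosen to make the total 100%.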

Functional analysis of protein interaction networks. The interaction networks of the proteins shared between ICC-45 and NAP1/027 (LIBA5756) are remarkably similar in both strains (Supplementary Figure S1A and S1B). The main difference is the five proteins exclusive to the ICC-45 strain (circled in red in Supplementary Figure S1A). Three of these proteins interact with ICC-45 proteins that are also found in NAP1/027 (LIBA5756), and two show no interactions with the others. The transcriptional regulator IscR, expressed exclusively in ICC-45, interacts directly with the chaperone DnaK, which in turn interacts with the chaperone GroL, which interacts with pathogenicity proteins (TcdA, TcdB, FliC, Cwp84, and SlpA). Delta-aminolevulinic acid dehydratase (HemB) interacts with ribulose-phosphate 3-epimerase (Rpe1), which interacts with nitroreductase (CD2572) and pyruvate-ferredoxin oxidoreductase (Pfo) (Supplementary Figure S1A). Pfo also interacts with DnaK and GroL. These DnaK and GroL interactions, and their links to virulence factors (TcdA, TcdB, FliC, Cwp84, and SlpA), suggest that ICC-45-exclusive proteins might influence the pathogenicity of C. difficile. IscR also interacts with cysteine desulfurase (IscS); the interactions of IscR with IscS and with other proteins are involved in cysteine metabolism and are activated under stress conditions or iron deprivation 16 .
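The chains of interactions described above (for example, IscR to DnaK to GroL to TcdA) are paths in an interaction graph. The sketch below encodes only the interactions named in the text, treated as undirected edges; it is not the full network of Supplementary Figure S1A, and the two ICC-45-exclusive proteins without interactions are omitted. A breadth-first search then returns the shortest chain linking an exclusive protein to a virulence factor.

```python
from collections import deque

# Undirected interactions as described in the text (a subset of Supplementary Figure S1A).
EDGES = [
    ("IscR", "DnaK"), ("DnaK", "GroL"),
    ("GroL", "TcdA"), ("GroL", "TcdB"), ("GroL", "FliC"),
    ("GroL", "Cwp84"), ("GroL", "SlpA"),
    ("HemB", "Rpe1"), ("Rpe1", "CD2572"), ("Rpe1", "Pfo"),
    ("Pfo", "DnaK"), ("Pfo", "GroL"), ("IscR", "IscS"),
]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_graph(EDGES)
print(" -> ".join(shortest_path(graph, "IscR", "TcdA")))
print(" -> ".join(shortest_path(graph, "HemB", "TcdB")))
```

The first query yields IscR -> DnaK -> GroL -> TcdA, and the second yields HemB -> Rpe1 -> Pfo -> GroL -> TcdB, matching the indirect links described in the text.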
This study documented the substantial similarity of the coding sequences of two MLST clade 2 C. difficile strains isolated in Latin America that belong to different pulsotypes and ribotypes. Differences in the expression of specific proteins, in their expression levels, and in their interactions with other proteins might help clarify variations in pathogenicity, antibiotic resistance, metabolism, oxidative stress, resistance to nitric oxide, and other aspects of CDI. In a globalized world, the emergence and dissemination of new strains capable of causing outbreaks require the identification of biomarkers and proteins that might lead to a better understanding of pathogenesis and to improved treatment and vaccine development.
The institutional review boards approved these protocols, and written informed consent was obtained from the legally authorized representatives (LARs) of deceased patients. All methods were carried out in accordance with international guidelines for research involving humans and with the principles stated in the Declaration of Helsinki. Ethical approval was obtained from the Ethics and Research Committees of Hospital San Juan de Dios (Costa Rica, protocol CLOBI-SJD-O18-2009) and Hospital Haroldo Juaçaba of the Cancer Institute of Ceará (Brazil, protocol 208.362).
Gene orthology of C. difficile strains. For this study, the genome of the C. difficile NAP1/027 (LIBA5756) strain was assembled using the genome of the C. difficile ICC-45 strain as a reference (GenBank Assembly Accession GCA_002891495.1). Paired-end reads of C. difficile NAP1/027 were obtained from the European Bioinformatics Institute (EBI) database (accession: ERR467583), and their quality was assessed using FastQC (v0.11.8). Only reads with a Phred quality score greater than 36 were retained for subsequent analysis. Paired-end reads were assembled using SPAdes (v3.13.1; St. Petersburg State University, Russia).
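The text does not specify how the Phred > 36 criterion was applied (per base, per read, or by trimming). The sketch below assumes a per-read mean quality, Phred+33 (Sanger/Illumina 1.8+) encoding, and that a pair is discarded if either mate fails; all three choices are assumptions, and the toy reads are hypothetical.

```python
from typing import Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from consecutive 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_pairs(r1: List[Read], r2: List[Read], threshold: float = 36.0) -> List[Tuple[Read, Read]]:
    """Keep a pair only if BOTH mates have a mean Phred score strictly above threshold."""
    return [(a, b) for a, b in zip(r1, r2)
            if mean_phred(a[2]) > threshold and mean_phred(b[2]) > threshold]


# Toy reads: 'I' encodes Q40, 'E' encodes Q36, '5' encodes Q20.
mate1 = list(parse_fastq(["@p1/1", "ACGT", "+", "IIII",
                          "@p2/1", "ACGT", "+", "IIII",
                          "@p3/1", "ACGT", "+", "EEEE"]))
mate2 = list(parse_fastq(["@p1/2", "TTGC", "+", "IIII",
                          "@p2/2", "GGCA", "+", "II55",
                          "@p3/2", "CCAT", "+", "IIII"]))

kept = filter_pairs(mate1, mate2)
print([a[0] for a, _ in kept])
```

Only pair p1 survives: p2 fails because its second mate averages Q30, and p3 fails because a mean of exactly Q36 is not greater than 36.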
Next, the genomes of C. difficile NAP1/027 (LIBA5756) and C. difficile ICC-45 were annotated using the Prokka (v1.14.3) pipeline, with RNA features predicted via the optional rnammer and rfam parameters. Using the predicted proteins, orthology analysis of the two strains was performed with OrthoMCL (v2.0.9), using an e-value cutoff of 10⁻⁵. The results of the orthology analysis were processed and reassessed with in-house scripts, considering the similarity and positivity between the protein sequences studied.
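The in-house scripts are not described in detail. As an illustration only, the sketch below applies the 10⁻⁵ e-value cutoff together with optional identity and positivity floors to tabular BLAST+ hits, assuming they were written with `-outfmt "6 qseqid sseqid pident ppos evalue bitscore"`. The column order, the identity and positivity thresholds, and the protein IDs are all hypothetical, and this is not the authors' script or OrthoMCL's internal parser.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float   # percent identity
    ppos: float     # percent positive-scoring positions
    evalue: float
    bitscore: float


def parse_hits(lines: List[str]) -> List[Hit]:
    """Parse tab-separated hits in the assumed column order; skip blanks and comments."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        q, s, pident, ppos, evalue, bits = line.rstrip("\n").split("\t")
        hits.append(Hit(q, s, float(pident), float(ppos), float(evalue), float(bits)))
    return hits


def filter_hits(hits: List[Hit], max_evalue: float = 1e-5,
                min_pident: float = 0.0, min_ppos: float = 0.0) -> List[Hit]:
    """Keep hits at or below the e-value cutoff that also meet identity/positivity floors."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.pident >= min_pident and h.ppos >= min_ppos]


# HYPOTHETICAL hits between the two strains' predicted proteins.
raw = [
    "ICC45_0001\tNAP1_0001\t99.8\t100.0\t0.0\t850",
    "ICC45_0002\tNAP1_0007\t41.2\t60.5\t3e-08\t95.1",
    "ICC45_0003\tNAP1_0102\t30.0\t45.0\t0.02\t35.2",
]

hits = parse_hits(raw)
kept = filter_hits(hits)                     # e-value cutoff only
strict = filter_hits(hits, min_pident=95.0)  # plus a hypothetical identity floor
print([h.query for h in kept], [h.query for h in strict])
```

With the e-value cutoff alone, the first two hits pass; adding an identity floor of 95% (an assumed value, echoing the similarity level reported in the Results) keeps only the near-identical pair.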
Cell culture and supernatants of the C. difficile strains. After growth of the C. difficile isolates on Brucella agar plates, 2-4 colonies of each were inoculated in 40 mL of brain heart infusion (BHI) broth. The strains were incubated for 24 h at 37 °C in anaerobic jars (90% N 2 , 10% CO 2 , 10% H 2 ). After incubation, tubes were centrifuged twice (4000g, 8 min, 4 °C) and the supernatants were stored. The supernatants were not filtered. Uninoculated culture media (negative control, BHI broth) were subjected to the same conditions 9 . This experiment was performed in triplicate and protein extracts were used for exoproteomic analysis.
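Centrifugation conditions are reported as relative centrifugal force (4000g), which translates into rotor speed only for a given rotor radius. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r (cm) × rpm² to convert between the two; the 15 cm radius is hypothetical, because the rotor used in the study is not reported.

```python
import math

K = 1.118e-5  # constant in RCF = K * r_cm * rpm**2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that produces the requested relative centrifugal force (x g)."""
    return math.sqrt(rcf / (K * radius_cm))


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at a given speed and radius."""
    return K * radius_cm * rpm ** 2


radius = 15.0  # HYPOTHETICAL rotor radius in cm; not reported in the study
speed = rpm_for_rcf(4000, radius)
print(f"{speed:.0f} rpm")
```

For this assumed radius, about 4884 rpm gives 4000g; a rotor with a different radius requires a different speed for the same force.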