Screening of biomarkers of drug resistance or virulence in ESCAPE pathogens by MALDI-TOF mass spectrometry

Rapid identification and characterisation of drug-resistant bacterial pathogens have an important role in diagnostic and antimicrobial stewardship. Response time in the diagnosis of not only the etiological agent but also in antimicrobial susceptibility results is of utmost importance in patient treatment. In this study, matrix-assisted laser desorption ionisation–time of flight (MALDI-TOF) mass spectrometry (MS) was used to screen for biomarkers of ESCAPE (vancomycin-resistant Enterococcus faecium, methicillin-resistant Staphylococcus aureus, hypervirulent NAP1/ribotype 027 Clostridioides [Clostridium] difficile, multidrug resistant Acinetobacter baumannii, multidrug resistant Pseudomonas aeruginosa, and carbapenem-resistant Enterobacteriaceae) pathogens to predict antimicrobial resistance or hypervirulence. Several biomarkers of drug-resistant genotypes in S. aureus, A. baumannii, P. aeruginosa, and K. pneumoniae, as well as hypervirulence in C. difficile, were detected. The fastest possible susceptibility testing with MALDI-TOF MS is simultaneous detection of a characteristic drug-resistant peak and species identification in the same spectra generated in routine processing. According to our approach, resistance or virulence biomarker peaks can be identified while performing routine microbiology analysis, and no additional assays nor prolonged incubation time is needed. Outstanding biomarker peaks detected in our study should be further analysed by additional methods to identify the specific proteins involved.

Among P. aeruginosa isolates, 49.1% (27/55) were MDR, and 50.9% (28/55) were non-MDR. The bla VIM and bla IMP genes were not detected. potential biomarkers. The generated protein profile of each isolate spot had mass-charge ratio (m/z) values between 2,000 and 20,000. A MS spectrum sample of each species is shown in Supplementary Fig. S1. Classification models were generated based on the GA, SNN, and QC algorithms, in which at least one of them yielded adequate cross-validation and recognition capability values for vancomycin-resistant vs vancomycin-susceptible E. faecium (90% and 98%, respectively), C. difficile ribotype 027 vs non-027 (90% and 96%, respectively), mecA positive vs mecA negative (87% and 98%, respectively), carbapenem-resistant vs carbapenem-susceptible K. pneumoniae (78% and 90%, respectively), MDR vs non-MDR P. aeruginosa (76% and 89%, respectively), carbapenem/colistin-resistant vs carbapenem/colistin-susceptible K. pneumoniae (67% and 83%, respectively), bla OXA-24 vs bla OXA-58 carbapenem-resistant A. baumannii (59% and 51%, respectively), and bla OXA-24/58 positive vs bla OXA-24/58 negative A. baumannii (50% and 72%, respectively). A complete description of all classification analyses is shown in Supplementary Table S2. The ability of each algorithm to classify spectra from the same set of samples was used to perform an external validation. At least one of the models was able to allocate most of the spectra of the validation set to their corresponding phenotype/genotype profile (Table S2).
The distribution of each strain after PCA based analysis per phenotype/genotype of all species is shown in Supplementary Fig. S4. The distinguishing capability between drug-resistance (or virulence) phenotype/genotype profiles in the isolates based on their peptide mass fingerprints was highly variable.
These statistically significant peaks were selected for further analysis which were the analysis of the peak area/ intensity of the spectra, the coefficient of variation and the value of area under the curve [AUC] of each peak. When all three criteria were met, these peaks were considered as "potential biomarkers peaks".
When analysis VREfm, no potential biomarkers were finally selected. For S. aureus isolates, one peak with a m/z of 4,594 (with a value of area under the receiver operating characteristics curve [AUC-ROC] of 0.82; p < 0.001) was selected (Table 1). This peak was present as a singlet in the MSSA isolates; however, in the MRSA isolates, it displayed a splitting pattern as a doublet (Fig. 1). The presence of the peak m/z 4,594 as a doublet had a sensitivity of 83.3% and a specificity of 96.8% for the detection of methicillin-resistant phenotype (Table 1). Furthermore, a peak previously reported as a potential biomarker (m/z 2,415) was detected in 63.9% (23/36) of the MRSA isolates 16 .
For C. difficile isolates, peaks were selected by strain: the peak at m/z 6,654 (AUC = 0.96) was present only in non-027 strains, except for the 176 ribotype; and the peak at m/z 6,712 (AUC = 0.99; p < 0.001) was present only www.nature.com/scientificreports www.nature.com/scientificreports/ in 027 and 176 strains (Fig. 2). To detect the hypervirulent 027 ribotype, the absence of peak m/z 6,654 and presence of peak m/z 6,712 presented a sensitivity of 100%, specificity of 91.7%, and PPV of 95%.
No potential biomarkers were detected for the carbapenem-resistant K. pneumoniae isolates. However, the absence of peak m/z 6100 (AUC = 0.88) had a sensitivity of 93.8% and a NPV of 91.7% for the detection of carbapenem and colistin-resistant K. pneumoniae ( Fig. 5) (Table 1).
Peptide/protein identity assignment. Assignment of possible peptide/protein identity of all the identified potential marker peaks is shown in Table 1. The m/z 4,594 peak of S. aureus was assigned to the 50 S ribosomal protein L28 with MASCOT score of 26 and protein sequence coverage of 64% Both the m/z 6,654 and 6,712 peaks of C. difficile were assigned to the 30 S ribosomal protein (S20 and S21, respectively), with 31-34 score and 67-98% protein sequence coverage. Both the m/z 6,304 and the m/z 6,332 peak of A. baumannii was assigned to a NADH-quinone oxidoreductase subunit K with a score of 29 and 56% of protein sequence coverage for each peak. In P. aeruginosa, the m/z 2,726 peak was assigned to a UPF0270 protein Pfl01_4103 with a score of 20 and 32% of protein sequence coverage, and the m/z 5,455 peak was assigned to a UPF0391 membrane protein Patl_1732 with a score of 32 and 98% of protein sequence coverage. The m/z 5,742 peak of P. aeruginosa and the m/z 6,100 peak of K. pneumoniae could not be determined.

Discussion
MALDI-TOF MS is now widely used in clinical microbiology for bacterial identification, taxonomy, and strain typing 17 . One application in development is the detection of antibiotic resistance in bacteria. Thus far, several approaches have been proposed in MALDI-TOF MS to detect antimicrobial resistance such as the detection of the entire cell profile, enzymatic activity by antibiotic hydrolysis, or resistance proteins within the cell [18][19][20][21] . In this study, we identified several potential biomarkers of drug-resistant genotypes in S. aureus, A. baumannii, P. aeruginosa, and K. pneumoniae, as well as hypervirulence in C. difficile, using a direct approach.
Regarding C. difficile, 2 out of 73 peaks reached statistical significance, wherein isolates were divided into 027 and non-027 ribotype groups. The peak m/z 6,654 was absent in 027 and 176 ribotypes, whereas the peak www.nature.com/scientificreports www.nature.com/scientificreports/ m/z 6,712 was present in both ribotypes. These findings complicate the differentiation of hypervirulent strains (027) from non-hypervirulent strains (176). However, ribotype 176 shares many similarities with 027, such as the presence of binary toxin genes and a single-base-pair deletion at nucleotide 117 in the gene that encodes a negative regulator of toxin production 22 , in such way that is actually difficult to differentiate both ribotypes in most methodologies 23,24 . Nevertheless, severe disease outcomes and mortality are associated with ribotype 176 25 and in clinical epidemiology, hypervirulent strains should be notified 24 . In our work, 027 strains were differentiated from other non-027 strains, such as 001, 002, 003, 012, 014, 017, 019, 020, 076, 106, 220, and 353 ribotypes. It is highly likely (98% of protein sequence coverage) that 30S ribosomal protein was assigned to both peaks, which binds directly to 16S ribosomal RNA. In addition, both peaks had excellent AUC values (0.96-0.99), sensitivity, specificity, PPV, and NPV, all of which indicate a good performance of a biomarker.
MRSA is one of the most studied pathogens of the ESCAPE group regarding MALDI-TOF analysis 15,19,21,[26][27][28] . A number of studies have used MALDI-TOF MS to discriminate clonal complexes of MRSA (CC5, CC8, CC22, CC30, and CC45) 29 . In the present study, the peak m/z 4,594 had a high specificity of 96.8% to detect MRSA isolates. The 50S ribosomal protein was most probably assigned to this peak. Different protein synthesis inhibitors, such as macrolides, ketolides, lincosamides, and B-type streptogramins, prevent the 50S particle from being formed in growing cells 30 . To our knowledge, this is the first report of a potential biomarker peak for MRSA isolates. In addition, a m/z 2,415 peak was detected in the MRSA isolates (not shown). This peak was previously reported in MRSA isolates with the class A mec gene complex 16 , though in our study the characterisation beyond the mecA gene detection was not performed.
Few reports have included the analysis of drug-resistant A. baumannii through MALDI-TOF MS, and although promising, their results were obtained using the carbapenem hydrolysis assay, not the direct approach 15 . In our study, two potential biomarker peaks were detected for MDR A. baumannii: m/z 6,304 and m/z 6,332. Both peaks had high AUC values (0.99) and were present in bla OXA-58 -and bla OXA-24 -positive isolates, respectively. A NADH-quinone oxidoreductase subunit was likely assigned to both peaks. Type II NADH-quinone oxidoreductases of Gram-negative bacteria can be inhibited by polymyxin B and colistin 31 . The performance of peak m/z 6,332 as a biomarker improved when comparing bla OXA-58 -positive with bla OXA-58 -negative and bla OXA-24 isolates. Our results indicate the discriminatory power of MALDI-TOF MS to differentiate bla OXA-58 from bla OXA-24 isolates, and this may be applied directly in clinical microbiology.
A signal loss of three peaks was detected for MDR P. aeruginosa isolates in comparison to non-MDR isolates. All three peaks had acceptable individual AUC values (0.81-0.84), and interestingly, the inclusion of all three peaks increased their potential as biomarker peaks for the detection of non-MDR isolates (sensitivity, 75%; specificity, 74.1%; PPV, 75%; and NPV, 74.1%). Two of the peaks were likely assigned to hypotheticals proteins, such as UPF0270 protein Pfl01_4103 (m/z 2,726 peak) and UPF0391 membrane protein Patl_1732 (m/z 5,455 peak). In case the identified peaks were proteins, the signal loss could be due to a mutation introducing a premature stop codon or a frame-shift in a gene. As it is, we cannot assume they are indeed associated to proteins and other explanations must be considered. Growth conditions and sample preparation may also contribute to signal loss. www.nature.com/scientificreports www.nature.com/scientificreports/ Therefore, loss of a signal does not alone give distinctive information about the strain genotype and cannot be used as a reliable biomarker peak 29 .
One potential biomarker for carbapenem and colistin resistance in K. pneumoniae was identified at peak m/z 6,100, which could be used to detect this phenotype, giving its NPV (91.7%). The use and the resistance to polymyxins have increased in carbapenemase-producing Enterobacteriaceae. A MALDI-TOF MS-based method that detects polymyxin resistance-related modifications to lipid A (i.e., phosphoethanolamine addition) was previously developed in E. coli. This methodology can be performed in 15 min and simultaneously discriminates between chromosome-and plasmid-encoded resistance 32,33 . This methodology has an advantage over our approach, both in specificity and processing time.
In addition, we did not assess the frequency of the colistin resistance-associated mcr gene in our Enterobacteriaceae isolates. Our study population contained the bla NDM gene only in the carbapenem-resistant K. pneumoniae isolates. Because the bla KPC K. pneumoniae genotype has been previously identified by MALDI-TOF 34 , future studies should analyse a higher number of isolates and a wider diversity of drug resistance-associated genes of K. pneumoniae.
Previous studies have compared VRE mass spectra among vanA or vanB positive and negative groups, specifically on E. faecium isolates 15,21 . A number of peaks were detected among vanA-positive isolates 35,36 as well as  www.nature.com/scientificreports www.nature.com/scientificreports/ vanB-positive isolates 37 . In our study, we detected 70 potential peaks with significant difference (p ≤ 0.05), but no final potential biomarkers were detected. Failure to identify lineage-specific biomarker peaks in Gram-positive pathogens such as E. faecium has been reported previously 38 .
Changes in the distribution of peaks among strains suggest a peak shift, possibly corresponding to a slight protein weight modification due to an amino acid substitution after gene mutations. This shift can be used as a trustable biomarker to identify the strain genotype (either related to virulence or drug resistance), which was the approach used in this study 29 . According to our approach, resistance or virulence biomarker peaks can be identified while performing routine microbiology analysis, and no additional assays nor prolonged incubation time is needed.
The primary limitation of this study was the lack of validation studies. Predictive biomarkers for MALDI-TOF should be first identified with a training set of data and afterward should be validated on an independent set of data. This validation set should include isolates with geographic, temporal, and clonal diversity to ensure the validity of the peak biomarkers 29 . In addition, outstanding potential biomarker peaks detected in our study should be further analysed by additional methods to identify which proteins/peptide each peak corresponds to.
In conclusion, we identified several potential biomarker peaks suggestive of drug resistance in S. aureus, A. baumannii, P. aeruginosa, and K. pneumoniae, as well as hypervirulence in C. difficile using MALDI-TOF MS.

Methods clinical isolates. Selected ESCAPE clinical isolates obtained between January 2007 and December 2017 from
the Hospital Civil de Guadalajara "Fray Antonio Alcalde" (Jalisco, Mexico) were included. The VITEK 2 system (Biomérieux, Craponne, France) was used according to the manufacturer's instructions, and either identification card for Gram positive or negative were used for identification. Bacteria were grown on plates containing Trypticase soy agar with 5% sheep blood and incubated overnight at 35 °C. A suspension of bacteria from pure cultures was performed in a 0.45% sodium chloride solution. This suspension was adjusted to a McFarland standard of 0.5 and along with the ID card, introduced into the system, which automatically inoculated the card and incubated at 35 °C until the end of the test.
A. baumannii species identification was confirmed by polymerase chain reaction (PCR). Primers and PCR conditions were used as described previously 39,40 . First, rapid DNA extraction was performed, in which three to five bacterial colonies were suspended in 100 µl of sterile distilled water and heated at 95 °C for 15 min. After centrifugation at 13,500 rpm for 5 min, the supernatant was collected. A pair of primers, forward (F): 5′-CATTATCACG GTAATTAGTG-3′ and reverse (R): 5′-AGAGCACTGTGCACTTAAG-3′, was used for the amplification of an internal fragment from the 16S-23S rRNA intergenic spacer region of A. baumannii. A second pair of primers, F5′-CCTGAATCTTCTGGTAAAAC-3′ and R5′-GTTTCTGGGCTGCCAAACATTAC-3′, were used for the amplification of a highly conserved region of the recA gene of Acinetobacter genus. The reaction mixture contained 1X PCR buffer, 2 mM MgCl 2 , 0.2 mM of each dNTP, 200 nM of each primer, 1 U of AmpliTaq polymerase and 2 µl of DNA. PCR was initiated by denaturation for 5 min at 94 °C, followed by 30 cycles of 30 s at 94 °C, 30 s at 55 °C and 30 s at 72 °C with final extension for 7 min at 72 °C and a holding step at 4 °C until analysis. Next, 5 µl of the PCR products were electrophoresed on a 2.0% agarose gel prepared with 0.5x TBE buffer for 1 h at 130 V. The gel was stained with 10 µM ethidium bromide, and the PCR product bands were visualized using a UV transilluminator. Expected PCR products for the 16S-23S rRNA intergenic spacer region and recA gene were 208 and 425 bp, respectively.
All the isolates were classified according to manufacturer's recommended score identification criteria. A score of 2.0 to 3.0 indicated reliable species-level identification, a score of 1.7 to 1.9 indicated reliable genus identification but questionable species-level identification, and a score of <1.7 indicated an unreliable identification. potential marker peaks. The MS spectrum of each spot was analysed by the MALDI Biotyper V.3.1.66 with the most updated spectra library, V.7.0 (7,311 spectra). High-quality spectra were captured using the flexControl V.3.4 software (Bruker Daltonics). These spectra were then imported into the ClinProTools V.3.0 software (Bruker Daltonics) for recognition of mass spectra patterns between groups and preprocessed using the default parameters. Spectra were smoothed with 10 cycles of the Savitzky/Golay algorithm for 10 cycles with a width of 2 m/z. Baseline subtraction was performed with the Top Hat algorithm. Peak picking was performed on the average spectrum from each group, with a signal to noise threshold of 5. Peak selection was performed using the P-value T-test/ANOVA sort mode.
Group selection was based on the presence or absence of drug-resistance genotypes, except for C. difficile in which hypervirulent ribotypes determination was considered.
Statistical analyses. The selection of potential biomarker peaks of drug-resistant phenotypes was performed using statistical analyses within the algorithms included in the ClinProTools software. Classification models were generated using the genetic algorithm (GA), supervised neural network (SNN), and quick classifier (QC) algorithms. The parameters were maximum of 5 peaks, maximum of 50 generations, k-nearest neighbor classification with 3 neighbors for GA; 1-25 peaks automatically detected for SNN, and QC (automatic peak detection 1-25 peaks, p-value T-test/ANOVA). The recognition capability and cross validation values were calculated to demonstrate the reliability and accuracy of the model.
The Anderson-Darling test was used to determine the distribution of the population (≤0.05: not normally distributed, >0.05, normally distributed). According to the results, either the t-test (normally distributed data) or the Wilcoxon test (not normally distributed data) were used to confirm significant differences between two classes (on the presence or absence of drug-resistance or virulence phenotype/genotypes). As the p-value provides a measure of the strength of an association, the lower it is, the better the respective peak signal can be used to discriminate the groups. If the p-value for the t-test or the Wilcoxon test was ≤0.05, the peak was confirmed to be significantly different.
Peaks that obtained statistical significance were selected for further analysis, which included evaluating the peak area/intensity of the spectra and the coefficient of variation of each peak; this analysis allowed the selection of potential peaks. ClinProTools also calculates a Receiver Operating Characteristic (ROC) curve for each peak. The ROC curve provides an evaluation of the discrimination quality of a peak. In addition, the area under the curve (AUC) was determined and only peaks with AUC ≥ 0.80 values were selected. The principal component analysis (PCA) was used to evaluate the distinguishing capability between drug-resistance (or virulence) phenotype/genotype profiles in the isolates based on their peptide mass fingerprints.
The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for the potential biomarker peaks.
Peptide/protein identity assignment. MS spectra were first processed using flexAnalysis software (Bruker Daltonics) in which top-hat baseline subtraction and spectra smoothing were performed. The spectra were then exported to the Biotools software (Bruker Daltonics), in which protein/peptide identification was performed by latter submitting the data to the Mascot Server (Matrix Science, Boston, USA) and run against the SwissProt database for peptide/protein assignment.