Identification of resistance in Escherichia coli and Klebsiella pneumoniae using excitation-emission matrix fluorescence spectroscopy and multivariate analysis

Klebsiella pneumoniae and Escherichia coli belong to the Enterobacteriaceae family, are common causes of community and hospital infections, and show high antimicrobial resistance. This resistance has become one of the main public health problems. Determining whether a bacterium is resistant is critical for treating the patient correctly. The method currently used in routine laboratory practice to determine bacterial resistance is the antibiogram, which takes 1 to 3 days to return results. A faster alternative is excitation-emission matrix (EEM) fluorescence spectroscopy combined with multivariate classification methods. In this paper, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Support Vector Machines (SVM) were used, coupled with dimensionality reduction and variable selection algorithms: Principal Component Analysis (PCA), Genetic Algorithm (GA) and the Successive Projections Algorithm (SPA). The most satisfactory models achieved sensitivity and specificity rates of 100% for all classes, for both E. coli and K. pneumoniae. This finding demonstrates that the proposed methodology has promising potential for routine analyses, delivering results sooner and increasing the chances of effective treatment.

Scientific Reports | (2020) 10:12994 | https://doi.org/10.1038/s41598-020-70033-x www.nature.com/scientificreports/
Identifying whether a bacterial strain is resistant requires a test in which an isolated culture is exposed to several types of antibiotics. The antibiotic sensitivity of the isolated strains can be determined by the disc diffusion method 15, or by measuring Minimal Inhibitory Concentrations (MIC) 16 or Minimal Bactericidal Concentrations (MBC) 17.
Fluorescence spectroscopy has already been used in the detection 18, in the structural investigation 19,20 and in the construction of a DNA biosensor for E. coli 21. Chemometric methods such as Linear Discriminant Analysis (LDA) 22, Quadratic Discriminant Analysis (QDA) 23 and Support Vector Machines (SVM) 24, coupled with the dimensionality reduction algorithm Principal Component Analysis (PCA) 25,26 and the variable selection algorithms Genetic Algorithm (GA) 27 and Successive Projections Algorithm (SPA) 28, tend to enhance spectroscopic techniques 29-31. This paper offers a new perspective on differentiating sensitive and resistant bacteria of the species E. coli and K. pneumoniae using excitation-emission fluorescence spectroscopy combined with multivariate classification methods.
The E. coli samples comprised three groups, named control, resistant 1 and resistant 2. The control group was formed by sensitive E. coli samples (ATCC 25922). Resistant class 1 was composed of CCBH NDM samples, which carry an enzyme called New Delhi metallo-beta-lactamase that confers resistance to all beta-lactams, especially carbapenems. Resistant class 2 was formed by E. coli CCBH 7018, which produces a type of beta-lactamase that hydrolyzes penicillins, monobactams, cephalosporins and cefoxitin. The EEM data obtained for Escherichia coli after spectral pre-processing are presented in Fig. 2: sensitive (Fig. 2a), NDM (Fig. 2b) and CCBH 7018 (Fig. 2c).
As depicted in Fig. 1, it is very difficult to distinguish the classes of sensitive and resistant bacteria by their spectral profiles alone, because the profiles are so similar. In Fig. 2 the profiles are visually less similar, but we still cannot identify a clear feature that differentiates the classes. An exploratory analysis was performed using PCA on the unfolded data after spectral pre-processing. Figure 3 shows the PCA scores for the Klebsiella pneumoniae data, built with 3 principal components (PCs). In the first component, which explains 51.5% of the data variance, the control samples show no separation from the resistant Klebsiella samples. The second PC explains 30.6% of the data variance and also fails to distinguish between the control and resistant classes. For the Escherichia coli spectra, we also constructed a PCA model using 4 PCs; the scores are shown in Fig. 4. In Fig. 4, it is not possible to identify a separation between the three classes. Projecting the scores onto the first PC, which explains 63.6% of the data variance, reveals a segregation between part of resistant group 1 and the other samples. However, when the scores are projected onto the second PC, which explains 14.1% of the data variance, the three classes cannot be distinguished. The PCA results indicate that multivariate classification algorithms that maximize the difference between the sensitive and resistant classes are needed.
A total of 75 samples were used for building the models, divided into three groups: calibration (45 samples), validation (15 samples) and prediction (15 samples). Table 1 shows the results of the classification models built using the EEM fluorescence data for differentiating sensitive and resistant Klebsiella pneumoniae.
Initially, models were constructed to compare the sensitive Klebsiella class with the resistant class. To build the latter group, samples of the two resistant classes were combined. Among these models, the ones that presented the most satisfactory results were 2D-LDA and 2D-PCA-QDA, which obtained 100.0% calibration accuracy and classification rates above 93% for all classes in the prediction set. Models were then constructed using the three classes of samples, applying QDA and SVM coupled with the dimensionality reduction and variable selection algorithms (PCA, SPA and GA) to the unfolded data. With the exception of the USPA-QDA, UPCA-SVM and USPA-SVM models, all others presented satisfactory results, with 100% accuracy in both calibration and prediction for the three classes.
The same strategy was applied to the E. coli samples, and the results are shown in Table 2. The first models were created with only two classes: sensitive E. coli and the combined resistant samples. The results were satisfactory, mainly for 2D-PCA-LDA and 2D-PCA-QDA, which obtained 100.0% accuracy for both classes in both calibration and prediction. The models constructed with the three classes also presented satisfactory classification results. The unfolded models UPCA-QDA and UGA-QDA likewise reached 100.0% accuracy in calibration and prediction for the three classes.
Table 3 presents the validation results of the optimized models (UPCA-QDA, UGA-SVM and 2D-LDA) for each classification category of Klebsiella pneumoniae. The models that considered three classes (UPCA-QDA and UGA-SVM) showed promising results, with 100.0% sensitivity and specificity rates. Notably, the 2D-LDA model, built with only two classes, achieved the same 100.0% sensitivity and specificity rates. The accuracy and F-score values were all equal to 100.0%, showing that these models are valid for distinguishing between the different groups of Klebsiella pneumoniae bacteria.
The validation results of the optimized models UPCA-QDA, UGA-SVM and 2D-PCA-QDA for E. coli are shown in Table 4. The sensitivity and specificity rates for these models are 100.0% for all the analyzed classes. The accuracy and F-score values also confirm the efficiency of the models.
According to the literature, bacterial resistance is usually associated with the ability of bacteria to modify their cellular structure and to produce substances that neutralize the action of antibacterial agents. The satisfactory results of the models using EEM fluorescence data for E. coli and K. pneumoniae demonstrate the sensitivity of the technique in detecting variations in the nuclear content of the cells and in the structure of the membranes themselves. Opačić et al. 19 used fluorescence spectroscopy for the structural investigation of the transmembrane C domain of the mannitol permease from Escherichia coli, and their results showed that the technique was able to differentiate the structure of EII mtl from the structure of a IIC protein transporting diacetylchitobiose. Additionally, Romantsov et al. 20 used dynamic data obtained by fluorescence correlation spectroscopy to extract structural information on isolated nucleoids, to evaluate the characteristic size of the structural units in terms of DNA length and to estimate their spatial dimensions.
Initially the pure samples were inoculated into BHI broth and kept in the oven for 24 h at 38 °C so that the bacteria multiplied. The sample was then plated on a Petri dish containing CLED culture medium, which was also kept in the oven for 24 h. Finally, a bacterial mass corresponding to approximately 2 × 10^6 colony forming units (CFU) was transferred from the culture medium to a Falcon tube with 2 mL of phosphate buffer solution (1 mol/L), giving a concentration of 1 × 10^6 CFU/mL. To confirm this concentration, the turbidity was compared with the McFarland standard. The initial solution at 1 × 10^6 CFU/mL was diluted in phosphate buffer solution (1 mol/L) to obtain the following concentrations: 5 × 10^5 CFU/mL, 1.3 × 10^5 CFU/mL, 6.3 × 10^4 CFU/mL and 3.1 × 10^4 CFU/mL.
EEM fluorescence spectroscopy. The excitation/emission fluorescence data were acquired in the wavelength range of 220-310 nm for excitation and 270-900 nm for emission, with steps of 10 and 1 nm for excitation and emission, respectively. A RF-5301 Shimadzu spectrofluorometer with a 0.5 mm quartz cuvette was used. The excitation and emission slits were set at 3 and 5 nm, respectively; the scan speed was set to super mode; the photomultiplier tube was set to the medium level; and a cell with a fiber optic reflectance probe was used. A total of 1.5 mL of bacterial solution was added to the fluorescence cuvette for reading. The temperature was maintained at 25 °C throughout the experiments. Five replicates of each concentration were measured.

Data analysis
Chemometrics procedure and software. Spectral pre-processing was performed and multivariate classification models were built using MATLAB R2011a software (The MathWorks, Natick, USA) and the PLS Toolbox 7.9.3 package (Eigenvector Research, Inc., Wenatchee, USA). A spectral range of 220-310 nm for excitation and 270-900 nm for emission was used for model construction, with steps of 10 and 1 nm for excitation and emission, respectively. This resulted in a data matrix size of 10 × 651 for each sample. Spectral pre-processing consisted of truncating the emission range to 270-659 nm and removing Rayleigh and Raman scattering using the 'EEMscat' algorithm 32.
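The unfolded (U-prefixed) models and the exploratory PCA operate on each EEM rearranged into a single row. A minimal Python sketch of that unfolding step is given below; the original work used MATLAB and the PLS Toolbox, so the function names, the row-by-row concatenation order and the toy matrix sizes are illustrative assumptions, not the authors' code.

```python
def unfold_eem(eem):
    """Flatten one EEM (rows = excitation wavelengths, columns = emission
    wavelengths) into a single row by concatenating its emission spectra.
    The excitation-by-excitation order is an assumption of this sketch."""
    n_em = len(eem[0])
    if any(len(row) != n_em for row in eem):
        raise ValueError("every excitation row needs the same number of emission points")
    return [value for row in eem for value in row]


def unfold_dataset(eems):
    """Stack the unfolded EEMs so that each sample becomes one row."""
    return [unfold_eem(eem) for eem in eems]


# Hypothetical toy data: 2 samples, each 2 excitation x 3 emission points.
toy = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.5, 0.4, 0.3], [0.2, 0.1, 0.0]],
]
X = unfold_dataset(toy)
print(len(X), len(X[0]))  # 2 6
```

After unfolding, a set of samples forms an ordinary two-way matrix (samples × variables) that PCA, SPA or GA can process, whereas the 2D-prefixed models work on each EEM as a matrix.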
The following classification methods were utilized: two-dimensional linear discriminant analysis (2D-LDA) 33, two-dimensional principal component analysis with linear discriminant analysis (2D-PCA-LDA) 34, with quadratic discriminant analysis (2D-PCA-QDA) 34 and with support vector machines (2D-PCA-SVM) 34. In addition to these, QDA and SVM were applied to the unfolded data after PCA, SPA or GA, giving the models UPCA-QDA, USPA-QDA, UGA-QDA, UPCA-SVM, USPA-SVM and UGA-SVM. For the construction of the classification models, the samples were divided into calibration (60%), validation (20%) and prediction (20%) sets using the Kennard-Stone (KS) sample selection algorithm 35. The proposed models were evaluated by calculating quality parameters such as accuracy, sensitivity, specificity and F-score.
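The Kennard-Stone split can be sketched in Python as follows. The selection rule is the standard one: start from the two most distant samples, then repeatedly add the sample farthest from its nearest already-selected neighbour. The text does not say how the validation and prediction sets were separated after the calibration set was chosen, so the second Kennard-Stone pass in `ks_split` is an assumption, as are the function names, the Euclidean distance and the random stand-in data.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(X, n_select):
    """Return the indices of n_select rows of X chosen by Kennard-Stone."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(X)")
    # Seed the selection with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: _dist(X[pair[0]], X[pair[1]]),
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # min_d[k]: distance from candidate k to its nearest selected sample.
    min_d = {k: min(_dist(X[k], X[i0]), _dist(X[k], X[j0])) for k in remaining}
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_d[nxt]
        for k in remaining:
            min_d[k] = min(min_d[k], _dist(X[k], X[nxt]))
    return selected


def ks_split(X, n_cal, n_val):
    """Calibration by KS on all samples, validation by KS on the rest,
    prediction = the leftovers (the second KS pass is an assumption)."""
    cal = kennard_stone(X, n_cal)
    cal_set = set(cal)
    rest = [k for k in range(len(X)) if k not in cal_set]
    val = [rest[k] for k in kennard_stone([X[k] for k in rest], n_val)]
    val_set = set(val)
    pred = [k for k in rest if k not in val_set]
    return cal, val, pred


# Hypothetical stand-in for 75 pre-processed samples with 4 variables each.
random.seed(0)
X = [[random.random() for _ in range(4)] for _ in range(75)]
cal, val, pred = ks_split(X, n_cal=45, n_val=15)
print(len(cal), len(val), len(pred))  # 45 15 15
```

With 75 samples, the 60/20/20 proportions give the 45, 15 and 15 samples reported for the calibration, validation and prediction sets.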
To statistically evaluate the classification models, sensitivity and specificity were calculated on the test samples as important measures of model quality. Both parameters have a maximum value of 100 and a minimum of 0, and are obtained as follows:
$$\mathrm{Sensitivity}\ (\%) = \frac{TP}{TP + FN} \times 100$$
$$\mathrm{Specificity}\ (\%) = \frac{TN}{TN + FP} \times 100$$
where FN is defined as a false negative and FP as a false positive; and TP and TN are defined as true positive and true negative, respectively.
Also, the models were evaluated using the area under the curve (AUC) and the F-score. The AUC is the area under the receiver operating characteristic (ROC) curve, and the F-score is a measurement of the model accuracy defined by:
$$F\text{-score} = \frac{2 \times SENS \times SPEC}{SENS + SPEC}$$
where SENS stands for sensitivity; and SPEC stands for specificity.
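The quality parameters can be computed from one-vs-rest counts for each class, as in the short Python sketch below. Sensitivity and specificity use the standard TP/(TP + FN) and TN/(TN + FP) ratios; the F-score is assumed here to be the harmonic-mean form 2 × SENS × SPEC / (SENS + SPEC), which matches the symbols defined in the text. The function names, class labels and predictions are hypothetical and are not results from the study.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FN, TN and FP, treating `positive` as the class of interest."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        elif guess == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Percentage of true members of the class that were recognized."""
    return 100.0 * tp / (tp + fn)


def specificity(tn, fp):
    """Percentage of non-members that were correctly rejected."""
    return 100.0 * tn / (tn + fp)


def f_score(sens, spec):
    """Harmonic mean of sensitivity and specificity (assumed definition)."""
    if sens + spec == 0:
        return 0.0
    return 2.0 * sens * spec / (sens + spec)


# Hypothetical prediction set: 5 samples per class, one resistant 1 sample
# wrongly assigned to the control class.
y_true = ["control"] * 5 + ["resistant 1"] * 5 + ["resistant 2"] * 5
y_pred = ["control"] * 6 + ["resistant 1"] * 4 + ["resistant 2"] * 5

for label in ("control", "resistant 1", "resistant 2"):
    tp, fn, tn, fp = one_vs_rest_counts(y_true, y_pred, label)
    sens, spec = sensitivity(tp, fn), specificity(tn, fp)
    print(label, round(sens, 1), round(spec, 1), round(f_score(sens, spec), 1))
```

A model that classifies every sample correctly gives FN = FP = 0 for each class, so sensitivity, specificity and F-score are all 100%, which is the case reported for the optimized models.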

Conclusion
The present study demonstrates the ability of EEM fluorescence spectroscopy combined with multivariate classification to differentiate classes of susceptible and resistant bacteria of the species E. coli and K. pneumoniae. The most satisfactory models for the classification of K. pneumoniae were UPCA-QDA, UGA-SVM and 2D-LDA, which presented 100% accuracy rates for all classes. For the E. coli data, the UPCA-QDA, UGA-SVM and 2D-PCA-QDA models were the best, with 100% predictive performance for the classification of all groups. All these models obtained sensitivity and specificity rates of 100%. This paper suggests a new alternative for the detection of bacterial resistance, through a methodology that is faster than traditional methods of analysis, simplifying the diagnosis and increasing the chances of recovery of the patients.