Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning

Raman optical spectroscopy promises label-free bacterial detection, identification, and antibiotic susceptibility testing in a single step. However, achieving clinically relevant speeds and accuracies remains challenging due to weak Raman signal from bacterial cells and numerous bacterial species and phenotypes. Here we generate an extensive dataset of bacterial Raman spectra and apply deep learning approaches to accurately identify 30 common bacterial pathogens. Even on low signal-to-noise spectra, we achieve average isolate-level accuracies exceeding 82% and antibiotic treatment identification accuracies of 97.0±0.3%. We also show that this approach distinguishes between methicillin-resistant and -susceptible isolates of Staphylococcus aureus (MRSA and MSSA) with 89±0.1% accuracy. We validate our results on clinical isolates from 50 patients. Using just 10 bacterial spectra from each patient isolate, we achieve treatment identification accuracies of 99.7%. Our approach has potential for culture-free pathogen identification and antibiotic susceptibility testing, and could be readily extended for diagnostics on blood, urine, and sputum.

Bacterial infections are a leading cause of death in both developed and developing nations, taking more than 6.7 million lives each year 1,2. These infections are also costly to treat, accounting for 8.7% of annual healthcare spending, or $33 billion, in the United States alone 3. Current diagnostic methods require sample culturing to detect and identify the bacteria and their antibiotic susceptibility, a slow process that can take days even in state-of-the-art labs 4,5. Broad-spectrum antibiotics are often prescribed while waiting for culture results 6, and according to the Centers for Disease Control and Prevention, over 30% of patients are treated unnecessarily 7. New methods for rapid, culture-free diagnosis of bacterial infections are needed to enable earlier prescription of targeted antibiotics and help mitigate antimicrobial resistance.
Raman spectroscopy has the potential to identify the species and antibiotic resistance of bacteria, and when combined with confocal spectroscopy, can interrogate individual bacterial cells (Figure 1a, b). Different bacterial phenotypes are characterized by unique molecular compositions, leading to subtle differences in their corresponding Raman spectra. However, because Raman scattering efficiency is low (∼10^-8 scattering probability 8), these subtle spectral differences are easily masked by background noise.
High signal-to-noise ratios (SNRs) are thus needed to reach high identification accuracies 9, typically requiring long measurement times that prohibit high-throughput single-cell techniques. Additionally, the large number of clinically relevant species, strains, and antibiotic resistance patterns requires comprehensive datasets that are not gathered in studies that focus on differentiating between species 10,11, isolates (typically referred to as strains in the literature) 12,13, or antibiotic susceptibilities 14-19.

Deep learning for bacterial classification from Raman spectra
To address this challenge, we train a convolutional neural network (CNN) to classify noisy bacterial spectra by isolate, empiric treatment, and antibiotic resistance. As illustrated in Figure 1, we measure Raman spectra using short measurement times on dried monolayer samples, ensuring that the majority of individual spectra are taken over single cells and preparation conditions are consistent between samples (see Methods). We construct reference datasets of 60,000 spectra from 30 bacterial and yeast isolates for 3 measurement times. These 30 isolate classes cover over 94% of all bacterial infections treated at Stanford Hospital in the years 2016-17 and are representative of the majority of infections in intensive care units worldwide 20. We further augment our reference dataset with 12,000 spectra from clinical patient isolates, including MRSA and MSSA isolates (see Methods for full dataset information). Previously, the lack of large datasets prohibited the use of CNNs due to the high number of spectra per bacterial class needed for training.
Our CNN architecture consists of 25 one-dimensional convolutional layers with residual connections 33; instead of two-dimensional images, it takes one-dimensional spectra as input. Unlike previous work, we do not use pooling layers and instead use strided convolutions with the goal of preserving the exact locations of spectral peaks 34. Empirically, we find that this strategy improves model performance. This is the first work to adapt state-of-the-art CNN techniques from image classification to spectral data (see Methods for further detail).
We train the neural network on a 30-class isolate identification task, where the CNN outputs a probability distribution across the 30 reference isolates and the maximum is taken as the predicted class.
The model is evaluated using a 5-fold cross validation procedure, where 1600 of the 2000 spectra per class are used for training and the remaining 400 are used for evaluating the test accuracy.
A performance breakdown for individual classes is displayed in the confusion matrix in Figure 2a. Here, we show data for 1 s measurement times, corresponding to an SNR of 4.1, roughly an order of magnitude lower than typical reported bacterial spectra 10-12; classification accuracies increase with SNR, as shown in Supplementary Figure 1. On the 30-class task, the average isolate-level accuracy is 93.8±0.1%. Gram-negative bacteria are only misclassified as other Gram-negative bacteria; the same is generally true for Gram-positive bacteria, where additionally, the majority of misclassifications occur within the same genus. In comparison, our implementations of the more common classification techniques of logistic regression and support vector machine (SVM) achieve average accuracies of 89.3±0.2% and 88.7±0.2%, respectively, on our reference dataset.

Identification of empiric treatments
Species-level classification accuracy is the standard metric for bacterial identification, but in practice, the priority for physicians is choosing the correct antibiotic to treat a patient. Common antibiotics often have activity against multiple species, so the 30 isolates can be arranged into groupings based on the recommended empiric treatment if the bacterial species is known. Classification accuracies can thus be condensed into a new confusion matrix grouped by empiric antibiotic treatment (Figure 2b), where the average accuracy of our method is 99.0±0.1%.
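The condensing step described above can be illustrated with a minimal sketch. The isolate names, group labels, and counts below are hypothetical toy values, not data from this study, and the helper names `group_confusion` and `accuracy` are introduced only for illustration.

```python
from collections import defaultdict

def group_confusion(confusion, isolate_to_group):
    """Collapse an isolate-level confusion matrix (nested dict of counts)
    into a treatment-group-level confusion matrix."""
    grouped = defaultdict(lambda: defaultdict(int))
    for true_isolate, row in confusion.items():
        for predicted_isolate, count in row.items():
            true_group = isolate_to_group[true_isolate]
            predicted_group = isolate_to_group[predicted_isolate]
            grouped[true_group][predicted_group] += count
    return {group: dict(row) for group, row in grouped.items()}

def accuracy(confusion):
    """Fraction of all counts that lie on the diagonal."""
    correct = sum(row.get(label, 0) for label, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total

# Hypothetical mapping and counts (assumptions for illustration only).
isolate_to_group = {
    "isolate A1": "treatment A",
    "isolate A2": "treatment A",
    "isolate B1": "treatment B",
}
confusion = {
    "isolate A1": {"isolate A1": 90, "isolate A2": 10},
    "isolate A2": {"isolate A1": 20, "isolate A2": 80},
    "isolate B1": {"isolate B1": 95, "isolate A1": 5},
}

isolate_accuracy = accuracy(confusion)
treatment_confusion = group_confusion(confusion, isolate_to_group)
treatment_accuracy = accuracy(treatment_confusion)
```

In this toy case, confusions between the two isolates that share a treatment no longer count as errors after grouping, which is why the treatment-level accuracy is higher than the isolate-level accuracy.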

Detection of antibiotic resistance
As a step toward a culture-free antibiotic susceptibility test using Raman spectroscopy, we train a binary CNN classifier to differentiate between methicillin-resistant and -susceptible isolates of S. aureus. This model achieves 95.4±0.5% identification accuracy. Because the consequences for misdiagnosing MRSA as MSSA are often more severe than the reverse misdiagnosis, the binary decision can be tuned for higher sensitivity (low false negative rate), as shown in the receiver operating characteristic (ROC) curve in Figure 2c (dotted line denotes performance of random guessing). The area under the curve (AUC) is 99.1%, meaning that a randomly selected positive example (i.e., Raman sample from patient with MRSA) will be predicted to be more likely to be MRSA than a randomly selected negative example (i.e., sample from patient with MSSA) with probability 0.991.
To test whether our model can detect a specific difference in antibiotic resistance in addition to differences between isolates, we perform binary classification between MRSA 1 and its isogenic variant where the methicillin resistance gene (mecA) is removed 35 (Figure 3a). The expression of mecA results in replacement of Penicillin Binding Proteins (PBPs) with PBP2a, which has a low binding affinity for methicillin 36. The CNN's ability to differentiate the pair with 78.5±0.6% accuracy (Figure 3b) demonstrates sensitivity to a single genetic difference in antibiotic resistance, with all other factors held constant. The ROC curve for the isogenic binary classification has an AUC of 86.1±0.6% (Figure 3c).
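The probabilistic reading of the AUC given above, and the effect of tuning the decision threshold, can be made concrete with a small sketch. The scores are hypothetical classifier outputs (predicted probability of MRSA), not results from this study; the function names are introduced here for illustration.

```python
def auc_by_pairs(positive_scores, negative_scores):
    """AUC as the probability that a random positive example receives a
    higher score than a random negative example (ties count as one half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

def sensitivity_specificity(positive_scores, negative_scores, threshold):
    """Call an example positive (MRSA) when its score is >= threshold."""
    true_positives = sum(s >= threshold for s in positive_scores)
    true_negatives = sum(s < threshold for s in negative_scores)
    return (true_positives / len(positive_scores),
            true_negatives / len(negative_scores))

# Hypothetical scores (assumptions for illustration only).
mrsa_scores = [0.9, 0.8, 0.4]
mssa_scores = [0.7, 0.3, 0.2]

auc = auc_by_pairs(mrsa_scores, mssa_scores)
default_operating_point = sensitivity_specificity(mrsa_scores, mssa_scores, 0.5)
sensitive_operating_point = sensitivity_specificity(mrsa_scores, mssa_scores, 0.35)
```

Lowering the threshold from 0.5 to 0.35 in this toy example raises sensitivity without changing the AUC, since the AUC depends only on how the scores are ranked.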

Extension to clinical patient isolates
To demonstrate that this approach can be extended to new clinical settings, we test our model on 25 clinical isolates derived from patient samples, with 5 isolates from each of the 5 most prevalent 37 empiric treatment groups (see Supplementary Table 2 and Supplementary Figure 3). We augment our reference dataset with this clinical dataset comprised of 400 spectra per clinical isolate. To account for changes in the relative prevalence of species and antibiotic resistances over time, the model may be fine-tuned on a small dataset that is representative of current patient populations. We use a leave-one-patient-out cross-validation (LOOCV) strategy for fine-tuning, where we assign 1 patient in each class to the test set (5 patients total) and use the other 4 for fine-tuning (20 patients total), fine-tuning on 10 randomly sampled spectra per patient isolate. We repeat this 5 times, so all 25 patient isolates appear in the held-out test set once. We then use 10 randomly sampled spectra from each patient isolate in the test set to reach an infection identification for that patient isolate. The sampling procedure for identification is repeated for 10,000 trials, and we report the average accuracy and standard deviation, and display a trial representing the modal result in Figure 4a (full experiment details can be seen in Supplementary Note 1). A CNN pretrained on the reference dataset serves both as initialization for the fine-tuned model and as a baseline, achieving 89.0±3.6% species identification accuracy, a statistically significant improvement over logistic regression and support vector machine baselines (see Methods for details). When the CNN is fine-tuned on clinical data and then evaluated on the held-out patients, the identification accuracy is improved to 99.0±1.9% (Supplementary Figure 4). Samples for the clinical tests were prepared separately for each patient, so we conclude that the measured performance is not due to batch effects from sample preparation or measurement conditions.
Because patient samples may contain very low numbers of bacterial cells without culturing (e.g. 1 CFU/mL or fewer in blood 38), only a few individual bacterial spectra per patient may be available to make a diagnosis. As seen in Figure 4c, just 10 cellular spectra are enough to reach high identification accuracy.
The rate of correct identification using 10 spectra is 99.0%, within 1% of the performance with 400 spectra (100.0%).While acquiring spectra from 400 individual bacterial cells would likely necessitate culturing, we achieve high accuracy on spectra from 10 individual bacterial cells, demonstrating the potential of our combined Raman-CNN system to diagnose infections using noisy spectra collected from uncultured samples.
Finally, as a step toward antibiotic susceptibility testing on clinical isolates, we collect Raman spectra on 5 additional clinical MRSA isolates and test the binary MRSA/MSSA classifier that is pre-trained on the reference MRSA and MSSA isolates. Using the same LOOCV process, we fine-tune the binary classifier on the clinical spectra. A representative result is shown in Figure 4b, where misclassifications of MSSA as MRSA are labeled as "suboptimal", indicating that vancomycin (prescribed for MRSA) is also effective on MSSA but is not considered optimal treatment and may introduce adverse patient effects. On average, the pre-trained binary classifier achieves 61.7±7.3% accuracy and the fine-tuned binary classifier achieves 65.4±6.3% accuracy (Supplementary Figure 4).

Discussion
This work constitutes the first application of state-of-the-art deep learning techniques to noisy Raman spectra to identify clinically relevant bacteria and their empiric treatment. We have collected the largest known dataset of bacterial Raman spectra, both in terms of spectra per isolate and total number of isolates; the size of this dataset enables deep learning approaches. A CNN model pre-trained on this dataset can easily be extended to new clinical settings through fine-tuning on a small number of clinical isolates, as we have shown on our clinical dataset. We envision that fine-tuning processes such as the one demonstrated here could be important components for continuously evaluating and improving deployed models. Our model, applied here to the identification of clinically relevant bacteria, can be applied with minimal modification to other identification problems such as materials identification, or other spectroscopic techniques such as nuclear magnetic resonance, infrared, or mass spectrometry.
This study uses measurement times of 1 s, corresponding to SNRs that are an order of magnitude lower than typical reported bacterial spectra, while still achieving comparable or improved identification accuracy on more isolate classes than typical Raman bacterial identification studies. A common strategy for reducing measurement times is surface-enhanced Raman scattering (SERS) using plasmonic structures, which can increase the signal strength by several orders of magnitude 11,39,40. SERS spectra can be highly variable and difficult to reproduce, particularly on cell samples 8,41, making it difficult to develop a reliable diagnostic method based on SERS. However, with a dataset capturing the breadth of variation in SERS spectra, a CNN could enable a platform that processes blood, sputum, or urine samples in a few hours.
The species-level identification accuracy achieved by our Raman-CNN model is 94.7±0.1%, approaching modern culture-based techniques such as MALDI-TOF, which achieves 95.4% 42 to 99.1% 43 species-level identification accuracy. Compared to other culture-free methods 44 including single-cell sequencing 45-48 and fluorescence or magnetic tagging 49, Raman spectroscopy has the unique potential to be a technique for identifying phenotypes that does not require specially designed labels, allowing for easy generalizability to new strains.
To achieve treatment recommendations as fine-grained as those from culture-based methods, larger datasets covering greater diversity in bacterial susceptibility profiles, cell states, and growth media and conditions would be needed. Though collecting such datasets is beyond an academic scope, requiring highly automated sample preparation and data acquisition processes, there is promise for clinical translation. Similarly, studies applying the Raman-CNN system to identify pathogens in relevant biofluids such as whole blood, sputum, and urine are a promising future direction to demonstrate the validity of the method as a diagnostic tool. When combined with such an automated system, the Raman-CNN platform presented here could rapidly scan and identify every cell in a patient sample and recommend an antibiotic treatment in one step, without needing to wait for a culture step. Such a technique would allow for accurate and targeted treatment of bacterial infections within hours, reducing healthcare costs and antibiotic misuse, limiting antimicrobial resistance, and improving patient outcomes.

Figure 4: Extension to clinical patient isolates. A CNN pre-trained on our reference dataset can be extended to classify clinical patient isolates and further improved by fine-tuning on a small number of clinical spectra. a) 5 species of bacterial infections are tested, with 5 patients per infection type. After fine-tuning, species identification accuracy improves from 89.0±3.6% to 99.0±1.9%. b) Binary classification between MRSA and MSSA patient isolates is also performed, with an accuracy of 61.7±7.3% that improves to 65.4±6.3% after fine-tuning. c) Dependence of average diagnosis rates for the fine-tuned model on the number of spectra used per patient. With just 10 spectra, the performance of the model reaches 99%, within 1% of the performance with 400 spectra (100%). Error bars are calculated as the standard deviation across 10,000 trials of random selections of n spectra, where n is the number of spectra used per patient.

Dataset
The reference dataset consists of 30 bacterial and yeast isolates, including multiple isolates of Gram-negative and Gram-positive bacteria, as well as Candida species. We also include an isogenic pair of S. aureus from the same strain, in which one variant contains the mecA resistance gene for methicillin (MRSA) and the other does not (MSSA) 35 (see Supplementary Table 1 for full isolate information). The clinical dataset consists of 30 patient isolates distributed across 5 species. The total dataset consists of 2000 spectra each for the 30 reference isolates plus isogenic MSSA at 3 measurement times, and 400 spectra for each clinical isolate at 1 measurement time.

Dataset variance
For our datasets, we observe that intra-sample variance is high, as demonstrated by the pairwise spectral difference analysis summarized in Fig. 2. For 19 out of 30 isolates, spectra from at least one other isolate are more similar, on average, than spectra from the same isolate. For example, when we rank isolates in order of similarity to E. faecalis 2 (Fig. 2c), there are 8 other isolates where the average difference between a spectrum from E. faecalis 2 and a spectrum from the other isolate is smaller than the average difference between two spectra from E. faecalis 2. When intra-sample variance is high, a large number of spectra per sample may help to better represent the full data distribution and lead to higher predictive performance.
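A minimal sketch of such a pairwise comparison is given below. The text does not state which difference measure was used, so Euclidean distance is an assumption here, and the three short "spectra" per isolate are hypothetical toy vectors chosen so the arithmetic is easy to follow.

```python
import math
from itertools import combinations, product

def distance(a, b):
    """Euclidean distance between two spectra (assumed difference measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_intra_difference(spectra):
    """Average difference between two distinct spectra of the same isolate."""
    pairs = list(combinations(spectra, 2))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)

def mean_inter_difference(spectra_a, spectra_b):
    """Average difference between a spectrum of one isolate and one of another."""
    pairs = list(product(spectra_a, spectra_b))
    return sum(distance(a, b) for a, b in pairs) / len(pairs)

def closer_isolates(name, dataset):
    """Isolates whose spectra are, on average, closer to `name` than
    spectra of `name` are to each other."""
    intra = mean_intra_difference(dataset[name])
    return [other for other in dataset
            if other != name
            and mean_inter_difference(dataset[name], dataset[other]) < intra]

# Hypothetical two-pixel spectra (assumptions for illustration only).
dataset = {
    "isolate A": [[0.0, 0.0], [4.0, 0.0]],    # widely spread
    "isolate B": [[2.0, 0.0], [2.0, 0.0]],    # sits between A's spectra
    "isolate C": [[10.0, 0.0], [10.0, 0.0]],  # far from A
}
closer_to_a = closer_isolates("isolate A", dataset)
```

Here the two spectra of isolate A differ by 4 on average, while A-to-B pairs differ by only 2, so isolate B is flagged; this mirrors, in miniature, the situation reported for 19 of the 30 isolates.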

Sample preparation
Bacterial isolates were cultured on blood agar plates each day before measurement. Plates were sealed with Parafilm and stored at 4 °C for 20 minutes to 12 hours before sample preparation. Storage times varied to allow for multiple measurement times per day; however, all other sample preparation conditions were kept consistent between samples. Differences in storage time were not found to result in spectral changes greater than spectral changes due to strain or isogenic differences. All clinical isolates were prepared in separate samples with consistent sample preparation conditions. Because test clinical samples were prepared separately from samples used for training, we conclude that classifications are not due to batch effects such as differences in sample preparation. We prepared samples for measurement by suspending 0.6 mg of biomass from a single colony in 10 µL of sterile water (0.4 mg in 5 µL water for Gram-positive species) and drying 3 µL of the suspension on a gold-coated silica substrate (Figure 1a and b). Substrates were prepared by electron beam evaporation of 200 nm of gold onto microscope slides that were pre-cleaned using base piranha. Samples were allowed to dry for 1 hour before measurement.

Raman measurements
We measured Raman spectra across monolayer regions of the dried samples (Figure 1a) using the mapping mode of a Horiba LabRAM HR Evolution Raman microscope. Illumination at 633 nm and 13.17 mW was used with a 300 l/mm grating to generate spectra with 1.2 cm^-1 dispersion to maximize signal strength while minimizing background signal from autofluorescence. Wavenumber calibration was performed using a silicon sample. The 100X 0.9 NA objective lens (Olympus MPLAN) generates a diffraction-limited spot size, ∼1 µm in diameter. A 45 × 45 discrete spot map is taken with 3 µm spacing between spots to avoid overlap between spectra. The spectra are individually background corrected using a polynomial fit of order 5 using the subbackmod Matlab function available in the Biodata toolbox (see Supplementary Figure 1 for examples of raw and corrected spectra). The majority of spectra are measured on true monolayers and arise from 1 cell due to the diffraction-limited laser spot size, which is roughly the size of a bacterial cell. However, a small number of spectra may be taken over aggregates or multilayer regions. We exclude the spectra that are most likely to be non-monolayer measurements by ranking the spectra by signal intensity and discarding the 25 spectra with highest intensity, which includes all spectra with intensities greater than two standard deviations from the mean. We measured both monolayers and single cells, and found that monolayer measurements have SNRs of 2.5±0.7, similar to single-cell measurements (2.4±0.6), while allowing for the semi-automated generation of a large training dataset. The spectral range between 381.98 and 1792.4 cm^-1 was used, and spectra were individually normalized to run from a minimum intensity of 0 and maximum intensity of 1 within this spectral range.
SNR values are calculated by dividing the total intensity range by the intensity range over a 20-pixel wide window in a region where there is no Raman signal.
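The normalization and SNR definitions above can be written out as a short sketch. The position of the signal-free window (`noise_start`) and the toy spectrum are assumptions for illustration; the text specifies only that the window is 20 pixels wide and lies in a region without Raman signal.

```python
def min_max_normalize(spectrum):
    """Rescale one spectrum so its minimum is 0 and its maximum is 1."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        raise ValueError("flat spectrum cannot be normalized")
    return [(value - lo) / (hi - lo) for value in spectrum]

def signal_to_noise(spectrum, noise_start, window=20):
    """Total intensity range divided by the intensity range inside a
    `window`-pixel region assumed to contain no Raman signal."""
    noise_region = spectrum[noise_start:noise_start + window]
    if len(noise_region) < window:
        raise ValueError("noise window extends past the end of the spectrum")
    total_range = max(spectrum) - min(spectrum)
    noise_range = max(noise_region) - min(noise_region)
    return total_range / noise_range

# Hypothetical spectrum: 20 noise-only pixels followed by a single peak.
toy_spectrum = [0.0, 0.25] * 10 + [0.5, 1.0, 0.5]
toy_snr = signal_to_noise(toy_spectrum, noise_start=0)
normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Because the SNR is a ratio of two intensity ranges, it is unchanged by min-max normalization, so it can be computed either before or after that step.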

CNN architecture & training details
The CNN architecture is adapted from the Resnet architecture 33 that has been widely successful across a range of computer vision tasks. It consists of an initial convolution layer followed by 6 residual layers and a final fully connected classification layer; a block diagram can be seen in Figure 1. The residual layers contain shortcut connections between the input and output of each residual block, allowing for better gradient propagation and stable training (refer to 33 for details). Each residual layer contains 4 convolutional layers, so the network has 1 + 6 × 4 = 25 convolutional layers and a total depth of 26 layers including the final fully connected layer. The initial convolution layer has 64 convolutional filters, while each of the hidden layers has 100 filters. These architecture hyperparameters were selected via grid search using one training and validation split on the isolate classification task. We also experimented with simple MLP and CNN architectures but found that the Resnet-based architecture performed best.
We first train the network on the 30-isolate classification task, where the output of the CNN is a vector of probabilities across the 30 classes and the maximum probability is taken as the predicted class.
The binary MRSA/MSSA and binary isogenic MRSA/MSSA classifiers have the same architecture as the 30-isolate classifier, aside from the number of classes in the final classification layer. We use the Adam optimizer 50 across all experiments with learning rate 0.001, betas (0.5, 0.999), and batch size 10.
Classification accuracies are reported using a standard stratified 5-fold cross validation procedure. For each fold, we hold out one fifth of the data as unseen test data, then split the remaining data into 90/10 train and validation splits, train the CNN on the train split, and use the accuracy on the validation split to perform model selection. We then evaluate and report the test accuracy on the unseen test data. All error values reported for tests on the reference dataset are standard deviation values across 5 folds.
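The splitting procedure can be sketched as follows. This is an illustrative index-splitting routine under stated assumptions (a fixed random seed, per-class shuffling, and the hypothetical function name `stratified_folds`); it is not the code used in this work, and it shows only how indices are partitioned, not model training.

```python
import random

def stratified_folds(labels, n_folds=5, val_fraction=0.1, seed=0):
    """Return one (train, validation, test) tuple of index lists per fold,
    with every class represented proportionally in each split."""
    rng = random.Random(seed)
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)
    for indices in indices_by_class.values():
        rng.shuffle(indices)

    splits = []
    for fold in range(n_folds):
        train, validation, test = [], [], []
        for indices in indices_by_class.values():
            held_out = indices[fold::n_folds]        # one fifth of the class
            held_out_set = set(held_out)
            remaining = [i for i in indices if i not in held_out_set]
            n_validation = round(len(remaining) * val_fraction)
            validation.extend(remaining[:n_validation])
            train.extend(remaining[n_validation:])
            test.extend(held_out)
        splits.append((train, validation, test))
    return splits

# Three hypothetical classes with 2000 spectra each, mirroring the
# per-class counts of the reference dataset.
labels = [class_id for class_id in range(3) for _ in range(2000)]
splits = stratified_folds(labels)
train, validation, test = splits[0]
# Per class: 400 test, 160 validation, and 1440 training spectra.
```

Model selection would use accuracy on `validation`, with `test` touched only once per fold for the reported number.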
While a high number of samples is good for ensuring dataset variation, deep learning approaches can still benefit from having a high number of examples per sample.When intra-sample variance is high, as we observe for our datasets, a large number of spectra per sample may better represent the full distribution and lead to higher predictive performance.
For the clinical isolates, we start by pre-training a CNN on the empiric treatment labels for the 30 reference isolates. We then use the following leave-one-patient-out cross-validation (LOOCV) strategy to fine-tune the parameters of the CNN. There are a total of 25 patient isolates across 5 species. In each of the 5 folds, we assign 1 patient in each species to the test set, 1 patient in each species to the validation set, and the remaining 3 patients in each species to the training (i.e., fine-tuning) set. We then use the clinical training set (consisting of isolates from 15 patients) to fine-tune the CNN parameters, and use accuracy on the validation set (5 patient isolates) to do model selection. The test accuracy for each fold is evaluated on the test set (5 patient isolates) using the method described below.
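The fold assignment can be sketched as below. The text does not say how the validation patient is chosen in each fold; rotating to the next patient in the list is an assumption made here, and the species and patient names are placeholders.

```python
def patient_folds(patients_by_species, n_folds=5):
    """Assign, per fold and per species, one patient to test, one to
    validation, and the rest to the fine-tuning (training) set."""
    folds = []
    for fold in range(n_folds):
        split = {"train": [], "val": [], "test": []}
        for species, patients in patients_by_species.items():
            test_index = fold % len(patients)
            # Assumption: the validation patient is the next one in the list.
            val_index = (fold + 1) % len(patients)
            for index, patient in enumerate(patients):
                if index == test_index:
                    split["test"].append((species, patient))
                elif index == val_index:
                    split["val"].append((species, patient))
                else:
                    split["train"].append((species, patient))
        folds.append(split)
    return folds

# Placeholder names: 5 species with 5 patient isolates each.
patients_by_species = {
    "species_%d" % s: ["patient_%d" % p for p in range(5)] for s in range(5)
}
folds = patient_folds(patients_by_species)
```

Each fold then contains 15 fine-tuning, 5 validation, and 5 test patient isolates, and every patient isolate is held out for testing exactly once across the 5 folds.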

Clinical identification data analysis
To reach an identification for patient isolates, 400 spectra are measured across a sample from each patient isolate. Of these, 10 spectra are chosen at random to be classified. The most common class out of the 10 spectral classifications is then chosen as the identification for each patient isolate, with ties broken randomly. All error values reported for tests on the clinical dataset are standard deviations across 10,000 trials of random selections of 10 spectra.
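The voting procedure can be sketched as follows, assuming the per-spectrum class predictions for a patient isolate are already available as a list of labels. The labels, the seed, and the 300/100 split of predictions are hypothetical; the function names are introduced for illustration.

```python
import random
from collections import Counter

def identify(predictions, n_spectra, rng):
    """Sample n_spectra predictions without replacement and return the
    most common class, breaking ties uniformly at random."""
    sample = rng.sample(predictions, n_spectra)
    counts = Counter(sample)
    top_count = max(counts.values())
    tied = sorted(label for label, count in counts.items() if count == top_count)
    return rng.choice(tied)

def identification_rate(predictions, true_label, n_trials=10000,
                        n_spectra=10, seed=0):
    """Fraction of random trials in which the vote matches true_label."""
    rng = random.Random(seed)
    hits = sum(identify(predictions, n_spectra, rng) == true_label
               for _ in range(n_trials))
    return hits / n_trials

# Hypothetical isolate: 300 of its 400 spectra are classified correctly.
predictions = ["correct"] * 300 + ["other"] * 100
rate = identification_rate(predictions, "correct", n_trials=2000)
```

The point of voting is visible even in this toy case: a per-spectrum accuracy of 75% yields a noticeably higher isolate-level identification rate once 10 spectra are combined.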

Baselines
In all experiments where logistic regression (LR) and support vector machine (SVM) baselines were used, we first used PCA to reduce the input dimension from 1000 to 20; this hyperparameter was determined by plotting test accuracies for different settings on one training and validation split for the 30 isolate task and picking a value near where the test accuracy saturated. Using only the first 20 principal components not only decreases computation costs, but also increases accuracy by reducing the amount of noise in the data. For each fold of the cross validation procedure, we use grid search to choose the regularization hyperparameter for each model achieving the best validation accuracy and report the corresponding test accuracy.

Two-sample test of sample means
We use the Welch's two-sample t-test to test whether the differences in mean clinical accuracy between the CNN and the SVM and LR baselines are statistically significant. Welch's t-test is a variation of the Student's t-test that is used when the two samples may have unequal variances. In each case, writing subscripts 1 and 2 for the two models being compared, we start by computing the pooled standard deviation as

$$\sigma_p = \sqrt{\frac{(n_1 - 1)\,\sigma_1^2 + (n_2 - 1)\,\sigma_2^2}{n_1 + n_2 - 2}}.$$

We then compute the standard error of the difference between the means as

$$SE = \sigma_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

Finally, we can compute the test statistic as

$$t = \frac{\mu_1 - \mu_2}{SE}$$

and then compute the p-value using the corresponding Student's t-distribution. For our computations, n_CNN = n_LR = n_SVM = 10000, µ_CNN = 89.0, µ_LR = 81.8, µ_SVM = 82.9, σ_CNN = 3.6, σ_LR = 6.0, and σ_SVM = 5.9. In comparing the CNN with LR, we computed a t-statistic of 102.9 and in comparing the CNN with SVM, we computed a t-statistic of 88.3. In both cases, we reject the null hypothesis that the means are equal at the 1e-6 p-level.

Supplementary Figure 4: a) Classification results for each patient isolate. Element (i, j) represents the percentage out of 10,000 trials in which species j is predicted by the CNN for patient i. b) Classification results for each MRSA/MSSA patient isolate. Heatmap represents the percentage out of 10,000 trials in which the binary CNN accurately identifies whether the isolate is MRSA or MSSA. 10 spectra per isolate are used for both fine tuning and identification.
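As a check on the two-sample test of sample means, the reported t-statistics can be recomputed from the stated means, standard deviations, and sample sizes. The sketch below uses the pooled-standard-deviation form; with equal sample sizes, as here, this gives the same statistic as the unpooled (Welch) standard error. The function name is introduced for illustration.

```python
import math

def t_statistic(mean_1, std_1, n_1, mean_2, std_2, n_2):
    """Two-sample t-statistic using a pooled standard deviation."""
    pooled_std = math.sqrt(
        ((n_1 - 1) * std_1 ** 2 + (n_2 - 1) * std_2 ** 2) / (n_1 + n_2 - 2)
    )
    standard_error = pooled_std * math.sqrt(1 / n_1 + 1 / n_2)
    return (mean_1 - mean_2) / standard_error

# Values as stated in the text (accuracies in percent, 10,000 trials each).
t_cnn_vs_lr = t_statistic(89.0, 3.6, 10000, 81.8, 6.0, 10000)
t_cnn_vs_svm = t_statistic(89.0, 3.6, 10000, 82.9, 5.9, 10000)

print(round(t_cnn_vs_lr, 1))   # 102.9
print(round(t_cnn_vs_svm, 1))  # 88.3
```

Both values agree with the t-statistics quoted in the text.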

Figure 1: Raman spectra can be used to identify bacteria via classification by a convolutional neural network (CNN). a) To build a training dataset of Raman spectra, we deposit bacterial cells onto gold-coated silica substrates and collect spectra from 2000 bacteria over monolayer regions for each strain. An SEM cross section of the sample is shown (gold coated to allow for visualization of bacteria under electron beam illumination). Scale bar is 1 µm. b) By focusing the excitation laser source to a diffraction-limited spot size, Raman signal from single cells can be acquired. c) Using a one-dimensional residual network with 25 total convolutional layers (see Methods for details), low-signal Raman spectra are classified as one of 30 isolates, which are then grouped by empiric antibiotic treatment. d) Raman spectra of bacterial species can be difficult to distinguish, and short integration times (1 s) lead to noisy spectra (SNR = 4.1). Averages of 2000 spectra from 30 isolates are shown in bold and overlaid on representative examples of noisy single spectra for each isolate. Spectra are color-grouped according to antibiotic treatment. These reference isolates represent over 94% of the most common infections seen at Stanford Hospital in the years 2016-17 37.

Figure 2: CNN performance breakdown by class. The trained CNN classifies 30 bacterial and yeast isolates with isolate-level accuracy of 93.8±0.1% and antibiotic grouping-level accuracy of 99.0±0.1%. a) Confusion matrix for 30 strain classes. Entry (i, j) represents the percentage out of 2000 spectra that are predicted by the CNN as class j given a ground truth of class i; entries along the diagonal represent the accuracies for each class. Misclassifications are mostly within antibiotic groupings, indicated by colored boxes, and thus do not affect the treatment outcome. Values below 0.5% are not shown, and matrix entries covered by figure insets are all below 0.5%. b) Predictions can be combined into antibiotic groupings to estimate treatment accuracy. TZP = piperacillin-tazobactam. All values below 0.5% are not shown. c) A binary classifier is used to distinguish between methicillin-resistant and -susceptible S. aureus (MRSA/MSSA). By varying the classification threshold, it is possible to trade off between sensitivity (true positive rate) and specificity (true negative rate). The area under the curve (AUC) is 99.1%.

Figure 3: Isogenic MRSA/MSSA classifier. a) Sensitivity to antibiotic resistance alone with all other factors held constant can be tested using an isogenic pair of S. aureus, meaning that the two are genetically identical aside from the deletion of the mecA gene which confers methicillin resistance. The expression of mecA results in replacement of Penicillin Binding Proteins (PBPs) with PBP2a, which has a low binding affinity for methicillin. b) A binary classifier is trained to distinguish between these two bacteria, achieving 78.5±0.6% accuracy. c) The ROC shows sensitivities and specificities significantly higher than random classification, with an AUC of 86.1%.

Table 1: Reference isolates. The empiric treatments are chosen by the authors of this paper specializing in infectious diseases from recommendations from Sanford Guide to Antimicrobial Therapy and trends in patient susceptibility profiles at the Stanford Hospital and the Veterans Affairs Palo Alto Health Care System 37,51. However, specific choices for each of the empiric species groups may be modified according to individual hospital susceptibility profiles.

Table 2: Clinical isolates.

Supplementary Figure 3: Spectra for individual patient isolates, averaged across the full 400 spectra dataset for each patient.

Supplementary Figure 1: a) Isolate-level classification accuracy increases with SNR. Under the measurement conditions used in this study, performance of the CNN is negatively affected by shorter measurement times. Further increase of SNR should saturate the performance of the CNN to a minimal baseline error rate. b) Spectral examples (from E. coli 1) for measurement times of 1 s, 0.1 s, and 0.01 s. c) Raw spectra for MRSA 1, E. coli 1, and P. aeruginosa 1 for a measurement time of 1 s. d) Spectra after background subtraction and normalization for a measurement time of 1 s. These are the direct inputs into our model.