Introduction

Bacteraemia is a frequent and challenging condition with a mortality rate ranging between 13% and 21%1,2,3. Risk factors for bacteraemia are advanced patient age, urinary or indwelling vascular catheter, chemotherapy or immunosuppressive therapies and co-morbidities such as malignancies4,5,6,7. A timely diagnosis is pivotal for the survival of bacteraemic patients, as these patients require prompt treatment with the appropriate antibiotics8,9.

Although blood culture (BC) analysis is regarded as the gold standard in bacteraemia diagnostics, the clinical decision as to who should receive BC analysis is not trivial. Furthermore, BC analysis needs a median of three days for a positive report, and a single BC often lacks diagnostic sensitivity10,11. Despite profound knowledge about its pre-test probability, which is strongly affected by the infection site, the true positive result rate of BC analysis for recognized pathogens ranges between 4% and 7%12,13,14. Moreover, the proportion of false positive BC results related to contamination is in a comparable range, reaching more than 8% of all BC analyses14,15,16. Generally, these flaws in the utilization of BC analysis have a fundamental economic impact, with estimated costs ranging between $6,878 and $7,502 for a single false positive BC result17,18,19.

Consequently, physicians are frequently faced with diagnostic uncertainties20. Biomarkers or prediction tools with a high negative predictive value (NPV), enabling the exclusion of bacteraemia, are highly desirable to increase the cost-effectiveness of microbiological tests. Procalcitonin (PCT) is considered the best biomarker for detecting bacteraemia, with a pooled sensitivity of 76% (95% confidence interval (CI): 72–80%) and a pooled specificity of 69% (95% CI: 64–72%)21.
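Because the NPV depends on the prevalence of bacteraemia as well as on sensitivity and specificity, a short worked calculation may help to illustrate the relationship. The following minimal sketch applies Bayes' rule to the pooled PCT estimates quoted above; the function name and the prevalence values are illustrative assumptions and are not taken from the cited meta-analysis.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a binary test for a given prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled PCT sensitivity/specificity as quoted in the text;
# the prevalence values below are hypothetical inputs for illustration.
for prevalence in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.76, 0.69, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With fixed sensitivity and specificity, the NPV falls as the prevalence rises, which is why a rule-out marker is harder to establish in a high-risk cohort.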

In the current study, machine learning algorithms were applied to data obtained from a prospective cohort study with the goal of improving the diagnostic performance of PCT for identifying patients who fulfil two or more systemic inflammatory response syndrome (SIRS) criteria but do not need BC analysis.

Results

Study population and available data

Data of 466 SIRS patients was available for predictive model estimation. Among them, 134 patients (28.8%) suffered from microbiologically confirmed bacteraemia, 195 patients (41.8%) presented with an infection but without bacteraemia and 137 patients (29.4%) presented with SIRS which was not related to any infection. The in-hospital mortality was 11.1% (n = 52) in our cohort.

In total, 71 patients fulfilled four SIRS criteria, 213 patients presented with three SIRS criteria and 182 patients presented with two SIRS criteria. Among the study population, a considerable proportion suffered from oncological or hemato-oncological diseases (40.6%, n = 189). A total of 86 patients (18.5%) received antibiotic therapy before blood sampling. Clinical and laboratory data of the study population are presented in Table 1 and Table 2. The most common infection foci were respiratory tract infections (n = 94, 14.9% bacteraemia rate), urinary tract infections (n = 51, 23.5% bacteraemia rate) and gastrointestinal system infections (n = 50, 40.0% bacteraemia rate, see: Supplementary Table 1). In 34 bacteraemic patients, no primary infection focus was found. The distribution of pathogens detected in BC and in the SeptiFast MGRADE test (Roche Diagnostics GmbH, Mannheim, Germany) is presented in Supplementary Table 2. More than one pathogen was detected in 13 patients.

Table 1 Clinical data of study participants.
Table 2 Laboratory data analysed in the study.

The best individual variable for predicting bacteraemia was procalcitonin (PCT) with a median area under the receiver operating characteristic curve (ROC-AUC) of 0.729 (95%CI: 0.679–0.779). The highest absolute correlation coefficients between PCT and other variables used for model training were found for C-reactive protein (CRP), total protein (TP) and lipopolysaccharide-binding protein (LBP; rs = 0.39, −0.35 and 0.35, respectively, see Fig. 1). As non-routinely used inflammation markers, several cytokines including IL-10, IL-17a and MIP-1b were analysed, which presented a low to moderate predictive capacity with a ROC-AUC ranging between 0.589 and 0.615. Interestingly, CRP, as a widely used infection marker, presented with a low predictive capacity (ROC-AUC: 0.569, 95%CI: 0.512–0.626), while several liver-related blood variables were significantly elevated in bacteraemic SIRS patients (e.g. bilirubin, gamma-glutamyl transpeptidase (γ-GT) or alanine transaminase (ALAT), see Table 2).
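For a single continuous marker, the ROC-AUC can be read as the probability that a randomly chosen bacteraemic patient has a higher marker value than a randomly chosen non-bacteraemic patient, with ties counted as one half. The following minimal sketch computes this pairwise statistic; it is not the software used in the study, and the marker values are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly.

    Ties contribute 0.5, which makes the result equal to the normalised
    Mann-Whitney U statistic.
    """
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical marker values (not study data)
bacteraemic = [0.9, 2.4, 5.1, 0.3, 12.0]
non_bacteraemic = [0.1, 0.4, 0.2, 1.1, 0.3, 0.6]
print(roc_auc(bacteraemic, non_bacteraemic))
```

The quadratic loop is adequate for a cohort of a few hundred patients; a rank-based implementation would be preferable for much larger data sets.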

Figure 1

Correlogram of features with the highest correlation to PCT. The labelling of the x and y axes is presented on the diagonal. The following parameters are displayed: PCT = procalcitonin, CRP = C-reactive protein, TP = total protein, LBP = lipopolysaccharide-binding protein, Alb = albumin, Crea = creatinine, IL-6 = interleukin-6, NeuR = relative proportion of neutrophils, Plt = platelets, Bili = bilirubin. Spearman correlation coefficients are presented in the lower left part of the correlogram; p-values are denoted as follows: ***<0.0001, **<0.001, *<0.05. Scatterplots of the presented features are shown in the upper right part of the correlogram.

In a next step, patterns of missing variables were analysed (see Fig. 2). The relative proportions of neutrophils (NeuR) and eosinophils (EosR) as well as fibrinogen (Fib) showed the highest amount of missing data (7%, 4% and 6% missingness, respectively). When assessing distinct missingness patterns, Fib alone (3.7% of all patients) and NeuR alone (2.5% of all patients) were the most prominent patterns. Missing data was imputed using multiple imputation (MI), generating 50 complete data sets. The imputed data sets differed in their imputed values, reflecting the uncertainty of the missing values. After MI, imputed data sets were split into a training set and a test set using an 80:20 ratio, and the splitting step was repeated ten times with each complete data set.
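The repeated 80:20 split can be sketched as follows. This is a minimal illustration and not the caret routine used in the study: the function name and seeds are assumptions, the class counts are those of the cohort (134 bacteraemic and 332 non-bacteraemic patients), and stratification by bacteraemia status follows the Methods section.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving the class ratio of `labels`."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test_idx.extend(shuffled[:n_test])   # 20% of this class
        train_idx.extend(shuffled[n_test:])  # remaining 80%
    return sorted(train_idx), sorted(test_idx)


# 1 = bacteraemia, 0 = no bacteraemia (class counts from the cohort)
labels = [1] * 134 + [0] * 332

# Ten repeats with different seeds give ten different train/test partitions
splits = [stratified_split(labels, 0.2, seed=rep) for rep in range(10)]
```

Splitting each class separately keeps the bacteraemia rate in every test set close to that of the full cohort, which stabilises the ROC-AUC estimates across repeats.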

Figure 2

Missing data aggregation plot. Left: distribution of missing data, shown in percentage. Right: missing pattern analysis (aggregation missingness plot, VIM package); percentages of missing patterns are displayed on the right side. In total, 81% of the study population had no missing values.

Model training and test set validation

As described in the Methods section, models were tuned using a 10-fold cross-validation (CV) scheme (repeated ten times). In test set validation (repeated ten times), the best ROC-AUC was found using the random forest (rf) approach with a 0.738 ROC-AUC (95%CI: 0.606–0.870), while the neural network model (nn) resulted in 0.698 ROC-AUC (95%CI: 0.549–0.857) and the elastic net regression (en) approach yielded 0.654 ROC-AUC (95%CI: 0.493–0.815). All models led to a similar or lower performance than PCT, as the best individual variable, with 0.729 ROC-AUC (95%CI: 0.679–0.779).

When restricting the model training and validation process to those SIRS patients without any antibiotic therapy before blood culture taking, all three ML approaches presented a similar predictive capacity. Table 3 presents data in comparison to PCT as a reference. Moreover, models were also established for patients with two, three or four SIRS criteria fulfilled (see Table 3). Best results were found in patients with three SIRS criteria fulfilled, in that the rf approach resulted in 0.781 ROC-AUC (95% CI: 0.573–0.988).

Table 3 Comparison of the ROC-AUC of the used ML strategies in different patient groups.

Discussion

Bacteraemia is a life-threatening condition, requiring prompt diagnostic and therapeutic actions. Due to the clinical similarities of symptoms of severe infections to inflammatory reactions not related to infections, treating physicians are faced with many uncertainties resulting in a low true positive result rate of BC analysis20.

In this study, we evaluated linear and non-linear algorithms for predicting bacteraemia in a relevant SIRS patient cohort with a high risk of bacteraemia (prevalence: 28.8%). Apart from PCT, several routinely and non-routinely available variables were evaluated, which presented a poor individual predictive capacity (see Table 2). Among the models tested, the rf strategy led to the best performance, resulting in 0.738 ROC-AUC (95%CI: 0.606–0.870). Despite their moderate to low degree of correlation with PCT (see Fig. 1), inclusion of these variables did not improve the predictive capacity of PCT in rf-, nn- or en-based models.

In a systematic review published in 2015, fifteen publications on validated prediction systems on bacteraemia were found22. Amongst these, models for several infection-locus specific cohorts or hospital-specific cohorts were established and validated, including patients with community-acquired pneumonia (CAP23,24,25), patients with skin or skin structure infections26, female patients with pyelonephritis27, patients in the emergency department (ED4,27,28,29,30), hospitalized patients19,31,32 or ICU patients29,33. In 13 studies, logistic regression models were applied and in two studies Bayesian networks were implemented, resulting in ROC-AUCs between 0.60 and 0.83. Interestingly, none of these models were routinely applied at the time the review was published. Further, in only two studies was the predictive capacity of PCT for predicting bacteraemia evaluated. Müller et al. evaluated CAP patients and PCT resulted in 0.79 ROC-AUC using a validation cohort assessment (95%CI: 0.72–0.88)23. Unfortunately, only PCT was assessed and therefore the ability of other variables to increase the predictive capacity of PCT remained unevaluated. Tudela et al. used the Charlson co-morbidity index (≥2) and PCT (>0.4 ng/ml) to predict bacteraemia in patients in the ED30, yielding 0.80 ROC-AUC in the derivation cohort (n = 275) and 0.74 ROC-AUC in the validation cohort (n = 137).

Currently, the best validated prediction model was published by Shapiro for patients in the ED4. In a prospective observational study with 3,901 patients (8.2% bacteraemia rate), a clinical prediction rule was established with 0.75 ROC-AUC in the validation set (n = 1,264). They stratified patients into three risk groups, the low-risk group showing a bacteraemia rate of 0.9% in the validation cohort. Thus, they concluded that for low-risk patients BC analysis might be omitted. In independent external validation studies, this rule resulted in similar ROC-AUCs34,35. Several similar scores and modifications of the Shapiro score have been established, resulting in a similar outcome36,37,38,39. Among these, in two independent studies a modified score including PCT was used, which performed better than PCT alone38,39. However, the generalizability of these results remains unclear, since in both studies a formal validation strategy was lacking.

Despite multiple pathophysiological differences on the cellular level, one might speculate that the host inflammation response to non-infectious stimuli is controlled similarly to the reaction to invasive pathogens. However, PCT presented with a higher diagnostic capacity in studies conducted at the ICU than on the standard care ward, as shown in a meta-analysis by Hoeber et al.21. They included data from our group as well40. On mixed standard care wards, the pooled sensitivity was 0.76 (95% CI: 0.65–0.85) and specificity was 0.66 (95% CI: 0.57–0.76) when using a 0.5 ng/ml cut-off value.

Since our patient cohort presented with a high degree of comorbidities, CRP or fibrinogen as acute phase reaction mediators were also high in non-bacteraemic SIRS patients. Thus, CRP was not useful as a bacteraemia marker. In a cohort of 785 CAP-patients with 4.5% bacteraemic patients, the PSI score (Pneumonia Severity Index for CAP, ROC-AUC: 0.720, 95%CI: 0.630–0.809) and the CURB-65 score (Confusion, BUN > 7 mmol/l, Respiratory rate ≥30, SBP <90 mmHg, DBP ≤ 60 mmHg, Age ≥ 65, ROC-AUC: 0.720; 95%CI: 0.622–0.819) showed a better capacity for predicting bacteraemia than CRP (ROC-AUC: 0.629, 95%CI: 0.522–0.735)41.

Further, a large proportion of SIRS patients presented with an infection but without evidence of bacteraemia, which putatively contributed to the low predictive capacity of CRP. Interestingly, several liver-related blood markers presented a better predictive capacity than CRP for identification of bacteraemia. Our patient cohort was also stratified into risk groups according to the number of SIRS criteria fulfilled; however, the results were less convincing (see: Table 3). Generally, risk group stratification might have performed better when applied to less specifically selected patients than our SIRS patients4,32,42. This might be based on the fact that SIRS criteria themselves are partly used for risk group stratification and therefore a further selection of low-risk patients was precluded. A similar observation was also made in CAP patients43.

In our study cohort, we found a relative heterogeneity in the patients’ co-morbidities, with a focus on oncological and haematological patients (see: Table 1), as described previously40,44,45. Increased homogeneity might have led to better classification performance. Further, the study was performed in a single centre setting, and thus our negative finding is not necessarily generalizable to other settings. Because of this negative finding, an external validation strategy was not applied. Furthermore, since only a limited number of patients were available, we did not use any statistical variable selection strategies, which would have required an additional validation loop (e.g. nested CV)46. However, we applied methods that inherently handle the inclusion of non-informative variables through penalization terms or weights. Moreover, within the imputation process, training and test data sets were imputed jointly with respect to their outcome, which could have led to over-optimistic results. However, this effect was considered to be limited, due to the relatively low number of total missing values.

PCT was the best individual marker for predicting bacteraemia in SIRS patients treated on standard care wards, with moderate diagnostic accuracy. Combinations of clinical variables, various cytokines and routinely available laboratory markers using linear or non-linear machine learning algorithms failed to improve the diagnostic accuracy of PCT. Therefore, we concluded that machine learning models failed to improve the predictive capacity of PCT for identifying bacteraemia in our SIRS patient cohort.

Methods

Study design

The prospective cohort study was performed between July 2011 and September 2012 on 14 medical and 13 surgical standard care wards at the Vienna General Hospital, Austria. After approval by the ethics committee of the Medical University of Vienna (EC-No. 518/2011), the study was conducted in accordance with the Declaration of Helsinki 1964 (including current revisions) and the Good Clinical Practice guidelines of the European Commission. Prior to participation, all patients gave written informed consent. As described elsewhere40,44,45,47, patients from whom a blood culture analysis was requested were screened for fulfilling at least two SIRS criteria, as defined by48. Neutropenia induced by chemotherapy was not considered an admissible SIRS criterion. Patients after surgical procedures were only included when SIRS developed 72 hours after surgery. Bacteraemia was specified by a positive BC or real-time multiplex polymerase chain reaction (PCR) analysis result for a recognized bacterial species. Bacterial contaminants were defined as described by Hall and Lyman49. Coagulase-negative staphylococci (CNS) were considered as causative pathogens only when detected in two blood specimens taken in separate venepunctures. Further, the infection status of all patients was assessed after discharge from hospital by applying the definition criteria for hospital-acquired infections established by the European Centre for Disease Prevention and Control (ECDC50). A total of 3,370 patients with suspected bacteraemia were screened. In 2,750 patients, fewer than two SIRS criteria were observed and 154 patients met at least one exclusion criterion, leaving 466 patients for analysis.

Data collection

Clinical data was recorded during patients’ enrolment in this study, and was complemented after hospital discharge. Blood samples were cultured in a set of FA Plus (aerobic) and FN Plus (anaerobic) bottles using the BacT/ALERT 3D automated blood culture system (bioMérieux, Marcy l’Etoile, France). Bacterial isolates were specified by matrix-assisted laser desorption ionisation (MALDI) time of flight (TOF) mass spectrometry (MS) using microflex LT with the Biotyper database (Bruker Daltonik GmbH, Bremen, Germany). In the event of Streptococcus pneumoniae identification, the assay result was additionally verified by optochin disc tests. Additionally, occurrence of microbial DNA was evaluated by the SeptiFast MGRADE test, which was applied in 220 patients according to the manufacturer’s specifications, as described in47.

The following 21 blood variables were analysed: procalcitonin (PCT, ng/ml, Hoffmann-La Roche Ltd, Basel, Switzerland), lipopolysaccharide-binding protein (LBP, µg/ml, IMMULITE 2000 Immunoassay System, Siemens Healthcare, Erlangen, Germany), C-reactive protein (CRP, mg/dl, Latex test; Beckman Coulter, Brea, CA, USA), interleukin-6 (IL-6, pg/ml, Hoffmann-La Roche Ltd), and fibrinogen according to Clauss (Fib, mg/dl, Hoffmann-La Roche Ltd, Basel, Switzerland). Further, albumin (Alb, g/l), alanine transaminase (ALAT, U/L), bilirubin (Bili, mg/dl), creatinine (Crea, mg/dl), gamma-glutamyl transpeptidase (γ-GT, U/L), serum iron (SI, µg/dl), lactate dehydrogenase (LDH, U/L), and total protein (TP, g/l; all reagents by Beckman Coulter, Brea, CA, USA) were analysed as standard laboratory parameters. Variables of the complete blood count, including white blood cell count (WBC, G/l), haemoglobin (Hb, g/dl), platelets (Plt, G/l), and the relative proportions of neutrophils (NeuR, %) and eosinophils (EosR, %), were analysed using a Stromatolyser-4DS (Sysmex, Norderstedt, Germany).

Analysis of non-routinely available cytokines

In a screening phase, the following panel of 13 pro- and anti-inflammatory cytokines was analysed in 36 SIRS patients (including 19 bacteraemic patients): epithelial-derived neutrophil-activating protein (ENA)-78, granulocyte-colony stimulating factor (G-CSF), interleukin (IL)1-Ra, IL1-b, IL-2, IL-4, IL-5, IL-8, IL-10, IL-17a, monocyte chemoattractant protein (MCP)-1, macrophage inflammatory protein (MIP)-1a and MIP-1b. In a second phase, the three markers with the highest predictive capacity (IL-10 (pg/ml), IL-17a (pg/ml) and MIP-1b (macrophage inflammatory protein-1β, pg/ml)) were quantified in all available patients. The human performance kit B (R&D Systems, Thermo Fisher Scientific, Waltham, USA) was used with the Luminex 200™ System (Luminex Corporation, Austin, USA) according to the manufacturer’s specifications.

Machine learning process

Machine learning analyses were performed using R (version 3.3.0, Vienna, Austria51). The caret package was used for model tuning and validation52. Random forest (rf, randomForest package) and neural network models (nn) were used as non-linear models and compared to elastic net regression (en) as a linear model. Prior to model training, numerical data was standardized (Z-score standardization). The rf implementation described by Breiman was used with a maximum of 1,000 trees53. A single-hidden layer feedforward neural network, implemented in the nnet package, was used to establish the nn model54. During the model tuning process, the number of hidden units ranged from 1 to 10, the weight decay was set to 0, 0.1, 1 or 2, the maximum number of weights was set to 380 and the maximum number of iterations was set to 2,000. The following tuning parameters were used for the en model55: α from 0 to 1 (eight equidistant values, 0 = ridge regression, 1 = lasso regression) and λ from 0.1 to 1 (ten equidistant values).
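To make the size of the tuning grids explicit, the parameter combinations described above can be enumerated as in the following minimal sketch. It is written in Python for illustration only and does not reproduce the caret calls used in the study; the dictionary keys, helper names and the number of input variables in the usage note are assumptions.

```python
from itertools import product


def equidistant(start, stop, n):
    """Return n evenly spaced values from start to stop (both inclusive)."""
    step = (stop - start) / (n - 1)
    return [round(start + i * step, 6) for i in range(n)]


# Neural network: 1 to 10 hidden units x weight decay in {0, 0.1, 1, 2}
nn_grid = [{"size": size, "decay": decay}
           for size, decay in product(range(1, 11), [0, 0.1, 1, 2])]

# Elastic net: eight alpha values in [0, 1] x ten lambda values in [0.1, 1]
en_grid = [{"alpha": alpha, "lambda": lam}
           for alpha, lam in product(equidistant(0, 1, 8),
                                     equidistant(0.1, 1, 10))]


def weight_count(n_inputs, n_hidden, n_outputs=1):
    """Weights (including biases) of a single-hidden-layer feedforward network
    without skip-layer connections."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


print(len(nn_grid), len(en_grid))
```

As a usage example, `weight_count(30, 10)` gives the number of weights for a hypothetical model with 30 input variables and 10 hidden units, which can be compared against the stated cap of 380 weights.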

Prior to the machine learning process, group differences between patients with or without bacteraemia were compared by using Fisher’s exact test or the Mann-Whitney U-test. Further, Spearman’s rank correlation coefficient (rs) was used to analyse the degree of correlation between variables. Statistical significance was defined as a p-value of less than 0.05 (two-tailed). Alpha error accumulation related to multiple testing was corrected by applying the Bonferroni-Holm correction.
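The Bonferroni-Holm procedure can be sketched as follows: p-values are sorted in ascending order, the i-th smallest (counting from zero) is multiplied by (m − i) for m tests, and monotonicity is enforced so that adjusted values never decrease along the sorted order. The function name and the raw p-values below are hypothetical and serve only as an illustration.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from four group comparisons
raw = [0.001, 0.04, 0.03, 0.20]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

Compared with the plain Bonferroni correction, the step-down scheme is never less powerful while still controlling the family-wise error rate.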

The predictive capacity of individual variables was examined by comparing the area under the receiver operating characteristic curve (ROC-AUC). Missing data patterns were graphically assessed using the missing aggregation plot (VIM package). Multiple imputation (MI) was used for missing data imputation, using the mice package56. For imputation of numerical data, a predictive mean matching algorithm was applied, and ordinal or nominal data was imputed using logistic regression. Fifty completed data sets were generated.
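The idea behind predictive mean matching can be illustrated with a deliberately simplified sketch. It uses a single predictor and an ordinary least squares fit, and it omits the random draw of regression parameters that a full multiple imputation routine performs; it is therefore not the algorithm of the mice package, and all names, the donor pool size k and the data are assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(xs, ys, k=3, seed=0):
    """Fill None entries of ys with observed values of similar predicted mean."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    # Each donor pairs its predicted value with its actually observed value
    donors = [(a + b * x, y) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)
            continue
        target = a + b * x
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        completed.append(rng.choice(nearest)[1])  # borrow an observed value
    return completed


# Hypothetical data: two missing values in y
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, None, 4.2, None, 6.1]

# Fifty completed data sets, differing only in the imputed entries
completed_sets = [pmm_impute(x, y, seed=m) for m in range(50)]
```

Because imputed entries are always borrowed from observed values, the completed data stay within the plausible range of the variable, and the variation between completed sets reflects the uncertainty about the missing entries.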

Models were tuned using the training sets with a ten-fold cross validation (CV) scheme, repeated ten times. Among competing models, the model with the highest ROC-AUC was chosen. Prior to model training, study patients were randomly allocated to the training or test cohort using an 80:20 ratio (repeated ten times). For this split, bacteraemia status was used as a stratification criterion. Model prediction results of each patient were averaged over all imputed data sets in test set validation. This process was repeated ten times, resulting in different training sets and test sets for each repeat. The resulting ROC-AUCs were averaged over these ten repeats and the 95% confidence intervals (95% CI) of the ten repeats were calculated as follows: \(\pm 1.96\sqrt{\overline{\mathrm{variance}_{\mathrm{within}}}+\mathrm{variance}_{\mathrm{between}}}\)
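The pooling of the confidence interval over repeats can be sketched as follows. The function is a minimal illustration of the formula above; the use of the sample variance (n − 1 denominator) for the between-repeat component and the numerical inputs are assumptions, not values from the study.

```python
from math import sqrt
from statistics import mean, variance


def pooled_ci(aucs, within_variances, z=1.96):
    """Return (lower, centre, upper) for the mean ROC-AUC over repeats.

    Half-width = z * sqrt(mean within-repeat variance + between-repeat variance).
    """
    centre = mean(aucs)
    between = variance(aucs)  # sample variance across repeats (assumption)
    half_width = z * sqrt(mean(within_variances) + between)
    return centre - half_width, centre, centre + half_width


# Hypothetical per-repeat ROC-AUCs and their within-repeat variances
aucs = [0.71, 0.75, 0.73, 0.76, 0.72, 0.74, 0.70, 0.77, 0.73, 0.75]
within = [0.004] * 10
lower, centre, upper = pooled_ci(aucs, within)
print(f"{centre:.3f} ({lower:.3f} to {upper:.3f})")
```

Adding the between-repeat variance widens the interval relative to a single split, so that the variability introduced by the random train/test allocation is reflected in the reported uncertainty.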

Availability of materials and data

Data cannot be made openly available to protect the privacy of participants. Further information about the data and conditions for access to anonymized data can be requested from the corresponding author.