Neither Single nor a Combination of Routine Laboratory Parameters can Discriminate between Gram-positive and Gram-negative Bacteremia

Adequate early empiric antibiotic therapy is pivotal for the outcome of patients with bloodstream infections. In clinical practice the use of surrogate laboratory parameters is frequently proposed to predict underlying bacterial pathogens; however, there is no clear evidence for this assumption. In this study, we investigated the discriminatory capacity of predictive models consisting of routinely available laboratory parameters to predict the presence of Gram-positive or Gram-negative bacteremia. Major machine learning algorithms were screened for their capacity to maximize the area under the receiver operating characteristic curve (ROC-AUC) for discriminating between Gram-positive and Gram-negative cases. Data from 23,765 patients with clinically suspected bacteremia were screened and 1,180 bacteremic patients were included in the study. A relative predominance of Gram-negative bacteremia (54.0%), which was more pronounced in females (59.1%), was observed. The final model achieved a ROC-AUC of 0.675, resulting in 44.57% sensitivity and 79.75% specificity. Various parameters differed significantly between genders. In gender-specific models, the discriminatory potency was slightly improved. The results of this study do not support the use of surrogate laboratory parameters for predicting classes of causative pathogens. In this patient cohort, gender-specific differences in various laboratory parameters were observed, indicating differences in the host response between genders.

Single Variable Evaluation. In total, 49 laboratory parameters were evaluated regarding their individual predictive power to distinguish between patients with Gram-positive and Gram-negative bacteremia. After applying the Bonferroni-Holm method, significant differences were found in the absolute lymphocyte count, in the relative and absolute monocyte counts, magnesium (all: p < 0.001), phosphate (p = 0.001) and C-reactive protein (CRP, p = 0.001). Details are presented in Table 1. In ROC curve analysis, the best predictive parameters were the absolute and relative monocyte counts (absolute count: 0.589 ROC-AUC).
Footnotes on exclusion criteria: 1 absence of routine laboratory data for the respective day (more than 95% data missing), 2 lack of identification of the microorganism at the species level or potentially contaminated blood culture, 3 rarely detected pathogens (less than 0.15%).
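As a minimal sketch of the single-parameter ROC analysis, the ROC-AUC of one marker can be computed as the normalized Mann-Whitney U statistic: the probability that a randomly chosen case from one class has a higher value than a randomly chosen case from the other class, with ties counted as one half. The function name and the toy values are hypothetical and are not study data; the software the authors used for this step is not assumed here.

```python
from typing import Sequence


def roc_auc(values_pos: Sequence[float], values_neg: Sequence[float]) -> float:
    """ROC-AUC of a single marker via pairwise comparison.

    Equals U / (n_pos * n_neg), where U is the Mann-Whitney statistic;
    ties between a positive and a negative case count as 0.5.
    """
    if not values_pos or not values_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in values_pos:
        for n in values_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(values_pos) * len(values_neg))


# Hypothetical marker values (illustration only, not taken from the study).
marker_gram_negative = [0.3, 0.5, 0.4, 0.9]
marker_gram_positive = [0.6, 0.8, 0.5, 1.1]

auc = roc_auc(marker_gram_negative, marker_gram_positive)
# An AUC below 0.5 means the marker tends to be LOWER in the first group;
# reversing the direction of the test gives 1 - AUC.
print(auc, max(auc, 1.0 - auc))
```

A value near 0.5 in either direction, such as the 0.589 reported for the absolute monocyte count, indicates that the two value distributions overlap heavily.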
Multivariable Prediction. The CFS subset evaluator was applied in order to select a relevant parameter set. This feature selection algorithm selected seven parameters, namely gender, absolute lymphocyte count, absolute and relative monocyte count, fibrinogen, creatinine and CRP. For model selection, widely used machine learning algorithms, including artificial neural networks and support vector machines, were successively applied. Table 2 presents the predictive capacities of the machine learning algorithms evaluated and Fig. 2 shows the corresponding ROC-AUC plots. Supplementary Table S2 gives an overview of the parameters included in each model. The best performances were achieved with the decision tree-based random forest algorithm (RF, ROC-AUC: 0.653, CI: 0.622-0.684) and the K-Star algorithm (ROC-AUC: 0.642, CI: 0.610-0.673). Additionally, a wrapper approach, which selects an optimal parameter set for a particular algorithm, was applied. Using this approach, the K-Star algorithm presented the best performance with 0.675 ROC-AUC (CI: 0.645-0.705). This model, consisting of seven parameters, was significantly better (p < 0.001) than the absolute monocyte count (0.589 ROC-AUC), which represented the best predictive individual parameter. The model was poorly calibrated (Hosmer-Lemeshow test: p < 0.001), with an inadequate increase in the predicted risk compared to the observed risk (see Supplementary Figure S1a). When applying the Youden index method for cut-off selection, the K-Star model resulted in 44.57% sensitivity, 79.75% specificity, 62.79% NPV, and 65.23% PPV for predicting Gram-negative bacteremia.
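To put the width of the interval around 0.675 ROC-AUC in context, the Hanley and McNeil approximation of the standard error of a ROC-AUC can be sketched as follows. The class sizes (about 637 Gram-negative and 543 Gram-positive cases) are an assumption derived here from the 1,180 included patients and the 54.0% share of Gram-negative bacteremia; the Wald-type interval is likewise an assumption, since the text does not state how the confidence intervals of the ROC-AUCs were obtained.

```python
import math
from typing import Tuple


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Hanley-McNeil approximation of the standard error of a ROC-AUC.

    n_pos: number of cases in the class being predicted,
    n_neg: number of cases in the other class.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def wald_interval(auc: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Symmetric normal-approximation interval around the AUC."""
    return auc - z * se, auc + z * se


# Assumed class sizes: 54.0% of 1,180 patients, rounded (not stated as counts).
n_gram_negative = round(0.540 * 1180)
n_gram_positive = 1180 - n_gram_negative

se = hanley_mcneil_se(0.675, n_gram_negative, n_gram_positive)
low, high = wald_interval(0.675, se)
print(round(se, 4), round(low, 3), round(high, 3))
```

Under these assumptions the interval comes out close to the reported 0.645-0.705, which is consistent with, but does not establish, a normal-approximation interval having been used.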

Assessment of gender aspect. Since the distribution of Gram-positive and Gram-negative cases differed between male and female patients, it was speculated that laboratory parameters may present gender-specific differences. Apart from mean corpuscular haemoglobin (MCH) and mean corpuscular volume (MCV), for which gender-specific reference values are established, a significant difference between male and female patients was found in four further parameters. Females presented with lower blood urea nitrogen (BUN), potassium and creatinine levels, but higher cholesterol levels compared to male patients with bacteremia (all: p < 0.001, see Supplementary Table S3).
Gender-specific data of Gram-positive and Gram-negative cases are shown in Table 3. After applying the Bonferroni-Holm correction, albumin and absolute monocyte count showed significant differences between Gram-positives and Gram-negatives in female patients. Similarly, the distribution of relative and absolute monocyte count, absolute lymphocyte count, CRP, fibrinogen and magnesium significantly differed between male patients with Gram-positive and Gram-negative bacteremia.
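The Bonferroni-Holm correction applied to these comparisons can be illustrated with a minimal sketch. It computes Holm-adjusted p-values by the step-down rule: the smallest p-value is multiplied by the number of tests m, the next by m - 1, and so on, with monotonicity enforced. The function name and the four p-values are hypothetical and do not come from the study.

```python
from typing import List, Sequence, Tuple


def holm_bonferroni(
    p_values: Sequence[float], alpha: float = 0.05
) -> Tuple[List[float], List[bool]]:
    """Return Holm-adjusted p-values and rejection decisions.

    Results are given in the same order as the input p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Step-down factor: m for the smallest p-value, then m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [a <= alpha for a in adjusted]
    return adjusted, reject


# Hypothetical raw p-values for four parameters (illustration only).
raw = [0.0005, 0.04, 0.012, 0.2]
adjusted, reject = holm_bonferroni(raw)
for p, a, r in zip(raw, adjusted, reject):
    print(p, round(a, 4), r)
```

In this toy example the raw p-value of 0.04 would pass an uncorrected 0.05 threshold but is no longer significant after correction, which is the effect the method is meant to have when many parameters are tested.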
Since it was evident that patients' gender had considerable influence on various parameters, gender-specific models were trained. When restricted to female patients, the K-Star algorithm performed best both in the parameter subset selected by the CFS subset evaluator (0.644 ROC-AUC, CI: 0.595-0.693) and in the subset selected by the wrapper approach (ROC-AUC: 0.716, CI: 0.670-0.761). Figure 3 presents ROC-AUCs of both parameter subsets. When using the Youden index, the K-Star wrapper model reached 65.50% sensitivity, 64.71% specificity, 56.22% PPV and 73.05% NPV for predicting Gram-negatives (see Table 4). The model was not adequately calibrated to the given data (p < 0.001), with an underestimation of the risk in low-risk patients and an overestimation in high-risk patients (see Supplementary Figure S1b).
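The cut-off selection by the Youden index, and the derived sensitivity, specificity, PPV and NPV, can be sketched as follows. The Youden index is J = sensitivity + specificity - 1, and the cut-off with the largest J is chosen. The function names, the convention that a score at or above the cut-off counts as a positive prediction, and the toy scores are assumptions for illustration and are not taken from the study.

```python
from typing import Dict, Sequence, Tuple


def _ratio(num: int, den: int) -> float:
    """Safe division; NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion(scores: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN; a score >= cutoff is a positive prediction."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        if score >= cutoff:
            if label == 1:
                tp += 1
            else:
                fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_cutoff(scores: Sequence[float],
                  labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    If several cut-offs share the maximum, the lowest one is kept.
    """
    if 0 not in labels or 1 not in labels:
        raise ValueError("both classes must be present")
    best_j, best_cut = float("-inf"), float("nan")
    for cut in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, cut)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut, best_j


def binary_metrics(scores: Sequence[float], labels: Sequence[int],
                   cutoff: float) -> Dict[str, float]:
    tp, fp, tn, fn = confusion(scores, labels, cutoff)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical predicted probabilities; label 1 = Gram-negative (toy data).
scores = [0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff, j = youden_cutoff(scores, labels)
print(cutoff, j, binary_metrics(scores, labels, cutoff))
```

Because the Youden index weights sensitivity and specificity equally, the selected cut-off does not depend on class prevalence, whereas PPV and NPV do.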
In male patients, the RF classifier and the K-Star algorithm presented the best ROC-AUCs (RF: 0.657, CI: 0.616-0.700; K-Star: 0.633, CI: 0.592-0.674) in the CFS-selected subset. Using the wrapper approach for feature selection, the K-Star classifier achieved the best ROC-AUC with 0.699 (CI: 0.660-0.738, see Fig. 4). The model derived from male patients yielded 69.39% sensitivity, 64.37% specificity, 65.75% PPV and 68.09% NPV for the prediction of Gram-negative cases. In contrast to the other models, the calibration curve indicated a better model calibration (see Supplementary Figure S1c). However, especially in the high-risk range, a significant deviation between the predicted and the observed risk was seen, which was also indicated by the Hosmer-Lemeshow test (p < 0.001).
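The Hosmer-Lemeshow test used to judge calibration can be sketched as follows. Cases are sorted by predicted risk and split into groups; within each group the observed number of events is compared with the sum of predicted risks. The grouping into ten equally sized groups, the g - 2 degrees of freedom and the function names are assumptions of this sketch, as the text does not give these details. To stay dependency-free, the chi-square tail probability uses the closed form that is valid only for even degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper tail of the chi-square distribution, closed form for EVEN df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> Tuple[float, float]:
    """Return (statistic, p-value) with groups - 2 degrees of freedom."""
    n = len(probs)
    if n < groups:
        raise ValueError("need at least one case per group")
    # Stable sort by predicted risk only, so ties keep their input order.
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        # n_g * mean_risk * (1 - mean_risk), the binomial variance term.
        variance = expected * (1.0 - expected / size)
        if variance <= 0.0:
            raise ValueError("group with mean predicted risk of 0 or 1")
        statistic += (observed - expected) ** 2 / variance
    return statistic, chi2_sf_even_df(statistic, groups - 2)


# Hypothetical, perfectly calibrated toy data: 10 risk levels, 20 cases each.
toy_probs, toy_outcomes = [], []
for g in range(10):
    events = 1 + 2 * g
    toy_probs += [0.05 + 0.1 * g] * 20
    toy_outcomes += [1] * events + [0] * (20 - events)

print(hosmer_lemeshow(toy_probs, toy_outcomes))
```

A small p-value, as reported for the models here, indicates that predicted and observed risks disagree in at least some risk groups; the test does not say where, which is why calibration plots are shown in addition.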

Discussion
Improvement of survival in patients with severe bloodstream infections largely depends on the appropriate choice of early empiric antimicrobial therapy. The aim of the present study was to investigate the predictive capacity of highly standardizable parameters to discriminate between Gram-positive and Gram-negative bacteremia.
Among the parameters tested, six variables showed a significant difference between patients with Gram-positive and patients with Gram-negative bacteremia. The absolute monocyte count revealed the highest discriminatory ability with 0.589 ROC-AUC (CI: 0.557-0.622). In accordance with the literature, CRP was higher in Gram-negative than in Gram-positive infections (p = 0.001), while the white blood cell count (WBC) did not differ significantly between Gram-positive and Gram-negative pathogens [12]. Using the K-Star algorithm, a model with 0.675 ROC-AUC (CI: 0.645-0.705) was established, resulting in 44.6% sensitivity and 79.8% specificity for detecting Gram-negative bacteremia. Although the model was significantly better than the best single discriminatory parameter (p < 0.001), its calibration, that is, the agreement between predicted and observed risk, was poor. Based on these results, laboratory markers cannot be reliably used to predict classes of bacterial pathogens and this clinical practice should therefore be discouraged. This finding is of particular importance, since the selection of tailored antimicrobial regimens based on unreliable prediction models may lead to a higher risk of treatment failure and ultimately to higher mortality.
Scientific Reports | 5:16008 | DOI: 10.1038/srep16008
Interestingly, a gender-specific aspect in the susceptibility to Gram-negative pathogens as well as in the response to bacteremia in general was noted. As described in the literature, more male than female patients suffered from bacteremia [14,15]. However, a significantly lower rate of Gram-positive bacteremia and a comparatively higher rate of Gram-negative bacteremia was found in female than in male patients (p = 0.003). At this stage it is only possible to speculate as to the reasons for this higher susceptibility of female patients to Gram-negative pathogens.
Several potential causes have been postulated, including the influence of hormones, X-chromosomal gene polymorphisms, or expression of cytokines [16-20]. Such a gender-specific difference in the cytokine expression was found in several studies, with a more steady expression of pro-inflammatory cytokines in male patients [18]. Furthermore, patients' gender has been shown to impact on the outcome of patients with severe infections. In several publications, female patients presented a higher mortality rate [21-24]. In a prospective study conducted at ICUs, the overall mortality was balanced between both genders, but in subgroup analysis restricted to septic patients, females had a significantly higher mortality rate than males (23.1% vs. 13.7%, p = 0.036) [22].
Moreover, a significant difference between female and male patients was seen in various laboratory parameters without gender-specific reference ranges (see Supplementary Table S3). However, none of these parameters presented predictive capacities for differentiating Gram-positives and Gram-negatives (Tables 2 and 3). When using data stratified by gender, the resulting K-Star models were slightly superior to the model using both genders. However, the predictive capacity of these models as well as their calibration was poor.
Due to the retrospective design of this study, several limitations must be considered. Firstly, clinical data might have improved the predictive capacities of established models. However, clinical data are difficult to standardize; models incorporating clinical data are therefore difficult to apply in everyday use, and their external validity in other health care institutions and settings is questionable. Furthermore, the analysis was restricted to laboratory parameters routinely available during the study period. Non-routinely available parameters, such as lipopolysaccharide binding protein (LBP) and the CD14-ST isoform, have been shown to be associated with Gram-negative sepsis, and might therefore have improved predictive models [11,25]. The gender-specific differences in the immunological response of bacteremic patients were unexpected findings in our cohort; however, their effect on the resulting model was limited.
In summary, the results of this study do not support the assumption that surrogate parameters are potential predictors for the classification of causative bacterial pathogens. The usefulness of models consisting of routinely available laboratory parameters to discriminate between Gram-positive and Gram-negative bacteremia was limited. In this study cohort, gender-specific differences in various laboratory parameters were observed, indicating differences in the host response to bacterial blood stream infection.

Methods
Study design and data collection. This retrospective cohort study included patients with suspected blood stream infection treated between January 2006 and December 2010 at Vienna General Hospital. As previously described [26], all patients for whom the treating physician requested blood culture analysis and a certain panel of laboratory parameters on the same day were screened for eligibility. Those patients with a positive blood culture revealing a bacterial pathogen were included in the study. Exclusion criteria were age less than 18 years, absence of routine laboratory data for the respective day (more than 95% data missing) or negative results from the blood culture analysis. Patients for whom the microorganism could not be identified at the species level or with potentially contaminated blood culture results were excluded from the study. Contaminants were pre-defined according to Hall [...]. Patients with very rarely detected pathogens (less than 0.15%) were also excluded from analysis. In total, 49 laboratory parameters, as well as the patient's age and gender, were statistically analyzed. [...] [29,30]. Each parameter is characterized as median and interquartile range.
Table 3. Gender-specific data of parameters with predictive capacities for discriminating Gram-positives and Gram-negatives. 1 Mann-Whitney U test, 2 area under the receiver operating characteristic curve, ns = not significant. CRP = C-reactive protein, MG = magnesium.
Table 4. Predictive capacities of K-Star models. 1 Area under the receiver operating characteristic curve, 2 for prediction of Gram-negative bacteremia; bootstrapped confidence intervals are given in brackets.
For the establishment of multivariable models, various major classes of machine learning algorithms were employed, including artificial neural networks, support vector machines and Bayes' theorem-based algorithms. As a reference algorithm, logistic regression was applied. In brief, (1) the Naïve Bayes classifier (NB) is a simple probabilistic algorithm that assumes that all input parameters are independent from each other and directly applies Bayes' theorem for classification. (2) The artificial neural network algorithm (ANN) approximates nonlinear functions by superpositions of simple nonlinear basic functions (here, the sigmoid function was used), optimized by a gradient-based algorithm. (3) A support vector machine (SVM) uses a kernel function to transform the original features into a high-dimensional feature space and finds the linear discrimination with the largest margin (large-margin classifier). In this study, the SMO-SVM algorithm was applied, using normalized attributes and a polynomial kernel. (4) The K-Star algorithm is an instance-based classifier using an entropy distance function. (5) The random forest algorithm (RF) is an ensemble algorithm using decision trees with a bagging approach [33-38]. All algorithms were used with WEKA standard settings.
Results were taken from an internal ten-fold cross validation. These were used for ROC-AUC analysis and estimation of the model's calibration. For this purpose, Hosmer-Lemeshow tests as well as calibration plots were applied [39]. ROC-AUCs were compared by applying the DeLong test as well as the Hanley and McNeil test [40-42]. Cut-off points were set using the Youden index method. Confidence intervals (CI) of binary outcome measures, including sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV), were bootstrapped in 2,000 iterations. Statistical significance was defined as p-values less than 0.05. Where appropriate, errors related to multiple testing were corrected by applying the Bonferroni-Holm method.
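The bootstrapping of confidence intervals for binary outcome measures can be illustrated with a minimal sketch for sensitivity. It resamples the cases with replacement 2,000 times and takes the 2.5th and 97.5th percentiles of the resampled values. The percentile method, the fixed seed, the function names and the toy predictions are assumptions of this sketch; the text states only the number of iterations.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[int, int]  # (predicted class, true class), 1 = Gram-negative


def sensitivity(pairs: Sequence[Pair]) -> float:
    """Share of truly positive cases that were predicted positive."""
    tp = sum(1 for pred, truth in pairs if truth == 1 and pred == 1)
    fn = sum(1 for pred, truth in pairs if truth == 1 and pred == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def bootstrap_ci(pairs: Sequence[Pair],
                 metric: Callable[[Sequence[Pair]], float],
                 iterations: int = 2000, level: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for a metric of (pred, truth) pairs."""
    rng = random.Random(seed)
    n = len(pairs)
    values: List[float] = []
    for _ in range(iterations):
        # Draw n cases with replacement from the original sample.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        value = metric(sample)
        if value == value:  # drop NaN (resample without positive cases)
            values.append(value)
    if not values:
        raise ValueError("metric undefined in every resample")
    values.sort()
    low_index = int((1.0 - level) / 2.0 * len(values))
    high_index = max(low_index, int((1.0 + level) / 2.0 * len(values)) - 1)
    return values[low_index], values[high_index]


# Hypothetical predictions (toy data): 20 positives, 12 of them detected,
# and 20 negatives, 4 of them falsely predicted positive.
toy_pairs: List[Pair] = (
    [(1, 1)] * 12 + [(0, 1)] * 8 + [(1, 0)] * 4 + [(0, 0)] * 16
)
print(sensitivity(toy_pairs), bootstrap_ci(toy_pairs, sensitivity))
```

The same `bootstrap_ci` helper can be passed a specificity, PPV or NPV function; only the metric changes, not the resampling.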