The association between four scoring systems and 30-day mortality among intensive care patients with sepsis: a cohort study

Several commonly used scoring systems (SOFA, SAPS II, LODS, and SIRS) still lack large-sample data confirming their predictive value for 30-day mortality from sepsis, and their clinical net benefit in predicting mortality remains inconclusive. The baseline data, LODS score, SAPS II score, SIRS score, SOFA score, and 30-day prognosis of patients who met the diagnostic criteria of sepsis were retrieved from the Medical Information Mart for Intensive Care III (MIMIC-III) intensive care unit (ICU) database. Receiver operating characteristic (ROC) curves were plotted, and the areas under the ROC curves (AUC) were compared. Decision curve analysis (DCA) was performed to determine the net benefit of the four scoring systems in predicting 30-day mortality of sepsis. For all cases in the cohort study, the AUC of LODS, SAPS II, SIRS, and SOFA were 0.733, 0.787, 0.597, and 0.688, respectively. The differences between the scoring systems were statistically significant (all P-values < 0.0001), and stratified analyses (the elderly and non-elderly) also showed the superiority of SAPS II among the four systems. According to the DCA, the net benefit ranges in descending order were SAPS II, LODS, SOFA, and SIRS. In the stratified analyses of the elderly and non-elderly groups, SAPS II also had the greatest net benefit. Among the four commonly used scoring systems, the SAPS II score has the highest predictive value for 30-day mortality from sepsis, which is better than LODS, SIRS, and SOFA. The results of the DCA curves show that using the SAPS II score to predict the 30-day mortality of intensive care patients with sepsis to guide clinical applications may obtain the highest net benefit.

www.nature.com/scientificreports/
discriminant validity and convergent validity 1 . Relative simplicity is the advantage of the SIRS criteria, but their net benefit in predicting mortality remains inconclusive. This study intends to explore the association between the four scoring systems (SOFA, SAPS II, LODS, and SIRS) and 30-day mortality of sepsis based on the MIMIC-III (Medical Information Mart for Intensive Care III) ICU database, to determine which scoring system could better predict 30-day mortality of sepsis and septic shock from the beginning of ICU admission. Considering that elderly patients with sepsis often present with atypical, nonspecific symptoms and face greater mortality risks because of delayed diagnosis 8,9 , we will conduct a stratified analysis of elderly and non-elderly patients to determine whether age affects the efficacy of the scoring systems. In particular, we will use decision curve analysis (DCA), a suitable method for evaluating alternative diagnostic and prognostic strategies 10 , to assess the net benefit of each scoring system in predicting 30-day mortality of sepsis.

Methods
Database. MIMIC-III is a large, freely-available database comprised of over forty thousand patients admitted to the Beth Israel Deaconess Medical Center (BIDMC) between 2001 and 2012 11 . Any researcher who complies with the data use requirements is permitted to use the database. After passing the "Protecting Human Research Participants" exam on the website of the National Institutes of Health (NIH), an author (Tianyang Hu) was approved to extract data from this database (Record ID: 37474354). All patient-related information in the MIMIC-III database is anonymous and no informed consent is required.
Study population. Following the method of Johnson et al. 12 , we screened the MIMIC-III ICU database for patients admitted from 2008 to 2012 who met the Sepsis-3 criteria (admissions in this period were easily identifiable in the database). The core criterion for sepsis was suspected infection with associated organ dysfunction (SOFA greater than or equal to 2). All patients were required to have at least 24 h of ICU data. In total, 5784 patients met the criteria, which was consistent with the results of Johnson et al. 12 . We also conducted a stratified analysis of elderly (more than 65 years old) and non-elderly patients.
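The inclusion rules in this paragraph can be sketched as a simple filter. This is only an illustration: the `IcuStay` record and all of its field names are hypothetical and are not MIMIC-III column names, and the actual extraction followed the queries of Johnson et al. 12 , which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class IcuStay:
    # Hypothetical record; field names are illustrative, not MIMIC-III columns.
    suspected_infection: bool
    sofa: int          # SOFA score
    icu_hours: float   # hours of available ICU data
    age: float         # age in years


def meets_inclusion_criteria(stay: IcuStay) -> bool:
    """Suspected infection with SOFA >= 2 and at least 24 h of ICU data."""
    return stay.suspected_infection and stay.sofa >= 2 and stay.icu_hours >= 24


def age_stratum(stay: IcuStay) -> str:
    """The text defines elderly as more than 65 years old."""
    return "elderly" if stay.age > 65 else "non-elderly"


def select_cohort(stays):
    """Keep eligible stays and group them by age stratum."""
    cohort = {"elderly": [], "non-elderly": []}
    for stay in stays:
        if meets_inclusion_criteria(stay):
            cohort[age_stratum(stay)].append(stay)
    return cohort
```

A patient aged exactly 65 falls in the non-elderly stratum under this reading of "more than 65 years old".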
Data extraction. Data
Statistical analysis. Continuous variables were assessed for normality using the Kolmogorov-Smirnov test. Continuous variables with a normal distribution were expressed as mean ± standard deviation (M ± SD) and compared using the independent-samples t test; otherwise, they were expressed as the median with interquartile range (IQR) and compared using the Wilcoxon rank-sum test. Categorical variables were expressed as numbers and percentages and compared using the Chi-square test. Multivariate binomial logistic regression analyses of the four scoring systems for 30-day mortality among intensive care patients with sepsis were conducted to adjust the results for potential confounding factors. Variables with a P-value of < 0.1 in univariate analysis were included in multivariate analysis. A Z test was used to compare the predictive value of the scoring systems by comparing the areas under the receiver operating characteristic (ROC) curves (AUC); the larger the AUC, the better the predictive performance. All analyses were conducted using SPSS software (v26.0; IBM, Armonk, NY), MedCalc Statistical Software (v19.6.1; MedCalc Software Ltd, Ostend, Belgium), and R software (version 4.0.3, CRAN). The Z test was performed with MedCalc Statistical Software following the method of DeLong et al. 13 ; the DCA was performed with R software, mainly using the "rmda" package. A P-value < 0.05 was considered statistically significant.
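To make the AUC comparison concrete, the following is a minimal pure-Python sketch of the DeLong approach for two scores measured on the same patients. It is not MedCalc's implementation, whose internals are not described here; the function names are hypothetical, labels are assumed to be 1 for death and 0 for survival, and the quadratic-time loops are meant for illustration on small samples only.

```python
import math


def _psi(x, y):
    # Kernel of the Mann-Whitney statistic: 1 if x > y, 0.5 on ties, else 0.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(labels, scores):
    # Structural components: one value per positive (v10) and per negative (v01).
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def auc(labels, scores):
    """Empirical AUC, i.e. P(score of a death > score of a survivor)."""
    v10, _ = _components(labels, scores)
    return sum(v10) / len(v10)


def _cov(a, b):
    # Sample covariance (n - 1 denominator).
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_z(labels, scores_a, scores_b):
    """Z statistic and two-sided P-value for AUC(a) - AUC(b).

    Needs at least two positives and two negatives, and a non-zero variance.
    """
    v10a, v01a = _components(labels, scores_a)
    v10b, v01b = _components(labels, scores_b)
    m, n = len(v10a), len(v01a)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    z = (sum(v10a) / m - sum(v10b) / m) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p
```

Because both scores are computed on the same patients, the covariance terms matter: ignoring them would treat the two AUCs as independent and misstate the Z value.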

Results
Baseline characteristics. A total of 5784 sepsis patients (elderly, n = 3138; non-elderly, n = 2646) were included in our study, of whom 1042 died and 4742 survived within 30 days. The age of the death group (71.66 ± 15.61) was higher than that of the survival group (64.17 ± 17.77), and the difference was statistically significant (P < 0.001). In addition, in the death group, the incidence of septic shock and of coexisting comorbidities (chronic pulmonary disease and renal failure) was higher (P < 0.001, P = 0.004, and P = 0.006, respectively), and the scores of the four scoring systems (SOFA, SAPS II, LODS, and SIRS) were higher (all P < 0.001). Gender and the other coexisting comorbidities (coronary atherosclerotic heart disease, diabetes, and hypertension) showed no significant difference between the two groups. The baseline data are summarized in Table 1.
Comparison of ROC curves. ROC curves were plotted to evaluate the predictive value of the four scoring systems for 30-day mortality for all cases in the cohort study (Fig. 1). [...] each scoring system was selected as the optimal diagnostic cut-off value for predicting the 30-day mortality. The SIRS criteria had the highest sensitivity (81.2%), while the corresponding Youden's index was the lowest (0.137); the SAPS II score had the highest specificity (79.8%), and its Youden's index was also the highest, with a corresponding sensitivity of 62.8%. The remaining results are summarized in Table 2.
For the elderly (Fig. 2), the AUC of LODS, SAPS II, SIRS, and SOFA were 0.715, 0.754, 0.619, and 0.665, respectively. The results of the AUC comparisons were as follows: LODS versus SAPS II (Z = 5.122, P < 0.0001), LODS versus SIRS (Z = 7.075, P < 0.0001), LODS versus SOFA (Z = 5.796, P < 0.0001), SAPS II versus SIRS (Z = 10.127, P < 0.0001), SAPS II versus SOFA (Z = 9.417, P < 0.0001), SIRS versus SOFA (Z = 3.280, P = 0.0010). Similarly, the SIRS criteria had the highest sensitivity (80.4%) but the smallest Youden's index (0.173); the SAPS II score had the highest specificity (72.1%) and Youden's index (0.406). The results are summarized in Table 3.
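Youden's index is sensitivity + specificity − 1, so the all-cases SAPS II values reported above (sensitivity 62.8%, specificity 79.8%) correspond to an index of about 0.426; this figure is derived here from the rounded percentages, not quoted from the tables. The sketch below shows the calculation and one common way of choosing a cut-off by maximizing the index. The function names are hypothetical, the rule "score > cut-off predicts death" and the use of midpoints between observed scores as candidate cut-offs are assumptions, and the software used in the study may differ in detail.

```python
def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1 (both given as fractions)."""
    return sensitivity + specificity - 1.0


def best_cutoff(labels, scores):
    """Return (cutoff, sensitivity, specificity, J) maximizing Youden's index.

    labels: 1 = died within 30 days, 0 = survived.
    Candidate cut-offs are midpoints between consecutive distinct scores;
    a score strictly above the cut-off counts as a predicted death.
    """
    values = sorted(set(scores))
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in candidates:
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s > c)
        tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s <= c)
        sens, spec = tp / n_pos, tn / n_neg
        j = youden_index(sens, spec)
        if best is None or j > best[3]:  # first maximum wins on ties
            best = (c, sens, spec, j)
    return best
```

The same identity explains why the SIRS criteria can combine the highest sensitivity with the lowest index: a sensitivity of 81.2% and an index of 0.137 imply a specificity of roughly 32.5%.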
Comparison of decision curves. According to the DCA, the net benefit ranges in descending order were SAPS II, LODS, SOFA, and SIRS, which means SAPS II was optimal among the four scoring systems (Fig. 4). For stratified analyses of the elderly or non-elderly groups, the results also showed that SAPS II had the most net benefit (Figs. 5 and 6).

Discussion
This study followed the latest definition of sepsis (Sepsis-3) and selected four commonly used scoring systems to conduct a large-sample retrospective cohort study. In selecting patients, we strictly followed the standards of Johnson et al. 11,12 , because they are in charge of the MIMIC database and some of them also work for the Beth Israel Deaconess Medical Center. In this way, the results we summarized could be more credible. By drawing ROC curves and comparing AUC, we found that the four systems ranked by AUC from largest to smallest were SAPS II, LODS, SOFA, and SIRS, indicating that SAPS II has the best predictive value (SAPS II > 46.5 can predict the risk of 30-day mortality in intensive care patients with sepsis), followed by LODS, and that the predictive value of SOFA and SIRS is relatively low. This ranking is almost consistent with the complexity ranking of the four scoring systems; SAPS II, for example, is calculated from the worst values of 12 routine physiological measurements 5 . However, these items are easily available in the ICU, so the complexity of SAPS II may not affect its clinical application, even if clinicians are more inclined to use a concise and easily accessible scoring system to predict the risk of death. As a diagnostic criterion for sepsis, SOFA has been shown to be effective in assessing the prognosis of patients with sepsis in large retrospective studies 3,14 , but the results of our study show that SOFA has no advantage in predicting the mortality of intensive care patients. Therefore, it is necessary to adopt multiple scoring systems in the ICU management of sepsis. A previous study (n = 7932) showed that the predictive validity of SOFA for in-hospital mortality was not significantly different from that of the more complex LODS among ICU encounters with suspected infection, supporting the use of SOFA in clinical criteria for sepsis 15 .
It is worth noting that the SIRS criteria, SOFA, and LODS scores in that study were calculated for the time window from 48 h before to 24 h after the onset of infection, while the scores in our study were all derived from the first 24 h of admission. Although the starting times for follow-up are not the same, our results for LODS in predicting the 30-day mortality of non-elderly patients are consistent with that study; in predicting the mortality of all sepsis patients, however, LODS has a slight advantage over SOFA. This finding seems meaningful. More than 60% of sepsis diagnoses are made in the elderly 16 , and it is therefore valuable to predict the 30-day mortality of the elderly, especially in the ICU. For example, elderly patients not expected to survive sepsis may consider palliative care services to relieve pain and make death more peaceful. However, compared with SAPS II, the predictive effectiveness of LODS and SOFA is much inferior. Our logistic regression analysis also found that LODS and SOFA were not correlated with 30-day mortality in intensive care patients with sepsis, but SAPS II was an independent risk factor. Therefore, SAPS II is undoubtedly a better scoring system choice for predicting the 30-day mortality of elderly or non-elderly patients with sepsis in the ICU.
Although the SIRS criteria showed high sensitivity in this study, their specificity and Youden's index were very low. The criteria were once used for the diagnosis of sepsis under the Sepsis 1.0 definition 1 , but SIRS is not induced only by infection: trauma, severe acute pancreatitis, and shock can all lead to SIRS, which may be the root cause of the lack of specificity of the SIRS criteria. Previous studies have shown that SOFA, and even qSOFA (quick SOFA, which uses three criteria, assigning one point each for SBP ≤ 100 mmHg, respiratory rate ≥ 22 breaths per min, and Glasgow coma scale < 15), are superior to the SIRS criteria in determining the ICU stay and mortality of patients 17,18 . The SIRS criteria are therefore not appropriate for the diagnosis or the prognosis prediction of sepsis.
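The qSOFA rule quoted above is simple enough to write out directly. The function and parameter names below are hypothetical and serve only to illustrate the three thresholds given in the text; qSOFA was not one of the four systems evaluated in this study.

```python
def qsofa(sbp_mmhg, resp_rate_per_min, gcs):
    """Quick SOFA: one point for each criterion met (total 0 to 3)."""
    points = 0
    if sbp_mmhg <= 100:          # systolic blood pressure <= 100 mmHg
        points += 1
    if resp_rate_per_min >= 22:  # respiratory rate >= 22 breaths per min
        points += 1
    if gcs < 15:                 # Glasgow coma scale below 15
        points += 1
    return points
```

For instance, a patient with SBP 95 mmHg, a respiratory rate of 24 breaths per min, and a Glasgow coma scale of 14 meets all three criteria.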
In the stratified analysis, the sensitivity and specificity of the scoring systems differed markedly between the elderly and non-elderly, especially for SIRS and SOFA. The SIRS criteria include only four items (temperature, heart rate, respiration, and white blood cells), which can easily lead to deviations in results. Moreover, previous studies have shown that SIRS is a prevalent feature of sepsis and should be an important component of the diagnostic process 19 , not of the prognostic process. Therefore, the difference in sensitivity and specificity between the strata also reflects that SIRS is not suitable for predicting 30-day mortality of sepsis. As for SOFA, previous studies suggest that regular and repeated SOFA scoring gives a better understanding of the condition and disease development of the patients 20 . SOFA scoring performed only on the first day of admission may therefore not be sufficient to clarify the real status of the patients or predict the 30-day mortality, and may also contribute to the difference in sensitivity and specificity in the stratified analysis. In addition, the sample sizes of elderly and non-elderly patients and the heterogeneity of the patients themselves are also potential factors leading to differences. We noticed that the sensitivity and specificity of LODS and SAPS II did not differ much in the stratified analysis, probably because LODS provides an objective tool for assessing severity levels for organ dysfunction in the ICU 6 , while SAPS II provides an estimate of the risk of death without having to specify a primary diagnosis 4 . SAPS II always maintained the highest Youden's index, which further confirms its universality and robustness.
Traditional metrics of diagnostic performance such as sensitivity, specificity, and AUC only measure the diagnostic accuracy of one prediction model against another, but fail to account for the clinical utility. A model with a higher AUC is likely to be more valuable than one with a lower AUC, but models with higher AUCs can sometimes lead to inferior outcomes 10 . DCA is a widely used method to measure the clinical utility of a specific model 21 , and can therefore inform the decision of whether to use a model at all or which of several models is optimal. DCA is graphically expressed as a curve with benefit score on the vertical axis and probability thresholds on the horizontal axis. A key concept of DCA is the "threshold probability", at which the expected benefit of treatment is equal to the expected benefit of avoiding treatment. The so-called "net benefit" is determined by calculating the difference between the expected benefit and the expected harm. One line is drawn to show what happens when no treatment is ever given (no net benefit, shown as "the horizontal line" with an ordinate of 0 in Fig. 4), and another curve is drawn as if all patients receive treatment irrespective of predicted results (shown as "the diagonal line" in Fig. 4). For any given probability threshold, the curve with the highest benefit score at that threshold is the best choice 10 . If one curve is highest over the full range of probability thresholds, then the associated diagnostic approach would be the best decision for all patients 21 . In our study, whether for all included patients or in the stratified analysis (by age), the DCA curve of the SAPS II scoring system is the highest within the entire probability threshold range, indicating that using the SAPS II scoring system to judge the 30-day mortality of patients with sepsis, and then deciding whether to conduct active intervention, will yield the greatest benefits.
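The net benefit plotted in a decision curve is commonly computed as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability and TP and FP are the numbers of true and false positives among n patients when everyone with a predicted risk of at least pt is treated. The sketch below implements that standard formula in plain Python; it does not reproduce the internals of the "rmda" package, the function names are hypothetical, and the inputs are assumed to be predicted probabilities of death (for example, from a logistic model fitted on a score).

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    labels: 1 = died within 30 days, 0 = survived.
    probs: predicted probabilities of death; threshold must lie in (0, 1).
    """
    n = len(labels)
    tp = sum(1 for l, p in zip(labels, probs) if p >= threshold and l == 1)
    fp = sum(1 for l, p in zip(labels, probs) if p >= threshold and l == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """The 'treat all' reference curve: every patient counts as positive."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def net_benefit_treat_none():
    """The 'treat none' reference line: no true or false positives."""
    return 0.0
```

With the cohort's 30-day mortality of 1042/5784 (about 18.0%), the "treat all" curve is positive at thresholds below the prevalence and crosses the zero line at a threshold equal to the prevalence, which is why a score is only useful where its curve lies above both reference lines.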
The ordering of the four scoring systems by the range under the DCA curve is almost the same as their ordering by AUC, again confirming the superiority of the SAPS II scoring system. It is worth mentioning that the DCA curve of SIRS mostly overlaps with "the horizontal line" and "the diagonal line", indicating that it is not suitable for clinical application. We must acknowledge some limitations of our study. Firstly, the population included in this study is mainly white and black, and there may be ethnic differences; the results may therefore not be applicable to other ethnic groups, such as Asian or Hispanic patients. Secondly, because many scoring systems exist, this study selected just four representative ones in the hope that they could be easily applied to clinical decision-making. Even if SAPS II performs well, it may not be the best score for predicting the 30-day mortality of sepsis. Hou et al. built a model using the XGBoost machine learning technique and found that the net benefit of the XGBoost model was larger over the range of the SAPS II score, which means the novel model was optimal and the SAPS II score inferior 22 . With the rapid development of artificial intelligence (machine learning) and medical big data, better predictive models may be developed in the future. Thirdly, this study did not distinguish between patients with sepsis caused by different infection sites, for which the scoring systems may give biased predictions of the prognosis. Once the infection sites in the MIMIC database are further supplemented, this issue can be addressed. Last but not least, because MedCalc Statistical Software cannot be used for comparison after correction, the statistical significance of the related comparisons may be overestimated in this study. Thus, prospective studies need to be conducted, and better predictors need to be further explored.

Conclusions
Among the four commonly used scoring systems, the SAPS II score has the highest predictive value for 30-day mortality from sepsis, which is better than LODS, SIRS, and SOFA. The results of the DCA curves show that using the SAPS II score to predict the 30-day mortality of patients with sepsis to guide clinical applications may obtain the highest net benefit.