Vital signs assessed in initial clinical encounters predict COVID-19 mortality in an NYC hospital system

Timely and effective clinical decision-making for COVID-19 requires rapid identification of risk factors for disease outcomes. Our objective was to identify characteristics, available immediately upon first clinical evaluation, that are related to COVID-19 mortality. We conducted a retrospective study of 8770 laboratory-confirmed cases of SARS-CoV-2 infection from a network of 53 facilities in New York City. We analysed 3 classes of variables (demographic, clinical, and comorbid factors) in a two-tiered analysis that included traditional regression strategies and machine learning. COVID-19 mortality was 12.7%. Logistic regression identified older age (OR, 1.69 [95% CI 1.66–1.92]), male sex (OR, 1.57 [95% CI 1.30–1.90]), higher BMI (OR, 1.03 [95% CI 1.02–1.05]), higher heart rate (OR, 1.01 [95% CI 1.00–1.01]), higher respiratory rate (OR, 1.05 [95% CI 1.03–1.07]), lower oxygen saturation (OR, 0.94 [95% CI 0.93–0.96]), and chronic kidney disease (OR, 1.53 [95% CI 1.20–1.95]) as being associated with COVID-19 mortality. Using gradient-boosting machine learning, these factors predicted COVID-19-related mortality with an AUC of 0.86 in a holdout test set, after cross-validation in a training set. Immediate, objective, and culturally generalizable measures accessible upon clinical presentation are effective predictors of COVID-19 outcome. These findings may inform rapid response strategies to optimize health care delivery in parts of the world that have not yet confronted this epidemic, as well as in those forecasting a possible second outbreak.

Identifying susceptibility to COVID-19-related mortality from measures immediately available at the first clinical evaluation may help medical staff provide timely and effective care for each patient. To date, many descriptions of COVID-19 patients rely on small datasets. Models fit to such datasets are prone to overfitting, which limits how well the findings generalize to other populations [1–6]. Further, most existing studies of COVID-19-related health outcomes compile demographic statistics that describe single risk factors rather than combined risk [7–9]. Very few studies [10,11] have undertaken a comprehensive risk evaluation based on personalized demographic and physical characteristics acquired at the first encounter to predict COVID-19-related mortality.
In the present study, we explicitly test how demographic, clinical, and comorbid disease factors relate to COVID-19 mortality in 8770 patients with laboratory-confirmed SARS-CoV-2 infection. We analysed 3 broad classes of variables (demographic factors, clinical indicators, and comorbid conditions) in a two-tiered analysis that included traditional regression strategies and machine learning methodologies. To provide timely information that would support fast clinical decision-making, we focused on factors that can be assessed immediately at the first clinical evaluation and that do not require laboratory processing or extensive medical chart review.

Methods
Descriptive and inferential modelling. Sample size was determined by the number of SARS-CoV-2-positive patients treated by the Mount Sinai Health System during the study period (Date to April 24, 2020). We did not perform an a priori statistical sample size calculation. A multivariable logistic regression model with a binary outcome (survivor/non-survivor) was used to estimate the association between COVID-19-related mortality and baseline demographic and clinical characteristics and comorbidities. Age was modelled as a continuous variable in decades to make the results easier to interpret. Race and ethnicity were collected as separate variables and combined into 4 categories: Black, Hispanic, White, and Other/Unknown. Patients of Hispanic ethnicity were assigned to the 'Hispanic' category regardless of their race classification. Patients with oxygen saturation below 40% were excluded to eliminate this variable as the cause of death. Smoking status was collapsed into two categories: ever and never. Odds ratios (ORs) for mortality were estimated for each predictor, and statistical significance was assessed at an alpha of 0.05. Models were implemented in R (v3.5.1) using the glm function from the stats package.
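The coding rules and odds-ratio estimation described above can be sketched in Python. This is a minimal illustration, not the authors' R code. The record fields (`race`, `hispanic`, `smoking`, `spo2`) and the smoking labels are hypothetical. The confidence interval uses the Wald form exp(β ± 1.96·SE), which is one common choice; the paper does not state which interval type was reported. The per-decade helper shows why modelling age in decades makes the OR easier to read: a per-year coefficient β corresponds to a per-decade OR of exp(10β).

```python
import math

Z_95 = 1.959964  # two-sided 95% standard-normal quantile


def race_ethnicity(race, hispanic):
    """Collapse race and ethnicity into the four analysis categories.
    Hispanic ethnicity takes precedence over the recorded race."""
    if hispanic:
        return "Hispanic"
    if race in ("Black", "White"):
        return race
    return "Other/Unknown"


def smoking_group(status):
    """Collapse smoking status into ever/never (hypothetical input labels).
    Unrecognized or missing values stay missing (None)."""
    if status == "never":
        return "never"
    if status in ("former", "current"):
        return "ever"
    return None


def eligible(record):
    """Keep patients whose oxygen saturation is at least 40%."""
    return record["spo2"] >= 40


def odds_ratio(beta, se, z=Z_95):
    """Odds ratio with a Wald 95% CI from a logistic-regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def per_decade(beta_per_year, se_per_year):
    """Rescale a per-year age coefficient (and its SE) to a per-decade OR."""
    return odds_ratio(10 * beta_per_year, 10 * se_per_year)


if __name__ == "__main__":
    patients = [
        {"race": "White", "hispanic": True, "smoking": "former", "spo2": 92},
        {"race": "Asian", "hispanic": False, "smoking": "never", "spo2": 35},
    ]
    for p in filter(eligible, patients):  # the second patient is excluded
        print(race_ethnicity(p["race"], p["hispanic"]), smoking_group(p["smoking"]))
    # A hypothetical per-year coefficient of 0.05 gives a per-decade OR of exp(0.5).
    print(per_decade(0.05, 0.005))
```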
Predictive modelling. To assess the utility of these measures for predicting COVID-19 mortality, we constructed a machine learning model using the Extreme Gradient Boosting framework implemented in the xgboost (v1.0.0.2) package in R. The available data were divided into a training set (60% of the data) and a holdout test set (the remaining 40%). In the training set, the mlr (v2.17.1) package was used to tune model hyperparameters by a random search over a parameter grid, evaluated with tenfold stratified cross-validation. The tuned hyperparameters were:
- the booster (linear or tree-based)
- tree depth
- learning rate
- the L1 regularization parameter
- the L2 regularization parameter
- the number of boosting rounds
- class weighting
- the minimum loss reduction required for a further partition

After training, overall model performance was evaluated by predicting mortality in the naïve holdout test set. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to assess model efficacy. Feature importance was estimated by gain, which reflects the fractional contribution of each feature to the overall model.
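The evaluation scheme in this paragraph can be illustrated with a standard-library sketch: a stratified 60/40 split, tenfold stratified cross-validation, random search over a hyperparameter grid, AUC on the holdout set, and gain normalized to fractions. This is an assumption-laden stand-in for what mlr and xgboost do internally, not their API. Model fitting is omitted, and the grid values are hypothetical (the keys follow the parameter names exposed by xgboost/mlr). The AUC is computed as the Mann–Whitney probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, which equals the area under the ROC curve.

```python
import random


def stratified_split(labels, train_frac=0.6, seed=1):
    """Split indices into train/test, preserving the class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


def stratified_folds(labels, k=10, seed=1):
    """Assign indices to k folds by dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def random_search(grid, n_iter, seed=1):
    """Draw n_iter random configurations from a discrete hyperparameter grid."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in grid.items()}
            for _ in range(n_iter)]


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 1/2.
    Quadratic in the sample size, which is acceptable for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fractional_gain(total_gain):
    """Normalize per-feature total gain so that importances sum to 1."""
    total = sum(total_gain.values())
    return {feature: g / total for feature, g in total_gain.items()}


# Hypothetical search space; values are illustrative only.
GRID = {
    "booster": ["gbtree", "gblinear"],
    "max_depth": [2, 4, 6],
    "eta": [0.01, 0.1, 0.3],       # learning rate
    "alpha": [0, 1],               # L1 regularization
    "lambda": [1, 5],              # L2 regularization
    "nrounds": [100, 300],         # boosting rounds
    "scale_pos_weight": [1, 7],    # class weighting
    "gamma": [0, 1],               # minimum loss reduction to split
}
```

In a full pipeline, each configuration from `random_search` would be fit on nine folds and scored with `roc_auc` on the tenth. The best-scoring configuration would then be refit on the whole training set before a single evaluation on the test indices.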

Results
As of April 24, 2020, a total of 46,945 patients who had either been tested for COVID-19 or were under investigation for COVID-19 had an encounter at a Mount Sinai facility. RT-PCR confirmation of SARS-CoV-2 infection was available for 8770 of these patients, who comprise the final sample for our analyses. Overall, 4766 (54.3%) patients were male, 4525 (70.1%) had never smoked, and 3996 (69.2%) had a BMI greater than 25. Self-reported race/ethnicity included 2310 (26.4%) White, 1955 (22.3%) Black, and 1975 (22.5%) Hispanic patients. The median age was 60 years (IQR, 32–88; range, 0–90 years). A total of 2293 (26.1%) patients were aged 71 years and older, and 2956 (33.7%) were younger than 51 years. The most common comorbidities were hypertension (2281, 26.0%) and diabetes (1631, 18.6%). At the encounter, 784 (11.5%) patients presented with a respiratory rate greater than 24 breaths/min, 1308 (18.4%) with a temperature greater than 38.0 °C, 2582 (36.6%) with a heart rate greater than 100 beats/min, and 2826 (40.4%) with an oxygen saturation below 96%. Among the confirmed cases included in our analyses, 1114 (12.7%) died of COVID-19-related causes. For non-survivors, the median time from the encounter to death was 6 days. For survivors, the median time from the encounter to discharge was 3 days. Table 1 reports the sociodemographic and clinical characteristics and comorbidities of patients, stratified by survival.
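The "n (%)" summaries in the paragraph above come from applying fixed cut-offs to each patient's first-encounter vital signs. A minimal Python sketch of that tabulation follows. The field names are hypothetical. The denominator is assumed to be the patients with a recorded value, because the percentages for smoking, BMI, and vital signs cannot be reproduced by dividing by the full 8770. Whole-cohort figures such as mortality do divide by 8770, which the `pct` helper can check.

```python
# Cut-offs as stated in the Results; field names are hypothetical.
THRESHOLDS = {
    "resp_rate": lambda v: v > 24,     # breaths/min
    "temp_c": lambda v: v > 38.0,      # degrees Celsius
    "heart_rate": lambda v: v > 100,   # beats/min
    "spo2": lambda v: v < 96,          # percent
}


def flag_vitals(vitals):
    """Return a True/False flag per vital sign; missing values are skipped."""
    return {name: rule(vitals[name])
            for name, rule in THRESHOLDS.items()
            if vitals.get(name) is not None}


def pct(count, total):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * count / total, 1)


def n_percent(flag_rows, name):
    """Count flagged patients among those with a recorded value for `name`
    (assumed denominator)."""
    recorded = [row[name] for row in flag_rows if name in row]
    if not recorded:
        return 0, None
    n = sum(recorded)
    return n, pct(n, len(recorded))


if __name__ == "__main__":
    rows = [
        flag_vitals({"resp_rate": 26, "temp_c": 37.2, "heart_rate": 101, "spo2": 95}),
        flag_vitals({"resp_rate": 18, "temp_c": 38.4, "heart_rate": None, "spo2": 98}),
    ]
    print(n_percent(rows, "heart_rate"))
    print(pct(1114, 8770))  # whole-cohort mortality figure from the text
```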
Results of the multivariable logistic regression are presented in Fig. 1. After cross-validation in a training set, we applied the machine learning model built with the extreme gradient boosting framework to a holdout test set. Figure 2, Panel A, shows the receiver operating characteristic (ROC) curve summarizing model performance, with an AUC of 0.86. Figure 2, Panel B, shows the features that contributed most to model performance: age, oxygen saturation, BMI, respiratory rate, heart rate, and temperature.

Discussion
Scientific Reports | (2020) 10:21545 | https://doi.org/10.1038/s41598-020-78392-1 www.nature.com/scientificreports/
In this study of COVID-19 patients hospitalized in a large, socio-demographically diverse New York City hospital system, we report that vital signs typically collected during initial clinical evaluations are effective predictors of COVID-19-related mortality. We implemented two approaches to investigate factors of COVID-19 mortality: multivariable logistic regression to describe characteristics associated with mortality, and gradient boosting to predict mortality. Seven factors were associated with COVID-19 mortality: age (older), sex (male), BMI (higher), heart rate (higher), respiratory rate (higher), O2 saturation (lower), and chronic kidney disease. When combined, these factors predicted COVID-19-related mortality with an AUC of 0.86 in naïve data following cross-validation in a training set. Age, oxygen saturation, BMI, respiratory rate, heart rate, and temperature contributed most to the prediction of COVID-19 mortality. This study reports significant associations between vital signs measured at the first clinical encounter and COVID-19 mortality. Consistent with earlier reports from China, Italy, and elsewhere in the NYC area, our logistic regression models showed that age, sex, and BMI are strongly associated with COVID-19 mortality [7,8,13–15]. Although we confirm previous findings of a high prevalence of hypertension and diabetes in COVID-19 patients [8,13,16], we did not find a significant association between these comorbidities and COVID-19 mortality.
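Because the logistic model is additive on the log-odds scale, the reported odds ratios combine multiplicatively. A patient who differs from a reference patient by Δ units in several predictors has odds of death multiplied by the product of OR^Δ over those predictors. The sketch below applies the rounded ORs from the abstract. It assumes a main-effects model with no interaction terms, and it assumes per-unit scaling for BMI and the vital signs (the text states only that age is per decade). The output is relative odds, not an absolute risk, because the model intercept is not reported. Rounding of the published ORs also makes the result illustrative only.

```python
import math

# Rounded odds ratios from the abstract. Units for BMI and the vital signs
# (per kg/m^2, per beat/min, per breath/min, per percentage point) are assumptions.
REPORTED_OR = {
    "age_decade": 1.69,  # per 10 years of age
    "male": 1.57,        # male vs female
    "bmi": 1.03,
    "heart_rate": 1.01,
    "resp_rate": 1.05,
    "spo2": 0.94,        # < 1: higher saturation lowers the odds
    "ckd": 1.53,         # chronic kidney disease present vs absent
}


def relative_odds(deltas):
    """Odds of death relative to a reference patient under a main-effects
    logistic model: log-odds add, so each OR is raised to its delta."""
    log_odds = sum(delta * math.log(REPORTED_OR[name])
                   for name, delta in deltas.items())
    return math.exp(log_odds)


if __name__ == "__main__":
    # Hypothetical patient: 20 years older, male, respiratory rate 6/min higher,
    # and oxygen saturation 5 points lower than the reference patient.
    example = {"age_decade": 2, "male": 1, "resp_rate": 6, "spo2": -5}
    print(f"relative odds of death: {relative_odds(example):.2f}")
```

For this hypothetical patient the product is about 8.2. Under the stated assumptions, this means roughly eight times the reference patient's odds of death. The example shows how several modest per-unit effects can compound.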
Notably, our results show that immediate, objective measures collected at the time of first clinical presentation can be effective predictors of mortality. Moreover, these measures can be obtained even if the patient is unresponsive or unconscious. Our results extend prior descriptive reports in two ways: they provide statistical confirmation of suspected risk factors, and they show that these variables, in combination, are predictive of mortality. These findings may inform rapid response strategies to optimize health care delivery in parts of the world that have not yet confronted this epidemic, as well as in those forecasting a possible second outbreak.
This study has several limitations. First, because testing for COVID-19 was not widely available, only severe cases of COVID-19 had laboratory confirmation of SARS-CoV-2 infection. As such, this study may have disproportionately included patients with poor outcomes, limiting the generalizability of our findings. Second, because of the critical situation in the New York City area, we did not obtain information on oxygen support or ICU admissions. Third, because outcomes were determined at the time of analysis, we may have misclassified patients who had not yet completed their hospital admission. Lastly, because our analysis focuses specifically on patients in healthcare facilities, our results should not be interpreted as indicative of patterns in the population at large.

Conclusions
This retrospective observational study examined the demographic and clinical characteristics of confirmed COVID-19 patients in a large NYC hospital system. The following were identified as risk factors for COVID-19 mortality: older age, male sex, higher BMI, higher heart rate, higher respiratory rate, and lower O2 saturation at presentation, as well as chronic kidney disease (CKD). We found that these factors could be combined in a gradient-boosting machine learning model to create an effective predictor of mortality, with an AUC of 0.86. Notably, our results show that immediate, objective measures collected at the time of clinical presentation, regardless of the patient's level of consciousness, can be effective predictors of mortality. Reliance on results from hematologic and biochemical laboratory tests, or on extensive medical history review, may create a critical lag in response time. These findings may inform rapid response strategies to optimize health care delivery in parts of the world that have not yet confronted this epidemic, as well as in those forecasting a possible second outbreak.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.