Abstract
Coronavirus disease 2019 (COVID-19) is a respiratory disease with rapid human-to-human transmission caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Due to the exponential growth of infections, identifying patients with the highest mortality risk early is critical to enable effective intervention and prioritisation of care. Here, we present the COVID-19 early warning system (CovEWS), a risk scoring system for assessing COVID-19 related mortality risk that we developed using data amounting to a total of over 2863 years of observation time from a cohort of 66,430 patients seen at over 69 healthcare institutions. On an external cohort of 5005 patients, CovEWS predicts mortality with a specificity ranging from 78.8% (95% confidence interval [CI]: 76.0, 84.7%) to 69.4% (95% CI: 57.6, 75.2%) at sensitivities greater than 95%, between 1 and 192 h prior to mortality events, respectively. CovEWS could enable earlier intervention, and may therefore help in preventing or mitigating COVID-19 related mortality.
Introduction
The coronavirus disease 2019 (COVID-19) pandemic has recently emerged as a major and urgent threat to healthcare systems worldwide. Since early reports of its outbreak in China in December 2019, the number of global cases has risen to over 21 million known infections and resulted in over 750,000 deaths worldwide as of August 16, 2020^{1}. Despite public health efforts aimed at improving testing^{2}, developing potential vaccines^{3}, and improving prevention strategies^{4}, the disease is placing a significant burden on healthcare systems and existing resources in many countries, particularly where its spread has not been mitigated. Efficient early detection of patients likely to develop critical illness is thus crucial to optimise the allocation of limited resources, and monitor overall disease progression^{5,6}. The use of clinical predictive models from electronic health records (EHRs) can help reduce some of this burden and inform better decisions overall^{7,8,9,10}. For instance, a model able to predict in advance which patients are at higher risk of mortality may help ensure resources are prioritised accordingly for these individuals. In addition, as more observational data are gathered, these models could be used both to discover new risk factors and to reveal interactions between existing factors, offering better insights and opportunities for appropriate intervention.
Several approaches have been proposed to determine potential risk factors that contribute to COVID-19 mortality. Some of these approaches identify demographics and inflammatory markers associated with increased mortality^{11,12}, but do not account for risk factors potentially changing over time. Moreover, many existing analyses are limited to a single source of data, often from a single hospital, for both learning a model and predicting a patient’s risk, which may limit the generalisability of these analyses^{13}. Other more traditional measures of patient prognosis, such as sequential organ failure assessment (SOFA) scores^{14}, are based on examining a fixed set of risk factors not specifically adapted to COVID-19; such measures fail to account for relevant changes in patient status outside these risk factors, and therefore often do not reach high levels of sensitivity and specificity in identifying high-risk patients. Due to these challenges, to date, there does not yet exist a COVID-19 risk score that (i) makes use of multiple, representative sources of data to account for patient heterogeneity, (ii) includes important short-term and long-term risk factors that have a significant impact on mortality risk, (iii) reacts in real time to potentially rapid changes in patient status, and (iv) is adapted to risk factors relevant to COVID-19.
To address these issues, we developed the COVID-19 early warning system (CovEWS), a risk assessment system for real-time prediction of COVID-19 related mortality that we trained on a large and representative sample of EHRs collected from more than 69 healthcare institutions using machine learning. In contrast to existing risk scores, CovEWS provides early warnings with clinically meaningful predictive performance up to 192 h prior to observed mortality events, hence enabling critical time to intervene to potentially prevent such events from occurring. Since CovEWS is automatically derived from patient EHRs, it updates in real time without any necessity for manual action to reflect changes in patient status, and accounts for a much larger number of risk factors correlated with COVID-19 mortality than existing risk scores. CovEWS was derived from the de-identified EHRs of 66,430 diverse COVID-19 patients, and is based on a time-varying neural Cox model that accounts for risk factors changing over time and potential non-linear interactions between risk factors and COVID-19 related mortality risk. While these extensions have been pursued separately in^{15} and^{16}, they have neither been considered in combination nor in the context of COVID-19 risk scoring using EHRs. We demonstrate experimentally that the predictive performance of CovEWS is superior to existing generic risk scores, such as SOFA^{14}, COVID-19 specific risk scores, such as the machine learning models from Yan et al.^{17} and Liang et al.^{18} and COVER_F^{19}, and a time-varying Cox model with linear interactions^{20}. We additionally show that the gradient information of our differentiable CovEWS model can be used to quantify the influence of the input risk factors on the output score in real time. CovEWS may enable clinicians to identify high-risk patients at an early stage, and may, therefore, help improve patient outcomes through earlier intervention.
Results
COVID-19 early warning system (CovEWS)
CovEWS is a clinical mortality risk prediction system for COVID-19 positive patients to be used in a continuous manner in both inpatient and outpatient settings. CovEWS uses clinical risk factors from a patient’s EHR to automatically calculate a mortality risk score between 0 and 100 that indicates the current risk percentile that this patient is in relative to the reference cohort (see “Model” for a mathematical definition of CovEWS). A CovEWS score of 90 indicates, for example, that the patient has a higher COVID-19 related mortality risk than 90% of COVID-19 positive patients in the reference cohort. An important property of CovEWS scores is that they always reflect the momentary risk of patients in their current states, and that they update instantaneously to reflect relevant, EHR-derived changes, which is a key differentiator of CovEWS compared to existing COVID-19 related mortality risk prediction systems that are not designed to take into account new, incoming clinical evidence. Figure 1 demonstrates the application of CovEWS to two contrasting patient timelines (a deteriorating patient that ultimately died, and a patient that initially deteriorates but then recovers) by visualising a selected number of clinical risk factors, such as respiratory rate, oxygen saturation, and creatinine levels, alongside the corresponding momentary risk assessment output by CovEWS. As shown in Fig. 1, CovEWS additionally maintains a high degree of interpretability for clinicians by indicating the relative positive and negative influences of each clinical risk factor over time on the predicted risk score (see “Feature Importance”). The information conveyed by CovEWS can be used to quickly and objectively assess individual COVID-19 related mortality risk in order to prevent or mitigate mortality, and optimise prioritisation of scarce healthcare resources.
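The mapping from a raw model risk output to a 0-100 CovEWS score described above can be illustrated as the empirical risk percentile with respect to the reference cohort. The following is a minimal sketch of this idea, not the paper's implementation; the function name and inputs are illustrative:

```python
import numpy as np

def percentile_score(raw_risk, reference_risks):
    """Map a raw model risk output to a 0-100 score given by the
    patient's risk percentile relative to a reference cohort."""
    reference_risks = np.sort(np.asarray(reference_risks, float))
    # Number of reference patients with a strictly lower raw risk.
    rank = np.searchsorted(reference_risks, raw_risk, side="left")
    return 100.0 * rank / len(reference_risks)

# A score of 90 means a higher risk than 90% of the reference cohort.
reference = np.random.RandomState(0).rand(1000)
score = percentile_score(np.quantile(reference, 0.9), reference)
```

Because the score is a percentile, it is invariant to monotone rescalings of the underlying model output, which keeps scores comparable across model updates.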
To develop CovEWS, we used EHR data from two federated networks of US and international healthcare organisations (HCOs), Optum (US) and TriNetX (US + international), that include de-identified EHRs containing data on demographics, clinical measurements, vital signs, lab tests and diagnoses of 47,384 and 5005 patients seen between March 21st and June 5th 2020 (11 weeks) and March 21st and June 25th 2020 (13 weeks), respectively. To demonstrate the generalisability of predictions made by CovEWS, we limited the training of CovEWS to a training cohort of 23,692 (50%) patients from the Optum cohort, used 9477 (20%) Optum patients for model selection, and evaluated CovEWS against both a held-out test cohort of 14,215 (30%) patients from the Optum cohort and a separate external test cohort consisting of the entire TriNetX cohort of 5005 (100%) patients (Table 1, stratification details in “Stratification”). In addition, we collected supplementary EHR data on new patients diagnosed with COVID-19 between June 6th and July 13th 2020 (5 weeks) from Optum—the Optum future cohort (14,041 patients)—after CovEWS had been trained, to demonstrate the robustness of CovEWS under rapidly changing treatment regimes and other temporal effects. We note that during this period, the RECOVERY Collaborative Group reported results of randomised clinical trials demonstrating the lack of efficacy of hydroxychloroquine^{21} and the efficacy of dexamethasone^{22} in COVID-19 patients on June 5th 2020 and June 16th 2020, respectively—which significantly impacted clinical treatment practice for COVID-19 patients. The COVID-19 diagnoses of all included patients were confirmed based on the presence of positive SARS-CoV-2 test results and/or COVID-19 diagnostic codes in their EHRs (Data Collection). The data formats were normalised across the two federated networks of HCOs (Data Collection), and all data were preprocessed to address the missingness that is characteristic of real-world clinical data (Preprocessing).
Predictive performance for different prediction horizons
We compared the predictive performance of CovEWS, several baselines and existing risk prediction scores (“Baselines”), including a version of CovEWS based on a linear time-varying Cox model^{20} (CovEWS [linear], “Time-varying Covariates”), COVID-19 Estimated Risk for Fatality (COVER_F)^{19}, Sequential Organ Failure Assessment (SOFA)^{14}, Modified Early Warning Score (MEWS)^{23}, the decision tree developed by Yan et al.^{17} and the deep learning model developed by Liang et al.^{18}, in terms of their respective specificity for identifying COVID-19 related mortality at a conservative fixed sensitivity of at least 95% and a slightly more relaxed level of 90%, at a minimum of 1, 2, 4, 8, 16, 24, 48, 96 and 192 h (8 days) prior to observed mortality events, on both the held-out test data of the Optum cohort and the external test cohort from the TriNetX network (Fig. 2). The last observed EHR entry’s date was taken as a reference time for those patients that did not have an observed mortality event during the data collection period. In terms of specificity at a sensitivity greater than 95%, we found that CovEWS significantly (p < 0.05, one-sided Mann-Whitney-Wilcoxon with Bonferroni correction, see Supplementary Data 1 and Supplementary Data 2) outperformed the other baselines and existing risk prediction scores at each prediction horizon and on both the Optum and TriNetX cohorts, with few exceptions. By comparing the predictive performances of the mortality prediction scores at different time horizons, we additionally quantified the degree to which risk prediction methods give more accurate predictions when the mortality event is closer to the prediction date.
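The evaluation metric used above, specificity at a fixed minimum sensitivity, can be computed by sweeping decision thresholds over the predicted scores and keeping the best specificity among thresholds that still meet the sensitivity constraint. A minimal sketch, assuming higher scores indicate higher mortality risk (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def specificity_at_sensitivity(scores, labels, min_sensitivity=0.95):
    """Highest achievable specificity subject to sensitivity >= target.

    scores: predicted risk scores (higher = higher risk)
    labels: 1 for an observed mortality event, 0 otherwise
    """
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    n_pos = max((labels == 1).sum(), 1)
    n_neg = max((labels == 0).sum(), 1)
    best_specificity = 0.0
    # Sweep candidate thresholds; predict positive when score >= threshold.
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        sensitivity = (predicted & (labels == 1)).sum() / n_pos
        specificity = (~predicted & (labels == 0)).sum() / n_neg
        if sensitivity >= min_sensitivity:
            best_specificity = max(best_specificity, specificity)
    return best_specificity
```

Fixing sensitivity first reflects the clinical priority of rarely missing an impending mortality event, and then reporting how many false alarms the remaining specificity implies.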
For example, the predictive performance of CovEWS in terms of specificity at a sensitivity greater than 95% dropped from 89.3% (95% confidence interval [CI]: 83.0, 91.6%) to 70.5% (95% CI: 65.6, 76.4%) and from 78.8% (95% CI: 76.0, 84.7%) to 69.4% (95% CI: 57.6, 75.2%) from 1 to 192 h prior to an observed mortality event on the held-out Optum test cohort and the external TriNetX test cohort, respectively. When comparing the predictive performance across the held-out Optum test cohort and the external TriNetX test cohort, we saw the same trends in performance. However, all methods were roughly 10% less specific at greater than 95% sensitivity. This difference persisted even in those risk assessment systems that were not originally trained on the Optum training cohort, such as COVER_F. We thus attributed this apparent difference in performance not to overfitting to the Optum training cohort, but to (i) the difference in baseline mortality between the held-out Optum test cohort and the external TriNetX test cohort (5.38% versus 6.91%, respectively), and (ii) the higher degree of missingness in short-term mortality risk factors, such as respiratory rate, SpO_{2} and blood pressure, in TriNetX (Table 1). In addition to assessing predictive performance, we also evaluated the calibration^{24} of the risk scores predicted by CovEWS. We found that CovEWS overestimates mortality risk when interpreted as the probability of a mortality event occurring within the next 24 h, because patients’ states may change between the prediction time and the end of the prediction horizon (Preprocessing).
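Calibration of the kind assessed above can be checked by binning predicted probabilities and comparing each bin's mean prediction against the observed event rate; systematic overestimation shows up as mean predictions consistently above the observed rates. A simple sketch of this standard reliability-curve check (illustrative, not the paper's exact procedure):

```python
import numpy as np

def calibration_curve(probabilities, outcomes, bins=10):
    """Observed event rate per predicted-probability bin.

    Returns (mean predicted probability, observed event rate) for each
    non-empty bin; a well-calibrated model has the two roughly equal.
    """
    probabilities = np.asarray(probabilities, float)
    outcomes = np.asarray(outcomes, float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    bin_idx = np.clip(np.digitize(probabilities, edges) - 1, 0, bins - 1)
    mean_predicted, observed_rate = [], []
    for b in range(bins):
        mask = bin_idx == b
        if mask.any():
            mean_predicted.append(probabilities[mask].mean())
            observed_rate.append(outcomes[mask].mean())
    return np.array(mean_predicted), np.array(observed_rate)
```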
Predictive performance for different subgroups
We also compared the predictive performance of CovEWS against the baselines and existing scores across various ethnic subgroups, on patients that were not hospitalised, and on the Optum future cohort (Fig. 2; cohort statistics in Table 2). Overall, across each of these cohorts, we found that CovEWS significantly (p < 0.05, one-sided Mann-Whitney-Wilcoxon with Bonferroni correction) outperformed all of the baselines at each prediction horizon, with the sole exception being the 96 and 192 h prediction horizons on the Optum future cohort—where the performance difference was not in all cases significant. The predictive performance difference between CovEWS and other risk assessment methods was more pronounced in the Caucasian and African American subgroups than in the Asian subgroup, which is likely reflective of the fact that several baselines have been developed using data from predominantly Asian populations. On the subgroup of patients that were not hospitalised, we found that, although lower than the overall performance on the entire Optum test set, CovEWS maintained a high level of performance. We attributed the lower performance on the non-hospitalised group compared to the overall Optum test set to (i) the considerably higher missingness in this patient group, caused by non-hospitalised patients not being monitored as closely as hospitalised patients (Supplementary Table 4), and (ii) the overall considerably lower mortality rate in this patient group. Respectable performance on the non-hospitalised patient group is particularly important since the majority of COVID-19 patients are treated in an outpatient setting. In addition, when evaluating the various risk assessment methods on the Optum future cohort, we found that CovEWS was largely robust to changes in treatment policies and other temporal effects.
A notable anomaly was the 96 and 192 h prediction horizons, where the variance in our performance estimates was relatively high since fewer patients with recorded mortality outcomes and long-term monitoring data were available due to the shorter data collection time (5 weeks) of the Optum future cohort compared to the Optum test set (11 weeks) and the TriNetX test set (13 weeks). CovEWS likely remained robust to changes in treatment protocols because they improved outcomes by reducing the occurrence of critical patient states.
Stratified time-varying survival analysis
As illustrated in the examples in Fig. 1, CovEWS continuously varies over time since it accounts for the status of patients deteriorating or improving. To add to the analysis of the predictive performance of CovEWS in identifying the mortality of individual patients at fixed prediction horizons prior to observed mortality events presented in the previous paragraph, we therefore additionally evaluated whether CovEWS enables stratification of high-risk patients continuously over time (Fig. 3). To do so, we stratified the held-out Optum test cohort and the external TriNetX cohort into five strata according to the CovEWS score assigned to each patient (“Predictive Performance for Different Subgroups”). We found that CovEWS effectively separated patients into risk groups with distinct COVID-19 related mortality risk profiles, as patients that were assigned to higher strata of CovEWS scores were more likely to die across all strata over the course of their disease. When comparing stratification results between the held-out Optum test cohort and the external TriNetX cohort, we observed that the ability to stratify patients into risk groups generalised across the two datasets—indicating that the predictive performance of CovEWS can transfer to other sources of data collected with different protocols, from different locations, and under different treatment policies. We also observed that the survival curve of the highest risk stratum, comprising patients assigned CovEWS scores between 90 and 100, was considerably steeper than those of the other strata in the held-out Optum test cohort, and that this anomaly did not persist to the same degree in the external TriNetX cohort. Qualitatively, we reasoned that this difference between the two datasets was due to the considerably higher missingness of short-term risk factors associated with mortality, such as respiratory rate, SpO_{2} and blood pressure, in the TriNetX cohort (Table 1).
Rapid changes in these short-term risk factors often result in substantially increased near-term mortality risk, and CovEWS scores reflected this increased risk immediately (Fig. 1), moving patients with extreme short-term risk indicators into the highest risk stratum. Since these short-term risk factors were not recorded as frequently in the TriNetX cohort, CovEWS was considerably less able to react to short-term deteriorations in the status of the patients, which was reflected in a comparatively flat time-varying survival curve for the highest risk stratum in the TriNetX cohort.
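The stratified time-varying survival analysis above can be sketched by assigning each patient to a score stratum and estimating a Kaplan-Meier survival curve per stratum. The stratum boundaries below are an illustrative choice, as are all names; the paper's exact strata may differ:

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    Returns (event times, survival probability after each event time).
    """
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    event_times = np.sort(np.unique(times[events == 1]))
    survival, s = [], 1.0
    for t in event_times:
        at_risk = (times >= t).sum()
        deaths = ((times == t) & (events == 1)).sum()
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

def stratified_survival(scores, times, events, edges=(0, 50, 70, 80, 90, 100)):
    """One Kaplan-Meier curve per score stratum (illustrative strata)."""
    scores = np.asarray(scores, float)
    curves = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (scores >= lo) & (scores < hi) | ((hi == 100) & (scores == 100))
        if mask.any():
            curves[(lo, hi)] = kaplan_meier(np.asarray(times)[mask],
                                            np.asarray(events)[mask])
    return curves
```

A steeper curve for a high stratum corresponds to the pattern described above: patients scored highest die soonest.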
Discussion
We developed and validated CovEWS, a real-time early warning system for predicting mortality of COVID-19 positive patients, using routinely collected clinical measurements and laboratory results from EHRs. Our validation aimed to evaluate CovEWS and other existing prediction systems for COVID-19 related mortality risk prediction under real-world conditions, which include, among others, missingness, variations in treatment policies and differences in treated populations at different sites, as observed in a large and representative cohort across multiple hospitals. When compared to existing prediction systems, our method not only provides accurate mortality predictions for each patient, but also provides real-time early warnings up to 192 h (8 days) prior to an observed mortality event for individuals, while identifying clinically relevant factors for predictive performance. These results are sustained across various ethnic groups and cohorts. Notably, in comparison to existing mortality risk scoring systems, our method achieves significantly higher performance in terms of specificity at greater than 95% sensitivity across all evaluated prediction time frames, and generalises well to data collected under different treatment and data collection policies and environmental conditions. The implications of providing such an early warning system are significant. The provided risk assessment could potentially broadly aid in clinical decision-making, as well as in the prioritisation of care and resource allocation. More specifically, CovEWS could enable clinicians to intensify monitoring and therefore initiate treatments earlier in patients with a higher risk of mortality. Moreover, as an additional information source, CovEWS could also help clinicians to decide when to initiate palliative care to improve the quality of remaining life for patients with this need.
Additional studies investigating if and how CovEWS can influence clinical decision-making would be necessary to establish whether it improves treatment outcomes, the involvement of palliative care, or resource allocation, and thereby reduces COVID-19 related mortality.
Before applying CovEWS in clinical practice, it is important to decide on and calibrate appropriate warning thresholds, e.g., at the 85%, 90% or 95% sensitivity level (Thresholds). Especially when hospitals are overwhelmed and need to strictly allocate resources, alarm fatigue due to ill-calibrated thresholds ought to be minimised. Beyond thresholds, it is also important to choose appropriate integration points with existing clinical workflows. While such integrations are highly dependent on existing guidelines and context, key time points along the patient journey, such as pre-testing, admission, discharge, prior to and after significant interventions such as intubations, and on a continuous basis when monitored in critical care, could be potentially sensible times to assess a patient’s CovEWS scores. In addition, while the data used in this study already comprise multiple hospitals, a further analysis including hospitals from other countries would be useful to investigate the impact of geographic and cultural differences—particularly in those geographic contexts that are not well covered by this study. Due to differences in data collection methodology and expected data formats, another limitation of this study is that the implementation of some existing risk scoring systems is based on certain assumptions that may adversely influence their comparative performance (Baselines). Another limitation of this study related to data collection is that the death records in our datasets may not have been complete for all patients, because HCOs may not in all cases have been aware of deaths of patients that were no longer under their care. Additionally, the do-not-resuscitate (DNR) status of patients, which may have significant ramifications for their mortality^{25}, was not available as an input to CovEWS.
Similarly, not all important covariates were available for all patients at all time points, since our evaluation was based on patient data collected in real care environments, where missingness is pervasive (Table 1). To handle missingness, we employed multiple imputation by chained equations (MICE)^{26} (Preprocessing), which was recommended by Gerry and colleagues^{27} for handling missing information in early warning scoring systems. Since imputation errors may have influenced the performance of prediction models, we performed additional analyses on several subsets of patients that had fewer missing covariates to ensure the prediction performance of any specific prediction system was not disproportionately affected by imputation errors (Supplementary Note 1). While there are, in general, various important considerations in evaluating risk prediction systems that rely on different input covariates under missingness, we consider the presented evaluation approach to be the most representative of the envisioned potential clinical use of CovEWS (Supplementary Note 2). Beyond considerations around missingness, we note that this work only concerns risk scores from routinely collected clinical data and patients who are already seeking care at healthcare providers. For efficient mitigation of COVID-19, additional, potentially preventative efforts such as tracking apps, risk scores of infection prior to admission, masks and social distancing are necessary.
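The chained-equations idea behind MICE can be illustrated with a minimal single-imputation variant: each partially observed column is iteratively regressed on the remaining columns, and its missing entries are replaced by the fitted predictions. This is a sketch of the principle, not the paper's implementation; full MICE additionally draws multiple stochastic imputations:

```python
import numpy as np

def chained_imputation(X, n_iterations=10):
    """Minimal chained-equations imputation in the spirit of MICE.

    Each column with missing entries is repeatedly regressed (ordinary
    least squares) on all other columns, and its missing entries are
    replaced by the regression predictions.
    """
    X = np.asarray(X, float).copy()
    missing = np.isnan(X)
    # Initialise missing entries with the corresponding column means.
    column_means = np.nanmean(X, axis=0)
    X[missing] = np.take(column_means, np.where(missing)[1])
    for _ in range(n_iterations):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            others = np.delete(X, j, axis=1)
            design = np.column_stack([np.ones(len(X)), others])
            # Fit on rows where column j was actually observed.
            beta, *_ = np.linalg.lstsq(design[~rows], X[~rows, j], rcond=None)
            X[rows, j] = design[rows] @ beta
    return X
```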
It is also important to acknowledge upfront the pitfalls of mortality prediction for hospitalised patients. A significant proportion of patients who die in the hospital do so after cessation of treatment. One may argue that models that predict mortality thus actually predict the likelihood of treatment discontinuation. Numerous factors go into the decision with regard to continuing or stopping interventions, including whether the outcome, if the patient were to survive, is aligned with the patient’s preferences. Such a model will therefore only be accurate in a clinical context where clinicians make predictions in a similar manner, where patients share the same values and preferences around quality of life, and where the decision-making process resembles that of the training cohort.
From the perspective of medical staff, prognostication, as well as the perception of the quality of life if the patient were to survive, determines the framing of patient status to family and friends; these are vulnerable to bias, both conscious and unconscious, and influence the decision to admit the patient to the intensive care unit, as well as the decision to discontinue treatment (which almost certainly leads to death among those who are most severely ill). In a perfect world without bias and health disparities, only patient and disease factors would determine hospital mortality, but studies have repeatedly demonstrated that this is far from the case. Recently, mortality from critical illness has been shown to be higher in disproportionately minority-serving hospitals after adjustment for illness severity and other biological factors that pertain to the patient and to the disease^{28,29}. It is nearly impossible to incorporate these factors precisely in a model that is trained on mortality as an outcome. As a decision support tool to inform discussions around goals of care, CovEWS is subject to the same limitations that mortality prediction models have—it may perpetuate or even magnify existing health disparities and provider bias. As an early warning system, however, we speculate that the impact of the exclusion of social determinants on model performance is acceptable.
In summary, we presented, developed, and experimentally validated CovEWS, a real-time early warning system that provides clinically meaningful predictions of COVID-19 related mortality up to 192 h (8 days) in advance for individual patients using routinely collected EHR data. In contrast to existing risk scoring systems, CovEWS provides real-time continuous risk assessment that accounts for a large set of short-term and long-term risk factors associated with COVID-19 related mortality, is automatically derived from readily available EHR data, and was externally validated using data from multiple hospitals, diverse patient groups, and across time frames. In terms of influential covariates, a number of known risk factors, including age, SpO_{2}, blood pressure, ischaemic and other heart diseases, hypertension, white blood cell count, neutrophils, lymphocytes, high-sensitivity C-reactive protein, creatinine, lactate dehydrogenase, albumin, pH and PCO_{2}, were found to be significantly associated with mortality outcomes in the Optum training fold (Table 2). Accessible risk assessment from readily available EHRs is especially important in the ongoing COVID-19 pandemic since access to advanced clinical lab testing and imaging techniques may be limited in many hospitals. CovEWS allows for critical time in clinical decision-making, even without access to specialised lab tests or advanced diagnostic equipment. Prospective studies are needed to conclusively establish whether the availability of early warnings for COVID-19 related mortality through CovEWS improves patient outcomes compared to the standard of care.
Methods
Overview
The overall pipeline of the method is shown in Supplementary Fig. 1. We refer to “Non-linear Time-varying Covariates” for a detailed presentation of the predictive model used by CovEWS, and Supplementary Fig. 3 for a detailed diagram of the model architecture.
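The core of a time-varying Cox model of the kind underlying CovEWS is the partial likelihood, in which each patient's covariates are evaluated at the event times of all patients still at risk, and the hazard is modulated by a non-linear function f of the momentary covariates. The following is a minimal sketch that ignores censoring and tied event times, with a tiny untrained network standing in for f; all names and values are illustrative, not the paper's implementation:

```python
import numpy as np

def risk_function(x, weights1, weights2):
    """Illustrative non-linear risk function f(x): a tiny, untrained MLP."""
    return np.tanh(x @ weights1) @ weights2

def negative_partial_log_likelihood(covariates_at, event_times, weights1, weights2):
    """Negated Cox partial log-likelihood with time-varying covariates.

    covariates_at(i, t) returns patient i's covariate vector at time t;
    event_times[i] is patient i's observed event time. For each event,
    the risk set contains all patients still under observation at that
    time. This sketch assumes all events are observed (no censoring).
    """
    loglik = 0.0
    n = len(event_times)
    for i in range(n):
        t = event_times[i]
        risk_set = [j for j in range(n) if event_times[j] >= t]
        scores = np.array([risk_function(covariates_at(j, t), weights1, weights2)
                           for j in risk_set])
        loglik += risk_function(covariates_at(i, t), weights1, weights2)
        loglik -= np.log(np.exp(scores).sum())
    return -loglik

# Toy example: three patients, covariates that change linearly with time.
rng = np.random.RandomState(0)
w1, w2 = rng.randn(3, 4), rng.randn(4)
covariates_at = lambda i, t: np.array([0.1 * i, 0.01 * t, 1.0])
nll = negative_partial_log_likelihood(covariates_at, [5.0, 3.0, 8.0], w1, w2)
```

Training a neural Cox model would minimise this loss with respect to the network weights; the linear baseline referred to in the text corresponds to replacing f with a linear function of the covariates.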
Data collection
We used data collected by two federated networks of healthcare organisations:
Optum
The Optum de-identified COVID-19 electronic health records database includes de-identified electronic medical records and clinical administrative data, including bedside observations and laboratory data, from a geographically diverse set of healthcare institutions in the United States (US). The EHR data were sourced from more than 45 provider groups and integrated delivery networks. We used Optum cohort data collected between 21st March and 5th June 2020, and another cohort separated in time from 6th June to 13th July 2020, for our analysis.
TriNetX
TriNetX is a global health research network providing a de-identified dataset of electronic medical records (diagnoses, procedures, medications, laboratory values, genomic information) including patients diagnosed with COVID-19. The data are de-identified based on the standard defined in Section §164.514(a) of the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. The process by which the data sets are de-identified is attested to through a formal determination by a qualified expert as defined in Section §164.514(b)(1) of the HIPAA Privacy Rule. We used TriNetX cohort data collected between 21st March and 25th June 2020 from 24 healthcare organisations in the US, Australia, Malaysia and India for our analysis.
Data quality
Both Optum and TriNetX, as well as the data-providing healthcare institutions, applied quality control steps to their data, but these procedures are standardised neither across the federated networks nor across healthcare institutions. Varying levels of data quality across EHRs collected at different healthcare institutions and networks are therefore expected. However, heterogeneous data quality standards are characteristic of real-world data collected at different healthcare institutions. By evaluating CovEWS against an external test cohort from healthcare institutions with data collection policies different from our training cohort, we are able to give a fair assessment as to how robust and transferable CovEWS is in the presence of realistic variations in data quality.
Inclusion criteria
We only included patients that were COVID-19 positive in our analysis. In both datasets, we considered patients COVID-19 positive if they either (i) were diagnosed with any of the International Statistical Classification of Diseases and Related Health Problems 10th revision (ICD-10) codes J12.89, J20.8, J40, J22, J98.8 and J80 together with B97.29, or (ii) had a positive COVID-19 lab test result (Supplementary Table 1). The criteria listed in (i) correspond to the Centers for Disease Control and Prevention (CDC) COVID-19 coding guidelines effective February 20, 2020^{30}. A sensitivity analysis performed in a recent epidemiological study using the Optum Research Database concluded that there were no significant differences in observed outcomes between COVID-19 patients that were included based on their recorded diagnoses and those included based on positive SARS-CoV-2 test results^{31}. For patients identified as COVID-19 positive via ICD diagnosis codes, we used the date of diagnosis as the reference diagnosis date for our analyses. For those patients identified as COVID-19 positive via a positive lab test, we used the date of the test sample collection as the diagnosis date. For patients with both a positive COVID-19 lab test and a diagnosis, the available diagnosis date took precedence. For the subgroup of patients that were not hospitalised, we included all patients that were neither admitted to a hospital as inpatients nor to an intensive care unit (ICU) at any point according to their EHRs. We note that it is possible that hospitals did not in all cases record inpatient hospital admissions and ICU admissions in their respective EHRs—which may explain the observed non-zero rate of intubations in the non-hospitalised group. Membership in the Asian, Caucasian and Black or African American subgroups was mutually exclusive in the underlying EHR data model, and a patient could therefore only be assigned to one of the subgroups.
In contrast, Hispanic ethnicity was assigned in conjunction with any of the previous ethnic categorisations. We note that several ethnic subgroups were relatively small and therefore likely not representative of all patients in those groups.
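The inclusion criteria above translate directly into a membership test. The diagnosis codes below are those listed in the text, while the function and argument names are illustrative:

```python
# ICD-10 codes from the CDC COVID-19 coding guidelines cited in the text.
RESPIRATORY_CODES = {"J12.89", "J20.8", "J40", "J22", "J98.8", "J80"}
SARS_COV_2_CODE = "B97.29"

def is_covid_positive(diagnosis_codes, has_positive_lab_test):
    """Inclusion criterion: a qualifying respiratory diagnosis code
    together with B97.29, or a positive SARS-CoV-2 lab test."""
    codes = set(diagnosis_codes)
    diagnosed = bool(codes & RESPIRATORY_CODES) and SARS_COV_2_CODE in codes
    return diagnosed or has_positive_lab_test
```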
Feature selection
We selected EHR-derived covariates for inclusion as input variables for CovEWS based on (i) previously published research on clinical risk factors for COVID-19^{32,33,34}, and (ii) expert input from several medical professionals involved in the treatment of COVID-19 patients. In addition, we aimed to include both short-term and long-term risk factors for COVID-19 related mortality due to the continuous real-time evaluation of CovEWS. We note that the do-not-resuscitate (DNR) status of patients, which may have significant ramifications for their mortality^{25}, was not included in our datasets. In our experimental evaluation, information from diagnostic codes was only used from the time point it was entered into the patient EHR. 85.0% and 72.5% of all diagnoses observed in the TriNetX and Optum cohorts, respectively, were available before inclusion of a patient into the cohort based on a COVID-19 diagnosis or positive SARS-CoV-2 test result. The remainder of diagnoses were added after the COVID-19 diagnosis of the patient. We present the list of all included model input covariates, including their p-values, in Supplementary Table 2, and their distributions across the datasets in Supplementary Table 1.
Data characteristics
The cohort statistics of the two datasets are presented in Supplementary Table 1, the ICD-9 and ICD-10 codes corresponding to the diagnoses shown in Supplementary Table 1 are given in Supplementary Table 3, and the number of observations per patient for time-varying covariates in the two datasets is visualised in Supplementary Fig. 2. As is characteristic of clinical data collected in real-world contexts, missing covariates are common in both datasets. Missingness in real-world EHR data is caused primarily by differences in laboratory testing guidelines, data collection practices, available testing resources and measurement devices between hospitals, and may in some cases depend on patient status and the preferences of clinical staff. For example, Supplementary Table 4 compares the missingness between the Optum test set and the non-hospitalised patient subgroup of the Optum test set. In contrast to traditional clinical studies, realistic missingness patterns in both the training and evaluation datasets are a desirable feature in the context of our study, as CovEWS is designed to be deployed in clinical contexts with similar missingness, and therefore has to be trained and evaluated in the presence of missingness patterns seen across a representative range of heterogeneous hospitals. Covariates were mostly balanced across the Optum and TriNetX datasets. The primary differences were a higher observed mortality rate, and higher rates of intubation, connective tissue disease, and rheumatoid arthritis in the TriNetX data compared to the Optum data. In addition, we note that the majority of admissions were recorded as being of unknown type in the TriNetX database. Since the large fraction of unknown admission entries limited potential admission outcome analyses, we reported hospital and ICU admission outcomes as not available for TriNetX (Supplementary Table 1).
In compliance with the HIPAA Privacy Rule Section §164.514(a), patients' exact dates of death were not available in order to protect patient privacy. In our analysis, we therefore imputed the last recorded EHR entry date as the reference date of death for deceased patients. The actual deaths may have occurred at a later point, and our performance estimates therefore potentially underestimate actual predictive performance, since (i) correct predictions that happened later would mean CovEWS predicted mortality even earlier than estimated for that patient (which is generally harder, see Supplementary Fig. 2), and (ii) incorrect predictions of CovEWS may actually have been outside of the prediction time horizon. We believe this approximation of the exact date of death is therefore an acceptable trade-off, since underestimation of performance is not as much of a concern as overestimation would be.
Data normalisation
The EHR data across both data sources used two different, but compatible, underlying data models consisting of recorded diagnoses, demographics, lab tests, procedures, medications and clinical observations. For our risk factors of interest, we converted records from both datasets into a unified data representation. We used ICD-9 and ICD-10 codes to extract diagnoses (Supplementary Table 3), Logical Observation Identifiers Names and Codes (LOINC) to extract lab tests, and Current Procedural Terminology (CPT), ICD-9 Clinical Modification (ICD-9-CM) and ICD-10 Procedural Coding System (ICD-10-PCS) codes to extract intubation events from the EHR records. For lab tests, we additionally normalised the units so that each measured record within a given lab test category shared the same unit.
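Such unit normalisation amounts to a lookup of conversion functions keyed by lab test category and source unit. The sketch below illustrates the idea; the table structure, names and helper function are hypothetical illustrations rather than the actual CovEWS pipeline code (the 88.42 µmol/L per mg/dL factor for creatinine is a standard conversion).

```python
# Hypothetical unit harmonisation table: maps (lab test category, source unit)
# to a function that converts a value into that category's canonical unit.
TO_CANONICAL = {
    ("creatinine", "umol/L"): lambda v: v / 88.42,  # convert to mg/dL
    ("creatinine", "mg/dL"): lambda v: v,           # already canonical
}

def normalise_lab(category, value, unit):
    """Convert a single lab record to the canonical unit of its category."""
    return TO_CANONICAL[(category, unit)](value)

print(normalise_lab("creatinine", 88.42, "umol/L"))  # 1.0 (mg/dL)
```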
Stratification
We split the Optum cohort used for model development into training (50%), validation (20%) and held-out test (30% of all patients) folds at random, stratified by patient age, gender, presence of mortality events, presence of intubation events, presence of ICU admission and presence of a human immunodeficiency virus (HIV) diagnosis. We added HIV to the set of stratification covariates since its low prevalence could otherwise have led to imbalances in this risk factor across the folds. Stratification produced balanced cohorts across the three folds (Supplementary Table 1). The Optum training fold was used to train CovEWS, the validation fold was used to select the optimal hyperparameter configuration for CovEWS, and the held-out test fold was used, in addition to the external TriNetX test cohort, to evaluate the out-of-sample generalisation performance of CovEWS.
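A stratified split of this kind can be sketched with scikit-learn by combining the stratification covariates into a composite key; the covariates and cohort size below are synthetic stand-ins for the actual patient table.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Synthetic stand-ins for the stratification covariates.
age_band = rng.integers(0, 4, n)   # discretised age
gender = rng.integers(0, 2, n)
mortality = rng.integers(0, 2, n)
# Composite key so that each joint stratum stays balanced across folds.
strata = age_band * 4 + gender * 2 + mortality

idx = np.arange(n)
# 50% training; then 40% of the remaining half (= 20% overall) for validation,
# leaving 30% of all patients as the held-out test fold.
train_idx, rest_idx = train_test_split(
    idx, train_size=0.5, stratify=strata, random_state=0)
val_idx, test_idx = train_test_split(
    rest_idx, train_size=0.4, stratify=strata[rest_idx], random_state=0)
print(len(train_idx), len(val_idx), len(test_idx))  # 500 200 300
```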
Preprocessing
Discrete covariates with p different values were transformed into their one-hot encoded representation, with one out of p indicator variables set to 1 to indicate the discrete value for this patient. All continuous features were standardised to have zero mean and unit standard deviation using the observed covariate distributions in the Optum training fold. Missing values of continuous covariates were imputed in an iterative fashion using multiple imputation by chained equations (MICE)^{26}. MICE was recently independently recommended by ref. ^{27} for handling missing information in early warning scoring systems. MICE models were derived from the Optum training set, and at least 800 patients (≈3.38% of the training set) were available for this purpose for all included covariates. After the preprocessing stage, continuous input features were standardised and fully imputed, and discrete input covariates were one-hot encoded. All preprocessing operations were derived only from the training fold, and applied without adjustment to the other folds and datasets in order to avoid information leakage.
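A minimal sketch of this pipeline using scikit-learn, whose IterativeImputer is a MICE-style chained-equations imputer, is shown below; the data are synthetic and the real pipeline's covariates and settings differ.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))                # synthetic continuous covariates
X_train[rng.random(X_train.shape) < 0.2] = np.nan  # inject ~20% missingness

# Derive imputation and standardisation from the training fold only ...
imputer = IterativeImputer(random_state=0).fit(X_train)
scaler = StandardScaler().fit(imputer.transform(X_train))

# ... and apply them without adjustment to other folds to avoid leakage.
X_test = rng.normal(size=(50, 3))
X_test[0, 1] = np.nan
X_test_prep = scaler.transform(imputer.transform(X_test))
```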
Model
We adopt a variation of the widely used Cox proportional hazards model, adapted to accommodate non-linear and time-varying effects of covariates on the log-hazard function. In the following, we briefly present the basics of time-to-event analysis, the main methodological framework of this work. We then describe the Cox proportional hazards model for time-varying covariates, followed by the modifications we applied to this model for this work.
Survival analysis
Survival analysis, also known as time-to-event (TTE) analysis, comprises a large body of work consisting of mathematical tools for the statistical analysis of the duration of time until a specified event occurs. In this work, the event is defined to be the death of a patient.
An important tool in time-to-event analysis is the hazard function. In a discrete-time setting (e.g., if times are given in specified periods), the hazard function is a conditional probability defined as

$$h(t\mid {\bf{x}})=P(T=t\mid T\ge t,{\bf{x}})\qquad (1)$$
that represents the risk of dying at time t if the patient has survived until that time. The relevant covariates of the patients up to time t are encapsulated in the vector \({\bf{x}}\in {{\mathbb{R}}}^{d}\). Age, sex and lab tests are examples of such covariates that can take either binary or standardised real values after preprocessing. Intuitively, the hazard function captures the underlying dynamics of the transition of the condition of the patient from alive to dead. The larger h is at time t, the more likely it is for the patient to die at time t.
Another useful function is the survival function, denoted by S(t) and defined in discrete time as

$$S(t)=P(T \,>\, t)=\prod _{s\le t}\left(1-h(s\mid {\bf{x}})\right).\qquad (2)$$
Similar functions can be defined in the continuous-time regime. Let T_{c} be the continuous survival time with probability density function f_{c} and cumulative distribution function F_{c}. Similar to Eq. (2), the continuous survival function represents the probability of surviving until time t and is defined as

$${S}_{c}(t)=P({T}_{c} \,>\, t)=1-{F}_{c}(t).\qquad (3)$$
Likewise, the continuous hazard function is defined as

$${h}_{c}(t)=\mathop{\lim }\limits_{\Delta t\to 0}\frac{P(t\le {T}_{c} \,<\, t+\Delta t\mid {T}_{c}\ge t)}{\Delta t}=\frac{{f}_{c}(t)}{{S}_{c}(t)}.\qquad (4)$$
Notice that unlike the discrete hazard function Eq. (1), the continuous hazard function Eq. (4) is not a probability distribution and can take values larger than one.
The last useful function in continuous survival analysis is the cumulative hazard function

$${H}_{c}(t)=\int _{0}^{t}{h}_{c}(s)\,{\rm{d}}s.\qquad (5)$$
The connection between these quantities can be derived directly from the definitions:

$${h}_{c}(t)=\frac{{f}_{c}(t)}{{S}_{c}(t)}=-\frac{{\rm{d}}}{{\rm{d}}t}\mathrm{log}\,{S}_{c}(t),\qquad (6)$$

$${H}_{c}(t)=-\mathrm{log}\,{S}_{c}(t),\qquad (7)$$

$${S}_{c}(t)=\exp (-{H}_{c}(t)).\qquad (8)$$
Before introducing the simple yet flexible Cox model, we discuss a few important issues that must be taken into account in survival analysis.
Censoring. What makes survival analysis different from a simple regression from the covariates x to T—the observed duration up to the occurrence of the event—is the concept of censoring. An observation is called censored—or more precisely right-censored—if its survival time has not been fully observed. There are several causes of a censored observation. For example, if a patient is no longer under observation when the event occurs, or if the information of the patient is lost for some reason, only a lower bound on the time-to-event T is observed, namely the last time at which the condition of the patient was recorded.
Discrete vs. continuous. Although time is a continuous physical quantity, in practice it is measured at discrete points. In particular, in medical settings, the condition of a patient is typically measured on a regular, e.g., daily or twice-daily, basis. This implies that even though a change in the covariates of a patient occurs at a certain point in time, the exact time is not known; the transition point is only known up to the resolution of the measurement. We denote the time at which an event of interest occurs by T. As the resolution of the measurement is hours in the datasets used in this work, T = t refers to an event that occurs within the t-th hour after the patient is admitted to the hospital and their health condition is recorded.
Ties. Due to the limited resolution of time measurements, some observations may have the same survival times, e.g., two patients may die on the same day even though it is extremely unlikely that both die at the same moment. However, ties may occur even in continuous-time data, which is a hint of underlying discrete sampling in time.
A major difference between continuous- and discrete-time survival analysis is that the hazard function is a probability distribution in the discrete setting, while it can take any positive value in the continuous setting. However, the traditional continuous-time approach can still be used for discrete event times, especially when the measurements are equally spaced.
Cox hazards model
The most widely known model in the analysis of continuous survival times is Cox's proportional hazards model^{35}, which parameterises the hazard function as

$$h(t\mid {\bf{x}})={h}_{0}(t)\exp ({{\boldsymbol{\beta }}}^{{\mathsf{T}}}{\bf{x}}),\qquad (9)$$
where h_{0} is called the baseline hazard that is modulated by the effect of covariates via \(\exp ({{\boldsymbol{\beta }}}^{{\mathsf{T}}}{\bf{x}})\). Notice that in the traditional Cox model Eq. (9) the covariates x are assumed constant over time. Consequently, the temporal variation of the hazard function is separated from the influence of the covariates.
Timevarying covariates
In many experimental settings, the assumption of time-invariant covariates in Eq. (9) does not hold. For example, many entries in the electronic health record, such as heart rate, temperature, and blood measurements, do not remain constant over the course of the hospitalisation of a patient. The traditional Cox model Eq. (9) is therefore extended to a time-varying setting by replacing x in Eq. (9) with x_{t}, the measured covariates at time t. Assume a dataset consists of N patients indexed by n = 1, 2, …, N. As a notational convention, \({{\bf{x}}}_{t}^{(n)}\) denotes the covariate vector of patient n at time t.
If the Cox model holds and distinct continuous event times are observed, the following function, called the partial likelihood, is maximised to estimate β:

$${\mathcal{L}}({\boldsymbol{\beta }})=\prod _{i=1}^{k}\frac{\exp ({{\boldsymbol{\beta }}}^{{\mathsf{T}}}{{\bf{x}}}_{{t}_{i}}^{(i)})}{{\sum }_{j\in {\mathcal{R}}({t}_{i})}\exp ({{\boldsymbol{\beta }}}^{{\mathsf{T}}}{{\bf{x}}}_{{t}_{i}}^{(j)})},\qquad (10)$$
where t_{1} < t_{2} < … < t_{k} are the ordered times at which the events occur and \({{\bf{x}}}_{{t}_{1}}^{(1)},{{\bf{x}}}_{{t}_{2}}^{(2)},\ldots ,{{\bf{x}}}_{{t}_{k}}^{(k)}\) are the corresponding covariates at those times. Notice that the equality of the superscript of the covariate vector \({{\bf{x}}}_{{t}_{i}}^{(i)}\) (the patient's index) and the subscript of the time t_{i} emphasises that event times are continuous and that at most one patient experiences the event at each time. For the moment, we assume time is continuous, which results in distinct event times. The set \({\mathcal{R}}({t}_{i})\) is the set of indices of the patients that are at risk at time t_{i}. Being at risk means they are alive and can potentially experience the event.
Nonlinear timevarying covariates
One clear limitation of Eq. (10), caused by the definition of the hazard function Eq. (9), is the fact that the exponent of the modulating function \(\exp ({{\boldsymbol{\beta }}}^{{\mathsf{T}}}{\bf{x}})\) is a linear function of x. Hence, higher-order interactions among different dimensions of the covariate vector cannot be captured by this method. To improve the expressiveness of the model, we replace the linear function β^{T}x with a non-linear function realised by a neural network. Let \(\phi (\cdot ;{\boldsymbol{\theta }}):{{\mathbb{R}}}^{d}\to {\mathbb{R}}\) be the function implemented by the neural network and parameterised by θ. The hazard model is then represented as

$$h(t\mid {{\bf{x}}}_{t})={h}_{0}(t)\exp (\phi ({{\bf{x}}}_{t};{\boldsymbol{\theta }})),\qquad (11)$$
where h_{0}(t) is the baseline population-level hazard that is independent of the covariates associated with each patient. Time-varying covariates are transformed to the log-hazard by the function ϕ(⋅; θ). The parameters θ are learned by maximising the partial log-likelihood^{35}. Unlike in the traditional Cox proportional hazards model, where the gradient and Hessian can be computed analytically, we use automatic differentiation here to compute gradients with respect to θ. The non-linear function ϕ(⋅; θ) is implemented as a 2-layer multilayer perceptron—see "Model" for a detailed description. The hazard function Eq. (11) estimates the instantaneous risk of death at each time for each patient. Integrating with respect to time and exponentiating the negated result gives the survival function, defined as

$$S(t\mid {{\bf{x}}}_{0:t})=\exp \left(-\int _{0}^{t}h(s\mid {{\bf{x}}}_{s})\,{\rm{d}}s\right).\qquad (12)$$
Notice that x_{0:t} denotes the set of covariates until time t, meaning that, the probability of survival up to time t depends on the history of the covariates.
The partial likelihood Eq. (10) is accordingly rewritten as

$${\mathcal{L}}({\boldsymbol{\theta }})=\prod _{i=1}^{k}\frac{\exp (\phi ({{\bf{x}}}_{{t}_{i}}^{(i)};{\boldsymbol{\theta }}))}{{\sum }_{j\in {\mathcal{R}}({t}_{i})}\exp (\phi ({{\bf{x}}}_{{t}_{i}}^{(j)};{\boldsymbol{\theta }}))}.\qquad (13)$$
To give an intuition for Eq. (13), observe that the partial log-likelihood, computed by taking the logarithm of the right-hand side of Eq. (13), consists of k terms corresponding to the k observed events. The parameter vector θ is perturbed such that the hazard increases for the covariates of a patient who dies at time t_{i}, while it decreases for the covariates of the patients who remain alive at t_{i}.
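The partial log-likelihood can be sketched directly in PyTorch so that automatic differentiation supplies the gradients. The simplified version below assumes one covariate snapshot per patient (no time-varying risk sets) and distinct event times; the function and variable names are illustrative, not the actual CovEWS implementation.

```python
import torch

def cox_partial_log_lik(phi, times, events):
    """Partial log-likelihood given log-hazard scores phi (one per patient).

    phi:    (N,) outputs of the log-hazard network
    times:  (N,) observed event or censoring times
    events: (N,) 1 if death was observed, 0 if censored
    """
    log_lik = phi.new_zeros(())
    for i in torch.nonzero(events).flatten():
        at_risk = times >= times[i]  # risk set R(t_i): still alive at t_i
        log_lik = log_lik + phi[i] - torch.logsumexp(phi[at_risk], dim=0)
    return log_lik

# Toy example: gradients with respect to phi flow back to network parameters.
phi = torch.randn(5, requires_grad=True)
times = torch.tensor([2.0, 5.0, 3.0, 8.0, 4.0])
events = torch.tensor([1.0, 0.0, 1.0, 0.0, 1.0])
loss = -cox_partial_log_lik(phi, times, events)  # minimise the negative
loss.backward()
```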
Resolving ties
Even though we adopt a continuous-time approach due to the non-normalised parametric form of the hazard function Eq. (9) and the resulting partial likelihood Eq. (13), ties can still occur as we work at hourly resolution. Hence, it is possible that two patients die at the same time. When an event occurs for two patients at the same time, the partial likelihood Eq. (13) is no longer valid. Several methods exist in the literature to break the ties and remove the ambiguity, such as the average partial likelihood^{35} and Breslow's method^{36}, which lie at opposite ends of a spectrum. The former averages over all possible orderings of the events that could break the tie; it is therefore the most accurate method, but computationally prohibitive. The latter gives a partial likelihood almost exactly like the original Cox likelihood by assuming that every ordering of tied events results in the same partial likelihood; this method gives a crude estimate but is easy to implement. The midway approach that we adopted in this work is Efron's tie-breaker^{37}. In this method, a weighted average likelihood of the tied cases is subtracted from the denominator of Eq. (13). Efron's method gives good accuracy and is moderately easy to work with—see ref. ^{37} for details.
Algorithm details
Survival analysis with the Cox model is performed via maximum likelihood estimation, where the aim is to maximise the logarithm of Eq. (10) in the original Cox proportional hazards model and of Eq. (13) in the non-linear extension. In the original method with a linear exponent, both the gradient \(\partial \mathrm{log}\,{\mathcal{L}}/\partial {\boldsymbol{\beta }}\) and the Hessian \({\partial }^{2}\mathrm{log}\,{\mathcal{L}}/{\partial }^{2}{\boldsymbol{\beta }}\) can be computed analytically. This is not the case for our proposed extension Eq. (13) with a non-linear exponent. Instead of an analytical gradient, we use automatic differentiation to compute the gradient \(\partial \mathrm{log}\,{\mathcal{L}}/\partial {\boldsymbol{\theta }}\). Once the gradient is derived, an appropriate gradient-based method is used to perturb θ in the direction that increases the partial likelihood.
As mentioned in "Non-linear time-varying covariates", the linear exponent β^{T}x in Eq. (10) is replaced with a non-linear function ϕ(⋅; θ). We use a neural network with L hidden layers to realise this function. The employed network linearly transforms the input features to an N_{dim}-dimensional hidden layer. The transformed features are passed through a leaky rectified linear unit (LeakyReLU)^{38} non-linear activation function. The hidden activations are then transformed by a linear transformation to a single node, and finally passed through a hyperbolic tangent (\(\tanh\)) activation function. In summary, the network function can be represented as

$$\phi ({\bf{x}};{\boldsymbol{\theta }})=\tanh \left({{\bf{W}}}_{2}\,{\rm{LeakyReLU}}({{\bf{W}}}_{1}{\bf{x}})\right),\qquad (14)$$
where θ = {W_{1}, W_{2}} and W_{i}, i = 1, 2 are the trainable weight matrices of the network (Supplementary Fig. 3). We used Xavier’s method^{39} to initialise the weights θ of the model. To prevent overfitting, we additionally applied dropout with a dropout probability of p_{dropout}. In our PyTorch^{40} implementation of CovEWS, we observed stable convergence of our model using the Adam^{41} optimiser with a learning rate of 0.001 for up to 100 epochs.
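The described architecture can be sketched as a PyTorch module. The hidden size and the placement of dropout below are illustrative assumptions; the actual values were tuned as hyperparameters (Supplementary Table 5).

```python
import torch
import torch.nn as nn

class LogHazardNet(nn.Module):
    """Sketch of phi(.; theta): linear -> LeakyReLU -> linear -> tanh."""

    def __init__(self, d_in, n_dim=32, p_dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, n_dim),
            nn.LeakyReLU(),
            nn.Dropout(p_dropout),   # placement of dropout is an assumption
            nn.Linear(n_dim, 1),
            nn.Tanh(),
        )
        for m in self.net:           # Xavier initialisation, as in the text
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)

    def forward(self, x):
        return self.net(x).squeeze(-1)  # (batch,) log-hazard scores

model = LogHazardNet(d_in=8)
optimiser = torch.optim.Adam(model.parameters(), lr=0.001)
phi = model(torch.randn(16, 8))  # scores bounded in (-1, 1) by the tanh
```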
Hyperparameter optimisation
For the methods trained on the Optum training fold (CovEWS and CovEWS [linear]), we used a systematic approach to hyperparameter optimisation where each prediction algorithm was given a maximum of 15 hyperparameter optimisation runs with different hyperparameter configurations chosen at random without duplicates from predefined ranges (see Supplementary Table 5). Out of the models generated in the hyperparameter optimisation runs, we then selected the model that achieved the highest specificity at greater than 95% sensitivity on the validation set of the Optum cohort.
Postprocessing and calibration
After training CovEWS using the Optum training cohort, the predicted hazard for a patient with state x is the hazard function h(t∣x) (Eq. (11)) evaluated at t = 128 h (≈5.33 days) given the current patient covariates x and under the assumption that the patient covariates stay constant. To produce CovEWS scores, we additionally apply post-processing using a percentile transformation that converts h(t∣x) into the percentile of patient states in the Optum validation set that are assigned a lower h(t∣x) than the evaluated patient state x. We chose to output CovEWS scores in the form of percentiles to aid the clinical interpretation of CovEWS as a risk score relative to a representative set of reference states, and to discourage interpretation as a mortality probability. Interpreting CovEWS scores as a mortality probability is difficult since the mortality risk of a patient depends on their uncertain future trajectory and the prediction horizon, and is influenced by clinical interventions that may be initiated in the future. As shown in Supplementary Fig. 1, patients' states may change rapidly and frequently, and clinical interventions can significantly and positively alter the trajectory of patients. We also verified experimentally that, when interpreted as a probability of mortality, CovEWS scores overestimate the actually observed probability of death on the Optum and TriNetX test sets, since patients' states may improve, due to intervention or otherwise, between the prediction time and the end of the prediction horizon (Supplementary Fig. 4; similar results for CovEWS [linear] in Supplementary Fig. 5). We therefore decided to instead output CovEWS scores as relative risk percentiles between 0 and 100, which discourages interpretation as a probability of mortality. To aid the use of CovEWS, the following section, "Thresholds", outlines calibrated thresholds that can be used to maximise specificity at a desired target level of sensitivity for different prediction horizons.
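The percentile transformation can be sketched as a rank lookup against hazards predicted for the validation set; the reference values below are made up for illustration.

```python
import numpy as np

# Hypothetical reference hazards predicted for validation-set patient states.
reference = np.sort(np.array([0.1, 0.4, 0.7, 1.2, 2.5, 3.0, 4.8, 6.1, 7.7, 9.3]))

def covews_score(hazard):
    """Percentile of reference states assigned a lower hazard than `hazard`."""
    return 100.0 * np.searchsorted(reference, hazard, side="left") / len(reference)

print(covews_score(2.6))  # 50.0: half of the reference states have a lower hazard
```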
Thresholds
A key question for clinical decision making is which threshold should be used for CovEWS scores to indicate severe risk, and potentially trigger an automated warning. To provide guidance in choosing the appropriate CovEWS score threshold depending on the desired trade-off between sensitivity and specificity, we evaluated the optimal observed thresholds of CovEWS scores for various target sensitivity levels using their respective receiver operating characteristic (ROC) curves for each prediction horizon (Supplementary Table 6). Optimal score thresholds to maximise specificity at high levels of sensitivity were between 61 and 36, 44 and 27, and 34 and 19, depending on the prediction horizon, for target sensitivity levels greater than 85%, 90% and 95%, respectively. We note that lower thresholds are necessary to achieve high sensitivity for prediction horizons farther in the future, as patients' deterioration has to be identified earlier in its progression.
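Selecting such an operating point from a ROC curve can be sketched with scikit-learn; the labels and scores below are synthetic stand-ins for the actual evaluation data.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)                # synthetic mortality labels
scores = y_true * 30 + rng.normal(40, 15, 500)  # synthetic risk scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
ok = tpr >= 0.95                 # operating points with sensitivity >= 95% ...
best = np.argmax(1.0 - fpr[ok])  # ... among which we maximise specificity
threshold = thresholds[ok][best]
specificity = 1.0 - fpr[ok][best]
```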
Feature importance
Highlighting the clinical risk factors that positively or negatively influenced CovEWS to output a certain score is of high utility, as it enables clinical users to contextualise CovEWS scores, and, in some cases, these highlights could even point towards opportunities for timely intervention. We utilised the differentiability of our prediction model, as outlined in "Algorithm details", to provide a real-time visualisation of the clinical covariates that are most important to CovEWS at any given time point (see Supplementary Fig. 1 for an example). To compute the importance scores at each time point, we used the Integrated Gradients (IG)^{42} method, which calculates relative importance scores a_{i} ∈ (−100%, 100%) for each input feature x_{t,i} in the feature vector x_{t}, with i ∈ [0..d − 1] where d is the number of input features. We used IG with the mean feature vector \({\bar{{\bf{x}}}}_{t}\) across the Optum training set as a reference, calculated 50 intermediate steps for each explained x_{t}, and normalised a_{i} to the range of (−100%, 100%) by dividing each a_{i} by the sum \({{{\Sigma }}}_{i=0}^{d-1}{a}_{i}\) of all feature attributions for x_{t}. To obtain a timeline of attributions as shown in Supplementary Fig. 1, we calculated attributions a_{i} whenever a change in patient status was recorded in the patient's EHR.
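Integrated Gradients can be sketched directly in PyTorch via a Riemann approximation of the path integral from the baseline to the input. The tiny linear model here is only for illustration; for a linear model the attributions sum exactly to the difference in model outputs, which makes the sketch easy to check.

```python
import torch

def integrated_gradients(model, x, baseline, steps=50):
    """Riemann approximation of Integrated Gradients along a straight path."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    path = baseline + alphas * (x - baseline)  # (steps, d) points on the path
    path.requires_grad_(True)
    model(path).sum().backward()               # gradients at every path point
    avg_grad = path.grad.mean(dim=0)
    return (x - baseline).squeeze(0) * avg_grad

model = torch.nn.Linear(4, 1)                  # toy stand-in for the network
x, baseline = torch.randn(1, 4), torch.zeros(1, 4)
attr = integrated_gradients(model, x, baseline)
# Normalised, signed attribution percentages, as used for the CovEWS timeline
attr_pct = 100.0 * attr / attr.abs().sum()
```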
Baselines
In our analysis, we compared the performance of CovEWS to the following existing generic and COVID19 specific clinical risk scores, and baselines:
CovEWS (linear). A linear time-varying survival Cox model as described in "Time-varying covariates", trained using the same Optum training set and the same pipeline as the non-linear CovEWS. We used the implementation provided in version 0.24.8 of the lifelines^{20} Python package.
COVID-19 estimated risk for fatality (COVER_F). The COVER_F scoring system for COVID-19 as described in ref. ^{19}. Since COVER_F uses a single flag for any heart disease diagnosis, we aggregated all diagnoses in the diagnosis categories ischaemic heart disease, pulmonary embolism, and other heart diseases in our dataset into one single joint diagnosis code if any diagnosis in those three categories was present. All other input features used by COVER_F were direct matches with the input covariates of the same name also used by CovEWS (Supplementary Table 1).
Sequential organ failure assessment (SOFA). SOFA scoring is commonly used in clinical contexts to indicate the risk of organ failure in critical patients. We therefore used SOFA as a generic risk scoring baseline that was not specifically designed for COVID-19, to demonstrate the comparative benefits in predictive performance of a COVID-19 specific risk scoring system. Since we did not have FiO_{2} values available in our EHR datasets, we assumed a default FiO_{2} value of 21% (equal to the fraction of oxygen in inhaled air) for patients that were not intubated, and an average of 71% for patients that were intubated (FiO_{2} is often set to 100% initially and then progressively lowered as the patient stabilises; see, e.g., ref. ^{43} for an example). In addition, we did not have access to Glasgow coma scale (GCS) scores in the EHRs, and potential additional points from a high GCS score (a maximum of +4) were therefore not reflected in our calculated SOFA scores.
Modified early warning score (MEWS). MEWS^{23} is a risk scoring system for patient deterioration used at the bedside. Like SOFA, MEWS was not specifically designed for COVID-19, and therefore serves as a generic risk scoring baseline. Since AVPU (Alert, reacting to Vocal stimuli, reacting to Pain, Unconscious) scores were not available in the EHRs, we assumed a default AVPU state of alert to calculate MEWS scores.
Yan et al. 2020. Yan et al.^{17} derived a simple and interpretable decision rule using three features for mortality prediction in COVID-19 patients from data collected from 485 COVID-19 positive patients seen in Wuhan, China. In their validation cohort, the decision rule showed a respectable cross-validated prediction performance of 96.1 ± 0.03 (mean ± standard deviation, 5-fold cross-validation)^{17}. All input features used by Yan et al.^{17} were direct matches with the input covariates of the same name also used by CovEWS (Supplementary Table 1).
Liang et al. 2020. Liang et al.^{18} developed a prediction model for critical COVID-19 related illness using deep learning and 10 input covariates, including observed X-ray abnormalities, trained on data from 1590 patients seen at 575 medical centres in China. On three external cohorts from different Chinese provinces, they reported predictive performance in terms of the concordance index (c-index) of 0.890, 0.852 and 0.967, respectively, for predicting critical illness under missingness of input covariates. Since we did not have access to radiologic assessments in our EHR datasets, we evaluated their model with the X-ray abnormality covariate missing for all evaluated patients (i.e., set to zero). To the best of our knowledge, Liang et al.^{18} did not specify which comorbidities were included in their collected dataset. However, their study reports a maximum of 6 comorbidities diagnosed in one patient. In our evaluation, we counted existing patient diagnoses of pulmonary embolism, kidney disease, inflammatory bowel disease, asthma, rheumatoid arthritis, and diabetes towards these 6 comorbidities.
Performance evaluation
In addition to the results presented in the main body of this work, we also present receiver operating characteristic (ROC) curves for CovEWS for various prediction horizons between 1 and 192 h, evaluated on the held-out Optum test set (Supplementary Fig. 7) and the external TriNetX test set (Supplementary Fig. 8); the same ROC curves for CovEWS (linear) (Supplementary Figs. 9 and 10); and a comparison of CovEWS, time-varying Cox^{20}, COVER_F^{19}, SOFA^{14}, MEWS^{23}, Yan et al.^{17}, and Liang et al.^{18} at various prediction horizons in terms of AUC, AUPR, F_{1}, sensitivity, specificity and specificity at greater than 95% sensitivity (Spec.@95%Sens.) for predicting COVID-19 related mortality on the held-out Optum test set (Supplementary Data 1), on the external TriNetX test set (Supplementary Data 2), on the Optum Future cohort (Supplementary Data 3), and on the Black or African American (Supplementary Data 4), Hispanic (Supplementary Data 5), Asian (Supplementary Data 6), Caucasian (Supplementary Data 7), non-hospitalised (Supplementary Data 8), fibrin D-dimer (Supplementary Data 9), hs-CRP (Supplementary Data 10), gamma-glutamyl transferase (Supplementary Data 11), IL-6 (Supplementary Data 12), fewer than 6 missing covariates (Supplementary Data 13) and fewer than 9 missing covariates (Supplementary Data 14) subgroups of the Optum test set.
Software
The source code used for developing CovEWS and for conducting the presented experiments and analyses was implemented using Python (version 3.7), scikit-learn (version 0.22.2), numpy (version 1.19.1), scipy (version 1.4.1), pandas (version 1.5.0), PyTorch (version 1.5.1) and lifelines (version 0.24.8). Plots shown were generated using the ggplot2 R package (version 3.3.1), the survival R package (version 3.1-12) and survminer (version 0.4.7).
Hardware
We used the highperformance computing (HPC) infrastructure provided by the Personalised Healthcare Informatics group at F. HoffmannLa Roche Ltd to run the presented experiments. The compute nodes used 1st and 2nd generation Intel Xeon Platinum 8000 series processors and had access to 72 GB random access memory (RAM) each.
Data availability
The data that support the findings of this study, the TriNetX COVID-19 research and Optum de-identified COVID-19 electronic health record databases, are available from TriNetX, LLC and Optum, Inc. (https://www.optum.com/), but third-party restrictions apply to the availability of these data. The data were used under license for this study with restrictions that do not allow the data to be redistributed or made publicly available. However, for accredited researchers, the TriNetX COVID-19 research and Optum de-identified COVID-19 electronic health record databases are available for licensing from TriNetX, LLC and Optum, Inc., respectively. Data access may require a data sharing agreement and may incur data access fees.
Code availability
The code for CovEWS is available at https://github.com/d909b/CovEWS under the MIT license https://opensource.org/licenses/MIT^{45}.
References
World Health Organization. Coronavirus disease (COVID-2019) situation reports. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports (2020).
Yan, Y., Chang, L. & Wang, L. Laboratory testing of SARS-CoV, MERS-CoV, and SARS-CoV-2 (2019-nCoV): Current status, challenges, and countermeasures. Rev. Med. Virol. 30, e2106 (2020).
Lurie, N., Saville, M., Hatchett, R. & Halton, J. Developing Covid-19 vaccines at pandemic speed. New Engl. J. Med. 382, 1969–1973 (2020).
Hellewell, J. et al. Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts. Lancet Global Health 8, e488–e496 (2020).
Emanuel, E. J. et al. Fair allocation of scarce medical resources in the time of Covid-19. New Engl. J. Med. 382, 2049–2055 (2020).
Fauci, A. S., Lane, H. C. & Redfield, R. R. Covid-19—navigating the uncharted. New Engl. J. Med. 382, 1268–1269 (2020).
Henry, K. E., Hager, D. N., Pronovost, P. J. & Saria, S. A targeted realtime early warning score (TREWScore) for septic shock. Sci. Transl. Med. 7, 299ra122–299ra122 (2015).
Wang, C. J., Ng, C. Y. & Brook, R. H. Response to COVID-19 in Taiwan: big data analytics, new technology, and proactive testing. JAMA 323, 1341–1342 (2020).
Green, M. et al. Comparison of the Between the Flags calling criteria to the MEWS, NEWS and the electronic Cardiac Arrest Risk Triage (eCART) score for the identification of deteriorating ward patients. Resuscitation 123, 86–91 (2018).
Churpek, M. M., Snyder, A., Sokol, S., Pettit, N. N. & Edelson, D. P. Investigating the impact of different suspicion of infection criteria on the accuracy of qSOFA, SIRS, and early warning scores. Critical Care Med. 45, 1805 (2017).
Petrilli, C. M. et al. Factors associated with hospital admission and critical illness among 5279 people with coronavirus disease 2019 in New York City: prospective cohort study. BMJ 369, m1966 https://doi.org/10.1136/bmj.m1966 (2020).
Ji, D. et al. Prediction for progression risk in patients with COVID-19 pneumonia: The CALL Score. Clin. Infect. Dis. 71, 1393–1399 (2020).
Wynants, L. et al. Prediction models for diagnosis and prognosis of COVID-19 infection: systematic review and critical appraisal. BMJ 369, m1328 https://doi.org/10.1136/bmj.m1328 (2020).
Vincent, J.-L. et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. Intensive Care Med. 22, 707–710 (1996).
Zhang, Z., Reinikainen, J., Adeleke, K. A., Pieterse, M. E. & Groothuis-Oudshoorn, C. G. Time-varying covariates and coefficients in Cox regression models. Ann. Transl. Med. 6, 121 (2018).
Ching, T., Zhu, X. & Garmire, L. X. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput. Biol. 14, e1006076 (2018).
Yan, L. et al. An interpretable mortality prediction model for COVID-19 patients. Nat. Mach. Intell. 2, 283–288 (2020).
Liang, W. et al. Early triage of critically ill COVID-19 patients using deep learning. Nat. Commun. 11, 3543 (2020).
Williams, R. D. et al. Seek COVER: development and validation of a personalized risk calculator for COVID-19 outcomes in an international network. medRxiv https://doi.org/10.1101/2020.05.26.20112649 (2020).
Davidson-Pilon, C. et al. CamDavidsonPilon/lifelines: v0.24.16. https://doi.org/10.5281/zenodo.3937749 (2020).
Horby, P. et al. Effect of hydroxychloroquine in hospitalized patients with COVID-19: preliminary results from a multicentre, randomized, controlled trial. medRxiv https://doi.org/10.1101/2020.07.15.20151852 (2020).
The RECOVERY Collaborative Group. Dexamethasone in hospitalized patients with COVID-19—preliminary report. New Engl. J. Med. https://doi.org/10.1056/NEJMoa2021436 (2020).
Subbe, C. P., Kruger, M., Rutherford, P. & Gemmel, L. Validation of a modified early warning score in medical admissions. QJM 94, 521–526 (2001).
Van Calster, B., McLernon, D. J., Van Smeden, M., Wynants, L. & Steyerberg, E. W. Calibration: the Achilles heel of predictive analytics. BMC Med. 17, 1–7 (2019).
Fuchs, L. et al. Quantifying the mortality impact of donotresuscitate orders in the ICU. Crit. Care Med. 45, 1019 (2017).
White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: Issues and guidance for practice. Stat. Med. 30, 377–399 (2011).
Gerry, S. et al. Early warning scores for detecting deterioration in adult hospital patients: Systematic review and critical appraisal of methodology. BMJ 369, m1501 https://doi.org/10.1136/bmj.m1501 (2020).
Danziger, J. et al. Temporal trends in critical care outcomes in US minority-serving hospitals. Am. J. Respir. Crit. Care Med. 201, 681–687 (2020).
Rush, B., Danziger, J., Walley, K. R., Kumar, A. & Celi, L. A. Treatment in disproportionately minority hospitals is associated with increased risk of mortality in sepsis: a national analysis. Crit. Care Med. 48, 962–967 (2020).
Centers for Disease Control and Prevention. COVID-19 coding guidelines effective February 20, 2020. https://www.cdc.gov/nchs/data/icd/ICD-10-CM-Official-Coding-Gudance-Interim-Advice-coronavirus-feb-20-2020.pdf (2020).
Rizzo, S. et al. Descriptive epidemiology of 16,780 hospitalized COVID-19 patients in the United States. medRxiv https://doi.org/10.1101/2020.07.17.20156265 (2020).
Zheng, Z. et al. Risk factors of critical & mortal COVID-19 cases: a systematic literature review and meta-analysis. J. Infect. 81, e16–e25 (2020).
Zhou, F. et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 395, 1054–1062 (2020).
Richardson, S. et al. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID19 in the New York City area. JAMA 323, 2052–2059 (2020).
Cox, D. R. Regression models and life-tables. J. Royal Statist. Soc. 34, 187–202 (1972).
Breslow, N. E. Analysis of survival data under the proportional hazards model. Int. Statist. Rev. 43, 45–57 (1975).
Efron, B. The efficiency of Cox’s likelihood function for censored data. J. Am. Statist. Assoc. 72, 557–565 (1977).
Xu, B., Wang, N., Chen, T. & Li, M. Empirical evaluation of rectified activations in convolutional network. Preprint at https://arxiv.org/abs/1505.00853 (2015).
Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics. 249–256 http://proceedings.mlr.press/v9/glorot10a.html (JMLR Workshop and Conference Proceedings, 2010).
Paszke, A. et al. Automatic differentiation in PyTorch. In Advances in Neural Information Processing Systems Autodiff Workshop https://openreview.net/forum?id=BJJsrmfCZ (2017).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations https://openreview.net/forum?id=8gmWwjFyLj (2015).
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In International Conference on Machine Learning. 3319–3328 (2017).
Rachmale, S., Li, G., Wilson, G., Malinchoc, M. & Gajic, O. Practice of excessive FIO2 and effect on pulmonary outcomes in mechanically ventilated patients with acute lung injury. Respirat. Care 57, 1887–1893 (2012).
Therneau, T. M. A Package for Survival Analysis in R. R package version 3.1-12. https://cran.r-project.org/package=survival (2020).
Schwab, P. et al. Real-time Prediction of COVID-19 related Mortality using Electronic Health Records, d909b/CovEWS: v1.0.0. https://doi.org/10.5281/zenodo.4294874 (2020).
Acknowledgements
S.P. is supported by the Swiss National Science Foundation under P2BSP2_184359. L.A.C. is funded by the National Institute of Health through NIBIB R01 EB017205. B.S. is a member of the excellence cluster “Machine Learning in the Sciences” funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy, EXC number 2064/1, project number 390727645. We thank Annika Buchholz for helpful discussions.
Author information
Authors and Affiliations
Contributions
A.M. and P.S. created the new software used in the work. P.S., A.M., S.P., L.A.C., J.H., M.H., S.B. were involved in the conception and design of the work, and the analysis and interpretation of results. J.H., L.A.C., and M.H. provided the clinical motivation and interpretation. All authors were involved in reviewing, drafting and/or editing of the manuscript. P.S. and S.B. supervised the work.
Corresponding author
Ethics declarations
Competing interests
P.S. is an employee and shareholder of F. Hoffmann-La Roche Ltd. The remaining authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks Tadahiro Goto and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Schwab, P., Mehrjou, A., Parbhoo, S. et al. Real-time prediction of COVID-19 related mortality using electronic health records. Nat. Commun. 12, 1058 (2021). https://doi.org/10.1038/s41467-020-20816-7
This article is cited by
Cell specific peripheral immune responses predict survival in critical COVID-19 patients. Nature Communications (2022)
Predicting health crises from early warning signs in patient medical records. Scientific Reports (2022)
An ML prediction model based on clinical parameters and automated CT scan features for COVID-19 patients. Scientific Reports (2022)
iCOVID: interpretable deep learning framework for early recovery-time prediction of COVID-19 patients. npj Digital Medicine (2021)
Applications of laboratory findings in the prevention, diagnosis, treatment, and monitoring of COVID-19. Signal Transduction and Targeted Therapy (2021)