External validation of the 4C mortality score among COVID-19 patients admitted to hospital in Ontario, Canada: a retrospective study

Risk prediction scores are important tools to support clinical decision-making for patients with coronavirus disease (COVID-19). The objective of this paper was to validate the 4C mortality score, originally developed in the United Kingdom, for a Canadian population, and to examine its performance over time. We conducted an external validation study within a registry of COVID-19 positive hospital admissions in the Kitchener-Waterloo and Hamilton regions of southern Ontario between March 4, 2020 and June 13, 2021. We examined the validity of the 4C score to prognosticate in-hospital mortality using the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals calculated via bootstrapping. The study included 959 individuals, of whom 224 (23.4%) died in-hospital. Median age was 72 years and 524 individuals (55%) were male. The AUC of the 4C score was 0.77, 95% confidence interval 0.79–0.87. Overall mortality rates across the pre-defined risk groups were 0% (Low), 8.0% (Intermediate), 27.2% (High), and 54.2% (Very High). Wave 1, 2 and 3 values of the AUC were 0.81 (0.76, 0.86), 0.74 (0.69, 0.80), and 0.76 (0.69, 0.83) respectively. The 4C score is a valid tool to prognosticate mortality from COVID-19 in Canadian hospitals and can be used to prioritize care and resources for patients at greatest risk of death.

There has been significant variation in available treatments, vaccine uptake, and dominant viral lineages over the course of the pandemic.

Methods
Study design and setting. We conducted a retrospective validation study using records from the McMaster Multi-Regional Hospital Coronavirus Registry (COREG). COREG is a multi-center data registry collecting information on positive COVID-19 cases in the Kitchener-Waterloo and Hamilton regions of southern Ontario, Canada. The registry includes COVID-19-related emergency department (ED) visits and hospital admissions from six tertiary hospitals across the regions, including three academic and three community centres. The total number of inpatient beds across all sites is approximately 2400.

Participants. We selected all patients admitted to one of the participating hospitals with a positive SARS-CoV-2 PCR nasal swab test between March 4, 2020 and June 13, 2021. We set the end date of the inclusion window to four weeks before the date of final data extraction to allow for at least a 28-day length of stay for all participants. All inpatients were tested for COVID-19; therefore, asymptomatic patients admitted for other reasons were included. Elective procedures, however, required a negative COVID-19 test and were therefore not included. If a patient was transferred between study sites or had two separate admissions, only the latter admission was retained. For temporal analysis, patients were separated into waves based on their admission date.
Measures. Data in COREG were manually abstracted from electronic medical records by trained abstractors using a modified case report form published by ISARIC and the World Health Organization. Data were regularly audited to ensure accuracy. We utilized demographic and clinical data at presentation, typically from the emergency department. For patients who were directly admitted, we utilized the first day of inpatient records.
Supplementary Table S1 includes additional information on data collection procedures.
4C mortality score. The 4C mortality score was derived and validated within the ISARIC World Health Organization Clinical Characterisation Protocol UK study 7 . The score was derived from a population of over 35,000 hospital inpatients, and validation on over 22,000 inpatient records indicated good discrimination (area under the receiver operating characteristic curve (AUC) = 0.77) 6 . The 4C score incorporates age, sex, comorbidities, respiratory rate, peripheral oxygen saturation, Glasgow Coma Scale, blood urea nitrogen, and C-reactive protein. We adapted the score to match our available data, as the Glasgow Coma Scale was not collected at presentation. We replaced this risk factor with the documented presence of altered consciousness or confusion. The 4C score ranges from 0 to 21, with risk groups defined as Low (0-3), Intermediate (4-8), High (9-14), and Very high (≥ 15).
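The risk-group boundaries above can be written as a small lookup. The following is a minimal Python sketch, not the authors' code (the analysis in this study was done in R); the function name `risk_group` is a hypothetical choice for illustration, and the function only classifies an already-computed total score.

```python
def risk_group(score: int) -> str:
    """Map a total 4C mortality score (0-21) to its pre-defined risk group.

    Boundaries follow the text: Low (0-3), Intermediate (4-8),
    High (9-14), Very high (>= 15).
    """
    if not 0 <= score <= 21:
        raise ValueError("4C score must be between 0 and 21")
    if score <= 3:
        return "Low"
    if score <= 8:
        return "Intermediate"
    if score <= 14:
        return "High"
    return "Very high"


# Classify the scores that sit on each side of every boundary.
for s in (0, 3, 4, 8, 9, 14, 15, 21):
    print(s, risk_group(s))
```

Computing the total score itself requires the point values assigned to each component, which are not reproduced in this section.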
Outcome. The primary outcome was all-cause in-hospital mortality. Patients who received a palliative discharge were included. As was done in the initial derivation paper, we did not place a limit on time until death.
Statistical analysis. We presented a demographic and clinical profile of study participants, stratified by outcome, and reported the prevalence and missingness of each element of the 4C score. Missing data were handled by multiple imputation by chained equations with predictive mean matching, using 20 imputations and 50 iterations per imputation. The imputation model included all variables in the 4C score, site ID, and outcome. We plotted the proportion of patients who died in hospital by 4C score and risk group and compared them to the initial derivation work in the UK.
We validated the 4C score using the AUC, with 95% confidence intervals calculated via bootstrapping with 2000 resamples. For comparative purposes, we also calculated the AUC using only the age components of the 4C score, as age has consistently been shown to be among the strongest predictors of COVID-19 related mortality 8 and can be easily collected and used to triage patients without electronic aid. We calculated all AUCs overall and by wave. Additionally, we calculated diagnostic accuracy measures for cut-offs at scores of 3, 8, 11, and 14. These cut-offs align with the predefined risk groups, with an additional cut-off within the "High" category as half of our study participants were classified within this group. Finally, we constructed a calibration plot using bootstrapped predicted probabilities. All analysis was done in R 4.0.3 9 .
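To illustrate the general idea of an AUC with a bootstrapped confidence interval, the sketch below uses only the Python standard library. It is not the authors' R code. It assumes a simple percentile bootstrap with unstratified resampling, which the text does not specify, and the ten scores and outcomes are invented toy values rather than study data. The names `auc` and `bootstrap_ci` are hypothetical.

```python
import random


def auc(scores, died):
    """AUC as the probability that a randomly chosen patient who died has a
    higher score than a randomly chosen survivor (ties count one half)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, died, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        d = [died[i] for i in idx]
        if 0 < sum(d) < n:  # skip resamples containing only one outcome class
            estimates.append(auc([scores[i] for i in idx], d))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented toy data: ten 4C scores and in-hospital death indicators (1 = died).
toy_scores = [2, 5, 7, 9, 10, 11, 12, 13, 15, 17]
toy_died = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

point = auc(toy_scores, toy_died)
lower, upper = bootstrap_ci(toy_scores, toy_died)
print(f"AUC = {point:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

The pairwise comparison is quadratic in sample size, which is fine for a toy example; a rank-based calculation would be preferable for a cohort of several hundred patients resampled 2000 times.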
Ethics approval. Our study received ethics approval from the Tri-hospital Research Ethics Board and the Hamilton Integrated Research Ethics Board, who waived the requirement for informed consent as the data for this study was retrospectively collected from hospital medical records. All methods were performed in accordance with relevant guidelines and regulations.
Diagnostic accuracy measures for cut-offs at 3, 8, 11, and 14 can be found in Table 4. The sensitivity of each cut-off ranged from 100% for > 3 to 28.3% for > 14 while the specificities ranged from 10.2% for > 3 to 92.7% for > 14. The calibration plot (Supplementary Figure S3) indicated good calibration with significant deviations only occurring at very high predicted probabilities where data was scarce.
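As a worked check, likelihood ratios can be derived from the sensitivities and specificities reported above using the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below is illustrative Python rather than the authors' code, and the function name `likelihood_ratios` is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0 < specificity < 1):
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity and specificity reported in the text for the lowest and
# highest cut-offs.
for label, sens, spec in [("> 3", 1.000, 0.102), ("> 14", 0.283, 0.927)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

With these inputs the negative likelihood ratio at the > 3 cut-off is 0 and the positive likelihood ratio at the > 14 cut-off is just under 4.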

Interpretation
We found that the 4C mortality score is a valid tool to prognosticate mortality among COVID-19 patients admitted to hospital in a Canadian population. We observed an overall AUC of 0.77, which is identical to the initial derivation research 6 . The AUC values across waves 1, 2, and 3 were 0.81, 0.74 and 0.76, respectively. The AUC of the 4C score was higher than that of an age-only model across each of the three waves.
The 4C score has been validated in jurisdictions outside of the United Kingdom, including Canada 10,11 . Our study is confirmatory of these findings and additionally contributes to the literature by conducting an analysis by wave to investigate changes in the performance of the score over time. Changes over time in treatment practices (e.g. use of steroids), vaccine distribution, and the dominant strain of SARS-CoV-2 could all potentially impact the predictive ability of a mortality risk score. The 4C score was derived during the initial phases of the pandemic, before variants of concern emerged, before steroids were demonstrated effective against severe disease 12 , and while vaccines were still in early trials. While these conditions are similar to wave 1 in Ontario, by the peak of wave 2 vaccine distribution was underway to individuals most at risk 13 and the alpha variant (B.1.1.7) was spreading 14 .
Table 2. Components of the 4C mortality score, missingness, and post-imputation prevalence. Adapted from https://www.bmj.com/content/370/bmj.m3339. a Includes chronic lung disease (excluding asthma), chronic cardiac disease, diabetes, chronic liver disease, chronic kidney disease, chronic neurological disease, cancer, obesity (as defined by staff), rheumatological disease, dementia, and HIV/AIDS.
Overall, the predictive ability of the 4C score was maintained across the three waves, which is a promising indication given that COVID-19 variants are likely to continue to emerge 15 . We observed a drop in the point estimate of the discriminative ability of the 4C score in the later waves, although the differences were not statistically significant. Both the 4C score and age-only model had higher AUC values for wave 3 than wave 2, which may be due to the targeting of early vaccine distribution to older individuals, which would have muted the effect of age on mortality. Future research should examine if the score continues to be predictive as vaccine uptake reaches herd immunity levels.
The proliferation of COVID-19 risk models is evidence of the demand for an accurate, accessible, and generalizable tool 16 , and our research adds to the body of evidence supporting use of the 4C score. Our examination of cut-offs within the 4C score suggests that in practice the score may ultimately be most useful in identifying individuals at particularly low risk of death. The lower two cut-offs (3 and 8) demonstrated negative likelihood ratios of 0 and 0.2, while positive likelihood ratios for any cut-off never exceeded 4. Automated calculation of the 4C score in electronic medical records could be used to guide resource management and support clinical decision making such as treatment initiation and admission to ICU.

Limitations. A key limitation of our study is that we were only able to include data from two regions within southern Ontario. While similarities between health systems across Canada suggest our findings are likely to generalize to other Canadian provinces and territories, our results may not generalize to geographically remote settings or to jurisdictions with substantially different health systems. Also, certain variables had high levels of missingness. Although we used multiple imputation per best practice 17 , model performance may nevertheless have been adversely affected by data missing not at random. Additionally, we were not able to collect and report data on race and ethnicity. Finally, the scores were calculated retrospectively and not in real time.

Conclusion
The 4C mortality score is a valid prognostic tool for use in Canadian hospitals. It can be used to identify and prioritize care for COVID-19 patients at high risk of death.

Data availability
The data used in this study can be accessed for research purposes by submitting a request through the COREG data access portal (https://www.coregontario.ca/info-data-access).
Code availability