arising from Yan et al. Nature Machine Intelligence https://doi.org/10.1038/s42256-020-0180-7 (2020)

Typical deployment of an artificial intelligence or machine learning algorithm begins with model training, followed by testing under varying circumstances to enable model adaptation and verification of the model’s performance. Despite the urgency that the coronavirus disease 2019 (COVID-19) pandemic has posed for deploying predictive models, proper validation is critical before any claims of utility or generalizability are made1. In a recent publication, Yan et al.2 claimed to have developed a novel simplified mortality prediction model that can be applied in clinical practice. We validated their proposed decision tree against a large database of patients with COVID-193. Yan et al. attempted to determine a subset of biomarkers from blood samples taken throughout a patient’s hospital course that could be used to predict mortality. The authors framed the problem as a classification task, in which the inputs were the results of the last set of laboratory tests taken from patients of variable severity and the outcomes were either discharge or death. After generating the algorithm, the authors assessed the feature importance of each parameter, which yielded three key features: lactate dehydrogenase (LDH), lymphocyte proportion and high-sensitivity C-reactive protein (hs-CRP). The simplicity of the three-branch model based on these three values is enticing for rapid, wide-scale adoption in clinical practice. Although the authors reported success in their attempt to determine markers of imminent mortality in their dataset, they used a small validation sample and did not externally validate their model3. We have attempted, under various use cases, to validate their model using data from patients with COVID-19 treated in Northwell Health hospitals. We also attempted to recalibrate the primary branch of the proposed tree-based model using our data, to account for differences between our populations and health systems1.
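For reference, the proposed three-branch rule is simple enough to state as a short function. A minimal sketch follows; the LDH cut-off of 365 U l−1 is discussed below, while the hs-CRP (41.2 mg l−1) and lymphocyte (14.7%) cut-offs are those reported by Yan et al. This is our paraphrase of the published rule, not the authors' code.

```python
def yan_rule(ldh, hs_crp, lymph_pct):
    """Paraphrase of the Yan et al. three-branch mortality rule.

    Thresholds (LDH 365 U/l, hs-CRP 41.2 mg/l, lymphocytes 14.7%)
    are as reported in the original paper. Returns True when the
    rule predicts death, False when it predicts survival.
    """
    if ldh > 365:             # primary branch: high LDH -> death
        return True
    if hs_crp < 41.2:         # low hs-CRP -> survival
        return False
    return lymph_pct <= 14.7  # low lymphocyte proportion -> death
```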

Model performance as a triage tool

The model is purported to be applicable to any blood sample, well in advance of the primary clinical outcome, thereby suggesting its use as an admission triage tool. We tested the performance of their model on their validation data at the first time point at which all three blood tests had been collected, similar to how physicians would risk-stratify new patients (Fig. 1a). This critical time point was not evaluated in the original paper, an important omission given that the authors suggest their model can be used to prioritize care. Predictive clinical models are used prospectively, when the time of outcome is unknown; this model, by contrast, retrospectively utilizes data selected on the basis of the known date of outcome. Because a clinician cannot know when discharge or death will occur, the fact that the performance of the proposed model improves closer to the time of outcome is not clinically useful. It is therefore important to show that the model has sufficient performance at admission to justify changes to clinical care (as the authors suggest).
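Concretely, this admission-style evaluation can be simulated by taking, for each patient, the earliest draw at which all three values are present. The sketch below assumes a long-format laboratory table with hypothetical column names; replacing .first() with .last() yields the final-values variant used later.

```python
import pandas as pd

# Hypothetical long-format table: one row per patient per draw,
# with NaN where a test was not performed on that draw.
labs = pd.read_csv("labs.csv", parse_dates=["drawn_at"])
features = ["ldh", "hs_crp", "lymph_pct"]

# Keep draws where all three tests were resulted together, then
# take the earliest such draw per patient (admission-style input).
complete = labs.dropna(subset=features)
first_complete = (complete.sort_values("drawn_at")
                          .groupby("patient_id", as_index=False)
                          .first())
```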

Fig. 1: Performance of decision rule.

a–c, Performance of the decision rule in three settings: first values from the Yan et al. validation data (a), and first values (b) and last values (c) from the Northwell patient data (n = 1,038). T, true; F, false.

Our analysis was performed in Python and R, using code available at https://github.com/siabolourani/YIN_reply. The precision for predicting mortality was 0.48, meaning that over half of the patients whom the model predicted would die actually survived. The accuracy was 0.88 and the F1 score was 0.41. In interpreting these results, one must account for the imbalanced data: the validation set had a survival rate of 0.88, meaning that a null model that always predicts survival achieves a similar accuracy to the proposed full model.
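These summary metrics follow directly from the rule's predictions; a minimal scikit-learn sketch, with placeholder arrays standing in for the validation labels and predictions, illustrates the null-model comparison:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Placeholder labels (1 = died, 0 = survived) and rule outputs;
# substitute the real validation arrays to reproduce our numbers
# (precision 0.48, accuracy 0.88, F1 score 0.41).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
y_pred = np.array([0, 0, 0, 0, 1, 0, 0, 1, 1, 0])

print("precision:", precision_score(y_true, y_pred))
print("accuracy:", accuracy_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))

# Null model: always predict survival. Its accuracy equals the
# cohort survival rate, so with 88% survivors it scores ~0.88,
# on par with the full model despite predicting no deaths at all.
print("null accuracy:", accuracy_score(y_true, np.zeros_like(y_true)))
```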

Model performance on external data

To test the clinical portability of the mortality prediction model, we validated it externally using the Northwell Health electronic health record database. Northwell Health is the largest academic health system in New York, comprising 12 acute care hospitals that serve ~11 million people in the North American epicentre of the COVID-19 pandemic4. The data used for this validation were collected from the enterprise electronic health record (Sunrise Clinical Manager; Allscripts, Chicago, IL) and included patients who had COVID-19 and were discharged from Northwell hospitals between 1 March and 31 May 2020. All patients with a final outcome (death or discharge alive) and with LDH, hs-CRP and lymphocyte values measured at least once during their hospitalization were considered. Of a total of 13,106 patients, 1,038 were thus included for the validation of the model.
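As an illustration, the inclusion criteria reduce to requiring a recorded final outcome and at least one result for each biomarker. The sketch below assumes a patient-level table with hypothetical column names, not our actual extraction code:

```python
import pandas as pd

# Hypothetical admission-level table: one row per patient, with a
# final outcome and per-biomarker result counts for the stay.
patients = pd.read_csv("admissions.csv")

eligible = patients[
    patients["outcome"].isin(["died", "discharged_alive"])
    & (patients["n_ldh"] > 0)
    & (patients["n_hs_crp"] > 0)
    & (patients["n_lymph_pct"] > 0)
]
# In our data, this filter kept 1,038 of 13,106 patients.
```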

We initially tested the model performance using the first time point at which all three laboratory values were available (Fig. 1b). Simulating the operation of the model at this initial triage point, we found a precision of 0.40 for death (F1 score, 0.56) and an overall accuracy of 0.48.

The model’s accuracy is reported to increase as laboratory values are drawn closer to the patient’s outcome. As stated above, a clinical model that is contingent on knowing the date of outcome in advance is of dubious use. Nevertheless, we externally validated the model using the final (pre-death or pre-discharge) laboratory values in our dataset. The precision for death remained low at 0.41, with an overall model accuracy of 0.50 (Fig. 1c).

Recalibrated model performance on external data

LDH alone was the primary driver of the decision tree, and an LDH value of >365 U l−1 led to a terminal node accounting for 93.0% (146/157) of the true positive predictions of mortality in their dataset. It could therefore be argued that LDH alone is a sufficiently robust mortality predictor and naturally lends itself to use as a triage tool. To test this hypothesis, we included, from all of our patients (n = 13,106), those with at least one LDH value from the emergency department (n = 3,595). With the proposed threshold of LDH > 365 U l−1, the precision for mortality was 0.34 (Fig. 2). We then varied the LDH threshold and found that the maximal precision achieved for this branch was only 0.54, revealing its lack of prognostic utility at our institution as part of an admission mortality prediction model.
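The sweep underlying Fig. 2 amounts to computing the positive predictive value of the single-split rule at each candidate cut-off. A sketch follows, with synthetic stand-ins for the n = 3,595 emergency department LDH values and outcomes:

```python
import numpy as np

# Synthetic stand-ins; substitute the first emergency department
# LDH value (U/l) and outcome (1 = died) for each of 3,595 patients.
rng = np.random.default_rng(0)
ldh = rng.normal(350.0, 120.0, 3595).clip(min=50.0)
died = rng.random(3595) < 0.2

thresholds = np.arange(100, 1000, 5)
ppv = np.full(thresholds.shape, np.nan)
for i, t in enumerate(thresholds):
    flagged = ldh > t                  # rule predicts death above t
    if flagged.any():
        ppv[i] = died[flagged].mean()  # precision (PPV) at this cut-off

best = thresholds[np.nanargmax(ppv)]
print(f"max PPV {np.nanmax(ppv):.2f} at LDH > {best} U/l")
# On the Northwell data the maximum was only 0.54; at the proposed
# 365 U/l cut-off the PPV was 0.34.
```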

Fig. 2: LDH as a mortality predictor.

Histogram of emergency department LDH values, and precision for mortality of an LDH-threshold rule at different cut-offs, based on Northwell COVID-19 patient data (n = 3,595). The dashed line indicates the Yan et al. LDH threshold of 365 U l−1. PPV, positive predictive value.

Conclusion

An interpretable mortality prediction model for patients with COVID-19 is a worthwhile pursuit to help inform clinicians in the battle against this pandemic. We have shown that the recently published model of Yan et al. does not perform adequately as a triage tool on the internal validation dataset provided by the original authors. Furthermore, we have demonstrated that the decision algorithm was not portable to our large external validation dataset, with either unmodified or optimized parameters. These results demonstrate the importance of externally validating this model before its widespread adoption in clinical practice, especially given the rapid and wide dissemination of the model post-publication5. Our findings, consistent with other studies6, confirm that the proposed model cannot be recommended for routine clinical implementation.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.