  • Letter

Prediction of mortality from 12-lead electrocardiogram voltage data using a deep neural network

Abstract

The electrocardiogram (ECG) is a widely used medical test, consisting of voltage versus time traces collected from surface recordings over the heart (ref. 1). Here we hypothesized that a deep neural network (DNN) can predict an important future clinical event, 1-year all-cause mortality, from ECG voltage–time traces. By using ECGs collected over a 34-year period in a large regional health system, we trained a DNN with 1,169,662 12-lead resting ECGs obtained from 253,397 patients, in which 99,371 events occurred. The model achieved an area under the curve (AUC) of 0.88 on a held-out test set of 168,914 patients, in which 14,207 events occurred. Even within the large subset of patients (n = 45,285) with ECGs interpreted as ‘normal’ by a physician, the performance of the model in predicting 1-year mortality remained high (AUC = 0.85). A blinded survey of cardiologists demonstrated that many of the discriminating features of these normal ECGs were not apparent to expert reviewers. Finally, a Cox proportional-hazard model revealed a hazard ratio of 9.5 (P < 0.005) for the two predicted groups (dead versus alive 1 year after ECG) over a 25-year follow-up period. These results show that deep learning can add substantial prognostic information to the interpretation of 12-lead resting ECGs, even in cases that are interpreted as normal by physicians.
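
As a minimal illustration of how the reported quantities are defined, the sketch below computes an AUC for 1-year mortality and a Cox proportional-hazard ratio between two predicted groups using scikit-learn and lifelines. The array names and the 0.5 risk cut-off are assumptions for illustration, not details of the authors' pipeline.

```python
# Minimal evaluation sketch (assumed inputs, not the authors' implementation).
import pandas as pd
from sklearn.metrics import roc_auc_score
from lifelines import CoxPHFitter

def evaluate(pred_risk, died_within_1yr, followup_years, died_during_followup):
    """pred_risk: model probabilities; died_within_1yr: binary 1-year labels;
    followup_years / died_during_followup: long-term survival data."""
    auc = roc_auc_score(died_within_1yr, pred_risk)

    df = pd.DataFrame({
        "predicted_high_risk": (pred_risk >= 0.5).astype(int),  # illustrative cut-off
        "time": followup_years,          # follow-up time in years
        "event": died_during_followup,   # death observed during follow-up
    })
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    hazard_ratio = cph.hazard_ratios_["predicted_high_risk"]  # analogous to the reported HR
    return auc, hazard_ratio
```

Fig. 2 suggests that the study defined the two predicted groups via an operating point chosen on the ROC curve rather than the arbitrary 0.5 cut-off used here.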

Fig. 1: Summary of model performance as area under the receiver operating characteristic curve for predicting 1-year mortality.
Fig. 2: Receiver operating characteristic curves indicating the optimal operating point and corresponding Kaplan–Meier survival curves.
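
The following is a hedged sketch of how a Fig. 2-style analysis could be assembled: pick an operating point on the ROC curve and compare Kaplan–Meier survival curves for the two predicted groups. Youden's J statistic is assumed here as the selection criterion, since the paper's exact criterion is not stated in this excerpt; all function and variable names are illustrative.

```python
# Sketch under stated assumptions; not the authors' code.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
from lifelines import KaplanMeierFitter

def operating_point_and_km(pred_risk, died_within_1yr, followup_years, event_observed):
    fpr, tpr, thresholds = roc_curve(died_within_1yr, pred_risk)
    threshold = thresholds[np.argmax(tpr - fpr)]   # Youden's J statistic (assumption)
    predicted_dead = pred_risk >= threshold

    ax = plt.subplot()
    for mask, label in [(predicted_dead, "predicted dead at 1 year"),
                        (~predicted_dead, "predicted alive at 1 year")]:
        KaplanMeierFitter().fit(followup_years[mask],
                                event_observed=event_observed[mask],
                                label=label).plot_survival_function(ax=ax)
    ax.set_xlabel("Years since ECG")
    ax.set_ylabel("Survival probability")
    return threshold
```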

Data availability

All requests for raw and analyzed data and related materials, excluding programming code, will be reviewed by our legal department to verify whether the request is subject to any intellectual property or confidentiality constraints. Requests for patient-related data not included in the paper will not be considered. Any data and materials that can be shared will be released via a material transfer agreement for non-commercial research purposes.

Code availability

Programming code related to data preprocessing and model specification will be made available under GNU General Public License version 3 upon request to the corresponding author.

References

  1. Fye, W. B. A history of the origin, evolution, and impact of electrocardiography. Am. J. Cardiol. 73, 937–949 (1994).

  2. Chesebro, J. H. et al. Thrombolysis in Myocardial Infarction (TIMI) trial, phase I: a comparison between intravenous tissue plasminogen activator and intravenous streptokinase. Clinical findings through hospital discharge. Circulation 76, 142–154 (1987).

  3. Eagle, K. A. et al. A validated prediction model for all forms of acute coronary syndrome. JAMA 291, 2727–2733 (2004).

  4. Kenchaiah, S. et al. Obesity and the risk of heart failure. N. Engl. J. Med. 347, 305–313 (2002).

  5. Levy, W. C. et al. The Seattle Heart Failure Model: prediction of survival in heart failure. Circulation 113, 1424–1433 (2006).

  6. Goldman, L. et al. Multifactorial index of cardiac risk in noncardiac surgical procedures. Surv. Anesthesiol. 22, 482 (1978).

  7. Lloyd-Jones, D. M. et al. Use of risk assessment tools to guide decision-making in the primary prevention of atherosclerotic cardiovascular disease. Circulation 139, e1162–e1177 (2019).

  8. Hwang, W., Chang, J., LaClair, M. & Paz, H. Effects of integrated delivery system on cost and quality. Am. J. Manag. Care 19, e175–e184 (2013).

  9. Motwani, M. et al. Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis. Eur. Heart J. 52, 468–476 (2016).

  10. Curry, S. J. et al. Screening for cardiovascular disease risk with electrocardiography: US Preventive Services Task Force recommendation statement. JAMA 319, 2308–2314 (2018).

  11. Lanza, G. A. The electrocardiogram as a prognostic tool for predicting major cardiac events. Prog. Cardiovasc. Dis. 50, 87–111 (2007).

  12. Pan, J. & Tompkins, W. J. A real-time QRS detection algorithm. IEEE Trans. Biomed. Eng. 32, 230–236 (1985).

  13. Belforte, G., De Mori, R. & Ferraris, F. A contribution to the automatic processing of electrocardiograms using syntactic methods. IEEE Trans. Biomed. Eng. 26, 125–136 (1979).

  14. Madeiro, J. P. V., Cortez, P. C., Marques, J. A. L., Seisdedos, C. R. V. & Sobrinho, C. R. M. R. An innovative approach of QRS segmentation based on first-derivative, Hilbert and wavelet transforms. Med. Eng. Phys. 34, 1236–1246 (2012).

  15. Köhler, B. U., Hennig, C. & Orglmeister, R. The principles of software QRS detection. IEEE Eng. Med. Biol. Mag. 21, 42–57 (2002).

  16. Addison, P. S. Wavelet transforms and the ECG: a review. Physiol. Meas. 26, R155–R199 (2005).

  17. Acharya, U. R. et al. A deep convolutional neural network model to classify heartbeats. Comput. Biol. Med. 89, 389–396 (2017).

  18. LeBlanc, A. R. Quantitative analysis of cardiac arrhythmias. Crit. Rev. Biomed. Eng. 14, 1–43 (1986).

  19. Luz, E. J., Schwartz, W. R., Cámara-Chávez, G. & Menotti, D. ECG-based heartbeat classification for arrhythmia detection: a survey. Comput. Methods Programs Biomed. 127, 144–164 (2016).

  20. Rahhal, M. M. A. et al. Deep learning approach for active classification of electrocardiogram signals. Inf. Sci. 345, 340–354 (2016).

  21. Liu, W. et al. Real-time multilead convolutional neural network for myocardial infarction detection. IEEE J. Biomed. Health Inform. 22, 1434–1444 (2017).

  22. Goodfellow, S. D. et al. Towards understanding ECG rhythm classification using convolutional neural networks and attention mappings. in Machine Learning for Healthcare Conf. 83–101 (2018).

  23. Yu, S. N. & Chen, Y. H. Electrocardiogram beat classification based on wavelet transformation and probabilistic neural network. Pattern Recog. Lett. 48, 1142–1150 (2007).

  24. Asl, B. M., Setarehdan, S. K. & Mohebbi, M. Support vector machine-based arrhythmia classification using reduced features of heart rate variability signal. Artif. Intell. Med. 44, 51–64 (2008).

  25. Karpagachelvi, S., Arthanari, M. & Sivakumar, M. Classification of ECG signals using extreme learning machine. Comput. Inf. Sci. https://doi.org/10.5539/cis.v4n1p42 (2014).

  26. Kampouraki, A., Manis, G. & Nikou, C. Heartbeat time series classification with support vector machines. IEEE Trans. Inf. Technol. Biomed. 13, 512–518 (2009).

  27. Hannun, A. Y. et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat. Med. 25, 65–69 (2019).

  28. Smith, S. W. et al. A deep neural network learning algorithm outperforms a conventional algorithm for emergency department electrocardiogram interpretation. J. Electrocardiol. 52, 88–95 (2019).

  29. Attia, Z. I. et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. Nat. Med. 25, 70–74 (2019).

  30. Attia, Z. I. et al. Prospective validation of a deep learning ECG algorithm for the detection of left ventricular systolic dysfunction. J. Cardiovasc. Electrophysiol. 30, 668–674 (2019).

  31. Attia, Z. I. et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 394, 861–867 (2019).

  32. Chen, T. & Guestrin, C. XGBoost: reliable large-scale tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).

  33. Chen, T. & He, T. Higgs Boson discovery with boosted trees. JMLR Work. Conf. Proc. 42, 69–80 (2015).

  34. D’Agostino, R. B. et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 117, 743–753 (2008).

  35. Quan, H. et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med. Care 43, 1130–1139 (2005).

  36. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV) 618–626 (IEEE, 2017).

  37. Springenberg, J. T., Dosovitskiy, A., Brox, T. & Riedmiller, M. Striving for simplicity: the all convolutional net. Preprint at https://arxiv.org/abs/1412.6806 (2014).

  38. Davie, A. P. et al. Value of the electrocardiogram in identifying heart failure due to left ventricular systolic dysfunction. Br. Med. J. 312, 222 (1996).

  39. Hedberg, P. et al. Electrocardiogram and B-type natriuretic peptide as screening tools for left ventricular systolic dysfunction in a population-based sample of 75-year-old men and women. Am. Heart J. 148, 524–529 (2004).

  40. van Buuren, S. & Groothuis-Oudshoorn, K. mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45, https://doi.org/10.18637/jss.v045.i03 (2011).

  41. Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. Proceedings of the ICML 27, 807–814 (2010).

  42. Lin, M., Chen, Q. & Yan, S. Network in network. Preprint at https://arxiv.org/abs/1312.4400 (2013).

  43. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

  44. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  45. Prechelt, L. Early stopping—but when? in Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science 7700, 55–69 (Springer, 2012).

  46. Shah, S. J. et al. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation 131, 269–279 (2015).

  47. Carreiras, C. et al. BioSPPy: biosignal processing in Python https://github.com/PIA-Group/BioSPPy (2018).

  48. Kaplan, E. L. & Meier, P. Nonparametric estimation from incomplete observations. Am. Stat. Assoc. 53, 457–481 (1958).

  49. Cox, D. R. Regression models with life tables. J. R. Stat. Soc. B 34, 187–220 (1972).

Acknowledgements

The authors would like to acknowledge C. Nevius and B. McCarty for their help with the IRB submission for the study and with developing a scheduler to run the study's computational experiments efficiently. The authors also acknowledge the time and contribution of the following cardiologists who performed the survey reported in the paper: N. Mead, B. Carry, G. Yost, S. Siddiqi, T. Rizwan and B. Durr.

Author information

Contributions

S.R., C.M.H. and B.K.F. conceived the study and designed the experiments. S.R. conducted all the experiments. S.R., A.E.U.C. and J.S. contributed to the code base and deep learning framework used for the experiments. S.R., A.E.U.C., L.J., D.P.v.M., J.B.L. and D.N.H. assembled the data. H.L.K., B.P.D., A.A.P. and J.S. contributed to many discussions on experimental design. S.R. and D.P.v.M. designed the web application to perform the blinded surveys of cardiologists. C.W.G., J.M.P., A.A. and D.B. are the cardiologists who completed the survey and provided clinical insights. S.R., J.M.P., A.N., M.C.S., T.C., A.H. and K.W.J. contributed to the model interpretation with guided Grad-CAM. S.R., C.M.H. and K.Y. contributed to clinical chart review for cause-of-death analysis. All authors critically revised the manuscript.

Corresponding author

Correspondence to Brandon K. Fornwalt.

Ethics declarations

Competing interests

This work was supported in part by funding from the Pennsylvania Department of Health (SAP 4100070267), an American Heart Association Competitive Catalyst Award (17CCRG33700289), the Geisinger Health Plan and Clinic, and Tempus. The content of this article does not reflect the views of the funding sources. Geisinger receives funding from Tempus for ongoing development of predictive modeling technology and commercialization. Tempus and Geisinger have jointly applied for a patent related to the work. None of the Geisinger authors has ownership interest in any of the intellectual property resulting from the partnership. A.N., T.C., A.H., M.C.S. and K.W.J. are employees of Tempus.

Additional information

Peer review information: Michael Basson was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Summary of the data used in the study.

Summary of data used in the study. Note that ‘15 traces’ means the standard 12 ‘short duration’ leads (2.5 seconds of voltage data for each) plus 3 ‘long duration’ leads (10 seconds of voltage data for each). PDF = portable document format. CV = cross-validation.
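
As a concrete illustration of the ‘15 traces’ layout described above, the sketch below assembles one input sample from 12 short-duration leads (2.5 s each) and 3 long-duration leads (10 s each). The 500 Hz sampling rate and the particular set of long-duration leads are assumptions for illustration; the paper's actual preprocessing is not reproduced here.

```python
# Minimal sketch of the '15 traces' input layout (assumed sampling rate and lead set).
import numpy as np

FS = 500                                   # assumed sampling rate (Hz)
SHORT_LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
               "V1", "V2", "V3", "V4", "V5", "V6"]   # 2.5 s of voltage each
LONG_LEADS = ["V1", "II", "V5"]            # 10 s of voltage each (assumed leads)

def assemble_traces(short, long):
    """short: dict lead -> (2.5 s * FS,) array; long: dict lead -> (10 s * FS,) array.
    Returns a (12, 1250) array of short traces and a (3, 5000) array of long traces."""
    short_block = np.stack([np.asarray(short[lead], dtype=np.float32) for lead in SHORT_LEADS])
    long_block = np.stack([np.asarray(long[lead], dtype=np.float32) for lead in LONG_LEADS])
    assert short_block.shape == (12, int(2.5 * FS))
    assert long_block.shape == (3, 10 * FS)
    return short_block, long_block
```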

Extended Data Fig. 2 Model architecture.

Model architecture used in the study.

Extended Data Fig. 3 Model performance as area under precision recall curve.

Summary of model performance as area under the precision-recall curve (AUPRC) for predicting 1-year mortality. (A) The mean AUPRC for the indicated input data, including (i) clinically acquired ECG measures (9 numerical values and 31 diagnostic labels), (ii) ECG voltage–time traces only, (iii) age and sex alone, (iv) ECG measures with age and sex, and (v) ECG voltage–time traces with age and sex. Models for (i), (iii) and (iv) used XGBoost, and models for (ii) and (v) used a DNN. ‘Normal’ refers to the ECGs in the test set labeled as normal by the original interpreting physician at the time of ECG acquisition, ‘abnormal’ refers to any ECGs in the test set not identified as normal, and ‘all’ includes both normal and abnormal ECGs in the test set. (B) The relative performance of the DNN models using single leads as input (sorted by increasing performance). Bar heights show the mean AUPRC of models M1–M5 (derived from 5-fold cross-validation, see text), individual data points for each of the 5 models are shown as red ‘x’ marks, and black dots represent the AUPRC of model M0 (trained on 60% of the data and tested on the 40% holdout set). ‘2.5 seconds’ and ‘10 seconds’ refer to the duration of the voltage–time traces used for the model (see text for details).
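
For readers less familiar with the metric, the sketch below shows how an AUPRC comparison of the kind summarised in panel (A) could be computed: an XGBoost baseline on age and sex versus the DNN's predicted probabilities on the same held-out split, with average_precision_score used as the AUPRC estimate. The variable names and hyperparameters are illustrative assumptions, not values from the study.

```python
# Hedged AUPRC comparison sketch (assumed data layout and hyperparameters).
import numpy as np
from sklearn.metrics import average_precision_score
from xgboost import XGBClassifier

def auprc_comparison(age, sex, died_within_1yr, dnn_test_probs, train_idx, test_idx):
    """age, sex, died_within_1yr: 1-D arrays over all patients;
    dnn_test_probs: DNN risk predictions for the test split."""
    X = np.column_stack([age, sex])
    baseline = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
    baseline.fit(X[train_idx], died_within_1yr[train_idx])
    xgb_probs = baseline.predict_proba(X[test_idx])[:, 1]
    return {
        "age + sex (XGBoost)": average_precision_score(died_within_1yr[test_idx], xgb_probs),
        "voltage traces (DNN)": average_precision_score(died_within_1yr[test_idx], dnn_test_probs),
    }
```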

Extended Data Fig. 4 Model explainability with GRAD-CAM.

The guided gradient-weighted class activation maps (guided Grad-CAM) overlaid on signal-averaged waveforms for three patients (bottom three rows), as well as the mean signal and activation across patients (top row), for leads V2 and V3. Clinical ECG findings for all three patients reported anterior acute myocardial infarction with apparent ST-segment elevations. Note that these patients were predicted to be high risk by the model, and all died within a year of this ECG (that is, they were considered ‘true positives’). The overlaid saliency map from guided Grad-CAM highlights the regions deemed salient by the model (darker red regions) for the prediction of a high likelihood of mortality within a year; these regions coincided with the ST segment.
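
The paper uses guided Grad-CAM (refs. 36, 37); to make the idea of the saliency overlay concrete, the sketch below implements only the plain Grad-CAM half of that procedure for a 1-D convolutional Keras model. The model, the layer name and the upsampling back to the waveform length are assumptions.

```python
# Illustrative 1-D Grad-CAM sketch, not the authors' guided Grad-CAM implementation.
import numpy as np
import tensorflow as tf

def grad_cam_1d(model, ecg, conv_layer_name="last_conv"):
    """model: tf.keras.Model mapping (1, time, leads) -> (1, 1) mortality risk.
    ecg: array of shape (time, leads). Returns a (time,) saliency map in [0, 1]."""
    conv_layer = model.get_layer(conv_layer_name)
    grad_model = tf.keras.Model(model.inputs, [conv_layer.output, model.output])

    x = tf.convert_to_tensor(ecg[np.newaxis, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        conv_out, risk = grad_model(x)          # conv_out: (1, t', channels)
        score = risk[:, 0]                      # predicted 1-year mortality risk
    grads = tape.gradient(score, conv_out)      # d(risk) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=1)     # average the gradients over time
    cam = tf.reduce_sum(conv_out * weights[:, tf.newaxis, :], axis=-1)[0]
    cam = tf.nn.relu(cam)                       # keep only positive evidence
    cam = cam / (tf.reduce_max(cam) + 1e-8)     # normalise to [0, 1]
    # Upsample the coarse map to the original number of samples for overlay.
    return np.interp(np.linspace(0.0, 1.0, ecg.shape[0]),
                     np.linspace(0.0, 1.0, int(cam.shape[0])), cam.numpy())
```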

Extended Data Fig. 5 Cardiologist visual survey.

Accuracy of the ten cardiologists in correctly identifying the true positive ECG (patient dead within a year) when presented with pairs of ‘normal’ ECGs, each pair consisting of one true positive and one true negative (n = 100 pairs; gray bars). Accuracy is also shown (black bars) for the same survey repeated after the cardiologists were shown an independent set of paired ECGs (n = 100) with outcomes labeled. All ECG pairs presented were matched for age and sex.
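
A minimal sketch of the pairing-and-scoring logic behind this survey: build age- and sex-matched (true positive, true negative) pairs, then score how often a reviewer picks the true positive. The greedy matching, the 2-year age tolerance and all names are assumptions for illustration only.

```python
# Hypothetical pairing and scoring helpers; not the study's matching procedure.
def match_pairs(true_positives, true_negatives, age_tol=2):
    """Each element is a dict with 'id', 'age' and 'sex'. Greedily pair every
    true positive with an unused true negative of the same sex within age_tol years."""
    used, pairs = set(), []
    for tp in true_positives:
        for tn in true_negatives:
            if (tn["id"] not in used and tn["sex"] == tp["sex"]
                    and abs(tn["age"] - tp["age"]) <= age_tol):
                used.add(tn["id"])
                pairs.append((tp["id"], tn["id"]))
                break
    return pairs

def survey_accuracy(choices, pairs):
    """choices: list of chosen ECG ids, one per pair; pairs[i][0] is the true positive."""
    correct = sum(choice == pair[0] for choice, pair in zip(choices, pairs))
    return correct / len(pairs)
```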

Supplementary information

Supplementary Information

Supplementary Tables 1 and 2.

Reporting Summary

About this article

Cite this article

Raghunath, S., Ulloa Cerna, A.E., Jing, L. et al. Prediction of mortality from 12-lead electrocardiogram voltage data using a deep neural network. Nat Med 26, 886–891 (2020). https://doi.org/10.1038/s41591-020-0870-z
