Introduction

Implantable cardioverter-defibrillator (ICD) therapy has become the cornerstone for the primary prevention of sudden cardiac death (SCD) in patients with systolic heart failure (HF) and reduced left ventricular ejection fraction (LVEF)1 due to both ischemic (ICM) and nonischemic cardiomyopathy (NICM). Although the survival benefit of primary prevention ICD is incontrovertible, the rate of appropriate ICD therapies due to ventricular arrhythmia (VA) is relatively low at 1.1–5.1% per year2. In contrast, the adverse event rate exceeds the rate of appropriate therapies in individuals at low risk for SCD. For example, device infection rates are 1.4–2.0% per year3, and rates of inappropriate shocks approach 5–20% per year4, which are associated with increased mortality5 and decreased quality of life6. In addition, LVEF improvement occurs in up to 25–50% of patients which correlates with diminished SCD risk due to VA7. Thus, LVEF is far from being a comprehensive imaging feature to predict VA. Recently, other imaging features of cardiac structure and function were found to be independent predictors of VA, including the extent of heterogeneous myocardial tissue (‘gray zone’) on late gadolinium enhancement (LGE) cardiac magnetic resonance (CMR)8, and right ventricular (RV)9 and left atrial (LA) function10.

Artificial intelligence (AI) algorithms based on deep learning learn complex models directly from data sets. The initial success of AI applications in medical imaging was demonstrated by reproducing expert-level diagnoses11. More recently, AI has been shown to predict personalized prognosis, such as individual responses to lung cancer therapy12 and survival for patients with pulmonary hypertension13. While traditional machine learning approaches rely on handcrafted, previously recognized features extracted from medical images, AI can also automatically generate a patient-specific fingerprint containing inherent features of cardiac structure and function from cine CMR14 in an unsupervised fashion15.

The CERTAINTY study (CinE caRdiac magneTic resonAnce to predIct veNTricular arrhythmia) uses deep learning to predict VA risk for individual primary prevention ICD candidates from non-contrast cine CMR images. The findings from the CERTAINTY study are expected to improve our understanding of the mechanisms that predispose to VA, with the hope of developing a new paradigm to identify high- and low-risk individuals by extracting features associated with increased VA risk from cine CMR images in an unsupervised fashion. This article presents an overview of the CERTAINTY design, a descriptive analysis of the demographics of the study cohort, the deep learning network architecture, results within the training cohort, and a solicitation of participation to contribute external validation data sets.

Results

Study population

The inclusion and exclusion criteria of the CERTAINTY study population are described in Table 11. Baseline characteristics of the training cohort (n = 350) by VA occurrence are summarized in Table 2. The median age was 59 years, and 97 patients (28%) were female. The etiology of HF was ischemic heart disease in 178 patients (51%), and cardiac resynchronization therapy with an ICD (CRT-D) was implanted in 100 patients (29%). The median baseline LVEF was 26%. After a median follow-up of 7.1 years, the primary endpoint was observed in 96 patients (incidence rate of 4.57 per 100 person-years, Table 3). Thirty-five patients (10%) received appropriate antitachycardia pacing (ATP) without appropriate ICD shock, and the remaining patients with the primary endpoint received appropriate ICD shocks. Patients with the primary endpoint (n = 96) were more likely to be male and had larger LV size, greater extent of total LV LGE, larger LA size, and lower LA total emptying function than patients without VA events (n = 254).

Table 1 CERTAINTY inclusion/exclusion criteria.
Table 2 Baseline characteristics.
Table 3 Incidence rate and cine risk score for each endpoint.

Cine risk score as a predictor

First, we assessed the value of the cine fingerprint to predict outcomes within the training cohort by applying a univariate Cox proportional hazards model to cine risk scores calculated by the risk predictor autoencoder network (Fig. 1). For all the endpoints, the cine risk score calculated from the cine fingerprint was significantly higher in patients with events than in those without (Table 3). In addition, the C-index of the cine risk score was higher than that of any other independent predictor, including LV LGE gray zone, LA maximum volume index and LA total emptying fraction (Table 4). For VA and all-cause death, the hazard ratio (HR) of the cine risk score was also higher than that of the other independent predictors. For HF death, the HR of the cine risk score was lower than that of LA total emptying fraction. Survival analysis up to 10 years also showed that the cine risk score is a significant predictor for all the endpoints studied (Fig. 2). A cine risk score equal to or lower than the cut-off value of 0.15 (the 25th percentile) identifies a low-risk subgroup that achieved 83% VA-free survival at 10 years. This corresponds to an incidence rate of 2.55 per 100 person-years [95% confidence interval (CI) 1.56–4.16], a 44% reduction from the incidence rate of 4.57 per 100 person-years [95% CI 3.74–5.59] for the overall cohort (Table 3). Competing risk analysis showed that the cine risk score remained significantly associated with VA (subhazard ratio 3.82 [95% CI 2.04–7.15], p < 0.001). The net reclassification improvement (NRI) index of the cine risk score was 0.111 compared with LV LGE gray zone.
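As a worked check of the arithmetic in this paragraph, the following sketch recomputes the relative reduction in incidence rate and an approximate 95% CI for the overall rate. It assumes a log-scale Wald interval with standard error 1/sqrt(events); the text does not state how the reported intervals were derived, so this is an illustration only, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def relative_reduction(rate_subgroup, rate_overall):
    """Fractional reduction of the subgroup rate relative to the overall rate."""
    return 1.0 - rate_subgroup / rate_overall


def rate_ci(rate, events, level=0.95):
    """Approximate CI for an incidence rate.

    Assumption (not stated in the text): a Wald interval on the log scale,
    with standard error 1 / sqrt(number of events).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    factor = math.exp(z / math.sqrt(events))
    return rate / factor, rate * factor


# Values reported in the text (rates per 100 person-years).
overall_rate = 4.57
low_risk_rate = 2.55
primary_events = 96

reduction = relative_reduction(low_risk_rate, overall_rate)
ci_low, ci_high = rate_ci(overall_rate, primary_events)
print(f"relative reduction: {reduction:.0%}")
print(f"approximate 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

Under this assumed interval, the bounds for the overall cohort land close to the reported values; small differences are expected because the inputs are already rounded.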

Figure 1 Algorithm overview. (A) Cine fingerprint extractor. (B) Risk predictor. See text for details.

Table 4 Univariate Cox hazards model of cine risk scores calculated by the risk predictor autoencoder network.
Figure 2 Survival prediction for each endpoint. (A) LV LGE gray zone. (B) LA maximum volume index. (C) LA total emptying fraction. (D) Cine risk score. The shaded area represents the 95% confidence interval.

Multivariate analysis

Next, we assessed the covariates together within multivariate Cox Proportional Hazards Regression models (Fig. 3). We performed survival analysis when combining the cine risk score with each of the predictors (LV LGE gray zone, LA maximum volume index and LA total emptying fraction) independently and performed a multivariate analysis with all covariates with and without the cine risk score. In each case, the addition of the cine risk score (e.g. Fig. 3A–C) improved the hazard ratio of each endpoint compared with the covariates without the cine risk score (e.g. Fig. 2A–C, respectively). In addition, the C-indices of the multivariate model with cine risk score also demonstrate the incremental value of the cine risk score (Table 4).

Figure 3 Survival prediction for each endpoint using multivariate Cox Proportional Hazards Regression models. (A) Cine risk score + LV LGE gray zone. (B) Cine risk score + LA maximum volume index. (C) Cine risk score + LA total emptying fraction. (D) LV LGE gray zone + LA maximum volume index + LA total emptying fraction. (E) Cine risk score + LV LGE gray zone + LA maximum volume index + LA total emptying fraction. The shaded area represents the 95% confidence interval.

We also used unadjusted and adjusted Cox proportional hazards regression analyses to assess the performance of the cine risk score in predicting the primary endpoint (Table 5). Univariate, unadjusted analysis identified male sex and use of diuretics as predictors of VA. After adjusting for sex, type of cardiomyopathy, use of diuretics, and hsCRP, the cine risk score remained significantly associated with VA (HR 3.24, p = 0.005; Model 1). After further adjusting for LVEDI, LV ejection fraction, LV LGE gray zone, LA maximum volume index and LA total emptying fraction, the cine risk score remained significantly associated with VA (HR 2.67, p = 0.027; Model 2 in Table 5). We applied the same analysis approach to the secondary endpoints. The cine risk score remained significantly associated with heart failure death after adjusting for age, NYHA class, duration and type of cardiomyopathy, history of diabetes, and use of diuretics and digoxin (HR 5.62, p < 0.001; Model 1 in Supplementary Table 1). However, the cine risk score was not significantly associated with heart failure death after additionally adjusting for LVEDI, LV ejection fraction, and LV LGE gray zone (HR 2.51, p = 0.119; Model 2 in Supplementary Table 1). In contrast, the cine risk score remained significantly associated with all-cause death after adjusting for age, NYHA class, duration and type of cardiomyopathy, history of diabetes, use of diuretics, LVEDI, LV ejection fraction, and LV LGE gray zone (HR 2.27, p = 0.019; Model 2 in Supplementary Table 2).

Table 5 Predictors of ventricular arrhythmia by unadjusted and adjusted Cox proportional regression analysis.

Discussion

Main findings

Our main findings are summarized as follows: (1) cine CMR inherently contains features of cardiac structure and function that improve VA risk prediction in primary prevention ICD candidates; (2) deep learning can automatically extract those features in the form of a cine risk score; and (3) the cine risk score is an independent biomarker of risk associated with VA and all-cause death in primary prevention ICD candidates. To our knowledge, this is the first study to demonstrate the incremental prognostic value of cardiac structure and function assessed from cine CMR in primary prevention ICD candidates. Earlier studies on predictive biomarkers for VA mainly focused on the presence, extent, and characteristics of myocardial scar detected by LGE CMR16. The rationale for that focus is the electrophysiological assumption that the gray zone represents transitional tissue between scar and normal myocardium, and that slow conduction within the gray zone serves as a substrate for reentrant arrhythmia17. More recent studies identified LA function10, 18, 19, quantified by echocardiography or cine CMR, as an independent predictor of SCD. However, quantification of chamber dysfunction in prior studies relied on pre-specified feature extraction with chamber segmentation in multiple imaging views. In contrast, our algorithm automatically extracts features from cine CMR alone in an unsupervised fashion. In addition, our findings showed that the predictive value of the cine risk score is independent of LA function (Supplementary Table 1), which suggests that cine CMR inherently contains predictive features beyond LA function.

Mechanistic implications

Although deep learning is emerging as a powerful tool for diagnosis11, 20 and risk stratification12, it suffers from a lack of transparency and explainability21. In our study, the algorithm does not identify the specific features of four-chamber view (4cv) cine CMR that are associated with increased risk for VA. One possibility is that the algorithm captures imaging features associated with structural and functional interactions among the four chambers. Neural network-based algorithms can process many dimensions of a high-dimensional feature space simultaneously. This ability enables assessment of feature interactions as emergent phenomena that cannot be evaluated by studying each feature in isolation. Another possibility is that the algorithm captures imaging features associated with cardiac hemodynamics in clinical heart failure, such as increased LV filling pressures, decreased LV compliance, and increased LV wall stress. Because low RV9, 22 and LA function10, 18, 19 are independent predictors of SCD, it is possible that the algorithm identified imaging features of impending biventricular failure associated with VA. Importantly, VA risk prediction approaches to date have not comprehensively incorporated metrics from these two possibilities, which are not mutually exclusive. Future studies are needed to address the knowledge gap regarding the potential clinical significance of these metrics in VA prediction.

Clinical implications

The proposed algorithm could help primary prevention ICD candidates and their physicians make an informed decision regarding ICD implantation during the shared decision-making process23. Notably, the algorithm is applied to 4cv cine CMR, which does not require intravenous contrast agents. This is particularly important for individuals with severe HF and cardiorenal syndrome who are considered for primary prevention ICD. The key innovation of the developed algorithm is that it does not require manual segmentation of the heart chambers, which allows quicker risk assessment without cognitively biased human intervention.

Solicitation of participation to contribute to external validation cohort

Our findings clearly indicate that non-contrast cine CMR inherently contains features that improve VA risk prediction in primary prevention ICD candidates. However, the training cohort has a relatively small sample size and derives from a single institution. To assess the generalizability of the algorithm, it needs to be tested in an external validation cohort. The unique value of the CERTAINTY study is the long follow-up duration (10 years), because, unlike pharmacologic or ablation interventions, an ICD is usually a lifetime commitment. ICD candidates need to be informed of long-term implications at the time of shared decision making on ICD implantation. This unique value of long-term follow-up unfortunately limits the data availability of the CERTAINTY study. Even with the advent of remote ICD monitoring, most institutions have very few data sets with this duration of follow-up. Therefore, we encourage participation of multiple institutions in the CERTAINTY study by contributing to the external validation cohort. The inclusion and exclusion criteria and the baseline characteristics of the CERTAINTY population are described in Tables 1 and 2, respectively. The external validation cohort should meet two important criteria. First, the baseline characteristics of the external validation cohort should be clinically matched to those of the training cohort. Critically important variables to be matched include the follow-up duration and the etiology of cardiomyopathy (ischemic vs. nonischemic). The follow-up duration is of particular importance to accrue an adequate number of events to improve rigor. Second, the external validation cohort should have a sufficiently large sample size to draw statistically meaningful conclusions. We estimated the sample size based on the event-free survival of heart failure death in the training cohort, because the incidence rate of heart failure death was the lowest among the three main outcomes (ventricular arrhythmia, heart failure death and all-cause death) (Table 3). The expected freedom from heart failure death at 10 years in the low- and high-risk groups based on the cine risk score was 0.877 (95% CI 0.801–0.925) and 0.740 (95% CI 0.642–0.815), respectively (Fig. 2D). Based on those values, we determined that 344 cases are needed for the external validation cohort to have a power of 90% at a two-sided alpha level of 0.05.
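To illustrate the kind of calculation behind such an estimate, the sketch below applies Schoenfeld's approximation for a two-group log-rank comparison to the two survival probabilities quoted above. The choice of Schoenfeld's formula, equal group sizes, proportional hazards, and the function name are all assumptions; the text does not state which method or software option produced the figure of 344, so this sketch gives a number of similar magnitude but is not expected to reproduce it exactly.

```python
import math
from statistics import NormalDist


def logrank_sample_size(surv_low_risk, surv_high_risk, alpha=0.05, power=0.90):
    """Required events and total cases for a two-group log-rank test.

    Assumptions (not taken from the text): Schoenfeld's approximation,
    equal allocation, proportional hazards, and survival probabilities
    measured at the end of follow-up.
    """
    # Under proportional hazards, the hazard ratio equals the ratio of log-survivals.
    hazard_ratio = math.log(surv_high_risk) / math.log(surv_low_risk)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    events = 4.0 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2
    # Average probability of observing an event across the two groups.
    event_probability = 1.0 - (surv_low_risk + surv_high_risk) / 2.0
    return math.ceil(events), math.ceil(events / event_probability)


# Freedom from heart failure death at 10 years, as reported in the text.
required_events, required_cases = logrank_sample_size(0.877, 0.740)
print(required_events, required_cases)
```

Other standard formulas (for example Freedman's) give somewhat different totals from the same inputs, which is one reason a contributing site should confirm the method before planning around a specific number.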

Limitations

There are two limitations associated with the study. First, we used only 4cv cine CMR, which was included in a routine image acquisition protocol. Therefore, it is possible that the algorithm underestimated the degree of abnormal structure and function by missing regions that were not covered by the 4cv view. However, we believe that the advantage of our approach outweighs the disadvantage of including multiple views to assess 3-D structure and function, which would increase the scan time and post-processing burden including a higher training complexity for the algorithm. Second, the proposed method focuses on the extraction of cardiac function features while additional imaging features capturing anatomic abnormalities could further improve the risk prediction.

Conclusions

Non-contrast, cine CMR inherently contains imaging features that can improve VA risk prediction in primary prevention ICD candidates without the need for manual contouring or contrast-enhancement. The deep learning algorithm could be easily implemented in routine clinical practice and provide valuable information during the shared decision-making process.

Methods

Training cohort

The protocol was approved by the institutional review board of the Johns Hopkins Medical Institutions, and all the patients provided written informed consent. All methods were carried out in accordance with relevant guidelines and regulations. We retrospectively analyzed a training cohort with ICM and NICM who underwent CMR at Johns Hopkins Medical Institutions (Baltimore, MD) using 1.5-Tesla whole body scanners prior to primary prevention ICD implantation (median 3 days) between 2003 and 2015 with a standard imaging protocol (Left Ventricular Structural Predictors of SCD, ClinicalTrials.gov Identifier: NCT01076660). Device types included single-chamber ICD, dual-chamber ICD, and CRT-D, selected according to current guidelines1. Patients were evaluated every 6 months and after any ICD shock. Patients who were not seen in person underwent a telephone interview to update their history. From this cohort, we have previously reported the association between LGE gray zone extent and VA inducibility at electrophysiology study (n = 47 with ICM)24, or appropriate ICD firing (n = 235 with ICM and NICM)8, the association between LGE-based arrhythmia simulation and appropriate ICD firing (n = 41 with ICM)25, the association between LGE scar characteristics and LVEF improvement (n = 202 with ICM and NICM)26, the association between LA function and inappropriate ICD firing (n = 162 with ICM and NICM)27, the association between LGE scar complexity and VA (n = 122 with ICM)28, and the association between time-varying risk covariates and appropriate ICD firing (n = 382 with ICM and NICM)29. In this study, we included the 350 patients from this cohort for whom both LGE and 4cv cine CMR images were available, to test whether a deep learning algorithm can extract features associated with VA from 4cv cine CMR images in an unsupervised fashion.
The primary outcome was defined as adjudicated appropriate ICD shock or appropriate anti-tachycardia pacing (ATP) for VA, including ventricular tachycardia and fibrillation. The secondary endpoints were death due to HF and all-cause death. Deaths were classified according to the most proximate cause after review of ICD interrogations, medical records, death certificates, autopsy reports, and eyewitness accounts.

CMR imaging and analysis

The cohort (n = 350) was studied on one of two scanner types (MAGNETOM Avanto, 1.5 Tesla, Siemens Healthcare, Erlangen, Germany [n = 263, 75%] and Signa CV/I, 1.5 Tesla, GE Healthcare, Milwaukee, WI [n = 87, 25%]). Details of CMR imaging and analysis are described in Supplementary Appendix 1 and Supplementary Fig. 1. Briefly, short- and long-axis cine images were acquired with a steady-state free precession sequence. Two- (2-D) or three-dimensional (3-D) LGE cross-sectional short- and long-axis images were acquired starting at approximately 15 min after intravenous administration of 0.15–0.20 mmol/kg of gadolinium-based contrast agent. Typical parameters were TR = 5.4–8.3 ms; TE = 1.3–3.9 ms; TI optimized for nulling of normal myocardium; spatial resolution 1.4–1.5 × 2.2–2.4 × 8 mm. Two observers analyzed all LGE images acquired from both scanners using research software (Cinetool, GE Healthcare, Milwaukee, WI). The core scar was quantified as all pixels with signal intensity (SI) > 50% of the maximal SI within the hyper-enhanced region, and the gray zone as all pixels with SI greater than the peak SI in the normal myocardium but < 50% of the maximal SI8. Multimodality Tissue Tracking software (MTT, version 6.0, Canon Medical Systems, Japan) was used to obtain phasic LA volumes, strain, and strain rate from four-chamber cine CMR images27.
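The thresholding rule for core scar and gray zone described above can be sketched as follows. This is a minimal illustration of the two cut-offs only, not the Cinetool implementation; the function name, the label strings, the toy intensities, and the handling of a pixel that sits exactly on the 50% threshold (which the definition leaves unspecified) are assumptions.

```python
def classify_lge_pixels(intensities, max_si_enhanced, peak_si_normal):
    """Label each pixel as 'core', 'gray_zone', or 'normal'.

    intensities     -- signal intensities of myocardial pixels
    max_si_enhanced -- maximal SI within the hyper-enhanced region
    peak_si_normal  -- peak SI in the normal (remote) myocardium
    """
    core_threshold = 0.5 * max_si_enhanced
    labels = []
    for si in intensities:
        if si > core_threshold:
            labels.append("core")
        elif si > peak_si_normal:
            # Above normal myocardium but not above 50% of the maximum.
            # Assumption: a pixel exactly at the 50% threshold falls here.
            labels.append("gray_zone")
        else:
            labels.append("normal")
    return labels


# Toy intensities in arbitrary units (illustrative values, not study data).
labels = classify_lge_pixels([40, 90, 130, 200], max_si_enhanced=200, peak_si_normal=80)
print(labels)
```

The extent of each tissue class would then follow from counting labelled pixels and scaling by pixel size, a step that depends on acquisition details not covered here.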

Algorithm development

In this work, the above-mentioned cine CMR dataset was used to train two independent neural networks. First, a probabilistic encoder-decoder neural network14 was trained to extract cardiac structure and function features from 4cv cine CMR in the form of a cine fingerprint in a fully unsupervised fashion (Cine Fingerprint Extractor, Fig. 1A). Each of the 4cv cine CMR images was cropped to include only the heart within a square of 128 × 128 pixels with a spacing of 2.2 mm. An L2 reconstruction error term and a regularizer acting on the distributions of the latent space were used to learn a probabilistic fingerprint space. The network was trained using sixfold cross-validation (using 5/6 of the cases for training and 1/6 for validation for each fold). Second, an autoencoder neural network was trained by regressing disease outcomes as a cine risk score (0 to 1 probability scale) with the cine fingerprint as an input (Risk Predictor, Fig. 1B). The network was trained using sixfold cross-validation and was re-trained for each outcome separately (VA, heart failure death, and all-cause death). To enable the use of censored data, a partial likelihood loss function derived from Cox's semi-parametric proportional hazards model was utilized30. To ensure that data cannot leak between the two networks, the training procedure was repeated 6 times by selecting a different fold as the validation data set. The results of all 6 validation groups from the different trainings were then combined to evaluate the complete dataset. In addition, the performance of the Risk Predictor network was evaluated using bootstrapping31. Further details of the algorithm development are described in Supplementary Appendix 2 and Supplementary Fig. 2. We implemented both neural networks using Keras (ver. 2.3, https://keras.io/, Google, LLC, Mountain View, CA, USA) with a Tensorflow (ver. 1.14, https://www.tensorflow.org/, Google, LLC, Mountain View, CA, USA) backend.
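To make the censoring-aware loss concrete, the sketch below computes a negative log partial likelihood of the standard Cox form in plain Python. It is not the Keras implementation used in the study: the function name, the Breslow-style handling of tied times, the averaging over observed events, and the toy data are assumptions made for illustration.

```python
import math


def cox_partial_likelihood_loss(log_risks, times, events):
    """Negative log Cox partial likelihood, averaged over observed events.

    log_risks -- predicted log-risk score per patient (higher = higher risk)
    times     -- follow-up time per patient
    events    -- 1 if the endpoint was observed, 0 if censored
    """
    total = 0.0
    n_events = 0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients only contribute through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [log_risks[j] for j, t_j in enumerate(times) if t_j >= t_i]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += log_risks[i] - log_sum
        n_events += 1
    return -total / n_events if n_events else 0.0


# Toy data (illustrative only): patients with earlier events get higher scores.
toy_times = [5, 3, 8, 2]
toy_events = [1, 0, 1, 1]
loss_concordant = cox_partial_likelihood_loss([0.5, 0.0, -1.0, 2.0], toy_times, toy_events)
loss_reversed = cox_partial_likelihood_loss([-0.5, 0.0, 1.0, -2.0], toy_times, toy_events)
loss_uninformative = cox_partial_likelihood_loss([0.0] * 4, toy_times, toy_events)
print(loss_concordant < loss_reversed)
```

Scores that rank patients in the order of their events yield a lower loss than the reversed ranking, which is the property a network trained with this loss exploits.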

Statistical analysis

We used Pearson's χ2 test for categorical variables and the Student's t-test or Mann–Whitney U test for parametric or nonparametric continuous variables, respectively. For risk prediction, we report the mean concordance (C-) index32 between predicted risk score and actual event time. We applied bootstrapping31 with 100 resamples to compute the C-index and CI and to confirm the cross-validated results. We computed the hazard ratio (HR) including CI and p-value by fitting a linear Cox regression model on the predicted risk scores. To this end, the median risk value was used to divide the cohort into a low- and a high-risk group. Kaplan–Meier estimates of cumulative survival rates for the low- and high-risk groups were determined and statistically evaluated with the log-rank test. Cox proportional hazards models were used to estimate the association between variables and endpoints. Univariable analyses of all baseline variables were performed. Multivariable analyses were performed separately for each CMR feature by adjusting for clinical variables significantly associated with outcomes in univariable analysis (p < 0.05) and/or clinically deemed important (Model 1). We also performed competing risk analysis for the cumulative incidence of VA with death as a competing event, using the method by Fine and Gray33, and net reclassification improvement (NRI) analysis34. We used STATA (ver 16, Stata Corp LP, College Station, TX) and lifelines (ver 0.25, https://lifelines.readthedocs.io/) for statistical analysis. A two-sided p-value of < 0.05 was considered statistically significant.
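The C-index, the bootstrap with 100 resamples, and the median split described above can be sketched in plain Python as follows. This is an illustration under stated assumptions and not the STATA or lifelines code used in the study: the function names, the percentile-based interval, the choice to ignore pairs with tied times, and the toy data are all hypothetical.

```python
import random
from statistics import mean, median


def concordance_index(risk_scores, times, events):
    """Harrell's C-index; returns None when no pair is comparable."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # each comparable pair is anchored on an observed event
        for j in range(len(times)):
            if times[j] > times[i]:  # patient j was still event-free at times[i]
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else None


def bootstrap_c_index(risk_scores, times, events, n_resamples=100, seed=0):
    """Mean C-index and a percentile 95% interval over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(times)
    values = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        c = concordance_index([risk_scores[k] for k in idx],
                              [times[k] for k in idx],
                              [events[k] for k in idx])
        if c is not None:
            values.append(c)
    values.sort()
    lower = values[int(0.025 * (len(values) - 1))]
    upper = values[int(0.975 * (len(values) - 1))]
    return mean(values), (lower, upper)


def split_by_median(risk_scores):
    """Assign 'high' to scores above the cohort median, 'low' otherwise."""
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]


# Toy data (illustrative only, not study data).
toy_scores = [0.9, 0.4, 0.6, 0.7, 0.3, 0.1]
toy_times = [2, 4, 6, 8, 10, 12]
toy_events = [1, 1, 0, 1, 0, 1]
c_index = concordance_index(toy_scores, toy_times, toy_events)
mean_c, (ci_lower, ci_upper) = bootstrap_c_index(toy_scores, toy_times, toy_events)
groups = split_by_median(toy_scores)
print(round(c_index, 3), groups)
```

The resulting low- and high-risk labels are the groups that a Kaplan–Meier comparison with a log-rank test would then be applied to.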