Main

Sudden cardiac death (SCD) continues to be a leading cause of mortality worldwide, with an incidence of 50–100 per 100,000 in the general population in Europe and North America1, and accounts for 15–20% of all deaths2. Patients with coronary artery disease are at the highest risk of arrhythmic sudden cardiac death (SCDA)3,4. Although implantable cardioverter devices (ICDs) effectively prevent SCD due to ventricular arrhythmias, current clinical criteria for ICD candidacy—that is, left ventricular ejection fraction (LVEF) <30–35%5—capture only a mere 20% all SCDAs6, highlighting the critical need to develop personalized, accurate and cost-effective arrhythmia risk assessment tools to mitigate this enormous public health and economic burden. Several studies have identified risk factors for SCDA, and many risk stratification approaches have attempted to transcend LVEF7,8. However, limitations in these approaches have been barriers to their clinical implementation. Previous attempts have broadly stratified populations based on subgroup risk, failing to customize predictions to patients’ unique clinical features9. SCDA risk has been typically assessed at pre-defined finite time points, ignoring the likely patient-specific time evolution of the disease10. Additionally, in previous work, confidence estimates for predictions have been ‘one-size-fits-all’, varying only by risk subgroup, thus preventing the identification of low-confidence, potentially highly erroneous prediction outliers11. Moreover, few previous studies have validated their results externally or comprehensively compared model performance to standard approaches. A robust, generalizable SCDA risk stratifier with the ability to predict individualized, patient-specific risk trajectories and confidence estimates could considerably enhance clinical decision-making. Finally, although arrhythmia arises, mechanistically, from the heterogeneous scar distribution in the disease-remodeled heart, machine learning the features of that distribution has not been explored for risk analysis. Image-derived mechanistic computational models of cardiac electrical function that incorporate scar distribution have proven successful in predicting arrhythmia risk12; however, they remain exceedingly computationally intensive. Therefore, computational models are impractical as a first-stage screening tool in a broad population. Using raw contrast-enhanced (late gadolinium enhancement (LGE)) cardiac images that visualize scar distribution in a DL framework, which additionally draws on standard clinical covariates, could overcome these limitations and lead to accurate patient-specific SCDA probabilities in fractions of a second.

Here we present a DL technology for prediction of SCDA risk in patients with ischemic heart disease. Our approach, which we term Survival Study of Cardiac Arrhythmia Risk (SSCAR), embeds, within a survival model, neural networks to estimate individual patient times to SCDA (TSCDA). The neural networks learn from raw clinical imaging data, which visualize heart-disease-induced scar distribution, as well as from clinical covariates. The predicted patient-specific survival curves offer accurate SCDA probabilities at all times up to 10 years. The performance and high generalizability of the approach are demonstrated by testing on an external cohort, after internal cross-validation. Our technology represents a fundamental change in the approach to arrhythmia risk assessment, as SSCAR uses the data to directly estimate uncertainty in its predictions. Therefore, SSCAR has the potential to considerably shape clinical decision-making regarding arrhythmia risk, offering not a simple ‘at-risk/not-at-risk’ prediction but, instead, an estimate of the time to SCDA together with a sense of ‘how certain’ the model is about each predicted TSCDA.

Results

SSCAR overview

The arrhythmia risk assessment algorithm in SSCAR is a DL framework that incorporates multiple custom neural networks (which fuse different data types), combined with statistical survival analysis, to predict patient-specific probabilities of SCDA at future time points. Figure 1 presents an overview of SSCAR. On the left and right, cardiac magnetic resonance (CMR) images and clinical covariates (yellow panel) are used as inputs to the two corresponding branches of the model. The goal of each of the branches is to predict the patient-specific survival curve. In the left branch, cardiac CMR images—visualizing the patients’ three-dimensional (3D) ventricle geometry and contrast-enhanced remodeled tissue—are used as input by a custom-designed encoder–decoder convolutional neural sub-network (red panel, left). This CMR sub-network is trained to reduce the dimension of the input (that is, encode) and to discover and extract imaging features associated with SCDA risk directly from the CMR images by learning and applying filters (that is, convolving). The encoder–decoder design of the sub-network ensures that resulting imaging features retain sufficient information to be able to reconstruct the original images (red panel, left, decoder path). In the right branch, the 22 clinical covariates in Table 1 are provided to a dense sub-network (green panel, right), which discovers and extracts non-linear relationships between the input variables. The outputs of the sub-networks are combined (ensembled) in a way that best fits the observed SCDA event training data (center path, dot-dashed) to estimate the most probable TSCDA and the uncertainty in the prediction. The output of the model is a per-patient cause-specific survival curve (bottom, blue).

Fig. 1: Schematic overview of SSCAR.
figure 1

Top panel (yellow) shows patient data used in this study. SSCAR uses contrast-enhanced (LGE)-CMR images with the left ventricle automatically segmented (left inset) and clinical covariates (right inset; see Methods and Table 1 for a complete list) as inputs to the two sub-networks (left and right pathways). Labels associated with each patient (SCDA events, middle inset, dot-dashed contour)—consisting of the observed times to event and indicators whether the events were SCDA or non-SCDA—are used as targets during training only. LGE-CMR data is taken as input by a 3D convolutional neural network constructed using an encoder–decoder architecture (red panel, left). Clinical covariates are fed to a dense neural network (green panel, right). The sub-networks are trained to estimate two parameters (location μ and scale σ) specific to each patient, which fully characterize the probability distribution of the patient-specific time to SCDA (top blue panel; the time to SCDA is modeled as probabilistic, assumed to follow a log-logistic distribution). During training (dot-dashed arrows and white middle panel), the neural network weights are optimized via a maximum likelihood process, in which a probability distribution is sought (blue double-headed arrow in the middle white panel) to best match the observed survival data (yellow ‘x’s in the middle white panel). Finally, the optimized probability function is used on test LGE-CMR images and covariates to predict patient individualized survival curves (blue bottom panel).

Table 1 Clinical covariate data.

SSCAR overall risk prediction performance

SSCAR was developed and internally validated using data from 156 patients with ischemic cardiomyopathy (ICM) enrolled in the Left Ventricle Structural Predictors of Sudden Cardiac Death (LVSPSCD) prospective observational study11,13. SSCAR performance was evaluated comprehensively on this internal set using Harrell’s concordance index (c-index)14—range is [0, 1], higher scores are better—and the integrated Brier score (\(\overline{\,{{\mbox{Bs}}}\,}\))15—range is [0, 1], lower scores are better. SSCAR has excellent concordance on the internal set (.82–.89) for all times up to 10 years (Fig. 2a). Additionally, the \(\overline{\,{{\mbox{Bs}}}\,}\) ranges from .04 to 0.12, suggesting strong calibration, given the high concordance. The model maintains its risk discrimination abilities at all times, as further evidenced by the high areas under the receiver operator characteristic (ROC) curves evaluated at years 2–9 (Extended Data Fig. 1). All events up to 10 years are used to construct the cross-validated ROC and precision-recall (PR) curves for the internal validation set (Fig. 2b,c). The area under the ROC curve is 0.87 (95% confidence interval (CI): 0.84–0.90), whereas the area under the PR curve is 0.93 (95% CI: 0.91–0.95).

Fig. 2: SSCAR overall performance.
figure 2

a, C-index (top, blue) measuring model risk discrimination—higher is better—and integrated Brier score (bottom, red) showing overall fit—lower is better—for various time points. b, ROC curve at 10 years for the internal validation and external test cohorts, with the respective areas under the curve (AUROC). c. PR curve at 10 years for the internal validation and external test cohorts, with the respective areas under the curve (AUPR). For all panels, shaded areas represent approximate 95% CIs; solid and dashed lines indicate averages for the internal and external cohorts, respectively; and random chance performance thresholds are shown using dotted lines (the dot-dashed line is used to differentiate the internal random chance performance from the external). The chosen time of 10 years was used to capture all SCDA events in the population.

Source data

To demonstrate the model’s performance, an external test was performed using an independent case–control set of 113 patients with coronary heart disease selected from participants with available CMR images and the same list of covariates enrolled in the PRE-DETERMINE study16. These patients had less severe left ventricular systolic dysfunction but otherwise had similar inclusion/exclusion criteria to patients in the LVSPSCD study (Methods). Despite the dissimilarities between cohorts, SSCAR performance carries over well to the external cohort, resulting in a c-index of 0.71–0.77 and \(\overline{\,{{\mbox{Bs}}}\,}\) of 0.03–0.14 (Fig. 2a; dashed lines). The area under the ROC curve is 0.72 (95% CI: 0.67–0.77), and the area under the PR curve is 0.73 (95% CI: 0.68–0.78), on the external set (Fig. 2b,c).

Patient-specific survival curves predicted by SSCAR

The SSCAR survival model presented here predicts cause-specific survival curves for each patient through two individualized parameters—the location μ and scale σ—characterizing the probability distribution of TSCDA (Methods). Using deep neural networks to directly learn these parameters from CMR images and from clinical covariates in a way that best models the survival data produces highly individualized survival probability predictions. Extended Data Fig. 2a illustrates individualized cause-specific survival curves (solid, blue) for a patient with TSCDA around 6 years (left panel) and a patient censored (non-SCDA event) at around 7 years (right panel). In both cases, the survival curves estimated by SSCAR accurately predict the event probabilities: in the first case, the estimated survival probability crosses the 50% threshold close to the event time; in the censored case, SSCAR predicts more than 80% probability of survival at the time of the (non-SCDA) event. For reference, two commonly used survival curves are depicted: the Kaplan–Meier estimate (purple, dot-dashed) and the Breslow estimate based on a Cox proportional hazards model using the clinical covariates (green, dashed), demonstrating worse performance by underestimating the risk for the patient with SCDA and overestimating the risk for the censored patient. Further details on SSCAR’s internal performance compared to the Cox proportional hazards model are presented in Fig. 3.

Fig. 3: Model comparison.
figure 3

In blue (left y axis), c-index (dark blue), balanced accuracy (BA, mid-blue), F-score (light blue) and in red (right y axis) integrated Brier score (\(\overline{\,{{\mbox{Bs}}}\,}\)) are shown for a standard Cox proportional hazards model fit on the clinical covariates (linear Cox PH), the covariate sub-network of the SSCAR approach using clinical covariates with a Cox survival model (covariate only, Cox), the covariate sub-network of SSCAR using clinical covariates and the log-logistic survival model (covariate only), the CMR sub-network using images only (CMR only) and the full arrhythmia prediction neural network model (SSCAR). Random chance performance thresholds are shown using dotted lines. All performance measures are calculated using data up to τ = 10 years. All model comparison values are based on averages over 100 cross-validation train/test splits of the internal validation dataset. The error bars represent approximate 95% CIs.

Source data

The predicted location parameter estimates the most probable TSCDA, and the predicted scale parameter provides a measure of confidence for the location. The inclusion of both a location and a scale parameter in the model offers the advantage of building in uncertainty directly into the TSCDA prediction. Notably, this uncertainty is patient-specific and learned from data. Extended Data Fig. 2b presents examples of predicted TSCDA probability distributions for two patients (P1 and P2) with different scale parameters, visualized as the widths of the distributions. Shown are the actual (dotted) and predicted (solid) TSCDA as well as the probability distributions (shaded). For P1, the prediction error is small (solid versus dashed vertical lines), and the model is certain, as seen by the narrower probability distribution of P1’s TSCDA or, equivalently, a smaller predicted scale parameter. In the case of P2, the prediction error is larger, and the model predicts a wider distribution or, equivalently, a larger scale parameter, indicating higher uncertainty. Remarkably, using the entire internal cohort to quantify this direct relationship between prediction error—calculated as the relative mean absolute difference of actual and predicted times—and scale parameter reveals significant positive correlation (Pearson’s r = 0.42, P < 0.001), demonstrating that SSCAR recognizes which predictions of TSCDA will turn out inaccurate and ‘lowers the confidence’ in them through a larger scale parameter.

Image-based risk prediction

The CMR sub-network (see Extended Data Fig. 3 for architecture details) in SSCAR integrates neural network DL on images within an overall statistical survival model. This branch of SSCAR uses LGE-CMR—a modality uniquely suited for visualizing ventricle geometry and portions of the myocardium with contrast-enhanced remodelling—to learn image features most useful in predicting a patient’s survival TSCDA. CMR raw pixel values from the automatically segmented left ventricle are directly provided to the network, eliminating the need for arbitrary thresholds aiming to delineate areas of enhancement. Using only images as inputs (Fig. 3 and Supplementary Table 1), SSCAR achieves 0.70 (95% CI: 0.67–0.72) c-index and 0.17 (95% CI: 0.167–0.178) \(\overline{\,{{\mbox{Bs}}}\,}\) for event data truncated at 10 years on the internal validation set. On the external testing set, the CMR-only model achieves 0.63 (95% CI: 0.59–0.66) c-index and 0.19 (95% CI: 0.186–0.200) \(\overline{\,{{\mbox{Bs}}}\,}\). It is noteworthy that, among the covariate sub-network’s 22 clinical covariates, it already includes manually engineered features from the CMR images. For example, infarct size—calculated as the percentage of left ventricular tissue deemed fibrotic using manual segmentation performed by trained experts—was among the 22 and, indeed, had considerable effect on lowering TSCDA. Despite including CMR-based features in the covariate network, the CMR sub-network (using only CMR as inputs) achieves similar performance to the covariate one (Fig. 3). Furthermore, ensembling the two sub-networks together leads to a significant increase in overall performance compared to using just the covariate-based one, demonstrating that the CMR sub-network identifies different CMR-based features than the manually engineered ones.

Imaging features learned by the CMR network can be interpreted using a gradient-based sensitivity analysis (Fig. 4a). The gradient here quantifies the effect on the predicted TSCDA of features identified by the CMR neural network, which are averaged per patient to form the gradient map (Methods). This map overlaid on the myocardium (right column, blue and red heat map) shows the degree of contribution of the local pixel intensity to the most probable TSCDA (that is, to the location parameter) for a patient without an SCDA event (top) and one with SCDA (bottom). Myocardial regions found to be characterized with large positive gradient (dark blue) are interpreted as having high importance in increasing TSCDA, and, conversely, regions with large magnitude negative gradient (dark red) represent areas that are responsible for decreasing the predicted TSCDA. The areas of contrast-enhanced myocardium (middle column, in brighter green) do not fully overlap with the gradient map, which suggests that, although features learned by the CMR neural network may co-localize with enhanced tissue, the algorithm does not act as a mere enhancement locator. For example, the patient who did not experience SCDA has contrast-enhanced tissue, but the effect of these regions is to increase the predicted TSCDA, suggesting a nuanced relationship between presence of enhancement and propensity of SCDA.

Fig. 4: SSCAR interpretation.
figure 4

The features learned by SSCAR are interpreted by performing a gradient-based sensitivity analysis of the location parameter (the most probable TSCDA) to changes in the neural network input or features. The gradient value quantifies this sensitivity. The magnitude of the gradient measures the strength of the sensitivity of the predicted TSCDA to inputs or intermediary features. The sign of the gradient shows the direction of the effect. That is, for a small increase in the value of inputs or features, a positive gradient (blue) indicates a higher predicted TSCDA, whereas a negative gradient (red) indicates a decrease in the predicted TSCDA. a, Shown is the CMR sub-network feature interpretation for an example patient who did not experience SCDA (No SCDA, top) and for a patient who did (SCDA, bottom). For each patient, a subset of 3 of the 12 contrast-enhanced short-axis CMR images (corresponding to three locations in the heart, base to apex, top to bottom, left column) used as inputs by SSCAR are overlaid with blood pool and myocardium segmentation (middle column, orange and green, respectively). A heat map of extracted features scaled by the value of the gradient shows contribution of the local pixel intensity to the predicted location parameter for the last convolutional layer (right column, blue and red heat maps). Of note, although the patient with SCDA shows high gradients in areas with contrast enhancement, the patient on the left shows that enhancement can also lead to positive gradients, suggesting that the network does not simply create a mask of the enhanced regions to make predictions but learns a nuanced relationship between scar and propensity for SCDA. b, Covariate sub-network interpretation based on an average of all patients (mid-blue and mid-red bars), patients with SCDA (dark blue and dark red bars) and patients with no SCDA (light blue and light red bars). Top four highest (blue bars) and bottom four lowest (red bars) average gradients of the neural network output (that is, the predicted location parameter) with respect to the clinical covariate inputs are shown. The error bars represent approximate 95% CIs. LVEF CMR, left ventricular ejection fraction computed from CMR; betablock, use of β-blocker medication; ECG hr, heart rate from ECG; digoxin, use of digoxin medication; infarct %, infarct size as % of total volume; ECG QRS, QRS complex duration from ECG; LV mass ED, left ventricular mass in end-diastole.

Source data

Non-linear neural network for covariate data

SSCAR incorporates patient clinical covariate data (Table 1) through the use of a dense, multi-layer neural network (Fig. 1, green panel). This sub-network discovers and extracts potential non-linear relationships between the covariates and integrates them within SSCAR’s overall survival predictions. We demonstrate the utility of the sub-network by comparing its performance with a (linear) Cox proportional hazards model (Fig. 3). To avoid mis-attributing performance differences to the underlying statistical models, we consider an intermediary model that uses neural network feature extraction with a Cox proportional hazards model. Using clinical covariate data only, SSCAR with a Cox survival model (covariate only, Cox) outperforms the standard Cox proportional hazards model (linear Cox PH) in terms of c-index (0.73 versus 0.58, dark blue, left y axis), balanced accuracy (0.65 versus 0.45, mid-blue, left y axis), F-score (0.78 versus 0.69, light blue, left y axis) and \(\overline{\,{{\mbox{Bs}}}\,}\) (0.14 versus 0.30, red, right y axis). We show that the neural network model maintains interpretability by performing a sensitivity analysis of the predicted TSCDA with respect to changes in the covariates (Fig. 4b). As above, high positive gradients (blue) denote covariates for which small increases in their values lead to large increases in TSCDA, whereas small negative gradients (red) represent covariates for which small increases lead to large decreases in TSCDA. The top four positive gradient covariates are LVEF computed from CMR, β-blocker medication, heart rate computed from electrocardiogram (ECG) and use of digoxin. The bottom four negative gradient covariates are left ventricular mass at end-diastole, use of diuretic medication, QRS duration computed from ECG and infarct size (%).

Discussion

Here we present an approach to SCDA risk assessment, termed the SSCAR framework, which uses a deep neural network survival model to predict patient-specific survival curves in ischemic heart disease. SSCAR consists of two neural networks: (1) a 3D convolutional network learning on raw unsegmented for enhancement LGE-CMR images that visualize heart-disease-induced scar distribution and (2) a fully connected network operating on clinical covariates. SSCAR’s predicted patient-specific survival curves offer accurate SCDA probabilities at all times up to 10 years. SSCAR is not only a highly flexible model, able to capture complex imaging and non-imaging feature inter-dependencies, but is also robust owing to the statistical framework governing the way these features are combined to fit the survival data. Our framework predicts entire probability distributions for the TSCDA, allowing for uncertainties in predictions to be themselves patient-specific and learned from data, thereby equipping the model with a self-correction mechanism. This approach remedies a well-known important limitation of neural networks—the high confidence in erroneous predictions. SSCAR’s integration of deep neural network learning within a survival analysis and the resulting detailed outputs could represent a paradigm shift in the approach to SCDA risk assessment.

Despite many heralding DL as the arrival of the artificial intelligence (AI) age in personalized healthcare17,18,19,20,21, no considerable progress has so far been made using DL on contrast-enhanced cardiac images to assess arrhythmia risk. Although there have been non-DL efforts to incorporate clinical imaging-derived features in SCDA risk stratification22,23,24, these severely underuse the data, suffering from two main limitations: features often rely on time-consuming, manual processing steps, typically involving arbitrarily chosen image intensity thresholds; or features are either too coarse to capture the intricacies of the scar distribution or highly mathematical, undermining their physiological underpinning. On the other hand, the DL efforts related to arrhythmia have focused primarily on its cardiologist-level detection in ECG signals25,26,27,28,29. In the current work, we present a DL approach that takes as input directly raw, unsegmented LGE-CMR images and automatically identifies features that best model and predict the TSCDA.

SSCAR is an SCDA risk prediction model that combines raw imaging with other data types in the same DL framework. Our technology operates on LGE-CMR images and clinical covariates within a unified feature learning process, allowing for the different data types to synergistically inform the overall survival model. Among the clinical covariates used in SSCAR are standard manually derived imaging features, which prevents the CMR neural network from merely re-discovering these known features and, instead, encourages it to learn new features. SSCAR achieves performance that is beyond the state-of-the-art in both relative terms—SCDA risk ordering among patients—and absolute terms—accurately calibrated probabilities of SCDA. Our robust testing scheme overcomes important limitations of previous work on SCDA risk prediction10,16,22,23,30. First, we demonstrate high generalizibility by computing internal cross-validation performance numbers resulting from 100 train/test splits of the data and, notably, on an entirely separate external cohort, showing modest performance degradation. Second, our approach prevents the model from being over-tuned to a certain time horizon by computing performance metrics at multiple time points up to 10 years.

Because SSCAR is a combination of neural networks, each working on different data types (images and clinical covariates), we were able to perform a comprehensive bottom-up analysis of overall performance. We demonstrated that the added complexity of our DL approach—potentially at some expense to interpretability—is justified by the significantly elevated performance numbers. Indeed, we developed and evaluated a regularized Cox proportional hazards model using the available clinical covariates to serve as a baseline for the rest of the analysis. We showed that the neural-network-driven feature extraction of SSCAR on the same covariates performs significantly better in the same proportional hazards setting, highlighting the importance of non-linear relationships in the covariates. Furthermore, we showed that, even when using only LGE-CMR images to predict arrhythmia risk, the CMR neural network in SSCAR (1) outperforms the Cox proportional hazards model constructed using clinical covariates, which include standard imaging and non-imaging features, and (2) performs on par with the covariate-only network in SSCAR using the same clinical variables, suggesting that the image-only neural network in SSCAR is able to identify highly predictive imaging features in the LGE-CMR images. Finally, we demonstrate that the imaging features found by SSCAR’s CMR network cannot be explained away even when considering non-linear relationships between standard covariates, as evidenced by the ensembled SSCAR model superior performance over SSCAR using either data type.

Notably, a level of interpretability is embedded in the overall design of the custom neural network used in SSCAR. Interpretability of AI algorithms is paramount to their broad adoption, and concerns surrounding it are particularly prevalent in healthcare. In our approach, we take multiple steps to ensure the relevance and interpretability of resulting features. Our sensitivity analysis of the outputs to the extracted features offers a lens into the neural network, rendering some transparency to the algorithm ‘black-box’ (Fig. 4). In addition, CMR images taken as input by the CMR neural network are automatically segmented to include myocardium-only raw intensity values, and the network is designed as an encoder–decoder to ensure minimal loss of information during the feature extraction process.

SSCAR achieves strong performance despite working on a relatively small dataset. A concern with DL on smaller datasets is overfitting, which manifests itself as high performance during training (good fit) but poor performance when applied to a new test set. Indeed, the results in this paper show some differences between metrics on the internal validation and external test cohorts. However, we emphasize that, although the two cohorts’ covariates were ‘harmonized’ where possible (Methods), they represent two different distributions (for example, low versus moderately reduced LVEF, unmatched versus matched case–control and three versus 60 CMR acquisition sites), likely accounting for any performance differences in the two populations. Furthermore, several measures were taken to mitigate overfitting. In addition to standard techniques—dropout, kernel and bias regularizers—we designed the CMR sub-network as an encoder–decoder that uses the distilled features used in risk prediction to also re-construct the original image as an additional regularization technique. Finally, all numbers cited on the internal validation set are averages of the test performance of hundreds of train/test data splits, adding a layer of statistical rigor.

In SSCAR, we directly model the cause-specific hazard rate and use the implied survival function to make predictions. A potential shortcoming of models that do not directly model competing risks is that predicted probabilities for the event of interest assume a reality where no other type of death could occur, thereby potentially undermining interpretability. A limitation here is that we could not compute the cause-specific cumulative incidence function, as it requires additional all-cause mortality data as well as competing risk data (for example, revascularization data). However, should such data become available, our competing risk framework makes such an extension straightforward.

An additional limitation in this work is that the list of covariates is not comprehensive. Few standard clinical covariates were dropped when ‘harmonizing’ the internal and external cohorts (for example, all diuretic types were merged into one variable, and there were no angiotensin receptor-neprilysin inhibitor data). However, because no left ventricle standard imaging covariates were excluded, we do not expect any of the omitted variables to affect conclusions drawn regarding the performance of the sub-components of SSCAR relative to the baseline Cox model. Including additional covariates identified in past work as predictors of SCDA, but not part of standard clinical practice, was beyond the scope of our work. However, these could, in principle, erode the performance of the image-based feature extraction in SSCAR in favor of the covariate-only part. Nevertheless, we would expect that, in general, including more variables with proper regularization can only improve the overall results in SSCAR, even if a re-balance of its components’ performance contribution occurs. Similarly, including right ventricle CMR images and parameters and adjusting the methodology accordingly could help generalize SSCAR to more cardiomyopathies.

SSCAR fuses cutting-edge DL technology with modern survival analysis techniques. It represents innovation in CMR imaging feature extraction and learning of non-linear relationships among standard clinical covariates. The technology aims to transform clinical decision-making regarding arrhythmia risk and patient prognosis by encouraging practitioners to eschew the view of predicted risk as a single number outputted by a ‘black-box’ algorithm but, rather, to be guided by the estimated time-to-outcome in the context of patient-specific time prediction uncertainty, which is itself built in SSCAR’s learning process. Through its accurate predictions and considerable levels of generalizability and interpretability, SSCAR represents an essential step toward bringing patient trajectory prognostication into the age of AI.

Methods

The research protocol used in this study was reviewed and approved by the Johns Hopkins University institutional review board and the Brigham and Women’s Hospital institutional review board. All participants provided informed consent to be part of the clinical studies described below. There was no participant compensation.

Patient population and datasets

This study was a retrospective analysis based on a subset (n = 269) of patients selected from the prospective clinical trials described below using the process outlined in Extended Data Fig. 4. Of note is that the entire model development described in this manuscript was based on the internal cohort (see below), whereas the case–control external cohort was used exclusively for testing (outcomes were solely used for computing relevant metrics once the model was fixed).

LVSPSCD cohort (internal)

Patient data came from the LVSPSCD study (ClinicalTrials.gov ID NCT01076660) sponsored by Johns Hopkins University. As previously described11,13, patients satisfying clinical criteria for ICD therapy for SCDA (LVEF ≤35%) were enrolled at three sites: Johns Hopkins Medical Institutions (Baltimore, Maryland), Christiana Care Health System (Newark, Delaware) and the University of Maryland (Baltimore, Maryland). A total of 382 patients were enrolled between November 2003 and April 2015. Patients were excluded if they had contraindications to CMR, New York Heart Association (NYHA) functional class IV, acute myocarditis, acute sarcoidosis, infiltrative disorders (for example, amyloidosis), congenital heart disease, hypertrophic cardiomyopathy or renal insufficiency (creatinine clearance <30 ml min−1 after July 2006 or <60 ml min−1 after February 2007). The protocol was approved by the institutional review boards at each site, and all participants provided informed consent. CMR imaging was performed within a median time of 3 days before ICD implantation. The current study focused on the ischemic cardiomyopathy patient subset with adequate LGE-CMR, totaling 156 patients. As part of the clinical study, the participants had undergone single-chamber or dual-chamber ICD or cardiac resynchronization with an ICD implantation based on current guidelines. The programming of anti-tachycardia therapies was left to the discretion of the operators.

PRE-DETERMINE and DETERMINE Registry cohorts (external)

The PRE-DETERMINE (ClinicalTrials.gov ID NCT01114269) and accompanying DETERMINE (ClinicalTrials.gov ID NCT00487279) Registry study populations are multi-center, prospective cohort studies comprised of patients with coronary disease on angiography or documented history of myocardial infarction (MI). The PRE-DETERMINE study enrolled 5,764 patients with documented MI and/or mild to moderate left ventricle dysfunction (LVEF between 35% and 50%) who did not fulfill consensus guideline criteria for ICD implantation on the basis of LVEF and NYHA class (that is, LVEF >35% or LVEF between 30% and 35% with NYHA Class I heart failure) at study entry6. Exclusion criteria included a history of cardiac arrest not associated with acute MI, current or planned ICD or life expectancy of less than 6 months. The accompanying DETERMINE Registry included 192 participants screened for enrollment in PRE-DETERMINE who did not fulfill entry criteria on the basis of having an LVEF <30% (n = 99), an LVEF between 30% and 35% with NYHA Class II–IV heart failure (n = 19) or an ICD (n = 31) or who were unwilling to participate in the biomarker component of PRE-DETERMINE (n = 43). Within these cohorts, 809 participants had LGE-CMR imaging performed. Within this subset of patients, 23 cases of SCD occurred and were matched to four controls on age, sex, race, LVEF and follow-up time using risk set sampling. Of the resulting 115 patients, the current study focused on 113 patients with adequate LGE-CMR images for analysis. Finally, covariate data for this cohort were minimally ‘harmonized’ with the internal cohort, by retaining common covariates only. Some important differences between the external and internal cohorts remained, such as significantly higher LVEF in the external cohort.

LGE-CMR acquisition

The CMR images in the internal and external cohort were acquired using 1.5-T magnetic resonance imaging devices (Signa, GE Medical Systems; Avanto, Siemens). The exact software versions for the devices cannot be precisely retroactively ascertained given the very broad nature of the study. All were two-dimensional (2D) parallel short-axis left ventricle stacks. The contrast agent used was 0.15−0.20 mmol kg−1 of gadodiamide (Omniscan, GE Healthcare) or gadopentetate dimeglumine (Magnevist, Schering), and the scan was captured 10−30 minutes after injection. Owing to the multi-center nature of the clinical studies considered here, there were variations in CMR acquisition protocols. The most commonly used sequence was inversion recovery fast gradient echo pulse, with an inversion recovery time typically starting at 250 ms and adjusted iteratively to achieve maximum nulling of normal myocardium. Typical spatial resolutions ranged 1.5−2.4 × 1.5−2.4 × 6−8 mm, with 2−4-mm gaps. CMR images in the external cohort were sourced from 60 sites with a variety of imaging protocols, whereas those in the internal cohort originated from three sites and were more homogeneous. No artifact corrections were applied to the images. More details regarding on CMR acquisition can be found in previous work11,13,31,32.

Clinical data and primary endpoint

In both LVSPSCD and PRE-DETERMINE/DETERMINE cohorts, baseline data on demographics, clinical characteristics, medical history, medications, lifestyle habits and cardiac test results were collected (see Table 1 for a list of the common ones between the cohorts that were used in SSCAR). The primary endpoint for LVSPSCD was SCDA defined as therapy from the ICD for rapid ventricular fibrillation or tachycardia or a ventricular arrhythmia not corrected by the ICD. For the PRE-DETERMINE studies, the primary endpoint was sudden and/or arrhythmic death. Deaths were classified according to both timing (sudden versus non-sudden) and mechanism (arrhythmic versus non-arrhythmic). Unexpected deaths due to cardiac or unknown causes that occurred within 1 hour of symptom onset or within 24 hours of being last witnessed to be symptom free were considered sudden cardiac deaths. Deaths preceded by an abrupt spontaneous collapse of circulation without antecedent circulatory or neurological impairment were considered arrhythmic in accordance with the criteria outlined by Hinkle and Thaler16. Deaths that were classified as non-arrhythmic were excluded from the endpoint regardless of timing. Out-of-hospital cardiac arrests due to ventricular fibrillation that were successfully resuscitated with external electrical defibrillation were considered aborted arrhythmic deaths and were included in the primary endpoint.

Data preparation

The inputs to our model were the unprocessed LGE-CMR scans and the clinical covariates listed in Table 1. The training targets were the event time and event type (SCDA or non-SCDA). As a pre-processing step, the raw LGE-CMR scans were first segmented for left ventricle myocardium using a method based on convolutional neural networks developed and described in previous work33. In brief, this segmentation network consisted of three sub-networks: a U-net with residual connections trained to identify the entire region of interest; a U-net with residual connections trained to delineate the myocardium wall; and an encoder–decoder tasked with correcting anatomical inaccuracies that may have resulted in the segmentation. In this context, anatomical correctness was defined via a list of pass/fail rules (for example, no holes in the myocardium, circularity threshold and no disconnected components). Once each patient’s LGE–CMR 2D slices were segmented via this method, they were stacked, all voxels outside the left ventricle myocardium were zeroed out and the slices were sorted apex-to-base using DICOM header information and step-interpolated on a regular 64 × 64 × 12 grid with voxel dimensions 2.5 × 2.5 × 10 mm. These dimensions were chosen to make all patient volumes consistent with minimal interpolation from the original resolution while allowing enough room to avoid truncating the left ventricle. Finally, the input to the neural network model consisted of a two-channel volume (that is, 64 × 64 × 12 × 2). The first channel was a one-hot encoding of the myocardium and blood pool masks. The second channel had zeros outside of the myocardium and the original CMR intensities on the myocardium, linearly scaled by multiplication with half the inverse of the median blood pool intensity in each slice. To mitigate overfitting, train/time data augmentation was performed on the images, specifically 3D in-plane rotations in increments of 90° to avoid artifacts and panning of the ventricle within the 3D grid. The clinical covariate data were de-meaned and scaled by the standard deviation.

Survival model

Statistical fit

For each patient i, the outcome data were the pair (Xi, Δi), where Xi is the minimum between the time to SCDA from arrhythmia Ti and the (right) censoring time Ci, after which either follow-up was lost or the patient died due to a competing risk. The outcome Δi is 1 if the patient had the arrhythmic event before they were censored (TiCi) and 0 otherwise. We estimated the (pseudo-)survival probability function Si(t), the probability that the time to SCDA exceeds t. We modeled the Ti values as independent, each having a cause-specific hazard rate34 based on the log-logistic distribution with location parameter μi and scale parameter σi, such that \({S}_{i}(t;{\mu }_{i},{\sigma }_{i})=1/\{1+\exp [(\log t-{\mu }_{i})/{\sigma }_{i}]\}\). The patient-specific parameters μi and σi were modeled as outputs of neural networks applied to LGE-CMR images and clinical covariates, trained by minimizing the loss function given by the negative likelihood:

$$-\log {{{\mathcal{L}}}}=-\mathop{\sum}\limits_{i}\frac{\log {x}_{i}-{\mu }_{i}}{{\sigma }_{i}}+{\delta }_{i}\log {\sigma }_{i}+(1+{\delta }_{i})\log \left[1+\exp \left(\frac{\log {x}_{i}-{\mu }_{i}}{{\sigma }_{i}}\right)\right],$$

where xi is the observed time and δi is the censoring status. With μi and σi estimated, the patient-specific survival functions were given by Si(t) as above.

Performance metrics

The all-time performance of the models was evaluated using two measures. The first was Harrell’s c-index14 with the patient-specific μi parameters as the risk scores (\(\exp ({\mu }_{i})\) is the mode of the log-logistic distribution) to gauge the model’s risk discrimination ability. The second was the integrated Brier score15, which is defined as the time-average of mean squared error (MSE) between true 0/1 outcome and predicted outcome probability and gauges both probability calibration and discrimination. Both measures were adjusted for censoring, corrected by weighing with the inverse probability of censoring, and calculated for data before a given cutoff time τ35; if unspecified, τ = 10 years, corresponding with the maximum event time in the dataset. Metrics derived from the confusion matrix (for example, precision and recall) were computed at several time points (τ = 2, 3… years). Probability thresholds at these times were selected by maximizing F-score (for precision, recall and F-score) or Youden’s J statistic (for sensitivity, specificity and balanced accuracy) on the training data. Of note, to preserve consistency in evaluation between the internal and external cohorts, metrics computed on the external cohort were not covariate adjusted, potentially underestimating performance36.

Neural network architecture

SSCAR is a supervised survival analysis regression model composed of two sub-networks, each operating on different input types (Fig. 1): a convolutional sub-network (‘CMR’), which takes the LGE-CMR images as inputs, and a dense sub-network (‘covariate’). which uses the clinical covariate data. Feature extraction in the CMR sub-network from the LGE–CMR images was achieved by a 3D convolutional encoder–decoder model. The encoder used a sequence of 3D convolutions and pooling layers, followed by one dense layer to encode the original 3D volume into a lower-dimensional vector. Non-linear activation functions and dropout layers were added before each downsampling step. The encoding was further used for two purposes: survival and reconstruction. For the survival branch, the encoding was first stratified into one of r (learned) risk categories (Supplementary Table 2) and then fed to a two-unit dense layer to predict—for each patient—a set of two parameters, location μ and scale σ, which fully characterized the probability distribution of the patient’s log-time to SCDA (see the ‘Statistical fit’ section), followed by a bespoke activation function. This activation function clipped \(\ln \mu\) on [ − 3, 3] and clipped σ from below at σmin, where σmin was found such that the difference between the 95th and 5th percentiles of the predicted TSCDA distribution was no less than 1 month. This survival activation function effectively restricted the ‘signal-to-noise’ ratio μ/σ. For the purpose of reconstruction, the encoding was decoded via a sequence of transposed convolutions to re-create the original volume. Feature extraction from the clinical covariate data was performed using a sequence of densely connected layers, followed by a dropout layer to prevent overfitting. The resulting tensor used a similar path to the one followed by the convolutional encoding to eventually map to the two survival parameters. Finally, once the two sub-networks were trained, they were frozen and joined using a learned linear combination layer to ensemble the survival predictions.

The predicted survival parameters (location and scale) aimed to minimize the aforementioned negative log-likelihood function for the log-logistic distribution, accounting for censoring in the data and class imbalance. The reconstructed output of the CMR sub-network minimized the MSE to the original input. Its contribution to total loss was learned to provide regularization to the imaging features extracted, ensuring that the survival fit relied on features able to reconstruct the original image. Both stochastic gradient descent (SGD) and Adam37 optimizers were used. All code was developed in Python 3.7 using Keras 2.2.4 (ref. 38), TensorFlow 1.15 (ref. 39), numpy 1.6.2, scipy 1.2.1, openCV 3.4.2, pandas 0.24.2 and pydicom 1.2.2. Each train/evaluate fold took 3–5 minutes on an Nvidia Titan RTX graphics processing unit.

Training and testing

The entire model development and internal validation were performed using the LVSPSCD cohort. After a hyperparameter tuning step, the best model architecture was then used on the entire internal validation set to find the best neural network weights. As the ensembling layer was hyperparameter free, it did not use hyperparameter tuning.

Hyperparameter tuning

A hyperparameter search was performed using the set of parameter values described in Supplementary Table 2, given the vast number of hyperparameter configurations available to define the model architectures. The package hyperopt 0.1.2 (ref. 40) was used to sample parameter configurations from the search space using the Parzen window algorithm to minimize the average validation loss resulting from a stratified ten-times-repeated ten-fold cross-validation process. The maximum number of iterations was 300 for the covariate sub-network and lowered to 100 for the CMR sub-network, given its highly increased capacity. Each fold was run using early stopping based on the loss value on a withheld 10% portion of the training fold with a maximum of 2,000 epochs (20 gradient updates per epoch). In hyperparameter tuning, models were optimized using SGD with a learning rate of .01 (default value in the neural network package used). The architecture with the highest Harrell’s c-index14 was selected. Hyperparameters deemed to have little effect on learning (for example, maximum number of epochs) were fixed. Convolutional kernel size and the activation function for convolutions were kept at the default values in the neural network package used. The batch size was set to the highest value, given the memory constraints of our hardware.

Internal validation and external test

Internal model performance was assessed using ten repetitions of stratified ten-fold cross-validation on the LVSPSCD cohort. Early stopping based on the c-index on a withheld 10% subset was implemented with a maximum training of 2,000 epochs (20 gradient updates per epoch). The optimizer was Adam with learning rate 10−5 for the CMR sub-network, 5 × 10−4 for the covariate sub-network and .01 for the ensemble. A final model was trained with all the available LVSPSCD data and tested on the PRE-DETERMINE cohort. Of note, the final model shares the same architecture and training parameters with all the models in the 100 internal data splits but has different (fine-tuned) weights, which are derived using the entire internal dataset. To estimate CIs on the external cohort, the same cross-validation process was applied to the PRE-DETERMINE cohort, supplementing the training data in each fold with the LVSPSCD cohort. Approximate normal CIs were constructed using the 100 folds.

Gradient-based interpretation of SSCAR

The trained network weights in SSCAR were interpreted for both covariate and CMR sub-network using the gradients of outputs with respect to intermediary neural network internal representations of data. For the CMR sub-network, we adapted Grad-CAM41 to work on regression problems and applied it to SSCAR by performing a weighted average of the last convolutional layer feature maps, where the weights were averages of gradients of the location parameter output with respect to each channel. The result was then interpolated back to the original image dimensions and overlaid to obtain the gradient maps shown (Fig. 4a, bottom row). For the covariate sub-network, the gradient of the location parameter output was taken with respect to each of the inputs and averaged over three groups: all patients, patients with SCDA and patients with no SCDA.

Statistical analysis

All values reported on the internal validation data set were averages over 100 data splits resulting from a ten-times-repeated ten-fold stratified cross-validation scheme. Values reported on the external test dataset represented a single evaluation on the entire set. All CIs were normal approximations resulting from the aforementioned 100 splits. In computing CIs for the external test set, the same procedure was used on all available data, ensuring that test folds came exclusively from the external dataset. Error bars are standard errors with sample standard deviation estimated from the 100 splits. Correlation P value was based on the exact distribution under the bivariate normal assumption. Covariate P values are based on two-sample Welch’s t-test42 for continuous variables and Mann–Whitney U-test for categorical variables. Cox proportional hazards analysis was performed using the Python lifelines 0.25.5 (ref. 43) package; it included a hyperparameter sweep for the 1 and 2 regularization terms and followed the same train/test procedure as the neural network models.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.