Enhancing cardiovascular risk prediction through AI-enabled calcium-omics

Whole-heart coronary calcium Agatston score is a well-established predictor of major adverse cardiovascular events (MACE), but it does not account for individual calcification features related to the pathophysiology of the disease (e.g., multiple-vessel disease, spread of the disease along the vessel, stable calcifications, number of lesions, and density). To assess MACE risk, we used novel, hand-crafted calcification features (calcium-omics); Cox time-to-event modeling; elastic net; and up and down synthetic sampling methods for imbalanced data. We used 2457 CT calcium score (CTCS) images enriched for MACE events from our large no-cost CLARIFY program (ClinicalTrials.gov Identifier: NCT04075162). Among calcium-omics features, numbers of calcifications, LAD mass, and diffusivity (a measure of spatial distribution) were especially important determinants of increased risk, whereas dense calcification (> 1000 HU, stable calcifications) was associated with reduced risk. With an 80/20 training/testing split, our calcium-omics model gave a C-index of 80.5%/71.6% and a 2-year AUC of 82.4%/74.8% (training/testing). Although the C-index is notoriously impervious to model improvements, calcium-omics compared favorably with Agatston, with a significant difference (P < 0.001). The calcium-omics model identified 73.5% of MACE cases in the high-risk group, a 13.2% improvement compared with Agatston, suggesting that calcium-omics could be used to better identify candidates for intensive follow-up and therapies. The categorical net reclassification index was NRI = 0.153. Our findings from this exploratory study suggest the utility of calcium-omics for improved risk prediction. These promising results will pave the way for more extensive, multi-institutional studies of calcium-omics.


S.1 Detailed feature engineering
The three main traditional whole-heart scores (Agatston, mass, and volume) are given below.
1. Agatston score. The Agatston score uses a weighting factor that depends on the maximum HU value of each lesion in a 2D CT image. The total Agatston score was calculated by summing the product of the density weighting factor (DWF) and the 2D area of each calcified lesion. (Some larger 3D lesions have multiple 2D entries.) The whole-heart Agatston score is obtained as:
$$\text{Agatston} = \frac{ST}{3}\sum_{i=1}^{N} \mathrm{DWF}(HU_i)\,A_i$$
Here, $N$ is the number of 2D lesions, $HU_i$ is the maximum HU value within the $i$-th lesion, and $A_i$ is the area of the $i$-th 2D lesion in mm². Because the Agatston score was originally calculated with a 3 mm slice thickness, we normalize values by the ratio 3/ST (i.e., multiply by ST/3), where ST is the new slice thickness in mm. Values for $\mathrm{DWF}(HU_i)$ are taken from [19].
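The weighted sum described above can be sketched in Python. This is a minimal illustration and not the authors' implementation. It assumes lesions have already been segmented into 2D entries. It uses the standard Agatston density weights as an assumption (1 for 130-199 HU, 2 for 200-299 HU, 3 for 300-399 HU, 4 for ≥400 HU). It also applies the slice-thickness normalization as a multiplication by ST/3. The names `Lesion2D`, `density_weighting_factor`, and `agatston_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lesion2D:
    max_hu: float    # maximum HU value within the 2D lesion
    area_mm2: float  # lesion area on this slice, in mm^2


def density_weighting_factor(max_hu: float) -> int:
    """Standard Agatston weights (assumed): 130-199 -> 1, 200-299 -> 2, 300-399 -> 3, >=400 -> 4."""
    if max_hu < 130:
        return 0  # below the usual 130 HU calcium threshold
    if max_hu < 200:
        return 1
    if max_hu < 300:
        return 2
    if max_hu < 400:
        return 3
    return 4


def agatston_score(lesions: list[Lesion2D], slice_thickness_mm: float) -> float:
    """Whole-heart Agatston: sum of DWF * area over 2D lesions, rescaled to 3 mm slices."""
    raw = sum(density_weighting_factor(les.max_hu) * les.area_mm2 for les in lesions)
    return raw * slice_thickness_mm / 3.0  # assumed normalization: multiply by ST/3


lesions = [Lesion2D(max_hu=180, area_mm2=10.0), Lesion2D(max_hu=450, area_mm2=5.0)]
print(agatston_score(lesions, 3.0))  # 1*10 + 4*5 = 30.0
```

For example, a 180 HU entry of 10 mm² and a 450 HU entry of 5 mm² on 3 mm slices give 30. The same two entries read from 1.5 mm slices are scaled to 15.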
In addition to whole-heart aggregated features (Agatston, volume, and mass scores), we collected lesion, lesion-to-lesion, and artery-wise features. We calculated per-artery Agatston, mass, and volume scores. We also engineered additional calcium-driven features, including:
• aggregated lesion areas;
• HU statistics (min, max, mean, median, and standard deviation);
• the distance from the first slice to the last calcification;
• the distance from the first to the last lesion along descending arterial lesions.
We also collected lesion-based histogram statistics of HU values: first moment, second moment, mean, skewness, kurtosis, and average HU. Some of these features are briefly explained below, where 2D features are slice-based and 3D features are volume-based.
1- Numerical features include:
• Area2D (whole-heart sum of lesion areas across all slices)
• NumLesion3D (whole-heart number of 3D lesions)
• numLesionPerArtery3D_<<name>> (number of 3D lesions in the specified artery)
• per-artery HU statistics ([min, max, mean, std] of the Hounsfield Units of all calcified voxels within artery <<name>>)
• <<name>>_diffus (diffusivity of lesions within artery <<name>>, calculated as the ratio of the number of lesions to the Euclidean distance along the lesions within the artery, from the first to the last lesion. A non-calcified artery has zero diffusivity, and a single-lesion artery has a diffusivity of one.)
2- Categorical and conditional (Boolean) features include:
• isAgZero (is the Agatston score equal to zero?)
• isLesion3DBelow5 (is the number of lesions less than 5?)
• AgGroupX1-X3 (Agatston score groups 0, 1-99, 100-399, and 400+, encoded as three Boolean digits X1, X2, X3 for use in the Cox model)
• isArt2plus (are there two or more calcified arteries?)
• isArt3plus (are there three or four calcified arteries?)
• numArtCalc (number of calcified arteries, 0-4)
• HU1000 (does the patient have any calcified lesion with an HU value above 1000?)
These image-based engineered features are listed in Table S1. We excluded features that were clinical or highly correlated. From the remaining 61 features, an elastic net with 10-fold cross-validation selected 40 features, as indicated, with their corresponding Cox model coefficient values. The elastic-net Cox proportional hazard model was designated the calcium-omics model.
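A few of the features listed above can be sketched as plain Python functions. The helper names are hypothetical. Several details are assumptions not stated in the text:
• lesion centroids are ordered from the first to the last lesion along the artery;
• the "Euclidean distance along lesions" is the sum of distances between consecutive centroids, in mm;
• the four arteries are LM, LAD, LCX, and RCA;
• an Agatston score of 0 is the reference category for the three AgGroup digits.

```python
import math

Point = tuple[float, float, float]  # (x, y, z) lesion centroid in mm


def diffusivity(centroids: list[Point]) -> float:
    """<<name>>_diffus: number of lesions / path length from the first to the last lesion.

    Assumes centroids are ordered along the artery, are distinct, and that the
    path length is the sum of Euclidean distances between consecutive centroids.
    """
    n = len(centroids)
    if n == 0:
        return 0.0  # non-calcified artery: zero diffusivity
    if n == 1:
        return 1.0  # single-lesion artery: diffusivity set to one
    path_mm = sum(math.dist(a, b) for a, b in zip(centroids, centroids[1:]))
    return n / path_mm


def agatston_group_dummies(agatston: float) -> tuple[int, int, int]:
    """AgGroupX1-X3 for groups 1-99, 100-399, and 400+; a score of 0 gives all zeros."""
    return (int(0 < agatston < 100), int(100 <= agatston < 400), int(agatston >= 400))


def boolean_features(agatston: float, n_lesions_3d: int,
                     calcified_arteries: set[str], max_hu: float) -> dict:
    """Categorical/conditional features; arteries assumed to be LM, LAD, LCX, RCA."""
    n_art = len(calcified_arteries)
    return {
        "isAgZero": agatston == 0,
        "isLesion3DBelow5": n_lesions_3d < 5,
        "isArt2plus": n_art >= 2,
        "isArt3plus": n_art >= 3,
        "numArtCalc": n_art,
        "HU1000": max_hu > 1000,
    }
```

For instance, three lesions whose consecutive centroids are 5 mm and 12 mm apart give a diffusivity of 3/17 ≈ 0.18 lesions per mm.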

S.2 Time-to-event modeling with Cox proportional hazard model and elastic-net regularization
In a clinical study analyzed at a fixed time, with patients entering at various times, the observation time is censored. This requires time-to-event modeling rather than binary classification. A time-to-event model estimates the probability that the event (MACE in our study) has occurred within a follow-up period. Whether a patient had an event or was censored, the data can be modeled by a distribution function [20] of the patient survival time $T$, evaluated at observed time $t$. This is called the cumulative incidence function:
$$F(t) = P(T < t) = \int_0^t f(u)\,du,$$
where $f(\cdot)$ is the probability density function. $F(t)$ is thus the probability that the survival time is less than $t$. The survival function, $S(t) = 1 - F(t)$, is the probability that the survival time is greater than or equal to $t$:
$$S(t) = P(T \ge t) = 1 - F(t) = \int_t^{\infty} f(u)\,du.$$
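As a numerical check of these definitions, the sketch below approximates F(t) by integrating a density with the trapezoidal rule and returns S(t) = 1 − F(t). The exponential density (rate 0.1 per year) and the function names are hypothetical, chosen only because the exponential survival function, exp(−0.1t), is known in closed form.

```python
import math


def cumulative_incidence(pdf, t, steps=10_000):
    """F(t) = P(T < t) = integral of f(u) from 0 to t (trapezoidal rule)."""
    h = t / steps
    inner = sum(pdf(i * h) for i in range(1, steps))
    return h * (0.5 * pdf(0.0) + inner + 0.5 * pdf(t))


def survival(pdf, t):
    """S(t) = P(T >= t) = 1 - F(t)."""
    return 1.0 - cumulative_incidence(pdf, t)


# Hypothetical example: exponential event times with rate 0.1 per year.
rate = 0.1


def exp_pdf(u):
    return rate * math.exp(-rate * u)


print(round(survival(exp_pdf, 2.0), 4))  # prints 0.8187, matching exp(-0.2)
```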
The hazard function represents the instantaneous risk of an event occurring at time $t$, given survival up to $t$, and is defined as:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}.$$
The Cox proportional-hazards regression model [21] is widely used in survival modeling. It is semi-parametric: the baseline hazard is left unspecified, and each covariate acts multiplicatively on it:
$$h(t \mid X) = h_0(t)\exp\left(\sum_{j=1}^{p} \beta_j x_j\right),$$
where $h_0(t)$ is the baseline hazard, $X = [x_1, x_2, \ldots, x_p]$ is the covariate feature vector of $p$ features, and $\beta_j$ is the $j$-th covariate coefficient. The Cox model is fitted by maximizing the partial likelihood. We used Cox regression for univariate and small multivariable models to study feature effects and identify high-risk features.

There are several practical considerations. To avoid over-emphasizing large covariate values, we compressed the dynamic range of some covariates by taking a logarithm (e.g., log(Agatston score)). Because Cox modeling is sensitive to correlated features, the fitted coefficients may not reflect the actual effect of one feature relative to another. Too many features can also result in overfitting, so we used elastic-net regularization with cross-validation [22] to select the best features.
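The quantity optimized when the Cox model is combined with an elastic net can be sketched as follows. It is the negative log partial likelihood (Breslow handling of tied times) plus the penalty λ[α‖β‖₁ + (1 − α)‖β‖²₂/2], which is the mixing form used by glmnet. This is an illustrative objective only, not the authors' code. The function names, the averaging of the loss over samples, and the use of log(1 + x) to keep zero Agatston scores finite are all assumptions. In practice one would minimize such an objective with a dedicated solver, for example R's glmnet with `family = "cox"` and cross-validation via `cv.glmnet`.

```python
import math


def linear_predictor(beta, x):
    """eta = sum_j beta_j * x_j = ln[h(t | x) / h0(t)] in the Cox model."""
    return sum(b * xj for b, xj in zip(beta, x))


def hazard_ratio(beta, x_a, x_b):
    """h(t | x_a) / h(t | x_b); the baseline h0(t) cancels, so the ratio is constant in t."""
    return math.exp(linear_predictor(beta, x_a) - linear_predictor(beta, x_b))


def neg_log_partial_likelihood(beta, X, times, events):
    """Cox negative log partial likelihood with the Breslow approximation for ties."""
    eta = [linear_predictor(beta, x) for x in X]
    nll = 0.0
    for i, (t_i, event_i) in enumerate(zip(times, events)):
        if not event_i:
            continue  # censored patients enter only through the risk sets
        risk_set = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk_set)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(e - m) for e in risk_set))
        nll -= eta[i] - log_denominator
    return nll


def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def penalized_objective(beta, X, times, events, lam, alpha):
    """Average negative log partial likelihood plus the elastic-net penalty."""
    return (neg_log_partial_likelihood(beta, X, times, events) / len(times)
            + elastic_net_penalty(beta, lam, alpha))


# Log-compressed covariate, assumed to be log(1 + Agatston) so zero scores stay finite.
X = [[math.log1p(a)] for a in (0.0, 204.0, 500.0)]
```

With all coefficients at zero, every patient has the same risk. Three patients with distinct event times then contribute log 3 + log 2 + log 1 = log 6.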
To overcome the effect of low event rates [18], we down-sampled the majority class and then up-sampled the minority class, using a modified Synthetic Minority Oversampling Technique (modified-SMOTE) approach. For majority-class down-sampling, we used a few continuous features (e.g., Agatston score, mass score, and volume score) to determine which samples were eligible for removal, using k-nearest neighbors (KNN, k = 5) in feature space. For up-sampling, we created synthetic instances "nearby" actual samples in covariate space. Briefly, using the same features as in down-sampling, each synthetic instance was placed among k = 5 nearest neighbors. For the new sample, each continuous feature value was the median of the corresponding feature values of the k neighbors. Non-continuous (logical and categorical) feature values were copied from the single nearest neighbor. The time-to-event of the new instance was set randomly. Down-sampling removed 20% of the No-MACE cases, which increased the MACE event rate from 13.8% in the original data to 16.4%. Up-sampling then inserted new cases until MACE events made up 30% of the training data. We never applied up- or down-sampling to held-out test data.
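The up-sampling step can be sketched in Python. The sketch follows the description above: KNN with k = 5 on a few continuous features, neighbor medians for continuous features, and values copied from the single nearest neighbor for logical and categorical features. Several details are assumptions:
• rows are dictionaries;
• the seed sample is drawn at random from the MACE cases;
• the random time-to-event is drawn uniformly between the neighbors' minimum and maximum times;
• all function names are hypothetical.
The down-sampling step is not sketched because its KNN-based removal criterion is not specified.

```python
import math
import random
import statistics


def k_nearest(pool, query, keys, k=5):
    """Return the k rows of `pool` closest to `query` (Euclidean distance on `keys`)."""
    def distance(row):
        return math.sqrt(sum((row[key] - query[key]) ** 2 for key in keys))

    others = [row for row in pool if row is not query]  # exclude the query itself
    return sorted(others, key=distance)[:k]  # stable sort keeps pool order for ties


def synthesize_event(pool, knn_keys, continuous_keys, discrete_keys,
                     time_key="time", k=5, rng=None):
    """Create one synthetic MACE case from a pool of real MACE cases (needs >= 2 rows)."""
    rng = rng or random.Random()
    seed_row = rng.choice(pool)
    neighbors = k_nearest(pool, seed_row, knn_keys, k)
    new = {}
    for key in continuous_keys:  # continuous: median of the k neighbors
        new[key] = statistics.median(row[key] for row in neighbors)
    nearest = neighbors[0]
    for key in discrete_keys:  # logical/categorical: copy from the single nearest neighbor
        new[key] = nearest[key]
    times = [row[time_key] for row in neighbors]
    new[time_key] = rng.uniform(min(times), max(times))  # assumed randomization rule
    new["event"] = 1
    return new


def n_synthetic_needed(n_events, n_non_events, target_fraction):
    """Synthetic events needed so that events make up `target_fraction` of the data."""
    needed = target_fraction * n_non_events / (1.0 - target_fraction) - n_events
    return max(0, math.ceil(needed - 1e-9))  # tolerance guards against float round-up


def upsample(events, n_non_events, knn_keys, continuous_keys, discrete_keys,
             target_fraction=0.30, seed=0):
    """Return the original events plus enough synthetic events to reach the target rate."""
    rng = random.Random(seed)
    n_new = n_synthetic_needed(len(events), n_non_events, target_fraction)
    synthetic = [synthesize_event(events, knn_keys, continuous_keys, discrete_keys, rng=rng)
                 for _ in range(n_new)]
    return events + synthetic
```

For example, a training set with 100 events and 700 non-events would need 200 synthetic events for events to make up 30% of the data.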

S.3 Comparison between Agatston and Calcium-omics Cox models
The whole-heart Agatston score had a non-linear relationship with MACE events in the log hazard ratio regression curve (Fig. S1), whereas the calcium-omics model had a more linear curve. These curves were plotted by fitting a Cox model with a penalized spline of each score. We then calculated each patient's log hazard ratio to show the distribution along the regression curve. In an interactive 2D surface regression plot (right plot in Fig. S1), the calcium-omics model showed a wide range of risk levels for cases with similar Agatston scores, implying that it can distinguish between such cases. The contours in this plot delineate areas that correspond to equivalent levels of disease severity. Within a narrow range of Agatston scores, calcium-omics covers a wide range of values.
Figure S2 shows the advantage of calcium-omics over the Agatston score model for two patients with approximately equal Agatston scores (~204). One has diffuse disease with 11 lesions in three territories (left), whereas the other has only two lesions in one territory (right).

Figure S1.
Log hazard ratio, i.e., ln[h(t)/h0(t)], regression plots for Cox models as a function of Agatston (A) and calcium-omics scores (B). Visualizations were generated with the visreg() and visreg2d() functions of the visreg R package. Briefly, we used the penalized spline (pspline) function to create the blue log hazard ratio curves. Each data point represents a patient's deviance residual, and the shaded gray areas show the 95% CI. Compared with the Agatston model, the calcium-omics model shows a desirable, linear distribution along the data points. In the Agatston model (A), a wide range of Agatston scores gives very similar results. Similar observations are shown in (C), where calcium-omics is plotted as a function of Agatston and the log hazard ratio is displayed as gradient-colored contours, from low (blue) to severe (red). At a given Agatston value, the log hazard calculated from calcium-omics varies considerably. For example, at an Agatston score of 500, the calcium-omics model covers several levels of severity, which suggests added value.
Table S1. List of all selected and excluded (before applying the elastic net) image-based, calcification-driven features. Nineteen features were excluded before the proposed model was designed because of their high correlation with other features. We used them to design univariate and multivariable Cox models for investigation and for comparison with other models (such as Agatston score, HU1000, and LAD_diffus). Among the 61 listed features, an elastic net with 10-fold cross-validation selected 39 features, as indicated, with their corresponding Cox model coefficient values. The elastic-net Cox proportional hazard model was designated the calcium-omics model. These features were used to design the calcium-omics Cox model using training data without sampling.

Figure S2. Whole-heart Agatston does not reflect the spread of disease and risk for these two patients, both of whom have a whole-heart Agatston score of ~204. The left heart has 11 calcifications spread throughout the heart (LAD: 6, LCX: 3, and RCA: 2), with per-artery Agatston scores of 29.6, 84.4, and 90.8, respectively. The right heart has two "nearby" large calcifications (LM: 1, LAD: 1) with Agatston scores of 108.2 and 95.6, respectively. Both patients are from the held-out test set and are the same age (~67 years). Despite the equal whole-heart Agatston scores, the calcium-omics model predicted a 3-year risk for the left heart 2.3 times that of the right heart. The patient on the left had a MACE event, whereas the patient on the right did not.
Selected 61 engineered features (elastic-net selected only 39, shown with coefficients)