Introduction

There is a great need for precision risk tools to guide personalized prevention strategies for heart health. While cardiovascular risk can be estimated using many widely available cardiovascular risks scores from clinical factors, most scores suffer from poor discrimination1. The CT calcium score (CTCS) imaging exam can provide direct evidence of coronary atherosclerosis when calcifications are present in the coronary arteries and is acknowledged by several guidelines as a preferred risk assessment tool2,3. The presence of coronary artery calcium (CAC) is by far the best predictor of future major adverse cardiovascular events (MACE) outperforming every other risk factor and composite clinical risk scoring approaches. The addition of CAC score to traditional risk factors has been shown to consistently improve discrimination and reclassification4. Despite their acknowledged superiority over current risk assessment approaches, current approaches for CAC-based risk prediction are overly simplistic and suffer from a number of limitations. The Agatston method simply uses a non-linear weighted sum of the areas of coronary artery calcium (CAC) with densities above 130 HU. A calcium mass score is known to be more reproducible5. Importantly, current CAC scoring approach ignores a plethora of other CAC features that may be pathophysiologically important, including density, distribution, geometry, and others. Some alternatives have appeared in the literature (e.g., spatial distribution, diffuse CAC, and high-density calcified plaque6,7,8) but never in combined fashion. Other pathophysiologic observations on calcifications have suggested a number of aspects that could be important but are currently not incorporated in the analysis of CTCS images.

The use of CTCS imaging has been re-invigorated with AHA guideline recommendations for the test. Many sites offer low-cost exams, and our institution offers a no-cost CT calcium score exam. As a result, large data sets are being accrued, buoying our interest in creating a more intensive AI analysis approach than has been done to date.

In this work, we evaluate a novel machine approach that includes numerous hand-crafted features aimed at capturing pathophysiology of atherosclerotic calcifications (calcium-omics) and evaluate their collective ability to predict MACE in time-to-event models. Our calcium-omics includes for the first-time combinations of shape, mass, density, volume, number of calcifications, diffusivity, and others which together could potentially better capture a patient’s risk of a MACE event. We use Cox time-to-event modeling, elastic net, up and down synthetic sampling methods for imbalanced data, and determine our ability to separate low and high-risk groups. In addition to determining risk from combined calcium-omics, we use Cox models on individual features and subsets of features in order to determine explainable high-risk characteristics in the images.

Methods

Study data

Non-contrast CTCS images were acquired from a variety of CT scanners using 120-kVp, nominally 30-mAs, with an average 0.5 × 0.5-mm in-plane voxel spacing and 2.5-mm slice thickness. A typical CTCS volume consists of 40 slices of 512 × 512 voxels, giving 10.5 million voxels per volume. We used CTCS images from 2457 patients (single CTCS volume per patient) enriched for MACE (13.8%), with characteristics in Table 1. MACE was defined as first event of myocardial infarction, stroke, coronary revascularization, or all-cause mortality. Cardiovascular outcomes were obtained from the UH CLARIFY study with a maximum of 6 years of follow-up (mean follow-up is 1.9 years). The included population had not experienced MACE, including revascularization, before the CTCS exam. The CTCS images utilized in this study were accompanied by manual segmentations conducted by experts from our institution as part of the clinical routine. The experts identified coronary calcifications and excluded other calcifications (e.g., aortic valve, aortic, or pulmonary artery calcifications). Experts excluded any stent, pacemaker, or other man-made objects. Patient’s MACE-free time is reported as the duration from the start time (time of CTCS exam) until the patient either had MACE or was censored (left the study or survived to the end of observation without MACE). This study on de-identified data was approved by the Institutional Review Board (IRB) of the University Hospitals Cleveland Medical Center. All scans in this study were obtained as part of clinical care and informed consent was obtained from all subjects and/or their legal guardian(s). Methods were carried out in accordance with relevant guidelines and regulations.

Table 1 Characteristics of our randomly chosen cohort of 2457 enriched with regards to MACE events.

Image analysis and risk prediction methods

Data preparation

Previously, patient images were analyzed using semi-automated commercial software. The criteria for calcification detection were according to prior standards endorsed by guidelines which specified three connected voxels with HU \(\ge \) 1309. Analysts went through each volume, slice-by-slice, and assigned each coronary calcification to a territory. For each heart, the software created a mask volume, identifying the calcifications in each territory with a different color and computed whole heart as well as territorial Agatston score. We excluded cases that had (1) poor image quality and (2) showed > 10% Agatston score difference between commercial and automated in-house deep learning software. As a preprocessing step, the color-coded masks were deciphered to obtain the proper territory, creating a clean mask volume. This step required special processing to ignore extra text labels embedded in the image. The pipeline of our proposed model is shown in Fig. 1.

Figure 1
figure 1

MACE prediction using calcium-omics features pipeline. In (1), CAC lesions in CTCS images were analyzed and labeled using semi-automated commercial software. In (2), calcium-omics features were engineered and categorized based on whole heart, territorial, and lesion features. In (3) MACE risk prediction model was designed using elastic-net and Cox model. In (4), results and statistical analyses were performed to assess the importance of our novel calcium-omics model compared to variety of univariate and multivariable Cox models.

Calcium-omics feature engineering

Using the mask volume as a guide, we created software to compute various calcium-omics features for each individual calcification, artery territory, and whole heart. For each individual calcification, we collected elemental features including mass, volume, territory, HU values, first moment, second moment, shape, distance to a subsequent lesion, distance to the top of the CT volume, artery diffusivity, among others. (Artery diffusivity was the ratio of number of calcified lesions to the Euclidian distance from first to last lesion within an artery) and represents the distribution of lesions within artery. For a territory with no calcifications or a single calcification, we set diffusivity to 0 and 1, respectively. In addition, additional statistical features such as mean, standard deviation, skewness, kurtosis, and small histogram were obtained per territory and for the entire heart. In total, we collected 80 calcium-omics features. Agatston, mass, and volume were obtained at the level of individual calcification, coronary territory, and whole heart levels. As demonstrated in Fig. 2, different features were aggerated within three levels (lesion calcification, artery, and whole heart). Details of calcium-omics features, and time-to-event modeling are described in the supplemental file.

Figure 2
figure 2

Individual calcifications and engineered calcium-omics features. Three consecutive calcifications in the LCX artery territory are shown in (A) and magnified in (B), where dashed lines annotate the vessel wall. Calcification masks are rendered in (C). Some features are aggregated along each artery, such as Agatston, mass, and volume scores, which give this LCX artery 84.4, 13.3, and 73.2, respectively. Calcification centroids are used to calculate the Euclidean distances between calcifications. The sub-voxel centroid (x, y, z) locations are used to calculate the calcified arterial distance to sum DistFirst2LastLesionPerArtery from a centroid to a centroid in consecutive sequential order. An example of a new feature is “DistFirst2LastLesionPerArtery_LCX” which represents the total Euclidean distance along lesions within LCX.

MACE risk prediction and performance evaluation

We randomly divided data into training/held-out-testing subsets with 80:20 ratio for all our experiments, maintaining a similar MACE-event ratio for training and testing sets. We used the natural logarithm function to condense features with broad-range values (e.g., Agatston and mass scores). Starting with 80 calcification features, we eliminated 19 irrelevant or highly autocorrelated features by univariate Cox modeling, leaving 61 features (Table S1).

To determine high risk features and enhance explainability of results, we investigated selected univariate and multivariable Cox models. We evaluated the impact of mass scores using multivariable Cox models and investigated the impact of adding features such as the number of lesions, max HU, distance-based along territories, and CAC distribution along territories (diffusivity) to the mass score model. As a machine learning technique, Cox modeling provides interpretable results that can explain the effect of those imaging features which would be unavailable in deep learning. To enhance comparisons, all Cox models in our study were trained and tested on the same data.

We selected the most informative and non-correlated features using elastic-net as implemented in R package \(glmnet\). Elastic-net was performed on the training subset using tenfold cross-validation, with \(\alpha =0.05\), and \(\lambda =0.074\), where these parameters were determined in preliminary evaluations. Out of the 61 engineered features, elastic net selected 39 features with non-zero coefficients (\(\beta \)). Features include whole heart features (e.g., mass score, volume score, and number of lesions), territorial features (e.g., mass score in LAD, number of lesions in RCA, and distance from top to last lesion in LCX), and calcification features (e.g., mass histogram bin, the maximum first momentum value of individual calcification, and the third skewness value of individual calcification), as shown in table S1. These features were aggregated into a single “calcium-omics” feature by summing the products of these features by their corresponding coefficients. We used R 4.2.110, the Cox model package \(coxph( )\), and elastic-net package \(glmnet( )\).

To evaluate the performance of those models, we utilized multiple time-to-event analyses. Standard metrics included C-index, AUC at fixed time points, and log-rank score. As C-index is notoriously incentive to model improvements11,12, we used other metrics to evaluate performance. Hazard ratios with confidence intervals are presented so as to isolate the impact of a single feature. In addition, we stratified risk groups and created Kaplan–Meier (KM) plots. We also computed categorical net reclassification improvement. In some instances, we compared groups using student’s t-test, with significant differences identified when p < 0.05.

Results

Data analysis and model subsets to identify high risk features

Histograms of selected calcium-omics features are shown in Fig. 3. Distributions for MACE and no-MACE have considerable overlap, eliminating the possibility of creating clear-cut decision rules for MACE based on single feature, with the exception of zero total calcium score. The lack of clear discriminating thresholds suggests the need for an AI approach using multiple features at once.

Figure 3
figure 3

Normalized histograms of feature values for the MACE and no-MACE groups. For the 80 features we analyzed, we found that no single feature, including Agatston and mass scores, gave strong visual evidence of differences between groups. However, because of the large number of samples, t-tests gave p < 0.0001, allowing us to reject the null hypothesis of no difference in the means. Of course, these histograms do not consider censoring as done with time-to-event modeling. The x-axis represents the values of each feature, while the y-axis is the probability of histogram bins as obtained by normalizing the histogram.

We investigated multiple univariate and multivariable Cox models to understand and explain the role of particular features on MACE prediction (Table 2). Comparing Agatston (line 1) and mass score (line 2), we determined that mass score had a slightly higher C-index and AUC at 2 years. As the mass score is generally considered more reproducible than Agatston5,13, we used it in subsequent evaluations. When we examined territorial mass scores (line 3), we found improved discrimination (C-index and AUC) compared to a whole heart mass score particularly for the LAD, which has the lowest p-value and the highest HR, indicating that an equivalent mass in the LAD was more predictive of MACE than that in another territory. For a given mass score, increasing the number of lesions had a significant effect (p < 0.004) (line 4). For every unit increase of (ln(1 + NumLesions)), the risk of having MACE increased by 1.48-fold. Hence, compared to one lesion, the risk was increased by 227% for 40 lesions, a number sometimes observed. Adding a logical feature to indicate two or more territories with calcification did not improve mass score model (p = 0.12) (line 5 versus line 2). In contrast, for a given mass score, HU > 1000 was protective (line 6). The distance from the “top” to “bottom” calcification per territory (line 7) improved performance with regard to log-rank score, C-index, and AUC as compared to other models (lines 1–6), even though no single territory produced a significant effect (p < 0.05). The number of lesions per total distance in each territory (diffusivity as described in Methods) performed better than lesion distance (line 8 versus line 7). Regarding their HRs, diffusivity in territories ranked as LM > RCA > LCX > LAD. A calcium-omics model with 39 features after elastic net regularization (line 9) was highly predictive of MACE with (HR = 3.62, p < 0.0001). When compared with other features (lines 1–8), the calcium-omics model had the best performance metrics in multiple categories. With the use of sampling to improve the event rate and elastic-net determination of 59 features, our calcium-omics model with sampling (line 10) yielded even better performance. Compared to the conventional standard (Agatston score, line 1) on held-out test data, this model improved C-index from 70.3% to 71.6% and the year-2 AUC from 68.8 to 74.8%. As these metrics are notoriously difficult to improve, we deem this increase substantive.

Table 2 Comparison of calcification risk models.

Calcium-omics prediction comparison to traditional Agatston score

In Fig. 4, we show year-2 ROCs for the calcium-omics model with and without sampling and compare them to the conventional Agatston score. Without sampling, calcium-omics gave (training/testing) AUCs of (74.7%/71.4%), while the Agatston model gave (71.8%/68.8%), respectively. Utilizing modified-SMOTE sampling, the calcium-omics model gave AUCs of (82.4%/74.8%), while sampling did not affect the Agatston score model. Similarly, at year-3, calcium-omics with sampling gave the best results. However, at year-3, there were fewer cases due to censoring and events, giving more uncertain results.

Figure 4
figure 4

Performance of calcium-omics risk prediction with and without sampling (bottom and top, respectively). Along each row, calcium-omics ROCs are shown for training at 2 years, and testing at 2 years, respectively. Agatston score results are shown for comparison. Across the board, calcium-omics was superior to Agatston. Calcium-omics performance was improved with sampling and yielded a significant difference to Agatston (p <  < 0.0001) as compared to no sampling calcium-omics to Agatston (p = 0.008). For sampling, we used a modified-SMOTE (see text) with down sampling and up sampling on training data only. The held-out test set was not subjected to any data sampling strategy. The p-values correspond to the Wald test for the AUC significance of a model compared to the rival model.

The Agatston model has a number of limitations. In contrast, the calcium-omics model is more discriminating due to its capacity to accommodate a broader range of calcium-omics features. While whole Agatston score had a non-linear relationship with MACE events in the log hazard ratio regression curve (Fig. S1), the calcium-omics model had a more linear curve. The calcium-omics model showed a wide range of risk levels for cases with similar Agatston scores in an interactive 2D surface regression plot (right plot in Fig. S1) implying good distinguishable values for cases having similar Agatston score.

Consistent with the linear relationship between the calcium-omics model and MACE shown in Fig. S1, calcium-omics stratified risk groups better than did Agatston (Fig. 5). For the Agatston score (left), patients were stratified into the five risk groups recognized by the Lipid Association with Agatston score ranges (0, 1–99, 100–299, 300–999, and 1000 +)14,15. For calcium-omics, we set thresholds for the aggerated calcium-omics feature that gave the same proportions of patients in the risk groups as in the Agatston score plot. The calcium-omics model separated risk groups much better than does the Agatston model. Focusing on year two for Agatston, the three middle-risk curves are not very informative giving nearly the same MACE-free proportions. For calcium-omics at this time, there is informative, good separation. These results are consistent with the left and middle log hazard ratio regression curves in Fig. S1.

Figure 5
figure 5

Kaplan Meier survival (MACE-free) curves with stratification provided by Cox modeling for all data. Plots represent the full MACE-enriched cohort using the standard Agatston score model (left) vs. calcium-omics model (right). The x-axis represents survival time, while the y-axis represents the survival probability of patients within a risk group. Agatston score was stratified into 5 groups according to the Lipid Association recommendation with Agatston score ranges (0, 1–99, 100–299, 300–999, and > 1000). Groups for the calcium-omics model were created with scores (< 0.99, 0.99–1.56, 1.56–1.76, 1.76–2.09, > 2.09) to give equivalent numbers of patients as for Agatston. The five risk severity groups are ordered (0–4), where 0 is the lowest-risk. Visually, the calcium-omics model much better stratified the five groups as compared to Agatston. In particular, risk groups 1, 2, and 3 are much more clearly separated for calcium-omics than Agatston. Due to the low number of held-out test samples, these plots are done with all data.

Calcium-omics improved net reclassification in the held-out test set. Figure 6 shows Kaplan–Meier curves for the 20% held-out-test subset. As this smaller data set is insufficient to support 5-group stratification, we use a single cut-off of Agatston = 100, a value often considered in the literature. As before, the threshold for creating the calcium-omics curves gave the same starting patient proportions as for Agatston. Again, there is better separation provided by the calcium-omics model than the Agatston model. Considering results at year four, the calcium-omics model identified 73.5% of MACE cases in the high-risk group, a 13.2% improvement as compared to Agatston. The overall categorical net reclassification improvement was, \(NRI=0.154 [95\% CI 0.006- 0.302;p=0.042]\), indicating improvement in the proposed model.

Figure 6
figure 6

Kaplan Meier survival (MACE-free) curves with stratification provided by Cox modeling for held-out test set. Agatston score model stratified into low-risk (cyan) and high-risk (pink) risk groups based on below or above Agatston score of 100 (Left). In a similar ratio to the left model, the calcium-omics model stratified patients into low and high-risk (right) with a calcium-omics feature value = 0.25. The calcium-omics model showed better visual separable stratification by reclassifying some patients to fit into high or low-risk groups. Survival probability of 50% was reached at year 4.5 with Agatston model, while reached in 3.8 years in calcium-omics, showing advantageous to the latter model. At year four, we investigated the calcium-omics model categorical reclassification performance compared to Agatston score model. For the patients with MACE, the calcium-omics model showed a categorical net reclassification improvement of \(NR{I}_{MACE}=0.132\). While with No-MACE patients, the new model showed \(NR{I}_{NoMACE}=0.022\). The total NRI showed advantage to the new model with \(NR{I}_{Total}=0.154 [95\% CI 0.006, 0.302; p=0.042].\) To conclude, the calcium-omics model identified 73.5% of MACE cases in the high-risk group, a 13.2% improvement as compared to Agatston, clearly showing high performance of the calcium-omics model for clinical decision-making.

Example patient highlighting calcium-omics advantage

Figure S2 highlights the limitation of the whole-heart Agatston score in two patients who have approximately equal Agatston scores (~ 204), but one has diffuse disease with 11 lesions in three territories, and the other has only two lesions in one territory. Whole heart Agatston would have predicted the same risk, but our calcium-omics approach predicts that at 3 years, the patient with only two lesions (right) will have a MACE-free survival probability 2.3 times better than the other patient with diffuse disease (left). The high-risk patient had a MACE later in the study period.

Discussion

In this paper we provide an initial evaluation of an integrated radiomic approach that incorporates 80 different features spanning multiple elemental features of shape, texture, distribution and statistical parameters to predict MACE and compared this with the traditional Agatston score. The use of calcium-omics was far more discriminatory than Agatston, improving the AUC from 68.8 to 74.8 (p = 0.07). The relationship between an aggregated calcium-omics score and MACE was nearly linear with a graded effect, when compared to Agatston which displayed a non-linear relationship particularly at high levels of calcium score (Fig. S1). Importantly, calcium-omics resulted in a graded dose effects on MACE, as opposed to considerable overlap in risk across risk stratification quartiles (Fig. 5). While our findings are intuitive in the sense that incorporation of multiple features may be expected to enhance risk prediction, this is not always true given that many features may be correlated.

The success of calcium-omics relative to Agatston lies in its ability to better characterize coronary artery disease as compared to the Agatston score, which is simply a summation of calcium in the coronaries, albeit in a non-linear way. Calcium-omics captures characteristics from individual calcifications, including mass, volume, HU values, numbers of calcifications, numbers of territories, and spatial distribution. Univariate and multivariable Cox models in Table 2 offer explanation as to why calcium-omics does better, per the following observations. (1) Summing over the entire heart, mass score slightly improves prediction compared with Agatston, potentially due to its improved reproducibility5,13. (2) When we simply add the dense calcification (> 1000 HU) feature to mass, there is improvement as compared to mass or Agatston alone. Highly calcified “older” lesions are likely more stable explaining this finding8. (3) Adding mass scores from individual arteries improves performance. This suggests that having disease present in more than one artery is a risk factor. This is directly shown as a logical (≥ 2) in line 5 giving HR = 1.45 after accounting for total mass score. (4) Adding the number of calcifications to the whole heart mass score greatly improved risk prediction, again suggesting that the spread of disease is a risk factor. (5) Our diffusivity metric is a risk factor indicating that the spread of disease along arteries is a risk factor. Taking all features together in calcium-omics simply provides more information about the disease, enabling improved overall risk prediction.

Although the C-index is notoriously impervious to model improvements11,12, calcium-omics compared favorably to Agatston and gave a significant difference (P < 0.001). In addition to C-index, we consider it important to report other performance evaluations. As described in the Results, the calcium-omics model identified 73.5% of MACE cases in the high-risk group, a 13.2% improvement as compared to Agatston, suggesting that calcium-omics could be used to better identify candidates for intensive follow-up and therapies. The categorical net-reclassification index was NRI = 0.153.

There are important contributions of our work. Calcium-omics outperforms the current state-of-the-art, whole heart Agatston score. This development could contribute to more precise personalized therapies for cardiovascular patients. We purposefully used a MACE-enriched cohort in this preliminary study on AI analysis of calcium-omics features. The high event rate improved confidence intervals, allowing us to make more confident assessments of point estimates of HR values and of model comparisons, for this data set. In addition, this smaller cohort allowed us to carefully vet all data to ensure data quality. Overall performance might be different when larger numbers of cases with a low event rate are used in larger studies. We found that modified-SMOTE up sampling and down sampling reduces the problem with low event rate data. This is the first time that such strategies have been used with CT calcium score data. Often overlooked, down sampling and up sampling has been previously described for coronary heart disease cohort16.

Our study undoubtedly has limitations. Importantly, we had a limited observation period in our cohort (average 1.9 years within the 6-years study), which limits event rates. The data used in this study were from sites across the University Hospitals Health System, which is restricted to northeast Ohio. Other locales might have somewhat different results. Additionally, data used in this study were obtained using various scanners with similar acquisition parameters. We did not perform analyses to identify model performance by scanner type. Lead time bias is an inherent limitation in this study since MACE-free time is reported as duration from the time of CTCS exam until a patient either had MACE or was censored. This study primarily focuses on image-based classifications within coronary arteries, whereas comparisons to clinical-data-only models (e.g., MESA calculator and pooled cohort equation) will be part of future, more comprehensive studies on larger data sets. Our preliminary calcium-omics model used CT images and masks. Segmentation of coronary arterial calcifications is a labor-intensive and time-consuming task that is carried out by experts. It involves delineating each calcified lesion and assigning appropriate labels to the designated regions. Automation of this step is highly desirable.

In conclusion, we have obtained promising results using an AI analysis on detailed calcification features. Clearly, there will be advantage as compared to the standard whole-heart Agatston score. It is hoped that results will carry over to larger, confirming studies.