Differential progression of coronary atherosclerosis according to plaque composition: a cluster analysis of PARADIGM registry data

Patient-specific phenotyping of coronary atherosclerosis would facilitate personalized risk assessment and preventive treatment. We explored whether unsupervised cluster analysis can categorize patients with coronary atherosclerosis according to their plaque composition, and determined how these differing plaque composition profiles impact plaque progression. Patients with coronary atherosclerotic plaque (n = 947; median age, 62 years; 59% male) were enrolled from a prospective multi-national registry of consecutive patients who underwent serial coronary computed tomography angiography (median inter-scan duration, 3.3 years). K-means clustering was applied to the percent volume of each plaque component and identified 4 clusters of patients with distinct plaque composition. Cluster 1 (n = 52), which comprised mainly fibro-fatty plaque with a significant necrotic core (median, 55.7% and 16.0% of the total plaque volume, respectively), showed the least total plaque volume (PV) progression (+ 23.3 mm3), with necrotic core and fibro-fatty PV regression (− 5.7 mm3 and − 5.6 mm3, respectively). Cluster 2 (n = 219), which contained largely fibro-fatty (39.2%) and fibrous plaque (46.8%), showed fibro-fatty PV regression (− 2.4 mm3). Cluster 3 (n = 376), which comprised mostly fibrous (62.7%) and calcified plaque (23.6%), showed increasingly prominent calcified PV progression (+ 21.4 mm3). Cluster 4 (n = 300), which comprised mostly calcified plaque (58.7%), demonstrated the greatest total PV increase (+ 50.7 mm3), predominantly increasing in calcified PV (+ 35.9 mm3). Multivariable analysis showed a higher risk for plaque progression in Clusters 3 and 4, and a higher risk for adverse cardiac events in Clusters 2, 3, and 4, compared to Cluster 1. Unsupervised clustering algorithms may uniquely characterize patient phenotypes with varied atherosclerotic plaque profiles, yielding distinct patterns of progressive disease and outcome.


Understanding the process of coronary atherosclerosis could facilitate timely medical intervention to retard the development of clinically significant coronary artery disease and its consequences. Coronary computed tomographic angiography (CCTA), which provides noninvasive and comprehensive evaluation of coronary atherosclerosis, has been used to study the pathophysiologic progression of coronary atherosclerosis over time 1-3 . Furthermore, CCTA has the potential to personalize preventive therapy by quantitatively evaluating heterogeneous coronary atherosclerotic plaque volume and composition in whole coronary trees 4,5 . However, plaque components form a pathophysiologic continuum, so it is difficult to define thresholds of plaque composition that carry clinical implications. Machine learning using unsupervised cluster analysis aims to group similar data points into clusters based on inherent similarities among them. It thus enables the exploration of possible heterogeneity within a disease category that has historically been considered homogeneous 6,7 . In the present study, we hypothesized that unsupervised cluster analysis could categorize heterogeneous patients according to atherosclerotic plaque component proportions. Furthermore, we aimed to determine how these differences in atherosclerotic plaque components at baseline differentially impact plaque progression and composition change.

Study design and population. The Progression of AtheRosclerotic PlAque DetermIned by Computed TomoGraphic Angiography Imaging (PARADIGM) study was a multinational observational registry that prospectively enrolled 2252 patients who underwent clinically indicated serial CCTAs at an inter-scan interval of ≥ 2 years at 13 sites in 7 countries between 2003 and 2015 8 . For the present analysis, we excluded patients with uninterpretable CCTAs (n = 492), prior revascularization (n = 282), and no coronary atherosclerotic plaque at baseline (n = 358) (Supplementary Fig. 1). To explore the natural history of coronary atherosclerosis, we defined statin-naïve patients as patients who were not using statins at the time of the baseline and follow-up CCTAs. Statin-taking patients were defined as those who were using statins at the time of follow-up CCTA 2 . After further excluding patients without information on statin use (n = 121) and those who discontinued statin use after the baseline CCTA (n = 52), 947 patients remained for the final analysis.
CCTA analysis. Acquisition and analysis of CCTAs were performed in accordance with guidelines 8 . Data from each participating site were transferred to a core laboratory for blinded image analysis by level-III experienced readers using semi-automated plaque analysis software (QAngioCT Systems, Leiden, the Netherlands) with manual correction 8 .
All coronary artery segments with a diameter ≥ 2 mm were evaluated for plaque and vessel volume (mm³) using a modified 17-segment American Heart Association model 9,10 . Segments were matched between baseline and follow-up CCTAs using branch points as landmarks. The presence of atherosclerosis was defined as any tissue ≥ 1 mm² within or adjacent to the lumen that could be discriminated from surrounding pericardial tissue, epicardial fat, or lumen, and identified in ≥ 2 planes. Plaque volume (PV) (mm³) was measured and further sub-classified by composition using pre-defined Hounsfield unit (HU) cut-off values (necrotic core, − 30 to 30 HU; fibro-fatty plaque, 31 to 130 HU; fibrous plaque, 131 to 350 HU; and calcified plaque, ≥ 351 HU) 11,12 . To account for differences in the total vessel length between patients and to provide an equal weighting of each patient in the calculation of PV, we normalized PV as [(absolute PV / total vessel length) × mean population vessel length] 13 . We calculated the annualized total PV change as Δtotal PV / CCTA interval (mm³/year), and used the median value as the cut-off point to define plaque progression.
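The normalization and annualization described above can be sketched in a few lines. This is a minimal illustration, not the registry's actual pipeline: the three patients and their values are hypothetical, a single vessel length per patient is assumed for both scans, and treating "progression" as an annualized change strictly above the median is an assumption, since the text only states that the median was used as the cut-off.

```python
from statistics import mean, median


def normalized_pv(absolute_pv_mm3, vessel_length_mm, mean_population_length_mm):
    """Normalized PV = (absolute PV / total vessel length) * mean population vessel length."""
    return absolute_pv_mm3 / vessel_length_mm * mean_population_length_mm


def annualized_change(pv_baseline_mm3, pv_followup_mm3, interval_years):
    """Annualized total PV change = delta total PV / CCTA interval (mm^3/year)."""
    return (pv_followup_mm3 - pv_baseline_mm3) / interval_years


# Hypothetical patients (not registry data):
# (id, baseline PV mm^3, follow-up PV mm^3, total vessel length mm, interval years)
patients = [
    ("A", 80.0, 110.0, 400.0, 3.0),
    ("B", 150.0, 160.0, 500.0, 2.0),
    ("C", 60.0, 120.0, 300.0, 4.0),
]

mean_length = mean(p[3] for p in patients)

annualized = {}
for pid, pv0, pv1, length, years in patients:
    n0 = normalized_pv(pv0, length, mean_length)
    n1 = normalized_pv(pv1, length, mean_length)
    annualized[pid] = annualized_change(n0, n1, years)

# Assumption: progression means an annualized change strictly above the median.
cutoff = median(annualized.values())
progressors = sorted(pid for pid, value in annualized.items() if value > cutoff)
print(annualized, cutoff, progressors)
```

Because the cut-off is the cohort median, roughly half of any cohort is labelled as progressing by construction, so the label describes relative rather than absolute progression.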
Unsupervised clustering. Clustering is an unsupervised technique used to group objects that are "close" to one another in a multi-dimensional feature space, usually to uncover some inherent structure within the data without prior assumptions 7,14 . K-means clustering is a vector quantization method used for partitioning n observations into a pre-defined number (K) of mutually exclusive clusters, in which each observation belongs to the cluster with the nearest mean. The algorithm iteratively minimizes the sum of the squared distances between cluster points and the cluster mean. As such, it locally optimizes the following objective function, using an iterative procedure similar to the expectation-maximization algorithm (n = the number of data points, K = the pre-defined number of clusters, w_ik = 1 if x_i belongs to cluster k or 0 otherwise, x_i = the i-th data point, and μ_k = the mean of cluster k) 6 :
$$J = \sum_{i=1}^{n} \sum_{k=1}^{K} w_{ik} \lVert x_i - \mu_k \rVert^2$$
(... Fig. 2) 6,15 . Since the clustering algorithm does not provide a specific cluster order, we ordered the clusters based on the %vol of calcified plaque.
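As an illustration of the iterative procedure described above, the following is a minimal pure-Python sketch of k-means (Lloyd's algorithm) applied to 4-dimensional %vol vectors, followed by relabelling the clusters in ascending order of calcified %vol. The study itself used R packages. The four data points, the fixed initial means, and all function names here are hypothetical and chosen only to keep the example small and deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_means, max_iter=100):
    """Alternate assignment and mean-update steps until labels stop changing."""
    means = [list(m) for m in initial_means]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_labels = [
            min(range(len(means)), key=lambda k: squared_distance(p, means[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each mean becomes the average of its assigned points.
        for k in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                means[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means


def objective(points, labels, means):
    """Sum of squared distances from each point to its own cluster mean."""
    return sum(squared_distance(p, means[lab]) for p, lab in zip(points, labels))


# Hypothetical %vol vectors: (necrotic core, fibro-fatty, fibrous, calcified)
points = [
    (16.0, 56.0, 27.0, 1.0),
    (14.0, 54.0, 30.0, 2.0),
    (1.0, 5.0, 36.0, 58.0),
    (2.0, 6.0, 34.0, 58.0),
]
labels, means = kmeans(points, initial_means=[points[2], points[0]])

# k-means returns clusters in arbitrary order; relabel them 1..K by
# ascending calcified %vol (index 3), as the text describes.
order = sorted(range(len(means)), key=lambda k: means[k][3])
rank = {old: new + 1 for new, old in enumerate(order)}
ordered_labels = [rank[lab] for lab in labels]
print(ordered_labels, objective(points, labels, means))
```

Because the result depends on the starting means, practical implementations usually run several random initializations and keep the solution with the lowest objective value.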
Clustering was validated with nonparametric bootstrapping (Supplementary Table 1) 16 , and visualized using several techniques including 3-dimensional (3D) plots displaying 3 of 4 features at a time, and the remapping of multi-dimensional plots into 2-dimensional (2D) plots using radial visualization (RadViz) and t-distributed stochastic neighbour embedding (t-SNE) 17,18 .
Study outcomes. After clustering, we compared the individual clinical phenotypes and changes in PV and characteristics among clusters, followed by an interpretation of the clinical relevance. In 808 patients (85%) with available clinical outcome data, we also compared the composite of major adverse cardiac events (MACE), including all-cause mortality, acute coronary syndrome, and coronary revascularization.
Statistical analysis. Continuous variables are presented as the median [interquartile range (IQR)]; categorical variables are presented as numbers (percentages). Differences among clusters were evaluated using the analysis of variance or Kruskal-Wallis test for continuous variables, and the χ2 test or Fisher's exact test for categorical variables, as appropriate, followed by Bonferroni's correction or Dunn's post-hoc testing for multiple comparisons. Multivariable logistic regression analysis, including the 10-year atherosclerotic cardiovascular disease (ASCVD) risk, diabetes mellitus, baseline total PV, and statin use, was performed to compare the risk for plaque progression among clusters. Additionally, multivariable Cox regression analysis, including the 10-year ASCVD risk, diabetes mellitus, baseline total PV, annualized total PV change, and statin use, was performed to evaluate the relative hazard for MACE among clusters. Cluster 1 was used as the reference group for multivariable analyses, and the results are expressed as the adjusted odds ratio (aOR) or adjusted hazard ratio (aHR) with the corresponding 95% confidence interval (CI). MACE-free survival data were plotted using the Kaplan-Meier method and compared by the log-rank test. All statistical analyses, including clustering, were performed using RStudio (Version 3.6.3) and its packages. P < 0.05 was considered statistically significant.
When we visualized clusters in 3D space (Fig. 3), the separation of Cluster 1 was mainly driven by its higher %vol of the necrotic core, and Cluster 4 was separated from others due to its higher calcified plaque %vol. Although Cluster 2 was separated from Clusters 3 and 4 by its higher %vol of fibro-fatty plaque, separation from Cluster 1 depended on its %vol of the necrotic core and fibrous plaque. We have also provided 2D plots using RadViz and t-SNE (Supplementary Fig. 3).
Clinical characteristic comparisons. Clusters 1, 2, and 3 demonstrated quite similar clinical characteristics (Table 1). However, the patients in Cluster 4 tended to be older, have lower body mass index and triglyceride levels, and higher high-density lipoprotein levels than the patients in other clusters. While statin use at baseline was higher in Cluster 4 than in Cluster 3, statin use at follow-up was similar among clusters.
(... Table 2). Necrotic core and fibro-fatty PV were greatest in Cluster 1 and gradually decreased (in order) from Cluster 2 to Cluster 4 (P < 0.001 for both). Fibrous PV of Cluster 2 was comparable to that of Cluster 3 (P = 0.589) and Cluster 4 (P = 0.088) and was significantly greater than that of Cluster 1 (P = 0.001). The calcified PV was lowest in Cluster 1 and gradually increased (in order) from Cluster 2 to Cluster 4 (P < 0.001). Cluster 4 demonstrated the greatest maximal diameter and area stenosis (P < 0.001 for both).
(... Table 2). Necrotic core and fibro-fatty PV regression were evident in Cluster 1 and gradually weakened (in order) from Cluster 2 to Cluster 3 and Cluster 4 (P < 0.001 for both). While the increase in fibrous PV was highest in Cluster 1 and gradually decreased in (... Table 3). The %vol of each plaque component at the baseline and follow-up CCTAs is illustrated in Fig. 4. Cluster 1, which had the least total PV progression, demonstrated decreased %vols of the necrotic core (median, from 16.0% to 4.8%, P < 0.001) and fibro-fatty plaque (from 55.7% to 35.9%, P < 0.001), and increased %vols of fibrous plaque (from 26.7% to 50.3%, P < 0.001) and calcified plaque (from 0.0% to 2.1%, P < 0.001). Although Cluster 2 showed a greater increase in the calcified portion (from 4.7% to 16.3%, P < 0.001) than did Cluster 1, both clusters showed PV progression mainly driven by an increase in fibrous PV (114% and 48% of the total PV increase, respectively). In contrast, in Clusters 3 and 4, the fibrous plaque %vol decreased (from 62.7% to 50.3% and from 36.4% to 28.9%, respectively; P < 0.001 for both), and PV progression was mostly driven by an increase in calcified PV (52% and 71% of the total PV increase, respectively).
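The "percentage of the total PV increase" figures reported in the results (for example, 71% for calcified PV in Cluster 4 and 114% for fibrous PV in Cluster 1) can be read as a component's change divided by the total change. The sketch below assumes that definition and uses the Cluster 4 medians quoted in the abstract as inputs. The function name is hypothetical, the second call uses made-up numbers, and the paper may have computed these shares from different summary statistics.

```python
def share_of_total_increase(component_change_mm3, total_change_mm3):
    """Component PV change expressed as a percentage of the total PV change."""
    if total_change_mm3 <= 0:
        raise ValueError("share is only meaningful when total PV increased")
    return 100.0 * component_change_mm3 / total_change_mm3


# Cluster 4 medians from the abstract: calcified PV +35.9 mm^3, total PV +50.7 mm^3.
cluster4_calcified_share = share_of_total_increase(35.9, 50.7)
print(round(cluster4_calcified_share))  # 71

# Hypothetical numbers showing how a share can exceed 100%: if one component
# grows by more than the total, other components must have regressed.
hypothetical_share = share_of_total_increase(11.4, 10.0)
print(round(hypothetical_share))  # 114
```

A share above 100%, as reported for fibrous PV in Cluster 1, is therefore consistent with the regression of necrotic core and fibro-fatty PV described for that cluster.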
Changes in plaque volume according to statin use. When we evaluated the differences in PV progression among clusters in statin-naïve patients (n = 307), there was no significant difference in the total PV change (P = 0.241), and necrotic core and fibro-fatty PV regression was not evident even in Cluster 1 (Supplementary Table 4). In contrast, in statin-taking patients (n = 640), necrotic core or fibro-fatty PV regression was observed in Clusters 1 and 2, and the total PV increase was significantly greater in Cluster 3 (50. ... Table 5). Multivariable logistic regression analysis showed a higher risk of plaque progression for Clusters 3 and 4 in statin-taking patients, but not in statin-naïve patients (Supplementary Table 3).

Discussion
The present analysis of a large prospective observational cohort of patients with coronary atherosclerosis undergoing serial CCTA used unsupervised cluster analysis to categorize patients according to their coronary atherosclerotic plaque composition. The identified clusters of patients demonstrated markedly different plaque progression patterns, changes in composition, and clinical outcomes. This study provides insight into how patients with heterogeneous coronary atherosclerotic plaque composition differentially experience coronary atherosclerotic plaque progression and adverse cardiac events according to their baseline plaque composition.
www.nature.com/scientificreports/
CCTA enables the accurate, noninvasive assessment of the change in coronary atherosclerotic plaque over time 1,19,20 . Recent advances have further promoted the use of CCTA by providing semi-automated segmentation and characterization of the plaque composition. The PARADIGM registry is the largest available serial CCTA database with quantitative measures of atherosclerotic burden and composition 8 . Prior PARADIGM registry studies provided important information regarding the natural course of coronary atherosclerosis and the clinical determinants of plaque progression or regression, by evaluating the impact of statin use or high-risk features on the progression of coronary plaque lesions 2,21 or by categorizing patients according to clinical risk factors such as diabetes mellitus 3,22 . In the present study, we performed patient-specific plaque phenotyping, using the ability of CCTA to visualize plaque components in the entire coronary tree, to bridge the gap between the recognition of heterogeneous plaque composition on CCTA and individualized cardiovascular risk assessment and preventive strategy establishment.
Unsupervised clustering is an exploratory data analysis technique that provides insight into the data structure by segregating groups with similar traits and assigning them into clusters 7,14 . K-means clustering is one of the most popular and simplest clustering algorithms. It partitions a feature space into k clusters by assigning each data point to the cluster whose mean is nearest 23 . Since we aimed to categorize heterogeneous patients with coronary atherosclerosis according to their baseline atherosclerotic plaque composition, we applied k-means clustering to the %vol of each plaque component. In other words, k-means clustering allowed us to find groups of similar data points in a 4-dimensional feature space comprising the %vols of the necrotic core, fibro-fatty plaque, fibrous plaque, and calcified plaque. Considering that coronary atherosclerosis is a continuous process, it was not surprising that the distances between clusters were not large. Nevertheless, the resulting 4 clusters demonstrated distinct features and significantly different plaque progression patterns.
Phenotyping coronary atherosclerotic plaque at the patient level offers insight into how the progression and transformation of coronary atherosclerosis differ according to the baseline composition. Hwang et al. previously applied topological data analysis (TDA) to PARADIGM registry data and identified three distinct groups of patients 24 . Since TDA aims to pattern or shape a complex dataset using a geometric approach, multiple quantitative CCTA parameters, including total vessel length, total vessel volume, total lumen volume, PV, fibrous component volume, fibrofatty component volume, necrotic core volume, and dense calcium volume, were used to categorize patients in that study. The resultant groups demonstrated not only distinct plaque composition but also increasing PV accompanied by increasing age and prevalence of comorbidities. In contrast, the present clustering was performed independently of patient clinical characteristics and other CCTA characteristics, such as the total PV. Nevertheless, the resultant clusters from both studies are in concordance with the known natural history of atherosclerotic plaque 24 , suggesting that the clustering was in accordance with the evolutionary stage of atherosclerosis at the patient level 25-27 . The present study provides deeper insight into the compositional changes during plaque progression. Clusters 1 and 2, which comprised patients who had earlier-stage coronary atherosclerotic plaques with more vulnerable plaque components, such as the necrotic core and fibro-fatty plaque, showed regression of these components and PV progression mainly driven by an increase in fibrous PV. Clusters 3 and 4, which represented patients who had more advanced and stabilized plaques with more calcium, showed PV progression mostly driven by an increase in calcified PV.
The similarities in clinical characteristics between clusters can be attributed to the multifactorial influences on the advent and progression of coronary atherosclerosis in an individual. Nevertheless, Clusters 3 and 4 demonstrated a higher risk for plaque progression, independent of clinical risk factors, statin use, and baseline total PV. The differential progression status among clusters underlines the role of CCTA in evaluating plaque composition, in addition to obstruction severity and plaque burden. Furthermore, the significantly different risk for MACE between clusters suggests that patient-specific phenotyping of coronary atherosclerosis would facilitate personalized risk assessment and preventive treatment. The ability to predict how coronary atherosclerotic plaque progresses based on its composition may help clinicians decide who may benefit most from statins or other modifiers of atherosclerotic pathogenesis, while reducing harm.
We additionally evaluated whether statin use differentially affects plaque progression according to the baseline plaque composition. Although the PARADIGM registry's observational study design limits the direct comparison of the impact of statin use in each cluster, the subgroup analysis according to statin use provided clues regarding the differential impact of statins across clusters. The higher risk of plaque progression in Clusters 3 and 4 compared to Cluster 1 was only observed in statin-taking patients, not in statin-naïve patients. Similarly, the higher risk of MACE in Clusters 2, 3, and 4 was only observed in statin-taking patients. The observation that differential plaque progression and clinical outcome were preserved only in statin-taking patients supports the need for a more personalized assessment of cardiovascular risk and the deployment of a preventive strategy based on patient-specific plaque phenotyping.
However, the value of patient-specific plaque phenotyping in facilitating personalized decision-making regarding statin use should be evaluated in randomized controlled trials that integrate CCTA with a targeted prevention strategy.
Cluster 1 demonstrated decreased %vols of the necrotic core and fibro-fatty plaque, and increased %vols of fibrous plaque and calcified plaque. Although Cluster 2 showed a greater increase in the calcified portion than did Cluster 1, both clusters showed PV progression mainly driven by an increase in fibrous PV. In contrast, in Clusters 3 and 4, the fibrous plaque %vol decreased, and PV progression was mostly driven by an increase in calcified PV.
Study limitations. First, the PARADIGM study enrolled patients with repeated CCTA scans. Therefore, the current study population mostly comprised patients at low-to-moderate risk who were, therefore, not eligible for invasive coronary angiography. Furthermore, as patients with more rapid progression were more likely to experience clinical events and might not attend a second CCTA, the study population tended to represent patients with an earlier stage of coronary artery disease; the risk of this selection bias must be considered before generalizing these results to a higher-risk population. However, our results provide valuable clues regarding earlier changes in coronary atherosclerosis. The difference in plaque progression and MACE risk across the clusters indicates that the evaluation of plaque composition using CCTA has clinical implications from the earlier stage of coronary atherosclerosis. Second, the optimal number of clusters in k-means clustering is somewhat subjective; however, our decision to use 4 clusters was based on the Calinski-Harabasz Index and Average Silhouette Width, as well as the elbow method 28 .
Furthermore, the visualization of the clusters suggests that the clustering was done in a clinically intuitive manner based on the %vols of the 4 different plaque components. Finally, although an external validation dataset was not available, because of the paucity of registries similar to the PARADIGM registry with serial and quantitative measures of each plaque component, the clustering algorithm provided stable phenotyping as supported by bootstrapping validation. Ideal clustering should not only have good statistical properties, but should also provide clinically relevant results. We believe that the current study results provide important clues to understanding the impact of patient-level plaque composition on plaque progression and change in its character.
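Of the criteria cited above for choosing the number of clusters, the Calinski-Harabasz index is the simplest to write down: it is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom, and higher values indicate better-separated partitions. The following is a minimal sketch under that standard definition. The four %vol vectors and both candidate labelings are hypothetical, and this is not the R code used in the study.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)].

    B is the between-cluster sum of squares and W the within-cluster sum of
    squares. Requires 2 <= k < n and W > 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        c = centroid(members)
        within += sum(squared_distance(p, c) for p in members)
        between += len(members) * squared_distance(c, overall)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical %vol vectors: (necrotic core, fibro-fatty, fibrous, calcified)
points = [
    (16.0, 56.0, 27.0, 1.0),
    (14.0, 54.0, 30.0, 2.0),
    (1.0, 5.0, 36.0, 58.0),
    (2.0, 6.0, 34.0, 58.0),
]
ch_good = calinski_harabasz(points, [1, 1, 2, 2])   # groups similar patients
ch_mixed = calinski_harabasz(points, [1, 2, 1, 2])  # mixes dissimilar patients
print(ch_good, ch_mixed)
```

In practice the index is computed for the k-means solution at each candidate number of clusters, and the value of k with the highest index is favoured alongside the other criteria.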

Conclusion
In conclusion, unsupervised clustering analysis of patients with coronary atherosclerotic plaque identified substantial phenotypic heterogeneity in coronary atherosclerotic plaque composition. Patient-specific plaque phenotyping may help our understanding of heterogeneity in coronary atherosclerotic plaque progression. Further research is needed to determine the utility of patient-specific plaque phenotyping in personalized risk assessment and preventive treatment.