Data-driven models for the prediction of coronary atherosclerotic plaque progression/regression

Coronary artery disease is defined by the existence of atherosclerotic plaque on the arterial wall, which can cause blood flow impairment or plaque rupture, and ultimately lead to myocardial ischemia. Intravascular ultrasound (IVUS) imaging can provide a detailed characterization of lumen and vessel features, and thus plaque burden, in coronary vessels. Prediction of the regions in a vascular segment where plaque burden will either increase (progression) or decrease (regression) following a certain therapy has remained an elusive major milestone in cardiology. Studies like IBIS-4 showed an association between plaque burden regression and high-intensity rosuvastatin therapy over 13 months. Nevertheless, it has not been possible to predict whether a patient would respond in a favorable/adverse fashion to such a treatment. This work aims to (i) develop a framework that processes lumen and vessel cross-sectional contours and extracts geometric descriptors from baseline and follow-up IVUS pullbacks; and (ii) develop, train, and validate a machine learning model, based on baseline/follow-up IVUS datasets, that predicts future percent atheroma volume changes in coronary vascular segments using only baseline information, i.e. geometric features and clinical data. This is a post hoc analysis, revisiting the IBIS-4 study. We employed 140 arteries, from 81 patients, for which expert delineations of lumen and vessel contours were available at baseline and 13-month follow-up. Contour data from baseline and follow-up pullbacks were co-registered and then processed to extract several frame-wise features, e.g. areas, plaque burden, eccentricity, etc. Each pullback was divided into regions of interest (ROIs), following different criteria. Frame-wise features were condensed into region-wise markers using tools from statistics, signal processing, and information theory.
Finally, a stratified 5-fold cross-validation strategy (20 repetitions) was used to train/validate an XGBoost regression model. A feature selection method was also applied before model training. When the models were trained/validated on ROIs defined by the difference between follow-up and baseline plaque burden, the average accuracy and Matthews correlation coefficient were 0.70 and 0.41, respectively. Using a ROI partition criterion based only on the baseline's plaque burden resulted in averages of 0.60 accuracy and 0.23 Matthews correlation coefficient. An XGBoost model was capable of predicting plaque progression/regression changes in coronary vascular segments of patients treated with rosuvastatin therapy over 13 months. The proposed method, first of its kind, successfully addresses the problem of stratification of patients at risk of coronary plaque progression, using IVUS images and standard patient clinical data.

The study was carried out in compliance with all guidelines and regulations, and was approved by the ethics committee and review boards of the Inselspital, University Hospital Bern (Bern, Switzerland), and all participating centers.
A 20-MHz ultrasound catheter (Eagle Eye, Volcano Corporation, Rancho Cordova, CA) was used, at a pullback speed of 0.5 mm/s. Images were acquired at 30 frames per second, meaning that the frame spacing is 1/60 mm. Baseline (BL) and follow-up (FU) pullbacks were acquired 13 months apart. For each (BL, FU) pullback pair, the largest common region available was assessed using dedicated software between two anatomical landmarks (e.g. distal: side branch; proximal: LM bifurcation or ostium of the RCA). The common matching frames that were identified were used to manually identify the same anatomical region on both pullbacks. For the selected frames (mean frame spacing 0.4 mm), the lumen and the vessel contours were delineated using the same dedicated software (QIVUS, Medis, Leiden, The Netherlands).
A total of 140 arteries from 81 patients were included in the present study. Figure 1 presents a (BL, FU) pullback pair of IVUS images. Clinical variable definitions and demographics are provided in Table 1. The original IBIS-4 study 2 analyzed arterial cross-sectional geometry through area-based measurements and plaque composition. The latter measurements were unavailable at the time of the present study; therefore, the set of geometric descriptors shared with the IBIS-4 study is listed in Table 2.

Frame-wise feature definition
For any given frame of a pullback, BL or FU, for which the contour delineations of the lumen (L) and the vessel (V) are available, a set of geometry-based descriptors is calculated, see Table 3. Since healthy arteries ideally have circular lumen and vessel cross-sections with relatively uniform wall thickness, the proposed features aim at capturing the complexity of the contour geometry, and its deviation from such ideal references, using well-known measures such as eccentricity, circularity, and local curvatures. We also propose features that characterize the plaque distribution over the contour.

Pre-processing
Frame-wise features were linearly interpolated to fill gaps between unevenly spaced frames and generate longitudinal signals. The features PB and V were used to (manually) co-register BL and FU signals, resulting in anatomically consistent signals with a uniform frame spacing of 60 frames/mm; see Figure 2 for an illustration. Manual co-registration consisted of shifting and clipping the tails of the signals until the local extrema matched.
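As a sketch of this pre-processing step, the resampling of an unevenly sampled frame-wise feature onto the uniform 60 frames/mm grid could look as follows (a minimal NumPy illustration; the function name and toy values are ours, not from the study):

```python
import numpy as np

def resample_uniform(frame_pos_mm, feature_vals, frames_per_mm=60):
    """Linearly interpolate a frame-wise feature onto a uniform grid.

    frame_pos_mm : longitudinal position of each delineated frame, in mm
    feature_vals : feature value at each frame (e.g. plaque burden)
    Returns the uniform grid positions and the interpolated signal.
    """
    start, stop = frame_pos_mm[0], frame_pos_mm[-1]
    n = int(round((stop - start) * frames_per_mm)) + 1
    grid = np.linspace(start, stop, n)
    return grid, np.interp(grid, frame_pos_mm, feature_vals)

# Unevenly spaced frames (mean spacing ~0.4 mm) resampled to 60 frames/mm
pos = np.array([0.0, 0.35, 0.8, 1.2])    # frame positions, mm
pb = np.array([40.0, 42.0, 39.0, 41.0])  # toy plaque-burden values, %
grid, sig = resample_uniform(pos, pb)
```

The subsequent manual shifting/clipping step operates on such uniformly spaced signals.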
ΔPAV = PAV_FU − PAV_BL    (1)

Table 2. List of features used in the original IBIS-4 study. We refer to these features as Clinical features. They are defined from the (lumen and vessel) area of the manually delineated contours at each frame and then condensed to volumes by numerical integration. The change in percent atheroma volume is the variable of interest of the study.

L : Area enclosed by the lumen contour, in mm².
V : Area enclosed by the vessel contour, in mm².
P : Plaque area, the area between the vessel and lumen contours, defined as V − L, in mm².

Clinical condensed features
TLV : Volume of the lumen along a set of consecutive frames, integration of L, in mm³.
TVV : Volume of the vessel along a set of consecutive frames, integration of V, in mm³.
TAV : Volume of the plaque along a set of consecutive frames, integration of P, in mm³.
ΔPAV : The difference of the PAV variable between follow-up (FU) and baseline (BL) at a specific region of interest, in %.
Table 3. List of proposed geometric features based on the manually defined lumen (L) and vessel (V) contours. The proposed features aim at capturing contour geometry complexity using well-known measures such as eccentricity and local curvatures, as well as features that characterize the plaque distribution over the contour.
Ratio between the maximum and minimum lumen (L) or vessel (V) diameters, measured using the line that runs through the image origin and joins opposite points of the corresponding contour.
ξ_P : Distance between the centroids of the lumen and vessel contours, divided by the average of the lumen minimum and maximum diameters.
ρ[L|V] : Percentage of circumferential angle for which the ratio between plaque thickness and lumen (L) or vessel (V) radius is over 0.2.
̺ : Ratio of the maximum plaque thickness to the mean plaque thickness, along the circumferential direction.
Geometric definition of eccentricity for ellipses, √(1 − (min/max)²), where min and max are the smallest and largest diameters of the lumen (L) or vessel (V).
Geometric circularity of a polygon, 4πA/P², where A and P are the area and perimeter of the lumen (L) or vessel (V) contour.
τ[L|V] : Curvature irregularity 12, defined as the difference between the maximum and minimum curvature of the lumen (L) or vessel (V) contour.
κ[L|V] : Curvature roughness 12, reflecting the evenness of the contour in terms of curvature; smaller values represent a more circular or even surface, and a perfectly circular lumen has roughness equal to 1. It is calculated as (r/2π) Σ κᵢ² Δlᵢ, where r is the radius of the circle best fitting the lumen or vessel contour, κᵢ is the local curvature, and Δlᵢ is the local length between adjacent contour points.
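As an illustration of the circularity feature above (4πA/P², which equals 1 for a perfect circle and decreases for irregular or elongated contours), a minimal NumPy sketch for a polygonal contour could look like this (the function name and sample contours are ours):

```python
import numpy as np

def circularity(x, y):
    """Geometric circularity 4*pi*A/P^2 of a closed polygon contour.

    x, y : vertex coordinates; the last vertex is implicitly joined
    to the first. Returns ~1.0 for a circle, less for irregular shapes.
    """
    # Shoelace formula for the enclosed area
    A = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    # Perimeter as the sum of edge lengths (including the closing edge)
    P = np.hypot(np.diff(np.r_[x, x[0]]), np.diff(np.r_[y, y[0]])).sum()
    return 4 * np.pi * A / P**2

theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circ = circularity(np.cos(theta), np.sin(theta))    # near-circular contour
ell = circularity(2 * np.cos(theta), np.sin(theta)) # 2:1 elliptical contour
```

The same discrete contour representation supports the eccentricity and curvature-based features, with local curvature estimated from finite differences along the contour.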
For the frame-wise features defined in Tables 2 and 3, the first derivative was computed using a central finite difference approximation. Hereafter, ∂f indicates the first derivative of the frame-wise feature f; it is itself a frame-wise feature represented as a longitudinal signal.
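A minimal sketch of the central finite-difference derivative on a uniformly resampled signal (here via np.gradient, which uses central differences at interior points and one-sided differences at the boundaries; the toy signal is ours):

```python
import numpy as np

# Derivative of a frame-wise feature along the pullback axis.
# With a uniform spacing of 1/60 mm, np.gradient applies central
# differences at interior points: (f[i+1] - f[i-1]) / (2 * h).
spacing_mm = 1.0 / 60.0
pb = np.array([40.0, 40.5, 41.5, 43.0, 45.0])  # toy plaque-burden signal
d_pb = np.gradient(pb, spacing_mm)             # dPB/ds in %/mm, same length
```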

Region of interest
A single IVUS pullback can be used to image a large portion of an epicardial coronary artery. Changes in the PB, from BL to FU, along the entire segment can either be positive (progression) or negative (regression), over a wide range of magnitudes. Furthermore, by considering a certain threshold, the outcome could be an unchanged PB. Since the target variable, ΔPAV, is defined over an arterial volume, it is key to define regions of interest within the pullback, for which the regression/classification model will predict the associated ΔPAV (value, or sign).
We defined the ROIs in 6 ways to analyze the potential and limitations of the proposed classification methodology. We propose to use (a) the complete pullback (FP criterion); (b) ROIs defined by the difference between the PB at FU and BL (ChPAV criterion); (c) ROIs defined by thresholding the BL PB signal (PBR criterion); and three different fixed-length windows (W30O10, W60O20, and W120O20 criteria). Table 4 presents the criteria used for ROI definition.
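For the fixed-length window criteria (WxxOyy), ROI generation can be sketched as follows (a hypothetical helper, assuming tiling with the window size as shift and repeated offsets, as described in Table 4):

```python
def windowed_rois(n_frames, window, offsets):
    """Generate fixed-length ROIs over a pullback of n_frames frames.

    Mirrors the WxxOyy criteria: a window of `window` frames is tiled
    (shift = window) across the pullback, once per starting offset.
    Returns (start, end) frame-index pairs, end exclusive.
    """
    rois = []
    for off in offsets:
        for start in range(off, n_frames - window + 1, window):
            rois.append((start, start + window))
    return rois

# W60O20: 60-frame windows (1.0 mm at 60 frames/mm), offsets 0, 20, 40
rois = windowed_rois(n_frames=300, window=60, offsets=(0, 20, 40))
```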

Condensed feature definition
All the frame-wise features presented here were condensed into ROI-wise features using classical statistical indexes such as the median, Shannon's entropy (H), and the discrete Fourier Transform (FFT). Moreover, the ROI's length is also used as a condensed feature. Each of these ROI-wise features is a real number, and a set of n features represents each ROI as a point in R^n. The arterial label, e.g. LAD, LCx, or RCA, is the only categorical variable used in this work. Table 5 presents all ROI-wise features.
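A minimal sketch of the condensation step for one frame-wise signal (median, Shannon entropy from a histogram-based probability estimate, and the first Fourier harmonic; the bin count and function name are our assumptions, not specified in the text):

```python
import numpy as np

def condense(f, n_bins=16):
    """Condense a frame-wise signal f into ROI-wise scalar features:
    median, Shannon entropy of a histogram-based probability estimate,
    and magnitude/phase of the first Fourier harmonic."""
    p = np.histogram(f, bins=n_bins)[0] / len(f)  # discrete prob. estimate
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))             # Shannon entropy H
    h1 = np.fft.rfft(f)[1]                        # 1st Fourier harmonic
    return {"MED": np.median(f),
            "H": entropy,
            "FFT_m": np.abs(h1),
            "FFT_p": np.angle(h1)}

# One full sine period over a 64-frame ROI
feats = condense(np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False)))
```

Each ROI is then represented by the vector of such scalars computed over all frame-wise features.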

Machine learning model
We use an XGBoost regressor model 13. Hyperparameters were defined empirically and remained fixed for all the tests performed in this work. The XGBoost setup consists of 256 estimators with a maximum depth of 12 levels, and the loss function was the squared error. The complete list of hyperparameters is given in the Appendix.

Feature handling
We use an initial feature set, followed by standardization and feature selection. Regarding the initial feature set, we employ the following notation for the three alternatives studied here.
• S a : set of clinical features, see

Training/test methodology
A repeated stratified k-fold cross-validation (RSKFCV) strategy was used for training and testing. Folds were always defined at the pullback level, i.e. using the FP criterion for ROI definition. Then, depending on the

Table 4. Definition of criteria for the generation of ROIs. We propose to assess our methodology using the complete pullback (FP), the partition using the difference between the PB at FU and BL (ChPAV), a threshold on the BL PB signal (PBR), and three different fixed-length windows (W30O10, W60O20, and W120O20).

ROI code name Criterion
FP Full (complete) pullback. This criterion results in larger ROIs in which sub-regions of progression or regression may occur. Nevertheless, it is useful to establish a level of comparison and also to separate sets for training and validation at a global level, as will be shown in forthcoming sections.

ChPAV
Each pullback is divided into ROIs where the difference between the FU and BL PB is positive (or negative) in all (interpolated) frames of the ROI. Note that this approach for ROI definition is not applicable in clinical scenarios, where a new pullback is being analyzed and the FU data is not available. Nevertheless, for studying the potential of the method, this criterion provides an ideal ROI definition for ΔPAV regression/classification models. Regions with |PB_FU − PB_BL| < 0.5% were discarded. Regions spanning fewer than 15 interpolated frames, i.e. less than 0.25 mm, were discarded.

W30O10
ROIs are divided using a fixed window size of 30 (interpolated) frames, which is shifted by 30 frames over the complete pullback. The ROI generation is repeated using 0, 10, and 20 frames as offsets. Given the frame spacing, these ROIs span 0.5 mm in length.

W60O20
ROIs are divided using a fixed window size of 60 (interpolated) frames, which is shifted by 60 frames over the complete pullback. The ROI generation is repeated using 0, 20, and 40 frames as offsets. Given the frame spacing, these ROIs span 1.0 mm in length.

W120O20
ROIs are divided using a fixed window size of 120 (interpolated) frames, which is shifted by 120 frames over the complete pullback. The ROI generation is repeated using 0, 20, 40, 60, 80, and 100 frames as offsets. Given the frame spacing, these ROIs span 2.0 mm in length.
Table 5. List of condensed features defined within each ROI. Frame-wise features are condensed into real-number features using classical statistical indexes such as the median, the Shannon entropy (H), and the discrete Fourier Transform (FFT). In addition, the arterial label and the ROI's length are also used as condensed features.
Statistics over each frame-wise feature (f)
MED(f) : The median of f in the ROI.

Information theory over each frame-wise feature (f)
H(f) : Shannon's entropy from an approximation of the discrete probability function of the frame-wise feature f.

Fourier analysis over each frame-wise feature (f)
FFT_m(f) : The magnitude of the 1st Fourier harmonic of the frame-wise feature f.
FFT_p(f) : The phase of the 1st Fourier harmonic of the frame-wise feature f.

A : The arterial label, e.g. LAD, LCx, RCA, etc., that was interrogated.
ℓ : The length of the ROI.
After all iterations are completed, the mean and standard deviation of each prediction metric are gathered for assessing and comparing the performance. Figure 3 illustrates the complete pipeline to perform one regression/classification experiment.
All tests reported in this study were performed using the same data set partitions at the pullback level for the RSKFCV loop. Also, the XGBoost configuration remained fixed for all tests. The feature selection procedure also remained fixed (whenever used); the only parameter that changed was the number of features to be selected (either 8 or 32). The reader is directed to the Appendix for more details.
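The RSKFCV loop at the pullback level can be sketched as follows (using scikit-learn's RepeatedStratifiedKFold; the toy per-pullback progression/regression labels are ours):

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# 5-fold stratified CV, 20 repetitions (100 train/test splits in total),
# defined at the pullback level: one label per pullback, stratified by
# the sign of its full-pullback Delta-PAV (progression vs regression).
rng = np.random.default_rng(1)
pullback_class = rng.integers(0, 2, size=140)  # toy labels, 140 arteries

rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
splits = list(rskf.split(np.zeros((140, 1)), pullback_class))
```

All ROIs of a pullback inherit the fold assignment of that pullback, so no pullback contributes ROIs to both training and test sets of the same split.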
We make use of Shapley additive explanations (SHAP) values 14 to measure the impact of each feature on the output of the XGBoost model. The SHAP value of a feature can be computed for each sample and then averaged over the test set. The larger the mean absolute SHAP value of a feature over the test set, the greater its impact on the prediction of the model relative to the mean prediction over the test set.
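The aggregation of SHAP values into a global feature-impact score can be sketched as follows (pure NumPy on a toy SHAP matrix; in practice the matrix would come from a tree explainer applied to the trained XGBoost model and its test samples):

```python
import numpy as np

def mean_abs_shap(shap_values):
    """Global impact of each feature: mean absolute SHAP value over a
    test set. shap_values has shape (n_samples, n_features)."""
    return np.abs(shap_values).mean(axis=0)

# Toy SHAP matrix: feature 1 has the largest average impact
sv = np.array([[ 0.1, -2.0, 0.0],
               [-0.2,  1.5, 0.1],
               [ 0.3, -1.0, 0.0]])
impact = mean_abs_shap(sv)           # per-feature mean |SHAP|
ranking = np.argsort(impact)[::-1]   # most impactful feature first
```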

Results
Table 6 presents basic statistics for the ΔPAV variable considering the different criteria used to define the ROI partitions. Furthermore, the total number of samples per class (progression "+" or regression "−") is reported, as well as the distribution, in terms of mean, (std), [min, max], of the number of samples per class for the training (tr) and test (te) sets over the RSKFCV iterations. Observe that the class representing plaque progression (+) is the minority class for all partitions. Moreover, the class imbalance is most pronounced for the FP partition (34%) and smallest for the ChPAV partition (47%), while the remaining partitioning criteria exhibit a similar prevalence of the progression class (between 40% and 43%). In terms of sample size, the FP partition contains the smallest number of samples (n = 140), followed by PBR (n = 763), which is roughly one third of the ChPAV partition sample size (n = 2167). The other partitions are much larger in terms of sample size.
The statistics of the prediction metrics for all tests included in this work are presented in Table 7.These tests are characterized by the dataset (ROI definition criterion), and by the set of features/feature-selection-strategy employed.
For an in-depth analysis, we considered the ChPAV and PBR ROI definition criteria. The former is useful as a reference because it represents the best possible ROI partition criterion, since all frames in a given partition will render either progression or regression. In turn, the PBR criterion was chosen because it rendered the best results among the rest of the ROI partition criteria. Moreover, we selected the best-performing feature set according to the mean MCC metric, as presented in Table 7. Specifically, the S_d (k=32) feature set is used for the in-depth analysis of both the ChPAV and PBR criteria.
Next, we exploited the fact that each ROI was used in 20 different models as part of the test set during the RSKFCV; therefore, the means of the prediction and of the absolute error for each ROI are computed over the 20 models, which differ among ROIs. The correct classification rate (CCR) represents the fraction of the 20 models that classified a given ROI correctly. This metric is used to discriminate between ROIs with CCR > 0.5, i.e. those that were more often classified correctly than incorrectly, and those with CCR ≤ 0.5. In Fig. 4, ROIs were divided by their CCR status and ordered by the true ΔPAV value (blue marker). The top panel corresponds to the ChPAV criterion and the bottom panel to the PBR criterion. Reddish regions stand for mostly incorrectly classified ROIs (CCR ≤ 0.5), and greenish regions for mostly correctly classified ROIs (CCR > 0.5). Black markers represent the mean prediction per ROI (as given by the 20 models), and the colored envelope around those markers is obtained by adding and subtracting the mean absolute error. This allows both a qualitative and quantitative comparison of the mean accuracy of the models. Note that the mean absolute error (reddish and greenish regions) seems to be related to the magnitude of ΔPAV, except for the ROIs with CCR ≤ 0.5 under the PBR criterion when the predicted ΔPAV < 0. Another qualitative conclusion that can be extracted from the plots is that the mean prediction seems to be confined to a narrower range than the actual ΔPAV; more observations on this are made in the following paragraph.
In Fig. 5, besides separating ROIs by their CCR status, we discretized them according to their ΔPAV in even intervals. Finally, we use the mean error of each ROI over the 20 models in which it belongs to the test group to construct a box/violin plot. It is noteworthy that the distributions in both cases, ChPAV and PBR, concentrate 96% and 88% of the samples, respectively, in the [−10, 10) ΔPAV interval. This interval corresponds to the prediction range mentioned in the previous paragraph. Moreover, classifications outside of the interval are mostly correct, which suggests that the prediction is robust to outliers.
In Fig. 6, the concept of mean percentage of correctly classified length (CCL) per pullback is introduced, and it is used to dichotomize pullbacks by a threshold of CCL > 50%. Recalling the cross-validation strategy explained in Sect. "Training/test methodology", folds are defined at the pullback level, which ensures that all ROIs of a given pullback are used for either training or testing, never both. Therefore, again we use, for each pullback, only the 20 models in which it served as a test sample to compute the mean and std of the CCL. Specifically, the CCL index for a given pullback is computed as the ratio of the summed lengths of its correctly classified ROIs to the summed lengths of all its ROIs. In qualitative terms, there is no association between the CCL and the length of the pullback. The distributions of CCL are presented in the box-/violin-plots in Fig. 7. It can be noted that, for CCL > 50%, the mean and median are close to 70%, with an interquartile range (IQR) of around 20%, slightly larger for the PBR case than for the ChPAV one; the distribution also looks more homogeneous in the PBR case. Regarding the cases with CCL ≤ 50%, the ChPAV case presents half as many samples in this category as the PBR case, and its distribution yields mean and median values close to 40% with an IQR around 10%, while the PBR case features a more spread distribution of CCL, with mean and median close to 30% and an IQR around 25%.
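The CCL index described above can be sketched as follows (function name and toy values are ours):

```python
def ccl(roi_lengths_mm, correctly_classified):
    """Correctly classified length (CCL) of a pullback: the summed
    length of its correctly classified ROIs divided by its total
    ROI length."""
    total = sum(roi_lengths_mm)
    correct = sum(l for l, ok in zip(roi_lengths_mm, correctly_classified) if ok)
    return correct / total

# Pullback with four ROIs; three are classified correctly
value = ccl([1.0, 2.0, 0.5, 1.5], [True, True, False, True])
```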
Regarding the initial set of features, feature selection, and feature impact on model prediction, we present the following analysis. The scenarios presented here, namely ChPAV and PBR, used the same initial feature set and selection strategy, S_d (k=32). The top panel in Fig. 8 presents the selection count and the selection ratio times the summed mean absolute SHAP value. Note that the mean absolute SHAP value of a feature is computed for each of the 100 models used during the RSKFCV procedure, using the corresponding test samples. Since feature selection with k=32 is performed prior to model training, we count the number of times a feature was selected throughout the 100 models, compute the selection ratio as the selection count over 100, and use it to weight the mean absolute SHAP value of the feature. Ordering by this weighted value and retrieving the top ten, we obtain the features with the most impact on model predictions over the 100 models. The mid panel of Fig. 8 presents a histogram ordered by the selection count. This visualization shows that, from the initial set of 242 features, the selection process always selects the same 10 (selection count equal to 100), and that this count rapidly drops to around 20 by the 50th feature. From there on, features are rarely selected, as seen in the selection count plot.
Table 8 presents the selection count and the total mean absolute SHAP value times the selection ratio (SR) for all clinical features defined in Tables 1 and 2, in addition to the arterial label (A) and ROI length (ℓ). Interestingly, TLV, TVV, and ℓ were rarely selected, and when selected their impact on the models in terms of SHAP values was low, for both the ChPAV and the PBR cases. Instead, TAV impacted the ChPAV models considerably, but its contribution to the PBR models was poor. Overall, PAV was the most impactful feature for the PBR models, while in the ChPAV case it contributed weakly. Also, although the arterial label A reached a high selection count, its impact was weak on the PBR models and close to zero in the ChPAV case. The clinical variables defined at the patient level were seldom selected in the ChPAV case, and, consequently, their overall impact was close to zero. In turn, for the PBR case, MSX, DAPT, and HDL were selected in all 100 models, although their overall global impact was weak. The rest of the clinical features, although selected more often in the PBR than in the ChPAV case, displayed poor relevance.
We now focus the analysis on the models that resulted in the best prediction performance according to the MCC metric, again for the ChPAV and PBR scenarios using the S_d (k=32) feature management setup. Figure 9 presents the regression plots and their prediction metrics. Importantly, the ChPAV scenario renders better performance because ROIs are built, by definition, using the regression/progression feature as a proxy. Using this scenario as a reference, the PBR case resulted in almost double the mean absolute error (MAE = 5.04%), and moderate linear and Matthews correlations (r = 0.37 and MCC = 0.36), which are 0.07 and 0.16 lower than in the ChPAV scenario. In terms of accuracy and mean F1-scores, we get fairly good results (ACC = F1_a = 0.65), compared to 0.76 in the ChPAV case. Regarding the most influential features for these scenarios, Fig. 10 shows the mean absolute SHAP values of the five most influential features. The top three features of each model were selected to visualize the entire ROI sample distributions in both scenarios, ChPAV and PBR, see Fig. 11. The foremost features are those related to the plaque burden (PB) and lumen area (L), according to mean absolute SHAP values: namely, FFT_p(∂PB), FFT_m(∂L), and IQR(PB) for the ChPAV case, and MED(PB) and FFT_p(∂L) for the PBR case. Note that in the PBR case, the clinical feature HDL is the second most relevant feature for the model according to the mean absolute SHAP values. Nevertheless, as can be seen in Fig.
11, the mean HDL does not differ between the regression/progression groups when taken at the ROI level for ChPAV or PBR. This can be explained because HDL is defined at the patient level, meaning that for all ROIs of a pullback (regardless of the ΔPAV sign), the HDL is the same. Therefore, it is the interaction of HDL with the other features that produces an impact on the model output. Analogous reasoning can be used to explain the impact of other patient-level clinical features.

Table 6. For each ROI partition criterion, we present the sample mean (std) of ΔPAV, the total number of ROIs (n), and the number of ROIs of class regression (n−) and class progression (n+). The right part of the table provides the statistics (mean, std, minimum, and maximum) for the number of ROIs in the regression and progression classes for the training set (•)_tr and for the testing set (•)_te in the RSKFCV process.

Earlier studies analyzed the association between treatment and plaque evolution in terms of basic statistics 1,3. More recently, researchers presented predictive models to assess plaque evolution; see the supplementary material for a summary.
In this work, we presented an XGBoost regressor model to estimate the change in percent atheroma volume (ΔPAV) as the end-point, which is the standard measurement of plaque evolution 1. Previous works did not use such a machine learning model, nor the ΔPAV, because they center on frame-wise prediction instead of region-wise prediction. We have detailed our pre-processing and training methodology, for which we use 5-fold cross-validation (20 repetitions) at the patient level, which is considerably more robust than cross-validation at the frame level, since consecutive (and therefore very similar) frames are expected to provide highly correlated data to the training and validation phases. The input data for our method are vessel and lumen contours extracted from IVUS pullbacks, which are already the standard measurement in the clinic, in contrast to other methods that need VH-IVUS, OCT, or 3DQCA and segmentation of other structures such as lipids to obtain morphological features and to construct computational domains for complex computer simulations. Finally, the proposed model was developed with a subset of the IBIS-4 data set, spanning 140 arteries from 81 patients, becoming the largest data set ever used for plaque progression/regression prediction models, compared to state-of-the-art publications which used at most 9 pullbacks. This work is the first of its kind that aims to contribute to the field through: (i) proposing a complete methodology to process contours extracted from IVUS to generate a comprehensive geometric description; (ii) developing a machine-learning-based model to predict plaque progression/regression from such features and patient clinical data.

Figure 6. For each pullback, the CCL is computed and used to group samples. The length per pullback (blue marker, right axis) is used to sort them. Black markers (right axis) represent the CCL, that is, the average length of a pullback that was correctly classified (left axis). The colored areas are created by adding and subtracting the std of the CCL as predicted by the 20 models. The top panel plots samples for the ChPAV ROI definition criterion, and the bottom panel plots samples for the PBR criterion.

Impact of ROI definition
When the full pullbacks were used for model training, i.e. when the FP ROI definition criterion was employed, the models trained with feature sets containing geometric-based descriptors performed poorly. Such a result could be attributed to the fact that full pullbacks in the data set cover different lengths, and it is common for them to contain regions in which the plaque will increase (progression) and others in which the plaque will decrease (regression). The compensation of these regions in the total ΔPAV is hard to predict when frame-wise features are condensed over the complete pullback. In other words, this confirms that the plaque progression/regression problem depends upon focal phenomena, something that is widely accepted in the specialized literature.
In turn, the FP ROI definition criterion was adequate to test the feature subset S_a, i.e. the standalone clinical features defined at the patient level in Table 1. Under such a scenario, the method resulted in an average of 0.65 ACC and 0.21 MCC. It is worth noting that such predictive capabilities are similar to those obtained using the ROI definition criterion based on baseline plaque burden alone (PBR). Nonetheless, to correctly interpret such results, the following points should be taken into consideration.

The next steps of this research should be oriented towards improving prediction performance by (i) collecting more data and (ii) envisioning a different analysis of ROIs. Regarding the first point, there are two possible courses of action: (i.a) try to access retrospective data of studies similar to IBIS-4; or (i.b) design and execute new trials. Regarding the second point, we propose three courses of action: (ii.a) to process frame-wise features as spatial signals (in contrast to ROI-wise feature strategies) using models for signal forecasting or natural language processing, such as recurrent neural networks; (ii.b) to explore alternative image-related features from the IVUS pullbacks that could be markers for plaque evolution, such as plaque composition (e.g. image texture, among others) and localization (e.g. bifurcations); and finally, (ii.c) to estimate the hemodynamic and mechanical environments (through blood flow and structural computer simulations) to gather information on wall shear stresses or plaque inner stresses, and use them as complementary features in the methodology proposed here.

Limitations
While the machine learning algorithm presented in this study showed promising results in detecting coronary plaque regression or progression from IVUS pullback and standard patient clinical data, it is essential to acknowledge the limitations that should be considered in interpreting and applying the findings presented here.
• Regarding the characteristics of the data, it is known that the performance of any machine learning algorithm heavily relies on the quality and representativeness of the dataset used for training and evaluation. In this context, we highlight the following:
- All patients were treated with high-intensity statin therapy during the 13-month time window between the BL and FU acquisitions. Therefore, the data set may not fully represent the diversity and heterogeneity of lumen, vessel, and plaque geometry, and of the hemodynamic/environmental scenarios that affect atherosclerotic plaque regression and progression mechanisms in different patient populations.
- The size of the dataset was deemed adequate for the present pilot study. Nevertheless, the present study must be replicated using a larger, more diverse patient cohort.
• Since the lumen and vessel contours used in this work were manually defined by specialized cardiologists, the method's sensitivity to inter- and intra-observer variability remains to be investigated. Moreover, variability and subjectivity in the annotation process may introduce inherent biases. Furthermore, other intrinsic IVUS image characteristics, e.g. image noise, artifacts, image resolution, gating, and so on, may affect the accuracy of contour delineation, and consequently impact the prediction model. Automatic segmentation of lumen and vessel contours by machine learning algorithms can help to circumvent this issue and increase the number of patients involved in the analysis.
• Although we tackle the interpretability of the model predictions through the incorporation of SHAP values, there is still a considerable number of features, relevant to the proposed methodology, which are not as intuitive as {PAV, TAV, TLV, TVV} (those commonly used by cardiologists). This could delay adoption in clinical practice.
In summary, further research and validation efforts are necessary to overcome these limitations and enhance the algorithm's robustness, generalizability, interpretability, and ethical and regulatory compliance.

Conclusions
In this work, we presented promising results toward predicting atherosclerotic plaque regression/progression over time from patient data at baseline. Specifically, clinical data were integrated with IVUS-derived data (lumen and vessel contours) at two time points, baseline and 13-month follow-up, to train an XGBoost regressor/classifier. When such a model is trained/validated on regions defined by the very progression/regression of the plaque burden, the accuracy and the Matthews correlation coefficient were, on average, 0.70 and 0.41 respectively, for stratified k-fold cross-validation (k = 5, r = 20, 100 models in total). Using an ROI partition criterion based only on the plaque burden at baseline yielded, on average, 0.60 and 0.23 for the accuracy and the Matthews correlation coefficient, respectively. The use of fixed lengths along the pullback to define ROIs did not improve these metrics. The proposed framework enables the prediction of plaque changes (positive: progression; negative: regression) in patients treated with rosuvastatin therapy. Moreover, the method may help to stratify patients at risk of coronary plaque progression, using IVUS images and standard patient clinical data.

Figure 1. For a selected patient, longitudinal views of the baseline IVUS pullback (BL, left panel) and the corresponding follow-up IVUS pullback (FU, right panel). The horizontal axis indicates the frame number and the vertical axis indicates the pixel coordinate of the images, giving a sense of the image resolution of the IVUS frames (480 × 480).

Figure 2. Illustration of the manual BL/FU registration procedure for one selected pullback. Top panel: raw plaque, vessel, and lumen signals of the BL (blue-toned) and FU (red-toned). Bottom panel: co-registered signals. Co-registration produced anatomically consistent signals by a two-step method: first, homogenizing frame spacing to 60 frames per second using linear interpolation; and second, manually shifting and clipping the tails of the signals until matching of the local extrema was reached.
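The two-step co-registration summarized in the caption (linear resampling to a uniform spacing, then shifting and clipping until the signals' extrema match) can be sketched as follows. In the study the shift was chosen manually; here a cross-correlation peak stands in for that manual choice, and all function names are illustrative:

```python
import numpy as np

def resample(signal, n_out):
    """Linearly interpolate a 1-D signal onto a uniform grid of n_out samples."""
    x_old = np.linspace(0.0, 1.0, len(signal))
    x_new = np.linspace(0.0, 1.0, n_out)
    return np.interp(x_new, x_old, signal)

def coregister(bl, fu, n_out=200):
    """Resample BL/FU signals to a common grid, then shift FU so that its
    cross-correlation with BL is maximal (a stand-in for the manual shift),
    clipping the tails so both signals keep the same length."""
    bl_r, fu_r = resample(bl, n_out), resample(fu, n_out)
    corr = np.correlate(bl_r - bl_r.mean(), fu_r - fu_r.mean(), mode="full")
    shift = int(corr.argmax()) - (n_out - 1)  # amount to roll FU to align with BL
    fu_a = np.roll(fu_r, shift)
    if shift > 0:       # clip the wrapped-around tails after rolling
        bl_r, fu_a = bl_r[shift:], fu_a[shift:]
    elif shift < 0:
        bl_r, fu_a = bl_r[:shift], fu_a[:shift]
    return bl_r, fu_a, shift
```

For example, a follow-up signal that is a copy of the baseline delayed by a few frames is recovered with the correct (negative) shift and identical clipped tails.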

4. Train an XGBoost regressor model configured as detailed in Sect. "Machine learning model".
5. Perform prediction over the test set. This involves standardization and feature selection (both using parameters computed from the training set), followed by a forward pass through the model.
6. Compute regression metrics over the test set: mean absolute error (MAE), mean squared error (MSE), and Pearson's correlation coefficient (r).
7. Classify the regression estimates using 0 as the threshold, yielding estimated progression or regression.
8. Compute classification metrics over the test set: accuracy (ACC), Matthews correlation coefficient (MCC), F1-score for the ΔPAV < 0 class (F1−), F1-score for the ΔPAV > 0 class (F1+), and the average F1-score between the two classes (F1a).
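Steps 5-8 above can be condensed into a small evaluation helper. The sketch below uses scikit-learn's metric functions; the function name and dictionary keys are illustrative, not the authors' implementation:

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             accuracy_score, matthews_corrcoef, f1_score)

def evaluate(y_true, y_pred):
    """Regression metrics plus sign-thresholded classification metrics,
    mirroring steps 6-8 of the pipeline (names are illustrative)."""
    r = np.corrcoef(y_true, y_pred)[0, 1]        # Pearson's r
    cls_true = (y_true > 0).astype(int)          # 1: progression, 0: regression
    cls_pred = (y_pred > 0).astype(int)          # threshold the regression at 0
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "r": r,
        "ACC": accuracy_score(cls_true, cls_pred),
        "MCC": matthews_corrcoef(cls_true, cls_pred),
        "F1-": f1_score(cls_true, cls_pred, pos_label=0),
        "F1+": f1_score(cls_true, cls_pred, pos_label=1),
        "F1a": f1_score(cls_true, cls_pred, average="macro"),
    }
```

Thresholding the continuous ΔPAV estimate at 0, rather than training a separate classifier, keeps a single regressor while still producing the binary progression/regression labels the clinical question asks for.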

Figure 3. Illustration of the complete processing pipeline. The data preparation block comprises co-registration between BL and FU pullbacks, followed by signal interpolation, frame-wise feature computation, ROI definition, and ROI-wise feature characterization using elements from statistics, information theory, and signal processing. The cross-validation folds are defined at the pullback level, but actual partitions are defined from the corresponding ROIs. Training samples are standardized, and a feature extraction algorithm is (optionally) used to reduce the dimension of the feature space. An XGBoost regressor is trained and then used to predict ΔPAV data for the current fold. Finally, ROI-wise comparison of the values predicted by the regressor, and the resulting classification into progression/regression, is performed, and several performance metrics are computed and analyzed.

Figure 4. Distribution of mean ΔPAV prediction and absolute error, segregated as a function of the true ΔPAV. The mean prediction per ROI is shown with black markers (left axis); the colored areas are computed by adding or subtracting to each black marker the mean absolute error of the ROI (left axis). Samples are separated by the correct classification rate (red: CCR ≤ 0.5; green: CCR > 0.5), and ordered by the sample's ΔPAV value, blue markers (right axis). The top panel corresponds to the ChPAV ROI definition criterion, and the bottom panel corresponds to the PBR definition criterion.

Figure 5. Violin-/box-plots of the mean ΔPAV prediction per ROI for each range of the true ΔPAV. Mean error in the prediction of ΔPAV for the different ROIs, segregated according to the correct classification rate (red: CCR ≤ 0.5; green: CCR > 0.5), and ordered from left to right according to the ΔPAV value range. The top panel corresponds to the ChPAV ROI definition criterion, and the bottom panel corresponds to the PBR definition criterion.

Figure 7. Box-/violin-plots of the mean percentage of correctly classified lengths per pullback, grouped by CCL > 0.5. The left panel plots samples from the ChPAV ROI definition criterion, and the right panel plots samples for the PBR criterion.

Figure 8. The top panel presents the selection count and the weighted total mean absolute SHAP values of the top ten features for the ChPAV (left) and PB (right) ROI definition criterion. A sorted histogram of the selection count is presented in the bottom panel, again for the ChPAV (left) and PB (right) criterion.

Table 1 .
Baseline characteristics for the 81-patient sample. For continuous variables, the mean (std) is presented; for boolean variables, the n (percentage of total) is presented. CAD: coronary artery disease; PCI: percutaneous coronary intervention; DAPT: dual anti-platelet therapy. a Total cholesterol > 5.0 mmol/L or 190 mg/dL, or requiring treatment. b eGFR > 60.

• S_a: set of clinical features defined at the patient level, see Table 1;
• S_b: set S_a plus the set of geometric features from the original IBIS-4 study, see Table 2;
• S_d: set S_b plus all condensed features presented in Table 5;
• set S_d, where we select the best k = 32 features (see also the Appendix).
experiment, the corresponding ROIs for each pullback were accordingly assigned to the training or testing sets of the fold. Moreover, fixing the random generator's seed parameter of the RSKFCV implementation ensures that all experiments are trained/tested over the same pullback sets; what changes are the ROIs defined inside the pullbacks. Stratification was based on the sign of the ΔPAV variable, at the pullback level. The following pipeline is performed for each model/scenario.
1. Choose the initial feature set from the ones defined in Sect. "Feature handling".
2. Perform feature standardization, using the training samples to compute standardization parameters, see Sect. "Feature handling".
3. Perform feature selection, see Sect. "Feature handling".
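Putting the pipeline steps together, the cross-validation loop could be sketched as below. For a self-contained illustration, scikit-learn's GradientBoostingRegressor stands in for the XGBoost regressor, SelectKBest stands in for the feature-selection step, stratification is done at the sample rather than the pullback level, and all hyperparameters are illustrative:

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for XGBoost

def run_cv(X, y, k_best=8, n_splits=5, n_repeats=20, seed=0):
    """Sketch of the RSKFCV pipeline: stratify on sign(dPAV), fit the scaler
    and feature selector on the training folds only, train a regressor, and
    score the sign of the prediction on the held-out fold."""
    strata = (y > 0).astype(int)                 # progression vs regression
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                 random_state=seed)  # fixed seed: same folds
    accs = []
    for train_idx, test_idx in cv.split(X, strata):
        scaler = StandardScaler().fit(X[train_idx])
        Xtr, Xte = scaler.transform(X[train_idx]), scaler.transform(X[test_idx])
        selector = SelectKBest(f_regression,
                               k=min(k_best, X.shape[1])).fit(Xtr, y[train_idx])
        model = GradientBoostingRegressor(random_state=seed)
        model.fit(selector.transform(Xtr), y[train_idx])
        y_hat = model.predict(selector.transform(Xte))
        accs.append(np.mean((y_hat > 0) == (y[test_idx] > 0)))
    return float(np.mean(accs))
```

Note that fitting the scaler and the selector inside each fold, as done here, is what prevents information from the test ROIs leaking into training.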

Table 7 .
Predictive capabilities of the model trained/tested under different ROI definition criteria using the same RSKFCV procedure, with 20 repetitions and 5 folds.For each metric, the mean (std) and [min, max] values are reported.The global maximum mean values per column are highlighted in bold and italic font, while the local maximum values, i.e. per ROI partition criterion, are highlighted in italic font.

Table 8 .
For all geometric features defined in the original IBIS-4 paper and all clinical features defined at the patient level, the selection count and the total mean absolute SHAP values times the selection ratio. Results are presented for the ChPAV and PBR ROI definition criteria. Importantly, some clinical variables (see Table 8) impacted the predictions of the models based on the PBR ROI criterion. Mean absolute SHAP values (blue bars), with corresponding standard deviations (black lines), for the top five features with the greatest impact on model predictions. The models with the largest MCC for the ChPAV case (left) and for the PBR case (right) were selected. Violin-/box-plots of the top 3 features with the greatest impact on the best-performing models in the ChPAV and PBR scenarios regarding the ROI definition criterion.