Assessing cardiovascular risks from a mid-thigh CT image: a tree-based machine learning approach using radiodensitometric distributions

The nonlinear trimodal regression analysis (NTRA) method based on radiodensitometric CT distributions was recently developed and assessed for the quantification of lower extremity function and nutritional parameters in aging subjects. However, the use of the NTRA method for building predictive models of cardiovascular health was not explored; in this regard, the present study reports the use of NTRA parameters for classifying elderly subjects with coronary heart disease (CHD), cardiovascular disease (CVD), and chronic heart failure (CHF) using multivariate logistic regression and three tree-based machine learning (ML) algorithms. Results from each model were assembled as a typology of four classification metrics: total classification score, classification by tissue type, tissue-based feature importance, and classification by age. The predictive utility of this method was modelled using CHF incidence data. ML models employing the random forests algorithm yielded the highest classification performance for all analyses, and overall classification scores for all three conditions were excellent: CHD (AUCROC: 0.936); CVD (AUCROC: 0.914); CHF (AUCROC: 0.994). Longitudinal assessment for modelling the prediction of CHF incidence was likewise robust (AUCROC: 0.993). The present work introduces a substantial step forward in the construction of non-invasive, standardizable tools for associating adipose, loose connective, and lean tissue changes with cardiovascular health outcomes in elderly individuals.

The concomitant loss of muscle mass and increase in adipose tissue in aging individuals suggest the use of quantitative imaging techniques, such as X-ray computed tomography (CT) or magnetic resonance imaging (MRI), to characterize overall changes in skeletal muscle [19][20][21]. Indeed, another defining characteristic of aging is the loss of muscle strength from both the reduction of dense contractile myofibers and the infiltration of non-contractile adipose tissue, a phenomenon known as myosteatosis 22. These changes altogether present a reduction in muscle 'quality', which has been cited as a significant causal mechanism in the loss of muscle function, particularly when in conjunction with reduced muscle mass 13,14,23. CT imaging has shown particular utility in quantifying these changes 20,21. This is often performed via the use of radiodensitometric absorption values, measured in Hounsfield units (HU). Here, changes in segmented cross-sectional areas have been used to illustrate changes in volume [24][25][26][27][28][29][30], and changes in average HU values have been used to illustrate changes in muscle quality 31,32. We have recently shown the utility of modelling entire radiodensitometric distributions from CT cross-sections of the mid-thigh, highlighting the novel nonlinear trimodal regression analysis (NTRA) method 33,34. Indeed, soft tissue HU distributions associated with cross sections from the mid-thigh can be characterized by tissue types: fat, loose connective, and lean muscle (Fig. 1). These sub-distributions are Gaussian in form and can be defined by amplitude, location, width, and skewness parameters. These parameters establish a unique 11-term soft tissue profile for each individual that can be defined using the NTRA method for whole HU distributions 32.
In developing and using these profiles, we have demonstrated the predictive value of these parameters with functional biometrics, as well as biochemical and nutritional data from healthy aging volunteers in the longitudinal AGES-Reykjavík study. This large-scale population research study (n = 3,157) was designed to examine risk factors and disease associated with aging, including genetic susceptibility and environmental interactions.
In the present study, we compare the integration of these 11 NTRA parameters to classify elderly at risk for CHD, CVD, and CHF using multivariate logistic regression modelling and three different tree-based ML algorithms: random forests (RF), ADA-Boost (ADA-B), and gradient boosting (GB). These algorithms were applied, using regression, by Recenti et al. 35 on the AGES database with the NTRA parameters to predict Body Mass Index (BMI). Figure 1 depicts this study workflow. Results from each ML model were assembled over a typology of four predictive comparisons: total classification score, classification by tissue type, tissue-based feature importance, and classification by age. Further model validation was compared for each ML model using longitudinal CHF data. Results from this investigation highlight the substantial capacity of NTRA-based ML modelling to predict all three cardiovascular health outcomes; these findings are most evidenced by the high classification scores of RF models with CHF, findings which are further validated by the robust predictive performance of CHF incidence from longitudinal data. The present study altogether serves as a substantial step forward in the construction of reproducible tools for predicting cardiovascular health in elderly individuals.

Results
Descriptive AGES-Reykjavik statistics and NTRA parameters. Prior to the construction of logistic regression and ML models, descriptive statistics and mean NTRA parameters were assembled from the AGES-I and AGES-II databases. Table 1 contains a summary of these values. These NTRA parameters describe four fundamental features of each individual's HU distribution: amplitude, width, location and skewness. The amplitude and width terms generally describe the summed area of each tissue type. The location parameter indicates mean tissue radiodensity, while skewness reflects the geometrical symmetry of the muscle and fat Gaussian distributions (see Fig. 1). As shown, from the total sample size of n = 3,157 subjects who were present for both studies,

Figure 1. Workflow of the present study with nonlinear trimodal regression analysis parameters and Gaussian distribution: from a mid-thigh CT scan, 11 radiodensitometric distribution parameters are extracted and used as features for assessing cardiovascular risks through three tree-based algorithms.

Table 1. Summary statistics and nonlinear trimodal regression analysis parameters with relative standard deviation (SD) from AGES-I and AGES-II subjects by cardiac pathophysiology (coronary heart disease (CHD), cardiovascular disease (CVD), chronic heart failure (CHF), and no condition). Note: *From the total sample size of n = 3,157 subjects that participated in both the AGES-I and AGES-II studies, 585 individuals presented with more than one cardiac pathophysiology.
of ML models, the Smote technique was applied for all cardiac conditions to obtain a balanced dataset with an equal distribution of sick and healthy people. In this phase, the 11 NTRA parameters were employed to make the predictions with GB, RF and ADA-B. K-fold cross-validation was employed three times (k = 8, 10, and 12) to compute the pathophysiology predictions; here, the 12-fold cross-validation was empirically found to be the best option for predicting all three conditions (see Appendix C for k = 8 and k = 10 results). The results from k = 12 analyses are summarized in Table 4 and the respective ROC curves are shown in Fig. 3.
Regarding the ML analyses, CHF was classified with the highest overall scores; specifically, the RF method yielded the best results, evidenced by an accuracy of 95.9%, an exceptionally high AUCROC of 0.994, and all additional scores above 95.0%. Nevertheless, ADA-B likewise surpassed 90.0% accuracy and obtained a high AUCROC (0.987). Concerning the CHD condition, ADA-B again obtained the second highest accuracy among all pathophysiologies, and RF was again the best algorithm (85.0% in accuracy and AUCROC of 0.936). CVD was likewise accurately predicted, although the condition yielded the weakest overall results among the three, with a highest achieved predictive accuracy of 82.1% obtained from the RF method and AUCROC of 0.914.

NTRA-based classification by tissue type.
Following the logistic regression analyses, ML analyses were further performed with features grouped by the three tissue types, each defined by its own NTRA parameters: N, μ, σ, and α for fat and lean muscle, and N, μ, and σ for loose connective tissue. Table 5 details the evaluation metrics computed per ML algorithm in this regard, defined by each tissue type and cardiac pathophysiology.
When predicting cardiac pathophysiology from NTRA defined tissue type (Table 5), the best results were again obtained from RF models; CHF was predicted with mean accuracies of 88.4%, 89.6% and 86.6% for fat, muscle, and connective tissue, respectively. Fat's features, in general, yielded the best overall predictive value for CHF. In comparison, CHD was predicted with an accuracy of 79.6% by fat and muscle, and 78.4% by connective tissue; all tissues yielded nearly identical overall predictive results. In predicting CVD, the tissues, commensurate with the previous ML results, obtained the lowest overall scores (under 80.0%). The highest model performances, in accordance with AUCROC, were achieved with the prediction of CHF, wherein all models surpassed the value of 0.9.
Tissue-based feature importance. Next, feature importance was computed and grouped again by tissue type defined by NTRA parameters, allowing for the comparison of the respective contributions from fat, muscle, and connective tissue NTRA values towards the accuracy of pathophysiology prediction. These tissue contributions are detailed in Fig. 4, alongside an example of a segmented false-color CT cross-section that illustrates the morphology of each NTRA tissue type.

NTRA-based classification by age. As logistic regression models implicated age and sex as strongly significant confounders for prediction of all three cardiac conditions, we additionally sought to illustrate whether the excellent classification scores identified in initial ML analyses held with respect to age, indicating their relative dependencies. From the original database, individuals were classified into three subgroups according to their age: 66-75, 76-84, and 85-98 years old. Results from these analyses are shown in Table 6.
For CVD, the maximum classification accuracy and AUCROC were 82.1% and 0.914; after splitting into three age groups, RF remained the best algorithm and showed an accuracy between 78.0% and 85.4%, and an AUCROC between 0.875 and 0.937. Concerning CHD, the best accuracy and AUCROC were 85.0% and 0.937, respectively; subgrouping by age, RF obtained an accuracy above 82.0% for all subgroups and an AUCROC above 0.9 for each group. Finally, CHF again showed the best results with an accuracy range between 88.6% and 95.6% and AUCROC between 0.962 and 0.994 through RF. Despite subgrouping by age, results were still excellent, presenting an accuracy range of 92.6% to 97.9% and AUCROC between 0.981 and 0.998. These results confirm that ML classification is accurate and independent of age as a confounder; considering the operation of these algorithms, it is further reasonable to assume an analogous classification independence from sex in prediction.

Table 2. Multivariate logistic regression models for coronary heart disease (CHD), cardiovascular disease (CVD) and chronic heart failure (CHF) using soft tissue nonlinear trimodal regression analysis parameters from CT images of the mid-thigh. Notes: For each model, sex and age yielded strong significance (p < 0.001) as corrected confounders. *p < 0.05; **p < 0.01; ***p < 0.001.
NTRA-based longitudinal assessment. In order to validate the ML prediction results, a cross sectional dataset obtained between AGES-I and AGES-II was used; here, only CHF was possible to assess due to no change in the number of individuals who received a CVD or CHD diagnosis between the two study timepoints.
To test the predictive potential of our ML models against the diagnosis of CHF, an incidence index was defined; here, the null condition '0' was assigned as a control to subjects without CHF in either AGES-I or AGES-II, whereas '1' was assigned to those without CHF in AGES-I but with the condition in AGES-II. This method thereby removed all individuals presenting CHF at both timepoints. Table 7 illustrates the results from predicting CHF incidence using each of the aforementioned ML models.

Table 3. Mean nonlinear trimodal regression analysis parameters from AGES-I and AGES-II subjects by sex and cardiac pathophysiology. The following convention for the p-value was employed: *p < 0.05; **p < 0.01; ***p < 0.001.

As shown in Table 7, the RF method again yielded the best predictive accuracy (95.2%) and AUCROC (0.993) for the prediction of CHF incidence. ADA-B was again second-best in predictive accuracy (94.3%), and GB was the least accurate of the three (88.3%). Nonetheless, each ML algorithm surpassed an AUCROC value of 0.95, as well as specificity and precision values greater than 90.0%.
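The incidence index defined above can be sketched in a few lines of Python. This is only an illustration: the function name and the boolean inputs are hypothetical, and the handling of subjects with CHF at AGES-I only (excluded here) is an assumption, since the text explicitly addresses only subjects with CHF at both timepoints.

```python
from typing import Optional


def chf_incidence_label(chf_ages1: bool, chf_ages2: bool) -> Optional[int]:
    """Return 0 (control), 1 (incident CHF), or None (excluded from the analysis)."""
    if chf_ages1:
        # CHF already present at baseline, so this cannot be an incident case.
        # Excluding "CHF at AGES-I only" is an assumption of this sketch.
        return None
    # No CHF at AGES-I: label by the AGES-II status.
    return 1 if chf_ages2 else 0


# Hypothetical (AGES-I, AGES-II) CHF statuses for four subjects.
statuses = [(False, False), (False, True), (True, True), (False, False)]
labels = [chf_incidence_label(a, b) for a, b in statuses]
kept = [label for label in labels if label is not None]
```

Only the records in `kept` would then be passed to the classifiers, with AGES-I NTRA parameters as features.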

Discussion
Deleterious changes in skeletal muscle in patients with poor cardiovascular health outcomes have been discussed in literature. Patients with CHF have been shown to develop significant ultrastructural abnormalities in their skeletal muscle, suggesting poor muscle oxidative capacity as reflected by decreased exercise capacity 36,37 . Indeed, abnormal skeletal muscle function, increased thigh intermuscular fat, and reduced exercise capacity have been cited as primary chronic symptoms in heart failure patients with preserved ejection fraction (HFpEF) 38 . However, literature on the use of ML-modelling for the prediction of these conditions remains scarce, despite recent systematic review evidence that highlights its promising utility in datamining and classifying health outcomes 39,40 .

Table 5. The 11 nonlinear trimodal regression analysis parameters grouped by tissue type (fat, connective and muscle) were used to assess cardiovascular risks through machine learning algorithms and evaluation metrics were computed.
At the time of this work, only one study could be found that reports using ML-modelling of CT images to classify individuals according to cardiovascular health outcomes. In this study, coronary CT angiography images were combined with ML-modelling to develop an artificial intelligence-based imaging biomarker to predict myocardial infarction in healthy subjects 41 . However, the use of CT images of skeletal muscle for classifying cardiovascular health outcomes remains unreported. Furthermore, the methodological heterogeneity between ML-based clinical studies is generally high, as predictive parameters or ML methods remain largely study-specific and unstandardized. As such, the present work aimed to explore ML-modelling techniques to classify individuals diagnosed with CHD, CVD, and CHF using CT-based NTRA parameters as a quantitative construct for skeletal muscle health.
Summary of main findings. From our multivariate logistic regression models, several key trends emerged when comparing the odds ratios for each significant NTRA parameter. Notably, both fat amplitude and connective tissue width were significantly and inversely-related to all three outcome conditions; this suggests that an increase in fat tissue, concomitant with a wider connective tissue distribution, may be significant protective factors against cardiovascular pathophysiology. However, an increase in fat amplitude as a protective factor is somewhat counterintuitive, as increased skeletal muscle adiposity has been readily linked with poor cardiovascular health outcomes 42 . Nevertheless, these models indicate that connective tissue amplitude is significantly and directly related to all three outcome conditions, as an accumulation of pixels at this center radiodensitometric distribution was significantly associated with the probability of CHD, CVD, and CHF. Finally, as each model was generated from the same series of NTRA parameters, it is further useful to directly compare Akaike information criteria (AIC) to resolve any differences in trade-off between model fit and complexity. AIC values for the CHD and CVD models were relatively similar (5,971 and 6,657 respectively); however, the AIC of the CHF model (1,943) indicates its comparatively high parsimony, which implicates the CHF model for having the best predictive utility amongst the three 43 .
It is critical here to discuss the salience of these NTRA parameter changes to physiological changes associated with muscle degeneration. We have previously hypothesized that the characteristic infiltration of fat into lean muscle tissue defined as myosteatosis would result in a shift of 'pure' fat or muscle CT pixels towards the center of the HU distribution due to radiodensitometric value averaging 34 . This could, in turn, result in several distributional changes that may occur independently: decreases in fat and muscle amplitude, a shift in fat and muscle peak locations towards zero, an increase in connective amplitude and a decrease in its width, and increases in fat and muscle skewness magnitude. Here, we see all of these phenomena together in the logistic regression prediction of all three adverse cardiovascular outcomes, with the exception of skewness terms. Indeed, this offers a possible explanation for our aforementioned counterintuitive protective factors of increased fat amplitude and connective tissue width for all three conditions. Altogether, these results serve as strong evidence that NTRA parameters hold utility in linking subtle physiological indicators of myosteatosis with cardiovascular health. While this relationship is strong for the classification of CVD and CHD, the prediction of CHF is particularly robust.
It is likewise important to discuss the pathophysiological characteristics of the three cardiovascular outcomes utilized in this study to interrogate the particular predictive strength of CHF and relative similarity in prediction of CVD and CHD. Firstly, CVD is understood as an overarching typology of cardiovascular conditions that includes CHD alongside a host of other disease types, such as atherosclerosis or myocardial infarction 44 . As such, the comparative prediction of all-type CVD and CHD may be expected to be relatively similar. Contrastingly, CVD and CHD have been implicated as a primary etiology of CHF alongside other key comorbidities such as diabetes 45 . As such, while CHF may be a downstream consequence of CVD or CHD, its prediction likely relies on additional exogenous factors and may therefore be relatively independent. This could explain the relative similarity of significant logistic regression terms and AIC for CVD and CHD compared to CHF; furthermore, residual diagnostics and predicted probability curves (Appendix A) show striking similarities between CHD and CVD models which largely differ from CHF curves.

Table 7. The 11 nonlinear trimodal regression analysis parameters from AGES-I were used to predict the presence of chronic heart failure in AGES-II through machine learning algorithms and evaluation metrics were computed.
From our ML models, there were again similarities between the classification accuracy of CVD and CHD, while CHF classification consistently outperformed the other two conditions. Nevertheless, all three conditions yielded high overall accuracies and excellent AUCROC values, suggesting the high general utility of NTRA-based modelling for all outcomes. Regarding tissue-based feature importance (Table 4), several key insights are shown, with differences apparent between cardiovascular conditions. Firstly, fat had a predominant role in classifying CHD (41.0%), while muscle had a comparatively minor contribution (11.9%). Contrastingly, lean muscle gave the highest contribution in classifying CHF (41.0%), while connective tissue yielded the lowest contribution (24.9%). Finally, fat and connective tissue gave almost the same contribution in classifying CVD (about 33.2% and 31.3%, respectively), while lean muscle was comparatively much lower (17.6%). These condition-based differences in classification indicate the potential specificity of tissue types to each condition, further suggesting the importance of segmenting classifying parameters by these three tissue types, which is one of the key features of NTRA computational modelling.
The value of the present work. In general, this work features several key novelties for the use of skeletal muscle to classify cardiovascular health in advanced age. Firstly, we describe the NTRA computational modelling method, wherein radiodensitometric distributions from CT image cross-sections yield 11 subject-specific soft-tissue parameters that altogether present a robust and standardizable construct for quantifying muscle degeneration. This method has shown sensitivity and specificity to lower-extremity function and nutritional parameters in previous investigations 33,34 , but the present use of these parameters to classify cardiovascular health outcomes is new. Furthermore, the present work utilizes these NTRA parameters to compare the classification accuracy of three tree-based ML model algorithms with standard multivariate logistic regression, which is again novel in the context of cardiovascular health. Finally, we validate the ML classification results using longitudinal CHF data to independently model the prediction of CHF incidence.
Altogether, a key advantage of this methodology is its derivation from CT images. As a non-invasive and standardized imaging modality that is widely utilized for diagnostic applications and pathophysiological monitoring, CT-derived HU distributions of soft-tissue radiodensity can be directly compared across clinical contexts. As such, the present use of NTRA-based classification is highly reproducible and can be readily built into existing CT analysis frameworks for patient evaluation. This tool can be further adapted into additional ML-based platforms for the detection and monitoring of adverse health outcomes in accordance with the current paradigm shift towards personalized medicine 46 . Altogether, the present work serves as a substantial step forward in the construction of reproducible tools for associating skeletal muscle changes with cardiovascular health outcomes in elderly individuals.

Limitations.
As the AGES-Reykjavik study consisted of otherwise-healthy volunteers (presenting with or without various pathologies), standard clinical measurements of key cardiac functions, such as coronary perfusion or ejection fraction measurement, were absent from the dataset. For this reason, the primary purpose of this work was to test the classification of cardiac health from NTRA parameters. However, the validity of our results would be strengthened by the classification of these intermediate clinical measurements, as the outcomes of CVD, CHD, and CHF are largely heterogeneous in nature. The future use of our reported methods with clinical cardiovascular data would likewise allow for the interrogation of the causal relationship between cardiac health outcomes and changes in radiodensitometric NTRA values. Further testing of this relationship using independent patient cohorts may likewise be needed to further refine our ML models.
Although the multivariate logistic regression gave graphical (Fig. 2) and statistical (Table 3) indications of sex differences between the NTRA distributions, particularly in muscle and fat amplitude, this research did not investigate the topic in depth. Further studies could therefore focus on this direction.
Finally, while evidence for the classifying power of ML-modelling continues to grow, its literature base still lacks a standardized methodology, and the mechanisms governing some of these classifications may remain unclear. As such, exploring the contextual value of different ML-modelling algorithms remains essential.

Materials and Methods
The AGES-I and AGES-II database. The AGES-Reykjavík study recruited 3,316 healthy subjects from 66-98 years of age (mean: 77.46) to participate in a series of two multimetric assessments separated by approximately five years, collectively defined as the AGES-I and AGES-II database. Informed consent was obtained from all participants 47 , ethical approval for patient data acquisition was obtained by the Icelandic Science and Ethics Committee (RU Code of Ethics, cf. Paragraph 3 in Article 2 of the Higher Education Institution Act no. 63/2006), and patients' data were acquired in accordance with relevant international regulations of both Iceland and the U.S. National Institutes of Health. In addition to receiving CT scans (see 'CT acquisition') and having a host of nutritional, neurological, and lifestyle parameters measured or surveyed, subjects were assessed for the incidence of CVD, CHD, and CHF. Of the original recruitment, n = 3,157 subjects participated in both the AGES-I and AGES-II studies; as new CT images and incidences of cardiovascular pathophysiology were obtained separately in both studies, the total dataset size for the present work contained 6,314 records.
CT acquisition and segmentation. All participants in the AGES-Reykjavík database were scanned with a 4-row CT detector system at 120-kV (Sensation; Siemens Medical Systems, Erlangen, Germany) as previously described 34 . The localized scanning region extended from the iliac crest to the knee joints; prior to transaxial imaging, correct positions were determined by measuring the maximum femoral length on an anterior-posterior localizer image, followed by the localization of the center of the femoral long axis. After image acquisition, for each subject, a single 10 mm section was taken from mid-thigh, midway between the acetabulum of the hip joint and the knee joint. Pixels from this slice were then processed to obtain subject-specific distributions of radiodensitometric values across the range of −200 to 200 HU.
Nonlinear trimodal regression analysis (NTRA). The method utilized to computationally describe each HU distribution was a form of modified nonlinear regression analysis that has been previously described 33 . Here, each HU distribution is defined as a quasi-probability density function defined by three Gaussian distributions (two skewed and one standard):

$$\sum_{i=1}^{3} \varphi(x, N_i, \mu_i, \sigma_i, \alpha_i) = \sum_{i=1}^{3} \frac{N_i}{\sigma_i \sqrt{2\pi}}\, e^{-\frac{(x-\mu_i)^2}{2\sigma_i^2}}\, \operatorname{erfc}\!\left(-\frac{\alpha_i (x-\mu_i)}{\sigma_i \sqrt{2}}\right)$$

where N is the amplitude, μ is the location, σ is the width, and α is the skewness of each distribution, all of which are iteratively evaluated at each CT bin, x. This trimodal definition operationalizes the hypothesis that HU distributions across segmented soft tissue represent the sum of three distinct tissue types whose linear attenuation coefficients primarily occupy specific HU domains: namely, fat [−200 to −10 HU], loose connective tissue and atrophic muscle with approximately water-equivalent absorptivity [−9 to 40 HU], and lean muscle [41 to 200 HU]. The inwardly-sloping asymmetries characterized by fat and muscle distributions can be described respectively by their positive and negative skewnesses, whereas the central 'connective' tissue distribution is assumed to be non-skewed. Utilizing this definition, theoretical curves can be iteratively generated for each HU distribution by employing a generalized reduced gradient algorithm via the minimization of the sum of standard errors at each CT bin value. This method thereby generates 11 NTRA parameters that are altogether unique to every individual's CT image.
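To make the trimodal definition concrete, the following Python sketch evaluates a theoretical curve at a given HU value. It assumes the standard skew-normal form for each component (a Gaussian multiplied by a complementary error function), with the sign chosen so that a positive α produces a right-hand tail. The function names and all numeric parameter values are hypothetical and chosen only for illustration; the iterative fitting step (generalized reduced gradient) is not reproduced here.

```python
import math


def skewed_gaussian(x, n, mu, sigma, alpha):
    """One NTRA component: amplitude n, location mu, width sigma, skewness alpha."""
    z = (x - mu) / sigma
    gaussian = n / (sigma * math.sqrt(2 * math.pi)) * math.exp(-0.5 * z * z)
    # erfc(0) == 1, so alpha = 0 reduces this to an ordinary Gaussian.
    return gaussian * math.erfc(-alpha * z / math.sqrt(2))


def ntra_curve(x, fat, connective, muscle):
    """Sum of the three tissue components (4 + 3 + 4 = 11 parameters)."""
    return (
        skewed_gaussian(x, *fat)
        + skewed_gaussian(x, *connective, 0.0)  # connective tissue is non-skewed
        + skewed_gaussian(x, *muscle)
    )


# Hypothetical parameter sets (N, mu, sigma[, alpha]); not taken from the study.
FAT = (100.0, -110.0, 15.0, 3.0)     # positive skewness
CONNECTIVE = (20.0, 10.0, 20.0)      # no skewness term
MUSCLE = (150.0, 60.0, 12.0, -3.0)   # negative skewness

curve = [ntra_curve(hu, FAT, CONNECTIVE, MUSCLE) for hu in range(-200, 201)]
```

Fitting would then amount to adjusting the 11 values so that `curve` matches the measured HU histogram as closely as possible.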
Multivariate logistic regression models and statistical analyses. As a comparative and complementary analysis to ML modelling, three multivariate logistic regression models were first generated using generalized linear models employing the logit link function. Classification was defined for CHD, CVD, and CHF binary indicator variables, with each of the 11 NTRA parameters taken as independent predictors, and with age and sex corrected for as hypothesized confounders; in total, 62 individuals were removed due to missing pathophysiology data. Predicted probability curves were then generated for each model, along with scatter plots for each NTRA predictor generated against the logit for each cardiovascular outcome to identify any nonlinearity in predictor variables. Deviance residual diagnostic plots were likewise generated to assess model heteroscedasticity and identify any outliers with sufficient leverage, as defined by Cook's distance. Next, log-odds coefficients for each NTRA parameter were exponentiated to enable the direct comparison of their contributory odds ratios for each cardiovascular pathophysiology, along with 95% confidence intervals and individual-level statistical significance. Finally, overall model significance was calculated by computing the differences in χ 2 values between null and residual deviances; logistic regression classification accuracy for each model was later computed alongside ML models to facilitate comparison.
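The exponentiation step described above can be illustrated as follows. This sketch assumes a Wald-type 95% confidence interval (coefficient ± 1.96 standard errors on the log-odds scale), which the text does not specify; the function name and the coefficient values are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Convert a log-odds coefficient into an odds ratio with a Wald interval.

    Returns (odds_ratio, lower_bound, upper_bound).
    """
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


# Hypothetical coefficient: a negative log-odds value gives an odds ratio
# below 1, i.e. an inverse relationship with the outcome.
odds_ratio, lower, upper = odds_ratio_with_ci(beta=-0.5, std_err=0.1)
```

An interval that lies entirely below (or above) 1 corresponds to a predictor that is significantly inversely (or directly) related to the outcome at the chosen level.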

ML methodologies.
After computing logistic regression models to predict CHD, CVD, and CHF, three tree-based ML model algorithms were performed as a methodological comparison of prediction accuracy using the 11 NTRA parameters: random forests (RF), ADA-Boost (ADA-B), and gradient boosting (GB). First, however, the 'Smote' technique for achieving dataset balance and k-fold cross-validation were utilized to ascertain the optimum number of mutually-exclusive folds for ML models to train and test. Following this, the results from each ML model were assembled over a typology of five comparisons: total classification score, classification by tissue type, tissue-based feature importance, classification by age, and finally classification with longitudinal data. Each of these analyses is described in the following sections.
Knime analytic platform. The Konstanz Information Miner (Knime) analytics platform (v. 3.7.1) was employed to conduct the ML model analyses in the present study 48 . In the Knime platform, ML analyses are managed through an intuitive workflow that combines multiple nodes and facilitates the configuration of each parameter to optimize results. Knime was in the class of "leaders" identified by the Gartner Magic Quadrant in 2017, and its validity is widely acknowledged in literature 49 .

Smote. Some supervised learning algorithms (such as decision trees) require an equal class distribution to obtain better and more realistic classification performance. When required for the present ML methods, 'Smote' (Synthetic Minority Over-sampling Technique) was employed, a technique that implements an algorithm 55 that generates artificial data by interpolating between a real object of a given class and one of its nearest neighbors (of the same class). It chooses a random point along the line between these two objects and determines the new object's attributes from that point.
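The core step of this technique can be sketched in plain Python. This is a simplified illustration rather than the Knime node used in the study: the function name, the choice of Euclidean distance, and the toy data are all assumptions.

```python
import math
import random


def smote_sample(sample, same_class, k=5, rng=random):
    """Create one synthetic record between `sample` and a near same-class neighbour."""
    # Candidate neighbours: every other record of the same (minority) class.
    others = [p for p in same_class if p is not sample]
    # Keep the k closest ones by Euclidean distance (an assumption of this sketch).
    neighbours = sorted(others, key=lambda p: math.dist(sample, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # random position along the connecting segment, in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbour)]


# Toy minority class with two features per record (hypothetical values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote_sample(minority[0], minority, k=2, rng=random.Random(0))
```

Repeating this step until the minority class matches the majority class in size yields the balanced dataset described in the Results.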

K-fold cross-validation.
Finally, prior to ML modelling, the statistical procedure known as k-fold cross-validation was employed 56 ; this method divides a dataset randomly into 'k' mutually-exclusive subsets (or 'folds') of equal size. The model is then trained and tested 'k' times, wherein each round trains on 'k-1' folds and tests on the remaining fold. The cross-validation estimate of accuracy is defined as the overall number of correct classifications divided by the number of instances in the dataset.

(2020) 10:2863 | https://doi.org/10.1038/s41598-020-59873-9 www.nature.com/scientificreports

Machine learning tree-based algorithms. The ensemble learning techniques of randomization, bagging and boosting were applied to decision trees. On the one hand, the decision tree is among the simplest algorithms known in literature and does not require normalization of the approximately six thousand records in the AGES dataset; on the other hand, it is a weak and unstable learner, and ensemble techniques are useful for improving the performance of such learners and for reducing the effect of noise in the AGES dataset.
The first ML method employed for this work was the Random Forests (RF) ensemble learning method, which features Decision Trees that share identical basic properties and the capacity to avoid overfitting 57,58 . Each tree is learned on its own, but some randomization is injected into this phase to reduce the variance of the predictions; this is performed by subsampling the AGES dataset on each iteration to get a different training set, or by considering different random subsets of the 11 NTRA parameters to split upon at each tree node. To make a prediction on a new patient, RF aggregates the predictions from all of its decision trees by a majority vote.
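The three ingredients named above (record subsampling, random feature subsets, and majority voting) can be sketched independently of any tree implementation. The helper names are hypothetical, and sampling records with replacement is an assumption based on standard bagging rather than a detail stated in the text.

```python
import random
from collections import Counter


def bootstrap_sample(records, rng):
    """Draw a training set of the same size, sampling with replacement."""
    return [rng.choice(records) for _ in records]


def random_feature_subset(n_features, n_selected, rng):
    """Pick the feature indices a tree node is allowed to split upon."""
    return sorted(rng.sample(range(n_features), n_selected))


def majority_vote(tree_predictions):
    """Aggregate the class labels predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(1)
training_set = bootstrap_sample(list(range(10)), rng)   # 10 hypothetical record ids
split_candidates = random_feature_subset(11, 3, rng)    # 3 of the 11 NTRA parameters
prediction = majority_vote([1, 0, 1, 1, 0])             # votes from five trees
```

With five trees voting 1, 0, 1, 1, 0, the forest predicts class 1.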
The second ML method we utilized was Ada-Boost (ADA-B), another ensemble method belonging to the boosting family, whose core principle is the strengthening of weak learners 59 . ADA-B training selects only the NTRA parameters that improve the predictive power of the model, reducing model complexity in terms of dimension and thereby improving execution time. Data modifications at each boosting iteration consist of applying weights to every training sample, setting them such that the first step consists of training the learner on the original training data. For all other successive iterations, sample weights are modified, and the learning algorithm is applied again to the data with its new weights. At a given step, patients used for training that were wrongly predicted by the boosted model at the previous step have their weights increased, whereas these weights are decreased for examples that were predicted correctly. As iterations proceed, patients that are difficult to predict or diagnose receive ever-increasing influence. Each sequential weak learner is then forced to concentrate on patients that were previously missed.
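A single reweighting step of this kind can be written out as below. The sketch follows the textbook discrete AdaBoost update, which is an assumption: the exact variant implemented by the platform used in the study is not stated. The function name is hypothetical, and the weighted error is assumed to lie strictly between 0 and 1.

```python
import math


def adaboost_reweight(weights, misclassified):
    """Perform one boosting step; returns (normalised new weights, learner weight)."""
    # Weighted error of the current weak learner.
    err = sum(w for w, miss in zip(weights, misclassified) if miss)
    # Learner weight: larger when the weak learner makes fewer weighted errors.
    alpha = 0.5 * math.log((1 - err) / err)
    # Increase the weights of missed samples and decrease the others.
    updated = [
        w * math.exp(alpha if miss else -alpha)
        for w, miss in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated], alpha


# Four equally weighted training samples; only the first was misclassified.
new_weights, alpha = adaboost_reweight([0.25] * 4, [True, False, False, False])
```

After the update, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.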
The third ML method utilized for the present work was Gradient Boosting (GB); this method produces competitive, highly robust, interpretable procedures for both classification and regression, which is especially appropriate for mining sub-optimally clean data. Our implementation follows the algorithm of Friedman 60 . Not only does this method exploit randomization and bagging principles, but it also includes a special form of boosting to build an ensemble of weak models (in this case, decision trees).

Evaluation metrics.
A wide range of evaluation metrics are well known in literature 61 , but the following six were employed for this study:
• Accuracy: the number of correct predictions over their total number.
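As an illustration, accuracy and three other metrics can be computed from binary predictions as sketched below. Only accuracy is defined in the list above; specificity and precision are named in the Results, sensitivity is included as an assumption, and all three use their standard confusion-matrix definitions. The function name and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute metrics from binary labels (1 = condition present, 0 = absent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Note: the ratios below are undefined if a denominator is zero.
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical labels for six subjects.
metrics = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```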

Data availability
The AGES I-II dataset cannot be made publicly available, since the informed consent signed by the participants prohibits data sharing on an individual level, as outlined by the study approval by the Icelandic National Bioethics Committee. Requests for these data may be sent to the AGES-Reykjavik Study Executive Committee, contact: Ms. Gudny Eiriksdottir, gudny@hjarta.is.