A multimodal approach to cardiovascular risk stratification in patients with type 2 diabetes incorporating retinal, genomic and clinical features

Cardiovascular diseases are a public health concern; they remain the leading cause of morbidity and mortality in patients with type 2 diabetes. Phenotypic information available from retinal fundus images and clinical measurements, in addition to genomic data, can identify relevant biomarkers of cardiovascular health. In this study, we assessed whether such biomarkers stratified risks of major adverse cardiac events (MACE). A retrospective analysis was carried out on an extract from the Tayside GoDARTS bioresource of participants with type 2 diabetes (n = 3,891). A total of 519 features were incorporated, summarising morphometric properties of the retinal vasculature, various single nucleotide polymorphisms (SNPs), as well as routine clinical measurements. After imputing missing features, a predictive model was developed on a randomly sampled set (n = 2,918) using L1-regularised logistic regression (lasso). The model was evaluated on an independent set (n = 973) and its performance associated with overall hazard rate after censoring (log-rank p < 0.0001), suggesting that multimodal features were able to capture important knowledge for MACE risk assessment. We further showed through a bootstrap analysis that all three sources of information (retinal, genetic, routine clinical) offer robust signal. Particularly robust features included: tortuousity, width gradient, and branching point retinal groupings; SNPs known to be associated with blood pressure and cardiovascular phenotypic traits; age at imaging; clinical measurements such as blood pressure and high density lipoprotein. This novel approach could be used for fast and sensitive determination of future risks associated with MACE.


Methods
Analysis dataset. Data from 3,891 individuals with type 2 diabetes were selected from the GoDARTS bioresource 9 . Participants underwent regular diabetic retinopathy screening and had digital fundus images that matched our quality criteria for semi-automated analysis of retinal vascular features with several clinical outcomes 10 . VAMPIRE 3.1 software (Vascular Assessment and Measurement Platform for Images of the Retina, Universities of Dundee and Edinburgh, Scotland, UK) [11][12][13] was used to semi-automatically measure features with a direct clinical interpretation. Features were measured from standard pre-defined annular zones, following well-established protocols 11 , and included optic disc (OD) radius, central retinal arteriolar equivalent (CRAE), central retinal venular equivalent (CRVE), retinal arterio-venule-ratio (AVR), tortuosity of arteries (tortA) and veins (tortV), by retinal zone, quadrant, vessel generation and vessel type (artery or vein). A total of 157 retinal features were available per image. Readers are referred to Supplementary Material for a detailed explanation of the different retinal feature sub-categories. Two trained operators (SH and RW) performed the measurements following a standard, validated protocol for VAMPIRE. Training for each operator was carried out over two sessions: 11 an introductory session where the protocols and software were presented and familiarity with them gained through practice on a demonstration image set (n = 20, one day); and an assessment session where competency in operation was assessed on a testing image set (n = 20, one day). Training lasted for approximately two days in total and was followed by periodic re-validation sessions.
We used a validated data linkage algorithm on anonymised electronic medical records of GoDARTS participants. The median of each clinical measure for a 3-year period prior to the date of the fundus photograph was obtained. Clinical measures were diastolic and systolic blood pressures adjusted for blood pressure lowering drugs, total cholesterol, high density lipoprotein (HDL) cholesterol, triglycerides levels and glycated haemoglobin (Table 1). Additionally, we incorporated information on the median number of blood pressure lowering drugs, smoking history, cardiovascular disease history, duration of diabetes, age at imaging, and sex. A total of 343 single nucleotide polymorphisms (SNPs) were also included. These were selected from the GoDARTS genotype database and consisted of available SNPs that had been identified in previous genome-wide association studies for cardiovascular disease 14 , blood pressure 15 and Alzheimer's disease 16 . Weighted genetic risk scores for each phenotype were constructed using the relevant SNPs. These risk scores were all included in the analysis, in addition to the entire set of individual SNPs. MACE was defined as hospitalisation for myocardial infarction or stroke, or cardiovascular death. This was determined through linkage with hospital admission and cause of death records similar to previously reported studies 10 . Participants were censored at date of non-cardiovascular death or last available date of follow-up. Analysis pipeline. Sampling and imputation of data. The dataset was rather heterogeneous as only 239 participants had no missing features. A total of 2,893 participants had between 1 and 61 missing features each (median missing retinal features = 0; genomic = 2; clinical = 0). This was higher in the remaining 759 participants, who each had between 270 and 384 missing features (median missing retinal features = 0; genomic = 347; clinical = 0). Following concatenation of retinal, genomic and clinical data, 75% of the cohort was sampled at random and used to build and fine-tune the model (model development set). The remaining 25% was retained for model validation (clinical validation set). A k-nearest neighbour algorithm (k = 10) was used to impute missing features using the knn.impute function of the bnstruct package in R 17 . In essence, the algorithm obtains imputed values from similar participant profiles; all available features were used to search for the neighbours. For continuous features, the neighbours' median value over the set of similar profiles was used, whilst for categorical features the mode was used. This step was blinded to the participant's MACE outcome to avoid leakage of class information into the predictive model. Imputation was undertaken separately for the development and clinical validation sets.
Computation of the multimodal MACE classifier. The model development set was used to build classifiers for predicting the binary outcome of MACE onset before censoring. The well-established L 1 -regularised logistic www.nature.com/scientificreports www.nature.com/scientificreports/ regression (lasso) 18 performed simultaneous feature selection and model estimation. Implementation used the R glmnet package 19 . No-MACE participants occurred 2.2 times as often as MACE participants in the dataset (see Table 1). To account for this class imbalance, weights were assigned to each observation (the MACE: no-MACE assigned observation weights were 2.2:1). The λ parameter, which controls the strength of regularisation and hence model sparsity, was fine-tuned using 10-fold cross-validation. The value resulting in the lowest binomial deviance, λ min , was identified and used to train a model on the entire development set. A second value, λ 1SE, is also of interest as it corresponds to the most regularised model leading to a binomial deviance within one standard error of that obtained at λ min , usually involving fewer features (given the less strict requirement on the binomial deviance). We performed risk stratification using a λ min -based model's prediction of an individual's probability of MACE before censoring. We further explored whether a more compact λ 1SE -based model could achieve similar performance outcomes.
Evaluation of performance on the clinical validation set. A tuned λ min -based model was validated on the clinical validation set (random 25% retained subset from the original cohort). The output probability predicted by the model was used to stratify patients into two groups, high-risk and low-risk, and to generate Kaplan-Meier plots. Stratification was undertaken using a predefined threshold identified by a 10-fold cross-validation stage, specifically the mean of the model output probabilities across participants. To assess statistical significance between both groups, a log-rank p value was computed using the survival package in R 20 . This process was repeated using a λ 1SE -based model 21 .
Evaluation of feature robustness using bootstrap. Different data samples give rise to different feature sets being selected when computing the classifier. We performed a bootstrap analysis to assess how likely features were to be selected over a large number of randomly selected training sets 22 . The frequency with which a feature or feature set was selected across the bootstraps was used as a proxy measure of feature robustness. A total of 500 bootstrap trials was carried out on the development set using λ min , and the proportion of times each feature had a non-zero weight was recorded, providing a measure of how likely each feature is to be selected. Binomial deviance was calculated across the 500 bootstraps, together with the corresponding 95% confidence intervals (CI). A total of 910 samples from each class was included in every trial to ensure class balancing.
In addition to individual feature occurrence, we computed the frequencies with which at least one retinal, genomic and clinical feature was selected across the bootstraps. Retinal features can be broadly divided into six sub-categories: tortuosity (108 features), width gradient (16), branching point (18), fractal analysis (6), OD-based (2), and Zone B width (5) features. Features within each sub-category may be highly correlated, and as such we were interested in the frequencies with which at least one feature from each retinal sub-category was selected

Results
Participant baseline characteristics. A total of 1,219 individuals were recorded as undergoing MACE during the follow-up period. The mean and median times to MACE following retinal imaging were 3.45 and 2.99 years respectively with a standard deviation of 2.48 years. The mean and median ages for these participants at imaging were 72.43 and 73.54 years respectively; 528 participants were female and 691 were male.
For the remaining participants (no-MACE), points of right censoring ranged from 0.05 years to 10.95 years post image-capture. The mean and median times to censoring were 7.39 and 8.81 years, respectively; standard deviation was 2.65 years. For no-MACE participants, the mean and median age at imaging were 68.17 and 69.20 years respectively; 1,293 participants were female and 1,379 were male. A breakdown of participant demographics for the development and clinical validation sets is shown in Table 1.
Model development and tuning. The cross-validation curve for the development set is indicated by the red dotted line in Fig. 1. Upper and lower standard deviation curves (error bars) are also plotted. Increasing the extent of regularisation reduced binomial deviance until a model that retained 51 features was reached; beyond that point, regularisation resulted in increased binomial deviance. Two selected λ values are indicated by the vertical dotted lines: λ min (minimum binomial deviance) and λ 1SE (binomial deviance within 1 SE of the minimum).
As can be seen in Table 2, the λ min -based model had 51 features from all three categories: retinal, genetic and clinical. Selected retinal features comprised optic disc radius, venular gradient width, venular fractal dimension, as well as various tortuosity measures. Of the genetic features included, 34 SNPs were selected. As per the Ensembl genome browser (Human GRCh38.p12 assembly), these variants had been previously shown to be associated with a variety of phenotypic traits that include cardiovascular disease, blood pressure, and Alzheimer's disease. Examples of selected SNPs and their known phenotypic associations include rs34923683: pulse pressure measurement; rs9549328: systolic blood pressure; rs687621: cholesterol; rs12921187: diastolic blood pressure; rs12413409: coronary heart disease; rs2048327: coronary heart disease; and rs11218343: Alzheimer's disease; (refer to Table 2 for a complete list of SNPs selected by the λ min -based model). Noteworthy, of the 3 composite scores analysed, only the cardiovascular gene score was selected. Finally, age at imaging and a variety of clinical measurements were selected by the model, namely: number of blood pressure lowering drugs taken, history of www.nature.com/scientificreports www.nature.com/scientificreports/ smoking, evidence of CVD before imaging, diastolic blood pressure, high density lipoprotein, glycated haemoglobin, triglycerides, and duration of diabetes.
Notably, whilst the λ min -based model had 51 features, λ 1SE was based on the selection of 7 features, indicative of a more compact, yet comparably effective classifier. 6 of the 7 λ 1SE features were routine clinical measurements and the features retained in the regularised model are also listed in Table 2 for λ 1SE .
Model Evaluation on retained clinical validation data. The clinical validation set was evaluated using the regularised models based on the development set features. Figure 2 shows Kaplan-Meier curves for λ min -based model predictions and overall time-to-event (censoring point) of the clinical validation set. The time scale used was age; left-truncation was ensured by subtracting the age at imaging from age at event (or censoring). Cases were stratified into two groups, high-risk and low-risk, using a pre-defined threshold (0.47) in the model development stage. The numbers of participants in each risk category are listed beneath the curves. Figure 3 shows plots from the same procedure carried out using a λ 1SE -based model.
The Kaplan-Meier curves generated using both models showed very similar patterns, whereby the number of individuals that belonged to the predicted high-risk group was consistently lower than half of those in the low-risk group, when observed at 2.5, 5, 7.5 and 10 years time points. In both cases, model predictions were Category Non-zero coefficient features • None selected  www.nature.com/scientificreports www.nature.com/scientificreports/ highly associated with the MACE event (log rank p < 0.0001), indicating that the feature-sets used to train them captured strong signal for cardiovascular risk stratification.
Readers are referred to Supplementary material for a further analysis using only routine clinical measurements, age at imaging and sex.
Bootstrap Analysis. In all 500 bootstrap trials, the model selected age at imaging, along with various groupings that always included features from each of the three main feature categories: retinal, genomic, and clinical ( Table 3).
All clinical features are listed with their corresponding frequencies. Those selected at high frequencies across the trials (greater than 75%) were evidence of CVD before imaging, diastolic blood pressure, smoking history, high-density lipoprotein and glycated haemoglobin.
Genomic features included 343 SNPs and 3 composite gene scores. Of the 3 composite scores analysed, only the cardiovascular gene score was selected at a frequency greater than the defined threshold (>75%). Given the number of SNPs considered, only those exceeding the defined frequency threshold of 75% are highlighted (Table 4). Only 11 of the analysed SNPs exceeded the threshold. An interesting observation is that all 11 SNPs identified by bootstrap were also present in the list of 34 SNPs previously selected when building the model once using λ min . The 11 SNPs and their known phenotypic trait associations, as per the Ensembl genome browser, were rs12921187: diastolic blood pressure; rs4308: diastolic blood pressure; rs2048327: coronary heart disease; rs9549328: systolic blood pressure; rs34923683: pulse pressure; rs687621: cholesterol; rs2014408: depressive symptoms; rs7136259: coronary heart disease; as well as rs3752728, rs2291435, and rs200999181.
Retinal features were grouped into the sub-categories described in Section 2.3.4. Features within each sub-category are highly correlated, and as such no individual retinal measurements were selected at high frequencies. We computed the frequencies with which the feature-set selected included at least one feature from each sub-category, and observed that all retinal sub-categories offered a highly robust signal: tortuousity sub-category (100%), width-gradient (100%), branching point (100%), OD-based (83%), fractal analysis (82%), and Zone B width (71%).
We finally computed the mean, median and standard deviation values of feature coefficients (β) across the 500 regularised models for individual features selected at highest frequencies (Table 4). This was carried out in an effort to illustrate the interpretability of our proposed approach. Whilst each bootstrap trial may offer a slightly different coefficient value, the coefficient sign is unlikely to change; utilising the coefficients of highly robust Figure 2. Kaplan-Meier curves for λ min -based model predictions and overall time-to-event analysis of the clinical validation set. Cases were stratified into two groups, high-risk and low-risk, using a pre-defined threshold (i.e., the mean predicted probability for the model-development set when λ min was used in a 10 fold cross-validation).
www.nature.com/scientificreports www.nature.com/scientificreports/ features can aid the answering of questions such as 'how does a one year increase in age affect the odds of an individual developing a MACE outcome?' .

Discussion
A cost-effective, non-invasive means of identifying high-risk individuals for MACE would be of tremendous value. In recent years, routine investigation of observable retinal characteristics has improved through advances in digital imaging, software capabilities, eye screening programmes, and wider availability through improved infrastructure at high street opticians. Previous evaluation of retinal features and cardiovascular risk has been limited. The cross-sectional, population-based Rotterdam study (n = 5,674) reported associations between wider venular diameter and atherosclerosis, inflammation and cholesterol 1 .
However, several studies have reported conflicting findings. The Cardiovascular Health Study reported associations between wide retinal venular calibre and high incidence of coronary heart disease (CHD) in both elderly women and men 2 , while other studies have found associations only in younger populations but not in elderly ones 23 . In light of these inconsistent findings, McGeechan and colleagues undertook a participant-level meta-analysis of over 22,000 participants from 6 studies 3 and concluded that retinal vessel calibre changes (wider venules and narrower arterioles) associated with an increased risk of CHD in women but not men. However, that meta-analysis excluded studies on diabetic populations.
Other studies have also sought to improve the prediction of MACE, based on non-retinal data. McCarthy and colleagues 24 developed linear models based on a 649 participants from the CASABLANCA study, incorporating a range of information on plaque erosion, acute phase reactants, inflammatory markers, and biomarkers of atherosclerosis. Model validation in an independent cohort illustrated the benefits and utility of integrating complementary clinical measurements from multiple sources to improve the prediction of individual MACE risk.
In this study, we investigated the combined potential of retinal parameters, genetic data and routinely collected clinical information for risk assessment of MACE in patients with type 2 diabetes. We used a regularisation approach in a supervised classification framework to develop a lasso-based predictive model. The model was developed and trained on a set of 2,918 participants and validated on an independent set of 973 participants. Lasso was similarly used to identify novel cancer biomarkers by Beck and colleagues 25 . The feature selection that underpins this approach is advantageous in that it summarises the multimodal features used into a single score (retinal vascular morphology, genetic data, clinical features). Additionally, the coefficients (β) associated with selected features can be used for interpreting the model and are relative to features' original scales e.g. If the β Figure 3. Kaplan-Meier curves for λ 1SE -based model predictions and overall time-to-event analysis of the clinical validation set. Cases were stratified into two groups, high-risk and low-risk, using a pre-defined threshold (i.e., the mean predicted probability for the model-development set when λ 1SE was used in a 10 fold cross-validation).
www.nature.com/scientificreports www.nature.com/scientificreports/ coefficient associated with age at scan is 0.03, this means that a one year increase in age at scan corresponds to exp(0.03) increase in odds of developing a MACE outcome.
A suitable value for the lasso λ parameter can be determined through optimisation on the model-development set. Two values of λ were considered: (a) one that corresponds to the lowest binomial deviance (λ min ), and (b) one that gives deviance within one standard error of (a) (λ 1SE ), achieving a similar performance whilst using a more compact set of features. A λ min model selected 51 features whereas a λ 1SE model selected 7. The λ min -based model   www.nature.com/scientificreports www.nature.com/scientificreports/ provides evidence for the utility of including retinal parameters, genetic data and clinical information to improve the accuracy associated with MACE risk stratification. However, comparable performance was achieved using mostly clinical information as identified by the λ 1SE -based model (Fig. 1). These observations support previous findings from UK Biobank and EyePACS cohorts 4 .
It is important with any statistical feature selection method, lasso included, to obtain estimates of the relative robustness of selected and discarded features. Evaluation of features using glmnet does not imply that unselected features are weak; they may simply be highly correlated with those retained in the model. Furthermore, feature selection is sensitive to data sampling effects. This highlights the need for estimation of feature robustness, an essential step in biomarker discovery. Therefore, we performed a bootstrap analysis and identified the features or feature-sets occurring at high frequencies, using selection frequency as a proxy measure of robustness. Bootstrap analysis has been similarly used as a measure of robustness to investigate gene interaction, albeit using a different feature selection method 22 .
One feature that appeared in each bootstrap trial was age at imaging. Routine clinical features selected with high frequency for inclusion in the model (defined as >75%) included history of CVD, diastolic blood pressure, smoking history, HDL, glycated haemoglobin and genetic features (11 SNPs and cardiovascular gene score). Furthermore, bootstrap analysis revealed that some retinal parameters were always included when building the model, although individual features were not selected with the highest frequencies. Further analysis of the measures of tortuosity, vessel width and branching point sub-categories identified similar patterns, whereby different combinations of feature-sets were always selected (i.e. in every bootstrap there was always at least one tortuousity feature, at least one vessel-width feature, and at least one branching point feature). Retinal features from the Zone B vessel width, fractal analysis and OD-based sub-categories were selected in 71%, 82% and 83% of the trials, respectively.
In conclusion, this study yielded three main findings. Firstly, a multimodal classifier that was trained on retinal, genetic and routine clinical features was able to stratify risk of MACE in this cohort of patients with type 2 diabetes. This offers exciting future possibilities, such as rapid and inexpensive population screening technologies for the early detection of cardiovascular diseases. Secondly, we showed that a classifier trained mostly on routine clinical features was similarly able to stratify risk of MACE in this cohort. This suggests that risk of developing cardiovascular disease can manifest in various forms, and whilst retinal and genetic data can unveil such information on cardiovascular health, readily available clinical data can offer a complementary perspective. This is in line with state-of-the-art findings recently published on UK Biobank's retinal and clinical data. Finally, we showed through a bootstrap analysis that all three sources of information (retinal, genetic, routine clinical) offer robust signal. In doing so, we also identified specific genetic variants that were selected at very high frequencies within each of the bootstrap models.
There are a number of limitations to our work. Firstly, we investigated only semantic retinal features, i.e. features capturing directly interpretable quantities of the vasculature. Non-semantic features ought to be included in the future, as candidates emerge from replicated, large deep-learning studies 4 , but ideally after their computation and clinical meaning have been clarified. Secondly, VAMPIRE retinal measurements are semi-automatic. While this reduces overall errors, it limits the number of images that can be measured in a given time period. The trade-off between accuracy and automation is currently under debate in the retinal image analysis community 26,27 . Thirdly, GoDARTS is a diabetic cohort. Hence, our findings complement those of similar studies on non-diabetic cohorts like UK Biobank and EyePACS but remain specific to the characteristics of our cohort. Replication on further diabetic cohorts is necessary. Fourthly, using one eye only per participant assumes sufficiently symmetric left-right measurements, an assumption sub judice in the recent literature 28 . Moreover, when carrying out missing feature imputation, the number of neighbours (k) used was set to 10. Investigating the optimal number of neighbours for use on this cohort using repeated cross-validation experiments, as well as investigating the optimal imputation strategy, would make interesting future work. Finally, longitudinal clinical information was represented by the median value of measurement across time; future work could use time series analysis to ensure more accurate data representation. In addition to the above, we plan to incorporate further retinal imaging modalities. Candidates being addressed in parallel studies including optical coherence tomography (OCT), OCT-angiography and ultra-wide-field-of-view imaging. The work will certainly require prospective analyses in clinical trials, but a reduction in mortality from CVD through early detection of risk by only a small percentage would represent several hundreds of thousands of lives saved annually worldwide, given that CVD represents 31% 29 of all global deaths.