Introduction

Cardiovascular diseases (CVD) are a major public health burden1. By estimating individual CVD risk, prognostic CVD prediction models allow the identification of high-risk individuals who are eligible for lifestyle interventions and preventive treatment. Their development has largely focussed on clinical applications that support treatment decisions, for example the Systematic COronary Risk Evaluation (SCORE) and the Pooled Cohort Equations (PCE)2,3,4,5. However, these scores require information from physical examinations (blood pressure) and blood tests (cholesterol). They therefore cannot be applied in most physician-independent settings, such as self-assessment by individuals, health education campaigns, and step-wise screening procedures that include a non-clinical stage. The few available non-clinical models that can be used without physical examinations have several limitations. Some are limited by their study design, originating from case–control studies or high-risk cohorts6,7. Others have short follow-up and lack equations to calculate absolute risks6,7. Some are limited by their endpoints, predicting only myocardial infarction (MI) or stroke7,8. Others include dietary predictors at the nutrient level, which requires assessment of a large variety of individual foods and thus hampers applicability in practice6,9. We identified only one model that allows large-scale estimation of individual CVD risk based on non-clinical parameters10. However, despite established risk associations, this score does not include potentially informative dietary information11.

Moreover, the overlap in risk factor profiles of CVD and type 2 diabetes (T2D) offers the potential for combined risk assessment with only minor deviations in the required predictors, including dietary parameters. The German Diabetes Risk Score (GDRS) is a multiply validated non-clinical score to predict T2D. Extending it to CVD risk prediction would enable simultaneous quantification of individual CVD and T2D risk in non-clinical settings12.

Thus, we aimed to develop and externally validate a non-clinical risk score to predict 10-year CVD risk based on shared predictors with the GDRS and to compare its performance to the identified non-clinical and established clinical CVD risk scores. Furthermore, we developed a clinical extension with routinely available clinical predictors for step-wise screening approaches.

Results

Descriptive comparison of the unimputed and imputed data, including the proportion of missingness, is presented in the supplement (Supplementary Table (ST) 1). The median follow-up time in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam was 11.35 years (interquartile range (IQR) 1.38). Both samples contained proportionally more women than men (female EPIC-Potsdam: 61.6%; EPIC-Heidelberg: 54.6%), and the median age at baseline was 50 years (Table 1). The prevalence of self-reported hypertension was higher in Potsdam (31.8%) than in Heidelberg (27.2%). In contrast, two proportions were higher in Heidelberg: participants reporting a family history of CVD (Heidelberg 52.8%, Potsdam 37.1%) and current heavy smokers (≥ 20 units/day) at baseline (Heidelberg 9.5%, Potsdam 5.7%).

Table 1 Baseline characteristics of the EPIC-Potsdam and EPIC-Heidelberg cohorts.

Score derivation

The final non-clinical model included the predictors age, gender, waist circumference, smoking status, self-reported hypertension and T2D, CVD family history, and consumption of whole grain, red meat, coffee, high energy soft drinks, and plant oil. The clinical model additionally contained systolic and diastolic blood pressure, total and HDL cholesterol.

The proportional hazards assumption was fulfilled for all included predictors. The supremum test for functional form was significant only for ‘CVD points’ (the non-clinical score points used as a predictor) in the clinical model. However, subsequent examination of the corresponding restricted cubic splines did not indicate strong deviations from a linear function (Supplementary Figure (SF) 1).

Estimates derived by using Cox proportional hazards regression and the Fine and Gray model were overall comparable. However, comparison of the model performance indicated slightly better calibration of absolute risks by the Fine and Gray model compared to the Cox model in the upper risk range (SF2). As a consequence, we proceeded with the competing risk approach.

Adding statistically significant interaction or squared terms, as well as deriving gender-specific Fine and Gray equations, did not relevantly improve overall performance (SF3).

The final parameters used for absolute risk calculation based on the competing risk model are depicted in Table 2 (example calculation: Supplementary Note (SN) 1).

Table 2 Risk associations of the included predictors with CVD and parameters used for absolute risk calculation.

Performance in EPIC-Potsdam and EPIC-Heidelberg

Discrimination

Competing risk-adjusted C-indices indicated good discrimination of both developed models in EPIC-Potsdam (non-clinical: 0.786, 95% confidence interval (95%CI) 0.736–0.832; clinical: 0.796, 0.746–0.841) and EPIC-Heidelberg (non-clinical: 0.762, 0.715–0.807; clinical: 0.769, 0.721–0.813). The categorical Net Reclassification Improvement (NRI) suggested only a slight improvement in risk category assignment by the additional clinical parameters (NRI EPIC-Potsdam: 0.015, 95%CI −0.028 to 0.057; EPIC-Heidelberg: 0.078, 0.041–0.116). Sensitivity and specificity in both cohorts are shown in ST2. For example, when a cut-off of 5% predicted risk was used in EPIC-Heidelberg, sensitivity and specificity were 48.8% and 83.4% for the non-clinical score and 53.3% and 81.9% for the clinical score.
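The discrimination results above rest on competing risk-adjusted C-indices. The Methods cite the estimator but do not describe it, so the following is a minimal Python sketch under an assumption: it uses an unweighted concordance in which participants who die of non-CVD causes are treated as never developing CVD. Those participants therefore remain comparable to every CVD case. The sketch ignores censoring weights, the bootstrap, and pooling across imputed sets. The function name and toy data are hypothetical.

```python
def competing_risk_c_index(times, status, risk):
    """Hypothetical helper: concordance for CVD with non-CVD death as a competing event.

    status: 1 = CVD, 2 = non-CVD death (competing), 0 = censored.
    A pair (i, j) is comparable if i had CVD at times[i] and j was either
    still CVD-free after that time or died of another cause (and so can
    never develop CVD). Ties in predicted risk count as half-concordant.
    No censoring weights are applied (simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    for i, (t_i, s_i, r_i) in enumerate(zip(times, status, risk)):
        if s_i != 1:
            continue  # only CVD cases can be the earlier-failing member of a pair
        for j, (t_j, s_j, r_j) in enumerate(zip(times, status, risk)):
            if j == i or not (t_j > t_i or s_j == 2):
                continue
            comparable += 1
            if r_i > r_j:
                concordant += 1.0
            elif r_i == r_j:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up times, event status, and predicted 10-year risks
print(competing_risk_c_index([1, 2, 3, 4], [1, 1, 2, 0], [0.9, 0.5, 0.6, 0.1]))
```

In the toy data, five pairs are comparable and four of them are concordant, so the sketch returns 0.8.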

Comparison with established risk scores showed that the two derived equations reached the highest C-indices in EPIC-Potsdam (e.g., Framingham CVD Risk Score (FRS) with blood lipids 0.781, 0.730–0.828) (Fig. 1). In EPIC-Heidelberg, C-indices were overall slightly lower than in EPIC-Potsdam. The C-index of the non-clinical score was among the highest and comparable to those of established clinical scores (e.g., FRS with blood lipids 0.764, 0.717–0.809), while the derived clinical score again showed the highest C-index. C-indices of the non-clinical chronic metabolic disease (CMD) score were considerably lower in EPIC-Potsdam (0.738, 0.685–0.789) and EPIC-Heidelberg (0.722, 0.672–0.769).

Figure 1

Discrimination of the developed scores and established CVD risk scores in EPIC-Potsdam and EPIC-Heidelberg. Discrimination is depicted as C-indices adjusted for competing risk analyses and 95% confidence intervals (95%CI). EPIC, European Prospective Investigation into Cancer and Nutrition. CMD, chronic metabolic disease. BMI, body mass index. MI, myocardial infarction. PCE, Pooled Cohort Equation. SCORE, Systematic Coronary Risk Evaluation.

Calibration

The derived scores were well calibrated for the majority of individuals in the lower nine deciles of predicted risk while they slightly overestimated risk in the highest decile of predicted risk (Fig. 2). Expected-to-observed ratios were 1.17 (95%CI 1.08–1.27) for the non-clinical and 1.13 (1.04–1.22) for the clinical score in EPIC-Potsdam and 1.05 (0.97–1.13) and 1.11 (1.03–1.20) in EPIC-Heidelberg, respectively. Calibration plots suggested slight overestimation of risk by the recalibrated PCE (Fig. 2) and substantial overestimation by both FRS (not shown).
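Expected-to-observed ratios and decile-based calibration tables can be sketched as follows. This simplified illustration assumes a binary indicator of CVD within 10 years and complete follow-up. The study's estimates additionally account for censoring and competing risks, which this hypothetical helper does not.

```python
def calibration_by_group(pred, observed, n_groups=10):
    """Hypothetical helper: E/O ratio and calibration table by groups of predicted risk.

    pred: predicted 10-year risks; observed: 1 if CVD occurred within 10 years, else 0.
    Simplifying assumption: complete follow-up (no censoring, no competing risks).
    """
    n = len(pred)
    if n < n_groups or sum(observed) == 0:
        raise ValueError("need at least n_groups participants and one observed case")
    eo_ratio = sum(pred) / sum(observed)  # expected cases / observed cases
    pairs = sorted(zip(pred, observed))   # order participants by predicted risk
    table = []
    for g in range(n_groups):
        # Near-equal sized groups (deciles when n_groups = 10)
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_prop = sum(o for _, o in chunk) / len(chunk)
        table.append((mean_pred, obs_prop))  # one point of the calibration plot
    return eo_ratio, table


# Toy data: 100 participants, 11 expected and 9 observed cases
pred = [0.02] * 50 + [0.2] * 50
observed = [1] + [0] * 49 + [1] * 8 + [0] * 42
eo, table = calibration_by_group(pred, observed, n_groups=2)
print(round(eo, 2), table)
```

In this toy example, the ratio above 1 indicates overestimation, and the second (higher-risk) group drives it: 20% predicted against 16% observed.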

Figure 2

Calibration plots for the developed scores and the recalibrated PCE in EPIC-Potsdam and EPIC-Heidelberg. Observed and predicted CVD risk is grouped by deciles of predicted risk and plotted with the according 95% confidence interval (95%CI). Distribution of predicted risk up to the 99th percentile (p) is indicated in the background. EPIC, European Prospective Investigation into Cancer and Nutrition. PCE, Pooled Cohort Equation.

Subgroup and sensitivity analyses

Subgroup analyses indicated that C-indices were consistently higher for women compared to men and for MI compared to stroke for both derived scores in EPIC-Potsdam and EPIC-Heidelberg (SF4).

Calibration plots showed better calibration of the scores for women than men, with a more pronounced overestimation of risk for the higher decile groups of predicted risk in men (SF5).

Additional appraisal of CVD mortality discrimination resulted in higher C-indices for the derived scores than for SCORE in both cohorts (C-index EPIC-Heidelberg non-clinical: 0.774, 95%CI 0.525–0.960; clinical: 0.763, 0.513–0.954; SCORE: 0.740, 0.486–0.939). However, due to the limited number of fatal cases, estimates were imprecise.

Discussion

We derived and externally validated a non-clinical risk score predicting 10-year CVD risk with superior or comparable performance to established clinical CVD risk scores. Additional clinical parameters only slightly improved discrimination. Our results suggest that estimation of 10-year CVD risk based on the selected and easily obtainable non-clinical CVD risk factor information is feasible without loss of predictive accuracy compared to clinical models.

Other external validations of the CMD Score showed acceptable to good discrimination in an Iranian population (areas under the receiver operating characteristic curve (AUC): men 0.71, 95%CI 0.66–0.75; women 0.81, 0.76–0.85) and an Australian population (AUC: men 0.82, 0.77–0.86; women 0.88, 0.83–0.94), which is comparable to or higher than in our samples13,14. Two meta-analyses, one based on 86 prospective studies, concluded that the PCE discriminates relatively well (C-index 0.723, 0.719–0.727). They reported a prediction interval (men 0.70, 0.60–0.79; women 0.74, 0.63–0.83) that covers the observations from our study samples15,16. A pooled analysis of two other German population-based cohort studies showed a C-index (0.76, 0.73–0.79) comparable to our findings17. For the FRS including blood lipids, a meta-analysis of prospective studies reported a C-index of 0.719 (0.715–0.723), which is lower than in our cohorts16. For SCORE, the same meta-analysis suggested relatively good discrimination for all CVD events (C-index 0.719, 0.715–0.723) and better discrimination for fatal events only (C-index 0.758, 0.752–0.763)16. In our samples, SCORE showed higher discriminatory ability when all cases were included, while discrimination for fatal events was comparable. The PROCAM score for MI showed lower discrimination in other European validation studies than in our sample, with AUCs ranging from 0.55 to 0.7418.

The PCE endpoint definition (MI or coronary heart disease death, or fatal or non-fatal stroke) is largely comparable to our definition. While the PCE was well calibrated in a German sample after recalibration, our study still suggests slight overestimation of the recalibrated equation17. This could be related to deviations in the documented CVD incidence as a result of actual incidence differences in the studied populations and/or differences in the case identification and ascertainment procedure, potentially leading to systematically fewer or more identified cases. Additional inclusion of heart failure and angina in the FRS endpoint definition might explain the strong overestimation of risk detected in our samples.

Despite minor heterogeneity across individual validation studies, potentially related to differences in population characteristics and covariate structure19, these findings indicate that the established clinical CVD models mostly performed comparably or better in EPIC-Potsdam and EPIC-Heidelberg than in other studies. This suggests that their performance in our samples is unlikely to be underestimated. Such an underestimation would otherwise have favoured the newly derived scores in the comparison.

Several features of our approach are worth mentioning. Firstly, and most importantly, the developed non-clinical risk score extends individual risk prediction to prevention settings that are not covered by existing clinical risk scores, without loss of predictive precision. These include self-assessment by individuals, health education campaigns, and step-wise screening procedures with a non-clinical stage. Secondly, the non-clinical score includes selected GDRS parameters (age, waist circumference, smoking status, self-reported hypertension, consumption of whole grains, red meat, and coffee). This allows simultaneous risk assessment of CVD and T2D with only a few additional parameters. Thirdly, our non-clinical score contains several modifiable and easily obtainable lifestyle risk factors, including dietary information. The effect sizes and directions of the modifiable predictors are in line with previous evidence (compare ST3). The score therefore plausibly supports health behaviour recommendations by pointing out potential ways to reduce CVD risk, for example, by choosing a healthy diet or reducing waist circumference. Prioritising behavioural over clinical parameters emphasises primary lifestyle prevention rather than (medicinal) treatment of clinical parameters such as blood lipids or blood pressure. Such clinical parameters are frequently used for CVD risk prediction and may themselves be consequences of adverse health behaviour. This is supported by our results showing that the investigated clinical parameters do not provide much predictive information beyond our non-clinical predictors.

There are several strengths to our study. We based our analyses on physician-verified cases, reducing false-positive case assignment to a minimum. The application of the World Health Organization (WHO) Monitoring trends and determinants in cardiovascular disease (MONICA) criteria in the derivation cohort facilitates reproduction in other cohorts based on a standardised outcome definition. The EPIC centres in Potsdam and Heidelberg used harmonised data collection and processing methods. This enabled us to fully rebuild the prediction model for external validation without regression or substitution of predictors that could be unavailable in other cohorts. Relevant sample sizes and case numbers in both cohorts (events per variable in EPIC-Potsdam: non-clinical model 40.2, clinical model 136.8; events in EPIC-Heidelberg n = 692) allowed us to derive robust estimates, perform sensitivity analyses, and examine performance in subgroups20,21.
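The events-per-variable (EPV) figures can be checked with simple arithmetic. This assumes EPV was computed from the 684 CVD cases over the full EPIC-Potsdam follow-up. Under that assumption, 136.8 corresponds to five parameters in the clinical model: the non-clinical score points plus systolic and diastolic blood pressure, total cholesterol, and HDL cholesterol. Likewise, 40.2 corresponds to about 17 parameters in the non-clinical model, for example if categorical predictors contribute several indicator variables. This back-calculation is an illustration and is not a figure stated in the text.

```python
def events_per_variable(n_events, n_parameters):
    """Events per variable (EPV): outcome events divided by estimated model parameters."""
    return n_events / n_parameters


# Assumption: EPV based on the 684 CVD cases over the full EPIC-Potsdam follow-up
N_EVENTS = 684
# Clinical model: non-clinical score points + SBP, DBP, total and HDL cholesterol
print(round(events_per_variable(N_EVENTS, 5), 1))   # 136.8
# Non-clinical model: 17 parameters is a back-calculated, hypothetical count
print(round(events_per_variable(N_EVENTS, 17), 1))  # 40.2
```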

However, there are some limitations. Firstly, due to the case-cohort design, the proportion of missing values was high for most biomarkers. However, multiple imputation has been shown to be a valid approach to handling missing data for absolute risk estimation22. Secondly, we used the non-clinical score points as one predictor in the clinical score instead of modelling its risk factors individually. This approach may have limited the performance improvement. However, in a post-hoc re-estimation of the clinical model that included the non-clinical risk factors individually, the C-index increased by only 0.001. This suggests a negligible loss of discriminatory ability. Thirdly, heterogeneous outcome definitions of the composite CVD endpoint may have hampered performance comparisons with other risk scores, especially for calibration. Finally, we developed and validated our scores in German adults. Generalisability to other populations with a different case-mix and different predictor and outcome assessment therefore remains unclear.

To conclude, we developed and externally validated a non-clinical risk score predicting 10-year CVD risk. It is based on predictors shared with a validated T2D risk score and performs comparably to or better than established clinical CVD risk scores. It can be used independently of physical examinations. It also includes a variety of modifiable risk factors, supporting both risk assessment and subsequent counselling for preventive lifestyle modifications, e.g., through an online calculator. The models will be implemented in the online tool of the GDRS (https://drs.dife.de/), and a paper questionnaire will be developed.

Methods

Study population

Analyses were based on the EPIC-Potsdam and EPIC-Heidelberg cohorts, consisting of 27,548 and 25,540 participants, respectively. Participants were recruited in the areas of Potsdam (age mainly 35–65 years, 60.4% female) and Heidelberg (age 35–66 years, 53.3% female). Data were collected from 1994 to 2012. Detailed information on recruitment and follow-up procedures is described elsewhere23,24. For baseline assessment, participants underwent physical examinations and blood sample drawing by trained medical personnel. Information on lifestyle, sociodemographic characteristics, and health status was documented with validated questionnaires and during face-to-face interviews. Participants were actively re-contacted every 2–3 years for follow-up information by questionnaires and, if required, by phone calls. Additionally, passive follow-up sources such as registry linkage and death certificates were used. Response rates ranged from 90 to 96% per follow-up round23.

In both cohorts, we excluded four groups of participants: those with prevalent CVD; those with non-verifiable or silent events; stroke cases with prior cancer of the brain or meninges or with prior leukaemia; and those with missing follow-up information. In EPIC-Potsdam only, we excluded individuals with ‘possible’ events according to the WHO MONICA criteria. In EPIC-Heidelberg only, we excluded participants whose event was indicated solely by a death certificate, without further sources suggesting an event. The analysis sample in EPIC-Potsdam contained 25,993 participants. Over the full follow-up, this sample included 684 overall CVD cases (fatal n = 82), 383 MI, and 315 stroke cases. Within the first 10 years, it included 584 overall CVD cases (fatal n = 70), 324 MI, and 269 stroke cases. Non-CVD death was documented for 2312 participants (8.9%) during the full follow-up and for 847 participants (3.3%) within the first 10 years. The corresponding analysis sample in EPIC-Heidelberg contained 23,529 participants, including 692 overall CVD cases (fatal n = 87), 370 MI, and 345 stroke cases after 10 years of follow-up (details: SF6). Non-CVD death was documented for 2596 participants (11.0%) during the full follow-up and for 1074 participants (4.6%) during the first 10 years of follow-up. The studies were approved by the Ethical Committee of the State of Brandenburg and the Heidelberg University Hospital, Germany. They were carried out according to The Code of Ethics of the World Medical Association (Declaration of Helsinki). Participants gave written informed consent for participation.

Assessment of predictors

Self-reported information on smoking, diet, prevalent hypertension and T2D, and medication was collected at baseline via questionnaires. Daily food consumption was assessed with self-administered semi-quantitative Food Frequency Questionnaires that included photographs of portion sizes to estimate intake. Intake was summarised into food groups and translated to portions per day as described elsewhere (overview of selected food groups and included dietary items: ST4)25. Trained personnel measured waist circumference and systolic and diastolic blood pressure at the baseline examination (details: SN2). Biomarker measurements were performed in the established case-cohorts (case-cohorts: SF7, SN3, ST5; biomarker measurements: SN2)26. Each case-cohort consisted of a randomly drawn sample of participants who provided blood samples at baseline (subcohorts: Potsdam n = 2500; Heidelberg n = 2739) and of incident cases of the corresponding disease. Family history of MI and stroke was collected at the 5th follow-up via questionnaires and summarised as parental and sibling history of CVD.

Case ascertainment

Incident CVD was defined as all incident cases of non-fatal and fatal MI and stroke (International Statistical Classification of Diseases and Related Health Problems, Tenth revision (ICD-10) codes: I21 acute MI, I63.0–I63.9 ischemic stroke, I61.0–I61.9 intracerebral haemorrhage, I60.0–I60.9 subarachnoid haemorrhage, I64.0–I64.9 unspecified stroke). In both cohorts, events were systematically detected via self-reported diagnoses, death certificates, and reports by local hospitals or treating physicians. If an event was indicated by any of these sources, treating physicians were contacted for diagnosis verification, occurrence date, and diagnostic details. Only events with physician-verified diagnoses were considered incident CVD cases. In EPIC-Potsdam, two trained physicians additionally ranked physician-verified cases into ‘definite’, ‘probable’, and ‘possible’ events. They used the WHO MONICA criteria for MI and an adapted version for stroke (details: SN4).

Statistical analyses

We applied multiple imputation by chained equations (m = 10) to handle missing values in predictor candidates and parameters needed to derive other scores for comparison (SN5)27,28.
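The text does not state how the estimates from the m = 10 imputed sets were combined (see SN5). Rubin's rules are the usual choice. The sketch below pools one regression coefficient under that assumption, using hypothetical input values.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the coefficient estimated in each imputed set
    variances: its squared standard error in each set
    Returns the pooled estimate and its total standard error.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputed sets")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical coefficient from three imputed sets
print(pool_rubin([0.1, 0.2, 0.3], [0.01, 0.01, 0.01]))
```

The between-imputation term inflates the standard error to reflect uncertainty about the missing values. A single imputed set would ignore this uncertainty.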

Data from the EPIC-Potsdam cohort (follow-up time: median 11.35 years, IQR 1.38 years) were used for score derivation. In the first step, we used the predictors of the GDRS and assessed their association with CVD using Cox proportional hazards regression in each imputed set separately2,22,29. Only parameters whose effect size and direction were consistent with available meta-analyses or large-scale studies remained in the model. To identify CVD-specific predictor candidates, we screened the literature for established non-clinical and routinely available clinical CVD risk factors. To derive the non-clinical score, we considered candidates covering anthropometric measures, gender, CVD family history, self-reported prevalent diseases, medication, weight history, and, as the main focus, dietary information. The final selection of predictors was based on the following criteria: performance improvement, assumed availability in physician-independent settings or routine care, consistency with previous evidence, and robustness of the association. We added different combinations of predictor candidates to the previously identified shared predictors from the GDRS to assess the independence and robustness of the associations. For the clinical extension, we used the score points of the non-clinical score as one predictor. We subsequently added clinical candidates covering blood pressure measurements, blood pressure or lipid-lowering medication, blood lipid concentrations (total cholesterol, HDL cholesterol, and their ratio), and HbA1c. Predictor candidates meeting the previously defined criteria were included in the final scores.

Linearity assumptions of the risk associations were examined by deriving Martingale residuals and performing supremum tests for functional form30. The proportional hazards assumption was assessed by visual inspection of the Schoenfeld residuals.

Even though previous studies have commonly used Cox proportional hazards regression models for absolute risk predictions, including the PCE and FRS, non-CVD mortality is considered a competing risk event for the analysis of CVD endpoints. Despite a limited proportion of non-CVD mortality events in EPIC-Potsdam (3.3% during the first 10 years of follow-up), we additionally used Fine and Gray models accounting for competing risks, calculated absolute risks, assessed the model performance, and compared it to the performance of the Cox proportional hazards models31,32.
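Toy data show why ignoring competing risks can inflate absolute risk. The sketch below compares two estimates of cumulative CVD incidence. The first is the Aalen-Johansen estimate, the nonparametric estimator of the cumulative incidence function that the Fine and Gray model describes. The second is the naive 1 minus Kaplan-Meier estimate, which treats non-CVD deaths as censored. This illustration uses hypothetical data and is not a Fine and Gray fit.

```python
def cumulative_incidence(times, status, horizon):
    """Cumulative incidence of cause 1 by `horizon` (hypothetical helper).

    status: 1 = event of interest (CVD), 2 = competing event (non-CVD death),
    0 = censored. Returns (Aalen-Johansen estimate, naive 1 - Kaplan-Meier
    estimate that treats competing events as censored).
    """
    data = sorted(zip(times, status))
    at_risk = len(data)
    surv_any = 1.0    # event-free survival from any cause just before t
    surv_naive = 1.0  # Kaplan-Meier survival with competing events censored
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d1 = d2 = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            s = data[i][1]
            d1 += int(s == 1)
            d2 += int(s == 2)
            censored += int(s == 0)
            i += 1
        cif += surv_any * d1 / at_risk  # Aalen-Johansen increment for cause 1
        surv_any *= 1 - (d1 + d2) / at_risk
        surv_naive *= 1 - d1 / at_risk
        at_risk -= d1 + d2 + censored
    return cif, 1 - surv_naive


# Hypothetical cohort of five: two CVD events, two non-CVD deaths, one censored
print(cumulative_incidence([1, 2, 3, 4, 5], [2, 1, 2, 1, 0], horizon=5))
```

Here the Aalen-Johansen estimate is approximately 0.4, which matches two CVD events among five participants. The naive estimate is approximately 0.625, because it implicitly assumes that those who died of other causes could still have developed CVD.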

In a final step, we additionally considered squared terms and multiplicative interaction terms of the selected predictors with gender and age and, if statistically significantly associated with the outcome, added them to the model and re-evaluated the performance. To assess the potential benefit of modelling gender-stratified equations, we re-estimated the models in men and women separately and compared their performance.

β estimates of the final models were rounded, multiplied by 100 and the following equation including the subdistribution baseline survival S0 and mean values \(\overline{{X}_{i}}\) of all participants was applied to calculate the absolute 10-year risks29,33:

$$\hat{p} = 1 - S_{0}(t)^{\exp\left(\left(\sum_{i=1}^{p}\beta_{i}X_{i} - \sum_{i=1}^{p}\beta_{i}\overline{X_{i}}\right)/100\right)}$$
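The equation translates directly into code. In this minimal sketch, the coefficients, means, and baseline survival are hypothetical placeholders, not the published parameters, which are given in Table 2 and SN1. As described above, the coefficients are assumed to be already multiplied by 100.

```python
import math


def absolute_risk(points, x, x_mean, s0):
    """10-year absolute risk following the equation above (hypothetical inputs).

    points: coefficients already multiplied by 100
    x: the participant's predictor values; x_mean: cohort means of the predictors
    s0: baseline (subdistribution) survival at 10 years
    """
    lp = sum(b * xi for b, xi in zip(points, x))
    lp_mean = sum(b * m for b, m in zip(points, x_mean))
    return 1 - s0 ** math.exp((lp - lp_mean) / 100)


# Hypothetical example: a participant at the cohort mean has risk 1 - s0
print(round(absolute_risk([5.0, 60.0], [50.0, 1.0], [50.0, 1.0], 0.97), 4))  # 0.03
# Hypothetical participant 10 units above the mean on the first predictor
print(absolute_risk([5.0, 60.0], [60.0, 1.0], [50.0, 0.4], 0.97))
```

Centring on the cohort means means that S0(t) refers to an average participant. Dividing by 100 undoes the scaling of the integer score points.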

We evaluated the performance of the generated scores in EPIC-Potsdam and, for external validation, in EPIC-Heidelberg, censored at 10 years of follow-up. We compared it with the performance of established CVD risk scores. These were the non-clinical CMD risk score, the PCE recalibrated for Germany, two FRS (including either blood lipids or BMI), the ESC SCORE, and two PROCAM scores predicting MI or stroke (calculation of scores: ST6)3,10,17,33,34,35. To quantify discrimination, we calculated C-indices adjusted for competing risks. We used a bootstrap approach that divided each imputed set into 10 random subsets36,37,38. Calibration was assessed with calibration plots and expected-to-observed ratios. The calibration of the CMD score, SCORE, and both PROCAM scores was not evaluated because of differences in the predicted time frame or in the endpoint definitions (CVD mortality, MI, stroke). Potential changes in risk group assignment between the derived non-clinical and clinical score were assessed using the NRI with previously implemented risk groups (< 5%, ≥ 5%–< 7.5%, ≥ 7.5%–< 10%, ≥ 10%)2. Sensitivity and specificity were calculated based on the same risk cut-offs.
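The categorical NRI and the cut-off-based sensitivity and specificity can be sketched with the four risk groups listed above. This simplified illustration assumes a binary indicator of CVD within 10 years with complete follow-up. The analyses in the study were run in SAS on imputed data and may handle censoring and competing risks differently. Function names and toy data are hypothetical.

```python
import bisect

# Upper bounds of the risk groups <5%, 5-<7.5%, 7.5-<10%, >=10%
CUTOFFS = [0.05, 0.075, 0.10]


def risk_group(p):
    """Return 0-3; bisect_right puts a risk equal to a cut-off in the higher group."""
    return bisect.bisect_right(CUTOFFS, p)


def categorical_nri(p_old, p_new, event):
    """Categorical NRI of p_new versus p_old; event is 1 for CVD within 10 years, else 0."""
    n_events = sum(event)
    n_nonevents = len(event) - n_events
    net_events = 0     # upward minus downward moves among cases
    net_nonevents = 0  # downward minus upward moves among non-cases
    for po, pn, e in zip(p_old, p_new, event):
        move = risk_group(pn) - risk_group(po)
        step = (move > 0) - (move < 0)  # +1 up, -1 down, 0 unchanged
        if e:
            net_events += step
        else:
            net_nonevents -= step
    return net_events / n_events + net_nonevents / n_nonevents


def sensitivity_specificity(p, event, cutoff=0.05):
    """Classify predicted risk >= cutoff as screen-positive."""
    tp = sum(1 for pi, e in zip(p, event) if e and pi >= cutoff)
    fn = sum(1 for pi, e in zip(p, event) if e and pi < cutoff)
    tn = sum(1 for pi, e in zip(p, event) if not e and pi < cutoff)
    fp = sum(1 for pi, e in zip(p, event) if not e and pi >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predictions from a non-clinical (old) and clinical (new) score
p_old = [0.04, 0.06, 0.04, 0.08]
p_new = [0.06, 0.06, 0.03, 0.06]
event = [1, 1, 0, 0]
print(categorical_nri(p_old, p_new, event))  # 1.0
print(sensitivity_specificity(p_new, event))  # (1.0, 0.5)
```

In the toy data, one case moves up and one non-case moves down a risk group, so both halves of the NRI contribute 0.5.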

Sensitivity analyses were performed assessing the discrimination separately for men and women and for MI and stroke. For comparison with SCORE, we additionally calculated C-indices for fatal cases only.

Statistical analyses were performed with SAS (version 9.4).