Cardiovascular risk algorithms in primary care: Results from the DETECT study

Guidelines for the prevention of cardiovascular diseases use risk scores to guide the intensity of treatment. A comparison of these scores in a German population has not been performed. We evaluated the correlation, discrimination and calibration of ten commonly used risk equations in primary care in 4044 participants of the DETECT (Diabetes and Cardiovascular Risk Evaluation: Targets and Essential Data for Commitment of Treatment) study. The risk equations correlate well with each other, and all have similar discriminatory power. Absolute risks differ widely, in part because of the components of the clinical endpoints predicted: the risk equations produced median risks between 2.0% and 8.4%. For three of the ten risk scores, calculated and observed risks coincided well. At a risk threshold of 10% in 10 years, the ACC/AHA atherosclerotic cardiovascular disease (ASCVD) equation has a sensitivity to identify future CVD events of approximately 80%, with the highest specificity (69%) and positive predictive value (17%) among all the equations. Because of its most precise calibration over a wide range of risks, the large age range covered and the combined endpoint including non-fatal and fatal events, the ASCVD equation provides valid risk prediction for primary prevention in Germany.


Results
We studied 4044 patients, whose data were valid and fully available at the beginning of the study. The demographic and clinical characteristics of the study population are shown in Table 1. The mean age of the study population was 53.8 ± 13.7 years (18 to 93 years) at baseline, 65.3% were women. The prevalence rate of hypertension was 37.1%. Patients with diabetes mellitus were excluded. There were no major differences between the original third layer laboratory sample (n = 7519) of the DETECT study (for explanation see Methods) and the current study population (n = 4044, Supplementary Tables 1 and 2), and there were also no differences between participants living in East and West Germany (Supplementary Table 3).
The characteristics of the risk algorithms compared in the DETECT study are shown in Table 2. The absolute number of endpoints and the event rates for each of the risk algorithms are shown in Fig. 1.
Correlations. Supplementary Table 4 shows parametric (Pearson) and nonparametric (Spearman) correlation coefficients of the 10-year risks in 2463 study participants, calculated with the data of the first survey (subset of the sample between 40 and 65 years of age, as described in Methods). All algorithms correlate well with each other. The Pearson and Spearman correlation coefficients are similar. The FRS-CVD has the best correlations with all other algorithms. Supplementary Fig. 1 shows scatter diagrams in which the results of the FRS-CVD are placed on the x-axis and those of the other algorithms on the y-axis. The relative "closeness" of the algorithms to each other is shown graphically in Supplementary Fig. 2. FRS-CHD1, FRS-CHD2, FRS-CVD and ASCVD are closest to each other.
Discrimination. For all algorithms, we calculated AUCs and Harrell's C-statistics for the clinical endpoint belonging to the respective algorithm and for the other endpoints (Table 3). For broad clinical endpoints (EP3 and EP4; for definitions see Methods), the discriminatory power of all algorithms was lower than for narrowly defined endpoints (EP1 and EP2). This finding did not depend on whether an algorithm was originally developed for a broad or a narrow endpoint. For example, the AUC and Harrell's C-statistic of the FRS-CVD for EP4 (its associated endpoint) are 0.72 and 0.72, respectively, whereas the AUC and Harrell's C-statistic of the FRS-CVD for EP1 (the endpoint of PROCAM and FRS hard-CVE) are 0.78 and 0.80, respectively.
We also calculated continuous net reclassification improvements according to Pencina et al. 21 using each of the scores once as a reference and then comparing it to all others (Supplementary Table 5). This revealed significant reclassification of individuals in 18 out of the 36 pairwise comparisons. Figure 2A shows the fifth, 25th, 50th (median), 75th and 95th percentiles of the risk equations.
Assuming arbitrarily that 25% of the DETECT population were at high risk and eligible for intervention (exceeding the 75th percentile), the following thresholds for the calculated 10-year risk would be assigned: FRS-CVD (EP4) 14.9%; ARRIBA (EP2) 13.0%; FRS-CHD2 (EP3) 12.1%; FRS-CHD1 (EP3) 10
Figure 3 compares predicted and observed incidence rates. The incidence rates predicted by Reynolds, ASCVD and ESC-HS are consistent with the observed ones, although the results for the ESC-HS in the high-risk group have to be interpreted with caution because of the wide confidence interval and the low mortality rate in this group. ARRIBA, PROCAM I, PROCAM II, FRS hard-CVE, FRS-CHD1 and FRS-CHD2 overestimate the actual risk in the medium- and/or high-risk group. FRS-CVD slightly underestimates the risk in the medium-risk group.
In the highest risk category of the ESC-HS (more than 10%), not a single event occurred in the age group of 40 to 65 years. For this reason, the calibration of the ESC-HS could not be evaluated. When we extended the age range to 40-79 years, the ESC-HS was in good agreement with observed incidence rates for the risk groups 0-1%, 1-5% and 5-10%, while at higher risks (above 10%) the calculated risk still exceeded the observed one. Predicted and observed incidence rates were significantly different by the Hosmer-Lemeshow test (Table 4) for PROCAM I, FRS-CHD1, FRS-CVD, FRS hard-CVE and ARRIBA. ASCVD and Reynolds showed the best agreement between observed and predicted incidence rates. The PROCAM II score is not included in Table 4, because it provides only five discrete values which cannot be broken down into deciles. The p-value of 0.94 for the ESC-HS in the age group 40-79 years is to be considered with caution because of the low number of events.
Thresholds for intervention. Table 5 compares the performance of all algorithms at the thresholds of 5, 10 and 20% 10-year risk (or 1, 2.5 and 5%, respectively, for the ESC-HS). This simulation was conducted to delineate an optimum risk threshold above which intervention and treatment should be considered.
At the threshold of 5% (1% for the ESC-HS), the sensitivity of all algorithms for the corresponding endpoints is high; it varies between 56% (PROCAM) and 93% (FRS-CVD and ASCVD) or 94% (ESC-HS). At the 5% threshold, between 25% (PROCAM) and 69% (FRS-CVD) of the population would qualify for an intervention. The relationship between sensitivity and number of treated persons appears most favorable for the ASCVD, which has a sensitivity of 93% and places 55% of the population in need of treatment. The ESC-HS provides similar results, with a sensitivity of 94% and 48% of the population in need of treatment. If the calculated risk is below 5%, the predictive value of negative tests is 99% or greater for all algorithms, meaning that an event within the next 10 years is very unlikely for persons below this threshold. This also applies to a risk threshold of 10% (2.5% for the ESC-HS). At this threshold, the sensitivity varies between 30% (PROCAM I) and 82% (ARRIBA) or 84% (ESC-HS); with the ARRIBA, almost 40% of the population would still be treated. ASCVD and FRS-CVD (78%) show slightly lower sensitivities. With the FRS-CVD, 45% of the population would be treated, whereas with the ASCVD 35% of individuals would require intervention. With the ESC-HS, only 27% would qualify for treatment. The relationship between sensitivity and the proportion of the population that needs intervention appears optimal for the ASCVD (endpoint: non-fatal and fatal events).
The predictive value of negative tests remains high even at the risk threshold of 20% (5% for the ESC-HS). At risks above 20%, the sensitivity of the algorithms is between 9% (PROCAM I) and 56% (ARRIBA). The ASCVD (47%), the FRS-CVD (44%) and the ESC-HS (42%) have slightly lower sensitivities than ARRIBA. Among the four algorithms with high sensitivity, the ASCVD and the ESC-HS have the highest specificity (90% and 89%).
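To make the quantities compared at each threshold concrete, the following minimal Python sketch derives sensitivity, specificity, predictive values, the fraction of the population that would be treated and the diagnostic efficiency from a list of calculated risks and observed events. It is an illustration only, not the code used in the study; the function name `threshold_metrics`, the class `ThresholdMetrics` and the toy data are hypothetical, and the sketch assumes that every cell of the 2×2 table needed for a ratio is non-empty.

```python
from dataclasses import dataclass


@dataclass
class ThresholdMetrics:
    sensitivity: float
    specificity: float
    ppv: float               # predictive value of positive tests
    npv: float               # predictive value of negative tests
    treated_fraction: float  # share of the population at or above the threshold
    efficiency: float        # share of correctly classified persons


def threshold_metrics(risks, events, threshold):
    """Classify a person as high risk if risk >= threshold and compare with outcomes."""
    tp = fp = tn = fn = 0
    for risk, event in zip(risks, events):
        high = risk >= threshold
        if high and event:
            tp += 1
        elif high:
            fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    return ThresholdMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
        treated_fraction=(tp + fp) / n,
        efficiency=(tp + tn) / n,
    )


# Hypothetical toy data (not DETECT data): 10-year risks and observed events.
toy_risks = [0.02, 0.04, 0.08, 0.12, 0.15, 0.22, 0.30, 0.06]
toy_events = [False, False, False, True, False, True, True, False]
toy_result = threshold_metrics(toy_risks, toy_events, threshold=0.10)
```

Lowering the threshold in such a calculation raises sensitivity and the treated fraction at the expense of specificity, which is the trade-off examined in Table 5.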

Discussion
While large-scale validations of risk scores already exist in US populations [22][23][24][25][26][27], this is the first comprehensive analysis of major cardiovascular risk algorithms in a German primary care setting. Beyond the scores currently recommended by international guidelines (ASCVD, ESC-HS), we also included earlier Framingham scores, because they had been used to generate the ARRIBA score, which is widely used by general practitioners in Germany, and the PROCAM score, which was developed in Germany. In essence, our analysis shows that the results of these risk algorithms vary widely in the population examined. Of the ten scores evaluated in this large contemporary cohort, the ASCVD showed the best agreement between calculated and observed risk. An important, but not the sole, reason for the differences between the scores is the variable composition of the endpoints predicted. The strongest differences exist between the ESC-HS (which indicates the risk of fatal cardiovascular events only) and the other algorithms, which include non-fatal events. In the highest risk category of the ESC-HS, we did not observe a single fatal cardiovascular event in the age group (40-65 years) in which the algorithm was developed. Therefore, we provisionally expanded the age range to 40 to 79 years, which resulted in improved comparability with other risk calculators, but the results need to be interpreted with caution.
The other algorithms also differ significantly in their calibration. FRS-CVD, FRS-CHD2, FRS-CHD1 and ARRIBA yield high results for the risk of broadly defined clinical endpoints; FRS hard-CVE, ASCVD, PROCAM I, PROCAM II, and Reynolds, which focus on narrowly defined clinical endpoints, provide lower results. These differences are not exclusively based on different endpoints. For instance, the end-points of ARRIBA and ASCVD are similar, but ARRIBA yields significantly higher risks.
For broad clinical endpoints the discriminatory power of all algorithms was lower than for narrowly defined endpoints, and it was surprisingly irrelevant what endpoint was originally used for the creation of a score. No significant differences were seen when the discrimination of all algorithms for a single endpoint was calculated. The reason for this observation could be that so-called "soft" end-points are less reliably detected and annotated. For instance, the end-point angina pectoris may be diluted with noncardiac chest pain. In addition to the end-point definition, the clinical parameters that define the risk formulas vary. Age, gender, smoking status, and systolic blood pressure are included in all risk equations, but cholesterol (total or LDL or HDL cholesterol), diabetes mellitus status, family history, antihypertensive therapy, HbA1c and CRP are only used in part of them.
The risk calculators correlate well with each other, but at high risks ARRIBA shows a strong upward deviation and is thus not optimally calibrated in this range, i.e. the predicted risks are significantly higher than the observed ones. Reynolds, FRS-CHD2 and ASCVD seem to be more accurate at higher risks. Allan et al. 22 came to a similar result: they compared 25 different risk calculators in 128 hypothetical patients with up to seven risk factors. The absolute risks differed most at calculated risks above 20%. They conclude that this is of secondary importance for clinical decisions, because all risks above 20% will be classified as "high", regardless of whether the calculated risk is 30%, 50% or 70%.
When we calculated continuous net reclassification improvements according to Pencina et al. 21, 18 out of the 36 pairwise comparisons revealed statistically significant reclassifications of subjects, which also indicates substantial differences between the scores. Rose 28 points out that focusing preventive strategies on a limited number of persons with putatively high risk has great benefit individually, but a smaller effect on the incidence rate of events in the population overall. Because most cardiovascular events occur in individuals with apparently low risk, these individuals are excluded from preventive treatment if a "high risk strategy" is followed stringently. For this reason, we have considered three different scenarios with different thresholds for "high risk" (and thus treatment recommendation) (Table 5).
At the intervention threshold of 20% in 10 years, which is currently recommended in many guidelines, 50 to 80% of patients who will suffer a cardiovascular event in the next 10 years will be excluded from therapeutic interventions, since they are not classified as high-risk patients. On the other hand, if the intervention threshold is lowered to 5%, the sensitivity increases greatly, so that more than 90% of all patients with subsequent cardiovascular events will be identified. However, this comes at the expense of specificity: many patients who will never experience an event would also receive therapy, with the potential risk of medication side effects or lower quality of life.
At the intervention thresholds of 10% and 2.5% (ESC-HS), respectively, in 10 years, ASCVD, FRS-CVD, ARRIBA and ESC-HS have the highest sensitivities (74 to 84%). The specificities (69% and 64%) and the predictive values of positive tests (18% and 14%; the proportion of persons with positive tests in whom an event also occurs) are highest for the ASCVD and the ARRIBA. The ESC-HS has a very low predictive value of positive tests (5%) but a high specificity of 74%. Because of the good calibration over the entire range of risks, the large age range and the combined fatal and non-fatal endpoint, we conclude that the ASCVD is the preferable risk score for Germany. Using the ASCVD, the threshold of a 10-year risk of 10% is exceeded by one-third of the study participants older than 40 years. The ASCVD is based on very recent studies, namely ARIC (Atherosclerosis Risk in Communities) 29, the Cardiovascular Health Study 30, the CARDIA (Coronary Artery Risk Development in Young Adults) study 31 and the Framingham study 32,33, and calculates the probability of a non-fatal myocardial infarction, a fatal or non-fatal stroke or death due to coronary heart disease 34. Finally, our data do not confirm the overestimation of risks by the ASCVD reported in other cohorts [23][24][25]35.
The proposed risk threshold of 10% is very close to the threshold of 7.5% recommended by the national US guidelines for intervention with moderate- to high-intensity statin treatment in primary prevention 2. A recently published study of the Copenhagen General Population showed that the application of lower risk thresholds for statin therapy could prevent more atherosclerotic cardiovascular events than the use of higher thresholds for therapy and intervention 36.
Table 3. AUCs and Harrell's C-statistics of risk algorithms related to four composite endpoints (n = 2463; in brackets: 95% confidence intervals). Bold: Harrell's C-statistics for the endpoint belonging to the respective algorithm.
Using the ASCVD, 30% of our cohort would require treatment. Assuming an event rate of 17% in 10 years and a relative risk reduction of 30% as an effect of treatment, the number needed to treat with statins and antihypertensive drugs to prevent one event would be reasonable at about 20 in 10 years or 40 in 5 years.
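The arithmetic behind this estimate can be retraced with a short Python sketch. It assumes, as a simplification, that the number needed to treat is the reciprocal of the absolute risk reduction (event rate multiplied by relative risk reduction) and that events accumulate linearly over time, so that the 5-year event rate is half the 10-year rate; the function name is hypothetical.

```python
def number_needed_to_treat(event_rate, relative_risk_reduction):
    """NNT = 1 / absolute risk reduction, with ARR = event rate x relative risk reduction."""
    absolute_risk_reduction = event_rate * relative_risk_reduction
    return 1.0 / absolute_risk_reduction


# Figures from the text: 17% event rate in 10 years, 30% relative risk reduction.
nnt_10_years = number_needed_to_treat(0.17, 0.30)      # 1 / 0.051, about 19.6
# Assumption: linear accumulation of events, i.e. 8.5% event rate in 5 years.
nnt_5_years = number_needed_to_treat(0.17 / 2, 0.30)   # 1 / 0.0255, about 39.2
```

Rounded, these values correspond to the numbers needed to treat of about 20 in 10 years and about 40 in 5 years quoted above.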
Limitations. Absolute incidence rate of events. The total number of events recorded during follow up was comparatively low. This is related to the fact that we strictly confined our evaluation to a primary care population free of vascular disease at baseline and that the duration of the follow-up was limited. However, the absolute incidence rate of vascular events appears to be within the range of other cohorts recruited in Germany [37][38][39] .
Representativeness for primary care in Germany. Eligible doctors were identified to evenly represent the geographic areas of Germany at high granularity. The overall response rate of 60.2% was lower than in other studies with nation-wide random sampling 40,41. This lower participation rate may be due to the fact that eligible doctors were asked beforehand about their willingness to take part in the more demanding third (laboratory and follow-up) layer of the study. Yet, in comparison to a study using a similar sampling strategy 40, we did not identify any selective drop-outs by region or type of primary care setting. Further, 90% of eligible patients participated. Comparisons of our study sample with the entire DETECT population and the laboratory sample (third layer) revealed significant differences, because individuals with cardiovascular disease and diabetes mellitus were excluded from the current analysis (Supplementary Table 1). We also used only the part of the laboratory sample in which all items needed to calculate the risk scores were available. This subgroup and the laboratory cohort (third layer, Supplementary Tables 1 and 2) free of cardiovascular disease and diabetes mellitus were not significantly different from each other. Taken together, we are convinced that our study population is representative of the corresponding population in Germany. This is also exemplified by the prevalence rate of hypertension in the DETECT study, which corresponds to those in a series of other German studies, including more recent ones (Supplementary Table 4) 39,42,43. For further details we refer to a previous article addressing the representativeness of the DETECT study 44.
Medication use. The degree to which medication may have confounded our results is hard to estimate. It needs to be considered that (a) the use of antihypertensives is already part of some of the risk scores, while it may influence all of them by affecting blood pressure values at baseline or during follow-up (Table 2); (b) hypertension was under-treated (Table 1); and (c) anti-thrombotic and lipid-lowering treatment had been prescribed to only a low proportion of patients (Table 2).
Use of the ASCVD in Germany. Differences in CVD risk have been reported between East and West Germany 45,46, most likely as a consequence of differences in lifestyle, nutrition and social systems before the reunification of Germany. Very recently, however, living conditions in the former East and West have been converging. It further needs to be considered that the nature and the strength of an association between a risk factor and clinical endpoints are unlikely to differ between East and West Germany. Rather, the prevalence rate and expression of risk factors had likely been responsible for the differences in cardiovascular disease burden between the two geographical areas of Germany in the past. Unexpectedly, we found that the ASCVD is well suited to German primary care.
Table 5. Comparison of methods for cardiovascular risk assessment in participants of the DETECT study at threshold levels of 5, 10 and 20% risk for an event in 10 years (1, 2.5 and 5% for ESC-HS, respectively). *Risk ≥ threshold value in 10 years. **Age: 40-79 years. Relative risk: observed relative risk of the high-risk group compared to the low-risk group. Sensitivity: proportion of people with a calculated risk ≥ threshold in 10 years among all persons in whom the cardiovascular event of the corresponding score occurs. Specificity: proportion of individuals with a calculated risk < threshold in 10 years among all persons without a cardiovascular event of the corresponding score. PVP (predictive value of positive tests): proportion of people with a cardiovascular event belonging to the calculated score among all people with a calculated risk ≥ threshold. PVN (predictive value of negative tests): proportion of people without a cardiovascular event belonging to the calculated score among all people with a calculated risk < threshold. Diagnostic efficiency: the proportion of correctly predicted and correctly excluded cardiovascular events in the total cohort.
Apart from Caucasians, the development of the ASCVD included persons of Hispanic and African-American origin. These ethnicities are hardly represented in Germany, whereas the share of Arab and Turkish immigrants is currently growing. None of the algorithms examined here allows adjustment for these ethnicities, nor are there any data available that would allow this demographic change to be taken into account.

Conclusion
In conclusion, the ASCVD (pooled cohort equation) recommended in the US guidelines is well suited for Germany. At an intervention threshold of 10% risk in 10 years, the ASCVD has a favorable ratio of sensitivity (80%) and specificity (about 70%); it combines non-fatal and fatal events as its endpoint and can be applied over a wide age range.

Methods
Study design, participants and clinical characterization. The DETECT study was a three-layer, multi-center, prospective long-term study initiated to investigate the prevalence and time course of CHD and its metabolic risk factors in primary care patients in Germany. Details of the study protocol have been published 44. The study was reviewed and approved by the Ethics Committee of the Medical Faculty Carl Gustav Carus at the Technical University Dresden (AZ: EK149092003; 16.09.2003) and registered at clinicaltrials.gov (NCT01076608). All participants were informed about the study and gave written informed consent. The authors confirm that all research was performed in accordance with relevant guidelines/regulations, the "Declaration of Helsinki" and the German data protection rules in place at the time of conducting the study. The first layer was the recruitment of centers, which was based on a nation-wide sample of physicians with primary care functions (medical practitioners, general practitioners, general internists). Sampling was based on 1060 regional segments (according to the criteria of IQVIA, formerly the Institute for Medical Statistics, Frankfurt am Main, Germany), clustered into 128 geographical areas for which primary care practitioners' addresses were available. From this database a random sample of 7053 physicians was drawn. A total of 468 study monitors were responsible for recruiting these doctors. Monitors were requested to inform doctors about the study aims and procedures, to recruit up to eight doctors, strictly following the order on the list provided, and to collect reasons for non-participation. Of the 7053 initially eligible primary care physicians, 3188 (45.2%) finally participated. The most common reasons for non-participation were: protocol too sophisticated, no interest, no participation in clinical trials in general, allowance not high enough, ethical concerns, lack of time, or not available at baseline.
On the second layer, the participating physicians were instructed to screen all patients presenting in their practice on the morning of either 16 or 18 September 2003. The protocol specifically demanded inclusion of all attendees and prohibited any systematic selection of patients, in order to provide a typical reflection of everyday practice and to avoid major bias. Exclusion criteria for the patients were: age under 18 years, the presence of a life-threatening illness, dementia or other serious cognitive disorders, and severe visual limitations. Questionnaires were distributed to a total of 59,403 eligible patients, of whom 3607 refused participation. In an additional 278 patients no doctor's assessment was performed, leaving a total of 55,518 patients (response rate 93.5%) for the DETECT main investigation.
For the third layer, 1000 doctors of the main study were randomly selected for participation in the laboratory and follow-up arm of DETECT. Participating doctors in this arm were asked to additionally include at least 12 randomly selected patients to undergo laboratory analysis and follow-up investigation. The laboratory screening program was completed in 7521 patients; valid laboratory data were obtained for a total of 7519 patients from 851 doctors.
At each visit, physicians documented symptoms, diagnoses, treatments and health behavior of patients using a structured interview; current heart rate, body mass index, waist and hip circumference, and systolic and diastolic blood pressure were measured. Patients reported information about their health status and their psychosocial situation in a structured questionnaire.
Laboratory testing. Blood samples were collected in the morning and sent overnight to the Clinical Institute of Medical and Chemical Laboratory Diagnostics of the Medical University of Graz. Cholesterol, triglycerides, glucose and "highly-sensitive" C-reactive protein (hsCRP) were determined on a Roche Modular automatic analyzer. LDL and HDL cholesterol were determined using a HELENA SAS-3/4-SAS electrophoresis system after separation of plasma proteins and enzymatic detection of cholesterol in lipoproteins by densitometry. Hemoglobin A1c (HbA1c) was measured on an ADAMS HA 8160 analysis system 44.
Clinical definitions. Hypertension was defined as systolic blood pressure >140 mm Hg, diastolic blood pressure >90 mm Hg 47, a history of hypertension and/or the use of antihypertensive drugs. Diabetes mellitus was defined as glycated hemoglobin A1c (HbA1c) above 6.5% or fasting glucose above 125 mg/dl 48, a history of diabetes mellitus and/or the use of oral hypoglycemic agents or insulin. Study participants were classified as active smokers if they had consumed a tobacco product in the four weeks preceding the survey.

Endpoints. One year and four years after the recruitment of patients, the health status was documented by the participating investigators. Total mortality and cardiovascular causes of death, non-fatal MI, coronary revascularization (bypass surgery (CABG) or percutaneous coronary intervention), fatal and non-fatal stroke, transient cerebral ischemia, and symptomatic occlusive peripheral arterial disease were documented. The information about the endpoints was collected on a standardized form by the family physician and/or the facility in which the patient had previously been treated. The median time of follow-up was 4.02 years, the maximum period 4.6 years.
Risk algorithms. We considered the following risk models: Framingham-hard-Cardiovascular Endpoints (FRS-hard-CVE) 11, Framingham CHD1 (FRS-CHD1) and Framingham CHD2 (FRS-CHD2) 12, Framingham CVD (FRS-CVD) 49, ARRIBA (which is widely used by general practitioners in Germany), PROCAM I 38 and PROCAM II 50, Reynolds score 51,52, ESC Heart Score (ESC-HS) 13 and atherosclerotic cardiovascular disease score (ASCVD), sometimes called Pooled Cohort Equation 34. The algorithm for the calculation of a continuous PROCAM score in its latest version of 2007 50 is not public, since the supplemental data mentioned in the publication are not accessible. For this reason, in Fig. 2, Supplementary Figs 1 and 2, and in Table 3 and Supplementary Table 4, only the PROCAM I version published in 2002 was used, in which the risk for women was estimated by dividing the calculated risk of men by 4. The corresponding risks were calculated for each of the study participants based on the records of the first survey in 2003. Table 2 provides an overview of the covariates included in the risk algorithms. FRS-CHD1 and FRS-CHD2 differ in that FRS-CHD1 uses total cholesterol and FRS-CHD2 uses LDL cholesterol. For the calculation of the ESC-HS, the risk algorithm was applied as proposed for Germany 13,53,54.
For ARRIBA, the risks were determined using the accessible risk charts (www.arriba-hausarzt.de/material/papier.html).
Sample and subgroups. In our analysis, we included patients from the third layer of the study (a) for whom follow-up data (one year and four years after the start of the study) were available or who had died during the observation period, (b) who at recruitment had no evidence of coronary artery disease, symptomatic peripheral arterial disease, cancer or severe kidney disease and no history of heart attack or stroke, and (c) in whom no diabetes mellitus was diagnosed. Patients with diabetes mellitus were excluded because relevant guidelines 55 treat diabetes mellitus as a "coronary risk equivalent", so that risk calculation would not be needed. After application of these criteria, our sample consisted of 4044 patients.
Different inclusion criteria and different clinical endpoints were used in the development of the algorithms. We adopted these criteria and defined five combined clinical endpoints (EP): EP1: fatal and non-fatal myocardial infarction, revascularization, sudden cardiac death (PROCAM I/II, FRS hard-CVE); EP2: EP1 plus fatal and non-fatal stroke (Reynolds, ASCVD, ARRIBA); EP3: EP1 plus angina (FRS-CHD1, FRS-CHD2); EP4: EP1 plus heart failure (NYHA III or IV), fatal and non-fatal stroke, transient ischemic attack (TIA), symptomatic peripheral arterial disease (PAD) (FRS-CVD); EP5: death from cardiovascular causes (ESC-HS). For further details see Table 2 and Fig. 1.
Statistical methods. The characteristics of the study cohort are presented as means and standard deviations (continuous traits) and relative frequencies (categorical traits) (Table 1).
Correlations of the 10-year risks at the first investigation in 2003 were calculated according to Spearman and Pearson (Supplementary Table 4). The relationship between the 10-year risks is shown in scatter plots with best-fit lines and their 95% confidence intervals, with the FRS-CVD on the abscissa. Both axes are scaled logarithmically, because the risk values obtained were skewed to the right in the study sample (Supplementary Fig. 1). Based on the Spearman correlation matrix, the entries of which can be considered as "distances" between scores, a multidimensional scaling was performed (Supplementary Fig. 2). The smaller the "distance" between two scores in the graph, the higher their correlation.
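As an illustration of the correlation step, the following Python sketch computes Pearson and Spearman coefficients for two risk scores using only the standard library. It is not the R or Matlab code used in the study. The conversion of a correlation into a dissimilarity as 1 − ρ is an assumption made here for illustration, since the exact transformation used for the multidimensional scaling is not stated; the function names and the toy risks are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical 10-year risks from two scores for the same five persons.
score_a = [0.02, 0.05, 0.10, 0.20, 0.40]
score_b = [0.01, 0.04, 0.06, 0.15, 0.50]
r = pearson(score_a, score_b)
rho = spearman(score_a, score_b)
distance = 1.0 - rho  # assumed dissimilarity for a multidimensional scaling
```

Because the two toy scores rank all persons identically, the Spearman coefficient is 1 although the absolute risks differ, which mirrors the observation that well-correlated scores can still be calibrated very differently.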
Discrimination. To compare the discriminatory power of the prognostic models, we calculated the areas under the receiver operating characteristic curves (AUCs) and Harrell's C-statistics for all considered risk equations in a subpopulation of persons 40 to 65 years of age (n = 2463) (Table 3).
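The two discrimination measures can be sketched as follows. This is a simplified, quadratic-time Python illustration and not the implementation used in the study: the AUC is computed as the probability that a person with an event has a higher calculated risk than a person without one, and Harrell's C counts concordant pairs among all pairs in which the person with the shorter follow-up time had an event. Pairs with identical follow-up times are ignored here, which is a simplifying assumption; function names and toy data are hypothetical.

```python
def auc(risks, events):
    """Share of case/non-case pairs in which the case has the higher risk (ties count 0.5)."""
    cases = [r for r, e in zip(risks, events) if e]
    controls = [r for r, e in zip(risks, events) if not e]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def harrell_c(risks, times, events):
    """Harrell's C for right-censored data; tied follow-up times are skipped."""
    concordant = 0.0
    usable = 0
    n = len(risks)
    for i in range(n):
        if not events[i]:
            continue  # the earlier time of a usable pair must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical toy data: calculated risk, follow-up time in years, event indicator.
toy_risks = [0.30, 0.05, 0.20, 0.25, 0.02]
toy_times = [1.0, 4.0, 2.0, 4.0, 3.5]
toy_events = [True, False, True, False, False]
toy_auc = auc(toy_risks, toy_events)
toy_c = harrell_c(toy_risks, toy_times, toy_events)
```

In contrast to the AUC, Harrell's C uses the time at which an event occurs, so the two measures can differ when follow-up times are unequal.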
Calibration. To examine the concordance of the predicted incidence rates with those actually observed, we divided the subpopulations into the risk groups <10%, 10-20% and ≥20%. For each of the resulting risk groups, we plotted the means of the calculated risks against the 10-year rates of the associated endpoints (Fig. 3). The event rates were extrapolated to 10 years assuming that the cumulative incidence rate of the events is linear in time. Additionally, we divided the calculated risks into deciles and again compared the mean predicted risks with the actual incidence rates. For each of the risk groups ("low" to "very high"), we calculated the 95% confidence intervals. We examined how well the risk score approximates the observed incidence rates in the score deciles using the Hosmer-Lemeshow test. Specifically: if $K_j$ is the number of observed events in the $j$-th decile with $n$ observations, $E_j$ is the expected number of events, and the test statistic is $\Sigma = \sum_{j=1}^{10} H_j$ with $H_j = \frac{(K_j - E_j)^2}{E_j\,(1 - E_j/n)}$, $j = 1, \dots, 10$, then the p-value is obtained from the chi-square distribution with eight degrees of freedom evaluated at $\Sigma$ (Table 4). Sensitivity, specificity, positive and negative predictive value of the algorithms at the threshold values of 5, 10 and 20% of the calculated risks are given in Table 5. All calculations were performed using R (R-Project, version 3.1.3) and Matlab (version R2015a). Continuous net reclassification improvements (Supplementary Table 5) were calculated as described 21.
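The calibration steps described above can be sketched in Python as follows. The sketch is an illustration under stated assumptions, not the original analysis code: the expected number of events in a decile is taken to be the sum of the calculated risks in that decile, deciles are formed by sorting and splitting the sample into ten nearly equal groups, and the chi-square tail probability uses a closed form that is valid only for an even number of degrees of freedom (eight in this case). All function names and the toy cohort are hypothetical.

```python
import math


def extrapolate_rate(observed_rate, followup_years, horizon_years=10.0):
    """Linear extrapolation of a cumulative event rate to the prediction horizon."""
    return observed_rate * horizon_years / followup_years


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(risks, events, groups=10):
    """Return the test statistic (sum of H_j) and its p-value with groups - 2 df."""
    paired = sorted(zip(risks, events))
    n_total = len(paired)
    statistic = 0.0
    for g in range(groups):
        chunk = paired[g * n_total // groups:(g + 1) * n_total // groups]
        n = len(chunk)
        observed = sum(1 for _, e in chunk if e)  # K_j
        expected = sum(r for r, _ in chunk)       # E_j (assumed: sum of risks)
        statistic += (observed - expected) ** 2 / (expected * (1.0 - expected / n))
    return statistic, chi2_sf_even_df(statistic, groups - 2)


# Hypothetical, perfectly calibrated toy cohort: 10 groups of 20 persons,
# group g has a calculated risk of 0.05 * (g + 1) and exactly g + 1 events.
toy_risks, toy_events = [], []
for g in range(10):
    for k in range(20):
        toy_risks.append(0.05 * (g + 1))
        toy_events.append(k < g + 1)
toy_stat, toy_p = hosmer_lemeshow(toy_risks, toy_events)
```

For the perfectly calibrated toy cohort the statistic is essentially zero and the p-value close to 1; a small p-value, as reported for several scores in Table 4, indicates a significant difference between predicted and observed incidence rates.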