Re-estimation improved the performance of two Framingham cardiovascular risk equations and the Pooled Cohort equations: A nationwide registry analysis

Equations predicting the risk of occurrence of cardiovascular disease (CVD) are used in primary care to identify high-risk individuals among the general population. To improve the predictive performance of such equations, we updated the Framingham general CVD 1991 and 2008 equations and the Pooled Cohort equations for atherosclerotic CVD within five years in a contemporary cohort of individuals who participated in the Austrian health-screening program from 2009 to 2014. The cohort comprised 1.7 M individuals aged 30–79 without documented CVD history. CVD was defined by hospitalization or death from cardiovascular cause. Using baseline and follow-up data, we recalibrated and re-estimated the equations. We evaluated the gain in discrimination and calibration and assessed explained variation. A five-year general CVD risk of 4.61% was observed. As expected, discrimination c-statistics increased only slightly and ranged from 0.73 to 0.79. The two original Framingham equations overestimated the CVD risk, whereas the original Pooled Cohort equations underestimated it. Re-estimation improved calibration of all equations adequately, especially for high-risk individuals. Half of the individuals were reclassified into another risk category using the re-estimated equations. Predictors in the re-estimated Framingham equations explained 7.37% of the variation, whereas the Pooled Cohort equations explained 5.81%. Age was the most important predictor.


Supplementary Methods S1. Data preparation
Distributions of blood pressure (BP) and blood parameter measurements were truncated at the respective 0.5th and 99.5th percentiles. Missing values in BP treatment (8.4%) were assumed to indicate no treatment. The low number of missing values in risk factors (0.08% of individuals) allowed a complete-case analysis.
Causes of death, coded in ICD-10 (International Statistical Classification of Diseases and Related Health Problems, 10th revision), were provided by the Austrian federal institute for statistics. Deaths were classified as CVD-related or CVD-unrelated; the classification varied slightly between equations because their CVD definitions differ. To assign a cause of death to deceased individuals in the health-screening database, we matched on the date of death, gender, and birth year. For most deceased individuals in the health-screening database, CVD death could be assigned unambiguously. Each remaining deceased individual received a probability that his or her death was CVD-related: the number of CVD-related deaths among his or her possible matches divided by the total number of his or her possible matches in the registry of deaths. Using these probabilities, CVD-related death was randomly assigned to each of these individuals. The probabilistic assignment of ambiguous causes of death may have introduced misclassification of CVD deaths, but this potentially affected only 10.3% of the combined (fatal and non-fatal) CVD outcomes.
Individuals with a history of CVD were excluded. Prior CVD events were identified from hospital stays (with discharge diagnoses in ICD-10 codes) before the health screening. Discharge diagnoses recorded in ICD-9 codes were converted to ICD-10 codes by forward mapping. The exact exclusion criteria can be found in Supplementary Tables 2a-c. As the investigated equations had different exclusion criteria, different subsets of the final data set were used in the analysis.
The following Venn diagram shows the overlap of individuals in the subsets for each equation. R 3.5.0 and SAS 9.4 were used for analysis and figure preparation. 1,2
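The probabilistic assignment of ambiguous causes of death described in this section can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R or SAS code; the function names, the `(n_cvd_matches, n_total_matches)` input format, and the seed are assumptions introduced here.

```python
import random


def cvd_death_probability(n_cvd_matches, n_total_matches):
    """Share of CVD-related deaths among an individual's possible registry matches."""
    if n_total_matches <= 0 or not 0 <= n_cvd_matches <= n_total_matches:
        raise ValueError("need n_total_matches > 0 and 0 <= n_cvd_matches <= n_total_matches")
    return n_cvd_matches / n_total_matches


def assign_cvd_death(candidates, seed=2024):
    """Randomly assign CVD-related death (True/False) to each deceased individual.

    candidates: iterable of (n_cvd_matches, n_total_matches) pairs, one per individual
                (hypothetical input format).
    seed:       fixed seed so that the random assignment is reproducible.
    """
    rng = random.Random(seed)
    assigned = []
    for n_cvd, n_total in candidates:
        p = cvd_death_probability(n_cvd, n_total)
        # rng.random() lies in [0, 1), so p = 0 never and p = 1 always yields True.
        assigned.append(rng.random() < p)
    return assigned
```

An unambiguous match is the special case with probability 0 or 1, so the same routine covers all deceased individuals; only those with a probability strictly between 0 and 1 are subject to random misclassification.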

Exclusion criteria
For the Framingham 1991 equation, individuals were excluded if they were not aged 30 to 74 or had experienced a cardiovascular event before their first health screening. Anderson et al. defined cardiovascular disease (CVD) as coronary heart disease (CHD), stroke, transient ischemia, congestive heart failure, and peripheral vascular disease. 4 The table below lists the ICD-10 codes we identified for this definition. Additionally, the authors excluded individuals with cancer other than basal cell carcinoma. Therefore, we excluded patients with ICD-10 codes C00-C97 (malignant neoplasms) except C44.3, with D00-D09 (in situ neoplasms), or with the corresponding ICD-9 codes. Finally, the Austrian study cohort for the Framingham 1991 equation comprised 1 575 614 individuals.

Definition of CVD
Anderson et al. defined CVD as myocardial infarction, death from coronary heart disease, angina pectoris, coronary insufficiency, stroke including transient ischemia, congestive heart failure, and peripheral vascular disease. 4 The corresponding ICD-10 codes are listed in the table above.

Regression coefficients & model formula
Predictors | Original equation | Re-estimated equation

Exclusion criteria
For the Pooled Cohort equations, 6 the authors included individuals aged 40 to 79 and excluded individuals with a history of non-fatal recognized or unrecognized myocardial infarction, stroke, heart failure, percutaneous coronary intervention, coronary artery bypass surgery, or atrial fibrillation. We translated this definition to ICD-10 codes as shown in the table below and excluded individuals with a hospital stay due to one of those ICD-10 (or mapped ICD-9) codes. In the Austrian study cohort, 1 337 475 individuals fulfilled these criteria.

Supplementary Figure 1. Compute the ten-year risk from the five-year risk
The figures below show the hazard (with 95% confidence intervals) in the training and the test set for general CVD as defined in the Framingham equations and for ASCVD as defined in the Pooled Cohort equations. Given the very narrow scale, the assumption of constant hazards seems reasonable. Under this assumption, the risk at five years equals the distribution function of an exponential distribution evaluated at five years, F(5) = 1 − exp(−5λ), where λ denotes the constant hazard. For the cut-offs used in the reclassification tables in Table 3 and Supplementary Table 6, the risk at five years corresponds to the following risks at ten years.
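The conversion between risk horizons under a constant hazard can be sketched in a few lines. With a constant hazard λ, the risk at t years is 1 − exp(−λt), so a five-year risk r5 implies λ = −ln(1 − r5)/5 and a ten-year risk of 1 − (1 − r5)². The function names below are hypothetical, and Python is used for illustration only.

```python
import math


def hazard_from_risk(risk, years):
    """Constant hazard implied by a cumulative risk over the given number of years."""
    if not 0.0 <= risk < 1.0:
        raise ValueError("risk must lie in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    # Inverts risk = 1 - exp(-hazard * years); log1p keeps precision for small risks.
    return -math.log1p(-risk) / years


def convert_risk(risk, from_years=5.0, to_years=10.0):
    """Re-express a risk at one horizon as the risk at another, assuming a constant hazard."""
    if to_years <= 0:
        raise ValueError("to_years must be positive")
    hazard = hazard_from_risk(risk, from_years)
    # Exponential distribution function at to_years: 1 - exp(-hazard * to_years).
    return -math.expm1(-hazard * to_years)
```

For instance, `convert_risk(0.05)` maps a five-year risk of 5% to 1 − 0.95² = 9.75% at ten years. The conversion is only as good as the constant-hazard assumption it rests on.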

Relative survival
The figure below shows the relative survival of individuals from the study cohort compared to the general Austrian population. 8

Distribution of age
The two figures below compare (separately for women and men) the distribution of age in the study cohort and the general Austrian population as of 2011. 9

Supplementary Table S5. Calibration-in-the-large and calibration slope for the original, recalibrated, and re-estimated risk equations.
An optimal calibration-in-the-large is zero, while an optimal calibration slope is one.
Recalibrating an equation by updating the baseline risk does not change the original calibration slope.
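Why a baseline update leaves the calibration slope unchanged can be seen in a short derivation. The sketch assumes a proportional-hazards form for the risk equation; the symbols S_0, S_0^*, LP_i, h_0 and b are introduced here for illustration, and an equation of a different form would need an analogous argument.

```latex
% Assumed form of the original equation:
% baseline survival S_0(t) and linear predictor LP_i of individual i
\hat{p}_i(t) = 1 - S_0(t)^{\exp(\mathrm{LP}_i)}

% Recalibration replaces only the baseline survival, S_0(t) \to S_0^{*}(t):
\hat{p}_i^{*}(t) = 1 - S_0^{*}(t)^{\exp(\mathrm{LP}_i)}

% Calibration slope: coefficient b of LP_i when refitted in the validation data
h(t \mid \mathrm{LP}_i) = h_0(t)\,\exp\!\left(b \cdot \mathrm{LP}_i\right)
```

The slope b is estimated from LP_i and the observed outcomes alone. Because LP_i is identical before and after the baseline update, the original and the recalibrated equation share the same slope. Re-estimation changes the coefficients and therefore LP_i, so its slope can differ.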

Equation Type
Calibration-in-the-large

Supplementary Table S6. Observed five-year risk of cardiovascular disease (CVD) and atherosclerotic cardiovascular disease (ASCVD)
For the reclassification table in Table 3, the observed five-year risk was computed only for cells with at least 100 observations and at least one event.