Main

Despite declining age-standardized rates, myocardial infarction remains the leading and increasing cause of death globally1. Prevention of myocardial infarction is highly prioritized2, but the targeting of primary preventive efforts is hampered by inefficient means of identifying individuals at the highest risk for an imminent myocardial infarction (IMI). This could be partially explained by the inability of most risk prediction models to account for the highly dynamic nature of the period leading up to a myocardial infarction. For instance, traumatic events, such as a cancer diagnosis or loss of a spouse, markedly increase the risk of myocardial infarction3,4. In addition, the degree of stenosis in the culprit lesion in the coronary artery appears to increase in the months just before the myocardial infarction5. Nonetheless, to date, most biomarkers have been investigated over several years of follow-up because of a low number of individuals with a first myocardial infarction shortly after baseline in the general population. Hence, a large population-based study focusing on identifying biomarkers of an IMI is needed.

Primary prevention for asymptomatic risk factors over a long period is costly, and motivation among patients and providers is limited even for secondary prevention6. Risk prediction in the short term based on biomarkers of IMI might tilt the scales for prevention, as the knowledge of an increased risk of a first myocardial infarction within the ensuing few months might motivate patients and doctors to consider preventive strategies.

We hypothesized that circulating biomarkers of the dynamic biological processes that operate in the months preceding a myocardial infarction could be measured and used to assess risk. We tested this in a new nested case–cohort study and devised a prediction model for an imminent first myocardial infarction.

Results

We assembled a new nested case–cohort study, the Markers of Imminent Myocardial Infarction (MIMI) study. The study includes initially cardiovascular disease-free individuals in six European general population-based cohorts who developed a myocardial infarction within the first 6 months after the baseline examination, with up to four cohort representatives per case (Fig. 1 and Supplementary Table 1). The case–cohort design allows for time-to-event analyses and derivation of accurate prediction models; it is also less prone to certain biases than the case–control design7. After exclusions, data of 2,018 individuals weighted to represent the full cohort of 169,053 persons were available for analysis (420 IMI cases and 1,598 subcohort representatives). Their characteristics at baseline are shown in Extended Data Table 1.

Fig. 1: Derivation of the sample representing 169,053 individuals without previous cardiovascular disease from six European population-based cohorts.

The distribution of MIMI participants across Europe is shown, with the participating countries and cohort centers indicated. Cases (n = 420) were initially sampled, and center-specific strata based on sex and median age were constructed. From each cohort center, up to four subcohort representatives were drawn for each case from the same stratum. A subcohort (n = 1,598) weighted to represent the total cohort (N = 169,053) based on the number of individuals in the age and sex strata in the total cohort was thus assembled. NA, not applicable.

Thereafter, we determined the levels of 817 proteins (some duplicates) and 1,025 metabolites in biobanked plasma samples from the cohort baseline examinations in a core laboratory and harmonized 16 clinical variables between the cohorts. We divided the study sample into a discovery sample (EpiHealth, Trøndelag Health Study (HUNT) and Lifelines; 70% of the sample) and an external validation sample (European Prospective Investigation into Cancer and Nutrition—Cardiovascular Disease (EPIC-CVD), Estonian Biobank study and Malmö Preventive Project (MFM); 30% of the sample). Considering the limited sample size of the study, we also performed an internal validation as an exploratory analysis by randomly splitting the study sample into a 70:30 discovery/validation sample, repeated in 100 random draws.

We investigated the associations of proteins, metabolites and clinical variables with the risk of a first myocardial infarction within 6 months after baseline using weighted, stratified Cox proportional-hazards regression models in the discovery sample. Biomarkers that passed multiple testing bounds (a Benjamini–Hochberg false discovery rate (FDR) of <0.05) were verified in the same models in the validation sample (this was done in the external and internal validation sets), with directionally consistent results at P < 0.05 considered replicated.

In one-by-one models adjusting for technical covariates (season, storage time and plate; Fig. 2), 48 proteins, 43 metabolites and 3 clinical variables (age, sex and systolic blood pressure) were found to be associated with IMI after the discovery–validation process (Fig. 3 and Supplementary Table 2).

Fig. 2: Associations of proteins, metabolites and clinical variables with IMI risk.

The associations of 817 proteins, 1,025 metabolites and 16 clinical variables with the risk of a first myocardial infarction within 6 months in the full MIMI study, adjusted for technical covariates, are shown by biomarker category (clinical, metabolite or protein). HR relates to a doubling of the concentration of proteins and metabolites and a one-unit higher level of clinical biomarkers on their original scale (for example, years, mmol l−1). Highlighted are the top 25 biomarkers that passed external validation, ranked by the number of internal validation splits in which the biomarker also met the replication criteria in the model adjusted for technical covariates. aIL-6 and bKIM1 were measured on multiple Olink panels and tested in separate statistical tests. n = 420 cases and 1,598 noncases.

Fig. 3: Top variables associated with IMI risk.

Shown are the top 25 biomarkers that passed external validation, ranked by the number of internal validation splits in which the biomarker also met the replication criteria in the model adjusted for technical covariates. Each predictor is represented by two rows, with the discovery result (blue) presented first and the validation result (red) presented second. The results are sorted by predictor type (clinical, metabolite or protein) and effect size from the combined analysis of the discovery and validation samples. P value was calculated based on a 2 d.f. Wald test for metabolites analyzed using the missing indicator method (biomarker and missing indicator) and a 1 d.f. Wald test otherwise (biomarker only), two-sided in both cases. The 95% CI of the point estimate was calculated for the biomarker term only and might include the null value (HR = 1, that is, log(HR) = 0) even if P < 0.05 from the 2 d.f. (biomarker + indicator) Wald test. aIL-6 and bKIM1 values were determined from multiple Olink panels and tested in separate statistical tests. n = 296 cases and 1,121 noncases in the discovery sample; n = 124 cases and 477 noncases in the validation sample.

Thereafter, we investigated promising markers in models further adjusting for age and sex. Among them, brain natriuretic peptide (BNP) was the only biomarker with a borderline significant association with IMI (HR per doubling of BNP level (95% confidence interval (95% CI)) = 1.33 (1.15, 1.55), P = 1.63 × 10−4, FDR = 0.11 in the discovery sample and 1.40 (1.00, 1.94), P = 0.049 in the validation sample; Extended Data Fig. 2). BNP was the only biomarker with a suggestive association in the internal validation, passing the formal replication criteria in 22 of 100 random splits. By comparison, stem cell factor (SCF) and interleukin-6 (IL-6), biomarkers with a weaker support of an association, replicated in only 5 or 4 of 100 random splits. The cumulative hazard of IMI by fourths of BNP is shown in Extended Data Fig. 3. The associations of BNP with IMI in sensitivity analyses excluding one cohort at a time and in a random-effects meta-analysis were similar, as shown in Extended Data Figs. 4 and 5. For some of the 94 variables, we observed substantial between-cohort heterogeneity in the estimates when they were evaluated in a random-effects meta-analysis (Supplementary Table 3). The addition of interaction terms between sex and the biomarkers did not reveal any additional associations. Associations with IMI within 3 months (185 cases) were similar to those within 6 months (Extended Data Fig. 6).

In a model investigating the total effect of the BNP–IMI association (with a priori selected confounders, not mediators, according to Extended Data Fig. 1), adjusting for age, sex, weight, height, creatinine and systolic blood pressure, the association of BNP with IMI remained similar (HR (95% CI) = 1.34 (1.14, 1.57), P = 3.12 × 10−4 in the discovery sample and 1.51 (1.05, 2.18), P = 0.028 in the validation sample; per doubling of BNP level).

We then investigated the association of the most promising marker, BNP, with the coronary artery calcium score (CACS) at a cardiac computed tomography examination in an external population-based cohort of 1,586 participants of the Swedish CArdioPulmonary bioImage Study (SCAPIS) who were free from self-reported cardiovascular disease. Here, a higher CACS was not notably associated with a higher BNP level (odds ratio (95% CI) = 1.14 (0.91, 1.42), P = 0.25; per doubling of BNP level) in an ordinal regression model adjusting for the same covariates as in the total-effects model.

Finally, we investigated the possibility of developing a clinical risk prediction algorithm for a first IMI using clinically available variables and a weighted Cox ridge regression model. The prediction model achieved an internally validated C-index of 0.78, indicating a good ability to discriminate between IMI cases and noncases. When validating the model in the UK Biobank, a C-index of 0.82 was obtained, while a calibration plot showed some overestimation of 6-month IMI risks. As a comparison, the recalibrated SCORE2 achieved C-indexes of 0.77 (MIMI cohort) and 0.81 (UK Biobank) and overestimated the IMI risks in both samples (Extended Data Fig. 7). A nomogram based on the model is shown in Fig. 4, with a worked example of its intended use displayed in Extended Data Fig. 8 and its cross-validated calibration presented in Extended Data Fig. 7. An interactive web application is presented at miscore.org. Coefficients for predicting IMI from the model are shown in Supplementary Table 4.

Fig. 4: Nomogram of the model for the clinical prediction of an IMI.

A nomogram for predicting IMI risk based on the final clinical model is shown. Each variable value contributes points (ruler at the top) that are summed up and translated to the predicted risk of a myocardial infarction within 6 months (bottom two rulers). Equation, β coefficients, 6-month survival and mean variable values are provided in Supplementary Table 4. A worked example is shown in Extended Data Fig. 8. The model is also presented as an interactive web application at miscore.org.

No biomarkers improved risk prediction in a LASSO (least absolute shrinkage and selection operator) Cox regression model; the variable selection by the LASSO was unstable, with the 95% bootstrap CI on the model size being 0–128 variables. Likewise, no biomarkers improved risk prediction in a random forest model using 2,000 trees; this model ranked BNP and N-terminal pro-BNP (NT-proBNP) at the top, but with very large CIs (Supplementary Table 5).

Discussion

We set out to identify biomarkers of an imminent first myocardial infarction, and to test how well such an event can be predicted, using a new case–cohort consortium of individuals without prior cardiovascular disease and with biobanked blood samples. From more than 1,800 biomarkers, we identified 48 proteins, 43 metabolites and 3 clinical variables associated with the risk of an imminent first myocardial infarction independent of technical covariates. Further analyses revealed BNP as the only biomarker consistently associated with IMI risk. We also derived a prediction model to discriminate between subsequent cases and noncases. The IMI phenotype has rarely been studied prospectively in the general population and with a broad panel of biomarkers. The findings may have implications for both clinical primary prevention studies and further etiological studies.

In the current study, higher BNP levels in individuals without a known cardiovascular disease were linked to a higher risk of a first myocardial infarction within 6 months in several models. Cardiomyocytes produce BNP in response to strain8, and NT-proBNP measurement is a pillar of the clinical management of heart failure9 but is not used in diagnosing myocardial infarction10. Diastolic dysfunction is an early feature of myocardial ischemia, and a higher BNP level in this context is likely underpinned by diastolic dysfunction caused by subclinical ischemia11 in individuals with some degree of coronary stenosis. This interpretation is consistent with the weak, statistically nonsignificant association between BNP and CACS observed herein, although that association should be interpreted carefully. The noncausal explanation is further supported by Mendelian randomization studies (acknowledging that associations of genetically determined lifelong BNP levels with coronary disease may have limited relevance to a temporally boxed-in series of events): a genetic variant affecting the expression of the BNP gene (NPPB, rs198389) is not associated with cardiovascular endpoints12 or coronary artery disease13. The influence of chance on the finding is likely low, as NT-proBNP was also significantly associated with IMI in the discovery sample, with a borderline association in the validation sample (Extended Data Fig. 5). While BNP may hence reflect an underlying coronary artery disease, it did not add materially to a risk prediction model for IMI composed of more readily available biomarkers.

Several known mechanisms implicated in atherosclerosis and ischemia were represented among the other 94 biomarkers associated with an IMI in both the discovery and validation samples after adjusting for technical covariates, including inflammation (IL-6)14, extracellular matrix metabolism (WAP four-disulfide core domain protein 2 (WFDC2))15, hypertrophy (adhesion G-protein-coupled receptor G1 (AGRG1))16, apoptosis (triggering receptor expressed on myeloid cells 1 (TREM1), tumor necrosis factor receptor superfamily member 10B (TRAIL-R2)) and cell adhesion (AGRG1). We also observed associations with markers representing mechanisms less often implicated in coronary diseases, such as markers of kidney injury (kidney injury molecule 1 (KIM1))17, appetite regulation (growth differentiation factor 15 (GDF15))18, and an α-amino acid found in dietary supplements and associated with paracetamol use (pyroglutamine)19. While some associations may be causal, others, such as associations with levels of chitinase-3-like protein 1 (CHI3L1)20, pleiotrophin (PTN) or KIT, may more likely be responses to myocardial ischemia. These findings may accelerate further etiological studies of acute coronary events.

We developed a prediction model for IMI in the general population. An imminent infarction is difficult to predict; the signals are weak, and we faced power limitations. The model achieved good discriminative ability, with acceptable calibration in the lower risk range. The model can be transposed to other settings by entering the base hazards and variable means of those settings, for example, interactively at miscore.org. Given the increasing global burden of deaths from myocardial infarction, predicting these events and increasing individual motivation for preventing them may be of substantial importance; this can be tested in clinical trials.

The current study has several limitations. First, the use of multiple cohorts introduced heterogeneity. We addressed this at the sampling, biomarker analysis and statistical analysis stages, with the resulting limitation that the heterogeneity decreases statistical power. The strengths are the same as in other multicenter studies, including that only biomarkers with consistent importance in different settings are brought forward. Other study limitations are inherent to the uncertainty of ranking the top findings and the inability of one-by-one strategies to capture complex interrelationships. The instability of the variable importances from the random forest was unsurprising, as such methods are notoriously data hungry and require far larger datasets than classical modeling techniques21. While the studied markers are easily obtainable by a simple blood test or clinical assessment, a limitation is that a blood sample will not always capture tissue-specific processes. In addition, our study was limited to proteins and metabolites that remain stable in the freezer for many years. The biomarker analyses used herein are currently not available in clinical practice, and we lacked the clinically available and more precise immunoassay measurements of, for example, NT-proBNP and cardiac troponin; hence, imprecision in the proximity extension assay and ultra-high performance liquid chromatography–mass spectrometry (UPLC–MS) technologies may preclude definitive mechanistic insights and maximal clinical utility. Further, making causal assumptions is fundamentally challenging in a multimarker landscape where many causal pathways are unknown. Most markers could be potential mediators in pathways for known causes of myocardial infarction, including age and sex. Consequently, we provided models adjusted for technical covariates only and models with further biological covariate adjustment. Thus, some associations could be explained by confounding by, for example, age and sex. Notably, mediators of causal effects are also important to identify, with implications for prediction and use as treatment targets.

In conclusion, we identified biomarkers associated with the risk of an imminent first myocardial infarction, including BNP. Delineation of the distinct biological processes that operate in the months before the first myocardial infarction will be key to discovering prevention targets. We developed and validated a prediction model with a fair ability to discriminate between persons with and without risk of an imminent first myocardial infarction. Risk prediction in the short term may enhance the motivation of patients and doctors for primary prevention.

Methods

Study sample and outcome

The MIMI study sample draws biobanked blood and data from six European cohorts of the BBMRI-LPC (Biobanking and Biomolecular Research Infrastructure—Large Prospective Cohorts) collaboration22, as shown in Fig. 1 and Supplementary Table 1. After sample size determination, we supplied each cohort with a standardized protocol (in which all definitions are described in detail) and an R script for selecting cohort representatives for the subcohort (Supplementary Notes).

Cohort participants with biobanked samples (at least 250 μl of plasma or serum; eventually, only plasma was included) and no previous clinical cardiovascular disease were eligible for inclusion in the present study. The exclusion criteria were previous clinical cardiovascular disease (defined as the presence at any time before baseline of any of the following: myocardial infarction, coronary procedure, heart failure, structural heart disease, tachyarrhythmias, stroke, thromboembolic disease and peripheral vascular disease) and renal failure.

Individuals with acute myocardial infarction (International Classification of Diseases, tenth revision (ICD-10), I21; ICD-9, 410.0–410.6 and 410.8) as the primary cause of hospitalization or death within 6 months after baseline were defined as IMI cases. We included both ST-elevation and non-ST-elevation myocardial infarctions; we encouraged efforts to include only type 1 myocardial infarctions by not counting cases with any of the following ICD codes in secondary positions: anemia (for example, ICD-10, D50–D64; ICD-9, 280–285), tachyarrhythmias (for example, ICD-10, I47–I49; ICD-9, 427), heart failure (for example, ICD-10, I50; ICD-9, 428), renal failure (for example, ICD-10, N17–N19; ICD-9, 584–586), chronic obstructive pulmonary disease (for example, ICD-10, J43–J44; ICD-9, 491, 492 and 496), sepsis and other severe infections (for example, ICD-10, A40–A41; ICD-9, 038), or hypertensive crises.

Up to four cohort representatives per available IMI case were randomly drawn from the full cohort to the subcohort in 50 strata based on sex, age (above/below median) and study center in a stratified case–cohort design7. All 420 IMI cases, and 1,598 subcohort representatives, were drawn from the full cohort of 169,053 participants, as summarized in Fig. 1.

Exposures

Clinical variables (age, sex, height, weight, waist circumference, systolic and diastolic blood pressure, triglycerides, high-density lipoprotein (HDL) cholesterol, non-HDL cholesterol, low-density lipoprotein (LDL) cholesterol, total cholesterol, glucose, diabetes status, highest education, smoking status, previous smoking exposure, alcohol intake and physical activity) were harmonized between the cohorts (Supplementary Notes). Non-HDL cholesterol was calculated as total cholesterol − HDL cholesterol. LDL levels were calculated using the extended Martin–Hopkins equation23.

All blood samples were randomized into appropriate measurement plates, stratified by cohort (with a similar number from each cohort on every plate), and aliquoted into the plates. Quality controls are summarized below and described in detail in the Supplementary Notes.

Protein measurements were done using the Olink proximity extension assay (Olink), a highly specific 92-plex immunoassay. Overall, 829 proteins across nine panels (cardiometabolic, cardiovascular II, cardiovascular III, development, immune response, inflammation, metabolism, oncology II and organ damage) were analyzed, including 804 unique proteins (considering overlap between panels). Relative protein values on a log2 scale are reported, with each protein value normalized by plate by centering all plates at the same median, assuming random plate placement. Values below the assay’s lower limit of detection (LOD) were also included in the analyses.
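The plate normalization described above can be illustrated with a minimal sketch. It assumes that "the same median" means the overall median across all plates, which the text does not specify; the function name and the example values are hypothetical.

```python
from statistics import median

def center_plates(values, plates):
    """Shift each plate so that its median equals the overall median.

    Assumption: the common target is the overall median of all values;
    the source only states that all plates are centered at the same median.
    """
    overall = median(values)
    by_plate = {}
    for value, plate in zip(values, plates):
        by_plate.setdefault(plate, []).append(value)
    # Additive shift per plate (values are on a log2 scale)
    shift = {plate: overall - median(vals) for plate, vals in by_plate.items()}
    return [value + shift[plate] for value, plate in zip(values, plates)]

# Hypothetical log2-scale protein values on two plates
npx = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
plate = ["A", "A", "A", "B", "B", "B"]
normalized = center_plates(npx, plate)
```

An additive shift is used because the values are already on a log2 scale, so the within-plate spread is left unchanged.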

Metabolites were analyzed using the UPLC–tandem MS (UPLC–MS/MS)-based Metabolon platform (Metabolon) by four different methods: reversed-phase UPLC–MS/MS with positive-mode electrospray ionization (early and late phase), reversed-phase UPLC–MS/MS with negative-mode electrospray ionization, and hydrophilic interaction LC/UPLC–MS/MS with negative-mode electrospray ionization. Overall, 1,135 metabolites were captured, including 925 with known identity and 210 with unknown identity. Relative metabolite levels were determined and normalized by analysis day. Metabolite levels were log2 transformed, and nondetectable levels (<LOD or metabolite not present in the sample) were imputed with a constant value below the minimum observed value of that metabolite (minimum/sqrt(2)).
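A minimal sketch of the constant-value imputation, combined with the missing indicator used in the regression models, is given below. It assumes that the minimum is taken on the raw scale before the log2 transformation (the order is not stated explicitly), and the function name and data are hypothetical.

```python
import math

def transform_metabolite(raw):
    """Return (log2 values, missing indicator) for one metabolite.

    raw: list of relative levels, with None for nondetectable values.
    Assumption: the fill value minimum/sqrt(2) is computed on the raw scale.
    """
    detected = [x for x in raw if x is not None]
    fill = min(detected) / math.sqrt(2)
    values = [math.log2(x if x is not None else fill) for x in raw]
    indicator = [int(x is None) for x in raw]  # 1 = imputed, 0 = measured
    return values, indicator

# Hypothetical relative levels; the second sample is nondetectable
raw = [8.0, None, 2.0, 4.0]
values, indicator = transform_metabolite(raw)
```

On the log2 scale, dividing by sqrt(2) places the imputed value half a unit below the lowest detected value.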

Samples that did not satisfy the quality control criteria were initially excluded; exclusion filters were applied separately for the proteomics and metabolomics analyses, and only samples passing quality control for both analyses were included in the analysis set. For the proteomics analysis, samples with more than 50% of panels failing for technical reasons were excluded (n excluded = 33). For the metabolomics analysis, samples were excluded because of low volume or detection of fewer metabolites than expected (n excluded = 4). Consequently, samples for 420 cases and 1,598 subcohort representatives remained for analysis.

Next, biomarkers with an extremely high proportion of nondetectable or below-LOD measurements were excluded, with the same exclusion filters for proteins and metabolites. Biomarkers had to be detected in all six cohorts with at least 30 detectable values across all cohorts (~1.5% of the MIMI samples) or were otherwise excluded. Consequently, 817 proteins (some duplicates) and 1,025 metabolites were retained for analysis.
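The retention rule can be expressed as a short check. This is a sketch only: the function name is hypothetical, the counts are invented for illustration, and "detected in all six cohorts" is interpreted as at least one detectable value per cohort.

```python
def keep_biomarker(detected_by_cohort, min_detected=30, n_cohorts=6):
    """Apply the retention rule to one biomarker.

    detected_by_cohort: dict mapping cohort name to the number of
    detectable values. Assumption: "detected in a cohort" means at
    least one detectable value in that cohort.
    """
    in_all_cohorts = (
        len(detected_by_cohort) == n_cohorts
        and all(n > 0 for n in detected_by_cohort.values())
    )
    return in_all_cohorts and sum(detected_by_cohort.values()) >= min_detected

# Invented counts, for illustration only
common = {"EpiHealth": 10, "HUNT": 12, "Lifelines": 9,
          "EPIC-CVD": 5, "Estonian Biobank": 4, "MFM": 3}
absent_in_one = dict(common, MFM=0)
too_rare = {name: 2 for name in common}

kept = [keep_biomarker(c) for c in (common, absent_in_one, too_rare)]
```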

Statistical analysis

All analyses were done using R (version 4.1.1)24 with the glmnet25, mice26, rms27, ranger28 and survival29 add-on packages.

One-by-one etiological analyses

In the discovery sample, the associations of all clinical variables (listed in Extended Data Table 1), proteins and metabolites with IMI were analyzed in separate weighted, stratified Cox proportional-hazards regression models adjusting for covariates, as described below. Inverse sampling probability weights (Borgan II) were applied to account for the case–cohort design in a stratified model, allowing for a different shape of the baseline hazard for each MIMI cohort (six levels) and using a robust variance estimator (Huber–White). Nonlinear relationships between continuous covariates (not including the biomarkers) and IMI were modeled using restricted cubic splines, and all factor variables were considered unordered.
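To illustrate the weighting, the sketch below computes inverse sampling probability weights in the form usually described for the Borgan II estimator: cases receive a weight of 1, and subcohort noncases receive the number of noncases in their stratum of the full cohort divided by the number of sampled noncases in that stratum. Whether the study's script matches this form exactly is an assumption, and the function name and numbers are hypothetical.

```python
from collections import Counter

def borgan_ii_weights(records, cohort_noncases):
    """Compute one weight per analyzed individual.

    records: list of dicts with keys 'stratum' and 'case' (bool).
    cohort_noncases: dict mapping stratum to the number of noncases in
    that stratum of the full cohort.
    Assumption: Borgan II weighting as commonly described (cases = 1,
    noncases = inverse of the stratum-specific sampling fraction).
    """
    sampled = Counter(r["stratum"] for r in records if not r["case"])
    return [
        1.0 if r["case"] else cohort_noncases[r["stratum"]] / sampled[r["stratum"]]
        for r in records
    ]

# Hypothetical stratum: 100 noncases in the full cohort, 2 of them sampled
records = [
    {"stratum": "center1_female_young", "case": True},
    {"stratum": "center1_female_young", "case": False},
    {"stratum": "center1_female_young", "case": False},
]
weights = borgan_ii_weights(records, {"center1_female_young": 100})
```

Summing the noncase weights within a stratum recovers the number of noncases in the full cohort stratum, which is what lets the weighted sample represent the full cohort.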

Associations with an FDR (Benjamini–Hochberg) of <0.05 were taken forward to the validation sample, in which directionally consistent results with P < 0.05 were considered replicated.
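The discovery and replication rule can be sketched as follows. The Benjamini–Hochberg adjustment is the standard step-up procedure; the function names and P values are hypothetical, and the 2 d.f. tests used for some metabolites are not represented here.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def replicated(disc_fdr, disc_log_hr, val_p, val_log_hr):
    """FDR < 0.05 in discovery, then P < 0.05 and same direction in validation."""
    same_direction = disc_log_hr * val_log_hr > 0
    return disc_fdr < 0.05 and val_p < 0.05 and same_direction

# Hypothetical discovery P values for four biomarkers
fdr = bh_adjust([0.001, 0.02, 0.03, 0.5])
ok = replicated(fdr[0], 0.30, 0.01, 0.25)
wrong_sign = replicated(fdr[0], 0.30, 0.01, -0.25)
```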

Cox proportional-hazards models adjusting for technical covariates (season, storage time and plate) were initially applied. Replicating biomarkers from the model adjusting for technical covariates were investigated in a model further adjusting for age and sex. A model allowing for an interaction between the biomarker and sex was further tested. Replicating biomarkers in the model adjusted for age and sex were then subjected to causal assumptions (Extended Data Fig. 1), and a bias-minimized model for each biomarker was investigated, estimating the total effects (including the effects of mediators).

Missingness and sensitivity analyses

Clinical variables with high missingness (previous smoking exposure, alcohol intake and physical activity) were not used in the analyses. Protein values below the LOD were included in the analyses; nondetectable metabolite levels were replaced with a constant value, and a missing indicator was added, as described below. The remaining missing values in the covariates were multiply imputed (n imputations = 20) using chained equations, with the outcome, clinical covariates and other variables correlated with the variable being imputed included in the imputation model30. Regression results across imputed datasets were combined using Rubin’s rules31.
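For readers unfamiliar with Rubin's rules, the pooling of a single coefficient can be sketched as below. The function name and inputs are hypothetical, only three imputations are shown for brevity, and the degrees-of-freedom calculation used for inference is omitted.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets.

    estimates: per-imputation coefficient estimates (e.g. log(HR)).
    variances: per-imputation squared standard errors.
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical log(HR) estimates from three imputed datasets
pooled, total = pool_rubin([0.2, 0.3, 0.4], [0.01, 0.01, 0.01])
```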

Interactions with sex were investigated by analyzing an interaction term for sex and each biomarker in models adjusting for technical covariates, age and sex. The interaction terms and all terms including the biomarker were tested using a multivariable chi-squared test with the same multiple-testing correction described above, requiring directionally consistent discovery and validation results.

The following secondary sensitivity analyses were included: random-effects inverse variance-weighted meta-analyses (DerSimonian–Laird) combining per-cohort results, leave-one-out analyses investigating the influence of single cohorts, complete-case analyses not imputing missing values in the clinical covariates, and analyses limiting the follow-up time to 3 months.
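The DerSimonian–Laird random-effects estimator used in the meta-analysis can be sketched with the standard moment-based formula. The function name and the per-cohort inputs are hypothetical and are not results from the study.

```python
def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, its standard error and tau^2.

    estimates: per-cohort log(HR) estimates.
    variances: per-cohort squared standard errors.
    """
    k = len(estimates)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q and the moment estimator of between-cohort variance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = (1 / sum(w_star)) ** 0.5
    return pooled, se, tau2

# Hypothetical homogeneous cohorts: tau^2 is truncated at zero
pooled_homog, se_homog, tau2_homog = dersimonian_laird(
    [0.3, 0.3, 0.3], [0.04, 0.04, 0.04])
# Hypothetical heterogeneous cohorts: tau^2 becomes positive
pooled_het, se_het, tau2_het = dersimonian_laird(
    [0.0, 0.5, 1.0], [0.01, 0.01, 0.01])
```

When the per-cohort estimates disagree more than their standard errors would suggest, tau^2 grows and the pooled standard error widens accordingly.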

Simultaneous modeling and development of a prediction model

We developed a prediction model for IMI using age, sex, anthropometric variables (height, weight and waist circumference), variables routinely collected in the laboratory (LDL cholesterol, HDL cholesterol, creatinine, glucose and triglycerides), systolic and diastolic blood pressure, smoking status (never, former or current) and education level. Regression coefficients were estimated using a weighted Cox ridge regression model, which shrinks coefficients toward zero using an L2 penalty to mitigate overfitting. The strength of the penalty (lambda) was determined using tenfold cross-validation over a grid of 250 lambda values, repeated 100 times. The lambda selection was repeated in each imputed dataset, and the coefficients associated with the lambda giving the lowest cross-validated deviance were extracted. The final coefficient set was obtained by taking the median of the coefficients from each imputed dataset. A single-imputed dataset was used for validation and calibration. The C-index, which indicates a model’s ability to rank the risks, was determined using 100 repeats of tenfold cross-validation. A calibration curve was constructed using 100 repeats of tenfold cross-validation32. All modeling steps were repeated in each fold to assess the calibration accuracy objectively. The model containing only clinical variables was then reduced by approximating the linear predictor from the full model through stepwise regression. Predictions from the full model were used as the outcome in a linear model wherein variables were dropped sequentially while the R2 of the approximation remained above 0.95. This yielded a highly parsimonious final model incorporating the main drivers of predictions. The prediction model was compared to SCORE2, a validated prediction model for the 10-year risk of cardiovascular disease developed using multiple European cohorts33. The 10-year survival probability and the covariate mean values used in the SCORE2 equations were replaced with the estimated 6-month survival probability and mean values from the current data to calculate the SCORE2-estimated 6-month cardiovascular disease risk34. Two additional external validations of the model were performed in the UK Biobank. First, all coefficients and covariate mean values in Supplementary Table 4 were used to validate the model. Second, the model was recalibrated using mean values and the estimated baseline risk from the UK Biobank cohort before validation.
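As an illustration of how coefficients, covariate means and a baseline survival probability combine into a predicted risk, and of why swapping in local means and baseline survival recalibrates a model, the sketch below uses the usual Cox form: risk = 1 − S0^exp(Σ β(x − mean)). That the published equation in Supplementary Table 4 has exactly this form is an assumption, and all names and numbers are hypothetical rather than taken from the study.

```python
import math

def predicted_risk(x, beta, means, baseline_survival):
    """Predicted risk over the model horizon from a Cox-type equation.

    x, beta, means: dicts keyed by variable name.
    baseline_survival: survival probability at the horizon for an
    individual with mean covariate values.
    Assumption: centered linear predictor, risk = 1 - S0 ** exp(lp).
    """
    lp = sum(b * (x[name] - means[name]) for name, b in beta.items())
    return 1 - baseline_survival ** math.exp(lp)

# Hypothetical coefficients and reference values (not from Supplementary Table 4)
beta = {"age": 0.05, "sbp": 0.01}
means = {"age": 55.0, "sbp": 130.0}
s0_6_months = 0.999

risk_at_mean = predicted_risk({"age": 55.0, "sbp": 130.0}, beta, means, s0_6_months)
risk_older = predicted_risk({"age": 65.0, "sbp": 130.0}, beta, means, s0_6_months)

# Recalibration to another setting: keep beta, replace means and baseline survival
risk_recalibrated = predicted_risk(
    {"age": 65.0, "sbp": 130.0}, beta, {"age": 60.0, "sbp": 135.0}, 0.9985)
```

Recalibration in this sense leaves the ranking of individuals, and hence the C-index, unchanged; it only shifts the absolute risk level toward the event rate of the new setting.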

To evaluate whether any biomarkers added to the clinical prediction model improve risk prediction, we used the linear predictor from the prediction model as an offset in a LASSO Cox regression model. Before model fitting, all proteins and metabolites were adjusted for technical variables. Briefly, each biomarker was used as the outcome variable in a regression model with all technical variables as covariates. The residuals from these models were used in place of the original biomarker values in the LASSO model. The LASSO model fitting was bootstrapped 250 times to investigate the stability of the variable selection.

As the biomarkers may have nonlinear associations with the outcome and interact with one another, and prior knowledge about nonlinearities and interactions among these variables is scarce, a random forest with 2,000 trees was fitted to the data as an exploratory analysis. Briefly, the random forest fits survival trees to bootstrap data samples using a random subset of the variables in each tree, handling interactions and nonlinearities naturally. A variable importance measure is associated with each variable and calculated based on the number of splits in which a variable is involved. The random forest was bootstrapped 250 times to obtain CIs for the variable importance measures.

Further analysis of relevant biomarkers

Proteins measured using the Olink cardiovascular II and cardiovascular III panels were available for testing against the CACS in 1,586 participants at the Malmö or Uppsala centers of SCAPIS35 who were free from cardiovascular disease (self-reported myocardial infarction, angina, coronary intervention, heart failure, atrial fibrillation, stroke and peripheral artery disease). A higher CACS reflects a higher myocardial infarction risk. The protein replicated in the primary MIMI analysis (BNP) was tested for an association with the CACS using an ordinal regression model adjusting for age, sex, body mass index, systolic blood pressure, creatinine, center, Olink plate, analysis date and season.

Consent

This study was approved by the Uppsala Ethics Authority (Dnr 2016/197). All Estonian Biobank participants signed a broad informed consent form. The study was carried out under ethical approval 258/M-21 from the research ethics committee of the University of Tartu and data release J08 from the Estonian Biobank. The Lifelines protocol was approved by the University Medical Center Groningen medical ethical committee under number 2007/152. The study was performed in accordance with the Declaration of Helsinki. The EpiHealth study was approved by the ethics committee of Uppsala University, and all participants provided informed written consent. The MFM was approved by the previous regional research committee in Lund, Sweden (2014/643), and all participants provided informed consent. Ethical review boards of the cohorts in EPIC-CVD approved the study protocol, and all participants provided written informed consent. Participation in the HUNT study was based on informed consent, and the Data Inspectorate and the Regional Ethics Committee for Medical Research in Norway approved the study.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.