Fibrinogen and hemoglobin predict near future cardiovascular events in asymptomatic individuals

To identify circulating proteins predictive of acute cardiovascular disease events in the general population, we performed a proteomic screen in plasma from asymptomatic individuals. A “Discovery cohort” of 25 individuals who subsequently incurred a cardiovascular event within 3 years (median age = 70 years, 80% male) was matched to 25 controls remaining event-free for > 5 years (median age = 72 years, 80% male). Plasma proteins were assessed by data independent acquisition mass spectrometry (DIA-MS). Associations with cardiovascular events were tested using Cox regression, adjusted for the New Zealand Cardiovascular Risk Score. Concentrations of leading protein candidates were subsequently measured with ELISAs in a larger (n = 151) independent subset. In the Discovery cohort, 76 plasma proteins were robustly quantified by DIA-MS, with 8 independently associated with cardiovascular events. These included (HR = hazard ratio [95% confidence interval] above vs below median): fibrinogen alpha chain (HR = 1.84 [1.19–2.84]); alpha-2-HS-glycoprotein (also called fetuin A) (HR = 1.86 [1.19–2.93]); clusterin isoform 2 (HR = 1.59 [1.06–2.38]); fibrinogen beta chain (HR = 1.55 [1.04–2.30]); hemoglobin subunit beta (HR = 1.49 [1.04–2.15]); complement component C9 (HR = 1.62 [1.01–2.59]), fibronectin isoform 3 (HR = 0.60 [0.37–0.99]); and lipopolysaccharide-binding protein (HR = 1.58 [1.00–2.49]). The proteins for which DIA-MS and ELISA data were correlated, fibrinogen and hemoglobin, were analyzed in an Extended cohort, with broader inclusion criteria and longer time to events, in which these two proteins were not associated with incident cardiovascular events. We have identified eight candidate proteins that may independently predict cardiovascular events occurring within three years in asymptomatic, low-to-moderate risk individuals, although these appear not to predict events beyond three years.

Cardiovascular disease (CVD) is a leading cause of death and disability worldwide 1 . In 2015 the prevalence of CVD and number of CVD-related deaths was estimated at 423 million and 18 million respectively, with ischemic heart disease (IHD) and stroke contributing the most to loss of age-standardized disability-adjusted life years 1 .
Determining an individual's risk for developing CVD is commonly based on a set of key variables, including age, gender, ethnicity, blood pressure, diabetes mellitus, smoking and lipid status. These established risk factors for CVD have been incorporated into risk prediction models such as the Framingham risk score (FRS), which is used to predict CVD incidence within 10 years 2,3 . Subsequent strategies to improve CVD risk prediction, such as the updated 5-year New Zealand (NZ) primary prevention equations, PREDICT-1°, have incorporated additional variables including the NZ Deprivation score (an area-based measure of socioeconomic deprivation), atrial fibrillation confirmed by electrocardiograph (ECG) and use of blood-pressure-lowering, lipid-lowering, and antithrombotic drugs in the 6 months before the index assessment 4 . The QRISK3 risk prediction model implemented in the UK takes into account 22 risk variables to estimate 10-year CVD risk, with the additional factors being chronic kidney disease, variability of systolic blood pressure, migraine, treatment with corticosteroids, systemic lupus erythematosus, atypical antipsychotic medication, severe mental illness, HIV or AIDS, and erectile dysfunction 5 . While risk prediction models such as these work well on the population level, they suffer from poor discrimination at the level of the individual. It remains difficult to predict which individuals within any broad risk stratum will subsequently experience a CVD event 6 . CVD events still occur frequently in people predicted to be at low-to-moderate risk 6 . Addition of endogenous biomarkers to CVD risk scores may refine risk stratification for the individual. The only circulating biomarkers routinely incorporated in current CVD risk prediction models are lipids i.e. total cholesterol (TC), high-density lipoprotein (HDL) or their ratio (TC/HDL) and calculated low-density lipoprotein (LDL). Additional biomarker molecules used to diagnose comorbidities in conjunction with CVD risk estimation include glycated hemoglobin (HbA1C) for diabetic status and creatinine for renal status 7,8 . Plasma concentrations of B-type natriuretic peptide (BNP), its amino-terminal pro-hormone congener (NT-proBNP) and the cardiac troponins T and I (TNT, TNI) are key biomarkers in the clinical diagnosis of heart failure (HF) and myocardial infarction (MI), respectively 9-12 . Both markers have also been shown to provide powerful information about CVD events in asymptomatic community populations, with modest elevations in these markers being independently predictive of atherosclerotic CVD, HF, fatal coronary events and total mortality [13][14][15] .
However, despite some improvement in risk prediction algorithms over the past decades, traditional risk factor profiling fails to identify many individuals at impending risk of an acute CVD event, highlighting the need for new strategies for risk prediction in the general population. We sought to identify novel circulating protein markers associated with incident CVD events in asymptomatic individuals by performing an unbiased proteomics screen using data independent acquisition mass spectrometry (DIA-MS), followed up in an independent cohort using enzyme-linked immunosorbent assays (ELISA).

Methods
Study participant recruitment and sample collection. Plasma samples for this case--control study (n = 50 for the Discovery cohort and n = 151 for the Extended cohort) were selected from the Canterbury Healthy Volunteers cohort (HVols, n = 3,358). HVols participants were randomly selected from the Canterbury (New Patient selection for DIA-MS discovery study. For the initial Discovery arm of the study using data independent acquisition mass spectrometry (DIA-MS), a subset of HVols aged < 80 years, BMI < 30 kg/m 2 , systolic BP < 150 mmHg, without diabetes and non-smokers, who subsequently experienced an acute CVD event within 3 years were identified (n = 25 cases, Fig. 1  www.nature.com/scientificreports/ elevation myocardial infarction, ST-elevation myocardial infarction, ischemic stroke, transient ischemic attack, unstable angina, other angina or death due to coronary heart disease. Matched controls (n = 25) meeting the same criteria but without incurring any CVD events for at least 5 years from recruitment were selected using the R package MatchIt 16 with "nearest" matching for age, BMI, and systolic blood pressure, and exact matching for gender.
Patient selection for ELISA extended cohort study. For the Extended arm of the study, a separate subset of healthy volunteers aged < 80 years, BMI < 35 kg/m 2 , systolic BP < 160 mmHg, without diabetes and nonsmokers, who experienced an acute CVD event within 5 years were selected (n = 76 cases, Fig. 1). Criteria for the Extended arm had to be relaxed compared with the Discovery Cohort, particularly the time to first event, due to limited sample numbers. Matched controls (n = 75) were selected as described above.

DIA-MS.
To mitigate batch effects a randomized blocked experiment design was used with equal numbers of cases and controls processed at the same time. Plasma was thawed on ice and centrifuged to remove particulate matter. The detailed sample preparation and data analysis methods are outlined in supplementary information.
In brief, plasma (2 µL) was denatured (Supplementary Table 1 Immunoassays. The concentrations of three leading candidate plasma proteins were measured using commercial ELISA kits, including fetuin A (DFTA00 Human Fetuin A Quantikine ELISA Kit, R&D Systems, Minneapolis, MN, USA), hemoglobin (AB157707 Human Hemoglobin ELISA Kit, Abcam, Cambridge, MA, USA) and fibrinogen (AB208036 Human Fibrinogen SimpleStep ELISA Kit, Abcam), with each assay performed according to manufacturer's instructions. All samples were run in duplicate, using a randomized, blocked experiment design to minimize bias due to inter-plate variability. Protein concentration was calculated from a calibration curve using StatLIA (Brendan Technologies, Inc, Carlsbad, CA, USA) 19 . Samples were re-analyzed if the coefficient of variation between duplicates was ≥ 20%.
Ethics approval and consent to participate. The study conformed to the Declaration of Helsinki and was approved by the New Zealand Health and Disability Ethics Committee (Reference CTY/01/05/062). All participants gave written, informed consent.

Results
A flow chart describing participant selection criteria is shown in Fig. 1. The baseline characteristics of participants in the Discovery and Extended arms of the study are summarized in Table 1. There were no significant differences in baseline characteristics or cardiovascular risk factors between cases and controls in either the Discovery or the Extended sub-study, with the exception of hsTNI, which was higher in cases compared with controls in the Extended cohort (p = 0.015).
Plasma mass spectrometric analysis. In the Discovery cohort, the 5-year risk PREDICT-1° score did not correspond to time-to-CVD-event (Cox model HR = 1.11 [0.79-1.56], p = 0.53), reflecting that the score alone was not able to distinguish those with subsequent CVD events from those who remained event free in this low-moderate risk sample. In the Discovery cohort, 76 proteins were robustly quantified in human plasma by DIA mass spectrometry. Of these, eight proteins/subunits were associated with CVD events independent of the PREDICT-1° score (Supplementary Table 2 Fig. 2). On average, protein levels differed 1.2-fold (range 1.1to 1.3-fold) between cases who had an event compared with controls who remained event free (Supplementary Table 2). Fibrinogen alpha chain, fetuin A, clusterin isoform 2 and hemoglobin were also independent of BNP, NT-proBNP, and hsTNI in Cox models (Supplementary Tables 3 & 4). Fetuin A and clusterin isoform 2 were highly correlated to one another ( Supplementary Fig. 1) leading to the final choice of candidates, fibrinogen, fetuin A and hemoglobin, for further analysis via ELISA assay.
ELISA versus DIA-MS assessment. The concentration of fibrinogen, hemoglobin and fetuin A in the same 50 samples from the Discovery arm were also measured using commercially-available ELISAs (Fig. 3). Limits of detection for DIA-MS were applied on each of the detected 9576 ions using IQR guided lower-end cutoffs (see Supplementary Methods). Our stringent quality control also included CV filtering of ions resulting in 1288 high-confidence ions for relative quantification with > 97% of these ions being detectable in each of the 50 people of the Discovery cohort. The coefficient of variation between duplicates measured by ELISA was ≤ 20% with all samples diluted appropriately to lie within the range of the standard curves. DIA mass spectrometry and www.nature.com/scientificreports/ ELISA methods were strongly correlated for hemoglobin (Spearman correlation r = 0.76 and 0.70, p < 0.05 for the hemoglobin alpha chain and the hemoglobin beta chain, respectively), and moderately correlated for fibrinogen (Pearson correlation r = 0.45, r = 0.45, r = 0.50, p < 0.05 for fibrinogen chains alpha, beta and gamma). However, no correlation was observed between DIA-MS and ELISA quantitation for fetuin A (Pearson correlation r = 0.15, p = 0.30). Therefore, we took both hemoglobin and fibrinogen (but not fetuin A) as candidates to validate our findings in the larger Extended cohort using ELISA assays.
Extended cohort. There were statistically significant differences between the risk factor profiles of the

Discussion
Biomarkers to improve prediction. This study provides the first proof of principle that a DIA-MS proteomic approach can be used to identify additional circulating plasma biomarkers to improve current population-based screening approaches to predict future CVD events in asymptomatic individuals. The absence of predictive value of the PREDICT-1° score on its own in this low-to-moderate risk population sample demonstrates the need for improved models to estimate CVD risk. The three top ranked candidate plasma biomarkers from our DIA-MS Discovery screen (hemoglobin, fibrinogen and fetuin A) were independently associated with nearfuture CVD events (≤ 3 years) in asymptomatic individuals, independent of the PREDICT-1° score. However, in the Extended cohort, where the median time-to-event was considerably longer and individuals with a broader range of risk factors were included, neither hemoglobin nor fibrinogen were independently predictive of future CVD events in Cox-modelling. It is conceivable that these two proteins are associated with more immediate risk as the time to first event was ~ threefold shorter in the Discovery cohort (median of 244 days) compared with 791 days in the Extended Cohort, despite the Discovery cohort being characterised by a generally lower burden of other known risk variables than the Extended cohort (Table 1). Further, it is possible that the moderate correlation between detection techniques, DIA-MS and ELISA, contributes to the fact that the relationship between protein and CVD outcome observed in the Discovery cohort was not demonstrated in the Extended cohort. In general, a high correlation between ELISA and mass spectrometric results can be expected if the peptides used for quantification in DIA-MS and the detection of epitopes by ELISA antibodies allow equivalent detection of the analyte by the respective technology. However, there are many examples of mass spectrometry and ELISA not correlating strongly, oftentimes due to post-translational modification or due to cross-reactivity issues in ELISA [20][21][22] . Our findings indicate that certain circulating markers may improve risk prediction of impending CVD events in low-to-moderate CVD risk individuals. Risk prediction models such as the NZ PREDICT-1° score, and the UK QRISK3 work best on the population level but suffer from poor discrimination at the level of the individual. Up to 20% of patients diagnosed with CVD have no traditional risk factors, and up to 40% have only one, suggesting that the accuracy of risk prediction models for individuals is modest and that CVD commonly occurs in people predicted to be at low-to-moderate risk 6 . Biomarkers may add value for individual risk stratification, especially for those in the low-to-moderate risk "grey zone, " where clinicians often do not have appropriate tools available to guide their decision making, and in those of older age (i.e. > 75 years). In our study of older individuals with a relatively low burden of cardiovascular risk factors, the NZ PREDICT-1° risk score was unable to discriminate between those who subsequently incurred a CVD event and those that remained event-free. Our findings highlight the feasibility of an unbiased DIA-MS approach for discovery of additional circulating plasma biomarkers that may add value to established risk factors and current approaches.
In the Discovery cohort we identified eight proteins (i.e. fibrinogen alpha chain, fetuin A, clusterin isoform 2, fibrinogen beta chain, hemoglobin subunit beta, fibronectin isoform 3, complement component C9, lipopolysaccharide-binding protein) that have potential to improve risk prediction of a near-future CVD event (within 3 years). Of these, fibrinogen is probably the best characterized in terms of its relation to CVD risk prediction. It has been previously proposed that circulating fibrinogen concentration be included in CVD risk stratification 23,24 . A previous study reported the age-and sex-adjusted hazard ratio per 1 g/L increase in usual fibrinogen concentration was 2.42 (95% CI, 2.24-2.60) for coronary heart disease; 2.06 (95% CI, 1.83-2.33) for stroke; 2.76 (95% CI, 2.28-3.35) for other vascular mortality and 2.03 (95% CI, 1.90-2.18) for nonvascular mortality 25 . Our data extend these findings by suggesting that fibrinogen remains independently predictive of outcomes even when adjusted for multiple other cardiovascular risk factors.
Hemoglobin circulating freely in the plasma and not contained within the erythrocyte has been implicated in pathological conditions, and has been proposed to contribute to abnormal production of highly reductive and oxidative compounds via the four heme ligands incorporated into each hemoglobin protein 26 . While not specifically associated with CVD risk, increased concentrations of cell-free hemoglobin have been linked to Figure 2. Cox Hazard Ratios of z-score standardized protein abundance in the Discovery cohort, adjusted for the log 2 -transformed PREDICT-1° 5-year CVD risk score. The whiskers represent 95% confidence intervals. The dashed vertical line corresponds to a Hazard Ratio of 1 (i.e. no difference of CVD risk based on this protein). The dashed horizontal line divides those proteins with p < 0.05 from those proteins with p 0.05. Z-score scaling of protein biomarker data was used instead of using raw protein mass spectrometric intensity to allow easy comparison between proteins. A one unit increase in z-score scaled data represents a one standard deviation increase in protein concentration. Protein IDs are the mnemonic of the uniprot ID (Supplementary Table 5). Figure generated using the 'survival' (https ://githu b.com/thern eau/survi val) and 'ggplot2′ (https ://githu b.com/ tidyv erse/ggplo t2) packages within R/RStudio 17,18,37,38 . www.nature.com/scientificreports/ micro ruptures in the fragile vasculature of unstable atherosclerotic plaques 27 . In this context, hemoglobin has been reported to function as a "locally active disease modifier" and "intrinsic alarm molecule" (i.e. a biomarker) indicating bleeding and tissue destruction 28 .
The multifaceted biological role of Fetuin A has been recently reviewed and has been variably reported to be inversely correlated with coronary artery disease and atherosclerotic burden (presumed due to protection against vascular calcification) and positively correlated with coronary artery disease, possibly via its role in diabetes mellitus through the inhibition of insulin receptor tyrosine kinase 29 . Proteomics and biomarker discovery approaches. We have used a robust and unbiased analysis protocol for measuring plasma proteins in human plasma. Applying DIA mass spectrometry to biomarker discovery allows the identification and quantifation of > 300 proteins 30 . All of these proteins tend to be in the high-to medium-abundant concentration range in plasma, and so are readily detectable by a number of clinical assay www.nature.com/scientificreports/ platforms. However, depletion or further fractionation steps may be required to detect low-abundance proteins using DIA 31 . Plasma is the ideal, accessible sample reservoir for discovery of potential disease biomarkers, as it captures a considerable proportion of the proteome (~ 2,000 out of ~ 20,000 known proteins detectable using mass spectrometry, not considering isoforms) 32 . Furthermore, DIA approaches interrogating phosphorylation and glycosylation patterns, important for cell-signalling and protein function, are also being elucidated and likely to contribute further in the identification of novel biomarkers 33,34 . Continuous improvements in mass spectrometry technology, analysis pipelines and sample throughput are likely to contribute to future precision medicine biomarker discovery 35 .
The selectivity/specificity of mass spectrometry is superior to that of many ELISAs and also facilitates interrogation of post-translational modifications, which carry additional biological information that may be related to pathophysiology 36 . However, sensitivity of MS methods is still lacking compared with ELISA, especially for complex mixtures such as plasma where a small fraction of proteins dominates the contribution to the overall protein concentration. Additionally, ELISA is easily scalable with high-throughput easily implemented. DIA-MS methods are especially useful for tissue and cell-line proteomics, where the dynamic range of protein concentrations is smaller compared to that of plasma.
Limitations. Our study has several limitations. First, by design we selected individuals with a relatively low burden of cardiovascular risk considering their older age, which limited the sample size of our Discovery and Extended cohorts. Second, our Discovery and Extended cohorts were not well-matched for time to first event (3 years vs. 5 years) and cardiovascular risk, reducing our ability to replicate our findings. Third, the heterogeneity of outcomes and the different proportions of each outcome in the Discovery and the Extended cohorts may reduce power to detect biomarker associations with specific outcomes. Fourth, despite the good correlation in the Discovery phase, the two techniques (DIA-MS and ELISA) have inherent differences affecting their specificity in terms of which part of the protein is detected. Fifth, the presence of abundant plasma proteins in our samples may have prevented detection of less abundant proteins with similar or stronger associations with CVD events. The MS-detectable plasma proteome is approximately 2000 proteins 32 , of which 76 were robustly quantified across the 50 DIA samples in this study. This is approximately 28% of the number of proteins observed in other studies (272 robustly quantified proteins) using MS1-level BoxCar quantification. This reflects differences in mass spectrometric data acquisition methods and quality filtering for inclusion/exclusion of peptides into the final data-set 32 .

Conclusions
DIA-MS methods have great potential in biomarker discovery/hypothesis generation due to measuring many proteins simultaneously. Using DIA-MS we have identified eight protein biomarkers that may be independently associated with near-future CVD events (< 3 years) in asymptomatic individuals of older age with a relatively low cardiovascular risk.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.