Performance of a Metabolomic Biomarker Score Compared to Three Prognostic Scores in Chronic Heart Failure


The Cardiac Lipid Panel (CLP) is a novel panel of metabolomic biomarkers that has previously been shown to improve diagnostic and prognostic assessment in patients with chronic heart failure (CHF). Several prognostic scores have been developed for cardiovascular disease (CVD) risk, but their use is limited to specific populations and their precision is still inadequate. We compared a risk score using the CLP plus NT-proBNP to three commonly used risk scores: the Seattle Heart Failure Model (SHFM), the Framingham Risk Score (FRS), and the Meta-Analysis Global Group in Chronic Heart Failure (MAGGIC) score. We included 280 elderly CHF patients from the Cardiac Insufficiency Bisoprolol Study in Elderly (CIBIS-ELD) trial. Cox regression and hierarchical cluster analysis were performed. The integrated area under the curve (IAUC) was used as the criterion for comparison. The mean (SD) follow-up period was 81 (33) months, and 95 (34%) subjects met the primary endpoint. The IAUC was 0.53 for FRS, 0.61 for SHFM, 0.68 for MAGGIC, and 0.78 for CLP. Subjects were partitioned into three risk clusters (low, moderate, and high), with the CLP score showing the best ability to group patients into their respective risk cluster. A risk score composed of a novel panel of metabolite biomarkers plus NT-proBNP outperformed other common prognostic scores in predicting 10-year cardiovascular death in elderly ambulatory CHF patients. This approach could improve the clinical risk assessment of CHF patients.


Introduction
The prevalence of chronic heart failure (CHF) in the western world continues to increase, especially in patients older than 65 years [1]. CHF is a major burden on the health care system and is associated with high morbidity and mortality, including a poor quality of life [2]. An important aspect of CHF management is to ensure that clinicians and patients with CHF have the necessary knowledge and resources to make the best health decisions. A prognostic model is one such resource, defined as a formal combination of multiple predictors from which risks of a specific outcome can be calculated for individual patients.
Prognostic models are abundant in the literature, and the most popular ones include the SHFM (Seattle Heart Failure Model), FRS (Framingham Risk Score), and MAGGIC (Meta-analysis Global Group in Chronic Heart Failure). The SHFM score is the most thoroughly validated and contains the most predictor variables of the three [3]. The MAGGIC score [4] was developed from a dataset of over 39,000 patients across 30 studies and validated on more than 60,000 patients using 2 large CHF cohorts [5,6]. The FRS score was developed as a sex-specific risk score that can be conveniently used to calculate general cardiovascular disease (CVD) risk and risk of individual CVD events [7]. These models all use common clinical and demographic variables to predict the prognosis of CHF patients and have convenient online calculators. Although these scores have been validated, they have not been widely adopted, possibly because they are not routinely calculated in clinical practice [8][9][10], have poor reliability at the individual patient level [11], and suffer from a significant amount of missing data requiring imputation.
Metabolomics is a rapidly growing field in biomarker profiling that could help meet the need for more robust prognostic biomarkers. By applying nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS), it is now possible to analyze hundreds of metabolites from human samples such as blood, urine, saliva, and tissue, which can elucidate the outcome of complex networks of endogenous and exogenous biochemical reactions [12]. This approach could provide a more comprehensive signature of biochemical activities that could be associated with diet, medication, disease progression, and thus negative outcomes due to these complex mechanisms [13,14]. Previous studies have shown that metabolomic biomarkers can be used for risk prediction as well as diagnosis of CHF [15][16][17][18][19][20][21][22][23][24][25][26][27].
One promising metabolomic biomarker panel in CHF patients is the cardiac lipid panel (CLP), which is supplemented by N-terminal pro-B-type natriuretic peptide (NT-proBNP). The CLP consists of three specific metabolomic features: triacylglycerol (TAG) 18:1/18:0/18:0, phosphatidylcholine (PC) 16:0/18:2, and the sum of the 3 isobaric sphingomyelin (SM) species SM d18:1/23:1, SM d18:2/23:0, and SM d17:1/24:1. The diagnostic value of the CLP was first discovered in a study by Mueller and colleagues, who compared CHF patients to healthy controls and found that the CLP improved diagnostic performance over NT-proBNP alone [28]. The incremental prognostic value of the CLP was first assessed in a recent study, which found that it improved discrimination and risk assessment over NT-proBNP and clinical risk factors [29].
The objective of this study was to compare the performance of a risk score composed of the CLP panel plus NT-proBNP to the three commonly used traditional risk scores (SHFM, FRS, and MAGGIC) in predicting long-term cardiovascular mortality in ambulatory CHF patients. We hypothesized that the CLP risk score would improve our ability to classify risk of cardiovascular death in comparison to the three validated clinical risk prediction algorithms.
Table 1 shows the baseline characteristics of the total population (n = 280) as well as the variables included in each score. Mean age of this sub-cohort was 72.1 (4.9) years, 26.4% were women, 45% of patients had heart failure with reduced ejection fraction (HFrEF) (LVEF < 35%), and most patients were in NYHA functional class II (67.5%), with the remainder in NYHA class III. Hypertension was present in 80% of participants and 45% were current or former smokers; 29% had diabetes and 71% had CAD. During the follow-up period (mean = 81 months, SD = 33; median = 96 months), 95 (34%) patients met the primary outcome. The sample selection criteria and the comparison of this sub-cohort's baseline characteristics to the source cohort have previously been reported [29]; however, this study analyzed 10-year follow-up rather than the previously reported 4-year follow-up. All variables were available for each score except for the lymphocytes (%) variable in the SHFM score, which was imputed as previously described. The SHFM model had the highest number of variables (n = 17), followed by MAGGIC (n = 13), FRS (n = 7), and CLP (n = 4). There were 10 overlapping variables, each included in at least 2 scores. The SHFM score included the most medication (n = 6) and laboratory (n = 5) variables, while MAGGIC included the most clinical (n = 7) and demographic (n = 3) variables.
The three traditional scores were all significantly different (p < 0.001) from the CLP score according to Uno's difference in concordance statistic (Supplemental Table 2). Figure 2 shows the hierarchical cluster dendrogram, mapped to illustrate the assignment of patients to their respective clusters; the associated color map shows the range of each prognostic score and its distribution within each cluster. Hierarchical clustering grouped the patients into separate clusters, accounting for the noise between smaller clusters. Each observation was initially treated as its own cluster, and the method then repeatedly (1) identified the two most similar (closest) clusters and (2) merged them.

Results
Using this clustering technique, similar prognostic score data from participants were grouped together, such that the members of the same group were more similar to each other than to the members of the other groups. We can infer from the cluster centres and cluster memberships that the CLP risk score was better at grouping patients with respect to their cardiovascular mortality risk and associated clinical characteristics than the other three scores. The survival curves for each risk cluster are shown in Fig. 3. Rates of mortality were 23% in the low-risk cluster, 41% in the moderate-risk cluster, and 54% in the high-risk cluster. Supplemental Fig. 1 shows the constellation plot on a two-dimensional plane, with nodes and links describing the relationships among component nodes. This plot is an alternate depiction of the dendrogram and illustrates the length between clusters and a balanced structure. Supplemental Fig. 2 shows the scatterplot matrix of all 4 scores and clusters to illustrate the relationships between each prognostic score and risk cluster assignment. Table 3 shows the cohort characteristics and the prognostic score distribution for each risk cluster. The three clusters were: low risk, n = 148; moderate risk, n = 74; high risk, n = 58. Most characteristics differed across risk clusters, with 28 of the 50 cohort characteristics significantly different across the 3 clusters. In particular, patients in the highest risk cluster were older, had lower LVEF and higher NT-proBNP, and experienced a higher frequency of events. All prognostic scores were significantly different across their respective risk clusters. Of the continuous risk scores (FRS, SHFM, MAGGIC), only FRS had its highest mean score in the high-risk cluster. The categorical CLP score showed a skewed distribution of higher risk scores (3-4) in the moderate- and high-risk clusters. In the high-risk cluster, the majority of subjects had the highest possible CLP score of 4.
The CLP biomarkers TAG, PC, and SM were most strongly correlated with the clinical characteristics triglycerides (r = 0.531, p < 0.001), total cholesterol (r = 0.431, p < 0.001), and LDL (r = 0.502, p < 0.001), respectively (Supplemental Fig. 3).

Discussion
We found that a risk score based on a novel panel of three metabolite-based biomarkers plus NT-proBNP outperformed commonly used traditional prognostic models for predicting cardiovascular mortality in elderly ambulatory CHF patients. We first measured the association of each risk score with the outcome, followed by discrimination analysis, then cluster analysis to support the discrimination findings, and finally correlation analysis of the individual CLP biomarkers with the clinical characteristics. In our study cohort, the CLP score showed the best discrimination compared to the other 3 scores. This indicates that the biomarker information included in the CLP score could more precisely classify high-risk CHF patients than the information included in the 3 other risk scores. On the other hand, the biomarker information from the CLP is not as easily attainable and no convenient calculator exists yet, as these findings should first be validated in larger cohorts. Nevertheless, the other risk scores may be improved by adding common biomarkers to their score calculation. For instance, NT-proBNP is a well-established biomarker that is known to be associated with ventricular wall stress [43] and is considered the gold-standard biomarker in CHF diagnosis and prognosis [44]. None of the other prognostic scores included NT-proBNP in their risk score calculation.
We performed cluster analysis to assess how well the risk scores could partition subjects into different risk groups, blinded to the study outcome. A strength of this approach is that clusters could define relevant groups of patients and could mitigate the problems of multicollinearity while determining whether the predictive variables are useful in separating these groups. In our study, patients within each cluster varied considerably in age, comorbidities, laboratory parameters, and the prognostic scores. Several prior studies have used similar clustering methods to identify clinically relevant patient subgroups for CHF [41,42], but we are not aware of previous studies using clustering methods to compare a novel biomarker score to other conventional prognostic scores for CHF.
The combination of the CLP's metabolomic features with NT-proBNP into a risk score may help overcome the limitations of using only traditional clinical risk factors. Furthermore, application of a single biomarker such as NT-proBNP for outcome prediction is limited by insufficient specificity (low predictive value or high false positive rate) [45,46]. Recently, it was reported that the CLP added incremental prognostic value to NT-proBNP in predicting 4-year cardiovascular mortality [29]. We used the same method to calculate the CLP score for this study, as we were not interested in the CLP's incremental value over NT-proBNP but rather in the comparison of the CLP score to other established prognostic scores. A recent meta-analysis of 18 metabolomic prognostic biomarker studies for CVD found that those which incorporated a selection of metabolites into a score (n = 5 studies) had better prognostic performance than those using the individual biomarker values [15]. Another systematic review [19] reported that 6 studies [20][21][22][23][24][25] developed a metabolite-based score to predict CVD risk, with each score composed of between 4 and 16 biomarkers.
In addition to improving risk prediction, developing a biomarker-based risk score could also improve our understanding of the pathophysiology and biological mechanisms involved in CHF. The CLP metabolites can be grouped into three different lipid subclasses, sphingomyelin (SM), phosphatidylcholine (PC), and triacylglycerol (TAG), which have previously been found to be associated with cardiomyocyte stress/apoptosis [47], intestinal microbial metabolism/inflammation [18], and coronary artery disease [48], respectively. The findings in this study showed that the individual CLP metabolites were correlated with traditional lipid measures: the sphingomyelin metabolite was correlated with LDL, phosphatidylcholine with total cholesterol, and triacylglycerol with total triglycerides. Future studies are needed to establish whether the CLP biomarkers are characteristic of altered biological pathways or are representative of CHF compensation mechanisms.

Study Limitations
The homogeneity of this cohort, elderly patients with stable CHF, may have had an impact on the performance of the prognostic scores. The SHFM score may have been affected by the imputation of lymphocytes (%) as well as the lack of patients taking allopurinol. The FRS was originally developed for coronary artery disease and not CHF, which may explain its poor performance in this cohort. The CLP biomarker kit was developed for routine use in the clinic; however, it is still a research biomarker panel pending regulatory approval and must be sent to a lab equipped with MS technology. Our findings are limited to this population of elderly CHF patients, and future validation studies should include a more heterogeneous cohort, with younger patients, more women, and early/asymptomatic patients.
Other common biomarkers such as ST2, hs-CRP, and troponins should be compared to the CLP, as they are more readily available and do not require samples to be sent to a specialized lab. The CLP panel was originally developed as a diagnostic and early detection biomarker for HFrEF, and clinicians and researchers should be cautious when using it as a prognostic tool, as these are still preliminary findings.

Conclusion
In a cohort of ambulatory CHF patients, we have shown that the prognostic scores included in this study were useful in stratifying patients into risk clusters. Our findings demonstrate that the CLP risk score, comprising a panel of 3 novel metabolomic biomarkers and NT-proBNP, could improve the prediction of cardiovascular mortality over traditional prognostic scores. In the future, a broader array of biomarkers should be integrated into a more comprehensive risk score that may improve discrimination potential and risk stratification, and the CLP shows promise in this regard. The CLP score is a step in the direction of providing a more precise decision support tool to assist clinicians and patients in managing their CHF treatment.

Study Population
This study used a sub-cohort randomly selected from the Cardiac Insufficiency Bisoprolol Study in Elderly (CIBIS-ELD) trial, a multi-center, randomized, double-blind trial in patients aged ≥ 65 years being treated for CHF. The original study design and results of the CIBIS-ELD trial have been published previously [30,31]. Briefly, patients with CHF were randomized in a 1:1 fashion to receive one of two beta-blockers, either bisoprolol or carvedilol, up-titrated every fortnight for 12 weeks, and then followed up at 10 years. From this source cohort (n = 883), n = 589 had available blood samples. Patients were randomly selected and included in the analysis only if their samples passed quality control [32,33].

Biomarker Measurements
Targeted metabolite profiling of the serum samples which passed quality control was performed at a specialized metabolomics lab using a commercially available kit. The kit uses a protocol based on a 1-phase extraction of the blood samples followed by gas chromatography mass spectrometry (GC-MS) (Agilent 6890 GC coupled to an Agilent 5973 MS-System) and liquid chromatography tandem-mass spectrometry (LC-MS/MS) (Agilent 1100 HPLC-System coupled to an Applied Biosystems API4000 MS/MS-System) analysis as previously described [28]. The analytical protocol was designed for routine measurement in the clinical practice setting; however, it is currently only available in specialized labs equipped with MS technology. The samples were stored at −80 °C and transferred on dry ice prior to analysis. The three CLP metabolomic features and NT-proBNP measurements were generated at baseline, only for the previously mentioned samples (n = 280). NT-proBNP was measured using commercially available assays (Elecsys, Roche Diagnostics).

Calculating Prognostic Scores
Each prognostic score was calculated using the corresponding method proposed by the original authors (3)(4)(5). For calculating the SHFM score, % lymphocyte was missing, and the median (31%) of the normal range (20-40%) was imputed for all subjects. The CLP risk score was calculated as the count of biomarkers beyond their Youden index cut-off [35]. The Youden index was used to determine each biomarker's optimal cut-off from the Cox regression. There were 4 cut-off values, since four biomarkers are included in the score: the three CLP features and NT-proBNP. For each biomarker, a value of 1 was assigned if the measurement fell on the side of the cut-off associated with greater risk and 0 otherwise; the 4 values were then summed to generate the final score for each subject. The score ranged from 0 to 4, with higher scores indicating higher risk. The primary outcome, cardiovascular death, was defined as death by myocardial infarction, non-responding arrhythmia, asystole, chronic pump failure, or other cardiac cause, and was verified by a blinded committee of cardiologists.
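The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the biomarker keys, cut-off values, and risk directions are hypothetical placeholders, and the `youden_cutoff` helper works on a simple binary event indicator rather than the Cox-regression-based procedure referenced in the text.

```python
from typing import Dict, Sequence


def youden_cutoff(values: Sequence[float], events: Sequence[int]) -> float:
    """Return the threshold maximizing J = sensitivity + specificity - 1.

    Simplification (assumption): a value >= threshold is a positive call
    and the outcome is binary; time-to-event information is ignored.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best_j, best_cut = float("-inf"), values[0]
    for cut in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if e == 1 and v >= cut)
        tn = sum(1 for v, e in zip(values, events) if e == 0 and v < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut


def clp_score(sample: Dict[str, float],
              cutoffs: Dict[str, float],
              higher_is_riskier: Dict[str, bool]) -> int:
    """Count the biomarkers lying on the higher-risk side of their cut-off."""
    score = 0
    for name, cut in cutoffs.items():
        value = sample[name]
        at_risk = value > cut if higher_is_riskier[name] else value < cut
        score += int(at_risk)
    return score  # 0 to 4 when four biomarkers are supplied


# Hypothetical cut-offs and risk directions, for illustration only.
cutoffs = {"TAG": 1.0, "PC": 2.0, "SM": 3.0, "NT_proBNP": 500.0}
higher_is_riskier = {"TAG": True, "PC": False, "SM": False, "NT_proBNP": True}
patient = {"TAG": 1.4, "PC": 2.5, "SM": 2.1, "NT_proBNP": 900.0}

example_score = clp_score(patient, cutoffs, higher_is_riskier)
example_cut = youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

Keeping the risk direction as an explicit per-biomarker flag reflects the wording "in the direction of greater risk": a biomarker may indicate risk when it is low rather than high, so the comparison cannot be hard-coded as "above the cut-off".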

Statistical Analysis
Power and Sample Size
The sample size was adjusted for an anticipated event rate of 0.34. A Cox regression of the log hazard ratio on a covariate with a standard deviation of 1.5 based on a sample of 257 observations achieves 80% power at a 0.050 significance level to detect a regression coefficient equal to 0.2. Adjusting for an anticipated loss to follow-up rate of 10%, the final sample size would be 283.
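The figures above can be reproduced with a short calculation. The sketch below assumes the commonly used approximation for Cox regression with a continuous covariate (often attributed to Hsieh and Lavori), in which the required number of events is (z_{1-α/2} + z_{1-β})² / (σ² β²) and the sample size is that number divided by the event rate. The source does not state which formula or software was used, so this is an illustration rather than the authors' procedure; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def cox_sample_size(beta: float, sd: float, event_rate: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for a Cox model with one continuous covariate.

    beta       : regression coefficient (log hazard ratio) to detect
    sd         : standard deviation of the covariate
    event_rate : anticipated proportion of subjects with the event
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    events_needed = (z_alpha + z_beta) ** 2 / (sd ** 2 * beta ** 2)
    return math.ceil(events_needed / event_rate)


n = cox_sample_size(beta=0.2, sd=1.5, event_rate=0.34)

# Inflating by 10% for anticipated loss to follow-up (assumed here to be a
# simple multiplication by 1.10, which matches the reported total).
n_adjusted = math.ceil(n * 1.10)
```

Under these assumptions, `n` evaluates to 257 and `n_adjusted` to 283, in agreement with the values reported in the text.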

Discrimination Analysis
Categorical variables were expressed as number (%) and continuous variables were expressed as mean (SD). Univariate Cox regression was performed on each of the prognostic scores, and hazard ratios and 95% confidence intervals were calculated to assess their relationship with the outcome. For the survival models, the integrated area under the receiver operating characteristic curve (IAUC) and Harrell's c statistic [36] were calculated to assess the discrimination of each score in predicting the outcome. Hypothesis testing of the change in discrimination was performed by calculating the differences in Uno's concordance statistics [37]. The IAUC is computed as a weighted average of the AUC values at all the event times, with the weights being the jumps of the Kaplan-Meier estimate of the survivor function.

Cluster Analysis
Hierarchical cluster analysis was performed using Ward's minimum variance method to assess each prognostic score's ability to separate cases into risk groups. In this method, the distance between two clusters is the ANOVA sum of squares between the clusters summed over all variables. Only the 4 risk scores were used as input variables for the cluster analysis, to examine how well they classified patients into low, moderate, and high risk of cardiovascular mortality. Data were standardized (mean of 0 and SD of 1) before clustering. The clinical characteristics and scores were compared across risk clusters. Comparisons among continuous variables were performed using the Wilcoxon rank sum test; Pearson's chi-square test (or Fisher's exact test) and the Mantel-Haenszel chi-square test were used for categorical and ordinal data, respectively. Kaplan-Meier curves were used to compare the survival distribution across risk clusters. Survival time was calculated from baseline until cardiovascular death or censoring at 10-year follow-up.
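To make the clustering step concrete, the following is a minimal pure-Python sketch of agglomerative clustering with Ward's criterion on standardized scores. It is not the implementation used in the study (the analyses were run in SAS, JMP, and R); the score matrix is made-up toy data, the function names are hypothetical, and the naive pairwise search is only suitable for small inputs.

```python
from statistics import mean, stdev
from typing import List


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each column to mean 0 and (sample) SD 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def ward_distance(a: List[List[float]], b: List[List[float]]) -> float:
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca = [mean(c) for c in zip(*a)]
    cb = [mean(c) for c in zip(*b)]
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def ward_clusters(points: List[List[float]], k: int) -> List[List[int]]:
    """Start with one cluster per observation; merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = ward_distance([points[m] for m in clusters[i]],
                                  [points[m] for m in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data: rows are patients, columns are four hypothetical risk scores.
scores = [
    [1.0, 10.0, 5.0, 0.0],
    [1.2, 11.0, 5.5, 0.0],
    [5.0, 20.0, 15.0, 2.0],
    [5.2, 21.0, 15.5, 2.0],
    [9.0, 30.0, 25.0, 4.0],
    [9.1, 31.0, 26.0, 4.0],
]
clusters = ward_clusters(standardize(scores), k=3)
```

Standardizing first matters because Ward's criterion is based on squared Euclidean distances: without it, the score with the largest numeric range would dominate the cluster assignment.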

Correlation Analysis
To investigate potential relationships between the CLP biomarker values and common clinical parameters, Pearson's correlation coefficients were calculated and considered significant at the 0.01 level (2-tailed). All analyses were performed using SAS statistical software version 9.4 (SAS Institute, Inc., Cary, North Carolina), JMP Pro software version 14, and R software version 3.6.1 [38][39][40].