A metabolic profile of all-cause mortality risk identified in an observational study of 44,168 individuals

Predicting longer-term mortality risk requires collection of clinical data, which is often cumbersome. Therefore, we use a well-standardized metabolomics platform to identify metabolic predictors of long-term mortality in the circulation of 44,168 individuals (age at baseline 18–109), of whom 5512 died during follow-up. We apply a stepwise (forward-backward) procedure based on meta-analysis results and identify 14 circulating biomarkers independently associating with all-cause mortality. Overall, these associations are similar in men and women and across different age strata. We subsequently show that the prediction accuracy of 5- and 10-year mortality based on a model containing the identified biomarkers and sex (C-statistic = 0.837 and 0.830, respectively) is better than that of a model containing conventional risk factors for mortality (C-statistic = 0.772 and 0.790, respectively). The use of the identified metabolic profile as a predictor of mortality or surrogate endpoint in clinical studies needs further investigation.

Robust predictors of intermediate- and long-term mortality may be valuable instruments in clinical trials and medical decision-making. Predicting mortality in the final year of the life of a patient is generally feasible because of the abundance of available clinical data 1 . There is no consensus on the ultimate set of predictors of longer-term (5-10 years) mortality risk, since the predictive power of the currently used risk factors is limited 2 , especially at higher ages. However, it is especially this age group and follow-up time window for which a robust tool would aid clinicians in assessing whether treatment is still sensible. Some of the currently used risk factors for mortality, such as systolic blood pressure and total cholesterol, show opposite associations with mortality in the elderly (i.e., above 85 years) as compared to middle age 3,4 . This could be due to mortality crossover of these risk factors or metabolic shifts that are difficult to predict in individuals 5,6 , thus making them less suitable for accurate prediction of mortality in older individuals. Given the multimorbidity among older people, predictors of intermediate- and long-term mortality should ideally represent generic immune-metabolic health adversity rather than only being indicators of specific pathology. The number of molecular scores that are able to predict mortality across all ages is currently limited.
Fischer and colleagues used a high-throughput and well-standardized nuclear magnetic resonance (NMR) platform and identified four metabolic biomarkers, i.e., albumin, glycoprotein acetyls (GlycA), mean diameter for very low-density lipoprotein (VLDL) particles and citrate that are independently associated with all-cause and cause-specific (cardiovascular disease (CVD) and cancer) mortality [7][8][9] . The same metabolomics platform had also been utilized to predict CVD and type 2 diabetes [10][11][12] . Although the initial sample size of the study by Fischer and colleagues was large, the statistical power of the study was limited due to the relatively small number of observed deaths (n = 684) and underrepresentation of older individuals.
The current metabolomics study is the largest thus far, and includes 44,168 individuals (from 12 cohorts), spanning a wide age range. We first determine which metabolic biomarkers independently associate with prospective mortality in all individuals. Subsequently, we test the association of the biomarkers with mortality in different age strata. In the FINRISK 1997 cohort, consisting of 7603 individuals of whom 1213 died during follow-up, we compare the predictive value of a score based on the identified mortality-associated biomarkers with a score based on conventional risk factors for mortality.

Results
Association of metabolic biomarkers with all-cause mortality. The primary survival meta-analysis of the 226 metabolic biomarkers for all-cause mortality was performed in 44,168 individuals from 12 cohorts, of whom 5512 died during follow-up (mean follow-up time per study ranging from 2.76 to 16.70 years) (Table 1). As depicted in Supplementary Data 1, 136 of the biomarkers showed a significant association with all-cause mortality after adjustment for multiple testing. When we subsequently adjusted for the 4 previously identified metabolic biomarkers 9 , i.e., albumin, GlycA, VLDL particle size and citrate, the number of significant biomarkers increased to 159 (including the 4 previously identified biomarkers) (Supplementary Data 1). As the majority of the associated biomarkers are highly correlated, we tried to identify all independent biomarkers that were significantly associated with mortality. For this, we used a stepwise (forward-backward) procedure. To decrease the chance of overfitting, we only included a subset of 63 biomarkers in this step (Supplementary Data 2). Since the associations of the biomarkers with mortality in the primary survival analysis were similar in men and women (Supplementary Data 3), we performed the secondary analyses in men and women combined to increase power. After the stepwise procedure, 14 biomarkers were shown to be independently associated with mortality. For the total lipids in chylomicrons and extremely large VLDL and small high-density lipoprotein (HDL), the mean diameter for VLDL particles, the ratio of polyunsaturated fatty acids to total fatty acids, and the concentrations of histidine, leucine, valine, and albumin, a higher level is associated with decreased mortality, while for the concentrations of glucose, lactate, isoleucine, phenylalanine, acetoacetate, and GlycA the opposite applies (Table 2 and Supplementary Data 1).
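The stepwise (forward-backward) selection mentioned above can be illustrated with a minimal Python sketch. This is not the authors' script: the callback `p_values_given`, the marker names `m1` to `m4`, and all toy P values are hypothetical and serve only to show the control flow. In the study, each round would correspond to a new set of per-cohort survival models followed by a meta-analysis; here that step is replaced by a lookup. The threshold is the Bonferroni-adjusted value quoted in the Methods (0.05/226).

```python
from typing import Callable, Dict, List, Sequence

# Bonferroni threshold quoted in the Methods: 0.05 divided by 226 tested biomarkers.
BONFERRONI_THRESHOLD = 0.05 / 226


def stepwise_select(
    candidates: Sequence[str],
    p_values_given: Callable[[List[str]], Dict[str, float]],
    threshold: float = BONFERRONI_THRESHOLD,
    max_rounds: int = 1000,
) -> List[str]:
    """Forward-backward selection driven by P values.

    p_values_given(selected) is a hypothetical callback. For a selected
    biomarker it returns its P value in the model containing all selected
    biomarkers; for an unselected biomarker it returns the P value obtained
    when that biomarker is added to the model.
    """
    selected: List[str] = []
    for _ in range(max_rounds):  # guard against add/remove cycles
        p_now = p_values_given(selected)
        unselected = {b: p_now[b] for b in candidates if b not in selected}
        if not unselected:
            break
        # Forward step: add the unselected biomarker with the lowest P value.
        best = min(unselected, key=unselected.get)
        if unselected[best] >= threshold:
            break  # every unselected biomarker is above the threshold
        selected.append(best)
        # Backward step: drop biomarkers whose P value rose above the threshold.
        p_after = p_values_given(selected)
        selected = [b for b in selected if p_after[b] < threshold]
    return selected


def toy_p_values(selected: List[str]) -> Dict[str, float]:
    """Made-up P values: m3 is attenuated once m1 and m2 are both in the model."""
    p = {"m1": 1e-30, "m2": 1e-20, "m3": 1e-6, "m4": 0.4}
    if "m1" in selected and "m2" in selected:
        p["m3"] = 0.01
    return p


chosen = stepwise_select(["m1", "m2", "m3", "m4"], toy_p_values)
print(chosen)
```

In this toy setup `m3` is significant on its own but is no longer selected once `m1` and `m2` are in the model, which is the kind of attenuation that a stepwise procedure is meant to detect among correlated biomarkers.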
Of note, of the 4 previously identified mortality-associated biomarkers, only citrate was not selected in the fully adjusted model with 14 biomarkers due to its limited additional contribution. An increase of one unit in the metabolic biomarker score based on the 14 identified biomarkers, which ranges between −2 and 3 in most cohorts (see Supplementary Fig. 1, for examples), is associated with a 2.73 times higher mortality risk (HR = 2.73, 95% CI: 2.60-2.86, P < 1.00 × 10^−132). The forest plots for each of these 14 biomarkers, based on the fully adjusted model, and the biomarker score are depicted in Supplementary Figs. 2-16.
Association of biomarkers with disease-specific mortality. To determine whether the identified biomarkers are indicators of disease-specific mortality risk, we also explored the associations of the biomarkers with cardiovascular, cancer, and infection-related mortality in the FINRISK 1997 cohort. As indicated in Table 3, the majority of the biomarkers associated with multiple mortality outcomes, including nonlocalized infections, in the same direction as observed for all-cause mortality, and thus represent general markers of health and disease. Some biomarkers, such as glucose, seem to be risk factors for a specific mortality outcome, in this case cardiovascular-related mortality.
Association of metabolic biomarkers across the lifespan. To investigate the association across the lifespan for the mortality-associated biomarkers identified in this study, we performed age-stratified mortality analyses. All 14 biomarkers that were part of the fully adjusted model showed consistent associations with mortality across all strata (Supplementary Data 4) and the same was true for the metabolic biomarker score (Supplementary Fig. 17).
Mortality risk prediction accuracy of identified biomarkers. To determine the mortality risk prediction accuracy of the 14 identified biomarkers, we generated weighted risk scores based on conventional risk factors and on our identified biomarkers plus sex. The weights of the risk scores were estimated in the Estonian Biobank cohort and the FINRISK 1997 cohort was used as validation cohort to compare the mortality risk prediction accuracy of the models. Instead of looking at the added value of the individual biomarkers, we directly compared the two models to determine if this single-point NMR measurement could by itself be used as a standard for risk assessment of mortality. Removal of FINRISK 1997 from the discovery analysis resulted in similar effect estimates as those reported in Table 2, indicating that it is unlikely that the risk prediction analyses are influenced by overfitting. Given the restricted follow-up time in the elderly cohorts and the need for mortality risk indicators in the clinic at higher ages, we investigated both 5- and 10-year mortality in all individuals as well as only in those above 60 years of age. As depicted in Fig. 1 and Table 4, the C-statistic was 0.065 (P = 5.48 × 10^−4) or 0.040 (P = 2.48 × 10^−5) larger when comparing the model with the 14 biomarkers (C-statistic = 0.837 and 0.830) to the model with conventional risk factors (C-statistic = 0.772 and 0.790) when looking at 5- or 10-year mortality, respectively. The difference in the C-statistic was even larger when only individuals above 60 years of age were included (Table 4). Reclassification analyses showed higher integrated discrimination improvement (IDI) (e.g., +8.6% (P = 1.83 × 10^−12) for 10-year mortality) when comparing the model with the biomarkers to the model with conventional risk factors (Table 4).
When compared to a model with the 4 previously identified mortality-associated biomarkers, the model with the 14 biomarkers also showed higher C-statistics and IDIs for both 5- and 10-year mortality (Supplementary Table 1). Since the conventional risk factors were only partially correlated with the 14 biomarkers (Supplementary Fig. 18), we also compared a model with the biomarkers to a model combining the conventional risk factors and biomarkers to [...] Table 2).
HR hazard ratio, CI confidence interval, P P value, I^2 heterogeneity statistic, het heterogeneity, VLDL very low-density lipoprotein particle, HDL high-density lipoprotein. The statistics in this Table have been generated with the R-package meta using the survival analyses results from the individual cohorts as input.
Reproducibility and validation of metabolic biomarkers. The reproducibility of the quantification of the 14 identified biomarkers, which was determined using previously generated in-house NMR data from the Leiden Longevity Study (LLS) offspring + partners and nonagenarians 13,14 , was very good (all r > 0.8, Supplementary Fig. 19). Together with the previously published validation of some additional identified biomarkers with other techniques (i.e., the ratio of polyunsaturated fatty acids to total fatty acids and the concentrations of albumin and GlycA) 10,15,16 , and the provided data on the consistency of the identified mortality-related small molecules (i.e., the concentrations of glucose, lactate, histidine, isoleucine, leucine, valine, and phenylalanine) as measured with other widely used metabolomics platforms (i.e., Metabolon and Biocrates, Supplementary Table 3), this shows high analytical consistency with other biomarker assays, providing confidence that our findings should be reproducible when the metabolic biomarkers are measured using other metabolomics platforms or techniques.

Discussion
By performing high-throughput metabolic biomarker profiling in 44,168 individuals from 12 cohorts, we identified a set of 14 biomarkers independently associating with all-cause mortality. The associations of these biomarkers were consistent in men and women and across age strata. The identified biomarkers represent general health up to the highest ages rather than specific disease-related death causes. In combination, these biomarkers clearly improve risk prediction of 5- and 10-year mortality as compared to conventional risk factors across all ages. These results suggest that metabolic biomarker profiling could potentially be used to guide patient care, if further validated in relevant clinical settings.
Our results show that the use of an affordable, well-standardized, and high-throughput NMR platform measuring multiple biomarkers leads to a high mortality risk prediction accuracy. We observed similar effects of the biomarkers on mortality in the cohorts using either EDTA plasma (Alpha Omega Cohort, ERF study, FINRISK 1997 cohort, DILGOM study, LLS nonagenarians, LLS offspring + partners, PROSPER, and Rotterdam Study) or serum (ALSPAC, EGCUT, KORA F4, and TwinsUK). In addition, the associations of the identified biomarkers with mortality are independent of the sex, age and cause of death of the individuals, and are thus unaffected by mortality crossover. Hence, in comparison to conventional risk factors, such as systolic blood pressure and total cholesterol, these biomarkers seem much more suitable for guided screening of older individuals at risk, as surrogate endpoint in clinical trials among older individuals, and for targeted prevention of mortality.
The 14 identified biomarkers are involved in various processes, such as lipoprotein and fatty acid metabolism, glycolysis, fluid balance, and inflammation. Although the majority of these biomarkers have been associated with mortality before, this is the first study that shows their independent effect when combined into one model. In comparison to the previous study by Fischer et al. 9 , we increased the sample size and number of deaths by fivefold and almost tenfold, respectively. This resulted in identification of more biomarkers (14 versus 4) and improved prediction accuracy. We were able to replicate the associations of all four biomarkers identified in the previous work. However, citrate was not included in our fully adjusted model, since this biomarker did not pass the multiple testing threshold. A possible explanation for this could be that one, or multiple, of the currently included biomarkers partially capture the effect of citrate, resulting in the attenuation of the association.
The total lipids in chylomicrons and extremely large VLDL and small HDL and the mean diameter for VLDL particles play a role in lipid metabolism and their association with mortality is likely caused by their involvement in the regulation of plasma triglyceride levels, a known risk factor for mortality 17 . The association of polyunsaturated fatty acids with different mortality outcomes has been attributed to their variety of actions, including their anti-inflammatory properties and inhibition of atherosclerosis 18 . The association between postprandial glucose levels and mortality is likely attributable to a loss in glycemic control 19 , while the association of both albumin and GlycA with mortality has been attributed to their role in inflammation 16,20 . The association between the other identified biomarkers and mortality is less well described, although they all play a well-known role in health and disease [21][22][23] . Future studies should be performed to determine which health conditions are further reflected by the identified biomarkers.
For two of the biomarkers, i.e., the total lipids in extremely large VLDL and isoleucine, the direction of effect changes when adjusting for the remaining 12 biomarkers. This change is most likely due to the inclusion of GlycA and the two other branched-chain amino acids, i.e., leucine and valine, in the model. Adjusting for GlycA removes the correlated negative effect of the total lipids in extremely large VLDL, while adjusting for leucine and valine removes the correlated positive effect of isoleucine, resulting in appearance of opposite associations for these biomarkers. A similar effect was observed by Fischer et al. 9 for VLDL diameter after inclusion of GlycA in their model. It would be interesting to see if a similar effect is also observed for other phenotypes using multivariate adjusted models.
A potential limitation of our study is that the number of biomarkers captured by our targeted NMR platform is only a fraction of the metabolites in the human serum 24 . More complete high-throughput metabolic biomarker platforms are available, but these are usually more expensive. The predictive accuracy of these more complete platforms may be compared to the one used in this study. Efforts to increase the number of identifiable biomarkers using inexpensive high-throughput metabolic biomarker platforms (e.g., NMR or liquid chromatography-mass spectrometry) will likely result in identification of many more mortality-associated biomarkers and, hence, improved risk prediction.
Although we were able to show a good predictive ability of our biomarkers for mortality risk using two complementary methods (the C-statistic and IDI), the metabolic biomarker score constructed is not yet suitable for classification of patients in the clinic, since it is based on scaled biomarker values created separately for each cohort. Future efforts should therefore be focused on creation of a metabolic biomarker score that could be used for clinical research based on concentration units that could be generated using individual-level data.
In conclusion, we identified a set of 14 metabolic biomarkers that independently associate with all-cause mortality. A score based on these 14 biomarkers and sex leads to improved risk prediction as compared to a score based on conventional risk factors. This indicates that this affordable, well-standardized, and high-throughput NMR measurement may be used to generate a standard for risk assessment of mortality in the clinic. Such a score could potentially be used in clinical practice to guide treatment strategies, for example when deciding whether an elderly person is too fragile for an invasive operation. In addition, it may be used as a surrogate endpoint for clinical trials in older individuals, since showing (a reduction in) the total mortality endpoint is mostly not feasible due to the limited duration and number of cases in a regular clinical trial. The currently used metabolomics platform can be incorporated in ongoing clinical studies to explore its value, opening up new avenues for research to establish the utility of metabolic biomarkers in clinical settings.
The summary statistics of our primary survival meta-analysis have been made publicly available in the BBMRI -omics atlas: http://bbmri.researchlumc.nl/atlas.
Table 1 and the Supplementary Methods.

Methods
We have complied with all relevant ethical regulations for work with human subjects. All participants provided written informed consent, and the studies were approved by the relevant institutional review boards.
Measurement of metabolic biomarkers. The metabolic biomarkers were quantified from EDTA plasma and serum samples using high-throughput NMR metabolomics (Nightingale Health Ltd., Helsinki, Finland). This method provides simultaneous quantification of routine lipids, lipoprotein subclass profiling with lipid concentrations within 14 subclasses, fatty acid composition, and various low-molecular-weight metabolites, including amino acids, ketone bodies, and gluconeogenesis-related metabolites, in molar concentration units. Details of the experimentation and applications of the NMR metabolomics platform have been described previously 8,25 . Several of the metabolic biomarkers have already been validated with other techniques (i.e., routine clinical chemistry assays, gas chromatography, an enzymatic method, and/or mass spectrometry) 8,10,15,16,26 . Furthermore, the metabolic biomarkers measured using the Nightingale Health platform have been used in numerous published epidemiological studies (see https://nightingalehealth.com/publications for an overview). The genetic work based on the Nightingale Health platform also underscores that the labels given to the metabolic biomarkers are correct and are associated with biologically relevant and plausible genes [27][28][29] . For the analyses in this study we first used all 226 available measurements, including the highly correlated lipid subclasses and compositions (for a full list see Supplementary Data 2). Due to the high correlation among the measurements, the selection of independently associated biomarkers was based on a subset of 63 biomarkers to prevent overfitting. The selection of these biomarkers was based on previous studies using this platform and the list comprises the total lipid concentrations, fatty acid composition, and low-molecular-weight metabolites, including amino acids, glycolysis-related metabolites, ketone bodies and metabolites involved in fluid balance and immunity (Supplementary Data 2) 10,12 .
Statistical analyses. For each study, a value of one was added to all biomarkers containing zeroes (i.e., x + 1), where a zero indicates that the value was below the detection limit. Subsequently, all biomarkers were log-transformed and scaled to standard deviation units, separately per study. Similar to the previous study by Fischer et al., 9 a Cox proportional hazards model, with age at blood sampling as the time scale, was used to determine the associations of the biomarkers with all-cause mortality. In addition, the basic models were adjusted for age at blood sampling, sex and study-specific covariates that are related to demography and relatedness of the included individuals. Age at blood sampling was included in the model to make the results directly comparable with the age-stratified analyses (see below) in which the follow-up time of some individuals encompassed multiple age groups, so the age at sampling could have been before the age at start of the age group. To check for differences between sexes, we also performed sex-stratified analyses.
>60 0.650 0.715 0.065 ± 0.014, P = 3.29 × 10^−6 11.9 ± 1.5%, P = 1.13 × 10^−14
The estimates for the risk scores were derived from the Estonian Biobank cohort. The conventional risk factor score included sex, body mass index, systolic blood pressure, total cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, creatinine, smoking status, alcohol consumption, and prevalent diabetes, cardiovascular disease, and cancer. The metabolic biomarker score included total lipids in extremely large very low-density lipoprotein particle (VLDL), total lipids in small HDL, VLDL diameter, ratio of polyunsaturated fatty acids to total fatty acids, glucose, lactate, histidine, isoleucine, leucine, valine, phenylalanine, acetoacetate, albumin, glycoprotein acetyls, and sex. IDI integrated discrimination improvement. The statistics in this Table have been generated with custom-made functions in R.
For the secondary analyses, we additionally adjusted the basic model for the 4 metabolic biomarkers previously reported by Fischer et al., 9 i.e., albumin, GlycA, the mean diameter for VLDL particles and citrate (step 1), as well as 11 of the independently associating biomarkers discovered in the current study, i.e., the total lipids in chylomicrons and extremely large VLDL and small HDL, the ratio of polyunsaturated fatty acids to total fatty acids, and glucose, lactate, histidine, isoleucine, leucine, valine, phenylalanine, and acetoacetate levels (step 2). To select the biomarkers used for adjustment in step 2, we performed a stepwise (forward-backward) procedure based on successive rounds of meta-analyses. In each round we added to the model the unselected biomarker that showed the lowest P value in the previous round of the stepwise procedure (forward step). Next, we removed biomarkers from the model if the forward step had raised their P value above the threshold (backward step). We stopped the procedure once all unselected biomarkers showed a P value above the threshold in the working model. As threshold we used the Bonferroni-adjusted P value to adjust for multiple testing (see below). To test the combined effect of the 14 identified biomarkers, we also created a metabolic biomarker score. To this end, the log-transformed and scaled biomarkers were multiplied by their weights, based on the meta-analyses results (i.e., ln(hazard ratio) from Table 2), and subsequently summed.
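The preprocessing and score construction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code (their scaling script is provided as Supplementary Data): the function names are hypothetical, the hazard ratios in the example are placeholders rather than values from Table 2, the natural logarithm is assumed, and scaling is assumed to include mean-centering with the sample standard deviation.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def log_scale(values: List[float]) -> List[float]:
    """Transform one biomarker within one cohort.

    Adds 1 to all values if the biomarker contains zeroes, takes the natural
    log, then centers and scales to standard deviation units (assumption:
    centering on the cohort mean, sample standard deviation).
    """
    if any(v == 0 for v in values):
        values = [v + 1 for v in values]
    logged = [math.log(v) for v in values]
    mu, sd = mean(logged), stdev(logged)
    return [(x - mu) / sd for x in logged]


def biomarker_score(
    scaled: Dict[str, List[float]], hazard_ratios: Dict[str, float]
) -> List[float]:
    """Weighted sum per individual: sum over biomarkers of ln(HR) * scaled value."""
    n = len(next(iter(scaled.values())))
    return [
        sum(math.log(hr) * scaled[name][i] for name, hr in hazard_ratios.items())
        for i in range(n)
    ]


# Toy example with placeholder hazard ratios (not taken from Table 2).
example_scaled = {"marker_a": [1.0, 0.0], "marker_b": [0.0, 1.0]}
example_hr = {"marker_a": math.e, "marker_b": 1 / math.e}
scores = biomarker_score(example_scaled, example_hr)
print(scores)  # approximately [1.0, -1.0]
```

Because the weights are log hazard ratios, a biomarker with HR below 1 contributes negatively to the score, so a higher score corresponds to a higher estimated mortality risk.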
For the age-stratified analyses, samples were divided into age groups of <60, 60-70, 70-80, 80-90, and >90 years. Some samples were used multiple times, since their follow-up time encompassed two, or even three, age groups.
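One way to picture how a single individual can contribute to several age groups is to split the follow-up interval, expressed on the age scale, at the group boundaries. The Python sketch below is an assumption about how such a split could be done, not a description of the authors' implementation: the function name is hypothetical, the intervals are treated as half-open, and the death is attributed only to the group in which follow-up ends.

```python
import math
from typing import List, Tuple

# Age groups from the text; boundary handling (half-open intervals) is an assumption.
AGE_BANDS = [
    ("<60", 0.0, 60.0),
    ("60-70", 60.0, 70.0),
    ("70-80", 70.0, 80.0),
    ("80-90", 80.0, 90.0),
    (">90", 90.0, math.inf),
]


def split_follow_up(
    age_at_sampling: float, age_at_exit: float, died: bool
) -> List[Tuple[str, float, float, bool]]:
    """Return one (group, entry age, exit age, event) row per age group visited."""
    rows = []
    for label, lower, upper in AGE_BANDS:
        start = max(age_at_sampling, lower)
        stop = min(age_at_exit, upper)
        if start < stop:  # the individual spent follow-up time in this group
            event = died and stop == age_at_exit  # death counted in the last group only
            rows.append((label, start, stop, event))
    return rows


# An individual sampled at 68 who died at 83.5 contributes to three age groups.
print(split_follow_up(68.0, 83.5, True))
```

In the second and third rows of this example the entry age of the interval is later than the age at blood sampling, which is the situation referred to in the statistical analyses paragraph.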
To determine the predictive value of the identified mortality-associated biomarkers, we constructed four weighted risk scores. The weights for the risk scores were based on the Estonian Biobank cohort (Supplementary Table 4) to avoid overestimation. The selection of the Estonian Biobank as training set was based on the fact that this was the largest dataset in our study containing data on most conventional risk factors for mortality, with the exception of C-reactive protein. The selection of conventional risk factors was based on the previous study by Fischer et al. 9 using the same dataset. The first risk score contains the conventional risk factors (i.e., sex, body mass index, systolic blood pressure, total cholesterol, HDL cholesterol, triglycerides, creatinine, smoking, alcohol, prevalent diabetes, prevalent CVD, and prevalent cancer). The second risk score contains our 14 identified independent mortality-associated biomarkers plus sex. The third risk score contains the four previously identified mortality-associated biomarkers plus sex. The fourth score contains our 14 identified independent mortality-associated biomarkers and the conventional risk factors (excluding total cholesterol, HDL cholesterol, triglycerides, and creatinine, since they were also part of the Nightingale Health platform). Age at sampling was not included in the risk scores, since this was used as the time scale. The predictive ability of the weighted risk scores was tested in the FINRISK 1997 cohort. We used two measures to assess the predictive value of the risk scores: (1) C-statistics and (2) IDI 30,31 .
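The two measures named above can be illustrated with textbook definitions. The authors used custom-made R functions that are not shown in this article, so the Python sketch below is only an assumption about the general form of the calculations: Harrell's concordance for right-censored data, and the IDI as the difference in discrimination slopes between two models for a binary outcome at a fixed horizon. All function names and toy numbers are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def harrell_c(times: Sequence[float], events: Sequence[int], scores: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs in which the individual who died
    earlier has the higher risk score (ties in score count as 0.5)."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable if i had the event and j was followed longer.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


def discrimination_slope(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean predicted risk among events minus mean predicted risk among non-events."""
    with_event: List[float] = [p for p, y in zip(probs, outcomes) if y]
    without_event: List[float] = [p for p, y in zip(probs, outcomes) if not y]
    return mean(with_event) - mean(without_event)


def idi(p_new: Sequence[float], p_old: Sequence[float], outcomes: Sequence[int]) -> float:
    """Integrated discrimination improvement of the new model over the old one."""
    return discrimination_slope(p_new, outcomes) - discrimination_slope(p_old, outcomes)


# Toy data (made up): 5 comparable pairs, 3 of them concordant.
c_stat = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.1, 0.5, 0.2])
# Toy predicted risks (made up) for two models and a binary outcome.
idi_value = idi([0.8, 0.6, 0.3, 0.1], [0.6, 0.4, 0.4, 0.2], [1, 1, 0, 0])
print(c_stat, idi_value)
```

A C-statistic of 0.5 corresponds to no discrimination and 1.0 to perfect ranking; a positive IDI means the new model separates predicted risks of events and non-events more strongly than the old model.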
Biomarkers were considered significant when the P value was below the Bonferroni-adjusted threshold of 2.21 × 10^−4 (0.05/226), which takes into account that we tested 226 biomarkers. The P values for the difference between sexes and age strata were calculated using meta-analyses heterogeneity statistics (I^2). The survival analyses in the individual cohorts were performed using R and STATA/SE 11.2 (StataCorp LP, College Station, TX, USA), while the meta-analyses were performed using a fixed-effect model implemented in the R-package meta. The discrimination and reclassification analyses were performed using custom-made functions in R.
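For readers unfamiliar with fixed-effect meta-analysis, the pooling step can be sketched as inverse-variance weighting of per-cohort log hazard ratios. The Python example below is a minimal illustration under that assumption and does not reproduce the R-package meta; the function name and the two toy cohort estimates are hypothetical. It also shows the arithmetic behind the Bonferroni threshold and the usual definition of the I^2 heterogeneity statistic.

```python
import math
from typing import Dict, Sequence

# Bonferroni-adjusted threshold from the text: 0.05 / 226 = 2.21e-4 (rounded).
BONFERRONI_THRESHOLD = 0.05 / 226


def fixed_effect_meta(log_hrs: Sequence[float], std_errors: Sequence[float]) -> Dict[str, float]:
    """Inverse-variance fixed-effect pooling of per-cohort log hazard ratios."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    # Cochran's Q and the I^2 heterogeneity statistic (in percent).
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "hr": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * pooled_se),
        "ci_high": math.exp(pooled + 1.96 * pooled_se),
        "p": p_value,
        "i2": i2,
    }


# Toy input (made up): two cohorts with log(HR) 0.2 and 0.4, both with SE 0.1.
result = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
print(result, result["p"] < BONFERRONI_THRESHOLD)
```

With equal standard errors the pooled log hazard ratio is simply the average of the two cohort estimates (0.3), and the example P value falls below the Bonferroni threshold.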
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Code availability
We have provided the most important scripts that we used for the scaling of the metabolic biomarkers, single cohort analyses (for the Alpha Omega Cohort, as example), and meta-analyses as Supplementary Data 5-7. The custom-made R functions used to perform the discrimination and reclassification analyses are available from the corresponding author upon request.