Triangulating evidence from longitudinal and Mendelian randomization studies of metabolomic biomarkers for type 2 diabetes

The number of people affected by type 2 diabetes mellitus (T2DM) is close to half a billion and is rising sharply, representing a major and growing public health burden. Given its mild initial symptoms, T2DM is often diagnosed several years after its onset, leaving half of diabetic individuals undiagnosed. While several classical clinical and genetic biomarkers have been identified, improving early diagnosis by exploring other kinds of omics data remains crucial. In this study, we combined longitudinal data from two population-based cohorts, CoLaus and DESIR (comprising in total 493 incident cases vs. 1360 controls), to identify new or confirm previously implicated metabolomic biomarkers predicting T2DM incidence more than 5 years ahead of clinical diagnosis. Our longitudinal data showed robust evidence for valine, leucine, carnitine and glutamic acid being predictive of future conversion to T2DM. We confirmed the causality of this association for leucine by two-sample Mendelian randomisation (MR) based on independent data. Our MR approach further identified new metabolites potentially playing a causal role in T2DM, including betaine, lysine and mannose. Interestingly, for valine and leucine a strong reverse causal effect was detected, indicating that the genetic predisposition to T2DM may trigger early changes in these metabolites, which appear well before any clinical symptoms. In addition, our study revealed a reverse causal effect on metabolites such as glutamic acid and alanine. Collectively, these findings indicate that molecular traits linked to the genetic basis of T2DM may be particularly promising early biomarkers.

www.nature.com/scientificreports/
between the different markers in diagnosing T2DM is not optimal 1 , and their screening capacity for pre-diabetes is low 2 . While these markers are powerful predictors of the disease, they are far from perfect for identifying individuals who are prone to develop T2DM. Early detection of individuals with high T2DM predisposition is important, as non-pharmacological approaches (i.e. lifestyle changes) can substantially reduce the risk of developing T2DM, and at a reduced cost 3,4 . Several predictive scores have been developed using clinical and genetic data (e.g. 5,7 ), but their performance is far from optimal. Since many metabolites and proteins are expected to be altered in the pre-diabetic state, using different omics profiles in addition to the classical clinical, biological and genetic risk factors is expected to increase prediction accuracy. Several cross-sectional and longitudinal metabolomic studies, focussing on blood samples and using targeted approaches, have been initiated to identify candidate biomarkers of pre-diabetes, with a few exceptions employing untargeted approaches (e.g. a cross-sectional study of 115 T2DM individuals 8 ). In the population-based Cooperative Health Research in the Region of Augsburg (KORA) study, 140 metabolites were quantified for 4297 participants and several metabolites altered in pre-diabetic individuals were identified 9 . Using metabolite-protein networks and targeted approaches on serum samples, Wang-Sattler et al. identified seven T2DM-related genes associated with these metabolites by multiple interactions with four enzymes. Lysophosphatidylcholine (18:2) and glycine were strong predictors of glucose intolerance, even 7 years before disease onset. These metabolites, in addition to sugar metabolites, acylcarnitines and other amino acids, have also been identified as predictors of T2DM in the European Prospective Investigation into Cancer and Nutrition cohort 10 .
More recently, Padberg et al. described a metabolic signature, including glyoxylate, associated with T2DM and prediabetes 11 . Wang et al., using longitudinal data on 201 incident T2DM cases, identified a signature of five branched-chain and aromatic metabolites for which individuals in the top quartile exhibited a five-fold higher risk of developing T2DM 12 . In particular, a combination of three amino acids predicted future T2DM, with a more than five-fold higher risk for individuals in the top quartile, suggesting that amino acid profiles could aid in diabetes risk assessment. These results were confirmed by a recent meta-analysis of 8 prospective studies comprising 8000 individuals, which found a higher risk of T2DM for isoleucine, leucine, valine and phenylalanine 13 .
By targeting serum carnitine metabolites in 173 incident T2DM cases among 2519 patients with coronary artery disease, Strand et al. demonstrated that trimethyl-lysine and γ-butyrobetaine, both precursors of free carnitine, as well as palmitoyl-carnitine, predict long-term risk of T2DM independently of traditional risk factors 14 .
As another example, shotgun lipidomics was applied in a cross-sectional study on plasma of pre-diabetic mice from different genetic backgrounds and revealed a group of ceramides correlated with glucose tolerance and insulin secretion 15 . Interestingly, these results were confirmed by quantitative analysis in two population-based prospective cohorts, showing that dihydro-ceramides were significantly elevated in the plasma of individuals who would progress to diabetes, up to 9 years before disease onset 15 . Other studies have struggled to identify the contribution of individual metabolites and focused more on metabolome-wide prediction 16 , the results of which are difficult to replicate.
The studies listed above provide several important candidate metabolites against which to benchmark our experimental and modelling setup. Here, we used a subset of the CoLaus study intentionally enriched for T2DM incident cases to maximise the power to discover baseline metabolite levels associated with developing T2DM at a later follow-up stage. We compared our findings with those from a similarly sized population-based cohort, DESIR, and with bidirectional Mendelian randomisation using metabolite and T2DM QTLs as instruments.

Methods
The CoLaus study. The CoLaus study (www.CoLaus-psycolaus.ch) is a population-based prospective study based on a single random sample of 6733 participants from the overall population aged between 35 and 75 living in Lausanne (10). The baseline survey was conducted between 2003 and 2006. Each participant was extensively phenotyped regarding personal, lifestyle and cardiovascular risk factors; extensive blood and urine characterization was performed, and over 500,000 SNPs were directly genotyped and a further 20.4 million imputed (with r2-hat > 0.3). The first follow-up was performed between April 2009 and September 2012; median follow-up time was 5.4 (average 5.6, range 4.5-8.8) years; it included 5064 participants, and the 5.5-year incidence of T2DM was 6.5%, with 284 incident cases. The second follow-up was performed between May 2014 and April 2017; median follow-up was 10.7 (average 10.9, range 8.8-13.6) years. In this study, we selected 262 T2DM incident cases at the first follow-up and 524 controls matched for sex, age and baseline glucose. For each case, two types of controls were selected: one with a very low risk of T2DM (as assessed by a multivariable T2DM risk score 17 ) and one with pre-diabetes (with a high-risk score, but no T2DM at the CoLaus second follow-up) (Fig. 1). Incident T2DM cases were defined as fasting glucose ≥ 7 mmol/L and/or presence of antidiabetic drug treatment and/or HbA1c ≥ 6.5%. The most important study characteristics are included in Table 1. All research was performed in accordance with relevant guidelines and regulations. The study protocols were approved by the Ethical Committee of the Canton de Vaud and all participants provided written informed consent.
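The incident-case definition above can be written as a simple predicate. The following is a minimal sketch, not the code used in the study: the function and parameter names are hypothetical, and treating a missing measurement as "criterion not met" is an assumption that the text does not address.

```python
from typing import Optional


def is_incident_t2dm(
    fasting_glucose_mmol_l: Optional[float],
    hba1c_percent: Optional[float],
    on_antidiabetic_drug: bool,
) -> bool:
    """Return True if at least one of the three stated criteria is met.

    Thresholds are taken from the text: fasting glucose >= 7 mmol/L,
    HbA1c >= 6.5%, or antidiabetic drug treatment.
    Assumption: a missing measurement (None) does not satisfy a criterion.
    """
    if on_antidiabetic_drug:
        return True
    if fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0:
        return True
    if hba1c_percent is not None and hba1c_percent >= 6.5:
        return True
    return False
```

Because the criteria are joined by "and/or", a participant with normal fasting glucose but HbA1c at or above 6.5% is still counted as a case.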
The DESIR cohort. The prospective D.E.S.I.R. cohort is a 9-year follow-up study of 2391 middle-aged participants of European ancestry 18-20 . We analysed participants from a case-cohort design embedded within the larger cohort that includes 231 cases of incident T2DM and 836 participants randomly sampled from the entire cohort. Baseline and follow-up clinical characteristics of participants included in the training population are shown in 5 (see Table 1). T2DM was defined using one of the following criteria: use of glucose-lowering medication, fasting plasma glucose [FG] ≥ 7 mmol/L, or glycated hemoglobin A1c [HbA1c] ≥ 6.5% (48 mmol/mol). Clinical and biological evaluations were performed at inclusion and after 3, 6, and 9 years, as previously described 21 .
All research was performed in accordance with relevant guidelines and regulations. All participants provided written informed consent and the study protocol was approved by the Ethics Committee for the Protection of
Targeted metabolomics analysis. Plasma and urine samples collected at the baseline of the CoLaus cohort were processed for targeted metabolomics analysis as described elsewhere 22 . Briefly, metabolites were extracted from 100 µL of plasma or urine samples and Quality Control (QC) samples using a cold methanol-ethanol solvent mixture in a 1:1 ratio. After centrifugation at 14,000 rpm for 15 min, the supernatant was recovered, evaporated and resuspended in 100 µL (for plasma) or 200 µL (for urine) of H2O:MeOH (9:1). 5 µL of each sample was analyzed by LC-MRM/MS on a hybrid triple quadrupole-linear ion trap QqQ LIT (Qtrap 5500, Sciex) hyphenated to a LC Dionex Ultimate 3000 (Dionex, Thermo Scientific). Analyses were performed in positive and negative electrospray ionization using a TurboV ion source. The chromatographic separation was performed on a Kinetex column C18 (100 × 2. In order to monitor the signal drift and system performance over time, and to avoid repeated thawing-freezing cycles of the study samples, surrogate quality control (QC) samples were used. These QC samples were prepared in the same way and at the same time as the study samples, from aliquots of a pool of human plasma or urine that was the same for all the analytical batches. QC samples were injected every 8 samples in both positive and negative modes.
The MS instrument was controlled by Analyst software v.1.6.2 (AB Sciex). Peak integration was performed with MultiQuant software v.3.0 (AB Sciex). The integration algorithm was MQ4 with Gaussian smoothing with a half-width of 1.5 points. For plasma samples, the analysis was narrowed to the 124 and 48 metabolites that were detected in all samples with a noise percentage of 80% and a Gaussian peak shape, in positive and negative modes, respectively. For urine samples, we detected 124 and 77 metabolites in positive and negative modes, respectively. Any remaining missing values were replaced by the lowest value of the corresponding metabolite. To correct for batch effects, raw data were normalized with the dbnorm package 23 , using the ber model.
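The missing-value rule described above (replace by the lowest observed value of the same metabolite) can be illustrated as follows. This is a sketch under stated assumptions: the function name is hypothetical, missing values are assumed to be encoded as None or NaN, and the batch correction performed with dbnorm is not reproduced here.

```python
import math


def impute_with_minimum(values):
    """Replace missing entries (None or NaN) of one metabolite
    with the smallest observed value of that metabolite."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("metabolite has no observed values")
    floor = min(observed)
    return [floor if (v is None or math.isnan(v)) else v for v in values]
```

For example, `impute_with_minimum([3.2, None, 1.5])` yields `[3.2, 1.5, 1.5]`. The rule is applied per metabolite, so each column of a sample-by-metabolite table would be passed through the function separately.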

Statistical approaches.
We performed logistic and linear regression analyses to test for association of baseline metabolite levels with T2DM incidence and with glucose level changes, respectively. We included the following covariates: family history of diabetes, smoking status, body-mass index (BMI), HDL cholesterol, triglycerides, insulin, glucose and HOMA measure at baseline. Since none of our association P values passed strict Bonferroni correction for multiple testing (P < 0.05/172) or FDR correction (P FDRadj < 0.05), we declared P values below 0.01 as suggestively significant.
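The three significance thresholds mentioned above can be made concrete with a short sketch. The text does not state which FDR procedure was used; the Benjamini-Hochberg step-up adjustment is assumed here, and all function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    Assumption: the FDR correction in the text is taken to be this procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_suggestive(p_value, threshold=0.01):
    """Suggestive significance as declared in the text (P < 0.01)."""
    return p_value < threshold
```

With 172 tested metabolites, `bonferroni_threshold(0.05, 172)` is roughly 2.9 × 10⁻⁴, which illustrates how much stricter it is than the suggestive threshold of 0.01.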
Bi-directional metabolome-wide Mendelian randomization. To explore the causal paths between metabolites and glucose and T2DM, we performed Mendelian randomization (MR), an instrumental variable method to distinguish correlation from causation in observational data 24 . The idea of MR is to use genetic variants as instrumental variables for causal inference about the effect of modifiable risk factors, which can overcome some types of confounding and reverse causation.
We performed two-sample bidirectional MR. We tested whether genetically varying levels of a particular metabolite affect the risk for elevated glucose and T2DM (we call this MR) and whether genetically increased risk of T2DM or elevated glucose is associated with circulating levels of a particular metabolite (we call this reverse MR). The associations between the instrumental variables and the exposure and the outcome are estimated from independent studies.
To run MR for each metabolite, we used as instrumental variables the independent (pairwise r² < 0.01) significant (P < 1 × 10⁻⁵) SNPs associated with the metabolite under study (Supplementary Table 1). These data come from a large GWAS performed for 453 whole blood metabolites in 7824 European individuals 25 . To run the reverse MR for glucose and diabetes, we used as instrumental variables the independent (r² < 0.01) genome-wide significant SNPs (P < 5 × 10⁻⁸) found by the GWASs performed on UKBB and by the DIAGRAM Consortium for T2DM 26 , respectively (Supplementary Table 2).
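To make the two-sample setup concrete, the sketch below computes a causal estimate from per-SNP summary statistics. The text does not name the MR estimator that was used; the inverse-variance weighted (IVW) estimator, a common default for two-sample MR, is assumed here purely for illustration. Function and argument names are hypothetical, and the selection of independent instruments (P value threshold and LD pruning) is taken as already done.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted (IVW) causal estimate from summary statistics.

    Equivalent to a weighted average of the per-SNP Wald ratios
    (beta_outcome / beta_exposure) with weights beta_exposure**2 / se_outcome**2.
    Returns (estimate, standard error, two-sided normal P value).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have one entry per instrument")
    if not beta_exposure:
        raise ValueError("at least one instrument is required")
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, std_error, p_value
```

For the forward direction, the exposure effects would come from the metabolite GWAS and the outcome effects from the glucose or T2DM GWAS; for the reverse direction the roles are swapped, with T2DM or glucose instruments as exposure and the metabolite as outcome.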

Results
Study characteristics. Selected basic features of the CoLaus study are listed in Table 1. We selected 788 participants, including 263 T2DM incident cases at the first follow-up and 525 controls matched for sex, age and baseline glucose. Summary data are expressed as counts (and percentages) for categorical variables and as median [interquartile range] or mean ± standard deviation for continuous variables.

Metabolites association scan.
Based on quality criteria such as sensitivity and peak shape, we detected 172 urine and plasma metabolites (MS) in the 788 selected CoLaus participants. The analytical samples were collected at baseline and included 525 participants without diabetes and 263 participants who became diabetic over the following 10 years. For each metabolite, we ran logistic/linear regression analysis with diabetes incidence/change in glucose levels as outcome and family history of diabetes, smoking status, body-mass index (BMI), HDL cholesterol, triglycerides, insulin, glucose and HOMA measure at baseline, and metabolite levels as predictors. Here, all the results are based on the first CoLaus follow-up, which is the best powered for predictive analysis. Indeed, when we compared the effects estimated in the first and second follow-up, we observed significantly weaker effects in the second follow-up (P t-test = 1.38 × 10⁻⁵, see Fig. 2). It is logical that, as more time passes, other factors emerge that may influence diabetes incidence, reducing the predictive power of baseline biomarkers.
The metabolome-wide association scan revealed seven metabolites associated with glucose change at a suggestively significant level (P < 0.01) in the CoLaus study (see Table 2). When we meta-analysed metabolome-wide results from the CoLaus and DESIR studies, we similarly found leucine and four additional suggestively significant (P < 0.01) metabolites associated with glucose change (Table 3). As DESIR is a 9-year follow-up study in which biological evaluations were performed every 3 years, we used the data collected at the second follow-up (after 6 years) to match as closely as possible the 5-year follow-up performed in CoLaus.

Table 2. Metabolites associated with glucose change in the CoLaus cohort. For each metabolite we report its effect size on glucose in the CoLaus and DESIR cohorts, the combined P value and the forward and reverse causal effect on glucose and T2DM estimated by Mendelian Randomisation. "X" indicates missing value, i.e. when the metabolite was not available for the respective analysis. The ID for the Human Metabolome Database is indicated for each metabolite.

Metabolome-wide Mendelian randomization.
Table 4 shows the significant results from the metabolome-wide MR approach. Among the 453 testable metabolites, genetically altered levels of one and six metabolites were found to be significantly associated with glucose and T2DM, respectively. These include betaine, mannose, lysine, and three phospholipid species. Applying the reverse MR, we found that the genetic predisposition to T2DM is associated with the levels of 12 metabolites. None of the metabolites found significant by MR was associated with glucose in CoLaus (P > 0.05). By contrast, among the seven metabolites reported in Tables 2 and 3, three were testable with MR. While we did not observe any significant effect for glucose, T2DM showed a significant causal effect on valine (P = 0.003), leucine (P = 5.8 × 10⁻⁵) and glutamate (P = 2.8 × 10⁻⁶).
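The combined CoLaus and DESIR P values reported in the Results are described only as the outcome of a meta-analysis, without naming the pooling method. One common choice, assumed here purely for illustration, is a fixed-effect inverse-variance weighted combination of the per-cohort effect estimates. The function name and the input values used in the comment below are hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Fixed-effect inverse-variance weighted meta-analysis.

    Assumption: the pooling method is not stated in the text; this is
    one standard option. Returns (pooled estimate, standard error,
    two-sided normal P value).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value
```

Two cohorts with equally precise estimates, for instance effects of 0.2 and 0.4 with standard error 0.1 each, pool to 0.3 with a standard error smaller than either input, which is how a metabolite can reach the suggestive threshold only after combining both studies.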

Discussion
Using mass-spectrometry targeted metabolomics analysis, we identified a panel of metabolites whose levels are associated with glucose changes before the onset of T2DM in the CoLaus cohort. We replicated our findings in an independent study (DESIR), which reassuringly revealed five metabolites with a combined P value below 0.01, including L-carnitine, leucine, and cortisol. In addition, we applied a metabolome-wide Mendelian randomization (MR) approach, which allowed us to confirm the causal effect of leucine on T2DM, but also to identify new reverse causal relationships between glucose/T2DM and metabolites, such as leucine, valine, glutamic acid, alanine and mannose. While we focused our analysis on predicting T2DM over a 5-year follow-up period, we observed that the effect of the potentially predictive metabolites diminished over time. This is unsurprising, as risk factors change over time and hence more and more unknowns contributing to diabetes conversion accumulate. We have noticed, furthermore, that more than 80% (140/172) of the metabolite effects are positive, meaning that generally an increased level of metabolites represents a risk factor for diabetes. This observation needs to be considered with caution, since it might be due to a latent diabetes-associated confounding factor that is linked to overall metabolite concentration. This explanation is rather unlikely, however, since we accounted for metabolomic principal components in all association scans.

Table 3. Additional metabolites found significantly associated with glucose change after combining CoLaus and DESIR cohort results. For each metabolite we report its effect size on glucose in CoLaus and DESIR cohorts, the combined P value and the forward and reverse causal effect on glucose and T2DM estimated by Mendelian Randomisation. The ID for the Human Metabolome Database is indicated for each metabolite.

Table 4. Metabolome-wide Mendelian randomization results. The ID for the Human Metabolome Database is indicated for each metabolite. *Indicates metabolites that are not identifiable in the existing databases.
The confirmed metabolites have been repeatedly supported by various types of evidence in both humans and model organisms. Individuals with obesity and T2DM have elevated levels of branched-chain amino acids (BCAAs; leucine, isoleucine and valine) 12,27-30 . Such changes are already present before the onset of diabetes 12,13,31,32 and their causal role is believed to be exerted via the modulation of the mTOR pathway. Increased leucine levels can lead to insulin resistance via activation of the TORC1 pathway, with induction of beta cell proliferation and insulin secretion 12,33 and disruption of insulin signalling in skeletal muscle 34 . On the other hand, insulin resistance enhances protein catabolism in skeletal muscle, which can increase the release of BCAAs 35 . Moreover, hyperglycemia negatively correlates with adipose tissue expression of genes involved in BCAA oxidation, which can further contribute to raising the levels of BCAAs 35 . Thus, so far it remains unclear whether the observed BCAA changes are only a consequence of hyperglycemia or whether they have a causative role in the development of T2DM.
Another metabolite displaying a significant association with glucose changes in the CoLaus cohort is glutamic acid, although this was not replicated in the DESIR study. A link between increased glutamate levels and insulin resistance traits has already been observed in multiple cohorts 28,32,36 , and more recently, a meta-analysis of 18 prospective studies highlighted glutamate as positively correlated with T2DM 37 . Glutamate is a glucogenic amino acid, which can enter the Krebs cycle through its conversion to α-ketoglutarate. In addition, it can favour gluconeogenesis by increasing the transamination of pyruvate to alanine 38 , and can directly stimulate glucagon release from pancreatic α-cells 39 . However, its possible causal role remains controversial, and several reports suggest that it is rather a reduced ratio between glutamine, a glutamate derivative, and glutamate itself that is informative of metabolic risk 36,40 .
In our study, free carnitine, cortisol, phenylacetylglutamine and pantothenic acid also appeared to be significantly associated with the development of T2DM, although only when our data were combined with those of the French cohort (CoLaus + DESIR). Carnitine esterification with fatty acids is required for the shuttling of the latter into the mitochondria for fatty acid oxidation. Lower levels of free carnitine have been reported in diabetic individuals 14,41,42 , while changes in acylcarnitines and/or carnitine precursors were highlighted in several studies as indicators of prediabetes/T2DM 9,10,29 , although data are not always consistent. Our targeted LC-MS method included several of these acylcarnitines, such as acetylcarnitine, propionylcarnitine, and isovalerylcarnitine, but no significant association was found with subsequent development of T2DM.
Cortisol dysregulation has been linked to T2DM in cross-sectional and longitudinal studies 43,44 . More specifically, diabetic individuals present a flattened diurnal cortisol curve compared to non-diabetic ones 45 , with lower morning and higher afternoon and evening concentrations 46 . Interestingly, high levels of evening cortisol were also shown to be predictive of T2DM development in an occupational cohort 47 . Even though the mechanisms underlying this association are not completely understood, cortisol contributes to many metabolic processes that can potentially perturb glucose homeostasis 48 . Cortisol has a major role in raising glucose levels through activation of gluconeogenesis. Moreover, it induces lipolysis, thus increasing the release of free fatty acids that may impair glucose uptake. Of note, diabetes is considered a common complication in clinical states characterized by prolonged hypercortisolaemia, such as Cushing's disease 49 . However, the causality of this association remains to be determined.
Phenylacetylglutamine is a nitrogenous metabolite almost exclusively derived from the gut microbiota conversion of phenylalanine 50 . Its accumulation is known to occur in uraemia and it was shown to be increased in type 2 diabetic patients, particularly in association with renal damage 51,52 .
Other findings with a variable degree of evidence in our study also have solid corroborating literature. Sarcosine (N-methylglycine) is an intermediate and by-product in glycine synthesis and has been found to be a moderately strong (OR = 1.3) predictor of T2DM incidence 53 . More specifically, the addition of urine sarcosine to other established predictors of incident T2DM was shown to improve model performance and T2DM risk prediction in a cohort of 4164 patients with suspected stable angina pectoris. Diabetic individuals have higher circulating proline levels and, moreover, proline-induced insulin transcription impairment may contribute to the β-cell dysfunction observed in T2DM 54 .
Causal inference. Drawing causal inference is extremely difficult. Most human studies are observational and predominantly cross-sectional, so only correlations are calculated between a disease status and the levels of various predictors. Such a simple measure cannot tease apart forward causation, reverse causation and confounding. Longitudinal studies provide a more specific directional link (called Granger causality) between a potential biomarker and disease outcomes. With its roots in differential equation modelling, an association between the baseline level of a predictor and the change of the outcome over time may imply a causal relationship. An orthogonal axis of evidence for causality can be provided by Mendelian randomization (MR), where exposure-associated genetic markers act as instruments to tease out the causal relationship between a potential risk factor and an outcome 24 . [...] very scarce due to the little overlap between the respective scientific communities. A pioneering work 55 in this respect has shown good agreement between disease-to-biomarker MR results, observational correlation and longitudinal associations. However, their longitudinal association included exposure (adiposity) change regressed on outcome (metabolite level) change, which implies no directionality and is not the intuitive way to perform such an analysis. Several MR studies have explored the causal effect of possible metabolite markers on T2DM. However, they mainly focused on classic blood lipid markers 56-58 and on amino acids previously associated with T2DM, such as BCAAs 4,59,60 . Among the previously investigated metabolites we could test only 12 (2-methylbutyroylcarnitine, alanine, bilirubin, citrulline, glutamate, isoleucine, leucine, N-acetylglycine, phenylalanine, tryptophan, tyrosine and valine).
Apart from replicating the association of leucine and valine, we have shown that leucine has a bidirectional causal relationship with diabetes, with a larger reverse (diabetes to leucine) causal effect. Our finding is consistent with a previous report that suggested that insulin resistance might drive higher levels of circulating BCAAs 56 . Moreover, in our study, (predisposition to) diabetes also has a significant causal effect on glutamate and valine, while their direct effect on diabetes is not significant. Hence, small molecules targeting these metabolites may be more effective for treating downstream organ damage of T2DM, such as cardiovascular disease, neuropathies or nephropathy. In line with this hypothesis, glutamate accumulation in the retina can cause neurotoxicity and the development of diabetic retinopathy 61,62 , even though the actual connection between plasma and retinal glutamate levels remains to be assessed 40 .
Interestingly, our metabolome-wide MR approach further highlighted new causal relationships between additional metabolites and glucose/T2DM. Betaine, for instance, which was previously found to have a protective role in T2DM 53 , appears to have a causal effect on glucose. Alterations of lysine levels have also already been associated with T2DM risk 63 . Another interesting metabolite, which shows a bidirectional causal effect on T2DM in our study, is mannose, a hexose with an essential function in glycoprotein synthesis. Mannose has repeatedly been found to be associated with high glucose levels and with the development of T2DM in prospective studies 8,10,64,65 . Of note, this carbohydrate might be directly synthesized from glucose 66 , which could explain the causal effect of glucose itself and of T2DM on mannose levels, while understanding the opposite effect (metabolite on T2DM onset) requires further investigation. Finally, we found a significant causal effect of T2DM on alanine levels, in agreement with a recent MR report 59 . Increased levels of alanine aminotransferases (ALT), the enzymes catalysing the conversion of alanine to pyruvate and glutamate, have already been associated with T2DM, and could therefore underlie the observed causal effect.

Strengths and limitations.
Our study has numerous strengths. First, we used two well-characterized prospective cohorts (one for replication), whose participants have been followed longitudinally for more than 10 years. This approach allowed us to investigate potential biomarkers in blood samples collected when individuals were still free of diabetes. Second, the robustness of our targeted methods and of our results is evidenced by the fact that we confirm many previous findings. Third, we triangulate evidence by combining these longitudinal association results with other causal inference techniques, such as MR.
The major limitation of this study is the relatively low number of incident diabetes cases that we could analyse, which prevented us from making new discoveries with unequivocal statistical evidence. In the light of these findings, we recommend that future research focus more on untargeted metabolomic approaches, which better explore the vast space of metabolite species, and investigate other omics biomarkers in parallel.
Another limitation concerns the MR analysis: pleiotropic effects of the chosen genetic instruments may lead to biased estimates. The MR analyses may also be underpowered because the metabolite GWASs were performed in only 7824 individuals. In addition, the causal effect of metabolites without known QTLs cannot be investigated.

Conclusions.
Our study has confirmed most of the metabolites identified to date in a medium-sized longitudinal population-based study (enriched for incident cases) and provided complementary evidence from bi-directional MR. However, the quest for early metabolic biomarkers predicting the development of T2DM requires more research effort, including larger studies, in order to understand the potentially minute contributions of many circulating metabolites.