Indolepropionic acid and novel lipid metabolites are associated with a lower risk of type 2 diabetes in the Finnish Diabetes Prevention Study

Wide-scale profiling technologies, including metabolomics, broaden the possibility of novel discoveries related to the pathogenesis of type 2 diabetes (T2D). Using a non-targeted metabolomics approach, we investigated whether the serum metabolite profile predicts T2D in a well-characterized study population with impaired glucose tolerance. We examined two groups of individuals who took part in the Finnish Diabetes Prevention Study (DPS): those who developed T2D early (n = 96) and those who did not convert to T2D within the 15-year follow-up (n = 104). Several novel metabolites, including indole- and lipid-related metabolites, were associated with a lower likelihood of developing T2D. In particular, higher indolepropionic acid was associated with a reduced likelihood of T2D in the DPS. Interestingly, in those who remained free of T2D, indolepropionic acid was associated with better insulin secretion and various lipid species with better insulin sensitivity. Furthermore, these metabolites were negatively correlated with low-grade inflammation. We replicated the association between indolepropionic acid and T2D risk in one Finnish and one Swedish population. We suggest that indolepropionic acid, a gut microbiota-produced metabolite, is a potential biomarker for the development of T2D that may mediate its protective effect by preserving β-cell function. The novel lipid metabolites associated with T2D may exert their effects partly by enhancing insulin sensitivity.

Well-established lifestyle, metabolic and genetic factors are currently used for stratifying people at high risk of developing type 2 diabetes (T2D). However, the metabolic basis and early molecular events related to the onset of the disease are still poorly understood. Therefore, there is a need to utilize novel technologies to broaden this understanding, ultimately improving the potential for early prevention and reducing disease incidence.
Metabolomics enables the simultaneous measurement of low-molecular-weight metabolites such as nutrient intermediates, lipids, hormones and other signaling molecules, and may provide new insights into the pathogenesis of T2D [1][2][3][4]. In particular, non-targeted metabolite profiling, an approach that allows the hypothesis-free assessment of a wide spectrum of metabolites resulting from endogenous metabolism, dietary intake and gut microbial activity 5, has the potential to broaden the possibility of novel discoveries related to the pathogenesis of T2D.
In the Finnish Diabetes Prevention Study (DPS) population 6, which recruited participants with impaired glucose tolerance (IGT), a lower risk of developing T2D was associated with better insulin sensitivity (IS) and preserved β-cell capacity, probably achieved through lifestyle changes 7. Therefore, using a non-targeted metabolomics approach, our primary aim was to identify novel metabolites that may predict the risk of T2D. Moreover, we sought to investigate whether these metabolites were related to two basic mechanisms of T2D, i.e. insulin secretion capacity and insulin sensitivity.

Results
Characteristics of the DPS participants. Participants who developed T2D (cases) did not differ in age or sex distribution from those who did not develop diabetes (non-cases) (Table 1). However, at metabolomics sampling (1-year follow-up), cases were more obese and had more disturbances in insulin and glucose metabolism than non-cases (Table 1).

Identified biomarkers and diabetes likelihood.
Differential metabolic signatures were associated with a higher or lower likelihood of developing T2D (Fig. 1, Supplementary Table S1). The most prominent differences were found in several phosphatidylcholine (PC) lipid species and in indolepropionic acid, both of which were inversely related to the likelihood of developing T2D (Fig. 1). In contrast, certain amino acids and bile acids were increased in individuals who developed T2D during the first years of the DPS (Fig. 1).
Lipid-related metabolites protect from T2D. Six PCs, four lysoPCs (LPCs) and one lysophosphatidylethanolamine (LPE) were among the metabolites significantly inversely associated with diabetes at FDR-P < 0.05 (Fig. 1). Most of these lipids contained at least one of three fatty acids: C15:0 (pentadecanoic acid), C17:0 (heptadecanoic acid) or C15:1 (pentadecenoic acid). The very-long-chain fatty acid C22:6 (docosahexaenoic acid) and the long-chain fatty acid C18:2 (linoleic acid) were also frequently part of these protective lipids (Fig. 1).
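Because the lipid species are named by their acyl-chain composition (carbons:double bonds), the odd-chain criterion used above can be made explicit. The following is a minimal sketch, not the authors' annotation workflow; the helper names `acyl_chains` and `has_odd_chain` are hypothetical, and the code assumes species are written in the `CLASS(a:b/c:d)` notation used in this section.

```python
from __future__ import annotations

import re

# Hypothetical helper: matches acyl-chain descriptors such as "15:0" or "22:6"
# (number of carbons : number of double bonds).
CHAIN = re.compile(r"(\d+):(\d+)")


def acyl_chains(species: str) -> list[tuple[int, int]]:
    """Return (carbons, double_bonds) for each chain in e.g. 'PC(15:0/22:6)'."""
    inside = species[species.index("(") + 1 : species.rindex(")")]
    return [(int(c), int(d)) for c, d in CHAIN.findall(inside)]


def has_odd_chain(species: str) -> bool:
    """True if any listed chain has an odd carbon count (e.g. C15:0, C17:0)."""
    return any(carbons % 2 == 1 for carbons, _ in acyl_chains(species))


for name in ["PC(15:0/18:2)", "PC(18:1/22:6)", "LPC(19:0)"]:
    print(name, acyl_chains(name), has_odd_chain(name))
```

Only the chains written in the name are inspected; for a sum-composition name (a single total such as 36:2), an even total cannot rule out two odd chains, so such names would need separate handling.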
Amino acids and bile acid metabolites increase the likelihood of developing T2D. Several amino acids were significantly associated with T2D, for example phenylalanine and tyrosine (Fig. 1). The amino acids alanine, proline and isoleucine were also associated with increased risk of developing T2D (FDR-P < 0.05, Fig. 1).
Among the identified bile acids, those most strongly associated with T2D are shown in Fig. 1; all of them nominally increased the likelihood of developing T2D (Fig. 1).
Sensitivity and post-hoc analyses. In sensitivity analyses, we excluded the participants who developed T2D during the first year of follow-up. The odds of developing T2D associated with each metabolite were similar to those in the whole study sample, although the significance was attenuated (Supplementary Table S2).
In post-hoc analyses, adjusting the logistic regression models for potential confounders such as BMI, fasting and post-load glucose and insulin at metabolomics sampling, or sex did not change the direction of the associations, although the FDR-P values of most metabolites lost significance in all models except those adjusted for sex (Supplementary Tables S3-S5).

Impact of lifestyle intervention on key metabolites.
We also tested the interaction between each metabolite and DPS study group (lifestyle or control) in the logistic regression models for the metabolites most strongly associated with T2D likelihood at FDR-P < 0.05. Interestingly, stratification by study group strengthened the protective role of the odd-chain lipids in the intervention group, while the branched-chain amino acids were linked with a higher chance of developing T2D only in the control group (Table 2). However, a significant interaction with study group was found only for two lipids, PC(18:1/22:6) and LPC(19:0), whose inverse association with T2D risk was stronger in the intervention group, and for tyrosine and proline, whose direct association with T2D risk remained significant only in the control group (Table 2). None of the other metabolites, including indolepropionic acid, showed an interaction with study group that modified its association with the chance of developing T2D (Table 2).
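The interaction test described above can be sketched as a logistic model T2D ~ metabolite + group + metabolite × group, fitted by maximum likelihood with a Wald test on the product term. This is a minimal pure-Python sketch under that assumption, not the authors' code; the statistical software they used is not stated here, and `solve`, `fit_logistic`, `interaction_test` and the 0/1 group coding are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns coefficients and Wald standard errors from the inverse Fisher information.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    se = [math.sqrt(solve(info, [float(i == j) for i in range(p)])[j]) for j in range(p)]
    return beta, se


def interaction_test(metabolite, group, t2d):
    """Wald test of b3 in logit P(T2D) = b0 + b1*met + b2*group + b3*met*group.

    group is coded 0 = control, 1 = lifestyle (hypothetical coding).
    """
    X = [[1.0, m, g, m * g] for m, g in zip(metabolite, group)]
    beta, se = fit_logistic(X, t2d)
    z = beta[3] / se[3]
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return beta[3], p_value


# Illustrative check on simulated data (true interaction coefficient = -0.8).
rng = random.Random(1)
met = [rng.gauss(0.0, 1.0) for _ in range(4000)]
grp = [rng.randint(0, 1) for _ in range(4000)]
t2d = [int(rng.random() < 1.0 / (1.0 + math.exp(-(-0.2 + 0.3 * m + 0.1 * g - 0.8 * m * g))))
       for m, g in zip(met, grp)]
b3, p_int = interaction_test(met, grp, t2d)
print(f"interaction beta = {b3:.2f}, P = {p_int:.1e}")
```

The stratified estimates reported per study group correspond to fitting the same model without the group and product terms within each group separately.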
Indolepropionic acid is associated with the course of insulin secretion during follow-up in DPS non-cases. After identifying the 17 putative metabolites most strongly associated with T2D (Fig. 1), we examined whether these compounds might exert their effects by modulating IS (Matsuda ISI) or insulin secretion (DI30). Among non-cases, indolepropionic acid was positively associated with DI30 (β = 0.25 [0.06-0.44], P = 0.011), independently of study group. However, indolepropionic acid was not associated with DI30 in participants who developed T2D during the follow-up (Fig. 2).
Among T2D cases, none of the metabolites was significantly associated with DI30, but isoleucine, phenylalanine and tyrosine were inversely associated with Matsuda ISI during the follow-up (β = −0.26 to −0.40, P < 0.05; Table 3).
We next investigated whether insulin secretion, IS or their changes modified the likelihood of developing T2D associated with each of the metabolites mentioned above. Overall, the direction of the associations remained similar, but not all associations remained statistically significant, e.g. indolepropionic acid when adjusted for insulin secretion at follow-up (P = 0.09), and the amino acids tyrosine and isoleucine when adjusted for either IS (P = 0.27 and P = 0.40, respectively) or insulin secretion (P = 0.72 and P = 0.83, respectively).
Indolepropionic acid and lipid metabolites are associated with high-sensitivity C-reactive protein (hsCRP) levels. Because of the interrelations among gut microbiota, low-grade inflammation and T2D, we examined whether circulating hsCRP levels were related to indolepropionic acid or to the lipid metabolites associated with T2D risk and with either DI30 or Matsuda ISI during the follow-up.
Serum hsCRP was negatively correlated with indolepropionic acid at the time of sampling (r = −0.23, P = 0.006), independently of BMI (P = 0.03), fasting (P = 0.009) and 2-h glucose (P = 0.02), and study group (P = 0.006). Nevertheless, adjustment for hsCRP did not modify the association of indolepropionic acid with insulin secretion in non-cases (β = 0.31 [0.09-0.51], P = 0.004).

Table 2. Top-ranking metabolites associated with T2D in the lifestyle and control groups* and their interaction with study group**. * Association of the respective metabolite with T2D in the unadjusted logistic regression within each study group (lifestyle; control). ** Interaction of study group (lifestyle or control) × metabolite in the logistic regression testing the association of the respective metabolite with T2D, adjusted for study group.

Indolepropionic acid is associated with fiber intake. We next examined the cross-sectional association between the identified metabolites and the intake of relevant nutrients at the time of serum sampling. Indolepropionic acid was the metabolite most significantly and consistently related to both total carbohydrate and fiber intake (r = 0.28, P = 9.1 × 10⁻⁵ and r = 0.23, P = 0.001, respectively). Overall, the majority of the lipids were negatively correlated with the intake of saturated fatty acids (Fig. 3 and Supplementary Table S7).

Indolepropionic acid and T2D in two other independent population-based studies. To test whether the observed protective association of indolepropionic acid with the development of T2D could be replicated, we examined it in two population-based studies, METSIM (Metabolic Syndrome in Men) 8 and VIP (Västerbotten Intervention Program) 9.
METSIM is a prospective population-based study of Finnish men. We analysed baseline and follow-up samples from 110 randomly selected participants who were free of T2D at baseline, of whom 55 were diagnosed with T2D at the 5-year follow-up. At baseline, there was no cross-sectional association between indolepropionic acid and T2D risk (P = 0.72), but at the 5-year follow-up indolepropionic acid was lower in subjects who had developed T2D than in those who remained non-diabetic (P = 0.027); during the follow-up, indolepropionic acid levels decreased in cases whereas they increased in non-cases (Fig. 4A). Moreover, an increase in indolepropionic acid during the 5-year follow-up was inversely associated with the likelihood of developing T2D (OR: 0.31 [0.12-0.76], P = 0.01). However, neither association remained significant after controlling for BMI at baseline or at follow-up (P = 0.06 for the association of indolepropionic acid at follow-up with T2D in both models) or after controlling for BMI changes (P = 0.20 for the association of changes in indolepropionic acid with T2D).
After finding a suggestive yet significant inverse association between the change in indolepropionic acid level and the likelihood of developing T2D in Finnish men, we sought to analyse it further in a larger, independent Swedish population including both men and women. VIP is a Swedish population-based prospective cohort within which the diabetes registry DiabNorth has identified individuals with diabetes. BioDIVA (Biomarker Discovery and Validation), a case-control study nested within this cohort, comprises 503 incident T2D cases and their individually matched healthy controls. In BioDIVA, indolepropionic acid was about 15% lower in T2D cases than in their matched healthy controls at baseline (P = 0.0032), and it was negatively associated with T2D incidence (OR: 0.

Discussion
As summarized in Fig. 5, our findings show that indolepropionic acid, a gut microbial metabolite [10][11][12], is associated with a reduced likelihood of progression to T2D in overweight individuals with IGT. In persons who did not develop diabetes within the 15-year follow-up, serum indolepropionic acid was associated with better preservation of β-cell function during the initial 7-year follow-up. Additionally, indolepropionic acid was positively associated with dietary fiber intake, suggesting a link between diet, intestinal microbiota, insulin and glucose metabolism, and T2D risk. A suggestive protective role of indolepropionic acid was also found in another Finnish study, METSIM. Furthermore, these observations were replicated in a healthy Swedish population, in which baseline indolepropionic acid levels were associated with a lower likelihood of future T2D and were likewise correlated with dietary fiber intake. Interestingly, several novel PC species were also inversely associated with the development of T2D. Most of these lipids were associated with better insulin sensitivity in persons who did not develop T2D. Moreover, in line with previous studies 1,2,13,14, several amino acids and bile acids were associated with early development of T2D.
The putative protective effect of serum indolepropionic acid on the development of T2D may be explained, firstly, by its role in modulating the secretion of incretins, more specifically glucagon-like peptide (GLP)-1, from enteroendocrine L cells 15. Incretin hormones, especially GLP-1, may play a critical role in the pathogenesis of T2D 16. Secondly, indolepropionic acid has been shown to exert potent anti-oxidative stress capacity 17  possible role of this metabolite in protecting β-cells from damage associated with metabolic and oxidative stress, and possibly from amyloid accumulation 19,20. Diet is a major factor influencing the composition and metabolism of the colonic microbiota, which can elicit a wide range of systemic effects [21][22][23]. In our study, higher serum indolepropionic acid was positively associated with the intake of total fiber, mainly originating from whole grains. We confirmed this observation with serum alkylresorcinol measurements 24. Indolepropionic acid is a microbiota-produced deamination product of the amino acid tryptophan. The type of carbohydrate ingested and the pH level can affect the production of metabolites such as indole compounds by the intestinal microflora 10,11. We hypothesize that a high intake of dietary fiber and whole grain cereal products may shift the gut microbiota towards higher production of indolepropionic acid, thereby promoting preservation of insulin secretion capacity. This hypothesis fits well with previous observational findings on the protective role of fiber 25 and of a low-fat, high-complex-carbohydrate diet against T2D 21. Microbial functions such as the efficient conversion of complex indigestible dietary carbohydrates into short-chain fatty acids and the maintenance of gut microbial carbohydrate fermentation appear to be important for maintaining gut and systemic health 22,26.
We identified several LPCs and PCs that were inversely associated with T2D incidence. PCs are the major glycerophospholipids in eukaryotic cells and an essential component of cellular membranes. In animal models, LPCs have been reported to induce insulin secretion from pancreatic β-cells 27, to directly activate glucose uptake by adipocytes, and to lower blood glucose levels in models of type 1 diabetes and T2D 28. About half of the PCs identified in our study were positively associated with IS during the follow-up and inversely correlated with total saturated fat intake and circulating hsCRP levels. Therefore, the protective effect of these metabolites on T2D might occur at least partly through their influence on IS 29 and perhaps on low-grade inflammation 30,31.
The protective lipid metabolites in our study mainly contained long-chain unsaturated and odd-chain fatty acids, e.g. 15:0 and 17:0. Odd-chain fatty acids in blood phospholipids are considered biomarkers of dairy intake, although inconsistently, and have been related to reduced T2D incidence [32][33][34]. Gut microbiota is also involved in the regulation of lipid homeostasis 35; for example, some lipid species, such as triacylglycerols containing odd-chain fatty acids, are linked to certain gut microbiota and not necessarily to dietary fat intake 34. In this regard, we found that indolepropionic acid was positively correlated with odd-chain fatty acid-containing PCs, which also suggests that these metabolites could result from microbial metabolism. To date, only LPC18:2 and LPC17:0 have been reported in the literature to predict T2D 2,36-38; our findings regarding the other lipid species are therefore novel.
We also observed that certain serum amino acid metabolites, previously shown to be directly associated with T2D, insulin resistance and glucose metabolism 1-3,39-41, increased the likelihood of T2D. Similarly, several metabolites identified as bile acids were related to the development of T2D and, overall, were positively correlated with the amino acids associated with T2D, especially those putatively affecting IS. Interestingly, when addressing the likelihood of T2D separately in the control and lifestyle groups of the DPS, we found that tyrosine and proline in particular interacted with lifestyle, suggesting that the deleterious metabolism resulting in their increase, and the consequent development of T2D, may be attenuated by a lifestyle intervention that includes changes in body weight, exercise and diet quality.
Recently, the concept that bile acids can act as metabolic modulators of lipid and glucose metabolism has emerged [42][43][44]. Even though most of these bile acids lost their association with T2D after taking insulin and glucose metabolism into account, recent findings showed considerably increased levels of most of these bile acids in the plasma of T2D patients compared with healthy subjects 13. In addition, the differences in circulating bile acid levels observed in individuals at risk of T2D are likely attributable to an altered composition of the gut microbiota 45. Moreover, most of the top significant metabolites containing odd-chain fatty acids in our study were inversely associated with the bile acid metabolites related to increased T2D risk, reinforcing the notion that these lipid species may result from microbiota activity and not necessarily directly from dietary intake. Taken together, our results suggest that T2D is predicted by circulating metabolites reflecting gut microbiota composition and function. The observed inverse relationship of low-grade inflammation, estimated by hsCRP, with the protective lipid metabolites and indolepropionic acid supports this view 46,47.
Our study has some limitations. First, we did not have baseline samples available from the DPS, and therefore had to use samples collected one year after the start of the study. Additionally, insulin secretion and IS were not measured by the hyperinsulinemic-euglycemic clamp or the intravenous glucose tolerance test (IVGTT); instead, we used OGTT-derived indices, which had been validated against the IVGTT 7. Strengths of the present study include the well-characterized and homogeneous study population (obese, middle-aged individuals with IGT) and the yearly measurements of insulin secretion and sensitivity estimates during a long follow-up of a carefully conducted lifestyle intervention study. A particular strength is that we found a suggestive association between indolepropionic acid and the incidence of T2D among Finnish men in a small subsample of the METSIM study, and replicated the inverse association of indolepropionic acid with T2D risk in an independent study of Swedish men and women. Furthermore, in that study, indolepropionic acid was also associated with fiber intake. These results suggest a potential biological role for indolepropionic acid and highlight the importance of the exploratory metabolite profiling approach in bringing out novel findings that can subsequently be addressed in focused studies for replication and, eventually, validation.
In this study, we observed a link between diet (especially fiber), intestinal microbiota, insulin and glucose metabolism, and T2D risk. We therefore propose that gut microbiota-derived indolepropionic acid may have a protective role in the development of T2D. Whether indolepropionic acid mediates the association between preserved β-cell function and a lower risk of developing T2D, and whether the specific lipid metabolites exert their protective effects partly by enhancing insulin sensitivity and lowering inflammation, requires further investigation.

Methods
Study participants. The DPS was a randomized, controlled, multicenter study carried out in Finland between 1993 and 2001 (ClinicalTrials.gov NCT00518167). A total of 522 individuals with BMI > 25 kg/m², aged 40-64 years, and with IGT based on the mean values of two 75 g oral glucose tolerance tests (OGTTs) and the WHO 1985 criteria were randomly allocated to either a lifestyle intervention or a control group in five centers during 1993 to 1998 (Supplementary Fig. S1). After an intervention (active study) period lasting four years on average, the post-intervention follow-up was carried out with annual examinations. The DPS study design and methods have been reported in detail elsewhere 6,48 and are briefly described in the online supplementary methods. The study protocol was approved by the Ethics Committee of the National Public Health Institute of Helsinki, Finland. The study design and procedures were carried out in accordance with the principles of the Declaration of Helsinki. All study participants provided written informed consent.
In the DPS, the main end-point was a diagnosis of T2D defined by the WHO 1985 criteria (fasting plasma glucose ≥ 7.8 mmol/L or 2-h glucose ≥ 11.1 mmol/L), confirmed by a repeated positive OGTT and verified by a physician. At baseline and at annual visits, individuals completed a medical history questionnaire and a 3-day dietary record, and underwent a physical examination including anthropometric measurements and an OGTT 6,48. The completeness of the food records was checked by the study nutritionist at each study visit 6. In the present study, we also measured total alkylresorcinols and the C17:0/C21:0 ratio, biomarkers of whole grain intake and of relative whole grain rye intake, respectively, according to Wierzbicka et al. 49, in serum samples from the 1-year follow-up.
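The entry and end-point rules above can be expressed as simple decision functions. This is a minimal sketch assuming venous plasma glucose in mmol/L and the WHO 1985 categories (diabetes: fasting ≥ 7.8 or 2-h ≥ 11.1; IGT: fasting < 7.8 and 2-h 7.8-11.0); the function names are hypothetical, and physician verification and the other DPS procedures are not modelled.

```python
def who1985_category(fasting: float, two_hour: float) -> str:
    """Classify OGTT plasma glucose (mmol/L) by the WHO 1985 criteria."""
    if fasting >= 7.8 or two_hour >= 11.1:
        return "diabetes"
    if two_hour >= 7.8:  # fasting < 7.8 and 2-h glucose 7.8-11.0 mmol/L
        return "IGT"
    return "normal"


def dps_eligible(bmi: float, age: int, ogtt1: tuple, ogtt2: tuple) -> bool:
    """Entry rule: BMI > 25 kg/m2, age 40-64 years, and IGT on the mean of two
    OGTTs, each given as (fasting, 2-h) glucose in mmol/L."""
    fasting = (ogtt1[0] + ogtt2[0]) / 2
    two_hour = (ogtt1[1] + ogtt2[1]) / 2
    return bmi > 25 and 40 <= age <= 64 and who1985_category(fasting, two_hour) == "IGT"


def confirmed_t2d(first: tuple, repeat: tuple) -> bool:
    """End-point: a diabetic OGTT confirmed by a repeated positive OGTT
    (physician verification is not modelled here)."""
    return who1985_category(*first) == "diabetes" and who1985_category(*repeat) == "diabetes"


print(who1985_category(6.0, 9.0), dps_eligible(28.0, 55, (6.0, 8.5), (6.2, 9.5)))
```

Keeping the classification in one function makes it explicit that the same thresholds serve both for screening (on averaged OGTTs) and for the confirmed end-point.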
Individuals who remained free of T2D participated in the post-intervention follow-up study at least once 6.
Study design. The present study was designed to include a selected subgroup of 200 DPS participants (Table 1) who had fasting serum samples available from the first-year visit of the active study period (1-year follow-up). These participants were either diagnosed with T2D during the first five years of follow-up (N = 96; "cases") or remained free of T2D (N = 104; "non-cases") during the 15 years of follow-up since the beginning of the DPS (Supplementary Fig. S1). The purpose of this exploratory design was to contrast the extremes in terms of T2D development within the follow-up, in order to best characterize the early metabolic differences related to increased risk of the disease.

Laboratory determinations. Glucose and insulin levels were determined as previously described 7. In the DPS, during 1993 to 1996 a baseline 2-h OGTT (75 g oral glucose load) was performed with fasting and 2-h samples for glucose and insulin; at follow-up visits starting from mid-1996, samples were also taken for 30-min insulin and glucose measurements 7. High-sensitivity C-reactive protein was measured in fasting serum at metabolomics sampling using an IMMULITE® 2000 Systems Analyzer according to the manufacturer's instructions (Siemens Healthcare Diagnostics, Inc., Tarrytown, NY).

Calculations.
As surrogate indices of first/early-phase insulin secretion and of peripheral IS, we used the disposition index 30 (DI30) and the Matsuda index of IS (Matsuda ISI), respectively, which were calculated from the OGTT as previously validated 7,50,51.
In converters to T2D (cases), annual Matsuda ISI or DI30 values were averaged from the available yearly measurements at years 2, 3, 4 and 5; in non-converters (non-cases), the first two post-study follow-up measurements were also included.
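The index calculations and the averaging step can be sketched as follows. The Matsuda ISI uses the standard composite formula 10000 / sqrt(G0 × I0 × mean G × mean I) with glucose in mg/dL and insulin in mU/L; the DI30 form (insulin AUC0-30 / glucose AUC0-30 × Matsuda ISI) and the conversion factor of 18 mg/dL per mmol/L are assumptions, since the exact definitions are given in refs 7,50,51 rather than here. All function names are hypothetical.

```python
import math
from statistics import fmean

MGDL_PER_MMOL = 18.0  # approximate glucose conversion, mmol/L -> mg/dL (assumption)


def matsuda_isi(glucose: dict, insulin: dict) -> float:
    """Matsuda ISI = 10000 / sqrt(G0 * I0 * mean(G) * mean(I)).

    glucose (mmol/L) and insulin (mU/L) map OGTT minutes to values, e.g. {0:, 30:, 120:}.
    """
    g = {t: v * MGDL_PER_MMOL for t, v in glucose.items()}
    return 10000 / math.sqrt(g[0] * insulin[0] * fmean(g.values()) * fmean(insulin.values()))


def di30(glucose: dict, insulin: dict) -> float:
    """Assumed form: (insulin AUC 0-30 / glucose AUC 0-30) x Matsuda ISI."""
    ins_auc = (insulin[0] + insulin[30]) / 2 * 30  # trapezoidal rule over 0-30 min
    glu_auc = (glucose[0] + glucose[30]) / 2 * 30
    return ins_auc / glu_auc * matsuda_isi(glucose, insulin)


def average_available(yearly: dict):
    """Mean of the non-missing yearly values (None marks a missing visit)."""
    values = [v for v in yearly.values() if v is not None]
    return fmean(values) if values else None


glucose = {0: 5.0, 30: 8.0, 120: 7.0}
insulin = {0: 10.0, 30: 60.0, 120: 50.0}
print(round(matsuda_isi(glucose, insulin), 2), round(di30(glucose, insulin), 2))
print(average_available({2: 1.8, 3: None, 4: 2.2, 5: 2.0}))
```

Averaging only the available visits, as in `average_available`, means that a participant with a missed year still contributes a summary value from the remaining years.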
Non-targeted LC-MS metabolite profiling analysis. An aliquot (100 μL) of fasting serum stored at −80 °C was mixed with 400 μL of acetonitrile (ACN; VWR International, Leuven, Belgium), vortexed at maximum speed for 15 s, incubated in an ice bath for 15 min to precipitate the proteins, and centrifuged at 16 000 × g for 10 min to collect the supernatant. The supernatant was filtered through 0.2 μm PTFE filters in a 96-well plate format. Aliquots of 2 μL were taken from at least half of the plasma samples, pooled in one tube, and used as the quality control sample in the analysis. Additionally, a solvent blank was prepared in the same manner.
The samples were analyzed with a UHPLC-qTOF-MS system (Agilent Technologies, Waldbronn, Karlsruhe, Germany) consisting of a 1290 LC system, a Jetstream electrospray ionization (ESI) source, and a 6540 UHD accurate-mass qTOF spectrometer. The samples were analyzed using two chromatographic techniques, i.e. reversed-phase (RP) and hydrophilic interaction (hilic) chromatography. Data were acquired in both positive (+) and negative (−) polarity. The sample tray was kept at 4 °C during the analysis. The data acquisition software was MassHunter Acquisition B.04.00 (Agilent Technologies). The quality control and blank samples were injected at the beginning of the analysis and after every 12 samples. The analysis order of the samples was randomized. Details on the technical procedures and parameters are described in the online supplementary methods.

Scientific Reports | 7:46337 | DOI: 10.1038/srep46337

Data collection. Data  Only signals over a compound height threshold of 2500 counts and containing at least two ions were included in the compound list. The peak spacing tolerance for isotope grouping was 0.0025 m/z plus 7 ppm, with the isotope model for common organic molecules. Data files (.cef format) were exported to Mass Profiler Professional (Agilent Technologies) for peak alignment. After the initial alignment, the data were combined into one .cef file, against which the original raw data were reanalyzed. For this recursive analysis, the compound mass tolerance was ±15 ppm, the retention time tolerance ±0.2 min and the symmetric expansion value for chromatograms ±10 ppm. The resulting compounds were re-exported to Mass Profiler Professional for peak alignment and data cleanup. The numbers of collected metabolite features from RP(+), RP(−), hilic(+) and hilic(−) were 2775, 2905, 1871 and 1056, respectively.
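Two of the numeric settings above can be made concrete: the combined absolute-plus-ppm isotope tolerance, the ±15 ppm mass match of the recursive analysis, and the randomized run order with quality control and blank injections. This is a minimal sketch, not MassHunter or Mass Profiler Professional logic; the helper names, the QC-before-blank order and the extra QC after a final partial block are assumptions.

```python
import random


def isotope_spacing_tolerance(mz: float, abs_tol: float = 0.0025, ppm: float = 7.0) -> float:
    """Isotope-grouping tolerance in m/z units: 0.0025 m/z plus 7 ppm of the ion's m/z."""
    return abs_tol + mz * ppm * 1e-6


def same_compound_mass(mz_a: float, mz_b: float, ppm: float = 15.0) -> bool:
    """Recursive-analysis mass match: |difference| within +/-15 ppm of mz_b."""
    return abs(mz_a - mz_b) <= mz_b * ppm * 1e-6


def injection_sequence(samples, block: int = 12, seed=None) -> list:
    """Randomized run order with QC and blank at the start and after every 12 samples.

    The QC-before-blank order and the QC after a final partial block are assumptions.
    """
    order = list(samples)
    random.Random(seed).shuffle(order)
    sequence = ["QC", "blank"]
    for start in range(0, len(order), block):
        sequence += order[start:start + block] + ["QC", "blank"]
    return sequence


print(isotope_spacing_tolerance(500.0))
print(injection_sequence(range(1, 31), seed=0)[:16])
```

At m/z 500, for example, the isotope tolerance works out to 0.0025 + 0.0035 = 0.006 m/z, so the ppm term dominates for heavier ions.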
In the case of the BioDIVA cohort (see "Validation cohorts"), data collection and deconvolution were performed with XCMS 52. The R-based pipeline 'batchCorr' was used for alignment correction and within- and between-batch signal normalization 53.

Selection of metabolites to be identified. To account for non-normal distributions, metabolomics data were transformed using a rank-based inverse normal transformation. Logistic regression with T2D status (cases vs. non-cases) as the dependent variable, adjusted for study group, was applied. The P-values were adjusted for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) within each analytical approach. FDR-P < 0.05 was considered statistically significant and P < 0.05 nominally significant.
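The two transformations named above can be sketched in a few lines. The inverse normal transformation here assumes the common Blom offset (c = 3/8) with average ranks for ties, which the text does not specify; the Benjamini-Hochberg step is the standard step-up adjustment. Function names are hypothetical and the code is illustrative, not the authors' implementation.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal transformation (Blom offset c = 3/8 assumed).

    Ties receive their average rank; z_i = Phi^-1((r_i - c) / (n - 2c + 1)).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted P-values (step-up, monotone, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(rank_inverse_normal([3.0, 1.0, 2.0]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

To mirror "within each analytical approach", `benjamini_hochberg` would be called separately on the P-value lists from RP(+), RP(−), hilic(+) and hilic(−), rather than once on the pooled list.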
We first ranked the metabolites according to their statistical significance, applying the cut-off FDR-P < 0.05 as described above. Additionally, the relative difference in average peak area between cases and non-cases had to be at least 5%. To further remove noise and insignificant signals, only metabolic features found in at least 30 cases or controls with a peak area of > 50000 counts were considered relevant. After this filtering procedure, the numbers of statistically significant metabolite features were 243, 176, 125 and 56 in RP(+), RP(−), hilic(+) and hilic(−), respectively. Notably, in the hilic(+) mode, altogether 53 signals were related to hexose sugars, reflecting the disturbance of glucose metabolism related to T2D risk, and were not considered further. The differential metabolites were identified based on MS/MS spectral comparison with pure standard compounds, or by searching for candidate compounds in databases including the Human Metabolome Database, METLIN, ChemSpider and SciFinder, with the results verified against the MS/MS spectral features included in the databases or reported in earlier publications. In Supplementary Table S1, we present the metabolites associated with T2D that survived correction for multiple comparisons and their corresponding identification levels.
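The feature-filtering rules above can be written as a single predicate. This is a minimal sketch under two stated assumptions: the 5% relative difference is taken relative to the non-case mean, and "at least 30 cases or controls" is read as at least 30 cases or at least 30 non-cases with a peak area above 50000 counts. The `Feature` type and `passes_filters` are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Feature:
    name: str
    fdr_p: float
    case_areas: list      # peak areas in T2D cases
    noncase_areas: list   # peak areas in non-cases


def passes_filters(f: Feature, fdr_cut=0.05, min_rel_diff=0.05, min_n=30, min_area=50_000) -> bool:
    """Apply the FDR, relative-difference and abundance filters to one feature."""
    if f.fdr_p >= fdr_cut:
        return False
    mean_case, mean_non = fmean(f.case_areas), fmean(f.noncase_areas)
    # Assumption: relative difference is expressed relative to the non-case mean.
    if abs(mean_case - mean_non) / mean_non < min_rel_diff:
        return False
    # Detected above the abundance threshold in >= 30 cases or >= 30 non-cases.
    n_case = sum(a > min_area for a in f.case_areas)
    n_non = sum(a > min_area for a in f.noncase_areas)
    return n_case >= min_n or n_non >= min_n


example = Feature("example", 0.01, [60000.0] * 40, [50000.0] * 40)
print(passes_filters(example))
```

Checking the cheap FDR condition first keeps the predicate fast when applied to thousands of features per analytical mode.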
For the metabolites that remained significant after adjustment for multiple testing, ANCOVA models were applied to examine the effect of each metabolite on Matsuda ISI or DI30 averaged over the subsequent follow-up years, by T2D conversion group. Cases diagnosed at the 1-year follow-up were excluded from this analysis.
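An ANCOVA of this kind is equivalent to a linear model with the metabolite as a continuous covariate and study group as a factor, run separately within cases and non-cases. The sketch below fits outcome ~ metabolite + study group by ordinary least squares and reports the metabolite coefficient with an approximate 95% interval based on the normal quantile 1.96. The covariate set, the interval method and the helper names are assumptions, not the authors' specification.

```python
import math


def solve(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares with classical standard errors."""
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - p)
    se = [math.sqrt(sigma2 * solve(xtx, [float(i == j) for i in range(p)])[j]) for j in range(p)]
    return beta, se


def metabolite_effect(metabolite, study_group, outcome):
    """ANCOVA-style model: outcome ~ metabolite + study group (0/1).

    Returns the metabolite coefficient and an approximate 95% CI (normal quantile 1.96).
    """
    X = [[1.0, m, g] for m, g in zip(metabolite, study_group)]
    beta, se = ols(X, outcome)
    b, s = beta[1], se[1]
    return b, (b - 1.96 * s, b + 1.96 * s)
```

In the DPS setting, `outcome` would be the averaged follow-up DI30 or Matsuda ISI and the function would be called once per metabolite within each conversion group; with small groups, a t-quantile would give a slightly wider interval than 1.96.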
We also investigated the associations between metabolites and T2D with logistic regression models adjusted for the characteristics that differed significantly between cases and non-cases at serum sampling, in order to control for confounding. Conditional logistic regression was used to test the association between indolepropionic acid and T2D in the BioDIVA study.
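For 1:1 matched pairs such as those in BioDIVA, the conditional likelihood of each pair depends only on the within-pair difference in the exposure, which reduces conditional logistic regression to an intercept-free logistic model on those differences. The sketch below fits that model for a single covariate by Newton-Raphson; it assumes strictly 1:1 matching and one exposure, and the function names are hypothetical.

```python
import math


def conditional_logit_1to1(case_x, control_x, max_iter=100, tol=1e-10):
    """Conditional logistic regression for 1:1 matched pairs, one covariate.

    Per pair, P(case is the case | pair) = exp(b*x_case) / (exp(b*x_case) + exp(b*x_ctrl))
    = logistic(b * (x_case - x_ctrl)), so only within-pair differences enter the likelihood.
    Returns the log odds ratio and its standard error.
    """
    d = [a - c for a, c in zip(case_x, control_x)]
    beta = 0.0
    for _ in range(max_iter):
        mu = [1.0 / (1.0 + math.exp(-beta * di)) for di in d]
        score = sum((1.0 - m) * di for m, di in zip(mu, d))
        info = sum(m * (1.0 - m) * di * di for m, di in zip(mu, d))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio with an approximate 95% Wald interval."""
    return math.exp(beta), (math.exp(beta - z * se), math.exp(beta + z * se))


b, se = conditional_logit_1to1([1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0])
print(odds_ratio_ci(b, se))
```

In this toy example three pairs favour higher exposure in the case and one the reverse, so the estimated odds ratio is 3. The maximum-likelihood estimate does not exist if every pair's difference has the same sign.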
Cross-sectional correlations between metabolites and dietary intake or hsCRP were calculated using Pearson's product-moment correlation coefficient (r). For the correlation analyses, mean daily nutrient intakes were energy-adjusted beforehand 54 because of the high correlation between energy intake and the intake of each individual nutrient. A two-sided P value < 0.05 was considered statistically significant for all described secondary analyses.
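The energy adjustment and correlation step can be sketched as follows. The document cites ref 54 without naming the adjustment method; this sketch assumes the widely used residual method (regress nutrient intake on energy intake and add the mean intake back to the residuals), followed by a plain Pearson correlation. Function names are hypothetical.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def energy_adjust_residual(nutrient, energy):
    """Residual method (assumed): regress nutrient on energy, then add the mean
    nutrient intake back to the residuals so values stay on the original scale."""
    me, mn = fmean(energy), fmean(nutrient)
    slope = sum((e - me) * (n - mn) for e, n in zip(energy, nutrient)) / sum(
        (e - me) ** 2 for e in energy
    )
    intercept = mn - slope * me
    return [n - (intercept + slope * e) + mn for n, e in zip(nutrient, energy)]


energy = [1800.0, 2000.0, 2200.0, 2500.0]
fiber = [20.0, 25.0, 22.0, 30.0]
adjusted = energy_adjust_residual(fiber, energy)
print(round(pearson_r(fiber, energy), 3), round(pearson_r(adjusted, energy), 3))
```

By construction the adjusted intakes are uncorrelated with energy intake, so a subsequent correlation with a metabolite reflects diet composition rather than total amount eaten.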

Validation cohorts.
To test whether the results obtained for indolepropionic acid could be replicated, two independent population-based studies were examined.
A random selection of 110 participants from the prospective population-based METSIM cohort, which includes 10,197 Finnish men aged 45-73 years examined in 2005-2010 (see Supplementary Methods online) 8, was analysed for the relationship between indolepropionic acid and T2D development found in the DPS. Of these 110 participants, who were free of T2D at baseline, 55 developed T2D (cases) and 55 remained free of T2D (controls) during a mean follow-up of 5.9 years. Indolepropionic acid data derived from the non-targeted metabolomics analyses were available from baseline and follow-up. Written informed consent was obtained from all study subjects. The study was approved by the Ethics Committee of the University of Eastern Finland and Kuopio University Hospital. The study design and procedures were carried out in accordance with the principles of the Declaration of Helsinki.
Additionally, a total of 503 matched case-control pairs were included in BioDIVA, which used the DiabNorth diabetes registry to form a study nested within the Västerbotten Intervention Programme (VIP) cohort, one of the sub-cohorts of the Northern Sweden Health and Disease Study (NSHDS) 9. Cases had a median follow-up time of 7 years before T2D diagnosis and were individually matched to healthy controls at baseline. Written informed consent was obtained from all study subjects. The study was approved by the regional ethical review board in Uppsala. The study design and procedures were carried out in accordance with the principles of the Declaration of Helsinki.