Type 2 diabetes (T2D) is an increasingly prevalent metabolic disorder that causes serious micro- and macrovascular complications1. In Korea, the prevalence of diabetes and its accompanying cardiovascular disease burden have continuously increased as dietary habits become more westernized2,3. Therefore, identifying novel risk factors of T2D along with well-known factors such as insulin resistance or insufficient insulin secretion is important because proper screening can lower or delay T2D development.

Unlike genes or proteins, metabolites are biomarkers of the biochemical activity and are closely related to clinical phenotypes4. They can also serve as a good indicator of enzyme activity resulting from biological process of transcription and translation, allowing the monitoring of systemic changes in biological systems. Accordingly, metabolite analysis provides a functional readout of the physiological state of phenotypes, revealing previously undetected biological mechanisms that underlie diseases and metabolic pathways5. Therefore, through metabolic profiling, we can identify individuals or populations at a high risk for developing T2D and seek methods of controlling the T2D occurrence.

Several studies have confirmed the association between metabolites and new-onset T2D. In a recent meta-analysis of eight prospective studies, several blood amino acids (leucine, valine, tyrosine, and phenylalanine) were positively associated with T2D risk, whereas others (glycine and glutamine) were negatively associated with T2D risk6. The European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam cohort identified serum metabolites such as hexoses, phenylalanine, and diacyl-phosphatidylcholines (C32:1, C36:1, C38:3 and C40:5) to be potential predictors of incident T2D7. Another prospective study in the Framingham Offspring Study cohort with a 12-year follow-up revealed that amino acids, namely isoleucine, leucine, valine, tyrosine, and phenylalanine, either singly or in combination increased the odds ratio for developing future diabetes8. Through similar approaches, a nested case–control design by the Framingham Heart Study cohort suggested that metabolite 2-aminoadipic acid is a strong marker of T2D onset9.

Although blood metabolite profiling of T2D pathogenesis has accumulated much knowledge in the western population7,8,9, efforts to identify predictive metabolites of new-onset T2D in the Asian population are limited. A recent metabolic analysis reported a distinctive metabolic signature, including palmitoylcarnitine, for incident T2D in the Chinese population10. Untargeted metabolomics analysis suggested a combination of six metabolites including proline, glycerol, aminomalonic acid, lysophosphatidylinositol (LPI) (16:1), 3-carboxy-4-methyl-5-propyl-2-furanpropionic acid and urea to have potentials to predict the development of T2D in Chinese population with high risk of T2D11. Another targeted metabolomics analysis identified 37 metabolites including LPI (16:1) and dihomo-γ-linolenic acid that are associated with incident T2D in a subset of the Singapore Chinese Health Study cohort12. Previous studies have identified several T2D-associated metabolites in Korean populations, but these studies were not prospective13,14. Therefore, the present study aims to identify the metabolites related to incident T2D in large populations by studying a prospective cohort in Korea. Whether the dietary pattern is related to the observed association between metabolites and incident T2D is also discussed.

Subjects and Methods

Study participants

The study population was assembled through the Korean Genome and Epidemiology Study (KoGES) (the Ansan–Ansung study). The procedure and design of the KoGES-Ansan and Ansung cohorts are described elsewhere15,16. Briefly, 10,030 individuals aged 40–69 years living in the Ansan (urban) and Ansung (rural) districts were recruited as the baseline in 2001–2002. The aim was to construct a genomic and epidemiologic database for examining the genetic and environmental effects on disease prevalence in Korean. The participants attended questionnaire-based interviews in a community clinic, where they were questioned on their sociodemographic information, lifestyle, health condition, and medical history. They were also subjected to anthropometric measurements and clinical examination, and biennial follow-up examinations. Our dataset was obtained from the second follow-up (2005–2006) of the KoGES study. Among the 7,515 participants, we selected 2,580 participants whose metabolite information was available. After excluding subjects with no data on diabetes (n = 20), preexisting cancer (n = 33), diabetes (n = 537), and cardiovascular diseases (n = 51) at the time of enrolment in the study, 1,939 subjects were recruited. Patients with cancer, diabetes, and cardiovascular diseases were excluded because their medical treatments could alter their metabolic profiles (Fig. 1). All study participants gave their informed consent. The study protocol was approved by the Institutional Review Board of the Korea Centers for Disease Control and Prevention17,18, and by the Institute Review Board at the Korea University (KU-IRB-16-EX-272-A-1). All experiments were performed in accordance with relevant guidelines and regulations.

Figure 1
figure 1

Flow chart of study population.

General data and anthropometric and biochemical measurements

The demographic and behavioral information of the participants (i.e., information on age, sex, physical activity, cigarette smoking, and alcohol consumption) was obtained from survey questionnaires administered by trained interviewers. The income level (monthly household income) was divided into lowest (<1 million Korean won), lower-middle (100–199 thousand Korean won), upper-middle (200–399 thousand Korean won), and highest (≥4 million Korean won). Here 1,000 Korean won approximately corresponds to 0.9 US dollars. Smoking and drinking statuses were classified into three categories: never, former, and current. Participants were asked how long they had participated in five types of activities (sedentary, very light, light, moderate, and intense physical activities). The total metabolic equivalents (METs) were calculated by summing the METs during each type of activity (1.0 for sedentary, 1.5 for very light, 2.4 for light, 5.0 for moderate, and 7.5 for intense)19. Height and body weight were measured to the nearest 0.1 cm and 0.1 kg, respectively, while wearing light clothes without shoes. The body mass index (BMI; kg/m2) was calculated as the weight divided by the squared height. Blood pressure was repeatedly measured by a trained technician using a mercury sphygmomanometer. The participant assumed a sitting position and the blood pressure was read from the left and right arms, with a 5-minute rest between the two readings. The measurements were recorded to the nearest 2 mmHg and averaged to obtain the systolic and diastolic blood pressures. Hypertension was defined as a systolic blood pressure of  ≥140 mmHg or a diastolic blood pressure of ≥90 mmHg, a doctor-diagnosis of hypertension by the participants, or the taking of anti-hypertensive medication. To assay their fasting plasma glucose (FPG; mg/dL), fasting plasma insulin (FPI; μIU/mL) and triglycerides (TG; mg/dL), participants were required to fast for at least 8 h before providing a blood sample. The assays were performed by an automatic analyzer (ADVIA 1650 and 1680, Siemens, Tarrytown, NY, USA). Glycosylated hemoglobin (HbA1c) was measured by high-performance liquid chromatography (Variant II, Bio-Rad, Hercules, CA, USA). Homeostasis model assessment of insulin resistance (HOMA-IR) was calculated as follows20: HOMA-IR = [FPI (μIU/mL) × FPG (mg/dL)]/450.

Metabolite measurements

The serum metabolites in the 2,580 subjects were quantitatively analyzed by a targeted metabolomics approach using the AbsoluteIDQTM p180 kit (BIOCRATES Life Sciences, Innsbruck, Austria), which contains 40 acylcarnitines, 21 amino acids, 19 biogenic amines, 1 hexose, 90 glycerophospholipids, and 15 sphingolipids. The experimental procedure and sample measurements in the present study are described in detail elsewhere13,14. Briefly, 10 µL of serum was aliquoted on a 96-well plate with a filter and then centrifuged for 2 min at 100 × g. After extraction of metabolites, the extracts were analyzed by flow injection analysis/tandem mass spectrometry (FIA–MS/MS) for the analysis of acylcarnitines, hexose, glycerophospholipids and sphingolipids in both positive (acylcarnitines, glycerophospholipids, and sphingolipids) and negative (hexose) ion mode. Amino acids and biogenic amines were quantified by using liquid chromatography/tandem mass spectrometry (LC–MS/MS) in positive ion mode. Internal standards labeled with stable isotopes, such as 13C or 15N, were used as reference for the identification and quantification of all metabolites provided with the AbsoluteIDQTM p180 kit. The metabolite concentration measurement in uM units was automatically carried out with the MetValTM software package (BIOCRATES Life Sciences AG, Innsbruck, Austria), and quality assurance was performed for the quality assessment using calibration standards and QC samples included on each plate as well as reference standards as normal human pooled serum. Data quality of each metabolite was checked based on the following criteria: (1) the coefficient of variance for each metabolite in the reference standards <25%, (2) half of the analyzed metabolite concentrations in the reference standards > limit of detection, and (3) half of the analyzed metabolite concentrations in the experimental samples > limit of detection. After excluding 51 metabolites that failed the quality criteria, we finally used 135 metabolites (13 acylcarnitines, 21 amino acids, 10 biogenic amines, 1 hexose, 78 glycerophospholipids, and 12 sphingolipids) for the present study.

Dietary assessment

Dietary assessments were collected through a validated semi-quantitative food frequency questionnaire (FFQ), which records the consumption frequencies and portion sizes of 103 food items21 (see the previous study for details). The frequency of dietary consumption was divided into nine categories: never or seldom, once a month, 2-3 times a week, 1-2 times a week, 3-4 times a week, 5-6 times a week, once daily, twice daily, or more than three times daily. For the statistical analyses, the daily consumption frequencies were converted into weekly consumption frequencies. The serving size of each food item was categorized as small, medium, or large. Participants were also asked about the daily frequency of their meals: one meal a day, two meals a day, three meals a day, more than four meals a day, or irregular. From this information, we calculated the recommended food score (RFS), and hence measured the diet quality in the study population.

Recommended food score

RFS is a dietary score based on the consumption frequency of food items that are emphasized in the current dietary guidelines for Americans, following the methods of Kant et al.22. In this study, we modified RFS for consistency with the current dietary guidelines for Koreans23. Our RFS included 54 previously validated food items with the following slight modifications24,25; whole grain (two items; barley, steamed rice with barley), legumes (four items; steamed rice with cong, tofu, miso soup or soybean paste, soy milk), vegetables (20 items; green chilies, pepper leaves, spinach, lettuce, wild sesame leaves, chives, green vegetables, white radish, bellflower, onion, Chinese cabbages (excluding kimchi), cucumbers, soybean sprouts, carrots, pumpkin, young pumpkin, brackens, vegetable wrappings, oyster mushrooms, other mushrooms), fruits including juice (12 items; persimmons, tangerines, melons, bananas, pears, apples, oranges, watermelons, peaches, strawberries, grapes, tomatoes), fish (nine items; sashimi, hairtail, eel, yellow corbina, mackerel, mackerel pike, anchovy, tuna, fish cakes), seaweed (two items; laver, kelp), dairy products (three items; milk, yogurt, cheese), nuts (one item; peanuts or almonds or pine nuts) and tea (one item; green tea). One point was awarded for each food item consumed at least once a week. Participants were assigned an additional score of 1 if they ate three meals daily. Missing dietary variables including daily frequency of meals were imputed by fully conditional specification approach26. RFS was computed by summing the points for each of the 54 recommended food items listed in the FFQ, plus the daily-frequency-of-meals score. Therefore, the total score can range from 0 to 55, with a higher score indicating better diet quality.

Follow-up and incident T2D

Participants were followed up at two-yearly intervals. At each follow-up examination, the subjects were administered the questionnaire, FPG, and a 2-h oral glucose tolerance test. New-onset T2D was defined by at least one of the following criteria: self-reported diagnosed diabetes, treatment with a hypoglycemic medication, FPG levels of ≥126 mg/dL, or plasma glucose levels of ≥200 mg/dL after the oral glucose tolerance test18.

Statistical analysis

All continuous and categorical values were expressed as mean ± standard error and as number of counts, respectively. Differences among the modified RFS quartiles were determined by one-way variance and a general linear model with a Bonferroni’s multiple comparisons test, in which the possible confounding factors were sex, age, energy intake, metabolic equivalent, smoking status, household income, and educational level. The level of each metabolite was log-transformed and normalized to z scores with a mean of 0 and a standard deviation of 1. Some variables outside of the normal distribution were also log-transformed. Whether a metabolite was associated with new-onset T2D was determined by a multivariable Cox regression analysis. The association between the modified RFS and T2D risk was tested similarly. Finally, the selected metabolites (phosphatidylcholine diacyl [PC aa] 32:1 and hydroxysphingomyelin [SM(OH)] 22:2) and the dietary quality score were adjusted in the model, both alone and in combination. The hazard ratios (HRs) were obtained by a Cox proportional hazards regression model, adjusting for sex, age, BMI, educational level, household income, smoking status, drinking status, METs, total energy, consumptions of coffee, red meat and whole grain, and history of hypertension. The p value was corrected for the false discovery rate (FDR) by the Benjamini–Hochberg method (q < 0.05). The cross-sectional association between the metabolites and the modified RFS was evaluated by a multiple linear regression model, adjusting for age, sex, energy intake, METs, smoking status, drinking status, household income and educational level. All analyses were performed using Stata SE 13.0 (Stata Corp, College Station, TX, USA) with a two-sided p value of <0.05 signifying statistical significance.

Data availability

All data generated or analyzed during this study are included in this published article and its Supplementary Information files.


The average age of the 1,939 participants was 56.6 ± 0.2 years, and 892 (46.0%) were males. Table 1 displays the baseline characteristics of the study participants in the quartiles of modified RFS. The subjects in the highest modified RFS quartile were significantly younger, had a higher education and household income, smoked less and exhibited lower metabolic equivalent than subjects in the other groups. All nutrient intakes, including the percentage of macronutrients, significantly differed among the quartiles of modified RFS (Supplementary Table S1). After adjusting for confounders, high-density lipoprotein-cholesterol was significantly higher in the highest quartile of RFS.

Table 1 Baseline characteristics of the study population.

Prospective associations between metabolites and T2D risk

During the 8-year follow-up period, 282 incidents cases of new-onset T2D were identified. Figure 2 presents the metabolites prospectively associated with T2D risk after adjusting for covariates, with a strict correction for multiple comparisons. Twenty-two of the 135 metabolites were significantly related to T2D risk (with FDR-corrected p value of <0.05, Fig. 2 and Supplementary Tables S2). Levels of alanine [HR, 1.47; confidence interval (CI), 1.29–1.67], arginine (HR, 1.27; 95% CI, 1.12–1.45), isoleucine (HR, 1.35; 95% CI, 1.19–1.54), proline (HR, 1.26; 95% CI, 1.12–1.42), tyrosine (HR, 1.27; 95% CI, 1.12–1.45), and valine (HR, 1.46; 95% CI, 1.27–1.66) were positively associated with new-onset T2D. Among the biogenic amines, only spermine was negatively correlated with new-onset T2D (HR, 0.73; 95% CI, 0.66–0.82). Hexose sugar was independently related to new-onset T2D (HR, 1.76; 95% CI, 1.55–1.99). Among the choline-containing phospholipid compounds, lyso-phosphatidylcholine acyls (LysoPC a) C17:0 (HR, 0.79; 95% CI, 0.70–0.89), C18:2 (HR, 0.75; 95% CI, 0.66–0.85), PC aa C38:0 (HR, 0.80; 95% CI, 0.71–0.90), C40:1 (HR, 0.79; 95% CI, 0.70–0.90), and C42:1 (HR, 0.80; 95% CI, 0.71–0.90), along with phosphatidylcholine acyl-alkyls (PC ae) C34:3 (HR, 0.77; 95% CI, 0.69–0.87) and C36:3 (HR, 0.80; 95% CI, 0.70–0.90), were negatively related to new-onset T2D. In contrast, PC aa C32:1 (HR, 1.42; 95% CI, 1.25–1.61), C34:1 (HR, 1.32; 95% CI, 1.16–1.49), C36:1 (HR, 1.26; 95% CI, 1.11–1.42), C40:5 (HR, 1.33; 95% CI, 1.17–1.51), and C42:5 (HR, 1.24; 95% CI, 1.11–1.39) were positively associated with new-onset T2D. Two sphingomyelins (SM), SM(OH) C22:2 (HR, 0.72; 95% CI, 0.63–0.82) and SM C16:1 (HR, 0.79; 95% CI, 0.69–0.90), were negatively correlated with new-onset T2D (Fig. 2).

Figure 2
figure 2

Metabolites being associated with future Type 2 diabetes mellitus risk. Hazard ratios were obtained with cox proportional hazards regression model adjusting for sex, age, energy intake, body mass index, metabolic equivalent, smoking status, drinking status, household income, and education level, consumption of coffee, red meat, and whole grain, and history of hypertension. LysoPC a: lyso phosphatidylcholine acyl; PC aa: phosphatidylcholine diacyl; PC ae: phosphatidylcholine acyl-alkyl; SM(OH): hydroxysphingomyelin; SM: sphingomyelin.

Association between selected metabolites and biomarkers for T2D

Figure 3 illustrates the correlations between the metabolites associated with new-onset T2D risk and the established biomarkers for diabetes (HOMA-IR, HbA1C and TG). The metabolites associated with new-onset T2D were also significantly correlated with the diabetes indicators. The exceptions were PC aa C38:0 for HOMA-IR, and alanine, arginine, spermine, LysoPC a C17:0 and C18:2 for TG (Fig. 3).

Figure 3
figure 3

Correlation analysis plot of metabolites and biomarkers. *Correlation coefficients were obtained with partial correlation analysis adjusting for sex, age, energy intake, body mass index, metabolic equivalent, smoking status, drinking status, household income, and education level (p < 0.05). The value of biomarkers used in these analyses were log-transformed. LysoPC a: lyso phosphatidylcholine acyl; PC aa: phosphatidylcholine diacyl; PC ae: phosphatidylcholine acyl-alkyl; SM(OH): hydroxysphingomyelin; SM: sphingomyelin.

Associations among dietary score, selected metabolites, and incident T2D

Given that 22 metabolites were identified as predictors for incident T2D, we further tested the associations among habitual dietary quality, metabolites, and T2D incidence in this study population. As shown in Table 2, a healthier dietary pattern (as quantified by the modified RFS) was significantly negatively associated with T2D incidence (HR: 0.80 CI: 0.70–0.90) after adjusting for covariates. The cross-sectional associations between dietary score and the 22 selected metabolites are shown in Supplementary Table S3. After multiple linear regression analysis and covariate adjustment, two metabolites that were significantly associated with T2D incidence, namely, PC aa C32:1 and SM(OH) C22:2 (beta coefficients −0.28 and 0.26, respectively), were significantly associated with healthier dietary pattern (FDR-corrected p value of <0.05). We further tested whether the metabolite levels exert a mediating effect in the observed association between modified RFS and T2D incidence. The results showed that both modified RFS and metabolites significantly contribute to future T2D risk, although the magnitudes of the associations were more or less attenuated (Table 2).

Table 2 Risk of type 2 diabetes according to metabolites and diet quality score.


In the present study, various biological pathways, including amino acid and choline-containing phospholipid metabolisms, were altered in the incident T2D group of our community-based cohort sample. Recent Korean reports have implicated seven metabolites (hexose, valine, and five PC aas) in obesity and T2D. These metabolites are related to the fat mass and obesity-associated (FTO) genotype13. Altered metabolites associated with genetic loci were also identified in Korean T2D subjects14. Extending these previous analyses, which investigated only the cross-sectional association between metabolites and T2D, this study sought the metabolic profiles associated with incident T2D over an 8-year follow-up period.

Amino acids

Previous studies have identified the branched-chain amino acids (BCAAs) as among the most consistent and important metabolites in dysregulation of the peripheral glucose metabolism. BCAAs and aromatic amino acids have been implicated in insulin resistance27, obesity28, diabetes risk8,29,30 and coronary artery disease31. Similarly, increased isoleucine, valine, and tyrosine were predictive markers of future T2D risk in our present study, and were also correlated with metabolic markers of insulin resistance (HOMA-IR, HbA1C, and TG). The association between BCAAs and new-onset T2D might be explained by following mechanisms. First, the increased BCAAs may evoke catabolic materials (propionyl CoA and succinyl CoA), leading to accumulation of incompletely beta-oxidized fatty acids and glucose, impaired insulin effect, and (ultimately) disturbance of glucose control32. Second, the composition and derangement of the intestinal microbiota might induce diabetes development33. According to recent research, higher BCAA levels are associated with gastrointestinal microbiome patterns such as Prevotella copri and Bacteroides vulgatus species34, suggesting a role for the human microbiome in the association between BCAAs and T2D development. Our results imply that a healthier dietary pattern significantly reduces the likelihood of future T2D development. Thus, whether certain dietary factors interact with the potential BCAA-biosynthetic component of the gut microbiome pattern, thereby affecting BCAA levels, deserves further investigation. Finally, the mammalian target of rapamycin complex 1 (mTORC1) is important for insulin signaling and secretion and the proliferation of pancreatic beta cells. The primary regulator of mTORC1 signaling is leucine35. Long–term elevation of these BCAAs may contribute to the hyperactivation of mTORC1 and Jun N-terminal kinase signaling, presumably causing impaired insulin signaling, and subsequent early dysfunction and destruction of beta cells36,37,38. Despite the accumulating evidence that BCAAs are related to T2D pathogenesis, whether increased BCAA levels are a cause or consequence of T2D has not been elucidated. Very recently, the causal association between BCAA metabolism and T2D was clarified in a Mendelian randomization study, using genetic proxies as instrumental variables30. The authors identified the BCAA-raising polymorphisms by a genome-wide approach, and associated them with elevated T2D risk. This result supports a causal pathway of BCAA metabolism in T2D pathogenesis. In the present study, we found that higher levels of alanine and arginine increase the likelihood of incident diabetes. Alanine levels might increase when the metabolism is altered by glutamate turnover39, when alanine is released from sites other than skeletal muscle, and when alanine production is disturbed by BCAA catabolism, which is associated with increased T2D risk. Arginine plays multiple beneficial roles against metabolic abnormalities, but might also induce oxidative stress40. Collectively, we speculate that altered amino acid metabolisms predict the future risk of T2D through mechanisms involving impaired BCAA metabolism, increased oxidative stress, and increased muscle protein degradation, which precedes the development of T2D.

Biogenic amines

The biogenic amines spermine and spermidine are involved in several cellular processes, including DNA replication, RNA transcription, and protein biosynthesis41. They also trigger glucose-stimulated insulin release41; moreover, spermine is a possible glycation inhibitor42,43. Consistent with these studies, we found a negative association between blood spermine levels and new-onset T2D, emphasizing the importance of evaluating various metabolites, such as biogenic amines, in new-onset T2D prediction.


As expected, hexose level was positively associated with new-onset T2D in our present study, even after adjusting for confounding influencers of insulin resistance. Hexose comprises not only glucose, but all six-carbon monosaccharides. Increased hexose level may indicate pancreatic beta-cell dysfunction and insulin resistance. In previous studies, the six-carbon monosaccharide fructose was increased in T2D regardless of glucose level44, and higher intake of fructose (including sweetened beverages) was associated with insulin resistance and the risk of new-onset T2D45. Our result was consistent with the earlier results44,45, confirming that hexose metabolites are valuable for evaluating associations or predicting new-onset T2D in population-based samples.

Glycerophospholipids and sphingomyelin

Consistent with earlier studies6,7, we found a possible involvement of phospholipid metabolism in incident T2D among our Korean population. Specifically, two LysoPC metabolites (17:0 and 18:2), 8 PC aa metabolites (32:1, 34:1, 36:1, 38:0, 40:1, 40:5, 42:1, and 42:5), 2 PC ae (34:3 and 36:3), SM(OH) 22:2, and SM 16:1 were significant predictors of T2D risk in this prospective study. Phospholipids such as PC and SM are the dominant components of cellular membranes and play an important role in cellular signal transduction46. They also constitute most of the human lipoproteins47. The LysoPC 17:0 and 18:2 were negatively associated with incident T2D in our cohort. LysoPC a 17:0, found exclusively in dairy foods, is particularly effective in reducing T2D risk48, indicating the beneficial effect of regularly consuming dairy products. LysoPC is a signaling molecule involved in atherogenic and inflammatory processes49, of which the major function needs to be clearly elucidated. PC aa is essential for the hepatic release of TG-rich very-low-density lipoprotein46. On the other hand, PC ae is an antioxidant that inhibits lipoprotein oxidation50. Previous studies have shown that PC ae levels are reduced in obese or insulin-resistant subjects50,51. Furthermore, reduced SM synthesis is associated with increased levels of reactive oxygen species and reduced insulin secretion52. The significant contributions of LysoPC, PC, and SM to T2D development in the present study, and the correlations of these metabolites with blood phenotype (observed as expected), support the hypothesis that altered choline-containing phospholipid metabolism potentiates the development of T2D.

Several recent studies have also identified the metabolites associated with food intake, and related them to the metabolic diseases at the population level53,54,55,56,57,58. Untargeted metabolomics analysis in the subset of African Americans from the Atherosclerosis Risk in Communities (ARIC) Study cohort identified 48 pairs of significant association between diet and metabolites53. Specifically, ‘sugar-rich foods and beverages’ were associated with metabolites related to oxidative stress and lipid profiles53. Another approach to reveal the association between dietary pattern and plasma/urinary metabolites reported that O-acetylcarnitine and phenylacetylglutamine are positively associated with different food groups, red-meat and vegetable intakes, respectively54. Given that diet can be a primary and secondary (by induction of metabolic responses) source of metabolites57,58, we further tested the interrelationships among dietary quality, metabolites, and T2D incidence. In this study, only 22 metabolites were significant predictors of incident T2D. After controlling for covariates, multiple regression analysis revealed that a healthier diet was significantly associated with reduced PC aa 32:1 level and increased the SM(OH) 22:2 level. To test whether each of these two metabolites may act as a mediator in the observed negative association between dietary quality and T2D risk, both dietary score and metabolite were considered in the Cox proportional regression model. The result showed that each metabolite and dietary score remained significant in combination, although the HRs were somewhat attenuated. This indicates that a better diet quality, characterized by well-balanced nutrient intakes reduces the future T2D risk both dependently and independently of intermediate metabolites. Given that food intake and relevant nutrients possibly interact with the environment and the gut microbiome, and that these interactions would be reflected in the blood metabolites, the links among dietary intake, gut microbiota and metabolic biomarkers and their associations with T2D must be further investigated. Indeed, the consumption of PC-containing foods such as meat and eggs reportedly increases the T2D risk59, possibly through bacterial transformation of PC and consequent production of trimethylamine N-oxide60,61. In addition to the complexity and differences of etiology of T2D and environmental factors including diet, ethnic difference also should be considered to interpret metabolic profile in a specific population. Because the response to the specific stimulus (e.g. chemicals, nutrients, etc.) can be distinct depending on gender and ethnic group62, separate analysis in a certain population is required to apply the identified metabolites for planning the preventive strategy as well as for pharmacological targeting to treat T2D.

In conclusion, our study showed that metabolites relevant to amino acids, biogenic amines, spermine, hexoses, and choline-containing phospholipid metabolism are associated with incident T2D in a prospective community-based cohort study. Through targeted metabolite profiling, we elucidated the underlying biochemical pathway of T2D pathogenesis. However, many of the covered metabolites were choline-containing phospholipids with structural similarities and metabolic interrelationships, which hindered the comprehensive assessment of the metabolic alterations preceding T2D. Our results confirm that metabolites beyond the conventional T2D risk factors are important for preventing or controlling the development of new-onset T2D. With this knowledge, we can develop strategies for early intervention against average-risk T2D.