Diabetes-associated metabolites may aid the identification of new risk variants for type 2 diabetes. Using targeted metabolomics within a subsample of the German EPIC-Potsdam study (n = 2500), we tested previously published SNPs for their association with diabetes-associated metabolites and conducted an additional exploratory analysis using data from the exome chip including replication within 2,692 individuals from the German KORA F4 study. We identified a total of 16 loci associated with diabetes-related metabolite traits, including one novel association between rs499974 (MOGAT2) and a diacyl-phosphatidylcholine ratio (PC aa C40:5/PC aa C38:5). Gene-based tests on all exome chip variants revealed associations between GFRAL and PC aa C42:1/PC aa C42:0, BIN1 and SM (OH) C22:2/SM C18:0 and TFRC and SM (OH) C22:2/SM C16:1). Selecting variants for gene-based tests based on functional annotation identified one additional association between OR51Q1 and hexoses. Among single genetic variants consistently associated with diabetes-related metabolites, two (rs174550 (FADS1), rs3204953 (REV3L)) were significantly associated with type 2 diabetes in large-scale meta-analysis for type 2 diabetes. In conclusion, we identified a novel metabolite locus in single variant analyses and four genes within gene-based tests and confirmed two previously known mGWAS loci which might be relevant for the risk of type 2 diabetes.
Up to now numerous common risk variants (MAF ≥ 5%) for type 2 diabetes were identified by genome-wide association studies (GWAS)1, 2. However, they only explain a small proportion of the population variance1, 2. To address this missing heritability, current studies concentrate on rare variants with potentially strong effects. One study identified four low frequency and rare variants which were associated with risk of type 2 diabetes3. Furthermore, the CHARGE consortium used exome chip variants from 23 different studies and found a novel association of a low-frequency nonsynonymous variant in the GLP1R with type 2 diabetes by an initial exploratory analysis of fasting glucose and insulin and a subsequent analysis of significant results for association with type 2 diabetes4. Similarly, metabolites associated with the risk of type 2 diabetes may aid the identification of new type 2 diabetes risk variants5. GWAS of metabolomics datasets (mGWAS) identified several so-called genetically influenced metabotypes (GIMs)6,7,8,9,10 that show large genetic effect sizes explaining 10–20% of the observed variance in the metabolite traits11. In the current study population (EPIC-Potsdam), 14 single metabolites have been identified to be independently associated with risk of type 2 diabetes12. Two metabolite factors (principal components) largely explained the variance in metabolite traits and were associated with risk of type 2 diabetes in opposing directions12. Within this study, we aim to identify genetic variants which are associated with those previously identified diabetes-related metabolite traits12 and test for their association with type 2 diabetes risk. To do so, we firstly, investigated selected known mGWAS loci7, 8 for their association with diabetes-related metabolite traits (Fig. 1). Secondly, we conducted an exploratory analysis using HumanExome v1.1 Bead Array data to additionally explore the association of low-frequency (1% < MAF < 5%) and rare genetic variants (MAF < 1%) which were not captured by classical GWAS arrays (Fig. 1). Finally, we analyzed the associations between single genetic variants identified for being associated with diabetes-associated metabolite traits and risk of type 2 diabetes.
All primary analyses have been carried out within the EPIC-Potsdam study with up to 2932 participants within the diabetes analyses (Fig. 1). Results of exploratory analyses were replicated in the KORA F4 study with up to 2692 participants (Fig. 1). Baseline characteristics of both study populations can be found in Supplementary Table S1.
Eleven known mGWAS loci were associated with diabetes-related metabolite traits
We analyzed the association of 19 known mGWAS loci with 106 diabetes-associated metabolite traits (Fig. 1). These metabolite traits included single metabolites (n = 14), metabolite ratios defined on the basis of the observed metabolite network (n = 90) and two metabolite factors derived from PCA of diabetes-associated metabolites12.
Eleven of the 19 candidate SNPs from previous studies were associated with diabetes-related metabolite traits (P < 4.31 × 10−5) in EPIC-Potsdam (Supplementary Table S2). The SNP rs174547 (FADS1) was significantly associated with 36 single metabolite traits which primarily originate from the groups of acyl-alkyl-phosphatidylcholines and diacyl-phosphatidylcholines. Further statistically significant associations were found for rs9393903 (ELOVL2), rs7156144 (PLEKHH1), rs12641551 (ACSL1), rs272893 (SLC22A4/ OCTN1), rs603424 (SCD) and phosphatidylcholine ratios. Rs715 (CPS1), rs541503 (PHGDH) and rs1718306 (PAH) were associated with amino acids. Rs11158519 (SYNE2), rs364585 (SPTLC3) and rs603424 (SCD) were related to ratios of sphingomyelins. Previous reports for genotype-by-sex-interactions of rs715 (CPS1) indicating larger effects in women than in men were confirmed in EPIC-Potsdam (Supplementary Table S3)
Exploratory analysis using exome chip data identified one novel locus for diabetes-associated metabolite traits
We carried out an exploratory analysis of all common (MAF ≥ 5%), low-frequency (1% ≤ MAF < 5%) and rare variants (MAF < 1%) with an allele frequency above 0.5% (n ≈ 42.000) for association with 106 diabetes-related metabolite traits (Fig. 1). The analysis revealed twelve loci that were associated with one or multiple metabolite traits (Supplementary Table S4). As the exome chip includes previously published GWAS hits, some of our results from the exploratory analysis overlap with findings from our candidate gene studies by representing the same locus (FADS1, CPS1, SGPP1/SYNE2) or the same SNP (rs12641551(ACSL1), rs364585 (SPTLC3)). For overlapping results between candidate and exploratory analyses we calculated models including all associated SNPs (from candidate and exploratory analysis) at each locus for the top associated metabolite trait to identify independent signals (Supplementary Table S5). The exome chip variant rs174550 (FADS1) showed stronger associations with the PC aa C36:3/PC aa C36:4 ratio than the previously known mGWAS SNP rs174547 on the FADS1 locus (Supplementary Table S5). An intergenic exome chip variant on chromosome 2 (rs4672596) near the previously identified CPS1 locus was associated with levels of glycine and the ratio of glycine and serine (Supplementary Table S4). However, the previously identified mGWAS SNP rs715 showed stronger associations in analyses on glycine levels (Supplementary Table S5). One low-frequency variant from the exome chip (rs12881815 located within SYNE2 with MAF = 4.8%) showed significant associations with two sphingomyelin ratios (SM C16:1/PC aa C28:1 and SM (OH) C22:2/SM (OH) C14:1) (Supplementary Table S4). However, in analyses including all three variants on SYNE2 (Supplementary Table S5) with the SM C16:1/PC aa C28:1 ratio, the exome chip variant rs7157785 (SGPP1) showed stronger effects than rs12881815 (SYNE2) and stronger effects than the previously identified mGWAS SNP (rs11158519 (SYNE2)).
We aimed to replicate suggestive significant findings from EPIC-Potsdam (P < 1.64 × 10−7) in the KORA F4 study. Meta-analysis of results from both studies revealed six significant loci (P < 4.55 × 10−3) associated with one or multiple metabolite traits (Table 1). Among them was one novel locus, rs499974 (MOGAT2), that was associated with the ratio of PC aa C40:5/PC aa C38:5 (P = 6.88 × 10−15). The most deleterious variants identified within this study were located in APOE (CADD = 30.0) and REV3L (CADD = 32.0).
As previous studies observed that SNP-metabolite associations might be different between women and men, we analyzed SNP-by sex interactions (Fig. 1). Although two variants indicated significant interaction with sex in EPIC-Potsdam, these findings could not be confirmed in KORA F4 (Supplementary Table S6).
The relation of the identified SNPs with the metabolic network structures is depicted in Fig. 2.
Biologic pathway annotations
Not all variants could be linked to specific pathways within the KEGG database (Supplementary Table S7). However, with regard to the annotated gene functions plausible metabolic pathways were found for most variants. For example, FADS1 and FADS2 were linked to “Biosynthesis of unsaturated fatty acids” and FADS2 was associated with “alpha-Linolenic acid metabolism” and “PPAR signalling pathway”. MOGAT2 was connected to “Glycerolipid metabolism” and “Fat digestion and absorption”.
Gene-based analyses identified four genes associated with metabolite traits
Single SNP analyses of low-frequency and rare variants are usually considered underpowered in studies of moderate sample size like ours13. Therefore, we conducted two gene-based analyses to enhance the power of our study (Fig. 1). We identified ten genes associated with metabolite traits in SKAT or burden tests (Supplementary Table S8). However, only three genes (GFRAL, BIN1, TFRC) showed nominal significant associations (P < 0.05) with metabolite traits within KORA F4 (Table 2). The association for GFRAL included three low-frequency variants (rs147652095, rs115053739, rs146300118; all with MAF = 1.8%) while BIN1 and TFRC included only common variants (Supplementary Table S9). In a second step, we considered functional annotation and restricted the gene-based analysis to those exome chip variants which are predicted to be damaging (see Methods section). This analysis revealed six different genes in EPIC-Potsdam (Supplementary Table S10) of which, only one (OR51Q1) including two low-frequency variants (rs151161477 with MAF = 1.21% and rs58283839 with MAF = 1.57%) (Supplementary Table S11) was replicated in within KORA F4 (Table 2). By excluding the top variants from the gene-based analyses (see Methods section) we found that multiple low-frequency variants within GFRAL are involved within the association, while in OR51Q1, the association is driven by the top low-frequency variant (rs58283839) (Supplementary Table S12). However, none of the single SNPs within the genes was significantly associated with metabolite traits individually (Supplementary Tables S9 and S11).
Identified genetic variants were weakly associated with risk of type 2 diabetes and other traits
Among the genes identified to be associated with diabetes-related metabolite traits by gene-based analyses, none was associated with type 2 diabetes in gene-based analyses (Supplementary Table S13).
Based on the single SNP analyses, 16 independent loci could be identified from the candidate SNP analyses and exploratory analyses (Fig. 1).
None of the 16 SNPs associated with diabetes-related metabolic traits (predicted functions are summarized in Supplementary Table S14) was additionally associated with type 2 diabetes within the EPIC-Potsdam study (Table 3). Nevertheless, nine of them (rs541503 (PHGDH), rs715 (CPS1), rs272893 (SLC22A4/OCTH1), rs9393903 (ELOVL2), rs3204953 (REV3L), rs603424 (SCD), rs174550 (FADS1), rs499974 (MOGAT2), rs7157785 (SGPP1)) showed nominally significant association with diabetes in published data from bigger GWAS consortia for type 2 diabetes (up to 110.452 individuals) or exome array analyses (up to 75.670 individuals) of which two (rs174550 (FADS1) and rs3204953 (REV3L)) were significant after correction for multiple testing by FDR (Table 3). For only four out of the nine variants (rs541503 (PHGDH), rs715 (CPS1), rs603424 (SCD), rs499974 (MOGAT2)) the expected effect was directional consistent with the one which was actually observed within the DIAGRAM consortium or the GoT2D consortium2, 14, resulting in a directional consistency of 44.4% (P = 1) in the binomial test. In addition to the association with type 2 diabetes, one variant (rs174550 (FADS1)) was significantly associated with fasting glucose based on a large-scale meta-analysis of the MAGIC (n = 58.074)15 and two (rs715 (CPS1), rs1136001 (NTAN1)) were significantly associated with BMI within large-scale GWAS meta-analysis of the GIANT consortium (n = 253.288)16 (Table 3).
Within this study, we investigated the contribution of genetic variants on diabetes-related metabolic traits to get a deeper insight into the underlying biological processes and to identify novel risk variants for this polygenic disease. We tested findings from previous GWAS and found associations with diabetes-associated metabolites for variants located in FADS1, ELOVL2, PLEKHH1, SPTLC3, ACSL1, SCD, SLC22A4, OCTN1, PHGDH 7; CPS1 and PAH 8. Furthermore, our exploratory analysis identified additional genetic variants from the exome chip, which were located in REV3L, PNLIPRP2, NTAN1, APOE, MOGAT2, and SGPP1 and of which one locus (MOGAT2) was novel for metabolite traits.
Using diabetes-associated metabolites as intermediate phenotypes provides the possibility to investigate biomarkers which are more proximal to genes and biological pathways than complex disease endpoints, ensuring higher statistical power to detect genetic associations6.
To deal with multiple testing issues, we selected a reduced panel of metabolite traits using their already published network structure17. This approach further reduces the number of outcomes to be tested compared to studies which used all possible ratios - independent of their correlation structure. Although varying heritability estimates have been found for the analyzed diabetes-associated metabolites traits ranging from low (h2 = 3%) for phenylalanine18 to moderately to high (33% ≤ h2 ≤ 59%)18 or even higher e.g. glycine (h2 = 70%)19, individual metabolic profiles are long-term conserved20, supporting their suitability for studying pathways underlying disease development.
Despite the smaller sample size in our study compared to one meta-analysis which analyzed Biocrates metabolites in 7,478 individuals9, we identified one novel locus rs499974 (MOGAT2) which was related to a ratio between two diacyl-phosphatidylcholines (PC aa C40:5 and PC aa C38:5). This variant was previously mainly shown to be associated with lipid traits (HDL-c, total cholesterol, triglycerides)21. Probably due to the limited power of our analysis, none of the single SNP findings could be related to incident type 2 diabetes in EPIC-Potsdam. Nevertheless, two SNPs (FADS1, REV3L) which were associated with metabolite traits were significant in bigger consortia for type 2 diabetes, such as DIAGRAM2 or GoT2D14. Furthermore, according to data from the MAGIC consortium15, FADS1 was related to fasting glucose. In addition, we show that the rs174550 (FADS1) is stronger related to metabolite traits than the previously known mGWAS hit (rs174547; LD with rs174550: r² = 1; D’ = 1) and remained significantly associated with a ratio of phosphatidylcholines in analyses containing both SNPs. On a functional level, we cannot consider one or another as being more relevant for diabetes risk as the CADD scores are low for both variants (rs174550: CADD = 3.57; rs174547: CADD = 6.23). A recent study identified a functional multiallelic variant (rs174557; LD with rs174550: r² = 0.778; D′ = 1) located in an AluYe5 element in intron 1 of FADS1 which affects two transcription factor binding sites with opposing effects on FADS1 expression22. This newly described variant may represent the causal variant at the FADS1 locus. The FADS1-2 locus is linked to fatty acid metabolism and biosynthesis of unsaturated fatty acids. Fatty acids itself as well as the fatty acid desaturases which catalyse the desaturation of fatty acids are known to be associated with type 2 diabetes risk23. Variants within FADS1 and FADS2 encoding the fatty acid Δ5 desaturase and Δ6 desaturase, respectively, are located in a gene family (FADS1-2-3) and belong to the most frequently identified mGWAS hits in previous studies showing associations with levels of phosphatidylcholines and sphingolipids6, 7, 24. In EPIC-Potsdam, the minor allele of the variant rs174546 within FADS1 (LD with present FADS1 variant rs174550: r² = 1; D = 1) was associated with lower Δ5 desaturase and Δ6 desaturase activity23. While a genetically-determined higher Δ5 desaturase activity tended to be associated with lower type 2 diabetes risk after adjustment for the other estimated desaturase activity, a genetically-determined high Δ6 desaturase activity was significantly associated with higher risk23. As this gene cluster is characterized by high LD, it is difficult to assign effects of single genetic variants located at this locus to one or another desaturase activity. Therefore, observed associations for FADS1 in the present study might be attenuated by confounding by LD of FADS2 and vice versa.
The second finding was a missense variant REV3L (rs3204953, encoding p. Val3064Ile) belonging to the 0.1% most deleterious variants within the human genome (CADD = 32.0)25 which was associated with a ratio between tyrosine and methionine and showed significant association with type 2 diabetes in databases after correction for multiple testing. While this variant is common among Europeans (MAF = 17%), it is rather rare in other populations: Africans (MAF = 0.005), East Asians (MAF = 0.001)26.
According to the PhenoScanner database27, rs3204953 was associated with alanine to tyrosine ratio (p = 2.98 × 10−11)28, schizophrenia (0.07 for the effect allele (T); p = 2.14 × 10−5)29 and age at menopause (beta = 0.15 for the effect allele (T); p = 1.10 × 10−7)30 in previous GWAS. The latter trait is itself related to type 2 diabetes31. On the one hand, factors such as sex-hormones are discussed in this context32, on the other hand, early menopause might represent a marker for premature aging31 as it has been related to DNA damage repair30. In this context, REV3L encoding the REV3L (REV3 Like, DNA Directed Polymerase Zeta Catalytic Subunit), a specialized DNA polymerase33, was linked to the Fanconi anemia pathway in the KEGG database representing an essential pathway for the DNA repair of interstrand cross-links34. More specifically, this enzyme is involved in the translesion DNA synthesis, one essential step within the Fanconi anemia pathway, which represents one of the cellular mechanisms for DNA damage tolerance or post-replication repair34. For example, Singh et al. could show that the human REV3L maintains the integrity of the mitochondrial genome by affecting the mtDNA metabolism and that inactivation of Rev3 leads to mitochondrial dysfunction35. There is quite some evidence that DNA damage induced by oxidative stress is relevant in type 2 diabetes and might additionally represent a mechanistic link with cancer36; however, whether genetic markers in REV3L might play a role in this context needs to be studied.
We applied gene-based tests which have been shown to be better powered than single variant analyses13 and a decreased number of tests is carried out by grouping variants based on genes. In these analyses, we identified associations between metabolite outcomes and four genes (GFRAL, BIN1, TFRC and OR51Q1). However, none of the identified genes was linked to type 2 diabetes; neither in EPIC-Potsdam nor in previous studies. Although we found associations with the genes and metabolite traits, it is striking that none of the single variants located in the genes showed any association with the metabolites of interest which further impedes interpretation. Another issue is that compared to our single SNP findings, replication of the gene-based results in KORA F4 was less successful and some of the replicated genes included only common variants. Although, we checked the allele frequencies between both cohorts; still, some minor differences might have affected the gene-based results.
Even though our study identified some interesting candidates which were linked to relevant pathways of type 2 diabetes, all identified SNPs in the single SNP analysis were common variants and no low-frequency or rare variants were identified, which is plausible due to the limited power of our analysis.
In general, SNPs can be used to assess the causality of observed associations between biomarkers and complex traits such as type 2 diabetes by Mendelian Randomisation studies. However, Mendelian randomisation studies rely on a number of assumptions of which one is that “the genotype is associated with the outcome through the studied exposure only”37 or simply said that there are no pleiotropic effects of the SNP. However, many of the variants in our study, e.g. FADS1, SPTLC3 and APOE were associated with different metabolite traits in addition to other biomarkers of lipid metabolism. The observational design of our study does not allow to discriminate if those different effects are all in a causal chain or related via independent physiological mechanisms. Therefore, we did not investigate causal SNP effects of the analysed genetic variants and type 2 diabetes by Mendelian Randomization. However, if we compare our observations to the assumptions of Mendelian randomization, we have to acknowledge that our observations would be more in line with a non-causal relationship between specific metabolites such as certain ratios of diacyl-phosphatidylcholines (PC aa C36:3/PC aa C36:4 or PC aa C38:3/PC aa C38:4) and tyrosine to methionine ratio and type 2 diabetes, since expected and observed effect direction were inconsistent for SNPs in FADS1 and REV3L. Similarly, a recent study5 found inconsistent effect directions for two of four nominal significant variants in DIAGRAM data1 using intermediate metabolite traits to determine an expected direction of the SNP-diabetes association. However, with regard to FADS1, we have only tested the most significant metabolite ratios for their association with type 2 diabetes. Hence, other significant ratios might show diabetes associations with directional consistency.
In summary, we identified a novel metabolite locus (MOGAT2) in single variant analyses and four genes (GFRAL, BIN1, TFRC, OR51Q1) within gene-based tests and could show that two previously known mGWAS loci (FADS1 and REV3L) might be relevant for the risk of type 2 diabetes. Our findings do not clearly support the idea that specific diabetes-associated metabolite ratios (PC aa C36:3/PC aa C36:4, PC aa C38:3/PC aa C38:4 and tyrosine/methionine) may be causal traits for type 2 diabetes, but rather show that shared genetic influences on diabetes-related metabolite traits and type 2 diabetes itself exist.
Materials and Methods
The European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study consists of 27,548 participants recruited between 1994 and 1998 from the general population in Potsdam and surroundings38. The baseline examination involved a personal interview including questions on prevalent diseases, a self-administered questionnaire on socio-economic and lifestyle characteristics, interviewer-conducted anthropometric measurements and a blood sample collection39. We used a prospective case-cohort nested within the EPIC-Potsdam study, described in detail previously40. Briefly, a sub-cohort of 2,500 individuals was randomly selected from 26,444 participants who provided blood samples at baseline. Additionally, 849 incident type 2 diabetes cases were identified in the full cohort of whom 820 cases provided blood samples (Supplementary Fig. S1).
Participants with missing or implausible data on metabolite measurements (described in detail within the metabolite measurements section), prevalent diabetes, or participants with uncompleted follow-up were excluded, leaving 2283 individuals for analyses in the sub-cohort. Similar exclusion criteria were applied for type 2 diabetes cases, leaving 784 incident cases for analysis. Because the sub-cohort is representative of the full cohort at baseline, the sub-cohort included 73 individuals who developed incident type 2 diabetes during follow-up. Exclusions due to missing genotype data or outlying metabolite values were done separately in each analysis.
Cooperative Health Research in the Region of Augsburg study (KORA F4)
KORA (Cooperative Health Research in the Augsburg Region) is a research platform of independent population-based health surveys and subsequent follow-up examinations of community-dwelling adults living in the region of Augsburg in Southern Germany. Study design, sampling method and data collection have been described in detail elsewhere41. The KORA S4 survey (1999 to 2001) comprised 4,261 participants, 25 to 74 years old42. Of these, 3,080 subjects participated in the follow-up examination KORA F4 (2006 to 2008)43. The present study is based on a subsample of 2,818 participants of KORA F4 with Biocrates metabolomics data available.
All participants provided written informed consent. The EPIC-Potsdam study was approved by the ethics committee of the State of Brandenburg, Germany and the KORA study was approved by the ethics committee of the Bavarian medical association, Germany. All procedures were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
DNA-extraction, genotyping and quality control
The DNA has been extracted from buffy coats using the chemagic DNA Buffy Coat Kit special on a Chemagic Magnetic Separation Module I (PerkinElmer Chemagen technologies, Baesweiler, Germany) according to the manufacturer’s manual.
The selection of candidate SNPs (Supplementary Table S15) was based on previous studies available from the literature which were conducted within the KORA F4 study7, 8. Illig et al.7 reported 15 GWAS hits for metabolite traits which showed associations with the 14 diabetes-associated metabolite traits from EPIC-Potsdam. However, one finding from Illig et al.7 (rs7094971; SLC16A9) which was exclusively associated with phenylalanine levels and showed only a small explained variance in the linear model of 0.10 was not selected for our analysis. As Shin et al.8 found a stronger candidate for glycine levels, the hit reported by Illig et al.7 (rs2216405; CPS1) was replaced by the SNP rs715 (CPS1). Additional five candidates for glycine (n = 4) and phenylalanine (n = 1) levels were selected from Shin et al.8 based on a predefined significance threshold of <10−6. If more than one SNP was associated on a specific locus, the one showing the lowest p-value while considering LD structures between the variants was selected. By doing so we ended up with 19 candidate SNPs for the present analysis. If neither the respective variant nor a proxy SNP was present on the exome chip, the variants were genotyped within EPIC-Potsdam by KBioscience¸ Teddington, UK (http://www.kbioscience.co.uk) using KASP SNP genotyping system.
The Illumina HumanExome v1.1 Bead Array44 was used for genotyping of type 2 diabetes cases and the sub-cohort. Genotyping was performed in the Life and Brain Center in Bonn, Germany. Genotype calling and quality control (QC) were carried out using Illuminas GenomeStudio v2011.1 software suite. To improve genotype calling, all single nucleotide variants were re-clustered based on a cluster file from genotyping of 23000 women in the Women’s Genome Health Study45 genotyped with the Illumina HumanExome v1.1 Bead Array. This cluster file was generated according to the CHARGE Best Practices and Joint Calling Protocol, which was also used to derive the final dataset46. To improve the genotype calling for rare variants zCall with a threshold of 8 was applied47. Individuals with low call rate, discordant sex information (F value between 0.2 and 0.8), related or duplicated individuals (IBD > 0.185) and individuals with divergent ancestry were excluded from further analysis.
DNA was extracted from 9 ml EDTA-blood as described elsewhere48. Genotyping was done using the Illumina HumanExome v1.0 Bead Array, calling was carried out according to the CHARGE Best Practices and Joint Calling Protocol46.
Metabolite measurements in EPIC-Potsdam and KORA F4
Metabolite quantification of both studies, EPIC-Potsdam and KORA F4, was performed in the Genome Analysis Center at the Helmholtz Zentrum München. All samples have been measured using the AbsoluteIDQ TM p150 Kit (Biocrates Life Sciences AG, Innsbruck, Austria) in combination with flow injection analysis-tandem mass spectrometry (FIA-MS/MS). Serum samples of 10 µL serum were used to quantify 163 simultaneously, including 41 acylcarnitines (Cx:y), 14 amino acids, hexose (sum of six-carbon monosaccharides without distinction of isomers), 92 glycerophospholipids (lyso-, diacyl-, and acyl-alkylphosphatidylcholines), and 15 sphingomyelins. The method for sample preparation and measurement as well as the metabolite denomination was previously described in full detail49. All samples of EPIC-Potsdam have been processed in one batch50. The KORA F4 samples have been measured in three batches of approximately 1000 samples at three different time points with a recalibration of the equipment in between7, 51; therefore, all analyses calculated in KORA were further adjusted for batch effects. Metabolites showed an overall good reliability over a 4-month period, with a median intraclass correlation coefficient of 0.5750. To ensure valid measurements, metabolites below the limit of detection (n = 30) and those with very high analytical variance (n = 6) were excluded, resulting in 127 metabolites50. For the present analysis 34 diabetes-associated metabolites and 2 metabolic factors were selected12. Previous studies could show that the usage of metabolite ratios compared to single metabolites provides higher statistical power to detect significant associations with genetic variants52. Therefore, we selected all metabolite ratios out of the 34 diabetes-associated metabolites which were connected via one edge17 based on a metabolite network within EPIC-Potsdam build by Gaussian graphical modeling (Fig. 1)53. We included all single metabolites and 2 metabolites factors which were independently associated with type 2 diabetes (Fig. 1)12. Therefore, our analysis (Fig. 1) included 106 outcomes referred to as diabetes-related metabolite traits consisting of 90 metabolite ratios, 14 single metabolites and 2 metabolite factors = (list of metabolite traits in Supplementary Table S16).
Assessment of type 2 diabetes in EPIC-Potsdam
Systematic information sources for incident cases were self-reports of a type 2 diabetes diagnosis, type 2 diabetes-relevant medication, and dietary treatment due to type 2 diabetes during follow-up. Furthermore, we obtained additional information from death certificates or from random sources, such as the tumor centers, physicians, or clinics that provided assessments from other diagnoses. Although self-reports of type 2 diabetes were generally reliable, by including other sources of information, we even improved the completeness of case ascertainment. Once a participant was identified as a potential case, disease status was further verified by sending a standard inquiry form to the treating physician. Only physician- verified cases with a diagnosis of type 2 diabetes (International Classification of Diseases, 10th revision code: E11) and a diagnosis date after the baseline examination were considered confirmed incident cases of type 2 diabetes.
All data analyses were performed by using the software packages Statistical Analysis System (SAS) Enterprise Guide 6.1 with SAS version 9.4 (SAS Institute Inc., Cary, NC, USA), PLINK v1.0754 and R (version 3.1.0 (2014-04-10)).
Candidate analysis of known mGWAS hits
We selected 19 candidate SNPs that showed associations with diabetes associated metabolites in previously published mGWAS of metabolic traits (Fig. 1). Associations of the selected candidate SNPs (list in Supplementary Table S15)7, 8 with metabolite traits were investigated within the sub-cohort via linear regression models (additive genetic model; per copy of the minor allele) adjusted for age and sex using SAS version 9.4. Metabolite traits (except metabolite factors) were natural log-transformed to normalize the right-skewed distributions. After ln-transformation metabolite outliers of >4SD from the mean were excluded and metabolite traits were standardized (mean = 0; SD = 1). Correction for multiple testing was done by the Bonferroni method [significance threshold: 0.05/(19 × 61 independent metabolite traits) = 4.31 × 10−5]. The effective number of independent metabolite traits of 61 out of 106 was determined by using equation 5 of Li and Ji 200555 which is included in the matSpD R script available onlin56, 57.
Exploratory single variant association analysis was performed in PLINK v1.0754 and included all exome chip variants with MAF ≥ 0.005 (n~42,000 markers) as exposure and 106 metabolite traits defined above as outcome (Fig. 1). We considered a genomic control (GC) corrected p-value as exome chip-wide significant at P < 1.95 × 10−8 [ = 0.05/(42,000 polymorphic variants × 61 independent metabolite traits)]55. Suggestive significance threshold was defined as P < 1.64 × 10−7 [ = (1.00 × 10−5/61)]. Transformation of metabolite traits and outlier removal was identical to the replication analysis of previously identified mGWAS hits. We assumed an additive genetic model (per copy of the allele on the forward strand), adjusted for age at recruitment and sex. Identified SNPs with at least suggestive significance (P < 1.64 × 10−7) were replicated within the KORA study by applying the same linear models which were further adjusted for batch effects. Results for single variant analysis from EPIC-Potsdam and KORA F4 were pooled by meta-analysis using a random effects model with the R package meta 58 and significance was defined by correcting for eleven test (P < 4.55 × 10−3). Analyses for the top associated metabolite traits were performed to check for independent signals at each locus [Y = β1 SNP 1 + β2 SNP 2 (+β3 SNP 3 ) + β4 age + β5 sex + n]. Independently associated SNPs were selected for the diabetes analysis.
Additionally, we performed an exploratory interaction analysis for sex by modeling linear regression models adjusted for age and sex and including an interaction term (Y = β1 SNP + β2 age + β3 sex + β4 (sex × SNP) + n). Interaction terms with P < 1.64 × 10−7 were considered as suggestively significant and attempted to replicate in KORA F4. Gene-based analyses were performed with the R package SKAT 59. We applied the sequence kernel association test (SKAT-C) and burden-C test with the default options using “SKAT_CommonRare” function60 to test for associations between common and rare genetic variants in a gene and 106 metabolite traits (Fig. 1). Models were adjusted for age and sex. Correction for multiple testing was done with the Bonferroni method [suggestive significance threshold: 0.05/(number of genes with >1 variants (ranging from 7243 to 7332)) = 6.8 × 10−6 to 6.9 × 10−6]. Additional analyses considering only variants annotated as potentially damaging (based on the annotation list from the CHARGE consortium: column “sc_damaging”) were conducted. Genes showing suggestive significance in EPIC-Potsdam were replicated in KORA F4 considering p < 0.05 as replicated. Within genes including more than two variants (GFRAL, TFRC and OR51Q1), we removed the top variant(s) showing the lowest p-values in pooled single variant analyses from the gene-based analyses. By repeating the SKAT and burden-C test we determined the impact of the top variant(s) on the gene-based statistical significance. Variants were mapped to Ensembl annotation version 84 (GRCh37)26 and files provided by the CHARGE consortium46 were used for gene annotation.
Association with type 2 diabetes
We evaluated all replicated previously known and all novel exploratory identified genetic variants (n = 16) with regard to their association with type 2 diabetes risk within EPIC-Potsdam (Fig. 1). Associations between genetic variants and diabetes risk were evaluated in SAS version 9.4 using Cox regression modified for the case-cohort design according to the Prentice method61. Age was used as the underlying time scale in all Cox models, with entry time defined as the participant’s age at recruitment and exit time defined as the age at the end of follow-up based on the date of diagnosis, death, or return of the last follow-up questionnaire. The analysis was stratified for age at the baseline examination in one-year intervals. Cox models were further adjusted for sex and waist circumference.
Associations between the investigated SNPs and risk of type 2 diabetes as well as BMI and fasting glucose were looked up in the AMP-T2D Program14. This database provides information on genetic variants and their diabetes-association in consortia for type 2 diabetes using exome sequencing, exome arrays for low-frequency variants and arrays for common variants14. Additionally, GWAS meta-analysis results for 24 other traits are available14.
Correction for multiple testing was applied by using false discovery rate (FDR) according to Benjamini and Hochberg62. We defined the expected direction for the SNP-diabetes association based on the sign of the product between the SNP-metabolite trait association (e.g. rs3204953 (REV3L) → tyrosine/methionine) and the metabolite trait-type 2 diabetes association (e.g. tyrosine/methionine → type 2 diabetes risk). For the latter we selected either the metabolite trait showing the lowest p-value with the SNP or the one of the SNP-associated metabolite traits indicating significant association with type 2 diabetes risk adjusted for age and sex in EPIC-Potsdam. We tested for directional consistency between expected and observed direction by applying a binomial test in R.
Assessment of functionality of genetic variants
To estimate the relative pathogenicity of identified genetic variants, we used Combined Annotation Dependent Depletion (CADD) v1.325. This method integrates various annotations into a single measure (C score) for each genetic variant. The scaled C score ranges from 1 to 99 and ranks each variant in the genome relative to all possible 8.6 billion substitutions in the human reference genome based on the formula −10 × log10 (rank/total number of substitutions)25.
Bioinformatics annotation of genes at metabolite-associated loci
All identified SNPs were systematically linked to genes using a flanking region of at least 500 kb. A “GWAS to pathway” workflow method63 was then used to link these genes to potential pathways within the KEGG database64. We used the Taverna Workbench Bioinformatics 2.565 software to run the workflow.
Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet 44, 981–990, doi:10.1038/ng.2383 (2012).
Mahajan, A. et al. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat Genet 46, 234–244, doi:10.1038/ng.2897 (2014).
Steinthorsdottir, V. et al. Identification of low-frequency and rare sequence variants associated with elevated or reduced risk of type 2 diabetes. Nat Genet 46, 294–298, doi:10.1038/ng.2882 (2014).
Wessel, J. et al. Low-frequency and rare exome chip variants associate with fasting glucose and type 2 diabetes susceptibility. Nat Commun 6, 5897, doi:10.1038/ncomms6897 (2015).
Fall, T. et al. Non-targeted metabolomics combined with genetic analyses identifies bile acid synthesis and phospholipid metabolism as being associated with incident type 2 diabetes. Diabetologia, doi:10.1007/s00125-016-4041-1 (2016).
Gieger, C. et al. Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet 4, e1000282, doi:10.1371/journal.pgen.1000282 (2008).
Illig, T. et al. A genome-wide perspective of genetic variation in human metabolism. Nature Genetics 42, 137–141, doi:10.1038/ng.507 (2010).
Shin, S.-Y. et al. An atlas of genetic influences on human blood metabolites. Nature Genetics 46, 543–550, doi:10.1038/ng.2982 (2014).
Draisma, H. H. M. et al. Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels. Nature Communications 6, 7208, doi:10.1038/ncomms8208 (2015).
Kettunen, J. et al. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA. Nat Commun 7, 11122, doi:10.1038/ncomms11122 (2016).
Suhre, K. Metabolic profiling in diabetes. Journal of Endocrinology 221, R75–R85, doi:10.1530/joe-14-0024 (2014).
Floegel, A. et al. Identification of serum metabolites associated with risk of type 2 diabetes using a targeted metabolomic approach. Diabetes 62, 639–648, doi:10.2337/db12-0495 (2013).
Ladouceur, M., Dastani, Z., Aulchenko, Y. S., Greenwood, C. M. & Richards, J. B. The empirical power of rare variant association methods: results from sanger sequencing in 1,998 individuals. PLoS Genet 8, e1002496, doi:10.1371/journal.pgen.1002496 (2012).
AMP-T2D data base; T2D-GENES Consortium, GoT2D Consortium, DIAGRAM Consortium., http://www.type2diabetesgenetics.org/home/portalHome (21.07.2016).
Dupuis, J. et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet 42, 105–116, doi:10.1038/ng.520 (2010).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206, doi:10.1038/nature14177 (2015).
Krumsiek, J. et al. Network-based metabolite ratios for an improved functional characterization of genome-wide association study results. preprint, doi:10.1101/048512 (2016).
Shin, S. Y. et al. Interrogating causal pathways linking genetic variants, small molecule metabolites, and circulating lipids. Genome Med 6, 25, doi:10.1186/gm542 (2014).
Yet, I. et al. Genetic Influences on Metabolite Levels: A Comparison across Metabolomic Platforms. PLoS One 11, e0153672, doi:10.1371/journal.pone.0153672 (2016).
Yousri, N. A. et al. Long term conservation of human metabolic phenotypes and link to heritability. Metabolomics 10, doi:10.1007/s11306-014-0629-y (2014).
Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat Genet 45, 1274–1283, doi:10.1038/ng.2797 (2013).
Pan, G. et al. PATZ1 down-regulates FADS1 by binding to rs174557 and is opposed by SP1/SREBP1c. Nucleic Acids Res, doi:10.1093/nar/gkw1186 (2016).
Kröger, J. et al. Erythrocyte membrane phospholipid fatty acids, desaturase activity, and dietary fatty acids in relation to risk of type 2 diabetes in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam Study. Am J Clin Nutr 93, 127–142, doi:10.3945/ajcn.110.005447 (2011).
Demirkan, A. et al. Genome-Wide Association Study Identifies Novel Loci Associated with Circulating Phospho- and Sphingolipid Concentrations. PLoS Genetics 8, e1002490, doi:10.1371/journal.pgen.1002490 (2012).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics 46, 310-315, doi:10.1038/ng.2892 (2014).
Cunningham, F. et al. Ensembl. Nucleic Acids Res 43, D662–669, doi:10.1093/nar/gku1010 (2015).
Staley, J. R. et al. PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 32, 3207–3209, doi:10.1093/bioinformatics/btw373 (2016).
Kettunen, J. et al. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nat Genet 44, 269–276, doi:10.1038/ng.1073 (2012).
Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427, doi:10.1038/nature13595 (2014).
Day, F. R. et al. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat Genet 47, 1294–1303, doi:10.1038/ng.3412 (2015).
Brand, J. S. et al. Age at menopause, reproductive life span, and type 2 diabetes risk: results from the EPIC-InterAct study. Diabetes Care 36, 1012–1019, doi:10.2337/dc12-1020 (2013).
Ding, E. L., Song, Y., Malik, V. S. & Liu, S. Sex differences of endogenous sex hormones and risk of type 2 diabetes: a systematic review and meta-analysis. JAMA 295, 1288–1299, doi:10.1001/jama.295.11.1288 (2006).
Gibbs, P. E., McGregor, W. G., Maher, V. M., Nisson, P. & Lawrence, C. W. A human homolog of the Saccharomyces cerevisiae REV3 gene, which encodes the catalytic subunit of DNA polymerase zeta. Proc Natl Acad Sci USA 95, 6876–6880 (1998).
Kim, H. & D’Andrea, A. D. Regulation of DNA cross-link repair by the Fanconi anemia/BRCA pathway. Genes Dev 26, 1393–1408, doi:10.1101/gad.195248.112 (2012).
Singh, B. et al. Human REV3 DNA Polymerase Zeta Localizes to Mitochondria and Protects the Mitochondrial Genome. PLoS One 10, e0140409, doi:10.1371/journal.pone.0140409 (2015).
Lee, S. C. & Chan, J. C. Evidence for DNA damage as a biological link between diabetes and cancer. Chin Med J (Engl) 128, 1543–1548, doi:10.4103/0366-6999.157693 (2015).
Boef, A. G., Dekkers, O. M. & le Cessie, S. Mendelian randomization studies: a review of the approaches used and the quality of reporting. Int J Epidemiol 44, 496–511, doi:10.1093/ije/dyv071 (2015).
Kroke, A. et al. Measures of Quality Control in the German Component of the EPIC Study. Ann Nutr Metab 43, 216–224 (1999).
Boeing, H., Wahrendorf, J. & Becker, N. EPIC-Germany–A source for studies into diet and risk of chronic diseases. European Investigation into Cancer and Nutrition. Ann Nutr Metab 43, 195–204, 2786 (1999).
Stefan, N. et al. Plasma Fetuin-A Levels and the Risk of Type 2 Diabetes. Diabetes 57, 2762–2767, doi:10.2337/db08-0538 (2008).
Holle, R., Happich, M., Löwel, H. & Wichmann, H. E. KORA–a research platform for population based health research. Gesundheitswesen 67(Suppl 1), S19–25, doi:10.1055/s-2005-858235 (2005).
Meisinger, C. et al. Prevalence of undiagnosed diabetes and impaired glucose regulation in 35-59-year-old individuals in Southern Germany: the KORA F4 Study. Diabet Med 27, 360–362, doi:10.1111/j.1464-5491.2009.02905.x (2010).
Rathmann, W. et al. Incidence of Type 2 diabetes in the elderly German population and the effect of clinical and lifestyle risk factors: KORA S4/F4 cohort study. Diabet Med 26, 1212–1219, doi:10.1111/j.1464-5491.2009.02863.x (2009).
Exome Chip Design, http://genome.sph.umich.edu/wiki/Exome_Chip_Design (14.04.2015).
Ridker, P. M. et al. Rationale, design, and methodology of the Women’s Genome Health Study: a genome-wide association study of more than 25,000 initially healthy american women. Clin Chem 54, 249–255, doi:10.1373/clinchem.2007.099366 (2008).
Grove, M. L. et al. Best practices and joint calling of the HumanExome BeadChip: the CHARGE Consortium. PLoS One 8, e68095, doi:10.1371/journal.pone.0068095 (2013).
Goldstein, J. I. et al. zCall: a rare variant caller for array-based genotyping: genetics and population analysis. Bioinformatics 28, 2543–2545, doi:10.1093/bioinformatics/bts479 (2012).
Grallert, H. et al. APOA5 variants and metabolic syndrome in Caucasians. J Lipid Res 48, 2614–2621, doi:10.1194/jlr.M700011-JLR200 (2007).
Römisch-Margl, W. et al. Procedure for tissue sample preparation and metabolite extraction for high-throughput targeted metabolomics. Metabolomics 8, 133–142 (2012).
Floegel, A. et al. Reliability of serum metabolite concentrations over a 4-month period using a targeted metabolomic approach. PLoS One 6, e21103, doi:10.1371/journal.pone.0021103 (2011).
Jourdan, C. et al. Associations between thyroid hormones and serum metabolite profiles in an euthyroid population. Metabolomics 10, 152–164, doi:10.1007/s11306-013-0563-4 (2014).
Petersen, A.-K. et al. On the hypothesis-free testing of metabolite ratios in genome-wide and metabolome-wide association studies. BMC Bioinformatics 13, 120, doi:10.1186/1471-2105-13-120 (2012).
Floegel, A. et al. Linking diet, physical activity, cardiorespiratory fitness and obesity to serum metabolite networks: findings from a population-based study. International Journal of Obesity 38, 1388–1396, doi:10.1038/ijo.2014.39 (2014).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559–575, doi:10.1086/519795 (2007).
Li, J. & Ji, L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 95, 221–227, doi:10.1038/sj.hdy.6800717 (2005).
Nyholt, D. R. http://neurogenetics.qimrberghofer.edu.au/matSpD/ (27.01.2017).
Nyholt, D. R. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet 74, 765–769, doi:10.1086/383251 (2004).
Schwarzer, G. General Package for Meta-Analysis version 4.3-2, https://cran.r-project.org/web/packages/meta/.
Lee, S. S., Miropolsky, L. & Wu, M. SNP-set (Sequence) Kernel Association Test version 1.0.1, http://cran.r-project.org/web/packages/SKAT/ (23.09.2015).
Ionita-Laza, I., Lee, S., Makarov, V., Buxbaum Joseph, D. & Lin, X. Sequence Kernel Association Tests for the Combined Effect of Rare and Common Variants. The American Journal of Human Genetics 92, 841–853, doi:10.1016/j.ajhg.2013.04.015 (2013).
Prentice, R. L. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11 (1986).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society 57, 289–300 (1995).
Hettne, K. M. et al. Structuring research methods and data with the research object model: genomics workflows as a case study. J Biomed Semantics 5, 41, doi:10.1186/2041-1480-5-41 (2014).
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40, D109–114, doi:10.1093/nar/gkr988 (2012).
Wolstencroft, K. et al. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research 41, W557–W561, doi:10.1093/nar/gkt328 (2013).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498–2504, doi:10.1101/gr.1239303 (2003).
We thank the Human Study Centre (HSC) of the German Institute of Human Nutrition Potsdam-Rehbrücke, namely the trustee and the data hub for the processing, and the participants for the provision of the data, the biobank for the processing of the biological samples and the head of the HSC, Manuela Bergmann, for the contribution to the study design and leading the underlying processes of data generation. Furthermore, we are grateful to the field staff in Augsburg and Munich involved in the conduct of the KORA studies and we also thank Julia Scarpa, Arsin Sabunchi and Dr. Werner Römisch-Margl for metabolomics measurements performed at the Helmholtz Zentrum München, Genome Analysis Center, Metabolomics Platform. Special thanks belong to Kristina M. Hettne (Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands) for her help in applying her “GWAS to pathway” workflow method to our data and Stefan Herms (Division of Medical Genetics, Department of Biomedicine, University of Basel, Basel, Switzerland; Department of Genomics, Life and Brain Center, Bonn, Germany; Institute of Human Genetics, University of Bonn, Bonn, Germany) for his assistance in quality control of the EPIC-Potsdam exome chip data.
The authors declare that they have no competing interests.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
About this article
Cite this article
Jäger, S., Wahl, S., Kröger, J. et al. Genetic variants including markers from the exome chip and metabolite traits of type 2 diabetes. Sci Rep 7, 6037 (2017). https://doi.org/10.1038/s41598-017-06158-3
This article is cited by
Reviews in Endocrine and Metabolic Disorders (2021)
Identification of novel biomarkers affecting the metastasis of colorectal cancer through bioinformatics analysis and validation through qRT-PCR
Cancer Cell International (2020)
Scientific Reports (2019)
Identification of novel alleles associated with insulin resistance in childhood obesity using pooled-DNA genome-wide association study approach
International Journal of Obesity (2018)