Type 2 diabetes (T2D) is a common metabolic disease characterized by disturbances in glucose and insulin metabolism. The pathogenesis of T2D is driven by both inherited and environmental factors1. There is increasing interest in the role of differential DNA methylation in the development of T2D and in glucose and insulin metabolism2,3,4,5,6. Depending on the genomic region, DNA methylation may result in gene silencing and thereby regulate gene expression and downstream cellular functions7. Differential methylation in the circulation may predict future T2D beyond traditional risk factors such as age and obesity3,8, but it may also be part of the biological mechanism that links age and/or obesity to glucose and insulin metabolism and/or T2D. A recent longitudinal study with multiple visits reported that most DNA methylation changes occur 80–90 days before detectable glucose elevation9, suggesting that differential DNA methylation may drive changes in glucose and be involved in the early stage(s) of diabetes. Differential DNA methylation is further associated with obesity, an important driver of T2D risk that also precedes the rise in glucose and insulin levels in persons developing T2D8. A key open question is whether the differential methylation associated with glucose and insulin metabolism is an irrelevant epiphenomenon, reflecting obesity acting as a statistical confounder, or whether it has functional effects, independent of obesity, that are relevant to metabolic pathology.

Here, we aim to determine the relation between differential DNA methylation and fasting glucose and insulin, as markers of early stages of diabetes pathology in non-diabetic subjects, accounting for obesity measured as body mass index (BMI). We identify and replicate nine CpG sites associated with fasting glucose (in FCRL6, SLAMF1, APOBEC3H and the 15q26.1 region) or fasting insulin (in LETM1, RBM20, IRS2, MAN2A2 and the 1q25.3 region). Using cross-omics analyses, we present in silico evidence supporting the functional relevance of these CpG sites for the development and progression of diabetes, in terms of their effects on gene expression, and we elucidate the genetic networks involved.


Epigenome-wide association analysis and replication

In the discovery phase, we performed a blood-based epigenome-wide association study (EWAS) meta-analysis of four cohorts including 4,808 non-diabetic individuals of European ancestry (Supplementary Data 1). This revealed differential DNA methylation at 28 unique CpG sites in either the baseline model without BMI adjustment or the second model with BMI adjustment (Table 1 and Supplementary Table 1): three CpG sites associated with both insulin and glucose, eight associated with fasting glucose only and 17 associated with fasting insulin only (P value < 1.3 × 10⁻⁷ in the meta-analysis). The summary statistics of the EWAS are provided as a Data file []. Of these 28 CpG sites, 13 were identified in earlier EWAS of T2D or related traits, including glucose, insulin, hemoglobin A1c (HbA1c) and homeostatic model assessment of insulin resistance (HOMA-IR)2,3,4,5,8,10,11 (Supplementary Table 1). The known CpG sites include three sites, located in SLC7A11, CPT1A and SREBF1, that are associated with both glucose and insulin. The remaining ten known CpG sites, located in DHCR24, CPT1A, RNF145, ASAM, KDM2B, MYO5C, TMEM49, ABCG1 (harboring two CpG sites) and the 4p15.33 region, are associated with insulin only. All of the previously reported glycemic CpG sites are also associated with BMI in previous EWAS8,10,12,13,14,15 (Supplementary Table 1).

Table 1 CpG sites associated with glycemic traits in the discovery phase

The 15 novel CpG sites were tested using the same statistical models in 11 independent cohorts including 11,750 non-diabetic participants from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium (Supplementary Data 1). Nine unique CpG–trait associations were replicated after Bonferroni correction for multiple testing (15 CpGs; significance threshold P value < 3.3 × 10⁻³) and were investigated in further analyses (Table 2). These include five sites (in LETM1, RBM20, IRS2, MAN2A2 and the 1q25.3 region) associated with fasting insulin and one site (in FCRL6) associated with fasting glucose in the baseline model without BMI adjustment, and three sites (in SLAMF1, APOBEC3H and the 15q26.1 region), all associated with fasting glucose, that emerged in the BMI-adjusted model. Of note, no locus was associated with fasting insulin in the BMI-adjusted model.

Table 2 CpG sites associated with glycemic traits in the replication phase

Because the replication cohorts also included individuals of African ancestry (AA, n = 4355) and Hispanic ancestry (HA, n = 577), we also performed the replication stratified by ancestry (Supplementary Data 2). Two CpG sites (cg13222915 and cg18247172) replicated in the AA population after correction for the number of tests, and two (cg00936728 and cg06229674) replicated at nominal significance. In the HA population, cg20507228 replicated at nominal significance. Two CpG sites (cg18881723 and cg13222915) showed effect estimates in the opposite direction in the HA population compared with the other two populations. However, these HA estimates were not significantly different from zero (P value = 0.63 for cg18881723 and P value = 0.092 for cg13222915).

Glycemic differential DNA methylation and transcriptomics

To determine whether the differential DNA methylation has functional effects on gene expression and downstream cellular functions, we conducted three series of analyses. Figure 1 gives an overview of the cross-omics analyses. First, we queried the Genotype-Tissue Expression (GTEx)16 database for the expression levels of the genes annotated to the novel CpG sites. The genes are expressed in a wide range of tissues, including whole blood and spleen (in particular MAN2A2 and RBM20), but also in other tissues relevant for glucose and insulin metabolism, such as subcutaneous adipose tissue, visceral (omentum) adipose tissue, liver (in particular SLAMF1, APOBEC3H, FCRL6 and RBM20), pancreas and skeletal muscle (in particular SLAMF1, APOBEC3H, FCRL6 and MAN2A2) and small intestine terminal ileum (in particular MAN2A2, RBM20, FCRL6 and APOBEC3H; Supplementary Figure 1).

Fig. 1

Overview of the cross-omics analysis. (1) Methylation quantitative trait loci (meQTL). (2) Expression quantitative trait loci (eQTL). (3) Expression quantitative trait methylation (eQTM). (4) Epigenome-wide association study (EWAS) and Mendelian randomization (MR). (5) Genome-wide association study (GWAS). (6) Association of gene expression in glucose or insulin metabolism-related tissues with glycemic traits. Results in 1, 2 and 3 were extracted from the summary statistics of the Biobank-based Integrative Omics Study (BIOS) database (n = 3814). Results in 4 are from the current EWAS (discovery phase, n = 4808; replication phase, n = 11,750) and from the two-sample Mendelian randomization based on the BIOS database (n = 3814) and GWAS results of the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC). Results in 5 are from GWAS results of MAGIC or the DIAbetes Genetics Replication And Meta-analysis consortium (DIAGRAM, n = 96,496–452,244). Results in 6 are based on the summary statistics of the Genotype-Tissue Expression project (GTEx) and MAGIC or DIAGRAM (n = 153–491)

Second, we examined the effect on gene expression in blood of the 11 independent previously identified CpG sites (cg00574958 was used for CPT1A and cg06500161 for ABCG1) and the nine novel sites from the current study, using the Biobank-based Integrative Omics Study (BIOS) database, which is part of the Biobanking and BioMolecular Infrastructure of the Netherlands (BBMRI-NL)17 (orange boxes in Fig. 2). Five CpG sites, i.e. cg00936728 (FCRL6), cg18881723 (SLAMF1), cg00574958 (CPT1A), cg11024682 (SREBF1) and cg06500161 (ABCG1), are expression quantitative trait methylation sites (eQTMs), i.e. their methylation levels are correlated with gene expression18. In most cases, the methylation levels are associated with the expression (yellow boxes in Fig. 2) of their respective annotated genes. Cg18881723 (SLAMF1) is also associated with the expression of two other genes near SLAMF1, SLAMF7 and CD244 (Supplementary Table 2).

Fig. 2

Significant associations of the cross-omics integration. The effect allele is standardized across all associations. Only associations that passed the step-specific P value threshold and showed consistent directions of effect are shown. FG fasting glucose, FI fasting insulin, T2D type 2 diabetes, HbA1c hemoglobin A1c

Third, we investigated whether the genetically regulated expression of the annotated genes in specific tissues is associated with T2D or related traits such as glucose, insulin and HbA1c. To answer this question, we mined the MetaXcan database, using genome-wide association studies (GWAS) of T2D, fasting glucose, HbA1c, fasting insulin and HOMA-IR19,20,21,22,23 as genetic proxies for the traits24. No association was found between glycemic traits and gene expression in subcutaneous adipose tissue, visceral (omentum) adipose tissue or small intestine terminal ileum. Supplementary Table 3 lists the significant findings (MetaXcan P value < 0.05) for tissues known to be implicated in glucose and insulin metabolism, including blood, liver, pancreas and skeletal muscle. As described earlier25, we associated increased expression of SREBF1 in whole blood with a decreased risk of T2D and lower HbA1c levels. Increased expression in whole blood of APOBEC3H, a methylation locus identified in the present study, is associated with higher HOMA-IR, a measure of insulin resistance. In skeletal muscle, increased expression of KDM2B and decreased expression of MAN2A2 are associated with higher fasting glucose. Moreover, increased hepatic expression of FCRL6, the gene annotated to a methylation locus associated with fasting glucose in the present study, is associated with an increased risk of T2D. In the pancreas, increased expression of MYO5C and RBM20, both annotated to methylation loci, is associated with higher fasting glucose levels.

Glycemic differential DNA methylation and genomics

Although differential DNA methylation may result from environmental exposures, it is often (partly) heritable, with genetic variants (co-)determining methylation levels26. Therefore, we next examined whether the differential methylation associated with fasting glucose and insulin levels is driven by genetic variants, referred to as methylation quantitative trait loci (meQTLs). Using the BIOS database (blood-based data)17, we could study 18 of the 20 unique CpG sites. We associated 2,991 single-nucleotide polymorphisms (SNPs) in 29 unique meQTLs (blue boxes in Fig. 2) with differential methylation in cis or trans (Supplementary Data 3). Six of these meQTLs (four cis- and two trans-acting) were also associated with T2D, fasting glucose, fasting insulin or HbA1c in earlier studies21,22,27,28,29, and the directions of effect between SNP, methylation and glycemic traits are consistent (pink boxes in Fig. 2; Supplementary Table 4). A genetic locus near TMEM61 is a common genetic driver of differential methylation at the nearby CpG cg17901584 (DHCR24) in our study and of fasting glucose levels in an earlier study22. Further, the RNF145 locus is a common driver of differential methylation at cg26403843 (RNF145) and of fasting insulin levels21. The KDM2B locus affects differential methylation at cg13708645 (KDM2B) and fasting glucose levels22, and the TOM1L2/RAI1 locus affects differential methylation at cg11024682 (SREBF1) as well as HbA1c and T2D27,28. The two trans-acting loci are a genetic locus in CCDC162P that affects differential methylation at cg20507228 (MAN2A2) and HbA1c27, and a genetic locus in RP11-16L9.4 that affects differential methylation at cg11024682 (SREBF1) and HbA1c27.

We next explored whether these genetic variants associated with differential methylation (meQTLs) are also associated with gene expression, i.e. are expression quantitative trait loci (eQTLs; see the integrated outline of analyses in Fig. 2 and details in Supplementary Table 5). We specifically searched for expression profiles previously associated with glycemic CpG sites in blood (listed in Supplementary Table 2). Three genetic variants were associated with both differential methylation and gene expression in blood: 1) rs11265282 in FCRL6 is associated with increased methylation at cg00936728 (FCRL6) and decreased FCRL6 expression in blood; 2) rs1577544 near SLAMF1 is associated with decreased methylation at cg18881723 (SLAMF1) and decreased SLAMF1 expression in blood; and 3) rs6502629 in TOM1L2 is associated with increased methylation at cg11024682 (SREBF1) and decreased SREBF1 expression in blood.

As we observed that the genetic loci driving glycemic CpG sites overlapped with genetic determinants of T2D or related traits, we studied the causal effect of differential methylation on glucose and insulin metabolism with a generalized summary statistic-based Mendelian randomization (MR) test30. For each CpG, up to eight independent genetic variants, combined into a genetic risk score, were used as the instrumental variable. Thirteen of the initial 20 CpG sites met the MR criteria and were tested (Supplementary Data 4). No significant association was detected after correction for 13 independent tests (significance threshold P value < 3.8 × 10⁻³). The genetic risk score for cg15880704 (RBM20) methylation was nominally associated with fasting insulin levels (P value = 0.04), and the genetic risk score for cg18881723 (SLAMF1) was nominally associated with fasting glucose levels (P value = 0.05).
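To make the summary-statistic logic of an MR test concrete, the sketch below computes the standard fixed-effect inverse-variance-weighted (IVW) estimate, i.e. a weighted average of per-SNP Wald ratios (SNP-outcome effect divided by SNP-methylation effect). It is a simplified stand-in, not the generalized summary statistic-based method cited above, which adds features such as handling of correlated instruments and outlier filtering that are omitted here. The function name and the toy effect sizes are hypothetical; the sketch assumes independent instruments whose alleles are already aligned across the two data sets.

```python
import numpy as np
from scipy import stats


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) MR estimate (hypothetical helper).

    beta_exposure: per-SNP effects on the exposure (here, methylation at one CpG)
    beta_outcome:  per-SNP effects on the outcome (e.g. fasting insulin), same effect allele
    se_outcome:    standard errors of beta_outcome
    Assumes mutually independent instruments (no LD) and harmonized alleles.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    sy = np.asarray(se_outcome, dtype=float)
    ratio = by / bx                  # per-SNP Wald ratio estimates
    w = bx**2 / sy**2                # first-order inverse variance of each Wald ratio
    beta = np.sum(w * ratio) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))
    p = 2.0 * stats.norm.sf(abs(beta / se))
    return beta, se, p


# Hypothetical summary statistics for three independent meQTLs of one CpG.
beta, se, p = ivw_mr([0.20, 0.30, 0.10], [0.10, 0.15, 0.05], [0.02, 0.03, 0.01])
```

With few or weak instruments the weights `w` are small, so the standard error of the pooled estimate is large; this is one way to see why MR power is limited when the meQTLs explain little of the methylation variance.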

Multi-omics integration and functional annotation

To understand the biological relevance of our findings, we first integrated the cascade of associations across genomics, epigenomics, transcriptomics and glycemic traits obtained from the EWAS, eQTM, meQTL and eQTL analyses. Three pathways emerge when requiring consistent directions of effect across the associations. One pathway involves SREBF1, which was in part reported earlier3,25,31 but is substantially extended in the current report. The other two involve differential methylation at FCRL6 and SLAMF1 (Fig. 3). The C allele of rs11265282 in FCRL6 is associated with increased methylation, which in turn is associated with lower FCRL6 expression in blood. In addition, genetically decreased FCRL6 expression in the liver is associated with a decreased risk of T2D. The T allele of rs1577544 near SLAMF1 increases methylation levels in blood, which is associated with decreased SLAMF1 expression in the circulation; this is consistent with the negative association between the genetic variant and SLAMF1 expression levels.
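The direction-consistency rule used to assemble these pathways can be sketched as follows: all effects are first expressed per copy of the same effect allele, and a SNP → CpG → expression cascade is kept only if the product of the signs of the two links matches the sign of the direct SNP → expression association. This is a minimal illustration under stated assumptions: the helper names and the effect sizes and alleles below are hypothetical (chosen to mirror the FCRL6 pattern described above, not taken from the study), and strand flips and palindromic SNPs are ignored.

```python
def sign(x):
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def harmonize(beta, effect_allele, other_allele, reference_allele):
    """Express beta per copy of reference_allele, flipping the sign if needed.

    Ignores strand flips and palindromic SNPs, which a real pipeline must handle.
    """
    if effect_allele == reference_allele:
        return beta
    if other_allele == reference_allele:
        return -beta
    raise ValueError("reference allele not found among the SNP's alleles")


def consistent_cascade(b_snp_cpg, b_cpg_expr, b_snp_expr):
    """True if sign(SNP->CpG) * sign(CpG->expression) equals sign(SNP->expression)."""
    expected = sign(b_snp_cpg) * sign(b_cpg_expr)
    return expected != 0 and expected == sign(b_snp_expr)


# Hypothetical estimates (not the study's values): the C allele raises methylation,
# and higher methylation goes with lower expression.
b_snp_cpg = harmonize(-0.12, "T", "C", reference_allele="C")   # -> +0.12 per C allele
b_cpg_expr = -0.8
b_snp_expr = harmonize(0.05, "T", "C", reference_allele="C")   # -> -0.05 per C allele
print(consistent_cascade(b_snp_cpg, b_cpg_expr, b_snp_expr))   # True
```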

Fig. 3

The cross-omics integration of CpGs in SREBF1 (a), FCRL6 (b) and SLAMF1 (c). Cascading associations across omics layers were integrated into the network. *This association refers to FCRL6 expression in the liver; all other methylation and gene expression measurements are from blood. FG fasting glucose, FI fasting insulin, T2D type 2 diabetes, HbA1c hemoglobin A1c

To examine the correlation structure of the findings, we clustered the normalized methylation values of the nine novel CpG sites, including those not annotated to a gene. Two clusters emerge: one including IRS2, MAN2A2, the 1q25.3 locus (intergenic), RBM20, LETM1 and SLAMF1, and a second including FCRL6, 15q26.1 (intergenic) and APOBEC3H (Fig. 4 and Supplementary Table 6). The four CpGs in FCRL6, 15q26.1 (intergenic), APOBEC3H and SLAMF1 are highly correlated with each other (absolute correlation coefficients > 0.6) even though they are located on different chromosomes (SLAMF1 and FCRL6 on chromosome 1, APOBEC3H on chromosome 22 and 15q26.1 on chromosome 15), suggesting a common biological mechanism. We next performed gene set enrichment analysis using different pathway databases, including KEGG pathways32, the Reactome Pathway Knowledgebase33 and the Gene Ontology (GO) biological process classification34. The genes in the first cluster are jointly enriched in multiple pathways, including regulation of leukocyte proliferation, protein secretion and cell activation (SLAMF1 and IRS2) and hexose, monosaccharide and carbohydrate metabolism (IRS2 and MAN2A2). Further, SLAMF1 (cluster 1) and APOBEC3H (cluster 2) are both enriched in immune effector processes and the innate immune response (Supplementary Table 7).
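A correlation-based hierarchical clustering of CpG sites can be sketched as below. The text states only that Pearson correlations and hierarchical clustering were used; the distance 1 − |r| (so that strongly positively or negatively correlated CpGs end up close together), average linkage and a two-cluster cut are assumptions of this sketch, and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_cpgs(meth, labels, n_clusters=2, method="average"):
    """Hierarchically cluster CpGs by the strength of their Pearson correlation.

    meth:   samples x CpGs array of normalized methylation values
    labels: one name per CpG column
    Returns a {label: cluster id} mapping and the CpG-by-CpG correlation matrix.
    """
    r = np.corrcoef(meth, rowvar=False)          # Pearson r between CpG columns
    dist = np.clip(1.0 - np.abs(r), 0.0, None)   # |r| near 1 -> distance near 0
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method=method)
    ids = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(labels, ids.tolist())), r


# Synthetic demonstration: six hypothetical CpGs driven by two latent factors.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 500))
meth = np.column_stack([f + 0.3 * rng.normal(size=500) for f in (f1, f1, f1, f2, f2, f2)])
clusters, r = cluster_cpgs(meth, ["cpg1", "cpg2", "cpg3", "cpg4", "cpg5", "cpg6"])
```

Using 1 − r instead would separate positively and negatively correlated CpGs; which choice is appropriate depends on whether the sign of the co-methylation matters for the question at hand.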

Fig. 4

Clustered correlation of the nine novel glycemic CpGs. Correlations between the novel CpG sites were assessed with Pearson's correlation test (n = 1544), and hierarchical cluster analysis was used for the clustering

BMI in the association of methylation and glycemic traits

Of note, among the 20 methylation loci associated with glycemic traits in the present analyses, 11 are associated with BMI in previous EWAS8,10,12,13,14,15. These 11 loci are all associated with insulin metabolism (Supplementary Table 1). Based on the bidirectional MR analyses performed as part of a previous EWAS of BMI8, BMI appears to drive methylation at cg06500161 (ABCG1, P value = 6.4 × 10⁻⁵), a CpG that we associated with insulin levels. Using a nominal P value threshold of 0.05 in their MR results, differential methylation appears to be a consequence rather than a cause of obesity for three other CpG sites: cg11024682 (SREBF1; P value = 4.1 × 10⁻³), cg17901584 (DHCR24; P value = 4.1 × 10⁻³) and cg26403843 (RNF145; P value = 0.011)8. Taken together (Supplementary Table 1), our results raise the question of whether BMI drives differential methylation, which in turn raises circulating insulin levels. Such a pathway would predict that the association between BMI and insulin changes when adjusting for differential methylation at ABCG1, SREBF1, DHCR24 and RNF145. We tested this hypothesis in the non-diabetic individuals of the Rotterdam Study by comparing the relationship between BMI and fasting insulin with and without adjustment for methylation levels at these four CpG sites. The variance explained (R²) by the linear regression model increases significantly from 0.40 to 0.43 (P value = 1.2 × 10⁻¹³, analysis of variance (ANOVA)) when adjusting for the four CpGs, while the effect estimate for BMI decreases by 9.2% (beta: 0.065, standard error (SE): 0.003, P value = 1.2 × 10⁻⁸² without CpG adjustment, compared with beta: 0.059, SE: 0.003, P value = 2.9 × 10⁻⁷⁰ after adjusting for the four CpGs).
When we extended the adjustment to the 16 CpG sites associated with circulating insulin levels, the variance explained by the model increased further (R² = 0.46, P value = 2.1 × 10⁻¹⁸) and the beta for BMI decreased by 16.9% relative to the unadjusted model (beta: 0.054, SE: 0.003, P value = 4.6 × 10⁻⁵⁸ for the model adjusting for 16 CpG sites).
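The comparison above has two parts: a partial F-test (ANOVA) of the nested models fasting insulin ~ BMI versus fasting insulin ~ BMI + CpGs, and the relative change in the BMI coefficient, e.g. (0.065 − 0.059)/0.065 ≈ 9.2% and (0.065 − 0.054)/0.065 ≈ 16.9%. The sketch below shows both with plain NumPy/SciPy least squares. The helper names are hypothetical, the simulated data are illustrative only, and the other covariates of the real models are omitted for brevity (an assumption of this sketch).

```python
import numpy as np
from scipy import stats


def ols(X, y):
    """OLS with an intercept; returns coefficients, residual sum of squares and residual df."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return coef, float(resid @ resid), len(y) - X1.shape[1]


def compare_nested(y, bmi, cpgs):
    """Compare y ~ BMI with y ~ BMI + CpGs using R² and a partial F-test (ANOVA)."""
    coef0, rss0, df0 = ols(bmi, y)
    coef1, rss1, df1 = ols(np.column_stack([bmi, cpgs]), y)
    tss = float(((y - y.mean()) ** 2).sum())
    k = df0 - df1                                   # number of added CpG terms
    f_stat = ((rss0 - rss1) / k) / (rss1 / df1)
    return {
        "r2_base": 1.0 - rss0 / tss,
        "r2_full": 1.0 - rss1 / tss,
        "p_anova": float(stats.f.sf(f_stat, k, df1)),
        "beta_bmi_base": float(coef0[1]),           # BMI is the first non-intercept column
        "beta_bmi_full": float(coef1[1]),
    }


def percent_attenuation(beta_unadjusted, beta_adjusted):
    """Relative change in the BMI coefficient after adjustment, in percent."""
    return 100.0 * (beta_unadjusted - beta_adjusted) / beta_unadjusted


# Simulated, hypothetical data in which part of the BMI effect runs through four CpGs.
rng = np.random.default_rng(1)
n = 2000
bmi = rng.normal(27.0, 4.0, n)
cpgs = 0.05 * bmi[:, None] + rng.normal(0.0, 0.2, (n, 4))
log_insulin = 0.05 * bmi + 0.3 * cpgs.sum(axis=1) + rng.normal(0.0, 0.5, n)
res = compare_nested(log_insulin, bmi, cpgs)
```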


The current large-scale EWAS identifies and replicates nine CpG sites associated with fasting glucose (in FCRL6, SLAMF1, APOBEC3H and the 15q26.1 region) or fasting insulin (in LETM1, RBM20, IRS2, MAN2A2 and the 1q25.3 region). Three of these CpG sites (in SLAMF1, APOBEC3H and the 15q26.1 region) are associated with fasting glucose only after adjustment for BMI as a potential confounder. We validate 13 previously reported CpG sites from 11 independent genetic loci2,3,4,5,6,8,10,12,13,14,15 and extend the understanding of why these CpG sites are associated with T2D and/or glycemic traits through comprehensive cross-omics analyses. We present in silico evidence supporting the functional relevance of the CpG sites, in terms of their effects on gene expression, and elucidate the genetic networks involved.

Our data suggest that differential methylation plays a key role in the immunological changes observed in glucose metabolism35. SLAMF1 and APOBEC3H are both enriched in immune function and the innate immune response. Methylation levels at FCRL6, 15q26.1 (intergenic), APOBEC3H and SLAMF1 were highly correlated even though these sites lie on three different chromosomes, suggesting a common pathway. SLAMF1 belongs to the immunoglobulin gene superfamily and is involved in T-cell stimulation36. APOBEC3H proteins are part of an intrinsic immune defense with potent activity against a variety of retroelements36, and in the current study APOBEC3H expression in whole blood is positively associated with HOMA-IR. FCRL6 is a distinct marker of cytotoxic effector T lymphocytes that is upregulated in diseases characterized by chronic immune stimulation36. Meanwhile, we show that decreased methylation at FCRL6 is associated with increased FCRL6 expression in blood and with higher fasting glucose. A key finding linking FCRL6 to glucose metabolism is that genetically determined lower FCRL6 expression in the liver is associated with a decreased risk of T2D. In line with a role in immune regulation and pathology37,38, the HLA region (6p22.1) harbors key meQTLs of FCRL6 (rs2523946), 15q26.1 (rs3129055 and rs4324798) and SLAMF1 (rs3129055). Of interest, in this population of non-diabetic individuals we found strong immune-related signals, particularly when adjusting for effects attributed to BMI. Remarkably, three of the four methylation loci, at SLAMF1, APOBEC3H and the 15q26.1 region, emerged in the BMI-adjusted model, suggesting that these associations were masked by confounding by BMI, which affects methylation in the direction opposite to that of insulin.

We studied the interplay between BMI, fasting glucose and insulin levels, and differential methylation in the circulation. On the one hand, we find evidence that methylation at the insulin-related CpG sites together explains up to 16.9% of the association between obesity and insulin levels. These findings are in line with a previous EWAS of BMI published in Nature, which found that methylation patterns in blood predict future diabetes8. Our study suggests that insulin is a key player underlying the association reported earlier8. On the other hand, the association between differential methylation and insulin metabolism is attenuated by up to 62% when BMI is accounted for in the model, e.g. for CpG sites in SREBF1 (62%), ASAM (56%), CPT1A (54%) and TMEM49 (52%), suggesting that the interplay between BMI, differential methylation and insulin metabolism is complex and differs across CpG sites. BMI may be a confounder of the associations for some CpGs but lie in the causal pathway for others.

To our knowledge, we report for the first time that differential methylation of IRS2 in blood is associated with fasting insulin levels. Expression levels of IRS2 (insulin receptor substrate 2) in pancreatic β-cells are associated with the onset of diabetes39,40,41. Although IRS2 expression in blood is low, we find that its blood-based methylation is associated with fasting insulin. We also identify an insulin-related locus, MAN2A2 (mannosidase alpha class 2A member 2), in our EWAS. MAN2A2 encodes an enzyme involved in forming intermediate asparagine-linked carbohydrates (N-glycans)42 and is related to hexose/monosaccharide metabolism. In addition, MAN2A2 expression in skeletal muscle is negatively associated with fasting glucose, and the meQTL of MAN2A2 (rs9374080) is associated with HbA1c27. Together, these findings suggest that the methylation or expression level of MAN2A2 may be relevant to the development of insulin resistance. Another gene of interest is the familial cardiomyopathy-related gene RBM20, which may play a role in cardiovascular complications of diabetes by mediating insulin damage in cardiac tissues43. RBM20 expression in the pancreas is also associated with fasting glucose. The meQTL for RBM20 is associated with pulse rate (P value = 4.6 × 10⁻⁵) in the UK Biobank GWAS44, and RBM20 mRNA is highly expressed in cardiac tissues45.

One limitation of our study is that the main findings are based on blood, the only accessible tissue in our epidemiological studies, which may not be representative of more disease-relevant tissues. However, the concordance of methylation between blood and adipose tissue is high for certain pathways46. DNA methylation is globally considered a relatively stable epigenetic mark that can be inherited through multiple cell divisions47,48. However, some changes can be dynamic, reflecting recent environmental exposures, and this may be site-specific. While our study provides a snapshot of associations specific to the fasting state, acute methylation changes at other CpG sites in the vicinity of IRS2 and KDM2B have been reported earlier49. Such effects may also occur at the loci presented here. Our MR analyses yield no evidence for causal effects of the CpG sites on fasting glucose or insulin. One limitation in interpreting these findings is the low power of the MR analyses, owing to our limited insight into the genetic variants driving differential methylation. For instance, seven of the 13 tested CpG sites have instrumental variables that explain less than 5% of the variance in the exposure. Further studies are needed that include additional biologically relevant tissues and perform MR based on tissue-specific meQTLs. Last but not least, cg19693031 in TXNIP has repeatedly been associated with type 2 diabetes case-control status3,50,51. Although it did not pass our predefined EWAS significance threshold, cg19693031 (TXNIP) is associated with fasting glucose in the non-diabetic population (P value = 7.6 × 10⁻⁷ in the BMI-adjusted model) if the current study is viewed as a replication of earlier findings. Of note, cg19693031 is not associated with fasting insulin (P value = 0.30 in the BMI-unadjusted model and 0.37 in the BMI-adjusted model).

In conclusion, our large-scale EWAS and replication identify nine differentially methylated sites associated with fasting glucose or insulin and show that differential methylation explains part of the association between obesity and insulin metabolism. The integrative in silico cross-omics analyses provide insights into the genetic, epigenetic and transcriptomic pathways of the glycemic loci. We also highlight that differential methylation is a key point in the involvement of the adaptive immune system in glucose homeostasis. Future studies will benefit from tissue-specific methylation and meQTL databases, which are currently the missing piece of the in silico data integration framework.


Study population

The discovery samples consisted of 4808 European individuals without diabetes from four non-overlapping cohorts: Rotterdam Study III-1 (RS III-1, n = 626), Rotterdam Study II-3 and Rotterdam Study III-2 (together referred to as RS-BIOS, n = 705), the Netherlands Twin Register (NTR, n = 2753) and the UK Adult Twin Registry (TwinsUK, n = 724). The replication sets contained up to 11,750 individuals from 11 independent CHARGE cohorts, including up to 6818 individuals of European ancestry, 4355 of African ancestry and 577 of Hispanic ancestry (Supplementary Data 1). These cohorts are the Atherosclerosis Risk in Communities (ARIC) Study, Baltimore Longitudinal Study of Aging (BLSA), Cardiovascular Health Study (CHS), Framingham Heart Study Cohort (FHS), The Genetic Epidemiology Network of Arteriopathy (GENOA), Genetics of Lipid Lowering Drugs and Diet Network (GOLDN), Hypertension Genetic Epidemiology Network (HyperGEN), Invecchiare in Chianti Study (InCHIANTI), Kooperative Gesundheitsforschung in der Region Augsburg (KORA), Women's Health Initiative - Broad Agency Award 23 (WHI-BAA23) and Women's Health Initiative - Epigenetic Mechanisms of PM-Mediated CVD (WHI-EMPC). We excluded individuals with known diabetes, fasting glucose ≥ 7 mmol/l and/or anti-diabetic treatment. All studies were approved by their respective Institutional Review Boards, and all participants provided written informed consent. Details of the studies have been reported previously; key references and a summary of the design of each study are provided in Supplementary Note 1.

Glycemic traits and covariates

Venous blood samples were obtained after an overnight fast in all discovery and replication cohorts. BMI was calculated as weight divided by height squared (kg/m²) based on clinical examinations. Smoking status was classified as current, former or never, based on questionnaires. White blood cell counts were quantified using standard laboratory techniques or estimated from methylation data using the Houseman method52. The cohort-specific measurements of glycemic traits and covariates are described in Supplementary Note 1.
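The Houseman method estimates cell-type proportions by projecting a sample's methylation profile onto reference profiles of purified cell types, constraining the proportions to be non-negative and to sum to at most one. The sketch below solves that constrained least-squares projection with SciPy's general SLSQP solver; it is not the original implementation, and the helper name, the reference matrix and the mixture proportions are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def estimate_cell_proportions(sample_beta, reference):
    """Project one sample onto reference cell-type methylation profiles (hypothetical helper).

    sample_beta: (n_cpgs,) beta values of one whole-blood sample
    reference:   (n_cpgs, n_cell_types) mean beta of each CpG in each purified cell type
    Solves min ||sample_beta - reference @ w||^2 subject to w >= 0 and sum(w) <= 1.
    """
    y = np.asarray(sample_beta, dtype=float)
    R = np.asarray(reference, dtype=float)
    k = R.shape[1]

    def loss(w):
        resid = y - R @ w
        return float(resid @ resid)

    def grad(w):
        return -2.0 * R.T @ (y - R @ w)

    result = minimize(
        loss,
        x0=np.full(k, 1.0 / k),                  # start from equal proportions
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,                 # each proportion in [0, 1]
        constraints=[{"type": "ineq", "fun": lambda w: 1.0 - np.sum(w)}],  # sum(w) <= 1
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


# Hypothetical reference: 50 discriminating CpGs x 3 cell types
# (e.g. lymphocytes, monocytes, granulocytes), noise-free mixture 55/30/10.
rng = np.random.default_rng(2)
reference = rng.uniform(0.05, 0.95, size=(50, 3))
true_w = np.array([0.55, 0.30, 0.10])
estimate = estimate_cell_proportions(reference @ true_w, reference)
```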

DNA methylation quantification

The Illumina HumanMethylation450 array was used in all discovery and replication cohorts to quantify genome-wide DNA methylation in blood samples. DNA methylation levels were reported as β values, which represent the average methylation level across cells and range from 0 (fully unmethylated) to 1 (fully methylated). Study-specific details of DNA methylation quantification, normalization and quality control procedures are provided in Supplementary Note 1.
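As an illustration of how a β value arises, the sketch below uses the common convention β = M / (M + U + 100), where M and U are the methylated and unmethylated probe intensities and the offset of 100 stabilizes the ratio at low intensities. The offset is Illumina's convention as we understand it and is an assumption here, as are the helper name and the intensities; cohort-specific preprocessing (background correction, normalization) is not shown.

```python
import numpy as np


def beta_values(methylated, unmethylated, offset=100.0):
    """Methylation beta value per probe: M / (M + U + offset) (hypothetical helper).

    methylated, unmethylated: signal intensities of the methylated (M) and
    unmethylated (U) probes; negative intensities are floored at 0.
    The offset (assumed 100) keeps the ratio stable when both intensities are low.
    """
    m = np.clip(np.asarray(methylated, dtype=float), 0.0, None)
    u = np.clip(np.asarray(unmethylated, dtype=float), 0.0, None)
    return m / (m + u + offset)


# Hypothetical intensities for three probes: unmethylated, half-methylated, methylated.
b = beta_values([0, 4950, 9900], [5000, 4950, 0])   # values 0.0, 0.495 and 0.99
```

Because of the offset, β never quite reaches 1, which matches its interpretation as a bounded proportion of methylated copies.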

Epigenome-wide association analysis and replication

All statistical analyses were performed in R using two-tailed tests. Insulin was natural log-transformed. In the discovery analysis, we first performed an EWAS in each cohort separately. Linear regression was used to test the association of glucose and insulin with each CpG site in the Rotterdam Study samples, and linear mixed models accounting for family structure were used in NTR and TwinsUK. We fitted two models in each cohort: (1) a baseline model adjusting for age, sex, technical covariates (chip array number and position on the array), white blood cell counts (lymphocytes, monocytes and granulocytes) and smoking status, and (2) a second model additionally adjusting for BMI. We removed probes with evidence of multiple mapping or containing a genetic variant at the CpG site53. The cohort-specific EWAS results for each model were then meta-analysed using inverse variance-weighted fixed-effect meta-analysis as implemented in the metafor R package54. In total, we meta-analysed 393,183 CpG sites that passed quality control in all four discovery cohorts; details of the quality control for each cohort are given in Supplementary Note 1. The association statistics of each meta-EWAS were then corrected for the genomic control factor (λ)55. We produced quantile-quantile (QQ) plots of −log10(P) to evaluate inflation of the test statistics (Supplementary Figure 2). A Bonferroni correction was used to account for multiple testing and identify epigenome-wide significant results (P < 1.3 × 10⁻⁷). We did not correct for the number of glycemic traits and models, as they are highly correlated and not independent. Genome coordinates were provided by Illumina (GRCh37/hg19), and CpG sites were annotated to genes using the Infinium HumanMethylation450 BeadChip annotation resources.
To identify independent top CpG sites, the correlation between CpG sites located in the same gene was further checked in the overall RS III-1 and RS-BIOS samples using Pearson's correlation test (n = 1544).
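The two pooling steps described above can be sketched with the textbook formulas: the inverse-variance-weighted fixed-effect estimate (each cohort weighted by 1/SE²) and genomic control, where λ is the median 1-df chi-square statistic divided by its expected null median (about 0.455) and all statistics are divided by λ. This mirrors the standard formulas rather than calling metafor; the helper names and toy inputs are hypothetical.

```python
import numpy as np
from scipy import stats


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis (hypothetical helper).

    betas, ses: arrays of shape (n_cpgs, n_cohorts) with per-cohort estimates and SEs.
    Returns the pooled beta and SE per CpG.
    """
    b = np.asarray(betas, dtype=float)
    s = np.asarray(ses, dtype=float)
    w = 1.0 / s**2                                  # inverse-variance weights
    beta = (w * b).sum(axis=1) / w.sum(axis=1)
    se = 1.0 / np.sqrt(w.sum(axis=1))
    return beta, se


def genomic_control(beta, se):
    """Estimate lambda from the median 1-df chi-square and deflate the statistics."""
    chi2 = (np.asarray(beta, dtype=float) / np.asarray(se, dtype=float)) ** 2
    lam = float(np.median(chi2) / stats.chi2.ppf(0.5, df=1))
    p_corrected = stats.chi2.sf(chi2 / lam, df=1)
    return lam, p_corrected


# Hypothetical results for two CpGs in two cohorts.
beta, se = fixed_effect_meta([[0.20, 0.40], [0.10, -0.05]], [[0.10, 0.10], [0.05, 0.10]])
```

In practice λ is estimated across all CpG sites of one meta-EWAS (here, 393,183 sites), not from a handful of results as in the toy call above.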

For associations discovered in the meta-EWAS that had not been reported previously, we attempted replication in independent samples using the same traits and regression models as in the discovery analyses. Study-specific details of the replication cohorts are provided in Supplementary Data 1 and Supplementary Note 1. Results from the replication cohorts were meta-analysed using the same methods as in the discovery analyses. A Bonferroni-corrected P value < 3.3 × 10−3 (0.05 divided by the 15 CpG sites tested) was considered significant.

Glycemic differential DNA methylation and transcriptomics

To explore whether the differential CpG sites were associated with gene expression levels in blood, we examined expression quantitative trait methylation (eQTM) associations17 in the BIOS database17 from BBMRI-NL, a European blood-based resource that catalogues meQTLs, eQTLs and eQTMs from genome-wide data on 3841 Dutch blood samples (see URLs). We looked up the gene expression probes associated with the known and replicated CpG sites. We then tested whether the expression of the genes harbouring the identified methylation sites was associated with T2D and related traits in glucose metabolism-related tissues (subcutaneous adipose, visceral omentum adipose, liver, whole blood, pancreas, skeletal muscle and small intestine terminal ileum) using the MetaXcan package24,56. MetaXcan associates gene expression with a phenotype by integrating functional data generated by large-scale efforts, such as the GTEx project16, with GWAS summary statistics for the trait. It is trained on transcriptome models in 44 human tissues from GTEx and estimates the tissue-specific effect of gene expression on phenotypes from GWAS. For this study, we used the GWAS of T2D19, fasting glucose traits21,22, fasting insulin22, HbA1c23 and HOMA-IR20. We used a nominal significance threshold (P < 0.05) because each pathway from gene expression to phenotype was tested under a separate hypothesis. Associations for genes with low prediction performance were excluded, i.e. those for which the correlation between the tissue model's predicted expression and the gene's measured expression was not significant (P > 0.05).
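To make concrete how MetaXcan turns GWAS summary statistics into a gene-level association, the sketch below implements the S-PrediXcan approximation published with MetaXcan, Z_g ≈ Σ_l w_l (σ_l/σ_g)(β_l/se_l), with σ_g² = wᵀΓw. It is an illustrative re-implementation in plain Python, not the MetaXcan software; the function name and its inputs (prediction weights, GWAS z scores and a reference-panel SNP covariance matrix Γ for one gene in one tissue) are assumptions.

```python
import math


# Illustrative sketch; spredixcan_z is a hypothetical name, not MetaXcan's API.
def spredixcan_z(weights, gwas_z, snp_cov):
    """Approximate gene-level z score for one gene in one tissue.

    weights: prediction-model weights w_l of the SNPs in the gene's expression model
    gwas_z:  GWAS z statistics (beta_l / se_l) for the same SNPs and effect alleles
    snp_cov: genotype covariance matrix Gamma of those SNPs from a reference panel
    """
    n = len(weights)
    # Variance of genetically predicted expression: sigma_g^2 = w' Gamma w
    var_g = sum(weights[i] * snp_cov[i][j] * weights[j]
                for i in range(n) for j in range(n))
    sigma_g = math.sqrt(var_g)
    # Sum of SNP z scores, each scaled by w_l * sigma_l / sigma_g
    return sum(w * math.sqrt(snp_cov[l][l]) / sigma_g * z
               for l, (w, z) in enumerate(zip(weights, gwas_z)))


# Two independent SNPs with equal weights and variances
print(spredixcan_z([1.0, 1.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))  # ~1.414
```

A two-sided P value for the gene then follows from the standard normal distribution, and genes whose expression model fails the prediction-performance filter described above would be dropped before interpretation.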

Glycemic differential DNA methylation and genomics

We identified genetic determinants of the significant CpG sites (known, or replicated in the current EWAS) using the cis- and trans-meQTL results from the European blood-based BIOS database17 (see URLs). All SNPs reported in the database with a false discovery rate (FDR)-adjusted P value < 0.05 were treated as target genetic variants in the present study. SNPs were annotated based on the information in the BIOS study17 or the nearest protein-coding gene from SNPnexus57 on GRCh37/hg19. We also examined the associations of these methylation-related genetic variants with T2D and related traits (fasting glucose, insulin, HbA1c and HOMA-IR) using public GWAS data sets of European ancestry20,21,22,27,28,29. In addition, we checked the consistency of effect directions across the SNP, CpG site and T2D or related trait associations: the direction of the SNP-trait association should equal the product of the directions of the SNP-CpG and CpG-trait associations. Multiple testing was corrected by Bonferroni adjustment (significance threshold P < 1.8 × 10−3, i.e. 0.05 corrected for the 29 genetic loci shown in Supplementary Data 3). The associations of the methylation-related genetic variants with gene expression were also looked up in the BIOS database17, restricted to the expression probes previously associated with the glycemic CpG sites in blood.
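The direction-consistency rule above can be written as a small check. This is a hypothetical helper, not code from the study; the three effect estimates are assumed to be coded for the same effect allele across the meQTL and GWAS look-ups, and an estimate of exactly zero is treated as having no direction (and therefore as inconsistent).

```python
# Illustrative sketch; helper names are hypothetical, not from the study.
def sign(x):
    """Return +1, -1 or 0 for the direction of an effect estimate."""
    return (x > 0) - (x < 0)


def direction_consistent(snp_on_cpg, cpg_on_trait, snp_on_trait):
    """Check that sign(SNP->trait) == sign(SNP->CpG) * sign(CpG->trait).

    All three estimates must refer to the same effect allele.
    """
    expected = sign(snp_on_cpg) * sign(cpg_on_trait)
    return expected != 0 and expected == sign(snp_on_trait)


# SNP lowers methylation (-0.4) and methylation raises glucose (+0.02),
# so a consistent SNP should lower glucose.
print(direction_consistent(-0.4, 0.02, -0.01))  # True
print(direction_consistent(-0.4, 0.02, 0.01))   # False
```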

For the significant CpG sites known or replicated in the EWAS, we evaluated the causal effect of each CpG site on its associated trait (fasting glucose or fasting insulin) using a two-sample Mendelian randomization (MR) approach, as described in detail previously by Dastani et al.30,58, based on GWAS summary statistics from the BIOS database17 and the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) database21 (Supplementary Figure 3). Briefly, for each CpG site we constructed a weighted genetic risk score using independent SNPs as instrumental variables for the CpG, as implemented in the R package gtx. The effect of each score on the phenotype was calculated as

$$\hat{a} = \frac{\sum_i \omega_i \beta_i / s_i^2}{\sum_i \omega_i^2 / s_i^2},$$

where βi is the effect of the CpG-increasing allele of SNP i on the phenotype, si its standard error and ωi the effect of SNP i on the respective CpG. Because the genetic variants may be close to (cis) or far from (trans) the methylated site, we also repeated the MR test using cis SNPs only when a CpG had both cis and trans genetic instruments. All SNPs were mapped to human genome build hg19. For each test (one CpG with one trait), we extracted all genetic markers of the CpG from the MAGIC fasting glucose or insulin GWAS (n = 96,496)21, together with their effect estimates and standard errors. Among the overlapping SNPs, we removed SNPs in potential linkage disequilibrium (LD; pairwise R² ≥ 0.05) within a 1-Mb window, based on the 1000 Genomes-imputed genotype data of a general population sample, the Rotterdam Study I (RS I, n = 6291)59. We intended to exclude genetic loci associated with glycemic traits at genome-wide significance, but none of the loci met this exclusion criterion. Instrumental variables explaining more than 1% of the variance in the exposure (DNA methylation) were taken forward for the MR test. A Bonferroni-corrected threshold was used to account for the 13 CpG sites available for MR (P < 3.8 × 10−3).
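The risk-score estimator in the equation above can be sketched in a few lines. This is a minimal Python illustration, not the gtx implementation; the function name is hypothetical, and the standard error uses the usual first-order inverse-variance formula, √(1/Σ ω_i²/s_i²), which the text does not state explicitly. With a single instrument the estimate reduces to the Wald ratio β/ω.

```python
import math


# Illustrative sketch; ivw_grs_estimate is a hypothetical name, not the gtx API.
def ivw_grs_estimate(omega, beta, se):
    """Effect of a CpG on a trait from a weighted genetic risk score.

    omega: SNP effects on the CpG (per CpG-increasing allele)
    beta:  SNP effects on the trait for the same alleles
    se:    standard errors of beta
    Returns (ahat, standard error, two-sided P value).
    """
    numerator = sum(w * b / s ** 2 for w, b, s in zip(omega, beta, se))
    denominator = sum(w ** 2 / s ** 2 for w, s in zip(omega, se))
    ahat = numerator / denominator
    # Assumed first-order standard error of the inverse-variance weighted estimate
    ahat_se = math.sqrt(1.0 / denominator)
    p_value = math.erfc(abs(ahat / ahat_se) / math.sqrt(2.0))
    return ahat, ahat_se, p_value


# Single instrument: ahat equals the Wald ratio beta / omega = 0.1 / 0.5
ahat, ahat_se, p = ivw_grs_estimate([0.5], [0.1], [0.02])
print(round(ahat, 3), round(ahat_se, 3))  # 0.2 0.04
```

The resulting P value would then be compared with the Bonferroni threshold for the 13 CpG sites stated above.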

Functional annotation

We then integrated the associations described above from the EWAS, eQTM, meQTL and eQTL analyses (Fig. 3), checking the consistency of effect directions across the SNPs, CpG sites, gene expression in blood and glycemic traits. The correlation between the novel CpG sites was assessed with Pearson's correlation test in the pooled RS III-1 and RS-BIOS samples (n = 1544), and the CpG sites were grouped by hierarchical cluster analysis. Gene set enrichment analyses were performed on the genes annotated to the new CpG sites60: we tested whether these genes were over-represented in any of the pre-defined gene sets from the KEGG pathway database32, the Reactome Pathway Knowledgebase33 and GO biological processes34, with correction for multiple testing. KEGG and Reactome gene sets were obtained from collection C2, and GO biological process gene sets from collection C5, of the Molecular Signatures Database (MSigDB)60. We used the GENE2FUNC function of the Functional Mapping and Annotation of Genome-wide Association Studies platform (FUMA GWAS)61 to perform the gene set enrichment analysis and to examine tissue-specific gene expression patterns based on GTEx (version 6)16. In addition, Ensembl Human Genes62 and UCSC (GRCh37/hg19)63 (see URLs) were used to interpret the genetic determinants, CpG sites and genes.
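Over-representation of the CpG-annotated genes in a pre-defined gene set is commonly assessed with a one-sided hypergeometric test followed by a correction such as Benjamini-Hochberg. Our understanding is that FUMA's GENE2FUNC works this way, but the sketch below is an independent, minimal Python illustration rather than FUMA's code; the function names and the gene counts in the example are hypothetical.

```python
from math import comb


# Illustrative sketch; function names and example counts are hypothetical.
def hypergeom_enrichment_p(overlap, set_size, n_query, n_background):
    """One-sided P(X >= overlap) for the number of query genes found in a gene set.

    overlap:      query genes that are in the gene set
    set_size:     genes in the gene set (within the background)
    n_query:      genes in the query list (e.g. genes of the new CpG sites)
    n_background: all genes considered as background
    """
    total = comb(n_background, n_query)
    hits = sum(comb(set_size, x) * comb(n_background - set_size, n_query - x)
               for x in range(overlap, min(set_size, n_query) + 1))
    return hits / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical: 3 of 9 query genes fall in a 150-gene set, 20,000-gene background
print(hypergeom_enrichment_p(3, 150, 9, 20000))
```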

BMI in the association of methylation and glycemic traits

We used linear regression to examine whether the CpG sites affect the relationship between BMI and fasting insulin in the non-diabetic participants of the Rotterdam Study. The initial model used BMI as the independent variable and natural log-transformed insulin as the dependent variable, with age, sex, technical covariates (chip array number and position on the array), white blood cell counts, smoking status and data set (RS III-1 or RS-BIOS) as covariates. In the advanced model, the normalized methylation values of the differential CpG sites were added as covariates. The two nested models were compared using the anova function in R (significance at P < 0.05).
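The comparison of the initial and advanced models can be illustrated with ordinary least squares on nested design matrices. This is a minimal NumPy sketch on simulated data, assuming the nested-model F test that R's anova reports for two linear models; the function name and the simulated effect sizes are hypothetical, and the other covariates of the real models (age, sex, technical covariates, cell counts, smoking, data set) are omitted for brevity. An attenuated BMI coefficient after adding the CpG covariates, together with a large F statistic, is the pattern of interest.

```python
import numpy as np


# Illustrative sketch; compare_nested_ols is a hypothetical name.
def compare_nested_ols(y, X_base, X_extra):
    """F test for adding covariates (e.g. CpG methylation) to a base OLS model.

    y:       outcome, e.g. natural log fasting insulin, shape (n,)
    X_base:  base design matrix with an intercept column, shape (n, p0)
    X_extra: added covariates, shape (n,) or (n, q)
    Returns (base coefficients, full coefficients, F statistic, (df1, df2)).
    """
    X_full = np.column_stack([X_base, X_extra])
    coef_base, *_ = np.linalg.lstsq(X_base, y, rcond=None)
    coef_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    rss_base = float(np.sum((y - X_base @ coef_base) ** 2))
    rss_full = float(np.sum((y - X_full @ coef_full) ** 2))
    df1 = X_full.shape[1] - X_base.shape[1]
    df2 = len(y) - X_full.shape[1]
    f_stat = ((rss_base - rss_full) / df1) / (rss_full / df2)
    return coef_base, coef_full, f_stat, (df1, df2)


# Simulated example: part of the BMI-insulin association runs through one CpG
rng = np.random.default_rng(0)
n = 200
bmi = rng.normal(27.0, 4.0, n)
cpg = 0.1 * bmi + rng.normal(0.0, 1.0, n)
log_insulin = 0.05 * bmi + 0.3 * cpg + rng.normal(0.0, 0.5, n)
X_base = np.column_stack([np.ones(n), bmi])
coef_base, coef_full, f_stat, dfs = compare_nested_ols(log_insulin, X_base, cpg)
print(coef_base[1], coef_full[1], f_stat, dfs)
```

If SciPy is available, `scipy.stats.f.sf(f_stat, *dfs)` gives the corresponding P value.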


URLs

BIOS database, []; SNPnexus, []; GWAS database of glycemic traits, []; GWAS database of T2D, []; MetaXcan, []; NHGRI-EBI Catalog, []; Ensembl, []; FUMA, []; UCSC, [] (available: 1st Jan 2019)

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.