Epigenome-wide association study in peripheral white blood cells involving insulin resistance

Insulin resistance (IR) is a hallmark of type 2 diabetes, metabolic syndrome and cardiometabolic risk. An epigenetic phenomena such as DNA methylation might be involved in the onset and development of systemic IR. The aim of this study was to explore the genetic DNA methylation levels in peripheral white blood cells with the objective of identifying epigenetic signatures associated with IR measured by the Homeostatic Model Assessment of IR (HOMA-IR) following an epigenome-wide association study approach. DNA methylation levels were assessed using Infinium Methylation Assay (Illumina), and were associated with HOMA-IR values of participants from the Methyl Epigenome Network Association (MENA) project, finding statistical associations for at least 798 CpGs. A stringent statistical analysis revealed that 478 of them showed a differential methylation pattern between individuals with HOMA-IR ≤ 3 and > 3. ROC curves of top four CpGs out of 478 allowed differentiating individuals between both groups (AUC≈0.88). This study demonstrated the association between DNA methylation in some specific CpGs and HOMA-IR values that will help to the understanding and in the development of new strategies for personalized approaches to predict and prevent IR-associated diseases.

Individuals with HoMA-IR > 3 showed a differential methylation pattern. Participants were classified according to the HOMA-IR cut-off of 3 in order to analyse whether methylation was differential between both groups. There were 78 individuals with HOMA-IR > 3 and 254 with HOMA-IR ≤ 3 (Table 1). Methylation   Table S2).
The resulting 478 CpGs were clustered in a heat map according to methylation patterns (Fig. 3). Two main clusters of 61 and 271 individuals were generated. The first cluster contained 62.3% of individuals with HOMA-IR > 3. However, the second cluster only included 14.8% of HOMA-IR > 3. The difference in HOMA-IR proportions of the clusters was statistically significant (p < 0.001).   www.nature.com/scientificreports www.nature.com/scientificreports/ Differentially methylated CpGs between HOMA-IR groups were related to glucose and insulin pathways. Canonical pathways were obtained from Ingenuity Pathway Analysis (IPA) for these 478 CpGs The top four CpGs allow to differentiate between HOMA-IR ≤3 and >3. In order to further analyse whether methylation could differentiate between both HOMA-IR groups, the Receiver Operating Characteristic (ROC) curves adjusted by study, sex, age and BMI for the top four CpGs (cg23475244, cg06115835, cg16278828, and cg16639311) were calculated. The areas under the curve (AUC) of these CpGs were around 0.90 (AUC cg23475244 = 0.8965, AUC cg06115835 = 0.9026, AUC cg16278828 = 0.8989, and AUC cg16639311 = 0.8952), and after an internal validation (Fig. 6), the values were around 0.88 (AUC cg23475244 = 0.8865, AUC cg06115835 = 0.8919, AUC cg16278828 = 0.8893, and AUC cg16639311 = 0.8826).

Discussion
This study involving the Methyl Epigenome Network Association (MENA) project demonstrated the association between DNA methylation in specific CpGs and HOMA-IR values. Our results also provided evidence of a differential methylation pattern between individuals with a HOMA-IR ≤3 and >3. Additionally, these data have led to the identification of four CpGs that allow us to differentiate individuals between HOMA-IR ≤ 3 and >3 with an approximate AUC of 0.88. This assay adds further insights and knowledge about the relationship between T2D-related traits and epigenetic DNA modifications.
As aforementioned, IR is a hallmark of several diseases and unhealthy cardiometabolic conditions such as T2D, CVD, hypertension, obesity 7 and metabolic syndrome 8 . Epigenetic mechanisms have been involved in the onset and development of IR 9 . Indeed, several studies have related methylation of specific genes with HOMA-IR 3,7,8,[22][23][24][25][26][27][28][29][30][31] . Nevertheless, few EWAS have been performed to date 5,14,15,18 . In line with these studies, this EWAS of the MENA project showed an association of 798 CpGs with HOMA-IR (slope ≥ |0.1| and FDR < 0.05). In our study, from the top 10 CpGs, selected ones with better association and significant after linear regressions adjusting by study, age, sex, and BMI were cg07638362 (according to Illumina CG database this CpG was not associated to any gene), cg13133503 (CLCA4) and cg16462528 (LECT1). These CpGs, to our knowledge, have not been previously described in other EWAS. However, some of the mentioned genes have been found in the list of one study. Specifically, differentially methylated regions of LECT1 and CLCA4 have been significantly different between diabetics and non-diabetics 32 . Both CLCA4 and LECT1 have been related to methylation regulation [33][34][35] . CLCA4 has been involved in the activation of cAMP-dependent protein kinase A [www.genecards.org]. This www.nature.com/scientificreports www.nature.com/scientificreports/ pathway is intimately connected to glucose homeostasis 36 . On the other hand, LECT1 plays a role as antiangiogenic factor in cardiac valves, preventing valvular heart diseases 37 . Methylation of this gene may be associating IR with CVD. Thus, the association of several CpGs between DNA methylation and IR detected in our study adds further support for a potential role of abnormal DNA methylation in IR 7 .
As a novelty, our results have shown that individuals with HOMA-IR ≤ 3 or >3 exhibited a differential methylation pattern for at least 478 CpGs. Furthermore, the clustering showed that 62.3% of individuals in the first cluster had a HOMA-IR > 3. Thus, more than half of the people with similar methylation patterns presented a HOMA-IR > 3. However, the distribution of some cohorts was not heterogeneous. This situation is due to the specific recruitment requirements for each cohort. Indeed, cohorts such as RESMENA, where all the patients had metabolic syndrome, is completely found in the first cluster.
Furthermore, these 478 CpGs corresponded to some genes involved in glucose and insulin-related pathways according to IPA. For example, Protein Kinase A Signalling, where protein kinase A activation triggers insulin secretion in β-cells 55 ; Sirtuin Signalling Pathway, where sirtuins influence many steps of glucose metabolism in liver, pancreas, muscle and adipose tissue 56 ; and G-Protein Coupled Receptor Signalling, where insulin and www.nature.com/scientificreports www.nature.com/scientificreports/ glucagon secretion is affected by factors binding to G-protein coupled receptors on the surface of βand α-cells 57 . Other pathways were Rac Signalling, which is involved in the regulation of insulin-stimulated glucose uptake 58 ; RhoA Signalling, pathway that has been implicated in the pathogenesis of diabetes 59 ; and Leptin Signalling in Obesity, since leptin is a regulator of glycaemic control 60 . Furthermore, Maturity Onset Diabetes of Young (MODY) Signalling represents the pathway of another type of diabetes that accounts for less than 2% of all diabetic cases. MODY is a monogenic form of diabetes characterized by an early onset, autosomal dominant mode of inheritance and a primary defect in pancreatic β-cell function 61 .
Only two of the top four CpGs with statistically significant differences between HOMA-IR ≤3 and >3 individuals presented associated genes according to Illumina CG database. Those genes were SH3RF3 and MAN2C1. The function of SH3RF3 is not well known, whereas MAN2C1 is related to glycosaminoglycan (GAG) metabolism. The GAGs are heteropolysaccharides formed by a chain of repeating disaccharide units 62 . Changes in GAGs structure and function have been reported in the kidney, liver, arteries and retinal vessels of diabetics 63 .
Since methylation patterns of the 478 CpGs were able to cluster HOMA-IR individuals, we analysed the ability of the top four CpGs to differentiate between HOMA-IR ≤3 and >3 individuals. These top four CpGs distinguished HOMA-IR groups with a valuable AUC around 0.88 after an internal validation based on the optimistic correction model described by Harrell 64 , suggesting these CpGs as potential valuable biomarkers of IR.
This study was not devoid of limitations. Firstly, methylation is tissue-specific and the ideal tissue for this study would have been the pancreatic β-cells or cells from recognized insulin sensitive tissues such as skeletal muscle or white adipose tissue 65 . However, peripheral blood is the best non-invasive alternative tissue that reflects multiple metabolic and inflammatory pathways 66 , and relevant studies have demonstrated that epigenetic reprogramming may serve as a surrogate marker for metabolic disorders 41 . Interestingly, gene methylation parallelisms between peripheral blood cells and pancreatic islets have been recently reported, suggesting that blood may be used as a marker for islet DNA methylation 67 . Secondly, type I and type II error cannot be discarded, although multiple comparison tests and statistical adjustments for potential confounding factors such as sex, age, cohorts, DNA methylation chips, and cell composition heterogeneity have been performed. Thirdly, a validation sample would have been useful to corroborate the results in the selected genes. Unfortunately, this sample was not available. However, in order to resolve this issue and correct the overestimation of AUC, an internal validation using a bootstrap method 64 was performed, obtaining similar results. Further studies are needed to verify the relationship between the selected CpGs and HOMA-IR. Finally, due to the cross-sectional feature of the study, methylation cannot be defined as a cause or consequence of cardiometabolic conditions. Remarkably, although there is an epigenetic programming during the first stages of human development 68 , Wahl et al. have described methylation alterations as a cause of higher BMI and adiposity 20 .
Epigenetic gene regulation, and specifically, DNA methylation, is playing a role in the pathogenesis of many complex disorders, including T2D, obesity or metabolic syndrome 22 . There is great interest to perform www.nature.com/scientificreports www.nature.com/scientificreports/ methylation profiling in peripheral blood to find potential methylation disease-related associations and use specific DNA methylated regions as biomarkers 69 . In summary, this study found associations between DNA methylation and IR, a hallmark of T2D, with a differential methylation pattern between individuals with HOMA-IR ≤ 3 and > 3 in genes that are mainly involved in glucose and insulin-related pathways, and suggested four CpGs as biomarkers of IR. These results will hopefully contribute to the understanding of some epigenetic mechanisms that may regulate glycaemic traits, such as HOMA-IR, and the risk of T2D, as well as provide the basis for creating personalized strategies to predict, prevent and treat IR-associated diseases. Study designs, characteristics, inclusion and exclusion criteria were described for each study cohort, except for NormoP, whose design has not yet been described. All of them were approved by the Research Ethics Committee of the University of Navarra (CEI-UN, Pamplona, Spain), except for GEDYMET, which was approved by the Ethics committee of the School of Medicine, Pontificia Universidad Católica de Chile (Santiago, Chile), in compliance with the Helsinki Declaration of ethical principles for medical research involving human subjects. All participants provided written informed consent.
The NormoP cohort participants recruitment started in 2016 in the University of Navarra (Pamplona, Spain). Eligible participants were self-declared healthy individuals, >18 years old, and had a BMI of between 18.5 and 24.9 kg/m 2 . Exclusion criteria included pregnancy, type I diabetes, severe renal and digestive diseases, hydroelectrolitic disorders, acute CVD, cardiac arrhythmias, ictus, neoplasia, anaemia, eating disorders, pharmacological treatment, and dietary supplements that may affect the results. study variables. Anthropometric measurements and the metabolic profile were obtained from databases of the aforementioned cohorts, which followed validated protocols. Data of some characteristics were not available for all the 474 participants. IR was estimated using the validated HOMA-IR index method 10 .
DNA extraction and DNA methylation analysis. Venous blood samples were drawn on EDTA tubes.
Genomic DNA was extracted from PWBCs using the MasterPure TM DNA Purification kit (Epicenter, Madison, WI), whose quality was assessed with the Pico Green dsDNA Quantitation Reagent (Invitrogen, Carlsbad, CA). High-quality DNA samples (500 ng) were treated with bisulfite using the EZ-96 DNA Methylation Kit (Zymo Research Corporation, Irvine, CA) according to the manufacturer's instructions, converting cytosine into uracil. DNA methylation levels were measured by microarray with the Infinium Human Methylation 450 K bead chip technology (Illumina, San Diego, CA, USA) in all the cohorts, except OBEKIT, which was performed with Infinium MethylationEPIC beadchip (Illumina). This analysis was conducted in the Unidad de Genotipado y Diagnóstico Genético from Fundación Investigación Clínico de Valencia, as detailed elsewhere 80 . treatment of methylation raw data. Beta-values have been used as metrics to measure methylation levels. Beta-value in methylation experiments is the estimate of the methylation level using the ratio of the methylation probe intensity and the overall intensity, corresponding to the percentage of methylation on a specific site 81 . After obtaining intensity data using ChAMP package for R v.1.11.0 82 as described elsewhere 83 , the filtering process was performed in probes with a detection p-value above 0.01 in one or more samples, probes with a beadcount <3 in at least 5% of samples, non-CpG probes, probes with SNPs 84 , probes that align to multiple locations 84 and probes located on the X or Y chromosomes.
From the 523 initial participants, samples with a failed CpG fraction above 0.01 were eliminated (n = 20), leaving 503 individuals. After filtering probes, intra-cell type normalization was done using Subset-quantile Within Array Normalization (SWAN) method to avoid the bias introduced by the Infinium type 2 probe design 85 . In order to assess the similarity of normalized methylation samples in both batches and the pooled data, multidimensional scaling plots based on top of 1000 most variable probes were performed. A total of 29 samples failed to fulfil this requisite, which left 474 participants for the subsequent analyses.
After SWAN normalization, magnitude of batch effects were assessed and corrected using the ComBat normalization method, which is an empirical Bayes based method to correct for technical variation related to the slide 86,87 . Furthermore, differences in methylation resulting from differences in cellular heterogeneity were corrected using the Houseman procedure 88 .
The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus 89 and are accessible through GEO Series accession number GSE115278 (https://www.ncbi.nlm.nih.gov/geo/query/acc. cgi?acc=GSE115278). statistical analysis. After pre-processing, LIMMA package from the R statistical software 82 was used to compute a linear regression between DNA methylation values and HOMA-IR. A total of 332 subjects from the MENA project showed data for both variables (Table 1). This analysis was adjusted by the effect of confounding factors such as sex, age, study and bead chip. Raw p-values were corrected using the Benjamini-Hochberg procedure for multiple comparisons, and a FDR cut-off of 0.05 and a slope ≥ |0.1| were used as statistically significant thresholds. The top 10 CpGs were analysed for robustness with Spearman correlations and then, linear regressions between HOMA-IR and methylation adjusted for study, sex, age, and BMI were also performed for the six selected CpGs.
www.nature.com/scientificreports www.nature.com/scientificreports/ The cut-off for HOMA-IR differs for different races, ages, genders, diseases, complications, etc. 90 and no reference value has been established 91 . Since there is no consensus for the HOMA-IR cut point and in order to facilitate the analysis of this metabolically heterogeneous group, a cut-off of HOMA-IR = 3 was chosen, corresponding to a value between the 75th and 80th percentiles, which are established as cut points by International Diabetes Federation (IDF) and Adult Treatment Panel III (ATPIII) for metabolic syndrome 92 . No influences in terms of races were considered, since more than 92% of the individuals were Caucasian in the MENA project and additionally, the study has been considered as a covariate in the analyses. Moreover, some studies have previously used this cut-off for HOMA-IR 93,94 . Differentially methylated CpGs between individuals with HOMA-IR > 3 and HOMA-IR ≤ 3 were explored using two-tailed Student's t-test with Bonferroni correction. A p-value < 6.26·10 −5 was considered significant. Adjusted (for study, sex, age, and BMI) ROC curves were performed to determine the AUC of the top selected CpGs distinguishing individuals between HOMA-IR ≤ 3 or > 3. Furthermore, an internal validation using a correction for optimistic prediction was performed according to Tibshirani's enhanced bootstrap method described by Harrell 64 in order to evaluate the overestimation of the model.
Statistical calculations were performed with STATA version 12.0 (Stata Corp, College Station, TX, USA), unless otherwise indicated. Manhattan plots, correlation graphs and box plots were produced using GraphPad Prism 6 (Graph-Pad Software, CA, USA). The heat map was created with the R software 82 using library gplots and the heatmap.2 function.
Ingenuity pathway Analysis. Differentially methylated CpGs between individuals with HOMA-IR > 3 and HOMA-IR ≤ 3 were analysed by IPA software (Qiagen Redwood City, CA, USA, www.ingenuity.com) as defined in the package. Predefined pathways and functional categories of the Ingenuity Knowledge Base were used in order to detect associated pathways and relevant gene regulatory networks 95 . Pathway analyses were performed with IPA's Core Analysis module. Canonical pathways with a p < 0.05 after Fisher's test were defined as a statistically significant overrepresentation of input genes in a given process.

Data Availability
The data have been deposited in NCBI's Gene Expression Omnibus 89 and are accessible through GEO Series accession number GSE115278 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115278).