Obesity is a well-established risk factor for various chronic diseases, including cerebral vascular disease1, cardiovascular diseases (CVD)2, and cancer3. To date, obesity has been thoroughly studied and considered as a consequence of unhealthy behavioral exposures, such as physical inactivity and excessive energy intake4,5. Despite the heightened awareness of these behavioral risk factors, the epidemic of childhood obesity still imposes a major threat to public health in the U.S. According to U.S. Center for Disease Control and Prevention National Center for Health Statistics6, the prevalence of childhood obesity was about 17% and affected 12.7 million children and adolescents in 2014. Furthermore, it has been estimated that 40%-70% of the variation in obesity-related phenotypes in humans is heritable7,8, suggesting that genetic background also plays a role in obesity. While a large number of obesity-related genes have been identified by genome-wide association studies9,10,11,12, these genetic studies have only been able to explain a limited proportion of the variance in obesity. For example, a common variant in FTO, the strongest obesity-related gene, only explains 1% of the variance in body mass index (BMI)13. Therefore, a more thorough understanding of the factors associated with the susceptibility to obesity is needed.

In the last decade, the epigenetic regulation of gene expression has been studied as a novel mechanism that may fill the gap in the etiology of complex disease. In brief, epigenetics is the study of heritable patterns of phenotype resulting from changes in a chromosome without alterations in the DNA sequence14. It has been suggested as a molecular mechanism that mediates the interplay between behavioral/lifestyle factors and genetics, by adapting the genome to environmental exposures15. Therefore, it is plausible that aberrant epigenetic changes may contribute to the development of human disease. In fact, epigenetic changes, such as DNA methylation, have been associated with the development and progression of chronic disease, including cancer16 and CVD17.

Although there has been a rapid growth in the field of epigenetics and successes in identifying target sites that relate to obesity, the epigenetic basis of obesity remains largely unknown, particularly in early developmental stages, such as adolescence. Therefore, we carried out this cross-sectional study to explore the epigenetic profile, in particular DNA methylation within obesity related genes, in relationship with marker of general obesity in adolescents. For this report, we assessed peripheral blood leukocyte DNA methylation level, through enhanced, reduced representation bisulfate sequencing (RRBS) assay, and BMI percentile obtained from a population based sample of 263 adolescents who participated the Penn State Child Cohort follow-up exam.


The major demographic characteristics of the 263 participants with DNA methylation data were summarized in Table 1. The mean (SD) age of the sample was 16.7 (2.2) years. 78.7% of them were non-Hispanic white and 55.9% were male. The prevalence of tobacco use and alcohol drinking were 10.3% and 19.6%, respectively. When stratified according to 4 CDC-defined age- and sex-specific BMI categories18, the proportion of underweight, normal weight, overweight, and obese in our sample of adolescents were 1.9%, 61.6%, 20.5%, and 16.0%, respectively. While there was no significant difference in racial distribution across the four age- and sex-specific BMI categories (p = 0.23), the proportion of non-Hispanic white was substantially lower in obese group. Indeed, non-Hispanic white participants showed a significantly lower BMI percentile than participants with other race/ethnicity identity (p = 0.01).

Table 1 Demographic and phenotype statistics of the entire study population and stratified by age- and sex-specific BMI categoriesa.

Among the 166,158 analyzable cytosine-phosphate-guanine (CpG) sites, 165,297 (99.5% of) the sites were successfully analysed in linear regression models. After excluding intergenic sites, 103,466 intragenic sites were annotated onto 11,490 unique genes. The Manhattan plot showing the genome-wide p values of the association between DNA methylation level on individual CpG site and BMI percentile is provided in Fig. 1. The 20 most significant site were presented in Supplementary (see Supplementary Table S1). As noted, none of the site passed the threshold for genome-wide significance (p < 4.8E-07), with an estimated genomic inflation factor of 1.04. Further looking into all intragenic sites, 5,669 were associated with BMI percentile at p < 0.05 level. Among them, 28 were on 11 obesity-related genes. In fact, one of the site was the 4th most significant hit (p = 4.2E-05) throughout the entire epigenome. The distribution of sites based on their relationship with obesity genes were summarized in Table 2. Briefly, among the 103,466 intragenic sites, 308 were on obesity-related genes, 9.1% (N = 28) of them were significant at p < 0.05 level, while in sites not related to obesity genes (N = 103,158), 5.5% (N = 5,641) were achieved a p < 0.05. Both hypergeometric and permutation test P values (0.006 and 0.006, respectively) suggested that the obesity-related genes were significantly enriched.

Figure 1
figure 1

Manhattan plot for genome-wide p values for the association. P values were obtained by linear regression analysis with adjustment for age, race, and sex, and batch effect. The y axis shows the −log10(p) of 103,466 sites, and x axis shows their chromosome position. The red dots represent the sites on obesity-related genes. The horizontal red line represents the threshold for significance within obesity-related sites (p = 1.6E-04). None of the sites was significant at genome-wide level (p = 4.8E-07).

Table 2 Distribution of significant sites according to obesity and non-obesity related genes.

We further examined the relationship between site-specific methylation level within obesity-related genes, which was obtained through a query of gene-disease association database (DisGeNET)19, and BMI percentile. The top 15 most significant sites within obesity-related genes were summarized in Table 3. As shown in the Table 2 sites were within SIM1 gene. More importantly, there was 1 site (Chr6: 100903612), which is on CpG:24 of SIM1gene, that remained significantly related to BMI percentile, even after applying a Bonferroni (p = 4.2E-05) or false discovery rate adjustment (q = 0.013). The regression coefficient (SE) between a 10% increase in methylation in this site and BMI percentile was 7.2 (1.7).

Table 3 Top 15 significant sitesa in association between 10% increase in DNA methylation level and BMI percentile.


Although there is an increasing awareness of lifestyle and environmental factors associated with obesity, major gaps still exist in understanding the contribution of epigenetics to the risk of obesity, especially in otherwise healthy adolescents. As obesity is a complex and multifactorial-determined disease, it is highly plausible that BMI is regulated by epigenetic mechanisms in multiple related genes. Since the 1980’s, a large body of evidence supporting the role of DNA methylation as a major epigenetic regulator of gene expression has been obtained20,21,22,23,24,25,26,27. Therefore, the primary aim of this study was to assess the overall association between DNA methylation and BMI percentile. Our enrichment analyses results, derived from a representative cohort of central Pennsylvania adolescents, suggest that an aggregated change of methylation levels in obesity-related genes is significantly associated with obesity in adolescents free of cardiometabolic disease manifestations. This finding confirmed, and more importantly, extended the understanding of the association between DNA methylation and obesity, since a majority of previous studies reported the association based on studies in newborns or adult populations19,20,21,22,23, participants with clinically diagnosed morbidities23,24, or interventional studies25,26,27. For example, Dick et al. reported that DNA methylation in whole-blood DNA was related to BMI in a group of middle-aged myocardial infarction survivors and healthy blood donors22. In another sample of 25 obese men, Milagro and coworkers reported that weight loss was associated with DNA methylation in peripheral blood mononuclear cells in an 8-week nutritional intervention25. We, for the first time, observed that DNA methylation levels in obesity genes are related to BMI percentile in otherwise healthy adolescents. Such an association may operate mainly through energy homeostasis (e.g. SMI1 and CPE) and lipid mobilization (e.g. ADRB3). While methylation level in the individual genes may be weakly related to BMI percentile in adolescents, in aggregate, the enrichment analysis demonstrated a significant relationship. The collective change in methylation profiles in obesity-related genes may well lead to obesity.

Recently, a large amount of epigenetic markers have been reported in both adult28,29 and children30,31,32 in epigenome-wide association studies (EWAS). Wahl et al. in a recent meta-analyses demonstrated that BMI was associated with DNA methylation changes in 187 genetic loci in adults28. Later, Sayols-Baixeras et al. discovered that methylation level in 70 CpGs and 33 CpGs related to BMI and waist circumferences in middle-aged adults and elderly29. In a case-control study between severely obese children and lean controls, Fradin and coworkers found 10 CpGs sites with a difference in methylation greater than 10%30. In another two studies in children, Ding et al. found a total of 852 differentially methylated CpGs in Chinese population31, while Huang et al. identified 129 differentially methylated CpGs among children seeking treating for obesity in Western Australia32. However, none of the CpGs was consistently associated with obesity across these studies, which indicated substantial false positive findings may exist in these EWAS studies. Therefore, our secondary goal of the current study was to identify individual sites, whose methylation level is related to obesity, with focus on sites within genes that known to be associated with obesity. Unlike the previous EWAS studies, none of the sites was significantly related to obesity at genome-wide level. The difference in our results and previous studies may due to the differences in the genetic background of the population and study design. However, we found one novel site in SIM1 that was significantly associated with BMI percentile when focusing on obesity-related genes even after Bonferroni correction. To the best of our knowledge, this is the first time that SIM1methylation has been associated with obesity in adolescents.

SIM1 plays a key role in neuronal differentiation within the paraventricular nucleus of hypothalamus, which is critical for food intake regulation33. Data from both mice34 and human35,36 studies have shown that insufficiency of SIM1 is related to obesity and Prader-Willi-Like (PWL) syndrome. Bonnefond et al. demonstrated a firm link between SIM1 loss-of-function mutations, severe obesity, and PWL features in 1,193 children36. Therefore, hyper-methylation in SIM1, which may result in a repression of gene expression and consequently SIM1 insufficiency, could relate to obesity. Indeed, our analyses showed that increased methylation level on 1 site (Chr6: 100903612) in SIM1 was significantly associated with BMI percentile. Specifically, with 10% increase in methylation level, the corresponding increase in BMI percentile was 7.2 (1.7). While validation is pending, our result provided a potential target site for future studies investigating DNA methylation and obesity. Meanwhile, it should be noted that discovery studies, including the current one, may overestimate the strength of the association. Also, despite previous evidence suggested that DNA methylation in SIM1 may be predictive of obesity, it is also biologically plausible that alternation in DNA methylation was driven by presence of obesity. Therefore, more longitudinal studies are warranted.

The modifiable factors responsible for the variation in methylation levels among obesity-related genes are pending further investigations. One of the hypothesized origins of this variation includes dietary/lifestyle factors in the life course37. For example, in a meta-analysis of 13 cohorts (N = 6,685) of newborns, Joubert and coworkers reported that a large number of sites responded to maternal smoking in pregnancy38. Tsaprouni et al. observed that cigarette smoking altered DNA methylation level at multiple sites in middle-aged adults, and the effect might be partially reversible upon cession39. Moreover, previous data has shown that dietary and physical activity interventions25,26,27,40 could induce changes in DNA methylation levels in genes related to obesity. On the other hand, environmental exposures have also been postulated as a contributor to the variation in DNA methylation41. Specifically, ambient air pollution and heavy metal exposure may trigger changes in DNA methylation at both genome-wide42,43,44 and gene levels45,46,47, with the time-course of the effects ranging from hours to months. For example, two articles based on Normative Aging Study showed that both rapid45 and prolonged46 air pollution exposure are associated with global DNA methylation level changes. Despite cumulating efforts to investigate the factors affecting DNA methylation level, more studies of variation in methylation levels in obesity-related genes among healthy adolescents are needed.

Compared to previous studies exploring the relationship between DNA methylation and obesity, several strengths of our current study may be noted. Our study is the first to investigate such a relationship in a representative sample of adolescents from general population. Majority of the previous studies were conducted in populations in other age groups20,21,22,23,24 or individuals with cardiovascular and metabolic morbidities25,26. As age and cardiovascular health may substantially change the DNA methylation landscape, those results may not be extrapolated to healthy adolescents. Our analyses on DNA methylation level also were based on linear regression models, not case-control comparative approaches. By utilizing the linear regression model, we preserved the statistical power of our data and avoided the dilemma of arbitrarily establishing a threshold for “differential methylation”. In addition, analyses performed in adolescents from the extremes of the obesity distribution30,31,32,48 are unlikely to reflect the association between DNA methylation and obesity in the general population as a whole. Therefore, results from our study may have wider public health implications. Furthermore, our high throughput assay generated genome-wide site-specific methylation levels that were leveraged in the identification of a novel site in SIM1 that was significantly related to BMI percentile. This result may contribute to the current collection of “targeted sites” for future studies.

While the current study possesses the strengths stated above, its limitations shall not be overlooked. First, there were extensive missing data in the raw methylation sequencing data, which is a result of using a novel whole-genome methylation sequencing technique on the low-yield DNA samples. To maintain the validity of our results, we purposely excluded those sites with <10x coverage or available from <50% of the study population. In the enrichment analyses, we further limited our scope on CpG sites intragenic to obesity genes. However, we did not find a statistically significant bias associated with the missing data when comparing BMI percentile in the missing and non-missing groups using a 5% level two-sample Mann-Whitney U test on the 15 sites presented in Table 3. Nevertheless, the degree of missing methylation data has substantially reduced the number of analyzable sites. Second, the number of DNA samples was relatively small. With 263 DNA samples, our statistical power to detect significant a relationship between DNA methylation and BMI percentile is limited, especially at genome-wide level. More importantly, our ability to control for assay-related covariables, particularly cell type mixture, was severely restricted. Our small sample size also prevented us from dividing the entire cohort into discover and validation cohorts. Therefore, external validation is needed to confirm our results. However, both gene-set based and site-based analyses, independently, showed a significant relationship between DNA methylation and BMI percentile. The internal consistency of these results boosts confidence in the findings. Third, there is little previous functional work and evidence confirms that methylation in SIM1 would result in an absence of SIM1 signaling. However, one can postulate that hyper-methylation of SIM1 would suppress its expression. As it has been proved that haploinsufficiency of SIM1 is related to obesity in genetic studies34,35,36, it is plausible that methylation in SIM1 may contribute to reduced SIM1 expression, and through which related to obesity. Fourth, alcohol drinking and cigarette smoking were not controlled for in the analyses. However, since the vast majority of our study population was under legal alcohol drinking age and/or minimum age of purchasing cigarettes, drinking and smoking behaviors obtained through a self-administered questionnaire, are highly likely to be biased. Lastly, the current analyses were based on cross-sectional data. Hence, causal inference cannot be made. In fact, in a recently published paper28, Wahl et al. used genetic association analyses to demonstrate that the alterations in DNA methylation are predominately the consequence of obesity, rather than the cause. However, one could argue that failures in gene imprinting, due to aberrant DNA methylation changes, in particular on those genes affecting growth factors or regulators of genes controlling growth and energy metabolism, is the underlying mechanism linking DNA methylation and obesity8.

In conclusion, we observed that aggregated change in methylation of DNA from peripheral blood leukocytes on obesity-related genes was associated with obesity in a population-based sample of healthy adolescents. Although validation is pending, we further identified a novel site on SIM1, whose DNA methylation level is related to obesity. Taken together, this evidence suggests that DNA methylation change is related to obesity, even in healthy adolescents. Such an early life change may be associated with elevated cardiometabolic risk in adulthood. However, further longitudinal studies with larger sample sizes are warranted. Such studies may be used to confirm our findings and provide deeper insight into the causal direction of the association, as well as possible origins of the variation in DNA methylation levels.

Materials and Methods


For this analysis, we used data from participants in the Penn State Child Cohort follow-up examination. The recruitment and examination procedures for the baseline49 and follow-up50 examinations have been published elsewhere. Briefly, a total of 700 children aged 5–12 years were recruited from central Pennsylvania and participated in the baseline examination in 2002–2006. 421 of them participated in the follow-up examination in 2010–2013, with an average of 7.70 years of follow-up. In the follow-up examination, each participant underwent a standardized physical examination, in which the participant’s height and weight were measured by a trained investigator. BMI was calculated as weight/height2 (kg/m2). Then BMI was converted to BMI percentile in accordance with U.S. Centers for Disease Control and Prevention’s guideline and algorithm51. Participants’ major demographic characteristics, including age, race, sex, tobacco use, and alcohol drinking were collected via a self-administered questionnaire. Parents’ history of obesity, diabetes, and hypertension were also obtained.

The study protocol was approved by Penn State University College of Medicine Institutional Review Board. All research was performed in accordance to relevant guidelines/regulations. Written informed consents were obtained from participants and their parents or legal guardians if younger than 18 years.

Genome-wide Methylation Assay

391 of the 421 participants in the follow-up study consented for blood draw. Fasting peripheral blood samples were collected from each participant and stored at −80 °C until use. Among the 391 blood samples, 263 yielded adequate amount of DNA were sequenced. There was no significant difference in the major demographic characteristics between the subjects whose DNA were sequenced and the original cohort (see Supplementary Table S2). DNA from peripheral blood leukocyte was extracted and subjected to RRBS using a modified method. Single nucleotide resolution of DNA methylation in CpG sites and surrounding regions were detected using Illumina HiSeq. 2500. This highly sensitive, multiplexed method generated a specific, reduced representation of the genome of DNA fragments enriched for CpG dinucleotides. Briefly, genomic DNA (minimum 5 ng) was digested with a methylation-insensitive restriction enzyme, MspI, which recognizes CCGG. The digested DNA fragments were purified and subjected to adapter ligation and size selection by AMPure magnetic beads (Beckman Coulter Inc., Brea, CA, USA.). The resulting libraries, covering the target size range between 40 to 200 bp, were quantified by Kapa Library Quantification Kit (Kapa Biosystems Inc., Wilmington, MA, USA). Equimolar libraries were pooled and unmethylated cytosines (C) were converted to uracils (U) with bisulfite, amplifed by polymerase chain reaction, and then sequenced. The degree of methylation of each fragment, estimated from the number of converted reads compared to the unconverted reas in each CpG, was calculated. Base calls of bisulfite treated sequencing reads with phred quality scores <20 and/or length <35 bp were trimmed and the adaptor was cut using trim_galore V0.3.3 (Babraham Bioinformatics, Cambridge, UK). Resulting reads were mapped to the hg19 assembly and methylation calls were performed using Bismark v0.10.1 (Babraham Bioinformatics, Cambridge, UK). Approximately 1.6 million methylation levels were detected at more than 10x coverage for each subject.

Statistical analyses

The demographic and phenotypic characteristics of the entire cohort, as well as stratified by four CDC defined age- and sex-specific BMI categories18, were summarized as mean (SD) and proportions for continuous and categorical variables, respectively. ANOVA and Cochran-Mantel-Haenszel test were used to compare the distribution of these characteristics, as appropriate.

To ensure the validity of the methylation data and statistical inferences of the analyses, we excluded bases with <10x coverage or available from <50% (N = 132) of the subjects, leaving a total of 166,158 analyzable sites. We used multi-variable adjusted linear regression models to investigate the relationship between DNA methylation and BMI percentile. In the models, BMI percentile and site-specific methylation levels were treated as dependent and independent variables, respectively, while adjusting for age, race, sex, and batch of assay. Sites from the converged models were annotated based on the hg19 assembly. All intragenic sites were then taken forward for enrichment analyses.

From a query of DisGeNET19, a database of gene-disease associations, we identified a list of fifty obesity related genes (Table 4) using a reliability score threshold (≥0.20). The intragenic sites were subsequently categorized into obesity and non-obesity-related sites. Gene-set enrichment was assessed with a hypergeometric test and a permutation test52 with 10,000 permutations. These tests compared the distribution of CpG sites, whose methylation level associated with BMI percentile at p < 0.05 level intragenic to obesity-related and non-obesity related genes. A p value < 0.05 was used to determine the significance of the enrichment analysis.

Table 4 List of top 50 obesity-related genesa.

To identify individual sites, particularly those intragenic to obesity-related genes that were significantly related to BMI percentile, Bonferroni correction was used to account for multiple testing. A genomic inflation factor was also calculated to assess the degree of bulk inflation and the excess false positive rate53. All analyses were performed by using R (version 3.3.0). The entire analytic process was summarized in a diagram (Fig. 2) to graphically illustrate our statistical approaches.

Figure 2
figure 2

Analytic diagram.