Genome-wide DNA methylation and gene expression analyses in monozygotic twins identify potential biomarkers of depression

Depression is currently the leading cause of disability around the world. We conducted an epigenome-wide association study (EWAS) in a sample of 58 depression score-discordant monozygotic twin pairs, aiming to detect specific epigenetic variants potentially related to depression and further integrate with gene expression profile data. Association between the methylation level of each CpG site and depression score was tested by applying a linear mixed effect model. Weighted gene co-expression network analysis (WGCNA) was performed for gene expression data. The association of DNA methylation levels of 66 CpG sites with depression score reached the level of P < 1 × 10−4. These top CpG sites were located at 34 genes, especially PTPRN2, HES5, GATA2, PRDM7, and KCNIP1. Many ontology enrichments were highlighted, including Notch signaling pathway, Huntington disease, p53 pathway by glucose deprivation, hedgehog signaling pathway, DNA binding, and nucleic acid metabolic process. We detected 19 differentially methylated regions (DMRs), some of which were located at GRIK2, DGKA, and NIPA2. While integrating with gene expression data, HELZ2, PTPRN2, GATA2, and ZNF624 were differentially expressed. In WGCNA, one specific module was positively correlated with depression score (r = 0.62, P = 0.002). Some common genes (including BMP2, PRDM7, KCNIP1, and GRIK2) and enrichment terms (including complement and coagulation cascades pathway, DNA binding, neuron fate specification, glial cell differentiation, and thyroid gland development) were both identified in methylation analysis and WGCNA. Our study identifies specific epigenetic variations which are significantly involved in regions, functional genes, biological function, and pathways that mediate depression disorder.


INTRODUCTION
Depression is currently the leading cause of disability worldwide [1], and it is predicted to be one of the three leading causes of illness by 2030 [2]. Although the heavy social and economic burden, the potential molecular mechanisms underlying depression remain poorly understood.
The depression risk is influenced by both genetic and environmental factors. It is suggested that epigenetic modification could mediate the lasting increasing depression risk resulting from exposure to adverse life events and provide a mechanistic framework, where genetic and environmental factors were integrated [3,4]. DNA methylation was one important form of epigenetic modification, and one recent review of 67 studies concluded that there was evidence for the association of DNA methylation variation with depression risk [5]. Additionally, candidate gene studies discovered that BDNF and SLC6A4 hypermethylation were related to depression or major depressive disorder (MDD) [5]. Even currently some significant methylation modifications were found to be associated with depression, however, no consistent results were identified.
Nowadays, using monozygotic twins discordant for a trait or disease has been proved to be a powerful and popular design for EWAS in linking the environmental basis of epigenetic modification variation to disease status while controlling for individual genetic component [6,7]. This design has been extensively used to explore specific DNA methylation variation associated with diseases, such as cognitive function decline [8], Alzheimer's disease [9], and rheumatoid arthritis [10]. Since the Chinese population are different from other ethnic populations worldwide concerning genetic constitutions, environmental exposure, a multitude of life styles and occupations, the DNA methylation variation may also differ. However, to our knowledge, yet no EWAS has been performed to explore the DNA methylation variation associated with depression using Chinese monozygotic twin samples.
Accordingly, we aimed to conduct an EWAS to detect DNA CpG sites associated with depression and further integrate with gene expression data in a sample of middle and ole-aged Chinese monozygotic twins.

MATERIAL AND METHODS
The primary materials and methods of this study were similar to our previously published studies [8,11,12].

Participants
Participants recruitment and collection were described in detail previously [13]. Participants who suffered from cerebrovascular disorders, stroke, traumatic brain injury, central nervous system (CNS) tumor, CNS infections, and alcohol or drug dependence were excluded. Meanwhile, participants who were unconscious, unable, or unwilling to cooperate were also dropped. Finally, a total of 58 complete monozygotic twin pairs with a mean age of 52 years (SD: 7.5) were included in the methylation analysis, and a subsample consisted of 12 twin pairs were included in the gene expression analysis. The median of absolute values of intrapair depression score difference (Δ(depression score)) of all twins was 4 (range: 1-15). The number of twin pairs for Δ(depression score) ranging in 1-5, 6-10, and 11-15 was 39, 15, and 4 in the methylation analysis and 9, 3, and 0 in the gene expression analysis, respectively. The median of ratio difference calculated as |Δ(depression score)|/max(depression score) was 0.41 (range: 0.14-1.00).
After providing written informed consent, all participants took a standardized questionnaire and underwent a health examination. This study was approved by the Regional Ethics Committee of the Qingdao CDC Institutional Review Boards. And the ethical principles of the Helsinki Declaration were also followed.

Assessment of depression
Depression was assessed by the Geriatric Depression Scale-30 (GDS-30) [14]. The GDS-30 had 30 items, and participants were asked to answer "yes" or "no" to the items based on how they felt over the past 1 week. The higher the total score was, the more severe the participant's mental condition was.

Reduced representation bisulfite sequencing (RRBS) analysis
The total DNA was extracted from whole blood and sent to one biomarker technology corporation in China for the RRBS experiment. Briefly, genomic DNA was first digested to generate short fragments that contained CpG dinucleotides at the ends. Then the CpG-rich DNA fragments were extracted and bisulfite-converted. The cDNA library was constructed and sequenced to get raw sequencing data, which was then preprocessed and mapped by Bismark [15] and smoothed by R package BiSeq [16]. The methylation β-value was transformed to M-value for statistical modeling. Finally, a total of 551,447 CpG sites were included.

Cell-type composition estimation
Considering total DNA was extracted from whole blood, different methylation profiles of distinct cell types may lead to false discoveries [17]. We used ReFACTor method to control for the cell-type composition effect on DNA methylation in EWAS [18]. ReFACTor is an unsupervised reference-free method that selects methylation sites, which are informative about the cell composition in the data to apply to principal component analysis (PCA) and further uses the top components of PCA to construct surrogates for the underlying cell-type compositions for adjustment in statistical analysis. In our study, the top five components were chosen as covariates to correct the cell-type heterogeneity.
RNA library construction, sequencing, and quality control Briefly, after total mRNA was extracted from whole blood, the RNA-Seq library was constructed and sequenced to get the sequenced data. The data was then mapped to the human genome by TopHat2 [19]. The gene expression level was estimated by FPKM value through Cufflinks software [20].

Methylation analysis
Epigenome-wide association analysis: The association between the methylation level of each CpG site and depression score was tested by a linear mixed effect model, which was equivalent to the regression model as proposed by Tan et al. [6]. The fixed effect variables of age, gender, and cell-type composition as well as random effect variable of twin pairing were adjusted for in the model. Predicting functions of cis-regulatory regions and ontology enrichments analysis: The identified epigenome CpG sites (P < 0.05) were submitted to the Genomic Regions Enrichment of Annotations Tool (GREAT) online to analyze the functional significance of cis-regulatory regions and ontology enrichments [21]. The default "basal plus extension" association rule was chosen. In this rule, a "basal regulatory region" irrespective of the presence of neighboring genes which extended 5 kb upstream and 1 kb downstream of the transcription start site (TSS) were firstly assigned. Then each gene's regulatory domain was extended up to the basal regulatory region of the nearest upstream and downstream genes, but no longer than 1 Mb in each direction. FDR < 0.05 was set as statistically significant in ontology enrichments analysis.
Differentially methylated region (DMR) analysis: Based on the bisulfite sequencing data and corresponding EWAS results, the DMRs associated with depression score were detected by using the comb-p approach [22]. Significant enriched DMRs were evaluated by Stouffer-Liptak-Kechris (slk) corrected P-value < 0.05.

Gene expression analysis
Weighted gene co-expression network analysis (WGCNA): The WGCNA package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis [23,24]. Briefly, we first established a weighted adjacency matrix. Then the topological overlap matrix (TOM) was constructed [25][26][27] and used as input for hierarchical clustering analysis [28]. Gene modules were detected by using a dynamic tree cutting algorithm. The module eigengenes (MEs) were correlated with the trait of depression score. Relationships among modules were illustrated by a hierarchical clustering dendrogram of MEs [29], and a heatmap plot of the corresponding eigengene network. Intramodular hub genes were defined following criteria of depression score based gene significance (GS) > 0.70 and module membership (MM) > 0.90 with a threshold of P-value < 0.01 [30].
For the genes clustered in the module associated with depression score, GO enrichment analysis and BIOCARTA, KEGG, and REACTOME pathway enrichment analysis were conducted by the DAVID tool [31,32]. The modified fisher exact P-value < 0.05 was considered as enrichment cut-off criterion.
Differentially expressed genes analysis: Five depression cases (depression score > 10) and eight health controls were included. The gene expression levels of 46 genes (including the genes where the top CpG sites and the DMRs were located) between the two groups were compared by the Wilcoxon rank sum test. The P-value < 0.05 was considered as statistically significant.

Methylation analysis
A total of 58 monozygotic twin pairs with a mean age of 52 years (SD: 7.5) were included. The median of depression score was 8 (95% range: 0-21). Most of the clinical indicators were statistically intrapair correlated, indicating that the co-twin design could be beneficial (Table 1).
Predicting functions of cis-regulatory regions and ontology enrichments analysis A total of 15,978 genomic cis-regulatory regions were identified to be associated with one or more genes. (Supplementary Fig. 1) Many important pathway terms probably related to depression were significantly enriched, such as Notch signaling pathway, nicotine pharmacodynamics pathway, Huntington disease, p53 pathway by glucose deprivation, Parkinson disease, and hedgehog signaling pathways. Moreover, the GO enriched terms mainly highlighted DNA binding and nucleic acid metabolic process (Table 3).
The DMRs were located at/near 19 genes, among which DGKA and NIPA2 might play an important roles in regulating depression. Interestingly, several DMRs covered the top CpG sites listed in Table 2. The DMR1 (located at CAGE1), DMR3 (PTPRN2), and DMR9 (PRDM7) covered 4 CpG sites, and the DMR7 (located at FCGBP) covered three CpG sites.

Gene expression analysis
There were 12 twin pairs (including seven male pairs) with a median age of 53 years (95% range: 43-65) and a median depression score of 7.5 (range: 1-27) included in the gene expression analysis.
The common genes and enrichment terms between methylation analysis and WGCNA The CpG sites (P < 0.05) were annotated to 2808 genes, of which 404 genes were also clustered in the pink module in WGCNA. Among these common genes, DENND5B, KBTBD13, TENM3, and BMP2 were also identified as hub genes following our strict criteria. And genes including PRDM7, KCNIP1, PLEKHH3, GRIK2, PROB1, PAX3 were where the top CpG sites or DMRs were located. (Supplementary Table 2) Many common enrichment terms were also found, including extra cellular matrix (ECM)-receptor interaction pathway, maturity onset diabetes of the young pathway, complement and coagulation cascades pathway, DNA binding, neuron fate specification, glial cell differentiation, thyroid gland development, and cellular response to hormone stimulus.

DISCUSSION
In this study based on monozygotic twins, we detected some important epigenetic variants underlying depression by EWAS. A total of 66 interesting CpG sites (P < 1 × 10 −4 ) and 19 DMRs were identified. Moreover, many crucial GREAT ontology enrichments were identified. Genes clustered in the pink module were positively correlated with depression score in WGCNA, and many genes and enrichment terms were overlapped between methylation analysis and WGCNA. Finally, four genes were found to be differentially expressed in depression cases and health controls.
In EWAS, some genes where the top CpG sites and DMRs were located (Tables 2 and 4) may play essential roles in regulating depression status: (1) PTPRN2: DNA methylation variation of PTPRN2 was found to be associated with mood state disturbances across [33]; (2) HES5: HES5 could negatively regulate 5-HT1A receptor gene, which was related to MDD and suicide [34]; (3) GATA2: It was reported that overexpression of human GATA2 interfered with spine formation and produced depressive behavior in rats [35]; (4) DGKA: Blood transcript levels of DGKA differed significantly between participants with MDDs and nondepressed controls [36]; (5) NIPA2: It was suggested that rare copy number variants (CNVs) in NIPA2 could increase the risk of MDD by disrupting regulatory regions [37]; (6) PRDM7: The protein encoded by this gene was involved in lysine degradation pathway, and lysine methylation was a physiological post-translational modification of tau protein which played an important role in aging and Alzheimer's disease [38]; (7) KCNIP1: The protein encoded by this gene was a member of the family of cytosolic voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), and could regulate rapidly inactivating (A-type) currents and hence neuronal membrane excitability; (8) GRIK2: GRIK2, as one glutamate-related gene, might be related to risk for mood disorders [39], and the gene polymorphism of GRIK2 was associated with depressive symptoms [40]. The other genes have an unknown function in terms of depression now, whereas they may also be interesting potential candidates to be future researched and validated. As additional validation, we integrated the methylation data with gene expression data. Genes clustered in the pink module were positively correlated with depression score in WGCNA. And some genes were in common with EWAS findings, like BMP2, PRDM7, KCNIP1, and GRIK2. It was indicated that histone deacetylases could control neurogenesis in embryonic brain by inhibiting BMP2/4 signaling [41]. The other three genes had been discussed above. Additionally, four genes HELZ2, PTPRN2, GATA2, and ZNF624 were differentially expressed between depression cases and health controls. PTPRN2 and GATA2 have been discussed above, whereas the biological of HELZ2 and ZNF624 involved in depression remained to be studied further.
Two strengths can be noticed in our study. Since the case cotwin design using monozygotic twins discordant for a trait or disease was a powerful tool for EWAS [6,7], our results based on the twin data would be credible. Meanwhile, considering the various genetic constitutions, environmental exposures, and a multitude of life styles in different ethnic populations worldwide, our findings will specifically help elucidate the underlying pathogenesis of depression in the Chinese population.
Nevertheless, some limitations of our study should also be considered. First, the sample size of the present study was relatively limited due to the challenges of recruiting and identifying qualified twin pairs. We'll further validate the top CpG sites, essential genes, and biological pathways in a community population. And we'll also evaluate if the physical distribution of top CpG sites at different chromosomes is overrepresented in the regulatory domain of one specific biological pathway. Additionally, we'll conduct a causal effect analysis based on one specific biological pathway by integrating data of genetic variation, epigenetic variation, and environmental factors. Second, the Townsend deprivation index (TDI) was    indicated to be associated with depression [42,43]. However, we couldn't add this factor as a covariate in the linear mixed effects model, because we didn't investigate the corresponding information of TDI during the epidemiological survey. We'll consider the TDI factor in the validation analysis in the future.
In summary, we confirm that epigenetic factors are significant in explaining depression through twin-based analysis. We detected    multiple CpG sites, genes, DMRs, and pathways that were potentially associated with depression. The findings provided an important clues to further elucidate the pathogenesis of depression and helped to identify new diagnostic biomarkers and therapeutic targets for this disease.

DATA AVAILABILITY
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.