Background

Major depressive disorder (MDD) affects over 350 million people worldwide and is projected to be the leading cause of disease burden by 20301. Although the etiology of MDD involves genetic and environmental factors2, the precise underlying mechanisms remain poorly understood. A growing body of evidence has implicated a role for DNA methylation in MDD3,4, but the specific genes and biological pathways underlying MDD still remain unclear.

DNA methylation is influenced by genetic5, early life environment, and behavioral factors6, and is tissue and cell-type specific7. Thus, establishing the relationship between DNA methylation and mental illnesses, such as MDD requires the control of these potential confounding variables. Monozygotic (MZ) discordant twin pairs provide a powerful tool to examine the role of epigenetic mechanisms in depression, because they are matched on genotype, age, and sex. Moreover, identical twins reared together share early life familial environment8, which may contribute to the risk of depression later in life. At the time of this writing, several epigenome-wide association studies (EWAS) have been conducted to identify differentially methylated (DM) genes/regions associated with MDD using either unrelated individuals4 or twin pairs9,10,11,12. However, almost all previous studies utilized heterogeneous cell types from blood, buccal, or postmortem brain tissue samples. As different cell types have different DNA methylation profiles7, the use of homogeneous cell types such as purified monocytes should minimize confounding by cellular heterogeneity in epigenetic research. Moreover, because monocytes are key innate immune cells involved in inflammation, a mechanism known to be implicated in MDD13, identifying methylation changes in circulating monocytes is likely to provide mechanistic insight into disease pathogenesis. Further, the potential functional consequences of epigenetically altered genes on gene expression have not been adequately evaluated in previous studies.

Here we report findings from an integrated genome-wide profiling of DNA methylome and transcriptome in purified blood monocytes from 79 MZ twin pairs discordant for lifetime history of MDD, followed by replication in brain tissue and network analysis. Our goal is to identify key candidate genes and novel pathways associated with MDD.

Methods

Twin pairs

The current analysis included 79 MZ twin pairs discordant on lifetime history of MDD. Twins included in the current analysis were enrolled through the Mood and Methylation Study (MMS), an observational study designed to understand epigenetic and other molecular mechanisms underlying MDD using a co-twin control design. Detailed information regarding the study design, twin recruitment, clinical examination, and sample collection for the MMS has been described elsewhere14. Briefly, all twins enrolled in the MMS were members of the Washington State Twin Registry (WSTR), a community-based twin registry consisting of over 10,000 twin pairs15. Detailed methods for the construction of the registry and enrollment of twin pairs have been described previously15. All twin participants provided informed consent. Zygosity of the twin pairs included in the present study was confirmed using the 59 polymorphic SNPs included in the HumanMethylationEPIC array. Power analysis showed that we should have 80% power to detect a 1.1-fold change (FC) in DNA methylation at a genome-wide significance level16.

MDD diagnosis

To identify twin pairs for the present study, we sent an introductory letter to WSTR members via postal mail and email that asked general questions about MDD. These letters were only sent to twins where one or both members were in Western Washington because of the need for an in-person study visit in Seattle. Interested twins were then interviewed by trained research staff using approved scripts to screen for likelihood of eligibility. After pre-screening, lifetime and current MDD diagnoses were determined using the Structured Clinical Interview for DSM-IV Research Version (SCID-4-RV). Interviews were administered via phone by a clinical psychologist (E.D.S.), who was blind to pre-screening clinical information about both the interviewee and his or her co-twin, and final diagnosis was confirmed by consulting the senior psychiatrist (P.P.R.B.). The final sample was drawn from a total of 693 clinical interviews and all eligible and interested MZ pairs who could complete the in-person visit prior to the end of the enrollment period were enrolled. A discordant pair was defined as a twin pair in which one twin met the criteria for lifetime history of MDD, and his/her co-twin did not.

Inclusion/exclusion criteria

Only complete twin pairs were eligible for the present study. As over 85% of the twins in the WSTR are Caucasian (which reflects Washington State generally), all twin pairs included in the current analysis are Caucasian. Inclusion criteria included: (1) monozygosity determined by DNA analysis; (2) pairwise discordance on lifetime history of MDD; (3) aged 18 or older; (4) reared together; (5) Beck Depression Inventory II (BDI-II)17 score < 13 for the non-depressed twin (0–13 is the range for ‘minimal symptoms’ recommended by the BDI-II authors); and (6) willingness to provide blood samples. The primary exclusionary conditions included schizophrenia or other psychotic disorder, bipolar disorder, current substance use disorder, cancer within the past 5 years, autoimmune disorders, uncontrolled endocrine disorders, and uncontrolled sleep apnea.

Other measures

Each twin was also asked to complete standard questionnaires regarding sociodemographic factors, lifestyle, early life experience, use of psychiatric medications, and disease history. Severity of current depressive symptoms was assessed by the BDI-II and the Quick Inventory of Depressive Symptomatology (QIDS-SR-16) at the time of the blood draw18. Self-reported early life stress was measured using the Adverse Childhood Experience (ACE)19 and Early Trauma Inventory (ETI)20 questionnaires. PTSD was not exclusionary in this study due to high comorbidity with MDD, but was diagnosed and recorded during the SCID-RV interviews. Participants reported their use of medications commonly prescribed for MDD along with the approximate duration of use (ranging from 2 weeks to >10 years). They were also asked to list any additional psychiatric medications they had taken. For the current analyses, medications were first categorized into types (i.e., antidepressants, benzodiazepines, mood-stabilizers, and other (i.e. Ritalin)), and then translated into a “yes/no” variable for each of these categories for each participant. A participant was defined as having a history of antidepressant use if he/she used any of the medications.

Monocyte isolation and DNA/RNA extraction

Monocytes were isolated from fresh peripheral blood (collected into EDTA vacutainer tubes) using the Monocyte Isolation Kit II from Miltenyi Biotec (Auburn, CA, USA). DNA/RNA was isolated from monocytes using the AllPrep DNA/RNA/miRNA Universal Kit (Qiagen, CA, USA) according to manufacturer’s instructions. Double strand DNA and total RNA was quantified using PicoGreen Fluorometry. RNA integrity was assessed by capillary electrophoresis (e.g., Agilent BioAnalyzer 2100). Except for three samples with RNA integrity number (RIN) < 8 (one sample 7.0, one sample 7.6 and one sample 7.9), all other samples have RIN > 8. Of these, 88% samples have RIN > 9 (Table S1).

DNA methylation profiling

The Infinium HumanMethylationEPIC BeadChip (Illumina Inc., CA, USA)21 was used for DNA methylation profiling. Genomic DNA (500 ng) was bisulfite converted using the EZ-96 DNA methylation kit (Zymo Research, CA). Modified DNA was then amplified, fragmented, and hybridized, followed by fluorescent staining and scanning on a HiScan scanner (Illumina Inc.). The methylation level of each probe was represented as a β-value ranging from 0 (unmethylated) to 1 (fully methylated). To ensure accuracy of the methylation assay, each batch included two CEPH DNA samples (Coriell Institute, NJ) as positive controls. Twin pairs were hybridized on the same chip to minimize batch effect.

Methylation data pre-processing and QC

We first removed the following: (1) probes with a detection p > 0.05 in more than 20% of the samples; (2) probes with raw signal intensities greater or less than three SD from the mean; and (3) probes located on sex chromosomes (both X- and Y-chromosomes). Probes passing initial QC were then annotated to the UCSC (GRCh37/hg19), and those mapped to multiple locations or overlapping with known SNPs were further removed. The final analyses included 813,382 autosomal probes. All samples passed QC procedures. Prior to analysis, DNA methylation data were normalized with functional normalization using the R package minfi22. This method corrects for bias of different probe types and batch effects via an unsupervised approach22.

Transcriptome profiling by RNA-seq

Monocyte gene expression was quantified by paired end RNA-seq (50 PE). Using sequence-specific Ribozero capture probes, total RNA (500 ng) was globin-depleted and rRNA-depleted, fragmented, and reverse transcribed. Trueseq libraries were quantified using PicoGreen Fluorimetry and sequenced on HiSeq 2500. Sequencing reads were aligned to the UCSC (GRCh37/hg19) by bowtie223. Gene expression level measured as transcription per million reads (TPM) was quantified by RSEM v 1.1224. Genes with expression level below an appreciable level (log2(TPM + 1) < 1) in more than 5% samples were discarded. A total of 10,329 genes in all samples was included in the statistical analysis. At least 20 million reads per sample were captured in RNA-seq analysis (Table S1).

Replication in the brain

To replicate the putative DMRs identified in blood monocytes, we downloaded brain DNA methylation data (HumanMethylation450K BeadChip) through the publicly available GEO database (GSE41826). The brain tissue was collected by the NICHD Brain Bank of Developmental Disorders25, and DNA methylation data were generated in sorted neuronal nuclei from 58 postmortem brain tissue, including 29 MDD patients and 29 controls (51.7% female, mean age 32.6 ± 16.0 years old). Demographic information of the brain donors was described previously25.

To replicate the putative DEGs identified in blood monocytes, we downloaded brain gene expression data (RNA-seq) through publicly available GEO database (GSE101521). The expression data were generated in 59 postmortem brain tissue (29% females, mean age 49.3 ± 20.3 years old) collected by The Division of Molecular Imaging and Neuropathology at the New York State Psychiatric Institute and Columbia University. In brief, the whole-exome gene expression was examined in non-psychiatric controls (N = 29), MDD suicides (N = 21), and MDD non-suicides (N = 9) in postmortem brain tissue (dorsal lateral prefrontal cortex, Brodmann Area 9). All participants were free of medications. Detailed information of the brain samples was described previously26. Depression was diagnosed by the structured clinical interview (DSM-IV) in both datasets.

Statistical analysis

The primary goal of our statistical analysis is to identify differentially methylated genes/regions (DMRs) associated with MDD. The overall study design and analytical plan was shown in Fig. S1.

Identifying differentially methylated regions (DMRs) associated with MDD

As DNA methylation between adjacent probes could be functionally and/or spatially correlated, identifying genomic regions containing biologically relevant probes should be preferable compared to single CpG analysis. To achieve this, we employed a mixed-model designed specifically for discordant twin pairs27

$${\mathrm {Log}}_2\left[ {\frac{{{\mathrm {Methylation}}\,{\mathrm {level}}\,{\mathrm {in}}\,{\mathrm {depressed}}\,{\mathrm {twin}}}}{{{\mathrm {Methylation}}\,{\mathrm {level}}\,{\mathrm {in}}\,{\mathrm {nondepressed}}\,{\mathrm {cotwin}}}}} \right] = \alpha + {\sum} {\beta _ix_i} + {\sum} {1|y_j}$$

In this model, βi represents regression coefficient for the ith fixed-effect variable (e.g., age, sex), and yj denotes the random-effect of the jth covariate (e.g. batch effect). The intercept α stands for the mean FC in DNA methylation between depressed and non-depressed twins within a pair. By testing the null hypothesis of α = 0, we were able to test whether DNA methylation alteration (α > 0—hypermethylation, α < 0—hypomethylation) at a specific CpG site was associated with MDD or not. The effect of environmental factors on DNA methylation was assessed by testing, where β equals 0 or not (β > 0 indicates that the corresponding covariable causes hypermethylation, β < 0 indicates that it causes hypomethylation at the CpG site being tested in the depressed twins). This model allows for establishing a link between environmental exposure, epigenetic alteration, and depression. In this analysis, we adjusted for covariates including twin age, sex, BMI, smoking (pack-year), alcohol consumption, family income, and education level as fixed effect, and batch was included as random effect.

Region-based analysis was performed using results from single CpG analysis (p-value and effect size) by the program DMRcate28, which identifies genomic regions harboring CpG sites and accounts for correlations between adjacent probes. Here we defined a DMR as a region containing ≥ 5 correlated probes (peak probe p < 0.01 and correlation among probes ≥ 0.30), and a significant DMR was defined as a region with q < 0.05 after correcting for total number of 6858 regions. Putative DMRs were ranked and annotated to genomic features based on the UCSC database (GRCh37/hg19). Genomic features of DMRs were compared to the null distribution of CpG probes included in the MethylationEPIC array21.

Differentially expressed genes (DEGs) associated with MDD

Using the statistical model described above, we identified DEGs associated with lifetime history of MDD, adjusting for same covariates.

Integrated methylome and transcriptome analysis

To examine the impact of DNA methylation on gene expression in peripheral blood monocytes, we calculated partial correlation coefficients (corrected for twin age, sex, BMI, smoking, alcohol, family income, and education) between DNA methylation and cis-acting gene expression for each probe in all samples. Here cis-acting was defined as correlation between DNA methylation of a putative gene with its own expression (±5 kb to a tested probe). We used a conservative threshold to determine significant negative (partial correlation < −0.84) or positive (partial correlation > 0.84) correlation. This threshold was obtained by randomly permuting DNA methylation and gene expression datasets for all twins. The cutoff was based on the fifth percentile of the empirical distribution of partial correlation coefficients, assuming no correlation between methylation and gene expression across all participants (null hypothesis). To further examine the role of DNA methylation in gene regulation, we used the Fisher’s Exact test implemented in the GeneOverlap software29 to determine the statistical significance of genes showing both differential methylation and differential expression in relation to MDD status.

Methods used for the replication in brain

For each of the 39 DMRs identified in blood monocytes, we tested the association of each probe within each region with depression (y/n) using logistic regression, adjusting for age and sex. Similar analysis was conducted to test the association of each putative DEG with depression, adjusting for age, sex, RIN, and brain PH. Multiple testing corrected for 39 DMRs or 30 DEGs using false discovery rate (FDR).

Co-methylation and co-expression networks

To examine the correlation patterns among putative DMRs and identify genes that are co-methylated, we performed network analyses using the weighted gene correlation network analysis (WGCNA). Co-methylation or co-expression networks were constructed separately in depressed twins and their non-depressed co-twins using all genes. Differential methylation or expression networks were identified by comparing the connectivity of differential genes between the two groups. This analysis included 322 DMRs showing nominal association (raw p < 0.001) with MDD. Similar analysis was used to identify co-expression networks (326 genes with raw p < 0.001 was included). Network visualization was done using CytoScape30.

Functional-enrichment analysis

To explore the potential functional relevance of the identified DM or differentially expressed (DE) genes, we conducted functional enrichment analysis using the program DEPICT31. We first tested whether the putative genes (p < 0.001) are enriched in GWAS loci for major depression. A total of 746 independent SNPs previously associated with major depression (p < 1 × 10−5) was downloaded from the GWAS catalog32 for this analysis. We then tested whether the identified genes are enriched in gene sets related to antidepressants by extracting 283 drug target genes from the Open Targets database33.

Sensitivity analysis

To examine whether ACE modulates the association between DNA methylation and MDD, we further adjusted for ACE (y/n) in the above-described statistical model. Similarly, we tested the influence of antidepressants usage (y/n) or history of PTSD (y/n) on the relationship between DNA methylation and MDD.

Control for multiple comparisons

In the above-described analyses, we adjusted for multiple testing by FDR and FDR-adjusted p (i.e., q-value) < 0.05 was considered statistically significant.

Results

Table 1 shows the characteristics of the twins (mean age 38.2 ± 15.6 years, 68.4% females). Of the 79 twins with MDD, 8 twins have current MDD and 71 have MDD in the past. In addition, among the 79 depressed twins, 12 twins were under current medications, and 30 twins were taking medications in the past. Of note, 12 non-depressed co-twins were taking antidepressants prescribed for a number of different clinical indications outside of MDD, including chronic pain, anxiety, insomnia, and post-partum mood symptoms. None of the non-depressed co-twins with a history of antidepressant use met criteria for any MDD at any point in their lives. Except for current BDI-II score, PTSD, and use of antidepressants, depressed twins did not differ significantly from their non-depressed co-twins.

Table 1 Clinical characteristics of twin pairs participating in the MMS

DMRs associated with lifetime history of MDD

We identified 39 DMRs (annotated to 36 unique genes) significantly associated with MDD at q < 0.05 (Table 2). Of these, 33 DMRs are hyper-methylated, and 6 are hypo-methylated in relation to MDD. Figure 1 shows a Manhattan plot for these DMRs. The genomic distribution of these DMRs is shown in Fig. S2, which shows that the identified DMRs are enriched in the first exon and promoter regions but depleted in intergenic regions. In relation to CpG context, the identified DMRs are largely located within CpG Islands (CGIs).

Table 2 Significant DMRs associated with lifetime history of MDD in MZ discordant twin pairs
Fig. 1: Manhattan plot displaying the DMRs associated with MDD in monozygotic discordant twin pairs (N = 79 pairs).
figure 1

The p-values (−log10) of each DMR are plotted against their respective positions on each chromosome. The genome-wide threshold (q < 0.05) is indicated with a red line

DEGs associated with MDD

We identified 30 DEGs (14 upregulated, 16 downregulated) associated with lifetime history of MDD at q < 0.05 (Table 3). A Manhattan plot of these putative DEGs is shown in Fig. S3.

Table 3 Significant DEGs associated with lifetime history of MDD in MZ discordant twin pairs

Replication in the brain

Of the 39 DMRs identified in blood, 10 regions contain at least one CpG probe showing significant association (same direction) with MDD (q < 0.05) in the brain after adjustments for covariates and total number of probes in each region (Table S2).

Of the 30 DEGs identified in blood monocytes, two genes (NDUFA8, GUSBP9) were also significantly associated with MDD (q < 0.05) in the brain (Table S3). While the expression level of the NDUFA8 gene was lower in MDD patients than that in controls (i.e., downregulated) in both blood and brain, the GUSBP9 gene was in an opposite direction (i.e., upregulated in blood, but downregulated in the brain between cases and controls).

Genome-wide integration of DNA methylome and transcriptome in blood monocytes

Figure 2 displays the genome-wide correlation patterns between DNA methylation and cis-acting gene expression in peripheral blood monocytes. It shows that DNA methylation was largely negatively (74% of the correlation pairs) correlated with gene expression, but positive correlations were also observed. Interestingly, we found that the correlation patterns between DNA methylation and gene expression vary by genomic locations, with genes located about 1 kb upstream (negative) or downstream (positive) of the transcription start site (TSS) showing stronger correlation, whereas those located in between showing weak or no correlation (Fig. S4). These correlation pairs (Table S4) involve 140 distinct methylated loci (from 64 unique genes) and 60 corresponding expression regions (representing 57 unique genes). The observed correlation between DNA methylation and gene expression appears to be independent of MDD, because after additional adjustment for MDD status (y/n) in the correlation analysis, majority (93.8%) of the correlation pairs remain statistically significant.

Fig. 2: Circos plot showing the genome-wide relationship between DNA methylation and gene expression in peripheral blood monocytes in relation to major depression.
figure 2

The outermost ring displays chromosome numbers and bands. The second ring (green) shows differential methylation in depressed (red) and non-depressed (blue) twins. The third ring shows Pearson correlation between DNA methylation and gene expression in 500 bp bin in the combined samples. The inner most circle (yellow) represents mRNA differential expression in depressed (red) and non-depressed (blue) twins. The height of the histogram bins indicates the level of DNA methylation or gene expression

Differential network analysis

Our co-methylation network analysis identified three differential co-methylation modules containing at least 50 genes. Table S5 lists the co-methylation modules along with hub genes (genes with highest module membership) and biological pathways in each module. A total of 304 genes (94.4% of the total 322 genes showing nominal association with MDD) were assigned to at least one module. The largest module (Fig. S5) comprises 167 genes involved in four biological processes. The network connectivity (measured by node degrees) of two pathways in this module significantly differs between depressed and non-depressed twins. For example, the node connectivity for a pathway named “negative regulation of neuron apoptotic process” in depressed twins was significantly higher than that in non-depressed twins (3.8 vs. 2.6, P = 7.15 × 10−5), whereas the node connectivity of another pathway called “stress-activated protein kinase signaling cascade” was significantly lower in depressed twins compared to their non-depressed co-twins (2.8 vs. 4.2, P = 2.21 × 10−5).

Our co-expression network analysis identified five modules containing 292 genes (Table S6). The largest module comprises 94 genes involved in eight biological processes. Of these, the “positive regulation of cytokine secretion” pathway showed significant difference between depressed and non-depressed twins (Fig. S6)

Functional-enrichment analysis

The identified DM genes are significantly enriched in pathways related to stress-activated protein kinase signaling cascade, neuron apoptotic process, negative regulation of insulin receptor signaling, mTOR signaling, and nerve growth factor receptor signaling pathways (Table S7). The differentially expressed genes are enriched in biological processes related to “positive regulation of cytokine secretion” and “regulation of response to stress”. Gene overlapping analysis revealed that the putative DM genes are 2.44 times more likely to be differentially expressed (P = 1.1 × 10−4), and are significantly overrepresented in previous GWAS loci for major depression (2.32 times, P = 2.4 × 10−4). Moreover, these DM genes are significantly enriched in drug targets related to antidepressants (2.83 times, P = 7.6 × 10−5), and are highly expressed in tissues/cell types related to the nervous, endocrine, and urogenital systems (Fig. S7). Together, these results suggest a potential functional role of altered DNA methylation in MDD pathogenesis.

Sensitivity analysis

Additionally adjustment for childhood traumatic experience (ACE) slightly attenuated the association between DNA methylation and MDD, but results remained largely unchanged. After further correction for use of antidepressants or PTSD, the association of the MAFF gene with MDD disappeared, but other genes remained statistically significant. Results for sensitivity analysis are shown in Table S8. It appears that additionally adjustments for adverse childhood events (ACE), PTSD, or use of antidepressants did not have an appreciable effect on the association between gene expression and MDD (Table S9).

Discussion

Using a MZ discordant co-twin control design, we identified 39 DMRs (33 hypermethylated, 6 hypomethylated) and 30 differentially expressed genes (14 upregulated, 16 downregulated) associated with lifetime history of MDD, after accounting for clinical covariates and multiple testing. These DM or expressed genes are significantly enriched in biological processes related to neuronal function, stress response, insulin regulation, mTOR signaling, and cytokine secretion, suggesting potential relevance to MDD pathogenesis. Moreover, the identified DMR genes are overrepresented in GWAS loci and drug targets related to antidepressants. Integrated DNA methylome and transcriptome analysis revealed that DNA methylation was both negatively and positively correlated with gene expression in peripheral blood monocytes. To the best of our knowledge, this is the first integrated DNA methylome and transcriptome analysis in purified blood monocytes for lifetime history of MDD using a relatively large number of MZ discordant pairs from a community-based population cohort.

In a recent meta-analysis examining the association between blood DNA methylation and depressive symptoms among middle-aged and elderly persons4, DNA methylation at three CpG sites were significantly associated with depressive symptoms. One of these probes (cg12325605) could be replicated in our study at a nominal level (P = 2.28 × 10-4). Multiple prior studies have also reported possible associations of DNA methylation in stress-related genes34 or genome-wide associations using unrelated individuals or twins in various tissues (peripheral or postmortem brain)10,35,36. However, due to the heterogeneity of MDD phenotypes and many other differences between studies (e.g., racial/ethnic groups, age ranges of participants, tissue/cell types, risk factor profiles, etc.), it is challenging to compare results across different scenarios, and no conclusive genes have been identified so far. Of the limited number of existing EWAS on MDD, the present study represents the first to utilize purified blood monocytes in a relatively large sample of MZ twin pairs discordant on MDD.

Of the 36 annotated DMR genes, the PRSS21 gene showed the most significant association with MDD. The methylation level of this gene in depressed twins is on average 1.13-fold as high as that in their non-depressed co-twins. This gene (also known as testisin) encodes a glycosylphosphatidylinositol (GPI)-linked serine protease, which is a member of the trypsin family of serine proteases. A growing body of evidence demonstrates that the brain can co-opt the activities of these serine proteases and their receptors/inhibitors to regulate various processes including synaptic activity, learning, and social behavior37. Aberrant activity of these molecules may contribute to neurological disorders such as Alzheimer’s disease, Parkinson’s disease, traumatic brain injury, and stroke37. Another significantly hypermethylated gene is HSPB11, which encodes a family member of the heat shock proteins (HSPs) that are produced in response to stressful conditions38. Although the exact mechanisms behind the association of HSPB11 hypermethylation with MDD are unknown, HSPs are involved in protein misfolding and aggregation, which have been implicated in neurodegenerative and neuropsychiatric disorders39. Other top-ranked DMR genes, such as AAK1, SORBS2, and GAREM2, are also abundantly expressed in the brain and may affect MDD susceptibility through a variety of biological processes. For example, the AAK1 gene is involved in intracellular vesicle trafficking, a mechanism that is essential for neurotransmitter release and recycling of synaptic vesicle proteins40. Inhibition of AAK1 activity may provide a novel therapeutic target for treating neuropsychiatric and neurodegenerative disorders41. The SORBS2 gene encodes the Arg protein tyrosine kinase-binding protein 2 (ArgBP2), downregulation of which was previously associated with mood disorders42. The GAREM2 gene encodes an adapter protein that regulates MAPK/ERK signaling, which is involved in neuronal plasticity and resilience in psychiatric disorders43. Further, the identified DMR genes are enriched in neuron apoptosis44, nerve growth factor45, stress-activated protein kinase signaling46, insulin receptor regulation47, and mTOR signaling48, suggesting potential relevance to MDD pathogenesis.

Of the differentially expressed genes associated with MDD, the peroxisomal trans-2-enoyl-CoA reductase (PECR) gene showed the strongest association. This gene is involved in mitochondrial energy production by catalyzing the reduction of enoyl-CoAs to acyl-CoAs, and genetic polymorphisms in this gene were associated with alcohol dependence49. The differentially expressed genes are enriched in regulation of cytokine secretion and oxidative stress responses, lending further support for the critical roles of inflammation and oxidative stress in major depression13.

Previous studies have demonstrated that zinc plays an essential role in synaptic activity and neuronal plasticity, and that zinc deficiency was associated with behavioral impairments50, neurodegenerative disorders, and mood disorders including depression51. In line with these findings, we found that several zinc family genes are DM (e.g., SLC30A3, ZNF212, ZBTB45, SWSAP1, WT1, TRIM39) or differentially expressed (e.g., ZNF200, ZNF101, ZNF493, ZNF816, ZNF487, ZNF772) between depressed twins and their non-depressed co-twins. These findings provide further support for a potential important role of zinc dysregulation in depression, and suggest that DNA methylation may modulate the effect of zinc function on depression susceptibility. Together, our results may unravel novel molecular pathways underlying MDD pathogenesis.

Although DNA methylation is generally believed to cause gene silencing, a global analysis of the extent and pattern of DNA methylation with gene expression in human blood monocytes is still lacking. Here we demonstrated that monocyte DNA methylation can be both positively and negatively correlated with gene expression, although there appears to be a trend that the correlations are predominantly negative in promoter regions. These results are in agreement with previous studies reporting both positive and negative correlations between DNA methylation and gene expression in blood and brain52. Interestingly, we found that the relationship between DNA methylation and cis-acting gene expression appears to vary by genomic locations, with methylation of putative genes located upstream of TSS showing predominantly negative correlations, whereas those located downstream of TSS showing largely positive correlations with gene expression. While the negative correlation may result from the interference with transcription factor binding or recruiting repressors such as histone deacetylases51, the positive correlation between DNA methylation and gene expression may be attributed to the high level of DNA methylation in the gene body of highly transcribed genes. In addition, it has been shown that DNA methylation of a promoter or an enhancer can activate transcription of a target gene and therefore is positively correlated with gene expression53. Together, our integrated DNA methylome and transcriptome analysis confirmed the importance of DNA methylation in gene regulation and identified key candidate genes whose role in depression are modulated by epigenetic changes.

MDD is a highly heterogeneous disorder involving the joint or interactive effects of many genes in multiple pathways. Traditional methods that model the effect of a single gene cannot capture the complicated biological pathways implicated in MDD. Using a network-based approach, we identified co-methylated or co-expressed modules containing coordinated genes across different genomic loci. These findings support the hypothesis that altered DNA methylation is interdependent and that the blood monocytes methylome and transcriptome comprise a complex network of interacting processes.

Our study has several limitations. First, in spite of using 79 MZ discordant twin pairs, our study is still underpowered and thus we might have missed important disease-related genes, especially those with small individual effect size. As such, the current analysis focused on region-based rather than single probe analysis. Our results should be considered as a proof of concept rather than conclusive. Second, although we used purified blood monocytes for DNA methylation and gene expression profiling, our sample still includes other cell types, such as macrophages or dendritic cells, each of which may have different epigenetic profiles. However, our sample has nearly 97% purity and we believe confounding by cellular heterogeneity should not be a major concern for our study. Moreover, given the cell-type-specific nature of DNA methylation, it is unclear to what extent our results derived from peripheral blood could reflect methylation changes in the brain. However, monocytes are one of the key components of the innate immunity system, dysfunction of which has been implicated in neuropsychiatric disorders including depression54, and accumulating evidence indicated that epimutations may not be limited to the affected organ (e.g., brain) but could also be detected in peripheral blood55. Further, many of the identified DMR genes are abundantly expressed and some could be replicated in the brain. These putative genes detected in easily accessible tissues, such as blood are suitable for biomarkers. Third, the twin participants were evaluated for lifetime history of MDD and some non-depressed co-twins might ultimately develop an episode of MDD. The mean age of our sample was 38, which suggests that a large portion of the participants had already passed the peak risk period of young adulthood56. Fourth, due to the uneven distribution of the CpG probes on the Illumina methylation arrays, we cannot entirely exclude the possibility that the observed correlation patterns between monocytes DNA methylation and cis-acting gene expression is likely reflecting the way the Illumina methylation array was designed. Whether the observed correlation patterns between DNA methylation and gene expression is due to true biology or artifact could be verified when whole genome bisulfite sequencing data become available in the near future. Moreover, like all other observational studies, establishing causality between DNA methylation and MDD is impossible in the present study. Fifth, of the putative CpG probes identified in this study, there is a considerable likelihood that some CpG sites, especially those with smaller effect sizes, could be due to technical artifact but not reflect true biology. Other potential limitations include the limited genome coverage of the EPIC array used in our study, and the uncertainty to generalize our findings to other racial/ethnic groups.

Our study has several strengths. First, as MZ twin pairs share almost identical genotypes, age, sex, and early familial environment (e.g., in utero environment), as well as many unknown or unmeasured factors, the use of a MZ discordant co-twin control design minimizes or eliminates potential confounding by these factors. Second, we profiled DNA methylome and transcriptome in purified blood monocytes, which minimizes confounding by cellular heterogeneity. Moreover, we used the structured clinical interview (DSM-IV) for depression diagnosis.

In summary, our results demonstrated a critical role of altered DNA methylation and gene expression in MDD, and identified key candidate genes and pathways underlying MDD pathogenesis. If validated, the newly identified genes and pathways may serve as novel therapeutic targets for MDD and related disorders.