Introduction

Major depressive disorder is a common psychiatric disorder linked to both genetic and environmental factors [1, 2]. Depressive symptoms are an important public health problem and contribute to vulnerability to major depression, which can lead to suicidal behavior [3]. According to a WHO report, more than 800,000 people die by suicide every year worldwide [4]. Further, 3–33% of adults were reported to experience suicidal ideation during their lifetime [5], and over 90% of suicide victims or suicide attempters have a mood disorder, including depression [6]. Previous trials confirmed that preventive action against suicide, such as early treatment and mental health promotion, can reduce the incidence by 22% [7, 8]. Therefore, it is important to identify robust biomarkers for depression that can facilitate detection at an early stage.

Family and twin studies have indicated that genetic and environmental factors are important in the pathogenesis of major depression. Estimates of major depressive disorder heritability range from 31 to 42% [2]. However, neither candidate gene studies nor genome-wide association studies (GWASs) have identified genes whose association with depression has been replicated in multiple studies [9], except for two SNPs located in Kinase Suppressor of Ras 2 (KSR2) and DCC Netrin 1 Receptor (DCC) [10]. This inconsistency is thought to reflect a lack of power, partly because the role of environment has not been taken into account [9]. It is widely believed that environmental factors, such as stress, might play an important role in the pathogenesis of depression.

Epigenetics is now generally understood to refer to potentially heritable molecular modifications that regulate gene expression and chromatin structure independently of the primary DNA sequence [11]. DNA methylation is one of the major forms of epigenetic modification and plays diverse roles in the etiology of complex diseases [12]. Previous studies have shown that DNA methylation is affected by stressful life events and mediates risk in psychiatric disorders [13]. Methylation of the gene encoding the Glucocorticoid Receptor, NR3C1, which is one of the most investigated genes in depression studies, was reported to be associated with early life stress and stress-related psychiatric disorders [13, 14]. Associations of methylation levels with other candidate genes involved in depression, such as brain derived neurotrophic factor (BDNF) and Serotonin Transporter 1 (SLC6A4), have also been reported [13] and, interestingly, epigenetic modifications of these genes were found to correspond to antidepressant medication [15]. However, the reported methylation sites of these genes and the effects of methylation on gene expression were not always consistent between studies [13, 14]. Recent technological advances have facilitated the investigation of DNA methylation status across the entire genome. Several epigenome-wide association studies (EWASs) for depression have been performed, and a number of additional candidate genes were identified. Córdova-Palomera et al. [16] reported that most differentially methylated sites examined in their EWAS were located in genes related to neuropsychiatric phenotypes. Sabunciyan et al. also reported an association of methylation with neuronal development genes [17]. However, these EWASs did not find associations between methylation and traditional candidate genes linked to depression, including NR3C1, BDNF, and SLC6A4 [16,17,18,19,20,21].
Further, most of the newly identified candidate methylation loci in EWASs were not replicated in other independent studies [16,17,18,19,20,21], clouding the interpretation of these data.

In this study, we performed an EWAS for depressive symptoms in a small Japanese cohort using methylation array technology. We investigated the methylation loci related to depressive symptoms in healthy individuals who were all free from antidepressant medication. According to previous studies, antidepressants influence the DNA methylation status of several genomic loci [15]. Therefore, analysis of medication-free individuals with or without depressive symptoms may be an effective way to detect DNA methylation sites associated with depression. On this basis, we divided healthy individuals into two groups, with or without depressive symptoms, according to a depression self-rating scale and compared the DNA methylation profiles of the two groups.

Materials and methods

Sample

We recruited healthy unrelated Japanese individuals living in Tokyo. The healthy control subjects were interviewed by psychiatrists and filled out a questionnaire, MINI [22], to exclude a history of major psychiatric illnesses. They also completed another self-reported questionnaire, CES-D [23], to examine whether they had depressive symptoms. Then, 47 subjects were selected for the current study from our larger sample set described above. They were divided into two groups, with (CES-D ≥ 16, N = 20) and without (CES-D < 16, N = 27) depressive symptoms, matched for male/female ratio, in order to investigate methylation sites related to depressive tendencies. None of the subjects with CES-D ≥ 16 had been diagnosed with major depressive disorder. The annotation data of these samples are provided in Supplementary Table 1. Written informed consent was obtained from each subject. Ethical approval was obtained from the ethical committees of the University of Tokyo.

Epigenome-wide DNA methylation analysis

The DNA methylation data were collected in our previous EWAS [24]. Briefly, genomic DNA was extracted from white blood cells by standard procedures (Wizard genomic DNA purification kit, Promega Corporation, WI). DNA samples were first bisulfite-converted using the EZ DNA Methylation Kit (Zymo Research, Irvine, CA). The levels of DNA methylation were examined with a DNA methylation array (Infinium Human Methylation 450K BeadChip, Illumina Inc., CA) according to the manufacturer's protocol. The DNA samples were amplified by a whole-genome amplification technique and then fragmented and hybridized to the methylation array. After the hybridization, a single-base extension step determined the DNA methylation status of each locus. The arrays were imaged with a high-precision scanner (iScan system, Illumina Inc.) and the signal intensities corresponding to methylated or unmethylated signals were extracted with a software package (GenomeStudio Software, Illumina Inc.). The DNA methylation status of each cytosine residue was evaluated using β- and M-values. The β-value is the ratio of the signal from the methylated probe to the total signal intensity and ranges from 0 (unmethylated) to 1 (completely methylated). The M-value is the logit transformation of the β-value, which is statistically valid for the differential analysis of methylation levels [25].
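For illustration, the relationship between the two scales can be sketched in Python. This is a minimal sketch rather than the procedure used in this study: it assumes the log2-based form of the logit that is commonly used for methylation arrays, and the function names, the optional `offset` and the clipping constant `eps` are hypothetical.

```python
import math


def beta_value(methylated, unmethylated, offset=0.0):
    """Methylated signal divided by total signal.

    `offset` is a hypothetical stabilising constant; array software often
    adds a small positive value to the denominator, but 0 is used here.
    """
    return methylated / (methylated + unmethylated + offset)


def m_value(beta, eps=1e-6):
    """Assumed M-value definition: log2(beta / (1 - beta)).

    beta is clipped to [eps, 1 - eps] so that fully unmethylated or fully
    methylated sites do not produce infinite values.
    """
    clipped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clipped / (1.0 - clipped))


# A half-methylated site maps to M = 0; higher beta gives positive M.
print(m_value(0.5))
print(m_value(beta_value(300.0, 100.0)))
```

The M-value scale is unbounded and roughly symmetric around 0, which is why it is usually preferred as the response variable in regression, whereas β-values are easier to interpret as a fraction.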

Data filtering and normalization

The β-values generated by GenomeStudio Software were subjected to stringent data filtering and normalization. First, β-values with a detection P-value ≥ 0.01 were treated as missing values and the call rate of each probe was calculated. We included probes which met the following conditions in the EWAS analysis: (1) probe call rate > 95%, (2) probe on an autosomal chromosome, (3) probe not including a single-nucleotide polymorphism (SNP) with a minor allele frequency ≥ 0.05, and (4) probe not reported to have cross-reactivity [26] (Supplementary Fig. 1). A previous study reported probes that co-hybridize to alternate sequences that are highly homologous (< 4 base mismatches among 50 bases) to the intended targets [26] as cross-reactive probes. In addition to these probes, we excluded additional potentially cross-reactive probes. We created a list of possible cross-reactive probes that had 20 bp of sequence from the 5′ end that was perfectly matched to unintended target sequences [24]. After filtering, we conducted stringent data normalization using the following pipeline: (1) Lumi: color bias correction and quantile normalization (QN; correction for the distributions of the pooled probes) [27], (2) beta-mixture quantile dilation (BMIQ) normalization (correction for probe design bias) [28], (3) correction for the batch effect (ComBat) [29].
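The four inclusion criteria can be expressed as a simple per-probe filter. The following Python sketch is illustrative only: the data layout (one row of values per probe, one column per sample) and all function and argument names are assumptions, and the SNP and cross-reactivity flags are taken as precomputed inputs.

```python
def mask_low_confidence(betas, detection_p, p_cutoff=0.01):
    """Set beta-values with detection P-value >= p_cutoff to None (missing)."""
    return [
        [b if p < p_cutoff else None for b, p in zip(beta_row, p_row)]
        for beta_row, p_row in zip(betas, detection_p)
    ]


def call_rate(probe_row):
    """Fraction of samples with a non-missing value for one probe."""
    return sum(value is not None for value in probe_row) / len(probe_row)


AUTOSOMES = {str(i) for i in range(1, 23)}


def keep_probe(probe_row, chromosome, has_common_snp, cross_reactive,
               min_call_rate=0.95):
    """Apply criteria (1)-(4): call rate, autosome, no common SNP, not cross-reactive."""
    return (
        call_rate(probe_row) > min_call_rate
        and chromosome in AUTOSOMES
        and not has_common_snp
        and not cross_reactive
    )


# Toy example: one probe measured in 40 samples, one of which fails detection.
betas = [[0.5] * 40]
det_p = [[0.001] * 39 + [0.2]]
masked = mask_low_confidence(betas, det_p)
print(call_rate(masked[0]), keep_probe(masked[0], "7", False, False))
```

Masking is done before the call rate is computed, so a probe is judged on the number of samples in which it was reliably detected.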

To investigate methylation sites associated with depressive symptoms, the normalized β-values were converted to M-values and used in the linear regression analysis.

Prediction of the proportions of leukocyte subtypes

To address the effect of different distributions of leukocyte subtypes between subjects with and without depressive symptoms, we predicted the proportions of each leukocyte subtype in all subjects. The proportions of natural killer (NK) cells, B cells, CD4+ T cells, CD8+ T cells, monocytes, and granulocytes were estimated using a published algorithm [30, 31]. The estimated proportions were used to control for the effects of leukocyte subtypes in the linear regression analysis.
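Controlling for covariates in this way amounts to fitting, for each probe, a linear model in which the group indicator and the covariates enter together, and then reading off the coefficient of the group indicator. The sketch below shows this idea with ordinary least squares in plain Python. It is a simplified illustration, not the analysis code of this study: the helper names are hypothetical, the toy design uses only group and age, and the P-value for the group coefficient (a t-test in a statistics package) is omitted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy data: columns are intercept, group (1 = depressive symptoms), age.
groups = [0, 0, 0, 1, 1, 1]
ages = [30, 40, 50, 35, 45, 55]
design = [[1.0, g, a] for g, a in zip(groups, ages)]
m_values = [1.0 + 2.0 * g + 0.1 * a for g, a in zip(groups, ages)]
intercept, group_effect, age_effect = ols(design, m_values)
print(group_effect)  # group difference in M-value after adjusting for age
```

In a full analysis the design matrix would also carry sex, BMI and the estimated cell-type proportions as further columns.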

Epigenome-wide association analysis

For the EWAS, significant associations were assessed by linear regression analysis with adjustments for the effects of age, sex, BMI, and the predicted proportions of leukocyte subsets. M-values were used for the analysis. In order to rank the differentially methylated positions (DMPs) efficiently, we employed a combined-rank scheme [32]. In EWAS analyses, probes with the lowest P-values do not always have large effects, i.e., statistically significant probes sometimes have very small β-value differences between case and control subjects (delta β; Δβ) and are considered not to be biologically significant [33]. The combined-rank method considers both P-value and Δβ; therefore, we can prioritize DMPs which are important in the context of both statistical significance and effect size. P-values calculated with the linear regression model and adjusted Δβ [(adjusted case β-value) − (adjusted control β-value)] were used to calculate combined-ranks. First, we ranked methylation sites in accordance with P-values, and second, the methylation sites were ranked in accordance with Δβ. The summation of the two kinds of ranks was then evaluated and the final combined-ranks of the methylation sites were determined.
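The ranking procedure described above can be sketched in a few lines of Python. This is a hedged illustration: it assumes that the Δβ ranking uses the absolute value of Δβ (so that strong hypo- and hypermethylation are treated alike) and that ties are broken by input order, neither of which is specified here; the function names are hypothetical.

```python
def ranks(values, descending=False):
    """Return 1-based ranks; ties are broken by input order (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=descending)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def combined_rank_order(p_values, delta_betas):
    """Order probes by the sum of their P-value rank and |delta beta| rank."""
    p_rank = ranks(p_values)  # smallest P-value gets rank 1
    effect_rank = ranks([abs(d) for d in delta_betas], descending=True)
    rank_sum = [p + e for p, e in zip(p_rank, effect_rank)]
    return sorted(range(len(rank_sum)), key=lambda i: rank_sum[i])


# Toy example with four probes (indices 0-3).
p_values = [1e-6, 1e-3, 0.5, 1e-4]
delta_betas = [0.10, -0.18, 0.01, 0.02]
print(combined_rank_order(p_values, delta_betas))
```

In the toy example, probe 1 has only the third-smallest P-value but the largest effect size, so it is placed ahead of probe 3, whose P-value is smaller but whose Δβ is small.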

Pathway analysis

In order to investigate the characteristics of the top-ranked CpG sites, pathway analyses were performed with MetaCore software (version 6.24 build 67895, Thomson Reuters, New York, NY). Genes annotated to the top 100 ranked CpG sites were tested to examine whether they had any enrichment of gene sets for biological processes or molecular functions in the GO database (http://geneontology.org/) [34]. Among the detected gene sets, those with more than 500 registered genes were ignored because such gene sets tend to represent broader categories with biological meanings that are often ambiguous [35]. Gene sets with < 5 registered genes that might be less noteworthy as “gene sets” were also disregarded. Further, we also conducted another GO-based pathway analysis with GOseq software [36], which enabled us to correct for gene length bias, because larger genes usually have many CpG sites and a higher a-priori chance of being included in the pathway analysis [37]. Pathways identified as significant in both MetaCore and GOseq analyses were regarded as candidate pathways.
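To make the gene-set size filter and the idea of an enrichment test concrete, the sketch below applies the 5 to 500 gene bounds and computes a plain hypergeometric over-representation P-value. It is a generic illustration only: it is not the procedure implemented in MetaCore or GOseq, it does not correct for gene-length bias, and the function names and the toy numbers are hypothetical.

```python
from math import comb


def usable_gene_set(set_size, min_size=5, max_size=500):
    """Keep gene sets with at least 5 and at most 500 registered genes."""
    return min_size <= set_size <= max_size


def hypergeom_pvalue(universe, set_size, n_selected, overlap):
    """P(X >= overlap) when n_selected genes are drawn without replacement
    from `universe` genes, of which `set_size` belong to the gene set."""
    upper = min(set_size, n_selected)
    favourable = sum(
        comb(set_size, k) * comb(universe - set_size, n_selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(universe, n_selected)


# Toy numbers: 6 of 67 selected genes fall in a 100-gene set, 20,000-gene universe.
if usable_gene_set(100):
    print(hypergeom_pvalue(20000, 100, 67, 6))
```

Requiring significance in two independent tools, as done here, guards against a pathway being reported only because of the assumptions of a single method.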

Assessment of the blood-brain correlation in the top-ranked probes

The top 100 ranked probes were examined to determine whether the methylation status of these probe sites correlated between blood and brain. We used the Blood Brain DNA Methylation Comparison Tool (http://epigenetics.iop.kcl.ac.uk/bloodbrain/) to investigate the Pearson's correlation coefficients of β-values between samples obtained from blood and four brain regions: prefrontal cortex (PFC), entorhinal cortex (EC), superior temporal gyrus (STG) and cerebellum (CER) [38]. The correlation plots were also obtained using this tool.
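For readers unfamiliar with the statistic, Pearson's correlation coefficient for one probe can be computed from paired blood and brain β-values as sketched below. This is a generic illustration and does not reproduce the comparison tool itself; the function names and toy values are hypothetical, and the default cut-offs are the r ≥ 0.2 and r ≥ 0.7 thresholds used in the Results.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_class(r, weak=0.2, strong=0.7):
    """Label a coefficient using the cut-offs applied in the Results."""
    if r >= strong:
        return "strong"
    if r >= weak:
        return "at least weak"
    return "below threshold"


# Toy beta-values for one probe in matched blood and brain samples.
blood = [0.20, 0.35, 0.50, 0.65, 0.80]
brain = [0.25, 0.30, 0.55, 0.60, 0.85]
r = pearson_r(blood, brain)
print(round(r, 3), correlation_class(r))
```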

Statistical analysis

In the pathway analysis, detected pathways were considered significant at a false discovery rate (FDR) of 0.05. All analyses other than the MetaCore-based pathway analysis were performed using R software.
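The FDR procedure is not named here. As an illustration of how an FDR threshold of 0.05 is typically applied, the sketch below implements the widely used Benjamini-Hochberg adjustment; treating it as the method behind the reported FDR values is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy example: pathways with adjusted P-value below 0.05 would be retained.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
print([i for i, q in enumerate(adjusted) if q < 0.05])
```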

Results

Epigenome-wide DNA methylation analysis

Genome-wide DNA methylation profiles were examined in healthy individuals with or without depressive symptoms using array-based technology. First, a quality check of the array data was performed. We investigated the detection P-values of each probe, which reflect the overall probe performance. More than 99% of all probes in all samples had a detection P-value ≤ 0.05, showing that the overall performance of the assay was high. Density plots of the β-values of each sample were checked by visual inspection. All plots had a standard bimodal distribution of the β-values, as we reported previously [24].

After stringent data filtering and normalization were performed, 363,887 methylation sites were analyzed (Supplementary Fig. 1). DMPs associated with depressive symptoms were investigated by linear regression analyses. Since several DNA methylation sites have been observed to be affected by sex, age, and BMI in previous studies, we included these variables as covariates. Cell types have also been reported to have a strong influence on DNA methylation at several specific sites; thus, we estimated the proportions of leukocyte subtypes using a published algorithm. No obvious differences in proportions between subjects with and without depressive symptoms were observed in any of the cell subtypes (Supplementary Fig. 2). However, considering their large impact on DNA methylation, the effects of cell subtypes were also regressed out.

The 100 top-ranked DMPs are listed in Supplementary Table 2, and the top 10 in Table 1. Of the top-ranked DMPs, 85/100 probes (85%) were found to be hypermethylated in the individuals with depressive symptoms, and only 15% were hypomethylated (Fig. 1, Supplementary Table 2). The top-ranked DMP, cg17277199, was 10.7% hypermethylated in subjects with depressive symptoms (P = 9.24 × 10^-7, Table 1), and was located in a region within 200 bp upstream of the C2orf84 transcription start site. cg13768055 (RPH3AL, P = 3.27 × 10^-4, Δβ = −17.7%) and cg01343041 (C2orf84, P = 8.07 × 10^-6, Δβ = 9.6%) were the next two most highly ranked sites.

Table 1 EWAS result of the top 10 ranked probes
Fig. 1 Result of the EWAS examining DMPs between subjects with and without depressive symptoms. Log-transformed P-values of all the probes were plotted against differences between average adjusted β-values of subjects with and without depressive symptoms. The top 100 ranked probes are shown as blue dots

The top 100 DMPs showed no distinctive distribution with respect to their locations relative to genes and CpG islands. However, among the top 100 probes, 36% were located in a putative enhancer region. This percentage was relatively high considering that the overall percentage of probes in enhancer regions was 22.9% (Chi-square P = 1.92 × 10^-3).
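The scale of this enrichment can be checked with a simple one-degree-of-freedom chi-square calculation, sketched below. The exact construction of the test is not stated here, so this goodness-of-fit version (36 observed enhancer probes out of 100 against an expected fraction of 22.9%) is an assumption and is not expected to reproduce the reported P-value exactly; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_one_df(observed_hits, n, expected_fraction):
    """Goodness-of-fit chi-square statistic and P-value with 1 degree of freedom."""
    expected_hits = n * expected_fraction
    expected_misses = n - expected_hits
    observed_misses = n - observed_hits
    statistic = (
        (observed_hits - expected_hits) ** 2 / expected_hits
        + (observed_misses - expected_misses) ** 2 / expected_misses
    )
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


statistic, p_value = chi_square_one_df(36, 100, 0.229)
print(statistic, p_value)
```

Under these assumptions the P-value falls in the same order of magnitude as the reported value, which supports the statement that enhancer probes are over-represented among the top 100 DMPs.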

Pathway analysis

Pathway analyses were performed to investigate whether the genes annotated to the top 100 ranked probes were associated with specific biological processes or molecular functions. Sixty-seven genes located near the top 100 DMPs were used for the two kinds of pathway analyses. First, MetaCore software, which is based on expert-curated data sets, was used to identify networks that showed significant associations at an FDR of 5%. Additionally, we performed pathway analysis with GOseq software in order to adjust for gene-length bias. The “regulation of G-protein coupled receptor protein signaling pathway” was significant in both analyses (P = 1.69 × 10^-6, Table 2).

Table 2 Result of the pathway analysis

Assessment of the blood-brain correlation in the top-ranked probes

For the top 100 ranked probes, blood-brain correlations of methylation were examined using the database. Of the top DMPs, 77/100 (77%) showed at least weak (r ≥ 0.2) association with at least one brain region, and six DMPs had a strong correlation (r ≥ 0.7) (Supplementary Table 3). The DMP with the highest correlation was the second top-ranked probe, cg13768055, with r > 0.8 between blood and three brain regions, PFC, EC, and STG (PFC, r = 0.84; EC, r = 0.82; STG, r = 0.84) (Supplementary Fig. 3).

Discussion

To the best of our knowledge, this is the first reported EWAS for depressive symptoms in healthy individuals. The major advantage of this study was the investigation of DNA methylation profiles associated with depressive symptoms in subjects who were all free from antidepressant medication. According to a previous study using animal models, selective serotonin reuptake inhibitors induced demethylation in the promoter region of the S100 Calcium Binding Protein A10 (S100A10) gene, for which hypermethylation was reported to be associated with the pathophysiology of depression [39]. Another study using animal models revealed that the tricyclic antidepressant imipramine repressed Serotonin 1A Receptor (5-HT1A) expression through the demethylation of its promoter region, which is recognized and repressed by Sp4 [40]. These results suggest that antidepressant medication can change the DNA methylation profiles of CpG sites associated with depression. The participants of this study were all healthy human individuals; therefore, several candidate CpG sites that were related to depressive symptoms might reflect the direct effects of depression, free from any potential distortion associated with treatment medications.

The top-ranked probe associated with depressive symptoms, cg17277199, was located upstream of the transcription start site of the C2orf84 gene, although the association did not achieve genome-wide significance. C2orf84, also known as Family With Sequence Similarity 228 member A (FAM228A), encodes a protein of unknown function. The second top-ranked probe, cg13768055, was located in the 8th exon of the Rabphilin 3A-Like (RPH3AL) gene. This CpG site was reported to be a CTCF binding site, and hypomethylation of this site may be related to increased CTCF binding, which might in turn regulate RPH3AL expression. Although the detailed function of RPH3AL in humans is still unclear, the protein was found to regulate calcium-ion-dependent exocytosis in endocrine and exocrine cells [41,42,43].

Among the top 100 ranked probes, 71 probes were annotated to a gene region, and about half of them (35/71) were located upstream (within 1500 bp of the transcription start site, 5′ UTR, and 1st exon) of the annotated genes. All the probes in the upstream regions, other than two, were found to be hypermethylated in the subjects with depressive symptoms. Although recent studies have highlighted complex context-dependent interactions between DNA methylation and gene expression [44], a number of previous studies reported that DNA hypermethylation of gene promoter regions is related to a lower level of gene expression [45]. Therefore, DNA hypermethylation in upstream regulatory regions in subjects with depressive symptoms might be associated with lower expression of particular genes. Further, many of the top-ranked probes (36/100) were identified in putative enhancer regions, raising the possibility of additional effects on gene expression.

In this study, we examined the DNA methylation profiles associated with depressive symptoms using DNA extracted from peripheral blood. Although the methylation profiles were found to be different between blood and brain, several CpG sites were correlated between the two tissues. We examined whether the methylation of the top-ranked DMPs correlated between blood and four brain regions using a published database [38]. From this analysis, 77/100 (77%) of the top DMPs showed at least weak (r ≥ 0.2) associations with at least one brain region, and six DMPs showed strong correlations (r ≥ 0.7) (Supplementary Table 3). The DMP with the highest correlation was the second top-ranked probe, cg13768055, with r > 0.8 between blood and three brain regions, PFC, EC, and STG. Although detailed studies using brain samples are essential to find additional and/or brain-specific DMPs associated with depressive symptoms, the DMPs with high blood-brain correlation in this study may be more likely to be related to depressive symptoms through methylation alterations in the brain.

Finally, we performed pathway analyses with the genes annotated to the top-ranked DMPs to investigate the functional relationships between these genes. We found that the pathway, regulation of G-protein coupled receptor protein signaling, was significantly associated with depressive symptoms, reflecting the top-ranked probes annotated to RGS14, RGS18, GNG4, CHGA, RPH3AL, and NPR2. Regulator of G-protein Signaling 14 (RGS14) and Regulator of G-protein Signaling 18 (RGS18) are both members of the regulator of G-protein signaling family and bind GTP-bound G-protein alpha subtypes through their RGS domains, increasing the GTPase activity and attenuating GPCR signaling [46, 47]. Selective expression of RGS family members was previously reported [48]. RGS14 was found to be enriched in hippocampal CA2 neurons [47], whereas RGS18 is predominantly expressed in platelets and granulocytes [46]. G-protein subunit gamma 4 (GNG4) is a component of heterotrimeric G-proteins. G-protein alpha, beta, and gamma subunits have preferential interactions, and subunit combinations contribute to the specificity of G-protein-mediated signaling pathways [49]. Interestingly, RGS14, RGS18, and GNG4 are all upstream regulators of the G-protein coupled receptor protein signaling pathway. Accumulating evidence suggests that G-protein coupled receptor related pathways are associated with mood disorders including depression [50, 51]. Monoamine receptors, including serotonin, norepinephrine, and dopamine receptors, are important targets of antidepressant medication and are all G-protein coupled receptors, with the exception of 5-HT3. Antidepressant drugs are believed to work by activating serotonergic, noradrenergic, and dopaminergic receptors and increasing monoamine transmission in the brain [51].
Although the precise mechanisms linking G-protein coupled receptors to the antidepressant drug response remain unclear, G-protein coupled receptors are predicted to play a role in the induction of hippocampal neurogenesis [51]. These data are consistent with an association between the regulation of G-protein coupled receptor protein signaling pathway and depressive symptoms found in this study.

In the current study, none of the DMPs ranked in the top 1000 overlapped with depression-associated DMPs reported in previous studies using blood samples and the same array technology [52,53,54,55]. However, methylation sites located in protocadherin (PCDH) family genes were reported as highly variable in relation to depression in a previous study [16], and in the current study we also found several CpG sites with Δβ > 5% and P < 0.05 in this region (Supplementary Table 4). Therefore, the PCDH family gene region might be an interesting candidate for future studies.

There are several limitations to this work. First of all, the sample size was small, potentially compromising the power of the statistical analyses. In order to identify additional depression-associated methylation sites, future studies with larger numbers of samples are essential. Second, the current study would benefit from replication using other methods such as pyrosequencing to examine the levels of DNA methylation. Finally, we used DNA extracted from peripheral blood in this study. Although many of the top-ranked CpG sites were consistent between blood and brain, it is necessary to investigate DNA methylation profiles using brain samples to find additional brain-specific DMPs.

In conclusion, we performed an EWAS for depressive symptoms and found several candidate CpG sites associated with these symptoms, especially sites annotated to genes linked to the G-protein coupled receptor protein signaling pathway. These findings provide a strong impetus for further validation studies using larger cohorts to confirm and potentially expand on these results.

Data availability

The data sets of the current study are freely available from the National Bioscience Database Center (NBDC) website (http://biosciencedbc.jp/en/).