Introduction

Major depressive disorder (MDD) is a complex mental disorder with the highest prevalence among the psychiatric disorders [3]; its lifetime prevalence is about 15% [1, 2]. In addition to its high prevalence, MDD is associated with substantial morbidity and mortality [4,5,6], making it the second leading cause of disability worldwide [7]. Although MDD imposes a great economic burden on society [7, 8], its pathogenesis remains largely unknown. The heritability of MDD is estimated at around 30–40% [9, 10], indicating that genetic factors play a pivotal role in MDD. Though great effort has been made to investigate the genetic underpinnings of MDD, only a limited number of risk variants and genes have been identified by genetic linkage and association studies [11,12,13,14,15]. The advent of genome-wide association studies (GWAS) provides an opportunity to explore the genetic basis of MDD. In 2015, the CONVERGE consortium identified two genome-wide significant risk loci for MDD using recurrent MDD cases [16]. In 2016, Hyde et al. [17] identified 15 genetic loci associated with risk of MDD using a large cohort of MDD samples. Recently, Wray et al. conducted the largest GWAS meta-analysis of MDD to date and identified 44 risk loci [18].

To identify further novel risk variants for MDD, we performed a meta-analysis of 336,753 subjects by combining three independent GWAS of MDD: 23andMe, Inc. (a personal genetics company) [17], the Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium (PGC) [20], and the CONVERGE consortium [16]. Previously unreported genetic variants at ten independent loci were associated with MDD at the genome-wide significance level (P < 5 × 10^−8). SNPs showing suggestive association (P < 1 × 10^−7) in the meta-analysis were further tested for replication in an independent sample, the Generation Scotland: Scottish Family Health Study (GS:SFHS), comprising 2659 MDD cases and 17,237 controls. We also performed expression quantitative trait locus (eQTL) analysis to explore the potential influence of the identified risk variants on gene expression. Our study identified three novel genetic loci (6q16.2, 12q24.31, and 16p13.3) associated with risk of MDD.

Materials and methods

GWAS datasets

We used three independent GWAS of MDD in this study. The first GWAS dataset was obtained from a recent large-scale study by Hyde et al. [17], which identified 15 genome-wide significant loci. MDD cases and controls were ascertained from 23andMe; subjects who reported a history of clinical diagnosis of, or treatment for, depression were included as MDD cases. Participants provided informed consent and participated in the research online, under a protocol approved by the external AAHRPP-accredited IRB, Ethical & Independent Review Services (E&I Review). SNPs were primarily genotyped with the Illumina HumanHap550+ BeadChip and the Illumina OmniExpress+ BeadChip; custom arrays were also used. Logistic regression under an additive allelic model was used to test the association of SNPs with MDD. Genome-wide association results from 75,607 MDD cases and 231,747 controls were used in this study. More detailed information about sample collection, SNP genotyping, quality control, and statistical analysis can be found in the original paper [17].

The second GWAS dataset is from the Major Depressive Disorder Working Group of the PGC [20]. This dataset contains genome-wide association results from 9240 MDD cases and 9519 controls. Cases were diagnosed with DSM-IV lifetime MDD using direct structured diagnostic interviews (by trained interviewers) or clinician-administered DSM-IV checklists, and most cases were from clinical sources [15]. Most controls were randomly selected from the general population and screened for lifetime history of MDD. All subjects were genotyped with Illumina or Affymetrix SNP arrays. Logistic regression under an additive model was used to test the association between SNPs and MDD. More detailed information about sample collection, diagnosis, genotyping, statistical analyses, and quality control can be found in the original paper [20].

The third GWAS dataset is from the CONVERGE consortium [16]. To reduce the phenotypic heterogeneity of MDD, the CONVERGE consortium used only female MDD cases recruited from China. Briefly, 5303 female recurrent MDD cases and 5337 controls were included, and low-coverage whole-genome sequencing was used to genotype all subjects. The Composite International Diagnostic Interview (CIDI), which uses DSM-IV criteria, was used for MDD diagnosis. A linear mixed model was used to perform the genetic association analysis. More detailed information about sample recruitment, ascertainment, sequencing, genotype calling, quality control, and statistical analysis can be found in the original publication [16].

Meta-analysis

Genome-wide association results from 23andMe [17], PGC [20], and CONVERGE [16] (totaling 90,150 MDD cases and 246,603 controls) were meta-analyzed with PLINK (v1.9) [21]. In the 23andMe study [17], ancestry was inferred and only subjects with >97% European ancestry were included; genotypes were imputed with minimac2 [22] using reference haplotypes from the 1000 Genomes Project [23] (September 2013 release). MDD cases and controls in the PGC study [20] were of European ancestry, and genotypes were imputed with Beagle (v3.0.4) [24] using the phased CEU + TSI haplotypes from HapMap3 as the reference. Subjects in the CONVERGE study [16] were Han Chinese and were genotyped by whole-genome sequencing. The numbers of SNPs used as input for the meta-analysis were 15,607,353 (23andMe), 5,992,772 (CONVERGE), and 1,235,109 (PGC). We first harmonized alleles so that each SNP had the same effect allele in every GWAS. The meta-analysis was then conducted on the harmonized summary statistics (odds ratio, P-value, and standard error of the log odds ratio) from each GWAS. SNPs present in at least two GWAS were included in the final meta-analysis. As in most GWAS meta-analyses [20, 25], we used the fixed-effect model, which assumes that the effect of each SNP is the same across studies. Compared with the random-effects model, the fixed-effect model is more powerful for detecting association [25, 26]. I² was used to quantify heterogeneity in the meta-analysis [27]. We restricted our analysis to autosomal SNPs, and we validated our meta-analysis results with METAL [28], which implements an inverse-variance-weighted fixed-effect model.
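The allele harmonization and fixed-effect pooling described above can be sketched as follows. This is a minimal illustration, not the PLINK or METAL implementation: it assumes each study supplies an odds ratio and the standard error of log(OR) for the same biallelic SNP, it simply drops SNPs whose alleles cannot be matched (real pipelines also resolve strand flips), and the `SummaryStat` structure and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummaryStat:
    """Per-study summary statistics for one SNP (hypothetical structure)."""
    effect_allele: str
    other_allele: str
    odds_ratio: float
    se_log_or: float  # standard error of log(OR)


def harmonize(stat: SummaryStat, ref_effect: str, ref_other: str) -> Optional[tuple]:
    """Express log(OR) relative to a common reference effect allele."""
    beta = math.log(stat.odds_ratio)
    if (stat.effect_allele, stat.other_allele) == (ref_effect, ref_other):
        return beta, stat.se_log_or
    if (stat.effect_allele, stat.other_allele) == (ref_other, ref_effect):
        # Swapping effect and other allele inverts the OR, i.e. flips the sign of log(OR).
        return -beta, stat.se_log_or
    # Alleles cannot be matched (e.g. strand issues); dropped in this sketch.
    return None


def fixed_effect_meta(stats, ref_effect, ref_other, min_studies=2):
    """Inverse-variance-weighted fixed-effect meta-analysis of one SNP.

    Returns (pooled log OR, SE, z, two-sided P), or None if fewer than
    `min_studies` studies remain after harmonization.
    """
    harmonized = []
    for s in stats:
        h = harmonize(s, ref_effect, ref_other)
        if h is not None:
            harmonized.append(h)
    if len(harmonized) < min_studies:
        return None
    weights = [1.0 / se ** 2 for _, se in harmonized]
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, harmonized)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return beta, se, z, p
```

A SNP that survives harmonization in only one study returns None here, mirroring the requirement that SNPs be present in at least two GWAS.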

Replication in GS:SFHS

By combining samples from 23andMe, CONVERGE, and PGC, we identified 213 previously unreported SNPs that reached the genome-wide significance level (P < 5 × 10^−8). We also identified 171 SNPs that showed suggestive association (P < 1 × 10^−7) in the same meta-analysis. To test whether these 171 SNPs were associated with MDD in an independent sample, we sought to replicate them in GS:SFHS, a family- and population-based Scottish cohort [29]. Because 43 of these SNPs were not available in GS:SFHS, 128 SNPs were ultimately tested. GS:SFHS includes 2659 MDD cases and 17,237 controls, all recruited from the United Kingdom; MDD was diagnosed by structured clinical interview using DSM-IV criteria. Genotyping was performed with the Illumina HumanOmniExpressExome-8 v1.0 array. More detailed information about sample collection, genotyping, quality control, and statistical analysis can be found in the original paper [29].
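The two-tier selection above can be written as a small filter. The sketch assumes that the suggestive set excludes SNPs already at genome-wide significance (5 × 10^−8 ≤ P < 1 × 10^−7); the function names and SNP identifiers are hypothetical.

```python
GENOME_WIDE = 5e-8  # genome-wide significance threshold
SUGGESTIVE = 1e-7   # suggestive threshold used to pick SNPs for replication


def classify(p):
    """Assign a meta-analysis P-value to one of three tiers."""
    if p < GENOME_WIDE:
        return "genome-wide"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"


def replication_candidates(meta_p, replication_snps):
    """Split suggestive SNPs into those available in the replication cohort and those missing."""
    suggestive = {snp for snp, p in meta_p.items() if classify(p) == "suggestive"}
    available = set(replication_snps)
    return sorted(suggestive & available), sorted(suggestive - available)


# Hypothetical example: only snp_b and snp_c are suggestive, and only snp_b can be tested.
meta_p = {"snp_a": 3e-8, "snp_b": 6e-8, "snp_c": 9e-8, "snp_d": 2e-6}
testable, missing = replication_candidates(meta_p, ["snp_b", "snp_d"])
```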

Linkage disequilibrium analysis

Linkage disequilibrium (LD) values (r²) among the studied SNPs were calculated using genotype data of 99 European subjects (Utah residents with Northern and Western European ancestry, CEU) from the 1000 Genomes Project [23] (http://www.internationalgenome.org/). Because the major MDD GWAS (23andMe [17], PGC [20], and GS:SFHS [29]) were conducted in populations of European ancestry, we calculated LD among the studied SNPs in Europeans only. Haploview [30] was used to plot the LD pattern among the studied SNPs, and LD blocks were defined with the confidence interval method described by Gabriel et al. [31].
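For two biallelic SNPs, r² is the squared disequilibrium coefficient D = p_AB − p_A·p_B divided by the product of the allele-frequency variances. The sketch below assumes phased haplotypes are available (as in 1000 Genomes releases) with alleles coded 0/1; tools such as Haploview can instead estimate haplotype frequencies from unphased genotypes, which this sketch does not do. The function name is hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) pairs coded 0/1,
    one pair per chromosome (two per individual).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom
```

Perfectly co-inherited alleles give r² = 1 and independently segregating alleles give r² = 0; thresholds such as r² > 0.7 or r² > 0.3 are then applied to this quantity.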

Frequency distribution of the risk variants in world populations

The frequency of the identified risk variants in different geographic populations was plotted using data from the 1000 Genomes project [32] and the Human Genome Diversity Project (HGDP) [33].

Prioritization of the potential functional variants

To pinpoint potential functional SNPs at each identified risk locus, we conducted functional prioritization using LINSIGHT [34]. LINSIGHT predicts the functional consequences of genetic variants using functional and population genomic data, including evolutionary conservation (e.g., phyloP scores and phastCons elements), binding sites (e.g., transcription factor binding sites, miRNA binding sites, and splice sites), and regional annotation data (e.g., transcription factor ChIP-seq peaks, DNase I hypersensitive sites, and histone modifications). LINSIGHT combines these features using a linear model and assigns a score to each variant. LINSIGHT scores range from 0 to 1, with a larger score indicating a higher probability that the SNP is functional.
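As an illustration of how a linear combination of annotation features can yield a score between 0 and 1, the sketch below passes a weighted sum through a logistic function and then picks the top-scoring SNP at a locus, flagging scores above 0.9. This is an assumption-laden sketch, not LINSIGHT itself: LINSIGHT learns its weights jointly with an evolutionary model from population-genomic data, whereas the feature names, weights, and function names here are hypothetical.

```python
import math


def logistic(eta):
    """Numerically stable logistic function mapping any real number into the unit interval."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def linsight_like_score(features, weights, bias=0.0):
    """Score a variant from annotation features with a logistic-linear model.

    `features` and `weights` are dicts keyed by (hypothetical) feature names;
    features without a weight contribute nothing.
    """
    eta = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return logistic(eta)


def top_scoring_snp(scores, threshold=0.9):
    """Return (snp, score, above_threshold) for the highest-scoring SNP at one locus."""
    snp, score = max(scores.items(), key=lambda kv: kv[1])
    return snp, score, score > threshold


# Hypothetical weights and features, for illustration only.
weights = {"phylop": 0.8, "dnase_peak": 1.5}
score = linsight_like_score({"phylop": 2.0, "dnase_peak": 1.0}, weights, bias=-1.0)
```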

Functional fine-mapping using Probabilistic Annotation Integrator (PAINTOR)

In addition to LINSIGHT, we used PAINTOR [35] to prioritize the possible causal variant(s) at each risk locus. PAINTOR prioritizes plausible causal variants by integrating genetic association signals from GWAS with functional annotation data (such as DNase hypersensitivity sites, enhancers, and promoters), and calculates for each input variant the probability that it is causal. At each identified risk locus, the SNP with the smallest P-value was defined as the index SNP, and SNPs in LD with the index SNP (r² > 0.7) were extracted using SNiPA (http://snipa.helmholtz-muenchen.de/snipa/index.php?task=about_snipa) [36], with LD (r²) calculated in the European (CEU) population of the 1000 Genomes Project [23]. The index SNP and the SNPs in LD with it (r² > 0.7) were used as input for functional fine-mapping. A higher PAINTOR score indicates a higher probability that the SNP is causal.
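To make the idea of a per-variant causal probability concrete, the sketch below uses a deliberately simpler stand-in for PAINTOR: it assumes a single causal variant per locus, scores each candidate with Wakefield's approximate Bayes factor, and weights it by an annotation-based prior. PAINTOR itself allows multiple causal variants and learns annotation enrichment from the data, so this is not its model. The prior standard deviation of 0.2 on the log-OR scale is an assumed, commonly used value, and the function names and prior weights are hypothetical.

```python
import math


def log_wakefield_abf(z, se, prior_sd=0.2):
    """Log approximate Bayes factor (association vs. null), after Wakefield (2009).

    `se` is the standard error of log(OR); `prior_sd` is the assumed prior
    standard deviation of the true log(OR).
    """
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def single_causal_posteriors(snps, r2_with_index, r2_min=0.7):
    """Posterior probability that each candidate SNP is the single causal variant.

    snps: dict snp -> (z, se, prior_weight); prior_weight stands in for an
          annotation-based prior and is hypothetical here.
    r2_with_index: dict snp -> r^2 with the index SNP (the index SNP maps to 1.0,
          so the candidate set is never empty).
    """
    candidates = [s for s in snps if r2_with_index.get(s, 0.0) > r2_min]
    log_post = {}
    for s in candidates:
        z, se, prior = snps[s]
        log_post[s] = math.log(prior) + log_wakefield_abf(z, se)
    # Normalize with log-sum-exp for numerical stability.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {s: math.exp(v - top) / total for s, v in log_post.items()}


# Hypothetical locus: s4 is dropped because its r^2 with the index SNP is below 0.7.
snps = {"idx": (6.0, 0.02, 1.0), "s2": (6.0, 0.02, 1.0),
        "s3": (3.0, 0.02, 1.0), "s4": (7.0, 0.02, 1.0)}
r2 = {"idx": 1.0, "s2": 0.9, "s3": 0.8, "s4": 0.5}
post = single_causal_posteriors(snps, r2)
```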

Pathway analysis

To explore whether specific gene ontology (GO) categories or pathways were enriched among the identified MDD risk genes, we carried out pathway analysis. For each locus, the most significant SNP was defined as the index SNP, and SNPs in LD with it (r² > 0.3) were extracted. LD between the index SNP and nearby SNPs was calculated with PLINK (v1.09) [21] using genotype data of European populations (CEU, Phase I data) from the 1000 Genomes Project [23]. Genes covered by these extracted SNPs were then used for pathway analysis with DAVID [37].
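The gene-list construction above (index SNP, LD extension at r² > 0.3, then genes overlapping the extracted SNPs) can be sketched as follows. The sketch assumes r² values with the index SNP are already computed and that a gene "covers" a SNP when the SNP position falls within the gene's start and end coordinates; all function names, SNP IDs, and gene coordinates are hypothetical.

```python
def index_snp(pvalues):
    """The index SNP is the SNP with the smallest P-value at a locus."""
    return min(pvalues, key=pvalues.get)


def linked_snps(index, r2_with_index, r2_min=0.3):
    """Index SNP plus SNPs in LD with it above the r^2 threshold."""
    return {index} | {s for s, r2 in r2_with_index.items() if r2 > r2_min}


def genes_covered(snps, positions, genes):
    """Names of genes whose [start, end] interval contains at least one of the SNPs.

    positions: dict snp -> (chrom, bp); genes: list of (name, chrom, start, end).
    """
    hits = set()
    for s in snps:
        chrom, bp = positions[s]
        for name, g_chrom, start, end in genes:
            if chrom == g_chrom and start <= bp <= end:
                hits.add(name)
    return hits


# Hypothetical locus with made-up coordinates.
pvals = {"a": 1e-9, "b": 3e-8, "c": 1e-3}
idx = index_snp(pvals)
linked = linked_snps(idx, {"b": 0.6, "c": 0.1})
positions = {"a": ("6", 100), "b": ("6", 250), "c": ("6", 900)}
genes = [("GENE_X", "6", 50, 150), ("GENE_Y", "6", 200, 300), ("GENE_Z", "6", 800, 1000)]
gene_list = genes_covered(linked, positions, genes)
```

The resulting gene list is what would be submitted to an enrichment tool such as DAVID.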

Expression quantitative trait locus (eQTL) analysis

To explore whether the identified SNPs are associated with the expression of nearby genes, we performed eQTL analysis using the LIBD eQTL browser (http://eqtl.brainseq.org/phase1/eqtl/) [38, 39]. The LIBD eQTL browser includes dorsolateral prefrontal cortex (DLPFC) tissue from 412 subjects (175 patients with schizophrenia and 237 controls). Gene expression was measured by RNA sequencing, and an additive genetic model was used to test the association of genotyped SNPs with gene expression. We queried the most significant SNP at each locus (i.e., the SNPs in Table 1 and Table 2) and extracted the genes whose expression was associated with the query SNP. P-values were taken directly from the LIBD eQTL browser and were not corrected for multiple testing; only associations with P < 1.0 × 10^−4 and a false discovery rate (FDR) < 0.01 were retained. More detailed information about the LIBD eQTL database can be found at http://eqtl.brainseq.org/phase1/eqtl/ [38, 39].
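The additive eQTL model regresses expression on allele dosage (0, 1, or 2 copies), and the retention rule combines a P-value cut-off with an FDR cut-off. The sketch below illustrates both under stated assumptions: the regression omits the covariates a real eQTL pipeline would include, the Benjamini–Hochberg function only illustrates the procedure (the LIBD browser reports its own FDR computed over its full set of tests), and all function names and example values are hypothetical.

```python
import math


def additive_eqtl(dosages, expression):
    """OLS regression of expression on allele dosage (0, 1, 2): returns (slope, t).

    A bare additive model without covariates (e.g. ancestry components, batch).
    """
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def retain(associations, p_max=1e-4, fdr_max=0.01):
    """Keep associations passing both the P-value and the FDR cut-off."""
    return [a for a in associations if a["p"] < p_max and a["fdr"] < fdr_max]
```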

Table 1 Association significance for representative SNPs reaching the genome-wide significance level (P < 5 × 10^−8) in the meta-analysis
Table 2 Replication of SNPs with P < 10^−7 in GS:SFHS identifies one novel risk locus (16p13.3) for MDD

Expression analysis of risk genes in MDD cases and controls

To explore whether genes near the identified risk SNPs were dysregulated in MDD, we compared their expression in MDD cases and controls using expression data (GSE102556) from a recent study by Labonte et al. [40]. Briefly, six brain regions (dorsolateral prefrontal cortex, ventromedial prefrontal cortex, orbitofrontal cortex, ventral subiculum, nucleus accumbens, and anterior insula) were collected from 26 MDD cases (13 males and 13 females) and 22 controls (13 males and 9 females), and genome-wide gene expression was measured by RNA sequencing. In addition to human subjects, Labonte et al. [40] established a mouse model of chronic variable stress (CVS) and measured gene expression in the brains of stressed mice (n = 10) and control mice (n = 10). As chronic stress is a well-characterized risk factor for depression, several rodent models (including chronic social defeat stress and CVS) have been introduced to uncover the role and mechanism of chronic stress in depression [41, 42]. Among these models, CVS has been shown to be a reliable paradigm: animals exposed to CVS exhibit symptoms paralleling human depression, including anxiety, depression-like behavior, and neurobiological alterations [43]. Labonte et al. [40] exposed mice to CVS for 21 days and showed that the stressed mice exhibited depression- and anxiety-like behaviors. Two brain regions implicated in stress responses in rodent models [44], the ventromedial prefrontal cortex (vmPFC) and the nucleus accumbens (NAc), were examined in the stressed mice.

To assess whether the expression of the identified risk genes differed significantly between MDD cases and controls, we extracted the P-values (uncorrected for multiple testing) of the MDD risk genes directly from the study of Labonte et al. [40]. Labonte et al. [40] analyzed males and females separately, identifying differentially expressed genes in female MDD cases (compared with healthy female controls) and in male MDD cases (compared with healthy male controls). More detailed information about the human and mouse subjects, RNA extraction, gene expression measurement, and statistical analysis can be found in the original study of Labonte et al. [40].

Results

Meta-analysis identified two novel genetic loci associated with MDD

Genome-wide meta-analysis of 90,150 MDD cases and 246,603 controls (from 23andMe, PGC, and CONVERGE) identified 213 previously unreported SNPs associated with MDD at the genome-wide significance level (P < 5 × 10^−8) (Fig. 1 and Supplementary Table S1). The quantile–quantile plot of the meta-analysis is shown in Supplementary Figure S1. Of note, none of these SNPs reached genome-wide significance (P < 5 × 10^−8) in any of the three individual GWAS (Supplementary Table S1). These genome-wide significant SNPs are located in 10 independent genomic regions: 1p31.1, 2p16.1, 3q25.32, 5q14.3, 5q34, 6q16.2, 12q24.31, 13q14.3, 13q21.32, and 15q14 (Fig. 2 and Supplementary Figures S2 and S3). Genetic variants near eight of these loci (1p31.1, 2p16.1, 3q25.32, 5q14.3, 5q34, 13q14.3, 13q21.32, and 15q14) have previously been reported to be associated with MDD [17, 45]. However, no previous study has shown that genetic variants on 6q16.2 and 12q24.31 are associated with MDD. Thus, our results indicate that 6q16.2 and 12q24.31 are novel risk loci for MDD. The most significant SNP at each of the ten risk loci is listed in Table 1.

Fig. 1

Meta-analysis results of three MDD GWAS. Previously unreported genetic variants at ten independent loci (1p31.1, 2p16.1, 3q25.32, 5q14.3, 5q34, 6q16.2, 12q24.31, 13q14.3, 13q21.32, and 15q14) showed significant association with MDD (P < 5 × 10^−8) in a total of 90,150 MDD cases and 246,603 controls. Two of these loci (6q16.2 and 12q24.31) are novel risk loci for MDD.

Fig. 2

Regional association plots for the three novel genome-wide significant loci. a The significant SNP (rs10457592) on 6q16.2 is located upstream of the FBXL4 gene. b The newly identified risk variant (rs2004910) on 12q24.31 is located upstream of the SPPL3 gene. c The newly identified risk variant (rs3785234) on 16p13.3 is located in intron 7 of the RBFOX1 gene.

Replication of SNPs with P < 1 × 10^−7 in GS:SFHS identified one additional novel risk locus for MDD

In addition to the 213 previously unreported genome-wide significant SNPs (Supplementary Table S1), 171 SNPs showed suggestive association (P < 1 × 10^−7) with MDD in the meta-analysis of 23andMe, CONVERGE, and PGC. Of these 171 SNPs, 128 were available in GS:SFHS. A meta-analysis of these 128 SNPs in the combined samples (23andMe, CONVERGE, PGC, and GS:SFHS; 356,649 subjects in total, comprising 92,809 MDD cases and 263,840 controls) identified 28 additional genome-wide significant SNPs (P_meta < 5 × 10^−8) (Supplementary Table S2). These SNPs were distributed across six genomic regions: 1p31.1, 2p16.1, 13q21.32, 15q14, 16p13.3, and 22q13.2 (Supplementary Table S2). Genetic variants near 1p31.1, 2p16.1, 13q21.32, 15q14, and 22q13.2 have previously been reported to be associated with MDD [17]; however, no previous study has reported an association between genetic variants on 16p13.3 and MDD. Thus, our study indicates that 16p13.3 is a novel risk locus for MDD. The genome-wide significant SNP on 16p13.3 is located in intron 7 of the RBFOX1 gene (Fig. 2c). The most significant SNP at each risk locus in the replication stage (23andMe, PGC, CONVERGE, and GS:SFHS) is listed in Table 2. Taken together, our study identified three novel MDD risk loci (6q16.2, 12q24.31, and 16p13.3).
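The combined sample sizes quoted in this section follow from the per-cohort counts given in the Methods; the arithmetic can be checked with a few lines of Python. The dictionary below only restates those counts, and the helper name is hypothetical.

```python
# (MDD cases, controls) per cohort, as stated in the Methods.
COHORTS = {
    "23andMe": (75_607, 231_747),
    "PGC": (9_240, 9_519),
    "CONVERGE": (5_303, 5_337),
    "GS:SFHS": (2_659, 17_237),
}


def totals(names):
    """Return (cases, controls, total subjects) summed over the named cohorts."""
    cases = sum(COHORTS[n][0] for n in names)
    controls = sum(COHORTS[n][1] for n in names)
    return cases, controls, cases + controls


discovery = totals(["23andMe", "PGC", "CONVERGE"])
combined = totals(["23andMe", "PGC", "CONVERGE", "GS:SFHS"])
```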

Most of the identified risk SNPs did not show significant heterogeneity across studies

Because GWAS datasets from different populations (European and Chinese) were meta-analyzed with a fixed-effect model, we also assessed heterogeneity. Among the 16 genome-wide significant SNPs in Tables 1 and 2, nine showed no heterogeneity (I² = 0) and five showed low heterogeneity (I² < 0.25) across studies, whereas two (rs10457592 and rs2717046) showed moderate to high heterogeneity (0.5 < I² < 0.75). These results suggest that most of the identified SNPs may represent common risk variants for MDD across populations. However, independent replication is needed to validate our findings.
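I² is derived from Cochran's Q for the inverse-variance-weighted estimate, I² = max(0, (Q − df)/Q), where df is the number of studies minus one. A minimal sketch follows; the band labels for I² = 0, I² < 0.25, and 0.5 < I² < 0.75 follow the wording of this section, while the remaining labels are assumptions based on the commonly used 0.25/0.50/0.75 benchmarks. Function names are hypothetical.

```python
def cochran_q_i2(betas, ses):
    """Cochran's Q and I^2 = max(0, (Q - df) / Q) for inverse-variance weights."""
    w = [1.0 / se ** 2 for se in ses]
    pooled = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def heterogeneity_band(i2):
    """Descriptive band for I^2; the 0.25-0.5 and >= 0.75 labels are assumptions."""
    if i2 == 0:
        return "none"
    if i2 < 0.25:
        return "low"
    if i2 < 0.5:
        return "low to moderate"  # assumed label
    if i2 < 0.75:
        return "moderate to high"
    return "high"  # assumed label
```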

Prioritization of potential functional SNPs at each identified risk locus and pathway analysis

Our meta-analysis identified multiple independent risk loci for MDD (Tables 1 and 2). To identify the possible functional (or causal) SNPs at each locus, we performed functional prediction using LINSIGHT [34], extracting LINSIGHT scores for SNPs in LD with the index SNP (r² > 0.3). At 8 of the 10 risk loci, at least one SNP had a LINSIGHT score greater than 0.9, suggesting that these SNPs may have functional consequences. The SNP with the largest LINSIGHT score at each risk locus is listed in Supplementary Table S3. We also performed functional fine-mapping using PAINTOR; the SNP with the highest PAINTOR score at each risk locus is listed in Supplementary Table S4. Of note, four SNPs had a PAINTOR score of 1, implying that these SNPs may be functional; however, further experimental validation is needed. Finally, pathway analysis found no GO categories or pathways significantly enriched among the identified risk genes.

Some of the identified risk SNPs showed significant association with gene expression in human brain (DLPFC)

To explore whether the identified risk variants are associated with gene expression in the DLPFC, we performed eQTL analysis. Because the risk SNPs within each locus are in LD with one another (except at 1p31.1), we selected only the most significant SNP at each locus (i.e., the SNPs in Tables 1 and 2) for eQTL analysis. In the DLPFC, rs12127789 was associated with NEGR1 expression (P = 7.63 × 10^−5); rs1193510 with the expression of GFM1 (P = 5.49 × 10^−6), RSRC1 (P = 5.63 × 10^−5), and RARRES1 (P = 7.62 × 10^−5); rs1501672 with LINC00461 expression; rs2004910 with SPPL3 expression (P = 9.16 × 10^−13); rs9623320 with the expression of L3MBTL2 (P = 1.64 × 10^−7), XPNPEP3 (P = 2.84 × 10^−7), and POLR3H (P = 3.41 × 10^−5); and rs7140116 with PCDH8P1 expression (P = 9.64 × 10^−5) (Supplementary Table S5). The index SNPs at five loci (rs4543289, rs10457592, rs9540720, rs8037781, and rs11682175) were not associated with gene expression in the LIBD eQTL database. These eQTL results suggest that some of the identified risk variants may modulate the expression of nearby genes in the DLPFC.

Upregulation of FBXL4 and RSRC1 in brains of MDD cases compared with controls

Expression quantitative trait locus analysis showed that some of the identified risk variants were associated with gene expression in human brain (Supplementary Table S5), suggesting that the risk variants may confer risk of MDD by regulating gene expression. We therefore examined the expression of genes near the identified risk loci in MDD cases and controls using expression data (GSE102556) from Labonte et al. [40]; only the gene nearest to each identified risk SNP was examined. NEGR1 was significantly downregulated in female MDD cases compared with controls (P = 0.038, uncorrected), whereas FBXL4 (P = 0.0072, uncorrected) and RSRC1 (P = 0.042, uncorrected) were significantly upregulated in female MDD cases compared with controls. Consistent with the observations in female MDD cases, Fbxl4 and Rsrc1 were also significantly upregulated in brains of stressed female mice (P = 0.019 and P = 8.50 × 10^−4, respectively, uncorrected). The significant upregulation of FBXL4 and RSRC1 in both female MDD cases and stressed female mice suggests that dysregulation of these two genes may play a role in MDD.
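The cross-species comparison above amounts to asking which genes are nominally significant (uncorrected P < 0.05) with the same direction of change in both female MDD cases and stressed female mice. A minimal sketch using only the directions and uncorrected P-values reported in this section; matching human and mouse genes by case-insensitive symbol is a simplifying assumption (proper ortholog mapping would be needed in general), and the function and variable names are hypothetical.

```python
# gene -> (direction, uncorrected P); +1 = upregulated, -1 = downregulated.
HUMAN_FEMALE = {"NEGR1": (-1, 0.038), "FBXL4": (+1, 0.0072), "RSRC1": (+1, 0.042)}
MOUSE_FEMALE = {"Fbxl4": (+1, 0.019), "Rsrc1": (+1, 8.50e-4)}


def concordant_genes(human, mouse, alpha=0.05):
    """Genes nominally significant in both species with the same direction of change."""
    mouse_by_symbol = {g.upper(): v for g, v in mouse.items()}  # naive symbol matching
    hits = []
    for gene, (h_dir, h_p) in human.items():
        m = mouse_by_symbol.get(gene.upper())
        if m is None:
            continue  # no mouse result reported for this gene
        m_dir, m_p = m
        if h_p < alpha and m_p < alpha and h_dir == m_dir:
            hits.append(gene)
    return sorted(hits)


print(concordant_genes(HUMAN_FEMALE, MOUSE_FEMALE))  # → ['FBXL4', 'RSRC1']
```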

Discussion

Accumulating evidence suggests that genetic factors play pivotal roles in MDD; however, the genetic basis of MDD remains largely unknown. Identifying MDD-associated genetic variants remains a major challenge because MDD is a moderately heritable, clinically heterogeneous condition with a complex genetic architecture [46]. Although previous GWAS have identified several genome-wide significant risk variants [16, 29, 47], most MDD risk loci remain to be uncovered. To identify new MDD-associated variants that could not be detected in individual GWAS because of limited power, we sought to increase power by enlarging the sample size and using a relatively powerful statistical method. First, because the effect sizes of most risk variants are small, combining samples from different studies may help identify new risk variants, as statistical power increases with sample size. Second, as in most previous GWAS [20, 25], we used the fixed-effect model, which assumes that the effects of genetic variants are the same across studies and is therefore well suited to identifying novel risk variants by combining studies. Compared with the random-effects model, the fixed-effect model provides narrower confidence intervals and is more powerful for detecting association [25, 26].

We identified three novel MDD-associated loci (6q16.2, 12q24.31, and 16p13.3). The newly identified SNP on 6q16.2 (rs10457592) is located upstream of the FBXL4 gene (Fig. 2a), which encodes a member of the F-box protein family. FBXL4 localizes to mitochondria and may play a pivotal role in the maintenance of mitochondrial DNA (mtDNA) [48]. Previous studies have shown that mutations in FBXL4 cause mitochondrial encephalopathy [48, 49], indicating the important role of FBXL4 in maintaining mitochondrial function. In addition to the genetic evidence, expression analysis also suggests that FBXL4 may be involved in MDD: compared with controls, FBXL4 was significantly upregulated in both female MDD cases and female stressed mice, implying dysregulation of FBXL4 in MDD.

In addition to 6q16.2, our study suggests that 12q24.31 and 16p13.3 are novel risk loci for MDD. Notably, genetic variants near 12q24.31 and 16p13.3 showed significant associations with MDD in the discovery stage of Hyde et al.'s study [17]. However, these SNPs were not followed up because they were absent from the PGC dataset (Hyde et al. combined results from PGC and 23andMe, and only SNPs present in both datasets were carried forward for downstream analysis). Accordingly, these two loci were not among the 15 loci reported by Hyde et al. [17].

We also examined, in our meta-analysis, the two genome-wide significant SNPs reported by CONVERGE: rs12415800 (upstream of SIRT1) and rs35936514 (in an intron of LHPP). Neither SNP was available in the PGC dataset. rs12415800 was also significantly associated with MDD in 23andMe (P = 0.041), with the same risk allele (A) in CONVERGE and 23andMe (Supplementary Table S6). In fact, rs12415800 reached genome-wide significance (P = 1.19 × 10^−8) when samples from 23andMe and CONVERGE were combined. Heterogeneity analysis showed low heterogeneity between 23andMe and CONVERGE for rs12415800 (I² = 0.11), suggesting that this SNP may represent a common risk variant in Chinese and European populations. By contrast, rs35936514 was not associated with MDD in the 23andMe dataset, and when samples from 23andMe and CONVERGE were combined it showed only marginal association with MDD (P = 0.0196). Heterogeneity analysis showed significant heterogeneity for rs35936514 between 23andMe and CONVERGE (I² = 0.96) (Supplementary Table S6), implying that this SNP may represent an Asian-specific susceptibility variant for MDD. Consistent with this, the frequencies of the risk alleles of rs12415800 and rs35936514 differ among world populations (Supplementary Figure S4), further suggesting that population-specific risk variants may exist. However, more work is needed to verify this.

We also explored the potential functional consequences of the identified risk SNPs. Of note, the novel risk SNP on 12q24.31 (rs2004910) was associated with SPPL3 expression in human brain (Supplementary Table S5). SPPL3 encodes signal peptide peptidase-like 3, an intramembrane protease that cleaves several types of membrane signal peptides [50, 51]. Previous studies have shown important functions of SPPL3 in eukaryotes [52]. Voss et al. [52] showed that SPPL3 regulates cellular N-glycosylation and that downregulation of SPPL3 leads to a hyperglycosylation phenotype. In addition to regulating glycosylation, SPPL3 has recently been implicated in the immune response, including NFAT activation [53] and the regulation of NK cell maturation and cytotoxicity [54]. Surprisingly, NFAT activation does not depend on the proteolytic activity of SPPL3 [53]. A recent study also showed that a genetic variant near SPPL3 is associated with levels of inflammatory markers [55], consistent with the reported role of SPPL3 in immunity and inflammation. Of note, immune dysfunction is thought to be an important contributor to MDD [56, 57]. Our study suggests that SPPL3 may represent a novel risk gene for MDD.

Another interesting gene is RSRC1 (also named SRrp53). Most of the newly identified risk SNPs on 3q25.32 are located in introns of RSRC1, and our eQTL analysis indicated that the most significant SNP (rs1193510) was associated with RSRC1 expression in the human DLPFC (Supplementary Table S5). We further showed that RSRC1 was significantly upregulated in brains of female MDD cases, and that Rsrc1 was also significantly upregulated in brains of stressed female mice. These results suggest that RSRC1 may have a role in MDD and that genetic variants on 3q25.32 may confer risk of MDD by affecting RSRC1 expression. RSRC1 encodes a member of the serine- and arginine-rich-related protein family, which plays a pivotal role in mRNA splicing [58]. In addition to MDD, RSRC1 has been reported to be associated with schizophrenia [59] and height [60]. The frequency distributions of the risk alleles of the index SNPs near FBXL4 and RSRC1 in global populations are shown in Fig. 3.

Fig. 3

The frequency distribution of the risk alleles of the index SNPs near FBXL4 and RSRC1 in global populations. a Frequency distribution of the risk allele (A allele) of rs10457592 in global populations. b Frequency distribution of the risk allele (G allele) of rs1193510 in global populations

Taken together, our study identified three novel risk loci for MDD and our results suggest that these risk SNPs may contribute to MDD risk through modulating gene expression. Further verification of our findings in independent samples and functional characterization of the identified risk genes may provide potential targets for therapeutics and diagnostics.