Introduction

Major depressive disorder (MDD) is a severe and complex mental disorder with a lifetime prevalence of ~15% (Hasin et al, 2005), and the World Health Organization has estimated that it will become the second leading cause of disability in the near future (Kessler and Bromet, 2013). The overall heritability of MDD is 31–42% (Sullivan et al, 2000), with certain subsets being more heritable (eg, recurrent early-onset MDD) (Levinson, 2006). Although this heritability suggests an opportunity to understand the disease through genetic analyses, its modest magnitude has greatly complicated the search for risk or protective genetic loci. This is reflected by the fact that previous candidate gene studies (Bosker et al, 2011) and early genome-wide association studies (GWASs) (Kohli et al, 2011; Lewis et al, 2010; Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al, 2013; Muglia et al, 2010; Rietschel et al, 2010; Shi et al, 2011; Wray et al, 2012) were unable to identify large numbers of robust MDD genetic variants. Recently, however, two landmark MDD GWASs, 23andMe (European descent) (Hyde et al, 2016) and CONVERGE (Han Chinese population) (Converge consortium, 2015), were published and have made promising contributions to the understanding of MDD. The 23andMe GWAS used the largest sample size so far, and the CONVERGE GWAS focused on female subjects with recurrent MDD to reduce phenotypic heterogeneity. Together, these two GWASs discovered 17 genetic loci reaching genome-wide significance (15 by 23andMe and 2 by the CONVERGE consortium; detailed information about the genome-wide significant risk variants is listed in Supplementary Table S1).

Despite the recent achievements, GWAS approaches in MDD so far still seem to be less successful than in other complex diseases such as schizophrenia (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). This is likely because of the aforementioned higher prevalence but lower heritability of MDD compared with other adult psychiatric disorders (such as schizophrenia), and more cases are thus required to detect the same number of single-nucleotide polymorphism (SNP) associations (Levinson et al, 2014). Therefore, future meta-analyses of diverse genome-wide data with increasing sample sizes are in critical need. Indeed, meta-analysis has become essential in genetic analyses of complex human diseases, and many recent studies in which meta-analyses combine dozens of GWAS data sets have shed light on the genetic architectures of complex diseases such as schizophrenia (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014), Crohn’s disease (Franke et al, 2010), and type 2 diabetes mellitus (Voight et al, 2010). In addition to genome-wide meta-analysis of SNPs, convergent functional genomics approaches have also aided in identification of potential susceptibility genes and molecular mechanisms for psychiatric disorders including schizophrenia (Ayalew et al, 2012; Luo et al, 2014) and bipolar disorder (Le-Niculescu et al, 2007; Ogden et al, 2004). Integration of genome-wide meta-analysis of SNPs and convergent functional genomics data is believed to be an efficient strategy in dissecting the genetic and biological basis of complex illnesses.

To date, GWASs of MDD by 23andMe (Hyde et al, 2016), CONVERGE (Converge consortium, 2015), and PGC (Psychiatric Genomics Consortium) (Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al, 2013) consortia have (fully or partially) released their genome-wide results. In addition, many genome-wide gene expression analyses in brain tissues have also shared their data (Colantuoni et al, 2011; GTEx Consortium, 2013; Ramasamy et al, 2014), and analyses on subjects with MDD are included (Kim and Webster, 2009, 2010). We believe that systematic utilization of these public resources allows identification of novel MDD risk genes, and will provide valuable information that is beneficial for other psychiatric studies. In addition, MDD patients often exhibit certain related phenotypes and impaired cognitive abilities compared with healthy individuals (Souery et al, 2007; Taylor Tavares et al, 2007), and accumulating data indicate that genetic loci associated with MDD are also related to these phenotypes in humans (Demirkan et al, 2011; Schuhmacher et al, 2013; Vrijsen et al, 2015). So far, researchers have released the data and resources of several GWASs on MDD-related phenotypes such as neuroticism and depressive symptoms (Okbay et al, 2016a), preschool internalizing problems (Benke et al, 2014), anxiety symptoms (Otowa et al, 2016), as well as cognitive functions (Benyamin et al, 2014; Okbay et al, 2016b), providing valuable tools for further analyses on these phenotypes to strengthen the association between the risk SNPs and MDD and also to reveal potential biological mechanisms. In the current study, we conducted integrative analyses using these data to examine the genetic risk of MDD from convergent perspectives (Le-Niculescu et al, 2010; Niculescu, 2005; Ogden et al, 2004).
We discovered a novel SNP rs9540720 in the PCDH9 gene conferring genome-wide significant risk of MDD, and this SNP was also associated with multiple MDD-related phenotypes and cognitive function alterations. Intriguingly, the risk allele of rs9540720 was associated with reduced expression of PCDH9, consistent with the significant downregulation of this gene in the brain and peripheral blood tissue of MDD patients compared with healthy controls. Collectively, our study supports the potential roles of PCDH9 in MDD susceptibility, and illustrates an example of comprehensive utilization of public resources to uncover the genetic risk factors of MDD.

Materials and methods

All the protocols and methods used in this study were approved by the institutional review board of the Kunming Institute of Zoology, Chinese Academy of Sciences.

MDD Discovery GWAS Data Set

23andMe GWAS sample

The 23andMe GWAS sample included 75 607 cases with self-declared depression (SDD) and 231 747 controls (Hyde et al, 2016). A previous study showed that SDD and clinically assessed MDD were highly correlated (r=1.00, SE=0.20) in their common-variant-associated genetic effects (Zeng et al, 2016). We therefore considered SDD as an alternative phenotype for identifying common risk variants associated with MDD. The participating cohort was collected from the customer base of the consumer genetics company 23andMe (Eriksson et al, 2010; Hyde et al, 2016; Tung et al, 2011). Participants provided informed consent, and the protocol was approved by an external AAHRPP-accredited institutional review board, Ethical and Independent Review Services. Detailed information of the samples, genotyping methods, and statistical analyses can be found in the original GWAS report (Hyde et al, 2016).

CONVERGE GWAS sample

The CONVERGE GWAS consisted of 5303 cases with MDD and 5337 controls (Converge consortium, 2015). Cases were diagnosed using the Composite International Diagnostic Interview (CIDI) (WHO lifetime version 2.1; Chinese version) following DSM-IV criteria. Preexisting records of bipolar disorder, psychosis, or mental retardation, as well as a history of drug or alcohol abuse before the first depressive episode, were applied as exclusion criteria. Controls were recruited from patients undergoing minor surgical procedures at the general hospitals or from local community centers. Detailed information of the samples, genotyping methods, and statistical analyses can be found in the original report (Converge consortium, 2015).

PGC GWAS sample

The PGC GWAS included 9240 patients and 9519 controls (Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al, 2013). Cases had diagnoses of DSM-IV lifetime MDD using structured diagnostic instruments through direct interviews by trained interviewers or clinician-administered DSM-IV checklists. Most samples ascertained cases from clinical sources, and most controls were randomly selected from the population and screened for their lifetime history of MDD. Detailed descriptions of the samples, data quality, genomic controls, and statistical analyses can be found in the original publication (Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al, 2013).

MDD Replication Data Set

23andMe Replication sample

The 23andMe Replication sample included 45 773 SDD cases and 106 354 controls independent from the discovery cohort (Hyde et al, 2016). The recruitment protocols and criteria for the 23andMe Replication sample were the same as for the 23andMe GWAS sample, although the subjects were entirely independent, as previously described (Hyde et al, 2016). The 23andMe Replication sample was also used as the validation sample in the original 23andMe MDD GWAS study (Hyde et al, 2016).

Chinese MDD sample

This sample contained 732 MDD cases and 2318 controls as previously described (Zhang et al, 2014, 2016). All patients were diagnosed with MDD strictly according to the DSM-IV criteria. Standard diagnostic assessments were supplemented with clinical information obtained by a review of medical records and interviews with family informants. Patients were excluded either when they had a lifetime diagnosis of bipolar disorder, schizoaffective disorder, schizophrenia, or another psychotic disorder, or when they were female and were pregnant, planning to become pregnant, or breast-feeding during the study period. Control subjects were recruited from local volunteers without any history of mental disorders. All participants provided written informed consent.

SNP Selection and Statistical Analysis

SNPs highlighted in the 23andMe GWAS sample were first subjected to replication analyses in the CONVERGE and PGC GWAS samples, and a meta-analysis was then conducted combining samples from all three data sets. Significant associations identified through this meta-analysis were then tested for replication in the 23andMe Replication sample and the Chinese MDD sample. In each sample, logistic regression was applied to test the association between phenotypes and SNP dosages under an additive model, and covariates included sample grouping and principal components reflecting ancestry. For the meta-analysis, we used the odds ratio (OR) and standard error (SE) from each sample to estimate heterogeneity between individual samples and to calculate the pooled OR and 95% confidence interval (CI). The calculation was conducted using the classical inverse-variance-weighted method with PLINK v1.07 (Purcell et al, 2007), consistent with our previous study (Xiao et al, 2017). In the current study, we conducted two-tailed tests for the discovery analysis and meta-analysis, and applied one-tailed tests for replication analyses as described before (Psychiatric GWAS Consortium Bipolar Disorder Working Group, 2011). A two-tailed p-value<5.00 × 10−8 was considered genome-wide statistically significant in the combined samples; in the replication samples, a one-tailed p-value<0.05 was considered nominally significant.
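
The pooled estimate described above can be illustrated with a minimal Python sketch of a fixed-effect inverse-variance-weighted meta-analysis on the log-OR scale. It is not the PLINK implementation: the function names (ivw_meta, two_tailed_p, one_tailed_p) and the input values are hypothetical and only show the arithmetic, and Cochran's Q is assumed here as one common heterogeneity statistic.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a standard-normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def one_tailed_p(z):
    """One-tailed p-value, taking positive z as the expected direction."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def ivw_meta(odds_ratios, standard_errors):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    odds_ratios: per-sample ORs for the same effect allele.
    standard_errors: per-sample SEs of log(OR).
    """
    betas = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    # Cochran's Q: weighted squared deviations from the pooled estimate
    # (compared against a chi-square with k - 1 degrees of freedom).
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return {
        "or": math.exp(beta),
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "z": z,
        "p_two_tailed": two_tailed_p(z),
        "q": q,
    }


# Illustrative inputs only; these are NOT the study's summary statistics.
result = ivw_meta([1.03, 1.08, 1.04], [0.006, 0.028, 0.021])
```

Because each sample is weighted by the inverse of its variance, the pooled OR in such a calculation is dominated by the largest sample, which is why the direction-of-effect check across samples matters in addition to the pooled p-value.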

Analyses on MDD-Related Phenotypes

Neuroticism

Neuroticism is commonly defined as the proneness to negative emotions (including irritability, anger, sadness, anxiety, worry, hostility, self-consciousness, and vulnerability) usually in response to stress-inducing events (Kotov et al, 2010; Lahey, 2009). Neuroticism is a pervasive risk factor for different psychiatric conditions including MDD and entailed emotional dysregulation (Fanous and Kendler, 2004; Kotov et al, 2010). The neuroticism data (n=170 911) were obtained from a previous GWAS by Okbay et al (2016a). Detailed information of the samples, genotyping methods, and statistical analyses can be found in the original report (Okbay et al, 2016a).

Depressive symptoms (DS)

The DS data (n=180 866) were retrieved from a previous GWAS by Okbay et al (2016a). The authors analyzed the summary statistics from a GWAS of MDD (Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al, 2013) performed by the PGC (9240 cases and 9519 controls) in combination with data from two additional cohorts: the UKB (105 739 subjects) and the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort (7231 cases and 49 137 controls). Details on the samples, genotyping methods, and statistical analyses can be found in the original report (Okbay et al, 2016a).

Preschool internalizing problems (INTs)

INTs are heritable traits with moderate genetic stability from childhood into adulthood, and are found to be highly prevalent in the offspring of MDD patients (Olino et al, 2008). The INT data (n=4596) were taken from a previous GWAS by Benke et al (2014). They investigated the effects of SNPs on INT in a total of 4596 children (3 cohorts) with the Child Behavior Checklist (CBCL). Of the 36 items of the INT scale in the most recent version of the CBCL 1½–5, 34 were measured in all 3 cohorts. Example items include ‘Acts too young for age,’ ‘Worries a lot,’ and ‘Clings to adults or too dependent.’ For each item, the rater selected a score of 0 (not true), 1 (somewhat or sometimes true), or 2 (very true or often true), resulting in a potential score range of 0 to 68 for each child. Detailed information of the samples, genotyping methods, and statistical analyses can be found in the original study (Benke et al, 2014).

Anxiety phenotype

Anxiety disorders (ADs), namely generalized AD (GAD), panic disorder (PD), and phobias, are relatively common and often disabling conditions with a lifetime prevalence of over 20% (Kessler et al, 2005). The anxiety data (n=15 299) were from a previous GWAS by Otowa et al (2016). The anxiety phenotype was diagnosed based on DSM with some exceptions, and assessed using quantitative phenotypic factor scores (FSs) derived from a multivariate analysis combining information across the clinical phenotypes. Detailed information of the samples, genotyping methods, and statistical analyses can be found in the original paper (Otowa et al, 2016).

Analyses on Cognitive Function

Educational attainment

We used educational attainment as a ‘proxy phenotype’ for cognitive function. Although educational attainment is not a direct cognitive measure, it is correlated with cognitive ability (r~0.5) as well as personality traits related to persistence and self-discipline (Rietveld et al, 2013). Educational attainment is strongly associated with social outcomes, and there is a well-documented health-education gradient. It was estimated that ~40% of the variance in educational attainment is explained by genetic factors (Rietveld et al, 2013). For analysis of this phenotype, we used the data from a recent GWAS performed in 293 723 European individuals in whom educational attainment was quantified with the well-characterized measurement ‘EduYears’ (an individual’s years of schooling) (Okbay et al, 2016b). Briefly, educational attainment was measured at an age at which participants were very likely to have completed their education (>95% of the samples were at least 30 years old). On average, participants had 13.3 years of schooling, and 23.1% had a college degree. Details on the samples, genotyping methods, and statistical analyses can be found in the original report (Okbay et al, 2016b).

Childhood intelligence

Intelligence, a quantifiable index of cognition, has been widely used in relevant genetic analyses given its great heritability and genetic stability, both in an individual’s life course and across generations (Deary et al, 2009). Moreover, childhood intelligence is a strong predictor of many important life outcomes including educational attainment (Deary, 2012), and is also associated with various psychiatric disorders such as schizophrenia and MDD (Batty et al, 2005; Koenen et al, 2009). As a result, we focused on the childhood intelligence phenotype measured with psychometric cognitive tests (Intelligence Quotient (IQ)-type tests). We utilized a recent GWAS of childhood intelligence including 12 441 children of European ancestry (Benyamin et al, 2014). In brief, the age of the participants ranged between 6 and 18 years. The best available measure of general cognitive ability (g) or intelligence quotient (IQ) derived from diverse tests assessing both verbal and nonverbal ability was used. Detailed information of the cohorts, intelligence measurements, genotyping methods, and statistical analyses can be found in the original study (Benyamin et al, 2014).

Healthy Subjects for Expression Quantitative Trait Loci (eQTL) Analysis

To identify the impact of MDD risk SNPs on mRNA expression, we utilized the well-characterized gene expression database BrainCloud (http://braincloud.jhmi.edu/) (Colantuoni et al, 2011). BrainCloud presents data regarding gene expression regulation in the human brain that guide functional studies of disease-associated variants. The BrainCloud sample comprises 261 postmortem dorsolateral prefrontal cortex (DLPFC) tissues of nonpsychiatric individuals, including 113 Caucasian subjects and 148 African-American individuals at various ages across the lifespan. As PCDH9 is differentially expressed across different age groups, we retrieved the genotype and expression data of 152 adult individuals (age >18 years; 64 Caucasians and 88 African Americans) from BrainCloud. The statistical analysis was conducted using linear regression, with RNA integrity number (RIN), sex, race, brain pH, postmortem interval (PMI), and age as covariates.
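
As an illustration of this kind of covariate-adjusted eQTL test, the following Python sketch regresses expression on risk-allele dosage plus covariates by ordinary least squares. It is a simplified stand-in rather than the analysis code used here: the helper ols, the toy data, and the reduced covariate set (age and sex only) are assumptions made for brevity, and the p-value from the t distribution is omitted because the Python standard library does not provide it.

```python
import math


def ols(X, y):
    """Ordinary least squares via the normal equations.

    X: list of rows, each already including a leading 1.0 for the intercept.
    Returns (coefficients, standard_errors).
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y | I]; Gauss-Jordan elimination turns it
    # into [I | beta | (X'X)^-1].
    aug = [xtx[i] + [xty[i]] + [1.0 if i == j else 0.0 for j in range(k)]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] for i in range(k)]
    inv_diag = [aug[i][k + 1 + i] for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [math.sqrt(sigma2 * d) for d in inv_diag]
    return beta, se


# Hypothetical toy data: (risk-allele dosage, age, sex, expression level).
samples = [
    (0, 25, 0, 8.9), (0, 61, 1, 9.1), (1, 33, 1, 8.6), (1, 47, 0, 8.8),
    (1, 52, 1, 8.5), (2, 29, 0, 8.2), (2, 66, 1, 8.4), (0, 40, 0, 9.0),
]
X = [[1.0, float(g), float(age), float(sex)] for g, age, sex, _ in samples]
y = [expr for *_, expr in samples]
beta, se = ols(X, y)
t_dosage = beta[1] / se[1]  # t statistic for the per-allele effect
```

In this additive coding, beta[1] is the estimated change in expression per copy of the risk allele after adjusting for the covariates, so a negative value corresponds to lower expression in risk-allele carriers.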

Diagnostic Analysis of PCDH9 Expression between Patients with MDD and Healthy Controls

We obtained the microarray mRNA expression data of the frontal cortex from 103 adult controls (age >18 years) and 131 adult MDD patients of European ancestry from dbGaP (accession number: phs000979.v1.p1). For the expression data, log2 ratios were normalized across mean log2 fluorescent intensities using loess correction (Colantuoni et al, 2002). After normalization, surrogate variable analysis (SVA) was conducted on the log2 ratios to optimize the signal/noise ratio and minimize potential impact from known and unknown sources of systematic confounders (Leek and Storey, 2007). The above data analyses were conducted using R with code and tools retrieved from the Bioconductor project (http://www.bioconductor.org/).

We also collected hippocampal RNA-sequencing (RNA-seq) data of 15 adult MDD cases and 15 adult controls from the Stanley Medical Research Institute (SMRI) data set (http://sncid.stanleyresearch.org/) in the FASTQ file format. The RNA-seq reads underwent adapter trimming and low-quality filtering using btrim64 (Kong, 2011), and were then aligned to the human reference genome (Human GRCh38 (hg38), http://asia.ensembl.org/index.html) using the spliced-read mapper Tophat2 v2.0.14 (Kim et al, 2013). The map of known transcripts was extracted from Ensembl Build GRCh38. Cufflinks v2.2.1 (Trapnell et al, 2012) was applied to call new transcripts as well as to assemble and quantify both the novel and known transcripts with default parameters. For each subject, accepted hits BAM files from the Tophat2 alignment were merged by Samtools v0.1.18 (Li et al, 2009) for the following Cufflinks quantification: (1) reads that were uniquely mapped to genes were used to calculate the gene expression level; (2) to quantify mRNA expression, FPKM (Fragments per Kilobase per Million mapped reads) was calculated to measure the gene-level expression according to the formula FPKM = (F × 10³/L) × (10⁶/N), where F is the number of fragments mapping to the gene annotation, L is the length of the gene structure in nucleotides, and N is the total number of sequence reads mapped to the genome.
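
The FPKM normalization can be written out as a short Python sketch, reading the formula above as F × 10^3/L × 10^6/N. The function name fpkm and the example numbers are hypothetical; Cufflinks performs this calculation internally with additional modelling that is not reproduced here.

```python
def fpkm(fragments, gene_length_nt, total_mapped_reads):
    """Fragments per kilobase of gene per million mapped reads.

    fragments: fragments mapping to the gene annotation (F)
    gene_length_nt: length of the gene structure in nucleotides (L)
    total_mapped_reads: total reads mapped to the genome (N)
    """
    if gene_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    per_kilobase = fragments * 1e3 / gene_length_nt  # scale to 1 kb of gene
    return per_kilobase * 1e6 / total_mapped_reads   # scale to 1 M reads


# Hypothetical example: 500 fragments on a 2000-nt gene, 25 million mapped reads.
example = fpkm(500, 2000, 25_000_000)
```

Scaling by gene length and by sequencing depth makes expression values comparable between genes of different sizes and between libraries sequenced to different depths.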

Statistical analyses of mRNA expression associated with diagnosis were conducted in R 3.0.1 using an analysis of covariance (ANCOVA) model. Diagnostic status was the independent variable, whereas age, sex, RIN, brain pH, and PMI were set as covariates.

Results

Rs9540720 in PCDH9 Shows Genome-Wide Significant Association with MDD

In this study, the top 10 000 most significant SNPs (two-tailed p≤5.31 × 10–5) in the 23andMe GWAS (Hyde et al, 2016), which contained the largest sample size, were first subjected to replication analyses in the CONVERGE and PGC GWAS data sets (Converge consortium, 2015; Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al, 2013). Briefly, SNPs meeting the following criteria were collected for further analyses: (1) the SNPs showed a one-tailed p-value <0.05 in both CONVERGE and PGC GWAS data sets (Converge consortium, 2015; Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al, 2013), and (2) the SNPs showed the same direction of allelic effects across all three GWAS data sets (Converge consortium, 2015; Hyde et al, 2016; Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al, 2013). This replication investigation returned a total of 33 SNPs (Table 1).
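
The two selection criteria above can be sketched as a simple filter in Python. The record type SnpResult, its field names, the function passes_replication, and the SNP labels are all hypothetical; the sketch assumes effect sizes are expressed as log(OR) for the same reference allele in all three data sets.

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    snp: str
    beta_discovery: float  # log(OR) in the discovery GWAS
    beta_rep1: float       # log(OR) in the first replication GWAS
    p1_rep1: float         # one-tailed p-value in the first replication GWAS
    beta_rep2: float       # log(OR) in the second replication GWAS
    p1_rep2: float         # one-tailed p-value in the second replication GWAS


def passes_replication(r, alpha=0.05):
    """Criterion 1: one-tailed p < alpha in both replication data sets.
    Criterion 2: same direction of allelic effect in all three data sets."""
    betas = (r.beta_discovery, r.beta_rep1, r.beta_rep2)
    same_direction = all(b > 0 for b in betas) or all(b < 0 for b in betas)
    return same_direction and r.p1_rep1 < alpha and r.p1_rep2 < alpha


# Hypothetical records, for illustration only.
candidates = [
    SnpResult("snpA", 0.030, 0.070, 0.003, 0.040, 0.028),   # passes both
    SnpResult("snpB", 0.030, -0.050, 0.010, 0.040, 0.020),  # opposite direction
    SnpResult("snpC", -0.025, -0.030, 0.200, -0.020, 0.010),  # p >= alpha once
]
selected = [r.snp for r in candidates if passes_replication(r)]
```

Requiring a consistent direction in addition to nominal significance guards against SNPs that reach p<0.05 in a replication sample but for the opposite allele.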

Table 1 Association of 33 SNPs with Major Depressive Disorder

These 33 SNPs then underwent the meta-analysis using the three GWAS data sets. An intronic SNP rs9540720 in PCDH9 at 13q21.32, which showed nominal associations with MDD in 23andMe GWAS (two-tailed p=1.07 × 10–6, OR=1.030), CONVERGE GWAS (one-tailed p=3.00 × 10–3, OR=1.077), and PGC GWAS (one-tailed p=2.75 × 10–2, OR=1.041), was genome-wide significantly associated with the illness in the meta-analysis of all three data sets (a total of 89 610 cases and 246 603 controls, two-tailed p=1.69 × 10–8, OR=1.033, Table 1). Five SNPs (rs2875472, rs1831972, rs9592461, rs1444387, and rs9540728) in high linkage disequilibrium (LD) with rs9540720 in both European and Chinese populations also showed genome-wide significance in the meta-analysis (Table 1). A schematic presentation of the PCDH9 gene and the locations of rs9540720 and other tested SNPs in this gene are shown in Supplementary Figure S1; the LD relationship and haplotype structure of the PCDH9 risk SNPs in European and Chinese populations are also displayed in Supplementary Figure S1. Intriguingly, rs9540720 was not related to any of the 15 genome-wide significant SNPs previously reported in the 23andMe GWAS study (Hyde et al, 2016).

We further investigated the associations of rs9540720 with MDD in the 23andMe Replication sample and the Chinese MDD sample. Intriguingly, rs9540720 was also nominally associated with MDD in both the 23andMe Replication sample (one-tailed p=4.85 × 10–2, OR=1.014) (Hyde et al, 2016) and the Han Chinese MDD sample (one-tailed p=2.74 × 10–2, OR=1.125). In addition, meta-analysis combining these two replication samples consistently yielded a nominal association between rs9540720 and MDD (one-tailed p=2.88 × 10–2, OR=1.016). When all the discovery and replication samples were merged (a total of 136 115 cases and 355 275 controls), the association was further strengthened (two-tailed p=1.20 × 10–8, OR=1.027). Although the effect size for rs9540720 (ie, OR=1.027) was relatively small, it was still similar to the other previously reported GWAS significant SNPs in 23andMe GWAS (OR ranges from 1.028 to 1.051, Supplementary Table S1) (Hyde et al, 2016). This observation is likely attributed to the fact that MDD is polygenic with numerous alleles each accounting for a limited share of genetic risk for the illness (Peterson et al, 2017).

Taken together, rs9540720 shows significant associations with MDD in both Europeans and Han Chinese, in line with its stable allelic frequency across those two populations (Supplementary Figure S2a), suggesting that it is a possible common risk SNP for MDD among various ethnic groups.

Rs9540720 Is Associated with MDD-Related Phenotypes and Cognitive Functions

Given the strong implication of phenotypes such as neuroticism and depressive symptoms (Okbay et al, 2016a), preschool internalizing problems (Benke et al, 2014), as well as anxiety (Otowa et al, 2016) in MDD patients, we hypothesized that these phenotypes were also associated with rs9540720. We utilized the public GWAS resources on these MDD-related phenotypes, and found that this SNP was nominally associated with neuroticism (n=170 911, two-tailed p=7.84 × 10–3), preschool internalizing problems (n=4596, two-tailed p=3.94 × 10–2), and anxiety scores (n=15 299, two-tailed p=2.18 × 10–2), and was also marginally associated with depressive symptoms (n=180 866, two-tailed p=6.12 × 10–2). Intriguingly, the rs9540720 risk [G] allele carriers tended to show more vulnerable personality traits (or worse symptoms) compared with protective allele [A] carriers in this series of analyses. It is hypothesized that MDD shares substantial genetic risk components with schizophrenia and bipolar disorder (Ding et al, 2015), and patients with these psychiatric disorders sometimes exhibit overlapping symptoms. We therefore assessed the associations of rs9540720 with schizophrenia and bipolar disorder in the large-scale GWAS samples to understand its role in these two illnesses. However, this SNP was not associated with schizophrenia (34 241 cases and 45 604 controls, two-tailed p=0.459) (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014) or bipolar disorder (10 410 cases and 10 700 controls, two-tailed p=0.983) (Ruderfer et al, 2014), suggesting that it is likely an MDD-specific risk variant.

Besides MDD-related phenotypes, it is hypothesized that MDD risk-associated SNPs may also affect cognitive abilities. To test the influence of rs9540720 on cognitive functions, we utilized public GWAS data sets of educational attainment (Okbay et al, 2016b) and childhood intelligence (Benyamin et al, 2014). In this exploratory analysis, rs9540720 was significantly associated with educational attainment (two-tailed p=1.09 × 10–2) and childhood intelligence (two-tailed p=1.20 × 10–4). Notably, the MDD risk allele was associated with lower education levels and lower scores in standard IQ tests.

Collectively, these data consistently demonstrated negative impact of rs9540720 risk allele on a variety of MDD-related phenotypes and cognitive abilities, further supporting our hypothesis that rs9540720 was a susceptibility SNP for MDD.

Risk Genotypes in rs9540720 and MDD Diagnosis Predict PCDH9 mRNA Expression

The associations of rs9540720 with MDD and related phenotypes in multiple independent samples lend statistical and biological support for the involvement of this genomic locus in the risk of the illness. However, the exact causal variant and underlying molecular mechanisms remained to be determined. This task is often difficult in genetic association studies because an associated SNP most likely points to a larger region containing numerous correlated variants with a high degree of LD (Li et al, 2016). For this reason, we explored the LD between rs9540720 and surrounding SNPs to investigate whether there were SNPs linked with rs9540720. A proxy search for SNPs in LD with rs9540720 was performed on the SNAP website with the European panel from the 1000-Human-Genomes (pilot 1) data set (http://archive.broadinstitute.org/mpg/snap/ldplot.php), and returned multiple SNPs in relatively high LD (r²>0.80) with rs9540720 (Supplementary Figure S2b and Supplementary Table S2). Notably, all the rs9540720-linked SNPs were located in the PCDH9 intron region, and we therefore searched for potential functional SNPs through bioinformatic predictive analyses that synthesize annotation information of noncoding elements and genomic properties such as GC content and evolutionary conservation, using the GWAVA data set (http://www.sanger.ac.uk/sanger/StatGen_Gwava) (Ritchie et al, 2014). This functional prediction showed that those SNPs in high LD with rs9540720 were unlikely to be located in DNA segments bound by transcription factors or carrying histone marks (eg, H3K4me1, H3K4me3, H3K9ac, and H3K27ac) (Supplementary Table S2) (Ritchie et al, 2014), and thus we were unable to identify the causative variant without performing further functional assays. However, there was also the possibility that rs9540720 was an eQTL of a specific gene.
According to the data presented in Supplementary Figure S2b, there was only PCDH9 gene within 400 kb around rs9540720 and its LD SNPs, and hence we examined the association between rs9540720 and PCDH9 expression using data from BrainCloud (Colantuoni et al, 2011). Interestingly, the risk SNP rs9540720 was associated with PCDH9 mRNA expression in human frontal cortex in 152 healthy adult individuals (two-tailed p=0.014, Figure 1a), with the risk allele predicting lower PCDH9 mRNA levels. In addition, most other MDD risk SNPs in this genomic region (with genotypes available in BrainCloud) were also associated with lower PCDH9 mRNA expression in the frontal cortex of these subjects (Supplementary Table S3).

Figure 1

Risk genotype and diagnosis predict PCDH9 expression. (a) Association of rs9540720 with PCDH9 mRNA expression in the 152 adult subjects from frontal cortex in the BrainCloud data set (Colantuoni et al, 2011). Number of individuals: GG, N=24; GA, N=75; AA, N=53. (b) Diagnostic analysis of PCDH9 expression in adult samples from frontal cortex in dbGaP using microarray data. MDD, major depressive disorder. Number of individuals: MDD, N=131; Control, N=103. (c) Diagnostic analysis of PCDH9 expression in the frontal cortex and hippocampus of adult samples from Stanley center using RNA-seq data. N=15 in each group.


To gain further insights into the potential pathophysiological roles of PCDH9, we assessed the effects of diagnostic status on PCDH9 mRNA expression. According to the frontal cortex microarray mRNA expression data from dbGaP (103 adult controls and 131 adult MDD patients of European ancestry), PCDH9 expression was significantly lower in MDD patients than in control subjects (two-tailed p=0.0007, Figure 1b). By analyzing the RNA-seq data of frontal cortex and hippocampus tissues from 15 MDD cases and 15 healthy controls from SMRI, we observed a consistent reduction of PCDH9 expression in MDD patients compared with healthy controls in both the hippocampus (two-tailed p=0.0086, Figure 1c) and the frontal cortex (two-tailed p=0.068, Figure 1c). In addition, in a previous transcriptome study using peripheral blood from The Netherlands Study of Depression and Anxiety cohort (Jansen et al, 2016), PCDH9 mRNA expression was also significantly decreased in 882 patients with current MDD vs 331 healthy controls (two-tailed p=0.00402, Supplementary Table S2 in the original study (Jansen et al, 2016)). Moreover, our observations were also partly in concordance with another earlier study (Klempan et al, 2009) showing reduced PCDH9 expression in the DLPFC and inferior frontal gyrus of suicides with MDD compared with healthy controls (Table 2 in their original study). Interestingly, no differences of PCDH9 expression between nondepressed suicides and healthy controls were reported (Table 2 in their original study) (Klempan et al, 2009). These diagnostic results were consistent with our eQTL analysis results that individuals carrying the MDD risk allele at rs9540720 (and other risk LD SNPs) had lower mRNA levels. Notably, PCDH9 expression was enriched in brain tissues compared with other organs (Supplementary Figure S3, PCDH9 expression in brain tissues is marked in yellow box), highlighting its potential role in brain disorders (eg, MDD).

In summary, we have presented convergent and consistent evidence suggesting that PCDH9 is an MDD susceptibility gene, and the reduced expression of this gene might contribute to the pathogenesis of the illness.

Discussion

MDD is a major complex mental disorder affecting millions of people worldwide. The advent of genetic studies, especially the GWASs, has greatly promoted our understanding of this illness (Ding et al, 2015). Although several promising candidate genes associated with the risk of MDD have been identified, studies clarifying more genetic risk factors and the underlying mechanisms for this illness are urgently needed. We believe that if a SNP exhibits (at least) nominal associations with a particular illness among multiple individual samples, and presents genome-wide significance in the comprehensive meta-analysis combining all the above samples, it is then very likely to be an authentic disease relevant locus. To this end, we explored several independent GWAS data sets and report that a locus rs9540720 predicting the mRNA expression of PCDH9 possessed genome-wide significant association with MDD. In addition to showing that rs9540720 was associated with MDD in 23andMe, CONVERGE, and PGC as well as in the combined meta-analysis, we confirmed that the risk allele of this locus was associated with multiple phenotypes related to MDD pathogenesis and cognitive abilities using a series of convergent analyses. Taken one step further, we examined the roles of rs9540720 and PCDH9 in clinical diagnosis of MDD, and observed a consistent pattern of allelic associations that risk genotype at this SNP was linked to MDD diagnosis and lower PCDH9 mRNA expression. Although the analyses on MDD-related phenotypes and cognition failed to achieve genome-wide significance, it was unlikely that the risk allele predicted abnormality associated with the illness in each of those independent phenotypes across diverse samples purely by chance. 
Moreover, expression of the PCDH9 gene in brain tissues was also in line with the direction of allelic association, indicating that aberrant regulation of PCDH9 transcription was the likely molecular mechanism underlying the genetic risk conferred by rs9540720 or its LD SNPs. Taken together, PCDH9 is likely a bona fide MDD susceptibility locus.
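The two-step criterion described above (at least nominal association in every individual sample, plus genome-wide significance in the combined meta-analysis) can be illustrated with a minimal sketch. It assumes an inverse-variance-weighted fixed-effect model, the conventional thresholds of p<0.05 and p<5 × 10⁻⁸, and an added requirement that the effect direction agrees across samples; these choices, the function names, and the example numbers are illustrative assumptions and are not taken from the analyses reported here.

```python
import math

NOMINAL_P = 0.05        # assumed nominal threshold
GENOME_WIDE_P = 5e-8    # assumed conventional genome-wide threshold


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_effect_meta(studies):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    studies: list of (beta, se) pairs, e.g. per-sample log odds ratios
    and their standard errors for the same effect allele.
    Returns the pooled beta, its standard error, and the two-sided p value.
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, studies)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, two_sided_p(beta / se)


def is_convergent_hit(studies):
    """True if every sample is nominally significant in the same
    direction and the pooled estimate is genome-wide significant."""
    same_direction = (all(b > 0 for b, _ in studies)
                      or all(b < 0 for b, _ in studies))
    all_nominal = all(two_sided_p(b / se) < NOMINAL_P for b, se in studies)
    _, _, pooled_p = fixed_effect_meta(studies)
    return same_direction and all_nominal and pooled_p < GENOME_WIDE_P


if __name__ == "__main__":
    # Hypothetical per-sample estimates (not data from this study).
    example = [(0.05, 0.012), (0.06, 0.020), (0.04, 0.015)]
    print(fixed_effect_meta(example))
    print(is_convergent_hit(example))
```

A fixed-effect model is used here only because it is the simplest pooling scheme; with heterogeneous samples (eg, across ancestries), a random-effects model may be more appropriate.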

In addition, one feature of the current study is the utilization of transethnic data, which are believed to leverage differences in LD structure across ethnic groups and thus improve the resolution of fine-mapping of causal variants (Li et al, 2017; Morris, 2011). In fact, this idea has also been applied in a recent MDD study (using CONVERGE and PGC data sets) (Bigdeli et al, 2017), in which the authors successfully demonstrated a partially shared polygenic basis of the illness between Han Chinese and European populations (the trans-ancestry genetic correlation of lifetime MDD was 0.33, whereas female-only and recurrent MDD yielded estimates of 0.40 and 0.41, respectively); however, the PCDH9 region was not highlighted in their study, probably because of the limited sample size (Bigdeli et al, 2017). Although some population-specific risk loci may fail to be identified, this strategy affords higher detection power for complex trait loci. In our study, rs9540720 showed nominal associations with MDD in both Europeans and Asians, in line with its similar allele frequency across those two populations (Supplementary Figure S2a), suggesting that it is a possible common risk SNP for MDD among various ethnic groups.

The SNP rs9540720 is located on chromosome 13q21.32, a chromosomal region that was not recognized as a major locus in earlier GWASs of MDD (Lewis et al, 2010; Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al, 2013; Muglia et al, 2010; Rietschel et al, 2010; Shi et al, 2011; Wray et al, 2012), probably because of the limited sample sizes. Our meta-analysis combining multiple GWASs significantly boosted the sample size for the genetic analyses and successfully detected associations of PCDH9 SNPs. The PCDH9 gene encodes protocadherin 9, a protein of the protocadherin family and cadherin superfamily (transmembrane proteins containing cadherin domains). Interestingly, we have recently identified another gene encoding a protein of the same family, PCDH17 (located on 13q21.1), as conferring risk of MDD and bipolar disorder (Chang et al, 2017). Both the short distance between these two genes on the chromosome (8.57 Mb, although there is no LD between risk SNPs at PCDH9 and PCDH17) and the structural similarities shared by their protocadherin protein products suggest potentially related biological roles. Although the mechanisms by which PCDH9 contributes to MDD pathogenesis are unclear, PCDH17 has been demonstrated to significantly affect synaptic development (Chang et al, 2017; Hoshina et al, 2013). Intriguingly, a recent quantitative analysis of basic mouse behavior uncovered a role of Pcdh9 in specific cognitive functions required for long-term recognition (Bruining et al, 2015). Therefore, the protein encoded by PCDH9 may also affect signaling at neuronal synaptic junctions (Asahina et al, 2012), and further studies investigating this hypothesis are necessary.

In addition to PCDH9, we have also observed marginally genome-wide significant associations between MDD and SNPs in a chromosome 1 region in this meta-analysis (lowest p value at SNP rs12127723, p=9.63 × 10⁻⁸, Table 1). Further validation of the risk associations in additional samples is therefore necessary. It should be noted that this chromosome 1 region contains a large number of genomic variants that are in high LD with the risk SNPs (eg, rs12127723 and rs12128855) observed here (r²>0.8, Supplementary Figure S4), and these risk SNPs may tag one or more potential functional loci that need to be identified. However, there are no protein-coding genes within this genomic region (Supplementary Figure S4), adding difficulties to further investigations of the molecular mechanisms underlying this potential genetic risk locus.

Although we herein present a comprehensive report of rs9540720 in the risk of MDD, there are certain limitations to be acknowledged. For example, the present analyses are primarily based on publicly available databases, and full-scale analyses thus depend on the accessibility of the complete data sets. In this case, only a selection of 10 000 SNPs is available from the 23andMe data set (Hyde et al, 2016), and the current analysis therefore falls short of the full 23andMe data (as also indicated by 1 vs 15 GWAS hits). Although the purpose of a meta-analysis is normally to maximize the power to detect true effects, the maximal power is not available in our study because of this limitation. Further genome-wide meta-analyses are thus necessary to reveal additional risk loci. Another caveat is that the eQTL analysis combined European and African-American subjects to maximize the statistical power, although the allelic frequencies of rs9540720 did not differ significantly between these two ethnic groups (A allele, 0.506 in Europeans and 0.602 in African Americans), and the eQTL effect sizes were also similar among different populations. Future eQTL analyses utilizing larger samples of Europeans and Han Chinese are needed to confirm the current observations.

In sum, using the existing GWAS data sets on MDD, MDD-related phenotypes, and cognitive function, as well as databases of brain tissue gene expression, we have identified a novel genome-wide significant MDD risk gene, PCDH9, through a convergent meta-analysis. Although potential concerns remain to be addressed, our study highlights the necessity and importance of mining public data sets to explore undiscovered risk genes for MDD and other complex diseases.

Funding and Disclosure

The authors declare no conflict of interest.