Introduction

Despite the Psychiatric Genomics Consortium (PGC)’s collaborative meta-analysis efforts for major depressive disorder (MDD), variants predictive of disease susceptibility remain elusive, partly due to the heterogeneous nature of the disorder. Treating MDD and identifying genetic predictors of treatment response are equally challenging. There are ~30 antidepressants available for MDD patient care, and response to treatment varies in time to onset of benefit, overall efficacy and duration of effect. Genetic variability may contribute to the differences in drug-specific, class-specific or antidepressant-wide treatment non-response/resistance. Several antidepressant efficacy genome-wide association studies (GWASs) have been conducted using samples from the Munich Antidepressant Response Signature project (a naturalistic prospective study, n=339, genotyped using Illumina Sentrix Human-1 (109 000 loci) and HumanHap300 (317 000 loci) BeadChips),1 the Genome-Based Therapeutic Drugs for Depression (GENDEP) project (n=394 on escitalopram and n=312 on nortriptyline, genotyped using the Illumina Human610-quad chip),2 the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study (n=1491 on citalopram, genotyped using the Affymetrix 500K and 5.0 Human SNP Arrays),3 and the Mayo Clinic Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study (PGRN-AMPS) (n=529 on a selective serotonin reuptake inhibitor (SSRI), genotyped using Illumina Human610-Quad BeadChips).4 The largest antidepressant efficacy GWAS meta-analysis, performed using data from three studies among subjects of European ancestry only (STAR*D n=980, GENDEP n=706 and Munich Antidepressant Response Signature, including additionally genotyped samples, n=604, resulting in a total sample size of 2256 across the three studies), did not identify any variants passing the genome-wide significance threshold for the primary outcome measures (percentage improvement on the clinician-rated depression scale and remission in each study after 12 weeks of treatment), although a secondary analysis restricted to 1354 individuals treated with either citalopram or escitalopram revealed a variant, rs12054895 (P=2.65 × 10−8), in an intergenic region of chromosome 5 associated with early improvement after 2 weeks of antidepressant treatment.5 Drug response information obtained from self-reported questionnaires may offer an alternative approach that allows a study with a much larger sample size.

MDD is a heterogeneous disease,6 which partially explains why finding disease risk variants for MDD seems to be more challenging than for schizophrenia and bipolar disorder. Disease subtypes have been defined by clinical features, such as MDD with the melancholic, atypical or anxious distress specifiers.7 Theoretically, different disease subtypes may be defined by different underlying molecular mechanisms, which in turn determine the response to different therapeutic agents. In this study, we explored different disease subtypes as defined by response status and compared their heritability and genetic correlation with reference populations (PGC1 MDD, bipolar disorder and schizophrenia).

Materials and methods

Cohort

Approximately 48 000 research participants, drawn from 23andMe (Mountain View, CA, USA), provided saliva samples for genetic testing, informed consent and answered surveys (‘Antidepressant Efficacy and Side Effects’ and ‘Your Profile and Health History’) online between June 2013 and June 2015 according to a human subjects protocol reviewed and approved by Ethical & Independent Review Services, an AAHRPP-accredited Institutional Review Board (http://www.eandireview.com). Studies on other selected phenotypes from 23andMe were previously reported.8, 9, 10, 11, 12, 13, 14

Sample genotyping and SNP data imputation

DNA extraction and genotyping were performed on saliva samples by the National Genetics Institute (NGI, Los Angeles, CA, USA), as described previously.9 Samples were genotyped on one of four genotyping platforms. The V1 and V2 platforms were variants of the Illumina HumanHap550+ BeadChip (Illumina, San Diego, CA, USA), including about 25 000 custom single-nucleotide polymorphisms (SNPs) selected by 23andMe, with a total of about 560 000 SNPs. The V3 platform was based on the Illumina OmniExpress+ BeadChip, with custom content to improve the overlap with the V2 array, with a total of about 950 000 SNPs. The V4 platform in current use is a fully custom array, including a lower redundancy subset of V2 and V3 SNPs with additional coverage of lower-frequency coding variation, and about 570 000 SNPs. Samples that failed to reach a 98.5% call rate were reanalyzed. Participant genotype data were imputed against the September 2013 release of 1000 Genomes15 Phase 1 reference haplotypes, phased with ShapeIt2. Prior to imputation, we excluded SNPs with Hardy–Weinberg equilibrium P<10−20, call rate <95%, or large allele frequency discrepancies compared to European 1000 Genomes reference data. Additional details on the imputation procedure can be found in Supplementary Text S1.

Phenotype data and phenotypic analysis groups

The ‘Antidepressant Efficacy and Side Effects’ questionnaire was designed by 23andMe in collaboration with Dr Steven Hamilton of Kaiser Permanente San Francisco Medical Center and Carol Cochran Schaffner at the University of California, San Francisco. The questionnaire asked respondents about their use of antidepressants and antipsychotics in the last 5 years and, qualitatively, about the effect of each drug (for example, ‘How well did Wellbutrin/bupropion work for you?’). The list of drugs included SSRIs (citalopram, escitalopram, fluoxetine, paroxetine and sertraline), serotonin-norepinephrine reuptake inhibitors (duloxetine, venlafaxine and desvenlafaxine), the norepinephrine–dopamine reuptake inhibitor (NDRI) bupropion, the serotonin antagonist and reuptake inhibitor trazodone, and atypical antipsychotics (quetiapine, olanzapine and aripiprazole). The antidepressant efficacy question in the survey had five possible answers: a great deal (coded as 4 for later reference), a fair amount (3), somewhat (2), a little (1) and not at all (0). The ‘Your Profile and Health History’ survey asked about a member’s medical history.

Using phenotype data collected from 23andMe surveys (‘Antidepressant Efficacy and Side Effects’ and ‘Your Profile and Health History’) and genotype data from 23andMe’s research participants, we performed genome-wide association analyses on four groups of phenotypes: (a) non-treatment-resistant depression (n=7795) vs treatment-resistant depression (TRD) (n=1311), (b) SSRI responders (n=6348) vs non-responders (n=3340), (c) citalopram/escitalopram responders (n=2963) vs non-responders (n=2005) and (d) bupropion responders (n=2675) vs non-responders (n=1861). All subjects included in the analyses self-reported taking antidepressants for a depression indication and were of European ancestry. TRD subjects were defined as those who reported efficacy 1 to at least two antidepressants and never reported efficacy 3 to any antidepressant, whereas non-TRD subjects were defined as those who reported efficacy 3 to at least one antidepressant and never reported efficacy 1 to any antidepressant. SSRI non-responders were defined as subjects who reported efficacy 1 to at least one SSRI and never reported efficacy 3 to any SSRI antidepressant, whereas SSRI responders were defined as subjects who reported efficacy 3 to at least one SSRI and never reported efficacy 1 to any SSRI antidepressant. Citalopram/escitalopram non-responders were subjects who reported efficacy 1 to either citalopram or escitalopram and never reported efficacy 3 to either citalopram or escitalopram, whereas citalopram/escitalopram responders were subjects who reported efficacy 3 to either citalopram or escitalopram and never reported efficacy 1 to either citalopram or escitalopram. Likewise, bupropion non-responders were subjects who reported efficacy 1 to bupropion and bupropion responders were subjects who reported efficacy 3 to bupropion.
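The group definitions above can be expressed as a small rule set. The following Python sketch is illustrative only and is not the code used in the study: the function names, the dictionary input format and, in particular, the reading of "efficacy 1" as a score of 1 or lower and "efficacy 3" as a score of 3 or higher on the 0 to 4 scale are assumptions.

```python
from typing import Dict, Optional, Set

# Assumed cut-offs on the 0-4 survey scale (an interpretation, not stated
# explicitly in the text): "efficacy 1" = 1 or lower, "efficacy 3" = 3 or higher.
LOW_MAX = 1
HIGH_MIN = 3

SSRIS = {"citalopram", "escitalopram", "fluoxetine", "paroxetine", "sertraline"}


def classify_response(scores: Dict[str, int],
                      drugs: Optional[Set[str]] = None) -> Optional[str]:
    """Return 'responder', 'non-responder' or None (not assigned to a group).

    scores maps a drug name to the self-reported efficacy code (0-4).
    If drugs is given, only those drugs are considered, e.g. SSRIS,
    {"citalopram", "escitalopram"} or {"bupropion"}.
    """
    used = [s for d, s in scores.items() if drugs is None or d in drugs]
    has_low = any(s <= LOW_MAX for s in used)
    has_high = any(s >= HIGH_MIN for s in used)
    if has_high and not has_low:
        return "responder"
    if has_low and not has_high:
        return "non-responder"
    return None  # mixed answers, only intermediate answers, or no such drug


def classify_trd(scores: Dict[str, int]) -> Optional[str]:
    """Return 'TRD', 'non-TRD' or None, using all reported antidepressants."""
    values = list(scores.values())
    n_low = sum(s <= LOW_MAX for s in values)
    has_high = any(s >= HIGH_MIN for s in values)
    if n_low >= 2 and not has_high:
        return "TRD"
    if has_high and n_low == 0:
        return "non-TRD"
    return None
```

Under these assumptions, a participant with mixed answers within a drug class (for example, a high score for sertraline and a low score for citalopram) falls into neither SSRI group, which mirrors the "never reported" clauses in the definitions.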

For each of the four phenotype groups, the resistant/non-responder group and the non-resistant/responder group were also compared with healthy controls (n~190 000) self-reported to be free of any of the following conditions based on the survey data captured from the ‘Your Profile and Health History’ survey: attention-deficit/hyperactivity disorder; anxiety; schizophrenia; depression; bipolar; obsessive-compulsive disorder; autism; post-traumatic stress disorder; and insomnia.

Genome-wide association analysis

We restricted GWAS to a set of unrelated individuals with >97% European ancestry, as determined through an analysis of local ancestry. Standard quality control on directly genotyped markers excluded (1) SNPs that were only genotyped on our ‘V1’ and/or ‘V2’ platforms due to small sample size, and SNPs on chrM or chrY; (2) SNPs that failed a test for parent-offspring transmission using trio data; (3) SNPs with Hardy–Weinberg P<10−20 in Europeans; (4) SNPs with a call rate of <90%; and (5) SNPs with a genotyping batch effect. For imputed markers, we excluded markers with avg.rsq<0.5 or min.rsq<0.3 in any imputation batch and markers with a significant imputation batch effect. For case-control comparisons, we computed association test results by logistic regression assuming additive allelic effects, using custom scripts implemented by 23andMe in the C++ programming language, which were also used to compute association test results in previous publications.8, 9, 10, 11, 12, 13, 14 For tests using imputed data, we used the imputed dosages rather than best-guess genotypes. We included covariates for age, gender, genotype platform and the top five principal components to account for residual population structure. The association test P-value we report was computed using a two-sided likelihood ratio test, which is better behaved than a Wald test on the regression coefficient. A P-value threshold of 5 × 10−8 was considered to be genome-wide significant. No additional multiple testing correction was applied to account for the multiple phenotype groups. Additional details on the method can be found in Supplementary Text S1.
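To make the likelihood ratio test concrete, the sketch below converts the maximised log-likelihoods of two nested logistic models (covariates only, and covariates plus the SNP dosage) into a P-value and compares it with the genome-wide threshold. It is a minimal illustration, not the 23andMe C++ implementation; the function names are hypothetical, and it assumes the test has one degree of freedom because a single additive dosage term is added.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold used in the text


def lrt_p_value(loglik_null: float, loglik_full: float) -> float:
    """P-value of a 1-df likelihood ratio test.

    loglik_null: maximised log-likelihood of the covariates-only model.
    loglik_full: maximised log-likelihood after adding the SNP dosage term.
    """
    # The statistic is 2 * (difference in log-likelihood); clamp at zero to
    # guard against tiny negative values from numerical optimisation.
    stat = max(0.0, 2.0 * (loglik_full - loglik_null))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def is_genome_wide_significant(p_value: float) -> bool:
    """True if the P-value is below the 5e-8 threshold."""
    return p_value < GENOME_WIDE_ALPHA


def manhattan_score(p_value: float) -> float:
    """-log10(P), the quantity plotted on the y axis of a Manhattan plot."""
    return -math.log10(p_value)
```

Because the statistic depends only on how much the fit improves when the SNP term is added, the test is insensitive to the sign of the effect, which is the sense in which it is two-sided.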

Genetic heritability and genetic correlation estimates

PGC phase 1 disease susceptibility summary association statistics for MDD, bipolar, and schizophrenia16, 17, 18, 19 were downloaded from PGC website (http://www.med.unc.edu/pgc/downloads) and included together with the summary statistics from this study as reference data sets for genetic heritability and genetic correlation estimates. Phenotypic variance explained by variants (both genotyped and imputed, mostly SNPs) (h2) for each of the phenotype groups and the genetic correlation between traits (rg) were estimated using association statistics as implemented in LD Score regression.20

Gene set enrichment analysis

INRICH is a pathway analysis tool for genome-wide association studies, designed to detect enriched association signals of linkage disequilibrium (LD)-independent genomic regions within biologically relevant gene sets.21 Reference gene sets used in the INRICH analysis included KEGG, Gene Ontology and the Molecular Signatures Database (v5.0). Top variants from responder vs non-responder analyses with nominal association P-values <0.0005, 0.0001, 0.00005 or 0.00001 were separately fed into PLINK to clump the variants into LD-independent genomic intervals (using r2 thresholds of 0.2, 0.3 and 0.5); the resulting LD-independent genomic regions were then used for INRICH (version 1.0) analyses. No multiple testing corrections were applied for running INRICH against multiple reference gene sets or for using multiple parameters (P-value cutoff and LD threshold).

Results

Sample sizes for each phenotypic group, together with demographic variables such as gender and age, the genotyping platform used and the GWAS genomic control inflation factor lambda, are listed in Supplementary Table S1. Overall, the prevalence of depression was greater in females than in males, consistent with the extant epidemiological literature. Genomic control inflation factors (λ1000) between 1.002 and 1.013 indicated no departure from a uniform distribution of P-values across ~12 million genotyped and imputed markers.
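For readers unfamiliar with λ1000, the sketch below shows one conventional way to obtain it: compute the genomic control factor λ as the median association chi-square statistic divided by its expected median under the null, then rescale it to an equivalent study of 1000 cases and 1000 controls. The text does not state how λ1000 was computed in this study, so the rescaling formula and the function names here are assumptions.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_inflation(p_values):
    """Genomic control lambda from two-sided P-values of 1-df tests.

    Each P-value must lie strictly between 0 and 1 (or equal 1).
    """
    nd = NormalDist()
    # Convert each P-value back to a 1-df chi-square statistic: z^2, where
    # z is the normal quantile at p / 2 (lower tail, hence negative).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to an equivalent study of 1000 cases and 1000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale
```

The rescaling matters here because inflation from polygenic signal grows with sample size, so raw λ values from analyses with ~190 000 controls are not directly comparable with those from the smaller responder vs non-responder analyses.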

Out of a total of 12 genome-wide association analyses performed, only the bupropion responders vs non-responders analysis yielded a locus reaching the genome-wide significance threshold (P<5 × 10−8; Figure 1a). The most significant associated variant, rs1908557 (P=2.6 × 10−8; OR=1.35), was located within the intron of human spliced expressed sequence tags between the known genes GPRIN family member 3 (GPRIN3) and synuclein alpha (SNCA) on chromosome 4 (Figure 1b), overlapping with the enhancer/promoter-associated histone mark H3K4Me1. Each copy of the rs1908557-C allele was associated with higher odds of being a bupropion non-responder. The C allele was relatively common (minor allele frequency=25%) in the study population. No SNP passed the genome-wide significance threshold in any of the other GWAS analyses. Manhattan and quantile–quantile plots for all analyses (Supplementary Figures S1 and S2), as well as variants with a nominal association P-value <5 × 10−5 in any of the 12 analyses (Supplementary Dataset 1), are available online. The index SNPs with P<1 × 10−5 for the responders vs non-responders analyses and the responder subgroup vs healthy control group analyses are listed in Tables 1 and 2, respectively.

Figure 1

Bupropion responders vs non-responders GWAS. (a) Manhattan plot—the Manhattan plot depicts the distribution of association test statistics versus genomic position, with chromosomes 1 to 22, X, and Y arranged along the x axis. The y axis represents log-scaled P-values. Positions with P<5 × 10−8 (a score of about 7.3) are shown in red. Loci with smallest P<10−6 are labeled with the name of the nearest gene. A ‘good’ Manhattan plot should show towers of single-nucleotide polymorphisms (SNPs) with small P-values supporting most signals that pass the genome-wide threshold. (b) Regional association plots—the regional association plots show association test statistics versus position in the vicinity of the strongest associations. The plots are generated with LocusZoom,22 using linkage disequilibrium data from the March 2012 release of 1000 Genomes data. In the plots, a ‘o’ symbol indicates a genotyped SNP and a ‘+’ indicates an imputed SNP. Color indicates strength of linkage disequilibrium with the index SNP. GWAS, genome-wide association study.

Table 1 Index SNPs for strongest associations for responders vs non-responders and non-TRD vs TRD
Table 2 Index SNPs for strongest associations for responders or non-responders vs healthy controls, and non-TRD or TRD vs healthy controls

The heritability estimates for the responders vs non-responders and non-TRD vs TRD analyses were generally unreliable, with large standard errors and smaller h2 estimates (Table 3). The heritability estimates for each responder, non-responder, non-TRD and TRD group in comparison with healthy controls were between 0.14 and 0.22, consistent with the known heritability estimate for MDD (h2=0.19) derived from PGC1 MDD samples. As expected, the genetic correlations between the responder vs non-responder and non-TRD vs TRD analyses were generally weaker than the genetic correlations among the responder/non-responder/non-TRD/TRD vs healthy control analyses.

Table 3 Heritability and genetic correlation estimates

Gene set enrichment analysis may yield signals of enriched gene sets in GWAS analysis despite the individual variants not reaching genome-wide significance. Applying INRICH21 enrichment analysis to the NDRI (bupropion) responders vs non-responders GWAS results implicated gene sets such as circadian rhythm, long-term depression (LTD), and vascular endothelial growth factor (VEGF) pathway being enriched among the bupropion suggestive association hits (P<0.0005; Table 4, Supplementary Table S2 for gene lists belonging to each enriched gene sets in Table 4, Supplementary Dataset 2 for SNP lists corresponding to gene lists from Supplementary Table S2). Additional enriched gene sets for other GWAS analyses are listed in Supplementary Table S3.

Table 4 Pathway Enrichment of bupropion response vs non-response (pcorr < 0.05)

Discussion

We identified a candidate genetic marker, rs1908557, in the intergenic region between GPRIN3 and SNCA for bupropion response, with an association P-value passing genome-wide significance, using a phenotype derived from survey data. This finding ultimately will require replication in clinically ascertained samples to further dissect the genetic basis of treatment response to bupropion among depression patients. The biological significance of rs1908557 is unknown except that rs1908557 is also marginally associated with two brain regions known to exhibit volumetric differences between MDD subjects and healthy controls. Specifically, the common T allele is associated with lower hippocampal (P=0.016), amygdala (P=0.067) and nucleus accumbens (P=0.067) volumes.23 In the recent ENIGMA analysis of 1728 MDD patients and 7199 controls, MDD patients (especially those with recurrent MDD and those with an earlier age of onset (AOO ≤21 years)) had significantly lower hippocampal volumes, and patients with earlier AOO also showed a trend toward smaller amygdala volumes.24 In our analysis, the T allele was associated specifically with better bupropion response. It remains unclear whether the intergenic variant has any biological impact on the neighboring gene SNCA. So far, the limited GTEx sample has not supported any eQTL relationship between rs1908557 and the neighboring genes; however, given the overlap with the enhancer/promoter-associated histone mark H3K4Me1, a more subtle relationship between rs1908557 and the neighboring genes cannot be ruled out. By contrast, eQTL data from BRAINEAC25 (http://braineac.org/) suggested a subtle cis-eQTL relationship between rs1908557 and genes further away from rs1908557, such as family with sequence similarity 13 member A (FAM13A), multimerin 1 (MMRN1), HECT and RLD domain containing E3 ubiquitin protein ligase 5 (HERC5), and coiled-coil serine rich protein 1 (CCSER1, also known as FAM190A; see Supplementary Table S4).
Additional discussion of the possible genes involved is available from Supplementary Text S2. In addition, several variants with suggestive association with bupropion response such as rs8076666 in solute carrier family 26 (anion exchanger), member 11 (SLC26A11, P=1.59 × 10−6) and rs9373491 in glutamate receptor, metabotropic 1 (GRM1, P=3.01 × 10−6) also showed suggestive volumetric relationships with brain volumes23 (Supplementary Dataset 1).

It is noteworthy that the gene set enrichment analysis of bupropion GWAS yielded the most gene sets of interest to MDD. In particular, circadian rhythm, LTD and VEGF pathway genes were enriched among the bupropion suggestive association hits (P<0.0005). Alteration of circadian rhythms and disturbances of sleep are common features of the major depressive syndrome. Variants from period circadian clock 3 (PER3, representative intronic variants rs7528457, P=3.81 × 10−5; rs12137927 P=6.30 × 10−5), RAR-related orphan receptor A (RORA, rs185937898, P=5.51 × 10−5), and nuclear receptor subfamily 1, group D, member 1 (NR1D1) were accountable for circadian rhythm gene set enrichment in the bupropion response GWAS. Acute and chronic stress, which putatively can be precipitating factors of MDD, also can affect rhythms of the circadian pacemaker. PER3 is a member of the Period family of genes and is expressed in a circadian pattern in the suprachiasmatic nucleus, the primary circadian pacemaker in the mammalian brain. Genes in this family encode components of the circadian rhythms of locomotor activity, metabolism and behavior. PER3 is upregulated by CLOCK/ARNTL heterodimers but then represses this up-regulation in a feedback loop using PER/CRY heterodimers to interact with CLOCK/ARNTL. RORA is a member of the NR1 subfamily of nuclear hormone receptors and aids in the transcriptional regulation of some genes involved in circadian rhythm. 
The PER3 variant rs228697 has been linked to morningness–eveningness preference and circadian rhythm sleep disorders, whereas the length polymorphism/VNTR in PER3 has been linked to stress response and bipolar disorder.26, 27, 28 In addition, rs12137927 (the same SNP implicated in our study) and rs228644 from PER3 and rs11632098 from RORA were reportedly linked to endorsing the presence of both a modest number (>2 to <6) and a high number (≥6) of depressive symptoms on the Geriatric Depression Scale, as compared with endorsing no or few depressive symptoms (0–2).29 The variant with an association signal of P<0.0001 in the NR1D1 interval is rs10305315, located downstream of IGFBP4 (P=5.99 × 10−5). It remains unclear whether rs10305315 affects the function of NR1D1 (no eQTL evidence from the GTEx portal based on GTEx Analysis Release V6). NR1D1 encodes a ligand-sensitive transcription factor that negatively regulates the expression of core clock proteins. The expression of this gene represses the circadian clock transcription factor aryl hydrocarbon receptor nuclear translocator-like protein 1 (ARNTL).

Variants from glutamate receptor, ionotropic, delta 2 (GRID2, rs76800659 P=4.21 × 10−5), glutamate receptor, metabotropic 1 (GRM1, rs2328741 P=1.12 × 10−5; rs2268666 P=4.75 × 10−6), glutamate receptor, metabotropic 5 (GRM5, rs308873 P=2.24 × 10−5), corticotropin releasing hormone receptor 1 (CRHR1) and protein kinase C, alpha (PRKCA, rs34337960 P=1.49 × 10−5) were accountable for LTD gene set enrichment. L-Glutamate is the major excitatory neurotransmitter in the central nervous system and activates both ionotropic and metabotropic glutamate receptors. Glutamatergic neurotransmission is involved in most aspects of normal brain function and putatively is perturbed in many neuropathological conditions. In particular, glutamate has been implicated in the pathophysiology and treatment of mood disorders, particularly with respect to altered transmission in limbic–thalamocortical circuits.30 Rs2268666 in GRM1 was shown to be associated with the unipolar depression (UPD) phenotype in a discovery cohort of 350 patients and 370 matched controls (P=7.0 × 10−5 in the allelic test and P=0.0002 in the genotypic test with T being the risk allele, both passing the multiple testing correction threshold), and was partially replicated in an independent cohort of 904 patients and 1012 controls (genotypic test P=0.02, allelic test P=0.59).31 Furthermore, patients homozygous for the non-risk genotype (C/C) showed reduced hippocampal glutamate levels as measured by ¹H-MR-spectroscopy, a more pronounced normalization of HPA-axis hyperactivity using a combined dexamethasone suppression/CRH-challenge (DEX/CRH) test,32 and a better antidepressant treatment outcome.31 Likewise, each copy of the C allele of rs2268666 was associated with an increased likelihood of being a bupropion responder in this study (P=4.75 × 10−6, OR=1.22).
Knockout mice mGluR5(-/-) displayed more depression-like behaviors (learned helplessness, social withdrawal and anhedonia) than control mice following exposure to stressful stimuli, whereas lentiviral 'rescue' of mGluR5 in the nucleus accumbens decreased these depression-like behaviors in mGluR5(-/-) mice.33 mGluR5 may be involved in the regulation of neural network activity and synaptic plasticity. d-Serine is an endogenous co-agonist for N-methyl-d-aspartate receptors and regulates neurotransmission and synaptic plasticity including long-term potentiation and LTD.34, 35 d-Serine serves as an endogenous ligand for GluD2, the predominant excitatory neurotransmitter receptors in the mammalian brain, to regulate LTD at synapses.

The VEGF pathway is another gene set enriched in the bupropion response analysis. The neurotrophic hypothesis of MDD holds that the neuropathology of MDD involves a downregulation of neurotrophin signaling, involving both brain-derived neurotrophic factor and the multi-competent angiogenic and neurogenic molecule VEGF in hippocampal neurogenesis. A recent meta-analysis of fourteen studies (n=1633) showed that VEGF levels were significantly elevated in individuals with MDD when compared to healthy controls,36 supporting the view that this mediator may be involved in neuroplasticity mechanisms underlying or compensating for the pathophysiology of MDD. Variants from fms-related tyrosine kinase 4 (FLT4, rs189869480, P=6.12 × 10−5), PRKCA, and phospholipase C, gamma 1 (PLCG1, rs56012336, P=2.05 × 10−5) were accountable for VEGF pathway enrichment in the bupropion response GWAS. The protein encoded by PLCG1 catalyzes the formation of inositol 1,4,5-trisphosphate and diacylglycerol from phosphatidylinositol 4,5-bisphosphate, and is a major enzyme of the phosphatidylinositol second messenger system. Polymorphisms in PLCG1A were associated with bipolar disorder37 and response to lithium.38, 39

Many variants from candidate gene studies and GWAS analyses have been associated with antidepressant treatment response with various strength of association in mostly small sample sizes. Findings of selected variants reported from prior candidate gene and GWAS studies are discussed in Supplementary Text S3.

The consistent heritability estimates between epidemiologically ascertained depression samples and PGC1 MDD samples (which in large part were clinically ascertained) and the high genetic correlation estimated between our phenotypes and PGC1 MDD patients suggest that self-reported samples do not significantly bias the recruitment of diagnostic class, although the self-reported samples are certainly more heterogeneous and may include both MDD and minor depressive disorder samples. The apparent overlap in genetic architecture between drug/class-specific and antidepressant-wide responders/non-responders is not surprising given that these research participants have a history of self-reported depression. There are inherent challenges in interpreting a single-arm retrospective study in which self-reported outcome assessment is the only data collection modality. These challenges include lack of diagnostic certainty, recall bias, the qualitative nature of the outcome assessment, and uncertainty about whether patients were optimally dosed and adhered to medication for a minimal exposure period (for example, 6 weeks). The Antidepressant Efficacy and Side Effects survey contained the question ‘For how long did you take Wellbutrin/bupropion?’ with the following four answers: ‘Less than 4 months; 4 months–1 year; 1–5 years; and Over 5 years’. The shortest duration category (<4 months) does not allow us to determine whether the minimal medication exposure of 6 weeks was met. However, among the ~2500 patients reporting efficacious bupropion response, only ~150 (~6%) took the medication for <4 months. In contrast, among the ~1800 patients who reported lack of efficacy to bupropion, ~830 (46%) took the medication for longer than 4 months.

This study grouped escitalopram and citalopram into one analysis group to increase the sample size. Citalopram is a racemic mixture of a pharmacologically active S-enantiomer (that is, escitalopram) and the R-enantiomer, which is putatively inactive. The study explored both drug-specific and class-specific effects and focused on the groups with the largest sample sizes as well as on the treatment-resistant subgroup. Despite the interesting finding from the bupropion GWAS analysis and the consistency (though not significant at the genome-wide significance threshold) between this study and previously reported studies for key MDD, pharmacokinetics, and pharmacodynamics genes relevant to antidepressant efficacy response, the genetic variants alone are unlikely to deliver clinically actionable predictive diagnostic tests. A more comprehensive approach using a composite signature of predictors ultimately may be required to predict treatment outcome to a particular drug class with sufficient sensitivity and specificity to warrant its use in the clinic.