Introduction

Most genetic diseases, Mendelian or not, are now known to be complex;1 the broad strategy used to identify Mendelian disease genes, which requires that each susceptibility factor have a large independent effect on disease risk, has not been successful for common and complex disorders.2 Despite the recent application of genome-wide association studies,3 which have identified hundreds of genetic variants associated with chronic illnesses, the search for and interpretation of these genetic associations remain daunting tasks. In many cases, associated genetic variants account for a very small component of the phenotypic variance (3%), despite high estimates of heritability (0.80).4 Because of this disappointingly small power for phenotypic prediction, several authors have suggested that genome-wide association studies are more informative about novel biological pathways than about clinical diagnosis or predictive medicine.5, 6

It has been proposed that multifactorial disorders fit the ‘common disease–common alleles model’,2 in which the cumulative linear impact of multiple common small-effect genetic variants (gene–gene interactions), in the context of environmental exposures (gene–environment interactions), increases individual risk until it exceeds a biological threshold and produces an altered phenotype.3 It is also reasonable to assume that larger risks will emerge from the nonlinear interaction of rare genetic variants with major effects. Nonsynonymous single-nucleotide polymorphisms (nsSNPs) are a main type of ‘functional’ SNP, as they result in true amino-acid variation and consequently may have a major impact on phenotype.

Major depressive disorder (MDD) is a complex illness characterized by a triad of symptoms: low or depressed mood, loss of interest or pleasure in almost all activities (anhedonia) and low energy/fatigability;7 MDD represents the second cause of disability.7, 8 In the United States, suicide, mostly a consequence of depression, is the eleventh overall cause of death, the third cause of death in the age group of 15–24 years and the fourth cause of death in the age group of 25–44 years.9 The economic burden of depression to the US economy is $100 billion annually.10 MDD heritability is estimated at 0.36–0.7 based on twin studies;11 that is, a significant genetic component underlies MDD susceptibility. Several candidate-gene studies have disclosed intriguing associations of MDD with genetic variants, whereas genome-wide association studies have reported only two loci that reached genome-wide significance.12, 13

This study was designed to investigate relationships/gene–gene interactions defined by nsSNPs in the pathways relevant to central nervous system function in case-control samples to find recurrent functional process themes in MDD.

Materials and methods

Subjects

Mexican-American sample

This case-control sample consists of 321 healthy controls and 278 patients with MDD, all aged 19 years or older (Supplementary Table 1). All participants were Mexican-Americans and had at least three grandparents born in Mexico. MDD was defined using the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, 4th edn) diagnosis of current, unipolar major depressive episode as assessed by the structured clinical interview for DSM-IV and a HAM-D21 (21-Item Hamilton Depression Rating Scale) score of 18 with item number 1 (depressed mood) rated 2. MDD patients were recruited in a pharmacogenetic study that was approved by the Institutional Review Boards of the University of California Los Angeles, University of Miami and Australian National University and registered in ClinicalTrials.gov (NCT00265292), as previously described.14, 15, 16 Briefly, all MDD patients had, in their primary language, a comprehensive psychiatric and medical assessment based on diagnostic and rating instruments that had been fully validated in English and Spanish. Exclusion criteria included active medical illnesses that could be etiologically related to the ongoing depressive episode; current or active suicidal ideation with a plan and strong intent; pregnancy; lactation; current use of medications with significant central nervous system activity that interfere with electroencephalographic activity (for example, benzodiazepines) or any other antidepressant treatment within the 2 weeks prior to enrollment; illicit drug use and/or alcohol abuse in the last 3 months; or current enrollment in psychotherapy.14, 15, 16 Control individuals for our genomic studies were in general good health but were not screened for medical or psychiatric illness; they were age- and gender-matched and recruited from the same Mexican-American community in Los Angeles by the same bilingual clinical research team.14, 15, 16

Database of genotypes and phenotypes (dbGaP) sample data

We used genome-wide association study genotype information from the major depression dbGaP17 for replication purposes. Briefly, dbGaP provides open access to large samples of genetic and phenotypic datasets (this particular dataset included 1862 participants affected with major depression and 1857 unaffected controls).17

Genomic DNA collection and genotyping

At the initial visit, blood samples were collected under informed consent from the participating individuals into K2EDTA BD Vacutainer tubes (Becton Dickinson, Franklin Lakes, NJ, USA), and genomic DNA was isolated from those samples using Gentra Puregene DNA purification kits (Gentra Systems, Indianapolis, IN, USA). A total of 372 autosomal nsSNPs in 188 genes were selected from dbSNP (build 121). SNP assays were designed and typed with the Golden Gate assay as part of a 1536 multiplex reaction (59). DNAs with poor results (50% GC score <0.65) as well as loci with a low clustering score (<0.3) were removed. The threshold for retaining individual genotype calls was set to a score of 0.25. Data quality was assessed using duplicate DNAs across all plates. Genotypes from non-matching or missing duplicates were dropped.

Data analysis

Standard genetic association analysis

The Mexican-American sample was used as a discovery-exploratory sample. Quality control and the standard genetic filters were applied following the STREGA statement checklist.18 Hardy–Weinberg equilibrium was tested with an exact test, a more appropriate approach when one allele is very rare, as implemented in PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/). Deviation from Hardy–Weinberg equilibrium was tested separately for the control and depressed groups. Nonsynonymous SNPs were excluded from the data analysis if they met any of the following criteria: the SNP assay was successfully genotyped in <90% of samples, the SNP failed an exact test of Hardy–Weinberg equilibrium in controls (p<0.01) or the SNP was monoallelic in the whole sample. To identify and control for hidden case-control genetic stratification, a major cause of spurious associations, we applied rigorous strategies to discriminate genetic subdivision as described elsewhere.14, 19
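As an illustration of the Hardy–Weinberg filter described above, the following is a minimal sketch of a two-sided exact test for a single biallelic SNP. It is not PLINK's implementation; the function name `hwe_exact_pvalue` and the example genotype counts are hypothetical, and the 0.01 cut-off is the exclusion threshold stated in the text.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Two-sided exact Hardy-Weinberg test for one biallelic SNP.

    Sums the probabilities of all heterozygote counts that are no more
    likely than the observed one, given the observed allele counts.
    """
    n = n_aa + n_ab + n_bb          # genotyped individuals
    n_a = 2 * n_aa + n_ab           # copies of allele A
    n_b = 2 * n - n_a               # copies of allele B

    def log_prob(het: int) -> float:
        # log P(het | n, n_a) under Hardy-Weinberg equilibrium
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2.0)
                + lgamma(n + 1) - lgamma(hom_a + 1)
                - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Feasible heterozygote counts share the parity of the allele count.
    candidates = range(n_a % 2, min(n_a, n_b) + 1, 2)
    observed = log_prob(n_ab)
    p = sum(exp(lp) for lp in map(log_prob, candidates)
            if lp <= observed + 1e-9)
    return min(p, 1.0)


# Hypothetical control-group counts; exclude the SNP if p < 0.01.
p_value = hwe_exact_pvalue(25, 50, 25)
exclude_snp = p_value < 0.01
```

Working on the log scale with `lgamma` avoids overflow of the factorials for samples of a few hundred individuals.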

Student's t-test was conducted to compare age and baseline HAM-D21 score means, and Pearson's χ2 test was performed to compare gender ratios between the subgroups using SAS version 9.1 (SAS Institute, Cary, NC, USA). Genotypic association with disease status was investigated based on two genetic models: dominant (homozygous for major allele versus the remaining genotypes) and recessive (homozygous for minor allele versus the remaining genotypes). Pearson's χ2 test, or Fisher's exact test whenever any cell count was <5, was applied using PLINK. The odds ratio (OR) and its 95% confidence interval were calculated using either Woolf's method or the exact logistic regression model as implemented in SAS, the latter to appropriately handle frequency tables that contain cells with zero counts.20
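The two genetic models and Woolf's confidence interval can be sketched as follows. This is an illustrative outline rather than the SAS or PLINK procedure used in the study: the function names and genotype counts are hypothetical, and treating the genotype group that contains the minor allele as the "exposed" group is an assumption made here for concreteness.

```python
from math import exp, log, sqrt


def collapse(counts, model):
    """Collapse (major_hom, het, minor_hom) counts into (exposed, unexposed)."""
    major_hom, het, minor_hom = counts
    if model == "dominant":    # homozygous for major allele vs the rest
        return het + minor_hom, major_hom
    if model == "recessive":   # homozygous for minor allele vs the rest
        return minor_hom, major_hom + het
    raise ValueError(f"unknown model: {model}")


def woolf_or(cases, controls, model, z=1.96):
    """Odds ratio with Woolf's (logit) 95% confidence interval."""
    a, b = collapse(cases, model)      # exposed / unexposed cases
    c, d = collapse(controls, model)   # exposed / unexposed controls
    if min(a, b, c, d) == 0:
        # Woolf's method is undefined with a zero cell; the text uses
        # exact logistic regression for such tables.
        raise ValueError("zero cell count")
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


# Hypothetical genotype counts (major_hom, het, minor_hom).
or_, lower, upper = woolf_or((10, 15, 5), (20, 8, 2), "dominant")
```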

Gene–gene interaction analyses

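Additive interaction between two genetic variants was assessed with three measures: the relative excess risk due to interaction ($\mathrm{RERI} = \mathrm{OR}_{11} - \mathrm{OR}_{10} - \mathrm{OR}_{01} + 1$), the attributable proportion due to interaction ($\mathrm{AP} = \mathrm{RERI}/\mathrm{OR}_{11}$) and Rothman's synergy index ($S = (\mathrm{OR}_{11} - 1)/(\mathrm{OR}_{10} + \mathrm{OR}_{01} - 2)$), where OR denotes the odds ratio and each subscript digit refers to one of the two SNPs, with 1 denoting the presence and 0 the absence of the risk genotype at that SNP. No interaction corresponds to $\mathrm{RERI} = 0$, $\mathrm{AP} = 0$ and $S = 1$, whereas $\mathrm{RERI} > 0$, $\mathrm{AP} > 0$ and $S > 1$ indicate an increase in the effect due to the additive interaction.21, 22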
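The three interaction measures follow directly from the three odds ratios. The sketch below simply evaluates the formulas given above; the function name and the input odds ratios are hypothetical and are not taken from the study's tables.

```python
def additive_interaction(or11: float, or10: float, or01: float):
    """Return (RERI, AP, S) for two risk genotypes.

    or11: both risk genotypes present; or10 / or01: only one present.
    The reference group (neither present) has an odds ratio of 1.
    """
    reri = or11 - or10 - or01 + 1.0
    ap = reri / or11
    denominator = or10 + or01 - 2.0
    # S is undefined when neither single-genotype odds ratio departs from 1.
    s = (or11 - 1.0) / denominator if denominator != 0 else float("nan")
    return reri, ap, s


# Hypothetical odds ratios: joint effect larger than the sum of the parts.
reri, ap, s = additive_interaction(6.0, 2.0, 3.0)

# Exactly additive effects give RERI = 0, AP = 0 and S = 1.
null_reri, null_ap, null_s = additive_interaction(4.0, 2.0, 3.0)
```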

Linkage disequilibrium analysis and SNP tagging for dbGaP

HapMap databases (genotypes and phased haplotypes, phase III, release 2) were used to identify linkage disequilibrium blocks harboring SNPs significantly associated with depression risk in the Mexican-American sample. We then selected the dbGaP SNPs located in the identified linkage disequilibrium blocks and applied to them the same array of genetic association tests used for the Mexican-American sample.

Advance recursive partition (tree-based) approach (ARPA)

We applied tree-based approaches because they are among the most widely used methods in predictive analyses: they account for non-linear effects, quickly resolve hidden complex substructure (gene–gene and gene–demography interactions) and yield unbiased, statistically significant analyses of high-dimensional, seemingly unrelated data. Furthermore, results supplied by tree-based analytics are easier to interpret visually and logically. Thus, we employed ARPA, as implemented in HelixTree, to test high-order gene–gene and gene–covariate (demographic data) interactions. Briefly, the basic process in recursive partitioning analysis is to divide a data set into parts such that the individuals in each segment share common features. The dependent variable was MDD affection status, and genetic as well as demographic variables were used as predictors. Trees were interactively explored with the following restrictive criteria: (i) at least five elements per child node to avoid small, possibly outlier groups, a common artifact of recursive partitioning; (ii) an exact O(n²) algorithm for segmenting; (iii) a maximum cardinality of five for the multi-way split; (iv) a resampling approach with 5000 iterations to define the smallest bound for the P-value; and (v) a split-stopping criterion defined as a Bonferroni-corrected P-value of 0.05.
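To make the partitioning step concrete, the sketch below searches for the single best binary split of one node. It is a deliberately simplified stand-in for the procedure described above and is not HelixTree's algorithm: it considers only two-way splits, uses an asymptotic Pearson χ2 test instead of resampling, and all names (`best_split`, `snp_a`, `snp_b`) and data are hypothetical. It does apply the minimum child size of five and a Bonferroni-corrected threshold of 0.05 mentioned in the text.

```python
from math import erfc, sqrt


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square P-value (1 df, uncorrected) for [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / (margins[0] * margins[1]
                                       * margins[2] * margins[3])
    return erfc(sqrt(stat / 2.0))  # chi-square survival function, 1 df


def best_split(genotypes, status, min_child=5, alpha=0.05):
    """Pick the most significant binary split of one node, or None.

    genotypes: {snp: [minor-allele count 0/1/2 per individual]}
    status:    [1 for case, 0 for control]
    """
    cuts = (1, 2)  # >=1: carriers vs non-carriers; >=2: minor homozygotes vs rest
    n_tests = len(genotypes) * len(cuts)  # Bonferroni factor
    best = None
    for snp, calls in genotypes.items():
        for cut in cuts:
            left = [s for g, s in zip(calls, status) if g < cut]
            right = [s for g, s in zip(calls, status) if g >= cut]
            if len(left) < min_child or len(right) < min_child:
                continue  # child node too small
            p = chi2_2x2_pvalue(sum(left), len(left) - sum(left),
                                sum(right), len(right) - sum(right))
            p_adj = min(1.0, p * n_tests)
            if p_adj < alpha and (best is None or p_adj < best[2]):
                best = (snp, cut, p_adj)
    return best


# Hypothetical toy data: snp_a separates cases from controls, snp_b does not.
status = [0] * 10 + [1] * 10
toy = {"snp_a": [0] * 10 + [2] * 10, "snp_b": [0, 1] * 10}
split = best_split(toy, status)
```

A full tree is obtained by applying the same search recursively to each child node until no split passes the stopping criterion.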

Prediction of functional effect of human nsSNPs

We used the freely available Polymorphism phenotyping (PolyPhen-2) tool for predicting whether a specific nsSNP may be damaging23 (http://genetics.bwh.harvard.edu/pph2/).

Pathway and network analyses

In order to evaluate potential common ontogenetic as well as cellular process functionality of those genes disclosed by the ARPA analysis, we used Metacore 6.8 software build 29806 (GeneGo, St Joseph, MI, USA) tools to build networks using the list of six genes that had significant events of branching process.

Results

Discovery and exploratory phase with the Mexican-American data

After quality control filters were applied and Hardy–Weinberg equilibrium was tested, a total of 252 nsSNPs in 155 genes were available for analysis (Supplementary Table 2). SNP frequencies may vary widely among ethnic populations. Because we studied Mexican-American individuals (Supplementary Table 3 shows the influence of ethnicity on the significant nsSNPs), population-specific frequencies may influence the structure of the risk-assessment tree. For this reason, we replicated this work in the dbGaP sample.

In all, 15 nsSNPs showed significant genotypic association with the diagnosis of MDD (Table 1); 11 and 6 of those nsSNPs fit better a dominant and a recessive model, respectively.

Table 1 nsSNPs showing significant genotypic association with depression

ARPA analyses performed to assess MDD risk in clusters of variants showed that eight nsSNPs had high-order interactions in the control of tree growth (Figure 1). Our tree model provided 40% sensitivity for the diagnosis of MDD, 83% specificity for the prediction of controls and 63% accuracy for the correct prediction of MDD or control subjects. Significant additive interactions between two genetic variants were found for nine genotype combinations (Table 2).
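For readers who want to see how the three performance figures relate to one another, the sketch below computes them from a confusion matrix. The counts are hypothetical: they were chosen only to be consistent with the reported percentages and with the nominal sample sizes (278 cases, 321 controls) and are not the study's actual classification counts.

```python
def classification_summary(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, accuracy) from a confusion matrix."""
    sensitivity = tp / (tp + fn)              # cases predicted as cases
    specificity = tn / (tn + fp)              # controls predicted as controls
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 278 cases (111 detected), 321 controls (266 detected).
sens, spec, acc = classification_summary(tp=111, fn=167, tn=266, fp=55)
```

Because accuracy is a weighted average of sensitivity and specificity, a value near 63% is what one would expect when cases and controls are of similar number.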

Figure 1

Tree analyses performed to assess MDD risk in clusters of variants. Eight nsSNPs had high-order interactions in the control of tree growth; they are located in the following genes: HSD3B1, PDLIM5, PSMD9, BDNF, USP36, PDE6C, PSMB4 and MYOC. Light-pink nodes=low MDD risk; light-green nodes=moderate MDD risk; light-blue nodes=high MDD risk; and purple node=reference MDD risk. Sensitivity=40%, specificity=83% and accuracy=63% (sensitivity refers to the proportion of cases who are correctly predicted as cases; specificity refers to the proportion of controls who are correctly predicted as controls; and accuracy refers to the proportion of cases and controls who are correctly predicted as case or control).

Table 2 nsSNPs showing synergic effects on the risk of depression

Replication phase with the dbGaP sample

A preliminary univariate phenotype analysis showed significant associations at markers tagging HSD3B1, PDLIM5, INMT, ADRA1A and PSMD9 (Table 3). By controlling for the effects of significantly associated demographic factors, such as age, sex and comorbidities (substance abuse/dependence (nicotine and alcohol), among other variables), all of them available at dbGaP (Supplementary Table 4), we created a multidimensional ‘MDD plus modifiers’ phenotype that explained a significant proportion of the variance associated with the MDD phenotype. Using this multidimensional phenotype, ARPA reconstructed a tree that strongly resembled the tree generated during the exploratory phase with the Mexican-American MDD sample. This dbGaP-based tree replicated most of the genes involved in the branching process; that is, the first five very significant events of branching are provided by PDE6C, PSMD9, HSD3B1, BDNF, GHRHR and PDLIM5 (Figure 2).

Table 3 Replication results with the dbGaP sample
Figure 2

ARPA using dbGaP data reconstructed a tree resembling the tree generated during the exploratory phase with the Mexican-American MDD sample in Figure 1. Most of the genes involved in the branching process, that is, PDE6C, PSMD9, HSD3B1, BDNF and PDLIM5, were replicated. This tree has one gene (GHRHR) not present in the exploratory tree. Reconstruction of the tree highlighted that some covariates used by the multidimensional phenotype, that is, nicotine and alcohol use, contributed significantly to discriminating MDD-affected from unaffected individuals during the branching. For easier understanding, the Figure 1 tree is reproduced in the upper-right inset, and genes in the two trees are color-coordinated.

Pathway analyses

We used all six genes generated by significant events of branching provided by our dbGaP analyses, that is, PDE6C, PSMD9, HSD3B1, BDNF, GHRHR and PDLIM5, to build networks and perform the pathway analyses (Figure 3). These genes form a network with 156 interactions. Significant biological processes involved in this network include positive regulation of cellular and biological processes, nerve growth-factor receptor signaling pathway, and regulation of cell death (Supplementary Table 5).

Figure 3

Network analyses showed that all the genes involved in the major branching events in Figure 2 are related to positive regulation of cellular and biological processes and are relevant to growth and organ development (the nerve growth-factor signaling pathway and retinal cone-cell development; see also Supplementary Table 5). All the symbols are explained in the MetaCore legend (Supplementary Material, Supplementary Figure 1).

Prediction of functional effect of human nsSNPs

Most variations are predicted to be benign (Table 1). Variations rs4603 (PSMB4), rs2229125 (ADRA1A), rs2234926 (MYOC) and rs2302339 (INMT) are predicted to be probably damaging and variation rs6265 (BDNF) is predicted to be possibly damaging.

Discussion

We showed that genetic analyses as well as classificatory multidimensional tree techniques applied to nsSNPs associated with MDD diagnosis provided a reliable branching-tree framework that predicted, with reasonable sensitivity and specificity, the clustering of MDD patients.

The association of the allelic variation rs6265 in the BDNF gene with MDD has been replicated in at least three independent studies;24 this variation is predicted to be possibly damaging and has also been implicated in a number of neuropsychiatric conditions (OMIM 113505). Recently, allelic variations in the PDLIM5 gene have been implicated in recurrent MDD,25 bipolar disorder26 and schizophrenia.27 We have reported data supporting the role of cGMP phosphodiesterase (PDE) genes28 and the role of immune-related genes relevant to T-cell function (TBX21 and PSMB4) in MDD.29

In our discovery/exploratory analyses, the ubiquitin-proteasome pathway provided a recurrent theme in the tree structure (Figure 1). Three nsSNPs with high-order interactions in the control of the tree growth, in the PSMB4, PSMD9 and USP36 genes, belong to this pathway; however, only PSMD9 remained in the replication tree (Figure 2). PSMD9 is a subunit of the 26S proteasome, which consists of a 20S proteolytic core capped at both ends by the 19S regulatory complex that recognizes the polyubiquitin-tagged substrates. Studies have linked the 26S proteasome with transcriptional activities through proteolysis of steroid-hormone receptors to limit their transcriptional output and recycling of transcriptional complexes on chromatin to facilitate multiple rounds of transcriptional initiation.30 The ubiquitin-proteasome pathway is the major system of selective degradation of short-lived regulatory proteins in eukaryotic cells, but it also has important roles in cell-cycle regulation, signal transduction, differentiation, antigen processing, immune function and degradation of tumor suppressors.30

In Figure 1, the variation rs6205 in the HSD3B1 gene, which defined the root (topmost) node in our risk-assessment tree, is located in a putative membrane-binding domain.31 The genotype TT in rs6205 (HSD3B1) is prevalent in the population, whereas the CC/CT variants of rs6205 (HSD3B1) conferred a small increment in MDD risk, which in combination with the PSMD9 genotype GG (in rs1043306) defined the node with the highest MDD percentage (84%, node 12). This cluster of genotypes represents an approximately six-fold increase in the odds of MDD compared with our reference node 6 (47%). Clusters of genotypes can confer low risk (nodes 1, 2, 5 and 9) or increased risk for MDD (nodes 3 and 4 are moderate MDD risk and nodes 7, 8, 10, 11 and 12 are high MDD risk, OR1.9). Three subtrees emerged in our main tree structure: two outer subtrees (right and left) and one middle subtree. The outer subtrees contained several representations of the ubiquitin-proteasome pathway components (PSMD9, USP36 and PSMB4 in the left subtree, and PSMD9 and PSMB4 in the right subtree) and the middle subtree contained the BDNF rs6265 variation. Of note is the presence of genes relevant to visual function (MYOC and PDE6C) in the subtrees that contain high-risk MDD nodes (Figure 1, middle and right subtrees) and their absence in the left subtree, which carries low- and moderate-risk MDD nodes. Variation rs2234926 (MYOC) is predicted to be possibly damaging; variations in the MYOC gene have been associated with glaucoma. PDE6C is a phosphodiesterase enzyme abundantly found in the eye. In our replication tree, only PDE6C remained (Figure 2).
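The six-fold figure quoted above is consistent with comparing the two nodes on the odds scale rather than as a ratio of percentages. The short check below assumes that the node percentages (84% and 47%) are the proportions of MDD cases within each node; the function names are hypothetical.

```python
def odds(proportion: float) -> float:
    """Convert a proportion of cases into odds of being a case."""
    return proportion / (1.0 - proportion)


def odds_ratio(p_node: float, p_reference: float) -> float:
    """Odds ratio of a node relative to a reference node."""
    return odds(p_node) / odds(p_reference)


# Node 12 (84% MDD) versus reference node 6 (47% MDD).
node_or = odds_ratio(0.84, 0.47)   # close to 6
risk_ratio = 0.84 / 0.47           # below 2 on the proportion scale
```

The contrast between the two quantities shows why the scale of comparison should be stated whenever a fold change in risk is reported.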

Our replication study used an independent MDD sample (dbGaP data) and tagged markers in linkage disequilibrium with the nsSNPs (Figure 2). It replicated most of the genes involved in the branching-tree process depicted by the exploratory analysis. It also revealed that comorbidities included as covariate information, such as nicotine and alcohol abuse, among others, might provide a better prediction of the status outcome. Additionally, network analyses showed that genes involved in those major branching events are related to positive regulation of cellular and biological processes and are relevant to growth and organ development.

Our study provides an initial functional pathway map which helps elucidate the pathophysiology of mild to moderate forms of MDD. This approach has the potential to enhance our understanding of the contributions of gene–gene variations to MDD risk. We have not conducted an extensive nsSNP survey; therefore, other pathways may also contribute to MDD risk. Future work should further assess the interactions of nsSNPs and environmental factors in major depression in diverse population groups.