Abstract

Major depressive disorder (MDD) is a common complex trait with enormous public health significance. As part of the Genetic Association Information Network initiative of the US Foundation for the National Institutes of Health, we conducted a genome-wide association study of 435 291 single nucleotide polymorphisms (SNPs) genotyped in 1738 MDD cases and 1802 controls selected to be at low liability for MDD. Of the top 200 signals, 11 localized to a 167 kb region overlapping the gene piccolo (PCLO, whose protein product localizes to the cytomatrix of the presynaptic active zone and is important in monoaminergic neurotransmission in the brain) with P-values of 7.7 × 10−7 for rs2715148 and 1.2 × 10−6 for rs2522833. We undertook replication of SNPs in this region in five independent samples (6079 independent MDD cases and 5893 controls) but no SNP exceeded the replication significance threshold when all replication samples were analyzed together. However, there was heterogeneity in the replication samples, and secondary analysis of the original sample with the sample of greatest similarity yielded P=6.4 × 10−8 for the nonsynonymous SNP rs2522833 that gives rise to a serine to alanine substitution near a C2 calcium-binding domain of the PCLO protein. With the integrated replication effort, we present a specific hypothesis for further studies.

Introduction

The defining features of major depressive disorder (MDD) are marked and persistent dysphoria plus additional cognitive signs and symptoms (anhedonia, sleep disturbance, weight/appetite changes, motor agitation/retardation, anergia, excessive guilt or worthlessness, poor concentration or indecisiveness, and recurrent thoughts of death or suicide).1 MDD is distinct from normal sadness by its persistence (that is, ≥2 weeks), additional signs and symptoms, and substantial associated impairment. The definition of MDD excludes other conditions typified by substantial depressive symptoms (other psychiatric disorders, drug/alcohol dependence and somatic diseases). The lifetime prevalence of MDD is 15%2, 3, 4 and is twofold higher in women5 with a course typified by recurrence of illness.6 It is associated with considerable morbidity,7, 8, 9 excess mortality from suicide and other causes,10, 11, 12, 13 and substantial direct and indirect costs.14 A World Health Organization study projected MDD to be the second leading cause of disability worldwide by 2020.15

Although there is a considerable corpus of research on the epidemiology and biological correlates of MDD, little is known for certain about its etiology. An important etiological clue may be the familial tendency of MDD and its heritability of 31–42%.16 This clue led to a number of genome-wide linkage studies (Supplementary Methods) and studies of >100 theoretical or positional candidate genes. As for the use of these study designs with other biomedical disorders, their application to MDD has not been as successful as had been hoped.

It is now clear that genome-wide association studies (GWASs) can be a successful tool in the genetic dissection of complex biomedical disorders.17, 18 The goal of this report is to describe a GWAS for MDD that was systematically designed to remediate a set of methodological issues common to genetic studies of MDD. Examples of these issues include small sample sizes, inhomogeneous samples in terms of ancestry and phenotyping, convenience sampling, and controls that are unaffected but not at low liability for MDD. Moreover, large-scale replication was integral to our design.

Materials and methods

This GWAS was one of the six initial Genetic Association Information Network (GAIN) studies sponsored by the Foundation for the NIH.19 Individual phenotype and genotype data are available to researchers by application to the dbGaP repository.20 We have attempted to follow published guidelines for GWAS (Chanock et al.,21 Box 1).

Subjects

The parent projects that supplied subjects for this GWAS are longitudinal studies, the Netherlands Study of Depression and Anxiety (NESDA; http://www.nesda.nl)22 and the Netherlands Twin Registry (NTR; http://www.tweelingenregister.org).23 Sampling and data collection characteristics of the GAIN–MDD study have been described in detail elsewhere.24

MDD cases were mainly from NESDA, a longitudinal cohort study designed to be representative of individuals with depressive and/or anxiety disorders. Recruitment of participants for NESDA took place from 09/2004–02/2007, and ascertainment was from outpatient specialist mental health facilities and by primary care screening. Additional cases were from the population-based cohorts NEMESIS,25 ARIADNE26 and the NTR. Regardless of recruitment setting, similar inclusion and exclusion criteria were used to select MDD cases. Inclusion criteria were a lifetime diagnosis of DSM-IV MDD1 as diagnosed by the Composite International Diagnostic Interview psychiatric interview,27 age 18–65 years, and self-reported western European ancestry. Persons who were not fluent in Dutch and those with a primary diagnosis of schizophrenia or schizoaffective disorder, obsessive–compulsive disorder, bipolar disorder or severe substance use dependence were excluded (the etiology of MDD in these subjects may be distinct). The 1862 cases included in GAIN were recruited from mental health care organizations (N=785), primary care (N=603) and community samples (NEMESIS N=218, ARIADNE N=96 and NTR N=160).

Control subjects were mainly from the NTR, which has collected longitudinal data from twins and their families since 1991 (total cohort of 22 000 participants from 5546 families). The majority of families were recruited when the twins were adolescents or young adults through city council registrations along with alternative efforts to recruit older twins. Longitudinal phenotyping includes assessment of depressive symptoms (via multiple instruments), anxiety, neuroticism and other personality measures. Inclusion required availability of both survey data and biological samples, no report of MDD at any measurement occasion, and low genetic liability for MDD. No report of MDD was determined by specific queries about medication use or whether the subject had ever sought treatment for depression symptoms and/or through the CIDI interview. Low genetic liability for MDD was determined by the use of a factor score derived from longitudinal measures of neuroticism, anxiety and depressive symptoms28 (mean 0, s.d. 0.7); controls were required never to have scored highly (≥0.65) on this factor score. Finally, controls and their parents were required to have been born in the Netherlands or western Europe. Only one control per family was selected. There were controls (N=1703) from the NTR and additional controls from NESDA (N=133 from general practice, N=24 from ARIADNE). NESDA controls had no lifetime diagnosis of MDD or an anxiety disorder as assessed by the CIDI and reported low depressive symptoms at baseline (K-10 score <16 and inventory of depressive symptoms score <4).29, 30

Case–control matching

If there were multiple eligible NTR controls in a family, we first matched on sex and age, and used the highest number of completed questionnaires as an additional criterion. Again, only one control per family was included.

DNA sampling

Before the start of the NESDA and NTR biological sample collection, the processing and storage protocols were harmonized; DNA extraction was conducted concurrently in the same laboratory. For NESDA, blood sampling took place during the baseline visit (between 0830 and 0930 hours) and DNA was isolated using the FlexiGene DNA AGF3000 kit (Qiagen, Valencia, CA, USA) on an AutoGenFlex 3000 workstation (Autogen, Holliston, MA, USA). For NTR, biological samples were taken in the subject's home (between 0700 and 1000 hours) and DNA was extracted using the Puregene DNA isolation kit (Gentra, Minneapolis, MN, USA) for frozen whole blood samples. DNA concentrations were determined using the PicoGreen dsDNA Quantitation kit (Invitrogen Corporation, Carlsbad, CA, USA). All procedures were performed according to the manufacturer's protocols.

Ethical issues

The NESDA and NTR studies were approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Center, Amsterdam, an Institutional Review Board certified by the US Office of Human Research Protections (IRB number IRB-2991 under Federal-wide Assurance-3703; IRB/institute codes, NESDA 03-183; NTR 03-180). All subjects provided written informed consent. As part of the GAIN application process, consent forms were specifically rereviewed for suitability for the deposit of deidentified phenotype and genotype data into the controlled-access dbGaP repository.20 NESDA and NTR subjects were informed of participation in GAIN by newsletters. Only 22 NESDA respondents refused informed consent for genetic research (1.7% of all respondents approached).

GWAS genotyping

Individual genotyping was conducted by Perlegen Sciences (Mountain View, CA, USA) using a set of four proprietary, high-density oligonucleotide arrays. The SNPs on these arrays were selected to tag common variation in the HapMap European and Asian panels using previously described genotype data,31 tagging approach32 and methodology.33 At the beginning of GAIN, all HapMap34 samples were genotyped with the Perlegen GWAS platform. Independent review of these data by the GAIN analysis group19 showed 99.8% agreement with prior HapMap genotypes and the mean maximum r2 between the Perlegen SNPs and HapMap phase II SNPs31 was 0.89 for single and 0.96 for multimarker analyses. The genotyping procedures and genotype calling algorithms are described in the Supplementary Methods and in prior reports.35, 36 Briefly, 40 × 96-well plates were sent to Perlegen for GWAS genotyping. Genotyping was conducted blind to case–control status. Cases and controls were randomly allocated to plates and to positions within plates. Each plate contained DNA samples from 93 Dutch subjects plus 3 quality control samples. The three quality control samples included: two parents of one control on that plate (40 complete trios in total); and half the plates contained the same HapMap CEU sample (used for quality control in all GAIN projects) and half had a randomly selected duplicate case sample. The total number of samples was 3840 (=40 plates × 96 samples per plate) or 1860 cases+1860 controls+80 parents+20 duplicate samples+20 HapMap samples.

Quality control—subjects

Of the 3820 Dutch samples sent to Perlegen (excluding the 20 HapMap internal control samples), genotypes were delivered for 3761 samples. A total of 59 samples did not have GWAS data: 39 samples with uncertain linkage between genotype and phenotype records, 7 samples with evidence of contamination, 6 samples that failed genotyping and 7 miscellaneous failures (2 of these were excluded as chrX and chrY genotyping data were consistent with the presence of XO and XXY sex chromosome status). After further analysis, 8 subjects were removed for excessive missing genotype data (>25%), 1 case for high genome-wide homozygosity (75%), 38 subjects whose genome-wide IBS estimates were consistent with first- or second-degree relationships and 57 additional subjects whose ancestry diverged from the remainder of the sample (see Supplementary Methods for details). After these exclusions (N=104) and removing duplicated and trio quality control samples, there were 3540 subjects in the final analysis data set including 1738 cases and 1802 controls. The principal reason for fewer cases than controls was the higher prevalence of substantial non-European ancestry. The list of subjects in the final analyses data set is included as a Supplementary File (‘mddC.fam’).

Quality control—SNPs

The unfiltered data set obtained from dbGaP contained 599 156 unique SNPs. The Perlegen genotyping algorithm yielded a quality score for each individual genotype, and a more stringent quality score cutoff (10) than that used by Perlegen was applied. The SNP quality control process is described in detail in the Supplementary Methods. Briefly, to be included in the final analysis data set, SNPs were required not to have any of the following features: gross mapping problem,37 ≥2 genotype disagreements in 40 duplicated samples, ≥2 Mendelian inheritance errors in 38 complete trio samples, minor allele frequency <0.01 or >0.05 missing genotypes in either cases or controls. A Hardy–Weinberg filter was not used as lack of fit to Hardy–Weinberg expectations can occur for valid reasons (for example, a true association)38 and given that 95.6% (=51 592/53 994) of SNPs with P<0.00001 from an exact test of Hardy–Weinberg equilibrium39 in controls were already flagged for exclusion. A total of 435 291 SNPs met these criteria and were included in the final analysis data set (included as a Supplementary File, ‘mddC.bim’). Additional quality control checks are described in the Supplementary Methods. A total of 13 controls were genotyped in a different study using the Illumina 317K platform and, of the 82 636 SNPs common to both platforms, the genotype agreement was 99.94%.
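The exclusion rules listed above can be sketched as a single predicate. This is only an illustration: the `SnpQc` record and its field names are hypothetical, and the thresholds for duplicate disagreements and Mendelian errors are read here as "two or more", which is an assumption about the original text.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality control summary (hypothetical field names)."""
    mapping_problem: bool          # gross mapping problem flagged
    duplicate_disagreements: int   # genotype disagreements in duplicated samples
    mendel_errors: int             # Mendelian inheritance errors in trios
    maf: float                     # minor allele frequency
    missing_cases: float           # fraction of missing genotypes in cases
    missing_controls: float        # fraction of missing genotypes in controls


def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP shows none of the exclusion features."""
    return not (
        snp.mapping_problem
        or snp.duplicate_disagreements >= 2   # assumed threshold
        or snp.mendel_errors >= 2             # assumed threshold
        or snp.maf < 0.01
        or snp.missing_cases > 0.05
        or snp.missing_controls > 0.05
    )
```

Note that missingness is checked separately in cases and controls, so a SNP that is well genotyped overall can still be dropped if one group exceeds 5%.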

Single-marker statistical analyses

There were three classes of SNPs: those that could be heterozygous in all subjects (chr1-22 and chrX/PAR1), those that could be heterozygous in women (non-PAR chrX) and those that were hemizygous in men (non-PAR chrX and chrY). All SNPs that passed quality control checks were tested for association with MDD using 1 d.f. Cochran-Armitage trend tests. For complex traits, it is widely believed that the contributions of individual SNPs to disease risk are often roughly additive.40 The Cochran-Armitage trend test can be used to detect such effects. This test is usually recommended due to its robustness to violation of the Hardy–Weinberg equilibrium (HWE) assumption.41 P-values from women and men for non-PAR chrX were combined using Fisher's method.42
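As an illustration of the two statistics named above, the following sketch computes a 1 d.f. Cochran-Armitage trend test from genotype counts and combines independent P-values with Fisher's method. It assumes additive weights (0, 1, 2) for the three genotype classes; the function names are hypothetical and this is not the software used in the study.

```python
import math


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: counts for the three genotype classes.
    Returns (chi-square statistic with 1 d.f., P-value).
    """
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    cols = [r + s for r, s in zip(cases, controls)]
    # Weighted difference between case and control genotype counts.
    t_stat = sum(t * (r * n_controls - s * n_cases)
                 for t, r, s in zip(weights, cases, controls))
    var = sum(t * t * n * (n_total - n) for t, n in zip(weights, cols))
    var -= 2 * sum(weights[i] * weights[j] * cols[i] * cols[j]
                   for i in range(3) for j in range(i + 1, 3))
    var *= n_cases * n_controls / n_total
    chi2 = t_stat * t_stat / var
    # Upper tail of a chi-square distribution with 1 d.f.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k d.f."""
    half = -sum(math.log(p) for p in pvalues)
    # Closed-form upper tail for an even number of degrees of freedom.
    term, total = 1.0, 1.0
    for j in range(1, len(pvalues)):
        term *= half / j
        total += term
    return math.exp(-half) * total
```

For a non-PAR chrX SNP, one would run the test separately in women and in men and pass the two resulting P-values to `fisher_combine`.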

Population stratification artifacts were assessed in two ways. As described elsewhere,36 including principal components as covariates in a logistic regression model can robustly control stratification effects. To do this, we identified a set of 127 688 SNPs in linkage equilibrium43 and used the ‘smartpca’ program in EigenSoft44 to compute 10 principal components for each subject that were included as covariates in logistic regression models (case/control status ~ SNPi + PC1 + PC2 + … + PC10). We also used a stratified Cochran–Mantel–Haenszel test in PLINK43 as a complementary approach.

For noteworthy associations, there were additional checks to ensure that an association was not due to experimental bias. These checks included: manual inspection of SNP cluster plots to ensure reasonable performance of the genotype calling algorithm; evaluation of conformance to Hardy–Weinberg equilibrium in controls, cases and overall (discussed in the Supplementary Methods); the checks for population stratification described above; evaluation of plate-specific association results to ensure that the overall association was not driven by one or a few plates; comparison of control MAFs to the HapMap EUR panel; and evaluation of the characteristics of a SNP in high linkage disequilibrium (‘proxy association’) as a similar association with such a SNP decreases the chance of some forms of method artifacts.

Control of false discoveries

Given the 10^5–10^7 statistical comparisons in a GWAS, small P-values are expected by chance. To control the risk of false discoveries, q-values45, 46 were computed for all P-values for single-marker tests of association. A q-value is an estimate of the proportion of false discoveries among all significant markers, or the false discovery rate (FDR) for the corresponding P-value. The use of q-values is preferable to more traditional multiple testing controls because q-values provide a better balance between the competing goals of finding true positives versus controlling false discoveries, allow more similar comparisons across studies because proportions of false discoveries are much less dependent on the number of tests conducted and are relatively robust against the effects of correlated tests.45, 47, 48, 49, 50, 51, 52, 53, 54 The q-value threshold for declaring significance was 0.10 (that is, on average, 10% of the findings declared significant are allowed to be false discoveries).50, 55 FDR thresholds <0.10 result in a disproportionate drop in power to detect true effects.
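To make the q-value idea concrete, the sketch below converts a list of P-values into q-values under the simplifying assumption that the proportion of true null hypotheses is 1. Under that assumption the q-value reduces to the Benjamini–Hochberg adjusted P-value, which is conservative; the methods cited above estimate that proportion from the data instead. The function name is hypothetical.

```python
def q_values(pvalues):
    """Step-up FDR q-values, assuming all tests may be null (pi0 = 1).

    For the P-value of rank i among m tests, the FDR estimate is p * m / i;
    the q-value is the smallest such estimate at that rank or any larger one.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the minimum can be carried along.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        q[idx] = running_min
    return q


def significant(pvalues, threshold=0.10):
    """Indices of tests whose q-value is at or below the threshold."""
    return [i for i, q in enumerate(q_values(pvalues)) if q <= threshold]
```

With a threshold of 0.10, the tests returned by `significant` are those for which the estimated false discovery rate does not exceed 10%.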

Imputation

We used two imputation approaches, the SNPMStat method of Lin et al.56 to impute 246 additional SNPs in the piccolo (PCLO) region and Abecasis’ MACH (v1) to impute 2 037 829 autosomal SNPs with R2≥0.5 (a cutoff that removes 90% of SNPs with unreliable imputation results while dropping 2–3% of reliably imputed SNPs). Both SNPMStat and MACH gave similar results in the PCLO region. Imputed genotypes were used in secondary analyses. The HapMap2 EUR panel31, 34 was used as reference.

Statistical power

Quanto57, 58 was used to approximate statistical power given the following assumptions: two-tailed α=1 × 10−7 (=0.05/500 000), 1738 cases and 1802 controls, lifetime morbid risk of MDD of 0.15 and a log additive genetic model. For statistical power of 0.80 (β=0.20), the minimum detectable genotypic relative risks are 1.59, 1.40 and 1.35 for minor allele frequencies of 0.10, 0.25 and 0.40.

Software

PLINK (v1.0),43 SAS (v9.1.3),59 R (v2.6.1),60 HAPSTAT (v3),61, 62, 63 MACH1, SNPMStat,56 HaploView,64 and JMP (v6)65 were used for data management, quality control, statistical analyses and graphics.

Bioinformatics

All genomic locations are per NCBI Build 35 66 (UCSC hg17).67 Pseudoautosomal region 1 (PAR1) is assumed to be located on chrX:1–2 692 881 and chrY:1–2 692 881 and PAR2 on chrX:154 494 747–154 824 264 and chrY:57 372 174–57 701 691.68 SNP annotations were per TAMAL37 based chiefly on UCSC genome browser files,67 HapMap34 and dbSNP.66

Results

Sample description

Table 1 presents descriptive data for cases and controls. Controls had a higher proportion of men and were slightly older (and thus were farther through the period of risk for MDD). Consistent with known correlates of MDD, cases had a significantly lower educational level, less often had a partner, were more often smokers and scored much higher on the NEO-FFI neuroticism scale.

Table 1: Descriptive data for cases with MDD and controls at low liability for MDD included in the GWAS

SNP description

The analysis SNP set had 435 291 SNPs including 427 049 autosomal SNPs, 7 988 SNPs on the non-PAR portions of chrX, 239 SNPs on chrXY/PAR1, 15 SNPs on chrY and 0 SNPs on PAR2. The median SNP missingness was 0.00339 (interquartile range 0.00113–0.0105) and the median minor allele frequency was 0.2422 (interquartile range 0.1375–0.3646) with similar estimates in cases and controls. The average marker density over the genome was 1 SNP every 7069 bases (=3 077 088 087 bases/435 291 SNPs). The median intermarker distance was 2911 bases with interquartile range 966–7374 bases and a 99th percentile of 50.1 kb.
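The counts and the average marker density quoted above can be checked with a few lines of arithmetic. The dictionary keys are only labels chosen for this sketch; the numbers are taken from the paragraph.

```python
# SNP counts by chromosome class, as reported in the text.
snp_counts = {
    "autosomal": 427_049,
    "chrX non-PAR": 7_988,
    "chrXY/PAR1": 239,
    "chrY": 15,
    "PAR2": 0,
}
GENOME_BASES = 3_077_088_087  # genome length used in the text

total_snps = sum(snp_counts.values())
bases_per_snp = GENOME_BASES / total_snps

print(f"total SNPs: {total_snps}")
print(f"average spacing: one SNP every {bases_per_snp:.0f} bases")
```

The average spacing is much larger than the median intermarker distance reported in the same paragraph, which is what one expects when a small number of large gaps pull the mean upward.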

Single-marker association tests

We used the Cochran-Armitage trend test to test for association of the 435 291 SNPs in the GWAS data set with case/control status. The estimated λ69 was 1.046 (similar P-value minima and λs were obtained using logistic regression with 10 principal components and using a stratified Cochran–Mantel–Haenszel test based on identity-by-state clusters).43, 44 The minimum q-value was 0.28 (that is, if these tests were called significant, over the long term, a minimum false discovery rate of 28% would be incurred). As the prespecified q-value threshold was 0.10, no SNP reached genome-wide significance. The proportion of all SNPs without true effects (P0)54 was conservatively estimated to be P0=0.9999954, consistent with the presence of 2 SNPs with true effects in these GWAS data.
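Two of the summary quantities in this paragraph are simple to compute. The sketch below estimates λ as the median 1 d.f. χ2 statistic divided by the median of the null χ2 distribution (about 0.4549), and converts P0 into an expected number of SNPs with true effects. It assumes the test statistics can be recovered from two-sided P-values; the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_lambda(pvalues):
    """Inflation factor: observed median chi-square over the null median."""
    nd = NormalDist()
    # A two-sided P-value p corresponds to z = inv_cdf(p / 2), chi2 = z ** 2.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


def expected_true_effects(p0, n_snps):
    """Expected number of SNPs with true effects given the null proportion."""
    return (1 - p0) * n_snps


print(round(expected_true_effects(0.9999954, 435_291)))  # prints 2
```

A λ close to 1 indicates that the bulk of the test statistics follow the null distribution, so values such as 1.046 give little sign of systematic inflation.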

Figure 1a depicts the quantile–quantile plots40 for these analyses. The observed P-values do not strongly depart from the P-value distribution expected by chance. Figure 1b shows a plot of –log10(Ptrend) by genomic location.

Figure 1

Genome-wide association study (GWAS) results for major depressive disorder (MDD) in cases versus controls. (a) Quantile–quantile plots and λ estimates for the primary analysis using the Cochran–Armitage trend test and confirmatory analyses using logistic regressions and Cochran–Mantel–Haenszel stratified tests. The dashed lines show the expected 95% probability interval for ordered P-values, and the circles show the observed versus expected values for all SNPs. The λ values are the median χ2 from all association tests divided by the expected value under the null hypothesis of no association. If λ is large (for example, >1.2), there is evidence that the observed test statistics deviate from the expected. This could be due to true associations but is more likely due to a systematic bias (for example, population stratification effects). The λ values in (a) are not consistent with the presence of systematic biases in the results. (b) –log10(P) by genomic location for chr1–chr22 plus chrX.

Table 2 presents the findings for the top 25 SNPs. The quality control metrics—SNP missingness, agreement with HWE and similarity of the control MAFs to the HapMap EUR panel—for the top 25 SNPs are generally acceptable. Of the top 25, 4 associations are in the presynaptic cytomatrix protein PCLO. Table 3 depicts the top 25 multi-SNP clusters (that is, for an index SNP with association P<0.001, these clusters are additional SNPs within 250 kb of the index SNP with r2≥0.50). The full version of this table is included as a Supplementary File (‘Table 3_full.xls’). PCLO is present in the top 25 clusters along with two additional multi-SNP clusters in the top 200. Other notable SNP clusters occurred in GRM7 (rank 51), DGKH (rank 83, a candidate gene for bipolar disorder),70 DAOA (rank 124) and DRD2 (rank 226).
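The cluster definition used for Table 3 can be sketched as a greedy pass over SNPs ordered by P-value. This is an assumed procedure for illustration only: the text gives the thresholds (index P<0.001, 250 kb, r2 of at least 0.50, the last taken as an assumption about a symbol lost from the text) but not the algorithm, and the `cluster_snps` name and the `r2` lookup function are hypothetical.

```python
def cluster_snps(snps, r2, p_index=0.001, window=250_000, r2_min=0.5):
    """Group SNPs around index SNPs.

    snps: list of (snp_id, chromosome, position, p_value) tuples.
    r2:   function (snp_id_a, snp_id_b) -> pairwise LD estimate.
    Returns {index_snp_id: [member_snp_ids]}.
    """
    clusters = {}
    assigned = set()
    # Visit SNPs from the smallest P-value upward.
    for snp_id, chrom, pos, p in sorted(snps, key=lambda s: s[3]):
        if p >= p_index or snp_id in assigned:
            continue
        members = [
            other for other, chrom2, pos2, _ in snps
            if other != snp_id
            and other not in assigned
            and chrom2 == chrom
            and abs(pos2 - pos) <= window
            and r2(snp_id, other) >= r2_min
        ]
        clusters[snp_id] = members
        assigned.add(snp_id)
        assigned.update(members)
    return clusters
```

Each SNP is assigned to at most one cluster, so a strong signal absorbs its correlated neighbours before weaker index SNPs are considered.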

Table 2: Information on the SNPs with the smallest association P-values in the GWAS
Table 3: Clustering of SNPs with low P-values

Focusing on piccolo

Although no association met genome-wide significance, there were clusters of SNPs in PCLO (Figure 2). Notably, 11 of the 200 smallest P-values localized to a 167 kb segment overlapping PCLO. Interest in PCLO was increased given its expression in brain, localization to the presynaptic active zone71 and involvement in monoamine neurotransmission, a venerable hypothesis of the etiology of MDD.72 Moreover, the third most significant SNP (rs2522833) codes for a nonsynonymous amino-acid change (ala-4814-ser) in PCLO near its C2A calcium binding domain.73

Figure 2

Plot of the piccolo (PCLO) region (NCBI build 35, UCSC hg17, chr7:82 000 000–82 500 000). P-values in this figure are all from SNPMstat. The x axis is chromosomal position, the left y axis is –log10(P) for genotyped SNPs (colored diamonds) and imputed SNPs (grey diamonds), and the right y axis is the recombination rate from the HapMap EUR panel (light blue curve). The color of the genotyped single nucleotide polymorphisms (SNPs) corresponds to LD with the SNP with smallest P-value (rs2715148): red 0.8≤r2≤1.0, orange 0.5≤r2<0.8, yellow 0.2≤r2<0.5 and white r2<0.2. The significance and extent of all three-SNP haplotypes with P<0.0001 in this region are colored light green. The transcripts for two PCLO isoforms are shown in dark green at the bottom. Graph adapted from an R function by the Broad DGI group.

We investigated possible causes of spurious associations in the PCLO region (chr7:82 032 093–82 436 848). First, these findings were not due to plate effects as inspection of plate-specific association data for these SNPs did not show any marked outliers or systematic biases. Second, review of allelic intensity cluster plots on which genotype calls were based revealed adequate performance of the Perlegen genotype calling algorithm. Third, inspection of additional quality control metrics did not suggest systematic problems with SNPs in this region. Fourth, inspection of LD matrices excluded very high LD as the sole explanation for the results (Supplementary Figure 10), and none of the genotyped SNPs had strong LD (r2≥0.8) with rs2715148 (the SNP with the smallest P-value in the PCLO region). Fifth, population stratification can cause false-positive findings but this did not appear to explain the PCLO association: (a) the same 11 SNPs had P-values among the top 200 associations in unadjusted analyses as well as with adjustment via principal components and stratified analyses; and (b) for the 57 SNPs in the PCLO region, the P-values across these three types of analyses were consistent (the Spearman's correlations between P-values from trend tests, logistic regression and stratified analyses were all >0.962). Sixth, the minor allele frequencies in the control group in the PCLO region were usually quite similar to available EUR control groups suggesting that the PCLO findings were not due to an artifact of the control selection process (see below). Finally, bioinformatic investigation did not suggest that this is a problematic region to genotype as the PCLO region is not known to be under positive selection in humans,74 to contain segmental duplications67 or common copy number variants (search of the Database of Genomic Variants yielded two rare copy number variations (CNVs) with control frequencies of 0.12 and 0.89%).75, 76, 77

We conducted additional analyses to attempt to localize the association depicted in Figure 2. Imputation56 supported the directly typed SNP associations but did not yield an association P-value markedly more significant than any directly genotyped SNP (although 22 of the 25 most significant imputed associations in the genome were in this region). Haplotype analysis using three-SNP sliding windows did not improve localization. Secondary analyses by sex, case ascertainment setting and recurrent early onset MDD (reoMDD, arguably the most heritable form of MDD)16, 78 suggested that most of the signals were from women and from subjects with reoMDD (Supplementary Table 11). The findings for reoMDD were often stronger than the primary analyses, particularly for the most significant SNP (rs2715148) where the P-value decreased by 1.2 orders of magnitude to 9.5 × 10−8.

PCLO replication

Although no finding met genome-wide significance, the presence of multiple possible signals in PCLO and the plausibility of a function for PCLO in the etiology of MDD led us to attempt replication in external samples. We assembled a collection of 11 972 independent subjects (6079 MDD cases and 5893 controls) from seven different groups and a total of six case–control replication samples (two German samples were combined; Supplementary Methods). As with NESDA cases, all replication cases were adults of European ancestry on whom a structured clinical interview was used to substantiate the lifetime diagnosis of DSM-IV MDD,1 and all studies excluded common MDD phenocopies (for example, depressive symptoms due to another psychiatric disorder or a general medical condition). As with NTR controls, all replication controls were adults of European ancestry ascertained from the population, and individuals reporting MDD symptoms were excluded. We estimated statistical power using Quanto57 (assumptions: log-additive genetic model, MDD lifetime risk 0.15, MAF=0.45 (similar to rs2522833), a genotypic relative risk of 1.14 (‘shrunk’ down from the observed GRR of 1.26 for rs2522833 to account for the ‘Winner's Curse’ phenomenon))79 and a conservative two-tailed type 1 error rate of 0.00167 (=0.05/30 replication SNPs). Statistical power was 97.2% for replication for the two SNPs genotyped in all samples (N=11 972) and 90.4% for the remaining SNPs (N=9278). Five replication samples were genotyped for 30 SNPs using the same Sequenom iPlex SNP pool (15 SNPs were in the primary GWAS and 15 were selected to tag common variation in Europeans)80 and one sample was successfully genotyped for two SNPs using TaqMan. The SNP selection strategy effectively cast a broad net over the region showing association in Figure 2. For the NESDA/NTR samples, agreement between the initial Perlegen genotypes in this region and independent re-genotyping was high (0.9987).

The single SNP results for MDD are depicted in Figure 3 and Table 4a. Our analytic plan dictated the combined analysis of all replication samples with the use of a one-tailed directional test. No association in the replication sample reached statistical significance after correction for multiple comparisons and SNP nonindependence due to LD (ninth column in Table 4a). Similarly, haplotype analyses did not reveal significantly associated regions (Supplementary Figure 16). There were four P-values <0.05 in the replication sample but only rs10954694 also had Z-scores of the same sign in both samples. Table 4b shows the results for reoMDD, and no single SNP was significant after correction for multiple comparisons. When we repeated the MDD analyses restricted to female subjects, the observed significance levels did not become markedly stronger in any of the replication samples in contrast to the initial NESDA/NTR sample. Thus, results from analyses of all replication samples did not reach the a priori criterion for replication evidence for the involvement of PCLO in the etiology of MDD.

Figure 3

Piccolo (PCLO) region replication results for major depressive disorder (MDD) showing genomic context and forest plots for the top 12 single nucleotide polymorphisms (SNPs) in the original sample. The backbone of the graph is the region of PCLO targeted for follow-up. SNP locations are given by the grey triangles. There are 12 forest plots for the SNPs with P<0.001 in the original sample. Each forest plot is for one SNP and shows the odds ratio (square) and 95% confidence intervals (horizontal line) for a particular sample with the area of the square proportional to sample size.

Table 4: PCLO replication results

Unanticipated heterogeneity in cases

However, we observed, a posteriori, that there was potentially important heterogeneity in the replication samples for eight SNPs that were strongly associated in the original sample (I2≥0.4, ninth column in Table 4a). In investigating this further (Supplementary Methods), we determined that there was little evidence for genetic heterogeneity in the genotyped region for controls but, unexpectedly, there was significant heterogeneity in the cases. Principal components analysis and inspection of Table 4a and the forest plots in Figure 3 indicated that the outlier was the Australian QIMR sample. Notably, the original and QIMR samples were particularly similar in that both studies included population-based cases and controls were selected to be at low liability for MDD based on longitudinal assessments. Of the nine SNPs with P<0.05 in the QIMR sample, eight had both low P-values and Z-scores with the same sign as in the NESDA/NTR sample. As an exploratory analysis, we analyzed the original and QIMR samples jointly, and the minimum P-value was 6.4 × 10−8 at the nonsynonymous SNP rs2522833 that gives rise to a serine to alanine substitution near the C2A calcium-binding domain of the PCLO protein.
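The I2 heterogeneity statistic referred to above is usually derived from Cochran's Q. The sketch below assumes per-sample effect estimates (for example, log odds ratios) with standard errors and inverse-variance weights; it illustrates the general statistic and is not necessarily the exact computation behind Table 4a. The function name is hypothetical.

```python
def i_squared(effects, std_errors):
    """I2: share of variation in effect estimates beyond chance.

    effects:    per-sample effect estimates (e.g. log odds ratios).
    std_errors: matching standard errors.
    Returns a value between 0 and 1.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        return 0.0
    return (q - df) / q
```

When one sample's estimate differs sharply from the others, Q grows well beyond its degrees of freedom and I2 approaches 1, which is the pattern that would flag a single outlying replication sample.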

Secondary analyses

We conducted additional analyses of the NESDA/NTR GWAS data set that were specified a priori but which should be considered exploratory.

(1) The network of proteins with which PCLO interacts in its function at the presynaptic cytoskeletal matrix is relatively well characterized, and we reasoned that genes encoding these proteins might harbor risk or protective variants. We assessed this hypothesis by testing for association conditioning on the PCLO nsSNP rs2522833 (that is, investigating whether controlling statistically for the effect of rs2522833 increases the salience of other SNP associations), assessing the minimum P-value per gene, and then comparing this list to a list of 54 genes that make proteins that interact with PCLO. This analysis did not reveal any SNPs or genes whose significance was markedly lower than without including rs2522833 in the logistic regression model. Moreover, no known PCLO interacting protein was notable on this list.

(2) We imputed genotypes for 2 037 829 autosomal SNPs using MACH with reference to HapMap CEU genotypes. The resulting λ was 1.048, and the minimum P-value was 1.21 × 10−7. As noted above, 22 of the 25 most significant imputed associations were in the PCLO region. Investigation of SNP clustering that accounted for LD yielded results similar to those shown in Table 3.
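
For readers unfamiliar with λ, one common definition of the genomic inflation factor is the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below assumes that definition and works from P-values alone; the function name `genomic_inflation` and the simulated inputs are illustrative assumptions, not the procedure used to obtain the value of 1.048 above.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Estimate lambda from two-sided P-values of 1-df association tests.

    Each P-value is converted back to a chi-square statistic through the
    normal quantile (chi2 = z**2), and the median is compared with the
    median expected when no SNP is associated.
    """
    norm = NormalDist()
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical input: evenly spaced P-values, as expected under the null.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
print(round(genomic_inflation(null_p), 3))
```

Values close to 1 indicate little systematic inflation of test statistics, for example from population stratification.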

(3) We assembled a list of 103 candidate genes that had been studied for association with MDD in the literature.81 A total of 19 of these genes had no SNPs within their transcripts and another 9 genes had inadequate coverage (<1 SNP per 15 kb; Supplementary Table 17). Of the remaining 75 genes, only neuronal nitric oxide synthase (NOS1, P=0.0006) had P<0.001. However, NOS1 (as with most genes in Supplementary Table 16) is quite large, so the number of SNPs tested within it may have influenced this result.

(4) We compared the GWAS association results to a meta-analysis of gene expression data from 12 studies of postmortem brain tissue in MDD cases compared with controls (10 frontal cortex and 2 cerebellum studies). These data are available via the Stanley Foundation (http://www.stanleygenomic.org). There were five genes with GWAS P<0.05 (all had gene expression changes significant at P=0.0004–0.007). The genes were: SGCG (sarcoglycan), CALD1 (caldesmon 1), EEF1A1 (eukaryotic translation elongation factor 1α1), CFLAR (CASP8 and FADD-like apoptosis regulator) and TP73L (tumor protein p73-like). There is no overlap of this list with the PCLO interactors or MDD candidate genes from the literature.

(5) Alternative models, filters and phenotypes: (i) For reoMDD, the minimum P-value over all GWAS SNPs was at the PCLO region SNP rs2715148 (8.4 × 10−8) which ranked second of all SNPs using the trend test (Table 2). (ii) rs2715148 also had the smallest P-value under a dominant model of SNP action (6.2 × 10−6). (iii) Given the female predominance in MDD, we analyzed data from women and men separately. For female cases and controls, rs2715148 had the smallest P-value (4.0 × 10−7) and multiple other PCLO SNPs had P-values in the 10−5–10−6 range. For men, most PCLO SNPs had P>0.05 and the minimum was in the SLC9A9 SNP rs4839627 (9.1 × 10−7). (iv) Again, given sex differences in MDD prevalence, we investigated SNPs on chrX and chrY more closely. The minimum P-value in chrX pseudoautosomal region 1 was 0.02. For the non-PAR regions of chrX in women, the SNPs with the smallest P-values were rs11094388 (P=0.0003, intergenic), rs5971108 (P=0.0003, PTCHD1), rs5930667 (P=0.0004, intergenic), rs4618863 (P=0.0005, intergenic), rs2207796 (P=0.0005, in the very large gene DMD) and rs5936428 (P=0.0009, FMR2). For men, the minimum P-value on chrX was at rs10521594 (P=5.4 × 10−5, intergenic) and the minimum P-value on chrY was 0.22.

Discussion

Overview

MDD is a common complex trait of enormous public health significance. As part of the GAIN initiative of the US Foundation for the NIH,19 we conducted a GWAS of 435 291 SNPs genotyped in 1738 MDD cases and 1802 controls selected to be at low liability for MDD. Our study had numerous positive attributes including its historically large sample size, its largely population-based and longitudinal design, and relatively unbiased and dense genome-wide genotyping designed to capture common variation in subjects of European ancestry.

According to our primary analysis plan, no SNP–MDD phenotype association reached genome-wide significance as the minimum q-value was 0.28, greater than the pre-defined q-value threshold of 0.10. This result was not unexpected. For example, type 2 diabetes mellitus has arguably reaped the greatest harvest from GWAS82 and yet two of the initial T2DM GWAS were unremarkable when analyzed independently.83, 84 One of the key lessons of the GWAS era is the importance of meta-analysis where its application to the primary GWAS can uncover positive findings that replicate well across studies.18, 85
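
The q-value criterion can be illustrated with the Benjamini–Hochberg step-up adjustment, a simpler relative of the q-value estimators cited in this paper (it is equivalent to assuming that every tested SNP is null). The sketch below is an illustration under that assumption, not the authors' procedure; the function name `bh_q_values` and the example P-values are hypothetical.

```python
def bh_q_values(p_values):
    """Return Benjamini-Hochberg adjusted P-values, in the input order.

    For the SNP with the k-th smallest P-value among m tests, the adjusted
    value is the minimum of p * m / rank over all ranks >= k, which keeps
    the adjusted values monotone in the raw P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the running minimum is available.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical P-values for a handful of SNPs.
example_p = [0.01, 0.04, 0.03, 0.005]
THRESHOLD = 0.10  # the pre-defined q-value threshold named in the text
for p, q in zip(example_p, bh_q_values(example_p)):
    print(p, round(q, 3), q < THRESHOLD)
```

With hundreds of thousands of SNPs, even a P-value near 10−6 can correspond to an adjusted value well above 0.10, which is why a small minimum P-value does not by itself imply genome-wide significance.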

Is PCLO a causal risk factor for MDD?

Although no locus exceeded the genome-wide threshold after correction for multiple comparisons, 11 of the top 200 signals localized to a 167 kb region overlapping the gene PCLO. The protein product of PCLO localizes to the presynaptic active zone and is important in brain monoaminergic neurotransmission,86 clearly intersecting with a venerable hypothesis of the etiology of mood disorders.87 Moreover, the third most significant association was a common nonsynonymous SNP near its critical C2A binding domain in PCLO.88, 89 Although it is an obvious candidate gene, we are not aware of any prior association studies of PCLO and mood disorders (PCLO is in a region of 7q implicated by linkage in autism and one autism association study has been published).90

We judged the intersection of this GWAS result with prior knowledge sufficient to trigger a large-scale replication effort by genotyping PCLO SNPs in 6079 independent MDD cases and 5893 controls. Statistical power to replicate exceeded 90% even after accounting for the ‘Winner's Curse’ phenomenon79 (a form of regression to the mean whereby the true genotypic relative risk is overestimated in the initial study).91, 92 However, in spite of the apparent a priori strength of a hypothesis of genetic variation in PCLO in the etiology of MDD, no SNP analyzed in the replication sample met appropriately rigorous criteria for replication.21 Therefore, unlike GWAS for many nonpsychiatric biomedical disorders, our GWAS and replication efforts did not yield ‘proof beyond a reasonable doubt’ level of evidence for an association between genetic variation in PCLO and MDD.

Investigation of the sources of heterogeneity in the replication samples indicated that controls were genetically similar to the original sample in the PCLO region but that cases were dissimilar. We observed, a posteriori, that both principal components derived from PCLO region genotypes in QIMR cases and effect size estimates in the QIMR replication sample tended to be similar to the original sample. This is notable because, of all the replication samples, ascertainment of QIMR subjects was most similar to the primary NESDA/NTR sample in that cases were identified from population-based sources (100% for QIMR and 60% for NESDA) rather than tertiary sources as for the other replication samples. MDD cases from clinical samples may differ from population-based cases due to selection bias,93 Berkson's bias,94, 95 differing referral filters96 or even a different genetic basis97 with respect to genetic variation in the PCLO region.

Joint analysis of the NESDA/NTR and QIMR samples yielded P=6.4 × 10−8 (uncorrected for multiple hypothesis testing) for the nonsynonymous SNP rs2522833. This result suggests a specific hypothesis for future studies: an association between genetic variation in PCLO and MDD may be detected only in population-based cases. Thus, it would be premature to exclude PCLO from a function in the etiology of some forms of MDD.

The heterogeneous nature of MDD

Interpretation of the PCLO replication efforts is consistent with two broad possibilities. The first possibility is that genetic variation in PCLO is truly not associated with MDD. This interpretation is supported by the replication analyses (specified a priori) in which no SNP was significantly associated after correction for multiple comparisons and SNP dependence due to LD. This strict interpretation is generally viewed as ‘best practice’ in human genetics21 but implicitly assumes etiological homogeneity for MDD in the PCLO region. The second possibility invokes a less parsimonious model involving heterogeneity, that genetic variation in PCLO is etiologically causal to some subtypes of MDD. This interpretation is an a posteriori hypothesis consistent with the empirical results particularly in the notable differences in associations between samples, case ascertainment strategies, and indications from principal components analysis that NESDA and QIMR cases are more similar than the clinically ascertained subjects.

It is notable that the control samples from each site were considerably more similar than cases from the same sites.

The tension between null a priori results and plausible a posteriori hypotheses is a core issue in psychiatric genetics. Important phenotypes like MDD are defined reliably and with reference to diagnostic schema developed principally for clinical purposes. Heterogeneous etiology of MDD is widely suspected but there are no proven ways to index heterogeneity (indeed, a prominent rationale for genetics studies is to improve differential diagnosis).

Our results are consistent with prior observations of the heterogeneous nature of MDD, particularly with regard to ascertainment. Individuals who meet MDD criteria from community or primary care sources may have a more inclusive and less comorbid form of MDD98 whereas tertiary ascertainment may yield subjects with greater comorbidity and perhaps distinctive etiology.99 In particular, it is formally possible (but unproven) that the PCLO results are accurate—genetic variation in PCLO might be causal to the types of MDD seen in community samples but other loci contribute to a distinctive type of MDD seen in tertiary care samples.

Other hypotheses

There were two MDD cases who may have had unrecognized genomic disorders100 (possible Turner's and Klinefelter's syndromes). We speculate that small numbers of cases with MDD will have CNV-related genomic disorders that are plausibly causal to MDD. Clarification of the function of such rare variants will require larger samples.

Most of the additional exploratory analyses were unrevealing, including examination of proteins known to interact with PCLO, genotype imputation, comparison of GWAS findings with MDD candidate genes from the literature and gene expression changes in the brain in cases with MDD, and alternative genetic models, phenotype definitions and sex-specific analyses.

We searched the Sullivan Lab Evidence Project (SLEP) compendium of psychiatric genetics findings101 in an attempt to discover overlap of our findings with those reported in the literature. First, with reference to a meta-analysis of microarray studies on the Stanley brain bank MDD and control samples, expression of CFLAR and MARCH3 was increased and expression of LST1 and HLA-B was decreased in MDD postmortem frontal cortex. These regions ranked 9, 232, 267 and 432 in the NESDA/NTR GWAS. Second, we looked for convergence of our findings with other GWAS of psychiatric disorders. Notable genomic locations of overlap of the top 480 regions in the present GWAS were found with GWAS for ADHD (ITIH1; S Faraone, personal communication), the Wellcome Trust Case-Control Consortium GWAS for bipolar disorder (SHFM1 and UGT2B4)102 and a bipolar GWAS that used DNA pooling (GRM7 and DGKH).70 Third, we looked at the minimum P-values in our study for genes that met or nearly achieved genome-wide significance in other GWAS: these were 0.004 for MAMDC1,103 0.03 for ZNF804A,104 0.002 for ANK3105 and 0.03 for CACNA1C.105 These overlaps are intriguing (although the possibility of chance cannot be excluded), and will be formally investigated as part of our participation in the Psychiatric GWAS Consortium analyses.18

Limitations

(1) Although statistical power has been systematically underestimated in psychiatric genetics, when we began this study in Q3 2006, it was believed that statistical power would be reasonable to detect realistic genetic effects. However, the definition of ‘realistic’ has shifted considerably since 2006 and it may be important to design studies that can detect genotypic relative risks <1.10. (2) When this study began, the coverage and performance of the Perlegen GWAS platform were among the better options available.19 The technology and pricing have evolved rapidly and superior platforms are now available. A key limitation of the Perlegen platform is its inability to assess CNV106 that may be particularly salient for psychiatric disorders.107, 108 More generally, the GWAS platform might not be sufficiently ‘genome-wide’ and unbiased: the platform may have had inadequate coverage in an etiologically important region of the genome, SNPs are only one type of genetic variation, and important non-SNP genetic variation might not have been sufficiently well captured. (3) There was an imbalance in the proportion of men in cases and controls. It is unclear whether and how this might have biased the results, but some degree of bias cannot be excluded. (4) Finally, GWASs are predicated upon the crucial assumption that the predominant diagnostic criteria are valid with respect to the fundamental architecture of the disorder.

Conclusions

We describe here a large effort to identify DNA sequence variation fundamental to MDD. Although our initial GWAS results for the PCLO region were intriguing, this highly plausible hypothesis did not find support in a large-scale replication attempt. Our hypothesis about a function of genetic variation in PCLO for MDD in population but not clinical settings emphasizes the importance of knowing the epidemiological sampling frame for a study. Finally, we hope that the model we used in this study—a cooperative international effort—will be adopted by groups studying other psychiatric disorders in order to maximize progress.

References

  1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 4th edn. American Psychiatric Association: Washington, DC, 1994.
  2. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Arch Gen Psychiatry 1994; 51: 8–19.
  3. The epidemiology of major depressive disorder: results from the National Comorbidity Survey Replication (NCS-R). JAMA 2003; 289: 3095–3105.
  4. The World Mental Health (WMH) Survey Initiative Version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). Int J Methods Psychiatr Res 2004; 13: 93–121.
  5. Sex differences in rates of depression: cross-national perspectives. J Affect Disord 1993; 29: 77–84.
  6. Outcome of depression in psychiatric settings. Br J Psychiatry 1994; 164: 297–304.
  7. The functioning and well-being of depressed patients: results from the Medical Outcomes Study. J Am Med Assoc 1989; 262: 914–919.
  8. Depression, disability days, and days lost from work in a prospective epidemiologic survey. J Am Med Assoc 1990; 264: 2524–2528.
  9. Socioeconomic burden of subsyndromal depressive symptoms and major depression in a sample of the general population. Am J Psychiatry 1996; 153: 1411–1417.
  10. Excess mortality in schizophrenia and affective disorders. Arch Gen Psychiatry 1978; 35: 1181–1185.
  11. Mortality in severe depression: a prospective study including 103 suicides. Acta Psychiatr Scand 1987; 76: 372–380.
  12. Is death from natural causes still excessive in psychiatric patients? J Nerv Ment Dis 1987; 175: 674–680.
  13. Mortality among psychiatric patients—the groups at risk. Acta Psychiatr Scand 1989; 79: 248–256.
  14. The economic burden of depression in 1990. J Clin Psychiatry 1993; 54: 405–418.
  15. Evidence-based health policy: lessons from the Global Burden of Disease Study. Science 1996; 274: 740–743.
  16. Genetic epidemiology of major depression: review and meta-analysis. Am J Psychiatry 2000; 157: 1552–1562.
  17. Guilt beyond a reasonable doubt. Nat Genet 2007; 39: 813–815.
  18. Psychiatric GWAS Consortium. A framework for interpreting genomewide association studies of psychiatric disorders. Mol Psychiatry (in press).
  19. New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nat Genet 2007; 39: 1045–1051.
  20. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 2007; 39: 1181–1186.
  21. Replicating genotype–phenotype associations. Nature 2007; 447: 655–660.
  22. The Netherlands Study of Depression and Anxiety (NESDA): rationales, objectives and methods. Int J Methods Psychiatr Res 2008; 17: 121–140.
  23. Netherlands Twin Register: from twins to twin families. Twin Res Hum Genet 2006; 9: 849–857.
  24. Genome-wide association of major depression: Description of samples for the GAIN major depressive disorder study: NTR and NESDA Biobank Projects. Eur J Hum Genet 2008; 16: 335–342.
  25. The Netherlands Mental Health Survey and Incidence Study (NEMESIS): objectives and design. Soc Psychiatry Psychiatr Epidemiol 1998; 33: 581–586.
  26. Gender differences in the relation between social support, problems in parent–offspring communication, and depression and anxiety. Soc Sci Med 2005; 60: 2549–2559.
  27. World Health Organization. Composite International Diagnostic Interview (CIDI), Version 2.1. World Health Organization: Geneva, Switzerland, 1997.
  28. Netherlands twin family study of anxious depression (NETSAD). Twin Res 2000; 3: 323–334.
  29. Screening for serious mental illness in the general population. Arch Gen Psychiatry 2003; 60: 184–189.
  30. The Inventory of Depressive Symptomatology (IDS): psychometric properties. Psychol Med 1996; 26: 477–486.
  31. A second generation human haplotype map of over 3.1 million SNPs. Nature 2007; 449: 851–861.
  32. Whole-genome patterns of common DNA variation in three human populations. Science 2005; 307: 1072–1079.
  33. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 2004; 74: 106–120.
  34. A haplotype map of the human genome. Nature 2005; 437: 1299–1320.
  35. Novel genes identified in a high-density genome wide association study for nicotine dependence. Hum Mol Genet 2007; 16: 24–35.
  36. Genomewide association for schizophrenia in the CATIE study: results of Stage 1. Mol Psychiatry 2008; 13: 570–584.
  37. TAMAL: An integrated approach to choosing SNPs for genetic studies of human complex traits. Bioinformatics 2006; 22: 626–627.
  38. Rational inferences about departures from Hardy–Weinberg equilibrium. Am J Hum Genet 2005; 76: 967–986.
  39. A note on exact tests of Hardy–Weinberg equilibrium. Am J Hum Genet 2005; 76: 887–893.
  40. A tutorial on statistical methods for population association studies. Nat Rev Genet 2006; 7: 781–791.
  41. From genotypes to genes: doubling the sample size. Biometrics 1997; 53: 1253–1261.
  42. Statistical Methods for Research Workers, 11th edn. Oliver and Boyd: London, 1950.
  43. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet 2007; 81: 559–575.
  44. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006; 38: 904–909.
  45. The positive false discovery rate: a Bayesian interpretation and the q-value. Ann Stat 2003; 31: 2013–2035.
  46. Statistical significance for genomewide studies. Proc Natl Acad Sci USA 2003; 100: 9440–9445.
  47. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc (Ser B) 1995; 57: 289–300.
  48. Methods of correcting for multiple testing: operating characteristics. Stat Med 1997; 16: 2511–2528.
  49. Controlling the proportion of false positives in multiple dependent tests. Genetics 2004; 166: 611–619.
  50. A framework for controlling false discovery rates and minimizing the amount of genotyping in the search for disease mutations. Hum Hered 2003; 56: 188–199.
  51. Estimation of false discovery rates in multiple testing: application to gene microarray data. Biometrics 2003; 59: 1071–1081.
  52. Controlling false discoveries in candidate gene studies. Mol Psychiatry 2005; 10: 230–231.
  53. False discovery rate in linkage and association genome screens for complex disorders. Genetics 2003; 164: 829–833.
  54. Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann Stat 2006; 34: 373–393.
  55. False discoveries and models for gene discovery. Trends Genet 2003; 19: 537–542.
  56. Simple and efficient analysis of disease association with missing genotype data. Am J Hum Genet 2008; 82: 444–452.
  57. Sample size requirements for association studies of gene-gene interaction. Am J Epidemiol 2002; 155: 478–484.
  58. Sample size requirements for matched case–control studies of gene-environment interaction. Stat Med 2002; 21: 35–50.
  59. SAS Institute Inc. SAS/STAT® Software: Version 9. SAS Institute Inc.: Cary, NC, 2004.
  60. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2007.
  61. Maximum likelihood estimation of haplotype effects and haplotype–environment interactions in association studies. Genet Epidemiol 2005; 29: 299–312.
  62. Efficient semiparametric estimation of haplotype–disease associations in case–cohort and nested case–control studies. Biostatistics 2006; 7: 486–502.
  63. Detecting haplotype effects in genomewide association studies. Genet Epidemiol 2007; 31: 803–812.
  64. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005; 21: 263–265.
  65. SAS Institute Inc. JMP User's Guide (Version 6). SAS Institute Inc.: Cary, NC, 2005.
  66. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2006; 34(Database issue): D173–D180.
  67. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res 2006; 34(Database issue): D590–D598.
  68. The pseudoautosomal regions, SHOX and disease. Curr Opin Genet Dev 2006; 16: 233–239.
  69. Genomic control for association studies. Biometrics 1999; 55: 997–1004.
  70. A genome-wide association study implicates diacylglycerol kinase eta (DGKH) and several other genes in the etiology of bipolar disorder. Mol Psychiatry 2007; 13: 197–207.
  71. The presynaptic particle web: ultrastructure, composition, dissolution, and reconstitution. Neuron 2001; 32: 63–77.
  72. The catecholamine hypothesis of affective disorders: a review of supporting evidence. Am J Psychiatry 1965; 122: 509–522.
  73. Aczonin, a 550-kD putative scaffolding protein of presynaptic active zones, shares homology regions with Rim and Bassoon and binds profilin. J Cell Biol 1999; 147: 151–162.
  74. Genome-wide detection and characterization of positive selection in human populations. Nature 2007; 449: 913–918.
  75. Detection of large-scale variation in the human genome. Nat Genet 2004; 36: 949–951.
  76. Copy-number variation in control population cohorts. Hum Mol Genet 2007; 16(Spec No. 2): R168–R173.
  77. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 2007; 17: 1665–1674.
  78. Genetics of recurrent early-onset depression (GenRED): design and preliminary clinical characteristics of a repository sample for genetic linkage studies. Am J Med Genet B Neuropsychiatr Genet 2003; 119: 118–130.
  79. Reduction of selection bias in genomewide genetic studies by resampling. Genet Epidemiol 2005; 28: 352–367.
  80. Efficiency and power in genetic association studies. Nat Genet 2005; 37: 1217–1223.
  81. Meta-analyses of genetic studies on major depressive disorder. Mol Psychiatry 2007; 13: 772–785.
  82. Genome-wide association studies provide new insights into type 2 diabetes aetiology. Nat Rev Genet 2007; 8: 657–662.
  83. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 2007; 316: 1341–1345.
  84. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 2007; 316: 1331–1336.
  85. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 2008; 40: 638–645.
  86. Neurotransmitter release. Handb Exp Pharmacol 2008; 184: 1–21.
  87. The catecholamine hypothesis of affective disorders: a review of the supporting evidence. Am J Psychiatry 1965; 122: 509–522.
  88. A conformational switch in the Piccolo C2A domain regulated by alternative splicing. Nat Struct Mol Biol 2004; 11: 45–53.
  89. An unusual C(2)-domain in the active-zone protein piccolo: implications for Ca(2+) regulation of neurotransmitter release. EMBO J 2001; 20: 1605–1619.
  90. No association between single nucleotide polymorphisms in DLX6 and Piccolo genes at 7q21-q22 and autism. Am J Med Genet B Neuropsychiatr Genet 2003; 119B: 98–101.
  91. Overcoming the winner's curse: estimating penetrance parameters from case–control data. Am J Hum Genet 2007; 80: 605–615.
  92. Estimating odds ratios in genome scans: an approximate conditional likelihood approach. Am J Hum Genet 2008; 82: 1064–1074.
  93. Selection bias in studies of major depression using clinical subjects. J Clin Epidemiol 2000; 53: 351–357.
  94. Psychiatric comorbidity and treatment seeking. Sources of selection bias in the study of clinical populations. J Nerv Ment Dis 1993; 181: 467–474.
  95. Limitations of the application of fourfold table analysis to hospital data. Biometrics Bull 1946; 2: 47–53.
  96. Effects of exclusion criteria in depression treatment studies. J Affect Disord 1994; 32: 21–26.
  97. Family history of depression in clinic and community samples. J Affect Disord 1996; 40: 159–168.
  98. The lifetime history of major depression in women: reliability of diagnosis and heritability. Arch Gen Psychiatry 1993; 50: 863–870.
  99. A hospital-based twin register of the heritability of DSM-IV unipolar depression. Arch Gen Psychiatry 1996; 53: 129–136.
  100. Implications of human genome architecture for rearrangement-based disorders: the genomic basis of disease. Hum Mol Genet 2004; 13(Spec No. 1): R57–R64.
  101. A searchable database of genetic evidence for psychiatric disorders. Am J Med Genet (Neuropsychiatr Genet) 2008; 147: 671–675.
  102. WTCCC. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007; 447: 661–678.
  103. Genomewide association analysis followed by a replication study implicates a novel candidate gene for neuroticism. Arch Gen Psychiatry 2008; 65: 1062–1071.
  104. Identification of novel schizophrenia loci by genome-wide association and follow-up. Nat Genet 2008, Jul 30 e-pub ahead of print.
  105. Collaborative genome-wide association analysis of 10,596 individuals supports a role for Ankyrin-G (ANK3) and the alpha-1C subunit of the L-type voltage-gated calcium channel (CACNA1C) in bipolar disorder. Nat Genet 2008, Aug 17 e-pub ahead of print.
  106. Large-scale copy number polymorphism in the human genome. Science 2004; 305: 525–528.
  107. Association between microdeletion and microduplication at 16p11.2 and autism. New Engl J Med 2008; 358: 667–675.
  108. Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 2008; 320: 539–543.

Acknowledgements

We acknowledge support from NWO: genetic basis of anxiety and depression (904-61-090); resolving cause and effect in the association between exercise and well-being (904-61-193); twin-family database for behavior genomic studies (480-04-004); twin research focusing on behavior (400-05-717), Center for Medical Systems Biology (NWO Genomics); Spinozapremie (SPI 56-464-14192); Centre for Neurogenomics and Cognitive Research (CNCR-VU); genome-wide analyses of European twin and population cohorts (EU/QLRT-2001-01254); genome scan for neuroticism (NIMH R01 MH059160); Geestkracht program of ZonMW (10-000-1002); matching funds from universities and mental health care institutes involved in NESDA (GGZ Buitenamstel-Geestgronden, Rivierduinen, University Medical Center Groningen, GGZ Lentis, GGZ Friesland, GGZ Drenthe). Genotyping was funded by the Genetic Association Information Network (GAIN) of the Foundation for the US National Institutes of Health, and analysis was supported by grants from GAIN and the NIMH (MH081802). Genotype data were obtained from dbGaP (http://www.ncbi.nlm.nih.gov/dbgap, accession number phs000020.v1.p1). Statistical analyses were carried out on the Genetic Cluster Computer (http://www.geneticcluster.org) which is financially supported by the NWO (480-05-003). Dr Sullivan was also supported by R01s MH074027 and MH077139. Dr Schosser was supported by an Austrian Science Fund Erwin-Schrödinger-Fellowship. We express our thanks to: the GAIN Genotyping group (Dr Gonçalo Abecasis, chair) for help with quality control; Dr Gonçalo Abecasis and Dr Jun Li for assistance with MACH; Dr Shaun Purcell for PLINK; Troy Dumenil (QIMR) for expert assistance with the replication genotyping; Dr Dina Ruano (Portuguese Foundation for Science and Technology, SFRH/BPD/28725/2006); and Dr Pam Madden (DA012854) and Dr Richard Todd (AA013320) for supplying some of the phenotypes used in the Australian sample. 
Replication genotyping of the STAR*D samples was supported by a grant from the Bowman Family Foundation and the Sidney R Baer, Jr Foundation. We gratefully acknowledge NARSAD for funding the PCLO follow-up genotyping.

Author information

Author notes

    • D I Boomsma & B W J H Penninx

    These authors contributed equally to this work.

Affiliations

  1. Department of Genetics, University of North Carolina, Chapel Hill, NC, USA

    • P F Sullivan, Q He, Y Hu & D Lin
  2. VU University Amsterdam, Amsterdam, The Netherlands

    • E J C de Geus, G Willemsen, J J Hottenga, D Posthuma, A B Smit, M Verhage & D I Boomsma
  3. Queensland Institute for Medical Research, Brisbane, QLD, Australia

    • M R James, S D Gordon, G W Montgomery, N G Martin & N R Wray
  4. VU University Medical Center Amsterdam, Amsterdam, The Netherlands

    • J H Smit, T Zandbelt, P Heutink, W J Hoogendijk, P Rizzu, R van Dyck & B W J H Penninx
  5. University of Münster, Münster, Germany

    • V Arolt & K Domschke
  6. James Cook University, Cairns, QLD, Australia

    • B T Baune
  7. University of Edinburgh, Edinburgh, UK

    • D Blackwood, K A McGhee & W J Muir
  8. University of Bonn, Bonn, Germany

    • S Cichon, W Maier & M M Nöthen
  9. University of New England, Armidale, NSW, Australia

    • W L Coventry
  10. Institute of Psychiatry, London, UK

    • A Farmer, P McGuffin, K Pirlo & A Schosser
  11. Harvard Medical School, Cambridge, MA, USA

    • M Fava, R H Perlis & J W Smoller
  12. Washington University, St. Louis, MO, USA

    • A C Heath
  13. Max-Planck Institute of Psychiatry, Munich, Germany

    • F Holsboer, M Kohli & S Lucae
  14. Royal Edinburgh Hospital, Edinburgh, UK

    • D J MacIntyre
  15. University Medical Center Groningen, Groningen, The Netherlands

    • W A Nolen
  16. University of Heidelberg, Heidelberg, Germany

    • M Rietschel
  17. North Carolina State University, Raleigh, NC, USA

    • J-Y Tzeng
  18. Leiden University Medical Center, Leiden, The Netherlands

    • F G Zitman

Authors

P F Sullivan, E J C de Geus, G Willemsen, M R James, J H Smit, T Zandbelt, V Arolt, B T Baune, D Blackwood, S Cichon, W L Coventry, K Domschke, A Farmer, M Fava, S D Gordon, Q He, A C Heath, P Heutink, F Holsboer, W J Hoogendijk, J J Hottenga, Y Hu, M Kohli, D Lin, S Lucae, D J MacIntyre, W Maier, K A McGhee, P McGuffin, G W Montgomery, W J Muir, W A Nolen, M M Nöthen, R H Perlis, K Pirlo, D Posthuma, M Rietschel, P Rizzu, A Schosser, A B Smit, J W Smoller, J-Y Tzeng, R van Dyck, M Verhage, F G Zitman, N G Martin, N R Wray, D I Boomsma and B W J H Penninx

Corresponding author

Correspondence to P F Sullivan.

About this article

DOI

https://doi.org/10.1038/mp.2008.125

Conflict of interest/disclosure (past 3 years)

Dr Baune has received honoraria for educational training of psychiatrists and general practitioners from Lundbeck, AstraZeneca and Pfizer Pharmaceuticals and travel grants from AstraZeneca, Bristol-Myers Squibb, Janssen and Pfizer Pharmaceuticals. Dr Fava has received: research support from Abbott Laboratories, Alkermes, Aspect Medical Systems, AstraZeneca, Bristol-Myers Squibb Company, Cephalon, Eli Lilly & Company, Forest Pharmaceuticals Inc., GlaxoSmithKline, J&J Pharmaceuticals, Lichtwer Pharma GmbH, Lorex Pharmaceuticals, Novartis, Organon Inc., PamLab, LLC, Pfizer Inc., Pharmavite, Roche, Sanofi-Aventis, Solvay Pharmaceuticals Inc., Synthelabo, Wyeth-Ayerst Laboratories; advisory/consulting fees from Abbott Laboratories, Amarin, Aspect Medical Systems, AstraZeneca, Auspex Pharmaceuticals, Bayer AG, Best Practice Project Management Inc., Biovail Pharmaceuticals Inc., BrainCells Inc., Bristol-Myers Squibb Company, Cephalon, CNS Response, Compellis, Cypress Pharmaceuticals, Dov Pharmaceuticals, Eli Lilly & Company, EPIX Pharmaceuticals, Fabre-Kramer Pharmaceuticals Inc., Forest Pharmaceuticals Inc., GlaxoSmithKline, Grunenthal GmbH, Janssen Pharmaceutica, Jazz Pharmaceuticals, J&J Pharmaceuticals, Knoll Pharmaceutical Company, Lorex Pharmaceuticals, Lundbeck, MedAvante Inc., Merck, Neuronetics, Novartis, Nutrition 21, Organon Inc., PamLab, LLC, Pfizer Inc., PharmaStar, Pharmavite, Precision Human Biolaboratory, Roche, Sanofi-Aventis, Sepracor, Solvay Pharmaceuticals Inc., Somaxon, Somerset Pharmaceuticals, Synthelabo, Takeda, Tetragenex, Transcept Pharmaceuticals, Vanda Pharmaceuticals Inc., Wyeth-Ayerst Laboratories; speaking fees from AstraZeneca, Boehringer-Ingelheim, Bristol-Myers Squibb Company, Cephalon, Eli Lilly & Company, Forest Pharmaceuticals Inc., GlaxoSmithKline, Novartis, Organon Inc., Pfizer Inc., PharmaStar, Primedia, Reed-Elsevier, Wyeth-Ayerst Laboratories; has equity holdings in Compellis, MedAvante; and has royalty/patent, other income for patent applications for SPCD and for a combination of azapirones and bupropion in MDD, copyright royalties for the MGH CPFQ, DESS and SAFER. Dr Nolen has received: speaking fees from AstraZeneca, Eli Lilly, Pfizer, Servier, Wyeth; unrestricted research funding from AstraZeneca, Eli Lilly, GlaxoSmithKline, Wyeth; and served on advisory boards for AstraZeneca, Cyberonics, Eli Lilly, GlaxoSmithKline, Pfizer, Servier. Dr Perlis has received consulting fees or honoraria from AstraZeneca, Bristol-Myers Squibb, Eli Lilly, GlaxoSmithKline, Pfizer and Proteus; he is a stockholder in Concordant Rater Systems, LLC, and the holder of a patent related to the monitoring of raters in clinical trials. Dr Smoller has consulted to Eli Lilly, received honoraria from Hoffmann-La Roche Inc., Enterprise Analysis Corp. and MPM Capital, and has served on an advisory board for Roche Diagnostics Corporation. Dr Sullivan has received unrestricted research support from Eli Lilly.

Supplementary Information accompanies the paper on the Molecular Psychiatry website (http://www.nature.com/mp)

Further reading