Large, rare copy number variants (CNVs) have been implicated in a variety of psychiatric disorders, but the role of CNVs in recurrent depression is unclear. We performed a genome-wide analysis of large, rare CNVs in 3106 cases of recurrent depression, 459 controls screened for lifetime absence of psychiatric disorder and 5619 unscreened controls from phase 2 of the Wellcome Trust Case Control Consortium (WTCCC2). We compared the frequency of cases with CNVs against the frequency observed in each control group, analysing CNVs over the whole genome and over genic, intergenic, intronic and exonic regions. We found that deletion CNVs were associated with recurrent depression, whereas duplications were not. The effect was significant when comparing cases with WTCCC2 controls (P=7.7 × 10⁻⁶, odds ratio (OR)=1.25 (95% confidence interval (CI) 1.13–1.37)) and with screened controls (P=5.6 × 10⁻⁴, OR=1.52 (95% CI 1.20–1.93)). Further analysis showed that CNVs deleting protein coding regions were largely responsible for the association. Within an analysis of regions previously implicated in schizophrenia, we found an overall enrichment of CNVs in our cases when compared with screened controls (P=0.019). We observe an ordered increase of samples with deletion CNVs, with the lowest proportion seen in screened controls, the next highest in unscreened controls and the highest in cases. This may suggest that the absence of deletion CNVs, especially in genes, is associated with resilience to recurrent depression.
Recurrent depressive disorder is a common psychiatric disorder associated with high morbidity, high economic burden and high rates of completed suicide.1, 2, 3 Depressive disorder is heritable, with recurrent and severe types being substantially so,4, 5 but the specific genes involved are not known. As recurrent depression is associated with increased mortality at a young age,3 it is logical to expect that some of the genetic contribution to the disorder may be explained by rare genomic variants operating under negative selection pressure.6
Genome-wide association studies of single nucleotide polymorphisms have produced inconsistent results in depression.7, 8 However, there is increasingly robust evidence that copy number variants (CNVs), defined as submicroscopic deletions or duplications of genomic DNA seen at a frequency of <1 in 100 in the general population,9 are associated with a range of psychiatric disorders.10, 11, 12, 13, 14, 15, 16 A genome-wide analysis of CNVs in recurrent depression has not yet been undertaken. We therefore analysed Illumina microarray data for evidence of copy number variation in a large cohort of cases of recurrent depression and compared this with two control samples: one consisting of participants screened for a lifetime absence of psychiatric disorder and another larger, unscreened population control cohort from phase 2 of the Wellcome Trust Case Control Consortium (WTCCC2).
Materials and methods
A total of 3106 cases (2188 female and 918 male) of depression were taken from two studies of recurrent depression and a pharmacogenetic study of antidepressant response. Details of these studies have been published elsewhere17, 18, 19 and further details are also provided in the Supplementary Information. All studies were approved by local ethics committees and informed written consent was obtained from all participants. All cases were ascertained by interview with the Schedules for Clinical Assessment in Neuropsychiatry20 and fulfilled ICD-10 or DSM-IV criteria for two or more episodes of depression of at least moderate severity. Subjects were excluded if there was a history or family history of schizophrenia or bipolar disorder or if mood symptoms were secondary to alcohol or substance misuse. Subjects with mood congruent psychotic symptoms were not excluded.
A total of 459 controls (281 female and 178 male) comprehensively screened for a lifetime absence of psychiatric disorder were recruited. Potential subjects for the screened control group were recruited from students and staff at King's College London by internal email advertisement and by local media advertisement. Subjects were interviewed with a modified version of the Past History Schedule21 and were included only if they demonstrated no evidence of past or present psychiatric disorder. Subjects were also assessed with the Beck Depression Inventory22 and excluded if they scored >10.
As an additional control set, we also used 5619 control samples (2611 female and 2668 male) from the WTCCC2, which is composed of the 1958 British birth cohort and the national blood service cohort. The 1958 British birth cohort is a sample of sequential live births in the UK during 1 week in 1958, while the national blood service cohort is drawn from subjects who have donated blood to the UK blood services collection (further details can be found in the Supplementary Information).
Our control samples are of exclusively UK ethnic origin. However, some of our case samples have non-UK, European heritage. Within our analyses we have attempted to account for this potential confounding factor by performing an additional analysis restricted to samples with exclusively UK origins.
All DNA samples were obtained from venous blood. We genotyped our case sample and screened control samples on the Illumina (San Diego, CA, USA) HumanHap 610-Quad Beadchip and all DNA samples were processed contemporaneously at the same laboratory with stringent quality control and full Laboratory Information Management Systems monitoring. Samples from the WTCCC2 cohort were genotyped on a modified Illumina 1M beadchip, which is technically very similar to the 610-Quad beadchip. We were granted access to the chip intensity data (Illumina ‘.idat’ files). All probe intensity data were normalised and processed using standard Illumina protocols with Illumina's GenomeStudio platform to obtain the log R ratio (LRR) and B allele frequency (BAF) at each marker. The LRR and BAF represent, for each marker in each sample, a summed probe intensity ratio derived from comparison with a canonical value calculated from all samples, and, in the case of bi-allelic probes, an allelic intensity ratio, respectively.
All our calls and analyses are based on build 18 of the human genome reference sequence. We processed LRR and BAF values for autosomal markers common to both Illumina arrays (n=562 680) using PennCNV (version released August 2009), a popular, open-source package designed for Illumina array data. PennCNV makes copy number calls using a hidden Markov model, a Viterbi algorithm and expectation maximisation, and takes account of the distance between consecutive markers.23
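The idea of decoding copy number states along a chromosome with a Viterbi algorithm whose transitions depend on marker spacing can be illustrated with a toy sketch. This is not PennCNV's implementation: it uses three states instead of PennCNV's richer state model, uses LRR only (no BAF), skips expectation maximisation, and every numeric parameter (`MEAN_LRR`, `SD_LRR`, `P_LEAVE_MAX`, `SCALE_BP`) is an assumed value chosen for illustration.

```python
import math

# Toy model: all names and numbers below are illustrative assumptions.
STATES = ("deletion", "normal", "duplication")
MEAN_LRR = {"deletion": -0.6, "normal": 0.0, "duplication": 0.4}
SD_LRR = 0.2          # assumed common s.d. of LRR within a state
P_LEAVE_MAX = 0.05    # assumed upper bound on the probability of leaving a state
SCALE_BP = 100_000    # assumed distance scale for the transition model


def log_emission(state, lrr):
    """Log density of an LRR value under a Gaussian centred on the state mean."""
    z = (lrr - MEAN_LRR[state]) / SD_LRR
    return -0.5 * z * z - math.log(SD_LRR * math.sqrt(2.0 * math.pi))


def log_transition(prev, cur, distance_bp):
    """Leaving a state becomes more likely as the gap between markers grows."""
    p_leave = P_LEAVE_MAX * (1.0 - math.exp(-distance_bp / SCALE_BP))
    p_leave = max(p_leave, 1e-12)  # avoid log(0) for coincident markers
    if prev == cur:
        return math.log(1.0 - p_leave)
    return math.log(p_leave / (len(STATES) - 1))


def viterbi(lrr, positions):
    """Return the most probable state path for LRR values at given positions."""
    if not lrr:
        return []
    prior = math.log(1.0 / len(STATES))
    score = {s: prior + log_emission(s, lrr[0]) for s in STATES}
    backpointers = []
    for i in range(1, len(lrr)):
        gap = positions[i] - positions[i - 1]
        new_score, pointer = {}, {}
        for cur in STATES:
            candidates = {p: score[p] + log_transition(p, cur, gap) for p in STATES}
            best_prev = max(candidates, key=candidates.get)
            new_score[cur] = candidates[best_prev] + log_emission(cur, lrr[i])
            pointer[cur] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path
```

With evenly spaced markers, a sustained run of low LRR values is decoded as a deletion, whereas a single low value is absorbed into the normal state, because two state changes cost more than one poorly fitting observation.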
Sample and CNV quality control
Samples with a genotype call rate of <98%, a BAF s.d. (BAFSD) of >0.045 or a LRR s.d. (LRRSD) of >0.3 were excluded from analysis. We excluded all CNVs of <100 kb and all CNVs made with <10 consecutive markers. We excluded samples with more than a total of 20 CNVs, as such samples are more likely to contain artefactual calls. We excluded calls that fell within 500 kb of the centromere or telomere of each chromosome and any calls falling within immunoglobulin regions. We excluded calls occurring in >1% of our control sample, and calls made over regions of the genome where the marker density was low (<1 marker per 200 000 bp (n=134)).
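The numeric thresholds above can be expressed as a small filtering sketch. The record types and field names (`Sample`, `CnvCall`, `control_freq`) are hypothetical, and the order of operations (counting CNVs per sample after the size and marker filters) is an assumption, since the text does not specify it. The positional filters (centromere, telomere, immunoglobulin and low-density regions) are omitted because they need external coordinate tables.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:  # hypothetical record layout
    sample_id: str
    call_rate: float  # genotype call rate as a fraction (0-1)
    baf_sd: float     # s.d. of B allele frequency
    lrr_sd: float     # s.d. of log R ratio


@dataclass
class CnvCall:  # hypothetical record layout
    sample_id: str
    start: int
    end: int
    n_markers: int
    control_freq: float  # fraction of control samples carrying this call


def sample_passes(s):
    """Sample-level thresholds quoted in the text."""
    return s.call_rate >= 0.98 and s.baf_sd <= 0.045 and s.lrr_sd <= 0.3


def call_passes(c):
    """Call-level thresholds: >=100 kb, >=10 markers, <=1% of controls."""
    return (c.end - c.start) >= 100_000 and c.n_markers >= 10 and c.control_freq <= 0.01


def apply_qc(samples, calls, max_calls_per_sample=20):
    """Return (ids of retained samples, retained calls)."""
    good_ids = {s.sample_id for s in samples if sample_passes(s)}
    kept = [c for c in calls if c.sample_id in good_ids and call_passes(c)]
    # Assumption: the 20-CNV limit is applied to calls that survive the filters above.
    counts = Counter(c.sample_id for c in kept)
    good_ids = {sid for sid in good_ids if counts[sid] <= max_calls_per_sample}
    kept = [c for c in kept if c.sample_id in good_ids]
    return good_ids, kept
```

Samples that pass quality control but carry no retained CNV stay in `good_ids`, which matters because the analysis compares the frequency of samples with and without CNVs.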
As an additional level of assurance that our results were not biased by poorly performing samples clustering in particular cohorts, we removed the worst performing 10% of samples as defined by the LRRSD and BAFSD across all cohorts and re-analysed our data. This tightened our thresholds for inclusion of a sample in this analysis to a LRRSD of 0.2241, a BAFSD of 0.0390 and a genotype call rate of >99%.
To test for statistical significance and estimate effect sizes, we calculated odds ratios (OR) and Pearson's χ² or Fisher's exact statistics for the frequency of samples with CNVs in cases and controls. We estimated the proportion of genetic risk for depression explained by our findings by performing a logistic regression analysis and calculating pseudo R². All statistical calculations were performed in STATA IC v10.1 (ref. 24) or PLINK v1.07 (ref. 25). Graphs were created using STATA IC v10.1 and Microsoft Excel (2011).
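As a rough illustration of these statistics, the sketch below computes an odds ratio, a confidence interval and a Pearson χ² test from a 2 × 2 table of carrier counts. The log-based (Woolf) interval is an assumption, as the text does not state which interval method was used, and the function names are hypothetical. The analyses themselves were run in STATA and PLINK, not with this code.

```python
import math


def odds_ratio_ci(case_cnv, case_none, ctrl_cnv, ctrl_none, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    Assumes all four cell counts are non-zero.
    """
    odds_ratio = (case_cnv * ctrl_none) / (case_none * ctrl_cnv)
    se_log_or = math.sqrt(1 / case_cnv + 1 / case_none + 1 / ctrl_cnv + 1 / ctrl_none)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def pearson_chi2(case_cnv, case_none, ctrl_cnv, ctrl_none):
    """Pearson chi-square statistic (1 d.f., no continuity correction) and P value."""
    a, b, c, d = case_cnv, case_none, ctrl_cnv, ctrl_none
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 d.f., P(X > chi2) equals the two-sided normal tail at sqrt(chi2).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only (not taken from the study tables).
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
chi2, p = pearson_chi2(20, 80, 10, 90)
```

When any expected cell count is small, Fisher's exact test is the usual alternative, which is presumably why both statistics are mentioned above.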
Further information on methods and analysis can be found in the Supplementary Information.
In all, 2723 cases of recurrent depression, 348 screened controls and 4828 unscreened controls passed quality control and were used in our main analysis. Overall, the proportion of samples with a CNV was increased in cases when compared with screened controls and WTCCC2 controls (P=0.019, OR=1.31 (95% confidence interval (CI) 1.04–1.64) and P=0.025, OR=1.12 (95% CI 1.01–1.23), respectively). After stratifying by type of CNV, deletions accounted for this difference in both comparisons (P=5.6 × 10⁻⁴, OR=1.52 (95% CI 1.20–1.93) and P=7.7 × 10⁻⁶, OR=1.25 (95% CI 1.13–1.37), respectively). Table 1 illustrates the frequency of samples with a CNV stratified into deletions and duplications, in screened controls, WTCCC2 controls, and cases. Figure 1 illustrates the proportion of samples with a CNV, stratified by type of CNV, across cohorts.
We then stratified our dataset into samples containing CNVs covering gene-coding (genic) regions and non-gene-coding (intergenic) regions, as defined by RefSeq gene annotation coordinates available from the UCSC genome browser (http://genome.ucsc.edu/). The frequency of samples with deletion CNVs in gene-coding regions was significantly increased in cases when compared with screened controls and WTCCC2 controls (P=4.2 × 10⁻⁵, OR=1.79 (95% CI 1.35–2.35) and P=7.2 × 10⁻⁶, OR=1.27 (95% CI 1.14–1.41), respectively). We further divided the samples with gene-coding deletion CNVs into those with deletions interrupting exons and those with only intronic deletions. We found an increased frequency of cases with an exonic deletion CNV when compared with screened controls and WTCCC2 controls (P=1.70 × 10⁻⁵, OR=1.87 (95% CI 1.40–2.49) and P=1.04 × 10⁻⁵, OR=1.27 (95% CI 1.14–1.41), respectively). Intronic deletions were not associated (P=1.00, OR=1.00 (95% CI 0.38–2.60) and P=0.58, OR=0.91 (95% CI 0.66–1.24), respectively); however, low numbers of samples with only intronic CNVs reduced our power to detect an effect. Table 2 details the results from this analysis and Figure 2 illustrates the frequency of samples with exonic and intronic deletions across cohorts. Supplementary Tables S1a and S1b detail all results for all regions, while Supplementary Figure S1a contrasts the results for deletion CNVs shown in Figure 2 with the results for duplication CNVs. Further details on absolute numbers of CNVs, number of CNVs per sample, and mean and median CNV size can be found in Supplementary Table S1k. Within an analysis for CNVs that occur only once in the dataset (singleton CNVs), we found a broadly similar enrichment within our cases when compared with the WTCCC2 control cohort (one-sided empirical P value=0.0054). Further details of this analysis can be found in the Supplementary Information.
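The genic/intergenic and exonic/intronic stratification amounts to an interval-overlap test against gene annotation. The sketch below shows one way this could be done for a single chromosome. The data layout (a list of `(tx_start, tx_end, exons)` tuples), the half-open coordinate convention and the function names are assumptions for illustration, not the procedure actually used.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def classify_cnv(start, end, genes):
    """Label one CNV as 'exonic', 'intronic' or 'intergenic'.

    genes: list of (tx_start, tx_end, [(exon_start, exon_end), ...]) tuples,
    all on the same chromosome as the CNV (hypothetical layout).
    """
    hits_gene = False
    for tx_start, tx_end, exons in genes:
        if not overlaps(start, end, tx_start, tx_end):
            continue
        hits_gene = True
        if any(overlaps(start, end, ex_start, ex_end) for ex_start, ex_end in exons):
            return "exonic"
    return "intronic" if hits_gene else "intergenic"


def classify_sample(cnvs, genes):
    """Sample-level label: exonic if any CNV interrupts an exon,
    'intronic only' if genic CNVs exist but none touches an exon."""
    labels = {classify_cnv(start, end, genes) for start, end in cnvs}
    if "exonic" in labels:
        return "exonic"
    if "intronic" in labels:
        return "intronic only"
    if "intergenic" in labels:
        return "intergenic only"
    return "no CNV"
```

A CNV is labelled intronic only when it lies within a gene and touches none of that gene's exons, which mirrors the distinction drawn above between deletions interrupting exons and purely intronic deletions.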
Analyses such as ours can be affected by differences in sample quality between groups, and also by population stratification. We attempted to account for this by performing analyses, post hoc, excluding the worst performing samples across our data set and restricted to a UK-only sample set. The association between deletion CNVs occurring across the whole genome, gene-coding regions and exons remains significant in both analyses. Full details of these analyses are contained in pages 6–15 of the Supplementary Information.
Given the differences indicated in Figures 1 and 2 between the screened control cohort and the WTCCC2 control cohort, we further analysed these two cohorts for significant differences, again stratifying by type of CNV made over (A) genic and intergenic areas and (B) exonic and intronic areas. There were no significant differences for the frequency of samples with CNVs throughout the genome between screened controls and WTCCC2 controls, although there was a trend for deletion CNVs (P=0.096, OR=1.22 (95% CI 0.97–1.54)). However, in the analysis stratified into genic/intergenic and exonic/intronic CNVs, significantly more deletion CNVs were seen in the WTCCC2 group than the screened control group in genic regions and exons (P=0.016, OR=1.40 (95% CI 1.06–1.84); P=0.0074, OR=1.47 (95% CI 1.11–1.95), respectively), but not in intergenic and intronic regions (P=0.79, OR=1.04 (95% CI 0.78–1.39) and P=1.00, OR=1.06 (95% CI 0.52–2.15), respectively). Again, low numbers of samples with only intronic CNVs reduced our power to detect an effect. No significant differences were seen in duplication CNV frequency. Results are presented in Supplementary Table S1g.
The WTCCC2 cohort comprises the 1958 British birth cohort and the national blood service cohort. As these represent samples drawn from different populations, we performed an additional analysis comparing the frequency of samples with CNVs between these two subsets. More detail on this, details on stratification of our results by gender, and other information can be found on pages 18–22 of the Supplementary Information.
CNVs in various regions of the genome have previously been associated with psychiatric disorders such as schizophrenia.26 We performed an analysis across regions previously implicated in schizophrenia and a more detailed analysis of regions 1q21.1, 15q13.3 and 22q11.2 (refs 13, 14, 16). We found that our cases were significantly enriched with CNVs in regions previously implicated in schizophrenia when compared with screened controls and when both deletions and duplications were considered together (P=0.019, OR=3.21 (95% CI 1.07–9.68)). However, no significant difference was observed when cases were compared with the WTCCC2 controls. We observed no significant differences in CNV frequency in the 1q21.1 region. We saw a nominally reduced frequency of cases with deletion or duplication CNVs in 15q13.3 compared with the national blood service subset of the WTCCC2 sample (P=0.044). We saw a nominally increased frequency of samples with deletion CNVs in 22q11.2 in our case sample when compared with the national blood service subset of the WTCCC2 control sample (P=0.045). Full details, including tables of frequencies for all regions and UCSC browser illustrations and individual CNV plots for regions 1q21.1, 15q13.3 and 22q11.2, can be found on pages 25–33 of the Supplementary Information.
This is the largest analysis of CNVs in recurrent depression performed to date. The results of our analysis suggest that rare deletion CNVs over 100 kb in size are significantly enriched in cases of recurrent depression. Furthermore, genic and exonic deletion CNVs show the greatest enrichment in our sample. We see a continuum of increasing deletion CNV frequency throughout our cohorts with the lowest frequency in screened controls, raised in unscreened controls and highest in cases. The only other study of CNVs in depressive disorder thus far focussed on individual variants rather than results from across the genome.27
There is some weight to the hypothesis that cases of recurrent depression will be associated with CNVs. Other studies have found that deletion CNVs, and to a lesser extent duplication CNVs, are also associated with psychiatric disorders such as autism, schizophrenia and ADHD (see, for example, refs 11, 12, 15). Furthermore, other research suggests that similar CNVs can predispose to a variety of neurological and psychiatric disorders.10 Other studies of CNVs in affective disorders have concentrated on bipolar disorder.28, 29 These studies do not find an association of CNVs with cases. Recurrent depression and bipolar disorder are overlapping but distinct disorders, so both results may be correct. In general, the literature on CNVs in psychiatric disorders (especially autism and schizophrenia) supports the involvement of CNVs in specific areas of the genome, although the absolute numbers at each locus are low, and large sample sizes are needed to provide statistically significant evidence of association. In Supplementary Table S2a, we present counts of CNVs in areas of the genome previously associated with schizophrenia. Although we find a significant difference when the total number of cases with CNVs in these areas is compared with the total number of screened controls with CNVs in these areas, no comparisons within the individual areas are significant. However, some areas warrant further research. For example, we see two cases with the 16p11.2 microdeletion and three cases with the 16p11.2 microduplication, which have previously been associated with autism and schizophrenia, respectively.30, 31 Two of the three cases with the 16p11.2 duplication, but neither of the cases with the deletion, also had mood congruent psychotic symptoms.
One case has the large 22q11.2 microdeletion associated with numerous clinical syndromes (as well as schizophrenia).32 Exonic deletions in neurexin 1 have also been suggested to have a role in schizophrenia.33, 34 We see two cases with microdeletions affecting neurexin 1 in our cohort and one case with a deletion immediately downstream of the gene. Four WTCCC2 controls have neurexin 1 deletions, and a further two have deletions immediately downstream of the gene. No screened controls have deletions within or immediately downstream of neurexin 1. Our data provide evidence that, in a minority of our cases, specific CNVs are present in regions of the genome previously implicated in other psychiatric disorders.
In 1967, Gottesman and Shields35 commented that psychiatric disorders may represent the extremes of more general continuums of normal psychosocial functioning. They highlighted that this implied that the genetic aetiology of psychiatric disorders was likely to be highly complex. Modern research in psychiatric genetics indicates that this model may be applicable, and its renaissance is gathering momentum.25, 36 Perhaps, those with a low number of deleterious CNVs, particularly deletions, are more likely to undergo development in a manner that fosters emotionally and cognitively adaptive behaviours associated with higher resilience to developing psychiatric disorders such as recurrent depression. Such a theory may be applicable to other complex disorders in medicine; however, it does suggest that the genetic aetiology of complex diseases will involve multiple genes, each with multiple biological effects, each subject to gene–gene interactions, and the entire milieu subject to environmental modification over the lifespan of the organism. Particular patterns of genetic variation and environmental interaction may be more strongly associated with complex disease than specific genes per se.
We performed a logistic regression analysis on our data (details in methods and Supplementary Information), which indicated that 0.87% (pseudo R², Nagelkerke method, Supplementary Table S3g) of the variance between recurrent depression cases and screened controls is explained by our results. Although this is not a large proportion, it is nonetheless a significant finding in a study of the genetic architecture of a complex disorder and underlines the genetic complexity of disorders such as recurrent depression. Only a limited number of large CNV events can be reliably detected using microarrays such as the Illumina Beadarray used in our experiment, and what we observe may extend to rare variants in general. In the near future, more detailed studies, such as those involving large-scale whole-genome-sequencing efforts, may uncover more rare variants than are currently detectable. This information will then allow us to make a better assessment of the role of rare variants in recurrent depression and psychiatric disorders in general.
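For readers unfamiliar with the Nagelkerke pseudo R², the sketch below shows how it is obtained from the log-likelihoods of a null and a fitted logistic model. To stay self-contained it handles only the simplest case, a single binary predictor (CNV carrier or not), where the fitted probabilities equal the observed proportions in each group. The regression reported here may have been specified differently, and the function names are hypothetical.

```python
import math


def _x_log_ratio(x, total):
    """x * ln(x / total), taking the limit value 0 when x is 0."""
    return 0.0 if x == 0 else x * math.log(x / total)


def log_likelihoods_2x2(case_cnv, case_none, ctrl_cnv, ctrl_none):
    """Null and fitted log-likelihoods for case status ~ carrier status."""
    n_cases = case_cnv + case_none
    n_ctrls = ctrl_cnv + ctrl_none
    n = n_cases + n_ctrls
    # Null model: one overall case probability.
    ll_null = _x_log_ratio(n_cases, n) + _x_log_ratio(n_ctrls, n)
    # Fitted model: one case probability per carrier group.
    carriers = case_cnv + ctrl_cnv
    non_carriers = case_none + ctrl_none
    ll_model = (
        _x_log_ratio(case_cnv, carriers) + _x_log_ratio(ctrl_cnv, carriers)
        + _x_log_ratio(case_none, non_carriers) + _x_log_ratio(ctrl_none, non_carriers)
    )
    return ll_null, ll_model, n


def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell
```

The rescaling means that the statistic reaches 1 when the predictor separates cases from controls perfectly and 0 when carrier status carries no information, so a value below 1% indicates a small contribution to overall liability.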
Recurrent depression represents an increasingly prominent health problem, especially in developed countries. It conspicuously lacks any clinically useful biomarkers to guide diagnosis or treatment, which often proceeds on a trial-and-error basis and frequently does not lead to sustained remission. Elucidating the genetic architecture of diseases like recurrent depression has the potential, in collaboration with other avenues of research, to inform both diagnosis and treatment. Our results thus represent a significant step towards identifying the biological underpinnings of a complex and hitherto genetically obscure psychiatric disorder.
This study was funded by a joint grant from the UK Medical Research Council and GlaxoSmithKline (G0701420). James Rucker was supported by a fellowship from the Wellcome Trust (086635). Alexandra Schosser was supported by the Erwin-Schroedinger Fellowship (J2647) of the Austrian Science Funds. Alexandra Schosser, Inti Pedroso and Sarah Cohen-Woods received financial support from the National Institute for Health Research (NIHR) Specialist Biomedical Research Centre for Mental Health at the South London and Maudsley NHS Foundation Trust and the Institute of Psychiatry, King's College London. Margarita Rivera was supported by a Marie Curie Intra-European Fellowship within the 7th European Community Framework Programme. The GENDEP study was funded by a European Commission Framework 6 grant, EC Contract Ref.: LSHB-CT-2003-503428 and GlaxoSmithKline contributed by funding an add-on project in the London centre. The population-based study in Lausanne was supported by three grants from the Swiss National Science Foundation (No. 3200B0-105993, No. 3200B0-118308, No. 33CSCO-122661) and from GlaxoSmithKline (Psychiatry Center of Excellence for Drug Discovery and Genetics Division, Drug Discovery Verona, R&D). Rudolf Uher and Peter McGuffin are supported by a grant from the European Commission (Grant Agreement No. 115008). Genotyping was performed at the Centre Nationale De Genotypage, Evry, Paris, with acknowledgement to Simon Heath, Ivo Gut and Mark Lathrop. We acknowledge the contribution of phase 2 of the Wellcome Trust Case Control Consortium in providing access to control data sets from the 1958 British birth cohort and the national blood service cohort.
Supplementary Information accompanies the paper on the Molecular Psychiatry website (http://www.nature.com/mp)