Introduction

Depression is one of the most common psychiatric disorders with an estimated lifetime prevalence of 14.6% in high-income and 11.1% in low-income countries.1 Worldwide, depression is responsible for 74.5 million disability-adjusted life years2 and in 2010 the total economic burden of depression in the United States alone was estimated at $210.5 billion,3 indicating that depression is a major strain on society. In order to create new and more effective therapeutic and preventive strategies, unravelling the pathogenesis of depression is imperative and elucidation of its genetic determinants is a critical step in this process.

The heritability of major depressive disorder (MDD) has been estimated to be between 30 and 50%.4 Nevertheless, genome-wide association studies (GWASs) have shown limited success in identifying its genetic basis. In a mega-analysis with more than 18 000 subjects of European ancestry, no single-nucleotide polymorphism (SNP) reached genome-wide significance.4 More recently, however, a meta-analysis of results from three studies, together containing 180 866 individuals, found two lead SNPs associated with depression.5 In addition, another study combining three cohorts in a joint analysis with a total of 478 240 subjects found 17 independent SNPs significantly associated with a diagnosis of MDD,6 suggesting that GWASs can successfully identify genetic associations with highly polygenic phenotypes. However, the identified effect sizes were small. Moreover, to gain sufficient statistical power the sample sizes required were extremely large and, as a consequence, unavoidably heterogeneous. Therefore, although GWASs have contributed greatly to genetic mapping of complex human traits, their success has been limited by the fact that, aside from SNPs, these studies are unsuitable for assessing the contribution of other genetic polymorphisms, especially DNA repeat sequences.7

Tandem repeats constitute ~3% of the human genome, a higher percentage than the entirety of the protein-coding sequences,8 substantially contributing to genetic variation.9, 10 Many rare hereditary disorders are caused by expansions of simple DNA repeat sequences.11 However, the association of DNA repeat sequences with more common diseases is largely unknown. Polyglutamine diseases are the most prevalent disorders caused by an expanded DNA repeat sequence.11, 12 These diseases are caused by an expansion of a trinucleotide repeat (cytosine–adenine–guanine (CAG)) in the translated region of various genes. As CAG encodes the amino acid glutamine, the expansion in the trinucleotide repeat sequence results in an elongated polyglutamine domain in the associated proteins.12 The most common polyglutamine disease is Huntington disease, a severe neurodegenerative disorder characterized by both motor and neuropsychiatric impairment. Huntington disease is caused by a CAG repeat expansion in exon 1 of the huntingtin (HTT) gene.13, 14 Recently, we demonstrated that both relatively short and relatively large CAG repeats in the longer HTT allele are associated with lifetime depression, suggesting that repeat polymorphisms could also act as complex genetic modifiers of depression and account for part of its ‘missing heritability’.15

Apart from Huntington disease, eight other polyglutamine disorders, all of which are neurodegenerative disorders, are frequently associated with considerable neuropsychiatric impairment (Table 1).16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 These rare disorders result from relatively large repeat expansions in polyglutamine disease-associated genes (PDAGs). However, to what extent more common repeat length variations in the normal range of these genes could act as genetic modifiers of depression in the general population is still unknown. Hence, here we aimed to assess the contribution of CAG repeat length variations in these other PDAGs to depression susceptibility using data from two well-defined Dutch cohorts: The Netherlands Study of Depression and Anxiety (NESDA) and the Netherlands Study of Depression in Old Persons (NESDO).

Table 1 Summary of the genotyped polyglutamine disease-associated genes

Materials and methods

We genotyped eight PDAGs (ATXN1, ATXN2, ATXN3, CACNA1A, ATXN7, TBP, ATN1 and AR) in all participants with sufficient amounts of DNA available from blood samples of two well-characterized Dutch cohorts: the NESDA (cohort 1) and the NESDO (cohort 2) cohorts (Table 1 and Supplementary Figures 1–17).

Cohort 1

The NESDA is a cohort study among 2981 participants aged 18–65 years.36 The participants were recruited from the general population, general practices and mental health-care institutes. The sample included 1973 subjects with a lifetime diagnosis of depression and/or dysthymia (including 1925 patients with MDD), 635 subjects with a lifetime anxiety disorder without lifetime depression and 373 healthy controls.36 Diagnoses were made in accordance with the Diagnostic and Statistical Manual of Mental Disorders Fourth Edition criteria using the WHO Composite International Diagnostic Interview.37

Cohort 2

The NESDO is a cohort study among 510 participants aged 60–93 years.38 The participants were recruited from both general practices and mental health-care institutes. The sample included 378 depressed individuals (360 with MDD) and 132 healthy controls. The same methods for diagnosing depression were used as in the NESDA.38

Genotyping

A polymerase chain reaction (PCR) was performed in a TProfessional thermocycler (Biometra, Westburg, Leusden, the Netherlands) with labelled primers flanking the CAG stretch of the PDAGs (Biolegio, Nijmegen, the Netherlands; Supplementary Table 1). The PCR was performed using 10 ng of genomic DNA, 1 × OneTaq mastermix (New England Biolabs, Ipswich, MA, USA, OneTaq Hot start with GC Buffer mastermix), 1 μl of primer Mix A or B (Supplementary Table 1) and Aqua B. Braun water to a final volume of 10 μl. The PCR was run with 27 cycles of 30 s of denaturation at 94 °C, 1 min of annealing at 60 °C and 2 min of elongation at 68 °C, preceded by 5 min of initial denaturation at 94 °C. Final elongation was performed at 69 °C for 5 min. Every PCR included a negative control without genomic DNA and a reference sample of CEPH 1347-02 genomic DNA. The PCR products were run on an ABI 3730 automatic DNA sequencer (Applied Biosystems, Foster City, CA, USA) and analysed using the GeneMarker software version 2.4.0 (SoftGenetics, State College, PA, USA). For every analysis, we included three controls with known CAG repeat lengths for each PDAG to ensure that every run was performed reliably. All assessments were made with cases and controls randomized on plates and with the assessors blinded to disease status.

Statistical analysis

Binary logistic regression was used to assess whether CAG repeat sizes in the two alleles of each PDAG were associated with the risk of lifetime depression. For each PDAG, in an initial model, the presence of lifetime depression (that is, MDD and/or dysthymia) was set as the dependent variable and the CAG repeat lengths of both alleles were used as the independent variables. To assess interaction effects between the two alleles or nonlinear effects, a product term of the two alleles and a quadratic term for each allele were added to the model. If the product term or the quadratic terms were not significant, these predictors were removed from the model and the analysis was repeated. Subsequently, we adjusted the results for the effects of sex, age and education level (coded as ‘basic’, ‘intermediate’ and ‘high’36, 38) in order to assess whether the effect of CAG repeat size variations was independent of these well-established risk factors for depression. The Nagelkerke R2 was used to assess the proportion of variability explained by the predictors in the model. To account for potential effects of heteroscedasticity and influential points, all statistical significance tests were based on robust estimators of the standard errors (s.e.). Moreover, to visualise our results as well as to ensure that the results were not unduly affected by violated model assumptions, we also applied a non-parametric method: we calculated the odds ratios (ORs) for a lifetime diagnosis of depression per CAG repeat group. The CAG repeat groups were defined based on the median CAG repeat size of each allele. This division resulted in three adequately sized groups: I, both alleles ≤ their median; II, the relatively shorter allele ≤ the median and the relatively longer allele > the median; III, both alleles > their median. Subsequently, the ORs were compared with the Fisher’s exact test.
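
The model-building steps described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in this study: the function names (design_row, solve, fit_logistic) and the toy data are hypothetical, and robust standard errors, covariate adjustment and significance testing are left out. The sketch only shows how the predictors for one PDAG (both allele lengths, their product and the two quadratic terms) could be assembled and how a logistic model can be fitted by Newton-Raphson iteration.

```python
import math

def design_row(short, long_, interaction=True, quadratic=True):
    """Predictors for one participant: intercept, both allele lengths and,
    optionally, the product term and the two quadratic terms."""
    row = [1.0, float(short), float(long_)]
    if interaction:
        row.append(float(short * long_))
    if quadratic:
        row += [float(short ** 2), float(long_ ** 2)]
    return row

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_logistic(rows, y, iterations=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k                      # X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # X'WX with W = p(1 - p)
        for x, yi in zip(rows, y):
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Full predictor set for a participant with alleles of 10 and 12 repeats.
example_row = design_row(10, 12)
print(example_row)

# Toy data (hypothetical): intercept plus one binary predictor.
# Depressed: 2 of 4 when the predictor is 0, 3 of 4 when it is 1.
toy_rows = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
toy_y = [1, 1, 0, 0, 1, 1, 1, 0]
beta = fit_logistic(toy_rows, toy_y)
print(beta)  # beta[1] is the log odds ratio for the binary predictor
```

In the toy data the odds of depression are 1 without and 3 with the predictor, so the fitted slope converges to ln 3, the log odds ratio. With raw repeat lengths, the product and quadratic terms are strongly correlated with the main terms, so a production analysis would normally rely on an established statistics package rather than this hand-written solver.
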
In order to account for multiple testing, we applied a false discovery rate correction as described by Benjamini and Hochberg, assuming eight independent tests with a two-sided α of 0.05.39 All data are displayed as means and 95% confidence intervals (CIs), unless otherwise specified. All analyses were performed in SPSS version 23.0 (IBM SPSS Statistics for Windows, IBM, Armonk, NY, USA).
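
For illustration, the Benjamini-Hochberg step-up procedure for eight tests at α=0.05 can be sketched in a few lines of Python. Only the two P-values for ATXN7 (0.006) and the TBP interaction term (0.005) are taken from the unadjusted results reported in this paper; the six remaining P-values are hypothetical placeholders, and the function name is an assumption of this sketch rather than part of the original analysis.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one True/False flag per P-value: True if the hypothesis is
    rejected by the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    # Find the largest rank k whose ordered P-value is <= (k / m) * alpha.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below k_max.
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

p_values = {
    "ATXN7": 0.006,    # reported (unadjusted model, shorter allele)
    "TBP": 0.005,      # reported (unadjusted model, interaction term)
    "ATXN1": 0.21,     # hypothetical placeholder
    "ATXN2": 0.34,     # hypothetical placeholder
    "ATXN3": 0.48,     # hypothetical placeholder
    "CACNA1A": 0.57,   # hypothetical placeholder
    "ATN1": 0.73,      # hypothetical placeholder
    "AR": 0.91,        # hypothetical placeholder
}
flags = benjamini_hochberg(list(p_values.values()))
rejected = dict(zip(p_values, flags))
print(rejected)
```

With eight tests the two smallest P-values are compared against 0.05/8=0.00625 and 2×0.05/8=0.0125, respectively, so both reported values pass under these placeholder inputs.
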

Results

Risk of lifetime depression increases with higher ATXN7 CAG repeat sizes

We were able to genotype a total of 2979 participants for ATXN7 (Table 1). This included 1998 depressed subjects and 981 non-depressed subjects. For 512 individuals, too little DNA material was available to determine the CAG repeat length in ATXN7 (Supplementary Table 2). These samples were missing completely at random. The CAG repeat lengths ranged from 7 to 19 repeats (Table 1).

We found a significant association between the risk of lifetime depression and the CAG repeat length of ATXN7 in the shorter allele (β=0.202, P=0.006). Adjusting for the effects of gender, age and education hardly changed these results (β=0.184, P=0.013; Supplementary Table 3). After adjusting for multiple testing, this association remained significant. Inclusion of the CAG repeat length of the ATXN7 allele in the model increased the proportion of explained genetic variation from the baseline model including only gender, age and level of education by 0.004 (that is, the R2 increased from 0.024 to 0.028). Considering our control group included subjects with anxiety disorders without comorbid depression (n=348), we also performed a sensitivity analysis by excluding these individuals from the control group. Neither the parameter estimates nor their significance were materially altered by this procedure (β=0.215, P=0.009).

For the non-parametric method, we divided all subjects based on the median CAG repeat size for each ATXN7 allele (median relatively short allele=10, median relatively long allele=10) and compared the ORs of lifetime depression among the groups (Table 2). The group containing the largest number of individuals was used as the reference category. Comparing the other groups to the reference category demonstrated the significant effect of the shorter allele on the risk of lifetime depression (Figure 1). When the relatively short allele and thus also the relatively long allele both contained a CAG repeat length larger than 10 repeats, the odds for lifetime depression almost doubled compared with the reference category in which both ATXN7 alleles had a CAG repeat number equal to or smaller than the median.
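
The grouping and comparison described above can be sketched as follows. This is a minimal illustration, not the original analysis: the function names are hypothetical, the counts in the usage example are invented (the actual counts are in Table 2, which is not reproduced here), and the two-sided Fisher’s exact test is implemented in its common form that sums the probabilities of all tables no more likely than the observed one.

```python
from math import comb

def repeat_group(short, long_, median_short, median_long):
    """Assign a participant to group I, II or III based on allele medians."""
    if short <= median_short and long_ <= median_long:
        return "I"      # both alleles at or below their median
    if short <= median_short and long_ > median_long:
        return "II"     # only the longer allele exceeds its median
    if short > median_short and long_ > median_long:
        return "III"    # both alleles exceed their median
    return None         # combination not described in the text

def odds_ratio(cases, controls, cases_ref, controls_ref):
    """Odds of depression in a group relative to the reference group."""
    return (cases * controls_ref) / (controls * cases_ref)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts: group III versus reference group I.
cases_iii, controls_iii = 30, 8
cases_ref, controls_ref = 60, 30
or_iii = odds_ratio(cases_iii, controls_iii, cases_ref, controls_ref)
p_iii = fisher_exact_two_sided(cases_iii, controls_iii, cases_ref, controls_ref)
print(repeat_group(11, 12, 10, 10), or_iii, p_iii)
```

Note that when the two medians differ, a participant can have a shorter allele above its median while the longer allele is not above its own median; the text does not describe this case, so the sketch returns None for it.
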

Table 2 The distribution of participants in three ATXN7 categories
Figure 1

Odds ratio for lifetime depression per ATXN7 category. The odds ratio for lifetime depression increases significantly when both the relatively long ATXN7 allele and the relatively short ATXN7 allele exceed the median CAG repeat number of 10; the odds of having lifetime depression almost double. The group having both alleles with a CAG repeat number ≤10 was the largest and, therefore, was set as the reference category. I, both alleles of ATXN7 contain a CAG repeat number ≤10; II, the relatively longer ATXN7 allele contains a CAG repeat number >10 and the relatively shorter allele contains a CAG repeat number ≤10; III, both alleles of ATXN7 contain a CAG repeat number >10. Error bars indicate ± s.e. **P<0.01 by the Fisher’s exact test in comparison with the reference category; CAG, cytosine–adenine–guanine.

TBP CAG repeat sizes in both alleles interact to affect the risk of lifetime depression

A total of 3238 individuals were genotyped for TBP (Table 1). This included 2180 people diagnosed with lifetime depression and 1058 non-depressed people. For 253 individuals, too little DNA material was available to determine the CAG repeat length in TBP (Supplementary Table 2). The lacking samples were missing completely at random. For TBP, we found CAG repeat lengths ranging between 27 and 48 repeats (Table 1).

We found the interaction between the number of CAG repeats in the two TBP alleles to be significantly associated with the risk of lifetime depression (TBP short allele: β=−2.270, P=0.004; TBP long allele: β=−2.112, P=0.006; TBP interaction term: β=0.060, P=0.005). When adjusting for the effects of gender, age and education, the results hardly changed (TBP short allele: β=−2.302, P=0.004; TBP long allele: β=−2.150, P=0.005; TBP interaction term: β=0.061, P=0.005; Supplementary Table 3). Furthermore, the results remained significant after performing the sensitivity analysis and applying the false discovery rate correction.39 The proportion of explained genetic variation increased by 0.005 from the baseline model including only gender, age and level of education (that is, the R2 increased from 0.023 to 0.028).
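
Because the TBP model contains a product term, the two main-effect coefficients cannot be interpreted in isolation: the change in log-odds per additional repeat in one allele depends on the repeat length of the other allele. The short Python sketch below works this out from the unadjusted coefficients reported above. The function name is hypothetical, and because the published coefficients are rounded, the crossover points are approximate.

```python
def marginal_effect(beta_main, beta_interaction, other_allele_length):
    """Change in log-odds of lifetime depression per additional CAG repeat
    in one allele, given the repeat length of the other allele."""
    return beta_main + beta_interaction * other_allele_length

# Unadjusted TBP coefficients as reported in the text.
BETA_SHORT = -2.270
BETA_LONG = -2.112
BETA_INTERACTION = 0.060

# Effect of one extra repeat in the shorter allele at several long-allele sizes.
for long_allele in (36, 37, 38, 39):
    effect = marginal_effect(BETA_SHORT, BETA_INTERACTION, long_allele)
    print(long_allele, round(effect, 3))

# Repeat length of the other allele at which each marginal effect changes sign.
crossover_for_short = -BETA_SHORT / BETA_INTERACTION  # about 37.8 repeats
crossover_for_long = -BETA_LONG / BETA_INTERACTION    # about 35.2 repeats
print(crossover_for_short, crossover_for_long)
```

Under these coefficients, an additional repeat in the shorter allele is associated with higher odds of depression only when the longer allele has roughly 38 or more repeats, which is in line with the elevated risk observed when both alleles exceed their medians.
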

For the non-parametric method, we divided all subjects in three groups based on the median CAG repeat size for each TBP allele (median relatively short TBP allele=37, median relatively long TBP allele=38) and compared the ORs of lifetime depression among the groups (Table 3). The largest group was set as the reference category. The comparison of the groups demonstrated the significant effect of the interaction between the relatively short TBP allele and the relatively long TBP allele on the risk of depression. The risk of lifetime depression was significantly higher when both TBP alleles had a CAG repeat length exceeding their median compared with the reference category, in which both alleles had a CAG repeat number equal to or lower than their median (Figure 2).

Table 3 The distribution of participants in three TBP categories
Figure 2

Odds ratio for lifetime depression per TBP category. The odds ratio for lifetime depression increases significantly when both the relatively shorter TBP allele and the relatively longer TBP allele exceed their median CAG repeat number (median short allele=37, median long allele=38). The group with both alleles being equal to or smaller than their medians was the largest and, therefore, is defined as the reference category. I, both TBP alleles contain a CAG repeat number ≤ their median; II, the relatively longer TBP allele contains a CAG repeat number > the median and the relatively shorter TBP allele contains a CAG repeat number ≤ the median; III, both TBP alleles contain a CAG repeat number > their median. Error bars indicate ± one s.e. *P<0.05 by the Fisher’s exact test in comparison with the reference category. CAG, cytosine–adenine–guanine; TBP, thymine-adenine-thymine-adenine (TATA) box-binding protein.

CAG repeat sizes in other PDAGs were not associated with risk of lifetime depression

The numbers of CAG repeats in ATXN1, ATXN2, ATXN3, CACNA1A, ATN1 and AR were not associated with the risk of lifetime depression. Neither the main effects of the alleles, nor the interactions between the two alleles, nor the quadratic terms were significantly associated with the presence of lifetime depression in the combined cohort. As AR is located on the X-chromosome, we performed one analysis stratifying the data by gender and another analysis using only either the relatively short or the relatively long allele in the model. Neither of the two approaches demonstrated a significant association between CAG repeat number in AR and depression (Supplementary Table 3).

The effects of HTT, ATXN7 and TBP CAG repeat sizes are independent

In previous research, we demonstrated that both the main term of the longer HTT allele and the quadratic term of the longer HTT allele are also significantly associated with a diagnosis of lifetime depression.15 To assess the contribution of the CAG repeat polymorphisms in all PDAGs to depression risk, we examined the degree to which HTT, ATXN7 and TBP CAG repeat sizes explain depression heritability. To this end, we applied a multivariate model with the CAG repeat sizes in these three PDAGs as predictors. In addition, we adjusted for the effects of gender, age and education. We found that the parameter estimates as well as their associated statistical significances hardly changed, indicating that the effects of HTT, ATXN7 and TBP CAG repeat sizes on depression susceptibility are mutually independent (HTT long allele: β=−0.286, P=0.018; HTT long allele quadratic term: β=0.006, P=0.017; ATXN7 short allele: β=0.210, P=0.005; TBP short allele: β=−2.728, P=0.001; TBP long allele: β=−2.566, P=0.001; TBP interaction term: β=0.072, P=0.001). Inclusion of the CAG repeat sizes in HTT, ATXN7 and TBP in the model increased R2 by 0.014 from the baseline model including only gender, age and level of education (from 0.024 to 0.038), indicating that the CAG repeat lengths in these PDAGs can account for an additional 1.4% of the genetic variation on the observed probability scale. We also derived the R2 on the liability scale as described previously.40 Assuming that depression has a lifetime prevalence of ~15% in the Netherlands41 and adjusting for the oversampling of patients with depression in our cohort, the R2 on the liability scale was 0.0191, indicating that CAG repeat size polymorphisms in HTT, ATXN7 and TBP together can account for ~1.9% of depression heritability.
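
To make the conversion from the observed scale to the liability scale concrete, the sketch below applies one commonly used transformation for ascertained case-control samples (Lee et al., 2012). This is an assumption: the text only states that the liability-scale R2 was derived "as described previously", so the formula, the function name and the case fraction used here may differ from what was actually applied. The case fraction is approximated from the cohort descriptions (1973+378 depressed subjects among 2981+510 participants), and the R2 increase of 0.014 and the prevalence of 15% are taken from the text.

```python
from statistics import NormalDist

def liability_r2(r2_observed, prevalence, case_fraction):
    """Convert an observed-scale R2 to the liability scale, correcting for
    the over-representation of cases in the sample (assumed formula)."""
    k, p = prevalence, case_fraction
    standard_normal = NormalDist()
    t = standard_normal.inv_cdf(1.0 - k)   # liability threshold
    z = standard_normal.pdf(t)             # normal density at the threshold
    m = z / k                              # mean liability of cases
    c = (k * (1 - k) / z ** 2) * (k * (1 - k) / (p * (1 - p)))
    shift = m * (p - k) / (1 - k)
    theta = shift * (shift - t)
    return r2_observed * c / (1 + r2_observed * theta * c)

# Inputs: R2 increase and prevalence from the text; case fraction assumed
# from the pooled cohort descriptions.
case_fraction = (1973 + 378) / (2981 + 510)
r2_liability = liability_r2(0.014, 0.15, case_fraction)
print(round(r2_liability, 4))
```

Under these assumptions the result is roughly 0.019, in the same range as the reported liability-scale value. When the sample case fraction equals the population prevalence, the correction term vanishes and the expression reduces to the classical scaling by K(1−K)/z².
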

The prevalence of intermediate and pathological PDAG alleles

In total, four genotyped subjects had a CAG repeat number in the pathological range of a PDAG. One individual had a CAG repeat number of 22 in CACNA1A, one subject had a CAG repeat number of 48 in TBP and two people had a CAG repeat number of 33 and 36 in ATXN2 (Table 1). All four were depressed, but at the ages of 53 years for CACNA1A, 59 years for TBP, and 26 and 31 years for ATXN2, none had been diagnosed with the respective diseases (that is, SCA6, SCA17 and SCA2).

One person had a CAG repeat number in the longer TBP allele of 44, belonging to the intermediate range. This individual was also diagnosed with lifetime depression. Furthermore, one depressed individual had an intermediate CAG repeat number in the longer allele of ATXN3 (that is, 49 repeats). For ATXN1, ATXN7, ATN1 and AR all subjects had a CAG repeat number within the normal range (Table 1).

When we included the two depressed individuals with a CAG repeat length in the reduced penetrance range of HTT found previously,15 we observed a trend for a higher proportion of subjects with a CAG repeat in the intermediate or pathological range of a PDAG in the depressed group (P=0.059).

Discussion

To our knowledge, this is the first study assessing the influence of CAG repeat size variations in the normal range of PDAGs—other than HTT—on the risk of lifetime depression. Interestingly, we found an association between the presence of lifetime depression and the CAG repeat length in two PDAGs, that is, ATXN7 and TBP. The main relationship we found was that when the CAG repeat number of both alleles in either ATXN7 or TBP was relatively large, the odds for a diagnosis of lifetime depression markedly increased. Moreover, six genotyped individuals had a CAG repeat number in at least one of their PDAGs that extended into the intermediate or pathological range (two intermediate, four pathological). All of these subjects were diagnosed with lifetime depression. These findings are in support of the hypothesis that repeat polymorphisms may act as complex genetic modifiers of depression and thus could account for part of its ‘missing heritability’.

In our study, we found that the odds of lifetime depression almost doubled in individuals with a CAG repeat length of >10 in both ATXN7 alleles. Similarly, we found that the risk of lifetime depression markedly increased (OR=1.33) in individuals who had a CAG repeat length exceeding the median in both TBP alleles. These effect sizes are considerable, especially when compared with the effect sizes of the two most significant SNPs found in a recent GWAS: OR=0.955 (95% CI 0.943–0.968) and OR=1.051 (95% CI 1.036–1.067).6 Although we cannot fully exclude potential modifying effects of SNPs in linkage disequilibrium (LD) with ATXN7 and TBP, the fact that such SNPs in these PDAGs have not been detected before in GWASs suggests that the influence of other genetic variants in LD with the CAG repeat size in these genes is likely to be minimal.42 Furthermore, we found that CAG repeat size polymorphisms in HTT, ATXN7 and TBP together can account for ~1.9% of depression heritability. Using the same population prevalence of 15% for depression, the meta-analysis in the recent GWAS calculated a heritability of 5.9%.6 The amount of variance explained by these SNPs is larger than the amount explained by the CAG repeat polymorphisms we found. However, the meta-analysis in this GWAS examined SNPs in the entire genome of 326 113 individuals,6 whereas our estimation was based on the effects of CAG repeat variations in only nine PDAGs. This fact suggests that investigating additional repeat polymorphisms in the human genome could lead to the identification of many more novel genetic determinants of depression. The importance of investigating repeat polymorphisms in association with health and disease has been described in previous literature and begins to gain more recognition within the field of genetic research.9, 10, 43, 44, 45, 46

CAG repeat numbers exceeding 36 in ATXN7 are responsible for the severe neurodegenerative disorder spinocerebellar ataxia type 7 (SCA7).34 SCA7 is a progressive autosomal-dominant neurodegenerative disorder primarily characterized by cerebellar ataxia and macular degeneration.47 Although depressive symptoms are a frequent finding in many neurodegenerative disorders, the prevalence of depression has hardly been assessed in SCA7.48, 49, 50, 51 Our findings suggest that depression might be an underappreciated feature of SCA7, which needs further characterisation. A CAG repeat number in TBP larger than 48 causes the severe progressive neurodegenerative disorder SCA17.34 Aside from cerebellar ataxia, dementia and pyramidal symptoms, in 67% of the cases SCA17 is also accompanied by psychiatric signs and symptoms, including depression, behavioural changes as well as psychosis.52 Initial symptoms of SCA17 have been described to include depression.53

The gene ATXN7 encodes the protein ataxin-7. Apart from several important cellular functions,51 ataxin-7 is an integral part of the TATA-binding protein-free TAF-containing complex.54 TATA-binding protein-free TAF-containing complex allows for the initiation of transcription via RNA polymerase II in the absence of the RNA polymerase II transcription factor D.55 Interestingly, the TATA-box-binding protein (TBP), which is encoded by TBP, is the DNA-binding subunit of RNA polymerase II transcription factor D. TBP anchors RNA polymerase II transcription factor D to the TATA-box upstream of the first codon, allowing for the initiation of transcription via RNA polymerase II.56 Thus, TBP as well as ataxin-7 seem to have a part in the initiation of transcription via RNA polymerase II. RNA polymerase II catalyses the transcription of DNA to synthesise precursors of mRNA, snRNA and microRNA.57, 58 Numerous studies indicate an altered mRNA expression of proteins such as the serotonin 5-HT1A receptor,59, 60 brain-derived neurotrophic factor61, 62, 63, 64 and corticotrophin-releasing factor65, 66 in subjects suffering from MDD compared with controls. Furthermore, the evidence supporting a role for the dysfunction of microRNA-mediated regulated gene expression in MDD is increasing.67, 68, 69, 70, 71 Therefore, CAG repeat variations in either ATXN7 or TBP could modulate the function of RNA polymerase II, and thereby lead to changes in mRNA and microRNA expression that have previously been associated with MDD.59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71 Hence, we intend to investigate the effect of CAG repeat length variations within the normal range in ATXN7 and TBP on the expression and function of proteins thought to be associated with depression in a cellular model.

The presence of one subject with a CAG repeat size in the pathological range of CACNA1A, one subject with a CAG repeat size in the pathological range of TBP and two subjects with a pathological CAG repeat size in ATXN2 within our two cohorts is remarkable. For these individuals, depression is apparently the first symptom with which SCA6, SCA17 or SCA2 manifests itself. The prevalence of an autosomal-dominant cerebellar ataxia among Europeans is estimated to be between 1 and 3 per 100 000, of which 2% are diagnosed with SCA6, <1% with SCA17 and 10% with SCA2,72 resulting in a prevalence of 0.02–0.06 per 100 000, <0.01–0.03 per 100 000 and 0.10–0.30 per 100 000 for SCA6, SCA17 and SCA2, respectively. These numbers are in stark contrast to the substantially higher prevalence estimates in patients with depression in our study, 90 per 100 000 for SCA6, 50 per 100 000 for SCA17 and 50 per 100 000 for SCA2. Previously, we also found two individuals with incompletely penetrant HTT alleles (that is, those containing 36–39 CAG repeats) in this same population, both of whom suffered from depression.15 The proportion of individuals with intermediate or pathological CAG repeat lengths in PDAGs indeed tended to be higher in depressed compared with control subjects, although, likely owing to the relative rarity of these expanded alleles, the difference did not reach statistical significance. Our findings are in line with another study that estimated the prevalence of incompletely penetrant HTT alleles in MDD patients to be ~3 in 1000, whereas such alleles were absent in the control group.73 Together, these findings suggest that depression could be the first manifestation of polyglutamine diseases and that, conversely, polyglutamine diseases might be underdiagnosed in patients with depression.
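
The expected population prevalences quoted above follow from simple multiplication, as the short Python sketch below shows, with all rates expressed per 100 000. The names are hypothetical, and the share for SCA17, which the text gives as "<1%", is treated as an upper bound of 1%. The observed rate at the end uses the TBP genotyping counts from the Results (one carrier of a pathological allele among 2180 genotyped depressed individuals) purely as an example of the same calculation.

```python
# Prevalence of autosomal-dominant ataxia per 100 000 (range from the text).
ATAXIA_PER_100K = (1.0, 3.0)

# Share of each subtype among these ataxias (from the text; the SCA17 share
# is reported as "<1%", so 0.01 is an upper bound).
SUBTYPE_SHARE = {"SCA6": 0.02, "SCA17": 0.01, "SCA2": 0.10}

def expected_range(share, base=ATAXIA_PER_100K):
    """Expected subtype prevalence per 100 000 (low, high)."""
    return tuple(round(rate * share, 4) for rate in base)

def per_100k(carriers, genotyped):
    """Observed carrier rate per 100 000 genotyped individuals."""
    return carriers / genotyped * 100_000

expected = {name: expected_range(share) for name, share in SUBTYPE_SHARE.items()}
print(expected)

# Example: one pathological TBP allele among 2180 genotyped depressed subjects.
tbp_observed = per_100k(1, 2180)
print(round(tbp_observed))
```

Even the upper expected bound for SCA17 (0.03 per 100 000) is more than a thousand times lower than the observed carrier rate in this example, which illustrates the size of the contrast described in the text.
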

In this study, we analysed a homogeneous population by using samples from two Dutch cohorts, thereby minimising the impact of population stratification. However, the use of a uniform group of individuals could also be seen as a limitation of our study, as it might limit the generalisability of our findings. Although our results appear robust and consistent, confirmation of our findings in other populations, as well as the elucidation of their pathophysiological basis, is warranted.

In conclusion, we observed a significant association between the risk of lifetime depression and CAG repeat size in ATXN7 and TBP. A relatively large CAG repeat number in both alleles of either ATXN7 or TBP substantially increased depression risk. Our findings add more critical evidence to the notion that repeat polymorphisms could act as complex genetic modifiers of depression and, therefore, could partially account for its ‘missing heritability’. In addition, our findings indicate that the role of DNA repeat polymorphisms as potential genetic modifiers of other psychiatric disorders also needs further scrutiny.