Introduction

Obesity is a growing pandemic and acts as a major risk factor for a variety of prevalent chronic disorders, including cardiovascular, metabolic, inflammatory and neoplastic diseases [1]. Several studies have estimated the heritability of body mass index (BMI) at around 40–70% [2,3,4]. However, the BMI-associated loci identified in the largest meta-analysis of genome-wide association studies (GWAS) to date explained only ~2.7% of the variation [5], indicating a large degree of ‘missing heritability’. The GWAS approach, despite its crucial contribution to the genetic mapping of complex human traits, neglects the effect on body composition of dynamic mutations, such as the trinucleotide expansions that are associated with neurodegenerative disorders [6,7,8]. Recent studies have indeed shown that variations in these highly unstable repeat expansions can have phenotypic consequences for organisms [9]. Nine hereditary neurodegenerative diseases, including Huntington's disease (HD), are caused by protein-coding trinucleotide expansions consisting of cytosine–adenine–guanine (CAG) repeats (Table 1) [10, 11]. Alongside motor impairment and neuropsychiatric disturbances, these disorders are often also accompanied by severe weight loss and metabolic disturbances. Given recent findings that even CAG repeat length variations in the non-mutant range in polyglutamine disease-associated genes (PDAGs) can act as risk factors for neuropsychiatric conditions [12,13,14], we hypothesized that these prevalent polymorphisms may also act as genetic risk factors of BMI.

Table 1 Summary of genotyped polyglutamine disease-associated genes (PDAGs)

Subjects and methods

Subjects

The nine known PDAGs (ATXN1, ATXN2, ATXN3, CACNA1A, ATXN7, TBP, HTT, ATN1 and AR) were genotyped in all participants with sufficient amounts of DNA available from blood samples of two well-characterized cohorts: the Netherlands Epidemiology of Obesity (NEO) study and the Prospective Study of Pravastatin in the Elderly at Risk (PROSPER) study (Table 1 and Supplementary Tables 1–3). The NEO is a cohort study among 6671 men and women aged 45–65 years living in the greater area of Leiden, the Netherlands, with an oversampling of overweight or obese individuals. A total of 5217 participants had a BMI of 27 kg/m2 or higher. This study was approved by the medical ethical committee of the Leiden University Medical Centre (LUMC) and written informed consent was obtained from all participants [15]. The PROSPER is a cohort study among 5786 men and women aged 70–82 years with pre-existing vascular disease or a raised risk of such disease. Participants were recruited from three countries, with 2517 individuals from Scotland, 2173 individuals from Ireland and 1096 individuals from the Netherlands. The study was approved by the institutional ethics review boards of all centres and written informed consent was obtained from all participants [16]. A post-hoc power calculation using the sample sizes of the NEO and PROSPER cohorts combined (n = 12,457) showed that, at a significance level of α = 0.0056 (0.05/9, because of the nine tested PDAGs), this sample size enabled detection of a very small effect size equal to R2 = 0.001 or larger with a statistical power of ≥0.78 (calculated using G*Power version 3.1.9.2) [17].
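The combined sample size and the per-test significance threshold quoted above follow from simple arithmetic, which can be checked with the short Python sketch below. The variable names are illustrative assumptions and are not taken from the study's analysis code; the power estimate itself was obtained with G*Power and is not recomputed here.

```python
# Illustrative arithmetic only; all variable names are assumptions for this sketch.
n_neo = 6671        # participants in the NEO cohort
n_prosper = 5786    # participants in the PROSPER cohort
prosper_by_country = {"Scotland": 2517, "Ireland": 2173, "Netherlands": 1096}

n_genes = 9          # number of PDAGs tested
alpha_family = 0.05  # overall significance level

n_combined = n_neo + n_prosper
alpha_per_test = alpha_family / n_genes  # threshold after dividing by nine tests

print(n_combined)                                      # 12457
print(round(alpha_per_test, 4))                        # 0.0056
print(sum(prosper_by_country.values()) == n_prosper)   # True
```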

Genotyping

To determine the CAG repeat length in the nine PDAGs for each individual, a polymerase chain reaction (PCR) was performed in a TProfessional thermocycler (Biometra, Westburg) with labelled primers flanking the CAG stretch of the PDAGs (Biolegio) (Supplementary Table 4). The PCR was performed using 10 ng of genomic DNA, 1× OneTaq mastermix (New England Biolabs, OneTaq Hot start with GC Buffer master mix), 1 µl of primer Mix A or B (Supplementary Table 4) and Aqua B. Braun water to a final volume of 10 µl. The PCR was run for 27 cycles of 30 s of denaturation at 94 °C, 1 min of annealing at 60 °C and 2 min of elongation at 68 °C, preceded by 5 min of initial denaturation at 94 °C. Final elongation was performed at 69 °C for 5 min. Every PCR included a negative control without genomic DNA and a reference sample of CEPH 1347-02 genomic DNA. The PCR products were run on an ABI 3730 automatic DNA sequencer (Applied Biosystems) and analyzed using the GeneMarker software version 2.4.0. For every analysis, we included three controls with known CAG repeat lengths for each PDAG to ensure that every run was performed reliably. Study participants were randomized across batches, and researchers were blinded to the clinical information during all assessments.

Statistical analysis

We initially assessed the relation between CAG repeat sizes in the two alleles of each PDAG and BMI for each cohort separately (Supplementary Tables 5 and 6). Next, to combine the results of both cohorts reliably, we first constructed parsimonious models for each cohort with the CAG repeat lengths of both alleles of each PDAG as independent variables (Supplementary Tables 7 and 8). Subsequently, we only combined the data for PDAG alleles whose effects on BMI were directionally consistent. We applied a generalized linear mixed-effect model with BMI as the outcome variable and the CAG repeat lengths of both alleles as fixed effects. To assess potential interaction or non-linear effects [12, 18], we also included a product term of both alleles and a quadratic term for each allele. When the effect on BMI of only one allele was consistent between the two cohorts, we only included the quadratic term of that specific allele. Cohort (i.e. NEO or PROSPER) and country (i.e. Scotland, Ireland or the Netherlands) were set as random effects to account for potential population stratification. Non-significant higher order terms were removed from this original model and the analysis was repeated to arrive at a final model. All final models were corrected for age, sex and population structure using principal components generated from genome-wide genotyping data [19, 20]. The NEO data were weighted to the BMI distribution of the general population (the weight factor given to PROSPER participants was set at 1). To reduce multicollinearity, all continuous variables were centred around their respective means. Furthermore, we calculated the marginal R2 per PDAG for each model to determine the amount of variance explained by each gene [21]. To account for potential effects of heteroscedasticity and influential points, all statistical significance tests were based on robust estimators of standard errors, and all CAG repeat lengths with a frequency of less than ten were excluded.
In addition, we excluded related participants and participants with a non-Caucasian ethnicity to increase homogeneity (Supplementary Tables 9–11). For the results of the combined cohort, we applied a false discovery rate (FDR) correction to account for multiple testing, assuming nine independent tests with q set at 0.05 [22].
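Two of the steps described above can be illustrated with a minimal Python sketch: building the centred linear, quadratic and product terms for a pair of alleles, and applying an FDR correction across nine tests. The sketch assumes that the FDR correction is the Benjamini–Hochberg step-up procedure, which the text does not state explicitly; the function names and the example p-values are hypothetical, and the actual analyses were run in STATA rather than with this code.

```python
from statistics import mean


def centred_terms(short_allele, long_allele):
    """Centre both allele lengths on their means and add quadratic and product terms."""
    mean_short, mean_long = mean(short_allele), mean(long_allele)
    rows = []
    for s, l in zip(short_allele, long_allele):
        cs, cl = s - mean_short, l - mean_long
        rows.append({
            "short": cs,
            "long": cl,
            "short_sq": cs ** 2,      # quadratic term, shorter allele
            "long_sq": cl ** 2,       # quadratic term, longer allele
            "short_x_long": cs * cl,  # product (interaction) term
        })
    return rows


def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per test: True if rejected at FDR level q (assumed BH procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0  # largest rank k whose p-value satisfies p_(k) <= (k / m) * q
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for nine genes (not the study's results).
example_p = [0.001, 0.004, 0.012, 0.02, 0.03, 0.2, 0.5, 0.04, 0.8]
print(sum(benjamini_hochberg(example_p)))  # 4
```

Centring before squaring keeps the linear and quadratic terms from being strongly correlated, which is the multicollinearity concern mentioned in the text.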

To illustrate the combined effect of the significant CAG repeat size variations in PDAGs on BMI, we (1) calculated the residual BMI after regression on age and sex as fixed factors and cohort and country as random factors in a linear mixed-effects model, (2) performed linear regression with CAG repeat sizes in the alleles of the PDAGs significantly associated with BMI (including all interaction and non-linear effects which were identified as significant in the main analyses) as the independent variables and this residual BMI as the outcome, (3) divided the total cohort into four equally sized groups based on quartiles of the predicted values of this regression model, and (4) plotted the average BMI residual for each of these quartiles. All data are displayed as means and 95% confidence intervals (CIs) unless otherwise specified. All analyses were performed in STATA/SE version 14.2 (StataCorp LLC).
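Steps (3) and (4) of this procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the residual BMI and the predicted values from steps (1) and (2) are taken as given, the function name is hypothetical, and the input values are toy numbers rather than study data.

```python
from statistics import mean


def mean_residual_by_quartile(predicted, residual):
    """Order participants by predicted score, split them into four equally sized
    groups, and return the mean residual BMI of each group (lowest quartile first)."""
    pairs = sorted(zip(predicted, residual), key=lambda pair: pair[0])
    n = len(pairs)
    bounds = [i * n // 4 for i in range(5)]  # group boundaries in the sorted list
    return [
        mean(r for _, r in pairs[bounds[i]:bounds[i + 1]])
        for i in range(4)
    ]


# Toy data (hypothetical): eight participants, two per quartile.
predicted = [1, 4, 2, 3, -1, -2, -3, -4]
residual = [0.25, 0.5, 0.0, 0.75, -0.25, -0.5, -1.0, -0.75]
quartile_means = mean_residual_by_quartile(predicted, residual)
print(quartile_means)                          # [-0.875, -0.375, 0.125, 0.625]
print(quartile_means[3] - quartile_means[0])   # 1.5
```

The difference between the last and first entries corresponds to the contrast between the highest and lowest quartile of the prediction score.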

Results

For each gene, we were able to determine the CAG repeat length in between 11,641 and 12,100 participants across both cohorts (Table 1). Samples were missing because too little DNA was available and were missing completely at random. Between 6.9 and 7.4% of participants were subsequently excluded because of CAG repeat lengths with a frequency of less than ten, or because participants were related or of non-Caucasian ethnicity (Supplementary Tables 9–11), leaving a total of 10,832–11,222 participants per gene for the analyses, with 5485–5676 from the NEO cohort and 5276–5615 from the PROSPER cohort.
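The exclusion of rare CAG repeat lengths can be sketched in Python as below. The sketch assumes that the frequency is counted per observed repeat length within a single allele series of one gene, which the text does not specify; the function name and the example values are hypothetical.

```python
from collections import Counter


def drop_rare_repeat_lengths(repeat_lengths, min_count=10):
    """Keep only observations whose repeat length occurs at least min_count times."""
    counts = Counter(repeat_lengths)
    return [length for length in repeat_lengths if counts[length] >= min_count]


# Toy data (hypothetical): repeat length 35 occurs only three times and is dropped.
lengths = [19] * 12 + [20] * 10 + [35] * 3
kept = drop_rare_repeat_lengths(lengths)
print(len(kept))    # 22
print(35 in kept)   # False
```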

In the NEO cohort, we found four PDAGs that were significantly associated with BMI (ATXN1, ATXN2, ATXN3 and TBP) (Supplementary Table 5). Seven PDAGs in the PROSPER cohort were significantly associated with BMI (ATXN1, ATXN2, ATXN3, CACNA1A, ATXN7, TBP and HTT) (Supplementary Table 6). Between the two cohorts, the effect on BMI of at least one allele was in the same direction for eight PDAGs (Supplementary Tables 7 and 8). The data of only these directionally consistent alleles were combined. The effects of both HTT alleles were not consistent and therefore not combined (Table 2). After combining the data of the directionally consistent alleles, we found a total of seven PDAGs (ATXN1, ATXN2, ATXN3, CACNA1A, ATXN7, TBP and AR) to be significantly associated with BMI (Table 2). For 5744 participants in the NEO and 5244 participants in the PROSPER cohorts, we obtained principal components generated from genome-wide genotyping data as described before [19, 20]. We corrected for age, sex and population structure using these principal components. This correction did not substantially alter our results (Supplementary Table 12).

Table 2 The association between polyglutamine disease-associated genes (PDAGs) and body mass index (BMI) in the combined cohort

In the combined cohort, the longer alleles of ATXN1, ATXN2 and CACNA1A were significantly associated with BMI. The association between BMI and the longer alleles of ATXN1 and CACNA1A was quadratic, implying that both shorter and longer CAG repeat lengths were associated with a lower BMI for ATXN1 and with a higher BMI for CACNA1A (Fig. 1a, b). The longer allele of ATXN2 was associated with BMI in a linear fashion: higher numbers of CAG repeats were associated with a higher BMI (Fig. 1c). For ATXN3, the interaction between the two alleles affected BMI (Table 2). Given that the effects of CAG repeat size in the shorter and longer ATXN3 alleles on BMI were in opposite directions, we calculated the difference in CAG repeat size between the longer and shorter ATXN3 alleles and found this difference to have a quadratic association with BMI (Fig. 1d). Furthermore, the shorter alleles of both ATXN7 and TBP had a quadratic association with BMI (Fig. 1e, f). Lastly, we examined the effect on BMI of the CAG repeat size in the X-linked AR gene, for which we (1) analyzed men and women separately, and (2) investigated either the shorter or the longer AR allele in men and women combined. In men, long CAG repeat lengths were associated with an exponential decrease in BMI, whereas in women, the longer AR allele had a quadratic association with BMI (Table 2). When analyzing the AR gene in men and women combined, a longer AR CAG repeat size in the longer allele was also associated with lower BMI (Fig. 1g). To estimate the total percentage of variation in BMI explained by these seven PDAGs, we calculated the marginal R2 for the final model including all the alleles which were significantly associated with BMI in the per gene analysis (Table 2). For AR, we included only the longer allele. The seven PDAGs that were significantly associated with BMI accounted for 0.75% of its variation in the combined cohort.
Additional analysis of the combined effect showed that the difference in BMI between the lowest and highest quartile of the prediction score calculated based on the CAG repeat sizes in these seven PDAGs was about 0.42 kg/m2 (corresponding to 1.29 kg for an individual 1.75 m in height) (Fig. 2).
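The conversion from a BMI difference to a weight difference follows from the definition of BMI (weight in kg divided by height in m squared), so that the weight difference equals the BMI difference multiplied by height squared. A brief Python check, with a hypothetical function name, is given below.

```python
def bmi_difference_to_kg(delta_bmi, height_m):
    """Convert a BMI difference (kg/m2) into a weight difference (kg) at a given height."""
    return delta_bmi * height_m ** 2


# 0.42 kg/m2 at a height of 1.75 m: 0.42 * 1.75^2 = 1.28625 kg
print(round(bmi_difference_to_kg(0.42, 1.75), 2))  # 1.29
```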

Fig. 1

Scatterplots of the association between body mass index (BMI) and polyglutamine disease-associated genes (PDAGs). a, b Both shorter and longer CAG repeat lengths in the longer allele were associated with a lower BMI for ATXN1 (a) and with a higher BMI for CACNA1A (b). c Larger CAG repeat numbers in the longer allele of ATXN2 were associated with a higher BMI. d The difference in CAG repeat number between the shorter and longer ATXN3 alleles had a quadratic association with BMI. Larger and smaller differences between these alleles were associated with a lower BMI. e, f Both shorter and longer CAG repeats in the shorter allele were associated with a higher BMI for ATXN7 (e) and with a lower BMI for TBP (f). g The longer allele of AR had a quadratic association with BMI. Shorter and longer CAG repeats were associated with a higher BMI. Beta-coefficient ± SE. CI confidence interval

Fig. 2

The effect of CAG repeat size variations in polyglutamine disease-associated genes (PDAGs) on body mass index (BMI). This plot illustrates that in combination, CAG repeat size variations in only seven PDAGs can account for a variation of up to ~0.42 kg/m2 in BMI. Please refer to the Methods section for details on how the ‘Predicted Score’ was constructed

Discussion

Metabolic disturbances occur in many neurodegenerative diseases, including polyglutamine disorders [23]. Spinocerebellar ataxia type 3 (SCA3), one of the most prevalent polyglutamine diseases worldwide, is frequently complicated by unintended weight loss. The number of CAG repeats in the longer ATXN3 allele was shown to have an inverse association with BMI in SCA3 patients [24, 25]. We found that a larger difference between both ATXN3 alleles was associated with a lower BMI. These results are consistent with the decreased BMI in SCA3 patients as the longer ATXN3 allele needs to have a relatively large number of CAG repeats in order for the difference with the shorter allele to be large. Furthermore, amyotrophy has been reported in SCA1 and SCA6 patients with SCA1 patients displaying a higher resting state energy expenditure and fat oxidation compared to age, sex and body composition matched controls [26, 27]. Consistent with these characteristics, the curvilinear association between BMI and the CAG repeat number in the longer ATXN1 allele indicated that larger CAG repeat numbers also led to a lower BMI. The association between BMI and the CAG repeat length in CACNA1A was not consistent with the reported SCA6 characteristics, suggesting that the relationship between CACNA1A and BMI is different for the ‘healthy’ range compared to the diseased range. Including the diseased range in future research could provide additional insights into the overall effect of CACNA1A on BMI. Together, these results indicate that the effects of PDAGs on metabolism are not confined to the pathological range and may represent a homoeostatic property of the polyglutamine domains of the encoded proteins in systemic energy regulation [28].

The other PDAGs have also been suggested to be involved in the regulation of BMI and metabolism. For instance, normal-range AR CAG repeat sizes, which determine androgen receptor sensitivity to testosterone, have previously been associated with body fat mass and blood lipid levels [29, 30]. Recent research also implicates ATXN2 in metabolic processes. ATXN2 knockout or transgenic mice display changes in body weight, insulin sensitivity and fertility [31, 32]. Furthermore, a single nucleotide polymorphism (SNP) located in the A2BP1 gene, which encodes the ataxin-2 binding protein 1 (also known as FOX-1), was associated with percentage of total body fat in Pima Indians [33], while an SNP in ATXN2L, which encodes ataxin-2-like protein, an interactor of ataxin-2, has been related to BMI [5, 34]. Other obesity-related SNPs change the affinity of the thymine–adenine–thymine–adenine (TATA) box-binding protein (TBP) encoded by TBP for human gene promoters, suggesting a possible pathophysiological mechanism for obesity involving TBP [35].

Cognitive and behavioural changes are key characteristics of polyglutamine disorders. However, little is known about the extent to which repeat variations within the ‘healthy’ range result in similar deficits and whether these could cause changes in BMI. In previous research, we found a significant association between the risk of lifetime depression and the CAG repeat numbers in ATXN7 and TBP [12]. The association between depression and obesity has been well established and a meta-analysis of longitudinal studies showed that obese individuals had a 55% increased risk of depression and depressed individuals had a 58% increased risk of becoming obese [36]. Interestingly, the association of the CAG repeat number in the shorter ATXN7 allele with BMI and depression was consistent with larger CAG repeat numbers leading to both a higher risk of lifetime depression and a higher BMI [12]. ATXN7 encodes ataxin-7 (ATXN7), a member of the TATA-binding protein-free TAF complex (TFTC) and the SPT3/TAF9/GCN5 acetyltransferase (STAGA) complex. These complexes are coactivators involved in the initiation of gene transcription via RNA polymerase II [37]. Through modification of the transcription of RNA polymerase II-dependent genes, ATXN7 repeat variations could cause obesity resulting in depression via metabolic pathways, such as inflammatory responses, dysregulation of the hypothalamic–pituitary–adrenal axis (HPA axis) and alterations in the brain due to diabetes mellitus and insulin resistance [38,39,40,41,42,43,44,45,46,47,48,49]. In addition, increased psychological stress, body dissatisfaction, physical pain and a decreased self-esteem due to obesity could also cause depression [50,51,52]. Conversely, repeat polymorphisms in ATXN7 could cause depression leading to obesity through the adoption of an unhealthy lifestyle, including insufficient physical exercise and unhealthy dietary preferences [53]. 
AR CAG repeat variations have also been previously associated with depression in men. Larger CAG repeat numbers in AR lead to lower transcriptional effects of testosterone and were associated with depressive symptoms [54,55,56]. Furthermore, larger CAG repeat numbers in AR were associated with lower scores on three cognitive tests in older white men, and decreased effects of testosterone have been associated with cognitive problems in rodents, such as decreased performance in spatial learning, memory and inhibitory avoidance tasks. Several studies have shown that working memory and episodic memory are core cognitive processes critical for food-related decision-making, and that disruption of these processes contributes to problems with appetite control and weight gain [57]. Therefore, high CAG repeat numbers in AR and the resulting decreased transcriptional effects of testosterone might lead to cognitive deficits that in turn could result in changes in appetite control and BMI.

We recognize that our cohort size was relatively small compared to the sample sizes usually included in GWAS. However, the fact that we were able to find many tandem repeat polymorphisms in the PDAGs significantly associated with BMI implies that our study was sufficiently powered to detect these effects. In addition, our sample size allowed us to find relatively small effects similar to, for instance, the effect of the type 2 diabetes-associated A allele at rs9939609 linked to the FTO gene, which was associated with a median per-allele change of ~0.36 kg/m2 and explained ~1% of the variance in BMI, or the effect of the C allele at rs17782313 linked to the MC4R gene, which was associated with a difference in BMI of ~0.22 kg/m2 per allele and explained ~0.14% of the variance in BMI [58, 59]. Although increasing the sample size might have resulted in the detection of even more and even smaller effects, we must note that determining the repeat numbers in these genes was not a straightforward process, could not be automated and was extremely laborious. This fact also compelled us to focus on a set of predefined and promising genes with repeat variations which are known to be (1) related to changes in protein function, and (2) causative of (brain) disorders which are accompanied by profound metabolic disturbances. Nonetheless, many more interesting tandem repeat polymorphisms exist in the human genome, and future research is warranted to delineate the effects of these other repeat polymorphisms on BMI [60]. Recently, a method was described that could allow genome-wide imputation of short tandem repeats (STRs) from SNP data using a phased SNP/STR haplotype panel generated from available whole-genome sequencing datasets [61]. These SNP/STR haplotypes have not been published yet, but once they become publicly available, this panel could be used to test the association between many STR variations and BMI within the myriad of existing data.

To our knowledge, the association between normal-range CAG repeat polymorphisms in the nine PDAGs and BMI has not been assessed before, and the SNPs previously found to be associated with BMI were not located in or near the investigated PDAGs [5]. Through linkage disequilibrium (LD) analysis, several studies found haplotypes associated with expanded or large ‘healthy’ range CAG repeat numbers in ATXN1, CACNA1A, ATXN7 and AR [62,63,64,65,66,67]. However, these associated haplotypes differed substantially per investigated population. In addition, the CAG repeat sequences in PDAGs are directly translated into the respective proteins and have important functional consequences [68]. Therefore, the CAG repeat sequence itself is likely to lead to the variation in BMI. Although we cannot fully exclude potential modifying effects of other genetic loci in linkage disequilibrium with PDAGs, the fact that tagging SNPs in or around PDAGs have not been related to BMI before suggests that the influence of other genetic variants in linkage disequilibrium with these triplet repeats is likely to be minimal [5].

In summary, we found the CAG repeat size in seven PDAGs to be significantly associated with BMI in two large study populations accounting for 0.75% of the total variation. As PDAGs are known to be critically implicated in processes which recently were identified through pathway analysis to be involved in obesity susceptibility, including synaptic function and glutamate signalling, and can be specifically targeted by promising therapeutics currently in development for polyglutamine disorders, including gene suppression strategies [69], our results open a novel therapeutic avenue for obesity treatment. In conclusion, we demonstrate the relevance of trinucleotide repeats as a new class of genetic risk factors of obesity and provide further evidence for the fundamental link between the brain and metabolism.