Genetic common variants associated with cerebellar volume and their overlap with mental disorders: a study on 33,265 individuals from the UK-Biobank

Interest in the cerebellum is expanding given evidence of its contributions to cognition and emotion, and dysfunction in various psychopathologies. However, research into its genetic architecture and shared influences with liability for mental disorders is lacking. We conducted a genome-wide association study (GWAS) of total cerebellar volume and underlying cerebellar lobe volumes in 33,265 UK-Biobank participants. Total cerebellar volume was heritable (h2SNP = 50.6%), showing moderate genetic homogeneity across lobes (h2SNP from 35.4% to 57.1%; mean genetic correlation between lobes rg ≈ 0.44). We identified 33 GWAS signals associated with total cerebellar volume, of which 6 are known to alter protein-coding gene structure, while a further five mapped to genomic regions known to alter cerebellar tissue gene expression. Use of summary data-based Mendelian randomisation further prioritised genes whose change in expression appears to mediate the SNP-trait association. In total, we highlight 21 unique genes of greatest interest for follow-up analyses. Using LD-regression, we report significant genetic correlations between total cerebellar volume and brainstem, pallidum and thalamus volumes. While the same approach did not result in significant correlations with psychiatric phenotypes, we report enrichment of schizophrenia, bipolar disorder and autism spectrum disorder associated signals within total cerebellar GWAS results via conditional and conjunctional-FDR analysis. Via these methods and GWAS catalogue, we identify which of our cerebellar genomic regions also associate with psychiatric traits. Our results provide important insights into the common allele architecture of cerebellar volume and its overlap with other brain volumes and psychiatric phenotypes.


INTRODUCTION
The cerebellum has historically been ascribed a role solely in movement coordination. However, increasing evidence has underlined its relevance to cognition and emotion [1], with extensive functional connectivity with non-motor cortical regions [2][3][4] and activity during a wide range of cognitive tasks [5]. Lesions during cerebellar development lead not only to motor alterations but also to cognitive and emotional deficits [6], and represent the second highest risk factor for autism spectrum disorder (ASD) [7]. Cerebellar anatomical alterations have also been identified in most other neurodevelopmental/psychiatric disorders, with particularly strong cerebellar-specific evidence in schizophrenia [8], but also in attention deficit hyperactivity disorder (ADHD) [9] and mood disorders [10], general liability to clinical mental disorders [11] and adolescent psychopathology [12].
Cerebellar volume reductions have also been reported in unaffected relatives of people with schizophrenia, bipolar disorder and depression, being also the only structure commonly reduced across all three disorders [13] and suggesting cerebellar volume reductions to be associated with genetic risk for mental disorders. Indeed, analysing shared altered genetic expression across these three disorders as well as ADHD and ASD shows strong cerebellar tissue enrichment [14].
Twin studies show cerebellar volume to be heritable (h2 = 33.6-86.4%) [15], but little is known about its polymorphic architecture. In this study, we aim to undertake an in-depth investigation of the common variant influences on total cerebellar volume, their association with altered cerebellar gene expression, and their genetic overlap with other cortical and subcortical anatomical phenotypes and, importantly, with several of the psychiatric disorders shown to be associated with cerebellar anatomical abnormalities (i.e. ASD, ADHD, schizophrenia, bipolar disorder and depression). While a recent omnibus-GWAS study [16], using a sample overlapping with ours, included 30 cerebellar volume metrics, none of these corresponded to total cerebellar volume, and the study did not explore genetic associations with mental disorders.

METHODS
Total cerebellar volume measure generation
This study utilises T1-weighted structural brain magnetic resonance imaging (MRI) image-derived phenotype (IDP) data for ~40,000 individuals from UK Biobank (http://www.ukbiobank.ac.uk/) (Supplementary Methods). The generation and semi-automated quality control of these IDPs by UK Biobank has been described previously [17]. Our research group accessed the data in two batches, each containing approximately half of the total sample (henceforth wave 1 and wave 2), which we analysed separately and then meta-analysed.
We generated a summated total cerebellar grey-matter volume measure from the 28 cerebellar lobule IDPs [18], excluding Crus I vermis due to its small size [19]. To explore the reliability of UK Biobank's cerebellar volume measures, we used the 1273 participants in our study who had been scanned twice by UK Biobank within a 5-year interval (these second scans were not included in our main analyses); the between-scan intraclass correlation indicated high test-retest reliability of our cerebellar volume metric (ICC = 0.92). Following outlier removal, we obtained residual total cerebellar volume values after correction for the covariates of age, sex, head motion, date of scan, imaging centre attended, and head and table position in the scanner (see Supplementary Methods for details). We scaled these values, so that beta values reflect differences in standard deviations (SD) of residual cerebellar volume.
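The covariate correction and scaling step described above can be illustrated with a minimal sketch. It assumes a plain ordinary least squares regression followed by z-scoring of the residuals; the exact model and software are described only in the Supplementary Methods, and the covariates, effect sizes and simulated data below are hypothetical.

```python
import random
import statistics

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def residualise_and_scale(y, covariates):
    """Regress y on the covariates (plus intercept) by OLS; return z-scored residuals."""
    rows = [[1.0] + [float(v) for v in cov] for cov in covariates]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    mean, sd = statistics.fmean(resid), statistics.stdev(resid)
    return [(e - mean) / sd for e in resid]

# Hypothetical data: volume depends on age and sex plus noise.
random.seed(0)
ages = [random.uniform(45, 80) for _ in range(200)]
sexes = [random.randint(0, 1) for _ in range(200)]
volumes = [120 - 0.3 * a + 5 * s + random.gauss(0, 8) for a, s in zip(ages, sexes)]
z = residualise_and_scale(volumes, list(zip(ages, sexes)))
```

Because the residuals are standardised, a GWAS beta estimated on `z` is read directly in SD units of residual volume, which matches the interpretation given in the text.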

Genotyping and quality control
A description of UK-Biobank's genetic-data collection, quality control and imputation processes can be found elsewhere (http://www.ukbiobank.ac.uk/scientists-3/genetic-data/). We applied additional quality controls independently to each wave's genotypes (Supplementary Methods), including restriction to unrelated individuals of British/Irish ancestry (>96% sample) (Supplementary Fig. 1). Following local processing, from initial samples of 21,390 and 26,541 participants with genetic data in wave 1 and wave 2, 19,170 and 22,808 participants remained, with 7,003,604 and 6,935,580 genetic markers, respectively.

Comparison of GWASs from wave 1 and wave 2
Several methodologies were deployed to assess similarity between summary statistics from both waves, including between-wave SNP replication, LDSC [25] genetic correlation and PLINK [20] polygenic score analyses (Supplementary Methods).

Meta-analysis
We meta-analysed the two waves' GWASs using METAL (2011-03-25 release) [26], weighting effect sizes by the inverse of the standard errors and retaining only the 6,193,476 markers present in both waves. Independent index SNP identification and SNP-based heritability estimates were calculated using the same methods as outlined above, creating a merged wave dataset for SNPs' LD structure and estimates of h2SNP (GCTA-GREML).
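As an illustration of how two waves' per-SNP estimates can be combined, the sketch below assumes the standard fixed-effect inverse-variance scheme (weights of 1/SE²). This is a simplified stand-in, not METAL itself, and the effect sizes and standard errors are hypothetical.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study effect sizes."""
    weights = [1.0 / se ** 2 for se in ses]            # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                        # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided normal p-value
    return beta, se, z, p

# Hypothetical SNP with the same direction of effect in both waves.
beta, se, z, p = inverse_variance_meta([0.050, 0.040], [0.010, 0.010])
```

With equal standard errors the pooled beta is the simple average of the two waves (0.045 here), and the pooled standard error shrinks by a factor of √2, which is why signals that are only nominally significant in each wave can reach genome-wide significance after meta-analysis.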

Within-cerebellum analysis by lobe
To investigate the homogeneity of cerebellar volume genetic architecture, we ascertained the GCTA-GREML h2SNP and LDSC between-lobe genetic correlation estimates for 7 cerebellar lobes based on demarcations of the primary, horizontal and posterolateral fissures: anterior (I-V), superior posterior (VI-Crus I), inferior posterior (Crus II-IX) and flocculonodular (X) hemispheres, and the vermal regions of the latter three (Supplementary Methods).

Functional annotation and cerebellar gene expression
We physically mapped the extended-LD regions of each index SNP (r2 > 0.2 to index SNP) to nearby transcripts and functionally annotated index SNPs and high-LD proxy SNPs (r2 > 0.8 to index SNP) for SNP consequences using several methods (Supplementary Methods). We additionally mapped these proxy SNPs to GTEx-v7 expression quantitative trait loci (cis-eQTL), focusing on directly relevant cerebellar-labelled tissues, but also including analyses in other brain and whole-blood tissues. Use of Summary data-based Mendelian Randomisation (SMR) [27,28] allowed for assessing mediation via altered cerebellar gene expression of our meta-GWAS identified SNP-cerebellar volume associations, and separation of pleiotropic associations from those caused by linkage within the genomic region (Supplementary Methods).

Genetic correlation analysis
We used LDSC to estimate genetic correlations between our total cerebellar volume meta-GWAS summary statistics and previously published GWAS summary statistics from two studies that included different sub-regional cerebellar measures [16,29], cortical and subcortical anatomical measures [30][31][32], anthropometric traits (http://www.nealelab.is/uk-biobank/), and the psychiatric disorders of schizophrenia, bipolar disorder, major depression, ASD and ADHD [33][34][35][36][37] (Supplementary Methods). We additionally ascertained genetic overlap between cerebellar volume and these psychiatric disorders, irrespective of direction of effect, using conditional and conjunctional false discovery rate (FDR) analysis [38] (see Supplementary Methods). This included analysis of genetic enrichment in our cerebellar GWAS using stratified quantile-quantile (Q-Q) plots, and investigation of which of our COJO-identified GWAS signals contained SNPs showing evidence for a pleiotropic association with a psychiatric phenotype (conjunctional-FDR < 0.01). Finally, for our COJO proxy SNPs, we inspected GWAS catalogue for previous reports of associations with these psychiatric traits, as well as for any additional traits (Supplementary Methods).
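The logic of the conditional and conjunctional FDR step can be sketched as follows. This is a simplified empirical estimator written for illustration, not the software used in the study (see Supplementary Methods). It assumes the commonly used definitions in which the conditional FDR of a SNP is its p-value in the primary trait divided by the empirical conditional distribution function among SNPs at least as significant in the second trait, and the conjunctional FDR is the larger of the two conditional FDRs. The p-values are hypothetical, and real analyses additionally handle LD between SNPs.

```python
def conditional_fdr(p_primary, p_conditional):
    """Empirical cFDR per SNP: p_i / P(P_primary <= p_i | P_conditional <= q_i)."""
    pairs = list(zip(p_primary, p_conditional))
    out = []
    for p, q in pairs:
        # SNPs at least as significant as this one in the conditioning trait
        stratum = [a for a, b in pairs if b <= q]
        frac = sum(1 for a in stratum if a <= p) / len(stratum)
        out.append(min(1.0, p / frac))
    return out

def conjunctional_fdr(p_trait1, p_trait2):
    """Conjunctional FDR: the maximum of the two conditional FDRs."""
    c12 = conditional_fdr(p_trait1, p_trait2)
    c21 = conditional_fdr(p_trait2, p_trait1)
    return [max(a, b) for a, b in zip(c12, c21)]

# Hypothetical p-values for six SNPs in a cerebellar and a psychiatric GWAS.
p_cerebellum = [1e-6, 0.02, 0.5, 0.8, 0.3, 0.9]
p_psychiatric = [1e-4, 0.6, 0.01, 0.9, 0.2, 0.05]
conj = conjunctional_fdr(p_cerebellum, p_psychiatric)
hits = [i for i, v in enumerate(conj) if v < 0.01]  # threshold used in the text
```

Taking the maximum means a SNP is only flagged when it is supported in both traits, which is why the approach can detect shared loci regardless of the direction of effect.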

RESULTS
Meta-analysis of GWAS results for wave 1 and wave 2
Given the high correlation between waves, we combined both waves' summary statistics in a meta-GWAS (n = 33,265, SNPs = 6,193,476) (Fig. 1) (Table 4). Between-lobe genetic correlation was moderate for most (between-lobe mean rg ≈ 0.44) and all survived Bonferroni correction for the number of lobe-pairings tested (Table 4).

Annotation and mapping of genome-wide significant regions from the meta-GWAS
We found 33 conditionally independent index SNPs associated with total cerebellar volume (Table 1; Supplementary Fig. 2). All index SNPs in each wave were present within the 33 meta-GWAS index SNPs, all 33 meta-GWAS index SNPs were at least nominally significant in each wave (p values ranging from 7 × 10⁻³ to 1.4 × 10⁻²¹ for wave 1 and from 5.3 × 10⁻³ to 9.5 × 10⁻¹⁶ for wave 2), and all showed the same direction of effect across waves (Supplementary Table 5).
Functional annotation of the 33 independent GWAS signals (index SNPs and high-LD partners, r2 > 0.8) (Supplementary Tables 6, 7A, B) identified 5 containing non-synonymous SNPs leading to altered protein structure. Two of these were flagged as likely deleterious: the missense variants rs1800562 within HFE and rs13107325 within SLC39A8 transcripts. The other three non-synonymous SNPs were flagged as tolerated/benign, being within the genes EIF2AK3, PPP2R4 (alias PTPA) and MYCL. A further SNP annotated as synonymous, located within the PAPPA gene, was within our strongest GWAS signal (index SNP rs72754248).

Genetic correlations
We found high positive genetic correlation above the Bonferroni-corrected significance threshold (p < 0.0014 {0.05/35}) between our total cerebellum meta-GWAS summary statistics and those of previously published regional cerebellar measures (left & right hemispheres; IIV-V, VI-VII and VIII-IX vermal regions [29]) (Table 9A). Of the 33 GWAS signals we identified, 28 reached genome-wide significance in these previous works while 5 were novel to the literature (index SNPs rs6546070, rs6812830, rs3846716, rs3118634 and rs529059). We also found positive genetic correlation (p < 0.005 {0.05/10}) between our total cerebellar volume measure and brainstem, pallidum and thalamus volumes, as well as a trend towards a negative correlation with cerebral cortical surface area, which fell short of our Bonferroni-corrected significance threshold (Table 3A). We found no genetic correlations (p < 0.0083 {0.05/6}) with any anthropometric measure, indicating that the results do not simply reflect general body size (Supplementary Table 9B).
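The three Bonferroni-corrected thresholds quoted above follow directly from dividing the nominal alpha of 0.05 by the number of tests in each family. The short sketch below reproduces that arithmetic; the function and family labels are illustrative names only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

# Numbers of tests as stated in the text for each family of correlations.
families = {"regional cerebellar": 35, "brain structure": 10, "anthropometric": 6}
thresholds = {name: bonferroni_threshold(0.05, n) for name, n in families.items()}

for name, value in thresholds.items():
    print(f"{name}: p < {value:.4f}")
```

Each family is corrected separately, so a correlation is judged only against the number of comparisons made within its own set of phenotypes.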
We ascertained the genetic correlation between cerebellar volume and liability to psychiatric diagnoses. None showed a significant genome-wide genetic correlation with cerebellar volume, even at nominal significance (Table 3B). Stratified Q-Q plots, however, suggested a clear enrichment of schizophrenia signal and, to a lesser degree, of bipolar disorder and ASD associations within our total cerebellar volume variants (Supplementary Fig. 4). No apparent relationship was seen with major depressive disorder or ADHD. Conjunctional-FDR analysis revealed 8 of the 33 GWAS signals showing evidence for a pleiotropic relationship with a psychiatric phenotype (5 with schizophrenia, 2 with bipolar disorder, 1 with ASD, and 1 with ADHD), with one GWAS signal (index SNP rs2572397) associating with more than one psychiatric condition: it was associated with decreased cerebellar volume, decreased schizophrenia risk liability and increased ASD risk liability (Supplementary Table 10). In total, the majority (7/9) of pleiotropic associations were in opposing directions of effect to that of cerebellar volume. Finally, we report 2 of our 33 COJO GWAS signals (rs13135092 and rs1935951) as being previously associated with the psychiatric traits of schizophrenia, bipolar disorder and ASD, and with across- and between-psychiatric-disorder diagnoses (Supplementary Table 11A, B).

DISCUSSION
In this study we examined UK-Biobank brain imaging and genotype data of 33,265 individuals to investigate common allele influences on cerebellar volume. We found total cerebellar volume is moderately heritable in our sample (h2SNP = 50.6%), identifying 33 independent genome-wide significant signals (index SNPs and SNPs in LD) associated with this trait. We identified 6 within protein-coding sections of the genome while another 5 associated with cerebellar gene expression regulation. We found evidence for pleiotropy of identified variants with schizophrenia, bipolar and ASD. We did not, however, find significant genetic correlations across the whole genome, suggesting a smaller subset of pleiotropic regions and/or opposing direction of effects across these regions. Our main GWAS of total cerebellar volume identified 33 index SNPs, of which 28 had been reported genome-wide significant (p < 5e−8) in previous GWASes of sub-regional cerebellar volume measures [16,29]. The 5 other index SNPs had previously shown subthreshold associations with some of those sub-regional volumes, while reaching GWAS significance level for our composite total volume measure. This overlap suggests an important genetic homogeneity across cerebellar structures, as previously indicated by cerebellar gene expression research [39], and which is further substantiated by our findings of moderate-to-high genetic correlation between our results and those of previous sub-regional cerebellar GWASes, and also across the 7 cerebellar lobe volumes into which we divided the cerebellum following demarcations of primary, horizontal and posterolateral fissures.
We conducted follow-up analyses of each GWAS signal to identify likely causal SNPs. One signal contained the synonymous SNP rs35565319 in the IGF binding protein protease PAPPA transcript, with previous reports of possible cerebellar-specific interactional effects [40], high placenta expression and association with adverse pregnancy outcomes [41,42] and neuronal survival [43]. Five other GWAS signals contained non-synonymous SNPs altering protein structure. Of the two labelled as likely deleterious, one was the rs13107325 variant within the metal cation symporter SLC39A8 transcript, being previously associated with a wide range of traits including inferior posterior and flocculonodular lobule [44], striatum and putamen volumes [44,45], schizophrenia [33,45], neurodevelopmental outcomes and intelligence test performance [46,47] and numerous other factors [44,48,49,50] (http://www.nealelab.is/uk-biobank/). The other was the rs1800562 variant (alias Cys282Tyr) within the homoeostatic iron regulator HFE transcript, with associations with reduced putamen volume and striatal T2star signal [44], and iron and mineral regulation [44,51,52]. The other three non-synonymous SNPs included those within translation initiation factor kinase (EIF2AK3), proto-oncogene transcription factor (MYCL) and protein phosphatase 2A activator (PPP2R4 alias PTPA) protein-coding regions. The novel PTPA finding agrees with previous work on the role of phosphatase 2A in controlling cell growth and division and regulating dendritic spine morphology [53], and with its dysfunction being a known cause of spinocerebellar ataxia [54].
We also mapped 6 of the GWAS signal regions to cis-eQTLs altering expression of 14 gene transcripts. Expanding the cis-eQTL analysis to additional brain regions and whole blood, we identified a further 3 GWAS signals mapping to 5 cis-eQTLs. SMR analysis further supported possible cerebellar expression mediation of SNP-trait associations for six gene transcripts at 3 GWAS signal regions, including again the PPP2R4/PTPA transcript. The strongest SMR association was with VCAN, encoding the extracellular matrix protein Versican, which plays crucial roles in nervous system development [55,56]. The pseudogenes FAM86B3P and FAM85B were also identified from the SMR analysis, with FAM85B and the other non-coding gene cis-eQTLs for RP11-481A20.10 and RP11-481A20.11 in the same region having been implicated in mood instability and schizophrenia [57,58]. While a higher confidence can be placed on SMR-identified genes, its requirement for multiple cis-eQTL signals within a genomic region means genes with poorer coverage might be omitted; therefore, both cis-eQTL-only and SMR-identified genes should be considered for future follow-up work.
In total, therefore, from 732 unique gene transcripts overlapping with the extended-LD regions of our 33 index SNPs, functional annotation and cerebellar tissue gene expression mapping refined this to a list of 21 gene transcripts particularly warranting further interrogation (Supplementary Table 12).
Through inspection of GWAS Catalogue, we identified 2 GWAS signals (rs13135092 and rs1935951) previously associated with schizophrenia, and the former also with bipolar disorder, ASD and PGC cross-psychiatric disorder associations. Furthermore, using conjunctional-FDR analysis (leveraging genomic pleiotropy to indicate pleiotropic regions which might be below genome-wide significance for each psychiatric GWAS), we not only confirmed psychiatric associations at these 2 GWAS signals with schizophrenia, but also identified 6 other GWAS signals with evidence for psychiatric pleiotropy (rs7530673 and rs1278519 with bipolar disorder; rs7640903 with ADHD; rs3118634 and rs62012045 with schizophrenia; rs2572397 with schizophrenia and ASD). Of these 8 GWAS signals, 6 followed the opposing direction of effect that would be predicted from case/control studies [8,11], i.e. associating with increased psychiatric risk liability and decreased cerebellar size, whereas rs13135092 and rs2572397 showed the same direction of effect for both traits. Related to this, while we found evidence for enrichment of our cerebellar GWAS for schizophrenia, bipolar disorder and ASD using stratified Q-Q plots, in accordance with the majority of other structural brain phenotype GWASs [30,32], we did not find a whole-genome level correlation when using LDSC, indicating regional heterogeneity of effect directions. These results highlight the benefit of using multiple methods to investigate genetic overlap between traits, as previously stressed [38,59].
Table 3. Genetic correlation of total cerebellar volume with (A) brain-based phenotypes and (B) brain-related phenotypes previously associated with cerebellar anatomy/function.
We found strong genetic correlation between our total cerebellar volume GWAS and those of the brainstem, pallidum and thalamus [32] but not other subcortical structures, cortical surface area or thickness [30][31][32]. These results agree with previous reports of a particular clustering of these three subcortical volumes [32,60] and contrast with the significant phenotypic correlations amongst most subcortical volumes [32]. Importantly, the gene expression profile of cerebellar grey matter is quite distinct [39]. This shared common architecture, therefore, could be explained by cerebellar white matter connectivity between these regions. The major cerebellar input and output nuclei are located within the brainstem and thalamus, respectively. Cerebellar-pallidal interactions are known to occur within the cortex, thalamus and via direct connections [61][62][63], with joint roles in sensorimotor regulation, learning and reward [61]. The common allele overlap found across these four brain structures, therefore, warrants further research into the neurobiological underpinnings of this potential network and its role in psychopathology, particularly given the association between cerebellothalamic and cerebellar-basal ganglia connectivity dysfunction in individuals with schizophrenia [64,65].
There are several features of the study design to consider when interpreting the results presented. While the UK Biobank's homogeneous data collection and processing helps decrease methodological variation, the cohort does not represent the general UK population, deviating in important socioeconomic and demographic measures [66]. We further limited our analyses to participants with ancestry similar to a British and Irish reference (>96% sample), limiting the extrapolation of our results to other ancestries. Regarding the imaging data, while visual inspection of each segmentation was not possible due to the cohort size, we believe the UK Biobank's semi-automated image artefact detection, our removal of outlier measures, confirmation of reliability of cerebellar measures in individuals with repeat scans, and correction for potential noise due to participants' head motion and position within the scanner improve the validity of our cerebellar measures. The UK Biobank's IDPs, however, are not optimised for the cerebellum, which can lead to poorer registration and segmentation of individual lobules [67]. For this reason, as well as the high correlation between lobules and the cerebellum's conserved cytoarchitecture, our main analyses focused on total cerebellar volume. Lack of access to raw genotypes for the psychiatric phenotype GWASs prevented the use of methods such as bivariate GCTA-GREML, which could have brought further insight into their genetic relationship with cerebellar volume.
In conclusion, we provide a genome-wide association study of the common genetic variation underlying human cerebellar volume. We find a moderate-to-high heritability for cerebellar volume, with relatively consistent heritability across lobes, and sharing common allele influences with brainstem, pallidal and thalamic volumes. We report enrichment for schizophrenia, bipolar and ASD signals, but not for major depression and ADHD. As a guide for future functional studies, we identify 33 independent index SNPs associated with cerebellar volume and 21 unique candidate genes for follow-up work: 6 protein-coding variants and 14 cerebellar tissue cis-eQTL associations, with 6 (4 common with the latter) showing potential causal relationships with gene expression. Overall, these results advance our knowledge on the common allele architecture of the cerebellum and pave the way to further research into the neurobiological basis of its anatomy, and associations with psychiatric conditions.

DATA AVAILABILITY
Summary statistics from all GWAS analyses run are available from GWAS catalog (GCP000196).