Genome-wide association study of corticobasal degeneration identifies risk variants shared with progressive supranuclear palsy

Corticobasal degeneration (CBD) is a neurodegenerative disorder affecting movement and cognition, definitively diagnosed only at autopsy. Here, we conduct a genome-wide association study (GWAS) in CBD cases (n=152) and 3,311 controls, and 67 CBD cases and 439 controls in a replication stage. Associations with meta-analysis were 17q21 at MAPT (P=1.42 × 10−12), 8p12 at lnc-KIF13B-1, a long non-coding RNA (rs643472; P=3.41 × 10−8), and 2p22 at SOS1 (rs963731; P=1.76 × 10−7). Testing for association of CBD with top progressive supranuclear palsy (PSP) GWAS single-nucleotide polymorphisms (SNPs) identified associations at MOBP (3p22; rs1768208; P=2.07 × 10−7) and MAPT H1c (17q21; rs242557; P=7.91 × 10−6). We previously reported SNP/transcript level associations with rs8070723/MAPT, rs242557/MAPT, and rs1768208/MOBP and herein identified association with rs963731/SOS1. We identify new CBD susceptibility loci and show that CBD and PSP share a genetic risk factor other than MAPT at 3p22 MOBP (myelin-associated oligodendrocyte basic protein).

CBD is a late-onset neurodegenerative disorder that can only be diagnosed upon neuropathologic examination, which has impeded the ability to conduct genome-wide association studies (GWASs). Similar to progressive supranuclear palsy (PSP), Alzheimer's disease (AD), and chronic traumatic encephalopathy, CBD is a tauopathy, having abnormal aggregates of microtubule-associated protein tau in the brain 1. Neuropathologic diagnostic criteria for CBD are based on tau immunohistochemistry, requiring tau inclusions in neurons and glia, with tau astrocytic plaques, and extensive thread-like pathology in both grey matter and white matter 1.
CBD was first described in 1968 by Rebeiz and colleagues, who identified three cases with a distinct movement disorder and neuropathologic profile 2. These patients presented with what is now termed corticobasal syndrome (CBS), in which patients can exhibit levodopa-unresponsive parkinsonism, asymmetric akinesia and rigidity, accompanied by ideomotor apraxia, dystonia, and myoclonus. Patients who present with the archetypal CBS, however, do not necessarily have CBD pathology upon neuropathologic examination. Furthermore, autopsy-confirmed CBD cases can often present with several different clinical syndromes 3, which are frequently associated with other underlying neurodegenerative disorders such as PSP, AD, and frontotemporal dementia 4. This phenotypic variability results in <50% of patients with CBS having CBD at autopsy [5][6][7].
There are few reported families affected with CBD or PSP [8][9][10], yet genetic association studies of PSP have repeatedly shown an increased risk for individuals carrying the H1 MAPT haplotype at chromosome 17q21 (refs 11-14). An association of CBD with the H1 haplotype was first reported in a series of 18 cases (16 corticobasal syndrome patients and two autopsy-confirmed CBD cases) 15 and subsequently confirmed in a larger autopsy series of 57 cases 16. Recently, the first PSP GWAS 17 discovered three non-MAPT susceptibility loci at STX6, EIF2AK3, and MOBP, raising the possibility that additional non-MAPT genetic risk factors or modifiers may also exist for CBD. Recognition of the lack of specificity of the corticobasal syndrome for CBD has promoted increasing interest among clinicians to have autopsy confirmation of patients thought to have CBD. Brain banks focusing on atypical parkinsonian disorders, such as that supported by the Foundation for PSP|CBD and Related Brain Diseases 18, have facilitated accumulation of a sufficient number of CBD cases to make identification of genetic contributions more feasible. Here, we report a two-stage GWAS in autopsy-proven CBD cases (n = 152 discovery stage, n = 67 replication stage) compared with controls (n = 3,311 discovery stage, n = 457 replication stage). The results confirm the association at MAPT and identify a novel susceptibility locus at 8p12 associated with CBD at the genome-wide significant level, as well as suggestive associations at 2p22 and 3p22. Importantly, we show that CBD and PSP share a genetic risk factor other than MAPT, at 3p22 MOBP. These findings highlight the genetic similarities and differences between CBD and PSP and suggest that the two disorders may, at least in part, share common disease processes.

Results
Discovery GWAS and replication. Genotype data from 152 autopsy-proven CBD cases and 3,311 controls (Table 1) were previously generated as part of the PSP GWAS 17. The National Institutes of Health Office of Rare Diseases Research criteria were used for making a neuropathologic diagnosis of CBD 1. Discovery stage cases collected from multiple institutions (Supplementary Table 1) were genotyped by the Center for Applied Genomics at the Children's Hospital of Philadelphia using Human660W-Quad Infinium BeadChips, and control samples were genotyped using the Illumina Human HapMap550 Infinium BeadChip. Replication stage samples included 67 autopsy-proven CBD cases (Mayo Clinic Florida Brain Bank, 6 CBD cases that failed quality control and 61 new CBD cases collected since the GWAS genotyping was performed) and 457 control individuals (mean age at blood draw = 74 years) collected from Mayo Clinic Florida, all of whom were clinically diagnosed as being free of any neurological disorder. For the replication stage, genotyping was performed on the 67 CBD cases, and on 30 CBD cases from the discovery stage as an internal genotyping control, using TaqMan genotyping assays or Sanger sequencing.
These data were subjected to accepted GWAS quality control measures. Individual samples were excluded for a genotype failure rate >2%, cryptic relatedness, or duplication based on identity by state. Genetic outliers were excluded using the distance-to-nearest-neighbour approach, and population substructure was assessed using multidimensional scaling (MDS) analysis with PLINK 19 (http://pngu.mgh.harvard.edu/~purcell/plink/; Supplementary Fig. 1). Quantile-quantile (QQ) plots and the genomic inflation factor (λ) show that including the first principal component as a covariate in logistic regression analyses was sufficient to correct for population substructure, reducing λ from 1.06 (unadjusted) to 1.01 (adjusted; Supplementary Fig. 2).
For the discovery stage, we analysed association between disease and 533,898 single-nucleotide polymorphisms (SNPs) in 152 CBD cases and 3,311 control individuals by conditional logistic regression under an additive model, with the first MDS principal component as a covariate, using PLINK 19. Given the neuropathologic, clinical, and genetic overlaps between CBD and PSP, we selected the top PSP GWAS SNPs at MAPT (the H1c subhaplotype-tagging SNP, rs242557), MOBP, EIF2AK3, and STX6 to test for association with CBD. The discovery stage identified one genome-wide significant association at 17q21.31, which encompasses MAPT (rs393152, odds ratio (OR) = 3.45, P = 6.71 × 10−9; Fig. 1a; Table 2), and 12 SNPs in LD with rs393152 (Supplementary Table 2). QQ plots including and excluding SNPs at 17q21 show that the P value distribution is consistent with the null, except for departures in the extreme tail (Supplementary Fig. 3). We also found nominal evidence for association at two novel susceptibility loci: rs643472 (OR = 1.88, P = 7.12 × 10−7), an intronic SNP in lnc-KIF13B-1 (TCONS_00014956, a large intergenic non-coding RNA, or lincRNA) located between KIF13B and DUSP4 on chromosome 8p12, and rs963731 (OR = 2.46, P = 2.04 × 10−6), an intronic SNP in SOS1 on chromosome 2p22 (Supplementary Fig. 4). Testing for association with the top PSP GWAS SNPs identified rs1768208 (OR = 1.65, P = 3.86 × 10−5), an intronic SNP in MOBP (myelin-associated oligodendrocyte basic protein), and rs242557 (OR = 1.48, P = 1.20 × 10−3) at the MAPT locus to be associated with CBD. The top SNP at EIF2AK3 (rs7571971) did not show an association with CBD (P = 0.057; OR = 1.27), but the OR was of similar effect size to that observed for PSP (Supplementary Table 3).

In an independent cohort of 67 autopsy-proven CBD cases from the Mayo Clinic Florida Brain Bank and 457 control individuals, a replication stage was performed for a total of seven SNPs: the top five SNPs from the discovery stage (P < 10−5) and two PSP GWAS SNPs that showed nominally significant association with CBD (rs1768208 and rs242557) (Table 2). Association testing for these seven SNPs was performed using conditional logistic regression under an additive model with age and sex as covariates. Random effects meta-analysis in PLINK, that is, a weighted average of the effect sizes of the discovery and replication stages, showed genome-wide significant associations at MAPT (rs393152, Pmeta = 1.4 × 10−12) (Fig. 1a) and lnc-KIF13B-1 (rs643472, Pmeta = 3.4 × 10−8; Table 2) (Fig. 1b). SNPs at MOBP (rs1768208, Pmeta = 2.1 × 10−7) (Fig. 1c), SOS1 (rs963731, Pmeta = 1.8 × 10−7) (Fig. 1d) and the MAPT H1c haplotype (rs242557, Pmeta = 7.9 × 10−6) had suggestive evidence for association with CBD. When the same random effects meta-analysis was performed with the number of H1 alleles as a covariate, the association with rs242557 was lost (OR = 1.18, Pmeta = 0.11). The MAFs for the top SNPs in younger controls of this study did not differ significantly from those of older controls (N = 3,720) in three data sets from the National Institutes of Health Database for Genotypes and Phenotypes (dbGaP; http://www.ncbi.nlm.nih.gov/gap; Supplementary Table 4). Furthermore, we repeated the discovery stage CBD GWAS with a different control data set, 1,986 controls from the dbGaP data set NeuroGenetics Research Consortium (NGRC-PD; phs000196.v2.p1), analysed by logistic regression under an additive model, and found similar association results (Supplementary Table 5).
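The pooling step described above can be illustrated with a minimal sketch of a random effects meta-analysis. It uses the DerSimonian-Laird estimator, a common choice that is assumed here for illustration and is not necessarily identical to the PLINK implementation used in the study. The function name and the input values in the usage note are hypothetical; per-stage standard errors are not reported in this section.

```python
import math
from statistics import NormalDist


def random_effects_meta(log_ors, ses):
    """Pool per-study log odds ratios with DerSimonian-Laird random effects.

    log_ors: per-study effect sizes (natural log of the odds ratio)
    ses:     per-study standard errors of the log odds ratios
    Returns (pooled_log_or, pooled_se, tau2, two_sided_p).
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, log_ors)) / sum_w

    # Cochran's Q and the between-study variance tau^2 (truncated at zero).
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * b for wi, b in zip(w_re, log_ors)) / sum(w_re)
    pooled_se = math.sqrt(1.0 / sum(w_re))

    # Two-sided P value from the normal approximation.
    z = pooled / pooled_se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return pooled, pooled_se, tau2, p
```

For example, calling `random_effects_meta([math.log(2.0), math.log(2.0)], [0.2, 0.4])` pools two hypothetical stages with the same odds ratio; because the two estimates agree, the between-study variance is zero and the result reduces to the inverse-variance weighted average.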
Genotype imputation was performed to increase the resolution of the top susceptibility loci. CBD cases and controls were imputed to samples from the 1,000 Genomes Project (May 2011, European reference population) 20
Top susceptibility loci were further examined with HaploRegv2 (http://www.broadinstitute.org/mammals/haploreg/documentation_v2.html), a tool for chromatin profiling, which allows one to explore the impact of non-coding variants, such as those identified in GWASs, by using in silico analysis coupled with experiments to validate biochemical and functional activators and repressors 23,24. Results show that rs643472 is located in an enhancer region in multiple cell lines and alters three transcription factor motifs, rs963731 at 2p22 alters 18 regulatory motifs, and rs1768208 at 3p22 is in high LD (R2 = 0.89-0.98) with five SNPs that are located in enhancer regions in several brain regions (Supplementary Table 6 and Supplementary Fig. 5). Two of the five SNPs (rs631312 and rs1768190) were further genotyped in CBD (n = 133) and controls (n = 457), and association testing showed similar results with CBD (OR = 1. (Table 3).
Expression quantitative trait locus analysis. To further explore potential biological mechanisms, the CBD risk loci were interrogated for RNA expression quantitative trait loci (eQTL) in our expression GWAS (eGWAS) 25 of ~400 samples from cerebellum and temporal cortex from neurodegenerative disorders (Supplementary Table 7), generated using the Illumina Whole-Genome DASL assay. Expression changes are a possible mechanism of disease-associated functional variants, but other mechanisms also need to be investigated in future studies. The eGWAS data were queried for the top SNPs associated with CBD (rs393152, rs963731, rs1768208, and rs242557). The eGWAS genotypes did not contain rs643472 (KIF13B/DUSP4 locus) or any SNPs in LD at R2 > 0.5 with rs643472, and the call rates for the two DUSP4 expression probes did not pass quality control, precluding eQTL analysis with DUSP4 expression. SNPs were tested for association with transcript levels in the eGWAS cohort by multivariable linear regression using an additive model, with the minor allele dosage (0, 1 or 2) as the independent variable, and APOE ε4 dosage (0, 1 or 2), age at death, gender, PCR plate, RIN, and (RIN-RINmean) as biological and technical covariates. The sample set was analysed as a single cohort for maximum statistical power to detect associations, while including a covariate for AD diagnosis. As we previously reported 25, there were significant SNP/transcript associations with MAPT/rs8070723 (cerebellum: P = 7.02 × 10−69; temporal cortex: P = 8.61 × 10−44) and MOBP/rs1768208 (cerebellum: P = 1.71 × 10−7; temporal cortex: P = 1.57 × 10−6). We also detected significant associations with SOS1/rs963731 (cerebellum: P = 4.61 × 10−4; temporal cortex: P = 2.80 × 10−6) upon eQTL analysis (Table 4). Beta values for SNP/transcript associations showed increased expression levels with each copy of the risk allele.
To determine whether genotype at the novel genome-wide significant association at 8p12 (rs643472, an intronic SNP in lnc-KIF13B-1) correlates with differential expression levels, we measured lnc-KIF13B-1 expression in RNA from superior frontal cortex of 22 CBD and 20 normal cases using the mirVana PARIS RNA isolation kit and quantitative PCR (qPCR) by SYBR Green. All qPCR reactions were technically replicated. Two different qPCR assays targeting TCONS_00014956 were designed, one targeting exon 1 (primer set A) and the other exon 2 (primer set B); expression levels measured with the two primer sets were well correlated (R2 = 0.97). This identified a statistical trend for greater lnc-KIF13B-1 expression with the minor risk allele (Supplementary Fig. 6).

Discussion
This first CBD GWAS on 152 autopsy-proven CBD patients and 3,311 control individuals has a clear limitation in terms of sample size. Given that CBD is a rare disorder and that it can only be diagnosed upon neuropathologic examination, this was the largest possible cohort at the time of genotyping. To control the false-positive rate due to the small sample size, a stringent significance threshold (P < 10−5) was applied in the discovery stage to select SNPs for genotyping in the replication stage. This proved to be a useful approach because the random effects meta-analysis identified two associations that did not replicate (Table 2). The disadvantage of applying stringent criteria is the possibility of false-negative associations, as evidenced by the power analysis, which shows low power to detect genome-wide associations of variants with modest effect sizes (Supplementary Tables 8 and 9).
Both CBD and PSP now have confirmed genome-wide significant associations at the MAPT locus. In addition, CBD was also associated with rs242557 (P = 7.91 × 10−6), a SNP tagging the H1c subhaplotype at 17q21, shown to be associated with PSP and located in a regulatory region influencing MAPT expression 26. In support of the MOBP locus being a shared genetic risk factor between CBD and PSP, CBD cases have an even greater OR estimate compared with PSP (MAF CBD = 0.40; OR = 1.71; MAF PSP = 0.36; OR = 1.39; ref. 17). Importantly, this is the first non-MAPT genetic risk factor shared between CBD and PSP. Yamamoto et al. 27 described MOBP as one of the most abundant proteins expressed in oligodendrocytes, similar to myelin basic protein, with the main difference being that MOBP is only expressed in myelin of the central nervous system. The results of the current study and those of the PSP GWAS indicate an overlap in genetic risk factors for PSP and CBD, but it appears that there is additional genetic variation that differs between the two disorders. Additional analyses of the novel susceptibility loci at chromosomes 8p12 and 2p22 have the potential to advance our understanding of CBD. The chromosome 8p12 locus contains three candidate genes: lnc-KIF13B-1 (TCONS_00014956, a large intergenic non-coding RNA or lincRNA); kinesin family member 13B (KIF13B); and dual-specificity protein phosphatase 4 (DUSP4), also known as MAP kinase phosphatase-2 (MKP-2).
Table note. Top CBD GWAS SNPs were tested for transcript associations in 374 human cerebellum and 399 temporal cortex samples. Linear regression was employed using an additive model, with the minor allele dosage (0, 1, 2) as the independent variable, and APOE ε4 dosage (0, 1, 2), age at death, gender, PCR plate, RIN, (RIN-RINmean) as biological and technical covariates 14.
Although the lnc-KIF13B-1 expression levels were not significantly different between CBD cases and normal controls, there was a trend for greater expression with the minor risk allele. KIF13B is a plus end-directed microtubule motor protein highly expressed in the brain, and is involved in synaptic vesicle trafficking along microtubules, neurite extension, and caveolin-dependent endocytosis [28][29][30]. Of interest, one of the PSP risk variants is in STX6, which encodes a SNARE-class protein that regulates vesicle membrane fusion; this raises the possibility that dysfunction of vesicular trafficking may be a common disease mechanism between CBD and PSP. DUSP4 acts through tyrosine and threonine-directed dephosphorylation 31. Because tau phosphorylation plays a critical role in tauopathies 32, it will be of interest to determine whether DUSP4 is responsible for the genetic association at 8p12. Indeed, in Alzheimer's disease brain tissue there is decreased enzymatic activity of tau phosphatases such as PP-2A and PP-1 (ref. 33), suggesting that a potential role for variants in DUSP4 in CBD pathogenesis would be through aberrant tau phosphorylation. Furthermore, the chromosome 2p22 locus contains son of sevenless homologue 1 (SOS1), a guanine nucleotide exchange factor for Ras that has a catalytic region known as the cdc25 domain due to sequence homology to CDC25, a dual-specificity phosphatase 34. Taken together, the novel susceptibility loci for CBD may link common genetic variation to aberrant tau phosphorylation.
In conclusion, this first CBD GWAS identified MAPT and MOBP as shared genetic risk factors between CBD and PSP. Given the significant white matter and oligodendrocyte pathology in these primary tauopathies, the genetic association with MOBP warrants functional characterization to determine its role in CBD and PSP.

Methods
Samples. The discovery stage cohort comprised 152 autopsy-proven CBD cases collected from eight institutions (Table 1 and Supplementary Table 1). Discovery stage controls were recruited from the Children's Hospital of Philadelphia Health Care Network, and although these controls are not age matched to the cases, the justification for using this cohort is that they were all genotyped at the same center using the same protocol as the CBD cases. Furthermore, because CBD is such a rare disorder with an estimated prevalence of 4.9-7.3 cases per 100,000 individuals 35, the chance of any of the young controls developing CBD at a later age is negligible. As described in the PSP GWAS 17, to ensure significant associations were not confounded by using young controls, allele frequencies for the top associated SNPs were compared and found to be similar to those of older controls (N = 3,720) from three data sets downloaded from the National Institutes of Health (NIH) Database for Genotypes and Phenotypes (dbGaP).
DNA was extracted from the brain tissue of CBD patients and from the blood samples of controls. Brain autopsies were obtained after consent of the legal next-of-kin and are considered exempt from human subject research. Written informed consent was obtained from control individuals as approved by The Institutional Review Board (University of Pennsylvania) and Mayo Clinic Institutional Review Board.
Power was estimated for the Discovery Stage and Replication Stage for various minor allele frequencies and odds ratios with 1,000 simulations of each scenario (Supplementary Tables 8 and 9). The power assessment assumes that genotype frequencies are in Hardy-Weinberg equilibrium in CBD cases and control individuals separately and that association testing is performed using logistic regression under an additive model.
Quality control. Quality control of genotyping data was performed at the individual level and then at the SNP level. For quality control, 10 individuals were genotyped in duplicate. Exclusion criteria for individual samples included high genotype failure rate (six individuals were removed because of a genotype failure rate >2%) and cryptic relatedness. Genetic outliers were excluded from further analyses on the basis of identity by state, and distance-to-nearest-neighbour analysis was conducted in PLINK 19 (five CBD cases and 341 controls were removed on the basis of Z-score distributions for the first to the fifth neighbour, Z-score < −2). Gender inconsistencies were assessed by chromosome X genotypes, which excluded six individuals on the basis of observed and expected gender. Exclusion criteria for markers included minor allele frequency (24,183 SNPs were removed because of minor allele frequency <1%), deviation from Hardy-Weinberg equilibrium in controls (1,098 SNPs had a Hardy-Weinberg equilibrium test P value ≤ 10−7), and high genotype failure rates (3,199 SNPs were removed because of genotype failure rates >2%).
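The marker-level exclusion criteria above can be sketched as a small filter over genotype counts. This is an illustration only: the thresholds are those stated in the text, but the function names are hypothetical and the Hardy-Weinberg test shown is a simple 1-degree-of-freedom chi-square goodness-of-fit test, assumed here for brevity and not necessarily the test applied by PLINK in the study.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    freq_b = (2 * n_bb + n_ab) / (2 * n)
    return min(freq_b, 1.0 - freq_b)


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: no departure can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_marker_qc(counts, n_missing,
                     maf_min=0.01, hwe_p_max_excluded=1e-7, max_failure=0.02):
    """Apply the three marker filters: MAF, HWE in controls, failure rate."""
    n_called = sum(counts)
    failure_rate = n_missing / (n_called + n_missing)
    return (minor_allele_frequency(*counts) >= maf_min
            and hwe_chi2_p(*counts) > hwe_p_max_excluded
            and failure_rate <= max_failure)
```

In this sketch the Hardy-Weinberg filter would be evaluated on control genotype counts only, matching the description above, while the allele frequency and failure rate filters would use all samples.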
Population substructure. MDS was applied to a pruned data set (~140,000 markers) using PLINK. Scatter plots for the first two principal components were generated using R (Supplementary Fig. 1a). Analysis of population stratification showed that the first principal component was sufficient to reduce the genomic inflation factor (λ) from 1.06 to 1.01, as illustrated by the quantile-quantile (QQ) plots of observed versus expected P values. Using the first two principal components resulted in λ = 1.02 (Supplementary Fig. 2b); thus, only the first MDS principal component was used as a covariate for all subsequent analyses.
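For readers unfamiliar with the genomic inflation factor, the following minimal sketch shows one standard way to compute it from a list of association P values: convert each P value to a 1-degree-of-freedom chi-square statistic and divide the observed median by the median expected under the null. The function name is hypothetical, and the sketch is not taken from the study's own pipeline.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Return lambda: observed median chi-square divided by the null median.

    Each P value must lie in (0, 1].
    """
    norm = NormalDist()
    # A two-sided P value p corresponds to chi-square = z**2,
    # where z is the standard normal quantile at p / 2.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A value near 1 indicates that the bulk of the test statistics follows the null distribution; values above 1 suggest inflation, for example from uncorrected population substructure.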
Association analysis. CBD versus control association was tested by conditional logistic regression under an additive model using the first MDS principal component as a covariate. We also tested for association from a subset of samples from stage 1, excluding outliers based on population substructure (Supplementary Fig. 1b)
eQTL analysis. These data were generated and previously reported in a brain expression GWAS (eGWAS) by Zou et al. 25, where the detailed mRNA extraction and quality assessment using RNAqueous kit (Ambion, Grand Island, NY) and RNA 6000 Nano Chip (Agilent, Santa Clara, CA), Whole Genome DASL assay (Illumina, San Diego, CA), and data quality control methods can be found. This eGWAS cohort included 374 cerebellar and 399 temporal cortex mRNA samples from the Mayo Clinic Florida Brain Bank. Neuropathologic diagnoses for the cohort include Alzheimer's disease, PSP, CBD, Lewy body disease, frontotemporal lobar degeneration, and other diagnoses (Supplementary Table 7). The Bonferroni P value threshold for SNP/transcript association significance was P < 1.56 × 10−3 (four SNPs and eight probes).
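The Bonferroni threshold quoted above can be reproduced with a short worked calculation. The family-wise error rate of 0.05 is an assumption (it is not stated explicitly in the text), chosen because it yields the reported threshold for four SNPs and eight probes; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_snps, n_probes):
    """Per-test significance threshold for n_snps x n_probes tests."""
    return alpha / (n_snps * n_probes)


# Assumed family-wise alpha of 0.05; four SNPs and eight probes as in the text.
threshold = bonferroni_threshold(0.05, 4, 8)

# SOS1/rs963731 eQTL P values reported in the Results.
sos1_p_values = {"cerebellum": 4.61e-4, "temporal cortex": 2.80e-6}
sos1_significant = {tissue: p < threshold for tissue, p in sos1_p_values.items()}
```

Under that assumption the threshold is 0.05 / 32, and both SOS1/rs963731 associations fall below it.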
RNA preparation for lnc-KIF13B-1 expression studies. Superior frontal cortex tissue was isolated from CBD cases and normal controls from the Mayo Clinic Florida Brain Bank and matched for age at death and gender. Total RNA from brain tissue samples was extracted with acid-phenol:chloroform chemistry using the mirVana PARIS isolation kit (Ambion). Total RNA concentration was determined with a NanoDrop ND-1000 spectrophotometer (NanoDrop, Wilmington, DE). RNA quality control was performed using a 2100 Bioanalyzer (Agilent, Santa Clara, CA) to measure RNA integrity number (RIN), and only samples with a RIN >5 were included in this study. The mean RIN in frontal cortex for 22 CBD cases (5.6±0.38) was not different from that for 19 controls (6.4±0.77).
Real-time quantitative PCR for lincRNA. Five hundred nanograms of total RNA from the superior frontal cortex of CBD patients (n = 22) and normal controls (n = 19) were reverse transcribed with the High Capacity cDNA Reverse Transcription kit (Applied Biosystems, Carlsbad, CA). Quantitative PCR (qPCR) was performed using Fast SYBR Green PCR Master Mix (Applied Biosystems) and StepOnePlus (Applied Biosystems) according to the company's suggested protocol and thermal cycling programme. A final dissociation step was performed at the end of the qPCR assay to evaluate the specificity of amplification. All qPCR reactions were technically replicated. Two different qPCR assays targeting TCONS_00014956 were designed, one targeting exon 1 (Primer set A) and the other exon 2 (Primer set B). All the primers were purchased from Sigma Life Science (Sigma, St Louis, MO) with the sequences shown in Supplementary Fig. 6. The relative levels of lincRNA were calculated by the comparative Ct method using StepOne Software v2.3 (Applied Biosystems, Grand Island, NY) and GeneEx 5.3.2 (Multid Analyses, Göteborg, Sweden). GAPDH was used as a normalization control. Statistical analysis was performed using GraphPad Prism v5.04 (GraphPad, La Jolla, CA).
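The comparative Ct calculation mentioned above can be sketched as follows. This is the textbook 2^(−ΔΔCt) form, which assumes roughly 100% amplification efficiency for both the target and the reference gene; the study itself used StepOne Software and GeneEx, whose exact settings are not described here. The function name and the Ct values in the usage note are hypothetical.

```python
from statistics import mean


def relative_expression(target_cts, reference_cts,
                        calibrator_target_cts, calibrator_reference_cts):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Each argument is a list of Ct values from technical replicates.
    target:     lincRNA assay in the sample of interest
    reference:  normalization gene (GAPDH in the text) in the same sample
    calibrator: the same two assays in the comparison (e.g. control) sample
    """
    # Average technical replicates, then normalize target to reference.
    delta_sample = mean(target_cts) - mean(reference_cts)
    delta_calibrator = mean(calibrator_target_cts) - mean(calibrator_reference_cts)
    # Fold change assumes a doubling of product per cycle.
    return 2.0 ** -(delta_sample - delta_calibrator)
```

For instance, with hypothetical replicate Ct values `relative_expression([24.0, 24.0], [18.0, 18.0], [25.0, 25.0], [18.0, 18.0])`, the sample reaches threshold one cycle earlier than the calibrator after normalization, which corresponds to a two-fold higher relative level.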