Towards genomic database of Alexander disease to identify variations modifying disease phenotype

Alexander disease (AxD) is an extremely rare neurodegenerative disorder caused by glial fibrillary acidic protein (GFAP) gene mutations. Compared with the cerebral type, which is characterized by infantile onset, the bulbospinal type and intermediate form are associated with a late onset, spanning from juveniles to the elderly, and more diverse clinical spectrum, suggesting the existence of factors contributing to phenotypic diversity. To build a foundation for future genetic studies of this rare disease, we obtained genomic data by whole exome-sequencing (WES) and DNA microarray derived from thirty-one AxD patients with the bulbospinal type and intermediate form. Using this data, we aimed to identify genetic variations determining the age at onset (AAO) of AxD. As a result, WES- or microarray-based association studies between younger (<45 years; n = 13)- and older (≥45 years; n = 18)-onset patients considering the predicted GFAP-mutation pathogenicity identified no genome-wide significant variant. The candidate gene approach identified several variants likely correlated with AAO (p < 0.05): GAN, SLC1A2, CASP3, HDACs, and PI3K. Although we need to replicate the results using an independent population, this is the first step towards constructing a database, which may serve as an important tool to advance our understanding of AxD.

form in addition to GFAP mutations, while the almost constant phenotype of the cerebral type may be determined mostly by GFAP mutations themselves.
Recent genome-wide association studies on Huntington disease and Duchenne muscular dystrophy, both of which are monogenic neurological diseases, identified some susceptible loci as modifier genes that are likely to contribute to the diversity of AAO and age at loss of ambulation, respectively 5,6 . However, the literature on extremely rare diseases including AxD mainly comprises case reports from different institutes. An association study involving a number of AxD patients has not been reported to date.
We collected blood samples from and clinical information on AxD patients referred from facilities throughout Japan, and obtained genomic data from 31 bulbospinal-type and intermediate-form AxD patients by whole-exome sequencing (WES) and microarray analysis. Then, we performed association analyses focusing on AAO in an attempt to identify genes responsible for the phenotypic diversity of the bulbospinal type and intermediate form. Although no genome-wide significant variant was detected, we identified genetic variations likely to be associated with AAO by focusing on AxD-related genes. Table 1. Twenty-five patients had already been reported in the literature [7][8][9][10][11][12][13][14][15][16][17] . The distribution of AAO and clinical classification (bulbospinal type or intermediate form) are shown in Fig. 1. The AAO was distributed widely (5 to 72 years) and bimodally: one peak was in those in their teens to thirties and the other peak was around 60 years, and the boundary was approximately 45 years. Thus, we defined the subjects with an onset at younger than 45 years as a "younger-onset" group (n = 13) and those with an onset at 45 years or older as an "older-onset" group (n = 18). Although the exact AAO of patient 12 was unknown, she was included in the younger-onset group because she had been diagnosed  group, and those with R79H belonged to the younger-onset group, whereas those with R416W were distributed in both younger-and older-onset groups. The remaining sixteen mutations were singletons. These findings suggested the difficulty of analyzing genotype-phenotype correlations for each mutation. Alternatively, we divided the mutations into two groups: "Neutral" and "Deleterious", based on the prediction tool PROVEAN and compared AAO between the two groups ( Fig. 3). Nine patients had "Neutral" and 22 patients had "Deleterious" mutations. The mean AAO was significantly different between "Neutral" (58.0 ± 8.8 years) and "Deleterious" (38.9 ± 8.8 years) groups (p < 0.01, t-test).

Results clinical information on subjects. Clinical information is presented in
WeS/Microarray analysis. WES was performed to identify exonic functional variants, and microarray was used for genome-wide analysis. In WES, the mean depth (defined as "total number of reads mapped to the target regions/total number of target regions") was 169, and the mean coverage (defined as "number of target regions covered over 50 reads/total number of target regions") was 90.7%. In microarray analysis, the average call rate was 99.8%. An association study with logistic regression was performed by considering the predicted pathogenicity of the GFAP mutations described above. The number of variants investigated in WES and microarray analyses were 38,679 and 266,925, respectively (Fig. 4). Forty-six variants (37 genes) had a p-value < 0.01 in WES (Supplementary  Table S1) and 645 variants (188 gene loci) met this value in microarray analysis (Supplementary Table S2). Of the forty-six variants with a p-value < 0.01 in WES data, thirty-five variants (76%) were reproducible in microarray data. The p-value of the recessive model tended to be unavailable or high mainly due to the absence or too small a number of minor allele homozygotes. No variant had a p-value below the significance level after Bonferroni correction using either analysis (WES-based significance: p < 1.29 × 10 −6 , microarray-based significance: p < 1.87 × 10 −7 ).
We additionally searched for modifiers solely in 19 patients with mutations that cause the bulbospinal type, because their symptoms are comparatively mild and the effect of these GFAP mutations might not be severe (Supplementary Materials S3 and S4). Sixteen and 22 variants showed p-values below 1.0 × 10 −5 in WES and microarray analysis, respectively. One variant, p.Ala155Ala in TRABD2B, reached WES-based significance (p = 2.05 × 10 −7 ). However, after excluding the outlier with a markedly early age at onset, no variant showed p-values below 1.0 × 10 −5 . Figure 3. The predicted pathogenicity of GFAP mutations and age at onset. The pathogenicity of GFAP mutations was divided into "Neutral" and "Deleterious" using the prediction tool PROVEAN. The mean age at onset of the "Deleterious" group (38.9 ± 8.8 years) was significantly younger than that of the "Neutral" group (58.0 ± 8.8 years) (p = 0.00999, t-test www.nature.com/scientificreports www.nature.com/scientificreports/ candidate gene approach. The lack of WES-or microarray-based significant variants may be expected due to the small sample size and weaker modifier effect than GFAP mutations. To identify likely associated variants in AxD-related genes, we employed a candidate gene approach. Among 33 genes reported to have an association with AxD in the literature (Supplementary Table S5), the variants of 5 genes had a p-value < 0.05 using next-generation sequencing data ( Table 2): upstream insertion variant of HDAC4; intron single nucleotide polymorphisms (SNPs) of CASP3 and GAN; and missense variants of PIK3CG and PIK3C2G. The variants of 7 genes were identified with a p-value < 0.05 using microarray data ( Table 2): intron SNPs of HDAC9, PIK3AP1, SLC1A2, PIK3C2G, GAN and PIK3R5; 3'UTR SNP of HDAC9; and missense variants of PIK3CG and PIK3C2G. The missense variants in PIK3CG and PIK3C2G were identified with a p-value < 0.05 using both next-generation sequencing and microarray data.
These results were verified by Sanger sequencing of the SNP with the lowest p-value in each gene (Supplementary Fig. S6-1). In all patients, genotypes of the 9 representative SNPs determined by WES/ microarray were consistent with those determined by the Sanger method, except for PIK3C2G SNP (position chr12:18651702, rs2305220) in one patient. In this particular patient, rs2305220 was T/T based on microarray; however, it was T/C based on Sanger sequencing. This discrepancy may have been due to another heterozygous SNP, rs191094996, located next to rs2305220 (Supplementary Fig. S6−2), suggesting a genotyping error caused by mismatched hybridization in microarray. When using corrected genotyping data, the p-value of PIK3C2G rs2305220 remained at < 0.05 (p = 0.0151).

Discussion
Our study is the first genome-wide analysis focusing on genetic modifiers in extremely rare monogenic diseases including AxD. The predicted pathogenicity of GFAP mutation was significantly correlated with AAO. In the association study considering GFAP mutation pathogenicity, we detected several genes likely to be associated with AAO, which may contribute to the pathophysiology of AxD, although no genome-wide variants with significance after Bonferroni correction were detected.
In this study, we obtained clinical information linked with genomic data for the first time for AxD, being helpful to elucidate phenotype-genotype associations. Because of the extreme rarity of AxD, independent patients in different hospitals were recruited. To standardize the format of their clinical data, we used a questionnaire including AAO, initial symptoms, family history, neurological examination, brain/spine MRI findings, physiological testing, and degree of disability. Furthermore, we re-evaluated original MRI data based on unified standards.
In our study analyzing patients with bulbospinal type and intermediate form of AxD, AAO was distributed widely from 5 to 72 years, and older-onset patients (AAO ≥ 45) accounted for 58%, which is a very large number of older-onset patients compared with a previous study where the mean AAO was around 21 years old 18 . That is, AxD, which belongs to monogenic neurological diseases, has a broader AAO than previously thought, suggesting the existence of modifier genes other than causative GFAP mutations contributing to the diversity of Figure 4. Flowchart of association analysis. We performed WES-and microarray-based association analysis (upper) and also adopted a candidate gene approach (lower). Initially, an association study between youngerand older-onset patients considering predicted GFAP mutation pathogenicity was conducted. There was no variant reaching genome-wide significance. Next, we focused on variants in candidate genes related to the pathophysiology of Alexander disease. We also demonstrated that the GFAP mutation pathogenicity predicted by sequence conservation was significantly correlated with AAO. The mean AAO was 20 years younger in patients with "Deleterious" compared with "Neutral" mutations. We applied this finding to association analysis.
In the WES-or microarray-wide association study, each variant was analyzed considering the predicted pathogenicity of GFAP mutations as a covariate. A WES-based association study is a novel approach, validated with microarray-based analysis. A total of 76% of the variants based on the WES-based association study with a p-value < 0.01 were reproducible in microarray-based analysis, suggesting the reliability of the WES-based association study. Such a WES-based association study is expensive compared with microarray analysis, but it has the merit of identifying functional variants directly. A WES-based association study would be particularly useful if WES data already exist.
We could not detect any variants reaching WES-or microarray-based significance. This is mainly due to not only the small sample size but also the effect of GFAP mutations. Intermediate-type patients showed a more severe phenotype and a high frequency of GFAP mutations predicted as deleterious (11 out of 12, 92%). On the other hand, bulbospinal-type patients showed a mild phenotype and their GFAP mutations included both deleterious (11 out of 19, 58%) and neutral (8 out of 19, 42%) forms. These results suggest the weaker effect of GFAP mutations in the bulbospinal type. In the quantitative trait locus (QTL) analysis of bulbospinal-type patients, although the influence of one outlier was not negligible, comparatively lower p-values were obtained. Increasing the number of patients, especially with the bulbospinal type, will be critical to identify genetic modifiers in the future.
In rare monogenic disorders, a small number of samples will lead to a weak statistical power, but it is rational to focus on the pathophysiology related to causative genes playing a primary role. With a focus on variants in candidate genes selected from the literature, five significant candidate genes with p-values under 0.05 were identified. GAN encodes gigaxonin which plays a role in the neurofilament architecture. Gigaxonin is mutated in giant axon neuropathy (OMIM #256850), in which abnormal GFAP aggregation occurs. A recent report indicated that gigaxonin targets GFAP for proteasomal degradation 19 . Interestingly, the decreased proteasomal function in astrocytes would contribute to increase the toxic effects of mutant GFAPs 20 , suggesting that the dysfunction of gigaxonin may be involved in accelerating GFAP aggregation. SLC1A2 encodes the major glutamate transporter of astrocytes, GLT-1 21 . Reductions of GLT-1 protein and mRNA were exacerbated in AxD mouse models 22 . The alteration of GLT-1 expression may affect the susceptibility to glutamate excitotoxity and neuron death. CASP3 encodes a protease, caspase 3, which plays a central role in cell apoptosis. Caspase 3 activation was reported to www.nature.com/scientificreports www.nature.com/scientificreports/ be correlated with mutant GFAPs in the C-terminal domain of GFAP, leading to the loss of astrocyte viability 23 . HDACs encode histone deacetylases, which play a role in the regulation of gene expression. Some HDACs were reported to control GFAP expression in primary human astrocytes 24 . In an exome next-generation sequencing study of two half-siblings with adult-onset AxD, the HDAC6 variant was shown to modulate a motor neuron disease-like phenotype 25 . PI3K encodes phosphatidylinositol-4,5-bisphosphate-3-kinase that is upstream of the key events leading to GFAP expression 26 , suggesting that the dysregulation of GFAP expression due to PI3K mutations may influence an abnormal GFAP aggregation process. Mutations in components of the PI3K-AKT pathway were found in patients with megalencephaly 27 , which is one of the major symptoms of intermediate form.
We validated the results of the candidate gene approach by Sanger sequencing to rule out latent genotype errors caused by WES/microarray methods. Surprisingly, ~5% errors caused by NGS in cancer samples were reported 28 . Of 273 genotyping data (9 variants of 31 patients) by Sanger sequencing, one genotype mismatched with that determined by microarray. This was probably due to another adjacent variant identified by the Sanger method. We should verify genotyping data by the Sanger method, especially when positive results are obtained by WES/microarray.
Our study has several limitations. First, because the patients were all Japanese, our results may not be applicable to other ethnicities. Second, the number of patients was too small to detect variants with a weak effect. It was difficult to collect an adequate number of patients due to the markedly low prevalence of AxD: one case per 2.7 million individuals, of which bulbospinal type and intermediate form were estimated to collectively comprise 65% 2 . Finally, we could not collect a sufficient number of patients with identical GFAP mutations. Alternatively, we applied the in silico prediction of GFAP mutation pathogenicity. Of the prediction tools based on phylogenic conservation such as PROVEAN, PolyPhen-2, SIFT, and Mutation Assessor, we selected PROVEAN because it covers insertion/deletion mutations. To improve the prediction of pathogenicity, effects of mutations on the structure or stability of GFAP protein and annotations other than conservation should be considered. Combined Annotation-Dependent Depletion (CADD) is a prediction method to objectively integrate many diverse annotations into a single measure 30 . We compared predictions by PROVEAN, PolyPhen-2, SIFT, Mutation Assessor, and CADD (Supplementary Table S7). For some mutations, predictions were not consistent among the tools. In addition, we considered predicted pathogenicity based on the protein structure or stability reported previously 31,32 (Supplementary Table S7). These findings suggest that prediction method used in this study is not definite. The use of multiple prediction tools and consideration of the protein structure or stability may be necessary in future analyses. To overcome these limitations in the future, recruiting larger numbers of AxD patients is indispensable. The identification of significant genome-wide genes will help elucidate the pathophysiology and therapeutic targets of AxD.
In conclusion, we conducted genome-wide analyses for the first time in AxD, which is an extremely rare monogenic disease, by employing two methods: WES and microarray. The association study in consideration of GFAP pathogenicity identified several variants of candidate genes with a p-value < 0.05. The results of the WES-based association study were comparable to those generated by microarray-based analysis. It is challenging to identify modifier genes with genome-wide significance for extremely rare monogenic disorders. Efforts toward establishing a genomic database of AxD may be important for confirming the results of the present study as well as identifying other potential modifier genes. Methods participants. Forty-two Japanese AxD patients of 40 families with heterozygous GFAP mutations were diagnosed between 2004 and 2016, all of whom were referred to our facility from other hospitals throughout Japan for GFAP gene analysis because of suspected AxD. Clinical information, including AAO, family history, initial symptom, neurological findings, and MRI data, were obtained by survey questionnaire form. To strictly classify the clinical types of patients, brain and spinal MRI findings were re-evaluated by experienced neurologists (RY, TY) using original image data. Four patients who gave written informed consent for genetic testing, but did not consent to participate in further research, were excluded from this study. From the 38 patients who gave written consent of this study, we excluded five patients with cerebral type because the history and prognosis of cerebral type are almost constant for each case, and thus AAO of this form may be almost exclusively determined by GFAP mutations. Of the remaining 33 patients from 31 families, who met the diagnostic criteria of bulbospinal type or intermediate form, 31 probands were included. All participants provided written informed consent. This study was performed in accordance with the declaration of Helsinki and approved by the Ethical Review Board of Kyoto Prefectural University of Medicine.
WeS and microarray analysis. We extracted genomic DNA from leukocytes and performed WES using standard protocols. DNA was captured with SureSelect Human All Exon V5 (Agilent Technologies, Santa Clara, CA, USA) and sequenced using Illumina HiScanSQ (Illumina, San Diego, CA, USA) with 108-bp paired-end reads in the NGS Core Facility at Kyoto Prefectural University of Medicine. After base calling with CASAVA v.1.8.2 (Illumina), reads were aligned to the human reference genome sequence (GRCh37/hg19) with Burrows-Wheeler Aligner v.0.7.12 33 . PCR duplicates were removed with Piacard v1.119. Single nucleotide variants and indels were called with the Genome Analysis Toolkit UnifiedGenotyper v.1.6-13 34 . Called variants were annotated with SnpEff v.4.0e 35 . Autosomal variants whose effect-impact was predicted as "High", "Moderate", or "Low" (except for synonymous variants) were analyzed. The bases without variant calls were considered to be identical genotypes to the reference allele. To ensure the accuracy of genotyping, variants with a minimum depth of 10 or lower were excluded.
In microarray analysis, we genotyped 551,839 genetic markers with InfiniumCoreExome-24 v.1.1 BeadChip (Illumina) and Genome Studio Software v.2011.1 (Illumina) according to the manufacturer's instructions. The genotype data were obtained by using Genome Studio Software v.2011.1 (Illumina). A quality control filter was applied as: minor allele frequency >0 (i.e., monomorphic markers were excluded) and a call rate ≥95% for autosomal markers.
To validate the accuracy of WES/microarray genotyping, the SNP with the lowest p-value for each significant candidate gene was chosen and further PCR and Sanger sequencing were performed. The data were analyzed by Sequencher 4.10.1 (Gene Codes Corporation, Ann Arbor, MI, USA). The PCR/sequencing primers are described (Supplementary Table S8).
Statistical analysis. Firstly, GFAP mutations were divided into "Neutral" or "Deleterious" using the prediction tool PROVEAN 36 based on sequence conservation. "Deleterious" means that the mutation is considered selectively deleterious because its alignment-based score among species is equal to or below a predefined threshold. "Neutral" means that the mutation is considered selectively neutral because its alignment-based score is above the threshold. The mean AAO was compared between patients with "Neutral" or "Deleterious" mutations using the t-test (R v.3.2.2) 37 .
Next, to identify variants related to AAO of AxD, we performed association analysis with logistic regression using WES and microarray data (Fig. 4 upper). The dependent variable was AAO as categorical data (a younger (<45 years) or older (≥45 years) onset). The independent variables were genotypes of each variant. The covariate was the predicted pathogenicity of GFAP mutations as binary categorical data ("Neutral" or "Deleterious"). Association analysis for additive, dominant, and recessive models and QTL analysis for age at onset were performed using PLINK v.1.07 software 38 . The additive model tested the effects of a minor allele dosage (0, 1, 2). Dominant and recessive models tested the effects of a minor allele between (DD, Dd) versus dd, and DD versus (Dd, dd), respectively (D and d are the minor and major alleles, respectively).
Finally, we focused on the candidate genes related to AxD pathophysiology (Fig. 4 lower). The p-values of all variations in the candidate genes were extracted from WES and microarray data. A list of candidate genes was created based on a search of PubMed using the keywords "Alexander disease" and "GFAP". We also hand-searched relevant publications and articles (Supplementary Table S5).