A multi-ethnic meta-analysis identifies novel genes, including ACSL5, associated with amyotrophic lateral sclerosis

Amyotrophic lateral sclerosis (ALS) is a devastating progressive motor neuron disease that affects people of all ethnicities. Approximately 90% of ALS cases are sporadic and thought to have multifactorial pathogenesis. To understand the genetics of sporadic ALS, we conducted a genome-wide association study using 1,173 sporadic ALS cases and 8,925 controls in a Japanese population. A combined meta-analysis of our Japanese cohort with individuals of European ancestry revealed a significant association at the ACSL5 locus (top SNP p = 2.97 × 10⁻⁸). We validated the association with ACSL5 in a replication study with a Chinese population and an independent Japanese population (1,941 ALS cases, 3,821 controls; top SNP p = 1.82 × 10⁻⁴). In the combined meta-analysis, the intronic ACSL5 SNP rs3736947 showed the strongest association (p = 7.81 × 10⁻¹¹). Using a gene-based analysis of the full multi-ethnic dataset, we uncovered additional genes significantly associated with ALS: ERGIC1, RAPGEF5, FNBP1, and ATXN3. These results advance our understanding of the genetic basis of sporadic ALS.

Amyotrophic lateral sclerosis (ALS) is the most common motor neuron disease and is characterised by progressive skeletal muscle atrophy that leads to death, mostly within 3–5 years of disease onset 1. Mutations in more than 30 genes, including SOD1, TARDBP, FUS, and C9orf72, have been identified as causes of ALS. These mutations account for approximately 50–70% of ALS patients with a family history of the disease but are found in only 3% of Japanese patients with sporadic ALS 2. A twin study estimated the heritability of sporadic ALS to be 61% 3; sporadic ALS is therefore thought to be a multifactorial disease to which multiple genetic and environmental factors contribute. Previous genome-wide association studies (GWAS) reported six common loci with genome-wide significant associations with ALS, which together explained only 0.2% of its heritability 4. To elucidate the pathophysiology of sporadic ALS and to develop appropriate therapies, the genetic background of sporadic ALS must be understood more clearly.
If causal variants shared across populations are important in the pathophysiology of sporadic ALS, a cross-ethnic meta-analysis of GWAS would be valuable. In addition, cross-ethnic GWAS is advantageous for fine-mapping because linkage disequilibrium patterns differ across populations, which can improve mapping resolution 5. Here, we report a novel genome-wide association study of 1173 sporadic ALS cases and 8925 normal controls in a Japanese population and its meta-analysis with the largest ALS study in a European population 6. We also validate the candidate region in a combined replication study using an independent Japanese dataset of 707 ALS cases and 971 controls together with a Chinese dataset 7. Using a gene-based analysis, we identify four additional genes significantly associated with ALS. The discovery of novel risk genes advances our understanding of ALS aetiology.

Discussion
In this study, a region in ACSL5 was discovered as a novel risk locus for sporadic ALS by meta-analysis of Japanese and European datasets and was replicated in the Chinese dataset and another Japanese dataset. Expression analysis showed that the risk allele is associated with increased ACSL5 expression. The expression of ACSL5 mRNA in spinal motor neurons isolated by laser-capture microdissection from 12 sporadic ALS patients and nine controls was catalogued by Batra et al. 10,11. In those data, ACSL5 mRNA expression tended to be higher in sporadic ALS than in controls (Supplementary Fig. 5; p = 0.033, Mann–Whitney U test). Similarly, another report showed that ACSL5 mRNA expression in the spinal anterior horn was upregulated in sporadic ALS patients compared with controls 12. ACSL5 is a member of the long-chain acyl-CoA synthetase family. Acyl-CoA synthetases produce acyl-CoA for numerous pathways, such as cellular lipid metabolism, transcriptional regulation, intracellular protein transport, and protein acylation, in various tissues, including skeletal muscle, liver, and brain 13. ACSL5 is a gene related to neurotoxic A1 astrocytes and is upregulated in them 14. A1 astrocytes are abundant in various neurodegenerative diseases, including ALS, and they induce the death of neurons in the central nervous system 15. We speculate that increased expression of ACSL5 could induce A1 astrocytes, cause motor neuron death, and lead to the development of ALS. Alternatively, the association of ACSL5 with ALS may be related to lipid metabolism. The risk allele (A) of rs58854276 has been reported to be associated with lower HDL-cholesterol levels in Japanese individuals 16, and several reports have indicated that dyslipidaemia increases the risk of ALS 17,18. However, the association between ALS and dyslipidaemia has not been replicated in other studies 19. Further studies are warranted to clarify the association between ACSL5 and ALS onset.
Table 1 Top three SNPs displaying p < 5.0 × 10⁻⁸ in the ACSL5 gene body region in the GWAS meta-analysis of the European and Japanese (JaCALS) cohorts.
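The Mann–Whitney U comparison of ACSL5 expression between sporadic ALS patients and controls can be sketched as follows. This is a minimal, standard-library-only illustration that uses the normal approximation with a tie correction. It is not necessarily the exact-test variant or the software behind Supplementary Fig. 5. The function name is hypothetical, and any inputs passed to it are placeholders rather than the Batra et al. data.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for sample x, two-sided p-value). Tied values get
    average ranks and the variance is tie-corrected; no continuity correction.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    values = [v for v, _ in pooled]
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p
```

With groups as small as 12 and 9, an exact permutation distribution is often preferred. The normal approximation is shown only because it fits in a few lines. The sketch also fails if every value is identical, because the variance is then zero.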
In the largest meta-analysis, comprising 23,213 cases and 71,579 controls from Japanese, European, and Chinese populations, we identified a novel locus at 6p21 that reached genome-wide significance, in addition to ACSL5. The top SNP in this locus (rs140736091) lies in the long non-coding RNA TSBP1-AS1. Some studies have reported that long non-coding RNAs are associated with ALS, but their role in ALS remains to be elucidated 20. Therefore, further replication studies and functional analyses will be needed to clarify the association between rs140736091 and ALS.
The gene-based test applied to the largest multi-ethnic meta-GWAS of Japanese, Chinese, and European datasets revealed novel genes, ERGIC1, RAPGEF5, FNBP1, and ATXN3, in addition to ACSL5.
ERGIC1 is a membrane-bound protein localised to the endoplasmic reticulum (ER)-Golgi intermediate compartment (ERGIC). The ERGIC mediates membrane traffic and the selective transport of cargo between the ER and the Golgi complex 21. ER-Golgi transport dysfunction has been reported to be a common pathogenic mechanism in SOD1-, TDP-43-, and FUS-associated ALS 22, and ER stress has been implicated in ALS aetiology. A combined GWAS analysis of the genetic overlap between ALS and frontotemporal dementia-spectrum neurodegenerative diseases identified rs538622 near ERGIC1 23.
RAPGEF5 is a guanine nucleotide exchange factor for members of the Ras subfamily of GTPases, which function in signal transduction for cell growth and differentiation as guanosine diphosphate (GDP)/guanosine triphosphate (GTP)-regulated switches cycling between an inactive GDP-bound and an active GTP-bound state 24. RAPGEF5 has been reported to be associated with telencephalic neurogenesis 25, and its transcript is expressed predominantly in the brain.
FNBP1, a member of the formin-binding protein family, is a membrane-associated protein that plays an important role in clathrin-mediated endocytosis 26. FNBP1 is upregulated in the spinal cord of SOD1-G93A mice 27.
ATXN3 is a ubiquitously expressed deubiquitinase that plays important roles in the ubiquitin–proteasome system, transcriptional regulation, and neuroprotection 28. A recent meta-analysis of European GWAS suggested an association between an ATXN3 SNP (rs10143310) and ALS, although the SNP did not reach genome-wide significance (p = 3.2 × 10⁻⁷) 6. A CAG repeat expansion in the coding region of ATXN3 causes spinocerebellar ataxia type 3 (SCA3) 29. SCA3 and ALS share pathological features, such as TDP-43-positive inclusions in the lower motor neurons of the spinal anterior horn and brainstem 30. ATXN2, another gene involved in cerebellar ataxia, has also been described as a risk gene for sporadic ALS: an intermediate repeat expansion in ATXN2 is associated with ALS risk 31.
In conclusion, this multi-ethnic GWAS identified an association of the ACSL5 locus with ALS. This locus was identified by combining GWAS results from our Japanese dataset with the largest set of European GWAS data and was replicated in an independent Japanese cohort and a Chinese cohort. In addition, gene-based analysis identified ERGIC1, RAPGEF5, FNBP1, ACSL5, and ATXN3. Because these genes were identified only at the discovery stage, further replication or functional analyses in ALS are warranted. Nevertheless, the discovery of novel risk loci advances our understanding of ALS aetiology.

Methods
Study subjects in the Japanese cohort. In the discovery cohort, we performed a GWAS in Japanese sporadic ALS cases from the Japanese Consortium for ALS research (JaCALS) 32 and in normal controls from the Tohoku Medical Megabank Project (TMM) 33,34 . In the replication cohort, we obtained DNA from ALS patients registered by BioBank Japan 35,36 and normal controls registered in the Pharma SNP Consortium. The ethics committees of the respective research projects approved this study. Written informed consent for this study was obtained from all the participants.
DNA extraction from ALS patients. To extract genomic DNA, peripheral whole-blood samples were processed using an Autopure LS system (Qiagen, Hilden, Germany) for automated nucleic acid purification according to the manufacturer's instructions. We omitted RNase treatment, measured the concentration of double-stranded DNA with PicoGreen (Life Technologies, Carlsbad, CA, USA), and adjusted the DNA concentration to 200 ng/μL in Elution Buffer (Qiagen).
Japanese ALS patients for the discovery study. The case cohort comprised 1,245 candidate sporadic ALS patients recruited from the Japanese Consortium for ALS research (JaCALS), which included 32 neurology facilities in Japan. Patients with a family history of ALS were excluded. The included patients were diagnosed with definite, probable, probable laboratory-supported, or possible ALS based on the revised El Escorial criteria 37.
Japanese controls for the discovery study. The control cohort was the prospective cohort of the Tohoku Medical Megabank Project (TMM) 33 … 38 using Impute 4 39. The individuals whose genotypes were imputed were not included in the 2KJPN reference panel 40. Following the protocol of Shido et al. 41, the imputed genotype data in Oxford GEN format were converted to PLINK BED format by selecting, for each SNP and individual, the genotype with the highest posterior probability. Genotypes whose highest posterior probability was below 0.9 were treated as missing. The final imputed dataset comprised 1245 cases, 9924 controls, and 27,893,192 SNPs.
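The GEN-to-BED hard-calling rule described above keeps the most probable genotype and sets it to missing when that probability is below 0.9. It can be sketched as follows. This is an illustrative sketch, not the Shido et al. protocol or its code. It assumes the classic GEN layout: five leading identifier columns followed by one (AA, AB, BB) probability triplet per individual. The function names are hypothetical, and the sketch returns allele-B counts rather than writing a PLINK BED file.

```python
from typing import List, Optional


def hard_call(p_aa: float, p_ab: float, p_bb: float,
              threshold: float = 0.9) -> Optional[int]:
    """Return the number of B alleles (0, 1, 2) of the most probable genotype,
    or None (missing) if that highest posterior probability is below threshold."""
    probs = (p_aa, p_ab, p_bb)
    best = max(range(3), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


def gen_line_to_calls(line: str, n_leading: int = 5,
                      threshold: float = 0.9) -> List[Optional[int]]:
    """Convert one GEN line into per-individual hard calls.

    Assumes n_leading identifier columns (e.g. SNP ID, rsID, position,
    allele A, allele B), then one probability triplet per individual."""
    fields = line.split()
    probs = [float(v) for v in fields[n_leading:]]
    if len(probs) % 3 != 0:
        raise ValueError("probability columns are not a multiple of 3")
    return [hard_call(*probs[i:i + 3], threshold=threshold)
            for i in range(0, len(probs), 3)]
```

For example, a line whose second individual has probabilities 0.1/0.5/0.4 yields a missing call for that individual, because no genotype reaches 0.9.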
Japanese replication cohort. We obtained DNA from 707 ALS patients registered by BioBank Japan 35,36 and 971 normal controls registered in the Pharma SNP Consortium. Twenty-nine SNPs on chromosome 10q25.2 that passed a suggestive threshold of p < 5 × 10⁻⁶ were genotyped by multiplex PCR-based targeted sequencing. Primers were designed using Primer3 software. The multiplex PCR products were sequenced on an Illumina HiSeq 2500 (Illumina Inc., San Diego, CA, USA), and the sequence data were analysed using a standard pipeline. The methods have been described in detail previously 48.
Because genome-wide genotype data were not available for these samples, covariates such as principal components could not be derived. We therefore tested the 29 SNPs by logistic regression in PLINK without covariates.
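The replication test described above is a logistic regression of case status on genotype with no covariates. It can be sketched as follows, assuming additive coding (0, 1, or 2 copies of the effect allele) and a Wald test. This is a standard-library-only illustration, not PLINK's implementation, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _score_and_information(b0, b1, genotypes, phenotypes):
    """Score vector and observed information for logit P(case) = b0 + b1 * g."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for g, y in zip(genotypes, phenotypes):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * g
        h00 += w
        h01 += w * g
        h11 += w * g * g
    return (g0, g1), (h00, h01, h11)


def logistic_additive(genotypes: Sequence[int], phenotypes: Sequence[int],
                      max_iter: int = 50,
                      tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit an additive logistic model without covariates by Newton-Raphson.

    genotypes: effect-allele counts (0, 1, 2); phenotypes: 1 = case, 0 = control.
    Returns (odds ratio per allele, standard error of log OR, two-sided Wald p)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = _score_and_information(b0, b1, genotypes, phenotypes)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    # Standard error from the inverse information at the final estimates
    _, (h00, h01, h11) = _score_and_information(b0, b1, genotypes, phenotypes)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    p_value = math.erfc(abs(b1 / se) / math.sqrt(2))
    return math.exp(b1), se, p_value
```

With only two genotype groups, the model is saturated, so the fitted odds ratio equals the 2 × 2 table odds ratio. This makes a convenient sanity check. The sketch does not guard against complete separation of cases and controls, under which the iterations diverge.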
Meta-analysis of the Chinese and Japanese replication cohorts. We downloaded the summary statistics for the 1234 ALS cases and 2850 controls of the Chinese cohort 7. Inverse-variance meta-analysis of the Chinese cohort (1234 ALS cases, 2850 controls) and the Japanese replication cohort (707 ALS cases, 971 normal controls) was conducted using METAL (version 2011-03-25) 46. In the replication stage, p < 1.72 × 10⁻³ (= 0.05/29, Bonferroni-adjusted) was considered statistically significant.
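The fixed-effect inverse-variance combination used here (METAL's STDERR scheme) and the replication threshold can be sketched as follows. This is a minimal sketch that assumes per-study log odds ratios and standard errors have already been aligned to the same effect allele; METAL performs that alignment itself. The function name is hypothetical, and the sketch does not reproduce METAL's input handling.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(betas: Sequence[float],
                          ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance meta-analysis of per-study log odds ratios.

    Each study is weighted by 1 / SE^2. Returns (pooled beta, pooled SE,
    two-sided p-value from the normal distribution)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


# Bonferroni threshold for the 29 SNPs carried into replication (about 1.72e-3)
N_SNPS_TESTED = 29
BONFERRONI_ALPHA = 0.05 / N_SNPS_TESTED
```

A SNP would be counted as replicated when the pooled p-value from `inverse_variance_meta` falls below `BONFERRONI_ALPHA`.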
Multi-ethnic meta-analysis. Inverse-variance meta-analysis combining the Chinese cohort (1234 cases, 2850 controls) 7 with the European 6 and Japanese (JaCALS) discovery datasets described above was conducted using METAL (version 2011-03-25) 46.
Gene-based analysis of the multi-ethnic dataset. A gene-based analysis was applied to the summary statistics of the multi-ethnic meta-analysis of Japanese, European, and Chinese datasets (3,704,464 SNPs) using MAGMA (version 1.07) with default options 49. In the pre-processing (annotation) step, each SNP was mapped to genes. A SNP located within the body of a gene was annotated to that gene, whereas a SNP in an intergenic region was not annotated to any gene. MAGMA allows an annotation window to be set so that flanking regions around genes are included. Its default window in version 1.07 is zero, which is the strictest setting because no regions outside the gene body are included; we therefore used the default. After pre-processing, a total of 1,585,558 SNPs were annotated to 17,544 genes. In the calculation step, a p-value was computed for each gene from the SNPs annotated to it, using MAGMA's default SNP-wise mean model. The statistic of this model is the mean of the χ² statistics of the SNPs in a gene, and the model is highly similar to the commonly used SKAT model (with inverse variance weights) 50. A drawback of this model is reduced power to detect associations driven by rare variants. Because the minor allele frequency of the analysed SNPs in the discovery Japanese cohort exceeded 0.03, we considered this limitation negligible. Finally, 13 genes reaching the Bonferroni-corrected genome-wide significance threshold of p < 2.85 × 10⁻⁶ (= 0.05/17,544) were selected.
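The annotation step and the SNP-wise mean statistic described above can be sketched as follows. This is an illustration only, not MAGMA's code. It assumes SNPs carry meta-analysis z-scores, and the class and function names are hypothetical. It stops at the per-gene mean χ² statistic: MAGMA converts that statistic to a gene p-value using LD estimated from a reference panel, and that step is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # first base of the gene body
    end: int    # last base of the gene body (inclusive)


@dataclass
class Snp:
    rsid: str
    chrom: str
    pos: int
    z: float  # signed meta-analysis z-score


def annotate(snps: Sequence[Snp], genes: Sequence[Gene],
             window_bp: int = 0) -> Dict[str, List[Snp]]:
    """Map SNPs to every gene whose body, extended by window_bp on each side,
    contains them. With window_bp = 0 (the default used in the study),
    SNPs in intergenic regions are not annotated to any gene."""
    annotation: Dict[str, List[Snp]] = {g.name: [] for g in genes}
    for s in snps:
        for g in genes:
            if s.chrom == g.chrom and g.start - window_bp <= s.pos <= g.end + window_bp:
                annotation[g.name].append(s)
    # Genes with no annotated SNPs cannot be tested
    return {name: hits for name, hits in annotation.items() if hits}


def mean_chi2(snps: Sequence[Snp]) -> float:
    """SNP-wise mean statistic: mean of the per-SNP chi-square values (z squared)."""
    return sum(s.z ** 2 for s in snps) / len(snps)


# Bonferroni threshold for 17,544 tested genes (about 2.85e-6)
GENE_ALPHA = 0.05 / 17544
```

Note that a SNP inside overlapping genes is annotated to each of them, which is why a single signal can contribute to more than one gene-level result.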
Quantitative real-time PCR. Lymphoblastoid B cell lines (LCLs) were established from peripheral blood B cells of ALS patients using standard Epstein–Barr virus transformation techniques at the time of registration for JaCALS 51. LCLs from 20 age- and sex-matched ALS patients for each rs3736947 genotype (Supplementary Table 1) were used for the gene expression analysis. All 60 patients selected for this eQTL analysis had been diagnosed with sporadic ALS, and all underwent exome sequencing or targeted resequencing, as previously described 2. None carried a pathogenic variant in ALS-related genes such as SOD1, FUS, or TARDBP. Total RNA was extracted from the LCLs using the PureLink RNA Mini kit (Thermo Fisher Scientific, Waltham, MA, USA) and reverse-transcribed using the SuperScript IV VILO Master Mix (Thermo Fisher Scientific). Real-time PCR was performed using the THUNDERBIRD SYBR qPCR Mix (Toyobo, Osaka, Japan) and the CFX96 system (Bio-Rad, Hercules, CA, USA) according to the manufacturer's instructions. The expression level of the internal control, β2-microglobulin, was quantified simultaneously. The primers are listed in Supplementary
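The text does not state how ACSL5 expression was normalised to the β2-microglobulin internal control. One common approach, assumed here purely for illustration, is the 2^−ΔCt method, where ΔCt = Ct(ACSL5) − Ct(β2-microglobulin), followed by averaging within each rs3736947 genotype group. The function names and genotype labels below are hypothetical. The method also assumes roughly 100% amplification efficiency for both assays.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """2^-dCt expression of the target relative to the reference gene,
    with dCt = Ct(target) - Ct(reference); assumes ~100% PCR efficiency."""
    return 2.0 ** -(ct_target - ct_reference)


def expression_by_genotype(
        samples: Sequence[Tuple[str, float, float]]) -> Dict[str, float]:
    """samples: (genotype label, Ct of ACSL5, Ct of B2M) for each LCL.
    Returns the mean relative expression within each genotype group."""
    groups: Dict[str, List[float]] = {}
    for genotype, ct_acsl5, ct_b2m in samples:
        groups.setdefault(genotype, []).append(relative_expression(ct_acsl5, ct_b2m))
    return {g: mean(values) for g, values in groups.items()}
```

A lower Ct means more template, so an ACSL5 Ct that is 5 cycles later than the reference corresponds to 2^−5 of the reference signal under this assumption.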

Data availability
The summary statistics of our genome-wide association studies are available at the Human Genetic Variation Database (Accession ID: HGV0000013). The source data for ACSL5 mRNA expression in LCLs are available in Supplementary Data 4. All relevant data are available from G.S. (sobueg@aichi-med-u.ac.jp) upon request.