Common and rare variant association analyses in amyotrophic lateral sclerosis identify 15 risk loci with distinct genetic architectures and neuron-specific biology

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with a lifetime risk of one in 350 people and an unmet need for disease-modifying therapies. We conducted a cross-ancestry genome-wide association study (GWAS) including 29,612 patients with ALS and 122,656 controls, which identified 15 risk loci. When combined with 8,953 individuals with whole-genome sequencing (6,538 patients, 2,415 controls) and a large cortex-derived expression quantitative trait locus (eQTL) dataset (MetaBrain), analyses revealed locus-specific genetic architectures in which we prioritized genes either through rare variants, short tandem repeats or regulatory effects. ALS-associated risk loci were shared with multiple traits within the neurodegenerative spectrum but with distinct enrichment patterns across brain regions and cell types. Of the environmental and lifestyle risk factors obtained from the literature, Mendelian randomization analyses indicated a causal role for high cholesterol levels. The combination of all ALS-associated signals reveals a role for perturbations in vesicle-mediated transport and autophagy and provides evidence for cell-autonomous disease initiation in glutamatergic neurons.

ALS is a fatal neurodegenerative disease affecting one in 350 individuals. Due to degeneration of both upper and lower motor neurons, patients suffer from progressive paralysis, ultimately leading to respiratory failure within 3-5 years after disease onset 1 . In ~10% of patients with ALS, there is a clear family history for ALS, suggesting a strong genetic predisposition, and currently a pathogenic mutation can be found in more than half of these cases 2 . On the other hand, apparently sporadic ALS is considered a complex trait for which heritability is estimated at 40-50% (refs. 3,4 ). There is no widely accepted definition of familial or sporadic ALS 5 , and they are likely to represent the ends of a spectrum with overlapping genetic architectures for which the same genes have been implicated in both familial and sporadic disease [6][7][8][9][10][11] . To date, partially overlapping GWASs have identified up to six genome-wide significant loci, explaining a small proportion of the genetic susceptibility to ALS [11][12][13][14][15][16] . Indeed, some of these loci found in GWASs harbor rare variants with large effects also present in familial cases (for example, C9orf72 and TBK1) 6,17,18 . For other loci, the role of rare variants remains unknown.
While ALS is referred to as a motor neuron disease, cognitive and behavioral changes are observed in up to 50% of patients, sometimes leading to frontotemporal dementia (FTD). The overlap with FTD is clearly illustrated by the pathogenic hexanucleotide repeat expansion in C9orf72, which causes familial ALS and/or FTD 17,18 and the genome-wide genetic correlation between ALS and FTD 19 . Further expanding the ALS-FTD spectrum, a genetic correlation with progressive supranuclear palsy (PSP) has been described 20 . Shared pathogenic mechanisms between ALS and other neurodegenerative diseases, including common diseases such as Alzheimer's disease (AD) and Parkinson's disease (PD), can further reveal ALS pathophysiology and inform new therapeutic strategies.
Here, we combine new and existing individual-level genotype data in the largest GWAS of ALS to date. We present a comprehensive screen for pathogenic rare variants and short tandem repeat (STR) expansions as well as regulatory effects observed in brain cortex-derived RNA sequencing (RNA-seq) and methylation datasets to prioritize causal genes within ALS-risk loci. Furthermore, we reveal similarities and differences between ALS and other neurodegenerative diseases as well as the biological processes in disease-relevant tissues and cell types that affect ALS risk.

Results
Cross-ancestry meta-analysis reveals 15 risk loci for ALS. To generate the largest GWAS of ALS to date, we merged individual-level genotype data from 117 cohorts into six strata matched by genotyping platform. A total of 27,205 patients with ALS and 110,881 control participants of European ancestries passed quality control (including 6,374 newly genotyped cases and 22,526 control participants; Methods and Supplementary Tables 1 and 2). Patients were not selected for a family history of ALS. Through meta-analysis of these six strata, we obtained association statistics for 10,461,755 variants down to a minor allele frequency (MAF) of 0.1% in the Haplotype Reference Consortium resource 21 . We observed moderate inflation of the test statistics (λ GC = 1.12, λ 1000 = 1.003), and linkage disequilibrium (LD) score regression yielded an intercept of 1.029 (s.e. = 0.0073), indicating that the majority of inflation was due to the polygenic signal in ALS (LD score regression (LDSC): h 2 l = 0.028, s.e. = 0.003, K = 350 −1 , P = 5.5 × 10 −21 ). The European ancestry analysis identified 12 loci reaching genome-wide significance (P < 5.0 × 10 −8 ; Extended Data Fig. 1). For nine loci, the top SNP or a strong LD proxy (r 2 = 0.996) was present in GWAS of ALS in Asian ancestries (2,407 patients with ALS and 11,775 control participants) 15,16 , and all showed a consistent direction of effects (P binom = 2.0 × 10 −3 ). The three SNPs that were not present in the Asian ancestry GWAS were low-frequency variants (MAF of 0.6-1.6% in European ancestries, Table 1). The genetic overlap between ALS risk in European and Asian ancestries resulted in a trans-ancestry genetic correlation of 0.57 (s.e. = 0.28) for genetic effect and 0.58 (s.e. = 0.30) for genetic impact, which were not statistically significantly different from unity (P = 0.13 and P = 0.16, respectively).
[...] also the most likely causal mechanism for rs75087725 (CFAP410, formerly C21orf2, p.V58L; Supplementary Fig. 15), as the GWAS variant is a missense variant; no evidence for other mechanisms including repeat expansions or eQTL or mQTL effects was observed within this locus, and CFAP410 itself is known to directly interact with NEK1, another ALS gene 6,28 . These three loci illustrate the power of large-scale GWASs combined with large imputation panels to directly identify low-frequency causal variants that confer disease risk.
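The inflation and sign-concordance figures quoted for the European-ancestry meta-analysis above can be checked with a few lines of arithmetic. The sketch below assumes the conventional rescaling of λ GC to a study of 1,000 cases and 1,000 controls and a one-sided binomial sign test with a null probability of 0.5; the function names are illustrative and are not taken from the study's code.

```python
from math import comb


def lambda_1000(lambda_gc: float, n_cases: int, n_controls: int) -> float:
    """Rescale lambda_GC to an equivalent study of 1,000 cases and 1,000 controls."""
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lambda_gc - 1) * scale


def sign_test_p(n_concordant: int, n_total: int) -> float:
    """One-sided binomial P of observing at least n_concordant concordant signs (p = 0.5)."""
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# European-ancestry sample sizes and lambda_GC as reported in the text
print(round(lambda_1000(1.12, 27205, 110881), 3))
# Nine of nine loci with a consistent direction of effect
print(sign_test_p(9, 9))
```

With the reported inputs, the rescaled inflation rounds to 1.003, and nine concordant signs out of nine give 1/512, or about 2.0 × 10 −3 , in line with the values stated above.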
Second, SNPs can tag a highly pathogenic repeat expansion, as was observed for rs2453555 (C9orf72) and the known GGGGCC hexanucleotide repeat in this locus ( Supplementary Fig. 7). Conditional analysis revealed no residual signal after conditioning on the repeat expansion, which was in LD with the top SNP (r 2 = 0.14, |D′| = 0.99, MAF SNP = 0.25, MAF STR = 0.047). Besides the repeat expansion, both eQTL and mQTL analyses point to C9orf72 (Supplementary Fig. 7). The HEIDI (heterogeneity in dependent instruments) outlier test, however, rejected the null hypothesis that gene expression or methylation mediated the causal effect of the associated SNP (P HEIDI,eQTL = 3.7 × 10 −23 and P HEIDI,mQTL = 4.1 × 10 −7 ). This is in line with the idea that pathogenic repeat expansion is the causal variant in this locus and that eQTL and mQTL effects do not mediate a causal effect. We found no similar pathogenic repeat expansions that fully explained the SNP association signal in the other genome-wide significant loci.
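The combination of a low r 2 with a |D′| close to 1 at the C9orf72 locus follows from the difference in allele frequency between the SNP and the repeat expansion. The sketch below is a minimal illustration, assuming that the expansion is treated as a biallelic marker (expanded versus non-expanded) and that it sits on the haplotype carrying the SNP allele with frequency 0.25 (positive D); the function name is hypothetical.

```python
def r2_from_dprime(d_prime: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic markers given |D'| and the frequencies of the
    two positively associated alleles (assumes D > 0)."""
    # Largest D that these allele frequencies allow when D is positive
    d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    d = d_prime * d_max
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Frequencies and |D'| as reported for the top SNP and the repeat expansion
print(round(r2_from_dprime(0.99, 0.25, 0.047), 3))
```

Under these assumptions the result is about 0.145, close to the reported r 2 of 0.14; the small gap is plausibly due to rounding of the published frequencies. Even with |D′| = 1 the same frequencies cap r 2 at roughly 0.15, which is why a common SNP can tag a much rarer expansion almost completely while still showing a modest r 2 .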
Third, in two loci (rs62333164 in NEK1 and rs4075094 in TBK1), common and rare variants converged to the same gene, which are known ALS-risk genes 6,8 . For both loci, the rare variant burden association was conditionally independent from the top SNP that was included in the GWAS (Supplementary Figs. 2 and 9). Here, eQTL and mQTL analyses indicated that the risk-increasing effects of the common variants were mediated through both eQTL and mQTL effects on NEK1 and TBK1. Furthermore, a polymorphic STR downstream of NEK1 was associated with increased ALS risk (motif, TTTA; threshold = 10 repeat units, expanded allele frequency = 0.51, P = 5.2 × 10 −5 , false discovery rate (FDR) = 4.7 × 10 −4 ; Extended Data Fig. 4). This polymorphic repeat was in LD with the top associated SNP within this locus (r 2 = 0.24, |D′| = 0.70). There was no statistically significant association for the top SNP in the WGS data to reliably determine its independent contribution to ALS risk.
Lastly, the fourth group contains seven remaining loci for which there was no direct link to a causal gene through coding variants or repeat expansions. Here, we investigated regulatory effects of the associated SNPs on target genes acting as either eQTL or mQTL. Single genes were prioritized by SMR using both mQTL and eQTL for rs2985994 (COG3; Supplementary Fig. 10), rs229243 (SCFD1; Supplementary Fig. 11) and rs517339 (ERGIC1; Supplementary Fig.  4). In other loci, both methods prioritized multiple genes, such as rs631312 (MOBP and RPSA; Supplementary Fig. 1) and rs10463311 (GPX3 and TNIP1; Supplementary Fig. 3). Aside from the prioritized genes, each of these loci harbored multiple genes that were not prioritized by any method and are therefore less likely to contribute to ALS risk.
For two loci, no gene was prioritized with these approaches. Within the UNC13A locus (rs12608932; Supplementary Fig. 12), recent studies illustrate that the genome-wide significant SNPs act as splicing quantitative trait loci conditional on dysfunction of TAR DNA-binding protein (TDP)-43, resulting in inclusion of a cryptic exon in UNC13A 29,30 . Furthermore, we could not prioritize a specific gene in the HLA locus (rs9275477; Supplementary Fig. 5).
Genetic modifiers of ALS disease progression. We investigated whether genetic risk factors for ALS also act as disease modifiers that affect disease onset and progression. Genotypes for the 15 genome-wide significant SNPs, PRSs and the rare variant burden [...]
We therefore performed a cross-ancestry meta-analysis totaling 29,612 cases and 122,656 controls, which revealed three additional loci, bringing the total to 15 genome-wide significant risk loci for ALS (Fig. 1, Table 1 and Supplementary Tables 4-18). Conditional and joint analysis did not identify secondary signals within these loci.
Of these findings, eight loci have been reported in previous GWASs (C9orf72, UNC13A, SCFD1, MOBP-RPSA, KIF5A,  CFAP410, GPX3-TNIP1 and TBK1) 11,14,15 . The rs80265967 variant corresponds to the p.D90A mutation in SOD1 previously identified in a Finnish ALS cohort enriched for familial ALS 13 . Interestingly, we observed a genome-wide significant common variant association signal within the NEK1 locus, which was previously shown to harbor rare variants associated with ALS 8 . The recently reported association at the ACSL5-ZDHHC6 locus 16,22 did not reach the threshold for genome-wide significance (rs58854276, P EUR = 5.4 × 10 −5 , P ASN = 4.9 × 10 −7 , P comb = 6.5 × 10 −8 ; Supplementary  Table 19), despite the fact that our analysis includes all data from the original discovery studies.

Rare variant gene-based association analyses in ALS.
To assess a general pattern of underlying architectures that link associated SNPs to causal genes, we first tested for annotation-specific enrichment using stratified LDSC. This revealed that 5′ UTR regions as well as coding regions in the genome and those annotated as conserved were most enriched for ALS-associated SNPs (Extended Data Fig. 2). Subsequently, we investigated how rare, coding variants contributed to ALS risk by generating a whole-genome sequencing (WGS) dataset of patients with ALS (n = 6,538) and control participants (n = 2,415), which is a subset of the common variant GWAS cohort. The exome-wide association analysis included transcript-level rare variant burden testing for different models of allele-frequency thresholds and variant annotations (Methods). This identified NEK1 as the strongest associated gene (minimal P = 4.9 × 10 −8 for disruptive and damaging variants at MAF < 0.005), which was the only gene to pass the exome-wide significance thresholds (0.05 ÷ 17,994 = 2.8 × 10 −6 and 0.05 ÷ 58,058 = 8.6 × 10 −7 for number of genes and protein-coding transcripts, respectively; Supplementary Table 20). This association was independent from the previously reported increased rare variant burden in selected patients with 'familial ALS' (ref. 8 ) who were not included in this study. Polygenic risk score (PRS) analyses did not illustrate a difference in PRSs in patients carrying rare variants in ALS-risk genes (SOD1, C9orf72 repeat expansion, TARDBP, FUS, NEK1, TBK1 and CFAP410) compared to all patients with ALS (Extended Data Fig. 3). Although power was limited, this is compatible with a scenario in which the genetic risk of ALS in these patients is a sum of rare variants in ALS genes and other (common) genetic variation.
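The two exome-wide significance thresholds quoted above are plain Bonferroni corrections of α = 0.05 for the number of genes and the number of protein-coding transcripts. A short check, with an illustrative helper name:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


gene_level = bonferroni_threshold(0.05, 17994)        # number of genes
transcript_level = bonferroni_threshold(0.05, 58058)  # protein-coding transcripts

print(f"{gene_level:.1e}", f"{transcript_level:.1e}")

# The minimal NEK1 P-value reported in the text passes both thresholds
nek1_p = 4.9e-8
print(nek1_p < transcript_level < gene_level)
```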
Gene prioritization shows locus-specific underlying architectures. To assess whether rare variant associations could drive the common variant signals at the 15 genome-wide significant loci, we combined the common and rare variant analyses to prioritize genes within these loci. The SNP effects on gene expression were assessed by summary-based Mendelian randomization (MR) (SMR) in blood (eQTLGen 23 , n = 31,648) and a new brain cortex-derived eQTL dataset (MetaBrain 24 , n = 2,970). Finally, we analyzed methylation quantitative trait loci (mQTL) by SMR in blood-derived (n = 2,082) and brain-derived (n = 522) mQTL datasets 25-27 . Through these multi-layered gene-prioritization strategies, we classified each locus into one of four classes of most likely underlying genetic architecture to prioritize the causal gene ( Supplementary Figs. 1-15).
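To make the SMR step above concrete, the following is a minimal sketch of the single-instrument calculation. It assumes the approximate SMR test statistic, T = z zy 2 z zx 2 / (z zy 2 + z zx 2 ), evaluated against a chi-squared distribution with one degree of freedom, where z is the instrument SNP, x the molecular trait (expression or methylation) and y the disease. The input values are hypothetical and chosen only for illustration, and the HEIDI test is not implemented here.

```python
from math import erfc, sqrt


def smr_test(b_zx: float, se_zx: float, b_zy: float, se_zy: float):
    """Approximate SMR estimate and test for a single instrument.

    b_zx, se_zx: SNP effect on the molecular trait (eQTL or mQTL) and its s.e.
    b_zy, se_zy: SNP effect on the disease (GWAS) and its s.e.
    Returns (b_xy, T_SMR, P).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # effect of the molecular trait on disease per unit change
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    p_value = erfc(sqrt(t_smr / 2))  # upper tail of chi-squared with 1 d.f.
    return b_xy, t_smr, p_value


# Hypothetical summary statistics: a strong eQTL (z = 10) and a GWAS signal (z = 5)
b_xy, t_smr, p_smr = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.06, se_zy=0.012)
print(b_xy, t_smr, p_smr)
```

Because the statistic is bounded by the smaller of the two squared z-scores, a weak QTL limits power regardless of how strong the GWAS signal is, which is one reason large QTL panels matter for this kind of prioritization.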

Locus-specific sharing of risk loci between ALS and neurodegenerative diseases.
To investigate the pleiotropic properties of ALS-associated variants and shared genetic risk with other brain diseases, we estimated genetic correlations between neurodegenerative diseases, psychiatric traits, cerebrovascular diseases and multiple sclerosis (Extended Data Fig. 5). This showed strong genetic correlations among neurodegenerative diseases. Bivariate LDSC confirmed a statistically significant genetic correlation between ALS and PSP (r g = 0.44, s.e. = 0.11, P = 1.0 × 10 −4 ) as previously reported 20 and also revealed a significant genetic correlation between ALS and AD (r g = 0.31, s.e. = 0.12, P = 9.6 × 10 −3 ) as well as between ALS and PD (r g = 0.16, s.e. = 0.061, P = 0.011; Fig. 3a). The point estimate for the genetic correlation between ALS and FTD was high (r g = 0.59, s.e. = 0.41, P = 0.15) but not statistically significant due to the limited size of the FTD GWAS (3,526 cases and 9,402 controls). Thus, power to detect a genetic correlation between ALS and FTD using LDSC was limited.
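The limited power for the ALS-FTD comparison can be seen from a simple Wald test on the reported estimate and standard error. This is a sketch under a normal approximation; LDSC derives its standard errors by block jackknife and works from unrounded values, so P-values recomputed from the rounded figures in the text will match the published ones only approximately. The function name is illustrative.

```python
from math import erfc, sqrt


def wald_p(estimate: float, se: float, null: float = 0.0) -> float:
    """Two-sided P-value for H0: parameter == null under a normal approximation."""
    z = (estimate - null) / se
    return erfc(abs(z) / sqrt(2))


# ALS-FTD: large point estimate but wide standard error
print(round(wald_p(0.59, 0.41), 2))
# ALS-PSP: smaller estimate, much tighter standard error
print(wald_p(0.44, 0.11))
```

The same function can test against a null other than zero by passing `null=1.0`, which is the form of test used when asking whether a genetic correlation differs from unity.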
Patterns of sharing disease-associated genetic variants appeared to be locus specific ( Fig. 3b and Supplementary Table 21). To assess whether two traits shared a common signal, indicating shared causal variants, we performed colocalization analyses for all loci meeting P < 5 × 10 −5 in any of the GWASs of neurodegenerative diseases (n = 161 loci). This revealed a shared signal in the MOBP-RPSA locus between ALS, PSP and corticobasal degeneration (CBD) as well as a shared signal in the UNC13A locus between ALS and FTD (posterior probability, PP H4 > 95%; Extended Data Fig. 6). For the HLA locus, there was evidence for a shared causal variant between ALS and PD (PP H4 = 88%) but no conclusive evidence for ALS and AD (PP H4 = 51% for a shared causal variant and PP H3 = 49% for independent signals in both traits).
Furthermore, colocalization analyses identified two additional shared loci that were not genome-wide significant in the ALS GWAS: between ALS and PD at the GAK locus (rs34311866, PP H4 = 99%) and between ALS and AD at the TSPOAP1-AS1 locus (rs2632516, PP H4 = 90%). Of note, the association at TSPOAP1-AS1 was not genome-wide significant in the GWAS of clinically diagnosed AD (P = 3.7 × 10 −7 ) either but was identified in the larger AD-by-proxy GWAS 31 . For FTD subtypes, C9orf72 showed a colocalization signal for a shared causal variant between ALS and the motor neuron disease subtype of FTD (mndFTD, PP H4 = 93%; Extended Data Figs. 6 and 7).

Enrichment of glutamatergic neurons indicates cell-autonomous processes in ALS susceptibility.
To find tissues and cell types for which gene expression profiles were enriched for genes within ALS-risk loci, we first combined gene-based association statistics calculated using MAGMA 32 with gene expression patterns from the Genotype-Tissue Expression (GTEx) project (version 8) in a gene set enrichment analysis using FUMA 33 . We observed a significant enrichment in genes expressed in brain tissues across multiple brain regions but not in peripheral nervous tissue or muscle. Whereas this pattern roughly resembled the enrichments observed in PD and psychiatric traits, it was strikingly different from that reported 31 and observed in AD in which blood, lung and spleen were mostly enriched, resembling the pattern observed in multiple sclerosis, which is a typical immune-mediated brain disease (Fig. 4a and full results in Supplementary Fig. 16 and Extended Data Fig. 8a). We subsequently queried single-cell RNA-seq datasets of human-derived brain samples to further specify brain-specific enriched cell types using the cell type analysis module in FUMA 34 . This showed significant enrichment for neurons but not for microglia or astrocytes (Fig. 4b). Further subtyping of these neurons illustrated that genes expressed in glutamatergic neurons were mostly enriched for genes within the ALS-associated risk loci. Again, this contrasted with AD, which showed specific enrichment of microglia, similar to multiple sclerosis (Extended Data Fig. 8b). In single-cell RNA-seq data obtained from brain tissues in mice, a similar pattern was observed showing neuron-specific enrichment in ALS and PD but microglia in AD (Extended Data Fig. 9). Together, this indicates that susceptibility to neurodegeneration in ALS is mainly driven by neuron-specific pathology and not by immune-related tissues and microglia.
Table 1 footnotes. a For the strongest associated SNP in the SCFD1 locus, rs229195 (MAF = 0.337), details of the LD proxy rs229194 are described (MAF = 0.337, r 2 = 0.996 in Asian ancestries), as only the LD proxy was present in the Asian ancestry GWAS. The low-frequency SNPs rs80265967, rs113247976 and rs75087725 were not present in the Asian ancestry GWAS, and no LD proxies (r 2 > 0.8) were found. Chr, chromosome; Position, base-pair position in the reference genome GRCh37; A 1 , effect allele; A 2 , non-effect allele; Freq, frequency of the effect allele in the European ancestry GWAS; s.e., standard error of the effect estimate.

Brain-specific coexpression networks improve detection of ALS-relevant pathways.
To determine which processes were mostly enriched in ALS, we performed enrichment analyses that combined gene-based association statistics with gene coexpression patterns obtained from either multi-tissue transcriptome datasets 35 or RNA-seq data from brain cortex samples (MetaBrain 24 ). To validate this approach, we first tested for enrichment of human phenotype ontology (HPO) terms that are linked to well-established disease genes in the Online Mendelian Inheritance in Man (OMIM) and Orphanet catalogs. Using the multi-tissue coexpression matrix, we found no enriched HPO terms after Bonferroni correction for multiple testing. Using the brain-specific coexpression matrix, however, we found a strong enrichment of HPO terms that are related to ALS or neurodegenerative diseases in general, including 'cerebral cortical atrophy' (P = 1.8 × 10 −8 ), 'abnormal nervous system electrophysiology' (P = 4.1 × 10 −7 ) and 'distal amyotrophy' (P = 8.6 × 10 −7 ; full list in Supplementary Table 22). In general, HPO terms in the neurological branch ('abnormality of the nervous system') showed an increase in enrichment statistics in ALS when using the brain-specific coexpression matrix compared to the multi-tissue dataset (Extended Data Fig. 10), which illustrates the benefit of the brain-specific coexpression matrix. Subsequently, we tested for enriched biological processes using reactome and gene ontology terms. Again, using the multi-tissue expression profiles, we found that no reactome annotations were enriched. Leveraging the brain-specific coexpression networks, we identified vesicle-mediated transport ('membrane trafficking' , P = 4.2 × 10 −6 , 'intra-Golgi and retrograde Golgi-to-endoplasmic reticulum (ER) trafficking' , P = 1.4 × 10 −5 ) and autophagy ('macroautophagy' , P = 3.2 × 10 −5 ) as enriched processes after Bonferroni correction for multiple testing (Supplementary Table 23). 
The subsequently identified enriched gene ontology terms were all related to vesicle-mediated transport or autophagy (Supplementary Tables 24 and 25).

MR analyses are in line with a causal relationship between cholesterol levels and ALS.
From previous observational case-control studies and our blood-based methylome-wide study 36 , numerous non-genetic risk factors have been implicated in ALS. Here, we studied a selection of those putative risk factors through causal inference in an MR framework 37 . We selected 22 risk factors for which robust genetic predictors were available including body mass index, smoking, alcohol consumption, physical activity, cholesterol-related traits, cardiovascular diseases and inflammatory markers (Supplementary Table 26). These analyses provided the strongest evidence that cholesterol levels were causally related to ALS risk (b weighted median = 0.15, s.e. = 0.04, P = 3.2 × 10 −4 ; Fig. 5a and full results in Supplementary Table 27). These results were robust to removal of outliers through radial MR analysis 38 , and we observed no evidence for reverse causality (Supplementary Tables 28 and 29). Importantly, ascertainment bias can lead to the selection of more highly educated control participants 39 compared to patients with ALS who are mostly ascertained through the clinic. In line with control participants having higher education, MR analyses indicated a negative effect for years of schooling on ALS risk (inverse-variance-weighted P IVW = 2.0 × 10 −4 ; Fig. 5b). As a result, years of schooling can act as a confounder for the observed risk-increasing effect of higher total cholesterol levels through ascertainment bias. To correct for this potential confounding, we applied multivariate MR analyses including both years of schooling and total cholesterol levels. The results for total cholesterol were robust in the multivariate analyses, suggesting a causal role for total cholesterol levels on ALS susceptibility (Supplementary Table 30).
Fig. 2 | Genetic modifier analyses. a, Cox proportional HRs for genome-wide significant SNPs (brown, n = 15), PRSs (red, n = 2) and rare variant burden in ALS-risk genes (pink, n = 7) on survival (months) tested in 6,095 patients with ALS. Estimated HRs are displayed with error bars corresponding to 95% CIs. Higher HRs correspond to shorter survival times. b, Effect estimates from a linear regression model of age at onset (years) in 6,095 patients with ALS. Lower effect estimates correspond to a younger age at onset. Effect estimates from linear regression are displayed with error bars corresponding to 95% CIs. The risk-increasing allele for ALS corresponds to the effect allele for both survival and age-at-onset analyses.
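As an illustration of the kind of estimate behind the MR results described above, the sketch below implements a fixed-effect inverse-variance-weighted (IVW) estimator from per-SNP summary statistics. It is only one of the MR methods mentioned in the text (the cholesterol result is reported from the weighted median method, which is not shown here), and the three instruments are hypothetical values constructed so that every SNP implies the same causal effect.

```python
from math import erfc, sqrt


def ivw_estimate(b_exposure, b_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    b_exposure: SNP effects on the exposure (for example, total cholesterol)
    b_outcome:  SNP effects on the outcome (for example, ALS log odds)
    se_outcome: standard errors of the SNP-outcome effects
    Returns (estimate, standard error, two-sided P).
    """
    weights = [1 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, b_exposure, b_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, b_exposure))
    estimate = numerator / denominator
    std_error = sqrt(1 / denominator)
    p_value = erfc(abs(estimate / std_error) / sqrt(2))
    return estimate, std_error, p_value


# Hypothetical instruments: each SNP-outcome effect is 0.15 times its SNP-exposure effect
b_x = [0.1, 0.2, 0.3]
b_y = [0.015, 0.030, 0.045]
se_y = [0.01, 0.01, 0.01]

ivw_b, ivw_se, ivw_p = ivw_estimate(b_x, b_y, se_y)
print(ivw_b, ivw_se, ivw_p)
```

Comparing such an estimate across methods with different pleiotropy assumptions, as done in the study, is what gives confidence that a result is not driven by a few invalid instruments.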

Discussion
In summary, in the largest GWAS on ALS to date including 29,612 patients with ALS and 122,656 control participants, we identified 15 risk loci contributing to ALS risk. Through in-depth analysis of these loci incorporating rare variant burden analyses and repeat expansion screens in WGS data and blood- and brain-specific eQTL and mQTL analyses, we prioritized genes in 13 of the loci. Across the spectrum of neurodegenerative diseases, we identified a genetic correlation between ALS and AD as well as PD and PSP with locus-specific patterns of shared genetic risk across all neurodegenerative diseases. Colocalization analysis identified two additional loci, GAK and TSPOAP1-AS1, with a high posterior probability of shared causal variants between ALS and PD and between ALS and AD, respectively. We found glutamatergic neurons as the most enriched cell type in the brain, and brain-specific coexpression network enrichment analyses indicated a role for vesicle-mediated transport and autophagy in ALS. Finally, causal inference of previously described risk factors provides evidence for high total cholesterol levels as a causal risk factor for ALS. The cross-ancestry comparison illustrated similarities in the genetic risk factors for ALS in European and East Asian ancestries, providing an argument for cross-ancestry studies and to further expand ALS GWASs in non-European populations. It is important to note that three loci including those that harbor low-frequency variants (KIF5A, SOD1 and CFAP410) were not included in the East Asian GWAS due to their low MAFs. Therefore, the shared genetic risk might not extend to rare genetic variation, for which population-specific frequencies have been observed even within Europe.
The multi-layered gene-prioritization analyses highlighted four different classes of genome-wide significant loci in ALS. First, the sample size of this GWAS combined with accurate imputation of low-frequency variants directly identified rare coding variants that increase ALS risk. These include the known p.D90A mutation in SOD1 (MAF = 0.006) as well as rare variants in KIF5A (MAF = 0.016) and CFAP410 (MAF = 0.012) for which, after their identification through GWAS, experimental work confirmed their direct role in ALS pathophysiology 11,28,40 . Second, we confirmed that the pathogenic C9orf72 repeat expansion is tagged by genome-wide significant GWAS SNPs and that no residual signal is left by conditioning the SNP on the repeat expansion. Although more repeat expansions are known to affect ALS risk, we found no similar loci for which the SNPs tag a highly pathogenic repeat expansion. This suggests that highly pathogenic repeat expansions on a stable haplotype are merely the exception rather than the rule in ALS. Third, common and rare variant association signals can converge on the same gene as observed for NEK1 and TBK1, consistent with observations for other traits and diseases 41-43 . We show that these signals are conditionally independent and that the common variants act on the same gene through regulatory effects as eQTL or mQTL. Fourth, we find evidence for regulatory effects of ALS-associated SNPs that act as eQTL or mQTL. These locus-specific architectures illustrate the complexity of ALS-associated GWAS loci for which not one solution fits all, but instead a multi-layered approach to prioritize genes is warranted.
Fig. 5 | [...] In total, 83 and 178 SNPs were used as instruments at cutoffs of P < 5 × 10 −8 and P < 5 × 10 −5 , respectively. All methods show a consistent positive effect for an increased risk of ALS with higher total cholesterol levels. There is no evidence for reverse causality. Point estimates for MR are presented with error bars reflecting 95% CIs. b, MR results for ALS and years of schooling. In total, 306 and 681 SNPs were used as instruments at cutoffs of P < 5 × 10 −8 and P < 5 × 10 −5 . Point estimates for MR are presented, with error bars reflecting 95% CIs. Statistically significant effects with a two-sided P-value passing Bonferroni correction for multiple testing over all tested traits (n = 22), instrument P-value cutoffs (n = 2) and MR methods (n = 5) are marked with an asterisk (total cholesterol, P weighted median = 0.0003 and P weighted median = 0.0007 for cutoffs at P < 5 × 10 −8 and P < 5 × 10 −5 , respectively; years of schooling, P IVW = 0.0002 at the cutoff of P < 5 × 10 −5 ). Here, SNP outliers were not removed for instrument selection. Z, genetic instrument; b xy , estimated causal effect for an increase of 1 s.d. in genetically predicted exposure.
In addition, we find locus-specific patterns of shared effects across neurodegenerative diseases. The MOBP locus has previously been identified in PSP and ALS, and here we show that indeed both diseases as well as CBD are likely to share the same causal variant in this locus. The same is true for UNC13A and C9orf72 with FTD and mndFTD, respectively. The colocalization analysis with PD identified a shared causal variant in the GAK locus, which was not found in the ALS GWAS alone. Furthermore, the TSPOAP1-AS1 locus harbors SNPs associated with ALS and AD risk. Although this locus was not significant in either of the GWASs, a larger GWAS including AD-by-proxy cases confirmed this as a risk locus for AD. This illustrates the power of cross-disorder analyses to leverage the shared genetic risk of neurodegenerative diseases.
We aimed to clarify the role of neuron-specific pathology in ALS susceptibility as opposed to non-cell-autonomous pathology through detailed cell type enrichment analyses. Previous experiments have illustrated multiple lines of evidence for non-cell-autonomous pathology in microglia, astrocytes and oligodendrocytes, which ultimately leads to neurodegeneration in ALS 44-46 . These experiments have shown that non-cell-autonomous processes, such as neuroinflammation, mainly act as modifiers of disease in SOD1 models of ALS 45,46 . Here, we show that genes within loci associated with ALS susceptibility are specifically expressed in (glutamatergic) neurons. This provides evidence for neuron-specific pathology as a driver of ALS susceptibility, which is in stark contrast to the signal of inflammation-associated tissues and cell types in AD and multiple sclerosis. It also shows that disease susceptibility and disease modification can be distinct processes, which is supported by our finding that most genetic susceptibility factors do not have a strong effect on survival. This motivates future large-scale genetic studies on modifiers of ALS progression, as these can be targets for potential new treatments for ALS as well.
The subsequent functional enrichment analyses identified that membrane trafficking, Golgi-to-ER trafficking and autophagy were enriched for genes within ALS-associated loci. These terms and their related gene ontology terms of biological processes are all related to autophagy and degradation of (misfolded) proteins. This corroborates the central hypothesis of impaired protein degradation leading to aberrant protein aggregation in neurons, which is the pathological hallmark of ALS. Our results suggest that this is a central mechanism in ALS even in the absence of rare known mutations in genes directly involved in these biological processes such as TARDBP, FUS, UBQLN2 and OPTN 47 .
Based on observational studies and MR analyses, conflicting evidence exists for lipid levels including cholesterol as a risk factor for ALS 48-50 . Potential selection bias, reverse causality and the subtype of cholesterol studied challenge the interpretation of these results. Here, we provided support for a causal relationship between high total cholesterol levels and ALS independent of educational attainment and ruling out reverse orientation of the MR effect. The total cholesterol effects were consistent across the different MR methods tested, indicating that this finding is robust to violation of the 'no horizontal pleiotropy' assumption. This is in line with our study showing methylation changes associated with increased cholesterol levels in ALS 36 . We do not find a clear pattern for either low-density lipoprotein (LDL) or high-density lipoprotein (HDL) cholesterol subtypes in relation to ALS risk. While cholesterol levels are closely related to cardiovascular risk, the association between cardiovascular risk and ALS risk remains controversial with conflicting reports 3,48,51 . Interestingly, recent work has shown that lipid metabolism and autophagy are closely related 52 , which brings the results of our pathway analyses and MR together. Both in vitro and in vivo experiments have shown that autophagy regulates lipid homeostasis through lipolysis and that impaired autophagy increases triglyceride and cholesterol levels. Conversely, high lipid levels were shown to impair autophagy 52 . Further studies on the effect of high cholesterol levels and protein degradation through autophagy illustrate that high cholesterol levels decrease the fusogenic ability of autophagic vesicles through decreased function of soluble N-ethylmaleimide-sensitive factor-attachment protein receptor (SNARE) 53,54 and lead to increased protein aggregation due to impaired autophagy in mouse models of AD 55 . 
Therefore, the risk-increasing effect of cholesterol on ALS might be mediated through impaired autophagy.
In conclusion, our GWAS identifies 15 risk loci in ALS and illustrates locus-specific interplay between common and rare genetic variation that helps to prioritize genes for future follow-up studies. We show a causal role for cholesterol, which can be linked to impaired autophagy as common denominators of neuron-specific pathology that drive ALS susceptibility and serve as potential targets for therapeutic strategies.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-021-00973-1.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
11,14 and publicly available control datasets including 120,971 controls genotyped on Illumina platforms. Additionally, 6,374 cases and 22,526 controls were genotyped on the Illumina OmniExpress and Illumina GSA arrays. Details for each cohort are provided in Supplementary Table 1. All patients with ALS were ascertained through specialized MND clinics, where they were diagnosed according to the (revised) El Escorial Criteria 56 by neurologists specialized in motor neuron diseases. Whole-blood samples for DNA isolation were collected specifically for ongoing case-control studies of ALS. Cases both with and without a family history of ALS and/or dementia were included. Cases were not pre-screened for specific ALS-related mutations. Given the late onset and relatively low lifetime risk of ALS, controls were not screened for (subclinical) signs of ALS. A detailed description of the ascertainment of newly genotyped cases and controls is provided in the Supplementary Note.
All participants gave written informed consent, and the relevant local institutional review boards approved this study (Supplementary Note). Cases and controls formed cohorts when they were processed in the same laboratory and were genotyped in the same batch, resulting in 117 independent cohorts. Summary statistics were obtained for the Asian ancestry GWAS of ALS 15,16 (Supplementary Note).

GWAS quality control and imputation.
For each cohort, we first performed individual- and variant-level quality control, after which cohorts were merged into six strata based on genotyping platform. Subsequent stratum-wise quality control was performed, and strata were imputed up to the Haplotype Reference Consortium panel (r.1.1 2016) through the Michigan Imputation Server 21 . Full quality-control details are described in the Supplementary Note and Supplementary Fig. 17. Numbers of individuals and variants passing each quality-control step are described in Supplementary Table 2.
Association testing and meta-analysis. After quality control, a null logistic mixed model was fitted using SAIGE 57 0.29.1 for each stratum with principal component (PC)1-PC20 as covariates. The model was fit on a set of high-quality (INFO > 0.95) SNPs pruned with PLINK 1.9 ('--indep-pairwise 50 25 0.1') in a leave-one-chromosome-out scheme. Subsequently, a SNP-wise logistic mixed model including the saddlepoint approximation test was performed using genotype dosages with SAIGE. Association statistics for all strata were combined in an IVW fixed-effects meta-analysis using METAL 58 .
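As an illustration of the final step, inverse-variance-weighted (IVW) fixed-effects combination of per-stratum estimates for a single SNP can be sketched as follows. This is a minimal sketch of the general technique and not the METAL implementation; the function name and the example effect sizes are hypothetical.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Combine per-stratum log-odds estimates by inverse-variance weighting.

    betas, ses: per-stratum effect estimates and standard errors for one SNP.
    Returns the pooled estimate, its standard error, the Z score and a
    two-sided P-value from the standard normal distribution.
    """
    weights = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal P-value
    return beta, se, z, p

# Hypothetical estimates for one SNP in three strata (not taken from the study).
beta, se, z, p = ivw_fixed_effect([0.10, 0.14, 0.08], [0.04, 0.05, 0.03])
```

Strata with smaller standard errors (typically the larger strata) dominate the pooled estimate, and the pooled standard error is never larger than the smallest per-stratum standard error.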
Genomic inflation factors were calculated per stratum and for the full meta-analysis. To assess any residual confounding due to population stratification and artificial structure in the data, we calculated the LDSC 59 intercept using SNP LD scores calculated in the HapMap3 CEU population.
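The genomic inflation factor is conventionally computed as the median of the observed association chi-square statistics divided by the median of the chi-square distribution with one degree of freedom (approximately 0.455). A minimal sketch, assuming per-SNP Z scores as input (the function name is hypothetical):

```python
from statistics import NormalDist, median

def genomic_inflation(z_scores):
    """Median observed chi-square (1 d.f.) over its expected median under the null."""
    chi2 = [z * z for z in z_scores]
    # The median of a 1-d.f. chi-square equals the squared 0.75 quantile
    # of the standard normal distribution (about 0.4549).
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    return median(chi2) / expected_median
```

A value close to 1 indicates little inflation of test statistics. Because polygenic signal also raises this ratio in large samples, it is usually read alongside the LDSC intercept, as done here.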
Cross-ancestry analyses. GWAS summary statistics from two Asian ancestry studies were obtained 15,16 . These summary statistics were meta-analyzed with all European ancestry data in strata as described above. To assess genetic correlation for ALS in European and Asian ancestries, we used Popcorn 60 version 0.9.9. We used population-specific LD scores for genetic impact and genetic effect provided with the Popcorn software. The regression model ('-use_regression') was used to estimate genetic correlation. We calculated both the correlation of genetic effects (correlation of allelic effect sizes) and genetic impact (correlation of allelic effect size adjusted for difference in allele frequencies).
Conditional SNP analysis. Conditional and joint SNP analysis (COJO, GCTA version 1.91.1b) 61,62 was performed to identify potential secondary GWAS signals within a single locus. SNPs with association P ≤ 5 × 10 −8 were considered. Controls of European ancestry from the Health and Retirement Study (HRS, cohort 65, Supplementary Table 1), included in stratum 4 of this study, were used as the LD reference panel.
Gene prioritization. Whole-genome sequencing. Sample selection, sequencing and data preparation. Patients with ALS and control participants from Project MinE 63 were recruited for WGS. The participating cohorts were not pre-screened for ALS-associated mutations and are described in the Supplementary Note. In total, 228 patients were known to have at least one first- or second-degree relative with ALS. A full description of Project MinE and of the sequencing and quality-control pipeline has been published previously 64 . In summary, the first batch of 2,250 case and control samples was sequenced on the Illumina HiSeq 2000 platform. All remaining 7,350 case and control samples were sequenced on the Illumina HiSeq X platform. Samples were sequenced to ~35× coverage with 100-bp reads on the HiSeq 2000 and to ~25× coverage with 150-bp reads on the HiSeq X. Both sequencing sets used PCR-free library preparation. Samples were also genotyped on the Illumina 2.5M array. Sequencing data were then aligned to GRCh37 using the Isaac Aligner, and variants were called using the Isaac variant caller; both the aligner and caller are standard to Illumina's aligning and calling pipeline. Full details of individual- and variant-level quality control are described in the Supplementary Note.
Genic burden association analyses. To aggregate rare variants in a genic burden test framework, we used a variety of variant filters to allow for different genetic architectures of ALS-associated variants per gene, as we and others did previously 64,65 . In summary, variants were annotated according to allele-frequency threshold (MAF < 0.01 or MAF < 0.005) and predicted variant impact ('missense', 'damaging', 'disruptive'). 'Disruptive' variants were those variants classified as frameshift, splice site, exon loss, stop gained, start loss and transcription ablation by SnpEff 66 . 'Damaging' variants were missense variants predicted to be damaging by seven prediction algorithms (SIFT 67 , PolyPhen-2 (ref. 68 ), LRT 69 , MutationTaster2 (ref. 70 ), MutationAssessor 71 and PROVEAN 72 ). 'Missense' variants were those missense variants that did not meet the 'damaging' criteria. All combinations of allele-frequency threshold and variant annotations were used to test the genic burden on a transcript level in a Firth logistic regression framework in which burden was defined as the number of variants per individual. Sex and the first 20 PCs were included as covariates. All Ensembl protein-coding transcripts for which at least five individuals had a non-zero burden were included in the analysis.
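The burden definition used here (number of qualifying variants per individual for a given allele-frequency threshold and impact class) and the inclusion rule of at least five individuals with a non-zero burden can be sketched as below. The record layout, function names and example values are hypothetical and serve only to illustrate the counting step; the regression itself is not shown.

```python
from collections import Counter

def burden_per_individual(carried_variants, max_maf, impact):
    """Count qualifying variants per individual for one transcript.

    carried_variants: iterable of (individual_id, variant_maf, variant_impact),
    one record per variant carried by an individual (hypothetical layout).
    """
    counts = Counter()
    for individual, maf, variant_impact in carried_variants:
        if maf < max_maf and variant_impact == impact:
            counts[individual] += 1
    return counts

def is_testable(counts, min_carriers=5):
    """A transcript is tested when enough individuals have a non-zero burden."""
    return len(counts) >= min_carriers

# Hypothetical records for one transcript.
records = [("a", 0.004, "damaging"), ("a", 0.008, "damaging"),
           ("b", 0.020, "damaging"), ("c", 0.001, "missense")]
burden = burden_per_individual(records, max_maf=0.01, impact="damaging")
```

Repeating the count for every combination of frequency threshold and impact class yields one burden vector per filter, each of which would then enter the regression as the predictor of interest.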
Conditional genic burden analysis. We selected for each gene the protein-coding transcripts that were the most strongly associated with ALS across all different combinations of MAF and variant-impact thresholds. For these transcripts and variants, we applied Firth logistic regression on individuals included in both the GWAS and WGS datasets (5,158 cases and 2,167 controls). To assess whether the rare variant burden association and the signal from the GWAS were conditionally independent, we subsequently included the genotype of the top associated SNP within that locus as a covariate.
Short tandem repeat screen. For all individuals who had sequencing results in the HiSeq X dataset (5,392 cases, 1,795 controls), we screened all loci harboring SNPs associated with ALS meeting genome-wide significance for expansions of known and new STRs using ExpansionHunter 73 and ExpansionHunter Denovo 74 .
First, we used ExpansionHunter (version 4.0) to screen for expansions of known STRs located within 1 Mb of the top ALS-associated SNP. For this, we used the STRs identified from indels in 18 high-quality genomes and the GangSTR STR catalog based on STR annotations in the reference genome 75 . We excluded all homopolymers from these catalogs. Repeat length was subsequently regressed on case-control status using Firth logistic regression including the first 20 PCs as covariates, recoding the STR size to a biallelic variant using a sliding window over all observed repeat lengths. To correct for multiple testing across all possible thresholds, we applied Benjamini-Hochberg correction per STR.
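The threshold scan and the per-STR multiple-testing correction can be sketched as follows. The recoding shown (counting alleles at or above a length threshold) is an assumption about how the sliding window dichotomizes repeat length, and all function names are hypothetical; the Firth regression that would produce one P-value per threshold is omitted.

```python
def candidate_thresholds(genotypes):
    """All observed repeat lengths except the shortest, in ascending order."""
    lengths = sorted({length for alleles in genotypes for length in alleles})
    return lengths[1:]  # the shortest length would label every allele as expanded

def recode_str(genotypes, threshold):
    """Per individual, count alleles with repeat length >= threshold (0, 1 or 2)."""
    return [sum(length >= threshold for length in alleles) for alleles in genotypes]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, each threshold returned by `candidate_thresholds` would be recoded with `recode_str`, tested for association, and the resulting P-values for one STR passed together to `benjamini_hochberg`.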
To screen for extremely long STR expansions (similar to the C9orf72 repeat expansion) at loci that were not included in the predefined STR catalogs, we applied ExpansionHunter Denovo 74 . This method aims to only find STR expansions that exceed the sequencing read length (>150 bp) by identifying reads (mapped, mismapped and unmapped) that contain STR motifs, using their mate pairs for de novo mapping to the reference genome.
For all STRs, we calculated LD statistics (r 2 and |D′|) between recoded repeat genotypes at the optimal threshold and the top associated GWAS SNP. Subsequently, we conditioned the SNP association on the repeat genotype in a Firth logistic regression.
Summary-based Mendelian randomization. We used multi-SNP SMR 76,77 to infer the effect of gene expression variation on ALS risk using eQTL (the association of a SNP with expression of a gene). We chose to apply SMR because this method yielded very similar results when compared to S-PrediXcan 78 and TWAS 79 (Supplementary Fig. 18) when applied using GTEx version 7 eQTL, and it can be applied to the large relevant eQTL datasets (MetaBrain and eQTLGen) without access to individual-level genotype and gene expression data. MetaBrain is a harmonized set of 8,727 RNA-seq samples from seven regions of the central nervous system from 15 datasets, and we selected eQTL derived from the cortex region of the brain in samples of European ancestry (MetaBrain Cortex-EUR eQTL, n = 2,970 individuals, n = 6,601 RNA-seq samples) as our instrumental variable 24 . European-only ALS summary statistics were used as the outcome. To supplement this analysis, we also used eQTL in blood from the eQTLGen Consortium, as this is a large available eQTL resource. Samples of European ancestry in the HRS (cohort 65 of this GWAS) were used as the LD reference panel. SNPs with MAF ≥ 1% in the HRS were included. Further SMR settings were left as default, meaning probes with at least one eQTL with P ≤ 5 × 10 −8 were included.
We subsequently performed SMR using DNA mQTL data and European-only ALS summary statistics. Human prefrontal cortex and whole-blood DNA mQTL were generated as part of ongoing analyses by the Complex Disease Epigenomics Group at the University of Exeter (https://www.epigenomicslab.com/) using the Illumina EPIC HumanMethylation array that quantifies DNAm at >850,000 sites across the genome 25 . The prefrontal cortex mQTL dataset was generated using DNA-methylation and SNP data from 522 individuals from the Brains for Dementia Research cohort 26 and includes 4,623,966 cis mQTL (distance between quantitative trait locus SNP and DNAm site ≤500 kb) between 1,744,102 SNPs and 43,337 DNA-methylation sites. The whole-blood mQTL dataset was generated using DNAm and SNP data from 2,082 individuals 80 and included 30,432,023 cis mQTL between 4,030,902 SNPs and 167,854 DNA-methylation sites. mQTL reaching the significance threshold P ≤ 1 × 10 −10 were taken forward for SMR analysis as described by Hannon and colleagues 80 . To map CpG sites to their putative target genes, we used the expression quantitative trait methylation results from a paired methylation and gene expression (RNA-seq) study in blood 81 . For CpG sites where no expression quantitative trait methylation was present in this dataset, we used positional mapping based on the basal regulatory domains and extended regulatory domains as defined in the Genomic Regions Enrichment of Annotations Tool (GREAT) 82 , which is applied in the 'cpg_to_gene' function in the CpGtools toolkit 83 .
Polygenic risk score calculation. PRSs were constructed based on the 15 lead SNPs of genome-wide significant loci (15-SNP PRS) or a full-genome-wide model (full-genome PRS). For the 15-SNP PRS, the SNP weights were defined as the meta-analyzed effect estimates. We used the summary-BayesR framework from the Genome-wide Complex Trait Bayesian analysis (GCTB) toolkit 84,85 to obtain SNP weights for the full-genome PRS based on the European ancestry meta-analysis excluding stratum 6. We used the default model parameters and the precalculated sparse LD matrix of imputed HapMap3 SNPs in 50,000 random individuals included in the UK Biobank of European ancestries. Summary-BayesR SNP effects were plotted against marginal SNP effects to rule out potential biased estimates due to non-convergence of the MCMC algorithm. Finally, the PRSs for all individuals in stratum 6 were calculated using the '--score' function in PLINK and normalized to zero mean and unit variance.
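Conceptually, the score for each individual is a weighted sum of effect-allele dosages, which is then standardized across the target sample. A minimal sketch under that reading, with hypothetical function names and example data (PLINK's scoring routine reports a per-allele average by default, which gives the same standardized values when no genotypes are missing):

```python
from statistics import mean, pstdev

def polygenic_scores(dosages, weights):
    """Weighted sum of effect-allele dosages for each individual.

    dosages: one list of per-SNP dosages (0 to 2) per individual.
    weights: per-SNP effect estimates, in the same SNP order.
    """
    return [sum(d * w for d, w in zip(row, weights)) for row in dosages]

def standardize(scores):
    """Rescale scores to zero mean and unit variance (population s.d.)."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]

# Hypothetical dosages for three individuals at two SNPs.
raw = polygenic_scores([[0, 1], [2, 1], [1, 1]], [0.5, -0.2])
prs = standardize(raw)
```

Standardizing within the held-out stratum means that downstream effect estimates are expressed per standard deviation of the score.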
Modifier analyses. For 6,095 of the patients with WGS and ALS, core clinical data were obtained including sex, site of onset (spinal or bulbar), age at onset (years), country of origin and survival, defined as time from disease onset to death, 23 h of continuous non-invasive ventilation per day or tracheostomy. Patients who were still alive were censored at the last date of follow-up.
The genetic risk factors included SNP genotypes, PRSs, C9orf72 repeat expansion status and the number of rare coding mutations in ALS-risk genes (SOD1, TARDBP, FUS, NEK1, TBK1 and CFAP410) as obtained from WGS as described above.
For survival analyses, the Cox proportional hazards mixed model from the 'coxme' package in R was used, modeling country of origin as a random effect. Fixed-effect covariates included sex, age at onset, site of onset, GWAS stratum and PC1-PC5. Violation of the proportional hazards assumption for genotype on survival was assessed by inspecting Schoenfeld residuals. For age-at-onset analyses, we applied linear regression of age at onset on genotype including sex, site of onset, country, GWAS stratum and PC1-PC5 as covariates.
Cross-trait analyses. Datasets and data preparation. GWAS summary statistics for clinically diagnosed AD 86 , PD 87 , FTD 88 , CBD 89 and PSP 20 in individuals of European ancestry were obtained. For AD, we used the clinical diagnosis as the case definition to avoid spurious genetic correlations that could have been introduced through the by-proxy design 31 , in which by-proxy cases are defined as having a parent with AD. Although this is a powerful design for gene discovery and the genetic correlation with clinically diagnosed AD is high 90 , mislabeling by-proxy cases when parents suffer from other types of dementia (for example, Lewy body dementia, Parkinson's dementia, FTD or vascular dementia) can lead to spurious genetic correlations with ALS and other neurodegenerative diseases. For FTD, we primarily used the results of the cross-subtype meta-analysis, which includes behavioral variant FTD, semantic dementia FTD, progressive non-fluent aphasia FTD and mndFTD. For CBD, allele coding was unavailable, and effect alleles were inferred by matching allele frequencies to those observed in the Haplotype Reference Consortium. SNPs with MAF > 0.4 were excluded. Because downstream methods rely on LD scores or population-specific LD patterns, the European ancestry summary statistics from the present study were used for ALS. For sample size parameters, effective sample size was calculated as described previously.
Multiple sclerosis summary statistics were obtained from the International Multiple Sclerosis Genetics Consortium 91 . For cerebrovascular diseases, GWAS summary statistics were obtained for ischemic stroke (any ischemic stroke) 92 , intracerebral hemorrhage 93 and intracranial aneurysm 94 . For psychiatric traits, GWAS summary statistics were obtained from Psychiatric Genomics Consortium studies on anorexia nervosa 95 , obsessive-compulsive disorder 96 , anxiety disorders (anxiety score) 97 , post-traumatic stress disorder (all European ancestries) 98 , major depressive disorder 99 , bipolar disorder 100 , schizophrenia 101 , Tourette's syndrome 102 , autism spectrum disorder 103 and attention-deficit hyperactivity disorder (European ancestries) 104 .
Genetic correlation. Genome-wide genetic correlation between neurodegenerative traits was calculated using LDSC (version 1.0.0) 59 . Precomputed LD scores of European individuals in the 1000 Genomes project for high-quality HapMap3 SNPs were used ('eur_w_ld_chr'). A free intercept was modeled to allow for potential sample overlap.

Enrichment analyses.
Linkage disequilibrium score regression annotation-specific enrichment analysis. We used LDSC (version 1.0.0) 59 to calculate SNP-based heritability, the LDSC intercept and SNP-based heritability enrichment for partitions of the genome. In all LDSC analyses, summary statistics excluding the HLA region of only samples of European ancestry were included. LD scores and partitioned LD scores provided by LDSC were used for genome-wide and genic region-based heritability analyses. The option '--overlap-annot' was used in the partitioned heritability analysis to allow for overlapping SNPs between MAF bins. SNPs with MAF > 5% were included.
Tissue and cell type enrichment analysis. Tissue and cell type enrichment analyses were performed using the GWAS summary statistics of the European ancestry meta-analysis and FUMA 33 software version 1.3.6a. FUMA performs a genic aggregation analysis of GWAS association signals to calculate gene-wise association signals using MAGMA version 1.6 and subsequently tests whether tissues and cell types are enriched for expression of these genes. For tissue enrichment analysis, we used the GTEx version 8 reference set. FDR-corrected P-values <0.05 across all tissues (n = 54) were considered statistically significant. For cell type enrichment analyses 34 , we used human-derived single-cell RNA-seq data on major brain cell types (GSE67835 without fetal samples 106 ), Allen Brain Atlas cell types 107 for the human-derived major neuronal subtypes and the DropViz 108 dataset for mouse-derived brain cell types across all brain regions. We applied FDR correction for multiple testing within each expression dataset, and FDR-corrected P-values <0.05 were considered statistically significant.
Pathway enrichment analysis. We used Downstreamer software 24 to identify enriched biological pathways and processes. First, gene-based association statistics were obtained with the Pascal method 109 , which aggregates SNP association statistics including SNPs up to 10 kb upstream and downstream of a gene, accounting for LD using the non-Finnish European individuals from the 1000 Genomes Project phase 3 (ref. 110 ) as a reference. In the Downstreamer method, putative core genes are defined as those that are coexpressed with disease-associated genes and can therefore be implicated in disease. Coexpression networks are based on either a large, multi-tissue transcriptome dataset including 56,435 genes and 31,499 individuals or brain-specific RNA-seq data obtained from the MetaBrain resource. The gene-based association statistics, coexpression matrix and gene Z scores per pathway or HPO term are then combined in a generalized least-squares regression model to obtain enrichment statistics 24 . Enrichment analyses were performed for reactome, gene ontology and HPO terms using multi-tissue or brain-specific transcriptome datasets to calculate the coexpression matrix.
The distribution of enrichment Z-score statistics was compared between analyses using multi-tissue or brain-specific coexpression matrices. Using the 'pyhpo' module in Python, all HPO terms were assigned to their parent term(s) in the 'phenotypic abnormality' (HP:0000118) branch, which includes phenotypic abnormalities grouped per organ system.

Mendelian randomization.
Causal inference through MR analysis was performed for 22 exposures for which large-scale GWASs are available and for which there is prior evidence for an association with ALS. These include seven behavioral-related traits: body mass index (anthropometric) 111 , years of schooling (educational attainment) 112 , alcoholic drinks per week, age of smoking initiation and cigarettes per day from Liu et al. 113 , days per week of moderate physical activity and days per week of vigorous activity from the UK Biobank 114 ; four blood pressure traits (coronary artery disease 115 , stroke 92 , diastolic blood pressure and systolic blood pressure 116 ); seven immune system traits from Vuckovic et al. 117 (basophil, eosinophil, lymphocyte, monocyte, neutrophil and white blood cell counts) and C-reactive protein 118 ; and four lipid traits from Willer et al. 119 (HDL cholesterol, LDL cholesterol, total cholesterol and triglyceride levels). A full description of the included studies is provided in Supplementary Table 26. From these GWASs, SNPs to serve as instruments for MR analyses were selected at two different P-value cutoffs (P < 5 × 10 −8 and P < 5 × 10 −5 ) and then LD clumped to obtain independent SNPs. SNP effect estimates on ALS risk were obtained from the European ancestry-only GWAS and, if needed, an LD proxy was selected (r 2 > 0.8).
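For orientation, the sketch below shows how a set of independent, harmonized instruments is combined into a single causal estimate with the inverse-variance-weighted (IVW) estimator, one commonly used MR method. It is a generic illustration rather than the specific pipeline used in this study; the function name and example values are hypothetical, and the uncertainty in the SNP-exposure effects is ignored, as in the standard first-order IVW formula.

```python
import math

def mr_ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect of exposure on outcome.

    Equivalent to an inverse-variance-weighted mean of per-SNP Wald ratios
    (beta_outcome / beta_exposure), with first-order weights.
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error

# Hypothetical instruments: SNP effects on the exposure and on ALS risk.
estimate, standard_error = mr_ivw([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
```

Methods that relax the 'no horizontal pleiotropy' assumption use the same per-SNP inputs but weight or model them differently, which is why agreement across methods is informative.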
After harmonizing effect alleles and excluding palindromic SNPs, we performed a series of quality-control steps to avoid biased estimates of causal effects, checking for each exposure (1) instrument coverage (>85% overlapping SNPs; Supplementary Table 31
Top associated SNPs in the ALS GWAS were selected for colocalization analysis between ALS and FTD subtypes using COLOC. In the top panel, point height is the two-sided -log 10 (P-value) of the top-associated SNP in the ALS GWAS. In the bottom panel, association P-values of these SNPs with FTD subtypes are shown by color. The Bayesian posterior probability for a shared causal variant between traits (PP H4 ) is depicted by a connection between points.
Fig. 8 | Tissue and cell-type enrichment analyses for all brain diseases. Tissue (a) and cell-type (b) enrichment for all included brain diseases obtained from two-sided MAGMA linear regression. Only brain diseases with exome-wide significant gene-based MAGMA associations (P < 2.7 × 10 −6 ) were suitable for tissue and cell-type enrichment analyses. The color represents enrichment coefficient and size indicates two-sided -log 10 (P-value) of enrichment obtained by the linear regression model in the MAGMA gene-property analysis. Due to the large number of significant genes in the gene-based MAGMA analyses for schizophrenia, bipolar disorder and multiple sclerosis, the enrichment P-values were truncated at P < 1.0 × 10 −5 . ALS = amyotrophic lateral sclerosis, PD = Parkinson's disease, AD = Alzheimer's disease, ADHD = attention-deficit hyperactivity disorder, ASD = autism spectrum disorder, TS = Tourette's syndrome, SCZ = schizophrenia, BIP = bipolar disorder, MDD = major depressive disorder, PTSD = post-traumatic stress disorder, Anxiety = anxiety disorder (score), AN = anorexia nervosa, IA = intracranial aneurysm (any), IS = ischemic stroke, MS = multiple sclerosis, Cx = cortex, OPC = oligodendrocyte progenitor cells.
Fig. 9 | Cell-type enrichment analysis in mice. Cell-type enrichment analysis using the DropViz single-cell RNA sequencing dataset obtained from mice. Similar to the cell-type enrichment analyses there is neuron-specific enrichment in ALS and Parkinson's disease. In Alzheimer's disease microglia are the most enriched cell-types. The color represents enrichment coefficient and size indicates two-sided -log 10 (P-value) of enrichment obtained by the linear regression model in the MAGMA gene-property analysis. Statistically significant enrichments after correction for multiple testing with a false discovery rate (FDR) < 0.05 are marked with an asterisk. ALS = amyotrophic lateral sclerosis, PD = Parkinson's disease, AD = Alzheimer's disease, Cx = cortex.
Fig. 10 | Human phenotype ontology term enrichment. Downstreamer enrichment analyses were performed using the multi-tissue and brain-specific co-expression matrix to identify co-regulated ALS-genes. The distribution of enrichment statistics (Z-scores) for all Human phenotype ontology (HPO) terms are plotted per HPO parent branch. The multi-tissue analysis indicates enrichment for the neurology parent branch 'abnormality of the nervous system' (dark-red), although no term passes the Bonferroni threshold for multiple testing. The brain-specific analysis illustrates stronger enrichment for the neurology parent branch. In total, 58 HPO terms pass the threshold for multiple testing of which 42 are defined within the 'abnormality of the nervous system' branch.