Large meta-analysis of genome-wide association studies expands knowledge of the genetic etiology of Alzheimer’s disease and highlights potential translational opportunities

Deciphering the genetic landscape of Alzheimer’s disease (AD) is essential to define the pathophysiological pathways involved and to successfully translate genomics to potential tailored medical care. To generate the most complete knowledge of AD genetics, we developed through the European Alzheimer’s Disease BioBank (EADB) consortium a discovery meta-analysis of genome-wide association studies (GWAS) based on a new large case-control study and previous GWAS (in total 39,106 clinically diagnosed cases, 46,828 proxy-AD cases and 401,577 controls), with the most promising signals followed up in independent samples (18,063 cases and 23,207 controls). In addition to 34 known AD loci, we report here the genome-wide significant association of 31 new loci with the risk of AD. Pathway-enrichment analyses strongly indicated the involvement of gene sets related to amyloid and Tau, but also highlighted microglia, in which increased gene expression corresponds to more significant association with AD risk. In addition, we successfully prioritized candidate genes in the majority of our new loci, with nine being primarily expressed in microglia. Finally, we observed that a polygenic risk score generated from this new genetic landscape was strongly associated with the risk of progression from mild cognitive impairment (MCI) to dementia (4,609 MCI cases of whom 1,532 converted to dementia), independently of age and the APOE ε4 allele.


INTRODUCTION
The objectives of translational genomics and subsequent personalized medicine are to define prognosis/diagnosis markers of a disease and to adapt treatments at the individual level. Such approaches have already been successful in cancers and promising approaches are emerging for tailored treatments for diabetes 1 . However, in many other pathologies such as dementia, personalized approaches are still a concept and have proven difficult to develop.
Indeed, dementia is a term that encompasses an array of complex phenotypes in which the main symptom is the progressive decline of cognitive performance as observed in Alzheimer's disease (AD), the most common type. The dementia stage in AD is the culmination of a long, progressive, and silent process that is followed by intermediate pathological changes leading ultimately to cognitive decline and dementia. Importantly, the concept of AD has changed to recognize that AD is a continuum with a long preclinical phase of Subjective Cognitive Decline (SCD), a stage of mild cognitive impairment (MCI), and a dementia phase. The preclinical phase may offer unique opportunities for prevention of AD through early detection of AD pathology and application of pharmacological treatment with disease-modifying drugs that are still under development.
In this context, translational genomics may be of particular interest in AD since this disease exhibits a particularly high heritability, estimated between 60 and 80% 2. Indeed, since the first large genome-wide association studies (GWAS) published in 2009 3,4, many loci/genes have been associated with the risk of developing AD 5. These genetic findings have contributed to the identification of pathways and networks underlying AD, specifically implicating immunity, cholesterol processing, endocytosis, and more recently, the role of Aβ and Tau in the pathogenesis of common forms of AD 6. It is expected that the discovery of further genetic risk factors for AD will reveal additional relevant pathogenic pathways. Mounting evidence now suggests that AD is a disease in which multiple components combine to trigger the disease, beyond the dominant "amyloid cascade hypothesis". This observation may also indicate that the preponderant deleterious pathway(s) might be differentially involved at the individual level. For instance, therapies targeting the APP metabolism pathway may not be effective if this pathway is relatively unimportant for an individual with a particular genetic profile. If true, this would imply the importance of pursuing multiple therapies that target different genetically driven pathways. Once a range of treatment options becomes available, an individualized model of AD pathology would be feasible, such that polytherapies and personalized medicine approaches can be developed and applied.
In a personalized medicine framework, it will be essential to translate large-scale genomic information into useful tools for personalized risk prediction and subsequent potential tailored intervention, for example through polygenic risk scores (PRS). Generating PRSs has been regarded as a reasonable solution to summarize genome-wide genotype data into a single variable that measures the genetic contribution to a trait or a disease for a particular individual. Herein, genomic information offers a unique opportunity for early detection. However, a large part of the AD genetic component is still unknown and several loci/genes also need to be confirmed as genuine genetic risk factors.
As a consequence, strong efforts are still needed to characterize the genetic architecture of AD, with the objectives to identify critical pathways and construct powerful PRSs for the disease. Within this background, increasing the size of GWAS data is an obvious way to facilitate the characterization of new genetic risk factors, as observed in many other multifactorial diseases. In addition, since rare variants might explain a large proportion of the missing heritability, improving their analysis is also mandatory. Taking into account these two major points, we developed the European Alzheimer's Disease BioBank (EADB) consortium, grouping together the main European GWAS consortia already working on AD and a new dataset of 20,464 AD cases and 22,244 controls collated from 15 European countries (Belgium, Bulgaria, Czech Republic, Denmark, Finland, France, Germany, Greece, Italy, Portugal, Spain, Sweden, Switzerland, The Netherlands and The United Kingdom). We then benefited from the Trans-Omics for Precision Medicine (TOPMed) imputation panel, based on whole-genome sequencing of 62,784 individuals, in order to increase the number of variants tested and to improve the imputation quality of rare variants 7. The EADB results were meta-analysed with a proxy-AD GWAS performed in the UK Biobank. The best hits (p ≤ 10⁻⁵) generated from this meta-analysis step were then followed up for replication in a large set of samples from the ADGC and CHARGE consortia.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted October 4, 2020.
In addition, in EADB, we collected an independent longitudinal cohort of 4,609 MCI cases of whom 1,532 converted to dementia. This provided us with the unique opportunity to test the association between a PRS we generated from our GWAS data and the risk of progression to dementia/AD with the objective to translate genomic information into personalized risk profiles for early detection of AD risk.

GWAS analysis
The EADB Stage I (GWAS meta-analysis) was based on 39,106 clinically diagnosed cases, 46,828 proxy-AD cases and 401,577 controls (Supplementary Tables 1 and 2) and on 21,101,114 variants after quality control. Genomic inflation factors (λ) were slightly inflated (λ=1.08 overall and 1.17 when restricted to variants with minor allele frequency (MAF) above 1%; see Supplementary Figure 1 for a quantile-quantile (QQ) plot). However, the linkage disequilibrium score (LDSC) regression estimate 8 indicated that the majority of this inflation was due to a polygenic signal, with the intercept being close to 1 (intercept=1.05, s.e.=0.01, versus λ=1.2 on the variants considered in the LDSC analysis).
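As an illustration of the median-based genomic inflation factor reported here, the following minimal sketch recomputes λ from a list of two-sided P values. It is not the GenABEL implementation used in the study; the function name `genomic_inflation` is hypothetical, and the sketch assumes that each P value comes from a 1-degree-of-freedom test.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Median-based genomic inflation factor from two-sided P values.

    Hypothetical helper: assumes 1-df tests and 0 < p <= 1 for every P value.
    """
    nd = NormalDist()
    # Convert each two-sided P value back to a 1-df chi-square statistic (z squared).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # P values evenly spread over (0, 1) mimic the null: lambda is close to 1.
    null_p = [(i + 0.5) / 1000 for i in range(1000)]
    print(round(genomic_inflation(null_p), 3))
```

A value above 1 indicates test statistics larger than expected under the null, which is why the LDSC intercept is then used to separate polygenic signal from confounding.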
We selected all variants with a P value below 1×10⁻⁵ in Stage I. We defined non-overlapping regions around those variants, excluded the region corresponding to APOE, and sent the remaining variants for follow-up (Stage II, n=11,390, see Methods) in a large set of samples from the ADGC and CHARGE consortia (18,063 cases and 23,207 controls). A signal was considered genome-wide significant if it was nominally replicated in Stage II (P value ≤ 0.05) with the same direction of effect as in Stage I, and if it was associated with AD risk with a P value below 5×10⁻⁸ in the Stage I + Stage II meta-analysis. In addition, we applied a PLINK clumping procedure 9 to define potential independent hits from the Stage I results (see Methods). After validation by conditional analyses (see Supplementary Information and Supplementary Tables 3 and 4), this approach led us to confirm 38 signals in 34 loci already known to be associated with the risk of developing AD in the main previous AD GWASs 6,10-15 (Table 1) and to propose 31 new loci (Table 2 and Supplementary Figures 2-24). Five of these loci (APP, CCDC6, NCK2, TSPAN14 and SHARPIN) were already reported in two preprints using GWAS data included in our study 16,17. In addition, the NDUFAF6 and IGH loci were previously reported in a gene-wide analysis 18. Of note, the magnitudes of associations in Stage I were highly similar to those observed when Stage I was restricted to clinically diagnosed cases, hereafter denoted the diagnosed-cases-only analysis (Supplementary Table 5 and Supplementary Figure 25). In addition, we did not detect any signal that was mainly driven by the proxy-AD cases (Supplementary Table 5). We also provide a list of four loci with a genome-wide significant signal in the Stage I + Stage II analysis that failed to replicate in Stage II (Supplementary Table 6).

Pathway analyses
To evaluate the biological significance of this new AD genetic landscape, we first performed pathway enrichment analyses. 150 gene sets were significant after multiple testing correction (q≤0.05, see Methods) in the Stage I (Supplementary Table 7), with the 20 most significant pathways shown in Table 3. The most significant gene sets relate to amyloid and Tau, with many of the other significant gene sets relating to lipids and immunity. Notably, there are gene sets related to macrophage activation and microglial cell activation. We also assessed whether enrichment pathway analyses were sensitive to the inclusion of the proxy-AD cases in Stage I. When analyses were performed limited to the diagnosed cases only, 69 gene sets were significant (q≤0.05) (Supplementary Table 7). Of these 69 gene sets, 53 reached q≤0.05 and all 69 reached p≤0.05 in the full Stage I that includes proxy-AD cases. This indicates that including proxy-AD cases did not mask disease-relevant biological information. We also repeated the enrichment analysis by using a window of 35kb upstream and 10kb downstream to assign variants to genes or by removing all 70 genes within 1Mb of APOE. Results were consistent between analyses (Supplementary Table 7).
We then performed a single-cell expression enrichment analysis in MAGMA using human data from the Allen Brain Atlas dataset (see Methods). Two complementary measures were used for each cell type: average gene expression per nucleus (Av. Exp.) and percentage of nuclei in a cell type expressing each gene (% Cell Exp.). GABAergic/glutamatergic neurons, astrocytes, oligodendrocytes, microglia and endothelial cells were analyzed, and only microglia expression reached significance after correcting for multiple testing in the two measurements (FDR ≤ 0.05; Av. Exp. p=4.60×10⁻⁴ and % Cell Exp. p=5.59×10⁻⁸, Supplementary Table 8), with increased expression corresponding to more significant association with AD risk. A similar result was also observed using the mouse single-cell dataset from Skene et al 19 (Supplementary Table 9).
We tested whether the relationship between microglia expression and association with AD risk was specific to particular areas of biology by using MAGMA. In particular, we tested for interaction between expression and pathway membership for each of the 91 significant pathways containing at least 25 genes with measured expression (Supplementary Table 10). Among several significant interactions, the most significant one after multiple testing correction was detected between GO:1902991 (regulation of amyloid precursor protein catabolic process) and gene expression level in microglia (q-values=1.2×10⁻¹² and 8.3×10⁻⁸ for % Cell Exp. and Av. Exp. respectively). This interaction is still significant when the APOE locus is removed (q-values=3.8×10⁻¹¹ and 7.8×10⁻⁴ for % Cell Exp. and Av. Exp. respectively). This observation indicates that, among the amyloid-beta gene sets showing an overall enrichment for AD association signal, it is the genes with the highest microglia expression that show the strongest association, suggesting a functional relationship between microglia and APP/Aβ pathways. A complete list of genes in the APP/Aβ pathway GO:1902991 (and also in GO:1902003), along with their GWAS significance and microglia expression, is given in Supplementary Table 11.

Gene prioritization
In order to prioritize candidate genes in the new loci, we considered the "nearest" gene from the lead variant and the genes exhibiting AD-related modulations within a region of 1 Mb around the lead variant according to the following criteria: (i) expression and splicing quantitative trait loci (eQTLs and sQTLs) and colocalization analyses combined with transcriptome-wide association studies on expression and splicing (eTWAS and sTWAS) in AD-relevant brain regions; (ii) genetic-driven methylation as a biological mediator of genetic signals in blood (MetaMeth). We also considered the following additional criteria: (i) the functional impact of gene under-expression on APP metabolism 20; (ii) methylation QTL (mQTL) and histone acetylation QTL (haQTL) effects of the lead variants in dorsolateral prefrontal cortex (DLPFC) 21; and (iii) additional eQTL effects of these variants in naïve state monocytes and macrophages 22-27. All the results are summarized in Fig. 2, and a full description of how the genes were prioritized is reported in the Supplementary Information (see also Supplementary Tables 12-21).
Although the lead variants did not fall within a gene, for eight of the novel loci, brain molecular QTL, TWAS, blood MetaMeth and/or APP metabolism results exclusively supported the genes nearest to the lead variant: OTULIN (locus 4), RASA1 (locus 5), ICA1 (locus 9), TMEM106B (locus 10), ABCA1 (locus 15), CTSH (locus 23), MAF (locus 25) and SIGLEC11 (locus 28). Three other "nearest" genes in the new loci can be prioritized since the lead variant corresponds to a predicted deleterious missense variant within the gene itself: MME (locus 2), FDFT1 (locus 12), and SHARPIN (locus 14). For SHARPIN, we found additional evidence that AD risk in the locus is associated with SHARPIN expression and splicing events (Supplementary Tables 17-18 and Supplementary Fig. 31-32).
Finally, for six loci, NCK2 (locus 1), RASGEF1C (locus 6), HS3ST5 (locus 7), UMAD1 (locus 8), C1S (locus 19) and APP (locus 31), none of the candidate genes could be prioritized based on genetically driven expression, splicing or methylation analyses. We therefore considered that their proximity to the lead variant was in favor of their prioritization, but at this stage with a lower level of confidence. Of note, APP is an obvious candidate gene, but CYYR1-AS1, located between APP and ADAMTS1, might also be of interest (Supplementary Tables 17, 21 and 31).
The remaining 14 novel loci present a more complicated pattern: several genes exhibit AD-related modulations in the same locus, and/or the prioritized gene is not the nearest protein-coding gene. First, we could efficiently prioritize candidate risk genes in 4 additional loci: EGFR (locus 11), TSPAN14 (locus 17), BLNK (locus 18) and GRN (locus 27). For instance, in locus 11, EGFR is a likely candidate gene because its eQTL signals colocalize with the AD risk association signal, and its fine-mapped eTWAS hits (with FOCUS PIP values of ~1) associate predicted increased EGFR expression with increased AD risk (Fig. 3; Supplementary Tables 15 and 17; Supplementary Fig. 29 and 31). In the complex locus 17, TSPAN14 was identified as the candidate risk gene as it exhibited numerous AD-related expression and splicing modulations, including novel cryptic complex splicing events that we identified and experimentally confirmed (Fig. 3; Supplementary Tables 13-19 and Supplementary Fig. 26, 27 and 29-32).
This preprint is made available under a CC-BY-NC-ND 4.0 International license.
We did not clearly identify a single candidate in the remaining 10 loci. However, current evidence points towards the following candidate genes: i) DGKQ, SLC26A1 and IDUA in the complex locus 3; ii) CCDC6 in locus 16; iii) SNX1 in locus 22 (importantly, we previously determined that the SNX1 and APH1B GWAS signals in this region were independent, see Supplementary Tables 3 and 4); iv) INO80E, DOC2A, TBX6 and YPEL3 in the most complex new locus 24 (ALDOA locus); v) LIME1 and RTEL in locus 30. In locus 13, among the 4 genes, NDUFAF6 exhibited the highest number of hits, but TP53INP1 was also of interest. In locus 21, a recent report pointing to an increased burden of rare variants in ATP8B4 in AD patients prioritizes this gene 28. For locus 29, we consider LILRB2 a plausible risk gene according to bibliographical data 29,30. Finally, we were not able to prioritize a gene in the complex IGH cluster (locus 20), nor in the PRDM7 locus (locus 26).

Polygenic risk score
In order to explore the effect of genetic burden on progression from mild cognitive impairment to AD dementia, a PRS based on the genetic data generated above (see Methods and Supplementary Table 22) was constructed and tested in several longitudinal cohorts of MCI cases (Supplementary Table 23). We observed a significant association of this PRS with the risk of progressing to any type of dementia (HR=1.028 per average risk variant, 95%CI [1.022-1.033], p=4.93×10⁻⁷) and with the risk of progression to AD dementia only (HR=1.033 per average risk variant, 95%CI [1.027-1.039], p=8.6×10⁻⁸) after adjustment for age, sex, principal components and the number of APOE-ε4 alleles (Figure 4). Unadjusted analysis, analysis adjusted for age, sex and PCs only, and coding non-AD converters as censored cases in the progression to AD dementia analysis did not change the results (see Supplementary Table 24). Importantly, the association of the PRS with progression risk does not seem to be modified by the presence of APOE-ε4, since we did not find any significant interaction between the number of APOE-ε4 alleles and the PRS in any of the models tested. Of note, the number of APOE-ε4 alleles itself exerted a strong effect on progression to all-cause dementia (HR=1.64, 95%CI [1.51-1.78], p=1.2×10⁻³³) and to AD dementia (HR=1.79, 95%CI [1.64-1.96], p=3.6×10⁻³⁸). This effect corresponds to carrying 18 average risk variants coded in our PRS.
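The equivalence between one APOE-ε4 allele and 18 average risk variants can be checked from the hazard ratios reported above. The short sketch below assumes that per-variant effects are log-additive, so that hazard ratios multiply; the function name `equivalent_risk_variants` is illustrative only.

```python
import math


def equivalent_risk_variants(hr_reference, hr_per_variant):
    """Number of average risk variants whose combined hazard ratio equals
    hr_reference, assuming log-additive (multiplicative) effects."""
    return math.log(hr_reference) / math.log(hr_per_variant)


# Hazard ratios taken from the text: one APOE-e4 allele versus one average risk variant.
all_cause = equivalent_risk_variants(1.64, 1.028)    # progression to any dementia
ad_dementia = equivalent_risk_variants(1.79, 1.033)  # progression to AD dementia

# Both ratios are a little under 18 and round to 18.
print(round(all_cause), round(ad_dementia))
```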

DISCUSSION
This meta-analysis combining a new large case-control study and previous GWAS identified 69 independent genetic risk factors for AD, 38 previously reported in published GWAS and 31 corresponding to novel signals, strongly expanding our knowledge of the complex genetic landscape of Alzheimer's disease. We leveraged genetic and functional evidence together with prior literature to nominate credible candidate genes at each new locus, with a particular emphasis on transcriptional regulation, methylation, and APP metabolism datasets. A short description of these genes and their potential implication in AD is provided in the Supplementary Information. Remarkably, our meta-analysis and the characterization of these new loci clarify the global picture of AD etiology. For instance, pathway enrichment analyses remove ambiguities concerning the involvement of Tau binding proteins and APP/Aβ metabolism as major actors of the AD processes, beyond the levels of certainty previously described 6.
In addition, beyond the genetic risk factors already known to be involved in APP metabolism, i.e. ADAM10, APH1B and FERMT2 20, we also proposed five candidate genes likely to modulate this metabolism (DGKQ, RASA1, ICA1, DOC2A and LIME1). Although further investigations are needed to determine their exact implications, our data clearly support the central role of APP/Aβ metabolism and functions in the pathophysiological process of late-onset forms of AD. Of note, none of these genes were included in the GO:1902992 pathway (negative regulation of amyloid precursor protein catabolic process), which we characterized as the most enriched in potential AD genetic risk factors (Table 3). These pathway enrichment analyses also confirmed the involvement of innate immunity and microglial activation in AD (Table 3). In addition, single-cell expression enrichment analysis also highlighted genes expressed in microglia (Supplementary Tables 8 and 9). Finally, 9 of our prioritized genes, i.e. OTULIN, RASGEF1C, TSPAN14, BLNK, ATP8B4, MAF, GRN, SIGLEC11 and LILRB2, appeared to be mainly or almost exclusively expressed in microglia (Fig. 2). However, at this time, only GRN is referenced in the microglial cell activation pathway (GO:0001774; Table 3). This suggests that enrichment of pathways involving microglia may be underestimated in our current analysis, and further work will clearly be needed to determine whether and how these 9 genes are involved in microglia function/activation. Several publications have already demonstrated involvement of the corresponding proteins in microglia function/activation (see Supplementary Information) and, importantly, three of them, i.e. GRN, SIGLEC11 and LILRB2, have also been linked to Aβ peptides/amyloid plaques 29,31,32. Taking into account the already known genetic risk factors primarily expressed in microglia, i.e. INPP5D, TREM2, SPI1, MS4A4A, SPPL2A, PLCG2 and ABI3 (Supplementary Figure 35), this means that at least 25% of the genetic risk loci described in this paper are credibly linked to AD-related microglia dysfunctions. Importantly, TREM2, PLCG2, ABI3 and INPP5D were also characterized as microglia Aβ response proteins at the transcript and/or protein levels 33. This observation thus indicates that at least 7 genes (44% of those mostly expressed in microglia) have already been linked to Aβ clearance/toxicity. However, it is also necessary to keep in mind that gene prioritization, while efficient for numerous loci, has some limitations, particularly in complex loci where it is difficult to clearly identify the most relevant gene. In addition, it is important to note that for our molecular QTL-based analyses, we only considered cis-QTLs, which are typically found within 1 Mb of the molecular phenotype feature. Our analyses might therefore have missed certain molecular trans-QTLs with an important effect on AD risk. Furthermore, even though we extensively integrated our GWAS results with information derived from the expression, splicing, and methylation landscape of the newly identified genetic risk regions, which hinted at possible mechanisms of action for the AD association in these regions, our post-GWAS analyses did not account for the possibility that the underlying risk mechanism could be explained through the effect of genetic variation on protein levels (protein QTLs), metabolite levels (metabolite QTLs), 3D spatial organization of chromatin (e.g. topologically associated domains [TADs]) or a structural variant that might be tagged by the newly identified genetic risk variants.
We therefore emphasize that more investigations are required for complete elucidation of AD risk mechanisms in these regions, and we cannot exclude that we overstated some pathophysiological pathways based on incorrect gene assignment or incomplete information. Nevertheless, reinforcing the role of microglia in AD, our data also revealed for the first time a (direct or indirect) statistical relationship between gene expression in microglia, genetic risk factors and APP/Aβ pathways (Supplementary Table 10).
Translating genetic findings into tools that can be used in the clinical setting has proven to be challenging because different strategies are available to create a PRS summarizing personal genetic burden. In our study, we computed a PRS following the strategy previously described by Chouraki et al 34, in which an increase of one point on the PRS corresponds to carrying one additional average risk allele. We observed that this PRS was associated with the risk of progression from MCI to dementia and to AD dementia (ADD), the major form of dementia. Importantly, the association was obtained while the APOE-ε4 allele was not included in the PRS. Moreover, the association did not disappear following adjustment for age or for the additive effect of the APOE-ε4 allele. Previous studies evaluating the effect of PRS on progression of MCI to ADD have provided compelling evidence supporting the role of APOE, whereas the contribution of additional genetic variants to progression has not shown unanimous results 35. Our finding of an APOE-independent effect on MCI progression may be explained by (i) the large longitudinal sample of 4,609 MCI cases of whom 1,532 converted to dementia, and (ii) the improved knowledge of the genetic component of AD through the larger number of newly discovered genome-wide significant variants included in the PRS. Our study also shows that carrying 18 "average risk variants" conferred a similar probability of progressing to ADD as being APOE-ε4 heterozygous, while carrying 32 "average risk variants" resembles being APOE-ε4 homozygous. Although further research is still needed before these findings can be translated into clinical routine, completing the genetic architecture of AD is definitely paving the way to personalized risk prediction even before the dementia stage is reached.
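To make the "one point equals one additional average risk allele" scale concrete, the sketch below shows one way such a score could be built. It assumes a weighted sum of risk-allele dosages divided by the mean effect size; this scaling is consistent with the interpretation stated above but is an assumption, not a transcription of the procedure in Chouraki et al. The function name and the example effect sizes are hypothetical.

```python
def polygenic_risk_score(dosages, betas):
    """Weighted sum of risk-allele dosages, rescaled by the mean effect size.

    Assumed scaling (hypothetical): dividing by the mean beta expresses the
    score in units of "average risk alleles".
    dosages: risk-allele dosage per variant (0 to 2).
    betas: per-allele log odds ratio per variant, oriented to the risk allele.
    """
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    mean_beta = sum(betas) / len(betas)
    weighted_sum = sum(b * d for b, d in zip(betas, dosages))
    return weighted_sum / mean_beta


if __name__ == "__main__":
    betas = [0.1, 0.2, 0.3]  # illustrative effect sizes, not from the study
    base = polygenic_risk_score([1, 0, 2], betas)
    # Adding one allele at the variant whose effect equals the mean effect
    # raises the score by one point.
    plus_one = polygenic_risk_score([1, 1, 2], betas)
    print(base, plus_one)
```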
Of note, several new AD loci were also associated with the risk of developing other neurodegenerative diseases: the IDUA locus with Parkinson's disease (PD), and the GRN and TMEM106B loci with fronto-temporal dementia (FTD). Given the large number of cases analyzed in our study and the well-documented clinical diagnostic errors between neurodegenerative diseases, we cannot exclude that these associations are in part due to contamination of our sample by PD or FTD cases. However, GWAS colocalization analyses indicate that the main signal in PD is independent of the one observed in AD (Supplementary Tables 25 and 26). Only small GWAS are available in FTD (and these depend on the type of FTD) 36, which makes it difficult to reach a definite conclusion on the independence of the AD and FTD signals. Further investigations will be required, all the more since sporadic FTD has been described as a polygenic disorder in which multiple pleiotropic loci with small effects contribute to increased disease risk 37. In addition, the lead GRN variant in AD is functional and has also been described as associated with TDP-43 positive FTD risk 38. In this context, it will be interesting to determine whether or not our PRS is specific to AD, given the potential genetic overlap between neurodegenerative diseases. Of note, our PRS was not associated with the risk of converting to non-AD dementia (HR=1.010, 95%CI [0.996-1.025], p=4.8×10⁻¹). However, this absence of association may be due to a lack of statistical power in our current analysis and will require further investigation.
In conclusion, our work demonstrates that improved characterization of AD genetics also expands our knowledge of the underlying pathophysiological processes, presenting novel opportunities for therapeutic approaches and risk prediction through robust PRS. Convergence between treatments generated from genomics and PRS may thus pave the way to translational genomics and personalized medicine.

MATERIALS AND METHODS
Samples. All discovery meta-analysis samples are from the following consortia/datasets: EADB, GR@ACE, EADI, GERAD/PERADES, DemGene, Bonn, the Rotterdam study, the CCHS study, NxC and the UK Biobank. Summary demographics of these case-control studies are described in Supplementary Table 1 [...] Table 1 and Supplementary Information) and fully described elsewhere 6,11,14,15,39-41. Written informed consent was obtained from study participants or, for those with substantial cognitive impairment, from a caregiver, legal guardian, or other proxy. Study protocols for all cohorts were reviewed and approved by the appropriate institutional review boards. Further details of all cohorts can be found in the Supplementary Information. [...] Table 2). For the UK Biobank, we used the provided imputed data, generated from a combination of the 1000 Genomes (1000G), HRC and UK10K reference panels. See Supplementary Information for more details.

Stage I analyses. Association tests between AD clinical or proxy status and autosomal genetic variants were conducted separately in each dataset using logistic regression assuming an additive genetic model, as implemented in SNPTEST 45 or in PLINK 9, except in the UK Biobank, where a logistic mixed model as implemented in SAIGE 46 was used. Analyses were performed on the genotype probabilities in SNPTEST (newml method) and on dosages in PLINK and SAIGE. Analyses were adjusted for principal components and genotyping centers when necessary (Supplementary Table 2). For the UK Biobank dataset, effect sizes and standard errors were corrected by a factor of two to take into account that proxy cases were analysed 12. We filtered out duplicated variants and variants with (i) missing effect size, standard error or P value, (ii) absolute value of effect size above 5, (iii) imputation quality less than 0.3, or (iv) a product of the minor allele count and the imputation quality (mac-info score) less than 20. In the UK Biobank dataset, only variants with minor allele frequency (MAF) above 0.01% were analyzed. For datasets not imputed with the TOPMed reference panel, we also excluded (i) variants for which conversion of position or alleles from the GRCh37 assembly to the GRCh38 assembly was not possible or problematic, and (ii) variants with a very large difference in frequency between the TOPMed reference panel and the reference panels used to perform imputation. Results were then combined across studies with a fixed-effect meta-analysis using the inverse variance weighted approach as implemented in the METAL software 47. We filtered out (i) variants with a heterogeneity P value below 5×10⁻⁸, (ii) variants analyzed in less than 20% of the total number of cases and (iii) variants with a frequency amplitude above 0.4 (defined as the difference between the maximum and minimum frequency across studies).
We further excluded variants analyzed in the UK Biobank only or variants not analyzed in the EADB-TOPMed dataset. Genomic inflation factor lambda was computed with the R package GenABEL 48 using the median approach after exclusion of the APOE region (44 Mb to 46 Mb on chromosome 19 in GRCh38). The linkage disequilibrium (LD) score (LDSC) regression intercept was computed with the LDSC software using the "baselineLD" LD scores built from 1000 Genomes Phase 3 8 . The analysis was restricted to HapMap 3 variants and excluded multi-allelic variants, variants without an rsID and variants in the APOE region.
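The per-dataset variant filters and the proxy-case correction described in this section can be sketched as follows. This is an illustrative outline rather than the pipeline used in the study: the `VariantResult` record and both function names are hypothetical, removal of duplicated variants is omitted, and the correction is assumed to be a multiplication of the effect size and standard error by two.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantResult:
    """Hypothetical per-variant association result from one dataset."""
    beta: Optional[float]        # effect size (log odds ratio)
    se: Optional[float]          # standard error of beta
    p_value: Optional[float]     # association P value
    info: float                  # imputation quality
    minor_allele_count: float    # minor allele count in the dataset


def passes_dataset_qc(v: VariantResult) -> bool:
    """Apply filters (i) to (iv) listed in the text to a single variant."""
    if v.beta is None or v.se is None or v.p_value is None:
        return False                         # (i) missing statistics
    if abs(v.beta) > 5:
        return False                         # (ii) implausibly large effect
    if v.info < 0.3:
        return False                         # (iii) poor imputation quality
    if v.minor_allele_count * v.info < 20:
        return False                         # (iv) mac-info score too low
    return True


def correct_proxy_estimates(beta: float, se: float, factor: float = 2.0):
    """Rescale proxy-case estimates; multiplying by the factor is an assumption."""
    return beta * factor, se * factor


if __name__ == "__main__":
    v = VariantResult(beta=0.05, se=0.01, p_value=5.7e-7, info=0.9, minor_allele_count=100)
    print(passes_dataset_qc(v), correct_proxy_estimates(v.beta, v.se))
```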
Definition of associated loci. A region of +/-500 kb was defined around each variant with a Stage I P value below 1×10⁻⁵. Those regions were then merged with the bedtools software to define non-overlapping regions. The region corresponding to the APOE locus was excluded. We then applied the PLINK clumping procedure to define independent hits in each of those regions. The clumping procedure was applied on all variants with a Stage I P value below 1×10⁻⁵. It is an iterative process beginning with the variant with the lowest P value, named the index variant. Variants with a Stage I P value below 1×10⁻⁵, located within 500 kb of this index variant, and in LD with the index variant (r² above 0.001) are assigned to the clump of this index variant. The clumping procedure is then applied on all the remaining variants, until no variant is left. LD was computed in the EADB-TOPMed dataset using high quality (probability ≥ 0.8) imputed genotypes.
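The iterative clumping logic described above can be sketched as a greedy loop. The Python below illustrates the procedure as worded in the text and is not PLINK's implementation: the `clump` function, the dictionary keys and the `r2` lookup callable are hypothetical, and pairwise LD is assumed to be available from elsewhere.

```python
def clump(variants, r2, p_threshold=1e-5, window=500_000, r2_threshold=0.001):
    """Greedy clumping.

    variants: list of dicts with keys "id", "chrom", "pos" and "p".
    r2: callable taking two variant ids and returning their LD (r squared).
    Returns a list of clumps, each with its index variant and member ids.
    """
    # Only variants below the P value threshold take part, most significant first.
    remaining = sorted((v for v in variants if v["p"] < p_threshold),
                       key=lambda v: v["p"])
    clumps = []
    while remaining:
        index = remaining[0]
        members, rest = [], []
        for v in remaining[1:]:
            same_clump = (
                v["chrom"] == index["chrom"]
                and abs(v["pos"] - index["pos"]) <= window
                and r2(index["id"], v["id"]) > r2_threshold
            )
            (members if same_clump else rest).append(v)
        clumps.append({"index": index["id"], "members": [m["id"] for m in members]})
        remaining = rest  # repeat on the variants not yet assigned
    return clumps


# Toy example with a hand-written LD table (unlisted pairs have r2 = 0).
ld = {frozenset(("a", "b")): 0.5}
toy = [
    {"id": "c", "chrom": "1", "pos": 300_000, "p": 1e-6},
    {"id": "a", "chrom": "1", "pos": 100_000, "p": 1e-10},
    {"id": "b", "chrom": "1", "pos": 200_000, "p": 1e-7},
    {"id": "d", "chrom": "1", "pos": 250_000, "p": 1e-3},
]
print(clump(toy, lambda x, y: ld.get(frozenset((x, y)), 0.0)))
```

With an r² threshold as low as 0.001, almost any residual LD with the index variant places a nearby variant in its clump, which makes the resulting index variants conservative independent hits.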
Stage II analyses. Variants with a Stage I P value below 1×10⁻⁵ were sent for follow-up (see Supplementary Information). A fixed-effect meta-analysis was performed with METAL (inverse variance weighted approach) to combine the results across Stage I and Stage II. In each clump, we then reported the replicated variant (same direction of effects between Stage I and Stage II, with a Stage II P value below 0.05) with the lowest P value in the meta-analysis of Stages I + II. Those variants were considered associated at the genome-wide significance level if they had a P value below 5×10⁻⁸ in the Stages I + II meta-analysis. Among them, we excluded the variant chr6:32657066:G:A because its frequency amplitude was large.
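The reporting rule for each clump can be written as a small selection function. This Python sketch only restates the criteria given above; the function name `pick_replicated` and the field names (`beta_stage1`, `beta_stage2`, `p_stage2`, `p_meta`) are hypothetical.

```python
def pick_replicated(clump_variants, stage2_alpha=0.05, gws_threshold=5e-8):
    """Return the replicated variant with the lowest Stages I + II P value.

    A variant is replicated if its Stage I and Stage II effects have the same
    direction and its Stage II P value is below stage2_alpha. Returns None when
    no variant in the clump is replicated.
    """
    replicated = [
        v for v in clump_variants
        if v["beta_stage1"] * v["beta_stage2"] > 0 and v["p_stage2"] < stage2_alpha
    ]
    if not replicated:
        return None
    best = min(replicated, key=lambda v: v["p_meta"])
    return {**best, "genome_wide_significant": best["p_meta"] < gws_threshold}


# Toy clump: v1 has the lowest meta-analysis P value but opposite effect directions.
toy_clump = [
    {"id": "v1", "beta_stage1": 0.10, "beta_stage2": -0.02, "p_stage2": 0.01, "p_meta": 1e-12},
    {"id": "v2", "beta_stage1": 0.08, "beta_stage2": 0.07, "p_stage2": 0.001, "p_meta": 3e-9},
    {"id": "v3", "beta_stage1": 0.05, "beta_stage2": 0.04, "p_stage2": 0.20, "p_meta": 1e-7},
]
print(pick_replicated(toy_clump))
```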
Pathway analysis. The assignment of Gene Ontology (GO) terms to human genes was obtained from the "gene2go" file, downloaded from NCBI on March 11th 2020. "Parent" GO terms were assigned to genes using the ontology file downloaded from the Gene Ontology website on the same date. GO terms were assigned to genes based on experimental or curated evidence of a specific type, so evidence codes IEA (electronic annotation), NAS (non-traceable author statement) and RCA (inferred from reviewed computational analysis) were excluded. Pathways were downloaded from the Reactome website on April 26th 2020. Biocarta, KEGG and Pathway Interaction Database (PID) pathways were downloaded from v7.1 (March 2020) of the Molecular Signatures Database. Analysis was restricted to GO terms containing between 10 and 2000 genes. No size restrictions were placed on the other gene sets, since there were many fewer of them. This resulted in a total of 10,271 gene sets for analysis. Gene set enrichment analyses were performed in MAGMA 49, correcting for the number of variants in each gene, linkage disequilibrium (LD) between variants and LD between genes. LD was computed in the EADB-TOPMed dataset using high quality (probability ≥ 0.9) imputed genotypes. The measure of pathway enrichment is the MAGMA "competitive" test (where the association statistic for genes in the pathway is compared to those of all other protein-coding genes), as recommended by De Leeuw et al. 50. We used the "mean" test statistic, which uses the sum of -log(variant P value) across all genes as the association statistic for genes. The primary analysis assigned variants to genes if they lie within the gene boundaries, but a secondary analysis used a window of 35 kb upstream and 10 kb downstream to assign variants to genes, as in Kunkle et al. 6. The primary analysis used all variants with imputation quality above 0.8. We used q-values 51 to account for multiple testing throughout this report.
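The two variant-to-gene assignment rules (gene boundaries only, or a window of 35 kb upstream and 10 kb downstream) can be illustrated with a small Python helper. This is a sketch and not MAGMA's annotation code: the function name `variant_in_gene_window` is hypothetical, and treating "upstream" as strand-dependent with inclusive boundaries is an assumption.

```python
def variant_in_gene_window(pos, gene_start, gene_end, strand,
                           upstream=35_000, downstream=10_000):
    """True if a variant position falls within the gene plus its flanking window.

    gene_start <= gene_end are genomic coordinates; strand is "+" or "-".
    For a gene on the minus strand, upstream lies beyond gene_end.
    Setting upstream=0 and downstream=0 reproduces the gene-boundary rule.
    """
    if strand == "+":
        low, high = gene_start - upstream, gene_end + downstream
    elif strand == "-":
        low, high = gene_start - downstream, gene_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return low <= pos <= high


# Toy gene spanning 1,000,000-1,050,000 on the plus strand.
print(variant_in_gene_window(970_000, 1_000_000, 1_050_000, "+"))
print(variant_in_gene_window(970_000, 1_000_000, 1_050_000, "+", upstream=0, downstream=0))
```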
QTLs/TWAS/MetaMeth. In order to prioritize candidate genes in the new loci, we employed several approaches: (i) expression quantitative trait loci (eQTLs) and colocalization (eQTL coloc) analyses combined with expression transcriptome-wide association studies (eTWAS) in AD-relevant brain regions; (ii) splicing quantitative trait loci (sQTL) and colocalization (sQTL coloc) analyses combined with splicing transcriptome-wide association studies (sTWAS) in AD-relevant brain regions; (iii) genetic-driven methylation as a biological mediator of genetic signals in blood (MetaMeth). In our regions of interest, we systematically checked whether a gene had a significant e/sQTL, colocalization, e/sTWAS and/or MetaMeth signal within a region of 1 Mb around the lead variant. In addition to the "nearest" genes from the lead variant, we kept for further analyses those exhibiting such AD-related modulations. We then added several additional approaches: (i) data from a genome-wide, high-content siRNA screening approach to assess the functional impact of gene underexpression on APP metabolism 20; (ii) methylation QTL (mQTL) and histone acetylation QTL (haQTL) effects of the lead variants in DLPFC 21; and (iii) additional eQTL effects of these variants in monocytes and macrophages 22-27. A full description of how the genes were prioritized is reported in the Supplementary Information (see also Supplementary Tables 12-21 and Supplementary Fig. 26-34).
Cell type expression. Assignment of newly identified AD risk genes to specific cell classes of the adult brain was performed as previously described 52. Briefly, middle temporal gyrus (MTG) single-nucleus transcriptomes (15,928 total nuclei derived from 8 human tissue donors ranging in age from 24 to 66 years) were used to annotate and select 6 main cell classes using Seurat 3.1.1 53: Glutamatergic Neurons, GABAergic Neurons, Astrocytes, Oligodendrocytes, Microglia and Endothelial cells.

PRS analysis.
Twelve longitudinal MCI cohorts were included in the analysis and are fully described in the Supplementary Information and Supplementary Table 23. PRS were calculated as previously described 34. Briefly, we considered 60 variants with genome-wide significant evidence of association with AD in our study (Figure 1, Tables 1 and 2) with a MAF ≥ 0.05 in our case-control study. Variants were directly genotyped or imputed (R² ≥ 0.3). We did not include any APOE variants in the PRS. The PRS was calculated as the weighted average of the number of risk-increasing alleles for each variant. Weights were based on the respective log(OR) obtained in Stage II, since no samples in this stage were included in the MCI study. The PRS was then multiplied by 60, i.e. the number of included variants. Thus, the hazard ratio (HR) corresponds to carrying one additional average risk allele. All PCs used were generated per cohort, using the same variants that were used for the case/control study PCA. The number of APOE-e4 alleles was obtained based on direct genotyping or, if missing, based on genotypes derived from the TOPMed imputations. The association of the PRS with risk of progression to dementia in patients with MCI was assessed using Cox proportional-hazards regression analysis. First, the effect was analyzed on progression to all-cause dementia (i.e. regardless of clinical dementia subtype). Next, the analysis was focused on MCI patients converting to AD dementia. To this end, all converters to non-AD dementia were excluded from the analysis sample. Two cohorts (HBA, SAN) were excluded due to missing information on the clinical dementia subtype at this stage. Finally, to assess whether the exclusion of non-AD dementia converters affected our results, the analysis was repeated by coding non-AD converters as censored cases.
Each Cox-regression analysis was first performed unadjusted for covariates and then repeated, adjusted for age, sex and the first four principal components to correct for potential population stratification. Furthermore, analyses were additionally controlled for the number of APOE-e4 alleles (assuming an additive effect) to assess the independence of the PRS effect from APOE. Moreover, the interaction between the PRS and APOE-e4 was tested.
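The PRS construction described in this section can be sketched in Python as follows. This is one reading of the text and not the authors' code: the weighted average is taken as the sum of weight × risk-allele dosage divided by the sum of weights, dosages are assumed to be on a 0-2 scale, and the function name and inputs are hypothetical.

```python
def polygenic_risk_score(dosages, log_odds_ratios):
    """Weighted average of risk-allele dosages, rescaled by the number of variants.

    dosages: per-variant dosage (0-2) of the effect allele for one individual.
    log_odds_ratios: per-variant log(OR) for the same effect allele.
    Variants with a negative log(OR) are flipped so that every variant counts
    its risk-increasing allele.
    """
    if len(dosages) != len(log_odds_ratios):
        raise ValueError("dosages and log_odds_ratios must have the same length")
    weighted_sum = 0.0
    weight_total = 0.0
    for dosage, beta in zip(dosages, log_odds_ratios):
        if beta < 0:  # orient to the risk-increasing allele
            dosage, beta = 2.0 - dosage, -beta
        weighted_sum += beta * dosage
        weight_total += beta
    if weight_total == 0:
        raise ValueError("at least one non-zero weight is required")
    # Multiplying by the number of variants expresses the score in units of
    # average risk alleles.
    return weighted_sum / weight_total * len(dosages)


# Toy individual with three variants (the study used 60).
print(polygenic_risk_score([1.0, 2.0, 0.3], [0.10, -0.05, 0.20]))
```

Because of the rescaling, a one-unit difference in the score corresponds to one additional average risk allele, which is the unit in which the hazard ratios are expressed.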

URLs:
General de Evaluación and the Fondo Europeo de Desarrollo Regional (FEDER-'Una manera de hacer Europa'). Some control samples and data from patients included in this study were provided in part by the National DNA Bank Carlos III (www.bancoadn.org, University of Salamanca, Spain) and Hospital Universitario Virgen de Valme (Sevilla, Spain); they were processed following standard operating procedures with the appropriate approval of the Ethical and Scientific Committee. The present work has been performed as part of the doctoral program of I. de Rojas at the Universitat de Barcelona (Barcelona, Spain). EADI. This work has been developed and supported by the LABEX (laboratory of excellence program investment for the future) DISTALZ grant (Development of Innovative Strategies for a Transdisciplinary approach to ALZheimer's disease) including funding from MEL (Metropole européenne de Lille), ERDF (European Regional Development Fund) and Conseil Régional Nord Pas de Calais.
This work was supported by INSERM, the National Foundation for Alzheimer's disease and related disorders, the Institut Pasteur de Lille and the Centre National de Recherche en Génomique Humaine, CEA, the JPND PERADES, the Laboratory of Excellence GENMED (Medical Genomics) grant no. ANR-10-LABX-0013 managed by the National Research Agency (ANR) part of the Investment for the Future program, and the FP7 AgedBrainSysBio. The Three-City Study was performed as part of collaboration between the Institut National de la Santé et de la Recherche Médicale (Inserm), the Victor Segalen Bordeaux II University and Sanofi-Synthélabo. The Fondation pour la Recherche Médicale funded the preparation and initiation of the study. The 3C Study was also funded by the Caisse Nationale Maladie des Travailleurs Salariés, Direction Générale de la Santé, MGEN, Institut de la Longévité, Agence Française de Sécurité Sanitaire des Produits de Santé, the Aquitaine and Bourgogne Regional Councils, Agence Nationale de la Recherche, ANR supported the COGINUT and COVADIS projects.
German Federal Ministry for Education and Research grants 01 GI 0710, 01 GI 0712, 01 GI 0713, 01 GI 0714, 01 GI 0715, 01 GI 0716, 01 GI 0717. Genotyping of the Bonn case-control sample was funded by the German centre for Neurodegenerative Diseases (DZNE), Germany. The GERAD Consortium also used samples ascertained by the NIMH AD Genetics Initiative.
Loci with a genome-wide significant signal are annotated (known loci in black and new loci in red). Variants with P value below 1×10⁻⁹⁵ are not shown.
The red dotted line represents the genome-wide significance level (P value = 5×10⁻⁸), while the black dotted line represents the suggestive significance level (P value = 1×10⁻⁵).
(right panel). In order to prioritize candidate genes in the new loci, we considered the "nearest" protein-coding gene from the lead variant and the genes exhibiting AD-related modulations within a region of 1 Mb around the lead variant. The average expression of each gene expressed by at least 10% of cells (pct.exp > 0.1) was rescaled from 0 to 2, allowing the identification of genes expressed by unique or multiple cell classes. Brown squares show a significant hit for the respective column, dark green squares indicate the most probable candidate genes, bright green squares indicate that several genes were retained in the same locus, and silver squares indicate a prioritized gene in an independent known locus. The columns demonstrating lead variant m/haQTL effects in DLPFC and lead variant eQTL effects in monocytes and macrophages are annotated for the type of association (mQTL: methylation QTL, haQTL: histone acetylation QTL, mon: monocyte eQTL, mac: macrophage eQTL), and the superscript indicates that the associated feature is annotated for both genes. Arrows indicate the level of expression of the candidate genes (full lines for the most probable genes and dotted lines for genes with weaker evidence).
medRxiv preprint doi: https://doi.org/10.1101/2020.10.01.20200659; this version posted October 4, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.