## Introduction

Late-onset sporadic Alzheimer’s disease (AD) is a progressive neurodegenerative disorder accounting for 50–70% of all dementia cases in the elderly population1. Amyloid β-peptide (Aβ) is the primary component of the neuritic plaques found in the brains of AD patients, and multiple mutations that promote Aβ production, in the APP gene and the related genes PSEN1 and PSEN2, have been identified in familial (early-onset) AD2,3,4,5,6. These observations support a causal role of Aβ deposition in the etiology of AD. Familial AD is, however, much rarer than sporadic AD, which is highly prevalent after age 65. Recent genome-wide association studies (GWAS) have identified a large number of genetic variants associated with the risk of late-onset AD7,8,9,10,11,12,13, many of which are located in genes predominantly or exclusively expressed in microglia (e.g., TREM2). These findings suggest that microglia are involved in the pathology of AD.

Despite recent progress in understanding the biological mechanisms underlying AD, the cellular and molecular functions of most common variants discovered in GWAS, including those in APOE, and their causal roles in late-onset AD remain unclear. Functional links between most of these AD-related loci and genes have yet to be determined, although the effects of some microglia-related single nucleotide polymorphisms (SNPs), e.g., in CD33 and the MS4A gene cluster, have been shown to be mediated through TREM2 (refs. 14,15). The functional mechanisms of TREM2 in Aβ uptake by microglia are also complex, and contradictory biological consequences have been observed in mouse models (see, e.g., ref. 16 for a review). Moreover, the APOE variant and nine other top SNPs together account for only a small portion (5%) of the variation in age-of-onset17, suggesting that additional, as yet unidentified genetic mechanisms contribute to this complex disease. We expect that the discovery of additional AD-associated genetic variants will provide further insight into AD pathology.

In this study, we performed an exome-wide association analysis of age-of-onset of AD, in which most genetic variants are rare or low frequency, using an Alzheimer’s Disease Sequencing Project (ADSP) sample of 10,216 subjects in the discovery phase. Rare coding variants often show larger effect sizes, and their biological consequences are easier to interpret, but their association analysis is limited by insufficient statistical power. Although exome-wide association with AD has recently been explored using AD status18,19,20, our rationale is that more AD-related rare variants can be identified by analyzing age-of-onset with a Cox model, given evidence from a previous study of its potential advantage in statistical power21. We attempted to replicate significant findings in five other studies, giving a meta-analysis sample size of about 20,000 subjects. To understand the biological consequences of the identified SNPs, we explored their influence on regulatory activity and gene expression at the tissue and single-cell levels.

We further performed a separate exome-wide association analysis of the age-of-onset of AD excluding APOE ε4 carriers. The overarching goal was to identify novel variants contributing to AD independently of the APOE ε4 allele, the strongest single genetic risk factor for AD. Despite a quarter century of research on the function of the APOE gene22, the primary biological role of this gene in AD pathogenesis remains elusive, as the gene and its protein are probably involved in many pathways related to Aβ deposition, Aβ clearance, tau pathology, and neuroinflammation23. Our analysis is designed to provide further insight into AD-related APOE biology.

## Results

### Exome-wide analysis of age-of-onset of AD in the discovery phase

In the first analysis (using all subjects without adjustment for APOE ε4), we detected four independent signals passing the exome-wide threshold (p < 5E−07) (Fig. 1A, Table S2, and Model 1). The most significant SNP was the APOE ε4-coding variant rs429358, with a hazard ratio (HR) of 3.32 (p = 4.39E−497). This p-value is far more significant than that reported in the largest meta-analysis to date based on AD status (p = 5.79E−276)10. This result confirms previous findings25,26,27 that APOE ε4 is not only associated with AD status but also substantially lowers the age at onset (Fig. 2A). The three signals outside the APOE region were rs75932628 (the R47H mutation) in TREM2 (HR = 2.76, p = 8.16E−17), rs7982 in CLU (HR = 0.890, p = 1.1E−07), and rs2405442 in PILRA (HR = 0.879, p = 6.35E−08) (Fig. 1A, Table S2, and Model 1). The protective association of the missense variant rs7982 in CLU was not reported in the previous study of AD status using the same ADSP sample18. Carriers of the minor allele of rs7982 had consistently lower hazards across a wide age interval (Fig. 2B). Although the R47H mutation in TREM2 and rs2405442 in PILRA were identified in the previous analysis18, our analysis achieved greater significance for the R47H mutation (p = 8.16E−16 vs. 4.8E−12). In addition, well-known AD-associated SNPs were among the top hits, including rs12453 in MS4A6A (p = 1.52E−06), rs2296160 in CR1 (p = 6.50E−06), and rs592297 in PICALM (p = 5.26E−05) (Table S2 and Model 1).

In the second analysis (using all subjects with adjustment for APOE ε4), we identified six independent SNPs (p < 5E−07) (Fig. 1B, Table S2, and Model 2), including the three aforementioned variants in TREM2, CLU, and PILRA. The three additional variants were rs144292455 in TACR3 on 4q24 (HR = 5.15, p = 2.16E−07, minor allele count (MAC) = 17), rs111033333 in USH2A on 1q41 (HR = 4.65, p = 1.99E−07, MAC = 19), and rs199533 in NSF on 17q21.31 (HR = 0.87, p = 1.57E−07, minor allele frequency (MAF) = 20.2%). rs199533 in NSF was previously reported in ref. 18 but did not reach genome-wide significance in a follow-up meta-analysis incorporating replication studies18. The other two variants are novel. This analysis also identified two variants in the CST9 and CDKL1 genes at the suggestive significance level (p < 5E−06) (Table 1).

In the third analysis (using only APOE ε4 non-carriers), we identified three independent significant SNPs (p < 5E−07) (Fig. 1C, Table S2, and Model 3), including the R47H mutation in TREM2 (HR = 2.99, p = 1.11E−14) and rs111033333 in USH2A (HR = 5.13, p = 1.70E−08), which was also found in the second analysis. The novel SNP was the rare variant rs56201815 in ERN1 at the 17q23.3 locus (HR = 4.22, p = 7.99E−08, MAC = 29). The HR of the minor allele of this SNP was large and comparable to that of APOE ε4. This is not surprising, because rare coding variants tend to have larger effect sizes, and the MAF of this SNP in the ADSP sample is merely ~0.13%, much lower than that of the R47H mutation in TREM2.

The p-values of the newly identified SNPs from the Cox models were more significant, particularly for the rare variants, than those from a logistic model using the same ADSP sample and covariates (Fig. 1D), which may explain why these SNPs were not detected in the previous study. We also compared the p-values of well-established AD-related coding variants in the ADSP WES data between the two models and found that the Cox model produced more significant p-values for almost all SNPs except the two SNPs in MS4A6A (Fig. 1D).

### Replication analyses confirm SNPs in ERN1 and the MAPT region

In the analysis of APOE ε4 non-carriers, three SNPs (rs56201815, rs12373123, and rs199533) had exome-wide significant meta-analysis p-values (p < 5E−07) that were more significant than those from the ADSP sample alone. The associations of rs111033333 in USH2A and rs79782048 in NOTCH1 remained exome-wide significant. Replication of these two rare variants was, however, less robust, because at most one minor allele carrier was observed in most replication cohorts, so their meta-analysis p-values were driven predominantly by the signal from the discovery phase. The novel AD-associated SNP rs56201815 (meta-analysis p = 2.35E−12) is a synonymous variant in ERN1. rs12373123, a missense variant of SPPL2C (Table 1), is located in a large LD block spanning the MAPT region and is in complete LD with multiple synonymous, nonsense, or missense variants in CRHR1 and MAPT. In APOE ε4 non-carriers, the hazards of AD were consistently lower in carriers of the minor allele of rs12373123 after age 70 (Fig. 2D). Among APOE ε4 non-carriers, rs12373123 had a more significant p-value (meta-analysis p = 6.67E−08) than the previously reported SNPs rs2732703 (meta-analysis p = 2.74E−06) and rs199533 (meta-analysis p = 1.11E−07), whereas rs199533 was more significant in the full sample. The minor allele of rs12373123 was consistently associated with decreased risk of AD in all studies except LOADFS.

### rs56201815 is a synonymous variant and potential brain-specific expression quantitative trait locus (eQTL) of ERN1

As rs56201815 in ERN1 was the most significant SNP identified in the discovery and replication phases, we next examined its biological and regulatory functions. rs56201815 is a synonymous coding variant, so it does not alter the amino acid sequence of ERN1. However, rs56201815 is located in a CTCF binding site, an open chromatin region in multiple cell types, and an evolutionarily conserved region (Fig. 3B). Moreover, a recent mouse study reported that inhibition of ERN1 expression reduces amyloid precursor protein (APP) levels in cortical and hippocampal areas and restores the learning and memory capacity of AD mice37. We therefore hypothesized that rs56201815 is a cis-eQTL of ERN1 in the brain and that its detrimental effect on AD is mediated by upregulation of ERN1 expression. To test this hypothesis, we examined the effect of rs56201815 on ERN1 expression using RNA-seq data from ROSMAP and GTEx and microarray data from ADNI.

We collected 2213 RNA-seq samples from 838 subjects in the ROSMAP cohort in three brain regions, the dorsolateral prefrontal cortex (PFC), posterior cingulate cortex (PCC), and anterior caudate nucleus; four of these subjects were rs56201815-G carriers. Our differential expression (DE) analysis revealed that the minor allele of rs56201815 was associated with increased expression of ERN1 in PCC (log(fold-change (FC)) = 0.204, p = 0.0285) (Fig. 3C). We then analyzed a WGS dataset of 838 healthy subjects from the GTEx project. The WGS data included two rs56201815-G carriers, one of whom had RNA-seq data in nine brain tissues: the amygdala, anterior cingulate cortex (ACC), hypothalamus, caudate, nucleus accumbens, putamen, cerebellar hemisphere, cerebellum, and spinal cord. Despite the small sample size, our DE analyses indicated that rs56201815 was a potential eQTL of ERN1 in several cerebral regions, particularly the nucleus accumbens (log(FC) = 1.28, p = 1E−4) and the putamen (log(FC) = 0.734, p = 0.05) (Fig. 3D). Consistent with the ROSMAP result in PCC, rs56201815-G was positively, albeit not significantly, correlated with ERN1 expression in ACC (log(FC) = 0.35, p = 0.437), yielding a significant meta-analysis p-value of 0.0213 for the cingulate cortex. In almost all cerebral regions, the rs56201815-G carrier had consistently higher ERN1 expression than the average (Figs. 3D and S2A).

We then investigated the effects of rs56201815 on ERN1 expression in other brain regions and in four non-brain tissues: the sigmoid colon, lung, spleen, and whole blood. The RNA-seq data in the sigmoid colon included two rs56201815-G carriers, and one carrier was available in each of the other tissues. The DE results showed no evidence of an association between rs56201815 and ERN1 expression in any of these tissues (Fig. S2A). Because the number of rs56201815-G carriers in the GTEx project is small, we further analyzed peripheral whole blood samples from the ADNI project, comprising 733 subjects with both WGS data and microarray gene expression data, three of whom were rs56201815-G carriers with high sequencing quality. DE analyses of two ERN1 probes showed that rs56201815-G was not associated with expression measured by either probe (Fig. S2B).

These results suggest that rs56201815 is associated with elevated expression of ERN1 in cerebral regions (most prominently in PCC and several regions of the basal ganglia), but probably not in other tissues. To examine whether its regulatory effects in the brain are mediated by changes in chromatin activity, we further carried out association analyses of epigenetic markers, including DNA methylation and histone modifications, in PFC. We collected an Illumina 450k array DNA methylation dataset of 721 subjects (four rs56201815-G carriers) from a ROSMAP sample38,39. Among 11 probes located in the ERN1 region, we found no significant association after adjustment for multiple testing (Table S3). The most significant probe (chr17:62134117, p = 0.012), which was also the probe closest to rs56201815, was located in an enhancer. For histone modifications, we interrogated histone H3 lysine 9 acetylation (H3K9ac) peaks using a ChIP-seq dataset of 632 subjects (four rs56201815-G carriers) from a ROSMAP sample38,40. We conducted differential analyses of 26,384 broad peaks adjusted for the fraction of reads in peaks (FRiPs), GC bias, and ten remove-unwanted-variation (RUV) components. No significant association was found among the nine broad peaks within a ±200 kb flanking region of ERN1 after adjustment for multiple testing, although eight peaks showed slightly increased intensity in the carriers (Table S4). The most significant association was with a peak in an enhancer at chr17:62,337,374-62,342,372 (p = 0.043).

### rs12373123 is a neural cell type-specific eQTL of MAPT and GRN

Previous studies have shown that rs12373123 is a cis-eQTL of multiple nearby genes (e.g., MAPT, CRHR1, and LRRC37A) in multiple tissues including the brain28,41,42,43, and that it shows chromatin interactions with these genes (Fig. 4A). However, it is not clear which cell types and genes mediate its effect on AD. We therefore explored the regulatory effects of rs12373123 at the cell-type level using a single-nucleus RNA-seq (snRNA-seq) dataset. Cell type-specific analysis can also reduce potential confounding from unobserved heterogeneity in cell-type proportions across subjects in tissue-level analyses, and can therefore produce more accurate and refined estimates. We performed cell type-specific eQTL analyses in 44 subjects with both genotype data (39 subjects from WGS and five from a SNP array) and snRNA-seq data from ~80,000 PFC cells in a ROSMAP sample. We classified cells into excitatory neurons, inhibitory neurons, astrocytes, microglia, oligodendrocytes, and oligodendrocyte progenitor cells (OPCs) based on previous clustering results44, and then aggregated cells within each cell type and each subject.

In each cell type, we interrogated 11 protein-coding genes: 10 genes within a ±500 kb flanking region and GRN, a nearby gene linked to frontotemporal lobar degeneration (FTD), a form of dementia. The cell type-specific eQTL analyses revealed that one or more copies of rs12373123-C were associated with elevated expression of ARL17B in all six brain cell types (p < 1E−11) (Fig. 4B and Table S5). rs12373123 was also an eQTL of LRRC37A2, LRRC37A3, and KANSL1 in most cell types except microglia (Fig. S3 and Table S5). The protective allele rs12373123-C was associated with elevated MAPT expression in astrocytes (p = 0.01) but showed a decreasing trend in OPCs (p = 0.09) (Fig. 4B and Table S5). We further found that rs12373123-C, particularly the homozygous protective genotype, was significantly associated with increased GRN expression in microglia (p = 3.65E−06) (Fig. 4B and Table S5); GRN protects against dementia and is important for lysosomal homeostasis in the brain45,46.

We also assessed the cell type-specific association between rs56201815 and ERN1 expression. ERN1 was ubiquitously expressed in all brain cell types, most abundantly in microglia, followed by astrocytes and OPCs. As there was only one rs56201815-G carrier among the 39 WGS subjects and, unfortunately, its total sequencing depth was much lower than that of the other subjects (~10% of the average library size), we investigated only the three most abundant cell types (excitatory neurons, astrocytes, and oligodendrocytes), in which the carrier had a library size >50,000. rs56201815-G was weakly, but not significantly, correlated with increased ERN1 expression in excitatory neurons (Fig. S4).

### Gene-set analysis identifies astrocyte, microglia, and amyloid-beta-related pathways

Because aggregating signals within a gene can increase statistical power, particularly for detecting rare coding variants, we carried out gene-based analyses using the summary statistics of all examined SNPs estimated from the ADSP sample. Gene-based analyses using MAGMA47 showed that TREM2 was the gene most significantly associated with AD in all individuals (p = 5.0E−10) and in APOE ε4 non-carriers (p = 1.62E−10) (Fig. S5A), consistent with previous results18. Indeed, all six exonic SNPs in TREM2 (rs2234256, rs2234255, rs2234253, rs142232675, rs143332484, and rs75932628) were at least nominally associated with AD (Table S2). The association was more significant in APOE ε4 non-carriers, suggesting that the effect of TREM2 on AD is independent of APOE. In addition, multiple genes in the MAPT region, including MAPT, KANSL1, NSF, and SPPL2C, were associated with the risk of AD in both analyses (Fig. S5A, B). CLU, PILRA, EXO5, and ERN1 were also among the top associated genes.

Gene-set analysis using FUMA48, based on the summary statistics from the exome-wide association analysis conditional on APOE ε4, revealed that Gene Ontology (GO) gene sets related to the regulation of astrocytes, amyloid-beta, endoplasmic reticulum (ER) stress, and the unfolded protein response (UPR) were among the top enriched gene sets associated with AD (Fig. 5A). In contrast, gene sets related to astrocyte activation, microglia migration, and lipoprotein metabolic processes were among the top sets in the analysis of APOE ε4 non-carriers (Fig. 5B). Cell-type association analysis using FUMA49 (Watanabe et al., 2019) showed that, among nine major brain cell types, microglia were associated with AD (p < 0.05) in the analysis of APOE ε4 non-carriers (Fig. 5D). No cell type was associated with AD based on the summary statistics from the association analysis conditional on APOE ε4 (Fig. 5C).

## Discussion

In this study, we interrogated the associations between 108,509 exome-wide SNPs and age-of-onset of late-onset AD using Cox models with a sample consisting of ~20,000 AD patients and controls. We also attempted to identify SNPs contributing to earlier onset in APOE ε4 non-carriers alone. Most of these SNPs are rare variants. Our results not only confirm previously reported AD-related SNPs with much higher significance but also reveal novel genetic variants associated with age-of-onset of AD, particularly in APOE ε4 non-carriers.

Aging is the most important risk factor for late-onset AD, suggesting that processes associated with aging might be involved in, and even required for, the pathogenesis of AD. The UPR is one of the mechanisms disrupted during aging, resulting in increased susceptibility to ER stress and the accumulation of unfolded proteins53. Previous studies have shown that aging leads to deficits in the systems that defend against unfolded proteins in the rat hippocampus54. Persistent ER stress in the central nervous system during aging can initiate neuronal apoptosis and trigger the innate immune response in microglia55,56. Combined with the fact that many AD-related genes identified by GWAS are expressed exclusively in microglia, our findings suggest that the interaction between the UPR and the innate immune system might play a critical role in the biological mechanisms underlying AD.

Like rs56201815, the variant rs12373123 in the MAPT region was identified in APOE ε4 non-carriers. The minor allele of rs12373123 was associated with reduced susceptibility to AD in ADSP, ROSMAP, CHS, and GenADA. This SNP is located in an LD block spanning >400 kb and is in high LD with a large number of SNPs, including multiple missense variants in MAPT, SPPL2C, CRHR1, and KANSL1. Previous GWAS have shown that rs12373123 and two nearby missense SNPs in complete LD with it (rs12185268 and rs12373124) exhibit pleiotropic associations with numerous diseases and traits, including intracranial volume57, corticobasal degeneration58, Parkinson’s disease (PD)59,60,61,62, primary biliary cirrhosis63, red blood cell count64, and androgenetic alopecia65. On the other hand, the major allele, which confers greater predisposition to degenerative diseases, is significantly associated with increased bone mineral density66,67. Because SNPs contributing to age-related degenerative diseases are generally not subject to evolutionary selection68,69, the major allele may have been favored by selection because of its beneficial effect on bone mineral density. Our age-of-onset analyses indicate that this pleiotropic region might also be implicated in late-onset AD, especially in APOE ε4 non-carriers. Our cell type-specific analyses reveal that rs12373123 is a cis-eQTL, in different brain cell types, of multiple critical genes implicated in PD and FTD (e.g., MAPT and GRN), which may help explain the regulatory mechanisms underlying its pleiotropy. Given the involvement of tau protein in the etiology of AD and PD, the effect of rs12373123 on these diseases might be mediated by MAPT. Indeed, rs12373123 is in high LD with multiple missense SNPs in MAPT (e.g., rs62056781 and rs74496580), and we found in the snRNA-seq data that rs12373123 is also an eQTL of MAPT in astrocytes. Our findings also suggest that the effects of rs12373123 may be mediated by increased expression in microglia of GRN, a key gene protecting against FTD.

Our results also demonstrate the statistical power advantage of using a Cox model for age-of-onset traits over a logistic model for binary outcomes in the study of AD. The power gain in terms of p-values is evident for many well-known AD-related SNPs, e.g., in TREM2 and CLU, which all achieved more significant p-values than in a previous study using the same cohort18. Despite a smaller sample size, the Cox-model p-value for APOE ε4, the recognized true positive signal, is much more significant than that from a recent large-scale meta-analysis of AD status10 and from a previous analysis using a linear model of log-transformed age-of-onset26. Moreover, our age-of-onset analysis showed promising results for identifying rare variants compared with logistic regression. An advantage of the Cox model over Poisson or logistic regression is that it implicitly accounts for age-varying hazards, a characteristic of many age-related diseases such as AD70, because the baseline hazard is left unspecified and cancels out of the partial likelihood. Our results in AD suggest that Cox models may also offer a power advantage for exploring rare variant associations in other age-related diseases.
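To make the point about age-varying hazards concrete, the sketch below fits a single-covariate Cox model (e.g., allele dosage, with age as the time scale) by Newton-Raphson on the partial likelihood. Each event contributes only the ratio of the event subject's risk score to the summed risk scores of everyone still at risk, so the baseline hazard cancels and any monotone re-scaling of age leaves the estimate unchanged. This is a simplified illustration assuming no tied event times, no random effects, and no additional covariates; it is not the coxmeg or survival implementation used in this study, and the function name `cox_fit_single` and the toy data are hypothetical.

```python
import math

def cox_fit_single(times, events, x, n_iter=50, tol=1e-10):
    """Cox proportional-hazards fit with one covariate via Newton-Raphson.

    times  : event or censoring time (here, age) for each subject
    events : 1 if the event (AD onset) was observed, 0 if censored
    x      : covariate value (e.g., minor-allele dosage 0/1/2)
    Assumes no tied event times. Returns (log_hr, se, two_sided_wald_p).
    """
    # Sort by decreasing time: the risk set at each subject's time is then
    # every subject processed so far (all with time >= that time).
    order = sorted(range(len(times)), key=lambda i: -times[i])
    d = [events[i] for i in order]
    z = [x[i] for i in order]
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        s0 = s1 = s2 = 0.0           # risk-set sums of w, w*z, w*z^2
        score, info = 0.0, 0.0
        for di, zi in zip(d, z):
            w = math.exp(beta * zi)  # relative risk; the baseline hazard never appears
            s0 += w
            s1 += w * zi
            s2 += w * zi * zi
            if di:
                mean = s1 / s0
                score += zi - mean              # gradient of the log partial likelihood
                info += s2 / s0 - mean * mean   # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p

# Toy data (hypothetical): carriers (dosage 1) tend to have earlier onset.
ages = [70, 73, 77, 68, 69, 80, 78, 76, 71, 79]
onset = [1, 1, 1, 1, 1, 0, 1, 1, 1, 0]
dosage = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
log_hr, se, p = cox_fit_single(ages, onset, dosage)
# Only the ordering of times enters the fit, so squaring age changes nothing.
log_hr_sq, _, _ = cox_fit_single([a * a for a in ages], onset, dosage)
print(round(math.exp(log_hr), 3), round(p, 4), log_hr == log_hr_sq)
```

The invariance check at the end is the practical meaning of "implicitly accounts for age-varying hazards": whatever shape the hazard has over age, it does not need to be modeled.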

Although our identified SNPs were validated in multiple independent cohorts, we acknowledge some limitations. The definitions and diagnostic criteria for AD can vary across these cohorts. AD shares clinical and biological features with other common neurodegenerative diseases such as FTD, which complicates its clinical diagnosis. Also, one of our findings, rs56201815 in ERN1, is a rare variant (MAF ≈ 0.13%) with slightly lower imputation quality than common variants. Although this SNP showed robust associations in our meta-analyses, the sample sizes of our WGS replication cohorts are small for rare variants, so further studies using large-scale WGS or WES data are needed to validate this SNP and the other candidate SNPs identified in the discovery phase.

In conclusion, we identified two novel SNPs, in ERN1 and SPPL2C/MAPT-AS1, that are strongly associated with the age-of-onset of AD, and explored their regulatory consequences at the tissue and single-cell levels in the brain. These findings support the hypothesis that the UPR to ER stress and tau protein are potentially involved in the pathological pathway of AD, contributing to the understanding of the biological mechanisms underlying AD. Our findings can guide follow-up studies and provide further insight into the molecular mechanisms and roles of the relevant genes in AD.

## Methods

### Exome-wide age-of-onset association analysis

The association analyses of the age-of-onset of AD in the discovery phase of ADSP were conducted using a Cox mixed-effects model implemented in the coxmeg R package21, which accounts for the clustering structure using a genetic relationship matrix (GRM). A dense GRM was first estimated from the original WES data based on the GCTA model77 implemented in the SNPRelate R package78. In the discovery phase of ADSP, we then built a sparse GRM by setting any entry below 0.03 to zero. We evaluated the top ten PCs (PC1 to PC10) calculated from the dense GRM and included only the significant ones (PC2, PC8, and PC10) in the analyses. We first estimated a variance component in the null model, which was then used to estimate HRs and p-values for all SNPs. We performed two analyses: (a) including all subjects, with the three PCs, sex, and the number of copies of APOE ε4 as covariates, and (b) including only APOE ε4 non-carriers, with the three PCs and sex as covariates. The estimated variance component was zero in analysis (b), suggesting no evidence of random effects, so we used a standard Cox model instead. The threshold for declaring significant associations was 0.05 divided by the total number of tested SNPs. For comparison with the analysis of AD status, we also fitted a logistic regression using the glm R function with the same covariates and sample.
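Two numerical steps in this paragraph can be sketched in a few lines: zeroing GRM entries below 0.03 to obtain a sparse matrix, and the Bonferroni threshold 0.05/m. Assuming m = 108,509 (the SNP count reported in the Discussion), 0.05/108,509 ≈ 4.61E−07, which, rounded to one significant figure, matches the exome-wide threshold of 5E−07 used in the Results. This is a hypothetical NumPy/SciPy illustration (the function names `sparsify_grm` and `bonferroni_threshold` are ours), and always keeping the diagonal is our assumption; the study itself used the SNPRelate and coxmeg R packages.

```python
import numpy as np
from scipy import sparse

def sparsify_grm(grm, cutoff=0.03):
    """Set GRM entries below `cutoff` (including negative values) to zero, keeping the diagonal."""
    g = np.array(grm, dtype=float, copy=True)
    diag = np.diag(g).copy()
    g[g < cutoff] = 0.0
    np.fill_diagonal(g, diag)        # self-relatedness is always retained
    return sparse.csr_matrix(g)      # CSR stores only the non-zero entries

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests

# Hypothetical 3-subject GRM: subjects 0 and 1 are related; subject 2 is not.
grm = np.array([[1.00, 0.48, 0.01],
                [0.48, 1.02, -0.02],
                [0.01, -0.02, 0.99]])
sp = sparsify_grm(grm)
print(sp.nnz)                                   # 5: three diagonal entries plus the related pair
print(f"{bonferroni_threshold(108_509):.2e}")   # 4.61e-07
```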

We performed age-of-onset association analyses in LOADFS, CHS, ROSMAP, GenADA, and the ADSP extension study for the top SNPs passing the suggestive threshold (p < 5E−06) in the discovery phase. The same model and estimation procedures as in ADSP were used in LOADFS, which is also a family-based cohort; in LOADFS, the GRM was estimated from the genotype array data. In the other four cohorts (CHS, ROSMAP, GenADA, and the ADSP extension study), which consist of unrelated subjects, the association analyses were conducted using a Cox model implemented in the survival R package30. Sex and the number of copies of APOE ε4 were also included as covariates. Meta-analysis effect sizes and standard errors were computed from the summary statistics of all six studies using the following fixed-effects model:

$$\hat{\beta} = \frac{\sum_i \beta_i w_i}{\sum_i w_i}, \qquad \mathrm{sd}(\hat{\beta}) = \frac{1}{\sqrt{\sum_i w_i}},$$

where $\beta_i$ is the estimate for study $i$ and $w_i = 1/\mathrm{se}(\beta_i)^2$ is its inverse-variance weight.
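The fixed-effects formulas in the preceding paragraph can be sketched as follows, assuming inverse-variance weights w_i = 1/se_i² and per-study log hazard ratios as inputs. The two-sided p-value uses a normal (Wald) approximation, which is our assumption; the function name and the example inputs are hypothetical, not study results.

```python
import math

def fixed_effects_meta(log_hrs, ses):
    """Inverse-variance fixed-effects meta-analysis of per-study log hazard ratios."""
    weights = [1.0 / (se * se) for se in ses]              # w_i = 1 / se_i^2
    w_sum = sum(weights)
    beta = sum(b * w for b, w in zip(log_hrs, weights)) / w_sum
    se = 1.0 / math.sqrt(w_sum)                            # sd(beta) = 1 / sqrt(sum_i w_i)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))                 # two-sided normal p-value
    return beta, se, p

# Hypothetical inputs for two studies: (log-HR, standard error).
beta, se, p = fixed_effects_meta([0.5, 0.3], [0.2, 0.2])
print(round(beta, 3), round(se, 4), round(math.exp(beta), 3))  # 0.4 0.1414 1.492
```

With equal standard errors the pooled estimate is the simple average, and the pooled standard error shrinks by a factor of √2, which is why adding replication cohorts sharpens the meta-analysis p-value.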

### Gene-based association analysis

The gene-based analysis was performed using the summary statistics obtained from the age-of-onset association analyses. Only SNPs with MAC >10 and a missing rate <2% were included. Each SNP was first annotated to a gene by its SNP ID using a gene location file obtained from the MAGMA website (https://ctg.cncr.nl/software/magma), and only SNPs within a gene body were retained. Gene-based p-values were then computed using MAGMA (v1.08b) with the SNP-wise mean model47, with LD between SNPs estimated from the raw WES data in ADSP.
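A simplified picture of a SNP-wise mean gene test: the statistic is the mean of the squared SNP z-scores in a gene, and under the null the z-scores are correlated through the LD matrix R. The sketch below obtains a p-value with a Satterthwaite-style moment-matching approximation, matching the null mean tr(R) and variance 2·tr(R²) of the sum of squares to a scaled chi-square. This approximation is our stand-in for illustration; MAGMA's own p-value computation may differ, and the function name and inputs are hypothetical.

```python
import numpy as np
from scipy import stats

def gene_mean_chisq_pvalue(z, ld):
    """Mean-of-chi-square gene statistic with a moment-matched null distribution.

    z  : per-SNP z-scores within one gene
    ld : SNP-by-SNP LD (correlation) matrix R for the same SNPs
    """
    z = np.asarray(z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    stat = float(np.sum(z ** 2))       # sum of SNP chi-squares (the mean differs only by 1/m)
    tr1 = float(np.trace(ld))          # E[stat] under H0 = tr(R)
    tr2 = float(np.sum(ld * ld))       # tr(R @ R) for symmetric R; Var[stat] = 2 tr(R^2)
    scale = tr2 / tr1                  # stat ~ scale * chi2(df) approximately
    df = tr1 ** 2 / tr2
    p = float(stats.chi2.sf(stat / scale, df))
    return stat / len(z), p

# Hypothetical gene with three SNPs, two of them in moderate LD.
ld = np.array([[1.0, 0.6, 0.1],
               [0.6, 1.0, 0.2],
               [0.1, 0.2, 1.0]])
mean_chisq, p = gene_mean_chisq_pvalue([2.1, 1.8, 0.4], ld)
```

Two limiting cases show why LD must be modeled: with R equal to the identity the test reduces to an ordinary chi-square with m degrees of freedom, and with two SNPs in perfect LD it collapses to the single-SNP test rather than double-counting the same signal.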

### Gene-set and cell-type association analysis

The gene-set analysis was performed for curated gene sets and GO terms using the procedure SNP2GENE in FUMA48 based on the summary statistics obtained from the age-of-onset association analyses. The 1000 Genomes Project (phase 3) for the European population was used as a reference panel in the analysis. The cell-type association analysis was also performed using FUMA79 following the SNP2GENE procedure. We selected a human brain single-cell RNA-seq dataset provided in ref. 80 as a reference for cell type-specific gene expression.

### Analysis of FDG-PET data

The longitudinal FDG-PET average intensity scores across five ROIs (left/right angular gyrus, bilateral posterior cingulate gyrus, and left/right inferior temporal gyrus) for 738 subjects in ADNI with WGS data were downloaded from the ADNI website (https://ida.loni.usc.edu). Details of sample preparation and data generation are described in refs. 33,34. The association between the average FDG-ROI score and the genotype of rs56201815 was assessed by fitting a linear mixed-effects model using the lme4 R package81, with a random effect accounting for within-subject variability and three covariates (age, sex, and diagnosis group).

### Analysis of tissue-specific RNA-seq and microarray data

BAM files of aligned reads from a total of 2213 RNA-seq samples in three brain regions (dorsolateral PFC, PCC, and anterior caudate nucleus) in the ROSMAP project were downloaded from the Synapse website (https://www.synapse.org/). Raw counts of 57,905 coding and non-coding genes were obtained using featureCounts82 with the GENCODE annotations GRCh37 (r87). Samples with an RNA integrity number (RIN) < 5 were excluded before the analysis. Low-expressed genes (those for which fewer than three individuals had counts-per-million >1) were removed before normalization. We then normalized the RNA-seq raw counts using the trimmed mean of M-values (TMM) normalization method83. In the analysis of PFC, 761 non-Hispanic Caucasian subjects (including four rs56201815-G carriers) with both gene expression data and rs56201815 genotypes from the WGS data and with RIN ≥4.5 were included. Differential eQTL analysis was performed using edgeR84,85, adjusting for RIN, age at death, sex, AD status, and RNA extraction method (polyA selection or rRNA depletion). In the analyses of PCC and anterior caudate nucleus, 371 (including three rs56201815-G carriers) and 585 (including four rs56201815-G carriers) non-Hispanic Caucasian subjects, respectively, with both genotypes and gene expression, RIN ≥4.5, and rRNA depletion were included. To minimize technical noise resulting from sample preparation, we excluded polyA selection samples (only 10% and 15% of all samples, respectively), because different RNA extraction methods have a large impact on measured expression in postmortem samples86, and the samples of all rs56201815-G carriers were generated using rRNA depletion. Differential eQTL analysis was performed using edgeR, adjusting for RIN, age at death, sex, and AD status.
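The filtering rule and TMM normalization described above can be sketched in NumPy as follows. This is a simplified re-implementation for illustration that follows the published TMM recipe: trim 30% of the log-ratios (M) and 5% of the average log-expression values (A), then take a precision-weighted mean of the remaining M values. Choosing as reference the sample whose upper-quartile proportion is closest to the mean mirrors a long-standing edgeR default but is an assumption here; edgeR provides TMM through `calcNormFactors(method = "TMM")`. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def filter_low_expression(counts, min_cpm=1.0, min_samples=3):
    """Keep genes (rows) with CPM > min_cpm in at least min_samples samples (columns)."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=0) * 1e6
    return counts[(cpm > min_cpm).sum(axis=1) >= min_samples]

def _tmm_factor(obs, ref, log_trim=0.3, sum_trim=0.05):
    """TMM factor of sample `obs` relative to sample `ref` (raw count vectors)."""
    n_o, n_r = obs.sum(), ref.sum()
    ok = (obs > 0) & (ref > 0)                        # M and A are undefined for zero counts
    o, r = obs[ok] / n_o, ref[ok] / n_r               # proportions of library size
    m = np.log2(o / r)                                # M: log-ratio
    a = 0.5 * np.log2(o * r)                          # A: average log-expression
    var = (1 - o) / (n_o * o) + (1 - r) / (n_r * r)   # approximate variance of M
    n = len(m)
    lo_m = np.floor(n * log_trim) + 1
    hi_m = n + 1 - lo_m
    lo_a = np.floor(n * sum_trim) + 1
    hi_a = n + 1 - lo_a
    rm, ra = rankdata(m), rankdata(a)                 # average ranks for ties
    keep = (rm >= lo_m) & (rm <= hi_m) & (ra >= lo_a) & (ra <= hi_a)
    return 2.0 ** (np.sum(m[keep] / var[keep]) / np.sum(1.0 / var[keep]))

def tmm_factors(counts):
    """TMM normalization factor per sample (column), rescaled to a geometric mean of 1."""
    counts = np.asarray(counts, dtype=float)
    uq = np.quantile(counts / counts.sum(axis=0), 0.75, axis=0)
    ref = int(np.argmin(np.abs(uq - uq.mean())))      # reference sample
    f = np.array([_tmm_factor(counts[:, j], counts[:, ref])
                  for j in range(counts.shape[1])])
    return f / np.exp(np.mean(np.log(f)))

# Hypothetical counts: 50 genes x 3 samples; sample 2 has one hugely inflated gene.
base = np.arange(1, 51) * 10.0
counts = np.column_stack([base, base, base.copy()])
counts[0, 2] *= 100
factors = tmm_factors(counts)
```

In this toy example the inflated gene enlarges sample 2's library size and makes every other gene look down-regulated; TMM trims that gene away and assigns sample 2 a factor below 1, correcting the composition bias that a plain library-size scaling would miss.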

The raw count data of 3252 RNA-seq samples in nine brain tissues (i.e., amygdala, ACC, hypothalamus, caudate (basal ganglia), nucleus accumbens (basal ganglia), putamen (basal ganglia), cerebellar hemisphere, cerebellum, and spinal cord (cervical c1)) and four non-brain tissues (i.e., sigmoid colon, lung, spleen, and whole blood) from the GTEx project (version 8) were downloaded from the GTEx portal (https://gtexportal.org/home/datasets). Gene-level quantification was conducted by RSEM87. All GTEx raw count data were normalized using the same pipeline as in the analysis of ROSMAP. Differential eQTL analysis was then performed using edgeR with age, sex, and RIN as adjusted covariates.

The gene expression microarray data in peripheral blood from 742 ADNI subjects were profiled using the Affymetrix Human Genome U219 Array. Raw expression values were pre-processed using the robust multiarray average normalization method. More details about sample collection and data pre-processing can be found in ref. 88. Differential gene expression analyses were performed using linear regression adjusted for RIN and plate number.
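The per-probe test described above amounts to an ordinary least-squares regression of expression on allele dosage plus covariates, with a t-test on the dosage coefficient. The sketch below is a NumPy/SciPy illustration under our assumptions: genotype coded additively (0/1/2), and categorical covariates such as plate number supplied as dummy (indicator) columns. The function name and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

def dosage_association(expr, dosage, covariates):
    """OLS of expression on dosage + covariates; returns (beta, t, two-sided p) for dosage."""
    y = np.asarray(expr, dtype=float)
    X = np.column_stack([np.ones(len(y)),                 # intercept
                         np.asarray(dosage, dtype=float),  # column 1: allele dosage
                         np.asarray(covariates, dtype=float)])
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - rank
    sigma2 = resid @ resid / dof                           # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])    # standard error of dosage effect
    t = coef[1] / se
    return coef[1], t, 2 * stats.t.sf(abs(t), dof)

# Simulated example: 300 subjects, one continuous covariate (e.g., RIN).
rng = np.random.default_rng(1)
g = rng.binomial(2, 0.3, size=300)
rin = rng.normal(7, 1, size=300)
y = 5 + 0.4 * g + 0.2 * rin + rng.normal(0, 0.5, size=300)
beta, t, p = dosage_association(y, g, rin.reshape(-1, 1))
```

With only three carriers among 733 subjects, as for rs56201815 in ADNI, the standard error of the dosage coefficient is large, so a null result in such data is weak evidence against an effect.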

### Analysis of DNA methylation data

The DNA methylation data in PFC were collected from 740 individuals in ROSMAP using the Illumina HumanMethylation450 BeadChip. Eighteen samples lying beyond ±3 standard deviations on the top three PCs were removed as outliers. We converted methylation beta-values to M-values using a logit transformation, M = log2(beta/(1 − beta)). Differential methylation analysis was carried out using linear regression adjusted for the top ten PCs.
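The two preprocessing steps above can be sketched as follows: the base-2 logit transformation from beta-values to M-values, and flagging samples whose scores on any of the top three PCs lie beyond ±3 standard deviations. Clipping beta-values away from 0 and 1 (to avoid infinite M-values) and the "any of the three PCs" rule are our assumptions; the function names and simulated data are hypothetical.

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """Base-2 logit of methylation beta-values: M = log2(beta / (1 - beta))."""
    b = np.clip(np.asarray(beta, dtype=float), eps, 1 - eps)  # avoid infinite M at 0 or 1
    return np.log2(b / (1 - b))

def pc_outliers(x, n_pcs=3, n_sd=3.0):
    """Flag samples (rows of x) lying beyond +/- n_sd SDs on any of the top n_pcs PCs."""
    xc = x - x.mean(axis=0)                           # center each probe
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]                 # PC scores of each sample
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return np.any(np.abs(z) > n_sd, axis=1)

# Simulated example: 100 samples x 20 probes, with one aberrant sample.
rng = np.random.default_rng(0)
x = beta_to_m(rng.uniform(0.2, 0.8, size=(100, 20)))
x[0] += 50.0
flags = pc_outliers(x)
```

M-values are preferred for linear regression because beta-values are bounded in [0, 1] and have compressed variance near the bounds, whereas the logit scale is approximately homoscedastic.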

### Analysis of H3K9ac ChIP-seq data

H3K9ac ChIP-seq raw count data were downloaded from the Synapse website (https://www.synapse.org/). This dataset has been described in detail in ref. 40. Briefly, the dataset comprises 26,384 H3K9ac peaks across the genome (nine in the ERN1 region) from the dorsolateral PFC of 669 subjects in the ROSMAP project, of whom 625 also had WGS genotype data for rs56201815. The raw count data were normalized using the TMM method83. Estimation of common and tagwise dispersions and the analysis of differential peaks for rs56201815 were carried out using edgeR84,85, adjusting for FRiPs and GC bias. A sensitivity analysis was performed by further adjusting for ten RUV components estimated using RUVSeq89.

### Analysis of snRNA-seq data

We collected snRNA-seq raw count data generated in ref. 44 using the 10x Genomics Cell Ranger pipeline in human PFC from 48 subjects (50% AD cases), comprising 17,926 genes profiled in 75,060 nuclei. We assigned cell identities and divided all cells into six major cell types (excitatory neurons, inhibitory neurons, astrocytes, oligodendrocytes, microglia, and OPCs) according to the previous clustering results44 using the scanpy package90. The clustering of the cells is described in more detail in ref. 44. We excluded endothelial cells and pericytes because too few cells of these types were available.

To perform cell type-specific eQTL analysis, we first merged cells in each cell type and in each subject to obtain a raw count matrix of 17,926 genes and 39 subjects (six subjects were excluded due to lack of WGS data). We then followed the preprocessing and normalization procedures in the previous eQTL analysis of the bulk RNA-seq data. Differential eQTL analyses were then performed using edgeR84,85 with age, sex, and AD status as covariates. RIN was not available for most of the subjects.
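Merging cells by cell type and subject (often called pseudobulk aggregation) can be sketched with plain NumPy arrays as below: raw counts are summed over all nuclei sharing a (cell type, subject) label, giving one genes × subjects count matrix per cell type that can then go through the same filtering, TMM normalization, and edgeR testing as bulk RNA-seq. The array-based interface and function name are hypothetical; the study itself used scanpy.

```python
import numpy as np

def pseudobulk(counts, cell_types, subjects):
    """Sum raw counts (cells x genes) over cells sharing a (cell type, subject) pair.

    Returns {cell_type: (subject_ids, genes x subjects count matrix)}.
    """
    counts = np.asarray(counts)
    cell_types = np.asarray(cell_types)
    subjects = np.asarray(subjects)
    out = {}
    for ct in np.unique(cell_types):
        in_ct = cell_types == ct
        subject_ids = np.unique(subjects[in_ct])   # subjects with >= 1 cell of this type
        matrix = np.column_stack(
            [counts[in_ct & (subjects == s)].sum(axis=0) for s in subject_ids]
        )
        out[str(ct)] = (subject_ids, matrix)
    return out

# Toy example: 4 cells x 2 genes, two cell types, two subjects.
counts = [[1, 0], [2, 1], [0, 3], [4, 4]]
result = pseudobulk(counts, ["Ast", "Ast", "Mic", "Ast"], ["s1", "s2", "s1", "s1"])
subject_ids, matrix = result["Ast"]
library_sizes = matrix.sum(axis=0)  # per-subject depth, e.g., to apply a minimum library size
```

Summing raw counts, rather than averaging normalized values, keeps the data as counts so that count-based models such as edgeR remain appropriate; the per-subject library sizes are also what a depth criterion such as the >50,000 cutoff mentioned in the Results would be applied to.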

### Functional annotation

The epigenetic and regulatory annotation of the identified SNPs and nearby SNPs in high LD with them (r² > 0.8) was performed using HaploReg v4 (ref. 91), which annotates tissue-specific epigenetic markers (H3K27ac), regulatory regions (enhancers and promoters), motif changes, and eQTL information based on the ENCODE92, Roadmap93, and GTEx42 projects. The GWAS Catalog93 and GRASP94 were used to determine whether a SNP is a previously reported QTL.
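For readers reproducing the r² > 0.8 proxy selection on their own genotype data, LD r² between two biallelic SNPs is often approximated on unphased genotypes by the squared Pearson correlation of allele dosages (0/1/2). The sketch below applies that proxy; it is an illustration under that assumption, whereas HaploReg relies on LD precomputed from a reference panel. Function names and the toy genotypes are hypothetical.

```python
import numpy as np

def dosage_r2(g1, g2):
    """Squared Pearson correlation of allele dosages (0/1/2): a proxy for LD r^2."""
    r = np.corrcoef(np.asarray(g1, dtype=float), np.asarray(g2, dtype=float))[0, 1]
    return float(r * r)

def high_ld_proxies(lead, candidates, threshold=0.8):
    """Names of candidate SNPs whose dosage r^2 with the lead SNP exceeds threshold."""
    return [name for name, g in candidates.items() if dosage_r2(lead, g) > threshold]

lead = [0, 1, 2, 1, 0, 2]
candidates = {
    "snp_a": [0, 1, 2, 1, 0, 2],   # identical dosages
    "snp_b": [2, 1, 0, 1, 2, 0],   # perfectly anti-correlated (other allele counted)
    "snp_c": [0, 0, 1, 2, 1, 0],   # uncorrelated with the lead SNP
}
proxies = high_ld_proxies(lead, candidates)
```

Squaring the correlation makes the result independent of which allele is counted, which is why the anti-correlated SNP is still selected as a proxy.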