Non-coding variability at the APOE locus contributes to the Alzheimer’s risk

Alzheimer’s disease (AD) is a leading cause of mortality in the elderly. While the coding change of APOE-ε4 is a key risk factor for late-onset AD and has been believed to be the only risk factor in the APOE locus, it does not fully explain the risk effect conferred by the locus. Here, we report the identification of AD causal variants in PVRL2 and APOC1 regions in proximity to APOE and define common risk haplotypes independent of APOE-ε4 coding change. These risk haplotypes are associated with changes of AD-related endophenotypes including cognitive performance, and altered expression of APOE and its nearby genes in the human brain and blood. High-throughput genome-wide chromosome conformation capture analysis further supports the roles of these risk haplotypes in modulating chromatin states and gene expression in the brain. Our findings provide compelling evidence for additional risk factors in the APOE locus that contribute to AD pathogenesis.

Alzheimer's disease (AD), a progressive age-related neurodegenerative disorder, is the most common type of dementia and a leading cause of mortality in the elderly. Its prevalence is increasing rapidly with the aging population worldwide 1 . However, its underlying pathological mechanism remains unclear. Over the last few decades, various genetic risk factors for late-onset AD (LOAD) have been identified, including common non-coding variants with low penetrance (odds ratios = 1.05-1.30) 2 . In particular, the APOE locus, tagged by the coding variant APOE-ε4, is unequivocally the most significant genetic risk factor for AD 3,4 . While other AD risk variants have also been identified in this region, including TOMM40 poly-T variation [5][6][7][8] , APOE-ε4 is believed to be the only genetic factor that accounts for the risk effect exerted by the APOE locus 9 .
Apolipoprotein E (ApoE), the lipoprotein encoded by APOE, serves as a major lipid carrier in the brain 10 . APOE has three isoforms (APOE-ε2, APOE-ε3, and APOE-ε4) defined by combinations of two coding risk mutations (rs429358 and rs7412). APOE-ε3 is predominant in the general population, while APOE-ε2 is less common and exerts a protective effect against LOAD. On the other hand, APOE-ε4 has been identified as a strong AD genetic risk factor, with odds ratios of 1.78-9.93 across different studies or ethnic groups [11][12][13] , and has been reported to modulate brain amyloid-beta (Aβ) burden, tau protein level 14,15 , neuronal activity 16,17 , immune status 18,19 , blood-brain barrier integrity 20 and longevity 21,22 . Thus, APOE plays critical roles in both aging and human diseases.
Emerging studies suggest that APOE-ε4 does not fully explain the AD risk conferred by APOE and the surrounding regions [23][24][25][26] . Indeed, recent genome-wide association studies (GWAS) for AD conducted in Chinese 27 and European populations 28 have identified leading risk variants in this region, specifically located in the APOC1 or PVRL2 loci. Moreover, while individual risk variants residing in non-coding regions exhibit small effect sizes for disease risk, a combination of risk alleles from multiple variants results in aggregate effects, thus contributing to a higher disease risk. Hints of the presence of AD risk haplotype structures in the APOE locus have been identified 29,30 , although our understanding of these haplotypes has been restricted by traditional genotyping methods (i.e., genotyping array or Sanger sequencing). Thus, there might be additional AD risk variants or haplotype structures in the APOE locus that can modulate the risk effects and function of APOE-ε4 or exert their effects independently. Hence, it is vital to comprehensively analyze AD-associated genetic structures, as well as risk variants in this region in order to better understand the pathological basis of AD and aid the translation of such findings into clinical practice, namely patient stratification and therapeutic development in a genotype-specific manner.
Here, to dissect the complex AD-associated genomic signature within the extended APOE region and its contribution to the disease, we perform fine-mapping analysis based on whole-genome sequencing (WGS) and imputed array data from Chinese and non-Asian AD cohorts. We demonstrate the existence of AD risk haplotypes in the PVRL2 and APOC1 regions that exert risk effects on AD in an APOE-ε4 and APOE-ε2 genotype-independent manner. These risk haplotypes are associated with changes in gene expression, particularly PVRL2 and APOE transcript levels in the brain or blood, and the resultant endophenotypes. Hence, our results collectively suggest that in parallel with the APOE-ε4 coding risk factor, there are additional genetic risk factors in the APOE surrounding regions that can modulate both gene expression and AD-associated phenotypic outcomes, pointing towards new directions for studying the disease mechanisms of AD.

Results
AD causal variants in the PVRL2 and APOC1 regions. We recently reported a WGS study of AD in the mainland Chinese population (n = 1172; Supplementary Table 1), in which multiple variants located in APOE and the surrounding regions exhibited the strongest association with AD 27 . To further investigate the existence of additional risk signals in this region, we conducted fine-mapping analysis in the extended APOE region (chr19:45,300,000-45,500,000) using the GATK HaplotypeCaller, which enables the simultaneous detection of SNPs and INDELs in the WGS data of this cohort and an AD cohort from Hong Kong. We applied post-filtering, including controlling for imputation quality (allele dosage DR²), allele frequency, and Hardy-Weinberg equilibrium, yielding 682 variants (554 SNPs and 128 INDELs) for subsequent investigation (see Methods section).
To examine whether there are APOE-ε4-independent AD risk effects in the APOE surrounding regions, we first conducted association analysis of the 682 obtained variants among APOE-ε3 homozygous individuals from the mainland Chinese WGS cohort (n = 237 and 288 for the AD and NC groups, respectively). A cluster of risk variants near the APOC1 region was identified. The top signal was observed from rs157592 (effect size = 1.672, p = 3.20 × 10⁻³; Fig. 1a), which indicates that there might be other risk signals in the APOE surrounding region in addition to the well-studied APOE-ε4 risk factor. We subsequently performed an association study for all participants from the mainland AD cohort. Again, the results highlighted the contribution of non-coding variants near APOC1 to AD pathogenesis (represented by the top candidate rs56131196, effect size = 0.869, p = 1.10 × 10⁻¹⁰; Fig. 1b, Table 1). Therefore, we further investigated potential causal variants in this region by performing credible variant analysis through CAVIAR 31 . We identified nine variants with a posterior probability > 10% from three loci (PVRL2, APOE, and APOC1; Table 1), marked by the following three causal variants with the highest probability: rs11668861 in the PVRL2 region, rs429358 in the APOE region, and rs56131196 in the APOC1 region (posterior probabilities = 42.5%, 13.9%, and 21.5%, respectively; Fig. 1c, Table 1). These findings suggest the existence of multi-variant effects in APOE and the surrounding region, and that the PVRL2 and APOC1 loci might contribute to AD pathogenesis in an APOE-ε4-independent manner. Furthermore, we queried the transethnic GWAS summary statistics reported by Jun et al. 32 from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS).
Accordingly, multiple AD-associated variants from the PVRL2 and APOC1 loci with p-values < 5 × 10⁻⁸ were identified in APOE-ε4 carriers (n = 12,738 and 13,850 for AD and NC carrying APOE-ε4, respectively; Supplementary Table 2) and in all individuals after adjusting for APOE-ε4 genotype (n = 21,392 and 38,164 for AD and NC, respectively; Supplementary Table 3). Notably, three of the potential causal variants identified in the mainland Chinese WGS dataset (i.e., rs12721051, rs56131196, and rs4420638) remained significant in conditional analyses after adjusting for APOE-ε4 in the transethnic GWAS results (Supplementary Table 3). Thus, our results indicate the existence of APOE-ε4-independent genetic AD risk factors in the APOE surrounding region.
AD risk haplotypes in the PVRL2 and APOC1 loci. To further dissect the AD-associated genetic structure in APOE and the surrounding region, we included additional variants (i.e., SNPs and INDELs) that were in LD (r² ≥ 0.50) with the nine causal variants in the mainland Chinese WGS dataset, which yielded 33 variants that might reflect the AD-associated genetic signatures in this region (Supplementary Table 4). Haplotype analysis revealed two major haplotype blocks defined by variants extending from the PVRL2 and APOC1 causal variants (Fig. 2a). The stratified LD plots showed that AD patients manifested a distinct genomic structure relative to NC groups, as represented by stronger LD (i.e., larger pairwise r² values between variants) among risk variants in the PVRL2, APOE, and APOC1 loci, suggesting that these AD risk variants are more likely to coexist in AD (Fig. 2a). We replicated this analysis in the ADNI WGS dataset (n = 808) and observed similar LD patterns in AD (Supplementary Fig. 1

Fig. 1 Multivariant effects of the APOE locus in the Chinese AD cohort. a Regional association plot of the AD risk variants in APOE-ε3 homozygous subjects. The horizontal red line denotes the p-value threshold of 0.01. b Regional association plot of the AD risk variants (SNPs and INDELs with frequency ≥ 5%) located in the APOE locus. The purple diamond specifies the sentinel variant (with the SNP ID marked in the plot). Dot colors illustrate the LD (measured as R²) between the sentinel variant and its neighboring variants. c CAVIAR analysis results for mapping of possible causal variants in the APOE locus. Dots represent the variants tested in the APOE locus; the y-axis and dot color denote the effect size. Dot size corresponds to the posterior probabilities of the variants being the causal variants obtained from CAVIAR analysis, with the sentinel variants located in three loci marked with SNP IDs. AD Alzheimer's disease, CAVIAR causal variants identification in associated regions, cM/Mb centimorgans per megabase, INDELs insertions and deletions, LD linkage disequilibrium, SNP single nucleotide polymorphism, Post Prob posterior probabilities of being the causal variants.

mainland Chinese WGS data (Fig. 2b), particularly the minor haplotypes defined by the minor alleles of all variants within blocks that cover PVRL2 or APOC1 gene bodies (i.e., PVRL2 haplotype alpha and APOC1 haplotype gamma, respectively; Fig. 2b, c). In addition, these minor haplotypes were enriched and more frequently associated with each other in the MCI and AD groups than in the NC group (Fig. 2b); thus, these minor haplotypes might contribute to AD, and there might be extended haplotypes spanning the PVRL2-APOE-APOC1 region formed by the combination of the abovementioned minor haplotypes from these three genomic regions.
We subsequently performed haplotype inference in a variant pool containing the PVRL2 and APOC1 haplotype blocks (comprising 14 variants for each haplotype block), as well as two coding variants representing APOE haplotypes (rs429358 and rs7412), by resolving their phased states (as recorded in phased VCF files) at the individual level. Using a partial correlation test controlling for confounding factors, we confirmed that there were more frequent associations between PVRL2 haplotype alpha and APOC1 haplotype gamma or APOE-ε4 in the AD and MCI groups when compared to the control groups (Fig. 2c-

Fig. 2 Haplotype structure of AD-associated risk variants in the Chinese AD cohort. a Pairwise LD plot of the 33 selected variants in LD with the potential risk variants in different phenotypic groups. The color map corresponds to the pairwise r² values between variants, with nine potential risk variants located in the PVRL2, APOE, and APOC1 loci marked at the top panel, respectively. b Haplotype analysis of the 33 selected variants among different phenotypic groups. Each column (marked with numbers) represents one of the 33 variants, with red and blue indicating the minor (i.e., AD risk) and major alleles, respectively. Each row represents a particular haplotype defined by a specific combination of major and minor alleles in the given haplotype blocks, with decimals on the right side specifying the frequencies of corresponding haplotypes in the given phenotypic groups. Intersecting lines represent the frequency of associations between two connected haplotypes (thin and thick lines denote associations with frequency > 1% and > 10% in the corresponding groups, respectively). c Table summarizing the identified minor haplotypes in PVRL2, APOC1, and extended APOE regions. Letters in uppercase blue or lowercase red denote the major and minor (risk) alleles, respectively; underlined letters highlight INDELs. d, e Pairwise correlations between the minor haplotypes of PVRL2 alpha and APOC1 gamma or APOE-ε4 measured by Spearman's partial rank-order correlation, adjusted for age, gender, and principal components in corresponding phenotypic groups (presented as Spearman's ρ in the y-axis). AD Alzheimer's disease, INDELs insertions and deletions, LD linkage disequilibrium, MCI mild cognitive impairment, NC normal controls.

(Supplementary Tables 1, 6, 7). In addition, we confirmed the existence of the minor haplotypes in PVRL2 and APOC1 loci (PVRL2 haplotype alpha, PVRL2 haplotype beta, and APOC1 haplotype gamma), as well as APOE-ε4-harboring extended haplotypes (haplotypes delta and epsilon; Fig. 2c) defined by the combination of PVRL2, APOE, and APOC1 minor haplotypes, in non-Asian populations (predominantly Caucasian populations) using three array-based AD genetic datasets (ADC, LOAD, and ADNI; Supplementary Tables 1, 8). In summary, we identified PVRL2, APOC1, and extended APOE haplotypes in APOE and the surrounding region that are potentially associated with AD in the general population.
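Resolving haplotypes from phased genotype calls can be illustrated with a short sketch. It assumes each individual's genotypes within a block are available as phased VCF-style GT strings such as `0|1`, listed in block order; the function names and the toy cohort are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def haplotypes_from_phased(genotypes):
    """Split phased GT strings (e.g. '0|1') into the two haplotypes they encode.

    genotypes: list of phased GT strings for one individual, one per variant,
    in block order. Returns a tuple of two strings of allele codes.
    """
    first, second = [], []
    for gt in genotypes:
        if "|" not in gt:
            raise ValueError(f"unphased genotype: {gt}")
        left, right = gt.split("|")
        first.append(left)
        second.append(right)
    return "".join(first), "".join(second)


def haplotype_frequencies(cohort):
    """Count every haplotype copy in the cohort and convert counts to frequencies."""
    counts = Counter()
    for genotypes in cohort.values():
        counts.update(haplotypes_from_phased(genotypes))
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Toy data (hypothetical): three individuals, three variants in one block.
toy_cohort = {
    "ind1": ["0|1", "0|1", "0|0"],
    "ind2": ["0|0", "0|0", "0|0"],
    "ind3": ["1|1", "1|1", "0|1"],
}
freqs = haplotype_frequencies(toy_cohort)
```

Here `1` codes the alternate allele, so a haplotype string of all `1`s corresponds to a minor haplotype only when the alternate allele is the minor allele at every variant in the block.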
APOE-ε4-independent effects of the AD risk haplotypes. We subsequently used a multivariate model to evaluate the risk effects of the aforementioned minor haplotypes and determine their associations with AD (Supplementary Tables 9-11). Meta-analysis highlighted the haplotypes' risk effects for AD, with all meta-p-values passing the genome-wide significance threshold (p < 5 × 10⁻⁸; Supplementary Table 12). Notably, after controlling for APOE genotypes (both APOE-ε4 and APOE-ε2), PVRL2 haplotype alpha, APOC1 haplotype gamma, and the two APOE-ε4-harboring extended haplotypes (delta and epsilon) still conferred a significantly elevated risk for AD (Supplementary Tables 11, 12). Meta-analysis summarizing the statistics from all datasets (n = 7092 and 4856 for the AD and NC groups, respectively) corroborated the haplotypes' risk effects (meta-p < 0.01; Fig. 3a-d, Table 2, Supplementary Tables 13, 14). Thus, we identified AD-associated haplotypes that encompass APOC1 and PVRL2, and contribute to AD in an APOE-ε4 genotype-independent manner. Furthermore, we replicated the above analysis in individuals harboring homozygous APOE-ε3 alleles. While APOC1 haplotype gamma was significantly associated with AD (effect size = 2.203, p = 6.84 × 10⁻³), PVRL2 haplotype alpha was significantly associated with AD only in females in the mainland cohort (effect size = 0.980, p = 0.038). Risk effects in the same direction, although not statistically significant, were observed for PVRL2 haplotype alpha in females in the ADC (effect size = 0.165, p = 0.250) and LOAD (effect size = 0.072, p = 0.720) cohorts. Thus, these results further support the risk effect of PVRL2 haplotypes in AD, especially in females.
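How per-cohort effect sizes combine into a single meta-analysis estimate can be sketched as follows. The section does not state which meta-analysis model was used, so this is a minimal sketch assuming a fixed-effect, inverse-variance-weighted model; the function name is hypothetical, and the example inputs are toy values, not results from the study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas: per-cohort effect sizes (e.g. log odds ratios).
    ses:   per-cohort standard errors of those effect sizes.
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Toy inputs (hypothetical): three cohorts with different precision.
pooled_beta, pooled_se, z_score, p_value = fixed_effect_meta(
    betas=[0.9, 0.2, 0.1], ses=[0.4, 0.15, 0.2]
)
```

Because each cohort is weighted by the inverse of its variance, a large cohort with a small standard error dominates the pooled estimate. This is why a nominally significant effect in one small cohort can be diluted when larger cohorts show weaker effects in the same direction.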
Cross-platform validation of the AD risk haplotypes. To examine the accuracy of our haplotype-phasing method, we adopted two independent datasets: the mainland Chinese WGS dataset and the ADNI WGS datasets, both of which have the WGS and array data available. Both datasets indicated that our analysis method can achieve more than 95% accuracy (Supplementary Fig. 2, Supplementary Tables 15, 16) for haplotypes with a frequency > 5%. Furthermore, we obtained sequencing data from the Ashkenazim son-father-mother trio from the Personal Genome Project 33 , which comprises high-coverage (~300×) Illumina short-read data and long-read PacBio data (~50× coverage), and confirmed the existence of PVRL2 haplotype alpha and APOC1 haplotype gamma in the general population (HG003, the father, carries both haplotypes; Supplementary Table 17). We   Tables 18, 19). Thus, we demonstrated the existence of AD risk haplotype structures in the general population, as well as the accuracy of our detection method for both the WGS and imputed array data.
Effects of AD risk haplotypes on endophenotypes. We subsequently examined the effects of the identified risk haplotypes on cognitive performance, brain volumetric imaging, and levels of cerebrospinal fluid (CSF) and plasma biomarkers from the ADNI dataset by using a multivariate model integrating information for the PVRL2, APOE, and APOC1 risk haplotypes. PVRL2 haplotype alpha was associated with worsening cognitive performance as assessed by the Everyday Cognitive Scale (p = 2.27 × 10⁻⁴; total score reported by study partners; Fig. 4a, Supplementary Table 26), and particularly the volume of the hippocampus, which plays key roles in memory-associated behaviors (p = 2.14 × 10⁻²; Fig. 4c, Supplementary Table 27). The haplotype was also associated with changes in total Aβ 1-42 plasma level (FDR = 0.009; Fig. 4d, Supplementary Table 28) and a reduction in intercellular adhesion molecule 1 (ICAM-1) level in CSF (FDR = 0.054; Fig. 4e, Supplementary Table 29). In contrast, APOC1 haplotype gamma was associated with the plasma levels of free Aβ 1-40 (FDR < 0.001; Fig. 4f, Supplementary

These results indicate that the identified PVRL2 and APOC1 risk haplotypes affect a variety of clinical and biochemical indexes, including cognitive performance (especially memory function), brain volume, and plasma and CSF biomarkers, all in an APOE-ε4-independent manner (Supplementary Fig. 3). This corroborates our previous findings and indicates that these risk haplotypes may play critical roles in AD pathogenesis.
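The FDR values reported for the biomarker associations are p-values adjusted for multiple testing; the section does not state which procedure was used. A minimal sketch, assuming the Benjamini-Hochberg step-up procedure (a common default, not confirmed by the text), with a hypothetical function name and toy p-values:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (hypothetical), e.g. one per tested biomarker.
toy_fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each adjusted value is the smallest FDR threshold at which that test would still be declared significant, so an association with an adjusted value of 0.054 would pass a 10% FDR threshold but not a 5% one.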
Association of risk haplotypes with gene expression changes. Given that non-coding variants are potentially associated with the regulation of gene expression, we examined whether the variants in the identified risk haplotypes are located within regulatory regions in the human genome. The UCSC Genome Browser 34 suggested that some of these variants are located in transcription factor-binding regions (Supplementary Fig. 4)

Table 4), the identified PVRL2 AD-risk haplotypes might influence APOE expression level in the brain. We subsequently performed genotype-expression association analysis with the GTEx dataset, which revealed that PVRL2 minor haplotypes were associated with reduced blood PVRL2 transcript level (p = 1.77 × 10⁻² and 6.95 × 10⁻⁶ for PVRL2 haplotypes alpha and beta, respectively; Fig. 5a, Supplementary Table 35). We observed the same associations in APOE-ε3 homozygous carriers (p = 2.41 × 10⁻² and 1.05 × 10⁻⁴ for PVRL2 haplotypes alpha and beta, respectively; Supplementary Table 36). In the brain, PVRL2 haplotype alpha and APOC1 haplotype gamma exhibited concordant associations with increased APOE and APOC1 transcript levels (alpha: effect size = 0.347 and 0.273, meta-p < 0.05; gamma: effect size = 0.559 and 0.518, meta-p < 0.001; for APOE and APOC1 brain transcript levels, respectively; Fig. 5b, Supplementary Table 41), suggesting that the identified risk haplotypes have a distal regulatory effect on APOE expression in the brain. Interestingly, APOE-ε4 was associated with a consistent decrease of TOMM40, APOE, and APOC1 transcript levels in the brain (effect size = −0.370, −0.392, and −0.444, respectively, meta-p < 0.01; Fig. 5b, Supplementary Table 41), implying that APOE-ε4 has a suppressive effect on the nearby genes. Moreover, we observed a concordant increase in blood and brain transcript levels of APOE with increasing age (effect size = 0.014 and 0.011, p < 0.001; Fig. 5a, b, Supplementary Tables 35, 41). These results further suggest that aging affects gene expression, particularly APOE transcript levels in the brain and blood.
Other than causing an amino acid mutation in the ApoE protein, APOE variant rs429358 exhibited allelic imbalance in multiple tissues, demonstrating a suppressive effect of variant rs429358 on the expression of the risk allele-harboring transcript (Fig. 5f). Unlike for PVRL2 rs6859, we also observed an allelic imbalance of APOE rs429358 in brain tissues (average fraction of minor alleles = 0.442, p < 0.0001 in CommonMind; Fig. 5d, f, Supplementary Table 38), which corroborates the aforementioned suppressive effect of APOE-ε4 on brain APOE transcript level (Fig. 5b, f). In contrast, PVRL2 haplotype alpha and APOC1 haplotype gamma were associated with an elevated APOE transcript level in the brain (meta-p < 0.01; Supplementary Table 40 Table 42). These results suggest that dysregulated APOE expression is involved in AD pathogenesis in parallel with the dysfunctions conferred by the APOE-ε4 allele.
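Allelic imbalance means that, in a heterozygous sample, reads carrying one allele make up a fraction that departs from the expected 0.5. The section does not specify the statistical test behind the reported p-value. One common per-sample choice, sketched here purely as an assumption, is an exact two-sided binomial test on read counts; the function name and the read counts are hypothetical.

```python
import math


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test p-value (requires 0 < p < 1).

    k: reads carrying the minor allele; n: total reads at the site.
    Sums the probabilities of all outcomes no more likely than the observed one.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")

    def log_pmf(i):
        # Log binomial probability, computed in log space to avoid underflow.
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1.0 - p))

    probs = [math.exp(log_pmf(i)) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties with the observed outcome are included.
    total = sum(q for q in probs if q <= observed * (1.0 + 1e-9))
    return min(1.0, total)


# Toy read counts (hypothetical): 442 minor-allele reads out of 1000.
toy_p = binomial_two_sided(442, 1000)
```

A per-sample test like this treats reads as independent draws. A cohort-level statement such as an average minor-allele fraction would instead compare the distribution of per-sample fractions against 0.5, which is a different test.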
Physical interactions of haplotype regions in the brain. To examine the possible mechanisms that contribute to the regulatory effects of the PVRL2, APOE, and APOC1 risk haplotypes on the expression of nearby genes in brain tissues, we adopted Hi-C data from two datasets: one comprising pooled samples from both adult and fetal human brains 36 , and the other comprising Hi-C data from the germinal zone (GZ) and cortical plate (CP) of the fetal brain 37 . We identified multiple interaction hotspots in APOE and the surrounding regions including regions that cover the risk haplotypes, i.e., the APOE risk haplotype region (

Regarding the interaction hotspots that cover the risk haplotypes, the APOE risk haplotype region exhibited physical interactions with the PVRL2-TOMM40 and APOC1P1 regions (FDR < 0.05; Fig. 6, Supplementary Tables 43, 44). Meanwhile, regarding the PVRL2 and APOC1 risk haplotypes associated with gene expression changes in the brain (Fig. 5b), the APOC1 risk haplotype region interacted with the PVRL2-TOMM40 region (FDR < 1 × 10⁻⁹ for the adult and fetal brains; Fig. 6, Supplementary Table 43), and the PVRL2 risk haplotype region interacted with the PVRL2 promoter region in the adult brain (FDR < 0.001; Fig. 6, Supplementary Tables 43, 44). Interestingly, distal interactions with the risk haplotype region (p < 0.05) covering a broad genomic region were observed in both fetal and adult brain tissues (Supplementary Figs. 11, 12), implying that non-coding haplotypes might have broad modulatory effects on nearby genes. These observations point to the complexity of the chromatin states that might contribute to the regulation of transcriptional activity, and underscore the need for further investigation of the associated chromatin structure changes in the brain during aging or dementia.
Functional implications of the AD risk haplotype variants. In line with the genotype-expression association analysis and observed chromatin interaction events, the identified non-coding risk variants likely function through modulating local transcription factor or microRNA binding. We first queried the non-coding risk variants to determine their potential functions. Several non-coding risk variants, including rs6859 and rs483082, as well as one INDEL, rs11568822, were co-localized with histone modifications and/or transcription factor-binding regions (Supplementary Fig. 13). Subsequent electrophoretic mobility shift assays for genomic regions harboring those variants confirmed their binding capability with nuclear protein (Supplementary Fig. 14), implying that these non-coding variants play roles in the modulation of transcription factor binding.
Furthermore, a MicroSNiPer 38 database query of rs6859, which is located in the UTR of the PVRL2 transcript, returned microRNA candidates including miR-595, miR-636, and miR-1825, all of which might bind to the rs6859 region (Supplementary Table 45). These binding events were further assessed by independent in silico alignment using miRanda (Supplementary Table 46). Specifically, miR-595 was predicted to only interact with the major G allele of rs6859 and not the minor A allele (Supplementary Tables 46, 47). This suggests that rs6859 might also affect the PVRL2 transcript level through the modulation of microRNA binding events at its UTR in parallel with transcription factor binding at the DNA level.
Haplotype prevalence is heterogeneous among ethnic groups. To corroborate the observed differences in haplotype frequency across the Chinese and non-Asian datasets (Supplementary Table 8), we assessed individual haplotype frequency using the 1000 Genomes Project phase 3 data (n = 2,504) and stratified the individuals into five "super-populations." The results show heterogeneity among ethnic groups (Fig. 7, Supplementary Table 48). Regarding APOE, APOE-ε4 was most frequent in the African population (frequency = 0.267) and was less frequent in the East Asian population than the European population (frequency = 0.086 vs. 0.155, respectively), whereas APOE-ε2 was more frequent in the East Asian population than the European population (0.100 vs. 0.063, respectively). The prevalence of PVRL2 haplotype alpha was similar between the East Asian and European populations (0.102 and 0.103, respectively). However, PVRL2 haplotype beta and APOC1 haplotype gamma were much less frequent in the East Asian population than the European population (haplotype beta = 0.081 vs. 0.318, haplotype gamma = 0.066 vs. 0.111, respectively). As for long-range AD risk haplotypes, haplotype delta was most frequent in the East Asian population (0.043 vs. 0.016, 0.027, 0.021, and 0.002 in the South Asian, American, European, and African populations, respectively), whereas haplotype epsilon was most frequent in the European population (0.059 vs. < 0.001, 0.008, 0.006, and 0.002 in the East Asian, South Asian, American, and African populations, respectively). These findings suggest the existence of possible divergent mechanisms of AD pathogenesis among ethnic groups and demonstrate how ethnic diversity might influence the relative risk of a disease at the population level.

Discussion
Here, we report a comprehensive analysis of APOE and the surrounding region using WGS data, which revealed specific AD-associated genetic structures. Our haplotype analysis identified PVRL2 and APOC1 minor haplotypes that exhibit independent risk effects for AD in parallel with APOE-ε4, as well as long-range AD risk haplotypes defined by the combination of PVRL2, APOE, and APOC1 risk haplotypes that exhibit stronger risk effects than APOE-ε4 alone. We also demonstrated that the AD risk haplotypes are associated with endophenotypes. The regulatory effects of the risk haplotypes on the brain transcript levels of APOE and its nearby genes, together with the identification of chromatin interaction hotspots in and near the APOE risk loci, support the notion that the identified genetic risk factors in the APOE locus play pathological roles in AD in parallel with APOE-ε4.
Most previous genetic studies identified genetic risk factors at the single-variant level 2,27,28 . However, individual genetic variants can only explain a small proportion of the variation in complex traits (e.g., phenotypic consequences of diseases or gene expression), which is largely due to polygenic effects (i.e., combined effects of multiple common variants) 39,40 . Corroborating this notion, we have identified AD risk haplotypes in APOE and the surrounding region that harbor functional variants (Table 1). In particular, the identified minor haplotypes in the PVRL2 and APOC1 regions exhibit APOE-ε4-independent AD risk effects. Thus, our fine-mapping work extends the current understanding of the APOE locus as a risk factor for AD beyond the well-studied APOE-ε4 to a more complex genomic structure and its associated regulatory mechanisms. In particular, we showed that the risk haplotypes potentially exert biological impacts through modulating endophenotypes including memory performance, hippocampal volume, proteomic biomarkers in CSF and plasma, and transcriptome signatures in the brain and blood. Thus, these results demonstrate the functional implications of the risk effects of the non-coding variants/haplotypes from the macroscale to the microscale. Their roles in gene expression are further supported by the chromatin interaction events of the APOE locus in human brain tissues, as well as the risk variant-dependent regulation of microRNA and nuclear protein binding (Supplementary Fig. 14, Supplementary Tables 46, 47). These results are vital for more comprehensive analyses of phenotype-associated genomic structures in AD risk loci or the contribution of polygenic effects to AD-associated phenotypes. These findings might also facilitate AD mechanistic studies or the development of risk prediction or intervention strategies in a genotype-aware manner.
Regarding the identified risk loci, PVRL2 and APOC1, in the APOE surrounding region, PVRL2 encodes poliovirus receptor-related 2, which is a glycoprotein and a component of the plasma membrane that serves as an entry point for herpesvirus and pseudorabies virus 41 . While it was recently reported that levels of herpesvirus, HHV-6A, and HHV-7 are elevated in postmortem AD brains compared to normal brains 42   of PVRL2 expression, specifically in blood, affects viral entry in AD patients requires further study. Meanwhile, APOC1 encodes apolipoprotein C1, which is mainly involved in lipoprotein metabolism and might inhibit the ApoE-mediated uptake of very-low-density lipoprotein particles 43 . Thus, it is important to examine whether altered APOC1 expression regulates ApoE functions such as ApoE-associated Aβ clearance in AD states. APOE or APOE-ε4 transcript levels in the brain might also be crucial for the pathogenesis of AD. Alterations of APOE signatures have been observed in AD brain tissues 26,44,45 . Meanwhile, non-coding AD genetic risk factors might mediate their effects by modulating gene expression in specific cellular contexts 46,47 . The present study showed that the identified PVRL2 and APOC1 risk haplotypes are potentially associated with elevated brain APOE transcript level, which is consistent with the changes in brain APOE level during aging; this suggests that a higher brain APOE (or APOE-ε4) level is associated with the risk of disease pathogenesis. Notably, AD transgenic mouse models exhibit higher hippocampal APOE transcript levels than corresponding wild-type mice (Supplementary Fig. 15a). Moreover, APOE transcript levels are strongly correlated with hippocampal plaque pathology in AD transgenic mice (R² > 0.70; Supplementary Fig. 15b).
In addition, recent studies show that APOE expression is elevated in disease-associated microglia in an AD transgenic mouse model 48 and microglia with a neurodegenerative phenotype 49 ; these results collectively implicate elevated APOE level in inflammatory response, AD disease onset, and AD progression. Thus, in addition to APOE-ε4 genetic risk factors, elevated brain APOE level might be critical for the pathogenesis of AD. Furthermore, our analysis provides additional clues regarding the suppressive effects of APOE-ε4 on APOE expression in the brain after controlling for the genetic content in the PVRL2 and APOC1 regions. Thus, further investigation is required to determine how APOE-ε4 mediates the regulatory roles of APOE expression.
In conclusion, we identified AD risk haplotypes with putative biological effects that confer AD risk. Our findings suggest the existence of alternative disease mechanisms involving non-coding variants in the APOE surrounding regions, which act in parallel with the well-studied APOE-ε4 risk factor. Our results further demonstrate the complexity of the genetic basis associated with AD pathogenesis, which might result in aggregate risk effects from both intrinsic factors, such as mutant proteins defined by coding mutations and the local and distal regulation of gene expression by genomic content, and extrinsic factors, including aging, viral infection, and ethnic variation. Further investigations to dissect the underlying mechanism of AD will be of great importance for the development of effective diagnostics and therapeutics.

Methods
Covariate adjustments in association analysis. In general, for all statistical analyses, age, gender, and the top five principal components (PCs) were included as covariates, separately within each individual cohort. Principal components analysis was conducted using the PLINK 56 (version 1.9) --pca function with the pruned (--indep-pairwise 50 5 0.2) variants with an MAF > 5%. For Chinese AD cohorts, the genome-wide variant calling was obtained using the Gotcloud pipeline with genotyping refinement performed by Beagle 60,61 (r1399) (nthreads = 24, phase-its = 30, impute-its = 15; please refer to Supplementary Methods for more detailed information). For ADNI biomarker data, phenotypic labels were included as covariates. For ADNI brain volumetric data, the analysis was further adjusted for the type of MRI platform, analytical software, and individual intracranial volume.
Association test at the single variant level. We used PLINK 56,62 (version 1.9) for logistic regression analysis of SNPs and INDELs with an MAF > 5% in APOE and the surrounding region (chr19:45,300,000-45,500,000), controlling for age, gender, and the top five ancestry PCs; 682 variants passed these filters and were included in the analysis (--hwe 1E-05, --maf 0.05). We subjected the PLINK association results (i.e., Z-scores) together with pairwise linkage disequilibrium (LD) information (i.e., the r² matrix obtained using the PLINK --matrix and --r options) to CAVIAR 31 (Causal Variants Identification in Associated Regions) analysis (version 2.0.0) to estimate the potential causal variants within the APOE locus, as indicated by the posterior probability of each variant being causal.
Multivariate regression analysis for haplotype function. Because multiple haplotypes exist in the study cohort, multivariate regression analysis was performed to estimate the effects of specific haplotypes on phenotype or gene expression. An N × (M + 1) matrix was generated for a cohort comprising N individuals (in rows) and M detected haplotypes with frequencies > 1% or > 5% (in columns), with each cell storing a value of 0, 1, or 2, representing the carriage of 0, 1, or 2 copies of the specific haplotype, respectively. In the last column (the (M + 1)th column), the counts for haplotypes with a frequency < 1% were summed and annotated as "others" to ensure that the sum of each row equaled 2. The major haplotype (usually Hap 1, denoted by all major alleles, which is the most frequent in the population) was excluded from the regression model during the association test. Thus, the effect sizes (beta) from the model above were estimated with respect to the major haplotype.
To further control for the effects of other haplotype regions, the genetic dosages of minor haplotypes from all haplotype blocks were included in the present models, with a minor revision of the above model. See Supplementary Materials and Methods for a detailed description of the analytical model.
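The dosage matrix described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input format (one pair of haplotype labels per individual) and the function names haplotype_dosage_matrix and drop_reference are assumptions introduced here.

```python
from collections import Counter


def haplotype_dosage_matrix(diplotypes, min_freq=0.01):
    """Build an N x (M + 1) dosage matrix from per-individual haplotype pairs.

    diplotypes: list of (hap_a, hap_b) label pairs, one per individual
                (hypothetical input format, assumed for illustration).
    Returns (columns, rows): column labels and one row of 0/1/2 counts
    per individual; the last column pools the rare haplotypes.
    """
    n_chrom = 2 * len(diplotypes)
    counts = Counter(h for pair in diplotypes for h in pair)
    # Haplotypes above the frequency threshold get their own column,
    # ordered from most to least frequent.
    common = [h for h, c in counts.most_common() if c / n_chrom > min_freq]
    columns = common + ["others"]
    rows = []
    for pair in diplotypes:
        row = [sum(1 for h in pair if h == name) for name in common]
        row.append(2 - sum(row))  # rare haplotypes pooled so each row sums to 2
        rows.append(row)
    return columns, rows


def drop_reference(columns, rows, reference):
    """Remove the reference (major) haplotype column so that regression
    coefficients are estimated relative to that haplotype."""
    idx = columns.index(reference)
    new_columns = columns[:idx] + columns[idx + 1:]
    new_rows = [r[:idx] + r[idx + 1:] for r in rows]
    return new_columns, new_rows
```

The remaining columns, together with the covariates, would then serve as predictors in the regression model; dropping the major haplotype avoids perfect collinearity, because every row sums to 2.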
Association test and meta-analysis of candidate haplotypes. Minor haplotypes with frequencies > 1% were included in the multivariate logistic regression model using the R glm function from the stats package. Analyses were performed separately for the PVRL2, APOC1, and long-range haplotypes defined by the combination of PVRL2, APOE, and APOC1 haplotypes. The analyses were controlled for APOE genotype by incorporating the genotype dosages of APOE-ε4 and APOE-ε2 into the model. The effect size and standard errors (SE) obtained from the logistic regression were subjected to METASOFT 63 to generate the meta-analysis results using a random effects (RE) model, with statistical significance estimated by Han and Eskin's random effects model (RE2).
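To illustrate how per-cohort effect sizes and standard errors can be pooled, the sketch below implements the conventional DerSimonian-Laird random-effects estimator. It is an assumed stand-in for illustration only: it does not reproduce METASOFT itself or Han and Eskin's RE2 test, and the function name random_effects_meta is hypothetical.

```python
import math


def random_effects_meta(betas, ses):
    """Pool per-cohort effect sizes with a DerSimonian-Laird random-effects model.

    betas, ses: per-cohort effect sizes (e.g., log odds ratios) and their
    standard errors. Returns (pooled_beta, pooled_se, tau2), where tau2 is
    the estimated between-cohort variance.
    """
    k = len(betas)
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    # Cochran's Q statistic measures heterogeneity across cohorts.
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each cohort's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled_beta = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    pooled_se = math.sqrt(1.0 / sum(w_re))
    return pooled_beta, pooled_se, tau2
```

When the cohorts agree closely, tau2 is estimated as zero and the result reduces to the fixed-effect inverse-variance estimate.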
Association test for haplotypes on endophenotypes. A multivariate model jointly taking haplotype information from the PVRL2, APOE, and APOC1 loci was adopted to assay the haplotype effects on cognitive score, brain volumetric data, and ADNI biomarker levels using robust regression (R lmrob from the robustbase package) with appropriate covariate adjustments. For ADNI biomarker data, Bonferroni adjustment was applied for the association test of individual biomarkers to correct for the multiple tests on haplotypes, whereas the false discovery rate (FDR) was calculated for individual haplotypes across all biomarkers. Adjustments were performed using the p.adjust function from the R stats package.
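The two multiple-testing corrections named above can be sketched in a few lines. The code below is a plain re-implementation for illustration, assuming that the FDR corresponds to the Benjamini-Hochberg procedure (the "BH"/"fdr" method of p.adjust); the function names are introduced here and are not part of any package.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR adjustment, returned in the original input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In the setting described above, bonferroni would be applied across the haplotypes tested for a single biomarker, whereas benjamini_hochberg would be applied across all biomarkers for a single haplotype.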
Association test for variants/haplotypes on gene expression. GTEx data comprising the transcript levels of PVRL2, TOMM40, APOE, and APOC1 (rank-based inverse normal transformed by the R rntransform function from the GenABEL 64 package) together with imputed genotype data for variants with an MAF > 5% located in non-repetitive regions (UCSC RepeatMasker in hg19 coordinates) were included in the genotype-phenotype association test using PLINK, with age, gender, and the top five PCs as covariates. To estimate the variant effects for all tissues or 13 brain tissues, meta-analysis was conducted using the rma function in the R package metafor 65 (method = "HE", test = "knha"), taking effect sizes and standard errors from the PLINK results. For haplotype data, association tests were conducted using the multivariate model, jointly including PVRL2, APOE, and APOC1 haplotype information, with the robust regression model. Among the brain tissues, the cerebellum, cerebellar hemisphere, and spinal cord were excluded from the meta-analysis conducted by METASOFT using the RE model, with statistical significance estimated by the RE2 model for haplotype effects in brain tissues. For ASE data in GTEx, robust regression was applied to test associations. For ASE in the GTEx and CommonMind datasets, one-sample t-tests were applied to examine allelic imbalance under the null hypothesis of balanced expression (i.e., a theoretical fraction of reads carrying the minor allele of 0.5) using GraphPad Prism 6 (GraphPad Software Inc.) at an α level of 0.05.
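The rank-based inverse normal transformation applied to the transcript levels can be illustrated as follows. This is a minimal sketch using only the Python standard library; the rank offset of 0.5 and the averaging of tied ranks are assumptions made here and may differ in detail from the rntransform function, and the name rank_inverse_normal is hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values):
    """Map values to standard-normal quantiles based on their ranks.

    Tied values receive the average of their ranks. The quantile for a
    value of rank r among n values is Phi^{-1}((r - 0.5) / n); the 0.5
    offset is an assumed convention.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average rank for the tied run
        for j in range(pos, end + 1):
            ranks[order[j]] = avg_rank
        pos = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

The transformed values preserve the ordering of the input while following an approximately standard normal distribution, which limits the influence of outlying expression values on the regression.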
Chromatin interaction analysis in brain tissues. Two high-throughput chromosome conformation capture (Hi-C) datasets were adopted to investigate the chromatin organization in APOE and the surrounding regions. The first dataset comprised anterior temporal cortex samples from three adults of European ancestry with no psychiatric disorders, as well as cerebral cortex samples from three fetal brains at a gestational age of 17-19 weeks 36 . All samples were free from large structural variations (>100 kb), and easy Hi-C (eHi-C) methods were adopted for library construction, sequencing, and data analysis 66 . The second dataset comprised data generated from three paired germinal zone and cortical plate fetal brain samples 37 . Briefly, for both datasets, pooled or individual data were mapped to the human reference genome (hg19) using BWA mem or Bowtie 67 . The uniquely mapped paired-end reads passing quality controls were further binned into contact matrices at 10-kb resolution, and the data were then subjected to Fit-Hi-C 68 and FastHiC 69,70 to assess chromatin interaction events in this region. The FDR was further calculated to identify interaction hotspots.
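The binning step described above can be sketched as follows. This is an illustrative simplification, assuming that mapped read pairs on a single chromosome are already available as pairs of genomic positions; the function name bin_contacts and the region boundaries in the usage note are introduced here for illustration and do not reflect the actual pipelines.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # 10-kb resolution, as stated in the text


def bin_contacts(read_pairs, region_start, region_end, bin_size=BIN_SIZE):
    """Count read pairs per bin pair within [region_start, region_end).

    read_pairs: iterable of (pos_a, pos_b) positions on the same chromosome
                (hypothetical input format).
    Returns a dict mapping (bin_i, bin_j) with bin_i <= bin_j to a contact
    count, i.e., the upper triangle of the symmetric contact matrix.
    """
    counts = defaultdict(int)
    for pos_a, pos_b in read_pairs:
        inside_a = region_start <= pos_a < region_end
        inside_b = region_start <= pos_b < region_end
        if not (inside_a and inside_b):
            continue  # skip pairs with either end outside the region
        i = (pos_a - region_start) // bin_size
        j = (pos_b - region_start) // bin_size
        counts[(min(i, j), max(i, j))] += 1
    return dict(counts)
```

For example, calling bin_contacts(pairs, 45_300_000, 45_500_000) on a 200-kb window yields a sparse representation of a 20 × 20 contact matrix, which tools that test for significant interactions would then take as input.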
Data visualization. The GWAS results were visualized using Locuszoom 71 plots, with LD and p-values obtained from the WGS data. The CAVIAR results and heatmap for haplotype effects were visualized using the ggplot function in the ggplot2 R package. LD and haplotype structures were plotted using Haploview. Bar charts, dot plots, box plots, and line charts were generated using GraphPad Prism 6 (GraphPad Software Inc.). Forest plots for meta-analysis were generated using ForestPMPlot 72 . Pie charts were generated using Excel 2017 (Microsoft).
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.