The escape from Alzheimer's disease underlies genetic variants involved into immune response and endocytosis pathways

The risk to develop and escape Alzheimer's disease (AD) is influenced by a constellation of genetic variants, each associated with specific molecular pathways. Different pathways may differentially contribute to the modification of the AD-risk. We studied the molecular mechanisms that explain the extreme ends of the cognitive spectrum by comparing pathway-specific polygenic risk scores (pathway-PRS) in individuals with AD and those who escaped AD until old age.
We used 29 genetic variants associated with AD to calculate pathway-PRS for five major pathways involved in AD. We developed an integrative framework that allows multiple genes to associate with a variant, and multiple pathways to associate with a gene. We studied pathway-PRS in patients with AD (N=1,909), population controls (N=1,654), and cognitively healthy centenarians who escaped AD (N=293). Last, we estimated the contribution of each pathway to the genetic risk of AD in the general population. 
All pathway-PRS significantly associated with the modification of AD-risk (p<0.05). The pathway that contributed the most was β-amyloid metabolism (32%), driven mainly by APOE variants. After excluding APOE variants, all pathway-PRS associated with increased AD-risk (p<0.05), while specifically immune response (p=3.1x10^-3) and endocytosis (p=3.8x10^-4) associated with escaping AD. These pathways were the main contributors to the genetic risk of AD (41.3% and 21.4%, respectively), and their effect on escaping AD was larger than or comparable to that of developing AD.
Our work suggests that immune response and endocytosis might be involved in general neuro-protective functions, and highlights the need to study these pathways, next to β-amyloid metabolism.


Polygenic risk score
To calculate the personal polygenic risk score, or the genetic risk of AD that affects a single individual, the effect-sizes of all genetic variants that significantly associate with AD are combined. Formally, a PRS is defined as the sum of trait-associated alleles carried by an individual across a defined set of genetic loci, weighted by effect-sizes estimated from a GWAS. [30] We constructed a polygenic risk score (PRS) using 29 variants that were previously associated with AD.
As weights for the PRS, we used the variant effect-sizes (log of odds ratio) as published in large GWAS of AD (Table S1). Given a subject s, the PRS is defined as:

$$PRS_s = \sum_{k}^{K} (\alpha_k^s \cdot \beta_k) \quad (1)$$

where K is the full set of variants, $\alpha_k^s$ is the allele dosage from the (imputed) genotype of variant k in subject s, and $\beta_k$ is the variant effect size as derived from literature (Table S1).
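The weighted sum described above can be sketched in a few lines of Python. This is only an illustration: the variant identifiers, effect sizes and dosages below are hypothetical placeholders, not values from Table S1, and the function name is an assumption made for this example.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Sum of allele dosages weighted by per-variant log odds ratios."""
    return sum(dosages[k] * beta for k, beta in effect_sizes.items())


# Hypothetical effect sizes (log odds ratios) for three made-up variants.
effect_sizes = {"rs_a": 0.20, "rs_b": -0.10, "rs_c": 1.20}
# Hypothetical (imputed) allele dosages of one subject, each in [0, 2].
subject_dosages = {"rs_a": 2.0, "rs_b": 1.0, "rs_c": 0.0}

prs = polygenic_risk_score(subject_dosages, effect_sizes)
print(round(prs, 6))
```

Protective variants enter with a negative log odds ratio, so carrying them lowers the score.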

Mapping variants to pathways
We studied the five pathways implicated in AD: immune response, β-amyloid metabolism, cholesterol/lipid metabolism, endocytosis and vascular dysfunction. [22][23][24][25][26][27] For these pathways we developed the variant-pathway mapping $w_k^p$, which represents the degree of involvement of a given variant k in each pre-selected pathway p. To generate this value, we (i) associated genetic variants to genes (variant-gene mapping), (ii) associated genes to pathways (gene-pathway mapping) and (iii) combined these mappings in the variant-pathway mapping.
Variant-gene mapping: the association of a variant with a specific gene is not straightforward, as the closest gene is not necessarily the gene affected by the variant. The two most recent and largest GWAS of AD addressed the relationship between genetic variants and associated genes by applying two independent methods. [19,20] Briefly, one study used (i) gene-based annotation, (ii) expression-quantitative trait loci (eQTL) analyses, (iii) gene cluster/pathway analyses, and (iv) differential gene expression analysis between AD cases and healthy controls.

medRxiv preprint doi: https://doi.org/10.1101/19009464; this version posted October 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Gene-pathway mapping: each gene from the variant-gene mapping was classified into the pre-defined set of pathways by integrating four sources of information:
1. Gene-sets from the pathway analysis of Kunkle et al., [19] in which the authors identified 9 significant pathways (coupled with the genes involved in each pathway), which we mapped to 3 of the 5 pathways of interest (Table S2);
2. Associated genes from Gene-ontology (GO) terms resembling the 5 pathways of interest within the biological processes tree (including all child-terms) (Table S3); [44,45]
3. Gene-sets derived from the functional clustering within DAVID: [46,47] the gene-set from the variant-gene mapping was used to obtain 15 functional clusters, which were then mapped to the 5 pre-selected pathways using a set of keywords (Table S4 and Table S5);
4. Gene-pathway associations from a recent review concerning the genetic landscape of AD (Table S6). [22]
By counting the number of times each gene was associated to each pathway according to these sources, and dividing by the total number of associations per gene, we obtained a weighted mapping of each gene g to one or more pathways p, $w_g^p$, denoted as the gene-pathway mapping weight. In case the gene-pathway mapping could not be calculated (i.e. there was no mapping to any of the pathways under consideration), we excluded the gene from further analyses (Table S7).
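The counting step for the gene-pathway mapping weight can be illustrated as follows. This is a minimal sketch under the assumption that every association reported by any of the four sources counts once; the list of associations is a hypothetical example, not taken from Tables S2 to S6.

```python
from collections import Counter


def gene_pathway_weights(associations):
    """Turn a list of pathway labels (one per reported association of a
    gene) into weights that sum to 1; return None if nothing maps."""
    counts = Counter(associations)
    total = sum(counts.values())
    if total == 0:
        return None  # gene would be excluded from further analyses
    return {pathway: n / total for pathway, n in counts.items()}


# Hypothetical gene reported five times across the annotation sources.
weights = gene_pathway_weights([
    "immune response", "immune response", "endocytosis",
    "immune response", "endocytosis",
])
print(weights)
```

A gene reported for a single pathway by every source receives weight 1 for that pathway, whereas a gene with conflicting annotations is split proportionally.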
To associate variants with pathways, we combined the variant-gene mapping and the gene-pathway mapping. Given a variant k, mapping to a set of genes G, and a pathway p, we define the weight of the variant to the pathway ($w_k^p$) as:

$$w_k^p = \sum_{g}^{G} (w_k^g \cdot w_g^p) \quad (2)$$

where $w_k^g$ is the variant-gene mapping weight of variant k to gene g, and $w_g^p$ is the gene-pathway mapping weight of gene g to pathway p. In this way, for each variant, we calculated a score indicative of the involvement of the variant in each of the five pathways (variant-pathway mapping). For some variants no variant-pathway mapping was possible. We marked these variants as unmapped.
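The combination of the two mappings is a sum over genes of products of weights. The sketch below assumes, for illustration only, that the variant-gene weights of a variant sum to 1 across its genes; gene names and weights are hypothetical, and genes without a pathway mapping simply contribute nothing here.

```python
def variant_pathway_weights(variant_gene, gene_pathway):
    """Combine variant-gene and gene-pathway weights into one weight per
    pathway; an empty result marks the variant as unmapped."""
    weights = {}
    for gene, w_kg in variant_gene.items():
        for pathway, w_gp in gene_pathway.get(gene, {}).items():
            weights[pathway] = weights.get(pathway, 0.0) + w_kg * w_gp
    return weights


# Hypothetical variant that maps equally to two genes.
variant_gene = {"GENE_A": 0.5, "GENE_B": 0.5}
gene_pathway = {
    "GENE_A": {"immune response": 1.0},
    "GENE_B": {"immune response": 0.5, "endocytosis": 0.5},
}

mapping = variant_pathway_weights(variant_gene, gene_pathway)
print(mapping)
```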

Pathway-specific polygenic risk score
For the pathway-specific polygenic risk score (pPRS), we extended the definition of the PRS by adding as multiplicative factor the variant-pathway mapping weight of each variant. Given a sample s and a pathway p, we defined the pPRS as:

$$pPRS_s^p = \sum_{k}^{K} (w_k^p \cdot \alpha_k^s \cdot \beta_k) \quad (3)$$

where $w_k^p$ is the variant-pathway mapping of variant k to pathway p and, as in the PRS, $\alpha_k^s$ is the allele dosage of variant k in subject s and $\beta_k$ is the effect size of variant k.
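As an illustration of this extension, the following sketch multiplies each variant's contribution by its variant-pathway weight. All identifiers and numbers are hypothetical. When the weights of every variant sum to 1 across pathways, the pathway scores add up to the overall PRS, which the example checks.

```python
def pathway_prs(dosages, effect_sizes, variant_pathway, pathway):
    """PRS restricted to one pathway: each variant's dosage-weighted
    effect is scaled by its variant-pathway mapping weight."""
    return sum(
        variant_pathway.get(k, {}).get(pathway, 0.0) * dosages[k] * beta
        for k, beta in effect_sizes.items()
    )


# Hypothetical inputs for one subject and two made-up variants.
effect_sizes = {"rs_a": 0.20, "rs_b": -0.10}
dosages = {"rs_a": 2.0, "rs_b": 1.0}
variant_pathway = {
    "rs_a": {"immune response": 0.75, "endocytosis": 0.25},
    "rs_b": {"endocytosis": 1.0},
}

scores = {
    p: pathway_prs(dosages, effect_sizes, variant_pathway, p)
    for p in ("immune response", "endocytosis")
}
total_prs = sum(dosages[k] * beta for k, beta in effect_sizes.items())
print(scores, total_prs)
```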

Association of PRSs in the three cohorts
We calculated the polygenic risk score (PRS) and pathway-PRS (pPRS) for the population subjects, the AD cases and the cognitively healthy centenarians (escapers of AD) (P, A and C, respectively). In addition, we interrogated the influence of the APOE variants on these risk scores by calculating the PRSs and pPRSs with and without APOE variants.
To investigate the differential contributions of the risk scores to AD development as well as escape from AD, we calculated (i) the association of the risk scores (PRS and pPRS) with AD status by comparing AD cases and population subjects (A vs. P comparison), and (ii) the association of the risk scores with escaping AD until extreme ages by comparing cognitively healthy centenarians and population subjects (C vs. P comparison). For the associations, we used logistic regression models with the PRS and pPRS as predictors. The PRS were scaled (mean=0, SD=1), so the effect-sizes (log of odds ratio) can be interpreted as the change in log odds per one standard deviation (SD) increase in the PRS, with corresponding estimated 95% confidence intervals (95% CI).
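To make the interpretation of the scaled effect-size concrete, the sketch below standardizes a set of risk scores and fits a single-predictor logistic regression by Newton-Raphson in plain Python. It is an illustration only: the scores and labels are invented, the model has no covariates, and the Wald-type 95% interval is an assumption of this example rather than a description of the models used in the study.

```python
import math
import statistics


def standardize(values):
    """Scale values to mean 0 and SD 1 (sample standard deviation)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def probabilities(x, b0, b1):
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]


def information(x, p):
    """Entries of the 2x2 Fisher information for intercept and slope."""
    w = [pi * (1.0 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return h00, h01, h11


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x; return (b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = probabilities(x, b0, b1)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        h00, h01, h11 = information(x, p)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    h00, h01, h11 = information(x, probabilities(x, b0, b1))
    return b1, math.sqrt(h00 / (h00 * h11 - h01 * h01))


# Hypothetical risk scores: six population subjects (0), six cases (1).
raw_scores = [-1.2, -0.5, 0.1, 0.4, -0.8, 0.9, 0.3, 1.1, -0.2, 1.5, 0.7, -0.6]
status = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

z = standardize(raw_scores)
log_or, se = fit_logistic(z, status)
odds_ratio = math.exp(log_or)
ci_low, ci_high = math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)
print(odds_ratio, ci_low, ci_high)
```

Because the predictor is standardized, `odds_ratio` is the multiplicative change in the odds per one SD increase in the score.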
To further investigate the relationship between the effect of each pathway on AD and on escaping AD, we calculated the change in effect-size as the ratio between the effect-size of the association with escaping AD (log of odds ratios of C vs. P comparison) and the effect-size of the association with AD (log of odds ratios of A vs. P comparison). A change larger than 1 is indicative of a more pronounced effect on escaping AD, whereas a change smaller than 1 indicates a more pronounced effect on AD.
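A small numeric sketch of this ratio follows. Because the two associations point in opposite directions, the example takes the ratio of absolute log odds ratios; that choice is an assumption made here so that the result is positive, and the input values are hypothetical.

```python
def change_in_effect_size(log_or_escape, log_or_ad):
    """Ratio of the magnitude of the escape effect (C vs. P) to the
    magnitude of the AD effect (A vs. P)."""
    return abs(log_or_escape) / abs(log_or_ad)


# Hypothetical log odds ratios per SD of a pathway-PRS.
change = change_in_effect_size(log_or_escape=-0.25, log_or_ad=0.20)
print(round(change, 6))
```

A value above 1, as in this made-up case, would indicate a more pronounced effect on escaping AD than on developing it.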

Contribution of each pathway to polygenic risk of AD
We estimated the contribution of each pathway to the genetic risk of AD in the general population: this is the ratio between the variance of each pathway-PRS and the variance of the combined PRS as calculated in the individuals in the general population. As such, it is a function of the variant-pathway mapping, the effect-size (log of odds ratio) of the variants, and the variant frequencies. Given a variant k and the relative variant-pathway mapping $w_k^p$, we define the percentage of the risk explained by each pathway p as:

$$C^p = \frac{\sum_{k}^{K} (w_k^p \cdot (\beta_k^2 \cdot MAF_k \cdot (1 - MAF_k)))}{\sum_{k}^{K} (\beta_k^2 \cdot MAF_k \cdot (1 - MAF_k))}$$

where $\beta_k$ is the variant effect-size from literature, and $MAF_k \cdot (1 - MAF_k)$ is the variance of a Bernoulli random variable that occurs with probability $MAF_k$, i.e. the minor allele frequency of each variant k in our cohort of middle-aged healthy population subjects. Here, $w_k^p$ is interpreted as the probability that variant k belongs to pathway p. When calculating the contributions of each pathway, we also considered variants with missing variant-pathway mapping. For these variants, the variant-pathway mapping was set to 1 for an unmapped pathway. Together, the pathway PRS variances sum to the total PRS variance.
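The variance ratio can be sketched as below. The per-variant term uses the squared effect size times the Bernoulli variance of the allele frequency, as described in the text; variant names, effect sizes and frequencies are hypothetical, and variants without a mapping are assigned to an "unmapped" pathway with weight 1.

```python
def pathway_contributions(effect_sizes, maf, variant_pathway):
    """Fraction of the total PRS variance attributed to each pathway."""
    total = 0.0
    per_pathway = {}
    for k, beta in effect_sizes.items():
        variance_k = beta ** 2 * maf[k] * (1.0 - maf[k])
        total += variance_k
        # Variants with no variant-pathway mapping go to "unmapped".
        mapping = variant_pathway.get(k) or {"unmapped": 1.0}
        for pathway, weight in mapping.items():
            per_pathway[pathway] = per_pathway.get(pathway, 0.0) + weight * variance_k
    return {pathway: v / total for pathway, v in per_pathway.items()}


# Hypothetical variants; rs_c has no variant-pathway mapping.
effect_sizes = {"rs_a": 0.20, "rs_b": -0.10, "rs_c": 0.10}
maf = {"rs_a": 0.50, "rs_b": 0.20, "rs_c": 0.20}
variant_pathway = {
    "rs_a": {"immune response": 0.75, "endocytosis": 0.25},
    "rs_b": {"endocytosis": 1.0},
}

contributions = pathway_contributions(effect_sizes, maf, variant_pathway)
print(contributions)
```

Since each variant's weights sum to 1, the returned fractions sum to 1 as well, mirroring the statement that the pathway variances add up to the total PRS variance.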

Implementation
We performed quality control of genotype data as well as population stratification analysis and relatedness analysis with PLINK (v2.0). All subsequent analyses were performed with R (v3.5.2), Bash and Python (v2.7.14) scripts.

Pathway-specific PRS associate with AD and escape from AD
Subsequently, we annotated the 29 AD-associated genetic variants to 5 pathways (Figure 2). According to our variant-gene mapping, the 29 AD-associated variants mapped to 155 genes (Table S8). The number of genes associated with each variant ranged from 1 (e.g. for variants in/near SORL1, APOE or PLCG2) to 34 (a variant in the gene-dense region near the ZCWPW1 gene) (Figure 2 and Table S9). We were able to calculate the gene-pathway mapping weight for 74 genes (Table S10). The remaining 81 genes were not mapped to the 5 pathways (Table S8). In total, we calculated the variant-pathway mapping for 23 loci to at least one of the pre-selected biological pathways (Figure 2 and Table S11).
We then calculated the pPRS for each pathway in population subjects, AD cases and cognitively healthy centenarians, including and excluding APOE variants (Figure 1B and 1C). We found that, when including APOE variants, the pPRSs of all pathways significantly associated with increased risk of AD (Figure 1B and Table S7). The association of pPRSs with increased chance of escaping AD was, as expected, in the opposite direction for all pathways, and the association was significant for all pathways except for [...] (Figure 1B and Table S7).
When excluding APOE variants, the pPRSs of all pathways were still significantly associated with increased risk of AD (Figure 1C and Table 2). The association of pPRSs with increased chance of escaping AD was, as expected, in the opposite direction for all pathways, yet the association was significant only for immune response and endocytosis (p=1.1x10^-1) (Figure 1C and Table 2).

Comparison of effect on AD and escaping AD
To compare in more depth the association of the different pPRSs with AD and with escaping AD until extreme old ages, we calculated the change in effect-size for all pPRSs. A change in effect-size of 1 indicates a similar effect on causing AD as on escaping AD, whereas a change larger than 1 is indicative of a larger effect on escaping AD compared to causing AD. When including APOE variants, the change in effect-size was <1 for all pathways except for the vascular dysfunction pathway (Figure 3). This is expected, as the effect-size of APOE variants on causing AD alone is much larger than their effect on escaping AD. When excluding APOE variants, the change in effect-size was still <1 for β-amyloid metabolism and cholesterol/lipid metabolism (0.33 and 0.59, respectively), but it approximated 1 for endocytosis (0.93) and it was larger than 1 for the immune response and vascular dysfunction (1.28 and 1.15, respectively) (Figure 3).

Contributions of each pathway to the polygenic risk of AD
Finally, we estimated the relative contribution of each pathway to the polygenic risk of AD in the general population. This is indicative of the degree of involvement of each pathway in the total polygenic risk of AD. Including APOE variants, the contribution of the pathways to the total polygenic risk of AD was 32% for β-amyloid metabolism, 24.2% for endocytosis [...] (Figure 4A).
When we excluded APOE variants, the contribution of the pathways to the total polygenic risk of AD was 41.3% for immune response, 21.4% for endocytosis, 13.3% for β-amyloid metabolism, 6% for cholesterol/lipid metabolism, 3% for vascular dysfunction and 15% for the unmapped variants (Figure 4B).

Discussion
In this work, we studied 29 common genetic variants known to associate with AD using polygenic risk scores and pathway-specific polygenic risk scores. As expected, we found that a higher PRS for AD was associated with a higher risk of AD. In addition, using our unique cohort of cognitively healthy centenarians, we showed that a lower PRS for AD associated with escaping AD until extreme old ages. This suggests that the long-term preservation of cognitive health is associated with the selective survival of individuals with the lowest burden of risk-increasing variants or, vice versa, the highest burden of protective variants. Then, using an innovative approach, we studied the five pathways previously found to be involved in AD as well as the contribution of these pathways to the polygenic risk of AD. We showed that all pathway-PRS associate with increased AD risk, both including and excluding APOE variants. When we studied the association of pathway-PRS with escaping AD until extreme old ages, we found that, as expected, APOE variants also played a major role in escaping AD. However, when excluding these variants, only immune response and endocytosis significantly associated with an increased chance to escape AD. Interestingly, the effect of these pathways on escaping AD was larger than or comparable to that of developing AD, suggesting that these pathways might be involved in general neuro-protective functions. Based on the effect size and frequency of all AD-associated variants, we found that β-amyloid metabolism (32%) followed by endocytosis (24.2%) were the major contributors to the modification of AD-risk. After excluding APOE variants, the pathways that contributed the most to the modification of AD-risk were immune response (41.3%) and endocytosis (21.4%).
Previous studies showed that the polygenic risk score of AD associated not only with increased risk of AD, but also with neuropathological hallmarks of AD, lifetime risk and the age at onset in both APOE ε4 carriers and non-carriers. [28,29,48-52] We now add that the PRS for AD also associates with escaping AD at extremely old ages. This adds further importance to the potential of using PRS and APOE genotype in a clinical setting. [48,49,51,53]

Our approach to map variants to associated genes and to map genes to pathways resulted in a weighted annotation of variants to pathways that allowed for uncertainty in gene as well as pathway assignment, which was not done previously. We note that considering uncertainty in variant-gene as well as gene-pathway assignments is crucial because most genetic variants are in non-coding regions, which makes the closest gene not necessarily the culprit gene, and because different functional annotation sources often do not overlap. The pathway-specific PRS that we proposed in this manuscript can be re-used for the identification of subtypes of AD patients compromised in a specific AD-associated pathway. This is of interest for clinical trials, in order to test responsiveness to compounds in specific subsets of patients. For example, a monoclonal antibody targeting TREM2 receptors could work better in AD patients in whom the immune response pathway is most deficient. Recently, several studies attempted to construct pathway-specific PRS to find genetically based heterogeneity in AD patients. [28,29] In line with our findings, Ahmad et al. found that genes capturing the endocytosis pathway significantly associated with AD and with the conversion to AD. [29] Other studies used fewer variants [28] or a less stringent selection of variants, and did not observe a differential involvement of pathways in AD etiology. [54]

The amyloid cascade hypothesis has been dominating AD-related research in the last two decades. However, treatments targeting amyloid have, so far, not been able to slow or stop disease progression. This has led to an increased interest in the other pathways that are important in AD pathogenesis. [22] Part of the current view of the etiology of AD is that the dysregulation of the immune response is a major causal pathway, and that AD is not only a consequence of β-amyloid metabolism. [55,56] In addition, previous studies showed that healthy immune and metabolic systems are associated with a longer and healthier lifespan. [1,57] Our results indicate that, excluding APOE variants, the effect of immune response and endocytosis on escaping AD is stronger than or comparable to the effect on causing AD. This suggests that these pathways might be involved in the maintenance of general cognitive health, as the cognitively healthy centenarians represent the escape from all neurodegenerative diseases until extreme ages. We recently found evidence for this hypothesis in the protective low-frequency variant in PLCG2, which is involved in the regulation of the immune response. [50] This variant is enriched in cognitively healthy centenarians, and protects against AD as well as frontotemporal dementia and dementia with Lewy bodies. [50] We included this variant in the total PRS as well as in the pathway-PRS for the immune response (variant-pathway mapping was 60%) and endocytosis (variant-pathway mapping was 40%). Endocytosis is thought to play a role both in neurons, as part of the β-amyloid metabolism, and in glia cells, as part of the immune response. Thus, a dysregulation in the interplay between these pathways might lead to an imbalance of immune signaling factors, favouring the engulfment of synapses and AD-associated processes. This, in turn, may contribute to the accumulation of amyloid and tau pathologies. [58-61]
In this work, we assessed the effect of common and low-frequency variants on the development of and the escape from AD. Therefore, the contributions of rare, causative variants associated with increased AD risk, such as those in APP, PSEN1, PSEN2, TREM2 and SORL1, were not considered. Despite the large odds ratios to develop AD associated with carrying such variants, the frequency of these variants in the population is very low, and they therefore have a minor effect on the total AD risk in the population. [11,12] However, future versions of the PRS will most likely include the effect of carrying disease-associated rare variants. This will affect individual PRSs, and the results generated with current PRSs will need to be adapted accordingly. Compared to the sizes of recent GWAS of AD, we included relatively small sample sizes, particularly with respect to the cognitively healthy centenarians, a very rare phenotype in the population (<0.1%). [4] These sample sizes are, however, sufficient to study PRSs. The cohorts that we used in this study were not used in any GWAS of AD; therefore, we provide independent replication of the AD PRS in a homogeneous group of (Dutch) individuals.
The goal of this study was not to associate new pathways to AD. For this reason, we limited our analysis to 5 pathways known to be relevant in AD. We acknowledge that, as new variants as well as pathways will be associated with AD, the contributions and the pathway-specific PRSs will need to be recalculated. A limitation, not exclusive to our work, is the highly debated role of the APOE gene. We mapped the effect of APOE to four pathways and we are aware that this assignment is relatively arbitrary. For example, APOE has vascular properties in cholesterol transport, [62] but in the annotation sources that we consulted these links were not present. The combination of a large effect and unclear pathway assignment makes pathway-PRS including APOE challenging to use. Lastly, we note that the variance contributions might change in different populations, as they depend on variant frequency and population heterogeneity.
Concluding, with the exclusion of APOE variants, the aggregate contribution of the immune response and endocytosis represents more than 60% of the currently known polygenic risk of AD. This indicates that an intervention in these systems may have large potential to prevent AD and potentially other related diseases, and highlights the critical need to study (neuro)immune response and endocytosis, next to β-amyloid metabolism.

* Age at onset for AD cases, age at study inclusion for population subjects and cognitively healthy centenarians.
[...] (Table S1). Figure B (central) and Figure C (bottom) show the pPRS for each of the selected molecular pathways, including and excluding APOE variants, respectively. For all plots, risk scores were calculated for AD cases, population subjects and cognitively healthy centenarians. Then, risk scores were compared between (i) AD cases and population subjects (A vs. P comparison) and (ii) cognitively healthy centenarians and population subjects (C vs. P comparison). For representation, we scaled all PRS and pathway-PRS to be mean=0 and SD=1. For the comparison, we used logistic regression models with risk scores as predictors. Annotation: ***, p-value of association < 5x10^-6; **, p-value of association < 5x10^-4; *, p-value of association < 5x10^-2.

[Figure panel labels: APOE included; APOE excluded. Direction labels: AD > Escaping AD; Escaping AD > AD.]

[...] (Table S1), and (iii) variant's frequency in our cohort of middle-aged healthy population subjects. We also considered variants with missing variant-pathway mapping (unmapped pathway).
