Main

Alzheimer’s disease (AD) is the most common cause of dementia, accounting for up to 80% of all dementia cases1, of which late-onset Alzheimer’s disease (LOAD) is most common2. As of 2022, approximately 55 million individuals worldwide had dementia, representing one out of nine people aged 65 years or older3. Although promising advances have been made in amyloid-targeting therapeutic options for early-stage LOAD4,5, they still have limited benefit, and identification of additional risk pathways that can be used for early detection and intervention is highly needed. To meet these demands, a variety of biologically relevant circulating molecules have been broadly associated with LOAD risk. The proteome in particular has the potential to reveal circulating markers of disease-related molecular pathways from different tissues, and studies assessing the circulating proteomic signatures between older adults without dementia and individuals suffering from LOAD have been described6,7,8,9,10,11,12,13,14,15,16. Modest sample sizes, low-throughput proteomics and lack of longitudinal information have, however, been limiting factors in these studies. A recent large-scale longitudinal study identified promising blood-based markers for all-cause incident dementia, although it is unknown how specific the results are to LOAD17. Information on the global circulating proteomic profile preceding the onset of LOAD, and how well it reflects AD-related processes in brain and cerebrospinal fluid (CSF), is, thus, scarce.

AD has a considerable genetic component, and both common18 and rare risk variants have been identified19, of which the strongest effects are conferred by variants in the well-known APOE (apolipoprotein E) gene. Approximately 25% of the general population carries the APOE-ε4 variant, whereas it is present in over 50% of AD cases20,21. The APOE-ε4 allele increases the risk of LOAD by three-fold in heterozygous carriers and by up to 12-fold in homozygous carriers22. Although the link between the ε4 allele and LOAD has been extensively researched, light has yet to be shed on the precise mechanism by which the APOE gene affects LOAD onset and/or progression. Importantly, recent large-scale proteogenomic studies have consistently established the APOE locus as a protein-regulatory hotspot, regulating levels of hundreds of proteins in both circulation23,24,25,26 and CSF27,28. However, it remains unknown to what extent these proteins relate to LOAD and if they can provide new information on the mechanisms through which APOE-ε4 mediates its risk. Identifying LOAD-associated circulatory proteins and whether their association is APOE-ε4 dependent or independent is crucial for the understanding of AD more generally as well as for gaining insight into potential pathways suitable for targeting in personalized treatment.

The current study tests the hypotheses that specific proteomic signatures in the circulation precede LOAD diagnosis and can reflect dysregulated biological pathways in the brain and CSF. Furthermore, we expect that some of these protein signatures may be affected by the APOE-ε4 genotype and can, thus, provide molecular readout of pathways directly affected by APOE-ε4. To address these hypotheses, we used a high-throughput aptamer-based platform to characterize 4,137 serum proteins in 5,294 participants of the population-based Age, Gene/Environment Susceptibility–Reykjavik Study (AGES)29 to identify protein signatures of incident LOAD (events occurring during follow-up) and prevalent LOAD, taking an unbiased, longitudinal and cross-sectional approach to the discovery of potential biomarkers for LOAD (Fig. 1). Considering the protein-regulatory influence of APOE and how it may impact the way that serum proteins are associated with LOAD, we disentangled the LOAD protein signature into APOE-ε4-dependent and APOE-ε4-independent components, by identifying proteins whose LOAD association is largely attenuated upon conditioning on APOE-ε4 carrier status. We compared the serum protein signature of LOAD to those observed in CSF and brain and, finally, used genetic variation as anchors to determine the potential causal direction between serum proteins and disease state.

Fig. 1: Study overview.

a, Overview of the AGES cohort and study participants. Prevalent non-AD dementia cases were excluded from the analysis. b, Overview of the aptamers tested and their associations with LOAD. Serum measurements of 4,782 aptamers were tested for associations with prevalent and incident LOAD status, using logistic and Cox proportional hazards regression models, respectively. From the proteins associated with incident LOAD, sets of 140 proteins with an APOE-ε4-independent association and 17 proteins with an APOE-ε4-dependent association were defined. The APOE-ε4-dependent proteins were further expanded to first-degree PPI partners. All sets of proteins were subjected to functional enrichment analysis and bidirectional MR analysis. c, Overview of the replication cohorts used in the study, which include proteins measured in the circulation (ACE) as well as in brain and CSF (Emory). This figure was created with BioRender.

Results

The AGES study cohort

This prospective population-based study was based on 5,127 participants free of dementia at baseline, after the exclusion of 163 individuals with prevalent non-AD dementia and 167 individuals with prevalent LOAD. During a potential follow-up of 12.8 years (median using reverse Kaplan–Meier, 95% confidence interval (CI): 12.6–13.2), 655 individuals were diagnosed with incident LOAD, with the last individual being diagnosed 16 years from baseline. Of those, 115 were diagnosed at the AGES 5-year follow-up study visit, whereas the remaining cases were based on clinical diagnosis of LOAD from linked records (Methods). Participants with incident LOAD were older at entry, were more likely to carry an APOE-ε4 allele, had lower body mass index (BMI) and had lower education levels compared to healthy individuals (Supplementary Table 1). See Fig. 1 for the study overview.
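The median follow-up above is estimated with the reverse Kaplan–Meier method, in which the roles of event and censoring are swapped so that the curve describes how long participants remained under observation. The following is a minimal sketch of that idea, not the authors' code; the function name `reverse_km_median` and the toy data are hypothetical, and ties are handled in the simplest possible way.

```python
def reverse_km_median(times, events):
    """Median follow-up by the reverse Kaplan-Meier method (illustrative sketch).

    times  : follow-up time for each participant
    events : 1 if the participant had the event (e.g. diagnosis), 0 if censored
    Returns the earliest time at which the reverse-KM curve is <= 0.5,
    or None if the curve never reaches 0.5.
    """
    # Swap the indicator: censoring becomes the "event" of interest.
    data = sorted(zip(times, (1 - e for e in events)))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_tied = 0       # everyone leaving the risk set at time t
        n_censored = 0   # those who leave because follow-up ended
        while i < len(data) and data[i][0] == t:
            n_tied += 1
            n_censored += data[i][1]
            i += 1
        if n_censored:
            surv *= 1.0 - n_censored / at_risk
        at_risk -= n_tied
        if surv <= 0.5:
            return t
    return None


# Toy data (hypothetical, not from AGES): events at t=2 and t=6, the rest censored.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 0, 1, 0, 0]
toy_median = reverse_km_median(toy_times, toy_events)
```

In the toy data the plain median of all follow-up times is 6, whereas the reverse Kaplan–Meier estimate is later, because participants who were diagnosed early do not count as having short potential follow-up.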

Serum protein profile of incident LOAD in AGES

To investigate the LOAD-associated circulatory proteomic patterns that occur before disease onset, we used Cox proportional hazards (Cox PH) models and found 320 aptamers (303 proteins) to be significantly (false discovery rate (FDR) < 0.05) associated with incident LOAD diagnosis after adjusting for age and sex (model 1), with hazard ratios (HRs) ranging from 0.78 for TBCA to 1.47 for NTN1 per standard deviation increase of protein levels (Fig. 2a and Supplementary Table 2). To account for variability related to APOE-ε4 carrier status, we adjusted for the genotype in an additional model (model 2; Supplementary Table 2), which resulted in 140 significant aptamers (130 unique proteins, HR: 0.79 (CD4)–1.25 (CGA/FSHB), FDR < 0.05) (Fig. 2b), all of which overlapped with model 1 (Fig. 2c). When comparing the two models, 43% of the serum proteins remained significant after APOE-ε4 adjustment, indicating that their LOAD association is independent of the APOE-ε4 genotype (Table 1 and Supplementary Table 2). Adjusting for additional AD risk factors and estimated glomerular filtration rate (eGFR) (Methods) retained 38 significant LOAD-associated aptamers (35 proteins, HR: 0.80 (CD4)–1.26 (SMOC1), FDR < 0.05) (model 3; Supplementary Table 2), which may reflect specific processes affecting risk of LOAD that are not captured by currently established risk factors.
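The FDR thresholds used throughout are Benjamini–Hochberg adjusted values (as stated in the figure legends). As a minimal sketch of how such adjusted values are obtained from a list of per-aptamer P values, assuming nothing about the authors' actual software, the hypothetical helper below implements the standard step-up adjustment; the input P values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative P values only (not taken from the study).
example_p = [0.01, 0.04, 0.03, 0.005]
example_fdr = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in example_fdr]
```

An aptamer would then be called significant when its adjusted value falls below 0.05, mirroring the FDR < 0.05 criterion applied to models 1 to 3.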

Fig. 2: Proteins associated with incident LOAD in AGES (n = 5,127).

a,b, Volcano plots showing the protein association profile for incident LOAD with the HR for incident LOAD from the Cox PH models (x axis) and −log₁₀ of Benjamini–Hochberg FDRs (y axis) across two models: without APOE-ε4 adjustment (model 1) (a) and with APOE-ε4 adjustment (model 2) (b). c, Venn diagram for the overlap between models 1 and 2 for incident LOAD. d,e, Enrichment of top GO terms from GSEA analysis for incident LOAD (model 1) shown as a dot plot stratified by ontology (d) and gene-concept network (e). f,g, Comparison of effect sizes (HR) from Cox PH models for incident LOAD between the AGES and the ACE (n = 719) cohorts for all proteins reaching nominal significance (P < 0.05) in the Cox PH in ACE for model 1 (f) and model 2 (g). Protein associations with Benjamini–Hochberg FDR < 0.05 are denoted in red. BP, biological process; CC, cellular component; MF, molecular function.

Table 1 Summary table of the top 20 significant APOE-ε4-independent proteins associated with incident LOAD in AGES (n = 5,127)

As HR variability can arise with lengthy follow-up time, secondary analyses were implemented with a 10-year follow-up cutoff, which revealed mostly overlapping results (Supplementary Note 1, Supplementary Tables 3 and 4 and Supplementary Fig. 1). We did, however, detect protein associations specific to the shorter follow-up time, which potentially reflect processes that take place closer to the LOAD diagnosis. As there may be further differences in proteomic profiles depending on whether protein sampling occurred before or after LOAD diagnosis, we additionally considered the protein profile of 167 AGES participants with prevalent LOAD at baseline (Supplementary Note 2, Supplementary Fig. 2a–c and Supplementary Tables 5–7). Interestingly, many of the proteins associated with increased risk of incident LOAD showed the opposite direction of effect for prevalent LOAD although generally not statistically significant (Supplementary Fig. 2d). These contrasting results suggest an important temporal element in the LOAD-associated proteome. In total, 346 aptamers (329 unique proteins) were associated with LOAD when all outcomes (incident and prevalent LOAD), follow-up times and models were considered (Supplementary Tables 2, 3 and 5).

To evaluate which biological processes are reflected by the overall incident LOAD-associated protein signature in AGES, we performed a gene set enrichment analysis (GSEA). The strongest enrichment for protein associations in model 1 was observed for Gene Ontology (GO) terms related to axon development and neuron morphogenesis (Fig. 2d,e and Supplementary Table 8). The proteins driving the enrichment included neural cell adhesion molecules 1 and 2 (NCAM1 and NCAM2), netrin 1 (NTN1), contactin 1 (CNTN1), neuropilin 1 (NRP1), fibronectin leucine-rich transmembrane protein 2 (FLRT2), matrix metallopeptidase 2 (MMP2) and cell adhesion molecule L1-like (CHL1). GSEA of the protein profiles of model 2, where APOE-ε4 carrier status was adjusted for, showed similar enrichment results (Supplementary Table 8), demonstrating that these terms were mainly driven by the APOE-ε4-independent component of the LOAD-associated protein profile (Supplementary Note 3). No tissue-elevated gene expression was significantly enriched among the LOAD-associated proteins, except for adipose tissue (Supplementary Table 8). Nevertheless, seven (35%) of the top 20 APOE-ε4-independent LOAD-associated proteins had elevated expression in brain or choroid plexus compared to other tissues (Table 1).

Proteins with APOE-ε4-dependent association with incident LOAD

As previously mentioned, 43% of the protein associations with incident LOAD were independent of APOE-ε4. Of the remaining 57% that were affected by APOE-ε4 adjustment, we identified 17 proteins whose associations with incident LOAD were particularly strongly affected by APOE-ε4 carrier status (Table 2, Fig. 3a, Supplementary Fig. 3 and Supplementary Table 2). These proteins, hereafter referred to as APOE-ε4-dependent proteins, were defined as proteins significantly (FDR < 0.05) associated with incident LOAD in model 1 but whose nominal significance was attenuated (P > 0.05) or whose direction of effect changed upon APOE-ε4 adjustment in model 2. These APOE-ε4-dependent proteins included those with the strongest associations with LOAD before adjusting for the APOE-ε4 allele (Fig. 2a). The levels of the APOE protein (targeted by four aptamers) were not significantly associated with incident or prevalent LOAD (FDR > 0.19 for all models, both outcomes). However, lower levels were observed in prevalent LOAD at a nominal significance (P < 0.05) that was unaffected by adjustment for APOE-ε4 carrier status (Supplementary Table 9). Figure 3b shows the intra-correlations among the 17 APOE-ε4-dependent proteins. All the 17 APOE-ε4-dependent proteins were strongly regulated by the APOE-ε4 allele (Fig. 3c, Table 2, Supplementary Fig. 4 and Supplementary Table 10), with the ε4 allele increasing the levels of five of the proteins and decreasing the levels of the other 12. Accordingly, we observed that increased levels of the five APOE-ε4 upregulated proteins and decreased levels of the 12 APOE-ε4 downregulated proteins were also associated with higher risk of LOAD, yielding an HR above and below 1, respectively (Fig. 3d). As per definition, most of the APOE-ε4-dependent proteins lost significance upon APOE-ε4 adjustment, yet, interestingly, the direction of effect inverted for five proteins after APOE-ε4 adjustment (ARL2, IRF6, NEFL, S100A13 and TBCA) (Fig. 3e). 
A previous study using the Simoa assay (Quanterix) reported increased NEFL levels in APOE-ε4 compared to APOE-ε3 carriers30, whereas we observed the opposite. We, thus, compared NEFL measurements from SOMAscan and the Simoa assay in a subset of AGES and observed differences in the associations with both APOE-ε4 and LOAD, indicating that they potentially measure different NEFL species31 (Supplementary Note 4 and Supplementary Fig. 5).
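The definition of APOE-ε4 dependence given above can be read as a simple decision rule over the model 1 and model 2 results for each protein. The sketch below encodes that rule under stated assumptions: the function name, argument names and the "intermediate" label (for proteins affected by the adjustment but not meeting the dependence criteria) are hypothetical, and the example values are invented rather than taken from the study.

```python
def classify_apoe_dependence(fdr_m1, hr_m1, fdr_m2, p_m2, hr_m2, alpha=0.05):
    """Classify one protein from its model 1 and model 2 Cox PH results.

    fdr_m1, hr_m1       : FDR and hazard ratio without APOE-e4 adjustment
    fdr_m2, p_m2, hr_m2 : FDR, nominal P and hazard ratio with adjustment
    """
    if fdr_m1 >= alpha:
        return "not associated"
    # A hazard ratio crossing 1 between models means the direction changed.
    direction_changed = (hr_m1 - 1.0) * (hr_m2 - 1.0) < 0
    if p_m2 > alpha or direction_changed:
        return "APOE-e4-dependent"
    if fdr_m2 < alpha:
        return "APOE-e4-independent"
    # Hypothetical label: weakened by adjustment but still nominally significant.
    return "intermediate"


# Invented example values, for illustration only.
examples = {
    "attenuated": classify_apoe_dependence(0.01, 0.80, 0.60, 0.30, 0.98),
    "reversed": classify_apoe_dependence(0.01, 0.80, 0.20, 0.04, 1.10),
    "robust": classify_apoe_dependence(0.01, 1.20, 0.02, 0.001, 1.18),
}
```

Both the attenuated and the reversed examples fall in the dependent class, which matches the text: most of the 17 proteins lost significance, while a subset instead changed direction of effect.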

Table 2 Summary table of the 17 APOE-ε4-dependent proteins associated with incident LOAD in AGES (n = 5,127)
Fig. 3: Proteins with APOE-ε4-dependent association with incident LOAD in AGES (n = 5,127).

a, Spaghetti plot showing the statistical significance as Benjamini–Hochberg FDR of protein associations with incident LOAD across the three Cox PH models, highlighting a set of 17 unique proteins (green) whose association with incident LOAD is attenuated upon APOE-ε4 adjustment. The horizontal lines indicate Benjamini–Hochberg FDR < 0.05 (dashed) and P < 0.05 (dot-dashed). The total number of significantly associated proteins (FDR < 0.05) for each model is shown above. b, Pairwise Pearson’s correlation among the 17 APOE-ε4-dependent proteins. c, Forest plot showing the effect (beta coefficient) of the APOE genotype on the 17 APOE-ε4-dependent proteins in AGES. The beta coefficients indicate the change in protein levels per ε4 allele count and are shown with 95% CIs. d,e, Forest plots showing the HR for incident LOAD per standard deviation increase in level for each of the 17 APOE-ε4-dependent proteins in AGES without APOE-ε4 adjustment (model 1) (d) and with APOE-ε4 adjustment (model 2) (e). The LOAD HRs are shown with 95% CIs. Proteins that change direction of effect between the two models are highlighted in red. f–h, Replication analyses for c–e were performed in the ACE cohort (n = 719) in the same manner as in the AGES cohort. FAM159B (in gray) was not measured in the ACE SOMAscan assay.

The HR conferred by APOE-ε4 for incident LOAD in the AGES cohort was 2.1 (Cox PH P = 1.23 × 10⁻²⁷) per copy of the ε4 allele. To evaluate if any of the 17 APOE-ε4-dependent proteins might mediate the effect of APOE-ε4 on incident LOAD, we performed a regression-based mediation analysis. The overall proportion of the effect mediated was non-significant (estimate = −0.05, P = 1; Supplementary Fig. 6a), thus suggesting that these proteins do not mediate the LOAD risk conferred by APOE-ε4. However, although not direct mediators, the 17 proteins could be blood-based readouts of a true mediator within tissue-specific pathological processes occurring before LOAD diagnosis. We additionally considered the change in HR for APOE-ε4 on risk of incident LOAD when adjusting for individual LOAD-associated proteins and found that adjustment for most proteins resulted in a minor effect decrease (Supplementary Fig. 6b). Intriguingly, however, the adjustment for four APOE-ε4-dependent proteins (NEFL, ARL2, TBCA and S100A13) caused an increase of approximately 10% in APOE-ε4 effect size (Supplementary Fig. 6b,c). Thus, the effect of APOE-ε4 on LOAD is partly masked by secondary opposing associations between these proteins and LOAD, which are further explored below. Although the 17 APOE-ε4-dependent proteins were not significantly enriched for tissue-elevated gene expression (Supplementary Table 8), we observed that four (LRRN1, FAM159B, NEFL and HBQ1) had elevated gene expression in brain compared to other tissues, and one (TMCC3) clustered with oligodendrocyte-related genes (Table 2). Of the remaining APOE-ε4-dependent proteins, eight were ubiquitously expressed, including in brain tissue, and four were elevated in other tissues. We did not detect any significantly enriched molecular signatures or GO terms for the 17 APOE-ε4-dependent proteins (Supplementary Table 8).
However, a network analysis of measured and inferred physical protein–protein interactions (PPIs)32 revealed that the APOE-ε4-dependent proteins interact directly with proteins involved in microtubule and centromeric functions, neuronal response and development, neuroinflammation and AD (Extended Data Fig. 1, Supplementary Tables 11–13 and Supplementary Note 5).

Given the well-established relationship between APOE and cholesterol33, we explored the potential effect of serum lipid levels on the association between LOAD and the 17 APOE-ε4-dependent proteins (Supplementary Table 14, Supplementary Figs. 7 and 8 and Supplementary Note 6). Our findings suggest that, although many of the APOE-ε4-dependent proteins are associated with cholesterol levels, it is not the driver of their link to LOAD.

Finally, given the observed APOE-ε4-dependent and APOE-ε4-independent proteomic associations with LOAD in the full cohort, we additionally investigated if any proteins were differentially associated with LOAD within APOE-ε4 carriers versus non-carriers by stratification via an interaction analysis. We found differential associations between the strata for several proteins, potentially suggesting that different pathways vary in their contribution to the development of LOAD depending on APOE-ε4 carrier status (Supplementary Note 7 and Supplementary Table 15).

External validation of protein associations with LOAD

We evaluated the protein associations with incident LOAD from our APOE-ε4-dependent/independent analyses in an external cohort, the Alzheimer Center Barcelona (ACE) (n = 1,341), with SOMAscan platform (v4.1-7K) measurements from plasma of individuals who were referred to the center. The longitudinal component of ACE consists of individuals who had been diagnosed with mild cognitive impairment (MCI) at the center and had been followed up. A total of 719 participants had follow-up information and 266 converted to LOAD over a median follow-up of 3.14 years (reverse Kaplan–Meier, 95% CI: 3.04–3.28) (Supplementary Table 16). Despite the fundamentally different cohorts, with AGES being population based and using the V3-5K SOMAscan platform and ACE based on individuals with established symptoms and the v4.1-7K SOMAscan platform, we replicated 36 protein associations with LOAD at nominal significance (P < 0.05) in the smaller ACE cohort (Table 3, Supplementary Table 10 and Fig. 2f,g). Of those, 30 proteins were nominally significant in model 1, with 97% being directionally consistent with the observations in AGES (Fig. 2f). In model 2, 21 proteins were nominally significant, 86% of which were directionally consistent (Fig. 2g). After multiple testing correction, seven proteins remained statistically significant (FDR < 0.05), all of which were directionally consistent (Table 3, Supplementary Table 10 and Fig. 2f,g). Six were statistically significant (FDR < 0.05) in model 1 (NEFL, LRRN1, TBCA, CTF1, C1orf56 and TIMP4) and one in model 2 (S100A13) (Supplementary Table 10). Of all 332 tested aptamers, 213 (64%) were directionally consistent regardless of significance in model 1 (two-sided exact binomial test P = 2.0 × 10⁻⁵), and 202 (61%) were directionally consistent in model 2 (two-sided exact binomial test P = 0.002), demonstrating an enrichment for consistency in direction of effect.
The protein associations replicated in the ACE cohort are of particular interest as they represent potentially clinically relevant candidates for LOAD that are consistent in two different contexts: in both a general population and a clinically derived symptomatic sample set. However, our results suggest that many of the proteins that associate with long-term LOAD risk are not strongly associated with the conversion from MCI to AD, which is further into the AD trajectory and may also explain the limited overlap between the proteins associated with prevalent and incident LOAD in AGES.

Table 3 Replication of the LOAD-associated proteins from AGES (n = 5,127) in the ACE cohort (n = 719)

Validation of reversed association conditional on APOE-ε4

Specifically considering the APOE-ε4-dependent proteins, the association between the APOE-ε4 allele and the proteins was replicated for 13 of 17 proteins in the ACE cohort (Fig. 3f and Supplementary Fig. 4b). Furthermore, the change in direction of effect for incident LOAD upon APOE-ε4 adjustment was replicated in the ACE cohort for four of five proteins (ARL2, NEFL, S100A13 and TBCA) (Fig. 3g,h and Supplementary Table 10), with even larger effects observed in the ACE cohort compared to AGES in the APOE-ε4 adjusted model and three proteins (ARL2, S100A13 and TBCA) becoming statistically significant (P < 0.05). Thus, the attenuation of the primary LOAD associations for these proteins upon APOE-ε4 adjustment meets the criteria of APOE-ε4 dependence (Methods). No significant interaction between these proteins and APOE-ε4 carrier status on AD risk was observed in either the AGES or ACE cohorts. Taken together, our results show that these proteins are strongly downregulated by APOE-ε4 and, consequently, show an inverse relationship with incident LOAD; but when adjusting for the APOE-ε4 allele, their association with LOAD is still significant but reversed, suggesting a secondary non-APOE-ε4-mediated process affecting these same proteins in relation to LOAD in the opposite direction that is more strongly observed in a cohort of individuals with MCI than in the population-based AGES cohort.

Potential causal associations between proteins and LOAD

The proteins associated with LOAD could include proteins causally related to the disease or proteins whose serum level changes reflect a response to prodromal or genetic liability to LOAD. To test this hypothesis, we performed a bidirectional two-sample Mendelian randomization (MR) analysis, including the targets of all 346 aptamers associated with LOAD in our study. Genetic variant associations for serum protein levels were obtained from a catalog of cis-protein quantitative trait loci (pQTLs) from AGES23, whereas variant associations with LOAD were extracted from a recent GWAS of 39,106 clinically diagnosed LOAD cases, 46,828 proxy-LOAD and dementia cases and 401,577 controls of European ancestry18. In total, 117 (34%) of the LOAD-associated serum aptamers had cis-pQTLs that were suitable as genetic instruments and were included in the protein-LOAD MR analysis (Supplementary Table 17).

In the forward MR analysis, two proteins, integrin binding sialoprotein (IBSP) and amyloid precursor protein (APP), had support for causality (Supplementary Table 18). IBSP had a risk-increasing effect for LOAD in both the causal analysis (odds ratio (OR) = 1.26, FDR = 0.03) and observational analysis (incident LOAD full follow-up, HR = 1.13, FDR = 0.04). APP had a protective effect for LOAD in both the causal analysis (OR = 0.76, FDR = 0.03) and the observational analysis (incident LOAD full follow-up, HR = 0.88, FDR = 0.047). Notably, although not statistically significant, we observed suggestive support for a protective effect of genetically determined serum levels of acetylcholinesterase (ACHE; OR = 0.92, P = 0.061), a target of a clinically used therapeutic agent for dementia34 (Supplementary Table 18 and Supplementary Fig. 9). In a forward MR analysis of the APOE-ε4-dependent protein interaction partners, two proteins, APP and MAPK3, had support for causality (Supplementary Table 13 and Supplementary Note 5).
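To make the logic of a two-sample MR estimate concrete, the sketch below computes a Wald ratio for a single cis-pQTL instrument: the variant's effect on LOAD (log odds) divided by its effect on the protein level. This is an assumption for illustration only; the section above does not state which MR estimator was used, the function name `wald_ratio` is hypothetical, the standard error uses a first-order approximation that ignores uncertainty in the exposure estimate, and the input numbers are invented.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Wald ratio estimate, reported on the odds ratio scale.

    beta_exposure : variant effect on the protein level (per allele)
    beta_outcome  : variant effect on disease (log odds per allele)
    se_outcome    : standard error of beta_outcome
    """
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    se = se_outcome / abs(beta_exposure)
    lower = estimate - 1.96 * se
    upper = estimate + 1.96 * se
    return {
        "log_or": estimate,
        "or": math.exp(estimate),
        "ci": (math.exp(lower), math.exp(upper)),
    }


# Invented summary statistics, not taken from the study.
result = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
```

An odds ratio above 1 would indicate that genetically higher protein levels raise disease risk, and one below 1 a protective effect, which is how the ORs reported for IBSP and APP are read.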

As most of the observational protein associations in the current study were detected for incident LOAD and, thus, reflect changes that take place before the onset of clinically diagnosed disease, it is unlikely that their levels and effects are direct downstream consequences of the disease after it reaches a clinical stage. However, they may reflect a response to a prodromal stage of the disease. We, therefore, performed a reverse MR to test if the observed changes in serum protein levels are likely to occur downstream of the genetic liability to LOAD, which may capture processes both at the prodromal and clinical stage. The APOE locus is likely to have a dominant pleiotropic effect in the reverse MR analysis (Supplementary Table 19, Supplementary Fig. 10 and Supplementary Note 8), as it has a disproportionately strong effect on LOAD risk compared to all other common genetic variants while also being a well-established pQTL trans-hotspot, affecting circulating levels of up to hundreds of proteins23,24,25,26. We, therefore, performed the primary reverse MR analysis using only LOAD-associated genetic variants outside of the APOE locus as instruments. We found two proteins (S100A13 and ARL2) that were significantly (FDR < 0.05) increased by LOAD or its genetic liability (Supplementary Table 19 and Supplementary Figs. 10 and 11). Interestingly, both were among the 17 previously identified APOE-ε4-dependent LOAD proteins, together with two additional proteins that were nominally significant in the reverse MR (TBCA, P = 4.4 × 10⁻⁴, FDR = 0.051 and IRF6, P = 7.9 × 10⁻⁴, FDR = 0.055). Thus, intriguingly, these findings suggest that these four proteins are upregulated by LOAD, in contrast to the observed APOE-ε4 downregulation of the same proteins (Fig. 4). This supports our findings of competing biological effects described above (Fig. 3e and Supplementary Fig. 6), and, collectively, our results indicate that simultaneous opposing effects of APOE-ε4 on the one hand and LOAD on the other result in differential regulation of these proteins in serum (Fig. 4b).

Fig. 4: Reverse MR analysis suggests a causal effect of LOAD on four proteins.

a, Comparison of HRs per SD increase of protein levels for incident LOAD with and without APOE-ε4 adjustment in the observational analysis (Cox PH) (n = 5,127) (upper), the effects of APOE-ε4 on protein levels in AGES (n = 5,332) and the effect of LOAD on protein levels from the reverse MR analysis (excluding the APOE locus) (lower), shown for the four APOE-ε4-dependent proteins that change direction of effect in both observational and causal analyses when APOE is accounted for. All effects are shown with 95% CIs. b,c, Visual summaries of the observed data. b, Mediation diagrams showing three possible hypotheses that could explain the relationship among APOE-ε4, LOAD and the four proteins shown in a. Our analyses do not support the hypothesis that LOAD mediates the effect of APOE-ε4 on proteins (hypothesis 1) or the other way around (hypothesis 2). However, our results from both the observational and causal analyses support the hypothesis that two mechanisms are at play that affect the same proteins in the opposite direction (hypothesis 3). c, The APOE-ε4 genotype leads to increased risk of LOAD by its effects in brain tissue. The same genotype results in a downregulation of serum levels of four proteins that are consequently themselves negatively associated with incident LOAD. Additionally, other non-APOE LOAD risk variants lead to upregulation of the same proteins in the reverse MR analysis, possibly reflecting a response to LOAD or its genetic liability. This figure was created with BioRender.

We performed a replication analysis of the effect of APOE-ε4 on protein levels and the reverse MR results for these four proteins using published protein GWAS summary statistics from two recent studies24,35. In the external datasets, the downregulation of all four proteins by APOE-ε4 (as determined by the rs429358 C allele) was replicated (Supplementary Fig. 12). In the reverse MR analysis (excluding the APOE locus), the upregulation of protein levels by LOAD liability observed in AGES was also detected for two proteins (S100A13 and TBCA) in both validation cohorts, reaching significance (P < 0.05) in the study by Ferkingstad et al.35 (Supplementary Fig. 12 and Supplementary Table 20). Although the two proteins changed direction in a similar manner as in AGES, the effect size was considerably smaller in the validation cohorts. Notably, however, individuals in these two cohorts are much younger than those in AGES, with mean ages of 55 years and 48 years for the Ferkingstad et al.35 and Sun et al.24 studies, respectively, compared to 76 years in AGES. Therefore, we conducted an age-stratified reverse MR analysis in AGES that showed a strong age-dependent effect, with a much larger effect of LOAD genetic liability on protein levels in individuals over 80 years of age compared to those younger than 80 years (Supplementary Fig. 12). The effect size in AGES individuals younger than 80 years was in line with the effect observed in the validation cohorts. Thus, if the upregulation of these proteins reflects a response to prodromal or preclinical LOAD, an older cohort may be needed to detect an association of the same degree as we found in AGES. 
However, the observed support in the validation cohorts for the discordant effects of APOE versus non-APOE LOAD-associated genetic variants on the same serum proteins strongly implicates these proteins as directly relevant to LOAD, potentially as readouts of biological processes that are both disrupted by APOE-ε4 and modulated in the opposite manner as a response to genetic predisposition to LOAD or the disease onset in general.

Together, these results indicate that LOAD or its general genetic liability causally affects the levels of some APOE-ε4-dependent proteins, but this effect is simultaneously masked by the strong effects of the APOE locus in the other direction (Fig. 4a). These outcomes strengthen the results described above, showing that the levels of these four proteins are strongly downregulated in APOE-ε4 carriers, and lower levels of these proteins are, therefore, associated with increased risk of LOAD in an APOE-ε4-dependent manner (Fig. 4b). Simultaneously, the reverse MR analysis shows that the collective effect of the other non-APOE LOAD risk variants is to upregulate the serum levels of these same proteins, possibly reflecting a response mechanism to LOAD pathogenesis (Fig. 4c). Again, this is in line with the observational analysis, where all four proteins changed direction of effect when adjusting for APOE-ε4 (Figs. 3d,e,g,h and 4a).

Overlap with the AD brain and CSF proteome

To evaluate to what extent our LOAD-associated serum proteins reflect the proteomic profile of AD in relevant tissues, we queried data from recent proteomic studies of AD in CSF36 and brain37, which also describe tissue-specific co-regulatory modules. We observed that, of our LOAD-associated serum proteins, 51 were also associated with AD in brain as measured by mass spectrometry (MS), with 32 (63%) being directionally consistent (Fig. 5a,b and Supplementary Tables 21 and 22). Higher directional consistency was observed within the APOE-ε4-independent protein group, or 15 (71%) of 21 proteins associated with AD in brain tissue. Additionally, 60 proteins were directly associated with AD in CSF as measured with SOMAscan (7K) (Fig. 5a), with 46 (77%) being directionally consistent (Fig. 5b). The proportion of directionally consistent associations between serum and CSF was higher in both the APOE-ε4-independent and APOE-ε4-dependent protein groups, or 88% (22 of 25 and seven of eight for APOE-ε4-independent and APOE-ε4-dependent proteins, respectively) (Fig. 5b and Supplementary Table 21). However, directional inconsistency between plasma and CSF AD proteomic profiles was reported previously in a similar comparison38. Fourteen proteins overlapped among all three tissues in the context of AD (Fig. 5a and Supplementary Table 21). Many of these proteins have established links or are highly relevant to LOAD, such as spondin 1 (SPON1), involved in the processing of APP39; secreted modular calcium-binding protein 1 (SMOC1), previously proposed as a biomarker of LOAD in postmortem brains and CSF40; NTN1, an interactor of APP and regulator of amyloid beta (Aβ) production41; NEFL, previously proposed as a plasma biomarker for LOAD and axon injury42,43; and Von Willebrand factor (VWF), known for its role in blood clotting and associations with LOAD44 (Supplementary Table 21). 
Notably, some of the APOE-ε4-dependent proteins were associated with AD across all three tissues, such as TBCA and TP53I11.

Fig. 5: Overlap between AD protein signatures in serum, brain and CSF.

a, Venn diagram showing the overlap of AD-associated proteins in serum, brain and CSF. b, Comparison of the effect sizes for AD-associated proteins that overlap between serum and brain (top) and serum and CSF (bottom). The proteins are stratified based on the APOE-ε4 dependence in AGES for incident LOAD. The effect size in AGES is shown for incident LOAD model 1 (Cox PH), except for proteins that were uniquely identified using the shorter 10-year follow-up (Cox PH) or prevalent LOAD (logistic regression), in which case the respective effect size from the significant association is shown. c–e, Heatmaps showing the enrichment (two-sided Fisher’s test) of AD-associated proteins by tissue type (x axis) in the AGES serum protein modules (c), Emory CSF protein modules (d) and Emory brain protein modules (e) (y axis). Modules that are enriched for AD associations in more than one tissue are highlighted with red squares.

We previously described the co-regulatory structure of the serum proteome, which can broadly be defined as 27 modules of correlated proteins25 (Supplementary Table 23). In the current study, we found that, among the 346 aptamers (329 proteins) associated with LOAD (prevalent or incident, any model), five serum protein modules (M27, M3, M11, M2 and M24) were overrepresented (Fig. 5c and Supplementary Table 24). In particular, the 140 APOE-ε4-independent proteins were specifically overrepresented in module M27, enriched for proteins involved in neuron development and the extracellular matrix (ECM), and in module M3, associated with growth factor signaling pathways (Supplementary Table 24). By contrast, the 17 APOE-ε4-dependent proteins were specifically enriched in protein module M11 (Supplementary Table 24), which is strongly enriched for lipoprotein-related pathways and is under strong genetic control of the APOE locus25. Serum modules M27, M24 and M11 were all enriched for AD associations in CSF (Fig. 5c). We next sought to understand to what extent our LOAD-associated proteins identified in serum might reflect AD protein signatures in CSF and brain tissue. Among the LOAD-associated proteins measured in serum, we found the APOE-ε4-dependent and APOE-ε4-independent proteins to be enriched in different CSF modules, most of which were also linked to AD (Fig. 5d and Supplementary Table 24). In brain tissue, the serum APOE-ε4-independent LOAD proteins were particularly enriched in brain module M42 (Matrisome), which is enriched for ECM proteins37 (Fig. 5e and Supplementary Table 24). Strikingly, M42 was strongly enriched for the AD proteomic profiles of all three tissues (Fig. 5e and Supplementary Table 24). Interestingly, members of this module (SMOC1, SPON1, NTN1, GPNMB and APP), with some of the strongest associations with AD in brain (Fig. 5b and Supplementary Table 24), overlapped with some of the strongest associations in serum to incident LOAD in our study (Fig. 2a,b and Supplementary Table 2).

This module has furthermore been demonstrated to be correlated with Aβ deposition in the brain, and some of its protein constituents (for example, MDK, NTN1 and SMOC1) have been shown to co-localize with and bind to Aβ37. Additionally, the APOE locus regulates M42 levels in the brain (mod-QTL), and, although the APOE protein is a member of module M42, this regulation was found to not be solely driven through the levels of the APOE protein itself37. Our results simultaneously show that other members of the module, such as SPON1 and SMOC1, exhibit an APOE-ε4-independent association to incident LOAD in serum. Interestingly, these same two proteins are increased in CSF 30 years before symptom onset in autosomal dominant early-onset AD45. In summary, we demonstrate significant overlaps in LOAD-associated protein expression across blood, CSF and brain on both an individual protein level and on a protein module level.

Discussion

We describe a comprehensive mapping of the serum protein profile of LOAD that provides insight into processes that are independent of or dependent on the genetic control of APOE-ε4 (Supplementary Fig. 13). We identified 329 proteins in total that differed in the incident or prevalent LOAD cases compared to non-LOAD participants in a population-based cohort with long-term follow-up. Among these, we identified a grouping of proteins based on their primary LOAD association being statistically independent of (140 proteins), or dependent on (17 proteins) APOE-ε4 carrier status. Many of the APOE-ε4-independent proteins are implicated in neuronal pathways and are shared with the LOAD-associated CSF and brain proteome. The 17 APOE-ε4-dependent proteins overlap with AD-associated protein modules in CSF and interact directly with protein partners involved in LOAD, including APP. Another key finding is that, among these 17 proteins, four proteins (ARL2, S100A13, TBCA and IRF6) change LOAD-associated direction of effect both observationally and genetically when taking APOE-ε4 carrier status into account. Notably, we replicated this directional change both observationally for three proteins (ARL2, S100A13 and TBCA) and genetically for two proteins (S100A13 and TBCA) in external cohorts. Collectively, our results suggest that, although their primary association with LOAD reflects the risk conferred by APOE-ε4, there exists a secondary causal effect of LOAD itself on the protein levels in the reverse direction as supported by the MR analysis, possibly reflecting a response to the disease onset.

Previous studies identifying proteins associated with LOAD were limited to cross-sectional cohorts or were based on all-cause dementia6,17,46,47. Here, we extend those findings by distinguishing LOAD cases from other types of dementia based on a clinical diagnosis criterion in a prospective cohort study to identify LOAD-specific serum protein signatures preceding clinical onset. A recent study of UK Biobank participants using the Olink platform identified several proteins associated with incident dementia, including AD48. Their top AD-associated proteins differed from those prioritized in the current study; thus, future work is required to determine how proteomic platform and cohort differences, such as age, influence protein associations with LOAD, both of which we found to directly affect the results in our extended analyses of NEFL (Supplementary Note 4). Furthermore, our comparative approach of statistical models with and without APOE-ε4 adjustment provides a compartmentalized view of the LOAD serum protein profile and demonstrates how protein effects can differ depending on genetic confounders, which are imperative to take into consideration. We found that the proteins associated with incident LOAD in our study, in particular those independent of APOE-ε4, such as GPNMB, NTN1, SMOC1 and SPON1, overlap with the proteomic profile of LOAD in CSF38 and brain37; are enriched for neuronal pathways; and have been functionally implicated in LOAD (Table 1), which may reflect an altered abundance of neuronal proteins in the circulation during the prodromal stage of LOAD. These overlaps that we found across independent cohorts and different proteomics technologies suggest that the serum levels of some proteins have a direct link to the biological systems involved in LOAD pathogenesis and may even provide a peripheral readout of neurodegenerative processes before clinical diagnosis of LOAD. 
In particular, the proteins that show directionally consistent effect sizes suggest exceptional AD-specific robustness as the measurements vary by tissue, methodology and populations.

We identified 17 proteins with a particularly strong APOE-ε4-dependent association with incident LOAD, of which eight were also associated with prevalent AD in CSF. The association between APOE-ε4 and circulating levels of these proteins was reported by our group23,25,26 and others49, but their direct association with incident LOAD has, to our knowledge, not been previously described. Interestingly, we previously observed multiple independent genetic signals in the APOE–APOC1–APOC1P1 region affecting these same proteins to a varying degree, some of which co-localize with GWAS signals for LOAD23, which necessitate further investigation for better understanding of the complex regulatory effects in this genetic region that converge on the same set of proteins. The proteins with an APOE-ε4-dependent association with LOAD may point directly to the processes through which APOE-ε4 mediates its risk and provide a readout of the pathogenic process in the circulation for the approximately 50% of patients with LOAD worldwide carrying the variant20,21. Although our data do not provide information on the tissue origin of the APOE-ε4-dependent proteins, some exhibit brain-elevated gene expression50 or have been associated with LOAD at the transcriptomic or protein level in brain tissue or CSF (Table 2). At the genetic level, a lookup in the GWAS catalog50 shows that an intron variant in the IRF6 gene has a suggestive GWAS association with LOAD via APOE-ε4 carrier status interaction51. In addition, variants in the TMCC3 gene have been linked to LOAD52, educational attainment53 and caudate volume change rate54, and variants in the TBCA gene have been suggestively associated with reaction time55 and PHF-tau levels56. 
Collectively, the gene expression patterns for these proteins in the brain, interactions with proteins involved in neuronal processes and suggestive associations between genetic markers in or near these genes and brain-related outcomes suggest that these APOE-ε4-dependent proteins may reflect brain-specific processes affected by APOE-ε4 carrier status that affect the risk of developing LOAD. Notably, the association patterns for ARL2, S100A13 and TBCA suggest the presence of a pathway that is downregulated by APOE-ε4 already in midlife, given the consistent effect of APOE-ε4 on the same proteins in younger cohorts, but upregulated at the onset of LOAD, as supported by the larger observed effects in the APOE-ε4 adjusted analysis in the ACE cohort of individuals who are closer to diagnosis on the AD trajectory than those in AGES. Additional studies are required to expand on these interpretations and dissect the complex mechanisms at play to determine if the modulation of the process represented by these proteins has therapeutic potential.

Conflicting results have been observed for the relationship between serum or plasma levels of the APOE protein and LOAD57. Although serum levels of the APOE protein, as measured by the SOMAscan platform, were not strongly associated with LOAD in AGES, our results support a relationship between lower APOE levels and prevalent LOAD. We furthermore observed conflicting results in the association between APOE-ε4 and NEFL compared to a previous study30. Our results comparing different methods for measuring NEFL in AGES (Supplementary Note 4) highlight the importance of considering proteomic platforms and their potential differences in protein species detection, as noted by Budelier et al.31 and others58.

Two proteins, IBSP and APP, were identified to potentially have a causal role in LOAD. IBSP was previously associated with plasma Aβ and incident dementia59, and APP is the precursor protein for Aβ60. Based on the MR analysis for the third of LOAD-associated proteins that could be tested, most do not appear to be causal in and of themselves, but their association with incident LOAD may still reflect changes that occur years before the onset of LOAD that could be of interest to target before irreversible damage accumulates.

A major strength of our study is the high-quality data from a prospective longitudinal population-based cohort study with detailed follow-up, broad coverage of circulating proteins and a comprehensive comparison to the AD proteome in CSF and brain. The limitations of our study include that our results are based on a Northern European cohort and cannot necessarily be transferred directly to other populations or ethnicities. Additionally, although we partly replicated our overall findings in an external cohort, a greater replication proportion could be anticipated in a more comparable cohort as discussed above. The ACE cohort consists of clinically referred individuals with MCI and proteomic measurements performed on a different version of the SOMAscan platform (version 4.1 versus version 3 in AGES). Additionally, different normalization procedures were applied by SomaLogic for the two SOMAscan versions, which may have an effect on the LOAD associations47. Regardless of these differences, we did replicate most of the APOE-ε4-dependent LOAD associations, including the APOE-ε4-dependent change in effect for ARL2, S100A13 and TBCA. We could not test all LOAD-associated proteins for causality, including most of the APOE-ε4-dependent proteins, due to lack of significant cis-pQTLs for two-thirds of the proteins; thus, we cannot exclude the possibility that some could be causal but missed by our analysis. Finally, although a clinical LOAD diagnosis criterion was used for classifying cases, it is possible that some individuals were misclassified, and some of our findings may, thus, reflect processes related to dementia in general. As a result, it is critical to validate these findings in individuals with established Aβ and tau deposits as well as in experimental settings.

The proteins highlighted in this study and the mechanisms they point to may be used as a source of biomarkers or therapeutic targets that can be modulated for the prevention or treatment of LOAD. This large prospective cohort study, using both a longitudinal and a cross-sectional design, represents a unified and comprehensive reference analysis with which past and future serum protein biomarkers and drug targets can be considered, compared and evaluated.

Methods

AGES study population

Participants aged 66–96 years were from the AGES cohort. AGES is a single-center prospective population-based study of deeply phenotyped individuals (n = 5,764; mean age, 76.6 ± 5.6 years; 58% women) and survivors of the 40-year-long prospective Reykjavik study, an epidemiologic study that aimed to understand aging in the context of gene/environment interaction by focusing on four biologic systems: vascular, neurocognitive (including sensory), musculoskeletal and body composition/metabolism29. The AGES study was approved by the National Bioethics Committee in Iceland (approval number VSN-00-063), by the National Institute on Aging Intramural Institutional Review Board and by the Data Protection Authority in Iceland. All participants provided informed consent for their participation in the study and did not receive compensation.

Of the AGES participants, 3,411 attended a 5-year follow-up visit, and all participants were followed up for incident dementia through medical and nursing home reports (Resident Assessment Instrument (RAI)) and death certificates. The follow-up time was up to 16.9 years, with the last individual being diagnosed 16 years from baseline. LOAD diagnosis at AGES baseline and follow-up visits was carried out using a three-step procedure as previously described29. In brief, cognitive assessment was carried out on all participants. Neuropsychological testing was performed on individuals with suspected dementia. Individuals remaining suspect for dementia underwent further neurologic and proxy examinations in the second diagnosis step. Third, a panel comprising a neurologist, a geriatrician, a neuroradiologist and a neuropsychologist assessed the positive-scoring participants according to international guidelines and gave a dementia diagnosis. Diagnoses for all-cause dementia and LOAD from nursing home reports were based on intake examinations upon entry or standardized procedures carried out in all Icelandic nursing homes61. Diagnosis of LOAD was established according to National Institute of Neurological and Communicative Diseases and Stroke–Alzheimerʼs Disease and Related Disorders Association (NINCDS-ADRDA) criteria or according to International Classification of Diseases, 10th revision (ICD-10) code F00 criteria. The participants diagnosed at baseline were defined as prevalent LOAD cases, whereas individuals diagnosed with LOAD during the follow-up period (either at the AGESII follow-up visit or through linked records) were defined as incident LOAD cases. All prevalent non-AD dementia cases (n = 163) were excluded from analyses.

Age, sex, education and lifestyle variables were assessed using questionnaires at baseline. Education was categorized as primary, secondary, college or university degree. Smoking was characterized as current, former or never smoker. APOE genotyping was assessed using microplate array diagonal gel electrophoresis (MADGE)62. BMI and hypertension were assessed at baseline. BMI was calculated as weight (kg) divided by height squared (m²), and hypertension was defined as antihypertensive treatment or blood pressure (BP) > 140/90 mmHg. Type 2 diabetes was defined from self-reported diabetes, diabetes medication use or fasting plasma glucose ≥7 mmol L⁻¹. Serum creatinine was measured using a Roche Hitachi 912 instrument, and eGFR was derived with the four-variable MDRD study equation63.
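The covariate definitions above can be illustrated with a minimal Python sketch. It is not the study's code: the `Baseline` container and its field names are hypothetical, and the reading of "BP > 140/90 mmHg" as systolic above 140 or diastolic above 90 is an assumption. The eGFR calculation is omitted because the MDRD coefficients depend on the creatinine calibration described in the cited reference.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Hypothetical container for one participant's baseline measurements."""
    weight_kg: float
    height_m: float
    systolic_mmhg: float
    diastolic_mmhg: float
    on_antihypertensives: bool
    self_reported_diabetes: bool
    on_diabetes_medication: bool
    fasting_glucose_mmol_l: float


def bmi(p: Baseline) -> float:
    # BMI = weight (kg) divided by height squared (m^2), as in the text.
    return p.weight_kg / p.height_m ** 2


def has_hypertension(p: Baseline) -> bool:
    # Assumption: "BP > 140/90" means systolic > 140 OR diastolic > 90.
    return (p.on_antihypertensives
            or p.systolic_mmhg > 140
            or p.diastolic_mmhg > 90)


def has_type2_diabetes(p: Baseline) -> bool:
    # Self-report, medication use or fasting glucose >= 7 mmol/L.
    return (p.self_reported_diabetes
            or p.on_diabetes_medication
            or p.fasting_glucose_mmol_l >= 7.0)
```

For example, a participant weighing 70 kg at 1.75 m has a BMI of about 22.9, and a reading of 135/92 mmHg without treatment would count as hypertension under the assumed reading of the threshold.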

Proteomic measurements

The proteomic measurements in AGES were described in detail elsewhere26,64 and were available for 5,457 participants. In brief, a custom version of the SOMAscan platform (Novartis V3-5K) was applied based on SOMAmer protein profiling technology65,66 including 4,782 aptamers that bind to 4,137 human proteins. Serum was prepared using a standardized protocol67 from blood samples collected after an overnight fast by the same personnel who were specifically trained in protocols for sample collection and handling. Special care was taken to minimize the time between blood draw and sample centrifugation. Bench time was minimized at all times, and samples were stored in 0.5-ml aliquots at −80 °C under constant surveillance. Serum samples that had not been previously thawed were used for the protein measurements. All samples were randomized and run as a single set at SomaLogic, Inc., blinded to any phenotypic outcomes. Hybridization controls were used to adjust for systematic variability in detection, and calibrator samples of three dilution sets (40%, 1% and 0.005%) were included so that the degree of fluorescence was a quantitative reflection of protein concentration. All aptamers that passed quality control had median intra-assay and inter-assay coefficient of variation (CV) < 5%. Finally, intra-plate median signal normalization was applied to individual samples by SomaLogic instead of normalization to an external reference of healthy individuals, as is done for later versions of the SOMAscan platform (https://somalogic.com/wp-content/uploads/2022/07/SL00000048_Rev-3_2022-01_-Data-Standardization-and-File-Specification-Technical-Note-v2.pdf).

Of the 37 APOE-ε4-independent and APOE-ε4-dependent proteins highlighted in Tables 1 and 2, respectively, orthogonal MS verified the specificity of eight aptamers (seven proteins) in previous studies25. Twelve additional aptamers were profiled (CD4 (3143_3_1), BRD4 (10043_31_3), SPON1 (5496_49_3), SMOC1 (13118_5_3), LRRN1 (11293_14_3), S100A13 (7223_60_3), CTF1 (13732_79_3), ARL2 (12587_65_3), C1orf56 (5744_12_3), MSN (5009_11_1), IRF6 (9999_1_3) and NEFL (10082_251_3)), and two additional aptamers (C1orf56 (5744_12_3) and MSN (5009_11_1)) were confirmed (Table 2) with SOMAmer pulldown mass spectrometry (SP-MS) using patient serum samples (>65 years) purchased from BioIVT. The new confirmations’ methodology is consistent with previous publications25, but the instrumentation was updated. Data-dependent analysis was performed on an Orbitrap Eclipse operated in positive ionization mode, with electrospray voltage of 1,500 V and ion transfer tube temperature of 275 °C applied. Full MS scans with quadrupole isolation were acquired in the Orbitrap mass analyzer using a scan range of 375–1,500 m/z, standard AGC target and automatic maximum injection time. Data-dependent scans were acquired in the Orbitrap with a 0.7-m/z quadrupole isolation window, 50,000 resolution, 50% normalized AGC target, 200-ms maximum injection time and 38% HCD collision energy over a 2-s cycle time. Dynamic exclusion of 45 s relative to ±10 p.p.m. reference mass tolerance was applied. The peptides were eluted with Aurora Ultimate 25 cm × 75 µm ID, 1.7 µm C18 nano columns over a 90-min gradient on the Vanquish Neo UHPLC system (Thermo Fisher Scientific). Raw data files were processed in Proteome Discoverer version 2.5 with SequestHT database search using a canonical human FASTA database (20,528 sequences, updated 8 April 2022).

The proteomic measurements for NEFL in a subset of AGES using the Simoa assay from Quanterix were described elsewhere68.

ACE cohort

ACE Alzheimer Center Barcelona was founded in 1995 and has collected and analyzed roughly 18,000 genetic samples, diagnosed over 8,000 patients and participated in nearly 150 clinical trials to date. For more details, visit https://www.fundacioace.com/en. The syndromic diagnosis of all individuals of the ACE cohort was established by a multidisciplinary group of neurologists, neuropsychologists and social workers. Healthy controls (HCs), including individuals with a diagnosis of subjective cognitive decline (SCD), were assigned a Clinical Dementia Rating (CDR) of 0, and individuals with MCI were assigned a CDR of 0.5. For MCI diagnoses, the classification of López et al. and Petersen’s criteria were used69,70,71,72. The 2011 National Institute on Aging and Alzheimer’s Association (NIA-AA) guidelines were used for AD diagnosis73. All ACE clinical protocols were previously published74,75,76. All ACE cohort participants provided written informed consent for their participation in the study and did not receive compensation74. Paired plasma and CSF samples77, following consensus recommendations, were stored at −80 °C. A subset of the ACE cohort was analyzed with the SOMAscan 7K proteomic platform78 (n = 1,370) (SomaLogic, Inc.). The proteomic data underwent standard quality control procedures at SomaLogic and were median normalized to reference using the adaptive normalization by maximum likelihood (ANML) method (https://somalogic.com/wp-content/uploads/2022/07/SL00000048_Rev-3_2022-01_-Data-Standardization-and-File-Specification-Technical-Note-v2.pdf). Additionally, APOE genotyping was assessed using TaqMan genotyping assays for rs429358 and rs7412 single-nucleotide polymorphisms (SNPs) (Thermo Fisher Scientific). Genotypes were furthermore extracted from the Axiom 815K Spanish Biobank Array (Thermo Fisher Scientific) performed by the Spanish National Center for Genotyping (CeGen).

Statistics and reproducibility

Protein measurement data were Box-Cox transformed and then centered and scaled. Extreme outliers (>4.3 s.d.) were excluded as previously described64. The assumption of normal distribution for the transformed protein measurements was visually inspected but not formally tested. Sample size was not predetermined by any statistical method but, rather, by available data. The associations of serum protein profiles with prevalent AD (n = 167) were examined cross-sectionally by logistic regression at baseline. The associations of serum protein profiles with incident LOAD (n = 655) were examined longitudinally using Cox proportional hazards models after excluding all participants with prevalent dementia. Participants who died or were diagnosed with incident non-AD dementia were censored at date of death or diagnosis. To account for HR variability that may arise with lengthy follow-up periods, a secondary analysis using a 10-year follow-up cutoff of incident LOAD was performed (n = 432 LOAD cases). All individuals who had not experienced an event by the end of the 10-year follow-up were considered as not having an event. These individuals were not excluded from the analysis and were, thus, treated as ‘healthy’ controls. To compare the fits of the two follow-up times and to test for time dependence of the coefficients, we used ANOVA and the ‘survSplit’ function from the ‘survival’ R package79. For both prevalent and incident LOAD, we examined three covariate-adjusted models. The primary model (model 1) included the covariates sex and age. Model 2 included as an additional covariate the APOE-ε4 allele count (ε2/ε4 genotypes excluded). The third model (model 3) included additional adjustment for cardiovascular and lifestyle risk factors (BMI, type 2 diabetes, education, hypertension and smoking history) that have been associated with risk of LOAD80 and kidney function (eGFR) that may influence circulating protein levels. 
When performing the APOE-ε4 stratification analysis via interaction for incident LOAD, we added an interaction term between each aptamer and APOE-ε4 carrier status (no, not carrying ε4; yes, either ε34 or ε44) in model 2. We then used the ‘glht’ function from the ‘multcomp’ package (version 1.4.20) to obtain a recalculated HR and P value per stratum. We extracted the effect sizes and P values for the interaction term directly from the summary of the Cox model. To assess the associations between APOE genotype (0, 1 or 2 copies of APOE-ε4) and the LOAD-associated proteins, a multiple linear regression was performed adjusting for sex and age, where the beta coefficient indicates the change in protein levels per ε4 allele count. Benjamini–Hochberg FDR was used to account for multiple hypothesis testing. ACE SomaLogic proteomics data were similarly Box-Cox transformed, and association analysis was performed in the same manner as in AGES. Mediation analysis was conducted using the ‘cmest’ function from the ‘CMAverse’ (version 0.1.0) R package with APOE-ε4 as exposure, incident LOAD as outcome and the 17 APOE-ε4-dependent proteins as potential mediators. The proportion mediated was calculated with direct counterfactual imputation estimation and 95% CIs based on 1,000 bootstrap repetitions.
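Two steps in the workflow above lend themselves to a short illustration: the 10-year follow-up cutoff and the Benjamini–Hochberg correction. The Python sketch below is illustrative only; the analyses themselves were run in R. The function names are hypothetical, and truncating the follow-up time to the cutoff for late events is an assumption about how such individuals were handled.

```python
def truncate_followup(time_years, event, cutoff=10.0):
    """Apply a follow-up cutoff to one participant.

    Events observed after `cutoff` are treated as non-events. Setting the
    follow-up time to the cutoff is an assumption of this sketch.
    """
    if time_years > cutoff:
        return cutoff, 0
    return time_years, event


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so that adjusted
    # values are monotone non-decreasing in the raw P value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An aptamer would then be called FDR significant when its adjusted P value is below 0.05.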

APOE-ε4 dependence criteria of the proteins were defined as serum proteins that met FDR significance of less than 0.05 in association with incident LOAD in model 1, thus unadjusted for the APOE-ε4 allele, but whose nominal significance was abolished upon APOE-ε4 correction in model 2 (P > 0.05). Serum proteins that remained nominally significantly associated with incident LOAD (P < 0.05) upon APOE-ε4 correction but changed direction of effect were also considered to meet the APOE-ε4 dependence criteria, as a reversal of the effect indicates that the primary association is driven by APOE-ε4.
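The dependence criteria can be written as a small decision rule. The Python sketch below is one reading of the paragraph above, not the authors' implementation; the function and argument names are hypothetical, and a model 2 P value of exactly 0.05 (not covered by the stated criteria) is treated here as not meeting them unless it exceeds 0.05.

```python
def meets_apoe_dependence_criteria(fdr_model1, beta_model1,
                                   p_model2, beta_model2, alpha=0.05):
    """Return True if a protein meets the APOE-e4 dependence criteria.

    fdr_model1, beta_model1: FDR and effect size for incident LOAD in
        model 1 (age and sex only).
    p_model2, beta_model2: nominal P value and effect size in model 2
        (additionally adjusted for APOE-e4 allele count).
    """
    # The protein must first be FDR significant without APOE adjustment.
    if fdr_model1 >= alpha:
        return False
    # Criterion 1: nominal significance is abolished after adjustment.
    if p_model2 > alpha:
        return True
    # Criterion 2: still nominally significant, but the effect reverses.
    direction_flipped = beta_model1 * beta_model2 < 0
    return p_model2 < alpha and direction_flipped
```

A protein such as the four discussed in the Results, which stays nominally significant after adjustment but with the opposite sign, would be captured by the second criterion.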

Functional enrichment analyses were performed using overrepresentation analysis (ORA) and GSEA using the R packages ‘clusterProfiler’ and ‘fgsea’81,82. The association significance cutoff for inclusion in ORA was FDR < 0.05. Background for both methods was specified as all proteins tested from the analysis leading up to enrichment testing. The investigated gene sets were the following: GO, Human Phenotype Ontology, KEGG, Wikipathways, Reactome, Pathway Interaction Database (PID), microRNA targets (MIRDB and Legacy), transcription factor targets (GTRD and Legacy), ImmuneSigDB and the vaccine response gene set83. Finally, we included tissue gene expression signatures via the same methods (ORA and GSEA) using data from GTEx84 and the Human Protein Atlas85, where gene expression patterns across tissues were categorized in the same manner as described by Uhlen et al.85, and tissue-elevated expression was considered as gene expression in any of the categories ‘tissue-specific’, ’tissue-enriched’ or ‘group-enriched’. minGSSize was set at 2 when investigating the LOAD-associated serum proteins directly. Before running the GSEA, the average effect size from the appropriate observational analysis was computed for each protein detected by multiple aptamers to eliminate duplicates from the protein list. Duplicate protein annotations were removed before executing the ORA. The effect sizes from the observational analyses were used for GSEA ranking. For the PPI network analysis, PPIs from InWeb32 (n = 14,448, after Entrez ID filtering) were used to obtain the first-degree interaction partners of the APOE-ε4-dependent proteins. For GSEA of the APOE-ε4-dependent protein interaction partners, minGSSize was set to 15, and maxGSSize was set to 500. 
Gene expression patterns based on a consensus dataset combining GTEx and Human Protein Atlas gene expression data were obtained from the Human Protein Atlas (version 23) for the top LOAD-associated proteins (Tables 1 and 2) as well as single-cell sequencing cluster membership85. Analyses were conducted using R versions 4.2.1 and 4.2.3.
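Two of the enrichment preprocessing steps described above can be sketched briefly: averaging effect sizes across aptamers that target the same protein before GSEA ranking, and an overrepresentation test against a defined background. The Python sketch below is an illustration under stated assumptions, not the ‘clusterProfiler’ or ‘fgsea’ implementation; the function names are hypothetical, and the ORA P value is computed as a one-sided hypergeometric tail probability.

```python
from collections import defaultdict
from math import comb


def collapse_aptamers(effects):
    """Average effect sizes over aptamers measuring the same protein.

    effects: iterable of (protein, effect_size) pairs, one per aptamer.
    """
    grouped = defaultdict(list)
    for protein, beta in effects:
        grouped[protein].append(beta)
    return {protein: sum(b) / len(b) for protein, b in grouped.items()}


def ora_pvalue(hits, background, gene_set):
    """One-sided hypergeometric P value, P(overlap >= observed).

    hits: significant proteins; background: all proteins tested;
    gene_set: members of the pathway or ontology term.
    """
    background = set(background)
    hits = set(hits) & background
    gene_set = set(gene_set) & background
    n_bg, n_set, n_hits = len(background), len(gene_set), len(hits)
    observed = len(hits & gene_set)
    total = comb(n_bg, n_hits)
    tail = sum(comb(n_set, i) * comb(n_bg - n_set, n_hits - i)
               for i in range(observed, min(n_set, n_hits) + 1))
    return tail / total
```

Restricting both the hits and the gene set to the background mirrors the choice described above of using all tested proteins as the reference universe.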

Protein comparisons across serum, CSF and brain

To compare protein modules and AD associations across tissues, protein modules and protein associations with AD were obtained from brain37 and CSF36. The brain data, from the Banner Sun Health Research Institute86 and ROSMAP87, included tandem mass tag (TMT)-MS-based quantitative proteomics for 106 controls, 200 asymptomatic AD cases and 182 AD cases. The CSF samples were collected under the auspices of the Emory Goizueta Alzheimer’s Disease Research Center (ADRC) and the Emory Healthy Brain Study (EHBS)36. The cohort consisted of 140 healthy controls and 160 patients with AD as defined by the National Institute on Aging research framework73. Protein measurements were performed using TMT-MS and SOMAscan (7K). Only the SomaLogic protein measurements, which were median normalized, were included in the comparison between CSF and serum. Proteins were matched on SomaLogic aptamer ID when possible but otherwise by Entrez gene symbol. Overlaps between modules and AD-associated (FDR < 0.05) proteins across tissues were evaluated with Fisher’s exact test.
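The module overlap test can be illustrated with a standard-library Python sketch of a two-sided Fisher's exact test on a 2 × 2 table. This is not the authors' code: the helper names are hypothetical, and the choice of background (all proteins measured in the tissue being compared) is an assumption.

```python
from math import comb


def overlap_table(module, associated, background):
    """Build the 2x2 table for module membership vs AD association."""
    background = set(background)
    module = set(module) & background
    associated = set(associated) & background
    a = len(module & associated)       # in module, AD-associated
    b = len(module - associated)       # in module, not associated
    c = len(associated - module)       # not in module, AD-associated
    d = len(background) - a - b - c    # neither
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

Each cell of a heatmap such as Fig. 5c–e would correspond to one such test, with the module on one margin and the tissue-specific AD association on the other.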

MR

A two-sample bidirectional MR analysis was performed, first, to evaluate the potential causal effects of serum protein levels on AD (forward MR) and, second, to evaluate the potential causal effects of AD or its genetic liability on serum protein levels (reverse MR). All aptamers significantly (FDR < 0.05) associated with LOAD (incident or prevalent) were included in the MR analyses, or a total of 346 unique aptamers (Supplementary Tables 2, 3 and 5), of which 320 aptamers were significant in the full follow-up incident LOAD analysis (models 1–3); 106 aptamers were significant in the 10-year follow-up incident LOAD analysis (models 1–3); and 10 aptamers were significant in the prevalent LOAD analysis (models 1–3). Genetic instruments for serum protein levels were obtained from a GWAS of serum protein levels in AGES23 and defined as follows. All variants within a 1-Mb (±500-kb) cis-window for the protein-encoding gene were obtained for a given aptamer. A cis-window-wide significance level Pb = 0.05/N, where N equals the number of SNPs within a given cis-window, was computed, and variants within the cis-window for each aptamer were clumped (r² ≥ 0.2, P ≥ Pb). All aptamers included in the MR analysis had instruments with F-statistic > 10 (Supplementary Table 17). The effect of the genetic instruments for serum protein levels on LOAD risk was obtained from a GWAS of 39,106 clinically diagnosed LOAD cases, 46,828 proxy-LOAD and dementia cases and 401,577 controls of European ancestry18. Genetic instruments for the serum protein levels not found in the LOAD GWAS dataset were replaced by proxy SNPs (r² > 0.8) when possible, to maximize SNP coverage. 
Genetic instruments for LOAD in the reverse causation MR analysis were obtained from the same LOAD GWAS18, where genome-wide significant variants were extracted (P < 5 × 10⁻⁸) and clumped at a more stringent linkage disequilibrium (LD) threshold (r² ≥ 0.01) than for the protein instruments to limit overrepresentation of SNPs from any given locus across the genome. In the reverse causation MR analysis, cis-variants (±500 kb) for the given protein were excluded from the analysis to avoid including pleiotropic instruments affecting the outcome (protein levels) through mechanisms other than the exposure (LOAD). The primary reverse causation MR analysis was performed excluding any variants in the APOE locus (chr19:45,048,858–45,733,201, genome build GRCh37). The causal estimate in the forward MR for each protein was obtained by the generalized weighted least squares (GWLS) method88, which accounts for correlation between instruments. Causality for proteins with single cis-acting variants was assessed with the Wald ratio estimator. For the reverse MR analysis, the inverse variance weighted method was applied due to a more stringent LD filtering of the instruments. Instrument heterogeneity was evaluated with Cochran’s Q test and horizontal pleiotropy with the MR Eggerʼs test using the ‘TwoSampleMR’ R package.
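The two simplest estimators named above, the Wald ratio and the inverse variance weighted (IVW) estimate, can be sketched in a few lines of Python, together with the cis-window significance level. This is an illustration under stated assumptions rather than the study's analysis code: the function names are hypothetical, the Wald ratio standard error uses a first-order approximation, and the IVW estimator shown is the fixed-effect form for uncorrelated instruments, which may differ from the variant implemented in ‘TwoSampleMR’. The GWLS method for correlated instruments is not reproduced.

```python
from math import sqrt


def cis_window_threshold(n_snps, alpha=0.05):
    # Window-wide significance level Pb = 0.05 / N, with N the number
    # of SNPs in the cis-window.
    return alpha / n_snps


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument causal estimate and approximate standard error."""
    estimate = beta_outcome / beta_exposure
    # First-order approximation: ignores uncertainty in beta_exposure.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate over independent instruments.

    Each argument is a list with one entry per instrument.
    """
    weights = [1.0 / s ** 2 for s in se_outcome]
    num = sum(w * bx * by
              for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return num / den, sqrt(1.0 / den)
```

In the forward direction the exposure effects would be the per-variant effects on protein levels and the outcome effects those on LOAD risk; in the reverse direction the roles are swapped.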

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.