Prediction of clinical diagnosis of Alzheimer’s disease, vascular, mixed, and all-cause dementia by a polygenic risk score and APOE status in a community-based cohort prospectively followed over 17 years

The strongest genetic risk factor for Alzheimer’s disease (AD) is the ε4 allele of Apolipoprotein E (APOE) and recent genome-wide association meta-analyses have confirmed additional associated genetic loci with smaller effects. The aim of this study was to investigate the ability of an AD polygenic risk score (PRS) and APOE status to predict clinical diagnosis of AD, vascular (VD), mixed (MD), and all-cause dementia in a community-based cohort prospectively followed over 17 years and secondarily across age, sex, and education strata. A PRS encompassing genetic variants reaching genome-wide significant associations to AD (excluding APOE) from the most recent genome-wide association meta-analysis data was calculated and APOE status was determined in 5203 participants. During follow-up, 103, 111, 58, and 359 participants were diagnosed with AD, VD, MD, and all-cause dementia, respectively. Prediction ability of AD, VD, MD, and all-cause dementia by the PRS and APOE was assessed by multiple logistic regression and receiver operating characteristic curve analyses. The PRS per standard deviation increase in score and APOE4 positivity (≥1 ε4 allele) were significantly associated with greater odds of AD (OR, 95% CI: PRS: 1.70, 1.45–1.99; APOE4: 3.34, 2.24–4.99) and AD prediction accuracy was significantly improved when adding the PRS to a base model of age, sex, and education (ASE) (c-statistics: ASE, 0.772; ASE + PRS, 0.810). The PRS enriched the ability of APOE to discern AD with stronger associations than to VD, MD, or all-cause dementia in a prospective community-based cohort.


Introduction
The etiology of Alzheimer's disease (AD), the most prevalent form of dementia, remains poorly understood, although it is evident that genetic predisposition plays a fundamental role [1]. The heritability of late-onset AD has been estimated as high as 79% [2]. The ε4 allele of Apolipoprotein E (APOE4) is the strongest known genetic risk factor of late-onset AD, but only 7% of dementia cases are attributable to APOE4 [3], suggesting that additional genetic or environmental factors are of high relevance in AD pathogenesis [4]. In recent years large-scale genome-wide association studies (GWAS), including meta-analyses with up to 94,437 AD cases, have identified and confirmed many more genetic loci associated with AD beyond APOE4 [5][6][7].
Polygenic risk scores (PRSs), which collectively consider the relatively small effects of the individual genetic loci, have advanced considerably for AD [8]. The results have illustrated that genetic risk, as measured by the PRSs, was consistently significantly associated with AD [8], although disease prediction accuracy was quite varied (c-statistic range: 0.57-0.84) [9][10][11][12][13][14][15][16]. The majority of previous studies have examined the PRS in a case-control study design in a sample within or associated with a previous genome-wide association (GWA) meta-analysis, the International Genomics of Alzheimer's Project (IGAP), from which the associated genetic variants included in the PRSs were derived [5,8].
The combination of APOE4 presence and PRS classification presents a genetic risk stratification strategy that may be beneficial for future use in therapeutic development research and precision medicine. A more complex genetic risk stratification strategy could provide more specific information, which could become critical in individualized therapeutics [17,18]. However, AD rarely occurs in isolation [19] and the relationship between AD genetic risk and other dementias could better inform risk stratification and the specificity of AD genetic risk.
To our knowledge, a PRS for AD has not been evaluated in a prospective community-based cohort completely independent of the IGAP consortia, nor has one been built on the most recent AD GWAS meta-analysis data, and the association with vascular and mixed dementia (VD/MD) has yet to be explored. The aim of this study was to build upon previous work by calculating a PRS utilizing AD associated single nucleotide polymorphisms (SNPs) from the largest GWA meta-analysis to date, and to evaluate the score's prediction of clinical diagnosis of AD, VD, MD, and all-cause dementia within a large community-based cohort study followed over 17 years. A secondary aim of this study was to investigate the PRS and APOE4 across age, sex, and education strata.

Study design and population
The PRS was derived from the most recent IGAP meta-analysis data [7] and applied in a prospective population-based cohort, the ESTHER study, followed over 17 years [20,21].
Summary statistics from stage 1 of the IGAP meta-analysis by Kunkle et al. were utilized [7], in which genotyped and imputed data on 11,480,632 SNPs were used to meta-analyze four previously published GWAS consortia datasets consisting of 21,982 AD cases and 41,944 controls (the Alzheimer Disease Genetics Consortium; the European AD Initiative; the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium; and the Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for AD Consortium (GERAD/PERADES)) [7].
The subjects for the analyses in this study were drawn from the ESTHER study, a large population-based cohort study conducted in Saarland, Germany [20,21]. A total of 9940 participants aged 50-75 years attending a general health examination were recruited statewide by their general practitioners (GPs) in 2000-2002. A general health examination is offered at no cost to the patient every two years to adults aged 35 and older in the German health care system. Participants completed standardized self-administered health questionnaires and provided blood samples, which were stored at −80°C. Information regarding age, sex, education, medical history, and lifestyle factors was collected at baseline through participant questionnaires and medical records. Follow-up questionnaires, medical records, and biological samples were collected after 2, 5, 8, 11, 14, and 17 years. The ESTHER study was approved by the Ethics Committee of the Medical Faculty of Heidelberg University and of the Physicians' Board of Saarland, and all participants gave written informed consent.
AD, VD, MD, and all-cause dementia diagnoses were collected from participants' GPs during the 14-year and 17-year ESTHER follow-ups as previously reported [22]. Briefly, GPs of all ESTHER participants were contacted at the 14-year and 17-year follow-ups and asked to fill out a detailed questionnaire regarding dementia diagnoses of their patients as well as to provide all available medical records of neurologists, psychiatrists, memory, or other specialized providers. The 17-year follow-up was still pending a response from a second mailing of the dementia questionnaire to those GPs who had not yet responded to the initial mailing at the time of this publication. The current guidelines in Germany for AD diagnosis follow the National Institute on Aging and the Alzheimer's Association [23] or the International Working Group (IWG)-2 criteria [24,25], for VD diagnosis the National Institute of Neurological Disorders and Stroke-Association Internationale pour la Recherche et l'Enseignement en Neurosciences criteria [26], and for MD diagnosis the IWG criteria for mixed dementia [24,25]. All-cause dementia diagnoses are recommended if the dementia symptoms outlined by the ICD-10 are present for at least 6 months [25,27]. Participants with dementia diagnoses before the age of 65 (n = 7) and those that did not have APOE genotyped information (n = 141) were excluded. Overall, 5203 participants with available genotyping and dementia information were included in this study (Fig. 1).

Genotyping and imputation
APOE was determined based on allelic combinations of the SNPs rs7412 and rs429358 using predesigned TaqMan SNP genotyping assays (Applied Biosystems, Foster City, CA). Genotypes were analyzed in an endpoint allelic discrimination read using the Bio-RAD CFX Connect System (Bio-Rad Laboratories, Hercules, CA). DNA genotyping has been previously described elsewhere [28]. Briefly, blood samples were taken during a routine health examination and stored at −80°C until analysis. DNA from whole blood samples was collected using a salting out procedure. The extracted DNA from blood cells was genotyped using the Illumina Infinium OncoArray and Global Screening Array BeadChips (Illumina, San Diego, CA, USA).
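The mapping from the two SNPs to an APOE genotype can be sketched as follows. This is a minimal illustration and not the assay software used in the study: it assumes unphased two-letter genotypes reported on the forward strand (rs429358 T/C, rs7412 C/T), and it assumes the very rare ε1 haplotype is absent, so that double heterozygotes are read as ε2/ε4. The function names are hypothetical.

```python
def apoe_genotype(rs429358: str, rs7412: str) -> str:
    """Return an APOE genotype such as 'e3/e4' from two unphased SNP genotypes.

    Assumptions of this sketch: forward-strand alleles, and no e1 haplotype
    (rs429358 C together with rs7412 T on the same chromosome).
    """
    g1, g2 = rs429358.upper(), rs7412.upper()
    if len(g1) != 2 or len(g2) != 2 or not set(g1 + g2) <= {"C", "T"}:
        raise ValueError("genotypes must be two letters drawn from C and T")
    n_e4 = g1.count("C")       # each rs429358 C allele marks an e4 allele
    n_e2 = g2.count("T")       # each rs7412 T allele marks an e2 allele
    n_e3 = 2 - n_e4 - n_e2     # the remaining alleles are e3
    if n_e3 < 0:
        raise ValueError("combination implies an e1 haplotype; not handled here")
    return "/".join(["e2"] * n_e2 + ["e3"] * n_e3 + ["e4"] * n_e4)


def is_apoe4_positive(genotype: str) -> bool:
    """APOE4+ is defined in the text as carrying at least one e4 allele."""
    return "e4" in genotype


print(apoe_genotype("TC", "CC"), is_apoe4_positive(apoe_genotype("TC", "CC")))
```

Counting alleles rather than enumerating haplotype pairs keeps the lookup short, at the cost of rejecting the combinations that would require ε1.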
General genotyping quality control assessment was done following the Nature Protocols article from Anderson et al. [29]. Imputation of the quality controlled data was conducted using the Michigan Imputation Server, where SHAPEIT2 was used to phase the data, and MiniMac 4 was used to impute to the HRC Version r1.1 2016 reference panel [30,31].

Polygenic risk score calculation
The PRS in this study was a weighted score including AD associated SNPs, calculated by summing the number of risk alleles weighted by the magnitude of association (ln of the odds ratio (OR)) from Kunkle et al. [7].
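As a minimal sketch of the weighted score described above, the sum can be written as the risk-allele count at each SNP multiplied by the natural log of its odds ratio. The SNP identifiers and odds ratios below are made-up placeholders, not values from Kunkle et al., and the function name is hypothetical.

```python
import math


def polygenic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted PRS: sum over SNPs of (risk-allele count) * ln(OR).

    risk_allele_counts: dict SNP id -> number of risk alleles (0, 1, or 2;
                        imputed dosages between 0 and 2 would also work).
    odds_ratios:        dict SNP id -> OR for the risk allele from the GWAS.
    """
    return sum(
        risk_allele_counts[snp] * math.log(odds_ratio)
        for snp, odds_ratio in odds_ratios.items()
    )


# Hypothetical example data (placeholders only).
example_or = {"snp_a": 1.10, "snp_b": 1.20, "snp_c": 1.05}
example_counts = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(polygenic_risk_score(example_counts, example_or))
```

The sketch assumes every odds ratio is already expressed for the risk allele; a SNP reported with OR < 1 for its effect allele would need its allele and weight flipped before summation.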
Using summary statistics from Kunkle et al., SNPs reaching genome-wide significance in the IGAP meta-analysis were extracted from the imputed ESTHER data, which resulted in 1234 SNPs extracted. Linkage disequilibrium-based clumping was carried out, providing the most significantly associated SNP in each region of linkage disequilibrium (using the PLINK clumping command with a pairwise r² threshold of 0.2). After linkage disequilibrium-based clumping, 106 SNPs remained. Then, SNPs within or directly upstream/downstream from the APOE locus (chr19: 45,404,000-45,418,000) were excluded (n = 9). Finally, a minor allele frequency (MAF) threshold of 0.01 was applied that resulted in an additional 25 SNPs excluded. The remaining included SNPs had imputation quality median R² = 0.92 (R² range: 0.47-0.99). A total of 72 SNPs were included in the PRS (Supplementary Table 1).
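The clumping step can be illustrated with a simplified greedy procedure: SNPs are visited from most to least significant, and a SNP is kept only if its r² with every already-kept index SNP is below the threshold. This is a sketch of the idea only. It ignores the physical distance window and p-value thresholds that PLINK also applies, and the SNP names, p-values, and r² values are invented for illustration. The last lines check the SNP counts reported above.

```python
def clump(p_values, r2, threshold=0.2):
    """Greedy LD clumping sketch.

    p_values: dict SNP id -> association p-value.
    r2:       dict frozenset({snp1, snp2}) -> pairwise r^2; pairs that are
              missing are treated as r^2 = 0 (an assumption of this sketch).
    Returns the retained index SNPs, most significant first.
    """
    kept = []
    for snp in sorted(p_values, key=p_values.get):
        in_ld = any(r2.get(frozenset((snp, index)), 0.0) >= threshold for index in kept)
        if not in_ld:
            kept.append(snp)
    return kept


# Invented example: s2 is in LD with s1 and is dropped; s3 is only in LD with
# the already-dropped s2, so it is retained as its own index SNP.
example_p = {"s1": 1e-12, "s2": 1e-10, "s3": 1e-9}
example_r2 = {frozenset(("s1", "s2")): 0.8, frozenset(("s2", "s3")): 0.5}
index_snps = clump(example_p, example_r2)
print(index_snps)

# Bookkeeping of the counts reported in the text.
after_clumping, apoe_region_excluded, low_maf_excluded = 106, 9, 25
snps_in_prs = after_clumping - apoe_region_excluded - low_maf_excluded
print(snps_in_prs)
```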
The score was normalized by subtracting the mean and dividing by the standard deviation (SD), which were both calculated from the overall sample. For the sake of comparability of prediction performance of PRS and APOE, the cutoff for PRS+ was determined as the score point at which the proportion of PRS+ individuals was equal to the proportion of APOE4+ (≥1 ε4 allele) individuals in the control group. It should be noted that this is not a true or validated threshold but was chosen for comparability with APOE only.
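A minimal sketch of the normalization and of the proportion-matched cutoff follows. Two details are assumptions, since the text does not specify them: the sample SD (n − 1 denominator) is used, and PRS+ is taken to mean a score at or above the cutoff. The function names and the example scores are hypothetical.

```python
from statistics import mean, stdev


def standardize(scores):
    """Subtract the overall mean and divide by the overall SD."""
    m, s = mean(scores), stdev(scores)  # sample SD is an assumption
    return [(x - m) / s for x in scores]


def matched_cutoff(control_scores, apoe4_positive_fraction):
    """Score at which the share of controls with score >= cutoff matches the
    share of APOE4+ controls (exact only when there are no tied scores)."""
    ranked = sorted(control_scores, reverse=True)
    n_positive = round(apoe4_positive_fraction * len(ranked))
    if n_positive < 1:
        raise ValueError("fraction too small for this sample size")
    return ranked[n_positive - 1]


# Hypothetical control scores; a quarter of controls are assumed APOE4+.
controls = [-1.2, -0.5, 0.0, 0.3, 0.7, 1.1, 1.8, -0.9]
cutoff = matched_cutoff(controls, 0.25)
share_positive = sum(score >= cutoff for score in controls) / len(controls)
print(cutoff, share_positive)
```

Because the cutoff is tied to the APOE4+ share among controls, the two binary markers flag the same fraction of the control group by construction, which is what makes their odds ratios directly comparable.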

Statistical analyses
Descriptive statistics were calculated to provide information on participant characteristics, and chi-square and t-tests were used to compare AD, VD, MD, and all-cause dementia cases with individuals without dementia diagnosis. Multivariable logistic regression models were used to estimate ORs with 95% confidence intervals (CI) for dementia cases versus individuals without dementia diagnosis based upon the PRS and APOE4 status. The PRS was considered per SD increase in score, as quartiles, and as a binary variable following the cutoff previously described. APOE status was utilized as a categorical variable based upon allele type/count (APOE ε3ε4, ε4ε4 vs. ε3ε3) and as a binary variable (APOE4+: ≥1 ε4 allele vs. APOE4−: no ε4 allele). In addition, PRS and APOE4 status were combined and ORs were calculated for individuals that were both PRS+ and APOE4+ compared with the reference PRS− and APOE4−. Covariates for all logistic regression analyses included age, sex, ten principal components, and education, measured by years of formal education (≤9, 10-11, ≥12 years; standard categories of the German school system; the lowest category corresponds to a leaving certificate from school, the highest category corresponds to qualification for university). Stratified analyses and interaction testing for age, sex, and education by PRS, APOE4, and PRS & APOE4 status together were computed for all outcomes. Multiple imputation (n = 5) for education covariates missing at random was carried out following the Markov chain Monte Carlo (MCMC) method [32].
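For readers less familiar with how a logistic regression coefficient becomes an OR with a 95% CI, the conversion can be sketched as below, assuming a Wald-type interval. The standard error in the example is not reported in the study; it is back-calculated here from the published interval for the PRS per SD increase in AD (OR 1.70, 95% CI 1.45-1.99) purely for illustration. The function name is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Back-calculated inputs (illustrative assumption, not reported values).
beta = math.log(1.70)
se = (math.log(1.99) - math.log(1.45)) / (2 * Z_95)

odds_ratio, lower, upper = odds_ratio_ci(beta, se)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

The interval is symmetric on the log-odds scale, which is why the reported bounds are not equidistant from the OR itself.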
Receiver operating characteristic (ROC) curve analysis was completed for both the PRS and APOE, where the PRS was considered continuously and APOE was considered categorically (APOE ε2ε2, ε2ε3, ε3ε4, ε4ε4 vs. ε3ε3). For AD, VD, MD, and all-cause dementia, ROC curves were calculated based upon: (1) age, sex, and education; (2) age, sex, education, and PRS; (3) age, sex, education, and APOE; and (4) age, sex, education, PRS, and APOE. ROC contrast analysis using the DeLong test was conducted to compare for significant differences between curves [33].
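The c-statistic reported for each model equals the area under the ROC curve, which can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The sketch below computes it directly from that definition. The predicted risks are invented, the function name is hypothetical, and the DeLong comparison between curves is not implemented here.

```python
def c_statistic(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of predicted risks."""
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0   # case ranked above the control
            elif case == control:
                concordant += 0.5   # ties count as half
    return concordant / (len(case_scores) * len(control_scores))


# Invented predicted risks for three cases and four non-cases.
example_cases = [0.9, 0.8, 0.4]
example_controls = [0.7, 0.3, 0.4, 0.1]
auc = c_statistic(example_cases, example_controls)
print(auc)
```

The pairwise loop is quadratic in sample size, which is fine for illustration; a rank-based formulation would be preferable for a cohort of several thousand participants.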
All statistical analyses were two-sided, conducted at an α-level 0.05, and completed using SAS software, version 9.4 (SAS institute, Cary, NC).

Participant characteristics
The participants from the ESTHER study that had both genotyping and dementia information available for these analyses included 103 AD, 111 VD, 58 MD, 359 all-cause dementia cases, and 4844 participants without dementia diagnosis. Seven participants had dementia diagnoses before the age of 65 and were therefore excluded. The mean length of follow-up of all included participants was 14.4 years. Main characteristics of study participants are shown in Table 1 and additional APOE characteristics in Supplementary Table 2. The mean age at baseline in AD cases was 67 years, VD/all-cause dementia cases 68 years, MD cases 69 years, and participants without dementia diagnosis 61 years. The mean age of diagnosis was 77, 79, 79, and 78 for AD, VD, MD, and all-cause dementia cases, respectively. All groups included slightly more females (51-59%) than males. In all groups the majority of individuals completed 9 years of formal education or less (83% AD cases, 78% VD cases, 80% MD cases, 81% all-cause dementia cases, and 72% participants without dementia). PRS positivity was evident among half (52%) of AD cases, 35% of VD cases, 31% of MD cases, 38% of all-cause dementia cases, and a quarter (25%) of participants without dementia diagnosis. APOE4 positivity was evident among half (51%) of AD cases, 37% of VD cases, 35% of MD cases, 40% of all-cause dementia cases, and a quarter (25%) of participants without dementia diagnosis.

AD prediction
After linkage disequilibrium-based clumping, exclusion of the APOE locus including SNPs located directly down- or upstream from APOE, and further exclusion of SNPs with MAF < 0.01, 72 SNPs reaching genome-wide significance were included in the PRS (Supplementary Table 1). PRS+ and APOE4+ participants had 3.40 (95% CI: 2.28-5.09) and 3.34 (95% CI: 2.24-4.99) times the odds of developing AD within 17 years compared with PRS− and APOE4− participants, respectively (Table 2). Participants that were both PRS+ and APOE4+ had 4.6-fold greater odds of developing AD compared with PRS− APOE4− participants (OR, 95% CI: 4.59, 2.96-7.11). Furthermore, increased odds of AD per SD increase of the PRS were evident (OR, 95% CI: 1.70, 1.45-1.99), which remained true even after additionally adjusting for APOE status (OR, 95% CI: 1.52, 1.26-1.84). Participants that had one and two APOE ε4 alleles had 3- and 14-fold greater odds of being diagnosed with AD compared with participants with two ε3 alleles.
The addition of the PRS to the base model of age, sex, and education (ASE) significantly improved AD prediction (Fig. 2, c-statistic: ASE, 0.772; ASE + PRS, 0.810, p < 0.01).
Stratified analyses and interaction testing based upon age, sex, and education can be found in Table 3. There were no significant interactions, and individuals that were of high genetic risk (PRS+ & APOE4+) had similar odds of AD diagnosis regardless of age, sex, or education.

VD prediction
The PRS and APOE were predictive of VD (OR, 95% CI: PRS+: 1.65, 1.10-2.47; APOE4+: 1.84, 1.23-2.74; PRS+ APOE4+: 2.08, 1.34-3.25) (Table 2). The genotype APOE ε3ε4 was associated with twofold greater odds of VD when compared with the reference group APOE ε3ε3. ROC curve analysis revealed no significant differences in prediction by the addition of the PRS or APOE to age, sex, and education (Fig. 2). The stratified analyses revealed no significant interactions between AD genetic risk and age, sex, and education in the prediction of VD diagnosis (Table 3).

MD prediction
The PRS was not predictive of MD diagnosis; however, APOE4+ was predictive of MD diagnosis (OR, 95% CI: 1.75, 1.00-3.07), and participants that were APOE ε4ε4 had 15-fold greater odds of diagnosis compared with APOE ε3ε3 participants (OR, 95% CI: 14.81, 5.12-42.87). ROC curve analysis revealed no significant differences in prediction by the addition of the PRS or APOE to age, sex, and education (Fig. 2). Stratified analyses revealed an interaction between age and AD genetic risk; however, only a limited number of cases were included in the analysis (Table 3).

All-cause dementia prediction
PRS and APOE4 status were significantly predictive of all-cause dementia (Table 2). PRS+ and APOE4+ participants each had increased odds of all-cause dementia diagnosis (OR, 95% CI: PRS+: 1.91, 1.51-2.42; APOE4+: 2.20, 1.74-2.78). Participants that were PRS+ APOE4+ had 2.5-fold greater odds of all-cause dementia. One SD increase in the PRS resulted in 1.4-fold greater odds of dementia diagnosis (OR, 95% CI: 1.36, 1.23-1.51). In addition, participants with one and two APOE ε4 alleles had two- and seven-fold increased odds of all-cause dementia compared with participants with two ε3 alleles.
ROC curve analysis illustrated that all-cause dementia prediction was significantly improved when adding the PRS to the base model of age, sex, and education. There were no significant interactions between AD genetic risk and age, sex, and education in the prediction of dementia diagnosis (Table 3).

Discussion
In a prospective community-based cohort independent of the IGAP consortia, PRS positivity expressed significant predictive ability of AD diagnosis beyond APOE status, with stronger associations to AD than VD, MD, or all-cause dementia. Participants that were both PRS and APOE4 positive exhibited 4.6-fold greater odds of AD diagnosis within 17 years compared with participants who were both PRS and APOE4 negative.
Our PRS builds upon a rich foundation of previous AD PRSs, with nearly twenty studies expressing significant ability of the PRS to discern AD [8,34,35]. Four of these studies have also utilized a cohort approach: (1) Chouraki et al. used eight prospective cohorts from the IGAP consortia, a mix of varying types of cohort studies including the Rotterdam study [36]; (2) Tan et al. examined the PRS in a clinical cohort [37]; and (3) two studies by Ahmad et al. and Van der Lee et al. utilized the community-based Rotterdam cohort study [38,39]. Our community-based cohort was, however, the only one completely independent of previous GWA meta-analyses from which the PRSs were derived and the only one to utilize the most recent IGAP data. Community-based cohorts play an implicit role in contribution to the study of risk factor-outcome associations [40] and are important in establishing the future role of PRSs in genetic risk stratification. Interestingly, in the Rotterdam cohort study Van der Lee et al. reported similar likelihood of AD or dementia development based upon the PRS (HR, 95% CI: AD, 1.11, 0.97-1.27; dementia, 1.11, 0.99-1.26) [39]. No other studies considered prediction of all-cause dementia, and none considered VD/MD. In our study, we found the PRS to be more predictive of AD than VD, MD, and all-cause dementia. A large proportion of all-cause dementia cases included AD cases in our study, which could explain the association between AD genetic risk and all-cause dementia. This could also be the reason for the similar associations found between the PRS and AD, and the PRS and dementia in the Rotterdam study [39].
Often AD diagnosed patients additionally exhibit cerebrovascular pathology and VD diagnosed patients have evident AD pathology, which may go undiagnosed as mixed dementia [41]. This should be taken into account when considering clinical diagnoses of dementia and could additionally account for the ability of our PRS to predict VD and all-cause dementia. However, the much larger associations between the PRS and AD compared with the other dementia subtypes support the specificity of AD genetic risk and heterogeneity of the genetic architecture among dementia subtypes. Larger independent cohort studies are necessary to explore the relationship between AD genetic risk and other dementia subtypes for more insight into the underlying genetic architecture and mediating influence of genetics.
Note to the stratified analyses table: all analyses were adjusted for 10 principal components; in addition, analyses stratified by sex were adjusted for age and education, those stratified by age for sex and education, and those stratified by education for sex and age. Bolded results indicate statistical significance, p < 0.05.
There was a lack of significant interaction between AD genetic risk and age, sex, and education in the prediction of AD, VD, MD, and all-cause dementia diagnoses. This supports the idea that prediction of dementia based upon AD genetic risk is similar regardless of these important AD risk factors. The case numbers for several categories were, however, rather small, and these results should be interpreted with caution.

Implications
Presently, the utilization of a PRS in addition to APOE4 could be used to enhance genetic risk stratification as it provides additional genetic information and greater AD prediction ability. PRSs in clinical trials could be used to target individuals who may be at risk for AD before any pathological changes occur in the brain, which is critical in the search for a successful therapy preventing AD.
In the future, PRSs for AD may play a paramount role in precision medicine with targeted therapies based upon AD genetic risk. Enhanced genetic risk stratification could also help identify the best candidates for AD preventive treatment, before accumulation of amyloid in the brain or for individualized treatments based upon genetic make-up. Recently, it has been shown that the effects of lifestyle behaviors are mediated by genetic risk in dementia development [42]. A multi-domain intervention approach involving modifiable vascular and lifestyle risk factors, which has been shown to improve/maintain cognitive function in older adults, could be recommended based on genetic risk [43].

Strengths and weaknesses
The greatest strengths of our study are the epidemiological approach to the investigation of a PRS for AD in a large community-based cohort prospectively followed over 17 years that is completely independent of the IGAP consortia, and the novel use of the latest GWA meta-analysis data. In addition, we investigated VD/MD diagnoses and completed age-, sex-, and education-stratified analyses, which no other study has addressed.
There are however several limitations including the possibility of dementia misdiagnosis/underdiagnosis. The dementia diagnoses made in the ESTHER study were clinical diagnoses reported heterogeneously by numerous practitioners, and may be inferior to diagnostic standards that can be achieved in highly specialized academic settings. This is however the nature of community-based cohort studies, which portray common practice in such a setting. In addition, dementia neuropathologies are complex where AD pathology seldom occurs in isolation [19], further complicating diagnoses. Only 63% of participants with available genetic information also had dementia information available in our cohort. Although an inherent characteristic of prospective cohort studies, non-response bias may have led to an underestimation of dementia. The AD, MD, and VD case numbers were rather small, especially in the stratified analysis, which led to large CIs and a general lack of power. In addition, the participants without dementia diagnosis were significantly younger at baseline which could have led to missed dementia diagnoses that would have been made at higher ages. Finally, this study has limited generalizability as its population consisted of participants of European descent.

Conclusion
A PRS encompassing additional genetic variants derived from the most current AD GWA meta-analysis enriched the ability of APOE status to discern AD in a prospective community-based cohort followed over 17 years that was independent of previous GWA meta-analyses. The PRS expressed a greater ability to predict AD than VD, MD, or all-cause dementia. Therapeutic treatment development and eventually precision medicine could benefit from enhanced risk stratification through the utilization of an AD PRS in addition to APOE status.
Acknowledgements The ESTHER study was supported by grants from the Baden-Württemberg Ministry of Science, Research and Arts, the German Federal Ministry of Education and Research, the German Federal Ministry of Family, Senior Citizens, Women and Youth, the Saarland Ministry of Social Affairs, Health, Women and Family, and the Network Aging Research at Heidelberg University. HS is a doctoral student supported by a scholarship awarded from the Klaus Tschira Foundation. We thank the IGAP for providing summary results data for these analyses. The investigators within IGAP contributed to the design and implementation of IGAP and/or provided data but did not participate in analysis or writing of this report. Please see Supplementary Table 3 for information regarding IGAP support/ funders.
Funding Open Access funding enabled and organized by Projekt DEAL.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.