Introduction

Alzheimer’s disease (AD) is the most prevalent type of dementia in the elderly worldwide, accounting for an estimated 60–80% of all dementia cases [1]. AD is characterized by cognitive impairments such as memory loss and disorientation. Amyloid-β (Aβ) accumulation and tau neurofibrillary tangles are the main pathological hallmarks of the AD brain [2]. Amyloid precursor protein (APP), presenilin 1 (PSEN1), and presenilin 2 (PSEN2) are the causative genes of AD [3]. The heritability of AD is estimated at 70%, indicating that it is a highly heritable disease [4].

Thanks to the rapid development of high-throughput sequencing technologies, over 30 AD risk genes have been identified by genome-wide association studies (GWASs) [4]. Apolipoprotein E (APOE) remains the strongest genetic risk factor for AD [5]. In 2009, the first two GWASs in AD demonstrated that CLU, CR1, and PICALM were risk genes for AD [6, 7]. The following year, another GWAS showed that BIN1 was associated with AD [8]. In subsequent GWASs, ABCA7, the MS4A gene cluster (MS4A6A, MS4A6E), EPHA1, CD33, and CD2AP reached genome-wide significance [9, 10]. Furthermore, 19 genes were related to AD, of which 11 were novel, including HLA-DRB5/HLA-DRB1, PTK2B, SORL1, and FERMT2 [11]. In addition, rare variants in genes such as PLD3 and ABCA7 were identified by next-generation sequencing [12, 13]. Interestingly, some rare AD-associated variants are located in risk genes that also harbor common AD-associated variants, such as ABCA7 and SORL1, indicating that these genes are involved in the etiology of AD through multiple pathways [3].

Most genetic studies of AD have been conducted in Caucasian populations, whereas genetic data from the Chinese population are limited. Genetic heterogeneity exists among populations. Even for APOE ε4, the most prominent genetic risk factor for AD, the contribution to AD varies among ethnic groups [14]. To systematically investigate the roles of risk genes in the Chinese population, we genotyped 33 AD risk genes in a large Chinese cohort.

Materials and methods

Subjects

For targeted panel sequencing, we recruited 1192 AD patients from Xiangya Hospital and 2412 controls from a community in Changsha. The AD patients were diagnosed by two neurologists specializing in neurodegenerative disease and met the National Institute on Aging-Alzheimer’s Association criteria for probable AD [15]. An experienced clinical neuropsychologist administered a battery of neuropsychological tests to the AD patients, including the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), Clinical Dementia Rating (CDR), Activities of Daily Living (ADL), and Neuropsychiatric Inventory (NPI). The MMSE was also administered to the controls. The clinical data were collected by PhD students guided by experienced neurologists. Participants with causative mutations for AD (APP, PSEN1, and PSEN2) were excluded by Sanger sequencing. This study was approved by the Ethics Committee of Xiangya Hospital, Central South University, China. Written informed consent was obtained from each participant or guardian.

Targeted sequencing

We designed a targeted sequencing panel composed of 33 AD risk genes, namely APOE, BIN1, CD2AP, EPHA1, CLU, MS4A6A, MS4A6E, CD33, TTR, TMEM106B, PTK2B, SLC24A4, RIN3, DSG2, INPP5D, MEF2C, NME8, ZCWPW1, CELF1, FERMT2, CASS4, CR1, ABCA7, SORL1, TREM2, ADAM10, PLD3, PICALM, UNC5C, AKAP9, TTC3, PLCG2, and ABI3. These risk genes were identified by GWAS approaches or next-generation sequencing studies in AD cohorts. Our panel used biotinylated RNA probes to capture known DNA sequences from the human reference GRCh37. The designed probes and genomic locations for the 33 AD risk genes are shown in Supplementary Files 1 and 2. The panel’s design workflow involved five steps: (1) probe design, (2) oligo pool synthesis, (3) probe production, (4) wet-lab testing, and (5) data quality control and analysis. Genomic DNA was extracted from peripheral blood leukocytes using a QIAGEN kit. All DNA samples were normalized to 100 ng/μL. The genomic DNA was fragmented into 150–200 bp fragments using a Bioruptor Pico. End repair, A-tailing, adaptor ligation, and an 11-cycle pre-capture PCR amplification were performed on the fragmented DNA. The fragmented DNA was captured by the targeted panel and sequenced on the Illumina NovaSeq 6000 platform. Low-quality reads in the fastq data were filtered out by FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Then, the paired-end sequence reads were aligned to the human reference genome (UCSC hg19/GRCh37) using the BWA software (version 0.7.15, http://bio-bwa.sourceforge.net) [16]. Picard (version 2.18.7, http://broadinstitute.github.io/picard/) was used to remove duplicate sequence reads and index the sequencing data. Quality-score recalibration, local realignment, variant calling, and filtering were conducted with the Genome Analysis Toolkit (version 3.2, https://software.broadinstitute.org/gatk/) [17]. The variants were annotated using ANNOVAR (https://hpc.nih.gov/apps/ANNOVAR.html) [18].
Common and rare variants were classified on the basis of minor allele frequency (MAF) at a threshold of 0.01 (common variants: MAF ≥ 0.01; rare variants: MAF < 0.01). In addition, ReVe, an algorithm developed by our team, was used to predict the pathogenicity of missense variants [19]. We defined damaging variants as loss-of-function (LoF) variants or missense variants with ReVe > 0.7. LoF variants were defined as variants resulting in stop, frameshift, or splice-site disruption. The variants were named according to the guidelines of the Human Genome Variation Society [20].
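The classification rules above can be written as a short Python sketch. The thresholds (MAF 0.01, ReVe 0.7) come from the text; the `Variant` record, the function names, and the consequence labels (`"stopgain"`, `"frameshift"`, `"splicing"`, `"missense"`) are hypothetical placeholders chosen for illustration and are not necessarily the labels produced by the annotation pipeline used in this study.

```python
from dataclasses import dataclass
from typing import Optional

MAF_THRESHOLD = 0.01    # common vs rare cutoff stated in the text
REVE_THRESHOLD = 0.7    # damaging-missense cutoff stated in the text

# Hypothetical consequence labels standing in for stop, frameshift,
# and splice-site disruption; real annotation labels may differ.
LOF_CONSEQUENCES = {"stopgain", "frameshift", "splicing"}


@dataclass
class Variant:
    maf: float                      # minor allele frequency
    consequence: str                # hypothetical annotation label
    reve: Optional[float] = None    # ReVe score, missense variants only


def frequency_class(v: Variant) -> str:
    """Return 'common' for MAF >= 0.01, otherwise 'rare'."""
    return "common" if v.maf >= MAF_THRESHOLD else "rare"


def is_lof(v: Variant) -> bool:
    """Loss-of-function: stop, frameshift, or splice-site disruption."""
    return v.consequence in LOF_CONSEQUENCES


def is_damaging(v: Variant) -> bool:
    """Damaging: LoF, or missense with ReVe strictly above 0.7."""
    if is_lof(v):
        return True
    return (
        v.consequence == "missense"
        and v.reve is not None
        and v.reve > REVE_THRESHOLD
    )
```

Under these definitions a variant with MAF exactly 0.01 counts as common, and a missense variant with ReVe exactly 0.7 is not counted as damaging, because the stated cutoff is strict.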

Blood sampling and analyses

In our study, a subgroup of 333 AD patients and 130 controls underwent plasma biomarker testing. Venous blood was collected and stored frozen at −80 °C before analysis. Plasma Aβ42, Aβ40, T-tau, and neurofilament light chain (NFL) levels were determined using the single-molecule array (Simoa) HD-1 platform (Quanterix, USA). Aβ42, Aβ40, and T-tau levels were determined using a multiplex array (Neurology 3-Plex A Advantage Kit, N3PA), and NFL levels were measured via a single-analyte array (NF-light). The calibrators were kept at room temperature. The N3PA assay or NF-Light assay definition was imported under Custom Assay. Plasma samples were manually diluted 4× with sample diluent. The beads were vortexed for at least 30 s and the prepared reagents (Bead Reagent, Detector Reagent, SBG Reagent, Sample Diluent) were added into the reagent bay. Meanwhile, the resorufin β-D-galactopyranoside was loaded into the sample bay. Finally, the concentration of each sample was assessed by Simoa software with Neat or the standard 4× dilution protocol for AD patients and controls. All samples were measured with the two-step immunoassay. All measurements were conducted by well-trained technicians who were blinded to the clinical information.

Statistical analysis

Variants were filtered out with PLINK 1.9 if they had a genotyping rate <95%, a Hardy–Weinberg equilibrium P value < 1 × 10−6 in the controls, a genotype quality ≤ 20, an allelic balance outside the 25%/75% ratio of reference and alternate allele reads in heterozygotes, or an allelic balance outside the 95% ratio in homozygotes. We performed the common variant (MAF ≥ 0.01) association analysis between 1192 AD patients and 2412 controls using PLINK 1.9 [21]. Age, gender, and APOE ε4 status (APOE ε4+, APOE ε4–) were included as covariates in PLINK 1.9 for each common variant. Linkage disequilibrium (LD) patterns of significant variants were reconstructed using Haploview version 4.2 [22].
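The quality-control filters can be sketched as two Python predicates, one at the variant level and one at the genotype level. This is an illustration of the stated thresholds, not the PLINK implementation. The function names and the genotype labels are hypothetical, and the homozygote rule is an assumed reading of the text: at least 95% of reads must support the called allele.

```python
def variant_passes_qc(call_rate: float, hwe_p_controls: float) -> bool:
    """Variant-level filter: genotyping rate and HWE P value in controls."""
    return call_rate >= 0.95 and hwe_p_controls >= 1e-6


def genotype_passes_qc(genotype: str, gq: float,
                       ref_reads: int, alt_reads: int) -> bool:
    """Genotype-level filter on genotype quality and allelic balance.

    genotype is one of 'het', 'hom_ref', 'hom_alt' (hypothetical labels).
    """
    if gq <= 20:                      # genotype quality <= 20 is removed
        return False
    total = ref_reads + alt_reads
    if total == 0:                    # no reads, allelic balance undefined
        return False
    alt_fraction = alt_reads / total
    if genotype == "het":
        # heterozygotes: alternate-allele fraction within 25% to 75%
        return 0.25 <= alt_fraction <= 0.75
    if genotype == "hom_alt":
        # assumed reading: >= 95% of reads support the called allele
        return alt_fraction >= 0.95
    if genotype == "hom_ref":
        return (ref_reads / total) >= 0.95
    raise ValueError(f"unknown genotype label: {genotype}")
```

A heterozygous call supported by 40 reference and 60 alternate reads passes, whereas one supported by 90 reference and 10 alternate reads is dropped as a likely miscall.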

A polygenic risk score (PRS) was generated using PRSice-2 [23], and the receiver operating characteristic (ROC) curve was drawn in R (version 4.0.3, R Project for Statistical Computing). The area under the ROC curve (AUC) was calculated. Moreover, the participants were divided into four groups based on PRS quartiles. Using the Cox proportional hazards model, we investigated the associations of PRS with the cumulative incidence rate of AD. The associations between PRS and plasma biomarkers were assessed using the Spearman correlation test.
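To make the quantities in this paragraph concrete, the following Python sketch shows a generic additive score, a rank-based AUC, and a quartile assignment. It is a simplified illustration under stated assumptions and does not reproduce the exact scoring, clumping, or thresholding that PRSice-2 applies. All function names are hypothetical.

```python
from typing import List, Sequence


def polygenic_score(dosages: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Generic additive score: sum of risk-allele dosage times effect size."""
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores: Sequence[float],
        control_scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as one half. This equals the area under the ROC curve.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def quartile_groups(scores: Sequence[float]) -> List[int]:
    """Assign each individual to a rank-based quartile, 1 (lowest) to 4."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = rank * 4 // n + 1
    return groups
```

An AUC of 0.5 corresponds to a score that does not separate cases from controls, and 1.0 to perfect separation.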

In addition, using the Sequence Kernel Association Test-Optimal (SKAT-O test) [24], we performed the gene-based association test by combining rare variants between AD patients and controls. Rare variants were further classified as follows: rare damaging variants (MAF < 0.01, LoF or ReVe > 0.7), rare damaging missense variants (MAF < 0.01, ReVe > 0.7), rare LoF variants (MAF < 0.01, LoF), rare missense variants (MAF < 0.01, missense), and rare synonymous variants (MAF < 0.01, synonymous). Age, gender, and APOE ε4 status were included as covariates in SKAT-O. A P value × n < 0.05 was considered statistically significant (where n is the number of common variants or genes tested).
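The significance rule in the preceding paragraph is a Bonferroni correction. It can be expressed either as P × n < 0.05 or, equivalently, as P < 0.05/n. A minimal Python sketch follows. The function names are hypothetical, and the values of n used in the usage note (217 common variants, 33 genes) are those reported in the Results.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value threshold after Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    """Apply the rule P * n < alpha."""
    return p_value * n_tests < alpha
```

With n = 217 common variants the per-test threshold is 0.05/217 ≈ 2.30 × 10−4, and with n = 33 genes it is 0.05/33 ≈ 1.52 × 10−3.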

Results

Demographic and clinical information

On average, the sequencing depth was 621.81× and the percentage of bases covered at ≥20× was 98.36%. A total of 1192 AD patients and 2412 controls were enrolled. The average onset age of AD patients was 63.93 years, and the average age of controls was 64.76 years. No significant difference in age was observed between AD patients and controls (P = 0.06). The MMSE scores of AD patients were significantly lower than those of controls (P = 4.84 × 10−6). Furthermore, in the AD patients, the average MoCA, CDR, ADL, and NPI scores were 8.46, 1.29, 34.41, and 18.05, respectively.

Plasma Aβ42 levels and the Aβ42/Aβ40 ratio in AD patients were lower than those in controls (Aβ42: P = 5.32 × 10−3, Aβ42/Aβ40: P = 9.11 × 10−12). The levels of plasma T-tau and NFL in AD patients were higher than those in controls (T-tau: P = 9.72 × 10−16, NFL: P = 2.20 × 10−16) (Table 1).

Table 1 Characteristics of AD patients and controls.

Common variant association test

After quality control, 217 common variants were identified in AD patients and controls. After adjusting for age, gender, and APOE ε4 status, 34 variants were nominally associated with AD risk, including variants in ABCA7 (n = 18), NME8 (n = 3), APOE (n = 2), BIN1 (n = 2), SORL1 (n = 2), INPP5D (n = 2), UNC5C (n = 1), CLU (n = 1), MS4A6E (n = 1), PICALM (n = 1), and TMEM106B (n = 1) (adjusted P < 0.05). Based on the Bonferroni-corrected threshold (P < 2.30 × 10−4, 0.05/217), six variants differed significantly between AD patients and controls: APOE rs429358 (adjusted P = 1.82 × 10−14), ABCA7 rs3752246 (adjusted P = 3.66 × 10−6), ABCA7 rs3752229 (adjusted P = 1.83 × 10−5), ABCA7 rs3764648 (adjusted P = 3.98 × 10−5), ABCA7 rs4147914 (adjusted P = 1.64 × 10−4), and ABCA7 rs150594667 (adjusted P = 1.77 × 10−4) (Table 2 and Fig. 1). The LD patterns of the ABCA7 variants (rs3752246-rs3752229-rs3764648-rs4147914-rs150594667) were similar between AD patients and controls (Supplementary Fig. 1). The nominally associated common variants (adjusted P < 0.05) are listed in Supplementary File 3. Given that nominally associated variants may also play important roles in AD, we applied a network biology approach using Network Assisted Genomic Association (NAGA) [25]. The NAGA analysis suggested that several genes may be implicated in AD etiology, including APOE, APOC2, APOC1, APOC4, CLPTM1, and TOMM40 (Supplementary Fig. 2).

Table 2 The significant common variants between AD patients and controls.
Fig. 1: Regional association plots.

Regional association plots of the APOE (a) and ABCA7 (b) loci. Purple diamonds represent the sentinel variant in the corresponding locus. Colors show the LD measured as R2 between the sentinel variant and its neighboring variants. cM/Mb, centimorgans per megabase.

Discriminative and predictive performance of PRS

PRS was generated using PRSice-2. As expected, the PRS values in AD patients were significantly higher than those in controls (P < 2.2 × 10−16) (Fig. 2a). The AUC of the model was 0.71 (95% confidence interval: 0.69–0.72) (Fig. 2b). The effect of PRS on AD occurrence was evaluated using a Cox proportional hazards model. All AD cases were separated into quartiles based on their individual PRS. The highest PRS quartile was significantly associated with an earlier onset age compared with the lowest quartile (high PRS vs low PRS, OR = 1.36, P = 4.30 × 10−4, 95% CI: 1.15–1.60). For instance, the age by which 60% of individuals were expected to develop AD was around 70 years in the low PRS group, later than that in the high PRS group (about 66 years). Meanwhile, at the age of 70, the cumulative incidence rate of AD in the high PRS group was higher than that in the low PRS group (approximately 70% and 60%, respectively) (Fig. 2c).

Fig. 2: Discriminative and predictive performance of PRS.

PRS between AD patients and controls (a) (***P < 0.0001, PRS polygenic risk score). The discriminative ability of PRS model (b). The cumulative incidence of AD in high and low PRS groups (c).

Correlations between PRS and AD plasma biomarkers

PRS was inversely associated with plasma Aβ42 (P = 0.0013, Spearman ρ = −0.1487) and the ratio of Aβ42/Aβ40 (P = 1.78 × 10−9, Spearman ρ = −0.2749). No significant correlation between PRS and plasma Aβ40 was observed (P = 0.9170, Spearman ρ = 0.0049). Meanwhile, PRS was positively correlated with plasma T-tau (P = 6.03 × 10−5, Spearman ρ = 0.1853) as well as plasma NFL (P = 0.0162, Spearman ρ = 0.1179) (Fig. 3). Furthermore, using general linear regression, the associations of plasma Aβ42, Aβ42/Aβ40 ratio, and T-tau with PRS remained significant even after adjusting for age and gender (Aβ42, β = −2.941, adjusted P = 0.0034; Aβ42/Aβ40 ratio, β = −4.496, adjusted P = 8.77 × 10−6; T-tau, β = 2.877, adjusted P = 0.0042). Also, plasma NFL was nominally associated with PRS after the adjustment of age and gender (β = 1.777, adjusted P = 0.0762).
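For readers unfamiliar with the statistic reported above, Spearman's ρ is the Pearson correlation computed on ranks, with tied values sharing the average of their ranks. The following self-contained Python sketch illustrates that definition. It is not the software used in the study, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, ρ captures monotonic rather than strictly linear relationships, which suits skewed biomarker concentrations.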

Fig. 3: Correlations between PRSs and plasma biomarkers (Spearman correlation coefficients (ρ) and P values were used to evaluate the correlations).

PRS and Aβ42 (a); PRS and Aβ42/Aβ40 ratio (b); PRS and T-tau (c); PRS and NFL (d).

Gene-level aggregation testing

After quality control, 4277 rare variants were identified in our study. The rare variants were collapsed within genes and their joint effects were investigated. A P value less than 1.52 × 10−3 was considered significant based on Bonferroni correction (0.05/33). ABCA7 reached statistical significance in the analyses of both rare damaging variants and rare damaging missense variants (adjusted P = 1.32 × 10−3 and adjusted P = 7.48 × 10−4, respectively). Gene-based association analysis of rare missense variants revealed that UNC5C and ABCA7 were significantly associated with AD (adjusted P = 1.14 × 10−3 and adjusted P = 1.20 × 10−3, respectively) (Table 3).

Table 3 Significant genes between AD and controls in the SKAT-O test.

Discussion

A number of risk genes contribute to the development of AD. However, the vast majority of studies were performed in the Caucasian population. Most studies focused on reported variants based on array-based SNP genotyping. In this study, we systematically screened 33 AD risk genes in the mainland Chinese population. In the common variant association test, six variants located within APOE and ABCA7 differed significantly between AD patients and controls. PRS was associated with onset age and plasma biomarkers of AD. Pathway enrichment analysis revealed that several processes were associated with AD. Furthermore, gene-based association analyses demonstrated that UNC5C and ABCA7 were associated with AD risk.

APOE, located on chromosome 19q13.2, is the most important risk gene for AD. We found that the ε4 allele of APOE (rs429358) conferred susceptibility to AD, similar to the finding in the Caucasian population [26]. A recent large-scale GWAS also showed that APOE ε4 remains the strongest genetic risk factor [27]. Generally, one APOE ε4 allele increases the risk of developing AD by about 3.7 times in the Caucasian population [28], whereas in our study one APOE ε4 allele increased the risk of AD by 5.7 times. Similarly, in the Japanese population, the APOE ε4 allele exhibited a higher risk effect on AD than in the Caucasian population [26]. These findings suggest that APOE ε4 may be more harmful in the Asian population than in the Caucasian population.

We found that five common ABCA7 variants were associated with AD risk: rs3752246, rs3752229, rs3764648, rs4147914, and rs150594667. Among them, rs3752246 was described previously, while the remaining four variants were novel [29]. ATP-binding cassette, sub-family A, member 7 (ABCA7) is composed of 47 exons and encodes a 220-kDa protein. ABCA7 is expressed in brain tissue and linked to lipid metabolism, regulation of phagocytosis, and amyloid-β metabolism [19]. In 2011, the SNP rs3764650 of ABCA7 reached genome-wide significance in the Caucasian population, providing the first evidence that ABCA7 is a risk gene for AD [9]. Subsequent large Caucasian-based GWASs revealed that rs3752246 and rs4147929 were significantly associated with AD [10, 11]. A meta-analysis showed that three variants increased the risk of developing AD, namely rs3764650, rs3752246, and rs4147929 [29]. Therefore, the association between ABCA7 and AD is well established. In our study, we confirmed that rs3752246 was associated with AD, supporting its role as a risk variant in the pathogenesis of AD. In addition, the remaining four significant variants were in strong LD with rs3752246 in our sample (rs3752229 vs rs3752246: D’ = 0.78, R2 = 0.57; rs3764648 vs rs3752246: D’ = 0.87, R2 = 0.69; rs4147914 vs rs3752246: D’ = 0.78, R2 = 0.52; rs150594667 vs rs3752246: D’ = 1.00, R2 = 0.01). These findings indicate that they may tag the same functional variant [30]. Larger studies and functional experiments are warranted to validate their roles in AD. In addition, using NAGA, we found that several genes, including APOE, APOC2, APOC1, APOC4, CLPTM1, and TOMM40, were implicated in AD pathogenesis. These comprise APOE itself and genes located near APOE [31, 32], indicating the important role of APOE in the etiology of AD.
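The two LD measures quoted above can diverge sharply. A pattern such as D’ = 1.00 with R2 = 0.01 can arise when a rare allele always occurs on the background of a much more common allele. The Python sketch below computes both measures from haplotype and allele frequencies. The function name and the frequencies in the usage note are illustrative assumptions, not values from this study, and this is not the Haploview implementation.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```

For example, with a hypothetical rare allele of frequency 0.01 that is found only on haplotypes carrying a second allele of frequency 0.5, `ld_stats(0.01, 0.01, 0.5)` gives D’ = 1 and r² ≈ 0.01: the rare allele is fully nested within the common one, yet it predicts the common allele's genotype poorly.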

Common variants contribute to AD; however, their individual effects are relatively limited. PRS has been widely applied to identify individuals at high risk for common diseases [33]. Using genotype data from the International Genomics of Alzheimer’s Project, the PRS achieved a prediction accuracy of 0.75–0.84 for AD risk [34]. In the Chinese population, Li et al. genotyped 35 SNPs and built PRS models with a prediction accuracy of 0.61–0.66 for AD risk [35]. In addition, in a recent large Chinese GWAS, the top AUC was 0.73 when combining the significant variants and APOE status [36]. Similarly, in our study, the AUC of the PRS model in AD was 0.71 (ranging from 0.69 to 0.72), indicating that the PRS model could predict AD risk to some extent in the Chinese population. Furthermore, we found that high PRS was associated with an earlier onset age, and the cumulative incidence rate of the high PRS group was higher than that of the low PRS group at the same age. Leonenko et al. showed that PRS could predict the age-specific risk of developing AD [37]. Meanwhile, another study also revealed that PRS was correlated with onset age and AD risk in the Chinese population [35]. Accordingly, a high PRS might help clinicians prioritize the individuals who are most likely to develop AD and to benefit from early prevention and treatment.

We observed that PRS was associated with decreased plasma Aβ42 levels and Aβ42/Aβ40 ratio as well as increased plasma T-tau and NFL levels. In 2018, the ATN classification system was issued, composed of β amyloid deposition (“A”), pathologic tau (“T”), and neurodegeneration (“N”). Although the ATN classification system greatly facilitates the diagnosis of AD, the invasiveness of cerebrospinal fluid sampling and the cost of PET scans constrain its widespread use [38]. Plasma Aβ42, Aβ42/Aβ40 ratio, total tau, and NFL are accessible and potentially useful biomarkers in AD [39,40,41]. In a subgroup of our sample, plasma biomarkers were significantly altered in AD patients, further supporting their utility for the screening and diagnosis of AD. Interestingly, our study revealed a significant relationship between PRS and plasma biomarkers. The effects of genetic risk on AD biomarkers have been studied previously. CSF Aβ42, Aβ42/Aβ40 ratio, T-tau, and P-tau were correlated with PRS in AD patients and controls [35]. In cognitively healthy elders, PRS was associated with CSF NFL levels in individuals without Aβ42 pathology [42]. In the Hong Kong Chinese AD cohort, PRS was associated with plasma Aβ42 level and Aβ42/Aβ40 ratio [43]. Our investigation is the first to show that PRS is associated with plasma NFL and T-tau, and it confirmed that PRS is related to plasma Aβ42 level and the Aβ42/Aβ40 ratio. These results suggest that aggregate genetic risk may modulate individual pathogenic and biological alterations. In addition, given that pathological changes occur more than two decades before the onset of clinical symptoms [44], PRS may be useful for identifying subjects with abnormal plasma AD biomarkers.

Gene-based analysis with the SKAT-O test showed that two genes, ABCA7 and UNC5C, were significantly associated with AD. Intriguingly, in our study, ABCA7 was associated with AD risk in both the common variant association test and the gene-based analysis. Accumulating evidence shows that ABCA7 is a significant risk gene harboring both common and rare risk variants in the development of AD [30]. A high burden of ABCA7 LoF variants and missense variants was observed previously in AD [13, 45]. We found that ABCA7 rare damaging variants were enriched in AD cases, which is in line with a study conducted in a French cohort and further underscores the damaging role of ABCA7 rare variants in AD across different populations [46]. UNC5C is located on 4q22.3 and encodes a protein that mediates neuronal apoptosis [47]. A rare coding mutation, UNC5C T835M, segregated with AD in two families and was associated with AD in large case–control cohorts [48]. Our group previously reported that several rare coding variants may confer a certain risk of AD [49]. A rare missense variant, UNC5C D353N, was present in five affected individuals in an AD family and may be involved in AD [50]. Our study showed that the burden of rare missense variants in UNC5C was significantly associated with AD, further indicating that UNC5C is implicated in AD through rare variants.

Conclusions

The common variant association test indicated that APOE and ABCA7 were associated with AD in the mainland Chinese population. PRS is of potential use in assessing the risk and onset age of AD as well as plasma AD biomarkers. Gene-level aggregation testing indicated that ABCA7 and UNC5C may contribute to the etiology of AD in the mainland Chinese population.