Introduction

Recent biotechnological advances have expanded the study of human diseases from descriptive to quantitative analyses. In particular, genomic variations are a major category of risk factors that contribute to various human diseases1. As genetic variants are present in the human genome, they are a reliable source of information for disease risk prediction throughout life2. Therefore, profiling genetic variation enables disease risk prediction in individuals before disease onset, which is especially valuable for clinical investigations and developing intervention strategies for age-related diseases such as Alzheimer’s disease (AD)3,4,5.

AD is one of the most common neurodegenerative diseases and is highly prevalent in older populations (~10% among people ≥65 years old)6. Genetic factors play a pivotal role in AD pathogenesis, supporting the utility of genetic information in AD risk prediction7,8. In addition, developing effective genetic screening tools for the early prediction of AD is vital for disease management9. However, recent genome-wide association studies (GWASs) have revealed that AD is polygenic in nature, with dozens of loci contributing to disease risk10,11,12,13,14,15,16,17,18,19. APOE-ε4 is the most prevalent genetic risk factor for AD20. However, as other common AD-associated variants exert small to moderate effects on AD risk, they cannot be used individually to infer disease risk10,11,12,13,14,15,16,17,18,19. Therefore, to determine an individual’s risk of developing AD, we need to develop models that encompass multiple informative genetic variants.

Tremendous efforts have been made to develop polygenic score models using genetic information to estimate disease risk21,22. One of the most commonly used is the weighted polygenic risk score (PRS) model, which predicts an individual’s risk of disease by summarizing the risk effects of multiple variants obtained from GWASs23. Numerous studies have investigated the utility of the weighted PRS model for classifying patients with various diseases20,24,25,26,27. In particular, the weighted PRS model can be used to classify clinically and pathologically confirmed AD as well as predict the onset age of AD. Collectively, these findings highlight the applicability of polygenic score models for predicting disease risk, particularly for AD28,29,30.

A weighted PRS model is constructed by summing the risk allele dosages weighted by their corresponding effect sizes, which are derived from GWASs. However, most GWASs calculate the effect size of each variant independently without considering epistatic effects (i.e., the effects of interactions among the variants), resulting in an inaccurate estimation of an individual variant’s contribution to the disease31,32,33,34. Although various modified weighted PRS models have been proposed35,36,37, they have not been thoroughly tested using real-world data and are unlikely to adapt well to high-dimensional genomic data owing to their low model complexity (i.e., insufficient number of model parameters).

Notably, recent studies suggest the possible application of statistical learning (e.g., lasso [least absolute shrinkage and selection operator])38 and deep learning (e.g., neural network) models39,40 for polygenic risk analysis and disease risk classification. Specifically, as neural network models have higher model complexity (i.e., a greater number of model parameters) as well as sophisticated and multilayered architecture, they may be better suited to handle high-dimensional genomic data for disease classification than weighted PRS models. Nevertheless, the performance of lasso and neural network models for AD polygenic risk prediction has not been systematically evaluated. Therefore, it is of interest to investigate whether deep learning models, particularly neural network models, can be used for polygenic risk analysis and AD risk classification.

In this study, we aimed to develop neural network models for modeling AD polygenic risk. In particular, we find that neural network models are effective for classifying patients with AD, outperforming both weighted PRS and lasso models. Furthermore, by combining the predicted risk scores derived from neural network models with AD-associated endophenotypic data, we identify potential pathological mechanisms that contribute to AD polygenic risk. Together, our results suggest that deep learning methods can be used to predict AD risk, stratify at-risk individuals into subgroups, and identify the mechanisms underlying the disease.

Methods

Study data

To investigate the performance of polygenic score models for classifying AD risk, we included the array data from the National Institute on Aging Alzheimer’s Disease Centers (ADC) cohort (phs000372.v1.p1), the Late Onset Alzheimer’s Disease Family Study cohort (“LOAD cohort” hereafter, phs000168.v2.p2), and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort (http://adni.loni.usc.edu/) in our analysis. The demographic data of these cohorts are presented in Supplementary Table 1. The details of the quality control and imputation processes are presented in the Supplementary Methods.

We included two Chinese whole-genome sequencing (WGS) cohorts to study the polygenic score models. The data for Chinese WGS cohort 1 (N = 2340 comprising 1116 patients with AD, 309 patients with mild cognitive impairment [MCI], and 915 age- and sex-matched normal controls [NCs]) have been published18. The data for Chinese WGS cohort 2 (N = 1077 comprising 356 patients with AD, 68 patients with MCI, and 653 age- and sex-matched NCs) have also been published38. The phenotypic data of the participants analyzed in this study were based on the participants’ most recent diagnostic records (as of December 2019). The study was approved by the Clinical Research & Ethics Committees of Joint Chinese University of Hong Kong-New Territories East cluster for Prince of Wales Hospital (CREC Ref no. 2015.461), Kowloon Central Cluster/Kowloon East Cluster for Queen Elizabeth Hospital (KC/KE-15-0024/FR-3), and Human Participants Research Panel of the Hong Kong University of Science and Technology (CRP#180 and CRP#225). All participants provided written informed consent for both study participation and sample collection.

Variant selection for model construction

We selected variants to evaluate the polygenic score models based on the AD GWAS summary statistics reported by Jansen et al.13. For model construction, we applied three different p-value thresholds (<1E−8, <1E−6, and <1E−4) to the resultant variants. We retained the variants detected by all imputed array data from the ADC, LOAD, and ADNI cohorts that also fell into the corresponding p-value ranges for model construction and comparison (selected according to single nucleotide polymorphism [SNP] ID).
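The selection rule described above can be sketched as follows. This is a minimal illustration rather than the study's code: the `select_variants` helper, the in-memory dictionaries, and the toy SNP IDs and p-values are hypothetical. Only the three p-value thresholds and the requirement that a variant be present in all three imputed cohorts come from the text.

```python
from typing import Dict, Iterable, List, Set

# p-value thresholds stated in the text
THRESHOLDS = (1e-8, 1e-6, 1e-4)


def select_variants(
    gwas_pvalues: Dict[str, float],
    cohort_snp_ids: Iterable[Set[str]],
    threshold: float,
) -> List[str]:
    """Return SNP IDs with p < threshold that are present in every cohort."""
    shared = set.intersection(*cohort_snp_ids)  # variants detected in all cohorts
    return sorted(
        snp for snp, p in gwas_pvalues.items() if p < threshold and snp in shared
    )


# Toy (made-up) inputs for illustration only
gwas = {"rs1": 3e-10, "rs2": 5e-7, "rs3": 2e-5, "rs4": 0.02, "rs5": 1e-9}
adc = {"rs1", "rs2", "rs3", "rs4"}
load = {"rs1", "rs2", "rs3", "rs5"}
adni = {"rs1", "rs2", "rs3", "rs4", "rs5"}

# One variant list per threshold, matched across cohorts by SNP ID
selected = {t: select_variants(gwas, [adc, load, adni], t) for t in THRESHOLDS}
```

In this toy example, `rs5` passes the strictest threshold but is dropped because it is absent from one cohort, which mirrors the matching by SNP ID described above.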

For polygenic score analysis in the European-descent datasets, we compared the performance of the weighted PRS, lasso, and neural network models in three different scenarios: (i) using all the data from the three AD cohorts (i.e., ADC, LOAD, and ADNI) as training data; (ii) using all the data from the three AD cohorts for five-fold cross-validation analysis; and (iii) using two AD cohorts (i.e., ADC and LOAD) as training data and the remaining cohort (i.e., ADNI) as validation data.

For (ii), we conducted the five-fold cross-validation 10 times. We preclassified the samples using the createFolds() function from the caret package in R; the classified labels were stored in a text file to allow for a fair comparison with other models (i.e., the weighted PRS and neural network models) for classification accuracy. For (iii), we used the data of 70% of individuals from the LOAD (n = 2995 of 4278) and ADC (n = 3984 of 5692) cohorts for model training and used the data of the remaining 30% (n = 2991 of 9970 total) to evaluate model accuracy at the end of each epoch. We used the data from the ADNI cohort (N = 1382 comprising 689 patients with AD and 693 NCs) as a cross-evaluation dataset to assess the performance of polygenic score models. Of note, to further avoid overestimating the model performance in the validation dataset owing to potential duplicate samples among the three AD cohorts, we conducted identity-by-descent analysis using PLINK. We identified 415 potential duplicate samples (PI_HAT > 0.90; n = 16, 348, and 51 for ADC, LOAD, and ADNI, respectively), removed them from the data, and then reconstructed the models and tested their performance.
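The idea of fixing the fold labels once and reusing them for every model can be sketched as below. This is an illustrative stand-in for the caret workflow, not a reimplementation of createFolds(): the `assign_folds` helper, the stratification by case/control status, the seed value, and the sample IDs are all assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List


def assign_folds(labels: Dict[str, int], k: int = 5, seed: int = 1) -> Dict[str, int]:
    """Assign each sample ID to one of k folds, stratified by case/control label."""
    rng = random.Random(seed)  # fixed seed so every model sees the same folds
    by_class: Dict[int, List[str]] = defaultdict(list)
    for sample_id, label in sorted(labels.items()):
        by_class[label].append(sample_id)

    folds: Dict[str, int] = {}
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        for i, sample_id in enumerate(ids):
            folds[sample_id] = i % k + 1  # folds numbered 1..k
    return folds


# Toy data: 50 cases (1) and 50 controls (0); IDs are hypothetical
toy_labels = {f"S{i:03d}": int(i < 50) for i in range(100)}
fold_of = assign_folds(toy_labels, k=5, seed=1)

# Lines that could be written to a text file and reused by all three models
fold_lines = [f"{sid}\t{fold}" for sid, fold in sorted(fold_of.items())]
```

For each fold, the samples carrying that fold number form the 20% test set and the remaining samples form the 80% training set.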

Construction and testing of the weighted polygenic risk score model

We constructed the weighted PRS model by elementwise multiplication of the allele dosages and the corresponding effect sizes from the AD GWAS summary statistics13, summed across the variants selected according to the p-value thresholds. We applied the same calculation method to evaluate the reported model for the Chinese population11. We calculated weighted PRSs in RStudio (v1.3.1056) using R (v4.0.2). We further used the effect sizes of the selected variants from another set of AD GWAS summary statistics (i.e., the IGAP 2019 Rare Variant Analysis stage 1 dataset)12 to generate a parallel weighted PRS model. More than 96% of the variants selected in the first GWAS were captured by the second GWAS for model construction.
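As a minimal sketch of this calculation (the study itself used R), the score for one individual is the sum over variants of the risk allele dosage multiplied by the effect size. The `weighted_prs` helper, the variant IDs, and the numeric values below are hypothetical; treating the effect sizes as per-allele log odds ratios is likewise an assumption.

```python
from typing import Dict


def weighted_prs(dosages: Dict[str, float], effect_sizes: Dict[str, float]) -> float:
    """Sum of (risk-allele dosage x effect size) over variants present in both inputs."""
    return sum(
        dosages[snp] * beta for snp, beta in effect_sizes.items() if snp in dosages
    )


# Made-up effect sizes (assumed to be per-allele log odds ratios)
betas = {"rsA": 0.5, "rsB": -0.2, "rsC": 0.1}

# Made-up dosages (0, 1, or 2 copies of the risk allele) for one individual
person = {"rsA": 2, "rsB": 1, "rsC": 0}

score = weighted_prs(person, betas)  # 2*0.5 + 1*(-0.2) + 0*0.1
```

Variants missing from an individual's genotype data are skipped here; how missing genotypes were handled in the study is not stated, so this choice is only a convenience for the example.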

In addition, we used LDpred35, Winner’s curse correction36, AnnoPred37, and SBayesR35 to model polygenic risk according to the instructions in each program’s user manual. We ran LDpred, AnnoPred, and SBayesR on the variant lists before the linkage disequilibrium (LD)-clumping steps, as these applications can utilize LD information for PRS modeling. Based on a p-value threshold of 1E−4, 1149 sites were excluded by LDpred because of its built-in filtering criteria (which remove all A/T and G/C SNPs), and 6860 sites were excluded by AnnoPred because the software was designed to only take variants listed in the HAPMAP3 dataset. Of the four programs, only AnnoPred produced no output, possibly because too many variants were filtered. For model evaluation in scenarios (i) (i.e., all samples) and (ii) (i.e., five-fold cross-validation), we recalculated the effect sizes by conducting logistic regression in each training set, adjusting for the confounding effects of age, sex, and genomic structure (represented by the top five principal components). For scenario (iii) and all remaining weighted PRS analyses, we obtained the effect sizes from the AD GWAS results reported by Jansen et al., whose large sample size provides a less biased estimate of variant effects.

For the Chinese data, we obtained the effect sizes from meta-analysis results of two Chinese datasets. We also used another published AD GWAS in the Chinese population to evaluate model performance for classifying AD risk18.

Construction and testing of the lasso model

We applied logistic regression to regress out the potentially confounding effects of age, sex, and genomic structure (represented by the top five principal components). We constructed the logistic lasso regression model using the cv.glmnet() function from the glmnet package, with five-fold cross-validation (alpha = 1, type.measure = “mse”, nfolds = 5) for the variants selected according to the p-value thresholds38. We selected the λ-value that retained the most variants for the risk score calculation and used the predict() function to retrieve the polygenic scores.

Regarding the Chinese WGS datasets, we used Chinese WGS cohort 1 for the training dataset; we applied the same approach for model construction using the information from the 37 variants selected by the association test (regressing out the potentially confounding effects of age, sex, and genomic structure). We subsequently used Chinese WGS cohort 2 to evaluate the resultant model. We performed all analyses of the lasso polygenic score model in RStudio (v1.3.1056) using R programming (v4.0.2). We fixed the value of the random seed to the same constant value before performing all analyses.

Construction and testing of the neural network model

We constructed the neural network model using the Sequential() function from the Keras package, an API for TensorFlow. Before analyzing the European-descent population data, we annotated the selected SNPs for their associated loci using ANNOVAR41 (77, 141, and 696 loci for p-value thresholds <1E−8, <1E−6, and <1E−4, respectively). We designed a seven-layer model, with the first and third layers as dropout layers (dropout rate = 0.2 or 0.3). We designed the number of nodes as follows: 3 × number of loci (based on the assumption that a maximum of three different haplotypes are associated with AD in each locus), 1 × number of loci (corresponding to the locus number), 22 (corresponding to the chromosome number), 5 (an arbitrary number corresponding to the potential number of pathways that affect the disease in parallel), and 1 (corresponding to the risk score). We applied exponential decay of the learning rate using the ExponentialDecay() function to accelerate the analysis (decay steps = 100,000, decay rate = 0.96, staircase = True) with the sigmoid function as the activation function. We applied the binary crossentropy loss function and evaluated model accuracy using the accuracy, auROC, and auPRC metrics. We applied the neural network models for polygenic score analysis in the European-descent population for three scenarios as described in the previous section: (i) no validation, (ii) five-fold cross-validation, and (iii) validation using an independent cohort. For (i), we configured the models with 2000 epochs, a batch size of 256, and a learning rate of 0.5. For (ii), we configured the models with 1500 epochs, a batch size of 1024, and a learning rate of 0.5. For (iii), we chose the number of epochs (i.e., 500–800) by observing the model performance plot for the training and validation datasets. We further applied early stopping using the EarlyStopping() function (patience = 50 or 100 epochs) when examining the transethnic performance of the neural network model (i.e., training models on Chinese data before applying them to European-descent data, or vice versa).
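The layer widths and the learning-rate schedule described above can be written out in plain Python to make the arithmetic explicit. This sketch does not use Keras: the helper names are hypothetical, and treating 0.5 as the initial rate passed to the schedule is an assumption. The decay formula follows the documented behavior of the Keras ExponentialDecay schedule, initial rate × decay rate ^ (step / decay steps), with the exponent floored when staircase is enabled.

```python
import math
from typing import List


def layer_widths(n_loci: int) -> List[int]:
    """Dense-layer widths described in the text for the European-descent models."""
    return [3 * n_loci, n_loci, 22, 5, 1]


def decayed_learning_rate(
    step: int,
    initial_lr: float = 0.5,  # assumed to be the initial rate fed to the schedule
    decay_steps: int = 100_000,
    decay_rate: float = 0.96,
    staircase: bool = True,
) -> float:
    """Exponential decay: initial_lr * decay_rate ** (step / decay_steps)."""
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # rate drops in discrete steps
    return initial_lr * decay_rate ** exponent


# Widths for the three locus counts reported in the text (77, 141, and 696)
widths_by_threshold = {n: layer_widths(n) for n in (77, 141, 696)}

# Learning rate before and after the first decay boundary
lr_before = decayed_learning_rate(99_999)
lr_after = decayed_learning_rate(100_000)
```

With staircase decay and 100,000 decay steps, the rate stays at its initial value until 100,000 optimizer steps have elapsed, so under these settings the schedule only takes effect in long training runs.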

During model training for the Chinese WGS datasets, we used Chinese WGS cohorts 1 and 2 as the training and validation datasets, respectively. Accordingly, we designed a seven-layer model for the study, with the second and fourth layers as dropout layers (dropout rate = 0.3); the numbers of nodes in each layer were 50, 30, 10, 5, and 1. We applied the binary crossentropy loss function and evaluated model accuracy using the accuracy, auROC, and auPRC metrics. We configured the models with a batch size of 256. We chose the number of epochs (i.e., 1000) by observing the model performance plot for the training and validation datasets. Moreover, we used the backend.function() function to extract the outputs of the nodes in the penultimate layer for further analyses. We fixed the value of the random seed to the same constant value before conducting all analyses.

Construction and testing of the graph neural network model

We modeled disease risk as a graph classification problem. In brief, each participant was represented in a graph with nodes denoting the 37 selected variants, edges denoting pairwise LD (calculated by PLINK) among the variants, and graph labels denoting phenotypes. For node features, in addition to the allele dosage, we considered the biological properties of variants, including whether they resided in coding or untranslated regions, and the number of events of histone, open chromatin, polymerase, and transcription factor binding. We retrieved these biological properties of variants from the SNPnexus database42. Considering the possible variations of LD among ethnic backgrounds, we first inferred the ethnic backgrounds of 11,352 participants from the ADNI, ADC, and LOAD cohorts using GRAF-pop software43. Accordingly, for each ethnic group, we obtained the LD for 37 variants using the 1000 Genomes Project Phase 3 data44.

To construct the graph neural network model, all node features were normalized dimension-wise. We used an LD R² threshold of 0.6 to determine whether two variants were connected and created the adjacency matrix for the edge information. In particular, for each individual, we used the LD data obtained from the matched ethnic background to construct the adjacency matrix. We followed the common practice45 of training a three-layer graph convolutional network46 with 128 hidden dimensions. We used two fully connected layers as the final classifier with 64 dimensions. We adopted ReLU as the nonlinear function and employed global max pooling. We implemented the model using PyTorch47 and trained it using the Adam optimizer48 with early stopping (patience step = 20).
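The edge construction can be illustrated with a short sketch that thresholds pairwise LD values into a symmetric 0/1 matrix. The `build_adjacency` helper, the variant names, and the LD values are hypothetical. Whether the threshold comparison in the study was inclusive is not stated, so the use of "greater than or equal to" here is an assumption.

```python
from typing import Dict, List, Tuple


def build_adjacency(
    variants: List[str],
    pairwise_r2: Dict[Tuple[str, str], float],
    threshold: float = 0.6,
) -> List[List[int]]:
    """Symmetric 0/1 matrix; an edge joins two variants whose LD meets the threshold."""
    index = {v: i for i, v in enumerate(variants)}
    n = len(variants)
    adj = [[0] * n for _ in range(n)]
    for (a, b), r2 in pairwise_r2.items():
        if a != b and r2 >= threshold:  # inclusive comparison is an assumption
            i, j = index[a], index[b]
            adj[i][j] = adj[j][i] = 1
    return adj


# Made-up LD values for four hypothetical variants
variants = ["v1", "v2", "v3", "v4"]
ld = {("v1", "v2"): 0.85, ("v1", "v3"): 0.10, ("v2", "v3"): 0.60, ("v3", "v4"): 0.59}

adjacency = build_adjacency(variants, ld)
```

Because LD differs among ethnic backgrounds, one such matrix would be built per ethnic group and then assigned to each individual according to the inferred background.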

Whole-genome sequencing

We performed WGS (5× coverage) using Novogene. We sequenced the genomic DNA libraries on Illumina HiSeq X Ten and NovaSeq platforms (San Diego, CA, USA) (150-bp paired-end reads). We adopted the GotCloud pipeline49 to detect variants from our low-pass WGS data. In brief, we subjected sequencing data to FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for quality control and Trimmomatic50 to trim and filter low-quality reads. We mapped clean data to the GRCh37 reference genome containing decoy fragments using BWA-mem. We conducted subsequent analysis by subjecting data to the GotCloud pipeline with data processing and variant detection using the default settings18. We then subjected the clean genotype files to Beagle51 for genotyping refinement. For Chinese WGS cohort 2, we also used Thunder52 for genotyping refinement after Beagle processing.

Analysis of plasma protein and brain imaging data

Next, we analyzed plasma protein and brain imaging data collected from a subgroup of participants from Chinese WGS cohort 238. Specifically, we analyzed plasma amyloid-beta (Aβ)42, Aβ40, tau, and neurofilament light polypeptide (NfL) levels in 157 patients with AD and 125 NCs by single-molecule detection assay (Neurology 3-Plex A Advantage Kit, #101995; NF-light Advantage Kit, #103186; Quanterix, Billerica, MA, USA). We also examined plasma p-tau181 levels in 154 patients with AD and 118 NCs (pTau-181 Advantage V2 Kit, #103714, Quanterix, Billerica, MA, USA). Detection was performed at the Quanterix Accelerator Lab (Boston, MA, USA). Moreover, the plasma samples of 97 patients with AD and 69 NCs from Chinese WGS cohort 2 were further subjected to Olink Proteomics (Boston, MA, USA) to determine the abundance of 1,160 plasma proteins by proximity extension assay. The following panels were used for the analysis: Cardiometabolic (91802), Cardiovascular II (91202), Cardiovascular III (91203), Cell Regulation (91702), Development (91703), Immune Response (91701), Inflammation (91301), Metabolism (91801), Neuro Exploratory (91502), Neurology (91501), Oncology II (91402), Oncology III (91403), and Organ Damage (91901).

For brain imaging analysis, we retrieved T1-weighted magnetization-prepared rapid acquisition with gradient-echo (MPRAGE) and fluid-attenuated inversion recovery (FLAIR) sequences for 78 patients with AD and 104 NCs from Prince of Wales Hospital (Hong Kong, China). We deidentified the raw imaging files and sent them to BrainNow Medical Technology (Hong Kong, China) to analyze volumetric information in different brain regions and white matter hyperintensity levels. We did not perform multiple testing adjustment because of the limited sample sizes of the plasma protein and brain magnetic resonance imaging data.

Statistical analysis

We performed a meta-analysis with a fixed-effects model using METASOFT software (v2.0.0) for variant analysis. We evaluated the classification accuracy of the models by calculating the area under the receiver operating characteristic curve (auROC) using the roc() function from the pROC package or the area under the precision–recall curve (auPRC) using the pr.curve() function from the PRROC package. We estimated the 95% confidence intervals of the auROC using bootstrap methods from the ci.auc() function. We used the roc.test() function using the bootstrap method to test the potential differences in auROCs obtained from the different models. We determined the low-, medium-, and high-risk groups by fitting risk scores to a Gaussian mixture model using the normalmixEM() function from the mixtools package for the patients with MCI in Chinese WGS cohort 1 (k = 3, maxit = 200, ECM = T). We calculated the probability of an individual being classified into the low- or high-risk group by using the corresponding scores as the input for the fitted probability distributions (using values of μ and σ from the fitted Gaussian mixture model). Meanwhile, we calculated the probability of an individual being classified into the medium-risk group by calculating the difference between 1 and the sum of the probabilities of being in the low- or high-risk group.
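To make the auROC metric concrete, the sketch below computes it as the probability that a randomly chosen patient scores higher than a randomly chosen control, and adds a percentile bootstrap interval. This is not the pROC implementation: the helper names, the stratified resampling, the number of resamples, the seed, and the toy scores are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_ci(
    scores: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 1000,
    seed: int = 1,
    alpha: float = 0.05,
) -> Tuple[float, float]:
    """Percentile interval from resampling cases and controls separately."""
    rng = random.Random(seed)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    stats: List[float] = []
    for _ in range(n_boot):
        bc = rng.choices(cases, k=len(cases))
        bn = rng.choices(controls, k=len(controls))
        stats.append(auroc(bc + bn, [1] * len(bc) + [0] * len(bn)))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Made-up risk scores: three patients (1) and three controls (0)
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.7, 0.2]
toy_status = [1, 1, 1, 0, 0, 0]

point = auroc(toy_scores, toy_status)  # 8 of 9 case-control pairs ranked correctly
lower, upper = bootstrap_ci(toy_scores, toy_status)
```

The pairwise definition is quadratic in sample size, which is fine for illustration; rank-based formulas give the same value more efficiently for cohorts of thousands of participants.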

In addition, we performed an association analysis between polygenic score or risk group and disease phenotype by logistic regression using the glm() function from the stats package. We also performed an association analysis between polygenic score or risk/phenotype group and cognitive performance, plasma biomarkers, and brain volume using a robust linear regression model using the lmrob() function from the robustbase package, with age, sex, and genomic structure (represented by the top five principal components) as covariates. For cognitive performance, we applied rank-based, inverse-normal transformation to the cognitive scores using the RankNorm() function from the RNOmni package, with age, sex, and genomic structure (represented by the top five principal components) as covariates. Regarding the brain imaging data, we further included intracranial volume as a covariate to normalize the possible interindividual variation in brain volume. We conducted the Spearman’s rank correlation test using the cor.test() function in R to examine the performance of models constructed from different ethnic backgrounds.
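The rank-based inverse-normal transformation mentioned above can be sketched as follows. The `rank_inverse_normal` helper is hypothetical and is not the RNOmni code: it maps each value's rank r among n observations to the standard normal quantile of (r − k) / (n − 2k + 1), and the Blom offset k = 3/8 and the averaging of tied ranks are assumptions for this example.

```python
from statistics import NormalDist
from typing import List, Sequence


def rank_inverse_normal(values: Sequence[float], k: float = 0.375) -> List[float]:
    """Map values to normal quantiles via (rank - k) / (n - 2k + 1); ties share a rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average rank for the tie group
        for j in range(pos, end + 1):
            ranks[order[j]] = avg_rank
        pos = end + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - k) / (n - 2 * k + 1)) for r in ranks]


# Made-up cognitive scores for five participants (two are tied)
raw_scores = [30, 12, 25, 25, 18]
transformed = rank_inverse_normal(raw_scores)
```

The transformation keeps the ordering of the scores while pulling skewed or bounded cognitive scores toward a normal shape, which is why it pairs naturally with the regression analyses described above.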

We performed a Gene Ontology enrichment analysis of the UniProt IDs in the Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/). Moreover, we performed a protein–protein interaction network analysis of the UniProt IDs in the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (https://string-db.org/). For our cluster analysis, we applied k-means clustering to separate plasma proteins into individual clusters using the kmeans() function from the stats package for the absolute values of t-statistics obtained from association tests between the levels of plasma proteins (i.e., normalized protein expression) and the polygenic scores from the five different modules. We determined the optimal number of clusters by using the elbow method implemented in the fviz_nbclust() function from the factoextra package.

For the cell-type enrichment analysis, we obtained the gene expression levels measured by RNA sequencing in individual blood cell types from the BLUEPRINT database (http://dcc.blueprint-epigenome.eu/). We performed a cell-type enrichment analysis using the TissueEnrich package. Briefly, we subjected cell-type-specific transcript levels measured as fragments per kilobase per million mapped fragments (FPKM) for 1159 plasma proteins to the teGeneRetrieval() function to first identify genes expressed by specific groups of cells (foldChangeThreshold = 1.5, expressedGeneThreshold = 5). We then used the teEnrichmentCustom() function for the enrichment analysis to identify the specific cell types associated with individual gene clusters (tissueSpecificGeneType = 1).

To stratify participants according to the outputs from the penultimate layer in the neural network model, we determined the optimal number of groups by using the elbow method implemented in the fviz_nbclust() function from the factoextra package. Then, we used the umap() function from the umap package in R to apply the uniform manifold approximation and projection (UMAP) method to project individual participants onto the two-dimensional plane for visualization. To examine individual variants’ contributions to the polygenic score, we conducted a partial correlation analysis using the pcor() function (method = “spearman”) from the ppcor package in R. We performed the annotation of variant functions by submitting the SNP rsID to the SNPnexus database (https://www.snp-nexus.org/v4/citation/)42.

Data visualization

We generated a schematic diagram of the study using Microsoft PowerPoint (v2105). We generated heatmaps of AD classification accuracy (i.e., auROC and auPRC), box plots, volcano plots, bar charts, and dot plots using GraphPad Prism (v8.3.0). We plotted ROC and PR curves using the plot() function in R. Moreover, we generated histograms of polygenic scores using the ggplot() function from the ggplot2 package with the geom_density_ridges_gradient() function from the ggridges package. We generated a heatmap to visualize distinct protein clusters using the heatmap.2() function from the gplots package. In addition, we visualized the protein–protein network using Cytoscape (v3.8.2) based on the node and interaction score information obtained from the STRING database (v11.0). We annotated candidate cis-regulatory regions and other epigenetic signatures using the SCREEN database (https://screen.encodeproject.org/)53, and visualized transcription factor binding events using the University of California Santa Cruz Genome Browser (https://genome.ucsc.edu/)54,55. We also annotated the chromatin accessible regions from human brain single-cell ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data56.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

To systematically evaluate the performance of different polygenic score models for AD risk classification, we obtain the genotype and phenotype data of three AD cohorts: the ADNI cohort57, the LOAD cohort58, and the ADC cohort59,60 (N = 11,352 comprising 6681 patients with AD and 4671 NCs; Supplementary Table 1). For model construction, we select AD-associated variants from the AD GWAS summary statistics reported by Jansen et al.13 with three p-value thresholds (1E−4, 1E−6, and 1E−8), which yield 8100, 2959, and 1799 SNPs, respectively (Supplementary Table 2). Figure 1 shows a schematic flow diagram of the study.

Fig. 1: Study schematic.

Schematic diagram showing the study design. AD, Alzheimer’s disease; ADC, National Institute on Aging Alzheimer’s Disease Center cohort; ADNI, Alzheimer’s Disease Neuroimaging Initiative cohort; ATN, amyloid-beta, tau, and neurofilament light polypeptide; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; LOAD, Late Onset Alzheimer’s Disease Family Study cohort; MCI, mild cognitive impairment; n, number of samples or variants; NC, normal control; NN, neural network; PR, precision–recall; PRS, polygenic risk score; p, p-values; p-tau181, tau phosphorylated at threonine-181; ROC, receiver operating characteristic; TNF, tumor necrosis factor; WGS, whole-genome sequencing.

Evaluation of the weighted polygenic risk models for Alzheimer’s disease risk prediction

To examine the performance of the weighted PRS models for classifying AD risk, we calculate PRSs based on the effect sizes (i.e., weights) from the AD GWAS summary statistics reported by Jansen et al.13. Besides including all the variants from that study, we also include the variants that reside outside the APOE locus (chr19:44000000-46000000; GRCh38), the region that harbors the most prevalent risk factor for AD, to estimate their polygenic risk effects. Meanwhile, we apply LD-based clumping to obtain the minimum number of variants needed for classifying disease risk. We evaluate model performance by calculating the auROC and auPRC, with higher values indicating more accurate AD classification. We show that the weighted PRS model constructed from the variant set with the greatest number of variants (n = 8100; p < 1E−4) after LD-based clumping yields the highest classification accuracy (auROC: ~0.67; Supplementary Fig. 1; Supplementary Table 3). Moreover, including only the genetic variants outside the APOE locus still provides information for AD risk classification, as suggested by auROCs of ~0.57 to 0.59 (Supplementary Fig. 1; Supplementary Table 3).

To assess the performance of different weighted PRS models for classifying AD risk, we conduct a parallel weighted PRS analysis (designated wPRS2) using the effect sizes from the summary statistics of the International Genomics of Alzheimer’s Project (IGAP) 2019 Rare Variant Analysis stage 1 data12. We find no significant differences in the accuracy of AD risk classification between the scores generated from the two sets of AD GWAS summary statistics (Supplementary Fig. 1; Supplementary Table 3). Meanwhile, we also construct modified weighted PRS models using different tools including LDpred35, Winner’s curse correction36, AnnoPred37, and SBayesR35. These modified models do not significantly improve the accuracy of AD risk classification (auROC: ~0.67; Supplementary Fig. 1; Supplementary Table 4). Together, these findings demonstrate that genetic information can be used for AD risk classification (auROC: ~0.67 from the weighted PRS models), providing a basis for further evaluation of the performance of neural network models for classifying AD risk.

The neural network model outperforms both lasso and weighted polygenic risk score models for Alzheimer’s disease risk prediction

To evaluate the performance of a neural network model for predicting AD risk, we construct a seven-layer neural network model for disease risk classification with the same sets of variants used to construct the weighted PRS models (see above: 8100, 2959, and 1799 variants). In addition, we construct a lasso model to model polygenic risk in each scenario as a comparison, because we previously showed that polygenic scores derived from lasso models can be used for disease risk classification38.

First, to examine the potential of using the three models for disease risk classification, we construct the models based on all data from the three AD cohorts (N = 11,352; Supplementary Table 1). We find that for all three models, including more SNPs in the model construction increases the accuracy of AD risk classification (Supplementary Fig. 2; Supplementary Table 5). In particular, when we include 8100 SNPs in the model construction, the prediction accuracy of the neural network model is nearly perfect (auROC = 1.00) and significantly higher than that of both the lasso (auROC = 0.94; p < 0.001) and weighted PRS models (auROC = 0.71; p < 0.001; Supplementary Fig. 2; Supplementary Table 5). However, the high auROC values (>0.90) obtained from the neural network and lasso models suggest possible overfitting during the model training steps. Therefore, the model performance should be further evaluated and compared using samples that are independent of those used in the model training.

To mitigate overfitting, we conduct a five-fold cross-validation analysis that trains the model using 80% of the data and tests model performance with the remaining 20% of the data. Again, for all three models, including more SNPs improves AD risk classification accuracy. Moreover, when we use 8100 SNPs in the model construction, the neural network model (auROC = 0.73) exhibits greater prediction accuracy than both the lasso (auROC = 0.72; p < 0.001) and weighted PRS models (auROC = 0.69; p < 0.001; Supplementary Fig. 3; Supplementary Table 6). Therefore, our findings suggest that the neural network model predicts AD risk better than both the weighted PRS and lasso models.

To evaluate the performance of the three models (i.e., the weighted PRS, lasso, and neural network models) for predicting disease risk across different cohorts, we train the models using 70% of the ADC and LOAD data and then evaluate and fine-tune the models using the remaining 30% of the data. We then validate the model performance in the ADNI dataset. Again, including more SNPs in the model construction achieves higher classification accuracy (Fig. 2a, b; Table 1; Supplementary Fig. 4; Supplementary Table 7). Moreover, when we use the same number of SNPs for the model construction, the neural network model outperforms the weighted PRS and lasso models for AD risk classification. For example, when we include 8100 SNPs in the model construction, the auROCs for the weighted PRS, lasso, and neural network models in the ADC cohort are 0.70, 0.81, and 0.84, respectively; the auPRCs are 0.84, 0.89, and 0.92, respectively (Fig. 2a–d; Table 1; Supplementary Table 7). Of note, the neural network model also performs better for classifying AD risk than the other two models in the ADNI cohort (as suggested by higher auPRC values) (Fig. 2; Table 1; Supplementary Table 7). Moreover, to avoid overestimating the model performance, we remove potential duplicate samples (n = 415) inferred by our identity-by-descent analysis (PI_HAT > 0.90) and reconstruct the models. Consistently, we show that the neural network model outperforms the other two models, as suggested by the higher auROC and auPRC values (Supplementary Fig. 5; Supplementary Table 8). Hence, our findings demonstrate the superiority of the neural network model for AD risk classification.
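The duplicate-removal step can be illustrated with a small sketch. It assumes that pairwise relatedness estimates are available as (sample, sample, PI_HAT) tuples and that the second member of each flagged pair is dropped; the text does not state which member of a pair is retained, so that choice, the function name, and the toy sample IDs are assumptions.

```python
from typing import Iterable, List, Set, Tuple


def drop_duplicates(
    samples: List[str],
    pairs: Iterable[Tuple[str, str, float]],
    cutoff: float = 0.90,
) -> List[str]:
    """Drop the second sample of each pair whose PI_HAT exceeds the cutoff."""
    removed: Set[str] = set()
    for a, b, pi_hat in pairs:
        # Skip pairs already resolved by an earlier removal.
        if pi_hat > cutoff and a not in removed and b not in removed:
            removed.add(b)
    return [s for s in samples if s not in removed]


# Hypothetical sample IDs and pairwise PI_HAT estimates.
samples = ["s1", "s2", "s3", "s4"]
pairs = [("s1", "s2", 0.98), ("s3", "s4", 0.12)]

kept = drop_duplicates(samples, pairs)
print(kept)
```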

Fig. 2: Application of the weighted polygenic risk score, lasso, and neural network models for Alzheimer’s disease risk classification.

a, b Performance of the wPRS, lasso, and NN models for classifying patients with AD as indicated by (a) auROCs and (b) auPRCs. The variant pools used for model construction were selected according to the p-value cutoffs shown on the left side of each panel. c, d Representative plots showing the AD risk classification accuracy of different models constructed using variants with p < 1E−4 in individual cohorts. c ROC curves and d PR curves showing AD risk classification accuracy in different cohorts. AD, Alzheimer’s disease; ADC, National Institute on Aging Alzheimer’s Disease Center cohort; ADNI, Alzheimer’s Disease Neuroimaging Initiative cohort; auPRC, area under the precision–recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; LOAD, Late Onset Alzheimer’s Disease Family Study cohort; NN, neural network; p, p-value; PR, precision–recall; ROC, receiver operating characteristic; wPRS, weighted polygenic risk score.

Table 1 Model performance for Alzheimer’s disease classification.

Effects of confounding factors on Alzheimer’s disease risk prediction

Age and sex are risk factors for AD61. Ethnicity also influences AD risk, as the risk effects of specific genetic variants can vary across ethnic groups62. Hence, we assess the performance of these polygenic score models in subgroups of people stratified by age, sex, or ethnicity. For weighted PRS models constructed using the GWAS results of European-descent populations, we find significantly lower accuracy of AD risk classification in people of African-American descent (n = 713; auROC = 0.60; p < 0.001) and Latin-American descent (n = 604; auROC = 0.60; p < 0.001) than in people of European descent (n = 9940; auROC = 0.69). On the other hand, the neural network model exhibits similar accuracy for classifying AD risk between people of European descent (auROC = 0.80) and African-American descent (auROC = 0.84) but lower accuracy in people of Latin-American descent (auROC = 0.77; p < 0.05; Supplementary Fig. 6; Supplementary Table 9). In addition, in people of European descent, we observe similar classification accuracy between males and females (Supplementary Fig. 7; Supplementary Table 10), while older age groups (≥72 years old) show higher classification accuracy than younger groups (<72 years old; p < 0.05; Supplementary Fig. 8; Supplementary Table 11). Hence, our results suggest that polygenic score models may exhibit variable performance in AD risk classification among people of different ages and ethnic backgrounds.

Polygenic score models for Alzheimer’s disease in the Chinese population

To further test the performance of these neural network models for classifying AD risk in non-European–descent populations, we apply the models to two Chinese AD cohorts with available WGS data: Chinese WGS cohort 1 (N = 2340 comprising 1116 patients with AD, 309 patients with MCI, and 915 NCs)18 and Chinese WGS cohort 2 (N = 1077 comprising 356 patients with AD, 68 patients with MCI, and 653 NCs) (Supplementary Table 1)38. Notably, the weighted PRS models constructed based on the AD GWAS summary statistics of Jansen et al. show poor classification accuracy for both Chinese WGS cohorts 1 and 2 (auROCs: ~0.50; Supplementary Figs. 9, 10; Supplementary Tables 12–15). Meanwhile, the lasso and neural network models constructed based on three AD cohorts (i.e., ADC, LOAD, and ADNI) classify AD risk in the two Chinese cohorts with moderate accuracy (auROC = 0.63–0.67), although less so than that in the European-descent populations (auROC = 0.72–0.73; Supplementary Fig. 3). Hence, the variants selected based on the AD GWAS summary statistics of Jansen et al. are not representative of AD risk in the Chinese population and are thus unsuitable for constructing polygenic score models for AD in this population.

To obtain variants that are associated with AD in the Chinese population for modeling AD polygenic risk, we gather the AD-associated variants reported from several AD GWASs undertaken across people of different ethnic backgrounds10,12,13,14,15,17,19,63,64, yielding 216 AD GWAS hits that may contribute to AD (Supplementary Table 16; Supplementary Data 1). Logistic regression analysis including age, sex, and genomic structure (represented by the top five principal components) as covariates shows that 38 of the 216 SNPs are significantly associated with AD in the Chinese population (in either Chinese WGS cohort 1 or 2; Supplementary Data 2, 3). A meta-analysis of the two Chinese cohorts shows that among these 38 SNPs, 33 are significantly associated with AD (meta-p < 0.05; Table 2; Supplementary Data 4) and an additional four SNPs (i.e., rs16824536, rs9271058, rs61732533, and rs111278892) exhibit concordant risk trends in both cohorts (Supplementary Data 4). Thus, we find 37 variants that have been reported in European AD GWASs and are associated with AD in the Chinese population; these variants are useful for modeling AD polygenic risk in the Chinese population.
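The text does not state which meta-analysis method is used. As one common possibility, the sketch below assumes a fixed-effect, inverse-variance-weighted combination of per-cohort effect estimates with a two-sided normal p-value. The function name and the per-cohort effect sizes and standard errors are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance-weighted pooled effect, its standard error, and a two-sided p-value."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under a standard normal
    return beta, se, p


# Hypothetical log odds ratios and standard errors for one SNP in two cohorts.
meta_beta, meta_se, meta_p = fixed_effect_meta(betas=[0.30, 0.20], ses=[0.10, 0.10])
print(meta_beta, meta_se, meta_p)

# A SNP shows a concordant risk trend when both cohort effects share the same sign.
concordant = (0.30 > 0) == (0.20 > 0)
```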

Table 2 Variants significantly associated with Alzheimer’s disease in the two Chinese Alzheimer’s disease whole-genome sequencing cohorts.

Using these 37 AD-associated SNPs, we calculate the polygenic scores using the weighted PRS, lasso, and neural network models in Chinese WGS cohort 1 (see the “Methods” section; Supplementary Data 5). The weighted PRS and lasso models for AD risk classification yield auROCs of 0.64 and 0.71, respectively (Fig. 3a; Supplementary Fig. 11; Supplementary Table 17), suggesting that the abovementioned variants can be used to classify people at risk of AD in the Chinese population. The modified PRS models (i.e., SBayesR and Winner’s curse models) do not show superior performance for AD classification compared to the weighted PRS model (Supplementary Fig. 12). Again, we find that using the variants residing outside the APOE locus is sufficient to distinguish patients with AD from NCs (auROC = 0.61; Supplementary Fig. 11; Supplementary Table 17). Thus, we demonstrate that variants in the non-APOE region contribute to AD pathogenesis, corroborating the findings of other AD polygenic score studies65,66 and our results in the previous section.

Fig. 3: Polygenic risk analysis for Alzheimer’s disease in the Chinese population.

a ROC and b PR curves of the polygenic score classification of patients with AD in Chinese WGS cohort 1. c Distribution of polygenic risk scores derived from the NN model for each phenotype group. The definitions of the low-, medium-, and high-risk groups are shown in the upper panel. d Percentages of each phenotype group in the low-, medium-, and high-risk groups. e–h Associations between polygenic risk score and MMSE score in e all participants, f non-AD participants (i.e., NCs plus patients with MCI), g APOE-ε3 homozygous carriers, and h APOE-ε4 carriers. Data are presented as box-and-whisker plots. Boxes indicate the 25th to 75th percentiles, and whiskers indicate the 10th and 90th percentiles. The numbers of individuals in the corresponding group are shown at the bottom of each plot. Robust linear regression model: ***p < 0.001, **p < 0.01, *p < 0.05. AD, Alzheimer’s disease; auPRC, area under the precision–recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; MCI, mild cognitive impairment; MMSE, Mini–Mental State Examination; NC, normal control; NN, neural network; p, p-values; PR, precision–recall; ROC, receiver operating characteristic; wPRS, weighted polygenic risk score.

Next, we evaluate whether the neural network model also exhibits better performance for predicting AD in the Chinese population than the weighted PRS and lasso models. Notably, in Chinese WGS cohort 1, the neural network model (auROC = 0.77; auPRC = 0.77) distinguishes patients with AD from NCs more accurately than the weighted PRS (auROC = 0.66; auPRC = 0.71; p < 0.001) and lasso regression models (auROC = 0.71; auPRC = 0.74; p < 0.001) (Fig. 3a; Supplementary Fig. 11; Supplementary Table 17). In addition, the neural network model classifies individuals with MCI with higher accuracy than the other two models (p < 0.01; Supplementary Fig. 11; Supplementary Table 17). To further validate the above results, we examine the accuracy of these models for classifying AD risk in Chinese WGS cohort 2. Notably, the lasso regression (auROC = 0.63; auPRC = 0.51) and neural network models (auROC = 0.63; auPRC = 0.53) perform similarly for AD risk classification and perform slightly better than the weighted PRS model (auROC = 0.62; auPRC = 0.49; Supplementary Fig. 11; Supplementary Table 17). Hence, our analyses in the Chinese population demonstrate the applicability of the neural network model for AD risk classification modeling.

As the selected 37 variants are significantly associated with AD in both European-descent and Chinese populations, they can likely be used to classify AD risk in both populations. Interestingly, when we conduct the five-fold cross-validation analysis using these 37 variants separately in European-descent and Chinese populations, the resultant polygenic score models classify AD risk in both populations (European-descent: auROC = 0.68–0.72; Chinese: auROC = 0.66–0.69) (Supplementary Fig. 13; Supplementary Table 18). In particular, the lasso and neural network models constructed from the 37 variants exhibit performance comparable to (or better than) that of the models constructed based on the variants selected by p-value thresholds (i.e., 8100, 2959, and 1799 variants) in both the European-descent and Chinese populations (Supplementary Figs. 14, 15). Furthermore, the polygenic score models constructed using the 37 variants based on Chinese data can classify AD risk in the European-descent population (auROC = 0.62–0.65; auPRC = 0.70–0.73), and the models using the same 37 variants based on European-descent data can classify AD risk in the Chinese population (auROC = 0.60–0.67; auPRC = 0.69–0.72; Supplementary Fig. 16; Supplementary Table 19). In addition, the neural network models constructed based on the 37 variants perform significantly better for classifying AD risk than the models constructed with the 216 AD GWAS hits or with other sets of 37 variants randomly selected from the 216 AD GWAS hits (p < 0.05; Supplementary Fig. 17). Thus, the polygenic score models based on the 37 variants can be used for modeling and classifying AD risk in both Chinese and European-descent populations.

Performance of the neural network model for Alzheimer’s disease risk classification in the Chinese population

As the neural network model using the 37 variants shows superior performance for classifying AD risk, we examine whether it can stratify individuals with different levels of disease risk. Accordingly, the scores calculated using the neural network model (neural network risk scores hereafter) for individuals in the Chinese cohorts show clear separation between individuals with the lowest and highest scores. We apply a multiple Gaussian fitting model to the neural network risk scores to stratify individuals into low-, medium-, and high-risk groups (see the “Methods” section, Fig. 3c, and Supplementary Fig. 18). Compared with the low-risk group, the medium- and high-risk groups include larger proportions of patients with AD in both Chinese WGS cohorts (e.g., for Chinese WGS cohort 1, patients with AD make up 22.2%, 49.4%, and 70.9% of the low-, medium-, and high-risk groups, respectively) (Fig. 3d; Supplementary Fig. 18; Supplementary Table 20). Furthermore, individuals in the high-risk group have a greater risk of developing AD and MCI than those in the low- or medium-risk group (p < 1E−5 and 1E−3 for Chinese WGS cohorts 1 and 2, respectively; Supplementary Table 20). Moreover, in Chinese WGS cohort 1, individuals in the medium-risk group have higher risks of AD (p < 2E−16) and MCI (p = 1.25E−2) than those in the low-risk group (Supplementary Table 20). Thus, the neural network model can be used to stratify people into subgroups based on their relative risk of developing a disease.
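One way such a stratification can work is sketched below, assuming that three Gaussian components have already been fitted to the risk-score distribution and that each individual is assigned to the component with the largest weighted density at their score. The component parameters are hypothetical, and the actual grouping rule is the one described in the “Methods” section, which may differ.

```python
import math
from typing import List, Tuple

Component = Tuple[str, float, float, float]  # (label, mixing weight, mean, standard deviation)


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def assign_group(score: float, components: List[Component]) -> str:
    """Return the label of the component with the largest weighted density at `score`."""
    return max(components, key=lambda c: c[1] * gaussian_pdf(score, c[2], c[3]))[0]


# Hypothetical fitted components; real parameters would come from fitting the score distribution.
components = [
    ("low", 0.40, 0.2, 0.08),
    ("medium", 0.35, 0.5, 0.08),
    ("high", 0.25, 0.8, 0.08),
]

groups = [assign_group(s, components) for s in (0.15, 0.52, 0.91)]
print(groups)
```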

To determine the relevance of the neural network risk scores to clinical outcomes, we examine the association between individuals’ scores and their cognitive functioning after controlling for confounding factors (i.e., age, sex, and genomic structure). Notably, in Chinese WGS cohort 1, the neural network risk scores are significantly associated with cognitive functioning as measured by the Mini–Mental State Examination (MMSE) in all participants (p < 2E−16), patients with MCI plus NCs (p = 3.10E−04), patients with MCI (p < 0.05), APOE-ε3 homozygous participants (p < 2E−16), and APOE-ε4 carriers (p = 2.18E−07) (Fig. 3e–h; Supplementary Fig. 19; Supplementary Table 21). In addition, in Chinese WGS cohort 2, the neural network risk scores are significantly associated with the Montreal Cognitive Assessment (MoCA) scores of patients with AD plus NCs (Supplementary Fig. 18) as well as those of patients with MCI (p < 0.05; Supplementary Fig. 19). Hence, the neural network risk scores calculated herein can predict cognitive functioning in the Chinese population.

Determination of the pathological mechanisms of Alzheimer’s disease according to polygenic scores

To investigate the mechanisms whereby the identified variants (i.e., SNPs) modulate disease risk, we examine the associations between polygenic risk and AD endophenotypes in Chinese WGS cohort 238. We show that the neural network risk scores are significantly associated with levels of the blood-based ATN biomarkers of classical AD pathology—Aβ, tau phosphorylated at threonine-181 (tau/p-tau181), and NfL—which reflect the progression and severity of AD67 (Fig. 4a; Supplementary Table 22). Detailed analysis shows that the associations between polygenic scores and plasma biomarker levels are significant in NCs but not in patients with AD, suggesting that the AD risk variants modulate AD-associated pathways independent of disease state (Fig. 4a–d; Supplementary Table 22). Moreover, among all participants, polygenic scores are significantly associated with changes in the volumes of specific brain regions68,69 including the amygdala (p = 6.53E−03), grey matter (p = 1.21E−02), and hippocampus (p = 4.92E−02) (Supplementary Fig. 20; Supplementary Data 6). Moreover, polygenic scores are significantly associated with white matter hyperintensity, which is a marker of demyelination and axonal loss in the brain70 (p = 2.69E−02; Supplementary Fig. 20; Supplementary Data 6). Hence, our results suggest that AD polygenic risk is associated with known AD biomarkers, particularly in people who have not yet developed AD.

Fig. 4: Modulatory effects of polygenic risk for Alzheimer’s disease on plasma protein biomarkers in normal controls.

a Associations between the polygenic risk scores derived from the corresponding models and the levels of plasma ATN biomarkers (i.e., Aβ42, Aβ40, Aβ42/Aβ40 ratio, tau, p-tau181, and NfL) in all participants, NCs, and patients with AD. b–d Plasma Aβ42 level (b), Aβ42/Aβ40 ratio (c), and p-tau181 level (d) in NCs stratified according to polygenic risk score group. e Volcano plots showing the associations between polygenic risk scores and plasma protein levels obtained from the high-throughput assay. f, g Levels of the candidate plasma proteins f PLTP and g CCL19 in NCs stratified according to polygenic risk score group. h Overrepresented Gene Ontology terms for plasma proteins associated with polygenic risk scores (p < 0.05). i Protein–protein interaction network of cytokines associated with polygenic risk scores. The gray nodes are the five proteins most strongly associated with the other nodes. Line color and thickness indicate the interaction strength of the connected nodes (darker and thicker lines denote stronger interactions). b–d, f, g Data are presented as box-and-whisker plots. Boxes indicate the 25th to 75th percentiles, and whiskers indicate the 10th and 90th percentiles; numbers of individuals in the corresponding group are shown at the bottom of each plot. Robust linear regression model: ***p < 0.001, **p < 0.01, *p < 0.05. e, h, i Colors denote plasma proteins or results derived from proteins that were positively (red) or negatively (blue) correlated with polygenic risk scores. Aβ, amyloid-beta; AD, Alzheimer’s disease; CCL19, chemokine ligand 19; MCI, mild cognitive impairment; NC, normal control; NfL, neurofilament light polypeptide; NN, neural network; p-tau181, tau phosphorylated at threonine-181; PLTP, phospholipid transfer protein; TNF, tumor necrosis factor.

To better understand how AD polygenic risk is associated with endophenotypic changes regardless of disease state, we comprehensively analyze the associations between polygenic scores and the changes in the levels of 1160 plasma proteins that potentially reflect changes in multiple biological pathways in NCs (n = 69). The polygenic scores are significantly associated with the levels of 80 plasma proteins; among these proteins, PLTP (phospholipid transfer protein; p = 2.67E−03), which is involved in cholesterol metabolism71, and CCL19 (chemokine ligand 19; p = 6.65E−07), a cytokine involved in inflammation72, are the most strongly associated with the polygenic scores (Fig. 4e–g; Supplementary Data 7). Specifically, Gene Ontology enrichment analysis suggests that the polygenic scores are associated with plasma proteins involved in TNF-α– and cytokine-related pathways, which are closely related to the immune system73 (Fig. 4h; Supplementary Data 8, 9). Furthermore, protein–protein interaction network analysis of those plasma proteins involved in cytokine-related pathways again suggests their enriched interaction (enrichment p < 1E−16; Fig. 4i; Supplementary Table 32). Together, these results show that AD polygenic risks may modulate immune-associated signaling pathways in the blood.

Using neural network models to study disease mechanisms

Given that AD polygenic risks are possibly related to the involvement of multiple biological pathways in disease pathogenesis, the effects of individual variants on PRSs may partly reflect the contributions of corresponding biological pathways associated with specific genetic variants to the disease. Such effects may not be adequately captured by a single ultimate score derived from polygenic score models but rather by the intermediate outputs of the penultimate layer in neural network models. In our neural network model, the penultimate layer summarizes the polygenic effects of the 37 SNPs into five nodes (Fig. 5a); thus, the outputs from these five nodes may represent distinct genetic risks that affect different biological processes. Accordingly, we find that the outputs from the five nodes are not perfectly correlated (Fig. 5b), suggesting that they contain more information (i.e., polygenic risks) than the final polygenic score. Therefore, we designate each node in the penultimate layer as one module that may account for a distinct biological effect.
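The idea of reading module outputs from the penultimate layer can be sketched with a toy forward pass. This is a minimal illustration under stated assumptions, not the architecture used in the study: it uses a single hidden layer of five nodes, ReLU hidden activations, a sigmoid output, three input variants instead of 37, and made-up weights.

```python
import math
from typing import List, Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def dense(inputs: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One fully connected layer: each row of `weights` feeds one output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(
    dosages: List[float],
    w_hidden: List[List[float]],
    b_hidden: List[float],
    w_out: List[float],
    b_out: float,
) -> Tuple[List[float], float]:
    """Return (penultimate-layer module outputs, final risk score)."""
    modules = [relu(v) for v in dense(dosages, w_hidden, b_hidden)]
    logit = dense(modules, [w_out], [b_out])[0]
    return modules, 1.0 / (1.0 + math.exp(-logit))


# Hypothetical weights: three input variants feeding five penultimate nodes (M1 to M5).
w_hidden = [
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
    [0.25, 0.25, 0.0],
    [-1.0, 0.0, 0.0],
]
b_hidden = [0.0, 0.0, 0.0, 0.0, 0.0]
w_out = [1.0, 1.0, 1.0, 1.0, 1.0]
b_out = -2.0

modules, risk_score = forward([2.0, 1.0, 0.0], w_hidden, b_hidden, w_out, b_out)
print(modules, risk_score)
```

The five values in `modules` carry more information than the single `risk_score`, because different genotype combinations can yield the same final score through different module activations.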

Fig. 5: Biological pathways modulated by the polygenic risk variants of Alzheimer’s disease.

a Diagram showing the calculation of polygenic risk scores using the NN model. The five nodes in the penultimate layer were designated modules M1–M5. b Associations between the polygenic risk scores derived from the NN model and the outcomes of the five modules. c Heatmap showing the clusters of plasma proteins significantly associated with each module. The proteins formed four clusters (designated C1–C4) with respect to the absolute values of t-statistics. The numbers of proteins in each cluster are indicated in the plot. Representative Gene Ontology terms and cell-type enrichment analysis results are displayed in the center and right panels, respectively. d Protein–protein interaction network of proteins expressed by B cells. Colors denote proteins from C1 (red) and C4 (blue). e Cell-type-specific expression of TCL1A. f Plasma levels of TCL1A protein in NCs (n = 69) and patients with AD (n = 97). Data are presented as box-and-whisker plots. Boxes indicate the 25th to 75th percentiles, and whiskers indicate the 10th and 90th percentiles; numbers indicate the numbers of individuals in the corresponding group. Robust linear regression: **p < 0.01. AD, Alzheimer’s disease; FDR, false discovery rate; FPKM, fragments per kilobase per million mapped fragments; NC, normal control; NK, natural killer; NN, neural network; p, p-values; TCL1A, TCL1 family AKT coactivator A; TNF, tumor necrosis factor.

To understand the biological effects of the individual modules, we construct a multivariate model that simultaneously incorporates the outcomes of the five modules to determine their associations with individual endophenotypes (i.e., plasma protein levels). Notably, the levels of 336 plasma proteins are significantly associated with the outcomes of the five modules (Supplementary Table 23; Supplementary Data 10). Furthermore, unsupervised clustering analysis shows that these plasma proteins can be classified into four clusters (designated C1–C4) with distinct biological functions (Fig. 5c). For instance, the plasma proteins classified into C1, C3, and C4 are associated with immune pathways; those in C3 and C4 are associated with cell communication; and those in C1 and C4 are associated with TNF-α–related signaling (Fig. 5c).

Accordingly, we hypothesize that the effects of specific risk variants on gene expression regulation (possibly in specific cellular contexts) underlie the observed associations between polygenic risk and plasma protein levels74. Thus, to determine whether specific plasma proteins are predominantly expressed in specific blood cell types, we conduct a cell-type enrichment analysis of the plasma proteins in each cluster. Interestingly, the plasma proteins in C1 and C4 are expressed by B cells, those in C2 by erythroblasts and megakaryocytes, and those in C4 by dendritic cells and eosinophils (Fig. 5c; Supplementary Tables 24, 25). Furthermore, protein–protein interaction network analysis reveals that the proteins expressed by B cells are closely interconnected (enrichment p = 1E−12; Fig. 5d; Supplementary Tables 26, 27). Specifically, the plasma protein TCL1A (TCL1 family AKT coactivator A), which is uniquely expressed by B cells and associated with B-cell maturation75, is modulated by polygenic risks; furthermore, its plasma level is altered in patients with AD compared with that in NCs (Fig. 5e, f). Therefore, these results demonstrate that AD polygenic risks are associated with specific biological pathways in a cell-type-specific manner.

To evaluate whether changes in the neural network architecture influence the inferred effects of specific risk variants on gene expression regulation, we modify the neural network structure by changing the number of nodes in the penultimate layer from five to two, three, or 10 and examine whether the same plasma protein sets can be obtained from the association analysis. First, we find that the neural network risk scores obtained from the modified models are highly correlated (R² > 0.88; Supplementary Fig. 21a, b). In addition, these modified models recover >80% of the plasma proteins that were previously identified as being associated with the neural network risk scores (i.e., p < 0.10; Supplementary Fig. 21c, d). Furthermore, for the neural network model with three nodes in the penultimate layer, the analysis again highlights the associations between polygenic risks and immune-associated signaling pathways such as TNF-α– and cytokine-related pathways (Supplementary Fig. 21e, f). Therefore, these findings further strengthen our conclusions on the association of AD polygenic risk with immune-associated pathways.

Using neural network models to stratify people at risk of developing Alzheimer’s disease

The intermediate outputs from the neural network model capture polygenic risks that correspond to multiple biological pathways implicated in AD pathogenesis. Therefore, it is of interest to examine whether this model can stratify people into subgroups based on the polygenic risks estimated by those intermediate outputs. Accordingly, we subject the outputs from the penultimate layer of the neural network model to unsupervised clustering analysis and then subcluster the participants from Chinese AD WGS cohort 2 into five groups. Of note, the NCs in Groups 4 and 5 show lower plasma levels of Aβ (p < 0.05) and a trend toward increased plasma p-tau181 and NfL levels compared with the NCs in Groups 1–3 (Fig. 6a–c). Further association analysis identifies four clusters of plasma proteins that exhibit altered expression patterns among the five groups of individuals. Gene Ontology and pathway analysis reveal that the altered pathways include axon (p = 1.30E−06), neuron projection (p = 1.10E−05), and receptor activity (p = 2.80E−03) (Fig. 6d). Thus, the neural network model can be used to classify AD risk for individuals as well as provide insights into the disease mechanisms based on their polygenic risk information.
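The clustering step can be illustrated with a compact k-means sketch (the method named in the Fig. 6 legend). The implementation below is a plain Lloyd's algorithm with a deterministic initialization (the first k points), which is an assumption made for reproducibility. The toy vectors are two-dimensional placeholders for the five penultimate-layer sub-scores.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[int]:
    """Lloyd's algorithm using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical sub-score vectors for five individuals.
sub_scores = [(0.1, 0.1), (0.9, 0.9), (0.2, 0.1), (0.8, 1.0), (0.15, 0.2)]
cluster_labels = kmeans(sub_scores, k=2)
print(cluster_labels)
```

Biomarker levels can then be compared across the resulting groups.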

Fig. 6: Stratification of individuals by polygenic risk score from neural network models.

a K-means clustering of the individuals in the Chinese AD WGS cohort 2 dataset according to the five sub-scores from the NN model. b Proportion of NCs in each group. c Levels of plasma ATN biomarkers in individual groups (n = 16, 41, 22, 29, and 34 individuals in Groups 1–5, respectively). Data are presented as mean ± SEM and analyzed using one-way ANOVA followed by Bonferroni’s post hoc test. *p < 0.05. d Heatmap of association t-values between plasma protein levels detected by two neurology panels and individual groups. According to their t-values, proteins were divided into four clusters using the k-means method (number of proteins in each cluster = 46, 35, 67, 35, from top to bottom, accordingly). e Pathway and Gene Ontology enrichment analysis results for proteins in each cluster. Aβ, amyloid-beta; AD, Alzheimer’s disease; ATN, amyloid-beta, tau, and neurofilament light polypeptide; FDR, false discovery rate; NC, normal control; NfL, neurofilament light polypeptide; NN, neural network; p-tau181, tau phosphorylated at threonine-181; SEM, standard error of the mean; UMAP, Uniform Manifold Approximation and Projection.

Modeling of disease risk by polygenic score

To identify which variants play critical roles in our neural network model for AD risk classification, we prioritize the variants according to their biological properties and use partial correlation analysis to estimate their relative contributions to the final neural network risk scores (Supplementary Fig. 22). Interestingly, the variants involved in the regulation of biological functions (e.g., residing in coding regions or transcription factor binding regions) show greater contributions to the obtained polygenic scores (Supplementary Fig. 22a). For instance, coding variant rs429358, which encodes APOE-ε4 and is one of the most well-accepted AD genetic risk factors, is significantly correlated with the obtained risk scores (Spearman’s rho = 0.24, p < 0.001; Supplementary Fig. 22a). Meanwhile, the noncoding variant rs439401, identified as an AD risk factor that exerts a risk effect independent of the APOE-ε4 genotype76, is also significantly associated with the obtained risk scores (Spearman’s rho = 0.05, p < 0.001; Supplementary Fig. 22a). Of note, rs439401 resides in a regulatory region and overlaps transcription factor-binding regions, which may influence the expression of specific genes (Supplementary Fig. 22b, c). Furthermore, our genotype–expression analysis reveals an association between rs439401 and altered APOE expression in skin tissues (Supplementary Fig. 22d). Meanwhile, brain single-cell ATAC-seq data suggest that rs439401 resides in the open chromatin regions of specific brain cells, further supporting the role of rs439401 in regulating the APOE gene in the brain (Supplementary Fig. 22e)56. Thus, variants with specific biological functions might have a stronger effect on modulating disease risk, making them more informative for classifying disease risk.
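The rank-based correlation reported above can be sketched as follows. This illustration computes a plain Spearman's rho between one variant's dosages and the risk scores; it does not adjust for the other variants, so it is simpler than the partial correlation analysis described in the text. The dosages and scores are toy values.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy risk-allele dosages for one variant and toy risk scores for six individuals.
variant_dosages = [0, 0, 1, 1, 2, 2]
risk_scores = [0.1, 0.2, 0.3, 0.5, 0.4, 0.9]
rho = spearman(variant_dosages, risk_scores)
print(rho)
```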

Discussion

Here, we present the first deep learning-based polygenic score analysis for AD to the best of our knowledge. We evaluate the performance of weighted PRS, lasso, and neural network models for predicting AD risk based on genetic information and show that the deep learning model classifies disease risk more accurately than the weighted PRS and lasso models. When classifying clinically diagnosed AD patients, the best auROC our neural network model achieved is 0.84, which is higher than other recently reported results based on the weighted PRS model (auROC = 0.74)30. Meanwhile, by associating the risk scores (as well as the outputs of the hidden layers) from the neural network model with the disease-associated endophenotypes (e.g., cognitive function and the plasma proteome), we identify how AD polygenic risk may be correlated with pathophysiological changes in individual patients. Furthermore, we show that deep learning methods can stratify people at risk of developing diseases into subgroups according to their polygenic risks (Fig. 6)77. Thus, this study highlights the potential of using deep learning methods to investigate disease mechanisms and stratify at-risk people into subgroups, thereby paving the way to develop precision medicine for early disease intervention.

While the neural network model can be used for polygenic risk analysis of AD, there is room to improve the model’s performance. First, incorporating more variants into the classification model may better capture the genetic signatures that contribute to the disease, thereby increasing the accuracy of disease classification. Meanwhile, misdiagnoses and misclassification of patients (or NCs) may affect the accuracy of the model; this can be improved by better defining the classification of the disease with disease biomarkers such as brain amyloid load and levels of fluid biomarkers (e.g., Aβ and p-tau181) for AD64,78. As most genetic and polygenic risk analyses are performed in European-descent populations, it would also be beneficial to conduct more studies in non-European–descent populations to better understand the disease-associated genetic risks and develop customized polygenic score models for early risk prediction in distinct ethnic populations79.

Disease-associated variants may modify disease risk by affecting specific biological processes. Notably, our results suggest that functional variants are likely to contribute more to the polygenic risk model when modeling the disease risk (Supplementary Fig. 22). Thus, incorporating the biological properties of variants may enhance the model’s accuracy for classifying AD risk. Accordingly, we construct a graph neural network model by integrating allele dosage, annotated functions, and the LD of variants, which exhibits superior classification accuracy compared with the weighted PRS model (p < 0.001; Supplementary Fig. 23). Thus, it is critical to conduct further research on the interpretability of deep learning models80,81 and the usefulness of different types of deep learning models (e.g., the graph neural network model) for modeling disease risk to gain a comprehensive understanding of disease mechanisms and develop more accurate models for disease risk forecasting using genetic data.

Taken together, our results suggest the utility of deep learning methods for predicting disease risk and stratifying people at risk of developing diseases into subgroups as well as their potential applications in uncovering disease mechanisms. Further studies are required to explore the utility of these methods for predicting disease risk at a population scale as well as their potential applications in disease mechanism studies and therapeutic development.