Prediction of Alzheimer’s disease using blood gene expression data

Lee, Taesic; Lee, Hyunju

doi:10.1038/s41598-020-60595-1

Download PDF

Article
Open access
Published: 26 February 2020

Prediction of Alzheimer’s disease using blood gene expression data

Scientific Reports volume 10, Article number: 3485 (2020) Cite this article

16k Accesses
55 Citations
10 Altmetric
Metrics details

Subjects

Abstract

Identification of AD (Alzheimer’s disease)-related genes obtained from blood samples is crucial for early AD diagnosis. We used three public datasets, ADNI, AddNeuroMed1 (ANM1), and ANM2, for this study. Five feature selection methods and five classifiers were used to curate AD-related genes and discriminate AD patients, respectively. In the internal validation (five-fold cross-validation within each dataset), the best average values of the area under the curve (AUC) were 0.657, 0.874, and 0.804 for ADNI, ANMI, and ANM2, respectively. In the external validation (training and test sets from different datasets), the best AUCs were 0.697 (training: ADNI to testing: ANM1), 0.764 (ADNI to ANM2), 0.619 (ANM1 to ADNI), 0.79 (ANM1 to ANM2), 0.655 (ANM2 to ADNI), and 0.859 (ANM2 to ANM1), respectively. These results suggest that although the classification performance of ADNI is relatively lower than that of ANM1 and ANM2, classifiers trained using blood gene expression can be used to classify AD for other data sets. In addition, pathway analysis showed that AD-related genes were enriched with inflammation, mitochondria, and Wnt signaling pathways. Our study suggests that blood gene expression data are useful in predicting the AD classification.

APOE4 homozygozity represents a distinct genetic form of Alzheimer’s disease

Article 06 May 2024

Identifying proteomic risk factors for cancer using prospective and exome analyses of 1463 circulating proteins and risk of 19 cancers in the UK Biobank

Article Open access 15 May 2024

Alzheimer’s disease risk reduction in clinical practice: a priority in the emerging field of preventive neurology

Article 10 January 2024

Introduction

Alzheimer’s disease (AD), the most common form of dementia, is estimated to affect in 13.8 million individuals in the United States (US), with 7.0 million being aged 85 years or older by 2050¹. Based on the National Institute of Neurological, Communicative Disorders, and Stroke and Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) criteria in 1985, probable or possible AD was diagnosed based on subjective symptoms and questionnaires². Recently, the transition from symptom-based to pathophysiology-based AD diagnosis showed that AD diagnosis is mainly based on structural brain changes (MRI), molecular neuroimaging changes (positron emission tomography imaging), and alterations in cerebral spinal fluid biomarkers³. Although the elucidation of the biological basis of AD has resulted in many advancements³, early diagnostic detection of AD remains challenging.

Recent advances in biotechnology have led to full-scale analyses of the genome, transcriptome, and epigenome rather than focusing on a few biomarkers. A large-scale genome-wide association study (GWAS) of 2,032 individuals with AD and 5,328 controls was presented in 2009 and it identified variants at CLU and CR1, which were associated with AD⁴. Additionally, a meta-analysis of four previously reported GWAS datasets (17,008 AD cases, 37,154 controls) yielded 11 new loci of susceptibility to AD⁵. Recently, Xu et al. constructed an AlzData database integrating data from GWAS, eQTL, interactome, and laboratory experiments⁶, which provides all human genes with scores for association with AD, called the Convergent Functional Genomics (CFG) score^7,8.

In recent years, two large multi-center studies were conducted to identify biomarkers for early AD diagnosis and MCI progression to AD: the Europe-based ANM and US-based Alzheimer’s Disease Neuroimaging Initiative (ADNI)^9,10. Furthermore, a large amount of publicly available gene expression data on AD have been provided in the NCBI GEO¹¹. As a result, various studies, especially gene expression-based studies, have been published to uncover the informative genes associated with AD. The BrainNet study analyzed 113 samples of well-characterized postmortem brain tissues, yielding 21 genes dysregulated in AD cases¹². A study by Liang et al.¹³ consisting of 87 brain tissues samples revealed that in brain tissues of AD cases, the genes encoding subunits of the mitochondrial components showed significantly lower expression. By analyzing RNA expression from brain tissues of AD patients, Xu et al.⁶ demonstrated that an early alteration of YAP1 could promote AD.

Although several studies involving gene expression data have uncovered valuable patterns, most gene expression data were obtained from biopsy or autopsy-based samples, which are difficult to extrapolate to clinical settings. Only a few studies used blood-based expression data for uncovering key genes related to AD or predicting early AD¹⁴. Cooper et al.¹⁴ published a study consisting of 186 AD cases, 118 MCI cases, and 204 controls from three independent datasets, overall suggesting that progranulin expression levels in the blood are increased in AD and MCI.

In Table 1, we list previously published blood expression-based studies for the identification of patients with AD, especially focusing on machine learning (ML)-based studies^{15,16,17,18,19,20}. Detailed information about related works is presented in Supplementary File. The difference between the studies outlined in Table 1 and ours are as follows. First, other studies selected AD-related genes only using statistical or ML methods. We extracted AD-related genes not only via statistical methods, but also protein-protein interaction databases (DBs), transcription factor (TF) DBs, and the CFG method that integrated results from SNP, transcripts, AD-animal model, and text-mining. Besides, we validated selected genes by measuring predictive performances among different datasets. Although studies discriminating AD by blood-based transcriptomic data have been performed, it is unclear whether the classifiers trained using one AD gene expression data set can be applied to other AD data sets. Thus, here, we systematically evaluated five feature selection methods and classifiers for distinguishing individuals with AD from healthy controls (CNs) using three independent blood gene expression datasets. Lastly, we analyzed the biological functions of AD-related genes from the blood via pathway analysis, and compared the result from blood bio-signature with those from brain bio-signature.

Table 1 Summary of six studies for predicting AD using blood gene expression data.

Full size table

Methods

The ADNI consisted of participants recruited at 57 sites in the US and Canada, funded as a private-public partnership⁹. The ANM consortium is a large cross-European AD biomarker study and a follow-on DCR cohort in London¹⁰. In both ADNI and ANM, AD was diagnosed using the NINCDS-ADRDA criteria for possible or probable AD².

We employed three large-scale blood gene expression datasets: ANM1 (GEO:GSE63060), ANM2 (GEO:GSE63061), and ADNI (adni.loni.usc.edu, last downloaded 2018/8/31). The overall framework of our study is illustrated in Fig. 1. The performance was evaluated via internal validation (five-fold CV within each dataset) and external validation (training and test sets from different datasets). In the internal validation. Detailed information about performance assessment is described in the Supplementary File.

The extraction of differentially expressed genes (DEGs), logistic regression (LR), L1-LR, support vector machine (SVM), and random forest (RF) were implemented using R (version 3.4.0) and Bioconductor (release 3.8)^21,22. Variational autoencoder (VAE) and deep neural network (DNN) were implemented with C++ based TensorFlow with a Python interface²³.

Preprocessing of data

Gene expression data were produced using the Affymetrix Human Genome U 219 array for the first case-control study (ADNI), Illumina Human HT-12 v.3 Expression BeadChips for the second case-control study (ANM1), and Illumina Human HT-12 v4 Expression BeadChips for the third case-control study (ANM2).

We performed three steps of data processing. The first step involved the selection of samples and probes to be analyzed. For ADNI, we selected high-quality RNA samples with RIN ≥ 6.9, as the previous study performed²⁴, and for ANM1 and ANM2, we did not exclude any samples. The ADNI, ANM1, and ANM2 datasets consisted of 49,386, 30,063, and 29,485 probes, respectively. The median RNA expression value of ADNI, ANM1, and ANM2 were 3.897, 7.584, and 6.154, respectively, indicating that ADNI could be influenced by background noise due to relatively low gene expression intensities. To reduce the background noise of ANDI, we excluded probes with an intensity value ≤ the median of all gene expression values in 100 or more samples, as performed in the previous study²⁵. If there were multiple probes annotated in one gene, then the median value of those was selected, yielding 11,276, 21,698, and 22,338 unique probes in ADNI, ANM1, and ANM2, respectively. We selected only probes that were present in all three datasets, and 8,835 final probes were left for analysis.

The second step was normalization within each dataset and renormalization between datasets. The probe set level intensities of all three datasets were normalized by the Robust Multi-Array Analysis (RMA) method²⁶. Although each dataset was normalized, a variance or batch effect among different datasets remained. Therefore, we renormalized all three datasets to reduce the batch effect among different datasets using ComBat from the sva package in R²⁷.

The third step was selecting DEGs of patients with AD. We extracted DEGs between the control and AD in the ADNI by the significance analysis of microarrays (SAM), which is a method for identifying genes on a microarray with statistically significant changes in expression²⁸. It was developed in the context of an actual biological experiment. DEGs of AD in ANM1 and ANM2 were curated via “lmFit” and “eBayes” functions in the limma package, which is based on a linear regression method²⁹. We set the cutoffs of FDR of the SAM and limma to 0.05 and 0.01, respectively.

Feature selection

We used VAE to extract a representation from a set of input features thereby reducing the dimensions of the data³⁰. An autoencoder is a type of neural network used to learn efficient and representative information in an unsupervised manner. Specifically, VAE not only adopts the autoencoder architecture but also assumes that the distribution of encoding features is similar to that of original features³⁰. The framework of VAE is concisely described in Fig. 1 and precisely in Supplementary Fig. S1A. The structure of VAE is described in Supplementary File.

We obtained information on TF-related genes from the TRANSFAC database 7.0, which is publicly available at http://gene-regulation.com ³¹. We selected “Factor”, “homo sapiens”, and “Organism Species (OS)” as the values of “option”, “search term”, and “table field to search in”, respectively, yielding a list of 608 TF-related genes. Then, common genes between the TF-related genes and DEGs were used as input features of a classifier. Detailed information about the 608 genes and the TF-related genes overlapped with DEGs is presented in Supplementary Table S1.

To curate hub genes, we obtained gene-gene interaction (GGI) data, including 7,765 genes and 54,719 interactions from the Human Protein Reference Database, publicly available at http://www.hprd.org/ ³². The process of extracting hub genes among DEGs consisted of several steps. We mapped DEGs obtained from each training set onto the data of GGI and calculated the number of edges of DEGs. To select a specific number of edges, we applied 10 different numbers of edges as thresholds ranging from 10 to 20 to ADNI, ANM1, and ANM2. Among the 10 thresholds, we selected a value of 10, which was similar to the number of CFG-based genes. Therefore, we defined genes with more than 10 edges as hub genes, which interact with more than 10 genes in the GGI database.

The CFG approach is a translational methodology that integrates multiple lines of external evidence from human and animal model studies^7,8. There have been several studies using various CFG scoring methods, from which we selected two representative methods^6,33. The first method⁶ was to score genes validated by multi-genomic and experimental studies using the five criteria (Supplementary File). We assigned each DEG one point if the DEG satisfied one of the above five criteria, yielding a score ranging from 0 to 5 points by using a publicly accessible database at http://alzdata.org ⁶. The second method is the database-based CFG scoring method that uses external lines of evidence³³. We scored a gene as one point if the gene was included in the AD-related genes extracted from two databases, including Alzgene and DigSee^34,35, yielding a score ranging from 0 to 2. Bertram et al. constructed a publicly available, continuously updated database (AlzGene, http://www.alzgene.org) by performing systematic meta-analysis for each polymorphism with available genotype data in at least three case-control subjects³⁴. The DigSee extracted gene-disease relationships by incorporating the text-mining method and the ML technique and included 4,494 disease types and 13,054 genes³⁵. From the AlzGene and DigSee, we obtained a list of 680 and 1602 AD-related genes, respectively. Combining these two CFG scoring methods, we annotated all DEGs with numeric points ranging from 0 to 7. We defined DEGs with 3 or more points as highly informative AD genes. Lists of DEGs with CFG score (ADNI, ANM1, and ANM2) are presented in Supplementary Tables S2–S4.

Classifying methods

We utilized LR, L1-GLM, RF, SVM, and DNN as a classifying model. LR, developed by David Cox in 1958, is a standard method for binary classification³⁶. L1-LR, first suggested by Tibshirani in 1996, is an extended version of LR applied by LASSO³⁷. Due to a sparsity, the L1-LR can simultaneously perform two tasks: feature selection and classification^38,39. The L1-LR needs λ, the tuning hyperparameter that controls the degree of the penalty, which is typically set as a value that gives the best performance via CV. However, the previous study showed that most weights were penalized to 0 after selecting λ by CV²⁰. Therefore, we preliminarily applied 100 sequencing λ values ranging from 10⁻³ to 10⁻⁵. The larger lambda was selected, the smaller number of genes were selected. Of 100 λ values, we selected the 50^th λ value (λ = 0.0001) because this value is the elbow point, where the increasing tendency of the number of selected genes is reduced (Supplementary Fig. S2).

SVM, a robust ML method, is typically used to solve binary classification problems by finding a hyper-plane that maximizes the margin between two classes. Two hyperparameters are needed for the SVM algorithm: cost (C), which indicates the degree of penalty for misclassification, and gamma (γ), which defines the extent of the influence of a single training example. In this study, we adopted the Gaussian radial basis kernel for SVM. The “svm” function in the e1071 package was used to run the SVM algorithm, in which, by default, C and γ values were set to 1 and 1/(dimension of input features)⁴⁰, respectively. Note that another widely used package, Scikit-learn⁴¹, also adopts default values of C and γ, similar to those of the e1071 package.

The RF algorithm, developed by Leo Breiman, utilizes an ensemble of classification trees, which include bootstrap samples and randomly selected variables^42,43. Two hyperparameters are required in RF: the number of trees, ntree, and the number of randomly selected features, mtry. In this study, we determined ntree = 500, and ntry = \(\sqrt{{\rm{data}}\,{\rm{dimension}}}\), which were default values.

DNN is a method of learning representative features with multi levels of representation, obtained via non-linear perceptrons, each of which transforms the representation at one level to that at a higher or abstract level with reduced dimension^44,45. LeCun et al.⁴⁴ suggested that these higher levels of representation would amplify important aspects of the input for classification tasks and suppress irrelevant variations. In this study, a DNN architecture consists of two hidden layers, where the number of hidden nodes are \([\frac{{\rm{N}}}{2}]\) and \([\frac{{\rm{N}}}{4}]\) (N: number of input features) in the first and second hidden layers, respectively. We set the minimum of hidden nodes in the first and second hidden layers as 10 and 5, respectively. The hyperbolic tangent function between hidden layers and the sigmoid function in the final layer were employed. We hypothesized that more input features needed more iterations of training. Therefore, we determined the number of iteration as “N (number of input features) × 3”, and also set the minimum and maximum number of iteration as 200 and 3000, respectively. We optimized our DNN model utilizing the Adagrad optimizer⁴⁶ when an input dimension was ≥ 800 or the AdamOptimizer when an input dimension was <800⁴⁷, and a learning rate was set to 0.001.

Results

Internal validation (CV within each dataset)

For samples in ADNI, ANM1, and ANM2, the average chronological ages were 75.9, 74.1, and 76.7, and ratios of males were 49.7, 35.3, and 39.4%, respectively (Table 2). In ADNI, ages did not differ significantly between AD and CN samples, while ages in ANM1 and ANM2 showed a significant difference between AD and CN samples. The gender difference between AD and CN samples was not significant in all datasets (Table 2).

Table 2 Demographic characteristics in ADNI, ANM1, and ANM2.

Full size table

In the internal validation (five-fold CV, ADNI), DEGs ranging from 72 to 922 were curated from each training set, and are presented in Supplementary Table S5. Similarly, DEGs ranging from 850 to 1617 and those from 187 to 790 were selected in each five-fold CV for ANM1 and ANM2, respectively (Supplementary Table S5). The numbers of DEGs of ADNI varied most with a standard deviation (SD) of 352.7 while those of ANM2 were most consistent (an SD of 254.4). When all DEGs were used as input features in ADNI, DNN and RF outperformed the other methods (Fig. 2A). In ANM1, L1-LR, SVM, RF, and DNN showed AUC values greater than 0.80 (Fig. 2B). In ANM2, SVM was the best performing classifier for distinguishing AD when all DEGs were used as input features (Fig. 2C).

In ADNI, ANM1 and ANM2, when the VAE method was used for dimension reduction, the performances were improved only for the LR classifier compared with the DEGs. For the other methods, the application of VAE may lose important information (Fig. 2, Supplementary Table S5).

We extracted TF-related genes ranging from zero to 21 among DEGs of ADNI (Supplementary Table S5). In the same way, 11 to 19 TF-related genes and zero to 9 TF-related genes were selected for ANM1 and ANM2, respectively. In ADNI, ANM1, and ANM2, the mean AUC values of all classifying methods, except for LR, decreased when a set of TF-related genes were used as input features (Fig. 2).

We selected 12 to 199, 166 to 341, and 46 to 108 hub genes with more than 10 edges in the GGI network among DEGs in ADNI, ANM1, and ANM2, respectively (Supplementary Table S5). As a result, the majority of classifying methods did not show improved performance, except for LR, DNN, and LR in ADNI, ANM1, and ANM2, respectively (Fig. 2).

The CFG method was utilized to select 18 to 197 genes from DEGs in ADNI, showing that LR and L1-LR showed improved performance. In ANM1 and ANM2, after the CFG scoring was used, 850 to 1617 and 187 to 790 of DEGs were shrunk to 175 to 353 and 49 to 188 genes, respectively. Two classifying methods (LR and DNN) in ANM1 and three (LR, L1-LR, and RF) in ANM2 were better than those using all DEGs as input, respectively (Fig. 2).

For each feature selection method, most comparisons across the five classifiers did not yield significant differences in performance (measured by a paired t-test, Supplementary File) in ADNI, ANM1, and ANM2 (Supplementary Table S5), except that LR showed statistically lower performances compared to other methods in ANM1 and ANM2.

Among five feature selection approaches, on average, the DEG provided the best performance in ADNI (p = 0.109 measured by a t-test, please refer to the Supplementary File) and ANM1 (p = 0.631), and the “DEG + CFG” (p = 0.002) showed the best performance in ANM2 (Fig. 2).

External validation (Cross datasets analysis)

When 334 DEGs extracted from ADNI served as input features for predicting AD patients in ANM1, the best and second-best performing classifiers were L1-LR and SVM with 0.70 and 0.66 AUC values, respectively. When ADNI and ANM2 were used as training and test datasets with input features of the 334 DEGs, respectively, L1-LR manifested the best AUC (0.69) among classifiers. Note that the process of extracting 334 DEGs in ADNI was completely independent of ANM1 and ANM2 datasets.

When using VAE (334 to 100) and TF-related genes (seven genes), the performances were not better than that of the method using 334 DEGs were proposed when arranging ADNI and ANM (ANM1, ANM2) as training and testing datasets, respectively (Fig. 3A,B, Supplementary Table S6).

We tested the ANM1 dataset with 67 hub genes selected from ADNI and achieved lower AUC than DEGs. However, the best performance (AUC: 0.76) was acquired when testing ANM2 with the 67 hub genes from ADNI and SVM (Fig. 3B). Using 81 DEGs with CFG ≥ 3 as input features, most classifiers did not show improved performances, except for LR (ANM1) and RF (ANM2) (Fig. 3A,B).

The best performance in ANM1 (testing set 1) was an AUC of 0.70 when the 334 DEGs and L1-LR served as the feature selection method and classifying model, respectively (Fig. 3A). In the case of ANM2 (testing set 2), an AUC of 0.76 was the best result achieved using the hub method and SVM (Fig. 3B).

When 1604 DEGs from ANM1 were used for training, SVM showed the best performance with AUCs of 0.62 and 0.79 for test sets ADNI and ANM2, respectively (Fig. 3C,D). When we used VAE (1604 to 300), 18 TF-related genes, 331 hub genes, and the CFG method, performances of most classifying methods decreased. The best performances in ADNI and ANM2 occurred when DEGs and SVM were adopted as the input features and the classifier, respectively.

When 697 DEGs from ANM2 were used for training, DNN showed the best performances of 0.65 and 0.85 for both ADNI and ANM1, respectively (Fig. 3E,F). Using reduced 200 dimensions from 697 DEGs via VAE and TF-related genes as input, overall performance decreased compared with using 697 DEGs. When testing ADNI, the hub gene offered the best performance (AUC: 0.66, DNN); however, the CFG performed best when testing ANM2 (AUC: 0.86, DNN).

In the comparisons across the five classifiers, the performances of four classifiers (L1-LR, SVM, RF, and DNN) were enhanced as compared with that of LR. Among the four classifiers, most comparative analyses showed an insignificant difference in terms of performances (p-values were measured by Venkatraman’s method, refer to the Supplementary File and Supplementary Table S6).

Among five feature selection methods, on average, the DEG (p = 0.146 measured by a t-test, Supplementary File) provided the best performance compared to other methods when ADNI and ANM1 were arranged as training- and test-sets, the “DEG + Hub genes” (t-test, p = 0.2) showed the best performance between ADNI (training) and ANM2 (test), and the “DEG + CFG” (t-test, p = 0.039) yielded the best result between ANM1 (training) and ANM2 (test).

Pathway analysis of informative genes in each dataset

We found that the performances of discriminating AD were best when “DEG” or “DEG + CFG” were used as feature selection methods. Therefore, we performed pathway analysis of DEGs (334 genes of ADNI, 1604 of ANM1, and 697 of ANM2) and “DEG + CFG” (81 genes of ADNI, 334 of ANM1, and 169 of ANM2) using the KEGG pathways⁴⁸ and Gene Ontology⁴⁹ obtained from Molecular Signature Database (MSigDB)⁵⁰. We removed general pathways that consist of ≥ 500 genes. A pathway enrichment test was performed using a hypergeometric test followed by multiple comparison correction (Benjamini–Hochberg method) and pathways with q-values <0.05 were considered significantly enriched. As a result, no pathways (ADNI: DEG), 26 (ANM1: DEG), 57 (ANM2: DEG), 7 (ADNI: DEG + CFG), 340 (ANM1: DEG + CFG), and 366 pathways (ANM2: DEG + CFG) were selected (Supplementary Table S7).

In ADNI, the representative pathways enriched via 81 genes (DEG + CFG) were as follows: immune system process, ErbB signaling, and lipopolysaccharide mediated signaling pathways (Fig. 4).

When using only DEG in ANM1 and ANM2, 26 common pathways were obtained, and the following pathways were notable: mitochondrial translation, oxidative phosphorylation, and ribonucleoprotein complex biogenesis (Fig. 4). DEGs with CFG in ANM1 and ANM2 were commonly enriched in 51 pathways, such as negative regulation of canonical Wnt signaling pathway, activation of the immune response, and myeloid cell homeostasis (Fig. 4).

Common pathways among four lists of pathways (two feature selection methods and two datasets [ANM1, ANM2]) were 17, and representative pathways are as follows: ribosome, the establishment of protein localization to the endoplasmic reticulum, and multi organism metabolic process (Fig. 4). Collectively, highly informative genes in ADNI were enriched in immune and inflammatory pathways. Differentially expressed genes from ANM were associated with energy metabolism (mitochondria and oxidative phosphorylation), and “DEG + CFG” from ANM showed significant enrichment for Wnt-related and immune pathways (Fig. 4).

Common bio-signature between blood and brain AD samples

We analyzed the brain gene expression datasets (GEO: GSE33000), including 157 AD samples and 310 CN. We curated 1291 DEGs using the “lmFit” function in the limma package followed by a multiple comparison correction and a fold change (FC) (FDR <0.05 and | log₂(FC)| > 0.2). When comparing 2021 blood DEGs (union of DEGs among ADNI, ANM1, and ANM2), 140 of 1291 brain DEGs were common with blood DEGs, which were enriched with two KEGG and 31 GO pathways (Fig. 5). Representative pathways included immune response, inflammatory response, MyD88-dependent toll-like receptor signaling, and toll-like receptor 4 signaling pathways (Fig. 5).

Discussion

In this study, we identified AD-related genes by means of DEGs, the TF database, connectivity in the gene network, and CFG. Considering both internal and external validations, this study showed that all DEGs and the DEGs with CFG could accurately identify AD. Among the previous studies listed in Table 1, a study by Li et al.²⁰ classified AD patients most accurately in the ANM dataset. However, in their study²⁰, it seems that both ANM1 and ANM2 datasets were used for selecting features (six genes). After curating the six AD-related genes, they used them as input features for the SVM model, yielding AUCs of 0.86 and 0.873 when testing ANM1 and ANM2, respectively²⁰. With the same six genes and the same training- and test-set described by Li et al.²⁰, we measured prediction performances and obtained results that were consistent with those of Li et al.²⁰. Afterward, we predicted an independent data ADNI using these six genes as input features to the SVM method, and obtained AUCs 0.62 (training dataset: ANM1) and 0.57 (ANM2). This method²⁰ was not as effective as our method using DEGs as input features to the same method, which had AUCs of 0.62 (training dataset: ANM1) and 0.63 (ANM2).

The primary objective of this study was to investigate the prediction of AD patients using AD-related genes obtained across different datasets. The performances in the external validation were high between ANM1 and ANM2, consistent with the previous studies^18,19,20. However, the prediction accuracies were low between ADNI and ANM (ANM1 and ANM2), and there was no study using these datasets as external validation. Several studies have suggested the limitations of ADNI. First, gene expression differences between AD and CN in ADNI were low compared to those in ANM, yielding difficulty in extracting DEGs in ADNI. Li et al.²⁰ faced a similar problem, and selected DEGs based on nominal p-values < 0.05 because no gene passed a multiple testing correction. To overcome this limitation, we removed probes with low intensity by a previously validated method²⁵. Furthermore, we attempted several methods to curate DEGs, including a t-test, limma²⁹, and SAM²⁸, and found that SAM, which is based on the permutation, could extract DEGs (FDR < 0.05). Second, we observed that for ADNI, the numbers of DEGs varied most across five CV sets among ADNI, ANM1, and ANM2 (the highest SDs for the numbers of DEGs were as follows, ADNI: 352.7, ANM1: 309.3, ANM2: 254.4). This observation might partially explain why in the internal validation, the AUCs for ADNI were lower than those for ANM1 and ANM2. Third, the qualities of gene expression data for some samples in ADNI were low. When all samples in ANDI were used, performances in both internal and external validations were low. The AUC values of the highest performance classifiers for internal and external validation were only 0.613 (Internal validation, ADNI), 0.601 (External validation, ANM1 to ADNI), and 0.63 (External validation, ANM2 to ADNI), respectively. Thus, in this study, we used gene expression samples with RIN values ≥ 6.9, resulting in increased performances in Figs. 2 and 3.

In the external validation, L1-LR, SVM, and DNN showed the best performance. In a previous study, L1-LR and SVM performed well as a feature selection method and a classifier for AD classification, respectively²⁰. We observed that DNN did not outperform other classifiers in all cases, which might be because the relatively small size of samples was composed of high dimensional information (i.e., the number of genes) and the number of samples was insufficient to learn all perceptrons of DNN. Furthermore, when we applied several settings of DNN to improve performance, including drop-out and early stopping, we found that although the maximum AUC value increased, the range of the AUC values become more varied than that of the default setting. In future work, the performance will be improved by adjusting parameters and modifying the architecture of DNN when more data are available^51,52.

We investigated DEG genes with high CFG scores for their correlation to AD. Among 334 DEGs in the ADNI dataset, four genes (TGFB1, RAB11A, MAPK3, and RTN4) showed a maximum CFG score of five points. TGFB1 reportedly increases the risk of developing late-onset AD⁵³, ERK (MAPK3) is expectedly activated in AD brains and involved in tau phosphorylation and amyloid deposition⁵⁴. According to recent reports, two additional genes (RAB11A, RTN4) are also related to AD^55,56. Among 1604 DEGs in the ANM1 dataset, five genes (TIMP1, CD14, FADD, CAMK2G, and FCER1G) had the highest CFG score of six points. MMP-9-TIMP1 pathway was known to be stimulated by Abeta 25–35 fragment to eliminate amyloid deposition from AD brains⁵⁷. The lipopolysaccharide (LPS) receptor CD14 also reportedly contributes to neuroinflammation in AD⁵⁸. FCER1G is one of the microglial-specific genes, and the microglial is considered a major causal factor in AD⁵⁹. Among 697 DEGs in the ANM2 dataset, ten genes (DBI, CDK5R1, SORL1, CTNNA1, CTSS, CAPN1, NFKBIA, SERPINA1, CST3, and VIM) had maximum scores of five CFG points. CDK5R1 is known to be closely related to AD onset and progression⁶⁰, SORL1 interacts with the movement of APP and plays a possible role in AD progression⁶¹, whereas CTNNA1 is critical to the folding and lamination of the cerebral cortex and is involved in AD pathogenesis⁶².

We observed several enriched pathways with AD-related genes in blood samples in ADNI and ANM, which were also enriched in human or mouse brain tissues. The main pathways of ADNI were related to immune response. A study that found conserved genetic signals in mice and human brain tissues reported that genes related to early and late-stage AD were significantly enriched in immune system processes⁶³. Additionally, infection-related pathways (lipopolysaccharide mediated signaling pathway) were enriched in ADNI, which was activated in the human brain with AD⁶⁴, and associated with the increased risk of AD⁶⁵. The DEGs in ANM were significantly enriched with mitochondria-related pathways. Liang et al.¹³ demonstrated that AD cases had significantly down-regulated expression of the nuclear genes encoding subunits of the mitochondrial electron transport chain in several brain regions. The DEGs with CFG ≥ 3 in ANM were enriched with the Wnt signaling pathway, which is associated with the developmental process of the nervous system, and especially its association with synaptogenesis was validated in a mouse brain model⁶⁶ and the protective role of neurodegeneration in AD rat model⁶⁷.

In Table 2, ANM1 and ANM2 showed a significant difference in terms of age between AD and CN. Although gender difference did not significantly differ between AD and CN, other studies reported gender difference in the AD risk⁶⁸. Thus, we adjusted for age as well as gender for the ADNI, ANM1, and ANM2 datasets. Afterward, we measured AD predictive performances using five feature selection methods and five classifiers. The detailed adjustment procedure is described in the Supplementary File. As a result, we observed that classifying performances did not significantly differ after the adjustment (Supplementary Fig. S3).

There are three subtypes of AD, including preclinical AD⁶⁹, prodromal AD⁷⁰, and AD dementia. The subtypes of AD are diagnosed using the amyloid positron emission tomography (PET) scan. However, the ANM datasets (ANM1 and ANM2) did not include amyloid PET results. While the ADNI dataset had amyloid PET data, samples with gene expressions were not explicitly classified by the amyloid PET image. Thus, instead of analyzing the three subtypes of ADs, we analyzed MCI samples in the ADNI, ANM1, and ANM2 diagnosed by the NINCDS-ADRDA criteria². First, we compared FCs between two pairs of datasets, AD vs. CN and MCI vs. CN, for each dataset (Supplementary Fig. S4). In detail, we compared log₂FC for all genes (n = 8,835), DEGs (n = 334, 1604 and 697 for ADNI, ANM1, and ANM2, respectively), and DEG with CFG (n = 81, 334, and 169 for ADNI, ANM1, and ANM2, respectively) using Spearman correlation. As a result, we observed positive correlations between log₂ (AD/CN) and log₂ (MCI/CN) in all datasets and several gene sets (all genes, DEG, and DEG with CFG) (Supplementary Fig. S4), suggesting that genes are similarly upregulated or downregulated in AD and MCI.

We further integrated the ADNI, ANM1, and ANM2 datasets, and investigated the AD prediction performance on the integrated dataset (Supplementary File). To integrate the three datasets and select features from it, we applied four different approaches. First, the ComBat method, which was mainly used for internal and external validations, was used to remove batch effects among these datasets²⁷, and then DEGs were selected using the “lmFit” function in the limma package. Second, the scaling & quartiling method by Mohammadi-Dehcheshmeh et al.⁷¹ was used to remove batch effects further, and then DEGs were selected using the “lmFit” function. Third, DEGs were computed for each dataset via the moderate t-test, and then rankings of p-values were used to curate meta-DEGs among three datasets⁷². Fourth, DEGs were computed for each dataset via the moderate t-test, and then p-values were combined via Fisher’s method⁷³. The gene expression values normalized by the ComBat method showed significant correlations with those yielded by the other three approaches (Supplementary Fig. S5). Furthermore, the DEGs curated by the first ComBat approach significantly correlated with those by the other three methods (Supplementary Fig. S6). We evaluated the AD predictive performance by the three-fold CV. When five classifiers, LR, L1-LR, SVM, RF, and DNN, were used for training and test, the ComBat approach obtained AUC values of 0.5, 0.7, 0.8, 0.79, and 0.79, respectively (Supplementary Fig. S7). This result shows that these three datasets can be integrated for classifying AD and CN. Also, the ComBat method outperformed the other three methods in terms of the average AUC values of the five classifiers (Supplementary Fig. S7).

Most studies have used the NINCDS-ADRDA criteria made in 1984² for classifying subjects as AD and CN samples. The NINCDS-ADRDA criteria are symptom- and individual doctor-based diagnosis, and therefore, may yield inconsistencies between different datasets. Using a single nucleotide polymorphism dataset in ADNI, Apostolova et al.⁷⁴ curated seven variants using brain amyloidosis as a dependent variable, while only one variant (FERMT2) was found using the AD stage determined by the NINCDS-ADRDA as a dependent variable. In addition, Edmonds et al.⁷⁵ suggested that some MCI samples diagnosed by the NINCDS-ADRDA are false positives. In 2007, a revised version of the NINCDS-ADRDA criteria, which more focused on the pathology of AD rather than clinical symptoms, was introduced³. In the future, if genomic data composing participants determined by the revised diagnostic criteria of AD are available, more AD-related genes and pathways can be identified.

Conclusions

In this study, we showed that expression values of AD-related genes obtained from blood samples of ADNI, ANM1 and ANM2 could classify AD and CN. Additionally, we observed that AD-related genes from blood samples were enriched with several pathways including immune, inflammation, energy metabolism, and Wnt signaling, which are consistent with observations from brain tissue-based studies. Collectively, AD-related genes from blood samples contribute to the development of blood-based AD diagnostic and treatment tools.

Data availability

ADNI and ANM datasets are publicly available (ADNI, http://adni.loni.usc.edu/; ANM, https://www.ncbi.nlm.nih.gov/geo/).

References

Hebert, L. E., Weuve, J., Scherr, P. A. & Evans, D. A. Alzheimer disease in the United States (2010–2050) estimated using the 2010 census. Neurol. 80, 1778–1783 (2013).
Article Google Scholar
McKhann, G. et al. Clinical diagnosis of Alzheimer’s disease: Report of the NINCDS‐ADRDA Work Group* under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease. Neurol. 34, 939–939 (1984).
Article CAS Google Scholar
Dubois, B. et al. Research criteria for the diagnosis of Alzheimer’s disease: revising the NINCDS–ADRDA criteria. Lancet Neurol. 6, 734–746 (2007).
Article PubMed Google Scholar
Lambert, J.-C. et al. Genome-wide association study identifies variants at CLU and CR1 associated with Alzheimer’s disease. Nat. Genet. 41, 1094 (2009).
Article CAS PubMed Google Scholar
Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452 (2013).
Article CAS PubMed PubMed Central Google Scholar
Xu, M. et al. A systematic integrated analysis of brain expression profiles reveals YAP1 and other prioritized hub genes as important upstream regulators in Alzheimer’s disease. Alzheimers Dement. 14, 215–229 (2018).
Article PubMed Google Scholar
Niculescu, A. B. & Le-Niculescu, H. Convergent Functional Genomics: what we have learned and can learn about genes, pathways, and mechanisms. Neuropsychopharmacology 35, 355 (2010).
Article PubMed Google Scholar
NICULESCU, A. B. III et al. Identifying a series of candidate genes for mania and psychosis: a convergent functional genomics approach. Physiol. Genomics 4, 83–91 (2000).
Article CAS PubMed Google Scholar
Mueller, S. G. et al. Ways toward an early diagnosis in Alzheimer’s disease: the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimers Dement. 1, 55–66 (2005).
Article PubMed PubMed Central Google Scholar
Lovestone, S. et al. AddNeuroMed—the European collaboration for the discovery of novel biomarkers for Alzheimer’s disease. Ann. N. Y. Acad. Sci. 1180, 36–46 (2009).
Article ADS CAS PubMed Google Scholar
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 39, D1005–D1010 (2010).
Article PubMed PubMed Central CAS Google Scholar
Durrenberger, P. F. et al. Common mechanisms in neurodegeneration and neuroinflammation: a BrainNet Europe gene expression microarray study. J. Neural Transm. 122, 1055–1068 (2015).
Article CAS PubMed Google Scholar
Liang, W. S. et al. Alzheimer’s disease is associated with reduced expression of energy metabolism genes in posterior cingulate neurons. Proc. Natl Acad. Sci. USA 105, 4441–4446 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Cooper, Y. A. et al. Progranulin levels in blood in Alzheimer’s disease and mild cognitive impairment. Ann. Clin. Transl. Neurol. 5, 616–629 (2018).
Article CAS PubMed PubMed Central Google Scholar
Booij, B. B. et al. A gene expression pattern in blood for the early detection of Alzheimer’s disease. J. Alzheimers Dis. 23, 109–119 (2011).
Article CAS PubMed Google Scholar
Lunnon, K. et al. A blood gene expression marker of early Alzheimer’s disease. J. Alzheimers Dis. 33, 737–753 (2013).
Article CAS PubMed Google Scholar
Sood, S. et al. A novel multi-tissue RNA diagnostic of healthy ageing relates to cognitive health status. Genome Biol. 16, 185 (2015).
Article PubMed PubMed Central CAS Google Scholar
Voyle, N. et al. A pathway based classification method for analyzing gene expression for Alzheimer’s disease diagnosis. J. Alzheimers Dis. 49, 659–669 (2016).
Article CAS PubMed Google Scholar
Li, H. et al. Identification of molecular alterations in leukocytes from gene expression profiles of peripheral whole blood of Alzheimer’s disease. Sci. Rep. 7, 14027 (2017).
Article ADS PubMed PubMed Central CAS Google Scholar
Li, X. et al. Systematic analysis and biomarker study for Alzheimer’s disease. Sci. Rep. 8, 17394 (2018).
Article ADS PubMed PubMed Central CAS Google Scholar
Team RC. R: A language and environment for statistical computing (2013).
Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).
Article PubMed PubMed Central Google Scholar
Abadi M. et al. Tensorflow: A system for large-scale machine learning. In: 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16) (ed^(eds) (2016).
Hokama, M. et al. Altered expression of diabetes-related genes in Alzheimer’s disease brains: the Hisayama study. Cereb. Cortex 24, 2476–2488 (2013).
Article PubMed PubMed Central Google Scholar
Antonell, A. et al. A preliminary study of the whole-genome expression profile of sporadic and monogenic early-onset Alzheimer’s disease. Neurobiol. Aging 34, 1772–1778 (2013).
Article CAS PubMed Google Scholar
Irizarry, R. A. et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003).
Article PubMed MATH Google Scholar
Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinforma. 28, 882–883 (2012).
Article CAS Google Scholar
Tusher, V. G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA 98, 5116–5121 (2001).
Article ADS CAS PubMed MATH PubMed Central Google Scholar
Smyth, G. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, Article 3 (2004).
Article MathSciNet MATH Google Scholar
Kingma, D. P. & Welling, M. Auto-encoding variational bayes. arXiv preprint arXiv:13126114 (2013).
Matys, V. et al. TRANSFAC® and its module TRANSCompel®: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110 (2006).
Article ADS CAS PubMed Google Scholar
Keshava Prasad, T. et al. Human protein reference database—2009 update. Nucleic Acids Res. 37, D767–D772 (2008).
Article PubMed PubMed Central CAS Google Scholar
Breen, M. S. et al. Candidate gene networks and blood biomarkers of methamphetamine-associated psychosis: an integrative RNA-sequencing report. Transl. Psychiatry 6, e802 (2016).
Article CAS PubMed PubMed Central Google Scholar
Bertram, L., McQueen, M. B., Mullin, K., Blacker, D. & Tanzi, R. E. Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat. Genet. 39, 17–23 (2007).
Article CAS PubMed Google Scholar
Kim, J., Kim, J. J. & Lee, H. An analysis of disease-gene relationship from Medline abstracts by DigSee. Sci. Rep. 7, 40154 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Cox, D. R. The regression analysis of binary sequences [with discusion]. J. R. Stat. Soc. Ser. B Stat Methodol. 20, 215–242 (1958).
MATH Google Scholar
Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat Methodol. 58, 267–288 (1996).
MathSciNet MATH Google Scholar
Ayers, K. L. & Cordell, H. J. SNP selection in genome-wide and candidate gene studies via penalized logistic regression. Genet. Epidemiol. 34, 879–891 (2010).
Article PubMed PubMed Central Google Scholar
Algamal, Z. Y. & Lee, M. H. Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer classification. Expert. Syst. Appl. 42, 9326–9332 (2015).
Article Google Scholar
Dimitriadou, E. et al. Package ‘e1071’. R Software package, avaliable at, http://cran rproject org/web/packages/e1071/index html (2009).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
MathSciNet MATH Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Article MATH Google Scholar
Ebrahimi, M., Lakizadeh, A., Agha-Golzadeh, P., Ebrahimie, E. & Ebrahimi, M. Prediction of thermostability from amino acid attributes by combination of clustering with attribute weighting: a new vista in engineering enzymes. PLoS One 6, e23146 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nat. 521, 436 (2015).
Article ADS CAS Google Scholar
Jamali, A. A. et al. DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins. Drug. discovery today 21, 718–724 (2016).
Article CAS PubMed Google Scholar
Duchi, J., Hazan, E. & Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011).
MathSciNet MATH Google Scholar
Kingma, D. P. & Ba, J. A. A method for stochastic optimization. arXiv 2014. arXiv preprint arXiv:14126980, (2019).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Article CAS PubMed PubMed Central Google Scholar
Consortium, G. O. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).
Article CAS Google Scholar
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinforma. 27, 1739–1740 (2011).
Article CAS Google Scholar
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
MathSciNet MATH Google Scholar
Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics (ed) (2010).
Bosco, P. et al. Role of the Transforming-Growth-Factor-beta1 Gene in Late-Onset Alzheimer’s Disease: Implications for the Treatment. Curr. genomics 14, 147–156 (2013).
Article CAS PubMed PubMed Central Google Scholar
Zhu, X., Lee, H. G., Raina, A. K., Perry, G. & Smith, M. A. The role of mitogen-activated protein kinase pathways in Alzheimer’s disease. Neurosignals 11, 270–281 (2002).
Article CAS PubMed Google Scholar
Roy, J., Sarkar, A., Parida, S., Ghosh, Z. & Mallick, B. Small RNA sequencing revealed dysregulated piRNAs in Alzheimer’s disease and their probable role in pathogenesis. Mol. Biosyst. 13, 565–576 (2017).
Article CAS PubMed Google Scholar
Masliah, E. et al. Genetic deletion of Nogo/Rtn4 ameliorates behavioral and neuropathological outcomes in amyloid precursor protein transgenic mice. Neurosci. 169, 488–494 (2010).
Article CAS Google Scholar
Hernandez-Guillamon, M. et al. Neuronal TIMP-1 release accompanies astrocytic MMP-9 secretion and enhances astrocyte proliferation induced by beta-amyloid 25-35 fragment. J. Neurosci. Res. 87, 2115–2125 (2009).
Article CAS PubMed Google Scholar
Liu, Y. et al. LPS receptor (CD14): a receptor for phagocytosis of Alzheimer’s amyloid peptide. Brain 128, 1778–1789 (2005).
Article PubMed Google Scholar
Efthymiou, A. G. & Goate, A. M. Late onset Alzheimer’s disease genetics implicates microglial pathways in disease risk. Mol. Neurodegener. 12, 43 (2017).
Article PubMed PubMed Central CAS Google Scholar
Spreafico, M., Grillo, B., Rusconi, F., Battaglioli, E. & Venturin, M. Multiple Layers of CDK5R1 Regulation in Alzheimer’s Disease Implicate Long Non-Coding RNAs. Int J Mol Sci 19 (2018).
Yin, R. H., Yu, J. T. & Tan, L. The Role of SORL1 in Alzheimer’s Disease. Mol. Neurobiol. 51, 909–918 (2015).
Article CAS PubMed Google Scholar
Smith, A., Bourdeau, I., Wang, J. & Bondy, C. A. Expression of Catenin family members CTNNA1, CTNNA2, CTNNB1 and JUP in the primate prefrontal cortex and hippocampus. Brain Res. Mol. Brain Res 135, 225–231 (2005).
Article CAS PubMed Google Scholar
Gjoneska, E. et al. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease. Nat. 518, 365 (2015).
Article ADS CAS Google Scholar
Zhan, X. et al. Gram-negative bacterial molecules associate with Alzheimer disease pathology. Neurol. 87, 2324–2332 (2016).
Article CAS Google Scholar
Zhan, X., Stamova, B. & Sharp, F. R. Lipopolysaccharide associates with amyloid plaques, neurons and oligodendrocytes in Alzheimer’s disease brain: a review. Front. Aging Neurosci. 10, 42 (2018).
Article PubMed PubMed Central CAS Google Scholar
Ciani, L. et al. Wnt7a signaling promotes dendritic spine growth and synaptic strength through Ca2+/Calmodulin-dependent protein kinase II. Proc. Natl Acad. Sci. USA 108, 10732–10737 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
De Ferrari, G. V. et al. Activation of Wnt signaling rescues neurodegeneration and behavioral impairments induced by beta-amyloid fibrils. Mol. Psychiatry 8, 195–208 (2003).
Article PubMed CAS Google Scholar
Vina, J. & Lloret, A. Why women have more Alzheimer’s disease than men: gender and mitochondrial toxicity of amyloid-beta peptide. J. Alzheimers Dis. 20(Suppl 2), S527–533 (2010).
Article PubMed CAS Google Scholar
Sperling, R. A. et al. Toward defining the preclinical stages of Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 280–292 (2011).
Article PubMed PubMed Central Google Scholar
Albert, M. S. et al. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 270–279 (2011).
Article PubMed PubMed Central Google Scholar
Mohammadi-Dehcheshmeh, M. et al. Unified Transcriptomic Signature of Arbuscular Mycorrhiza Colonization in Roots of Medicago truncatula by Integration of Machine Learning, Promoter Analysis, and Direct Merging Meta-Analysis. Front. plant. Sci. 9, 1550 (2018).
Article PubMed PubMed Central Google Scholar
Sharifi, S. et al. Integration of machine learning and meta-analysis identifies the transcriptomic bio-signature of mastitis disease in cattle. PLoS One 13, e0191227 (2018).
Article PubMed PubMed Central CAS Google Scholar
Farhadian, M., Rafat, S. A., Hasanpur, K., Ebrahimi, M. & Ebrahimie, E. Cross-Species Meta-Analysis of Transcriptomic Data in Combination With Supervised Machine Learning Models Identifies the Common Gene Signature of Lactation Process. Front. Genet. 9, 235 (2018).
Article PubMed PubMed Central CAS Google Scholar
Apostolova, L. G. et al. Associations of the top 20 Alzheimer disease risk variants with brain amyloidosis. JAMA Neurol. 75, 328–341 (2018).
Article PubMed PubMed Central Google Scholar
Edmonds, E. C. et al. Susceptibility of the conventional criteria for mild cognitive impairment to false-positive diagnostic errors. Alzheimers Dement. 11, 415–424 (2015).
Article PubMed Google Scholar

Download references

Acknowledgements

This work has been supported by the Bio & Medical Technology Development Program of NRF funded by the Korean government (MSIT) (NRF-2018M3C7A1054935) and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant Number: HI18C0460).

Author information

Authors and Affiliations

Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology, Gwangju, South Korea
Taesic Lee & Hyunju Lee
Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, South Korea
Hyunju Lee
School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, South Korea
Hyunju Lee

Authors

Taesic Lee
View author publications
You can also search for this author in PubMed Google Scholar
Hyunju Lee
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

H.L. contributed to the study concept and design. T.L. took part in the acquisition of the data and the machine learning algorithms. H.L. and T analyzed and interpreted results. T.L. and H.L. wrote the manuscript. H.L. took part in the study supervision and coordination. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hyunju Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary File.

Supplementary Table 1.

Supplementary Table 2.

Supplementary Table 3.

Supplementary Table 4.

Supplementary Table 5.

Supplementary Table 6.

Supplementary Table 7.

Supplementary Table 8.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Lee, T., Lee, H. Prediction of Alzheimer’s disease using blood gene expression data. Sci Rep 10, 3485 (2020). https://doi.org/10.1038/s41598-020-60595-1

Download citation

Received: 02 October 2019
Accepted: 11 February 2020
Published: 26 February 2020
DOI: https://doi.org/10.1038/s41598-020-60595-1

This article is cited by

A multi-cohort study of the hippocampal radiomics model and its associated biological changes in Alzheimer’s Disease
- Huwei Xia
- Xiaoqian Luan
- Weihong Song
Translational Psychiatry (2024)
Predicting early Alzheimer’s with blood biomarkers and clinical features
- Muaath Ebrahim AlMansoori
- Sherlyn Jemimah
- Aamna AlShehhi
Scientific Reports (2024)
DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype–phenotype prediction
- Pramod Bharadwaj Chandrashekar
- Sayali Alatkar
- Daifeng Wang
Genome Medicine (2023)
c-Diadem: a constrained dual-input deep learning model to identify novel biomarkers in Alzheimer’s disease
- Sherlyn Jemimah
- Aamna AlShehhi
BMC Medical Genomics (2023)
Identification of candidate DNA methylation biomarkers related to Alzheimer’s disease risk by integrating genome and blood methylome data
- Yanfa Sun
- Jingjing Zhu
- Lang Wu
Translational Psychiatry (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.