Dementia is a syndrome characterized by loss of cognitive function, in particular memory, thinking, and behavioral abilities. The number of people worldwide with dementia among the elderly population is rapidly increasing, and is expected to reach 74.7 million in 2030 and 131.5 million in 20501. The most common subtype of dementia is Alzheimer’s disease (AD), followed by vascular dementia (VaD), and dementia with Lewy bodies (DLB). The diagnoses of these subtypes of dementia are based on clinical criteria for each subtype2,3,4,5,6. However, it is difficult to differentiate between dementia subtypes, because many clinical features overlap. For example, patients with normal pressure hydrocephalus (NPH) present with similar symptoms to AD, although NPH is often curable by shunt surgery to remove accumulated cerebrospinal fluid (CSF)7. Therefore, the identification of new biomarkers for more efficient diagnosis is urgently required.

Imaging-based techniques, including positron emission tomography scans for detection of amyloid beta (Aβ) deposition or tau tracers, and volumetric magnetic resonance imaging for determination of hippocampal or medial temporal lobe atrophy, are currently used for diagnosis of dementia subtypes; however, these techniques are not suitable for initial screening due to their high cost8. In addition, while several CSF biomarkers, such as Aβ1–42, total tau (T-tau), and phosphorylated tau 181 (P-tau181), are effective for characterizing AD, these are not suitable for initial screening due to their high invasiveness. Therefore, blood-based biomarkers are the most attractive option as candidate biomarkers for practical clinical use due to their minimal invasiveness and cost effectiveness.

MicroRNAs (miRNAs) are small non-coding RNAs of 20–25 nucleotides in length that regulate gene expression by binding to complementary regions of messenger RNAs9. Many circulating miRNAs in blood have been identified as promising biomarkers for cancer diagnosis10,11,12,13,14,15 and dementia diagnosis16,17,18,19,20,21,22. However, the prediction models constructed for dementia using miRNAs are specific to each subtype16,17,18,19,20,21,22 and the development of a multiclass classification prediction model of dementia subtypes is required.

Here, we conducted a comprehensive serum miRNA expression analysis of 1348 Japanese dementia patients composed of four subtypes (AD, VaD, DLB, and NPH) and 246 control subjects. Using this data, we constructed various dementia subtype prediction models based on penalized regression models with multiclass classification. Our final prediction model classified dementia patients into four subtypes of dementia using 46 miRNAs. Furthermore, network-based meta-analysis of the miRNA target genes revealed several important hub genes associated with the pathogenesis of dementia subtypes. We believe that further investigation using a larger sample size will contribute to accurate classification of dementia subtypes.


Clinical samples

All 1594 serum samples and clinical information, including sex, age, and APOE ε4 genotypes, were obtained from the National Center for Geriatrics and Gerontology (NCGG) Biobank. Of these samples, 1009 were from patients with AD, 89 were from patients with VaD, 166 were from patients with DLB, and 84 were from patients with NPH; 246 samples were from cognitively normal elder controls (CN). The AD patients were diagnosed based on the criteria developed by the National Institute on Aging and the Alzheimer's Association (NIA-AA)2,3; the VaD patients were diagnosed using the criteria in the report of the NINDS-AIREN International Workshop4; and the DLB and NPH patients were diagnosed based on the criteria in the fourth report of the DLB Consortium5 and the guidelines for the management of NPH6, respectively. The CN samples had subjective cognitive complaints, but normal cognition on the neuropsychological assessment with a comprehensive neuropsychological test (Mini-Mental State Examination score, > 25). All individuals were 60 years of age or older. This study was approved by the ethics committee of the NCGG. The design and performance of the study were clearly described in a research protocol. Participation was voluntary and all participants completed informed consent in writing before registering to NCGG Biobank. All methods in this study were performed in accordance with the Declaration of Helsinki.

miRNA-microarray assay

Total RNAs were extracted from 300 µL serum samples by using a 3D-Gene RNA extraction reagent (Toray Industries, Inc., Kanagawa, Japan). Comprehensive miRNA expression analysis was performed using the 3D-Gene system and a 3D-Gene Human miRNA Oligo Chip (Toray Industries, Inc.), which was designed to detect 2565 of the miRNA sequences registered in miRBase (release 21, All microarray data used in this study are publicly available through the Gene Expression Omnibus database accession numbers: GSE120584 and GSE167559 ( The miRNA expression normalization was described in our previous study20.

Construction and evaluation of the dementia-type risk prediction model

All data were strictly separated into a discovery set and validation set (Fig. 1). Four-fifths of the discovery set was used to calculate p-values in each cross-validation step. The p-values corresponding to the miRNAs were calculated with a logistic regression model between dementia and CN with adjustments for three covariates: age, sex, and the number of APOE ε4 alleles. The top-ranked m miRNAs were selected stepwise (m = 10, 20, …). Using a combination of the m miRNAs and three clinical variables, risk prediction models were constructed based on penalized regression methods: ridge regression, elastic net, and lasso. Let \(X_{i} = \left( {X_{i,1} , \ldots ,X_{i,p} } \right)\) be the values of the pre-selected top-ranked m miRNAs for a subject i and let \(l\left( {\beta ;\gamma_{i} ,X_{i} } \right)\) be the logistic log-likelihood:

$$l\left( {\beta ;\gamma_{i} ,X_{i} } \right) - \lambda P_{\alpha } \left( \beta \right),$$

where \(P_{\alpha } \left( \beta \right) = \left( {1 - \alpha } \right)\frac{1}{2}\beta^{2} + \alpha \left| \beta \right|\) and \(\alpha\) is set to 1 (lasso), 0 (ridge regression), or 0.1 to 0.9 at 0.1 intervals (elastic net), and the optimal penalty parameter \(\lambda\) is selected using fivefold cross-validation. This process was repeated 5 times (fivefold cross-validation). Based on the average accuracy, we determined the optimal number of miRNAs and \(\alpha\) for model construction. The final model was constructed using the entire discovery set and the adjusted model was evaluated on the independent validation set. The multiclass classification model used in this study was implemented using the glmnet package (version 2.0–18)23 with the function family set to "multinomial" in the statistical software R24.

Figure 1
figure 1

Outline of construction of dementia subtype prediction model and validation.

Network analysis for the target genes of miRNAs

Up- and down-regulated miRNAs were defined based on the coefficients of the multiclass dementia subtype prediction model. The miRNAs with positive coefficients were assigned to up-regulated miRNAs, and those with negative coefficients were assigned to down-regulated miRNAs. The miRNA target gene annotation was conducted using the microRNA Target Prediction and Functional Study Database (miRDB)25. Only target genes with a prediction score (assigned by MirTarget V3 based on miRBase 21) of greater than 90 were used in this study. Based on the target genes, protein–protein interaction (PPI) network analyses were implemented using the web-based tool NetworkAnalyst 3.0 ( together with the International Molecular Exchange Consortium Interactome database27; we set the betweenness cutoff, which indicates the number of shortest paths through a node, to 25 and used the minimum network filter. The PPI networks were visualized using Cytoscape (version 3.7.1)28. We identified nodes with degree > 100 as hub nodes. We identified statistically significant KEGG pathways and biological process gene ontologies (GOs) by using NetworkAnalyst. The P-value was corrected for multiple testing using False Discovery Rate (FDR), and the statistical significance threshold was set at FDR < 0.05.

Ethics approval and consent to participate

This study was approved by the ethics committee of the NCGG. The design and performance of the current study, which involves human subjects, were clearly described in a research protocol. All participation was voluntary, and participants completed informed consent in writing before registering with the NCGG Biobank.



A total of 1594 Japanese people comprising 1009 AD, 89 VaD, 166 DLB, 84 NPH, and 246 CN subjects, were enrolled in this study in the NCGG Biobank. The male:female ratio of all individuals was 1:1.68 with an average age of 78.0 years (SD, 6.8). The allele frequency of the C allele of rs429358, which defines the APOE ε4 phenotype, was 0.203 (Table 1). We also split the participants into a discovery set and a validation set. This separation was stratified on age and sex such that there was a similar distribution of these features in the discovery and validation sets.

Table 1 Sample characteristics of each dementia subtype.

Dementia subtype prediction models

We constructed dementia subtype prediction models based on penalized regression methods for multiclass classification (Fig. 1). We carefully examined the sample size of each dementia subtype in the discovery data set, as the imbalance of sample size in multiclass classification is a well-known issue in the field of machine learning. The detailed procedures of four models, Model A-D, are described in Supplementary Note and Supplementary information (Figs. S1, S2, and Tables S1S5).

To eliminate the bias of sample size among dementia subtypes, we constructed a final model using the same number of samples in each dementia subtype. We selected 50 samples, composed of 25 males and 25 females, from four dementia subtypes and CN. In total, 250 samples were used for a discovery data set and the remaining samples were used for a validation data set (Supplementary Table S3, Dataset 3). Through the fivefold cross-validation, the maximum mean accuracy for dementia subtypes was achieved with a parameter combination of (m, \(\alpha\)) = (200, 0.9) (Supplementary Fig. S3). The final prediction model, Model D (Supplementary Table S6), was constructed with the optimal parameters detected using the entire discovery set and achieved an accuracy of 0.398 in the validation data set and a mean accuracy of 0.374 for dementia subtypes (Supplementary Table S3) when using 46 miRNAs.

PPI network analyses using miRNA target genes

According to the annotation in miRDB, the 46 miRNAs used in Model D were predicted to target 1303 genes. Of these 1303 genes, the numbers of genes related to up- and down-regulated miRNAs were, respectively, 18 and 110 in AD, 440 and 0 in VaD, 428 and 0 in DLB, and 180 and 63 in NPH (Fig. 2). To elucidate functional modules from the target genes, PPI network analyses were performed using NetworkAnalyst for each dementia subtype; PPI network visualization was conducted with Cytoscape software (Fig. 3). Several interesting hub genes were identified in each dementia subtype. Of them, SRC and CHD3, which are targets of down-regulated miRNAs in AD, are particularly interesting. The SRC gene is reported to be associated with AD29,30,31. The CHD3 gene is reported to be involved in transcription of the PSEN1 gene, the most common cause of familial Alzheimer's disease32,33,34. While the UBC gene was observed in all dementia subtypes networks, it was not one of the 1303 target genes that we detected.

Figure 2
figure 2

Effective miRNAs and genes used in our dementia-type prediction model. (a) The number of up- and down-regulated miRNAs used for the final models for each dementia subtype. (b) Target genes for each miRNA used for each model; the target genes were predicted by using annotation in the miRDB data.

Figure 3
figure 3

PPI network analysis for the genes that were targeted by the miRNAs. The gene symbol is displayed for nodes with degree > 100.

We further performed gene set enrichment analyses (GSEA) using NetworkAnalyst for each dementia subtype. A statistically significant KEGG pathway (Ras signaling pathway, hsa04014, FDR = 0.0083 in VaD) and four statistically significant GO terms (Nucleobase-containing compound transport, GO:0015931, FDR = 0.029 in AD; RNA export from nucleus, GO:0006405, FDR = 0.046 in AD; Homophilic cell adhesion, GO:0007156, FDR = 5.3 × 10–5 in DLB; Cell–cell adhesion, GO:0098609, FDR = 6.6 × 10–4 in DLB) were identified in the target genes of up-regulated miRNAs (Table 2).

Table 2 Gene set enrichment analyses using the target genes of the miRNAs.


We report a prediction model for multiclass classification of dementia subtypes using serum miRNAs. Most previous research has focused on developing prediction models for specific dementia subtypes16,17,18,19,20,21,22. Shigemizu et al. reported separate effective blood miRNA-based risk prediction models for several dementia subtypes20,21; however, such individual dementia subtype prediction models would likely predict multiple dementia subtypes for each patient. To address this issue, we propose a multiclass classification dementia subtype prediction model. Although our models could not provide sufficient accuracy, a further, larger cohort study is expected to improve the prediction accuracy. In particular, an improvement of the prediction accuracy can be expected in patients with VaD and NPH as the number of affected patients in this study was small.

Network analyses were performed using the target genes of the 46 miRNAs involved in our model that used equal sample sizes for the four subtypes, Model D. Two interesting hub genes, SRC and CHD3, were detected from the PPI network of the target genes of down-regulated miRNAs in AD. Accumulation of Aβ in the brain is strongly associated with the onset of AD. The constitutively active form of Src increases Aβ generation, and the inhibition of the kinase activity of Src reduces the generation of Aβ in cells stably expressing APP29. Src also regulates the phosphorylation of Mint, which has an important role in the production and the secretion of Aβ30,31. CHD3 suppresses transcription of the familial AD causative gene PSEN132,33 and may potentially rescue Ca2+‐signaling defects in familial AD patients and prevent neuronal apoptosis34. While UBC appeared in all networks, there were no miRNAs targeting UBC in our model. This may be because the housekeeping gene UBC (Ubiquitin C) has a key role in various pathways, such as DNA repair, cell cycle regulation, kinase modification, endocytosis, and regulation of other cell signaling pathways.

Dementia-subtype–specific target genes could be effective in distinguishing between the four dementia subtypes. Two genes, MCU and CASP3, that are associated with DLB pathogenesis35,36,37, were identified from the target genes of up-regulated miRNAs in DLB. Calcium uptake via the mitochondrial calcium uniporter (MCU) complex and mitochondrial calcium overload play a key role in neurodegenerative diseases, including DLB35,36. Neuronal accumulation of α-synuclein is a major feature of DLB. Desplats et al. reported that neuron-to-neuron transmission of α-synuclein triggers activation of caspase 3, which is coded by CASP3, and causes neuronal cell death37.

The Ras signaling pathway (hsa04014) was identified through GSEA using target genes of up-regulated miRNAs in VaD. The Ras signaling pathway was also identified from target genes of down-regulated miRNAs in AD, but the result was not statistically significant after FDR correction. Kirouac et al. reported that the expression of Ras is associated with Aβ production in human AD brains38. Therefore, further study of genes involved in the Ras signaling pathway could contribute to understanding the difference between AD and VaD. In addition, four statistically significant GO terms (Nucleobase-containing compound transport, RNA export from nucleus, Homophilic cell adhesion, and Cell–cell adhesion) were identified from target genes of up-regulated miRNAs in AD and DLB. The association between these terms and dementia is currently unknown. These GO terms, however, may provide important clues to the underlying mechanisms for AD and DLB pathogenesis.

Blood-based biomarkers, which form the basis of our prediction models are attractive because they are minimally invasive, cost-effective, and easy to reproduce; however, the strategy we present here has a limitation. It is not applicable to mixed dementia cases, in which patients exhibit more than one subtype of dementia simultaneously39,40. As long as this issue is unresolved, it may be difficult to apply our findings to practical clinical use.


We developed dementia subtype prediction models by using penalized regression methods for multiclass classification and serum microRNA expression data. Network analysis of the miRNA target genes revealed several important hub genes associated with AD pathogenesis. Moreover, we found several genes associated with DLB pathogenesis, from our DLB-specific target genes. Our study demonstrates the potential of blood-based biomarkers for dementia subtype prediction models. We believe that further investigation using a larger sample size will contribute to a more accurate classification of dementia subtypes.