Correlation between Alzheimer’s disease and type 2 diabetes using non-negative matrix factorization

Chung, Yeonwoo; Lee, Hyunju

doi:10.1038/s41598-021-94048-0

Download PDF

Article
Open access
Published: 27 July 2021

Correlation between Alzheimer’s disease and type 2 diabetes using non-negative matrix factorization

Yeonwoo Chung¹,
Hyunju Lee¹ &
the Alzheimer’s Disease Neuroimaging Initiative

Scientific Reports volume 11, Article number: 15265 (2021) Cite this article

4703 Accesses
14 Citations
10 Altmetric
Metrics details

Subjects

Abstract

Alzheimer’s disease (AD) is a complex and heterogeneous disease that can be affected by various genetic factors. Although the cause of AD is not yet known and there is no treatment to cure this disease, its progression can be delayed. AD has recently been recognized as a brain-specific type of diabetes called type 3 diabetes. Several studies have shown that people with type 2 diabetes (T2D) have a higher risk of developing AD. Therefore, it is important to identify subgroups of patients with AD that may be more likely to be associated with T2D. We here describe a new approach to identify the correlation between AD and T2D at the genetic level. Subgroups of AD and T2D were each generated using a non-negative matrix factorization (NMF) approach, which generated clusters containing subsets of genes and samples. In the gene cluster that was generated by conventional gene clustering method from NMF, we selected genes with significant differences in the corresponding sample cluster by Kruskal–Wallis and Dunn-test. Subsequently, we extracted differentially expressed gene (DEG) subgroups, and candidate genes with the same regulation direction can be extracted at the intersection of two disease DEG subgroups. Finally, we identified 241 candidate genes that represent common features related to both AD and T2D, and based on pathway analysis we propose that these genes play a role in the common pathological features of AD and T2D. Moreover, in the prediction of AD using logistic regression analysis with an independent AD dataset, the candidate genes obtained better prediction performance than DEGs. In conclusion, our study revealed a subgroup of patients with AD that are associated with T2D and candidate genes associated between AD and T2D, which can help in providing personalized and suitable treatments.

Exploring the pathogenesis and key genes associated of acute myocardial infarction complicated with Alzheimer’s disease

Article Open access 16 January 2024

Predicting early Alzheimer’s with blood biomarkers and clinical features

Article Open access 13 March 2024

Unbiased proteomics and multivariable regularized regression techniques identify SMOC1, NOG, APCS, and NTN1 in an Alzheimer’s disease brain proteomic signature

Article Open access 06 July 2023

Introduction

The number of people worldwide suffering from Alzheimer’s disease (AD) has been steadily increasing in recent decades¹. AD is an irreversible disease that slowly and progressively destroys the brain. Specifically, AD affects memory dysfunction, representing a major cause of dementia in the aging population^2,3. However, AD is not a normal component of aging, but is instead a complex disease entity, and the detailed pathogenic mechanisms underlying the disease remain unclear. It has been generally recognized that accumulation of plaques (beta-amyloid) and tangles (tau) is the leading cause of AD⁴, and the prime suspect contributing to the associated neuronal destruction^5,6. Although the specific causes of beta-amyloid and tau accumulation are unknown, this pathogenic event is considered to be the result of various interacting genetic and environmental factors⁷. Therefore, it is important to address the complexity of AD by detecting the underlying characteristics.

One approach to disentangle a complex disease is gene expression analysis, including the identification of potential candidate genes or comparing expression values for specific AD-related genes. Indeed, several studies have discovered AD-related genes and mechanisms using genome-wide analyses^8,9,10,11. In particular, AD analyses using blood samples from patients have received considerable attention as a novel method of diagnosis given the advantages of the non-invasive nature and less expensive process compared to traditional analyses using brain tissue or imaging. In fact, several differentially expressed proteins in the AD brain have been identified in the blood of AD patients¹². Bu et al.¹³ showed that the amyloid-beta protein is produced not only in the brain, but also in the peripheral tissues of AD patients. Therefore, many studies have focused on machine-learning and statistical methods to improve early detection and for the identification of candidate genes that can serve as targets for AD treatment based on data obtained from peripheral samples^14,15,16,17.

Moreover, recent studies have indicated an association of AD with other diseases^18,19,20. Several genetic factors and pathophysiological mechanisms associated with AD are also shared with other AD-related diseases²¹. Therefore, identifying the relationships between AD and AD-related diseases can help to address the complexity of AD and bring us a step closer to realizing the personalized treatment of AD by reducing risk factors from comorbidities and giving a chance to treat common dysregulated pathways between AD and it related diseases^22,23. Narayanan et al.²⁴ investigated co-regulated genes in AD and Huntington’s disease, and identified common differentially co-expressed subnetworks in the two neurodegenerative diseases. In particular, type 2 diabetes (T2D) is highly associated with AD, and epidemiological studies have shown that patients with T2D have a higher risk of developing AD²⁵. One of the hallmark pathophysiologic features in T2D patients is the deposition of amyloid converted from islet amyloid peptide (IAPP)²⁶. Overproduction of IAPP secreted by pancreatic beta-cells may cause beta cell loss in T2D, and there are evidences that intracellular toxic amyloid peptide oligomers are associated with AD²⁷. Given that several lines of evidence indicate a link between AD and T2D, AD can be considered as a brain-specific type of diabetes that has been dubbed “type 3 diabetes”, and inflammation, insulin resistance, and mitochondrial dysfunction were considered as common pathogenesis of two diseases^28,29,30.

Accordingly, the aim of this study was to detect the common and distinct characteristics of AD and T2D using gene expression datasets. Because not all AD patients show an association with T2D, we extracted genes with significantly different expression profiles in AD and T2D patients separately using computational and statistical methods, and then defined subgroups of AD and T2D compared to their respective controls. Based on this analysis, we identified common genes with the same regulation directions (up or downregulated in the disease) in each pair of patient subgroups. Through this approach, we identified candidate genes among the common genes with significant differences in expression levels in the subgroups of the two diseases, which were tested as potential biomarkers for diagnostic or prognostic prediction using an independent Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset³¹. Figure 1 illustrates the procedure for relation extraction between two diseases.

Methods

Data description

The mRNA expression datasets for AD and T2D were downloaded from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). For AD analysis, we used integrated data on peripheral blood gene expression profiles from the GSE63060 and GSE63061 datasets using an R software³², which were generated on the Illumina HumanHT-12 v3.0 Expression BeadChip and the Illumina HumanHT-12 v4.0 Expression BeadChip, respectively. The GSE63060 dataset contains data for 145 AD patients and 104 control samples, and the GSE63061 dataset contains data for 140 AD patients and 135 control samples. For T2D, we used gene expression data (GSE78721) including 68 T2D patients and 62 healthy controls from adipocytes and infiltration macrophages because it was the largest T2D dataset in GEO, and adipose-derived transcription signature is associated with T2D^33,34. Gene expression data were generated using the Affymetrix PrimeView Human Gene Expression Array.

For each gene expression data point from the GEO datasets, the probe ID was converted into the Entrez Gene ID using information from the platforms of each AD and T2D datasets (e.g., GPL6947, GPL10558, and GPL15207). Gene expression levels of the probe that did not match the Entrez Gene ID were removed. Among all the assigned Entrez Gene IDs, protein-coding genes were selected using database Homo_sapiens.GRCh38.94 (http://asia.ensembl.org/Homo_sapiens/Info/Index), where Ensembl IDs were converted to the Entrez Gene IDs using the “biomaRt” package in R software. For duplicated Entrez Gene IDs, the expression values of the same Entrez Gene ID were merged into the mean value. To analyze the AD data, we merged the two datasets (GSE63060 and GSE63061). The number of Entrez Gene IDs selected by the protein-coding gene database in each AD dataset was 16,730 and 14,957, respectively. We selected 14,134 common genes from the two datasets and used the “removeBatchEffect” function in the R package “limma”. As a result, we obtained the expression data of 14,134 genes from the 285 AD patients and 239 control samples. In addition, we normalized each data point in the patient expression data by applying a log2(fold change) conversion as follows:

$$\begin{aligned} y_{ij} = \log _{2}{\frac{patient_{ij}}{normal_i}} \end{aligned}$$

where $patient_{ij}$ is the expression value of the gene i of the jth patient and $normal_i$ is the mean expression value of the gene i in the normal control samples.

Decomposition of gene expression data sets

Most gene expression datasets contain information on thousands of genes, which is relatively large compared to the number of samples; therefore, several studies have applied dimensionality reduction methods to reduce the matrix dimension^35,36. The non-negative matrix factorization (NMF) method has been widely used to reduce the dimension of an input matrix by decomposing a non-negative input matrix into two matrices³⁷. Assuming an input matrix A consisting of the expression data of N genes and M samples, NMF produces the matrices W and H of size $N\times k$ and $k\times M$, respectively, in which the parameter k indicates the number of clusters desired in the input data. The NMF algorithm is a multiplication update algorithm that multiplies W and H to obtain the input A during iteration until convergence. After convergence, the matrices W and H are used to bi-cluster genes and samples^38,39. Each row of $W (gene\times k)$ and each column of $H (k\times sample)$ can be represented by a positive linear combination of k. The element $w_{ij}$ in the W matrix is the coefficient of gene i and cluster $j (1\sim k)$, and the element $h_{ij}$ in the H matrix is the coefficient of cluster $i (1\sim k)$ and sample j (Fig. 1b).

Determination of the optimal number of clusters according to the rank k

To select the optimal number of meaningful clusters that correctly divide the input data, the cophenetic correlation coefficient should be taken into consideration³⁷. NMF updates the $W_i$ and $H_i$ matrices of each ith iteration until convergence. Following this, a connectivity matrix $C_i$ (with size $sample\times sample$) is defined by each sample assignment of the ith iteration from $H_i$ by selecting the maximum index of each column. The elements of the connectivity matrix $c_{ij}$ are filled with 1 s if the samples i and j are assigned to the same cluster and are filled with 0 s otherwise. The average of all connectivity matrices represents the consensus matrix ${\bar{C}}$, which is the probability that the samples i and j belong together during iterations. The cophenetic correlation coefficient is then calculated as the Pearson correlation coefficient between $I-{\bar{C}}$ and the distance between samples in a hierarchical clustering of ${\bar{C}}$. The cophenetic correlation coefficient indicates the dispersion of the sample assignment, which refers to how consistently samples with similar gene expression profiles belong together during iterations. Therefore, the rank k with the highest cophenetic correlation indicates the optimal capacity of the model.

However, NMF can only process non-negative ranges of entries in the input matrix A, and the output matrices W and H also have non-negative ranges. Therefore, to analyze the log2(fold change) expression dataset of each disease, we needed to select a model that can cover both positive and negative values. If the range of the input matrix A is ${\pm }$, then the matrix $A_{\pm }$ is decomposed such that $A_{\pm } \approx W_{\pm }\times H_+$. In convex NMF, the basis vectors of the $W_{\pm }$ matrix are considered to be convex combinations of the input matrix $A_{\pm }$ (i.e., $A_{\pm } \approx W_{\pm }\times H_+ \approx X_{\pm }\times F_+ \times H_+$⁴⁰) and there is an advantage in that the factors of $F_+$ and $H_+$ are sparse. Therefore, we apply the convex NMF to obtain the W and H matrices. $X_{\pm }\times F_+$ from convex NMF is treated as a factor of W in the NMF used for gene clustering, and then $F_+$ and $H_+$ are updated alternatively as follows:

$$\begin{aligned} H_{ik} \leftarrow H_{ik} \sqrt{\frac{[(X^TX)^+F]_{ik}+ [HF^T(X^TX)^-F]_{ik}}{[(X^TX)^-F]_{ik} + [HF^T(X^TX)^+F]_{ik}}} \\ F_{ik} \leftarrow F_{ik}\sqrt{\frac{[(X^TX)^+H]_{ik} + [(X^TX)^-FH^TH]_{ik}}{[(X^TX)^-H]_{ik} + [(X^TX)^+FH^TH]_{ik}}} \end{aligned}$$

Gene and sample clusters

The basic method of gene and sample clustering in NMF using the factors W and H is the “Max” method, which selects the cluster with the highest coefficient³⁷. In general, a gene is assigned to the cluster with the highest coefficient in each gene row in the W matrix, and a sample is assigned to cluster $S_i$ when the ith coefficient is the highest coefficient in each sample column in the H matrix. Accordingly, the gene cluster obtained by the “Max” gene clustering method is a group of genes with relatively upregulated expression in the sample cluster $S_i$ compared to other sample clusters. Conversely, to consider the gene cluster with relatively downregulated expression, a gene is assigned to the cluster with the minimum coefficient using the “Min” gene clustering method. We performed this bi-clustering method via NMF using the AD patient expression data to cluster AD patients into k sample clusters and genes into k relatively up and downregulated clusters, respectively. The T2D patient expression data were processed in the same manner.

However, in most gene expression analyses, the number of genes (features) is larger than the number of samples in the dataset. Even if the dimension of genes and samples in an expression dataset can be reduced using NMF, it is still difficult to analyze genes in k clusters because each gene belongs to one of the k clusters. In addition, each cluster may contain genes whose expression values in the samples of the given cluster are not different compared with those in samples of other clusters. Thus, some genes will be assigned to a gene cluster even if there are no relative differences in expression between sample clusters (Supplementary Fig. S1).

To address this issue, we filtered out genes in clusters by considering the original input matrix A and identified which genes in each cluster showed significantly different expression levels in a specific sample cluster. First, for each gene that was already assigned to the cluster, the distribution of expression levels was defined as $D_i (i=1\sim k)$ for each k sample cluster. We then used the Kruskal–Wallis test to identify genes with a significantly different expression distribution in samples of a given cluster from those in samples of at least one other cluster. The p values of the test were adjusted according to the Bonferroni correction for multiple comparisons, and genes with a q value < 0.05 were selected. Further, the Dunn test was performed between the distribution of expression levels for each gene in all possible sample cluster pairs. If the expression level differences of the gene between the given cluster and other remaining clusters were significant (q values < 0.05 after Bonferroni correction), the gene was selected. For cluster i, these genes with relatively upregulated expression were denoted as $G_i^+$ (Fig. 1b).

Similarly, for a gene assigned to the gene cluster by the “Min” gene clustering method, the Kruskal–Wallis test and Dunn test were subsequently applied to the genes in each cluster. For cluster i, these genes with relatively downregulated expression were selected and denoted as $G_i^-$. This process could effectively reduce the number of genes in each cluster compared to the conventional clustering method of NMF, which facilitated the analysis.

Differentially expressed gene (DEG) subgroups

The Kruskal–Wallis–based gene clustering method described above can extract the genes with relatively up and downregulated expression in patients with a given disease in k sample clusters. However, we further needed to identify whether genes in the obtained $G_i^+$ and $G_i^-$ groups are differentially expressed in the sample cluster $S_i$ compared to controls. In addition, even if $G_i$ selected as the characteristic of $S_i$ differs significantly from other sample clusters, genes in $G_i$ need to be differentially expressed in $S_j$ compared to controls, which means that it can also be the characteristic of the $S_j$ sample cluster. Especially, because AD and T2D datasets are gene expression data from different tissue types, by extracting differentially expressed genes between each patient groups of AD and T2D and their healthy controls, we can remove tissue-specific genes and detect genes related to each disease. Therefore, we considered DEGs from all genes in $G_i^+$ and $G_i^-$ between each patient cluster and their respective controls. First, we collected all genes assigned to any k cluster. Second, the expression levels for each gene were compared between disease samples in the i cluster and control samples using the t-test followed by Bonferroni correction. The genes with a q value < 0.05 for the sample i cluster were assigned to $M_i^+$ or $M_i^-$ depending on the direction of the expression level difference (upregulation or downregulation, respectively) (Fig. 1c).

Because AD and T2D gene expression datasets were obtained from different tissue types, this subgrouping of genes based on DEG is necessary. We can remove tissue-specific genes and select genes related to each disease by using DEGs between each patient groups of AD and T2D and their healthy controls.

Extraction of AD and T2D-related subgroup pairs and candidate genes

We independently applied the NMF approach for the expression data of AD and T2D patients and their respective controls, and obtained AD and T2D DEG subgroups for specific sample clusters. We then aimed to find AD subgroups related to T2D and T2D subgroups related to AD. To this end, we performed pathway enrichment analysis. These enriched pathways were then compared with those of known AD- and T2D-related genes in DigSee⁴¹. In addition, we investigated whether the enriched pathways in the AD DEG subgroups overlapped with T2D-related pathways and vice versa. Then, we selected a AD subgroup and a T2D subgroup containing the largest number of T2D-related pathways and AD-related pathways, respectively, which are a pair of clusters related with each other. From these two clusters, we selected common candidate genes with the same regulation direction, which are referred to as candidate genes (Fig. 1d).

Afterwards, we used an independent AD dataset downloaded from the ADNI (http://adni.loni.usc.edu)³¹, which included gene expression data from 116 AD patients and 246 controls extracted from peripheral blood, to validate the candidate genes identified from the DEG subgroup pairs related to both the diseases. With this dataset, the expression levels of 20,384 protein-coding genes filtered using the database Homo sapiens.GRCh38.94 were used for the classification of AD and the control sample using logistic regression. Tenfold cross-validation was performed with zero initialization and a learning rate of 0.05, and the area under the curve (AUC) was calculated at each tenfold cross-validation to evaluate the predictive ability of the candidate genes.

Additionally, we collected independent T2D gene expression datasets: 25 T2D patient and 71 control samples extracted from beta-cells or pancreatic islets in GSE20966, GSE25724, and GSE38642^42,43,44,45. Then, we selected protein-coding genes using Homo sapiens.GRCh38.94 from each dataset. By removing the batch effect using the “removeBatchEffect” function in the R package “limma”, we normalized the expression profiles for 10,490 common genes in the three datasets and merged them. Similar to AD, we validated the candidate genes using these T2D datasets. Because of the small number of T2D patients, we performed threefold cross-validation with zero initialization and a learning rate of 0.005 in a logistic regression model.

Results and discussion

Clustering of AD and T2D genes

The AD and T2D subgroups were independently defined using the log2(fold change) values from the expression data of 285 AD and 68 T2D patients using the convex NMF approach and NMF-based clustering method. First, to decompose the expression data of the patients into subgroups using the NMF approach, we needed to determine the optimal number of subgroups. In general, initialization of the matrices W and H affects the final outputs of NMF. Hence, we applied the NMF algorithm 10 times for each rank k from 2 to 10 with randomly initialized W and H matrices, and then calculated the average of the cophenetic correlation coefficient. We chose the rank k that had the largest average cophenetic correlation coefficient. Figure 2 shows the average cophenetic correlation coefficients for each rank k in each AD and T2D patient dataset. The optimal rank k of both datasets was 3. Thus, we used the factorized matrices W and H with the largest cophenetic correlation coefficient out of 10 iterations of rank 3 as the NMF output for each disease.

After applying the sample and gene assignment method to the decomposed matrices, we constructed three clusters containing a subset of samples for genes with relatively upregulated and downregulated expression, respectively. For sample clustering, the “Max” method was applied to matrix H in columns, and the 285 AD patients and 68 T2D patients were divided into three sample clusters of 93, 90, and 102 patients, and 17, 9, and 42 patients, respectively. All of the genes in both datasets were first assigned to one of the three gene clusters through the “Max” method. According to the Kruskal-Wallis and Dunn test, 5375 and 4479 genes were significantly upregulated ($G_x^+$) and downregulated ($G_y^-$), respectively, in expression from other clusters for AD. Similarly, 3461 and 7369 genes were upregulated and downregulated, respectively, for T2D (Table 1). Each gene can be assigned to both $G_x^+$ and $G_y^-$ when the distribution in $S_x$ is relatively upregulated whereas that in $S_y$ is relatively downregulated. Thus, in the union of upregulated and downregulated genes, 6729 and 10,051 genes emerged as showing significant differences in expression from other clusters for AD and T2D, respectively.

Each gene can be assigned to both $G_x^+$ and $G_y^-$ when the distribution in $S_x$ is relatively upregulated whereas that in $S_y$ is relatively downregulated. The Kruskal-Wallis based gene clustering method showed that samples and genes in AD patients were more evenly divided compared to those in T2D patients. However, in the T2D dataset, most of the samples were assigned to $S_3$ and the gene cluster also showed a skewed distribution in $G_3$. Because the number of clusters was decided by the optimal rank k, the cluster $S_2$ with the small number samples can be generated when the total number of samples are small such as the T2D dataset. This small size cluster may have distinct characteristics that can be distinguished from other clusters.

Table 1 Clustered samples and genes for (a) Alzheimer’s disease (AD) and (b) type 2 diabetes (T2D).

Full size table

To visually confirm that the Kruskal-Wallis–based gene clustering method removed inappropriate genes in each gene cluster compared to the conventional “Max”/“Min” method, the gene expression matrices were rearranged in cluster order for both genes and samples, which were visualized on a heatmap. The rectangular regions on the diagonal of the heatmap, indicating samples and genes assigned in the same cluster, demonstrate genes with relatively upregulated or downregulated expression in each cluster. Compared to the basic “Max” and “Min” method, genes selected by Kruskal-Wallis test generated more distinct regions, in which genes showed significant differences in expression levels that could be clearly observed on the heatmap for both the AD and T2D datasets (Fig. 3). Figure 3 was generated using “aheatmap” function in the R package “NMF”⁴⁶ and “heatmap” function in the python library “seaborn”⁴⁷.

Next, we constructed $M_i^+$ and $M_i^-$, where gene expressions differ between each disease patient group and its corresponding control group. For the subgroups of each sample cluster, we integrated the genes with relatively up and downregulated expression compared to patients in other clusters ($G_i^+$ and $G_i^-$), resulting in 6729 and 10,051 integrated genes in AD and T2D, respectively. Then, genes were assigned into a subgroup i if they are differentially expressed between patients in the subgroup i and control samples based on the t-test with Bonferroni-corrected q value < 0.05. The numbers of these up and downregulated genes ($M_i^+$ and $M_i^-$) are shown in Table 2.

We further examined the clinical characteristics such as age of these samples in AD modules because the age information is only in AD, not T2D. The result of a one-way ANOVA for age in the AD patient subgroup showed no significant differences between AD patient subgroups (Supplementary Table S2), which suggests that the grouping of samples does not depend on age.

Table 2 Number of DEGs in each cluster.

Full size table

Selecting cluster pairs of interest

To find AD clusters related to T2D and T2D clusters related to AD, pathway enrichment analysis was performed for the up and downregulated genes ($M_i^+$ and $M_i^-$) in each AD and T2D cluster using a total of 10,378 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology (GO) terms from MSigDB. The hypergeometric test for $M_i^+$ and $M_i^-$ in AD and T2D genes was used to determine significantly enriched KEGG pathways and GO terms (Tables 3 and 4). As references of AD- and T2D-related pathways, we extracted 1635 AD-related genes and 1658 T2D-related genes from DigSee⁴¹, and obtained 1675 and 1757 AD- and T2D-related pathways, respectively, using the hypergeometric enrichment test. We identified common pathways from the DEG subgroups in AD patients with 1757 T2D-related pathways from DigSee⁴¹. Among the AD clusters, patients in $S_3$ were most likely to have an association with T2D compared to AD patients in the AD $S_1$ and AD $S_2$ clusters (Table 3). Likewise, T2D patients in cluster $S_3$ were most likely to be associated with AD (Table 4). There was no clinical information in the datasets to confirm whether the AD $S_3$ patients actually have T2D or whether the T2D $S_3$ patients have AD; however, our data suggest that patients in these clusters might share genetic characteristics of the other disease.

In addition, we used brain gene expression data GSE5281 to determine which AD clusters have features in common with the AD brain⁴⁸. We extracted 1831 DEGs from these gene expression data by performing the t-test with a q value < 0.05, and 160 pathways were extracted from these genes. As a result, genes in the AD $S_3$ module most overlapped with AD brain-related pathways (Supplementary Table S1).

Table 3 Numbers of type 2 diabetes (T2D)-related pathways for each Alzheimer’s disease (AD) differentially expressed gene module.

Full size table

Table 4 Numbers of Alzheimer’s disease (AD)-related pathways in each type 2 diabetes (T2D) differentially expressed gene module.

Full size table

To obtain further evidence that AD $S_3$ and T2D $S_3$ are related, pathway enrichment analysis was performed for the common genes of possible cluster pairs in the up-regulated and down-regulated AD and T2D gene clusters. The common genes with the same regulation direction (i.e., up or downregulated) in both diseases were extracted, and their enriched pathways were compared with the 1498 pathways of the 671 intersecting genes between AD and T2D from DigSee⁴¹. We used the pathways associated with both AD and T2D as references to identify whether the common genes in each possible cluster pair were related. Indeed, the enriched pathways from common genes in the upregulated AD $M_3^+$ and T2D $M_3^+$ modules showed more overlap with pathways from DigSee compared to that in other possible pairs (Table 5). Similarly, the common genes from the downregulated AD $M_3^-$ and T2D $M_3^-$ modules showed the highest overlapping pathway ratio with pathways from DigSee (Table 5). This suggested that the cluster pairs AD $M_3$ and T2D $M_3$ were the most closely related to both AD and T2D.

Table 5 Numbers of common pathways between disease subgroup pairs and DigSee.

Full size table

Extraction of candidate genes

We extracted 241 common genes from the cluster pair AD $M_3$ and T2D $M_3$, including 195 upregulated genes and 46 downregulated genes, which were selected as candidate genes associated with both AD and T2D (Supplementary Table S3). Among these 241 genes, 14 genes were common with genes related with both AD and T2D from DigSee⁴¹. In DigSee, 661 genes were common for both AD and T2D. With a hypergeometric test for significance of 14 genes out of 241, a p value was 0.03826, showing significance of these genes in their roles in AD and T2D. In addition, we collected more AD and T2D genes from AlzGene and T2DiACoD (Supplementary Table S3)^49,50. When comparing the 241 genes with genes in the three databases of DigSee, AlzGene, and T2DiACoD, 56 genes were related to AD or T2D.

These candidate genes were enriched in 29 pathways (Supplementary Table S4) and 14 pathways that are common with AD and T2D-related pathways from DigSee are shown in Table 6. Pathways associated with common pathological features of AD and T2D such as the immune system-related pathways (T cell selection, positive T cell selection, and T cell differentiation)^51,52 and chemokine signaling pathway^53,54 were included. Immune system-related pathways are known to be common characteristics of AD in the brain and blood⁵¹, and there are some evidences that chemokines play an essential role in the central nervous system and neuroprotection^55,56. Interestingly, 11 genes among the 241 candidate genes were involved in chemokine signaling pathway, and 6 of them were AD and T2D-related genes: RAF1, RAC1, RHOA, STAT3, AKT1, and PRKCD.

Table 6 Common pathways between candidate genes and DigSee genes.

Full size table

To verify whether the 241 candidate genes could be informative markers for the classification of each disease patients and controls, we used data from the ADNI cohort³¹ for AD prediction, and a merged independent T2D dataset for predicting T2D. In AD prediction, candidate genes from the (AD $M_3$, T2D $M_3$) pair showed the best diagnostic performance, with an AUC value of 0.6906, compared with genes from nine possible pairs (Table 7).

Table 7 Performance of classification of Alzheimer’s disease (AD) and controls in the ADNI cohort using different sets of genes. ($M_i,M_j$) represents the common genes between the AD $M_i$ module and type 2 diabetes (T2D) $M_j$ module used for classification.

Full size table

For comparison, the performance of classifying AD in the ADNI cohort was measured for 250 random genes to match the size of the candidate genes. Classification using 250 random genes was performed 100 times, and the mean AUC value was 0.5658. The t-test showed that the candidate genes significantly outperformed the randomly selected genes in classification with a p value of $5.723\times 10^{-52}$ (Fig. 4). As another comparison, we obtained 1,466 DEGs from the AD samples (GSE63060 and GSE63061 datasets) by the t-test and selected genes with a q value < 0.05 after Bonferroni correction. When these DEGs were used for AD classification in the ADNI cohort, the AUC value was 0.5757.

Furthermore, we examined serum glucose data in the ADNI dataset to consider the clinical characteristics of samples. When we compared glucose levels between the AD and the control groups by the t-test, there was no significant difference (p value = 0.513). In ADNI, there were 41 AD patients and 95 controls with high glucose levels, including prediabetes or diabetes samples ($\ge$ 100 mg/dL). Using the logistic regression model that were constructed for classification of AD (Table 7), we examined the classification performance for these hyperglycemic samples in the test set of each fold on tenfold cross-validation. The prediction performance for these hyperglycemic samples using the candidate genes (AUC = 0.715) was the highest among those using other genes (Supplementary Table S5). We also found that the predictive power of the prediction model using candidate genes was higher for these hyperglycemic samples compared to those for the whole samples (0.715 in Supplementary Table S5 and 0.6906 in Table 7, respectively).

We also performed classification using the independent T2D datasets (25 T2D and 71 controls). Among the 241 candidate genes, 179 genes were included in the T2D samples. On the threefold cross-validation, we obtained the AUC value of 0.9543. The AUC value of randomly selected 180 genes was 0.9458 and the predictive performances using other possible gene pairs were also high (Supplementary Table S6). This indicates that gene expression levels between T2D samples and controls in pancreatic islets were significantly different for most genes.

Application of the proposed model to another AD dataset

We applied the proposed model to another AD dataset. For AD, ADNI was used instead of gene expression datasets of GSE63060 and GSE63061. For T2D, the same gene expression data (GSE78721) was used. When we try to find the optimal number of clusters for the ADNI dataset, the optimal rank k of the ADNI dataset was 2 (Supplementary Fig. S2). At least three clusters are required to determine significant differences in the distribution of each gene between clusters with the Kruskal-Wallis and Dunn test. Thus, as an alternative, we clustered the ADNI data with the non-optimal rank k = 3, resulting that 116 AD patients were clustered into three clusters with 17, 54, and 45 samples, respectively (Supplementary Table S7). When the pathway enrichment analysis was performed for genes in each cluster, the ADNI $M_3^+$ gene cluster contained the largest number of T2D-related pathways and followed by $M_1^+$. Among 41 hyperglycemic AD patients, the largest 19 belonged to ADNI S3. However, the proportion of hyperglycemic AD patients in each ADNI sample cluster was the highest in ADNI S1 followed by S3 (S1 = 0.47%, S2 = 0.259%, and S3 = 0.422%), which implies that the characteristics of patients of $S_1$ and $S_3$ are similar and can be merged for the high risk subgroup of T2D.

Additionally, there was no difference between the age of patients in $S_1$ and $S_3$, but the age of patients of $S_2$ was significantly lower than those of $S_1$ and $S_3$ (p values were 0.0013 and 0.0207 with a one-way ANOVA test, respectively). We also performed a one-way ANOVA test for APOE4 among subgroups of AD patients and observed no significant difference between APOE4 (p value as 0.35). Therefore, the proposed method may cluster patients that have some similar clinical characteristics such as the age, but not all of subgroups were clustered by these characteristics.

Conclusion

We have provided a methodological and analytical approach for identifying correlations between AD and T2D at the genetic level. Since the AD dataset does not contain information about whether the AD patients have T2D or not, it is important to define subgroups of AD; the same is true for T2D. Because the conventional NMF is not suitable for this task, we developed a method of gene selection from gene expression data. After applying NMF to gene expression data, additional conditions were taken into account for detecting distinct characteristics of subgroups. Genes with significant differences in expression levels in each patient groups (AD and T2D) were first selected to screen patients with AD associated with T2D and patients with T2D associated with AD. We identified genes that characterize these specific AD and T2D patients and identified the potential relationship between the two diseases based on gene expression profiles. To validate these potential relationships from candidate genes, prediction errors of the classification between AD and controls from logistic regression were compared with randomly selected genes in an independent AD dataset. Inclusion of the candidate genes significantly increased the AUC values in classifying AD from controls compared with randomly selected genes.

In conclusion, we provide new insights for extracting differentially expressed genes with relative differences in a specific patient group. These genes were enriched with pathways related to both AD and T2D such as T cell selection and chemokine pathways. As AD patients have genetic heterogeneity, the investigation of commonly dysregulated pathways in AD and T2D can enhance personalized medical cares for a subgroup of AD. Further studies are needed to reveal the relationships among AD and other AD-related diseases which could improve the prevention and treatment of AD.

References

Brookmeyer, R., Johnson, E., Ziegler-Graham, K. & Arrighi, H. M. Forecasting the global burden of Alzheimer’s disease. Alzheimer’s Dement. 3, 186–191 (2007).
Article Google Scholar
Barker, W. W. et al. Relative frequencies of Alzheimer disease, Lewy body, vascular and frontotemporal dementia, and hippocampal sclerosis in the state of Florida Brain Bank. Alzheimer Dis. Assoc. Disord. 16, 203–212 (2002).
Article PubMed Google Scholar
van Oijen, M., de Jong, F. J., Hofman, A., Koudstaal, P. J. & Breteler, M. M. Subjective memory complaints, education, and risk of Alzheimer’s disease. Alzheimer’s Dement. 3, 92–97 (2007).
Article Google Scholar
Armstrong, R. A. The molecular biology of senile plaques and neurofibrillary tangles in Alzheimer’s disease. Folia Neuropathol. 47, 289–99 (2009).
CAS PubMed Google Scholar
Gouras, G. K., Olsson, T. T. & Hansson, O. $\beta$-amyloid peptides and amyloid plaques in Alzheimer’s disease. Neurotherapeutics 12, 3–11 (2015).
Article CAS PubMed Google Scholar
Vogel, G. Tau protein mutations confirmed as neuron killers. Science 280, 1524–1525 (1998).
Article CAS PubMed Google Scholar
Grant, W. B., Campbell, A., Itzhaki, R. F. & Savory, J. The significance of environmental factors in the etiology of Alzheimer’s disease. J. Alzheimers Dis. 4, 179–189 (2002).
Article PubMed Google Scholar
Abid, N. B., Naseer, M. I. & Kim, M. O. Comparative gene-expression analysis of Alzheimer’s disease progression with aging in transgenic mouse model. Int. J. Mol. Sci. 20, 1219 (2019).
Article PubMed Central CAS Google Scholar
Sherva, R. et al. Genome-wide association study of the rate of cognitive decline in Alzheimer’s disease. Alzheimer’s Dement. 10, 45–52 (2014).
Article Google Scholar
Mayeux, R. & Schupf, N. Blood-based biomarkers for Alzheimer’s disease: Plasma a$\beta$40 and a$\beta$42, and genetic variants. Neurobiol. Aging 32, S10–S19 (2011).
Article CAS PubMed PubMed Central Google Scholar
Loring, J., Wen, X., Lee, J., Seilhamer, J. & Somogyi, R. A gene expression profile of Alzheimer’s disease. DNA Cell Biol. 20, 683–695 (2001).
Article CAS PubMed Google Scholar
Khan, A. T., Dobson, R. J., Sattlecker, M. & Kiddle, S. J. Alzheimer’s disease: Are blood and brain markers related? A systematic review. Ann. Clin. Transl. Neurol. 3, 455–462 (2016).
Article PubMed PubMed Central CAS Google Scholar
Bu, X. et al. Blood-derived amyloid-$\beta$ protein induces Alzheimer’s disease pathologies. Mol. Psychiatry 23, 1948–1956 (2018).
Article PubMed CAS Google Scholar
Fehlbaum-Beurdeley, P. et al. Toward an Alzheimer’s disease diagnosis via high-resolution blood gene expression. Alzheimer’s Dement. 6, 25–38 (2010).
Article CAS Google Scholar
Booij, B. B. et al. A gene expression pattern in blood for the early detection of Alzheimer’s disease. J. Alzheimers Dis. 23, 109–119 (2011).
Article CAS PubMed Google Scholar
Lunnon, K. et al. A blood gene expression marker of early Alzheimer’s disease. J. Alzheimers Dis. 33, 737–753 (2013).
Article CAS PubMed Google Scholar
Rye, P. et al. A novel blood test for the early detection of Alzheimer’s disease. J. Alzheimers Dis. 23, 121–129 (2011).
Article CAS PubMed Google Scholar
Fratiglioni, L. et al. Prevalence of Alzheimer’s disease and other dementias in an elderly urban population: Relationship with age, sex, and education. Neurology 41, 1886–1886 (1991).
Article CAS PubMed Google Scholar
Nussbaum, R. L. & Ellis, C. E. Alzheimer’s disease and Parkinson’s disease. N. Engl. J. Med. 348, 1356–1364 (2003).
Article CAS PubMed Google Scholar
Huang, C.-C. et al. Diabetes mellitus and the risk of Alzheimer’s disease: A nationwide population-based study. PloS One 9, e87095 (2014).
Article ADS PubMed PubMed Central CAS Google Scholar
Panigrahi, P. P. & Singh, T. R. Computational studies on Alzheimer’s disease associated pathways and regulatory patterns using microarray gene expression and network data: Revealed association with aging and other diseases. J. Theor. Biol. 334, 109–121 (2013).
Article ADS CAS PubMed MATH Google Scholar
Edwards, G. A. III., Gamez, N., Escobedo, G. Jr., Calderon, O. & Moreno-Gonzalez, I. Modifiable risk factors for Alzheimer’s disease. Front. Aging Neurosci. 11, 146 (2019).
Article CAS Google Scholar
Santiago, J. A. & Potashkin, J. A. The impact of disease comorbidities in Alzheimer’s disease. Front. Aging Neurosci. 13, 38 (2021).
Article Google Scholar
Narayanan, M. et al. Common dysregulation network in the human prefrontal cortex underlies two neurodegenerative diseases. Mol. Syst. Biol. 10, 743 (2014).
Article PubMed PubMed Central CAS Google Scholar
Barbagallo, M. & Dominguez, L. J. Type 2 diabetes mellitus and Alzheimer’s disease. World J. Diabetes 5, 889 (2014).
Article PubMed PubMed Central Google Scholar
Haataja, L., Gurlo, T., Huang, C. J. & Butler, P. C. Islet amyloid in type 2 diabetes, and the toxic oligomer hypothesis. Endocr. Rev. 29, 303–316 (2008).
Article CAS PubMed PubMed Central Google Scholar
Benilova, I., Karran, E. & De Strooper, B. The toxic a$\beta$ oligomer and Alzheimer’s disease: An emperor in need of clothes. Nat. Neurosci. 15, 349–357 (2012).
Article CAS PubMed Google Scholar
De Felice, F. G. & Ferreira, S. T. Inflammation, defective insulin signaling, and mitochondrial dysfunction as common molecular denominators connecting type 2 diabetes to Alzheimer disease. Diabetes 63, 2262–2272 (2014).
Article PubMed Google Scholar
De Felice, F. G., Lourenco, M. V. & Ferreira, S. T. How does brain insulin resistance develop in Alzheimer’s disease?. Alzheimer’s Dement. 10, S26–S32 (2014).
Article Google Scholar
Götz, J., Ittner, L. & Lim, Y.-A. Common features between diabetes mellitus and Alzheimer’s disease. Cell. Mol. Life Sci. 66, 1321–1325 (2009).
Article PubMed CAS Google Scholar
Mueller, S. G. et al. Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s disease neuroimaging initiative (ADNI). Alzheimer’s Dement. 1, 55–66 (2005).
Article Google Scholar
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2019).
Lee, M.-W., Lee, M. & Oh, K.-J. Adipose tissue-derived signatures for obesity and type 2 diabetes: Adipokines, batokines and micrornas. J. Clin. Med. 8, 854 (2019).
Article CAS PubMed Central Google Scholar
Tiwari, P. et al. Systems genomics of thigh adipose tissue from Asian Indian type-2 diabetics revealed distinct protein interaction hubs. Front. Genet. 9, 679 (2019).
Article PubMed PubMed Central CAS Google Scholar
Chiaromonte, F. & Martinelli, J. Dimension reduction strategies for analyzing global gene expression data with a response. Math. Biosci. 176, 123–144 (2002).
Article MathSciNet CAS PubMed MATH Google Scholar
Liu, W., Yuan, K. & Ye, D. Reducing microarray data via nonnegative matrix factorization for visualization and clustering analysis. J. Biomed. Inform. 41, 602–606 (2008).
Article CAS PubMed Google Scholar
Brunet, J.-P., Tamayo, P., Golub, T. R. & Mesirov, J. P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. 101, 4164–4169 (2004).
Article ADS CAS PubMed PubMed Central Google Scholar
Carmona-Saez, P., Pascual-Marqui, R. D., Tirado, F., Carazo, J. M. & Pascual-Montano, A. Biclustering of gene expression data by non-smooth non-negative matrix factorization. BMC Bioinform. 7, 78 (2006).
Article CAS Google Scholar
Mejía-Roa, E. et al. Biclustering and classification analysis in gene expression using nonnegative matrix factorization on multi-GPU systems. In 2011 11th International Conference on Intelligent Systems Design and Applications, 882–887 (IEEE, 2011).
Ding, C. H., Li, T. & Jordan, M. I. Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 32, 45–55 (2008).
Article Google Scholar
Kim, J. et al. Digsee: Disease gene search engine with evidence sentences (version cancer). Nucleic Acids Res. 41, W510–W517 (2013).
Article PubMed PubMed Central Google Scholar
Marselli, L. et al. Gene expression profiles of beta-cell enriched tissue obtained by laser capture microdissection from subjects with type 2 diabetes. PloS One 5, e11499 (2010).
Article ADS PubMed PubMed Central CAS Google Scholar
Dominguez, V. et al. Class ii phosphoinositide 3-kinase regulates exocytosis of insulin granules in pancreatic $\beta$ cells. J. Biol. Chem. 286, 4216–4225 (2011).
Article CAS PubMed Google Scholar
Taneera, J. et al. A systems genetics approach identifies genes and pathways for type 2 diabetes in human islets. Cell Metab. 16, 122–134 (2012).
Article CAS PubMed Google Scholar
Zhong, M., Wu, Y., Ou, W., Huang, L. & Yang, L. Identification of key genes involved in type 2 diabetic islet dysfunction: A bioinformatics study. Biosci. Rep. 39, BSR20182172 (2019).
Gaujoux, R. & Seoighe, C. A flexible r package for nonnegative matrix factorization. BMC Bioinform. 11, 1–9 (2010).
Article CAS Google Scholar
Waskom, M. et al. Mwaskom/seaborn: V0. 8.1 (september 2017). Zenodo (2017).
Liang, W. S. et al. Gene expression profiles in anatomically and functionally distinct regions of the normal aged human brain. Physiol. Genomics 28, 311–322 (2007).
Article CAS PubMed Google Scholar
Bertram, L., McQueen, M. B., Mullin, K., Blacker, D. & Tanzi, R. E. Systematic meta-analyses of Alzheimer disease genetic association studies: The alzgene database. Nat. Genet. 39, 17–23 (2007).
Article CAS PubMed Google Scholar
Rani, J. et al. T2diacod: A gene atlas of type 2 diabetes mellitus associated complex disorders. Sci. Rep. 7, 1–21 (2017).
Article ADS CAS Google Scholar
Jevtic, S., Sengar, A. S., Salter, M. W. & McLaurin, J. The role of the immune system in Alzheimer disease: Etiology and treatment. Ageing Res. Rev. 40, 84–94 (2017).
Article CAS PubMed Google Scholar
Berbudi, A., Rahmadika, N., Tjahjadi, A. I. & Ruslami, R. Type 2 diabetes and its impact on the immune system. Curr. Diabetes Rev. 16, 442 (2020).
PubMed PubMed Central Google Scholar
Zuena, A. R., Casolini, P., Lattanzi, R. & Maftei, D. Chemokines in Alzheimer’s disease: New insights into prokineticins, chemokine-like proteins. Front. Pharmacol. 10, 622 (2019).
Article CAS PubMed PubMed Central Google Scholar
Yao, L., Herlea-Pana, O., Heuser-Baker, J., Chen, Y. & Barlic-Dicen, J. Roles of the chemokine system in development of obesity, insulin resistance, and cardiovascular disease. J. Immunol. Res.. 2014, 181450 (2014).
Bajetto, A., Bonavia, R., Barbero, S., Florio, T. & Schettini, G. Chemokines and their receptors in the central nervous system. Front. Neuroendocrinol. 22, 147–184 (2001).
Article CAS PubMed Google Scholar
Jaerve, A. & Müller, H. W. Chemokines in CNS injury and repair. Cell Tissue Res. 349, 229–248 (2012).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found http://adni.loni.usc.edu/wp-content/themes/freshnewsdev-v2/documents/policy/ADNI_Acknowledgement_List%205-29-18.pdf. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Novartis Pharmaceuticals Corporation; Pfzer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. Te Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). Te grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. Samples from the National Cell Repository for AD (NCRAD), which receives government support under a cooperative agreement grant (U24 AG21886) awarded by the National Institute on Aging (AIG), were used in this study. Additional support for data analysis was provided by NLM R01 LM012535, NIA R03 AG054936, and the Pennsylvania Department of Health (#SAP 4100070267). The Department specifically disclaims responsibility for any analyses, interpretations or conclusions.

Funding

This work has been supported by the Bio & Medical Technology Development Program of NRF funded by the Korean government (MSIT) (NRF-2018M3C7A1054935) and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant Number: HI18C0460).

Author information

List of authors and their affiliations appear at the end of the paper.

Authors and Affiliations

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju, Korea
Yeonwoo Chung & Hyunju Lee
UC San Francisco, San Francisco, CA, 94107, USA
Michael W. Weiner, Norbert Schuf, Howard J. Rosen, Bruce L. Miller & David Perry
UC San Diego, La Jolla, CA, 92093, USA
Paul Aisen, Ronald G. Thomas, Michael Donohue, Sarah Walter, Tamie Sather, Gus Jiminez, Archana B. Balasubramanian, Jennifer Mason, Iris Sim, James Brewer, Helen Vanderswag & Adam Fleisher
Mayo Clinic, Rochester, MN, USA
Ronald Petersen, Cliford R. Jack Jr., Matthew Bernstein, Bret Borowski, Jef Gunter, Matt Senjem, Prashanthi Vemuri, David Jones, Kejal Kantarci, Chad Ward, Sara S. Mason, Colleen S. Albers, David Knopman & Kris Johnson
UC Berkeley, Berkeley, San Francisco, USA
William Jagust, Susan Landau, Judith L. Heidebrink & Joanne L. Lord
University of Pennsylvania, Philadelphia, PA, 19104, USA
John Q. Trojanowki, Leslie M. Shaw, Steven E. Arnold, Jason H. Karlawish, David Wolk & Christopher M. Clark
USC, Los Angeles, CA, 90032, USA
Arthur W. Toga, Karen Crawford, Scott Neu, Sonia Pawluczyk, Mauricio Beccera, Liberty Teodoro & Bryan M. Spann
UC Davis, Sacramento, CA, USA
Laurel Beckett, Danielle Harvey, Charles DeCArli, Evan Fletcher, Pauline Maillard, John Olichney & Owen Carmichael
Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, 02215, USA
RobertC. Green, Reisa A. Sperling, Keith A. Johnson & Gad Marshall
Indiana University, Bloomington, IN, 47405, USA
Andrew J. Saykin, Tatiana M. Foroud, Li Shen, Kelley Faber, Sungeun Kim, Martin R. Farlow, Ann Marie Hake, Brandy R. Matthews, Jared R. Brosch, Scott Herring & Cynthia Hunt
Washington University St. Louis, St. Loui, MO, 63110, USA
John Morris, Marc Raichle, Davie Holtzman, Nigel J. Cairns, Erin Householder, Lisa Taylor-Reinwald, Beau Ances, Maria Carroll, Mary L. Creech, Erin Franklin, Mark A. Mintun, Stacy Schneider, Angela Oliver & Lon S. Schneider
Prevent Alzheimer’s Disease, Rockville, MD, 20850, USA
Zaven Khachaturian
Siemens, Erlangen, Germany
Greg Sorensen & Lean Tha
Alzheimer’s Association, Chicago, IL, 60631, USA
Maria Carrillo
University of Pittsburg, Pittsburgh, PA, 15213, USA
Lew Kuller, Chet Mathis, Oscar L. Lopez, MaryAnn Oakley & Donna M. Simpson
Cornell University, Ithaca, NY, 14853, USA
Steven Paul, Norman Relkin, Gloria Chaing, Michael Lin & Lisa Ravdin
Albert Einstein College of Medicine, Yeshiva University, Bronx, NY, 10461, USA
Peter Davies
AD Drug Discovery Foundation, New York, NY, 10019, USA
Howard Fillit
Acumen Pharmaceuticals, Livermore, CA, 94551, USA
Franz Hefti
Northwestern University, Chicago, IL, 60611, USA
M. Marcel Mesulam, Kristine Lipowski, Masandra Weintraub, Borna Bonakdarpour, Diana Kerwin, Chuang-Kuo Wu & Nancy Johnson
National Institute of Mental Health, Bethesda, MD, 20892, USA
William Potter
Brown University, Providence, RI, 02912, USA
Peter Snyder, Brian R. Ott, Henry Querfurth, Geofrey Tremont, Stephen Salloway, Paul Malloy & Stephen Correia
University of Washington, Seattle, WA, 98195, USA
Tom Montine
University of London, London, UK
Nick Fox
UCLA, Torrance, CA, 90509, USA
Paul Thompson, Virginia Lee, Magdalena Korecka, Michal Figurski, Liana Apostolova, Kathleen Tingus, Ellen Woo, Daniel H. S. Silverman, Po H. Lu & George Bartzokis
University of Michigan, Ann Arbor, MI, 48109-2800, USA
Robert A. Koeppe
University of Utah, Salt Lake City, UT, 84112, USA
Norm Foster
Banner Alzheimer’s Institute, Phoenix, AZ, 85006, USA
Eric M. Reiman, Kewei Chen, Marwan N. Sabbagh, Christine M. Belden, Sandra A. Jacobson, Sherye A. Sirrel, Pierre Tariot, Anna Burke, Nadira Trncic, Adam Fleisher & Stephanie Reeder
UUC Irvine, Orange, CA, 92868, USA
Steven Potkin, Adrian Preda & Dana Nguyen
Johns Hopkins University, Baltimore, MD, 21205, USA
Marissa Natelson Love, Chiadi Onyike, Daniel D’Agostino II & Stephanie Kielb
Richard Frank Consulting, New York, USA
Richard Frank & Hillel Grossman
National Institute on Aging, Baltimore, MD, USA
John Hsiao
Oregon Health and Science University, Portland, OR, 97239, USA
Jefrey Kaye
University of Alabama, Birmingham, AL, USA
Joseph Quinn, Lisa Silbert, Betty Lind, Raina Carter, Sara Dolen, Daniel Marson, Randall Grifth, David Clark, David Geldmacher, John Brockington & Erik Roberson
Mount Sinai School of Medicine, New York, NY, USA
Efe Mitsis
Rush University Medical Center, Chicago, IL, 60612, USA
Raj C. Shah & Leyla deToledo-Morrell
Baylor College of Medicine, Houston, TX, USA
Rachelle S. Doody, Javier Villanueva-Meyer, Munir Chowdhury, Susan Rountree & Mimi Dang
Wien Center, Miami Beach, FL, 33140, USA
Ranjan Duara, Daniel Varon, Maria T. Greig & Peggy Roberts
Columbia University Medical Center, New York, NY, USA
Yaakov Stern, Lawrence S. Honig & Karen L. Bell
New York University, New York, NY, USA
Marilyn Albert, James E. Galvin, Brittany Cerbone, Christina A. Michel, Dana M. Pogorelec, Henry Rusinek, Mony J. de Leon, Lidia Glodzik & Susan De Santi
University of Texas Southwestern Medical School, Galveston, TX, 77555, USA
Kyle Womack, Dana Mathews & Mary Quiceno
Duke University Medical Center, Durham, NC, USA
P. Murali Doraiswamy, Jefrey R. Petrella, Salvador Borges-Neto, Terence Z. Wong & Edward Coleman
Emory University, Atlanta, GA, 30307, USA
Allan I. Levey, James J. Lah & Janet S. Cella
Medical Center, University of Kansas, Kansas City, KS, USA
Jefrey M. Burns, Russell H. Swerdlow & William M. Brooks
University of Kentucky, Lexington, KY, USA
Charles D. Smith, Greg Jicha, Peter Hardy, Partha Sinha, Elizabeth Oates & Gary Conrad
Mayo Clinic, Jacksonville, FL, USA
Neill R. Graf-Radford, Francine Parftt, Tracy Kendall & Heather Johnson
University of Rochester Medical Center, Rochester, NY, 14642, USA
Anton P. Porsteinsson, Bonnie S. Goldstein, Kim Martin, Kelly M. Makino, M. Saleem Ismail, Connie Brand, Ruth A. Mulnard, Gaby Thai & Catherine Mc-Adams-Ortiz
Yale University School of Medicine, New Haven, CT, USA
Christopher H. van Dyck, Richard E. Carson, Martha G. MacAvoy & Pradeep Varma
McGill Univ. Montreal-Jewish General Hospital, Montreal, PQ, H3A 2A7, Canada
Howard Chertkow, Howard Bergman & Chris Hosein
Sunnybrook Health Sciences, Toronto, ON, Canada
Sandra Black, Bojana Stefanovic & Curtis Caldwell
U.B.C. Clinic for AD & Related Disorders, Vancouver, BC, Canada
Ging-Yuek Robin Hsiung, Howard Feldman, Benita Mudge & Michele Assaly
Cognitive Neurology - St. Joseph’s, London, ON, Canada
Elizabeth Finger, Stephen Pasternack, Irina Rachisky, Dick Trost & Andrew Kertesz
Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, 89106, USA
Charles Bernick & Donna Munic
Premiere Research Inst (Palm Beach Neurology), W Palm Beach, FL, USA
Carl Sadowsky & Teresa Villena
Georgetown University Medical Center, Washington, DC, 20007, USA
Raymond Scott Turner, Kathleen Johnson & Brigid Reynolds
Stanford University, Stanford, CA, 94305, USA
Jerome Yesavage, Joy L. Taylor, Barton Lane, Allyson Rosen & Jared Tinklenberg
Boston University, Boston, MA, USA
Neil Kowall, Ronald Killiany, Andrew E. Budson, Alexander Norbash & Patricia Lynn Johnson
Howard University, Washington, DC, 20059, USA
Thomas O. Obisesan, Saba Wolday & Joanne Allard
Case Western Reserve University, Cleveland, OH, 44106, USA
Alan Lerner, Paula Ogrocki, Curtis Tatsuoka & Parianne Fatica
Neurological Care of CNY, Liverpool, NY, 13088, USA
Smita Kittur
St. Joseph’s Health Care, London, ON, N6A 4H1, Canada
Andrew Kertesz, Michael Borrie, T.-Y. Lee, Rob Bartha, Sterling Johnson, Sanjay Asthana, Cynthia M. Carlsson, Elizabether Finger, Stephen Pasternak, Irina Rachinsky, John Rogers & Dick Drost
Dent Neurologic Institute, Amherst, NY, 14226, USA
Vernice Bates, Horacio Capote & Michelle Rainka
Ohio State University, Columbus, OH, 43210, USA
Douglas W. Scharre, Maria Kataki & Anahita Adeli
Albany Medical College, Albany, NY, 12208, USA
Earl A. Zimmerman, Dzintra Celmins & Alice D. Brown
Hartford Hospital Olin Neuropsychiatry Research Center, Hartford, CT, 06114, USA
Godfrey D. Pearlson, Karen Blank & Karen Anderson
Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA
Laura A. Flashman, Marc Seltzer, Mary L. Hynes & Robert B. Santulli
Wake Forest University Health Sciences, Winston-Salem, NC, USA
Kaycee M. Sink, Leslie Gordineer, Jef D. Williamson, Pradeep Garg & Franklin Watkins
Medical University South Carolina, Charleston, SC, 29425, USA
Jacobo Mintzer, Kenneth Spicer & David Bachman
Nathan Kline Institute, Orangeburg, NY, USA
Nunzio Pomara, Raymundo Hernando & Antero Sarrael
University of Iowa College of Medicine, Iowa City, IA, 52242, USA
Susan K. Schultz, Laura L. Boles Ponto, Hyungsub Shim & Karen Ekstam Smith
USF Health Byrd Alzheimer’s Institute, University of South Florida, Tampa, FL, 33613, USA
Amanda Smith, Balebail Ashok Raj & Kristin Fargher

Authors

Yeonwoo Chung
View author publications
You can also search for this author in PubMed Google Scholar
Hyunju Lee
View author publications
You can also search for this author in PubMed Google Scholar

Consortia

the Alzheimer’s Disease Neuroimaging Initiative

Michael W. Weiner
, Paul Aisen
, Ronald Petersen
, Cliford R. Jack Jr.
, William Jagust
, John Q. Trojanowki
, Arthur W. Toga
, Laurel Beckett
, RobertC. Green
, Andrew J. Saykin
, John Morris
, Leslie M. Shaw
, Zaven Khachaturian
, Greg Sorensen
, Maria Carrillo
, Lew Kuller
, Marc Raichle
, Steven Paul
, Peter Davies
, Howard Fillit
, Franz Hefti
, Davie Holtzman
, M. Marcel Mesulam
, William Potter
, Peter Snyder
, Tom Montine
, Ronald G. Thomas
, Michael Donohue
, Sarah Walter
, Tamie Sather
, Gus Jiminez
, Archana B. Balasubramanian
, Jennifer Mason
, Iris Sim
, Danielle Harvey
, Matthew Bernstein
, Nick Fox
, Paul Thompson
, Norbert Schuf
, Charles DeCArli
, Bret Borowski
, Jef Gunter
, Matt Senjem
, Prashanthi Vemuri
, David Jones
, Kejal Kantarci
, Chad Ward
, Robert A. Koeppe
, Norm Foster
, Eric M. Reiman
, Kewei Chen
, Chet Mathis
, Susan Landau
, Nigel J. Cairns
, Erin Householder
, Lisa Taylor-Reinwald
, Virginia Lee
, Magdalena Korecka
, Michal Figurski
, Karen Crawford
, Scott Neu
, Tatiana M. Foroud
, Steven Potkin
, Li Shen
, Kelley Faber
, Sungeun Kim
, Lean Tha
, Richard Frank
, John Hsiao
, Jefrey Kaye
, Joseph Quinn
, Lisa Silbert
, Betty Lind
, Raina Carter
, Sara Dolen
, Beau Ances
, Maria Carroll
, Mary L. Creech
, Erin Franklin
, Mark A. Mintun
, Stacy Schneider
, Angela Oliver
, Lon S. Schneider
, Sonia Pawluczyk
, Mauricio Beccera
, Liberty Teodoro
, Bryan M. Spann
, James Brewer
, Helen Vanderswag
, Adam Fleisher
, Daniel Marson
, Randall Grifth
, David Clark
, David Geldmacher
, John Brockington
, Erik Roberson
, Marissa Natelson Love
, Judith L. Heidebrink
, Joanne L. Lord
, Sara S. Mason
, Colleen S. Albers
, David Knopman
, Kris Johnson
, Hillel Grossman
, Efe Mitsis
, Raj C. Shah
, Leyla deToledo-Morrell
, Rachelle S. Doody
, Javier Villanueva-Meyer
, Munir Chowdhury
, Susan Rountree
, Mimi Dang
, Ranjan Duara
, Daniel Varon
, Maria T. Greig
, Peggy Roberts
, Yaakov Stern
, Lawrence S. Honig
, Karen L. Bell
, Marilyn Albert
, Chiadi Onyike
, Daniel D’Agostino II
, Stephanie Kielb
, James E. Galvin
, Brittany Cerbone
, Christina A. Michel
, Dana M. Pogorelec
, Henry Rusinek
, Mony J. de Leon
, Lidia Glodzik
, Susan De Santi
, Kyle Womack
, Dana Mathews
, Mary Quiceno
, P. Murali Doraiswamy
, Jefrey R. Petrella
, Salvador Borges-Neto
, Terence Z. Wong
, Edward Coleman
, Allan I. Levey
, James J. Lah
, Janet S. Cella
, Jefrey M. Burns
, Russell H. Swerdlow
, William M. Brooks
, Steven E. Arnold
, Jason H. Karlawish
, David Wolk
, Christopher M. Clark
, Liana Apostolova
, Kathleen Tingus
, Ellen Woo
, Daniel H. S. Silverman
, Po H. Lu
, George Bartzokis
, Charles D. Smith
, Greg Jicha
, Peter Hardy
, Partha Sinha
, Elizabeth Oates
, Gary Conrad
, Neill R. Graf-Radford
, Francine Parftt
, Tracy Kendall
, Heather Johnson
, Oscar L. Lopez
, MaryAnn Oakley
, Donna M. Simpson
, Martin R. Farlow
, Ann Marie Hake
, Brandy R. Matthews
, Jared R. Brosch
, Scott Herring
, Cynthia Hunt
, Anton P. Porsteinsson
, Bonnie S. Goldstein
, Kim Martin
, Kelly M. Makino
, M. Saleem Ismail
, Connie Brand
, Ruth A. Mulnard
, Gaby Thai
, Catherine Mc-Adams-Ortiz
, Christopher H. van Dyck
, Richard E. Carson
, Martha G. MacAvoy
, Pradeep Varma
, Howard Chertkow
, Howard Bergman
, Chris Hosein
, Sandra Black
, Bojana Stefanovic
, Curtis Caldwell
, Ging-Yuek Robin Hsiung
, Howard Feldman
, Benita Mudge
, Michele Assaly
, Elizabeth Finger
, Stephen Pasternack
, Irina Rachisky
, Dick Trost
, Andrew Kertesz
, Charles Bernick
, Donna Munic
, Kristine Lipowski
, Masandra Weintraub
, Borna Bonakdarpour
, Diana Kerwin
, Chuang-Kuo Wu
, Nancy Johnson
, Carl Sadowsky
, Teresa Villena
, Raymond Scott Turner
, Kathleen Johnson
, Brigid Reynolds
, Reisa A. Sperling
, Keith A. Johnson
, Gad Marshall
, Jerome Yesavage
, Joy L. Taylor
, Barton Lane
, Allyson Rosen
, Jared Tinklenberg
, Marwan N. Sabbagh
, Christine M. Belden
, Sandra A. Jacobson
, Sherye A. Sirrel
, Neil Kowall
, Ronald Killiany
, Andrew E. Budson
, Alexander Norbash
, Patricia Lynn Johnson
, Thomas O. Obisesan
, Saba Wolday
, Joanne Allard
, Alan Lerner
, Paula Ogrocki
, Curtis Tatsuoka
, Parianne Fatica
, Evan Fletcher
, Pauline Maillard
, John Olichney
, Owen Carmichael
, Smita Kittur
, Michael Borrie
, T.-Y. Lee
, Rob Bartha
, Sterling Johnson
, Sanjay Asthana
, Cynthia M. Carlsson
, Adrian Preda
, Dana Nguyen
, Pierre Tariot
, Anna Burke
, Nadira Trncic
, Adam Fleisher
, Stephanie Reeder
, Vernice Bates
, Horacio Capote
, Michelle Rainka
, Douglas W. Scharre
, Maria Kataki
, Anahita Adeli
, Earl A. Zimmerman
, Dzintra Celmins
, Alice D. Brown
, Godfrey D. Pearlson
, Karen Blank
, Karen Anderson
, Laura A. Flashman
, Marc Seltzer
, Mary L. Hynes
, Robert B. Santulli
, Kaycee M. Sink
, Leslie Gordineer
, Jef D. Williamson
, Pradeep Garg
, Franklin Watkins
, Brian R. Ott
, Henry Querfurth
, Geofrey Tremont
, Stephen Salloway
, Paul Malloy
, Stephen Correia
, Howard J. Rosen
, Bruce L. Miller
, David Perry
, Jacobo Mintzer
, Kenneth Spicer
, David Bachman
, Elizabether Finger
, Stephen Pasternak
, Irina Rachinsky
, John Rogers
, Dick Drost
, Nunzio Pomara
, Raymundo Hernando
, Antero Sarrael
, Susan K. Schultz
, Laura L. Boles Ponto
, Hyungsub Shim
, Karen Ekstam Smith
, Norman Relkin
, Gloria Chaing
, Michael Lin
, Lisa Ravdin
, Amanda Smith
, Balebail Ashok Raj
& Kristin Fargher

Contributions

H.L. contributed to the study concept and design. Y.C. designed and implemented the proposed algorithms. H.L. and Y.C. analyzed and interpreted results. Y.C. and H.L. wrote the manuscript. H.L. took part in the study supervision and coordination. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hyunju Lee.

Ethics declarations

Competing interests

The authors declare no competing interests

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Chung, Y., Lee, H. & the Alzheimer’s Disease Neuroimaging Initiative. Correlation between Alzheimer’s disease and type 2 diabetes using non-negative matrix factorization. Sci Rep 11, 15265 (2021). https://doi.org/10.1038/s41598-021-94048-0

Download citation

Received: 20 January 2021
Accepted: 24 June 2021
Published: 27 July 2021
DOI: https://doi.org/10.1038/s41598-021-94048-0

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Exploring the pathogenesis and key genes associated of acute myocardial infarction complicated with Alzheimer’s disease

Predicting early Alzheimer’s with blood biomarkers and clinical features

Unbiased proteomics and multivariable regularized regression techniques identify SMOC1, NOG, APCS, and NTN1 in an Alzheimer’s disease brain proteomic signature

Introduction

Methods

Data description

Decomposition of gene expression data sets

Determination of the optimal number of clusters according to the rank k

Gene and sample clusters

Differentially expressed gene (DEG) subgroups

Extraction of AD and T2D-related subgroup pairs and candidate genes

Results and discussion

Clustering of AD and T2D genes

Selecting cluster pairs of interest

Extraction of candidate genes

Application of the proposed model to another AD dataset

Conclusion

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Consortia

the Alzheimer’s Disease Neuroimaging Initiative

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Information.

Rights and permissions

About this article

Cite this article

Share this article

Comments

Search

Quick links