Alzheimer’s disease, mild cognitive impairment, and normal aging distinguished by multi-modal parcellation and machine learning

A 360-area surface-based cortical parcellation is extended to distinguish mild cognitive impairment (MCI) and Alzheimer's disease (AD) from healthy controls (HC) using our previously proposed joint human connectome project multi-modal parcellation (JHCPMMP). We propose a novel classification method, named JMMP-LRR, that accurately identifies the stages toward AD by integrating the JHCPMMP with logistic regression-recursive feature elimination (LR-RFE). In three-group classification, the average accuracy is 89.0% for HC, MCI, and AD, compared with a best accuracy of 81.5% in previous studies using other cortical parcellations. By counting how often each brain region's features appear in the feature subset selected by JMMP-LRR, we find that five brain areas recur in the selected features. The five core brain areas are the Fusiform Face Complex (L-FFC), Area 10d (L-10d), the Orbital Frontal Complex (R-OFC), the Perirhinal Ectorhinal cortex (L-PeEc), and Area TG dorsal (L-TGd, R-TGd). The features corresponding to these five core areas form a new feature subset that achieves an average three-class accuracy of 80.0%. The results demonstrate the importance of the five core brain regions in identifying the stages toward AD. They also show that the proposed method classifies HC, MCI, and AD more accurately than existing approaches, and that the division of brain regions by JHCPMMP is more principled and effective than other methods.

Data preprocessing. The brain is parcellated into 180 areas per hemisphere using the HCPMMP atlas, which delineates cortical architecture, function, and connectivity. A sparse network is then obtained with the help of the JHCPMMP 10 . This step processes the fMRI data, projects it into CIFTI space, and derives the sparse network through the MMP. The MMP captures pronounced changes in cortical thickness, myelin maps, task fMRI, and resting-state fMRI for each brain region. Correlations are computed across the 360 areas, and the sparse network is generated by retaining a proportion of the strongest weights (PSW). The purpose of this step is to suppress noise and weakly correlated connections.
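The PSW step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `proportional_threshold`, the toy 4-region time series, and the 50% proportion are all assumptions made for the example.

```python
import numpy as np

def proportional_threshold(corr, psw):
    """Keep only the strongest `psw` proportion of off-diagonal weights,
    zeroing the rest to obtain a sparse network."""
    w = corr.copy()
    np.fill_diagonal(w, 0.0)               # ignore self-connections
    triu = np.triu_indices_from(w, k=1)
    weights = np.abs(w[triu])
    k = int(round(psw * weights.size))     # number of edges to keep
    if k == 0:
        return np.zeros_like(w)
    cutoff = np.sort(weights)[-k]          # k-th strongest weight
    w[np.abs(w) < cutoff] = 0.0            # suppress weak connections
    return w

# Toy example: 100 time points over 4 regions, keep the strongest 50% of edges
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 4))
corr = np.corrcoef(x, rowvar=False)
sparse = proportional_threshold(corr, psw=0.5)
```

In the paper the same idea is applied to the full 360 × 360 correlation matrix.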
Network measures at each node of the connectivity network are computed as candidate features. The feature vector of each sample contains strength (S), betweenness centrality (BC), clustering coefficient (CC), local efficiency (LE), eigenvector centrality (EC), k-coreness centrality (KC), PageRank centrality (PC), subgraph centrality (SC), and flow coefficient (FC). These graph-theoretical measures can be computed with the Brain Connectivity Toolbox (BCT, available at: https://sites.google.com/site/bctnet/).
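The paper computes these measures with the (MATLAB-based) BCT; as an illustration only, a subset of the nine nodal measures can be reproduced in Python with NetworkX. The function name `node_features` and the 3-node toy matrix are assumptions for this sketch, and betweenness is computed on inverted weights (a common convention, since NetworkX treats weights as distances).

```python
import networkx as nx
import numpy as np

def node_features(adj):
    """Compute five of the nodal measures (strength, betweenness,
    clustering, eigenvector, PageRank) from a weighted adjacency matrix."""
    G = nx.from_numpy_array(adj)
    strength = dict(G.degree(weight="weight"))
    # Betweenness expects a distance, so invert the connection weights
    dist = {(u, v): 1.0 / d["weight"] for u, v, d in G.edges(data=True)}
    nx.set_edge_attributes(G, dist, "distance")
    bc = nx.betweenness_centrality(G, weight="distance")
    cc = nx.clustering(G, weight="weight")
    ec = nx.eigenvector_centrality_numpy(G, weight="weight")
    pr = nx.pagerank(G, weight="weight")
    return np.column_stack([[m[i] for i in G.nodes]
                            for m in (strength, bc, cc, ec, pr)])

adj = np.array([[0.0, 1.0, 0.5],
                [1.0, 0.0, 0.2],
                [0.5, 0.2, 0.0]])
feats = node_features(adj)     # one row per node, one column per measure
```

The remaining measures (local efficiency, k-coreness, subgraph centrality, flow coefficient) are available in the BCT itself.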
For each local network measure, a 360 × 1 vector is formed in which each element is the measure's value for the corresponding cortical area. Computing all nine measures yields a 360 × 9 feature matrix, with one measure stored per column. Scaling each column of this matrix keeps the ranges of the measures comparable, so that features with larger numeric ranges do not dominate the classification. Each feature is normalized to the range [−1, 1]. In total, 360 × 9 = 3,240 candidate features are generated for the classification of HC, MCI, and AD.
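The column-wise scaling to [−1, 1] can be sketched with scikit-learn's `MinMaxScaler`. The random 360 × 9 matrix below is a hypothetical stand-in for one subject's network-measure matrix; the paper does not publish its code.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for one subject's 360 x 9 matrix of network measures
rng = np.random.default_rng(1)
features = rng.uniform(0.0, 50.0, size=(360, 9))

# Scale each column (each measure) independently to [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(features)

# Flatten into the subject's 3,240-dimensional candidate feature vector
vector = scaled.flatten()
```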
Feature selection. The network-based measures generate 3,240 candidate feature values per subject, which strongly affects computational cost and classification accuracy. The 3,240 candidate features form one feature vector for each subject. Noisy and irrelevant features often lead to over-fitting. Feature selection should therefore be performed before classification by extracting a subset of the original 3,240 candidate features, which reduces training time and test time and improves classification performance.
There are two main approaches to feature selection: filter and wrapper methods. Filter methods select features from the data first and then train the learner, so feature selection is independent of the subsequent learner. Wrapper methods use an inductive algorithm directly to evaluate feature subsets; they are generally better than filter methods in prediction accuracy but usually more computationally intensive.
Recursive feature elimination (RFE) is a common wrapper method. Judged by the final performance of the learner, wrapper-based selection outperforms filter-based selection. RFE iteratively eliminates the features with the lowest contribution scores: in each cycle it ranks all remaining features and deletes the n features with the lowest scores.
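The RFE procedure just described maps directly onto scikit-learn's `RFE` class. Below is a minimal sketch, not the authors' code: the data are random stand-ins, and the feature count is shrunk from 3,240 to 300 purely to keep the example fast; the step size of 10 and the target of 30 features match the paper's setup.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in (feature count shrunk from 3,240 for speed): 72 subjects
rng = np.random.default_rng(42)
X = rng.standard_normal((72, 300))
y = rng.integers(0, 3, size=72)

# Rank features with logistic regression each round and drop the 10
# lowest-scoring features until 30 remain
selector = RFE(estimator=LogisticRegression(max_iter=1000),
               n_features_to_select=30, step=10)
selector.fit(X, y)
X_reduced = selector.transform(X)   # the selected 30-feature subset
```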
The LR-RFE algorithm is applied to extract important features from the 3,240 candidates. Its main idea is to repeatedly eliminate features with low contribution scores: in each iteration, the remaining features are ranked with the LR algorithm and the 10 features with the lowest scores are deleted. The process is repeated for the remaining features until all features have been traversed. From the 3,240 features, an optimal subset of 30 features is selected with LR-RFE. The algorithm is implemented in Python using the Sklearn package.
SVM classifier. A one-vs-the-rest support vector machine (OVR-SVM) is applied to achieve high classification accuracy after the feature dimension has been reduced by LR-RFE. The SVM is a binary classification model that finds a hyperplane separating the samples; handling multi-class problems requires constructing a suitable multi-class classifier. This paper adopts the OvR multi-class strategy, also known as one-vs-all.
OvR is the most commonly used strategy for multi-class classification. Each class in turn is taken as the positive example and the remaining classes as negative examples, training N classifiers. If exactly one classifier predicts the positive class, its class label is the final classification result. The OvR-SVM is a multivariate statistical method that can be used for classification, and we use it as the classifier in this paper.
The mathematical principle of OVR-SVM is as follows. To distinguish K classes, K binary problems are solved, one per class 11 . For the k-th class, the problem can be written as (Eqs. 1-3):

\min_{w_k, b_k, \xi^k} \; \frac{1}{2} w_k^T w_k + C \sum_i \xi_i^k   (1)
\text{s.t.} \; w_k^T \varphi(x_i) + b_k \ge 1 - \xi_i^k, \quad \text{if } y_i = k   (2)
w_k^T \varphi(x_i) + b_k \le -1 + \xi_i^k, \; \xi_i^k \ge 0, \quad \text{if } y_i \ne k   (3)

where the training data x_i are mapped to a higher-dimensional space by the function \varphi and C is the penalty parameter. When the K problems are solved, there are K decision functions w_k^T \varphi(x) + b_k, and a sample is assigned to the class whose decision function takes the largest value. After feature selection, the experiment contains 72 data samples and 30 features, a typical small-sample, high-dimensional setting, which indicates that OVR-SVM is well suited to the three-class classification.
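The argmax-over-binary-classifiers scheme corresponds to scikit-learn's `OneVsRestClassifier` wrapping a linear SVM. The synthetic data below are a hypothetical stand-in for the 72 × 30 HC/MCI/AD feature matrix; this is a sketch of the technique, not the authors' pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the 72 x 30 HC/MCI/AD features
X, y = make_classification(n_samples=72, n_features=30, n_informative=10,
                           n_classes=3, random_state=0)

# One linear SVM per class; a sample receives the label whose binary
# decision function is largest
clf = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
pred = clf.predict(X)
```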
LR-RFE algorithm steps. The model is evaluated by k-fold cross-validation: the data set is divided into K subsets of equal size; the classifier is trained on the other K−1 subsets and tested on the remaining one 12 . In this experiment, the evaluation uses the following indicators: Accuracy, Precision, Recall, and F1-score. Table 1 lists the confusion matrix of the three-class classification. Each metric is defined in Eqs. 4-7:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}   (4)
\text{Precision} = \frac{TP}{TP + FP}   (5)
\text{Recall} = \frac{TP}{TP + FN}   (6)
\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}   (7)
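The per-class metrics can be derived directly from a three-class confusion matrix, as this small sketch shows. The toy labels are invented for illustration; in the multi-class case, precision, recall, and F1 are computed per class from that class's TP, FP, and FN counts.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy three-class labels (invented for illustration)
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

cm = confusion_matrix(y_true, y_pred)   # rows: true class, cols: predicted
# Per-class TP, FP, FN from the confusion matrix
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp

accuracy = tp.sum() / cm.sum()                  # correct / total
precision = tp / (tp + fp)                      # per class
recall = tp / (tp + fn)                         # per class
f1 = 2 * precision * recall / (precision + recall)
```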

Results
The brain MR imaging data of 72 subjects (mean age: 76.3 ± 7.7 years, range: 55.8-95.9 years, male/female: 40/32) used in this paper are obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu), including T1 and T2 structural data, resting-state fMRI with eyes open, and field maps. In the present study, 24 subjects per group are analyzed in each of the three classes HC, MCI, and AD. Table 2 lists the demographics of all subjects.
In this paper, the final feature vectors obtained after dimension reduction with LR-RFE are classified by SVM. A total of 2,160 feature values (72 subjects × 30 features) are used for classification. The states HC, MCI, and AD are recognized with three binary SVM classifiers. We use the SVM classifier implemented in the Sklearn package and choose a linear kernel. The parameters of the SVM are determined by 5-fold cross-validation. The classification results are summarized in Table 3. As can be seen from Table 3, the OVR-SVM classifier achieves 89% accuracy for the three-group classification of HC, MCI, and AD. Moreover, we apply two typical methods used in Alzheimer's disease recognition, logistic regression (LR) and K-nearest neighbor (KNN), to the same imaging data for a comprehensive comparison. As Table 3 shows, the proposed method achieves better performance than the other two methods. The AD vs. MCI vs. HC classification performance metrics are shown in Fig. 3.
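Determining the SVM parameters by 5-fold cross-validation corresponds to a grid search over the penalty C. This is a hedged sketch under assumptions: the candidate C grid and the synthetic 72 × 30 data are invented for the example, since the paper does not report the exact search space.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the 72 x 30 feature matrix after LR-RFE
X, y = make_classification(n_samples=72, n_features=30, n_informative=10,
                           n_classes=3, random_state=0)

# Choose the SVM penalty parameter C by 5-fold cross-validation
grid = GridSearchCV(OneVsRestClassifier(SVC(kernel="linear")),
                    param_grid={"estimator__C": [0.01, 0.1, 1.0, 10.0]},
                    cv=5)
grid.fit(X, y)
best_C = grid.best_params_["estimator__C"]
```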
To estimate the generalization ability of the proposed method, experiments are also performed on three binary classification tasks (HC vs. AD, MCI vs. AD, and HC vs. MCI). The binary classification accuracies are 98.0% for AD vs. HC, 92.0% for MCI vs. AD, and 95.5% for HC vs. MCI. Similarly, we apply logistic regression (LR) and K-nearest neighbor (KNN) to the same imaging data for two-class classification as a comparison. The classification results are summarized in Table 4, which shows that the proposed method again achieves better performance than the other two methods.
The brain regions corresponding to the 30 features involved in classification. Under HCPMMP's rules for dividing brain regions, the right-hemisphere regions are numbered 1-180 and the left-hemisphere regions 181-360. Because the brain is symmetrical, each left-hemisphere region has a counterpart in the right hemisphere.
In the three-class and two-class classification of Alzheimer's disease, we use the 30 features corresponding to the 24 cortical areas in Table 5.
As shown in Table 5, we further analyze the 30 features and identify five key cortical areas, each corresponding to two or more features: the Fusiform Face Complex (L-FFC), Area 10d (L-10d), the Orbital Frontal Complex (R-OFC), the Perirhinal Ectorhinal cortex (L-PeEc), and Area TG dorsal (L-TGd, R-TGd). The corresponding features of these key areas are shown in Table 6, and their specific distribution in the brain is shown in Fig. 4.
Figure 4. The five core cortical areas' specific distribution in the brain.
Table 7. The classification accuracies corresponding to different brain areas and features. Note: Set 1: classification accuracies in SVM and LR with 24 brain areas and 30 features; Set 2: classification accuracies in SVM and LR with 5 core brain areas and 11 features; Set 3: classification accuracies in SVM and LR with 11 features from 5 brain areas chosen at random from the 24 brain areas excluding the 5 core brain areas.
(2020) 10:5475 | https://doi.org/10.1038/s41598-020-62378-0
To further analyze the five core cortical areas, the 11 features corresponding to the five areas FFC, 10d, OFC, PeEc, and TGd are selected from the 30 features of the 24 cortical areas and used to classify HC, MCI, and AD. We then run 5-fold cross-validation with SVM and LR on these features separately. From Table 7, the classification accuracies of SVM and LR with the 11 features are 80% and 78%, respectively.

In addition, the classification accuracies of SVM and LR with the 30 features of the 24 cortical areas are 89% and 88%, respectively. Furthermore, to analyze the role of the five cortical areas' features in classification, we randomly select five cortical areas from the remaining 19 and use their corresponding features to compute the classification accuracy. Training and testing are repeated 10 times to obtain average accuracies for SVM and LR. These classification results (Set 3) are given in Table 7.
From Table 7, the classification accuracies of Set 2 are close to those of Set 1, whereas the accuracies of Set 3 are much lower. Evidently, when the features are taken from the five cortical areas FFC, 10d, OFC, PeEc, and TGd, the classification accuracy is higher than with five random cortical areas. We therefore observe that the five cortical areas have a large impact on the results of the three-class classification.

Discussion
Most previous studies focused on two-class classification between HC, MCI, and AD and achieved high accuracy. Using the imaging data of the ADNI database, some studies also reported three-class classification results for HC, MCI, and AD. As shown in Table 8, our method obtains higher accuracy than previous studies that used older brain parcellation methods. This shows that our parcellation scheme benefits the classification of HC, MCI, and AD, and it further supports that the division of brain regions by JHCPMMP is more principled and effective than other methods.
As shown in Table 7, when the features are taken from the five cortical areas FFC, 10d, OFC, PeEc, and TGd, the classification accuracy is higher than when five random cortical areas are used. The five cortical areas therefore have a large impact on the results of the three-class classification, and this finding is consistent with previous clinical studies. Zebrowitz 13 observed lower activation, specificity, and resting blood flow in older adults than in younger adults in the fusiform face area (FFA) but not in other regions of interest, indicating an impaired face-selection mechanism in the elderly. Bludau et al. 14 found that Fp1 and Fp2 contribute differently to functional networks: Fp1 is involved in cognition, working memory, and perception, whereas Fp2 is part of brain networks underlying affective processing and social cognition. Grabenhorst et al. 15 pointed out that the OFC affects the experience of pleasure, pain, and reward and punishment. Ding et al. 16 found that the human TPC includes anterior parts of areas 35 and 36 and seems to be involved largely in social and emotional processing, including face processing, recognition, and semantic memory. Olson et al. 17 suggested that the TGd may combine complex, highly processed perceptual input with visceral emotional responses. Thus, all five areas have been confirmed to be involved in human face processing, emotional perception, and memory. Our results are therefore in line with those of previous studies and are of significant importance for further exploring treatment strategies for Alzheimer's disease and for early intervention to delay the deterioration of the disease.

Conclusion
We propose JMMP-LRR, a method combining LR-RFE and the JHCPMMP, for the three-class classification of AD patients. Processing fMRI data with the JHCPMMP yields small-sample, very high-dimensional data; using such data directly for classification causes long running times and low classification accuracy, a problem JMMP-LRR solves well. The features extracted by LR-RFE are more discriminative for the three-class classification of AD patients and achieve high classification accuracy. By analyzing the brain regions corresponding to the selected features, we identify five core cortical areas that are important for distinguishing the stages toward AD.