Multimodal Hippocampal Subfield Grading For Alzheimer’s Disease Classification

Numerous studies have proposed biomarkers based on magnetic resonance imaging (MRI) to detect and predict the risk of evolution toward Alzheimer’s disease (AD). Most of these methods have focused on the hippocampus, which is known to be one of the earliest structures impacted by the disease. To date, patch-based grading approaches provide among the best biomarkers based on the hippocampus. However, this structure is complex and is divided into different subfields, which are not equally impacted by AD. Former in-vivo imaging studies mainly investigated structural alterations of these subfields using volumetric measurements and microstructural modifications with mean diffusivity measurements. The aim of our work is to improve the current classification performance based on the hippocampus with a new multimodal patch-based framework combining structural and diffusion MRI. The combination of these two MRI modalities enables the capture of subtle structural and microstructural alterations. Moreover, we propose to study the efficiency of this new framework applied to the hippocampal subfields. To this end, we compare the classification accuracy provided by the different hippocampal subfields using volume, mean diffusivity, and our novel multimodal patch-based grading framework combining structural and diffusion MRI. The experiments conducted in this work show that our new multimodal patch-based method applied to the whole hippocampus provides the most discriminating biomarker for advanced AD detection, while our new framework applied to the subiculum obtains the best results for AD prediction, improving accuracy by two percentage points compared to the whole hippocampus.

Alzheimer's disease (AD) is an irreversible neurodegenerative process leading to mental dysfunctions. Subjects presenting mild cognitive impairment (MCI) have a higher risk of developing AD 1 . To study the preclinical phase of the disease, the Alzheimer's disease neuroimaging initiative (ADNI) has been set up based on two MCI definitions: early MCI (eMCI) and late MCI (lMCI). Subjects with eMCI have milder cognitive impairment than those with lMCI, both suffering from amnesic MCI 2 . Such clinical symptoms are caused by changes like synaptic and neuronal losses that lead to structural and microstructural alterations. Neuroimaging studies performed on AD subjects reveal that when an AD diagnosis is made, alterations of brain structure are already advanced, emphasizing the need to study the early stages of the disease.
The improvement of medical imaging techniques such as magnetic resonance imaging (MRI) has enabled the development of efficient biomarkers capable of detecting alterations caused by AD 3 . Over the past years, many methods have been proposed to perform automatic detection of alterations associated with AD. First, studies proposed methods based on specific regions of interest (ROI) capturing alterations at an anatomical scale. Among structures impacted by AD, previous investigations have been focused on the hippocampus 4-6 , entorhinal cortex (EC) 7-9 , parahippocampal gyrus, amygdala 10 , or parietal lobe 11,12 . Alterations of these structures are usually estimated using volume 13,14 , shape 15,16 , or cortical thickness 17,18 measurements. Beside ROI-based methods, whole
MRI processing. T1w images were processed using the volBrain system 72 (http://volbrain.upv.es). This system is based on an advanced pipeline providing automatic segmentation of different brain structures from T1w MRI. The preprocessing is based on (a) a denoising step with an adaptive non-local mean filter 73 , (b) an affine registration in the MNI space 74 , (c) a correction of the image inhomogeneities 75 and (d) an intensity normalization.
Afterward, segmentation of hippocampal subfields was performed with HIPS 76 based on a combination of non-linear registration and patch-based label fusion 77 . This method uses a training library based on a dataset composed of high-resolution T1w images manually labeled according to the protocol proposed by Winterburn et al. 37 . To perform the segmentation, the images are up-sampled with a local adaptive super-resolution method to fit the training image resolution 78 . The method provides automatic segmentation of hippocampal subfields gathered into five labels: Subiculum, CA1SP, CA1SR-L-M, CA2-3, and CA4/DG (see Fig. 1). Then, the segmentation maps obtained from the up-sampled T1w images were down-sampled to fit the MNI space resolution. All the following experiments were carried out with images in the MNI space. Finally, an estimation of the total intra-cranial volume was performed 79 .
DTI processing. The preprocessing of the diffusion-weighted images is based on (a) a denoising step based on the LPCA filter 80 and (b) a correction of the head motion using an affine registration. Afterward, we performed several steps to first obtain the mapping between the DWI native space and the MNI space and then to estimate the MD in the MNI space.
(1) Estimation of the mapping between DWI native space and MNI space: First, a diffusion tensor model 81 estimated at each voxel using Dipy library 82
(2) Estimation of the MD in the MNI space: The deformation field estimated at the previous step is used to register the b 0 and each DWI direction from their native space into the MNI space using b-spline interpolations 74 . This is done to limit interpolation artifacts and to correct the partial volume effect (PVE). It has been shown that up-sampling each DWI direction individually using interpolation before estimating DTI parameters greatly reduces the PVE present in DTI 83 . Thus, the final diffusion tensor model is estimated in the MNI space using all the non-linearly registered DWI and b 0 .
To analyze microstructural modifications, the MD is estimated within each hippocampal subfield and the whole hippocampus structure with the segmentation described in the previous section. MD is defined as

$$\mathrm{MD} = \frac{\lambda_1 + \lambda_2 + \lambda_3}{3}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$ are the three eigenvalues of the fitted tensor. Finally, quality control is conducted to exclude data presenting segmentation errors or misregistration after the MRI and DTI preprocessing steps. Thus, 10 CN subjects, 18 eMCI, 5 lMCI, and 9 AD patients have been excluded from the initially considered ADNI2 dataset (see Table 1 for the dataset used in our experiments).
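The MD definition and its regional average can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the function names `mean_diffusivity` and `mean_md_in_mask` are hypothetical, and voxels are represented as flat Python lists rather than 3D volumes.

```python
def mean_diffusivity(eigenvalues):
    """Return the MD of one voxel: the mean of the three tensor eigenvalues."""
    l1, l2, l3 = eigenvalues
    return (l1 + l2 + l3) / 3.0


def mean_md_in_mask(md_values, mask):
    """Average MD over the voxels selected by a boolean mask.

    md_values and mask are flat lists of equal length (hypothetical layout);
    mask[i] is True when voxel i belongs to the structure of interest.
    """
    selected = [md for md, inside in zip(md_values, mask) if inside]
    if not selected:
        raise ValueError("the mask does not select any voxel")
    return sum(selected) / len(selected)


if __name__ == "__main__":
    # Eigenvalues in mm^2/s for a single illustrative voxel.
    print(mean_diffusivity((1.5e-3, 1.0e-3, 0.5e-3)))
    # Regional feature: mean MD inside a subfield mask.
    print(mean_md_in_mask([1.0e-3, 2.0e-3, 3.0e-3], [True, False, True]))
```

In practice the eigenvalues would come from the tensor fit, and the mask from the subfield segmentation resampled to the same space.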

Methods
Patch-based grading. Patch-based grading (PBG) was first proposed for s-MRI 9 . The main idea of this exemplar-based method is to use the capability of patch-based techniques to capture subtle signal modifications related to anatomical degradations caused by AD. To date, PBG methods demonstrate state-of-the-art performance in the detection of the earliest stage of AD 84 . To determine the pathological status of the subject under study, PBG methods estimate the state of cerebral tissues at each voxel by a similarity measurement. This measurement is performed between the anatomical pattern of the subject under study and those extracted from two training populations, one healthy and the other unhealthy.
First, a training library $T$ composed of two datasets of images is built: one with images from CN subjects and the other one from AD patients. Next, for each voxel $x_i$ of the region of interest in the considered subject $x$, the PBG method produces a weak classifier denoted $g_{x_i}$. This weak classifier provides a surrogate of the pathological grading at the considered position. The weak classifier is computed using a measurement of the similarity between the patch $P_{x_i}$ surrounding the voxel $x_i$ belonging to the image under study and a set $K_{x_i}$ of the closest patches extracted from the library $T$. The most similar patches are found using an approximate nearest neighbor method 85 . The grading value $g_{x_i}$ at $x_i$ is defined as:

$$g_{x_i} = \frac{\sum_{t_j \in K_{x_i}} w(x_i, t_j)\, p_t}{\sum_{t_j \in K_{x_i}} w(x_i, t_j)} \qquad (1)$$

where $P_{t_j}$ is the patch surrounding the voxel $j$ belonging to the training template $t \in T$, and $w(x_i, t_j)$ is the weight assigned to the pathological status $p_t$ of the training image $t$. We estimate $w$ such that:

$$w(x_i, t_j) = \exp\left(-\frac{\lVert P_{x_i} - P_{t_j} \rVert_2^2}{h^2 + \varepsilon}\right) \qquad (2)$$

where $h^2 = \min_{t_j} \lVert P_{x_i} - P_{t_j} \rVert_2^2$ and $\varepsilon \to 0$. The pathological status $p_t$ is set to −1 for patches extracted from AD patients and to 1 for patches extracted from CN subjects. Therefore, the PBG method provides a score representing an estimation of the alterations caused by AD at each voxel. Consequently, cerebral tissues strongly altered by AD have grading values close to −1, whereas healthy tissues have scores close to 1.
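As an illustration, the grading at a single voxel can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: it assumes the exponential patch-similarity weight commonly used in patch-based grading, with a bandwidth set by the distance to the best match. The names `squared_l2` and `grade_voxel` and the value of `eps` are hypothetical, and the approximate nearest-neighbor search is replaced by a precomputed list of candidate patches.

```python
import math


def squared_l2(patch_a, patch_b):
    """Sum of squared differences between two flattened patches."""
    return sum((a - b) ** 2 for a, b in zip(patch_a, patch_b))


def grade_voxel(subject_patch, library, eps=1e-12):
    """Grade one voxel from its closest training patches.

    library is a list of (patch, status) pairs, with status +1 for a patch
    taken from a CN subject and -1 for a patch taken from an AD patient.
    Returns the grading value and the list of patch weights.
    """
    distances = [squared_l2(subject_patch, patch) for patch, _ in library]
    # Assumed bandwidth: squared distance to the best match, plus a small eps
    # that avoids a division by zero when the best match is exact.
    h2 = min(distances) + eps
    weights = [math.exp(-d / h2) for d in distances]
    total = sum(weights)
    grading = sum(w * status for w, (_, status) in zip(weights, library)) / total
    return grading, weights


if __name__ == "__main__":
    library = [([1.0, 1.0, 1.1], 1), ([3.0, 3.0, 3.0], -1)]
    grading, _ = grade_voxel([1.0, 1.0, 1.0], library)
    print(grading)  # close to 1: the subject patch resembles the CN patch
```

With this weighting, the best match always receives the same weight, and the other candidates only contribute when their distance is comparable to that of the best match.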
Multimodal patch-based grading fusion. The patch-based method presented in the previous section was designed to capture structural alterations in T1w MRI. Recently, we proposed the extension of this method to the DTI modality in order to detect microstructural modifications 65 . We showed the efficiency of MD grading in improving the classification of the early stages of AD.
In this study, we propose a new framework to perform multimodal patch-based grading (MPBG). To this end, we developed an adaptive fusion of grading maps derived from different modalities (see the example of grading maps in Fig. 2). As shown in the following, this fusion provides more robust and accurate biomarkers than monomodal PBG biomarkers.
As in the previous section, a training library of CN and AD subjects is built for each modality. Next, at each voxel within the ROI of the considered subject and for each modality, a set $K$ of most similar patches is extracted. This step provides one set $K$ of patches per modality $m \in M$, where $M$ corresponds to the set of the different modalities provided. Nevertheless, at each voxel, the quality of the grading estimation is not the same for all the modalities. Therefore, the degree of confidence is estimated with the function $\alpha$ defined as:

$$\alpha_{x_i,m} = \sum_{t_j \in K_{x_i,m}} w(x_i, t_j) \qquad (3)$$

that reflects the confidence of the grading value $g_{x_i}$ for the modality $m$ at the voxel $x_i$. This confidence measure is derived from multi-feature fusion 86 . Thus, each modality provides a weak classifier at each voxel that is weighted with its degree of confidence $\alpha_{x_i,m}$. The multimodal grading, denoted $g_{x_i}$, is given by:

$$g_{x_i} = \frac{\sum_{m \in M} \alpha_{x_i,m}\, g_{x_i,m}}{\sum_{m \in M} \alpha_{x_i,m}} \qquad (4)$$

where $g_{x_i,m}$ is the grading of Eq. (1) computed for the modality $m$. In other words, the weights $w$ and the sets $K_{x_i,m}$ are estimated independently for each modality and combined afterward. Therefore, the proposed combination framework is spatially adaptive and takes advantage of a local degree of confidence $\alpha_{x_i,m}$ for each modality $m$. When the matches found for a modality in the training library are good candidates (i.e., patches very similar to the patch from the subject under study), our confidence $\alpha_{x_i,m}$ in the grading estimation for this modality is high. In the end, this modality will have a high weight in the mixing procedure described in (4).
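The adaptive fusion at one voxel can be sketched as a confidence-weighted average of the monomodal gradings. The example below is an illustration under stated assumptions, not the authors' code: it assumes that the confidence of a modality is the sum of its patch weights, and the names `confidence` and `fuse_gradings` as well as the modality keys are hypothetical.

```python
def confidence(weights):
    """Assumed confidence of one modality: the sum of its patch weights."""
    return sum(weights)


def fuse_gradings(gradings, confidences):
    """Confidence-weighted average of monomodal gradings at one voxel.

    gradings and confidences are dictionaries indexed by modality name,
    e.g. {"T1w": ..., "MD": ...} (hypothetical keys).
    """
    total = sum(confidences[m] for m in gradings)
    if total == 0:
        raise ValueError("all modalities have zero confidence")
    return sum(confidences[m] * gradings[m] for m in gradings) / total


if __name__ == "__main__":
    gradings = {"T1w": 1.0, "MD": -1.0}
    confidences = {"T1w": 3.0, "MD": 1.0}
    # The T1w grading dominates because its matches are more reliable here.
    print(fuse_gradings(gradings, confidences))
```

Because the confidences are computed per voxel, a modality can dominate the fused grading in one region and be nearly ignored in another.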

Features estimation.
Features were estimated in each hippocampal subfield and over the whole hippocampus, defined as the union of all hippocampal subfield masks. To reduce the inter-individual variability, all volumes are normalized by the total intra-cranial volume 87 . Afterward, we aggregate weak local classifiers of the grading map into a single feature for each considered structure (i.e., hippocampal subfields and whole hippocampus) by averaging them. Then, patch-based grading features are computed by an unweighted vote of the weak classifiers using the segmentation masks (see Fig. 3). Finally, to prevent the bias introduced by the structural alterations due to aging, all the features (i.e., volume, mean of MD and MPBG) are age corrected with a linear regression based on the CN group 88 .
Implementation. We use the OPAL method to find the most similar patches in the training library 89 . OPAL is a fast approximate nearest neighbor patch search technique. This method processes each modality in about 4 seconds on a standard computer. A leave-one-out procedure was followed to construct the training library. Hence, for each test subject, a different training library is built. Consequently, the training library $T$ is composed of 37 images from CN subjects and 37 images from AD subjects, for a total of 74 images. The number of patches extracted from both training libraries is K = 160 (i.e., 80 from CN subjects and 80 from AD patients) and the patch size is 5 × 5 × 5 voxels.
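The age correction applied to the features can be sketched as follows. This is a minimal illustration under stated assumptions: the slope of a linear regression of the feature on age is fitted on the CN group only, and the age effect is then removed from every subject relative to the mean CN age. The choice of reference age and the names `fit_line` and `age_correct` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def age_correct(features, ages, is_cn):
    """Remove the age trend estimated on CN subjects from all features."""
    cn_ages = [a for a, cn in zip(ages, is_cn) if cn]
    cn_features = [f for f, cn in zip(features, is_cn) if cn]
    slope, _ = fit_line(cn_ages, cn_features)
    # Assumed reference: the mean age of the CN group.
    reference_age = sum(cn_ages) / len(cn_ages)
    return [f - slope * (a - reference_age) for f, a in zip(features, ages)]


if __name__ == "__main__":
    ages = [60.0, 70.0, 80.0, 80.0]
    features = [3.0, 2.5, 2.0, 1.0]      # e.g. normalized volumes
    is_cn = [True, True, True, False]    # last subject is a patient
    print(age_correct(features, ages, is_cn))
```

Fitting the regression on CN subjects only keeps the disease effect out of the estimated age trend, so the correction removes normal aging without removing the signal of interest.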
Furthermore, as done in our PBG DTI study 65 , we used zero normalized sum of squared differences for T1w to compute the L2 norm (see Eq. (2)). On the other hand, d-MRI is a quantitative imaging technique. Therefore, a straight sum of squared differences is used for MD in Eq. (2) in order to preserve the quantitative information.
Validation. To evaluate the efficiency of each considered biomarker in detection of AD alterations, the CN group is compared to the group of AD patients. In addition, to discriminate the impairment severity of MCI group, eMCI versus lMCI classification is conducted. The classification step is performed with linear discriminant analysis (LDA) within a repeated stratified 5-fold cross-validation with 200 iterations. Mean area under the curve (AUC) and mean accuracy (ACC) are computed to compare performance for each biomarker over the 200 iterations.
Statistical analyses. Statistical tests were conducted with an analysis of variance (ANOVA) procedure to determine the significance of biomarker changes related to the alterations caused by AD. The results of these tests have been corrected for multiple comparisons with Bonferroni's method. Significant changes have been tested within six comparisons (i.e., CN-AD, CN-eMCI, CN-lMCI, eMCI-lMCI, eMCI-AD, and lMCI-AD). These comparisons have been performed in each region of the hippocampus and with the three considered biomarkers (i.e., the volume, the average of MD, and our newly proposed MPBG). Finally, for each iteration of our stratified 5-fold cross-validation, we estimated the confidence interval of the AUC using a bootstrap with 100 iterations 90 . Then, the averages of the lower and upper bounds are computed. The results presented in this paper show the average confidence interval based on these average bounds.
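The bootstrap estimation of the AUC confidence interval can be illustrated with the following Python sketch. It is not the procedure of reference 90: it assumes a percentile bootstrap with resampling within each group, and the names `auc` and `bootstrap_auc_ci`, the 95% level, and the seed are hypothetical choices. In the setting described above, such an interval would be computed for each cross-validation iteration and the bounds averaged afterward.

```python
import random


def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative.

    Ties count for one half (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=100, level=0.95, seed=0):
    """Percentile bootstrap confidence interval of the AUC (assumed variant)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the group sizes.
        pos = [rng.choice(scores_pos) for _ in scores_pos]
        neg = [rng.choice(scores_neg) for _ in scores_neg]
        estimates.append(auc(pos, neg))
    estimates.sort()
    low_index = int((1.0 - level) / 2.0 * (n_boot - 1))
    high_index = int((1.0 + level) / 2.0 * (n_boot - 1))
    return estimates[low_index], estimates[high_index]


if __name__ == "__main__":
    positives = [0.6, 0.4, 0.8, 0.3]
    negatives = [0.5, 0.2, 0.7, 0.1]
    print(auc(positives, negatives))
    print(bootstrap_auc_ci(positives, negatives))
```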

Results
In this section, the results are presented in three parts. In the first part, we compare the different approaches applied within the entire hippocampus structure to evaluate the performance of our new MPBG compared to usual biomarkers such as volume and average MD. In the second part, we compare the accuracy of each considered biomarker within hippocampal subfields in order to investigate the potential of hippocampal subfield analysis to improve the result of AD detection and prediction. Finally, we compare the results of our proposed MPBG with state-of-the-art methods.
Whole hippocampus. Results of the comparisons over the whole hippocampus are presented in Table 2.
In this experiment, we compared the results of volume, mean of MD, PBG applied to each modality separately, and MPBG over the whole hippocampus.
First, the hippocampus volume and its average of MD were compared. For CN versus AD classification, the volume obtains 86.6% of AUC, and the average of MD obtains 80.6%. For eMCI versus lMCI classification, the volume and the average of MD obtain 59.4% and 55.6% of AUC, respectively. The experiments demonstrate that the volume of the hippocampus results in better classification performance than the average of MD for all comparisons, especially for CN versus AD. Second, PBG biomarkers applied to T1w and MD were compared. The results showed that T1w PBG provides better results than MD PBG with 92.6% of AUC for CN versus AD classification. However, for eMCI versus lMCI classification, MD grading provides the best results with 69.5% of AUC. Finally, the proposed MPBG biomarker, which combines both modalities, performed similarly to the best modality for all considered comparisons, with 92.1% of AUC for CN versus AD and 69.5% for eMCI versus lMCI. Compared with hippocampal volume, MPBG improves the AUC by 5.5 percentage points for the CN versus AD comparison and by over 10 percentage points for the eMCI versus lMCI comparison. Thus, the MPBG biomarker has a good capability to capture modifications caused by AD at different stages of severity (see Fig. 2).
Hippocampal subfields. Figure 4 shows the distribution of volumes (A), the average of MD (B), and the MPBG (C) for each hippocampal subfield at different AD stages. For each comparison, a p-value was estimated with a multi-comparison test 91 . We note that, for all hippocampal subfields, alterations caused by the disease correspond to a decrease in volume and MPBG and an increase in MD. The subiculum subfield presents the most significant differences for CN versus lMCI using volume and MD, for AD versus lMCI using MD, and for eMCI versus lMCI using MPBG. Indeed, it is the only subfield providing a p-value below 0.05 for the comparison of CN versus eMCI using volume, a p-value below 0.01 for lMCI versus AD using MD, and a p-value below 0.001 for eMCI versus lMCI using MPBG, which are the most challenging comparisons. The distribution of MPBG shows better discrimination between each group for all hippocampal subfields. Indeed, MPBG applied within CA1SP and CA1SR-L-M provides p-values below 0.01 for eMCI versus lMCI. Moreover, MPBG applied within the subiculum provides a p-value below 0.001 for the same comparison. Thus, MPBG enables AD detection using each subfield, with an advantage for the subiculum for the comparison of eMCI versus lMCI.
To estimate the efficiency of the considered biomarkers for AD detection, we also performed a classification experiment. Figure 5 shows the results of two comparisons, CN versus AD (part noted A in the figure) and eMCI versus lMCI (part noted B). First, for AD diagnosis (i.e., CN versus AD classification), the subfield providing the most discriminant volume is the CA1SR-L-M with an AUC of 86.0%. Moreover, the most discriminating MD biomarker is given by the subiculum with an AUC of 88.1%. For this comparison, the MD of the subiculum is the only subfield biomarker performing better than its whole-hippocampus counterpart. The CA1SP provides the best subfield results using the MPBG feature with an AUC of 92.1%, followed by the CA1SR-L-M and the subiculum.
Second, for eMCI versus lMCI classification, the subiculum provides the best results for each considered feature. Indeed, the subiculum obtained an AUC of 66.1% for the volume, 62.4% for the average of MD, and 71.8% for MPBG. Moreover, the subiculum also provided better results than the whole hippocampus for each considered method. Thus, the experiments conducted with three different biomarkers showed that the use of hippocampal subfields, especially the subiculum, results in better AD prediction than the whole hippocampal analysis.

Comparison with state-of-the-art methods. Direct comparison with other monomodal methods applied on ADNI1 is difficult since the group definitions (stable MCI and progressive MCI) are different. However, as recently shown, T1w PBG provides state-of-the-art performance on the ADNI1 dataset, even compared to deep learning methods 92 . Consequently, the results presented in this paper with T1w PBG on ADNI2 can reasonably be considered competitive and can be used as a reference.
Therefore, to evaluate the performance of the proposed MPBG, we compared it with state-of-the-art multimodal methods using d-MRI. To this end, we used the ACC values published by the authors. Table 3 shows the comparison of our proposed biomarkers within the hippocampal areas providing the best results (i.e., the whole hippocampus and the subiculum) with the state-of-the-art methods using a similar dataset based on ADNI2. We compared these biomarkers with a method using features based on tractography 93 , two different methods based on connectivity networks of the different brain structures 60,94,95 , and a voxel-based method that analyzes alterations of white matter 96 . The results of the comparison show that MPBG over the whole hippocampus obtains the best score for AD versus CN with 88.1% of accuracy, while the best competing result is achieved by a voxel-based method with a feature selection 96 that obtained 87.0% on a similar ADNI2 dataset. To the best of our knowledge, the two works providing an eMCI versus lMCI comparison 60,94 using s-MRI and d-MRI from a similar ADNI2 dataset are based on a connectivity network and obtained 63.4% and 65.0%, respectively. These comparisons demonstrate the relevance of MPBG biomarkers for AD detection and prediction. Indeed, our method provides results similar to the best methods evaluated on a similar dataset for CN versus AD classification and provides the best results for eMCI versus lMCI classification. Moreover, the proposed MPBG method based on the subiculum improves the performance for eMCI versus lMCI classification with an accuracy of 70.8%, which is 2 percentage points higher than the accuracy obtained with the whole hippocampus and more than 5 percentage points higher than the best connectivity network-based method (65.0%).
Figure 5. Whole hippocampus volume biomarker provides the best results with a mean AUC of 86.6% for CN versus AD comparison, followed by the CA1SR-L-M volume that obtains a mean AUC of 86%. Subiculum volume provides the best results for eMCI versus lMCI with a mean AUC of 66.1%. The average of MD for subiculum obtains the best results for CN versus AD and eMCI versus lMCI with a mean AUC of 88.1% and 62.4%, respectively. Whole hippocampus MPBG obtains the best results for CN versus AD with a mean AUC of 92.1%. Subiculum MPBG obtains the best results for eMCI versus lMCI comparison with a mean AUC of 71.8%. This comparison shows that subiculum is the only biomarker providing better results than the whole hippocampus. This figure presents mean AUC and the mean confidence intervals that have been computed for each iteration of the stratified 5-fold cross-validation procedure carried out in our experiments.

Relationship with cognitive scores. To investigate relationships between cognitive scores and MPBG values, we performed a generalized linear analysis with the following model:

$$\mathrm{MPBG} = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{sex} + \beta_3\,\mathrm{MMSE} + \beta_4\,\mathrm{RAVLT} + \beta_5\,\mathrm{FAQ} + \beta_6\,\mathrm{CDRSB} + \beta_7\,\mathrm{ADAS11} + \beta_8\,\mathrm{ADAS13}$$

We found a significant relationship of hippocampal MPBG with sex (p < 0.01), MMSE (p < 0.05) and ADAS13 (p < 0.01). This relationship with MMSE and ADAS scores holds for all subfields of the hippocampus. We found no model specific to a given subfield; all subfields presented a similar pattern. These results are in line with relationships obtained between hippocampal subfield volumes and MMSE and ADAS 97 .

Discussion
In this work, a multimodal analysis of the hippocampal subfield alterations caused by AD is proposed. First, the structural and microstructural alterations were captured from two MRI modalities with different methods. Then, the use of volume, MD, and the proposed MPBG method was investigated to achieve this analysis. In this section, the efficiency of these different methods applied to the whole hippocampus and to each hippocampal subfield is discussed.

Whole hippocampus biomarkers.
We first compared the performance of different methods applied to the whole hippocampus (see Table 2). The experiments showed that volume and average of MD of the hippocampus do not provide the most discriminating biomarkers to detect early stages of AD. Indeed, the proposed MPBG method obtains better results compared to the volume and the average of MD. However, for CN vs. AD, our MPBG method obtained lower results than T1w PBG when applied to the hippocampus. Therefore, the substantial structural differences between these two populations seem to be better captured using the T1w modality. This probably comes from the better native resolution of this modality. On the other hand, for eMCI vs. lMCI, MPBG and MD PBG obtained the best result. Therefore, the subtle alterations between both populations seem to be better captured using the DTI modality. This may come from the capability of this modality to measure microstructural modifications. Finally, when applied to the whole hippocampus, our MPBG demonstrates state-of-the-art performance for AD detection and prediction compared to recent methods (see Table 3).
These results emphasize the relevance of using a more accurate biomarker, such as MPBG, to study the effectiveness of hippocampal subfields for AD detection and prediction.
Hippocampal subfield biomarkers. The main contribution of this study is the multimodal analysis of hippocampal subfields. Indeed, most of the proposed biomarkers based on the hippocampus focus only on the whole structure or study alterations of hippocampal subfields with methods that do not provide sensitive biomarkers to detect early modification caused by AD. The lack of work studying alterations of hippocampal subfields with advanced biomarkers could be explained by the fact that automatic segmentation of the hippocampal subfields is a complex task due to subtle borders dividing each area.
In this work, we compared the efficiency of diffusion MRI and multimodal patch-based biomarkers for AD detection and prediction over the hippocampal subfields. Comparisons based on MD, volume and multimodal patch-based biomarkers showed that the subiculum is the most discriminating structure in the earliest stage of AD, providing the best results for AD prediction (see Figs 4 and 5). However, the whole hippocampus structure, followed by CA1SR-L-M, obtains the best results for AD detection.
These results are in accordance with literature studies based on animal models and in vivo imaging combining volume and MD, demonstrating that the subiculum is the earliest hippocampal region affected by AD 49,50 . Moreover, postmortem studies showed that hippocampal degeneration in the early stages of AD is not uniform. After alterations appear in the EC, the pathology spreads to the subiculum, CA1, CA2-3 and finally the CA4 and DG subfields 43,44,49,98 . It is interesting to note that the results of our experiments using volume-based biomarkers are also coherent with the previous in-vivo imaging studies that analyzed the atrophy of each hippocampal subfield at the advanced stage of AD. These studies showed that CA1 is the subfield impacted by the most severe atrophy 45,46,99,100 . Furthermore, studies using ultra-high field MRI at 7T, enabling CA1 layer discrimination, showed that CA1SR-L-M are the subfields showing the greatest atrophy at advanced stages of AD 47,48 .
Comparison with state-of-the-art methods. In the past years, a large number of studies dedicated to automatic detection of Alzheimer's disease have been proposed 53,69,93,101 . For a fair comparison, we consider only methods based on similar modalities and validated on the same ADNI2 dataset. Direct comparison with other monomodal methods applied on ADNI1 is difficult because group definition and pathological status definition are different. However, we can observe that the results obtained by the proposed method are in line with recently published results for AD vs. CN 102 .
Strengths and limitations. The major strength of our work comes from studying the effectiveness of using multimodal hippocampal subfield alterations for AD classification with a novel multimodal patch-based grading framework. Nonetheless, we acknowledge that our multimodal framework is not without potential limitations.
The main limitation is the large voxel size of DWI in native space, which is prone to PVE by merging signal from CSF with the signal from brain tissues. This results in an increase of MD coefficients, especially for structures with severe atrophies. However, to limit this effect, we applied a PVE correction 83 . Indeed, it has been shown that up-sampling each DWI direction individually reduces the PVE. Nevertheless, this study does not aim to provide an interpretation of DTI parameter modifications, but to study the effectiveness of the use of hippocampal subfields for AD classification with a multimodal patch-based grading method. Finally, although our method extracts patches independently from the s-MRI and d-MRI modalities to estimate one grading map per modality, the fusion of the two grading maps requires accurate alignment of the images from each modality. Consequently, the correction of EPI distortions is crucial in ensuring that each voxel corresponds to the same anatomical location in both modalities.

Conclusion
In this paper, we analyzed hippocampal subfield alterations with a multimodal framework based on structural and diffusion MRI. In addition, to study subtle modifications occurring in each hippocampal subfield, we developed a new multimodal patch-based framework using T1w and DTI. Our novel MPBG method was compared to the volume and the average of MD over the whole hippocampus. This comparison demonstrated that our MPBG method improves performance for AD detection and prediction. Also, a comparison with state-of-the-art diffusion-based methods showed the competitive performance of MPBG biomarkers. Finally, volume, average MD and MPBG methods were used to analyze hippocampal subfields. Although CA1 is the subfield with the greatest atrophy in the late stage of AD, the experiments demonstrated that the whole hippocampus provides the best biomarker for AD detection while the subiculum provides the best biomarker for AD prediction.

Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.