Deep-learning-based imaging-classification identified cingulate island sign in dementia with Lewy bodies

The differentiation of dementia with Lewy bodies (DLB) from Alzheimer’s disease (AD) using brain perfusion single photon emission tomography is important but challenging because these conditions exhibit common features. The cingulate island sign (CIS) is the most recently identified specific feature of DLB for a differential diagnosis. The current study aimed to examine the usefulness of deep-learning-based imaging classification for the diagnoses of DLB and AD. Furthermore, we investigated whether the CIS was emphasized by a deep convolutional neural network (CNN) during differentiation. Brain perfusion single photon emission tomography images from 80 patients each with DLB and AD and from 80 individuals with normal cognition (NL) were used for training, and images from 20 individuals in each group were used for final testing. The CNN was trained on brain surface perfusion images. Gradient-weighted class activation mapping (Grad-CAM) was applied to the CNN to visualize the features that were emphasized by the trained CNN. The binary classifications between DLB and NL, DLB and AD, and AD and NL were 93.1%, 89.3%, and 92.4% accurate, respectively. The CIS ratios closely correlated with the output scores before softmax for DLB–AD discrimination (DLB/AD scores). The Grad-CAM highlighted the CIS in the DLB discrimination. Visualization of the learning process by guided Grad-CAM revealed that the CNN focused increasingly on the CIS as the training progressed. The DLB/AD score was significantly associated with the three core features of DLB. Deep-learning-based imaging classification was useful for an objective and accurate differentiation of DLB from AD and for predicting clinical features of DLB. The CIS was identified as a specific feature during DLB classification. The visualization of specific features and learning processes could be critical in deep learning to discover new imaging features.

Neuroimaging has contributed to the classification of neurodegenerative dementias such as dementia with Lewy bodies (DLB) and Alzheimer's disease (AD). Early diagnoses of DLB and AD are important from prognostic and therapeutic perspectives, and distinguishing them is clinically vital. Disease-specific features have been extracted from brain perfusion single photon emission tomography (SPECT) images to assist with differential diagnoses of DLB and AD. Brain surface perfusion images produced by three-dimensional stereotactic surface projection (3D-SSP) 1 have been widely applied to statistical analyses that support the diagnoses of DLB and AD. Decreased perfusion in the parietal association cortex (PAC) and preserved perfusion in the primary motor and primary somatosensory cortices are typical of both DLB and AD 2,3, which makes it difficult to distinguish DLB from AD on perfusion SPECT images. One imaging feature that helps discriminate DLB is occipital hypoperfusion 4–7. Another finding that can differ between DLB and AD is perfusion in the posterior cingulate cortex (PCC). Hypoperfusion in the PCC is observed in the early stage of AD, whereas the PCC is relatively preserved in DLB. This sparing of the PCC relative to the precuneus plus cuneus, termed the cingulate island sign (CIS) 8, has recently garnered attention because it reflects a concomitant AD pathology that affects the clinical symptoms of DLB 9,10. We found that the CIS peaks at the stage of mild dementia and gradually disappears as DLB progresses 11. Thus, the CIS can help differentiate DLB from AD, especially at the early stage 8,12, with some exceptions, including posterior cortical atrophy 13.
(2019) 9:8944 | https://doi.org/10.1038/s41598-019-45415-5 www.nature.com/scientificreports
Deep learning is a major branch of artificial intelligence that includes deep convolutional neural networks (CNNs) capable of automatic feature extraction from data, and recent advances in deep learning have remarkably improved the performance of image classification and detection 14,15. Some algorithms based on deep learning have been proposed to recognize or differentiate AD and mild cognitive impairment (MCI) 16,17. In contrast, the ability of a CNN to discriminate DLB has not been investigated in detail. Furthermore, a deep-learning-based SPECT interpretation system that can differentiate between DLB and AD has not been described. The most significant disadvantage of deep learning is that the imaging features used by the CNN for classification have remained unknown. However, gradient-weighted class activation mapping (Grad-CAM) can produce "visual explanations" from a CNN, thus allowing the visualization of the areas on which a CNN focuses 18,19.
The current study aimed to investigate whether a trained CNN can identify the CIS, which is the most recently recognized imaging feature of DLB, while a deep two-dimensional CNN (2D-CNN) objectively and automatically classifies brain surface perfusion images generated by 3D-SSP from patients with DLB or AD and individuals with normal cognition (NL). Furthermore, the learning process was visualized during CNN training.

Results
Deep CNN could accurately classify brain surface perfusion images. Tables 1 and 2 summarize the demographic and cognitive findings of the training/validation cohort (80 persons each with AD, DLB, and NL) and the final testing cohort (20 persons each). For binary classification, the deep CNN was applied to 160 images per group, comprising the images of the 80 persons and their right-left flipped copies (Fig. 1). The accuracy of the classification was calculated from the final testing cohorts. The binary differentiations between DLB and NL (DLB-NL), DLB and AD (DLB-AD), and AD and NL (AD-NL) were 93.07 ± 3.77%, 89.32 ± 4.59%, and 92.39 ± 4.42% accurate (mean ± standard deviation), respectively. The areas under the receiver operating characteristic curves (AUCs) for differentiating DLB-NL, DLB-AD, and AD-NL were 0.954, 0.935, and 0.943, respectively.
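As a rough illustration of what an AUC of this kind measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation). This is one standard way to obtain an AUC and is not necessarily the procedure used in the study; the function name `roc_auc` and all score values are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as one half (equivalent to the Mann-Whitney U statistic).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores (illustrative only, not study data)
dlb_scores = [2.1, 0.4, 1.7, -0.2]
ad_scores = [-1.5, 0.6, -0.8, -0.1]
auc = roc_auc(dlb_scores, ad_scores)
print(auc)
```

An AUC of 1.0 would mean that every DLB score exceeds every AD score, whereas 0.5 corresponds to chance-level ranking.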
Validation of epoch number and effect of sample number. One hundred epochs were confirmed to be suitable by the learning curve (Fig. 2). When the sample number was small, the accuracy did not differ greatly from that of the full set. However, smaller samples were prone to overfitting (Fig. 3).
Trained CNN identified CIS for DLB detection. Grad-CAM was applied to the trained CNN to produce heatmaps and guided Grad-CAM images for DLB-AD and DLB-NL discrimination. The heatmap clearly highlighted the CIS in DLB to discriminate DLB and AD (Fig. 5a). The guided Grad-CAM exhibited a limited range on the image that focused on the CIS.

All 80 DLB images are shown in the Supplementary Information, arranged in descending order of the DLB/AD score. The CIS was highlighted in the first 61 DLB images, and it was clearly highlighted in 48 of them. Brain perfusion images with obvious occipital hypoperfusion without CIS were labeled correctly as DLB. Grad-CAM highlighted the cerebellum randomly. In the last nine DLB images, the occipital cortex was highlighted without CIS, and these images were mislabeled as AD.
CIS was highlighted less intensely in DLB-NL than in DLB-AD discrimination (Fig. 5b). The heatmap and guided Grad-CAM for AD highlighted the occipital lobe and cerebellum, but not the PCC (Fig. 5c). The heatmap and guided Grad-CAM for NL diffusely highlighted the occipital lobe, middle cingulate cortex, PCC, and cerebellum (Fig. 5d).
Visualization of feature extraction in the learning process of CNN. Grad-CAM visualized the learning process to extract features that were useful for differentiation by displaying altered images (Fig. 6). In the CNN trained for DLB-AD discrimination for 20 epochs, guided Grad-CAM and original images remained similar, indicating that the CNN could not yet detect specific features. After training for 60 epochs, the guided Grad-CAM images became narrower and the contrast became more obvious. After training for 100 epochs, the CNN focused more on the CIS in DLB (Fig. 6a,b) and on the occipital lobe, cerebellum, and sensorimotor areas in AD (Fig. 6c,d).
DLB/AD score was associated with core features of DLB. The associations between neuroimaging indices (i.e., CIS ratio, DLB/AD score, and DLB/NL score) and clinical symptoms (i.e., four core features and verbal memory) were analyzed. The DLB/AD score was significantly correlated with hallucination, Parkinsonism, and RBD, but not with fluctuation (Table 3). In contrast, the DLB/NL score was not correlated with any of them. The CIS ratio was correlated with hallucination and RBD. The DLB/AD score and CIS ratio were significantly correlated with verbal memory.

Discussion
Our CNN identified the CIS as an imaging feature during DLB-AD discrimination. The CIS ratios correlated closely with the DLB/AD scores, indicating the possibility that the network assessed the CIS indirectly during the discrimination. Furthermore, heatmaps generated by the Grad-CAM highlighted the CIS in DLB. The guided Grad-CAM also focused on the CIS and became restricted to the CIS as the learning process progressed. The correlation coefficients alone are indirect evidence and may merely imply that typical DLB has a higher CIS ratio. However, because the Grad-CAM visualizes the regions that the CNN targets for classification, the results indicate that the trained CNN automatically and objectively identified the CIS as an important feature for DLB prediction. The present findings demonstrate the potential of deep learning to discover new features in image diagnosis.
The deep CNN could accurately classify brain surface perfusion images. The classification accuracies of DLB-NL, DLB-AD, and AD-NL were 93.1%, 89.3%, and 92.4%, respectively. Most previous studies using deep-learning-based classification used 3D-CNN and aimed to diagnose AD and MCI but not DLB, and CNN-based diagnosis of DLB using FDG PET or perfusion SPECT has not been reported. Suk et al. 17 reported that the mean accuracies of MRI, FDG PET, and MRI + PET with 3D-CNN were 92.38%, 92.20%, and 95.35%, respectively. Liu et al. 16 reported accuracies of 90.18% (MRI), 89.13% (PET), and 90.27% (MRI + PET). Our 2D-CNN with brain surface perfusion images extracted from whole brain perfusion SPECT data yielded comparable discriminative accuracy. The distributions on brain perfusion and glucose metabolism images are similar 20. The bird's-eye view brain surface perfusion images represent extracted features that are useful for discriminating neurodegenerative dementia. Furthermore, a 3D-CNN has more parameters than a 2D-CNN and requires more calculations to converge. Thus, a 2D-CNN with brain surface perfusion images can classify more efficiently than a 3D-CNN with whole brain images. Our method, which can run on a standard computer, could potentially be adopted widely in clinical settings.
The CIS was more involved in the discrimination of DLB-AD than of DLB-NL, considering that the correlation coefficients between the CIS ratios and DLB/AD scores were higher than those between the CIS ratios and DLB/NL scores. The Grad-CAM supported this notion by focusing on the CIS as an imaging feature of DLB in the DLB-AD and DLB-NL discrimination. The heatmap and guided Grad-CAM highlighted the CIS in the DLB-AD discrimination, while the CIS was less highlighted in the DLB-NL discrimination. As DLB and AD exhibit common features such as rCBF decreases in the PAC, classification is typically more difficult for DLB-AD than for DLB-NL. Most patients with DLB exhibit concomitant AD pathology 21, which reportedly affects the CIS of patients with DLB. Specifically, the CIS is not obvious in DLB with abundant AD pathology. Similar to the CIS ratios, the DLB/AD scores in DLB reflect the degree of imaging features of AD that are presumably produced by concomitant AD pathology. Therefore, low CIS ratios and DLB/AD scores represent a high degree of concomitant AD pathology. Conversely, high CIS ratios and DLB/AD scores represent "pure" DLB. This explains why the CIS ratios exhibited a good correlation with the DLB/AD scores.
The Grad-CAM revealed that the CNN classified SPECT images in a manner unlike that of humans. Nuclear medicine physicians simultaneously evaluate the hypoperfused and preserved regions to differentiate DLB from AD, and often consider the contrast between the preserved and decreased areas. In contrast, heatmaps generated by the Grad-CAM were placed only on regions with preserved rCBF in both AD and DLB in the appropriately trained CNN. Guided Grad-CAM images became narrower and restricted to more preserved regions as learning progressed. These findings indicate that the CNN focused only on the preserved regions to classify the brain surface perfusion images of both DLB and AD. Despite this difference in the classification method, the CNN still identified the CIS as an important imaging feature of DLB.
The DLB/AD score correlated significantly with the scores of three core features, namely hallucination, Parkinsonism, and RBD. In contrast, the DLB/NL score was not correlated with any of them. This finding suggests that the DLB/AD scores closely represent various symptoms of DLB. Similar to the DLB/AD score, the CIS ratio was also correlated with hallucination and RBD. As the CIS has been reported to reflect AD pathology, the close correlation of the CIS ratio with the DLB/AD score indicates that the DLB/AD score reflects comorbid AD pathology. Hallucination was frequently observed in DLB without AD pathology 22. The manifestation of RBD was reportedly associated with less severe concomitant AD pathology 23. Our findings are consistent with the previous reports demonstrating the association between core features and AD pathology. Furthermore, the DLB/AD score was correlated with the verbal memory score, implying that memory impairment is more prominent in patients with AD than in those with DLB. Thus, the DLB/AD score was useful for both discriminating DLB and predicting its clinical features.
Our deep learning system could also be beneficial in terms of health care costs. Dopamine transporter (DaT) imaging 24 and [123I] MIBG cardiac sympathetic nerve scintigraphy 25 are reliable for clinically discriminating DLB from AD, and the DLB guidelines treat DaT imaging and [123I] MIBG scintigraphy as indicative biomarkers 26. However, assessing all amnestic patients using two additional nuclear medicine examinations might be cost prohibitive. Brain perfusion SPECT is more commonly used to detect AD, especially when a diagnosis is uncertain. Consequently, our diagnostic system and perfusion SPECT could be initially applied to investigate DLB in patients with suspected AD before using DaT and cardiac sympathetic nerve imaging.
This study has several limitations. Each group comprised only 160 augmented images from 80 individuals because this study was performed at a single institution. However, our brain surface perfusion images were normalized by 3D-SSP and applied only to binary classification. Therefore, we considered that the accuracy was sufficient despite the limited number of patients. The accuracy of FDG PET might be better, but perfusion SPECT is more accessible, and it has been proven to be a valid alternative in the absence of FDG PET 27. Furthermore, images acquired with [123I] IMP show good contrast owing to its high first-pass extraction 11,28. Recent CNN studies have attempted to enhance accuracy using various combinations of imaging modalities 16,17. Although the performance of a 2D-CNN with brain surface perfusion images was comparable to previous findings with such combinations, combinations of perfusion SPECT with other imaging modalities should be considered in future studies to enhance accuracy.

Conclusions
Deep-learning-based imaging classification was useful for an objective and accurate differentiation of DLB from AD, and for predicting the clinical features of DLB. The CIS was identified as a specific feature during DLB classification. The visualization of specific features and the learning process could facilitate the discovery of new imaging features using deep learning.

31, respectively. Verbal memory was evaluated using the sum of the five recall trials (trials 1-5) of the Rey Auditory Verbal Learning Test (RAVLT).
Brain perfusion SPECT images of 20 persons each with DLB, AD, and NL were used for the final testing. All procedures were approved by the Ethical Review Board of Fukujuji Hospital. We followed the clinical study guidelines of Fukujuji Hospital, which conformed to the Declaration of Helsinki (2013). We provided the healthy volunteers, patients, and their families with detailed information about the study, and all provided written informed consent to participate in the study.
Brain perfusion SPECT imaging. Persons resting with their eyes closed and ears unplugged were assessed by SPECT using a Symbia Evo Excel gamma camera (Siemens Medical Solutions, Malvern, PA, USA) equipped with fan beam collimators. Fifteen minutes after an intravenous infusion of [123I] IMP (167 MBq), SPECT images were acquired in a 128 × 128 matrix with a slice thickness of 1.95 mm (1 pixel) over a period of 30-40 min. The images were reconstructed by filtered back projection using a Butterworth filter; attenuation was corrected using the Chang method (attenuation coefficient = 0.1 cm⁻¹), and scatter was corrected using a triple energy window. Brain surface perfusion images produced using 3D-SSP 1 were augmented by flipping from left to right. The regional cerebral blood flow (rCBF) in the regions of interest (ROI) on the PCC, precuneus, and cuneus was measured as described 11. The mean value in the bilateral PCC ROI was divided by the mean value in the bilateral precuneus plus cuneus ROI to derive the CIS ratios from the [123I] IMP SPECT images. The CIS ratio is posterior cingulate/(precuneus + cuneus) 8.
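The CIS ratio described above can be sketched in a few lines. This is a minimal illustration that assumes "precuneus plus cuneus" means the two ROIs are pooled into one reference region before averaging; the function name `cis_ratio` and all values are hypothetical and are not study data.

```python
from statistics import mean


def cis_ratio(pcc, precuneus, cuneus):
    """CIS ratio = mean PCC value / mean value over precuneus plus cuneus."""
    # Assumption: precuneus and cuneus values are pooled into a single ROI
    reference = list(precuneus) + list(cuneus)
    return mean(pcc) / mean(reference)


# Hypothetical normalized rCBF values (illustrative only)
pcc = [1.10, 1.05, 1.15]
precuneus = [0.90, 0.95]
cuneus = [0.85, 0.90]
ratio = cis_ratio(pcc, precuneus, cuneus)
print(round(ratio, 3))
```

A ratio above 1 indicates that perfusion in the PCC is preserved relative to the precuneus plus cuneus, which is the pattern that the CIS describes.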
Preparation for deep CNN. Figure 1 summarizes the architecture of our deep CNN. The network was built with Keras and TensorFlow (Google, Mountain View, CA, USA), which are deep-learning frameworks. We selected a simple structure because we found that deeper structures did not contribute to accuracy; we did not use transfer learning so that the learning process could be visualized.
Each convolution operation was followed by rectified linear unit (ReLU) and max-pooling operations on its output. The ReLU maintained positive input values whereas negative input values were changed to zeros. The max-pooling operation selected the maximum value within each pooling window and passed this value to a smaller feature map. The input data were extracted from the brain perfusion SPECT images. The input image had a matrix of 200 × 200 pixels, i.e., a composite of two lateral and two medial surface images. The input values of the voxels were rescaled within a range of 0 to 255; subsequently, the mean scalar value of each SPECT volume was subtracted. The images were passed through the first convolutional layer, which produced 193 × 193 × 32 output images after the 8 × 8 × 32 convolutional filter. Thereafter, ReLU activation and max-pooling with a 2 × 2 pool proceeded. The second convolutional layer with a 5 × 5 × 32 filter and 92 × 92 × 32 output was followed by the ReLU activation and max-pooling layers. The third convolutional layer with a 3 × 3 × 64 filter and 44 × 44 × 64 output was followed by the ReLU activation and max-pooling layers. The last convolutional layer with a 5 × 5 × 32 filter and 18 × 18 × 32 output was followed by the ReLU activation and max-pooling layers that produced a 9 × 9 × 32 output. Thereafter, a fully connected layer generated the output; subsequently, a softmax function was applied to discriminate the two labels.
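The layer sizes quoted above can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions without padding and non-overlapping 2 × 2 max-pooling; these settings are not stated explicitly in the text but are inferred because they reproduce the reported output sizes. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel):
    """Output width of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width of non-overlapping max-pooling (floor division)."""
    return size // pool


# (kernel size, number of filters) for the four convolutional layers in the text
layers = [(8, 32), (5, 32), (3, 64), (5, 32)]

size = 200  # the input image is 200 x 200 pixels
conv_shapes = []
for kernel, filters in layers:
    size = conv_out(size, kernel)
    conv_shapes.append((size, size, filters))
    size = pool_out(size)

final_shape = (size, size, layers[-1][1])
print(conv_shapes)  # convolution outputs before each pooling step
print(final_shape)  # feature map passed to the fully connected layer
```

Under these assumptions the convolution outputs are 193, 92, 44, and 18 pixels wide and the final pooled map is 9 × 9 × 32, in agreement with the description.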
The softmax produces two numerical values whose sum is 1.0. The output values before softmax for the binary differentiation of DLB-NL, DLB-AD, and AD-NL are expressed as DLB/NL, DLB/AD, and AD/NL scores, respectively. We obtained the scores by applying an inverse sigmoid function to the output value. We employed binary discrimination to determine whether the CNN recognizes the CIS differently in discriminating DLB-AD and DLB-NL. The network was trained to minimize cross entropy losses between the predicted and true diagnoses based on the images. We used the Adam optimizer with the proposed default parameter settings (learning rate = 0.001, β1 = 0.9, β2 = 0.999, decay = 0.0) 32.
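The relation between the softmax output and the score before softmax can be illustrated as follows. For a two-class softmax, the inverse sigmoid (logit) of one class probability equals the difference between the two raw outputs, so it recovers an unbounded score from a probability. This is a minimal sketch, not the study code; the function names and raw output values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw network outputs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_sigmoid(p):
    """Logit function: maps a probability in (0, 1) to an unbounded score."""
    return math.log(p / (1.0 - p))


# Hypothetical raw outputs for the two classes (DLB, AD); not study data
raw = [2.0, -1.0]
probs = softmax(raw)                # two values that sum to 1.0
score = inverse_sigmoid(probs[0])   # equals raw[0] - raw[1] up to rounding
print(probs, score)
```

In this formulation a positive score favors the first class and a negative score favors the second, with larger magnitudes indicating greater separation.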
The CNN was trained for 100 epochs. The validity of the epoch number was verified by plotting the performance versus epochs. Furthermore, we plotted the performance with reduced numbers of samples (0.5 and 0.75 of the original sample number of 320).
To visualize the decision made by the CNN, Grad-CAM was applied to the CNN. Grad-CAM uses the gradients of any target flowing into the final convolutional layer to produce heatmaps that highlight important regions upon which the CNN focuses. A guided Grad-CAM was created by fusing the existing pixel-space gradient visualizations with Grad-CAM to achieve both high resolution and class discrimination. Furthermore, we used Grad-CAM to visualize the learning process of the CNN trained with perfusion images.
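The core Grad-CAM computation can be sketched on toy data. Each feature map of the final convolutional layer is weighted by the spatial average of the class-score gradient with respect to that map, the weighted maps are summed, and a ReLU keeps only the positive evidence. In practice the activations and gradients would be obtained from the trained network; here they are hypothetical 2 × 2 arrays, and the function name `grad_cam` is illustrative.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from K feature maps and the class-score gradients.

    activations, gradients: lists of K maps, each an H x W nested list.
    """
    heat = None
    for fmap, grad in zip(activations, gradients):
        cells = [g for row in grad for g in row]
        weight = sum(cells) / len(cells)  # global-average-pooled gradient
        if heat is None:
            heat = [[0.0] * len(fmap[0]) for _ in fmap]
        for i, row in enumerate(fmap):
            for j, a in enumerate(row):
                heat[i][j] += weight * a
    # ReLU keeps only the regions that raise the class score
    return [[max(0.0, v) for v in row] for row in heat]


# Two hypothetical 2 x 2 feature maps and their gradients (toy values)
acts = [[[1.0, 0.0], [0.0, 2.0]],
        [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]],
         [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
print(heatmap)
```

The resulting coarse map has the spatial size of the final convolutional output and is usually upsampled to the input size before being overlaid on the image.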
Statistics. The diagnostic and predictive accuracy of the CNN was calculated from the independent final testing cohorts. An original image and its right-left flipped copy were always assigned to the same set (training or validation). Binary classification scores were evaluated using the receiver operating characteristic (ROC) curve analysis and area under the curve (AUC). Correlations between CIS ratios and DLB/AD or DLB/NL scores were assessed using Pearson's product moment correlation coefficients. Correlations between clinical scores and CIS ratios, DLB/AD, or DLB/NL scores were assessed using Spearman rank correlation coefficients and the multiple comparison