Introduction

Alzheimer's disease (AD) is one of the most devastating and common forms of dementia. It causes a progressive loss of memory and cognitive function, leading to poor quality of life. It accounts for 60–80% of dementia cases and is ranked globally as the fifth leading cause of death.

Pathologically, the primary characteristic of neuropathological lesions in AD is the extracellular deposition of amyloid plaques. Amyloid plaque aggregates are composed of amyloid-beta (Aβ), a fragment of amyloid precursor protein (APP), which is a single transmembrane protein1. As shown in Fig. 1, APP is processed by two alternative pathways: nonamyloidogenic and amyloidogenic2. In the nonamyloidogenic pathway, APP is cleaved by α-secretase and γ-secretase, generating the extracellular soluble APP-α (sAPP-α), the APP intracellular domain (AICD) fragment, and a short fragment p3 (an N-truncated Aβ fragment)3. In the amyloidogenic pathway, in which Aβ fragments are produced, there are sequential cleavages by β- and γ-secretase. APP is first cleaved by β-secretase, producing soluble APP-β (sAPP-β), and the membrane-retained fragment is then cleaved by γ-secretase. This second cleavage generates another AICD fragment, which is translocated to the nucleus where it affects the transcriptional regulation of several proteins and drives neuroprotective pathways, and Aβ fragments of 40 (Aβ40) or 42 (Aβ42) amino acids. The Aβ fragments interact initially with apolipoprotein E, which results in the aggregation of Aβ oligomers into beta-amyloid plaques. Eventually, Aβ fragments are involved in several downstream pathways related to AD4.

Figure 1

Cleavage of amyloid precursor protein (APP) by nonamyloidogenic and amyloidogenic pathways.

Recently, researchers have introduced novel therapeutic approaches for AD that target the reduction of amyloid oligomer levels, including (1) the use of small-molecule inhibitors to prevent oligomerization, (2) immunotherapy to neutralize oligomeric species, (3) accurate determination of Aβ-degrading enzymes to control Aβ oligomer levels in the brain, (4) stimulation of the immune system to produce Aβ antibodies that attack aggregates, and (5) the use of Aβ blockers to block amyloid channels. All these approaches are currently in preclinical development5. However, biological studies can reveal the initiation of the Aβ pathways before the onset of AD symptoms, which helps to target treatment at early stages and to slow disease progression. Early diagnosis of AD is therefore needed to provide adequate treatment and avoid deterioration5.

Generally, the main challenge is not only to clear Aβ plaques but also to prevent their formation, which requires accurate measures of plaque morphology for understanding disease progression and pathophysiology. There are numerous forms of plaque, but the most prevalent are diffuse plaques, cerebral amyloid angiopathy (CAA), and dense-core plaques (see Fig. 2). Diffuse plaques are loosely organized amorphous clouds. Dense-core plaques are related to synaptic loss; they are surrounded by dystrophic neurites, activated microglial cells, and reactive astrocytes. Dystrophic neurites are used for the pathological diagnosis of AD, as they are associated with the presence of cognitive impairment. In CAA, Aβ deposits in the tunica media of leptomeningeal arteries and in cortical capillaries, small arterioles, and medium-sized arteries, particularly in posterior areas of the brain. Some degree of CAA, usually mild, is present in about 80% of AD patients. When severe, CAA can weaken the vessel wall and cause life-threatening lobar hemorrhages6.

Figure 2

Amyloid-beta (Aβ) plaque morphologies: (a) diffuse, (b) cerebral amyloid angiopathy (CAA), and (c) dense-core.

Related works

The application of fractal geometry has become a new trend in studying biological systems in recent years7,8,9,10,11,12,13,14, including AD15,16,17,18,19,20. A fractal is an irregular geometric object with infinite nesting of self-similar structure at different scales, which provides a general framework for studying irregular sets. The fractal dimension (FD) is a measure of fractal properties that describes the space-filling behavior of networks, including biological systems. FD has been applied in histopathological studies to determine the complexity of certain tissue components21,22,23,24. Biscetti et al.25 measured FD and other parameters of the superficial capillary plexus (SCP), intermediate capillary plexus (ICP), deep capillary plexus (DCP), and choriocapillaris of subjects with mild cognitive impairment (MCI) due to AD and of cognitively healthy controls (CN). They found that FD indicates early vessel recruitment as a compensatory mechanism at disease onset. In26, FD was calculated from optical coherence tomography angiography (OCT-A) scans to show the retinal vascular changes in subjects with AD, MCI, and CN; FD was found to decrease in elderly people and to be lower in males.

The limitation of describing more complex structures such as Aβ plaques by a single exponent FD can be overcome by multifractal analysis. Multifractal analysis generalizes fractal geometry to cases in which a single FD is not sufficient, as it provides a spectrum of fractal dimensions27,28,29. Multifractal measures have been observed in different physical situations, such as neural networks, fluid turbulence, rainfall distribution, mass distribution across the universe, viscous fingering, and many other phenomena.

Machine learning (ML) is a branch of artificial intelligence that extracts information from a dataset (the “training data”) to make accurate predictions or decisions without being explicitly programmed. Many studies in recent years have focused on applying machine learning techniques to diagnose and classify the various stages of AD via different types of physical tests30,31,32,33,34,35,36,37,38,39,40, and more recently using immunohistochemistry (IHC) images41,42. In43, a convolutional neural network (CNN) model was applied to IHC images to classify Aβ morphologies as dense-core plaques, diffuse plaques, and CAA. In44, deep learning (DL) was used to differentiate tauopathies, including AD, progressive supranuclear palsy (PSP), corticobasal degeneration (CBD), and Pick's disease (PID), based on IHC images. Using MRI scans, Majumder et al.45 applied an artificial neural network (ANN) to distinguish between AD and cognitively normal (CN) subjects. The transition from MCI to AD was predicted in46 using an ANN algorithm on MRI images. Additionally, Richhariya et al.47 classified several stage pairs (CN vs. AD, MCI vs. AD, and CN vs. MCI) using recursive feature elimination and a support vector machine (SVM).

Therefore, the principal objective of the current research is to study the morphologies of amyloid plaques in AD using multifractal analysis. Such analysis may offer a useful route for studying the growing number of neurodegenerative diseases, including Alzheimer's, and for structure-based drug discovery, and may thereby contribute to novel treatment strategies for various degenerative diseases. This research examines the variety of tissue structures in whole-slide images (WSIs) of the temporal gyri of AD patients' brains. To automate the classification process, a Naive Bayes classifier is used.

The research contributions

The contributions of the current study can be summarized as follows:

1. A new strategy for assessing amyloid plaque morphologies using multifractal geometry.

2. An accurate measure of plaque morphologies for understanding disease pathology.

3. The use of a Naive Bayes classifier, which saves time and effort compared with algorithms that require more extensive training procedures.

4. High performance measures compared with other recent classification techniques.

Materials

Data used by Tang et al. are available at48. The sample comprises 63 subjects, each with a single whole-slide image (WSI) of the temporal gyri. The subjects were chosen to represent a broad spectrum of pathological burden for each of the three AD pathologies of interest: cerebral amyloid angiopathy (CAA), dense-core plaques, and diffuse plaques. All WSIs were taken from glass slides with 5 μm formalin-fixed, paraffin-embedded sections of the superior and middle temporal gyrus. Immunohistochemistry staining was performed on the tissue with an amyloid-beta (Aβ) antibody. Every slide was digitized on an Aperio AT2 at up to 40× magnification. The open-source library PyVips was used to apply color normalization and subsequently tile each WSI into small images in a structured format (256 × 256 pixels). The dataset used here contains 1200 images: 400 diffuse, 400 CAA, and 400 dense-core. The analysis was performed with a custom program written in MATLAB v.9.4 (R2018a; Mathworks, MA, USA) on a system with a Core i7 CPU, 8 GB of RAM, and a 1 TB hard disk.

In this study, the first step of the proposed classification system is image processing. First, the images are processed to enhance contrast and resolution. Second, the images pass through two stages: the first converts each RGB image to grayscale, and the second converts the grayscale image to binary form, as illustrated in Figure 3. Binarization maps each pixel to one of two values, 1 or 0, so the resulting image has only two colors (black and white). The conversion takes two steps. The first step obtains the image histogram, which describes the distribution of gray levels over the pixels of the image. The second step computes the threshold value according to the chosen thresholding technique. In this study, Otsu’s method49 is used. It selects the threshold that maximizes the inter-cluster variance, which is equivalent to minimizing the intra-cluster variance; hence, it divides the pixels into two clusters (foreground and background) based on their grayscale intensity values.
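The two processing stages can be sketched in plain Python as follows. This is a minimal illustration, not the authors' MATLAB program: the function names are hypothetical, the grayscale weights are the common ITU-R BT.601 luminance coefficients (the source does not state which conversion was used), and treating pixels above the threshold as 1 is an assumption about polarity.

```python
def rgb_to_gray(pixel):
    """Luminance of an (R, G, B) pixel using BT.601 weights (an assumption)."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def otsu_threshold(gray_values, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0.0
    for t in range(levels):
        w_bg += hist[t]            # number of pixels with value <= t
        if w_bg == 0:
            continue
        w_fg = total - w_bg        # number of pixels with value > t
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance up to a constant factor 1 / total**2.
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray_image, threshold):
    """Map each pixel to 1 if it is brighter than the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in gray_image]


gray = [[10, 11, 12], [198, 200, 205]]
t = otsu_threshold([v for row in gray for v in row])
binary = binarize(gray, t)
```

For stained tissue in which the plaques are darker than the background, the comparison in `binarize` would be inverted so that the plaque pixels carry the value 1.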

Figure 3

Sample of the image-processing steps: (a) the raw image, (b) the grayscale image, and (c) the binary image.

Methods

Multifractal analysis

In recent decades, a broad range of complex structures of interest to scientists, engineers, and physicians have been quantitatively characterized using the idea of a fractal dimension: a dimension that corresponds to the geometrical shape under study and is often not an integer50. The key to this trend is the recognition that many random structures obey a symmetry as remarkable as that obeyed by regular structures. This "scale symmetry" implies that objects appear the same at many different scales of observation. To describe a fractal set, suppose that S is a subset of a d-dimensional space covered with boxes of length L; the local density Pi(L) of the object is then the mass function of the i-th counting box,

$$ P_{i} \left( L \right) = \frac{{M_{i} \left( L \right)}}{{M_{T} }} $$
(1)

where MT denotes the object's total mass and Mi(L) is the number of pixels that make up the mass in the i-th box. In heterogeneous objects, Pi(L) can vary as:

$$ P_{i} \left( L \right) \sim L^{{\alpha_{i} }} $$
(2)

where αi is the Hölder exponent that characterizes the scaling in the i-th region or spatial location; αi thus describes the local behavior of Pi(L) around the center of a counting box of length L. The number of boxes N(α) in which the mass function has exponents between α and α + dα scales as:

$$ N\left( \alpha \right) \sim L^{ - f \left( \alpha \right)} $$
(3)

where f(α) is the fractal dimension of the set of boxes that share the exponent α. Scaling of the q-th moments of the density function Pi(L) yields the multifractal measures as

$$ \sum\nolimits_{i = 1}^{M(L)} {P_{i}^{q} (L) = L^{{(q - 1)D_{q} }} } $$
(4)

Hence, the exponent in Eq. (4) is called the mass exponent τ(q) of the q-th order moment, which satisfies the following equation:

$$ \tau \left( q \right) = \left( {q - 1} \right)D_{q} $$
(5)

It is well known that the Hölder exponent follows from the mass exponent as:

$$ \alpha \left( q \right) = \frac{d\tau \left( q \right)}{{dq}} $$
(6)

where Dq denotes the generalized dimensions defined as:

$$ D_{q} = \frac{1}{q - 1}\mathop {\lim }\limits_{L \to 0} \frac{{\ln \;\sum\nolimits_{i = 1}^{M(L)} {P_{i} (L)^{q} } }}{\ln (L)} $$
(7)

The multifractal spectrum f(α) illustrated in Fig. 4 is a single-humped (concave) function whose maximum, D0, is attained at q = 0 and is known as the box-counting dimension51. For q = 1, f(α) = α = D1 is the information dimension. D1 represents the scaling of information generation; it describes the rate of information gain by successive measurements or the rate of information loss over time52.

Figure 4

The singularity spectrum.

In practice, the multifractal spectrum is estimated only over a finite set of local scales L, because the limit L → 0 cannot be evaluated on a digital image. This also limits the range of moments q that can be applied53. Therefore, the multifractal spectrum can be computed from the normalized measure:

$$ \mu_{i} \left( {q, \;L} \right) = \frac{{ P_{i}^{q} \left( L \right)}}{{\mathop \sum \nolimits_{i = 1}^{M\left( L \right)} P_{i}^{q} \left( L \right)}} $$
(8)

Thus, the computation of f (q) and α(q) goes as follows:

$$ f\left( q \right) = \mathop {\lim }\limits_{L \to 0} \frac{H\left( L \right)}{{\log L}} = \mathop {\lim }\limits_{L \to 0} \frac{{\mathop \sum \nolimits_{i = 1}^{M\left( L \right)} \mu_{i} \left( {q,\;L} \right) \log \mu_{i} \left( {q,\;L} \right)}}{\log L} $$
(9)

And

$$ \alpha \left( q \right) = \mathop {\lim }\limits_{L \to 0} \frac{W\left( L \right)}{{\log L}} = \mathop {\lim }\limits_{L \to 0} \frac{{\mathop \sum \nolimits_{i = 1}^{M\left( L \right)} \mu_{i} \left( {q,\;L} \right) \log P_{i} \left( L \right)}}{\log L} $$
(10)
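A minimal sketch of how Eqs. (8)–(10) might be evaluated on a binary image is given below. Because the limit L → 0 is not reachable on a pixel grid, the sketch replaces it with a least-squares slope over a few box sizes, which is an assumption about the estimation procedure. The function names are hypothetical, the image is assumed to be square, and the box sizes are assumed to divide its side length.

```python
import math


def box_masses(image, size):
    """Foreground pixel count in each non-overlapping size x size box."""
    n_rows, n_cols = len(image), len(image[0])
    return [
        sum(image[i][j] for i in range(r, r + size) for j in range(c, c + size))
        for r in range(0, n_rows - size + 1, size)
        for c in range(0, n_cols - size + 1, size)
    ]


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def singularity_point(image, q, sizes):
    """Estimate (alpha(q), f(q)) for one moment order q."""
    total = sum(map(sum, image))
    log_l, num_f, num_alpha = [], [], []
    for size in sizes:
        # P_i(L) of Eq. (1); empty boxes carry no measure and are skipped.
        p = [m / total for m in box_masses(image, size) if m > 0]
        z = sum(pi ** q for pi in p)
        mu = [pi ** q / z for pi in p]                    # Eq. (8)
        log_l.append(math.log(size / len(image)))         # L relative to the image side
        num_f.append(sum(m * math.log(m) for m in mu))    # numerator of Eq. (9)
        num_alpha.append(sum(m * math.log(pi) for m, pi in zip(mu, p)))  # Eq. (10)
    return slope(log_l, num_alpha), slope(log_l, num_f)


filled = [[1] * 8 for _ in range(8)]
alpha, f = singularity_point(filled, q=2, sizes=[1, 2, 4])
```

Sweeping q over a symmetric range (for example from −10 to 10) and plotting f(q) against α(q) traces out the singularity spectrum. For a uniformly filled image every q collapses to the single point α = f = 2, which is a convenient sanity check. For very large |q| the normalized measure can underflow, so the usable range of q is limited in practice.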

The second commonly used graph discussed here is the generalized dimension curve (Dq vs. q), which is analogous to applying warping filters to an image to exaggerate parameters that might otherwise be unnoticeable. The term "warp filters" refers to a group of arbitrary exponents represented by the symbol "q". Hence, we can construct a generalized dimension Dq for each q as shown in Fig. 5.

Figure 5

The multifractal generalized dimension.

The generalized dimension Dq can be defined as:

$$ D_{q} = \frac{1}{1 - q}\mathop {\lim }\limits_{L \to 0} \frac{{\ln I\left( {q,\;L} \right)}}{{{\text{ln}}\left( {1/L} \right)}} $$
(11)

where I(q, L) is the partition function given by:

$$ I\left( {q,\;L} \right) = \mathop \sum \limits_{i = 1}^{N\left( L \right)} P_{i} \left( L \right)^{q} $$
(12)

Equation (11) becomes:

$$ D_{q} = \frac{1}{1 - q}\mathop {\lim }\limits_{L \to 0} \frac{{\ln \mathop \sum \nolimits_{i = 1}^{N\left( L \right)} P_{i} \left( L \right)^{q} }}{{\ln \left( {1/L} \right)}} $$
(13)

where L denotes the scale of measurement, q is the order of the moment, N(L) is the number of fractal copies at scale L, and Pi(L) is the growth probability function of the i-th fractal unit. From the definition of the generalized dimension, at q = 0, D0 describes the box-counting dimension (DB), also known as the capacity dimension. Setting q = 0 in Eq. (13), when a grid of boxes is used to cover a given space, the box-counting dimension D0 is given by:

$$ D_{0} = \mathop {\lim }\limits_{L \to 0} \frac{\ln N\left( L \right)}{{\ln \left( {1/L} \right)}} $$
(14)

where N(L) is the number of nonempty boxes of length L that cover the space and include at least some part of the attractor (not necessarily the total number of points). At q = 1, D1 is the information dimension (DI), which characterizes the rate of information loss over time or the rate of information gain by sequential measurements. DI is related to a quantity known as the Shannon entropy, which is given by:

$$ H\left( L \right) = - \mathop \sum \limits_{i = 1}^{N\left( L \right)} P_{i } \left( {\text{L}} \right){\text{ ln}}P_{i} \left( L \right) $$
(15)

Applying a first-order Taylor expansion about q = 1 to the partition sum \(\sum_{i} P_{i}(L)^{q}\), and using the normalization \(\sum_{i} P_{i}(L) = 1\), we have:

$$ \ln \mathop \sum \limits_{i = 1}^{N\left( L \right)} P_{i} \left( L \right)^{q} \approx \left( {q - 1} \right)\mathop \sum \limits_{i = 1}^{N\left( L \right)} P_{i} \left( L \right)\ln P_{i} \left( L \right) $$
(16)

So, in the limit q → 1, Eq. (13) becomes:

$$ D_{I} = \mathop {\lim }\limits_{L \to 0} \frac{{ - \mathop \sum \nolimits_{i = 1}^{N\left( L \right)} P_{i} \left( L \right)\ln P_{i} \left( L \right)}}{{\ln \left( {1/L} \right)}} $$
(17)

At q = 2, D2 is the correlation dimension54, which characterizes the correlation between pairs of points on a reconstructed attractor. From Eq. (13), the correlation dimension (DC) is given by:

$$ D_{C} = \mathop {\lim }\limits_{L \to 0} \frac{{\ln \mathop \sum \nolimits_{i = 1}^{N\left( L \right)} P_{i} \left( L \right)^{2} }}{\ln L} $$
(18)

If D0 = D1 = D2, the structure is termed monofractal or unifractal. If D0 > D1 > D2, the structure is termed multifractal.
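As a rough illustration of how D0, D1, and D2 might be estimated from a binary image, the sketch below fits a straight line to the relevant quantity against ln L over a few box sizes instead of taking the limit L → 0. The fitting approach, the function names, and the restriction to square images whose side is divisible by every box size are all assumptions of this sketch. For q = 1 it uses the slope of Σ Pi ln Pi against ln L, which is the q → 1 limit of the generalized dimension.

```python
import math


def box_probabilities(image, size):
    """P_i(L) for the nonempty boxes of a square binary image."""
    n = len(image)
    total = sum(map(sum, image))
    probs = []
    for r in range(0, n - size + 1, size):
        for c in range(0, n - size + 1, size):
            mass = sum(image[i][j]
                       for i in range(r, r + size)
                       for j in range(c, c + size))
            if mass > 0:
                probs.append(mass / total)
    return probs


def generalized_dimension(image, q, sizes):
    """Estimate D_q as a least-squares slope over the given box sizes."""
    n = len(image)
    xs, ys = [], []
    for size in sizes:
        p = box_probabilities(image, size)
        xs.append(math.log(size / n))                 # ln L
        if q == 1:
            ys.append(sum(pi * math.log(pi) for pi in p))
        else:
            ys.append(math.log(sum(pi ** q for pi in p)) / (q - 1))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


# A diagonal line should behave as a one-dimensional monofractal.
line = [[1 if i == j else 0 for j in range(8)] for i in range(8)]
d0, d1, d2 = (generalized_dimension(line, q, [1, 2, 4]) for q in (0, 1, 2))
```

On the diagonal line all three estimates coincide at 1, and on a fully filled square they coincide at 2, matching the monofractal case D0 = D1 = D2. A heterogeneous plaque image would instead be expected to give distinct values.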

Lacunarity measurement

Lacunarity is a measure of how gaps are distributed throughout an image55. It gives an assessment of structural heterogeneity. The higher the lacunarity value, the less heterogeneous the fractal geometry. The mean lacunarity Λ can be written as:

$$ {\Lambda } = \left( {\sigma /\mu } \right)^{2} $$
(19)

where µ is the mean number of pixels per box and σ is the corresponding standard deviation.
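Equation (19) can be sketched for a binary image as follows. The sketch assumes non-overlapping boxes and the population standard deviation; the source does not say whether a fixed grid or a gliding box was used, nor which box sizes were averaged, so those choices and the function names are illustrative.

```python
import statistics


def box_counts(image, size):
    """Foreground pixels in each non-overlapping size x size box."""
    n_rows, n_cols = len(image), len(image[0])
    return [
        sum(image[i][j] for i in range(r, r + size) for j in range(c, c + size))
        for r in range(0, n_rows - size + 1, size)
        for c in range(0, n_cols - size + 1, size)
    ]


def lacunarity(image, size):
    """(sigma / mu) ** 2 of the box counts at a single box size."""
    counts = box_counts(image, size)
    mu = statistics.fmean(counts)
    sigma = statistics.pstdev(counts)
    return (sigma / mu) ** 2


def mean_lacunarity(image, sizes):
    """Average of the lacunarity over several box sizes."""
    return statistics.fmean(lacunarity(image, s) for s in sizes)


# All foreground concentrated in one corner of a 4 x 4 image.
corner = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
lam_corner = lacunarity(corner, 2)
lam_full = lacunarity([[1] * 4 for _ in range(4)], 2)
```

An image with no foreground pixels gives µ = 0 and would raise a division error here, so empty tiles need to be filtered out beforehand.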

Naïve Bayes

Naïve Bayes is a supervised learning algorithm based on Bayes’ theorem. It is a probabilistic classifier that assumes independence among predictors. It has several advantages: (1) it is fast, easy, and simple to implement; (2) it does not need large training datasets; and (3) it can be used for both discrete and continuous data. The main idea in the Naïve Bayes classifier is that the presence of a particular feature is unrelated to the presence of any other feature. Consequently, it cannot learn relationships between the features56,57,58.

Bayes' theorem is used to determine the probability of a hypothesis with the prior knowledge of a class. It can be described by:

$$ P\left( {C{|}x} \right) = \frac{{P\left( {x{|}C} \right)P\left( C \right)}}{P\left( x \right)} $$
(20)

where \(P\left(C|x\right)\), the posterior probability, is the probability of the hypothesis/class C given the observed event/features x; \(P(C)\), the prior probability, is the probability of the hypothesis before observing the evidence; \(P\left(x|C\right)\), the likelihood, is the probability of the evidence given that the hypothesis is true; and \(P(x)\), the marginal probability, is the probability of the evidence, or the prior probability of the predictor.

Let X represent the extracted features, written as:

$$ X = \left( {x_{1} ,\;x_{2} ,\;x_{3} ,\; \ldots .,\;x_{n} } \right) $$
(21)

Therefore, the probability of a hypothesis/class C given the n selected features X can be written as:

$$ P\left( {C{|}x_{1} ,\;x_{2} ,\;x_{3} ,\; \ldots ,\;x_{n} } \right) = \frac{{P\left( {x_{1} {|}C} \right)P\left( {x_{2} {|}C} \right)P\left( {x_{3} {|}C} \right) \ldots P\left( {x_{n} {|}C} \right)P\left( C \right)}}{{P\left( {x_{1} } \right)P\left( {x_{2} } \right)P\left( {x_{3} } \right) \ldots P\left( {x_{n} } \right)}} $$
(22)

Equation (22) can be written in compact form as:

$$ P\left( {C{|}x_{1} ,\;x_{2} ,\;x_{3} ,\; \ldots ,\;x_{n} } \right) = \frac{{P\left( C \right)\mathop \prod \nolimits_{i = 1}^{n} P\left( {x_{i} {|}C} \right)}}{{\mathop \prod \nolimits_{i = 1}^{n} P\left( {x_{i} } \right)}} $$
(23)

According to the used datasets, the classifier system may have m classes:

$$ C = \left( {c_{1} ,\;c_{2} ,\;c_{3} ,\; \ldots ,\;c_{m} } \right) $$
(24)

Then the classifier selects the class with the highest posterior probability:

$$ C = \mathop {\arg \max }\limits_{{c_{j} ,\;j = 1, \ldots ,m}} \frac{{P\left( {c_{j} } \right)\mathop \prod \nolimits_{i = 1}^{n} P\left( {x_{i} {|}c_{j} } \right)}}{{\mathop \prod \nolimits_{i = 1}^{n} P\left( {x_{i} } \right)}} $$
(25)
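The decision rule of Eq. (25) can be sketched as below. The source does not state which likelihood model was used for the continuous features, so a Gaussian per-feature likelihood is assumed here purely for illustration. The function names and the toy feature values are hypothetical. Because the denominator of Eq. (25) is the same for every class, it is dropped, and log-probabilities are summed to avoid numerical underflow.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate the prior, per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, c in zip(samples, labels):
        by_class[c].append(x)
    model = {}
    for c, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[c] = (len(rows) / len(samples), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(c):
        prior, means, variances = model[c]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


# Toy two-feature data for three hypothetical classes (not values from the paper).
X = [[1.0, 0.10], [1.2, 0.20], [0.9, 0.15],
     [2.0, 0.50], [2.1, 0.60], [1.9, 0.55],
     [3.0, 1.00], [3.2, 1.10], [2.9, 0.90]]
y = ["diffuse"] * 3 + ["CAA"] * 3 + ["dense-core"] * 3
model = fit_gaussian_nb(X, y)
label = predict(model, [1.1, 0.12])
```

Training amounts to computing a handful of means and variances per class, which is why the fitting cost is small compared with iterative learners.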

In this study, there are three classes (m = 3) of Aβ plaques \(\left({c}_{1},{c}_{2},{c}_{3}\right)\): diffuse, CAA, and dense-core. The optimized RF hyperparameters59 are listed in Table 1.

Table 1 The RF optimized hyperparameters.

The methodology is based on extracting the features that change most with AD. The system has 11 extracted features (n = 11), which are illustrated in Fig. 6 and listed as follows:

1. The lacunarity (λ),

2. The maximum value of α (αmax) in the singularity spectrum,

3. The singularity spectrum at αmax (f(αmax)),

4. The minimum value of α (αmin) in the singularity spectrum,

5. The singularity spectrum at αmin (f(αmin)),

6. The α value at the maximum of the singularity spectrum curve (α0),

7. The width of the singularity spectrum curve (width),

8. The symmetrical shift of the singularity spectrum curve,

9. The box-counting dimension (D0),

10. The information dimension (D1),

11. The correlation dimension (D2).
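The spectrum-based features in this list could be read off a sampled singularity spectrum roughly as sketched below. The function name and dictionary keys are hypothetical. The source does not give a formula for the symmetrical shift, so the definition used here (right half-width minus left half-width, negative when the curve leans left) is only an assumed stand-in.

```python
def spectrum_features(alphas, fs):
    """Extract shape features from sampled (alpha, f(alpha)) pairs."""
    idx = range(len(alphas))
    i_max = max(idx, key=alphas.__getitem__)   # position of alpha_max
    i_min = min(idx, key=alphas.__getitem__)   # position of alpha_min
    i_top = max(idx, key=fs.__getitem__)       # position of the spectrum peak
    alpha_max, alpha_min, alpha_0 = alphas[i_max], alphas[i_min], alphas[i_top]
    return {
        "alpha_max": alpha_max,
        "f_alpha_max": fs[i_max],
        "alpha_min": alpha_min,
        "f_alpha_min": fs[i_min],
        "alpha_0": alpha_0,
        "width": alpha_max - alpha_min,
        # Assumed definition: right half-width minus left half-width.
        "shift": (alpha_max - alpha_0) - (alpha_0 - alpha_min),
    }


# Hypothetical sampled spectrum, not data from the paper.
features = spectrum_features([1.6, 1.8, 2.0, 2.3], [1.2, 1.8, 2.0, 1.5])
```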

Figure 6

The extracted features.

These features can be illustrated in Fig. 6

In most cases, reducing the number of input variables or extracted features can enhance the efficiency of the model and lower the computing cost of modelling. Therefore, when creating a predictive model, it is desirable to perform feature selection to reduce the number of extracted features. This can be done with a feature selection algorithm such as the Random Forest (RF) algorithm60,61.

Random forest algorithm

Random forest is a supervised machine learning algorithm. It is an extension of decision trees that is usually trained using the “bagging” method: it combines multiple decision trees to improve the overall result. Before the RF algorithm is trained as a classifier, three parameters have to be set: (1) the number of trees, (2) the number of nodes, and (3) the number of features sampled, as shown in Fig. 7.

Figure 7

The random forest algorithm.

The RF algorithm offers several advantages: (1) it reduces the risk of overfitting, (2) it performs both classification and regression tasks, (3) it gives interpretable results, (4) it makes the important features easy to determine, and (5) it handles large datasets easily. However, RF also has disadvantages: (1) it is time-consuming, (2) it needs more computational resources, and (3) its predictions are more complex than those of a single decision tree.

In many classification systems, hundreds or thousands of features are used to obtain accurate results. However, not all the extracted features are important or have a strong influence on the classification process. Therefore, it is necessary to create a classification model that includes only the most important features, a step called "feature selection". This makes the model simpler, reduces the computational time, and reduces the model variance.

Feature selection can be performed using a recursive feature elimination procedure62,63. In this study, after the classification model is created, the features are ranked by the model performance measures and the least important feature is eliminated in each loop. The procedure is repeated until only the highest-ranked features remain.
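The elimination loop can be sketched as follows. To keep the example self-contained, a simple Fisher score (between-class over within-class variance) stands in for the random-forest importance used in the study; that substitution, the function names, and the toy data are assumptions of this sketch. Any importance function with the same signature could be passed in instead.

```python
def fisher_scores(X, y, features):
    """Stand-in importance: between-class / within-class variance per feature."""
    classes = sorted(set(y))
    scores = {}
    for f in features:
        column = [row[f] for row in X]
        overall = sum(column) / len(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            mean_c = sum(values) / len(values)
            between += len(values) * (mean_c - overall) ** 2
            within += sum((v - mean_c) ** 2 for v in values)
        scores[f] = between / (within + 1e-12)
    return scores


def recursive_feature_elimination(X, y, n_keep, importance_fn=fisher_scores):
    """Drop the least important feature per loop until n_keep remain."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        scores = importance_fn(X, y, remaining)   # re-rank the surviving features
        worst = min(remaining, key=scores.get)
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated


# Toy data: feature 0 separates the classes well, feature 1 not at all.
X = [[0.0, 5.0, 1.0], [0.1, 1.0, 1.4], [1.0, 5.1, 2.0], [1.1, 0.9, 1.6]]
y = [0, 0, 1, 1]
kept, dropped = recursive_feature_elimination(X, y, n_keep=1)
```

With a model-based importance such as that of a random forest, the ranking is recomputed on the reduced feature set in every loop, which is what distinguishes recursive elimination from a single ranking pass.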

The workflow of the proposed methodology can be summarized as shown in Fig. 8.

Figure 8

The workflow of the proposed methodology.

Results and discussions

The dataset demographic characteristics

The proposed methodology is based on archived images from the Alzheimer’s Disease Center Brain Bank64 at the University of California, Davis. These samples had the following features:

  1) The samples were pretreated with formic acid in order to get rid of endogenous protein.

  2) An amyloid-β antibody was used to stain the tissue.

  3) The samples were 5 μm formalin-fixed, paraffin-embedded sections.

  4) The sections were taken from the superior and middle temporal gyrus of the human brain.

  5) Aperio Digital Pathology Slide Scanners were used to digitize the slides at a magnification of up to 40×.

The demographic characteristics of the dataset are summarized in Table 2.

Table 2 The demographic characteristics.

The image singularity spectrums

The multifractal image analyses are shown in Figs. 9, 10, and 11, which show the singularity spectra for the diffuse, CAA, and dense-core cases, respectively. As the amyloid plaques increase, the heterogeneity of the brain tissue increases; the spectrum therefore becomes wider, with different asymmetrical shapes, as shown in Fig. 12. The curves also move to the right as the image heterogeneity grows, with differing singularity spectrum start and end values, αmin and αmax, respectively. Table 3 summarizes 15 sample AD images with their extracted feature values.

Figure 9

The Diffuse images singularity spectrum.

Figure 10

The CAA images singularity spectrum.

Figure 11

The Dense-core images singularity spectrum.

Figure 12

The singularity spectra for the AD stages.

Table 3 Sample of the extracted features data.
Figure 13

The feature importance using the RF algorithm: (1) the lacunarity, (2) αmax, (3) f(αmax), (4) α0, (5) αmin, (6) f(αmin), (7) the width, (8) symmetrical shift, (9) D0, (10) D1, and (11) D2.

Figure 14

Ranking of the feature importance provided by RF.

Figure 15

The statistical representation of the most important features of the AD stages.

According to the proposed methodology, eleven features were extracted; they describe the changes in the brain tissue related to AD. To reduce the number of features, the RF algorithm is employed to remove the less relevant ones, as shown in Fig. 13. According to Fig. 13, the important features are lacunarity, αmax, αmin, symmetrical shift, and D0, each of which has an importance weight of not less than 0.5.

Figure 14 shows the ranking of the feature importance provided by RF59 for the diffuse, CAA, and dense-core cases at different thresholds. The blue bars (features) are discarded as being under the threshold value. After evaluating the model at multiple thresholds, the optimum threshold value was chosen as 0.5, owing to the low importance of the discarded features: f(αmax), α0, f(αmin), the width, D1, and D2.

To explain the importance of the selected features, Figure 15 illustrates the statistical representation of the most important features.

As shown in Figure 15a,b, the diffuse stage has the lowest values of αmax and αmin, while the dense-core stage has the largest values owing to the increased accumulation of amyloid plaques. In Figure 15c,d, the diffuse stage has the highest D0 and lacunarity owing to fewer amyloid-beta plaques, reflecting more homogeneity in the diffuse dataset images than in the other stages. As illustrated in Figures 13 and 15e, the diffuse stage is shifted to the left of the symmetry axis of the singularity spectrum, whereas the CAA and dense-core stages are shifted to the right.

Performance measures

To assess the effectiveness of the proposed NB algorithm using the most important features, a K-Nearest Neighbor (KNN) classifier was used as a benchmark. Several performance measures were calculated, as shown in Tables 4 and 5 and Fig. 16.

Table 4 The classification data.
Table 5 Performance measure parameters for the classifiers.
Figure 16

The confusion matrices analyses for (a) Naïve Bayes (b) K-Nearest Neighbor.

The statistical characteristics obtained from the shown tables demonstrate that the proposed Naïve Bayes classifier has achieved the best performance. It has an accuracy of 99%. The classification method achieves a sensitivity of 100%, specificity of 98.5%, precision of 97.1%, and F-score of 98.5%.
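For reference, the reported measures follow from confusion-matrix counts in the standard way. The sketch below computes them one class against the rest; the 3 × 3 matrix is a made-up example and does not reproduce Table 4 or Fig. 16, and the function names are hypothetical.

```python
def one_vs_rest_counts(confusion, k):
    """TP, FP, TN, FN for class k; rows are true classes, columns predictions."""
    total = sum(map(sum, confusion))
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    tn = total - tp - fn - fp
    return tp, fp, tn, fn


def performance_measures(tp, fp, tn, fn):
    """Standard definitions of the five measures used in the text."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


# Hypothetical confusion matrix for three classes (not data from the paper).
confusion = [[50, 0, 0],
             [0, 48, 2],
             [1, 0, 49]]
measures = performance_measures(*one_vs_rest_counts(confusion, 0))
```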

A Comparative analysis

A comparison of the suggested classification system with other classification approaches is included in Table 6 to confirm its efficacy. Only one published study43 used the same dataset; comparison with studies that used other datasets may not be fair for all algorithms. The comparative results are as follows:

Table 6 Comparative evaluation.

As shown in Table 6, the proposed methodology achieves high accuracy with fewer dataset images.

Conclusion

Alzheimer's disease (AD) is one of the most devastating and common forms of dementia. It causes a progressive loss of memory and cognitive function, leading to poor quality of life. The deposition of amyloid plaques is the primary pathological characteristic of AD. Amyloid plaque aggregates are composed of amyloid-beta (Aβ), which is involved in the progression of AD. The current study proposed the assessment of amyloid-beta using multifractal geometry. To automate the classification of AD stages, a Naïve Bayes classifier was used, with Random Forest for feature selection. The proposed methodology achieved an accuracy of 99% and a sensitivity of 100%. The quality of the dataset images is the main limitation of the proposed methodology: it should be not less than 35% to obtain good extracted features.

Future work

  • Design a new graphical user interface (GUI) application that extracts the most important features related to amyloid plaque morphologies, as a diagnostic aid.

  • Use multifractal geometry as an analysis tool for detecting or classifying brain tumors.