Introduction

Feature selection is one of the major steps in pattern recognition and classification, since it aims to eliminate the redundant and irrelevant features within a dataset. It can be challenging to decide which features are useful without prior knowledge. As a result, numerous feature selection techniques are used to select the best features, i.e., those that give superior performance1. In many applications, each dataset contains a significantly large number of features. The key objective of feature selection is to gain a greater understanding of the process that produced the data in order to identify a subset of pertinent features from the vast pool of available features2.

There are two main types of feature selection techniques. Filter techniques do not rely on learning algorithms but rather on specific data attributes. In contrast, wrapper approaches evaluate the chosen subset of features using learning algorithms. Although wrapper methods are computationally expensive, they are more accurate than filter approaches3. In general, feature selection is a multi-objective optimization problem. Its two main goals are to reduce the feature space and to give high performance. When these two objectives conflict, as they frequently do, the best trade-off must be found4.

Recently, meta-heuristic optimization algorithms have frequently been used for finding the most discriminative features. The most studied methods are Particle Swarm Optimization (PSO)5, Ant Colony Optimization (ACO)6, Genetic Algorithm (GA)7, Genetic Programming (GP)8, Simulated Annealing (SA)9, Differential Evolution (DE)10, Cuckoo Search (CS)11, Artificial Immune Systems Algorithm (AIS)12, Tabu Search (TS)13, and the Whale Optimization Algorithm (WOA)14. In addition, multi-objective and hybrid versions of these classical meta-heuristic algorithms have been published. The No-Free-Lunch (NFL) theorem explains this multiplicity of studies: since no algorithm can give the best solution for all problems, there is always a chance of finding a better solution with a new meta-heuristic algorithm, which is why there are hundreds of studies in this field15.

Xue et al.16 provided the first multi-objective method for feature selection using the PSO algorithm; experiments on 12 benchmark datasets showed better results for their method compared to the traditional one. Emary et al.17 used Ant Lion Optimization (ALO) in two approaches and compared the results with other common algorithms such as GA and the Big Bang algorithm (BBA), which proved the capability of their proposed method to find optimal features on 20 UCI datasets. They also employed a Lévy flight random walk with ALO, and the results showed its improvement over the native ALO on 21 benchmark datasets3. Genetic algorithms were among the earliest methods used in feature selection: Aalaei et al.18 developed a feature selection method based on the genetic algorithm (GA) to diagnose breast cancer using the Wisconsin breast cancer dataset, and their experiments improved accuracy, specificity, and sensitivity. Ferriyan et al.19 used a GA on the NSL-KDD Cup 99 datasets; by using one-point crossover instead of two, they obtained better results on these datasets compared to the original method.

The artificial bee colony (ABC)20 algorithm is a simple, flexible, and efficient meta-heuristic optimization algorithm. However, it can suffer from slow convergence due to its lack of a powerful local search capability. Etminaniesfahani et al.21 overcame this weakness by hybridizing the ABC algorithm with the Fibonacci indicator algorithm (FIA)22, naming the new algorithm ABFIA21. Their hybrid algorithm combines the strengths of both methods by coupling the global exploration of the FIA with the local exploitation of the ABC. They demonstrated that the hybrid algorithm outperforms the ABC and FIA algorithms and produces superior results on a variety of optimization functions commonly used in the literature, including 20 scalable basic functions and 10 complex CEC2019 test functions. Akinola et al.23 combined the binary dwarf mongoose optimization (BDMO) algorithm with the simulated annealing (SA) algorithm and compared it with 10 other algorithms. The results showed that their proposed method (BDMSAO) is better than the other algorithms.

Eluri et al.24 introduced a novel wrapper-based method called BGEO-TVFL for addressing feature selection challenges. Their method employs a Binary Golden Eagle Optimizer with Time-Varying Flight Length (TVFL) to enhance feature selection, adapting the Golden Eagle Optimizer (GEO), a swarm-based meta-heuristic algorithm, for discrete feature selection. Their work explores various transfer functions and incorporates TVFL for a balanced exploration–exploitation trade-off in GEO. Performance was evaluated on UC Irvine datasets and compared with standard feature selection approaches, namely BAT, ACO, PSO, GWO, GA, CS, IG, CFS, and GR. The obtained results reveal the superiority of BGEO-TVFL. Their method was also tested on CEC benchmark functions, demonstrating its effectiveness in addressing dimensionality reduction issues compared to existing methods.

A Chaotic Binary Pelican Optimization Algorithm was proposed by Eluri and Devarakonda25. Their algorithm leverages the principles of chaos theory in a binary context to enhance the efficiency of the Pelican Optimization Algorithm for this purpose. In this binary variant, they introduce chaos to improve exploration and exploitation capabilities. The algorithm aims to address the challenges of feature selection, particularly in handling large datasets and optimizing performance, and is presented as a promising solution for improving feature selection outcomes in data analysis tasks.

Feature selection with a hybrid Binary Flamingo Search Algorithm and Genetic Algorithm (HBFS-GA) is discussed by Eluri and Devarakonda26. They evaluate the performance of HBFS-GA on 18 different UCI datasets using various metrics. The results demonstrate that HBFS-GA outperforms existing wrapper-based and filter-based FS methods.

In the proposed feature selection technique, the DMO algorithm is combined with chaotic maps to select the most prominent features. The DMO is used to explore and find the minimal possible feature set in the datasets, and the K-Nearest Neighbor (KNN) classifier is used to evaluate the performance of the selected features. The results obtained by the proposed method prove its efficiency and show better performance than other related state-of-the-art methods. The main contributions of this paper can be summarized as follows:

  • Propose a new hybrid feature selection method called CDMO based on improving the performance of DMO using chaotic maps.

  • Evaluate the proposed CDMO method using ten UCI datasets employing the K-nearest Neighbors (KNN) as a classifier to prove its effectiveness.

  • The results obtained by the proposed CDMO show superior performance compared to the original DMO algorithm and other well-known meta-heuristic-based feature selection methods.

  • On the CEC’22 test suite, the effectiveness and solution quality of our proposed method are computed and compared across all ten chaotic maps and against state-of-the-art algorithms.

The rest of this study is organized as follows: Section "Background" presents background on the DMO algorithm and chaotic maps. Section "The proposed CDMO for feature selection" explains the proposed model. Experimental results and analysis are discussed in Section "Experimental results". Finally, the conclusion is summarized in Section "Conclusion and future work".

Background

Dwarf Mongoose Optimization Algorithm (DMO)

DMO27 is a meta-heuristic method that simulates the foraging behavior of the dwarf mongoose and its compensatory behavioral adaptations. The mongoose has two main compensatory behavioral adaptations:

  1. Prey size, group size, and space utilization.

  2. Food provisioning.

Large prey items, which could provide food for the whole group, are not amenable to capture by dwarf mongooses. Due to the lack of a killing bite and of organized pack hunting, the dwarf mongoose has evolved a social structure in which each individual can survive independently and move from one location to another. The dwarf mongoose lives a semi-nomadic lifestyle in an area big enough to accommodate the entire colony. Because no previously visited sleeping mound is returned to, this nomadic lifestyle ensures that the entire territory is explored and prevents over-exploitation of any one area27.

Population initialization

The candidate population of mongooses (X) is initialized using Eq. (1). The population is generated stochastically between the upper bound (UB) and lower bound (LB) of the given problem.

$$X=\left[\begin{array}{ccccc}{x}_{\mathrm{1,1}}& {x}_{\mathrm{1,2}}& ...& {x}_{1,d-1}& {x}_{1,d}\\ {x}_{\mathrm{2,1}}& {x}_{\mathrm{2,2}}& ...& {x}_{2,d-1}& {x}_{2,d}\\ & \vdots & {x}_{i,j}& \vdots & \\ {x}_{n,1}& {x}_{n,2}& ...& {x}_{n,d-1}& {x}_{n,d}\end{array}\right]$$
(1)

where \(X\) is the population, created at random by Eq. (2), \({x}_{i,j}\) stands for the location of the jth dimension of the ith individual, n stands for the population size, and d stands for the problem dimension.

$${x}_{i,j}=VarMin+rand\times \left(VarMax- VarMin\right)$$
(2)

where rand is a random number in [0, 1], and VarMax and VarMin are the upper and lower bounds of the problem. The best solution over the iterations is the best solution obtained so far.
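As an illustration, the initialization of Eqs. (1) and (2) can be sketched as follows. This is a minimal Python sketch; the function and variable names are ours, not from a reference implementation.

```python
import random

def init_population(n, d, var_min, var_max, seed=0):
    """Eq. (2): each of the n mongooses gets d coordinates drawn
    uniformly at random between the bounds VarMin and VarMax."""
    rng = random.Random(seed)
    return [[var_min + rng.random() * (var_max - var_min) for _ in range(d)]
            for _ in range(n)]

# A 30-individual population for a 10-dimensional problem bounded by [0, 1].
X = init_population(n=30, d=10, var_min=0.0, var_max=1.0)
```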

The fitness of each solution is calculated after the population has been initialized. Equation (3) calculates a probability value from each individual's fitness, and the alpha female (α) is chosen based on this probability.

$$\alpha =\frac{fi{t}_{i}}{{\sum }_{i=1}^{n}fi{t}_{i}}$$
(3)

The number of mongooses in the alpha group equals n − bs, where bs represents the number of babysitters (nannies). Peep is the alpha female's vocalization that directs the family's path.
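The alpha-selection step of Eq. (3) is a simple fitness-proportional probability; a minimal sketch (names are illustrative):

```python
def alpha_probabilities(fitness):
    """Eq. (3): the probability of each mongoose becoming the alpha
    female is its fitness divided by the colony's total fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]

# The fittest individual gets the largest selection probability.
probs = alpha_probabilities([0.2, 0.3, 0.5])
```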

The DMO applies the formula in Eq. (4) to produce a candidate food position.

$${X}_{i+1}={X}_{i}+phi*peep$$
(4)

where phi is a uniformly distributed random number in [− 1, 1]. After each iteration, the sleeping mound is evaluated as in Eq. (5).

$$s{m}_{i}=\frac{fi{t}_{i+1}-fi{t}_{i}}{\mathit{max}\{|fi{t}_{i+1}|,|fi{t}_{i}|\}}$$
(5)

The average value of the sleeping mound found is given by Eq. (6).

$$\varphi =\frac{{\sum }_{i=1}^{n}s{m}_{i}}{n}$$
(6)

The mongooses are known to avoid returning to the previous sleeping mound, so the scouts search for the next one to ensure exploration. The scout mongoose is simulated by Eq. (7).

$${X}_{i+1}=\left\{\begin{array}{c}{X}_{i}-CF*phi*rand*\left[{X}_{i}-\overrightarrow{M}\right] if {\varphi }_{i+1}>{\varphi }_{i}\\ {X}_{i}+CF*phi*rand*\left[{X}_{i}-\overrightarrow{M}\right] elsewhere\end{array}\right.$$
(7)

where, \(CF=(1-\frac{iter}{Ma{x}_{iter}}{)}^{\left(2\frac{iter}{Ma{x}_{iter}}\right)}\) indicates the variable, which decreases linearly with each iteration, that controls the group's collective-volatile movement. \(\overrightarrow{M}={\sum }_{i=1}^{n}\frac{{x}_{i}\times s{m}_{i}}{{X}_{i}}\) is the vector that controls the mongoose's movement to its new sleeping mound.

Chaotic maps

Chaos is a phenomenon that can exhibit non-linear changes in future behavior when its initial condition is even slightly altered. It is also described as semi-random behavior generated by nonlinear deterministic systems28. One of the main chaos-based search algorithms is the Chaos Optimization Algorithm (COA), which maps variables and parameters from the chaotic space to the solution space. It relies on the stochasticity, regularity, and ergodicity of chaotic motion to determine the global optimum. Due to its simplicity and speedy convergence, COA has been widely used over the last ten years in many papers, e.g.,29,30,31,32. To obtain the chaotic sets, we have used ten well-known one-dimensional maps that have been used frequently in the literature. Figure 1 shows that the maps have different behaviors, which allows testing the behavior of DMO on the different maps.

Figure 1

Ten chaotic maps.
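As an example of how such chaotic sequences are generated, the logistic map (one of the ten maps) can be iterated as follows. The Singer map is also shown using one common formulation from the literature, so its constants should be treated as an assumption:

```python
def logistic_map(x0, steps, r=4.0):
    """Logistic map x_{k+1} = r * x_k * (1 - x_k); fully chaotic at r = 4."""
    seq, x = [], x0
    for _ in range(steps):
        x = r * x * (1 - x)
        seq.append(x)
    return seq

def singer_map(x0, steps, mu=1.07):
    """Singer map (constants assumed from a common formulation):
    x_{k+1} = mu * (7.86x - 23.31x^2 + 28.75x^3 - 13.302875x^4)."""
    seq, x = [], x0
    for _ in range(steps):
        x = mu * (7.86 * x - 23.31 * x**2 + 28.75 * x**3 - 13.302875 * x**4)
        seq.append(x)
    return seq

# A chaotic sequence usable in place of uniform random numbers in [0, 1].
chaotic_rho = logistic_map(0.7, 100)
```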

The proposed CDMO for feature selection

In this study, an alternative feature selection technique is proposed using the Chaotic Dwarf Mongoose Optimization (CDMO), as shown in Fig. 2. The random numbers used in Eq. (7) are replaced by chaotic maps to avoid returning to the same sleeping mound, giving Eq. (8).

$${X}_{i+1}=\left\{\begin{array}{ll}{X}_{i}-CF*phi*\rho *\left[{X}_{i}-\overrightarrow{M}\right] &\quad if {\varphi }_{i+1}>{\varphi }_{i}\\ {X}_{i}+CF*phi*\rho *\left[{X}_{i}-\overrightarrow{M}\right] &\quad else\end{array}\right.$$
(8)

where \(\rho\) is the value obtained from the well-known chaotic maps reported in Table 1.
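In code, the only change relative to the standard scout move of Eq. (7) is that the uniform random number is replaced by the chaotic value ρ; a minimal sketch (names are ours):

```python
def cdmo_scout_move(X_i, M, rho, CF, phi, improved):
    """Eq. (8): same form as Eq. (7), but the chaotic value rho
    replaces the uniform random number."""
    sign = -1 if improved else 1  # improved means phi_{i+1} > phi_i
    return [x + sign * CF * phi * rho * (x - m) for x, m in zip(X_i, M)]

new_pos = cdmo_scout_move([0.5, 0.5], [0.2, 0.2], rho=0.37, CF=0.8,
                          phi=0.5, improved=True)
```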

Figure 2

Flowchart of the proposed CDMO algorithm.

Table 1 Ten chaotic maps.

After that, we set the dimension of the problem, d in Eq. (1), to the number of features, and set \(VarMin\) and \(VarMax\) in Eq. (2) to 0 and 1, respectively. Each row of Eq. (1) (i.e., the position of each element in \({X}_{i}\)) is thresholded at 0.5, since the values lie between 0 and 1. Elements with positions > 0.5 are considered candidate features, while the remaining elements are not included in the solution, as in Eq. (9).

$${X}_{i,j}=\left\{\begin{array}{ll}1&\quad { x}_{i,j}>0.5\\ 0 &\quad Otherwise\end{array}\right.$$
(9)

The candidate features are then evaluated by the fitness function in Eq. (10), which combines the classification error rate of the k-nearest neighbor classifier on the candidate features with the ratio of selected features.

$$Fitness= \frac{Number\,of\,wrong\,classified }{Total\,numbers\,of\,instances}+\frac{|{X}_{i}|}{d}$$
(10)

Each time the fitness function is invoked, the dataset is divided using the holdout method into 80% training data and 20% testing data. Algorithm 1 and Fig. 2 show the pseudocode and flowchart of the proposed technique, respectively.


Algorithm 1 Steps of the developed method.
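The thresholding of Eq. (9) and the fitness of Eq. (10) can be sketched end-to-end as follows. This is a self-contained Python sketch with a hand-rolled 5-NN on synthetic data; the study itself uses a standard KNN classifier, and the equal weighting of the two fitness terms follows Eq. (10) as written.

```python
import random
from collections import Counter

def binarize(position, thr=0.5):
    # Eq. (9): feature j is selected when x_{i,j} > 0.5
    return [1 if x > thr else 0 for x in position]

def knn_predict(train_X, train_y, x, k=5):
    # Majority vote among the k nearest training points (squared L2 distance).
    nearest = sorted(range(len(train_X)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(train_X[i], x)))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def fitness(position, X, y, k=5, seed=0):
    # Eq. (10): 5-NN error rate on a 20% holdout + selected-feature ratio.
    cols = [j for j, b in enumerate(binarize(position)) if b]
    if not cols:
        return 1.0  # no features selected: worst possible error rate
    Xs = [[row[j] for j in cols] for row in X]
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    tr, te = idx[:cut], idx[cut:]
    wrong = sum(
        knn_predict([Xs[i] for i in tr], [y[i] for i in tr], Xs[i], k) != y[i]
        for i in te)
    return wrong / len(te) + len(cols) / len(X[0])

# Two well-separated synthetic classes in 4 dimensions.
rng = random.Random(42)
X = [[rng.gauss(c, 0.1) for _ in range(4)] for c in (0, 1) for _ in range(20)]
y = [c for c in (0, 1) for _ in range(20)]
fit = fitness([0.9, 0.9, 0.1, 0.9], X, y)  # selects features 0, 1, 3
```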

Experimental results

Dataset and parameters setting

Table 2 lists the 10 datasets used in this study, which come from the well-known UCI data repository33. They were chosen with different dimensions and patterns to evaluate the performance of the proposed method at several levels of complexity.

Table 2 Datasets used in this study.

K-nearest neighbor (KNN) is employed as the classifier in this study, as it is one of the most common and simplest learning algorithms. It is trained on the training dataset and then tested on the testing part, which ensures higher reliability. To simplify the evaluation process, we set K = 5 in KNN (5NN)34.

Performance metrics

In this study we have used two types of metrics to evaluate the performance: fitness metrics and classification metrics.

For the fitness metrics we have used four statistical measurements, namely the best, worst, and mean fitness values and the standard deviation, which are mathematically defined as follows:

$$\mathrm{Best\,Fitness}= {\mathit{Min}}_{i=1}^{{N}_{r}}{{\text{BS}}}_{{\text{i}}},$$
(11)
$$\mathrm{Worst\,Fitness}= {\mathit{Max}}_{i=1}^{{N}_{r}}{{\text{BS}}}_{{\text{i}}},$$
(12)
$$\mathrm{Mean\,Fitness}\,(\upmu )= \frac{1}{{N}_{r}}\sum_{{\text{i}}=1}^{{N}_{r}}{{\text{BS}}}_{{\text{i}}},$$
(13)
$$\mathrm{Standard\,Deviation }\left({\text{SD}}\right)= \sqrt{\frac{\sum_{{\text{i}}=1}^{{N}_{r}}{({{\text{BS}}}_{{\text{i}}}-\upmu )}^{2}}{{N}_{r}}}$$
(14)

where BSi is the best score gained in run i, Nr is the number of runs, and μ is the mean fitness35.
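For a minimization fitness these statistics reduce to a few lines (a sketch; the best run is the minimum score and the worst is the maximum):

```python
import math

def fitness_stats(best_scores):
    """Eqs. (11)-(14) over the Nr per-run best scores of a
    minimisation fitness."""
    n_r = len(best_scores)
    mu = sum(best_scores) / n_r
    sd = math.sqrt(sum((b - mu) ** 2 for b in best_scores) / n_r)
    return min(best_scores), max(best_scores), mu, sd

best, worst, mean, sd = fitness_stats([0.10, 0.12, 0.11, 0.15])
```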

The second evaluation assesses the selected features using classification measures: accuracy, precision, sensitivity, specificity, and F-score. Accuracy is a common evaluation metric, defined as the ratio of correctly classified samples to all samples. It is mathematically defined as follows:

$$Accuracy= \frac{TP+TN}{TP+TN+FP+FN} ,$$
(15)

Precision, specificity, and sensitivity are proper metrics for measuring classification performance on unbalanced datasets. Since they are not affected by differences in the data distribution, these measures are useful for evaluating classification performance in unbalanced learning scenarios36. The F-score metric combines precision and sensitivity and is given by Eq. (19); it is therefore more suitable in unbalanced scenarios than the accuracy metric. Precision, sensitivity, specificity, and F-score are defined by the following equations:

$$Precision = \frac{TP}{TP+FP}$$
(16)
$$Sensitivity= \frac{TP}{TP+FN}$$
(17)
$$Specificity = \frac{TN}{TN+FP}$$
(18)
$$F-score= \frac{2*\left(precision *Sensitivity\right)}{Precision +Sensitivity}$$
(19)

where TP is the true positive, FP is the false positive, FN is the false negative and TN represents the true negative.
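All five measures of Eqs. (15)–(19) follow directly from the four confusion-matrix counts; a sketch for the binary case:

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. (15)-(19) computed from the confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # a.k.a. recall
    specificity = tn / (tn + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f_score

acc, prec, sens, spec, f1 = classification_metrics(tp=8, tn=5, fp=2, fn=1)
```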

Performance of DMO based on ten chaotic maps

To evaluate the performance of the proposed CDMO, 10 different datasets from the UCI repository are used. The obtained results are compared with DMO and other well-known meta-heuristic algorithms, namely the PSO5, ACO6, ARO37, HHO38, EO39, RTHS40, RSGW41, SSAPSO42, BGA43, and WOA14 algorithms. Each algorithm was run 25 times on a PC with the same specifications. To test the convergence capability, the average over the 25 runs was computed and compared for each algorithm. Table 3 lists the parameter settings of the algorithms used in this study. The experiments are divided into two parts: the first evaluates the performance of the ten chaotic maps on the DMO algorithm, as shown in Tables 4 and 5; the second compares the best chaotic map with the six meta-heuristic algorithms DMO, ACO, PSO, ARO, HHO, and WOA, as shown in Tables 6 and 7.

Table 3 Parameter setting.
Table 4 Accuracy comparison between ten CDMO.
Table 5 Average fitness comparison between ten CDMO.
Table 6 Comparison between CDMO8 and 6 meta-heuristic algorithms in classification metrics.
Table 7 Comparison between CDMO8 and 6 meta-heuristic algorithms in fitness metrics.

Table 4 shows the accuracy averaged over the runs for the ten CDMO variants, where the number after CDMO refers to the map number in Table 1; for example, CDMO1 uses the Chebyshev map. The results in Table 4 show that the Singer map (CDMO8) achieves the highest results on three datasets (breastEW, SpectEW, and Waveform), while CDMO1 and CDMO7 achieve the best results on KrvskpEW and Ionosphere, respectively. All maps have the same accuracy on two datasets (base_exactly and base_M-of-n3). Table 5 compares the average fitness values of the ten chaotic maps. The Singer map (CDMO8) achieved the best results on 5 out of 10 datasets. CDMO4 and CDMO6 achieved the same result on base_M-of-n3, and CDMO1, CDMO3, CDMO5, CDMO7, and CDMO10 each achieved the best result on one dataset. CDMO8 was therefore chosen to be compared with the ACO, PSO, WOA, ARO, HHO, and DMO algorithms.

Figure 3 illustrates the convergence curves for the ten chaotic maps over 100 iterations. As can be observed from this figure, the Singer map obtains the best results in most cases because it converges faster than the other maps.

Figure 3

Comparison between ten chaotic maps.

Comparison with other meta-heuristic techniques

In this section, we compare the performance of the developed method based on the Singer map with well-known and widely used techniques, namely PSO, ACO, ARO, HHO, and WOA, as well as the original DMO.

From Table 6, the CDMO gives the best accuracy on seven datasets (base_BreastEW, SonarEW, SpectEW, Waveform, CongressEW, breastEW, and Ionosphere), while DMO gives superior performance on one dataset, KrvskpEW. Moreover, DMO and CDMO give equal performance on 2 datasets (base_M-of-n3 and base_exactly). Based on the precision results, CDMO8 performs better on seven datasets, whereas DMO performs better on one dataset (BreastEW), and CDMO8 and DMO have the same results on two datasets. Analyzing the sensitivity results, CDMO8 has the highest results on four datasets, while DMO and PSO have the highest results on three datasets and one dataset, respectively; CDMO8 and DMO have the same results on two datasets (base_exactly and base_M-of-n3). For specificity, CDMO8 has the highest results on seven datasets, while PSO has the best result on only one dataset (BreastEW); CDMO8 and DMO again have the same results on two datasets. Finally, the F-measure results show that CDMO8 performs better on five datasets, DMO performs better on the KrvskpEW dataset, and ARO performs better on the SpectEW and Ionosphere datasets; CDMO8 and DMO have the same results on two datasets.

Table 7 presents the results of the fitness metrics, namely the standard deviation (SD) and the best, worst, and average values of the fitness function. In the average fitness, CDMO8 achieved the best results on 9 out of 10 datasets, while ACO achieved the best result on the Ionosphere dataset only. In terms of the best measure, CDMO8 has the best results on 5 out of 10 datasets, the original DMO on 2 out of 10 datasets, and ARO on the Ionosphere and base_M-of-n3 datasets; CDMO8 and DMO have the same result on the breastEW dataset. Furthermore, for the worst measure, CDMO8 has the best results on 5 out of 10 datasets, while PSO ranks second with 3 out of 10 datasets, and WOA and DMO each have the best result on one dataset. Concerning the standard deviation, WOA has the superior results on 7 out of 10 datasets; neither CDMO nor the original DMO achieved the best standard deviation results.

Figure 4 shows the comparison of the convergence curves of CDMO8 and the other meta-heuristic algorithms (i.e., PSO, ACO, DMO, ARO, HHO, and WOA). As observed from the figure, CDMO8 converges faster on most datasets.

Figure 4

Comparison between best chaotic map and 6 meta-heuristic algorithms.

Table 8 compares the accuracy of CDMO8 against 6 state-of-the-art methods, namely BGA, RTHS, RSGW, EO, SSAPSO, and HSGW. It is clear that our proposed CDMO method outperforms these methods, producing higher accuracy on 8 out of 10 datasets.

Table 8 Comparison of CDMO8 with other 6 state-of-the-art methods based on achieved accuracy (highest classification accuracies are in bold).

Performance evaluation on CEC’22 benchmark functions

In this section, the performance of the proposed CDMO algorithm in solving optimization problems is tested. To this end, the numerical solving efficiency of CDMO is evaluated on the twelve functions of the CEC’22 test suite. Table 9 presents the outcomes of the CEC’2022 test suite over 30 runs for the proposed ten chaotic DMO variants. These benchmark functions consist of four types: unimodal, basic, hybrid, and composite functions. It is found that CDMO9 achieves the best performance.

Table 9 Comparison of simulation outcomes using DMO with 10 chaotic maps for a CEC’2022 test suite for 30 runs.

In order to verify the effectiveness of CDMO9, its results are compared, in Table 10, with six recent optimization algorithms, namely the Artificial Hummingbird Algorithm (AHA)44, African Vultures Optimization Algorithm (AVOA)45, Crow Search Algorithm (CSA)46, Harris Hawks Optimization (HHO)38, Northern Goshawk Optimization (NGO)47, and Satin Bowerbird Optimizer (SBO)48. Besides, in order to demonstrate the ability of CDMO9 to solve optimization problems, the obtained results are compared with two algorithms recently improved by scholars, namely the Sine Cosine Algorithm with an adaptive quadratic interpolation and rounding mechanism (ARSCA)49 and the Archimedes Optimization Algorithm boosted using trigonometric operators (SCAOA)50. The experimental results show that the proposed method compares favorably with these methods.

Table 10 Comparison of simulation outcomes for a CEC’2022 test suite for 30 runs (highest classification accuracies are in bold).

Conclusion and future work

The Chaotic Dwarf Mongoose Optimization Algorithm (CDMO) was proposed, which is the Dwarf Mongoose algorithm hybridized with chaos. To enhance the performance of the proposed technique, ten chaotic maps were employed, with CDMO used as a wrapper feature selector. The CDMO gives superior performance compared to the well-known meta-heuristic algorithms PSO, ACO, WOA, ARO, HHO, BGA, RTHS, RSGW, EO, SSAPSO, HSGW, and DMO. The obtained results proved the capability of CDMO to select feature subsets that give high classification results. Moreover, the experimental results proved that adjusting the random variable using the Singer map significantly enhanced the DMO algorithm in terms of both classification and fitness performance. In addition, our proposed algorithm was tested against recent optimizers on the CEC’22 test suite.

In future work, this method can be extended to solve real-world problems such as medical data. In addition, it would be interesting to investigate hybridizing the DMO algorithm with another swarm-based meta-heuristic algorithm.

Ethics approval

This research contains neither human nor animal studies.