A multistart tabu search-based method for feature selection in medical applications

In the design of classification models, irrelevant or noisy features are often generated. In some cases, there may even be negative interactions among features. These weaknesses can degrade the performance of the models. Feature selection is the task of searching for a small subset of relevant features from the original set that generates the most efficient models possible. In addition to improving the efficiency of the models, feature selection confers other advantages, such as greater ease in generating the necessary data as well as clearer and more interpretable models. In medical applications, feature selection may help to distinguish which characteristics, habits, and factors have the greatest impact on the onset of diseases. However, feature selection is a complex task due to the large number of possible solutions. In the last few years, methods based on different metaheuristic strategies, mainly evolutionary algorithms, have been proposed. The motivation of this work is to develop a method that outperforms previous methods, with the benefits that this implies, especially in the medical field. More precisely, the present study proposes a simple method based on tabu search and multistart techniques. The proposed method was analyzed and compared to other methods by testing their performance on several medical databases. Specifically, eight databases belonging to the well-known repository of the University of California, Irvine, and one database of our own design were used. In these computational tests, the proposed method outperformed other recent methods as gauged by various metrics and classifiers. The analyses were accompanied by statistical tests, the results of which showed that the superiority of our method is significant, thereby strengthening these conclusions.
In short, the contribution of this work is the development of a method that, on the one hand, is based on different strategies than those used in recent methods, and on the other hand, improves the performance of these methods.


Motivation
Diagnosis is a process in which a disease, injury, or other adverse effect is identified based on its signs and symptoms. To aid the establishment of a diagnosis, the clinician may use the patient's medical history or conduct a physical examination and various tests, such as blood tests, imaging tests, and biopsies. In most pathologies, diagnosis has evolved toward faster and more accurate processes 1.
In medicine, early diagnosis is vital for the effective treatment of some pathologies. For example, early detection improves patient survival in pathologies such as pancreatic cancer 2. Moreover, early diagnosis is crucial to improve the prognosis of many diseases, e.g., tumors and melanomas 3. In other diseases, such as diabetes, early diagnosis has become essential to alleviate the secondary complications of the disease, e.g., arterial lesions, hypercholesterolemia, hypertension, obesity, and myocardial infarction 4. The diagnosis of Alzheimer's disease in its earliest stages is very important, as it allows this disease to be differentiated from other neurodegenerative disorders with dementia 5.
Data mining 6,7 is a set of tools that has enabled the analysis of large amounts of information, generating patterns and rules that help to understand the behavior of a system. Within this set of tools, those used in prediction, and especially classification, have been adapted to the diagnosis of diseases. Their use in this scope has become an important emerging line of research 8. With different models, such as those based on logistic regression 9, discriminant analysis 10, support vector machines 11, neural networks 12, classification trees 13, nearest neighbor search 14, and Bayesian classifiers 15, among other approaches, it is possible to establish an early diagnosis.

Literature on feature selection methods
Feature selection methods can be classified into three types: filter, wrapper, and embedded 19. Filter methods select a certain number of variables based on criteria such as correlation, likelihood, and information gain, without the intervention of a classifier 19,20. Some examples of filter methods are the following algorithms: correlation feature selection (or the correlation algorithm) 6, mutual information 16, the ReliefF algorithm 7, the chi-square algorithm 21, the Fisher score algorithm 22, and the fast correlation-based filter 23. Additionally, Hancer 24 proposed a filter method based on differential evolution.
Wrapper methods explore different combinations or subsets of variables to evaluate the usefulness of each of these subsets through the predictive efficacy of a specific classifier. The aim is to find the best of these subsets. Wrapper methods usually generate better results than filter methods, since the latter do not evaluate the performance of the selected variables with a certain classifier 17, although wrapper methods require longer computation times than filter methods. Due to the robustness of metaheuristic techniques in several complex applications, some of these techniques have been used to create wrapper methods for feature selection, such as genetic algorithms [25][26][27], the gray wolf optimizer 28,29, the flower pollination algorithm 30,31, the bat algorithm 32,33, the ant colony optimization algorithm 34,35, the whale optimization algorithm 36,37, particle swarm optimization 38,39, the harmony search algorithm 40,41, and the Harris hawk optimization algorithm 18,42.
Finally, embedded methods integrate feature selection and classifier learning in a single process. These methods have been used in studies such as [43][44][45][46], and 47.

Literature on feature selection in medicine
Different approaches have been used to select variables/features and obtain models that establish diagnoses with high precision 48. For example, the authors of 49 extracted different features of magnetic resonance imaging (MRI) data to select the most important characteristics for the diagnosis of brain tumors. Additionally, from MRI data, the authors of 50 defined biomarkers by quantifying the precision of different sets of morphological features to diagnose Alzheimer's disease. Liu et al. 51 developed a method to measure the performance of both individual features and different subsets of features to identify lung diseases. Similarly, Chong et al. 52 created a classifier that enabled the detection of fibrotic interstitial lung disease using 3D texture features of computed tomography images. Shi et al. 53 used a method based on feature selection that incorporates the clinician's knowledge to achieve an accurate diagnosis of prostate problems. Additionally, Guinin et al. 54 developed an automatic diagnostic tool for prostate pathology, and Sahran et al. 55 proposed a new feature selection method from prostate histopathological images. Furthermore, feature selection studies have been conducted for different types of cancers. Thus, Jain et al. 56 proposed a model to improve the diagnosis and the identification of cancer types. Wang et al. 57 used a feature selection strategy for the diagnosis and classification of cancer with different gene expression profiles. Peng et al. 58 used a forward feature selection algorithm for tumor classification. Finally, Kang et al. 59 combined feature selection, logistic regression models, and support vector regression for tumor classification. Feature selection algorithms have also been combined with the AdaBoost classifier for the detection of glaucoma 60.
An interesting and recent line of work in this field is that of methods based on evolutionary strategies inspired by biological metaphors. Thus, in Awadallah et al. 61,62, strategies based on Rat Swarm and Horse Herd behavior were used. In order to improve these methods, different operators were analyzed. On the other hand, Braik et al. 63 proposed three methods based on the Capuchin Search Algorithm, considering the k-Nearest Neighbor (k-NN) classifier.
One interesting field of application of feature selection in biomedicine is gene interaction detection, such as the detection of epistatic interactions among single nucleotide polymorphisms (SNPs). In many cases, different metaheuristic strategies have been used. Thus, in the works of Tuo et al. 64,65, this problem was approached from a multi-objective and multi-task optimization point of view. In both studies, methods based on harmony search optimization were proposed. In 66, a similar approach was taken and a method based on ant colony optimization was proposed. Shang et al. 67 proposed a simulation method combined with resampling methods.
In this paper, we propose a method based on tabu search and multistart strategies. This is a stark difference from recent work based on evolutionary methods (such as those mentioned above). Evolutionary methods are, in general, more intuitive and easier to adapt and implement. Methods based on local search, such as the one proposed in this work, require greater adaptation to each specific problem and a greater implementation effort. On the other hand, our experience in different fields [68][69][70] demonstrates that these local-search-based methods (and specifically those based on tabu search) perform better than evolutionary methods. Indeed, as will be seen below, different computational tests show that our proposed method outperformed other recent methods as gauged by various metrics and classifiers. Our method obtains more efficient models, i.e., a better balance between the number of variables selected in the model and its precision. The results of various statistical tests support these conclusions.
The rest of the study is organized in the following manner: Section "Notation and formulation of the research problem" formulates the research problem formally; Section "Solution method for the proposed problem" describes the proposed method in detail; Section "Computational tests" describes the computational tests conducted to analyze the performance of our method against the other methods; and finally, Section "Conclusions" presents the conclusions.

Notation and formulation of the research problem
To formulate the research problem that is addressed in this study (selection of variables for diagnosis), the data used to select the variables and to generate the models were defined as the training set. This dataset is X. The number of individuals/cases of X is n. The number of variables is m. Finally, the set of variables is V, and the jth variable is v_j, that is:

V = {v_1, v_2, ..., v_m}.

For each individual of X, both the value of his or her features and his or her class (the presence or absence of the disease) are known. The problem lies in finding the subset S ⊂ V that maximizes the objective function f(S). This function is defined in the following manner:

f(S) = β · Rat(S) + (1 − β) · (m − |S|)/m,

where Rat(S) is defined as the rate of cases of X that are well classified by the model obtained with the variables of S and the classifier considered. The parameter β ∈ [0, 1] controls the trade-off between the rate of correct diagnoses and the size of |S|. Therefore, the objective function f balances the classifying capacity with the size of the set obtained. This objective function has been used in similar studies, such as 26,29,71, and 38. Usually, the β values are equal or approximately equal to 0.99, which is the value we used in the present study.
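As an illustration, the objective can be coded in a few lines. The exact normalization of the size term is not fully visible in the extracted text, so the form below (the one used in several of the cited wrapper studies) should be read as an assumption; `rat` is a hypothetical black-box callback returning Rat(S) for the chosen classifier.

```python
# Sketch of the objective f(S): beta weighs the classification rate Rat(S)
# against the fraction of variables left out of S. The size term's exact
# form is an assumption taken from the cited wrapper-method literature.
def objective(S, rat, m, beta=0.99):
    if not S:
        return 0.0  # the paper takes f(empty set) = 0
    return beta * rat(S) + (1 - beta) * (m - len(S)) / m
```

With β = 0.99, the classification rate dominates, and the size term mainly breaks ties between subsets with equal Rat(S).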

Solution method for the proposed problem
The solution method is a procedure (MultiStartTabu) that combines the multistart and TS strategies. In each iteration, two procedures are performed: a constructive procedure (Constructive) that generates an initial solution, and a second procedure that improves the solution generated by the constructive procedure. This improvement procedure (TabuSearch) is based on the TS strategy. The method ends when the solution does not improve after a series of iterations. Pseudocode 1 presents the MultiStartTabu method.
As shown in the pseudocode, iter is an auxiliary variable that indicates the number of the current iteration; iterbest indicates the iteration in which the best solution was found; S_best and f_best are the best solution and its objective function value, respectively; finally, maxiterMS is a previously defined parameter indicating the number of iterations that must take place, with no improvement of f_best, in order for the method to conclude.
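The loop structure just described can be sketched as follows; `constructive`, `tabu_search`, and `f` are hypothetical stand-ins for the two procedures and the objective function defined in this paper.

```python
# Sketch of the MultiStartTabu loop (Pseudocode 1): restart from a fresh
# constructive solution, improve it with tabu search, and stop after
# maxiter_ms consecutive iterations without improving the best value.
def multistart_tabu(constructive, tabu_search, f, maxiter_ms):
    it = it_best = 0
    s_best, f_best = None, float("-inf")
    while it - it_best < maxiter_ms:
        it += 1
        s = tabu_search(constructive())   # build, then improve
        if f(s) > f_best:                 # new incumbent found
            s_best, f_best, it_best = s, f(s), it
    return s_best, f_best
```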
Next, we explain the Constructive and TabuSearch procedures. The Constructive procedure begins with S = ∅; in each step, it selects an element v_j* ∈ V − S among those that would best improve the objective function if added to S, and this element is then added to S. The process ends when there are no elements in V − S that improve the objective function f. Pseudocode 2 shows the Constructive procedure.
The procedure is relatively simple. Initially, we execute S = ∅; in the following steps, for the elements v_j that are not in S, the value of the objective function of S is calculated if v_j is added (g_j = f(S ∪ {v_j})); then, the list L′ is created with the indices of the elements that improve the objective function of the current solution (g_j ≥ f(S)). Next, from the elements of the list L′, another list (L, a list of candidates) is created from the indices with the highest g_j values, and one of them is randomly selected (j*). Finally, the corresponding variable v_j* is added to S. The process ends when L′ = ∅ (there are no elements that improve the current solution). As can be observed, L′ is initially composed of all the indices of the elements of V (considering f(∅) = 0). The parameter α regulates the size of L. It takes values between 0 and 1; thus, if α = 0, then L = {j : v_j ∈ V − S}, and the process is completely random; on the other hand, if α = 1, then L consists exclusively of the index j that corresponds to g_max, and the process is therefore deterministic. It is important to select an adequate value of α that allows different high-quality solutions to be obtained.
TS 72 is a metaheuristic strategy that, in its basic version, consists of a neighbor search procedure. Each step analyzes all the possible movements that can be made from the current solution, and the best movement is selected. Simple movements are used in order to ensure that each movement results in a solution that is relatively similar to the current solution (a nearby or "neighboring" solution). The procedure allows movements to solutions that do not improve the current solution. Moreover, to prevent the algorithm from cycling, some movements are declared "tabu" and are initially not considered.
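As a sketch, the Constructive procedure might look as follows. The rule used here to cut the candidate list L by the threshold g_min + α(g_max − g_min) is the standard GRASP-style construction; the paper's exact rule for selecting "the indices with the highest g_j values" is not fully specified in the extracted text, so this detail is an assumption.

```python
import random

# Sketch of the Constructive procedure (Pseudocode 2). V is a set of
# variable indices, f is the objective (with f(empty set) = 0), and alpha
# in [0, 1] regulates the size of the candidate list L.
def constructive(V, f, alpha, rng=random):
    S = set()
    while True:
        gains = {j: f(S | {j}) for j in V - S}          # g_j values
        L_prime = [j for j, g in gains.items() if g >= f(S)]
        if not L_prime:
            break                                       # no improving element
        gmax = max(gains[j] for j in L_prime)
        gmin = min(gains[j] for j in L_prime)
        L = [j for j in L_prime
             if gains[j] >= gmin + alpha * (gmax - gmin)]
        S.add(rng.choice(L))                            # random pick from L
    return S
```

With α = 1, L collapses to the indices attaining g_max (deterministic greed); with α = 0, L = L′ and the choice is uniform among all improving elements.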
In our case, we considered three types of movements: (a) adding an element v_j′ ∈ V − S; (b) removing an element v_j ∈ S; and (c) exchanging an element v_j ∈ S with another element v_j′ ∈ V − S. The set of neighbor solutions of S (i.e., those that are reached through these movements) is defined as N(S).
To avoid cycles, the exit from S (throughout a series of iterations) of elements that recently entered S is declared "tabu". Similarly, the entry into S of elements that recently left S is also declared "tabu". To verify the tabu status of the entry or exit of an element v_j ∈ V, we defined:
VectorIn_j: number of the iteration in which the element v_j entered S.
VectorOut_j: number of the iteration in which the element v_j left S.
Thus, the entry of an element v_j ∈ V − S is tabu if iter − VectorOut_j ≤ tenure. Additionally, the exit of an element v_j ∈ S is tabu if iter − VectorIn_j ≤ tenure. Finally, the exchange of an element v_j ∈ S with an element v_j′ ∈ V − S is tabu if either of the two aforementioned conditions holds.
The parameter tenure indicates the number of iterations during which an exit or entry is tabu. The auxiliary variable iter represents the number of iterations. On the other hand, the tabu status of a movement can be ignored (and thus the movement can be considered) if such a movement results in a solution with a greater value of the objective function f than any previously visited solution (the "aspiration criterion"). Pseudocode 3 shows the TabuSearch procedure.
As can be observed in Pseudocode 3, each iteration considers all the movements that are not tabu or that meet the aspiration criterion. The best neighbor solution considered is stored in the variable S_b. This change is executed (S = S_b and f = f(S_b)), and the values of VectorIn and/or VectorOut are updated according to the type of movement performed and the elements involved. After each iteration, S* and f*, which are the best solution found during the search and its value of the objective function f, respectively, are updated. The procedure ends after a preestablished number of iterations (maxiterTS) have taken place with no improvement of f*. In this procedure, the parameter tenure plays an important role: high values result in many movements being declared tabu, thus reducing the flexibility of the process; low values may not prevent cycles. Therefore, adequate selection is critical.
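A compact sketch of the TabuSearch procedure follows, with the three moves and the recency-based tabu rules described above. The names `vec_in` and `vec_out` mirror VectorIn and VectorOut; the way moves are enumerated is an illustrative assumption.

```python
# Sketch of the TabuSearch procedure (Pseudocode 3). Moves are encoded as
# (add, drop) pairs of index sets: add-only, drop-only, or a swap. A tabu
# move is skipped unless it beats the best-known value (aspiration).
def tabu_search(S0, V, f, tenure, maxiter_ts):
    S = set(S0)
    s_star, f_star = set(S), f(S)
    far = -10**9                         # "long ago": initially nothing is tabu
    vec_in = {j: far for j in V}         # iteration when v_j last entered S
    vec_out = {j: far for j in V}        # iteration when v_j last left S
    it = it_star = 0
    while it - it_star < maxiter_ts:
        it += 1
        moves = ([({j}, set()) for j in V - S] +
                 [(set(), {j}) for j in S] +
                 [({k}, {j}) for j in S for k in V - S])
        best_move, best_val = None, float("-inf")
        for add, drop in moves:
            val = f((S - drop) | add)
            tabu = (any(it - vec_out[j] <= tenure for j in add) or
                    any(it - vec_in[j] <= tenure for j in drop))
            if tabu and val <= f_star:   # aspiration criterion not met
                continue
            if val > best_val:
                best_move, best_val = (add, drop), val
        if best_move is None:            # everything tabu, no aspiration
            break
        add, drop = best_move
        S = (S - drop) | add             # execute the best admissible move
        for j in add:
            vec_in[j] = it
        for j in drop:
            vec_out[j] = it
        if best_val > f_star:
            s_star, f_star, it_star = set(S), best_val, it
    return s_star, f_star
```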

Computational tests
This section describes different computational tests. The first group of tests was performed to adjust the parameters of the proposed method (Section "Fine-tuning of parameters"). The second group of tests was conducted to compare our method with other popular variable selection methods in the recent literature (Section "Comparison with other methods"). As can be observed in Section "Comparison with other methods", our method, in general, obtained better results according to several metrics. Multiple databases of medical diagnoses were used.
To carry out these tests, nine databases were used: eight databases from the well-known repository of the University of California, Irvine (USA) (https://archive.ics.uci.edu/) and another database of Alzheimer's diagnoses, which is presented for the first time in this study (https://www.ubu.es/metaheuristicos-grinubumet/ejemplos-y-datos-de-problemas). Because these databases concern diagnoses, the data were divided into two different types (presence of the disease, or "positive", and absence of the disease, or "negative"). Table 1 shows descriptions of the different databases and their characteristics.
It should be noted that in the Cervical database, the two variables with the highest number of missing values were initially eliminated, and then the cases with missing values were eliminated.

Fine-tuning of parameters
To adjust the parameters, three of the nine databases were considered: one with few features (Parkinson), another with many features (Quality Assessment of Digital Colposcopies, or QADC), and another with an intermediate number of features (Wisconsin Breast Cancer (Prognosis), or WPBC). Discriminant analysis was used as a classifier due to its rapid calculation capacity.
Four parameters in our MultiStartTabu (MST) method were analyzed: α, tenure, maxiterTS, and maxiterMS. The parameter maxiterTS was used as the stopping criterion in the TabuSearch procedure, whereas maxiterMS was used as the stopping criterion in the general MST procedure. To analyze the parameters α and tenure, we set the values maxiterTS = 10 · n and maxiterMS = 20. For α, we considered the following values: α = 0, 0.1, 0.5, 0.9, 0.99, and 1. For tenure, we considered the following values: tenure = n/2, n, 2 · n, and 5 · n. After analyzing the combinations, we found the best results at α = 0.99 and tenure = n/2. Subsequently, the parameters maxiterTS and maxiterMS were analyzed using these values for α and tenure. It was observed that with values higher than maxiterTS = 10 · n and maxiterMS = 10, there were no significant improvements. Therefore, we selected these values. Figure 1 shows the evolution of the objective function f over the series of iterations that make up the MultiStartTabu method for the WPBC database. The blue line indicates the values obtained in each iteration, and the red line shows the evolution of the best value. It can be seen that the best final value is already reached with a relatively low number of iterations.
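The tuning scheme described above amounts to a small grid search. A sketch follows, where `run_mst` is a hypothetical stand-in (not part of the paper) that runs the full method with the given pair and returns the mean objective value over the tuning databases:

```python
import itertools

# Grid search over the (alpha, tenure) pairs listed above; the pair with
# the highest mean objective value is kept. run_mst is a hypothetical
# evaluation callback.
def tune(run_mst, n):
    alphas = [0, 0.1, 0.5, 0.9, 0.99, 1]
    tenures = [n // 2, n, 2 * n, 5 * n]
    return max(itertools.product(alphas, tenures),
               key=lambda p: run_mst(alpha=p[0], tenure=p[1]))
```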

Comparison with other methods
Baseline methods, classifiers, metrics, and experimental details
This subsection describes the tests and the corresponding results that were used to compare our method with five recent wrapper methods (baseline methods): the genetic algorithm, or GA 26; the gray wolf optimizer, or GWO 71; particle swarm optimization, or PSO 39; the whale optimization algorithm, or WOA 36; and the flower pollination algorithm, or FPA 31.
As classifiers, we used discriminant analysis (DA), logistic regression (LR), and a support vector machine (SVM). For the experiments, we used a k-fold cross-validation design (k = 10), which allowed us to conduct statistical tests. In addition to the value of the objective function, the accuracy (ACC), the area under the ROC curve (AUC), the geometric mean (GMean), and the F1 score were used as metrics. In their standard form,

ACC = (TP + TN) / (TP + TN + FP + FN),
GMean = sqrt[ TP/(TP + FN) · TN/(TN + FP) ],
F1 = 2 · TP / (2 · TP + FP + FN),

where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative results, respectively. As can be observed, the definition of ACC is equivalent to that of Rat, although the latter refers to the training set. The other three metrics are also frequently employed and are very useful when the subsets of each class are unbalanced or poorly balanced.
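For reference, the standard confusion-matrix forms of ACC, GMean, and F1 can be computed as follows (the extracted text omits the paper's exact formulas, so these textbook definitions are an assumption):

```python
import math

# Textbook confusion-matrix metrics for a binary diagnosis problem.
def acc(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def gmean(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)    # recall on the "positive" class
    specificity = tn / (tn + fp)    # recall on the "negative" class
    return math.sqrt(sensitivity * specificity)

def f1(tp, tn, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)   # tn does not enter F1
```

Unlike ACC, GMean and F1 do not reward a model for simply predicting the majority class, which is why they are informative on poorly balanced databases.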
For each database, the following pre-processing steps were carried out: the data of the features were normalized, and the database was divided into 10 folds, resulting in 10 training set-test set pairs. Subsequently, for each database and classifier, every method was executed on each of these pairs. All the experiments were run on an i9-10920X processing unit with 128 GB of RAM. The parameters used by the baseline methods were the ones recommended by the corresponding articles, except for the stopping criterion, as was commented on above.
For the estimation of the logistic regression models, we used the algorithm of Lin et al. 73, whereas for the SVM models, we used the method proposed by Hsieh et al. 74.
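The pre-processing described above can be sketched in a few lines; z-score normalization and the interleaved fold assignment are assumptions, since the extracted text does not specify either choice.

```python
import numpy as np

# Normalize the features and split the cases into k disjoint folds;
# each fold serves once as the test set, the rest as the training set.
def preprocess(X, k=10, seed=0):
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score per feature
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = [idx[i::k] for i in range(k)]      # k roughly equal folds
    return X, [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
               for i in range(k)]
```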
Table 2 shows the computation times used by our MST (and thus by the rest of the methods) for each of the databases and classifiers. The mean time and standard deviation over the set of the 10 folds are indicated.
As can be observed, the computation times were longer in the databases with a larger number of variables (QADC) than in those with a smaller number of variables (Parkinson). Moreover, for the calculation of the objective function f(S) and, more specifically, the rate of correct diagnoses Rat(S), it was necessary to generate the model with the corresponding classifier using the variables of S. In the case of DA, the models were obtained easily and immediately, whereas in the case of LR and SVM, the models were obtained through a more complex optimization process, as mentioned above.

Results of DA
Next, we show the results obtained for each classifier. First, the results obtained for DA are presented. Table 3 displays the results obtained for the objective function f, with the mean and standard deviation for each database and method. The method with the best mean is indicated in bold text. From these results, a paired two-tailed t test was carried out with each of the other methods. Significant differences in favor of our method are indicated with "+" after the result of the method that it is compared with; significant differences in favor of the other method are indicated with "−"; and "=" represents the absence of significant differences. The last row (W/T/L) indicates, for each baseline method, the number of "wins" (W) (i.e., the number of significant differences in favor of our method compared to the other method), "ties" (T), and "losses" (L).
From Table 3, it can be observed that, in all databases, our method obtained the best results in terms of the objective function f. Moreover, in all cases, the difference with respect to each of the other methods was significant. Figure 2 shows nine radial plots (one per database) representing the mean results of the different methods.
Similarly, Table 4 and Fig. 3 present the results of the ACC metric on the test set. Table 4 shows the means, standard deviations, and results of t tests (as in Table 3). Figure 3 displays radial plots of the mean results. From Table 4 and Fig. 3, the following can be observed:
- In all databases, our method obtained better mean results than any of the other methods tested. The PSO method obtained the same mean result only in the WPBC database.
- Moreover, in most cases (combinations of database/baseline method), these differences were significant. Only in 10 of the 45 cases were the differences not significant. Specifically, in the t tests compared to GWO, our method "tied" in 5 databases and "won" in 4 of them. In general, the results obtained by the GWO method were relatively similar to those obtained by our MST method.
Tables 5, 6, and 7 show the results of the AUC, GMean, and F1 metrics, respectively, as in the previous tables. The conclusions drawn from these results are very similar to those drawn for the ACC metric:
- Our MST model obtained a better mean result than the other methods in almost all cases. There was only one case in which the mean value of a baseline method was the same (the AUC metric, WPBC database, and PSO method), and there were another two cases in which the result of a baseline method was slightly better (the GMean metric, WPBC database, and PSO method; the GMean metric, QADC database, and GA method), although the differences were not significant.
- Moreover, in most cases, the differences in favor of our method were significant. Only the results of the GWO method were relatively similar to those of our method in terms of the three metrics, according to the t tests (5 nonsignificant differences and 4 significant differences in favor of our method in AUC, and 6 nonsignificant differences and 3 significant differences in favor of our method in GMean and F1).
Finally, Table 8 shows a summary of the test results ("wins", "ties", and "losses"), in terms of both the objective function f and each of the metrics, for each baseline method. The last row indicates the total sum of "wins", "ties", and "losses" for each baseline method. From these results, it can be concluded that our method obtained better results than the rest of the methods. With respect to the objective function f, all differences were significant in favor of our method. Regarding the metrics, compared to GA, PSO, WOA, and FPA, significant differences were obtained in most cases (of 36 cases, there were 27 significant differences compared to GA, 29 compared to PSO, 33 compared to WOA, and 31 compared to FPA). Compared to GWO alone, fewer significant differences were observed (13 out of 36). There were no significant differences in favor of any of the other methods.

Results of LR
Next, we analyze the results obtained for LR. To avoid an excessive number of tables and figures, we present only Table 9, which shows the results of the accuracy (ACC) metric (means, standard deviations, and t tests), and Table 10, which displays a summary of the t tests ("wins", "ties", and "losses"), both for the objective function f and for each of the metrics, as applied to each baseline method. As can be observed from Table 9, our method obtained the best mean results in all databases. The GWO method obtained the same mean result only on the SPECTF Heart database. Moreover, in most cases (40 out of the 45), these differences were significant. Specifically, in the t tests against the GWO method, our method "tied" on 3 databases and "won" on the other 6. As in the case of DA, it seems that the results obtained by GWO were relatively similar to those obtained by our MST.
From Table 10, it can be observed that our method obtained better results than the rest of the methods. Regarding the objective function f, all differences were significant in favor of our method. With respect to the metrics, compared to WOA and FPA, there were 32 significant differences in favor of MST; compared to PSO, there were 30 significant differences in favor of MST; compared to GA, there were 28 significant differences; and compared to GWO, there were 21 significant differences. As in the case of DA, there were no significant differences in favor of any of the other methods.

Results of SVM analysis
Finally, we analyze the results obtained for SVM. As in the case of LR, Table 11 shows the results of the ACC metric, and Table 12 presents the summary of the t tests in terms of both the objective function f and each of the metrics. Table 11 shows that, in all the databases, our method obtained better mean results than any of the baseline methods. Moreover, in most cases (37 out of the 45), these differences were significant. Specifically, as in the case of LR, in the t tests compared to GWO, our method "tied" in 4 databases and "won" in the other 5.
As can be observed in Table 12, our method obtained better results than any of the other methods. Regarding the objective function f, all the differences were significant in favor of our method. With respect to the metrics, compared to the WOA, there were 34 significant differences in favor of MST; compared to FPA, there were 31 significant differences in favor of MST; compared to PSO, there were 30 significant differences; compared to GA, there were 26 significant differences; and compared to GWO, there were 22 significant differences. As in the cases of DA and LR, there were no significant differences in favor of any of the other methods.

Comments on the set of results
Ultimately, the obtained results indicate the following:
- Considering all the databases, our method obtained better results than any of the other methods in terms of the objective function f used in the variable selection process. Furthermore, with respect to the other methods, these differences were significant in all cases.
- In terms of all the metrics considered, the models obtained by our method in this process achieved better global results on the test set than the models obtained by any of the other methods. In fact, our method yielded the best mean results in most of the specific cases (that is, considering all the metrics, all the databases, and all the classifiers used). Similar mean results were obtained in a very small number of cases, and there was one isolated case of a difference in favor of another method, although that difference was not significant.
- In addition, these differences in the metrics in favor of our method were significant in most of the cases. Thus, of the 108 tests that were conducted with each of the baseline methods (considering 4 metrics, 9 databases, and 3 classifiers), the differences in favor of our method were significant in 81 cases compared to GA, in 56 cases compared to GWO, in 89 cases compared to PSO, in 99 cases compared to WOA, and in 94 cases compared to FPA. In the rest of the cases, there were no significant differences in favor of any method. Therefore, it seems that only the results of the GWO method were similar to those of MST, and only in half of the cases.

Conclusions
The use of classification models is increasing in the field of medicine, since they help to improve the diagnosis of diseases. Advances in these models are necessary and useful to increase the precision and reduce the uncertainty of diagnoses. Currently, there are databases with several features that aid in the creation of models. However, not all features contribute to this task in the same way. Some features may be irrelevant or noisy, and they can deteriorate the performance of the models. There may also be negative interactions among features, which are not always easy to distinguish and can also reduce the performance of the models. Thus, feature selection is critical in the field of medicine: it improves the diagnosis of diseases, identifies which features lead to a better diagnosis, and identifies which risk factors (personal traits, habits, etc.) have the largest impacts on the disease. From a computational perspective, feature selection is a hard problem due to the complexity of the search space. The most commonly used feature selection methods are based on evolutionary strategies, such as GAs, swarm optimization techniques, and bioinspired algorithms. This study proposes a wrapper method that creates solutions and improves them through a tabu search process in a multistart framework. Therefore, the strategy it follows is not among the ones most frequently used in this field. The results of different computational tests on medical databases show that our method outperforms other recently developed wrapper methods in the literature. The tests were conducted using different classifiers and metrics, and the results are accompanied by statistical tests that strengthen the conclusions.
In general, the main limitation of our method is that it requires significant adaptation to each specific problem and a great deal of implementation effort. Other recent methods, such as evolutionary methods, seem to be more intuitive and easier to implement. On the other hand, our method demonstrates better performance across the different metrics, databases, and statistical tests considered.
In short, the contribution of this work is the development of a method that, on the one hand, is based on different strategies than those used in recent methods, and on the other hand, significantly improves the performance of these methods.

Figure 2. Radial plots of the mean values of the objective function f (%) of the different methods using DA.

Figure 3. Radial plots of the mean ACC values (%) of the different methods using DA.

Table 2. Computation time in seconds (mean and standard deviation) for each method.

Table 4. Results of ACC (%) using DA. The best values are in bold.

Table 5. Results of AUC (%) using DA. The best values are in bold.

Table 6. Results of GMean (%) using DA. The best values are in bold.

Table 7. Results of F1 (%) using DA. The best values are in bold.

Table 8. Summary of the t test results for f and the different metrics with DA.

Table 9. Results of ACC (%) using LR. The best values are in bold.

Table 10. Summary of the t test results for f and the different metrics with LR. The best values are in bold.

Table 12. Summary of the t test results for f and the different metrics with SVM.