Individuals redistribution based on differential evolution for covariance matrix adaptation evolution strategy

Among population-based metaheuristics, both Differential Evolution (DE) and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are outstanding performers for real parameter single objective optimization. Compared with DE, however, CMA-ES stagnates much earlier on many occasions. In this paper, we propose CMA-ES with individuals redistribution based on DE (IR-CMA-ES) to address stagnation in CMA-ES. We conduct experiments on two benchmark test suites to compare our algorithm with nine peers. Experimental results show that IR-CMA-ES is competitive in the field of real parameter single objective optimization.

Zhe Chen 1,2* & Yuanxing Liu 1

The aim of real parameter single objective optimization is to find the decision vector that minimizes (or maximizes) an objective function in the solution space. For years, real parameter single objective optimization has been a hot spot of Artificial Intelligence (AI), and a variety of population-based metaheuristics have been proposed in the literature for this purpose.
Among the types of population-based metaheuristics for real parameter single objective optimization, both Differential Evolution (DE) 1 and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) 2 stand out. In the execution of population-based metaheuristics, two phenomena, early convergence and stagnation, are very common; both lead to the situation in which no further improvement on the solution can be made. The former means that all individuals in the population become identical before a global optimum is found, while the latter means that the difference between individuals is too small for the operators of the algorithm to obtain a better solution even though a global optimum has not yet been found. For non-trivial instances of real parameter single objective optimization, stagnation occurs much more often than early convergence in the execution of population-based metaheuristics, including DE and CMA-ES.
The motivation of this paper is as follows. Compared with DE, CMA-ES stagnates much earlier on many occasions. Therefore, measures such as the niching approach and the restart strategy have been taken for years to help CMA-ES resist stagnation. Compared with the niching approach, the restart strategy has produced the more famous CMA-ES variants. For example, the winner of the CEC 2013 competition, NBIPOP-aCMA-ES, and the winner of the CEC 2018 competition, HS-ES, are both CMA-ES variants with restart. Besides restart, an improved version of univariate sampling is employed in HS-ES. A further comparison 8 shows that HS-ES is one of the top performers among the six winners of the five competitions held in 2013, 2014, 2016, 2017, and 2018, respectively. It can be seen that, with the help of methods for resisting stagnation, CMA-ES performs better for real parameter single objective optimization than before. In fact, the above methods for resisting stagnation are very simple in idea. Given that such simple ideas are effective for improving CMA-ES, a more sophisticated strategy may be even more promising. For example, DE may be a good choice for helping CMA-ES resist stagnation.
In fact, hybridisation techniques, such as memetic computing, have received wide attention in the field of AI 9 . Furthermore, there exist hybridisations of CMA-ES with another metaheuristic for different purposes. Examples are listed below. In CMA-ES/HDE 10 , CMA-ES and a hybrid DE each occupy a subpopulation. Migration

DE and CMA-ES for real parameter single objective optimization
In this section, the most popular methods for real parameter single objective optimization, CMA-ES and DE, are further introduced. Then, our idea is analyzed based on the features of CMA-ES and DE.
In the population of DE, operators such as mutation, crossover, and selection are exerted on individuals, i.e., target vectors. In the initial generation, target vectors $\vec{x}_{i,0} = (x_{1,i,0}, x_{2,i,0}, \ldots, x_{D,i,0})$, where $i$ runs from 1 to $NP$, $NP$ denotes the population size, and $D$ denotes the dimensionality, are produced randomly. In a given generation $g$, mutant vectors $\vec{v}_{i,g}$ are produced from the target vectors $\vec{x}_{i,g}$ by mutation. DE algorithms are compatible with different mutation strategies. Here, two of the popular mutation strategies, DE/rand/1 and DE/best/1, are presented in Eqs. (1) and (2), respectively:

$$\vec{v}_{i,g} = \vec{x}_{r1,g} + F \cdot (\vec{x}_{r2,g} - \vec{x}_{r3,g}), \quad (1)$$

$$\vec{v}_{i,g} = \vec{x}_{best,g} + F \cdot (\vec{x}_{r1,g} - \vec{x}_{r2,g}). \quad (2)$$

In the equations, $r1$, $r2$, and $r3$ are distinct integers randomly chosen from the range $[1, NP]$ and different from $i$; $F$ is the scaling factor; and $\vec{x}_{best,g}$ denotes the individual with the best fitness in generation $g$. After mutation, trial vectors $\vec{u}_{i,g} = (u_{1,i,g}, u_{2,i,g}, \ldots, u_{D,i,g})$ are generated from $\vec{x}_{i,g}$ and $\vec{v}_{i,g}$ by crossover. A widely used crossover strategy, binomial crossover, is

$$u_{j,i,g} = \begin{cases} v_{j,i,g}, & \text{if } rand(0,1) \le Cr \text{ or } j = randn(i), \\ x_{j,i,g}, & \text{otherwise}, \end{cases} \quad (3)$$

where $Cr \in [0, 1]$ is the crossover rate, and $randn(i)$ is an integer randomly generated from the range $[1, D]$ to ensure that $\vec{u}_{i,g}$ has at least one component from $\vec{v}_{i,g}$. In DE, mutation and crossover together are referred to as the trial vector generation strategy. For selection, the operation is

$$\vec{x}_{i,g+1} = \begin{cases} \vec{u}_{i,g}, & \text{if } f(\vec{u}_{i,g}) \le f(\vec{x}_{i,g}), \\ \vec{x}_{i,g}, & \text{otherwise}, \end{cases} \quad (4)$$

where $f(\vec{u}_{i,g})$ and $f(\vec{x}_{i,g})$ represent the fitness of $\vec{u}_{i,g}$ and $\vec{x}_{i,g}$, respectively.
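As an illustration, one generation of the DE/rand/1/bin scheme described above (Eqs. (1), (3), and (4)) can be sketched in Python; the function and parameter names are ours, not from the paper:

```python
import numpy as np

def de_generation(pop, fitness, f, F=0.5, Cr=0.9, rng=None):
    """One generation of DE/rand/1/bin: mutation (Eq. 1),
    binomial crossover (Eq. 3), and greedy selection (Eq. 4).
    `pop` is an (NP, D) array of target vectors, `f` the objective."""
    rng = rng or np.random.default_rng()
    NP, D = pop.shape
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(NP):
        # r1, r2, r3: distinct indices, all different from i
        r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])   # mutant vector (Eq. 1)
        jrand = rng.integers(D)                 # guarantees one component from v
        mask = rng.random(D) <= Cr
        mask[jrand] = True
        u = np.where(mask, v, pop[i])           # trial vector (Eq. 3)
        fu = f(u)
        if fu <= fitness[i]:                    # greedy selection (Eq. 4)
            new_pop[i], new_fit[i] = u, fu
    return new_pop, new_fit
```

Because selection is greedy, the fitness of every individual is non-increasing from one generation to the next.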
In the population of CMA-ES, the $(g+1)$-th generation is obtained from the $g$-th generation as follows:

$$\vec{x}_k^{(g+1)} = \langle \vec{x} \rangle_\mu^{(g)} + \sigma^{(g)} B^{(g)} D^{(g)} \vec{z}_k^{(g+1)}, \quad k = 1, \ldots, \lambda, \quad (5)$$

where $\sigma^{(g)}$ is the global step size. The random vectors $\vec{z}_k$ in Eq. (5) are $N(\vec{0}, I)$ distributed ($n$-dimensional normally distributed with expectation zero and the identity covariance matrix) and serve to generate offspring. The center of mass of the selected individuals is

$$\langle \vec{x} \rangle_\mu^{(g)} = \frac{1}{\mu} \sum_{i \in I_{sel}^{(g)}} \vec{x}_i^{(g)}, \quad (6)$$

where $I_{sel}^{(g)}$ is the set of indices of the selected individuals, with $|I_{sel}^{(g)}| = \mu$. The covariance matrix $C^{(g)}$ of the random vectors $B^{(g)} D^{(g)} \vec{z}_k^{(g+1)}$ is a symmetric positive $n \times n$ matrix. The columns of the orthogonal matrix $B^{(g)}$ represent normalized eigenvectors of the covariance matrix, and $D^{(g)}$ is a diagonal matrix whose elements are the square roots of the eigenvalues of $C^{(g)}$. Hence, the relation of $B^{(g)}$ and $D^{(g)}$ to $C^{(g)}$ can be expressed by

$$C^{(g)} = B^{(g)} D^{(g)} \left(B^{(g)} D^{(g)}\right)^{T} = \sum_{i=1}^{n} \left(d_{ii}^{(g)}\right)^2 \vec{b}_i^{(g)} \left(\vec{b}_i^{(g)}\right)^{T}, \quad (7)$$

where $\vec{b}_i^{(g)}$ represents the $i$-th column of $B^{(g)}$ with $\|\vec{b}_i^{(g)}\| = 1$, and the $d_{ii}^{(g)}$ are the diagonal elements of $D^{(g)}$. Surfaces of equal probability density of the random vectors $B^{(g)} D^{(g)} \vec{z}_k^{(g+1)} \sim N(\vec{0}, C^{(g)})$ are (hyper-)ellipsoids whose main axes correspond to the eigenvectors of the covariance matrix. The squared lengths of the axes are equal to the eigenvalues of the covariance matrix.
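The sampling step of Eq. (5) can be sketched as below. This covers only the generation of offspring through the eigendecomposition of the covariance matrix, not the full CMA-ES parameter update; function and variable names are illustrative:

```python
import numpy as np

def sample_offspring(xmean, sigma, C, lam, rng=None):
    """Sample `lam` offspring from N(xmean, sigma^2 * C) using the
    eigendecomposition C = B D^2 B^T, as in Eq. (5). A sketch of the
    sampling step only, not a full CMA-ES implementation."""
    rng = rng or np.random.default_rng()
    eigvals, B = np.linalg.eigh(C)          # columns of B: eigenvectors of C
    D = np.sqrt(np.maximum(eigvals, 0.0))   # diagonal of D: sqrt of eigenvalues
    Z = rng.standard_normal((lam, xmean.size))  # z_k ~ N(0, I)
    # Each row of (Z * D) @ B.T equals B D z_k, which is N(0, C) distributed
    return xmean + sigma * (Z * D) @ B.T
```

Multiplying the standard-normal vectors by $D$ stretches them along the coordinate axes, and the rotation by $B$ aligns those axes with the eigenvectors of $C$, producing exactly the (hyper-)ellipsoidal distribution described above.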
Both CMA-ES and DE are population-based. It can be seen that DE is much simpler in its steps than CMA-ES. Hence, to resist stagnation, a modification based on DE is easier to implement than one based on CMA-ES. According to our idea, when CMA-ES is trapped in stagnation, DE takes over the population. As a result, individuals are produced in a new manner. If the new individuals survive selection, the state of stagnation may be broken, since the distribution of the population changes significantly. Details of our method are shown in "Methods".

Results and discussion
In our experiments, IR-CMA-ES is compared with nine population-based metaheuristics, including L-SHADE 5 , UMOEAs-II 7 , and jSO 15 . Settings of the involved peers are given in Table 1.
Discussion. In our IR-CMA-ES, DE with offspring-surviving selection is employed when stagnation is detected. By this means, execution may escape stagnation at the cost of fitness temporarily worsening. Nevertheless, the population may then be further optimized. Therefore, IR-CMA-ES shows better performance than the peers that also build on CMA-ES, namely UMOEAs-II and HS-ES. Meanwhile, it can be seen that our algorithm performs better than the other peers as well.

Table 2. Results of the ten algorithms for the CEC 2014 functions with dimensionality 30. "+" or "−" denotes that the current result is significantly better or worse than the result of our IR-CMA-ES in terms of Wilcoxon's rank-sum test at the 0.05 significance level, respectively. Meanwhile, "≈" represents that there is no significant difference.

Our study shows that CMA-ES merits further study for real parameter single objective optimization. In the future, we will try to propose more schemes to enhance CMA-ES. Provided that stagnation in CMA-ES can be resisted better, great progress in real parameter single objective optimization may be made.

Methods
Population-based metaheuristics for real parameter single objective optimization, including DE and CMA-ES, tend toward stagnation. Furthermore, compared with DE, CMA-ES often stagnates even earlier. In fact, the tendency toward stagnation in CMA-ES can be reversed by making changes in its operators. Here, we choose to resist stagnation in CMA-ES by implementing DE with offspring-surviving selection, because the operators of DE are much simpler to adapt than those of CMA-ES. In detail, we use Eq. (1) for mutation and Eq. (3) for crossover. More importantly, offspring-surviving selection is employed: offspring are always selected, while parents are always eliminated from the population. In this way, the distribution of the population varies significantly. Although fitness may worsen after the change of distribution, the stagnation is broken. Then, CMA-ES is recalled to search in a different region.

Table 3. Results of the ten algorithms for the CEC 2014 functions with dimensionality 50. "+" or "−" denotes that the current result is significantly better or worse than the result of our IR-CMA-ES in terms of Wilcoxon's rank-sum test at the 0.05 significance level, respectively. Meanwhile, "≈" represents that there is no significant difference.

Table 4. Results of the ten algorithms for the CEC 2014 functions with dimensionality 100. "+" or "−" denotes that the current result is significantly better or worse than the result of our IR-CMA-ES in terms of Wilcoxon's rank-sum test at the 0.05 significance level, respectively. Meanwhile, "≈" represents that there is no significant difference.

Table 5. Results of the ten algorithms for the CEC 2017 functions with dimensionality 30.
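A minimal sketch of the redistribution phase just described, DE/rand/1 mutation with binomial crossover and offspring-surviving selection, is given below; the function name, loop structure, and parameter defaults are our assumptions, not the authors' code:

```python
import numpy as np

def de_redistribution(pop, G_D, F=1.0, Cr=0.5, rng=None):
    """Individuals-redistribution phase: run G_D generations of DE in
    which every trial vector replaces its parent regardless of fitness
    (offspring-surviving selection), so the population distribution is
    changed even if fitness temporarily worsens."""
    rng = rng or np.random.default_rng()
    NP, D = pop.shape
    for _ in range(G_D):
        new_pop = np.empty_like(pop)
        for i in range(NP):
            # DE/rand/1 mutation (Eq. 1) with distinct r1, r2, r3 != i
            r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            # binomial crossover (Eq. 3)
            jrand = rng.integers(D)
            mask = rng.random(D) <= Cr
            mask[jrand] = True
            new_pop[i] = np.where(mask, v, pop[i])
        pop = new_pop  # offspring always survive; parents are eliminated
    return pop
```

Because no fitness comparison is performed, this phase deliberately trades solution quality for diversity, which is what allows CMA-ES to resume its search in a different region afterwards.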
"+" or "−" denotes that the current result is significantly better or worse than the result of our IR-CMA-ES in terms of Wilcoxon's rank-sum test at the 0.05 significance level, respectively. Meanwhile, "≈" represents that there is no significant difference.

Table 6. Results of the ten algorithms for the CEC 2017 functions with dimensionality 50. "+" or "−" denotes that the current result is significantly better or worse than the result of our IR-CMA-ES in terms of Wilcoxon's rank-sum test at the 0.05 significance level, respectively. Meanwhile, "≈" represents that there is no significant difference.

Our IR-CMA-ES is described in Algorithm 1.

Algorithm 1
The pseudo-code of IR-CMA-ES.
Input: NP, the population size; G_N, the upper limit on the number of sequential generations with no significant improvement in average fitness; T, the threshold for the improving ratio; G_D, the number of generations for DE.
Parameters: f_p, the average fitness in the previous generation; f_c, the average fitness in the current generation; f_ar, the best fitness found after the most recent round of DE; f*, the best fitness found during the course of execution.

if g_n == G_N then
    Record the best found fitness
    sign = 1
    if f* ≥ f_ar then
    ...

We give the parameters of IR-CMA-ES as follows. Firstly, the parameters for CMA-ES are set according to 2 and omitted here. Meanwhile, for DE with offspring-surviving selection, F = 1 and Cr = 0.5. Then, for the parameters related to our scheme, suggested values are given in Table 8.

Table 7. Results of the ten algorithms for the CEC 2017 functions with dimensionality 100. "+" or "−" denotes that the current result is significantly better or worse than the result of our IR-CMA-ES in terms of Wilcoxon's rank-sum test at the 0.05 significance level, respectively. Meanwhile, "≈" represents that there is no significant difference.
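The stagnation-detection rule implied by the inputs G_N and T might be sketched as below. The exact definition of the improving ratio is our assumption (relative improvement of the current average fitness over the previous one), since it is not stated verbatim in the listing:

```python
def update_stagnation_counter(g_n, f_prev, f_curr, T):
    """Count sequential generations with no significant improvement in
    average fitness (minimization assumed): if the relative improvement
    of f_curr over f_prev falls below the threshold T, increment the
    counter g_n; otherwise reset it. When g_n reaches G_N, the DE
    redistribution phase would be triggered."""
    improving_ratio = (f_prev - f_curr) / abs(f_prev) if f_prev != 0 else 0.0
    return g_n + 1 if improving_ratio < T else 0
```

With this rule, the main loop compares the returned counter against G_N each generation and hands the population to DE once the limit is reached.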