Machine learning-enabled globally guaranteed evolutionary computation

Evolutionary computation, for example, particle swarm optimization, has impressive achievements in solving complex problems in science and industry; however, an important open problem in evolutionary computation is that there is no theoretical guarantee of reaching the global optimum and general reliability; this is due to the lack of a unified representation of diverse problem structures and a generic mechanism by which to avoid local optima. This unresolved challenge impairs trust in the applicability of evolutionary computation to a variety of problems. Here we report an evolutionary computation framework aided by machine learning, named EVOLER, which enables the theoretically guaranteed global optimization of a range of complex non-convex problems. This is achieved by: (1) learning a low-rank representation of a problem with limited samples, which helps to identify an attention subspace; and (2) exploring this small attention subspace via the evolutionary computation method, which helps to reliably avoid local optima. As validated on 20 challenging benchmarks, this method finds the global optimum with a probability approaching 1. We use EVOLER to tackle two important problems: power grid dispatch and the inverse design of nanophotonics devices. The method consistently reached optimal results that were challenging to achieve with previous state-of-the-art methods. EVOLER takes a leap forwards in globally guaranteed evolutionary computation, overcoming the uncertainty of data-driven black-box methods, and offering broad prospects for tackling complex real-world problems.

There are two critical difficulties of classical PSO methods that seriously undermine trust in their application to real-world problems emerging in science and industry. For one thing, there is no theoretical guarantee of attaining the global optimum with evolutionary computation methods 21; this has been an important open problem for more than 50 years 41. Owing to the lack of a unified description of diverse problem structures and of a generic mechanism to avoid local optima, globally guaranteed evolutionary computation remains a challenging task, especially for complex non-convex problems. This leads to the so-called uncertainty problem; that is, after a limited computation time, one can hardly know whether or not the returned solution is globally optimal. For another, according to the celebrated no free lunch (NFL) theorem 42, almost all such heuristic methods are problem-dependent. That is to say, every refined mechanism is suited only to specific geometric structures (for example, a convex envelope or smoothness), yet is ineffective or even problematic for problems of different structures. Consequently, there is no universal best method for general problems 21,43,44. This presents another critical difficulty, especially when handling complex real-world problems with unexplored properties.
In this article we report a new evolutionary computation framework aided by machine learning, named EVOLER, which enables the theoretically guaranteed global optimization of complex non-convex problems. By leveraging the common low-rank structure of real-world problems, which is barely considered in evolutionary computation, our new method involves two stages (Fig. 1). First, with a small portion of structured samples (Fig. 1ai), a low-rank representation of the problem is learned (Fig. 1aii), which helps to identify one small attention subspace (Fig. 1aiii). Second, this identified subspace is explored via evolutionary computation methods (Fig. 1b), which offers a generic mechanism to reliably avoid local optima. As shown by empirical and theoretical results, EVOLER finds the global optimum with a probability approaching 1 for rank-restricted problems (Figs. 2c and 3u,v). Moreover, it substantially extends the applicability of evolutionary computation to diverse problems with ubiquitous low-rank structures, and exhibits the best performance on 20 challenging benchmarks 30,45 (see Fig. 3). Furthermore, the convergence can be accelerated by orders of magnitude (Figs. 2f and 3).
We demonstrate our new optimizer on two important real-world problems: power grid dispatch and the inverse design of nanophotonics devices. For the first problem, EVOLER almost surely finds an optimal economic cost that has not been reported before, making global optimization possible subject to equality/inequality constraints. For the second problem, it consistently obtains the optimal nanophotonic structures and attains the most favourable transmittance responses. Past evolutionary computation methods, however, find only local optima or exhibit poor performance. Meanwhile, the number of function evaluations (or samples) is reduced by ~5-10-fold with EVOLER, thus enabling many rapid-prototyping applications. By marrying low-rank representation with evolutionary computation, our method makes a substantial stride towards theoretically guaranteed global optimization.

Learning a low-rank representation
In the first stage, structured samples of the discrete problem space F = f(X) ∈ ℝ^{M×N} are organized into two sub-matrices C ∈ ℝ^{M×s} and R ∈ ℝ^{s×N} (where s ≪ min(M, N)); see Fig. 1ai for the sampling result of a non-convex Schwefel function (Supplementary Table 1); s = 3, M = N = 100. Targeting a sample-efficient reconstruction of F, we turn to learning another mapping function h : ℝ^{(M×s)×(s×N)} ↦ ℝ^{M×N}, so that F̂ = h(C, R) ≈ f(X) (Fig. 1aii). This formulated model may be intractable, as it is usually underdetermined. Fortunately, for low-rank problems (that is, rank(F) ≃ r ≪ min(M, N)), we are able to apply randomized matrix approximation techniques 46-49 and hence approximate h via one special form: F̂ = h(C, R) ≜ CWR (Fig. 2a). Here, W ∈ ℝ^{s×s} is a weighting matrix that can be directly obtained (see Methods for more details and the extension to d-dimensional problems).
Taking a 2D Schwefel function as an example, the learned low-rank representation is shown in Fig. 1aiii; the residual error between the real function values f(x) and the reconstructed ones f̂(x) is given in Fig. 2b. Our new method has two key advantages over classical neural networks. First, it demonstrates a promising extrapolation capacity and thus attains very high sample efficiency: we can reconstruct the unknown problem space of MN samples using only s(M + N) − s² samples, where s ≈ r log r ≪ min(M, N). Second, it permits an analytical expression for the reconstructed space F̂, as well as a bounded residual error 46,47. As such, it further enables a theoretical analysis of the probability of finding the global optimum (see Supplementary Section 2 for details).
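The sampling arithmetic above can be made concrete with a small numerical sketch. The following Python snippet reconstructs a Schwefel-style additive test function on a 100 × 100 grid from s = 3 sampled rows and columns, with the weighting matrix taken as the pseudo-inverse of the sampled intersection block; the grid bounds, random indices and rcond cutoff are illustrative choices for this sketch, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretize a 2D Schwefel-style additive function on an M x N grid.
# Because f(x, y) = g(x) + g(y), the matrix F has rank <= 2 (rank-restricted).
M = N = 100
x = np.linspace(-500.0, 500.0, M)
y = np.linspace(-500.0, 500.0, N)
g = lambda t: -t * np.sin(np.sqrt(np.abs(t)))
F = g(x)[:, None] + g(y)[None, :]

s = 3  # sampling length, s << min(M, N)
rows = rng.choice(M, size=s, replace=False)   # row indexing set
cols = rng.choice(N, size=s, replace=False)   # column indexing set

C = F[:, cols]                                # M x s column samples
R = F[rows, :]                                # s x N row samples
# s x s weighting matrix; rcond truncates numerically negligible singular values
W = np.linalg.pinv(F[np.ix_(rows, cols)], rcond=1e-8)

F_hat = C @ W @ R                             # reconstructed problem space

n_samples = s * (M + N) - s**2                # 591 evaluations instead of 10,000
rel_err = np.linalg.norm(F - F_hat) / np.linalg.norm(F)
print(n_samples, rel_err)
```

For this strictly rank-restricted space the relative reconstruction error is at machine-precision level, while only s(M + N) − s² of the MN grid values are ever evaluated.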

Evolutionary computation
Relying on the low-rank representation learned in the first stage, we identify one attention subspace 𝒮. As its name suggests, this small subspace probably contains the global solution and hence claims the attention of the subsequent evolutionary computation. The initial population x_k^0 (k = 1, 2, ⋯, K) is thus placed around 𝒮, and then the canonical or local-version PSO 18,19,21 can be used to exploit this small subspace (see Methods for more details). Note that other evolutionary computation methods 50, such as the genetic algorithm and differential evolution, can be combined with our method. Similarly, classical optimization methods such as the downhill simplex algorithm 51 are also applicable.
After the total of T generations, the final solution x_T is obtained. As noted, the identification of the attention subspace offers a generic mechanism to reliably avoid local optima (Supplementary Theorem 3). Incorporating evolutionary computation or other methods, EVOLER converges to the global solution x* with a probability approaching 1, as demonstrated on widespread rank-restricted problems (see the approximately low-rank problems in Supplementary Fig. 1b-f and Fig. 6b, and the strictly low-rank problems in Supplementary Figs. 1a,g,j and Fig. 5b). More importantly, EVOLER establishes the first theoretical result on the global optimum of non-convex problems with common low-rank structures; that is, regardless of the diverse geometry properties of problems, the lower-bound probability of finding the global optimum is given by

Pr{x_T → x*} ≥ (1 − ϵ)(1 − ΔE(s)/Δf²).
Here ϵ is a small constant (see Supplementary Section 2). This lower-bound probability relates to two important properties of the objective function. First, Δf ≜ |f(x*) − f(x_local)| reflects the local difference (Fig. 2c), that is, the distance between the global optimum f(x*) and the next local optimum f(x_local). Second, ΔE(s) ∝ ∑_{j=s+1}^{min(M,N)} σ_j² is the error bound of ∥F − F̂∥²_F and reflects the global coherence, that is, the extent to which the other values can be accurately represented with few samples. Here, σ_j denotes the jth singular value of F. We consider again the non-convex Schwefel function to illustrate our theoretical result. After obtaining the low-rank representation F̂ (Fig. 1aiii), we estimate Δf > 100 and ΔÊ(s) ≃ 10⁻¹² (s = 3, M = N = 100). In line with the estimated lower bound in Fig. 2c, EVOLER finds the global optimum almost surely, with a probability approaching 1 (Fig. 2e). By contrast, classical optimization methods almost inevitably fall into local optima in this case (Fig. 2d,g).

Representative benchmark problems
We examine EVOLER on 20 benchmark problems that are commonly used in evolutionary computation 30,45, including unimodal, banana-valley, non-continuous, non-differentiable and multimodal functions. Each method was evaluated via N_t independent trials with random restarts, so that 1 − 1/N_t ≥ Pr{x_T → x*}. The averaged convergence curves were thus derived (see Supplementary Table 3 for the statistical results); the empirical probability of finding the global optimum was also obtained. As the number of function evaluations in each generation may differ between methods, we focus on the convergence performance with regard to the number of function evaluations. For d = 2 we set K = 50, with the experimental results shown in Fig. 3; for d = 30, K = 100, with the results shown in Fig. 4 and Supplementary Fig. 5. As mentioned, one critical challenge of classical evolutionary computation methods is the uncertainty problem; that is, we can never know whether or not the returned solution is globally optimal. Apart from this, they are especially vulnerable to varying problem structures. Although some methods may find the global optimum in special cases (for example, MPCPSO on the Griewank function with its convex envelope), none of them achieves good performance across all such benchmarks (Fig. 3a,t). By contrast, our method overcomes these two pitfalls inherent to evolutionary computation. From Fig. 3, EVOLER finds the global optimum with probability 1, regardless of the diverse geometry properties of the different benchmark problems. This further validates our theoretical result: as the benchmark problems generally have low-rank structures (see Supplementary Fig. 1), when s ≈ r log(r/ϵ) we have Pr{x_T → x*} → 1. See Fig. 2c for the theoretical lower-bound probability of the Schwefel function, and Fig. 3v for the theoretical lower-bound probability of convergence to the global optimum for the other benchmark functions, together with the empirical results.
As such, even for challenging non-convex problems, EVOLER attains the global optimum almost surely, whereas classical deterministic or heuristic methods fail to do so (Figs. 2g and 3u).
We also observe that EVOLER attains the best performance in all cases. In contrast to classical PSO methods, which explicitly or implicitly assume specific geometry properties, EVOLER only requires a discrete problem space to be rank-restricted. Note that this low-rank structure is easy to evaluate in practice (see the online rank estimation in Supplementary Section 1a and Supplementary Fig. 3) and, moreover, is ubiquitous in applications. As a result, EVOLER substantially extends the applicability of evolutionary computation to diverse problems with common low-rank structures, holding great promise for real-world problems with unexplored properties. Moreover, with the sample-efficient reconstruction of the attention subspace, the sample/time complexity is substantially reduced. For example, the required samples of EVOLER are reduced by more than 30-fold compared with CLPSO in the case of the Schwefel function (Fig. 3g). For a fair comparison, the samples of the first stage are also taken into account.
Our method also scales to the more challenging 30-dimensional problems. Taking the shifted Levy function as an example, EVOLER almost surely converges to the near-global optimum within 1.45 s; by contrast, classical PSO methods fail to do so with equal time budgets (see the averaged convergence curves in Fig. 4a, and the statistical distribution of the fitness achieved by various methods in Fig. 4b).
Similar results are obtained on the other 30-dimensional benchmark problems (see Supplementary Fig. 5 for the averaged convergence curves and Fig. 4d for the probability of convergence to the global optimum; 500 independent trials with random restarts). Given a maximum of 900 generations, the averaged running times of the various methods are also reported (Fig. 4c).

Dispatch of power grid
Economic dispatch of a power grid is a classical yet still challenging real-world problem, characterized by intrinsic difficulties in global optimization, that is, a non-smooth fitness with many local optima 12-14,52.
Despite the applications of evolutionary computation 14,52-54, this important problem remains unsolved. On the one hand, it is not known whether the global solution can be found; on the other hand, the probability of finding it has rarely been considered. Assume the power grid consists of d units; each has a maximum power P_j^max and a minimum power P_j^min, so that the output power satisfies P_j^min ≤ P_j ≤ P_j^max (see Fig. 5a). Meanwhile, the accumulated power of the d units must be a constant, that is, ∑_{j=1}^d P_j = P_total. Each unit has a non-smooth cost function f_j(P_j) = a_j + b_j P_j + c_j P_j² + |d_j sin[e_j(P_j^min − P_j)]|, where [a_j, b_j, c_j, d_j, e_j, P_j^min, P_j^max] is the parameter set of unit j, available in ref. 55. The goal is to optimize the output powers of the d units, {P_j}_{j=1}^d, so as to minimize the accumulated economic cost f = ∑_{j=1}^d f_j(P_j), subject to the above inequality and equality constraints.
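The dispatch objective above is straightforward to evaluate programmatically. A minimal sketch follows; the three-unit coefficient values are placeholders for illustration only (the real parameter sets come from ref. 55), and enforcing the box and power-balance constraints is left to the optimizer.

```python
import numpy as np

def unit_cost(P, a, b, c, d, e, P_min):
    """Non-smooth fuel cost of one unit, including the |d*sin(...)| valve-point term."""
    return a + b * P + c * P**2 + abs(d * np.sin(e * (P_min - P)))

def total_cost(P, params):
    """Accumulated economic cost f = sum_j f_j(P_j) over all units."""
    return sum(unit_cost(P_j, *p) for P_j, p in zip(P, params))

# Illustrative 3-unit system; coefficients (a, b, c, d, e, P_min) are placeholders.
params = [
    (500.0, 5.3, 0.004, 50.0, 0.063, 100.0),
    (400.0, 5.5, 0.006, 40.0, 0.098, 100.0),
    (200.0, 5.8, 0.009, 30.0, 0.074, 50.0),
]
P = [300.0, 300.0, 260.0]  # candidate dispatch; must also satisfy sum(P) == P_total
print(total_cost(P, params))
```

Because of the absolute-value sine (valve-point) term, f is non-smooth and multimodal, which is exactly why derivative-free global optimization is needed here.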
We first study the power dispatch with d = 3 by applying evolutionary computation methods. As we are interested in trusted global optimization, each method was evaluated independently over 50 trials and the averaged fitness was obtained (Fig. 5c). From the singular values of the 3D fitness, this real-world problem is strictly low-rank (Fig. 5b). We thus expect EVOLER to reliably find the global optimum. As seen, EVOLER indeed converges to a cheaper optimal cost than the classical methods, after a small number of function evaluations. The minimal expected cost of the classical methods is US$8,234.51 per 860 MWh (DSNPSO), whereas that of EVOLER is US$8,233.66.
We then evaluate the best solutions of the different methods and the probabilities of finding them. For the classical methods, we estimate an upper-bound probability of successful results; that is, sub-optimal results falling in [f_j^best, (1 + κ)f_j^best] are treated as successful, where f_j^best is the best result achieved by method j and κ = 10⁻³. From Fig. 5d, EVOLER finds the optimal solution with probability 1. The best cost of EVOLER is US$8,233.66 per 860 MWh. Among the classical methods, the minimal cost was also achieved by GCPSO (US$8,233.66), yet only with a small probability of 0.34. Combined with the theoretical result on low-rank problems, this confirms that EVOLER attains the global optimum almost surely. Furthermore, EVOLER accelerates the evolutionary computation by more than 15-fold compared with CLPSO (Fig. 5e).
We further demonstrate the advantage of EVOLER for d = 6 (see Supplementary Fig. 6 for the case of d = 13). From Fig. 5f, it converges stably to the minimal cost of US$15,303.872 per 1,260 MWh, whereas the classical methods only achieve US$15,352.396 in expectation. By a rough calculation, given that the power consumption of the whole world in 2021 was about 28.32 × 10¹² kWh, the economic cost would be reduced by 1.1 × 10⁹ dollars per year. From Fig. 5g, EVOLER again finds the global solution with probability 1. By contrast, the classical methods are more likely to end up with local solutions. Whereas classical methods have an intrinsic defect of problem dependency (for example, GCPSO is best for d = 3, DNSPSO for d = 6 and CLPSO for d = 13), our method consistently attains the best performance (see Fig. 5d,g and Supplementary Fig. 6).

Inverse design of nanophotonics devices
Inverse design of nanostructures is essential to nanophotonics 5,6; it has begun to reshape the landscape of structures available to nanophotonics 4 and has been applied in many fields 7,56,57, such as optical physics, biosensors and optical communications. This computational task is very challenging, with elusive nanophotonics mechanisms and non-unique local solutions. At present, there are two common strategies to handle it 4,8: machine learning and evolutionary computation. The former adopts neural networks to model the complex relationship between nanophotonic structures and spectral responses 58-60, yet lacks critical interpretability. The latter directly optimizes nanophotonic devices 7,56,61, but inevitably yields non-unique local solutions. Most importantly, both fail to obtain the optimal structure, severely limiting the functionality of various wavelength-sensitive nanophotonic devices such as biosensors 5,56,62 and optical switches 63. Despite its great importance, the optimal inverse design of nanophotonic devices remains an open problem.
Here we consider one representative inverse-design problem, multilayer thin-film structures 4-6,56,59, which aims to optimize the thicknesses of L dielectric layers (SiO2 and Si3N4), subject to a target transmittance response. Note that our method can be generalized to similar problems, such as jointly optimizing the material types or the number of periods 6. In the experiment, d = 5 thickness parameters of L = 18 dielectric layers need to be adjusted for two desirable responses (Fig. 6a). In the first case, the desirable response involves two bell-shaped transmittance bands (Fig. 6e; 3 dB bandwidth = 85 nm, with the two centre wavelengths located at 535 nm and 760 nm). In the second case, the response contains one sharp transmittance band (Fig. 6g; bandwidth = 200 nm, with the centre wavelength located at 750 nm). As noted, this real-world problem is inherently rank-restricted; see the singular values (Fig. 6b) and a 2D slice of the d-dimensional fitness (Fig. 6c; a2 versus a3; a1 = 120 nm, a4 = 80 nm, a5 = 80 nm; two bell-shaped bands).
Given the thickness parameters a ∈ ℝ^d, the designed response was obtained by finite-difference time-domain (FDTD) methods 64. The fitness is defined as the mean square error between the target response r_t(λ) and a designed response r_d(λ; a), that is, f(a) = ∑_λ [r_t(λ) − r_d(λ; a)]². To demonstrate the global optimality of EVOLER, we first consider one special case with an achievable optimal response, whereby it reliably attains the global solution (Supplementary Fig. 7). Various PSO methods were then evaluated on the two design cases, and the convergence curves were derived via five independent trials (Fig. 6d,f). In the first case, EVOLER consistently converges to the best result, acquiring a minimal mean square error of 24.7 with a probability of 1. By contrast, the best expected fitness of the classical methods is only 26.95, achieved by local PSO (see Supplementary Fig. 8 for their designed responses). Similar results are obtained in the second case (Fig. 6f and Supplementary Fig. 9). When it comes to realistic applications, for example, wavelength-sensitive optical communications 6,7, the transmission efficiency and reliability are substantially enhanced, as the in-band desired signal is maximized while the out-of-band interference is minimized. Another important advantage of EVOLER is that it permits highly efficient computation, which is very promising for time-consuming inverse-design tasks. Recall that each sample in the first stage was evaluated independently, which can be implemented in parallel, thus remarkably reducing the time complexity. From Fig. 6d,f, the computational time is reduced by around tenfold compared with classical PSO; two computers were used in the first stage. That is to say, with a dramatically reduced complexity, EVOLER attains the global optimum with a theoretical guarantee.
As such, it holds great potential for the inverse design of nanophotonic structures, as well as other complex real-world problems.

Discussion
Many important problems emerging in science and industry call for judicious data-driven and derivative-free optimization. Despite broad application prospects, current evolutionary computation methods depend on specific problem structures and fail to acquire globally guaranteed solutions, severely undermining the trust in their applications. Here we report a new framework of evolutionary computation aided by machine learning, EVOLER, which makes it possible to attain the theoretically guaranteed global optimization of complex non-convex problems. As demonstrated on 20 challenging benchmark problems, and two real-world problems with diverse properties yet common low-rank structures, it finds the global optimum almost surely, thus substantially extending applicability. Meanwhile, the number of samples can be remarkably reduced. As such, our method takes a leap forwards in theoretically guaranteed evolutionary computation, overcoming the uncertainty in data-driven black-box methods, and paving the way for solving complex real-world problems. Note that EVOLER also embodies two important biological characteristics that rarely emerge in classical evolutionary computation 65 : (1) neutrality or weak selection at an early stage to preserve the diversity; and (2) major transitions to suddenly discover the promising region of solutions.
According to the theoretical result, EVOLER is generally applicable to d-dimensional problems with low-rank structures. The complexity of our method is 𝒪(dr^{d−1}N + TK) (see Methods and Supplementary Section 3). Our method would be less attractive for problems with high d (for example, d > 100) or without a low-rank structure. There are two remedies for these potential limitations when tackling high-dimensional problems. First, various dimension-reduction techniques are readily applicable. Taking the challenging aerodynamic design problem as an example (ref. 16), a set of orthogonal basis functions is first identified; the high-dimensional problem of d > 20 is then transformed into optimizing the coefficients of the d ≤ 10 principal orthogonal bases. Second, as a future direction, incorporating the concept of hierarchy (for example, the hierarchical reconstruction of the d-dimensional space; see Supplementary Fig. 4) could substantially reduce the complexity. For example, assuming the number of blocks to be L ≤ ⌊√d⌋, the complexity of our method becomes 𝒪(dr^{(d/L−1)}N + TK). Thus, the complexity of solving high-dimensional problems can be effectively reduced (see Supplementary Fig. 5 for the challenging benchmark problems with d = 30, and Supplementary Fig. 6 for the power-dispatch problem with d = 13). Although the wide applicability of EVOLER has been demonstrated, another open question is whether it is possible to further relax the requirements on unknown problems, which deserves further study.

Evolutionary computation with PSO
Particle swarm optimization is inspired by collective intelligence in nature, for example, birds flocking or fish schooling 17-19. In canonical PSO 14,18,21, K particles (or agents) cooperatively search the d-dimensional problem space and update their positions (or solutions) x_k^{t+1} and velocities v_k^{t+1} via:

v_k^{t+1} = w v_k^t + c_1 u_1 ∘ (g_{pbest,k}^t − x_k^t) + c_2 u_2 ∘ (g_{gbest}^t − x_k^t), x_k^{t+1} = x_k^t + v_k^{t+1}. (1)

As seen, updating the particle velocity v_k^t involves three parts. The first part is the inertia term, that is, the dependence of the particle's velocity on time t, v_k^t (k = 0, 1, 2, ⋯, K − 1); the inertia weight is w ∈ ℝ₊. The second part is the cognition term, the influence of the particle's own experience so far; g_{pbest,k}^t = arg min_{x_k^τ, τ≤t} f(x_k^τ) ∈ ℝ^d is the best position of the kth particle so far and f(x) : ℝ^d ↦ ℝ is the fitness function; c_1 ∈ ℝ₊ is a cognition learning factor, u_1 ∈ ℝ^d is a Gaussian variable and ∘ is the Hadamard (element-wise) product. The third part is the social term, the influence of the experience of the whole population; g_{gbest}^t = arg min_k f(g_{pbest,k}^t) ∈ ℝ^d is the best position among the K particles; c_2 ∈ ℝ₊ is the social learning factor and u_2 ∈ ℝ^d is a Gaussian variable.
The above canonical PSO is referred to as global-version PSO, as the particles track the global best g_{gbest}^t of the whole population. There also exists a local-version PSO, whereby each particle only tracks the best position of its local neighbourhood, denoted g_{lbest,k}^t. The particle positions (or solutions) are then updated via the same rule as equation (1), with g_{gbest}^t replaced by g_{lbest,k}^t:

v_k^{t+1} = w v_k^t + c_1 u_1 ∘ (g_{pbest,k}^t − x_k^t) + c_2 u_2 ∘ (g_{lbest,k}^t − x_k^t), x_k^{t+1} = x_k^t + v_k^{t+1}. (2)
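For concreteness, a minimal sketch of the global-version update loop follows. It folds the Gaussian draws for u1 and u2 to absolute values so that the random step scales stay non-negative, and uses illustrative values for w, c1, c2, K and T; it is a didactic sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_minimize(f, lo, hi, K=50, T=200, w=0.3, c1=2.0, c2=1.5):
    """Canonical global-version PSO: inertia + cognition + social terms."""
    d = len(lo)
    x = rng.uniform(lo, hi, size=(K, d))              # particle positions
    v = np.zeros((K, d))                              # particle velocities
    pbest = x.copy()                                  # each particle's best position
    f_pbest = np.apply_along_axis(f, 1, x)
    for _ in range(T):
        gbest = pbest[np.argmin(f_pbest)]             # best position of the population
        u1, u2 = np.abs(rng.standard_normal((2, K, d)))  # folded Gaussian factors
        v = w * v + c1 * u1 * (pbest - x) + c2 * u2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f_x = np.apply_along_axis(f, 1, x)
        improved = f_x < f_pbest
        pbest[improved], f_pbest[improved] = x[improved], f_x[improved]
    return pbest[np.argmin(f_pbest)], float(f_pbest.min())

# Usage: minimize a 2D sphere function on [-5, 5]^2.
x_best, f_best = pso_minimize(lambda z: float(np.sum(z**2)),
                              lo=np.array([-5.0, -5.0]),
                              hi=np.array([5.0, 5.0]))
print(x_best, f_best)
```

On this convex toy problem the swarm collapses onto the origin; on multimodal landscapes the same loop can stall at a local optimum, which is the failure mode EVOLER's first stage is designed to prevent.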

Exploring a global problem space
Taking d = 2 as an example, the original problem space, which is composed of the fitness values on a discrete grid X ∈ ℝ^{M×N} of the continuous variables x ∈ ℝ², is structured into one matrix F = f(X) ∈ ℝ^{M×N}. We then draw structured samples of F, that is, s full columns C = F(:, 𝒞) ∈ ℝ^{M×s} and s full rows R = F(ℛ, :) ∈ ℝ^{s×N}. Here, 𝒞 and ℛ are two indexing sets, with cardinality |𝒞| = |ℛ| = s ≪ min{M, N} (see Supplementary Section 1 for the probability densities of random sampling).

Reconstructing a low-rank representation
Based on the sub-matrices C and R, a low-rank representation space is reconstructed (see Fig. 2a):

F̂ = h(C, R) ≜ CWR, W = Y†, Y ≜ F(ℛ, 𝒞) ∈ ℝ^{s×s}.

Here, the small weighting matrix W ∈ ℝ^{s×s} is directly computed by minimizing the residual error 46-48,66; Y† is the Moore-Penrose pseudo-inverse of Y. When the original space F ∈ ℝ^{M×N} is rank-restricted, that is, rank(F) = r ≪ min(M, N), and the sampling length satisfies s ≈ r log(r/ϵ), the residual error is upper bounded with probability 1 − ϵ (refs. 46,47), that is,

∥F − F̂∥²_F ≤ (1 + η) ∑_{j=s+1}^{min(M,N)} σ_j²,

where ∥X∥_F is the Frobenius norm of X, σ_j is the jth singular value of F and η is a constant related to s. For convenience, we define this upper bound as ΔE(s).

Identifying an attention subspace
Provided a low-rank representation space F̂ ∈ ℝ^{M×N}, an initial solution x̂_opt ∈ X is first identified: x̂_opt = arg min_{x∈X} f̂(x). We define an attention subspace as 𝒮 ≜ {x ∈ X; |x − x̂_opt| < δ}, where δ is the subspace radius. As discussed, the learned low-rank representation F̂ is highly accurate, and thus the attention subspace probably contains the global optimum x_opt = arg min_{x∈X} f(x) of the discrete space F (Supplementary Section 2). When N and M are sufficiently large, X has a very high grid resolution, and hence the subspace 𝒮 would also contain the global solution x* = arg min_{x∈ℝ²} f(x) (Fig. 1aiii). We thus initialize the K particles via one Gaussian distribution centred on x̂_opt, that is, x_k^0 ∼ 𝒩(x̂_opt, σ²I) (k = 1, 2, ⋯, K).
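The subspace identification and population initialization can be sketched as follows; the toy reconstructed space, the radius δ = 25 and the Gaussian standard deviation δ/3 are illustrative assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def attention_init(F_hat, x_grid, y_grid, K=50, delta=25.0):
    """Locate the minimizer of the reconstructed space F_hat, then place the
    initial population around it via a Gaussian distribution."""
    i, j = np.unravel_index(np.argmin(F_hat), F_hat.shape)
    x_opt = np.array([x_grid[i], y_grid[j]])          # initial solution from F_hat
    # Particles concentrated inside the attention subspace |x - x_opt| < delta
    x0 = x_opt + (delta / 3.0) * rng.standard_normal((K, 2))
    return x_opt, x0

# Usage on a toy reconstructed space with its minimum at (0, 0):
xg = yg = np.linspace(-500.0, 500.0, 101)
Xg, Yg = np.meshgrid(xg, yg, indexing="ij")
x_opt, x0 = attention_init(Xg**2 + Yg**2, xg, yg)
print(x_opt, x0.shape)
```

The swarm then starts inside a region that, by the low-rank argument, probably contains the global optimum, rather than being scattered over the full search domain.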

Reconstructing low-rank representation for d > 2
Similar to the case in which d = 2, the high-dimensional (d > 2) space can be reconstructed via few structured samples. For clarity, a high-dimensional problem space is denoted by the tensor ℱ ∈ ℝ N×N×⋯×N , and we assume each dimensionality of the tensor ℱ is equal to N.
When it comes to an arbitrary d-dimensional tensor, the above sampling and reconstruction procedure is directly applicable. In this case, the required number of samples for a structured exploration is 𝒪(ds^{d−1}N), with s = ⌊r log r⌋, where r is the rank of the tensor. To reduce the number of samples, we may apply a multiscale exploration; that is, we first reconstruct one down-sampled problem space and then progressively refine the reconstruction around the identified region (see the hierarchical exploration in Supplementary Fig. 4).

Parameter settings
EVOLER. The hyperparameters in EVOLER can be classified into two subsets. In the first stage, there are two parameters: the discrete length N (for simplicity, we assume N_1 = N_2 = … = N_d = N) and the sampling length s. In practice, the discrete length N is directly set to ~20-120. The rank of the discrete problem space is then estimated (see Supplementary Section 1a for more details), denoted r̂. On this basis, the sampling length is configured as s = ⌊r̂ log r̂⌋ (in practice, we find s = r̂ is also applicable). In the second stage, there are three parameters in the PSO method: the inertia weight w, the cognition learning factor c_1 and the social learning factor c_2. In the tth generation (0 ≤ t ≤ T), we have w = 0.3, c_1 = 2 − 1.5 × t/T and c_2 = 1.5 + 0.5 × t/T, where T is the maximum number of generations. In each independent trial, the K particles are randomly initialized, and the sampling indexes are also randomly determined. For high-dimensional problems (for example, d = 30), the hierarchical exploration is applied (see Supplementary Fig. 15). In this case, the discrete length N is ~3-6, and the iterative hierarchical exploration is terminated when the change in solutions is smaller than 10⁻³.
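The second-stage schedule above amounts to a linear crossover from cognition-dominated early search to socially dominated late search; a small sketch:

```python
def pso_schedule(t, T):
    """EVOLER's second-stage PSO coefficients at generation t (0 <= t <= T)."""
    w = 0.3                      # constant inertia weight
    c1 = 2.0 - 1.5 * t / T       # cognition factor decays from 2.0 to 0.5
    c2 = 1.5 + 0.5 * t / T       # social factor grows from 1.5 to 2.0
    return w, c1, c2

print(pso_schedule(0, 100), pso_schedule(100, 100))  # (0.3, 2.0, 1.5) (0.3, 0.5, 2.0)
```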

Classical methods.
In the experiments, the parameter settings of the classical PSO methods remain exactly the same as in the related references. For these comparative methods, the parameter settings are summarized in Supplementary Table 2.

Data availability
The representative benchmarks for testing evolutionary computation methods are partly available at http://www.sfu.ca/~ssurjano/optimization.html, and partly presented in refs. 30,45. The power-dispatch dataset, that is, the Economic Load Dispatch Test Systems Repository 55, is available at https://al-roomi.org/economic-dispatch. The dataset for the inverse design of multilayer thin-film structures, generated by the FDTD method, is available at GitHub (https://github.com/BinLi-BUPT/EVOLER.git) 71. Source Data are provided with this paper.