Abstract
Evolutionary computation, for example, particle swarm optimization, has impressive achievements in solving complex problems in science and industry; however, an important open problem in evolutionary computation is that there is no theoretical guarantee of reaching the global optimum and general reliability; this is due to the lack of a unified representation of diverse problem structures and a generic mechanism by which to avoid local optima. This unresolved challenge impairs trust in the applicability of evolutionary computation to a variety of problems. Here we report an evolutionary computation framework aided by machine learning, named EVOLER, which enables the theoretically guaranteed global optimization of a range of complex nonconvex problems. This is achieved by: (1) learning a lowrank representation of a problem with limited samples, which helps to identify an attention subspace; and (2) exploring this small attention subspace via the evolutionary computation method, which helps to reliably avoid local optima. As validated on 20 challenging benchmarks, this method finds the global optimum with a probability approaching 1. We use EVOLER to tackle two important problems: power grid dispatch and the inverse design of nanophotonics devices. The method consistently reached optimal results that were challenging to achieve with previous stateoftheart methods. EVOLER takes a leap forwards in globally guaranteed evolutionary computation, overcoming the uncertainty of datadriven blackbox methods, and offering broad prospects for tackling complex realworld problems.
Similar content being viewed by others
Main
Evolutionary computation is essential to complex realworld problems that cannot be solved by classical gradientbased methods, for example, molecular docking or dynamics simulation^{1,2,3}, nanophotonics^{4,5,6,7}, ultrafast lasers^{8,9}, medical diagnosis^{10}, nuclear physics^{11}, power grid dispatch^{12,13,14}, aerodynamics^{15,16} and so on. Particle swarm optimization (PSO), a wellknown evolutionary computation method, is a powerful tool for such challenging problems^{17,18,19,20}. By reconciling two conflicting aspects—exploitation and exploration^{19}—PSO learns from the best experience of the particle itself and the best experience of the whole population^{17,21}; it has therefore attracted great attention due to its high efficiency and flexibility^{22,23}. Various new mechanisms have recently been designed to avoid vanishing diversity or achieve fast convergence^{21}, including: multiple populations^{24,25,26}, learning strategies^{27,28}, velocity update^{29}, parameter settings^{30}, population topology^{31,32} and so on. Furthermore, existing theoretical works have analysed the trajectory of the simplified particle^{33,34}, the secondorder and Lyapunov stabilities^{35,36,37}, and the effect of parameters on stability and convergence^{35,38,39,40}. But there are still two longstanding pitfalls in classical PSO methods that seriously undermine trust in their application to realworld problems emerging in science and industry.
There being no theoretical guarantee of attaining the global optimum of evolutionary computation methods^{21} has been an important open problem for more than 50 years^{41}. Due to the lack of a unified description of diverse problem structures and a generic mechanism to avoid local optima, globally guaranteed evolutionary computation remains a challenging task, especially for complex nonconvex problems. This leads to the socalled uncertainty problem; that is, after a limited computation time, one can hardly know whether or not the returned solution is globally optimal. For another, according to the celebrated no free lunch (NFL) theorem^{42}, almost all such heuristic methods are problemdependent. That is to say, every refined mechanism is only suitable to specific geometric structures (for example, convex envelope, smoothness and so on), yet ineffective or even problematic to the other problems of different structures. Consequently, there is no universal best method for general problems^{21,43,44}. This presents another critical difficulty, especially when handling complex realworld problems with unexplored properties.
In this article we report a new evolutionary computation framework aided by machine learning, named EVOLER, which enables the theoretically guaranteed global optimization of complex nonconvex problems. By leveraging the common lowrank structure of realworld problems—which is barely considered in evolutionary computation—our new method involves two stages (Fig. 1). First, with a small portion of structured samples (Fig. 1ai), a lowrank representation of the problem is learned (Fig. 1aii), which helps to identify one small attention subspace (Fig. 1aiii). Second, this identified subspace is explored via evolutionary computation methods (Fig. 1b), which offers a generic mechanism to reliably avoid local optima. As shown by empirical and theoretical results, EVOLER finds the global optimum with a probability approaching 1 for rankrestricted problems (Figs. 2c and 3u,v). Moreover, it substantially extends the applicability of evolutionary computation in diverse problems with ubiquitous lowrank structures, and exhibits the best performance in 20 challenging benchmarks^{30,45} (see Fig. 3). Furthermore, the convergence can be accelerated by orders of magnitude (Figs. 2f and 3).
We demonstrate our new optimizer in two important realworld problems: power grid dispatch and the inverse design of nanophotonics devices. For the first problem, EVOLER almost surely finds the optimal economic cost that was not reported, making global optimization possible subject to equality/inequality constraints. For the second problem, it consistently gains the optimal nanophotonic structures and attains the most favourable transmittance responses. Past evolutionary computation methods, however, find only local optima or acquire bad performance. Meanwhile, the number of function evaluations (or samples) is reduced by ~5–10fold with EVOLER, thus enabling many rapidprototyping applications. By a marriage of lowrank representation and evolutionary computation, our method makes a substantial stride towards theoretically guaranteed global optimization, overcoming the inherent uncertainty of blackbox methods and offering broad prospects for solving challenging realworld problems.
Evolutionary computation with machine learning
We are interested in applying evolutionary computation to solve a prevalent problem, that is, finding the global optimum of a complex nonanalytic function \(f:{{\mathbb{R}}}^{d}\mapsto {{\mathbb{R}}}^{1}\). Our new method consists of two stages: (1) learning the lowrank representation, and (2) performing the evolutionary computation (Fig. 1 and Supplementary Fig. 2). First we reconstruct a lowrank representation of unknown problem space from limited samples (Fig. 1a). This learned global representation helps to identify one small attention subspace, which potentially contains the global optimum. Second, we use the optimization methods to explore this small subspace (Fig. 1b) by reliably rejecting the local optima. As the learned lowrank representation has the analytical expression and the bounded error, the theoretical probability of finding the global optimum can be derived, thus overcoming the uncertainty of blackbox evolutionary computation.
Learning lowrank representation
Classical machine learning methods can be applied in the first stage. For example, one may directly use J samples \({\{{{{{\bf{x}}}}}_{j},f({{{{\bf{x}}}}}_{j})\}}_{j = 1}^{J}\) to train one neural network, thus modelling the complex unknown function \(\hat{f}:{{\mathbb{R}}}^{d}\mapsto {{\mathbb{R}}}^{1}\) by \(\hat{f}({{{\bf{x}}}})=g({{{\bf{W}}}}{{{\bf{x}}}}+b)\) (a single layer neural network is considered here for simplicity), where \({{{\bf{W}}}}\in {{\mathbb{R}}}^{1\times d}\) is the learned network weight, g(⋅) is the nonlinear activation function and \(b\in {\mathbb{R}}\) is the bias. However, there are two limitations to this classical method in lowrank representation learning. First, it requires huge samples to explore the whole problem space, each of which {x_{j}, f(x_{j})} corresponds to one time function evaluation. Second, it acquires unexplainable weights (that is, W), which hinders further theoretical analysis. To this end, we design a lowrank representation learning method to reconstruct a problem space from limited samples (Fig. 1aii). Our method is inspired by the ubiquitous lowrank structures of realworld problems (see Supplementary Fig. 1 for diverse benchmark problems, Fig. 5b for the powerdispatch problem and Fig. 6b for the nanophotonics inversedesign problem).
For simplicity, we start with the twodimensional problems. We first explore the discrete problem space \({{{\bf{F}}}}=f({{{\bf{X}}}})\in {{\mathbb{R}}}^{M\times N}\) by randomly sampling s rows/columns (M and N denote the grid sizes of a solution space \({{{\bf{X}}}}\in {{\mathbb{R}}}^{M\times N}\)). The sampled function values \({\{f({{{{\bf{x}}}}}_{j})\}}_{j = 1}^{s(M+N){s}^{2}}\) are structured into two submatrices \({{{\bf{C}}}}\in {{\mathbb{R}}}^{M\times s}\) and \({{{\bf{R}}}}\in {{\mathbb{R}}}^{s\times N}\) (where \(s\ll \min (M,N)\)); see Fig. 1ai for the sampling result of a nonconvex Schwefel function (Supplementary Table 1); s = 3, M = N = 100. Targeting a sampleefficient reconstruction of F, we turn to learning another mapping function \(h:{{\mathbb{R}}}^{(M\times s)\times (s\times N)}\mapsto {{\mathbb{R}}}^{M\times N}\), so that \(\hat{{{{\bf{F}}}}}=h({{{\bf{C}}}},{{{\bf{R}}}})=\hat{f}({{{\bf{X}}}})\) (Fig. 1aii). This formulated model may be intractable as it is usually underdetermined. Fortunately, for the lowrank problems (that is, \(\,{{\mbox{rank}}}\,({{{\bf{F}}}})\simeq r\ll \min (M,N)\)), we are able to apply randomized matrix approximation techniques^{46,47,48,49} and hence approximate h via one special form: \(\hat{{{{\bf{F}}}}}=h({{{\bf{C}}}},{{{\bf{R}}}})\triangleq {{{\bf{CWR}}}}\) (Fig. 2a). Here, \({{{\bf{W}}}}\in {{\mathbb{R}}}^{s\times s}\) is one weighting matrix that can be directly obtained (see Methods for more details and the extension to ddimensional problems).
Taking a 2D Schwefel function, for example, the learned lowrank representation is shown in Fig. 1aiii; the residual error between real function values f(x) and reconstructed ones \(\hat{f}({{{\bf{x}}}})\) is given in Fig. 2b. Our new method would have two key advantages compared with classical neural networks. First, it demonstrates a promising extrapolation capacity and thus attains very high sample efficiency. To be specific, we can reconstruct the unknown problem space of MN samples by only using [s(M + N) − s^{2}] samples, where \(s \approx {{{\mathcal{O}}}}(r\log r)\ll \min (M,N)\). Second, it permits an analytical expression for the reconstructed space \(\hat{{{{\bf{F}}}}}\), as well as the bounded residual error^{46,47}. As such, it further enables a theoretical analysis on the probability of finding the global optimum (see Supplementary Section 2 for details).
Evolutionary computation
Relying on the lowrank representation learned from the first stage, we identify one attention subspace \({{{\mathcal{A}}}}\). As its name suggests, this small subspace \({{{\mathcal{A}}}}\) probably contains the global solution and hence claims the attention of the next evolutionary computation. The initial population \({{{{\bf{x}}}}}_{k}^{0}(k=1,2,\cdots \,,K)\) is thus placed around \({{{\mathcal{A}}}}\), and then canonic or localversion PSO^{18,19,21} can be used to exploit this small subspace (see Methods for more details). Note that other evolutionary computation methods^{50} such as genetic algorithm and differential evolution can be combined with our method. Similarly, classical optimization methods such as the downhill simplex algorithm^{51} will be also applicable.
After the total T generations, the final solution x^{T} is obtained. As noted, the identification of attention subspace offers a generic mechanism to reliably avoid the local optima (Supplementary Theorem 3). Incorporating the evolutionary computation or other methods, EVOLER converges to the global solution x* with a probability approaching 1, as demonstrated in widespread rankrestricted problems (see approximately lowrank problems in Supplementary Fig. 1b–f and Fig. 6b, and strictly lowrank problems in Supplementary Figs. 1a,g,j and Fig. 5b). More importantly, EVOLER establishes the first theoretical result on the global optimum of the nonconvex problems with common lowrank structures; that is, regardless of diverse geometry properties of problems, the lower bound probability of finding the global optimum is given by
Here ϵ is a small constant (see Supplementary Section 2). This lower bound probability relates to two important properties of the objective function. First, Δf ≜ ∣ f(x*) − f(x^{local})∣ reflects the local difference (Fig. 2c), that is, the distance between the global optimum f(x*) and the next local optimum f(x^{local}). Second, \(\Delta E(s)\propto {\left[\frac{1}{MN}\mathop{\sum }\nolimits_{j = s+1}^{\min (M,N)}{\sigma }_{j}^{2}\right]}^{1/2}\) is the error bound of \(\parallel {{{\bf{F}}}}\hat{{{{\bf{F}}}}}{\parallel }_{F}^{2}\) and reflects the global coherence, that is, the extent of the accuracy of representing the other values with few samples. Here, \({\sigma }_{j}\) denotes the jth singular value of F. We consider again the nonconvex Schwefel function to illustrate our theoretical result. After obtaining the lowrank representation \(\hat{{{{\bf{F}}}}}\) (Fig. 1aiii), we estimate \(\Delta \hat{f} > 100\) and \(\Delta \hat{E}(s)\simeq 1{0}^{12}\) (s = 3, M = N = 100). In line with the estimated lower bound in Fig. 2c, EVOLER finds the global optimum almost surely, with probability approaching 1 (Fig. 2e). By contrast, classical optimization methods almost inevitably fall into the local optimums in this case (Fig. 2d,g).
Results
Representative benchmark problems
We examine EVOLER in 20 benchmark problems that are commonly used in evolutionary computation^{30,45}, including: unimodal, bananavalley, noncontinuous, nondifferentiable and multimodal functions, the rotated versions, the shifted versions and the hybrid composite functions (see Supplementary Table 1 for details on these benchmark functions). Classical methods of various new mechanisms were evaluated, including: multiple populations, for example, MPCPSO^{26}, MCPSO^{25} and CPSOH/S^{24}; improved learning strategies, for example, CLPSO^{28} and NPSO^{27}; refined position update, for example, DPSO^{29}; parameter settings, for example, DNSPSO^{30}; and population topology, for example, GCPSO^{31} (see Methods for the parameter settings of EVOLER and Supplementary Table 2 for the parameter settings of classical methods). In the experiments, each method was evaluated for N_{t} = 500 independent trials with random restarts, so that 1 − 1/N_{t} ≥ Pr{x^{T} → x*}. The averaged convergence curves are thus derived (see Supplementary Table 3 for the statistical results); the empirical probability of finding the global optimum was also obtained. As the number of function evaluations in each generation may be different for various methods, we focus on the convergence performance with regards to the number of function evaluations. For d = 2, we assume K = 50 and the experimental results are shown in Fig. 3; for d = 30, K = 100 and the results are shown in Fig. 4 and Supplementary Fig. 5.
As mentioned, one critical challenge of classical evolutionary computation methods is the uncertainty problem; that is, we can never know whether or not the returned solution is globally optimal. Apart from this, they are especially vulnerable to varying problem structures. Although some methods may find the global optimum in special cases (for example, MPCPSO in the Griewank function with the convex envelope), none of them can gain the good performance in all such benchmarks (Fig. 3a,t). By contrast, our method potentially overcomes the two pitfalls inherent to evolutionary computation.
From Fig. 3, EVOLER finds the global optimum with probability 1, regardless of the different benchmark problems of diverse geometry properties. This further validates our theoretical result. As the benchmark problems generally have lowrank structures (see Supplementary Fig. 1), when \(s \approx {{{\mathcal{O}}}}\{r\log (r/\epsilon )\}\), we have Pr{x^{T} → x*} → 1. See Fig. 2c for the theoretical lower bound probability of the Schwefel function, and Fig. 3v for the theoretical lower bound probability of the convergence to the global optimum of the other benchmark functions, as well as the empirical results. As such, even for the challenging nonconvex problems, EVOLER would gain the global optimum almost surely, whereas classical deterministic or heuristic methods fail to do so (Figs. 2g and 3u).
We also observe that EVOLER attains the best performance in all cases. In contrast to classical PSO methods explicitly or implicitly assuming specific geometry properties, in EVOLER a discrete problem space needs only to be rankrestricted. Note that this lowrank structure is easy to evaluate in practice (see the online rank estimation in Supplementary Section 1a and Supplementary Fig. 3) and, moreover, is ubiquitous in application. As a result, EVOLER substantially extends the applicability of evolutionary computation in diverse problems with common lowrank structures, providing great promise to realworld problems with unexplored properties. Moreover, with the sampleefficient reconstruction of the attention subspace, the sample/time complexity is substantially reduced. For example, the required samples of EVOLER are reduced by more than 30fold compared with CLPSO in the case of Schwefel function (Fig. 3g). To even things up, we also take samples of the first stage into account.
Our method could be scalable to the more challenging 30dimensional problems. Taking the shifted Levy function, for example, EVOLER almost surely converges to the nearglobal optimum within 1.45 s; by contrast, classical PSO methods fail to do so with equal time budgets (see the averaged convergence curves in Fig. 4a, and the statistical distribution of the achieved fitness of various methods in Fig. 4b). Similar results are obtained in the other 30dimensional benchmark problems (see Supplementary Fig. 5 for the averaged convergence curves and Fig. 4d for the probability of convergence to the global optimum; 500 independent trials with random restarts). Provided the maximum generation of 900, the averaged running time of various methods is also obtained (Fig. 4c); 30dimensional composite function, 20 trails, CPU = 4 GHz, RAM = 32 GB.
Dispatch of power grid
Economic dispatch of a power grid is a classical yet still challenging realworld problem, characterized by the intrinsic difficulties in global optimization, that is, nonsmooth fitness with many local optima^{12,13,14,52}. Despite the applications of evolutionary computation^{14,52,53,54}, this important problem still remains unsolved. On the one hand, it is not known whether the global solution could be found; on the other hand, the probability of finding it was rarely considered. Assume the power grid consists of d units; each of them has the maximum power \({P}_{j}^{\max}\) and the minimal power \({P}_{j}^{\min}\) so that the output power satisfies \({P}_{j}^{\min}\le {P}_{j}\le {P}_{j}^{\max}\) (see Fig. 5a). Meanwhile, the accumulated power of d units must be a constant, that is, \(\mathop{\sum}\nolimits_{j = 1}^{d}{P}_{j}={P}^{{{{\rm{total}}}}}\). Each unit has a nonsmooth cost function \({f}_{j}({P}_{j})={a}_{j}+{b}_{j}{P}_{j}+{c}_{j}{P}_{j}^{2}+ {d}_{j}\sin [{e}_{j}({P}_{j}^{\min}{P}_{j})];\)\(\left[\right.{a}_{j},{b}_{j},{c}_{j},{d}_{j},{e}_{j},{P}_{j}^{\min},\) \({P}_{j}^{\max}\left.\right]\) is the parameters set of unit j, which are available in ref. ^{55}. The goal is to optimize the output powers of d units, that is, \({\{{P}_{j}\}}_{j = 1}^{d}\), so as to minimize the accumulated economic cost \(f=\mathop{\sum}\nolimits_{j = 1}^{d}{f}_{j}({P}_{j})\), subject to the above inequality and equality constraints.
We first study the power dispatch with d = 3 by applying evolutionary computation methods. As we are interested in the trusted global optimization, each method was evaluated independently for 50 trials and the averaged fitness was obtained (Fig. 5c). From the singular values of 3D fitness, this realworld problem is strictly low ranked (Fig. 5b). We thus expect EVOLER would reliably find the global optimum. As seen, EVOLER indeed converges to a cheaper optimal cost than that of classical methods, after a few number of function evaluations. The minimal expected cost of classical methods is $USD 8,234.51 per 860 MWh (that is, DSNPSO), whereas that of EVOLER is $USD 8,233.66.
We then evaluate the best solutions of different methods and the probabilities of finding them. For classical methods, we estimate an upperbound probability of successful results, that is, the suboptimal results falling to \([{f}_{j}^{{{\,{\rm{best}}}}},(1+\kappa ){f}_{j}^{{{\,{\rm{best}}}}}]\) are treated as successful ones, where \({f}_{j}^{{{\,{\rm{best}}}}}\) is the best result achieved by method j and κ = 10^{−3}. From Fig. 5d, EVOLER finds the optimum solution with probability 1. The best cost of EVOLER is $USD 8,233.66 per 860 MWh. Among classical methods, the minimal cost was achieved by GCPSO, that is, $USD 8,233.66, yet with a small probability of 0.34. Combining the theoretical result on the lowrank problems, it is convinced that EVOLER would gain the global optimum almost surely. Furthermore, EVOLER accelerates the evolutionary computation by more than 15fold compared with CLPSO (Fig. 5e).
We further demonstrate the advantage of EVOLER for d = 6 (see Supplementary Fig. 6 for the case of d = 13). From Fig. 5f, it converges stably to the minimal cost, that is, $USD 15,303.872 per 1,260 MWh, whereas classical methods can only achieve $USD 15,352.396 in expectation. By rough calculation, the economic cost will be reduced by 1.1 × 10^{9} dollars per year, provided that the power consumption of the whole world in 2021 was about 28.32 × 10^{12} KWh. From Fig. 5g, EVOLER again finds the global solution with probability 1. By contrast, classical methods more likely end up with local solutions. Although classical methods have an intrinsic defect of problem dependency (for example, GCPSO for d = 3, DNSPSO for d = 6, CLPSO for d = 13), our method consistently gains the best performance (see Fig. 5d,g and Supplmentary Fig. 6).
Inverse design of nanophotonics devices
Inverse design of nanostructures is essential to nanophotonics^{5,6}, which has begun to reshape the landscape of structures available to nanophotonics^{4}, and has been applied in many fields^{7,56,57} such as optical physics, biosensors, optical communications and so on. This computational task is very challenging, with elusive nanophotonics mechanisms and nonunique local solutions. At the present, there are two common strategies to handle it^{4,8}: machine learning and evolutionary computation. The former adopts neural networks to model the complex relationship between nanophotonics structures and spectrum responses^{58,59,60}, yet lacking the critical interpretability. The latter directly optimizes nanophotonic devices^{7,56,61}, but inevitably acquires nonunique local solutions. Most importantly, both of them fail to gain the optimal structure, severely limiting the functionality of various wavelengthsensitive nanophotonics devices such as biosensors^{5,56,62}, optical switches^{63} and so on. Despite the great importance, the optimal inverse design of nanophotonics devices remains an open problem.
Here we consider one representative inversedesign problem, multilayer thinfilm structures^{4,5,6,56,59}, which aims to optimize the thickness of L dielectric layers (SiO_{2} and Si_{3}N_{4}), subject to the target transmittance response. Note that our method can be generalized to similar problems, such as jointly optimizing the materials type or the number of periods^{6}. In the experiment, d = 5 thickness parameters of L = 18 dielectric layers need to be adjusted for two desirable responses (Fig. 6a). In the first case, the desirable response involves two bellshaped transmittance bands (Fig. 6e; 3 dB bandwidth = 85 nm, the two centre wavelengths are located at 535 nm and 760 nm). In the second case, the response contains one sharp transmittance band (Fig. 6g; bandwidth = 200 nm, the centre wavelength is located at 750 nm). As noted, this realworld problem is inherently rankrestricted; see the singular values (Fig. 6b) and 2D slice of ddimensional fitness (Fig. 6c; a_{2} versus a_{3}; a_{1} = 120 nm, a_{4} = 80 nm, a_{5} = 80 nm; two bellshaped bands).
Given the thickness parameters \({{{\bf{a}}}}\in {{\mathbb{R}}}^{d}\), the designed response was obtained by finite difference time domain (FDTD) methods^{64}. The fitness is defined as mean square error between the targeted response r_{t}(λ; a) and a designed response r_{d}(λ; a); that is, \(f({{{\bf{a}}}})=\mathop{\sum }\nolimits_{i = 0}^{I1} {r}_{\mathrm{t}}({\lambda }_{i};{{{\bf{a}}}}){r}_{\mathrm{d}}({\lambda }_{i};{{{\bf{a}}}}){ }^{2};I=201,{\lambda }_{i}\) is the discrete wavelength of interest. Note that each run is time consuming, for example, ~10–20 h for one trial (maximum generation = 200, K = 50; CPU = 4 GHz, RAM = 32 GB).
To demonstrate the global optimality of EVOLER, we first consider one special case with an achievable optimal response, whereby it reliably gains the global solution (Supplementary Fig. 7). Various PSO methods were then evaluated in two designed cases, and the convergence curves were derived via five independent trials (Fig. 6d,f). In the first case, EVOLER consistently converges to the best result, acquiring a minimal mean square error of 24.7 with a probability of 1. By contrast, the best expected fitness of classical methods is only 26.95, that is, local PSO (see Supplementary Fig. 8 for their designed responses). Similar results are obtained in the second case (Fig. 6f and Supplementary Fig. 9). When it comes to realistic applications, for example, the wavelengthsensitive optical communications^{6,7}, the transmission efficiency and reliability are substantially enhanced, as the inband desirable signal is maximized while the outband interference signal is minimized.
Another important advantage of EVOLER is that it permits the highly efficient computation, which is very promising to the timeconsuming inversedesign tasks. Recall that each sample was independently evaluated in the first stage, which can be implemented in parallel, thus remarkably reducing the time complexity. From Fig. 6d,f, the computational time is reduced by around tenfold compared with classical PSO; two computers are used in the first stage. That is to say, with a dramatically reduced complexity, EVOLER attains the global optimum with a theoretical guarantee. As such, it provides great potentials to the inverse design of nanophotonics structures, as well as the other complex realworld problems.
Discussion
Many important problems emerging in science and industry call for judicious datadriven and derivativefree optimization. Despite broad application prospects, current evolutionary computation methods depend on specific problem structures and fail to acquire globally guaranteed solutions, severely undermining the trust in their applications. Here we report a new framework of evolutionary computation aided by machine learning, EVOLER, which makes it possible to attain the theoretically guaranteed global optimization of complex nonconvex problems. As demonstrated on 20 challenging benchmark problems, and two realworld problems with diverse properties yet common lowrank structures, it finds the global optimum almost surely, thus substantially extending applicability. Meanwhile, the number of samples can be remarkably reduced. As such, our method takes a leap forwards in theoretically guaranteed evolutionary computation, overcoming the uncertainty in datadriven blackbox methods, and paving the way for solving complex realworld problems. Note that EVOLER also embodies two important biological characteristics that rarely emerge in classical evolutionary computation^{65}: (1) neutrality or weak selection at an early stage to preserve the diversity; and (2) major transitions to suddenly discover the promising region of solutions.
According to the theoretical result, EVOLER would be generally applicable to ddimensional problems with lowrank structures. The complexity of our method is \({{{\mathcal{O}}}}(d{r}^{d1}N+TK)\) (see Methods and Supplementary Section 3). Our method would be less attractive for problems with high d (for example, d > 100) or without a lowrank structure. There are two remedies for the potential limitations when tackling highdimensional problems. First, various dimension reduction techniques are readily applicable. Taking the challenging aerodynamic design problem (for example, ref. ^{16}), a set of orthogonal basis functions was first identified; the highdimensional problem of d > 20 was then transformed to optimize the coefficients of the d ≤ 10 principle orthogonal basis. Second, as a future direction, the incorporation of the hierarchy concept (for example, the hierarchy reconstruction of ddimensional space; see Supplementary Fig. 4) could substantially reduce the complexity. For example, assume the number of blocks to be \(L\le \lfloor \sqrt{d}\rfloor\), the complexity of our method will be \({{{\mathcal{O}}}}\{d{r}^{(d/L1)}N+TK\}\). Thus, the complexity in solving highdimensional problems could be effectively reduced (see Supplementary Fig. 5 for the challenging benchmark problems of d = 30, and Supplementary Fig. 6 for the powerdispatch problem of d = 13). Although the wide applicability of EVOLER has been demonstrated, another open question is whether it is possible to further relax the requirements for unknown problems, which may deserve further study.
Methods
Evolutionary computation with PSO
Particle swarm optimization is inspired by collective intelligence in nature, for example, birds flocking or fish schooling^{17,18,19}. In canonical PSO^{14,18,21}, K particles (or agents) cooperatively search the ddimensional problem space and update their positions (or solutions) \({{{{\bf{x}}}}}_{k}^{t+1}\in {{\mathbb{R}}}^{d}\), that is,
As seen, updating the particle velocity \({{{{\bf{v}}}}}_{k}^{t}\) involves three parts. The first part is the inertia term—the dependence of the particle’s velocity on time t, that is, \({{{{\bf{v}}}}}_{k}^{t}(k=0,1,2,\cdots \,,K1)\); the inertia weight is \(w\in {{\mathbb{R}}}^{+}\). The second part is the cognition item—the influence from the particle’s own experience so far; \({{{{\bf{g}}}}}_{{\mathrm{pbest}},k}^{t}=\arg \mathop{\min }\limits_{t}f({{{{\bf{x}}}}}_{k}^{t})\in {{\mathbb{R}}}^{d}\) is the optimal position of the kth particle and \(f({{{\bf{x}}}}):{{\mathbb{R}}}^{d}\mapsto {\mathbb{R}}\) is the fitness function, \({c}_{1}\in {{\mathbb{R}}}^{+}\) is a cognition learning factor, \({{{{\bf{u}}}}}_{1}\in {{\mathbb{R}}}^{d}\) is a Gaussian variable and ∘ is the Hadamard (elementwise) product. The third term is the social part—the influence from the experience of the whole population; \({{{{\bf{g}}}}}_{\mathrm{{gbest}}}^{t}=\arg \mathop{\min }\limits_{k,t}f({{{{\bf{x}}}}}_{k}^{t})\in {{\mathbb{R}}}^{d}\) is the optimal position of K particles; \({c}_{2}\in {{\mathbb{R}}}^{+}\) is the social learning factor and \({{{{\bf{u}}}}}_{2}\in {{\mathbb{R}}}^{d}\) is a Gaussian variable.
The above canonic PSO is referred to as the global version PSO, as the particles will track the global best \({{{{\bf{g}}}}}_{\mathrm{gbest}}^{t}\) of the whole population. There also exists another localversion PSO, whereby the particles only track the best position of its local neighbourhood, which is denoted as \({{{{\bf{g}}}}}_{\mathrm{lbest}}^{t}\). The particle positions (or solutions) will then be updated via:
Exploring a global problem space
Taking d = 2 for example, the original problem space—which is composed of the fitness values on a discrete grid \({{{\bf{X}}}}\in {{\mathbb{R}}}^{M\times N}\) of continuous variables \({{{\bf{x}}}}\in {{\mathbb{R}}}^{2}\)—is structured to one matrix F = {f(X_{m,n})}, where m = 1, 2, … , M, and n = 1, 2, … , N; M and N denote the grid sizes. To explore this unknown problem space F, we randomly sample s columns and rows, thus obtaining two submatrices:
Here, \({{{\mathcal{C}}}}\) and \({{{\mathcal{R}}}}\) are two indexing sets, with the cardinality of \(\vert{\mathcal{C}}\vert= \vert{\mathcal{R}}\vert\)\(= s \ll \min \{M,N\}\) (see Supplementary Section 1 for the probability densities of random sampling).
Reconstructing a lowrank representation
Based on the submatrices C and R, a lowrank representation space is reconstructed (see Fig. 2a).
Here, a small weighting matrix \({{{\bf{W}}}}\in {{\mathbb{R}}}^{s\times s}\) is directly computed by minimizing the residual error^{46,47,48,66}; Y^{†} is the Moore–Penrose pseudoinverse of Y. When the original space \({{{\bf{F}}}}\in {{\mathbb{R}}}^{M\times N}\) is rankrestricted, that is, \(\,{{\mbox{rank}}}\,({{{\bf{F}}}})=r\ll \min (M,N)\), and the sampling length satisfies \(s \approx {{{\mathcal{O}}}}\{r\log (r/\epsilon )\}\), then the residual error is upper bounded with probability 1 − ϵ (refs. ^{46,47}), that is,
where \(\parallel {{{\bf{X}}}}{\parallel }_{F}^{2}\) is the Frobenius norm of X, \({\sigma }_{j}\) is the jth singular value of F and η is a constant related to s. For convenience, we define the upper bound of the averaged residual error of each element in \(\hat{{{{\bf{F}}}}}\) as \(\Delta E(s)\triangleq {\left[\frac{1}{MN}\times (1+\eta )\mathop{\sum }\nolimits_{j = s+1}^{\min (M,N)}{\sigma }_{j}^{2}\right]}^{1/2}\) so that we have \(\frac{1}{MN}\parallel {{{\bf{F}}}}\hat{{{{\bf{F}}}}}{\parallel }_{F}^{2}\le \Delta {E}^{2}(s)\).
Identifying an attention subspace
Provided a lowrank representation space \(\hat{{{{\bf{F}}}}}\in {{\mathbb{R}}}^{M\times N}\), an initial solution \({\hat{{{{\bf{x}}}}}}^{{{{\rm{opt}}}}}\in {{{\bf{X}}}}\) is then identified:
We define an attention subspace as \({{{\mathcal{A}}}}\triangleq \{{{{\bf{x}}}}\in {{{\bf{X}}}}; {{{\bf{x}}}}{\hat{{{{\bf{x}}}}}}^{{{{\rm{opt}}}}} < \delta  \}\), where 𝛿 is the subspace radius. As discussed, the learned lowrank representation \(\hat{{{{\bf{F}}}}}\) is highly accurate and thus the attention subspace \({{{\mathcal{A}}}}\) probably contains the global optimum \({{{{\bf{x}}}}}^{{{{\rm{opt}}}}}=\arg \mathop{\min }\nolimits_{{{{\bf{x}}}}\in {{{\bf{X}}}}}f({{{\bf{x}}}})\) of a discrete space F (Supplementary Section 2). When N and M are sufficiently large, X has a very high grid resolution and hence subspace \({{{\mathcal{A}}}}\) would contain the global solution \({{{{\bf{x}}}}}^{* }=\arg \mathop{\min }\nolimits_{{{{\bf{x}}}}\in {{\mathbb{R}}}^{2}}f({{{\bf{x}}}})\) (Fig. 1aiii). We thus initialize K particles via one Gaussian distribution:
where \({\delta }_{1}=({x}_{1}^{\max }{x}_{1}^{\min })/M\) and \({\delta }_{2}=({x}_{2}^{\max }{x}_{2}^{\min })/N\) denote the grid resolution.
Reconstructing lowrank representation for d > 2
Similar to the case in which d = 2, the highdimensional (d > 2) space can be reconstructed via few structured samples. For clarity, a highdimensional problem space is denoted by the tensor \({{{\mathcal{F}}}}\in {{\mathbb{R}}}^{N\times N\times \cdots \times N}\), and we assume each dimensionality of the tensor \({{{\mathcal{F}}}}\) is equal to N.
Taking the threedimensional representation space reconstruction as an example, we first apply a structured random sampling and abstract three small tensors^{67,68,69} \({{{{\mathcal{C}}}}}_{1}\in {{\mathbb{R}}}^{N\times {s}_{2}\times {s}_{3}},{{{{\mathcal{C}}}}}_{2}\in {{\mathbb{R}}}^{{s}_{1}\times N\times {s}_{3}},{{{{\mathcal{C}}}}}_{3}\in {{\mathbb{R}}}^{{s}_{1}\times {s}_{2}\times N}\) and one intersection tensor \({{{\mathcal{R}}}}\in {{\mathbb{R}}}^{{s}_{1}\times {s}_{2}\times {s}_{3}}\) (s_{1}, s_{2}, s_{3} ≪ N) (Supplementary Fig. 14). A representation tensor is then reconstructed via:
where \({{{{\bf{C}}}}}_{1}\in {{\mathbb{R}}}^{N\times ({s}_{2}{s}_{3})}\) was obtained by expanding the tensor \({{{{\mathcal{C}}}}}_{1}\) to one 2D matrix along the first dimension; \({{{{\bf{U}}}}}_{1}\in {{\mathbb{R}}}^{{s}_{1}\times ({s}_{2}{s}_{3})}\) was obtained by expanding a tensor \({{{\mathcal{R}}}}\) to 2D matrix along the first dimension, \({{{{\bf{U}}}}}_{1}^{{\dagger}}\) denotes the pseudoinverse of U_{1} and \({{{{\mathcal{R}}}}}_{\times 1}({{{{\bf{C}}}}}_{1}{{{{\bf{U}}}}}_{1}^{{\dagger}})\in {{\mathbb{R}}}^{N\times {s}_{2}\times {s}_{3}}\) denotes the onemode product^{70}.
When it comes to the arbitrary ddimensional tensor, the above sampling and reconstruction procedure can be directly applicable. In this case, the required samples for a structured exploration is \({\mathcal{O}}\left\{d{s}^{d1}N\right\},s=\lfloor{\hat{r}}\times{\mathrm{log}}({\hat{r}})\rfloor;r\) is the rank of the tensor. To reduce the samples, we may apply a multiscale exploration, that is, we first reconstruct one downsampling \(\left(\frac{N}{2}\times\frac{N}{2}\times \ldots \times \frac{N}{2}\right)\) tensor and then interpolate it to form the other (N × N × … × N) tensor (also see Supplementary Figs. 5, 6 and 15 for practical methods to further reduce the samples).
Parameter settings
EVOLER
The hyperparameters in EVOLER can be classified into two subsets. In the first stage, there are two parameters: the discrete length N (for simplicity, we assume N_{1} = N_{2} = … = N_{d} = N) and the sampling length s. In practice, the discrete length N is directly set to ~20–120. The rank of a discrete problem space is then estimated (see Supplementary Section 1a for more details), which is denoted as \(\hat{r}\). On this basis, the sampling length is configured to be \(s=\lfloor \hat{r}\times \log (\hat{r})\rfloor\) (in practice, we find \(s=\hat{r}\) is also applicable). In the second stage, there are three parameters in the PSO method: the inertia weight w, the cognition learning factor c_{1} and the social learning factor c_{2}. In the tth generation (0 ≤ t ≤ T), we have w = 0.3, c_{1} = 2 − 1.5 × t/T and c_{2} = 1.5 + 0.5 × t/T; whereby the maximum generation is T. In each independent trial, K particles are randomly initialized, and the sampling indexes are also randomly determined. For the highdimensional problems (for example, d = 30), the hierarchy exploration is applied (see Supplementary Fig. 15). In this case, the discrete length N is ~3–6, and the iterative hierarchy exploration was terminated when the change on solutions was smaller than 10^{−3}.
Classical methods
In the experiments, the parameter settings of classical PSO methods remain exactly the same as the related references. For these comparative methods, their parameter settings are summarized in Supplementary Table 2.
Data availability
The representative benchmarks for testing evolutionary computation methods are partly available at http://www.sfu.ca/~ssurjano/optimization.html, and partly presented in refs. ^{30,45}. The powerdispatch dataset, that is, the Economic Load Dispatch Test Systems Repository^{55}, is available at https://alroomi.org/economicdispatch. The dataset of inverse design of multilayer thinfilm structures is available at Gitbub (https://github.com/BinLiBUPT/EVOLER.git)^{71}, which was generated by the FDTD method. Source Data are provided with this paper.
Code availability
Matlab code of EVOLER is available at https://github.com/BinLiBUPT/EVOLER.git (ref. ^{71}).
References
Weiel, M. et al. Dynamic particle swarm optimization of biomolecular simulation parameters with flexible objective functions. Nat. Mach. Intell. 3, 727–734 (2021).
Morris, G. M. et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comput. Chem. 19, 1639–1662 (1998).
Quignot, C. et al. Interevdock2: an expanded server for protein docking using evolutionary and biological information from homology models and multimeric inputs. Nucleic Acids Res. 46, W408–W416 (2018).
Molesky, S. et al. Inverse design in nanophotonics. Nat. Photon. 12, 659–670 (2018).
Sreekanth, K. V. et al. Biosensing with the singular phase of an ultrathin metaldielectric nanophotonic cavity. Nat. Commun. 9, 1–8 (2018).
Qiu, C. et al. Simultaneous inverse design continuous and discrete parameters of nanophotonic structures via backpropagation inverse neural network. Optics Commun. 483, 126641 (2021).
Fischer, B. et al. Autonomous onchip interferometry for reconfigurable optical waveform generation. Optica 8, 1268–1276 (2021).
Genty, G. et al. Machine learning and applications in ultrafast photonics. Nat. Photon. 15, 91–101 (2021).
Andral, U. et al. Fiber laser mode locked through an evolutionary algorithm. Optica 2, 275–278 (2015).
Inbarani, H. H., Azar, A. T. & Jothi, G. Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. Comput. Methods Prog. Biomed. 113, 175–185 (2014).
Wild, S. M., Sarich, J. & and Schunck, N. Derivativefree optimization for parameter estimation in computational nuclear physics. J. Phys. G 42, 034031 (2015).
Park, J.B., Lee, K.S., Shin, J.R. & Lee, K. Y. A particle swarm optimization for economic dispatch with nonsmooth cost functions. IEEE Trans. Power Syst. 20, 34–42 (2005).
Park, J. B., Jeong, Y. W., Shin, J. R. & Lee, K. Y. An improved particle swarm optimization for nonconvex economic dispatch problems. IEEE Trans. Power Syst. 25, 156–166 (2010).
Del Valle, Y., Venayagamoorthy, G. K., Mohagheghi, S., Hernandez, J.C. & Harley, R. G. Particle swarm optimization: basic concepts, variants and applications in power systems. IEEE Trans. Evolut. Comput. 12, 171–195 (2008).
Skinner, S. N. & ZareBehtash, H. Stateoftheart in aerodynamic shape optimisation methods. Appl. Soft Comput. 62, 933–962 (2018).
Yasong, Q., Junqiang, B., Nan, L. & Chen, W. Global aerodynamic design optimization based on data dimensionality reduction. Chinese J. Aeronaut. 31, 643–659 (2018).
Kennedy, J. & Eberhart, R. Particle swarm optimization. In Proc. International Conference on Neural Networks (ICNN’95) Vol. 4, 1942–1948 (IEEE, 1995).
Shi, Y. & Eberhart, R. C. A modified particle swarm optimizer. In Proc. IEEE ICEC Conference (IEEE, 1998).
Shi, Y. & Eberhart, R. C. Empirical study of particle swarm optimization. In Proc. 1999 Congress on Evolutionary Computation (CEC’99) Vol. 3, 1945–1950 (IEEE, 1999).
Mendes, R., Kennedy, J. & Neves, J. The fully informed particle swarm: simpler, maybe better. IEEE Trans. Evolut. Comput. 8, 204–210 (2004).
Wang, D., Tan, D. & Liu, L. Particle swarm optimization algorithm: an overview. Soft Comput. 22, 387–408 (2018).
Shi, Y. et al. Particle swarm optimization: developments, applications and resources. In Proc. 2001 Congress on Evolutionary Computation (CEC) Vol. 1, 81–86 (IEEE, 2001).
Poli, R., Kennedy, J. & Blackwell, T. Particle swarm optimization. Swarm Intell. 1, 33–57 (2007).
Van den Bergh, F. & Engelbrecht, A. P. A cooperative approach to particle swarm optimization. IEEE Trans. Evolut. Comput. 8, 225–239 (2004).
Niu, B., Zhu, Y., He, X. & Wu, H. MCPSO: A multiswarm cooperative particle swarm optimizer. Appl. Mathematics Comput. 185, 1050–1062 (2007).
Li, W., Meng, X., Huang, Y. & Fu, Z.H. Multipopulation cooperative particle swarm optimization with a mixed mutation strategy. Inf. Sci. 529, 179–196 (2020).
Yang, C. & Simon, D. A new particle swarm optimization technique. In 18th International Conference on Systems Engineering (ICSEng’05) 164–169 (IEEE, 2005).
Liang, J. J., Qin, A. K., Suganthan, P. N. & Baskar, S. Comprehensive learning particle swarm optimizer for global optimization of multimodal functions. IEEE Trans. Evolut. Comput. 10, 281–295 (2006).
Xie, X.F., Zhang, W.J. & Yang, Z.L. Dissipative particle swarm optimization. In Proc. 2002 Congress on Evolutionary Computation (CEC’02) Vol. 2, 1456–1461 (IEEE, 2002).
Zeng, N. et al. A dynamic neighborhoodbased switching particle swarm optimization algorithm. IEEE Trans. Cybernetics 52, 9290–9301 (2022).
Peer, E. S., van den Bergh, F. & Engelbrecht, A. P. Using neighbourhoods with the guaranteed convergence PSO. In Proc. 2003 IEEE Swarm Intelligence Symposium (SIS’03) 235–242 (IEEE, 2003).
Blackwell, T. & Kennedy, J. Impact of communication topology in particle swarm optimization. IEEE Trans. Evolut. Comput. 23, 689–702 (2018).
Kennedy, J. The behavior of particles. In International Conference on Evolutionary Programming 579–589 (Springer, 1998).
Ozcan, E. & Mohan, C. K. Analysis of a simple particle swarm optimization system. Intell. Eng. Syst. Artificial Neural Netw. 8, 253–258 (1998).
Clerc, M. & Kennedy, J. The particle swarmexplosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evolut. Comput. 6, 58–73 (2002).
Van Den Bergh, F. et al. An analysis of particle swarm optimizers. PhD thesis, Univ. Pretoria (2007).
Kadirkamanathan, V., Selvarajah, K. & Fleming, P. J. Stability analysis of the particle dynamics in particle swarm optimizer. IEEE Trans. Evolut. Comput. 10, 245–255 (2006).
Van den Bergh, F. & Engelbrecht, A. P. A study of particle swarm optimization particle trajectories. Inf. Sci. 176, 937–971 (2006).
FernandezMartinez, J. L. & GarciaGonzalo, E. Stochastic stability analysis of the linear continuous and discrete PSO models. IEEE Trans. Evolut. Comput. 15, 405–423 (2010).
Van den Bergh, F. & Engelbrecht, A. P. A convergence proof for the particle swarm optimiser. Fundam. Inform. 105, 341–374 (2010).
Holland, J. H. Genetic algorithms and the optimal allocation of trials. SIAM J. Comput. 2, 88–105 (1973).
Wolpert, D. H. & Macready, W. G. No free lunch theorems for optimization. IEEE Trans. Evolut. Compu. 1, 67–82 (1997).
GarciaMartinez, C., Rodriguez, F. J. & Lozano, M. Arbitrary function optimisation with metaheuristics. Soft Comput. 16, 2115–2133 (2012).
Adam, S. P., Alexandropoulos, S.A. N., Pardalos, P. M. & Vrahatis, M. N. in Approximation and Optimization 57–82 (Springer 2019).
Suganthan, P. N. et al. Problem definitions and evaluation criteria for the CEC2005 special session on realparameter optimization. In Proc. IEEE Congr. Evol. Comput. (CEC) 1–5 (2005).
Drineas, P., Mahoney, M. W. & Muthukrishnan, S. Relativeerror CUR matrix decompositions. SIAM J. Matrix Anal. Appl. 30, 07070471X (2008).
Wang, S. & Zhang, Z. Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. J. Mach. Learning Res. 14, 2729–2769 (2013).
Li, B. et al. Random sketch learning for deep neural networks in edge computing. Nat. Comput. Sci. 1, 221–228 (2021).
Liu, H., Wei, Z., Zhang, H., Li, B. & Zhao, C. Tiny machine learning (TINYML) for efficient channel estimation and signal detection. IEEE Trans. Vehicular Technol. 71, 6795–6800 (2022).
Younis, A. & Dong, Z. Trends, features, and tests of common and recently introduced global optimization methods. Eng. Optimization 42, 691–718 (2010).
Nelder, J. A. & Mead, R. A simplex method for function minimization. Comput. J. 7, 308–313 (1965).
Yang, H.T., Yang, P.C. & Huang, C.L. Evolutionary programming based economic dispatch for units with nonsmooth fuel cost functions. IEEE Trans. Power Syst. 11, 112–118 (1996).
Singh, R. P., Mukherjee, V. & Ghoshal, S. Optimal reactive power dispatch by particle swarm optimization with an aging leader and challengers. Appl. Soft Comput. 29, 298–309 (2015).
Xu, S., Xiong, G., Mohamed, A. W. & Bouchekara, H. R. Forgetting velocity based improved comprehensive learning particle swarm optimization for nonconvex economic dispatch problems with valvepoint effects and multifuel options. Energy 256, 124511 (2022).
AlRoomi, A. R. Economic Load Dispatch Test Systems Repository. Electric Power Systems Analysis & NatureInspired Optimization Algorithms https://www.alroomi.org/economicdispatch (2016).
Rodrigo, D. et al. Midinfrared plasmonic biosensing with graphene. Science 349, 165–168 (2015).
Lin, Z., Liang, X., Loncar, M., Johnson, S. G. & Rodriguez, A. W. Cavityenhanced secondharmonic generation via nonlinearoverlap optimization. Optica 3, 233 (2016).
Liu, Z., Zhu, D., Raju, L. & Cai, W. Tackling photonic inverse design with machine learning. Adv. Sci. 8, 2002923 (2021).
Sheverdin, A., Monticone, F. & Valagiannopoulos, C. Photonic inverse design with neural networks: the case of invisibility in the visible. Phys. Rev. Appl. 14, 024054 (2020).
Zhang, T. et al. Machine learning and evolutionary algorithm studies of graphene metamaterials for optimized plasmoninduced transparency. Opt. Express 28, 18899–18916 (2020).
Yu, Z., Cui, H. & Sun, X. Genetically optimized onchip wideband ultracompact reflectors and Fabry–Perot cavities. Photon. Res. 5, B15–B19 (2017).
Zhang, T. et al. Plasmon induced absorption in a graphenebased nanoribbon waveguide system and its applications in logic gate and sensor. J. Phys. D 51, 055103 (2018).
Miller, K. J., Hallman, K. A., Haglund, R. F. & Weiss, S. M. Silicon waveguide optical switch with embedded phase change material. Opt. Express 25, 26527–26536 (2017).
Lumerical, F. FDTD Solutions 6.5 (Ansys, 2023); http://www.lumerical.com/tcadproducts/fdtd
Miikkulainen, R. & Forrest, S. A biological perspective on evolutionary computation. Nat. Mach. Intell. 3, 9–15 (2021).
Li, B., Wang, S., Zhang, J., Cao, X. & Zhao, C. Ultrafast accurate AoA estimation via automotive massiveMIMO radar. IEEE Trans. Vehicular Technol. 71, 1172–1186 (2021).
Mahoney, M. W., Maggioni, M. & Drineas, P. TensorCUR decompositions for tensorbased data. SIAM J. Matrix Anal. Appl. 30, 957–987 (2008).
Cai, H., Hamm, K., Huang, L. & Needell, D. Modewise tensor decompositions: multidimensional generalizations of CUR decompositions. JMLR 22, 8321–8356 (2021).
Song, Z., Woodruff, D. P. & Zhong, P. Relative error tensor low rank approximation. In Proc. 30th Annual ACMSIAM Symposium on Discrete Algorithms 2772–2789 (SIAM, 2019).
Kolda, T. G. & Bader, B. W. Tensor decompositions and applications. SIAM Rev. 51, 455–500 (2009).
Li, B., Wei, Z. & Wu, J. Machine Learningenabled Globally Guaranteed Evolutionary Computation [Source Code] (Zenodo, 2023); https://doi.org/10.5281/zenodo.7688653
Acknowledgements
This work was supported by National Natural Science Foundation of China under grant nos. U1805262, 62088101, 61971050 and 62171055.
Author information
Authors and Affiliations
Contributions
J.Z. supervised the study. B.L. conceived the idea. B.L. and Z.P.W. designed and implemented source code. B.L., T.Z., S.Y. and Z.P.W. performed the experiments on inverse design of nanophotonics structures. All of the authors interpreted the findings and wrote the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Jing Liang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–15, Theorems 1–4, Discussion and Tables 1–3.
Source data
Source Data Fig. 2
Reconstruction error, convergence performance and probability of finding the global optimum in a 2D nonconvex Schwefel function.
Source Data Fig. 3
Convergence performance of 20 2D benchmark functions.
Source Data Fig. 4
Convergence performance of 15 30D benchmark functions.
Source Data Fig. 5
Costs of power grid dispatch of various methods.
Source Data Fig. 6
Inverse design results of nanophotonics structures of various methods.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, B., Wei, Z., Wu, J. et al. Machine learningenabled globally guaranteed evolutionary computation. Nat Mach Intell 5, 457–467 (2023). https://doi.org/10.1038/s42256023006424
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s42256023006424
This article is cited by

Integrated WorkingAge Maintenance to the Unrelated Parallel Machine Scheduling with SequenceDependent Setup Times
Arabian Journal for Science and Engineering (2024)