Using Differential Evolution to avoid local minima in Variational Quantum Algorithms

Variational Quantum Algorithms (VQAs) are among the most promising NISQ-era algorithms for harnessing quantum computing in diverse fields. However, the underlying optimization processes within these algorithms usually deal with local minima and barren plateau problems, preventing them from scaling efficiently. Our goal in this paper is to study alternative optimization methods that can avoid or reduce the effect of these problems. To this end, we propose to apply the Differential Evolution (DE) algorithm to VQAs optimizations. Our hypothesis is that DE is resilient to vanishing gradients and local minima for two main reasons: (1) it does not depend on gradients, and (2) its mutation and recombination schemes allow DE to continue evolving even in these cases. To demonstrate the performance of our approach, first, we use a robust local minima problem to compare state-of-the-art local optimizers (SLSQP, COBYLA, L-BFGS-B and SPSA) against DE using the Variational Quantum Eigensolver algorithm. Our results show that DE always outperforms local optimizers. In particular, in exact simulations of a 1D Ising chain with 14 qubits, DE achieves the ground state with a 100% success rate, while local optimizers only exhibit around 40%. We also show that combining DE with local optimizers increases the accuracy of the energy estimation once avoiding local minima. Finally, we demonstrate how our results can be extended to more complex problems by studying DE performance in a 1D Hubbard model.


I. INTRODUCTION
Variational Quantum Algorithms (VQAs) are a promising group of hybrid classical-quantum algorithms that can reach quantum advantage for solving many relevant problems in NISQ-era quantum computers [1].VQAs are composed of classical and quantum routines, which are used to minimize a given cost function.They use (i) quantum computing to evaluate a function with a parameterized quantum circuit, and (ii) classical computing to optimize the circuit parameters on each iteration.Their parameterized structure gives VQAs the flexibility to work with shallow quantum circuits, unlike other quantum algorithms such as the Quantum Phase Estimation, Shor's and Grover's algorithms [2][3][4].Therefore, VQAs can circumvent the issues associated with the errors and coherence times intrinsic to current quantum hardware to generate reliable outputs even without error correction techniques.In addition, their general purpose (minimizing a function) allows using them for many different optimization problems in machine learning, chemistry, physics, and mathematics, among others [5][6][7][8][9].One of the most common algorithms in the family of VQAs is the Variational Quantum Eigensolver (VQE), which aims to find the quantum state (or the set) that minimizes the energy of a given Hamiltonian [6,10].On a small scale, VQE can solve, for instance, small molecules and lattice models [11,12].But on a broader framework, it is an excellent candidate to simulate large chemical reactions, perform exact calculations on crystalline solids, and uncover the physics behind complex systems such as the Hubbard model or exotic states of matter [13][14][15][16][17][18].
However, currently, VQE and VQAs have scaling problems [19].Typical issues concern ansatz expressibility and trainability, which refers to the degree of information that a quantum circuit has to reproduce an energy state of the system and the easiness of fitting the parameters to find the global minimum, respectively [20,21].Both concepts are directly related to problems in the optimization landscape where VQAs can present many local minima and barren plateaus [19,[21][22][23][24].These problems worsen with the number of qubits since the Hilbert space grows exponentially [19,24].Local minima inherently arise from minimizing a complex function.Barren plateaus are flat areas in the cost function landscape of VQAs where gradients vanish exponentially with the problem size [24].These areas can arise from different sources, such as random parameterized quantum circuits, noisy environments, and a high degree of entanglement [25,26].Vanishing gradients and local minima are severe problems towards scaling VQAs.These cause unsuccessful optimizations as well as a significant increase in the number of measurements needed to estimate tiny gradients [22,24].Thus, VQE circuit construction requires a smart and educated ansatz selection taking, for example, some knowledge from your Hamiltonian to reduce the number of parameters to optimize without losing expressibility [10,27].Additionally, it is important to deploy a suitable optimization method that maximizes the probability of avoiding traps in the optimization landscape and the consequences of dealing with tiny gradients.
In this work, we focus on the optimization problem motivated by the lack of optimization methods that can successfully avoid these issues in VQAs [23].In fact, there is evidence that usual gradient-based and some gradient-free local optimizers suffer from barren plateaus and local minima syndrome [23,26].On the other hand, there are recent alternatives that use the Quantum Fisher Information Matrix (QFIM) to lead optimization [28,29].In particular, the Quantum Natural Gradient has successfully found the ground state (GS), i.e. the state with minimum energy, of some specific models up to a considerable number of qubits [27].Gradient-based optimizers using the QFIM are promising methods.However, they still depend on gradients, and in general, they are computationally expensive and imply a significant increase in the number of function evaluations [28].Alternatives designed to decrease the computational complexity, such as QN-SPSA, could not achieve the same performance [29].In this article, we analyze an alternative optimization strategy based on Differential Evolution (DE) algorithm.DE is an evolutionary algorithm based on population breeding that is gradient-free, easy to implement, and to parallelize.We expect DE to avoid or drastically reduce the effects of vanishing gradients and local minima since its parameter update can naturally keep evolving even in these cases.
To test this approach, we compare DE with four state-of-the-art local optimizers (SLSQP, COBYLA, L-BFGS-B, and SPSA) that have demonstrated good results in the current literature [11,12,22,[30][31][32].For this, we choose a 1D Ising model without a magnetic field.That is a simple model but a good test for optimization.In fact, as we show later, usual local gradient-based and gradient-free optimizers tend to meaningfully fail as one increases the number of qubits/lattice sites in the system.The reasons are an enlarged number of excited states and their growing degeneracy in the parameter space that defines a robust local minima problem for optimization.Our results show how DE can avoid or drastically reduce this problem in the optimization landscape, substantially improving the success rate.We test several simple variations within this genetic algorithm that always outperform local optimization methods for large systems.Specifically, we identify one recombination strategy of DE with an exponential crossover that avoids all local minima in the range studied and is suitable to work together with gradient-based methods to speed up optimization and its convergence.Finally, we test DE performance in a correlated fermionic system by applying the same methodology to a 1D Hubbard model with eight qubits.We show again how DE outperforms local optimizers, indicating its potential usage to scale up the study of strongly correlated systems in quantum devices.

Ising model without magnetic field
The Ising model is one of the most simple and well-studied models in the literature, which serves as a starting point towards more complex models for studying magnetism and describing phase transitions [33,34].Furthermore, classical optimization problems can also be tackled by mapping them to spin Hamiltonians [35].An example of this is in the Quantum Approximate Optimization Algorithm (QAOA) [36,37].In quantum computing, spin models can also help to get some intuition for circuit construction.This is because some ansatzes, such as the family of Unitary Coupled Cluster (UCC), QAOA, and Hamiltonian Variational Ansatz (HVA), employ a Hamiltonian to build the parameterized quantum circuit [10,[38][39][40].Thus, working with spin models can also serve to estimate the system connectivity and the degree of entanglement required to simulate our system efficiently [41].In its more general form, the Ising model consists of a Hamiltonian where the first summation stands for the interaction between adjacent spins σ ∈ {−1, +1}, J ij represents the coupling constant between spins in sites i, j and n is the number of sites (qubits) in the lattice (circuit).The second summation represents the coupling of individual magnetic moments with an external magnetic field, where h i is the magnetic field at site i.In the Ising model, each site has two possible values ±1 so that, in a quantum mechanical description, σ can be any of the Pauli matrices.Focused on the 1D case, we set the magnetic field h = 0, maximizing the degeneracy of the excited levels, and the interaction constant J = J ij = 1, ∀i, j.So, our Hamiltonian in matrix form is a sum of Pauli strings being I the 2×2 identity matrix.⊗ stands for the Kronecker product.We take, without loss of generality, σ = σ y and open boundary conditions, so we have a chain of spins with no interaction between the first and last elements (J 1n = 0).In this way, it is straightforward to see that there are just two configurations minimizing the energy ⟨H⟩ with all spins oriented in the same direction.In the present case, these are |i⟩ ⊗n , |−i⟩ ⊗n or their superposition, being |±i⟩ the eigenstates of σ y .The degeneracy of states in the first excited energy level is 2(n − 1), and for n > 2, the second excited level is (n − 1)(n − 2)-fold degenerate.Eigenvalues go from −(n − 1) to n − 1 in steps of two.
FIG. 1.Quantum circuit used to simulate the 1D Ising model with chain length n = 3. L stands for the number of layers, while l = 1, 2, ..., L. Each layer contains a set of parameterized RyRz gates followed by a ladder of CZ gates.The circuit ends with a final set of parameterized RyRz gates.The total number of parameters N θ scales with 2n(L + 1).

Quantum Circuit
If we call |ψ( ⃗ θ)⟩ the state generated by the ansatz, the optimization problem is defined as finding the values of ⃗ θ that minimize and the optimization is successful if min(E θ ) = E 0 , being E 0 the energy of the ground state.To find the eigenvector that gives the ground state of the system, a hardware-efficient ansatz with an entanglement between adjacent qubits is expressive enough [11].In our case, it consists of L layers of parameterized R y R z gates per qubit, followed by a ladder of controlled Z gates (CZ) plus a final parameterized layer of R y R z gates (Fig. 1).So, the total number of parameters N θ is 2n(L + 1).Finally, our quantum circuit also has an initial layer of R y (π/4) gates to avoid starting directly in the state |0⟩ ⊗n .However, this layer is not necessary unless we initialize parameters near zero.
It is important to note that this is not the smallest possible ansatz.For instance, a QAOA ansatz is also expressive and uses fewer parameters, as in Ref. [27], where only N θ = n parameters are necessary to solve the Transverse Field Ising Model (TFIM).This gives a more trainable quantum circuit which performs a more efficient energy minimization.This also occurs for adaptive ansatzes, which help to mitigate problems in the optimization landscape [38,42].Therefore, combining a highly trainable and expressive ansatz with an efficient optimization method is crucial if we want to scale towards larger systems.However, to see the capability of optimizers to avoid local minima and make a representative statistical sample, an expressive ansatz with relatively low trainability is more suitable to work with classical computational resources.

Simulation Details
All simulations use a locally developed code that can be found at https://gitlab.com/proyectos-cesga/quantum/react-eu/vqe_ising_chain_de.Software version information is also available at this repository.In this work, we do not perform any technical modification in the underlying mechanism behind each optimizer.All simulations use the optimizers available from Qiskit and Scipy packages.For local optimizers, the maximum number of iterations and/or function evaluations are the unique adjusted parameters (Table I).All optimizations using these methods finished within the specified number of iterations except some using SPSA and COBYLA (L = 4), which run out all.For these cases optimizations end near the GS or some excited state, although with less precision.For DE, we fix the maximum number of iterations, the crossover strategy (bin/exp), and the initialization.First, we study the response of different used local optimizers in VQAs, which are SLSQP, COBYLA, L-BFGS-B and SPSA.These are some of the most used gradient-based and gradient-free optimizers in the VQAs literature [11,12,22,[30][31][32].However, they are predicted not to avoid barren plateaus, and in some problems, they start to get trapped in local minima as we increase the circuit complexity [23,26,27].Due to the considerable amount of optimizations, we select the success rate (SR) as our metric to compare the performance of the different methods.SR is the percentage of optimizations that finish in the ground state related to the total number of optimizations for each case.For this measurement, we set a tolerance δ = 1 − |E θ /E 0 | ≤ 10 −2 to declare the optimization as successful.For each optimization, we initialize each parameter θ k,l with a random number uniformly distributed along the interval [−π, π) as they correspond to rotations in the Bloch sphere.However, we do not constrain the parameters in that range during minimization.
Figure 2a shows the results for each local optimizer, where each point represents a total of 180 different optimizations N opt .Noticeably, for L = 1, we observe that the success rate significantly decays as we increase the chain length.In particular, for chains with more than five spins, these local optimizers do not guarantee an appropriate energy minimization without exploring modifications to their algorithms.The trend improves as we increase the number of layers in the ansatz.This occurs at the expense of increasing the number of parameters and, hence, the computational time.Nevertheless, this scaling does not seem to compensate as we still have a low success rate, and the downtrend remains unalterable.
As we will see later on, it is important to remark that any time that optimization does not find the GS, it finishes in one of the excited states of the system.This allows us to interpret it as a robust local minima problem.However, as one increases the system size, introduces noise, or increases the system entanglement, other issues, such as barren plateaus, are predicted to appear [26], and one can deal with a variational state whose associated energy could not identify with an eigenvalue of the Hamiltonian.

B. Differential Evolution
A possible way to avoid this local minima problem is with a multiparticle strategy that can update the particles parameters with the remaining population members.This occurs regardless of whether the particle gets trapped in a local minimum or a barren plateau.In this work, we focus on Differential Evolution (DE), an evolutionary algorithm based on population breeding that is easy to implement and run in parallel processors [43,44].Together with DE, there are other evolutionary algorithms, such as the Particle Swarm Optimization, or different variations of these methods [45][46][47].However, up to now, their use is still limited in the field of quantum computing [48][49][50].
In general terms, DE starts from an initial population with size P instead of a single individual, as in the previously analyzed optimizers.Once the population has been created, either explicitly or randomly, DE evaluates the objective function, in our case the energy, to determine its value for the different population members and selects the one with the lowest value (best).Individuals are vectors ⃗ x ≡ (θ 1 , θ 2 , ..., θ N θ ) with N θ elements.Remember that N θ = 2n(L + 1) for our ansatz.From here, it starts a process in which it is defined a new candidate (mutant) for each of the population members (target).There are several strategies to do so [51].Commonly, these strategies will employ three (or more) members of the population to generate a new mutant: which can be all randomly chosen or using ⃗ x best in some of the terms.For our simulations, each mutant is built from three vectors as in (4) but using ⃗ x 0 = ⃗ x best .F is the mutation factor.Once we have a candidate for each population member, a recombination phase between the targets and mutants starts.In there, some parameters of the target are replaced by the mutant's ones following a crossbreeding strategy.This strategy can be binomial, where each parameter has a probability C of being changed by the mutant's one, or exponential, where all parameters between two randomly chosen elements of the target are replaced [52].The process finishes with the evaluation of the energy of the modified target vector.The modified target replaces the original target if the energy is lower or is discarded if it is higher.This process is repeated until convergence where t, and t ′ are the absolute and relative tolerances, and Ē is the average energy of the population.With this in mind, we perform our simulations using DE.For this task, we use the scipy.optimizepackage where DE is already implemented, allowing direct parallelization in different workers using Multiprocessing.In this case, we increase N opt up to 1000 for the local optimizers, repeating the optimizations for L = 1 to make our comparison with DE more reliable.For DE, we choose a random initialization with halton (method provided by Scipy), which allows maximizing the parameter space explored at the beginning [53].In this way, the population size P is exactly the product of p • N θ , being p an integer that fixes the number of individuals per parameter.We execute two sets of simulations: one using p ∈ {1, 15} with a binomial crossover and a second with p = 1 and an exponential recombination strategy (Fig. 2b).In these cases, N opt = 100 except for DE with binomial crossover and p = 1 where N opt = 1000.It is important to remark that simulations with p = 1 allow a direct comparison between DE and the previously analyzed methods.SLSQP and L-BFGS-B (SPSA) compute (approximates) the gradients of N θ components, and COBYLA computes N θ distances.Therefore, for these optimizers, the total number of virtual targets is P = N θ .Tolerance values for the stopping criteria are t = 0 and t ′ = 10 −5 .

Binomial crossover
For simulations with a binomial crossover, we observe that even in the same conditions (p = 1), DE clearly outperforms local optimizers.This is because, even if one target gets trapped in a local minimum, DE can continue evolving.Therefore, this individual has a non-zero probability of eluding the local minimum, thanks to the remaining members.This is not possible for the previous methods unless we explore modifications to their mechanism.Nevertheless, as we increase the number of qubits, the number of possible local minima increases exponentially as 2 n while the ground state always maintains the same degeneracy.It stands to reason that although DE offers better results than the previous methods, even when the ansatz contains more than a single layer, the SR also decays, following, in general terms, the same trend as the previous methods but for a higher number of qubits.Based on this knowledge, increasing the population size P is one way to enhance the SR using DE.To this aim, we execute another set of simulations with p = 15, for which we find that the SR always holds above 88%.This configuration improves the previous results and opens the path to further enhance the SR by only increasing P .However, this will severely enlarge the number of function evaluations needed for the optimization, which is something to look out for when executing in a quantum computer.Notice that DE evaluates P = p • N θ circuits in each iteration in contrast to SLSQP and L-BFGS-B, which only require N θ , and COBYLA, SPSA optimizers, which make one and two function evaluations per iteration, respectively.Then, it is convenient to explore other alternatives rather than directly increasing P , for instance, modifying the recombination scheme.

Exponential crossover
Despite the significant improvement, using a binomial crossover together with the mutation scheme given by (4) for p ∈ {1, 15} does not guarantee the avoidance of all local minima in the range studied.This suggests that the configuration used can still cause all individuals to cluster in a local minimum if ⃗ x best and other targets get trapped.
To try to avoid this, we change the recombination criteria to exponential, which results in a more aggressive mutation scheme.An exponential crossover drastically modifies the target, maximizing the parameter space exploration and the probability of tunneling in a minimum.However, we lose convergence when approaching the global minimum, given that ⃗ x best has less influence, and most of the modified targets will not improve the energy.In this way, we fix the maximum number of iterations allowed to 2.5 • 10 4 , which is far from the number of iterations N it done in the previous simulations that encompass from a few hundred to around 8 • 10 3 for p = 1 and 14 qubits.We can see in Figure 2 how this DE configuration reaches the GS with a 100% probability in the range studied.
However, this stands when the upper limit of δ that defines SR is 10 −2 .More precise approaches to the GS (δ << 10 −2 ) will require a larger number of iterations to maintain the SR (Figure 3).Nevertheless, this is not associated with local minima, as in the previous cases, but with convergence.Therefore, although the relative error achieved is enough in many cases and lower than feasible VQE errors in current quantum hardware, our next goal is to minimize N it and, if possible, reach a δ similar to what gradient methods offer when successfully finding the GS (Figure 4a) [27].In Figure 4, we can see the complete set of final relative errors using the L-BFGS-B (N opt = 10 3 ) method and DE (p = 1, exp, N opt = 10 2 ).We can see how the L-BFGS-B method increasingly finds more excited states of H as we grow in the number of qubits.On the contrary, DE always avoids these states but features a relative error higher than the L-BFGS-B.The causes are that we limited the maximum number of iterations to N max it = 2.5 • 10 4 and that the exponential crossover leads to slow convergence.

C. Hybrid optimization
The straightforward strategy to reduce δ is using a gradient-based optimizer after DE with exponential crossover.In this case, we call the L-BFGS-B method, given that it is the gradient-based optimizer with the lowest tolerance value (10 −15 ) by default.Our simulations (N opt = 100) take the individual with the lower energy ⃗ x best after 2.5 • 10 4 iterations and initialize a new optimization with the L-BFGS-B method from its parameters.Results are shown in Figure 4c, where this hybrid optimization avoids all local minima and finds the GS with high accuracy.
On the other hand, reducing N it , and hence the number of circuit executions can be done efficiently by calling earlier a gradient-based optimizer.However, this could imply knowing a priori some information from your energy spectra, such as the energy difference between the GS and the first excited level.Another possibility is to alternate the use of DE with exp and bin recombination schemes.

IV. TOWARDS STRONGLY CORRELATED SYSTEMS
In this final section, we apply the same methodology to a more complex model from the physical point of view, the Hubbard model.Although the 1D Ising model can work as a complex optimization problem, as we saw in the previous section employing a quantum circuit that is agnostic to the system, its spin configurations can be analytically obtained.In fact, going to the base of fermionic operators by means of an inverse Jordan-Wigner transformation, the Ising model studied just involves quadratic terms of fermionic creation/annihilation operators, which are classically tractable.This is not the case for the exact calculation of strongly correlated systems, which include in their Hamiltonians higher-order terms, and for what quantum computing holds as a promising tool in diverse fields.To delve into correlations we use the most common alternative, which is the Hubbard model in a 1D lattice [12] where the index s ∈ {↑, ↓} denotes the spin of the electrons, t is the hopping amplitude that we assume constant for simplicity, and U is the Hubbard constant that represents the strength of the on-site Coulomb repulsion.The operator a † i,s (a i,s ) creates (annihilates) an electron of spin s at site i.We take periodic boundary conditions.In this case, to see DE performance minimizing the energy, we do not consider a quantum circuit agnostic to the system to ensure expressibility.By contrast, we use a Hamiltonian Variational Ansatz (HVA) that is adequate to model a wide range of condensed matter systems [39,40,54].The HVA is a QAOA-inspired ansatz based on adiabatic evolution to achieve the ground state of the system [40].It takes the Hamiltonian to construct the quantum circuit, so we need to map it into Pauli operators using, for instance, a Jordan-Wigner transformation [6].The resulting Hamiltonian is expressed then as a sum of commuting groups, i.e., H = k H k where [H k , H k ′ ] ̸ = 0, that determines our quantum circuit.The HVA generates a trial wavefunction of the form t come from the two sets of commuting groups defined from the Jordan-Wigner transformation of Ht.(b) Plot of relative error comparison using several optimizers in a 8-qubit 1D Hubbard model.We include a table to show the SR for each of them.The solid, dashed, and dotted lines illustrate the ground state, the first excited, and the upper energy levels, respectively.All optimizations using DE with exponential crossover strategy stopped at N max it = 2.5 • 10 4 iterations.Therefore, its SR is not associated with local minima but limited to N max it .Simulations using the hybrid optimization scheme confirm this fact featuring a 100% SR.

|Ψ(θ)⟩
where each layer l is a product of unitary operators U k (θ k,l ) = e −iθ k,l H k that can be translated to qubit instructions, and |ψ 0 ⟩ is the ground state of some part of the Hamiltonian, for instance, H t .There are different ways to obtain |ψ 0 ⟩ with variable fidelity with respect to the target state [54,55].However, in contrast to the general prescription, we opt to make the preparation of |ψ 0 ⟩ part of our variational protocol up to some extent.We do this to increment the complexity of the optimization procedure as the HVA is an expressible and trainable ansatz.Otherwise, we would need to increase the number of qubits to see the effects of a complex optimization landscape, as it happens for the TFIM with the QAOA ansatz [27].As a consequence, we would require a considerable amount of cores and memory for a classical simulation.To ensure expressibility, we prepare a parameterized wavefunction inspired in the ground state of the non-interacting part of the Hubbard Hamiltonian (6), i.e., the one consisting of quadratic terms.We perform a classical calculation of the transformation matrix that allows us to diagonalize the non-interacting part of (6).Then, we compute the Slater determinant associated with the minimum eigenvalue.In the computational basis, this state |ψ 0 ⟩ can be expressed as a product of NOT gates (depending on the number of electrons), single qubit rotations R y R z , and CNOT gates [55].Thus, we can build the parameterized initial state |ψ 0 ⟩ with the same structure as the layers of the hardware efficient ansatz used for the 1D Ising model but removing the parameterized R z gates and substituting the CZ gates with CNOTs.We can disregard R z gates since the Hamiltonian H t is real and, therefore, also its eigenstates.The structure of the full quantum circuit is shown in Figure 5a.For simulations, we focus on a four-site Hubbard model (t = 1, U = 1), for which we need a total of n = 8 qubits due to the electron's spin.Our quantum circuit takes two layers for the variational initial state and n/2 layers for the HVA, i.e., N θ = 36 parameters.We perform again several sets of simulations using i) all previous local optimizers, ii) DE with p = 1 and binomial crossover, iii) DE with p = 1 and exponential crossover and iv) a hybrid optimization scheme DE (exp)/L-BFGS-B as the one previously employed.We execute N opt = 1000 optimizations with random initialization for the local optimizers and N opt = 100 for the ones using DE.The parameters of the optimizers remain unchanged except for SPSA, for which we need to enlarge N max it to 5 • 10 4 .
As shown in Figure 5b, local optimizers fail to minimize the energy featuring a relatively low SR.As before, DE with binomial crossover provides better results than these algorithms but still gives a substantial amount of incorrect outputs.For its part, DE evolution with exponential crossover approaches quite well to the GS energy after N it = 2.5 • 10 4 iterations.That is an outstanding point, given that any of the optimizations end up meeting the convergence criteria.Therefore, the success rate obtained is only a convergence problem associated with limiting N it and our tolerance threshold for δ.In this way, we suppose that DE exp has avoided all traps in the optimization landscape.This is demonstrated in the final set of simulations, where the L-BFGS-B optimizer applied to the best candidate of DE exp achieves the ground state energy with 100% probability.Finally, another important point comes from the fact that, for local optimizers and DE bin, optimizations stop at energy values that are not eigenvalues of the Hamiltonian.This could be associated with other traps in the optimization landscape, like barren plateaus, rather than local minima.It stands to reason that these areas appear due to the complexity of the problem, the variational initial state, and the breakout of adiabaticity caused by the random initialization of the parameters [27].Of course, this does not harm the ansatz expressibility, just the complexity of the optimization.Despite that, DE with exponential crossover avoids all conflicting points for the case studied.

DISCUSSION
In the near term, the most powerful applications of quantum computers rely on VQAs.However, their applicability to large systems is a current challenge due to optimization-landscape problems.In this context, the ansatz selection, to enhance trainability maintaining expressibility, and the optimization method, are crucial for their performance.In this work, we show DE as a strong candidate to lead the optimization in VQAs.In particular, using basic DE configurations, our results evidence the ability of DE to avoid local minima, clearly outperforming four of the most used optimizers in VQAs literature.
We find that DE with binomial crossover performs better than local optimizers and provides strong convergence.However, it fails to avoid all local minima.DE with exponential crossover features a more aggressive mutation scheme, increases the parameter space exploration and the probability of eluding excited states, but needs a large N it to maintain the same accuracy as the other methods when finding the ground state.To avoid local minima, we can apply other alternatives instead of or together with the exponential crossover.For instance, increasing P , using more individuals in the mutation scheme given by (4), or using a random individual for ⃗ x 0 instead of ⃗ x best .Nevertheless, we can not expect these variations to need fewer circuit executions to reach the ground state.In this context, hybrid optimization schemes such as the one employed or another that intersperse DE with a local optimizer seem to be more correct approaches to this purpose, as well as to elude local minima.
Besides their simplicity, the most relevant characteristic of DE resides in its ability to continue evolving regardless of one or several individuals getting stuck, so it could be immune/resilient to vanishing gradient areas.This comes reasonably from Figure 5b, although more experiments should address it specifically.In addition, DE could present additional benefits against gradient-based algorithms since computing gradients requires more shots as they become small [22].This work uses exact simulations to find the GS in a VQE to see the capability of the different optimizers to avoid local minima in ideal conditions.In this context, DE and hybrid optimizations with DE demonstrate a compelling advantage to scale VQAs.Future work addressing DE performance in noisy environments and real quantum computers can reaffirm DE-based optimization as a robust candidate to walk toward the so-called quantum advantage in VQAs.

DATA AVAILABILITY
All data generated or analysed during this study are available from the corresponding author upon reasonable request.
DE simulations use Scipy together with Multiprocessing to run in multiple processors.COBYLA SLSQP L-BFGS-B SPSA DE (bin) DE (exp)N θ N max it 10 3 N θ 2N max it pN θ N max it pN θ N max it TABLE I. Maximum number of iterations (N max it) and maximum number of function evaluations (N max f ) for each optimizer used to minimize the 1D Ising model.p is an integer that fixes the number of individuals per parameter when using DE.

FIG. 2 .
FIG. 2. (a) Success rates for 180 optimizations with different random initializations of 3-to-14 qubit circuits and a tolerance (threshold) of 10 −2 for success.Each of the four curves stands for a different number of layers used in the ansatz, as shown in legends, with each subplot for an optimizer.(b) Success rate ( δ ≤ 10 −2 ) for local optimizers and DE.Quantum circuits have L = 1 and N θ = 4n parameters.In optimizations with DE, p indicates the number of individuals per parameter.So, the total number of individuals in the population is p • N θ .Labels bin and exp stand for binomial and exponential crossover strategies, respectively.Nopt = 1000 in all cases except for DE with p=15 with a binomial crossover and DE with p=1 with exponential recombination, where Nopt = 100.

FIG. 3 .
FIG.3.Relative error δ = 1 − |E θ /E0| vs. iteration for a 10 qubits Ising chain without magnetic field using DE with exponential crossover (p = 1).The plot shows five different random initialization of the population P .Optimizations leap all excited states in the first 1000 iterations approximately.However, they need much more iterations to improve the convergence to the GS.

FIG. 4 .
FIG. 4. Relative errors comparison using local, global, and hybrid optimization strategies.(a) L-BFGS-B data of Nopt = 1000 experiments.The inset shows the excited states (grey curves) found during optimizations.(b) Results using DE with p=1 and exponential crossover for Nopt = 100.Relative error grows with the number of qubits given that optimizations are limited by a constant number of iterations (N max it = 2.5 • 10 4 ).(c) Results of a hybrid strategy, where we repeat the previous DE simulations to avoid convergence to excited states, but we subsequently call the L-BFGS-B optimizer to polish ⃗ x best .

FIG. 5 .
FIG. 5. (a) Schematic illustration of the parameterized quantum circuit used to find the ground state of the Hubbard Hamiltonian (6).The figure exemplifies the circuit used to solve a two-site lattice (n = 4 qubits).The part colored in red represents the parameterized initial state belonging to the non-interacting term Ht.The blue one corresponds to the layers of the HVA.The unitary operator UU (θ) = e −iθH U is associated with the Hubbard term in (6), while Ut and U ′ t come from the two sets of commuting groups defined from the Jordan-Wigner transformation of Ht.(b) Plot of relative error comparison using several optimizers in a 8-qubit 1D Hubbard model.We include a table to show the SR for each of them.The solid, dashed, and dotted lines illustrate the ground state, the first excited, and the upper energy levels, respectively.All optimizations using DE with exponential crossover strategy stopped at N max