Systematic study on the dependence of the warm-start quantum approximate optimization algorithm on approximate solutions

The quantum approximate optimization algorithm (QAOA) is a promising hybrid quantum-classical algorithm for solving combinatorial optimization problems in the era of noisy intermediate-scale quantum computers. Recently it has been revealed that warm-start approaches can improve the performance of QAOA, where approximate solutions are obtained by classical algorithms in advance and incorporated into the initial state and/or unitary ansatz. In this work, we study in detail how the accuracy of approximate solutions affects the performance of the warm-start QAOA (WS-QAOA). We numerically find that in typical MAX-CUT problems, WS-QAOA achieves higher fidelity (probability that exact solutions are observed) and approximation ratio than QAOA as the Hamming distance of approximate solutions to the exact ones becomes smaller. We reveal that this could be quantitatively attributed to the initial state of the ansatz. We also solve MAX-CUT problems by WS-QAOA with approximate solutions obtained via QAOA, obtaining higher fidelity and approximation ratio than QAOA, especially when the circuit is relatively shallow. We believe that our study may deepen the understanding of the performance of WS-QAOA and also provide a guide as to the necessary quality of approximate solutions.


I. INTRODUCTION
The last decade has seen significant technological progress in manufacturing hardware platforms for quantum computers [1]. The current pace of scale-up in quantum devices raises hope that quantum processors with hundreds of physical qubits could be available within the next decade. These near-term quantum computers are referred to as noisy intermediate-scale quantum (NISQ) computers [2], in that they are classically intractable but still not sufficiently large to implement quantum error correction. As the NISQ era approaches, an increasing number of studies have developed algorithms that leverage NISQ devices efficiently [3][4][5]. They are designed to solve quantum many-body problems in chemistry and physics as well as classical problems in combinatorial optimization and machine learning. Most of these studies employ hybrid quantum-classical approaches, primarily variational quantum algorithms [6,7]. In these algorithms, variational quantum states are created via parameterized quantum circuits on a quantum computer, whereas the parameters are updated on a classical computer to optimize the objective function calculated from the measurement outcomes. Since variational quantum algorithms generally require a relatively small number of gate operations, they are considered suitable for achieving quantum advantage on NISQ computers.
The quantum approximate optimization algorithm (QAOA) [8], a representative example of variational quantum algorithms, solves combinatorial optimization problems in a spirit analogous to adiabatic quantum annealing (QA) [9][10][11]. Indeed, the variational state (ansatz) of QAOA can be deduced from the Trotter decomposition of the time evolution of QA. Despite some numerical demonstrations of its efficacy in small-size problems [12,13], whether QAOA could practically outperform the best classical algorithms remains a subject of discussion [14,15]. Several variants have been proposed to improve upon the original version of QAOA [16][17][18][19][20][21][22]. To name a few, Farhi et al. proposed a variant of the ansatz that allows different parameters for each rotation gate [16]. Hadfield et al. extended the ansatz by generalizing the mixer operations, which could be suitable for optimization problems with constraints [19].
The variant that our work focuses on is the warm-start QAOA (WS-QAOA) proposed by Egger et al. [21] and Tate et al. [22]. The basic idea behind the approach is to facilitate convergence to the solution by distorting the original ansatz towards a classically obtained approximate solution. Egger et al. encode rounded/unrounded semidefinite programming solutions into the initial state and the mixer term of the ansatz [21]. Tate et al. also encode semidefinite programming relaxations into the initial state, but not into the unitary circuit [22]. We also mention that a similar warm-start approach has been independently studied in QA [23].
In this paper, we examine how the performance of WS-QAOA depends on the quality of approximate solutions in order to gain a deeper understanding of its efficacy. In Refs. [21,22], the authors solved problems with WS-QAOA using approximate solutions acquired by classical algorithms. However, it remains unclear how accurate approximate solutions should be for WS-QAOA to outperform QAOA. Here we deduce the ansatz of WS-QAOA starting from QA with a bias field [23] and carefully study how the performance of WS-QAOA depends on the quality of approximate solutions by numerical simulations on the MAX-CUT problem. We find that WS-QAOA performs better relative to QAOA as the Hamming distance of the approximate solutions to the exact ones becomes smaller. We also reveal that this observation could be partially attributed to the initial state of the ansatz. Finally, we solve the MAX-CUT problem with WS-QAOA after obtaining approximate solutions by QAOA and obtain results superior to those of QAOA, especially when the circuit depth is small.
The rest of the paper is organized as follows. In Sec. II, we formulate WS-QAOA in the context of QA. Then, in Sec. III, we numerically study the performance of WS-QAOA on the MAX-CUT problem for various approximate solutions in terms of the Hamming distance to the exact solutions as well as for different strengths of the bias field. In Sec. IV, we solve the MAX-CUT problem by combining WS-QAOA with QAOA and compare its efficacy to QAOA. Finally, in Sec. V, we summarize our results.

II. FORMULATION

A. MAX-CUT problem
As a prototypical combinatorial optimization problem, we consider the MAX-CUT problem, which is known to be NP-hard. It is defined on a graph $G = (V, E)$, where $V$ represents a set of vertices and $E$ represents a set of edges between the vertices. We denote the number of vertices in $G$ as $n$. The MAX-CUT problem is to find a partition of $V$ into two subsets that maximizes the total number of edges between one subset and the other. In the general case where each edge is associated with a real-valued weight $w_{ij}$, one evaluates the weighted sum of those edges. The problem is formulated as maximization of the following objective function
$$C = \sum_{(i,j) \in E} w_{ij} \left( x_i + x_j - 2 x_i x_j \right),$$
where $x_i$ denotes a binary variable associated with vertex $i$ ($x_i = 0, 1$). We note that bit strings $\{x_i\}$ and $\{\bar{x}_i\}$ ($\bar{x}_i \equiv 1 - x_i$) give the same value of $C$. In the following, we denote the single pair of solutions as $\{x_i^{\mathrm{sol}}\}$ and $\{\bar{x}_i^{\mathrm{sol}}\}$. In the language of physics, the MAX-CUT problem is encoded in finding the ground state of the corresponding Ising Hamiltonian, which is obtained by replacing $x_i$ with $(1 - Z_i)/2$ ($Z$: the Pauli Z matrix) in the objective function $C$ and changing the overall sign. The Hamiltonian reads
$$H_C = \frac{1}{2} \sum_{(i,j) \in E} w_{ij} Z_i Z_j,$$
where the offset is left off.
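The correspondence between the cut value and the Ising energy can be checked by brute force on a small graph. The following is a minimal sketch, assuming the objective takes the form $C = \sum_{(i,j) \in E} w_{ij}(x_i + x_j - 2x_i x_j)$ and the Hamiltonian the form $H_C = \frac{1}{2}\sum_{(i,j) \in E} w_{ij} Z_i Z_j$; the function names and the 4-vertex example graph are hypothetical and serve only as an illustration.

```python
from itertools import product

def cut_value(x, edges):
    """Weighted cut: each edge (i, j, w) contributes w when x[i] != x[j]."""
    return sum(w * (x[i] + x[j] - 2 * x[i] * x[j]) for i, j, w in edges)

def ising_energy(x, edges):
    """Eigenvalue of (1/2) * sum w_ij Z_i Z_j on bit string x, using z_i = 1 - 2*x_i."""
    return 0.5 * sum(w * (1 - 2 * x[i]) * (1 - 2 * x[j]) for i, j, w in edges)

def brute_force_maxcut(n, edges):
    """Enumerate all 2**n bit strings; return the best cut and every maximizer."""
    scored = [(cut_value(x, edges), x) for x in product((0, 1), repeat=n)]
    best = max(c for c, _ in scored)
    return best, [x for c, x in scored if abs(c - best) < 1e-12]

# Hypothetical 4-vertex weighted ring, used only for illustration.
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 1.0), (3, 0, 0.5)]
best, solutions = brute_force_maxcut(4, edges)
total_weight = sum(w for _, _, w in edges)
```

In this sketch the maximizers come as a complementary pair, and for every bit string the cut value equals `total_weight / 2 - ising_energy(x, edges)`, which is the dropped offset mentioned in the text.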

B. QAOA
QAOA searches for the ground state of the Hamiltonian $H_C$ using a QA-inspired ansatz with $2p$ variational parameters for depth $p$ [8]. The ansatz is constructed by alternating applications of the driver operation $U_C$ and the mixer operation $U_T$ to the equal-weight superposition state $|+\rangle^{\otimes n}$. With variational parameters $\beta_s$ and $\gamma_s$ ($1 \le s \le p$), it is written as
$$|\Psi_{\mathrm{QAOA}}\rangle = U_T(\beta_p) U_C(\gamma_p) \cdots U_T(\beta_1) U_C(\gamma_1) |+\rangle^{\otimes n}.$$
Here the driver and mixer are defined as $U_C(\gamma_s) = e^{-i\gamma_s H_C}$ and $U_T(\beta_s) = e^{-i\beta_s H_T}$, respectively, where $H_T = -\sum_i X_i$ ($X$: the Pauli X matrix) represents a transverse-field term. One can deduce the ansatz $|\Psi_{\mathrm{QAOA}}\rangle$ via the first-order Trotter decomposition of the QA procedure, where the wave function evolves under the Hamiltonian
$$H_{\mathrm{QA}}(t) = [1 - u(t)] H_T + u(t) H_C$$
with a schedule function $u(t)$ ($u(0) = 0$ and $u(T) = 1$).
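The alternating structure of the ansatz can be made concrete with a small state-vector simulation. The sketch below is not the simulator used in this work; it is a plain-Python illustration that assumes $H_C = \frac{1}{2}\sum w_{ij} Z_i Z_j$, $H_T = -\sum_i X_i$, and the convention that bit $i$ of a basis index stores $x_i$. All function names are hypothetical.

```python
import cmath
import math

def ising_energies(n, edges):
    """Diagonal of H_C = (1/2) * sum w_ij Z_i Z_j; bit i of index k stores x_i."""
    energies = []
    for k in range(2 ** n):
        z = [1 - 2 * ((k >> i) & 1) for i in range(n)]
        energies.append(0.5 * sum(w * z[i] * z[j] for i, j, w in edges))
    return energies

def apply_cost(state, energies, gamma):
    """U_C(gamma) = exp(-i * gamma * H_C) acts as a phase on each basis state."""
    return [a * cmath.exp(-1j * gamma * e) for a, e in zip(state, energies)]

def apply_mixer(state, n, beta):
    """U_T(beta) = exp(-i * beta * H_T) = product over qubits of exp(+i * beta * X_i)."""
    c, s = math.cos(beta), 1j * math.sin(beta)
    state = list(state)
    for i in range(n):
        bit = 1 << i
        for k in range(len(state)):
            if not k & bit:
                a, b = state[k], state[k | bit]
                state[k], state[k | bit] = c * a + s * b, s * a + c * b
    return state

def qaoa_state(n, edges, betas, gammas):
    """Apply U_T(beta_s) U_C(gamma_s) for s = 1..p to the uniform superposition."""
    energies = ising_energies(n, edges)
    state = [2 ** (-n / 2) + 0j] * 2 ** n
    for beta, gamma in zip(betas, gammas):
        state = apply_mixer(apply_cost(state, energies, gamma), n, beta)
    return state

# Single edge of unit weight: depth p = 1 already reaches the exact cut.
state = qaoa_state(2, [(0, 1, 1.0)], betas=[math.pi / 8], gammas=[math.pi / 2])
probs = [abs(a) ** 2 for a in state]
```

For the single-edge example, the probability of a cut string works out analytically to $(1 + \sin\gamma \sin 4\beta)/4$, so the chosen angles place all the weight on the two cut strings.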

C. WS-QAOA
QA has had considerable success in solving combinatorial optimization problems [9][10][11]. However, when gap closing occurs during the annealing, it often gets stuck at suboptimal solutions. Reverse QA is an effective variant to circumvent this challenge, which incorporates into the annealing process an approximate solution obtained in advance [24]. In this procedure, the state adiabatically evolves from the approximate solution at the beginning to the exact solution at the end, driven by quantum fluctuations of a transverse field with a mountain-like time profile. The dynamics is described by a Hamiltonian that interpolates from $H_I$ to $H_C$ while the transverse-field term is weighted by $h(t)$, where $H_I$ yields the approximate solution as the ground state, and $h(t)$ is a concave function with $h(0) = h(T) = 0$. It was shown that the performance of reverse QA is largely dominated by the Hamming distance of the approximate solution from the exact one [24].
Recently, Graß [23] proposed a similar but simpler QA procedure that makes use of an approximate solution by introducing a longitudinal bias field that favors the approximate solution. The procedure, which we call biased quantum annealing (BQA) hereafter, is governed by the Hamiltonian
$$H_{\mathrm{BQA}}(t) = [1 - u(t)] (H_T + H_L) + u(t) H_C,$$
where $H_L$ represents a site-dependent longitudinal field defined as
$$H_L = -\alpha \sum_i (1 - 2 x_i^0) Z_i.$$
Here $\{x_i^0\}$ represents an approximate solution, and $\alpha$ denotes the strength of the bias field. The author showed that BQA outperforms QA when one prepares approximate solutions that are close enough to the exact solutions in terms of the Hamming distance [23].
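Here we formulate a QAOA version of BQA, which actually corresponds to WS-QAOA [21,22]. One can derive the ansatz via the Trotter decomposition of BQA under the Hamiltonian $H_{\mathrm{BQA}}(t)$ in the same manner as one deduces $|\Psi_{\mathrm{QAOA}}\rangle$ from $H_{\mathrm{QA}}(t)$. Then the WS-QAOA ansatz is represented as
$$|\Psi_{\mathrm{WS-QAOA}}\rangle = U_T(\beta_p) U_L(\beta_p) U_C(\gamma_p) \cdots U_T(\beta_1) U_L(\beta_1) U_C(\gamma_1) |\Psi_0\rangle,$$
where $U_L$ is defined as $U_L(\beta_s) = e^{-i\beta_s H_L}$. The initial state $|\Psi_0\rangle$ is prepared with single-qubit rotations $R_Y(\theta)$, where $R_Y(\theta) = e^{i(\theta/2)Y}$ ($Y$: the Pauli Y matrix) represents a $\theta$-rotation around the $y$-axis. For $\alpha = 0$, $|\Psi_{\mathrm{WS-QAOA}}\rangle$ corresponds to the QAOA ansatz $|\Psi_{\mathrm{QAOA}}\rangle$. The WS-QAOA ansatz $|\Psi_{\mathrm{WS-QAOA}}\rangle$ is almost identical to that in Ref. [21] except for a small difference in the representation of the mixer; the latter implements $e^{-i\beta_s(H_T + H_L)}$ with three layers of rotation gates, whereas the former uses a decomposed form $e^{-i\beta_s H_T} e^{-i\beta_s H_L}$ with two layers.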
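To illustrate how the bias field shapes the starting point, the sketch below builds a warm-started product state and the diagonal layer $U_L$. It rests on two assumptions that are not spelled out in the surviving text: the bias field is taken as $H_L = -\alpha \sum_i (1 - 2x_i^0) Z_i$, and the initial state is taken as the ground state of $H_T + H_L$, which fixes each qubit's Bloch vector along $(1, 0, \alpha s_i)$ with $s_i = 1 - 2x_i^0$. The rotation-angle convention here is a choice made for the sketch; the function names are hypothetical.

```python
import cmath
import math

def ws_initial_state(x0, alpha):
    """Product state minimizing -(X_i + alpha * s_i * Z_i) per qubit, s_i = 1 - 2*x0[i]."""
    state = [1.0 + 0j]
    for xi in x0:
        theta = math.atan2(1.0, alpha * (1 - 2 * xi))   # polar angle of the Bloch vector
        amp0, amp1 = math.cos(theta / 2), math.sin(theta / 2)
        # Qubit i becomes bit i of the basis index, so |1> amplitudes fill the upper half.
        state = [a * amp0 for a in state] + [a * amp1 for a in state]
    return state

def apply_bias(state, x0, alpha, beta):
    """U_L(beta) = exp(-i * beta * H_L), diagonal in the computational basis."""
    out = []
    for k, a in enumerate(state):
        h = -alpha * sum((1 - 2 * xi) * (1 - 2 * ((k >> i) & 1)) for i, xi in enumerate(x0))
        out.append(a * cmath.exp(-1j * beta * h))
    return out

x0 = (0, 1, 1)                                   # hypothetical approximate solution
psi0 = ws_initial_state(x0, alpha=1.0)
index_x0 = sum(b << i for i, b in enumerate(x0))
prob_x0 = abs(psi0[index_x0]) ** 2               # weight on the approximate solution
```

Under these assumptions, each qubit agrees with the approximate solution with probability $\frac{1}{2}(1 + \alpha/\sqrt{1 + \alpha^2})$, and setting `alpha=0.0` recovers the uniform superposition used by QAOA.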

III. NUMERICAL SIMULATIONS
In this section, we examine how the WS-QAOA performance varies with the choice of approximate solutions $\{x_i^0\}$. For that purpose, we numerically study the MAX-CUT problem on weighted 3-regular (w3R) graphs. In w3R graphs, each vertex is connected to three others chosen at random, and each edge has a weight $w_{ij}$ drawn at random from $[0, 1)$. We employ Qulacs, a fast quantum circuit simulator [25].
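As an illustration of the problem instances, a w3R graph can be sampled with the pairing (configuration) model and rejection of self-loops and multi-edges. This is one common way to draw random regular graphs and is an assumption of the sketch, not necessarily the generator used for the results reported here; the function name is hypothetical.

```python
import random

def random_w3r_graph(n, rng):
    """Sample a simple 3-regular graph on n vertices (n even) with weights in [0, 1)."""
    if n % 2 or n < 4:
        raise ValueError("a simple 3-regular graph needs an even n >= 4")
    while True:
        stubs = [v for v in range(n) for _ in range(3)]   # three half-edges per vertex
        rng.shuffle(stubs)
        pairs = {tuple(sorted(stubs[k:k + 2])) for k in range(0, 3 * n, 2)}
        # Reject pairings that contain a repeated edge or a self-loop.
        if len(pairs) == 3 * n // 2 and all(i != j for i, j in pairs):
            return [(i, j, rng.random()) for i, j in sorted(pairs)]

edges = random_w3r_graph(10, random.Random(0))
```

The returned list uses the `(i, j, w)` edge format with `i < j`. The sketch does not enforce connectivity, which the text does not specify either.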
For optimization of the parameters, we use two methods: random initialization (RI) and an interpolation-based heuristic termed INTERP [13]. In RI, we take the best sample out of 50 randomizations of the initial values. Given the translational symmetry of the ansatz, the initial values of $\beta_s$ are drawn from an interval fixed by this symmetry for $\alpha = 1$ and from $[-\pi, \pi)$ otherwise, whereas those of $\gamma_s$ are drawn from $[-2\pi, 2\pi)$. On the other hand, in INTERP, the parameters are optimized incrementally from depth 1 to depth $p$. Here the initial values of the parameters at depth $p$, $\beta_s[p]$ and $\gamma_s[p]$ ($1 \le s \le p$), are uniquely determined via a linear interpolation of the optimized values at depth $p - 1$ [13]. It has been revealed that INTERP works more efficiently than RI for QAOA on w3R graphs [13].
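The interpolation step of INTERP can be sketched as follows. The rule used here is the linear interpolation of Ref. [13], written in the present notation as $\beta_s[p] = \frac{s-1}{p-1}\beta^{*}_{s-1}[p-1] + \frac{p-s}{p-1}\beta^{*}_{s}[p-1]$ with $\beta^{*}_{0} = \beta^{*}_{p} = 0$, where the asterisk marks optimized values; this notation and the function name are assumptions of the sketch.

```python
def interp_initial_values(prev):
    """Initial values at depth p from the optimized values `prev` at depth p - 1."""
    q = len(prev)                          # q = p - 1 >= 1
    padded = [0.0] + list(prev) + [0.0]    # boundary values prev_0 = prev_{q+1} = 0
    return [
        ((s - 1) / q) * padded[s - 1] + ((q - s + 1) / q) * padded[s]
        for s in range(1, q + 2)
    ]

# Hypothetical optimized beta values at depth 2, extended to depth 3.
betas_depth3 = interp_initial_values([0.6, 0.2])
```

The same function applies to the $\gamma_s$ values. The first and last entries are carried over unchanged, so a schedule that decreases (or increases) with $s$ keeps its shape as the depth grows.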
In this work, based on our benchmark calculations, we choose the better method depending on $\alpha$ and $p$. For QAOA ($\alpha = 0$), we use INTERP. For WS-QAOA with $\alpha = 0.4$, we use INTERP, whereas with $\alpha = 1$, we use RI at $p \le 3$ and INTERP at $p = 4$. We note that, regardless of $\alpha$ ($\neq 0$), we use RI when $\{x_i^0\}$ corresponds to the exact solution. In both methods, parameters are updated via gradient descent until the gradient falls below a certain threshold value. In WS-QAOA, we set approximate solutions $\{x_i^0\}$ by flipping $d$ bits randomly selected from the $n$ bits of the solution $\{x_i^{\mathrm{sol}}\}$. In other words, $d$ represents the Hamming distance of $\{x_i^0\}$ to $\{x_i^{\mathrm{sol}}\}$. Figure 1 shows the optimized parameters of QAOA ($\alpha = 0$) and WS-QAOA with $\alpha = 0.4, 1$ for $d = 1$ on 50 graph instances of $n = 10$. Figures 1(a) and 1(d) show that in QAOA, $\beta_s$ ($\gamma_s$) decreases (increases) with $s$, which resembles the process of QA [13]. We observe a similar trend in WS-QAOA with $\alpha = 0.4$ in Figs. 1(b) and 1(e). Meanwhile, when $\alpha = 1$, the parameters are not monotonic in $s$, which may reflect that the QA-like character weakens as $\alpha$ becomes larger. We also note that the parameters are optimized by RI in Figs. 1(c) and 1(f).
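The construction of an approximate solution at a prescribed Hamming distance can be sketched in a few lines. The function names and the example bit string are hypothetical.

```python
import random

def flip_bits(x_sol, d, rng):
    """Return a copy of x_sol with d distinct, randomly chosen bits flipped."""
    x0 = list(x_sol)
    for i in rng.sample(range(len(x_sol)), d):
        x0[i] = 1 - x0[i]
    return tuple(x0)

def hamming(a, b):
    """Number of positions at which two bit strings differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

x_sol = (0, 1, 1, 0, 1, 0, 0, 1, 1, 0)     # hypothetical exact solution for n = 10
x0 = flip_bits(x_sol, 3, random.Random(1))
```

Because the positions are sampled without replacement, `hamming(x0, x_sol)` equals `d` exactly, and the distance to the complementary solution is `n - d`.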
We compare the performance of WS-QAOA to that of QAOA. As a performance indicator, we use the fidelity of the optimized ansatz $|\Psi_{\mathrm{WS-QAOA}}\rangle$. We define the fidelity of a wave function $|\Phi\rangle$ as the probability that the exact solutions are observed,
$$F = |\langle \{x_i^{\mathrm{sol}}\} | \Phi \rangle|^2 + |\langle \{\bar{x}_i^{\mathrm{sol}}\} | \Phi \rangle|^2.$$
In Fig. 2(a), $F$ of WS-QAOA with $\alpha = 0.4$ at $p = 3$ is plotted against that of QAOA on 50 graph instances of $n = 14$. In the following, we focus on $\alpha = 0.4, 1$. We refer to Appendix A for a closer look at the $\alpha$ dependence. Figure 2(a) indicates that the relative performance of WS-QAOA against QAOA is dominated by the Hamming distance $d$. Importantly, $F_{\mathrm{WS-QAOA}}$ becomes higher as $d$ decreases. We find that WS-QAOA outperforms QAOA in all cases for $d \le 2$ and in most cases for $d = 3$ [Fig. 2(a)]. We note that $F_{\mathrm{WS-QAOA}}$ is always close to unity for $d = 0$. The enhanced fidelity with decreasing $d$ has also been observed in BQA [23].
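The fidelity used as the performance indicator, i.e., the probability of observing either exact solution, can be evaluated from a state vector as in the sketch below. It assumes that bit $i$ of a basis index stores $x_i$; the function names are hypothetical.

```python
def bits_to_index(x):
    """Basis-state index of a bit string, with x[i] stored in bit i."""
    return sum(b << i for i, b in enumerate(x))

def fidelity(state, x_sol):
    """Probability of measuring x_sol or its bitwise complement."""
    complement = [1 - b for b in x_sol]
    return (abs(state[bits_to_index(x_sol)]) ** 2
            + abs(state[bits_to_index(complement)]) ** 2)

# Uniform superposition on 3 qubits as a reference point.
uniform = [8 ** -0.5 + 0j] * 8
f_uniform = fidelity(uniform, (0, 1, 1))
```

For the uniform superposition on $n$ qubits this gives $2/2^n$, which is the baseline that any optimized ansatz has to exceed.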
The success of WS-QAOA with small $d$ is also manifested in the bit string with the highest probability, $\{x_i^{\mathrm{hp}}\}$, in the optimized ansatz.

We proceed to study the graph size dependence. In Figs. 3(a) and 3(c), the averaged fidelity of WS-QAOA at $p = 3$ is shown against the number of vertices $n$, together with that of QAOA ($\alpha = 0$). Figures 3(a) and 3(c) correspond to $\alpha = 0.4$ and $\alpha = 1$, respectively. We present the entire data for $p = 1$-$4$ in Appendix B. In both WS-QAOA and QAOA, $F$ shows a nearly exponential decay with $n$, but importantly it decreases less steeply in WS-QAOA than in QAOA. As a result, with larger $n$, WS-QAOA outperforms QAOA for even larger $d$. We also compare $\alpha = 0.4$ and $\alpha = 1$. Figures 3(a) and 3(c) indicate that as $d$ increases incrementally, the fidelity decreases roughly by a constant multiplicative factor (aside from $d = 0 \to 1$) and that the factor is smaller for $\alpha = 0.4$ than for $\alpha = 1$. These features seem to stem from the initial state, at least in part.
We draw the curve of Eq. (12) in Fig. 4. One can see that for both values of $\alpha$, $d_c/n$ estimated from the actual fidelity is smaller than the theoretical curve for the initial state. This indicates that QAOA gains more fidelity from the optimized unitary circuit than WS-QAOA does.

In the previous section, we studied the dependence of the WS-QAOA performance on approximate solutions and revealed that their Hamming distance to the exact solutions plays a crucial role. In this section, we solve the MAX-CUT problem with WS-QAOA while finding suitable approximate solutions. To find them, as in the previous studies [21,22], one could rely on the well-known classical algorithm for combinatorial optimization [26]. Instead, we employ QAOA here. This resembles the approach in the previous study of BQA, where approximate solutions are obtained by QA beforehand [23].
In Fig. 5, we depict a flow diagram of our procedure. We call this procedure QAOA+WS hereafter. First, we solve the problem using QAOA and pick the $M$ bit strings with the highest probabilities, $\{x_i^m\}$ ($m = 0, ..., M - 1$), based on the distribution $P_{\mathrm{QAOA}}$ from $|\Psi_{\mathrm{QAOA}}\rangle$. Then we conduct WS-QAOA with each $\{x_i^m\}$ as an approximate solution and obtain the distribution $P^m_{\mathrm{WS-QAOA}}$ from $|\Psi_{\mathrm{WS-QAOA}}\rangle$. At the end, we obtain the final distribution $P_{\mathrm{QAOA+WS}}$ by averaging the $M$ distributions $P^m_{\mathrm{WS-QAOA}}$. We compare the fidelity of QAOA+WS to that of QAOA. The fidelity of QAOA+WS is calculated as the average over the fidelities of the $M$ runs of WS-QAOA. Figure 7 presents the fidelity of QAOA+WS with $\alpha = 0.4$ along with that of QAOA, plotted against the number of vertices. In Figs. 7(a)-7(d), we set $p = 1$-$4$, respectively. The fidelity is averaged over 50 graph instances for n = 10, 12, 14, 20, 20 for n = 16, and 15 for n = 18. Importantly, the fidelity decays more slowly with $n$ in QAOA+WS than in QAOA for $p = 1$-$4$ [Fig. 7]. As a result, QAOA+WS on average outperforms QAOA for all $M$ as $n$ increases: for $n \ge 10$ at $p = 1$, $n \ge 14$ at $p = 2$, $n \ge 16$ at $p = 3$, and $n = 20$ at $p = 4$. It should also be mentioned that QAOA+WS becomes more beneficial for smaller $p$, because the difference in the decay with $n$ seems to decrease as $p$ increases.
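The QAOA+WS flow described above (select the $M$ most probable bit strings, run WS-QAOA from each, and average the resulting distributions) can be sketched as follows. The callable `run_ws_qaoa` is a hypothetical placeholder for an optimized WS-QAOA run that returns a probability distribution over basis indices; the stub in the example exists only to make the sketch executable.

```python
def top_m_indices(probs, m):
    """Indices of the m most probable basis states (ties broken by lower index)."""
    return sorted(range(len(probs)), key=lambda k: -probs[k])[:m]

def average_distributions(dists):
    """Element-wise mean of equally long probability distributions."""
    return [sum(column) / len(dists) for column in zip(*dists)]

def qaoa_plus_ws(p_qaoa, m, run_ws_qaoa):
    """Warm-start from each of the top-m QAOA outcomes and average the results."""
    starts = top_m_indices(p_qaoa, m)
    return average_distributions([run_ws_qaoa(k) for k in starts])

# Hypothetical stand-in: a run that returns all weight on its own starting string.
def stub_run(k, size=4):
    return [1.0 if j == k else 0.0 for j in range(size)]

p_final = qaoa_plus_ws([0.1, 0.4, 0.3, 0.2], m=2, run_ws_qaoa=stub_run)
```

Since the fidelity is linear in the distribution, the fidelity of the averaged distribution equals the average of the $M$ individual fidelities, in line with how the QAOA+WS fidelity is defined in the text.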

V. CONCLUSION
In this work, we systematically studied how the performance of WS-QAOA depends on the quality of approximate solutions by numerical simulations of the MAX-CUT problem on w3R graphs. We found that WS-QAOA yields higher fidelities than QAOA when one uses approximate solutions that are close enough to the exact solutions in terms of the Hamming distance; WS-QAOA with $\alpha = 0.4$ ($\alpha = 1$) on average outperforms QAOA if the relative Hamming distance of approximate solutions to the exact ones, $d_c/n$, is below 0.2-0.3 (0.1-0.25). We also obtained theoretical curves that explain those properties. Lastly, we showed that QAOA could serve as a viable way to find approximate solutions for WS-QAOA. We found that WS-QAOA combined with QAOA performs better than QAOA, particularly when the depth is limited to a small number.
We believe that our findings could help clarify the efficacy of WS-QAOA. They might also be helpful in determining the criteria for approximate solutions in WS-QAOA.

For the $\alpha$ dependence discussed in Appendix A, we evaluate the averaged fidelity over 50 graph instances of $n = 10$ as a function of $\alpha$ for $p = 1$-$4$. Here we optimize the parameters using RI for WS-QAOA with $d = 0$ and INTERP otherwise. We note that $\alpha = 0$ corresponds to QAOA. For WS-QAOA with $d = 0$, the fidelity $F$ almost equals 1, aside from $p = 1$. For $d = 1, 2$, $F$ has a peak around $\alpha = 0.4$-$0.8$, whereas for $d \ge 3$, $F$ monotonically decreases with $\alpha$. Therefore the optimal $\alpha$ varies with $d$.

FIG. 2. Performance of WS-QAOA with $\alpha = 0.4$ at $p = 3$ on 50 instances of w3R graphs ($n = 14$). $d$ represents the Hamming distance between the approximate solution $\{x_i^0\}$ and the exact one $\{x_i^{\mathrm{sol}}\}$. (a) Fidelity of WS-QAOA ($F_{\mathrm{WS-QAOA}}$) versus that of QAOA ($F_{\mathrm{QAOA}}$). The dotted line corresponds to $F_{\mathrm{WS-QAOA}} = F_{\mathrm{QAOA}}$. (b) Histogram over $d$ and $d_{\mathrm{hp}}$. $d_{\mathrm{hp}}$ is the Hamming distance of $\{x_i^{\mathrm{hp}}\}$ to $\{x_i^{\mathrm{sol}}\}$. The dotted line corresponds to $d_{\mathrm{hp}} = d$.

Figure 2(b) shows the histogram over $d$ and the Hamming distance of $\{x_i^{\mathrm{hp}}\}$ to $\{x_i^{\mathrm{sol}}\}$, $d_{\mathrm{hp}}$, on 50 instances of $n = 14$. One can see that the closer the approximate solution is to the exact solution, the more likely $\{x_i^{\mathrm{hp}}\}$ is to coincide with the exact solution ($d_{\mathrm{hp}} = 0$) [Fig. 2(b)]. It is notable that for $d = 4$, the optimized ansatz still yields the solution as the highest-probability string in about half of the instances.
FIG. 3. (a, c) Graph size dependence of the average fidelity obtained by WS-QAOA with different $d$ compared to QAOA ($\alpha = 0$) for $p = 3$. (a) corresponds to $\alpha = 0.4$ and (c) to $\alpha = 1$. The fidelity is averaged over 50 instances for $n \le 14$, 20 for $n = 16$, 15 for $n = 18$, and 10 for $n = 20$. The error bar represents s.e.m. (b, d) Calculated fidelity for the initial state of the ansatz, Eq. (11).
In Figs. 3(b) and 3(d), one can observe behaviors similar to those in Figs. 3(a) and 3(c), although the magnitude of the fidelity is significantly improved by the optimized circuit. The calculations above indicate how close approximate solutions should be to the exact solutions for WS-QAOA to outperform QAOA. From the graph size dependence of the fidelity [Figs. 3(a) and 3(c)], we estimate the critical Hamming distance of $\{x_i^0\}$, $d_c$, which determines whether WS-QAOA outperforms QAOA or not. For example, we estimate $d_c = 3$ for $n = 12$, $\alpha = 0.4$ from Fig. 3(a). Figures 4(a) and 4(b) show $d_c$ scaled by the graph size $n$ for $\alpha = 0.4$ and $\alpha = 1$, respectively. For $\alpha = 0.4$, $d_c/n$ ranges within $[0.2, 0.3)$, whereas for $\alpha = 1$, it ranges from 0.1 to 0.25. We also theoretically derive $d_c/n$ for the initial state $|\Psi_0\rangle$ from the condition $F_0(\alpha) = F_0(\alpha = 0)$ (see Appendix C), which gives Eq. (12).
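The condition $F_0(\alpha) = F_0(\alpha = 0)$ can be explored numerically with the sketch below. It is not a reproduction of Eqs. (11) and (12); it is an independent estimate under the assumptions that the initial state is the ground state of $H_T + H_L$ with $H_L = -\alpha\sum_i (1 - 2x_i^0) Z_i$, so that each qubit agrees with the approximate solution with probability $q = \frac{1}{2}(1 + \alpha/\sqrt{1 + \alpha^2})$, and that the contribution of the complementary solution is neglected when solving for the critical distance. The function names are hypothetical.

```python
import math

def match_probability(alpha):
    """Per-qubit probability that the initial state agrees with the approximate solution."""
    return 0.5 * (1.0 + alpha / math.sqrt(1.0 + alpha ** 2))

def initial_fidelity(n, d, alpha):
    """Probability of the exact solution or its complement in the initial product state."""
    q = match_probability(alpha)
    return q ** (n - d) * (1 - q) ** d + q ** d * (1 - q) ** (n - d)

def critical_distance(n, alpha):
    """Real-valued d solving q**(n-d) * (1-q)**d = 2**(1-n); requires alpha > 0."""
    q = match_probability(alpha)
    return (n * math.log(2 * q) - math.log(2)) / math.log(q / (1 - q))

# Relative critical distance for n = 12 and alpha = 0.4 under the stated assumptions.
ratio = critical_distance(12, 0.4) / 12
```

Within this estimate, $d_c/n$ approaches $\ln(2q)/\ln[q/(1-q)]$ for large $n$, and a stronger bias (larger $\alpha$) lowers the tolerated relative distance, which is qualitatively in line with the trend between $\alpha = 0.4$ and $\alpha = 1$ reported in the text.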

FIG. 4. Critical relative Hamming distance $d_c/n$ plotted against $n$ for (a) $\alpha = 0.4$ and (b) $\alpha = 1$. When approximate solutions are closer than $d_c$ to the exact solutions, WS-QAOA outperforms QAOA on average. The black line denotes the theoretical value of $d_c/n$ estimated for the initial state $|\Psi_0\rangle$ (see Eq. (12) in the text).