Multi-angle quantum approximate optimization algorithm

The quantum approximate optimization algorithm (QAOA) generates an approximate solution to combinatorial optimization problems using a variational ansatz circuit defined by parameterized layers of quantum evolution. In theory, the approximation improves with increasing ansatz depth, but gate noise and circuit complexity undermine performance in practice. Here, we investigate a multi-angle ansatz for QAOA that reduces circuit depth and improves the approximation ratio by increasing the number of classical parameters. Even though the number of parameters increases, our results indicate that good parameters can be found in polynomial time for a test dataset we consider. This new ansatz gives a 33% increase in the approximation ratio for an infinite family of MaxCut instances over QAOA. The optimal performance is lower bounded by the conventional ansatz, and we present empirical results for graphs on eight vertices showing that one layer of the multi-angle ansatz is comparable to three layers of the traditional ansatz on MaxCut problems. Similarly, multi-angle QAOA yields a higher approximation ratio than QAOA at the same depth on a collection of MaxCut instances on fifty and one-hundred vertex graphs. Many of the optimized parameters are found to be zero, so their associated gates can be removed from the circuit, further decreasing the circuit depth. These results indicate that multi-angle QAOA requires shallower circuits to solve problems than QAOA, making it more viable for near-term intermediate-scale quantum devices.

One approach to reduce the circuit depth of QAOA is to increase the number of classical parameters introduced in each layer, a variation that we term multi-angle QAOA (ma-QAOA). This approach was first introduced briefly in Ref. 50. Increasing the number of classical parameters allows for finer-grained control over the optimization of the cost function and the approximation ratio, which measures optimality relative to the known best solution. While introducing more classical parameters can lead to a more challenging optimization, a corresponding reduction in circuit depth preserves the critical resource of the quantum state. In addition, finding the absolute optimal angles is not necessary in order to see an improvement over QAOA.
Here, we quantify the advantages of using multiple parameters for each layer of QAOA. First, we prove that the approximation ratio converges to one as the number of iterations of ma-QAOA tends to infinity, a property that ensures the optimal solution is the most likely. We next demonstrate that one iteration of ma-QAOA gives an approximation ratio that is at least that of one iteration of QAOA. This shows that ma-QAOA performs at least as well as QAOA. We also show that ma-QAOA used to solve the MaxCut problem on star graphs achieves an approximation ratio of one after one iteration, while single-iteration QAOA tends to an approximation ratio of 0.75 as the number of vertices goes to infinity. This result gives a concrete example where ma-QAOA gives a strictly larger approximation ratio than QAOA. We simulate solving MaxCut using ma-QAOA and QAOA on all connected, non-isomorphic eight-vertex graphs and compare the performance of the two ansatzes. In doing so, we find that the average approximation ratio for ma-QAOA after one iteration is larger than the average approximation ratio of QAOA after three iterations. In looking at larger, fifty and one-hundred vertex graphs, we see that ma-QAOA retains its advantage over QAOA, giving approximation ratios that are on average six percentage points higher after the first iteration.

Results
Multi-angle quantum approximate optimization algorithm. We develop the multi-angle QAOA beginning with the standard formulation of the quantum approximate optimization algorithm (QAOA). The QAOA combines classical parameter optimization with the application of cost and mixing operators to a quantum state in order to approximately solve combinatorial optimization (CO) problems 13. CO problems are defined by an objective function, $C(z)$, where $z$ is a bit string of length $n$. Often, $C(z)$ is the sum over a collection of clauses,
$$C(z) = \sum_a C_a(z).$$
When solving these problems with QAOA, $C(z)$ is encoded into a matrix $C$ with eigenvalues given by the classical cost values,
$$C|z\rangle = C(z)|z\rangle.$$
QAOA requires two operators,
$$U(\gamma, C) = e^{-i\gamma C}$$
and
$$U(\beta, B) = e^{-i\beta B},$$
which have real-valued angle inputs $\gamma \in [0, 2\pi)$ and $\beta \in [0, \pi)$. $B$ drives transitions between computational basis states and is typically
$$B = \sum_v B_v,$$
where $B_v = \sigma^x_v$ is the Pauli-x operator acting on qubit $v$ in the quantum system. The two operators are applied to an initial state,
$$|s\rangle = \frac{1}{\sqrt{2^n}} \sum_z |z\rangle.$$
Here the sum is over the computational basis $|z\rangle$. The QAOA ansatz operator applied $p$ times to $|s\rangle$ is denoted $p$-QAOA. The state for $p$-QAOA is
$$|\gamma, \beta\rangle = U(\beta_p, B) U(\gamma_p, C) \cdots U(\beta_1, B) U(\gamma_1, C) |s\rangle.$$
Since $C$ and $B$ are sums of matrices, we may write
$$U(\gamma, C) = e^{-i\gamma \sum_a C_a} = \prod_a e^{-i\gamma C_a}$$
and
$$U(\beta, B) = e^{-i\beta \sum_v B_v} = \prod_v e^{-i\beta B_v}.$$
Instead of focusing on minimizing the classical optimization efforts in QAOA, QAOA can be modified such that it requires more classical parameters 50. The new classical parameters are introduced to QAOA by allowing each summand of the cost and mixing operators to have its own angle instead of a single angle for the cost operator and a second angle for the mixing operator. In this modification,
$$U(\vec{\gamma}_l, C) = e^{-i \sum_a \gamma_{l,a} C_a} = \prod_a e^{-i\gamma_{l,a} C_a}$$
and
$$U(\vec{\beta}_l, B) = e^{-i \sum_v \beta_{l,v} B_v} = \prod_v e^{-i\beta_{l,v} B_v},$$
where $\vec{\gamma}_l = (\gamma_{l,a_1}, \gamma_{l,a_2}, \ldots)$ and $\vec{\beta}_l = (\beta_{l,v_1}, \beta_{l,v_2}, \ldots)$. Here, $l$ denotes the layer, $a_i$ denotes a specific clause, and $v_j$ refers to a specific qubit. We call this modification multi-angle QAOA and abbreviate it ma-QAOA. Similarly to QAOA, when the operators for ma-QAOA are applied $p$ times to the initial state, we call this $p$-ma-QAOA.
The performance of the algorithm is typically characterized by the approximation ratio, denoted A.R., which compares the expectation value of the cost operator $C$ to the optimal solution value $C_{\max}$, that is, $\mathrm{A.R.} = \langle C \rangle / C_{\max}$. We will write $\langle C \rangle = \langle C \rangle_p$ for $p$-QAOA and $\langle C \rangle = \langle C \rangle^{\mathrm{ma}}_p$ for $p$-ma-QAOA.
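As an illustration of the ansatz described in this section, the following Python sketch simulates one ma-QAOA layer for MaxCut with a brute-force state vector. It assumes the clause $C_{uv} = \frac{1}{2}(I - Z_u Z_v)$ and the operator conventions $e^{-i\gamma C}$ and $e^{-i\beta B}$; the function names and the integer encoding of bit strings are hypothetical choices made for the example, and this is not the code used to produce the results reported here.

```python
import cmath
import math


def maxcut_value(z, edges):
    """Number of edges cut by bit string z (bit v of the integer z is vertex v's side)."""
    return sum(((z >> u) & 1) != ((z >> v) & 1) for u, v in edges)


def ma_qaoa_expectation(n, edges, gammas, betas):
    """<C> after one ma-QAOA layer: one gamma per edge, one beta per vertex."""
    dim = 1 << n
    # Initial state |s>: uniform superposition over all bit strings.
    state = [1 / math.sqrt(dim)] * dim
    # Cost layer: C_uv is diagonal with eigenvalue 1 on cut edges, so
    # exp(-i gamma_uv C_uv) multiplies |z> by exp(-i gamma_uv) when uv is cut.
    for z in range(dim):
        phase = sum(g for (u, v), g in zip(edges, gammas)
                    if ((z >> u) & 1) != ((z >> v) & 1))
        state[z] *= cmath.exp(-1j * phase)
    # Mixing layer: exp(-i beta_v X_v) = cos(beta_v) I - i sin(beta_v) X_v.
    for v, b in enumerate(betas):
        c, s = math.cos(b), -1j * math.sin(b)
        state = [c * state[z] + s * state[z ^ (1 << v)] for z in range(dim)]
    return sum(abs(a) ** 2 * maxcut_value(z, edges) for z, a in enumerate(state))


def qaoa_expectation(n, edges, gamma, beta):
    """Standard 1-QAOA is the special case with all angles shared."""
    return ma_qaoa_expectation(n, edges, [gamma] * len(edges), [beta] * n)
```

Calling `qaoa_expectation` simply ties all angles together, which makes explicit why the optimum of ma-QAOA can never be lower than that of QAOA at the same depth. The brute-force loop scales as $2^n$ and is only meant for small graphs.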
Let $M_p$ be the maximum of $\langle C \rangle_p$ over all angles. Then, $M_p \ge M_{p-1}$. Farhi, Goldstone, and Gutmann showed that $M_p$ tends to the maximum of the objective function, $C_{\max}$, for the CO problem being solved as $p$ tends to infinity 13. We similarly define the expected value of $C$ after $p$ iterations of ma-QAOA as $\langle C \rangle^{\mathrm{ma}}_p$. We also define $M^{\mathrm{ma}}_p$ to be the maximum of $\langle C \rangle^{\mathrm{ma}}_p$ over all angles. Clearly, $M^{\mathrm{ma}}_p \ge M_p$ because QAOA is the special case of ma-QAOA where $\beta_{p,u} = \beta_{p,v}$ for all $u \ne v$ and $\gamma_{p,a_i} = \gamma_{p,a_j}$ for edges $a_i \ne a_j$.
In order to show ma-QAOA gives the optimal solution to a combinatorial optimization problem, we must show $\langle C \rangle^{\mathrm{ma}}_p$ converges to $C_{\max}$ as $p$ tends to infinity. Convergence is the first main result of this work.

Theorem 2.1
The multi-angle quantum approximate optimization algorithm converges to the optimal solution of a combinatorial optimization problem as p → ∞.
The proof of convergence is given in section "Methods".
MaxCut problem and performance on star graphs. In graph theory, a graph $G = (V, E)$ consists of a collection of vertices, $V$, and edges, $E$, which are pairs of vertices. MaxCut is a CO problem defined with respect to a graph. For QAOA, each qubit corresponds to a vertex in $G$ and the cost operator is 13
$$C = \sum_{uv \in E} C_{uv}, \qquad C_{uv} = \frac{1}{2}(I - Z_u Z_v),$$
where $Z_u$ is the Pauli-z operator acting on qubit $u$. The goal of the problem is to partition the vertices into two sets such that the number of edges with one endpoint in each set is maximized.
A star graph on $n$ vertices is a graph that consists of one vertex of degree $n - 1$, called the center. All other vertices of the graph have degree one, meaning each vertex is connected to the center and only the center. An example can be seen in Fig. 1. All stars are trees, and are thus bipartite, so the optimal MaxCut solution includes all edges of the graph. In order to show ma-QAOA outperforms QAOA when solving MaxCut on star graphs, we show that $\langle C \rangle^{\mathrm{ma}}_1 / C_{\max} = 1$ while $\langle C \rangle_1 / C_{\max}$ tends to 0.75 as $n$ tends to infinity. The proof is found in section "Methods".
Computational results. In order to test how ma-QAOA performs, we simulated the algorithm on a collection of one-hundred triangle-free 3-regular graphs with fifty vertices and one-hundred triangle-free 3-regular graphs with one-hundred vertices and compared the approximation ratios calculated with ma-QAOA to those of 1-QAOA. We also performed the same calculations with fifty modified $G_{n,p}$ random graphs with fifty and one-hundred vertices each; approximation ratio results for all large graphs are summarized in Table 1. In the $G_{n,p}$ model, $n$ sets the number of vertices, and $p$ is the probability that an edge exists. In particular, we examined $G_{50,0.08}$ and $G_{100,0.035}$ in order to create random graphs that have average degree approximately three. After randomly generating the graphs, triangles were removed by randomly removing edges from each triangle. For these sets of triangle-free graphs we can compute $\langle C \rangle^{\mathrm{ma}}_1$ for large $n$ using the analytical result of Theorem 4.1. Table 1 shows the average approximation ratios for each collection of graphs with ma-QAOA and 1-QAOA, as well as the changes in the approximation ratio and percent change in the approximation ratio gap. This approximation ratio gap is the percent difference between one minus the approximation ratio for 1-QAOA and one minus the approximation ratio for ma-QAOA. The ma-QAOA has a higher average approximation ratio and gives a significant percent increase in approximation ratio gap for each collection of graphs. These simulations only compare 1-QAOA to 1-ma-QAOA; however, the next set of computational results compares 1-ma-QAOA to $p$-QAOA for $p \le 3$ on all connected, non-isomorphic eight-vertex graphs.
In previous work, we determined $\langle C \rangle_1$, $\langle C \rangle_2$, and $\langle C \rangle_3$ for all connected, non-isomorphic eight-vertex graphs and compiled them into an online data set 35,51. For this work, we calculated the angles that maximize $\langle C \rangle^{\mathrm{ma}}_1$ and compared $\langle C \rangle_p$ to $\langle C \rangle^{\mathrm{ma}}_1$. On average, the performance of ma-QAOA is comparable to 3-QAOA on these graphs. Table 2 shows that ma-QAOA has a higher average approximation ratio than 1-QAOA and 2-QAOA on all eight-vertex graphs. In fact, the average approximation ratio for one iteration of ma-QAOA is also larger than the average approximation ratio for 3-QAOA. Figure 2 shows how the distribution of approximation ratios for ma-QAOA compares to the approximation ratios for up to three iterations of QAOA for all connected, non-isomorphic eight-vertex graphs. The fraction of graphs with approximation ratio at least 0.85 and at least 0.9 is higher for 3-QAOA than for ma-QAOA; however, the percentage of graphs with approximation ratio at least 0.95 is significantly higher with ma-QAOA than with up to three levels of QAOA.

Measurement reliability.
We quantify the number of measurements to obtain a reliable result from ma-QAOA and QAOA using a simple noise model with Kraus-operator error channels acting after each unitary operator in the ansatz. On fully connected hardware, the numbers of one-qubit unitary operators and two-qubit unitary operators per iteration of QAOA for MaxCut equal the numbers of vertices $n$ and edges $m$ in the graph, respectively. On connected $n = 8$ vertex graphs, $7 \le m \le 28$. Following these unitary and channel operators, the circuit produces a final state $\rho = F\rho_{\mathrm{ideal}} + (1 - F)\rho_{\mathrm{noise}}$, where $F$ is the probability associated with the ideal noiseless evolution component $\rho_{\mathrm{ideal}}$ 52. Assuming error rates of $\epsilon_n$ and $\epsilon_m$ for each one- and two-qubit unitary respectively,
$$F = (1 - \epsilon_n)^{np} (1 - \epsilon_m)^{mp}.$$
A measurement projects $\rho$ onto a basis state $|z\rangle$ and the total set of measurement probabilities is described by $\rho' = \sum_z \Pi_z \rho \Pi_z$, with $\Pi_z = |z\rangle\langle z|$. The expected number of measurements to sample a result $|z\rangle$ from the ideal distribution is $1/F$ in the worst case 48, when $\mathrm{Tr}(\rho'_{\mathrm{ideal}} \rho'_{\mathrm{noise}}) = 0$; the number of measurements can decrease depending on the specific state and noise process, but to keep the discussion general we take the expected number of measurements as $1/F$. We compute $F$ using the average numbers of edges $\langle m \rangle$ for graphs in our datasets, for example $\langle m \rangle = 14.4$ at $n = 8$, but note each specific graph has an integer number of edges. Assuming $p = 1$, $n = 8$, $\langle m \rangle = 14.4$, and an error rate of 1% for each unitary operator, the expected number of measurements to obtain a sample from the noiseless distribution is 1.25. We find that parameter optimization with ma-QAOA yields angles of zero for a subset of the edge and vertex unitary operators and we use this in the calculation of $F$. Since $\exp(-i\gamma_{p,a} C_a) = I = \exp(-i\beta_{p,v} B_v)$ when $\gamma_{p,a} = 0$ and $\beta_{p,v} = 0$, all unitary operators with an angle of zero may be excluded from the optimized circuit. This decreases the exponent of the first and second terms in $F$ by the number of vertex and edge operators that have zero angles, respectively, and thus reduces the amount of noise in ma-QAOA relative to QAOA. Table 3 gives the percent of zero angles, rounded to three decimal places, for each collection of graphs that were studied. Table 4 shows the ratio of the expected number of measurements needed to sample from the noiseless distribution for $p$-QAOA relative to ma-QAOA for each collection of graphs with varying values of $\epsilon_m$, using the average reduction in gates for ma-QAOA from Table 3. Note that if $\epsilon_m = 0.05$, the number of samples increases rapidly with $p$.
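The arithmetic behind the measurement estimate can be sketched in a few lines of Python. The sketch assumes the ideal-evolution probability takes the form $F = (1 - \epsilon_n)^{np}(1 - \epsilon_m)^{mp}$, which reproduces the value of 1.25 quoted in the text; the function names and the two zero-angle fraction arguments are hypothetical, introduced only to show how removed gates lower the exponents.

```python
def ideal_fraction(n, m, p, eps_n, eps_m, zero_vertex_frac=0.0, zero_edge_frac=0.0):
    """Probability F of the noiseless component after p layers.

    zero_vertex_frac and zero_edge_frac are hypothetical inputs: the fraction
    of one- and two-qubit unitaries whose optimized angle is zero, so that the
    gate can be dropped from the circuit.
    """
    one_qubit_gates = n * p * (1.0 - zero_vertex_frac)
    two_qubit_gates = m * p * (1.0 - zero_edge_frac)
    return (1.0 - eps_n) ** one_qubit_gates * (1.0 - eps_m) ** two_qubit_gates


def expected_measurements(n, m, p, eps_n, eps_m, **zero_fracs):
    """Worst-case expected number of measurements, 1/F."""
    return 1.0 / ideal_fraction(n, m, p, eps_n, eps_m, **zero_fracs)


# Example from the text: p = 1, n = 8, <m> = 14.4, 1% error per unitary.
baseline = expected_measurements(8, 14.4, 1, 0.01, 0.01)
```

Because the layer count $p$ multiplies both exponents, the same function shows how quickly the sample count grows when more QAOA layers are needed to match a single ma-QAOA layer.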
From the performance bound of Theorem 2.1, ma-QAOA will never need more layers than standard QAOA to reach a given approximation ratio. Whenever standard QAOA requires more layers than ma-QAOA, the additional noise from these layers will lead to an increase in the number of samples that are needed according to our model. Since one iteration of ma-QAOA is comparable to three iterations of QAOA on eight vertex graphs, if the trend holds for larger graphs, ma-QAOA has the potential to require significantly fewer samples than QAOA.
Computing angles. With a larger number of variables to optimize, the ma-QAOA method requires more classical effort to find angles that optimize the approximation ratio. However, it is not necessary to identify exact optimal angles, only to find angles that are better than QAOA angles.
We used the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm to compute angles for the eight-vertex graphs; details can be found in section "Methods". Figure 3 shows how the approximation ratio improves on average across all iterations of BFGS for each ansatz studied for a random sample of eight-vertex graphs. Note that after approximately ten iterations, ma-QAOA tends to achieve a higher approximation ratio than any of the $p$-QAOA. We do note that each iteration of BFGS takes longer for ma-QAOA, as the number of gradient components is linearly dependent on the number of variables being optimized.
Scaling. We assess the scalability of ma-QAOA using computed optimized $\langle C \rangle$ for sets of triangle-free Erdős-Rényi and 3-regular graphs with $n = 50$ and $n = 100$ vertices. The computational details are given in section "Methods". We compare the run times for typical graph optimizations to assess how the ma-QAOA parameter optimization time increases with graph size.
Table 3. The percent of $\beta_v$ and $\gamma_a$, rounded to three decimal places, that are zero when optimizing ma-QAOA on the family of graphs found in the first column.
Table 4. The ratio of the expected number of measurements to obtain a sample from the noiseless distribution for $p$-QAOA relative to 1-ma-QAOA on an $n$ vertex graph, assuming an average number of edges $m$ for graphs in the datasets. Columns: $n$; $m$; $\epsilon_n = \epsilon_m = 0.01$; $\epsilon_n = 0.01$, $\epsilon_m = 0.05$.
For the Erdős-Rényi graphs, the time for a single optimization for $n = 50$ was 0.10 seconds, for $n = 100$ it was 0.46 seconds. We attribute the difference primarily to the scaling in the calculation of the gradient, which is the most expensive calculation in the optimization. Our approach computes each of the $n + m$ derivatives $\partial \langle C_{uv} \rangle^{\mathrm{ma}}_p / \partial \beta_{p,w}$ and $\partial \langle C_{uv} \rangle^{\mathrm{ma}}_p / \partial \gamma_{p,jk}$ for each of the $m$ terms $\langle C_{uv} \rangle^{\mathrm{ma}}_p$ in the cost function, giving a total number of terms $\sim (n + m)m$. The time to compute each term will vary with the degree of the graph, as this determines the number of cosine terms in Theorem 4.1; however, for our graphs the degree is approximately constant hence can be neglected in the scaling. For our graphs $m \sim n$ on average, so the overall scaling is $\sim n^2$, which is consistent with the $\approx 4\times$ increase in time when $n$ is doubled from $n = 50$ to $n = 100$. We attribute the remainder of the time difference to variations in the number of iterations as $n$ and $m$ increase.
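The quadratic scaling argument can be checked with a short count of gradient terms. The sketch below uses 3-regular graphs, for which $m = 3n/2$ exactly, as a stand-in for graphs of average degree three; the function names are hypothetical and the count ignores the per-term cost, as the text does.

```python
def gradient_terms(n, m):
    """Number of partial-derivative terms: (n + m) parameters times m clauses."""
    return (n + m) * m


def three_regular_edges(n):
    """A 3-regular graph on n vertices (n even) has exactly 3n/2 edges."""
    return 3 * n // 2


terms_50 = gradient_terms(50, three_regular_edges(50))
terms_100 = gradient_terms(100, three_regular_edges(100))
ratio = terms_100 / terms_50  # doubling n quadruples the number of terms
```

With $m$ proportional to $n$, the term count is proportional to $n^2$, so doubling $n$ multiplies the count by four regardless of the constant of proportionality.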
It is interesting to consider scaling of the optimization time with the number of vertices $n$ for instances beyond the current dataset. For a gradient-based optimization this requires computing $\partial \langle C \rangle^{\mathrm{ma}} / \partial\theta = \sum_a \partial \langle C_a \rangle / \partial\theta$ for each parameter $\theta$, for each step in the optimization. For MaxCut and a variety of other problems 53, the number of clauses $C_a$ is poly($n$), and so there are poly($n$) parameters and poly($n$) partial derivatives $\partial \langle C_a \rangle / \partial\theta$ in the gradient. There are situations in which the time to compute each $\partial \langle C_a \rangle / \partial\theta$ is independent of $n$, specifically, when $p$ and the graph structure are fixed such that each partial derivative can be computed using $n$-independent "sub-graphs" 13. Then we need to compute poly($n$) terms with fixed compute time per term, so the overall time to compute the gradient scales as poly($n$). The gradient-based optimization approach BFGS exhibits super-linear convergence on a variety of practical problems 54, which supports the idea that the number of steps will not scale problematically with $n$. Perhaps counterintuitively, a recent investigation of variational quantum algorithms suggests that algorithms with more parameters have fewer local optima and achieve better convergence to global optima 55, suggesting ma-QAOA may require fewer BFGS steps to optimize than standard QAOA.

Discussion
We have shown that multi-angle QAOA converges to an optimal solution, and furthermore that $\langle C \rangle^{\mathrm{ma}}_1 \ge \langle C \rangle_1$, as QAOA is a special case of ma-QAOA. Additionally, the analysis of star graphs shows that there is a family of graphs that always gives larger $\langle C \rangle$ for MaxCut when solved with ma-QAOA than when solved with QAOA. We find significant increases in the approximation ratio in numerical optimizations for large triangle-free graphs and over the set of all non-isomorphic graphs with eight vertices, hence fewer layers are required to reach the same performance as QAOA. We also show that optimized rotation angles are often zero in ma-QAOA and this reduces the number of unitary operators per layer relative to QAOA. In the presence of noise, the reduction in number of layers and in the number of unitary operators per layer can significantly decrease the expected number of measurements needed to sample a result $|z\rangle$ in the distribution of the noiseless state. This could be a significant advantage for computations on noisy quantum hardware.
Interestingly, some graphs do not have a significantly higher $\langle C \rangle$ when solving MaxCut with ma-QAOA versus QAOA. It would be useful to characterize for which graphs the increase in $\langle C \rangle$ from QAOA to ma-QAOA is insignificant. This would help determine the appropriate ansatz to use in order to solve MaxCut on the graph.
One drawback to ma-QAOA is that the number of classically optimized parameters is $n + m$ per layer, where $n$ is the number of vertices of $G$ and $m$ is the number of edges. An argument can be made that if $x$ parameters are required to optimize one iteration of ma-QAOA, the results should be compared to QAOA with the same number of parameters. Since QAOA has two parameters per layer, this approach would require $p \approx x/2$ iterations of QAOA, which may not be feasible on current hardware as a large number of layers will accumulate considerable noise. From this perspective, it is advantageous to incorporate additional parameters into a small number of circuit layers. It could be interesting to consider the comparison with the same numbers of parameters from a theoretical perspective, but it is beyond our scope here.
From a practical standpoint, one way to find good ma-QAOA angles would be to first calculate the $\beta$ and $\gamma$ that optimize QAOA and then use those angles as the initial point of a BFGS search for the optimal $\beta_{p,v}$ and $\gamma_{p,a_i}$ for all vertices $v$ and edges $a_i$. Overall, however, the results seem to indicate that good parameters can be found in polynomial time. As many combinatorial optimization problems, like MaxCut, are NP-Hard, any polynomially-bounded effort that improves performance is likely to improve performance at large scale.

Methods
Proof of Theorem 2.1. Recall that QAOA converges to the optimal solution for a combinatorial optimization problem, which is the maximum over the objective function 13. Thus, in order to show convergence of ma-QAOA, we need only bound ma-QAOA from below by the value of QAOA. However, it is clear that the optimal expected value of the cost function for ma-QAOA can be no lower than that of QAOA, since QAOA is the special case of ma-QAOA in which $\gamma_{p,ij} = \gamma_{p,kl}$ and $\beta_{p,a} = \beta_{p,b}$ for all edges $ij$, $kl$ and all vertices $a$, $b$.
Formula for $\langle C \rangle$. In order to prove that $\langle C \rangle^{\mathrm{ma}}_1 / C_{\max} = 1$ for MaxCut on star graphs, we derive a formula that calculates $\langle C \rangle^{\mathrm{ma}}_1$ for MaxCut on triangle-free graphs.
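
Theorem 4.1
The expected value of $C$ after one iteration of ma-QAOA applied to MaxCut for triangle-free graphs $G$ is $\langle C \rangle^{\mathrm{ma}}_1 = \sum_{uv \in E} \langle \vec{\gamma}_1 \vec{\beta}_1 | C_{uv} | \vec{\gamma}_1 \vec{\beta}_1 \rangle$, with
$$\langle \vec{\gamma}_1 \vec{\beta}_1 | C_{uv} | \vec{\gamma}_1 \vec{\beta}_1 \rangle = \frac{1}{2} + \frac{1}{2} \sin\gamma_{1,uv} \Big( \sin 2\beta_{1,u} \cos 2\beta_{1,v} \prod_{w \in \mathrm{Nbhd}(u) \setminus v} \cos\gamma_{1,uw} + \cos 2\beta_{1,u} \sin 2\beta_{1,v} \prod_{x \in \mathrm{Nbhd}(v) \setminus u} \cos\gamma_{1,vx} \Big).$$
The neighborhood of a vertex $x$, denoted Nbhd($x$), is the set of vertices $y$ such that $xy \in E(G)$.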
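To make the triangle-free expression concrete, the following Python sketch evaluates it edge by edge. It assumes the per-edge form $\frac{1}{2} + \frac{1}{2}\sin\gamma_{uv}\,[\sin 2\beta_u \cos 2\beta_v \prod_{w} \cos\gamma_{uw} + \cos 2\beta_u \sin 2\beta_v \prod_{x} \cos\gamma_{vx}]$, with products over the other neighbors of $u$ and of $v$, which is the form implied by the derivation in the proof; the data layout (an adjacency dictionary and angles keyed by `frozenset` edges) and the function names are hypothetical.

```python
import math


def ma_edge_expectation(u, v, adj, gamma, beta):
    """Expected value of C_uv after one ma-QAOA layer on a triangle-free graph.

    adj   : dict mapping each vertex to the set of its neighbours
    gamma : dict mapping frozenset({a, b}) to the angle of edge ab
    beta  : dict mapping each vertex to its mixing angle
    """
    def g(a, b):
        return gamma[frozenset((a, b))]

    # Products over the neighbours of u other than v, and of v other than u.
    # math.prod of an empty sequence is 1, which covers leaf vertices.
    prod_u = math.prod(math.cos(g(u, w)) for w in adj[u] if w != v)
    prod_v = math.prod(math.cos(g(v, x)) for x in adj[v] if x != u)
    bracket = (math.sin(2 * beta[u]) * math.cos(2 * beta[v]) * prod_u
               + math.cos(2 * beta[u]) * math.sin(2 * beta[v]) * prod_v)
    return 0.5 + 0.5 * math.sin(g(u, v)) * bracket


def ma_expectation(adj, gamma, beta):
    """Sum of the edge expectations over all edges that carry an angle."""
    return sum(ma_edge_expectation(*tuple(edge), adj, gamma, beta) for edge in gamma)
```

The edge expression is symmetric under swapping $u$ and $v$, so the arbitrary ordering of a `frozenset` does not affect the result. Setting all $\gamma$ equal and all $\beta$ equal recovers the single-angle QAOA value for triangle-free graphs.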

Proof
The proof of this result relies on the Pauli-solver algorithm, which is explained in detail in 56 . The proof of the result is virtually identical to that for QAOA on triangle-free graphs, but we include the proof here for completeness.
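Consider edge $uv$ and consider acting on $C_{uv} = \frac{1}{2}(I - Z_u Z_v)$ by conjugation of the mixing operator, $\prod_{i \in V} e^{-i\beta_{1,i} B_i}$, followed by conjugation of the phase operator, $\prod_{uv \in E} e^{-i\gamma_{1,uv} C_{uv}}$. The identity part of $C_{uv}$ is unchanged, and only the mixing factors on $u$ and $v$ act nontrivially on $Z_u Z_v$. We have that
$$e^{i\beta_{1,u} B_u} e^{i\beta_{1,v} B_v} Z_u Z_v e^{-i\beta_{1,u} B_u} e^{-i\beta_{1,v} B_v} = \cos 2\beta_{1,u} \cos 2\beta_{1,v} Z_u Z_v + \cos 2\beta_{1,u} \sin 2\beta_{1,v} Z_u Y_v + \sin 2\beta_{1,u} \cos 2\beta_{1,v} Y_u Z_v + \sin 2\beta_{1,u} \sin 2\beta_{1,v} Y_u Y_v. \qquad (3)$$
Note that the first term commutes with $\prod_{uv \in E} e^{-i\gamma_{1,uv} C_{uv}}$ and $\langle s | Z_u Z_v | s \rangle = 0$, so it does not contribute to the expected value. Let $V_u$ be the neighborhood of $u$ in $V(G)$. Conjugating the third term of Eq. (3) by $\prod_{uv \in E} e^{-i\gamma_{1,uv} C_{uv}}$, we get
$$\Upsilon^\dagger Y_u Z_v \Upsilon,$$
where $\Upsilon = e^{-i\gamma_{1,uv} C_{uv}} e^{-i \sum_{a \in V_u \setminus v} \gamma_{1,ua} C_{ua}}$, and $\Upsilon^\dagger$ is its Hermitian conjugate; the remaining factors of the phase operator commute with $Y_u Z_v$ and cancel. Since $Y_u$ anticommutes with each $Z_u Z_a$ and the scalar phases in $\Upsilon$ and $\Upsilon^\dagger$ cancel,
$$\begin{aligned} \langle s | \Upsilon^\dagger Y_u Z_v \Upsilon | s \rangle &= \langle s | e^{-i\gamma_{1,uv} Z_u Z_v} e^{-i \sum_{a \in V_u \setminus v} \gamma_{1,ua} Z_u Z_a} Y_u Z_v | s \rangle \\ &= \langle s | (I \cos\gamma_{1,uv} - i \sin\gamma_{1,uv} Z_u Z_v) \prod_{a \in V_u \setminus v} (I \cos\gamma_{1,ua} - i \sin\gamma_{1,ua} Z_u Z_a) Y_u Z_v | s \rangle \\ &= -\sin\gamma_{1,uv} \prod_{a \in V_u \setminus v} \cos\gamma_{1,ua}. \end{aligned}$$
By symmetry, the term for $Z_u Y_v$ gives $-\sin\gamma_{1,uv} \prod_{b \in V_v \setminus u} \cos\gamma_{1,vb}$. The $Y_u Y_v$ term has zero expectation on triangle-free graphs, because $u$ and $v$ share no common neighbor. Substituting these results into $\langle C_{uv} \rangle = \frac{1}{2} - \frac{1}{2} \langle Z_u Z_v \rangle$ gives the stated formula.
Star graphs. First, we will show that $\langle C_{ij} \rangle_1$ approaches 0.75 as $n$ tends to infinity for QAOA. Since there are $n - 1$ edges in a star on $n$ vertices, this implies $\langle C \rangle_1$ tends to $0.75(n - 1)$. Additionally, $n - 1$ is the size of the optimal MaxCut solution, so $\langle C \rangle_1 / C_{\max}$ tends to 0.75. Wang, Hadfield, Jiang, and Rieffel showed that 57
$$\langle C_{ij} \rangle_1 = \frac{1}{2} + \frac{1}{4} (\sin 4\beta \sin\gamma)(\cos^d \gamma + \cos^e \gamma) - \frac{1}{4} (\sin^2 2\beta \cos^{d+e-2f} \gamma)(1 - \cos^f 2\gamma), \qquad (4)$$
where $d$ is $\deg(i) - 1$, $e$ is $\deg(j) - 1$ and $f$ is the number of triangles containing edge $ij$ 56,57. Let us consider the above formula applied to a star graph. Without loss of generality, let $j$ be the center of each star. Then $d = 0$, $e = n - 2$, and $f = 0$, since star graphs are trees. For each edge of the star, Eq. (4) reduces to
$$\langle C_{ij} \rangle_1 = \frac{1}{2} + \frac{1}{4} \sin 4\beta \sin\gamma \, (1 + \cos^{n-2} \gamma).$$
We set $\beta = \pi/8$, which implies $\sin 4\beta = 1$, since only one trigonometric function has $\beta$ as an argument. As $n$ tends to infinity, note $\cos^{n-2} \gamma$ tends to zero unless $\gamma = k\pi$ for some integer $k$. However, if $\gamma = k\pi$, $\sin\gamma = 0$. Thus, in the limit this quantity is maximized when $\gamma = \pi/2$, where $\sin\gamma = 1$, which implies $\langle C_{ij} \rangle_1$ tends to 0.75 for star graphs.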
In order to prove $\langle C \rangle^{\mathrm{ma}}_1 = n - 1$ for ma-QAOA on star graphs, we examine Theorem 4.1. Without loss of generality, let $u$ be a leaf vertex and $v$ be the center. Note that the first product is empty, since the leaf vertices have no neighbors except the center. Thus, Theorem 4.1 reduces to
$$\langle \vec{\gamma}_1 \vec{\beta}_1 | C_{uv} | \vec{\gamma}_1 \vec{\beta}_1 \rangle = \frac{1}{2} + \frac{1}{2} \sin\gamma_{1,uv} \Big( \sin 2\beta_{1,u} \cos 2\beta_{1,v} + \cos 2\beta_{1,u} \sin 2\beta_{1,v} \prod_{x \in \mathrm{Nbhd}(v) \setminus u} \cos\gamma_{1,vx} \Big).$$
Now, recall $\langle \vec{\gamma}_1 \vec{\beta}_1 | C_{uv} | \vec{\gamma}_1 \vec{\beta}_1 \rangle \le 1$, as two vertices that have an edge between them add one to the objective function if they are in different sets. In order to obtain equality, we can set $\gamma_{1,uv} = \pi/2$, as it is an argument for only a single sine term. Next, note that if either term in the parentheses is one, the other must be zero. Also, setting one term equal to one gives an expected value of one for the edge. Writing $\beta' = 2\beta$, let $\beta'_{1,u} = \pi/2$ and $\beta'_{1,v} = 0$. Then $\cos\beta'_{1,v} = \sin\beta'_{1,u} = 1$ while $\cos\beta'_{1,u} = \sin\beta'_{1,v} = 0$. Thus, the first term in the parentheses is one and the second is zero. This allows us to set $\gamma_{1,vx} = \pi/2$ for all $x \in \mathrm{Nbhd}(v)$. Since each of the $n - 1$ edges in the star is described similarly, $\langle C \rangle^{\mathrm{ma}}_1 = n - 1$ for all $n$. The size of the optimal cut on a star graph is $n - 1$, so $\langle C \rangle^{\mathrm{ma}}_1 / C_{\max} = 1$.
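The two star-graph limits can be checked numerically with a short Python sketch. It assumes the reduced single-edge expressions for a leaf-center edge: $\frac{1}{2} + \frac{1}{4}\sin\gamma\,(1 + \cos^{n-2}\gamma)$ for 1-QAOA at $\beta = \pi/8$, and the ma-QAOA edge value at $\gamma = \pi/2$ with $\beta = \pi/4$ on the leaves and $\beta = 0$ on the center. The function names and the grid search over $\gamma$ are illustrative choices, not the optimization procedure used for the reported results.

```python
import math


def qaoa_star_ratio(n, steps=2000):
    """Best 1-QAOA approximation ratio on an n-vertex star, with beta = pi/8.

    Every edge is equivalent and C_max = n - 1, so the per-edge expectation
    equals the approximation ratio. gamma is scanned on a uniform grid over
    [0, pi] that contains pi/2.
    """
    grid = (math.pi * j / steps for j in range(steps + 1))
    return max(0.5 + 0.25 * math.sin(g) * (1 + math.cos(g) ** (n - 2)) for g in grid)


def ma_qaoa_star_ratio(n):
    """ma-QAOA ratio at gamma = pi/2 on every edge, beta = pi/4 on leaves, 0 on the center."""
    gamma, beta_leaf, beta_center = math.pi / 2, math.pi / 4, 0.0
    # Product over the center's other n - 2 edges; it multiplies a zero coefficient here.
    prod_center = math.cos(gamma) ** (n - 2)
    edge = 0.5 + 0.5 * math.sin(gamma) * (
        math.sin(2 * beta_leaf) * math.cos(2 * beta_center)
        + math.cos(2 * beta_leaf) * math.sin(2 * beta_center) * prod_center
    )
    return edge * (n - 1) / (n - 1)


ratios = {n: (qaoa_star_ratio(n), ma_qaoa_star_ratio(n)) for n in (3, 10, 1000)}
```

For small stars the 1-QAOA ratio sits above 0.75 because the $\cos^{n-2}\gamma$ term still contributes, and it falls toward 0.75 as $n$ grows, while the ma-QAOA value stays at one for every $n$.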
Setup for computational results. In order to calculate the angles that maximize $\langle C \rangle_p$ and $\langle C \rangle^{\mathrm{ma}}_1$ for the eight-vertex graphs, we used the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm 58. The algorithm inputs an initial collection of angles and then uses a numerical gradient and second order approximate Hessian to find angles that converge to local maxima of $\langle C \rangle_p$ and $\langle C \rangle^{\mathrm{ma}}_1$. For the eight-vertex graphs, one-hundred random seeds were used to optimize $\langle C \rangle^{\mathrm{ma}}_1$. The results for the $\langle C \rangle_p$ were taken from the online dataset 51 of Ref. 35, where we performed an exhaustive analysis of QAOA performance on small graphs. These used fifty seeds for $p = 1$, one-hundred seeds for $p = 2$, and one-thousand seeds for $p = 3$.
For the fifty and one-hundred vertex graphs, we used the method of moving asymptotes (MMA) algorithm 59,60, but note that calculations with BFGS gave similar results. The $\langle C \rangle_1$ were computed using Eq. (4) and the reported results were taken as the best from one-thousand initial seeds in MMA optimizations. The $\langle C \rangle^{\mathrm{ma}}_1$ were computed with Theorem 4.1 and MMA optimization. We report results as the best optimized values from one-thousand seeds at $n = 50$ and from one-hundred seeds at $n = 100$.

Data availability
The datasets generated during and/or analysed during the current study are available in the Multi-Angle-QAOA repository, https://code.ornl.gov/5ci/multi-angle-qaoa.