Sampling Frequency Thresholds for Quantum Advantage of Quantum Approximate Optimization Algorithm

In this work, we compare the performance of the Quantum Approximate Optimization Algorithm (QAOA) with state-of-the-art classical solvers such as Gurobi and MQLib to solve the combinatorial optimization problem MaxCut on 3-regular graphs. The goal is to identify under which conditions QAOA can achieve "quantum advantage" over classical algorithms, in terms of both solution quality and time to solution. One might be able to achieve quantum advantage on hundreds of qubits and moderate depth $p$ by sampling the QAOA state at a frequency of order 10 kHz. We observe, however, that classical heuristic solvers are capable of producing high-quality approximate solutions in linear time complexity. In order to match this quality for $\textit{large}$ graph sizes $N$, a quantum device must support depth $p>11$. Otherwise, we demonstrate that the number of required samples grows exponentially with $N$, hindering the scalability of QAOA with $p\leq11$. These results put challenging bounds on achieving quantum advantage for QAOA MaxCut on 3-regular graphs. Other problems, such as different graphs, weighted MaxCut, maximum independent set, and 3-SAT, may be better suited for achieving quantum advantage on near-term quantum devices.


I. INTRODUCTION
Quantum computing promises enormous computational powers that can far outperform any classical computational capabilities [1]. In particular, certain problems can be solved much faster compared with classical computing, as demonstrated experimentally by Google for the task of sampling from a quantum state [2]. Thus, an important milestone [2] in quantum technology, so-called 'quantum supremacy', was achieved as defined by Preskill [3].
The next milestone, 'quantum advantage', where quantum devices solve useful problems faster than classical hardware, is more elusive and has arguably not yet been demonstrated. However, a recent study suggests a possibility of achieving a quantum advantage in runtime over specialized state-of-the-art heuristic algorithms to solve the Maximum Independent Set problem using Rydberg atom arrays [4]. Common classical solutions to several potential applications for near-future quantum computing are heuristic and do not have performance bounds. Thus, proving the advantage of quantum computers is far more challenging [5][6][7]. Providing an estimate of how quantum advantage over these classical solvers can be achieved is important for the community and is the subject of this paper.
Most of the useful quantum algorithms require large fault-tolerant quantum computers, which remain far in the future. In the near future, however, we can expect to have noisy intermediate-scale quantum (NISQ) devices [8]. In this context variational quantum algorithms (VQAs) show the most promise [9] for the NISQ era, such as the variational quantum eigensolver (VQE) [10] and the Quantum Approximate Optimization Algorithm (QAOA) [11]. Researchers have shown remarkable interest in QAOA because it can be used to obtain approximate (i.e., valid but not optimal) solutions to a wide range of useful combinatorial optimization problems [4; 12; 13]. At the same time, powerful classical approximate and exact solvers have been developed to find good approximate solutions to combinatorial optimization problems. For example, a recent work by Guerreschi and Matsuura [5] compares the time to solution of QAOA vs. the classical combinatorial optimization suite AKMAXSAT. The classical optimizer takes exponential time with a small prefactor, which leads to the conclusion that QAOA needs hundreds of qubits to be faster than classical. This analysis requires the classical optimizer to find an exact solution, while QAOA yields only approximate solutions. However, modern classical heuristic algorithms are able to return an approximate solution on demand. Allowing for worse-quality solutions makes these solvers extremely fast (on the order of milliseconds), suggesting that QAOA must also be fast to remain competitive. A valid comparison should consider both solution quality and time.
FIG. 1. A particular classical algorithm may return some solution to some ensemble of problems in time T_C (horizontal axis) with some quality C_C (vertical axis). Similarly, a quantum algorithm may return a different solution sampled in time T_Q, which may be faster (right) or slower (left) than classical, with a better (top) or worse (bottom) quality than classical. If QAOA returns better solutions faster than the classical, then there is clear advantage (top right), and conversely no advantage for worse solutions slower than the classical (bottom left).
In this way, the locus of quantum advantage has two axes, as shown in Fig. 1: to reach advantage, a quantum algorithm must be both faster and return better solutions than a competing classical algorithm (green, top right). If the quantum version is slower and returns worse solutions (red, bottom left) there is clearly no advantage. However, two more regions are shown in the figure. If the QAOA returns better solutions more slowly than a classical algorithm (yellow, top left), then we can increase the running time for the classical version. It can try again and improve its solution with more time. This is a crucial mode to consider when assessing advantage: heuristic algorithms may always outperform quantum algorithms if quantum time to solution is slow. Alternatively, QAOA may return worse solutions faster (yellow, bottom right), which may be useful for time-sensitive applications. In the same way, we may stop the classical algorithm earlier, and the classical solutions will become worse.
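The four regions described above can be written as a small decision rule. The following is only an illustrative sketch: the function name `advantage_region` and its arguments are hypothetical and do not come from the paper.

```python
def advantage_region(t_quantum, q_quantum, t_classical, q_classical):
    """Classify a quantum/classical pair into the four regions of the text.

    Times are times to solution (smaller is better); qualities are solution
    qualities such as cut fractions (larger is better).
    """
    faster = t_quantum < t_classical
    better = q_quantum > q_classical
    if faster and better:
        return "advantage"           # green, top right
    if not faster and not better:
        return "no advantage"        # red, bottom left
    if better:
        return "better but slower"   # yellow, top left
    return "faster but worse"        # yellow, bottom right
```

In the two yellow cases the comparison is inconclusive at the chosen classical running time: the classical solver can be given more time (top left) or stopped earlier (bottom right) and the pair re-evaluated.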
One must keep in mind that the reason for using a quantum algorithm is the scaling of its time to solution with the problem size N. Therefore, a strong quantum advantage claim should demonstrate the superior performance of a quantum algorithm in the large-N limit.
This paper focuses on the MaxCut combinatorial optimization problem on 3-regular graphs for various problem sizes N. MaxCut is a popular benchmarking problem for QAOA because of its simplicity and straightforward implementation. We propose a fast fixed-angle approach to running QAOA that speeds up QAOA while preserving solution quality compared with slower conventional approaches. We evaluate the expectation value of noiseless QAOA solution quality using tensor network simulations on classical hardware. We then find the time required for classical solvers to match this expected QAOA solution quality. Surprisingly, we observe that even for the smallest possible time, the classical solution quality is above our QAOA solution quality for p = 11, our largest p with known performance. Therefore, we compensate for this difference in quality by using multishot QAOA and find the number of samples K required to match the classical solution quality. K allows us to characterize quantum device parameters, such as sampling frequency, required for the quantum algorithm to match the classical solution quality.

II. RESULTS AND DISCUSSION
This section will outline the results and comparison between classical optimizers and QAOA. This has two halves: Sec. II A outlines the results of the quantum algorithm, and Sec. II B outlines the results of the classical competition.

A. Expected QAOA solution quality
The first algorithm is the quantum approximate optimization algorithm (QAOA), which uses a particular ansatz to generate approximate solutions through measurement. We evaluate QAOA for two specific modes. The first is single-shot fixed-angle QAOA, where a single solution is generated. This has the benefit of being very fast. The second is multi-shot fixed-angle QAOA, where many solutions are generated, and the best is kept. This has the benefit that the solution may be improved with increased run time.
In Section III C we find that one can put limits on the QAOA MaxCut performance even when the exact structure of a 3-regular graph is unknown using fixed angles. We have shown that for large N the average cut fraction for QAOA solutions on 3-regular graphs converges to a fixed value f_tree. If memory limitations permit, we evaluate these values numerically using tensor network simulations. This gives us the average QAOA performance for any large N and p ≤ 11. To further strengthen the study of QAOA performance estimations, we verify that for small N the performance is close to the same value f_tree. We are able to numerically verify that for p ≤ 4 and small N the typical cut fraction is close to f_tree, as shown in Fig. 6.
Combining the large-N theoretical analysis and small-N heuristic evidence, we are able to predict the average performance of QAOA on 3-regular graphs for p ≤ 11. We note that today's hardware can run QAOA up to p ≤ 4 [4] and that for larger depths the hardware noise prevents achieving better QAOA performance. Therefore, the p ≤ 11 constraint is not an important limitation for our analysis.

B. Classical solution quality and time to solution
The second ensemble of algorithms are classical heuristic or any-time algorithms. These algorithms have the property that they can be stopped mid-optimization and provide the best solution found so far. After a short time spent loading the instance, they find an initial 'zero-time' guess. Then, they explore the solution space and find incrementally better solutions until stopping with the best solution after a generally exponential amount of time. We experimentally evaluate the performance of the classical solvers Gurobi, MQLib using the BURER2002 heuristic, and FLIP in Sec. III B. We observe that the zero-time performance, which is the quality of the fastest classical solution, is above the expected quality of QAOA p = 11, as shown in Fig. 3. The time to first solution scales almost linearly with size, as shown in Fig. 2. To compete with classical solvers, QAOA has to return better solutions faster.
FIG. 2. Time required for a single-shot QAOA to match classical MaxCut algorithms. The blue line shows time for comparing with the Gurobi solver and using p = 11; the yellow line shows comparison with the FLIP algorithm and p = 6. Each quantum device that runs MaxCut QAOA can be represented on this plot as a point, where the x-axis is the number of qubits and the y-axis is the time to solution. For any QAOA depth p, the quantum device should return at least one bitstring faster than the Y-value on this plot.
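The any-time behaviour described above (an initial guess, followed by incremental improvements that can be interrupted) can be illustrated with a generic vertex-flip local search. This is a minimal sketch under stated assumptions: it is not the FLIP, Gurobi, or BURER2002 implementation evaluated in the paper, and the names `cut_value` and `anytime_flip_search` are hypothetical.

```python
import random
import time


def cut_value(edges, bits):
    """Number of edges whose endpoints carry different bits."""
    return sum(1 for u, v in edges if bits[u] != bits[v])


def anytime_flip_search(n, edges, time_budget_s, seed=0):
    """Vertex-flip local search that can be stopped at any time.

    Only improving flips are accepted, so `bits` always holds the best
    solution found so far. The deadline is checked between sweeps.
    """
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    bits = [rng.randint(0, 1) for _ in range(n)]  # initial 'zero-time' guess
    best = cut_value(edges, bits)
    deadline = time.perf_counter() + time_budget_s

    improved = True
    while improved and time.perf_counter() < deadline:
        improved = False
        for v in range(n):
            same = sum(1 for w in adj[v] if bits[w] == bits[v])
            gain = same - (len(adj[v]) - same)  # newly cut minus newly uncut edges
            if gain > 0:
                bits[v] ^= 1
                best += gain
                improved = True
    return bits, best
```

With a zero time budget the function returns the initial guess unchanged; with a larger budget it runs until no single flip improves the cut. Each sweep costs time linear in the number of edges, which for 3-regular graphs is linear in N.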

C. Multi-shot QAOA
To improve the performance of QAOA, one can sample many bitstrings and then take the best one. This approach will work only if the dispersion of the cut fraction distribution is large, however. For example, if the dispersion is zero, measuring the ansatz state would return only bitstrings with a fixed cut value. By analyzing the correlations between the qubits in Section III C, we show that the distribution of the cut fraction is a Gaussian with the standard deviation on the order of 1/√N. The expectation value of maximum of K samples is proportional to the standard deviation, as shown in Equation 7. This equation determines the performance of multishot QAOA. In the large N limit the standard deviation is small, and one might need to measure more samples in order to match the classical performance.
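The effect of keeping the best of K Gaussian-distributed samples can be checked numerically. The sketch below is an illustration only: Equation 7 is not reproduced in this section, so the code compares a Monte Carlo estimate with the standard Gaussian upper bound μ + σ√(2 ln K) as an assumed stand-in, and the values of `mu` and `sigma` are hypothetical.

```python
import math
import random


def mean_best_of_k(mu, sigma, k, trials, rng):
    """Monte Carlo estimate of E[max of k draws from N(mu, sigma^2)]."""
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(mu, sigma) for _ in range(k))
    return total / trials


rng = random.Random(1)
mu, sigma = 0.85, 0.01  # hypothetical mean cut fraction and spread
estimates = {}
for k in (1, 10, 100, 1000):
    estimates[k] = mean_best_of_k(mu, sigma, k, trials=500, rng=rng)
    bound = mu + sigma * math.sqrt(2 * math.log(k))  # assumed Gaussian bound
    print(f"K={k:5d}  simulated={estimates[k]:.4f}  bound={bound:.4f}")
```

The improvement over the mean grows only like √(log K) in units of σ, so every additional standard deviation of quality costs exponentially more samples.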
If we have the mean performance of a classical algorithm, we can estimate the number of samples K required for QAOA to match the classical performance. We denote the difference between classical and quantum expected cut fraction as Δ_p(t), which is a function of the running time of the classical algorithm. Moreover, it also depends on p, since p determines QAOA expected performance. If Δ_p(t) < 0, the performance of QAOA is better, and we need only a K = 1 sample. In order to provide an advantage, QAOA would have to measure this sample faster than the classical algorithm, as per Fig. 1. On the other hand, if Δ_p(t) > 0, the classical expectation value is larger than the quantum one, and we have to perform multisample QAOA. We can find K by inverting Equation 7. In order to match the classical algorithm, a quantum device should be able to run these K samples in no longer than t. We can therefore get the threshold sampling frequency.
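The procedure above can be sketched in a few lines. Since Equation 7 is not reproduced in this section, the sketch assumes the asymptotic Gaussian form E[max] ≈ μ + σ√(2 ln K), whose inversion gives K ≈ exp(Δ²/(2σ²)); the function names and the numbers in the usage note are hypothetical.

```python
import math


def samples_to_match(delta, sigma):
    """Samples K needed to close a quality gap `delta` with spread `sigma`.

    Assumes E[max of K] ~ mu + sigma * sqrt(2 ln K), so that
    K ~ exp(delta^2 / (2 sigma^2)). A single sample suffices if delta <= 0.
    """
    if delta <= 0:
        return 1
    return math.ceil(math.exp(delta ** 2 / (2 * sigma ** 2)))


def threshold_frequency_hz(delta, sigma, t_classical_s):
    """Sampling rate at which K samples fit into the classical time t."""
    return samples_to_match(delta, sigma) / t_classical_s
```

For example, under these assumptions a gap of two standard deviations (`delta=0.02`, `sigma=0.01`) requires 8 samples, which corresponds to roughly 8 kHz if the classical solver needs 1 ms.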
The scaling of Δ_p(t) with t is essential here since it determines at which point t we will have the smallest sampling frequency for advantage. We find that for BURER2002, the value of Δ(t) is the lowest for the smallest possible t = t_0, which is when a classical algorithm can produce its first solution. To provide the lower bound for QAOA we consider t_0 as the most favourable point, since classical solution improves much faster with time than a multishot QAOA solution. This point is discussed in more detail in the Supplementary Methods. Time t_0 is shown in Fig. 2 for different classical algorithms. We note that in the figure the time scales polynomially with the number of nodes N. Figure 3 shows the mean cut fraction for the same classical algorithms, as well as the expectation value of QAOA at p = 6, 11. These two figures show that a simple linear-runtime FLIP algorithm is fast and gives a performance on par with p = 6 QAOA. In this case Δ_6(t_0) < 0, and we need to sample only a single bitstring. To obtain the p = 6 sampling frequency for advantage over the FLIP algorithm, one has to invert the time from Fig. 2. If the quantum device is not capable of running p = 6 with little noise, the quantum computer will have to do multishot QAOA. Note that any classical preprocessing for QAOA will be at least linear in time since one must read the input and produce a quantum circuit. Therefore, for small p < 6 QAOA will not give significant advantage: for any fast QAOA device one needs a fast classical computer; one might just run the classical FLIP algorithm on it.
FIG. 4. Evolution of cut fraction value in the process of running the classical algorithms solving 3-regular MaxCut with N = 256. The shaded area shows 90-10 percentiles interval, and the solid line shows the mean cut fraction over 100 graphs. The dashed lines show the expectation value of single-shot QAOA for p = 6, 11, and the dash-dotted lines show the expected performance for multishot QAOA given a sampling rate of 5 kHz. Note that for this N = 256 the multi-shot QAOA with p = 6 can compete with Gurobi at 50 milliseconds. However, the slope of the multi-shot line will decrease for larger N, reducing the utility of the multi-shot QAOA.
The Gurobi solver is able to achieve substantially better performance, and it slightly outperforms p = 11 QAOA. Moreover, the BURER2002 algorithm demonstrates even better solution quality than does Gurobi while being significantly faster. For both Gurobi and BURER2002, Δ_11(t_0) > 0, and we need to either perform multishot QAOA or increase p. Figure 5 shows the advantage sampling frequency ν_11(t_0) for the Gurobi and BURER2002 algorithms; note that the vertical axis is doubly exponential.
The sampling frequency is a result of two factors that work in opposite directions. On the one hand, the time to solution for a classical algorithm grows with N, and hence ν drops. On the other hand, the standard deviation of distribution vanishes as 1/√N, and therefore the number of samples K grows exponentially. There is an optimal size N for which the sampling frequency is minimal. This analysis shows that there is a possibility for advantage with multi-shot QAOA for moderate sizes of N = 100 to 10 000, for which a sampling frequency of ≈ 10 kHz is required. These frequencies are very sensitive to the difference in solution quality, and for p ≥ 12 a different presentation is needed, if one quantum sample is expected to give better than classical solution quality. This is discussed in more detail in Supplementary Methods.
FIG. 5. Sampling frequency required to achieve MaxCut advantage using QAOA p = 11. The shaded area around the solid lines corresponds to 90-10 percentiles over 100 seeds for Gurobi and 20 seeds for BURER2002. The background shading represents comparison of a quantum computer with BURER2002 solver corresponding to modes in Fig. 1. Each quantum device can be represented on this plot as a point, where the x-axis is the number of qubits, and the y-axis is the time to solution. Depending on the region where the point lands, there are different results of comparisons. QAOA becomes inefficient for large N, when sampling frequency starts to grow exponentially with N.
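The competition between the two factors can be made explicit with a short worked derivation. It is a sketch under simplifying assumptions that are not taken from the paper: the gap Δ is treated as independent of N, the standard deviation is written as σ_N = c/√N with an unspecified constant c, the classical time to first solution is taken to be exactly linear, t_0(N) = bN, and K is taken from the asymptotic Gaussian form K ≈ exp(Δ²/(2σ²)).

```latex
\begin{align}
K(N) &\approx \exp\!\left(\frac{\Delta^2}{2\sigma_N^2}\right)
      = \exp\!\left(\frac{N\Delta^2}{2c^2}\right),
      \qquad \sigma_N = \frac{c}{\sqrt{N}},\\
\nu(N) &= \frac{K(N)}{t_0(N)}
      \approx \frac{1}{bN}\exp\!\left(\frac{N\Delta^2}{2c^2}\right),
      \qquad t_0(N) = bN,\\
\frac{d\ln\nu}{dN} &= \frac{\Delta^2}{2c^2} - \frac{1}{N} = 0
      \;\Longrightarrow\; N^{*} = \frac{2c^2}{\Delta^2}.
\end{align}
```

Under these assumptions ν(N) decreases for N < N* because the classical solver slows down, and grows exponentially for N > N* because the number of samples dominates. A smaller gap Δ pushes the optimal size N* to larger graphs.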
For large N, as expected, we see a rapid growth of sampling frequency, which indicates that QAOA does not scale for larger graph sizes, unless we go to higher depth p > 11. The color shading shows correspondence with Fig. 1. If the quantum device is able to run p ≥ 11 and its sampling frequency and the number of qubits N corresponds to the green area, we have a quantum advantage. Otherwise, the quantum device belongs to the red area, and there is no advantage.
It is important to note the effect of classical parallelization on our results. Despite giving more resources to the classical side, parallel computing is unlikely to help it. To understand this, one has to consider how parallelization would change the performance profile shown in Fig. 4. The time to the first classical solution is usually bounded from below by preparation tasks such as reading the graph, which are inherently serial. Thus, parallelization will not reduce t_0 and is in fact likely to increase it due to communication overhead. Instead, it will increase the slope of the solution quality curve, helping classical algorithms to compete in the convergence regime.

D. Discussion
As shown in Fig. 1, to achieve quantum advantage, QAOA must return better solutions faster than the competing classical algorithm. This puts stringent requirements on the speed of QAOA, which previously may have gone unevaluated. If QAOA returns a solution more slowly, the competing classical algorithm may 'try again' to improve its solution, as is the case for anytime optimizers such as the Gurobi solver. The simplest way to improve the speed of QAOA is to reduce the number of queries to the quantum device, which we propose in our fixed-angle QAOA approach. This implementation forgoes the variational optimization step and uses solution concentration, reducing the number of samples to order 1 instead of order 100,000. Even with these improvements, however, the space of quantum advantage may be difficult to access.
Our work demonstrates that with a quantum computer of ≈ 100 qubits, QAOA can be competitive with classical MaxCut solvers if the time to solution is shorter than 100 µs and the depth of the QAOA circuit is p ≥ 6. Note that this time to solution must include all parts of the computation, including state preparation, gate execution, and measurement. Depending on the parallelization of the architecture, there may be a quadratic time overhead. However, the required speed of the quantum device grows with N exponentially. Even if an experiment shows advantage for intermediate N and p ≤ 11, the advantage will be lost on larger problems regardless of the quantum sampling rate. Thus, in order to be fully competitive with classical MaxCut solvers, quantum computers have to increase solution quality, for instance by using p ≥ 12. Notably, p = 12 is required but not sufficient for achieving advantage: the end goal is obtaining a cut fraction ≥ 0.885 for large N, including overcoming other challenges of quantum devices such as noise.
These results lead us to conclude that for 3-regular graphs (perhaps all regular graphs), achieving quantum advantage on NISQ devices may be difficult. For example, the fidelity requirements to achieve quantum advantage are well above the characteristics of NISQ devices.
We note that improved versions of QAOA exist, where the initial state is replaced with a preoptimized state [14] or the mixer operator is adapted to improve performance [15; 16]. One also can use information from classical solvers to generate a better ansatz state [17]. These algorithms have further potential to compete against classical MaxCut algorithms. Also, more general problems, such as weighted MaxCut, maximum independent set, and 3-SAT, may be necessary in order to find problem instances suitable for achieving quantum advantage.
When comparing with classical algorithms, one must record the complete time to solution from the circuit configuration to the measured state. This parameter may be used in the extension of the notion of quantum volume, which is customarily used for quantum device characterization. Our work shows that QAOA MaxCut does not scale with graph size at least for p ≤ 11, thus putting quantum advantage for this problem away from the NISQ era.

III. METHODS
Both classical solvers and QAOA return a bitstring as a solution to the MaxCut problem. To compare the algorithms, we must decide on a metric to use to measure the quality of the solution. A common metric for QAOA and many classical algorithms is the approximation ratio, which is defined as the ratio of cut value (as defined in Eq. (3)) of the solution divided by the optimal (i.e., maximum possible) cut value for the given graph. This metric is hard to evaluate heuristically for large N, since we do not know the optimal solution. We therefore use the cut fraction as the metric for solution quality, which is the cut value divided by the number of edges.
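The difference between the two metrics can be made concrete with a short sketch. The function names are hypothetical; the brute-force optimum in `approximation_ratio` is included only to show why that metric becomes impractical for large N, since it enumerates all 2^N bitstrings.

```python
from itertools import product


def cut_value(edges, bits):
    """Number of edges whose endpoints are assigned to opposite sets."""
    return sum(1 for u, v in edges if bits[u] != bits[v])


def cut_fraction(edges, bits):
    """Cut value divided by the number of edges; needs no optimum."""
    return cut_value(edges, bits) / len(edges)


def approximation_ratio(n, edges, bits):
    """Cut value divided by the optimal cut, found by brute force (2**n cost)."""
    optimum = max(cut_value(edges, z) for z in product((0, 1), repeat=n))
    return cut_value(edges, bits) / optimum
```

For the complete graph on four vertices, which is 3-regular, the bitstring `[0, 0, 1, 1]` cuts 4 of the 6 edges: its cut fraction is 2/3 while its approximation ratio is 1, because no bipartition of that graph cuts more than 4 edges.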
We analyze the algorithms on an ensemble of problem instances. Some instances may give advantage, while others may not. We therefore analyze ensemble advantage, which compares the average solution quality over the ensemble. The set of 3-regular graphs is extremely large for large graph size N, so for classical heuristic algorithms we evaluate the performance on a subset of graphs. We then look at the mean of the cut fraction over the ensemble, which is the statistical approximation of the mean of the cut fraction over all 3-regular graphs.
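Estimating an ensemble mean from a subset of graphs can be sketched as follows. The paper does not state here how its random 3-regular graphs were generated, so the pairing-model sampler below is an assumption chosen for illustration; rejecting pairings with self-loops or repeated edges yields simple 3-regular graphs uniformly at random. The names `random_3_regular_graph` and `ensemble_mean` are hypothetical.

```python
import random


def random_3_regular_graph(n, rng):
    """Sample a simple 3-regular graph on n vertices via the pairing model."""
    if n < 4 or n % 2:
        raise ValueError("a simple 3-regular graph needs an even n >= 4")
    while True:
        stubs = [v for v in range(n) for _ in range(3)]  # three half-edges per vertex
        rng.shuffle(stubs)
        edges = set()
        for i in range(0, len(stubs), 2):
            u, v = sorted((stubs[i], stubs[i + 1]))
            if u == v or (u, v) in edges:
                break  # self-loop or repeated edge: reject and resample
            edges.add((u, v))
        else:
            return sorted(edges)


def ensemble_mean(quality, graphs):
    """Average a per-graph quality (e.g. a solver's cut fraction) over a sample."""
    values = [quality(g) for g in graphs]
    return sum(values) / len(values)
```

Here `quality` would be any function mapping an edge list to a number, for instance the cut fraction reached by a particular solver within a fixed time.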

A. QAOA Methodology
Usually QAOA is thought of as a hybrid algorithm, where a quantum-classical outer loop optimizes the angles γ, β through repeated query to the quantum device by a classical optimizer. Depending on the noise, this process may require hundreds or thousands of queries in order to find optimal angles, which slows the computation. To our knowledge, no comprehensive work exists on exactly how many queries may be required to find such angles. It has been numerically observed [6; 18], however, that for small graph size N = 12 and p = 4, classical noise-free optimizers may find good angles in approximately 100 steps, which can be larger for higher N and p. Each step may need order 10^3 bitstring queries to average out shot noise and find expectation values for an optimizer, and thus seeking global angles may require approximately 100 000 queries to the simulator. The angles are then used for preparing an ansatz state, which is in turn measured (potentially multiple times) to obtain a solution. Assuming a sampling rate of 1 kHz, this approach implies a QAOA time to solution of approximately 100 seconds.
Recent results, however, suggest that angles may be precomputed on a classical device [19] or transferred from other similar graphs [20]. Further research analytically finds optimal angles for p ≤ 20 and d → ∞ for all large-girth d-regular graphs, but does not give angles for finite d [21]. Going a step further, a recent work finds that evaluating regular graphs at particular fixed angles has good performance on all problem instances [22]. These precomputed or fixed angles allow the outer loop to be bypassed, finding close to optimal results in a single shot. In this way, a 1000 Hz QAOA solution can be found in milliseconds, a speedup of several orders of magnitude.
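The time estimates quoted for the variational and fixed-angle approaches follow from simple arithmetic, reproduced in the sketch below. The function name is hypothetical, and the inputs are the approximate figures from the text (about 100 optimizer steps, about 10^3 shots per step, a 1 kHz sampling rate).

```python
def time_to_solution_s(optimizer_steps, shots_per_step, final_shots, sampling_rate_hz):
    """Total device queries divided by the sampling rate, in seconds."""
    total_shots = optimizer_steps * shots_per_step + final_shots
    return total_shots / sampling_rate_hz


# Variational QAOA: the outer loop dominates the query count.
variational_s = time_to_solution_s(100, 1000, 0, 1000.0)

# Fixed-angle single-shot QAOA: no outer loop, one measurement.
fixed_angle_s = time_to_solution_s(0, 0, 1, 1000.0)

print(f"variational: {variational_s} s, fixed-angle: {fixed_angle_s} s")
```

The ratio of the two estimates is the number of queries saved by bypassing the outer loop; device initialization and classical preprocessing time are not included in this estimate.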
For this reason we study the prospect for quantum advantage in the context of fixed-angle QAOA. For d-regular graphs, there exist particular fixed angles with universally good performance [23]. Additionally, as will be shown in Section III E, one can reasonably expect that sampling a single bitstring from the fixed-angle QAOA will yield a solution with a cut fraction close to the expectation value.
The crucial property of the fixed-angle single-shot approach is that it is guaranteed to work for any graph size N. On the other hand, angle optimisation could be less productive for large N, and the multiple-shot (measuring the QAOA ansatz multiple times) approach is less productive for large N, as shown in Section III F. Moreover, the quality of the solution scales with depth as √p [23], which is faster than with the number of samples √(log K), instructing us to resort to multishot QAOA only if larger p is unreachable. Thus, the fixed-angle single-shot QAOA can robustly speed up finding a good approximate solution from the order of seconds to milliseconds, a necessity for advantage over state-of-the-art anytime heuristic classical solvers, which can get good or exact solutions in approximately milliseconds. Crucially, single-shot QAOA quality of solution can be maintained for all sizes N at fixed depth p, which can mean constant time scaling, for particularly capable quantum devices.
To simulate the expectation value of the cost function for QAOA, we employ a classical quantum circuit simulation algorithm QTensor [24][25][26]. This algorithm is based on tensor network contraction and is described in more detail in Supplementary Methods. Using this approach, one can simulate expectation values on a classical computer even for circuits with millions of qubits.

B. Classical Solvers
Two main types of classical MaxCut algorithms exist: approximate algorithms and heuristic solvers. Approximate algorithms guarantee a certain quality of solution for any problem instance. Such algorithms [27; 28] also provide polynomial-time scaling. Heuristic solvers [29; 30] are usually based on branch-and-bound methods [31] that use branch pruning and heuristic rules for variable and value ordering. These heuristics are usually designed to run well on graphs that are common in practical use cases. Heuristic solvers typically return better solutions than do approximate solvers, but they provide no guarantee on the quality of the solution.
The comparison of QAOA with classical solvers thus requires making choices of measures that depend on the context of comparison. From a theory point of view, guaranteed performance is more important; in contrast, from an applied point of view, heuristic performance is the measure of choice. A previous work [22] demonstrates that QAOA provides better performance guarantees than does the Goemans-Williamson algorithm [28]. In this paper we compare against heuristic algorithms since such a comparison is more relevant for real-world problems. On the other hand, the performance of classical solvers reported in this paper can depend on a particular problem instance.
We evaluate two classical algorithms using a single node of Argonne's Skylake testbed; the processor used is an Intel Xeon Platinum 8180M CPU @ 2.50 GHz with 768 GB of RAM.
The first algorithm we study is the Gurobi solver [29], which is a combination of many heuristic algorithms. We evaluate Gurobi with an improved configuration based on communication with Gurobi support [32]. We use Symmetry=0 and PreQLinearize=2 in our improved configuration. As further tweaks and hardware resources may increase the speed, the results here serve as a characteristic lower bound on Gurobi performance rather than a true guarantee. We run Gurobi on 100 random-regular graphs for each size N and allow each optimization to run for 30 minutes. During the algorithm runtime we collect information about the process, in particular the quality of the best-known solution. In this way we obtain a performance profile of the algorithm that shows the relation between the solution quality and the running time. An example of such a performance profile for N = 256 is shown in Fig. 4. Gurobi was configured to use only a single CPU, to avoid interference in runtime between different Gurobi optimization runs for different problem instances. In order to speed up collection of the statistics, 55 problem instances were executed in parallel.
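A performance profile of this kind can be stored as a time-sorted list of (time, best quality) pairs and queried as a step function. The sketch below is an assumed representation for illustration, not the data format used in the paper; the function names are hypothetical.

```python
from bisect import bisect_right


def quality_at(profile, t):
    """Best quality reported no later than time t.

    `profile` is a list of (time, best_quality) pairs sorted by time, as an
    any-time solver would log them. Returns None before the first solution.
    """
    times = [time for time, _ in profile]
    i = bisect_right(times, t)
    return None if i == 0 else profile[i - 1][1]


def time_to_reach(profile, target):
    """Earliest logged time at which the quality is at least `target`."""
    for time, quality in profile:
        if quality >= target:
            return time
    return None
```

The first entry of the profile corresponds to t_0, the time of the first classical solution, and `time_to_reach` gives the classical time needed to match a given QAOA cut fraction.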
The second algorithm is MQLib [30], which is implemented in C++ and uses a variety of different heuristics for solving MaxCut and QUBO problems. We chose the BURER2002 heuristic since in our experiments it performs the best for MaxCut on random regular graphs. Despite using a single thread, this algorithm is much faster than Gurobi; thus we run it for 1 second. In the same way as with Gurobi, we collect the performance profile of this algorithm.
While QAOA and Gurobi can be used as general-purpose combinatorial optimization algorithms, this algorithm is designed to solve MaxCut problems only, and the heuristic was picked that demonstrated the best performance on the graphs we considered. In this way we use Gurobi as a worst-case classical solver, which is capable of solving the same problems as QAOA can. Moreover, Gurobi is a well-established commercial tool that is widely used in industry. Note, however, that we use QAOA fixed angles that are optimized specifically for 3-regular graphs, and one can argue that our fixed-angle QAOA is an algorithm designed for 3-regular MaxCut. For this reason we also consider the best-case MQLib+BURER2002 classical algorithm, which is designed for MaxCut, and we choose the heuristic that performs best on 3-regular graphs.

C. QAOA performance
Two aspects are involved in comparing the performance of algorithms, as outlined in Fig. 1: time to solution and quality of solution. In this section we evaluate the performance of single-shot fixed-angle QAOA. As discussed in the introduction, the time to solution is a crucial part and for QAOA is dependent on the initialization time and the number of rounds of sampling. Single-shot fixed-angle QAOA involves only a single round of sampling, and so the time to solution can be extremely fast, with initialization time potentially becoming the limiting factor. This initialization time is bounded by the speed of classical computers, which perform calibration and device control. Naturally, if one is able to achieve greater initialization speed by using better classical computers, the same computers can be used to improve the speed of solving MaxCut classically. Therefore, it is also important to consider the time scaling of both quantum initialization and classical runtime.
The quality of the QAOA solution is the other part of performance. The discussion below evaluates this feature by using subgraph decompositions and QAOA typicality, including a justification of single-shot sampling.
QAOA is a variational ansatz algorithm structured to provide solutions to combinatorial optimization problems. The ansatz is constructed as p repeated applications of an objective Ĉ and mixing B̂ unitary:
$$|\gamma, \beta\rangle = e^{-i\beta_p \hat{B}} e^{-i\gamma_p \hat{C}} \cdots e^{-i\beta_1 \hat{B}} e^{-i\gamma_1 \hat{C}} |+\rangle,$$
where B̂ is a sum over Pauli X operators, $\hat{B} = \sum_{i}^{N} \hat{\sigma}_x^i$. A common problem instance is MaxCut, which strives to bipartition the vertices of some graph G such that the maximum number of edges have vertices in opposite sets. Each such edge is considered to be cut by the bipartition. This may be captured in the objective function
$$\hat{C} = \frac{1}{2} \sum_{\langle ij \rangle \in G} \left(1 - \hat{\sigma}_z^i \hat{\sigma}_z^j\right),$$
whose eigenstates are bipartitions in the Z basis, with eigenvalues that count the number of cut edges. To get the solution to the optimization problem, one prepares the ansatz state |γ, β⟩ on a quantum device and then measures the state. The measured bitstring is the solution output from the algorithm.
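For small graphs the ansatz described above can be checked by brute-force state-vector simulation. The sketch below assumes the standard QAOA conventions (initial state |+⟩ on every qubit, objective Ĉ = ½ Σ (1 − σ_z^i σ_z^j) over edges, and each layer applied as e^{-iβB̂} e^{-iγĈ}). It is not the tensor network method used in the paper: its cost grows as 2^N, so it only serves to illustrate the definition. The function name is hypothetical.

```python
import cmath
import math


def qaoa_expected_cut(n, edges, gammas, betas):
    """Expected cut value <C> of the depth-p QAOA state on n qubits."""
    dim = 1 << n
    # Cut value of every computational basis state (bit q of z = side of vertex q).
    cuts = [
        sum(1 for u, v in edges if ((z >> u) & 1) != ((z >> v) & 1))
        for z in range(dim)
    ]
    state = [complex(1 / math.sqrt(dim), 0.0)] * dim  # |+> on every qubit
    for gamma, beta in zip(gammas, betas):
        # Objective unitary: a diagonal phase exp(-i * gamma * cut(z)).
        state = [a * cmath.exp(-1j * gamma * c) for a, c in zip(state, cuts)]
        # Mixer: exp(-i * beta * X) = cos(beta) I - i sin(beta) X on each qubit.
        cb, sb = math.cos(beta), math.sin(beta)
        for q in range(n):
            mask = 1 << q
            for z in range(dim):
                if not z & mask:
                    a0, a1 = state[z], state[z | mask]
                    state[z] = cb * a0 - 1j * sb * a1
                    state[z | mask] = cb * a1 - 1j * sb * a0
    return sum(abs(a) ** 2 * c for a, c in zip(state, cuts))
```

With all angles set to zero the state remains the uniform superposition and the expected cut is half the number of edges. For a single edge at p = 1, these conventions give ⟨Ĉ⟩ = ½(1 + sin 4β sin γ), so γ = π/2 and β = π/8 cut the edge with certainty.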
While QAOA is guaranteed to converge to the exact solution in the p → ∞ limit in accordance with the adiabatic theorem [11; 33], today's hardware is limited to low depths p ∼ 1 to 5, because of the noise and decoherence effects inherent to the NISQ era.
A useful tool for analyzing the performance of QAOA is the fact that QAOA is local [11; 12]: the entanglement between any two qubits at a distance of ≥ 2p steps from each other is strictly zero. For a similar reason, the expectation value of a particular edge ij depends only on the structure of the graph within p steps of edge ij. Regular graphs have a finite number of such local structures (also known as subgraphs) [22], and so the expectation value of the objective function can be rewritten as a sum over subgraphs
$$\langle \hat{C} \rangle = \sum_{\lambda} M_\lambda(G) f_\lambda.$$
Here, λ indexes the different possible subgraphs of depth p for a d-regular graph, M_λ(G) counts the number of each subgraph λ for a particular graph G, and f_λ is the expectation value of the subgraph (e.g., Eq. (4)). For example, if there are no cycles of length ≤ 2p + 1, only one subgraph (the tree subgraph) contributes to the sum.
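The condition in the last sentence can be tested directly: if the shortest cycle (girth) of a graph is longer than 2p + 1, every depth-p neighbourhood of an edge is a tree. The following is a minimal sketch using breadth-first search on a simple graph; the function names are hypothetical and the code does not enumerate the non-tree subgraphs.

```python
from collections import deque


def girth(n, edges):
    """Length of the shortest cycle of a simple graph (inf if acyclic)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    best = float("inf")
    for source in range(n):
        dist = {source: 0}
        parent = {source: -1}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    parent[w] = u
                    queue.append(w)
                elif parent[u] != w:
                    # A non-tree edge closes a cycle through the BFS tree.
                    best = min(best, dist[u] + dist[w] + 1)
    return best


def only_tree_subgraphs(n, edges, p):
    """True if no cycle has length <= 2p + 1."""
    return girth(n, edges) > 2 * p + 1
```

As an example, the Petersen graph is 3-regular with girth 5, so under this criterion only the tree subgraph contributes at p = 1, but not at p = 2.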
With this tool we may ask and answer the following question: What is the typical performance of single-shot fixed-angle QAOA, evaluated over some ensemble of graphs? Here, performance is characterized as the typical (average) fraction of edges cut by a bitstring solution returned by a single sample of fixed-angle QAOA, averaged over all graphs in the particular ensemble.
For our study we choose the ensemble of 3-regular graphs on N vertices. Different ensembles, characterized by different connectivity d and size N, may have different QAOA performance [34; 35].
Using the structure of the random regular graphs, we can put bounds on the cut fraction by bounding the number of different subgraphs and evaluating the number of large cycles. These bounds become tighter for N → ∞ and fixed p since the majority of subgraphs become trees and 1-cycle graphs. We describe this analysis in detail in the Supplementary Methods, which show that in this limit the QAOA cut fraction will equal the expectation value on the tree subgraph, which may be used as a 'with high probability' (WHP) proxy of performance. Furthermore, using a subgraph counting argument, we may count the number of tree subgraphs to find an upper and lower WHP bound on the cut fraction for smaller graphs. These bounds are shown as the boundaries of the red and green regions in Fig. 6.

D. QAOA Ensemble Estimates
A more straightforward but less rigorous characterization of QAOA performance is simply to evaluate fixed-angle QAOA on a subsample of graphs in the ensemble. The results of such an analysis require an assumption not on the particular combinatorial graph structure of ensembles but instead on the typicality of expectation values on subgraphs. This is an assumption on the structure of QAOA and allows an extension of typical cut fractions from the large N limit, where most subgraphs are trees, to a small N limit, where typically a very small fraction of subgraphs are trees.
Figure 6 plots the ensemble-averaged cut fraction for p = 2 and various sizes of graphs. For N ≤ 16, the ensemble includes every 3-regular graph (4,681 in total). For each size N ∈ (16, 256], we evaluate fixed-angle QAOA on 1,000 3-regular graphs drawn at random from the ensemble of all 3-regular graphs of that size. Note that because the evaluation is done at fixed angles, it may be done with minimal quantum calculation by a decomposition into subgraphs, then looking up the subgraph expectation value $f_\lambda$ from [22]. This approach is also described in more detail in [36]. In this way, expectation values can be computed as fast as an isomorphism check. From Fig. 6 we observe that the median cut fraction across the ensemble appears to concentrate around that of the tree subgraph value, even for ensembles where the typical graph is too small to include many tree subgraphs. Additionally, the variance (dark fill) reduces as N increases, consistent with the fact that for larger N there are fewer kinds of subgraphs with non-negligible frequency. Furthermore, the absolute range (light fill), which plots the largest and smallest expectation value across the ensemble, is consistently small. While the data for the absolute range exists here only for N ≤ 16 because of complete sampling of the ensemble, one can reasonably expect that these absolute ranges extend for all N, suggesting that the absolute best performance of p = 2 QAOA on 3-regular graphs is ≈ 0.8.
We numerically observe across a range of p (not shown) that these behaviors persist: the typical cut fraction is approximately equal to that of the tree subgraph value $f_{p\text{-tree}}$ even in the limit where no subgraph is a tree. This suggests that the typical subgraph expectation value $f_\lambda \approx f_{p\text{-tree}}$, and only an atypical number of subgraphs have expectation values that diverge from the tree value. With this observation, we may use the value $f_{p\text{-tree}}$ as a proxy for the average cut fraction of fixed-angle QAOA.
These analyses yield four different regimes for advantage vs. classical algorithms, shown in Fig. 6. If a classical algorithm yields small cut fractions for large graphs (green, bottom right), then there is advantage in a strong sense. Based only on graph combinatorics, with high probability most of the edges participate in few cycles, and thus the cut fraction is almost guaranteed to be around the tree value, larger than the classical solver. Conversely, if the classical algorithm yields large cut fractions for large graphs (red, top right), there is no advantage in the strong sense: QAOA will yield, for example, only ∼ 0.756 for p = 2 because most edges see no global structure. This analysis reinforces that of [12], which suggests that QAOA needs to 'see' the whole graph in order to get reasonable performance.

Two additional performance regimes exist for small graphs, where QAOA can reasonably see the whole graph. If a classical algorithm yields small cut fractions for small graphs (yellow, bottom left), then there is advantage in a weak sense, which we call the 'ensemble advantage'. Based on QAOA concentration, there is at least a 50% chance that the QAOA result on a particular graph will yield a better cut fraction than will the classical algorithm; assuming that the variance in cut fraction is small, this is a 'with high probability' statement. Conversely, if the classical algorithm yields large cut fractions for small graphs (orange, top left), there is no advantage in a weak sense. Assuming QAOA concentration, the cut fraction will be smaller than the classical value, and for some classical cut fraction there are no graphs with advantage (e.g., > 0.8 for p = 2).
Based on these numerical results, we may use the expectation value of the tree subgraph $f_{p\text{-tree}}$ as a high-probability proxy for typical fixed-angle QAOA performance on regular graphs. For large N, this result is validated by graph-theoretic bounds counting the typical number of tree subgraphs in a typical graph. For small N, this result is validated by fixed-angle QAOA evaluation on a large ensemble of graphs.

FIG. 7. Long-range antiferromagnetic correlation coefficient on the 3-regular Bethe lattice, which is a proxy for an N → ∞ typical 3-regular graph. The horizontal axis indexes the distance between two vertices. QAOA is strictly local, which implies that no correlations exist between vertices a distance > 2p away. As shown here, however, these correlations are exponentially decaying with distance. This suggests that even if the QAOA 'sees the whole graph', one can use the central limit theorem to argue that the distribution of QAOA performance is Gaussian with a standard deviation ∝ 1/√N.

E. Single-shot QAOA Sampling
A crucial element of single-shot fixed-angle QAOA is that the typical bitstring measured from the QAOA ansatz has a cut value similar to the average. This fact was originally observed by Farhi et al. in the original QAOA proposal [11]: because of the strict locality of QAOA, vertices more than 2p steps from each other have a ZZ correlation of strictly zero. Thus, for large graphs with a width > 2p, by the central limit theorem the cut fraction concentrates to a Gaussian with a standard deviation of order $1/\sqrt{N}$ around the mean. As the variance grows sublinearly in N, the values concentrate at the mean, and thus with high probability measuring a single sample of QAOA will yield a solution with a cut value close to the average. However, this result is limited in scope for larger depths p, because it imposes no requirements on the strength of correlations for vertices within distance ≤ 2p. Therefore, here we strengthen the argument of Farhi et al. and show that these concentration results may persist even in the limit of large depth p and small graphs N. We formalize these results by evaluating the ZZ correlations of vertices within 2p steps, as shown in Fig. 7.
Expectation values are computed on the 3-regular Bethe lattice, which has no cycles and thus can be considered the N → ∞ typical limit. Instead of showing only the nearest-neighbor correlation function, the x-axis indexes the correlation function between vertices a certain distance apart. For distance 1, the correlations are that of the objective function $f_{p\text{-tree}}$. Additionally, for distance > 2p, the correlations are strictly zero in accordance with the strict locality of QAOA. For distance ≤ 2p, the correlations are exponentially decaying with distance. Consequently, even for vertices within the lightcone of QAOA, the correlation is small; and so by the central limit theorem the distribution will be Gaussian. This result holds because the probability of having a cycle of fixed size converges to 0 as N → ∞. In other words, we know that with N → ∞ we will have a Gaussian cost distribution with standard deviation ∝ $1/\sqrt{N}$.

When considering small N graphs, ones that have cycles of length ≤ 2p + 1, we can reasonably extend the argument of Section III D on typicality of subgraph expectation values. Under this typicality argument, the correlations between close vertices are still exponentially decaying with distance, even though the subgraph may not be a tree and there are multiple short paths between vertices. Thus, for all graphs, by the central limit theorem the distribution of solutions concentrates as a Gaussian with a standard deviation of order $1/\sqrt{N}$ around the mean. By extension, with probability ∼ 50%, any single measurement will yield a bitstring with a cut value greater than the average. These results of cut distributions have been found heuristically in [37].
The results are a full characterization of the fixed-angle single-shot QAOA on 3-regular graphs. Given a typical graph sampled from the ensemble of all regular graphs, the typical cut fraction from level p QAOA will be about that of the expectation value of the p-tree, $f_{p\text{-tree}}$. The distribution of bitstrings is concentrated as a Gaussian of subextensive variance around the mean, indicating that one can find a solution with quality greater than the mean with order 1 samples. Furthermore, because the fixed angles bypass the hybrid optimization loop, the number of queries to the quantum simulator is reduced by orders of magnitude, yielding solutions on potentially millisecond timescales.

F. Multi-shot QAOA Sampling
In the preceding section we demonstrated that the standard deviation of the MaxCut cost distribution falls as $1/\sqrt{N}$, which makes the use of multiple shots impractical for large graphs. However, it is worth verifying more precisely its effect on the QAOA performance. Multiple-shot QAOA involves measuring several bitstrings from the same ansatz state and then picking the bitstring with the best cost. To evaluate such an approach, we need to find the expectation value for the best bitstring over K measurements.
As shown above, the distribution of cost for each measured bitstring is Gaussian, $p(x) = G\left(\frac{x - \mu_p}{\sigma_N}\right)$. We define a new random variable $\xi$, which is the cost of the best of K bitstrings. The cumulative distribution function (CDF) of the best of K bitstrings is $F_K(\xi)$, and $F_1(\xi)$ is the CDF of a normal distribution. The probability density for $\xi$ is

$$p_K(\xi) = \frac{d}{d\xi} F_K(\xi) = \frac{d}{d\xi} F_1^K(\xi) = K F_1^{K-1}(\xi)\, p(\xi),$$

where $F_1(\xi) = \int_{-\infty}^{\xi} p(x)\,dx$ and $F_1^K$ is the ordinary exponentiation. The expectation value for $\xi$ can be found by

$$E_K = \int_{-\infty}^{\infty} \xi\, p_K(\xi)\, d\xi.$$

While the analytical expression for the integral can be extensive, a good upper bound exists for it:

$$E_K \le \mu_p + \sigma_N \sqrt{2 \log K}.$$

Combined with the $1/\sqrt{N}$ scaling of the standard deviation, we can obtain a bound on improvement in cut fraction from sampling K times:

$$\Delta \le \gamma_p \sqrt{\frac{2}{N} \log K},$$

where $\gamma_p$ is a scaling parameter. The value $\Delta$ is the difference of solution quality for multishot and single-shot QAOA. Essentially it determines the utility of using multishot QAOA. We can determine the scaling constant $\gamma_p$ by classically simulating the distribution of the cost value in the ansatz state. We perform these simulations using QTensor for an ensemble of graphs with N ≤ 26 to obtain $\gamma_6 = 0.1926$ and $\gamma_{11} = 0.1284$.

It is also worthwhile to verify the $1/\sqrt{N}$ scaling by calculating $\gamma_p$ for various N. We can do so for smaller p = 3 and graph sizes N ≤ 256. We calculate the standard deviation by $\Delta C = \sqrt{\langle C^2 \rangle - \langle C \rangle^2}$ and evaluate $\langle C^2 \rangle$ using QTensor. This evaluation gives large light cones for large p; the largest that we were able to simulate is p = 3. From the deviations $\Delta C$ we can obtain values for $\gamma_3$. We find that for all N the values stay within 5% of the average over all N. This indicates that $\gamma_3$ does not depend on N, which in turn supports the $1/\sqrt{N}$ scaling as a valid model. The results of numerical simulation of the standard deviation are discussed in more detail in the Supplementary Methods.
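The best-of-K argument can be checked numerically. The sketch below is illustrative only: it assumes the standard bound $\mu + \sigma\sqrt{2 \ln K}$ on the expected maximum of K independent Gaussian samples and compares it with a Monte Carlo estimate. The function names, the number of trials, and the seed are hypothetical choices, not part of the original analysis.

```python
import math
import random


def expected_best_of_k(mu, sigma, k, trials=2000, seed=0):
    """Monte Carlo estimate of E[max of k draws] from a Gaussian N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(mu, sigma) for _ in range(k))
    return total / trials


def best_of_k_upper_bound(mu, sigma, k):
    """Assumed upper bound: mu + sigma * sqrt(2 ln k)."""
    return mu + sigma * math.sqrt(2.0 * math.log(k))


for k in (10, 50, 200):
    estimate = expected_best_of_k(0.0, 1.0, k)
    bound = best_of_k_upper_bound(0.0, 1.0, k)
    print(f"K={k}: estimate={estimate:.3f}, bound={bound:.3f}")
```

The gain grows only as the square root of the logarithm of K, which is why additional shots give diminishing returns.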
To compare multishot QAOA with classical solvers, we plot the expected performance of multishot QAOA in Fig. 4 as dash-dotted lines. We assume that a quantum device is able to sample at a rate of 5 kHz. Today's hardware is able to run up to p = 5 and achieve the 5 kHz sampling rate [38]. Notably, the sampling frequency of modern quantum computers is bound not by gate duration but by qubit preparation and measurement.
For small N, reasonable improvement can be achieved by using a few samples. For example, for N = 256 with p = 6 and just K = 200 shots, QAOA can perform as well as single-shot p = 11 QAOA. For large N, however, too many samples are required to obtain substantial improvement for multishot QAOA to be practical.
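To make the N dependence concrete, the following sketch evaluates a multishot gain bound of the assumed form $\Delta \le \gamma_p \sqrt{2 \ln K / N}$, obtained by combining the Gaussian best-of-K bound with a standard deviation of $\gamma_p/\sqrt{N}$. This functional form is an assumption; only the value $\gamma_6 = 0.1926$ is taken from the text. The function names are hypothetical.

```python
import math

GAMMA_6 = 0.1926  # scaling constant for p = 6 quoted in the text


def multishot_gain_bound(gamma_p, n, k):
    """Assumed bound on the cut-fraction gain from keeping the best of k shots."""
    return gamma_p * math.sqrt(2.0 * math.log(k) / n)


def min_shots_for_gain(gamma_p, n, delta):
    """Invert the bound: the smallest k for which the bound reaches delta."""
    return math.exp(n * delta ** 2 / (2.0 * gamma_p ** 2))


delta = multishot_gain_bound(GAMMA_6, n=256, k=200)
print(f"gain bound for N=256, K=200: {delta:.4f}")
for n in (256, 512, 1024):
    print(n, f"{min_shots_for_gain(GAMMA_6, n, delta):.3g}")
```

Under this assumed form, holding the target gain fixed while doubling N squares the number of shots, which illustrates why the required number of samples grows exponentially with N.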

G. Classical performance
To compare the QAOA algorithm with its classical counterparts, we choose state-of-the-art algorithms that solve a similar spectrum of problems as QAOA, and we evaluate the time to solution and solution quality. Here, we compare two algorithms: Gurobi and MQLib+BURER2002. Both are anytime heuristic algorithms that can provide an approximate solution at arbitrary time. For these algorithms we collect the 'performance profiles', that is, the dependence of solution quality on time spent finding the solution. We also evaluate the performance of a simple MaxCut algorithm, FLIP. This algorithm has a proven linear time scaling with input size. It returns a single solution after a short time. To obtain a better FLIP solution, one may run the algorithm several times and take the best solution, similarly to multishot QAOA.
Both algorithms have to read the input and perform some initialization step to output any solution. This initialization step determines the minimum time required for getting the initial solution, a 'first guess' of the algorithm. This time is the leftmost point of the performance profile, marked with a star in Fig. 4. We call this time $t_0$ and the corresponding solution quality 'zero-time performance'.
We observe two important results.
1. Zero-time performance is constant with N and is comparable to that of p = 11 QAOA, as shown in Fig. 3, where solid lines show classical performance and dashed lines show QAOA performance.
2. $t_0$ scales as a low-degree polynomial in N, as shown in Fig. 2. The y-axis is $t_0$ for several classical algorithms.
Since the zero-time performance is slightly above the expected QAOA performance at p = 11, we focus on analyzing this zero-time regime. In the following subsections we discuss the performance of the classical algorithms and then proceed to the comparison with QAOA.

H. Performance of Gurobi Solver
In our classical experiments, as mentioned in Section III B, we collect the solution quality with respect to time for multiple N and graph instances. An example averaged solution quality evolution is shown in Fig. 4 for an ensemble of 256-vertex 3-regular graphs. Between times 0 and $t_{0,G}$, the Gurobi algorithm goes through some initialization and quickly finds some naive approximate solution. Next, the first incumbent solution is generated, which will be improved in further runtime. Notably, for the first 50 milliseconds, no significant improvement to solution quality is found. After that, the solution quality starts to rise and slowly converges to the optimal value of ∼ 0.92.
It is important to appreciate that Gurobi is more than just a heuristic solver: in addition to the incumbent solution, it always returns an upper bound on the optimal cost. When the upper bound and the cost for the incumbent solution match, the optimal solution is found. It is likely that Gurobi spends a large portion of its runtime on proving optimality by lowering the upper bound. This emphasizes that we use Gurobi as a worst-case classical solver.
Notably, the x-axis of Fig. 4 is logarithmic: the lower and upper bounds eventually converge after exponential time with a small prefactor, ending the program and yielding the exact solution. Additionally, the typical upper and lower bounds of the cut fraction of the best solution are close to 1. Even after approximately 10 seconds for a 256-vertex graph, the algorithm returns cut fractions with very high quality ∼ 0.92, far better than intermediate-depth QAOA.
The zero-time performance of Gurobi for N = 256 corresponds to the y-value of the star marker in Fig. 4. We plot this value for various N in Fig. 3. As shown in the figure, zero-time performance goes up and reaches a constant value of ∼ 0.882 at N ∼ 100. Even for large graphs of N = 10^5, the solution quality stays at the same level.
Such solution quality is returned after time $t_{0,G}$, which we plot in Fig. 2 for various N. For example, for a 1,000-node graph it will take ∼ 40 milliseconds to return the first solution. Evidently, this time scales as a low-degree polynomial with N. This shows that Gurobi can consistently return solutions of quality ∼ 0.882 in polynomial time.

I. Performance of MQLib+BURER2002 and FLIP Algorithms
The MQLib algorithm with the BURER2002 heuristic shows significantly better performance, which is expected since it is specific to MaxCut. As shown in Fig. 4 for N = 256 and in Fig. 2 for various N, the speed of this algorithm is much better compared with Gurobi's. Moreover, $t_0$ for MQLib also scales as a low-degree polynomial, and for 1,000 nodes MQLib can return a solution in 2 milliseconds. The zero-time performance shows the same constant behavior, and the value of the constant is slightly higher than that of Gurobi, as shown in Fig. 3.
While for Gurobi and MQLib we find the time scaling heuristically, the FLIP algorithm is known to have linear time scaling. With our implementation in Python, it shows speed comparable to that of MQLib and solution quality comparable to QAOA p = 6. We use this algorithm as a demonstration that a linear-time algorithm can give constant performance for large N, averaged over multiple graph instances.

A. Quantum simulator QTensor
The most popular method for quantum circuit simulation is state-vector evolution. It stores the full state vector and hence requires memory size ∝ $2^N$, exponential in the number of qubits. It can simulate only circuits with about 30 qubits on a common computer, and simulation of 45 qubits on a supercomputer was reported [39]. One can perform compression of the state vector and therefore simulate up to 61 qubits on a supercomputer [40].
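The exponential memory requirement is easy to quantify. The short sketch below assumes an uncompressed state vector with double-precision complex amplitudes (16 bytes each); both the function name and that storage format are assumptions made for illustration.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2**n_qubits complex amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


for n in (30, 45, 61):
    gib = statevector_bytes(n) / 2 ** 30
    print(f"{n} qubits: {gib:.3g} GiB")
```

At 30 qubits this is 16 GiB, which is consistent with the limit of a common computer mentioned above; each additional qubit doubles the requirement.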
For this work, we aim to study the performance of QAOA on instances large enough to use in comparison with classical solvers. To simulate expectation values on large graphs, we use the classical simulator QTensor [24; 41]. This simulator is based on tensor network contraction and allows for simulation of a much larger number of qubits. QTensor converts a quantum circuit to a tensor network, where each quantum gate is represented by a tensor. The indices of this tensor represent input and output subspaces of each qubit that the gate acts on. The tensor network constructed in this way can then be contracted in an efficient manner to compute the required value. The contraction does not maintain any information about the structure of the original quantum circuit and can result in significant simulation cost reduction, not limited to the QAOA context [42].
To calculate an expectation value of some observable $\hat{R}$ in a state generated by a circuit $\hat{U}$, one can evaluate the following expression: $\langle 0| \hat{U}^\dagger \hat{R} \hat{U} |0\rangle$. This value can be calculated by using a tensor network as well. When applied to the MaxCut QAOA problem, the $\hat{R}$ operator is a sum of smaller terms, as shown in Eq. 3. The expectation value of the cost for the graph G and QAOA depth p is then

$$\langle \hat{C} \rangle = \langle \gamma, \beta | \hat{C} | \gamma, \beta \rangle = \sum_{\langle jk \rangle \in G} f_{jk}(\gamma, \beta),$$

where $f_{jk} = \langle \gamma, \beta | (1 - \hat{\sigma}_z^j \hat{\sigma}_z^k) | \gamma, \beta \rangle / 2$ is an individual edge contribution to the total cost function. Each contribution to the cost function $f_{jk}$ can be evaluated by using a corresponding tensor network. Note that the observable $\hat{\sigma}_z^j \hat{\sigma}_z^k$ in the definition of $f_{jk}$ acts only on two qubits and hence commutes with gates that act on other qubits. The state $|\gamma, \beta\rangle$ is not stored in memory at any time but rather is represented as a tensor network generated from the quantum circuit shown in Eq. 2. When two such tensor network representations (one for $\langle \gamma, \beta|$ and another for $|\gamma, \beta\rangle$) are joined on either side of the observable operator, it is possible to cancel out the quantum gates that commute through the observable, thereby significantly reducing the size of the tensor network. The tensor network after the cancellation is equivalent to calculating $\hat{\sigma}_z^j \hat{\sigma}_z^k$ on a subgraph S of the original graph G.

While multiple approaches exist for determining the best way to contract a tensor network, we use a contraction approach called bucket elimination [43], which contracts one index at a time. At each step we choose some index j from the tensor expression and then sum over a product of tensors that have j in their index. The size of the intermediary tensor obtained as a result of this operation is very sensitive to the order in which indices are contracted. To find a good contraction ordering, we use a line graph of the tensor network. A tree decomposition [44] of the line graph corresponds to a contraction path that guarantees that the number of indices in the largest intermediary tensor will be equal to the width of the tree decomposition [45]. In this way one can simulate QAOA to reasonable depth on hundreds or thousands of qubits. More details of QTensor and tensor networks are in [24; 46; 47].
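The idea of contracting one index at a time can be sketched in a few lines of NumPy. This is a toy illustration of bucket elimination on a closed tensor network and is not QTensor's implementation; the function name `bucket_eliminate` and the convention of labeling each tensor axis with a single letter are assumptions made for this example.

```python
import numpy as np


def bucket_eliminate(tensors, order):
    """Contract a closed tensor network one index at a time.

    tensors: list of (array, labels) pairs, where labels is a string with
             one letter per axis of the array.
    order:   iterable containing every index letter exactly once.
    """
    tensors = [(np.asarray(arr), labels) for arr, labels in tensors]
    for idx in order:
        # The "bucket" holds every tensor that carries the current index.
        bucket = [t for t in tensors if idx in t[1]]
        if not bucket:
            continue
        rest = [t for t in tensors if idx not in t[1]]
        # Indices that survive this step: all bucket indices except idx.
        out = "".join(sorted({c for _, labels in bucket for c in labels} - {idx}))
        spec = ",".join(labels for _, labels in bucket) + "->" + out
        merged = np.einsum(spec, *(arr for arr, _ in bucket))
        tensors = rest + [(merged, out)]
    # Every index has been summed, so only scalars remain.
    return np.prod([arr for arr, _ in tensors])


rng = np.random.default_rng(0)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))
network = [(A, "ab"), (B, "bc"), (C, "ca")]  # closed loop: trace(A @ B @ C)
print(bucket_eliminate(network, "abc"), np.trace(A @ B @ C))
```

The result does not depend on the elimination order, but the size of the intermediate tensor `merged` does, which is why finding a good ordering matters for larger networks.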

B. FLIP Algorithm
As an example of a simple and fast classical MaxCut algorithm we evaluate a local search algorithm. This class of heuristic is frequently referred to as FLIP [48] or FLIP-neighborhood [49]. A FLIP heuristic searches locally for improvements to the current solution that flip a single vertex. One still retains the freedom to choose how to search for the vertex to flip at each stage of the algorithm. Examples of vertex selection methods include randomized message passing [50] and zero-temperature annealing.
The FLIP algorithm is as follows. First, for each vertex of a graph, initially assign a value of 0 or 1 at random with equal probability. From this starting point, randomly order the vertices of the graph, iterate through each vertex in order, and flip the vertex if flipping it will improve the cut value. Once all vertices have been iterated through, repeat this process until no vertices are changed in a full iteration. This procedure is analogous to zero-temperature Monte Carlo annealing and is a greedy solver. The end result is a partition in which flipping any individual vertex will not improve the cut size. On 3-regular graphs we observe that this algorithm runs on graphs of N = 10,000 nodes in about 70 ms on an Intel i9-10900K processor and gives a mean cut fraction of 0.847, which matches the performance of p = 6 QAOA.
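The procedure described above can be sketched as follows. This is a minimal illustration and not the implementation used for the reported timings; the function names are hypothetical, and reusing the same random vertex order on every pass is an assumption, since the text does not specify whether the order is redrawn.

```python
import random


def cut_value(edges, side):
    """Number of edges whose endpoints lie on opposite sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def flip_local_search(n, edges, seed=None):
    """Greedy single-vertex FLIP local search for MaxCut."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [rng.randint(0, 1) for _ in range(n)]  # random initial assignment
    order = list(range(n))
    rng.shuffle(order)                            # random vertex ordering
    changed = True
    while changed:                                # repeat until a full pass changes nothing
        changed = False
        for v in order:
            same = sum(1 for w in adj[v] if side[w] == side[v])
            # Flipping v cuts its `same` uncut edges and uncuts the others.
            if same > len(adj[v]) - same:
                side[v] ^= 1
                changed = True
    return side


# Example on the 3-regular cube graph (8 vertices, 12 edges).
cube = [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]
side = flip_local_search(8, cube, seed=1)
print(cut_value(cube, side) / len(cube))
```

Each flip strictly increases the cut value, so the loop terminates, and at termination every vertex has at least half of its edges cut. Restarting with different seeds and keeping the best result gives the multi-run variant mentioned in the text.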
Given more time, the FLIP algorithm can improve its performance by reinitializing with a new random choice of vertex assignments and vertex orderings, as shown in Fig. 4. Given an exponential number of repetitions, the algorithm will eventually converge on the exact result, although very slowly.
As a local binary algorithm, it runs into locality restrictions [51] and less-than-ideal performance but is extremely fast. To put this into perspective with QAOA, we implemented FLIP using Python. We observe that a simple implementation returns solutions for a 100,000-vertex 3-regular graph in < 1 second. An optimized or parallelized implementation using high-performance languages such as C++ may run several times faster. The main property is that for a graph of degree k, girth L, and size N, the FLIP algorithm runtime scales as O(NLk) [50], which we verify experimentally. Notably, for any quantum initialization step the time scaling would also be at least O(N), since we have to somehow move information about the graph to the quantum device.

C. Graph statistics bounds
It is known [52; 53] that in the limit N → ∞, the probability of a graph having t cycles of length l, where connectivity d is odd, is asymptotic to

$$P_l(t) = \frac{k_l^{\,t}}{t!}\, e^{-k_l}, \qquad k_l = \frac{(d-1)^l}{2l}.$$

Summing over all t, we find that the average number of cycles of size l is equal to the same size-independent constant

$$\sum_t t\, P_l(t) = k_l.$$

This is a probabilistic estimate based on statistics of regular graphs, does not depend on QAOA, and is asymptotically precise; thus, this is a 'with high probability' (WHP) result. For 3-regular graphs, the smallest values are $k_l$ = [1.33, 2.00, 3.20, …], increasing exponentially. These sparse cycles modify the subgraph that QAOA 'sees' in its local environment, as shown in Supplementary Fig. 1. Each cycle of length l modifies the subgraph of a number of edges: l for the edges that participate in the cycle, plus some additional number that is exponential in p. For example, a 4-cycle of a 3-regular graph and p = 3 modifies 16 edges, as shown in Supplementary Fig. 1. This count of modified edges serves as an upper bound on the number of non-tree subgraphs, and hence gives a lower bound on the number of tree subgraphs $M_{p\text{-tree}}$, as an edge may participate in more than one cycle. Note that $M_{p\text{-tree}}$ is trivially ≥ 0, a limit that is reached for large p and small N. For p = 2 and 3-regular graphs, this value is $M_{2\text{-tree}} \ge M - 40$, and serves as a characteristic scale for when QAOA will begin to see global structure.
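The cycle statistics quoted for 3-regular graphs can be reproduced with a few lines of Python. The sketch assumes the standard asymptotic result that the number of cycles of length l in a random d-regular graph is Poisson distributed with mean $(d-1)^l / (2l)$; the function names are illustrative.

```python
import math


def expected_cycles(d, l):
    """Assumed asymptotic mean number of cycles of length l: (d-1)^l / (2l)."""
    return (d - 1) ** l / (2 * l)


def prob_t_cycles(d, l, t):
    """Poisson probability of exactly t cycles of length l."""
    k = expected_cycles(d, l)
    return k ** t * math.exp(-k) / math.factorial(t)


for l in (3, 4, 5):
    print(l, round(expected_cycles(3, l), 2))  # 1.33, 2.0, 3.2
print(prob_t_cycles(3, 3, 0))  # chance that a large 3-regular graph has no triangles
```

The means match the values 1.33, 2.00, and 3.20 listed above and do not depend on N, so short cycles become a vanishing fraction of the graph as N grows.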
The expectation value as a sum over subgraphs can then be broken into two parts: the tree subgraph and everything else. The sum can then be bounded by fixing an extremal value for the expectation value of every other subgraph, knowing that $0 \le f_{\min} \le f_\lambda \le f_{\max} \le 1$:

$$\langle \hat{C} \rangle = M_{p\text{-tree}} f_{p\text{-tree}} + \sum_{\lambda \neq p\text{-tree}} M_\lambda f_\lambda \le M f_{\max} - M_{p\text{-tree}} (f_{\max} - f_{p\text{-tree}}), \qquad (6)$$

$$\langle \hat{C} \rangle \ge M_{p\text{-tree}} f_{p\text{-tree}} + (M - M_{p\text{-tree}}) f_{\min}. \qquad (7)$$

Combined with the lower bound on $M_{p\text{-tree}}$ of Supplemental Eq. 4, these inequalities bound $\langle \hat{C} \rangle$ on both sides. Using the enumeration of subgraphs from [22], $f_{\min,p=2} = 0.4257$ and $f_{\max,p=2} = 0.8771$. Thus, for p = 2 and 3-regular QAOA, where $M_{2\text{-tree}} \ge M - 40$, the performance can be bounded to be between

$$M f_{2\text{-tree}} - 40\,(f_{2\text{-tree}} - f_{\min}) \le \langle \hat{C} \rangle \le M f_{2\text{-tree}} + 40\,(f_{\max} - f_{2\text{-tree}}).$$

Therefore, for N large, the value of $\langle \hat{C} \rangle$ is bounded from above and below by a constant amount, and the cut fraction converges to the tree value. Similarly, for N small, the cut fraction is bounded only between 0 and 1, as WHP every edge participates in at least one cycle of length l ≤ 2p + 1 and so there are no tree subgraphs to contribute to the count. In principle, these bounds may be tightened by including expectation values $f_\lambda$ for more subgraphs. This is a 'with high probability' result: there may be extremely atypical graphs that have much different numbers of tree and single-cycle subgraphs. For example, an atypical graph may be one of size N that is two graphs of size N/2 connected by a single edge. This bound is a generalization of the work of [12], which observes that the QAOA needs to 'see the whole graph' in order to have advantage. Here the upper and lower bounds are based on the same argument, except generalized to the small-N regime.
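A small numerical sketch shows how such bounds tighten with N for p = 2. It assumes that Eqs. (6) and (7) are evaluated at the smallest allowed tree count, $M_{2\text{-tree}} = \max(0, M - 40)$, with M = 3N/2 edges, and it uses the approximate tree value 0.756 quoted in the text. The function name and this way of combining the inequalities are assumptions made for illustration.

```python
F_TREE_P2 = 0.756        # approximate p = 2 tree value quoted in the text
F_MIN_P2 = 0.4257        # smallest p = 2 subgraph expectation value
F_MAX_P2 = 0.8771        # largest p = 2 subgraph expectation value
MODIFIED_EDGES_P2 = 40   # from M_2-tree >= M - 40


def cut_fraction_bounds(n):
    """WHP lower and upper bounds on the p = 2 cut fraction of an n-vertex 3-regular graph."""
    m = 3 * n // 2                             # number of edges
    m_tree = max(0, m - MODIFIED_EDGES_P2)     # guaranteed number of tree subgraphs
    lower = (m_tree * F_TREE_P2 + (m - m_tree) * F_MIN_P2) / m
    upper = (m_tree * F_TREE_P2 + (m - m_tree) * F_MAX_P2) / m
    return lower, upper


for n in (16, 64, 256, 4096):
    lo, hi = cut_fraction_bounds(n)
    print(f"N={n}: {lo:.4f} <= cut fraction <= {hi:.4f}")
```

For small N no tree subgraphs are guaranteed and the bounds reduce to the extremal subgraph values, while for large N both bounds approach the tree value.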

D. Experimental validation of standard deviation scaling
We verify numerically the ∝ $1/\sqrt{N}$ scaling of the standard deviation using two approaches. Besides validating the theoretical results, these calculations allow us to find the scaling coefficient to use in Equation 7.
The first approach is to calculate multiple probability amplitudes for each graph and estimate the variation by drawing a large number of samples from the distribution. This approach is feasible for graphs of small size and large p since it requires calculating only a subset of probability amplitudes. For size N < 30 any p is feasible since it is possible to fit the full state vector in memory. The second approach is to construct a tensor network for the observable $C^2$, then calculate the variance using $V = \langle C^2 \rangle - \langle C \rangle^2$. The quadratic observable has to be calculated for each pair of edges and introduces ∝ $N^2$ tensor networks. This approach yields the exact value of the standard deviation without sampling error. Additionally, it allows us to apply the lightcone optimization, which reduces computational cost for large graphs. However, the complexity grows rapidly with p, and we observe that only p = 2, 3 are feasible on our hardware.
We can see that for N ≤ 26 there is good agreement with theoretical predictions across different values of p. We use these approximate small-N results to find the scaling coefficient for each p. This will be used in the rest of the paper to estimate the standard deviation for arbitrary N and fixed p. In particular, Figure 5 uses these estimations to obtain the number of multishot QAOA samples.
To further verify the validity of our prediction, we use exact calculations of variance via QTensor.These values are shown as crosses on Supplementary Fig. 2 and are

FIG. 1. Locus of quantum advantage over classical algorithms. A particular classical algorithm may return some solution to some ensemble of problems in time $T_C$ (horizontal axis) with some quality $C_C$ (vertical axis). Similarly, a quantum algorithm may return a different solution sampled in time $T_Q$, which may be faster (right) or slower (left) than classical, with a better (top) or worse (bottom) quality than classical. If QAOA returns better solutions faster than the classical, then there is clear advantage (top right), and conversely no advantage for worse solutions slower than the classical (bottom left).
FIG. 2. Time required for a single-shot QAOA to match classical MaxCut algorithms. The blue line shows time for comparing with the Gurobi solver and using p = 11; the yellow line shows comparison with the FLIP algorithm and p = 6. Each quantum device that runs MaxCut QAOA can be represented on this plot as a point, where the x-axis is the number of qubits and the y-axis is the time to solution. For any QAOA depth p, the quantum device should return at least one bitstring faster than the y-value on this plot.

FIG. 3. Zero-time performance for graphs of different size N. The y-value is the cut fraction obtained by running the corresponding algorithms for the minimum possible time. This corresponds to the y-value of the star marker in Fig. 4. Dashed lines show the expected QAOA performance for p = 11 (blue) and p = 6 (yellow). QAOA can outperform the FLIP algorithm at depth p > 6, while for Gurobi it needs p > 11. Note that in order to claim advantage, QAOA has to provide the zero-time solutions faster than FLIP or Gurobi does. These times are shown in Fig. 2.
FIG. 5. Sampling frequency required to achieve MaxCut advantage using QAOA p = 11. The shaded area around the solid lines corresponds to 90-10 percentiles over 100 seeds for Gurobi and 20 seeds for BURER2002. The background shading represents comparison of a quantum computer with the BURER2002 solver, corresponding to modes in Fig. 1. Each quantum device can be represented on this plot as a point, where the x-axis is the number of qubits and the y-axis is the time to solution. Depending on the region where the point lands, there are different results of comparisons. QAOA becomes inefficient for large N, when the sampling frequency starts to grow exponentially with N.
Supplementary Fig. 1. Counting types of subgraphs on sparse cycles to find upper and lower limits on QAOA expectation values. The presence of a finite number of cycles in an infinitely large graph slightly modifies the value of the QAOA expectation value by modifying the local subgraphs. As shown above, the edges that are modified as a part of each cycle for p ≤ 3 are shown in black; vertices that connect to the rest of the graph are shown in red. Edge labels refer to subgraph indexing in [22].

Supplementary Fig. 2. Dependence of standard deviation of MaxCut cut fraction on N and p for random 3-regular graphs. Circle markers represent approximate evaluations over 1000 samples per graph and 20 graphs for each size N. The dashed line shows a fit to the approximate data using the ∝ $1/\sqrt{N}$ scaling. Cross markers show exact standard deviation values for larger N and p = 3 with one graph per each N. These values were obtained using tensor network contraction via QTensor. The bold cross marker at N = 256 is also an exact value of the QAOA standard deviation, at the size corresponding to the N used in Figure 4. Note that this plot has a log-log scale.