Analog Errors in Quantum Annealing: Doom and Hope

Quantum annealing has the potential to provide a speedup over classical algorithms in solving optimization problems. Just as for any other quantum device, suppressing Hamiltonian control errors will be necessary before quantum annealers can achieve speedups. Such analog control errors are known to lead to $J$-chaos, wherein the probability of obtaining the optimal solution, encoded as the ground state of the intended Hamiltonian, varies widely depending on the control error. Here, we show that $J$-chaos causes a catastrophic failure of quantum annealing, in that the scaling of the time-to-solution metric becomes worse than that of a deterministic (exhaustive) classical solver. We demonstrate this empirically using random Ising spin glass problems run on the two latest generations of the D-Wave quantum annealers. We then proceed to show that this doomsday scenario can be mitigated using a simple error suppression and correction scheme known as quantum annealing correction (QAC). By using QAC, the time-to-solution scaling of the same D-Wave devices is improved to below that of the classical upper bound, thus restoring hope in the speedup prospects of quantum annealing.


I. Introduction
The demonstration of scaling speedups [1,2] using quantum hardware is the holy grail of quantum computing, and massive efforts are underway worldwide in their pursuit.A daunting obstacle is the fact that all physical implementations of quantum computers suffer from analog control errors, in which the coefficients of the Hamiltonian implemented differ from those intended (e.g., [3]), a fact that threatens to spoil the results of computations due to the accumulation of small errors.This problem was recognized early on in the gate model of quantum computing [4], and soon after theoretically dealt with by error discretization via quantum error correcting codes [5].Moreover, the accuracy threshold theorem guarantees that if the physical gates used to implement encoded, error-corrected quantum circuits have sufficiently high fidelity, then any real, noisy quantum computation can be made arbitrarily close to the intended, noiseless computation with modest overhead [6][7][8].In contrast, the ultimate impact of analog control errors in Hamiltonian quantum computing, in particular in adiabatic quantum optimization [9] and quantum annealing (QA) [10], is not as clear.While unlike the gate model adiabatic quantum evolution is inherently robust to path variations due to unitary control errors [11], there is as of yet no equivalent mechanism of error discretization or an analogous accuracy threshold theorem in this paradigm.Yet, at the same time quantum annealing offers a currently unparalleled opportunity to explore NISQ-era [12] quantum optimization with thousands of qubits [13,14].It is thus of great importance to assess the role of analog control errors in QA, and to find ways to mitigate them.Here we do so in the context of spin glass problems, which are known to exhibit a type of control-error induced bond or disorder chaos already in a purely classical setting, causing chaotic changes in the ground or equilibrium state [15,16].
The extent to which analog control errors present a challenge in using any physical realization of quantum annealing, such as the D-Wave processors [17][18][19], for optimization, cannot be overstated.Indeed, an earlier study of such errors in these processors found evidence of sub-classical performance and referred to the effect as J-chaos [20], a terminology we adopt here.More recently it was shown that analog control noise causes a decrease in the probability that the implemented Hamiltonian shares a ground state with the intended Hamiltonian that scales exponentially in the size of the problem and the magnitude of the noise [21].This means that even if the annealer solves the implemented problem correctly, it has an exponentially shrinking probability of finding the intended ground state.In other words, subject to J-chaos an otherwise perfectly functioning quantum annealer will typically find the correct answer to the wrong problem.
To mitigate this "wrong Hamiltonian" problem and restore the prospects for a speedup in the use of quantum annealing for optimization, it is necessary to introduce techniques for error suppression and correction.This observation is not new [22][23][24], and repetition coding along with the use of energy penalties has been shown to significantly enhance the performance of quantum annealers [25][26][27][28][29][30].In contrast to previous work, here, for the first time, we directly address the impact of J-chaos on algorithmic scaling of optimization in experimental quantum annealing, while accessing a computational scale that is still far out of reach of current gate-model quantum computing devices.We employ a quantum annealing correction method to mitigate the problem, and demonstrate that while the scaling of uncorrected quantum annealing in solving random Ising spin glass problems is catastrophically affected by J-chaos -in that it is worse than even that of a deterministic (brute force) classical solver -hope for a quantum speedup [31] is restored with error suppression and correction.This reassuring conclusion is reached here using the simplest possible error suppression and correction scheme [25], so that much room for improvement remains for more advanced methods.We expect our results to apply broadly, certainly beyond the D-Wave devices to other quantum [32][33][34] and semiclassical annealing implementations [35,36], and to other forms of analog quantum computing [37].

II. Results
Inspired by classical simulated annealing, in which thermal fluctuations are used to hop over barriers, quantum annealing uses quantum fluctuations to tunnel through barriers [10,[38][39][40][41][42][43].The D-Wave processors are physical implementations of such devices [44].In the standard forward annealing protocol they apply a time-dependent transverse field H X = i σ x i (σ a i denotes the Pauli matrix of type a ∈ {x, y, z} applied to qubit i ∈ {1, . . ., N }) and Ising Hamiltonian as follows, where HIsing = H Ising + δH Ising and H Ising are, respectively, the implemented (perturbed) and intended (unperturbed) "problem" Hamiltonians, while δH Ising is an error term (the perturbation).Thus HIsing is the (wrong) Hamiltonian including analog control errors while H Ising is the Hamiltonian whose ground state we wish to find as the solution to the optimization problem specified by a set of local fields {h i } and couplers {J ij }: where V and E are the vertex and edge sets of the graph G, and N = |G|.We assume that the noise is Gaussian with zero mean and standard deviation η: We note that analog errors resulting in the replacement of H X by i (1 + i )σ x i with random i are expected as well, but we do not consider such transverse field errors here.The normalized time, s = t/t f , with t f denoting the final time, increases from 0 to 1, with A(0) B(0) and B(1) A (1), and A(s) [B(s)] decrease (increase) monotonically.As such, the transverse field initially drives strong quantum fluctuations that eventually give way to the implemented Ising Hamiltonian.In the absence of δH Ising and an environment the adiabatic theorem guarantees that the ground state of H Ising will be found if t f is large compared to the inverse of the minimum gap of H(s) and the maximum time-derivative(s) of H(s) [45,46].In the presence of an environment the adiabatic theorem instead guarantees evolution towards the steady state of the corresponding Liouvillian [47,48], which becomes the ground state only for sufficiently low temperature (compared to the gap of the Liouvillian).When δH Ising = 0, the probability that the computation ends in the ground state of H Ising decreases exponentially in both N and some power of η [21].The reason is that as more noise is added, there is an increasing probability that the spectrum changes such that the ground state is swapped with an excited state.Thus, increasing noise and problem size leads to a rapidly growing probability of failure to find the correct ground state.In experimental quantum annealing both the environment and control errors inevitably play a role.

A. Effect of control noise
In order to systematically test the effect of analog control errors we studied the performance of two D-Wave devices on random Ising instances of varying size N , to which we added artificially generated Gaussian control noise η.This noise was added to the intrinsic analog device noise η int [49], so that the total control noise had variance η 2 int + η 2 .Adding noise in this manner allowed us to test its effect on algorithmic performance, and the efficacy of the quantum annealing correction (QAC) strategy described below.
Throughout this work we used Ising instances with local fields h i = 0 and couplers J ij selected uniformly at random from the set ±{1/6, 1/3, 1/2}, and chose η ∈ {0, 0.03, 0.05, 0.07, 0.10, 0.15}.Such instances have been studied before [26,31,[50][51][52], but only with η = 0, i.e., never subject to the systematic addition of control noise.We define "success" in all cases as finding a ground state of the unperturbed Hamiltonian H Ising .See Methods Section A for a complete description of the instances and how we verified ground states.
The number of qubits N is proportional to L 2 , the number of Chimera graph unit cells of the D-Wave devices we used (see Methods Section B). Figure 1 displays a series of correlation plots between different levels of added noise, for different problem sizes parametrized by L. At every size, addition of noise results in a lower success probability for all instances.Increased size also results in lower success probability, as expected.Thus, Fig. 1 gives a visual confirmation of the detrimental effect of control noise; we quantify this systematically below.
Next we test whether control noise results in J-chaos.The latter exhibits itself as large variations in the success probability of the programmed Hamiltonian across different runs.We quantify this in terms of the J-chaoticity measure σ/µ, where σ is the standard deviation of the success probability across repeated runs of a given instance, and µ is the corresponding mean (see Methods Section C for more details).In Fig. 2, we plot the correlation of the J-chaoticity measure with increasing noise.For most instances this quantity becomes larger with increasing noise and size, which indicates that they are becoming more chaotic.Success probability is also strongly (negatively) correlated with increasing J-chaoticity, as shown in Fig. 3.This establishes that control-noise induced J-chaos is responsible for a strong decline in performance.Before we quantify this decline in terms of the time-to-solution metric, we first address how to mitigate this problem using error suppression and correction.

B. Quantum Annealing Correction
The error suppression and correction scheme used in this work is the [3,1,3] 1 QAC code introduced in [25] and further studied experimentally in [26][27][28].We refer the reader to these references for details, and to Methods Section D for a brief summary.The [3,1,3] 1 code is a three-qubit repetition code that corrects bit-flip errors, with the subscript denoting one extra penalty qubit.The penalty term energetically suppresses all errors that do not commute with σ z during the anneal.
Since the [3,1,3] 1 code graph is a minor of the Chimera graph (see Methods Fig. 9), we can also implement these instances without QAC.But, to ensure a fair comparison we need to equalize the resources used with QAC and without it.The [3,1,3] 1 code consumes four qubits to encode one logical qubit.Thus, we can use the same amount of resources as the encoded logical problem by running four unencoded copies in parallel, which is called the classical repetition strategy (C).To be clear, the difference between QAC and the C strategy is fourfold: QAC uses logical qubits, logical operators, and an energy penalty term, while C uses physical qubits, physical operators, and no penalty.The decoding strategy for QAC is a majority vote over the three data qubits of each logical qubit, while for C it is best-of-four-copies of the logical problem solved by QAC.A more powerful, nested QAC strategy is known [29], but it requires more physical qubits per logical qubit, and hence is less suitable for a scaling analysis of the type we perform here.
We now discuss the results after the application of QAC and compare them to the C strategy.Fig. 4 illustrates that for relatively large problem sizes and strong added control noise, such as at L = 14 and η = 0.07, QAC is able to find nearly all ground states while the C strategy only finds the ground state of a small fraction of instances.
More systematically, we show in Fig. 5(a) the fraction of instances where using QAC improved success probability when compared to the C strategy.If more than half of the instances exhibit better performance for the QAC strategy, applying it is useful for median instances.Evidently, QAC becomes a better strategy for large size and large noise, i.e., as finding the ground state becomes harder.Similarly, we compare the J-chaoticity measure σ/µ for QAC and C, as seen in Fig. 5(b).Just as in Fig. 5(a), we see greater advantage using QAC over C with increasing size and noise.These observations are consistent with earlier results [25][26][27][28] where the [3,1,3] 1 code performs better  1, but here we compare the J-chaoticity measure of instances with different added noise values.The measure increases as we go from the no added noise case η = 0 to η = 0.05 (top left).The same trend persists as we compare η = 0.05 to η = 0.10 (top right) and η = 0.10 to η = 0.15 (bottom left).The bottom right panel shows the extreme comparison of η = 0 to η = 0.15, where for all instances at all sizes J-chaoticity is higher to well within the 95% C.I.s We plot the success probability and J-chaoticity measure with noise η = 0.03.There is evidently a strong correlation between the two, showing that the success probability is lowered due to J-chaos.The inset shows the Pearson correlation coefficient ρ between success probability and Jchaoticity, as a function of added noise, for all η values tested.
As η grows, success probability and J-chaoticity become more negatively correlated.than the classical repetition strategy at large problem sizes.Here, we have shown that this is also true in the high control noise and J-chaos regime.
To understand the source of the improvement in the success probability, consider that application of repetition codes can decrease the effective noise on the encoded problem [22].In particular, the encoded operators of an n-qubit repetition code have an effective energy scale that scales extensively relative to the unencoded problem (see Methods Section D), while the random control noise adds up incoherently and hence its energy contribution only scales up by a factor of √ n.Thus, the encoding In (a), we plot the fraction of instances where QAC found higher success probability than the C strategy after removing any instances where both failed completely (see Appendix B 1).In (b), we plot the fraction of instances where QAC lowered the J-chaoticity measure when compared to the C strategy.QAC becomes better with increasing size.At large noise, QAC becomes the better strategy for all instances.We note that here and the other figures below, we have omitted the η = 0 results for L ≥ 13.The reason is a discontinuity between the DW2X and DW2000Q devices which is discussed in detail in Appendix C, and which is unrelated to the scaling analysis that forms the main focus of this work.
reduces the effective noise of the problem by the factor 1/ √ n.This enhances the success probability of the encoded problem over the simple C strategy, essentially by leveraging just classical properties of the repetition code.However, there is a quantum mechanism at work as well: a mean-field analysis reveals that the penalty term reduces the tunneling barrier width and height in the QAC case [53,54].Indeed, we shall next see that our empirical results are inconsistent with the constant success probability enhancement that would be expected FIG. 6. TTS for QAC and the C strategy, sorted by added noise.We show the TTS to find at least one ground state of the median instance of class (L, η).In (a), we show the results from the classical repetition strategy.For large instances L ≥ 10 and high noise η = 0.15, the C strategy fails to find any ground state for the median instance in our data set.In (b), we show the results for QAC, which is always able to find the ground state of the median instance in our data set.The scaling in (b) is milder compared to (a).For all η > 0, a missing data point indicates that the ground state was never found at that L value, e.g., for all L ≥ 14 in (a).from a purely classical reduction of the effective noise by the factor 1/ √ 3 (we use an n = 3 repetition code).

C. Scaling of the time-to-solution
We now discuss the impact of the analog noise on the computational effort required to find a solution of the problem.This can be quantified by the time-to-solution (TTS) metric [31], which is the number of runs required to obtain the correct ground state at least once with 99% success probability: where t f is the total anneal time per run, and P g is the probability of finding the ground state in a given run (see Methods Section C for details on how P g was computed).
The TTS metric is often used for benchmarking quantum annealing against classical algorithms (e.g., [55][56][57][58][59]).The TTS metric gives accurate scaling with problem size only when t f is optimized to minimize the TTS for each size [13,31].In our experiments, we used a fixed t f = 5 µs, and hence these results only place a lower bound on the true scaling [55], but this is sufficient for our purposes.Since our anneal time was fixed, we actually report the number of runs R = TTS/t f , which we still refer to as TTS.Note that R ≥ 1 as one needs to run the annealer at least once to find the correct ground state.
In Fig. 6, we show the TTS required to find the ground state for the median instances at each size, for both C and QAC, sorted by different levels of added noise η.As expected for spin glasses, the TTS scales at least exponentially in L for both cases, with the scaling becoming worse for larger η.However, we note that QAC exhibits both milder scaling and lower absolute effort.The same conclusion holds when we sort instances by hardness (see Appendix B 2).To more directly see the advantage of QAC over C, we plot the speedup ratio [55] in Fig. 7.This plot clearly shows the scaling advantage of QAC over C for sufficiently large size L and added noise η: for all added noise levels η, the slope of the speedup ratio becomes positive beyond an initial transient at small sizes L, and this happens sooner the larger η is.

D. Data collapse and scaling: doom vs hope
We have seen that QAC outperforms the C strategy.But what is the worst-case classical cost of solving the same Ising problem instances?For a generalized Chimera graph of L×L unit cells of complete bipartite graphs K r,r , the tree-width is w = rL + 1; for the D-Wave devices used here r = 4. Dynamic programming takes time O(L 2 2 w ) to find the ground state of any Ising problem defined on such a graph [59,60].Here 2 w is the dimension of the exhaustive search space for each of the L 2 tree nodes of width w.Thus in the present case any problem can be solved exactly, deterministically, in time TTS DP = O(L 2 2 4L ).However, adding analog errors exponentially suppresses the probability of success.Specifically, if the errors are drawn from a Gaussian distribution with standard deviation η on an instance with N spins, then P g = O(e −η α N ), where α ≤ 1 depends on the problem class [21].Thus, to find the ground state of the intended Hamiltonian H Ising , running dynamic programming on the Ising instances with noise added is expected to take a time scaling as TTS DP × ln(1−0.99)ln(1−Pg) , which reduces, in the limit P g 1, to: since the dynamic programming algorithm is only presented the intended Ising instance once every 1/P g times on average.Thus the worst case classical cost is asymptotically O(e 8η α L 2 ).Note that this scaling is determined not by the intrinsic performance of the DP algorithm, but by the probability P g that it is presented with the intended Hamiltonian, which is algorithm-independent (and is, of course, not a problem an algorithm running on classical digital computers would need to suffer from).A random guess would find the intended ground state with the same asymptotic scaling: TTS rand /P g = 2 N e 8η α L 2 = O(e c rand L 2 ), but with a larger exponent: To compare the D-Wave device's TTS scaling to this form, we attempted a data collapse of the results shown in Fig. 6, in order to include the η dependence in the scaling function.To achieve the data collapse we ran a comprehensive search for functions f (L, η) that would collapse both the C and QAC data using as few fitting parameters as possible (see Methods Section E for details of the procedure).A natural choice for such a function is a generalization of Eq. ( 5) with up to five free parameters, of the form TTS = aL 2 10 bL+c(η 2 +d 2 ) e L 2 .However, we found it to perform poorly (see Appendix B 3).Instead, we found that the four-parameter form where the crucial difference is the replacement of L 2 by L d in the exponential, works very well for both the C and QAC data (using three or fewer parameters gives poor  6)].The dashed line is the asymptotic scaling of the classical dynamic programming algorithm, for which log 10 (TTS) ∼ L 2 .The red/green region above/below this line is where the C/QAC data lies after the data collapse.These correspond, respectively, to a guaranteed slowdown and a possible (but not guaranteed) speedup.The blue and red solid are the fits derived from the data collapse, with parameters given in Table I.The shaded regions around the fitted lines represent the 95% C.I. fits as described in Appendix B 3.
agreement).The data collapse and fit results are shown in Fig. 8, and the fit parameters along with their 95% C.I. are given in Table I.The relatively tight error bounds are evidence of the quality of the data collapse.Surprisingly, we find that d > 2 for the C strategy, with high statistical confidence.This means that without error suppression, and even after using a majority vote among four copies of the problem, the performance of the quantum annealer is worse than that of a deterministic worst-case classical algorithm, for which d = 2. Hence the "doom" advertised in the title of this work.
Fortunately, not all is lost: this disturbing finding is mitigated by QAC.As seen in Fig. 8 and Table I, for QAC we obtain d < 2, again with high statistical confidence.This result restores the hope that a quantum annealer can eventually become competitive with classical optimization algorithms, but only after the incorporation of an error suppression and correction strategy such as QAC.

III. Discussion
It should be remarked that our results on optimization have no direct bearing on other tasks quantum annealers are potentially capable of speeding up, such as approximate optimization [61,62] and sampling [63][64][65][66][67]. Nor do our results address quantum annealing slowdowns due to small gaps [68][69][70][71], which may be addressed via other  6) after data collapse of the TTS scaling data shown in Fig. 6. "Upper" and "lower" refers to the 95% C.I. values of the parameters, calculated as explained in detail in Appendix B 3. Of particular note is the d parameter, which determines the asymptotic scaling.For QAC d < 2 while for C d > 2, with d = 2 being the scaling of an exhaustive classical solver.
To conclude, we have shown that QAC can reduce the detrimental effects of J-chaos on the performance of quantum annealers.In the regime we tested, QAC becomes more effective the higher the noise is and the larger the problem size is.The improvements seen are distinctly greater than without error suppression and correction, even after equalizing resources in terms of total qubit count, in terms of both scaling and absolute effort.Moreover, QAC undoes a catastrophic loss to an exhaustive classical algorithm by improving the scaling of the annealer's TTS to below the classical upper bound.Thus, we have demonstrated that QAC is not only an effective tool that can be used to improve current quantum annealing hardware, but that error suppression and correction are essential to ensure competitive performance against classical alternatives.Further improvements using more powerful error suppression and correction strategies than the simple one we explored here are certainly expected, and undoubtedly necessary, as ultimately only a fully fault-tolerant approach is expected to be effective in the asymptotic limit of large problem sizes.

A. Random Ising Instances
Without noise -The set of instances used were generated randomly on the [3, 1, 3] 1 code graph produced by the L × L Chimera graph for L ∈ {2, . . ., 12} on the D-Wave 2X and L ∈ {13, . . ., 16} on the D-Wave 2000Q.There were 100 instances at each graph size such that the local fields, h i , were 0 and the couplings J ij were drawn uniformly at random from the set ± 1 6 × {1, 2, 3}.We found the ground state energy of these logical instances via the Hamze-Freitas-Selby (HFS) algorithm [60,79] and parallel tempering with iso-energetic cluster (Houdayer) moves (PTICM) [80,81].By using both, we consistently found ground state energies lower than or just as low as those found by the D-Wave devices.In a few instances the latter found lower energies than HFS (one instance at L = 15 and five at L = 16), and these were confirmed as correct using PTICM.As such, we are confident that the ground state energies found were in fact correct.
With noise -We generated random numbers δJ ij ∼ N (0, η 2 ).If a modified coupler value Jij = J ij + δJ ij fell outside the experimentally allowed range [−1, 1], we truncated it to ±1.Since the largest coupling in our set was 0.5 and the largest noise had a standard deviation of η = 0.15, these truncated values were only used a handful of times in our entire data collection.

B. The D-Wave devices used
In this study, we used the D-Wave 2X (DW2X) annealer installed at the USC Information Sciences Institute and the D-Wave 2000Q (DW2000Q) annealer installed at the NASA Quantum Artificial Intelligence Laboratory (QuAIL).The qubits of the annealer occupy the vertices of the Chimera graph of size 12 × 12 for the DW2X and 16 × 16 for the DW2000Q (see Fig. 10 in Appendix A).The DW2X has 1098 functional qubits, leading to a [3, 1, 3] 1 code graph with 236 functional logical qubits and the DW2000Q has 2031 functional qubits, leading to a [3, 1, 3] 1 code graph with 504 functional logical qubits, as shown in Fig. 9.
The annealing time t f can be chosen in the range [5,2000]µs on the DW2X and [1,2000]µs on the DW2000Q.We used t f = 5µs, since this is the fastest time that could be used across both devices.
There were some differences between the structure and performance of these two devices.A discussion of these differences can be found in Appendix C.

C. Data analysis
We used a Bayesian bootstrap [82] over the underlying data (collected as described in Methods Section A) to compute the mean µ and the standard deviation σ of the success probabilities and their associated error bars.The ground state probability P g used in Eq. ( 4) is the same as µ.Consider g gauges where for each gauge i we find s i successful readouts out of the total M readouts.We used the Beta function β(s i , M − s i ) as our posterior probability distribution of success, i.e., it is our best guess of the distribution of the success probability, given the observation of s i successful hits in M attempts.To draw one sample of our bootstrap distribution, we did the following: 1. First, sample from each of the β-distributions.Let B i be a sample from the distribution β(s i , M − s i ) and let B = {B 1 , B 2 , . . ., B g }.

Then, sample a point from the g-dimensional uniform
Dirichlet distribution.This is a g-dimensional vector D.
3. The estimate of the success probability for this bootstrap sample is given by and the estimate of the standard deviation of this sample is the square root of the weighted variance of B where the weights are given by D, In Eqs. ( 7) and ( 8), we have used the fact that the samples of the uniform Dirichlet distribution sum to 1: From these two quantities, we computed σ/µ.Other quantities of interest can be similarly derived from a combination of B and D.
We repeated these steps a large number of times to obtain a bootstrap distribution over our quantity of interest (in this case, µ and σ/µ).Our best estimate of the quantity and its associated error bars are given by the mean and the spread of the bootstrap distribution respectively.
Data for C was collected as follows.For each instance and each noise value, we ran the annealer with 5 random gauges [83,84] with 10, 000 readouts each.If p is the success probability of one unencoded copy and each copy is statistically independent, then the C strategy will have at least one successful copy with probability 1 − (1 − p) 4 .Here, we only collected data for a single copy, and then used this combinatorial formula to get an estimate on the success probability of the classical repetition case, as in earlier work [26].In fact, due to crosstalk this provides an upper bound on the actual performance of the C strategy (see Appendix C 2), so that our results favor QAC over C even more than our plots indicate.
Data for QAC (see Section D) was collected as follows.Every data collection run for problem instance i of size L can be labeled by two additional parameters; the strength of artificial injected noise η and the strength of the penalty value γ.The penalty strength was chosen from the set γ ∈ {0.1, 0.2, . . ., 0.5}.For each data collection run (i, L, η, γ), we ran the annealer with 5 random gauges with 10, 000 readouts each, for a total of 50, 000 readouts.From this data we estimated the mean success probability over the gauges, P g (i, L, η, γ).The optimal penalty value maximizes the success probability within the chosen range of penalty values.The results shown in Section II were picked to be at this optimal value for each instance.Histograms of the optimal strengths for each problem size are shown in Appendix D.

D. Quantum annealing correction
In QAC we encode each logical Pauli-Z operator as a sum of n such physical operators, i.e., Furthermore, we add a term to ferromagnetically couple the physical copies through an auxiliary qubit, i.e., where N is the number of logical variables in the original optimization problem.We refer to the physical copies, σ z i l , as data qubits and the auxiliary qubit, σ z ip , as a penalty qubit.Thus, we arrive at the following encoding of our logical problem: where H Ising is the encoded version of H Ising in Eq. (2a), i.e., σ z i → σ z i and σ z i σ z j → σ z i σ z j , and α is an overall energy scale for the problem Hamiltonian (not used in this work, but complementary to adding control noise [29]).When we add control noise to the QAC Hamiltonian, we replace h i by hi = h i + δh i and J ij by Jij = J ij + δJ ij , with the noise satisfying Eq. (3).
The current generations of D-Wave devices allow a direct implementation of this code in the Ising Hamiltonian for n = 3, as shown in Fig. 9, but are unable to encode the driver Hamiltonian H X , as this requires many-body terms of the form (σ x ) ⊗n .Thus, increasing the penalty strength, γ, begins to diminish the effect of the quantum fluctuations that drive quantum annealing.On the other hand, larger γ values are more able to suppress bit flip errors.Thus, there exists an optimal value of γ which depends on the spectrum of the problem instance [25][26][27][28].This optimization is further discussed in Methods Section A.
After annealing, we obtain a state vector where each data qubit is measured in the computational basis.From this, we can obtain a state vector of logical qubits via a variety of decoding strategies [28].In this work, we exclusively used the strategy in which each logical qubit is decoded by a majority vote of its constituent data qubits.

E. Data collapse
Here we explain our procedure for identifying the optimal fit and data collapse function, and for extracting confidence intervals (C.I.'s) and error bars.We considered trial TTS functions of the form f (L, η) = 10 gi(L,η) , with: For g 3 we focused on the three cases Thus our trial functions had either four ({a, b, c, d}) or five ({a, b, c, d, e}) free fitting parameters.For each trial function we computed non-linear least-square fits to the median TTS data for C on L ∈ {2, . . ., 12}, and QAC on L ∈ {2, . . ., 16}.The fitting parameters were initially allowed to take any values.However, we only accepted fits with a ≥ 0 in g 3 , since a < 0 (scaling that decreases with L) would have to reflect overfitting.Thus, we also computed fits where we squared all the fitting parameters [i.e., replaced a by a 2 in Eq. ( 13), etc.] in order to enforce positivity.Furthermore, we tested if the discrepancy between the ideal and actual number of Chimera graph couplers made a difference by fitting with an effective L; see Appendix C for details.Thus, for each trial function there were four different methods: unconstrained/squared fitting parameters with L/effective L. Lastly, all fits were attempted with each of the optimization methods possible in Mathematica: SimulatedAnnealing, RandomSearch, NelderMead, and DifferentialEvolution.
Across the different methods and optimization algorithms used, g 1 was consistently the best of the 4parameter fits and was always very close to g 2 , which is its 5-parameter generalization.The g 3 functions always resulted either in a < 0 or otherwise a very poor fit.Parameter squaring also improved the fit quality, and of the optimization methods only NelderMead tended to give inferior results.
After determining the three parameters {a, b, c} for the median TTS data for g 1 , we found least-squares fits to the upper and lower bounds determined by the 95% C.I.'s for the median TTS, by using the same set {a, b, c} and letting only d be a free parameter.In this manner we found d − and d + , the exponents that provide respective lower and upper bounds on d for the median TTS data.In turn, d − and d + have associated 95% C.I.'s, denoted ∆d − and ∆d + .The reported range of d in Table I is  Figure 12 shows the same data as in Fig. 6 when we combine all the noise realizations at each size L and plot different percentiles of hardness.QAC improves the scaling at all percentiles, from the easiest instances at the 10 th percentile to the hardest instances at the 90 th percentile.

Data collapse of TTS with error bars
Figure 13 shows the results of the fits computed for the data collapse procedure described in Methods Section E, including the error bars.
C. Difference Between the DW2X and DW2000Q

Coupler counts
Since the DW2X and DW2000Q are different generations of the D-Wave devices, differing in both structure (see Fig. 10) and noise characteristics, performance differences are to be expected.In particular, the DW2X used has 1098 functional qubits out of 1152, and the DW2000Q used has 2031 functional qubits out of 2048.This leads to differences in the logical problems embeddable on each device, as we now explain in detail.
The couplings in the logical graphs of the [3,1,3] 1 code differ between the two devices, as seen in Fig. 9. Without any holes (missing physical qubits), in the logical graph each unit cell would be reduced to two logical qubits.These logical qubits would have one coupling between them, contributing a total of L 2 couplings to the logical problem.Furthermore, each unit cell would have one coupling to the unit cell below it, contributing another L 2 , except that the last row of unit cells has no unit cells below it to connect to, so we have over-counted by L couplings.The same analysis applies to the couplings to the right of each unit cell, contributing another L 2 − L couplings.Thus, the total number of couplings in the ideal graph is L 2 + 2(L 2 − L) = L(3L − 2).However, each hole in the physical graph contributes to the holes in the logical graph, removing some number of active couplers.12. TTS for QAC and the C strategy, sorted by hardness.We show the number of runs R required to find at least one ground state of the different hardness class of these instances.Here, for each size L, we group together the TTS of all the instances at every noise realization.From this group, we pick the p th -hardest percentile instance with p ∈ {10, . . ., 90}.Then, we pick out the proper percentile instances, which now only depend on the system size L. In (a), we show the results from the C strategy.In (b), we show the results for QAC.The scaling in (b) is milder compared to (a), demonstrating that QAC decreases the number of runs required in presence of J-chaos.
The difference between the ideal and actual number of couplers is shown in Fig. 14(a).As can be seen, there is a sudden jump from the DW2X to the DW2000Q in terms of number of couplers.Since the problem instances used in this work involve adding noise to each coupler present, this implies that at equal (L, η) the problems solved on the DW2000Q are expected to be somewhat harder than on the DW2X.

The no added noise case
The results discussed in the main text excluded the η = 0 results for L ≥ 13.We now explain why, and provide an analysis focused on the performance in the case of no added noise.
As discussed in the main text in Section II A, there is an intrinsic level of control noise.When η = 0, only  We show the difference between the theoretical probability of finding the ground state at least once using three independent copies, and the actual result of using 3 physical copies of the same set of 100 problem instances on the DW2X for L = 13 and η = 0.
this noise plays a role.We show the TTS for both C and QAC in the η = 0 case in Fig. 14(b).As can be seen, there is a sudden change in C's performance when we switch from the DW2X to the DW2000Q at L = 13.This is consistent with there being less noise on the later generation machine, the DW2000Q.However, the latter also has a larger fraction of active qubits, which, as discussed Appendix C 1, yields a higher count of couplers involved in the problem instances for L ≥ 13 [Fig.14(a)].Since the physical implementation of QAC uses four times more couplers than the C strategy, QAC should be much more affected by noise due to this jump in coupler count.Thus, for C the lower intrinsic noise dominates, while the smooth behavior seen for QAC is likely due to a cancellation of the lower intrinsic noise with the higher noise due to the higher coupler count.This argument explains why C exhibits a discontinuity in its TTS between L = 12 and L = 13, and why QAC appears to transition smoothly from the DW2X to the DW2000Q.Furthermore, this difference in physical implementation also implies that QAC will be more sensitive to coupler cross talk effects than C. Indeed, the harmful effect of cross talk can be seen in Fig. 15, in which a theoretical 3-copies repetition code outperforms the physical implementation for every instance.Thus, the jump in coupler count from the DW2X to the DW2000Q would introduce significant cross talk that is not harming this implementation of C. The benefits of a less noisy machine are thus countered by the cross talk associated with the sharp increase in physical coupler count for QAC, while the idealized form of C simply benefits from less noise.

D. Optimal penalty strength
Fig. 16 shows histograms of the optimal penalty strength defined in Eq. ( 9).As can be seen, as we increase the amount of noise added, the optimal penalty strength increases.This ought to be expected, since a noisier problem will require more error suppression.The DW2000Q histograms (i.e., L ≥ 13) tend slightly more towards the larger penalty strengths once we have added enough noise.This is consistent with the discussion in Appendix C about why the instances on the DW2000Q have more couplings for a given grid size.FIG.16.Optimal penalty strength histograms.We show histograms of the optimal penalty strength γopt ∈ {0.1, . . ., 0.5} for the set of instances at each size, including the cases when no solution was found for any strength (denoted "fail").As can be seen in comparing (a) to (b), the noisier problems required stronger penalties.

2 FIG. 1 .
FIG. 1. Correlation plots of ground state (success) probabilityPg with increasing added noise η. Results for random Ising instances are shown after addition of coupler noise δJij ∼ N (0, η 2 ).In each panel every data point is Pg for the same instance with different η values.Top left: success probability decreases as we go from the no added noise case η = 0 to η = 0.05.Note that even for η = 0 there is an intrinsic control error ηint.The same trend persists as we compare η = 0.05 to η = 0.10 (top right) and η = 0.10 to η = 0.15 (bottom left).The bottom right panel shows the extreme comparison of η = 0 to η = 0.15: the ground state is almost never found for η = 0.15.In all plots success probability generally decreases with increasing size L. Instances with zero success probability are not shown, hence the number of data points drops with increasing η.Unless otherwise noted, here and in later plots error bars always denote 95% confidence intervals (C.I.) obtained via a bootstrap.Details of our data analysis are given in Methods Section C.

1 FIG. 2 .
FIG. 2. Correlation plots of J-chaoticity measure σ/µwith increasing added noise η.Details are as in Fig.1, but here we compare the J-chaoticity measure of instances with different added noise values.The measure increases as we go from the no added noise case η = 0 to η = 0.05 (top left).The same trend persists as we compare η = 0.05 to η = 0.10 (top right) and η = 0.10 to η = 0.15 (bottom left).The bottom right panel shows the extreme comparison of η = 0 to η = 0.15, where for all instances at all sizes J-chaoticity is higher to well within the 95% C.I.s

4 FIG. 3 .
FIG.3.Correlation between success probability and Jchaoticity.We plot the success probability and J-chaoticity measure with noise η = 0.03.There is evidently a strong correlation between the two, showing that the success probability is lowered due to J-chaos.The inset shows the Pearson correlation coefficient ρ between success probability and Jchaoticity, as a function of added noise, for all η values tested.As η grows, success probability and J-chaoticity become more negatively correlated.

FIG. 4 .
FIG.4.Sorted success probabilities for L = 14 and η = 0.07, all 100 instances.C is able to find the correct ground state for only 11 instances.Meanwhile, QAC is able to find the correct ground state for all but 2 instances and increases the mean success probability over C.

FIG. 5 .
FIG.5.Fraction of instances where QAC outperforms the C strategy.In (a), we plot the fraction of instances where QAC found higher success probability than the C strategy after removing any instances where both failed completely (see Appendix B 1).In (b), we plot the fraction of instances where QAC lowered the J-chaoticity measure when compared to the C strategy.QAC becomes better with increasing size.At large noise, QAC becomes the better strategy for all instances.We note that here and the other figures below, we have omitted the η = 0 results for L ≥ 13.The reason is a discontinuity between the DW2X and DW2000Q devices which is discussed in detail in Appendix C, and which is unrelated to the scaling analysis that forms the main focus of this work.

3 FIG. 7 .
FIG.7.Median speedup of QAC over C. We show the ratio of TTS of C and TTS of QAC, which will sit above the dashed line at speedup ratio = 1 for cases when it is advantageous to use QAC.Furthermore, a positive slope indicates a potential scaling speedup of QAC over C, since this occurs when the TTS of C is growing more quickly than of QAC.

FIG. 8 .
FIG. 8. Data collapse and fit.Result of the collapse of the data after fitting the TTS results shown in Fig. 6 to 10 a(η 2 +b 2 ) c L d[Eq.(6)].The dashed line is the asymptotic scaling of the classical dynamic programming algorithm, for which log 10 (TTS) ∼ L 2 .The red/green region above/below this line is where the C/QAC data lies after the data collapse.These correspond, respectively, to a guaranteed slowdown and a possible (but not guaranteed) speedup.The blue and red solid are the fits derived from the data collapse, with parameters given in TableI.The shaded regions around the fitted lines represent the 95% C.I. fits as described in Appendix B 3.

FIG. 9 .
FIG.9.The[3,1,3] 1 code and the resulting logical graphs of the DW2X and DW2000Q devices.(a) A logical qubit of the [3, 1, 3] 1 code is formed by connecting three qubits in one half of the unit cell to a single qubit on the other half of the unit cell.The three qubits are called the "data qubits" and the single qubit plays the role of a "penalty qubit".The data are tied together with the penalty qubit by ferromagnetic couplings to ensure consistent evolution of each of the three data qubits.Each unit cell now contains two logical qubits.(b) and (c) Each logical qubit connects to its neighbor in the same cell via the intra-cell couplings and connects to its horizontal or vertical neighbor via the inter-cell couplings.This construction gives rise to a graph of degree 3 which is a minor embedding of the Chimera graph.(b) is the logical graph for the DW2X, (c) for the DW2000Q.In both graphs, green (red) circle denote operational (inactive) logical qubits.Orange circles denote logical qubits with intact data qubits but an inactive penalty qubit.The lines denote the active couplings between the operational logical qubits.For each problem size L, we used a subgraph formed by taking an L × L square starting from the top left corner.

1 FIG. 11 .
FIG. 11.Fraction of instances where both C and QAC failed to find the ground state.All instances are solved by either QAC or C for L ≤ 10.The fraction of failures rises more rapidly as η grows from 0 to 0.15, and only at the extreme of L = 16, η = 0.15 are no instances solved by either strategy.
FIG.12.TTS for QAC and the C strategy, sorted by hardness.We show the number of runs R required to find at least one ground state of the different hardness class of these instances.Here, for each size L, we group together the TTS of all the instances at every noise realization.From this group, we pick the p th -hardest percentile instance with p ∈ {10, . . ., 90}.Then, we pick out the proper percentile instances, which now only depend on the system size L. In (a), we show the results from the C strategy.In (b), we show the results for QAC.The scaling in (b) is milder compared to (a), demonstrating that QAC decreases the number of runs required in presence of J-chaos.

FIG. 13 .
FIG. 13.Fits to the upper and lower error bounds of the median TTS data.Error bars denote 95% C.I.'s for the median TTS data (blue curves) at each value of added noise η.The fits shown are obtained by first performing a data collapse of the median data, then fitting f (L, η) = 10 a(η 2 +b 2 ) c L d and extracting {a, b, c, d}, then using the obtained {a, b, c} values to fit new exponents d+ and d− for the upper and lower error bars, respectively.The resulting fits f (L, η), with d replaced by d±, are shown in purple (upper bound) and green (lower bound).

FIG. 14 .
FIG.14.Effect of transition from DW2X to DW2000Q.(a) Coupler counts in the logical graph of the [3, 1, 3] 1 code.We show the ideal number of couplers in a [3, 1, 3] 1 code graph with L × L unit cells, and the actual number in the two devices used, with the vertical dashed line marking the transition from the DW2X to the DW2000Q.(b) Runs-to-solution for QAC and C strategy for η = 0. We show the number of runs R of the annealer required to find at least one ground state of the median instance of class (L, η = 0), for QAC and the C strategy.There is a sudden change in the C case for L = 13, due to the transition from the DW2X to the DW2000Q device.

FIG. 15 .
FIG.15.Cross Talk in Repetition Code.We show the difference between the theoretical probability of finding the ground state at least once using three independent copies, and the actual result of using 3 physical copies of the same set of 100 problem instances on the DW2X for L = 13 and η = 0.

TABLE I .
Fit parameters for Eq. (