Nested Quantum Annealing Correction

We present a general error-correcting scheme for quantum annealing that allows for the encoding of a logical qubit into an arbitrarily large number of physical qubits. Given any Ising model optimization problem, the encoding replaces each logical qubit by a complete graph of degree $C$, representing the distance of the error-correcting code. A subsequent minor-embedding step then implements the encoding on the underlying hardware graph of the quantum annealer. We demonstrate experimentally that the performance of a D-Wave Two quantum annealing device improves as $C$ grows. We show that the performance improvement can be interpreted as arising from an effective increase in the energy scale of the problem Hamiltonian, or equivalently, an effective reduction in the temperature at which the device operates. The number $C$ thus allows us to control the amount of protection against thermal and control errors, and in particular, to trade qubits for a lower effective temperature that scales as $C^{-\eta}$, with $\eta \leq 2$. This effective temperature reduction is an important step towards scalable quantum annealing.

Quantum annealing (QA) attempts to exploit quantum fluctuations to solve computational problems faster than is possible with classical computers [1][2][3][4][5][6][7]. As an approach designed to solve optimization problems, QA is a special case of adiabatic quantum computation (AQC) [8], a universal model of quantum computing [9][10][11][12]. In AQC, a system is designed to follow the instantaneous ground state of a time-dependent Hamiltonian whose final ground state encodes the solution to the problem of interest. This confers a degree of stability, since the system can thermally relax back to the ground state after an error, as well as resilience, since the presence of a finite energy gap suppresses thermal and dynamical excitations [13][14][15][16][17][18].
Despite this inherent robustness to certain forms of noise, AQC requires error correction to ensure scalability, just like any other form of quantum information processing [19]. Various error-correction proposals for AQC and QA have been made [20][21][22][23][24][25][26][27][28][29][30][31][32][33], but an accuracy-threshold theorem for AQC is not yet known, unlike in the circuit model (e.g., [34]). A direct AQC simulation of a fault-tolerant quantum circuit leads to many-body (high-weight) operators that are difficult to implement [23,24], or to a host of other problems [12]. Nevertheless, a scalable method to reduce the effective temperature would go a long way towards approaching the ideal of closed-system AQC, where only non-adiabatic transitions constitute the source of errors.
Motivated by the availability of commercial QA devices featuring hundreds of qubits [35][36][37][38], we focus on error correction for QA. There is a consensus that these devices are significantly and adversely affected by decoherence, noise, and control errors [39][40][41][42][43][44][45][46], which makes them particularly interesting for the study of tailored, practical error-correction techniques. Such techniques, known as quantum annealing correction (QAC) schemes, have already been experimentally shown to significantly improve the performance of quantum annealers [26,30,31,32], and theoretically analyzed using a mean-field approach [33]. However, these QAC schemes are not easily generalizable to arbitrary optimization problems, since they induce an encoded graph that is typically of a lower degree than the qubit-connectivity graph of the physical device. Moreover, they typically impose a fixed code distance, which limits their efficacy.
To overcome these limitations, here we present a family of error-correcting codes for QA, based on a "nesting" scheme, that has the following properties: (1) it can handle arbitrary Ising-model optimization problems, (2) it can be implemented on present-day QA hardware, and (3) it is capable of an effective temperature reduction controlled by the code distance. Our "nested quantum annealing correction" (NQAC) scheme thus provides a very general and practical tool for error correction in quantum optimization.
We test NQAC by studying antiferromagnetic complete graphs numerically, as well as on a D-Wave Two (DW2) processor featuring 504 flux qubits connected by 1427 tunable couplers implementing Ising interactions, arranged in a non-planar Chimera-graph lattice [47] (complete graphs were also studied for a spin glass model in Ref. [48]). We demonstrate that our encoding scheme yields a steady improvement in the probability of reaching the ground state as a function of the nesting level, even after minor-embedding the complete graph onto the physical graph of the quantum annealer. We also demonstrate that NQAC outperforms classical repetition-code schemes that use the same number of physical qubits.

I. QUANTUM ANNEALING AND ENCODING THE HAMILTONIAN
In QA the system undergoes an evolution governed by the time-dependent, transverse-field Ising Hamiltonian

$H(t) = A(t)\,H_X + B(t)\,H_P$, (1)

with respectively monotonically decreasing and increasing "annealing schedules" A(t) and B(t). The "driver Hamiltonian" $H_X = -\sum_i \sigma^x_i$ is a transverse field whose amplitude controls the tunneling rate. The solution to an optimization problem of interest is encoded in the ground state of the Ising problem Hamiltonian H_P, with

$H_P = \sum_{i \in \mathcal{V}} h_i \sigma^z_i + \sum_{(i,j) \in \mathcal{E}} J_{ij} \sigma^z_i \sigma^z_j$, (2)

where the sums run over the weighted vertices V and edges E of a graph G = (V, E), and σ^{x,z}_i denote the Pauli operators acting on qubit i. The D-Wave devices use an array of superconducting flux qubits to physically realize the system described in Eqs. (1) and (2) on a fixed "Chimera" graph (see Appendix D).

For closed systems, the adiabatic theorem [49,50] guarantees that if the system is initialized in the ground state of H(0) = A(0)H_X, a sufficiently slow evolution relative to the inverse minimum gap of H(t) will take the system with high probability to the ground state of the final Hamiltonian H(t_f) = B(t_f)H_P. Dynamical errors then arise due to diabatic transitions, but they can be made arbitrarily small via boundary cancellation methods that control the smoothness of A(t) and B(t), as long as the adiabatic condition is satisfied [51-53]. For open systems, specifically a system that is weakly coupled to a thermal environment, the final state is a mixed state ρ(t_f) that is close to the Gibbs state associated with H(t_f) if equilibration is reached throughout the annealing process [18,54,55]. In the adiabatic limit the open-system QA process is thus better viewed as a Gibbs-distribution sampler. The main goal of QAC is to suppress the associated thermal errors and restore the ability of QA to act as a ground-state solver. In addition, QAC should suppress errors due to noise-driven deviations in the specification of H_P [25].
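As a concrete illustration, the final-Hamiltonian energy of Eq. (2) for a given spin configuration can be evaluated classically. The helper below is a hypothetical sketch (the function name and data structures are ours, not the paper's):

```python
import numpy as np

def ising_energy(spins, h, J):
    """Evaluate H_P of Eq. (2) on a classical spin configuration.
    spins: array of +/-1 values; h: local fields; J: dict {(i, j): coupling}."""
    energy = np.dot(h, spins)
    for (i, j), Jij in J.items():
        energy += Jij * spins[i] * spins[j]
    return energy

# Fully antiferromagnetic K_4 (J_ij = 1, h_i = 0), as studied in the paper:
h = np.zeros(4)
J = {(i, j): 1.0 for i in range(4) for j in range(i + 1, 4)}
spins = np.array([1, 1, -1, -1])   # a balanced configuration
print(ising_energy(spins, h, J))   # -> -2.0, the ground-state energy
```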
Error correction is achieved in QAC by mapping the logical Hamiltonian H(t) to an appropriately chosen encoded Hamiltonian

$\bar{H}(t) = A(t)\,H_X + B(t)\,\bar{H}_P$, (3)

defined over a set of $\bar{N}$ physical qubits, larger than the number of logical qubits N = |V|. Note that $\bar{H}_P$ also includes penalty terms, as explained below. The logical ground state of H_P is extracted from the encoded system's state $\bar{\rho}(t_f)$ through an appropriate decoding procedure. A successful error-correction scheme should recover the logical ground state with a higher probability than a direct implementation of H_P, or than a classical repetition code using the same number of physical qubits $\bar{N}$. Due to practical limitations of current QA devices that prevent the encoding of H_X, only H_P is encoded in QAC.
In order to allow for the most general N-variable Ising optimization problem, we now define an encoding procedure for problem Hamiltonians H_P supported on a complete graph K_N. The first step of our construction involves a "nested" Hamiltonian $\tilde{H}_P$ that is defined by embedding the logical K_N into a larger K_{C×N}. The integer C is the "nesting level" and controls the amount of hardware resources (qubits, couplers, and local fields) used to represent the logical problem. $\tilde{H}_P$ is constructed as follows. Each logical qubit i (i = 1, ..., N) is represented by a C-tuple of encoded qubits (i, c), with c = 1, ..., C. The "nested" couplers $\tilde{J}_{(i,c),(j,c')}$ and local fields $\tilde{h}_{(i,c)}$ are then defined as follows:

$\tilde{J}_{(i,c),(j,c')} = J_{ij} \quad \forall c, c',\ i \neq j$, (4a)
$\tilde{h}_{(i,c)} = C\,h_i \quad \forall c$. (4b)

This construction is illustrated in the left column of Fig. 1. Each logical coupling J_ij has C^2 copies $\tilde{J}_{(i,c),(j,c')}$, thus boosting the energy scale at the encoded level by a factor of C^2. Each local field h_i has C copies $\tilde{h}_{(i,c)}$; the factor C in Eq. (4b) ensures that the energy boost is equalized with the couplers. For each logical qubit i, there are C(C-1)/2 ferromagnetic couplings $\tilde{J}_{(i,c),(i,c')}$ of strength γ > 0 (to be optimized), representing energy penalties that promote agreement among the C encoded qubits, i.e., that bind the C-tuple as a single logical qubit i.

FIG. 1. Illustration of the nesting scheme. In the left column, a C-level nested graph is constructed by embedding a K_N into a K_{C×N}, with N = 4 and C = 1 (top) and C = 4 (bottom). Red, thick couplers are energy penalties defined on the nested graph between the (i, c) nested copies of each logical qubit i. The right column shows the nested graphs after ME on the DW2 Chimera graph. Brown, thick couplers correspond to the ferromagnetic chains introduced in the process.
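The nested-problem construction of Eq. (4) can be sketched in a few lines. This is a hypothetical helper under our own naming conventions, not code from the paper:

```python
import itertools

def nest_problem(h, J, C, gamma):
    """Construct the level-C nested problem of Eq. (4) (sketch).
    Logical qubit i maps to the C encoded qubits (i, c); each logical
    coupling J_ij is copied C^2 times, each field h_i is boosted by C,
    and ferromagnetic penalties of strength gamma bind each C-tuple.
    Returns (h_nested, J_nested) keyed by encoded-qubit labels (i, c)."""
    h_nested = {(i, c): C * hi for i, hi in enumerate(h) for c in range(C)}
    J_nested = {}
    for (i, j), Jij in J.items():
        for c, cp in itertools.product(range(C), repeat=2):
            J_nested[((i, c), (j, cp))] = Jij        # C^2 copies of J_ij
    for i in range(len(h)):
        for c, cp in itertools.combinations(range(C), 2):
            J_nested[((i, c), (i, cp))] = -gamma     # energy penalties
    return h_nested, J_nested
```

For an antiferromagnetic K_4 at C = 2 this produces 8 encoded qubits, 6 × C^2 = 24 copies of the logical couplings, and 4 × C(C-1)/2 = 4 penalty couplers, as expected from the construction above.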
The second step of our construction is to implement the fully connected problem $\tilde{H}_P$ on given QA hardware, with a lower-degree qubit-connectivity graph. This requires a minor embedding (ME) [56-60]. The procedure involves replacing each qubit in $\tilde{H}_P$ by a ferromagnetically coupled chain of qubits, such that all couplings in $\tilde{H}_P$ are represented by inter-chain couplings. The intra-chain coupling represents another energy penalty that forces the chain qubits to behave as a single logical qubit. The physical Hamiltonian obtained after this ME step is the final encoded Hamiltonian $\bar{H}_P$. We can minor-embed a K_{C×N} nested graph on the Chimera graph by representing each qubit (i, c) as a physical chain of length L = CN/4 + 1 [56]. This is illustrated in the right column of Fig. 1. The number of physical qubits necessary for a ME of a K_{C×N} is N_phys = CNL = CN(CN/4 + 1).

FIG. 2. Experimental and numerical results for the antiferromagnetic K_4, after encoding, followed by ME and decoding. Left: DW2 success probabilities P_C(α) for eight nesting levels C. Increasing C generally increases P_C(α) at fixed α. Middle: rescaled P_C(αμ_C) data, exhibiting data collapse. Right: scaling of the energy boost μ_C vs the maximal energy boost μ_C^max, for both the DW2 and SQA. Purple circles: DW2 results. Blue stars: SQA for the case of no ME (i.e., for the problem defined directly over K_{C×N} and no coupler noise). Red up-triangles: SQA for the Choi ME [56] (for a full Chimera graph), with σ = 0.05 Gaussian noise on the couplings. Yellow right-triangles: SQA for the DW2 heuristic ME [57,58] (applied to a Chimera graph with 8 missing qubits), with σ = 0.05 Gaussian noise on the couplings. The flattening of μ_C suggests that the energy boost becomes less effective at large C. However, this can be remedied by increasing the number of SQA sweeps (see Appendix C), fixed here at 10^4. Thus the lines represent best fits to only the first four data points, with slopes 0.98, 0.91, 0.62, and 0.69, respectively. In all panels N_phys ∈ [8, 288].

At the end of a QA run implementing the encoded Hamiltonian $\bar{H}_P$ and a measurement of the physical qubits, a decoding procedure must be employed to recover the logical state. For the sake of simplicity we only consider majority-vote decoding over both the length-L chain of each encoded qubit (i, c) and the C encoded qubits comprising each logical qubit i (decoding over the length-L chain first, then over the C encoded qubits, does not affect performance; see Appendix A). The encoded and logical qubits can thus be viewed as forming repetition codes with, respectively, distance L and C. Other decoding strategies are possible wherein the encoded or logical qubits do not have this simple interpretation; e.g., energy-minimization decoding, which tends to outperform majority voting [31]. In the unlikely event of a tie, we assign a random value of +1 or -1 to the logical qubit.
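The two-stage majority-vote decoding described above can be sketched as follows (a minimal illustration; the function name and data layout are ours):

```python
import random

def decode(physical_spins):
    """Two-stage majority-vote decoding (sketch). physical_spins[i][c] is
    the length-L chain of physical spins for encoded qubit (i, c); we first
    vote within each chain, then over the C encoded copies of each logical
    qubit i. Ties are broken at random, as in the text."""
    def vote(values):
        s = sum(values)
        return 1 if s > 0 else -1 if s < 0 else random.choice([1, -1])
    return [vote([vote(chain) for chain in chains]) for chains in physical_spins]
```

For example, `decode([[[1, 1, -1], [1, 1, 1]]])` recovers the logical value +1 for a single logical qubit with C = 2 copies and chains of length L = 3, despite one flipped physical spin.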

II. RESULTS
Free energy - Using a mean-field analysis similar to the approach pursued in Ref. [33], we can compute the partition function associated with the nested Hamiltonian $A(t)H_X + B(t)\tilde{H}_P$ for the case with uniform antiferromagnetic couplings. This leads to the following free-energy density in the low-temperature and thermodynamic limits (see Appendix B):

$\beta f(m) = C^2 \beta \left[ B(t) J \lambda m^2 - \sqrt{\left(A(t)/C\right)^2 + \left(2 B(t) J \lambda m\right)^2} \right]$, (5)

where m is the mean-field magnetization. There are two key noteworthy aspects of this result. First, the driver term is rescaled as A(t) → C^{-1} A(t). This shifts the crossing between the A and B annealing schedules to an earlier point in the evolution, and is related to the fact that QAC encodes only the problem Hamiltonian term, proportional to B(t).
Consequently the quantum critical point moves to an earlier point in the evolution, which benefits QAC since the effective energy scale at this new point is higher [33]. Second, the inverse temperature is rescaled as β → C^2 β. This corresponds to an effective temperature reduction by C^2, a manifestly beneficial effect. The same conclusion, of a lower effective temperature, is reached by studying the numerically computed success probability associated with thermal distributions (see Appendix C). We shall demonstrate that this prediction is borne out by our experimental results, though it is masked to some extent by complications arising from the ME and noise.
NQAC results - The hardness of an Ising optimization problem implemented on a QA device is controlled by its size N as well as by an overall energy scale α [42]. The smaller this energy scale, the higher the effective temperature and the more susceptible QA becomes to (dynamical and thermal) excitations out of the ground state and to misspecification noise on the problem Hamiltonian. This provides us with an opportunity to test NQAC. Since in our experiments we were limited by the largest complete graph that can be embedded on the DW2 device, a K_32 (see Appendix D for details), we tuned the hardness of a problem by studying the performance of NQAC as a function of α via H_P → αH_P, with 0 < α ≤ 1. Note that we did not rescale γ; instead, γ was optimized for best post-decoding performance (see Appendix E). It is known that for the DW2, intrinsic coupler control noise can be taken to be Gaussian with standard deviation σ ∼ 0.05 of the maximum value for the couplings [45]. Thus we may expect that, without error correction, Ising problems with α ≲ 0.05 are dominated by control noise.
We applied NQAC to completely antiferromagnetic (h i = 0 ∀i) Ising problems over K 4 (J ij = 1 ∀i, j), and K 8 (random J ij ∈ [0.1, 1] with steps of 0.1) with nesting up to C = 8 and C = 4, respectively. We denote by P C (α) the probability to obtain the logical ground state at energy scale α for the C-level nested implementation (see Appendix A for data collection methods). The experimental QA data in Fig. 2 (left) shows a monotonic increase of P C (α) as a function of the nesting level C over a wide range of energy scales α. As expected, P C (α) drops from P C (1) = 1 (solution always found) to P C (0) = 6/16 (random sampling of 6 ground states, where 4 out of the 6 couplings are satisfied, out of a total of 16 states).
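The degeneracy counting behind P_C(0) = 6/16 is easy to verify by brute force over the 16 spin configurations of the antiferromagnetic K_4:

```python
from itertools import product

# Brute-force check: the antiferromagnetic K_4 (J_ij = 1 for all pairs,
# h_i = 0) has 6 degenerate ground states among its 16 configurations.
def energy(s):
    return sum(s[i] * s[j] for i in range(4) for j in range(i + 1, 4))

energies = {s: energy(s) for s in product([-1, 1], repeat=4)}
e_min = min(energies.values())
ground_states = [s for s, e in energies.items() if e == e_min]
print(e_min, len(ground_states))   # -> -2 6
```

The ground states are exactly the balanced configurations (two spins up, two down), which satisfy 4 of the 6 antiferromagnetic couplings; a uniformly random sample therefore succeeds with probability 6/16.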
Note that P 1 (α) (no nesting) drops by ∼ 50% when α ∼ 0.1, which is consistent with the aforementioned σ ∼ 0.05 control noise level, while P 8 (α) exhibits a similar drop only when α ∼ 0.01. This suggests that NQAC is particularly effective in mitigating the two dominant effects that limit the performance of quantum annealers: thermal excitations and control errors. To investigate this more closely, the middle panel of Fig. 2 shows that the data from the left panel can be collapsed via P C (α) → P C (α/µ C ), where µ C is an empirical rescaling factor discussed below (see also Appendix F). This implies that P 1 (µ C α) ≈ P C (α), and hence that the performance enhancement obtained at nesting level C can be interpreted as an energy boost α → µ C α with respect to an implementation without nesting.
The existence of this energy boost is a key feature of NQAC, as anticipated above. Recall [Eq. (4)] that a nested graph K_{C×N} contains C^2 equivalent copies of the same logical coupling J_ij. Hence a level-C nesting before ME can provide a maximal energy boost μ_C^max = C^2; writing μ_C ∼ C^η, this corresponds to η_max = 2. This simple argument agrees with the reduction of the effective temperature by C^2 based on the calculation of the free energy (5). The right panel of Fig. 2 shows μ_C as a function of μ_C^max, yielding μ_C ∼ C^η with η ≈ 1.37 (purple circles). To understand why η < η_max, we performed simulated quantum annealing (SQA) simulations (see Appendix G for details). We observe in Fig. 2 (right) that without ME and control errors, the boost scaling matches μ_C^max (blue stars). Including ME and control errors results in a performance drop (red triangles). Both factors thus contribute to the sub-optimal energy boost observed experimentally. However, the optimal energy boost is recovered for a fully thermalized state with a sufficiently large penalty (see Appendix C). To match the experimental DW2 results using SQA we replace the Choi ME, designed for full Chimera graphs [56], with the heuristic ME designed for Chimera graphs with missing qubits [57,58], and achieve a near match (yellow triangles) (see Appendix D for more details on ME).
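The scaling exponent η in μ_C ∼ C^η can be estimated by a linear fit of log μ_C against log C. The sketch below uses synthetic placeholder values for μ_C, not the experimental data:

```python
import numpy as np

# Estimate eta in mu_C ~ C^eta via a log-log linear fit.
# The mu_C values here are synthetic placeholders, not measured data.
C = np.array([1, 2, 3, 4])
mu = np.array([1.0, 2.6, 4.5, 6.7])   # hypothetical rescaling factors
eta, log_prefactor = np.polyfit(np.log(C), np.log(mu), 1)
print(round(eta, 2))   # -> 1.37
```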
Performance of NQAC vs classical repetition - Recall that N_phys = CNL is the total number of physical qubits used at nesting level C; let C_max denote the highest nesting level that can be accommodated on the QA device for a given K_N, i.e., C_max NL ≤ N_tot < (C_max + 1)NL, where N_tot is the total number of physical qubits (504 in our experiments). Then M_C = N_phys(C_max)/N_phys(C) is the number of copies that can be implemented in parallel. For NQAC at level C to be useful, it must be more effective than a classical repetition scheme in which M_C copies of the problem are implemented in parallel. If a single implementation has success probability P_C(α), the probability to succeed at least once with M_C statistically independent implementations is P'_C(α) = 1 - [1 - P_C(α)]^{M_C}. It turns out that the antiferromagnetic K_4 problem, for which a random guess succeeds with probability 6/16, is too easy [i.e., P'_C(α) approaches 1 too rapidly], and we therefore consider a harder problem: an antiferromagnetic K_8 instance with couplings randomly generated from the set J_ij ∈ {0.1, 0.2, ..., 0.9, 1} (see Appendix E for more details and data on this and additional instances). Problems of this type turn out to have a sufficiently low success probability for our purposes, and can still be nested up to C = 4 on the DW2 processor.
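The repetition-adjusted success probability is straightforward to compute; a minimal sketch:

```python
def boosted_probability(p_single, m_copies):
    """P'_C = 1 - (1 - P_C)^{M_C}: the probability of succeeding at least
    once among m_copies independent runs, each with success p_single."""
    return 1.0 - (1.0 - p_single) ** m_copies

# e.g. four parallel copies of a run with 30% success probability:
print(round(boosted_probability(0.3, 4), 4))   # -> 0.7599
```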
Results for P_C(α) are shown in Fig. 3 (left), and again increase monotonically with C, as in the K_4 case. For each C, P_C(α) peaks at a value of α for which the maximum allowed strength of the energy penalties, γ = 1, is optimal (γ > 1 would be optimal for larger α, as shown in Appendix E; the growth of the optimal penalty with problem size, and hence chain length, is a typical feature of minor-embedded problems [48]). An energy-boost interpretation of the experimental data of Fig. 3 is possible for α values to the left of the peak; to the right of the peak, the performance is hindered by the saturation of the energy penalties. Figure 3 (middle) compares the success probabilities P'_C(α) adjusted for classical repetition, where we have set C_max = 4, and shows that P'_2(α) > P'_1(α), i.e., even after accounting for classical parallelism C = 2 performs better than C = 1. However, we also find that P'_4(α) < P'_3(α) ≤ P'_2(α), so no additional gain results from increasing C further in our experiments. This can be attributed to the fact that even the K_8 problem still has a relatively large P_1(α). Experimental tests on QA devices with more qubits will thus be important to test the efficacy of higher nesting levels on harder problems.
To test the effect of increasing C, and also to study the effect of varying the annealing time, we present in Fig. 3 (right) the performance of SQA on a random K 8 antiferromagnetic instance with the Choi ME. The results are qualitatively similar to those observed on the DW2 processor with the heuristic ME [ Fig. 3 (left)]. Interestingly, we observe a drop in the peak performance at C = 5 relative to the peak observed for C = 4. We attribute this to both a saturation of the energy penalties and a suboptimal number of sweeps. The latter is confirmed in Fig. 3 (right, inset), where we observe that the scaling of µ C with C is better for the case with more sweeps, i.e., again µ C ∼ C η , and η increases with the number of sweeps.

III. DISCUSSION
Nested QAC offers several significant improvements over previous approaches to error correction for QA. It is a flexible method that can be used with any optimization problem, and it allows the construction of a family of codes with arbitrary code distance. We have given experimental and numerical evidence that nesting is effective, via studies with a D-Wave QA device and numerical simulations. We have demonstrated that the protection from errors provided by NQAC can be interpreted as arising from an increase (with nesting level C) in the energy scale at which the logical problem is implemented. This represents a very useful tradeoff: the effective temperature drops as we increase the number of qubits allocated to the encoding, so that these two resources can be traded against each other. Thus NQAC can be used to combat thermal excitations, which are the dominant source of errors in QA and the bottleneck for scalable QA implementations. We have also demonstrated that an appropriate nesting level can outperform classical repetition with the same number of qubits, with further improvements to be expected when next-generation QA devices with larger numbers of physical qubits become available. We therefore believe that our results are of immediate and near-future practical use, and constitute an important step toward scalable QA.
Appendix A: Experimental Methods

We tested NQAC on the DW2 quantum annealing device at the University of Southern California's Information Sciences Institute (USC-ISI), which has been described in numerous previous publications (e.g., see [42]). The largest complete graph that can be embedded on this device, featuring 504 active qubits, is a K_32.
We determined an experimental value of the success probability P_C(α, γ) as a function of the energy penalty strength γ. All figures show, whenever the γ dependence is not explicitly considered, the optimal value P_C(α) = max_γ P_C(α, γ), with γ ∈ {0.05, 0.1, 0.2, ..., 0.9, 1}. We used the same penalty value for both the nesting and the ME. In principle these two values can be optimized separately for improved performance, but we did not pursue this here: the resulting improvement is small, as shown in Fig. 4, and the optimization is costly, since each instance needs to be rerun at all penalty settings.

FIG. 4. Effect of separately optimizing γ for ME and penalties. The plot shows the success probability from SQA simulations, for NQAC applied to a random antiferromagnetic K_8 with 10,000 sweeps, σ = 0.05 noise, the Choi embedding, and β = 0.1. The results obtained after separately optimizing the penalty for the nesting and for the ME are denoted "non-unif", while the results for using a single penalty for both (the strategy used in the main text) are denoted "unif". The former results in a small improvement. Also shown is that separate ("MV ME") or joint ("MV all") majority-vote decoding of the nesting and the ME has no effect.
Each P_C(α, γ) is the overall success probability after 2 × 10^4 annealing runs, obtained by implementing 20 programming cycles of 10^3 runs each. A sufficiently large number of programming cycles is necessary to average out intrinsic control errors (ICE) that, as explained in the main text, prevent the physical couplings from being set with a precision better than ∼ 5%. To further remove possible sources of systematic noise, at each programming cycle we perform a random gauge transformation on the values of the physical qubits. A permutation of the C × N vertices is a symmetry of the nested graph, but it is not a symmetry of the encoded Hamiltonian obtained after ME, because the C × N chains of physical qubits are physically distinguishable. In each programming cycle we therefore also performed a random permutation of the vertices of the nested graph before proceeding to the ME. Error bars correspond to the standard error of the mean of the 20 P_C(α) values.

Appendix B: Mean-Field Analysis

For the fully antiferromagnetic case, the nested Hamiltonian can be written as

$\bar{H}(t) = A(t)\,H_x + B(t)\,H_z$, with $H_x = -\sum_{i=1}^{N}\sum_{c=1}^{C} \sigma^x_{ic}$ and $H_z = J \sum_{i \neq j} \sum_{c,c'} \sigma^z_{ic}\sigma^z_{jc'} - \gamma \sum_{i} \sum_{c \neq c'} \sigma^z_{ic}\sigma^z_{ic'}$,

where A(t), B(t) have dimensions of energy, and where J and γ are dimensionless and have each absorbed a factor of 1/2 to account for double counting. Note that both H_x and H_z are extensive (proportional to N). Throughout we use $\sigma^z_{ic}$ ($\sigma^x_{ic}$) to denote the Pauli z (x) operator acting on physical qubit c of encoded qubit i.
We define the collective variables

$S^x_i \equiv \frac{1}{C}\sum_{c=1}^{C} \sigma^x_{ic}, \qquad S^z_i \equiv \frac{1}{C}\sum_{c=1}^{C} \sigma^z_{ic}$.

We can interpret S^x_i and S^z_i as the mean transverse and longitudinal fields on logical qubit i, respectively. Rewriting H_z in terms of these variables produces an additional constant term (equal to γNC times the identity), which can be ignored. Therefore, up to a constant, we have

$\bar{H}_P = C^2 J \left[ \Big(\sum_{i=1}^{N} S^z_i\Big)^2 - \lambda \sum_{i=1}^{N} (S^z_i)^2 \right]$, (B6)

where λ, an increasing function of the penalty strength γ, encodes the penalty; its 1/N correction disappears in the thermodynamic limit. Note that λ = O(1), like (S^z_i)^2, and hence $\bar{H}_P$ is extensive in N, as it should be. The form (B6) for $\bar{H}_P$ shows that the NQAC Hamiltonian in the fully antiferromagnetic K_{N×C} case can be interpreted as describing the collective evolution of all logical qubits. The term $\lambda \sum_{i=1}^{N} (S^z_i)^2$ favors all the spins of each logical qubit (where by spin we mean the qubit at t = t_f) being aligned, since this maximizes each summand.

Partition Function Calculation
We are interested in the partition function

$Z = \mathrm{Tr}\, e^{-\beta \bar{H}(t)}$,

where θ ≡ βB(t) is the dimensionless inverse temperature. We write the partition function explicitly as [64]

$Z = \sum_{\{\sigma^z\}} \langle \{\sigma^z\}|\, e^{-\beta \bar{H}(t)} \,|\{\sigma^z\}\rangle$,

where the sum runs over all possible 2^{CN} spin configurations in the z basis, and $|\{\sigma^z\}\rangle = \otimes_{i=1}^{N} \otimes_{c=1}^{C} |\sigma^z_{ic}\rangle$. Z is obtained as the limit $Z = \lim_{M \to \infty} Z_M$ of a discretized version Z_M, determined using the Trotter-Suzuki formula $e^{A+B} = \lim_{M \to \infty} \left(e^{A/M} e^{B/M}\right)^M$. After a lengthy calculation [63] we find that Z can be expressed in terms of $m \equiv \frac{1}{N}\sum_{j=1}^{N} m_j$, where m_j is the Hubbard-Stratonovich field that represents $S^z_j(\alpha)$ after the static approximation (i.e., dropping the α dependence) [65,66]. A second Hubbard-Stratonovich field $\bar{m}_j$ acts as a Lagrange multiplier.

Free energy
In the large-β (low-temperature) limit, the partition function is dominated by the global minimum of the free energy. This minimum is given by m = 0, which corresponds to either a paramagnetic phase (all m_j = 0) or a symmetric phase (m_j = ±m in equal numbers). It can be shown that the system undergoes a second-order QPT, with the critical point moving to the left as C and γ grow [63]. A saddle-point analysis of the partition function shows that $\bar{m}_j = \pm 2i\theta C^2 J \lambda m$, which yields the dominant contribution to the partition function. For B(t) > 0 and in the low-temperature limit (θ ≫ 1) we can approximate 2cosh(θ|x|) by $e^{\theta|x|}$ and reintroduce the physical inverse temperature β [recall Eq. (B8)]. Factoring out C^2 and taking the large-N limit then directly yields the free-energy density expression (5) given in the main text.
Appendix C: Additional Numerical Data

Figure 5(a) shows that the saturation of μ_C at large C is removed when the number of sweeps is increased. The thermal state, where the system has fully thermalized, can be understood as the limit of an infinite number of sweeps. Figure 5(b) shows that the saturation is fully removed for the thermal state (generated using parallel tempering), and nesting is then equivalent to an energy (or temperature) boost close to the ideal result μ_C^max = C^2. This suggests that for a sufficiently large number of sweeps, performance can be brought close to the ideal result.

Figure 6 gives further evidence that nesting can be interpreted as an effective reduction of temperature, by studying the success probability associated with the thermal distribution on the ME. We used parallel tempering (PT) to sample from the thermal state associated with the ME of the different NQAC cases shown in Fig. 2, and decoded using majority voting. We find that the thermal state at different temperatures but fixed C exhibits the same qualitative behavior as the thermal state at fixed temperature but different C [see Fig. 6(a) vs Fig. 6(b)]. Therefore, the performance improvement associated with reducing the temperature can also be reproduced by increasing C. This reinforces the interpretation of the energy boost as a decrease of the effective temperature of the device. We also find that the thermal state exhibits an energy-boost scaling of μ_C ∼ C^2 [see Fig. 6(c)].

Appendix D: Minor Embedding

The Choi technique requires a perfect Chimera graph, without missing vertices. In actual devices, however, imperfections in fabrication or the calibration process lead to the presence of unusable qubits (e.g., due to trapped flux). These qubits, along with their couplings, are then permanently disabled and cannot be used in the QA process. Efficient heuristic algorithms have been developed to search for MEs for the resulting induced Chimera subgraphs [57,58,60].
Figure 7(b) shows the ME of a K 32 obtained when the heuristic algorithm developed in Ref. [58] is applied to the actual hardware graph of the DW2 "Vesuvius" chip installed at USC-ISI. Note how the ME avoids the unusable qubits, depicted as black circles in Fig. 7(b).
The MEs shown in Fig. 7 are the actual "Choi" and "heuristic" MEs used in our experiments and simulations. As discussed in the main text, SQA simulations demonstrate that the choice of the ME has a significant impact on the performance of NQAC. In particular, it turns out that the Choi ME outperforms the heuristic ME. Since the two MEs use the same amount of physical resources (a logical qubit is represented by chains of equal lengths), it is unclear why the Choi embedding should perform better than the heuristic embedding, and further investigations are needed in order to clarify this point. In the present work we limit ourselves to stressing the importance of the embedding choice when assessing the performance of minor-embedded problems in QA.

Appendix E: Additional Experimental Data
In this section we present additional experimental data for K_N's with couplings randomly generated from the set J_ij ∈ {0.1, 0.2, ..., 0.9, 1}. For large N, K_N instances generated in this manner have a finite-temperature spin-glass phase transition [68]. This property renders simulated annealing inefficient in finding the ground state of such problems [69]. The main text reports data for a random K_8 instance that is referred to here as "harder-K_8". Figure 8 includes similar data for another random K_8 instance that turned out to have a higher success probability, so we refer to it as "easier-K_8". Figure 9 displays results for NQAC applied to an "easier-K_10" and a "harder-K_10" instance. In all cases we display results up to nesting level C = 3. Figure 10 shows the optimal penalty strength as a function of the energy scale α for the four instances considered. A saturation of the optimal penalty is visible at the maximal possible value |γ| = 1 for α close to 1, implying that the true optimal penalty values are > 1 in this range.

FIG. 6 (continued). The solid lines represent the best linear fit to all the data points. All the best-fit lines have slopes greater than 0.95, so the optimal scaling μ_C ∼ C^2 is recovered at all (sufficiently large) inverse temperatures tested. This illustrates that for a sufficiently cold equilibrated system the ME does not result in a suboptimal energy boost.

FIG. 7. MEs of a K_32. We used these, e.g., to minor-embed a C = 8 nesting of a K_4, or a C = 4 nesting of a K_8. (a) The Choi embedding implemented on a perfect Chimera graph. (b) A heuristic ME for the actual DW2 device used in this work, whose Chimera graph contains 8 unusable qubits (black circles). Different colors (and labels) denote chains representing minor-embedded logical qubits. Black (thin) lines are logical couplings, while brown (thick) lines represent energy penalties (ferromagnetic couplings).
Figure 11 shows that the antiferromagnetic harder-K 8 problem considered in the main text, as well as the easier-K 8 problem, also admit a data collapse (left), to the left of the peak. Recall that the peak is due to having reached the maximum penalty value, as illustrated in Fig. 10. The associated scaling of the energy boost µ C is shown in the right column, yielding µ C ∼ C 1.32 (harder-K 8 ) and µ C ∼ C 1.26 (easier-K 8 ). Figure 12 shows the same for harder-K 10 and easier-K 10 problems. There we find µ C ∼ C 1.34 for both problems.

Appendix F: Determination of µC
To determine the values of μ_C and estimate error bars, we proceeded as follows. First, we used smoothing splines to determine a continuous interpolation P_C^mid(α) of the discrete data points P_C(α). In the same way we also determined the higher and lower interpolating curves P_C^high(α) and P_C^low(α) for the data points P_C(α) + δP_C(α) and P_C(α) - δP_C(α), respectively, where δP_C(α) denotes the standard error of P_C(α). A reference value α_C^mid was then determined such that P_C^mid(α_C^mid) = P_0, using the smooth interpolation of the experimental data. The energy boost was then determined as μ_C = α_1^mid / α_C^mid. Here P_0 is an arbitrarily chosen reference value at which the different P_C(α) curves are overlapped; it serves as a base point for computing μ_C. As shown in the main text for the K_4, the overlap of the P_C data over the entire α range means that the specific choice of P_0 is immaterial.
We similarly determined μ_C^high = α_1^high / α_C^high and μ_C^low = α_1^low / α_C^low using the corresponding interpolating curves. The error bars shown in the figures were then centered at μ_C, with the upper and lower limits given by μ_C^high and μ_C^low, respectively.
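The μ_C extraction can be sketched as follows. For simplicity this illustration uses linear rather than spline interpolation, and the success curves are synthetic placeholders, not experimental data:

```python
import numpy as np

# Sketch of the mu_C extraction: find the alpha at which each P_C(alpha)
# curve crosses a reference probability P0, then take the ratio to the
# C = 1 crossing point. Curves below are synthetic, with known decay rates.
def crossing(alphas, probs, P0):
    return np.interp(P0, probs, alphas)   # probs must be increasing in alpha

alphas = np.linspace(0.01, 1.0, 200)
P1 = 1 - np.exp(-5 * alphas)              # hypothetical C = 1 success curve
P2 = 1 - np.exp(-15 * alphas)             # hypothetical C = 2 curve (boosted)
mu_2 = crossing(alphas, P1, 0.5) / crossing(alphas, P2, 0.5)
print(round(mu_2, 2))   # -> 3.0, the ratio of the two decay rates
```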

Appendix G: Numerical Methods
We reported results based on quantum Monte Carlo techniques in the main text. Here we briefly review this technique. Simulated quantum annealing (SQA) is a quantum Monte Carlo-based algorithm whereby Monte Carlo dynamics are used to sample from the instantaneous Gibbs state associated with the Hamiltonian H(t) of the system. The state at the end of the quantum Monte Carlo simulation of the quantum Hamiltonian H(t) is used as the initial state for the next Monte Carlo simulation with Hamiltonian H(t + Δt). This is repeated until H(t_f) is reached. SQA was originally proposed as an optimization algorithm [70,71], but it has since gained traction as a computationally efficient classical description of T > 0 quantum annealers [39,41,61,62]. An important caveat is that SQA does not capture the unitary dynamics of the quantum system, but it is hoped that the sampling of the instantaneous Gibbs state captures thermal processes in the quantum annealer, which may be the dominant dynamics if the evolution is sufficiently slow. Although there is strong evidence that SQA does not completely capture the final-time output of the D-Wave processors [41,72], at present it is the only viable means to simulate large (≳ 15 qubits) open QA systems. We used discrete-time quantum Monte Carlo in our simulations with the number of Trotter slices fixed to 64. Spin updates were performed via Wolff cluster updates [73] along the Trotter direction only.
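A minimal discrete-time SQA sketch for the transverse-field Ising model is given below. Unlike our simulations, it uses single-spin Metropolis updates rather than Wolff cluster updates along the Trotter direction, and all parameters and schedules are illustrative:

```python
import numpy as np

def sqa(J, beta=1.0, M=16, sweeps=300, seed=0):
    """Discrete-time SQA sketch. J: symmetric (n, n) coupling matrix with
    zero diagonal. The transverse field maps to a ferromagnetic coupling
    between M Trotter slices; A(s) and B(s) are linear annealing schedules."""
    n = len(J)
    rng = np.random.default_rng(seed)
    spins = rng.choice([-1, 1], size=(M, n))
    for step in range(sweeps):
        s = (step + 1) / sweeps
        A, B = 1.0 - s, s
        # ferromagnetic action coupling between neighboring Trotter slices
        Jp = -0.5 * np.log(np.tanh(max(beta * A / M, 1e-10)))
        for m in range(M):
            for i in range(n):
                h_space = (beta * B / M) * (J[i] @ spins[m])
                h_trotter = Jp * (spins[(m - 1) % M, i] + spins[(m + 1) % M, i])
                # action change for flipping spin (m, i)
                dS = 2 * spins[m, i] * (h_trotter - h_space)
                if dS <= 0 or rng.random() < np.exp(-dS):
                    spins[m, i] *= -1
    return spins[0]   # read out one Trotter slice at the end of the anneal
```

For example, `sqa(np.ones((4, 4)) - np.eye(4))` anneals an antiferromagnetic K_4 and returns one four-spin configuration; in a full simulation one would collect many such runs to estimate success probabilities.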