Determination and correction of persistent biases in quantum annealers

Calibration of quantum computers is essential to the effective utilisation of their quantum resources. Specifically, the performance of quantum annealers is likely to be significantly impaired by noise in their programmable parameters, which effectively misspecifies the computational problem to be solved and often results in spurious suboptimal solutions. We developed a strategy to determine and correct persistent, systematic biases between the actual values of the programmable parameters and their user-specified values. We applied the recalibration strategy to two D-Wave Two quantum annealers, one at NASA Ames Research Center in Moffett Field, California, and another at D-Wave Systems in Burnaby, Canada. We show that the recalibration procedure not only reduces the magnitudes of the biases in the programmable parameters but also enhances the performance of the device on a set of random benchmark instances.

, where $\sigma_i^z$ is the Pauli $z$ operator acting on the $i$th spin. More details of QA can be found in the supplementary information.
Scientific Reports | 6:18628 | DOI: 10.1038/srep18628
Currently, D-Wave devices are only calibrated at the level of ensuring that the low-level control circuitry has its intended effect on the physical quantities (current, flux, etc.) that it is meant to control 27 (Lanting, T. Private communication, 2015). Early research into the performance of D-Wave devices has indicated the presence of significant imprecision in the setting of the fields that define the problem to be solved, a significant impairment to the successful solution of the problem 13,17,19,23. Recently, some work has used a phenomenological noise model of the fields $\{h_i\}$ and $\{J_{ij}\}$ in which the deviations from the programmed values follow Gaussian distributions with mean zero and standard deviations of 0.05 and 0.035, respectively (in units of the maximal $J_{ij}$), independently instantiated for each qubit and anneal and constant throughout the course of a given anneal. The parameters of the Gaussians were derived by adding in quadrature the variances of several known microscopic sources of noise (Lanting, T. Private communication, 2015). This model has been used in an attempt to explain the failure rate of D-Wave devices as partly due to misspecification of the programmable values. There are many sources of noise in quantum annealers, each with a different effect and time scale, and we address here only one manifestation. The variances we report in this paper are in a sense incomparable to those just mentioned, and relevant only within the context of the experiments described below.
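The phenomenological noise model just described can be sketched in a few lines. This is only an illustration of the model as stated in the text (zero-mean Gaussian deviations with standard deviations 0.05 and 0.035, drawn once per anneal); the function name `sample_noisy_problem` and the dictionary layout for the fields and couplings are assumptions made for this sketch, not part of any device interface.

```python
import numpy as np

SIGMA_H = 0.05   # std of the field deviations, in units of the maximal J (from the text)
SIGMA_J = 0.035  # std of the coupling deviations, same units (from the text)

def sample_noisy_problem(h, J, rng):
    """Return one noisy realisation (h', J') of a programmed problem.

    h: dict {qubit: programmed field}; J: dict {(i, j): programmed coupling}.
    One independent draw per parameter, held fixed for a single anneal.
    """
    h_noisy = {i: v + rng.normal(0.0, SIGMA_H) for i, v in h.items()}
    J_noisy = {e: v + rng.normal(0.0, SIGMA_J) for e, v in J.items()}
    return h_noisy, J_noisy

rng = np.random.default_rng(0)
h = {0: 0.0, 1: 0.0}
J = {(0, 1): -1.0}
h_noisy, J_noisy = sample_noisy_problem(h, J, rng)  # call once per simulated anneal
```

Because this model has zero mean, it describes anneal-to-anneal fluctuations only; a persistent bias of the kind studied here would correspond to a non-zero offset that stays the same across anneals.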
The presence of systematic biases in quantum annealers has been reported elsewhere 24. The biases referred to there are fundamentally different in nature from the ones addressed here: the former are collective biases on the qubits of ferromagnetic chains that depend on the strength and topology of the couplings therein and are due to the noise specifically caused by those couplings, and they must be determined anew for each embedding topology used. In this work, we present a methodology for determining, in parallel and using a relatively small amount of total annealing time, the persistent (see Fig. S3), systematic biases in all of the individually available programmable parameters of a quantum annealer. We show that correcting for these biases produces an increase in the quality of solutions found on a set of random benchmark instances. The strategy presented here is the first proposal for a software-level recalibration of the full device by the user, i.e. based only on the data from tailored instances and without access to the low-level control circuitry.

Results
Because actual quantum annealers operate at non-zero temperature, there exists some threshold for the values of the fields $\{h_i\}$ and couplings $\{J_{ij}\}$ below which thermal effects dominate the annealing process. When the strengths of the fields and couplings are set sufficiently small, the probabilities of the final states of the qubits are well described by a Boltzmann distribution. Roughly, the relevant energy scale is given by $kT$, where $k$ is Boltzmann's constant and $T$ is an effective temperature, not necessarily equal to the device temperature. (Henceforth, we will work in units in which $k = 1$.) By running experiments in this regime, persistent biases in the values of the programmed fields can be uncovered, as described in detail in the Methods section. Let [...] be the probability of qubit $i$ being in the spin-up [spin-down] state at the end of an anneal with the programmed value $h_i^{(p)}$. A completely thermal model for this probability is given by [...] and $T_i$ is the effective temperature of qubit $i$. This yields [...]
To illustrate the efficacy of our approach in quantum annealers, we applied our method to two D-Wave devices, the NASA device and the Burnaby device. Figure 1 [...]
Narrowing of the bias distribution. To show the correctability of the persistent biases, we ran the experiment described above repeatedly, each time attempting to correct the biases using estimates thereof from the prior iteration. Let [...]

Determination of the h biases.
[...] for the first and second iterations, i.e. before and after the recalibration procedure. The narrowing of the distribution is a clear indication that the procedure is working to remove the biases. Further support for the success of recalibration is provided by looking at the distribution over the qubits of the success probabilities for all values of $h^{(p)}$. Figure 1(f) shows a uniform reduction of the variance over the qubits in the values of  h. Each point is the mean h over the qubits of { h_i } for the corresponding $h^{(p)}$, and the shaded region indicates the standard deviation. Together, these results are clear evidence that the recalibration procedure not only narrows the distribution of the biases [Fig. 1(d,e)], as reflected in the shift of the mean, but also reduces the variance for all values of $h^{(p)}$.
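A minimal sketch of how a bias and an effective temperature could be extracted from single-qubit data in the thermal regime is given below. It assumes a specific convention that is not taken from the text: the energy of spin $s = \pm 1$ is $(h^{(p)} + b)\,s$, so that $P^{\uparrow} = 1/(1 + e^{2(h^{(p)} + b)/T})$ and $\tfrac{1}{2}\ln(P^{\downarrow}/P^{\uparrow}) = (h^{(p)} + b)/T$ is linear in the programmed field. The names `fit_bias_and_temperature`, `b` and `T` are hypothetical, and the sign convention of the fitted bias may differ from the one used in the Methods.

```python
import numpy as np

def fit_bias_and_temperature(h_prog, p_up):
    """Least-squares fit of y = (h_prog + b) / T, with y = 0.5 * ln(P_down / P_up).

    ASSUMPTION: energy of spin s = +/-1 is (h_prog + b) * s (sketch convention).
    Returns the estimated bias b and effective temperature T.
    """
    h_prog = np.asarray(h_prog, dtype=float)
    # Clip to avoid log(0) when a qubit is always found in the same state.
    p_up = np.clip(np.asarray(p_up, dtype=float), 1e-6, 1.0 - 1e-6)
    y = 0.5 * np.log((1.0 - p_up) / p_up)
    slope, intercept = np.polyfit(h_prog, y, 1)  # y = slope * h_prog + intercept
    T = 1.0 / slope
    b = intercept / slope
    return b, T

# Synthetic check: generate exact thermal probabilities and recover the inputs.
true_b, true_T = 0.02, 0.1
h_prog = np.linspace(-0.05, 0.05, 11)
p_up = 1.0 / (1.0 + np.exp(2.0 * (h_prog + true_b) / true_T))
b_hat, T_hat = fit_bias_and_temperature(h_prog, p_up)
```

With measured frequencies in place of exact probabilities, the same fit can be run independently for every qubit, which is what allows the biases to be estimated in parallel.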

Determination of the J biases.
In the data presented here, the J biases were determined using Eq. (7). The   variance is about the same before and after correction, yet is much more uniform after correction, which we consider beneficial. [...] i.e. by subtracting the sums of the residual biases from the prior iterations from the desired values. Figure 2(d-f) shows the narrowing of the distribution of residual J biases with correction. Unlike the case for the h biases, for which the data indicate that the distribution is essentially converged after a single iteration, here we see two new phenomena. First, the distribution continues to narrow between the second and third iterations. Second, the distribution of the residual biases from the second iteration, while narrower than that from the first, is not centered around zero. We believe this is due to overcorrection; that is, the estimates of the biases from the first iteration have a high degree of uncertainty, and so simply subtracting their values from the intended value itself introduces some amount of bias. This is consistent with the overall small magnitudes of the J biases relative to those of the h biases, especially as compared to the corresponding noise levels. This overcorrection can be mitigated by weighting the correction in a way that accounts for the uncertainty in the estimate using Bayesian reasoning. We will explore these ideas further in future work.
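The iterative correction and one possible form of uncertainty weighting can be sketched as follows. The plain update (subtracting the sum of residual-bias estimates from the desired value) follows the description above. The weighting shown is only one simple choice and is an assumption of this sketch, not the authors' method: for a zero-mean Gaussian prior on the bias with variance `prior_var` and Gaussian measurement noise with variance `meas_var`, the posterior mean shrinks the raw estimate by `prior_var / (prior_var + meas_var)`. All function and parameter names are hypothetical.

```python
def shrinkage_weight(prior_var, meas_var):
    """Posterior-mean weight for a zero-mean Gaussian prior and Gaussian noise.

    ASSUMPTION: this conjugate-Gaussian form is an illustrative choice.
    """
    return prior_var / (prior_var + meas_var)

def programmed_value(desired, residual_estimates, weights=None):
    """Value to program: desired minus the (optionally weighted) sum of
    residual-bias estimates from all prior iterations."""
    if weights is None:
        weights = [1.0] * len(residual_estimates)  # plain, unweighted correction
    return desired - sum(w * e for w, e in zip(weights, residual_estimates))

# Plain correction after two iterations with residual estimates 0.02 and -0.005.
plain = programmed_value(0.5, [0.02, -0.005])

# Weighted correction: noisy estimates (meas_var > prior_var) are trusted less.
w = shrinkage_weight(prior_var=1e-4, meas_var=3e-4)
weighted = programmed_value(0.5, [0.02, -0.005], weights=[w, w])
```

When the measurement variance is much larger than the spread of the true biases, the weight approaches zero and almost no correction is applied, which is the behaviour one would want in order to avoid the overcorrection seen in the second J iteration.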
Effect of correction on the performance of benchmark problems. Ultimately, the goal of calibration is to optimize the performance of a quantum annealer on problems of computational interest. It is not clear a priori that the biases present in one- and two-qubit experiments are the same as those present in anneals involving hundreds of qubits. Even if they were, their estimation would be of no practical value unless their correction improves performance.
To address this, we tested the effect of correcting the h biases on the performance of the quantum annealer at NASA Ames, using the same parameterized random ensemble of instances used in a previous study benchmarking a D-Wave quantum annealer 9. As in that study, r is a parameter that tunes the difficulty of the average instance (the larger the r, the more difficult the average instance). We expected the recalibration to have a major positive impact on the harder families of instances. For each instance, the uncorrected and corrected results were compared using two methods, a "greedy" one and the elite mean. The results of the comparison are summarized in Table 1, which shows the proportion of instances, for each r, for which the correction improved the performance under each of the two comparison methods. The data indicate that correcting for the h biases improves performance according to these two reasonable metrics. At a large enough range r, however, even correction of the biases is not enough. In this limit of large r, the spacing of 1/r between the different specified J values is beyond the precision of the device and poorly resolved. In this limit, inherent fluctuations lead to almost zero success probabilities due to problem misspecification, i.e., the device is finding the solution to a problem different from the one indicated. This would explain the possible pattern seen in the elite mean comparison (Table 1), namely that the advantage of correction seems to peak at the level of r considered to correspond to the precision limit of the device. (That such a pattern is not as apparent in the greedy comparison is easily explained by the natural noisiness of that comparison method, especially for instances with extremely low success probability, as was the case here.)

Discussion
Disentangling the mutual effect of the h and J biases on each other by alternating between iterations of h and J experiments (as opposed to doing each alone as reported) will likely lead to more accurate estimates of each individually. Lastly, the risk of overcorrection can be mitigated by weighting the correction by the degree of certainty of the estimate of the bias to be corrected.
Although we focused initially on a standard random ensemble of Ising instances for benchmarking the performance of quantum annealers, the effect of correcting biases should be greatest on instances whose ground states are most sensitive to misspecification of the programmable parameters 20,21.
There is reason to suspect that correction will also have a beneficial effect in reducing the effect of gauge selection on success probability. While there are other suspected reasons for the effect of gauge selection (which would be non-existent in an ideal device), biases such as the ones corrected here could be one of the leading factors. The effect of gauge selection is significant, sometimes leading to an orders-of-magnitude difference in the success probabilities, and so this is a promising application for bias correction.
Importantly, while the J biases determined here are in general smaller than the h biases, numerical studies indicate that instances are often more sensitive to misspecification in the J parameters than in the h parameters 20.
The methods presented here complement a growing suite of tools for optimal programming of quantum annealers 23,24, tuning the performance thereof to cope with the intrinsic noise in current and future physical implementations.
One way is to simply use the fitted parameters as is:

Calculation of the h biases. To experimentally determine the biases [...]. Experimental data, however, indicate that the estimates of the qubit "temperatures" calculated as above are not exactly that, but include effects other than true variation in temperature between the qubits. Some estimate of a uniform device temperature should therefore be used. (See Sec. III in the SI for more detail.) In our experiments, we used two different quantities. The first is the "mean temperature", [...]
[...] necessary, rather than scaling to 1 as in previous studies, to allow for consistency with future experiments in which the J biases are corrected.) 100 such instances were generated, and each was run twice with 1,000 annealing cycles for each of the same 10 (uniformly randomly generated) gauges. In all runs, $\{J_{ij}\}$ were programmed as in the instances. For the first set of runs, which we call "uncorrected", the $\{h_i\}$ were also programmed as in the instances, i.e. to zero. For the other set, which we call "h-corrected", the local fields were programmed to the additive inverses (negatives) of the biases computed via experiments as in Eq. 4.
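The run protocol (several random gauges per instance, with and without h correction) can be sketched as below. The gauge transformation itself is standard: for $a_i \in \{-1, +1\}$, mapping $h_i \to a_i h_i$, $J_{ij} \to a_i a_j J_{ij}$ and $s_i \to a_i s_i$ leaves the Ising energy unchanged. How the correction is combined with a gauge is an assumption of this sketch: the estimated bias is treated as a property of the physical qubit and is subtracted after the gauge is applied. Since the benchmark instances have $h_i = 0$, this reduces to programming the negative of the bias, as described above. All function names and the small example problem are hypothetical.

```python
import numpy as np

def apply_gauge(h, J, gauge):
    """Gauge-transform a problem: h_i -> a_i h_i, J_ij -> a_i a_j J_ij."""
    h_g = {i: gauge[i] * v for i, v in h.items()}
    J_g = {(i, j): gauge[i] * gauge[j] * v for (i, j), v in J.items()}
    return h_g, J_g

def ising_energy(h, J, s):
    """Classical Ising energy of spin configuration s (dict of +/-1)."""
    field_term = sum(v * s[i] for i, v in h.items())
    coupling_term = sum(v * s[i] * s[j] for (i, j), v in J.items())
    return field_term + coupling_term

def corrected_fields(h_gauged, bias_estimates):
    """ASSUMPTION: program h_i - b_i so that a physical bias b_i is cancelled."""
    return {i: v - bias_estimates.get(i, 0.0) for i, v in h_gauged.items()}

rng = np.random.default_rng(0)
qubits = [0, 1, 2]
h = {i: 0.0 for i in qubits}          # benchmark instances have zero fields
J = {(0, 1): 0.5, (1, 2): -1.0}
bias_estimates = {0: 0.03, 1: -0.01, 2: 0.0}  # made-up values for illustration

# Ten uniformly random gauges, reused for both the uncorrected and corrected runs.
gauges = [{q: int(a) for q, a in zip(qubits, rng.choice([-1, 1], size=len(qubits)))}
          for _ in range(10)]

runs = []
for g in gauges:
    h_g, J_g = apply_gauge(h, J, g)
    runs.append({"uncorrected": (h_g, J_g),
                 "h-corrected": (corrected_fields(h_g, bias_estimates), J_g)})
```

States returned for a gauged problem are mapped back with the same gauge (multiply each $s_i$ by $a_i$) before energies from different gauges are pooled.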
Greedy and elite mean metrics. The greedy comparison is as follows: the energies of all states returned were computed, and those for all gauges were grouped together. Whichever method (uncorrected or corrected) returned the lower minimum energy was deemed to have performed better. If the minimum energies were the same, the tie was broken in favor of the method that returned that energy more often. If this number was also the same, the method with the lower second-lowest energy was deemed to have performed better, with ties broken by the number of times the second-lowest energy was returned, and so on. The "elite mean" score function 23, a quantity previously introduced to allow comparison of the performance of different programming parameters in quantum annealers when the success probabilities are too low (and thus noisy), is defined as the mean energy of the "elite" states, i.e. those with the lowest energies. The elite mean is parameterized by the fraction of energies over which to take the mean; here we use 2%. For r = 1 and r = 2, there were 6 and 2 instances, respectively, for which the elite mean comparison was tied, all but one due to success probabilities greater than 2% for both the corrected and uncorrected experiments.
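Both comparison methods can be written compactly; the following is a sketch under a few stated assumptions rather than the code used for Table 1. Energies are rounded to a fixed number of decimals before counting so that floating-point noise does not split equal energies, and the number of elite states is taken as the rounded value of `fraction * n` (at least one); both choices, like the function names, are assumptions of this sketch. The greedy rule is implemented as a lexicographic comparison of (energy, negative count) pairs sorted by energy.

```python
from collections import Counter
import numpy as np

def greedy_key(energies, decimals=9):
    """Sorted list of (energy, -count): lower energy first, then more hits."""
    counts = Counter(np.round(energies, decimals).tolist())
    return [(e, -c) for e, c in sorted(counts.items())]

def greedy_compare(energies_a, energies_b):
    """Return 'a' or 'b' for the better method, or 'tie'."""
    key_a, key_b = greedy_key(energies_a), greedy_key(energies_b)
    if key_a < key_b:   # list comparison is lexicographic
        return "a"
    if key_b < key_a:
        return "b"
    return "tie"

def elite_mean(energies, fraction=0.02):
    """Mean energy of the lowest `fraction` of the returned energies."""
    e = np.sort(np.asarray(energies, dtype=float))
    n_elite = max(1, round(fraction * len(e)))
    return float(e[:n_elite].mean())

# Pooled energies over all gauges for one instance (made-up numbers).
uncorrected = [-10.0, -10.0, -9.0, -8.0]
corrected = [-10.0, -10.0, -10.0, -9.0]
winner = greedy_compare(uncorrected, corrected)  # same minimum, 'b' hits it more often
```

A lower elite mean indicates better performance. When both methods reach the ground state in more than 2% of the reads, their elite means both equal the ground-state energy, which is the source of the ties mentioned above.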