# Characterizing large-scale quantum computers via cycle benchmarking

## Abstract

Quantum computers promise to solve certain problems more efficiently than their digital counterparts. A major challenge towards practically useful quantum computing is characterizing and reducing the various errors that accumulate during an algorithm running on large-scale processors. Current characterization techniques are unable to adequately account for the exponentially large set of potential errors, including cross-talk and other correlated noise sources. Here we develop cycle benchmarking, a rigorous and practically scalable protocol for characterizing local and global errors across multi-qubit quantum processors. We experimentally demonstrate its practicality by quantifying such errors in non-entangling and entangling operations on an ion-trap quantum computer with up to 10 qubits, and total process fidelities for multi-qubit entangling gates ranging from $$99.6(1)\%$$ for 2 qubits to $$86(2)\%$$ for 10 qubits. Furthermore, cycle benchmarking data validates that the error rate per single-qubit gate and per two-qubit coupling does not increase with increasing system size.

## Introduction

Practical methods to characterize quantum processes acting on large-scale quantum systems are required to assess current devices and steer the development of future, more powerful devices. In principle, quantum processes can be fully characterized using, for example, quantum process tomography1 or gate set tomography2,3,4. However, any protocol for fully characterizing a quantum process requires a number of experiments and digital post-processing resources that grows exponentially with the number of qubits, even with improvements such as compressed sensing5,6. As a result, the largest quantum processes that have been fully characterized to date acted only on three qubits7.

The exponential resources required for a full characterization can be circumvented by extracting partial information about quantum processes. A partial characterization typically yields some figure of merit comparing the noisy implementation of a quantum process to the desired operation. We will consider the process fidelity (also known as the entanglement fidelity), which is equivalent to the average gate fidelity up to a dimensional factor that is approximately 1 (refs. 8,9).
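The dimensional factor can be made explicit with the standard relation $$F_{\rm avg}=(dF+1)/(d+1)$$ for a $$d={2}^{N}$$-dimensional system. The following minimal sketch evaluates it; the function name is illustrative and not taken from the original work.

```python
def average_gate_fidelity(process_fidelity: float, n_qubits: int) -> float:
    """Convert a process (entanglement) fidelity into an average gate fidelity."""
    d = 2 ** n_qubits  # Hilbert-space dimension of the register
    return (d * process_fidelity + 1) / (d + 1)


if __name__ == "__main__":
    # The two figures of merit converge as the register grows.
    for n in (1, 2, 10):
        print(n, average_gate_fidelity(0.99, n))
```

For a single qubit the two quantities differ noticeably, while for ten qubits the conversion factor is already within about one part in a thousand of 1.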

The process fidelity can be efficiently estimated by randomized benchmarking10,11,12 or direct fidelity estimation13,14,15. Direct fidelity estimation can be efficient and hence has been implemented for up to 7 qubits16 but conflates state preparation and measurement (SPAM) errors with the process fidelity, limiting its value for realistic systems. SPAM errors increase with the system size and so robustness to SPAM is increasingly important for many qubits. Randomized benchmarking decouples the SPAM errors from gate operation errors by applying multiple random elements of the $$N$$-qubit Clifford group11,12. However, implementing each Clifford operation requires $${\mathcal{O}}({N}^{2}/\log N)$$ primitive two-qubit operations17, so that randomized benchmarking provides very coarse information about the primitive operations. Furthermore, for error rates as low as $$0.1 \%$$ per two-qubit operation, a single 10-qubit Clifford operation will have a cumulative error rate on the order of $$10 \%$$, which substantially increases the number of measurements required to accurately estimate the process fidelity.
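The order-of-magnitude estimate for the cumulative error can be checked with a short calculation. This sketch assumes independent errors on each primitive gate; the gate counts used (50 and 100) are illustrative placeholders, since the $${\mathcal{O}}$$ scaling does not fix the constant prefactor.

```python
def cumulative_error(error_per_gate: float, n_gates: int) -> float:
    """Probability that at least one of n_gates independent gates fails."""
    return 1.0 - (1.0 - error_per_gate) ** n_gates


if __name__ == "__main__":
    # Hypothetical gate counts for one 10-qubit Clifford operation.
    for n_gates in (50, 100):
        print(n_gates, round(cumulative_error(0.001, n_gates), 4))
```

With roughly one hundred two-qubit gates at a $$0.1 \%$$ error each, the cumulative error is just under $$10 \%$$.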

Owing to these practical limitations, randomized benchmarking has only been applied on operations involving three or fewer qubits18. While randomized benchmarking can be performed on small subsets of the qubit register19, such experiments do not explore the full Hilbert space and therefore will not detect important performance-limiting error mechanisms such as cross-talk. Moreover, errors in operations must be characterized in the context in which they are used because control sequences for a specific gate are often distorted by other gates performed in parallel. One method to achieve this is to only perform gates in fixed modes of parallel operation. We refer to a parallel set of gates as a cycle, in analogy with a digital clock cycle. In typical architectures, there are two types of cycles, namely, cycles of single-qubit gates and cycles of multi-qubit gates. Undetected calibration and cross-talk errors will typically lead to coherent and spatially correlated errors that can lead to substantially larger algorithmic errors and can require higher overheads in fault-tolerant quantum error correction schemes20. Such errors can be converted to stochastic Pauli errors by randomizing the cycles of single-qubit gates in such a way that the overall ideal circuit remains unchanged, a technique known as randomized compiling (RC)21. The error rate due to the resulting stochastic Pauli errors can then be accurately quantified by the process fidelity.

In this paper, we introduce cycle benchmarking (CB), a protocol for estimating the process fidelity of the global noise process that occurs when a cycle of operations is applied to a quantum register. Under the assumption of Markovian noise such that the noise on each cycle of independent single-qubit gates is independent of the specific gates being implemented (see Supplementary Note 1), we prove that CB is robust to SPAM errors and that the number of measurements required to estimate the process fidelity to a fixed precision is approximately independent of the number of qubits. We demonstrate the practicality of CB for many-qubit systems by using it to experimentally estimate the process fidelity of both non-entangling Pauli operations and the multi-qubit entangling Mølmer–Sørensen (MS) gate22,23 acting on up to ten qubits. We also confirm that the protocol and analysis methods, derived under theoretical assumptions, produce consistent results in our experimental system.

## Results

### The CB protocol

We now outline how the CB protocol can quantify the effect of global and local error mechanisms affecting different primitive cycle operations of interest.

Mathematically, the ideal operation of interest is described by the corresponding unitary matrix $$G$$. Its action is expressed by a map $${\mathcal{G}}:\rho \to G\rho {G}^{\dagger }$$ that acts on the state of the quantum register, described by the density matrix $$\rho$$. We denote the map of an ideal operation by capital calligraphic letters, such as $${\mathcal{G}}$$, and their noisy experimental implementations will be indicated by an overset tilde, such as $$\tilde{{\mathcal{G}}}$$. We denote the composition of gates by the natural matrix operations for the map representation, so, e.g., $${\mathcal{R}}{\mathcal{G}}$$ means first apply $${\mathcal{G}}$$ then apply $${\mathcal{R}}$$, and $${{\mathcal{G}}}^{m}$$ means apply $${\mathcal{G}}$$ a total of $$m$$ times. A particularly important class of processes are Pauli cycles $${\mathcal{P}}$$, where the unitary matrix of the process is the $$N$$-qubit Pauli matrix $$P$$.

We evaluate the quality of a noisy process $$\tilde{{\mathcal{G}}}$$ by its process fidelity to the ideal target $${\mathcal{G}}$$, which can be written as13

$$F(\tilde{{\mathcal{G}}},{\mathcal{G}})=\sum _{P\in {\{I,X,Y,Z\}}^{\otimes N}}{4}^{-N}{F}_{P}(\tilde{{\mathcal{G}}},{\mathcal{G}}),$$
(1)

where

$${F}_{P}(\tilde{{\mathcal{G}}},{\mathcal{G}})={2}^{-N}{\rm{Tr}}\left[{\mathcal{G}}(P)\tilde{{\mathcal{G}}}(P)\right].$$
(2)

Each quantity $${F}_{P}(\tilde{{\mathcal{G}}},{\mathcal{G}})$$ can be experimentally estimated by preparing an eigenstate of $$P$$, applying the noisy gate $$\tilde{{\mathcal{G}}}$$, and then measuring the expectation value of the ideal outcome $${\mathcal{G}}(P)$$. The process fidelity may be estimated by averaging $${F}_{P}(\tilde{{\mathcal{G}}},{\mathcal{G}})$$ over a set of Pauli matrices. However, a sampling protocol (as in direct fidelity estimation13,14) for estimating these individual terms is not robust to SPAM errors. Robustness to SPAM is particularly important because SPAM errors can dominate the gate errors.
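As a numerical illustration of Eqs. (1) and (2), the sketch below evaluates the Pauli sum directly for a small register. It assumes the noisy implementation is supplied as a list of Kraus operators; the helper names and the depolarizing example are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np

# Single-qubit Pauli matrices.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_matrix(label):
    """Tensor product of single-qubit Paulis, e.g. "XZ" -> X (x) Z."""
    out = np.eye(1, dtype=complex)
    for ch in label:
        out = np.kron(out, PAULIS[ch])
    return out


def process_fidelity(noisy_kraus, ideal_unitary):
    """Average of F_P over all 4**N Paulis, following Eqs. (1) and (2)."""
    dim = ideal_unitary.shape[0]
    n_qubits = dim.bit_length() - 1
    total = 0.0
    for label in itertools.product("IXYZ", repeat=n_qubits):
        p = pauli_matrix(label)
        ideal = ideal_unitary @ p @ ideal_unitary.conj().T  # G(P)
        noisy = sum(k @ p @ k.conj().T for k in noisy_kraus)  # noisy map on P
        total += np.trace(ideal @ noisy).real / dim  # F_P
    return total / 4**n_qubits


def depolarizing_kraus(p):
    """Kraus operators of a single-qubit depolarizing channel with error probability p."""
    ops = [np.sqrt(1 - p) * PAULIS["I"]]
    ops += [np.sqrt(p / 3) * PAULIS[c] for c in "XYZ"]
    return ops
```

The loop runs over all $${4}^{N}$$ Pauli matrices, so this brute-force evaluation is only feasible for a few qubits; it is meant to make the definition concrete, not to replace sampling.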

Inspired by randomized benchmarking10, SPAM errors can be decoupled from the process fidelity by applying the noisy operation of interest $$\tilde{{\mathcal{G}}}$$ a total of $$m$$ times and extracting the process fidelity from the decay of $${F}_{P}({\tilde{{\mathcal{G}}}}^{m},{{\mathcal{G}}}^{m})$$ as a function of the sequence length $$m$$. Extracting a meaningful error per application of the gate of interest is nontrivial for generic noise channels24. However, decay rates can be extracted straightforwardly for Pauli noise channels, that is, classical mixtures of Pauli operations that are applied to the register randomly with given probability. Mathematically, a Pauli noise channel is a map

$${\mathcal{E}}:\rho \to \sum _{P\in {\{I,X,Y,Z\}}^{\otimes N}}\mu (P)P\rho {P}^{\dagger }$$
(3)

for some probability distribution $$\mu$$. Such channels cannot exactly describe, for example, small over-rotation errors or amplitude damping channels.

Since the noise in our system is generic, we want to engineer the noise such that it can be described well by a Pauli noise channel. It has been shown that this can be accomplished by introducing a random Pauli cycle $${\mathcal{R}}$$ at each time step between each application of the cycle of interest $${\mathcal{G}}$$ (refs. 25,26). This additional random Pauli cycle $${\mathcal{R}}$$ comes with an overhead that increases the number of gates required to implement a given algorithm. RC has been developed to eliminate this overhead21. The resulting noise channel when using RC is then associated with the composition of $${\mathcal{G}}$$ with a random Pauli cycle $${\mathcal{R}}$$, called a dressed cycle $${\mathcal{G}}{\mathcal{R}}$$, which is an important characterization primitive for any algorithm implemented via RC21. Therefore, CB estimates the average of the process fidelities of the dressed cycle $$\tilde{{\mathcal{G}}}\tilde{{\mathcal{R}}}$$:

$${F}_{{\rm{RC}}}(\tilde{{\mathcal{G}}},{\mathcal{G}})=\sum _{{\mathcal{R}}\in {\{{\mathcal{I}},{\mathcal{X}},{\mathcal{Y}},{\mathcal{Z}}\}}^{\otimes N}}{4}^{-N}F(\tilde{{\mathcal{G}}}\tilde{{\mathcal{R}}},{\mathcal{G}}{\mathcal{R}}).$$
(4)
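The effect of inserting random Pauli cycles can be illustrated on a single qubit. The sketch below averages a coherent over-rotation over the four Paulis (a Pauli twirl) and compares the Pauli transfer matrices before and after. The over-rotation model and all function names are assumptions made for illustration; this is not the experimental noise model.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]


def ptm(channel):
    """Pauli transfer matrix R_ij = Tr[P_i channel(P_j)] / 2 of a single-qubit map."""
    return np.array(
        [[np.trace(pi @ channel(pj)).real / 2 for pj in PAULIS] for pi in PAULIS]
    )


def over_rotation(theta):
    """Coherent error: an unwanted rotation by theta about the X axis."""
    u = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X
    return lambda rho: u @ rho @ u.conj().T


def pauli_twirl(channel):
    """Average the channel over conjugation by all four Pauli matrices."""
    def twirled(rho):
        terms = (q.conj().T @ channel(q @ rho @ q.conj().T) @ q for q in PAULIS)
        return sum(terms) / 4
    return twirled


if __name__ == "__main__":
    np.set_printoptions(precision=3, suppress=True)
    print(ptm(over_rotation(0.2)))               # off-diagonal (coherent) terms
    print(ptm(pauli_twirl(over_rotation(0.2))))  # diagonal: a Pauli channel
```

The twirl removes the off-diagonal entries while leaving the diagonal, and therefore the process fidelity, unchanged.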

In addition to the dressed cycle fidelity, the process fidelity of the noisy gate $$\tilde{{\mathcal{G}}}$$ alone is of interest. The process fidelity of a specific gate $$\tilde{{\mathcal{G}}}$$ may be estimated by taking the ratio of the estimates obtained for $$\tilde{{\mathcal{G}}}$$ and the identity process $$\tilde{{\mathcal{I}}}$$, in analogy to interleaved benchmarking27. It should be noted that this method of estimating the fidelity of the noise on $$\tilde{{\mathcal{G}}}$$ alone is generally subject to a large systematic uncertainty28, so the CB method is most precise in the important context of characterizing errors on dressed cycles21.

CB can be used to efficiently characterize non-Clifford gates by selecting random gates and correction operators using RC21. However, the general protocol for non-Clifford gates is more complex, so a simplified version for characterizing the errors occurring under a fixed cycle of Clifford gates $${\mathcal{G}}$$ composed with a random Pauli cycle $${\mathcal{R}}$$ is as follows (the protocol is illustrated in Fig. 1, where we explain the motivation for each step further below):

1. Select a set of $$N$$-qubit Pauli matrices $$\sf{P}$$ with $$K=| {\sf{P}}|$$ elements.

2. Select two lengths $${m}_{1}$$ and $${m}_{2}$$ such that the multiple application of $${\mathcal{G}}$$ composes to the identity $${{\mathcal{G}}}^{{m}_{1}}={{\mathcal{G}}}^{{m}_{2}}={\mathcal{I}}$$.

3. Perform the following sequence for each Pauli matrix $$P\in {\sf{P}}$$, length $$m\in ({m}_{1},{m}_{2})$$, and $$l\in (1,\ldots ,L)$$, where $$L$$ describes the number of random sequences per Pauli.

   3a. Select $$m+1$$ random $$N$$-qubit Pauli cycles $${{\mathcal{R}}}_{0},{{\mathcal{R}}}_{1},\ldots ,{{\mathcal{R}}}_{m}$$, and define the randomized circuit

   $${\mathcal{C}}(P)={{\mathcal{R}}}_{m}{\mathcal{G}}{{\mathcal{R}}}_{m-1}{\mathcal{G}}\ldots {{\mathcal{R}}}_{1}{\mathcal{G}}{{\mathcal{R}}}_{0}$$
   (5)

   as illustrated in Fig. 1.

   3b. Calculate the expected outcome of the sequence $${\mathcal{C}}(P)$$ assuming ideal gate implementations.

   3c. Main experiment: Implement $${\mathcal{C}}(P)$$ and estimate the overlap

   $${f}_{P,m,l}={\rm{Tr}}[{\mathcal{C}}(P)\ \tilde{{\mathcal{C}}}(\rho )]$$
   (6)

   between the expected outcome and the noisy implementation $$\tilde{{\mathcal{C}}}(\rho )$$ for some initial state $$\rho$$ that is a $$+1$$-eigenstate of $$P$$. State preparation and measurement are realized by applying the operations $${\tilde{{\mathcal{B}}}}_{P}$$ and $${\tilde{{\mathcal{B}}}}_{{\mathcal{C}}(P)}^{\dagger }$$ that are described in Supplementary Note 2.

4. Estimate the composite process fidelity via

   $${F}_{{\rm{RC}}}(\tilde{{\mathcal{G}}},{\mathcal{G}})={\sum _{{P}\in {\sf{P}}}}\frac{1}{|\sf{P}| }{\left(\frac{{\sum }_{l=1}^{L}{f}_{P,{m}_{2},l}}{{\sum }_{l=1}^{L}{f}_{P,{m}_{1},l}}\right)}^{\frac{1}{{m}_{2}-{m}_{1}}}.$$
   (7)

Step 1 ensures that the action of the $$N$$-qubit process is accurately estimated. In Supplementary Note 5, we prove that the uncertainty of the fidelity estimate is independent of the number of qubits $$N$$, and the number of Pauli matrices $$K$$ that need to be sampled depends only on the desired precision. This highlights the scalability of the protocol for large quantum processors.

Step 2 ensures that the measurement procedures for circuits in Eq. (5) with two different values of $$m$$ are the same. Having the same measurement procedures for the two values of $$m$$ is crucial to decouple the SPAM errors from the decay in the process fidelity via the ratio in Eq. (7). In our experiment, we always choose $${m}_{1}=4$$ and $${m}_{2}$$ to be an integer multiple of 4, as, for the considered gates, applying the operation four times in succession yields the identity process $${{\mathcal{G}}}^{4}={\mathcal{I}}$$.
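Admissible sequence lengths can be found numerically for a small cycle by searching for the smallest power of the gate that acts as the identity map, that is, equals the identity matrix up to a global phase. The sketch below assumes a two-qubit MS gate of the form $$\exp (-i\frac{\pi }{4}{\sigma }_{x}\otimes {\sigma }_{x})$$ as the example; the function name and the search cutoff are illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
XX = np.kron(X, X)
# Assumed two-qubit MS gate: exp(-i pi/4 XX) = (1 - i XX) / sqrt(2), since XX squares to 1.
MS = (np.eye(4, dtype=complex) - 1j * XX) / np.sqrt(2)


def cycle_order(unitary, max_order=16):
    """Smallest m >= 1 with unitary**m proportional to the identity, or None."""
    dim = unitary.shape[0]
    identity = np.eye(dim, dtype=complex)
    power = identity.copy()
    for m in range(1, max_order + 1):
        power = unitary @ power
        phase = power[0, 0]  # candidate global phase
        if np.isclose(abs(phase), 1.0) and np.allclose(power, phase * identity):
            return m
    return None


if __name__ == "__main__":
    print(cycle_order(MS))  # any multiple of this value is a valid length m
```

Any multiple of the returned order can serve as $${m}_{1}$$ or $${m}_{2}$$.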

In step 3a, we choose random Pauli cycles to engineer an effective Pauli noise process across the $$L$$ randomizations. This enables us to extract a process fidelity from the decay of $${\sum }_{l=1}^{L}{f}_{P,m,l}/L$$ with the sequence length $$m$$. Note that unlike typical randomized benchmarking protocols, the above protocol does not have an inversion gate. Formally, the final random Pauli can be regarded as a correction gate for the random Pauli gates in the rest of the circuit composed with another random Pauli that we use to isolate exponential decays as in character benchmarking29.

In step 3b, for any Clifford cycle $${\mathcal{G}}$$, Pauli matrix $$P$$, and Pauli cycles $${{\mathcal{R}}}_{0},\ldots ,{{\mathcal{R}}}_{m}$$, the expected outcome of the ideal implementation $${\mathcal{C}}(P)$$ is a Pauli matrix that can be efficiently calculated. Note that only the sign of $${\mathcal{C}}(P)$$ depends on the random Pauli cycles. This sign is accounted for when estimating the expectation value with the procedure outlined in Supplementary Note 2. Incorporating the sign engineers a measurement of the expectation value of $${\mathcal{C}}(P)$$ that is robust to SPAM errors, as otherwise the expectation values result from a multi-exponential decay24,29.
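For a small register, the ideal outcome and its sign can be computed by brute-force matrix conjugation. The following sketch assumes a two-qubit MS gate $$\exp (-i\frac{\pi }{4}{\sigma }_{x}\otimes {\sigma }_{x})$$ as the Clifford cycle; the helper names are hypothetical, and an efficient implementation would track Pauli labels and signs symbolically rather than multiplying matrices.

```python
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_op(label):
    """Matrix of an N-qubit Pauli given as a string such as "ZI"."""
    out = np.eye(1, dtype=complex)
    for ch in label:
        out = np.kron(out, PAULI[ch])
    return out


# Assumed example cycle: two-qubit MS gate exp(-i pi/4 XX).
MS = (np.eye(4, dtype=complex) - 1j * pauli_op("XX")) / np.sqrt(2)


def ideal_outcome(p_label, gate, cycle_labels):
    """Propagate P through R_m G R_{m-1} G ... R_1 G R_0 with ideal gates."""
    op = pauli_op(p_label)
    r0 = pauli_op(cycle_labels[0])
    op = r0 @ op @ r0.conj().T
    for label in cycle_labels[1:]:
        op = gate @ op @ gate.conj().T  # ideal cycle G
        r = pauli_op(label)
        op = r @ op @ r.conj().T        # random Pauli cycle
    return op


def sign_relative_to(op, p_label):
    """Sign s such that op = s * P, assuming op is +P or -P."""
    ref = pauli_op(p_label)
    return int(round(np.trace(ref @ op).real / ref.shape[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 4  # MS**4 acts as the identity map
    cycles = ["".join(rng.choice(list("IXYZ"), size=2)) for _ in range(m + 1)]
    out = ideal_outcome("ZI", MS, cycles)
    print(cycles, sign_relative_to(out, "ZI"))
```

Because $$m$$ is chosen so that $${{\mathcal{G}}}^{m}={\mathcal{I}}$$, the result is always $$\pm P$$, and only the sign varies between randomizations.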

In step 3c, we experimentally prepare an eigenstate of a Pauli matrix $$P$$, apply a circuit $$\tilde{{\mathcal{C}}}$$ with interleaved random Pauli cycles, and measure the expectation value of $${\mathcal{C}}(P)$$. The explicit procedures we use for preparing the eigenstate and measuring the expectation value are described in Supplementary Note 2. As discussed in Supplementary Note 5, the number of measurements required to estimate the expectation value to a fixed additive precision is independent of the number of qubits.

As we prove in Supplementary Note 4, the expected value of $${F}_{{\rm{RC}}}(\tilde{{\mathcal{G}}},{\mathcal{G}})$$ in Eq. (7) for two values of $${m}_{1}$$ and $${m}_{2}$$ as in step 2 is equal to the composite process fidelity $${F}_{{\rm{RC}}}(\tilde{{\mathcal{G}}},{\mathcal{G}})$$ in Eq. (4) up to $${\mathcal{O}}({[1-{F}_{{\rm{RC}}}(\tilde{{\mathcal{G}}},{\mathcal{G}})]}^{2})$$ and always provides a lower bound.
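The estimator in Eq. (7) is straightforward to implement once the overlaps have been measured. The sketch below assumes the overlaps are stored in arrays of shape $$(K,L)$$, one row per Pauli and one column per random sequence; the function name and the synthetic data in the usage example are illustrative.

```python
import numpy as np


def cb_fidelity(f_m1, f_m2, m1, m2):
    """Eq. (7): average over Paulis of (sum_l f_{P,m2,l} / sum_l f_{P,m1,l})**(1/(m2-m1))."""
    f_m1 = np.asarray(f_m1, dtype=float)
    f_m2 = np.asarray(f_m2, dtype=float)
    ratios = f_m2.sum(axis=1) / f_m1.sum(axis=1)  # one ratio per Pauli
    return float(np.mean(ratios ** (1.0 / (m2 - m1))))


if __name__ == "__main__":
    # Synthetic overlaps f = A_P * lambda_P**m with SPAM factors A_P.
    decay = np.array([0.99, 0.98, 0.97])[:, None]
    spam = np.array([0.90, 0.80, 0.95])[:, None]
    ones = np.ones((3, 5))
    f4, f8 = spam * decay**4 * ones, spam * decay**8 * ones
    print(cb_fidelity(f4, f8, 4, 8))
```

In the synthetic example the SPAM prefactors cancel in the ratio, so the estimate depends only on the decay rates. Note that with noisy data a ratio can become negative, in which case the fractional power is undefined and more statistics are needed.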

### Experimental results

We demonstrate the practicality of CB for multi-qubit systems by using it to experimentally estimate the process fidelity of cycles acting globally on quantum registers containing 2, 4, 6, 8, and 10 qubits. The specific cycles we consider consist of simultaneous local Pauli gates and multi-qubit entangling MS gates22,23 combined with simultaneous local Pauli gates. We confine $${}^{40}{{\rm{Ca}}}^{+}$$ ions in a linear Paul-trap and encode a single qubit in the electronic states of each atomic ion. The encoding utilizes the $$\left|0\right\rangle =4{S}_{1/2}({m}_{j}=-1/2)$$ ground-state and the $$\left|1\right\rangle =3{D}_{5/2}({m}_{j}=-1/2)$$ metastable excited state. Our quantum computing toolbox comprises independent arbitrary single-qubit operations and fully entangling $$N$$-qubit MS gates, acting on all $$N$$ qubits in the register simultaneously (see Supplementary Note 7). An experimental run consists of: (i) Doppler cooling, (ii) sideband-cooling of the two motional modes with lowest frequencies, (iii) optical pumping to the initial state $${\left|0\right\rangle }^{\otimes N}$$, (iv) coherent manipulation, and (v) readout of the ions. Each sequence is repeated 100 times to gather statistics (for experimental details, see Supplementary Note 7 and ref. 30).

Under Markovian noise, the estimate of the process fidelity from Eq. (7) is independent of the sequence lengths $${m}_{1}$$ and $${m}_{2}$$ used to estimate it (see Supplementary Note 3). We tested whether our experimental apparatus satisfied this assumption by performing measurements at three values of $$m$$ (4, 8, and 12) on a register containing 6 qubits and comparing the results obtained from pairs of sequence lengths against each other. The data are tabulated in Supplementary Table 2, where the variation of the estimated fidelities is within 0.1%, which is smaller than the corresponding uncertainties of 0.4%. This suggests that the errors are Markovian and that the estimated process fidelity is independent of the chosen sequence lengths for our system. Henceforth, we only use two sequence lengths to estimate the process fidelity.
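A consistency check of this kind can be sketched as follows for a single Pauli: compute the decay rate from every pair of sequence lengths and compare the results. The input values below are synthetic placeholders, not the data of Supplementary Table 2, and the function name is illustrative.

```python
from itertools import combinations


def pairwise_decay_estimates(mean_overlap_by_length):
    """Map each pair (m1, m2) to (f_m2 / f_m1) ** (1 / (m2 - m1))."""
    estimates = {}
    for m1, m2 in combinations(sorted(mean_overlap_by_length), 2):
        ratio = mean_overlap_by_length[m2] / mean_overlap_by_length[m1]
        estimates[(m1, m2)] = ratio ** (1.0 / (m2 - m1))
    return estimates


if __name__ == "__main__":
    # Hypothetical mean overlaps following a single exponential decay.
    overlaps = {m: 0.9 * 0.97**m for m in (4, 8, 12)}
    for pair, rate in pairwise_decay_estimates(overlaps).items():
        print(pair, round(rate, 6))
```

For a single exponential decay all pairs return the same rate; a spread larger than the statistical uncertainty would indicate a violation of the assumptions.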

The CB protocol is practical to implement on large processors because the fidelity can be accurately estimated using a number of Pauli matrices that is independent of the number of qubits $$N$$ (see Supplementary Note 5). To illustrate the rapid convergence under finite sample size, we performed CB of local Pauli operations on a 4-qubit register by exhaustively estimating all $${4}^{4}-1=255$$ possible decay rates. We estimate the average fidelities via Eq. (7) for multiple subsets $${\sf{P}}$$ of the set of all Pauli matrices. For each $$K=1,\ldots ,100$$, we evaluate the fidelity for 30 randomly chosen subsets $${\sf{P}}$$ containing $$| {\sf{P}}| =K$$ Pauli matrices. The mean and standard deviation of the estimated fidelities as functions of the subset size are shown in Fig. 2. In Fig. 2b, we introduce two boundaries between which the observed standard deviation should lie if we are choosing appropriate sequence lengths and sample sufficiently many random circuits per sequence length. For the lower bound, we assume quantum projection noise to be the only noise source. We evaluate the shot noise for the measured data and perform error propagation to calculate the lower bound $${\sigma }_{{\rm{lower}}}=0.00375(1)/\sqrt{K}$$. This lower limit could be reached if the noise in the system is completely isotropic (e.g., global depolarizing). Biased noise or drift (see Supplementary Note 10) will lead to uncertainties bigger than those originating from quantum projection noise. We furthermore test that the fluctuations between different Pauli channels are bounded by an error model that assumes worst-case fluctuations between channels. This bound does not depend on the register size but only on the fidelity $$F$$ and can be estimated via $${\sigma }_{{\rm{Pauli}}}=(1-F)/\sqrt{K}=0.0275(8)/\sqrt{K}$$ (see Supplementary Note 5).

The observed standard error of the mean $$\sigma =0.0127(2)/\sqrt{K}$$ is larger than the lower bound given by quantum projection noise but smaller than the worst-case bound from sampling finite Pauli channels. The data demonstrate that we can estimate the process fidelity $$F$$ to an uncertainty smaller than $$(1-F)/\sqrt{K}$$ independent of the register size with other experimental parameters held fixed (the parameters are listed in Supplementary Table 1).
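The subset-sampling analysis can be sketched as below. The decay rates in the usage example are synthetic (drawn from a normal distribution purely for illustration) and do not reproduce the measured values; the function name and defaults are assumptions.

```python
import numpy as np


def subset_spread(decay_rates, subset_size, n_subsets=30, seed=0):
    """Mean and standard deviation of the fidelity over random Pauli subsets of size K."""
    rng = np.random.default_rng(seed)
    means = [
        rng.choice(decay_rates, size=subset_size, replace=False).mean()
        for _ in range(n_subsets)
    ]
    return float(np.mean(means)), float(np.std(means, ddof=1))


if __name__ == "__main__":
    # 255 hypothetical per-Pauli decay rates for a 4-qubit register.
    rates = 0.96 + 0.01 * np.random.default_rng(1).standard_normal(255)
    for k in (1, 10, 100):
        print(k, subset_spread(rates, k))
```

The spread of the subset averages shrinks roughly as $$1/\sqrt{K}$$, which is the scaling assumed by both bounds discussed above.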

We performed CB on local operations and with an interleaved MS gate on registers containing 2, 4, 6, 8, and 10 qubits. The process fidelity as a function of the number of qubits in the register is shown in Fig. 3 and Table 1. While it is expected that the fidelity over the full register decreases with increasing register size, an important question is whether the effective error rate per qubit increases or significant cross-talk effects appear, with increasing numbers of qubits.

We observe that the fidelity for local CB (blue circles in Fig. 3a) decays linearly with register size $$N$$, as

$$F=1-{\epsilon }_{P}N,$$
(8)

with $${\epsilon }_{P}=0.011(2)$$. The linear decay of the fidelity indicates that our single-qubit Pauli operations do not show increasing error rates per qubit or a significant onset of cross-talk errors as the register size increases. Each single-qubit Pauli operation requires $${n}_{{\rm{S}}}$$ native gates, where on average $$\langle {n}_{{\rm{S}}}\rangle =1.27$$, independent of the system size. Therefore, the effective process fidelity of a native single-qubit gate is $$1-{\epsilon }_{P}/\langle {n}_{{\rm{S}}}\rangle =0.992(1)$$.
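The arithmetic behind these numbers can be reproduced with the rounded central values quoted above. In this sketch the function names are illustrative and uncertainties are not propagated.

```python
def local_cycle_fidelity(n_qubits: int, eps_p: float = 0.011) -> float:
    """Linear model of Eq. (8): F = 1 - eps_P * N."""
    return 1.0 - eps_p * n_qubits


def native_gate_fidelity(eps_p: float = 0.011, mean_native_gates: float = 1.27) -> float:
    """Effective fidelity per native single-qubit gate: 1 - eps_P / <n_S>."""
    return 1.0 - eps_p / mean_native_gates


if __name__ == "__main__":
    print([round(local_cycle_fidelity(n), 3) for n in (2, 4, 6, 8, 10)])
    print(round(native_gate_fidelity(), 4))
```

With the rounded inputs the per-gate value evaluates to about 0.991, which agrees with the quoted 0.992(1) within its stated uncertainty.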

The CB measurements with interleaved MS gates give the process fidelity of the MS gate composed with a round of local randomizing gates as in Eq. (4) (a dressed MS gate, see red diamonds in Fig. 3a). This determines the error rate when a circuit is implemented by RC21. The process fidelity of the interleaved gate can be estimated by the ratio of the dressed MS and local fidelities as in interleaved randomized benchmarking27. The resulting estimates are plotted in Fig. 3b. We note that these estimates may have a large systematic error that is on the same order as the overall error rate28. This systematic uncertainty primarily arises due to coherent over- and under-rotations with similar rotation axes. The MS gate performs rotations around the non-local axes $${\sigma }_{x}^{(i)}\otimes {\sigma }_{x}^{(j)}$$, which are substantially different from the single-qubit rotation axes. Therefore, it is unlikely that any coherent errors on the MS gate accumulate with the errors on the single-qubit rotations, and so we neglect this systematic error. We conjecture that the process fidelity of the MS gate should decay quadratically due to an error in each of the $$\left({{N}\atop{2}}\right)$$ couplings between pairs of qubits introduced by the MS gate. If we assume an average error rate $${\epsilon }_{2}$$ per two-qubit coupling, we can describe the MS gate fidelity as

$${F}_{{\rm{MS}}}=1-{\epsilon }_{2}\frac{{N}^{2}-N}{2}\ .$$
(9)

Fitting this model to the results in Fig. 3b gives an estimated error per two-qubit coupling of $${\epsilon }_{2}=0.0030(2)$$. However, we cannot harness these two-qubit couplings individually in the experiment and thus they cannot be compared to individually available gates. The deviations of the fidelity estimates from the model defined in Eq. (9) are within the expected statistical uncertainty and we believe that these deviations arise mainly from day-to-day fluctuations in the experiment.
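The quadratic model of Eq. (9) and a one-parameter least-squares fit can be sketched as follows. The fit routine is a generic illustration, not the analysis code used for Fig. 3b, and the usage example feeds it model-generated rather than measured fidelities.

```python
def ms_model_fidelity(n_qubits: int, eps_2: float = 0.0030) -> float:
    """Eq. (9): F_MS = 1 - eps_2 * N (N - 1) / 2."""
    n_pairs = n_qubits * (n_qubits - 1) // 2  # number of two-qubit couplings
    return 1.0 - eps_2 * n_pairs


def fit_eps_2(n_qubits_list, fidelities):
    """Least-squares estimate of eps_2 for the model 1 - F = eps_2 * pairs."""
    pairs = [n * (n - 1) // 2 for n in n_qubits_list]
    numerator = sum(p * (1.0 - f) for p, f in zip(pairs, fidelities))
    denominator = sum(p * p for p in pairs)
    return numerator / denominator


if __name__ == "__main__":
    sizes = [2, 4, 6, 8, 10]
    model = [ms_model_fidelity(n) for n in sizes]
    print([round(f, 4) for f in model])
    print(fit_eps_2(sizes, model))
```

Evaluating the model at $${\epsilon }_{2}=0.0030$$ gives 0.997 for 2 qubits and 0.865 for 10 qubits.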

## Discussion

In summary, we have developed CB and demonstrated its practicality by implementing it on quantum registers containing $$N=2$$, 4, 6, 8, and 10 qubits. In comparison, a single random Clifford gate for 8 and 10 qubits would require >50 MS gates and so randomized benchmarking for 8 and 10 qubits would require a large number of measurements to achieve a useful statistical precision. CB is practical in regimes where randomized benchmarking is impractical because it uses local randomizing gates. A similar approach was independently considered in refs. 29,31 to characterize a two-qubit Clifford gate. However, the approach implemented here and proposed previously in ref. 26 can be applied in a scalable manner to processors with arbitrary numbers of qubits.

The total experimental time and post-processing resources required for our implementation were approximately independent of the number of qubits (see Supplementary Table 1), after accounting for the additional tests performed on specific numbers of qubits. This is achieved because, as we prove in Supplementary Note 5, the uncertainty of the fidelity estimate is independent of the number of qubits $$N$$, and the number of Pauli matrices $$K$$ that need to be sampled depends only on the desired precision. In addition, we demonstrated experimentally that the estimate of the fidelity and its error converge rapidly under finite sample size (Fig. 2) and that the estimated fidelities are approximately independent of the sequence lengths used. The data from CB also give estimates of the diagonal of the Pauli–Liouville representation of the effective noise. A natural open question is to use this procedure to reconstruct the underlying noise model, which we leave for future work.

CB can be readily implemented on general quantum computing architectures to estimate the fidelity of multi-qubit processes. The fidelity corresponds to the effective error rate under RC32. It should be noted that the performance of the same operation in a circuit without RC can differ significantly from the estimated fidelity of its constituents due to the addition or cancellation of coherent errors33. This is a general issue with performance metrics for quantum operations34 and we want to emphasize that RC has been designed to eliminate these coherent errors. The protocol also provides insight into how noise scales within a fixed architecture. In our ion trap, the fidelity of local gates across the whole register decreased linearly with $$N$$, demonstrating that our native single-qubit gates have an average fidelity of 99.2(1)% and do not deteriorate with the register size. Thus we have demonstrated a scalable method to validate a major requirement for fault-tolerant quantum computation. In addition, we performed interleaved CB protocols to estimate the performance of the multi-qubit entangling MS gate. From the ratio between the dressed MS and the local CB fidelities, we infer entangling gate fidelities ranging from 99.6(1)% to 86(2)% for 2–10 qubits. While this inference is in principle subject to a large systematic uncertainty28, we have argued that the systematic uncertainty should be small for our set of operations. We leave the problem of quantifying or reducing this systematic uncertainty open, but note that a natural approach would be to quantify how coherent the errors are using generalizations of either purity benchmarking35 or iterated interleaved benchmarking36.

## Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## References

1. Chuang, I. L. & Nielsen, M. A. Prescription for experimental determination of the dynamics of a quantum black box. J. Mod. Opt. 44, 2455 (1997).
2. Merkel, S. T. et al. Self-consistent quantum process tomography. Phys. Rev. A 87, 062119 (2013).
3. Blume-Kohout, R. et al. Robust, self-consistent, closed-form tomography of quantum logic gates on a trapped ion qubit. Preprint at https://arxiv.org/abs/1310.4492 (2013).
4. Blume-Kohout, R. et al. Demonstration of qubit operations below a rigorous fault tolerance threshold with gate set tomography. Nat. Commun. 8, 14485 (2017).
5. Flammia, S. T., Gross, D., Liu, Y. K. & Eisert, J. Quantum tomography via compressed sensing: error bounds, sample complexity and efficient estimators. New J. Phys. 14, 095022 (2012).
6. Rodionov, A. V. et al. Compressed sensing quantum process tomography for superconducting quantum gates. Phys. Rev. B 90, 144504 (2014).
7. Weinstein, Y. S. et al. Quantum process tomography of the quantum Fourier transform. J. Chem. Phys. 121, 6117 (2004).
8. Horodecki, M., Horodecki, P. & Horodecki, R. General teleportation channel, singlet fraction, and quasidistillation. Phys. Rev. A 60, 1888 (1999).
9. Nielsen, M. A. A simple formula for the average gate fidelity of a quantum dynamical operation. Phys. Lett. A 303, 249 (2002).
10. Emerson, J., Alicki, R. & Życzkowski, K. Scalable noise estimation with random unitary operators. J. Optics B 7, S347 (2005).
11. Dankert, C., Cleve, R., Emerson, J. & Livine, E. Exact and approximate unitary 2-designs and their application to fidelity estimation. Phys. Rev. A 80, 012304 (2009).
12. Magesan, E., Gambetta, J. M. & Emerson, J. Scalable and robust randomized benchmarking of quantum processes. Phys. Rev. Lett. 106, 180504 (2011).
13. Flammia, S. T. & Liu, Y. K. Direct fidelity estimation from few Pauli measurements. Phys. Rev. Lett. 106, 230501 (2011).
14. da Silva, M. P., Landon-Cardinal, O. & Poulin, D. Practical characterization of quantum devices without tomography. Phys. Rev. Lett. 107, 210404 (2011).
15. Moussa, O., da Silva, M. P., Ryan, C. A. & Laflamme, R. Practical experimental certification of computational quantum gates using a twirling procedure. Phys. Rev. Lett. 109, 070504 (2012).
16. Lu, D. et al. Experimental estimation of average fidelity of a Clifford gate on a 7-qubit quantum processor. Phys. Rev. Lett. 114, 140505 (2015).
17. Aaronson, S. & Gottesman, D. Improved simulation of stabilizer circuits. Phys. Rev. A 70, 052328 (2004).
18. McKay, D. C., Sheldon, S., Smolin, J. A., Chow, J. M. & Gambetta, J. M. Three qubit randomized benchmarking. Phys. Rev. Lett. 122, 200502 (2019).
19. Gambetta, J. M. et al. Characterization of addressability by simultaneous randomized benchmarking. Phys. Rev. Lett. 109, 240504 (2012).
20. Preskill, J. Sufficient condition on noise correlations for scalable quantum computing. Quantum Inf. Comput. 13, 181 (2013).
21. Wallman, J. J. & Emerson, J. Noise tailoring for scalable quantum computation via randomized compiling. Phys. Rev. A 94, 052325 (2016).
22. Sørensen, A. & Mølmer, K. Quantum computation with ions in thermal motion. Phys. Rev. Lett. 82, 1971 (1999).
23. Sørensen, A. & Mølmer, K. Entanglement and quantum computation with ions in thermal motion. Phys. Rev. A 62, 022311 (2000).
24. Carignan-Dugas, A., Wallman, J. J. & Emerson, J. Characterizing universal gate sets via dihedral benchmarking. Phys. Rev. A 92, 060302 (2015).
25. Knill, E. Quantum computing with realistically noisy devices. Nature 434, 39 (2005).
26. Wallman, J. J. & Emerson, J. System and methods for local randomized benchmarking. US patent application 2019/0026211 A1 (2019). 2017-07-23.
27. Magesan, E. et al. Efficient measurement of quantum gate error by interleaved randomized benchmarking. Phys. Rev. Lett. 109, 080505 (2012).
28. Carignan-Dugas, A., Wallman, J. J. & Emerson, J. Bounding the average gate fidelity of composite channels using the unitarity. New J. Phys. 21, 053016 (2019).
29. Helsen, J., Xue, X., Vandersypen, L. M. K. & Wehner, S. A new class of efficient randomized benchmarking protocols. npj Quantum Information 5, https://www.nature.com/articles/s41534-019-0182-7 (2019).
30. Schindler, P. et al. A quantum information processor with trapped ions. New J. Phys. 15, 123012 (2013).
31. Xue, X. et al. Benchmarking gate fidelities in a Si/SiGe two-qubit device. Phys. Rev. X 9, 021011 (2019).
32. Wallman, J. J. Randomized benchmarking with gate-dependent noise. Quantum 2, 47 (2018).
33. Wallman, J. J. Error rates in quantum circuits. Preprint at https://arxiv.org/pdf/1511.00727.pdf (2015).
34. Gilchrist, A., Langford, N. K. & Nielsen, M. A. Distance measures to compare real and ideal quantum processes. Phys. Rev. A 71, 062310 (2005).
35. Wallman, J. J., Granade, C., Harper, R. & Flammia, S. T. Estimating the coherence of noise. New J. Phys. 17, 113020 (2015).
36. Sheldon, S. et al. Characterizing errors on qubit operations via iterative randomized benchmarking. Phys. Rev. A 93, 012301 (2016).

## Acknowledgements

We gratefully acknowledge support by the Austrian Science Fund (FWF), through the SFB Fo-QuS (FWF Project No. F4002-N16), as well as the Institut für Quanteninformation GmbH. In addition, we acknowledge support from the Austrian Research Promotion Agency (FFG) contract 872766. This research was funded by the Office of the Director of National Intelligence (ODNI) and Intelligence Advanced Research Projects Activity (IARPA) through the Army Research Office grant W911NF-16-1-0070. All statements of fact, opinions, or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI, or the U.S. Government. We also acknowledge support by U.S. A.R.O. through grant W911NF-14-1-0103. This research was undertaken thanks in part to funding from TQT, CIFAR, the Government of Ontario, and the Government of Canada through CFREF, NSERC, and Industry Canada. We want to thank the anonymous referees for their valuable remarks.

## Author information

A.E., J.J.W., P.S., T.M., J.E. and R.B. wrote the manuscript and provided revisions. J.J.W., T.M. and P.S. developed the research based on discussions with J.E. and R.B. J.J.W. and J.E. developed the theory. A.E., E.M. and P.S. performed the experiments. A.E., E.A.M., P.S., L.P., M.M. and R.S. contributed to the experimental set-up. A.E. and J.J.W. analyzed the data. All authors contributed to discussions of the results and the manuscript.

Correspondence to Joel J. Wallman or Thomas Monz.

## Ethics declarations

### Competing interests

J.J.W. and J.E. are founding members of Quantum Benchmark Inc. T.M. and R.B. are founding members of Alpine Quantum Technologies GmbH.

Peer review information Nature Communications thanks Jungsang Kim and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Erhard, A., Wallman, J.J., Postler, L. et al. Characterizing large-scale quantum computers via cycle benchmarking. Nat Commun 10, 5347 (2019) doi:10.1038/s41467-019-13068-7
