Characterizing large-scale quantum computers via cycle benchmarking

Quantum computers promise to solve certain problems more efficiently than their digital counterparts. A major challenge towards practically useful quantum computing is characterizing and reducing the various errors that accumulate during an algorithm running on large-scale processors. Current characterization techniques are unable to adequately account for the exponentially large set of potential errors, including cross-talk and other correlated noise sources. Here we develop cycle benchmarking, a rigorous and practically scalable protocol for characterizing local and global errors across multi-qubit quantum processors. We experimentally demonstrate its practicality by quantifying such errors in non-entangling and entangling operations on an ion-trap quantum computer with up to 10 qubits, and total process fidelities for multi-qubit entangling gates ranging from 99.6(1)% for 2 qubits to 86(2)% for 10 qubits. Furthermore, cycle benchmarking data validates that the error rate per single-qubit gate and per two-qubit coupling does not increase with increasing system size.

Practical methods to characterize quantum processes acting on large-scale quantum systems are required to assess current devices and steer the development of future, more powerful, devices. In principle, quantum processes can be fully characterized using, for example, quantum process tomography [1] or gate set tomography [2-4]. However, any protocol for fully characterizing a quantum process requires a number of experiments and digital post-processing resources that grows exponentially with the number of qubits, even with improvements such as compressed sensing [5,6]. As a result, the largest quantum processes that have been fully characterized to date acted only on three qubits [7].
The exponential resources required for a full characterization can be circumvented by extracting partial information about quantum processes. A partial characterization typically yields some figure of merit, such as the process fidelity [8], comparing the noisy implementation of a quantum process to the desired operation.
The process fidelity can be efficiently estimated by randomized benchmarking [9-11] or direct fidelity estimation [12-14]. Direct fidelity estimation can be efficient and hence has been implemented for up to 7 qubits [15], but conflates state preparation and measurement (SPAM) errors with the process fidelity, limiting its value for realistic systems. SPAM errors increase with the system size and so robustness to SPAM is increasingly important for many qubits. Randomized benchmarking decouples the SPAM errors from gate operation errors by applying multiple random elements of the $N$-qubit Clifford group [10,11]. However, implementing each Clifford operation requires $O(N^2/\log N)$ primitive two-qubit operations [16], so that randomized benchmarking provides very coarse information about the primitive operations. Furthermore, for error rates as low as 0.1% per two-qubit operation, a single 10-qubit Clifford operation will have a cumulative error rate on the order of 10%, which substantially increases the number of measurements required to accurately estimate the process fidelity. Due to these practical limitations, randomized benchmarking has only been applied to operations involving three or fewer qubits [17]. While randomized benchmarking can be performed on small subsets of the qubit register [18], such experiments do not explore the full Hilbert space and therefore will not detect important performance-limiting error mechanisms such as cross-talk. Most crucially, undetected cross-talk and other spatially correlated errors will typically require much higher overheads in fault-tolerant quantum error correction schemes [19]. Hence characterizing all significant errors affecting an entire register is a critical prerequisite for scalable quantum computation. To achieve this, we focus on the concept of a cycle of operations (introduced in [20]), which is a set of operations that act on an entire quantum register within a set period of time, in analogy to a digital clock cycle.
In this paper, we introduce cycle benchmarking (CB), a protocol for estimating the effect of all global and local error mechanisms that occur when a clock cycle of operations is applied to a quantum register. We prove that CB is robust to SPAM errors and that the number of measurements required to estimate the process fidelity to a fixed precision is approximately independent of the number of qubits. We demonstrate the practicality of CB for many-qubit systems by using it to experimentally estimate the process fidelity of both non-entangling Pauli operations and the multi-qubit entangling Mølmer-Sørensen (MS) gate [21,22] acting on up to ten qubits.

arXiv:1902.08543v1 [quant-ph] 22 Feb 2019
We also confirm that the protocol and analysis methods, derived under theoretical assumptions, produce consistent results in our experimental system.
Mathematically, the ideal operation of interest is described by the corresponding unitary matrix $G$. Its action is expressed by a map $\mathcal{G}: \rho \to G\rho G^\dagger$ that acts on the state of the quantum register, described by the density matrix $\rho$. We denote the map of an ideal operation by capital calligraphic letters, such as $\mathcal{G}$, and their noisy experimental implementations will be indicated by an overset tilde, such as $\tilde{\mathcal{G}}$. We denote the composition of gates by the natural matrix operations for the map representation, so, e.g., $\mathcal{R}\mathcal{G}$ means first apply $\mathcal{G}$ then apply $\mathcal{R}$, and $\mathcal{G}^m$ means apply $\mathcal{G}$ a total of $m$ times. A particularly important class of processes are Pauli cycles $\mathcal{P}$, where the unitary matrix of the process is the $N$-qubit Pauli matrix $P$.
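We evaluate the quality of a noisy process $\tilde{\mathcal{G}}$ by its process fidelity to the ideal target $\mathcal{G}$, which can be written as [12]

$$F(\tilde{\mathcal{G}}, \mathcal{G}) = \sum_{P \in \mathbb{P}^N} 4^{-N} F_P(\tilde{\mathcal{G}}, \mathcal{G}), \tag{1}$$

where $\mathbb{P}^N$ denotes the set of $N$-qubit Pauli matrices and

$$F_P(\tilde{\mathcal{G}}, \mathcal{G}) = 2^{-N}\,\mathrm{Tr}\left[\mathcal{G}(P)\,\tilde{\mathcal{G}}(P)\right]. \tag{2}$$

Each quantity $F_P(\tilde{\mathcal{G}}, \mathcal{G})$ can be experimentally estimated by preparing an eigenstate of $P$, applying the noisy gate $\tilde{\mathcal{G}}$, and then measuring the expectation value of the ideal outcome $\mathcal{G}(P)$. The process fidelity may be estimated by averaging $F_P(\tilde{\mathcal{G}}, \mathcal{G})$ over a set of Pauli matrices. However, a sampling protocol (as in direct fidelity estimation [12,13]) for estimating these individual terms is not robust to SPAM errors. Robustness to SPAM is particularly important because SPAM errors can dominate the gate errors.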
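As a minimal sketch of how the process fidelity arises as an average of Pauli fidelities, the following Python snippet evaluates $F_P$ for a purely illustrative Pauli noise channel $\rho \to \sum_Q p(Q)\, Q\rho Q^\dagger$ with the identity as the ideal gate. For such a channel, $F_P = \sum_Q p(Q)\,\eta(Q,P)$, where $\eta(Q,P) = \pm 1$ records whether $Q$ and $P$ commute. The channel weights and the function names are assumptions made for illustration and do not come from the experiment.

```python
from itertools import product


def commute_sign(p: str, q: str) -> int:
    """Return +1 if the Pauli strings p and q commute, -1 if they anticommute."""
    anti = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return -1 if anti % 2 else 1


def pauli_fidelity(p: str, channel: dict) -> float:
    """F_P for a Pauli channel (ideal gate = identity): sum_Q p(Q) * eta(Q, P)."""
    return sum(prob * commute_sign(q, p) for q, prob in channel.items())


def process_fidelity(channel: dict, n_qubits: int) -> float:
    """Average F_P uniformly over all 4^N Pauli strings."""
    paulis = ["".join(s) for s in product("IXYZ", repeat=n_qubits)]
    return sum(pauli_fidelity(p, channel) for p in paulis) / 4 ** n_qubits


# Hypothetical two-qubit Pauli channel (weights chosen only for illustration).
channel = {"II": 0.97, "XI": 0.01, "IZ": 0.01, "ZZ": 0.01}
print(process_fidelity(channel, 2))  # equals the identity weight 0.97 up to rounding
```

The average recovers the weight of the identity term because $\sum_P \eta(Q,P)$ vanishes for every non-identity $Q$.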
We now outline how the CB protocol can quantify the effect of global and local error mechanisms affecting different primitive cycle operations of interest. Inspired by randomized benchmarking [9], SPAM errors can be decoupled from the process fidelity by applying the noisy operation of interest $\tilde{\mathcal{G}}$ a total of $m$ times and extracting the process fidelity from the decay of $F_P(\tilde{\mathcal{G}}^m, \mathcal{G}^m)$ as a function of the sequence length $m$. Extracting a meaningful error per application of the gate of interest is nontrivial for generic noise channels [23]. However, the process fidelity can be rigorously extracted from the decay of $F_P(\tilde{\mathcal{G}}^m, \mathcal{G}^m)$ with $m$ if the noise process is a Pauli channel. We can engineer an effective Pauli noise channel by introducing a round of random Pauli cycles at each time step between each application of the cycle of interest [24], and the overhead for this randomization can then be eliminated via randomized compiling [20]. The effective noise is then associated with the composition of $\mathcal{G}$ with a random Pauli cycle, called a dressed cycle, which is an important characterization primitive for any algorithm implemented via randomized compiling [20]. Therefore the estimated quantity is the average of the process fidelities of the composite cycle of $\mathcal{G}$ combined with a uniformly random Pauli cycle $\mathcal{R}$,

$$F_{RC}(\tilde{\mathcal{G}}, \mathcal{G}) = \sum_{R \in \mathbb{P}^N} 4^{-N} F(\tilde{\mathcal{G}}\tilde{\mathcal{R}}, \mathcal{G}\mathcal{R}). \tag{3}$$

The process fidelity of the noise on $\mathcal{G}$ alone may also be estimated by taking the ratio of the estimates obtained for $\tilde{\mathcal{G}}$ and the identity process $\tilde{\mathcal{I}}$, in analogy to interleaved benchmarking [25]. It should be noted that this method of estimating the fidelity of the noise on $\mathcal{G}$ alone is generally subject to a large systematic uncertainty [26], so the CB method is most precise in the important context of characterizing errors on dressed cycles [20].
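The full cycle benchmarking protocol for characterizing the errors occurring under a fixed cycle of Clifford gates $\mathcal{G}$ composed with a random Pauli cycle $\mathcal{R}$ is as follows, illustrated in Fig. 1, where we explain the motivation for each step further below:
1. Select a set of $N$-qubit Pauli matrices $\mathsf{P}$ with $K = |\mathsf{P}|$ elements.
2. Select two lengths $m_1$ and $m_2$ such that the multiple application of $\mathcal{G}$ composes to the identity, as illustrated in Fig. 1.
3. Perform the following sequence for each Pauli matrix $P \in \mathsf{P}$, length $m \in \{m_1, m_2\}$, and randomization $l \in \{1, \ldots, L\}$:
3a. Select $m + 1$ random $N$-qubit Pauli cycles $\mathcal{R}_0, \ldots, \mathcal{R}_m$ and define the sequence

$$\mathcal{C}(P) = \mathcal{R}_m \mathcal{G} \mathcal{R}_{m-1} \mathcal{G} \cdots \mathcal{R}_1 \mathcal{G} \mathcal{R}_0 (P). \tag{4}$$

3b. Calculate the expected outcome of the sequence $\mathcal{C}(P)$ assuming ideal gate implementations.
3c. [Main experiment] Implement $\tilde{\mathcal{C}}(P)$ and estimate the overlap

$$f_{P,m,l} = \mathrm{Tr}\left[\mathcal{C}(P)\,\tilde{\mathcal{C}}(\rho)\right] \tag{5}$$

between the expected outcome and the noisy implementation $\tilde{\mathcal{C}}(\rho)$ for some initial state $\rho$ that is a $+1$-eigenstate of $P$. State preparation and measurement are realized by applying the operations $\mathcal{B}_P$ and $\mathcal{B}^\dagger_{\mathcal{C}(P)}$ that are described in the Supplementary Information.
4. Estimate the composite process fidelity via

$$F_{RC}(\tilde{\mathcal{G}}, \mathcal{G}) = \sum_{P \in \mathsf{P}} \frac{1}{|\mathsf{P}|} \left( \frac{\sum_{l=1}^{L} f_{P,m_2,l}}{\sum_{l=1}^{L} f_{P,m_1,l}} \right)^{\frac{1}{m_2 - m_1}}. \tag{6}$$

Step 1 ensures that the action of the $N$-qubit process is accurately estimated. In the Supplementary Information we prove that the number of Pauli matrices that need to be sampled is independent of the number of qubits, highlighting the scalability of the protocol for large quantum processors.
Step 2 ensures that the measurement procedures for circuits in Eq. (4) with two different values of $m$ are the same. Having the same measurement procedures for the two values of $m$ is crucial to decouple the SPAM errors from the decay in the process fidelity via the ratio in Eq. (6). In our experiment, we always choose $m_1 = 4$ and $m_2$ to be an integer multiple of 4 because, for the considered gates, applying the operation four times in succession yields the identity process, $\mathcal{G}^4 = \mathcal{I}$.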
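The final estimation step can be sketched in a few lines of Python. The sketch assumes that the estimator takes the ratio form used in the finite-sampling analysis of the Supplementary Information: for each sampled Pauli matrix, the summed overlaps at length $m_2$ are divided by those at $m_1$, the result is raised to the power $1/(m_2 - m_1)$, and the $K$ terms are averaged. The data layout, the function name `cb_fidelity`, and the synthetic decay values are assumptions made for illustration.

```python
def cb_fidelity(data: dict, m1: int, m2: int) -> float:
    """Estimate the composite process fidelity from overlaps f_{P,m,l}.

    data[P][m] is the list of measured overlaps for Pauli P at sequence
    length m, one entry per randomization l.
    """
    total = 0.0
    for by_length in data.values():
        # SPAM prefactors cancel in the ratio of the two sequence lengths.
        ratio = sum(by_length[m2]) / sum(by_length[m1])
        total += ratio ** (1.0 / (m2 - m1))
    return total / len(data)


# Synthetic example: f = A * lam**m with a SPAM prefactor A (hypothetical values).
data = {
    "XI": {4: [0.9 * 0.99 ** 4] * 3, 8: [0.9 * 0.99 ** 8] * 3},
    "ZZ": {4: [0.8 * 0.97 ** 4] * 3, 8: [0.8 * 0.97 ** 8] * 3},
}
print(cb_fidelity(data, 4, 8))  # approximately (0.99 + 0.97) / 2
```

The prefactors 0.9 and 0.8 play the role of SPAM coefficients and drop out of the estimate, which is the reason the protocol uses two sequence lengths with identical measurement procedures.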
In step 3a, we choose random Pauli cycles to engineer an effective Pauli noise process across the $L$ randomizations. This enables us to extract a process fidelity from the decay of $\sum_{l=1}^{L} f_{P,m,l}/L$ with the sequence length $m$. This protocol is a special case of a more general protocol that can be used to efficiently characterize non-Clifford gates [27] by selecting random gates and correction operators using randomized compiling [20] instead of Pauli frame randomization.
In step 3b, for any Clifford cycle $\mathcal{G}$, Pauli matrix $P$, and Pauli cycles $\mathcal{R}_0, \ldots, \mathcal{R}_m$, the expected outcome of the ideal implementation $\mathcal{C}(P)$ is a Pauli matrix that can be efficiently calculated. Note that only the sign of $\mathcal{C}(P)$ depends on the random Pauli cycles. This sign is accounted for when estimating the expectation value with the procedure outlined in the Supplementary Information. Incorporating the sign engineers a measurement of the expectation value of $\mathcal{C}(P)$ that is robust to SPAM errors, as otherwise the expectation values result from a multi-exponential decay [23,28].
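To illustrate why only the sign of the ideal outcome depends on the random Pauli cycles, the sketch below handles the simplest case, in which the cycle of interest is the identity (local CB). Each random Pauli cycle then maps $P$ to $\pm P$ depending on whether it commutes with $P$, so the ideal outcome is $P$ times a product of signs. This is an assumption-laden special case: a general Clifford cycle would additionally require propagating $P$ through the cycle, for example with a stabilizer tableau. All function names are illustrative.

```python
import random


def commute_sign(p: str, q: str) -> int:
    """Return +1 if the Pauli strings p and q commute, -1 if they anticommute."""
    anti = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return -1 if anti % 2 else 1


def expected_sign(pauli: str, random_cycles: list) -> int:
    """Sign s such that R_m ... R_0 (P) = s * P when the cycle G is the identity."""
    sign = 1
    for cycle in random_cycles:
        sign *= commute_sign(cycle, pauli)
    return sign


def random_pauli_cycle(n_qubits: int, rng: random.Random) -> str:
    """Draw a uniformly random N-qubit Pauli string."""
    return "".join(rng.choice("IXYZ") for _ in range(n_qubits))


rng = random.Random(0)  # fixed seed so the illustration is reproducible
cycles = [random_pauli_cycle(2, rng) for _ in range(5)]  # m + 1 = 5 cycles
print(cycles, expected_sign("ZI", cycles))
```

The recorded sign is what the measured expectation value is multiplied by before averaging over randomizations.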
In step 3c, we experimentally prepare an eigenstate of a Pauli matrix $P$, apply a circuit $\mathcal{C}$ with interleaved random Pauli cycles, and measure the expectation value of $\mathcal{C}(P)$. The explicit procedures we use for preparing the eigenstate and measuring the expectation value are described in the Supplementary Information. As discussed in the Supplementary Information, the number of measurements required to estimate the expectation value to a fixed additive precision is independent of the number of qubits.
As we prove in the Supplementary Information, the expected value of $F_{RC}(\tilde{\mathcal{G}}, \mathcal{G})$ in Eq. (6) for two values $m_1$ and $m_2$ chosen as in step 2 agrees with the composite process fidelity up to second-order corrections in the infidelity and never exceeds it, so the estimate always provides a lower bound.
We demonstrate the practicality of CB for multi-qubit systems by using it to experimentally estimate the process fidelity of cycles acting globally on quantum registers containing 2, 4, 6, 8, and 10 qubits. The specific cycles we consider consist of simultaneous local Pauli gates and multi-qubit entangling Mølmer-Sørensen (MS) gates [21,22] combined with simultaneous local Pauli gates. We confine $^{40}\mathrm{Ca}^+$ ions in a linear Paul trap and encode a single qubit in the electronic states of each atomic ion. The encoding utilizes the $|0\rangle = 4S_{1/2}(m_j = -1/2)$ ground state and the $|1\rangle = 3D_{5/2}(m_j = -1/2)$ metastable excited state. Our quantum computing toolbox comprises independent arbitrary single-qubit operations and fully entangling $N$-qubit MS gates (see Supplementary Information). An experimental run consists of: (i) Doppler cooling; (ii) sideband cooling of the two motional modes with the lowest frequencies; (iii) optical pumping to the initial state $|0\rangle^{\otimes N}$; (iv) coherent manipulation; and (v) readout of the ions. Each sequence is repeated 100 times to gather statistics (for experimental details see Supplementary Information and Ref. [29]).
Under Markovian noise, the estimate of the process fidelity from Eq. (6) is independent of the sequence lengths $m_1$ and $m_2$ used to estimate it (see Supplementary Information). We tested whether our experimental apparatus satisfied this assumption by performing measurements at three values of $m$ (4, 8, and 12) on a register containing 6 qubits and comparing the results obtained from pairs of sequence lengths against each other. The data is tabulated in the Supplementary Information, where the variation of the estimated fidelities is within 0.1%, which is smaller than the corresponding uncertainties of 0.4%. This suggests that the errors are Markovian and that the estimated process fidelity is independent of the chosen sequence lengths for our system. Henceforth we only use two sequence lengths to estimate the process fidelity.
The CB protocol is practical to implement on large processors because the fidelity can be accurately estimated using a number of Pauli matrices that is independent of the number of qubits (see Supplementary Information). To illustrate the rapid convergence under finite sample size, we performed CB of local Pauli operations on a 4-qubit register by exhaustively estimating all $4^4 - 1 = 255$ possible decay rates. We estimate the average fidelities via Eq. (6) for multiple subsets $\mathsf{P}$ of the set of all Pauli matrices. For each $K = 1, \ldots, 100$, we evaluate the fidelity for 30 randomly chosen subsets $\mathsf{P}$ containing $|\mathsf{P}| = K$ Pauli matrices. The mean and standard deviation of the estimated fidelities as functions of the subset size are shown in Fig. 2. The observed standard error of the mean $\sigma = 0.0135(3)/\sqrt{K}$ is larger than the lower bound given by quantum projection noise, $\sigma_{\mathrm{proj}} = 0.00151(2)/\sqrt{K}$, but smaller than the upper bound $\sigma_{\mathrm{bound}} = 0.0252(8)/\sqrt{K}$ on the contribution from sampling a finite number of Pauli matrices (see Supplementary Information). The data demonstrate that we can estimate the process fidelity $F$ to an uncertainty smaller than $(1 - F)/\sqrt{K}$ using only $K \approx 20$ Pauli matrices with other experimental parameters held fixed (the parameters are listed in the Supplementary Information).
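As a small worked example of the $1/\sqrt{K}$ scaling, the snippet below evaluates the fitted standard error $\sigma = 0.0135/\sqrt{K}$ and inverts it to find how many Pauli matrices a target precision would require. The coefficient is the fitted value quoted above; the helper names and the target value are illustrative assumptions, and the extrapolation presumes that the fitted scaling continues to hold.

```python
import math


def standard_error(k: int, coeff: float = 0.0135) -> float:
    """Standard error of the fidelity estimate for K sampled Pauli matrices."""
    return coeff / math.sqrt(k)


def paulis_needed(target: float, coeff: float = 0.0135) -> int:
    """Smallest K for which coeff / sqrt(K) does not exceed the target."""
    return math.ceil((coeff / target) ** 2)


for k in (1, 20, 100):
    print(k, standard_error(k))
print(paulis_needed(0.003))
```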
We performed CB on local operations and with an interleaved MS gate on registers containing 2, 4, 6, 8, and 10 qubits. The process fidelity as a function of the number of qubits in the register is shown in Fig. 3 and Table I. While it is expected that the fidelity over the full register decreases with increasing register size, an important question is whether the effective error rate per qubit increases, or significant cross-talk effects appear, with increasing numbers of qubits.
We observe that the fidelity for local CB (blue circles in Fig. 3(a)) decays linearly with register size $N$, as

$$F(N) = 1 - \epsilon_P N, \tag{7}$$

with $\epsilon_P = 0.011(2)$. The linear decay of the fidelity indicates that our single-qubit Pauli operations do not show increasing error rates per qubit or a significant onset of cross-talk errors as the register size increases. Each single-qubit Pauli operation requires $n_S$ native gates, where on average $n_S = 1.27$, independent of the system size. Therefore the effective process fidelity of a native single-qubit gate is $1 - \epsilon_P/n_S = 0.992(1)$.
FIG. 3. Experimental estimates of how rapidly error rates increase as the processor size increases. (a) Process fidelities obtained under CB for local gates (blue circles) and for sequences containing dressed MS gates (red diamonds), that is, MS gates composed with a random Pauli cycle, plotted against the number of qubits in the register. The local operations are consistent with independent errors fitted according to Eq. (7). (b) Estimate of the process fidelity of an MS gate obtained by taking the ratio of dressed MS and local process fidelities. The data is fitted to Eq. (8) and is consistent with a constant error per two-qubit coupling.
The CB measurements with interleaved MS gates give the process fidelity of the MS gate composed with a round of local randomizing gates as in Eq. (3) (a dressed MS gate, see red diamonds in Fig. 3(a)). This determines the error rate when a circuit is implemented by randomized compiling [20]. The process fidelity of the interleaved gate can be estimated by the ratio of the dressed MS and local fidelities as in interleaved randomized benchmarking [25]. The resulting estimates are plotted in Fig. 3(b). We note that these estimates may have a large systematic error that is on the same order as the overall error rate [26]. This systematic uncertainty primarily arises due to coherent over- and under-rotations with similar rotation axes. The MS gate performs rotations around the non-local $\sigma_x \sigma_x$ axes, which are substantially different from the single-qubit rotation axes. Therefore it is unlikely that any coherent errors on the MS gate accumulate with the errors on the single-qubit rotations, and so we neglect this systematic error. We conjecture that the process fidelity of the MS gate should decay quadratically due to an error in each of the $\binom{N}{2}$ couplings between pairs of qubits introduced by the MS gate. If we assume an average error rate $\epsilon_2$ per two-qubit coupling, we can describe the MS gate fidelity as

$$F_{MS}(N) = 1 - \epsilon_2 \binom{N}{2} = 1 - \epsilon_2 \frac{N^2 - N}{2}. \tag{8}$$

Fitting this model to the results in Fig. 3(b) gives an estimated error per two-qubit coupling of $\epsilon_2 = 0.0030(2)$. However, we cannot harness these two-qubit couplings individually in the experiment and thus they cannot be compared to individually available gates.
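The quadratic model can be checked against the reported fidelities with a few lines of Python. The sketch assumes that the model charges one error $\epsilon_2$ to each of the $N(N-1)/2$ pairwise couplings, $F_{MS}(N) = 1 - \epsilon_2 N(N-1)/2$, and uses the fitted value $\epsilon_2 = 0.0030$; the function name is illustrative.

```python
def ms_fidelity(n_qubits: int, eps2: float = 0.0030) -> float:
    """Model fidelity of an N-qubit MS gate with error eps2 per pairwise coupling."""
    couplings = n_qubits * (n_qubits - 1) // 2
    return 1.0 - eps2 * couplings


for n in (2, 4, 6, 8, 10):
    print(n, round(ms_fidelity(n), 4))
```

Under this assumption the model gives about 0.997 for 2 qubits and 0.865 for 10 qubits, which is compatible with the reported 99.6(1)% and 86(2)% within their uncertainties.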
In summary, we have developed cycle benchmarking and demonstrated its practicality by implementing it on quantum registers containing $N$ = 2, 4, 6, 8 and 10 qubits. In comparison, a single random Clifford gate for 8 and 10 qubits would require more than 50 MS gates and so randomized benchmarking for 8 and 10 qubits would require a large number of measurements to achieve a useful statistical precision. CB is practical in regimes where randomized benchmarking is impractical because it uses local randomizing gates. A similar approach was independently considered in [28,30] to characterize a two-qubit Clifford gate. However, the approach implemented here and proposed previously in Ref. [27] can be applied in a scalable manner to processors with arbitrary numbers of qubits.
The total experimental time and post-processing resources required for our implementation were approximately independent of the number of qubits (see Supplementary Information), after accounting for the additional tests performed on specific numbers of qubits. This is achieved because, as we prove in the Supplementary Information, the number $K$ of Pauli matrices that need to be sampled to estimate the fidelity is independent of the number of qubits and the fidelity. In addition, we demonstrated experimentally that the estimate of the fidelity and its error converges rapidly under finite sample size (Fig. 2), and that the estimated fidelities are approximately independent of the sequence lengths used.
Cycle benchmarking can be readily implemented on general quantum computing architectures to estimate the fidelity of multi-qubit processes. The fidelity corresponds to the effective error rate under randomized compiling [31]. The protocol also provides insight into how noise scales within a fixed architecture. In our ion trap, the fidelity of local gates across the whole register decreased linearly with $N$, demonstrating that our native single-qubit gates have an average fidelity of 99.2(1)% and do not deteriorate with the register size. Thus we have demonstrated a scalable method to validate a major requirement for fault-tolerant quantum computation. In addition, we performed interleaved CB protocols to estimate the performance of the multi-qubit entangling MS gate. From the ratio between the dressed MS and the local CB fidelities we infer entangling gate fidelities ranging from 99.6(1)% to 86(2)% for 2 to 10 qubits.
Quanteninformation GmbH. This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Office grant W911NF-16-1-0070. All statements of fact, opinions or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI, or the U.S. Government. We also acknowledge support by U.S. A.R.O. through grant W911NF-14-1-0103. This research was undertaken thanks in part to funding from TQT, CIFAR, the Government of Ontario, and the Government of Canada through CFREF, NSERC and Industry Canada.
In our experiment, we can only directly perform noisy preparations and measurements in the $N$-qubit computational basis $\{|z\rangle : z \in \mathbb{Z}_2^N\}$. We now specify the basis changes and coarse graining we use to perform other preparations and measurements. For an $N$-qubit matrix $Q$ (e.g., $P$, $\mathcal{C}(P)$ from the main text), let $\mathcal{B}_Q$ rotate the computational basis to an eigenbasis of $Q$ such that

$$Q = \sum_{z \in \mathbb{Z}_2^N} \mathrm{Tr}\left[\mathcal{B}_Q(|z\rangle\langle z|)\,Q\right] \mathcal{B}_Q(|z\rangle\langle z|). \tag{9}$$

For the processes we investigated, $\mathcal{C}(P)$ is always an $N$-qubit Pauli matrix. Therefore, we only need to prepare eigenstates of Pauli matrices $P$ and measure the expectation value of Pauli matrices $\mathcal{C}(P)$. Consequently, our SPAM procedures are fully specified by defining $\mathcal{B}_Q$ for arbitrary Pauli matrices $Q$. We choose to construct the $\mathcal{B}_Q$ out of local Clifford operators to maximize the SPAM coefficients (which results in a smaller statistical uncertainty). Specifically, let $Q|_j$ denote the $j$th tensor factor of a matrix $Q$, let $A_I = A_Z = I$, and let $A_X$ and $A_Y$ be single-qubit Clifford operators that rotate the computational basis to an eigenbasis of $X$ and $Y$, respectively. Then we choose the basis-changing gate for an $N$-qubit Pauli matrix $Q$ to be

$$B_Q = \bigotimes_{j=1}^{N} A_{Q|_j}. \tag{10}$$

Note that the basis-changing procedure is independent of the sign of $Q$.
We now specify the coarse-graining procedure we use to measure the expectation value of observables. Suppose a system is in a state $\rho$ and let $\Pr(z|Q)$ be the probability of observing the computational basis outcome $z$ after applying the process $\mathcal{B}_Q^\dagger$. One measures the expectation value of $Q$ [e.g., $Q = \mathcal{C}(P)$] by applying $\mathcal{B}_Q^\dagger$, measuring in the computational basis, and averaging the probabilities of the outcomes weighted by the coefficients $\mathrm{Tr}[\mathcal{B}_Q(|z\rangle\langle z|)Q]$, where the weights are computed from the ideal quantities. From Eq. (9) and by the linearity of the trace,

$$\mathrm{Tr}[Q\rho] = \sum_{z \in \mathbb{Z}_2^N} \mathrm{Tr}\left[\mathcal{B}_Q(|z\rangle\langle z|)\,Q\right] \Pr(z|Q). \tag{11}$$

Note that as we average the relative frequencies over all outcomes, the number of measurements required to estimate the expectation value of $Q$ to a fixed additive precision is independent of the number of qubits $N$ by a standard application of, e.g., Hoeffding's inequality [32].
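The coarse-graining step can be sketched as follows. The sketch assumes that, after the local basis change, the weight of each computational-basis outcome $z$ for a Pauli observable $Q$ is the parity $(-1)^{\sum_j z_j}$ taken over the qubits on which $Q$ acts non-trivially, with the overall sign of $Q$ applied separately. The counts, the bitstring convention (leftmost bit is qubit 0), and the function name are illustrative assumptions.

```python
def pauli_expectation(counts: dict, pauli: str) -> float:
    """Estimate <Q> from computational-basis counts recorded after the basis change.

    counts maps bitstrings (e.g. "010") to the number of times they occurred;
    pauli is the Pauli string Q, used only to locate its non-identity factors.
    """
    support = [j for j, factor in enumerate(pauli) if factor != "I"]
    shots = sum(counts.values())
    total = 0
    for bits, n in counts.items():
        parity = sum(int(bits[j]) for j in support) % 2
        total += (-n if parity else n)
    return total / shots


# Hypothetical outcome statistics for 100 repetitions on two qubits.
counts = {"00": 60, "01": 10, "10": 10, "11": 20}
print(pauli_expectation(counts, "ZZ"))
print(pauli_expectation(counts, "XI"))
```

Because every shot contributes a bounded value of $\pm 1$ to a single average, the number of repetitions needed for a fixed additive precision does not grow with the register size.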
The above estimation procedure will include several sources of SPAM error per qubit, including errors in qubit initialization, measuring qubits in the computational basis, and in the local processes used to change the basis.Consequently, a protocol has to be robust to SPAM errors to provide a practical characterization of a multi-qubit gate.

B. Modelling the decay as a function of the sequence length
We now determine the expected value of $\sum_{l=1}^{L} f_{P,m,l}/L$ for fixed values of $P$ and $m$ under gate-independent Markovian noise on the random Pauli gates. As in randomized compiling [20], the noise on the gate of interest can be an arbitrary Markovian process. The assumption of gate-independent noise on the random Pauli gates is weaker than the corresponding assumption in randomized benchmarking, namely, that the noise over the whole $N$-qubit Clifford group is independent of the target. This assumption can be relaxed using the analysis of Ref. [20] at the cost of more cumbersome notation.
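Theorem 1. Let $\mathcal{G}$ be a Clifford cycle and $\tilde{\mathcal{G}}$ be an implementation of $\mathcal{G}$ with Markovian noise. Suppose there exists a process $\mathcal{A}$ such that $\tilde{\mathcal{R}} = \mathcal{A}\mathcal{R}$ for any Pauli process $\mathcal{R}$. Then for a fixed Pauli matrix $P$ and positive integer $m$, the expected value of $f_{P,m,l}$ from step 3c of the protocol over all random Pauli processes $\mathcal{R}_0, \ldots, \mathcal{R}_m$ is

$$\mathbb{E}\, f_{P,m,l} = \beta \prod_{j=0}^{m-1} F_{\mathcal{G}^j(P)}(\mathcal{E}, \mathcal{I}), \tag{12}$$

where $\mathcal{E} = \mathcal{G}^\dagger \tilde{\mathcal{G}} \mathcal{A}$ and $\beta$ is a scalar that depends only on $P$ and $\mathcal{G}^m(P)$. Moreover, $\beta = 1$ in the absence of SPAM errors.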
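Proof. Substituting $\tilde{\mathcal{R}}_i = \mathcal{A}\mathcal{R}_i$ into the noisy version of Eq. (4) (i.e., overset each operator with a $\sim$), the superoperator applied for a fixed choice of random sequences is

$$\tilde{\mathcal{C}} = \mathcal{A}\mathcal{R}_m \tilde{\mathcal{G}} \mathcal{A}\mathcal{R}_{m-1} \tilde{\mathcal{G}} \cdots \tilde{\mathcal{G}} \mathcal{A}\mathcal{R}_0. \tag{13}$$

Inserting $\mathcal{G}\mathcal{G}^\dagger$ between the ideal Pauli processes $\mathcal{R}_i$ and the adjacent $\tilde{\mathcal{G}}$ gives

$$\tilde{\mathcal{C}} = \mathcal{A}\mathcal{R}_m \mathcal{G}\mathcal{E} \mathcal{R}_{m-1} \mathcal{G}\mathcal{E} \cdots \mathcal{R}_1 \mathcal{G}\mathcal{E} \mathcal{R}_0, \tag{14}$$

where $\mathcal{E} = \mathcal{G}^\dagger \tilde{\mathcal{G}} \mathcal{A}$. We can now do a standard relabelling of the randomizing gates to obtain a twirl by setting $\mathcal{T}_0 = \mathcal{R}_0$ and recursively defining

$$\mathcal{T}_i = \mathcal{R}_i \mathcal{G} \mathcal{T}_{i-1} \mathcal{G}^\dagger \tag{15}$$

for $i > 0$. With this relabelling,

$$\tilde{\mathcal{C}} = \mathcal{A}\mathcal{T}_m \mathcal{G} \mathcal{T}_{m-1}^\dagger \mathcal{E} \mathcal{T}_{m-1} \mathcal{G} \mathcal{T}_{m-2}^\dagger \mathcal{E} \mathcal{T}_{m-2} \cdots \mathcal{G} \mathcal{T}_0^\dagger \mathcal{E} \mathcal{T}_0. \tag{16}$$

The $\mathcal{T}_i$ are all Pauli processes because $\mathcal{G}\mathcal{P}\mathcal{G}^\dagger$ is a Pauli process for any Pauli process $\mathcal{P}$ and any Clifford process $\mathcal{G}$. Moreover, the $\mathcal{T}_i$ are uniformly random because the Pauli processes are sampled uniformly at random and form a group. Therefore averaging independently over all $\mathcal{T}_0, \ldots, \mathcal{T}_{m-1}$ for a fixed choice of $\mathcal{T}_m$ results in the effective superoperator $\mathcal{A}\mathcal{T}_m (\mathcal{G}\tilde{\mathcal{E}})^m$, where

$$\tilde{\mathcal{E}} = 4^{-N} \sum_{P \in \mathbb{P}^N} \mathcal{P}^\dagger \mathcal{E} \mathcal{P}. \tag{17}$$

Now note that $\tilde{\mathcal{E}}$ is invariant under conjugation by Pauli operators and so $\tilde{\mathcal{E}}(Q) \propto Q$ for all $Q \in \mathbb{P}^N$ [33]. As the Pauli matrices form a trace-orthogonal basis for the set of matrices, for any $Q \in \mathbb{P}^N$,

$$\tilde{\mathcal{E}}(Q) = 2^{-N}\,\mathrm{Tr}[Q\,\tilde{\mathcal{E}}(Q)]\,Q = F_Q(\tilde{\mathcal{E}}, \mathcal{I})\,Q = F_Q(\mathcal{E}, \mathcal{I})\,Q, \tag{18}$$

where we have used the fact that $\mathcal{P}(Q) = PQP^\dagger = \pm Q$ for any Pauli matrices $P, Q$ and Eq. (2). For any two Pauli matrices $P, Q \in \mathbb{P}^N$, let $\eta(P, Q) = \pm 1$ be defined by

$$PQP^\dagger = \eta(P, Q)\,Q. \tag{19}$$

Then, from Eq. (16) with $P' = \mathcal{G}^m(P)$ for convenience, the expected outcome of the ideal circuit is $\mathcal{C}(P) = \eta(T_m, P')\,P'$. Now note that under measurement errors and noisy changes of basis [i.e., errors in the $\Pr(z|Q)$] and folding the residual $\mathcal{A}$ into the measurement, Eq. (11) gives the expectation value of some operator $\tilde{P}'$ (which is not uniquely defined). Since only the weights in Eq. (11) depend on the sign of $P'$ and are calculated from the ideal expressions, the noisy measurement for $-P'$ gives the expectation value of $-\tilde{P}'$ by linearity. Let $\tilde{\rho}$ be the prepared state after applying a noisy change of basis. Then the expectation value of $f_{P,m,l}$ in step 3c over all sequences is

$$\mathbb{E}\, f_{P,m,l} = 4^{-N} \sum_{T_m \in \mathbb{P}^N} \eta(T_m, P')\,\mathrm{Tr}\left[\tilde{P}'\,\mathcal{T}_m (\mathcal{G}\tilde{\mathcal{E}})^m(\tilde{\rho})\right] = \alpha_P\,\mathrm{Tr}\left[P' (\mathcal{G}\tilde{\mathcal{E}})^m(\tilde{\rho})\right] \tag{20}$$

by Lemma 2 below, where $\alpha_P = 2^{-N}\,\mathrm{Tr}[\tilde{P}' P']$ is 1 in the absence of errors. Expanding $\tilde{\rho} = \sum_{Q \in \mathbb{P}^N} \rho_Q Q$ and noting that $\mathcal{G}$ is a Clifford cycle, Eq. (20) reduces to

$$\mathbb{E}\, f_{P,m,l} = \alpha_P \sum_{Q \in \mathbb{P}^N} \rho_Q \left[\prod_{j=0}^{m-1} F_{\mathcal{G}^j(Q)}(\mathcal{E}, \mathcal{I})\right] \mathrm{Tr}\left[P'\,\mathcal{G}^m(Q)\right]. \tag{21}$$

As the Pauli matrices are trace-orthogonal and $\mathrm{Tr}[P'\,\mathcal{G}^m(Q)] = \mathrm{Tr}[PQ]$,

$$\mathbb{E}\, f_{P,m,l} = 2^N \alpha_P\, \rho_P \prod_{j=0}^{m-1} F_{\mathcal{G}^j(P)}(\mathcal{E}, \mathcal{I}),$$

where $\rho_P = 2^{-N}$ in the absence of SPAM errors, so that $\beta = 2^N \alpha_P \rho_P = 1$ in the absence of SPAM errors.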
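The key property used in the proof, namely that a Pauli twirl turns an arbitrary noise process into one that maps each Pauli matrix to a multiple of itself without changing the Pauli fidelities, can be checked numerically on a single qubit. The sketch below twirls a coherent over-rotation about the $z$ axis, using the convention $\tilde{\mathcal{E}} = \frac{1}{4}\sum_T \mathcal{T}^\dagger \mathcal{E} \mathcal{T}$ as an assumption about the form of the twirl. The rotation angle and all function names are illustrative, and plain nested lists are used to keep the example free of third-party dependencies.

```python
import cmath
import math

PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def conjugate_by(u, m):
    """Return u m u^dagger."""
    return matmul(matmul(u, m), dagger(u))


def rotation_channel(theta):
    """Coherent error: rotation by angle theta about the z axis."""
    u = [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]
    return lambda m: conjugate_by(u, m)


def pauli_twirl(channel):
    """Average T^dagger E(T m T^dagger) T over the four single-qubit Paulis."""
    def twirled(m):
        out = [[0j, 0j], [0j, 0j]]
        for t in PAULIS.values():
            term = conjugate_by(dagger(t), channel(conjugate_by(t, m)))
            out = [[out[i][j] + term[i][j] / 4 for j in range(2)] for i in range(2)]
        return out
    return twirled


def pauli_fidelity(channel, p):
    """F_P = Tr[P E(P)] / 2 for a single qubit with the identity as ideal gate."""
    prod = matmul(p, channel(p))
    return complex(prod[0][0] + prod[1][1]).real / 2


theta = 0.3  # hypothetical over-rotation angle
noisy = rotation_channel(theta)
twirled = pauli_twirl(noisy)
for name, p in PAULIS.items():
    print(name, pauli_fidelity(noisy, p), pauli_fidelity(twirled, p))
print(math.cos(theta))  # value of F_X and F_Y for this error
```

Before twirling, the rotation maps $X$ to $\cos\theta\,X + \sin\theta\,Y$; after twirling, the $Y$ component averages away and $X$ is simply rescaled by $F_X = \cos\theta$.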
In the above proof, we make use of the following lemma proven and applied to randomized benchmarking in Ref. [28].
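Lemma 2. For any matrix $M$ and any Pauli matrix $P$,

$$4^{-N} \sum_{Q \in \mathbb{P}^N} \eta(Q, P)\,\mathcal{Q}(M) = 2^{-N}\,\mathrm{Tr}[PM]\,P.$$

Proof. As the Pauli matrices form an orthogonal basis for the space of matrices, we can write $M = \sum_{R \in \mathbb{P}^N} M_R R$, where $M_R = 2^{-N}\,\mathrm{Tr}[RM]$. By linearity,

$$4^{-N} \sum_{Q \in \mathbb{P}^N} \eta(Q, P)\,\mathcal{Q}(M) = \sum_{R \in \mathbb{P}^N} M_R \left[4^{-N} \sum_{Q \in \mathbb{P}^N} \eta(Q, P)\,\eta(Q, R)\right] R = M_P P,$$

where

$$4^{-N} \sum_{Q \in \mathbb{P}^N} \eta(Q, P)\,\eta(Q, R) = \delta_{P,R},$$

as $\eta(Q, P)$ is a real 1-dimensional representation of the Pauli group for any fixed Pauli matrix $P$ and $\eta(Q, P)$ and $\eta(Q, R)$ are inequivalent as representations for $P \neq R$, by Schur's orthogonality relations.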

C. Estimating the process fidelity
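We now prove that the expectation value of Eq. (6) provides an accurate, yet conservative, estimate of the process fidelity in Eq. (3) under the same assumptions as in Theorem 1.
Theorem 3. Let $F$ be the expected outcome of the cycle benchmarking protocol [Eq. (6)] over all randomizations. Let $\mathcal{G}$ be a Clifford cycle and $\tilde{\mathcal{G}}$ be an implementation of $\mathcal{G}$ with Markovian noise. Suppose there exists a process $\mathcal{A}$ such that $\tilde{\mathcal{R}} = \mathcal{A}\mathcal{R}$ for any Pauli process $\mathcal{R}$. Then $F \leq F_{RC}(\tilde{\mathcal{G}}, \mathcal{G})$ and

$$F_{RC}(\tilde{\mathcal{G}}, \mathcal{G}) - F = O\left([1 - F_{RC}(\tilde{\mathcal{G}}, \mathcal{G})]^2\right).$$

Proof. First, recall that the process fidelity is linear and, for any unitary process $\mathcal{U}$,

$$F(\mathcal{U}\mathcal{E}_1, \mathcal{U}\mathcal{E}_2) = F(\mathcal{E}_1\mathcal{U}, \mathcal{E}_2\mathcal{U}) = F(\mathcal{E}_1, \mathcal{E}_2).$$

Therefore from Eq. (3),

$$F_{RC}(\tilde{\mathcal{G}}, \mathcal{G}) = F(\tilde{\mathcal{E}}, \mathcal{I}).$$

Moreover, $F(\mathcal{E}, \mathcal{I}) = F(\tilde{\mathcal{E}}, \mathcal{I})$ by Eq. (1) and Eq. (18), and so we will prove statements for $F(\mathcal{E}, \mathcal{I})$. Now fix a Pauli matrix $P$ and note that if $m_1$ and $m_2 = m_1 + \delta m$ are chosen so that $P' = \mathcal{G}^{m_2}(P) = \mathcal{G}^{m_1}(P)$ (guaranteed by step 2 of the protocol), then by Theorem 1,

$$\left(\frac{\mathbb{E}\, f_{P,m_2,l}}{\mathbb{E}\, f_{P,m_1,l}}\right)^{1/\delta m} = \left(\prod_{j=m_1}^{m_2-1} F_{\mathcal{G}^j(P)}(\mathcal{E}, \mathcal{I})\right)^{1/\delta m},$$

as the scalar $\beta$ is the same for $m_1$ and $m_2$. That is, the terms being averaged over in Eq. (6) are themselves geometric means of $F_Q(\tilde{\mathcal{E}}, \mathcal{I})$ for different Pauli matrices $Q$ obtained by applying $\mathcal{G}$ to the sampled $P$. Formally, let $\omega(Q|P, \delta m)$ be the relative frequency of $Q$ in the list $(\mathcal{G}^j(P') : j = 0, \ldots, \delta m - 1)$. Then

$$\left(\frac{\mathbb{E}\, f_{P,m_2,l}}{\mathbb{E}\, f_{P,m_1,l}}\right)^{1/\delta m} = \prod_{Q \in \mathbb{P}^N} F_Q(\mathcal{E}, \mathcal{I})^{\omega(Q|P, \delta m)}. \tag{28}$$

By the inequality of the weighted arithmetic and geometric means,

$$\prod_{Q \in \mathbb{P}^N} F_Q(\mathcal{E}, \mathcal{I})^{\omega(Q|P, \delta m)} \leq \sum_{Q \in \mathbb{P}^N} \omega(Q|P, \delta m)\, F_Q(\mathcal{E}, \mathcal{I}). \tag{29}$$

As $\mathcal{G}$ is a Clifford matrix, $\sum_{P \in \mathbb{P}^N} \omega(Q|P, \delta m) = 1$ for all Pauli matrices $Q$. Therefore summing Eq. (29) over all input Pauli matrices $P$ with weight $4^{-N}$ gives $F \leq F(\mathcal{E}, \mathcal{I})$. To prove the approximate statement, let $r_Q = 1 - F_Q(\mathcal{E}, \mathcal{I})$. Expanding Eq. (28) to second order in the $r_Q$ gives

$$\left(\frac{\mathbb{E}\, f_{P,m_2,l}}{\mathbb{E}\, f_{P,m_1,l}}\right)^{1/\delta m} = 1 - \sum_{Q \in \mathbb{P}^N} \omega(Q|P, \delta m)\, r_Q + O(r_Q^2).$$

The approximate claim then holds as $r_Q \leq 2[1 - F(\mathcal{E}, \mathcal{I})]$ for all $Q$ by Lemma 4 below.
Lemma 4. For any completely positive and trace-preserving map $\mathcal{E}$ and any Pauli matrix $P$,

$$0 \leq 1 - F_P(\mathcal{E}, \mathcal{I}) \leq 2[1 - F(\mathcal{E}, \mathcal{I})].$$

Proof. Note that Eq. (18) holds for any completely positive and trace-preserving map $\mathcal{E}$ with $\tilde{\mathcal{E}}$ as defined in Eq. (17). In particular, $F_P(\tilde{\mathcal{E}}, \mathcal{I}) = F_P(\mathcal{E}, \mathcal{I})$ for all $P \in \mathbb{P}^N$ and so $F(\tilde{\mathcal{E}}, \mathcal{I}) = F(\mathcal{E}, \mathcal{I})$ by Eq. (1). As $\tilde{\mathcal{E}}$ is covariant under Pauli channels, there exists a probability distribution $p(Q)$ over the set of Pauli matrices such that

$$\tilde{\mathcal{E}}(\rho) = \sum_{Q \in \mathbb{P}^N} p(Q)\, Q \rho Q^\dagger \tag{30}$$

[33].
For any Kraus operator decomposition $\{K_k\}$, the process fidelity can be written as [34]

$$F(\tilde{\mathcal{E}}, \mathcal{I}) = 4^{-N} \sum_k |\mathrm{Tr}\, K_k|^2 = p(I). \tag{31}$$

Substituting Eq. (30) into Eq. (2) and using $[P, I] = 0$, $p(Q) \geq 0$, and Eq. (31) gives

$$F_P(\tilde{\mathcal{E}}, \mathcal{I}) = \sum_{Q \in \mathbb{P}^N} p(Q)\, \eta(Q, P) \geq p(I) - \sum_{Q \neq I} p(Q) = 2F(\tilde{\mathcal{E}}, \mathcal{I}) - 1.$$

The lower bound follows as the $F_P(\tilde{\mathcal{E}}, \mathcal{I})$ are eigenvalues of $\tilde{\mathcal{E}}$ and hence are in the unit disc [35].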
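The conservative nature of the estimate can be illustrated numerically. When the cycle maps a sampled Pauli matrix through several different Pauli matrices, the per-Pauli term of the estimator is the geometric mean of their Pauli fidelities, which the inequality of arithmetic and geometric means bounds from above by their arithmetic mean. The fidelity values below are hypothetical and the function names are illustrative.

```python
import math


def geometric_estimate(fidelities: list) -> float:
    """Geometric mean of the Pauli fidelities visited along one orbit of the cycle."""
    return math.prod(fidelities) ** (1.0 / len(fidelities))


def arithmetic_mean(fidelities: list) -> float:
    """Arithmetic mean of the same Pauli fidelities."""
    return sum(fidelities) / len(fidelities)


# Hypothetical orbit: the cycle alternates between two Pauli matrices
# whose fidelities are 0.99 and 0.95.
orbit = [0.99, 0.95, 0.99, 0.95]
geo = geometric_estimate(orbit)
ari = arithmetic_mean(orbit)
print(geo, ari, ari - geo)
```

The gap between the two means is of second order in the infidelities, which is why the estimate is both a lower bound and accurate for high-fidelity cycles.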

D. Finite sampling effects
We now consider the effect of finite samples.All the "approximately normal" statements in this section can be replaced by rigorous statements using the results of [36], Hoeffding's inequality [32] and the union bound, at the expense of additional notation and less favorable (but pessimistic) constants.
First, note that sampling a finite number of random sequences (i.e., finite $L$) and estimating each expectation value with a finite number of measurements will produce an estimate of $f_{P,m,l}$ with an error $\epsilon_{P,m}$ that is approximately normally distributed with standard deviation $\sigma_{P,m}$. Using a series expansion of the ratio $\hat{F}_P := (f_{P,m_2,l}/f_{P,m_1,l})^{1/\delta m}$, the error in each term in the sum will be approximately $(\epsilon_{P,m_2} - \epsilon_{P,m_1})/\delta m$ and so will be approximately normally distributed. Moreover, if we choose $m_1$ and $m_2$ so that $\delta m \approx 1/[1 - F(\mathcal{E}, \mathcal{I})]$ (where $\mathcal{E}$ is as in Theorem 1), then the error on each term in the sum will have standard deviation $\sigma_P \propto 1 - F(\mathcal{E}, \mathcal{I})$. The values of $m$ in Table II satisfy this condition. We now consider the effect of sampling a finite number $K$ of Pauli matrices $P$ with replacement under the same assumptions as in Theorem 1. Sampling $K$ Pauli matrices $P$ uniformly at random with replacement and averaging the estimates $\hat{F}_P$ gives an estimate $\hat{F}$ whose expected variance over the Pauli matrices is

$$\mathbb{V}(\hat{F}) = \frac{1}{K}\left[\mathbb{V}_P(F_P) + \mathbb{E}_P(\sigma_P^2)\right].$$

The first term satisfies

$$\mathbb{V}_P(F_P) \leq [1 - F(\mathcal{E}, \mathcal{I})]^2,$$

since for any Pauli matrix $P$,

$$0 \leq 1 - F_P(\mathcal{E}, \mathcal{I}) \leq 2[1 - F(\mathcal{E}, \mathcal{I})]$$

by Lemma 4. Note that the variance is independent of the number of qubits. Furthermore, if the $\delta m$ are chosen to be proportional to $1/(1 - F)$, then the variance of $\hat{F}$ is proportional to $(1 - F)^2$, so that we can efficiently estimate $1 - F$ to multiplicative precision. It can be seen in Fig. 2 that the standard deviation decreases with the square root of the number of sampled subspaces $K$, with a least squares fit giving $\sigma = 0.0135(3)/\sqrt{K}$. The observed standard deviation is larger than the lower bound given by quantum projection noise, $\sigma_{\mathrm{proj}} = 0.00151(2)/\sqrt{K}$, but smaller than the upper bound $\sigma_{\mathrm{bound}} = 0.0252(8)/\sqrt{K}$ on the contribution from sampling a finite number of Pauli matrices. This suggests that the other source of statistical uncertainty, namely, a finite number of randomizations $L$ and measurements per sequence, is sufficiently small to allow us to accurately estimate the process fidelity.

E. Correction operators for the MS gate
We performed cycle benchmarking for the identity and MS gates. The MS gate satisfies $\mathrm{MS}^4 = I$, so that we can restrict $m$ to be an integral multiple of 4. Indeed, $\mathrm{MS}^2 \propto X^{\otimes N}$, so that we could restrict $m$ to even numbers by keeping track of the sign (which would depend on the Pauli matrix $P$). To compute the expectation value of $C(P)$, we need to know how an arbitrary Pauli operator $Q$ propagates through the MS gate. Using $\mathrm{MS} \propto (I - iX^{\otimes N})/\sqrt{2}$ for even $N$ gives
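Taking the stated form $\mathrm{MS} \propto (I - iX^{\otimes N})/\sqrt{2}$ at face value, a direct calculation yields $\mathrm{MS}\,Q\,\mathrm{MS}^\dagger = Q$ when $Q$ commutes with $X^{\otimes N}$ and $\mathrm{MS}\,Q\,\mathrm{MS}^\dagger = iQX^{\otimes N}$ when $Q$ anticommutes with it. The Python sketch below checks this rule numerically for small even $N$. The helper names (`kron_all`, `ms_gate`, `propagate`, `check_rule`) are hypothetical, and the gate matrix is built from the proportionality above rather than from the experimental Hamiltonian.

```python
import itertools

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = {"I": I2, "X": X, "Y": Y, "Z": Z}


def kron_all(mats):
    """Tensor product of a list of matrices."""
    out = np.eye(1, dtype=complex)
    for mat in mats:
        out = np.kron(out, mat)
    return out


def ms_gate(n):
    """Assumed form (I - i X^{(x)n}) / sqrt(2), as quoted in the text for even n."""
    xn = kron_all([X] * n)
    return (np.eye(2**n, dtype=complex) - 1j * xn) / np.sqrt(2)


def propagate(q, n):
    """Return MS Q MS^dagger for an n-qubit operator Q."""
    ms = ms_gate(n)
    return ms @ q @ ms.conj().T


def check_rule(n):
    """Check Q -> Q (commuting) or i Q X^{(x)n} (anticommuting) for all Paulis."""
    xn = kron_all([X] * n)
    for label in itertools.product("IXYZ", repeat=n):
        q = kron_all([PAULIS[c] for c in label])
        commutes = np.allclose(q @ xn, xn @ q)
        expected = q if commutes else 1j * q @ xn
        if not np.allclose(propagate(q, n), expected):
            return False
    return True


print(check_rule(2), check_rule(4))
```

In both cases the propagated operator is again a Pauli operator up to a sign, which is what allows the expectation value to be tracked through repeated applications of the gate.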

II. EXPERIMENTAL METHODS
The CB experiments are defined by a sequence of $N$-qubit Clifford gates according to the experimental protocol in Fig. 1. Specifically, the sequences contain a series of single-qubit rotations and $N$-qubit MS gates. A rotation of qubit $j$ with angle $\theta$ is defined as $R(\theta)_j = \exp(i\theta p_j/2)$, where $p_j \in \{X, Y, Z\}$ are single-qubit Pauli operations.
After defining the sequences we compile them into the actual machine language [37]. In this experiment an elementary single-qubit operation consists of one addressed z-rotation sandwiched between two collective rotations around the x- or y-axis, e.g. $X(\pi/2)_1 = X(-\pi/2)_{12}\, Z(\pi/2)_1\, X(\pi/2)_{12}$ for 2 qubits. The collective x- and y-rotations can be seen as simple basis changes on the entire register, and thus these basis changes can be shared by the individual qubit operations. By changing the temporal order of the collective x- and y-rotations and the individual z-rotations, the total number of collective rotations can be minimized.
We expect the single-qubit z-rotations to have significantly larger infidelity than the collective rotations for the following reasons. First, the addressed laser beam has a smaller beam size and hence larger intensity fluctuations. Second, we perform the z-rotations using the AC-Stark effect, which is quadratically more sensitive to intensity fluctuations than resonant x- and y-rotations. Therefore the number of single-qubit rotations $Z(\theta)_j$ needed to perform an $N$-qubit Pauli operation is expected to be the limiting factor for local operations. In general, the average number of single-qubit rotations per $N$-qubit Pauli operation scales linearly with $N$. To simplify the calibration procedure we only perform $Z(\pi/2)_j$ rotations. Thus, for example, a $Z(\pi)_j$ operation is implemented using two $Z(\pi/2)_j$ operations. In Fig. 4 we show the dependence of the average number of $Z(\pi/2)_j$ operations on the number of qubits. On average we implement $1.27(2) \cdot N$ addressed $\pi/2$ rotations for an $N$-qubit Pauli operation.
In Table II we give an overview of the experimental parameters that we used to estimate the local CB and the dressed MS fidelities.
A. Testing the dependence of the estimator on the sequence length
If the noise in the system is Markovian, we expect the estimated process fidelity to be independent of the sequence lengths $m_1$ and $m_2$ to within $O([1 - F_{\mathrm{RC}}(\tilde{\mathcal{G}}, \mathcal{G})]^2)$ (see Theorem 3). We test this by performing measurements at 3 different sequence lengths for 6 qubits, as described in Table II. We validate that the estimated process fidelity is independent of $m_1$ and $m_2$ by comparing the results of the three different length pairs 4-8, 4-12, and 8-12. As can be seen in Table III, the measured fidelities agree to within half a standard deviation, which supports the validity of the assumptions for our experimental apparatus.
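The expected independence of the sequence lengths can be made concrete with a small Python sketch. It assumes an idealized Markovian model in which each Pauli expectation value decays as $A_P\, p_P^{\,m}$, with $A_P$ absorbing state-preparation and measurement errors. The prefactors and decay rates are made-up illustration values, and `cb_fidelity` is a hypothetical helper name.

```python
from itertools import combinations


def cb_fidelity(decays, m1, m2):
    """Average over Paulis of (f_{P,m2} / f_{P,m1})^(1 / (m2 - m1))."""
    ratios = [(f[m2] / f[m1]) ** (1.0 / (m2 - m1)) for f in decays]
    return sum(ratios) / len(ratios)


# Hypothetical (A_P, p_P) pairs for three Pauli subspaces; not measured values.
model = [(0.95, 0.985), (0.90, 0.970), (0.92, 0.978)]
lengths = (4, 8, 12)

# Ideal exponential decays f_P(m) = A_P * p_P**m at each sequence length.
decays = [{m: a * p**m for m in lengths} for a, p in model]

# One estimate per pair of sequence lengths: (4, 8), (4, 12), (8, 12).
estimates = {pair: cb_fidelity(decays, *pair) for pair in combinations(lengths, 2)}
for pair, value in estimates.items():
    print(pair, value)
```

Under this idealized model the prefactors $A_P$ cancel in the ratio, so every pair returns the same value. Any systematic disagreement between pairs in real data would therefore point to a departure from the model's assumptions.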

B. Analyzing fidelity drift
Slow temperature fluctuations on the timescale of minutes to days cause changes in various components of our experimental apparatus. One of the major causes of a loss in fidelity over time is the alignment of the laser beams relative to the ion position. The single-ion addressing laser beam is tightly focused to a spot size of ∼2 µm, and the beam position changes as the temperature varies. This change in position leads to a miscalibration of the Rabi frequency as well as an increase in intensity fluctuations. We analyze the temporal dependence of the fidelity with 4-qubit CB, as depicted in Fig. 5. The 255 subspaces were measured in 3 sessions, where the experimental system was recalibrated at the beginning of each session. We approximate the drift of the fidelity as linear to first order and can thus describe the time-dependent fidelity as $F(t) = F_0 - \epsilon t$. We obtain an average loss of fidelity of $\epsilon_L = 3.3(5) \cdot 10^{-3}\,\mathrm{h}^{-1}$ for local gates and $\epsilon_D = 5.4(8) \cdot 10^{-3}\,\mathrm{h}^{-1}$ for the dressed MS gate, see Table IV. This measurement suggests that we can expect a maximum loss of fidelity of about 1% when recalibrating the apparatus every two hours.
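The arithmetic behind the two-hour recalibration estimate, together with a linear fit of the form $F(t) = F_0 - \epsilon t$, can be sketched in Python as follows. The drift rates are the central values quoted above. The time series passed to `fit_drift` is synthetic, and both function names are hypothetical.

```python
import numpy as np


def max_fidelity_loss(rate_per_hour, interval_hours):
    """Loss accumulated under F(t) = F0 - rate * t over one recalibration interval."""
    return rate_per_hour * interval_hours


def fit_drift(t_hours, fidelities):
    """Least-squares line F(t) = F0 - rate * t; returns (F0, rate)."""
    slope, intercept = np.polyfit(t_hours, fidelities, 1)
    return intercept, -slope


# Central values of the quoted drift rates, in units of 1/h.
loss_local = max_fidelity_loss(3.3e-3, 2.0)  # local gates
loss_ms = max_fidelity_loss(5.4e-3, 2.0)     # dressed MS gate
print(loss_local, loss_ms)

# Synthetic, noise-free example data (assumption) to exercise the fit.
t = np.array([0.0, 1.0, 2.0, 3.0])
f0, rate = fit_drift(t, 0.97 - 0.005 * t)
print(f0, rate)
```

With the larger of the two rates, a two-hour interval corresponds to a loss of roughly one percentage point, consistent with the recalibration estimate in the text.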

FIG. 2. Experimental evidence demonstrating rapid convergence under finite sample size with favorable constant factors. (a) Mean fidelity estimates from 30 randomly sampled subsets of Pauli matrices as a function of the size of the subset. The error bars illustrate the standard deviation of the 30 samples, that is, the standard error of the mean. The green line describes the mean fidelity $F = 97.25(8)\%$ calculated from the complete data set. (b) The standard deviation of the fidelity from plot (a) against $K$, including an upper bound in orange (see Supplementary Information), a fit of the standard deviation data in green, and a fit of the calculated projection noise in red.

FIG. 5. 4-qubit Pauli fidelities for local gates (blue) and the dressed MS gate (red) plotted against time in hours. We measured all 255 subspaces in three measurement sessions, where the experiment was recalibrated at the beginning of each session.
FIG. 1. Schematic circuit implementation of the experimental cycle benchmarking (CB) protocol. The protocol can be subdivided into three parts, depicted by the different colors. The green gates describe basis-changing operations, which are defined in the Supplementary Information. The red gates $\tilde{\mathcal{G}}$ are the noisy implementations of some gate of interest. The blue gates are random Pauli cycles that are introduced to create an effective Pauli channel per application of the gate of interest, where $R_{i,j}$ denotes the $j$th tensor factor of the $i$th gate. Creating an effective Pauli channel per application enables errors to be systematically amplified under $m$-fold iterations for more precise and SPAM-free estimation of the errors in the interleaved red gates $\tilde{\mathcal{G}}$. The blue and the red gates together form the random circuit $C(P)$. The sequences of local operations before the first and last rounds of random Pauli cycles are identified as conceptually distinct but were compiled into the initial and final round of local gates in the experiment. The experimental parameters $K$, $m$, and $L$ of this work are given in the Supplementary Information.

TABLE II. Experimental parameters of the CB data taken for different register sizes.
Qubits | Subspaces K | Sequence lengths m | Random sequences L | Total sequences | Measurement time (h)

TABLE III. 6-qubit process fidelities (%) estimated via CB using different pairs of sequence lengths ($m_1$, $m_2$). The results illustrate that the estimated process fidelity is independent of the sequence lengths used, subject to the constraint in step 2 of the protocol.

TABLE IV. 4-qubit fidelity drift rates, where $\epsilon_L$ and $\epsilon_D$ describe the loss of fidelity per hour for local gates and the dressed MS gate. The data corresponds to the estimated linear slopes of