Shadow estimation of gate-set properties from random sequences

With quantum computing devices increasing in scale and complexity, there is a growing need for tools that obtain precise diagnostic information about quantum operations. However, current quantum devices are only capable of short unstructured gate sequences followed by native measurements. We accept this limitation and turn it into a new paradigm for characterizing quantum gate-sets. A single experiment—random sequence estimation—solves a wealth of estimation problems, with all complexity moved to classical post-processing. We derive robust channel variants of shadow estimation with close-to-optimal performance guarantees and use these as a primitive for partial, compressive and full process tomography as well as the learning of Pauli noise. We discuss applications to the quantum gate engineering cycle, and propose novel methods for the optimization of quantum gates and diagnosing cross-talk.


I. INTRODUCTION
Recent years have seen the rapid development of quantum computing devices to unprecedented system sizes. These devices are still noisy and of limited computational power, but go substantially beyond what was conceivable not very long ago. In order to scale even further to larger and more accurate devices, it is key to develop tools for efficiently characterizing quantum operations [1,2] at scale. Besides providing crucial actionable advice for the practitioner, the characterization of quantum operations is also important for developing an in-depth theoretical understanding of the actual capabilities of quantum devices and for providing a fair comparison between different types of devices, and with classical computing power on the same tasks [3][4][5]. Over the years, many protocols for characterizing quantum operations have been developed [6][7][8].
That said, while a wealth of theoretical ideas for benchmarking, verification and tomographic recovery have been suggested, only a few of them are relevant in practice. With present quantum devices, only relatively short gate sequences can be implemented on qubit arrays, followed by a native measurement at the end of the circuit that typically suffers from sizeable read-out noise. With these limitations, the most prominent protocols for characterizing digital quantum gates fall into the class of randomized benchmarking (RB) [9][10][11][12][13][14] (including newer protocols such as averaged circuit eigenvalue sampling [15]). RB implements suitable sequences of random quantum gates and extracts a measure of quality from the parameters describing the decay of the measured signal with the sequence length. This has the advantage of yielding error metrics that are robust to state preparation and measurement (SPAM) errors. The experimental sequences of most RB protocols are carefully designed (for example with compiled circuit inverses) to efficiently extract specific information from a gate-set. Prominent exceptions are 'filtered' RB protocols such as linear cross-entropy benchmarking (XEB) [3] that work directly with random sequences of i.i.d. drawn gates and, e.g., omit an inversion gate.
In this work, we take these observations seriously and invert the mindset that is commonly applied when devising new schemes for benchmarking and characterization. We ask the question: If all we can feasibly do is implement unstructured random sequences followed by a native measurement, what can we learn? At first sight this endeavour is not promising. Compared to 'traditional' RB and tomographic protocols, we are giving up on central ingredients. Thinking about how much information we measure in an unstructured way, we run into the problem that, typically, the probabilities of individual measurement results are exponentially small in the number of qubits. This is orthogonal to the careful design of efficient characterization schemes in prior work, and does not obviously yield sample-efficient estimation schemes at all.
Our change of paradigm is analogous to the mindset of classical shadows [16,17]. Classical state shadows allow for the sample-efficient estimation of (exponentially) many different functions of a quantum state from the same data by only modifying the classical post-processing. Perhaps the central surprise of ref. [16] is the rigorous guarantee that the fidelity of a quantum state with respect to any pure state can be estimated from the same experiment, using only constantly many state copies with sufficiently randomized basis measurements. This is in stark contrast to schemes like direct fidelity estimation [18] that, given a priori knowledge of the target state, carefully optimize the measurements that are performed.

FIG. 1. The gate-set shadow estimation protocol proceeds in two stages. First, for a fixed initial state ρ and varying sequence lengths m, a total of S random sequences of quantum gates of length m are experimentally implemented, each followed by a measurement. We call the observed tuples of measurement outcome and gate sequence $(x^j, g^j_1, \ldots, g^j_m)$, $j = 1, \ldots, S$, the gate-set shadow. The second, classical post-processing stage consists of three steps: (i) A given sequence correlation function is calculated for every entry of the gate-set shadow. For the UIRS protocol, a sequence correlation function $f_A$ is specified in terms of a probe super-operator A and an irreducible representation σ. (ii) We calculate the sequence average $\hat{k}_{f_A}(m)$ as the mean or median-of-means of the result of step (i) over sequences of the same length m. (iii) Sequence averages for different lengths m are used as data points to fit a theoretical model (eq. (5)) in order to extract the generalized gate-set fidelity with respect to the super-operator A and the irreducible representation σ, denoted here by p(A). One of the most important features of this approach is that we can use the same experimental data to accurately estimate exponentially many generalized fidelities $p(A_1), p(A_2), \ldots, p(A_S)$ by evaluating different sequence correlation functions on the same gate-set shadow. In this way, we can self-consistently and robustly estimate many different properties of the gate-set noise from a minimal amount of data obtained in a simple experiment. Sections II H to II J explain how the gate-set shadow estimation protocol can be used as a primitive in other, more detailed characterization tasks, such as compressive channel or marginal tomography, and derive the corresponding guarantees, potentially allowing one to run the whole engineering cycle on essentially the same type of data.
In this work, we define the observed measurement outcomes of random sequences of quantum gates as the classical shadow of a gate-set and study the sample efficiency of SPAM-robust estimators for different linear functionals of a gate-set from the same data. Borrowing the median-of-means estimators used on classical state shadows, we show that the sampling complexity of the estimation (the number of single-shot quantum measurements) can be controlled by a dynamic shadow norm with exponential confidence. We prove bounds on this dynamic shadow norm (a considerably more involved object than its state counterpart) for prominent gate-sets such as the multi-qubit Clifford group and the local Clifford group. We find that, by a suitable post-processing, we can estimate the relative average gate fidelities of the noise of a Clifford gate-set with respect to an exponentially large number of unitary channels from polynomially many measurement samples from the same uniformly random experiment. More generally, we show that the dynamic shadow norm can be controlled in terms of the unitarity of the estimated linear quantity. Using local gate-sets, we show that one can selectively gain information about channel marginals capturing correlations in their noisy implementation. We promote this primitive further to design a highly scalable and efficient tomography scheme for cross-talk effects. Furthermore, we exemplify how gate-set shadows can be used to construct SPAM-robust objective functions for learning noise models and for robust low-rank quantum process tomography.
The important feature of all these schemes is that we only adapt the classical post-processing to the task at hand, not the quantum experiment. A single type of data, namely samples from simple local measurements on uniformly random gate sequences, is sufficient to perform a large class of diagnostic tasks of benchmarking, verification and tomographic recovery. The mindset can be captured as "Measure first, ask later!". Going beyond uniformly independently random sequences, we can generalize our approach to provide an optimal scheme to learn Pauli noise, emulating the protocol of ref. [19] with a simpler experimental prescription and theoretical analysis.
Related work. We build on a body of literature on randomized schemes for quantum device characterization [8]. The potential of analyzing the output statistics of gate-set sequences to self-consistently extract essentially all information of a gate-set (as well as the initial state and the measurement) has been realized by gate-set tomography [20][21][22][23][24][25] with recent variants only requiring random sequences (gate-set shadows) [26,27]. In contrast to this self-consistent tomographic estimation of all gates in the gate-set, we here target individual linear quantities of the gate-set's average noise or an interleaved quantum process. Our cross-talk tomography protocol follows the spirit of simultaneous RB [28], but goes significantly beyond simultaneous RB in providing higher-order correlation measures and tomographic information of noise-channel marginals, efficiently from the data of a single randomized experiment. In ref. [29], it has been observed that variants of interleaved multi-qubit Clifford randomized benchmarking experiments [30] have access to relative average gate fidelities from which unital quantum channels can be reconstructed. The protocol of ref. [29] performs a different experiment for each fidelity, yielding a sub-optimal overall sample complexity for tomography or low-rank tomography [31,32]. Gate-set shadow estimation solves both these shortcomings.

II. RESULTS
We begin by explaining the general protocol. In the subsequent sections, we then provide theoretical performance guarantees for specific gate-sets and explain how the protocol can be used as a robust estimation primitive in more complex characterization tasks, such as channel tomography. The gate-set shadow estimation protocol consists of two separate stages: an experiment, where measurement results from random circuits of different lengths are recorded, and a classical post-processing step, where different parameters can be estimated from the measured data. Figure 1 summarizes the complete protocol.

A. Protocol: Experiment
We aim at characterizing the accuracy of the implementation of a target gate-set G. The experimental primitive is the realization of random (gate) sequences of length m: After preparing an initial state ρ (e.g., |0⟩⟨0|), a sequence of gates $g \in G^{\times m}$ is drawn at random according to a distribution $\mu_m : G^{\times m} \to [0, 1]$ and applied to ρ. This is then followed by a measurement specified by a POVM $\{E_x\}_x$ with measurement outcomes in X (e.g., a computational basis measurement). If $x \in X$ is observed, the result of the primitive is a tuple $(x, g) \in X \times G^{\times m}$.
Repeating the primitive multiple times yields a series of tuples $\{(x_i, g_i)\}_{i=1}^{S}$, which we refer to as a (self-consistent) gate-set shadow. (Note that ref. [16] actually calls the dual frame elements indexed by the observed output statistics of an informationally complete POVM a state's shadow. In contrast, we here directly refer to the sampled sequence and observed measurement outcomes as a shadow.) A complete experimental protocol further involves measuring such shadows for a set of different sequence lengths m. In order to simplify the theoretical analysis, we focus on the paradigmatic case of G being a finite subgroup of $\mathrm{SU}(2^N)$ (such as the Clifford group) and distributions on the sequences arising from the uniform measure over these subgroups.
The simplest examples of protocols in this context are uniform independent random sequence (UIRS) protocols, where the gates in the sequences are drawn from the gate-set uniformly and independently at random. This can be seen as the paradigmatic case, although we will go beyond this later in this work. We make gate-set shadow estimation through the UIRS protocol explicit for two important gate-sets, namely the multi-qubit Clifford group $C_n$ and the independent single-qubit Clifford group $C_1^{\times n}$ (which we will call the local Clifford group).
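The data-collection stage can be illustrated with a small classical simulation. The following is a minimal sketch, not the experimental procedure itself: it assumes a single qubit, noise-free gates and a noise-free computational basis measurement, and it represents each of the 24 single-qubit Clifford gates (up to global phase) by its action on the Bloch axes as a signed permutation. The function names `single_qubit_cliffords`, `run_sequence` and `collect_shadow` are hypothetical and introduced only for this illustration.

```python
import itertools
import random


def single_qubit_cliffords():
    """Single-qubit Cliffords as signed permutations of the Bloch axes (X, Y, Z).

    Each element is (perm, signs): axis a is mapped to signs[a] * axis perm[a].
    Only proper rotations (determinant +1) are kept, which leaves 24 elements.
    """
    group = []
    for perm in itertools.permutations(range(3)):
        inversions = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
        parity = -1 if inversions % 2 else 1
        for signs in itertools.product((1, -1), repeat=3):
            if parity * signs[0] * signs[1] * signs[2] == 1:
                group.append((perm, signs))
    return group


def run_sequence(gates, group, rng):
    """Apply a gate sequence to |0><0| (stabilized by +Z) and measure in the Z basis."""
    axis, sign = 2, 1  # the stabilizer of the initial state is +Z
    for idx in gates:
        perm, signs = group[idx]
        axis, sign = perm[axis], sign * signs[axis]
    if axis == 2:
        # Stabilizer is +Z or -Z: the outcome is deterministic.
        return 0 if sign == 1 else 1
    # Stabilizer is +-X or +-Y: the outcome is a uniformly random bit.
    return rng.randrange(2)


def collect_shadow(num_sequences, length, seed=0):
    """Return a list of tuples (x, g): outcome x and the sampled gate indices g."""
    rng = random.Random(seed)
    group = single_qubit_cliffords()
    shadow = []
    for _ in range(num_sequences):
        gates = tuple(rng.randrange(len(group)) for _ in range(length))
        shadow.append((run_sequence(gates, group, rng), gates))
    return shadow


if __name__ == "__main__":
    for x, gates in collect_shadow(num_sequences=5, length=4):
        print(x, gates)
```

Each recorded tuple contains only the measurement outcome and the labels of the sampled gates. No property to be estimated enters at this stage, which is the sense in which all task-specific choices are deferred to post-processing.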

B. Protocol: Classical post-processing
Given a gate-set shadow $\{(x_i, g_i)\}_{i=1}^{S}$, we define an empirical estimator in terms of a sequence correlation function $f(x, g) : X \times G^{\times m} \to \mathbb{C}$. For every such sequence correlation function, in the post-processing, we (i) evaluate f for all entries of the gate-set shadow and (ii) calculate the empirical mean or median-of-means estimator of the result. After repeating steps (i) and (ii) for different sequence lengths m, we fit in step (iii) a theoretical model $k_f$ to the estimates of the sequence means $\hat{k}_f(m)$. After giving this overview of the post-processing protocol, let us take a closer look at the steps and explain their roles in the UIRS protocol.

Regarding step (i): Generally speaking, sequence correlation functions can be seen as the gate-set analog of an observable in shadow estimation. They allow us to compute properties of noisy gate-sets (for example the average fidelity of an average group element) from experimentally observed gate-set shadows. We emphasize that, like state shadow estimation, the data collection step of random sequence estimation is independent of the gate-set properties one wishes to estimate, with this estimation step happening entirely in classical post-processing. Importantly, this enables one to estimate many different correlation functions from the same experimental data.
We here introduce a particular class of sequence correlation functions for UIRS protocols: Consider an irreducible representation σ of G with representation space $V_\sigma$. For the multi-qubit Clifford group, e.g., its adjoint action on traceless Hermitian matrices is of main interest. We further specify a sequence correlation function in terms of a matrix A, POVM $\{E_x\}_{x \in X}$ and state ρ, on $V_\sigma$ as with a suitable normalization factor α. (Note that for m = 1, and perfectly implemented gates, this expression reproduces the classical state shadows of ref. [16]. Generally, restricted to multiplicity-free, irreducible representations, the dual frame construction of ref. [16] simply amounts to introducing a proper normalization factor, justifying our choice of calling the observed statistics directly the shadow.) We refer to A as a probe (super-)operator as it specifies the linear quantity of the gate-set that is encoded into the decay parameter of the empirical estimator. Note that the expression eq. (2) is closely related to the Born probability of measuring x after applying the sequence g to ρ. The main differences are that we restrict the computation to the subspace $V_\sigma$ and interleave the sequence with the probe operator A. Similar to classical shadows, the computation of $f_A$ requires, in general, the same resources as simulating the physical evolution within a subspace. In many situations, however, further structure renders this task efficient. This is in particular the case when both the gate-set and the probe operators are chosen to be multi-qubit Clifford operations.
Note that all previously existing RB protocols only use functions that at most depend on the product of the operations in the sequence, In filtered RB protocols, such as linear cross-entropy benchmarking [3], character benchmarking [33] and Pauli-noise tomography [19], the inversion gate can in this way be omitted and accounted for in post-processing.Using a non-trivial A goes significantly beyond existing schemes and allows one to even efficiently 'interleave in-post' the same data with different probe operators.
Regarding step (ii): By taking an empirical average over the gate-set, we expect $\hat{k}_f(m)$ to be a degree m polynomial in the 'average noise' of the gate-set. One insight of standard Clifford randomized benchmarking is that by taking a uniform average over a sufficiently large group the 'average noise' is probed isotropically, effectively projecting it onto a depolarizing channel. Similarly, UIRS will probe the 'average noise' of the gate-set, but by choosing different probe operators A, we can alter the operator on which the noise is projected, revealing more information. Performing the post-processing separately for different irreducible representations σ ensures that the gate-set always averages sufficiently over the subspace under consideration. We will make this intuition precise in the subsequent section.
Regarding step (iii): The projection onto isotropic noise (on each representation space) also dramatically 'simplifies' the functional form of the expected value of the sequence averages $\hat{k}_f(m)$. Recall that for standard Clifford RB, one effectively witnesses a single exponential decay. Below we show that, analogously, for UIRS protocols the theoretical fitting model is a single (matrix) exponential decay encoding linear quantities of the noise in its decay parameter. The decay parameter(s) can be extracted using least-squares fitting algorithms (or tone-finding algorithms such as ESPRIT). See ref. [14, Sec. VII] for a discussion of different post-processing techniques. In the end, the UIRS gate-set shadow estimation protocol returns the decay parameters for different choices of probe operators A and representations σ.
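For the simplest case of a single scalar decay, step (iii) can be sketched as follows. This is an illustration under simplifying assumptions and not the fitting routine used in this work: it assumes strictly positive sequence averages of the form c·p^m and performs an ordinary least-squares fit on the logarithm of the data, which is a cruder substitute for the least-squares and ESPRIT-type methods mentioned above. The function name `fit_single_exponential` is hypothetical.

```python
import math


def fit_single_exponential(lengths, averages):
    """Fit averages[i] ~ c * p**lengths[i] by least squares on log(averages).

    Returns the pair (c, p): the prefactor and the decay parameter.
    """
    if len(lengths) != len(averages) or len(lengths) < 2:
        raise ValueError("need at least two (length, average) data points")
    if any(a <= 0 for a in averages):
        raise ValueError("log-linear fit needs strictly positive sequence averages")
    n = len(lengths)
    ys = [math.log(a) for a in averages]
    mean_x = sum(lengths) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, ys))
    slope = sxy / sxx                      # slope of log(average) versus m is log(p)
    intercept = mean_y - slope * mean_x    # intercept is log(c)
    return math.exp(intercept), math.exp(slope)


if __name__ == "__main__":
    # Synthetic, noise-free sequence averages with prefactor 0.5 and decay 0.97.
    lengths = [1, 2, 4, 8, 16, 32]
    averages = [0.5 * 0.97 ** m for m in lengths]
    prefactor, decay = fit_single_exponential(lengths, averages)
    print(prefactor, decay)
```

The prefactor absorbs the SPAM dependence, so only the fitted decay parameter is interpreted. For noisy data whose averages may be negative, or for sums of several decays, a nonlinear least-squares or tone-finding method is needed instead.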

C. Fitting model
In order to keep the theoretical derivation and statements concise and straightforward to interpret, we adhere to some standard assumptions that are commonly used in the analysis of RB protocols. First, we assume that the quantum channel that implements a sequence g on the quantum device can be written as $E(g) = \prod_{i=1}^{m} \phi(g_i)$ with a map $\phi : G \to S_n$. Here, $S_n$ is the space of n-qubit super-operators. The existence of E already excludes, e.g., time-dependent effects in between different experiments, and the factorization into a map ϕ further restricts to Markovian noise. Under this assumption it can be proven that RB protocols [9][10][11][12][13][14] function correctly [14,34,35]. For non-Markovian noise much less is known, but in the context of RB rigorous results have been obtained for quasi-static noise [36], time-dependent noise [37] and more recently using tensor-models [38]. We expect these results to broadly carry over to random sequence estimation.
Second, we assume gate-independent noise, positing the existence of quantum channels $\Lambda_L$, $\Lambda_R$ such that $\phi(g) = \Lambda_L\, \omega(g)\, \Lambda_R$ (3), where $\omega(g)(\rho) = U_g \rho U_g^\dagger$ is some ideal implementation of the gate g. We argue in section II K that our results also apply (up to a negligible error) in the more general Markovian error model, but rigorously proving this (along the lines of ref. [14]) is beyond the scope of this work.
Instead of $\Lambda_R$, $\Lambda_L$ describing the noise of the gate-set implementation, one can also take the perspective of actively interleaving a channel of interest between a fairly ideal implementation of a gate-set (as is done in interleaved randomized benchmarking [30]). While different in protocol and data interpretation, in the analysis this black-box query model is simply a special case of the gate-independent noise model, and results carry over.
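To see why only the product of the right and left noise channels matters between gates, it may help to expand the sequence channel explicitly. The expansion below is a sketch under two assumptions that are not spelled out in the text above: the gate-independent model is taken to be $\phi(g) = \Lambda_L\,\omega(g)\,\Lambda_R$, and the product is ordered so that $g_1$ acts first.

```latex
\begin{align}
E(g) &= \phi(g_m)\,\phi(g_{m-1})\cdots\phi(g_1) \\
     &= \Lambda_L\,\omega(g_m)\,(\Lambda_R\Lambda_L)\,\omega(g_{m-1})\,(\Lambda_R\Lambda_L)\cdots(\Lambda_R\Lambda_L)\,\omega(g_1)\,\Lambda_R .
\end{align}
```

Under these assumptions the combination $\Lambda_R\Lambda_L$ appears $m-1$ times, once between each pair of consecutive ideal gates, while the outer factors $\Lambda_L$ and $\Lambda_R$ can be absorbed into the measurement and the state preparation, respectively.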
The main analytical result of this work is to establish rigorous performance guarantees for the estimation from gate-set shadows. The obvious first question is: what do we actually estimate? As a first result, we establish the 'simple' model that we should be fitting to the data. We show that for a probe operator A the empirical estimator of the protocol converges in probability with the number of samples S in the shadow to a matrix-exponential decay Here, the matrix Φ depends only on the 'between-gates noise channel' $\Lambda := \Lambda_R \Lambda_L$ and the probe super-operator A, while Θ captures SPAM dependence. In particular, if ω contains t copies of the representation σ then we have where $P_i$ is the projector onto the ith copy of the representation σ inside ω. Note that here the trace is taken on the space of super-operators. We give the derivation of this result in the Supplementary Note 4. Equation (5) indicates that we should fit a linear combination of (up to) t exponential decays to the sequence average $\hat{k}_{f_A}(m)$. The resulting decay parameters are the eigenvalues of the matrix Φ, which encode information about the overlap of Λ and A in the representation space.
A particularly simple fitting model with easily interpretable decay parameters arises when the representation σ appears in the decomposition of ω without multiplicities (i.e., there is no other representation in ω related to σ by a change of basis). If σ is multiplicity-free, then $k_{f_A}(m)$ describes a single scalar exponential decay with decay parameter and $A_\sigma = P_\sigma A P_\sigma$ the probe operator restricted to the representation σ of dimension $d_\sigma$. Note that the proportionality now hides the SPAM-dependent pre-factor. Thus, by fitting a single exponential decay to the empirically observed sequence averages $\hat{k}_{f_A}$, we can estimate $p_{\sigma,A}(\Lambda)$, the trace-overlap of Λ with A on σ. The decay parameter can be thought of as a generalized fidelity or effective depolarization parameter, indicating how much the noise channel Λ agrees on average with the probe operator A on the representation space of σ.

D. Sample complexity
Against the background of the extensively explored variants of RB protocols, the above decay model is not entirely unexpected. A priori less obvious, however, is the sample efficiency of gate-set shadow protocols. The sequence correlation functions f(x, g) involve normalization factors that typically scale with the dimension of the irreducible representation under consideration. As a consequence, their range can become exponentially large in the number of qubits, causing a simple empirical mean estimator to be susceptible to outliers in the measurement statistics, as well as making a suitably bounded variance a priori nontrivial. Going significantly beyond the established statistical guarantees in RB, we establish general variance bounds for the UIRS protocol. We do this by introducing a sequence analogue to the shadow norm introduced in ref. [16], defined on probe super-operators A as opposed to observables. Emphasizing its explicit dependence on the sequence length m, we call this norm (really a family of norms indexed by m) the dynamic shadow norm $\|A\|_{\mathrm{dyn},m}$. This norm, formally defined in Supplementary Equation (25), depends on the underlying gate-set G as well as the ideal input POVM $\{E_x\}$ and state ρ. Given these parameters, it quantifies the sample complexity of estimating the mean $k_{f_A}(m)$ for arbitrary gate-independent noise. Because of its dependence on the sequence length, the dynamic shadow norm is a more intricate object than its state counterpart. Evaluating it for specific gate-sets accounts for the bulk of the technical innovation in this paper. In terms of the dynamic shadow norm we have the following upper bound on the variance of the UIRS protocol.

Theorem 1 (Upper bound on the variance). Consider a UIRS protocol (at sequence length m) with gate-set G and a correlation function $f_A$ with probe operator A. The variance of the

An extended statement and the proof are given in the Supplementary Note 4.
The bound on the variance $V_A(m)$ directly implies a non-asymptotic bound on the sample complexity for the estimator $\hat{k}_{f_A}(m)$ with exponential confidence through the use of median-of-means estimation. The exponential confidence in particular allows us to estimate 'many' quantities simultaneously from the same shadow data with only logarithmic overhead in the number of quantities. See Supplementary Note 3 for details. More precisely, we get the following guarantee: Run the UIRS protocol (at sequence length m) and measure a gate-set shadow of S many samples. Choose a set $\mathcal{A}$ of probe operators, an ϵ > 0 and ensure that for all for a suitable constant C. Then, in the post-processing, we obtain ϵ-additive estimates, i.e., $|\hat{k}_A(m) - k_A(m)| \le \epsilon$. Hence, bounding the dynamic shadow norm for all $A \in \mathcal{A}$ and different sequence lengths m gives simultaneous guarantees for many estimators $\hat{k}_A(m)$, with an overall sampling complexity given by the sum of the bounds in eq. (10) over all m. As explained above, $m \mapsto \hat{k}_A(m)$ is then fitted using a theoretical signal model. For example, in the scenario of multiplicity-free representations giving rise to a single exponential decay eq. (7), we thereby obtain an estimator for $p_{\sigma,A}(\Lambda)$ for all $A \in \mathcal{A}$. The exponential fitting itself is a well-studied problem, for which many advanced techniques [39,40], flexible software packages [41], and rigorous bounds [42] can be readily applied.
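The median-of-means estimator referred to above can be sketched in a few lines. This is a generic textbook version under the assumption that the single-shot values of a sequence correlation function are available as a list of real numbers; the function name `median_of_means` and the choice of equal-sized consecutive batches are illustrative and not taken from this work.

```python
import statistics


def median_of_means(samples, num_batches):
    """Split samples into equal consecutive batches, average each, return the median.

    Samples that do not fit into num_batches equal batches are discarded.
    """
    batch_size = len(samples) // num_batches
    if num_batches < 1 or batch_size == 0:
        raise ValueError("need at least one sample per batch")
    batch_means = [
        statistics.fmean(samples[b * batch_size:(b + 1) * batch_size])
        for b in range(num_batches)
    ]
    return statistics.median(batch_means)


if __name__ == "__main__":
    # Eight well-behaved single-shot values and one large outlier.
    values = [1, 1, 1, 1, 1, 1, 1, 1, 1000]
    print("plain mean:     ", statistics.fmean(values))
    print("median of means:", median_of_means(values, num_batches=3))
```

In the example, the single outlier shifts the plain mean far away from 1, whereas it only corrupts one of the three batch means and therefore leaves the median unaffected. In the standard analysis of this estimator, the batch size is set by the variance and the target precision, while the number of batches grows only logarithmically with the number of quantities to be estimated simultaneously and the inverse failure probability.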

E. Example: Multi-qubit Clifford UIRS
We now provide two particularly practically relevant examples of UIRS protocols, derive their signal model and a dynamical shadow norm bound guaranteeing their efficiency.
The first example is the multi-qubit Clifford group $C_n$ that already takes a prominent role in quantum characterization and quantum computation more generally [43]. We consider a UIRS experiment for $C_n$, i.e., sequences of i.i.d. Clifford gates drawn uniformly at random, acting on the initial state |0⟩⟨0| and ending in a computational basis measurement. This is a common gate-set with a well-understood representation structure, allowing us to explicitly calculate the sequence mean $k_A(m)$ and give bounds on the dynamic shadow norm $\|A\|_{\mathrm{dyn},m}$, which controls the sample complexity of sequence estimation.
Signal model. The adjoint representation ω(g) of the multi-qubit Clifford group decomposes into two inequivalent irreducible representations [44]: $\sigma_{\mathrm{tr}}$ supported on the normalized identity matrix and $\sigma_{\mathrm{ad}}$ supported on the space of traceless matrices, spanned by the generalized Pauli matrices. See Supplementary Note 2 for details. We focus on sequence correlation functions with support on $\sigma_{\mathrm{ad}}$ only, i.e., $A = P_{\mathrm{ad}} A P_{\mathrm{ad}}$. Then, $k_{f_A}(m)$ describes a single exponential decay eq. (7) with This is a familiar quantity: For $A = P_{\mathrm{ad}}$, it corresponds to the depolarizing probability (essentially the average fidelity) of the channel Λ. As a very special case, the Clifford UIRS protocol in this way emulates standard Clifford randomized benchmarking without performing an inversion. However, gate-set shadows are considerably more flexible. For instance, by choosing A = U a unitary channel, $p_{\mathrm{ad},U}(\Lambda)$ measures the relative average fidelity of Λ w.r.t. the unitary U (i.e., the average fidelity of $U^\dagger \circ \Lambda$). In particular, for U a Clifford channel, the corresponding sequence correlation function can be evaluated efficiently. Relative average gate fidelities are also estimated in interleaved RB. Compared to existing interleaved RB protocols such as the scheme of ref. [29], gate-set shadows have the crucial advantage that the experimental protocol itself is independent of U. Since we do not have to implement A on a quantum device, we can also consider A that do not correspond to quantum channels, such as rank-one super-operators of the form $X\,\mathrm{Tr}(Y\,\cdot\,)$ for operators X, Y. Hence, the gate-set shadows are a versatile tool to estimate properties of the implementation of a Clifford gate-set.
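The role of the probe operator can be made concrete with a single-qubit toy calculation in the Pauli transfer matrix (PTM) picture. The sketch below assumes that the decay parameter is the normalized trace-overlap $\mathrm{Tr}(A_{\mathrm{ad}}^{T} \Lambda_{\mathrm{ad}})/(d^2-1)$ of the traceless PTM blocks, with d = 2; this convention, as well as the helper names `z_rotation_ptm`, `depolarizing_ptm`, `compose` and `overlap_on_traceless_block`, are assumptions made for illustration and are not quoted from this work.

```python
import math


def z_rotation_ptm(theta):
    """PTM (basis I, X, Y, Z) of the unitary channel generated by exp(-i*theta*Z/2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0],
            [0, c, -s, 0],
            [0, s, c, 0],
            [0, 0, 0, 1]]


def depolarizing_ptm(p):
    """PTM of a depolarizing channel that shrinks every traceless Pauli by p."""
    return [[1, 0, 0, 0],
            [0, p, 0, 0],
            [0, 0, p, 0],
            [0, 0, 0, p]]


def compose(a, b):
    """PTM of 'apply b first, then a', i.e. the matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def overlap_on_traceless_block(probe, noise):
    """Assumed convention: Tr(A_ad^T Lambda_ad) / (d^2 - 1) with d = 2."""
    total = sum(probe[i][j] * noise[i][j] for i in range(1, 4) for j in range(1, 4))
    return total / 3


if __name__ == "__main__":
    # Hypothetical noise: a small coherent over-rotation followed by depolarization.
    noise = compose(depolarizing_ptm(0.98), z_rotation_ptm(0.1))
    identity_probe = depolarizing_ptm(1.0)
    print("overlap with identity probe:", overlap_on_traceless_block(identity_probe, noise))
    print("overlap with rotation probe:",
          overlap_on_traceless_block(z_rotation_ptm(0.1), noise))
```

With the identity probe, the coherent rotation and the depolarization both lower the overlap. Probing with the rotation itself removes the coherent part and returns the depolarizing parameter 0.98 up to rounding, which illustrates how different probes applied to the same noise isolate different contributions.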
Dynamic shadow norm. The versatility of Clifford UIRS in practice of course crucially depends on the sample efficiency of the estimation. From the above it is not clear that $k_A(m)$ can be efficiently estimated for arbitrary A. Demanding that $k_A(1) = 1$ in the limit of perfect state preparation, measurement and gates, the normalization factor α in eq. (2) is $\alpha = 2^n + 1$, leading to a single-shot estimator taking values exponentially large in n. Building upon the machinery of the dynamic shadow norm and theorem 1, we can still provide guarantees for efficiently estimable probe operators and investigate the limits of Clifford UIRS. As a first step, we assume A to be a restriction of a unitary channel U to the traceless subspace, i.e., $A = P_{\mathrm{ad}} U P_{\mathrm{ad}}$. In this case, the dynamic shadow norm can in fact be bounded by a small constant independent of the sequence length.
Theorem 2 (Clifford UIRS unitary norm bound). For the n-qubit Clifford UIRS protocol, U a unitary channel, and $A = P_{\mathrm{ad}} U P_{\mathrm{ad}}$, it holds that

Theorem 2 is noteworthy for several reasons. First, it does not depend on the number of qubits n. Therefore, the estimation of $k_U(m)$ is efficient even on a quantum system consisting of many qubits. Second, the shadow-norm bound does not depend on the sequence length m, enabling relative accuracy estimation of the decay rate in certain regimes. We note that the constant 10 is probably sub-optimal. The derivation of this theorem can be found in the Supplementary Note 6.
As the main consequence of theorem 2 together with eq. (10), we find that it is possible to sample-efficiently estimate exponentially many relative fidelities with respect to unitary channels to additive precision from the same gate-set shadows obtained by multi-qubit Clifford UIRS.
Next, we consider a general probe super-operator A restricted to the traceless subspace. Note that A does not need to be a quantum channel. In the following, we show that the dynamic shadow norm can be controlled in terms of the unitarity [45] of A, For instance, u(A) ≤ 1 if A is a quantum channel, with equality if A is indeed unitary. We prove the following theorem.
Theorem 3 (Clifford UIRS general norm bound). Consider the n-qubit Clifford UIRS protocol and let $A = P_{\mathrm{ad}} A P_{\mathrm{ad}}$ be a probe super-operator restricted to the traceless subspace. The dynamic shadow norm for m > 2 is upper bounded by with $r(A) = (1 + 2^{4-n/3})\,u(A)$ and suitable constant C.
The proof of this theorem, given in the Supplementary Note 6, is similar in spirit to that of theorem 2, but significantly more involved. Choosing A to be unitary (u(A) = 1) does not recover theorem 2, due to the appearance of the quadratic scaling in m. This term arises because we consider general probe super-operators A, giving rise to polynomial transient dynamics in the dynamic shadow norm (due to the non-normality of the underlying operators [46, Chapter 6]). For many sensible choices of A, the polynomial scaling in m does not appear, as is evidenced by theorem 2. Also, the bound does not quite scale with the unitarity u(A), but rather with the parameter r(A), which differs from u(A) by an exponentially small factor. We believe this to be an artifact of the proof technique.
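For orientation, the unitarity of a few single-qubit probe super-operators can be computed directly. The sketch below assumes the commonly used definition of the unitarity as the normalized squared Frobenius norm of the traceless PTM block, $u(A) = \mathrm{Tr}(A_{\mathrm{ad}}^{T} A_{\mathrm{ad}})/(d^2-1)$; whether this matches the normalization of ref. [45] in every detail is an assumption, and the helper names are hypothetical.

```python
import math


def unitarity(ptm):
    """Assumed definition: squared Frobenius norm of the traceless block over d^2 - 1."""
    return sum(ptm[i][j] ** 2 for i in range(1, 4) for j in range(1, 4)) / 3


def z_rotation_ptm(theta):
    """PTM (basis I, X, Y, Z) of a unitary rotation about the Z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0],
            [0, c, -s, 0],
            [0, s, c, 0],
            [0, 0, 0, 1]]


def depolarizing_ptm(p):
    """PTM of a depolarizing channel with depolarizing parameter p."""
    return [[1, 0, 0, 0],
            [0, p, 0, 0],
            [0, 0, p, 0],
            [0, 0, 0, p]]


def rank_one_pauli_probe(index):
    """PTM of tau * Tr(tau . ) for a normalized Pauli tau (index 1, 2, 3 = X, Y, Z)."""
    ptm = [[0] * 4 for _ in range(4)]
    ptm[index][index] = 1
    return ptm


if __name__ == "__main__":
    print("unitary rotation:   ", unitarity(z_rotation_ptm(0.3)))
    print("depolarizing p=0.9: ", unitarity(depolarizing_ptm(0.9)))
    print("rank-one Pauli probe:", unitarity(rank_one_pauli_probe(3)))
```

Under this definition a unitary probe has unitarity one, a depolarizing channel with parameter p has unitarity p squared, and a rank-one Pauli probe, which is not a quantum channel, has unitarity 1/3 for a single qubit.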
This theorem leads us to the remarkable conclusion that the multi-qubit Clifford UIRS protocol allows us to estimate overlaps p(AΛ) for a very large class of super-operators. In particular, A can be any trace non-increasing map, allowing us, e.g., to characterize the overlap between the noise channel Λ and sets of Kraus operators, making the Clifford UIRS protocol an all-purpose tool for noise map exploration.

F. Example: Local Clifford UIRS
A particularly scalable and interesting protocol arises when performing a UIRS protocol with the local Clifford group $C_1^{\times n}$ over n qubits. In this case, the experiment consists of performing sequences of i.i.d. random single-qubit gates simultaneously on all qubits, initially prepared in |0⟩⟨0|, ending with a computational basis measurement.
For $C_1^{\times n}$ with $U_g = U_{(g_1, \ldots, g_n)} = U_{g_1} \otimes \cdots \otimes U_{g_n}$, the adjoint representation ω decomposes into $2^n$ irreducible, mutually inequivalent representations $\sigma_w$ with $w \in \{0, 1\}^n$ that have support on the normalized non-identity Pauli operators on all qubits i for which $w_i = 1$. We denote the projectors onto these irreducible sub-representations as $P_w$ (see Supplementary Note 2 for more details).
Signal model. We consider sequence correlation functions with probe operators A that only have support on a single irreducible representation $\sigma_w$ and set $\alpha = 2^n 3^{|w|}$. Then, the mean $k_{f_A}(m)$ again describes a single exponential decay eq. (7) with We will refer to this quantity as a local fidelity w.r.t. A. The local fidelity is again somewhat familiar. The special case $p_{w,I}$ has been called the 'addressability' in ref. [28], where it was used to gain information about the strength of correlated errors. Using gate-set shadows of simultaneously applied local gate sequences, we can collect even more information about correlated errors, giving rise to an efficient cross-talk tomography protocol introduced in section II I. We can again equip the UIRS protocol with sampling complexity guarantees by bounding the shadow norm.

Dynamic shadow norm. We derive a bound on the dynamic shadow norm of the local Clifford group that depends exponentially on the Hamming weight |w| of the bit-string w labeling the representation being addressed but is independent of the total number of qubits in the system.

Theorem 4 (Local Clifford UIRS norm bound). For the local Clifford UIRS protocol on n qubits, $w \in \{0, 1\}^n$, and $A = P_w A P_w$ a probe operator, it holds that

The proof is given in the Supplementary Note 5. Note that the term inside the square bracket in eq. (16) can be considered as a variant of the unitarity restricted to the image of $P_w$. In particular, if $A = P_w U P_w$ for any unitary channel U we have $3^{-|w|}\,\mathrm{Tr}(A A^\dagger) = 1$. Thus, for restrictions of unitary probe operators the bound becomes independent of the sequence length and, in consequence, the protocol is sample-efficient for bounded |w|.
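How local fidelities for different bit-strings w carry information about correlated errors can be illustrated with a two-qubit toy model. The sketch below assumes that, for the identity probe, the local fidelity is the average of the diagonal PTM entries over all Pauli operators whose support is exactly w, i.e. $\mathrm{Tr}(P_w \Lambda)/3^{|w|}$. The noise model (independent depolarizing noise followed by a correlated ZZ error with probability q) and the names `local_fidelity` and `make_toy_channel` are hypothetical and chosen only for this example.

```python
import itertools


def local_fidelity(w, ptm_diagonal):
    """Average the diagonal PTM entries over all Paulis with support exactly w.

    Paulis are tuples of labels 0, 1, 2, 3 (I, X, Y, Z), one label per qubit.
    """
    choices = [(1, 2, 3) if bit else (0,) for bit in w]
    paulis = list(itertools.product(*choices))  # 3**|w| operators
    return sum(ptm_diagonal(p) for p in paulis) / len(paulis)


def make_toy_channel(dep_params, q):
    """Diagonal PTM entries of a hypothetical two-qubit Pauli channel.

    Each qubit i is depolarized with parameter dep_params[i]; afterwards a
    correlated Z (x) Z error is applied with probability q.
    """
    def ptm_diagonal(pauli):
        value = 1.0
        for label, p in zip(pauli, dep_params):
            if label != 0:
                value *= p
        # A Pauli anticommutes with Z (x) Z iff an odd number of its factors is X or Y.
        anticommuting = sum(label in (1, 2) for label in pauli)
        sign = 1 if anticommuting % 2 == 0 else -1
        return value * ((1 - q) + q * sign)
    return ptm_diagonal


if __name__ == "__main__":
    for q in (0.0, 0.05):
        channel = make_toy_channel((0.99, 0.98), q)
        p10 = local_fidelity((1, 0), channel)
        p01 = local_fidelity((0, 1), channel)
        p11 = local_fidelity((1, 1), channel)
        print(f"q={q}: p11 - p10*p01 = {p11 - p10 * p01}")
```

For q = 0 the noise is a tensor product and the weight-two local fidelity factorizes into the two weight-one fidelities, up to rounding. A non-zero q breaks this factorization. Comparing such quantities is one simple indicator of correlated errors, in the spirit of simultaneous RB [28]; it is not the cross-talk tomography protocol of section II I.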

G. Example beyond UIRS: Pauli-noise estimation
Thus far we have focused on uniformly independently sampled random sequences (UIRS protocols). It is also fruitful to consider more general probability distributions on the set of sequences of a given length. We give an example for this by constructing a simple protocol that estimates the diagonal elements of an $n$-qubit channel $\Lambda$ using only $O(n 2^n)$ samples. This sampling complexity matches the asymptotic bound given for this task in ref. [19]. Using gate-set shadows, however, gives a simpler experimental description and analysis.

To this end, consider random sequences of the form $g = (c^{-1}, p_m, \dots, p_1, c)$ where $p_1, \dots, p_m$ are chosen independently and uniformly at random from the Pauli group $P_n$ and $c$ is chosen uniformly at random from the Clifford group $C_n$. Note the inverse $c^{-1}$ here at the end of the sequence. In a black-box fashion, we additionally intersperse the channel $\Lambda$ in between executing the random Pauli elements in the experiment. The measurement is again a computational basis measurement and the initial state is $\rho = |0\rangle\langle 0|$. Choose $\tau$ to be a Hilbert-Schmidt normalized traceless Pauli operator. As the associated correlation function we define the function of eq. (17), with $A_\tau := \tau\,\mathrm{Tr}(\tau\,\cdot)$ and $\alpha = 2^n(2^n + 1)$. For convenience, we ignore the SPAM in deriving and stating the following results. Both of these assumptions can be easily relaxed.

As we show in Supplementary Note 7, the corresponding sequence mean $k_\tau(m)$ is a power of the diagonal matrix entry $\Lambda_{\tau,\tau}$ of $\Lambda$ corresponding to $\tau$. We further show that the variance of the associated estimator is bounded in $O(2^n)$ for all choices of $\tau$. Note that there are $4^n - 1$ such choices, characterizing all diagonal elements of the quantum channel $\Lambda$. Hence, by using median-of-means estimators, we can estimate $k_\tau(m)$ for all $\tau$ to uniform additive precision using $O(n 2^n)$ samples (independently of $m$). By the analysis in ref. [42] for the estimation of single exponential decays, and the fact that the decay rates $\Lambda_{\tau,\tau}$ are strongly clustered ([33, Lemma 4]), this leads to a relative precision estimation of the associated Pauli fidelities, matching the performance given in ref. [19].
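The role of the random Pauli elements can be illustrated numerically. The following is a minimal sketch, for a single qubit and without sampling or SPAM, which checks that averaging a channel's Pauli transfer matrix over conjugation with all Paulis removes its off-diagonal entries, so that repeated application yields powers of the diagonal entries $\Lambda_{\tau,\tau}$. The helper `ptm`, the example channel and its parameters are assumptions made for illustration only; the sketch does not implement the estimator of the protocol itself.

```python
import numpy as np

# Single-qubit Pauli matrices I, X, Y, Z.
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def ptm(kraus):
    """Pauli transfer matrix R[i, j] = Tr(P_i E(P_j)) / 2 of a channel E."""
    R = np.zeros((4, 4))
    for j, Pj in enumerate(PAULIS):
        out = sum(K @ Pj @ K.conj().T for K in kraus)
        for i, Pi in enumerate(PAULIS):
            R[i, j] = np.real(np.trace(Pi @ out)) / 2
    return R


# Hypothetical noise channel: amplitude damping followed by a small X rotation.
gamma, angle = 0.1, 0.2
damping = [
    np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
    np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex),
]
rot = np.cos(angle / 2) * PAULIS[0] - 1j * np.sin(angle / 2) * PAULIS[1]
noise = ptm([rot @ K for K in damping])

# Pauli twirl: average D_p R D_p over the four Paulis, where D_p is the
# (diagonal, self-inverse) transfer matrix of conjugation with the Pauli p.
pauli_ptms = [ptm([P]) for P in PAULIS]
twirled = sum(D @ noise @ D for D in pauli_ptms) / 4

# After twirling, m applications decay as the m-th power of each diagonal entry.
m = 5
decay = np.diag(np.linalg.matrix_power(twirled, m))
print(np.round(np.diag(noise), 4), np.round(decay, 4))
```

The diagonal of `twirled` coincides with the diagonal of `noise`, which is the quantity the sequence means give access to.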

H. Application: Learning unitary noise models
In the previous section we have shown how to efficiently estimate the overlap of certain probe-operators with the noise of a gate-set. This data, e.g., the average gate fidelity of the noise with a specific gate, is already of interest. The most intriguing feature, however, is that we can estimate many different probe-operators from the same data. In this way, we can use estimates from gate-set shadows as a subroutine in a complex post-processing pipeline that extracts more information about the noise. This opens up the way to perform many different characterization tasks that arise in a full-scale engineering cycle of building a quantum computer from the same simple data. Importantly, the resulting protocols automatically inherit the SPAM robustness of the estimation protocol. We illustrate these possibilities with three concrete examples.
When characterizing noisy quantum gates one differentiates between coherent noise (due to imperfect specification of the gate) and incoherent noise (due to interactions with the environment). These two types of noise have different consequences for, e.g., error correction [1,49] and are engineered away in different ways. At the same time, coherent errors can be corrected by experimental design and control if one has a concrete description. Given a model for a unitary channel $\theta \mapsto U(\theta)$, we can learn the model parameters $\theta$ approximating the noise channel $\Lambda$ by maximizing $F(U(\theta), \Lambda)$. During the optimization the objective function, its gradient, etc. can be estimated from the same classical gate-set shadow. For the multi-qubit Clifford UIRS every estimation requires a polynomial size shadow in the number of qubits and only a logarithmic overhead in the number of evaluations of $F(U(\theta), \Lambda)$. A numerical simulation of a simple learning example is given in fig. 2.

[Figure caption, fragment] Cross-talk of the form $N_c = XX(\theta)$, with dashed boxes indicating the unital marginals $\Lambda_{0,1}$, $\Lambda_{1,0}$, and $\Lambda_{1,1}$. Panels f) and g) show the PTMs of the difference between the unital marginal $\Lambda_{1,1}$ and the tensor product $\Lambda_{1,0} \otimes \Lambda_{0,1}$ as a characterization of the cross-talk between the two qubits, for cross-talk $N_c = XX(\theta = 0.4)$ in f) and $N_c = ZZ(\theta = 0.4)$ in g). Simulations have been performed using Qiskit [47] with single-qubit depolarizing noise of $p_1 = 0.002$ for single-qubit gates and two-qubit depolarizing noise of $p_2 = 0.01$ for two-qubit gates (on top of the custom noise processes after each Clifford layer). For the PTM plots, modified functions from the Forest Benchmarking package [48] have been used.
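The parameter-learning step can be sketched as follows for a hypothetical one-parameter model. We assume the model $U(\theta) = \exp(-i\theta Z/2)$, a noise channel given by such a rotation followed by depolarizing noise, and the process fidelity $\mathrm{Tr}(R_U^T R_\Lambda)/4$ as the objective; all of these are illustrative assumptions. In the protocol, the fidelity values would be estimated from the gate-set shadow, whereas here exact values stand in for those estimates and a plain grid search replaces a proper optimizer.

```python
import numpy as np


def rz_ptm(theta):
    """Pauli transfer matrix of the unitary channel exp(-i theta Z / 2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1.0]])


def depolarizing_ptm(p):
    """Pauli transfer matrix of single-qubit depolarizing noise of strength p."""
    return np.diag([1.0, 1 - p, 1 - p, 1 - p])


def process_fidelity(R_model, R_noise):
    """Process fidelity between a unitary model and a channel (d = 2)."""
    return np.trace(R_model.T @ R_noise) / 4


# Hypothetical noise: coherent over-rotation plus weak depolarizing noise.
theta_true, p = 0.3, 0.05
noise = depolarizing_ptm(p) @ rz_ptm(theta_true)

# Grid search over the model parameter; every evaluation uses the same 'data'.
grid = np.linspace(-np.pi, np.pi, 2001)
scores = np.array([process_fidelity(rz_ptm(t), noise) for t in grid])
theta_hat = grid[np.argmax(scores)]
print(theta_hat, scores.max())
```

The maximizer recovers the coherent part of the noise, while the residual infidelity at the optimum reflects the incoherent part.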

I. Application: Cross-talk tomography
A key source of error in today's quantum computing devices is correlated noise, or cross-talk. For this reason, a significant effort has gone into characterizing cross-talk errors specifically [28,50,51]. Using the flexibility of extracting manifold information from gate-set shadows in the post-processing, we here propose cross-talk tomography as an efficient, robust, and detailed cross-talk characterization procedure, based on the local Clifford UIRS protocol.
The protocol gains tomographic information about what we call the unital marginals $\Lambda_w = P_w \Lambda P_w$, $w \in \{0,1\}^n$, of the noise channel $\Lambda$. (Here, $P_w$ is again the projector onto the irreducible representations of the local Clifford group.) These unital marginals arise as restrictions of channel marginals $\Lambda_{\bar A}$, where one evaluates $\Lambda$ on a maximally mixed input on a system $A$ and traces out the resulting state on $A$ [52]. Now $\Lambda_w$ can be reconstructed via simple linear inversion (see ref. [32, Lemma 37]) from the local fidelities $p_{w,C}(\Lambda) = 3^{-|w|}\,\mathrm{Tr}(\Lambda P_w C P_w)$ with respect to the probe-operators given by the local Clifford channels $C$. The sum in the inversion formula is restricted to local Clifford channels with unitaries from the subgroup $C_w$ of $C_n$ acting non-trivially on only the qubits in the support of $w$. In fact, it is sufficient to consider all local Clifford channels $C$ that act non-trivially on the support of the bit-string $w$. Not restricting the non-trivial support of $C$, however, allows us to simultaneously reconstruct $\Lambda_w$ for multiple values of $w$. This constitutes the basis of cross-talk tomography for $k$-local interactions. Let $H_k \subset \{0,1\}^n$ be the subset of bit strings with Hamming weight $k$.
i) Perform the UIRS experiment for the local Clifford group over $n$ qubits.
ii) Estimate $p_{w,C}(\Lambda)$ for all $w \in H_k$ and for all $C$ acting non-trivially on the support of $w$.
iii) Reconstruct all $\Lambda_w$ for $w \in H_k$.
By comparing $\Lambda_w$ for different bit strings, one obtains information about the correlations present in $\Lambda$. Building upon the guarantees for UIRS, we show that cross-talk tomography is $\epsilon$-accurate in diamond norm for all $\Lambda_w$ using $O(k^2 2^{9k}/\epsilon^2)$ shadow samples (up to log-factors). Thus, for small $k$, cross-talk tomography is highly scalable to large numbers of qubits. In the light of theorem 4, this efficiency stems from using local unitary probe operators. The derivation and even tighter guarantees are given in Supplementary Note 8.
As an illustration we study the protocol with a 2-qubit example. We start by using the local Clifford UIRS protocol to reconstruct the 2-qubit unital marginals $\Lambda_{1,0}$, $\Lambda_{0,1}$ and $\Lambda_{1,1}$. Next we compute the tensor product $\Lambda_{1,0} \otimes \Lambda_{0,1}$. It is straightforward to see that if the channel $\Lambda$ is a tensor product of single-qubit quantum channels featuring no correlations (i.e., there is no cross-talk) then $\Lambda_{1,0} \otimes \Lambda_{0,1} = \Lambda_{1,1}$. Hence, both the difference $\Lambda_{1,0} \otimes \Lambda_{0,1} - \Lambda_{1,1}$ and the product $\Lambda_{1,1} (\Lambda_{1,0} \otimes \Lambda_{0,1})^{-1}$ provide meaningful characterizations of cross-talk present between qubits 1 and 2. The difference measure can be considered as a generalization of the commonly used addressability metric proposed in ref. [28]. Going beyond a mere metric, we expect that the channel marginals not only detect the presence of cross-talk, but also provide more detailed diagnostic information. As a proof of principle we have numerically simulated the above protocol to diagnose cross-talk in a two-qubit system; the results are presented in fig. 2.
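The comparison of unital marginals can be sketched directly on an exactly known Pauli transfer matrix. The snippet below is a minimal illustration under stated assumptions: the noise is a hypothetical unitary consisting of local rotations followed by an $XX(\theta)$ interaction, the marginals are read off as blocks of the exact transfer matrix rather than reconstructed from shadow data, and the helper names `ptm`, `unital_marginals` and `crosstalk_norm` are introduced here for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]
# Normalized two-qubit Pauli basis; the element P_i (x) P_j has index 4*i + j.
BASIS = [np.kron(P, Q) / 2 for P in PAULIS for Q in PAULIS]


def ptm(U):
    """Pauli transfer matrix of the two-qubit unitary channel rho -> U rho U^dagger."""
    R = np.zeros((16, 16))
    for b, tb in enumerate(BASIS):
        out = U @ tb @ U.conj().T
        for a, ta in enumerate(BASIS):
            R[a, b] = np.real(np.trace(ta @ out))
    return R


def unital_marginals(R):
    """Blocks of R on the sectors w = (1,0), (0,1) and (1,1)."""
    idx10 = [4 * i for i in (1, 2, 3)]
    idx01 = [1, 2, 3]
    idx11 = [4 * i + j for i in (1, 2, 3) for j in (1, 2, 3)]
    return tuple(R[np.ix_(idx, idx)] for idx in (idx10, idx01, idx11))


def crosstalk_norm(theta):
    """Frobenius norm of Lambda_10 (x) Lambda_01 - Lambda_11 for XX(theta) cross-talk."""
    rz = np.diag([np.exp(-0.05j), np.exp(0.05j)])       # local Z rotation by 0.1
    rx = np.cos(0.1) * I2 - 1j * np.sin(0.1) * X        # local X rotation by 0.2
    xx = np.cos(theta / 2) * np.eye(4) - 1j * np.sin(theta / 2) * np.kron(X, X)
    L10, L01, L11 = unital_marginals(ptm(xx @ np.kron(rz, rx)))
    return np.linalg.norm(np.kron(L10, L01) - L11)


print(crosstalk_norm(0.0), crosstalk_norm(0.4))
```

Without the interaction the difference vanishes up to rounding, while a non-zero $\theta$ yields a clearly non-zero difference, in line with the discussion above.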

J. Application: SPAM-robust channel reconstruction
Kimmel et al. [29] have proposed the idea to combine the output of $O(2^{4n})$ different interleaved RB experiments in order to get a robust tomographic estimate of a unital quantum channel $\Lambda$. By explicitly exploiting the low Kraus-rank, compressive RB tomography [31,32] can reconstruct a unitary approximation to the quantum channel from (up to log-factors) $O(2^{2n})$ randomly selected different relative average-gate fidelities with respect to Clifford unitaries. The previous references, however, left the problem open of providing a SPAM-robust RB protocol that achieves the information-theoretically optimal sampling complexity of $O(2^{4n})$ [32] for reconstructing a unitary channel.
We fill in this blank using the data from a multi-qubit Clifford UIRS protocol. Using a set of randomly selected Clifford unitaries as probe-operators, we can provide the input data to the reconstruction algorithm of ref. [32]. We show in Supplementary Note 9 that the number of gate-set shadows to guarantee an accurate reconstruction (in Hilbert-Schmidt norm of the Choi-states) indeed matches the lower bound of $O(2^{4n})$. Note that the number of channel invocations is bounded by the maximal sequence length times the number of sequences. Besides the favorable scaling, the UIRS protocol has the crucial advantage compared to, e.g., the interleaved protocol of ref. [29] that the same measurement data is used for estimating all the average fidelities.
Going beyond the compressive reconstruction of unitary quantum channels, we can use Clifford UIRS as a primitive for the robust reconstruction of arbitrary unital quantum channels in the spirit of ref. [29], see also ref. [32, theorem 38] and ref. [53]. The required size of the gate-set shadow is $O(2^{8n})$ for an accurate reconstruction in any norm in which unitary channels are normalized.

K. Gate-dependent noise
The presentation so far assumed gate-independent noise. This assumption can be substantially relaxed, at the cost of introducing a more complex description of the noise. We will focus on the UIRS protocol, which is particularly robust against gate-dependent fluctuations. We give a fairly comprehensive argument, but leave a rigorous proof of the robustness to future work. Our argument follows that of the robustness against gate-dependent errors for RB [14,34]. For gate-dependent noise, the data can in expectation be written in a general form (see the derivation of Theorem 7 in the Supplementary Information) in which $\Xi$ depends on the state and measurement and the operator $F(\phi)[\sigma] := E_{g \in G}\, \sigma(g) \otimes \phi(g)$ is known as the (non-commutative) Fourier transform of $\phi$, evaluated at the irreducible representation $\sigma$.
A key fact about this Fourier transform (see, e.g., ref. [54] for a proof) is that if ϕ is a representation ω (i.e., a perfectly implemented gate-set), then F(ϕ)[σ] is an orthogonal projector with rank equal to the number of copies of σ present in ω.For simplicity, let ω be multiplicity-free.Then, F(ϕ)[σ] is a rank-one projector.This implies that (A ⊗ I)F(ϕ)[σ] is also a rank-one projector.When ϕ is a sufficiently 'good' implementation of ω, the difference between F(ϕ)[σ] and F(ω)[σ] is small (in some suitable norm) and can be regarded as a perturbation of F(ω) [σ].(See ref. [14] for a discussion of norms on this space.)Applying the perturbation theory of non-normal matrices, we conclude that (A ⊗ I)F(ϕ)[σ] is as well approximately rank-one, and in particular that there exist super-operators Λ L , Λ R such that where E is a matrix of small norm and P σ is the projector onto σ (in the image of ω).This means that the decay rate k A (m) has the general functional form where B 1 , B 2 are real numbers encoding SPAM, δ(E) is small, and p(A), the dominant eigenvalue of (A ⊗ I)F(ϕ)[σ], is given by Up to a small and exponentially decreasing error, we thus recover the functional form of eq. ( 5) also in the presence of gate-dependent noise.It is important to note however, that in this general case Λ L and Λ R (and their product) need not be CPTP.This complicates the interpretation of p(A) as describing an aspect of a physical noise process.

III. DISCUSSION
It has long been known that classical randomness can facilitate the construction of informative characterization protocols for quantum devices. Randomized benchmarking [9][10][11][12][13][14] and classical shadow estimation [16,17] are examples of this mindset. In our work, we follow this paradigm even more stringently for diagnosing noise in gate-set implementations. Instead of engineering sophisticated and specific experimental protocols for a specific task, we turn the approach upside down: we focus on the 'simplest' randomized protocol that can be implemented with current and near-term quantum architectures, namely random gate sequences followed by native measurements. Accepting this restriction, we then ask how detailed diagnostic information can be extracted from the resulting data and, most importantly, how many samples are required.
It turns out that the resulting prescription (a single experiment that can be, and already has been, implemented experimentally) allows for solving many benchmarking, certification and identification problems with (near-)optimal efficiency. All the technicalities that come along with different tasks are shifted to the classical post-processing phase. Most importantly, multiple diagnostic tasks can be performed from the same measurements, allowing us to base an entire engineering cycle on a single experiment.
The ideas advocated here constitute the beginning rather than the conclusion of a program. We regard our theoretical results as a strong motivation to experimentally realize and make use of the concrete applications, such as robust learning of unitary noise and cross-talk tomography. In addition, several further extensions seem exciting. A logical first extension of our work is UIRS with other groups and non-uniform measures over said groups. As with state shadow tomography and randomized benchmarking, we believe the UIRS protocol can be furnished with rigorous guarantees for several other useful gate-sets such as the matchgates [55,56], the Heisenberg-Weyl group, the CNOT-dihedral group and even gate-sets that do not constitute a group [57,58].
We also illustrated the potential of using correlated sequences where the gates are not drawn independently. We believe that using simple correlated sequences gives a fruitful perspective on long-standing problems such as the characterization of non-Markovian and time-varying noise processes in an experimentally friendly and scalable way. Furthermore, while not demonstrated here, akin to their state analog, gate-set shadows can also be used for estimating non-linear quantities.
While the bulk of this work discusses diagnostic tools for developing near-term quantum computing devices, random sequence protocols apply beyond that. We expect that gate-set shadows will for instance find application as a primitive in quantum machine learning [59], in particular in dynamic settings such as time-series estimation. Also in this context, the possibility to 'measure first and ask later' increases the flexibility in devising hybrid quantum-classical schemes with experimentally feasible quantum computations.

IV. DATA AVAILABILITY
The simulated data used for creating the plots in fig. 2 have been deposited on Figshare and are publicly available [60].

V. CODE AVAILABILITY
The code used to simulate the protocol and create the plots in fig. 2 is available upon request.

VI. NOTATION
Throughout this work, we are operating in the Liouville or transfer matrix representation of quantum channels. We represent finite-dimensional $d \times d$ density matrices $\rho$ as length-$d^2$ column vectors $|\rho\rangle\rangle$ and POVM elements $E$ as length-$d^2$ row vectors $\langle\langle E|$, with a corresponding trace inner product $\langle\langle E|\rho\rangle\rangle = \mathrm{Tr}(E^\dagger \rho)$. In this picture, super-operators $\mathcal{E}$ get mapped to $d^2 \times d^2$ matrices, denoted by the same symbol, with the property $\mathcal{E}|\rho\rangle\rangle = |\mathcal{E}(\rho)\rangle\rangle$. Note that this representation is compatible with composition of super-operators (mapping to matrix multiplication), and the taking of tensor products. When $d$ is a power of two, a good basis for the space of matrices is the set of Hermitian Pauli operators $P^*$ (normalized under the trace inner product); in this case we also always write $d = 2^n$. We denote the normalized identity by $2^{-n/2} I = \tau_0$ and the set of normalized traceless Hermitian Pauli matrices $\tau$ as $P^*_0$. Finally, we use a tilde to indicate noisy implementations of POVM elements and states, so $\tilde{\rho}$ is a noisy implementation of the state $\rho$ and $\{\tilde{E}_x\}_x$ is a noisy implementation of the POVM $\{E_x\}_x$. For the specific case of the all-zero computational basis state $|0\rangle\langle 0|^{\otimes n}$ we write $|0_n\rangle\rangle$ (with noisy version $|\tilde{0}_n\rangle\rangle$) and for the computational basis POVM $|x\rangle\langle x|$ we write $\langle\langle x|$ (with noisy version $\langle\langle \tilde{x}|$).
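The conventions above can be checked numerically for a single qubit. The following minimal sketch builds the Liouville representation in the normalized Pauli basis and prints the three properties used throughout (the inner product, the action on vectorized states, and compatibility with composition). The helper names `vec`, `liouville` and `apply`, as well as the example channels, are assumptions made for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
TAU = [P / np.sqrt(2) for P in (I2, X, Y, Z)]  # normalized Pauli basis


def vec(op):
    """|op>>: coefficients Tr(tau_i^dagger op) in the normalized Pauli basis."""
    return np.array([np.trace(t.conj().T @ op) for t in TAU])


def apply(kraus, op):
    """Action of the channel with the given Kraus operators on an operator."""
    return sum(K @ op @ K.conj().T for K in kraus)


def liouville(kraus):
    """Liouville matrix whose j-th column is |E(tau_j)>>."""
    return np.array([vec(apply(kraus, t)) for t in TAU]).T


# Example channels (illustrative): amplitude damping and a Hadamard gate.
gamma = 0.2
damping = [
    np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
    np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex),
]
hadamard = [(X + Z) / np.sqrt(2)]

rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # the state |+><+|
E0 = np.array([[1, 0], [0, 0]], dtype=complex)           # the effect |0><0|

A, B = liouville(damping), liouville(hadamard)
composed = liouville([Kb @ Ka for Kb in hadamard for Ka in damping])

print(np.vdot(vec(E0), vec(rho)).real)                   # <<E|rho>> = Tr(E^dagger rho)
print(np.allclose(A @ vec(rho), vec(apply(damping, rho))))
print(np.allclose(composed, B @ A))
```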

VII. TECHNICAL PRELIMINARIES ON REPRESENTATION THEORY
In this section we recall some basic facts of representation theory (of finite groups), and discuss generally how it applies to our work, with a particular focus on the representation theory of the Clifford group.For a more in depth introduction to representation theory, we recommend the standard textbook ref. [61].
Let $G$ be a finite group and consider the space $M_d$ of linear transformations of $\mathbb{C}^d$. A representation $\omega$ is a map $\omega : G \to M_d$ that preserves the group multiplication, i.e., $\omega(g)\omega(h) = \omega(gh)$ for all $g, h \in G$. We will require the operators $\omega(g)$ to be unitary as well (for finite groups this can always be done).
Reducible and irreducible representations. If there is a non-trivial subspace $W$ of $\mathbb{C}^d$ such that for all vectors $w \in W$ we have $\omega(g) w \in W$ for all $g \in G$ (eq. (26)), then the representation $\omega$ is called reducible. The restriction of $\omega$ to the subspace $W$ is also a representation, which we call a subrepresentation of $\omega$. If there are no non-trivial subspaces $W$ such that eq. (26) holds, the representation $\omega$ is called irreducible. We will generally reserve the letter $\sigma$ to denote irreducible representations. Two representations $\omega, \omega'$ of a group $G$ are called equivalent if there exists an invertible linear map $T$ such that $T\omega(g) = \omega'(g)T$ for all $g \in G$. We will denote this by $\omega \simeq \omega'$.

Sums, products, and Maschke's Lemma. We will make use of sums and products of representations. Given representations $\omega, \omega'$, the maps $\omega \oplus \omega' : g \mapsto \omega(g) \oplus \omega'(g)$ and $\omega \otimes \omega' : g \mapsto \omega(g) \otimes \omega'(g)$ are again representations. They are, however, generally not irreducible (even if $\omega$ and $\omega'$ are). However, Maschke's Lemma ensures that every representation $\omega$ of a group can be uniquely written as a direct sum of irreducible representations, that is, $\omega \simeq \bigoplus_{\lambda \in S} \sigma_\lambda^{\oplus n_\lambda}$, where the index set $S$ labels a subset of the irreducible representations of $G$ and $n_\lambda$ is an integer denoting the number of copies (or multiplicity) of $\sigma_\lambda$ present in $\omega$.

Averages of representations.
Here we recall some standard results about averages over representations of finite groups. We will present these without proof, referring again to ref. [61] for a more detailed explanation. First is the basic statement that the average over any representation of a finite group is a projector (precisely onto the subspace on which the representation acts trivially):

Lemma 5. Let $\omega$ be a representation of a group $G$. Then $E_{g \in G}\, \omega(g) = P_{\mathrm{inv}}$ (31), where $P_{\mathrm{inv}}$ is the projector onto the subspace left invariant under the action of $\omega(g)$, i.e., all vectors $v$ such that $\omega(g)v = v$ for all $g \in G$.
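Lemma 5 is easy to verify numerically on a small example. The following minimal sketch uses the permutation representation of the symmetric group $S_3$ on $\mathbb{C}^3$, chosen here purely for illustration; its invariant subspace is spanned by the all-ones vector, so the group average should equal the projector onto that vector.

```python
import itertools

import numpy as np


def perm_matrix(p):
    """Permutation matrix sending the basis vector e_j to e_{p[j]}."""
    n = len(p)
    M = np.zeros((n, n))
    for j, pj in enumerate(p):
        M[pj, j] = 1.0
    return M


# All six elements of S_3 in the permutation representation.
group = list(itertools.permutations(range(3)))
average = sum(perm_matrix(p) for p in group) / len(group)

# Projector onto the invariant subspace spanned by (1, 1, 1).
invariant = np.ones((3, 3)) / 3

print(np.allclose(average, invariant), np.allclose(average @ average, average))
```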
Second is a useful statement about the invariant subspaces of two-fold tensor powers of representations.
Lemma 6. Let $\sigma, \sigma'$ be real, irreducible, inequivalent, and non-trivial representations of a finite group $G$. Then the representation $\sigma \otimes \sigma'$ has no invariant subspace, while the representation $\sigma^{\otimes 2}$ leaves the vector $v(P_\sigma)$ invariant, where $v(P_\sigma)$ is the vectorized projector onto the image of $\sigma$.
Representation theory of the Clifford group.Here we give some basic facts about the representation theory of the Clifford group, which are used in the main text to derive the decay models for the multi-qubit and local Clifford UIRS protocols.
Lemma 7. The Liouville representation of the $n$-qubit Clifford group $C_n$ decomposes into two irreducible representations; in particular we have for all $g \in C_n$ that $\omega(g) = \omega_{\mathrm{triv}}(g) \oplus \omega_{\mathrm{ad}}(g)$, where $\omega_{\mathrm{triv}}(g)$ has support on $\mathrm{Span}\{\tau_0\}$ and $\omega_{\mathrm{ad}}(g)$ has support on the space of traceless matrices spanned by all normalized traceless Hermitian Pauli operators $\tau \in P^*_0$.

This is a direct consequence of the 2-design property of the Clifford group. An early proof can be found in ref. [44]. A direct consequence is the following.

Lemma 8. Let $\omega$ be the Liouville representation of the $n$-qubit Clifford group. Then we have

We have similar statements for the local Clifford group.
Lemma 9.The Liouville representation of the local Clifford group C ×n 1 on n qubits decomposes into 2 n mutually inequivalent irreducible representations where σ w (g) has support on Span{P * w } with This is a direct result of the previous lemmas, applied to each of the n qubits individually.We also have the following statement.
Lemma 10.Let ω be the Liouville representation of the local Clifford group on n qubits.Then we have where again
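The decomposition of Lemma 7 can be illustrated for a single qubit. The sketch below generates the Liouville (Pauli transfer matrix) representation of the single-qubit Clifford group from the Hadamard and phase gates and checks that twirling an arbitrary matrix over the group leaves only multiples of the projectors onto $\mathrm{Span}\{\tau_0\}$ and onto the traceless subspace, as expected from Schur's lemma for two inequivalent irreducible representations. The generator matrices are written in the basis ordering $(I, X, Y, Z)$; the helper `generate_group` and the random test matrix are assumptions for illustration.

```python
import numpy as np

# Transfer matrices of the Hadamard gate (X -> Z, Y -> -Y, Z -> X)
# and of the phase gate (X -> Y, Y -> -X, Z -> Z), basis order (I, X, Y, Z).
H = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0], [0, 1, 0, 0]])
S = np.array([[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])


def generate_group(generators):
    """Close a set of integer matrices under multiplication."""
    identity = np.eye(4, dtype=int)
    seen = {tuple(int(v) for v in identity.ravel()): identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for gen in generators:
            h = gen @ g
            key = tuple(int(v) for v in h.ravel())
            if key not in seen:
                seen[key] = h
                frontier.append(h)
    return list(seen.values())


clifford = generate_group([H, S])

# Twirl a generic matrix over the group; the elements are real orthogonal.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
twirled = sum(g @ A @ g.T for g in clifford) / len(clifford)

# Expected form: A[0, 0] on the trivial sector, Tr(P_ad A) / 3 on the adjoint sector.
expected = np.diag([A[0, 0]] + [np.trace(A[1:, 1:]) / 3] * 3)
print(len(clifford), np.allclose(twirled, expected))
```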

VIII. SIMULTANEOUSLY ESTIMATING MANY OBSERVABLES
Key to the results in this work is the following general statistics observation, which also powers state shadow estimation [16]. Let $p$ be a probability distribution over some (finite) set $X$, and let $\mathcal{A}$ be a set of observables, i.e., functions $f_A : X \to \mathbb{R}$. Suppose we wish to estimate the vector of means $[E_X(f_A)]_{A \in \mathcal{A}}$ to some overall error $\epsilon$. A surprising fact from mathematical statistics is that this is possible by drawing only $O(\log(|\mathcal{A}|)\, V_{\max}(f_A)/\epsilon^2)$ samples from $p$, where $V_{\max}(f_A) = \max_{A \in \mathcal{A}} V_X(f_A)$ is the maximal variance of the functions in $\mathcal{A}$. Doing this (without making strong assumptions on the observables $f_A$) requires the construction of so-called sub-Gaussian estimators (see refs. [62][63][64] for reviews) for the means $E_X(f_A)$. An example of such an estimator that is straightforward to compute is the median-of-means estimator, which has been used in state shadow estimation by ref. [16]. Following their notation, it involves gathering $S = NK$ samples $\{x_i\}_{i=1}^{NK}$ from the distribution $p$, where $N, K$ are integers. For an observable $f_A$, one can then construct the estimator for the average $E(f_A)$ by splitting the data into $K$ equally sized parts of size $N$ and taking the median of the $K$ empirical means. It can be shown that, for suitable choices of $N$ and $K$, all means are estimated to within error $\epsilon$ with probability $1 - \delta$. Substituting $\epsilon$ gives a direct relation in terms of the total number of samples $NK$. Hence, providing bounds on the maximal variance of a set of observables provides a rigorous guarantee on their estimation at any degree of confidence. Note however that the construction of the estimator is dependent on the level of confidence $\delta$ (through the setting of $K$). This is unfortunate, but it turns out to be impossible [64] to drop this requirement for sub-Gaussian estimators.
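A median-of-means estimator along these lines can be sketched in a few lines. The function name `median_of_means`, the heavy-tailed test distribution and the block count used below are illustrative assumptions; in particular, the constants relating $N$ and $K$ to $\epsilon$, $\delta$ and the maximal variance are not reproduced here, and the number of blocks is simply passed in by hand.

```python
import numpy as np


def median_of_means(samples, num_blocks):
    """Median of the empirical means of num_blocks equally sized blocks."""
    samples = np.asarray(samples, dtype=float)
    block_size = len(samples) // num_blocks
    if block_size == 0:
        raise ValueError("need at least one sample per block")
    # Discard the remainder so that all blocks have the same size N.
    blocks = samples[: block_size * num_blocks].reshape(num_blocks, block_size)
    return float(np.median(blocks.mean(axis=1)))


# Illustration on heavy-tailed data with true mean 2 (Student-t, 3 degrees of freedom).
rng = np.random.default_rng(1)
heavy_tailed = rng.standard_t(df=3, size=3000) + 2.0
print(np.mean(heavy_tailed), median_of_means(heavy_tailed, num_blocks=15))

# A single outlier shifts the plain mean but not the median of block means.
print(np.mean([1, 1, 1, 1, 1, 100]), median_of_means([1, 1, 1, 1, 1, 100], 3))
```

To estimate many observables from the same data, the same function would be applied to the values $f_A(x_i)$ of each observable in turn.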

IX. GUARANTEES FOR THE UIRS PROTOCOL
In this section, we give the derivations of the general performance guarantees for the UIRS protocol summarized in the main text.

A. Fitting model
As we have argued in the main text, a useful class of sequence correlation functions is given by where A is some fixed probe super-operator, α is a suitable normalization and σ(g), ϕ(g) are representations of the gate-set group G.We begin by deriving the main result (eq.( 5) in the main text) on the mean k A (m) in the UIRS protocol.
Theorem 11.Let k A (m) be the outcome of an UIRS experiment with a correlation function as in eq. ( 43), over a gate-set G.
Then we have, under the assumption of gate-independent noise, where Θ, Φ are matrices induced by the representation structure of ω(g).Φ(A, Λ) depends only on the between-gates noise channel Λ := Λ R Λ L and the probe super-operator A, while Θ captures state preparation and measurement (SPAM) dependence.
In particular, if ω contains n σ copies of the representation σ then we have where P i is the projector onto the ith copy of the representation σ inside ω.
The formulation of the result given in the main text directly follows from theorem 11 by additionally realizing that $f_A(x, g)$, regarded as a random variable pushing forward $p(x, g)$, is bounded and, thus, the corresponding mean estimator $k_{f_A}(m)$ is unbiased and consistent. Correspondingly, the median-of-means estimator converges to the expected value of the mean.
where we have used that the representation average is a projector (and thus equal to its square).Now note that we can write ω(g) = σ nσ (g) ⊕ ω ′ (g) where ω ′ is a representation that contains no copies of σ.From this and lemmas 5 and 6 given above we can see that where P i is the projector on the i'th copy of σ in ω.Using the fact that and defining the matrices Θ, Φ appropriately, we obtain the theorem statement.

B. Variance bound with the dynamic shadow norm
In order to bound the sampling complexity of the estimation of UIRS means k A (m) it is sufficient, through the use of medianof-means estimators, to obtain a bound on the variance of associated probability distribution.In the main text we did this by introducing the dynamic shadow norm.The dynamic shadow norm is formally defined as with We prove the associated theorem: Theorem 12 (Restatement of Theorem 1 in the main text).Consider an UIRS protocol (at sequence length m) with gate-set G. Also consider a correlation function f A with probe super-operator A. The (single-shot) variance of the associated mean estimator kf A (m) is bounded as Proof.The variance of a discrete random variable X assuming values x ∈ X with probability p(x) is given by with µ the expected value of X.We can therefore obtain an upper bound on the variance by simply considering Thus, we have that Using the identities Tr(A ⊗ B) = Tr(A) Tr(B) and AB ⊗ AB = A ⊗2 B ⊗2 we obtain Maximizing over Λ R , Λ L and recalling the definition of the shadow norm completes the argument.

X. SHADOW NORM BOUND FOR LOCAL CLIFFORD UIRS
In this section, we consider the UIRS protocol with the local Clifford group C ×n 1 .We will model the noisy implementation of any given Clifford by Λ L ω(g)Λ R and we will denote Λ R Λ L =: Λ for brevity.In the main text we stated the following theorem: Theorem 13 (Restatement of Theorem 4 in the main text).Consider the local Clifford UIRS protocol.Let ϕ(g) = Λ L ω(g)Λ R be a noisy implementation of the local Clifford group on n qubits and A = P w AP w be a probe super-operator with |w| = k for k a fixed integer.Also let 0n be a noisy implementation of the all-zero state and {x} x the noisy computational basis POVM).The shadow norm of the random variable X A (m) is upper bounded independently of the number of qubits n and sequence length m.In particular, it holds that Proof.We begin from the general expression of the shadow norm, which can be written as [1,n] ,...,g Λω(g where {⟨⟨x|} x and | 0n ⟩⟩ are the noisy measurement POVM and the noisy initial state, respectively.We now make use of the fact that A is assumed to be supported on only a single irreducible representation denoted by k ∈ {0, 1} n i.e., A = P k AP k where P k is the projector onto that irreducible representation.Without loss of generality we will here set k to be the all 1 bit string on the first k bits and 0 on the remaining n − k bits.The projector P w acts as |τ 0 ⟩⟩⟨⟨τ 0 | on these last n − k qubits.Since we see that in ⟨⟨x ⊗2 |ω(g n ⟩⟩ we can absorb the action of the local Clifford group on the last n − k qubits.Hence, the local Cliffords g (i) [k+1,n] only act by a single conjugation, i.e., Λω(g Now we use the fact that we end up with [1,k] ,...,g [ × Λω(g Obviously |ρ k ⟩⟩ := |Tr k+1,n (Λ R ( 0n ))⟩⟩ is a k-qubit state and, moreover, we have where Λ k is the k-qubit marginal of Λ.At this point, we see that the shadow norm is bounded independently of the number of qubits n and only depends on the dimension of the irreducible representation which we assume A to have support on (it is a 
function of k only).As we can see, we are left with a third moment calculation over C ×k 1 .However, for the sake of obtaining an upper bound we simply note that ⟨⟨ Ẽx [1,k] |ω(g since Λ k is a quantum channel.We, therefore, can (using the invariance of the Haar measure) simplify the bound to Again using the fact that A is taken to only have overlap with a single irreducible representation, lemma 10 and the fact that ⟨⟨0|τ i ⟩⟩ = 1/ √ 2 if and only if τ i = τ 0 or τ Z (and zero otherwise) we have which is what we set out to prove.

XI. SHADOW NORM BOUND FOR MULTI-QUBIT CLIFFORD UIRS
In this section, we will go into the details of the shadow norm bound calculations for the multi-qubit Clifford group UIRS protocol.Concretely, we prove the following theorem.
Theorem 14 (Restatement of Theorem 3 in the main text).Consider the n-qubit Clifford UIRS protocol and let A = P ad AP ad be a probe super-operator restricted to the traceless subspace.Also let 0n be a noisy implementation of the all-zero state and {x} x∈{0,1} n the noisy computational basis POVM).The associated shadow norm is upper bounded as ∥A∥ dyn,m ≤ 11 u(A) r(A) m−2 + 2(m − 2) 2 r(A) m−3 max 11u(A), (11 with and where u(A) = Tr(AA † )/(2 2n − 1) is the unitarity of A.
Restricting to m > 0 and using that r(A) ≥ u(A) yields the simplified statement in the main text.
Proof.Recall that the dynamic shadow norm for the multi-qubit Clifford group is given by where Λ := Λ R Λ L is the noise in-between subsequent gates.We can provide a concrete resolution for the third moment by noting that C n is a 3-design [65], and, hence, its third moment follows that of the unitary group U (2 n ), which is fully determined (for n ≥ 2) by Schur-Weyl duality.In particular, we have where the matrices π permute copies of the base Hilbert space, i.e., The Weingarten matrix can be explicitly written down in the basis {e, (12), ( 23), ( 13), ( 123), (132)}.Now defining the matrices and we get We begin by analyzing the matrix Ω.Note that A(I) = A † (I) = 0, since A is supported only on the space of traceless matrices by construction.This means that Ω π,π ′ is zero unless π, π ′ ∈ {( 12), (123), (132)}.Thus we can write Ω = P † ΩP with P the restriction from Span{e, (12), ( 23), ( 13), ( 123), (132)} to Span{( 12), (123), (132)} and where we have used Λ † (I) = I, J u denotes the unnormalized Choi-isomorphism (J u (A) = I ⊗ A(|v(I)⟩ ⟨v(I)| with v the column-stacking vectorization map) and T is the (non-CP) transposition map.This can be derived directly from the definition of Ω and some diagram chasing.Before we continue, we establish some facts about the entries of Ω.We begin by noting that for the off-diagonal term Ω(123),(132) , we have Tr(J u (AT ) 2 J u (ΛT )) = Tr(J u (T A) 2 J u (T Λ)) = Tr(J u (A † T ) 2 J u (Λ † T )). 
(77) This follows from three facts: (1) $A^{\otimes 2}$ commutes with the super-operator $L_{(12)}$ defined as left-multiplication with the matrix $(12)$, (2) $L_{(12)}((123)) = (132)$ and (3) the trace is invariant under Hermitian conjugation. One can also see this graphically through the following series of tensor manipulations: as where we have used that the Liouville representation of a dual super-operator $\Lambda^\dagger$ is given by the transpose of the Liouville representation of $\Lambda$. For clarity we marked one of the copies of $A$ with a dot. The rightmost side of eq. (77) is amenable to a bound of the form where again the constant 4 is an overestimate. Hence, the spectral radius of $\Omega_0$ is bounded by $u(A)$ up to a small multiplicative correction, i.e.
The plan is now to leverage lemma 15 to bound the spectral radius of Ω Ŵ .To do this we first need to bound the norm (from eq. ( 85)) We can bound the terms on the RHS in a straightforward manner (using the definition eq. ( 73)), and (using the definition of Ω0 ) and finally Plugging this all back in we get Moreover, we have that for n ≥ 3, and by the same argument Hence, through lemma 15, the spectral radius difference can be bounded as and the spectral radius of Ω Ŵ is thus bounded by (using the above equation and eq.( 89)) We can plug this into lemma 16 to obtain which takes care of the 'dynamic' part of the dynamic shadow norm.To bound the SPAM contribution we can observe Of these only ∥ x∈{0,1} n x ∥ HS has not been considered.From a straightforward calculation (using that Λ R ( 0n ) is a state and We can use this to upper bound the SPAM factor as Gathering terms, throwing away some negative ones, remembering our bound for Ω HS , and using that and ⟨⟨0 n |Λ R ( 0n )⟩⟩ ≤ 1 by construction we can bound this further by We would like to stress that from this expression we can already see that the SPAM norm contribution is asymptotically independent of n.By basic numerics, we can obtain We again note that this constant is sub-optimal (especially for large n).Putting all of this together we obtain the stated bound.
When the probe super-operator is a unitary restricted to the traceless subspace (A = P_ad U P_ad for some unitary U), we can obtain a substantially improved bound, which hinges critically on the fact that U is a quantum channel.

Theorem 17 (Restatement of Theorem 2 in the main text). Consider the n-qubit Clifford UIRS protocol and let A = P_ad U P_ad be a probe super-operator with U a unitary. Also let 0n be a noisy implementation of the all-zero state and {x} x the noisy computational basis POVM. The dynamic shadow norm is bounded as ∥A∥ dyn,m ≤ 10. (108)

Proof. We begin from the definition of the dynamic shadow norm, given by Here, again we write Λ := Λ R Λ L. Note also that P_ad^2 = P_ad and that P_ad commutes with both U and ϕ(g). Hence, we can write (110) Furthermore, E g∈Cn ϕ(g)^{⊗3} is a projector, and thus where we have used the resolution of the third-moment projector as in eq. (73). Now defining the vectors we can express the shadow norm as By direct calculation, analogous to the calculations in the proof of theorem 14, this becomes Finally, we use that ∥π′∥ ∞ = 1 and that |0 ⊗2 n ⊗ Λ R ( 0n )⟩⟩ is a quantum state to see that for n ≥ 2.
Finally, we provide a proof of lemma 16, adapted from a very similar statement in ref. [46, Lemma 8.5].
Proof of lemma 16. We begin by bringing A into Schur normal form, i.e., where D is diagonal (with the eigenvalues of A on the diagonal), N is strictly upper triangular and U is unitary. Now consider the expansion of (D + N)^m. Since N is strictly upper triangular, any term with more than d − 1 factors of N must vanish. Hence, we have , where we have used that the binomial coefficient (m choose k) ≤ m^{d−1} for k ≤ d − 1, Hölder's inequality, and the monotonicity of the exponential function. Now note that Combining this with ∥D∥ ∞ = s(D) = s(A) we obtain the lemma statement.
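The truncation step in this proof can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `power_norm_bound` and the random test matrices are illustrative and not part of the original argument. It verifies that a strictly upper triangular d × d matrix N satisfies N^d = 0, and that the spectral norm of (D + N)^m is bounded by the binomial expansion truncated at k ≤ d − 1 factors of N. This is only the intermediate bound, not the final form of the lemma.

```python
import math
import numpy as np


def power_norm_bound(D, N, m):
    """Truncated binomial bound on the spectral norm of (D + N)^m.

    Words containing more than d - 1 factors of the strictly upper
    triangular matrix N vanish, so only k = 0, ..., d - 1 contribute.
    """
    d = D.shape[0]
    norm_D = np.linalg.norm(D, 2)  # spectral norm of the diagonal part
    norm_N = np.linalg.norm(N, 2)  # spectral norm of the nilpotent part
    return sum(
        math.comb(m, k) * norm_D ** (m - k) * norm_N ** k
        for k in range(min(m, d - 1) + 1)
    )


rng = np.random.default_rng(seed=0)
d = 5
# Hypothetical test instance: eigenvalues inside the unit disc.
D = np.diag(rng.uniform(-0.9, 0.9, size=d))
N = np.triu(rng.normal(size=(d, d)), k=1)  # strictly upper triangular

# N is nilpotent of order at most d.
nilpotent = np.allclose(np.linalg.matrix_power(N, d), 0.0)

# The truncated expansion bounds the norm of every power.
bound_holds = all(
    np.linalg.norm(np.linalg.matrix_power(D + N, m), 2)
    <= power_norm_bound(D, N, m) * (1 + 1e-9)
    for m in range(1, 40)
)
```

The sketch skips the unitary U of the Schur form because the spectral norm is unitarily invariant, so it suffices to work with D + N directly.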

XII. PAULI-NOISE ESTIMATION
Here we analyze the Pauli-noise estimation scheme outlined in the main text. Recall that we consider sequences of the form g = (c^{−1}, p_m, . . ., p_1, c), where p_1, . . ., p_m are i.i.d. randomly drawn elements of the Pauli group P_n and c is a randomly drawn element of the multi-qubit Clifford group C_n. For τ a traceless, Hilbert-Schmidt normalized multi-qubit Pauli operator and a sequence g, we define the filter function as with A_τ = |τ⟩⟩⟨⟨τ| and α = 2^n (1 + 2^n). Without SPAM, the expected value of the single-shot estimator, is given by with Λ_{τ,τ} = ⟨⟨τ|Λ|τ⟩⟩. Since the Clifford group acts transitively on the traceless Pauli operators P*_n, we can rewrite Out of the 2^{2n} − 1 traceless Pauli operators only 2^n − 1 have non-vanishing diagonal entries (those consisting only of local I and Z). The non-vanishing diagonal entries are all identical to 2^{n/2}. Thus, using the definition of α we have

It remains to calculate the variance associated with estimating k_τ(m), as given in (19) in the main text. We intend to prove that k_τ(m) has variance bounded by O(2^n). To do this, first note that since P^{⊗2} τ^{⊗2} P^{†⊗2} = τ^{⊗2}. Hence, the variance associated with the estimation can be upper bounded by where we have used that E ω(p) = |τ_0⟩⟩⟨⟨τ_0| and the trace preservation of Λ. Noting again that ω(c) acts trivially on τ_0 and transitively on the traceless Pauli operators, we see that which becomes, by an argument analogous to the one above, as intended.
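The counting statement used above (only 2^n − 1 of the 2^{2n} − 1 traceless Pauli operators have a non-vanishing diagonal) can be confirmed by brute force for small n. The following is a minimal sketch, assuming NumPy; the function names are illustrative, and the Pauli matrices are left unnormalized because only the support of the diagonal matters for the count.

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices (unnormalized).
PAULIS = {
    "I": np.array([[1, 0], [0, 1]], dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_matrix(label):
    """Tensor product of single-qubit Paulis for a label such as ('Z', 'I')."""
    mat = np.array([[1.0 + 0j]])
    for ch in label:
        mat = np.kron(mat, PAULIS[ch])
    return mat


def count_diagonal_paulis(n):
    """Return (number of traceless n-qubit Paulis, number with non-zero diagonal)."""
    traceless = 0
    with_diagonal = 0
    for label in itertools.product("IXYZ", repeat=n):
        if all(ch == "I" for ch in label):
            continue  # the identity is the only Pauli string with non-zero trace
        traceless += 1
        if np.any(np.abs(np.diag(pauli_matrix(label))) > 1e-12):
            with_diagonal += 1
    return traceless, with_diagonal
```

For example, `count_diagonal_paulis(2)` enumerates the 15 traceless two-qubit Pauli strings and finds the three strings built from I and Z only (IZ, ZI, ZZ).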

XIII. MARGINAL CHANNEL RECONSTRUCTIONS AND CROSS-TALK TOMOGRAPHY
Here we show how to employ the local Clifford UIRS protocol to obtain tomographic information about channel marginals. To this end, recall that with the local Clifford UIRS protocol we can efficiently estimate the quantity 3^{−|w|} Tr(Λ P_w U P_w Λ) for any unitary channel U, where Λ is a quantum channel and P_w is the projector onto the irreducible representation of C_1^{×n} labeled by the bit string w, provided |w| is bounded. We can introduce channel marginals Λ_k of Λ by inserting a maximally mixed state into all but the first k (out of n) inputs and tracing out all but the first k output qubits. Note that we can choose the order of the qubits arbitrarily, so restricting to the first k qubits entails no loss of generality. For any w ∈ {0, 1}^k × {0}^{n−k} we have which only depends on the marginal Λ_k. Now consider the k-qubit super-operator which we will refer to as the pinched marginal associated with the marginal Λ_k. Note that this super-operator is not necessarily a quantum channel (although it is trace preserving). One can see that the pinched marginal is composed of blocks Λ_w = P_w Λ P_w, which we refer to as the unital marginals in the main text.

We can reconstruct the pinched marginal S_k using the local Clifford UIRS protocol. To see this, consider the group of k-qubit Clifford operators C_k. Reference [32, theorem 39] implies that From theorem 13 we know that we can estimate the quantities 3^{−|w|} Tr(Λ_k P_w C P_w) to accuracy ϵ using S = O(k 2^{(2+2 log_2(3))k}/ϵ^2) runs of the local Clifford UIRS protocol. Hence, we can reconstruct S_k to ϵ error in diamond norm using S = O(k 2^{(2+4 log_2(3))k}/ϵ^2) runs. In particular, we can also construct every 'block' Λ_w of S_k with additive error ϵ in diamond norm from the same number of samples. Moreover, since the procedure described above is independent of which set of k qubits is considered, it follows immediately that one can reconstruct all (n choose k) pinched marginals associated with each set of k qubits to a global ϵ error in diamond norm using S = O(n h(k/n) k 2^{(2+4 log_2(3))k}/ϵ^2) samples (where h(k/n) is the binary entropy). Using log (n choose k) ≤ k log(en/k), we can relax the statement to guarantee ϵ-accurate recovery of all unital marginals Λ_w with |w| = k in diamond norm from O(k^2 2^{9k}/ϵ^2) samples.
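The two elementary ingredients of this relaxation can be checked directly: the exponent 2 + 4 log_2(3) stays below 9, and the binomial coefficient obeys log (n choose k) ≤ k log(en/k). The following is a minimal sketch using only the Python standard library. The function names are illustrative, and `samples_single_marginal` drops all constant factors hidden in the O-notation, so it indicates scaling only.

```python
import math

# Exponent per qubit in the single-marginal sample bound, 2 + 4*log2(3) ~ 8.34.
EXPONENT = 2 + 4 * math.log2(3)


def samples_single_marginal(k, eps):
    """Scaling k * 2^{(2 + 4 log2 3) k} / eps^2, with O-constants dropped."""
    return k * 2 ** (EXPONENT * k) / eps**2


def log_binom_bound_holds(n, k):
    """Check log C(n, k) <= k * log(e * n / k) for 1 <= k <= n."""
    return math.log(math.comb(n, k)) <= k * math.log(math.e * n / k)


# Check the binomial bound on a small grid of (n, k) pairs.
all_hold = all(
    log_binom_bound_holds(n, k) for n in range(1, 41) for k in range(1, n + 1)
)
```

Since 2^{2 + 4 log_2(3)} = 4 · 3^4 = 324, each additional qubit in the marginal multiplies the sample scaling by roughly 324, compared to 512 for the relaxed exponent 9.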

XIV. DETAILS ON SPAM-ROBUST CHANNEL RECONSTRUCTION
Using multi-qubit Clifford UIRS we can extract the relative average-gate fidelities that enter the tomographic reconstruction of both schemes from the output statistics of random gate-set sequences, without the need to perform different interleaved experiments. This gives rise to an efficient and robust channel reconstruction. In the multi-qubit Clifford UIRS protocol the decay rates are given as Hence, if we assume that A = P_ad U P_ad for a unitary channel U, we see that p(U) = (2^n F(U, Λ) − 1)/(2^n − 1), where F(U, Λ) is the average fidelity between U and Λ. By theorem 17 and theorem 14, the UIRS protocol lets us estimate an exponential number of average fidelities using only a polynomial number of samples (and, equivalently, channel queries). Furthermore, for U a Clifford unitary, calculating the sequence correlation function, and thus the entire classical post-processing, is time and space efficient in the number of qubits. Characterizing a quantum channel in terms of different relative average gate fidelities with unitaries can provide valuable diagnostic information in itself. This can be seen as a robust gate-set or channel variant of selective state tomography [67]. Beyond this, building on the results of refs. [29,31,32,53], access to relative average fidelities is a powerful primitive for the tomographic reconstruction of channels, which we will now consider in more detail.
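The relation p(U) = (2^n F(U, Λ) − 1)/(2^n − 1) between a fitted decay parameter and the relative average fidelity is easy to invert in post-processing. The following is a minimal sketch in plain Python; the function names are illustrative and not taken from any existing package.

```python
def decay_from_fidelity(fidelity, n_qubits):
    """Decay parameter p = (d*F - 1)/(d - 1) with d = 2^n."""
    d = 2**n_qubits
    return (d * fidelity - 1) / (d - 1)


def fidelity_from_decay(decay, n_qubits):
    """Inverse relation F = ((d - 1)*p + 1)/d with d = 2^n."""
    d = 2**n_qubits
    return ((d - 1) * decay + 1) / d
```

As a sanity check, F = 1 maps to p = 1, and F = 1/2^n (the value for the completely depolarizing channel) maps to p = 0.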
The first task we consider is the reconstruction of unitary (or, more generally, bounded Kraus rank) quantum channels, that is, low-rank randomized benchmarking tomography. This task is vital to the characterization of calibration errors. Reference [32] establishes that, given a list of estimates [F̂(C, Λ)] C∈A of relative average gate fidelities with respect to a randomly chosen subset A of Clifford unitaries, a constrained least-squares fit can reconstruct Λ provided that |A| ≥ c d^2 log(d). More precisely, the error of the channel estimate Λ̂ in Hilbert-Schmidt norm of the Choi states fulfills where F := [F(C, Λ)] C∈A is the vector of average fidelities of length |A|, F̂ is an estimate of F produced through the shadow sequence protocol, and the norm is the ℓ_2 vector norm. Furthermore, the reconstruction is stable against Λ deviating from the low-rank assumption (model mismatch) and can be formulated in different p-norms on both sides; we refer to the supplemental material of ref. [32] for details. Reference [32], however, has not analyzed the overall sampling complexity of the resulting RB tomography scheme when combined with a robust way to acquire the relative average fidelities.
The UIRS protocol can provide the missing piece. After decay fitting, we can give estimates F̂ for the vector of fidelities F with error guarantee with success probability 1 − δ using S samples, i.e., the size of the gate-set shadow. Using the standard relations between the ℓ_∞ and ℓ_2 vector norms, this implies that we can obtain an ϵ-accurate reconstruction of Λ provided S ≥ C 2^{4n} log(2|A|/δ)/ϵ^2 (136) with a suitable constant C. Dropping polynomial factors in n and 1/ϵ, we find that the total number of gate-set shadows scales as O(2^{4n}). Note that the number of channel invocations is bounded by the maximal sequence length times the number of sequences. This matches the scaling of the information-theoretic lower bound derived in ref. [32] for the case that the average gate fidelities are measured independently. Besides the favourable scaling, the UIRS protocol has the benefit, compared to, e.g., the interleaved protocol of ref. [29], that the same measurement data is used for estimating all the average fidelities.
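To get a feeling for the scaling in eq. (136), the bound can be evaluated for concrete parameters. The following is a minimal sketch in plain Python. The constant C is not specified in the text, so the default C = 1 is purely a placeholder assumption, and the function names are illustrative. The second helper spells out the standard norm relation ∥v∥_2 ≤ sqrt(|A|) ∥v∥_∞ used to pass from entry-wise to ℓ_2 error.

```python
import math


def required_shadows(n_qubits, num_fidelities, eps, delta, C=1.0):
    """Evaluate S >= C * 2^{4n} * log(2|A|/delta) / eps^2.

    C is an unspecified constant in the source; C = 1.0 is a placeholder.
    """
    return math.ceil(
        C * 2 ** (4 * n_qubits) * math.log(2 * num_fidelities / delta) / eps**2
    )


def l2_error_from_linf(linf_error, num_fidelities):
    """Worst-case l2 error implied by an entry-wise (l_inf) error bound."""
    return math.sqrt(num_fidelities) * linf_error
```

Adding one qubit multiplies the bound by 2^4 = 16 at fixed |A|, while halving ϵ multiplies it by 4. The dependence on the number of fidelities |A| and on the failure probability δ is only logarithmic.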
Besides the compressive, low-rank quantum channel tomography, we can use average gate fidelities from UIRS protocols for the tomography of a more general class of quantum channels. If Λ is a unital quantum channel then it is known [32, theorem 38] (see also ref. [53]) that it can be expressed as ). The same argument holds for all norms for which ∥C∥ = 1, such as the diamond norm.

FIG. 2. Numerical simulations of two potential applications, unitary noise optimization (sec. II H) and cross-talk tomography (sec. II I). Panels a) and b) show simulation results of the multi-qubit Clifford UIRS protocol for two qubits and 1000 random sequences per sequence length. Between every Clifford gate C, two independent Z-rotations RZ(θ) with rotation angles θ1 = 0.07 and θ2 = 0.13 have been applied (see circuit diagram c)). Panel b) shows average fidelities F(U(θ), Λ) reconstructed from the gate-set shadows using the ansatz U(θ1, θ2) = RZ(θ1) ⊗ RZ(θ2). Example decays of the sequence averages k(m) are shown in panel a) with bootstrapped 95% confidence intervals around the decay points. Panels e) to g) display simulation results for cross-talk tomography from two-qubit local Clifford UIRS data with 15,000 random sequences per sequence length. After every layer of local Cliffords, an entangling cross-talk noise process Nc has been applied (see the circuit diagram d)). Panel e) shows the Pauli transfer matrix (PTM) of the reconstructed pinched marginal S, Supplementary Equation (107), for cross-talk of the form Nc = XX(θ), with dashed boxes indicating the unital marginals Λ0,1, Λ1,0, and Λ1,1. Panels f) and g) show the PTMs of the difference between the unital marginal Λ1,1 and the tensor product Λ1,0 ⊗ Λ0,1 as a characterization of the cross-talk between the two qubits, for cross-talk Nc = XX(θ = 0.4) in f) and Nc = ZZ(θ = 0.4) in g). Simulations have been performed using Qiskit [47] with single-qubit depolarizing noise of p1 = 0.002 for single-qubit gates and two-qubit depolarizing noise of p2 = 0.01 for two-qubit gates (on top of the custom noise processes after each Clifford layer). For the PTM plots, modified functions from the Forest Benchmarking package [48] have been used.