Group-theoretic error mitigation enabled by classical shadows and symmetries

Estimating expectation values is a key subroutine in quantum algorithms. Near-term implementations face two major challenges: a limited number of samples required to learn a large collection of observables, and the accumulation of errors in devices without quantum error correction. To address these challenges simultaneously, we develop a quantum error-mitigation strategy called ``symmetry-adjusted classical shadows,'' by adjusting classical-shadow tomography according to how symmetries are corrupted by device errors. As a concrete example, we highlight global $\mathrm{U(1)}$ symmetry, which manifests in fermions as particle number and in spins as total magnetization, and illustrate their group-theoretic unification with respective classical-shadow protocols. We establish rigorous sampling bounds under readout errors obeying minimal assumptions, and perform numerical experiments with a more comprehensive model of gate-level errors derived from existing quantum processors. Our results reveal symmetry-adjusted classical shadows as a low-cost strategy to mitigate errors from noisy quantum experiments in the ubiquitous presence of symmetry.


I. INTRODUCTION
Quantum computers are highly susceptible to errors at the hardware level, posing a considerable challenge to realize meaningful applications in the so-called noisy intermediate-scale quantum (NISQ) era [1,2].One particularly promising and natural candidate for NISQ applications is the simulation of quantum many-body physics and chemistry [3][4][5][6].In order to minimize the accumulation of errors, such algorithms prioritize lowdepth circuits, for instance, variational quantum circuits [7][8][9][10].However, in order to exhibit quantum advantage, these circuits must also be beyond the capabilities of classical simulation [11][12][13][14], resulting in noise levels that nonetheless corrupt the calculations.
While quantum error correction is the long-term solution, current state-of-the-art hardware is still a few orders of magnitude from achieving scalable, fault-tolerant quantum computation [15][16][17][18][19][20][21][22][23].In the meantime, there have been considerable theoretical and experimental efforts probing the beyond-classical potential of NISQ computers .Should such an application be demonstrated, quantum error mitigation (QEM) is expected to play a crucial role.Broadly speaking, QEM aims to approximately recover the output of an ideal quantum computation, given only access to noisy quantum devices and offline classical resources.We refer the reader to Refs.[47,48] for a review of prominent concepts and strategies in QEM.
A related but separate challenge for NISQ algorithms is the need to learn many observables in a rudimentary fashion, i.e., by repeatedly running and sampling from quan-tum circuits.The number of repetitions required can be immense, both to suppress shot noise and to handle the measurement of noncommuting observables [49,50].While a variety of strategies have been proposed to address this bottleneck [10,51], one particularly promising approach is that of classical shadows [52,53].
Classical shadows were developed primarily from the union of two themes in quantum learning theory: linear-inversion estimators for state tomography [54,55] (closed-form solutions that admit fast postprocessing and rigorous guarantees) and the framework of shadow tomography [56,57] (predict only a subset of observables, not the entire density matrix).The result is a simple but powerful protocol that accurately estimates a large collection of observables from relatively few samples.In terms of quantum resources, classical shadows only require the ability to measure in randomly selected bases, making the protocol particularly amenable to NISQ constraints.These desirable features have inspired a wide range of extensions and applications, for example: entanglement detection [58], quantum Fisher information bounds [59,60], learning quantum processes [61,62], navigating variational landscapes [63,64], energy-gap estimation [65], and applications to fermions [66][67][68][69][70] and bosons [71,72].For an overview of classical shadows and randomized measurement strategies, see Ref. [73].
Due to their experimental friendliness and versatile prediction power, classical shadows naturally have been considered for QEM as well.For example, Refs.[74,75] used classical shadows to approximately project a noisy quantum state toward a target subspace via classical postprocessing, the subspaces being either the logical subspace of an error-correcting code [76] and/or the dominant eigenvector (purification) of the noisy mixed state [77,78].These shadow-based ideas circumvent some of the difficulties of performing subspace projec-tion, at the cost of an exponential sample complexity.Meanwhile, Ref. [79] intertwined classical shadows with other popular QEM strategies, with a particular focus on probabilistic error cancellation [80].They establish rigorous estimators and performance guarantees, assuming an accurate characterization of the noisy quantum device.Finally, Refs.[81,82] described modifications to the classical linear-inversion step in order to mitigate errors in the randomized measurements.In particular, robust shadow estimation [81] assumes no prior knowledge of the noise, instead implementing a separate calibration experiment that learns the necessary noise features.
In this paper, we take this latter perspective [81,82], with an eye on a more comprehensive mitigation of errors beyond readout errors.We introduce a QEM protocol, which we refer to as symmetry-adjusted classical shadows, that takes advantage of known symmetries in the quantum system of interest.For example, in simulations of chemistry, the number of electrons is typically fixed.The corruption of such symmetries by noise informs us how to undo the effects of that noise.Crucially, because randomized measurements scramble the information, the other properties of the quantum system are corrupted (and therefore can be mitigated) in the same manner.Using these insights, symmetry-adjusted classical shadows appropriately modifies the linear-inversion based on the symmetry information alone.
A notable advantage of our protocol is that we do not run any extraneous calibration experiments.This has the added benefit of inherently accounting for errors that occur throughout the full quantum circuit, rather than the randomized measurements in isolation [81][82][83][84][85]. Also, the simplicity of the protocol allows for additional QEM techniques to be straightforwardly applied in tandem.Finally, in contrast to other symmetry-based ideas [79,[86][87][88], our approach goes beyond the concept of symmetry projection, instead utilizing a unified group-theoretic understanding of classical shadows in conjunction with symmetries.
This paper is structured as follows.In Sec.II, we establish preliminaries and background material.In Sec.III we provide a self-contained summary of results, including a number of ancillary results that may be of independent interest.We construct symmetry-adjusted classical shadows in Sec.IV, illustrating the general theory and establishing performance bounds.In Secs.V and VI we apply the idea to fermionic and qubit systems with global U(1) symmetries.We then turn to numerical experiments in Sec.VII, investigating and validating the performance of symmetry-adjusted classical shadows in practical scenarios.Notably, we incorporate a realistic noise model based on existing superconducting-qubit platforms into our simulations.Finally, we close with conclusions and a discussion on future prospects in Sec.VIII.Appendices provide technical details regarding the proofs and numerical calculations, and for the latter we have also made open-source code available at Ref. [89].

II. BACKGROUND
Here, we provide a review of classical shadows [52,53] and robust shadow estimation [81].Readers familiar with this background material can skip to the summary of our results in Sec.III, after familiarizing themselves with the notation that we establish below.
Throughout this paper, we consider an n-qubit system with Hilbert space H := (C 2 ) ⊗n .Its dimension is denoted by d ≡ 2 n unless otherwise specified.We often work with the space of linear operators L(H) ∼ = C d×d as a vector space, so it will be convenient to employ the Liouville representation: for any operator A ∈ L(H), its vectorization |A⟩⟩ ∈ C d 2 in some orthonormal operator basis {B 1 , . . ., B d 2 | tr(B † i B j ) = δ ij } is defined by the components ⟨⟨B i |A⟩⟩ := tr(B † i A).Under this representation, superoperators are mapped to d 2 × d 2 matrices: any E ∈ L(L(H)) can be specified by its matrix elements We let E denote both the superoperator and its matrix representation, and in a similar fashion we sometimes write |A⟩⟩ = A.
For systems of qubits, the normalized Pauli operators P(n)/ √ d are a convenient basis for L(H), where This choice is called the Pauli transfer matrix (PTM) representation.The weight, or locality, of a Pauli operator P ∈ P(n) is the number of its nontrivial tensor factors, denoted by |P |.For each i ∈ [n], we define W i ∈ P(n) which acts as W ∈ {X, Y, Z} on the ith qubit and trivially on the rest of the system.For fermions in second quantization, a natural choice of basis is the set of Majorana operators, defined as The Hermitian generators {γ µ | µ ∈ [2n]} ⊂ L(H) obey the anticommutation relation γ µ γ ν + γ ν γ µ = 2δ µν I (we will use I to denote any identity operator whose dimension is clear from context).By convention, the elements of µ and the product in Eq. ( 2) are in strictly ascending order.We call |µ| the degree of Γ µ , or equivalently refer to them as (|µ|/2)-body operators whenever the degree is even.It is straightforward to check that Majorana operators are isomorphic to Pauli operators, in particular satisfying the orthogonality relation ⟨⟨Γ µ |Γ ν ⟩⟩ = dδ µν .For any unitary U ∈ U(d), its corresponding channel is denoted by U(•) := U (•)U † .For any |φ⟩ ∈ H, |φ⟩⟩ is the vectorization of |φ⟩⟨φ|.We use tildes to indicate objects affected by quantum noise, e.g., U denotes a noisy implementation of the U. Hats indicate statistical estimators, e.g., ô denotes an estimate for o = tr(Oρ).Asymptotic upper and lower bounds are denoted by O(•) and Ω(•) respectively, and f (x) = Θ(g(x)) means that f (x) is both O(g(x)) and Ω(g(x)).

B. Classical shadows
We summarize the method of classical shadows as formalized by Huang et al. [52], borrowing the PTM language of Chen et al. [81] which will make the robust extension clear later.Our task is to estimate the expectation values tr(O j ρ) = ⟨⟨O j |ρ⟩⟩ of a collection of L observables O 1 , . . ., O L ∈ L(H), ideally using as few copies of ρ as possible.Classical shadows is based on a simple measurement primitive: for each copy of ρ, apply a unitary U randomly drawn from a distribution of unitaries and measure in the computational basis.This produces a sample b ∈ {0, 1} n with probability ⟨⟨b|U|ρ⟩⟩.One then inverts the unitary on the outcome |b⟩ in postprocessing, which amounts to storing a classical representation of U † |b⟩.
The unitary distribution determines the efficiency of this protocol with respect to the properties of interest.Throughout this paper, we assume that the distribution is a finite group equipped with the uniform probability distribution. 1Specifically, let U : G → U(H) be a unitary representation of a group G.The measurement primitives averaged over all random unitaries and measurement outcomes implement the quantum channel where describes the effective process of computational-basis measurements.The channel U g is the random unitary acting on the target state ρ, while U † g is its classically computed inversion on the measurement outcomes |b⟩⟩.Thus in expectation we produce the state If M is invertible (corresponding to informational completeness of the measurement primitive), then applying M −1 to Eq. ( 5) recovers the state: The objects |ρ g,b ⟩⟩ := M −1 U † g |b⟩⟩ are called the classical shadows of |ρ⟩⟩, for which they serve as unbiased estimators.Hence by construction they can predict expectation values, E g∼G,b∼Ug|ρ⟩⟩ ⟨⟨O j |ρ g,b ⟩⟩ = ⟨⟨O j |ρ⟩⟩, (7) as well as nonlinear functions of ρ [52].While M −1 is not a physical map (it is not completely positive), it only appears as classical postprocessing.Such a computation can be accomplished, for instance, by first deriving a closed-form expression for M.
One systematic approach to deriving such an expression is through the representation theory of G. First, note that the d-dimensional unitary U is promoted to a d2 -dimensional representation U. Equation ( 3) reveals that M is a twirl of M Z by the group G under the action of U.Such objects are well studied: assuming that the irreducible components of U have no multiplicities, 2 an application of Schur's lemma implies that [90] Here, R G is the set of labels λ for the irreducible representations (irreps) of G.The superoperators Π λ are orthogonal projectors onto the irreducible subspaces V λ ⊆ L(H).Choosing an orthonormal basis {|B j λ ⟩⟩ | j = 1, . . ., dim V λ } for each subspace, we can write the projectors as The eigenvalues f λ of M can be computed using the orthogonality of projectors: Note that tr(Π λ ) = dim V λ .From this diagonalization, we immediately acquire an expression for the desired inverse: If some f λ = 0, then we may instead define M −1 as the pseudoinverse on the subspaces where f λ is nonvanishing.This implies that the measurement primitive is informationally complete only within those subspaces.
To analyze the sample efficiency of this protocol, suppose we have performed T experiments, yielding a collection of independent classical shadows ρ1 , . . ., ρT where From this data we can construct estimates which by linearity converge to tr(O j ρ).The single-shot variance of ôj can be bounded in terms of the so-called shadow norm: This variance controls the prediction error, rigorously established via probability tail bounds. 3In particular, taking a number of samples ensures that, with probability at least 1−δ, each estimate exhibits at most ϵ additive error: Finally, we comment on the classical computation of ôj .In order to evaluate Eq. ( 12), one may use Eqs.( 9) and (11) to express the ℓth-sample estimate as Thus it suffices to be able to efficiently compute the expansion coefficients ⟨⟨O j |B k λ ⟩⟩ = tr(O j B k λ ) of the observable O j in a basis of V λ , as well as the matrix elements Note that this does not require explicitly representing the classical shadow M −1 U † g |b⟩⟩; we only need to determine the diagonal entry of the rotated operator U g (B k λ ) † U † g for a given basis state |b⟩.

C. Robust shadow estimation
We now summarize the robust shadow estimation protocol by Chen et al. [81]; we note that Refs.[83][84][85] describe analogous ideas in the case of random single-qubit measurements.The basic premise is the fact that Schur's lemma applies to the twirl of any channel, not just M Z .Suppose that instead of U g , the quantum computer implements a noisy channel U g which obeys the following assumptions: Assumptions 1 ([81, Simplifying noise assumption A1]).The noise in U g is gate independent, time stationary, and Markovian.Hence there exists the decomposition U g = EU g , where E is a completely positive, tracepreserving map, independent of both the ideal unitary and the experimental time.
They also assume the ability to prepare the state |0 n ⟩ with sufficiently high fidelity.Given these conditions, the noisy version of the shadow channel implemented in experiment becomes which is now a twirl over the composite channel M Z E.
Although E is unknown, Schur's lemma implies that the eigenbasis is preserved, as we now have where the eigenvalues depend on E, Therefore if one knows f λ , then one can perform the correct linear inversion in the presence of noise, i.e., by replacing f −1 λ with f −1 λ in Eq. ( 16).Because E depends on the details of the quantum hardware, it is not possible to determine f λ without an a priori accurate characterization of the noise.Absent such information, a calibration protocol is proposed to experimentally estimate the value of f λ .This proceeds by performing the classical shadows protocol on a fiducial state |0 n ⟩, rather than the unknown target state ρ.This enables the study of errors in the random circuits U g .Because |0 n ⟩ is known exactly, one can compare its noiseless properties against the noisy experimental data to determine a calibration factor.Specifically, Chen et al. [81] construct an estimator NoiseEst G (λ, g, b) for each sample (U g , b) of the calibration experiment, which converges to f λ in expectation over g and b.Although they do not prescribe a generic expression for NoiseEst G (instead considering particular choices of G), it is straightforward to derive one following their ideas.Let D λ ∈ V λ be an observable supported exclusively by a single irrep such that ⟨0 n |D λ |0 n ⟩ ̸ = 0. Then we have On the other hand, using the fact that it follows that the random variable One can recover the definitions for NoiseEst G introduced by Chen et al. [81] as follows.The global Clifford group Cl(n) has two irreps: the span of the identity operator, V 0 = span{I} (which is trivial), and its orthogonal complement V 1 = V ⊥ 0 (the set of all traceless operators).
On the other hand, the local Clifford group Cl(1) ⊗n has 2 n irreps, labeled by all subsets I ⊆ [n].Each I indexes a subsystem of qubits, and each subspace V I is the span of all n-qubit Pauli operators which act nontrivially on exactly that subsystem.Defining one obtains where now Any QEM strategy necessarily incurs a sampling overhead dependent on the amount of noise [91][92][93][94].For global Clifford shadows, Chen et al. [81] show that the sample complexity is augmented by a factor of O(F Z (E) −2 ) for estimating observables with constant Hilbert-Schmidt norm, where F Z (E) = 2 −n b∈{0,1} n ⟨⟨b|E|b⟩⟩ is the average Z-basis fidelity of E. Meanwhile for local Clifford shadows, they prove that product noise of the form exhibits an overhead factor of e O(kξ) for estimating k-local qubit observables.

III. SUMMARY OF RESULTS
The primary contribution of this paper, symmetryadjusted classical shadows, is visualized in Figure 1.We summarize the main idea and results in Sec.III A. Then, we highlight other notable technical innovations: in Sec.III B, we describe a modification to random Pauli measurements required to tailor its irreps for use with common symmetries; in Sec.III C, we discuss an improved design for compiling fermionic Gaussian unitaries with lower circuit depth and fewer gates than prior art; and in Sec.III D, we summarize a symmetry adaptation to fermionic classical shadows which reduces the quantum resources required, applicable to systems with spin symmetry.

A. Symmetry-adjusted classical shadows
Consider a classical shadows protocol over G with target observables O 1 , . . ., O L .Without loss of generality, let each O j ∈ V λ for some subset of irreps λ ∈ R ′ ⊆ R G .Suppose the experiment experiences an unknown noise channel E obeying Assumptions 1.
We show that, if ρ obeys symmetries which are "compatible" with the irreps in R ′ , then it is possible to construct an estimator which accurately predicts the ideal, noiseless observables.By compatible, we mean that there exist symmetry operators S λ ∈ V λ for each λ ∈ R ′ for which their ideal expectation values are known a priori.Using T (noisy) classical shadows, we construct error-mitigated estimates as We find that the relevant noise characterization in this scenario is which can be seen as a generalization of the noise fidelity F Z (E) described in Sec.II C. Here, F Z,R ′ (E) only considers how the noise channel acts within the irreducible subspaces of interest.
As two key applications, we study how symmetryadjusted classical shadows perform in simulations of fermionic and qubit systems.For fermions, we consider G corresponding to fermionic Gaussian unitaries [66] (also known as matchgate shadows [67]).We establish the following performance bound for fermionic systems with particle-number symmetry, Theorem 1 (Fermions with particle-number symmetry, informal).Let ρ be an n-mode state with tr(N ρ) = η fermions.Under the noise model E satisfying Assumptions 1 and assuming η = O(n), matchgate shadows of size suffice to achieve prediction error with high probability, where the observables O j can be taken as all one-and two-body Majorana operators.The dependence on system size n and prediction error ϵ matches noiseless estimation with matchgate shadows [66,67].Meanwhile, the overhead of error mitigation is O(F Z,R ′ (E) −2 ), analogous to prior related results [81,82].The irreps R ′ = {2, 4} correspond to the Majorana degree of the k-body observables.
For qubit systems, we consider G essentially corresponding to the local Clifford group (i.e., random Pauli measurements) [52,53].In order to make the irreducible structure compatible with commonly encountered symmetries, we introduce a technical modification that we call subsystem-symmetrized Pauli shadows (see Sec. III B for a summary).The symmetry we consider here is generated by the total longitudinal magnetization, M = i∈[n] Z i .For error-mitigated prediction of local qubit observables, we have the following result.
Theorem 2 (Qubits with total magnetization symmetry, informal).Let ρ be an n-qubit state with a fixed magnetization, tr(M ρ) = m.Under the noise model E satisfying Assumptions 1 and assuming m = Θ(1), subsystemsymmetrized Pauli shadows of size suffices to achieve prediction error with high probability, where the observables O j can be taken as all one-and two-local Pauli operators.
Note that the irreps of subsystem-symmetrized Pauli shadows are labeled by Pauli weight.The variance bound we advertise here is linear in n, resulting from the extensive nature of the symmetry M .Specifically, we show that when m = Θ(1), ∥M ∥ 2 shadow = O(n) dominates the asymptotic complexity over the k-local Pauli observables (for which our protocol exhibits the usual ∥O j ∥ 2 shadow = 3 k ).This is consistent with standard Pauli shadows, wherein the shadow norm of arbitrary k-local observables scales at most linearly with spectral norm and exponentially in k [52,53].
Besides these two examples, we describe symmetryadjusted classical shadows for a more general class of groups G, and we establish accompanying bounds in Theorem 4. This allows for applications to other systems and unitary distributions.See Sec.IV for the general theory, and Secs.V and VI for the applications to fermion and qubit systems, respectively.
Because our protocol always runs the full noisy quantum circuit, it has the potential to mitigate a wider range of errors than those covered by Assumptions 1, albeit without the rigorous theoretical guarantees.This is a significant feature of the method, as the preparation of ρ often dominates the total circuit complexity (i.e., U prep in Figure 1).
We explore this broader mitigation potential with a series of numerical experiments in Secs.VII A 2 and VII B 2, wherein we simulate noisy Trotter circuits for systems of interacting fermions and spin-1/2 particles, respectively.The gate-level noise model is based on a superconducting architecture, with error rates derived from publicly available data of an existing Google Sycamore processor [33,35,95,96].Overall, we assess that in this more realistic scenario, symmetry-adjusted classical shadows successfully mitigates errors, but with diminishing effectiveness as the circuit grows deeper.We observe an error floor to our approach, beyond which more samples does not improve prediction accuracy due to violations of Assumptions 1.However, even in this regime we see substantially improved qualitative agreement of the mitigated results to the true dynamics.Open-source code for our numerical implementations is available at Ref. [89].
Under this group, the (noiseless) expressions for M and Var[ô] coincide with those of standard Pauli shadows.
This modification therefore reduces the number of irreps from 2 n to n + 1, achieved by symmetrizing over all k-qubit subsystems.Meanwhile, the desirable estimation properties from standard Pauli shadows are retained: for instance, the shadow norm obeys ∥P ∥ 2 shadow = 3 k for klocal Pauli operators P .
The upshot is that the symmetry M is now compatible with this group, thereby enabling results such as Theorem 2. We describe this construction in Sec.VI, with technical details in Appendix B.

C. Improved circuit design for fermionic Gaussian unitaries
Fermionic Gaussian unitaries are a broad class of freefermion rotations, and they are ubiquitous primitives in algorithms for simulating (interacting) fermions.In the context of classical shadows, they form the basis for randomized measurements in matchgate shadows [66][67][68].Such unitaries can be described by an orthogonal transformation Q ∈ O(2n) of the Majorana operators, for each µ ∈ [2n].The quantum circuits implementing these transformations take O(n 2 ) gates in O(n) depth [98,99].While this scaling is generically necessary due to parameter counting, constant-factor savings can substantially improve performance in practice, especially on noisy quantum computers.
To this end, we introduce a new compilation algorithm for fermionic Gaussian unitaries, starting from an arbitrary input Q ∈ O(2n).Using insights into how Majorana modes are mapped to qubits under the Jordan-Wigner transformation, our circuit design improves the parallelization of gates compared to prior art, leading to reductions in both depth and gate count.Numerically, we infer improvements of roughly 1/3 reduction in depth and 1/2 reduction in gate count with respect to a gate set native to superconducting-qubit platforms (see Figure 10).This improved circuit design is implemented in our numerical experiments, in particular those of Sec.VII A 2. The compilation algorithm is described in Appendix D.

D. Spin-adapted matchgate shadows
Systems of spinful fermions often obey a spin symmetry, which allows for compressed block-diagonal representations according to the spin sectors.Such techniques are referred to as symmetry adaptation.We introduce such an adaptation of the matchgate shadows protocol wherein the random distribution is restricted to blockdiagonal orthogonal transformations, We call this protocol spin-adapted matchgate shadows.This restricted group remains informationally complete over operators which respect the spin sectors, thus sufficing for learning properties in systems with this symmetry.
In fact, we show that the shadow norms for k-fermion operators under the spin-adapted protocol scale identically as in the unadapted setting.The main advantage of spin adaptation is that the block-diagonal transformation where P ↓ = Z ⊗n/2 is the parity operator on the spindown sector and s = δ −1,det Q ↑ .This tensor-product unitary requires roughly half the number of gates and circuit depth compared to implementing a dense element of O(2n).We prove the necessary details in Appendix C and implement this modified protocol in our numerical experiments when applicable.

IV. THEORY OF SYMMETRY-ADJUSTED CLASSICAL SHADOWS
We now describe the theory behind symmetry-adjusted classical shadows.This approach uses known symmetry information about the ideal, noiseless state ρ that we wish to prepare (but are only able to produce a noisy version of).In this section we describe the idea for an arbitrary multiplicity-free group G; in Secs.V and VI, we will provide concrete applications to the efficient estimation of local fermionic and qubit observables, respectively.
Suppose ρ is a quantum state obeying a known symmetry, corresponding to a collection of operators S λ ∈ V λ for which the values s λ := ⟨⟨S λ |ρ⟩⟩ are known a priori.For example, suppose the system has a conserved quantity with operator S. Then we construct the S λ operators using the projectors Π λ : assuming that S spans multiple irreps.By construction, S λ is an eigenoperator of both M and M: If one is interested in only a subset R ′ ⊆ R G of the irreps, then it suffices to only know those symmetries S λ for which λ ∈ R ′ .
Because the ideal values of s λ and f λ are already known, we can use the estimated noisy expectation value of S λ to build an estimate for f λ .We start with the standard postprocessing of classical shadows: applying M −1 to the measurement outcomes of the noisy quantum experiments produces, in expectation, the effective state which clearly differs from |ρ⟩⟩ when M ̸ = M. Nonetheless, we can use this noisy data to estimate the value of ⟨⟨S λ | ρ⟩⟩, which is equal to by Eq. (38).In fact, this relation applies to any O ∈ V λ : Hence while we use Eq.(40) to learn f λ from the symmetry S λ , this is in turn applicable to all other operators within the same irrep.This leads to the recovery of the ideal expectation values as Having established the theory in expectation, we now analyze the implementation in practice.Let T be the number of classical-shadow snapshots, |ρ ℓ ⟩⟩ = M −1 U † g ℓ |b ℓ ⟩⟩ for ℓ = 1, . . ., T , obtained by sampling the noisy quantum computer.Recall that these snapshots converge to ρ rather than ρ.From their empirical average, ρ(T ) = (1/T ) T ℓ=1 ρℓ , we can estimate the lefthand side of Eq. ( 40) as ŝλ (T ) := ⟨⟨S λ |ρ(T )⟩⟩.(43) This in turn provides an estimate for f λ , This can be understood as a generalization of NoiseEst G (λ, g, b) from Eq. ( 22), making the replacements D λ → S λ and |0 n ⟩⟨0 n | → ρ.Indeed, one can view the calibration state |0 n ⟩ as obeying the symmetries given by its stabilizer group.Consider the estimation of observables O 1 , . . ., O L with symmetry-adjusted classcial shadows.Suppose without loss of generality that each O j ∈ V λ for some λ ∈ R ′ . 4From the same noisy classical shadow ρ(T ), we also have estimates for their noisy expectation values: E⟨⟨O j |ρ(T )⟩⟩ = ⟨⟨O j | ρ j ⟩⟩.Then, following Eq.(42) we can directly construct error-mitigated estimators as ôEM j (T ) := ôj (T ) ŝλ (T )/s λ , (45) which converges to ⟨⟨O j |ρ⟩⟩ in the (for nontrivial random variables X and Y ), Eq. ( 45) describes a biased estimator.In the following theorem, we quantify this bias by bounding the total prediction error of ôEM j (T ).This in turn bounds the number of symmetryadjusted classical-shadow samples T required.Theorem 4. Fix accuracy and confidence parameters ϵ, δ ∈ (0, 1).Let O 1 , . . ., O L be a collection of observables, each supported on an irrep of U : be a symmetry operator for each λ ∈ R ′ , for which the ideal values s λ = tr(S λ ρ) of the target state ρ are known a priori.
Suppose that each noisy unitary satisfies Assumptions 1, U g = EU g , and define the quantities Then, a (noisy) classical shadow ρ(T ) of size can be used to construct error-mitigated estimates which obey for all 1 ≤ j ≤ L, with success probability at least 1 − δ.
The proof of this statement is presented in Appendix A. Note that ∥ • ∥ ∞ denotes the spectral norm.We phrase this result in terms of variances, rather than the state-independent shadow norm, because knowledge about ρ (namely, its symmetries) can potentially provide tighter bounds.Note that the variance is with respect to the effective noisy state ρ, which was defined in Eq. (39).
We now make a few remarks on this result.First, the symmetry operators appear in the sample complexity, normalized by the value of the symmetry sector as S λ /s λ .Therefore we expect that for typical applications, the variance Var[ŝ λ /s λ ] ≤ ∥S λ /s λ ∥ 2 shadow will be comparable to the baseline variance of estimation, Var[ô j ] ≤ max j ′ ∥O j ′ ∥ 2 shadow .Additionally, the number of irreps considered is typically |R ′ | ≪ L (for instance, in the concrete examples of Secs.V and VI, |R ′ | is a constant).Thus, we expect that the inclusion of symmetry operators incurs negligible overheads for most applications.
Instead, the primary overhead arises from the fact that error-mitigated estimation necessarily comes at the cost of larger overall variances [91][92][93][94].The quantity characterizes an effective noise strength, and it can be seen as a generalization of the average Z-basis fidelity of E, which appears in prior works on noise-robust classical shadows [81,82].In contrast to F Z (E), the quantity F Z,R ′ (E) is a more fine-grained characterization of the noise channel, averaged within the relevant subspaces V λ .Similar to prior results [81][82][83][84][85], the sampling overhead of our error-mitigated estimates also depends inverse quadratically on this noise fidelity.Finally, the error bound we obtain is O(∥O j ∥ ∞ ϵ) when ϵ < 1.Note that ∥O j ∥ ∞ = 1 for Pauli and Majorana operators.Our result also features error terms of order O(∥O j ∥ ∞ ϵ 2 ), which reflect the biased nature of ôEM j (T ).Nonetheless, this bias vanishes as ϵ 2 ∼ 1/T , so that for sufficiently large T the prediction error is dominated by the standard shot-noise scaling of ϵ ∼ 1/ √ T .

V. APPLICATION TO FERMIONIC (MATCHGATE) SHADOWS
The first application of symmetry-adjusted classical shadows that we consider is the estimation of local fermionic observables.This is achieved efficiently by fermionic classical shadows [66], wherein the group G corresponds fermionic Gaussian unitaries (also referred to as matchgate shadows [67]).We will consider a commonly encountered symmetry in fermionic systems: fixed particle number.However, it will be clear how the general idea can apply to other symmetries, such as spin.We begin with a review of matchgate shadows.

A. Background on matchgate shadows
Let a † p , a p be creation and annihilation operators for a system of n fermionic modes, p ∈ [n].The Majorana operators can be constructed as Under the Jordan-Wigner transformation [100], they are mapped to Pauli operators as Recall from Sec. 2 that all d 2 basis operators are generated by taking arbitrary products: where µ = (µ 1 , . . ., µ m ) ⊆ [2n].By convention, we order We can group all the m-degree Majorana indices by defining the set Physical fermionic observables have even degree m = 2k.An important subset of such operators comprises those which are diagonal in the standard basis, corresponding to the index set Using Eq. ( 54), each τ ∈ D 2n,2k corresponds to the Pauli- The group of fermionic Gaussian unitaries is the image of a homomorphism U : O(2n) → U(d) whose adjoint action obeys This transformation generalizes to m-degree Majorana operators Γ µ as where Q ν,µ is the m×m submatrix of Q formed out of its rows and columns indexed by ν and µ, respectively [101, Appendix A].These unitaries are equivalent to (generalized) matchgate circuits [102] and constitute a class of classically simulatable circuits [103][104][105][106][107][108].Fermionic (matchgate) shadows randomize over certain subgroups G ⊆ O(2n) of these Gaussian unitaries.The measurement channel takes the form where the eigenvalues are and each irreducible eigenspace is the image of the projector While U carries 2n + 1 unique irreps (each labeled by a Majorana degree m) [102,109], only the n + 1 irreps λ = 2k have nonvanishing f λ [66,67].Therefore M −1 is formally the pseudoinverse restricted to those subspaces.Finally, the shadow norm of k-body Majorana operators is [66] Variance expressions for arbitrary observables can be found in Refs.[67,68].For the postprocessing of T shadows into estimates of all k-body Majorana observables, we describe an algorithm in Appendix E 1 which runs in time O(n k T ).We now comment on the choice of G ⊆ O(2n).Fermionic classical shadows were introduced in Ref. [66], which initially considered the intersection of proper matchgate circuits [the special orthogonal group SO(2n)] with n-qubit Clifford unitaries Cl(n).The result is the group of all 2n × 2n signed permutation matrices with determinant 1, denoted by B + (2n) ⊂ SO(2n).They also showed that its unsigned subgroup, Alt(2n) ⊂ B + (2n), possesses the same irrep structure [66, Supplemental Material, Theorem 11].While the full, continuous group SO(2n) has not yet been analyzed for classical shadows, it was studied for character randomized benchmarking [110] in Ref. [109,Sec.VI], wherein they demonstrated the presence of multiplicities.These multiplicities can be avoided by enlarging to the generalized matchgate group, i.e., all of O(2n) [102, Lemma 3].Ref. [67] applied these generalized matchgates to fermionic classical shadows, and in particular they prove that the Clifford intersection in this setting (now yielding the subgroup B(2n) ⊂ O(2n) of signed permutation matrices with either determinant ±1) is a 3-design for O(2n).This implies that B(2n) is also multiplicity-free.
Due to the variety of options, for the rest of this paper we assume matchgate shadows under any G with the desired irreps.We note that Ref. [68] introduced a smaller subset of B(2n) based on perfect matchings, which has the same channel M and variances; however its connection to representation theory was not discussed.

B. Utilizing particle-number symmetry
Suppose the ideal state we wish to prepare lies in the ηparticle sector of H.This is a U(1) symmetry generated by the fermion-number operator, which provides us a collection of conserved quantities with which to perform symmetry adjustment.Recall from Eq. ( 62) that Π m projects onto the irrep Then, projecting N k onto V 2k yields the symmetry operators S 2k , and solving the resulting linear system of equations recovers the ideal values for s 2k = tr(S 2k ρ).
For ease of exposition we will consider only k = 1, 2, but one may generalize to higher k using these ideas.Concretely, we start with the fact that a † p a p = (I − Γ (2p,2p+1) )/2, and Γ (2p,2p+1) Γ (2q,2q+1) = Γ (2p,2p+1,2q,2q+1) for p < q.Then, expanding N and N 2 into a linear combination of Majorana operators, one finds Using Eq. ( 64) and the relations between S 2 and S 4 to N and N 2 (for example, N = nI/2 + S 2 ), we arrive at: For the sampling cost incurred by these symmetry operators, we argue that the typical shadow norms of these symmetries are ∥S 2k /s 2k ∥ 2 shadow = O(n k ), which is the same as the base estimation.To see this, consider a triangle inequality on the shadow norm: ). Next, we need to examine how s 2 2k scales with system size.Assuming that s 2 , s 4 ̸ = 0 and that the number of electrons is η = O(n), then from Eqs. ( 68) and ( 69) we see that

Avoiding division by zero
One potential obstruction to symmetry adjustment is when some s 2k = 0.This can occur whenever the particle number takes a specific value: Equation ( 73) occurs at half filling, which is fairly common.On the other hand, Eq. ( 74) occurs only when the number of modes n is a perfect square and the number of particles η is one of two specific values, so it is less likely to occur.Nonetheless, there is a straightforward way to circumvent both possibilities by introducing a single ancilla qubit.
To do so, append an additional fermion mode initialized in the unoccupied state |0⟩, so that the ideal state is now the (n + 1)-mode state ρ ′ = ρ ⊗ |0⟩⟨0|.Given that ρ has η particles on n modes, ρ ′ is an η-particle state on n + 1 modes.The new symmetry operators on the (n + 1)-mode Hilbert space are which have ideal values It is straightforward to check that, if either condition Eq. ( 73) or Eq. ( 74) holds, then s ′ 2 and s ′ 4 are always nonzero for n > 1.
Under the Jordan-Wigner mapping, this modification is easily achieved by initializing a single ancilla qubit in |0⟩.Recall that the terms in the symmetries S 2 , S 4 are the diagonal operators Γ (2p,2p+1) = Z p and Γ (2p,2p+1,2q,2q+1) = Z p Z q .Note also that the ancilla qubit is acted on only during the random unitary U Q (where now Q has dimension 2n + 2) and otherwise does not interact with the n system qubits.

VI. APPLICATION TO QUBIT (PAULI) SHADOWS
Now we turn to the application for local observable estimation in systems of spin-1/2 particles (qubits).Random Pauli measurements are efficient for this task; however, for compatibility with the global U(1) symmetry considered in this work, we must slightly modify the protocol to accommodate its irreps.We begin with a review of the standard Pauli shadows protocol, followed by our modification.

A. Background on standard Pauli shadows
The local Clifford group Cl(1) ⊗n is implemented by uniformly drawing a single-qubit Clifford gate for each qubit independently.It has 2 n irreducible representations, corresponding to all k-qubit subsystems I ⊆ [n], where |I| = k ∈ {0, 1, . . ., n} [111].Twirling M Z by this group yields where f I = 3 −|I| and Π I projects onto the subspace of operators which act nontrivially on precisely the subsystem I.The classical shadows can be expressed compactly as where A more general variance bound was derived in Ref. [53]: a simple loose bound of their result can be stated as where O is an arbitrary k-local traceless observable and R is the number of terms in its Pauli decomposition.However, they argue that a tighter expression, essentially 3 k ∥O∥ 2 ∞ , is a good approximation to the variance.

B. Subsystem symmetrization of Pauli shadows
The irreps of Cl(1) ⊗n are difficult to reconcile with commonly encountered symmetries.For example, consider a conserved total magnetization M = i∈[n] Z i .In terms of qubits, this is equivalent to the different Hamming-weight sectors.Each term Z i lies in a different irrep I = {i}, so M spans multiple irreps rather than having a single conserved quantity per irrep.
To remedy this conflict, we introduce what we call subsystem-symmetrized Pauli shadows, which randomizes over a group whose irreps are labeled only by the qubit locality k, rather than any specific subsystem I of k qubits.(This is analogous to how the matchgate irreps depend only on fermionic locality, due to the inherent antisymmetry of fermions.)We formalize the group as follows.
Definition 5.The subsystem-symmetrized local Clifford group is defined as Cl(1) ⊗n Sym := Sym(n) × Cl(1) ⊗n , where Sym(n) is the symmetric group and Cl( 1) is the single-qubit Clifford group.Its unitary action on H is given by where ⊗n and π ∈ Sym(n) is represented by a permutation of the n qubits: The unitaries S π can be implemented with O(n 2 ) gates and depth O(n), for example by constructing a parallelized network of nearest-neighbor SWAP gates according to an odd-even sorting algorithm [97] applied to π. Representing π as an array of the permuted elements of [n], the sorting algorithm returns a sequence of adjacent transpositions i ↔ i + 1 which maps π to (0, 1, . . ., n − 1).This sequence therefore implements π −1 as desired.Each such transposition then maps to a SWAP i,i+1 gate to construct the quantum circuit.For the postprocessing of T shadows into k-local Pauli estimates, we review in Appendix E 2 the algorithm which runs in time O(n k T ).
We prove the relevant properties of subsystemsymmetrized Pauli shadows in Appendix B, namely its irreps and the shadow norm of local observables.We summarize the results here: each irrep is the space of all k-local operators, (84) for each k ∈ {0, 1, . . ., n}.Hence the (noisy) measurement channel is where When E is the identity channel, we recover Also in the absence of noise, the variance formulas are exactly the same as in standard Pauli shadows.5

C. Utilizing total magnetization symmetry
We take the U(1) symmetry generated by a total magnetization M = i∈[n] Z i .Suppose the ideal state has a known value of m = tr(M ρ) (equivalently, ρ lives in a sector of fixed Hamming weight (n − m)/2).The symmetries projected into the irreps of Cl(1) ⊗n Sym are then whose ideal values are As in the fermionic setting, we encounter issues if s 1 or s 2 vanish (i.e., m = 0 or m = ± √ n, respectively).In this case, we can perform the same ancilla trick, appending a qubit in |0⟩ and modifying the conserved quantities to The variances of the symmetry operators are whenever the ideal state lives in a symmetry sector of constant m = Θ(1).We show this in Appendix B 2, along with general m-dependent expressions in Eqs.(B43) and (B57).This n-dependent variance bound reflects the fact that the symmetries are extensive properties.While local Pauli operators have variances bounded by a constant, we point out that many local observables of interest are linear combinations of an extensive number of Pauli terms.As such, their shadow norms typically grow with system size as well (recall the discussion at the end of Sec.VI A).

VII. NUMERICAL EXPERIMENTS
We now demonstrate the error-mitigation capabilities of symmetry-adjusted classical shadows through numerical simulations.We focus on the task of estimating oneand two-body observables in both fermion and qubit systems which obey the global U(1) symmetries described in Secs.V B and VI C. We provide open-source code for these calculations at Ref. [89].
For each type of system, we first present results when the noise models obey Assumptions 1 (readout errors).We demonstrate the successful mitigation at varying sample sizes, noise rates, and system sizes, confirming the correctness of our theory.
Next, we investigate how symmetry adjustment performs under a more comprehensive noise model based on superconducting-qubit platforms.
These simulations were performed using the Quantum Virtual Machine (QVM) within the Cirq open-source software package [95,96].It uses existing hardware data on a native gate set (single-qubit rotations and two-qubit √ iSWAP gates on a square lattice) to mimic the realistic performance of a noisy quantum computer.We use the calibration data provided of Google's 23-qubit Rainbow processor based on the Sycamore architecture, which was used in quantum experiments simulating quantum chemistry and strongly correlated materials [33,35].The noise model consists of depolarizing channels, two-qubit coherent errors, single-qubit idling noise, and readout errors.Error rates vary across the chip; on the 2 × 4 grid that we simulated, the average single-and two-qubit Pauli error rates are ∼0.15% and ∼1.5%, respectively.A precise description of the noise model can be found in Appendix F 2.
Throughout, we use the following conventions for figures.Noiseless data (blue squares) correspond to simulations of an ideal quantum computer, which experiences no noise channel and only exhibits the fundamental sampling error.Unmitigated data (black X's) are simulations of classical shadows on a noisy quantum computer, using standard postprocessing routines.The mitigated estimates (red diamonds) are instead postprocessed as symmetry-adjusted classical shadows, as described in Sec.IV.In some experiments, we also compare against robust shadow estimation [81] (RShadow, green crosses), which involves simulating the calibration protocol on |0 n ⟩ under the same noise model.Finally, the true values (teal curves) are the ground truth, against which we determine the prediction error.
Uncertainty bars represent one standard deviation of the combined sampling and postprocessing, computed by empirical bootstrapping [112].To ease the computational load, we slightly modify the procedure by batching samples; see Appendix F 5 for details.

A. Fermionic systems
Our first set of numerical experiments consider the application to matchgate shadows to learn and mitigate noise in one-and two-body fermionic observables.

Readout noise
First, we consider the reconstruction of the fermionic two-body reduced density matrix (2-RDM) from matchgate shadows.The 2-RDM elements of a state ρ are given by In general, knowledge of the k-RDM allows one to calculate any k-body observable of the system.By anticommutation relations, there are only n 2 2 unique matrix elements, corresponding to the indices p < q and r < s.We therefore represent 2 D as an n 2 × n 2 Hermitian matrix, flattening along those index pairs.Estimates 2 Dpq rs (T ) = tr(a † p a † q a s a r ρ(T )) are computed from T matchgate-shadow samples.Here, our figure of merit for the prediction error is the spectral-norm difference between the reconstructed and the numerically exact 2- We demonstrate 2-RDM reconstruction on an ensemble of 20 random Slater determinants (noninteractingfermion states with particle number η).An η-fermion Slater determinant is specified by the first η columns of an n×n unitary matrix, so we generate the random states by uniformly drawing elements of U(n).This n × n representation is then lifted to the 2n × 2n fermionic Gaussian representation, which allows us to apply the random matchgate transformations Q ∈ B(2n) efficiently.This simulates the action of ρ → U Q ρU † Q .The measurement of this rotated state is then simulated using the algorithm of Ref. [113,Sec. 5.1].Finally, to simulate the readout noise we implement the effective noise channel on the sampled bit strings offline.
While the 2-RDM of free-fermion states can be computed from the 1-RDM using Wick's theorem, we do not employ any such tricks here.(We use Slater determinants simply to facilitate fast classical simulation.)We also do not use any additional error-mitigation strategies, such as RDM positivity constraints [114], that could in principle be applied in tandem.
The results are presented in Figure 2. We consider a small system size, n = 8 and η = 2, and simulate three types of single-qubit noise channels before readout: depolarizing, amplitude damping, and bit flip.The noise rate p represents the probability of such an error occurring, independently on each qubit (see Appendix F 1).In the top row, we show how the prediction error varies with the total number of samples T .As expected, the noiseless estimates (corresponding to p = 0) converge as ∼ T − which is the standard shot-limited behavior.Then, setting p = 0.2, we see how the unmitigated data experiences an error floor beyond which taking additional samples does not improve the accuracy.On the other hand, the mitigated results clearly bypass this error floor and recover the shot-noise scaling with T , thus validating the theory of symmetry-adjusted classical shadows.Compared to the noiseless simulations, our mitigated data exhibit a constant factor increase in the sampling cost, corresponding to the overhead of O(F −2 Z,R ′ ) appearing in Theorem 4.
For these experiments, we also compare to the performance of robust shadow estimation (RShadow) by Chen et al. [81], which requires simulating the calibration procedure on |0 n ⟩.For a fair comparison, we allocate T /2 samples to the calibration step and T /2 samples to the estimation step, so that the total number of samples is the same.While Chen et al. [81] did not originally consider matchgate shadows, from our generalization in Eq. ( 22) we can construct NoiseEst B(2n) by taking As expected, RShadow behaves similarly to symmetryadjusted classical shadows in this scenario wherein the noise obeys Assumptions 1.However, even here we observe the advantage of our approach in terms of the number of samples.We attribute the additional overhead of RShadow to its calibration procedure, which symmetry adjustment avoids.
In the bottom row of Figure 2, we simulate the same collection of random Slater determinants, but now varying the noise rate p at a fixed sample size T = 10 6 .While the unmitigated errors quickly grow with increasing noise rate, the mitigated estimates remain under control.Note that the errors of the mitigation protocol still grow modestly because we have fixed the number of samples; in order to achieve a constant prediction error, one would need to scale T proportional to Our key takeaway is that the combination of both rows of plots indicates the ability to handle a range of noise channels and error rates.Indeed, the growing errors seen in the bottom row can be suppressed by simply taking more samples, which is what the top row demonstrates.
Next, we consider the simulation of a 1D spinful Fermi-Hubbard chain of L = n/2 sites (for a total of n fermionic modes/qubits).Under open boundary conditions, the Hamiltonian for this model is where are the hopping and interaction terms, respectively.The creation operators a † i,σ produce an electron at site i with spin σ, and N i,σ = a † i,σ a i,σ is the associated occupationnumber operator.We set units such that the hopping strength is t = 1.
For the target state, we use the ground state of the noninteracting term J, which is also a Slater determinant.This allows us to use the same simulation techniques as before to efficiently simulate up to 20 sites.The number of electrons in each spin sector is  3. Estimation of the energy per electron, ⟨H⟩/η, of a 1D spinful Fermi-Hubbard model at interaction strength U/t = 4 on 4 ≤ L ≤ 20 sites, measured using spin-adapted matchgate shadows with T = 2 × 10 6 samples.We simulate the protocol on the ground state of the noninteracting component J of the Hamiltonian, Eq. ( 98), at half filling in each spin sector, which is a Slater determinant with ησ = L/2 electrons per sector.These simulations involve n ′ = 2L + 2 noisy qubits experiencing readout bit-flip errors with probabilities 1%, 3%, and 5%.The additional qubit per spin sector is prepared in |0⟩ to avoid division by zero (see Sec. V B 1).
system is at half filling, which requires the use of ancilla qubits to avoid division by zero (recall Sec.V B 1).In fact, we simulate n ′ = n + 2 qubits because we append an ancilla qubit to each spin sector.This because we employ spin-adapted matchgate shadows, described in Sec.III D and Appendix C.This modification essentially treats each spin sector independently in terms of the randomized measurements, and so each sector itself is at half filling.
The Fermi-Hubbard results are shown in Figure 3.We consider the estimation of energy per electron, ⟨H⟩/η.We set the interaction strength to U/t = 4 and the noise model to single-qubit bit-flip errors, with probabilities p ∈ {0.01, 0.03, 0.05}.The energy per electron (top) and absolute estimation error (bottom) are plotted as the system size grows, keeping the number of samples fixed at T = 2 × 10 6 .Again, these results serve to validate our theory, showing the correctness of symmetry adjustment at larger system sizes (recall that number of electrons scales with the number of sites at half filling).This also demonstrates the use of spin-adapted matchgate shadows and the successful use of ancillas to avoid division by zero in ôEM j .

QVM noise model
Now we turn to the gate-level noise model simulated through the QVM [95,96].This model strongly violates Assumptions 1, reflecting the fact that the statepreparation circuit U prep is typically the dominant source of errors.
As our testbed fermionic system, we again consider the 1D spinful Fermi-Hubbard chain with open boundary conditions and interaction strength U/t = 4. Rather than the static problem, here we simulate a Trotterized time evolution of the Hamiltonian, as the number of Trotter steps provides a systematic way to increase the circuit depth (and hence the cumulative amount of noise).Note that because we are focusing on the mitigation of noisy quantum circuits, the ground truth corresponds to the noiseless Trotter circuit with a finite step size (i.e., we are not interested in the exact non-Trotterized dynamics).
We closely follow the setup of the experiment performed in Ref. [35] (which was in fact performed on a Sycamore processor that our noise model is based on), using code made available by the authors at Ref. [115].Because simulating the full noisy circuit is exponentially expensive, we restrict to a four-site instance (n = 2L = 8).The initial state is the ground state in the η ↑ , η ↓ = 1 sector of the noninteracting Hamiltonian where J is the hopping term defined in Eq. ( 98) and we set the on-site potentials to have a Gaussian form, ε i,σ = −λ σ e − 1 2 (i+1−c) 2 /s 2 .This generates a Slater determinant whose charge density has a Gaussian profile, centered around c with width s and magnitude λ σ .We set the parameters to c = L/2 + 1/2 = 2.5, s = 7/3, and λ σ = 4δ σ,↑ .This initial state is prepared by the appropriate single-particle basis rotations (a subset of fermionic Gaussian unitaries) [98,116,117] on the state |1000⟩ within each spin sector.Denote this unitary by U (H 0 ).The system is then evolved by Trotterized dynamics according to H, with R ∈ {0, 1, . . ., 5} steps of size δt = 0.2.Let J even (resp.,J odd ) be the terms in J with i even (resp., odd), and similarly for V even , V odd .One Trotter step is ordered as ).The noise model is that of the Google Sycamore Rainbow processor [33,35], implemented within the Cirq QVM [95,96].The size of the noisy circuit grows systematically with the number of Trotter steps.We use spin-adapted matchgate shadows, taking T = 9.6 × which is then compiled into the native gate set.The full state-preparation circuit is then where X 0,σ places a spin-σ electron on the first site from the vacuum (i.e., prepares |1000⟩ in each spin sector).
Note that R = 0 corresponds to only preparing the initial Slater determinant.Further details on the construction of these circuits are available in Refs.[35,115].One final detail of Ref. [35] that we follow is their method of qubit assignment averaging (QAA).This technique is employed as a means of ameliorating inhomogeneities in error rates across the quantum device.QAA works by identifying a collection of different assignments for the physical qubit labels and uniformly averaging over them (note that the Jordan-Wigner convention is kept fixed).For example, one may vary qubit assignments by selecting a different portion of the chip, or rotating/flipping the layout.Here, we fix a 2 × 4 grid of qubits and perform QAA over four different orderings of those eight qubits; see Appendix F 4 for the specific assignments chosen.
For each target state U prep (R)|0 n ⟩, we collect T = 9.6 × 10 5 spin-adapted matchgate shadow samples.In Figure 4, we plot the Trotterized time evolution of charge density throughout the chain, as well as the charge spread which quantifies how the density spreads away from the center of the chain.These quantities are only one-body observables, so as an example two-body observable we also plot the energy per electron, ⟨H⟩/η.Because Assumptions 1 no longer hold, we no longer have the guarantees of Theorem 4 and we do not observe an arbitrary amount of error mitigation.We see that as the circuit size grows, so too do the prediction error and uncertainty.This behavior is a reflection of the noise assumptions being increasingly violated.Nonetheless, our results still show a substantial amount of noise reduction, and overall we maintain the qualitative features of the dynamics compared to the unmitigated protocol.

B. Qubit systems
Next, we study the application of symmetry-adjusted classical shadows to subsystem-symmetrized Pauli shadows, to predict one-and two-body qubit observables in the presence of noise.  5. Average estimation error of the 2-RDM over all two-qubit subsystems, reconstructed from (subsystem-symmetrized) Pauli shadows.Error is quantified by the spectral-norm difference between the estimated and true 2-RDM for each subsystem, taking the mean over all n 2 subsystems.We simulate the protocol on a collection of 20 random eight-qubit matrix product states, with maximum bond dimension χ ≤ n and fixed Z magnetization, m = 0.Because m = 0 implies s1 = 0, we simulate a system of nine qubits with the ancilla prepared in |0⟩.(Top) Scaling of estimation error with the total number of samples T , fixing the noise rate to p = 0.2 for all noise models.(Bottom) Scaling of the estimation error with the noise rate p, fixing the total number of samples to T = 10 6 .
Numerically, we implement all MPS calculations using the open-source software ITensor [120], which can guarantee the correct symmetry sector using efficient tensornetwork representations.Within such representations, it is straightforward to apply random local Clifford gates and SWAP gates, and to sample measurements in the computational basis.
Unlike fermions, qubits are not symmetrized, so their 2-RDMs generally differ between different two-qubit subsystems.
Our accuracy metric here is therefore the mean 2-RDM error over all pairs of qubits: From subsystem-symmetrized Pauli shadows ρ(T ) of size T , we reconstruct the qubit 2-RDMs by estimating all one-and two-local Pauli expectation values and forming the 4 × 4 matrices tr The results are shown in Figure 5.Our conclusions here are entirely parallel to those of Figure 2, and we refer the reader to its corresponding discussion.We note here that this simple demonstration also validates the subsystem-symmetrized Pauli shadows protocol and our use of the ancilla trick for Pauli shadows (recall that the random MPS have vanishing symmetry value, m = s 1 = 0).
Throughout, we set units such that J = 1 and consider an anisotropy of ∆ = 1.5.This Hamiltonian has the U(1) symmetry described in Section VI C, and in particular the ground state obeys m = 0 (assuming the number of spins n is even).We find the ground state via the density-matrix renormalization group (DMRG) algorithm [121], represented as an MPS; therefore we apply the same classical simulation algorithms as before.Although m = 0 implies a vanishing conserved quantity for the one-body subspace, s 1 = 0, we do not employ the ancilla technique for these simulations because we will only be interested in strictly two-body observables (for which As a first demonstration, in Figure 6 we plot the mitigation of spin-spin correlation functions in a chain of length n = 48, fixing one of the spins to the end of the chain.The noise model is set to a single-qubit bit-flip channel with flip rate p = 0.05.The correlation between spins i and j is defined as the expectation value of the operator S i • S j , where Then in Figure 7 we show the mitigation of macroscopic observables at different system sizes and bit-flip rates.
The top two rows of plots show the estimation of energy per spin ⟨H⟩/n, while the bottom two rows show the estimation of a Néel order parameter, which quantifies antiferromagnetic correlations throughout the chain.For these experiments, the number of samples taken is T = 10 6 .Overall, we draw conclusions parallel to those of Figure 3. Namely, the results validate our theory for a range of observables, noise rates, and system sizes.

QVM noise model
We now turn to simulations using the QVM noise model, taking the same XXZ Heisenberg spin chain (∆ = 1.5 and n = 8) as our testbed system.Similar to our numerical experiments with the Fermi-Hubbard model, we simulate Trotter circuits of the XXZ model starting from a product state within the symmetry sector of m = 0. Again, we will only be interested in strictly two-local observables so we do not employ the ancilla trick here either.
Our initial state is a Néel-ordered product state, |01010101⟩ = j odd X j |0 n ⟩.Defining H even and H odd as the terms in H with i even and odd, respectively, a single Trotter step is given by where we take the step size to be δt = 0.2.Hence, the full state-preparation circuit for R steps is which is then compiled into the native gate set.For each R, we collect T = 4.8 × 10 5 samples using subsystemsymmetrized Pauli shadows.Because the initial state is a simple basis state, we only display results for R ∈ {1, . . ., 5} for these studies.In line with our Fermi-Hubbard simulations on the QVM, we perform QAA here as well, averaging over twelve different assignments of the same 2 × 4 qubits; see Appendix F 4 for details.First, we compute the spin-spin correlations for all qubit pairs (i, j) throughout the chain.We plot the prediction errors of these correlation functions in Figure 8, with the unmitigated data in the first row and mitigated data in the second row.We observe that, while the shallower circuits are well handled by symmetry-adjusted classical shadows, the mitigation power diminishes as the circuit grows deeper.To examine this effect closer, we plot in the bottom two rows of Figure 8 the correlation functions between the first spin and the rest of the chain.We see that the ⟨S 0 • S 1 ⟩ errors are particularly dominant due to the magnitude of its true value.Although the absolute error is only marginally improved, we interpret the qualitative behavior as being more faithfully recovered compared to the unmitigated data.
Next, we consider macroscopic observables in Figure 9, the Néel order parameter ⟨S 2 AF ⟩ and energy per spin ⟨H⟩/n.Again we see general trends similar to the other QVM simulations: the mitigated results are in closer qualitative agreement with the true values than the unmitigated data, at the cost of larger uncertainty bars, and without arbitrary amounts of error mitigation.Symmetry adjustment consistently reduces the absolute error compared to the unmitigated data, although we note that some of the energy estimates are still a few standard deviations away from the true value.

VIII. DISCUSSION
In this paper, we have introduced symmetry-adjusted classical shadows, a QEM protocol applicable to quantum systems with known symmetries.Our approach builds on the highly successful classical-shadow tomography [52,53], modifying the classically computed linearinversion step according to symmetry information in the presence of noise.Because our strategy is performed in postprocessing on the noisy measurement data, it allows for straightforward combinations with other QEM strategies.As opposed to prior related works [81][82][83][84][85], the main advantage of our approach is the use of the entire noisy circuit, thereby bypassing the need for calibration experiments and accounting for errors in state preparation.Meanwhile, in contrast with other symmetry-based strategies [79,[86][87][88], we require no additional quantum resources, utilize finer-grained symmetry information, and can easily take advantage of a wider range of symmetries (e.g., particle number as opposed to only parity conservation).
Overall, our findings reveal that as a low-cost scheme, symmetry-adjusted classical shadows by itself is already potent for practical error mitigation.Our analytical results guarantee the accuracy of prediction under readout noise assumptions.Even when these assumptions are violated in practice, we expect these results to still provide intuition regarding the mitigation behavior.Indeed, this expectation is validated by our numerical experiments with superconducting-qubit noise models on the Cirq QVM [95,96].From these simulations, we have observed substantial quantitative improvement when the cumulative circuit noise is sufficiently weak, and qualitative improvements across all experiments performed.
Along the way, we have developed a number of ancillary results that may also be of independent interest.Of note are (1) the subsystem-symmetrized Pauli shadows, which uniformly symmetrizes the irreps of the local Clifford group among subsystems; (2) a new circuit compilation scheme for fermionic Gaussian unitaries, which treats Majorana modes on a more natural footing to improve two-qubit gate parallelization; and (3) symmetryadapted matchgate shadows, which uses block-diagonal transformations within spin sectors to reduce the size of the random matchgate circuits.We expect that these techniques will find broader applicability in quantum simulation beyond the scope of this paper.
A number of pertinent open questions and future directions remain.For simplicity of the protocol, and because of the examples that we focused on, we restricted attention to multiplicity-free groups.However, tools to generalize to non-multiplicity-free groups already exist, and in the context of character randomized benchmarking [110] such an extension has been developed successfully [109].It would therefore be useful to extend our ideas similarly, and investigate what effect (if any) multiplicities have on symmetry-adjusted classical shadows.
Regarding the protocols considered, we have focused on local observable estimation in systems with global U(1) symmetry.However, it is worth noting that the nqubit Clifford group possesses only one nontrivial irrep, making it essentially compatible with any symmetry.Because its shadow norm is exponentially large for local observables, it is an unfavorable choice for typical quantumsimulation applications.One wonders whether this desirable universality of its irrep can nonetheless be harnessed, analogous to our construction of subsystem-symmetrized Pauli shadows.We posit that global SU(2)/Cl(1) control [122], or single-fermion U(n) basis rotations [69], would be particularly promising groups to investigate.Alternatively, one may consider different classes of symmetries, such as local (rather than global) symmetries.
One key advantage of symmetry adjustment is its flexibility, allowing for easy integration with other errormitigation strategies.Investigating this interplay is a clear target for future work.Particularly valuable would be other techniques to massage the circuit noise into approximately satisfying Assumptions 1, for instance by randomized compiling [123].From our usage of QAA [35] in the numerical experiments, we have already shown heuristically that the mere choice of qubit assignments appears to have such an effect.
Indeed, the reliance on such assumptions for rigorous guarantees may be viewed as a limitation of this work.While our numerical results are encouraging, it behooves one to seek a more comprehensive error analysis applicable to a wider range of noise models.For example, while gate-dependent errors are particularly detrimental to our method, they have been closely studied in the context of randomized benchmarking [124][125][126][127].The tools developed therein may be valuable to this setting as well.Establishing a better understanding here may also inspire extensions to surpass the limitations of the current theory.We leave such goals to future work.

be a collection of observables, each supported on an irrep of
be a symmetry operator for each λ ∈ R ′ , for which the ideal values s λ = tr(S λ ρ) of the target state ρ are known a priori.Suppose that each noisy unitary satisfies Assumptions 1, U g = EU g , and define the quantities Then, a (noisy) classical shadow ρ(T ) of size can be used to construct error-mitigated estimates which obey for all 1 ≤ j ≤ L, with success probability at least 1 − δ.
Proof.Let ρ1 , . . ., ρT be the T noisy classical shadows.Construct the mean of these snapshots, (It is straightforward to replace this by a median-ofmeans estimator if necessary.)In expectation we have E[ρ(T )] = ρ, where the effective noisy state can be described as Let O ∈ V λ , with symmetry s λ = tr(S λ ρ) in the same irrep.Define estimates of the noisy expectation values using ρ(T ): In expectation, these random variables obey E[ X] = tr(O ρ) and E[ Ȳ ] = tr(S λ ρ)/s λ = f λ /f λ .Therefore as established by Eq. ( 41) from the main text, we have where we have defined the function r(x, y) := x/y.From a finite number of samples, however, we can only construct ôEM (T ) := r( X, Ȳ ), which is generally a biased estimator since To quantify the estimation error, we employ Taylor's remainder theorem: expanding r(x, y) to first order about a point (x 0 , y 0 ), we have where the remainder term is for some points a ∈ [min(x, x 0 ), max(x, x 0 )] and b ∈ [min(y, y 0 ), max(y, y 0 )].The relevant partial derivatives of r(x, y) are enumerated below: Suppose T is large enough such that (with high prob-ability) the estimation error of all noisy observables are uniformly bounded by some ε ∈ (0, 1): This is achieved by standard classical shadow arguments, which we will elaborate on later.For now, assuming these error bounds hold, we rearrange Eq. (A11), set (x, y) = ( X, Ȳ ) and (x 0 , y 0 ) = (E[ X], E[ Ȳ ]), and apply a triangle inequality to obtain To proceed with this error bound, we make the following observations.First, note that which we will denote by ξ λ .We assume that that noise channel E is such that ξ λ > 0, as otherwise the quantity Thus Eq. (A20) becomes We can bound the remainder term |h 1 ( X, Ȳ )| as follows.Applying a triangle inequality to Eq. (A12) yields Taylor's remainder theorem tells us that the value of a (resp., b) lies between X and E[ X] (resp., Ȳ and E[ Ȳ ]), which we know are at most ε apart.We can therefore bound Similarly for b, using the fact that If ε < ξ λ , then |b| ≥ ξ λ − ε > 0 always holds.We will see later that this condition is always justified; for now, we will just suppose that this lower bound on |b| holds.Then the remainder obeys Combining Eqs.(A23) and (A27), we arrive at In order to bound this error by O(∥O∥ ∞ ϵ) for some desired ϵ ∈ (0, 1), we can choose ε = ξ λ ϵ, yielding ôEM (T ) − tr(Oρ) ≤ (∥O∥ ∞ + 1)ϵ (A29) Thus, by demanding ϵ < 1 we ensure that the required technical condition ε < ξ λ is met.Now we need to verify that the remainder term is bounded by O(∥O∥ ∞ ϵ 2 ), so that the O(∥O∥ ∞ ϵ) term dominates asymptotically as ϵ → 0. Indeed, as long as ϵ is bounded away from 1 then Finally, we analyze the sample complexity required to achieve the error bound of Eq. (A29).In Eqs.(A18) and (A19) we required that the number of samples T be such that the shot noise of X and Ȳ are at most ε.These random variables correspond to the observables A31) suffices to accomplish this task (with probability at least 1 − δ) [52].Then, setting ε = min λ∈R ′ ξ λ ϵ ensures that ε is small enough for Eq.(A29) to apply to all target observables.
where π ∈ Sym(n) acts on (C 2 ) ⊗n as For shorthand, we write π(b) for the n-bit string b . It is clear that the adjoint representation U (π,C) block diagonalizes into subspaces spanned by k-local Pauli operators: This can be seen from the fact that neither single-qubit nor SWAP gates can change the operator locality; however, SWAP gates can map between equally sized subsystems on which the operator nontrivially acts.What remains is to show that each of these subspaces is irreducible.
First, we define the twirling map.Definition 6.Let ϕ : G → U(V ) be a unitary representation of a compact group G on a vector space V , and let Φ : G → U(L(V )) be its adjoint action, i.e., Φ g (•) = ϕ g (•)ϕ † g .The t-fold twirl by Φ is defined as which is a linear map on L(V ) ⊗t .
Twirls have a number of convenient properties, mostly arising from the fact that Φ is a group homomorphism.For example, they are G-invariant from the left and right: for all h ∈ G.This furthermore implies that they are in fact projectors: The study of twirls also allows us to determine the irreducible representations of a group.This can be seen by the following well-known result for multiplicityfree groups, which for completeness we provide a selfcontained proof of at the end of this subsection.
Proposition 7. Let G, V , ϕ, and Φ be as in Definition 6.For any X ∈ L(V ), the 1-twirl of X by Φ takes the form if and only if ϕ decomposes irreducibly as V = λ∈R G V λ , where Π λ is the orthogonal projector onto V λ .

Our strategy for determining the irreps of G = Cl(1) ⊗n
Sym is therefore to directly compute T 1,Φ , from which we can infer the irreps from its block-diagonal structure.To use Proposition 7, we will take ϕ as the unitary channel U, so that V = L(H) and Φ(•) = U(•)U † (note that this is a superchannel).For technical reasons, it will be easier to first compute T 2,U , from which the desired twirl T 1,Φ can be evaluated.The relation between these two twirls is given by the following lemma.Lemma 8. Let U : G → U(L(H)) be a unitary representation and Φ : G → U(L(L(H))) its adjoint representation, i.e., Φ g (A) = U † g AU g for any superoperator A. The 1-twirl by Φ can be computed from the 2-twirl by U as for all A ∈ L(L(H)) and X ∈ L(H).Here, the domain of T 2,U is understood with respect to the isomorphism L(L(H)) ∼ = L(H) ⊗2 , given by ] is an orthonormal operator basis.By a direct calculation: Before we can compute T 2,U for the subsystemsymmetrized local Clifford group, we will need a small result about the group orbit of a k-local Pauli operator P under the action of U. The orbit is defined as (B11) This will help us determine how the twirl acts on Pauli operators, which as an basis is used to compute the matrix elements of T 2,U .To this end, we define an orthonormal basis of k-local Pauli operators, Then, conjugation by S π for arbitrary π ∈ Sym(n) permutes the k nontrivial factors of P among the n qubits.The orbit over all permutations yields all possible n k supports.Taking the direct product of both these Clifford-and symmetric-group actions therefore yields all k-local Pauli operators, with prefactors ±1.
We are now ready to compute the 2-fold twirl by U. We comment that the high-level proof structure of this lemma is inspired by that of Ref. [67, Section IV A 1].

Lemma 10. Let U : G → U(L(H)) be the unitary representation of G = Cl(1) ⊗n
Sym , defined by where |Σ (2) Proof.First, we will establish that for any two basis Pauli operators P ̸ = Q, we have T 2,U |P ⟩⟩|Q⟩⟩ = 0. Thus we only need to consider basis elements of L(H) ⊗2 of the form |P ⟩⟩|P ⟩⟩.Next, we will show that T 2,U |P ⟩⟩|P ⟩⟩ = T 2,U |P ′ ⟩⟩|P ′ ⟩⟩ whenever |P | = |P ′ |.Finally, using these two properties we can derive Eq. (B13).Fix the basis of Pauli operators such that P, Q ∈ 0≤k≤n B k .If P ̸ = Q, then there exists at least one qubit i ∈ [n] on which P and Q act as a different Pauli matrix.Hence there always exists some W ∈ P(1) which anticommutes with one and commutes with the other, e.g., ) where e ∈ Sym(n) is the identity permutation.Thus using the property that implying that T 2,U |P ⟩⟩|Q⟩⟩ = 0. Now let P, P ′ be k-local Pauli operators for any k.If they act nontrivially on different subsets I, I ′ ⊆ [n] of qubits, then let π ∈ Sym(n) be a permutation that maps I to I ′ .Given this permutation, if they act as different Pauli matrices on their new shared support I ′ , then furthermore let C i ∈ Cl(1) for i ∈ I ′ be Clifford gates that map each one to the other.Writing k) , this transformation acts as We are now ready to derive Eq. (B13).As established by Eq. (B15), we only need to expand the 2-fold twirl in the basis of |P ⟩⟩|P ⟩⟩: where the second simplification is due to the fact that U (π,C) preserves Pauli locality, hence k for all P, P ′ ∈ B k (i.e., the matrix element does not depend on the particular choice of P, P ′ ).Hence (B19) We first compute the average over the group for some fixed P .By Lemma 9, we know that the orbit where the factor of 2 is due to the fact that for each Q ∈ B k , both ±Q ∈ G • P , and the factor of |G|/|G • P | takes care of double counting when summing over all elements of G. Noting that |G • P | = 2|B k |, we can plug this result into Eq.(B19) to find that as desired.
We are now ready to prove the main result of this section: the irreps of Cl(1) ⊗n Sym are labeled by the Pauli weights k ∈ {0, 1, . . ., n}.The proof structure is as follows: from the expression for T 2,U from Lemma 10, we can compute T 1,Φ by using Lemma 8. Then by examining T 1,Φ , we use Proposition 7 to infer the irreps.
Theorem 11.The representation U : Cl(1) ⊗n Sym → U(L(H)), defined by U (π,C) (ρ) = S π CρC † S † π , decomposes into the irreps Proof.From Lemma 10, we have where k ⟩⟩ is defined in Eq. (B14).Using Lemma 8, we compute T 1,Φ (A) by evaluating T 2,U (A) for arbitrary superoperators A. Let us express A in the Pauli basis: Recall from Eq. (B9) that in order to evaluate T 2,U (A), we need T 2,U (Q ⊗ P ) for every P, Q.But because T 2,U projects onto symmetrized basis elements P ⊗ P , we only have to consider the case where P = Q: where We make a number of observations here.First, note that Also, |B k | = tr(Π k ).Finally, the sum over P ′ can be represented as Because this holds for all X ∈ L(H), we can say that By Proposition 7, we know that the twirl has this expression if and only if the irreducible subspaces of U are Finally, we close this subsection with the deferred proof of the well-known result Proposition 7, for completeness.Proposition 7).For the forward direction, suppose ϕ = λ∈R G ϕ (λ) where each ϕ (λ) : G → U(V λ ) is irreducible.(This is guaranteed by Maschke's theorem, and generalizes to the Peter-Weyl theorem for compact groups [90].)Because T 1,Φ (X) commutes with all ϕ g , they are simultaneously block diagonal, so T

Proof (of
From the orthogonality of projectors Π λ Π λ ′ = δ λλ ′ Π λ , the scalar c λ (X) is determined by For the reverse direction, suppose the twirl takes the form where we denote each block by T 1,Φ (λ) (X).Again because T 1,Φ (X) and ϕ g commute, the matrix ϕ g is block diagonal in the subspaces V λ for all g ∈ G.We need to show that each block ϕ (λ) is irreducible.
Recall that ϕ (λ) is irreducible if the only subspaces g |v⟩ ∈ W . Suppose there exists a vector |x⟩ ∈ V λ that is orthogonal to W and set Supposing |x⟩ ̸ = 0, we see that |v⟩ = 0 is the only possible element of W to satisfy Eqs.(B32) and (B33) simultaneously.Hence W = {0}.Otherwise, |x⟩ = 0 is the only element of V λ orthogonal to W , implying that there is in fact no nontrivial subspace orthogonal to W . Thus W = V λ in this case.

Variance of symmetry operators
In this section, we analyze the variance associated with the symmetry operators, Because our error analysis of symmetry-adjusted classical shadows (see Appendix A) bootstraps from the variance of unmitigated estimation, we only need to compute quantities related to the noiseless protocol.In this case, the subsystem-symmetrized local Clifford group yields the same channel and variances as the standard local Clifford group because the random permutations have no effect on the twirling on computational basis states.Specifically, using the two-and three-fold twirls we can express and where the t-fold twirl by U : G → U(H) is defined as T t,G := E g∼G U ⊗t g .These expressions are the same whether we take G = Cl(1) ⊗n Sym or Cl(1) ⊗n , due to the following equivalence: The third equality follows due to the fact that permutations are bijections, hence each b∈{0,1} n |π(b)⟩⟨π(b)| ⊗t is just a reordering of the terms in b∈{0,1} n |b⟩⟨b| ⊗t .
As an immediate consequence, we see that the variance of observables under subsystem symmetrization are exactly the same as with standard Pauli shadows.For the rest of this section, we will explicitly compute the variance of S k using known Haar-averaging formulas over the Clifford group [52, Eqs.(S35) and (S36)]: for all unit vectors |x⟩ ∈ C 2 and Hermitian matrices A, B 0 , C 0 ∈ C 2×2 , with tr B 0 = tr C 0 = 0.The extension to Cl(1) ⊗n follows by linearity and statistical independence.We first apply these formulas to S 1 to compute its variance.
Thus the variance is If ρ is the ideal state with symmetries tr(S 1 ρ) = m and tr(S On the other hand, if we make the noisy replacement ρ → ρ = M −1 M(ρ), then we can obtain a bound Next we compute the variance of estimating S 2 .Analogous to the calculation presented in Eq. (B41), we expand ⟨b|CM −1 (S 2 )C † |b⟩ 2 and group terms based on the overlapping of indices: We now go through each summation and evaluate the expectations: i<j;k<l k̸ =i̸ =l;k̸ =j̸ =l Eqs. (B46) to (B49) can be combined by relabeling the indices and recognizing that the resulting three-index summation has 3 n 3 = (n − 2) n 2 terms of the form Z p Z q (symmetric across the index pairs p, q ∈ {i, j, k} with p ̸ = q).Hence there are only n 2 unique terms, all of which are repeated n − 2 times: Combining these expressions, we obtain Due to the presence of the four-body term, we will need the conserved quantity associated with M 4 : The four-index sum here can be grouped as we did in Eq. (B44), and the conditions can be simplified as before.Along with the fact that Z 2 i = I, a straightforward calculation reveals i<j k<l Plugging this into Eq.(B53), combined with tr(M 4 ρ) = m 4 and tr(S 2 ρ) = m 2 − n, we arrive at tr(S 4 ρ) = 24 tr where Finally, applying this result to Eqs. (B52), we can compute the variance with respect to an ideal state lying in the symmetry sector: Again making the replacement ρ → ρ, we instead have the following bound: If m = Θ(1), which is the case in our numerical experiments of antiferromagnetic spin systems in Section VII B, then Var ρ [ŝ 2 /s 2 ] = O(1).Recall from Eq. (B43) that Var ρ [ŝ 1 /s 1 ] = O(n/m 2 ).Therefore the variance overhead of estimating these symmetry operators is, asymptotically, when tr(M ρ) = m = Θ(1).Systems wherein m depends on n will require a caseby-case analysis, which we leave to the reader.As a pathological example, consider two different functions which are both Θ( Thus the specific form of m = f (n) can drastically affect the asymptotic bounds here.

Appendix C: Spin-adapted fermionic shadows
It is well known that number-conserving fermion basis rotations which preserve spin symmetries can be block diagonalized according to the spin sectors, leading to savings in both classical and quantum resources.Here we show how to leverage spin symmetries for the broader class of fermionic Gaussian transformations, and in particular we construct a spin-adapted matchgate shadows protocol.As such, this scheme will be informationally complete only over spin-conserving observables.
Let n ↑ and n ↓ be the number of spin-up and spindown fermionic modes, respectively.The total number of modes is n = n ↑ + n ↓ , and we order the labels such that all spin-up modes come first.Gaussian transformations which do not mix between different spin types are block diagonal, where be the spin-adapted group, i.e., the set of all elements of the form of Eq. (C1).In order to calculate properties of this ensemble for classical shadows, we shall use the fact that U Q is nearly equivalent to the tensor product U Q ↑ ⊗ U Q ↓ , up to a factor that depends on the determinant of Q ↑ .More precisely, we have the following.
Lemma 12.For all block-diagonal Q of the form of Eq. (C1), the unitary can be written as where and c µ,σ is the parity operator on the σ-spin sector.
For technical reasons, we have introduced two new sets of Majorana operators {c µ,σ | µ ∈ [2n σ ]} on each n σmode Hilbert space, in order to talk about the different spin sectors in terms of the standard tensor product.Because the tensor product does not respect the antisymmetry of fermions, these new Majorana operators are related to the usual Majorana operators (acting on the full n-mode Hilbert space) via Lemma 12 therefore addresses this technicality of maintaining the anticommutation relations when expressing U Q as a tensor product.
Proof (of Lemma 12).It is clear that conjugation by U Qσ transforms each c µ,σ as desired.What we need to ensure is that the Majorana operators γ µ,σ on the full Hilbert space transform properly.Indeed, Eq. (C2) performs the desired transformation; for the spin-up sector, we simply have For the spin-down sector, we will make use of following commutation relations: Consider the case det(Q ↑ ) = 1.Then s(Q ↑ ) = 0, and so where we have used the fact that P ↓ P † ↓ = P 2 ↓ = I.
In the context of classical shadows, the appearance of P ↓ is inconsequential because it merely acts as a phasing operator which appears directly before measurement.To see this, first observe that M Z = M Z,↑ ⊗ M Z,↓ , where M Z,σ = b∈{0,1} nσ |b⟩⟩⟨⟨b|.Using the fact that P ↓ |b⟩⟩ = P ↓ |b⟩⟨b|P † ↓ = |b⟩⟨b| = |b⟩⟩, we have that that shadow channel of the spin-adapted ensemble is Therefore the spin-adapted matchgate shadows behaves as two independent instances on each spin sector.The estimators and variance bounds also follow straightforwardly; first, the shadow channel is where for ease of notation in this section, we define For the variance, we use the property that the shadow norm of a tensor-product distribution is the product of shadow norms on each subsystem.This can be seen from the fact that shadow norm of an operator A is the spectral norm of a related operator which holds because A G is positive semidefinite.For clarity, in this section we use the notation ∥ • ∥ G for the shadow norm associated with the group G. Thus for any shadow channel formed as a tensor product (This argument generalizes to multiple tensor products.)Within the context of our spin-adapted ensemble, this implies that any spin-respecting Majorana operator has a squared shadow norm of Thus, for Majorana operators of constant degree 2(j + ℓ) ≤ 2k, the variance scales as O(n j ↑ n ℓ ↓ ) = O(n k ), just as in the unadapted setting.Note that spin-respecting here means that the operator factorizes into an even-degree Majorana operator on each spin sector.
The advantage of this ensemble is that the required circuit depth and gate count are roughly halved, since we only need to implement two independent matchgate circuits on n σ qubits each.Furthermore, one can also check that the shadow norm constant factors in the spin-adapted setting are also slightly smaller (for example, for k = 2 and n σ = n/2, the ratio of spinadapted to unadapted shadow norms is asymptotically

Appendix D: Improved compilation of fermionic Gaussian unitaries
In this section we describe a new scheme for compiling the matchgate circuit U Q for arbitrary Q ∈ O(2n), under the Jordan-Wigner mapping.This approach improves upon the circuit depth of prior art [98] by optimizing the parallelization of nearest-neighbor single-and two-qubit gates.We accomplish this by modifying previously established ideas to better respect the mapping of Majorana modes to qubits.Our improved design is implemented in code at our open-source repository [89].

A previous circuit design
First we will review a prior circuit design to encode the action of U Q into a sequence of single-and two-qubit gates, from which it will become clear where there is room for improved parallelization.While the precise scheme that we describe here has not previously appeared in the literature, the high-level ideas follow from a combination of already developed results [98,99,116,117,128,129].
Recall that our convention for the Jordan-Wigner mapping is for p ∈ [n], and our convention for the Gaussian transformation is From this property, a circuit for arbitrary U Q can be constructed by a QR decomposition of Q.Such a decomposition yields a sequence of nearestneighbor Givens rotations, which we then map to singleand adjacent two-qubit gates.
One possible QR decomposition is where each G j is a Givens rotation among adjacent rows and columns, and D is the upper-right triangular matrix from the QR decomposition.Because Q is an orthogonal matrix, D is guaranteed to be a diagonal matrix with ±1 entries along the diagonal.This is equivalent to the Reck et al. [128] design, and the number of Givens rotations is By the homomorphism property of U, this matrix decomposition yields a sequence of circuit elements that implements the desired unitary: Alternatively, the Clements et al. [129] design computes a decomposition of the form6 The total number of Givens rotations here is the same, R + L = O(n 2 ).However, by utilizing rotations that act from both left and right, it optimizes parallelization to reduce the depth by a constant factor (roughly 1/2).
Refs. [98,116,117] showed how to convert these Givens rotations into number-preserving quantum gates; here we seek to generalize to fermionic Gaussian unitaries which do not necessarily conserve particle number.While Ref. [98] also considered this scenario, they maintained the representation of Givens rotations as numberpreserving gates.Their circuit design breaks particlenumber symmetry by interspersing particle-hole transformations throughout the decomposition.
Instead, we will use a representation that inherently features non-number-preserving rotations.Suppose that the Givens rotation G j acts nontrivially on the axes (µ, µ + 1) as The quantum gate which achieves this transformation is a single-or two-qubit Pauli rotation, given by To implement the diagonal matrix D of signs, we require a different scheme.In particular, we can construct U D as a single layer of Pauli gates.Consider the 2 × 2 block along the diagonal which describes the transformation The remaining cases, D 2p = −D 2p+1 , can be handled as follows.First, suppose D 2p = 1.We wish to find the gates which perform the transformation while leaving all other Majorana operators invariant.We can almost accomplish this with X p , since it will map X p to itself and Y p to −Y p .It also commutes with all Majorana operators γ 2q , γ 2q+1 for q < p.However, for q > p this will accrue unwanted signs: To correct these signs, we introduce a Pauli-Z string running in the opposite direction of the Jordan-Wigner convention.That is, define This unitary has the correct action on γ 2p , γ 2p+1 and continues to commute with the Majorana operators γ 2q , γ 2q+1 with q < p.For q > p, however, we now have p .This causes the sign of γ 2p , rather than γ 2p+1 , to flip, while retaining all other properties.
Altogether, we determine these transformations for all 2 × 2 diagonal blocks of D, resulting in n Pauli strings of the form The overall transformation is then simply the product of these Pauli strings, which can be concatenated into a single layer of Pauli gates: Note that the order of this product does not matter, since Pauli gates commute up to an unobservable global phase.

Discussion on suboptimality
Now we observe that, depending on the parity of µ ∈ {0, . . ., 2n − 2}, U Gj is either a single-or two-qubit gate.However, the decomposition of Q described above is implicitly optimized under the assumption that only twoqubit gates are present: each Givens rotation acts on two axes at a time, and it is assumed that this corresponds to physically acting on two wires at a time.This results in underutilized space in the quantum circuit whenever a single-qubit Z rotation occurs, as it leaves a qubit wire needlessly idle.This is true for both the Reck et al. [128] and Clements et al. [129] designs.Ultimately, this suboptimality is due to the fact that Q is a 2n × 2n matrix, so there is a two-to-one correspondence between axes and qubits: the rows/columns labeled by (2p, 2p + 1) correspond to two Majorana operators, both of which are in turn associated with a single qubit p.Note that this discrepancy is not present in circuit designs for the class of number-conserving rotations [98,116,117], which are instead more compactly represented by an n × n unitary matrix already.

Circuit design with improved parallelization
Now we introduce a circuit design which explicitly accounts for this two-to-one correspondence.The basic idea is to generalize the notion of Givens rotations, which act on a two-dimensional subspace to zero out a single matrix element, to a four-dimensional orthogonal transformation which zeroes out blocks of 2 × 2 at a time.Each 4 × 4 orthogonal transformation acts on the axes (2p, 2p + 1, 2p + 2, 2p + 3), which corresponds to qubits p and p + 1.By performing this process according to the scheme of Clements et al. [129] (but now treating each 2 × 2 block of Q as a "single" element), we obtain a decomposition wherein the optimal parallelization of the scheme is fully preserved in terms of interactions between nearest-neighbor qubits.Finally, each 4 × 4 orthogonal transformation is ultimately decomposed into six rotations of the form of Eq. (D8) and a layer of Pauli gates, achieved by the standard decomposition that we described in Appendix D 1 (i.e., by bootstrapping off the prior scheme within blocks of 2n = 4).Note that in principle one may instead implement the 4 × 4 orthogonal transformations using any gate set of one's choice, rather than XX and Z rotations.
We now describe the algorithm in detail.First we compute a decomposition analogous to the Clements et al. [129] design, but instead of Givens rotations, each B k acts nontrivially on a 4 × 4 block.(Note that there is a single 2 × 2 Givens rotation G as well, which serves to zero out a final matrix element that we will elaborate on later.)We accomplish this by treating Q as an n × n matrix of 2 × 2 blocks, for each p, q ∈ [n].Just as Givens rotations are chosen to zero a specific matrix element, each B k acts to zero out a particular 2 × 2 block Q p,q .Suppose we want to find a B R+i which acts from the left (i = 1, . . ., L) to zero out the block Q p,q .Then we perform a QR decomposition on the 4 × 2 submatrix which includes the target block and the block directly above it: hence zeroing out the lower block as desired.Here, 4) is computed from the QR decomposition, and so the orthogonal matrix B R+i ∈ O(2n) appearing in Eq. (D21) is defined as B ′ R+i along the axes (2p − 2, 2p − 1, 2p, 2p + 1) and the identity elsewhere.
Similarly, if we want a B j which acts from the right (j = 1, . . ., R), then we consider instead a 2×4 submatrix with the target block on the left: This can be zeroed out by performing an LQ decomposition (which is essentially just the transpose of the QR decomposition).For notation in this section, let tildes denote the flipping of rows in a matrix, for example Then performing an LQ decomposition on the rowflipped version of Eq. (D24), we have Qp,q Qp,q+1 = * 0 0 0 * * 0 0 B ′ j = 0 0 0 * 0 0 * * B′ j . (D26) Flipping the rows back to normal on the lefthand side, we get as desired.Then we define B j ∈ O(2n) acting as B′ j on the axes (2q, 2q + 1, 2q + 2, 2q + 3) and trivially elsewhere.Now we address the need for the sole Givens rotation G appearing in Eq. (D21).As the zeroing-out procedure described above progresses, the nonzero blocks get "pushed" towards the diagonal until the final matrix is (2 × 2)-block diagonal.These nonzero blocks must be triangular because they are produced by QR/LQ decompositions; but since Q is orthogonal, this implies that the final triangular blocks along the diagonal must be diagonal themselves.The exception to this is either the leftmost or rightmost block, depending on whether n is even or odd.This is because the decomposition procedure inevitably leaves one of those blocks untouched, so it was never made triangular/diagonal.This can be visualized as follows: if n is odd, then we have We use boldface to clarify which matrix elements are newly zeroed at each step.The condition that Q is an orthogonal matrix implies the final equality.It also enforces the remaining 2×2 block to be orthogonal, so that we can diagonalize it by computing the appropriate Givens rotation acting on axes (2n − 2, 2n − 1).This elucidates the appearance of G in Eq. (D21).On the other hand, if n is even, then the top-left block remains instead:  * * 0 0 0 0 0 0 * * 0 0 0 0 0 0 0 0 ±1 0 0 0 0 0 0 0 0 ±1 0 0 0 0 0 0 0 0 ±1 0 0 0 0 0 0 0 0 ±1 0 0 0 0 0 0 0 0 ±1 0 0 0 0 0 0 0 0 ±1 In this case, G needs to act on axes (0, 1).Thus we have obtained the decomposition of Eq. (D21) as desired.The implementation of each component then follows from bootstrapping the prior techniques: the diagonal matrix D becomes a layer of Pauli gates, described by Eq. (D20); and the four-dimensional orthogonal transformations B j are further decomposed into Givens rotations, described in Sec.D 1 (wherein n = 2).
In Figure 10 we demonstrate the circuit depth and gate count of this new design.For each n, we run our algorithm on a randomly generated element of O(2n).We further compile the circuits to a gate set native to superconducting-qubit platforms, consisting of arbitrary single-qubit rotations and nearest-neighbor twoqubit √ iSWAP gates (described in Appendix F).For comparison, we also compile the same unitary according to the old design described in Appendix D 1 and the algorithm of Jiang et al. [98], which is implemented within the open-source library OpenFermion [131].This latter design also uses a Givens-rotation decomposition, but rather than the O(2n) Majorana representation it employs particle-hole transformations on the ladder operators to incorporate non-particle-conserving operations.We also make polynomial fits, demonstrating the resource savings of our design.We infer asymptotic reductions in the circuit depth and gate count by about 1/3 and 1/2, respectively.Especially for near-term quan-tum computers, such savings provide significant improvements to overall performance.

Appendix E: Classical shadows postprocessing details
In this section we provide details for the classical postprocesisng of local observable estimators from classical shadows.We include this for a self-contained and explicit presentation, and also to address the modified shadows protocols (subsystem symmetrization and spin adaptation) introduced in this paper.These algorithms are implemented at our open-source repository [89].

Matchgate shadows
For any orthogonal matrix Q ∈ O(2n), Ref. [67] derived formulas involving the multiplication of 2n × 2n matrices and the computation of Pfaffians of 2k × 2k submatrices for estimating k-body Majorana observables.However when restricting Q ∈ B(2n), there exists a significantly cheaper method that does not involve such numerical linear algebra routines.This algorithm was implicitly described in Ref. [66], but not explicitly outlined.We do so here; for k = O( to return estimates for all 2j-degree Majorana operators, 1 ≤ j ≤ k, from T samples.Note that the number of operators is O(n 2k ), so our approach has significant savings over a naive iteration.Furthermore, it largely involves integer storage and manipulations rather than floatingpoint operations.
The estimator for tr(Γ µ ρ) can be written as tr Q can be expanded in terms of subdeterminants of Q according to Eq. ( 59).However, a simplified derivation is possible here by using the fact that Q implements a signed permutation: Hence for operators of degree 2j, We would like to retain the ordering of indices when working with the multidegree Majorana operators; therefore we introduce a further a permutation as π−1 (µ i ), which is defined to satisfy π−1 (µ 1 ) < This incurs another sign factor (−1) p where p ∈ {0, 1} is the parity of the permutation which sends π −1 (µ) → π−1 (µ).Collecting all signs as sign Q The matrix element ⟨b|Γ π−1 (µ) |b⟩ is nonzero if and only if π−1 (µ) ∈ D 2n,2j , from which its value of ±1 is straightforward to determine (e.g., by mapping to Pauli-Z operators).In total, evaluating Eq. (E3) takes time O(n 2 + j 2 + j), corresponding respectively to the inversion of π ∈ Sym(2n), the calculation of π−1 (µ) and its parity on 2j indices, and evaluating the product of 2j + 1 signs and ⟨b|Γ π−1 (µ) |b⟩, the latter requiring only checking 2j indices and j bits of b.Assuming j ≤ k = O(1), this implies a computational complexity of O(n 2 ) per operator per sample.
We can speed up the computation over all operators per sample to O(n j ) by using the fact that many ⟨b|Γ π−1 (µ) |b⟩ vanish.That is, rather than compute π−1 (µ) for all µ ∈ C 2n,2j and checking whether each is an element of D 2n,2j , we work backwards by looping over all target elements τ ∈ D 2n,2j and computing π(τ ) to find its preimage.As before, let π(τ ) ∈ C 2n,2j be the reordering of π(τ ) with associated sign (−1) p .Then for each τ ∈ D 2n,2j , we compute the estimator for Γ π(τ ) , where the cumulative sign is sign Q (π(τ )) = (−1) p s τ1 • • • s τ2j .All other Majorana operators not in the preimage are implicitly assigned an estimate of 0. Hence we only iterate over the n j = O(n j ) elements of D 2n,2j , with each evaluation of Eq. (E4) taking O(j 2 ) = O(1) time.Note that this approach also avoids the need to find the inverse permutation π −1 .
Performing this procedure over all T samples results in a time complexity of O(n j T ) ≤ O(n k T ), running over all j ∈ {1, . . ., k}.We can also include an additive O(n 2k ) cost to preallocate storage for j≤k C 2n,2j .While not strictly necessary, this is convenient in practice, and besides when T = Õ(n k ϵ −2 ) the total complexity is Õ(n 2k ϵ −2 ) whether or not we preallocate memory.
For the spin-adapted shadows, because the protocol factorizes across the spin sectors, we perform this algorithm on each sector independently.The estimator for operators of the form Γ µ ⊗ Γ ν is then the product of the independent estimates.Note that if either |µ| or |ν| are odd, then the estimator always vanishes; this reflects the fact that the spin-adapted ensemble is not informationally complete over such operators.

Pauli shadows
Because single-qubit measurements factorize, we consider each qubit i ∈ [n] independently.Given the random Clifford C i ∈ Cl(1) and measurement outcome b i ∈ {0, 1}, the estimator for σ The product over i ∈ [n] then estimates P = i∈[n] σ (i)  for the full n-qubit system.This suffices to estimate any k-local observable, which can be decomposed into a linear combination of polynomially many (≤ k)-local Pauli operators.The total time complexity of estimating all k-local Pauli operators with T snapshots is O(n k T ).The algorithm is as follows.For each W = (W 0 , . . ., W n−1 ), we take, for each j ≤ k, all n j combinations W i1 , . . ., W ij and compute Eq. (E6) for each σ (i) = ±W i .We assign the result as an estimate for the j-local operator , and implicitly assign 0 to all other Pauli operators.Note that there are a total of j≤k 3 j n j = O(n k ) local Pauli operators, so preallocating storage here is asymptotically negligible.
For the subsystem-symmetrized protocol, the n-qubit estimator now takes the form Using the fact that S † π |b⟩ = i∈ [n] |π(b i )⟩, we can simply apply the standard scheme described above, but with the replacement b i → π(b i ).For each sample this is only an additive O(n) cost.

Appendix F: Additional details on numerical experiments 1. Readout noise models
In Secs.VII A 1 and VII B 1, we demonstrated our mitigation strategy under single-qubit readout errors.The noise channels occur immediately before measurement and are implemented probabilistically: independently and identically (i.i.d.) on each qubit per circuit repetition.We consider depolarizing, amplitudedamping, and bit-flip errors occurring with probability p, which are respectively These models obey Assumptions 1, although we comment that more complicated noise channels can also satisfy the assumptions, such as non-i.i.d.errors, correlated multiqubit errors, and even coherent gate errors [81].

QVM gate set and noise model
The noise model we implement on the Cirq Quantum Virtual Machine is based on the Google Sycamore processor "Rainbow," a 2D grid of 23 superconducting qubits.We use the calibration data obtained from November 16, 2021, which can be found in the Cirq open-source repository [96].The native gate set that we compile our circuits to include single-qubit rotations in the form of phased XZ gates, PhXZ(x, z, a) This describes a rotation by πx about an axis determined by the parameter a within the xy plane, followed by a phasing of πz.The native two-qubit gates that we use are  The QVM noise model that we simulate is not fully comprehensive of all types of errors occurring in an actual device, however it captures the most dominant error sources in the superconducting platform [95].It consists of four categories: 1. Readout errors are modeled as asymmetric bit-flip channels on each qubit.The asymmetry reflects the fact that the probability of a |1⟩ outcome being erroneously measured as |0⟩ is generally higher than misreading a |0⟩ outcome.Although the errors are modeled as single-qubit channels, the calibration data is taken from parallel experiments, to potentially account for effects such as readout crosstalk and other unintended interactions between qubits.
2. Decay (T 1 ) and dephasing (T 2 ) errors occur whenever a qubit idles during a moment (layer) of a circuit.Both T 1 and T 2 relaxations are incorporated into a single channel, E idle (ρ) = 1 − ρ 11 e −t/T1 ρ 01 e −t/T2 ρ 10 e −t/T2 ρ 11 e −t/T1 .(F6) The decay time T 1 is characterized by a simple experiment that prepares |1⟩ and measures the survival probability as a function of t.This experiment is performed in isolation, i.e., one qubit at a time while all other qubits on the chip idle.
The T 2 time is determined from the equation where 1/T ϕ is the pure dephasing rate that can in principle be measured by Ramsey interferometry.For simplicity, however, this noise model instead approximates T ϕ from the total single-qubit incoherent error ϵ inc , which is determined by purity benchmarking [132,133] performed in isolation.To leading order, T ϕ is approximated using the relation The time t which appears in the model channel E idle is the longest gate duration occurring within that moment: PhXZ gates have a duration of 25 ns, while √ iSWAP gates take 32 ns.
3. Single-qubit gate errors are modeled as depolarizing channels occurring after each gate.The depolarizing rate is set to match the total single-qubit Pauli error, which is measured from the device via randomized benchmarking (RB) [134,135] in isolation.which is a native, tunable interaction on the superconducting platform.The √ iSWAP gate is the instance (θ, ϕ) = (− π 4 , 0).Coherent errors are thus modeled as an overrotation by (δθ, δϕ), which are determined for each pair of connected qubits by fitting to cross-entropy benchmarking (XEB) data using random cycles of gates across the chip [32,136,137].
After the coherent overrotation, an incoherent error follows, modeled as a two-qubit depolarizing channel.The depolarizing rate r (i,j) dep for each pair (i, j) of connected qubits is inferred as follows: from the total XEB Pauli error r (i,j) XEB , we subtract off the single-qubit incoherent error rates r Further details of the noise model, its numerical implementation, and the calibration-data acquisition are described in Ref. [95], as well as in the Cirq repository [96].For completeness, in Figure 11 we display a series of plots which show the chip connectivity and numerical values of the calibration data used for the various errors described above.

Compiling circuits to the native gate set
Single-qubit rotations are compiled into PhXZ gates according to an Euler-angle decomposition.Two-qubit unitaries are compiled into at most three √ iSWAP gates (interleaved with single-qubit rotations) by a KAK decomposition, although most two-qubit unitaries (79% with respect to the Haar measure) can be implemented with just two √ iSWAP gates [138].After compiling the entire circuit into this gate set, single-qubit rotations are concatenated into a single PhXZ gate whenever possible.All operations besides readout are pushed as early into the circuit as possible.
One exception we make is in the random permutation circuits S π appearing in the group Cl(1) ⊗n Sym (for subsystem-symmetrized Pauli shadows).First, we decompose π into an parallelized network of adjacent transpositions using an odd-even sorting algorithm [97].Each transposition i ↔ j corresponds to a SWAP gate between qubits i and j.However, rather than compile SWAP to the gate set directly (which would require three √ iSWAP gates and four layers of PhXZ ⊗2 gates), we instead implement the unitary iSWAP = which uses only two √ iSWAP gates and no single-qubit gates.The iSWAP gate differs from SWAP only by a phasing of i on the basis states |01⟩ and |10⟩.Such a replacement is valid because S π occurs only at the end of the circuit, immediately before readout.Thus while this phasing is technically unwanted, it has no observable effect on the measurement outcomes.
Finally, we note that the Trotter circuits for our Fermi-Hubbard simulations are optimized for the Sycamore architecture according to Ref. [35], which we follow closely.
In particular, open-source code for their implementation can be found in Ref. [115].

Qubit assignment averaging
Our eight-qubit numerical experiments on the QVM utilize the 2 × 4 grid spanning from qubits (5, 1) to (6, 4) (see Figure 11).To map these qubits to the simulated degrees of freedom (fermion modes or spin-1/2 particles), we employ qubit assignment averaging (QAA), which was introduced in Ref. [35] in order to handle the issue of inhomogeneous error rates across a noisy quantum device.QAA works by identifying N different assignments of the n qubits and allocating T /N of the experimental repetitions to each realization.Properties are estimated by averaging over all T samples as usual.In principle, one can use a combination of shifting, rotating, and flipping the qubits throughout the chip; for our simulations, we vary qubit assignments within the same fixed 2 × 4 grid.
For the Fermi-Hubbard model, we assign a spin sector to each of the parallel 1×4 qubit chains.We average over N = 4 different qubit assignments, defined by setting either the top or bottom chain as the spin-up chain, and ordering the four site labels starting either from the left or the right.
While QAA aims to reduce device inhomogeneities, it cannot lower the total amount of circuit noise.Thus QAA does not necessarily improve prediction accuracy with the unmitigated (standard shadow) estimators.Instead, homogenizing the noise appears to massage it into an effective form which approximately satisfies Assumptions 1 better than a single fixed configuration.We substantiate this claim with Figure 12, using spin-spin correlations of the XXZ model (R = 4 Trotter steps) as a demonstrative example.We see that the unmitigated errors are virtually identical whether or not we perform QAA.On the other hand, the symmetry-adjusted estimates with QAA exhibits a more uniform error profile and overall improved noise suppression.Further investigation into this behavior is left as an open problem.

Bootstrapping uncertainty bars
To estimate uncertainty bars, we employ empirical bootstrapping [112], modified by batching together samples.First we summarize the original method: given T classical-shadow snapshots, one resamples that data T times with replacement.Then, averages ôj (T ) (being either the unmitigated or mitigated estimators) are computed from that resampled data, yielding one bootstrap sample.Repeating this B times and computing the standard deviation among those B bootstrap samples yields the uncertainty bar.
Due to the size T ∼ 10 6 -10 7 from our simulations and limitations on classical compute resources, we perform bootstrapping on batches of snapshots.Split the T samples into K batches (each containing T /K samples) and compute ô(k) j (T /K) for each batch k = 1, . . ., K. Because these estimates obey ôj (T ) = (1/K) K k=1 ô(k) j (T /K), we resample the K batches (rather than all T shots) to bootstrap uncertainty bars for ôj (T ).Depending on T , we set K ∼ 10 2 -10 3 , and for all cases we take B = 200.Here we provide an estimate of how much the QVM noise model violates Assumptions 1.We quantify this by computing a lower bound on the minimal observable error achievable by symmetry-adjusted classical shadows.
Let U prep be the state-preparation circuit and U g a random measurement circuit.For Schur's lemma to hold (Assumptions 1), we require that the entire noisy circuit take the form EU g U prep , where E is the both time-and g-independent.While the noise model that we simulate is indeed time stationary and Markovian, the effective error channel E = E g depends on g. (This can be seen, for example, by commuting all the individual gate-level errors throughout U g and U prep to the end of the circuit.) In order to study this dependence on g, consider the decomposition where E 0 is defined to be independent of g ∈ G.Although somewhat of an artificial decomposition, this is always mathematically possible with both E 0 and ∆ g completely positive; indeed, a trivial choice is E 0 = 0. Our goal is to find the "largest" (in some sense) valid solution for E 0 .The remaining contribution ∆ g will then represent the minimal amount of assumption-violating noise in the model that our rigorous theory currently has no guarantees for.From the decomposition above, the noisy measurement channel can be written as where is diagonal in the irreps of G, while the form of ∆ := E g∼G U † g M Z ∆ g U g is unknown.Applying M −1 and taking expectation values for the observables {O j } L j=1 yields (assuming each O j ∈ V λ ) The terms δ j := ⟨⟨O j |M −1 ∆|ρ⟩⟩ describe the deviation of observable estimates due to violations of the noise assumptions, which is precisely what we wish to quantify.For notation, denote the noisy expectations by y j := ⟨⟨O j |M −1 M|ρ⟩⟩ and noiseless expectations by We report the number of single-qubit (PhXZ) and two-qubit ( √ iSWAP) gates used, as well as the overall compiled circuit depth.Uncertainty bars are one standard deviation variations in the random measurement circuits.
x j := ⟨⟨O j |ρ⟩⟩.We collect these quantities into vectors of length L and define the diagonal matrix A ∈ R L×L with eigenvalues f λ (E 0 )/f λ (in the appropriate positions corresponding to the irreps).This yields in the linear relationship This equation is underconstrained, so we opt for an estimate of δ by bounding its norm from below.Namely, let Â be a diagonal matrix of free parameters 0 ≤ ξ λ ≤ 1, which we optimize by nonnegative least-squares (NNLS) minimization: Define δ := y − Âx as the solution to this problem.In this sense, δ represents an error floor beyond which our theory for symmetry adjustment cannot mitigate due to inherent violations of Assumptions 1.
In Figure 13 we plot the root mean square of δ, which quantifies the average additive error of the estimates.The observables we choose constitute local op-erators depending on the type of system simulated.For fermions, we consider one-and two-body Majorana operators that respect the spin adaptation.For qubits, we take strictly two-body Pauli operators.Uncertainty bars are bootstrapped as described in Sec.F 5, where each bootstrap sample is obtained from the NNLS solution of the resampled data.We also show data for the Trotter circuit size: the number of single-and two-qubit gates after compiling to the native gate set, as well as the circuit depth.Uncertainty bars here are given by one standard deviation in the size fluctuations due to the random unitaries U g .
Overall, we assess that there is an error floor on the order of 10 −2 per observable (recall that the observables have unit spectral norm).Interestingly, this lower bound appears roughly independent of circuit size (within uncertainty bars), perhaps indicating a saturation of the g-dependent contributions after a certain circuit size.In practice however, we have observed that symmetryadjusted classical shadows only achieve mitigated errors on the order of 10 −1 at the deepest circuits.We leave a closer analysis of this behavior, and whether this lower bound can actually be achieved, to future work.

FIG. 6 .
FIG.6.Estimation of spin-spin correlations ⟨S0 • Si⟩ in the ground state of a 48-spin XXZ chain (m = 0), fixing the first spin.We average over T = 10 6 samples collected by subsystem-symmetrized Pauli shadows.The noise model is single-qubit readout errors with bit-flip probability p = 5%.

Lemma 9 .
) which contains |B k | = 3 k n k elements.Let G = Cl(1) ⊗n Sym .The orbit G • P of any P ∈ B k is equal to ±B k , i.e., the set of all signed k-local Pauli operators.Proof.Let the nontrivial support of P be I ⊆ [n], |I| = k.For each (normalized) Pauli matrix acting on subsystem I, its orbit by all single-qubit Clifford gates is ±{X, Y, Z}/ √ 2. Meanwhile, the trivial factors I/ √ 2 acting on [n] \ I are invariant to any unitary transformation.Therefore Cl(1) ⊗n • P is the set of all normalized Pauli operators acting nontrivially only on the qubits in I (with both signs ±1).

4 .
Two-qubit gate errors are modeled with both coherent and incoherent components.The coherent contribution uses the fact that √ iSWAP is an instance of the general fermionic simulation (fSim) inc (determined from RB), as well as the average entangling error rate r (i,j) ent , which are calculated using the coherent errors δθ, δϕ.The model's two-qubit depolarizing rate is then set to account for the remaining amount of error: nature of XEB, both two-qubit error sources are characterized by parallel experimental data.

6 .
Estimating the gate dependence of the QVM noise model

FIG. 13 .
FIG.13.Estimated deviation of the QVM noise from a gate-independent model.(Top) Lower bound on the root-mean-square deviation of one-and two-body Majorana/Pauli expectation values as the number of Trotter steps (noisy circuit depth) grows.The estimated vector of deviations δ is described in Eq. (F17).Uncertainty bars are calculated by empirical bootstrapping.(Bottom) Metrics for the size of the full circuit (state preparation via Trotterization and the random measurement unitary).We report the number of single-qubit (PhXZ) and two-qubit ( √ iSWAP) gates used, as well as the overall compiled circuit depth.Uncertainty bars are one standard deviation variations in the random measurement circuits.

B. Subsystem-symmetrized Pauli shadows
While random Pauli measurements are efficient for predicting local qubit observables, the irreducible structure of the local Clifford group Cl(1) ⊗n is difficult to reconcile with common symmetries under symmetry adjustment, such as the U(1) symmetry generated by M = i∈[n] Z i .To remedy this issue, we modify the protocol by what we call subsystem symmetrization: define the group Cl(1) ⊗n (33):= Sym(n) × Cl(1) ⊗n ,(33)which has the unitary representation U (π,C) = S π C where S π permutes the qubits according to π ∈ Sym(n) and C ∈ Cl(1) ⊗n .The circuit for S π can be Estimation error of the fermionic 2-RDM reconstructed from matchgate shadows.Error is quantified by the spectralnorm difference between the estimated and true 2-RDM.We simulate the protocol on a collection of 20 random Slater determinants on n = 8 modes with η = 2 fermions; faint dashes are results for individual states, markers indicate the median error across the 20 states.(Top) Scaling of estimation error with the total number of samples T , fixing the noise rate to p = 0.2 for all noise models.(Bottom) Scaling of the estimation error with the noise rate p, fixing the total number of samples to T = 10 6 .
1), it runs in time O(n k T )