Symmetry-resolved entanglement detection using partial transpose moments

We propose an ordered set of experimentally accessible conditions for detecting entanglement in mixed states. The $k$-th condition involves comparing moments of the partially transposed density operator up to order $k$. Remarkably, the union of all moment inequalities reproduces the Peres-Horodecki criterion for detecting entanglement. Our empirical studies highlight that the first four conditions already detect mixed state entanglement reliably in a variety of quantum architectures. Exploiting symmetries can help to further improve their detection capabilities. We also show how to estimate moment inequalities based on local random measurements of single state copies (classical shadows) and derive statistically sound confidence intervals as a function of the number of performed measurements. Our analysis includes the experimentally relevant situation of drifting sources, i.e. non-identical, but independent, state copies.


I. INTRODUCTION
In the past years, considerable effort led to the building of larger and larger Noisy Intermediate-Scale Quantum (NISQ) devices [1][2][3]. For the benchmarking of such devices comes the need for more scalable tools in order to characterize the underlying many-body quantum state (see e.g. [4] and references therein). For instance, characterizing the entanglement properties of these quantum states is, besides the intrinsic theoretical interest, essential to gauge the performance and verify the proper working of the NISQ devices.
As a first prominent example among the tools to characterize entanglement, there is the concept of entanglement witness [5]. An entanglement witness is a functional of the quantum density matrix that separates a specific entangled state from the set of all separable states [6]. By contrast, in this work, we shall focus on a superset of the set of separable states: the set of states with positive partial transpose. In other words, we will focus on sufficient conditions for entanglement (equivalently, necessary conditions for separability).
From the numerous theoretical sufficient conditions for entanglement that have been developed in the literature, many cannot be straightforwardly implemented experimentally, mainly because they require the (exponentially expensive) knowledge of the full density matrix [7][8][9]. This is for instance the case of the celebrated PPT condition [10], which states that a separable state ρ always has a positive semi-definite (PSD) partial transpose (PT) * These authors contributed equally. ρ Γ for any bipartite splitting of its subsystems. Thus, if ρ Γ has (at least) a single negative eigenvalue, then ρ is entangled. The negativity, which resulted from this condition, is a highly used entanglement measure for mixed states [11,12].
This powerful entanglement condition, which found many applications in theoretical works [13][14][15][16][17][18][19][20], is difficult to apply in experimental conditions as the PT spectrum is difficult to access. To overcome this challenge, it was shown in Ref. [21] that valuable information about the PT spectrum can be obtained from a few PT moments tr(ρ Γ ) k only. Using the first three PT moments, an entanglement condition, called p 3 -PPT, was proposed and shown to be useful for detecting entangled states in several different contexts. Moments tr(ρ Γ ) k have the advantage that they can be estimated using shadow tomography [21] in a more efficient way than if one had to reconstruct ρ via full quantum state tomography. As in other randomized measurements protocols probing entanglement [21][22][23][24][25][26][27][28][29][30][31], the classical shadows formalism only requires (randomized) single-qubit measurements in experiments realizing the single-copy state ρ. In this paper, we follow this idea of using PT moments to build experimentally computable entanglement conditions, and extend the p 3 -PPT condition in two directions.
On the one hand, we propose different entanglement detection strategies depending on how many PT moments can be estimated. Starting from the third order moment, we show that the estimation of each higher order moment gives access to an independent entanglement condition. Interestingly, if all the PT moments can be estimated, this set of conditions is then necessary and sufficient for the state to be PPT (i.e. to have a positive semi-definite partial transpose). Of course, the higher the moment, the larger the number of experimental runs needed. In case higher order moments cannot be accessed, we show how to obtain an optimal entanglement condition using PT moments of order up to three.
On the other hand, we investigate the effect of symmetries on this entanglement detection method. As shown in Ref. [32] for the case of dynamical purification, taking symmetries into account to define symmetry-resolved (SR) versions of the tools usually used to characterize quantum states can enable a finer characterization of some quantum features and even reveal phenomena that cannot be observed without symmetry-resolution. For states preserving an extensive quantity, we define SR versions of the PT-moment inequalities mentioned previously and show that these are indeed better suited to detect the entanglement of such states. Furthermore, we also show that these SR inequalities provide a sufficient entanglement condition for states that do not possess any symmetry.
The conditions derived here are particularly interesting from an experimental (and numerical) point of view, as low moments of (partially transposed) density operators are accessible. We show how source drifts in an experiment can be taken into account and how the quantities which are of interest here can be accurately estimated via local measurements on single copies of the state.
The paper is structured as follows. In Sec. II, we summarize our results. Our methods to obtain entanglement conditions from PT-moments are presented in Sec. III. In Sec. IV we study the effect of symmetries and show how to obtain a SR version of these PT-moment inequalities. In Sec. V, we apply these inequalities to a variety of physical systems and compare their efficiency for detecting entanglement. Finally, we conclude and give some outlook in Sec. VI.

II. DEFINITIONS AND SUMMARY OF RESULTS
In this section, we introduce the basic definitions needed for the entanglement detection criteria below, and summarize in a succinct manner our main results. Given a bipartite state ρ = ρ AB , we denote by ρ Γ its partial transpose with respect to subsystem B. We say that ρ is PPT if ρ Γ is positive semi-definite, and NPT otherwise. All NPT states are entangled, however there are entangled states, known as bound-entangled states, that are not NPT. We focus here on the detection of NPT entangled states.
We denote the k-th order moment of a matrix M by We will mostly consider moments of ρ Γ , and sometimes use the short-hand notation p k ≡ p k (ρ Γ ). In the presence of symmetries, the partial transpose can be cast in block diagonal form: we denote as ρ Γ (q) the resulting blocks, where q indicates a quantum number, and define the corresponding moments p k (ρ Γ (q) ) as from Eq. (1). We start by recalling the p 3 -PPT condition of Ref. [21], i.e. that any PPT state satisfies p 3 (ρ Γ )p 1 (ρ Γ ) (p 2 (ρ Γ )) 2 . (2) Any state violating this condition is NPT and therefore entangled. The p 3 -PPT condition will serve as a reference point below in accessing the predictive power of the new relations.
In the following, we will establish several sets of necessary (and sometimes also sufficient) PPT conditions, summarized as follows: i) the first set of conditions, that we dub D n conditions, also contains polynomial inequalities in the moments p k of order up to k n. The first non-trivial such a condition is D 3 , and reads: Knowing only the first three moments p 1 (ρ Γ ), p 2 (ρ Γ ) and p 3 (ρ Γ ), this condition is optimal for detecting entanglement if 1/2 p 2 (ρ Γ ) 1. Knowing moments of order up to the dimension of ρ Γ , the set of D n conditions becomes a necessary and sufficient condition for NPT entanglement; ii) the second set of conditions, dubbed Stieltjes n , involves inequalities among the moments p k of order up to n. The condition Stieltjes 3 is equivalent to p 3 -PPT, while Stieltjes 5 reads: and similar conditions are obtained including higher moments; iii) in case high-order moments are difficult or too expensive to access, we also show how to obtain an optimized, necessary condition for PPT using only PT moments of order up to three. We call this condition D opt 3 ; iv) all of the above conditions can be cast in terms of ρ Γ (q) , in which case we add the prefix SR (for symmetryresolved). For instance, the SR-p 3 -PPT condition for sector q reads Since these conditions are sensitive to the presence of negative eigenvalues in a specific symmetry sector, they are typically much more sensitive than their non-SR counterparts, as illustrated in Fig. 1.
In the SR case, it is worth mentioning that also the SR-D 2 condition, is non-trivial; 1. An illustration of the proposed method for entanglement detection. We assume the experimentally relevant situation of a source producing non-identical but independent copies {ρ1, . . . , ρN } ("drift"). Randomly-chosen unitaries Ui are applied to the qubits of each copy and then measured in the standard basis. Using classical shadows [28], these measurement outcomes are post-processed to obtain the moments pj = tr(ρ Γ avg ) j . As explained in the main text, we combine those moments to derive inequalities whose violation implies that the state ρavg is NPT, thus showing that at least one of the states ρ k produced by the source is entangled. We also show how symmetry-resolution techniques can be used to enhance the entanglement detection capabilities.
v) we show how SR conditions can, in fact, be applied to arbitrary states, via application of a proper transformation on the density matrix of interest. In practice, this transformation is effectively carried out in the postprocessing step of the classical shadows; vi) as illustrated in Fig. 1, the uncertainty in estimating the moments can be bounded, in principle, using the classical shadows formalism. Here, we show how to combine those bounds to provide rigorous confidence intervals for SR-D 2 , which considerably strengthen the impact of our results in real experiments.

III. ENTANGLEMENT DETECTION FROM PARTIAL TRANSPOSE MOMENTS
In this section, we present entanglement conditions based on PT moments. We extend the idea behind the p 3 -PPT condition (c.f. Eq. (2)) of Ref. [21] in two directions. On the one hand, we present a set of inequalities involving all the PT moments which provides a necessary and sufficient condition for the underlying state to be PPT. In addition, each condition of this set is itself a necessary PPT condition. On the other hand, we show how to optimize such entanglement conditions when only few low-order PT moments are accessible.
The idea behind this set of conditions is to use Descartes' rule of signs on the characteristic polynomial of a Hermitian matrix to obtain a set of moment inequalities that has to be satisfied by any PSD matrix. Applied to the partially transposed matrix ρ Γ , such conditions can then be used to detect the entanglement of NPT quantum states. More precisely, using the definition of the elementary symmetric polynomials on d variables, for i = 1, . . . , d, and e 0 (x 1 , . . . , x d ) = 1, we derive in Appendix A the following lemma.  (7)).

Lemma 1. A Hermitian matrix
Using Newton's identities, which relate the elementary symmetric polynomials, e k , in the eigenvalues of A to the moments of A through the recursive formula each inequality e i 0 can be transformed into an inequality involving moments of A of order up to i. We denote by D i these moments inequalities. As an illustration, the conditions D 1 to D 4 read respectively. One has p 1 (ρ Γ ) = 1 for any quantum state ρ, implying that D 1 is trivially satisfied. Similarly, since p 2 (ρ Γ ) is equal to p 2 (ρ) (i.e., to the purity of ρ) for any quantum state ρ, the inequality D 2 is also trivially satisfied. Therefore, when ρ is a quantum state, the first non-trivial inequality for ρ Γ is D 3 . As will be shown in the next section, it is sometimes more efficient (in order to detect entanglement) to apply these inequalities to projections of ρ Γ onto specific subspaces, rather than to ρ Γ itself. We would like to stress here that, in that case, the argument above does not hold, so that the inequality D 2 is not trivially satisfied and can already reveal the presence of entanglement (see Sec. IV). When applied to ρ Γ , Lemma 1 and Newton's identities (8) can thus be used to detect NPT entangled states from PT moments only. From an experimental point of view, this is an important aspect of this entanglement detection scheme, as PT moments are experimentally more affordable to estimate than, for instance, the whole spectrum of ρ Γ . As PT moments are more expensive to be estimated the higher the order, these inequalities should be considered starting from those involving the lowest moment orders. Even though showing that a state is NPT with this method can in principle require the knowledge of all the PT moments, we will provide many experimentally relevant instances where entanglement can be effectively detected from low-order moments even in the presence of errors. To this end, we provide confidence intervals for the quantities of interest granting that a certain inequality is violated with high probability (see Theorem 1 and, e.g, Appendix D).
Similarly, let us mention here that necessary and sufficient conditions for a matrix to be PSD can be expressed as different sets of polynomial inequalities in its moments. One of such sets can be deduced from the well-known (truncated) Stieltjes moment problem (see Appendix B). In Sec. V, we illustrate the usefulness of these inequalities by applying them to the entanglement detection of the ground state of the XXZ model (c.f. Fig 5). Let us finally also mention that, from a few moments of a Hermitian matrix, one can also obtain bounds on the distance between this matrix and the PSD cone [33].

A. Optimized condition for low-order moments
Due to (experimental) constraints, it might not be possible to determine all, but only a few, PT moments. This is why, we show here how to optimize necessary PPT conditions using only PT moments of order up to three. From the previous sections, we already have two examples of such conditions, namely the p 3 -PPT and D 3 conditions. As illustrated in Fig. 2, the p 3 -PPT (D 3 ) condition is tighter than D 3 (p 3 -PPT) for states with purity larger (smaller) than 1/2, respectively. As the low-order moments are easier to access experimentally, we now address the question about the optimal inequality involving PT moments of order up to three.
To answer this question, we use the following approach. For fixed values p 1 and p 2 of the first two moments, we determine the minimal value p min 3 that the third moment can reach for any PSD matrix [34]. From this bound, we know that any Hermitian matrix with a smaller third moment is necessarily not PSD. Naturally, we restirc ourselves to values of p 1 and p 2 which are compatible with a PSD matrix, and therefore satisfy Eqs. (9) and (10)  (thick black curve) conditions as a function of the second moment p2 for a normalized Hermitian matrix. According to the p3-PPT condition, any state ρ with a value of p3(ρ Γ ) below the dashed orange curve is entangled.
Similarly, the condition D3 shows that any state ρ with a value of p3(ρ Γ ) below the dashed green line is entangled. From this plot, it is clear that, for p2(ρ Γ ) > 1/2, all entangled states detected by the p3-PPT condition are also detected by D3, which coincides with D opt 3 in this case. When p2(ρ Γ ) < 1/2, the p3-PPT condition is then stronger than D3, and D opt 3 represents a slight improvement over the p3-PPT condition. As illustrated in Sec. V, this slight improvement can nevertheless be important for the detection of physically relevant states.
ues λ 1 , . . . , λ r , for some r ∈ [1, d], this optimization can be performed with the help of Lagrange multipliers. As shown in Appendix C, this leads to solutions with only two distinct eigenvalues λ a , λ b with multiplicity r a , r−r a , respectively, for r a ∈ [1, r]. Assuming, without loss of generality, that λ a λ b , the optimization of p 3 leads then to r a = r − 1. For each value of r, the optimal value of p 3 can be easily determined in the interval [1/r, 1/(r − 1)]. For r = 2 this leads to D 3 whereas for r > 2 one obtains an optimal value of p 3 which is slightly better than p 3 -PPT. Observe that p min 3 (p 2 ) is a piece-wise function and the derivative ∂p min 3 /∂p 2 is discontinuous at points p 2 = 1/r (see Fig. 2).

IV. SYMMETRY-RESOLVED ENTANGLEMENT DETECTION
Symmetries, as they often occur in physical situations, can be exploited to observe relevant phenomena (see e.g. Refs. [32,[36][37][38][39][40][41][42][43][44][45][46]). Here, we use symmetries to ease the detection of entanglement. More precisely, we apply the previously developed tools to symmetric states, which will lead to conditions of entanglement involving much lower moments of the partial transpose projected onto certain subspaces. Despite the fact that these quantities differ significantly from the moments of ρ Γ , we will show later on that they can nevertheless be estimated using the framework of classical shadows.
We consider a bipartite state ρ = ρ AB , with subsystems A and B containing n and m qubits, respectively. We assume that this state commutes with n+m i=1 Z i , or similarly with the total number operator N = N A + N B . Here and in the following, we denote by X, Y, Z the Pauli operators. Obviously, such a state has a block diagonal form, i.e., where each block (or sector) is labeled by an eigenvalue q ∈ {0, 1, . . . , n + m} of the operator N and has support in the corresponding eigenspace. Here, with and similarly for B. It has been shown [47] that, for this type of symmetry, the partial transpose ρ Γ is also block diagonal, but in a different basis. In fact, where P q is the projector onto the eigenspace of N A − N B with eigenvalue q ∈ {−m, −m + 1, . . . , n} [48], i.e.
The size of the sector corresponding to the eigenvalue q in the block-decomposition of ρ Γ is given by When the partial transpose of a density matrix has a block structure, it is naturally PSD iff each block is itself a PSD matrix. Therefore, one can apply the conditions of the previous section directly to the blocks ρ Γ (q) of the partial transpose. For the p 3 -PPT condition, the corresponding symmetry-resolved (SR) inequalities are simply for all q = −m, −m + 1, . . . , n. Any violation of a PSD condition in one of the blocks is then sufficient to show that ρ Γ has at least one negative eigenvalue and that ρ is therefore entangled. When using the D i conditions, symmetry-resolution can be a significant advantage (see e.g. Sec. V). First, the necessary and sufficient PSD conditions involve moments of order at most equal to the dimension of the largest block, that is n+m (n+m)/2 , which is necessarily smaller than the dimension of the density matrix itself. Second, since a block ρ Γ (q) of ρ Γ is (in general) not the partial transpose of any positive matrix [49], the inequality: is not necessarily satisfied. This implies that moments of order two can already be sufficient to detect entanglement.
As stressed in the introduction, using PT-moment inequalities to detect entanglement is particularly interesting from an experimental point of view, because such PT moments can be estimated, for instance using shadow tomography [21]. As we show in the following lemma, the shadow tomography protocol used in Ref. [21] can also be used to estimate moments of blocks of the partial transpose (which differ significantly from the PT moments) by slightly modifying the non-linear observable that has to be measured.

Lemma 2. Given a symmetric state ρ
where the operators L (k) i are given by |a k a 1 · · · a k−1 a 1 · · · a k | , Here, the sum over each a i (b i ) runs from 1 to 2 n (2 m ) respectively.
Proof. Denoting by ρ ab,a b the entries of the operator and using that ρ ab,a b = ρ Γ ab ,a b , it is straightforward to see that Then, we have that Here, the first equality follows from the definition of L (k) i and Eq. (17); the second, from tr(RS Γ ) = tr(R Γ S) for any two matrices R, S; and the third, from P Γ i = P i . Finally, using that P i = P 2 i are orthogonal projectors, the cyclic property of the trace, and the block structure of ρ Γ , we have which completes the proof.

A. Classical shadows
Classical shadows are a convenient formalism to reason about predicting properties of a quantum system based on randomized single-qubit measurements performed in single-copy experiments [28]. We refer the reader to [28] and Appendix D for an introduction to classical shadows. The original classical shadow formalism is contingent on noiseless measurements and sources that produce iid states. Subsequently, it was shown that classical shadows can also handle noisy measurements [50,51]. As we show in detail in Appendix D, the formalism can also be extended to take non-identical, but independent, state preparations ("drifts") into account, i.e. the source produces the states {ρ 1 , ρ 2 , . . . , ρ N }. In this case, each snapshotρ i will have a different expectation value and i.e, the average of the snapshots converges to the average state. Since different snapshots are statistically independent, it turns out that one can estimate linear functions, say tr(Oρ avg ).
These ideas regarding the prediction of linear observables do extend to higher order polynomials. Here, we restrict ourselves to the quadratic case [28] involving up to second order moments of a block of the partial transpose. An extension to higher order polynomials is conceptually straightforward, but can become somewhat tedious to analyze [21]. Let us fix a block label i and consider the second order moment inequality restricted to this block: where, using Lemma 2, Here, we will provide confidence intervals in the estimation of D (i) 2 (ρ avg ) given a fixed number of measurements. Observe that D 2 (ρ avg ) < 0 implies that the source is able to produce entangled states.
Let us finally introduce (see Appendix D for details) the empirical average, over N (N −1) pairs of independent snapshots, We can fix the desired approximation accuracy and a probability-of-error threshold δ to obtain a lower bound on the measurement budget N . For simplicity, let us consider a non-trivial sector (i.e. q = −m, n) and assume n + m 4. Then one has the following theorem (see Appendix D for a slightly better bound).

Theorem 1 (Error bound for D
. Fix ∈ (0, 1) (accuracy of the approximation), δ ∈ (0, 1) (probability-oferror threshold), a bipartition AB, as well as a symmetry sector i. Suppose that we perform randomized, single-qubit measurements on independent states. Then, Finally, let us stress that this error bound addresses the estimation of D (i) 2 (ρ avg ) in terms of a single U-statistics estimator. The poor scaling in 1/δ can be exponentially improved by dividing the classical shadow into equallysized batches and performing a median-of-U-statistics estimation instead [28]: 1/δ → const × log(1/δ). However, numerical experiments conducted in Ref. [21] suggest that this trade-off is only worthwhile if one attempts to predict many properties with the same data set.

B. SR inequalities applied to states without symmetries
In the last part of this section, we show that the SR inequalities can also be used to detect the entanglement of arbitrary states, including those that do not have any symmetry.
The reason for that is that there exists a local channel C that transforms any state ρ into a state σ ≡ C(ρ) that has the desired block structure. The channel can be realized with local operations assisted by classical communication, and can thus not generate entanglement. Therefore, the initial state ρ must be at least as entangled as the final block diagonal state σ. This statement holds for any entanglement measure. As a consequence, if entanglement is detected in σ (which can be investigated using the symmetry-resolved tools), then ρ is necessarily also entangled. In other words, looking at the entanglement of σ, the "block-diagonalized" version of ρ, provides a sufficient condition of entanglement for ρ. This condition is not necessary as it could be that the channel C destroys all the entanglement of ρ.
The local channel that can be used for this approach is the following: where k = log(n+m) +1 and U i = Z i/2 k . The fact that this channel maps ρ to a state σ that is block-diagonal in the number-of-excitations basis can easily be seen as follows. First, observe that for any j ∈ {0, . . . , 2 (n+m) − 1}, the computational basis state |j is an eigenvector of U i , associated to an eigenvalue, (−1) |j|i/2 k , that essentially depends on |j|, the number of excitations of |j . Therefore, we have For any j and j having different number of excitations, i.e. such that |j| = |j |, the sum over i in Eq. (19) vanishes, explaining why σ is diagonal in the numberof-excitations basis.
As can be seen from the argument above, the non-zero elements of σ are all equal to the corresponding elements of ρ. This implies that the channel can effectively be replaced by a sum of projectors onto all the number-ofexcitations sectors. From an experimental point of view, the practical implementation of this channel can thus be circumvented by using the observables of Lemma 2 in the post-processing of the classical shadows.

V. APPLICATIONS
In this section, we apply and compare the entanglement conditions presented in the previous sections on various physical systems. For the systems possessing a symmetry as discussed in Sec. IV, we highlight some of the advantages that can result from considering symmetryresolved entanglement detection tools.

A. Entanglement detection in quench dynamics
We begin by considering the situation of quench dynamics, where entanglement emerges from the dynamics of a many-body Hamiltonian. We consider the model presented in Ref. [32], where the interplay between coherent dynamics with U (1) symmetry and dissipation leads to a dynamics of 'purification'. Here, we will use the same formalism to show how entanglement is generated at short times, and can be detected via the symmetry-resolved versions of the D 2 and p 3 -PPT conditions. In the next subsection, we will consider an analogous experimental situation obtained with trapped ions [25]. Our model is described by a master equation

and the Hamiltonian
and where γ is the spontaneous emission rate. Here, we consider open boundary conditions. The hopping between spins i and j is described by the coefficient J ij and we consider this subsection nearest-neighbor hopping As shown in Ref. [32], the time evolved state ρ(t) of the N spin system has the block diagonal form of Eq. (13). Moreover, the partially transposed matrix w.r.t a partition A, ρ Γ is also block diagonal with blocks ρ Γ (q) . Here, the index q represents the difference between the number of spin excitations in A and the one in the complement partition B (see also Sec. IV).
As we are interested in short time dynamics, we can solve Eq. (20) in first order in perturbation theory, which is valid for t 1/J, 1/γ. Considering for concreteness a half-partition A, made of the first N A sites, we obtain a block with a negative eigenvalue [32]. Assuming for simplicity N A = N/2, N A even, we obtain The presence of a negative eigenvalue in this sector can be detected from the value of the moments in leading order in J γN A . In particular, the p 3 -PPT ratio and the D 2 condition can be used to reveal the presence of entanglement at short times. We show in Fig. 3 a numerical confirmation of these results for various values of γ/J and N = 8, Symmetry resolved entanglement detection in quench dynamics with spin excitation loss. We study SRentanglement in quench dynamics in a system consisting of N = 8 spins initialized in a Néel state | ↓↑ ⊗N/2 and evolved with HXX subject to spin excitation loss with various rates γ (γ/J increases with the darkness of the color, see insets). We take A = [1, 2, 3, 4] and B = [5,6,7,8]. In panels a) and b), the D2 ratio and p3-PPT ratio of sector q = −1 are shown, respectively. Entanglement is detected for values below unity in the shaded gray areas. The insets in a) and b) show the early time value at t = 0 + of the D2-ratio a) and p3-PPT ratio b), respectively, as function of the decoherence rate γ/J. Black lines are the perturbation theory results displayed in Eqs. (28) and (27). which was obtained by simulating Eq. (20). We note that, in the present context, utilizing symmetry-resolution is fundamental to detect entanglement: this is due to the fact that the negative eigenvalues in ρ Γ appear in sectors that are not macroscopically populated [32], so that moments without symmetry resolution would not be able to detect them.

B. Experimental demonstration in a trapped-ion quantum simulator
In the previous section, we showed in an idealized theoretical setting that entanglement is generated -and can be revealed via SR-entanglement conditions-at early times after a quantum quench. Here, we demonstrate this effect experimentally via the measurement of the SR-D 2 and SR-p 3 -ppT condition using randomized measurement data taken at early times after quantum quench in a trapped ion quantum simulator (c.f. Ref. [25]). In particular, we show that the SR-D 2 condition and SR-p 3 -PPT condition allow for a fine-grained detection of bipartite entanglement, in regimes where the corresponding global conditions [21] and conditions relying on the purities of different subsystems [25] are not conclusive.
In the experiment reported in Ref. [25], a onedimensional spin-1/2-chain, consisting of N = 10 spins, was initialized in the Néel state |↑↓ ⊗5 and time-evolved with the Hamiltonian H XY [Eq. (21)] where the coupling parameter J ij follows the approximate power-law decay The Hamiltonian evolution exhibits a global U (1)-symmetry conserving the total magnetization of the system (i.e., [H, i Z i ] = 0). Symmetry-breaking terms (such as ) are strongly suppressed due to a large effective magnetic field [25]. As detailed in Refs. [25,32] weak decoherence effects are present in the experiment, including imperfect initial state preparation, local spinflips and spontaneous emission during the dynamics, and measurement errors model as local depolarization. Note that coherent spin-flips do not preserve the global magnetization and block-diagonal form of the (reduced) density matrix. On the timescales accessed in the experiment, these effects are however very weak (causing in numerical simulations including the above decoherence model a purity mismatch of the order of 10 −5 of the full 10-spin density matrix ρ vs. the projected one ρ Q = q Q q ρQ q at t = 5ms).
In Ref. [25] randomized measurements were performed at various times (t = 0ms, . . . , 5ms) after the quantum quench. As described in detail in Ref. [32] (see also Sec. IV A and Appendix D), we can use this data to estimate SR-PT moments and the SR entanglement conditions via classical shadow formalism [28]. In Fig. 4, we present the SR D 2 and p 3 -PPT conditions in the different sectors, for a subsystem consisting of the neighbouring spins A, B = [4,5], [6,7] and where the partial transpose is taken in the subsystem A = [4,5]. Similar to the results of the previous subsection, both conditions detect entanglement at short times in the sector q = −1. The corresponding global conditions, in particular the global p 3 -PPT condition, do not reveal the presence of entanglement in this regime [see Fig. 4 The fact that the SR-D 2 condition can reveal the presence of entanglement is particularly interesting from an experimental point of view as it implies that entanglement can be detected from the estimation of only two moments of the partial transpose (in a sector). For the shadow estimation of D 2 (−1), our rigorous bound from Theorem 1 ensures that ∼ 1.3 × 10 6 measurements would be sufficient to guarantee entanglement detection with a probability of 95%. While this represents an upper bound, valid irrespective of the quantum state in question, for the specific states in the experiment only 8 × 10 5 have been performed. The errorbars of the experimental are then drawn at 1.96σ where the standard error of the mean σ has been estimated for each data point using Jackknife resampling [52].
While the SR-D 2 condition requires only the estimation of first and second PT-moment, the third order SR-p 3 -PPT condition [panel b)], allows to detect entanglement in an even wider time window. In comparison to the global p 3 -PPT condition [red curve in panel b)], this clearly demonstrates the benefit of taking symmetryresolution into account.

C. Entanglement detection in the ground state of the XXZ model
The XXZ spin chain is a generalization of the Heisenberg chain including an anisotropy in the interaction FIG. 4. Experimental SR-entanglement detection in a trapped ion quantum simulator using data obtained in Ref. [25]. For a total system of N = 10 spins and subsystem A, B = [4,5], [6,7], we present in a) the SR-D2 ratio and b) SR-p3-PPT ratio as a function of time for various symmetry sectors . In both panels, entanglement is detected in regimes where the corresponding global conditions do not reveal entanglement, as indicated in the shaded grey areas (values below unity). Dashed (solid) lines are theoretical simulations of unitary dynamics (taking decoherence effects into account), as detailed in Refs. [25,32].
along the z direction, whose Hamiltonian reads: We will fix J = 1 as energy unit: J z sets the strength of the anisotropy along the z-axis. The phase diagram at zero temperature is known [53]: the system hosts an antiferromagnetic phase when J z < −1, a Luttinger liquid for J z ∈ [−1, 1], and a ferromagnetic one for J z > 1. We might expect that the entanglement conditions we described in the previous sections will detect that the state is not PPT in the range J z ∈] − ∞, −1]. Since the XXZ spin chain exhibits a U (1) symmetry related to magnetization conservation, we can exploit the symmetryresolved counterpart of the D k conditions, the p 3 -PPT and their optimized version D opt 3 . The simulation results are shown in Fig. 5. We consider the ground state of an open chain of L = 14 sites. In a) and b), we select = 10 sites in the middle as subsystem A and divide it in two parts A = A 1 ∪ A 2 . We use the negativity as a reference to benchmark the efficiency of some entanglement conditions to detect entanglement between A 1 and A 2 .
In Fig. 5a), we calculate the p 3 -PPT, the D 3 , the optimal D opt 3 condition and the Stieltjes condition using moments up to order five (see Appendix B). The convention we choose in the plot is that entanglement is detected whenever the value is positive. All the conditions work in most of the interval J z ∈ [−4, 1], where we expect entanglement to be sizeable, except for D 3 failing in the vicinity of J z = 1. The presence of entanglement is confirmed by the calculation of the negativity (red line).
In Fig. 5b) we focus on the q = 1 sector. In this case, we observe that all conditions indicate the presence of at least a negative eigenvalue in the sector ρ Γ A (q = 1)that is, they are informative about which sector for the reduced density matrix contributes to violating PPT. In this specific instance, SR is however not fundamental in detecting entanglement beyond what non-SR conditions can.
In Fig. 5c) and d), we carry out the same analysis for disconnected partitions. We consider A = A 1 ∪ A 2 , where A 1 consists of the first l/2 sites and A 2 of the last l/2, and L = 14, l = 10. In Fig. 5c) for J z ∼ −1.9 all the quantities except Stieltjes 5 are below zero, thus not revealing entanglement even though the negativity is positive. In this plot, one can also see that, for J z < −2, the optimized condition D opt 3 detects entanglement whereas both p 3 -PPT and D 3 fail. This illustrates that the slight improvement obtained from the optimization in Sec. III A (see Fig. 2) can be decisive to detect the entanglement of physically relevant states from the first three moments only.

D. Entanglement detection under constrained dynamics
As a third example, we study the detection of mixedstate entanglement in subsystems of constrained spin models after a global quantum quench. Such models have been realized experimentally with neutral atoms in optical tweezer arrays coupled to Rydberg states [54,55]. Below we simulate an experiment, in which moments of the partially transposed density matrix are obtained from a classical shadow involving global random unitaries available in current experimental setups. In particular we demonstrate that periodic revivals of mixed state entanglement can be detected from the conditions D 3 and D 4 (Eqs. (11) and (12)) requiring only a small number of experimental runs.
We consider the Fibonacci chain with open boundary conditions described by the Hamiltonian where P i = |0 i 0| are local projectors. As can be seen from (30), each spin undergoes independent Rabioscillations as long as the neighbouring spins are in their ground state |0 . This constraint breaks the tensor product structure of the Hilbert space (as it the case in a lattice gauge theory [56]). The model effectively resembles the experimental situation in [54] if the Rydberg atoms are driven close to resonance and neighbouring atoms cannot be simultaneously in the state |1 due to the Rydberg blockade mechanism. The Hamiltonian (30) has recently attracted great interest in the context of quantum many-body scarring [57,58]. In particular, performing a quantum quench on special unentangled product states results in long-lived periodic revivals which have been attributed to the existence of quantum scarred eigenstates in the many-body spectrum [57].
In the following we study the conditions given in Eqs. (11) and (12), when a quench is performed from a product state that leads to kinetically constrained dynamics. To this end, the initial state |Ψ 0 = |10 ⊗N/2 is time evolved with the Hamiltonian (30) up to t = 50/Ω. Fig. 6 a) shows the local Z i -expectation values exhibiting long-lived persistent oscillations. This striking departure from a thermalizing behaviour is also reflected in the slow growth of entanglement entropy (Panel b). We now analyse the time resolved behaviour of mixed state entanglement for a subsystem depicted in the inset of Fig. 6 c). The revivals in the negativity indicate that spins in the subsystem get periodically entangled and disentangled with each other. Interestingly, the p 3 -PPT condition is unable to detect the revivals, while D 3 yields positive values at the first 3 peaks in the negativity. At later times, the D 3 fails to detect the entanglement present in the system, but this entanglement is still captured by D 4 .
Finally, we investigate the required number of experimental runs in order to measure the conditions up to given error bar. The classical shadow is constructed by sampling bit strings from the quantum state after applying a global random unitary on the subsystem A. At each point in Fig. 6 c), we collect 5000 bit strings in different random basis. Note that global random unitaries in Rydberg systems can be implemented via random quenches with local disorder potentials [59]. The entire estimation of the conditions is repeated 20 times in order to obtain statistical uncertainties. Note that statistical covariances among the measured moments tr(ρ Γ ) n can give rise to nonuniform sizes for the error bars as can bee seen in Fig. 6 c). In Fig. 6 c) we depict the 2σ error bars, showing that entanglement can be detected with a moderate experimental effort.

VI. CONCLUSIONS AND OUTLOOK
The study of entanglement has a long and prominent history in a variety of disciplines. And with the advent of serious quantum technologies, reliable entanglement generation is more important than ever. This work provides a novel and principled approach to reliably detect bipartite entanglement between subsystem A and subsystem B. We have presented a set of inequality conditions D k (1 k 2 |AB| ). Each D k is an inequality that compares the first k moments of the partially transposed density operator. Violation of a single inequality implies that the underlying density operator cannot have a PSD partial transpose. This in turn implies that the state must be entangled. Conversely, if the underlying state is not PSD, then, there must exist at least one D k that is violated. This motivates a sequence of one-sided entanglement tests. Start with D 3 -the lowest non-trivial condition -and check whether it is violated. If this is the case, we are done. If not, we move on to the next higher condition (D 4 ) and repeat until we find a violation. For states having an extensive conserved quantity (such as total magnetization, in the case of spin systems), both the density matrix and its partial transpose have a block-diagonal structure [47]. In this case, it is advisable to apply these conditions directly to individual symmetry sectors of the partial transpose. The resulting sequence of symmetry-resolved conditions is stronger in the sense that lower moments (of blocks of the partial transpose) suffice to detect entanglement. Importantly, this approach is not only conceptually sound, but also tractable from an experimental perspective. The classical shadows formalism [28] allows for reliably estimating moments of the partial transpose from randomized single-qubit measurements. We demonstrated how to include the experimentally relevant situation of non identical (however independent) copies in the analysis and derived error bounds and confidence intervals for D 2 , with a natural extension to quantities involving higher moments. Empirical evaluations complement our theoretical findings. Applications to several theoretical models, as well as experimental data, demonstrate both tractability and viability of our approach.
We are confident that this work opens up several interesting future research directions. Firstly, the sequence of D k 's is designed to detect bipartite entanglement in a reliable and experimentally accessible fashion. A natural next step is to try to extend similar ideas to multipartite entanglement detection, e.g. using non-linear entanglement witnesses [60,61]. Secondly, the complete sequence of D k 's is used to answer a binary question: is the partial transpose negative or not? Entanglement measures, like the negativity, address entanglement in a quantitative fashion, but are also harder to estimate. Is it possible to use moments (or other density matrix functionals) to define entanglement measures that are experimentally tractable? The statistical analysis of the estimation procedure is also far from complete. We have shown that independence between the states that are produced in each iteration of an experiment is enough to derive statistically sound confidence intervals for estimating matrix moments with classical shadows. This addresses the practically relevant case of drifting sources, but further extensions to correlated states would also be appealing. In this context the quantum de Finetti theorem seems highly relevant. In future work, we will also investigate how importance sampling [62,63] and/or derandomization [28,64] can further improve moment estimation based on classical shadows. Finally, another promising direction of research would be to try to detect and characterize phase transitions in quantum mechanical Hamiltonians at finite (non-zero) temperature. Quantum phase transitions at zero temperature originate from quantum fluctuations, whereas quantum phase transitions at finite temperature are due to thermal fluctuations. Following Ref. [65], quantum phase transitions at finite temperature can be studied using the negativity. It would be interesting to investigate whether low-order PT moments, intimately related to the negativity, can also be used to this end.
Note: While completing the writing of the present work, we became aware of a work by Yu et al. [66], in which similar questions have been addressed.
For convenience, let us now consider the polynomial P (−t), which effectively replaces the positive eigenvalues of A by negative ones and vice versa. The coefficients of this polynomial can be expressed using the elementary symmetric polynomials in its roots, e i (λ 1 , . . . , λ d ), defined as For a polynomial with real roots (as it is the case here), Descartes' rule of sign states that the number of positive roots is given by the number of sign changes between consecutive elements in the ordered list of its nonzero coefficients (see Ref. [67] and references therein). The matrix A is PSD iff P (−t) has only negative roots, which by Descartes' rule is the case iff there is no sign change in the ordered list of its non-zero coefficients, i.e., iff e i (λ 1 , . . . , λ d ) 0 for all i = 1, . . . , d, since e 0 (λ 1 , . . . , λ d ) = 1. (B1) Defining the matrices and a solution to this problem can be stated as follows [69]. These solutions to the Stiltjes moment problem can be used to obtain entanglement conditions. Given λ 1 , . . . , λ r the eigenvalues of ρ Γ for some density matrix ρ, let us define the (atomic) eigenvalue distribution function where δ is the Dirac delta distribution. If ρ is PPT, this density function has support on [0, ∞) and reproduces the moments of ρ Γ , as Therefore, according to the solution of the Stieltjes moment problem mentioned above, the moments of any PPT state necessary satisfy either condition (B4) or (B5), depending on the value of r. The violation of any of these conditions for a set of PT moments thus reveals that the corresponding state must be entangled. The range condition may require the knowledge of all the moments to be checked, but the PSD conditions can be broken into sets of simpler conditions. And some of them only involve low order moments. Indeed, it is well known (see e.g. [70]) that a matrix is PSD if and only if all its principal minors are non-negative. For instance, looking at the principal minor at the intersection of the first two rows and columns of B(k), one obtains the condition m 1 m 3 − (m 2 ) 2 0. This condition is nothing but the p 3 -PPT condition (which we know is useful to detect entanglement [21]). Extending this principal minor to the third row and column, one gets another PPT condition: We call this condition Stieltjes 5 . We illustrate in Sec. V (c.f. Fig 5) that this condition is also useful for entanglement. Numerical computations suggest that this condition is a powerful tool to detect the entanglement of random mixed states, in the sense that it detects more random entangled states than either p opt 3 or D 5 . Not all Stieltjes moment conditions are this powerful, though. For instance, the principal minor condition for the first two rows and columns of A(k) is trivial.
Note that, because we consider here an atomic density function, we have m 0 = r. We could naturally renormalize the density function so that m 0 = 1, but it would imply a re-scaling of the first moment, i.e., the trace of ρ Γ , would be 1/r. Since the partial transpose and the density function cannot be normalized at the same time, we chose to keep normalized partial transposes.

C: Appendix 3. Optimizing conditions involving moments up to degree three
Given a PSD matrix A, with non-zero eigenvalues λ 1 , . . . , λ r , for some r ∈ [1, dim A], consider the Lagrangian function where C 1 and C 2 are Lagrange multipliers.
Here we show that, for all 1 r dim A, the stationary points (λ 1 , . . . , λ r ) of the Lagrangian function (C1) are such that the variables λ i can take at most two distinct values. These stationary points, for which the derivatives of the Lagrangian (C1) with respect to each variable vanish, satisfy the set of equations We first sum up Eq. (C2) for all values of i and then insert Eqs.(C3) and (C4) into it. This yields Inserting this relation into Eq. (C2) and considering this equation for two distinct values of i, say 1 and k = 1, one can eliminate the variable C 1 to get a relation between λ 1 and λ k . After some algebra, one finds Since this argument holds for any k = 1, it must hold that the eigenvalues λ i are either all equal or can only take two distinct values. In the first case, in which all the eigenvalues are equal, one obtains the isolated points (p 2 , p 3 ) = (1/r, 1/r 2 ) in Fig. 2 in the main text. In the second case, the rank r PSD matrices corresponding to the stationary points of the Lagrangian (C1) have a spectrum with r a degenerate eigenvalues λ a and r − r a eigenvalues λ b . Assuming, without loss of generality, λ a > λ b , one can show that the minimal value of the third moment is obtained when r a = r − 1.

D: Appendix 4. Estimating block PT moments with classical shadows: rigorous confidence regions
Classical shadows are a convenient formalism to reason about predicting properties of a quantum system based on randomized measurements [28,71]. In Ref. [21], a classical shadow error analysis was carried out for the estimation of partial transpose moments (of order two and three) from randomized single-qubit measurements performed on a single copy of the experimental state at a time. In this extended appendix, we go one step further an present a detailed error analysis in case one does not assume that the states produced in the experiment are identical in each iteration of the experiment (as it would be the case for a "drifting" source). For completeness, we first recall the fundamental ideas of classical shadows. After this presentation, we focus on the estimation of quadratic observables and provide rigorous error bounds for the estimation of quadratic polynomials in the moments of the partial transpose of a projected density operator.

Classical shadows
Suppose, we are interested in quantum systems comprised of n qubits. Suppose furthermore, that we can perform certain unitary transformations U ∈ E (ensemble), as well as a measurement in the computational basis: It is instructive to analyze the quantum-to-classical channel that arises from first performing a randomly selected unitary transformation ρ → U ρU † followed by a computational basis measurement: The integral over E is an average over all possible classically randomized measurement settings. The summation over b averages over quantum randomness associated with measurement outcomes (Born's rule). It is easy to check that M E (·) is always a quantum channel, i.e. a completely positive and trace-preserving map. Viewed as a linear operator, this channel is also invertible if the underlying ensemble E is sufficiently expressive. More precisely, we require that the complete family U † |b b|U : U ∈ E of admissible basis measurements is tomographically complete. That is, for all ρ = σ, there exists a U ∈ E and a outcome b ∈ {0, 1} n such that b|U ρU † = b|U σU † |b . Fact 1. Suppose that U † |b b|U : U ∈ E, b ∈ {0, 1} n is a tomographically complete family of basis measurements. Then, M E has a well-defined and unique inverse M −1 E (·). From now on, we will always assume that we are dealing with tomographically complete families of basis measurements. Classical shadow estimation with randomized measurements is based on the following basic routine: 1. state preparation: prepare a copy of the unknown quantum state ρ; 2. randomized single-shot measurement: sample U ∼ E at random, transform ρ → U ρU † and measure in the computational basis; 3. construct a classical snapshots: upon receiving outcomeb ∈ {0, 1} n , computê By construction, each snapshot is a random matrix that exactly reproduces the true underlying state ρ in expectation (over both the classical choice of transformation and the quantum randomness in the basis outcome). That is, Statistically speaking,ρ is an unbiased estimator of the underlying quantum state ρ. But a single snapshot is only a very poor estimator. This situation changes if we have access to multiple independent snapshots {ρ 1 , . . . ,ρ N }. We call such a collection a classical shadow of ρ with size N . Forming the empirical average of snapshots within a classical shadow produces ever more accurate approximations of the true underlying state: The main results in Ref. [28] highlight that classical shadows of moderate size already allow joint estimation of many interesting state properties. More precisely, the classical shadow formalism allows for computing powerful a-priori bounds on the convergence behavior of such estimators.

Implicit assumptions and relaxations thereof
Before moving on, it is worthwhile to delineate implicit assumptions within the classical shadows model. It turns out that almost all of them can be relaxed without threatening statistical guarantees like error bounds or confidence intervals.
a. Perfect/noiseless measurements: the original randomized measurement framework is contingent on perfect knowledge of the average quantum-to-classical channel (D1). Erroneous executions of the ensemble rotation U , or noisy executions of the subsequent computational basis measurement can thwart this assumption. However, recent results [50,51] highlight that a suitably extended shadow formalism can handle such imperfections. The key idea is to adjust the inversion formula (D2) appropriately. Ref. [51] achieves such an adjustment by assuming explicit knowledge of the (average) noise channel, while Ref. [50] actually goes a step further and proposes a tractable calibration protocol that reveals sufficient information to appropriately correct Eq. (D2).
b. Independent and identically distributed state copies: because quantum measurements are typically destructive, most estimation protocols assume access to a perfect source that produces independent and identically distributed (iid) copies of the underlying quantum state ρ. Formally, N queries of such a perfect iid state source produce the state ρ ⊗N and we subsequently proceed to measure independent copies sequentially. This particular tensor product structure is a strong assumption that combines stochastic independence (individual state copies are completely uncorrelated) with identical distribution (all state copies are identical). This second assumption is often violated in concrete experimental architectures. Small fluctuations within the device can lead to state copies that, although uncorrelated, vary in time ("drifting source"). N state copies produced by such drifting (but independent) sources can be modelled by a sequence {ρ i } N i=1 of non-identical quantum states. The classical shadow formalism can readily handle drifting (but independent) sources. Each snapshotρ i will have a different expectation value and Eq. (D3) needs to be adjusted accordingly: Hence, empirical averages of classical shadows are wellsuited for approximating linear properties of the average source state ρ avg . We will show below that this desirable feature extends to the shadow estimation of polynomials as well. Note that it is, sufficient to know e.g. that the polynomial, D 2 evaluated at the average state is negative to ensure that the source is capable of producing entanglement, as in this case there must existe a ρ k which leads to a negative value. In order to handle drifting sources, we will also assume access to trusted classical randomness that allows us to randomly select unitary transformations U ∈ E. Mild by comparison, this assumption also features as an explicit (or implicit) assumption in other statistically sound treatments of entanglement detection, see e.g. [72] and references therein.
Based on these assumptions, we will establish rigorous error bounds and confidence intervals for predicting polynomial functions based on classical shadows. Rather than treating this problem in full generality, we focus on estimating i.e., the first symmetry-resolved polynomial that is capable of detecting NPT entanglement. Theorem 3 below highlights that an order of 2 |AB| / 2 randomized measurements suffice to approximate D (i) 2 (ρ avg ) up to additive accuracy . Corollary 2 reformulates this insight in terms of confidence intervals.
Finally, we point out that assuming access to independent state copies (tensor product structure) is not a mild assumption. But, the classical shadow estimators below are invariant under permuting individual state copies. Permutation invariance suggests that strong proof techniques from quantum cryptography -like the quantum de Finetti theorem, see e.g. [73,Chapter 7] and references therein -may be applicable and allow for relaxing the independence assumption as well. We intend to address this in future work.

Predicting linear functions with classical shadows
We are now ready to discuss the simplest use-case of classical shadows: estimate a linear function, say tr(Oρ), based on N randomized measurements of independent (but not necessarily identical) states. We can achieve this by simply replacing the unknown quantum state ρ by an empirical average of snapshots within a classical shadow: Independence ensures that the individual snapshotsρ i are stochastically independent random matrices. This in turn implies that each tr(Oρ i ) is a stochastically independent random variable. Empirical averages of independent random variables tend to concentrate sharply around their expectation value -regardless of the underlying distribution. The variance is an important summary parameter that can control the rate of convergence. Chebyshev's inequality, for instance, implies for > 0 This tail bound is a consequence of independence alone. The remaining average variance does depend on the ensemble E. Different ensemble give rise to different variance contributions [28,71]. Here, we focus on the practically most relevant case: randomized, single-qubit measurements. Each qubit is measured in either the X-, the Y -, or the Z-basis. More formally, the ensemble E subsumes random single qubit Clifford rotations. That is U = U 1 ⊗ · · · ⊗ U n with U 1 , . . . , U n iid ∼ Cl(2) and Cl(2) denotes the single-qubit Clifford group, i.e., the finite group generated by Hadamard and phase gates. More generic single-qubit ensembles (like Haar-random unitaries) are also an option -what matters is that the single qubit ensemble forms a 3-design [74,75]. The (single-and multi-qubit) Clifford group is one ensemble with this feature [76][77][78]. As demonstrated in Ref. [28], the 3-design assumption allows us to compute the measurement channel (D1), as well as its inverse. Let D 1/3 (X) = 1/3 (X + tr(X)I) denote the single-qubit depolarizing channel with parameter 1/3. Then, and we refer to [28,Supplementary Information, Section C] for details. We see that the measurement channel (and its inverse) factorize nicely into a tensor product of singlequbit operations. This is also true for snapshots (D2) within a classical shadow: Alternatively, we can fix a confidence level α and a total measurement budget N . In this case, Eq. (D9) provides us with a bound on the approximation accuracy. Together with the empirical averageô (N ) itself, this provides a statistically sound confidence interval.

Corollary 1 (general confidence interval).
Fix an observable O, a confidence level α ∈ (0, 1), as well as a measurement budget N (comprised of independent states). Then, the true observable average tr(Oρ avg ) is contained in the interval with probability (at least) α.
These statements tell us that the measurement budget N (required number of independent state copies) should scale with 2 w(O) tr(O 2 ) and the approximation error decays as 1/ √ N . This is asymptotically optimal because of the central limit theorem, but the scaling in 1/(1 − α) is extremely poor. More sophisticated estimation techniques -like median of means instead of empirical averages [28] -improve this dependence exponentially from 1/(1 − α) to const × log(1/ (1 − α)).

Predicting quadratic functions with classical shadows a. U-statistics estimator
The linear prediction ideas from above do extend to higher order polynomials. But in contrast to before, independent, but not identical, state copies ("drifting sources") do require extra attention. Here, we restrict our attention to quadratic polynomials [28]. An extension to higher order polynomials is conceptually straightforward, but can become somewhat tedious to analyze [21]. Recall that we can rewrite any quadratic function in ρ as a linear function in the tensor product ρ ⊗ ρ: We can approximate this function by replacing each exact copy of the unknown state with distinct classical snapshots (sayρ i andρ j , with i = j). Independence of the underlying states ensures stochastic independence of the classical snapshots and we conclude tr (Qρ i ⊗ρ j ) obeys Etr (Qρ i ⊗ρ j ) = tr (Qρ i ⊗ ρ j ) . This is not a good estimator (yet). We can improve approximation accuracy by empirically averaging over all distinct pairs of N classical shadowsρ 1 , . . . ,ρ N : This is the simplest example of a U-statistics estimator [79]. It is invariant under permuting the individual snapshotsρ i andρ j . This invariance allows us to also symmetrize the quadratic observable. We can without loss assume tr(QX ⊗ Y ) = tr (QY ⊗ X) for all matrices X, Y with compatible dimension. Such a symmetry will simplify our derivations considerably.

b. Deterministic bias
The estimator average (D10) exactly reproduces q (ρ avg ) = tr(Qρ avg ⊗ ρ avg ) if and only if the underlying states are identical (ρ 1 = · · · = ρ N = ρ). If this is not the case, the expectation values of individual classical shadows can be distinct from each other. This can introduce a bias when attempting to estimate the average behavior of a quadratic function. Fortunately, any such bias is suppressed by 1/N and approaches zero once the number of measurements gets sufficiently large. This is the content of the following statement. Let · 1 and · ∞ denote the trace and operator norm, respectively.

Lemma 4 (quadratic bias for non-identical states).
Let {ρ 1 , . . . ,ρ N } be a classical shadow that arise from measuring independent states ρ 1 , . . . , ρ N . Set ρ avg = 1 N N i=1 ρ i and consider a quadratic function q(σ) = tr(Qσ ⊗ σ). Then, the associated U-statistics estimator (D10) obeys Note that the bias term ∆ vanishes if all states are identically distributed (ρ i = ρ j for all 1 i, j N ) and can never be too large either: as the trace norm difference between two quantum states is at most two. Many quadratic functions, which are particularly relevant in entanglement detection, also obey Q ∞ 1.
Proof of Lemma 4. Apply E [ρ i ⊗ρ j ] = ρ i ⊗ ρ j (independence) and elementary reformulations to conclude Apply matrix Hoelder to this reformulation to obtain a slightly stronger version of the advertised bound: because the trace norm is multiplicative under tensor products and quantum states ρ i satisfy ρ i 1 = 1.

c. Variance bounds
In analogy to the linear estimator (D5) (empirical average), the U-statistics estimator (D10) converges to its expectation value E q (N ) . The variance, which we compute in the following, provides a useful summary parameter for the rate of this convergence. Use E [ρ i ⊗ρ j ] = ρ i ⊗ ρ j to rewrite the U-statistics variance as We can now analyze these contributions separately. And, owing to stochastic independence, most of them vanish identically. It is at this point, where the assumption of independent state copies (and access to independent randomness for selecting measurements) matters the most. For instance, if all indices i, j, k, l are distinct, the expectation value factorizes and produces a zero contribution. As detailed in [21,Supplemental Material] the only exceptions are contributions where at least two indices coincide. Together with symmetry of the observable (tr (QX ⊗ Y ) = tr (QY ⊗ X)) and the AM- The final reformulation allows us to re-use the linear variance bound from Lemma 3. For 1 j N , we define the effective single-copy observable to recognize simple linear variance terms within the first sum.
Var q (N ) Clearly, this upper bound becomes smaller as the measurement budget N increases.

d. Error bound and confidence interval
Having derived bounds on deterministic bias and variance allows us to deduce a general error bound. Markov's inequality implies where we have isolated statistical fluctuations from the underlying deterministic bias. Inserting the bounds from Eq. (D12) and Lemma 4 renders this bound more explicit: Term two and three have a comparable scaling in N and . The first term is different and starts to dominate as N increases. Recall that Q(ρ i ) = tr (QI ⊗ ρ i ) denotes an effectively linear function on a single copy of AB. This is the power of U-statistics. Asymptotically, it is an effectively linear scaling term that dominates the statistical convergence rate of a quadratic estimator. The other terms, however, can dominate in the small-N regime.

E: Concrete guarantees for estimating D2
A direct conversion into error bound and confidence interval is conceptually straightforward, but somewhat cumbersome. Different contributions with distinct scaling behavior must be balanced against each other. This renders fully general statements somewhat difficult to parse. Instead, we derive concrete error bounds and confidence intervals for a concrete, and interesting, quadratic polynomial, namely D 2 .

Rewriting D2 as a quadratic function
As mentioned in the main text, the D 2 PPT condition is trivially satisfied by the partial transpose of any density matrix. Nevertheless, its symmetry-resolved counterpart provides a non-trivial entanglement condition, and is, from an experimental point of view, the most affordable entanglement condition we proposed. Indeed, it requires to estimate only the first two moments of the partial transpose within a sector. For this reason, we consider here states possessing the same symmetry as in Sec. IV of the main text. That is, we focus here on a (n + m)-qubit mixed state ρ that we view as bipartite states, with a subsystem A containing n qubits and subsystem B containing m qubits. Moreover, we assume that ρ commutes with the total number operator N = N A + N B .
Let us recall here that such a state ρ has the block diagonal structure ρ = i Q i ρQ i and that its partial transpose is also block diagonal, albeit in a different basis: ρ Γ = i P i ρ Γ P i (see Eqs. (14) and (15) for the definitions of the projectors Q i and P i ). We fix a block label i and consider the second moment inequality restricted to this block: This inequality is true if and only if the following homogeneous polynomial is nonnegative: Lemma 2 in the main text allows us to rewrite this expression as a linear and symmetric function in ρ ⊗ ρ: where Q is a linear operator that acts on two copies of the bipartite operator space . It is defined in terms of the orthogonal projectors as well as different swap operations. Let W A (W B ) and W AB denote the operator that swaps both copies of A (B) and AB, respectively. Then, obeys Eq. (E1) in a symmetric fashion, i.e. tr (Qρ ⊗ σ) = tr (Qσ ⊗ ρ) for ρ, σ ∈ B(H AB ). We can use this two-copy operator to construct a U-statistics estimator based on (independent) classical shadows: Due to Lemma 4, and Eq. (D13) allows us to rigorously control the speed of convergence.

Bounding the relevant figures of merit
The bound provided by Eq. (D13) depends on the squared Hilbert-Schmidt norms of Q, as well as the squared Hilbert Schmidt norm of where σ may range over all possible source states ρ 1 , . . . , ρ N . The weights (localities) of Q and Q(σ) also play an important role. Alas, these are essentially as large as they can be. Viewed as observables on one and two copies of AB, respectively, the operators Q(σ) ∈ B(H AB ) and Q ∈ B(H AB ) ⊗2 act nontrivially on all qubits involved. Hence, we must conclude w (Q(σ)) = |AB| = (n + m) and (E5) w(Q) = |ABAB| = 2(n + m).. (E6) Next, note that the sum a−b=i Π a ⊗Π b describes an orthogonal projection P i on the AB-system. (Tensor products of) orthogonal projectors have operator norm one ( P i ∞ = 1). Operator norms are also invariant under permuting tensor factors. Therefore, Let us now move on to computing the Hilbert-Schmidt norms (squared) of Q and Q(σ).
Proof. To simplify notation somewhat, we introduce the following short-hand notation conventions:Π a = Π a ⊗ I B ∈ B (H AB ) andΠ b = I A ⊗ Π b ∈ B (H AB ). This notation allows us to split up the Hilbert-Schmidt norm squared into four contributions: We can use P 2 i = P i , P i = a −b =i Π a ⊗ Π b and orthogonality relations to simplify terms individually. In The remaining sum can simply be bounded by observing that tr(Π a ) 2 tr(Π a ) 0 and similarly for Π b : This implies the following upper bound on the quadratic variance: tr(Q 2 ) 3 2 tr(P i ) 2 + 1 2 2 |AB| tr(P i ) − 2tr(P i ) = 1 2 tr(P i ) 3tr(P i ) + 2 |AB| − 4 . These terms can be simplified individually. The first term is already almost as simple as possible. For the second term, use P i = a−b=i Π a ⊗ Π b , as well as orthogonality relations among the projectors (Π a Π a = δ a,a Π a and The third term is exactly the same. Terms four and six are also equivalent, because of commutation relations (Π aΠb =Π bΠa ) and cyclicity of the trace. We can use some partial transpose tricks to obtain: Similar tricks apply to the fifth term: The advertised bound now follows from putting everything together.

D2 error bounds and confidence intervals
We can combine the weight estimates (E6), (E5) with the operator norm bound from Eq. (E7) and the squared Hilbert-Schmidt norm bounds from Lemma 5, as well as Lemma 6, and insert them into the general error bound from Eq. (D13). To simplify later computations, we also replace 1/N -factors by 1/ (N −1). For accuracy ∈ (0, 1), we obtain Pr D Here, the first terms in each parenthesis are the leading contributions. The two remaining factors are of order one and do depend on the particular problem in question.

(E9)
In order to get an error bound, we need to answer the following question: How large does N have to be in order to ensure that this upper bound does not exceed δ? This is equivalent to demanding (N − 1) 2 (N − 1)C 1 2 n+m tr(P i ) 2 δ + C 2 2 3(n+m) tr(P i ) 2 δ (E10) and can be answered by solving a quadratic equation in (N −1). Doing so implies the main result of this appendix section. D 2 ). Fix , δ ∈ (0, 1), a bipartition AB, as well as a symmetry sector i. Let This error bound addresses the estimation of D (i) 2 (ρ avg ) in terms of a single U-statistics estimator. The poor scaling in 1/δ can be exponentially improved by dividing the classical shadow into equally-sized batches and performing a median-of-U-statistics estimation instead [28], which reduced the scaling with 1/δ to a scaling with const × log(1/δ). However, numerical experiments conducted in Ref. [21] suggest that this trade-off is only worthwhile if one attempts to predict many properties with the same data set.

Theorem 3 (Error bound for
Alternatively, we can also fix a maximum failure probability δ and a total measurement budget N . Reformulating Rel. (E10) then provides us with an upper bound on the (squared) approximation accuracy. This provides us with a statistically sound confidence interval around the estimated polynomial. Again, median-of-U-statistics estimation allows for improving the dependence on 1/δ exponentially at the cost of a extra constant (Ref. [28], for instance, achieves const ≈ 68).