A hybrid framework for estimating nonlinear functions of quantum states

Estimating nonlinear functions of quantum states, such as the moment $\tr(\rho^m)$, is of fundamental and practical interest in quantum science and technology. Here we show a quantum-classical hybrid framework to measure them, where the quantum part is constituted by the generalized swap test, and the classical part is realized by postprocessing the result from randomized measurements. This hybrid framework utilizes the partial coherent power of the intermediate-scale quantum processor and, at the same time, dramatically reduces the number of quantum measurements and the cost of classical postprocessing. We demonstrate the advantage of our framework in the tasks of state-moment estimation and quantum error mitigation.

Quantum measurement is one of the fundamental building blocks of quantum physics, connecting the quantum world to its classical counterpart.The linear expectation value of the quantum state ρ in the form tr(Oρ) can be measured directly in the basis of the observable O by Born's rule.However, the measurement of nonlinear functions, such as the Rényi entropy and the moment P m = tr(ρ m ), generally involves quantum circuits interfering among m copies of the state ρ by the generalized swap test [1][2][3][4], which transform nonlinear functions to linear ones on all that copies.Although there are significant experimental advances for the m = 2 case [5][6][7][8], it is still challenging to extend to a larger degree m with a moderate system size on current quantum platforms [9].
Recently alternative approaches based on the randomized measurement (RM) [10], such as the shadow estimation [11,12], are proposed.By postprocessing the results from random basis measurements of sequentially prepared states, the shadow estimation is efficient for measuring local observables and the fidelity to some entangled states.However, when measuring nonlinear functions, such as the purity P 2 , RM protocol inevitably needs an exponential number of measurements and postprocessings [12][13][14], which hinders its further applications for large systems.
By trading off the swap test and the RM protocol, here we propose a hybrid framework for estimating nonlinear functions of quantum states to fill the gap between them, which inherits their advantages and reduces weaknesses.Specifically, one can conduct RM on a few jointly prepared copies of the state by utilizing the partial coherent power of the quantum processor, and then estimate many nonlinear functions in a more efficient way, i.e., less demanding on both the quantum hardware and classical post-processing.
The nonlinear function of quantum states {ρ i } m i=1 with some observables {O i } m i=1 of interest reads tr(O which is general and includes, such as the state overlap and fidelity [15,16], the purity and higher-order moments [17,18], quantum Fisher information [19], out-of-timeordered correlators [20,21], and topological invariants [22,23].For simplicity, hereafter we mainly adopt the moment function P m to illustrate the framework, where O i = I d and ρ i = ρ in Eq. (1) FIG. 1. Illustrations of (a) generalized swap test, (b) shadow protocol, and (c) hybrid shadow protocol.All these three protocols can be divided into two phases, the quantum experiment and the classical postprocessing labeled by green arrows.In (c), one measures not only the control qubit but also the last copy of ρ after the random evolution U .The information of U and measurement results bc and b are used to construct the unbiased estimator ρ t according to Eq. ( 6), and thus estimate t-degree function ot = tr Oρ t directly.One can also patch them together to estimate higher-degree functions like Pm = tr(ρ m ) by Eq. (8).
Swap test and RM.-First, let us take a quick re-arXiv:2208.08416v2[quant-ph] 28 Jun 2023 view of the swap test and the RM protocol for measuring P m .The moment can be written as P m = tr(S m ρ ⊗m ), with S m the shift operation on with each {|b } the basis of each copy.Using the swap test, one can measure P m efficiently with complete coherent control over multiple copies of the state, as illustrated in Fig. 1 (a).In particular, one initializes a control qubit and prepares the m-copy state ρ ⊗m , and then conducts the Controlled-shift operation ( Finally one measures the control qubit to get the value of P m [1,2].The corresponding quantum circuit requires a quantum processor with a total of N = nm + 1 qubits. Although the shift operation on m-copy can be expressed as a product of 2-copy swap operations, this approach would significantly increase the quantum circuit depth.Furthermore, preparing m copies of the state in parallel imposes stringent demands on quantum memories.Therefore, the generalized swap test presents significant challenges for cases where m ≥ 3. The RM toolbox [10] was recently developed to ease the experimental challenge mentioned above [24][25][26].Compared with swap test, one only needs to control a single-copy state to realize the estimation.For shadow estimation [12], independent snapshots of the state { ρ (i) } can be constructed using data conllected in RMs, as shown in Fig. 1 (b).This is referred to as the shadow set and has the property that the expectation E( ρ (i) ) = ρ.Denote the random unitary evolution sampled from some ensemble as U ∈ E, and the Z-basis measurement result as where b is a random variable with probability b| U ρU † |b , and the inverse (classical) postprocessing channel M −1 is determined by the chosen E [12,[27][28][29][30].
For instance, E can be the global or local Clifford ensemble, denoted as Clifford and Pauli measurements hereafter, respectively.After constructing the shadow set, the unbiased estimator of P m can be constructed as In principle, any nonlinear function can be obtained with only an n-qubit quantum processor using sequential RMs.However, the number of measurements needed generally scales exponentially with the qubit number, and the scaling becomes worse for larger m values, such as m ≥ 3 [12,13].
Hybrid framework for nonlinear functions.-Bytrading off the quantum and classical resources, here we develop a hybrid framework for nonlinear functions.All proofs and more detailed discussions are left in Ref. [31].
For a nonlinear function of degree m such as P m , where m = L i=1 m i , we demonstrate that it can be estimated using a quantum processor with only N = n(max i m i )+1 qubits.The core idea is to conduct RM on ρ t , where t = m i , by leveraging the coherent operation on t copies of ρ.Instead of directly reading the control qubit outcome to obtain P t , as in the swap test, we perform RM on one of the prepared t copies of ρ.As the permutation symmetry holds, the RM can be performed on any copy, like the final one shown in Fig. 1 (c).By measuring the expectation value of the Pauli-X operator on the control qubit and performing the projective measurement on the final copy, we obtain tr X c ⊗ I The full measurement procedure is listed in Algorithm 1, which aims to construct the shadow set { ρ t (i) } M i=1 of ρ t from the RM results collected in Fig. 1 (c), and these shadow snapshots can be used to estimate more complex nonlinear functions.

Algorithm 1 Hybrid shadow estimation
Input: M × K sequentially prepared ρ ⊗t and control qubit initially set as |0 c .
Randomly choose U ∈ E and record it.

3:
for j = 1 to K do

5:
Measure the control qubit and the final copy of ρ ⊗t in the computational basis {|bc } and {|b }.

6:
Construct the unbiased estimator ρ t (j) (i) using the results b (i,j) c and b (i,j) by Eq. ( 6), where i and j denoting the j-th measurement under the i-th unitary.

7:
end for

8:
Average K results under the same unitary to get Theorem 1. Suppose one conducts the circuit shown in Fig. 1 (c) for once (M = K = 1), the unbiased estimator of ρ t shows such that E {U,bc,b} ρ t = ρ t .Here b c and b are the measurement results of the control qubit and the final copy from ρ ⊗t , respectively; the inverse classical postprocessing M −1 depends on the random unitary ensemble applied.Furthermore, to evaluate o t = tr(Oρ t ) with O being some observable, the variance shows where Var [tr(O ρ)] is the variance of measuring O on the original single-copy shadow snapshot ρ, which can be upper bounded by the square of shadow norm O 0 shadow [12] for the traceless operator Theorem 1 is the central result of this work, which gives the unbiased estimator of ρ t and also relates the statistical variance to the previous single-copy one, i.e., t = 1 [12].Note that the shadow norm is also related to the chosen random unitary ensemble [12].According to Eq. ( 7), the hybrid shadow can dramatically reduce the variance of estimating nonlinear functions compared with the original shadow protocol.Take the Pauli measurement as an example, for a k-local observable O, the shadow norm O 0 ∞ [12] and thus the variance of evaluating tr(Oρ t ) is independent of the total qubit number n by Eq. (7).However, the variance of the original shadow protocol shows an exponential scaling with n [12,13].This point is also clarified by the numerical result in Fig. 3 (a).We will discuss this advantage in detail in the application of quantum error mitigation later.
Furthermore, one can repeat the above procedure for all ρ mi and patch them together to evaluate more complex functions.For P m the unbiased estimator now shows With the hybrid framework, one can equivalently transform an m-degree function in the original shadow protocol in Eq. ( 4) to a lower L-degree one here.This not only reduces the sampling and postprocessing cost, but also makes other postprocessing strategies [18,32] available for higher-degree functions.In particular, the postprocessing cost is reduced from O(M m ) to O(M L ).We take the moment estimation of P 3 as an example to show these advantages.
Besides the functions like o m and P m here, we give the hybrid shadow estimation for more general functions of Eq. (1) in Ref. [31], by directly extending Theorem 1.
Application for the moment estimation.-Wedivide m = 3 to 2 + 1 to estimate P 3 = tr S 2 ρ 2 ⊗ ρ .For ρ 2 , by following Algorithm 1 (K = 1, t = 2) one collects the shadow set one also collects the shadow set of ρ using the original shadow estimation, and then combines two sets to get the estimator of P 3 .
Proposition 1.By combing the shadow sets { ρ 2 (i) } and { ρ (i ) }, one gets the unbiased estimator of P 3 as Suppose one applies the random Pauli measurements, the variance of P 3 can be upper bounded by with M = M for simplicity.
The result of Eq. ( 12) is almost the same as that of P 2 using the original shadow estimation (Eq.(D16) in Ref. [13]).This indicates that the hybrid framework reduces the statistical error from a 3-degree problem to a 2-degree one, with coherent access to a limited (t = 2) quantum hardware.
Moreover, one can adopt another postprocessing protocol [32] with the same measurement data collected in Algorithm 1, and the estimator P 3 is given in Proposition 2. This protocol with Pauli measurements mainly works for 2-degree functions and is proved infeasible for higherdegree ones [14].With the hybrid framework here, one can make it feasible for the 3-degree function P 3 and also reproduce the same variance scaling of P 2 in the original protocol [16,32], indicating the advantage again.Proposition 2. Using the RM results, { b c (i,j) , b (i,j) }, collected in Algorithm 1 for t = 2, one can construct an alternative unbiased estimator of P 3 as For Pauli measurements, the variance is about To complement the above analytical results, in Fig. 3  (c), we numerically study the scaling of the statistical errors for estimating P 3 , using the estimators P 3 in Eq. ( 11) and P 3 in Eq. (13), and also the one from the original shadow protocol [13], in the regime d M (K).They are denoted for short as Hybrid Shadow (HS), Hybrid Random (HR) and Original Shadow (OS), respectively.The numerical results, which correspond to the standard variance, are consistent with the analytical ones, and we summarize them together in Table I.
The statistical errors for estimating P3 with different protocols.The numerical results are from Fig. 3 (d).
The analytical overestimate of the exponential term on d, for two shadow-based protocols OS and HS, due to the fact that the shadow norm is not a tight upper bound for nonlinear functions.
Consequently, in practise one needs M = O(d 1.27 ) for OS, M = O(d 1.05 ) for HS, and K = O(d 0.73 ) for HR to make the error less than some constant.It is clear that HS and HR from the hybrid framework both show an advantage compared to OS, and HR is the most efficient one for P 3 .In Fig. 3 (d), we further find the total number of measurements is about M K = 200d 0.75 to make Error(P 3 ) ≤ 0.1, showing great enhancement than OS protocol [13,19].And the advantage of the hybrid framework is more significant for measuring higherorder moments like P 4 , and we leave more discussions in Ref. [31].
Application in quantum error mitigation.-Recentlypurified-based methods are proposed [33,34] for quantum error mitigation [35][36][37][38] The central task there is to estimate o m := tr(Oρ m ), and use the normalized value o m /P m → tr(OΨ) to approach the target value tr(OΨ) with Ψ the noiseless state, which can suppress the error exponentially with the copy-number m.The original protocol is based on the swap test to measure o m , and very recently there have been ones by shadow estimation [39,40].There is also an experimental advance for that of m = 2 [41], however, similarly it is still challenging to extend both approaches to m ≥ 3.Here we show that the hybrid framework gives various advantages on estimating o m .
Suppose one has access to the coherent operation on m-copy quantum states.By only adopting the classical postprocessing, the shadow set collected in Algorithm 1 for t = m can be reused for many different observables {O i }, say totally T ones, and the estimation error scales like O(log(T )) using the median-of-mean technique [12].However, in principle the swap test approach should adopt different quantum circuits for different observables [34], and also the error would scale linearly O(T ).This advantage is significant for quantum chemistry simulation with the polynomial number of terms in Hamiltonian [41][42][43].On the other hand, compared to OS protocol [39], HS significantly reduces the statistical variance and thus the sampling cost.For instance, suppose one applies Pauli measurements with OS protocol to estimate o m for a local observable O. Since o m = tr(O m ρ ⊗m ) and ) the symmetrized observable is actually a global one with locality about mn [12,13], and thus the shadow norm scales exponentially with n.
While in HS protocol, the variance is independent of the qubit number n by Eq. ( 7).See Fig. 3 (a) for this exponential advantage as t = 2, and similar advantage also appears in the fidelity estimation using the Clifford measurements as shown in Fig. 3 (b), with more discussions left in Ref. [31].
In reality, one generally can not implement Algorithm 1 directly for t = m when m ≥ 3 due to the hardware limitation.Like the moment estimation, one can alternatively patch low-degree snapshots to measure higherdegree o m .For instance, when m = 3 and t = 2, similar as Eq. ( 11), one can construct the estimator , and the error also scales logarithmically with the total number of observebles.In addition, one can construct an alternative estimator o 3 by following Eq.( 13).Similar as in the moment-estimation of P 3 , both estimators show statistical error advantages compared to the original shadow.The details of the unbiased estimator for general observable O and numerical results are left in Ref. [31].
Concluding remarks.-Thehybrid framework pro-posed here utilizes the partial coherent power of quantum devices and can act as a subroutine for many quantum information tasks, for instance, measuring entanglement [44][45][46][47], characterizing quantum chaos [48,49], and constructing quantum algorithms [50][51][52].Note that the framework reduces to previous RM protocols as t = 1 in Algorithm 1, and the advantage essentially comes from the coherent processing the few-copy state.So it is intriguing to build the ultimate result of this replica advantage [53,54] considering the limited quantum memory.Moreover, it is also appealing to extend the current framework to quantum channel [55][56][57][58] and boson or fermion systems [59,60].21 In this Supplementary Material, we give the proof and more discussions and generalizations of the results in main text.In Sec.A, we mainly prove the central result Theorem 1 in main text.In Sec.B, we further generalize Theorem 1, and show the hybrid estimation framework for general nonlinear functions defined in Eq. ( 1) in main text.Sec.C considers the application to the state-moment estimation.Sec.D discusses the application to the virtual distillation, and the hybrid measurement procedure for the general Hermitian operator O is also given.
Here we prove Theorem 1 in main text, related to the single-shot unbiased estimator of ρ t , that is, the M = 1, K = 1 case in Algorithm 1 .And then we extend the result for general M and K in the following subsection.
First, we prove the estimator in Eq. ( 6).
Proof.We take the expectation value of the estimator on the random unitary U and the projective measurement result b, b c to get where in the first line we insert the probability Pr(b, 0/1|U ) accounting for the measurement result 0/1 of the control qubit in the X-basis, and the final line is by Eq. ( 5) in main text.Inserting the result of Eq. (A2) into the final line of Eq. (A1), one gets The equality holds due to the following fact: by taking σ = ρ t , the equation is just the definition of the unbiased estimator for σ in the original shadow protocol [12].
Second, we prove the result of variance in Eq. (7).

Proof. By definition, the variance of tr
with the first term being where we insert the definition of ρ t of Eq. ( 6  By inserting Eq. (A6) into the last line of Eq. (A5), one gets (A7) The last line shows that the expectation value just equals the one in the original shadow protocol on a single-copy of the state ρ, no matter what value t takes.As a result, Note that E tr(O ρ) 2 has already been analysed by maximizing it for all possible ρ, which serves as an upper bound denoted by the square of the shadow norm O shadow [12].Consequently, by ignoring the second term tr(Oρ t ) 2 , one has the upper bound for the variance for any t as (A10) We remark that this reduction of variance of ρ t to the t = 1 case is since one has the ability to coherently control t-copy of the state.The detailed expression of O shadow actually depends on the RM primitives.We show in latter section that by using Eq.(A10) and the results of shadow norm, one can further give upper bounds for variance of more complex estimators, which are combined from a few of ρ t .

Unbiased estimator for general M and K
In Algorithm 1 in main text, one samples M independent U ∈ H d from some unitary ensemble E, and for a given U , i.e., the measurement setting, one repeats the quantum circuit in Fig. 1 (c) in main text for K shots to collect the measurement results b (i,j) c and b (i,j) .Here the superscript i labels the measurement setting, and j labels the shot.We list the estimator in the following proposition.Proposition 3. Following the procedure in Algorithm 1 in main text, one can collect M independent estimators of ρ t , i.e., ρ t Here each ρ t (j) (i) is constructed according to Eq. ( 6) in Theorem 1 by the j-th measurement result b (i,j) c and b (i,j) under i-th measurement setting.To estimate the value of tr(Oρ t ), one can further average total M rounds to get the unbiased estimator Proof.In Appendix A 1, we have proved the result in Theorem 1 that E {U,b,bc} ρ t (j) (i) = ρ t for the single-shot case, that is, for any i, j.Thus the estimators here directly hold since they only involve average operations.
To evaluate the variance of the estimator in Eq. (A12).Note that ρ t are independent and identically distributed (i.i.d.) random variables.As a result, for any i 0 .
As a result, for the general M and K = 1 (which is the case considered in the original shadow estimation [12]), one can directly use the single-shot result in Eq. (7) in main text to bound the variance by dividing it by M .We leave the variance for general K to further study.Hereafter, our discussion will mainly focus on the single-shot scenario with shadow-type postprocessing.

Appendix B: Hybrid framework for general nonlinear functions
In this section, we show that our hybrid framework can work for more general nonlinear functions in the following form, We call it a m-degree function.Here for simplicity we assume that O i is both Hermitian and unitary (e.g., Pauli operators of practical interest) in the following discussion.In the swap test, one prepares the quantum state m i=1 ρ i in parallel and initializes the control qubit, and further operates Controlled-O i gate, denoted as CO i , for each ρ i after the Controlled-shift operation CS m in the quantum circuit, as shown in Fig. 5. Measuring the expectation value of control qubit using Pauli-X operator, one gets tr which is the average of F m and its conjugation F * m .
By additionally introducing the Pauli-Y measurement on the control qubit, one has (B3) By repeating the measurements and combining the Pauli-X and Y measurement results, one can obtain F m .
For the RM with shadow estimation, one can construct the following unbiased estimator, where ρ i is the snapshot for ρ i based on the shadow protocol as shown in Eq. ( 3) in main text.And one can further increase the shadow size to reduce the statistical fluctuation of the estimator.
As mentioned in main text, both methods are challenging to measure F m for a large degree m.For the swap test, it is hard to prepare so many copies of the state and realize the Controlled-shift gate among them; for the RM, the measurement times and the postprocessing budgets are quite annoying.The following hybrid framework can trade off them, and thus shows advantages.

The hybrid framework
We first define the t-degree multiplication with t < m, denoted by σ as and develop a method to perform the RM on it with the help of the partial coherent power of Controlled-shift operation on just t-copy.Here U is the random unitary applied on the t-th state ρt.Uc on the control qubit is also random, determined by the binary random variable c, which corresponds to the X(Y )-basis measurement.
The quantum circuit is similar to that for the swap test method shown in Fig. 5, but we further conduct RM on t-th state ρ t , as shown in Fig. 6.The measurement procedure follows Algorithm 1 .Note here besides the RM on ρ t , we also conduct the Pauli-X or Y measurement on the control qubit, which in some sense is also random.For simplicity we use a random variable c to denote this choice, and c = 0/1 labels the Pauli-X/Y measurement with probability 1  2 .The reason for this choice is to separate σ and its conjugate σ † , similar to the swap test method discussed in Sec.B 1.
For a single-shot of the measurement, one can collect the measurement result b c and b.Here b c denotes the result for the control qubit, and c = 0/1 for the chosen basis.We give the unbiased estimator of σ as follows.
Proposition 4. Suppose one conducts the measurement procedure shown in Fig. 6 for once, the unbiased estimator of σ shows The proof is similar to that of Theorem 1 in Sec.A 1 by further considering the expectation on the index c.
Proof.By definition, the expectation value of the estimator shows with σ defined in Eq. (B5), and the final line follows similarly as in Eq. (B2).Similarly, for the c = 1 case, one has b1 following Eq.(B3).As a result, combing them one has and by inserting it into Eq.(B8) one further has The equality holds by the definition of the unbiased estimator of σ in the original shadow protocol [12].The variance of tr(O σ) can be proved by following a similar way as in Sec.A 1.
Note that the value of tr(O σ) could be a complex number, and thus we take the square of the absolute value here.As in the expectation value case, here we first sum the index b c .Since the phases from b c and c are eliminated, similar as Eq.(A6), the summation about b c is equivalent to taking a partial trace on the control qubit.For c = 0/1, it holds that bc where we use that fact that O i = O † i and O 2 i = I i .By inserting this result to Eq. (B13), one has Here in the last line the expectation value is for the shadow snapshot of the state ρ := (ρ 1 + ρ t )/2 for any t.

(B16)
With the estimator in Eq. (B6) for the t-degree multiplication in Eq. (B5), one can patch a few of them to estimate the m-th degree function F m in Eq. (B1) .First, one can partition F m into L pieces as follows.
(B17) with each of σ i a m i -degree multiplication.Then the estimator of F m reads where each of σ i is the estimator of σ i by Eq. (B6).Indeed, one can increase the number of independent snapshots for each σ i to reduce the variance.We remark that by applying the hybrid framework, one at most needs to parallel prepare max i m i copies of the states, and then conduct the classical postprocessing on L-copy of Hilbert space H d , which significantly reduce the experimental realization and also shadow estimation budget.Moreover, the variance can be analysed using Eq.(B7).We give more detailed discussions on the advantages of the hybrid framework developed in Sec.A and B, by showing the applications for measuring the moments and quantum error mitigation in the following two sections.

(C4)
Here in the second bound for Pauli measurements we assume O = Õ⊗I 2 n−k with Õ restricted on the actively supporting k qubits, and thus supp(O) = k.
For the nonlinear function estimation, suppose ρ 1 and ρ 2 are two distinct estimators and O is some operator on H ⊗2 d , the variance with Clifford measurements can be upper bounded by For Pauli measurements with O = S 2 , the variance has the upper bound (Lemma 1 [21]) We first give the proof of the variance for P 3 in Eq. 12 in Proposition 1.
Proof.Recall that in Eq. ( 11) the unbiased estimator of P 3 shows where ρ 2 i and ρ j come from M 1 and M 2 independent snapshots.By definition, the variance shows If there is no index-coincidence, say i = i and j = j , the expectation for this kind of term returns P 2 3 .Thus one only needs to consider the following cases with the coincidence.
One coincidence with i = i and j = j : Here in the final line we omit the subscript i, and the result is just the variance of O = ρ on the estimator ρ 2 for any i.The final inequality is due to Eq. ( 7) in Theorem 1 .
One coincidence i = i and j = j : The result is just the variance of O = ρ 2 on the estimator ρ for any j, and can be bounded by the shadow norm.Two coincidences i = i and j = j : The result is the variance of O = S 2 on the composite estimator ρ 2 ⊗ ρ, and in the final line σ = ρ 2 / tr ρ 2 is normalized.
By inserting all these cases into Eq.(C7), one gets By applying the results of shadow norm and variance for Pauli measurements in Proposition 1 and taking M 1 = M 2 = M for simplicity, one further obtains by the fact tr ρ 4 ≤ tr ρ 2 2 ≤ 1.By directly applying Chebyshev's inequality, one gets that is sufficient to let the estimation error less than within some confidence level δ, i.e., Prob(|P We remark that the sampling complexity in Eq. (C13) of P 3 by the hybrid framework here is almost the same as that for P 2 by the original shadow estimation (Lemma 2 in Appendix of Ref. [13]), except that there is also a P 2 factor for the second term.This indicates that the hybrid framework reduces the variance and thus the statistical error from a 3-degree problem to a 2-degree one.
Similarly, by applying the results of shadow norm for the Clifford measurement in Proposition 1, and one can transform it to a lower bound of M similarly as Eq.(C13) by using Chebyshev's inequality.We remark that the upper bounds in Eq. (C12) and Eq.(C14) may be not tight, but they already imply the advantage over original shadow estimation.For instance, if one directly applies original shadow estimation for P 3 with estimator like P 3 (OS) ∝ tr S 3 ρ (i) ⊗ ρ (j) ⊗ ρ (k) with Pauli measurements (Eq.(D15) in Ref. [13]), the leading order of the variance looks like (related to the 3 coincidence of the indices) as shown in Eq. (D28) in Ref. [13], which is worse than Eq.(C12).The advantage on the variance of our hybrid framework is also demonstrated by numerical results in Fig. 3 in main text.The variance of the fourth moment P 4 can be bounded analytically in the same way as Var( P 3 ) and shows a similar scaling, and we do not elaborate it here.We also numerically study the scaling of the statistical error with the qubit number n using the Pauli measurement in Fig. 7 (a), which is summarized in the second column of Table II.

Alternative estimators for the moments
In this subsection, we show alternative estimators of P 3 and P 4 , by adopting the postprocessing in Ref. [32] using the same RM data collected in Algorithm 1 for t = 2.We first prove the result of P 3 in Proposition 2.
Proof.In each shot of the measurement, as shown in Fig. 1 (c), the state is initialized to be ρ c,I,II = |+ +| ⊗ ρ ⊗2 denoting the joint state.After the Controlled-S 2 operation and the random unitary evolution, the state before the projective measurement shows As there is no measurement on the first copy of ρ, by taking a partial trace one gets the remaining density matrix on the control qubit and the second-copy as After measuring the second state in computational basis and getting the result b, the (unnormalized) state of the control qubit becomes As a result, by measuring the control qubit in the X-basis, the probabilities for 0/1 outcomes are For any term in the summation of the estimator P 3 in Proposition 2, say the i-the measurement setting, and j, j -th shots, the expectation value shows (here we omit the labels of i, j, j for simplicity) where the third line is according to Eq. (C19).Here we define the diagonal operator and the 2-fold twirling channel can reproduce the swap operator E U U †⊗2 XU ⊗2 = S 2 [32] .Note that the choice of classcial postprocessing X c\p (b, b ) and thus the operator X c\p depend on the random unitary ensemble being n-qubit Clifford or tensor-product single-qubit ones.Then by averaging all the terms in P 3 , one finishes the proof.
Similarly, one can construct an alternative unbiased estimator for P 4 as by adding the information of the control qubit of j -th shot.The proof is quite similar, for any term in the summation the expectation value shows At the end of this subsection, we give a brief discussion on the statistical variance of the estimator P 3 in Proposition 2 by following the approach in the previous works [14,16], and the same analysis works for P 4 .For clearness, here we show the estimator P 3 in Proposition 2 again as It is clear that P 3 is the summation of M i.i.d estimators, so here we first consider the case M = 1, and omit the index i of (i, j) for the i-th setting and also the hat for the random variable for concise.By definition, the variance shows and the first term is Note that {b c , b} here are all random variables, and the calculation of each term in the summation depends on the index coincidence, similar as Eq.(C7) for P 3 .Here for simplicity, we only consider the two-coincidence case, that is j 1 = j 2 , j 1 = j 2 , or j 1 = j 2 , j 1 = j 2 , which contributes to the leading term of the variance [14,16,47], and one can also figure out the other sub-leading terms with less coincidence in a similar way.
For j 1 = j 2 , j 1 = j 2 , there are K(K − 1) terms in the summation of Eq. (C24), and one of them would show as where the final equality follows similarly as Eq.(C20), and the 2-copy twirling channel is denoted as is cancelled, and thus one has ρ but not ρ 2 in the final result.For j 1 = j 2 , j 1 = j 2 also with K(K − 1) terms, and the second equality can follow Eq.(C22).By combing both cases, one has the leading term of the variance as where the inequality is due to ρ − ρ 2 and Φ 2 E (X 2 ) are non-negative.Note that the final bound is actually the leading term of the variance of P 2 by using the original RM [32], which indicates that the hybrid framework reduces the variance of a 3-degree problem to a 2-degree one, similar as in the hybrid shadow estimation shown in Sec.C 2.
The twirling result Φ 2 the swap operator for the i-th qubit pair [16].As a result, for the Clifford measurement primitive, For the Pauli measurement primitive, where A ⊆ [n] is any subsystem of the n-qubit system, ρ A = tr Ā(ρ) the reduced density matrix on A with tr ρ 2 A ≤ 1, and the final equality is by summing the binomial with d = 2 n .As a result, to make the statistical error to be some constant, the shotting time should be K = O(d (log 2 3)/2 ) = O(d 0.79 ).
From Eq. (C29), it is clear that the variance is related to the subsystem purity.For the noisy GHZ state ρ = q |GHZ GHZ| + (1 − q)I/d used in the numerics, here we give a refined estimation of the scaling.Note that for subsystem which is only related to the qubit number in A, and for A = [n] the whole system, tr ρ 2 = q 2 + (1 − q 2 )2 −n .By inserting all these purity into Eq.(C29), one has

(C31)
As a result, the final variance is the interpolation between d log 2 3 and d log 2 2.5 , and thus the shot-number K is between O(d (log 2 2.5)/2 ) = O(d 0.66 ), and O(d 0.79 ), which is consistent with our numerical results.The variance of P 4 in Eq. (C21) can be analysed in a similar way, and the leading term shows the same scaling behaviour as P 3 in Eq. (C28) and (C29).We also numerically study the scaling of the statistical error with the qubit number n using the Pauli measurement in Fig. 7 (a), which is summarized in the final column of Table II.
TABLE II.The statistical errors for estimating P4 with different protocols using the Pauli measurement.The numerical results are from Fig. 7 (a).The analytical results of HS and HR follow similarly as that of P3 in Eq. (C12) and (C29), respectively.The numerical result of OS is not shown since the measurement and post-processing costs are too demanding for the 4-degree function.

Appendix D: Application in quantum error mitigation
In the purification-based quantum error mitigation [33,34], the central task is to measure o m := tr(Oρ m ) and P m .The measurement of P m has been discussed in the previous section.Here we give discussions about estimators of o m and their performance via the developed hybrid framework.hybrid approach; for Clifford measurements, the leading term of the upper bound of the variance is like d tr O 2 /M 2 [12].Thus in each cases, the variance would scale with the Hilbert space dimension, thus exponentially with the qubit number.These exponential advantages are also manifested by the numerical results in Fig. 3(a) and (b) in main text.

Estimators of o3 and o4
To measure o m with a larger m, for instance m ≥ 3, it is challenging to directly collect the shadow snapshot ρ m due to the hardware limitation.Alternatively, here we combine a few low-degree shadow snapshots, as for estimating the moments.In main text, we have shown the estimator of o 3 , and we give that of o 4 by modifying Eq. (C2) as follows.
where { ρ 2 (i) } is the shadow set of ρ 2 .Here O 2 = (OS 2 + S 2 O)/2, and in the second line we put this symmetry in the summation of indices.The variance of o 3 and o 4 can be analysed in the same way as P 3 in Sec.C 2, and show similar behaviour.We also numerically study the scaling of the statistical error with the qubit number n using the Pauli measurement in Fig. 7 (b) and (c), which is summarized in the second column of Table III TABLE III.The statistical errors for estimating o3 with different protocols using the Pauli measurement.The numerical results are from Fig. 7 (b).The analytical results of HS and HR follow similarly as that of P3 in Eq. (C12) and (C29), respectively.Consequently, in practice, one needs M = O(d 1.25 ) for OS, M = O(d 1.05 ) for HS, and K = 0.71 ) for HR to make the error less than some constant.It is clear that HS and HR from the hybrid framework both show an advantage compared to OS.
Similar to the measurement of moments P 3 and P 4 in Sec.C 3, here we give alternative estimators of o 3 and o 4 with the post-processing strategy in Ref. [32].The central idea is to effctively make RMs on ρOρ.
The first method decompose O and additionally applies control operation.Any Hermitian O can be decomposed into the form O =  4) in main text is a special case of (a), when O is also a unitary.In (b), we take the spectral decomposition of O as O = V ΛOV † , and perform the unitary V † on the control qubit.Besides the measurement on the control and final copy, we also conduct projective measurement on the first copy to get the result b1.
Proof.The proof of the unbiasedness of o 3 is similar to that of P 3 in Sec.The proof of the unbiasedness of o 4 in Eq. (D6) is quite similar to that of P 4 in Eq. (C21), and we do not elaborate it here.
The variance of o 3 and o 4 can be analysed in a similar way as that of P 3 , and the leading term shows the same scaling behaviour as in Eq. (C28) and (C29).We also numerically study the scalings of the statistical error with the qubit number n using the Pauli measurement in Fig. 7 (b) and (c), which are summarized in the third column of Table III and Table IV, respectively.The operator O there is taken to be Pauli operator, and thus the control unitary in Fig. 8  The second method is by directly measuring in the diagonal basis of O on the first-copy, and taking into account the measurement result in the final estimator.The advantage compared to the first one is that here one do not need additional control operation.

FIG. 2 .
FIG.2.The demonstration of Eq. (5) by the tensor diagram.The vertical dotted lines denote the periodic boundary condition, i.e., the trace operation.The shift operation St and its conjugate S † t are represented by the cyclic permutations of the indices (legs) of the t copies of ρ.

FIG. 3 .
FIG. 3. Scaling of errors and the measurement times using OS, HS, and HR protocols.Here the state is the noisy nqubit GHZ state ρ = 0.8 |GHZ GHZ| + 0.2I d /d.The random Pauli measurements are used for (a), (c), and (d), and the random Clifford measurements are used for (b).In (a), (b), and (c), we set M = 10 and K = 1 for OS and HS, and M = 2 and K = 5 for HR.In (a), we estimate local observable O = σ 1 Z ⊗σ 2 Z .In (b), we estimate F2 = GHZ| ρ 2 |GHZ with O = |GHZ GHZ|.In (d), we set M = 400 and find the total measurement is about M K = 200d 0.75 to keep the error less than 0.1 using HR.

( 13 )
Here the choice of the postprocessing function, X c (b, b ) = −(−d) δ b,b or X p (b, b ) = n k=1 −(−2) δ b k ,b k , depends on the RM primitives, random Clifford or Pauli measurements.
A1) with Pr(U, b, b c ) the joint probability distribution for these random variables.Given U and b, one can first sum the index b c as bc Pr(b, b

FIG. 4 .
FIG. 4. The demonstration of the derivation of Eq. (A6) by tensor diagram.The vertical dotted lines denote the periodic boundary condition, i,e., the trace operation.The shift operation St and also its conjugate S † t are represented by the permutation of the indices of the t-copy.On account of the identity Ic (partial trace) on the control qubit, there only two terms corresponding to |0 c 0| and |1 c 1| from |+ c +| survive, and two terms both return b| U ρU † |b , i.e., conducting the RM on ρ.
) in main text in the first line and use the self-adjoint property of M −1 to equivalently act it on O in the second line.For simplicity, we omit the subscript {U, b, b c } in the expectation.The interesting point is that the sign with respective to b c disappears due to the square.By first summing the index b c , which implies taking a partial trace on the control qubit, one has bc Pr(b, b A6)   where in the third line only two terms corresponding to the diagonal terms |0 c 0| and |1 c 1| from |+ c +| survive due to the partial trace, in the last line we use S t ρ ⊗t S † t = ρ ⊗t .See Fig.4for a diagram representation of the derivation.

1 .FIG. 5 .
FIG.5.The quantum circuit of the swap test to measure Fm({ρi}, {Oi}) in Eq. (B1).Compared to the circuit in Fig.1 (a), one additionally adds COi operations.The unitary Uc on the control qubit before the projective measurement determines the measurement basis.For the X-basis, Uc = H as in main text; for the Y -basis, Uc = e −i π 4 Z H.

FIG. 6 .
FIG.6.The quantum circuit to effectively conduct the RM on σ in Eq. (B5).Here U is the random unitary applied on the t-th state ρt.Uc on the control qubit is also random, determined by the binary random variable c, which corresponds to the X(Y )-basis measurement.

) 2 ≤ 4 where
such that E {U,c,b,bc} ( σ) = σ.Here b c and b are the measurement results of the control qubit and the final copy ρ t respectively; c and U denote the measurement basis of the control qubit and the random unitary, respectively; the inverse operation M −1 depends on the random unitary ensemble one applied.To evaluate the value of some observable O on σ, the variance of the estimator o = tr(O σ) shows Var [tr(O σ)] = 4 Var tr O 0 ρ + tr(Oρ ) 2 − | tr(Oσ)| Var tr O 0 ρ is the variance of measuring O on the shadow snaptshot of the state ρ = (ρ 1 + ρ t )/2, which can be upper bounded by the square of shadow norm O 0 shadow for the traceless part O 0 = O − tr(O)I d /d.[12].
B8) Here the joint probability distribution Pr(U, c, b, b c ) = Pr(U )Pr(c)Pr(b, b c |U, c), since the the choices of U and c are independent, and Pr(c) = 1 2 for c = 0, 1.Given U and b, and for c = 0 (the X-basis), one can first sum the index b 0 as b0 Pr(b, b 0

Fact 1 .
In the shadow estimation[12], for a single-shot estimator ρ of the quantum state ρ ∈ H d , and an observable O with its traceless part O 0 , the shadow norms of the Clifford and Pauli measurement primitives are respectively upper bounded by O 0 2 shadow(c) ≤ 3 tr O 2 (P roposition S1); O 0 2 shadow(p) ≤ 2 supp(O) Õ 2 2 (P roposition S3).

1 2 OFIG. 8 . 1 2
FIG.8.The quantum circuits of two protocols to measure om = tr(Oρ m ).In (a), we decompose the observable as O =1 2 O ∞ VO + V †O , and perform a Controlled-VO operation on the control qubit and the first copy of ρ after the Controlledshift operation.Fig.(4) in main text is a special case of (a), when O is also a unitary.In (b), we take the spectral decomposition of O as O = V ΛOV † , and perform the unitary V † on the control qubit.Besides the measurement on the control and final copy, we also conduct projective measurement on the first copy to get the result b1.
(a) is directly the Controlled-O operation.The statistical errors for estimating o4 with different protocols using the Pauli measurement.The numerical results are from Fig.7 (c).The analytical results of HS and HR follow similarly as that of P3 in Eq. (C12) and (C29), respectively.The numerical result of OS is not shown since the measurement and post-processing costs are too demanding for the 4-degree function.
2 E (X 2 ) depends on the random unitary ensemble E. For the Clifford measurement primitive, it shows Φ 2 Ec (X 2 c ) = dI d 2 + (d − 1)S 2 ; for the Pauli measurement primitive, it is in the tensor-product form Φ 2 and TableIV, respectively.