A Bayesian analysis of classical shadows

The method of classical shadows proposed by Huang, Kueng, and Preskill heralds remarkable opportunities for quantum estimation with limited measurements. Yet its relationship to established quantum tomographic approaches, particularly those based on likelihood models, remains unclear. In this article, we investigate classical shadows through the lens of Bayesian mean estimation (BME). In direct tests on numerical data, BME is found to attain significantly lower error on average, but classical shadows prove remarkably more accurate in specific situations—such as high-fidelity ground truth states—which are improbable in a fully uniform Hilbert space. We then introduce an observable-oriented pseudo-likelihood that successfully emulates the dimension-independence and state-specific optimality of classical shadows, but within a Bayesian framework that ensures only physical states. Our research reveals how classical shadows effect important departures from conventional thinking in quantum state estimation, as well as the utility of Bayesian methods for uncovering and formalizing statistical assumptions.


INTRODUCTION
Measurement and characterization of quantum systems comprise a long-standing problem in quantum information science [1].However, the exponential scaling of Hilbert space dimension with the number of qubits makes full characterization extremely challenging, inspiring a plethora of approaches designed to estimate properties of quantum states with as few measurements as possible, such as compressed sensing [2,3], adaptive tomography [4][5][6], matrix product state formulations [7], and neural networks [8][9][10].Very recently, a groundbreaking approach known as classical shadows was proposed and analyzed [11].Building on and simplifying ideas from "shadow tomography" [12], the classical shadow was shown to provide accurate predictions of observables with a fixed number of measurements, including simulated examples for quantum systems in excess of 100 qubits [11].Astonishingly simple, the classical shadow is formed by collecting the results of random measurements on a repeatedly prepared input state, and inverting them through an appropriate virtual quantum channel.
However, several features of the classical shadow remain enigmatic, including its highly nonphysical nature, optimality with respect to alternative cost functions, and relationship to more conventional likelihood-based tomographic techniques.One such method, Bayesian mean estimation (BME) [13], provides a conceptually straightforward path to estimate a quantum state given measured data, making use of prior knowledge and providing meaningful error bars for any experimental conditions.BME appears particularly well suited for contextualizing classical shadows, since it returns a principled estimate under any number of measurements (even zero), and is optimal in terms of minimizing average squared error [14].* lukensjm@ornl.gov In this work, we directly compare the estimates of classical shadows and BME for identical simulated datasets.For particular observables with relatively improbable values from the perspective of BME, shadow is found to reach the ground truth with significantly fewer measurements.However, after properly reformulating the problem under test for consistency with the Bayesian prior, the situation reverses, with BME returning estimates possessing lower error on average.In the latter portion of our investigation, we seek to construct a BME model emulating the key features of the classical shadow, but with positive semidefinite states as support.While complicated by the shadow's nonphysical nature, we ultimately propose an observable-oriented pseudo-likelihood that rates quantum states by their observable values with respect to those of shadow.Our pseudo-likelihood successfully mimics the dimension-independence of shadow, with the advantage of delivering entirely physical estimates for any number of measurements.

Problem Formulation
Classical Shadows.For our analysis, we invoke the setup of the original classical shadow proposal [11].Consider a D-dimensional Hilbert space occupied by a ground truth quantum state ρ g that can be repeatedly prepared.On each preparation m, ρ g is subjected to a randomly chosen D × D unitary U m and one measurement is performed in the computational basis, leaving result |b m .Defining |ψ m = U † m |b m , the classical snapshot associated with measurement m follows as M −1 (|ψ m ψ m |), where M(•) is the quantum channel defined by averaging over all possible unitaries and outcomes.
We assume the U m are drawn from the set of D × D Haar-random unitaries, in which case [11].(This channel holds for the more restricted class of random Cliffords as well [15,16].)Averaging over M measurements yields the shadow estimator (In what follows, the phrases "classical shadow," "shadow estimator," and simply "shadow" refer interchangeably to this estimator as well as the procedure more generally.)In this form, the simplicity of ρ s is evident: it is merely a scaled and recentered average of all observed outcomes.Interestingly, though, ρ s is in general not positive semidefinite; for M < D, ρ s possesses at least D − M eigenvalues equal to −1.Accordingly, in the targeted regime for classical shadows of M D, ρ s is highly nonphysical.Understanding the role the shadow estimator's negativity on estimation forms a central goal of the present study.Finally, defining λ as the expectation of the observable Λ (λ = Tr ρΛ), the shadow estimate thereof follows as to be compared to the ground truth λ (g) = Tr ρ g Λ.
As an aside, we note that Ref. [11] employed an additional statistical technique, "median of means," to reduce the impact of outliers by partitioning the M outcomes into K subsets and taking the median as the estimate λ (s) .In the interests of simplicity and ease of comparison, we focus on K = 1 in Eq. (1).We expect the benefits of selecting K > 1 will prove similar in both the shadow and Bayesian cases [17], but work on this is beyond the scope of the present investigation.
Bayesian Mean Estimation.
In the Bayesian paradigm, the same set of measurement outcomes D = {|ψ 1 , |ψ 2 , ..., |ψ M } is related to a possible density matrix ρ(x) via a likelihood consisting of the product of probabilities set by Born's rule: that is, L D (x) ∝ Pr(D|ρ)-the probability of receiving the dataset D given quantum state ρ.Some prior distribution π 0 (x) is also assumed, defined for parameters x such that ρ(x) is always physical: trace-one, Hermitian, and positive semidefinite.Then the posterior describing the distribution of x given the observed data D ensues from Bayes' rule: Note that the randomness of the chosen unitaries U m does not enter the Bayesian model; only the outcomes |ψ m play a role.The selection of unitary U m is independent of the (unknown) density matrix, i.e., Pr(U m = U |ρ) = Pr(U m = U ); thus any probabilities would cancel out through the normalization factor Z in Eq. ( 4).Intuitively, in the Bayesian view the experimenter knows the unitaries exactly post-experiment, regardless of how they were chosen, so imposing uncertainty on them in the estimation process proves superfluous.Consequently, while the uncertainty of BME depends strongly on the variety of measurements chosen, the theory does not, a conspicuous departure from shadow where the distribution of U m enters directly through the inverted quantum channel M −1 (•).Formally, the posterior distribution in Eq. ( 4) completes the Bayesian model.From this, one can estimate any function of ρ(x).For the most direct comparison with the classical shadow, here we focus on BME specifically, which for some observable Λ is the point estimate defined as where the last two lines follow, respectively, from the linearity of the trace operation and defining the Bayesian mean ρ B = dx π(x)ρ(x).This convenient simplification, in which the Bayesian mean of a quantity is simply its value at ρ B , holds for linear functions of ρ, which includes all quantum observables and which we focus on in this article.Moreover, λ (B) is the function of D which minimizes the mean-squared error (MSE) averaged over all possible states and outcomes.That is, with π(x, D) the joint distribution over data and parameters [14].This optimality is nonasymptotic, holding for any number or collection of unitaries {U 1 , U 2 , ..., U M }.Considering the widely different expressions for ρ s [Eq.( 1)] and ρ B [Eq. ( 5)], we found it remarkable just how well ρ s performed in Ref. [11] in light of BME's optimality in Eq. ( 6); it was this feature which initially inspired us to develop a thorough comparison between shadow and BME.
Simulated Experiments In general, comparing the performance of estimators derived from classical (frequentist) statistics-like ρ s -with those from Bayesian methods proves tricky business, since they view uncertainty in functionally different ways.Therefore we adopt a pragmatic view which aligns with the interests of experimentalists: perform experiments, compute the associated shadow and BME estimators, and calculate their error with respect to actual values.While the final step is not always possible in practice, it is in numerical simulation, where the ground truth ρ g is known exactly.Doing so enables us to illuminate the advantages and disadvantages of both approaches on equal footing.We employ the approach described in the "Methods" section for obtaining simulated datasets D.

Comparing Classical Shadows and BME
Picture 1: Fixed Ground Truth.As our first benchmark, we compare the performance of ρ s and ρ B in estimating three rank-1 observables, of which fidelities and entanglement witnesses form an important and experimentally relevant subset.Specifically, we consider where These possess ground truth values equally spaced within the physically allowed range for trace-one, rank-one observables: λ = 1 2 , and λ (g) 2 = 0.The shadow estimator is readily obtained from Eq. ( 1), so we compute ρ s for all M ∈ {1, 2, ..., 1000}, where M defines the set containing the first M measurements: On the other hand, ρ B requires evaluation of the high-dimensional integral dx π(x)ρ(x).To that end, we summon Markov chain Monte Carlo (MCMC) methods, several of which have been explored in the context of quantum state estimation, including Metropolis-Hastings [13,18], Hamiltonian Monte Carlo [19], sequential Monte Carlo (SMC) [20], and slice sampling [21,22].We select the preconditioned Crank-Nicolson algorithm [23] applied in Ref. [24], which to our knowledge is the most efficient BME approach currently available for density matrix recovery.Finally, because of our assumed pure state ground truth, we take as prior all pure states ρ = |ψ ψ| uniformly distributed on the complex D-dimensional unit hypersphere.Numerically, the parameters x reduce to a D-dimensional complex column vector, so we have The use of pure states is not central to the BME formalism whatsoever, but does permit us to simulate in higher dimensions than otherwise possible.With pure states only, our parameterization entails 2D real numbers, compared to 2D 2 + D for mixed states.As an example, for D = 256, the pure state prior, and likelihood of Eq. (3), each MCMC chain takes about ten minutes to converge on our desktop computer, which for the 400 settings involved in Fig. 1(b) amounts to ∼2.5 days.Based on previous studies [24] the mixed state version would therefore have been completely unfeasible at this dimension with our computational resources, likely taking weeks (or more) to complete [25].With pure states, then, we can focus more directly on dimensional scaling and the statistics from many trials.
For each trial, we perform BME for eight collections of measurements M ∈ {1, 50, 100, 200, 400, 600, 800, 1000}.We keep R = 2 10 samples from each chain of length RT , where we select the thinning factor T empirically to obtain convergence.Figure 1  Each column corresponds to a particular expectation value λ n ; the bottom row shows the MSE with respect to the ground truth, averaged over all trials defined as |λ trials with • = s for the shadow and • = B for BME.The classical shadows show wide variation for low M , including highly nonphysical estimates (λ n > 1 or < 0), but they converge to ground truth values rapidly, with nearly identical rates for all observables and dimensions.This is confirmed quantitatively in the MSE curves that attain values of ∼10 −3 by M = 1000 for all cases.
The behavior proves vastly different for BME.While physical estimates are always returned, the number of measurements needed to reach the ground truth varies strongly both with observable λ n and with dimension D. Intriguingly, shadow shows significantly lower MSE for λ 0 and λ 1 , widening as D increases.On first glance, this presents a paradox: Eq. ( 6) implies that λ n convincingly surpasses it these cases.Yet this dilemma can be resolved by studying the prior π 0 (x).When the Bayesian model assigns equal a priori weights to all possible states-a sensible choice for an uninformative prior-this by implication makes observable values such as λ (g) 0 = 1 highly unlikely, since only one state in the domain attains this.On the other hand, expectations for any rank-1 projector Λ on the order of λ ∼ 1 D are to be expected initially since dx π 0 (x) Tr ρ(x)Λ = 1 D .This manifests itself in Fig. 1 in BME's much lower MSE for λ 2 , whose ground truth value λ (g) 2 = 0 is much more probable.Thus, by running 50 repeated trials with the same ground truth ρ g = |0 0|, the situation over which we average does not accurately reflect the uninformative prior; the conditions for BME optimality are not met.
Picture 2: Random Ground Truth.To accurately reflect uninformative prior knowledge, we therefore must prepare random ground truth states in our simulations.To do so, we leverage the equivalence between (i) randomly prepared input states with a fixed observablethe situation of interest-and (ii) random selection of an observable for a fixed input.Consider the expectation of observable Λ, where the quantum state is rotated by some random unitary U : Thus one can emulate the effect of a randomized state by randomly rotating the observable and evaluating it on a fixed state.Practically speaking, we are free to employ the same simulated datasets and estimators ρ s and ρ B above, but select at random a different projector Λ = |φ φ| for each trial.This is equivalent to performing all trials with a random ground truth but a fixed observable.We call this randomized evaluation "Picture 2" to distinguish it from the fixed ground truth case above (Picture 1).
Results appear in Fig. 2 for (a) D = 32 and (b) D = 256.The first column plots the ground truth value λ (g) for each trial, the next three columns plot the shadow and BME estimates for increasing numbers of measurements, and the final column presents the MSE with respect to the ground truth.Now BME returns much more accurate estimates than shadow on average, and the paradox regarding Bayesian optimality is solved: the Bayesian mean gives the lowest MSE as long as the prior accurately reflects the true uncertainty of the system under test.Accordingly, this BME study clarifies an underlying assumption in selecting observables in Picture 1: being able to "guess" an observable with such high overlap to the ground truth suggests that one is not really operating under the neutrality implied by a uniform prior; an informative prior would more accurately reflect the situation.
This observation brings to light an interesting question of motivation in a given quantum experiment.In the sense of ensuring that any estimate is adequately justified by the data, the idea of "baking in" a prior favoring some subset of quantum states is undesirable.And yet, in many situations the researcher does have strong beliefsor at least hopefulness-about the state being prepared, and wants to verify this by computing an observable, such as fidelity, where it is desired that λ (g) ∼ 1.In this case, one wishes to validate such high values quickly with few measurements, but likely does not care so much about how well the procedure can estimate the ground truth when it is low (e.g., when λ (g) ∼ 1 D ), since this situation suggests a poorly prepared state anyway.Accordingly, the felt cost is stronger when error is higher for situations with λ (g) 1 D than when λ (g) ∼ 1 D , which is not captured by the standard MSE as expressed in Eq. ( 6).And as shown in our tests here, it is precisely these improbable situations wherein shadow excels over BME.Thus our simulations reveal one surprising reason classical shadows are so powerful: they perform well within those subspaces of the entire Hilbert space which are of interest to a high-fidelity system.

Emulating Classical Shadows with BME
Pseudo-Likelihood Formulation.
The dimensionindependence and rapid convergence of classical shadows for cases of interest indicate the value of a Bayesian version with similar features, both to gain further insight into shadow itself and to improve thereon by ensuring only physically acceptable states.A simple approach for custom Bayesian models, gaining traction in "probably approximately correct" (PAC) learning [26], proposes use of a pseudo-likelihood that rates a prospective state's suitability through a cost function, instead of a full likelihood based on a physical model.In quantum state tomography in particular, quadratic costs of the form ρ − ρ 2 F have been explored [18,24], where ρ signifies some point estimator and A F =

√
Tr A † A the Frobenius norm.Therefore, to obtain a physical state with properties similar to ρ s , we first suggest the pseudo- likelihood The constant K establishes the relative weight of prior and likelihood.Previously, we suggested K = M for reasonable uncertainty quantification [24]; here we consider K = M D to impart dimension-independence. (Incidentally, we have found no significant modifications to the results below when testing with K M D.) Figure 3 = g|ρ s |g ∼ 1 (cf.Fig. 1), it is odd that predictions using a BME value maximizing ψ|ρ s |ψ looks so different for D = 256.The origin of this discrepancy, however, lies in ρ s 's nonphysicality.
Plotting the average overlap between shadow and Bayesian samples (Tr ρ B ρ s ) in Fig. 3(c) and (d), we find that ρ B overlaps with ρ s more strongly than the ground truth ρ s = |g g|.Because ρ s is not positive semidefinite, Tr ρ B ρ s > 1 for all cases examined.Thus the BME procedure succeeds in finding states with strong overlap to the shadow, but the closest physical state to ρ s is not the ground truth, even though g|ρ s |g ∼ 1. Intuitively, this nonphysicality helps explain why observables with highly improbable values from the Bayesian view are estimated so much more efficiently with shadow.For a parameterization over physical states and rank-1 observable Λ, only a single state in the Hilbert space attains λ = 1, and since this represents the maximum value possible for any valid quantum state, it can only be approached from below.On the other hand, a continuum of shadow estimators ρ s permit λ = 1, for ρ s is constrained only by Hermiticity and unit-trace-not positive semidefiniteness.Therefore the estimate λ (s) can err on either the high or low side (cf.Fig. 1), pulling the shadow more rapidly to the ground truth in these extreme cases.
This discloses the second central finding of our investigation: the nonphysicality of ρ s is not a deficiency, but rather critical to obtaining dimension independence.Thus the key features of the shadow are not necessarily translated onto physical projections like the BME model here [27] or, for that matter, alternative projected-leastsquares approaches [28,29].While strange from the conventional wisdom of maximum likelihood and Bayesian mean estimation, nonphysical states are actually beneficial for classical shadows.
Observable-Oriented Pseudo-Likelihood.Deriving a positive semidefinite model emulating classical shadows remains an intriguing question, however, to eliminate unphysical estimates while retaining the favorable scaling features.With projecting directly onto ρ s proving unfruitful to this end, we note that, indeed, ρ s was never intended to serve as an accurate substitute for the true ρ g ; instead it facilitates estimates of observables [11].Accordingly, we propose the "observable-oriented pseudolikelihood" where we insert the estimates λ (s) n of N observables from ρ s .This formalism ensures only physical values are returned [through the prior π 0 (x)], and rates the fitness of proposed states through their overlap with respect to shadow's predictions of observables only.For dimensionindependence, we again set K = M D and perform BME for all simulated datasets and N = 3 above, thinning to T = 2 10 (T = 2 13 ) for D = 32 (D = 256).
The results follow in Fig. 4. Now BME shows very similar behavior to shadow: the MSE with respect to the ground truth matches shadow results from Fig. 1 closely, though BME still outperforms for λ 2 .Yet unlike shadow, BME here always gives physically permissible estimates (λ ).This pseudo-likelihood therefore attains the goal of a BME model commensurate with classical shadows.
Yet it is important to emphasize that this approach depends heavily on the quality of the classical shadow.It refines estimates from the shadow with its positive semidefinite requirement, but it does not do markedly better at estimating the ground truth state-at least for arbitrary observables.As an example, we repeat the inference procedure for an observable-oriented pseudolikelihood based solely on Λ 1 , i.e., which has ground truth value λ (g) 2 = 1 2 .Results for the D = 32 case appear in Fig. 5, where we plot the Bayesian estimates for all three observables even though the psuedo-likelihood is based on λ 1 only.The estimate λ (B) 1 closely matches shadow as designed, and λ (B) 2 agrees with the ground truth well, due to the fact its value is highly probable for a uniform prior.But λ When using the pseudo-likelihood above, all quantum states with identical overlap to Λ 1 are equally probable, of which the ground truth ρ g represents just one possibility.The estimate of λ 0 given only λ 1 information reflects the inherent uncertainty within this specification.So to summarize, our observable-oriented pseudo-likelihood builds physicality into shadow, yet it can only (in general) accurately predict the N observables injected into it: to infer quantities beyond these N can prove unreliable.

DISCUSSION
Our numerical investigations here have elucidated two fascinating features of classical shadows: 1. Classical shadows perform extremely well at predicting "unlikely" observables, i.e., those which obtain high values only on a restricted subset of states within the complete Hilbert space.
2. The nonphysicality of classical shadows is critical to their dimension-independence and accuracy under few measurements.
These findings do not contradict the optimality of Bayesian methods expressed in Eq. ( 6): BME with a full likelihood minimizes MSE for any number and collection of measurements, provided the prior distribution accurately reflects the true knowledge involved.The predictive power of ρ s , then, derives from the fact that the situations in which it is much more accurate that BME are often of particular interest in practice, such as verification of a high-fidelity or highly entangled quantum state.Desiring to extend these features in the Bayesian context, we proposed an observable-oriented pseudo-likelihood that attains shadow's dimension-independence and statespecialized accuracy, with the advantage of guaranteed physicality.
Nonetheless, in all these explorations there remains one prominent sense in which classical shadows unquestionably eclipse BME: computational efficiency.The shadow estimator ρ s is formed directly from measurements for any dimension D; yet computing ρ B requires tedious MCMC methods, with the number of parameters increasing linearly (quadratically) with D for a pure (mixed) state prior.Here we considered up to D = 256, a far cry from the D = 2 120 example in Ref. [11], where there is no hope for BME with a parameterization such as ours.Moving forward, it would therefore seem profitable to explore simplified Bayesian models that maintain a fixed parameter dimensionality even as the Hilbert space grows exponentially.For example, if one could specify a prior and likelihood on an observable λ only, to the effect of π(λ) ∝ L D (λ)π 0 (λ), the inference procedure would not be limited directly by exponentially large Hilbert spaces.In this way, Bayesian methods could be extendable to the types of quantum systems sought for practically useful quantum computation.
Overall, our analyses have revealed the value of BME as a tool for shedding light on estimation procedures which formally have no connection to the Bayesian paradigm.The numerical simulations here reveal the complementary strengths of classical shadow and Bayesian tomographic approaches in the efficient estimation of quantum properties.And so we expect valuable opportunities for both methods as quantum information processing resources continue to mature in size and complexity.

Data Simulation Approach
The method of classical shadows introduced in Ref. [11] involves application of a Haar-random (or effectively Haar-random) unitary U followed by measurement in the computational basis.We exploit the fact that our target state is pure to substantially reduce the complexity of simulating this procedure.In particular, our simulation method requires the generation of only size-D random vectors rather than D × D random unitaries.
Without loss of generality we work in a rotated basis such that the first basis state coincides with the ground truth: ρ g = |0 0|.Then the probability of observing outcome j depends only on That is, the distribution of outcomes depends only on the first row of U † .Now, when U is Haarrandom, each individual row and column of U † is a uniformly distributed length-1 vector u.Furthermore, given any component u j , the remaining components are a uniformly distributed vector of length 1 − |u j | 2 .A uniformly random vector u, corresponding to the first row of U † , may be obtained by generating D complex normal random values and normalizing them to yield a unit length vector.An outcome n ∈ {0, 1, . . ., D − 1} is then chosen with probability |u n | 2 .This selects the nth column of U † .Since this column (whichever it is) is uniformly distributed, its remaining elements are uniformly distributed with length 1 − |u n | 2 .The explicit procedure is as follows: 1. Posit a measurement unitary , where each φn is a column vector corresponding to one of the D possible output states.∼ N (0, 1) + iN (0, 1) and normalize These define projections of the unitary's basis states on the ground truth: u n = 0| φn , or in other words, the elements in the first row of U † m .

Finally, take
as the measured state.
Utilizing this method, we performed 50 independent trials with 1000 measurements each, for Hilbert space dimensions D = 32 and D = 256, giving a total of 100 datasets which are used in all subsequent tests above.The two values of D were selected specifically to clarify how classical shadows and BME differ in their scaling with dimension.
possible MSE for any n and M , and yet λ (s)

FIG. 1 .
FIG. 1.Comparison of shadow and BME estimates of λn for (a) D = 32 and (b) D = 256.Results from fifty trials for each dimension are plotted, assuming a fixed ground truth state |0 .

FIG. 2 .
FIG. 2. Estimating rank-1 observable Λ for randomly chosen ground truth states (Picture 2).(a) D = 32 case.(b) D = 256 case.The first four columns show λ values for each trial; the last column plots MSE with respect to ground truth over all trials.

2 F
(a) and (b) show the BME results obtained for D = 32 and D = 256, respectively, where we again return to Picture 1 with fixed ground truth for all trials.For the tests here, thinning of T = 2 8 (T = 2 10 ) is used for the D = 32 (D = 256) MCMC chains.Compared to the shadow results of Fig. 1, the BME predictions still do not reach ground truth values for λ 0 and λ 1 efficiently.This proves intriguing, since ρ−ρ s with ρ = |ψ ψ| is minimized precisely by states for which ψ|ρ s |ψ is large.So if λ (s) 0

2 .
Generate D complex normal samples w n i.i.d.