Efficient characterizations of multiphoton states with an ultra-thin optical device

Metasurface enables the generation and manipulation of multiphoton entanglement with flat optics, providing a more efficient platform for large-scale photonic quantum information processing. Here, we show that a single metasurface optical device would allow more efficient characterizations of multiphoton entangled states, such as shadow tomography, which generally requires fast and complicated control of optical setups to perform information-complete measurements, a demanding task using conventional optics. The compact and stable device here allows implementations of general positive operator valued measures with a reduced sample complexity and significantly alleviates the experimental complexity to implement shadow tomography. Integrating self-learning and calibration algorithms, we observe notable advantages in the reconstruction of multiphoton entanglement, including using fewer measurements, having higher accuracy, and being robust against experimental imperfections. Our work unveils the feasibility of metasurface as a favorable integrated optical device for efficient characterization of multiphoton entanglement, and sheds light on scalable photonic quantum technologies with ultra-thin optical devices.


INTRODUCTION
Metasurface, an ultra-thin and highly integrated optical device, is capable of full light control and thus provides novel applications in quantum photonics [1].In photonic quantum information processing, multiphoton entanglement is the building block for wide range of tasks, such as quantum computation [2], quantum error correction [3], quantum secret sharing [4,5], and quantum sensing [6].Recent investigations highlighted the feasibility of metasurface in generation [7,8], manipulation [9][10][11], and detection [12,13] of multiphoton entanglement, indicating metasurface as a promising technology of ultra-thin optical device for large-scale quantum information processing.
Characterization of multiphoton entanglement provides diagnostic information on experimental imperfections and benchmarks our technological progress towards the reliable control of large-scale photons.The standard quantum tomography (SQT) [14] requires an exponential overhead with respect to the system size.Recently, more efficient protocols have been proposed and demonstrated with fewer measurements, such as compressed sensing [15,16], adaptive tomography [17][18][19] and selfguided quantum tomography (SGQT) [20][21][22].Shadow tomography, which was first proposed by Aaronson et al. [23] and then concreted by Huang et al. [24], efficiently predicts functions of the quantum states instead of state reconstruction.Huang's protocol [24] is hereafter referred as shadow tomography.Shadow tomography is efficient in estimation of quantities in terms of observable (polynomial), including nonlinear observables such as purity and Rényi entropy [25][26][27][28], which is of particular interest in detecting multipartite entanglement [29][30][31][32] and thus is helpful in benchmarking the technologies towards generation of genuine multipartite entanglement [33][34][35].Nevertheless, shadow tomography generally requires the experimental capability of performing information-complete measurements, leading to the consequence that the time of switching experimental setting is much longer than that of data acquisition.A potential solution is to replace the unitary operations and projective measurements with positive operator valued measures (POVMs), which is capable to extract complete information in a single experimental setting [36][37][38].The POVM significantly alleviates the experimental complexity to perform shadow tomography, and thus enables the real-time shadow tomography, i.e., an experimentalist is free to stop shadow tomography at any time.However, a compact and scalable implementation of POVM in optical system is still technically challenging.On the other hand, shadow tomography is not able to easily predict the properties that cannot be directly expressed in terms of observables (polynomial) such as von Neumann entropy S(ρ) = −Tr(ρ log ρ), which is key ingredient in topologi- cal entanglement entropy [39,40].
In this work, we report an implementation of POVM enabled by a metasurface, which is based on planar arrays of nanopillars and able to provide complete control of polarization.The POVM we achieved allows to implement real-time shadow tomography, and observe the shadow norm that determines sample complexity.Moreover, we show that the metasurface-enabled shadow tomography can be readily equipped with other algorithms, enabling the unexplored advantages of shadow tomography .In particular, we propose and implement shadow tomography optimized by simultaneous perturbation stochastic approximation (SPSA) [41], the socalled self-learning shadow tomography (SLST).SLST efficiently returns a physical state with high accuracy against the metasurface-induced imperfections, which can be further used to calculate the quantities that cannot be expressed in terms of directly observable.We also implement robust shadow tomography [42] to show the robustness of reconstruction against the engineered optical loss.

Shadow tomography with POVM.
We start by briefly reviewing the shadow tomography with POVM.
Considering a 2-level (qubit) quantum system, a set of L rank-one projectors {|ψ l ⟩⟨ψ l | ∈ H 2 } L l=1 is called a quantum 2-design if the average value of the secondmoment operator (|ψ l ⟩⟨ψ l |) ⊗2 over the set is proportional to the projector onto the totally symmetric subspace of two copies [43].Each quantum 2-design is propor- L |ψ l ⟩⟨ψ l | being positive semidefinite and satisfying L l=1 E l = 1 2 .Note that quantum 1-design is sufficient to form a POVM but is not always informationcomplete for tomography, such as the measurement on computational basis {|0⟩, |1⟩}.Measuring a quantum state ρ using POVM E results one l ∈ [L] outcome with probability Pr(l|ρ) = Tr(E l ρ) according to Born's rule.The POVM E together with the preparation of the corresponding state |ψ l ⟩ can be viewed as a linear map M : H 2 → H 2 , and the 'classical shadow' is the solution of least-square estimator with single experimental run, ρ(m) For an N -qubit state, the classical shadow is the tensor product of simultaneous single-qubit estimations ρ(m) =  1b).Note that projection on |ψ l ⟩ with equal probability is guaranteed with post-selection to eliminate the mode mismatch between incident light (Gaussian beam) and metasurface (square) (see Supplementary Note 5 for details).The metasurface is an array (with square pixel of s = 500nm) of single-layer amorphous silicon nanopillars on quartz substrate as shown in Figure 1c and Figure 1d.The nanopillars are with the same height of 700nm but different l x ′ , l y ′ and orientation θ relative to the reference coordinate system.In this sense, a single nanopillar can be regarded as a waveguide with different rectangular cross profile that exhibits corresponding effective birefrin-gence, leading to spatial separation between orthogonal polarizations [44].The metasurface is divided into three regions with same size of 210µm×70µm but different arrangement of nanopillars, i.e., (θ, l x ′ , l y ′ ).By carefully designing the arrangement of nanopillars, we can realize spatial separation of |H⟩/|V ⟩, |+⟩/|−⟩ and |R⟩/|L⟩, respectively.To validate the capability of fabricated metasurface to perform information-complete measurement, we test metasurface with input states of |ψ l ⟩ and measure the distribution of output intensity on focal plane.The results are shown in Figure 1e, according to which we reconstruct the Stokes parameters (s 1 , s 2 , s 3 ) shown in Figure 1f.Compared to the ideal values, the average errors of reconstructed Stokes parameters are 0.101±0.005,0.086±0.005and 0.073±0.005,respectively.These errors are mainly caused by the discretization of phase front in design, which inevitably introduces higher-order deflections [45] (see Supplementary Note 4 for details of metasurface).Estimation of observables.We first perform shadow tomography with the fabricated metasurface on singlephoton pure state |ψ γ,ϕ ⟩ = cos γ|H⟩ + sin γe iϕ |V ⟩ with γ = 0.91 and ϕ = 0.12.As shown in Figure 2a, the polarization-entangled photons (central wavelength of 810 nm) are generated from a periodically poled potassium titanyl phosphate (PPKTP) crystal placed in a Sagnac interferometer via spontaneous parametric down The sample complexity of estimation is further characterized by the variance Var( Ô) = Var(ô) ≤ ||O|| 2  shd .Here the shadow norm ||O|| 2 shd [24] is the maximization of Var(ô) over all possible states ρ to remove the statedependence.For ideal E octa , the shadow norm ||O|| 2 shd = 0.75 regardless of the explicit form of O ∈ O (see Sup-plementary Note 1D for deviation of ||O|| 2 shd = 0.75).Experimentally, the variance of a single-shot estimation is It is impossible to maximize Var(ô) over all possible |ψ γ,ϕ ⟩ in experiment, so that we prepare totally 20 |ψ γ,ϕ ⟩ that are uniformly distributed on Bloch sphere, forming a state set of P .For each prepared |ψ γ,ϕ ⟩, we perform shadow tomography and estimate the expectation of O = |+⟩⟨+|.The results of max P Var(ô) are shown in Figure 2d, in which we observe that max P Var(ô (m) ) converges to 0.75 when M > 255.In Figure 2e, we show max P Var(ô) of 128 observables O ∈ O with M = 315 measurements, which agrees well with the theoretical prediction that the shadow norm is a constant regardless of the explicit form of O ∈ O [38].To give a comparison, we simulate max P Var(ô) with symmetric informationally complete (SIC) POVM E SIC [46], which is constructed with the minimal number of 4 measurements for qubit system and has been widely adopted in investigations of advanced tomography [47][48][49].As shown in Figure 2e, the shadow norm with E SIC depends on observable O and generally larger than that with E octa , which indicates E octa requires less shots M than E SIC to achieve the same accuracy of estimation Ô.

State reconstruction.
The direct estimation from classical shadows ρ(m) , i.e., ρ = 1/M M m=1 ρ(m) , is generally not a physical state with finite M measurements, which limits the application of shadow tomography in estimation of nonlinear functions [37,50].Physical constraints need to be introduced to enforce the positivity of the reconstructed state τ , which can be addressed by solving the optimization problem where τ is the proposed state that is positive semidefinite (τ ≥ 0) with unit trace (Tr(τ ) = 1), and the cost function NF (τ ) is the unbiased estimator of squared Frobenius norm with {ρ (m) } (see Supplementary Note 2A for more details).Note that the squared state fidelity adopted in SGQT [20][21][22] is not an unbiased estimator with {ρ (m) } for mixed state.We employ an iterative self-learning algorithm, i.e., SPSA algorithm, to solve the optimization problem in Eq. ( 3).SPSA is especially efficient in multiparameter optimization problems in terms of providing a good solution for a relatively small number of measurements of the objective function [51], which holds the similar spirit as shadow tomography.In traditional maximum likelihood estimation (MLE) reconstruction [52], the computational expense required to estimate gradient direction is directly proportional to the number of unknown parameters (4 N − 1 for an N −qubit state) as it approximates the gradient by varying one parameter at a time, which becomes an issue when the number of qubit is large.In SPSA, the minimization of cost function NF (τ ) is achieved by perturbing all parameters simultaneously, and one gradient evaluation requires only two evaluations of the cost function.While SPSA costs more iterations to converge, it returns state with higher fidelity in limited number of iterations compared to MLE [21].More importantly, SPSA formally accommodates noisy measurements of the objective function, which is an important practical concern in experiment.
Generally, an N -qubit state τ can be modeled with d 2 parameters with d = 2 N being the dimension of τ .Thus, the proposed state τ is determined by a d 2 . SPSA optimization estimates the gradient by simultaneously perturbing all parameters r i in a random direction, instead of individually addressing each r i .In k-th iteration, the simultaneous perturbation approximation has all elements of r k perturbed together by a random perturbation vector with ∆ ki being generated from Bernoulli ±1 distribution with equal probability.Then the gradient is calculated by and r k is updated to r k+1 by  3a, where the average fidelity increases as M increases and achieves 0.992±0.001with M = 315 measurements.The fabricated metasurface is also capable to collect data required for state reconstruction with other technologies, i.e., SGQT [20][21][22] and MLE reconstruction (see Supplementary Note 3 for demonstration of SGQT).In SGQT, two projective measurements are performed with 7 experimental runs in each iteration, and SPSA is used to update the proposed state τ SGQT .The results of F = Tr(τ SGQT |ψ γ,ϕ ⟩⟨ψ γ,ϕ |) are shown with yellow triangles in Figure 3a, in which we observe an average fidelity of 0.983±0.003after 45 iterations (total experimental runs of 315 as the same as that in SLST).The results of MLE reconstruction F = Tr(τ MLE |ψ γ,ϕ ⟩⟨ψ γ,ϕ |) are shown with cyan squares in Figure 3a.When M is small (M < 60), MLE reconstruction is more accurate than SGQT.However, SLST always exhibits higher accuracy compared to other techniques with the same number of experimental runs.It is worth noting that the average fidelity with MLE reconstruction converges to 0.93 ± 0.01 and the error of reconstruction is about 0.07, which is consistent with errors in estimation of observables in Figure 2b.Although the error of metasurface reduces the accuracy of shadow tomography and MLE reconstruction, SLST and SGQT with SPSA optimization can dramatically suppress metasurface-induced error as SPSA can accommodate noisy measurements of the cost function.The accuracy of SLST does not keep increasing with the number of iterations as reflected in Figure 3b, where the converged fidelity depends on the number of experimental runs M in classical shadow collection.
We also demonstrate SLST on two-photon entangled states |ψ⟩ η = √ η|HV ⟩ + √ 1 − η|V H⟩ with η = 0.06, 0.37 and 0.87.In two-photon SLST, one photon is detected by metasurface-enabled E octa , and the other photon is detected by randomly choosing σ x , σ y and σ z measurements realized by an E-HWP and an E-QWP.In contrast to the single-photon state, the generated two-photon state ρ η is far from pure state as it is affected by more noises that are mainly attributed to high-order emission in SPDC and mode mismatch when overlapping two photons in Sagnac interferometer.Thus, the proposed state τ k should be a mixed state in general form of τ k = T † T with T being a complex lower triangular matrix (see Supplementary Note 2C for details).Accordingly, the accuracy of reconstruction is characterized by the fidelity between returned state τ k and ρ η , where ρ η is MLE reconstruction with large amount of data (M ≈ 8 × 10 5 ) collected from bulk optical setting (waveplates and PBS).Robust shadow tomography.Finally, we demonstrate robustness of SLST can be further improved by robust shadow tomography [42,54].Considering that the measurement apparatus are noisy, the measurement apparatus can be calibrated prior to performing SLST.To this end, shadow tomography is firstly performed on high-fidelity state |HH⟩ with M ′ experimental runs to calculate the noisy quantum channel M. Consequently, the classical shadow is constructed by the noisy channel, i.e., ρ(m) = M −1 (|ψ l1 ⟩⟨ψ l1 |⊗|ψ l2 ⟩⟨ψ l2 |) (See Supplementary Note 1B for details of robust shadow tomography).The framework of robust shadow tomography is valid in our experimental setting.Firstly, although two photons are detected with different measurement devices, i.e., one is the metasurface-enabled POVM while the other is randomly detected on three Pauli bases, the mathematical models of these two measurement devices are identical.Secondly, although the metasurface-induced measurement errors are different between six projections, it has been shown that gate-dependent noise can be suppressed by robust shadow tomography [42].Finally, the experimental device is able to generate |HH⟩ with sufficiently high fidelity.Otherwise, the noise in state preparation might be added in M, which introduces biased estimation of returned state.In our experiment, the fidelity of prepared |HH⟩ is 0.9956±0.0005with respect to the ideal form.To demonstrate robust SLST, we insert a tunable attenuator before metasurface to introduce optical loss from 1.5dB to 8.6dB, which accordingly reduces the fidelity of prepared state as reflected by the MLE reconstruction shown in Figure 4. Compared to SLST, robust SLST is able to enhance the accuracy of reconstruction in the presence of optical loss, especially at the high-level optical loss.It is worth mentioning that SLST itself can accommodate metasurface-induced measurement errors so that the enhancement of robust SLST is not significant when optical loss is zero.Increasing the optical loss is equivalent to stronger measurement noise.We observe the significant enhancement of robust SLST at high-level optical loss, which indicates robust SLST can further improve the robustness of SLST against noise (See Supplementary Note 1C for numerical simulations of robust SLST).

DISCUSSION
We propose and demonstrate POVM with a single metasurface that enables implementation of real-time shadow tomography and observation of sample complex-ity.Together with the developed SLST, the underlying quantum states can be reconstructed efficiently, accurately and robustly.The advantages are evident even in single-and two-photon polarization-encoded states.The concept of octahedron POVM can be readily realized with integrated optics, where the directional couplers and phase shifters are able to construct octahedron POVM encoded in path degree of freedom.Metasurface-enabled POVM is particularly promising for efficient detection of scalable polarization-encoded multiphoton entanglement, in which two measurement devices are sufficient for full characterization [55].Our investigation is compatible with metasurface-enabled generation [7,8] and manipulation [9][10][11] of photonic states, thereby opening the door to quantum information processing with a single ultra-thin optical device.

Fabrication of metasurface.
A 700 nm-thick layer of a-Si is deposited on top of 750 µm-thick fused quartz wafers using the low-pressure chemical vapor deposition (LPCVD) technique.Then a layer of AR-P6200.09resists (Allresist GmbH) with a thickness of 200 nm is spun and coated on the substrate.The metasurface pattern is generated with electron-beam lithography (EBL) process which is set with 120 kV, 1 nA current and 300 µc cm −2 dose.Subsequently, the resist is developed with AR300-546 (Allresist GmbH) for 1 min.Reaction ion etching (RIE) is performed to transfer the nanostructures to a-Si film.The residue resist is removed by immersing the chip first in acetone for 5 min, then in isopropanol for 5 min and finally in deionized water.Experimental setup to implement SLST with metasurface.Metasurface is fixed on a piece of hollow plastic, which can be adjusted in six degrees of freedom through a six-dimensional rotation stage.Objective lens with 20× magnifying factor and tube lens with the focal length of 200 mm is used as a microscope, enlarging the distance of six spots focused by metasurface from 70 µm to 1.9 mm.Then, three prisms at different heights are applied to separate six light beams.Four mini lenses with f = 15 mm and two mini lenses with f = 30 mm are used to couple the six beams into six multi-mode fibers with the core diameter of 62.5 µm.

A. Constructing classical shadow with POVM
Consider a d-level quantum system, a set of L rank-one projectors {|ψ l ⟩⟨ψ l | ∈ H d } L l=1 is called a quantum (complex projective) 2-design if the average value of second moment (|ψ l ⟩⟨ψ l |) ⊗2 over the set {|ψ l ⟩} is proportional to the projector onto the totally symmetric subspace of two copies where Following the defining property Supplementary Eq. ( 5), each quantum 2-design is proportional to a POVM since the elements E l are positive semidefinite and satisfy Measuring a quantum state ρ results in one of the L outcomes indexed by l ∈ [L], and by Born's rule, the corresponding probability Hereafter we focus on the case of d = 2.The POVM defined in Supplementary Eq. ( 6) (together with the preparation of the corresponding state |ψ l ⟩) can be viewed as a linear map M : H 2 → H 2 as follows The inverse of this map is For a single experimental run by performing POVM on ρ, we obtain the random outcome l with probability Pr(l|ρ), and the classical shadow is constructed according to Supplementary Eq. ( 9) as It exactly reconstructs the underlying quantum state ρ in expectation where the third line is due to Supplementary Eq. ( 8).For an N -qubit state ρ ∈ H ⊗N 2 , the POVM E ⊗N acts on each qubit independently yields outcome as a string where For a single experimental run, the classical shadow shows ρ = which is in a tensor-product form.

B. Calibration of quantum channel M
In practice, there are unavoidable noises in unitary operations and measurements.To address this issue, robust classical shadow (RShadow) protocol was proposed to mitigate the noise in shadow tomography [42].It is convenient to represent M as a matrix L M in Pauli-Liouville representation.Here a linear operator X is represented by a column vector |X⟩⟩ j = Tr(σ j X) in the Pauli-basis with In this way Supplementary Eq. ( 8) can be expressed as The matrix form L M of the channel M corresponds to the projector onto the subspace spanned by E, which is given by and the classical shadow is where Accordingly, one has which is a depolarizing channel.Accordingly, the classical shadow in Supplementary Eq. ( 10) can be written as Similarly, for an N-qubit state ρ, Then we have and the classical shadow is The quantum channel in Supplementary Eq. ( 21) can be written as where λ is an n-bit vector denoting the subspaces due to the irreducible representation, and Π λ = N n=1 Π λn in the tensor-product form with Here σ 0 = 1 2 / √ 2 is the normalized single-qubit identity operator such that ⟨⟨σ 0 |σ 0 ⟩⟩ = 1, and 1 4 is the 4-dimensional identity matrix for this single-qubit operator space.That is, Π 0 subspace corresponds to the identity operator, and Π 1 is the complementary subspace spanned by the Pauli operators evenly.In the noiseless case, where |λ| is the number of 1s in the n-bit vector λ.For the single-qubit case with N = 1, the matrix form of L M is just shown in Supplementary Eq. (17).
Considering the noise (or imperfections) in practice, suppose the corresponding quantum channel of the noisy POVM can be written as L M = λ fλ Π λ , which is still diagonal according to the subspaces.The noisy parameters fλ can be experimentally calibrated.
To this end, a high-fidelity N -qubit product state is prepared and measured with noisy POVM, and the probability is Pr(l) = ⟨⟨ Ẽl |ρ 0 ⟩⟩ with | Ẽl ⟩⟩ being the noisy POVM.
Here we define |P λ ⟩⟩ = N n=1 |Z λn ⟩⟩, we can use ⟨⟨P λ |ψ l ⟩⟩ to estimate the noisy parameter fλ .To show this is indeed an unbiased estimator, we take the expectation value and find that Experimentally, we record the results of l from the noisy POVM, and then calculate the value ⟨⟨P λ |ψ l ⟩⟩ on the classical computer which serves as an estimation of fλ .One can repeat this process M ′ times, and average them to make the estimation more accurate.In this way, we can get the noisy parameters fλ of the noisy channel M, and the matrix form of its inverse M −1 is In post-processing, we can use this new inverse channel M instead of M, such that the error in the noisy POVM could be mitigated.

C. Gate-dependent noise
In Ref. [42], two main assumptions are made on the noise of measurement device to make the theoretical framework rigorous and analytical, i.e., A1.The noise in the circuit is gate-independent, time-stationary, Markovian noise.
A2.The experimental device can generate the computational basis state |0⟩ ≡ |0⟩ ⊗N with sufficiently high fidelity.
Our experimental device is able to generate |HH⟩ with high fidelity (> 0.99) so that A2 is satisfied.Metasurface is a passive optical device so that the metasurface-induced noise is time-stationary and Markovian noise as well.Note that measurement errors induced by metasurface are not strictly gate-independent, as the errors in σ x , σ y , and σ z measurements are 0.086 ± 0.005, 0.073 ± 0.005, and 0.101 ± 0.005, respectively.However, it has been shown robust shadow tomography still works for gate-dependent noise [42].Our experimental results (Figure 4 in main text) confirm this claim as well.To further confirm this point, we simulate the SLST and robust SLST on single-qubit state In each single run in simulation, the value of δ x,y,z is randomly resampled from Gaussian distribution with mean value of δ and standard deviation of σ.The simulation is equivalent to performing measurement with gate-dependent noises on the ideal state ρ.As shown in Supplementary Fig. 5 (a), the enhancement of robust SLST is not obvious when the noise is weak ( δ = 0.05).Robust SLST exhibits advantage when the noise is strong ( δ = 0.1) as shown in Supplementary Fig. 5 (b).The simulation results agree well with the experimental results shown in Figure 4 in main text, where the robust SLST significantly enhances the accuracy when optical loss is high.

D. Shadow norm
For a given POVM E, the shadow norm of observable O is derived from the variance of the estimation ô on state ρ.The variance Var(ô) can be ideally written as The shadow norm is then defined by the maximization of variance Var(ô) over ρ Note that the second term Tr(ρO) 2 is a constant and can be ignored.Then, the shadow norm of O is calculated by with λ max {•} being the maximal eigenvalue of corresponding operator.In theoretical investigations such as [38], it is convenient to calculate shadow norm in Supplementary Eq. (32).For octahedron POVM, the theoretical shadow norm calculated according to Supplementary Eq. ( 32) is ||O|| 2 shd = 1.5.Experimentally, the variance of estimator ô is observed by Without consideration of experimental noise, Supplementary Eq. ( 33) converges to Supplementary Eq. ( 30) when M → ∞.For octahedron POVM, the maximization of Supplementary Eq. ( 30) over single-qubit pure state ρ yields ||O|| 2 shd = 0.75, which is considered as the ideal value for experimentally observed maximal variance in Supplementary Eq. (33).

E. Simulation of shadow norm with SIC POVM
The SIC POVM in qubit system can be expressed by For fixed observable O = |ψ κ,ν ⟩⟨ψ κ,ν | ∈ O, single-qubit state ρ = |ψ γ,ϕ ⟩⟨ψ γ,ϕ | and E SIC , we simulate the statistic of outcomes with M runs, and calculate the variance according to Supplementary Eq. (33).Then, the shadow norm is calculated by maximization over state set P, i.e., max P Var(ô (m) ), where P is the set of 20 pure states.

A. Loss function in SLST
Generally, the loss function in SLST is squared Frobenius norm between two matrices τ and ρ defined as For this loss function, we can write down the unbiased estimator with shadows {ρ (m) } M m=1 as follows.For the first term, it shows which is an order-2 polynomial function of ρ [24].Obviously, Tr(ρτ ) is an unbiased estimator as well.Then, the unbiased estimator of N F in total is Accordingly, the gradient is Note that fidelity F can also be used as loss function if τ is pure state, as F (τ ) is an unbiased estimation with classical shadows {ρ (m) } and the gradient is

B. Setting of hyperparameters in SPSA optimization
The setting of hyperparameters a 1 , a 2 , a 3 , b 1 and b 2 determines the convergence of SPSA optimization.Previous investigations have concluded that a 3 = 0.602, b 2 = 0.101 are generally good choices for most optimization tasks [41,57,58], so that we set a 3 = 0.602, b 2 = 0.101 in our optimization.Besides, we find that a 2 is trivial compared with other four hyperparameters so that we set a 2 = 0. a 1 and b 1 are determined through numerical simulations.We set a 1 = 13, b 1 = 0.5 in single-photon experiment, while a 1 = 8.  . . .
and we set For general case, a mixed state, the proposed state τ k is modeled by Cholesky decomposition where T is a lower triangular matrix Accordingly, we set

D. Scaling of SLST
To investigate the scaling of SLST, we simulate SLST with POVM E octa on 50 randomly generated N -qubit pure states ρ N with N = 2, 4, 6 and 8, respectively.The results of infidelity 1 − F (τ k , ρ N ) are shown in Supplementary Fig. 6, in which we set the iterations k = 200, 1000, 5000 and 25000 for N = 2, 4, 6 and 8, respectively.The extracted scaling of SLST is O(d log d/M ), which is slightly worse than O(d/M ) in SQT and O(d η /M )(η > 1) in SGQT.However, SLST with POVM requires only one experimental setting and the POVM is locally implemented on individual qubit, which is friendly to experiment.A comparison of these technologies is shown in Table 1.

E. Initial setting of τ0
Generally, achieving the global minimum instead of local minimum is challenging in optimization.Indeed, SPSA optimization avoids local minimum under asymptotic iterations due to stochastic perturbation [58].However, SPSA optimization does not guarantee the global convergence in each iteration [57,58].We show that the global convergence in each iteration can be improved by setting initial τ 0 with prior information, instead of randomly setting initial τ 0 .
The direct estimation from classical shadows ρ = 1

M M m=1
ρ(m) returns a Hermitian matrix, which has an eigendecomposition in form of is the focal length.Our aim is to design a metasurface to realize phase configuration of Φ ± j (x, y) calculated according to Supplementary Eq. ( 48) with fixed (x ± σj ,0 , y ± σj ,0 ), λ and f .The parameters we set to calculate Φ ± σj (x, y) are shown in Table 2. Supplementary Table 2.The values of parameters set in calculation of phase configurations Φ ± σ j (x, y) in Supplementary Eq. ( 48).focused at the designed positions on the focal plane.As expected, for a specific polarization, the field intensity of its orthogonal polarization on the focal plane is almost zero, while the field intensity of the other two groups of states is half of the total field intensity.Here, the focused field intensity is defined as the total energy within a circle centered at the focal spot, with a radius of 1.5 times the full width at half-maximum (FWHM) [60].Arbitrary polarization state can be represented by the Stokes vector formalized as S = [S 0 , S 1 , S 2 , S 3 ] T .The elements S i are defined by  Discretization would also introduce optical loss.For example, different arrangements of nanopillars in three regions would introduce a "cut-off" in phase configuration on metasurface.We simulate the phase configuration on metasurface with input polarization of |V ⟩.As shown in Supplementary Fig. 11(a), there are two cut-off lines between three regions, which introduces undesired scattering and consequently increases the optical loss.However, this optical loss is small as most of the incident light is focused around the desired spot as shown in Supplementary Fig. 11(b) and (c), which is coupled into optical fibers in our experiment.

FIG. 1 .
FIG. 1.The metasurface-enabled octahedron positive operator valued measure (POVM) Eocta.a, The elements in Eocta are projectors on states |H⟩, |V ⟩, |+⟩, |−⟩, |R⟩ and |L⟩ respectively, which form a symmetric polytope of an octahedron on Bloch sphere.b, The metasurface to realize Eocta, green, yellow and red blocks on the metasurface represent nanopillars with different arrangements.c, The scanning electron microscopy images of the fabricated nanopillars in three regions.d, Schematic drawing of single nanopillar that is fabricated with same height of 700nm but different (θ, l x ′ , ly′).e, The measured distribution of intensity on focal plane with input polarization of |H⟩, |V ⟩, |+⟩, |−⟩, |R⟩ and |L⟩, respectively.f, The reconstructed Stokes parameters (s1, s2, s3) from data collected in e, and the error bars indicate standard deviations of reconstructed Stokes parameters.

FIG. 2 . 2 .
ln with l n being the outcome of n-th qubit, and one has E[ρ (m) ] = ρ.Repeating the POVM M times (experimental runs), one has a collection of classical shadows {ρ (m) } M m=1 , which is further inquired for estimation The experimental setup and results of shadow tomography with metasurface-enabled positive operator valued measure (POVM).a, Setup to generate entangled photons and demonstrate shadow tomography with metasurface.PBS: polarizing beam splitter.DM: dichroic mirror.HWP: half-wave plate.QWP: quarter-wave plate.E-HWP: electricallyrotated HWP.E-QWP: electrically-rotated QWP.OL: objective lens.b, The maximal error in estimation of expectation of O ∈ O. c, The real-time estimation of expectation of five randomly selected O ∈ O. d, The results of shadow norm maxP Var(ô) for O = |+⟩⟨+| with different experimental runs.e, The results of shadow norm for 128 O ∈ O (red dots), and the simulated results of shadow norm with symmetric informationally complete (SIC) POVM (blue diamonds).The 128 observables O ∈ O are selected according to Haar random.The dots and bars in b and d are the mean value and corresponding standard deviations obtained by repeating the experiment 5 times.The abbreviations of Exp. and Sim.indicate experimental results and simulation results respectively. of various properties of the underlying state.See Supplementary Note 1A for more details.Implementation of POVM with metasurface.In our experiment, we focus on the POVM on polarizationencoded qubit, i.e., |0(1)⟩ = |H(V )⟩ with |H(V )⟩ being the horizontal (vertical) polarization, and consider POVM of L = 6 and |ψ l ⟩ ∈ {|H⟩, |V ⟩, |+⟩, |−⟩, |R⟩, |L⟩} with |±⟩ = (|H⟩ ± |V ⟩)/ √ 2 and |R(L)⟩ = (|H⟩ ± i|V ⟩)/ √ The corresponding POVM E octa is described by a symmetric polytope of an octahedron on Bloch sphere as shown in Figure 1a.To realize E octa , we design and fabricate a 210µm×210µm polarization-dependent metasurface that splits incident light into six directions corresponding to projection on |H⟩, |V ⟩, |+⟩, |−⟩, |R⟩ and |L⟩ with equal probability (shown in Figure

FIG. 3 . 1 −
FIG. 3. Experimental results of self-learning shadow tomography (SLST) on one-photon and two-photon states.a, The average fidelity between reconstructed single-photon states τ and target state |ψ⟩ γ,ϕ using SLST, self-guided quantum tomography (SGQT), maximum likelihood estimation (MLE) reconstruction.b, Average fidelity of SLST by increasing experimental runs M from 10 to 1000.c, Fidelity between reconstructed two-photon states τ and target state ρη using SLST and MLE.d, The fidelities of two-photon states reconstruction from SLST (dash lines) with M = 2000 measurements.The solid lines represent the fidelity from MLE tomography with M = 2000 measurements.The dots and bars in a and c are the mean value and the corresponding standard deviations obtained by repeating the experiment 5 times.The dashed lines and shadings in d and e are the mean value and standard deviation obtained by repeating the iteration 5 times.

FIG. 4 .
FIG. 4. Results of fidelities from robust self-learning shadow tomography (SLST), SLST and maximum likelihood estimation (MLE) reconstruction on two-photon states a, ρη=0.06,b, ρη=0.37 and c, ρη=0.87.In each reconstructions, the experiment is carried out with M = 1000 runs.In robust SLST, additional M ′ = 2000 experimental runs are used for calibration.We set k = 200 in robust SLST and SLST.The error bars are the standard deviations in SLST (robust SLST), obtained from Monte Carlo simulation with assumption that the collected photons in M (M ′ and M ) experimental runs have Poisson distribution.
The results of F = Tr √ τ 200 ρ η √ τ 200 are shown with dots in Figure 3c, the fidelities of three states reach 0.986±0.002,0.990±0.001and 0.981±0.002with M = 2000 experimental runs and k = 200 iterations.We also perform MLE reconstruction τ MLE of two-photon states, where one photon is detected by metasurface and the other is detected by bulk optical setting.The results of F = Tr √ τ MLE ρ η √ τ MLE with M experimental runs are shown with squares in Figure3c.The error in two-photon MLE reconstruction is about 0.047 ± 0.005, which is smaller than that in single-photon MLE reconstruction as only one photon is detected by noisy device (metasurface).In Figure3d, we show that the fidelity of SLST with M = 2000 is converging after k = 200 iterations.
is the projector of the symmetric subspace, and F is the swap operator acting on 2-copy as F|v⟩ ⊗ |w⟩ = |w⟩ ⊗ |v⟩ for all |v⟩, |w⟩ ∈ C d .

Supplementary Fig 8 .
Schematic diagram and design of the metasurface.(a), Schematic of the designed meta-atom consisting of an a-Si nanopillar on a fused-silica substrate.(b), Target phase configuration for orthogonally polarized states when light passes through metasurface.(c) and (d), Mapping of transmission coefficient for LP light polarized along x ′ and y ′ , respectively, as a function of the parameters of l x ′ and l y ′ of the nanopillars.(e) and (f), Mapping of phase shift φ x ′ and φ y ′ , respectively, as a function of the parameters of l x ′ and l y ′ of the nanopillars.The calculated phase configurations of Φ + σj (x, y) and Φ − σj (x, y) are shown in Supplementary Fig. 8(b).To realize phase configurations Φ ± σj (x, y), we design the metasurface consisting of 420 × 140 nanopillars with interval of 500nm.As shown in Supplementary Fig. 8(a), the nanopillar located at position (x i , y i ) can be regarded as a polarizationdependent scatter acting on input polarization with transformation matrix U pillar = cos θ − sin θ sin θ cos θ e iφ x ′ 0 0 e iφ y ′ cos θ sin θ − sin θ cos θ .(49)

S 0 = 1 √ 2 (
I H + I V = I + + I − = I R + I L , S 1 = I H − I V , S 2 = I + − I − , S 3 = I R − I L , (54) where I p is the power intensity of polarization component p ∈ [H, V, +, −, R, L].The results of normalized Stokes parameter s = [s 1 , s 2 , s 3 ] withs 1 = I H − I V I H + I V , s 2 = I + − I − I + + I − , s 3 = I R − I L I R + I L ,(55)are shown in Supplementary Fig.9(g)-(h).Compared to the ideal values, the average error of reconstructed Stokes parameters s 1 , s 2 and s 3 are 0.097, 0.027 and 0.029 respectively, where the errors in |H⟩/|V ⟩ basis is larger than that in |+⟩/|−⟩ and |R⟩/|L⟩ basis.This is mainly caused by the asymmetric response of region 1 with input polarization of |H⟩ and |V ⟩.To verify this, we simulate the optical response of region 1 with input polarization of |H⟩ and |V ⟩, respectively.The simulation is performed within range x ∈ [−3µm, 3µm] and y ∈ [−1µm, 1µm], which is scaling-down of region 1.As shown in Supplementary Fig.10(a), the optical responses with input polarization of |H⟩ and |V ⟩ are asymmetric with respect of y axis, leading to different transmit efficiency on focal plane.This is verified by simulation of distribution of power intensity on focal plane with input polarization of |+⟩ = |H⟩ + |V ⟩) as shown in Supplementary Fig. 10(d), where the output power intensities are unbalanced and introduces more errors in reconstruction of s 1 = I H −I V I H +I V .In contrast to region 1, region 2 (|+⟩/|−⟩ section) and region 3 (|R⟩/|L⟩ section) response their input polarization in symmetric manner.As shown in Supplementary Fig. 10(b) and (c), the distributions of response intensity with input polarization of |+⟩ (|R⟩) and |−⟩ (|L⟩) are symmetric with respect to x = 0, which consequently leads the balanced splitting of power intensity of input polarization |H⟩ as shown in Supplementary Fig. 10(e) and (f).Therefore, the errors in reconstruction of s 2 = I+−I− I++I− and s 3 = I R −I L I R +I L are smaller than that of s 1 .
[53] the reconstructed state.We emphasize that SPSA inevitably introduces systematic errors of the reconstructed state, as well as other optimization algorithms such as MLE and least squares[53].In fact, it is a tradeoff that the reconstruction of a physical state suffers from a bias.As the prepared single-photon state is extremely closed to the ideal state |ψ γ,ϕ ⟩, the accuracy of reconstruction is characterized by the state fidelity between returned state τ k and ideal state |ψ γ,ϕ ⟩, i.e., F = Tr(τ k |ψ γ,ϕ ⟩⟨ψ γ,ϕ |).The results of average fidelity of SLST over 20 prepared |ψ γ,ϕ ⟩ ∈ P after k = 30 iterations are shown with red dots in Figure b 1 and b 2 being hyperparameters that determine the convergence speed of algorithm, which can be generally obtained from numerical simulations (see Supplementary Note 2B for hyperparameter settings).SLST is terminated when there is little change of NF (r k ) in several successive iterations, and corresponding τ

Table 3 .
l x ′ (nm) l y ′ (nm) φ x ′ / 2π φ y ′ / 2π T x ′ Selected nanopillar sizes and corresponding phases and transmittances in the circularly polarized detection region, where T x ′ and T y ′ are transmission coefficient of lights with polarization along x ′ and y ′ , respectively.