Efficient and robust estimation of many-qubit Hamiltonians

Characterizing the interactions and dynamics of quantum mechanical systems is an essential task in developing quantum technologies. We propose an ef ﬁ - cient protocol based on the estimation of the time-derivatives of few qubit observables using polynomial interpolation for characterizing the underlying Hamiltonian dynamics and Markovian noise of a multi-qubit device. For ﬁ nite range dynamics, our protocol exponentially relaxes the necessary time-resolution of the measurements and quadratically reduces the overall sample complexity compared to previous approaches. Furthermore, we show that our protocol can characterize the dynamics of systems with algebraically decaying interactions. The implementation of the protocol requires only the preparation of product states and single-qubit measurements. Furthermore, we improve a shadow tomography method for quantum channels that is of independent interest and discuss the robustness of the protocol to various errors. This protocol can be used to parallelize the learning of the Hamiltonian, rendering it applicable for the characterization of both current and future quantum devices.


I. INTRODUCTION
Large quantum devices consisting of tens to hundreds of qubits have been realized across various hardware architectures [1][2][3][4] representing a significant step towards the realization of quantum computers and simulators with the potential to solve outstanding problems intractable for classical computers [5,6].However, continued progress towards this goal requires careful characterization of the underlying Hamiltonians and dissipative dynamics of the hardware to mitigate errors and engineer the desired dynamics.The exponential growth of the dimension of the state space of a quantum device with the number of qubits renders this an outstanding challenge broadly referred to as the Hamiltonian learning problem .
To tackle this challenge, previous approaches make strong assumptions such as the existence of a trusted quantum simulator capable of simulating the unknown Hamiltonian [20,21] or the capability of preparing particular states of the Hamiltonian such as steady states and Gibbs states [23,25,26,29,36,37], which may be difficult for realistic devices subject to various decoherence mechanisms.
Alternatively, several works [30][31][32] are built on the observation that a Master equation describes the evolution of any system governed by Markovian dynamics.Through this, one obtains a simple linear relation between time derivatives of expectation values and the parameters of the Hamiltonian, jump operators and decay rates (jointly referred to as the parameters of the Lindbladian L) governing the system.Furthermore, for finite range interactions, these approaches can estimate the parameters of the Lindbladian to a given precision from a * daniel.stilckfranca@ens-lyon.frnumber of samples that is independent of the system's size [30][31][32].
A significant drawback of these approaches is that the time derivatives are estimated using finite difference methods.Obtaining a good precision thus requires high time resolution, which is experimentally challenging given the finite operation time of gates and measurements.To estimate a Lindbladian parameter up to an additive error , the system has to be probed at times O( ) apart and expectation values of observables have to be estimated up to a precision of O( 2 ), which translates to an overall O( −4 ) sample complexity to estimate each parameter.
In this article, we propose a novel protocol that alleviates these daunting experimental requirements.Our protocol requires only a time resolution of O(polylog( −1 )) representing an exponential improvement compared to previous protocols and gives an overall sample complexity to recover all parameters of a local n qubit Lindbladian up to precision of O( −2 polylog(n, −1 )).We obtain this by estimating time derivatives using multiple temporal sampling points and robust polynomial interpolation [38].Furthermore, we show how to use shadow process tomography methods to estimate multiple parameters in parallel.In particular, we address shortcomings of previous results [39,40] in extending the framework of classical shadows to processes, a result that is of independent interest.We also extend our analysis to long-range (algebraically decaying) interactions in the systems, obtaining the first results for such systems to the best of our knowledge.The necessary operations for our protocol are measurements in Pauli basis on time-evolved product states consisting of Pauli eigenstates.These minimal requirements make our protocol feasible for characterization of both current and future quantum devices.
FIG. 1. Sketch of the proposed protocol to estimate an unknown Lindbladian, L, of a multi-qubit device.In the first step of classical pre-processing, the interaction graph between qubits is identified from the physical connectivity of the device.Then the unknown Lindbladian is written in a general form using an operator basis of Pauli strings, {Pi} (see main text) and a suitable set of initial states and observables, {(ρj, Oi)} is chosen.In the second step of quantum processing, a time trace of each element of the set is obtained from evolution and measurement on the quantum device.In the last step of classical post-processing, each time trace is fitted to a low-degree polynomial to estimate the derivative of the observable.From these, an estimate of the Lindbladian,Lest, is obtained from the Master equation.

II. RESULTS
In order to use our protocol for an efficient characterization of a quantum device, two assumptions should be fullfilled: 1.The quantum device implements an (unknown) Markovian quantum evolution on n qubits described by a time-independent Lindbladian, L.
2. We assume knowledge of the general structure of the interaction graph of the device i.e. which qubits are coupled to each other.Importantly, no assumptions are made regarding the exact strength or even form of the couplings.
The first assumption ensures that the evolution of a general observable, O is described by the Master equation, i.e. d dt O = L(O) .We note that the Lindbladian captures both the Hamiltonian evolution and the dissipative dynamics of the device.Furthermore, the assumption of time-independence applies to the run-time of the experimental characterization.
The second assumption bounds the size of the estimation task.If the interaction graph was completely unknown, our protocol could, in principle, be applied but would require the estimation of an exponentially growing number of general multi-qubit coupling terms as the number of qubits increases.However, having prior knowledge that, e.g.nearest neighbor couplings dominate in the device, makes the estimation task tractable.
Using the knowledge of the interaction graph, one can expand the Lindbladian in an operator basis, {σ i } constructed from tensor products of single-qubit Pauli matrices and the identity.Such an expansion is always possible since this basis amounts to a Hilbert-Schmidt orthogonal set of traceless Hermitian operators spanning the entire vector space.Estimating the set of expansion coefficients {α i } gives an estimation of L and thus a full characterization of the system.
It is well known that the Master equation for the time derivative of the expectation value of a local observable O at time t = 0 for a given initial state ρ of the system gives us a linear equation for the expansion coefficients [30][31][32].We use this to estimate the expansion coefficients going through three stages of classical pre-processing, quantum processing, and classical post-processing.

Classical pre-processing
After expanding L in an operator basis, the following steps are performed.
1. Find a suitable complete set, {(ρ j , O i )} of multiqubit product states (ρ j ) and observables (O i ) for which the Master equation involves only a few selected expansion parameters of the Lindbladian for each element of the set.The set is complete in the sense that all expansion coefficients can be found by solving the Master equations for all elements in the set.As we show below, such a set can readily be found by considering initial states where only a few qubits are initialized as different eigenstates of the Pauli matrices while the remaining qubits are prepared in the maximally mixed state I/2.

Calculate the expectation values appearing on the right hand side of the Master equations
] for all elements in the set {(ρ j , O i )}.Since both the initial states and the observables are products, this can be done efficiently.

Quantum processing
In order to solve for the expansion coefficients {α i }, we also need the values of the time-derivatives appearing on the left hand side of the Master equations, i.e. d dt tr [ρ j O i ].These are estimated using the quantum device in the following way.
1.The quantum device is prepared in initial state ρ j and evolved for a time t k ∈ {t 0 , t 1 , . . ., T } after which the observable O i is measured.
2. The above procedure is repeated for each element in the set {(ρ j , O i )} for all evolution times t k to obtain estimates of O i (t k ) j = tr [ρ j (t k )O i ] where ρ j (t k ) is the state of the system having evolved for time t k from the initial state ρ j .We note that the single qubit mixed states can be simulated by sampling eigenstates of the Pauli matrices at random.

Classical post-processing
The final part of the characterization involves estimating d dt tr [ρ j O i ] from the experimentally obtained time trace of O i (t k ) j and solving for the expansion coefficients {α i }.This involves 1. Fit the time trace of O i (t k ) j with a lowdegree polynomial in the time, p i,j (t) and estimate . This is done for each element in the set {(ρ j , O i )}.

Solve the set of linear equations from the Master equations
with respect to the expansion coefficients ({α i }).This is possible since d dt tr [ρ j O i ] has been estimated from the polynomial fits and all expectation values appearing in tr [ρ j L(O i )] have been calculated leaving the α i 's as the only unknown parameters.Following the steps above, a complete characterization of the underlying Hamiltonian and dissipative dynamics of the quantum device as given by the Lindbladian is obtained.The two key steps of the protocol are the choice of the set {(ρ j , O i )} and the polynomial interpolation used to obtain estimates of the time derivatives.Below, we outline the details of both steps, show how shadow tomography methods can be used to parallelize the procedure, and provide rigorous guarantees on the precision of the protocol.Importantly, we show that Lieb-Robinson bounds on the spread of correlations in the system can be used to ensure robust polynomial fitting of the time traces of expectation values allowing for an exponential relaxation of the temporal resolution compared to finite difference methods rendering the protocol feasible for near-term quantum devices.

A. Choosing the set of initial states and observables
The first step in the classical pre-processing is to expand L in an operator basis constructed from tensor products of single-qubit Pauli matrices and the identity.The right hand side (rhs) of the Master equation can be expanded as a sum of single Pauli matrices and their products.Our goal is to isolate the unknown expansion coefficients.To this end, we consider an initial state of the form where the i'th and j'th qubit are prepared in eigenstates of the Pauli matrices while the state of the remaining N − 2 qubits, ρ N −2 , is assumed to be the maximally mixed state.For a state of the form in Eq. ( 1) the rhs of the Master equation (see above) can be simplified greatly depending on the choice of the observable O.This is due to the properties of the Pauli matrices namely that they have vanishing trace and that where δ kl is the Kroenecker delta function and ε klm is the Levi-Civita symbol.From this, it follows that if a single qubit Pauli observable (O = σ m ) is chosen, then only the single qubit terms of the rhs of the Master equation involving the i'th qubit will have non-vanishing trace and, using the relation in Eq. ( 2), the different single qubit Pauli expansion coefficients (the coefficients of terms in the expansion that only involves single qubit Pauli matrices) can be isolated.
After isolating the single qubit expansion coefficients, the expansion coefficients related to two-qubit Pauli terms (σ n in a similar manner.This procedure can be iterated to isolate higher and higher order expansion coefficients by considering observables involving more and and more qubits.
In the supplemental material, we provide a detailed derivation of how all expansion parameters can be isolated for a general Hamiltonian with terms coupling from two to k qubits and arbitrary single qubit dissipation terms.We note that already for two qubit dissipation terms, deriving linear combinations of initial states and expectation values that allow us to isolate different parameters is quite cumbersome and we do not do this explicitly.However, from a numerical point of view this is a trivial task.Indeed, as remarked before, each pair of Pauli strings gives us access to a linear equation for the different parameters of the evolution.
After collecting enough equations to ensure that the linear system is invertible, the precision with which we need to estimate each expectation value to ensure a reliable estimation of the parameters is controlled by the condition number of the matrix describing the system of linear equations.As both estimating the condition number and solving the linear system can be done efficiently, we conclude that estimating dissipative terms acting on a constant number of qubits does not posses a significant challenge from a numerical perspective.
A desirable feature for a Hamiltonian learning protocol is that the state preparation and measurement steps are simple, parallelizable and even independent of the parameter being estimated.In this paper, we propose a variation of the classical shadows protocol of [41] for process tomography that achieves that and is of interest on its own.Given a quantum channel Φ acting on N qubits, our protocol estimates overlaps of the form 2 −n tr P a m Φ(P b l ) for Pauli strings P a m , P b l .Here P a m refers to the m'th Pauli string in the collection of Pauli strings that differ from the identity on at most ω a sites.Note that estimates of such overlaps is all that is required as input for our Hamiltonian learning algorithm.By only requiring the preparation of random product Pauli eigenstates and measurements in a random Pauli bases, our protocol only requires O(3 wa+w b −2 log(K 1 K 2 )) samples to estimate such overlaps up to error for K 1 K 2 pairs of Pauli strings of weight at most w a and w b respectively.For local Hamiltonians on a lattice, using this protocol gives a logarithmic in system size sample complexity to determine all parameters of the evolution.

B. Robust polynomial interpolation
As described above, a key step in our learning algorithm is to obtain information about the time-derivatives of observables at t = 0.For this, we rely on robust polynomial interpolation.Accordingly, based on expectation values O i (t k ) j for a set of times t k we want to extract a polynomial p i,j (t) such that swe can estimate d dt tr [ρ j O i ] as d dt p i,j (t)| t=0 .For this approach to work, we have to be able to control the degree of the polynomial p i,j (t) in order to give an upper bound on the number of sampling points t k for which we will have to determine O i (t k ) j experimentally.In the following, we briefly outline how such a guarantee on the degree of p i,j (t) can be obtained and refer to Append.D for a detailed proof.
Our argument proceeds in two steps.Firstly, we establish that the expectation value O(t) of a local observable O that evolves under a Lindbladian L B restricted to some sub-region B up to some time t max , can indeed be approximated up to error ε by a degree-d polynomial, where d depends linearly on the size of B, t max and log(ε −1 ).Hence, for the second step of our argument, it remains to show under which circumstances, we can restrict the evolution of the Pauli-strings P a m we identified in the previous step to a local generator.The main insight here is that for finite range (or sufficiently quickly fast decaying) interactions, the dynamics of any local observable O exhibits an effective light cone quantified by a Lieb-Robinson bound (LRB) [42][43][44][45][46][47].The LR-bound in turn allows us to restrict the Lindbladian on the full system to a generator coupling only systems in the vicinity of the support of O, where the size of this shielding region only grows linearly with t max .Hence, bringing these two arguments together, we can first employ the LR-bound to restrict the dynamics to a sub-region around the support of the Pauli-string, P a m , and then approximate the corresponding evolution on that finite region up to error ε by a polynomial of degree O poly(t max , log( −1 )) .Now, making use of the techniques from Ref. [38], we can extract the first derivative of this polynomial from measurements at O polylog( −1 ) different times t k .

III. NUMERICAL EXAMPLES
To investigate the performance of our protocol for experimentally relevant parameters, we performed numerical simulations of a multi-qubit superconducting device.We consider a system with tunable couplers similar to the Google Sycamore chip [1].This design relies on a cancellation of the next-next-nearest coupling between two qubits through the direct coupling with a coupler [48,49].We consider a generic system consisting of a 2D grid of qubits with exchange coupling between nearest neighbors.The dynamics are described through a Lindblad equation with the effective two-qubit Hamiltonian for each neighboring qubit pair (i, j) [48,49] for i = j = 1, . . ., n and a dissipation term D (i) acting on the i'th qubit and having jump operators σ + (generalised amplitude damping) and σ (i) z (pure dephasing).Here ωj = ω j + g 2 j ∆j is the Lamb-shifted qubit frequency, g i is the coupling between the i'th qubit and the coupler, and g ij is the direct two-qubit coupling.We have assumed that ∆ j = ω j − ω c < 0 where ω c (ω j ) is the frequency of the coupler (j'th qubit) and have defined 1/∆ ij = (1/∆ i + 1/∆ j )/2.By adjusting the frequencies of the coupler and the qubits, the effective qubitqubit interaction can be cancelled up to experimental precision.Typical qubit frequencies are around 5 − 6 GHz [1]), while ∆ ∼ −1 GHz, g ij ∼ 10 − 20 MHz, and g j ∼ 100 MHz [48,49].In our simulation, we assume that all qubit frequencies and couplings have been characterized up to a precision of 100 kHz using standard characterization techniques [1] and consequently, that all couplers have been tuned off with the same precision i.e. gigj ∆ij + g ij ∼ 100 kHz.Considering a layout of 16 qubits (see Fig. 4 for the interaction graph), we randomly sample all qubit frequencies and qubit-qubit interactions according to Gaussian distributions with zero mean and standard deviation of 100 kHz.
In addition to the Hamiltonian evolution, we also include dissipative dynamics in our numerical simulation.We include quasi-static random frequency shifts of the qubits leading to effective dephasing with a characteristic timescale of T * 2 ∼ 150 µs as well as pure dephasing resulting in a transverse relaxation on a timescale T 2 ∼ 60 µs representing state of the art coherence times [1,49].Finally, we include longitudinal relaxation of the qubits through an amplitude damping channel on the time scale of T 1 ∼ 60 µs.We refer to Appendix C for a more detailed discussion and Table III for the sampled parameters of our simulation.Error of estimation with shot noise Numerical derivative Interpolation FIG. 2. The median quality of recovery of of one 2-qubit coupling using our method and those based on numerical derivatives [30][31][32] as a function of the initial time.We assumed that the total time of the experiment is fixed.That is, we let the initial time times the total number of samples for each time step to be a constant (10 7 for this plot).For each initial time, we simulated 1000 instances of the recovery protocol, always adding shot noise with the same standard deviation to the data.The dots correspond to the median quality of recovery, whereas the lower and upper end correspond to the 25 and 75 percentile.We ran the simulation on a system with 16 qubits.
In Fig. 2, we plot the average estimation error as a function of the temporal resolution set by the value of the initial time step, t 0 .For this plot, we only included the Hamiltonian evolution in the numerical simulation together with quasi-static random frequency shifts of the qubits.This was to lower the run time of the simulation allowing us to investigate the performance for a broad range of initial times.We assumed the total run time of the experiment was fixed such that t 0 × S is constant, where S is the number of samples.From the figure, we clearly see the improved scaling of our protocol of the estimation error with the time-step size compared to using a finite difference method [30][31][32].Besides already performing better at the time resolution for moderate values of the initial time, we see that after a threshold initial time around 10 −0.7 , the performance is not limited by the initial time, only the shot noise.In contrast, the finite difference method still require smaller initial times to improve on the error with the same shot noise.
We also investigated the robustness of our method with respect to shot-noise for a fixed time resolution.For these simulations, we again only included the Hamiltonian evolution together with quasi-static random frequency shifts of the qubits to have a practical run time of the simulation.From Fig. 3 we see that for a fixed time resolution of 30 ns our protocol results in an average estimation error that improves linearly with the shot-noise down to an error below 10 −4 .This is in contrast to finite 3. Median quality of recovery of one 2-qubit coupling using our method and those based on numerical derivatives [30][31][32] as a function of the standard deviation of the shot noise.The initial time for this estimate is 30 ns and here we also generated 1000 instances of the noise with a given standard deviation.The plot shows the median quality of the recovery and the 25 and 75 percentiles.We see that the quality of the recovery for the interpolation decays approximately linearly with the shot noise.For the numerical derivative, we see two regimes: first a linear decay of the error until a shot of noise of order 10 −3 .After that, the error plateaus and does not improve even with smaller shot noise.This is because for numerical derivative methods, at this point the dominant error source comes from the choice of initial time.Importantly, we see that interpolation consistently provides better estimates than the numerical derivatives method.
difference methods, where the estimation error plateaus around 10 −3 since it becomes limited by the time resolution.This is a clear effect of the exponential improvement of our protocol w.r.t. the time resolution compared to finite difference methods.
Finally, we performed a full numerical simulation including also the pure dephasing and amplitude damping noise as described above and estimated the σ X σ X couplings between the qubits.As shown in Fig. 4, we obtain reliable estimates of all 22 couplings demonstrating how our method allows the estimation of specific terms in the Lindbladian despite the dynamics being governed by the full (dissipative) Lindbladian.For simplicity, we did not explicitly estimate the single qubit Hamiltonian parameters and the Lindbladian decay rates.
For all estimations above, we fitted to degrees 1−7 and picked the one with the smallest average error.We note that, although the robust interpolation methods of [38] in principle require random times, we performed numerical experiments with deterministic times on systems with 16 qubits.

FIG. 4.
Error in recovery of σX σX couplings of a quantum systems with a geometry similar to the Sycamore processor usign numerical derivatives and interpolation.Note that while we only plot the estimation of the Hamiltonian couplings, the numerical simulation included the full Lindbladian including both dephasing due to quasi-static random frequency shifts of the qubits, pure dephasing and amplitude damping noise.The initial time for each coupling was 0.1 µs in the simulation.Note that interpolation consistently outperforms numerical derivatives, sometimes by several orders of magnitude.

IV. CONCLUSION AND DISCUSSION
In conclusion, we have proposed a novel Hamiltonian learning protocol based on robust polynomial interpolation that has rigorous guarantees on the estimation error.Our protocol offers an exponential reduction in the required temporal resolution of the measurements compared to previous methods and a quadratic reduction in the overall sampling complexity for finite range interactions.Our protocol only requires the preparation of single qubit states and single qubit measurements in the Pauli bases making it suitable for characterization of both near term and future quantum devices.
Furthermore, the recovery of multiple parameters can be highly paralelized by resorting to a variation of classical shadows to quantum channels we introduce here.Besides being a protocol of independent interest, we believe our work constitutes the first application of classical shadows to process tomography.
Our method allows for the characterization of a general local Markovian evolution consisting of a unitary Hamiltonian part and a dissipative part.While we have only explicitly considered single qubit dissipation here, we believe that our protocol is also valid for general multi-qubit dissipation as outlined above but leave the explicit analysis of this to future work.We have also analysed the performance of our protocol for algebraically decaying interactions which we believe to be the first results for Hamiltonian learning of such systems.The convergence of our method can be ensured for interactions decaying faster than the dimension of the system.We note, however, that improved bounds on the locality of such systems might improve this result in the future.

V. METHODS
Here we detail and formalize our results regarding the estimation error guarantees of our protocol.In particular, we detail the use of Lieb-Robinson bounds on the spread of correlations in the system to bound the error.Furthermore, we outline the shadow tomography method for the parallization of the measurements.

A. Derivative estimation
Define f (t) = tr e tL (O)ρ and L B to be the Lindbladian truncated to a subregion B of the interaction graph.Our protocol consists of first estimating f (t i ) up to an error O( ) for random times t 1 , . . ., t m .The curve of f (t) is then fitted to a low-degree polynomial p, and p (0) is taken as an estimate for the derivative f (0) = tr [L(O)ρ].Below we prove the accuracy and robustness of this method.The first step is Theorem V.1, which establishes under what conditions f (t) is indeed well-approximated by a low-degree polynomial.
such that for all 0 ≤ t ≤ t max : and p (0 The main technical tool required for the proof are Lieb-Robinson bounds (LRB) [42][43][44][45][46][47], which ascertain that the dynamics of local observables under a time evolution with a local Lindbladian have an effective lightcone.More precisely, we need to hold for constants c 1 , µ and v, where dist() is the distance in the graph.
From the LRB we can show that the dynamics is wellapproximated by a low-degree polynomial.We leave the details of the proof to Sec.D in the supplemental material and only discuss the main steps here.The general idea of going from the LRB to the low-degree polynomial is to truncate the Taylor series of the evolution under L B for B large enough and take that as the approximating polynomial.As the derivatives of the evolution under L B only scale with the size of the region B, this allows us to show that the Taylor series converges quickly.Now that we have concluded that the expectation value is well-approximated by a small degree polynomial, we continue to show that we can reliably extract the derivative from approximations of the expectation values for different t.This is formally stated in the following theorem.
Theorem V.2.Let L be a Lindbladian on a Ddimensional regular lattice.Suppose we can measure the expectation value of two-body Pauli observables on Pauli eigenstates in the time interval [t 0 , t max ] under L for t 0 as and t max = 2 + t 0 .Then, measuring the expectation values at random times up to precision O( / polylog( −1 )), is sufficient to obtain an estimate â(m1) α1,α2 of a This yields a total sample complexity of S = O −2 polylog( −1 ) .
Of course, the same results also hold for the parameters a (m1) α1 and those of L (m1) .Importantly, Theorem V.2 bypasses both requiring small initial times and O( −4 ) sample complexities.
To go from Thm. V.1 to Thm.V.2 we first need to establish that we can robustly infer an approximation of p from finite measurement data subject to shot noise.Subsequently, we need to show that it will also allow us to reliably estimate p (0).Let us start with approximating p.
a. Robust polynomial interpolation: We will resort to the robust polynomial interpolation methods of [38] to show Thm.V.2.We review their methods in more detail in [50, Sec.E].But they depart from the assumption that we get m (randomly sampled) points x 1 , . . ., x m ∈ [t 0 , t max ] and y 1 , . . . ,y m ∈ R. In our setting, the x i correspond to different times and the y i to approximations of the expectation value of the evolution at that time.Furthermore, the y i satisfy the promise that there exists a polynomial p of degree d and some σ > 0 such, that hold, for strictly more than half of the y i .The rest might be outliers.In our setting, the magnitude of σ corresponds to amount of shot noise present in the estimates of the expectation values.The authors of [38] then show that by sampling m = O(d log(d)) points from the Chebyshev measure on [t 0 , t max ], a combination of 1 and ∞ regression allows us to find a polynomial p of degree d that satisfies: Although the details of the 1 and ∞ interpolation are more involved and described in Sec.E of the supplemental material, a rough simplification of the procedure is the following.First, we find a polynomial p 1 of degree d that minimizes i |p 1 (x i ) − y i |.After finding p 1 we compute the polynomial p ∞ that minimizes max i |p ∞ (x i )−(y i −p 1 (x 1 ))|.We then output p = p 1 +p ∞ as our guess polynomial.Note that finding both p 1 and p ∞ can be cast as linear programs and thus can be solved efficiently [51].
By combining this result with Thm.V.1, we robustly extract a polynomial that approximates the curve t → tr e tL (O Y )ρ up to O( ) for t ∈ [t 0 , t max ].Indeed, we only need to estimate the expectation value f (t i ) up to for enough t i and run the polynomial interpolation.
Note that Eq. ( 11) only allows us to conclude that p − p is small.However, we are ultimately interested in the curve's derivative at t = 0, as the derivative contains information about the parameters of the evolution.For arbitrary smooth functions, two functions being close on an interval does not imply that their derivatives are close as well.Fortunately, for polynomials the picture is simpler.A classical result from approximation theory, Markov brother's inequality [52], allows us to quantify the deviation of the derivatives given a bound on the degree and a bound like Eq. (11).Putting these observations together, we arrive at Thm. V.2.The details of the proof are given in Sec.E of the supplemental material.

B. Generalizations of Thm. V.2
We also generalize Thm.V.2 in two directions.First, we extend the results to interactions acting on k qubits instead of 2. As long as the noise is constrained to acts on 1 qubit and k = O(1), this generalization is straightforward.Indeed, we only need to measure an observable that has the same support as the Pauli string and does not commute with it, as it is then always possible to find a product initial state that isolates the parameter.Generalizing to noise acting on more than one qubit makes it more difficult to isolate the parameters of the evolution as described in the main text.In that case, it then becomes necessary to solve a system of linear equations that couples different parameters.Although our method still applies, analysing this scenario would require picking the observables and initial states in a way that the system of equations is well-conditioned and we will not discuss this case in detail here.
Second, another important generalization is to go beyond short-range systems.Although we have only stated our results for short-range systems in Thm.V.2, our techniques apply to certain long-range systems.As this generalization is more technical, we leave the details to Appendix E and constrain ourselves to discussing how the statement of Thm.V.2 changes for more general interactions.
Only one aspect of the precious discussion changes significantly for long range interactions: how the r.h.s. of Eq. ( 6) generalizes.More precisely, let us assume that for some injective function h : R → R with h(x) = o(1), we have For instance, for short-range or exponentially decaying interactions, h will be an exponentially decaying function.Then we can restate Thm.V.2 in terms of h −1 .As we show in Thm.F.1 in Appendix F, for a precision parameter > 0 and evolution on a D-dimensional lattice, assume that we pick the initial time as Furthermore, assume that we estimate the expectation value of local observables up to precision O( ) at through, the same procedure as in the local case.Note that the error in Eq. ( 13) only tends to 0 as → 0 if ), holds, i.e. the function h must decay fast enough.In Sec.G of the supplemental material, we discuss examples of systems with algebraically decaying interactions for which this is satisfied.For instance, for potentials that decay like r −α with α > 5D − 1 we obtain that h −1 ( ) = O( − 1 α−3D ), holds.We summarize the resulting resources in Tab.VI in the supplemental material.But the message of bounds like ( 13) is that it is still possible to obtain bounds on the error independent of the system's size beyond short-range systems.However, this comes at the expense of requiring higher precision and sampling from more points.
Another important observation is that the assumption that we know the structure of the interactions exactly is not required.Indeed, our method is robust to Hamiltonian perturbations of the model as long as the resulting evolution still satisfies a LR bound.For instance, suppose that there actually is a non-negligible interaction between qubits i and j that is not accounted by our model.As long as the resulting time evolution still satisfies a LR bound, our results still hold.As the linear equation to isolate any parameter is independent of that parameter, we can still apply our techniques in this setting.

C. Parallelizing the measurements
So far we have only discussed how to obtain the estimate of one parameter of the state from experimental data in an efficient manner.However, it is possible to parallelize the measurement and ensure that we can obtain experimental data to estimate all parameters simultaneously.
To that end, we resort to a classical shadow process tomography method.Although some papers in the literature already discussed classical shadows for process tomography [39,40], their proofs unfortunately contain a shortcoming.These previous approaches where based on applying the classical shadows protocol to the Choi state of the underlying evolution and importing existing classical shadow tomography results for states [41].Unfortunately, this proof method produces a prefactor in the sample complexity that is exponential in the system size and was missed in previous works.Fortunately, we give an alternative proof in Sec.H for process shadow tomography that does not require the Choi state and bypasses this issue.In addition, our result also improves the sample complexity of previous results.
More precisely, we show that given a quantum channel Φ, Pauli strings P a 1 , . . ., P a K1 that differ from the identity on at most ω a sites and Pauli strings P b 1 , . . .P b K2 that differ from the identity on at most ω b sites, it is possible to obtain estimates êm,l of 2 −n tr P a m Φ(P b l ) satisfying for all m, l with probabillity at least 1 samples.More precisely, the protocol of shadow process tomography requires preparing Eq. ( 14) many different random initial product Pauli eigenstates and measuring them in random Pauli bases.This makes it feasible to implement it in the near-term.We discuss it in more detail in Sec.H of the supplemental material, as this protocol may be of interest beyond the problem at hand.The shadow process tomography protocol is ideally suited for our Hamiltonian learning protocol.Indeed, note that to learn k-body interactions, we only required the preparation of initial states ρ l that differ from the maximally mixed state on k qubits and measure Pauli strings P m supported on at most k qubits.Furthermore, for a system of n qubits in total, there are at most 16 k n k ≤ 16 k n k such states or Pauli strings.We conclude that we can estimate all required expectation values for a given time step using samples.As our protocol requires estimating expectation values at a total of polylog( −1 ) time steps, we can gather the data required to recover all the O(n) parameters of the evolution from O( −2 polylog(n, −1 )) samples through the shadow process tomography protocol whenever k = O(1).

Any Hamiltonian can be written as
where Roman indices identify the subspace on which the operator acts, and Greek indices identify the Pauli operator, e.g.α = x.No assumption about the dimension or structure of the hermitian Hamiltonian is needed for this expansion to be valid.For a Markovian noise environment, the evolution of a quantum system ρ 0 is described by a Master equation of the form where H is the Hamiltonian describing the evolution of the system, L µ,ν are the elements of the Lindblad matrix, expressed in an operator basis consisting of the different combinations of single-qubit Pauli matrices {σ}.Multiplying it from the right hand side on an observable O and taking the trace, we can write Let us introduce the notation To isolate {a , . . .} we observe the N qubit state with the following density matrix where the i and j are the Pauli qubits, namely ρ τi,j )/2, and ρ N −2 is the density matrix of all other qubits, which we set to be maximally mixed.Using σ α σ β = δ αβ I + iε αβγ σ γ , we can find the following relations where O is acting on (i, j) qubits.Selecting O = σ ξj , we can rewrite the latter matrix element as follows , where η, γ ∈ {x, y, z}.Selecting other observable O = σ ξi , we can rewrite (A6)-(A8) differently as ξi ) = 0, (A10) ξj , we can rewrite (A6)-(A8) as For k ≥ 2 we can write the general matrix element: where O is acting on (1, . . ., k) qubits.Let O = σ (1) ξ k , holds.Then we can rewrite (A12) as follows B (1,...,k) α1,...,α k (ρ (i,j) τi,τj , σ From this result, for k = 3 we get µν } we recall the assumption that ρ N −2 is the density matrix of the maximally mixed state.Then the Lindblad part of the equation (A3) for an observable O = σ .
Let us substitute the conditions (A9) in (A15).We get the following results: and However, for an observable O = σ ξi the Lindblad part of the equation (A3) is the following Substituting the conditions (A10) in (A18), we get ξ l we can write Then, according to the results of the previous subsection, we get Finally, for an observable O = σ (1) ξ k , k > 3, the Lindblad part of the equation (A3) is the following 1. Final Results After we selected the different observable operators and defined the density matrix ρ 0 = ρ where the i and j qubits are in the Pauli states, we are ready to isolate the desired coefficients.For an observable O = σ we can write the equation (A3) as follows Selecting τ j = η, τ i = γ, we can isolate the coefficients of the type a α1α2 in (A23), namely Let us call ρ tτiτj the density matrix evaluated by the Hamiltonian evolution from ρ 0 .From (A24) we can find a α1α2 .To this end, we select γ = y, η = y and four pairs τ i , τ j ∈ {z, x; z, z; x, x; x, z} to get the system of equations Since a zy and a yx , we can write Selecting γ = x, η = y and four pairs τ i , τ j ∈ {y, z; y, x; z, z; z, x} to get the system of equations Hence Selecting γ = x, η = z and four pairs τ i , τ j ∈ {y, y; z, x; y, x; z, y}, we get Hence the last coefficient is To find the other coefficients we select τ j = η, τ i = γ and rerewrite (A23) as Next, for τ j = η, τ i = γ, we can rewrite (A23) as µ=x,y,z Selecting an observable O = σ γ , we can write µ=x,y,z For the other observable O = σ η , the result is the following Substituting (A33) in (A31), we can write Since in (A31) the conditions τ j = η, τ i = γ, hold, we can rewrite the latter equation as Solving the latter equation, we find a (i) α1 .To this end, we select γ = y, τ i = x, τ j = η ∈ {x, y, z} and get Next, for γ = x, τ i = z, τ j = η ∈ {x, y, z} we get the solution of (A36), namely Finally, for γ = x, τ i = y, τ j = η ∈ {x, y, z} the solution is Substituting (A34) in (A32), we can write Since τ j = η, τ i = γ, hold, we can rewrite it as follows Solving the latter equation, we find a (j) α1 .Selecting η = y, τ j = z and τ i = γ ∈ {x, y, z}, we get Selecting η = x, τ j = z and τ i = γ ∈ {x, y, z}, we get Finally, selecting η = y, τ j = x and τ i = γ ∈ {x, y, z}, we get the last coefficient of this type All the coefficients with the corresponding observables and initial states are given in Table I.
x,η )}, ∀η ∈ {x, y, z} (A37) a α i ,α j , αi, αj ∈ {x, y, z}.In the third column the number of equation for every parameter is provided, depending from the pairs of the observable O and the initial state ρ (i,j) τ j ⊗ ρN−2, τi, τj = {x, y, z}, given in the second column.For example, to estimate a (i) x we use (A37) with two pairs of observables and states: {O, ρ (i,j)
x , ρ (i,j) x,τ j1 }; {σ µν , µ, ν ∈ {x, y, z}.In the third column the number of equation for every parameter is provided, depending from the pairs of the observable O and the initial state ρ (i,j) τ j ⊗ ρN−2, τi, τj = {x, y, z}, given in the second column.For example, to estimate D (j) yz we use (A53) with one pair of observable and state: {O, ρ (i,j)

Appendix B: Numerical simulations
For simulations of the Hamiltonian learning protocol, we employed direct numerical solution of the time-dependent Schrodinger equation for the whole system of N = 16 qubits, whose time-dependent wavefunction |ψ(t) is represented as an array of 2 N complex numbers, normalized to 1.The evolution includes both unitary component, governed by the system's Hamiltonian, and three non-unitary components: the first one, stemming from the quasi-static random frequency shifts, leading to essentially non-Markovian dephasing of the qubits with the characteristic time T * 2 , the second component, described as a set of Lindblad superoperators corresponding to the phase damping channel, leading to Markovian transverse decoherence of the qubits on the timescale T 2 , and the third component, also leading to Markovian evolution of the qubit, and described as a set of Lindblad superoperators corresponding to the amplitude damping channel, which leads to longitudinal relaxation of the qubit on the timescale T 1 .For precise meaning of the terms "Markovian" and "non-Markovian", see explanations below in subsection B 2.

Unitary evolution
The simulation of the unitary (Hamiltonian) evolution was performed using the 2nd order Suzuki-Trotter decomposition of the evolution operator.The total Hamiltonian of the system in question can be written as where the operators σ α j for α = x, y, z denote the Pauli matrices σ x j , σ y j , and σ z j , respectively, corresponding to the j-th qubit, and for the problem considered in this paper the couplings J jk are restricted to the nearest neighbor qubits on a 2-D lattice.Note that the actual frequency Ω j of the j-th qubit in the Hamiltonian (B1) is different from its nominal frequency ωm mentioned in the main text; the reasons for this difference are explained in subsection B 2 below.
The Hamiltonian (B1) is represented as a sum J jk σ y j σ y k , (B4) and the corresponding Suzuki-Trotter decomposition of the evolution operator U (∆t) for the (small) timestep of duration ∆t has the form ensuring the overall time discretization error of the order (∆t) 2 .The evolution operator over many time steps is a product of elementary operators U (∆t).Each term in the sum representing the Hamiltonian H X (and, similarly, H Y and H Z ) commutes with all other terms, therefore Each term in this direct product acts on the wavefunction |ψ(t) in a straightforward manner: the entries of the array that represents the wavefunction turn into linear combinations of themselves.Similar direct-product representation holds for H Y and H Z as well, such that the action of the total evolution operator U (∆t) is easy to compute, without the need to calculate or store 2 N × 2 N matrices.
In order to represent the situation where the non-initialized part of the system is in the completely mixed state, but avoid using the density matrix explicitly (which would imply dealing with 2 N × 2 N matrix instead of the single array of the size 2 N ), we represent the completely mixed state as a wavefunction with random entries [53,54].Specifically, we sampled the real and the imaginary parts of each entry of the corresponding wavefunction independently from Gaussian distribution with zero mean and unit variance, and then normalized the resulting wavefunction to one.In this way, for instance, the situation where the first and the second qubit are both initialized in the state |0 , while the rest of the system is in completely mixed state, i.e. when the system's density matrix is where 1 N −2 is an identity matrix of the size 2 N −2 × 2 N −2 , is represented using the total wavefunction in the form where the random state |ψ (r) N −2 of the remaining N −2 qubits is generated as described above.Such an approximation provides high accuracy, of the order of exp (−N/2), due to the measure concentration phenomenon [55].
Further improvement in accuracy was achieved by averaging the values of the relevant observables over M = 189 independent realizations of the random wavefunction (as well as other random quantities, see below), which reduced the error by an additional factor of the order ∼ 1/ √ M ≈ 0.07.The accuracy was also independently controlled by estimating the variance in the calculated values of the observables, and ensuring that this variance remains much smaller than the statistical error caused by the shot noise produced by sampling the relevant observables for each qubit.

Non-unitary evolution
The first non-unitary component of the system's evolution, dephasing of the j-th qubit on the timescale T * 2,j , caused by its random static frequency shift, is modeled by directly reproducing the underlying physical picture.Namely, we assumed that the actual frequency Ω j of the j-th qubit, see Eq. B1, is a sum of two contributions: the nominal value ωj , and a random shift β j that remains constant during the system's evolution.The values of β j were independently sampled from Gaussian distributions with zero mean and variance b 2 j , which can be different for different qubits.The parameter b j determines the dephasing time T * 2,j of the j-th qubit: if this qubit were uncoupled from the rest of the system, then, after averaging over β j , its transverse (x-and y-) components would undergo Gaussian decay with time dependence exp (−b 2 j t 2 /2), i.e. b j = √ 2/T * 2,j .As mentioned above in subsection B 1, the evolution of the system was repeated M = 189 times; each time we used different realizations of the set of the random frequency shifts β j (as well as other random quantities, such as e.g.different realizations of the random wavefunction, also see below).Within this approach, for each particular realization of the parameters β j , the evolution of the system is unitary, and can be simulated using the system's wavefunction as described in subsection B 1 above, while the non-unitary decay occurs due to averaging of the relevant observables over different realizations of the random parameters β j (along with other random quantities).
Note that the dephasing caused by averaging over the static random frequency shifts, with its characteristic Gaussian-like decay of the density matrix elements, cannot be described via Lindblad operators.It is an example of non-Markovian evolution, in the sense that it cannot be described by a set of first-order differential equations (with respect to time t), which would include only current values of the (averaged) elements of the system's density matrix ρ(t); in other words, the future values of the (averaged) density matrix elements, at times t + s (s > 0), are not completely determined by their current values at the moment of time t.At the same time, the static noise processes β j (t) representing the random frequency shifts are, of course, Markovian random processes, sastisfying the Chapman-Kolmogorov equation.
The two other components of the non-unitary evolution, addressed below, are Markovian, and can be described using the Lindblad operators.However, in order to avoid dealing with the density matrix, these components were also modeled by employing the random processes and calculating the averages of the relevant observables.
The second non-unitary component of the evolution corresponds to the Markovian dephasing, and can be described via the set of Lindblad superoperators corresponding to the phase damping channel.For an isolated qubit, this would lead to exponential decay of the transverse components of the j-th qubit, having the form exp (−t/T 2,j ).This kind of dephasing, being Markovian, can be described by a set of first-order differential equations, generalizing the well-known Bloch-Redfield equations [56], which include only the current values of the elements of the system's density matrix ρ(t), such that the future values of the density matrix elements, at times t + s (s > 0), are completely determined by their current values at the moment of time t.
This decay was modeled by taking the z-rotation of the j-th qubit produced by Ω j (i.e., produced by the action of the operator exp (−iH Z ∆t) in Eq.B6), and adding to it another time-dependent rotation around the z-axis by the angle γ j (t).For each time step of duration ∆t, the values γ j (t) were sampled randomly, indepedently of each other and of their previous values, from Gaussian distribution with zero mean and variance 4g 2 j τ j ∆t.This choice for the quantity γ(t) can be visualized as a rotation induced by a time-dependent frequency shift δω j (t), which is represented by an Ornstein-Uhlenbeck noise process with the correlation function δω j (t) δω j (t+s) = g 2 j exp (−|s|/τ j ), in the limit where the correlation time τ j is much smaller than ∆t, while the magnitude g j is large (formally, τ j → 0 and g j → ∞), but the combination g 2 j τ j = 1/T 2,j remains finite.For an isolated qubit, the average evolution under the influence of such Ornstein-Uhlenbeck noise δω j (t) is known [57] to produce exponential decay of the qubit's transverse (x-and y-) components with the decay time T 2 .Again, for each particular realization of the time-dependent random process γ j (t), the evolution of the system is unitary, and can be simulated using the system's wavefunction as described in subsection B 1 (provided, of course, that ∆t T 2,j , to ensure accuracy of the Suzuki-Trotter decomposition), while the non-unitary decay occurs due to averaging over different realizations of the noise.
The third non-unitary component, describing exponential relaxation of the j-th qubit towards the state |0 on a timescale T 1,j , was simulated in a similar manner, by representing the non-unitary evolution via averaging over many realizations of a random unitary evolution, employing the approach described in Ref. [58], with some modifications improving the accuracy.Namely, at each time step, we calculated the probability p j for the j-th qubit to make a transition ("quantum jump") from the state |1 to the state |0 ; the corresponding value is p j = w j1 µ 2 j , where µ j = 1 − exp (−∆t/T 1,j ), and w j1 is the total probability of the system to be in the subspace corresponding to the j-th qubit in the state |1 .This transition was implemented with the probability p j at each time step: the part of the system's wavefunction corresponding to the j-th qubit in the state |0 was replaced by its complement, i.e. by the part corresponding to the j-th qubit in the state |1 , multiplied by the factor µ j , and the part of the wavefunction corresponding to the j-th qubit in the state |1 was set to zero.Alternatively, with the probability 1 − p j at each time step, the part of the wavefunction corresponding to the j-th qubit in the state |1 was multiplied by the factor exp [−(1/2) ∆t/T 1,j ], while its complement was left unchanged.These transformations were applied to the wavefunction in succession, for all qubits (for all j = 1, . . .N ), and the resulting modified wavefunction was normalized back to 1. Since all these transformations commute with the action of the operator exp (−iH Z ∆t) in Eq.B6, they were applied at the end of each unitary time-step evolution, after application of the operator U (∆t) given by Eq.B6, in parallel with the action of the operators exp (−iH Z ∆t) or exp (−iH Z ∆t/2).
Note that this implementation corresponds to the application to the wavefunction of the Krauss operators E 0 or E 1 (see Ref. 59), describing the amplitude damping quantum channel, with the corresponding probabilities, where E 1 corresponds to the event of the "quantum jump", and E 0 corresponds to the absence of it.

Appendix C: Numerical simulation of superconducting qubit platform
From the discussion in the main text we are simulating a 2D grid of qubits that interact only with the nearest neighbours.The coupling between two neighbouring qubits through a coupler can be described by an Hamiltonian (3).In our notations it can be rewritten as where we introduce the notations x , H (i,j) yy = σ (i) y σ (j) y .
Thus, we define the 16 qubits 2D grid, where we generate ω 1 , ω 2 , ω c , g 1 , g 2 , g 12 from Gaussian distribution with mean and variance N (0, 1).The parameters a yy we estimate in our simulation and the rates of decay of T 1 , T 2 and T 2 are given in the Table III.The observables and initial states, isolating the desired coefficients a α i ,α j , αi, αj ∈ {x, y, z} are not zero, given in the second column.The fourth column contains a (i) α i , αi ∈ {x, y, z}, i = 1, . . ., 16.The rates of decay of T1, T2 and T 2 , corresponding to i's qubit are given in the fifth and six's columns, respectively.and a (i,j) yy , are given in Table IV.One can see, that we need three starting states, namely ρ 01 = ρ to isolate all four unknown coefficients.We measure the expectation values of observables in different times.Next, the time traces of these expectation values are fitted, using the polynomial interpolation method, and the derivatives estimation is preceded.Finally, using (A39), (A44), (A28) and (A28), the estimates of the coefficients a ,j) yy for the pair of (i, j) are obtained.We repeat this process for all pairs of interacting qubits to obtain all coefficients of the Hamiltonian of the 2D grid.
In the presence of the noise, the observables and initial states required to isolate the Lindbladian coefficients are given in Table V.One can see, that we need three extra starting states in the presence of the Lindbladian noise, namely ρ 04 = ρ µν , µ, ν ∈ {x, y, z}.

Appendix D: Approximating local time evolutions by polynomials
One of the main points behind our method is the fact that the time evolution of local observables at constant times is well-approximated by polynomials.The purpose of this section is to make this assertion precise.
x , ρ x,x } (A28) TABLE IV.In this table the minimal selection of the pairs {O, ρ (i,j) τ i ,τ j } is presented for our specific example.The first column represents the type of the estimated Hamiltonian (C1) parameters a (i) α i ,α j , αi, αj ∈ {x, y, z}.In the third column the number of equation for every parameter is provided, depending from the pairs of the observable O and the initial state ρ (i,j) τ j ⊗ ρN−2, τi, τj = {x, y, z}, given in the second column.To estimate all four parameters we need only three initial states: ρ (i,j) x,x , ρ (i,j) z,z and ρ (i,j) y,z .
yy , D zz {σ x , ρ (i,j) x,x }; {σ τ i ,τ j } is presented for our specific example.The first column represents the type of the estimated Lindbladian parameters D (i) µν , µ, ν ∈ {x, y, z}.In the third column the number of equation for every parameter is provided, depending from the pairs of the observable O and the initial state ρ (i,j) τi, τj = {x, y, z}, given in the second column.To estimate all parameters we need only three extra states to the ones given by the previous table, namely: ρ z,y and ρ (i,j) y,y .
Before we do that, let us set some notation.Given a system of n qubits on a D, we let L Γ : M 2 n → M 2 n be a Lindbladian which models the time evolution of the system in the Heisenberg picture.Note that in the supplementary information we consider a slightly more general class of evolutions than in the main text.There, we restricted to evolutions whose Hamiltonians were short range with two-body interactions and the noise acted on at most one qubit at a time.Here, in contrast, we will also consider k-local evolutions with long range.
We will assume that this Lindbladian can be written as: where L A is a Lindbladian only acting on the qudits in A. Given some graph G = (V, E) on n vertices, we will say that L is k−local if L A = 0 only if A is a subset of vertices of G containing at most k vertices.Furthermore, we will say that L Γ is locally bounded if there is a constant g > 0 such that for all B ⊂ Γ we have that: This condition is satisfied if e.g.L is a local Lindbladian on a D−dimensional lattice.In that case, we have g = O(D).However, this condition is also fulfilled for generators with algebraically decaying tails, as long as these tails decays fast enough.Moreover, for ease of notation we will let for a region B ⊂ Γ be the generator restricted to a subregion B. Furthermore, given the graph G, some region X ⊂ V and r > 0, we will denote by Λ r (X) the set of vertices that are a distance at most r from X: We will also require some norms for superoperators.Given a superoperator Φ : M 2 n → M 2 n we define for p, q ≥ 1 where • p corresponds to the Schatten p-norm.Also note that p = ∞ corresponds to the operator norm.Our goal will be to prove the following statement: Theorem D.1 (Polynomials approximate the evolution of local expectation values, informal).Let G Γ be a local Lindbladian on a D-dimensional regular lattice.Moreover, let t max , > 0 be given and O Y and observable supported on a constant number of qubits.Assume that G Γ satisfies a Lieb-Robinson bound.Then there is a polynomial p of degree such that for all 0 ≤ t ≤ t max : and We will start by showing a similar statement for the time evolution of truncated local evolutions.After that we will conclude by showing that truncated, local evolutions approximate the global evolution well.It is simple to see that derivatives of locally bounded, truncated evolutions can only increases with the size of the region they are defined on: Lemma D.1 (Derivatives of truncated local evolutions).Let L Γ be a locally bounded Lindbladian with constant g.For an observable O such that O ≤ 1, an initial state ρ and a region B ⊂ Γ define the evolution of the truncated evolution f B : R + → R as f B (t) = tr e tL B (O)ρ .Then for all t ≥ 0: In particular, for any 0 < t < t max we have that: Proof.The proof is elementary.Note that: = tr e tL B ((tL B ) k (O))ρ .Now, by Hölder's inequality we have that: where in (1) we used the submultiplicativity of the operator norm, i.e.Φ 1 Φ 2 ∞→∞ ≤ Φ 1 ∞→∞ Φ 2 ∞→∞ for all linear maps Φ 1 , Φ 2 .In (2) we used the fact that for any quantum channel e tG B ∞→∞ = 1 and the fact that the Lindbladian is locally bounded with constant g.The estimate in Eq. (D8) then immediately follows from Taylor's remainder theorem.
We then immediately have: Corollary D.1.In the same setting as Lemma D.1 it holds that for any given > 0 and t max > 0 there is a polynomial p of degree such that for all 0 ≤ t ≤ t max we have that |f B (t) − p(t)| ≤ .
Proof.It follows from Sitrling's approximation that the error in Eq. (D8) is bounded by It is then easy to see that picking d = 2et max g|B| log( −1 ) − 1 is sufficient to ensure that the error in (D8) is at most .Indeed, plugging in the value of d into Eq.(D10) we get: Thus, the truncated Taylor expansion yields the desired polynomial.
We conclude from Lemma D.1 and Cor.D.1 that local, truncated time evolutions are well-approximated by polynomials whose degree grows like the size of the region times the maximal time of evolution.
Also note that the estimate in Eq. ( D11) is quite loose and shows that for d as in Eq. (D9) the error decays like a polynomial of high-degree in .But that rough approximation will be sufficient for our purposes.
Cor. D.1 is an important step to prove our Thm.D.2, but still does not correspond to the exact statement we wish to prove.This is because Cor.D.1 is a statement about the local, truncated evolution, whereas Thm.D.2 is a statement about the global evolution being well-approximated by a polynomial of small degree.The strategy to go from the local to the global evolution, is to show that for the (local) observables required for our protocl, the local evolution approximates the global one well.
Our main tool to show this approximatability of expectation values are Lieb-Robinson bounds [43][44][45]60], which exactly give conditions under which the local time evolution and the global one are close for small enough times and local observables.In order to provide a self-contained presentation, we include a brief introduction to Lieb-Robinson bounds in Sec.G of this appendix.
In fact, there are various ways of quantifying this idea of local approximability and, thus, LR-bounds come in various forms.The version that we are going to work with here and which is discussed in detail in Sec.where h : R + → R + is a monotonically increasing function such that lim R→+∞ h(R) = 0 and v is some constant that depends on the generator, usually called the LR-velocity.On the other hand, the decay of the function h typically depends on how fast the interactions in the system decays spatially (i.e. if it is strictly local, exponentially decaying in the distance or even algebraically decaying) and the geometry of the underlying lattice.However, the important point for our purposes is that it does not depend on the system size.For the specific case of short-range Hamiltonians discussed in the main text, we have that h(R) = e −µR for some constant µ > 0. For algebraically decaying evolutions we usually have h(x) = O(x −k ) for some k ∈ R + .We refer again to Sec.G for a discussion of various LR-bounds available in the literature.But from Eq. (D13) we immediately conclude that the values of the expectation values of global and local evolutions are well-approximated by each other.More precisely: Proposition D.1.Let O Y be an observable supported on some region Y , > 0 and t max be given.Assume Eq. (D13) holds for the time evolution G Γ and a function h.Let l > 0 be given by l = h −1 e t max − 1 .
Then we have for Λ l (Y ) and all 0 < t < t max and any initial state ρ that Proof.The claim follows directly from Eq. (D13) or Lemma G.1 and a Hölder inequality.Indeed, for the value of l in Eq. (D14), we obtain from Eq. (D13) after some simplification that It is well-known that if p is a polynomial of degree d, then it is uniquely determined by its values at d + 1 points.Thus, one could naively expect that having access to f (t 1 ), . . ., f (t d) points for d ∼ d times is sufficient to reconstruct d.
However, the present situation exhibits three challenges that need to be overcome to ensure that we can reliably apply polynomial interpolation methods and recover p from points f (t i ): 1. we can only estimate f (t), and not p(t).And the value of f only approximates that of p up to some error a , as discussed in Thm.D.2.
2. we do not have access to the value of f (t) directly, but can only approximate it to a precision s by sampling from the output of the device at time t O( −2 s ) times.3. we are interested in the value of p (0) and not in the polynomial p itself.Thus, we need to ensure a small error in estimating the derivative.
To deal with the first two problems the polynomial interpolation technique we use has to be robust to the noise stemming from both the approximation error from the polynomial approximation and the statistical noise.To deal with the third issue, we will show that we need to pick the final and initial time of the interpolation in a judicious manner.
To obtain some intuition about how to pick the times, let us consider the case of estimating the derivative of a quadratic polynomial at 0. I.e., if we have two linear functions p, p that are close in some interval [a, b], how well can we infer the derivative of p at 0 from that of p? First, note that if the interval [a, b] is very small, then two functions can differ by and their derivatives can still differ by /(b − a) even for linear functions.This indicates we probably do not want to pick b − a too small.However, as we know that increasing b also implies that, in our setting, we need to increase the degree of the polynomial for the fitting, which in turns increases the number of points we need to estimate, this hints at the fact that picking b − a of constant order will be optimal.
On the other hand, it is also clear that the closer a is to 0, the more information about the value of p (0) we can infer from the interpolation.Thus, this discussion suggests that picking a as close to 0 as possible and b of constant order should give the best results.We will prove this intuition later in this section, but first will discuss robust interpolation.
Directly interpolating through the noisy data can be an unstable procedure if we do not pick the interpolating points wisely and perform a suitable regression.Recent results have shown how to perform polynomial interpolation in an essentially optimal fashion in a robust way even with a fraction of the points being outliers [38].Let us now review the results of [38].
We will now assume we wish to estimate a polynomial p : [−1, 1] → R of degree d, as this corresponds to the setting of [38].Note that for Hamiltonian learning, we will be interested in the case where the domain is of the form [a, b] for a, b ≥ 0. However, we can simply shift and rescale the domain to [−1, 1].When we summarize our results later, we will dicuss the effect of this rescaling explicitly.We will assume we are given access to m random samples (x i , y i ) of points such that a fraction of at least α > 1/2 of them satisfies for some σ > 0 that: There are results available for various different ways of sampling the points x i .However, the best available sample complexity is given by sampling from the Chebyshev measure, which has density on the interval [−1, 1].We then have: Theorem E.1 (Robust polynomial interpolation).Let p : [−1, 1] → R be a polynomial of degree d and assume we are given m samples (x i , y i ) such that a fraction α > 1/2 of them satisfies Eq. (E1) for some σ > 0.Moreover, suppose that the x i were sampled independently and at random from the Chebyshev measure.Then for any δ > 0 samples suffice to with probability of success at least 1 − δ recover a polynomial p that satisfies: Moreover, p can be computed in time polynomial in the number of samples m.
Proof.We refer to [38,Corollary 1.5] for a proof and note that we obtain the statement by setting the parameter = 1/2 in their statement.
We note that the same result holds for random points picked from the uniform measure with m = O(d 2 ).The result above solves our problem of robust polynomial interpolation outlined in points 1 and 2. It shows that it if we can ensure that we can approximate sufficiently many points of the polynomial up to some σ, then we also recover the whole polynomial up to some error proportional to σ.Moreover, the number of sampled required only has a logarithmic overhead in d when compared with the case where we know the points exactly.As we will see later, for our puproses it will be important to choose d to be small.Thus, in a nutshell, we see that Thm.E.1 ensures that we can reliably and robustly perform polynomial interpolation by only a small overhead when compared to when we know the points exactly.
We will later describe in more detail the algorithm given in [38] whose output satisfies the promises of Thm.E.1.However, before that we will show how the condition in Eq. (E3) ensures that we can also recover the derivative of the polynomial as long as the degree d is small.
To do that, we will resort to Markov brothers' inequality, which we restate now for completeness.
Lemma E.1 (Markov brothers' inequality).For d, k ∈ N define the constant C M (d, k) to be given by Then for any polynomial p of degree d we have that: Proof.We refer to [61, Theorem 1.2] for a proof and discussion of this result.
Note that the value of CM (d, k) increases exponentially with d for k constant.We remark that having further promises on the structure of the polynomial, such as the location of its zeros, can greatly improve this estimate.We once again refer to [61, Chapter 1] for a discussion on this.It would be interesting to see if recent results on the analyticity of the partition function [29] could be used in our context to also improve this estimate, as it grows exponentially with d.However, this general bound will suffice for our purposes.
It is easy to see that for polynomials defined on some interval [a, b], the polynomial p(x) = p( b−a 2 x + a+b 2 ) is defined on [−1, 1] and we can use this simple transformation to obtain a variation of Eq. (E5) for polynomials defined on general intervals.Indeed, applying Eq. (E5) to p, it follows from a straightforward application of the chain rule that max From this we conclude that: Lemma E.2 (Extrapolating the derivative at 0).Let p : [0, b] → R be a polynomial of degree d such that for some > 0 and 0 < a < b: and Proof.It is easy to see that we have: where we used the fact that a ≤ d −2 .Thus, as we conclude that with this choice of parameters we have which gives the claim.
We will discuss in Sec.F how to specialize the discussion and results above to the scenario of Hamiltonian learning.
Appendix F: Choice of parameters and performance guarantee of the protocol Let us now combine the results from Sections D and E to see how to pick the various parameters of the algorithm to ensure a good recovery of the couplings of the Hamiltonian.
More precisely, given a coupling parameter a i,j of L we will be interested in estimating the sample complexity of obtaining an estimate âi,j satisfying |a i,j − âi,j | ≤ (F1) with high probability for some given error .As extensively discussed by now, we can easily reduce estimating the couplings to estimating derivatives of time evolutions of local observables.
As expected, we will see that the main parameters we need to control are the maximal observation time t max and the initial time of measurement t 0 .This is showcased in the following Theorem: Theorem F.1 (Choice of final and initial time).Let L Γ be a locally bounded Lindbladian on a D-dimensional regular lattice with g = O(1) growth constant.Let O be an observable of constant support and ρ and arbitrary quantum state.Let > 0 be given.Assume that L Γ satisfies Eq. (D13) for some function h.Then picking t 0 as In particular, this estimate can be obtained from Õ( −2 log(δ −1 )) (F4) samples from the time evolved state ρ with probability of success at least 1 − δ.
Proof.First, note that by Lemma.D.1, if we pick t max as described above, then a polynomial p of degree 2(e 2.5v − 1) is sufficient to approximate the expectation value in the interval [0, t max ] up to an error /2, as t max ≤ 2.5.We will estimate the value of the polynomial at each point up to an error σ > 0, which is to be determined later.Thus, by inserting the bound on the degree d in Eq. (E16), we need to estimate each value of the polynomial up to a precision σ = O( ) to obtain an overall error of 2(e 2.5v − 1) As we imposed that the precision with which the polynomial approximates the expectation values is /2, we can estimate the value of the polynomial for a given time up to an error O( ) from O( −2 ) samples.
As we have to sample 2(e 2.5v − 1) points to perform the stable interpolation, we obtain the advertised sample complexity.
For the case of strictly local or exponentially decaying interactions we have that h −1 2(e 2.5v − 1) = poly log( −1 ).(F7) In that case the sample complexity is of order Õ( −2 ).Thus, in this case we see that the inverse initial time t −1 0 and the number of points we need to sample from is polylogarithmic in .Furthermore, the sample complexity to obtain an error is also Õ( −2 ) up to polylogarithmic corrections.
For the sake of completeness, let us now discuss the conditions under which our protocol works beyond the setting of exponentially decaying or short-range interactions.From Eq. (F3) the condition for our procedure to work becomes transparent: we need that h −1 2(e 2.5v − 1) Indeed, in this case we have the property that it is possible to suitably re-scale the error to ensure that the total precision is at some desired precision ˜ .For instance, let us assume that h −1 2(e 2.5v − 1) = O( −r ), (F9) for some r > 0. As we discuss later, this is typically the case for algebraically decaying interactions.For such a LR-bound, we see that the resulting error in Eq. (F3) is the super-operator [A X , •] for an arbitrary bounded super-operator K X : M 2 n → M 2 n supported on X leading to a Lieb-Robinson-bound of the form K X (O(t) Y ) ≤ Ch(dist(X, Y ))(e vt − 1), (G2) with C depending on K X ∞→∞,cb .However, if K X is of the form K X = [A X , •], we have K X ∞→∞,cb ≤ 2 A X ∞ , which allows us to recover the commutator [60,63].In the following, we consider a regular lattice Λ and assume that the dynamics is generated by a Lindbladian that decomposes according to Following [43,63,64], we define the maximal interaction strength J = sup X⊂Λ L X 1→1,cb as well as the decay behaviour of the interactions µ(r) = sup X⊂Λ:diam(X)=r in terms of the stabilized 1-to-1-norm T 1→1,cb = sup n T ⊗ id n r→1 .We can then characterize L as finite range if µ(r) = 0 for r ≥ R > 0, exponentially decaying if µ(r) ≤ e −µr and algebraically decaying if µ(r) ≤ (1 + r) −α for α > 0 and state the following Lieb-Robinson-bound for Lindbladians Theorem G.1 (dissipative LR-bound [63]).Let L be a Lindbladian of the form (G3), O Y an observable supported on Y ⊂ Λ and K X : M 2 n → M 2 n with K X (id X ) = 0. Then We notice, that the term inside the integral is exactly of the form of the left-hand side of (G2) with K X = L X .Hence, we can insert the standard LR-bound for dissipative dynamics from (G2) here and are left with a combinatorial problem in terms of the decay bounds.This can be done explicitly for several standard interaction decays, such as finite range, exponentially decaying or algebraically decaying interactions [60,63].In particular, based on the Lieb-Robinson bound in Thm.G.1, we obtain We remark that all these bound give us the required independence of the right-hand side from the overall system size.We expect that with the help of recent more stringent estimates on Hamiltonians with algebraic decay, it will most likely be possible to extend and strengthen these bounds for other algebraic decays.which yields (H4).This concludes all the computations required for Cor.(H.1).
Note that the protocol only requires the preparation of Pauli states and Pauli measurements.Thus, it should be feasible to implement it on near term devices.Additionally, the required postprocessing is efficient, as evaluating the value of X a,b can be efficiently given a sample.
Furthermore, note that if we further have the information that we do not to wish to recover certain bases (i.e.we do not wish to recover Pauli strings with Y terms), it is possible to adapt the protocol and do not prepare initial states or measure in that basis.This will reduce the sample complexity accordingly.
FIG.3.Median quality of recovery of one 2-qubit coupling using our method and those based on numerical derivatives[30- 32]  as a function of the standard deviation of the shot noise.The initial time for this estimate is 30 ns and here we also generated 1000 instances of the noise with a given standard deviation.The plot shows the median quality of the recovery and the 25 and 75 percentiles.We see that the quality of the recovery for the interpolation decays approximately linearly with the shot noise.For the numerical derivative, we see two regimes: first a linear decay of the error until a shot of noise of order 10 −3 .After that, the error plateaus and does not improve even with smaller shot noise.This is because for numerical derivative methods, at this point the dominant error source comes from the choice of initial time.Importantly, we see that interpolation consistently provides better estimates than the numerical derivatives method.
5v −1) D log( −1 )) points.Then we can estimate each parameter up to an error of G, considers an observable O Y initially supported on in a region Y .For a region B ⊃ Y we set R = dist(Γ\{B}, Y )/k, where k is the locality of the generator, then it is shown in Lemma G.1 that indeed (e tLΓ − e tL Λr (Y ) )(O Y ) ∞ ≤ (e vt − 1)h(R), (D13)

K
X (O(t) Y ) ≤ K X ∞→∞,cb O Y min(|X|, |Y |)h(dist(X, Y ))(e vt − 1), (G4)with h(r) = e −νr for L exponentially decaying or finite range and h(r) = (1 + r) ν if L is algebraically decaying with α > 2D + 1 with ν < α − (2D + 1).As stated above, in this work, we require a slightly different formulation of the LR-bound as given in(6), namely(e tLΛ − e tL Λr (Y ) )(O Y ) ≤ c 1 h(diam(Λ r(Y ) ))(e vt − 1),(G5)which reflects directly that the dynamics of the system can already be described by a generator L Λ r(Y ) restricted to a region Λ r(Y ) of diameter r around the initial support Y of the observable O Y .To convert a bound of the form (G2), we follow the reasoning given in[60,63].We can express the difference of the dynamics generated by the full Lindblad generator L as compared to a restriction L Λr to the subset Λ r ⊂ Λ according to (e tL − e tLΛ r )O Y = − t 0 ds ∂ s e sLΛ r e (t−s)L O Y = t 0 ds e sLΛ r (L − L Λr ) e (t−s)L O Y (G6) Taking norms on both sides, we therefore obtain an upper bound of the form (e tG − e tLΛ r )O Y ≤ X ⊂Λr t 0 ds L X e (t−s)LO Y .(G7)

TABLE I .
The first column represents the type of the estimated Hamiltonian parameters a

TABLE II .
The first column represents the type of the estimated Lindbladian parameters D

TABLE III .
The first column represents the numbers of qubits whose coefficients a

TABLE V .
In this table the minimal selection of the pairs {O, ρ