Introduction

An increasingly relevant task for the study of many-body quantum systems is to learn the associated Hamiltonian operator efficiently (i.e., without requiring resources that scale exponentially in system size). In condensed matter physics, we can experimentally verify our models of quantum materials by comparing theoretical predictions about their effective interactions with the interactions inferred by Hamiltonian learning1,2,3,4. This verification is also applicable for quantum device engineering. With the expanding capabilities of quantum computers, it is increasingly important to be able to certify their behavior5, and while benchmarking protocols can give coarse-grained information about a particular quantum device, knowing its Hamiltonian can be significantly more powerful, allowing us to design improved devices6,7,8 or better understand the physical origin of failure modes9,10,11.

Several promising approaches have been proposed for the Hamiltonian learning problem. An early work12 demonstrated that systems with local Hamiltonians can be efficiently characterized without requiring full state tomography, which is costly for a given accuracy in the trace norm. However, this method was limited in its applicability and found to be prohibitively expensive in general. Subsequent approaches13,14,15,16 successfully employed machine learning on small systems. Nonetheless, these methods lack rigorous performance guarantees or scaling results, and their performance on larger systems has not been explored beyond limited numerical studies. Additionally, several proposals17,18,19 suggested learning the coefficients of the Hamiltonian by solving a system of linear equations, with the coefficient matrix determined by local measurement outcomes. However, the performance of these approaches depends on the spectral gap of the coefficient matrix, which remains poorly characterized. Recent works20,21 have achieved asymptotically optimal sample complexities, albeit with large constant prefactors that render them impractical in real-world scenarios.

In this work, we propose a protocol for Hamiltonian learning that aims to address these shortcomings. Our protocol is motivated by a major application of Hamiltonian learning, which is the characterization of near-term quantum computers. To accommodate this application, our protocol is designed to make relatively weak assumptions about the nature of the system. Specifically, we assume:

  • The Hamiltonian we are interested in learning is sparsely interacting (these are generalizations of k-local Hamiltonians; see Definition 2).

  • Our interaction with the system is limited to the ‘prepare-and-measure’ model – that is, we do not require the ability to interact with the system under study via another trusted quantum simulator (e.g., refs. 22,23,24) or to make interventions (other than measurement) after initializing the system25. Two examples of this prepare-and-measure setup are making measurements on time-evolved states or on Gibbs states (Fig. 1). In these two settings, we assume that we can control the evolution time and the temperature, respectively.

  • We can prepare fully separable states and make Pauli measurements.

We note that the practicality of these assumptions depends on the experimental platform. Indeed, there are other approaches that impose even more stringent assumptions, such as the restriction of only being able to prepare a single fixed initial state26, or the ability to make measurements on only a single site27. In our work, we do not impose such restrictive assumptions, as they do not align with the application we focus on, namely the characterization of near-term quantum computers. In this context, it is natural to assume that we can prepare arbitrary product states and perform Pauli measurements on arbitrary sites. A further advantage of our protocol is that it is easily parallelizable. In short, we will describe a Hamiltonian learning protocol that requires only \({{{{{{{\mathcal{O}}}}}}}}\left({\epsilon }^{-2}{{{{{{{\rm{polylog}}}}}}}}(n/\epsilon )\right)\) samples to recover every parameter of a sparsely interacting n-qubit Hamiltonian up to an error ϵ. We will conclude by providing a concrete prescription for optimal configurations of the protocol when used in practice, and demonstrate its performance with numerical examples.

Results

In this work, we will treat the system under study as a black box system with an unknown Hamiltonian H, and our goal will be to efficiently infer H with access to only a limited number of inputs to, and outputs from the black box. Importantly, we use the ‘prepare-and-measure model’ of interaction with our system (see Fig. 1). This model of interaction prohibits any quantum channel between the system under study (whose Hamiltonian we are trying to learn) and some other quantum processing unit. Furthermore, after initializing the system in some state, it prohibits any interaction with the system other than making measurements. Two typical examples of this are Hamiltonian learning using unitary dynamics and Gibbs states. For the former, we initialize the system in some known state ρ0, and evolve it forward in time by t, resulting in the state:

$$\rho (t)={e}^{-iHt}{\rho }_{0}{e}^{iHt}.$$
(1)

For the latter, we assume we have access to a system in thermal equilibrium at a temperature β⁻¹. That is, we have access to the Gibbs state

$$\rho (\beta )=\frac{\exp (-\beta H)}{{{{{{{{\rm{Tr}}}}}}}}(\exp (-\beta H))}.$$
(2)

We assume that we can control the parameters t and β, respectively. Finally, we assume that we can measure some observable P of the final states ρ(t) and ρ(β). However, we do not insist on arbitrary control over ρ0 and P; we only consider the case where ρ0 is fully separable and P is a local Pauli operator.
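As a concrete (toy) illustration of this prepare-and-measure access, the sketch below evaluates both expectation values exactly for a two-qubit system; the Hamiltonian, observable, and initial state here are assumptions chosen purely for illustration, not taken from the protocol itself:

```python
import numpy as np
from scipy.linalg import expm

# Single-qubit Paulis.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Toy two-qubit Hamiltonian H = Z⊗Z + X⊗I (coefficients assumed for illustration).
H = np.kron(Z, Z) + np.kron(X, I2)

# SPAM setting: observable P = Z on qubit 0; fully separable initial state
# |0><0| ⊗ I/2 for the unitary-dynamics oracle.
P = np.kron(Z, I2)
rho0 = np.kron(np.diag([1.0, 0.0]).astype(complex), I2 / 2)

def f_unitary(t):
    """Tr(P rho(t)) with rho(t) = e^{-iHt} rho0 e^{iHt}, as in Eq. (1)."""
    U = expm(-1j * H * t)
    return np.trace(P @ U @ rho0 @ U.conj().T).real

def f_gibbs(beta):
    """Tr(P rho(beta)) with rho(beta) = e^{-beta H}/Tr(e^{-beta H}), as in Eq. (2)."""
    G = expm(-beta * H)
    return np.trace(P @ G).real / np.trace(G).real
```

At t = 0 the state is ρ0, so f_unitary(0) = Tr(Pρ0) = 1 for this choice; at β = 0 the Gibbs state is maximally mixed, so f_gibbs(0) = 0 for any traceless P.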

Fig. 1: Classical interaction with quantum systems.
figure 1

The 'prepare and measure' model for interacting with a quantum system. We view the system as a set of oracles indexed by the state preparation and measurement parameters ρ0, P in the time evolution case, and P in the Gibbs state case. These oracles take some input t or β, and we use their output to characterize the Hamiltonian. In (a), we show our model for time evolution, wherein we control three quantities: ρ0, t, and P. We assume we can evolve the input state ρ0 forward in time, and after a time t, we make a measurement of the observable P. In (b), we show our model for learning from Gibbs states, wherein we control two quantities β and P. We assume we have access to the Gibbs state at temperature β⁻¹, and then measure the observable P.

Using these two interaction models, we propose a method for Hamiltonian learning that relies on a simple intuition. For some particular state preparation and measurement (SPAM) setting (consisting of a prescription for the observable P and, in the case of unitary dynamics, the initial state ρ0), which we write as \({{{{{{{\mathcal{S}}}}}}}}\), we can define a function \({f}_{{{{{{{{\mathcal{S}}}}}}}}}\) as the expectation value of P on the state ρ(t) or ρ(β), respectively:

$${f}_{{{{{{{{\mathcal{S}}}}}}}}}(x)=\left\{\begin{array}{ll}{{{{{{{\rm{Tr}}}}}}}}(P\rho (t=x))\quad &{{{{{{{\rm{for}}}}}}}}\,{{{{{{{\rm{unitary}}}}}}}}\,{{{{{{{\rm{evolution}}}}}}}}\\ {{{{{{{\rm{Tr}}}}}}}}(P\rho (\beta=x))\quad &{{{{{{{\rm{for}}}}}}}}\,{{{{{{{\rm{Gibbs}}}}}}}}\,{{{{{{{\rm{states}}}}}}}}.\hfill\end{array}\right.$$
(3)

We will show that, for the appropriate choice of SPAM parameters, \({f}_{{{{{{{{\mathcal{S}}}}}}}}}(x)\) can be viewed as a black-box function of x. Using this framework, we describe our basic approach below. For concreteness, we will consider learning with unitary evolution (the analysis for Gibbs states follows similarly in Supplementary Note 5). First, to assist the reader, we provide a glossary of notation (Table 1) to serve as a reference.

Table 1 Glossary of Notations

Preliminaries

To set the stage, we first give a formal definition of the Hamiltonian learning problem and define a sparsely interacting Hamiltonian.

Definition 1

(Hamiltonian learning problem). Fix a Hamiltonian on an n-qubit system that has an expansion in the Pauli basis:

$$H=\mathop{\sum }\limits_{i=1}^{r}{\theta }_{i}{P}_{i},$$
(4)

where each \({P}_{i}\in {\left\{I,{\sigma }_{x},{\sigma }_{y},{\sigma }_{z}\right\}}^{\otimes n}\) is a Pauli operator and \({{\Theta }}={\left[{\theta }_{1},\ldots,{\theta }_{r}\right]}^{T}\in {{\mathbb{R}}}^{r}\) are the Hamiltonian coefficients. We assume the Hamiltonian is traceless (i.e., no Pi is the identity I⊗n), and that we know the structure of the Hamiltonian (i.e., which Paulis Pi are present in the expansion), but that the coefficients θi are unknown. The Hamiltonian learning problem is to infer all of the coefficients θi up to an additive error \(\epsilon \cdot \mathop{\max }\nolimits_{i}\left|{\theta }_{i}\right|\) with success probability at least 1 − δ. We will assume two types of data access, which define different variants of the Hamiltonian learning problem.

  1. Unitary evolution: We can prepare the system in some initial product state ρ0 and evolve for a specifiable duration of time t. We can then make a measurement of some local Pauli observable on this time-evolved state.

  2. Gibbs states: We can prepare the system in a Gibbs state at some specifiable temperature. We can then make a measurement of some local Pauli observable on this Gibbs state.
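As a concrete representation of Eq. (4), a minimal sketch can map Pauli strings to coefficients and assemble the dense Hamiltonian matrix; the three-qubit terms below are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from functools import reduce

PAULI = {"I": np.eye(2, dtype=complex),
         "X": np.array([[0, 1], [1, 0]], dtype=complex),
         "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
         "Z": np.array([[1, 0], [0, -1]], dtype=complex)}

def pauli_matrix(label):
    """Dense matrix for a Pauli string such as 'ZZI'."""
    return reduce(np.kron, [PAULI[c] for c in label])

def hamiltonian(terms):
    """H = sum_i theta_i P_i from a dict {Pauli string: theta_i}, as in Eq. (4)."""
    n = len(next(iter(terms)))
    H = np.zeros((2**n, 2**n), dtype=complex)
    for label, theta in terms.items():
        H += theta * pauli_matrix(label)
    return H

# A 3-qubit example with an assumed (known) structure; the coefficients here
# are placeholders for the unknowns the learning problem targets. H is
# traceless because no term is the identity string.
H = hamiltonian({"ZZI": 0.7, "IZZ": -0.3, "XII": 0.5})
```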

Definition 2

(Sparsely interacting Hamiltonian). The interaction graph \({{{{{{{\mathcal{G}}}}}}}}\) (called the “dual” interaction graph by Haah et al.21) of a Hamiltonian consists of a set of vertices V and edges E.

$$V=\left\{{P}_{i}| i=1,\ldots,r\right\}\,,$$
(5)
$$E=\left\{\left({P}_{i},{P}_{j}\right)| \left({{{{{{{\rm{supp}}}}}}}}\left({P}_{i}\right)\cap {{{{{{{\rm{supp}}}}}}}}\left({P}_{j}\right)\ne \varnothing \right)\wedge \left(i \, \ne \, j\right)\right\}\,.$$
(6)

Each vertex represents one Pauli operator Pi in the Hamiltonian, and there is an edge between two vertices if the supports of their corresponding Pauli operators overlap. The support of a Pauli, supp(P), is the set of sites that P acts nontrivially on. We also define the degree \({{{{{{{\mathscr{D}}}}}}}}\) of the Hamiltonian to be the maximum degree of any node in the interaction graph:

$${{{{{\mathscr{D}}}}}}=\mathop{\max}\limits_{v {\in} {V}} {\deg}\!(v).$$
(7)

A Hamiltonian is sparsely interacting if \({{{{{{{\mathscr{D}}}}}}}}={{{{{{{\mathcal{O}}}}}}}}\left(1\right)\) (that is, \({{{{{{{\mathscr{D}}}}}}}}\) does not depend on system size). Notably, this class of Hamiltonians includes geometrically k-local Hamiltonians, as this locality constraint implies that the number of terms overlapping with any Pauli term is a function of k alone.

Example 2.1

In Fig. 2, we show a sample interaction graph for a 9-qubit transverse field Ising model (TFIM), whose Hamiltonian is

$$H=\mathop{\sum }\limits_{i=1}^{8}{\sigma }_{z}^{(i)}{\sigma }_{z}^{(i+1)}+\mathop{\sum }\limits_{i=1}^{9}{\sigma }_{x}^{(i)}.$$
(8)

The TFIM will serve as a prototypical example for the rest of this work.
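The interaction graph of Definition 2 is cheap to construct classically. The sketch below builds it for the TFIM of Eq. (8) and recovers the degree \({{{{{{{\mathscr{D}}}}}}}}=4\) of the 9-qubit instance:

```python
def tfim_terms(n):
    """Pauli strings of the n-qubit TFIM of Eq. (8): ZZ couplings and X fields."""
    zz = ["I" * i + "ZZ" + "I" * (n - i - 2) for i in range(n - 1)]
    x = ["I" * i + "X" + "I" * (n - i - 1) for i in range(n)]
    return zz + x

def support(p):
    """Sites on which a Pauli string acts nontrivially."""
    return {i for i, c in enumerate(p) if c != "I"}

def degree(terms):
    """Maximum degree D of the interaction graph of Definition 2."""
    deg = [0] * len(terms)
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            if support(terms[i]) & support(terms[j]):
                deg[i] += 1
                deg[j] += 1
    return max(deg)

D = degree(tfim_terms(9))   # the 9-qubit TFIM of Fig. 2 has D = 4
```

A bulk ZZ term overlaps two neighboring ZZ terms and two X terms, giving the maximum degree of 4, independent of n.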

Fig. 2: Interaction graph for a transverse field Ising model.
figure 2

Interaction graph \({{{{{{{\mathcal{G}}}}}}}}\) for a 9-qubit transverse field Ising model. The degree of this Hamiltonian is \({{{{{{{\mathscr{D}}}}}}}}=4\), since for instance P2 is connected to 4 other Pauli terms.

Collecting necessary data

Writing the Taylor expansion of Eq. (3) as \({f}_{{{{{{{{\mathcal{S}}}}}}}}}(t)=\mathop{\sum }\nolimits_{m=0}^{\infty }{c}_{m}\frac{{t}^{m}}{m!}\), our protocol will focus on extracting Hamiltonian parameters using only the first order coefficient of the Taylor expansion, c1. To infer this coefficient, we will need to collect data that allows us to estimate \({f}_{{{{{{{{\mathcal{S}}}}}}}}}(t)\). The amount and nature of this data will depend on the higher order derivatives cm. More specifically, together with the desired accuracy of the learning protocol ϵ, a bound on the magnitude \(\left|{c}_{m}\right|\) will determine the required accuracy for our estimate of \({f}_{{{{{{{{\mathcal{S}}}}}}}}}(t)\), the number of different points at which we evaluate the function, and the specific times at which we evaluate it. The scaling we find for \(\left|{c}_{m}\right|\) varies depending on whether we are using unitary dynamics or Gibbs states, and also depends on the assumptions we make about the Hamiltonian (i.e., the structure parameter \({{{{{{{\mathscr{D}}}}}}}}\), and whether the Hamiltonian is commuting). This bound is a crucial determining factor for the rest of our algorithm. In this work, we find

$$\left|{c}_{m}\right|\sim \left\{\begin{array}{ll}{{{{{{{\mathcal{O}}}}}}}}\left({{{{{{{{\mathscr{D}}}}}}}}}^{m}m!\right)\quad &{{{{{{{\rm{for}}}}}}}}\,{{{{{{{\rm{sparsely}}}}}}}}\,{{{{{{{\rm{interacting}}}}}}}}\,{{{{{{{\rm{Hamiltonians}}}}}}}}\,{{{{{{{\rm{using}}}}}}}}\,{{{{{{{\rm{unitary}}}}}}}}\,{{{{{{{\rm{dynamics}}}}}}}}\\ {{{{{{{\mathcal{O}}}}}}}}\left({{{{{{{{\mathscr{D}}}}}}}}}^{m}\right)\quad &{{{{{{{\rm{for}}}}}}}}\,{{{{{{{\rm{commuting}}}}}}}}\,{{{{{{{\rm{Hamiltonians}}}}}}}}\,{{{{{{{\rm{using}}}}}}}}\,{{{{{{{\rm{unitary}}}}}}}}\,{{{{{{{\rm{dynamics}}}}}}}}\hfill\\ {{{{{{{\mathcal{O}}}}}}}}\left({{{{{{{{\mathscr{D}}}}}}}}}^{2m}m!\right)\quad &{{{{{{{\rm{for}}}}}}}}\,{{{{{{{\rm{sparsely}}}}}}}}\,{{{{{{{\rm{interacting}}}}}}}}\,{{{{{{{\rm{Hamiltonians}}}}}}}}\,{{{{{{{\rm{with}}}}}}}}\,{{{{{{{\rm{Gibbs}}}}}}}}\,{{{{{{{\rm{states.}}}}}}}}\hfill\end{array}\right.$$
(9)

Importantly, due to the structure of the Hamiltonian (i.e., it is sparsely interacting), \(\left|{c}_{m}\right|\) does not depend on the size of the system. This enables our protocol to achieve a sample complexity that scales only polylogarithmically in n.

Having characterized the higher order derivatives, we return to c1: as mentioned above, this is the only derivative we are interested in. This is because, with the appropriate SPAM configuration, the first order Taylor coefficient c1 will correspond to exactly one Hamiltonian parameter. More precisely, by expanding H in the Pauli basis, we find that there is always at least one pair (P, ρ0) such that \({c}_{1}={{{{{{{\rm{Tr}}}}}}}}(i[H,P]{\rho }_{0})\) corresponds exactly to one of the Hamiltonian coefficients θm. However, this approach only allows us to extract one Hamiltonian parameter at a time. It turns out that if we are careful, we can learn entire sets of parameters at once by applying simultaneous measurements. These sets of parameters can be chosen with an efficient classical analysis of the Hamiltonian’s interaction graph: the key idea is that if two Pauli terms in the Hamiltonian are far enough apart, they have no effect on each other (to first order in time). After these sets are chosen, we can use a single fixed state ρ0, and a set of commuting observables \(\left\{{P}_{i}\right\}\) such that each \({{{{{{{\rm{Tr}}}}}}}}\left(i\left[H,{P}_{i}\right]{\rho }_{0}\right)\) extracts one Hamiltonian parameter, and all the observables Pi can be measured simultaneously. Furthermore, the observables Pi can be chosen to be single qubit Paulis and the initial state ρ0 will be a fully separable state. The reduced state for each site will be either the maximally mixed state I/2 or an eigenstate of X, Y, or Z; the full state ρ0 is a tensor product of these single qubit states. These states are easily prepared from \({\left|0\right\rangle }^{\otimes n}\) by applying a constant number of single qubit gates. This simultaneous measurement technique allows us to learn all the Hamiltonian parameters with a sample complexity that is only logarithmic in the number of parameters.
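The grouping step above can be sketched as a greedy coloring of the interaction graph. Here we use the illustrative rule that two terms conflict if they are within two hops of each other; this radius-2 exclusion is an assumption standing in for the paper's actual first-order criterion:

```python
def tfim_terms(n):
    """Pauli strings of the n-qubit TFIM: ZZ couplings and X fields."""
    zz = ["I" * i + "ZZ" + "I" * (n - i - 2) for i in range(n - 1)]
    x = ["I" * i + "X" + "I" * (n - i - 1) for i in range(n)]
    return zz + x

def adjacency(terms):
    """Adjacency sets of the interaction graph (Definition 2)."""
    supp = [{k for k, c in enumerate(p) if c != "I"} for p in terms]
    adj = [set() for _ in terms]
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            if supp[i] & supp[j]:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_groups(terms, radius=2):
    """Greedily color terms so that no two terms within `radius` hops of each
    other share a group; each group is then a candidate set of parameters to
    learn with a single simultaneous-measurement setting."""
    adj = adjacency(terms)

    def ball(i):
        seen, frontier = {i}, {i}
        for _ in range(radius):
            frontier = {k for j in frontier for k in adj[j]} - seen
            seen |= frontier
        return seen - {i}

    color = {}
    for i in range(len(terms)):
        used = {color[j] for j in ball(i) if j in color}
        color[i] = min(c for c in range(len(terms) + 1) if c not in used)
    groups = {}
    for i, c in color.items():
        groups.setdefault(c, []).append(i)
    return list(groups.values())

groups = greedy_groups(tfim_terms(9))
# With degree D = O(1), the greedy bound gives O(D^2) groups, independent of n.
```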

After the SPAM parameters have been determined, we then evaluate \({f}_{{{{{{{{\mathcal{S}}}}}}}}}\) to collect a dataset that will subsequently allow us to infer c1. This dataset collection is the only part of our protocol that requires interaction with the system under study. For Hamiltonian learning with unitary dynamics, this involves initializing the system in a product state ρ0, evolving for some time t1, then measuring the set of observables Pi. This is repeated L times for different (predetermined) evolution times t1, t2, …, tL ∈ [0, A] up to some maximum time A.

Classical postprocessing

Having constructed our dataset, our Hamiltonian learning protocol can be summarized as follows. For each Hamiltonian parameter θi, we fit the corresponding data in our dataset with a degree L − 1 polynomial in t. The first derivative of this fitted polynomial at t = 0 serves as an estimate for the parameter θi. The following is an informal sketch of our algorithm.

By using a form of polynomial regression known as Chebyshev regression (which simply consists of choosing t judiciously), we can guarantee that c1 can be estimated with a bias \({{{{{{{\mathcal{O}}}}}}}}\left(\frac{{A}^{L}\left|{c}_{L}\right|}{L!}\right)\). If \(\left|{c}_{L}\right|\) grows no faster than a factorial, as is the case in Eq. (9), the bias decreases (at least) as a power law in L for suitably chosen A. However, our overall error scaling cannot achieve this bound due to the presence of noise when evaluating \({f}_{{{{{{{{\mathcal{S}}}}}}}}}\), as increasing L will result in an increase in the variance of our estimator for c1. The modeling error (bias) must be carefully traded against the effects of noise (variance). By appropriately balancing these two, we show that we are able to achieve almost shot noise-limited performance. This is made precise by the following theorem.

Theorem 1

(Hamiltonian learning with unitary dynamics). For the appropriate choice of Chebyshev degree \(L \sim {{{{{{{\mathcal{O}}}}}}}}\left(\log {\epsilon }^{-1}\right)\) and evolution time \(A \sim {{{{{{{\mathcal{O}}}}}}}}\left(1\right)\), the algorithm shown in Box 1 solves the Hamiltonian learning problem with sample complexity

$${{{{{{{\mathcal{O}}}}}}}}\left(\frac{{{{{{{{{\mathscr{D}}}}}}}}}^{4}\log (r/\delta )\,{{{{{{{\rm{polylog}}}}}}}}({{{{{{{\mathscr{D}}}}}}}}/\epsilon )}{{\epsilon }^{2}}\right),$$
(10)

and classical processing time complexity

$${{{{{{{\mathcal{O}}}}}}}}\left(\frac{{{{{{{{{\mathscr{D}}}}}}}}}^{2}r\log (r/\delta )\,{{{{{{{\rm{polylog}}}}}}}}({{{{{{{\mathscr{D}}}}}}}}/\epsilon )}{{\epsilon }^{2}}\right).$$
(11)

Proof

See Supplementary Note 4.

Similar to the results of França et al.28, this can be generalized, via careful selection of initial states and measurements, to learn the Lindbladian (when expanded in the Pauli basis) of open quantum systems undergoing Markovian dynamics. The sample and classical processing time complexities using Gibbs states are only worse by factors of \({{{{{{{\mathscr{D}}}}}}}}\) and \({{{{{{{{\mathscr{D}}}}}}}}}^{2}\), respectively.
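The estimation step at the heart of this result can be sketched as Chebyshev regression of (possibly noisy) expectation values followed by differentiation at t = 0. The test function below is an assumption standing in for the measured \({f}_{{{{{{{{\mathcal{S}}}}}}}}}(t)\):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)

def estimate_c1(f, A, L, sigma=0.0):
    """Estimate c1 = f'(0) by fitting a degree-(L-1) polynomial to f at the
    L Chebyshev nodes mapped to [0, A] (Chebyshev regression)."""
    k = np.arange(L)
    x = np.cos((2 * k + 1) * np.pi / (2 * L))      # Chebyshev nodes on (-1, 1)
    t = (A / 2) * (1 + x)                          # nodes mapped to (0, A)
    y = np.array([f(ti) for ti in t]) + sigma * rng.standard_normal(L)
    coef = C.chebfit(x, y, L - 1)                  # fit in the x variable
    dcoef = C.chebder(coef)
    return (2 / A) * C.chebval(-1.0, dcoef)        # chain rule: t = 0 <-> x = -1

# Noise-free sanity check on an assumed test function f(t) = sin(3t), f'(0) = 3.
c1_hat = estimate_c1(lambda t: np.sin(3 * t), A=0.5, L=6)
```

Setting sigma > 0 models shot noise on each evaluation, exposing the bias-variance trade-off in A and L discussed above.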

Numerical simulations

In Theorem 1, we have established the theoretical sample and processing time complexities of our Hamiltonian learning protocol, indicating its effectiveness under certain settings of the Chebyshev degree L and evolution time A. However, to provide practical guidance, we now delve into the optimal configurations of our algorithm for real-world applications. This includes prescribing specific values for L and A based on numerical considerations. Additionally, we present compelling numerical results obtained from an 80-qubit transverse field Ising model (TFIM), providing empirical evidence that further supports the utility of our protocol. Our aim will be to learn the following TFIM Hamiltonian:

$$H=\mathop{\sum }\limits_{i=1}^{n-1}{J}_{i}{\sigma }_{z}^{(i)}\otimes {\sigma }_{z}^{(i+1)}+\mathop{\sum }\limits_{i=1}^{n}{B}_{i}{\sigma }_{x}^{(i)},$$
(12)

where Ji, Bi~Unif(−1, 1). We choose the TFIM for its broad range of applications29, including its relevance for quantum computing platforms such as Rydberg atom arrays30. The dynamics of this Hamiltonian are simulated with the time evolution block-decimation method31,32,33,34,35.
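At small sizes, the TEBD simulation can be replaced by exact dense evolution. The sketch below draws a random 5-qubit instance of Eq. (12) and checks, for an illustrative SPAM choice (a Y = +1 eigenstate on qubit 0, maximally mixed elsewhere, measuring σz on qubit 0), that the slope of the expectation value at t = 0 recovers a field coefficient; the relation c1 = 2B0 is derived for this particular choice and is not a prescription from the paper:

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

rng = np.random.default_rng(7)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(op, site, n, op2=None):
    """Tensor `op` onto `site` (and `op2` onto site+1 if given), identities elsewhere."""
    ops = [I2] * n
    ops[site] = op
    if op2 is not None:
        ops[site + 1] = op2
    return reduce(np.kron, ops)

# Random TFIM of Eq. (12) on n = 5 qubits; exact dense evolution stands in
# for the TEBD simulation used at n = 80.
n = 5
J = rng.uniform(-1, 1, n - 1)
B = rng.uniform(-1, 1, n)
H = sum(J[i] * embed(Z, i, n, Z) for i in range(n - 1)) \
  + sum(B[i] * embed(X, i, n) for i in range(n))

# Illustrative SPAM choice: P = sigma_z on qubit 0, rho0 = |+i><+i| ⊗ (I/2)^4.
# Since [Z0 Z1, Z0] = 0 and i[X0, Z0] = 2 Y0, the first Taylor coefficient is
# c1 = Tr(i[H, P] rho0) = 2 B0 <Y0> = 2 B0.
P = embed(Z, 0, n)
plus_i = np.array([[0.5, -0.5j], [0.5j, 0.5]])
rho0 = reduce(np.kron, [plus_i] + [I2 / 2] * (n - 1))

def f(t):
    U = expm(-1j * H * t)
    return np.trace(P @ U @ rho0 @ U.conj().T).real

dt = 1e-4
central_diff = (f(dt) - f(-dt)) / (2 * dt)   # numerical estimate of c1
```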

Our protocol has two hyperparameters that determine its performance: the maximum evolution time A and the fitting polynomial degree L. Setting these parameters is a delicate balance between noise-induced and modeling errors. If A is too low or L is too high, the variance in the dataset will dominate the error; on the other hand, if A is too high or L is too low, the modeling error will dominate. It is generally desirable to set these two parameters such that the modeling and noise errors are comparable. However, in some settings, it may be desirable to let the dataset variance grow somewhat larger than the modeling error, since the noise error can be quantified exactly in terms of the variances \({\sigma }_{\ell }^{2}\), which can be obtained by a bootstrap estimate from the dataset (see Supplementary Note 3); there is no analogous method for quantifying the modeling error. One possible method for setting A and L is to optimize the error bounds (see Fig. 3). Numerically, these optimal values behave as anticipated in our theoretical analysis (Theorem 1): the optimal L* scales as \({{{{{{{\mathcal{O}}}}}}}}\left(\log {\epsilon }^{-1}\right)\), and A~1. This leads to a sampling complexity that scales as \({{{{{{{\mathcal{O}}}}}}}}\left({{{{{{{\rm{polylog}}}}}}}}(1/\epsilon ){\epsilon }^{-2}\right)\).

Fig. 3: Optimal hyperparameter settings.
figure 3

Settings for N, L, and A as a function of the desired error ϵ. These settings are found based on minimizing the upper bound on NL (in practice, L can only take integer values, so the values shown would be rounded to the nearest integer). For the case of arbitrary Hamiltonians, we observe \(L \sim {{{{{{{\mathcal{O}}}}}}}}\left(\log {\epsilon }^{-1}\right)\), \(A \sim {{{{{{{\mathcal{O}}}}}}}}\left(1\right)\), and \(N \sim {{{{{{{\mathcal{O}}}}}}}}\left({{{{{{{\rm{polylog}}}}}}}}(1/\epsilon ){\epsilon }^{-2}\right)\). We find similar scaling for the case of the commuting Hamiltonian in every variable except A, which also scales as \({{{{{{{\mathcal{O}}}}}}}}\left(\log {\epsilon }^{-1}\right)\). Despite this, the overall sample complexity is only better than the general case by a constant factor.

In Fig. 4, we show the error in the recovered Hamiltonian parameters corresponding to a target error of ϵ = 0.01. As expected, the theoretical prediction for the noise error is nearly exact. The modeling error, however, is overestimated by nearly four orders of magnitude. This overestimate has important consequences for the algorithm, since it results in a poorly specified evolution time A. We propose a number of remedies for this in Supplementary Note 6; the improvements enabled by these techniques are shown in Fig. 5. As demonstrated by the figure, we are able to recover all 159 Hamiltonian parameters up to an error of 10% using just ~10⁶ samples.

Fig. 4: Empirical error of the Hamiltonian learning protocol.
figure 4

The empirical modeling and noise error of the Hamiltonian learning protocol using the optimal A and L for ϵ = 0.01 as prescribed in Fig. 3. The modeling errors are calculated with a noise-free dataset, and the noise errors are calculated from a single noisy dataset. The dashed line indicates the maximum theoretical modeling error on the left and indicates the predicted variance due to noise on the right.

Fig. 5: Empirical error distribution.
figure 5

On the left, we show the maximum absolute error across all 159 coefficients of the 80-qubit TFIM model, plotted against the total number of samples used by the learning protocol, and on the right, we show the quotient of the theoretical error upper bound and the empirical errors from numerical simulations (note the log-log scale for both plots). The violin plots show the distribution of maximum absolute errors from 100 random initializations of the TFIM (with coefficients sampled uniformly between −1 and 1). The distributions show the [1%, 99%] interval in a narrow line, a [16%, 84%] interval in a wider line, and the median marked in white. The violin plots are offset by a small amount for visualization purposes, but each cluster of four violin plots used the same number of queries marked by the dotted gray lines. We set the failure probability to δ = 15%.

Discussion

In this work, we have discussed the quantum Hamiltonian learning problem. We introduced a unifying model for Hamiltonian learning using both unitary dynamics and Gibbs states. By subsuming these two approaches into the same model, we were able to describe an abstract routine for learning the Hamiltonian of a quantum many-body system given limited access to the system. This routine was based on fixing certain SPAM parameters, then viewing the system as a function f of a single variable. In this work, we consider this variable to be either time t (in which case f represents the time-evolved expectation value of a Pauli observable) or inverse temperature β (in which case f represents the thermal expectation value of a Pauli observable). We argued that for the appropriate choice of SPAM parameters, the derivatives of f – particularly \({f}^{{\prime} }(t=0)\) – would correspond exactly to particular coefficients in the Hamiltonian. We then showed that \({f}^{{\prime} }(t=0)\) could be inferred both accurately and efficiently from noisy evaluations of f. Finally, we concluded by describing how our protocol could achieve better than linear sample complexity in r (the number of Hamiltonian parameters) by using SPAM configurations amenable to simultaneous measurements.

This culminated in our main result, wherein we proposed an algorithm that achieves an almost noise-limited (\(\sim \frac{{{{{{{{\rm{polylog}}}}}}}}({\epsilon }^{-1})}{{\epsilon }^{2}}\)) sample complexity, similar to that of Haah et al.21 and França et al.28. However, our work represents an advance for several reasons. In comparison to Haah et al.21, we significantly reduce the sample complexity dependence on the parameter \({{{{{{{\mathscr{D}}}}}}}}\) from \({{{{{{{{\mathscr{D}}}}}}}}}^{21}\) to \({{{{{{{{\mathscr{D}}}}}}}}}^{4}\). In comparison to França et al.28, whose approach covers only Hamiltonian learning from unitary dynamics, our protocol generalizes to Gibbs states. Furthermore, our approach offers an additional advantage: unlike ref. 28, which requires a geometrically local Hamiltonian, our protocol operates efficiently with a “sparsely interacting” Hamiltonian, which is a considerably weaker assumption, as it eliminates the need for geometric locality. Moreover, we reduce the measurement parallelization overhead from \({{{{{{{\mathcal{O}}}}}}}}({16}^{k})\) (assuming a geometrically k-local Hamiltonian) to \({{{{{{{\mathcal{O}}}}}}}}\left({{{{{{{{\mathscr{D}}}}}}}}}^{2}\right)\), a substantial improvement. This is especially relevant in practical applications, where we can often rule out a priori the presence of certain terms in our Hamiltonian from physical constraints or symmetry considerations. That is, oftentimes we have \({{{{{{{\mathscr{D}}}}}}}}\ll {4}^{k}\); in these settings, our protocol can provide a significant advantage. Furthermore, by deriving explicit bounds on the performance of our algorithm, we were able to provide precise numerical prescriptions for theoretically optimal hyperparameters such as the maximum evolution time and Chebyshev degree. We concluded by proposing a number of heuristic improvements to our algorithm, and argued that they are reasonable to apply in general.
This combination of improvements makes significant steps towards achieving a practically useful protocol that can be applied experimentally, as indicated by the demonstration of our protocol on a large (80-qubit) simulated problem.

Although we have demonstrated a successful application of our learning algorithm on a simulated problem, this simulation did not include possible detrimental experimental effects. With respect to SPAM errors, our algorithm makes minimal SPAM requirements (requiring only single qubit measurements and simple product states). To first order, the effect of SPAM errors will only be in the measurement of the first order commutator \({{{{{{{\rm{Tr}}}}}}}}(i[H,P]{\rho }_{0})\). For instance, if our initial state is subject to decoherence, this will result in a systematic underestimate of the Hamiltonian parameters. A natural direction for future investigation is therefore how this protocol can be made robust to SPAM errors. Another consideration is the potential discrepancy between the Hamiltonian ansatz used by the learning algorithm and the actual underlying Hamiltonian governing the physical system. In realistic scenarios, the system Hamiltonian may deviate from the assumed form due to various factors such as unaccounted interactions, noise, or experimental limitations. To first order, unaccounted-for terms do not affect the performance guarantees of our algorithm except through their effect on \({{{{{{{\mathscr{D}}}}}}}}\). However, as noted previously, a good estimate of \({{{{{{{\mathscr{D}}}}}}}}\) is a strong determining factor in the practical performance of our protocol; further investigation is needed to understand the extent to which model mismatches adversely affect performance in practice. We also leave to future work a study of how this protocol can be improved by making stronger assumptions on either the Hamiltonian or the suite of interactions available to us. For instance, we already showed a constant (but significant) drop in the number of measurements required for learning a commuting Hamiltonian with unitary dynamics. We expect a similar effect for Hamiltonian learning with Gibbs states.
Furthermore, if we assume we can interact with our system using a trusted quantum simulator of our own, a variety of approaches become possible. Among these is Hamiltonian learning with Loschmidt echoes, as done in Wiebe et al.22. Rigorous performance bounds have not yet been found for this approach, but we speculate that a similar application of our techniques may yield improved performance; we leave this for future work.

Methods

In this section, we will describe our derivative estimation protocol, and show that this allows us to make guarantees on the error. First, we establish an elementary procedure for estimating the first order derivative \({f}^{{\prime} }(0)\) given access only to noisy estimates of f. We then apply this procedure to Hamiltonian learning with unitary dynamics and Gibbs states.

Inferring the first-order commutator

For a system evolving under a Hamiltonian H and an initial state given by some density matrix ρ0, the expectation value of any operator P can be written as:

$$\left\langle P\left(t\right)\right\rangle={{{{{{{\rm{Tr}}}}}}}}\left(P{\rho }_{0}(t)\right)={{{{{{{\rm{Tr}}}}}}}}\left(P{e}^{-iHt}{\rho }_{0}{e}^{iHt}\right)=\mathop{\sum }\limits_{m=0}^{\infty }\frac{{(it)}^{m}}{m!}{{{{{{{\rm{Tr}}}}}}}}\left(\left[{H}^{m}P\right]{\rho }_{0}\right),$$
(13)
$${{{{{{\rm{where}}}}}}}\,\left[H^{m} P\right]=\underbrace{ \left[H,\left[H,\ldots,\left[{H}\right.\right.\right.}_{{m\, {{{{{{\rm{times}}}}}}}}},\left.\left.\left.P\right]\ldots\right]\right] \,{{{{{\rm{with}}}}}}\, \left[H^0 P\right]=P.$$
(14)

This equality simply uses the Heisenberg expansion of the time-evolved operator P(t).

In this section, we define a critical subroutine of our Hamiltonian learning algorithm that infers the expectation \({{{{{{{\rm{Tr}}}}}}}}(\left(i\left[H,P\right]\right){\rho }_{0})\), for P being a local Pauli operator, by measuring time-evolved expectation values. The main idea behind our algorithm is that \({{{{{{{\rm{Tr}}}}}}}}\left(\left(i[H,P]\right){\rho }_{0}\right)\) is the time derivative of the expectation \({{{{{{{\rm{Tr}}}}}}}}(P{e}^{-iHt}{\rho }_{0}{e}^{iHt})\). More specifically, the Heisenberg expansion in Eq. (13) expresses the time-evolved expectation of an observable as

$$\langle P(t)\rangle=\sum\limits_{m=0}^{\infty }\frac{{i}^{m}}{m!}\mathrm{Tr}\left(\left[{H}^{m}P\right]{\rho }_{0}\right){t}^{m}.$$
(15)

Therefore \(\langle P(t)\rangle\) can be modeled as a univariate power series in time, \(\sum _{m=0}^{\infty }{c}_{m}{t}^{m}\), with coefficients

$${c}_{m}=\frac{{i}^{m}}{m!}\mathrm{Tr}\left(\left[{H}^{m}P\right]{\rho }_{0}\right).$$
(16)

If we were able to access \(\langle P(t)\rangle\) exactly, the most effective way to find c1 would be to simply differentiate \(\langle P(t)\rangle\) via finite differences with very small Δt (i.e., \({c}_{1}\approx \frac{\langle P({{\Delta }}t)\rangle -\langle P(0)\rangle }{{{\Delta }}t}\)). However, since our measurements of \(\langle P({{\Delta }}t)\rangle\) are subject to shot noise, the variance of this estimator scales as \(\mathcal{O}({({{\Delta }}t)}^{-2})\), preventing us from using arbitrarily small Δt; conversely, as Δt grows, the bias of the finite-difference estimator grows. The algorithm in Box 2 is a generalization of finite differencing that uses Chebyshev regression (see Supplementary Note 1) to estimate c1. It takes as input a maximum evolution time A and a cutoff degree L for the Chebyshev polynomial. The finite cutoff degree induces a bias in the recovered polynomial coefficients; however, we will demonstrate that this bias is suppressed much more effectively than for the finite-difference estimator, as these errors scale as a power law with power L. As mentioned at the beginning of this section, this error bound depends on a bound for the derivative \(|\frac{{\mathrm{d}}^{L}\langle P(t)\rangle }{{\mathrm{d}t}^{L}}|=|\mathrm{Tr}([{H}^{L}P]\rho (t))|\). Since ρ(t) is a density matrix, a simple application of Hölder's inequality shows that \(\left|\mathrm{Tr}(\left[{H}^{L}P\right]\rho (t))\right|\le \left|\left[{H}^{L}P\right]\right|\) (where \(\left|\cdot \right|\) denotes the spectral norm). We can bound spectral norms of iterated commutators with the Hamiltonian as follows:

Definition 3

(Typical scales). We define a typical time scale

$$\tau=\frac{1}{2{\mathscr{D}}{\left|{{\Theta }}\right|}_{\infty }}$$
(17)

of our Hamiltonian. The appearance of \({\left|{{\Theta }}\right|}_{\infty }\) in this scale is unsurprising; scaling all the coefficients up by some constant factor will shorten the time scale of the dynamics by the same factor. The structure parameter \({\mathscr{D}}\) appears because, all else being equal, we expect observables of a highly connected Hamiltonian to change faster than those of a weakly connected one. Indeed, in Supplementary Note 2, we show that the norm of the mth iterated commutator between H and P is upper bounded by a quantity scaling roughly as \({\tau }^{-m}\).

Definition 4

(Dataset). Assume we are given the following hyperparameters:

  • L, the number of distinct times at which we evaluate \(\langle P(t)\rangle\);

  • A, the maximum time at which we evaluate \(\langle P(t)\rangle\); and

  • N, the number of samples used to estimate a single evaluation of \(\langle P(t)\rangle\).

We construct the dataset \({\mathcal{D}}\) by evaluating \(\langle P(t)\rangle\) at the roots \(\{{z}_{i}\,|\,i=1,\ldots,L\}\) of the Lth Chebyshev polynomial (see Supplementary Note 1 for a review of Chebyshev polynomials). Our dataset comprises L points:

$${\mathcal{D}}=\left\{({t}_{1},{y}_{1}),({t}_{2},{y}_{2}),\ldots,({t}_{L},{y}_{L})\right\},\;\mathrm{where}\\ {t}_{i}=\frac{A}{2}(1+{z}_{i}),\\ {y}_{i}\sim {Y}_{i},$$
(18)

where Yi is an N-sample mean estimator of \(\langle P({t}_{i})\rangle\). That is, it satisfies \({\mathbb{E}}[{Y}_{i}]=\langle P({t}_{i})\rangle\) and \(\mathrm{var}[{Y}_{i}]={\sigma }_{i}^{2}/N\), where \({\sigma }_{i}^{2}\) is the variance of a single measurement of \(\langle P({t}_{i})\rangle\). The mapping \({t}_{i}=\frac{A}{2}(1+{z}_{i})\) ensures that the evolution time is nonnegative and never exceeds A.
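As an illustration, the dataset of Definition 4 can be assembled as follows. This is a minimal sketch in which the quantum device is replaced by a made-up expectation ⟨P(t)⟩ = sin(1.3 t), and each Yi is the mean of N simulated ±1 Pauli measurement outcomes; all names and parameter values here are illustrative, not part of the protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

def chebyshev_times(L, A):
    """Roots z_i of the L-th Chebyshev polynomial, mapped to t_i = A/2 (1 + z_i)."""
    z = np.cos((2 * np.arange(1, L + 1) - 1) * np.pi / (2 * L))
    return 0.5 * A * (1 + z)

def mean_estimator(expectation, N):
    """N-sample mean of a +/-1-valued Pauli measurement with the given expectation."""
    p_plus = (1 + expectation) / 2
    return rng.choice([1.0, -1.0], size=N, p=[p_plus, 1 - p_plus]).mean()

# Made-up stand-in for the device; in the protocol this is <P(t)> itself
true_expectation = lambda t: np.sin(1.3 * t)

L, A, N = 6, 0.5, 1000
dataset = [(t, mean_estimator(true_expectation(t), N)) for t in chebyshev_times(L, A)]
```

By construction each yi lies in [−1, 1] and has variance (1 − ⟨P(ti)⟩²)/N ≤ 1/N, matching the σi²/N assumption above.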

Having collected the dataset, it is simple to infer the first derivative c1.

The following theorem shows that for the appropriate choice of evolution time A and Chebyshev degree L, the error of the estimator \({\tilde{c}}_{1}\) in Box 2 is close to being noise-limited.

Theorem 2

(Sample complexity for one coefficient). Fix some maximum failure probability δ and error ϵ. Assume that we have access to an unbiased (single-shot) estimator of \(\langle P(t)\rangle\) with variance \({\sigma }^{2}\le 1\). Furthermore, assume \(\left|P\right|\le 1\). Then there is some choice of maximum evolution time A ~ τ and Chebyshev degree \(L \sim \log {\epsilon }^{-1}\) such that with

$$N=\mathcal{O}\left(\log (1/\delta )\,\mathrm{polylog}(1/\epsilon )\,{\epsilon }^{-2}\right)$$
(19)

sample complexity, we can construct an estimator \({\tilde{c}}_{1}\) such that \(\left|{c}_{1}-{\tilde{c}}_{1}\right|\le \epsilon \cdot {\mathscr{D}}\), except with failure probability at most δ.

Proof

See Supplementary Note 3. □
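To illustrate the behavior guaranteed by Theorem 2, here is a minimal numerical sketch of the Chebyshev-regression estimator in the spirit of Box 2 (not an exact reproduction of it). We fit noisy evaluations of a made-up expectation ⟨P(t)⟩ = sin(ωt), whose true derivative at t = 0 is c1 = ω, at the Chebyshev nodes on [0, A]; ω and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

omega = 1.3                    # made-up: <P(t)> = sin(omega t), so c1 = omega
A, L, N = 0.5, 4, 1_000_000    # max evolution time, Chebyshev degree, shots per point

# Evaluate <P(t)> at the roots of the L-th Chebyshev polynomial mapped to [0, A]
z = np.cos((2 * np.arange(1, L + 1) - 1) * np.pi / (2 * L))
t = 0.5 * A * (1 + z)
y = np.sin(omega * t) + rng.normal(0.0, 1.0 / np.sqrt(N), size=L)  # shot noise

# Chebyshev regression: fit a degree-(L-1) series on [0, A], differentiate at t = 0
fit = np.polynomial.Chebyshev.fit(t, y, deg=L - 1, domain=[0.0, A])
c1_estimate = fit.deriv()(0.0)
print(abs(c1_estimate - omega))   # residual dominated by shot noise, not bias
```

Unlike a finite-difference estimate, the evolution times here stay of order A, so the shot-noise amplification is bounded while the truncation bias is suppressed with the Chebyshev degree.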

Recovering Hamiltonian coefficients

With an efficient procedure for accurately estimating first-order commutators \(\mathrm{Tr}(i[H,P]{\rho }_{0})\) in hand, we can construct an algorithm that infers the coefficients of H from these commutators. The idea is to carefully choose ρ0 and P so that \(\mathrm{Tr}(i[H,P]{\rho }_{0})\) isolates one parameter at a time.

First, we introduce the notation that \({\rho }_{0}^{({\mathcal{X}})}\) and \({P}^{({\mathcal{X}})}\) denote the reduced state and Pauli matrix (respectively) restricted to the qubits in \({\mathcal{X}}\), and \({{\mathcal{X}}}^{{\prime} }\) denotes the set of all qubits not in \({\mathcal{X}}\).

Lemma 1

(Term selection) Let P be some Pauli operator such that there exists some \(i\in \left\{1,\ldots,r\right\}\) where \(\mathrm{supp}\,P\subseteq \mathrm{supp}\,{P}_{i}\) and \(\frac{i[{P}_{i},P]}{2} \, \ne \, 0\). Let

$${\mathcal{X}}=\mathrm{supp}\,{P}_{i},$$
(20)
$${\mathcal{Y}}=\left(\bigcup \left\{\mathrm{supp}\,{P}_{j}\,|\,\mathrm{supp}\,{P}_{j}\cap {\mathcal{X}} \, \ne \, \varnothing \right\}\right)\setminus {\mathcal{X}},$$
(21)
$${\mathcal{Z}}={({\mathcal{X}}\cup {\mathcal{Y}})}^{{\prime} },$$
(22)
$${\rho }_{0}={\left(\frac{{\mathbb{I}}+i[{P}_{i},P]/2}{{2}^{\left|{\mathcal{X}}\right|}}\right)}^{({\mathcal{X}})}\otimes {\left(\frac{{\mathbb{I}}}{{2}^{\left|{\mathcal{Y}}\right|}}\right)}^{({\mathcal{Y}})}\otimes {\rho }_{0}^{({\mathcal{Z}})}.$$
(23)

In words, \({\mathcal{Y}}\) is a neighborhood around \({\mathcal{X}}\) containing the support of every Pauli term that intersects \({\mathcal{X}}\), and \({\mathcal{Z}}\) is the set of all qubits outside \({\mathcal{X}}\cup {\mathcal{Y}}\). The state ρ0 is maximally mixed on the qubits in \({\mathcal{Y}}\); on the qubits in \({\mathcal{X}}\), it is chosen such that \(\mathrm{Tr}(i[{P}_{i},P]{\rho }_{0}^{({\mathcal{X}})}/2)=1\); and on all remaining qubits, ρ0 can be anything. Then:

$$\mathrm{Tr}\left(i[H,P]{\rho }_{0}\right)={\theta }_{i}.$$
(24)

Proof

See Supplementary Note 4. □
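The term-selection mechanism of Lemma 1 can be checked numerically on a toy example. The sketch below builds a made-up 3-qubit transverse-field Ising Hamiltonian (all coefficients illustrative), forms the state of Eq. (23) for the target term Z0Z1 with probe P = X0, and recovers θ0; since factor-of-2 conventions for the Pauli commutator vary, we divide out the normalization Tr(i[Pi, P]ρ0) explicitly.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

def kron(*ops):
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

# Made-up 3-qubit TFIM: coefficients theta and their Pauli terms
theta = [0.8, 0.5, 0.3, 0.7, 0.2]
paulis = [kron(Z, Z, I2), kron(I2, Z, Z),
          kron(X, I2, I2), kron(I2, X, I2), kron(I2, I2, X)]
H = sum(c * P for c, P in zip(theta, paulis))

Pi = paulis[0]           # target term Z0 Z1, coefficient theta_0 = 0.8
P = kron(X, I2, I2)      # single-qubit probe on one site of supp(Pi)

# Lemma 1 state: (I + i[Pi, P]/2) / 2^|X| on X = {0, 1}, maximally mixed on Y = {2}
M = 1j * (Pi @ P - P @ Pi) / 2
rho0 = (np.eye(8) + M) / 8

val = np.trace(1j * (H @ P - P @ H) @ rho0).real     # only the Pi term survives
norm = np.trace(1j * (Pi @ P - P @ Pi) @ rho0).real  # commutator normalization
print(val / norm)   # ≈ 0.8 = theta_0
```

All other terms of H either commute with P or have zero trace against the maximally mixed "moat", so only θ0 contributes to the expectation.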

This defines a simple algorithm for Hamiltonian learning. For simplicity, for any Pauli Pi, we set the observable P to be a single-qubit Pauli acting on one site in \({\mathcal{X}}\) such that [Pi, P] ≠ 0 (see Box 3).

However, the runtime of this algorithm is Ω(r), since the procedure must be called once for each term in the Hamiltonian. We propose an improvement wherein we estimate \(\mathrm{Tr}(P{e}^{-iHt}{\rho }_{0}{e}^{iHt})\) for many different choices of P simultaneously, setting ρ0 so that we can extract the coefficients of many terms at once. Rather than using shadow tomography (as done in França et al.28), which can result in \(\mathcal{O}\left({16}^{k}\right)\) scaling, we carefully exploit our knowledge of the Hamiltonian's structure to obtain a smaller parallelization overhead. The way forward relies on the fact that, in Lemma 1, \({\rho }_{0}^{({\mathcal{Z}})}\) can be anything. Similarly to Haah et al.21, we partition the terms of our Hamiltonian into groups that can each be inferred simultaneously. This partition is based on a graph coloring; for details, see Supplementary Note 4.

Definition 5

(Squared graph). Let the square of the interaction graph, \({{\mathcal{G}}}^{2}\), be the graph with the same vertex set as \({\mathcal{G}}\), in which any two vertices are connected if their distance in \({\mathcal{G}}\) is at most 2. In words, the edges of \({{\mathcal{G}}}^{2}\) are

$$\left\{(i,k)\,|\,\exists j\,\left(\mathrm{supp}\,{P}_{i}\cap \mathrm{supp}\,{P}_{j}\ne \varnothing \right)\wedge \left(\mathrm{supp}\,{P}_{j}\cap \mathrm{supp}\,{P}_{k}\ne \varnothing \right)\wedge (i\ne k)\right\}.$$
(25)

Our algorithm will rely on a graph coloring of \({{\mathcal{G}}}^{2}\). The essential idea is that between Paulis of the same color there is always a “moat” separating them. This moat is filled with maximally mixed states, which completely suppresses the influence of the terms we are not interested in. A partitioning of the Hamiltonian terms via some C-coloring of \({{\mathcal{G}}}^{2}\) makes it natural to rewrite the Hamiltonian in double-sum notation:

$$H=\sum\limits_{i=1}^{C}\sum\limits_{j=1}^{\left|{{\bf{V}}}_{i}\right|}{\theta }_{i,j}{P}_{i,j},$$
(26)

where Vi is the set of all Pauli terms assigned color i. For instance, see Supplementary Fig. 3 for a coloring of the squared interaction graph for a 9-qubit TFIM.

Lemma 2

(Simultaneous inference for a partition) Let Vi be a color class in a coloring of \({{\mathcal{G}}}^{2}\). The coefficient of each Pauli in Vi can be inferred up to an error \(\epsilon {\left|{{\Theta }}\right|}_{\infty }\), with failure probability at most δ for each individual coefficient (so the overall failure probability is upper bounded by \(\delta \cdot \left|{{\bf{V}}}_{i}\right|\)). This can be done with sample complexity

$$\mathcal{O}\left({{\mathscr{D}}}^{2}\log (1/\delta )\,\mathrm{polylog}({\mathscr{D}}/\epsilon )\,{\epsilon }^{-2}\right).$$
(27)

Proof

See Supplementary Note 4. □

Theorem 3

(Hamiltonian learning with unitary dynamics). Fix a sparsely interacting Hamiltonian H that has r terms in its Pauli expansion with coefficients Θ. For the appropriate choice of Chebyshev degree L and evolution time A, the algorithm in Box 3 and Box 4 solves the quantum Hamiltonian learning problem (with an additive error \(\epsilon {\left|{{\Theta }}\right|}_{\infty }\) and failure probability at most δ) with sample complexity

$$\mathcal{O}\left(\frac{{{\mathscr{D}}}^{4}\log (r/\delta )\,\mathrm{polylog}({\mathscr{D}}/\epsilon )}{{\epsilon }^{2}}\right),$$
(28)

and classical processing time complexity

$$\mathcal{O}\left(\frac{{{\mathscr{D}}}^{2}r\log (r/\delta )\,\mathrm{polylog}({\mathscr{D}}/\epsilon )}{{\epsilon }^{2}}\right).$$
(29)

Proof

We partition our Hamiltonian terms into sets that can be simultaneously inferred. There are at most \({{\mathscr{D}}}^{2}\) such sets (for a proof, see Supplementary Note 4); moreover, this partitioning can be found with a classical greedy algorithm with runtime \(\mathcal{O}\left({{\mathscr{D}}}^{2}\right)\)36. We then apply Lemma 2 to each of these sets. For the detailed proof, see Supplementary Note 4. □

In a different setup, we may instead be given access to copies of a Gibbs state at inverse temperature β. If we measure an observable Pi, its expectation will be

$${\langle {P}_{i}\rangle }_{\beta }=\frac{\mathrm{Tr}\left({P}_{i}\exp (-\beta H)\right)}{\mathrm{Tr}\left(\exp (-\beta H)\right)}.$$
(30)

In what follows, we apply the analysis of Haah et al.21 to formulate \({\langle {P}_{i}\rangle }_{\beta }\) as a polynomial in β, in accordance with the framework of Eq. (3). We will show that the coefficients of the Hamiltonian can be learned from the first-order term of this polynomial, thereby mapping the problem of Hamiltonian learning from Gibbs states onto Hamiltonian learning with unitary dynamics.
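As a sanity check of this reduction, consider a made-up two-qubit Hamiltonian: by Pauli orthogonality, Tr(Pi) = 0 and Tr(PiH)/2^n = θi, so the derivative of ⟨Pi⟩β at β = 0 equals −θi. The sketch below reuses the Chebyshev fit of the previous subsection, now in β; all parameter values are illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

# Made-up two-qubit Hamiltonian; we recover theta_0 = 0.8 from <Z0 Z1>_beta
H = 0.8 * np.kron(Z, Z) + 0.3 * np.kron(X, I2)
Pi = np.kron(Z, Z)

def gibbs_expectation(beta):
    """<Pi>_beta computed exactly via the eigendecomposition of H."""
    w, v = np.linalg.eigh(H)
    rho = (v * np.exp(-beta * w)) @ v.conj().T     # unnormalized exp(-beta H)
    return (np.trace(Pi @ rho) / np.trace(rho)).real

# Chebyshev fit in beta on [0, A]; the first-order coefficient gives -theta_0
A, L = 0.2, 5
z = np.cos((2 * np.arange(1, L + 1) - 1) * np.pi / (2 * L))
betas = 0.5 * A * (1 + z)
vals = [gibbs_expectation(b) for b in betas]
fit = np.polynomial.Chebyshev.fit(betas, vals, deg=L - 1, domain=[0.0, A])
theta_estimate = -fit.deriv()(0.0)
print(theta_estimate)   # ≈ 0.8
```

Here β plays the role that t played for unitary dynamics, which is precisely the mapping used in Theorem 4.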

Theorem 4

(Hamiltonian learning with Gibbs states). The Hamiltonian learning problem (with an additive error \(\epsilon {\left|{{\Theta }}\right|}_{\infty }\) and failure probability at most δ) can be solved using

$$\mathcal{O}\left(\frac{{{\mathscr{D}}}^{5}\log (r/\delta )\,\mathrm{polylog}({\mathscr{D}}/\epsilon )}{{\epsilon }^{2}}\right)$$
(31)

copies of the Gibbs state. This can be achieved with a time complexity

$$\mathcal{O}\left(\frac{{{\mathscr{D}}}^{4}r\log (1/\delta )\,\mathrm{polylog}({\mathscr{D}}/\epsilon )}{{\epsilon }^{2}}\right).$$
(32)

Proof

The protocol is a near mirror image of the Hamiltonian learning protocol using unitary dynamics. For the full proof, see Supplementary Note 5. □