## Introduction

Much work has been done on the time and query complexity of Hamiltonian simulation when given a classical description or black box description of the Hamiltonian. Lloyd provided the first formal results on simulation, considering Hamiltonians that consist of sums of non-commuting terms.1 Other lines of research have focused on simulating sparse Hamiltonians, with a long sequence of work recently culminating in an optimal algorithm2 (see ref. 3 for a more complete history of work in this field).

In this work, we approach the problem of Hamiltonian simulation from a slightly different perspective. Rather than being given a classical description of, or black-box access to, a Hamiltonian H, we consider the problem of simulating H when given many copies of a quantum state ρ that encodes the Hamiltonian to be simulated. In particular, we assume that

$$\rho =\frac{H+c{\Bbb 1}}{{\rm{Tr}}(H+c{\Bbb 1})}$$
(1)

for some constant $$c\in {\mathbb{R}}$$ such that $$H + c {\mathbb{1}}$$ is positive semidefinite and nonzero. In that case, ρ itself is positive semidefinite and Tr ρ = 1, so ρ is a valid density matrix. Note that the Hamiltonian dynamics of H and ρ are equivalent up to an overall phase and time scaling. Moreover, since the Hamiltonian H in Eq. (1) can be arbitrary, any unitary can in fact be expressed as $${e}^{-i\rho t}$$ for an appropriately chosen state ρ and time t.
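
The equivalence "up to an overall phase and time scaling" can be checked numerically. The following is a minimal sketch, assuming a random 4-dimensional Hermitian H and a shift c chosen for illustration; the helper `expm_hermitian` and all variable names are hypothetical and not part of the original work.

```python
import numpy as np

def expm_hermitian(A, s):
    """Return exp(-i * s * A) for a Hermitian matrix A via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.exp(-1j * s * vals)) @ vecs.conj().T

rng = np.random.default_rng(0)
d = 4
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (M + M.conj().T) / 2                    # random Hermitian "Hamiltonian" (assumption)

c = 1.0 - np.linalg.eigvalsh(H).min()       # shift so that H + c*1 is positive definite
shifted = H + c * np.eye(d)
norm = np.trace(shifted).real
rho = shifted / norm                        # the state of Eq. (1)

t = 0.7
U_H = expm_hermitian(H, t)                  # exp(-i H t)
U_rho = expm_hermitian(rho, t * norm)       # exp(-i rho t') with rescaled time t' = t * Tr(H + c*1)
phase = np.exp(-1j * c * t)                 # overall phase introduced by the shift c
gap = np.linalg.norm(U_rho - phase * U_H)   # should vanish up to rounding
```

Here the rescaled time is t′ = t·Tr(H + c𝟙) and the overall phase is exp(−ict), so `gap` should be at the level of floating-point rounding.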

This modified version of the original Hamiltonian simulation problem is what we call sample-based Hamiltonian simulation: given one copy of an unknown state σ and n copies of an unknown state ρ, implement the following map:

$$\sigma \otimes \mathop{\underbrace{\rho \otimes \cdots \otimes \rho }}\limits_{n}\quad \mapsto \quad {e}^{-i\rho t}\sigma {e}^{i\rho t}$$
(2)

where t is the desired evolution time. We also allow for some error in the final state: we denote by δ the trace distance (ref. 4) between the state that is output by the protocol and the ideal state $${e}^{-i\rho t}\sigma {e}^{i\rho t}$$. This problem was first considered in ref. 5, where the authors give a simple protocol, which we call the LMR protocol (LMR comes from the authors’ initials: Lloyd, Mohseni, and Rebentrost), for approximately implementing the unitary $${e}^{-i\rho t}$$ using many copies of ρ. Their protocol is based on a partial swap operation that can also be considered as a finite-dimensional analog of a beam-splitter.6 An interesting feature of the LMR protocol is that it is agnostic with regard to ρ. In the spirit of refs 7, 8, this suggests interpreting ρ as a “quantum software state”.

The main motivation for sample-based Hamiltonian simulation in ref. 5 is to perform principal component analysis of ρ. They do this by performing phase estimation on the unitary $${e}^{-i\rho t}$$. (We note in Supplementary Information Section C that a slightly more careful analysis gives a polynomial improvement in sample complexity over the complexity given in ref. 5 for performing phase estimation, which is a subroutine for principal component analysis.) The LMR protocol has applications to many problems in machine learning, e.g., refs 5, 9,10,11.

In this paper, we ask the following question: given t and δ, what is the minimum n (number of copies of ρ) necessary to implement the unitary $${e}^{-i\rho t}$$ on an unknown state σ to trace distance at most δ? We call this the sample complexity of Hamiltonian simulation.

While the LMR protocol acts with each copy of ρ sequentially, perhaps one could achieve better performance by acting with a global operation?12 For example, recent near-optimal tomographic protocols have relied on performing global operations (like the Schur transform) on many copies of the unknown state.13, 14 Along those lines, perhaps one could do better than LMR by applying tomographic protocols to get an estimate $$\hat{\rho }$$ of ρ from the n copies of ρ, and then evolve according to $${e}^{-i\hat{\rho }t}$$.

On the contrary, we show that LMR performs asymptotically better than any tomographic strategy (‘LMR protocol vs. state tomography’), and performs sample-based Hamiltonian simulation with asymptotic optimality in both t and δ simultaneously (‘LMR protocol is optimal’). We additionally show LMR is optimal in δ when restricting to pure states, and provide a sample-optimal algorithm for a variant of Grover’s search. In ‘Generalized LMR for simulation of Hermitian polynomials’, we discuss the sample complexity of more complex Hamiltonians that depend on multiple states. For example, we show how to simulate the Hamiltonians given by any Hermitian polynomial (i.e. any element of the Jordan–Lie algebra15) generated by states $${\rho }_{1},\ldots ,{\rho }_{K}$$ to which we are given sample access. As applications of this result, we show how to simulate the commutator $$i[{\rho }_{1},{\rho }_{2}]$$ and anticommutator $$\{{\rho }_{1},{\rho }_{2}\}$$ of two states $${\rho }_{1}$$ and $${\rho }_{2}$$, and how to simulate any real linear combination of states $${\rho }_{1},\ldots ,{\rho }_{K}$$ when given access to many copies of those states. We prove the optimality of the commutator, anticommutator, and linear combinations protocols. In ‘Applications of commutator simulation’, we give applications of commutator simulation to orthogonality testing and quantum state addition. In ‘Universality of LMR’, we show how to use sample-based Hamiltonian simulation to implement a universal model of quantum computation using only partial swaps and a stream of input qubits initialized in |0〉 and |+〉.

In ‘Discussion’, we discuss the results and suggest open problems. Finally, in ‘Methods’ we give proofs for two of the main results in the paper: the lower bound on the LMR protocol, and the protocol for simulating Hamiltonians given by Hermitian polynomials of the input states.

### Notation

We use $${\mathcal H}$$ to denote a finite-dimensional Hilbert space, and $${\rm{D}}( {\mathcal H} )$$ to represent the set of positive semi-definite operators with trace 1 on $${\mathcal H}$$ (i.e. the set of valid quantum states).

The trace distance between $$\rho ,\sigma \in {\rm{D}}( {\mathcal H} )$$ is given by $$\frac {1}{2}{\Vert \rho - \sigma \Vert }_{1}$$, where $${\Vert A\Vert }_{1}:={\rm{Tr}}(\sqrt{A{A}^{\dagger }})\mathrm{.}$$ The trace distance between ρ and σ gives the maximum difference in probability of any measurement on the two states.4 For two quantum channels $${ {\mathcal E} }_{1}$$ and $${ {\mathcal E} }_{2}$$ that act on $${\rm{D}}( {\mathcal H} )$$, their trace norm distance is defined as

$$\frac{1}{2}\parallel { {\mathcal E} }_{1}-{ {\mathcal E} }_{2}{\parallel }_{{\rm{tr}}}:=\frac{1}{2}\mathop{\max }\limits_{\rho \in {\rm{D}}( {\mathcal H} )}\parallel { {\mathcal E} }_{1}(\rho )-{ {\mathcal E} }_{2}(\rho ){\parallel }_{1}$$
(3)

The diamond norm distance is defined as

$$\frac{1}{2}\parallel { {\mathcal E} }_{1}-{ {\mathcal E} }_{2}{\parallel }_{\diamond }:=\frac{1}{2}\mathop{\max }\limits_{k,\rho \in {\rm{D}}( {\mathcal H} \otimes { {\mathcal H} }_{k})}{\Vert ({ {\mathcal E} }_{1}\otimes {\mathcal I} )(\rho )-({ {\mathcal E} }_{2}\otimes {\mathcal I} )(\rho )\Vert }_{1}$$
(4)

where $${\mathcal I}$$ is the identity channel on a k-dimensional space $${ {\mathcal H} }_{k}$$. Note $$\parallel { {\mathcal E} }_{1}-{ {\mathcal E} }_{2}{\parallel }_{\diamond }\ge \parallel { {\mathcal E} }_{1}-{ {\mathcal E} }_{2}{\parallel }_{{\rm{tr}}}$$.

We use $${\mathbb 1}_{\rm A}$$ to mean the identity matrix acting on subsystem A, but if clear from context, we will drop the subscript. We use $$|+\rangle :=(|0\rangle +|1\rangle )/\sqrt{2}$$ and denote single-qubit Pauli operators as X, Y, and Z.

## Results

### LMR protocol vs. state tomography

Lloyd, Mohseni, and Rebentrost5 gave a simple method for approximating the transformation in Eq. (2). The number of copies of ρ required by their procedure is not only independent of σ and ρ, but is independent of the dimension and rank of ρ. We state their result in a slightly more general form, where σ has two registers and $${e}^{-i\rho t}$$ is applied only to one of them.

### Theorem 1

(ref. 5). Let $$\rho \in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}})$$ and $$\sigma \in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}}\otimes { {\mathcal H} }_{{\rm{B}}})$$ be two unknown quantum states and $$t\in {\mathbb{R}}$$ (can be either positive or negative). Then there exists a quantum algorithm that transforms $${\sigma }_{{\rm{AB}}}\otimes {\rho }_{{{\rm{A}}}_{1}}\otimes \cdots \otimes {\rho }_{{{\rm{A}}}_{n}}$$ into $${\tilde{\sigma }}_{{\rm{AB}}}$$ such that

$$\frac{1}{2}{\Vert ({e}^{-i{\rho }_{{\rm{A}}}t}\otimes {\mathbb 1}_{{\rm{B}}}){\sigma }_{{\rm{AB}}}({e}^{i{\rho }_{{\rm{A}}}t}\otimes {\mathbb 1}_{{\rm{B}}})-{\tilde{\sigma }}_{{\rm{AB}}}\Vert }_{1}\le \delta ,$$
(5)

as long as the number of copies of ρ is $$n=O({t}^{2}/\delta )$$. In other words, this quantum algorithm implements the unitary $${e}^{-i\rho t}$$ up to error δ in the diamond norm, using $$O({t}^{2}/\delta )$$ copies of ρ.

We will give a sketch of the proof because many of our more general simulation techniques build on their ideas; for the full proof see Supplementary Information Section A. For simplicity we assume ρ and σ have the same dimension. Using a Taylor series expansion, the target state is

$${e}^{-i\rho t}\sigma {e}^{i\rho t}=\sigma -i[\rho ,\sigma ]t-\frac{1}{\mathrm{2!}}[\rho ,[\rho ,\sigma ]]{t}^{2}+\cdots \mathrm{.}$$
(6)

We note that for very small evolution times Δ, we have the following direct calculation:

$${{\rm{Tr}}}_{2}[{e}^{-iS\Delta }(\sigma \otimes \rho ){e}^{iS\Delta }]=\sigma -i[\rho ,\sigma ]\Delta +O({\Delta }^{2})$$
(7)
$$\quad \quad \quad \quad \quad \quad \quad \quad \;\, ={e}^{-i\rho \Delta }\sigma {e}^{i\rho \Delta }+O({\Delta }^{2}),$$
(8)

where by $${{\rm{Tr}}}_{i}$$ we mean taking the partial trace over the ith subsystem, and S is the swap operator between the two registers. If we take Δ = δ/t and repeat this procedure $$O({t}^{2}/\delta )$$ times, we end up implementing the operator $${e}^{-i\rho t}$$ up to error $$O({\Delta }^{2}\cdot {t}^{2}/\delta )=O(\delta )$$.

Thus the LMR protocol uses $$O({t}^{2}/\delta )$$ copies of ρ to implement the unitary $${e}^{-i\rho t}$$ up to error δ in trace norm. (While not noted explicitly in ref. 5, the LMR protocol can be implemented efficiently, i.e. using $$O(\mathrm{log}\,D\cdot {t}^{2}/\delta )$$ single-qubit and Fredkin (controlled-swap) gates, where $$D=\dim ({ {\mathcal H} }_{{\rm{A}}})$$, by applying the linear combination of unitaries algorithm (see, e.g., ref. 3 or ref. 16, Theorem 2.4). For more information, see Supplementary Information Section A.) To obtain the result for the diamond norm, simply replace σ by $${\sigma }_{{\rm{AB}}}$$ and perform the partial swap operation $${e}^{-iS\Delta }$$ only between the A registers of $${\sigma }_{{\rm{AB}}}$$ and $${\rho }_{{\rm{A}}}$$, and then discard the last register that was originally holding $${\rho }_{{\rm{A}}}$$.
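
The proof sketch above can be illustrated with a small density-matrix simulation. This is a minimal sketch, assuming random qubit states and dense matrices; the function names (`lmr`, `swap_operator`, `random_state`, `trace_distance`) are hypothetical helpers introduced for illustration, not an implementation from ref. 5.

```python
import numpy as np

def expm_hermitian(A, s):
    """Return exp(-i * s * A) for a Hermitian matrix A."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.exp(-1j * s * vals)) @ vecs.conj().T

def trace_distance(a, b):
    """Half the trace norm of the Hermitian difference a - b."""
    return 0.5 * np.abs(np.linalg.eigvalsh(a - b)).sum()

def random_state(d, rng):
    """A random full-rank density matrix (illustrative input only)."""
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    P = M @ M.conj().T
    return P / np.trace(P).real

def swap_operator(d):
    """Swap operator S on two d-dimensional registers: S|i>|j> = |j>|i>."""
    S = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            S[i * d + j, j * d + i] = 1.0
    return S

def lmr(sigma, rho, t, n):
    """Apply n partial swaps of angle t/n, consuming one fresh copy of rho per step."""
    d = sigma.shape[0]
    U = expm_hermitian(swap_operator(d), t / n)          # exp(-i S Delta), Delta = t/n
    for _ in range(n):
        joint = U @ np.kron(sigma, rho) @ U.conj().T
        # Partial trace over the second register, as in Eq. (7).
        sigma = np.trace(joint.reshape(d, d, d, d), axis1=1, axis2=3)
    return sigma

rng = np.random.default_rng(1)
d, t = 2, 1.0
rho, sigma = random_state(d, rng), random_state(d, rng)
U_ideal = expm_hermitian(rho, t)
target = U_ideal @ sigma @ U_ideal.conj().T
errors = {n: trace_distance(lmr(sigma, rho, t, n), target) for n in (10, 100, 1000)}
```

With the evolution time fixed, the entries of `errors` should shrink roughly in proportion to 1/n, which is the behavior behind the O(t²/δ) sample count.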

Additionally, the LMR protocol can be modified to implement the controlled-$${e}^{-i\rho t}$$ operation, which will be important if one wants to implement phase estimation on $${e}^{-i\rho t}$$. A method for implementing controlled-$${e}^{-i\rho t}$$ is stated without proof in ref. 5; we prove this method works and provide an additional approach in the Supplementary Information Section A.1.

An alternative method to LMR for sample-based Hamiltonian simulation would be to perform tomography on the copies of ρ to get an estimate $$\hat{\rho }$$ of ρ, and then implement $${e}^{-i\hat{\rho }t}$$. In Supplementary Information Section B, we show that the number of samples needed if using this strategy is

$$n=\Omega \left (\frac{Cdr{(t-\delta )}^{2}}{{\delta }^{2}\,\mathrm{log}(dt/r\delta )}+\frac{{t}^{2}}{{\delta }^{2}}\right ),$$
(9)

where d is the dimension of ρ, r is the rank of ρ, and t and δ are as in Theorem 1.

Comparing with Theorem 1, since LMR does not have any dependence on d or r, we immediately see that for large d or r, LMR does significantly better. Furthermore, even fixing d and r, we see that LMR provides a square-root improvement in sample complexity over tomography in terms of δ.

### LMR protocol is optimal

To prove the LMR protocol is in fact asymptotically optimal, we first give a lower bound on the sample complexity of distinguishing two specific states. Next, we assume we have a protocol that simulates $${e}^{-i\rho t}$$ to error δ in trace norm (which is a weaker assumption than using the diamond norm) using f(t, δ) samples of ρ for some function f. Then we show that using such a protocol one can distinguish these two states. Hence, if $$f=o({t}^{2}/\delta )$$, we would violate our lower bound on state discrimination.

### Theorem 2

Let f(t, δ) be the number of copies of ρ required to implement the unitary $${e}^{-i\rho t}$$ up to error δ in trace norm. Then as long as δ ≤ 1/6 and δ/t ≤ 1/(6π), it holds that $$f(t,\delta )=\Theta ({t}^{2}/\delta )$$.

The proof of Theorem 2 can be found in ‘Methods’. The proof uses mixed states, so it could be possible that simulating $$\exp (-i|\psi \rangle \langle \psi |t)$$ for a pure state |ψ〉 could be done more efficiently. This question is practically relevant, because the LMR protocol together with certain pure states as resources gives a universal model for quantum computation (see ‘Universality of LMR’). However, we can show that LMR is also optimal for pure states in the δ error parameter. We cannot expect to prove a meaningful lower bound on the t dependence in pure state LMR. The reason is that, given any state ρ and promised that $$\exp (-i\rho t)$$ is periodic with period T (i.e. $$\exp (-i\rho {t}_{1})=\exp (-i\rho {t}_{2})$$ whenever $${t}_{2}={t}_{1}+kT$$ for an integer k and a real number T), we can always simulate the Hamiltonian ρ for an equivalent time $${t}^{\prime }\in [0,T)$$ instead. Therefore asymptotic scaling in terms of large t is meaningless. For pure states, we immediately know the period, namely 2π.
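
The 2π periodicity for pure states is easy to confirm numerically. The snippet below is a small sketch, assuming a random 3-dimensional pure state and an arbitrary evolution time; the names `evolve` and `t_reduced` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
P = np.outer(psi, psi.conj())               # rho = |psi><psi|, eigenvalues 0 and 1

def evolve(t):
    """exp(-i * t * |psi><psi|) via eigendecomposition of the projector."""
    vals, vecs = np.linalg.eigh(P)
    return (vecs * np.exp(-1j * t * vals)) @ vecs.conj().T

t = 17.3                                    # arbitrary "long" evolution time (assumption)
t_reduced = t % (2 * np.pi)                 # equivalent time in [0, 2*pi)
period_gap = np.linalg.norm(evolve(t) - evolve(t_reduced))
```

Because the only nonzero eigenvalue of a pure state is 1, the two unitaries agree up to rounding, so only the reduced time in [0, 2π) ever needs to be simulated.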

To prove that the LMR protocol is optimal for pure states, we employ variants of Grover’s search. While Grover’s search17 is a well-known quantum mechanical task, it is not often stated in its form as a decision problem, and very rarely18 as a metrological decision problem, where the inputs are unitaries and the output depends on a property that those unitaries either possess or do not possess. This guise is useful for our purposes, however, because the LMR protocol allows us to turn metrology problems on states into metrology problems on quantum operations.

In the metrological view, Grover’s search, or perhaps more precisely amplitude amplification,19 is the following problem of parameter estimation. Let $${\mathcal{T}}$$ be a subspace of $${{\mathbb{C}}}^{{2}^{q}}$$. We call $${\mathcal{T}}$$ the target subspace. Let $${{\mathcal{U}}}_{{\mathcal{T}}}$$ be a unitary acting on q + 1 qubits such that

$${{\mathcal{U}}}_{{\mathcal{T}}}\left|\phi\right\rangle \left|0\right\rangle =\left\{\begin{array}{ll}\left|\phi\right\rangle \left|1\right\rangle , & {\rm{if}}\ \left|\phi\right\rangle \in {\mathcal{T}},\\ \left|\phi\right\rangle \left|0\right\rangle , & {\rm{if}}\ \left|\phi\right\rangle \perp {\mathcal{T}}.\end{array}\right.$$
(10)

In this problem, and in the following variations, we will assume that access to $${{\mathcal{U}}}_{{\mathcal{T}}}$$ and $${{\mathcal{U}}}_{{\mathcal{T}}}^{\dagger }$$ is free. For a q-qubit unitary V, define

$$\lambda :={\left|({\mathbb 1}\otimes \left\langle 1\right|){{\mathcal{U}}}_{{\mathcal{T}}}((V{\left|0\right\rangle }^{\otimes q})\otimes \left|0\right\rangle )\right|}^{2}.$$
(11)

Then in Grover’s search, the task is to decide whether λ ≥ w (for w > 0) or λ = 0, while using V and $${V}^{\dagger }$$ as few times as possible. In other words, if we call $$|s\rangle :=V{|0\rangle }^{\otimes q}$$ the start state, we would like to determine whether the start state has substantial probability mass in the target subspace or none, promised one is the case. If we solve this problem using Grover’s search and count the number of uses of V and $${V}^{\dagger }$$ required to succeed with probability 1 − ε, we get the standard complexity $$\Theta (\log (1/\varepsilon )/\sqrt{w})$$.20, 21

One simple modification of metrological Grover’s search is to replace the circuit description of V with copies of the start state |s〉 instead. The problem is now to determine whether $$\lambda :={\left|({\mathbb 1}\otimes \left\langle 1\right|){{\mathcal{U}}}_{{\mathcal{T}}}(\left|s\right\rangle \otimes \left|0\right\rangle )\right|}^{2}$$ is at least w > 0 or equal to zero, promised one is the case, given copies of |s〉 and unlimited access to $${{\mathcal{U}}}_{{\mathcal{T}}}$$ and $${{\mathcal{U}}}_{{\mathcal{T}}}^{\dagger }$$. We call this sample-based Grover’s search.

A second variant of metrological Grover’s search is to replace both V and $${{\mathcal{U}}}_{{\mathcal{T}}}$$ with quantum states. In this form, the problem becomes: given copies of q-qubit states |s〉 and |t〉, determine whether $$\lambda ={|\langle s|t\rangle |}^{2}$$ is at least w > 0 or equal to zero, promised one is the case. We call this variant orthogonality testing. We address optimal orthogonality testing in ‘Applications of commutator simulation’.

In Supplementary Information Section D.2, we first prove that for sample-based Grover’s search, we lose the square-root advantage of the regular Grover’s search:

### Lemma 3

Sample-based Grover’s search with success probability 1 − ε uses $$\Theta (\log (1/\varepsilon )/w)$$ copies of |s〉.

Using this result, we prove:

### Theorem 4

The number of copies of an unknown pure state ρ required for any algorithm to simulate $${e}^{-i\rho t}$$ to trace norm δ is Ω(1/δ).

The main idea of the proof, which can be found in Supplementary Information Section D.2, is that the reflections V and $${V}^{\dagger }$$ needed for Grover’s algorithm can be implemented by sample-based Hamiltonian simulation using many copies of the state |s〉. This gives us a way to reduce sample-based Grover’s search to sample-based Hamiltonian simulation. Then we apply the lower bound of Lemma 3.

### Generalized LMR for simulation of Hermitian polynomials

We show that the sample-based Hamiltonian simulation of Eq. (2) can be further generalized. Instead of evolution of σ by a single state ρ, the target Hamiltonian H could be encoded by some combination of multiple states $${\rho }_{1},{\rho }_{2},\ldots ,{\rho }_{K}$$. For example, we might want to implement the map

$$\sigma \,{\otimes }{\bigotimes _{j\mathrm{=1}}^{K}} {\rho }_{j}^{\otimes {n}_{j}}\quad \mapsto \quad {e}^{-if({\rho }_{1},{\rho }_{2},\ldots ,{\rho }_{K})t}\sigma {e}^{if({\rho }_{1},{\rho }_{2},\ldots ,{\rho }_{K})t},$$
(12)

where $$H=f({\rho }_{1},{\rho }_{2},\ldots ,{\rho }_{K})$$ is some Hermitian polynomial function of the input states. In fact, we prove it is possible to simulate arbitrary Hermitian multinomial functions:

### Theorem 5

Let $${\rho }_{1},\ldots ,{\rho }_{K}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}})$$ and $${\sigma }_{{\rm{AB}}}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}}\otimes { {\mathcal H} }_{{\rm{B}}})$$ be unknown quantum states, and let

$$H=\sum _{r\in R}{c}_{r}{H}_{r},\quad {H}_{r}=\frac{1}{2}({e}^{i{\phi }_{r}}{\rho }_{{r}_{1}}{\rho }_{{r}_{2}}\cdots {\rho }_{{r}_{|r|}}+{e}^{-i{\phi }_{r}}{\rho }_{{r}_{|r|}}{\rho }_{{r}_{|r|-1}}\cdots {\rho }_{{r}_{1}})$$
(13)

be a Hermitian polynomial in $${\rho }_{1},\ldots ,{\rho }_{K}$$, where R is a finite set of strings over the alphabet {1,2,…,K}. Using n samples from the states $$\{{\rho }_{1},\ldots ,{\rho }_{K}\}$$, a quantum algorithm can transform $${\sigma }_{{\rm{AB}}}$$ into $${\tilde{\sigma }}_{{\rm{AB}}}$$ such that

$$\frac{1}{2}{\Vert ({e}^{-iHt}\otimes {\mathbb 1}_{{\rm{B}}}){\sigma }_{{\rm{AB}}}({e}^{iHt}\otimes {\mathbb 1}_{{\rm{B}}})-{\tilde{\sigma }}_{{\rm{AB}}}\Vert }_{1}\le O(\delta ),$$
(14)

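if $$n=O(L{c}^{2}{t}^{2}/\delta )$$, where $$c:={\sum }_{r\in R}|{c}_{r}|$$ and $$L:={\max }_{r\in R}|r|$$ is the multinomial degree of H. Moreover, on average, the number of copies of $${\rho }_{j}$$ consumed is $${n}_{j}=O({\kappa }_{j}{c}^{2}{t}^{2}/\delta )$$, where $${\kappa }_{j}={\sum }_{r\in R}{v}_{j}(r)|{c}_{r}|/c$$ and $${v}_{j}(r)=|\{s:{r}_{s}=j\}|$$ is the number of times the index j appears in the string r.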
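
To make the quantities c, L, and κ_j in Theorem 5 concrete, the following sketch evaluates them for a small example polynomial. The function `polynomial_stats`, the dictionary encoding of the terms, and the example coefficients are all assumptions made for illustration; the phases ϕ_r do not enter these counts and are omitted.

```python
from collections import Counter

def polynomial_stats(terms, K):
    """Compute (c, L, kappa) for a Hermitian polynomial.

    terms maps each string r (a tuple of state indices in 1..K) to its real
    coefficient c_r; kappa[j] is the weighted occurrence count of rho_j.
    """
    c = sum(abs(cr) for cr in terms.values())        # c = sum_r |c_r|
    L = max(len(r) for r in terms)                   # multinomial degree
    kappa = {
        j: sum(Counter(r)[j] * abs(cr) for r, cr in terms.items()) / c
        for j in range(1, K + 1)
    }
    return c, L, kappa

# Hypothetical example: H = 1.0 * H_(1,2) - 0.5 * H_(1,1,3) on K = 3 states.
example_terms = {(1, 2): 1.0, (1, 1, 3): -0.5}
c, L, kappa = polynomial_stats(example_terms, K=3)
```

For this example c = 1.5, L = 3, and κ = (4/3, 2/3, 1/3), so ρ_1 is consumed most often because it appears in both terms and twice in the second.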

As corollaries of Theorem 5, we have the following simulation results for linear combinations of states (Corollary 6) and for the commutator and anticommutator of two states (Corollary 7):

### Corollary 6

Let $${\rho }_{1},\ldots ,{\rho }_{K}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}})$$ and $${\sigma }_{{\rm{AB}}}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}}\otimes { {\mathcal H} }_{{\rm{B}}})$$ be unknown quantum states, and let $${c}_{1},\ldots ,{c}_{K}\in {\mathbb{R}}$$. Using n samples from the states $$\{{\rho }_{1},\ldots ,{\rho }_{K}\}$$, a quantum algorithm can transform $${\sigma }_{{\rm{AB}}}$$ into $${\tilde{\sigma }}_{{\rm{AB}}}$$ such that

$$\frac{1}{2}{\Vert ({e}^{-iHt}\otimes {\mathbb 1}_{{\rm{B}}}){\sigma }_{{\rm{AB}}}({e}^{iHt}\otimes {\mathbb 1}_{{\rm{B}}})-{\tilde{\sigma }}_{{\rm{AB}}}\Vert }_{1}\le O(\delta ),\quad H=\sum _{j\mathrm{=1}}^{K}{c}_{j}{\rho }_{j},$$
(15)

if $$n=O({c}^{2}{t}^{2}/\delta )$$, where $$c:={\sum }_{j\mathrm{=1}}^{K}|{c}_{j}|$$. Moreover, on average, the number of copies of $${\rho }_{j}$$ consumed is $${n}_{j}=O(|{c}_{j}|c{t}^{2}/\delta )$$.
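
One natural way to obtain the scaling in Corollary 6 is to randomize over the input states: in each step, pick ρ_j with probability |c_j|/c and apply a partial swap of angle sign(c_j)·c·t/n. The sketch below simulates the averaged channel of such a scheme. This randomized construction is an assumption made for illustration (it matches the stated sample counts to first order) and is not claimed to be the authors' exact protocol; all function names are hypothetical.

```python
import numpy as np

def expm_hermitian(A, s):
    """Return exp(-i * s * A) for a Hermitian matrix A."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.exp(-1j * s * vals)) @ vecs.conj().T

def trace_distance(a, b):
    return 0.5 * np.abs(np.linalg.eigvalsh(a - b)).sum()

def random_state(d, rng):
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    P = M @ M.conj().T
    return P / np.trace(P).real

def partial_swap_step(sigma, rho, theta):
    """Closed form of Tr_2[e^{-i S theta} (sigma x rho) e^{i S theta}]."""
    comm = rho @ sigma - sigma @ rho
    return (np.cos(theta) ** 2 * sigma + np.sin(theta) ** 2 * rho
            - 1j * np.sin(theta) * np.cos(theta) * comm)

def simulate_linear_combination(sigma, rhos, coeffs, t, n):
    """Averaged channel: each step uses rho_j with probability |c_j| / c."""
    coeffs = np.asarray(coeffs, dtype=float)
    c = np.abs(coeffs).sum()
    probs = np.abs(coeffs) / c
    for _ in range(n):
        sigma = sum(p * partial_swap_step(sigma, rho, np.sign(cj) * c * t / n)
                    for p, rho, cj in zip(probs, rhos, coeffs))
    return sigma

rng = np.random.default_rng(3)
d, t = 2, 1.0
rhos = [random_state(d, rng) for _ in range(3)]
coeffs = [0.5, -1.0, 0.25]                      # illustrative real coefficients
sigma = random_state(d, rng)
H = sum(cj * rho for cj, rho in zip(coeffs, rhos))
U = expm_hermitian(H, t)
target = U @ sigma @ U.conj().T
lin_errors = {n: trace_distance(simulate_linear_combination(sigma, rhos, coeffs, t, n), target)
              for n in (20, 200, 2000)}
```

In this sketch the expected number of copies of ρ_j used over n steps is n·|c_j|/c, which for n of order c²t²/δ reproduces the per-state count |c_j|·c·t²/δ.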

### Corollary 7

Let $${\rho }_{1},{\rho }_{2}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}})$$ and $${\sigma }_{{\rm{AB}}}\in {\rm{D}}({ {\mathcal H} }_{{\rm{A}}}\otimes { {\mathcal H} }_{{\rm{B}}})$$ be unknown quantum states, and let $$\phi \in [0,2\pi )$$. Using n samples each of $${\rho }_{1}$$ and $${\rho }_{2}$$, a quantum algorithm can transform $${\sigma }_{{\rm{AB}}}$$ into $${\tilde{\sigma }}_{{\rm{AB}}}$$ such that

$$\frac{1}{2}{\Vert ({e}^{-iHt}\otimes {\mathbb 1}_{{\rm{B}}}){\sigma }_{{\rm{AB}}}({e}^{iHt}\otimes {\mathbb 1}_{{\rm{B}}})-{\tilde{\sigma }}_{{\rm{AB}}}\Vert }_{1}\le O(\delta ),\quad H=\frac{1}{2}({e}^{i\phi }{\rho }_{1}{\rho }_{2}+{e}^{-i\phi }{\rho }_{2}{\rho }_{1}),$$
(16)

if $$n=O({t}^{2}/\delta )$$.

Note from Eq. (16) that

$$H=\frac{1}{2}\,\cos (\phi )\{{\rho }_{1},{\rho }_{2}\}+\frac{1}{2}\,\sin (\phi )i[{\rho }_{1},{\rho }_{2}],$$
(17)

so by choosing ϕ = 0, we recover the anticommutator Hamiltonian $$\{{\rho }_{1},{\rho }_{2}\}/2$$, and choosing ϕ = π/2 we recover the commutator Hamiltonian $$i[{\rho }_{1},{\rho }_{2}]/2$$.
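
For completeness, Eq. (17) follows from Eq. (16) by expanding the phases with Euler's formula and grouping the symmetric and antisymmetric parts of the product:

$$\begin{array}{rcl}\frac{1}{2}({e}^{i\phi }{\rho }_{1}{\rho }_{2}+{e}^{-i\phi }{\rho }_{2}{\rho }_{1}) & = & \frac{1}{2}[(\cos \phi +i\,\sin \phi ){\rho }_{1}{\rho }_{2}+(\cos \phi -i\,\sin \phi ){\rho }_{2}{\rho }_{1}]\\ & = & \frac{1}{2}\,\cos (\phi )({\rho }_{1}{\rho }_{2}+{\rho }_{2}{\rho }_{1})+\frac{1}{2}\,\sin (\phi )\,i({\rho }_{1}{\rho }_{2}-{\rho }_{2}{\rho }_{1})\\ & = & \frac{1}{2}\,\cos (\phi )\{{\rho }_{1},{\rho }_{2}\}+\frac{1}{2}\,\sin (\phi )\,i[{\rho }_{1},{\rho }_{2}].\end{array}$$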

Furthermore, Corollary 6 and Corollary 7 are both optimal:

### Theorem 8

Let $$\{{c}_{1},\ldots ,{c}_{K}\}$$ be a set of K real numbers. Then there exist $${\rho }_{1},\ldots ,{\rho }_{K}$$ such that simulating $$H={\sum }_{j\mathrm{=1}}^{K}{c}_{j}{\rho }_{j}$$ for time t and to error δ in trace norm requires $$\Omega ({c}^{2}{t}^{2}/\delta )$$ copies of states in $$\{{\rho }_{1},\ldots ,{\rho }_{K}\}$$, where $$c:={\sum }_{j}|{c}_{j}|$$, as long as δ and δ/(ct) are smaller than some constants.

### Theorem 9

Simulating $$H=i[{\rho }_{1},{\rho }_{2}]$$ for time t and to trace norm error δ requires $$\Omega ({t}^{2}/\delta )$$ copies each of the states $${\rho }_{1}$$ and $${\rho }_{2}$$, as long as δ and δ/t are smaller than some constants.

While Theorem 9 only applies to commutators, it is easy to see that the simulation from Corollary 7 of the anticommutator $$\{{\rho }_{1},{\rho }_{2}\}$$ has optimal scaling in t and δ, because in the qubit case, we can always choose $$\rho_2 = {\mathbb 1}/2$$ so that $$\{{\rho }_{1},{\rho }_{2}\}={\rho }_{1}$$ and we can apply the lower bound from Theorem 2. The proofs for results in this section can be found in Supplementary Information Section D.3.

### Applications of commutator simulation

We now describe how one can use commutator simulation to perform tasks such as orthogonality testing and coherent addition of two pure states.

We first give a protocol for coherent state addition: given many copies of unknown pure states $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$, the task is to obtain a state of the form

$$a\left|{\psi }_{1}\right\rangle +b\left|{\psi }_{2}\right\rangle$$
(18)

for some $$a,b\in {\mathbb{R}}$$. Note that the target state is sensitive to the global phases of the two input states (in particular, to the relative phase between $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$), which have no physical meaning. To make the task well-defined, we instead demand the target state to be of the form

$$a\left|{\psi }_{1}\right\rangle +b\frac{\left\langle {\psi }_{2}|{\psi }_{1}\right\rangle }{\left|\left\langle {\psi }_{2}|{\psi }_{1}\right\rangle \right|}\left|{\psi }_{2}\right\rangle$$
(19)

for some $$a,b\in {\mathbb{R}}$$, which is unique (up to a global phase) even when the global phases of the two input states have not been specified. Note that we can always recover Eq. (18) from Eq. (19) by fixing the global phases of the two input states appropriately (i.e. such that $$\langle {\psi }_{2}|{\psi }_{1}\rangle >0$$).

### Theorem 10

Let $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$ be unknown pure states of the same dimension. Promised that the angle between the two states is $$\Delta :=\arccos |\langle {\psi }_{1}|{\psi }_{2}\rangle |$$ and $$\Delta \notin \{0,\pi /2\}$$, it is possible to create the state

$$\left|\psi (\chi )\right\rangle :=\frac{1}{\sin \,\Delta }(\sin (\Delta -\chi )\left|{\psi }_{1}\right\rangle +{e}^{i\varphi }\,\sin \,\chi \left|{\psi }_{2}\right\rangle )$$
(20)

to trace distance δ using $$O(\frac{{\chi }^{2}}{\delta {\sin }^{2}2\Delta })$$ copies of $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$, where $${e}^{i\varphi }:=\langle {\psi }_{2}|{\psi }_{1}\rangle /|\langle {\psi }_{2}|{\psi }_{1}\rangle |$$ is an unimportant phase factor that can be ignored by appropriately adjusting the global phases of the two states.

A similar protocol has been independently discovered in ref. 22, Theorem 2, and recently implemented experimentally.23 While our protocol involves only the two input states $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$, the protocol of ref. 22 requires an additional reference state |χ〉. Another difference between the two results is that we consume several copies of the input states to obtain an approximation of the target state, while the authors of ref. 23 consume only a single copy but rely on probabilistic postselection.

The proof of Theorem 10 (which can be found in Supplementary Information Section F) is based on commutator simulation and effectively implements a rotation in the two-dimensional subspace spanned by $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$. Indeed, note from Eq. (20) that $$|\psi (0)\rangle =|{\psi }_{1}\rangle$$ and $$|\psi (\Delta )\rangle ={e}^{i\varphi }|{\psi }_{2}\rangle$$, while intermediate values of χ produce states that interpolate between these two. (If one does not care about the relative phase $${e}^{i\varphi }$$, one can always exchange the two states and replace $$\chi$$ by $$\Delta -\chi$$, which would improve the complexity by a constant factor when $$\chi >\Delta \mathrm{/2}$$.) As a consequence, the target state in Eq. (19) has real coefficients a and b. One can also achieve complex coefficients using a more sophisticated Hamiltonian that includes terms proportional to $$|{\psi }_{1}\rangle \langle {\psi }_{1}|$$ and $$|{\psi }_{2}\rangle \langle {\psi }_{2}|$$, but we do not consider this case here for the sake of simplicity.
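
The endpoint and normalization properties of the state in Eq. (20) can be verified directly. The following is a small numerical sketch, assuming two random 3-dimensional pure states; the helper names `angle_and_phase` and `interpolated_state` are hypothetical.

```python
import numpy as np

def angle_and_phase(psi1, psi2):
    """Return Delta = arccos|<psi1|psi2>| and e^{i varphi} = <psi2|psi1>/|<psi2|psi1>|."""
    overlap = np.vdot(psi2, psi1)               # np.vdot conjugates its first argument
    return np.arccos(abs(overlap)), overlap / abs(overlap)

def interpolated_state(psi1, psi2, chi):
    """The state |psi(chi)> of Eq. (20)."""
    delta_angle, phase = angle_and_phase(psi1, psi2)
    return (np.sin(delta_angle - chi) * psi1
            + phase * np.sin(chi) * psi2) / np.sin(delta_angle)

rng = np.random.default_rng(4)
psi1 = rng.normal(size=3) + 1j * rng.normal(size=3)
psi2 = rng.normal(size=3) + 1j * rng.normal(size=3)
psi1 /= np.linalg.norm(psi1)
psi2 /= np.linalg.norm(psi2)

delta_angle, phase = angle_and_phase(psi1, psi2)
norms = [np.linalg.norm(interpolated_state(psi1, psi2, chi))
         for chi in np.linspace(0.0, delta_angle, 5)]
start_gap = np.linalg.norm(interpolated_state(psi1, psi2, 0.0) - psi1)
end_gap = np.linalg.norm(interpolated_state(psi1, psi2, delta_angle) - phase * psi2)
```

All entries of `norms` should equal 1 up to rounding, and the two gaps should vanish, which confirms that Eq. (20) describes a rotation from the first state to the second (times the phase) within their span.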

Our protocol requires a very large number of samples when the states $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$ have either very small or very large overlap (i.e. in cases when $${\sin }^{2}2\Delta$$ is very small). This is because we use commutator simulation to effectively implement a rotation in the two-dimensional subspace spanned by $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$, and in the special cases when $$|{\psi }_{1}\rangle \perp |{\psi }_{2}\rangle$$ or $$|{\psi }_{1}\rangle ={e}^{i\varphi }|{\psi }_{2}\rangle$$ the commutator vanishes and hence our protocol fails (in the second case the task is trivial though).

Interestingly, by choosing χ = Δ/2 in Eq. (20) it is possible to coherently add two states, i.e. create a state proportional to $$|{\psi }_{1}\rangle +|{\psi }_{2}\rangle$$ (we are ignoring the relative phase between the two states). However, to determine Δ one needs to estimate the inner product between the two states, which can be done by running phase estimation on the commutator.

We note that the commutator of orthogonal states is zero, while the commutator of non-orthogonal states is non-zero (as long as the states are not identical). Using this fact, and by performing phase estimation on the unitary generated by the commutator of two pure states, we can create a test for orthogonality (for proof, see Supplementary Information Section F).
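
The dependence of the commutator on the overlap can be made explicit for two real qubit states separated by an angle Δ: the Hermitian operator i[ρ₁, ρ₂] then has eigenvalues ±sin(2Δ)/2. The sketch below checks this numerically; the function name `commutator_spectrum` and the choice of real qubit states are assumptions made for illustration.

```python
import numpy as np

def commutator_spectrum(delta_angle):
    """Eigenvalues of i[rho1, rho2] for two real qubit pure states at angle delta_angle."""
    psi1 = np.array([1.0, 0.0])
    psi2 = np.array([np.cos(delta_angle), np.sin(delta_angle)])
    rho1, rho2 = np.outer(psi1, psi1), np.outer(psi2, psi2)
    return np.linalg.eigvalsh(1j * (rho1 @ rho2 - rho2 @ rho1))

angles = (0.0, np.pi / 8, np.pi / 4, np.pi / 2)
spectra = {a: commutator_spectrum(a) for a in angles}   # each is [-sin(2a)/2, +sin(2a)/2]
```

The spectrum vanishes for identical states (Δ = 0) and for orthogonal states (Δ = π/2), and is largest at Δ = π/4, consistent with the sin² 2Δ factor in the sample complexity of Theorem 10.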

### Theorem 11

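Let $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$ be unknown pure states of the same dimension. Promised that either $$|\langle {\psi }_{1}|{\psi }_{2}\rangle |=0$$ or $$|\langle {\psi }_{1}|{\psi }_{2}\rangle |\ge w$$, deciding which is the case with probability $$1-\epsilon$$ uses $$\Theta (\log (1/\epsilon )/w)$$ copies of $$|{\psi }_{1}\rangle$$ and $$|{\psi }_{2}\rangle$$.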

### Universality of LMR

In many solid-state implementations of quantum computers, such as quantum dots,24 donor pairs,25 and electron spins,26 the Heisenberg exchange is the natural coupling interaction between qubits. More specifically, the Heisenberg interaction between qubits i and j is given by

$${H}_{ij}:={X}^{i}\otimes {X}^{j}+{Y}^{i}\otimes {Y}^{j}+{Z}^{i}\otimes {Z}^{j},$$
(21)

where $${X}^{i}$$, $${Y}^{i}$$, and $${Z}^{i}$$ are the Pauli matrices acting on qubit i. Up to an overall scaling and an additive multiple of the identity, this is the same as the swap interaction S used in the LMR protocol, see Eq. (7). In the solid state systems mentioned above, the Heisenberg interaction typically can be turned on and off for pairs of qubits for any desired length of time, and the operations induced by these interactions are usually fast and reliable.
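
The relation between the Heisenberg interaction and the swap operator can be written out explicitly. Using the standard two-qubit identity for the swap operator,

$$S=\frac{1}{2}({\mathbb 1}\otimes {\mathbb 1}+{X}^{i}\otimes {X}^{j}+{Y}^{i}\otimes {Y}^{j}+{Z}^{i}\otimes {Z}^{j})=\frac{1}{2}({\mathbb 1}+{H}_{ij}),$$

we obtain $${H}_{ij}=2S-{\mathbb 1}$$, and therefore

$$\exp (-it{H}_{ij})={e}^{it}\exp (-2itS),$$

so evolving under the exchange interaction for time t implements the partial swap $${e}^{-iS\Delta }$$ with Δ = 2t, up to a global phase.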

While it is beneficial to create computing models that take advantage of the Heisenberg exchange interaction, this interaction is not universal for spin-1/2 systems.27 Several schemes have overcome this limitation by using encoded logical qubits and decoherence-free subsystems.28, 29

In this section, we use the LMR protocol to design a universal model for quantum computation that does not use encoded qubits, but which requires only the Heisenberg interaction, as well as the ability to prepare the states |0〉 and |+〉 on a single qubit. Our scheme thus requires n + 1 physical qubits to perform computations on n qubits, in contrast to encoded schemes, of which the simplest require 2 or 3 times the number of physical qubits.28, 29 Furthermore, there has been much research in the field of quantum dots on how to quickly and reliably prepare a fixed qubit state, e.g., in refs 30,31,32,33. These schemes could be applied to produce the single-qubit states |0〉 and |+〉 needed for our protocol.

We consider a connectivity graph of the qubits as in Fig. 1 (different connectivity graphs lead to different scalings depending on which costs you would like to optimize). We assume exchange interactions can be applied between connected qubits in the form of unitaries $$\exp (-it{H}_{ij})$$ for arbitrary t. The qubit q * is where the states |0〉 and |+〉 are prepared.

Recall that arbitrary single-qubit gates combined with any entangling two-qubit gate is sufficient for universal quantum computation.34 Since we do not have encoded qubits, the exchange interaction itself immediately gives us an entangling gate. Now for universal quantum computation we need to show how to perform arbitrary single-qubit gates.

Let $${X}_{\theta }:=\exp [-i\theta X]$$ and let $${Z}_{\theta }:=\exp [-i\theta Z]$$ for the Pauli operators X and Z. Then any single-qubit rotation can be written as $${X}_{\phi }{Z}_{\theta }{X}_{\xi }$$ for some angles ϕ, θ, and ξ.4 Therefore, it is sufficient to show how to perform X and Z rotations.

If qubit i needs to have a single-qubit gate performed on it, we use swap gates, implemented with the Heisenberg interaction, to move that qubit to position 0 of Fig. 1. We now show how to perform $${Z}_{\phi }$$ and $${X}_{\theta }$$ on the qubit in position 0. Using LMR, given n copies of the state |0〉 input at qubit q *, and using only partial swap operations on qubits q 0 and q * (i.e. applying the Heisenberg interaction between qubits q 0 and q *), we can apply the unitary

$$\exp (-i2\phi |0\rangle \langle 0|)={Z}_{\phi }$$
(22)

(up to a global phase) to accuracy $$O({n}^{-1})$$. Likewise, using the LMR protocol, given n copies of the state |+〉, using only partial swap interactions between qubits q 0 and q *, we can apply the unitary

$$\exp (-i2\theta |+\rangle \langle +|)={X}_{\theta }$$
(23)

(up to a global phase) to accuracy $$O({n}^{-1})$$.
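
The identities in Eqs. (22) and (23) hold with global phase $${e}^{-i\phi }$$ and $${e}^{-i\theta }$$, respectively, because |0〉〈0| = (𝟙 + Z)/2 and |+〉〈+| = (𝟙 + X)/2. A minimal numerical check, with illustrative angles and hypothetical helper names, is:

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def pauli_rotation(P, angle):
    """exp(-i * angle * P) for a Pauli matrix P (uses P @ P = identity)."""
    return np.cos(angle) * np.eye(2) - 1j * np.sin(angle) * P

def projector_evolution(psi, s):
    """exp(-i * s * |psi><psi|) for a normalized state psi."""
    P = np.outer(psi, psi.conj())
    return np.eye(2) + (np.exp(-1j * s) - 1.0) * P

phi, theta = 0.37, 1.21                          # arbitrary test angles (assumption)
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Eq. (22): exp(-2i phi |0><0|) equals Z_phi up to the global phase e^{-i phi}.
z_gap = np.linalg.norm(projector_evolution(ket0, 2 * phi)
                       - np.exp(-1j * phi) * pauli_rotation(Z, phi))
# Eq. (23): exp(-2i theta |+><+|) equals X_theta up to the global phase e^{-i theta}.
x_gap = np.linalg.norm(projector_evolution(ket_plus, 2 * theta)
                       - np.exp(-1j * theta) * pauli_rotation(X, theta))
```

Both gaps should be zero up to rounding, so simulating the resource states |0〉 and |+〉 for times 2ϕ and 2θ yields the required Z and X rotations.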

To apply an arbitrary single-qubit rotation to accuracy ε, we need $$O({\varepsilon }^{-1})$$ resource states |0〉 and |+〉 (this construction is reminiscent of ideas in ref. 35). Suppose that over the course of an algorithm, one must apply M single-qubit gates and M′ CNOT gates. A CNOT gate requires a constant number of single-qubit gates as well as a constant number of partial swap gates.34 Then to bound the error over the course of the algorithm, we require accuracy of $$O({(M+{M}^{\prime })}^{-1})$$ for each single-qubit gate. Therefore, we require $$O({(M+{M}^{\prime })}^{2})$$ resource states |0〉 and |+〉 in total. Additionally, using the connectivity graph of Fig. 1, moving qubits into proximity with one another to perform any single-qubit or two-qubit gate requires O(N) swap operations, where N is the number of qubits. Thus the total number of operations scales as $$O(N{(M+{M}^{\prime })}^{2})$$.

The states |0〉 and |+〉 need not be prepared perfectly for our protocol to work. For example, given depolarized versions of these states, we would need to increase the number of rounds in the LMR protocol by a constant factor. In fact, two arbitrary states (other than |0〉 and |+〉) could be used, as long as they are well characterized and not diagonal in the same basis.

Our model produces a polynomial (specifically, quadratic) blow-up in the number of operations, which still allows for universal quantum computation. However, it would be impossible to obtain a speed-up for problems such as Grover’s search, where the quantum advantage is itself only quadratic. We hope it is a useful model for systems where the Heisenberg exchange is a natural operation. It may even be useful in non-solid state systems such as cold, trapped atoms, where it was shown that partial swaps could be implemented using Rydberg interactions or through coupling to a cavity.36

## Discussion

We have shown that the LMR protocol is optimal for the problem of simulating unknown Hamiltonians encoded as quantum states. Moreover, the protocol and its generalizations also turn out to be optimal for a variety of other tasks, such as discriminating between pure states and Hamiltonian evolution under the commutators of unknown states. We hope that this study will motivate the discovery of other possible applications of this versatile protocol.

We have not shown the optimality of our protocol for simulating the evolution by the multinomials in Eq. (12). It would be interesting to investigate whether it is optimal, or whether better algorithms can be found.

Another interesting aspect is the role of ancilla qubits in our protocol. While the original LMR protocol for Hamiltonian simulation is based on partial swaps and hence does not require ancilla qubits, the use of ancillas seems to be essential in our more general simulation protocol (see Fig. 2 in ‘Methods’). We wonder whether the use of ancillas is necessary in our protocol or, for example, whether it can instead be implemented using the continuous permutations introduced in ref. 12. These continuous permutations generalize the partial swap operation and do not require ancillas.

Another possible direction is to investigate distributed versions of our protocols in the context of multiparty communication. Reference 37 considers a protocol for simulating distributed unitaries over multiple remote parties using shared entanglement and a limited amount of quantum communication, and the techniques they use are reminiscent of those of the LMR protocol. It would be interesting to investigate the connections of ref. 37 with the protocols in our work.

Finally, the LMR protocol can be seen as allowing the encoding of the operation $${e}^{-i\rho t}$$ into multiple copies of a quantum state ρ. As discussed in ‘LMR protocol vs. state tomography’, having access to $$O({t}^{2}/\delta )$$ copies of ρ allows a user to perform the operation $${e}^{-i\rho t}$$, but may be insufficient for the user to determine what ρ is through tomography. It is an intriguing question whether other quantum operations could be encoded into states in this way, so that a user could perform the quantum operation but learn little else about what operation is being performed. This could be seen as a form of quantum copy-protection.38 See ref. 39 for some progress in this direction, and ref. 40 for negative results when the encoding is required to be a circuit and not a state.

## Methods

In this section, we give proofs for two of the main results in the paper: Theorem 2 (optimality of the LMR protocol), and Theorem 5 (the protocol for simulating arbitrary Hermitian polynomials of the input states). Many of the other proofs in this paper are similar, and can be found in the Supplementary Information.

### Proof of Theorem 2

The upper bound holds by the LMR protocol, Theorem 1, so we will only prove the lower bound. Because the trace norm lower-bounds the diamond norm, a tight lower bound stated in terms of the trace norm is a stronger result than one stated in terms of the diamond norm. Let

$$\rho (x):=x\left|0\right\rangle \left\langle 0\right|+(1-x)\left|1\right\rangle \left\langle 1\right|=\frac{1}{2}{\mathbb 1}+\left(x-\frac{1}{2}\right)Z.$$
(24)

Then, given many copies of an unknown state ρ, suppose we want to distinguish between the cases ρ 1 := ρ(1/2) and $${\rho }_{2}:=\rho (\frac{1}{2}+\epsilon )$$, with $$0\, < \, \epsilon \le 1/2$$, promised that ρ is one of the two. One way of doing this is to consider the single-qubit unitary operator $${\mathcal{U}}(\rho ,t):=\exp (-i\rho t)$$. Then for t ε := π/(2ε) the operators $${\mathcal{U}}({\rho }_{i},{t}_{\epsilon })$$ become orthogonal, namely,

$${\mathcal{U}}({\rho }_{1},{t}_{\epsilon })\propto {\mathbb 1},\quad \quad {\mathcal{U}}({\rho }_{2},{t}_{\epsilon })\propto Z,$$
(25)

where ∝ indicates that we have hidden an unimportant phase factor. Consequently, applying $${\mathcal{U}}(\rho ,{t}_{\epsilon })$$ to |+〉 and measuring in the X-basis will distinguish ρ 1 from ρ 2 with certainty.
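The perfect distinguishability at time t ε can be verified directly, because ρ(x) is diagonal and the unitary is therefore a pair of phases. The sketch below is an illustrative check only. The function name and the value ε = 0.1 are assumptions made for the example.

```python
import cmath
import math

def plus_probability_after_evolution(x, t):
    """Probability of the X-basis outcome |+> after exp(-i*rho(x)*t) acts on |+>.

    rho(x) = x|0><0| + (1-x)|1><1| is diagonal, so the unitary is
    diag(exp(-i*x*t), exp(-i*(1-x)*t)) and <+|U|+> is the mean of the two phases.
    """
    u0 = cmath.exp(-1j * x * t)
    u1 = cmath.exp(-1j * (1 - x) * t)
    amplitude = (u0 + u1) / 2
    return abs(amplitude) ** 2

eps = 0.1                        # illustrative value with 0 < eps <= 1/2
t_eps = math.pi / (2 * eps)      # evolution time from the text
p_rho1 = plus_probability_after_evolution(0.5, t_eps)        # rho_1 = rho(1/2)
p_rho2 = plus_probability_after_evolution(0.5 + eps, t_eps)  # rho_2 = rho(1/2 + eps)
```

For ρ 1 the outcome |+〉 occurs with probability 1 (the unitary is proportional to the identity), while for ρ 2 it occurs with probability 0 (the unitary is proportional to Z), up to floating-point error.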

Thus, we can distinguish between ρ = ρ 1 or ρ = ρ 2 with probability at least 2/3 using no more than f(t ε ,1/3) copies of ρ by implementing a map that differs from $${\mathcal{U}}(\rho ,{t}_{\epsilon })$$ by trace norm 1/3. However, Lemma D1 in the Supplementary Information tells us that C η /ε 2 samples of ρ are required if ε < η ≤ 1/2. Therefore

$$f({t}_{\epsilon },1/3)\ge {C}_{\eta }/{\epsilon }^{2}=C{t}_{\epsilon }^{2},\quad {t}_{\epsilon }\ge \pi ,$$
(26)

using the definition of t ε , and where C := 4C η /π 2 is some positive constant. Eq. (26) holds whenever t ε  ≥ π since ε ≤ 1/2 and so $${t}_{\varepsilon }=\tfrac{\pi }{2}\cdot \tfrac{1}{\varepsilon }\ge \pi$$.

Now suppose instead we have arbitrary δ and t satisfying δ ≤ 1/6 and t/δ ≥ 6π, as assumed in the theorem statement. We note the following inequality for any $$t\in {\mathbb{R}}$$ and any integer m ≥ 0:

$$mf(t,\delta )\ge f(mt,m\delta ),$$
(27)

which holds because one way of simulating $$\exp (-i\rho mt)$$ up to error mδ is to run m times a simulation of $$\exp (-i\rho t)$$ up to error δ. Taking $$m=\lceil 1/(6\delta )\rceil$$, we have

$$f(t,\delta )\ge f(mt,m\delta )/m$$
(28)
$$\ge C{(mt)}^{2}/m=Cm{t}^{2}$$
(29)
$$=\Omega ({t}^{2}/\delta ),$$
(30)

where Eq. (29) holds because mδ ≤ 1/6 + δ ≤ 1/3 and mt ≥ t/(6δ) ≥ π, so Eq. (26) applies.
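The final step, from Eq. (29) to Eq. (30), uses only the definition of m. Spelled out as a brief sketch, with no assumptions beyond those of the theorem statement:

$$m=\lceil 1/(6\delta )\rceil \ge \frac{1}{6\delta }\quad \Rightarrow \quad Cm{t}^{2}\ge \frac{C}{6}\cdot \frac{{t}^{2}}{\delta }=\Omega ({t}^{2}/\delta ).$$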

We now give a proof of Theorem 5. One key tool in the proof will be the following lemma, which lets us simulate a Hamiltonian given by the difference of two subnormalized states:

### Lemma 12

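Let $$\rho ^{\prime} \in {\rm{D}}({{\mathbb{C}}}^{2}\otimes { {\mathcal H} }_{{\rm{A}}})$$ be a quantum state of the form $$\rho ^{\prime} =\left|0\right\rangle \left\langle 0\right|\otimes {\rho }_{+}+\left|1\right\rangle \left\langle 1\right|\otimes {\rho }_{-}$$, where $${\rho }_{+},{\rho }_{-}$$ are unknown subnormalized states with $${\rm{Tr}}\,{\rho }_{+}+{\rm{Tr}}\,{\rho }_{-}=1$$. Using n samples of ρ′, a quantum algorithm can transform σ AB into $${\tilde{\sigma }}_{{\rm{AB}}}$$ such that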

$$\frac{1}{2}{\Vert ({e}^{-iHt}\otimes {\mathbb 1}_{{\rm{B}}}){\sigma }_{{\rm{AB}}}({e}^{iHt}\otimes {\mathbb 1}_{{\rm{B}}})-{\tilde{\sigma }}_{{\rm{AB}}}\Vert }_{1}\le O(\delta ),\quad H={\rho }_{+}-{\rho }_{-},$$
(31)

if n = O(t 2/δ).

The idea is to use the first qubit of ρ′ as a control that determines whether one applies a positive or negative time evolution of partial swap between the second register of ρ′ and the target state. The rest of the proof (found in Supplementary Information Section E.2) proceeds as in the proof sketch of the standard LMR protocol.

### Proof of Theorem 5

We first consider a term H r with r = (1,2,…,k), for some k such that 2 ≤ k ≤ K. (More general r will follow easily from this special case.) Let S k be the cyclic permutation of k copies of $${ {\mathcal H} }_{{\rm{A}}}$$ that acts as follows: S k |j 1,j 2,…,j k 〉 = |j k ,j 1,…,j k−1〉. In other words,

$${S}_{k}:=\sum _{{j}_{1},{j}_{2},\ldots ,{j}_{k}{=1}}^{\dim { {\mathcal H} }_{{\rm{A}}}}\left|{j}_{k}\right\rangle \left\langle {j}_{1}\right|\otimes \left|{j}_{1}\right\rangle \left\langle {j}_{2}\right|\otimes \left|{j}_{2}\right\rangle \left\langle {j}_{3}\right|\otimes \cdots \otimes \left|{j}_{k-1}\right\rangle \left\langle {j}_{k}\right|.$$
(32)
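The action of S k on basis states can be made concrete with a small construction. The following sketch builds the permutation matrix of Eq. (32) for a local dimension d and k registers. The function names are hypothetical, and basis labels run over 0,…,d−1 rather than 1,…,dim as in the text.

```python
from itertools import product

def cyclic_shift(labels):
    """Map (j_1, j_2, ..., j_k) to (j_k, j_1, ..., j_{k-1})."""
    return (labels[-1],) + tuple(labels[:-1])

def cyclic_permutation_matrix(d, k):
    """Matrix of S_k on k registers of dimension d, as a list of rows.

    Column index: basis tuple (j_1, ..., j_k); row index: its cyclic shift.
    """
    basis = list(product(range(d), repeat=k))
    index = {labels: i for i, labels in enumerate(basis)}
    size = len(basis)
    matrix = [[0] * size for _ in range(size)]
    for col, labels in enumerate(basis):
        matrix[index[cyclic_shift(labels)]][col] = 1
    return matrix

S3 = cyclic_permutation_matrix(2, 3)  # three qubits, an 8 x 8 permutation matrix
```

Each row and column of the result contains exactly one nonzero entry, and applying the shift k times returns every basis label to its starting position.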

Consider the circuit in Fig. 2. The output is of the form $${{\rho \prime}^{(r)}}=\left|0\right\rangle \left\langle 0\right|\otimes {\rho }_{+}^{(r)}+\left|1\right\rangle \left\langle 1\right|\otimes {\rho }_{-}^{(r)}$$, where

$$\begin{matrix}{\rho }_{+}^{(r)}:=\frac{1}{2}({|a|}^{2}{\rho }_{1}+{|b|}^{2}{\rho }_{k}+a{b}^{\ast }{\rho }_{1}{\rho }_{2}\cdots {\rho }_{k}+{a}^{\ast }b{\rho }_{k}{\rho }_{k-1}\cdots {\rho }_{1}),\\ {\rho }_{-}^{(r)}:=\frac{1}{2}({|a|}^{2}{\rho }_{1}+{|b|}^{2}{\rho }_{k}-a{b}^{\ast }{\rho }_{1}{\rho }_{2}\cdots {\rho }_{k}-{a}^{\ast }b{\rho }_{k}{\rho }_{k-1}\cdots {\rho }_{1})\mathrm{.}\end{matrix}$$
(33)

When we choose $$a{b}^{\ast }={e}^{i{\phi }_{r}}/2$$, we find

$${\rho }_{+}^{(r)}-{\rho }_{-}^{(r)}=\frac{1}{2}{e}^{i{\phi }_{r}}{\rho }_{1}{\rho }_{2}\cdots {\rho }_{k}+\frac{1}{2}{e}^{-i{\phi }_{r}}{\rho }_{k}{\rho }_{k-1}\cdots {\rho }_{1}={H}_{r}\mathrm{.}$$
(34)
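Eq. (34) can be checked numerically for a small case. The sketch below evaluates Eq. (33) for k = 2 single-qubit states and forms the difference. It assumes the particular amplitudes a = 1/√2 and b = e^{−iϕ}/√2, which is one choice satisfying ab* = e^{iϕ}/2 and not necessarily the one used by the authors. The helper names and the example states are illustrative.

```python
import cmath
import math

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ordered_product(mats):
    """Left-to-right product mats[0] * mats[1] * ... ."""
    out = mats[0]
    for m in mats[1:]:
        out = matmul(out, m)
    return out

def rho_plus_minus_difference(states, phi):
    """Return rho_+ - rho_- from Eq. (33) for the ordered list of input states."""
    a = 1 / math.sqrt(2)                      # assumed amplitude choice
    b = cmath.exp(-1j * phi) / math.sqrt(2)   # gives a * conj(b) = exp(i*phi) / 2
    fwd = ordered_product(states)             # rho_1 rho_2 ... rho_k
    bwd = ordered_product(states[::-1])       # rho_k ... rho_2 rho_1
    first, last = states[0], states[-1]
    n = len(fwd)

    def build(sign):
        return [[0.5 * (abs(a) ** 2 * first[i][j] + abs(b) ** 2 * last[i][j]
                        + sign * (a * b.conjugate() * fwd[i][j]
                                  + a.conjugate() * b * bwd[i][j]))
                 for j in range(n)] for i in range(n)]

    plus, minus = build(+1), build(-1)
    return [[plus[i][j] - minus[i][j] for j in range(n)] for i in range(n)]

rho_1 = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|
rho_2 = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|
H_r = rho_plus_minus_difference([rho_1, rho_2], math.pi / 2)
```

With ϕ = π/2 the result is (i/2)[ρ 1, ρ 2], a Hermitian operator proportional to the commutator of the two input states.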

To deal with arbitrary r with |r| = k, simply supply the appropriate input states ρ j in Fig. 2.

Now without loss of generality, we can assume c r  ≥ 0 for all r, since the sign can be absorbed into the phase ϕ r . Therefore by sampling r ∈ R with probability c r /c and creating $${{\rho \prime}^{(r)}}$$, we obtain the state

$$\rho ^{\prime} =\frac{1}{c}\left (\sum _{r\in R}{c}_{r}{{\rho \prime}^{(r)}}\right )=\frac{1}{c}\left (\left |0\right \rangle \left \langle 0\right |\otimes \left (\sum _{r\in R}{c}_{r}{\rho }_{+}^{(r)}\right )+\left |1\right\rangle \left\langle 1\right |\otimes \left (\sum _{r\in R}{c}_{r}{\rho }_{-}^{(r)}\right )\right )\mathrm{.}$$
(35)

By Lemma 12, we can therefore simulate the Hamiltonian

$$H=\sum _{r\in R}{c}_{r}({\rho }_{+}^{(r)}-{\rho }_{-}^{(r)})=\sum _{r\in R}{c}_{r}{H}_{r}$$
(36)

for the desired time and precision using $$O({c}^{2}{t}^{2}/\delta )$$ copies of ρ′. Since each copy of ρ′ requires a sample of a state $${{\rho \prime}^{(r)}}$$, and each of these states requires at most $$L={\max }_{r\in R}|r|$$ copies of states in {ρ 1,…,ρ K }, we obtain the stated total sample complexity.

To calculate the average number of uses of ρ j , we note that ρ j is used v j (r) times to create the state $${{\rho \prime}^{(r)}}$$, and to create the state ρ′, the state $${{\rho \prime}^{(r)}}$$ is chosen with probability |c r |/c. Thus ρ j is used on average $${\kappa }_{j}={\sum }_{r\in R}{v}_{j}(r)|{c}_{r}|/c$$ times to create a single ρ′. Then since $$O({c}^{2}{t}^{2}/\delta )$$ copies of ρ′ are used in the simulation, we obtain the stated complexity.
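The average usage count κ j is a weighted tally and can be computed mechanically. The sketch below assumes that v j (r) is the number of times the index j appears in r and that c is the sum of |c r | over R, which is what the normalization of ρ′ in Eq. (35) requires. The function name and the example coefficients are hypothetical.

```python
from collections import Counter

def average_uses(terms):
    """Average number of copies of each rho_j consumed per copy of rho'.

    terms: dict mapping an index tuple r (e.g. (1, 2)) to its coefficient c_r.
    Returns a dict j -> kappa_j = sum_r v_j(r) * |c_r| / c.
    """
    c = sum(abs(c_r) for c_r in terms.values())   # assumed normalization constant
    kappa = Counter()
    for r, c_r in terms.items():
        for j, v in Counter(r).items():           # v = v_j(r), multiplicity of j in r
            kappa[j] += v * abs(c_r) / c
    return dict(kappa)

# Illustrative Hamiltonian with two terms: 2.0 * H_(1,2) + 1.0 * H_(2,3,2).
kappa = average_uses({(1, 2): 2.0, (2, 3, 2): 1.0})
```

In this example c = 3, so ρ 1 is used 2/3 times on average, ρ 2 is used 4/3 times, and ρ 3 is used 1/3 times per copy of ρ′.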