Several quantum algorithms are known to outperform their classical counterparts through better asymptotic scaling of the computational cost, e.g., Shor’s prime factoring algorithm1, Hamiltonian simulation2,3 and Grover search4,5. Their realization on actual quantum computers, however, requires additional qubits and gates to correct the errors that naturally occur in real physical devices. Currently available noisy quantum computers are not yet capable of running such quantum algorithms for large problem sizes.

In the context of noisy quantum circuits, there are two regimes in which the classical computational requirements for simulating a quantum computer remain tractable. First, shallow circuits typically generate small amounts of entanglement making them amenable to classical simulation. Second, deep circuits quickly accumulate errors causing decoherence towards a regime which can also be treated efficiently on classical computers6,7. Between these two extremes, there is an optimal working point at which maximum non-trivial quantum correlation is attained and where accurate simulation may become challenging for a classical computer8. In light of this, a promising route towards achieving a genuine quantum advantage without fault tolerance is to realize the aforementioned algorithms while operating the computer at its optimal working point. In order to design such an algorithm, it is therefore essential to account for the influence of noise on the circuits which implement it.

In this work, we propose to heuristically optimize the depth of quantum circuits and operate where we can make the most out of our noisy quantum computer. With this heuristic approach, we provide the first realization of quantum signal processing (QSP) on a trapped-ion quantum computer. QSP was proposed in ref. 9 and is now recognized as one of the most powerful frameworks for developing quantum algorithms. It gives a unifying perspective on seemingly distinct algorithms such as amplitude amplification and the quantum linear systems algorithm and improves on their computational resources10,11. Such flexibility stems from the fact that QSP allows one to apply almost any polynomial transformation to an input scalar or matrix. In the literature, QSP often refers to a polynomial transformation applied to an input scalar, and its generalizations apply a polynomial transformation to the eigenvalues (quantum eigenvalue transformation, QET) or singular values (quantum singular value transformation, QSVT) of an input matrix. Throughout this article, we do not make such a distinction and refer to all these protocols as QSP.

Hamiltonian simulation is an example where QSP provides an improved asymptotic scaling over other algorithms. Since Feynman’s seminal proposal12, Hamiltonian simulation has been a fundamental problem of quantum computing. An efficient Hamiltonian simulation algorithm allows us to simulate the real-time dynamics of a quantum system described by a Hamiltonian H with computational resources scaling at most polynomially in the evolution time t, the system size n, and the inverse of the required accuracy 1/ϵ. Extensive studies have been devoted to exploring efficient algorithms for Hamiltonian simulation, which include product formulas2,13,14,15, quantum walks16, the truncated Taylor-series expansion17, randomized protocols18,19,20,21,22, and methods based on classical optimization techniques23,24,25. Nowadays, the QSP-based algorithm is known to exhibit nearly optimal asymptotic scaling10,26,27 (see also ref. 28 for a comparative survey).

In ref. 29, the authors demonstrate the QSP protocol using random Hamiltonians on a superconducting device for the purpose of benchmarking. The present work takes a step forward by realizing QSP on the Quantinuum H1-1 trapped-ion quantum computer and performing the Hamiltonian simulation of physically relevant quantum systems. After the release of the present manuscript, another group demonstrated QSP for the task of quantum channel discrimination30.


Review of Hamiltonian simulation by quantum signal processing

The Hamiltonian simulation algorithm solves the real-time dynamics of a quantum system by applying a real-time evolution operator \({{\rm{e}}}^{-{\rm{i}}Ht}\) to some initial state \(\left\vert {\psi }_{0}\right\rangle\), where the Hamiltonian H is given by a Hermitian operator in this work. We employ QSP in order to find an approximate real-time evolution operator that can be efficiently implemented on a quantum computer. QSP outputs a degree-d polynomial \(f\in {\mathbb{C}}[x]\) using a sequence of unitary operators9,10,26,27,

$${U}_{{{{\rm{QSP}}}}}:= \mathop{\prod }\limits_{k=1}^{d}\left[S({\phi }_{k})W(x)\right]=\left(\begin{array}{cc}f(x)&* \\ * &* \end{array}\right),$$
$$S(\phi ):= \left(\begin{array}{cc}{{{{\rm{e}}}}}^{{{{\rm{i}}}}\phi }&0\\ 0&{{{{\rm{e}}}}}^{-{{{\rm{i}}}}\phi }\end{array}\right),$$
$$W(x):= \left(\begin{array}{cc}x&\sqrt{1-{x}^{2}}\\ \sqrt{1-{x}^{2}}&-x\end{array}\right),$$

where * stands for an unspecified entry. Here, we follow the convention of Corollary 8 in ref. 10 (preprint version), where W(x) takes the form of a reflection operator. For a polynomial f(x) that satisfies certain conditions10,11 there always exists a set of QSP angles {ϕk}. The conditions are: (i) f must have parity-\((d\,{{{\rm{mod}}}}\,2)\), (ii) \(| f(x)| \le 1\) for all \(x\in [-1,1]\), (iii) \(| f(x)| \ge 1\) for all \(x\in (-\infty ,-1]\cup [1,\infty )\), and (iv) \(f({\rm{i}}x){f}^{* }({\rm{i}}x)\ge 1\) for all \(x\in {\mathbb{R}}\) if d is even. The function f(x) is implemented by computing such angles {ϕk}, and is encoded in the matrix element \(\left\langle 0\right\vert {U}_{{{{\rm{QSP}}}}}\left\vert 0\right\rangle\). It is evident from Eq. (1) that the circuit depth is proportional to the degree d.
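As a minimal illustration of Eqs. (1)-(3), the following sketch multiplies the 2×2 matrices S(ϕk)W(x) in pure Python and reads off the top-left entry. The function names are our own and do not belong to any QSP library. The check relies on the fact that, in this reflection convention, choosing ϕk = π/2 for all k gives \(f(x)={{\rm{i}}}^{d}{T}_{d}(x)\), with Td the Chebyshev polynomial of the first kind.

```python
import cmath
import math


def matmul2(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def signal_rotation(phi):
    """S(phi) = diag(e^{i phi}, e^{-i phi}), Eq. (2)."""
    return [[cmath.exp(1j * phi), 0.0], [0.0, cmath.exp(-1j * phi)]]


def signal_reflection(x):
    """W(x) in the reflection convention, Eq. (3), for x in [-1, 1]."""
    s = math.sqrt(1.0 - x * x)
    return [[x, s], [s, -x]]


def qsp_polynomial(phis, x):
    """Return f(x) = <0| prod_{k=1}^{d} S(phi_k) W(x) |0>, Eq. (1)."""
    u = [[1.0, 0.0], [0.0, 1.0]]
    w = signal_reflection(x)
    for phi in phis:  # factors are multiplied left to right, k = 1..d
        u = matmul2(u, matmul2(signal_rotation(phi), w))
    return u[0][0]


def chebyshev_t(n, x):
    """Chebyshev polynomial of the first kind, T_n(x) = cos(n arccos x)."""
    return math.cos(n * math.acos(x))


if __name__ == "__main__":
    d, x = 4, 0.3
    print(qsp_polynomial([math.pi / 2] * d, x), chebyshev_t(d, x))
```

For generic angles the same function can be used to check conditions (i) and (ii) numerically, e.g., that f(−x) = f(x) for even d and that |f(x)| never exceeds one.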

Finding an efficient Hamiltonian simulation algorithm with QSP starts by approximating the function \({{\rm{e}}}^{-{\rm{i}}xt}\) with a fixed-degree polynomial on an interval \(I\subseteq [-1,1]\). Given time t > 0 and accuracy ϵpoly, we find a polynomial f such that

$$\mathop{\max }\limits_{x\in I}| f(x)-{{{{\rm{e}}}}}^{-{{{\rm{i}}}}xt}| \le {\epsilon }_{{{{\rm{poly}}}}}.$$

One way to find f is to consider the polynomial approximation to the exponential function given by the Jacobi-Anger expansion26,

$$\begin{array}{ll}\quad{{{{\rm{e}}}}}^{-{{{\rm{i}}}}xt}\,=\,\cos (xt)-{{{\rm{i}}}}\sin (xt),\\ \cos (xt)\,=\,{J}_{0}(t)+2\mathop{\sum }\limits_{k=1}^{\infty }{(-1)}^{k}{J}_{2k}(t){T}_{2k}(x),\\ \sin (xt)\,=\,2\mathop{\sum }\limits_{k=0}^{\infty }{(-1)}^{k}{J}_{2k+1}(t){T}_{2k+1}(x),\end{array}$$

where Ji(t) is the Bessel function of the first kind of order i, and Ti(x) is the Chebyshev polynomial of the first kind of degree i. Tolerating an error ϵpoly, the polynomial can be truncated at degree

$$d=\Theta \left(t+\frac{\log (1/{\epsilon }_{{{{\rm{poly}}}}})}{\log ({{{\rm{e}}}}+\log (1/{\epsilon }_{{{{\rm{poly}}}}})/t)}\right),$$

which is almost linear in t and logarithmic in 1/ϵpoly. Here, we use the big-Θ notation, i.e., for functions f and g we write f(x) = Θ(g(x)) if there exist positive constants c1, c2, and x0 such that c1g(x) ≤ f(x) ≤ c2g(x) for any x > x0.
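The truncation behaviour can be checked numerically. The sketch below is an illustration only: it uses the standard form of the Jacobi-Anger expansion with alternating signs \({(-1)}^{k}\) and the sine series starting at k = 0, evaluates the Bessel functions through Bessel's integral with an equispaced quadrature rule, and measures the maximum error of Eq. (4) on a grid over [−1, 1]. All function names and the grid and sample sizes are our own choices.

```python
import cmath
import math


def bessel_j(order, t, samples=256):
    """J_order(t) from Bessel's integral (1/2pi) int_0^{2pi} cos(n*tau - t*sin(tau)) dtau.

    An equispaced rule is used; it converges very quickly for periodic integrands.
    """
    total = 0.0
    for j in range(samples):
        tau = 2.0 * math.pi * j / samples
        total += math.cos(order * tau - t * math.sin(tau))
    return total / samples


def jacobi_anger_coefficients(t, degree):
    """Chebyshev coefficients c_n such that exp(-i x t) ~ sum_n c_n T_n(x)."""
    coeffs = []
    for n in range(degree + 1):
        weight = 1.0 if n == 0 else 2.0
        value = weight * (-1) ** (n // 2) * bessel_j(n, t)
        # Even n belongs to cos(xt); odd n belongs to -i sin(xt).
        coeffs.append(value if n % 2 == 0 else -1j * value)
    return coeffs


def evaluate(coeffs, x):
    """Evaluate sum_n c_n T_n(x) using T_n(x) = cos(n arccos x)."""
    theta = math.acos(x)
    return sum(c * math.cos(n * theta) for n, c in enumerate(coeffs))


def max_error(t, degree, grid=201):
    """Maximum of |f(x) - exp(-i x t)| over an equispaced grid on [-1, 1]."""
    coeffs = jacobi_anger_coefficients(t, degree)
    worst = 0.0
    for i in range(grid):
        x = -1.0 + 2.0 * i / (grid - 1)
        worst = max(worst, abs(evaluate(coeffs, x) - cmath.exp(-1j * x * t)))
    return worst


if __name__ == "__main__":
    for degree in (4, 8, 12, 20):
        print(degree, max_error(2.0, degree))
```

The printed errors fall rapidly once the degree exceeds t, which is the qualitative content of Eq. (6).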

The goal is to apply this polynomial transformation to the eigenvalues of the Hamiltonian H. This is achieved by block encoding H, i.e., embedding H in a unitary operator \({{{\mathcal{W}}}}(H)\) acting on a larger Hilbert space. A number of block-encoding methods have been proposed in the literature10,27,31,32,33 and their applicability depends on the form of the Hamiltonian. For instance, one can employ the linear-combination-of-unitaries (LCU) method when H is given as a weighted sum of unitary operators34. Then, by identifying a subspace analogous to a one-qubit space, the block-encoding unitary \({{{\mathcal{W}}}}(H)\) and a generalized rotation operator \({{{\mathcal{S}}}}(\phi )\) behave like the single-qubit operations W(x) and S(ϕ) in Eq. (1).

Our aim is to run a small-scale QSP-based Hamiltonian simulation on a quantum computer with no fault-tolerance mechanism. This is challenging because noise limits the maximum depth of our circuits. We present a practical protocol to run the Hamiltonian simulation by QSP, while taking hardware noise into account.


Recall that QSP applies a polynomial transformation to the eigenvalues of the Hamiltonian. The eigenvalues need to be rescaled to a suitable interval so that the Hamiltonian can be encoded as a sub-block of a unitary operator. By unitarity, the largest possible interval in Eq. (4) is [ − 1, 1]. However, the protocol is made more efficient if we further narrow the interval down to [0, 1] and approximate \({{\rm{e}}}^{-{\rm{i}}xt}\) by an even function of x35. A general preprocessing method to rescale the spectrum of H to \([a,b]\subseteq [0,1]\) is given by

$$\tilde{H}=\frac{(H-{\lambda }_{-}I)(b-a)}{{\lambda }_{+}-{\lambda }_{-}}+aI,$$

where λ+ and λ− are upper and lower bounds on the eigenvalues, respectively (see Fig. 1A). To recover the desired time evolution, we counterbalance with a time rescaling

$$\tilde{t}=\frac{t({\lambda }_{+}-{\lambda }_{-})}{b-a}.$$
Fig. 1: The proposed protocol for the realization of QSP on a noisy quantum computer.

We choose Hamiltonian simulation as the application. We start with a necessary preprocessing step (A) that maps the input parameters to an effective Hamiltonian \(\tilde{H}\) and an effective simulation time \(\tilde{t}\). In step (B), \(\tilde{H}\) is embedded in a unitary operator. By classically optimizing/compiling a circuit \({{{\mathcal{W}}}}\) this step produces a compressed version of a block-encoding circuit. Next, in the operator-function design (C), we approximate the real-time evolution function, \({{\rm{e}}}^{-{\rm{i}}xt}\), by a polynomial f(x) of degree d. While increasing the degree leads to a more accurate polynomial approximation, the computation suffers from larger noise effects. This is due to the growing depth of the QSP circuit, consisting of \({{{\mathcal{O}}}}(d)\) primitive gates. By accounting for the error rate pTQ of two-qubit gates, we heuristically estimate the optimal degree yielding the smallest combined error. The processing step (D) finally realizes QSP using the compressed block-encoding circuit \({{{\mathcal{W}}}}\) and the designed polynomial f(x). Upon postselection on the ancilla’s measurement outcomes, we obtain an approximation to the desired real-time evolution \({{\rm{e}}}^{-{\rm{i}}Ht}\). An error mitigation scheme based on the error rate pTQ further reduces the effect of noise on the output.

This yields the desired real-time evolution operator up to an irrelevant global phase: \({{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{t}\tilde{H}}={{{{\rm{e}}}}}^{-{{{\rm{i}}}}\phi }{{{{\rm{e}}}}}^{-{{{\rm{i}}}}tH}\), where \(\phi =t(a{\lambda }_{+}-b{\lambda }_{-})/(b-a)\). The exact minimum \({\lambda }_{\min }\) and maximum \({\lambda }_{\max }\) eigenvalues are unknown and finding them is computationally intractable in general36,37,38. That is why we resort to bounds. Equation (8) shows that the effective evolution time \(\tilde{t}\) increases as the QSP interval [a, b] gets smaller, and as the eigenvalue bounds get looser. For example, suppose λ± are taken such that \(({\lambda }_{+}-{\lambda }_{\max })/| {\lambda }_{\max }| =({\lambda }_{\min }-{\lambda }_{-})/| {\lambda }_{\min }| =r\ge 0\), i.e., the bounds λ+/− are 100r% off from \({\lambda }_{\max /\min }\). From Eq. (8) we obtain

$$\tilde{t}=\frac{t({\lambda }_{\max }-{\lambda }_{\min })}{b-a}+\frac{rt(| {\lambda }_{\max }| +| {\lambda }_{\min }| )}{b-a}.$$

The first term is the smallest effective time achievable, while the second term is extra overhead. Note that \(\tilde{t}\) determines the polynomial degree d (e.g., Eq. (6) for the truncated Jacobi-Anger expansion), and thus the circuit depth.
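The preprocessing of Eqs. (7)-(9) acts on each eigenvalue independently, so it can be sanity-checked on scalars. The following sketch (function and variable names are ours, and the numerical values in the example are hypothetical) rescales an eigenvalue, computes the effective time, and compares \({{\rm{e}}}^{-{\rm{i}}\tilde{t}\tilde{\lambda }}\) with \({{\rm{e}}}^{-{\rm{i}}\phi }{{\rm{e}}}^{-{\rm{i}}t\lambda }\).

```python
import cmath


def rescale_eigenvalue(lam, lam_minus, lam_plus, a, b):
    """Eq. (7) applied to a single eigenvalue lam of H."""
    return (lam - lam_minus) * (b - a) / (lam_plus - lam_minus) + a


def effective_time(t, lam_minus, lam_plus, a, b):
    """Eq. (8): rescaled evolution time."""
    return t * (lam_plus - lam_minus) / (b - a)


def global_phase(t, lam_minus, lam_plus, a, b):
    """Phase phi with exp(-i t~ H~) = exp(-i phi) exp(-i t H)."""
    return t * (a * lam_plus - b * lam_minus) / (b - a)


def loose_bounds(lam_min, lam_max, r):
    """Bounds that are 100*r percent off the exact extremal eigenvalues."""
    return lam_min - r * abs(lam_min), lam_max + r * abs(lam_max)


if __name__ == "__main__":
    # Hypothetical spectrum and parameters, chosen only for illustration.
    lam_min, lam_max, t, a, b = -2.0, 3.0, 0.7, 0.0, 1.0
    lam_minus, lam_plus = loose_bounds(lam_min, lam_max, r=0.1)
    t_eff = effective_time(t, lam_minus, lam_plus, a, b)
    phi = global_phase(t, lam_minus, lam_plus, a, b)
    for lam in (lam_min, 0.4, lam_max):
        lam_tilde = rescale_eigenvalue(lam, lam_minus, lam_plus, a, b)
        lhs = cmath.exp(-1j * t_eff * lam_tilde)
        rhs = cmath.exp(-1j * phi) * cmath.exp(-1j * t * lam)
        print(lam_tilde, abs(lhs - rhs))
```

With r = 0 the effective time reduces to the first term of Eq. (9); any looseness in the bounds adds the second term and hence increases the polynomial degree required.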

When the Hamiltonian is provided as a weighted sum H = ∑kckHk of operators {Hk}, simple bounds are readily available: \({\lambda }_{\pm }=\pm {\sum }_{k}| {c}_{k}| \,\Vert {H}_{k}\Vert\), where \(\Vert \cdot \Vert\) is the spectral norm. Tighter bounds can be obtained by relaxing the ground-state constraints39,40 and/or exploiting some structure in the Hamiltonian. For translation-invariant systems, the Anderson bound41, and a particular semi-definite programme relaxation, can provide a lower bound with an error that is independent of system size42. Furthermore, for a large class of local Hamiltonians, one can formulate a hierarchy of semi-definite programming constraints with increasing complexity that can be solved numerically with tensor network and renormalization group techniques43.
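For Hamiltonians written as sums of Pauli strings, each term has unit spectral norm, so the simple bound reduces to a sum of absolute coefficients. The sketch below illustrates this for an Ising chain of the form used later in Eq. (23); the term labels are plain strings, and the helper names and parameter values are hypothetical.

```python
def ising_terms(n, coupling, fields, longitudinal):
    """List (coefficient, label) pairs for
    H = -J sum Z_i Z_{i+1} - sum h_i X_i - m sum Z_i on an open chain."""
    terms = []
    for i in range(n - 1):
        terms.append((-coupling, f"Z{i} Z{i + 1}"))
    for i in range(n):
        terms.append((-fields[i], f"X{i}"))
    for i in range(n):
        terms.append((-longitudinal, f"Z{i}"))
    return terms


def norm_bounds(terms):
    """Return (lambda_minus, lambda_plus) = (-sum|c_k|, +sum|c_k|).

    Assumes every operator in the sum is a Pauli string, whose spectral norm is 1.
    """
    total = sum(abs(coefficient) for coefficient, _ in terms)
    return -total, total


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    terms = ising_terms(n=3, coupling=1.0, fields=[0.5, 0.5, 0.5], longitudinal=0.25)
    print(norm_bounds(terms))
```

Such bounds are cheap but can be loose, which is why the tighter relaxations cited above may be worthwhile when the circuit depth is the limiting resource.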

Compressed block-encoding

The second key step of the protocol (Fig. 1B) is to input the Hamiltonian to the quantum computer so that it can be processed. For ϵBE ≥ 0, a block-encoding \({{{\mathcal{W}}}}\) of \(\tilde{H}\) is defined by

$$\begin{array}{l}{\left\Vert \tilde{{{{\mathcal{W}}}}}-\tilde{H}\right\Vert }_{{{{\rm{F}}}}}={\epsilon }_{{{{\rm{BE}}}}},\\ \tilde{{{{\mathcal{W}}}}}:= (\left\langle {0}^{a}\right\vert \otimes I){{{\mathcal{W}}}}(\left\vert {0}^{a}\right\rangle \otimes I),\end{array}$$

where \({\Vert \cdot \Vert }_{{\rm{F}}}\) is the Frobenius norm and the integer a is the number of ancillary qubits. Note that \((\left\langle {0}^{a}\right\vert \otimes I)\cdot (\left\vert {0}^{a}\right\rangle \otimes I)\) projects onto the subspace where the ancillary qubits are in the all-zero state. The accuracy of the block encoding is specified by the parameter ϵBE.
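To make the definition in Eq. (10) concrete, consider the simplest possible case: a Hamiltonian that is already diagonal, with eigenvalues in [−1, 1], and a single ancillary qubit (a = 1). The sketch below is a toy construction of our own, not one of the block-encoding methods used in this work. It builds the reflection \({{{\mathcal{W}}}}\) block by block, extracts the \(\left\vert 0\right\rangle \left\langle 0\right\vert\) ancilla block, and evaluates the Frobenius distance.

```python
import math


def reflection_block_encoding(eigenvalues):
    """Real symmetric unitary [[H, sqrt(1-H^2)], [sqrt(1-H^2), -H]] for diagonal H.

    The ancilla is the most significant qubit, so the top-left block is <0|W|0>.
    """
    n = len(eigenvalues)
    w = [[0.0] * (2 * n) for _ in range(2 * n)]
    for k, e in enumerate(eigenvalues):
        s = math.sqrt(1.0 - e * e)
        w[k][k] = e
        w[k][n + k] = s
        w[n + k][k] = s
        w[n + k][n + k] = -e
    return w


def top_left_block(w, n):
    """Projection (<0| x I) W (|0> x I) for a single ancillary qubit."""
    return [row[:n] for row in w[:n]]


def frobenius_distance(a, b):
    """Frobenius norm of a - b for equally sized nested lists."""
    return math.sqrt(sum(abs(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def is_unitary(w, tol=1e-12):
    """Check W W^T = I for a real matrix W."""
    size = len(w)
    for i in range(size):
        for j in range(size):
            entry = sum(w[i][k] * w[j][k] for k in range(size))
            if abs(entry - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


if __name__ == "__main__":
    eigenvalues = [0.2, 0.9]  # hypothetical rescaled spectrum
    target = [[0.2, 0.0], [0.0, 0.9]]
    w = reflection_block_encoding(eigenvalues)
    print(is_unitary(w), frobenius_distance(top_left_block(w, 2), target))
```

An approximate block encoding, such as one obtained from a variational circuit, would give a non-zero distance, which plays the role of ϵBE.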

Depending on the form of \(\tilde{H}\), there exist different block-encoding methods10,27,31,32,33,34. While such generic methods are scalable in principle, the required number of ancillary qubits and the circuit depth may preclude an implementation on current noisy quantum devices. Here, we propose two ways to overcome this by compressing the block-encoding circuit.

First, we use a parameterized quantum circuit \({{{\mathcal{W}}}}={{{\mathcal{W}}}}({{{\boldsymbol{\theta }}}})\) as an ansatz and minimize Eq. (10) with respect to the parameters θ. The possible presence of barren plateaus in the optimization landscape could prohibit quantum-classical hybrid methods from being efficient at larger system sizes44,45,46. In this case, a fully classical approach is preferable47. We thus suggest using tensor network ansätze that can be efficiently optimized on a classical computer.

Second, we make use of multiplexor circuit compilation to compress the LCU block-encoding circuit48,49. The multiplexor compilation reduces the number of elementary gates required to implement sequential multi-controlled unitary operations which are heavily used in the LCU circuit. Since the compilation adopted here does not introduce approximation error, it provides an exact block-encoding, i.e., ϵBE = 0.

In the Methods section we discuss both approaches in more detail.

Operator-function design

The depth of a QSP circuit is proportional to the degree d of the polynomial. When using noisy devices, we must fix d so that the final circuit has a reasonable fidelity. Later on, we provide a heuristic to choose d as a function of \(\tilde{t}\) and hardware noise. For now, let us assume that d is fixed and proceed to the function design (Fig. 1C). Instead of using the Jacobi-Anger expansion, we numerically optimize the QSP angles {ϕk}. The preprocessing step has rescaled the eigenvalues of H to \([a,b]\subseteq [0,1]\), so we restrict the optimization to that interval. Furthermore, we can utilize polynomials of even parity, i.e., QSP polynomials of even degree d. The resulting accuracy is

$${\epsilon }_{{{{\rm{poly}}}}}=\mathop{\min }\limits_{\{{\phi }_{k}\}}\mathop{\max }\limits_{x\in [a,b]}\left\vert \left\langle 0\right\vert {U}_{{{{\rm{QSP}}}}}(\{{\phi }_{k}\})\left\vert 0\right\rangle -{{{{\rm{e}}}}}^{-{{{\rm{i}}}}x\tilde{t}}\right\vert .$$

Figure 2a shows the accuracy for different values of degree and evolution time. For each value of d, we find the QSP angle sequence using a dedicated Python package called pyqsp50. As expected, the error decreases as the degree gets larger for a given evolution time. It is also observed that the error increases as the evolution time gets longer for a fixed degree.

Fig. 2: Heuristic search of optimal parameters for the five-qubit hardware experiment.

a Accuracy of QSP angle optimization, Eq. (11), using pyqsp50. b Upper bound to the infidelity, Eq. (16), as a function of degree d and evolution time Jt. c For each evolution time, the optimal degree dopt is the degree that minimizes the total error ϵtotal in b.

The error stemming from both block-encoding Eq. (10) and operator-function design Eq. (11) propagates to the accuracy of the whole algorithm. This is found by expanding the error as10,31,

$$\begin{array}{l}\left\Vert {{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{t}\tilde{H}}-f(\tilde{{{{\mathcal{W}}}}})\right\Vert \\ \le \left\Vert {{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{t}\tilde{H}}-{{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{t}\tilde{{{{\mathcal{W}}}}}}\right\Vert +\left\Vert {{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{t}\tilde{{{{\mathcal{W}}}}}}-f(\tilde{{{{\mathcal{W}}}}})\right\Vert \\ \le | \tilde{t}| \,{\left\Vert \tilde{H}-\tilde{{{{\mathcal{W}}}}}\right\Vert }_{{{{\rm{F}}}}}+\left\Vert {{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{t}\tilde{{{{\mathcal{W}}}}}}-f(\tilde{{{{\mathcal{W}}}}})\right\Vert \\ {=}| \tilde{t}| \,{\epsilon }_{{{{\rm{BE}}}}}+{\epsilon }_{{{{\rm{poly}}}}}=:{\epsilon }_{{{{\rm{QSP}}}}},\end{array}$$

where we have defined \(f(\tilde{{{{\mathcal{W}}}}}):= {\sum }_{{\lambda }_{\tilde{{{{\mathcal{W}}}}}}}f({\lambda }_{\tilde{{{{\mathcal{W}}}}}})\left\vert {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\rangle \left\langle {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\vert\) with the eigenstates \(\{\left\vert {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\rangle \}\) of \(\tilde{{{{\mathcal{W}}}}}\) such that \(\tilde{{{{\mathcal{W}}}}}\left\vert {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\rangle ={\lambda }_{\tilde{{{{\mathcal{W}}}}}}\left\vert {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\rangle\). In the third line, we use inequality \(\parallel {{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{t}\tilde{H}}-{{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{t}\tilde{{{{\mathcal{W}}}}}}\parallel \le | \tilde{t}| \,\parallel \tilde{H}-\tilde{{{{\mathcal{W}}}}}{\parallel }_{{{{\rm{F}}}}}\) (see Lemma 50 in ref. 31, preprint version) and the fact that the spectral norm is upper bounded by the Frobenius norm.

Let us now incorporate the effect of hardware noise via a simple noise model. This allows us to develop a heuristic for estimating the optimal polynomial degree, given the evolution time and the noise rate of our quantum device. Letting \(\left\vert {\psi }_{0}\right\rangle\) be an n-qubit initial state and \(\left\vert {0}^{a}\right\rangle\) be the a-qubit ancillary state, the quantum computation is described by

$$\sigma ={{{{\mathcal{U}}}}}_{{{{\rm{QSP}}}}}(\left\vert {0}^{a}\right\rangle \left\langle {0}^{a}\right\vert \otimes \left\vert {\psi }_{0}\right\rangle \left\langle {\psi }_{0}\right\vert ){{{{\mathcal{U}}}}}_{{{{\rm{QSP}}}}}^{{\dagger} },$$

where \({{{{\mathcal{U}}}}}_{{{{\rm{QSP}}}}}\) represents the unitary implementing the QSP protocol, which will be defined later in Eq. (17). We model the noise effect of the hardware with the depolarizing channel \({{{{\mathcal{D}}}}}_{p}\) acting on the entire system. It alters the state to

$${{{{\mathcal{D}}}}}_{p}[\sigma ]=(1-p)\sigma +p\frac{I}{{2}^{n+a}},$$

where we set \(p=1-{(1-{p}_{{{{\rm{TQ}}}}})}^{{N}_{{{{\rm{TQ}}}}}}\) with the two-qubit gate infidelity pTQ and the number of two-qubit gates NTQ in the \({{{{\mathcal{U}}}}}_{{{{\rm{QSP}}}}}\) circuit. The fidelity between this state and the ideal target state \(\left\vert {\psi }_{\tilde{t}}\right\rangle := {{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{H}\tilde{t}}\left\vert {\psi }_{0}\right\rangle\) quantifies the error,

$$\begin{array}{l}(\left\langle {0}^{a}\right\vert \otimes \left\langle {\psi }_{\tilde{t}}\right\vert ){{{{\mathcal{D}}}}}_{p}[\sigma ](\left\vert {0}^{a}\right\rangle \otimes \left\vert {\psi }_{\tilde{t}}\right\rangle )\\ =(1-p){\left\vert \left\langle {\psi }_{\tilde{t}}\right\vert f(\tilde{{{{\mathcal{W}}}}})\left\vert {\psi }_{0}\right\rangle \right\vert }^{2}+\frac{p}{{2}^{n+a}},\end{array}$$

Thus, the corresponding infidelity is bounded as

$$\begin{array}{l}1-(\left\langle {0}^{a}\right\vert \otimes \left\langle {\psi }_{\tilde{t}}\right\vert ){{{{\mathcal{D}}}}}_{p}[\sigma ](\left\vert {0}^{a}\right\rangle \otimes \left\vert {\psi }_{\tilde{t}}\right\rangle )\\ =1-(1-p){\left\vert 1-\left\langle {\psi }_{\tilde{t}}\right\vert \left({{{{\rm{e}}}}}^{-{{{\rm{i}}}}\tilde{H}\tilde{t}}-f(\tilde{{{{\mathcal{W}}}}})\right)\left\vert {\psi }_{0}\right\rangle \right\vert }^{2}-\frac{p}{{2}^{n+a}}\\ {\le }1-(1-p){(1-{\epsilon }_{{{{\rm{QSP}}}}})}^{2}-\frac{p}{{2}^{n+a}}=:{\epsilon }_{{{{\rm{total}}}}}.\end{array}$$

Figure 2b shows the upper bound in Eq. (16) as a function of degree and evolution time, where the algorithmic error ϵQSP [Eq. (12)] is obtained for the Hamiltonian given in Eq. (23). The two-qubit gate error rate is set to pTQ = \(2.577\times {10}^{-3}\) (see Methods for details) and the circuits of degree d ∈ {2, 4, 6, 8, 10, 12, 14} contain NTQ ∈ {52, 98, 144, 190, 236, 282, 328} two-qubit gates, respectively. In contrast to the operator-function design error in Fig. 2a, the total error in Fig. 2b has a sweet spot for each value of Jt. Intuitively, the increase of the degree reduces the algorithmic error ϵQSP while making the noise effect more prominent due to the larger circuit depth. This motivates the following heuristic: for a given evolution time, pick the degree that minimizes the upper bound on the total error in Eq. (16) (see refs. 51,52, where a similar approach has been applied to Grover’s algorithm). Importantly, this step of the protocol does not require the use of a quantum computer. The optimal degree for Eq. (16) is found numerically using classical computation. Additionally, the sweet spot may coincide with the hardware’s optimal working point where we expect a classical simulation of the corresponding noisy quantum circuit to be most challenging6,8, further justifying our heuristic choice.
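The degree-selection heuristic amounts to evaluating Eq. (16) on a small table and taking the minimum. The sketch below uses the two-qubit gate counts and error rate quoted above with n + a = 5 qubits, whereas the algorithmic errors ϵQSP per degree are placeholder values invented for illustration (the actual values depend on the Hamiltonian, the evolution time and the angle optimization). Function names are ours.

```python
def depolarizing_rate(p_tq, n_tq):
    """p = 1 - (1 - p_TQ)^{N_TQ}: global depolarizing strength of the circuit."""
    return 1.0 - (1.0 - p_tq) ** n_tq


def total_error_bound(eps_qsp, p_tq, n_tq, n_qubits):
    """Upper bound on the infidelity, Eq. (16), for n_qubits = n + a."""
    p = depolarizing_rate(p_tq, n_tq)
    return 1.0 - (1.0 - p) * (1.0 - eps_qsp) ** 2 - p / 2 ** n_qubits


def optimal_degree(eps_qsp_by_degree, n_tq_by_degree, p_tq, n_qubits):
    """Degree that minimizes the bound of Eq. (16)."""
    return min(
        eps_qsp_by_degree,
        key=lambda d: total_error_bound(eps_qsp_by_degree[d], p_tq, n_tq_by_degree[d], n_qubits),
    )


# Gate counts quoted in the text for the five-qubit experiment.
N_TQ = {2: 52, 4: 98, 6: 144, 8: 190, 10: 236, 12: 282, 14: 328}
# Placeholder algorithmic errors: hypothetical, NOT taken from the experiment.
EPS_QSP = {2: 0.5, 4: 0.1, 6: 1e-2, 8: 1e-3, 10: 1e-4, 12: 1e-5, 14: 1e-6}

if __name__ == "__main__":
    for d in sorted(N_TQ):
        print(d, total_error_bound(EPS_QSP[d], 2.577e-3, N_TQ[d], n_qubits=5))
    print("optimal degree:", optimal_degree(EPS_QSP, N_TQ, 2.577e-3, n_qubits=5))
```

Setting the gate error rate to zero removes the sweet spot: the bound then decreases monotonically with the degree, as in the noiseless setting.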

Figure 2c shows that the optimal degree dopt is approximately linear in the evolution time t. The estimated degrees are corroborated by the complementary numerical study that we carried out and presented in the Methods section. It is important to emphasize that our approximately linear scaling in time is different from the one expected by noiseless QSP. Our heuristic is designed to run the noisy quantum computer to its full potential, but may still produce large errors. This happens when the simulation parameters {H, t, pTQ} are not compatible in the first place. For instance, at a fixed error rate pTQ and large simulation time t, it is reasonable to expect a large infidelity. In contrast, Hamiltonian simulation by noiseless QSP achieves linear scaling in time while providing full control over the total error. For example, one can use a perfect block-encoding, ϵBE = 0, along with the desired approximation error ϵpoly in Eq. (6).


In this last step of the protocol, we apply the polynomial f found in Eq. (11) to the block-encoded Hamiltonian \(\tilde{{{{\mathcal{W}}}}}\) (Fig. 1D). For an even integer d, the QSP unitary takes the form10,11,

$$\begin{array}{l}{{{{\mathcal{U}}}}}_{{{{\rm{QSP}}}}}:= \mathop{\prod }\limits_{k=1}^{d/2}\left[{{{\mathcal{S}}}}({\phi }_{2k-1}){{{{\mathcal{W}}}}}^{{\dagger} }{{{\mathcal{S}}}}({\phi }_{2k}){{{\mathcal{W}}}}\right]\\ \qquad\quad=\mathop{\bigoplus}\limits_{{\lambda }_{\tilde{{{{\mathcal{W}}}}}}}\left(\begin{array}{cc}f({\lambda }_{\tilde{{{{\mathcal{W}}}}}})&* \\ * &* \end{array}\right)\otimes \left\vert {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\rangle \left\langle {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\vert =\left(\begin{array}{cc}f(\tilde{{{{\mathcal{W}}}}})&* \\ * &* \end{array}\right),\end{array}$$
$${{{\mathcal{S}}}}(\phi ):= \mathop{\bigoplus}\limits_{{\lambda }_{\tilde{{{{\mathcal{W}}}}}}}\left(\begin{array}{cc}{{{{\rm{e}}}}}^{{{{\rm{i}}}}\phi }&0\\ 0&{{{{\rm{e}}}}}^{-{{{\rm{i}}}}\phi }\end{array}\right)\otimes \left\vert {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\rangle \left\langle {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\vert ,$$

where the direct sum is taken over the eigenstates \(\{\left\vert {\lambda }_{\tilde{{{{\mathcal{W}}}}}}\right\rangle \}\) of \(\tilde{{{{\mathcal{W}}}}}\) and the upper-left block of the matrices represents the \(\left\vert {0}^{a}\right\rangle \left\langle {0}^{a}\right\vert\) component of the corresponding operators. Thus, starting from the initial ancillary state \(\left\vert {0}^{a}\right\rangle\), and post-selecting on the ancillary state \(\left\vert {0}^{a}\right\rangle\) at the end, we obtain

$$(\left\langle {0}^{a}\right\vert \otimes I){{{{\mathcal{U}}}}}_{{{{\rm{QSP}}}}}(\left\vert {0}^{a}\right\rangle \otimes I)=f(\tilde{{{{\mathcal{W}}}}}),$$

which approximates the desired real-time evolution operator \({{\rm{e}}}^{-{\rm{i}}Ht}\).

Let us now discuss how to post-process the measurement results and mitigate the noise effects on observables. We let the noisy quantum state simulated on the hardware before any measurement be η, which is generally different from the state affected only by the depolarizing channel given by Eq. (14). For simplicity, we consider the expectation value, \({{{\rm{Tr}}}}[\bar{P}\eta ]\), of \(\bar{P}:= \left\vert {0}^{a}\right\rangle \left\langle {0}^{a}\right\vert \otimes P\), where P is a Pauli operator acting on the system register. The variance is \({{{{\rm{Var}}}}}_{\eta ,\bar{P}}={{{\rm{Tr}}}}[\bar{I}\eta ]-{{{\rm{Tr}}}}{[\bar{P}\eta ]}^{2}\), where \(\bar{I}:= \left\vert {0}^{a}\right\rangle \left\langle {0}^{a}\right\vert \otimes I\). We mitigate the noise effects by modelling them with the depolarizing channel53,54,55. In particular, we use the same noise model that we previously employed when estimating the optimal polynomial degree. The expectation value of \(\bar{P}\) with respect to the state in Eq. (14) is

$${{{\rm{Tr}}}}[\bar{P}{{{{\mathcal{D}}}}}_{p}[\sigma ]]=(1-p)\left\langle {\psi }_{0}\right\vert f{(\tilde{{{{\mathcal{W}}}}})}^{{\dagger} }Pf(\tilde{{{{\mathcal{W}}}}})\left\vert {\psi }_{0}\right\rangle ,$$

where \(p=1-{(1-{p}_{{{{\rm{TQ}}}}})}^{{N}_{{{{\rm{TQ}}}}}}\). We infer the noiseless expectation value from the noisy expectation value as

$${\langle \bar{P}\rangle }_{\eta }^{{{{\rm{mitig}}}}}:= \frac{{{{\rm{Tr}}}}[\bar{P}\eta ]}{1-p}.$$

This is understood as mitigating the depolarizing noise, at the cost of a larger variance,

$${{{{\rm{Var}}}}}_{\eta ,\bar{P}}^{{{{\rm{mitig}}}}}=\frac{{{{{\rm{Var}}}}}_{\eta ,\bar{P}}}{{(1-p)}^{2}}=\frac{{{{{\rm{Var}}}}}_{\eta ,\bar{P}}}{{(1-{p}_{{{{\rm{TQ}}}}})}^{2{N}_{{{{\rm{TQ}}}}}}}.$$

This implies that the number of samples needed to achieve a fixed sampling error increases exponentially in NTQ. Therefore, reducing the depth of the circuit is extremely important even though the noise effect on the expectation value \({\langle \bar{P}\rangle }_{\eta }^{{{{\rm{mitig}}}}}\) is mitigated.
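The rescaling in Eq. (21) and the associated variance overhead in Eq. (22) are simple to evaluate. The following sketch, with function names of our own choosing, computes both for a given two-qubit gate count; the gate error rate in the example is the one quoted for the first experiment.

```python
def survival_factor(p_tq, n_tq):
    """1 - p = (1 - p_TQ)^{N_TQ}."""
    return (1.0 - p_tq) ** n_tq


def mitigated_expectation(noisy_expectation, p_tq, n_tq):
    """Eq. (21): divide the measured expectation value by 1 - p."""
    return noisy_expectation / survival_factor(p_tq, n_tq)


def variance_overhead(p_tq, n_tq):
    """Eq. (22): factor (1 - p_TQ)^{-2 N_TQ} by which the variance grows."""
    return survival_factor(p_tq, n_tq) ** -2


if __name__ == "__main__":
    p_tq = 2.577e-3
    for n_tq in (0, 98, 190, 328):
        print(n_tq, variance_overhead(p_tq, n_tq))
```

Since the number of shots needed for a fixed sampling error is proportional to the variance, the overhead factor directly gives the relative increase in the number of measurements.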

Hardware experiment

In order to demonstrate the protocol, we perform the QSP-based Hamiltonian simulation experiments on the Quantinuum H1-1 trapped-ion quantum computer. We simulate the real-time dynamics of the quantum system described by the one-dimensional Ising spin Hamiltonian

$$H=-J\mathop{\sum }\limits_{i=0}^{n-2}{Z}_{i}{Z}_{i+1}-\mathop{\sum }\limits_{i=0}^{n-1}{h}_{i}{X}_{i}-m\mathop{\sum }\limits_{i=0}^{n-1}{Z}_{i}.$$

We quantify entanglement growth by bi-partitioning the system into subsystems A and \(\bar{A}\) and then computing the time dependence of the von Neumann entropy

$${S}_{{{{\rm{vN}}}}}=-{{{\rm{Tr}}}}[{\rho }_{A}\log {\rho }_{A}],$$

and the degree-2 Rényi entropy

$${S}_{{{{\rm{R}}}}}^{(2)}=-\log {{{\rm{Tr}}}}[{\rho }_{A}^{2}]$$

on the nA-qubit subsystem A, where \({\rho }_{A}={{{{\rm{Tr}}}}}_{\bar{A}}[\rho ]\).

We perform state tomography by measuring the Pauli expectation values via

$${c}_{P}=\frac{{\langle \bar{P}\rangle }_{\eta }^{{{{\rm{mitig}}}}}}{{\langle \bar{I}\rangle }_{\eta }^{{{{\rm{mitig}}}}}},$$

for an operator \(P\in \,{{{{\rm{Pauli}}}}}_{A}:= {\{I,X,Y,Z\}}^{\otimes {n}_{A}}\backslash \{{I}^{\otimes {n}_{A}}\}\) on A (see Methods), which leads to an estimator of the density matrix,

$${\rho }_{A}=\frac{I+{\sum }_{P\in {{{{\rm{Pauli}}}}}_{A}}{c}_{P}P}{{2}^{{n}_{A}}}.$$

Since the denominator of Eq. (26) would be one in the absence of algorithmic error and noise effects, the quantity in Eq. (26) approximates the expectation value of the Pauli operator P as is further discussed in the Methods section. We note that the computation of von Neumann entropy is not scalable in general. However, the current procedure can be straightforwardly applied to the computation of degree-2 Rényi entropy using the swap trick56,57,58,59,60,61 or randomized measurement protocols62,63,64,65,66,67.
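When the subsystem A is a single qubit (nA = 1), the estimator in Eq. (27) is fixed by the three coefficients cX, cY, cZ, and both entropies follow in closed form from the length of the Bloch vector. The sketch below assumes natural logarithms and clips the Bloch-vector length to one so that a noisy estimate still yields a valid spectrum; both choices, like the function names, are ours.

```python
import math


def pauli_coefficient(p_bar_mitigated, i_bar_mitigated):
    """Eq. (26): ratio of mitigated expectation values."""
    return p_bar_mitigated / i_bar_mitigated


def bloch_length(cx, cy, cz):
    """Length of the Bloch vector, clipped to 1 (an assumption for noisy data)."""
    return min(1.0, math.sqrt(cx * cx + cy * cy + cz * cz))


def von_neumann_entropy(cx, cy, cz):
    """Eq. (24) for rho_A = (I + cx X + cy Y + cz Z) / 2, eigenvalues (1 +- r) / 2."""
    r = bloch_length(cx, cy, cz)
    eigenvalues = [(1.0 + r) / 2.0, (1.0 - r) / 2.0]
    return -sum(p * math.log(p) for p in eigenvalues if p > 0.0)


def renyi2_entropy(cx, cy, cz):
    """Eq. (25), using Tr[rho_A^2] = (1 + r^2) / 2."""
    r = bloch_length(cx, cy, cz)
    return -math.log((1.0 + r * r) / 2.0)


if __name__ == "__main__":
    # Hypothetical Pauli coefficients, chosen only for illustration.
    print(von_neumann_entropy(0.3, 0.4, 0.0), renyi2_entropy(0.3, 0.4, 0.0))
```

For larger subsystems the density matrix must be diagonalized explicitly, which is where the cost of full tomography becomes prohibitive.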

The H1-1 system operates by controlling the \({S}_{1/2}\) hyperfine clock states of trapped \({}^{171}{{\rm{Yb}}}^{+}\) ions, which play the role of qubits68,69; there were a total of 20 qubits in the system at the time the experiments were conducted (see ref. 70 for details on the H1-1 system). In addition to single-qubit rotations, a two-qubit native gate \(\exp (-{{{\rm{i}}}}\theta Z\otimes Z/2)\) with \(\theta \in {\mathbb{R}}\) can be applied to an arbitrary pair of qubits, giving the system all-to-all connectivity. This is enabled by the ability of the H1-1 system to move any pair of ions to one of five isolated interaction zones where quantum operations (initialization, gate application, measurement) are executed in a manner that suppresses the rate of crosstalk and allows for high-fidelity two-qubit gates.

In the first experiment, we consider the n = 3 Ising spin chain with hi/J = − 1.05 for all i and m/J = 0.5 in Eq. (23). The system is known to display rapid growth of entanglement71,72. We preprocess the Hamiltonian H given in Eq. (23) to find \(\tilde{H}\) via Eq. (7) with a = 0, b = 1, and λ± = ± (2J + 3h + 3m). We obtain a compressed block-encoding circuit by variational optimization using two ancillary qubits and L = 3 layers, obtaining an error ϵBE = \(1.8\times {10}^{-2}\) (see Methods for details). The subsystem A is taken to be the zeroth site of the system register (see Fig. 3 for a schematic of this five-qubit experiment).

Fig. 3: Sketch of the setup for the five-qubit experiment.

a The system consists of the two-qubit ancillary register (orange ions) and the three-qubit system register. The latter is further partitioned into the one-qubit subsystem A (a red ion) and its complement \(\bar{A}\) (blue ions). b The H1-1 quantum computer operates by manipulating the ions representing the qubits. Each quantum operation (initialization, gate application, measurement) is performed using lasers after the target ions are transported to one of the isolated interaction zones. In the experiments we use five out of the 20 qubits available, and apply up to 328 two-qubit gates.

We consider the real-time evolution with Jt ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7}, starting from the initial state on the system register \(\left\vert {\psi }_{0}\right\rangle ={\left\vert +\right\rangle }^{\otimes 3}\) where \(\left\vert +\right\rangle =(\left\vert 0\right\rangle +\left\vert 1\right\rangle )/\sqrt{2}\). For each evolution time, the degree d is set to dopt ∈ {0, 4, 4, 6, 8, 10, 10, 14} following the heuristic shown in Fig. 2c. The resulting number of two-qubit gates in each circuit is NTQ ∈ {0, 98, 98, 144, 190, 236, 236, 328}. Error-mitigated Pauli expectation values in Eq. (26) are obtained from Eq. (21) with the two-qubit gate infidelity pTQ = \(2.577\times {10}^{-3}\), the number of two-qubit gates NTQ, and 1000 measurements.

Figure 4a, b show the growth of entanglement entropies with time for our system. The exact time evolution data (dashed line) is obtained from the exact application of the operator \({{\rm{e}}}^{-{\rm{i}}Ht}\) to the initial state \(\left\vert {\psi }_{0}\right\rangle\). The experimental data obtained from H1-1 is reported with error mitigation (orange circles) as well as without error mitigation (green squares). The noiseless QSP simulation data (blue diamonds) is obtained by classically simulating the algorithm without the noise effects. Error bars represent one standard deviation due to sampling error.

Fig. 4: Experimental results.

a The von Neumann entanglement entropy and b the degree-2 Rényi entanglement entropy of the five-qubit experiment on the H1-1 quantum computer. c The von Neumann entanglement entropy and d the degree-2 Rényi entanglement entropy of the seven-qubit experiment. Error bars represent one standard deviation due to sampling error.

The error-mitigated experimental data agree well with the exact values and with the noiseless data up to Jt = 0.6, whereas the unmitigated data deviate from the rest from as early as Jt = 0.1. We also observe that the error-mitigated data show larger sampling errors (error bars) than the unmitigated data, as expected from Eq. (22). The experimentally obtained entanglement entropies are generally larger than the exact ones because of algorithmic error and noise effects, which induce interactions among the system register, the ancillary register, and the environment surrounding the device. Thus, the von Neumann and Rényi entropies computed on subsystem A measure the entanglement not only with \(\bar{A}\) but also with the ancillary register and the environment. Nevertheless, our protocol mitigates these effects well. In particular, the agreement between the mitigated experimental data and the exact values indicates that our protocol brings both the QSP algorithmic error and noise effects under good control for the range of parameters that we assessed.

In the second experiment, we simulate the real-time evolution of the n = 4 Ising spin chain with h1/J = 1 and hi/J = m/J = 0 for i ≠ 1 in Eq. (23). We begin by constructing the exact LCU block-encoding circuit (ϵBE = 0), which uses a = 3 ancillary qubits and 125 two-qubit gates. We compress this circuit using multiplexor compilation and obtain an equivalent circuit with only 44 two-qubit gates. This reduces the two-qubit gate count of the original LCU circuit by 64.8% (see Methods for details). We evolve the initial state \(\left\vert {\psi }_{0}\right\rangle ={\left\vert +\right\rangle }^{\otimes 4}\) on the system register and make 1000 measurements to compute each Pauli expectation value [Eq. (26)] at each time Jt ∈ {0.1, 0.4, 0.7}. We again follow the heuristic in Fig. 1C to find dopt ∈ {2, 4, 8}, respectively, for each evolution time Jt. However, we use a different two-qubit gate infidelity, pTQ = \(2.185\times {10}^{-3}\), following an update to the H1-1 device after our first experiment. The resulting number of two-qubit gates in each circuit is NTQ ∈ {102, 204, 408}.

We choose the zeroth and first sites of the system register to represent subsystem A. The calculated entanglement entropies are shown in Fig. 4c, d. The discrepancy between the noiseless data (blue diamonds) and exact data (dashed line) is due to the degrees dopt being smaller than those found in the first experiment. Indeed, the heuristic has taken into account the increased number of qubits and two-qubit gates for this second experiment. The degrees found by our heuristic lead to a good agreement between the noiseless data and error-mitigated experimental data (orange circles), except for Jt = 0.7. Note that this parameter setting (Jt = 0.7) yields our largest quantum circuit with as many as 408 two-qubit gates. This experiment exemplifies the importance of finding the optimal working point to balance the algorithmic error, hardware noise, and parameter setting.


We propose a detailed protocol to perform QSP-based Hamiltonian simulation tailored to noisy quantum hardware. Each step is carefully studied to clarify the sources of error in the estimate of target observables, as summarized in Table 1. In particular, the polynomial approximation is designed such that the combined error caused by the QSP protocol and noise effects is minimized. The block-encoding circuit is compressed to further reduce the circuit depth for experimental purposes. An error mitigation scheme is used to increase the accuracy of the estimated target expectation values.

Table 1 Summary of the main sources of error in our QSP protocol and how they can be improved upon.

We execute the protocol on the Quantinuum H1-1 quantum computer. As an illustration, the time evolution of the von Neumann and degree-2 Rényi entanglement entropies is computed. The results from the hardware experiments agree not only with those from noiseless simulations but also with the exact values, which implies that the algorithmic error and noise effects are well controlled in the range of parameters that we chose.

An important question is whether the approach can scale to larger demonstrations. Both our heuristic and error mitigation schemes are derived under a simple noise model for the hardware at hand. A sophisticated error model may be required to obtain more accurate outputs for larger instances. Beyond that, one can use quantum error detection codes (see, e.g., ref. 73 for the code tailored for the Quantinuum H1 system) to generate more reliable results at the cost of discarding a portion of the circuit runs, or apply algorithm-level error correction74 for noisy QSP. Finally, it is noted that there exist block-encoding schemes with asymptotically efficient scaling10,27,31,32,33. Their required quantum resources are, however, still beyond the capability of currently available quantum devices. The techniques employed in this article to compress block-encoding circuits are potentially useful to perform larger-scale QSP realizations.

While further theoretical improvements are still required to scale up the protocol, the present study has taken the first step in the experimental realization of QSP-based algorithms and applications.


Compressed block-encoding by variational optimization

Here we elaborate on the block-encoding techniques used in this work. The goal is to optimize a parameterized quantum circuit, \({{{\mathcal{W}}}}({{{\boldsymbol{\theta }}}})\), to minimize the block-encoding error,

$${\epsilon }_{{{{\rm{BE}}}}}=\parallel \tilde{{{{\mathcal{W}}}}}({{{\boldsymbol{\theta }}}})-\tilde{H}{\parallel }_{{{{\rm{F}}}}},\\ \tilde{{{{\mathcal{W}}}}}({{{\boldsymbol{\theta }}}})=(\left\langle {0}^{a}\right\vert \otimes {I}^{\otimes n}){{{\mathcal{W}}}}({{{\boldsymbol{\theta }}}})(\left\vert {0}^{a}\right\rangle \otimes {I}^{\otimes n}),$$

with θ referring to the collection of all the parameters in the circuit. This is equivalent to minimizing the cost function,

$$F({{{\boldsymbol{\theta }}}})={{{\rm{Tr}}}}({\tilde{{{{\mathcal{W}}}}}}^{{\dagger} }\tilde{{{{\mathcal{W}}}}})-2{{{\rm{Re}}}}{{{\rm{Tr}}}}(\tilde{H}\tilde{{{{\mathcal{W}}}}}),$$

where we used that \(\tilde{H}\) is a Hermitian operator. Provided that the Hamiltonian is expanded as \(\tilde{H}={\sum }_{\ell }{c}_{\ell }{P}_{\ell }\) with n-qubit Pauli operators \(\{{P}_{\ell }\}\), the error ϵBE is obtained from F(θ) by

$${({\epsilon }_{{{{\rm{BE}}}}})}^{2}=F({{{\boldsymbol{\theta }}}})+{{{\rm{Tr}}}}({\tilde{H}}^{2})=F({{{\boldsymbol{\theta }}}})+{2}^{n}\mathop{\sum}\limits_{\ell }{c}_{\ell }^{2}.$$
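As a minimal numerical sanity check (not part of the original protocol), the relation between ϵBE and F(θ) can be verified with NumPy on a small example. The Hamiltonian coefficients, the register sizes and the random unitary standing in for \({{{\mathcal{W}}}}({{{\boldsymbol{\theta }}}})\) are hypothetical choices, and we assume that the ancillary register occupies the most significant qubits, so that \(\tilde{{{{\mathcal{W}}}}}\) is the leading \(2^n\times 2^n\) block. Expanding the squared Frobenius norm gives \(\epsilon_{\rm BE}^2 = F + {\rm Tr}(\tilde{H}^2)\), which the sketch confirms.

```python
import numpy as np

# Single-qubit operators.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def top_left_block(W, n):
    """(<0^a| x I) W (|0^a> x I), assuming the ancillas are the leading qubits."""
    dim = 2 ** n
    return W[:dim, :dim]

def cost_F(W_tilde, H):
    """F = Tr(W~^dag W~) - 2 Re Tr(H W~), as in Eq. (29)."""
    first = np.trace(W_tilde.conj().T @ W_tilde).real
    second = 2.0 * np.trace(H @ W_tilde).real
    return first - second

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(g)
    return q

# Hypothetical example: a = 1 ancilla, n = 2 system qubits.
a, n = 1, 2
coeffs = [0.3, -0.2, 0.1]
paulis = [np.kron(Z, Z), np.kron(X, I2), np.kron(I2, X)]
H = sum(c * P for c, P in zip(coeffs, paulis))

rng = np.random.default_rng(0)
W = random_unitary(2 ** (a + n), rng)   # stand-in for W(theta)
W_tilde = top_left_block(W, n)

eps_sq = np.linalg.norm(W_tilde - H, "fro") ** 2
F = cost_F(W_tilde, H)
tr_H2 = 2 ** n * sum(c ** 2 for c in coeffs)

print(eps_sq, F + tr_H2)
```

Because the offset \({\rm Tr}(\tilde{H}^2)\) does not depend on θ, minimizing F and minimizing ϵBE select the same parameters.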

We consider a particular structure for the parameterized quantum circuit which satisfies the reflection condition \({{{\mathcal{W}}}}{({{{\boldsymbol{\theta }}}})}^{2}={I}^{\otimes (a+n)}\). This condition is not crucial to the construction of QSP. However, we empirically found that the constraint makes the optimization of the block encoding easier. One ansatz satisfying the reflection condition is shown in Fig. 5 and given by

$${{{\mathcal{W}}}}({{{\boldsymbol{\theta }}}})=V({{{\boldsymbol{\theta }}}})\,\overline{CZ}\,V{({{{\boldsymbol{\theta }}}})}^{{\dagger} },$$

where V(θ) is a unitary operator specified by the right circuit of Fig. 5, and \(\overline{CZ}\) stands for the sequential application of controlled-Z gates that is shown in the middle of the left circuit.
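The reflection property of this ansatz does not depend on the details of V(θ): any product of controlled-Z gates is a diagonal matrix with entries ±1, so conjugating it by a unitary yields a Hermitian operator that squares to the identity. The following sketch illustrates this with a random unitary in place of V(θ) and with controlled-Z gates on neighbouring qubit pairs; both are assumptions made for illustration, since the exact gate layout is specified only in Fig. 5.

```python
import numpy as np

def cz_chain(num_qubits):
    """Product of CZ gates on neighbouring pairs (q, q+1); a diagonal +/-1 matrix.

    The nearest-neighbour layout is an assumption for illustration only.
    """
    dim = 2 ** num_qubits
    diag = np.ones(dim)
    for idx in range(dim):
        bits = [(idx >> (num_qubits - 1 - q)) & 1 for q in range(num_qubits)]
        for q in range(num_qubits - 1):
            if bits[q] and bits[q + 1]:
                diag[idx] *= -1.0
    return np.diag(diag).astype(complex)

def random_unitary(dim, rng):
    """Random unitary standing in for the variational sub-circuit V(theta)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(g)
    return q

num_qubits = 5                      # a + n = 2 + 3, as in the five-qubit experiment
dim = 2 ** num_qubits
rng = np.random.default_rng(0)

V = random_unitary(dim, rng)
W = V @ cz_chain(num_qubits) @ V.conj().T

print(np.allclose(W @ W, np.eye(dim)))   # reflection condition W^2 = I
print(np.allclose(W, W.conj().T))        # W is also Hermitian
```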

Fig. 5: Quantum circuit diagrams for compressed block-encoding by variational optimization.

(Left) An example of (a + n)-qubit parameterized quantum circuit \({{{\mathcal{W}}}}({{{\boldsymbol{\theta }}}})\) satisfying the qubitization condition. (Right) An example of sub-circuit V(θ). The circuit inside the dashed box is repeated L times with new variational parameters added for each layer. The single- and two-qubit gates used in the circuit are \(R({{{\boldsymbol{\theta }}}})=\exp (-{{{\rm{i}}}}{\theta }^{(3)}X/2)\exp (-{{{\rm{i}}}}{\theta }^{(2)}Z/2)\exp (-{{{\rm{i}}}}{\theta }^{(1)}X/2)\) and \({R}_{ZZ}(\theta )=\exp (-{{{\rm{i}}}}\theta Z\otimes Z/2)\). In our five-qubit experiment, we use the bottom n( = 3) qubits as the system register and the top a( = 2) qubits as the ancillary register.

The parameterized quantum circuit \({{{\mathcal{W}}}}({{{\boldsymbol{\theta }}}})\) shown in Fig. 5 is composed of the following gates:

$$\begin{array}{ll}{R}_{X}(\theta )=\exp (-{{{\rm{i}}}}\theta X/2),\\ {R}_{Z}(\theta )=\exp (-{{{\rm{i}}}}\theta Z/2),\\ {R}_{ZZ}(\theta)=\exp (-{{{\rm{i}}}}\theta Z\otimes Z/2),\end{array}$$

where each gate has an independent variational parameter θ. Importantly, these gates are part of the native gate set of the Quantinuum H1-1 quantum computer.

In the present work, the optimization of the block-encoding circuit is performed by minimizing the cost function given in Eq. (29) using a classical state-vector simulation and the quasi-Newton BFGS method75. The optimization is stopped when the gradient norm of the cost function falls below the threshold value \(1\times {10}^{-5}\). The accuracies of the optimized block-encoding circuits for the 3-site and 4-site Ising spin Hamiltonians are shown in Fig. 6. In the experiment on the 3-site Ising spin chain, we use the circuit with a = 2 and L = 3, which requires (a + n − 1)(2L + 1) = 28 RZZ gates. The optimized circuit has block-encoding error ϵBE = \(1.8\times {10}^{-2}\).

Fig. 6: Error ϵBE of the block encoding circuit as a function of the number of layers L and for each number of ancillary qubits a.

We use the Ising spin Hamiltonian with hi/J = − 1.05 for all i and m/J = 0.5. The system size n is three in a and four in b.

We briefly discuss a classical method based on tensor network techniques. By expressing the cost function [Eq. (29)] as a tensor network contraction and using a classical optimizer to find the parameters θ, a block-encoding circuit \({{{\mathcal{W}}}}({{{\boldsymbol{\theta }}}})\) which minimizes ϵBE can be found. The terms in the cost function Eq. (29), \({{{\rm{Tr}}}}({\tilde{{{{\mathcal{W}}}}}}^{{\dagger} }\tilde{{{{\mathcal{W}}}}})\) and \({{{\rm{Tr}}}}(\tilde{H}\tilde{{{{\mathcal{W}}}}})\), can be evaluated using tensor network contractions as illustrated in Fig. 7.

Fig. 7: Tensor network contractions for the evaluation of the cost function.

a Contraction of \({{{\rm{Tr}}}}({\tilde{{{{\mathcal{W}}}}}}^{{\dagger} }\tilde{{{{\mathcal{W}}}}})\) for \(\tilde{{{{\mathcal{W}}}}}\) of Fig. 5a. b Contraction of \({{{\rm{Tr}}}}({\tilde{{{{\mathcal{W}}}}}}^{{\dagger} }\tilde{H})\) for \(\tilde{{{{\mathcal{W}}}}}\) of Fig. 5a and \(\tilde{H}\) represented by a matrix product operator. Note that the terms in the gradient [Eq. (33)] and the Hessian [Eq. (35)] can be evaluated using similar tensor network contractions.

The cost function in Eq. (29) can be variationally optimized using a classical optimizer. For instance, we can employ a gradient-based method as follows. At each iteration i, we require the gradient vector \({{{{\boldsymbol{{{{\mathcal{G}}}}}}}}}^{(i)}\) of the objective function F(θ) at θ = θ(i):

$${{{{\mathcal{G}}}}}_{k}^{(i)}=\frac{\partial F}{\partial {\theta }_{k}}=2{{{\rm{Re}}}}\left[{{{\rm{Tr}}}}\left({\tilde{{{{\mathcal{W}}}}}}^{{\dagger} }\frac{\partial \tilde{{{{\mathcal{W}}}}}}{\partial {\theta }_{k}}\right)\right]-2{{{\rm{Re}}}}\left[{{{\rm{Tr}}}}\left(\tilde{H}\frac{\partial \tilde{{{{\mathcal{W}}}}}}{\partial {\theta }_{k}}\right)\right].$$

The partial derivatives in the gradient are straightforward to compute from the first derivatives of the variational gates given in Eq. (32). We then iterate

$${{{{\boldsymbol{\theta }}}}}^{(i+1)}={{{{\boldsymbol{\theta }}}}}^{(i)}-\gamma \,{{{{\boldsymbol{{{{\mathcal{G}}}}}}}}}^{(i)},$$

with some learning parameter γ > 0 to update the parameters. The iteration is repeated until the norm of the vector of gradients falls below a predefined convergence threshold.
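The update rule above can be sketched on a toy two-qubit instance (a = 1, n = 1). This is only an illustration under stated assumptions: the one-layer layout of V(θ), the target \(\tilde{H}=0.6X\), the learning parameter γ = 0.02 and the iteration cap are hypothetical, and the gradient is approximated by central finite differences rather than by the analytic expression given above.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CZ = np.diag([1, 1, 1, -1]).astype(complex)

def rot(pauli, theta):
    """exp(-i theta P / 2) for a matrix P with P^2 = I."""
    dim = pauli.shape[0]
    return np.cos(theta / 2) * np.eye(dim) - 1j * np.sin(theta / 2) * pauli

def general_rotation(t):
    """R(theta) = R_X(t[2]) R_Z(t[1]) R_X(t[0])."""
    return rot(X, t[2]) @ rot(Z, t[1]) @ rot(X, t[0])

def ansatz(theta):
    """W = V CZ V^dag with a hypothetical one-layer V (7 parameters)."""
    singles = np.kron(general_rotation(theta[0:3]), general_rotation(theta[3:6]))
    V = rot(np.kron(Z, Z), theta[6]) @ singles
    return V @ CZ @ V.conj().T

def cost(theta, H):
    """F(theta) of Eq. (29); the ancilla is the first qubit."""
    W_tilde = ansatz(theta)[:2, :2]
    first = np.trace(W_tilde.conj().T @ W_tilde).real
    return first - 2.0 * np.trace(H @ W_tilde).real

def num_grad(theta, H, h=1e-6):
    """Central finite-difference gradient (a stand-in for the analytic one)."""
    g = np.zeros_like(theta)
    for k in range(len(theta)):
        e = np.zeros_like(theta)
        e[k] = h
        g[k] = (cost(theta + e, H) - cost(theta - e, H)) / (2 * h)
    return g

def gradient_descent(theta0, H, gamma=0.02, tol=1e-5, max_iter=500):
    """Iterate theta <- theta - gamma * G until the gradient norm is below tol."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iter):
        g = num_grad(theta, H)
        if np.linalg.norm(g) < tol:
            break
        theta = theta - gamma * g
    return theta

rng = np.random.default_rng(1)
theta0 = rng.uniform(-np.pi, np.pi, size=7)
H_target = 0.6 * X

F_initial = cost(theta0, H_target)
theta_opt = gradient_descent(theta0, H_target)
F_final = cost(theta_opt, H_target)
print(F_initial, F_final)
```

With a fixed step size the method may stop at the iteration cap before the gradient threshold is reached; the sketch is meant only to show the structure of the iteration.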

One could improve the convergence rate by additionally computing the Hessian matrix \({{{{\mathcal{H}}}}}^{(i)}\) at the cost of more evaluations of operator expectation values:

$${{{{\mathcal{H}}}}}_{j,k}^{(i)}=\frac{{\partial }^{2}F}{\partial {\theta }_{j}\partial {\theta }_{k}}=2{{{\rm{Re}}}}\left[{{{\rm{Tr}}}}\left({\tilde{{{{\mathcal{W}}}}}}^{{\dagger} }\frac{{\partial }^{2}\tilde{{{{\mathcal{W}}}}}}{\partial {\theta }_{j}\partial {\theta }_{k}}\right)\right]+2{{{\rm{Re}}}}\left[{{{\rm{Tr}}}}\left(\frac{\partial {\tilde{{{{\mathcal{W}}}}}}^{{\dagger} }}{\partial {\theta }_{j}}\frac{\partial \tilde{{{{\mathcal{W}}}}}}{\partial {\theta }_{k}}\right)\right]-2{{{\rm{Re}}}}\left[{{{\rm{Tr}}}}\left(\tilde{H}\frac{{\partial }^{2}\tilde{{{{\mathcal{W}}}}}}{\partial {\theta }_{j}\partial {\theta }_{k}}\right)\right].$$

Then, the parameter update in Eq. (34) is replaced with,

$${{{{\boldsymbol{\theta }}}}}^{(i+1)}={{{{\boldsymbol{\theta }}}}}^{(i)}-{\left({{{{\mathcal{H}}}}}^{(i)}\right)}^{-1}{{{{\boldsymbol{{{{\mathcal{G}}}}}}}}}^{(i)}.$$

To invert the Hessian matrix, we use the fact that it is Hermitian. Since our goal is to minimize the objective function in Eq. (29), we are only interested in its positive eigenvalues.

We therefore compute a pseudo-inverse via the eigendecomposition of the Hessian matrix: every eigenvalue μk ≥ ϵ is replaced by 1/μk in the diagonal matrix of the eigendecomposition, while all eigenvalues below a small cutoff ϵ (e.g., ϵ = \(1\times {10}^{-5}\)), including the negative ones, are set to zero.
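A minimal sketch of this regularized Newton step is given below. The function name `newton_step` and the explicit symmetrization of the input are our own illustrative choices; the Hessian and gradient are assumed to be supplied as NumPy arrays.

```python
import numpy as np

def newton_step(theta, grad, hess, cutoff=1e-5):
    """One update theta - pinv(H) @ G, keeping only eigenvalues mu_k >= cutoff."""
    hess = 0.5 * (hess + hess.conj().T)      # enforce Hermiticity numerically
    mu, vecs = np.linalg.eigh(hess)          # eigendecomposition H = U diag(mu) U^dag
    inv_mu = np.zeros_like(mu)
    keep = mu >= cutoff
    inv_mu[keep] = 1.0 / mu[keep]            # discarded eigenvalues contribute zero
    pinv = (vecs * inv_mu) @ vecs.conj().T   # U diag(inv_mu) U^dag
    return theta - (pinv @ grad).real

# Hypothetical example: one positive, one negative and one tiny eigenvalue.
hess = np.diag([2.0, -1.0, 1e-8])
grad = np.array([2.0, 3.0, 4.0])
theta = np.zeros(3)

theta_new = newton_step(theta, grad, hess)
print(theta_new)   # only the first component is updated
```

Directions associated with negative or near-zero curvature are left untouched by the step, which avoids moving towards a maximum or taking an uncontrolled large step along a flat direction.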

Compressed block-encoding by multiplexor compilation

As an alternative approach to compressing a block-encoding circuit, we employ the linear-combination-of-unitaries (LCU) method34 with the help of an efficient compilation of multi-controlled unitary gates (multiplexors). LCU provides a way to block encode \(\tilde{H}\) when it is expressed as a weighted sum of unitary operators, \({\{{P}_{\ell }\}}_{\ell = 1}^{K}\), \(\tilde{H}=\mathop{\sum }\nolimits_{\ell}{c}_{\ell }{P}_{\ell }\). The LCU consists of two unitary operators:

  1. an operator A acting on the ancillary register of \(a=\lceil {\log }_{2}K\rceil\) qubits such that \(A\left\vert {0}^{a}\right\rangle =\frac{1}{\sqrt{c}}{\sum }_{\ell}\sqrt{| {c}_{\ell }| }\left\vert \ell \right\rangle\) with \(c={\sum }_{\ell }| {c}_{\ell }|\); and

  2. a controlled operator \(B=\mathop{\sum }\nolimits_{\ell}{{{\rm{sign}}}}({c}_{\ell })\left\vert \ell \right\rangle \left\langle \ell \right\vert \otimes {P}_{\ell }\) with the sign function, \({{{\rm{sign}}}}({c}_{\ell })=+1\,(-1)\) for \({c}_{\ell }\ge 0\,({c}_{\ell } < 0)\).

With these,

$${{{\mathcal{W}}}}={A}^{{\dagger} }BA$$

gives an exact block encoding of \(\tilde{H}\), i.e., ϵBE = 0.
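The LCU construction can be checked numerically on a small example. In the sketch below, the operator A is realized by a Householder reflection whose first column is the desired amplitude vector, which is one convenient choice and not necessarily the circuit used in the experiment; unused ancillary basis states are paired with the identity. The Hamiltonian is a hypothetical two-qubit example normalized so that \(\sum_\ell |c_\ell| = 1\). In general the top-left block of \({{{\mathcal{W}}}}\) equals \(\tilde{H}/c\) with \(c=\sum_\ell |c_\ell|\).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def householder_prep(v):
    """Real orthogonal matrix whose first column is the unit vector v."""
    v = np.asarray(v, dtype=float)
    e0 = np.zeros_like(v)
    e0[0] = 1.0
    u = v - e0
    if np.linalg.norm(u) < 1e-12:
        return np.eye(len(v))
    return np.eye(len(v)) - 2.0 * np.outer(u, u) / (u @ u)

def lcu_block_encoding(coeffs, paulis):
    """Return (W, c, a) with W = A^dag B A and ancillas as the leading qubits."""
    K = len(coeffs)
    a = max(1, int(np.ceil(np.log2(K))))
    dim_anc, dim_sys = 2 ** a, paulis[0].shape[0]
    weights = np.zeros(dim_anc)
    weights[:K] = np.abs(coeffs)
    c = weights.sum()
    A = householder_prep(np.sqrt(weights / c))
    B = np.zeros((dim_anc * dim_sys, dim_anc * dim_sys), dtype=complex)
    for l in range(dim_anc):
        proj = np.zeros((dim_anc, dim_anc))
        proj[l, l] = 1.0
        if l < K:
            sign = 1.0 if coeffs[l] >= 0 else -1.0
            op = sign * paulis[l]
        else:
            op = np.eye(dim_sys)          # padding for unused ancilla states
        B += np.kron(proj, op)
    A_full = np.kron(A, np.eye(dim_sys))
    return A_full.conj().T @ B @ A_full, c, a

# Hypothetical two-qubit Hamiltonian with sum of |c_l| equal to 1.
coeffs = [0.5, -0.3, -0.2]
paulis = [np.kron(Z, Z), np.kron(X, I2), np.kron(I2, X)]
H = sum(cl * P for cl, P in zip(coeffs, paulis))

W, c, a = lcu_block_encoding(coeffs, paulis)
print(np.allclose(W[:4, :4], H / c))
```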

The bottleneck of this construction is the implementation of B, which contains a sequential application of multi-controlled-\({P}_{\ell }\) gates. We use the multiplexor compilation technique developed in ref. 49, which builds on refs. 76,77, to reduce the gate complexity without introducing extra ancillary qubits. In the block-encoding of \(\tilde{H}\), we use \(A={{{\rm{Had}}}}^{\otimes 3}\), where Had is the Hadamard gate, and apply the multiplexor compilation to B shown in the right panel of Fig. 8. This results in 44 RZZ gates for the block-encoding circuit \({{{\mathcal{W}}}}\), a significant reduction relative to the circuit obtained without the compilation, which uses 125 RZZ gates.

Fig. 8: Quantum circuit diagrams for compressed block-encoding by multiplexor compilation.

(Left) Structure of the LCU-based block encoding \({{{\mathcal{W}}}}\) given by Eq. (37). The top three and bottom four qubits represent the ancillary and system registers, respectively. (Right) The sub-circuit B used for block-encoding the n = 4 Ising spin Hamiltonian with h1/J = 1 and hi/J = m/J = 0 for i ≠ 1, before the multiplexor compilation is applied.

Heuristic estimation of the optimal degree

One key aspect of this work is the estimation of the optimal degree for the QSP polynomial given a certain noise rate. Our heuristic uses the upper bound ϵtotal on the infidelity between the noisy and target states under a simplified noise model. Here we discuss the noise model and provide further numerical results.

For our numerical study, we replace all the two-qubit gates, \({R}_{ZZ}(\theta )=\exp (-{{{\rm{i}}}}\theta Z\otimes Z/2)\) for \(\theta \in {\mathbb{R}}\), by two-qubit depolarizing channels:

$${R}_{ZZ}(\theta )\sigma {R}_{ZZ}{(\theta )}^{{\dagger} }\mapsto (1-{p}_{2}){R}_{ZZ}(\theta )\sigma {R}_{ZZ}{(\theta )}^{{\dagger} }+\frac{{p}_{2}}{15}\mathop{\sum}\limits_{P\in {\{I,X,Y,Z\}}^{\otimes 2}\backslash \{{I}^{\otimes 2}\}}P\sigma P,$$

where σ is some quantum state and we use the error parameter p2 = \(2.416\times {10}^{-3}\). This value is the two-qubit fault probability reported in the System Model H1 Emulator Product Data Sheet70. In particular, in the System Model H1-1 Emulator, the probability p2 is chosen such that the faulty RZZ(π/2), modelled by the following two-qubit depolarizing channel D(2) combined with the other noise channels, emulates the noise of the Quantinuum H1-1 quantum computer:

$$\begin{array}{rcl}{D}^{(2)}[\sigma ]&=&(1-{p}_{2}){R}_{ZZ}(\pi /2)\sigma {R}_{ZZ}{(\pi /2)}^{{\dagger} }+\frac{{p}_{2}}{15}\mathop{\sum}\limits_{P\in {\{I,X,Y,Z\}}^{\otimes 2}\backslash \{{I}^{\otimes 2}\}}P\sigma P\\ &=&\left(1-\frac{16{p}_{2}}{15}\right)\sigma +\frac{16{p}_{2}}{15}{{{{\rm{Tr}}}}}^{(2)}[\sigma ]\otimes \frac{{I}^{\otimes 2}}{4},\end{array}$$

where \({{{{\rm{Tr}}}}}^{(2)}\) indicates the partial trace over the two-qubit subsystem on which the channel D(2) acts. We remark that, in the H1-1 Emulator, the faulty RZZ(θ) is modelled by the channel D(2) with a θ-dependent fault probability p2(θ) (see ref. 70 for more details). In the present work, we simplify the noise model by using p2 = \(2.416\times {10}^{-3}\) for all the two-qubit gates, RZZ(θ), independent of the angle θ, as given by Eq. (38). To clarify the relation between this parameter and the error parameter pTQ used throughout our protocol (see Fig. 1), we note that the same channel D(2) is expressed as

$${D}^{(2)}[\sigma ]=\left(1-{p}_{{{{\rm{TQ}}}}}\right)\sigma +{p}_{{{{\rm{TQ}}}}}{{{{\rm{Tr}}}}}^{(2)}[\sigma ]\otimes \frac{{I}^{\otimes 2}}{4}.$$

Therefore, the new error parameter is identified as \({p}_{{{{\rm{TQ}}}}}=(16/15){p}_{2}=(16/15)\times 2.416\times {10}^{-3}=2.577\times {10}^{-3}\). This is the error parameter used in our infidelity bound.
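The equivalence of the two forms of the depolarizing channel, and the resulting value of pTQ, can be confirmed numerically. The sketch below applies both forms to a random two-qubit density matrix, which plays the role of the state after the ideal gate; the random state and the helper names are illustrative assumptions.

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS_1Q = [I2, X, Y, Z]

def pauli_form(rho, p2):
    """(1 - p2) rho + (p2 / 15) * sum over the 15 non-identity Paulis of P rho P."""
    out = (1.0 - p2) * rho
    for i, j in product(range(4), repeat=2):
        if i == 0 and j == 0:
            continue                     # skip the identity term
        P = np.kron(PAULIS_1Q[i], PAULIS_1Q[j])
        out = out + (p2 / 15.0) * (P @ rho @ P)
    return out

def mixing_form(rho, p_tq):
    """(1 - p_tq) rho + p_tq * Tr[rho] * I / 4."""
    return (1.0 - p_tq) * rho + p_tq * np.trace(rho) * np.eye(4) / 4.0

def random_density_matrix(dim, rng):
    """Random mixed state G G^dag / Tr(G G^dag)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho)

p2 = 2.416e-3
p_tq = 16.0 * p2 / 15.0

rng = np.random.default_rng(0)
rho = random_density_matrix(4, rng)
lhs = pauli_form(rho, p2)
rhs = mixing_form(rho, p_tq)

print(np.allclose(lhs, rhs), p_tq)
```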

To strengthen our argument, we verify the infidelity bound using exact density matrix emulations of noisy quantum circuits. We let the density matrix numerically obtained by the QSP protocol with the noise channel (38) be \({\eta }_{{{{\rm{sim}}}}}\). Figure 9a shows the infidelity bound, while Fig. 9b shows the exact infidelity. The locations of the minima in Fig. 9a, b are close to each other for each evolution time Jt. This observation supports the expectation that the degree d minimizing ϵtotal is likely to lead to the smallest possible error on noisy hardware. We emphasize that our heuristic does not require the use of a quantum computer beforehand. The optimal degree is found numerically using classical computation.

Fig. 9: Numerical verification of the infidelity bound used in this work.

a The upper bound of the infidelity between the target and simulated states. b The infidelity between the target state and simulated state with the noise model in Eq. (38). The locations of minima in a and b are close to each other for each time Jt.

Processing with depolarizing error mitigation

In our hardware experiment we employed state tomography to compute the entanglement entropies. To this end, we estimated the expectation value of a Pauli operator P on the system register by

$$\frac{{\langle \bar{P}\rangle }_{\eta }^{{{{\rm{mitig}}}}}}{{\langle \bar{I}\rangle }_{\eta }^{{{{\rm{mitig}}}}}}.$$

This is understood as taking the expectation of P with the normalized post-selected state. Given an initial quantum state \(\left\vert {\psi }_{0}\right\rangle\) on the system register, we wish to approximate the time-evolved state \({{{{\rm{e}}}}}^{-{{{\rm{i}}}}Ht}\left\vert {\psi }_{0}\right\rangle \left\langle {\psi }_{0}\right\vert {{{{\rm{e}}}}}^{{{{\rm{i}}}}Ht}\) by applying the QSP unitary

$$\sigma ={{{{\mathcal{U}}}}}_{{{{\rm{QSP}}}}}(\left\vert {0}^{a}\right\rangle \left\langle {0}^{a}\right\vert \otimes \left\vert {\psi }_{0}\right\rangle \left\langle {\psi }_{0}\right\vert ){{{{\mathcal{U}}}}}_{{{{\rm{QSP}}}}}^{{\dagger} },$$

followed by the post-selection. We simulate the protocol on the quantum hardware. Let η be the experimentally obtained state on the system and ancillary registers before any measurements, and let \(\tilde{\eta }\) be the state that is post-selected on the ancillary state \(\left\vert {0}^{a}\right\rangle\) and normalized,

$$\tilde{\eta }=\frac{(\left\langle {0}^{a}\right\vert \otimes {I}^{\otimes n})\eta (\left\vert {0}^{a}\right\rangle \otimes {I}^{\otimes n})}{{{{\rm{Tr}}}}[(\left\vert {0}^{a}\right\rangle \left\langle {0}^{a}\right\vert \otimes {I}^{\otimes n})\eta ]}.$$

Then, the expectation value of a Pauli operator P with respect to \(\tilde{\eta }\) is

$${\langle P\rangle }_{\tilde{\eta }}=\frac{{{{\rm{Tr}}}}[(\left\vert {0}^{a}\right\rangle \left\langle {0}^{a}\right\vert \otimes P)\eta ]}{{{{\rm{Tr}}}}[(\left\vert {0}^{a}\right\rangle \left\langle {0}^{a}\right\vert \otimes I)\eta ]}=\frac{{\langle \bar{P}\rangle }_{\eta }}{{\langle \bar{I}\rangle }_{\eta }}.$$

This can be estimated with nshots circuit executions with the variance

$${{{{\rm{Var}}}}}_{\tilde{\eta },P}={\langle P\rangle }_{\tilde{\eta }}^{2}\left(\frac{{{{{\rm{Var}}}}}_{\eta ,\bar{P}}}{{\langle \bar{P}\rangle }_{\eta }^{2}}+\frac{{{{{\rm{Var}}}}}_{\eta ,\bar{I}}}{{\langle \bar{I}\rangle }_{\eta }^{2}}\right),$$

where the variances inside the parenthesis are given by \({{{{\rm{Var}}}}}_{\eta ,\bar{P}}=({\langle \bar{I}\rangle }_{\eta }-{\langle \bar{P}\rangle }_{\eta }^{2})/({n}_{{{{\rm{shots}}}}}-1)\) and \({{{{\rm{Var}}}}}_{\eta ,\bar{I}}=({\langle \bar{I}\rangle }_{\eta }-{\langle \bar{I}\rangle }_{\eta }^{2})/({n}_{{{{\rm{shots}}}}}-1)\).

To mitigate noise effects, we model them by a depolarizing channel \({D}_{p}\) (ref. 55) applied to the entire system. Upon application of \({D}_{p}\), the state σ becomes

$${D}_{p}[\sigma ]=(1-p)\sigma +p\frac{{I}^{\otimes n+a}}{{2}^{n+a}},$$

where \(p=1-{(1-{p}_{{{{\rm{TQ}}}}})}^{{N}_{{{{\rm{TQ}}}}}}\) for a circuit with NTQ two-qubit gates, each of gate infidelity pTQ. With the state Dp[σ], the expectation values of \(\bar{P}\) (for a non-identity Pauli operator P) and \(\bar{I}\) take the forms

$${\langle \bar{P}\rangle }_{D[\sigma ]}=(1-p){\langle \bar{P}\rangle }_{\sigma },$$
$${\langle \bar{I}\rangle }_{D[\sigma ]}=(1-p){\langle \bar{I}\rangle }_{\sigma }+\frac{p}{{2}^{a}}.$$

Thus, inverting these equations leads to the expectation values without the depolarizing noise, \({\langle \bar{P}\rangle }_{\sigma }={\langle \bar{P}\rangle }_{D[\sigma ]}/(1-p)\) and \({\langle \bar{I}\rangle }_{\sigma }=({\langle \bar{I}\rangle }_{D[\sigma ]}-p/{2}^{a})/(1-p)\). Assuming that the dominant source of error in the experimentally obtained state η is depolarizing noise, we infer the noiseless expectation value as,

$${\langle P\rangle }_{\tilde{\eta }}^{{{{\rm{mitig}}}}}=\frac{{\langle \bar{P}\rangle }_{\eta }^{{{{\rm{mitig}}}}}}{{\langle \bar{I}\rangle }_{\eta }^{{{{\rm{mitig}}}}}}=\frac{{\langle \bar{P}\rangle }_{\eta }}{{\langle \bar{I}\rangle }_{\eta }-p/{2}^{a}}.$$

This is Eq. (41) and is understood as mitigating the depolarizing noise, at the cost of a larger variance,

$${{{{\rm{Var}}}}}_{\tilde{\eta },P}^{{{{\rm{mitig}}}}}={\langle P\rangle }_{\tilde{\eta }}^{2}\left(\frac{{{{{\rm{Var}}}}}_{\eta ,\bar{P}}}{{\langle \bar{P}\rangle }_{\eta }^{2}}+\frac{{{{{\rm{Var}}}}}_{\eta ,\bar{I}}}{{({\langle \bar{I}\rangle }_{\eta }-p/{2}^{a})}^{2}}\right).$$

Note that the quantity in the denominator of the second term evaluates to

$${\langle \bar{I}\rangle }_{\eta }-\frac{p}{{2}^{a}}\approx (1-p)+\frac{p}{{2}^{a}}-\frac{p}{{2}^{a}}={(1-{p}_{{{{\rm{TQ}}}}})}^{{N}_{{{{\rm{TQ}}}}}},$$

where the approximate equality is due to the QSP algorithmic error and other types of noise effects. This implies that the variance, and hence the required number of samples, increases exponentially in NTQ to achieve some fixed sampling error.
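The mitigation formula can be illustrated with a short synthetic example in plain Python. The "noiseless" expectation values below are made-up numbers, not experimental data; they are passed through the global depolarizing model and then recovered with the mitigated estimator. The function names are our own.

```python
def global_depolarizing_prob(p_tq, n_tq):
    """p = 1 - (1 - p_TQ)^N_TQ for N_TQ two-qubit gates of infidelity p_TQ."""
    return 1.0 - (1.0 - p_tq) ** n_tq

def apply_depolarizing(p_bar_sigma, i_bar_sigma, p, a):
    """Noisy expectation values of P-bar and I-bar under the channel D_p."""
    p_bar_eta = (1.0 - p) * p_bar_sigma
    i_bar_eta = (1.0 - p) * i_bar_sigma + p / 2 ** a
    return p_bar_eta, i_bar_eta

def mitigated_expectation(p_bar_eta, i_bar_eta, p, a):
    """Mitigated estimate <P-bar> / (<I-bar> - p / 2^a)."""
    return p_bar_eta / (i_bar_eta - p / 2 ** a)

# Parameters of the largest circuit in the first experiment.
p_tq, n_tq, a = 2.577e-3, 328, 2
p = global_depolarizing_prob(p_tq, n_tq)

# Hypothetical noiseless values, for illustration only.
p_bar_sigma, i_bar_sigma = 0.40, 0.95
p_bar_eta, i_bar_eta = apply_depolarizing(p_bar_sigma, i_bar_sigma, p, a)

unmitigated = p_bar_eta / i_bar_eta
mitigated = mitigated_expectation(p_bar_eta, i_bar_eta, p, a)
print(p, unmitigated, mitigated)
```

Under this model the mitigated estimate reproduces the noiseless ratio exactly, while the denominator shrinks by the factor 1 − p, which is the origin of the increased variance discussed above.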