Introduction

Faster algorithms for linear algebra are a major promise of quantum computation, holding the potential for substantial runtime speed-ups over classical methods. A modern, unified framework for such algorithms is given by the quantum signal processing (QSP)1,2 and, more generally, quantum singular-value transformation (QSVT)3 formalisms. These are powerful techniques to manipulate a matrix, coherently given by a quantum oracle, via polynomial transformations on its eigenvalues and singular values, respectively. The class of matrix arithmetic attained is remarkably broad, encompassing primitives as diverse as Hamiltonian simulation, matrix inversion, ground-state energy estimation, and Gibbs-state sampling, among others4. Moreover, the framework often offers the state-of-the-art in asymptotic query complexities (i.e. number of oracle calls), in some cases matching known complexity lower bounds. Nevertheless, the experimental requirements for full implementations are prohibitive for current devices, and it is not clear if the framework will be useful in practice before large-scale fault-tolerant quantum computers appear.

This has triggered a quest for early fault-tolerant algorithms for matrix processing that allow one to trade performance for nearer-term feasibility in a controlled way, i.e. with provable runtime guarantees5,6,7,8,9,10,11,12,13,14. Particularly promising are randomized hybrid quantum-classical schemes to statistically simulate a matrix function via quantum implementations of more elementary ones7,8,9,10,11,12. For instance, this has been applied to the Heaviside step function θ(H) of a Hamiltonian H, which allows for eigenvalue thresholding, a practical technique for Heisenberg-limited spectral analysis7. Two input access models have been considered there: quantum oracles as a controlled unitary evolution of H7,8,13 and classical ones given by a decomposition of H as a linear combination of Pauli operators9,10,11,12. In the former, one Monte-Carlo simulates the Fourier series of θ(H) by randomly sampling its harmonics. In the latter—in an additional level of randomization—one also probabilistically samples the Pauli terms from the linear combination.

Curiously, however, randomized quantum algorithms for matrix processing have been little explored beyond the specific case of the Heaviside function. Reference 11 put forward a randomized, qubit-efficient technique for Fourier-based QSP6,13 for generic functions. However, the additional level of randomization can detrimentally affect the circuit depth per run compared to coherent oracles. On the other hand, in the quantum-oracle setting, the randomized algorithms above have focused mainly on controlled unitary evolution as the input access model. This is convenient in specific cases where H can be analogically implemented. However, it leaves aside the powerful class of block-encoding oracles, i.e., unitary matrices with the input matrix as one of their blocks2. Besides having a broader scope of applicability (including non-Hermitian matrices), such oracle types are also a more natural choice for digital setups. Moreover, randomized quantum algorithms have so far not addressed Chebyshev polynomials, the quintessential basis functions for approximation theory15, which often attain better accuracy than Fourier series16. Chebyshev polynomials, together with block-encoding oracles, provide the most sophisticated and general arena for quantum matrix arithmetic1,2,3,4.

Here, we fill this gap. We derive a semi-quantum algorithm for Monte-Carlo simulations of QSVT with provably better circuit complexities than fully-quantum schemes as well as advantages in terms of experimental feasibility. Our method estimates state amplitudes and expectation values involving a generic matrix function f(A), leveraging three main ingredients: (i) it samples each component of a Chebyshev series for f with a probability proportional to its coefficient in the series; (ii) it assumes coherent access to A via a block-encoding oracle; and (iii) f(A) is automatically extracted from its block-encoding without post-selection, using a Hadamard test. The combination of (i) and (ii) leaves untouched the maximal query complexity k per run native to the Chebyshev expansion. In addition, the statistical overhead we pay for end-user estimations scales only with the l1-norm of the Chebyshev coefficients. For the use cases we consider, this turns out to be similar (at worst up to logarithmic factors) to the operator norm of f(A), which would govern the statistical overhead if we used fully-quantum QSVT with a Hadamard test. That is, our scheme does not incur any significant degradation with respect to the fully-quantum case either in runtime or circuit depth. On the contrary, the average query complexity can be significantly smaller than k. We prove interesting speed-ups of the former over the latter for practical use cases.

These speed-ups translate directly into equivalent reductions in noise sensitivity: For simple models such as depolarization or coherent errors in the quantum oracle, we show that the estimation inaccuracy due to noise scales with the average query depth. In comparison, it scales with the maximal depth in standard QSVT implementations. Importantly, we implement each sampled Chebyshev polynomial with a simple sequence of queries to the oracle using qubitization; no QSP pulses are required throughout. Finally, (iii) circumvents the need for repeating until success or quantum amplitude amplification. That is, no statistical run is wasted, and no overhead in circuit depth is incurred. In addition, the fully quantum scheme requires an extra ancillary qubit controlling the oracle in order to implement the QSP pulses. All this renders our hybrid approach more experimentally friendly than coherent QSVT.

As use cases, we benchmark our framework on four end-user applications: partition-function estimation of classical Hamiltonians via quantum Markov-chain Monte Carlo (MCMC); partition-function estimation of quantum Hamiltonians via quantum imaginary-time evolution (QITE); linear system solvers (LSSs); and ground-state energy estimation (GSEE). The maximal and expected query depths per run, as well as the total expected runtime (taking into account sample complexity), are displayed in Table 1. In all cases, we systematically obtain advantages (both per run and in total) of expected over maximal query complexities, as follows.

Table 1 Complexities of our algorithms for end-user applications

For MCMC, we prove a quadratic speed-up on a factor \({\mathcal{O}}(\log ({Z}_{\beta }\,{e}^{\beta }/{\epsilon }_{{\rm{r}}}))\), where Zβ is the partition function to estimate, at inverse temperature β, and ϵr is the tolerated relative error. For QITE, we remove a factor \({\mathcal{O}}(\log (D\,{e}^{\beta }/{Z}_{\beta }\,{\epsilon }_{{\rm{r}}}))\) from the scaling, where D is the system dimension. For LSSs we consider two sub-cases: estimation of an entry of the (normalized) solution vector and of the expectation value of an observable O on it. We prove quadratic speed-ups on factors \({\mathcal{O}}\left(\log (\kappa /\epsilon )\right)\) and \({\mathcal{O}}\left(\log ({\kappa }^{2}\,\parallel O\parallel /\epsilon )\right)\) for the first and second sub-cases, respectively, where \(\parallel O\parallel\) is the operator norm of O, κ is the condition number of the matrix, and ϵ the tolerated additive error. This places our query depth at an intermediate position between that of the best known Chebyshev-based method17 and the optimal one in general18. In turn, compared to the results obtained in11 via full randomization, our scaling is better by one power of κ. Finally, for GSEE, we prove a speed-up on a factor that depends on the overlap η between the probe state and the ground state: the average query depth is \({\mathcal{O}}\left(\frac{1}{\xi }\sqrt{\log (1/\eta )}/\log (1/\xi )\right)\), whereas the maximal query depth is \({\mathcal{O}}\left(\frac{1}{\xi }\log (1/\eta )\right)\), with ξ the additive error in the energy estimate.

Results

Framework

Our basic setup is that of quantum singular value transformation (QSVT)3,4. This is a powerful technique for synthesizing polynomial functions of a matrix A embedded into a block of a unitary matrix UA (the block-encoding oracle), via polynomial transformations on its singular values. Combined with approximation theory19, this leads to state-of-the-art query complexities and an elegant unifying structure for a variety of quantum algorithms of interest. For simplicity of the presentation, in the main text, we focus explicitly on the case of Hermitian matrices. There, QSVT reduces to the simpler setup of quantum signal processing (QSP)1,2, describing eigenvalue transformations. The extension of our algorithms to QSVT for generic matrices is straightforward and is left to Supplementary Note 4. Throughout the paper, we adopt the short-hand notation [l] := {0, …, l − 1} for any \(l\in {\mathbb{N}}\).

We next state our main results. First, we set up explicitly the two problems in question and then proceed to describe our randomized semi-quantum algorithm to solve each one of them, proving correctness, runtime, and performing an error-robustness analysis. We conclude by applying our general framework to a number of exemplary use cases of interest.

Problem statement

We consider the following two concrete problems (throughout the paper, we will use superscripts (1) or (2) on quantities referring to Problems 1 or 2, respectively):

Problem 1

(Transformed vector amplitudes) Given access to state preparation unitaries Uϕ and Uψ such that \({U}_{\psi }\left\vert 0\right\rangle =\left\vert \psi \right\rangle ,{U}_{\phi }\left\vert 0\right\rangle =\left\vert \phi \right\rangle\), a Hermitian matrix A, and a real-valued function f, obtain an estimate of

$${z}^{(1)}=\langle \phi \vert\;f(A)\vert \psi \rangle$$
(1)

to additive precision ϵ with failure probability at most δ.

This class of problems is relevant for estimating the overlap between a linearly transformed state and another state of interest. This is the case, e.g., in linear system solving, where one is interested in the ith computational basis component of a quantum state of the form \({A}^{-1}\left\vert {\bf{b}}\right\rangle\) encoding the solution to the linear system (see the section “Quantum linear-system solvers” for details). The unitary Uϕ preparing the computational-basis state \(\left\vert i\right\rangle\), in that case, is remarkably simple, given by a sequence of bit flips.

Problem 2

(Transformed observable expectation values) Given access to a preparation of the state ϱ, a Hermitian matrix A, an observable O, and a real-valued function f, obtain an estimate of

$${z}^{(2)}={\rm{Tr}}[O\,\,f(A)\,\varrho \,f{(A)}^{\dagger }]$$
(2)

to additive precision ϵ with failure probability at most δ.

This is relevant, e.g., when A = H is a Hamiltonian, to estimate the partition function corresponding to H, as discussed below in the section “Relative-precision partition function estimation”.

We present randomized hybrid classical–quantum algorithms for these problems using Chebyshev-polynomial approximations of f and coherent access to a block-encoding of A. Similar problems have been addressed in11 but using Fourier approximations and randomizing also over a classical description of A in the Pauli basis.

Randomized semi-quantum matrix processing

Our framework is based on the Chebyshev approximation \(\tilde{f}(x)=\mathop{\sum }\nolimits_{j = 0}^{k}{a}_{j}{{\mathcal{T}}}_{j}(x)\) of the function f and a modified Hadamard test involving the qubitized block-encoding oracle UA. We denote by \({\bf{a}}:= \left\{{a}_{0},\ldots ,{a}_{k}\right\}\) the vector of Chebyshev coefficients of \(\tilde{f}\) and by \({\left\Vert {\bf{a}}\right\Vert }_{1}:= \mathop{\sum }\nolimits_{j = 0}^{k}| {a}_{j}|\) its 1-norm. The idea is to statistically simulate the coherent QSP algorithm using a hybrid classical/quantum procedure based on randomly choosing \(j\in [k+1]\) according to its importance for \(\tilde{f}\) and then running a Hadamard test involving the block encoding \({U}_{A}^{j}\) of \({{\mathcal{T}}}_{j}(A)\). Pseudo-codes for the algorithms are presented in Fig. 1a and b for Problems 1 and 2, respectively. In both cases, the Hadamard test is the only quantum sub-routine. The total number of statistical runs will be \(\frac{2}{P}\,{S}^{(P)}\), with P = 1 or 2, where S(P) will be given in Eqs. (8) and (9) below. The factor \(\frac{2}{P}\) is a subtle difference between Algorithms 1 and 2 coming from the fact that the target quantity is a complex-valued amplitude in the former case, while in the latter it is a real number. This implies that two different types of Hadamard tests (each with S(1) shots) are needed to estimate the real and imaginary parts of z(1), while z(2) requires a single one. More technically, the procedure goes as follows. First, for every run \(\alpha \in [\frac{2}{P}\,{S}^{(P)}]\), perform the following two steps:

  1. Classical subroutine: sample a Chebyshev polynomial degree \({j}_{\alpha }\in [k+1]\) (and also lα for P = 2) from the probability distribution weighted by the coefficients a of \(\tilde{f}\), defined by

    $$p(\;j)=\frac{| {a}_{j}| }{{\left\Vert {\bf{a}}\right\Vert }_{1}},\quad \,\text{for all}\,\,j\in [k+1]\,.$$
    (3)

    This has classical runtime \(\tilde{{\mathcal{O}}}(k)\).

  2. Quantum subroutine: if P = 1, run the Hadamard test in Fig. 1a with \({B}_{\alpha }={\mathbb{1}}\) for α < S(1) or \({B}_{\alpha }={S}^{\dagger }:= \left\vert 0\right\rangle \left\langle 0\right\vert -i\left\vert 1\right\rangle \left\langle 1\right\vert\) for α ≥ S(1) and use the resulting random bit \({b}_{\alpha }^{(1)}\in \{-1,1\}\) to record a sample of the variable

    $${\tilde{z}}_{\alpha }^{(1)}:= {\left\Vert {\bf{a}}\right\Vert }_{1}\,{\rm{sgn}}({a}_{{j}_{\alpha }})\,{b}_{\alpha }^{(1)}\,.$$
    (4)

    If P = 2, in turn, run the test in Fig. 1b to get as outcomes a random bit \({b}_{\alpha }^{(2)}\in \{-1,1\}\) and a random number \({\omega }_{\alpha }\in {\{{o}_{m}\}}_{m\in [D]}\) where om is the mth eigenvalue of O, and use this to record a sample of

    $${\tilde{z}}_{\alpha }^{(2)}:=\,\parallel\!{\bf{a}}{\parallel }_{1}^{2}\,{\rm{sgn}}({a}_{{j}_{\alpha }}){\rm{sgn}}({a}_{{l}_{\alpha }})\,{b}_{\alpha }^{(2)}\,{\omega }_{\alpha }.$$
    (5)
Fig. 1: Summary of the main algorithms.

Alg. 1 in panel a solves Problem 1, whereas Alg. 2 in panel b solves Problem 2. a and b The algorithms receive as inputs: (i) a qubitized block-encoding UA of A; (ii) the vector a of Chebyshev coefficients defining the polynomial approximation \(\tilde{f}\) to the target function f; (iii) state preparation unitaries Uϕ and Uψ (for Alg. 1), or the state ϱ and the observable O (for Alg. 2); (iv) the tolerated error ϵ and failure probability δ for the statistical estimation. The algorithm repeats a number \(\frac{2}{P}{S}^{(P)}\) of times two basic sampling steps. The first step is to classically sample a Chebyshev polynomial degree jα with probability \(p({j}_{\alpha })=| {a}_{{j}_{\alpha }}| /{\left\Vert {\bf{a}}\right\Vert }_{1}\). The second step—the only quantum subroutine—is a Hadamard test (including measurement of O, for Alg. 2) containing jα successive queries to the controlled version of UA (plus another sequence of lα queries but with the control negated and a different oracle-ancilla register for Alg. 2). Finally, the average over all the measurement outcomes gives the statistical estimate of the quantity of interest z(P), for P = 1 or 2. Interestingly, the Hadamard test automatically extracts the correct block of UA, which relaxes the need for post-selection on the oracle ancillae. Therefore, every experimental run contributes to the statistics (i.e., no measurement shot is wasted). c Histograms of the number of times (shots) each Chebyshev polynomial degree j is drawn out of 1000 samples, for the four use cases described in the text. The vertical lines show the maximal Chebyshev degree k (purple) and the average degree \({\mathbb{E}}[j]\) (red). Importantly, for this figure, we do not estimate k analytically using approximation theory. The values of k plotted are numerically obtained as the minimum degree of \(\tilde{f}\) such that the target error ν is attained.
The parameters used are: ν = 10−2 (all examples), β = 100 (exponential function), t = 200 (monomial), κ = 8 (inverse function), ξ = 20 (step function). In all cases, we observe a significant reduction in query complexity. This translates in practice into shallower circuits and, hence, less accumulated noise.

Then, in a final classical step, obtain the desired estimate \({\tilde{z}}^{(P)}\) by computing the empirical mean over all the recorded samples as follows:

$${\tilde{z}}^{(1)}=\frac{1}{{S}^{(1)}}\mathop{\sum }\limits_{\alpha =0}^{{S}^{(1)}-1}\left({\tilde{z}}_{\alpha }^{(1)}+i\,{\tilde{z}}_{\alpha +{S}^{(1)}}^{(1)}\right),$$
(6)
$${\tilde{z}}^{(2)}=\frac{1}{{S}^{(2)}}\mathop{\sum }\limits_{\alpha =0}^{{S}^{(2)}-1}{\tilde{z}}_{\alpha }^{(2)}\,.$$
(7)
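As an illustration of the two sampling steps and the final empirical mean, the following minimal Python sketch classically simulates Alg. 1 for the real part of z(1). Everything here is a toy stand-in: the matrix A, the states, and the Chebyshev coefficients a are hypothetical, and the Hadamard-test bit is drawn from its known expectation value rather than from an actual quantum circuit.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- toy problem instance (all values hypothetical) ---------------------
n = 4
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
lam = rng.uniform(-1, 1, n)          # spectrum in [-1, 1], as block-encoding requires
theta = np.arccos(lam)

def cheb_T(j):
    """T_j(A) for A = Q diag(lam) Q^dagger, via T_j(x) = cos(j arccos x)."""
    return Q @ np.diag(np.cos(j * theta)) @ Q.conj().T

a = np.array([0.5, 0.3, -0.15, 0.05])   # hypothetical Chebyshev coefficients of f~
l1 = np.abs(a).sum()
p = np.abs(a) / l1                      # importance distribution, Eq. (3)

psi = rng.normal(size=n) + 1j * rng.normal(size=n); psi /= np.linalg.norm(psi)
phi = rng.normal(size=n) + 1j * rng.normal(size=n); phi /= np.linalg.norm(phi)

# E[b] of the real-part Hadamard test (B = identity) for each degree j
exp_b = np.array([np.real(phi.conj() @ cheb_T(j) @ psi) for j in range(len(a))])

def estimate_re_z1(shots):
    """Monte-Carlo estimate of Re<phi|f~(A)|psi>, as in Eqs. (4) and (6)."""
    js = rng.choice(len(a), size=shots, p=p)                       # step (i)
    bs = np.where(rng.random(shots) < (1 + exp_b[js]) / 2, 1, -1)  # step (ii)
    return np.mean(l1 * np.sign(a[js]) * bs)

exact = np.real(phi.conj() @ sum(c * cheb_T(j) for j, c in enumerate(a)) @ psi)
print(abs(estimate_re_z1(40000) - exact))   # small statistical error
```

Averaging the bounded variable \({\Vert {\bf{a}}\Vert }_{1}\,{\rm{sgn}}({a}_{j})\,b\) over the importance-sampled degrees reproduces \({\rm{Re}}\langle \phi | \tilde{f}(A)| \psi \rangle\) without ever forming the full Chebyshev sum coherently.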

The following two theorems respectively prove the correctness of the estimator and establish the complexity of the algorithm. A simple but crucial auxiliary result for the correctness is the observation that the Hadamard test statistics (i.e. the expectation value of \({b}_{\alpha }^{(P)}\)) depend only on the correct block of \({U}_{A}^{j}\), removing the need for post-selection. With this, in Supplementary Note 1, we prove the following.

Theorem 1

(Correctness of the estimator) The empirical means \({\tilde{z}}^{(1)}\) and \({\tilde{z}}^{(2)}\) are unbiased estimators of \(\left\langle \phi \,\right\vert\,\tilde{f}(A)\left\vert \psi \right\rangle\) and \({\rm{Tr}}[O\,\tilde{f}(A)\,\varrho \,\tilde{f}{(A)}^{\dagger }]\), respectively.

Importantly, since \(\tilde{f}\) is a ν-approximation to f, the obtained \({\tilde{z}}^{(P)}\) are actually biased estimators of the ultimate quantities of interest z(P) in Eqs. (1) and (2). Such biases are always present in quantum algorithms based on approximate matrix functions, including the fully coherent schemes for QSP1,2 and QSVT3,4. Nevertheless, they can be made arbitrarily small in a tunable manner by increasing the truncation order k in the polynomial approximation \(\tilde{f}\).

Here, it is convenient to set k so that ν(P) ≤ ϵ/2, where \({\nu }^{(1)}:= \nu\) and \({\nu }^{(2)}:= \nu \,(2\,\Vert f(A)\,\Vert\,\Vert\,O\,\Vert +\nu )\). This limits the approximation error in Eq. (1) or (2) to at most ϵ/2. In addition, demanding the statistical error to be also ϵ/2 leads to (see Supplementary Note 1) the following end-to-end sample and oracle-query complexities for the algorithm.

Theorem 2

(Complexity of the estimation) Let ϵ > 0 and δ > 0 be, respectively, the tolerated additive error and failure probability; let a be the vector of coefficients in \(\tilde{f}\) and ν(P) ≤ ϵ/2 the error in z(P) from approximating f with \(\tilde{f}\). Then, if the number of samples is at least

$${S}^{(1)}=\frac{16{\left\Vert {\bf{a}}\right\Vert }_{1}^{2}}{{\epsilon }^{2}}\log \frac{4}{\delta },\quad{\text{for}}\,P=1,$$
(8)
$${S}^{(2)}=\frac{8\,{\left\Vert O\right\Vert }^{2}{\left\Vert {\bf{a}}\right\Vert }_{1}^{4}}{{\epsilon }^{2}}\log \frac{2}{\delta },\quad {\text{for}}\,P=2,$$
(9)

Equations (6) and (7) give an ϵ-precise estimate of z(P) with confidence 1−δ. Moreover, the total expected runtime is \({Q}^{(P)}:= 2\,{\mathbb{E}}[\;j]\,{S}^{(P)}\), where \({\mathbb{E}}[\;j]:= \mathop{\sum }\nolimits_{j = 0}^{k}j\,p(\;j)\).
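For concreteness, the shot counts of Theorem 2 and the resulting total expected runtime can be evaluated with a small helper (a sketch; the numerical inputs below are placeholders, and \({\mathbb{E}}[j]\) must be supplied from the Chebyshev-coefficient distribution at hand):

```python
import math

def shots(P, a_l1, eps, delta, O_norm=1.0):
    """Sample complexity S^(P) from Eqs. (8) and (9) of Theorem 2."""
    if P == 1:
        return math.ceil(16 * a_l1**2 / eps**2 * math.log(4 / delta))
    return math.ceil(8 * O_norm**2 * a_l1**4 / eps**2 * math.log(2 / delta))

def expected_runtime(E_j, S_P):
    """Total expected number of oracle queries, Q^(P) = 2 E[j] S^(P)."""
    return 2 * E_j * S_P

# Placeholder inputs: unit-l1 coefficient vector, 10% error, 1% failure prob.
S1 = shots(1, a_l1=1.0, eps=0.1, delta=0.01)
print(S1, expected_runtime(E_j=11.3, S_P=S1))   # E[j] is a placeholder value
```

Note that the quadratic (for P = 1) or quartic (for P = 2) dependence on \({\Vert {\bf{a}}\Vert }_{1}\) is what makes the l1-norm of the Chebyshev coefficients the relevant statistical overhead of the method.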

A remarkable consequence of this theorem is that the expected number of queries per statistical run is \(P\times {\mathbb{E}}[j]\). Instead, if we used standard QSVT (together with a similar Hadamard test to avoid post-selection), each statistical run would take P × k queries (and an extra ancillary qubit coherently controlling everything else would be required). As shown in Fig. 1c, \({\mathbb{E}}[j]\) can be significantly smaller than k in practice. In fact, for the use cases we analyze, we prove scaling advantages of \({\mathbb{E}}[j]\) over k. These query-complexity advantages translate directly into reductions in circuit depth and, hence, also in noise sensitivity (see next sub-section). As for sample complexity, the statistical overhead of our semi-quantum algorithms scales with \({\left\Vert {\bf{a}}\right\Vert }_{1}\), while that of fully-quantum ones would have a similar scaling with \(\left\Vert f(A)\right\Vert\), due to the required normalization for block encoding. Interestingly, in all the use cases analyzed, \({\left\Vert {\bf{a}}\right\Vert }_{1}\) and \(\left\Vert f(A)\right\Vert\) differ at most by a logarithmic factor. Finally, another appealing feature is that our approach relaxes the need to compute the QSP/QSVT angles, which is currently tackled with an extra classical pre-processing stage of runtime \({\mathcal{O}}\left(\,\text{poly}\,(k)\right)\)1,2,3,4.

We emphasize that here we have assumed Hermitian A for the sake of clarity, but a straightforward extension of our randomized scheme from QSP to QSVT (see Supplementary Note 4) gives the generalization to generic A. Moreover, in Supplementary Note 1, we also extend the construction to Chebyshev polynomials of the second kind. This is useful for the ground-state energy estimation in the section “Ground-state energy estimation”.

Intrinsic noise-sensitivity reduction

Here, we study how the reduction in query complexity per run from k to the average value \({\mathbb{E}}[j]\) translates into sensitivity to experimental noise. The aim is to make a quantitative but general comparison between our randomized semi-quantum approach and fully-quantum schemes, remaining agnostic to the specific choice of operator function, circuit compilation, or physical platform. To this end, we consider two toy error models that allow one to allocate one unit of noise per oracle query.

Our first error model consists of a faulty quantum oracle given by the ideal oracle followed by a globally depolarizing channel Λ of noise strength p, defined by20

$$\Lambda [\varrho ]:= (1-p)\,\varrho +p\,\frac{{\mathbb{1}}}{{D}_{{\rm{tot}}}}.$$
(10)

Here, ϱ is the joint state of the total Hilbert space in Fig. 1a (system register, oracle ancilla, and Hadamard test ancilla) and Dtot its dimension. In Supplementary Note 2, we prove the following.

Theorem 3

(Average noise sensitivity) Let \({\tilde{z}}^{(P)}\) be the ideal estimators (6) and (7) and \({\tilde{z}}^{(P),\Lambda }\) their noisy versions, with Λ acting after each oracle query in Fig. 1. Then

$$\left\vert {\mathbb{E}}\left[{\tilde{z}}^{(1)}\right]-{\mathbb{E}}\left[{\tilde{z}}^{(1),\Lambda }\right]\right\vert \le p\,{E}_{{\rm{sq}}}^{(1)}\le \,p\,{\left\Vert {\bf{a}}\right\Vert }_{1}\,{\mathbb{E}}[\;j]\,,$$
(11)
$$\left\vert {\mathbb{E}}\left[{\tilde{z}}^{(2)}\right]-{\mathbb{E}}\left[{\tilde{z}}^{(2),\Lambda }\right]\right\vert \le p\,{E}_{{\rm{sq}}}^{(2)}\le \,2\,p\,{\left\Vert {\bf{a}}\right\Vert }_{1}^{2}\,{\mathbb{E}}[\;j],$$
(12)

where

$${E}_{{\rm{sq}}}^{(1)}:= \left| \mathop{\sum }\limits_{j=0}^{k}j\,{a}_{j}\left\langle \phi \right| {{\mathcal{T}}}_{j}(A)\left\vert \psi \right\rangle \right|$$
(13)
$${E}_{{\rm{sq}}}^{(2)}:= \left| \mathop{\sum }\limits_{j,l=0}^{k}(j+l)\,{a}_{j}\,{a}_{l}{\rm{Tr}}\{O\,{{\mathcal{T}}}_{j}(A)\,\varrho \,{{\mathcal{T}}}_{l}(A)\}\right| .$$
(14)

Our second model is coherent errors that make the quantum oracle no longer the exact block encoding UA of A but only an ε-approximate block encoding (a unitary with operator-norm distance ε from UA). In Supplementary Note 2, we show that Eqs. (11) and (12) hold also there with p replaced by 2ε.
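Theorem 3 can be sanity-checked numerically: under the global depolarizing model, each of the j queries rescales the Hadamard-test signal by 1 − p, while the maximally mixed remainder contributes zero bias to the test outcome. The sketch below (toy matrix, states, and coefficients throughout) compares the resulting estimator bias against the bound of Eq. (11):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Hermitian A with spectrum in [-1, 1] (all values hypothetical).
n = 4
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
lam = rng.uniform(-1, 1, n)
theta = np.arccos(lam)

phi = rng.normal(size=n); phi /= np.linalg.norm(phi)
psi = rng.normal(size=n); psi /= np.linalg.norm(psi)

def T_expect(j):
    """<phi| T_j(A) |psi> for A = Q diag(lam) Q^T."""
    return phi @ Q @ np.diag(np.cos(j * theta)) @ Q.T @ psi

a = np.array([0.4, 0.3, 0.2, 0.1])   # toy Chebyshev coefficients
p_dep = 0.02                          # depolarizing strength per query

# Depolarization after each of the j queries damps the signal by (1-p)^j;
# the maximally mixed remainder gives E[b] = 0 in the Hadamard test.
ideal = sum(c * T_expect(j) for j, c in enumerate(a))
noisy = sum(c * (1 - p_dep) ** j * T_expect(j) for j, c in enumerate(a))

l1 = np.abs(a).sum()
E_j = (np.arange(len(a)) * np.abs(a)).sum() / l1
bias, bound = abs(ideal - noisy), p_dep * l1 * E_j
print(bias, bound)    # bias stays below the bound, as in Eq. (11)
```

The inequality holds because \(1-{(1-p)}^{j}\le p\,j\) and \(| \langle \phi | {{\mathcal{T}}}_{j}(A)| \psi \rangle | \le 1\), which is exactly the chain of estimates behind Theorem 3.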

It is instructive to compare Eqs. (11) and (12) with the inaccuracy for the corresponding fully-quantum scheme. A fair scenario for that comparison (in the case of Problem 1) is to equip the standard QSVT with a Hadamard test similar to the ones in Fig. 1 so as to also circumvent the need for post-selection. Notice that, while in our randomized method, only the Hadamard ancilla controls the calls to the oracle, the standard QSVT circuit involves two-qubit control to also implement the pulses that determine the Chebyshev coefficients. As a consequence, the underlying gate complexity per oracle query would be considerably higher than for our schemes (with single-qubit gates becoming two-qubit gates, two-qubit gates becoming Toffoli gates, etc.). For this reason, the resulting noise strength pfq is expected to be larger than p. The left-hand side of Eq. (11) would then (see Supplementary Note 2) be upper-bounded by pfqEfq, with \({E}_{{\rm{fq}}}=k\,| \left\langle\,\phi\;\right\vert \tilde{f}(A)\left\vert \psi \right\rangle |\), where pfq > p and \(k\,>\, {\mathbb{E}}[\;j]\).

Another natural scenario for comparison is where the fully-quantum algorithm does not leverage a Hadamard test but implements post-selection measurements on the oracle ancilla, in a repeat-until-success strategy. This comparison applies only to Problem 2 since one cannot directly measure the complex amplitudes for Problem 1. The advantage, though, is that the circuits are now directly comparable because the gate complexities per oracle query are essentially the same (the fully quantum scheme has extra QSP pulses, but these are single-qubit gates whose error contribution is low). Hence, similar error rates to p are expected here so that one would have the equivalent of Eq. (12) being \({\mathcal{O}}(k\,p)\). This is already worse than Eq. (12) because \(k \,>\, {\mathbb{E}}[j]\), as already discussed. Moreover, with post-selection, one additionally needs to estimate normalizing constants with an independent set of experimental runs, which incurs extra systematic and statistical errors. In contrast, our method does not suffer from this issue, as it directly gives the estimates in Eqs. (1) or (2) regardless of state normalization (see the use cases below).

Finally, a third possibility could be to combine the fully quantum scheme with quantum amplitude amplification to manage the post-selection. This would quadratically improve the dependence on the post-selection probability. However, the circuit depth would then gain a factor inversely proportional to the square root of the post-selection probability. Unfortunately, this is far out of reach of early fault-tolerant hardware.

In what follows, we illustrate the usefulness of our framework with four use cases of practical relevance: partition function estimation (for both classical and general Hamiltonians), linear system solving, and ground-state energy estimation. These correspond to f(x) = xt, e−βx, x−1, and θ(x), respectively. The end-to-end complexities for each case are summarized in Table 1.

Relative-precision partition function estimation

Partition function estimation is a quintessential hard computational problem, with applications ranging from statistical physics to generative machine learning, as in Markov random fields21, Boltzmann machines22, and even the celebrated transformer architecture23 from large language models. Partition functions also appear naturally in other problems of practical relevance, such as constraint satisfaction problems24.

The partition function of a Hamiltonian H at inverse temperature β is defined as

$${Z}_{\beta }={\rm{Tr}}\left[{e}^{-\beta H}\right].$$
(15)

One is typically interested in the problem of estimating Zβ to relative error ϵr, that is, finding \({\tilde{Z}}_{\beta }\) such that

$$\left\vert {\tilde{Z}}_{\beta }-{Z}_{\beta }\right\vert \le {\epsilon }_{{\rm{r}}}\,{Z}_{\beta }\,.$$
(16)

This allows for the estimation of relevant thermodynamic functions, such as the Helmholtz free energy \(F=-\frac{1}{\beta }\log {Z}_{\beta }\), to additive precision. The naive classical algorithm based on direct diagonalization runs in time \({\mathcal{O}}({D}^{3})\), where \(D=\,\text{dim}\,({{\mathcal{H}}}_{s})\) is the Hilbert space dimension. Although it can be improved to \({\mathcal{O}}(D)\) using the kernel polynomial method25 if H is sparse, one expects no general-case efficient algorithm to be possible due to complexity-theoretic arguments26. In turn, if the Hamiltonian is classical (diagonal), Zβ can be obtained exactly in classical runtime \({\mathcal{O}}(D)\). General-purpose quantum algorithms (that work for any inverse temperature and any Hamiltonian) have been proposed27,28,29. Among them is an algorithm28 that, like ours, utilizes the Hadamard test and a block-encoding of the Hamiltonian.

In the following, we present two different quantum algorithms for partition function estimation: one for classical Ising models, based on the Markov-Chain Monte-Carlo (MCMC) method, and another for generic non-commuting Hamiltonians, based on quantum imaginary-time evolution (QITE) simulation5,30.

Partition function estimation via MCMC

Here, we take H as the Hamiltonian of a classical Ising model. As such, spin configurations, denoted by \(\left\vert {\bf{y}}\right\rangle\), are eigenstates of H with corresponding energies Ey. Let us define the coherent version of the Gibbs state \(\left\vert \sqrt{{\boldsymbol{\pi }}}\right\rangle := {Z}_{\beta }^{-1/2}{\sum }_{{\bf{y}}}{e}^{-\beta {E}_{{\bf{y}}}/2}\left\vert {\bf{y}}\right\rangle\). Then, for any \(\left\vert {\bf{y}}\right\rangle\), the partition function satisfies the identity

$$\begin{array}{r}{Z}_{\beta }=\frac{{e}^{-\beta {E}_{{\bf{y}}}}}{\left\langle {\bf{y}}\right\vert {\Pi }_{{\boldsymbol{\pi }}}\left\vert {\bf{y}}\right\rangle }\end{array}$$
(17)

with \({\Pi }_{{\boldsymbol{\pi }}}:= \left\vert \sqrt{{\boldsymbol{\pi }}}\right\rangle \left\langle \sqrt{{\boldsymbol{\pi }}}\right\vert\). Below we discuss how to use our framework to obtain an estimation of \(\left\langle {\bf{y}}\right\vert {\Pi }_{{\boldsymbol{\pi }}}\left\vert {\bf{y}}\right\rangle\) for a randomly sampled \(\left\vert {\bf{y}}\right\rangle\) and, therefore, approximate the partition function.
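The identity in Eq. (17) can be verified directly on a toy instance (a sketch with hypothetical energies for 3 spins, i.e., 8 configurations; the overlap \(\langle {\bf{y}}| {\Pi }_{{\boldsymbol{\pi }}}| {\bf{y}}\rangle\) is computed exactly rather than estimated via Alg. 1):

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 0.7
E = rng.normal(size=8)          # hypothetical energies E_y of 8 configurations

w = np.exp(-beta * E)
Z = w.sum()                     # partition function Z_beta
sqrt_pi = np.sqrt(w / Z)        # amplitudes of the coherent Gibbs state

# <y| Pi_pi |y> = |<y|sqrt(pi)>|^2 = e^{-beta E_y} / Z, so Eq. (17) holds:
for y in range(8):
    assert np.isclose(np.exp(-beta * E[y]) / sqrt_pi[y] ** 2, Z)
print(Z)
```

Since \(\langle {\bf{y}}| \sqrt{{\boldsymbol{\pi }}}\rangle ={Z}_{\beta }^{-1/2}{e}^{-\beta {E}_{{\bf{y}}}/2}\), the ratio in Eq. (17) returns Zβ for every configuration y, which is what makes a single estimated overlap sufficient.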

Let A be the discriminant matrix31 of a Markov chain having the Gibbs state of H at inverse temperature β as its unique stationary state. The Szegedy quantum walk unitary31 provides a qubitized block-encoding UA of A that can be efficiently implemented32. A useful property of A is that the monomial At approaches Ππ for sufficiently large integer t (the precise statement is given in Supplementary Note 3). This implies that \(\left\langle {\bf{y}}\right\vert {\Pi }_{{\boldsymbol{\pi }}}\left\vert {\bf{y}}\right\rangle\) can be estimated using Alg. 1 with f(A) = At and \(\left\vert \psi \right\rangle =\left\vert \phi \right\rangle =\left\vert {\bf{y}}\right\rangle\). In this case, the state preparation unitaries Uψ = Uϕ will be simple bit flips.

A ν-approximation \(\tilde{f}(A)\) can be constructed by truncating the Chebyshev representation of At to order \(k=\sqrt{2\,t\log (2/\nu )}\)19. The l1-norm of the corresponding coefficient vector satisfies \({\left\Vert {\bf{a}}\right\Vert }_{1}\ge 1-\nu\). For this Chebyshev series, the ratio \({\mathbb{E}}[j]/k\) between the average and the maximum query complexities can be shown (see Supplementary Note 3) to be at most \({(1-\nu )}^{-1}/\sqrt{\pi \,\log (2/\nu )}\) for large t. This implies that the more precise the estimation, the larger the advantage of the randomized algorithm in terms of total expected runtime. For instance, for ν = 10−2, the ratio is roughly equal to 0.25.
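This behavior can be reproduced numerically. The sketch below builds the exact Chebyshev coefficients of the monomial xt from the standard identity \({x}^{t}={2}^{1-t}{\sum }_{k}^{{\prime} }\left(\begin{array}{c}t\\ k\end{array}\right){{\mathcal{T}}}_{t-2k}(x)\) (primed sum: the \({{\mathcal{T}}}_{0}\) term is halved), truncates at the quoted order k, and compares \({\mathbb{E}}[j]\) with k for the Fig. 1c parameters t = 200 and ν = 10−2:

```python
from fractions import Fraction
from math import ceil, comb, log, sqrt

def cheb_coeffs_monomial(t):
    """Exact Chebyshev coefficients a_j of x^t: a_j = 2^{1-t} C(t, (t-j)/2)
    for t - j even, with the j = 0 term halved (all a_j >= 0 here)."""
    a = [Fraction(0)] * (t + 1)
    for j in range(t % 2, t + 1, 2):
        a[j] = Fraction((1 if j == 0 else 2) * comb(t, (t - j) // 2), 2**t)
    return a

t, nu = 200, 1e-2
a = cheb_coeffs_monomial(t)
assert sum(a) == 1                    # T_j(1) = 1, so the coefficients sum to 1

k = ceil(sqrt(2 * t * log(2 / nu)))   # truncation order, as in ref. 19
trunc = [float(c) for c in a[: k + 1]]
l1 = sum(trunc)                       # at least 1 - nu: the tail is below nu
E_j = sum(j * c for j, c in enumerate(trunc)) / l1

print(k, E_j / k)                     # ratio close to 0.25 for these parameters
```

Exact rational arithmetic (`Fraction`, `math.comb`) avoids the loss of precision that a floating-point conversion of degree-200 binomials would incur.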

To estimate the partition function up to relative error ϵr, Alg. 1 needs to estimate \(\left\langle {\bf{y}}\right\vert {\Pi }_{{\boldsymbol{\pi }}}\left\vert {\bf{y}}\right\rangle\) with additive error \(\epsilon =\frac{{e}^{-\beta {E}_{{\bf{y}}}}}{2\,{Z}_{\beta }}{\epsilon }_{{\rm{r}}}\) (see Supplementary Note 3). In Supplementary Note 3, we show that the necessary t and ν required for that yield a maximum query complexity per run of \(k=\sqrt{\frac{2}{\Delta }}\log (\frac{12\,{Z}_{\beta }\,{e}^{\beta {E}_{{\bf{y}}}}}{{\epsilon }_{{\rm{r}}}})\) and an average query complexity of \({\mathbb{E}}[j]=\sqrt{\frac{2}{\pi \,\Delta }\log (\frac{12\,{Z}_{\beta }\,{e}^{\beta {E}_{{\bf{y}}}}}{{\epsilon }_{{\rm{r}}}})}\), where Δ is the spectral gap of A. Moreover, from Theorem 2, the necessary sample complexity is \({S}^{(1)}=64\,{e}^{2\beta {E}_{{\bf{y}}}}\,{Z}_{\beta }^{2}\,\frac{\log (2/\delta )}{{\epsilon }_{{\rm{r}}}^{2}}\). This leads to the total expected runtime in Table 1.

Three important observations about the algorithm’s complexities are in order. First, the total expected runtime has no explicit dependence on the Hilbert-space dimension D and maintains the square-root dependence on Δ (a Szegedy-like quadratic quantum speed-up31). Second, all three complexities in the first row of Table 1 depend on the product \({Z}_{\beta }\,{e}^{\beta {E}_{{\bf{y}}}}={\mathcal{O}}\left({e}^{\beta ({E}_{{\bf{y}}}-{E}_{\min })}\right)\), with \({E}_{\min }\) the minimum eigenvalue of H, where the scaling holds for large β. This scaling works increasingly in our favor the lower the energy Ey of the initial state y. Hence, by uniformly sampling a constant number of different bit-strings y and picking the one with lowest energy, one ensures a convenient initial state. Third, the quadratic advantage of \({\mathbb{E}}[j]\) over k in the logarithmic term is an interesting type of speed-up entirely due to the randomization over the components of the Chebyshev series.

Finally, the total expected runtime obtained can potentially provide a quantum advantage over classical estimations in regimes where \(\frac{{e}^{2\beta ({E}_{{\bf{y}}}-{E}_{\min })}}{\sqrt{\Delta }\,{\epsilon }_{{\rm{r}}}^{2}} \,<\, D\).

Partition function estimation via QITE

Alternatively, the partition function associated with a Hamiltonian H can be estimated by quantum simulation of imaginary-time evolution (QITE). This method applies to any Hamiltonian (not just classical ones), assuming a block-encoding of H. Zβ can be written in terms of the expectation value of the QITE propagator \({e}^{-\beta H}\) over the maximally mixed state \({\varrho }_{0}:= \frac{{\mathbb{1}}}{D}\), that is

$${Z}_{\beta }=D\,{\rm{Tr}}\left[{e}^{-\beta H}{\varrho }_{0}\right]\,.$$
(18)

Therefore, we can apply our Alg. 2 with \(A=H,O=D{\mathbb{1}},\varrho ={\varrho }_{0}\), and \(f(H)={e}^{-\beta H/2}\) to estimate Zβ with relative precision ϵr and confidence 1−δ. The sample complexity is obtained from Eq. (10) as \({S}^{(2)}=\frac{8\,{D}^{2}{e}^{2\beta }}{{\epsilon }_{{\rm{r}}}^{2}{Z}_{\beta }^{2}}\log \frac{2}{\delta }\), by setting the additive error equal to Zβϵr.
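The randomization underlying this estimate can be illustrated with a purely classical Monte-Carlo sketch, in which exact traces stand in for the Hadamard-test outcomes (the toy Hamiltonian, truncation order, and sample count below are our own illustrative choices): one samples pairs of Chebyshev indices (j, l) with probability proportional to ∣aj al∣ and averages the signed traces.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(1)
beta, D, k = 1.0, 8, 12

# toy Hermitian H with ||H|| <= 1; rho_0 is the maximally mixed state 1/D
M = rng.normal(size=(D, D))
H = (M + M.T) / 2
H /= 1.1 * np.linalg.norm(H, 2)

# Chebyshev coefficients a_0..a_k of f(x) = exp(-beta x / 2)
a = C.chebinterpolate(lambda x: np.exp(-beta * x / 2), k)
l1 = np.sum(np.abs(a))

# T_j(H) via the recurrence T_{j+1} = 2 H T_j - T_{j-1}
T = [np.eye(D), H.copy()]
for _ in range(2, k + 1):
    T.append(2 * H @ T[-1] - T[-2])

# sample (j, l) with probability |a_j a_l| / l1^2; the signed traces average
# to Tr[f(H) rho_0 f(H)] = Tr[exp(-beta H)] / D
p = np.abs(a) / l1
vals = []
for _ in range(2000):
    j, l = rng.choice(k + 1, size=2, p=p)
    vals.append(np.sign(a[j] * a[l]) * np.trace(T[j] @ T[l]) / D)
Z_est = D * l1**2 * np.mean(vals)

Z_exact = np.exp(-beta * np.linalg.eigvalsh(H)).sum()
print(Z_est, Z_exact)  # Monte-Carlo estimate vs exact partition function
```

The estimator is unbiased up to the Chebyshev truncation error, and its statistical spread is governed by the factor \(D\,{\left\Vert {\bf{a}}\right\Vert }_{1}^{2}\), consistent with the sample complexity above.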

We use the Chebyshev approximation of the exponential function introduced in ref. 19, which has a quadratically better asymptotic dependence on β than other well-known expansions such as the Jacobi-Anger decomposition5. This expansion was used before to implement the QITE propagator using QSVT coherently3. The resulting truncated Chebyshev series has order \(k=\sqrt{2\,\max \left\{\frac{{e}^{2}\beta }{2},\log \left(\frac{8D\,{e}^{\beta }}{{Z}_{\beta }\,{\epsilon }_{{\rm{r}}}}\right)\right\}\,\log \left(\frac{16D\,{e}^{\beta }}{{Z}_{\beta }\,{\epsilon }_{{\rm{r}}}}\right)}\) and coefficient l1-norm \({\left\Vert {\bf{a}}\right\Vert }_{1}\le {e}^{\beta /2}+\nu\) (see Supplementary Note 3). Interestingly, the average query depth does not depend on the precision of the estimation but scales as \({\mathcal{O}}(\sqrt{\beta })\) with a modest constant factor for any ϵr (see Supplementary Note 3). This implies an advantage of \({\mathcal{O}}\left(\log \left(\frac{D\,{e}^{\beta }}{{Z}_{\beta }\,{\epsilon }_{{\rm{r}}}}\right)\right)\) in terms of overall runtime as compared to coherent QSVT, which is again entirely due to our randomization scheme.

Overall, this gives our algorithm a total expected runtime of \({\mathcal{O}}\left(\frac{{D}^{2}\sqrt{\beta }\,{e}^{2\beta }}{{Z}_{\beta }^{2}}\frac{\log (2/\delta )}{{\epsilon }_{{\rm{r}}}^{2}}\right)\). The previous state-of-the-art algorithm from ref. 28 has runtime \(\tilde{{\mathcal{O}}}\left(\frac{{D}^{2}{D}_{a}^{2}{e}^{2\beta }{\beta }^{2}}{{\epsilon }_{{\rm{r}}}^{2}{Z}_{\beta }^{2}}\log \frac{1}{\delta }\right)\). Compared with that, we obtain a quartic speed-up in β together with the complete removal of the dependence on \({D}_{a}^{2}\). The improvement comes from not estimating each Chebyshev term individually and from allowing the ancillas to be pure while only the system is initialized in the maximally mixed state.

Finally, compared to the \({\mathcal{O}}({D}^{3})\) scaling of the classical algorithm based on exact diagonalization, our expected runtime has a better dependence on D. Moreover, in the regime of small β such that \({Z}_{\beta }^{2} \,>\, {\mathcal{O}}\left(\sqrt{\beta }\,{e}^{2\beta }\log (1/\delta )/{\epsilon }_{{\rm{r}}}^{2}\right)\), the expected runtime can be even better than that of the kernel method, which scales as \({\mathcal{O}}(D)\).

Quantum linear-system solvers

Given a matrix \(A\in {{\mathbb{C}}}^{D\times D}\) and a vector \({\bf{b}}\in {{\mathbb{C}}}^{D}\), the task is to find a vector \({\bf{x}}\in {{\mathbb{C}}}^{D}\) such that

$$A\,{\bf{x}}={\bf{b}}.$$
(19)

The best classical algorithm for a generic A is based on Gaussian elimination, with a runtime \({\mathcal{O}}({D}^{3})\)33. For A positive semi-definite and sparse, with sparsity (i.e. maximal number of non-zero elements per row or column) s, the conjugate gradient algorithm34 can reduce this to \({\mathcal{O}}(Ds\kappa )\), where \(\kappa := \left\Vert A\right\Vert \,\left\Vert {A}^{-1}\right\Vert\) is the condition number of A. In turn, the randomized Kaczmarz algorithm35 can yield an ϵ-precise approximation of a single component of x in \({\mathcal{O}}\left(s\,{\kappa }_{{\rm {F}}}^{2}\log (1/\epsilon )\right)\), with \({\kappa }_{{\rm {F}}}:= {\left\Vert A\right\Vert }_{{\rm {F}}}\left\Vert {A}^{-1}\right\Vert\) and \({\left\Vert A\right\Vert }_{{\rm {F}}}\) the Frobenius norm of A.
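For concreteness, the randomized Kaczmarz iteration (ref. 35) admits a minimal sketch, sampling rows with probability proportional to their squared norm; the matrix and parameters below are illustrative choices of ours:

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=20_000, seed=0):
    """Project x onto a randomly chosen row hyperplane of A x = b,
    sampling row i with probability ||A_i||^2 / ||A||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms2 = np.einsum("ij,ij->i", A, A)
    p = row_norms2 / row_norms2.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=p)
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    return x

# illustrative well-conditioned positive-definite system
rng = np.random.default_rng(1)
M = rng.normal(size=(20, 20))
A = M @ M.T + 20 * np.eye(20)
x_true = rng.normal(size=20)
x = randomized_kaczmarz(A, A @ x_true)
print(np.linalg.norm(x - x_true))  # converges to the true solution
```

The expected error contracts by a factor \((1-1/{\kappa }_{{\rm{F}}}^{2})\) per iteration, which is the origin of the \({\kappa }_{{\rm{F}}}^{2}\) factor in the runtime quoted above.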

In contrast, quantum linear-system solvers (QLSSs)3,4,17,18,36,37,38,39,40 prepare a quantum state that encodes the normalized version of the solution vector x in its amplitudes. More precisely, given quantum oracles for A and \(\left\vert {\bf{b}}\right\rangle := \frac{1}{{\left\Vert {\bf{b}}\right\Vert }_{2}}{\sum }_{i}{b}_{i}\left\vert i\right\rangle\) as inputs, they output the state \(\left\vert {\bf{x}}\right\rangle := \frac{1}{\parallel {\bf{x}}{\parallel }_{2}}{\sum }_{i}{x}_{i}\left\vert i\right\rangle\), where \({\left\Vert \cdot \right\Vert }_{2}\) is the l2-norm and we assume \(\left\Vert A\right\Vert \le 1\) for simplicity of presentation (see Supplementary Note 4 for the case of unnormalized A). Interestingly, circuit compilations of block-encoding oracles for A with gate complexity \({\mathcal{O}}\left(\log (D/\epsilon )\right)\) have been explicitly worked out assuming a QRAM access model to the classical entries of A41. The solution state can then be used for extracting relevant features—such as an amplitude \(\langle \phi \vert {\bf{x}}\rangle\) or an expectation value \(\left\langle {\bf{x}}\right\vert O\left\vert {\bf{x}}\right\rangle\)—with potential exponential speed-ups over known classical algorithms, assuming that the oracles are efficiently implementable and \(\kappa ={\mathcal{O}}\left(\,\text{polylog}\,(D)\right)\).

Reference 18 proposed an asymptotically optimal QLSS based on a discrete version of the adiabatic theorem, with query complexity \({\mathcal{O}}\left(\kappa \log (1/\epsilon )\right)\). Within the Chebyshev-based QSP framework, the best known QLSS uses \({\mathcal{O}}\left(\kappa \log (\kappa /\epsilon )\right)\) oracle queries17. If the final goal is, for instance, to reconstruct a computational-basis component \(\langle i\vert {\bf{x}}\rangle\) of the solution vector, the resulting runtime becomes \({\mathcal{O}}\left(({\kappa }^{3}/{\epsilon }^{2})\log (\kappa /\epsilon )\right)\), since this requires \({\mathcal{O}}\left({\kappa }^{2}/{\epsilon }^{2}\right)\) measurements on \(\left\vert {\bf{x}}\right\rangle\). Importantly, however, in order to relate the abovementioned features of \(\left\vert {\bf{x}}\right\rangle\) to the corresponding ones of the (unnormalized) classical solution vector x, one must also independently estimate \({\left\Vert {\bf{x}}\right\Vert }_{2}\). This can still be done with QLSSs (e.g., with quantum amplitude estimation techniques), but requires extra runs. Our algorithms do not suffer from this issue, providing direct estimates of the unnormalized vector \({A}^{-1}\left\vert {\bf{b}}\right\rangle\).

More precisely, with f being the inverse function on the cut-off interval \({{\mathcal{I}}}_{\kappa }:= [-1,-1/\kappa ]\cup [1/\kappa ,1]\), our Algs. 1 and 2 readily estimate amplitudes \(\left\langle \phi \right\vert {A}^{-1}\left\vert {\bf{b}}\right\rangle\) and expectation values \(\left\langle {\bf{b}}\right\vert {A}^{-1}O{A}^{-1}\left\vert {\bf{b}}\right\rangle\), respectively. The technical details of the polynomial approximation \(\tilde{f}\) and complexity analysis are deferred to Supplementary Note 3. In particular, there we show that, to approximate f to error ν, one needs a polynomial of degree \(k={\mathcal{O}}\left(\kappa \,\log (\kappa /\nu )\right)\) and \({\left\Vert {\bf{a}}\right\Vert }_{1}={\mathcal{O}}\left(\kappa \sqrt{\log ({\kappa }^{2}/\nu )}\right)\). For our purposes, as discussed before Theorem 2, to ensure a target estimation error ϵ on the quantity of interest one must have \(\nu ={\mathcal{O}}(\epsilon )\) for Alg. 1 and \(\nu ={\mathcal{O}}({(\kappa \left\Vert O\right\Vert )}^{-1}\epsilon )\) for Alg. 2. This leads to the sample complexities \({S}^{(1)}={\mathcal{O}}\left(({\kappa }^{2}/{\epsilon }^{2}){\log }^{2}({\kappa }^{2}/\epsilon )\log (4/\delta )\right)\) and \({S}^{(2)}={\mathcal{O}}\left(({\kappa }^{4}{\left\Vert O\right\Vert }^{2}/{\epsilon }^{2}){\log }^{4}({\kappa }^{3}\,\left\Vert O\right\Vert /\epsilon )\log (4/\delta )\right)\), respectively.

The expected query depths and total expected runtimes are shown in Table 1. In particular, the former exhibits a quadratic improvement in the error dependence with respect to the maximal query depth k. This places our algorithm between the \({\mathcal{O}}(\kappa \log (\kappa /\epsilon ))\) scaling of the fully quantum algorithm17 and the asymptotically optimal \({\mathcal{O}}(\kappa \log (1/\epsilon ))\) scaling of ref. 18, therefore making it more suitable for the early fault-tolerance era. In fact, our expected query depth can even beat the optimal scaling for \(\kappa \,\lesssim\, {(1/\epsilon )}^{\log (1/\epsilon )-1}\). Note also that our total expected runtimes are only logarithmically worse in κ than those of the fully-quantum case. For Alg. 1, an interesting sub-case is \(\left\langle \phi \right\vert =\left\langle i\right\vert\), as this directly gives the ith component of the solution vector x. The quantum oracle Uϕ is remarkably simple there, corresponding to the preparation of a computational-basis state. As for the runtime, we recall that \(\left\Vert A\right\Vert \le {\left\Vert A\right\Vert }_{{\rm {F}}}\) in general and \({\left\Vert A\right\Vert }_{{\rm {F}}}={\mathcal{O}}(\sqrt{D}\,\left\Vert A\right\Vert )\) for high-rank matrices. Hence, Alg. 1 has the potential for significant speed-ups over the randomized Kaczmarz algorithm mentioned above. In turn, for Alg. 2, we stress that the estimates obtained refer directly to the target expectation values for a generic observable O, with no need to estimate the normalizing factor \({\left\Vert {\bf{x}}\right\Vert }_{2}\) separately (although, if desired, the latter can be obtained by taking \(O={\mathbb{1}}\)).

It is also interesting to compare our results with those of the fully randomized scheme of Wang et al.11. There, for A given in terms of a Pauli decomposition with total Pauli weight λ, they also offer direct estimates, with no need for \({\left\Vert {\bf{x}}\right\Vert }_{2}\). However, their total runtime of \(\tilde{O}\left({\left\Vert {A}^{-1}\right\Vert }^{4}{\lambda }^{2}/{\epsilon }^{2}\right)\) is worse than the scaling presented here by a factor \(\tilde{O}\left(\left\Vert {A}^{-1}\right\Vert \,{\lambda }^{2}\right)\) (recall that here \(\kappa =\left\Vert {A}^{-1}\right\Vert\) since we are assuming \(\left\Vert A\right\Vert =1\)). In turn, compared to the solver in ref. 11, our query depth per run is better by one power of κ. In their case, the scaling refers directly to circuit depth, instead of query depth, but this is approximately compensated by the extra dependence on \({\lambda }^{2}\) in their circuit depth.

Ground-state energy estimation

The task of estimating the ground-state energy of a quantum Hamiltonian holds paramount importance in condensed matter physics, quantum chemistry, material science, and optimization. In fact, it is considered one of the most promising use cases for quantum computing in the near term42. However, the problem in its most general form is known to be QMA-hard43. A typical assumption—one we will also use here—is that one is given a Hamiltonian H with \(\left\Vert H\right\Vert \le 1\) and a promise state ϱ having non-vanishing overlap η with the ground state subspace. The ground state energy estimation (GSEE) problem7 then consists of finding an estimate of the ground state energy E0 to additive precision ξ.

If the overlap η is reasonably large (which is often the case in practice, e.g., for small molecular systems using the Hartree–Fock state44), the problem is known to be efficiently solvable, but without any guarantee on η the problem is challenging. A variety of quantum algorithms for GSEE have been proposed (see, e.g., refs. 37,45,46,47), but the substantial resources required make them prohibitive for practical implementation before full-fledged fault-tolerant devices become available. Recent works have tried to reduce the complexity of quantum algorithms for GSEE with a view toward early fault-tolerant quantum devices. Notably, a semi-randomized quantum scheme was proposed in7 with query complexity \({\mathcal{O}}\left(\frac{1}{\xi }\log \left(\frac{1}{\xi \eta }\right)\right)\), achieving Heisenberg-limited scaling in ξ48. Importantly, their algorithm assumes access to the Hamiltonian H through a time-evolution oracle \({e}^{iH\tau }\) (for some fixed time τ), which makes it more appropriate for implementation on analog devices. The similar fully-randomized approach of Wan et al.10 gives rise to an expected circuit (not query) complexity of \({\mathcal{O}}\left(\frac{1}{{\xi }^{2}}\log \left(\frac{1}{\eta }\right)\right)\).

Here we approach the GSEE problem within our Chebyshev-based randomized semi-quantum framework. We follow the same strategy used in refs. 7,8,10,14 of reducing GSEE to the so-called eigenvalue-thresholding problem. The problem reduces to estimating, up to additive precision \(\frac{\eta }{2}\), the filter function \({F}_{\varrho }(\;y):= {\rm{Tr}}[\varrho \,\theta (y{\mathbb{1}}-H)]\) for a set of \(\log \left(\frac{1}{\xi }\right)\) different values of y chosen from a uniform grid of cell size ξ (times the length \({E}_{\max }-{E}_{0}\) of the interval of energies of H). This allows one to find E0 up to additive error ξ with \(\log \left(\frac{1}{\xi }\right)\) steps of a binary-like search over y7. At each step, we apply our Alg. 1 with f(x) = θ(y − x), A = H, and \(\left\vert \phi \right\rangle =\left\vert \psi \right\rangle\) to estimate Fϱ(y), with \(\varrho =\left\vert \psi \right\rangle \langle \psi\vert\). Here, \(\left\vert \psi \right\rangle\) is any state with the promised overlap η > 0 with the ground-state subspace. The requirement of additive precision \(\frac{\eta }{2}\) on Fϱ(y) translates into an approximation error \(\nu \le \frac{\eta }{4}\) for f and a statistical error \(\epsilon \le \frac{\eta }{4}\) for the estimation.
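The reduction can be sketched classically as follows (exact filter values stand in for the η/2-precise statistical estimates; the toy Hamiltonian, the overlap value, and all names are our own illustrative choices): a binary search over y locates E0 from \({\mathcal{O}}(\log (1/\xi ))\) evaluations of the filter function.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16

# toy Hamiltonian with ||H|| <= 1 and a promise state with overlap eta
M = rng.normal(size=(D, D))
H = (M + M.T) / 2
H /= 1.1 * np.linalg.norm(H, 2)
evals, evecs = np.linalg.eigh(H)
eta = 0.3
psi = np.sqrt(eta) * evecs[:, 0] + np.sqrt(1 - eta) * evecs[:, -1]

def filter_value(y):
    """F(y) = <psi| theta(y*1 - H) |psi>: spectral weight of |psi> below y."""
    weights = np.abs(evecs.T @ psi) ** 2
    return np.sum(weights[evals <= y])

# binary search: F(y) >= eta/2 iff y lies above the ground-state energy
xi = 1e-4
lo, hi = -1.0, 1.0
while hi - lo > xi:
    mid = (lo + hi) / 2
    if filter_value(mid) >= eta / 2:
        hi = mid
    else:
        lo = mid

print(abs(hi - evals[0]) <= xi)  # E0 located to additive precision xi
```

The invariant maintained by the search is that F(hi) ≥ η/2 and F(lo) < η/2, so E0 always lies in (lo, hi].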

Interestingly, our approach does not need to estimate Fϱ(y) at different y’s for the search. In Supplementary Note 3, we show that estimating Fϱ at the special point \({y}_{* }=1/\sqrt{2}\) and increasing the number of samples suffices to obtain Fϱ(y) at any other y. As a core auxiliary ingredient for that, we develop a ν-approximation \(\tilde{f}\) to the step function with a shifted argument, θ(y − x), which may be of independent interest (see Supplementary Note 3). It has the appealing property that the x and y dependences are separated, namely \(\tilde{f}(\;y-x)={\sum }_{j\in [k]}\left[{a}_{j}(y)\,{{\mathcal{T}}}_{j}(x)\right]+{\sum }_{j\in [k]}\left[{b}_{j}(y)\sqrt{1-{x}^{2}}\,{{\mathcal{U}}}_{j}(x)\right]\), where \({{\mathcal{U}}}_{j}\) is the jth Chebyshev polynomial of the second kind. The first contribution to \(\tilde{f}\) takes the usual form (23) and can be directly implemented by our Alg. 1; the second contribution, containing the \({{\mathcal{U}}}_{j}\)’s, can be implemented similarly, with the caveat that the required Hadamard test needs a minor modification described in Supplementary Note 1. The maximal degree \(k={\mathcal{O}}(\frac{1}{\xi }\log \left(\frac{1}{\eta }\right))\) is the same for both contributions, and the coefficient 1-norms are \({\left\Vert {\bf{a}}\right\Vert }_{1}={\left\Vert {\bf{b}}\right\Vert }_{1}={\mathcal{O}}\left(\log \left(\frac{1}{\xi }\log \left(\frac{1}{\eta }\right)\right)\right)\). Putting it all together and accounting for the \({\mathcal{O}}\left(\log \left(\frac{1}{\xi }\right)\right)\) steps of the binary search, one obtains a total sample complexity \({S}^{(1)}={\mathcal{O}}\left(\frac{1}{{\eta }^{2}}{\log }^{2}\left(\frac{1}{\xi }\log \left(\frac{1}{\eta }\right)\right)\log \left(\frac{4}{\delta }\log \left(\frac{1}{\xi }\right)\right)\right)\).

The corresponding expected query depth and total runtime are shown in Table 1. Remarkably, the query depth exhibits a speed-up with respect to the maximal value k, namely a square-root improvement in the η dependence and a logarithmic improvement in the \(\frac{1}{\xi }\) dependence (see Supplementary Note 3 for details). In addition, as can be seen in the table, our expected runtime displays the same Heisenberg scaling as Wan et al.10. This is interesting given that our algorithm is based on block-encoded oracles rather than the time-evolution oracles used in ref. 10; as discussed previously, the former may be better suited for digital platforms. Finally, it is interesting to note that there have been recent improvements in the precision dependence, e.g. based on a derivative Gaussian filter8. Those matrix functions are also within the scope of applicability of our approach.

Discussion

We presented a randomized hybrid quantum-classical framework to efficiently estimate state amplitudes and expectation values involving a generic matrix function f(A). More precisely, our algorithms perform a Monte-Carlo simulation of the powerful quantum signal processing (QSP) and singular-value transformation (QSVT) techniques1,2,3,4. Our toolbox is based on three main ingredients: (i) it samples each component of a Chebyshev series for f weighted by its coefficient in the series; (ii) it assumes coherent access to A via a block-encoding oracle; and (iii) f(A) is automatically extracted from its block-encoding without post-selection, using a Hadamard test. This combination allows us to deliver provably better circuit complexities than the standard QSP and QSVT algorithms while maintaining comparable total runtimes.

We illustrated our algorithms on four specific end-user applications: partition-function estimation via quantum Markov-chain Monte Carlo and via imaginary-time evolution, linear system solvers, and ground-state energy estimation (GSEE). The full end-to-end complexity scalings are detailed in Table 1.

For GSEE, the reduction in query complexity (and consequently also in the total gate count) due to randomization is by a factor \(k/{\mathbb{E}}[j]={\mathcal{O}}\left(\sqrt{\log (1/\eta )}\,\log ({\xi }^{-1}\log (1/\eta ))\right)\). We estimate this factor explicitly for the iron–molybdenum cofactor (FeMoco) molecule, which is the primary cofactor of nitrogenase and one of the main target use cases in chemistry for early quantum computers49,50. For GSEE within chemical accuracy (see the section “Resource estimates for FeMoco GSEE” in the “Methods” section for details), the resulting reduction factor is approximately 28 (from k ≈ 1.12 × 107 to \({\mathbb{E}}[j]\,\approx\, 4.01\times 1{0}^{5}\)), while the sample-complexity overhead is a factor of \({\left\Vert {\bf{a}}\right\Vert }_{1}^{2}\approx 2.35\). Importantly, these estimates do not take into account the overhead for quantum error correction (QEC). Our reductions in query depth imply larger tolerated logical-gate error levels, which translate into lower code distances and, hence, lower QEC overheads. For example, for the surface code, the physical-qubit overhead usually scales quadratically with the code distance51.

An interesting future direction is to explore other matrix functions within our framework. This includes recent developments such as Gaussian and derivative-Gaussian filters for precision improvements in ground-state energy estimation8 or Green’s function estimation11, as well as a variety of other more-established use cases4. Another possibility is to explore the applicability of our methods in the context of hybrid quantum-classical rejection sampling14. Moreover, further studies on the interplay between our framework and Fourier-based matrix processing6,13 may be in order too. Fourier-based approaches have so far focused mainly on eigenvalue thresholding for ground-state energy estimation7,8,10,11.

Our findings open a promising arena to build and optimize early fault-tolerant quantum algorithms towards practical linear-algebra applications in the near term.

Methods

Qubitized block-encoding

The basic input taken by QSP is a block-encoding UA of the Hermitian operator A of interest (the signal). A block-encoding is a unitary acting on \({{\mathcal{H}}}_{sa}:= {{\mathcal{H}}}_{s}\otimes {{\mathcal{H}}}_{a}\), where \({{\mathcal{H}}}_{s}\) is the system Hilbert space where A acts and \({{\mathcal{H}}}_{a}\) is an ancillary Hilbert space (with dimensions D and Da, respectively), satisfying

$$\left({\left\langle 0\right\vert }_{a}\otimes {{\mathbb{1}}}_{s}\right)\,{U}_{A}\,\left({\left\vert 0\right\rangle }_{a}\otimes {{\mathbb{1}}}_{s}\right)=A$$
(20)

for some suitable state \({\left\vert 0\right\rangle }_{a}\in {{\mathcal{H}}}_{a}\) (here \({{\mathbb{1}}}_{s}\) is the identity operator on \({{\mathcal{H}}}_{s}\)). Designing such an oracle for arbitrary A is a non-trivial task52, but efficient block-encoding schemes are known in cases where some special structure is present, e.g., when A is sparse or expressible as a linear combination of unitaries2,3,53. In particular, we use the following form of UA, which makes it amenable to generating Chebyshev polynomials.

Definition 1

(Qubitized block-encoding oracle) Let A be a Hermitian matrix on \({{\mathcal{H}}}_{s}\) with spectral norm \(\left\Vert A\right\Vert \le 1\), eigenvalues \({\{{\lambda }_{\gamma }\}}_{\gamma \in [D]}\), and eigenstates \(\{{\vert {\lambda }_{\gamma }\rangle }_{s}\}\). A unitary UA acting on \({{\mathcal{H}}}_{sa}\) is called an (exact) qubitized block-encoding of A if it has the form

$${U}_{A} =\mathop{\bigoplus}\limits_{\gamma \in [D]}{e}^{-i{\vartheta }_{\gamma }{Y}_{\gamma }}\,,$$
(21)

where \({\vartheta }_{\gamma }:= \arccos ({\lambda }_{\gamma })\) and Yγ is the second Pauli matrix acting on the two-dimensional subspace spanned by \(\{{\vert 0\rangle }_{a}\otimes {\vert {\lambda }_{\gamma }\rangle }_{s},{\vert {\perp }_{{\lambda }_{\gamma }}\rangle }_{sa}\}\), with \({}_{sa}\langle {\perp }_{{\lambda }_{\gamma }}\vert \,({\vert 0\rangle }_{a}\otimes {\vert {\lambda }_{\gamma }\rangle }_{s})=0\).

A qubitized oracle of the form (21) can be constructed from any other block-encoding \({U}_{A}^{{\prime} }\) of A using at most one query to \({U}_{A}^{{\prime} }\) and \({{U}_{A}^{{\prime} }}^{-1}\), at most one additional ancillary qubit, and \({\mathcal{O}}(\log ({D}_{a}))\) quantum gates2.

Block-encoding of Chebyshev polynomials

Standard QSP takes as input the qubitized oracle UA and transforms it into (a block-encoding of) a polynomial function \(\tilde{f}(A)\). With the help of function approximation theory15, this allows the approximate implementation of generic non-polynomial functions f(A). The algorithm complexity is measured by the number of queries to UA, which allows for rigorous quantitative statements agnostic to details of A or to hardware-specific circuit compilations. For our purposes, only a simple QSP result is needed, namely the observation2 that repeated applications of UA give rise to Chebyshev polynomials of A (see Supplementary Note 1 for a proof).

Lemma 4

(Block encoding of Chebyshev polynomials) Let UA be a qubitized block-encoding of A. Then

$$\left({\left\langle 0\right\vert }_{a}\otimes {{\mathbb{1}}}_{s}\right)\,{U}_{A}^{j}\,\left({\left\vert 0\right\rangle }_{a}\otimes {{\mathbb{1}}}_{s}\right)={{\mathcal{T}}}_{j}(A)\,,$$
(22)

for \(j\in {\mathbb{N}}\), where \({{\mathcal{T}}}_{j}(\cdot )\) is the jth order Chebyshev polynomial of the first kind.
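Lemma 4 can be verified directly for a minimal single-ancilla qubitized block-encoding (a numerical sketch working in the eigenbasis of A; the explicit one-ancilla construction below is our illustrative choice, with \({\vert {\perp }_{{\lambda }_{\gamma }}\rangle }_{sa}={\vert 1\rangle }_{a}\otimes {\vert {\lambda }_{\gamma }\rangle }_{s}\)):

```python
import numpy as np

rng = np.random.default_rng(0)

# spectrum of a Hermitian A with ||A|| <= 1 (work in its eigenbasis)
lam = rng.uniform(-1, 1, size=4)
D = len(lam)

# one-ancilla qubitized block-encoding: a Y-rotation by arccos(lambda_g)
# in each 2D subspace spanned by |0>_a|lam_g>_s and |1>_a|lam_g>_s
U = np.zeros((2 * D, 2 * D))
for g, l in enumerate(lam):
    s = np.sqrt(1 - l**2)
    U[g, g], U[g, D + g] = l, -s
    U[D + g, g], U[D + g, D + g] = s, l

# Lemma 4: (<0|_a ⊗ 1) U^j (|0>_a ⊗ 1) = T_j(A)
for j in range(8):
    block = np.linalg.matrix_power(U, j)[:D, :D]
    Tj = np.cos(j * np.arccos(lam))   # Chebyshev T_j evaluated on the spectrum
    assert np.allclose(block, np.diag(Tj))
print("Lemma 4 verified for j = 0..7")
```

Conjugating the block by the eigenvector matrix of A recovers the statement in the computational basis.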

We are interested in a truncated Chebyshev series

$$\tilde{f}(x)=\mathop{\sum }\limits_{j=0}^{k}{a}_{j}{{\mathcal{T}}}_{j}(x)$$
(23)

providing a ν-approximation to the target real-valued function \(f:[-1,1]\to {\mathbb{R}}\), that is, \(\mathop{\max }\nolimits_{x\in [-1,1]}\left\vert f(x)-\tilde{f}(x)\right\vert \le \nu\). The Chebyshev polynomials \({{\mathcal{T}}}_{j}\) form a key basis for function approximation, often leading to near-optimal approximation errors15. In particular, unless the target function is periodic and smooth, they tend to outperform Fourier approximations16. Complex-valued functions can be treated similarly by splitting them into their real and imaginary parts. The truncation order k is controlled by the desired accuracy ν in a problem-specific way (explicit examples are presented later).
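A truncated series of the form (23) can be obtained numerically, e.g. with NumPy's Chebyshev interpolation; the sketch below (our illustrative choice of target function) checks how the uniform error ν shrinks with the truncation order k for a smooth, non-periodic function:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)   # Runge function on [-1, 1]
xs = np.linspace(-1, 1, 10_001)

for k in (8, 16, 32, 64):
    a = C.chebinterpolate(f, k)            # coefficients a_0..a_k in Eq. (23)
    nu = np.max(np.abs(f(xs) - C.chebval(xs, a)))
    print(k, nu)  # the uniform error decreases geometrically with k
```

For analytic functions the error decays geometrically in k, which is the behavior exploited by the problem-specific truncation bounds used throughout the text.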

Hadamard test for block encodings

In Algorithms 1 and 2, for each jα or jα, lα sampled, only one measurement shot is obtained. Here we present lemmas showing the result of performing several measurement shots from the Hadamard tests in Fig. 1 with a fixed circuit, i.e., fixed jα (Lemma 5) or jα, lα (Lemma 6). The proofs can be found in the Supplementary Information.

Lemma 5

(Circuit for Algorithm 1) A single-shot measurement on the Hadamard test of Fig. 1a yields a random variable \({\bf{Had}}({U}_{A}^{j},\left\vert \phi \right\rangle ,\left\vert \psi \right\rangle )\in \{-1,1\}\) that satisfies

$${\mathbb{E}}\left[{\bf{Had}}({U}_{A}^{j},\left\vert \phi \right\rangle ,\left\vert \psi \right\rangle )\right]=\left\{\begin{array}{l}{\rm{Re}} \{\left\langle \phi \right\vert {{\mathcal{T}}}_{j}(A)\left\vert \psi \right\rangle \},\,\text{if}\,\,B={\mathbb{1}},\quad \\{\rm{Im}}\{\left\langle \phi \right\vert {{\mathcal{T}}}_{j}(A)\left\vert \psi \right\rangle \},\,\text{if}\,\,B={S}^{\dagger }\quad \end{array}\right..$$
(24)
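Lemma 5 can be checked statistically in simulation (a classical sketch: we compute the Hadamard-test outcome probability exactly and then draw ±1 shots; the single-ancilla block-encoding and all parameters are our own illustrative choices, with \(\left\vert \phi \right\rangle =\left\vert \psi \right\rangle\) and \(B={\mathbb{1}}\)):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = rng.uniform(-1, 1, size=4)
D = len(lam)

# single-ancilla qubitized block-encoding of A = diag(lam), as in Definition 1
U = np.zeros((2 * D, 2 * D))
for g, l in enumerate(lam):
    s = np.sqrt(1 - l**2)
    U[g, g], U[g, D + g] = l, -s
    U[D + g, g], U[D + g, D + g] = s, l

psi = rng.normal(size=D)
psi /= np.linalg.norm(psi)
psi_full = np.concatenate([psi, np.zeros(D)])   # |0>_a ⊗ |psi>_s

j = 5
exact = psi @ (np.cos(j * np.arccos(lam)) * psi)   # Re<psi|T_j(A)|psi>

# the Hadamard test outputs +1 with probability (1 + Re<0,psi|U^j|0,psi>)/2
p_plus = (1 + psi_full @ np.linalg.matrix_power(U, j) @ psi_full) / 2
shots = rng.choice([1, -1], size=200_000, p=[p_plus, 1 - p_plus])
print(shots.mean(), exact)  # sample mean converges to Re<psi|T_j(A)|psi>
```

The empirical mean of the ±1 outcomes converges to the expectation in Eq. (24) at the usual \(1/\sqrt{S}\) Monte-Carlo rate.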

Lemma 6

(Circuit for Algorithm 2) A single-shot measurement on the Hadamard test of Fig. 1b yields a random variable \({\bf{Had}}({U}_{A}^{j},{U}_{A}^{l},O)\) that satisfies

$${\mathbb{E}}\left[{\bf{Had}}({U}_{A}^{j},{U}_{A}^{l},O)\right]={\rm{Re}} \left\{{\rm{Tr}}[O\,{{\mathcal{T}}}_{j}(A)\,\varrho \,{{\mathcal{T}}}_{l}(A)]\right\}\,.$$
(25)

Resource estimates for FeMoco GSEE

Here, we provide details on the complexity analysis of GSEE for the specific case of FeMoco. We use the sparse qubitized Hamiltonian model of Berry et al.54 with the corrections given in ref. 55 and the active space of Li et al.56.

The molecule’s Hamiltonian is written as a linear combination \({H}^{{\prime} }=\mathop{\sum }\nolimits_{l = 1}^{L}{b}_{l}\,{V}_{l}\) of unitary operators Vl. A block-encoding of \(H={H}^{{\prime} }/\parallel {\bf{b}}{\parallel }_{1}\) is obtained using the method of linear combination of unitaries (LCU)57, i.e., encoding the coefficients bl in the amplitudes of a state of \({\mathcal{O}}(\log (L))\) ancillary qubits, which are used to control the action of each unitary Vl in the sum. In order to estimate the ground-state energy of \({H}^{{\prime} }\) up to the chemical accuracy of 0.0016Eh (Hartree), we can run the GSEE algorithm on H with precision \(\xi =0.001/\parallel {\bf{b}}{\parallel }_{1}\), leaving the remaining \(0.0006/\parallel {\bf{b}}{\parallel }_{1}\) tolerated error to imprecision in the block-encoding procedure.
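The LCU construction can be illustrated on a toy two-qubit Hamiltonian (the terms, coefficients, and names below are our own; PREP loads the normalized coefficients into the ancillas and SELECT applies the controlled unitaries):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

# toy two-qubit H' = sum_l b_l V_l
b = np.array([0.5, 0.3, 0.2])
V = [np.kron(Z, Z), np.kron(X, I2), np.kron(I2, X)]
l1 = b.sum()

# PREP on a = 2 ancilla qubits: |0> -> sum_l sqrt(b_l / l1) |l>
amps = np.zeros(4)
amps[:3] = np.sqrt(b / l1)
Q, _ = np.linalg.qr(np.column_stack([amps, np.eye(4)[:, 1:]]))
PREP = Q * np.sign(Q[0, 0] * amps[0])   # fix overall sign so PREP|0> = amps

# SELECT = sum_l |l><l| ⊗ V_l (identity on the unused index l = 3)
SEL = np.zeros((16, 16))
for l in range(4):
    P = np.zeros((4, 4))
    P[l, l] = 1.0
    SEL += np.kron(P, V[l] if l < 3 else np.eye(4))

# U = (PREP† ⊗ 1) SELECT (PREP ⊗ 1); its <0|_a U |0>_a block is H'/||b||_1
U = np.kron(PREP.T, np.eye(4)) @ SEL @ np.kron(PREP, np.eye(4))
H_norm = sum(bl * Vl for bl, Vl in zip(b, V)) / l1
print(np.allclose(U[:4, :4], H_norm))
```

The normalization by \(\parallel {\bf{b}}{\parallel }_{1}\) is intrinsic to LCU, which is why the algorithm runs on H rather than on \({H}^{{\prime} }\) directly.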

The number of terms L in the Hamiltonian, the 1-norm \(\parallel {\bf{b}}{\parallel }_{1}\) of their coefficients, and also the total number of qubits required depend on the active space used. In the case we consider, this is composed of n = 152 orbitals, each one corresponding to one qubit in the system. In ref. 55, App. A, it was shown that \(\parallel {\bf{b}}{\parallel }_{1}\) = 1547Eh and L = 440,501. Moreover, the authors show how to construct a qubitized block-encoding oracle for H explicitly using LCU with 2446 logical qubits and 1.8 × 104 Toffoli gates per oracle query. It is worth noting that, in principle, only n + a qubits are strictly required for LCU, with \(a=\lceil \log (L)\rceil =19\). Therefore, the majority of the qubits are used to load the Hamiltonian coefficients into the state of the a ancillas and could be removed if a more efficient loading scheme were used. They apply quantum phase estimation (QPE), which requires \(Q=\lceil \frac{\pi }{2\xi }\rceil \,\approx\, 2.4\times 1{0}^{6}\) oracle queries and \(2\,\log (Q+1)\,\approx\, 43\) additional control qubits. In our case, only one control qubit is necessary for the Hadamard test, while the number of oracle calls Q is to be compared to k or \({\mathbb{E}}[j]\).

Finally, to implement our algorithm, we need an initial state with a sufficiently large overlap η with the target ground state. In ref. 58 it was shown that η2 = 10−7 can be achieved using Slater determinants. Using this and the value of ξ given above, we explicitly construct the series approximating the Heaviside function θ(H). The truncation degree and the average degree obtained are k = 1.12 × 107 and \({\mathbb{E}}[j]=4.01\times 1{0}^{5}\), respectively. When multiplied by the above-mentioned gate complexity per oracle query, these give totals of 2.03 × 1011 and 7.26 × 109 Toffoli gates per circuit, respectively. Notice that, when compared to phase estimation, the coherent QSP algorithm demands more coherent queries to the oracle, while the randomized processing is less costly on average. At the same time, all three algorithms require a similar number of samples, proportional to η−2. We summarize the complexities of the three methods in Table 2.
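These headline numbers can be cross-checked with elementary arithmetic (a consistency check only; the degrees k and E[j] themselves come from the explicit series construction described in the text):

```python
import math

k, Ej = 1.12e7, 4.01e5        # truncation and average degrees from the text
toff = 1.8e4                  # Toffoli gates per oracle query (ref. 55)

print(k / Ej)                 # reduction factor, approximately 28
print(k * toff, Ej * toff)    # Toffoli totals, ~2.0e11 (QSP) and ~7.2e9 (average)

xi = 0.001 / 1547             # GSEE precision xi = 0.001 / ||b||_1
Q = math.ceil(math.pi / (2 * xi))
print(Q)                      # QPE query count, ~2.4e6
```

The small mismatch with the quoted Toffoli totals is a rounding effect of the displayed degrees.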

Table 2 Comparison of complexities of GSEE for FeMoco using the three different algorithms described in the text

Another interesting feature of the randomized scheme is that, due to its variable circuit depth, it allows for variable quantum error-correcting code distances. Consequently, the average number of physical qubits required for quantum error correction (QEC) and the runtime of the typical circuits, which have even lower depth than the average, can be reduced substantially further. To give a more quantitative idea, we estimated the QEC overhead using the surface code with the spreadsheet provided in ref. 59. We consider the total number of Toffoli gates obtained using the average degree \({\mathbb{E}}[j]\) (i.e., not taking into account the possible further reduction due to typical circuits with even lower depths). With a level-1 code distance d1 = 19 and a level-2 code distance d2 = 31, assuming an error rate per physical gate of 0.001, the error-corrected algorithm uses 8 million qubits to attain a global error budget of 1%. This means that all circuits with a depth smaller than the average will have an extremely high fidelity. The circuits corresponding to larger degrees will have an increasingly higher chance of failure, but they contribute less to the final estimate. Meanwhile, the same QEC overhead only ensures a roughly 20% error probability for the circuit at the truncation order k.