Abstract
We present a hybrid quantum-classical framework for simulating generic matrix functions that is more amenable to early fault-tolerant quantum hardware than standard quantum singular-value transformations. The method is based on randomization over the Chebyshev approximation of the target function while keeping the matrix oracle quantum, and is assisted by a variant of the Hadamard test that removes the need for postselection. The resulting statistical overhead is similar to the fully quantum case and does not incur any circuit depth degradation. On the contrary, the average circuit depth is shown to get smaller, yielding equivalent reductions in noise sensitivity, as explicitly shown for depolarizing noise and coherent errors. We apply our technique to partition-function estimation, linear system solvers, and ground-state energy estimation. For these cases, we prove advantages on average depths, including quadratic speedups on costly parameters and even the removal of the approximation-error dependence.
Introduction
Faster algorithms for linear algebra are a major promise of quantum computation, holding the potential for precious runtime speedups over classical methods. A modern, unified framework for such algorithms is given by the quantum signal processing (QSP)^{1,2} and, more generally, quantum singular-value transformation (QSVT)^{3} formalisms. These are powerful techniques to manipulate a matrix, coherently given by a quantum oracle, via polynomial transformations on its eigenvalues and singular values, respectively. The class of matrix arithmetic attained is remarkably broad, encompassing primitives as diverse as Hamiltonian simulation, matrix inversion, ground-state energy estimation, and Gibbs-state sampling, among others^{4}. Moreover, the framework often offers the state of the art in asymptotic query complexities (i.e. number of oracle calls), in some cases matching known complexity lower bounds. Nevertheless, the experimental requirements for full implementations are prohibitive for current devices, and it is not clear if the framework will be useful in practice before large-scale fault-tolerant quantum computers appear.
This has triggered a quest for early fault-tolerant algorithms for matrix processing that allow one to trade performance for nearer-term feasibility in a controlled way, i.e. with provable runtime guarantees^{5,6,7,8,9,10,11,12,13,14}. Particularly promising are randomized hybrid quantum-classical schemes to statistically simulate a matrix function via quantum implementations of more elementary ones^{7,8,9,10,11,12}. For instance, this has been applied to the Heaviside step function θ(H) of a Hamiltonian H, which allows for eigenvalue thresholding, a practical technique for Heisenberg-limited spectral analysis^{7}. Two input access models have been considered there: quantum oracles as a controlled unitary evolution of H^{7,8,13} and classical ones given by a decomposition of H as a linear combination of Pauli operators^{9,10,11,12}. In the former, one Monte-Carlo simulates the Fourier series of θ(H) by randomly sampling its harmonics. In the latter, in an additional level of randomization, one also probabilistically samples the Pauli terms from the linear combination.
Curiously, however, randomized quantum algorithms for matrix processing have been little explored beyond the specific case of the Heaviside function. Reference ^{11} put forward a randomized, qubit-efficient technique for Fourier-based QSP^{6,13} for generic functions. However, the additional level of randomization can detrimentally affect the circuit depth per run compared to coherent oracles. On the other hand, in the quantum-oracle setting, the randomized algorithms above have focused mainly on controlled unitary evolution as the input access model. This is convenient in specific cases where H can be analogically implemented. However, it leaves aside the powerful class of block-encoding oracles, i.e., unitary matrices with the input matrix as one of their blocks^{2}. Besides having a broader scope of applicability (including non-Hermitian matrices), such oracle types are also a more natural choice for digital setups. Moreover, randomized quantum algorithms have so far not addressed Chebyshev polynomials, the quintessential basis functions for approximation theory^{15}, which often attain better accuracy than Fourier series^{16}. Chebyshev polynomials, together with block-encoding oracles, provide the most sophisticated and general arena for quantum matrix arithmetic^{1,2,3,4}.
Here, we fill in this gap. We derive a semi-quantum algorithm for Monte-Carlo simulations of QSVT with provably better circuit complexities than fully-quantum schemes as well as advantages in terms of experimental feasibility. Our method estimates state amplitudes and expectation values involving a generic matrix function f(A), leveraging three main ingredients: (i) it samples each component of a Chebyshev series for f with a probability proportional to the magnitude of its coefficient in the series; (ii) it assumes coherent access to A via a block-encoding oracle; and (iii) f(A) is automatically extracted from its block-encoding without postselection, using a Hadamard test. The combination of (i) and (ii) leaves untouched the maximal query complexity k per run native to the Chebyshev expansion. In addition, the statistical overhead we pay for end-user estimations scales only with the l_{1}-norm of the Chebyshev coefficients. For the use cases we consider, this turns out to be similar (at worst up to logarithmic factors) to the operator norm of f(A), which would govern the statistical overhead if we used fully-quantum QSVT with a Hadamard test. That is, our scheme does not incur any significant degradation with respect to the fully-quantum case either in runtime or circuit depth. On the contrary, the average query complexity can be significantly smaller than k. We prove interesting speedups of the former over the latter for practical use cases.
These speedups translate directly into equivalent reductions in noise sensitivity: For simple models such as depolarization or coherent errors in the quantum oracle, we show that the estimation inaccuracy due to noise scales with the average query depth. In comparison, it scales with the maximal depth in standard QSVT implementations. Importantly, we implement each sampled Chebyshev polynomial with a simple sequence of queries to the oracle using qubitization; no QSP pulses are required throughout. Finally, (iii) circumvents the need for repeating until success or quantum amplitude amplification. That is, no statistical run is wasted, and no overhead in circuit depth is incurred. In addition, the fully quantum scheme requires an extra ancillary qubit controlling the oracle in order to implement the QSP pulses. All this renders our hybrid approach more experimentally friendly than coherent QSVT.
As use cases, we benchmark our framework on four end-user applications: partition-function estimation of classical Hamiltonians via quantum Markov-chain Monte Carlo (MCMC); partition-function estimation of quantum Hamiltonians via quantum imaginary-time evolution (QITE); linear system solvers (LSSs); and ground-state energy estimation (GSEE). The maximal and expected query depths per run, as well as the total expected runtime (taking into account sample complexity), are displayed in Table 1. In all cases, we systematically obtain the following advantages (both per run and in total) of expected versus maximal query complexities.
For MCMC, we prove a quadratic speedup on a factor \({\mathcal{O}}(\log ({Z}_{\beta }\,{e}^{\beta }/{\epsilon }_{{\rm{r}}}))\), where Z_{β} is the partition function to estimate, at inverse temperature β, and ϵ_{r} is the tolerated relative error. For QITE, we remove a factor \({\mathcal{O}}(\log (D\,{e}^{\beta }/{Z}_{\beta }\,{\epsilon }_{{\rm{r}}}))\) from the scaling, where D is the system dimension. For LSSs we consider two sub-cases: estimation of an entry of the (normalized) solution vector and of the expectation value of an observable O on it. We prove quadratic speedups on factors \({\mathcal{O}}\left(\log (\kappa /\epsilon )\right)\) and \({\mathcal{O}}\left(\log ({\kappa }^{2}\,\parallel O\parallel /\epsilon )\right)\) for the first and second sub-cases, respectively, where ∥O∥ is the operator norm of O, κ is the condition number of the matrix, and ϵ the tolerated additive error. This places our query depth at an intermediate position between that of the best known Chebyshev-based method^{17} and the optimal one in general^{18}. In turn, compared to the results obtained in^{11} via full randomization, our scaling is one power of κ superior. Finally, for GSEE, we prove a speedup on a factor that depends on the overlap η between the probe state and the ground state: the average query depth is \({\mathcal{O}}\left(\frac{1}{\xi }\sqrt{\log (1/\eta )}/\log (1/\xi )\right)\), whereas the maximal query depth is \({\mathcal{O}}\left(\frac{1}{\xi }\log (1/\eta )\right)\), with ξ the additive error in the energy estimate.
Results
Framework
Our basic setup is that of quantum singular value transformation (QSVT)^{3,4}. This is a powerful technique for synthesizing polynomial functions of a matrix A embedded into a block of a unitary matrix U_{A} (the block-encoding oracle), via polynomial transformations on its singular values. Combined with approximation theory^{19}, this leads to state-of-the-art query complexities and an elegant unifying structure for a variety of quantum algorithms of interest. For simplicity of the presentation, in the main text, we focus explicitly on the case of Hermitian matrices. There, QSVT reduces to the simpler setup of quantum signal processing (QSP)^{1,2}, describing eigenvalue transformations. The extension of our algorithms to QSVT for generic matrices is straightforward and is left to Supplementary Note 4. Throughout the paper, we adopt the shorthand notation [l] ≔ {0, …, l − 1} for any \(l\in {\mathbb{N}}\).
We next state our main results. First, we set up explicitly the two problems in question and then proceed to describe our randomized semi-quantum algorithm to solve each one of them, proving correctness and runtime, and performing an error-robustness analysis. We conclude by applying our general framework to a number of exemplary use cases of interest.
Problem statement
We consider the following two concrete problems (throughout the paper, we will use superscripts ^{(1)} or ^{(2)} on quantities referring to Problems 1 or 2, respectively):
Problem 1
(Transformed vector amplitudes) Given access to state preparation unitaries U_{ϕ} and U_{ψ} such that \({U}_{\psi }\left\vert 0\right\rangle =\left\vert \psi \right\rangle ,{U}_{\phi }\left\vert 0\right\rangle =\left\vert \phi \right\rangle\), a Hermitian matrix A, and a real-valued function f, obtain an estimate of
$${z}^{(1)}:= \left\langle \phi \right\vert f(A)\left\vert \psi \right\rangle$$(1)
to additive precision ϵ with failure probability at most δ.
This class of problems is relevant for estimating the overlap between a linearly transformed state and another state of interest. This is the case, e.g., in linear system solving, where one is interested in the i-th computational basis component of a quantum state of the form \({A}^{-1}\left\vert {\bf{b}}\right\rangle\) encoding the solution to the linear system (see the section “Quantum linear-system solvers” for details). The unitary U_{ϕ} preparing the computational-basis state \(\left\vert i\right\rangle\), in that case, is remarkably simple, given by a sequence of bit flips.
Problem 2
(Transformed observable expectation values) Given access to a state preparation ϱ, a Hermitian matrix A, an observable O, and a real-valued function f, obtain an estimate of
$${z}^{(2)}:= {\rm{Tr}}\left[O\,f(A)\,\varrho \,f{(A)}^{\dagger }\right]$$(2)
to additive precision ϵ with failure probability at most δ.
This is relevant, e.g., when A = H is a Hamiltonian, to estimate the partition function corresponding to H, as discussed below in the section “Relative-precision partition function estimation”.
We present randomized hybrid classical–quantum algorithms for these problems using Chebyshev-polynomial approximations of f and coherent access to a block-encoding of A. Similar problems have been addressed in^{11} but using Fourier approximations and randomizing also over a classical description of A in the Pauli basis.
Randomized semi-quantum matrix processing
Our framework is based on the Chebyshev approximation \(\tilde{f}=\mathop{\sum }\nolimits_{j = 0}^{k}{a}_{j}{{\mathcal{T}}}_{j}(x)\) of the function f and a modified Hadamard test involving the qubitized block-encoding oracle U_{A}. We denote by \({\bf{a}}:= \left\{{a}_{0},\ldots ,{a}_{k}\right\}\) the vector of Chebyshev coefficients of \(\tilde{f}\) and by \({\left\Vert {\bf{a}}\right\Vert }_{1}:= \mathop{\sum }\nolimits_{j = 0}^{k}| {a}_{j}|\) its ℓ_{1}-norm. The idea is to statistically simulate the coherent QSP algorithm using a hybrid classical/quantum procedure based on randomly choosing j ∈ [k + 1] according to its importance for \(\tilde{f}\) and then running a Hadamard test involving the block encoding \({U}_{A}^{j}\) of \({{\mathcal{T}}}_{j}(A)\). Pseudocodes for the algorithms are presented in Fig. 1a and b for Problems 1 and 2, respectively. In both cases, the Hadamard test is the only quantum subroutine. The total number of statistical runs will be \(\frac{2}{P}\,{S}^{(P)}\), with P = 1 or 2, where S^{(P)} will be given in Eqs. (8, 9) and (10) below. The factor \(\frac{2}{P}\) is a subtle difference between Algorithms 1 and 2 coming from the fact that the target quantity is a complex-valued amplitude in the former case, while in the latter it is a real number. This implies that two different types of Hadamard tests (each with S^{(1)} shots) are needed to estimate the real and imaginary parts of z^{(1)}, while z^{(2)} requires a single one. More technically, the procedure goes as follows. First, for every \(\alpha \in [\frac{2}{P}\,{S}^{(P)}]\), run the following two steps:

i. Classical subroutine: sample a Chebyshev polynomial degree j_{α} ∈ [k + 1] (and also l_{α} for P = 2) from the probability distribution weighted by the coefficients a of \(\tilde{f}\), defined by
$$p(\;j)=\frac{| {a}_{j}| }{{\left\Vert {\bf{a}}\right\Vert }_{1}},\quad \,\text{for all}\,\,j\in [k+1]\,.$$(3)
This has classical runtime \(\tilde{{\mathcal{O}}}(k)\).

ii. Quantum subroutine: if P = 1, run the Hadamard test in Fig. 1a with \({B}_{\alpha }={\mathbb{1}}\) for α < S^{(1)} or \({B}_{\alpha }={S}^{\dagger }:= \left\vert 0\right\rangle \left\langle 0\right\vert -i\left\vert 1\right\rangle \left\langle 1\right\vert\) for α ≥ S^{(1)} and use the resulting random bit \({b}_{\alpha }^{(1)}\in \{-1,1\}\) to record a sample of the variable
$${\tilde{z}}_{\alpha }^{(1)}:= {\left\Vert {\bf{a}}\right\Vert }_{1}\,{\rm{sgn}}({a}_{{j}_{\alpha }})\,{b}_{\alpha }^{(1)}\,.$$(4)
If P = 2, in turn, run the test in Fig. 1b to get as outcomes a random bit \({b}_{\alpha }^{(2)}\in \{-1,1\}\) and a random number \({\omega }_{\alpha }\in {\{{o}_{m}\}}_{m\in [D]}\), where o_{m} is the m-th eigenvalue of O, and use this to record a sample of
$${\tilde{z}}_{\alpha }^{(2)}:=\,\parallel\!{\bf{a}}{\parallel }_{1}^{2}\,{\rm{sgn}}({a}_{{j}_{\alpha }}){\rm{sgn}}({a}_{{l}_{\alpha }})\,{b}_{\alpha }^{(2)}\,{\omega }_{\alpha }.$$(5)
Then, in a final classical step, obtain the desired estimate \({\tilde{z}}^{(P)}\) by computing the empirical mean over all the recorded samples as follows:
The following two theorems respectively prove the correctness of the estimator and establish the complexity of the algorithm. A simple but crucial auxiliary result for the correctness is the observation that the Hadamard test statistics (i.e. the expectation value of \({b}_{\alpha }^{(P)}\)) depends only on the correct block of \({U}_{A}^{j}\), removing the need for postselection. With this, in Supplementary Note 1, we prove the following.
Theorem 1
(Correctness of the estimator) The empirical means \({\tilde{z}}^{(1)}\) and \({\tilde{z}}^{(2)}\) are unbiased estimators of \(\left\langle \phi \,\right\vert\,\tilde{f}(A)\left\vert \psi \right\rangle\) and \({\rm{Tr}}[O\,\tilde{f}(A)\,\varrho \,\tilde{f}{(A)}^{\dagger }]\), respectively.
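The sampling procedure for Problem 1 can be emulated classically on a small instance to see the unbiasedness at work. The following is only a minimal sketch under stated assumptions: the matrix `A`, the vectors `psi` and `phi`, and the coefficient vector `a` are hypothetical toy data (not taken from the paper), the vectors are real so that a single Hadamard-test type suffices, and the quantum subroutine is replaced by drawing a ±1 bit whose mean is the ideal value \(\left\langle \phi \right\vert {{\mathcal{T}}}_{j}(A)\left\vert \psi \right\rangle\).

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)

# Hypothetical toy instance: random real symmetric A with spectrum inside [-1, 1].
D = 4
M = rng.normal(size=(D, D))
A = (M + M.T) / 2
A /= np.linalg.norm(A, 2) * 1.1  # rescale so that ||A|| < 1

# Real unit vectors, so the target amplitude is real.
psi = rng.normal(size=D)
psi /= np.linalg.norm(psi)
phi = rng.normal(size=D)
phi /= np.linalg.norm(phi)

# Illustrative Chebyshev coefficients a_0..a_k of some polynomial f~ (assumed).
a = np.array([0.5, -0.3, 0.0, 0.2, -0.1])
l1 = np.abs(a).sum()
p = np.abs(a) / l1  # sampling distribution p(j) of Eq. (3)


def cheb_matrix(j, A):
    """T_j(A) via the recurrence T_{j+1} = 2 A T_j - T_{j-1}."""
    T_prev, T_cur = np.eye(len(A)), A.copy()
    if j == 0:
        return T_prev
    for _ in range(j - 1):
        T_prev, T_cur = T_cur, 2 * A @ T_cur - T_prev
    return T_cur


# Ideal Hadamard-test means <phi|T_j(A)|psi>; each run on hardware would
# return a single bit b = +/-1 with this expectation value.
means = np.array([phi @ cheb_matrix(j, A) @ psi for j in range(len(a))])

S = 200_000
js = rng.choice(len(a), size=S, p=p)               # classical subroutine
b = np.where(rng.random(S) < (1 + means[js]) / 2, 1.0, -1.0)  # emulated test
z_samples = l1 * np.sign(a[js]) * b                # samples of Eq. (4)
z_est = z_samples.mean()

# Exact <phi| f~(A) |psi> by diagonalization, for comparison.
evals, evecs = np.linalg.eigh(A)
z_exact = phi @ (evecs @ np.diag(C.chebval(evals, a)) @ evecs.T) @ psi

print(z_est, z_exact)
```

Each sample has magnitude \({\left\Vert {\bf{a}}\right\Vert }_{1}\), so the standard error of the mean is at most \({\left\Vert {\bf{a}}\right\Vert }_{1}/\sqrt{S}\), which is the origin of the ℓ_{1}-norm dependence of the statistical overhead.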
Importantly, since \(\tilde{f}\) is a ν-approximation to f, the obtained \({\tilde{z}}^{(P)}\) are actually biased estimators of the ultimate quantities of interest z^{(P)} in Eqs. (1) and (2). Such biases are always present in quantum algorithms based on approximate matrix functions, including the fully coherent schemes for QSP^{1,2} and QSVT^{3,4}. Nevertheless, they can be made arbitrarily small in a tunable manner by increasing the truncation order k in the polynomial approximation \(\tilde{f}\).
Here, it is convenient to set k so that ν^{(P)} ≤ ϵ/2, where ν^{(1)} ≔ ν and \({\nu }^{(2)}:= \nu \,(2\,\Vert f(A)\,\Vert\,\Vert\,O\,\Vert +\nu )\). This limits the approximation error in Eqs. (1) or (2) to at most ϵ/2. In addition, demanding the statistical error to be also ϵ/2 leads to (see Supplementary Note 1) the following end-to-end sample and oracle-query complexities for the algorithm.
Theorem 2
(Complexity of the estimation) Let ϵ > 0 and δ > 0 be, respectively, the tolerated additive error and failure probability; let a be the vector of coefficients in \(\tilde{f}\) and ν^{(P)} ≤ ϵ/2 the error in z^{(P)} from approximating f with \(\tilde{f}\). Then, if the number of samples is at least
Equations (6) and (7) give an ϵ-precise estimate of z^{(P)} with confidence 1−δ. Moreover, the total expected runtime is \({Q}^{(P)}:= 2\,{\mathbb{E}}[\;j]\,{S}^{(P)}\), where \({\mathbb{E}}[\;j]:= \mathop{\sum }\nolimits_{j = 0}^{k}j\,p(\;j)\).
A remarkable consequence of this theorem is that the expected number of queries per statistical run is \(P\times {\mathbb{E}}[j]\). Instead, if we used standard QSVT (together with a similar Hadamard test to avoid postselection), each statistical run would take P × k queries (and an extra ancillary qubit coherently controlling everything else would be required). As shown in Fig. 1c, \({\mathbb{E}}[j]\) can be significantly smaller than k in practice. In fact, for the use cases we analyze, we prove scaling advantages of \({\mathbb{E}}[j]\) over k. These query-complexity advantages translate directly into reductions in circuit depth and, hence, also in noise sensitivity (see next subsection). As for sample complexity, the statistical overhead of our semi-quantum algorithms scales with \({\left\Vert {\bf{a}}\right\Vert }_{1}\), while that of fully-quantum ones would have a similar scaling with \(\left\Vert f(A)\right\Vert\), due to the required normalization for block encoding. Interestingly, in all the use cases analyzed, \({\left\Vert {\bf{a}}\right\Vert }_{1}\) and \(\left\Vert f(A)\right\Vert\) differ at most by a logarithmic factor. Finally, another appealing feature is that our approach relaxes the need to compute the QSP/QSVT angles, which is currently tackled with an extra classical pre-processing stage of runtime \({\mathcal{O}}\left(\,\text{poly}\,(k)\right)\)^{1,2,3,4}.
We emphasize that here we have assumed Hermitian A for the sake of clarity, but a straightforward extension of our randomized scheme from QSP to QSVT (see Supplementary Note 4) gives the generalization to generic A. Moreover, in Supplementary Note 1, we also extend the construction to Chebyshev polynomials of the second kind. This is useful for the ground-state energy estimation in the section “Ground-state energy estimation”.
Intrinsic noise-sensitivity reduction
Here, we study how the reduction in query complexity per run from k to the average value \({\mathbb{E}}[j]\) translates into sensitivity to experimental noise. The aim is to make a quantitative but general comparison between our randomized semi-quantum approach and fully-quantum schemes, remaining agnostic to the specific choice of operator function, circuit compilation, or physical platform. To this end, we consider two toy error models that allow one to allocate one unit of noise per oracle query.
Our first error model consists of a faulty quantum oracle given by the ideal oracle followed by a globally depolarizing channel Λ of noise strength p, defined by^{20}
$$\Lambda [\varrho ]:= (1-p)\,\varrho +p\,\frac{{\mathbb{1}}}{{D}_{{\rm{tot}}}}\,.$$
Here, ϱ is the joint state of the total Hilbert space in Fig. 1a (system register, oracle ancilla, and Hadamard test ancilla) and D_{tot} its dimension. In Supplementary Note 2, we prove the following.
Theorem 3
(Average noise sensitivity) Let \({\tilde{z}}^{(P)}\) be the ideal estimators (6) and (7) and \({\tilde{z}}^{(P),\Lambda }\) their noisy version with Λ acting after each oracle query in Fig. 1. Then
where
Our second model is coherent errors that make the quantum oracle no longer the exact block encoding U_{A} of A but only an ε-approximate block encoding (a unitary with operator-norm distance ε from U_{A}). In Supplementary Note 2, we show that Eqs. (11) and (12) hold also there with p replaced by 2ε.
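The mechanism behind the average-depth scaling can be illustrated numerically for Problem 1 under the depolarizing toy model. The sketch below assumes that a global depolarizing channel of strength p acting after each of the j queries damps the ideal Hadamard-test mean by \({(1-p)}^{j}\), since the maximally mixed component yields an unbiased bit. It then checks the resulting bias against \(p\,{\left\Vert {\bf{a}}\right\Vert }_{1}\,{\mathbb{E}}[j]\), which follows from Bernoulli's inequality \(1-{(1-p)}^{j}\le j\,p\). The coefficients `a` and means `m` are hypothetical random data, and the constants need not coincide with those of Eqs. (11) and (12).

```python
import numpy as np

rng = np.random.default_rng(1)

k, p_noise = 30, 1e-3
j = np.arange(k + 1)

# Hypothetical decaying Chebyshev coefficients and ideal means |m_j| <= 1.
a = rng.normal(size=k + 1) / (1 + j) ** 2
m = rng.uniform(-1, 1, size=k + 1)

l1 = np.abs(a).sum()
prob = np.abs(a) / l1           # sampling distribution over degrees
mean_depth = (j * prob).sum()   # E[j], the average number of oracle queries

ideal = (a * m).sum()                            # noiseless expectation
noisy = (a * (1 - p_noise) ** j * m).sum()       # each term damped by (1-p)^j
bias = abs(ideal - noisy)

bound_avg = p_noise * l1 * mean_depth  # scales with the average depth E[j]
bound_max = p_noise * l1 * k           # what a depth-k circuit would suggest

print(bias, bound_avg, bound_max)
```

In this toy setting the bias is controlled by the average depth rather than by the maximal degree k, which is the qualitative content of the comparison that follows.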
It is instructive to compare Eqs. (11) and (12) with the inaccuracy for the corresponding fully-quantum scheme. A fair scenario for that comparison (in the case of Problem 1) is to equip the standard QSVT with a Hadamard test similar to the ones in Fig. 1 so as to also circumvent the need for postselection. Notice that, while in our randomized method only the Hadamard ancilla controls the calls to the oracle, the standard QSVT circuit involves two-qubit control to also implement the pulses that determine the Chebyshev coefficients. As a consequence, the underlying gate complexity per oracle query would be considerably higher than for our schemes (with single-qubit gates becoming two-qubit gates, two-qubit gates becoming Toffoli gates, etc.). For this reason, the resulting noise strength p_{fq} is expected to be larger than p. The left-hand side of Eq. (11) would then (see Supplementary Note 2) be upper-bounded by p_{fq} E_{fq}, with \({E}_{{\rm{fq}}}=k\,| \left\langle\,\phi\;\right\vert \tilde{f}(A)\left\vert \psi \right\rangle |\), where p_{fq} > p and \(k\,>\, {\mathbb{E}}[\;j]\).
Another natural scenario for comparison is where the fully-quantum algorithm does not leverage a Hadamard test but implements postselection measurements on the oracle ancilla, in a repeat-until-success strategy. This comparison applies only to Problem 2 since one cannot directly measure the complex amplitudes for Problem 1. The advantage, though, is that the circuits are now directly comparable because the gate complexities per oracle query are essentially the same (the fully quantum scheme has extra QSP pulses, but these are single-qubit gates whose error contribution is low). Hence, similar error rates to p are expected here so that one would have the equivalent of Eq. (12) being \({\mathcal{O}}(k\,p)\). This is already worse than Eq. (12) because \(k \,>\, {\mathbb{E}}[j]\), as already discussed. Moreover, with postselection, one additionally needs to estimate normalizing constants with an independent set of experimental runs, which incurs extra systematic and statistical errors. In contrast, our method does not suffer from this issue, as it directly gives the estimates in Eqs. (1) or (2) regardless of state normalization (see the use cases below).
Finally, a third possibility could be to combine the fully quantum scheme with quantum amplitude amplification to manage the postselection. This would quadratically improve the dependence on the postselection probability. However, the circuit depth would then gain a factor inversely proportional to the square root of the postselection probability. Unfortunately, this is far out of reach of early fault-tolerant hardware.
In what follows, we illustrate the usefulness of our framework with four use cases of practical relevance: partition function estimation (for both classical and general Hamiltonians), linear system solving, and ground-state energy estimation. These correspond to f(x) = x^{t}, e^{−βx}, x^{−1}, and θ(x), respectively. The end-to-end complexities for each case are summarized in Table 1.
Relative-precision partition function estimation
Partition function estimation is a quintessential hard computational problem, with applications ranging from statistical physics to generative machine learning, as in Markov random fields^{21}, Boltzmann machines^{22}, and even the celebrated transformer architecture^{23} from large language models. Partition functions also appear naturally in other problems of practical relevance, such as constraint satisfaction problems^{24}.
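The partition function of a Hamiltonian H at inverse temperature β is defined as
$${Z}_{\beta }:= {\rm{Tr}}\left[{e}^{-\beta H}\right]\,.$$
One is typically interested in the problem of estimating Z_{β} to relative error ϵ_{r}, that is, finding \({\tilde{Z}}_{\beta }\) such that
$$| {\tilde{Z}}_{\beta }-{Z}_{\beta }| \le {\epsilon }_{{\rm{r}}}\,{Z}_{\beta }\,.$$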
This allows for the estimation of relevant thermodynamic functions, such as the Helmholtz free energy \(F=-\frac{1}{\beta }\log {Z}_{\beta }\), to additive precision. The naive classical algorithm based on direct diagonalization runs in time \({\mathcal{O}}({D}^{3})\), where \(D=\,\text{dim}\,({{\mathcal{H}}}_{s})\) is the Hilbert space dimension. Although it can be improved to \({\mathcal{O}}(D)\) using the kernel polynomial method^{25} if H is sparse, one expects no general-case efficient algorithm to be possible due to complexity theory arguments^{26}. In turn, if the Hamiltonian is classical (diagonal), Z_{β} can be obtained exactly in classical runtime \({\mathcal{O}}(D)\). General-purpose quantum algorithms (that work for any inverse temperature and any Hamiltonian) have been proposed^{27,28,29}. The list includes another algorithm^{28} that, like ours, utilizes the Hadamard test and a block-encoding of the Hamiltonian.
In the following, we present two different quantum algorithms for partition function estimation: one for classical Ising models, based on the Markov-chain Monte-Carlo (MCMC) method, and another for generic non-commuting Hamiltonians, based on quantum imaginary-time evolution (QITE) simulation^{5,30}.
Partition function estimation via MCMC
Here, we take H as the Hamiltonian of a classical Ising model. As such, spin configurations, denoted by \(\left\vert {\bf{y}}\right\rangle\), are eigenstates of H with corresponding energies E_{y}. Let us define the coherent version of the Gibbs state \(\left\vert \sqrt{{\boldsymbol{\pi }}}\right\rangle := {Z}_{\beta }^{-1/2}{\sum }_{{\bf{y}}}{e}^{-\beta {E}_{{\bf{y}}}/2}\left\vert {\bf{y}}\right\rangle\). Then, for any \(\left\vert {\bf{y}}\right\rangle\), the partition function satisfies the identity
$${Z}_{\beta }=\frac{{e}^{-\beta {E}_{{\bf{y}}}}}{\left\langle {\bf{y}}\right\vert {\Pi }_{{\boldsymbol{\pi }}}\left\vert {\bf{y}}\right\rangle }\,,$$
with \({\Pi }_{{\boldsymbol{\pi }}}:= \left\vert \sqrt{{\boldsymbol{\pi }}}\right\rangle \left\langle \sqrt{{\boldsymbol{\pi }}}\right\vert\). Below we discuss how to use our framework to obtain an estimation of \(\left\langle {\bf{y}}\right\vert {\Pi }_{{\boldsymbol{\pi }}}\left\vert {\bf{y}}\right\rangle\) for a randomly sampled \(\left\vert {\bf{y}}\right\rangle\) and, therefore, approximate the partition function.
Let A be the discriminant matrix^{31} of a Markov chain having the Gibbs state of H at inverse temperature β as its unique stationary state. The Szegedy quantum walk unitary^{31} provides a qubitized block-encoding U_{A} of A that can be efficiently implemented^{32}. A useful property of A is that the monomial A^{t} approaches Π_{π} for sufficiently large integer t (the precise statement is given in Supplementary Note 3). This implies that \(\left\langle {\bf{y}}\right\vert {\Pi }_{{\boldsymbol{\pi }}}\left\vert {\bf{y}}\right\rangle\) can be estimated using Alg. 1 with f(A) = A^{t} and \(\left\vert \psi \right\rangle =\left\vert \phi \right\rangle =\left\vert {\bf{y}}\right\rangle\). In this case, the state preparation unitaries U_{ψ} = U_{ϕ} will be simple bit flips.
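The convergence of A^{t} to the Gibbs projector, and how a diagonal entry of A^{t} determines Z_{β}, can be checked classically on a tiny instance. The sketch below makes several assumptions that are not taken from the paper: a three-spin open Ising chain, a single-spin-flip Metropolis chain made lazy (so that all eigenvalues are non-negative), and the standard discriminant \({A}_{{\bf{x}}{\bf{y}}}=\sqrt{{P}_{{\bf{x}}{\bf{y}}}{P}_{{\bf{y}}{\bf{x}}}}\) of a reversible chain. The power A^{t} is computed by dense linear algebra rather than by a quantum walk.

```python
import itertools
import numpy as np

n, beta = 3, 0.5  # hypothetical toy parameters

# Spin configurations and energies of an open Ising chain, E = -sum_i s_i s_{i+1}.
configs = list(itertools.product([-1, 1], repeat=n))
index = {s: i for i, s in enumerate(configs)}
energy = np.array(
    [-sum(s[i] * s[i + 1] for i in range(n - 1)) for s in configs], dtype=float
)
D = len(configs)

# Single-spin-flip Metropolis transition matrix with Gibbs stationary state.
P = np.zeros((D, D))
for s in configs:
    x = index[s]
    for i in range(n):
        flipped = list(s)
        flipped[i] *= -1
        y = index[tuple(flipped)]
        P[x, y] = min(1.0, np.exp(-beta * (energy[y] - energy[x]))) / n
    P[x, x] = 1.0 - P[x].sum()  # rejection probability (diagonal was still zero)
P = 0.5 * (np.eye(D) + P)       # lazy chain: eigenvalues in [0, 1]

# Discriminant matrix; symmetric, with sqrt(pi) as eigenvalue-1 eigenvector.
A = np.sqrt(P * P.T)

Z_exact = np.exp(-beta * energy).sum()
sqrt_pi = np.exp(-beta * energy / 2) / np.sqrt(Z_exact)

t = 500
At = np.linalg.matrix_power(A, t)  # approaches |sqrt(pi)><sqrt(pi)|

y = 0  # any configuration works; low-energy ones have larger overlap
Z_est = np.exp(-beta * energy[y]) / At[y, y]

print(Z_est, Z_exact)
```

In the quantum algorithm the entry \(\left\langle {\bf{y}}\right\vert {A}^{t}\left\vert {\bf{y}}\right\rangle\) is not computed directly but estimated statistically, which is where the sample complexity enters.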
A ν-approximation \(\tilde{f}(A)\) can be constructed by truncating the Chebyshev representation of A^{t} to order \(k=\sqrt{2\,t\log (2/\nu )}\)^{19}. The l_{1}-norm of the corresponding coefficient vector is \({\left\Vert {\bf{a}}\right\Vert }_{1}=1-\nu\). For this Chebyshev series, the ratio \({\mathbb{E}}[j]/k\) between the average and the maximum query complexities can be shown (see Supplementary Note 3) to be at most \({(1-\nu )}^{-1}/\sqrt{\pi \,\log (2/\nu )}\) for large t. This implies that the more precise the estimation, the larger the advantage of the randomized algorithm in terms of total expected runtime. For instance, for ν = 10^{−2}, the ratio is roughly equal to 0.25.
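The ratio \({\mathbb{E}}[j]/k\) for the monomial can be evaluated exactly for a given t. The sketch below uses the standard expansion \({x}^{t}={2}^{1-t}\sum_{j}\binom{t}{(t-j)/2}{{\mathcal{T}}}_{j}(x)\), with j running over values of the same parity as t and the j = 0 term halved. The values t = 400 and ν = 10^{−2} are illustrative choices, and the truncation order is rounded up to an integer, which is an assumption about how the formula for k would be applied in practice.

```python
import math


def monomial_cheb_coeffs(t):
    """Chebyshev coefficients a_0..a_t of x**t (all non-negative)."""
    a = [0.0] * (t + 1)
    for j in range(t % 2, t + 1, 2):
        c = 2 * math.comb(t, (t - j) // 2) / 2**t
        a[j] = c / 2 if j == 0 else c  # the j = 0 term is halved
    return a


t, nu = 400, 1e-2  # illustrative values
k = math.ceil(math.sqrt(2 * t * math.log(2 / nu)))  # truncation order, rounded up

a = monomial_cheb_coeffs(t)[: k + 1]  # truncated series
l1 = sum(a)                           # l1-norm (coefficients are non-negative)
mean_j = sum(j * aj for j, aj in enumerate(a)) / l1  # E[j] under p(j) = a_j / l1
ratio = mean_j / k

bound = 1 / ((1 - nu) * math.sqrt(math.pi * math.log(2 / nu)))

print(k, l1, mean_j, ratio, bound)
```

For these values the l_{1}-norm of the truncated series stays above 1 − ν and the computed ratio falls below the quoted bound of roughly 0.25.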
To estimate the partition function up to relative error ϵ_{r}, Alg. 1 needs to estimate \(\left\langle {\bf{y}}\right\vert {\Pi }_{{\boldsymbol{\pi }}}\left\vert {\bf{y}}\right\rangle\) with additive error \(\epsilon =\frac{{e}^{-\beta {E}_{{\bf{y}}}}}{2\,{Z}_{\beta }}{\epsilon }_{{\rm{r}}}\) (see Supplementary Note 3). In Supplementary Note 3, we show that the t and ν required for that yield a maximum query complexity per run of \(k=\sqrt{\frac{2}{\Delta }}\log (\frac{12\,{Z}_{\beta }\,{e}^{\beta {E}_{{\bf{y}}}}}{{\epsilon }_{{\rm{r}}}})\) and an average query complexity of \({\mathbb{E}}[j]=\sqrt{\frac{2}{\pi \,\Delta }\log (\frac{12\,{Z}_{\beta }\,{e}^{\beta {E}_{{\bf{y}}}}}{{\epsilon }_{{\rm{r}}}})}\), where Δ is the spectral gap of A. Moreover, from Theorem 2, the necessary sample complexity is \({S}^{(1)}=64\,{e}^{2\beta {E}_{{\bf{y}}}}\,{Z}_{\beta }^{2}\,\frac{\log (2/\delta )}{{\epsilon }_{{\rm{r}}}^{2}}\). This leads to the total expected runtime in Table 1.
Three important observations about the algorithm’s complexities are in order. First, the total expected runtime has no explicit dependence on the Hilbert space dimension D and maintains the square-root dependence on Δ (a Szegedy-like quadratic quantum speedup^{31}). Second, all three complexities in the first row of Table 1 depend on the product \({Z}_{\beta }\,{e}^{\beta {E}_{{\bf{y}}}}={\mathcal{O}}\left({e}^{\beta ({E}_{{\bf{y}}}-{E}_{\min })}\right)\), with \({E}_{\min }\) the minimum eigenvalue of H, where the scaling holds for large β. This scaling plays more in our favor the lower the energy E_{y} of the initial state y is. Hence, by uniformly sampling a constant number of different bit-strings y and picking the lowest-energy one, one ensures a convenient initial state. Third, the quadratic advantage featured by \({\mathbb{E}}[j]\) over k on the logarithmic term is an interesting type of speedup entirely due to the randomization over the components of the Chebyshev series.
Finally, the total expected runtime obtained can potentially provide a quantum advantage over classical estimations in regimes where \(\frac{{e}^{2\beta ({E}_{{\bf{y}}}-{E}_{\min })}}{\sqrt{\Delta }\,{\epsilon }_{{\rm{r}}}^{2}} \,<\, D\).
Partition function estimation via QITE
Alternatively, the partition function associated with a Hamiltonian H can be estimated by quantum simulation of imaginary time evolution (QITE). This method applies to any Hamiltonian (not just classical ones), assuming a block-encoding of H. Z_{β} can be written in terms of the expectation value of the QITE propagator e^{−βH} over the maximally mixed state \({\varrho }_{0}:= \frac{{\mathbb{1}}}{D}\), that is,
$${Z}_{\beta }=D\,{\rm{Tr}}\left[{e}^{-\beta H/2}\,{\varrho }_{0}\,{e}^{-\beta H/2}\right]\,.$$
Therefore, we can apply our Alg. 2 with \(A=H,O=D{\mathbb{1}},\varrho ={\varrho }_{0}\), and f(H) = e^{−βH/2} to estimate Z_{β} with relative precision ϵ_{r} and confidence 1−δ. The sample complexity is obtained from Eq. (10) as \({S}^{(2)}=\frac{8\,{D}^{2}{e}^{2\beta }}{{\epsilon }_{{\rm{r}}}^{2}{Z}_{\beta }^{2}}\log \frac{2}{\delta }\), by setting the additive error equal to Z_{β}ϵ_{r}.
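As a quick sanity check of this choice of inputs, the target quantity of Problem 2 with \(O=D{\mathbb{1}}\), \(\varrho ={\varrho }_{0}\), and f(H) = e^{−βH/2} can be evaluated by exact diagonalization and compared with \({\rm{Tr}}[{e}^{-\beta H}]\). The sketch below uses a hypothetical random Hermitian H normalized to unit operator norm and illustrative values of β, ϵ_{r}, and δ. The sample count simply evaluates the expression for S^{(2)} quoted above.

```python
import numpy as np

rng = np.random.default_rng(2)

D, beta = 8, 2.0  # hypothetical dimension and inverse temperature

# Random Hermitian H with ||H|| = 1 (toy stand-in for a block-encoded Hamiltonian).
M = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
H = (M + M.conj().T) / 2
H /= np.linalg.norm(H, 2)

evals, V = np.linalg.eigh(H)
f_H = V @ np.diag(np.exp(-beta * evals / 2)) @ V.conj().T  # f(H) = exp(-beta H / 2)

rho0 = np.eye(D) / D   # maximally mixed state
O = D * np.eye(D)      # observable O = D * identity

z2 = np.trace(O @ f_H @ rho0 @ f_H.conj().T).real  # Tr[O f(H) rho0 f(H)^dagger]
Z_exact = np.exp(-beta * evals).sum()              # Tr[exp(-beta H)]

# Sample count S^(2) from the expression in the text, for illustrative targets.
eps_r, delta = 0.05, 0.01
S2 = 8 * D**2 * np.exp(2 * beta) / (eps_r**2 * Z_exact**2) * np.log(2 / delta)

print(z2, Z_exact, S2)
```

Since \({Z}_{\beta }\ge D\,{e}^{-\beta }\) when ∥H∥ ≤ 1, the ratio \(D\,{e}^{\beta }/{Z}_{\beta }\) entering S^{(2)} is at least 1 and grows as the spectrum concentrates at high energies.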
We use the Chebyshev approximation of the exponential function introduced in ref. ^{19}, which has a quadratically better asymptotic dependence on β than other well-known expansions such as the Jacobi-Anger decomposition^{5}. This expansion was used before to implement the QITE propagator using QSVT coherently^{3}. The resulting truncated Chebyshev series has order \(k=\sqrt{2\,\max \left\{\frac{{e}^{2}\beta }{2},\log \left(\frac{8D\,{e}^{\beta }}{{Z}_{\beta }\,{\epsilon }_{{\rm{r}}}}\right)\right\}\,\log \left(\frac{16D\,{e}^{\beta }}{{Z}_{\beta }\,{\epsilon }_{{\rm{r}}}}\right)}\) and coefficient l_{1}-norm \({\left\Vert {\bf{a}}\right\Vert }_{1}\le {e}^{\beta /2}+\nu\) (see Supplementary Note 3). Interestingly, the average query depth does not depend on the precision of the estimation but scales as \({\mathcal{O}}(\sqrt{\beta })\) with a modest constant factor for any ϵ_{r} (see Supplementary Note 3). This implies an advantage of \({\mathcal{O}}\left(\log \left(\frac{D\,{e}^{\beta }}{{Z}_{\beta }\,{\epsilon }_{{\rm{r}}}}\right)\right)\) in terms of overall runtime as compared to coherent QSVT, which is again entirely due to our randomization scheme.
Overall, this gives our algorithm a total expected runtime of \({\mathcal{O}}\left(\frac{{D}^{2}\sqrt{\beta }\,{e}^{2\beta }}{{Z}_{\beta }^{2}}\frac{\log (2/\delta )}{{\epsilon }_{{\rm{r}}}^{2}}\right)\). The previous state-of-the-art algorithm from ref. ^{28} has runtime \(\tilde{{\mathcal{O}}}\left(\frac{{D}^{2}{D}_{a}^{2}{e}^{2\beta }{\beta }^{2}}{{\epsilon }_{{\rm{r}}}^{2}{Z}_{\beta }^{2}}\log \frac{1}{\delta }\right)\). Compared with that, we get a quartic speedup in β together with the entire removal of the dependence on \({D}_{a}^{2}\). The improvement comes from not estimating each Chebyshev term individually and allowing the ancillas to be pure while only the system is initialized in the maximally mixed state.
Finally, compared to the \({\mathcal{O}}({D}^{3})\) scaling of the classical algorithm based on exact diagonalization, our expected runtime has a better dependence on D. Moreover, in the regime of small β such that \({Z}_{\beta }^{2} \,>\, {\mathcal{O}}\left(\sqrt{\beta }\,{e}^{2\beta }\log (1/\delta )/{\epsilon }_{{\rm{r}}}^{2}\right)\), the expected runtime can be even better than that of the kernel method, which scales as \({\mathcal{O}}(D)\).
Quantum linear-system solvers
Given a matrix \(A\in {{\mathbb{C}}}^{D}\times {{\mathbb{C}}}^{D}\) and a vector \({\bf{b}}\in {{\mathbb{C}}}^{D}\), the task is to find a vector \({\bf{x}}\in {{\mathbb{C}}}^{D}\) such that \(A\,{\bf{x}}={\bf{b}}\).
The best classical algorithm for a generic A is based on Gaussian elimination, with a runtime \({\mathcal{O}}({D}^{3})\)^{33}. For A positive semidefinite and sparse, with sparsity (i.e. maximal number of nonzero elements per row or column) s, the conjugate gradient algorithm^{34} can reduce this to \({\mathcal{O}}(Ds\kappa )\), where \(\kappa := \left\Vert A\right\Vert \,\left\Vert {A}^{-1}\right\Vert\) is the condition number of A. In turn, the randomized Kaczmarz algorithm^{35} can yield an ϵ-precise approximation of a single component of x in \({\mathcal{O}}\left(s\,{\kappa }_{{\rm {F}}}^{2}\log (1/\epsilon )\right)\), with \({\kappa }_{{\rm {F}}}:= {\left\Vert A\right\Vert }_{{\rm {F}}}\left\Vert {A}^{-1}\right\Vert\) and \({\left\Vert A\right\Vert }_{{\rm {F}}}\) the Frobenius norm of A.
In contrast, quantum linear-system solvers (QLSSs)^{3,4,17,18,36,37,38,39,40} prepare a quantum state that encodes the normalized version of the solution vector x in its amplitudes. More precisely, given quantum oracles for A and \(\left\vert {\bf{b}}\right\rangle := \frac{1}{{\left\Vert {\bf{b}}\right\Vert }_{2}}{\sum }_{i}{b}_{i}\left\vert i\right\rangle\) as inputs, they output the state \(\left\vert {\bf{x}}\right\rangle := \frac{1}{\parallel {\bf{x}}{\parallel }_{2}}{\sum }_{i}{x}_{i}\left\vert i\right\rangle\), where \({\left\Vert \cdot \right\Vert }_{2}\) is the l_{2}-norm and we assume \(\left\Vert A\right\Vert \le 1\) for simplicity of presentation (see Supplementary Note 4 for the case of unnormalized A). Interestingly, circuit compilations of block-encoding oracles for A with gate complexity \({\mathcal{O}}\left(\log (D/\epsilon )\right)\) have been explicitly worked out assuming a QRAM access model to the classical entries of A^{41}. The solution state can be used to extract relevant features (such as an amplitude 〈ϕ∣x〉 or an expectation value \(\left\langle {\bf{x}}\right\vert O\left\vert {\bf{x}}\right\rangle\)), with potential exponential speedups over known classical algorithms, assuming that the oracles are efficiently implementable and \(\kappa ={\mathcal{O}}\left(\,\text{polylog}\,(D)\right)\).
Reference ^{18} proposed an asymptotically optimal QLSS based on a discrete version of the adiabatic theorem with query complexity \({\mathcal{O}}\left(\kappa \log (1/\epsilon )\right)\). Within the Chebyshev-based QSP framework, the best known QLSS uses \({\mathcal{O}}\left(\kappa \log (\kappa /\epsilon )\right)\) oracle queries^{17}. If the final goal is, for instance, to reconstruct a computational-basis component 〈i∣x〉 of the solution vector, the resulting runtime becomes \({\mathcal{O}}\left(({\kappa }^{3}/{\epsilon }^{2})\log (\kappa /\epsilon )\right)\), since this requires \({\mathcal{O}}\left({\kappa }^{2}/{\epsilon }^{2}\right)\) measurements on \(\left\vert {\bf{x}}\right\rangle\). Importantly, however, in order to relate the above-mentioned features of \(\left\vert {\bf{x}}\right\rangle\) to the corresponding ones from the (unnormalized) classical solution vector x, one must also independently estimate ∥x∥_{2}. This can still be done with QLSSs (e.g., with quantum amplitude estimation techniques), but requires extra runs. Our algorithms do not suffer from this issue, providing direct estimates from the unnormalized vector \({A}^{-1}\left\vert {\bf{b}}\right\rangle\).
More precisely, with f being the inverse function on the cut-off interval \({{\mathcal{I}}}_{\kappa }:= [-1,-1/\kappa ]\cup [1/\kappa ,1]\), our Algs. 1 and 2 readily estimate amplitudes \(\left\langle \phi \right\vert {A}^{-1}\left\vert {\bf{b}}\right\rangle\) and expectation values \(\left\langle {\bf{b}}\right\vert {A}^{-1}O{A}^{-1}\left\vert {\bf{b}}\right\rangle\), respectively. The technical details of the polynomial approximation \(\tilde{f}\) and complexity analysis are deferred to Supplementary Note 3. In particular, there we show that, to approximate f to error ν, one needs a polynomial of degree \(k={\mathcal{O}}\left(\kappa \,\log (\kappa /\nu )\right)\) and \({\left\Vert {\bf{a}}\right\Vert }_{1}={\mathcal{O}}\left(\kappa \sqrt{\log ({\kappa }^{2}/\nu )}\right)\). For our purposes, as discussed before Theorem 2, to ensure a target estimation error ϵ on the quantity of interest one must have \(\nu ={\mathcal{O}}(\epsilon )\) for Alg. 1 and \(\nu ={\mathcal{O}}({(\kappa \left\Vert O\right\Vert )}^{-1}\epsilon )\) for Alg. 2. This leads to the sample complexities \({S}^{(1)}={\mathcal{O}}\left(({\kappa }^{2}/{\epsilon }^{2})\,{\log }^{2}({\kappa }^{2}/\epsilon )\log (4/\delta )\right)\) and \({S}^{(2)}={\mathcal{O}}\left(({\kappa }^{4}{\left\Vert O\right\Vert }^{2}/{\epsilon }^{2})\,{\log }^{4}({\kappa }^{3}\,\left\Vert O\right\Vert /\epsilon )\log (4/\delta )\right)\), respectively.
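To make the gap between maximal and average degree concrete, the sketch below builds one standard polynomial surrogate for 1/x on \({{\mathcal{I}}}_{\kappa }\), namely \((1-{(1-{x}^{2})}^{b})/x\) with \(b=\lceil {\kappa }^{2}\log (\kappa /\nu )\rceil\). This choice, the function name, and the values of κ and ν are assumptions for illustration; the source defers its own construction to the Supplementary Note, and the polynomial used there may have a lower degree.

```python
import math
import numpy as np
from numpy.polynomial import chebyshev as C

def inverse_surrogate_coeffs(kappa, nu):
    """Chebyshev coefficients of (1 - (1 - x^2)^b) / x, an odd polynomial of degree 2b - 1."""
    b = math.ceil(kappa**2 * math.log(kappa / nu))
    degree = 2 * b - 1
    surrogate = lambda x: (1.0 - (1.0 - x**2) ** b) / x
    # Interpolating a degree-(2b-1) polynomial at 2b Chebyshev nodes recovers it
    # up to floating-point error (the nodes never hit x = 0).
    return C.chebinterpolate(surrogate, degree)

kappa, nu = 5.0, 1e-2
coeffs = inverse_surrogate_coeffs(kappa, nu)

xs = np.linspace(1.0 / kappa, 1.0, 2001)
max_error = np.max(np.abs(C.chebval(xs, coeffs) - 1.0 / xs))

weights = np.abs(coeffs)
max_degree = len(coeffs) - 1
avg_degree = np.sum(np.arange(len(coeffs)) * weights) / weights.sum()
print(max_error, max_degree, avg_degree, weights.sum())
```

For these values the surrogate meets the target error on the positive branch of the interval (the negative branch follows by oddness), while the average sampled degree is a small fraction of the maximal one.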
The expected query depth and total expected runtimes are shown in Table 1. In particular, the former exhibits a quadratic improvement in the error dependence with respect to the maximal query depth k. This places our algorithm in between the \({\mathcal{O}}(\kappa \log (\kappa /\epsilon ))\)^{17} scaling of the fully quantum algorithm and the asymptotically optimal \({\mathcal{O}}(\kappa \log (1/\epsilon ))\) scaling of ref. ^{18}, therefore making it more suitable for the early fault-tolerance era. In fact, our expected query depth can even beat this optimal scaling for \(\kappa \,\lesssim\, {(1/\epsilon )}^{\log (1/\epsilon )-1}\). Note also that our total expected runtimes are only logarithmically worse in κ than the ones in the fully quantum case. For the case of Alg. 1, an interesting subcase is that of \(\left\langle \phi \right\vert =\left\langle i\right\vert\), as this directly gives the ith component of the solution vector x. The quantum oracle U_{ϕ} is remarkably simple there, corresponding to the preparation of a computational-basis state. As for the runtime, we recall that \(\left\Vert A\right\Vert \le {\left\Vert A\right\Vert }_{{\rm {F}}}\) in general and \({\left\Vert A\right\Vert }_{{\rm {F}}}={\mathcal{O}}(\sqrt{D}\,\left\Vert A\right\Vert )\) for high-rank matrices. Hence, Alg. 1 has the potential for significant speedups over the randomized Kaczmarz algorithm mentioned above. In turn, for the case of Alg. 2, we stress that the estimates obtained refer directly to the target expectation values for a generic observable O, with no need to estimate the normalizing factor ∥x∥_{2} separately (although, if desired, the latter can be obtained by taking \(O={\mathbb{1}}\)).
It is also interesting to compare our results with those of the fully randomized scheme of Wang et al.^{11}. There, for A given in terms of a Pauli decomposition with total Pauli weight λ, they also offer direct estimates, with no need of ∥x∥_{2}. However, their total runtime of \(\tilde{O}\left({\left\Vert {A}^{-1}\right\Vert }^{4}{\lambda }^{2}/{\epsilon }^{2}\right)\) is worse than the scaling presented here by a factor \(\tilde{O}\left(\left\Vert {A}^{-1}\right\Vert \,{\lambda }^{2}\right)\) (recall that here \(\kappa =\left\Vert {A}^{-1}\right\Vert\) since we are assuming \(\left\Vert A\right\Vert =1\)). In turn, compared to the solver in ref. ^{11}, the scaling of our query depth per run is one power of κ superior. In their case, the scaling refers readily to circuit depth, instead of query depth, but this is approximately compensated by the extra dependence on λ^{2} in their circuit depth.
Ground-state energy estimation
The task of estimating the ground-state energy of a quantum Hamiltonian holds paramount importance in condensed matter physics, quantum chemistry, material science, and optimization. In fact, it is considered one of the most promising use cases for quantum computing in the near term^{42}. However, the problem in its most general form is known to be QMA-hard^{43}. A typical assumption, one we will also use here, is that one is given a Hamiltonian H with \(\left\Vert H\right\Vert \le 1\) and a promise state ϱ having non-vanishing overlap η with the ground-state subspace. The ground-state energy estimation (GSEE) problem^{7} then consists of finding an estimate of the ground-state energy E_{0} to additive precision ξ.
If the overlap η is reasonably large (which is often the case in practice, e.g., for small molecular systems using the Hartree–Fock state^{44}), the problem is known to be efficiently solvable, but without any guarantee on η the problem is challenging. A variety of quantum algorithms for GSEE have been proposed (see, e.g., refs. ^{37,45,46,47}), but the substantial resources required are prohibitive for practical implementation before full-fledged fault-tolerant devices become available. Recent works have tried to simplify the complexity of quantum algorithms for GSEE with a view toward early fault-tolerant quantum devices. Notably, a semi-randomized quantum scheme was proposed in ref. ^{7} with query complexity \({\mathcal{O}}\left(\frac{1}{\xi }\log \left(\frac{1}{\xi \eta }\right)\right)\) achieving Heisenberg-limited scaling in ϵ^{48}. Importantly, their algorithm assumes access to the Hamiltonian H through a time-evolution oracle e^{−iHτ} (for some fixed time τ), which makes it more appropriate for implementation in analog devices. The similar fully randomized approach of Wan et al.^{10} gives rise to an expected circuit (not query) complexity of \({\mathcal{O}}\left(\frac{1}{{\xi }^{2}}\log \left(\frac{1}{\eta }\right)\right)\).
Here we approach the GSEE problem within our Chebyshev-based randomized semi-quantum framework. We follow the same strategy used in refs. ^{7,8,10,14} of reducing GSEE to the so-called eigenvalue thresholding problem. The problem reduces to the estimation up to additive precision \(\frac{\eta }{2}\) of the filter function \({F}_{\varrho }(y):= {\rm{Tr}}[\varrho \,\theta (y{\mathbb{1}}-H)]\) for a set of \(\log \left(\frac{1}{\xi }\right)\) different values of y chosen from a uniform grid of cell size ξ (times the length \({E}_{\max }-{E}_{0}\) of the interval of energies of H). This allows one to find E_{0} up to additive error ξ with \(\log \left(\frac{1}{\xi }\right)\) steps of a binary-like search over y^{7}. At each step, we apply our Alg. 1 with f(x) = θ(y−x), A = H, and \(\left\vert \phi \right\rangle =\left\vert \psi \right\rangle\) to estimate F_{ϱ}(y), with \(\varrho =\left\vert \psi \right\rangle \langle \psi\vert\). Here, \(\left\vert \psi \right\rangle\) is any state with promised overlap η > 0 with the ground-state subspace. Achieving additive precision \(\frac{\eta }{2}\) for F_{ϱ}(y) requires an approximation error \(\nu \le \frac{\eta }{4}\) for f and a statistical error \(\epsilon \le \frac{\eta }{4}\) for the estimation.
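The reduction to eigenvalue thresholding can be sketched classically as follows. This is a simplified illustration, not the algorithm of the source: it uses the exact step function rather than its polynomial approximation, and it replaces the quantum estimate of F_{ϱ}(y) by the exact value plus bounded noise. The spectrum, weights, and all function names are hypothetical.

```python
import numpy as np

def filter_function(y, energies, weights):
    """Exact F(y) = sum_k p_k * theta(y - E_k) for a state with spectral weights p_k."""
    return float(np.sum(weights[energies <= y]))

def estimate_ground_energy(noisy_filter, eta, xi, lo=-1.0, hi=1.0):
    """Binary search for E_0, given estimates of F(y) with error below eta / 2."""
    while hi - lo > xi:
        mid = 0.5 * (lo + hi)
        if noisy_filter(mid) > eta / 2.0:
            hi = mid  # some spectral weight lies at or below mid, so E_0 <= mid
        else:
            lo = mid  # the ground-state weight is not yet included, so E_0 > mid
    return 0.5 * (lo + hi)

# Hypothetical spectrum in [-1, 1] and overlaps of the promise state.
energies = np.array([-0.62, -0.10, 0.30, 0.80])
weights = np.array([0.20, 0.30, 0.40, 0.10])
eta, xi = 0.2, 1e-3

rng = np.random.default_rng(1)
noisy = lambda y: filter_function(y, energies, weights) + rng.uniform(-eta / 4, eta / 4)
e0_estimate = estimate_ground_energy(noisy, eta, xi)
print(e0_estimate)
```

Because F_{ϱ}(y) is either 0 or at least η, an estimate accurate to better than η/2 is enough to decide on which side of y the ground-state energy lies.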
Interestingly, our approach does not need to estimate F_{ϱ}(y) at different y’s for the search. In Supplementary Note 3, we show that estimating F_{ϱ} at a special point \({y}_{* }=1/\sqrt{2}\) and increasing the number of samples suffices to obtain F_{ϱ}(y) at any other y. As a core auxiliary ingredient for that, we develop a ν-approximation \(\tilde{f}\) to the step function with a shifted argument, θ(y−x), that may be of independent interest (see Supplementary Note 3). It has the appealing property that the x and y dependence are separated, namely \(\tilde{f}(y-x)={\sum }_{j\in [k]}\left[{a}_{j}(y)\,{{\mathcal{T}}}_{j}(x)\right]+{\sum }_{j\in [k]}\left[{b}_{j}(y)\sqrt{1-{x}^{2}}\,{{\mathcal{U}}}_{j}(x)\right]\), where \({{\mathcal{U}}}_{j}\) is the jth Chebyshev polynomial of the second kind. The first contribution to \(\tilde{f}\) takes the usual form (23) and can be directly implemented by our Alg. 1; the second contribution containing the \({{\mathcal{U}}}_{j}\)’s can also be implemented in a similar way, with the caveat that the required Hadamard test needs a minor modification described in Supplementary Note 1. The maximal degree \(k={\mathcal{O}}(\frac{1}{\xi }\log \left(\frac{1}{\eta }\right))\) is the same for both contributions and the coefficient 1-norms are \({\left\Vert {\bf{a}}\right\Vert }_{1}={\left\Vert {\bf{b}}\right\Vert }_{1}={\mathcal{O}}\left(\log \left(\frac{1}{\xi }\log \left(\frac{1}{\eta }\right)\right)\right)\). Putting all together and taking into account also the \({\mathcal{O}}\left(\log \left(\frac{1}{\xi }\right)\right)\) steps of the binary search, one obtains a total sample complexity \({S}^{(1)}={\mathcal{O}}\left(\frac{1}{{\eta }^{2}}{\log }^{2}\left(\frac{1}{\xi }\log \left(\frac{1}{\eta }\right)\right)\log \left(\frac{4}{\delta }\log \left(\frac{1}{\xi }\right)\right)\right)\).
The corresponding expected query depth and total runtime are shown in Table 1. Remarkably, the query depth exhibits a speedup with respect to the maximal value k, namely a square-root improvement in the η dependence and a logarithmic improvement in the \(\frac{1}{\xi }\) dependence (see Supplementary Note 3 for details). In addition, as can be seen in the table, our expected runtime displays the same Heisenberg scaling as that of Wan et al.^{10}. This is interesting given that our algorithm is based on block-encoded oracles, which may be better suited for digital platforms as discussed previously, rather than the time-evolution oracles used in ref. ^{10}. Finally, it is interesting to note that there have been recent improvements in the precision dependence, e.g. based on a derivative Gaussian filter^{8}. Those matrix functions are also within the scope of applicability of our approach.
Discussion
We presented a randomized hybrid quantum-classical framework to efficiently estimate state amplitudes and expectation values involving a generic matrix function f(A). More precisely, our algorithms perform a Monte Carlo simulation of the powerful quantum signal processing (QSP) and singular-value transformation (QSVT) techniques^{1,2,3,4}. Our toolbox is based on three main ingredients: (i) it samples each component of a Chebyshev series for f weighted by its coefficient in the series; (ii) it assumes coherent access to A via a block-encoding oracle; and (iii) f(A) is automatically extracted from its block-encoding without postselection, using a Hadamard test. This combination allows us to deliver provably better circuit complexities than the standard QSP and QSVT algorithms while maintaining comparable total runtimes.
We illustrated our algorithms on four specific end-user applications: partition-function estimation via quantum Markov-chain Monte Carlo and via imaginary-time evolution, linear-system solvers, and ground-state energy estimation (GSEE). The full end-to-end complexity scalings are detailed in Table 1.
For GSEE, the reduction in query complexity (and consequently also in the total gate count) due to randomization is by a factor \(k/{\mathbb{E}}[j]={\mathcal{O}}\left(\sqrt{\log (1/\eta )}\,\log ({\xi }^{-1}\log (1/\eta ))\right)\). We estimate this factor explicitly for the iron–molybdenum cofactor (FeMoco) molecule, which is the primary cofactor of nitrogenase and one of the main target use cases in chemistry for early quantum computers^{49,50}. For GSEE within chemical accuracy (see the section “Resource estimates for FeMoco GSEE” in the “Methods” section for details), the resulting reduction factor is approximately 28 (from k ≈ 1.12 × 10^{7} to \({\mathbb{E}}[j]\,\approx\, 4.01\times 1{0}^{5}\)), while the sample complexity overhead is a factor of \({\left\Vert {\bf{a}}\right\Vert }_{1}^{2}\approx 2.35\). Importantly, these estimates do not take into account the overhead for quantum error correction (QEC). Our reductions in query depth imply larger tolerated logical-gate error levels, which translate into lower code distances and, hence, lower QEC overheads. For example, for the surface code, the overhead in physical-qubit number is usually quadratic in the code distance^{51}.
An interesting future direction is to explore other matrix functions with our framework. This includes recent developments such as Gaussian and derivative-Gaussian filters for precision improvements in ground-state energy estimation^{8} or Green function estimation^{11}, and a diversity of other more-established use cases^{4}. Another possibility is to explore the applicability of our methods in the context of hybrid quantum-classical rejection sampling^{14}. Moreover, further studies on the interplay between our framework and Fourier-based matrix processing^{6,13} may be in order too. Fourier-based approaches have so far focused mainly on the eigenvalue thresholding for ground-state energy estimation^{7,8,10,11}.
Our findings open a promising arena to build and optimize early fault-tolerant quantum algorithms towards practical linear-algebra applications in the near term.
Methods
Qubitized block-encoding
The basic input taken by QSP is a block-encoding U_{A} of the Hermitian operator A of interest (the signal). A block-encoding is a unitary acting on \({{\mathcal{H}}}_{sa}:= {{\mathcal{H}}}_{s}\otimes {{\mathcal{H}}}_{a}\), where \({{\mathcal{H}}}_{s}\) is the system Hilbert space where A acts and \({{\mathcal{H}}}_{a}\) is an ancillary Hilbert space (with dimensions D and D_{a}, respectively), satisfying
\[\left({\left\langle 0\right\vert }_{a}\otimes {{\mathbb{1}}}_{s}\right)\,{U}_{A}\,\left({\left\vert 0\right\rangle }_{a}\otimes {{\mathbb{1}}}_{s}\right)=A \tag{20}\]
for some suitable state \({\left\vert 0\right\rangle }_{a}\in {{\mathcal{H}}}_{a}\) (here \({{\mathbb{1}}}_{s}\) is the identity operator in \({{\mathcal{H}}}_{s}\)). Designing such an oracle for arbitrary A is a non-trivial task^{52}, but efficient block-encoding schemes are known in cases where some special structure is present, e.g., when A is sparse or expressible as a linear combination of unitaries^{2,3,53}. In particular, we use the following form of U_{A}, which makes it amenable to dealing with Chebyshev polynomials.
Definition 1
(Qubitized block-encoding oracle) Let A be a Hermitian matrix on \({{\mathcal{H}}}_{s}\) with spectral norm \(\left\Vert A\right\Vert \le 1\), eigenvalues \({\{{\lambda }_{\gamma }\}}_{\gamma \in [D]}\), and eigenstates \(\{{\left\vert {\lambda }_{\gamma }\right\rangle }_{s}\}\). A unitary U_{A} acting on \({{\mathcal{H}}}_{sa}\) is called an (exact) qubitized block-encoding of A if it has the form
\[{U}_{A}=\mathop{\bigoplus }\limits_{\gamma \in [D]}{e}^{-i{\vartheta }_{\gamma }{Y}_{\gamma }}, \tag{21}\]
where \({\vartheta }_{\gamma }:= \arccos ({\lambda }_{\gamma })\) and Y_{γ} is the second Pauli matrix acting on the two-dimensional subspace spanned by \(\{{\vert 0\rangle }_{a}\otimes {\vert {\lambda }_{\gamma }\rangle }_{s},{\vert {\perp }_{{\lambda }_{\gamma }}\rangle }_{sa}\}\) with \({}_{sa}\langle {\perp }_{{\lambda }_{\gamma }}\vert ({\vert 0\rangle }_{a}\otimes {\vert {\lambda }_{\gamma }\rangle }_{s})=0\).
A qubitized oracle of the form (21) can be constructed from any other block-encoding \({U}_{A}^{{\prime} }\) of A using at most one query to \({U}_{A}^{{\prime} }\) and \({{U}_{A}^{{\prime} }}^{-1}\), at most one additional ancillary qubit, and \({\mathcal{O}}(\log ({D}_{a}))\) quantum gates^{2}.
Block-encoding of Chebyshev polynomials
Standard QSP takes as input the qubitized oracle U_{A} and transforms it into (a block-encoding of) a polynomial function \(\tilde{f}(A)\). With the help of function approximation theory^{15}, this allows the approximate implementation of generic non-polynomial functions f(A). The algorithm complexity is measured by the number of queries to U_{A}, which allows for rigorous quantitative statements agnostic to details of A or to hardware-specific circuit compilations. For our purposes, only a simple QSP result is needed, namely the observation^{2} that repeated applications of U_{A} give rise to Chebyshev polynomials of A (see Supplementary Note 1 for a proof).
Lemma 4
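(Block-encoding of Chebyshev polynomials) Let U_{A} be a qubitized block-encoding of A. Then
\[\left({\left\langle 0\right\vert }_{a}\otimes {{\mathbb{1}}}_{s}\right)\,{U}_{A}^{j}\,\left({\left\vert 0\right\rangle }_{a}\otimes {{\mathbb{1}}}_{s}\right)={{\mathcal{T}}}_{j}(A) \tag{22}\]
for \(j\in {\mathbb{N}}\), where \({{\mathcal{T}}}_{j}(\cdot )\) is the jth-order Chebyshev polynomial of the first kind.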
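The statement of Lemma 4 can be verified numerically on a small example. The sketch below assumes one particular qubitized block-encoding with a single ancilla qubit, \(U=\left(\begin{array}{cc}A&-\sqrt{{\mathbb{1}}-{A}^{2}}\\ \sqrt{{\mathbb{1}}-{A}^{2}}&A\end{array}\right)\), which acts as \({e}^{-i\vartheta Y}\) on each eigenspace of A; the random test matrix and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def qubitized_block_encoding(a):
    """Return U = [[A, -sqrt(1 - A^2)], [sqrt(1 - A^2), A]] for Hermitian A with ||A|| <= 1."""
    evals, evecs = np.linalg.eigh(a)
    sqrt_term = (evecs * np.sqrt(np.clip(1.0 - evals**2, 0.0, None))) @ evecs.conj().T
    return np.block([[a, -sqrt_term], [sqrt_term, a]])

def chebyshev_of_matrix(a, j):
    """Evaluate T_j(A) through the eigendecomposition of A."""
    evals, evecs = np.linalg.eigh(a)
    unit = np.zeros(j + 1)
    unit[j] = 1.0  # coefficient vector selecting T_j
    return (evecs * C.chebval(evals, unit)) @ evecs.conj().T

rng = np.random.default_rng(0)
m = rng.normal(size=(4, 4))
a = (m + m.T) / 2.0
a /= 1.1 * np.linalg.norm(a, 2)  # rescale so that the spectral norm is below 1

u = qubitized_block_encoding(a)
j = 5
# The ancilla is the first tensor factor, so the |0>_a block is the top-left corner.
top_left = np.linalg.matrix_power(u, j)[:4, :4]
print(np.allclose(top_left, chebyshev_of_matrix(a, j)))
```

On each eigenspace, the jth power of the rotation by \({\vartheta }_{\gamma }\) has \(\cos (j{\vartheta }_{\gamma })={{\mathcal{T}}}_{j}({\lambda }_{\gamma })\) in its top-left entry, which is what the check confirms.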
We are interested in a truncated Chebyshev series
\[\tilde{f}(x)=\mathop{\sum }\limits_{j=0}^{k}{a}_{j}\,{{\mathcal{T}}}_{j}(x) \tag{23}\]
providing a ν-approximation to the target real-valued function \(f:[-1,1]\to {\mathbb{R}}\), that is, \(\mathop{\max }\nolimits_{x\in [-1,1]}\left\vert f(x)-\tilde{f}(x)\right\vert \le \nu\). The Chebyshev polynomials \({{\mathcal{T}}}_{j}\) form a key basis for function approximation, often leading to near-optimal approximation errors^{15}. In particular, unless the target function is periodic and smooth, they tend to outperform Fourier approximations^{16}. The case of complex-valued functions can be treated similarly by splitting the function into its real and imaginary parts. The truncation order k is controlled by the desired accuracy ν in a problem-specific way (explicit examples are presented later).
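The randomization over such a series can be illustrated with a purely classical toy: the degree j is sampled with probability \(| {a}_{j}| /{\left\Vert {\bf{a}}\right\Vert }_{1}\), and each sample contributes \({\left\Vert {\bf{a}}\right\Vert }_{1}\,{\rm{sgn}}({a}_{j})\,{{\mathcal{T}}}_{j}(x)\). In the sketch below a scalar x stands in for the quantum expectation value, and the target function, the degree, and the function name are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def sample_chebyshev_estimate(coeffs, x, num_samples, rng):
    """Monte Carlo estimate of sum_j a_j T_j(x), sampling j with probability |a_j| / ||a||_1."""
    weights = np.abs(coeffs)
    l1_norm = weights.sum()
    degrees = rng.choice(len(coeffs), size=num_samples, p=weights / l1_norm)
    # T_j(x) = cos(j arccos x) for x in [-1, 1].
    terms = l1_norm * np.sign(coeffs[degrees]) * np.cos(degrees * np.arccos(x))
    return terms.mean(), degrees.mean()

coeffs = C.chebinterpolate(np.exp, 12)  # illustrative target f(x) = e^x
rng = np.random.default_rng(7)
x = 0.3
estimate, avg_degree = sample_chebyshev_estimate(coeffs, x, 200_000, rng)
print(estimate, np.exp(x), avg_degree)
```

Each sample is bounded by \({\left\Vert {\bf{a}}\right\Vert }_{1}\) in magnitude, so the statistical error is governed by the coefficient l_{1}-norm, while the mean sampled degree plays the role of the average query depth.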
Hadamard test for block encodings
In Algorithms 1 and 2, for each j_{α} or j_{α}, l_{α} sampled, only one measurement shot is obtained. Here we present lemmas showing the result of performing several measurement shots from the Hadamard tests in Fig. 1 with a fixed circuit, i.e., fixed j_{α} (Lemma 5) or j_{α}, l_{α} (Lemma 6). The proofs can be found in the Supplementary Information.
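For orientation, the textbook Hadamard test for a unitary U and a state \(\left\vert \psi \right\rangle\) can be simulated in a few lines. This is the standard test, not the block-encoding variants of Fig. 1, and the example unitary, state, and function names are assumptions for illustration. Each shot returns ±1 with mean \({\rm{Re}}\left\langle \psi \right\vert U\left\vert \psi \right\rangle\).

```python
import numpy as np

def hadamard_test_probability(u, psi):
    """Probability of ancilla outcome 0 in the textbook Hadamard test for unitary u."""
    branch0 = 0.5 * (psi + u @ psi)  # unnormalized system state when the ancilla reads 0
    p0 = float(np.vdot(branch0, branch0).real)
    return min(max(p0, 0.0), 1.0)    # guard against rounding slightly outside [0, 1]

def hadamard_test_shots(u, psi, shots, rng):
    """Draw single-shot outcomes: +1 for ancilla 0 and -1 for ancilla 1."""
    p0 = hadamard_test_probability(u, psi)
    return rng.choice([1, -1], size=shots, p=[p0, 1.0 - p0])

u = np.diag([1.0, np.exp(0.7j)])          # hypothetical single-qubit unitary
psi = np.array([1.0, 1.0]) / np.sqrt(2.0)
exact = np.vdot(psi, u @ psi).real

rng = np.random.default_rng(3)
outcomes = hadamard_test_shots(u, psi, 100_000, rng)
print(outcomes.mean(), exact)
```

The outcome probability is \((1+{\rm{Re}}\left\langle \psi \right\vert U\left\vert \psi \right\rangle )/2\), so the empirical mean of the ±1 outcomes converges to the real part of the expectation value.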
Lemma 5
(Circuit for Algorithm 1) A single-shot measurement on the Hadamard test of Fig. 1a yields a random variable \({\bf{Had}}({U}_{A}^{j},\left\vert \phi \right\rangle ,\left\vert \psi \right\rangle )\in \{-1,1\}\) that satisfies
Lemma 6
(Circuit for Algorithm 2) A single-shot measurement on the Hadamard test of Fig. 1b yields a random variable \({\bf{Had}}({U}_{A}^{l},{U}_{A}^{j},O)\) that satisfies
Resource estimates for FeMoco GSEE
Here, we provide details on the complexity analysis of GSEE for the specific case of FeMoco. We use the sparse qubitized Hamiltonian model of Berry et al.^{54} with the corrections given in ref. ^{55} and the active space of Li et al.^{56}.
The molecule’s Hamiltonian is written as a linear combination \({H}^{{\prime} }=\mathop{\sum }\nolimits_{l = 1}^{L}{b}_{l}\,{V}_{l}\) of unitary operators V_{l}. A block-encoding of \(H={H}^{{\prime} }/\parallel {\bf{b}}{\parallel }_{1}\) is obtained using the method of linear combination of unitaries (LCU)^{57}, i.e., encoding the coefficients b_{l} in the amplitudes of a state of \({\mathcal{O}}(\log (L))\) ancillary qubits, which are used to control the action of each unitary V_{l} in the sum. In order to estimate the ground-state energy of \({H}^{{\prime} }\) up to chemical precision of 0.0016 E_{h} (Hartree), we can run the GSEE algorithm on H with precision ξ = 0.001/∥b∥_{1}, leaving the remaining 0.0006/∥b∥_{1} tolerated error to imprecision in the block-encoding procedure.
The number of terms L in the Hamiltonian, the 1-norm ∥b∥_{1} of their coefficients, and also the total number of qubits required depend on the active space used. In the case we consider, this is composed of n = 152 orbitals, each one corresponding to one qubit in the system. In ref. ^{55} (App. A), it was shown that ∥b∥_{1} = 1547E_{h} and L = 440,501. Moreover, the authors show how to construct a qubitized block-encoding oracle for H explicitly using LCU with 2446 logical qubits and 1.8 × 10^{4} Toffoli gates per oracle query. It is worth noting that, in principle, only n + a qubits are strictly required for LCU, with \(a=\lceil \log (L)\rceil =19\). Therefore, the majority of the qubits are used to upload the Hamiltonian coefficients into the state of the a ancillas and could be removed if a more efficient upload is used. They apply quantum phase estimation (QPE), which requires \(Q=\lceil \frac{\pi }{2\xi }\rceil \,\approx\, 2.4\times 1{0}^{6}\) oracle queries and \(2\,\log (Q+1)\,\approx\, 43\) additional control qubits. In our case, only one control qubit is necessary for the Hadamard test, while the number of oracle calls Q is to be compared to k or \({\mathbb{E}}[j]\).
Finally, to implement our algorithm, we need an initial state with a sufficiently large overlap η with the target ground state. In ref. ^{58} it was shown that η^{2} = 10^{−7} can be achieved using Slater determinants. Using this and the value of ξ given above, we explicitly construct the series approximating the Heaviside function θ(H). The truncation degree and the average degree obtained are k = 1.12 × 10^{7} and \({\mathbb{E}}[j]=4.01\times 1{0}^{5}\), respectively. When multiplied by the above-mentioned gate complexity per oracle query, we get a total of 2.03 × 10^{11} and 7.26 × 10^{9} Toffoli gates per circuit, respectively. Notice that, when compared to phase estimation, the coherent QSP algorithm demands more coherent queries to the oracle while the randomized processing is less costly on average. At the same time, all three algorithms require a similar number of samples, proportional to η^{−2}. We summarize the complexities of the three methods in Table 2.
Another interesting feature of the randomized scheme is that, due to its variable circuit depth, it allows for variable quantum error-correcting code distances. Consequently, the average number of physical qubits required for quantum error correction (QEC) and also the runtime of the typical circuits, which have even lower depth than the average, can be reduced even further. To give a more quantitative idea, we estimated the QEC overhead using the surface code with the spreadsheet provided in ref. ^{59}. We consider a total number of Toffoli gates obtained using the average degree \({\mathbb{E}}[j]\) (i.e., not taking into account the possible further reduction due to typical circuits with even lower depths). With a level-1 code distance d_{1} = 19 and a level-2 code distance d_{2} = 31, assuming an error rate per physical gate of 0.001, the error-corrected algorithm uses 8 million qubits to obtain a global error budget of 1%. This means that all circuits with a depth smaller than the average will have an extremely high fidelity. The circuits corresponding to larger degrees will have an increasingly higher chance of failure, but they contribute less to the final estimate. Meanwhile, the same QEC overhead only ensures a roughly 20% chance of errors for a circuit corresponding to the truncation order k.
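The arithmetic behind these resource figures can be reproduced from the numbers quoted in the text. The sketch below is only a consistency check under the assumption that the logarithm in the ancilla count is base 2; the variable names are hypothetical. The Toffoli totals agree with the quoted ones to within about 1%, a difference that plausibly comes from rounding in the per-query count of 1.8 × 10^{4}.

```python
import math

b_l1_norm = 1547.0        # ||b||_1 in Hartree, as quoted in the text
num_terms = 440_501       # number of LCU terms L
xi = 0.001 / b_l1_norm    # target precision on the normalized Hamiltonian H

qpe_queries = math.ceil(math.pi / (2.0 * xi))   # Q = ceil(pi / (2 xi))
lcu_ancillas = math.ceil(math.log2(num_terms))  # a = ceil(log2 L), base 2 assumed

k_max = 1.12e7            # truncation degree k
k_avg = 4.01e5            # average degree E[j]
toffoli_per_query = 1.8e4

depth_reduction = k_max / k_avg
toffoli_max = toffoli_per_query * k_max
toffoli_avg = toffoli_per_query * k_avg
print(qpe_queries, lcu_ancillas, depth_reduction, toffoli_max, toffoli_avg)
```

The ratio k/𝔼[j] comes out close to 28, matching the reduction factor quoted in the Discussion.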
Code availability
The code used to generate the figures and tables is available upon request.
References
Low, G. H. & Chuang, I. L. Optimal Hamiltonian simulation by quantum signal processing. Phys. Rev. Lett. 118, 010501 (2017).
Low, G. H. & Chuang, I. L. Hamiltonian simulation by qubitization. Quantum 3, 163 (2019).
Gilyén, A., Su, Y., Low, G. H. & Wiebe, N. Quantum singular value transformation and beyond: Exponential improvements for quantum matrix arithmetics. In Proc. of the 51st Annual ACM STOC 193 (2019).
Martyn, J. M., Rossi, Z. M., Tan, A. K. & Chuang, I. L. Grand unification of quantum algorithms. PRX Quantum 2, 040203 (2021).
Silva, T. d. L., Taddei, M. M., Carrazza, S. & Aolita, L. Fragmented imaginary-time evolution for early-stage quantum signal processors. Sci. Rep. 13, 18258 (2023).
de Lima Silva, T., Borges, L. & Aolita, L. Fourier-based quantum signal processing. Preprint at https://arxiv.org/abs/2206.02826 (2022).
Lin, L. & Tong, Y. Heisenberg-limited ground-state energy estimation for early fault-tolerant quantum computers. PRX Quantum 3, 010318 (2022).
Wang, G., França, D. S., Zhang, R., Zhu, S. & Johnson, P. D. Quantum algorithm for ground state energy estimation using circuit depth with exponentially improved dependence on precision. Quantum 7, 1167 (2023).
Campbell, E. Random compiler for fast Hamiltonian simulation. Phys. Rev. Lett. 123, 070503 (2019).
Wan, K., Berta, M. & Campbell, E. T. Randomized quantum algorithm for statistical phase estimation. Phys. Rev. Lett. 129, 030503 (2022).
Wang, S., McArdle, S. & Berta, M. Qubit-efficient randomized quantum algorithms for linear algebra. PRX Quantum 5, 020324 (2024).
Campbell, E. T. Early fault-tolerant simulations of the Hubbard model. Quantum Sci. Technol. 7, 015007 (2021).
Dong, Y., Lin, L. & Tong, Y. Ground-state preparation and energy estimation on early fault-tolerant quantum computers via quantum eigenvalue transformation of unitary matrices. PRX Quantum 3, 040305 (2022).
Wang, G., França, D. S., Rendon, G. & Johnson, P. D. Faster ground state energy estimation on early fault-tolerant quantum computers via rejection sampling. Preprint at https://arxiv.org/abs/2304.09827 (2023).
Trefethen, L. N. Approximation Theory and Approximation Practice (SIAM, 2012).
Boyd, J. P. Chebyshev and Fourier Spectral Methods (Dover, Mineola, New York, 2001).
Childs, A. M., Kothari, R. & Somma, R. D. Quantum algorithm for systems of linear equations with exponentially improved dependence on precision. SIAM J. Sci. Comput. 46, 1920 (2017).
Costa, P. C. et al. Optimal scaling quantum linearsystems solver via discrete adiabatic theorem. PRX Quantum 3, 040303 (2022).
Sachdeva, S. & Vishnoi, N. K. Faster algorithms via approximation theory. Found. Trends Theor. Comput. Sci. 9, 125 (2013).
Aolita, L., de Melo, F. & Davidovich, L. Open-system dynamics of entanglement: a key issues review. Rep. Prog. Phys. 78, 042001 (2015).
Ma, J., Peng, J., Wang, S. & Xu, J. Estimating the partition function of graphical models using Langevin importance sampling. PMLR 31, 433 (2013).
Krause, O., Fischer, A. & Igel, C. Algorithms for estimating the partition function of restricted Boltzmann machines. Artif. Intell. 278, 103195 (2020).
Shim, A. A probabilistic interpretation of transformers. Preprint at https://arxiv.org/abs/2205.01080 (2022).
Bulatov, A. & Grohe, M. The complexity of partition functions. Theor. Comput. Sci. 348, 148 (2005).
Weiße, A., Wellein, G., Alvermann, A. & Fehske, H. The kernel polynomial method. Rev. Mod. Phys. 78, 275 (2006).
Bravyi, S., Chowdhury, A., Gosset, D. & Wocjan, P. Quantum Hamiltonian complexity in thermal equilibrium. Nat. Phys. 18, 1367–1370 (2022).
Poulin, D. & Wocjan, P. Sampling from the thermal quantum Gibbs state and evaluating partition functions with a quantum computer. Phys. Rev. Lett. 103, 220502 (2009).
Chowdhury, A. N., Somma, R. D. & Subasi, Y. Computing partition functions in the one clean qubit model. Phys. Rev. A 103, 032422 (2021).
Jackson, A., Kapourniotis, T. & Datta, A. Partition-function estimation: quantum and quantum-inspired algorithms. Phys. Rev. A 107, 012421 (2023).
Sun, S.-N. et al. Quantum computation of finite-temperature static and dynamical properties of spin systems using quantum imaginary time evolution. PRX Quantum 2, 010317 (2021).
Szegedy, M. Quantum speedup of Markov chain based algorithms. In Proc. of the 45th FOCS Vol. 32 (2004).
Lemieux, J., Heim, B., Poulin, D., Svore, K. & Troyer, M. Efficient quantum walk circuits for Metropolis–Hastings algorithm. Quantum 4, 287 (2020).
Trefethen, L. N. & Bau, D. Numerical Linear Algebra (SIAM, 1997).
Saad, Y. Iterative Methods for Sparse Linear Systems (SIAM, 2003).
Strohmer, T. & Vershynin, R. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15, 262 (2007).
Harrow, A. W., Hassidim, A. & Lloyd, S. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 150502 (2009).
Lin, L. & Tong, Y. Near-optimal ground state preparation. Quantum 4, 372 (2020).
An, D. & Lin, L. Quantum linear system solver based on time-optimal adiabatic quantum computing and quantum approximate optimization algorithm. ACM Trans. Quantum Comput. 3, 1 (2022).
Subasi, Y., Somma, R. D. & Orsucci, D. Quantum algorithms for systems of linear equations inspired by adiabatic quantum computing. Phys. Rev. Lett. 122, 060504 (2019).
Wossnig, L., Zhao, Z. & Prakash, A. Quantum linear system algorithm for dense matrices. Phys. Rev. Lett. 120, 050502 (2018).
Clader, B. D. et al. Quantum resources required to block-encode a matrix of classical data. IEEE Trans. Quantum Eng. 3, 1 (2022).
Clinton, L. et al. Towards near-term quantum simulation of materials. Nat. Commun. 15, 211 (2024).
Kempe, J., Kitaev, A. & Regev, O. The complexity of the local Hamiltonian problem. SIAM J. Comput. 35, 1070 (2006).
Tubman, N. M. et al. Postponing the orthogonality catastrophe: efficient state preparation for electronic structure simulations on quantum devices. Preprint at https://arxiv.org/abs/1809.05523 (2018).
Abrams, D. S. & Lloyd, S. Quantum algorithm providing exponential speed increase for finding eigenvalues and eigenvectors. Phys. Rev. Lett. 83, 5162 (1999).
Ge, Y., Tura, J. & Cirac, J. I. Faster ground state preparation and high-precision ground energy estimation with fewer qubits. J. Math. Phys. 60, 022202 (2019).
Poulin, D. & Wocjan, P. Preparing ground states of quantum manybody systems on a quantum computer. Phys. Rev. Lett. 102, 130503 (2009).
Atia, Y. & Aharonov, D. Fast-forwarding of Hamiltonians and exponentially precise measurements. Nat. Commun. 8, 1572 (2017).
Reiher, M., Wiebe, N., Svore, K. M., Wecker, D. & Troyer, M. Elucidating reaction mechanisms on quantum computers. Proc. Natl. Acad. Sci. USA 114, 7555 (2017).
McArdle, S., Endo, S., Aspuru-Guzik, A., Benjamin, S. C. & Yuan, X. Quantum computational chemistry. Rev. Mod. Phys. 92, 015003 (2020).
Kim, I. H. et al. Fault-tolerant resource estimate for quantum chemical simulations: case study on Li-ion battery electrolyte molecules. Phys. Rev. Res. 4, 023019 (2022).
Camps, D., Lin, L., Van Beeumen, R. & Yang, C. Explicit quantum circuits for block encodings of certain sparse matrices. SIAM J. Matrix Anal. Appl. 45, 801 (2024).
Sünderhauf, C., Campbell, E. & Camps, J. Block-encoding structured matrices for data input in quantum computing. Quantum 8, 1226 (2024).
Berry, D. W., Gidney, C., Motta, M., McClean, J. R. & Babbush, R. Qubitization of arbitrary basis quantum chemistry leveraging sparsity and low rank factorization. Quantum 3, 208 (2019).
Lee, J. et al. Even more efficient quantum computations of chemistry through tensor hypercontraction. PRX Quantum 2, 030305 (2021).
Li, Z., Li, J., Dattani, N. S., Umrigar, C. J. & Chan, G. K.-L. The electronic complexity of the ground-state of the FeMo cofactor of nitrogenase as relevant to quantum simulations. J. Chem. Phys. 150, 024302 (2019).
Childs, A. M. & Wiebe, N. Hamiltonian simulation using linear combinations of unitary operations. Quantum Inf. Comput. 12, 901 (2012).
Lee, S. et al. Evaluating the evidence for exponential quantum advantage in ground-state quantum chemistry. Nat. Commun. 14, 1952 (2023).
Gidney, C. & Fowler, A. G. Efficient magic state factories with a catalyzed \(\left\vert CCZ\right\rangle\) to \(2\left\vert T\right\rangle\) transformation. Quantum 3, 135 (2019).
Acknowledgements
A.T. and L.A. acknowledge financial support from the Serrapilheira Institute (grant number Serra-1709-17173). We thank Lucas Borges, Samson Wang, Sam McArdle, Mario Berta, Daniel Stilck França, and Juan Miguel Arrazola for helpful discussions.
Contributions
All authors contributed equally to this manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Tosta, A., de Lima Silva, T., Camilo, G. et al. Randomized semi-quantum matrix processing. npj Quantum Inf 10, 93 (2024). https://doi.org/10.1038/s41534-024-00883-0