Introduction

Quantum computing offers speedups over classical computing in various tasks, including factoring, search, and simulation1. However, in many cases these speedups rely on the existence of efficient oracles or access models that encode the relevant classical data2. In this context, a function f(x) representing the classical data of interest is encoded by a unitary operation Uf, which acts as an oracle in the computation. To study quantum advantage, the number of queries to Uf in a quantum algorithm is compared with the number of queries to f(x) in classical algorithms. Quantum computing provides substantial reductions in query complexity for many problems of practical importance3,4,5.

There are various access models for encoding classical data. One commonly used access model is the sparse-access input model (SAIM)4,5,6,7,8,9,10,11,12, which encodes general sparse matrices and, given appropriate inputs, outputs the values or positions of the nonzero elements. SAIM was initially introduced for Hamiltonian simulation and discrete quantum walks4,6,7,8, and has since found broad applications in other fields such as machine learning5,9,11 and classical oscillator simulation12. For example, the quantum linear system problem can be solved with \(\tilde{O}(\kappa )\) queries to SAIM5,9,10, where κ is the condition number of the matrix to be inverted.

Another important access model is block-encoding, which serves as a crucial subroutine for quantum signal processing13,14 and its generalization, the quantum singular-value transformation (QSVT)10,15. Block-encoding enables Hamiltonian simulation with optimal query complexity13,14. Furthermore, many seminal quantum algorithms, including Grover's algorithm, the quantum Fourier transform, and the HHL algorithm, can be viewed as special cases of QSVT in which the problem of interest is encoded by a block-encoding15.

Many existing works treat access models as black boxes for convenience. However, the actual circuit complexity of an algorithm also depends on the cost of each query to these access models. Despite its importance, this problem has attracted attention only recently, and many basic questions remain open. In particular, ref. 16 presents a nearly time-optimal protocol for the block-encoding of general dense matrices of dimension \(2^n\times 2^n\), where a circuit depth of \(\tilde{O}(n)\) can be achieved at the expense of an exponential number of ancillary qubits. Ref. 17 examines matrices with D data values, each appearing M times, and considers examples including checkerboard matrices and tridiagonal matrices with polynomial circuit complexities. However, the cost of block-encoding more general matrices remains unexplored. Moreover, it is still unclear whether there is a fundamental limit to the resources required for data encoding.

In this work, we provide a framework for constructing quantum access models in the fault-tolerant setting using Clifford + T gates. The protocol works for general classical data and takes the underlying structure of the data into account, such as sparsity and linear combination of unitaries (LCU). Our results provide a direct mapping from the query complexity of quantum algorithms to their practical circuit complexity. Our protocols allow a tunable number of ancillary qubits and offer a space-time trade-off. For general sparse matrices of dimension \(N=2^n\), we investigate SAIM and block-encoding. For both access models, we first show that the gate count lower bound grows roughly linearly with N. We then develop construction algorithms with ancillary qubit numbers ranging from Ω(n) to O(N). Across this entire range, we achieve nearly optimal circuit complexity. We next study the block-encoding of LCU. Efficient block-encoding is achievable when the matrix can be represented as a linear combination of a polynomial number of unitaries, each of which can be implemented with a polynomial-size quantum circuit.

Our access model construction relies on optimized realizations of various subroutines that are of independent value, including quantum state preparation, select oracles for Pauli strings, and sparse Boolean functions. For all of these operations, we achieve circuit complexities that improve on, or are at least comparable to, the best-known realizations.

We now define SAIM and block-encoding. Let \(N=2^n\). We consider a sparse matrix \(H\in {{\mathbb{C}}}^{N\times N}\) with at most s = O(1) nonzero elements in each row and each column. Let \(H_{x,y}\) be the element in the xth row and yth column, where each \(H_{x,y}\) is a d-digit integer (d = O(1)). Let idx denote a 2n-qubit index register and wrd denote a d-qubit word register. The sparse-access input model (SAIM) consists of two unitaries \(O_H\) and \(O_F\) that satisfy

$${O}_{H}{\vert x,y\rangle }_{{{{\rm{idx}}}}}{\vert z\rangle }_{{{{\rm{wrd}}}}}={\vert x,y\rangle }_{{{{\rm{idx}}}}}{\vert z\oplus {H}_{x,y}\rangle }_{{{{\rm{wrd}}}}},$$
(1a)
$${O}_{F}{\vert x,k\rangle }_{{{{\rm{idx}}}}}={\vert x,F(x,k)\rangle }_{{{{\rm{idx}}}}}.$$
(1b)

Here, F(x, k) is the column index of the kth nonzero element in row x. Owing to its simplicity and generality, Eq. (1) has become one of the standard access models in quantum computing and is usually assumed to be available when processing classical data.
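As a purely classical illustration of what Eq. (1) encodes, the sketch below builds the lookup tables behind \(O_H\) and \(O_F\) for a small integer matrix and applies them to computational-basis labels. The helper names (`saim_tables`, `apply_OH`, `apply_OF`) are hypothetical. The sketch assumes each row has at most s nonzero entries and evaluates \(O_F\) only for k below the number of nonzeros in row x. No quantum circuit is constructed.

```python
import numpy as np

def saim_tables(H, s):
    """Classical data behind SAIM for an s-sparse integer matrix H.

    values[(x, y)] = H[x, y] for each nonzero entry;
    F[(x, k)] = column index of the k-th nonzero element in row x.
    """
    values, F = {}, {}
    for x in range(H.shape[0]):
        cols = np.flatnonzero(H[x])
        assert len(cols) <= s, "row has more than s nonzero elements"
        for k, y in enumerate(cols):
            values[(x, int(y))] = int(H[x, y])
            F[(x, k)] = int(y)
    return values, F

def apply_OH(values, x, y, z):
    # Eq. (1a): |x,y>|z> -> |x,y>|z XOR H_{x,y}>; zero entries leave z unchanged
    return x, y, z ^ values.get((x, y), 0)

def apply_OF(F, x, k):
    # Eq. (1b): |x,k> -> |x,F(x,k)>; only defined here for existing nonzeros
    return x, F[(x, k)]

H = np.array([[0, 3, 0, 0],
              [3, 0, 0, 5],
              [0, 0, 7, 0],
              [0, 5, 0, 0]])
values, F = saim_tables(H, s=2)
print(apply_OH(values, 1, 3, 0))  # (1, 3, 5)
print(apply_OF(F, 1, 1))          # (1, 3)
```

Because \(z\oplus H_{x,y}\oplus H_{x,y}=z\), applying \(O_H\) twice gives the identity. This is why the XOR form in Eq. (1a) is a valid (self-inverse) unitary on the full register.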

We call a unitary U a block-encoding of H if

$$\alpha \left(\left\langle {0}^{{n}_{{{{\rm{anc}}}}}}\right\vert \otimes {{\mathbb{I}}}_{N}\right)U\left(\left\vert {0}^{{n}_{{{{\rm{anc}}}}}}\right\rangle \otimes {{\mathbb{I}}}_{N}\right)=H,$$

where α > 0 is the normalization factor, nanc is the number of ancillary qubits, and \({{\mathbb{I}}}_{N}\) is the N-dimensional identity. In practice, we may consider approximate constructions of the block-encoding. More specifically, we call a unitary \(\tilde{U}\) an (α, nanc, ε)-block-encoding of H if

$$\begin{array}{r}\left\Vert H-\alpha \left(\left\langle {0}^{{n}_{{{{\rm{anc}}}}}}\right\vert \otimes {{\mathbb{I}}}_{N}\right)\tilde{U}\left(\left\vert {0}^{{n}_{{{{\rm{anc}}}}}}\right\rangle \otimes {{\mathbb{I}}}_{N}\right)\right\Vert\, \leqslant \,\varepsilon \end{array}$$
(2)

for error parameter ε ≥ 0. Throughout this manuscript, \(\Vert \cdot \Vert\) denotes the spectral norm for matrices and the Euclidean norm for vectors. For a general N-dimensional matrix H, constructing its block-encoding requires a gate count of Ω(poly(N)). As we show in Supplementary Discussion 2, this holds even for sparse H.
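The definition in Eq. (2) can be checked numerically at the matrix level. The sketch below uses the standard unitary dilation of a Hermitian matrix to obtain a (‖H‖, 1, 0)-block-encoding. This dilation is an assumption made for illustration: it is not the Clifford + T construction of this work and says nothing about gate counts. The ancilla is taken as the most significant qubit, so \(\langle 0\vert\otimes\mathbb{I}_N\) selects the top-left block.

```python
import numpy as np

def dilation_block_encoding(H):
    """Exact one-ancilla block-encoding of a Hermitian H (matrix level only).

    Returns (U, alpha) with alpha = ||H|| (spectral norm) such that the
    top-left N x N block of U, i.e. (<0| (x) I_N) U (|0> (x) I_N), equals H / alpha.
    """
    alpha = np.linalg.norm(H, 2)
    A = H / alpha                                   # now ||A|| <= 1
    w, V = np.linalg.eigh(A)
    # B = sqrt(I - A^2), built in the eigenbasis of A, so A and B commute
    B = V @ np.diag(np.sqrt(np.clip(1.0 - w**2, 0.0, None))) @ V.conj().T
    # U = [[A, B], [B, -A]] is Hermitian with U^2 = I, hence unitary
    U = np.block([[A, B], [B, -A]])
    return U, alpha

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (M + M.conj().T) / 2                            # random Hermitian test matrix
U, alpha = dilation_block_encoding(H)
N = H.shape[0]
print(np.allclose(U.conj().T @ U, np.eye(2 * N)))   # unitarity check
print(np.allclose(alpha * U[:N, :N], H))            # Eq. (2) with eps = 0
```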

On the other hand, when H has additional structure, the required resources may be significantly reduced. In particular, we consider H in the form of a linear combination of unitaries (LCU),

$$\begin{array}{r}H=\mathop{\sum }\limits_{p=0}^{P-1}{\alpha }_{p}{u}_{p},\end{array}$$
(3)

where \(u_p\) are n-qubit unitaries that can each be implemented with a polynomial-size quantum circuit, and P = O(poly(n)). The concept of LCU first appeared in ref. 18. The main purpose of ref. 18 and the follow-up work ref. 19 is to realize non-unitary transformations on quantum computers. In the context of Hamiltonian simulation, ref. 20 showed that LCU-based methods can outperform methods based on product formulas. This has inspired many subsequent works with diverse applications14,15,21,22,23.

Without loss of generality, we may assume that \({\log }_{2}P\) is an integer and that \(\mathop{\sum }\nolimits_{p = 0}^{P-1}{\alpha }_{p}=1\). These conditions can always be satisfied by adding terms with zero amplitude and rescaling the Hamiltonian. In particular, the linear combination of Pauli strings

$$\begin{array}{r}H=\mathop{\sum }\limits_{p=0}^{P-1}{\alpha }_{p}{H}_{p}\end{array}$$
(4)

will be studied in detail. Here, \({\alpha }_{p}\, > \,0,P\,\geqslant \,1,{H}_{p}{ = \bigotimes }_{l = 1}^{n}{H}_{p,l}\), and \(H_{p,l}\in\{\pm I,\pm X,\pm Y,\pm Z\}\) are single-qubit Pauli operators. Eq. (4) is important because it describes the Hamiltonians of almost all physical quantum systems of interest, such as spin and molecular systems.
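The normalization assumed above can be sketched as a small preprocessing step. It requires that \(\log_2 P\) be an integer, that \(\sum_p\alpha_p=1\), and that \(\alpha_p>0\) for genuine terms, with signs absorbed into \(H_p\in\pm\{I,X,Y,Z\}^{\otimes n}\). The input format (a list of `(coefficient, string)` pairs) and the function name are hypothetical. The returned factor `lam` is the rescaling applied to the Hamiltonian.

```python
def normalize_pauli_lcu(terms):
    """Bring sum_p c_p * P_p into the form of Eq. (4).

    terms: list of (c_p, pauli_string) with real, nonzero c_p, e.g. (0.5, "XZ").
    Returns (lam, alphas, signed) such that
        sum_p c_p P_p = lam * sum_p alphas[p] * sign_p * P_p,
    with sum(alphas) == 1 and len(alphas) a power of two.
    """
    n = len(terms[0][1])
    lam = sum(abs(c) for c, _ in terms)                     # rescaling factor
    alphas = [abs(c) / lam for c, _ in terms]               # alpha_p > 0
    signed = [(1 if c > 0 else -1, s) for c, s in terms]    # sign absorbed into H_p
    P = 1 << (len(terms) - 1).bit_length()                  # next power of two >= #terms
    pad = P - len(terms)
    alphas += [0.0] * pad                                   # zero-amplitude padding
    signed += [(1, "I" * n)] * pad
    return lam, alphas, signed

lam, alphas, signed = normalize_pauli_lcu([(0.5, "XZ"), (-1.0, "YY"), (0.5, "ZI")])
print(lam, alphas, signed)
# 2.0 [0.25, 0.5, 0.25, 0.0] [(1, 'XZ'), (-1, 'YY'), (1, 'ZI'), (1, 'II')]
```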

In our constructions, we consider the fault-tolerant quantum computing setting. More specifically, we use only Clifford gates acting on at most two qubits and the single-qubit T gate. This is equivalent to the elementary gate set \({{{{\mathcal{G}}}}}_{{{{\rm{clf}}}}+T}\equiv \{H,S,T,\,{{\mbox{CNOT}}}\,\}\), where H here denotes the Hadamard gate. All gates in \({{{{\mathcal{G}}}}}_{{{{\rm{clf}}}}+T}\) are compatible with surface-code error correction24. We benchmark the circuit complexity of a given quantum circuit with three quantities: the total number of elementary gates, the total number of qubits (including data and ancillary qubits), and the circuit depth. We also discuss the space-time trade-off of our algorithms, i.e., the circuit depth achievable with a given number of ancillary qubits. We always allow at least O(n) ancillary qubits, because this does not increase the asymptotic total space complexity.

Results

Circuit complexity lower bound

Before discussing the access model construction, we first study lower bounds on the circuit complexity, focusing on the encoding of sparse matrices. The methodology is general and can be readily applied to other related problems.

Our strategy is as follows. First, we analyze the capacity of a quantum circuit with bounded resources, i.e., how many distinct unitaries can be constructed with a fixed number of elementary gates or a fixed circuit depth. Second, we analyze the size of the access model, i.e., the number of distinct unitaries required to approximate the access model for arbitrary parameters. The circuit complexity can then be estimated by comparing the capacity of a quantum circuit with the size of the access model. Proofs of all lemmas and theorems in this section are provided in Supplementary Discussion 1.

Quantum circuit capacity

Assume that we are given a finite two-qubit elementary gate set \({{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}\), and let \(g\equiv | {{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}| =O(1)\) denote the number of elements in the set. Our first result is that the capacity can be upper bounded in terms of the number of elementary gates alone, independently of the space and time resources.

Lemma 1

Let \({{{{\mathcal{G}}}}}_{C}\) be the set containing all n-qubit unitaries that can be constructed with C elementary gates in \({{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}\). Then, we have \(\log | {{{{\mathcal{G}}}}}_{C}| =O\left((C\log (C+n))\right)\), even with unlimited ancillary qubit number.

Lemma 1 implies that the capacity does not keep growing with the number of ancillary qubits, which can be understood as follows. All ancillary qubits must be uncomputed at the end of the circuit. When C is fixed, only a finite number of unitaries can be constructed from these elementary gates while satisfying this requirement. We also note that the circuit depth satisfies D ≤ C, so Lemma 1 also implies a relation between capacity and circuit depth.

On the other hand, when the number of ancillary qubits and the circuit depth are both fixed, the upper bound on the capacity can be tightened as follows.

Lemma 2

Let \({{{{\mathcal{G}}}}}_{{n}_{{{{\rm{anc}}}}},D}^{{\prime} }\) be the set containing all unitaries that can be constructed with nanc ancillary qubits and D circuit depth. Then, we have \(\log \left\vert {{{{\mathcal{G}}}}}_{{n}_{{{{\rm{anc}}}}},D}^{{\prime} }\right\vert =O\left(D(n+{n}_{{{{\rm{anc}}}}})\right)\).

Lemmas 1 and 2 characterize the ultimate representational power of quantum circuits constructed from local gates. They can be used to estimate circuit complexity lower bounds whenever a task imposes a requirement on \(| {{{{\mathcal{G}}}}}_{C}|\) or \(| {{{{\mathcal{G}}}}}_{{n}_{{{{\rm{anc}}}}},D}^{{\prime} }|\). Moreover, similar results can be obtained straightforwardly for other types of elementary gate sets, such as k-local operations with k > 2.

Circuit complexity for encoding sparse matrices

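With Lemmas 1 and 2, we now estimate the circuit complexity lower bound for accessing sparse matrices. For SAIM, it turns out that at least Ω(N!) distinct unitaries are required to cover the set of all SAIMs for 1-sparse matrices. For instance, with exactly one nonzero element per row and column, the nonzero positions alone define one of N! permutations of the column indices. Combining this count with Lemmas 1 and 2, we obtain the following result.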

Theorem 1

Consider an arbitrary finite two-qubit elementary gate set \({{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}\). Let nanc, D, and C be the number of ancillary qubits, the circuit depth, and the total number of gates in \({{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}\) required to approximate the SAIM in Eq. (1) to any accuracy ε < 1. Then \((n+{n}_{{{{\rm{anc}}}}})D=\Omega({2}^{n}n)\) and \(C=\Omega({2}^{n})\).
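The counting step behind Theorem 1 can be made concrete with a crude bound. The sketch assumes \(|\mathcal{G}_C|\le (g(n+2C)^2)^C\). The reasoning is that each of the C gates picks one of g gate types and an ordered pair among at most n + 2C relevant qubits; ancillas untouched by every gate play no role. This bound reproduces the \(O(C\log(C+n))\) scaling of Lemma 1. The sketch ignores the ε-approximation treated in Supplementary Discussion 1 and finds the smallest C for which the bound can reach the N! distinct unitaries needed for 1-sparse SAIM. The value g = 4 (the size of {H, S, T, CNOT}) and all function names are illustrative choices.

```python
import math

def log2_factorial(N):
    # log2(N!) via the log-gamma function: ln(N!) = lgamma(N + 1)
    return math.lgamma(N + 1) / math.log(2)

def log2_capacity_bound(C, n, g):
    # Crude count (illustrative): each of the C gates picks one of g gate types
    # and an ordered pair among at most n + 2C qubits, so
    # |G_C| <= (g * (n + 2C)**2)**C, i.e. log2|G_C| = O(C log(C + n)).
    return C * (math.log2(g) + 2 * math.log2(n + 2 * C))

def min_gates_for_permutations(n, g=4):
    # Smallest C whose capacity bound reaches N! = (2**n)! distinct unitaries
    # (exact counting; the eps-approximation is ignored here).
    target = log2_factorial(2 ** n)
    hi = 1
    while log2_capacity_bound(hi, n, g) < target:
        hi *= 2
    lo = 0
    while lo < hi:                      # binary search: the bound increases with C
        mid = (lo + hi) // 2
        if log2_capacity_bound(mid, n, g) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

for n in range(4, 13, 2):
    N = 2 ** n
    C = min_gates_for_permutations(n)
    print(f"n={n:2d}  N={N:5d}  C_min={C:5d}  C_min/N={C / N:.2f}")
```

Under this toy bound, the ratio C_min/N stays of order one as n grows. This is the \(C=\Omega(2^n)\) behaviour stated in Theorem 1.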

A similar result is also obtained for the block-encoding of sparse matrix as follows.

Theorem 2

Consider an arbitrary finite two-qubit elementary gate set \({{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}\). Let nanc, D, and C be the number of ancillary qubits, the circuit depth, and the total number of gates in \({{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}\) required to construct the block-encoding of H to any accuracy ε < 2. Then \((n+{n}_{{{{\rm{anc}}}}})D=\Omega(N)\) and \(C=\Omega({N}^{\alpha})\) for arbitrary \(\alpha\in(0,1)\).

Theorems 1 and 2 imply that a general sparse matrix cannot be encoded with a subexponential number of quantum gates, for either SAIM or block-encoding. It is possible to trade ancillary qubits for circuit depth; however, the space and time complexities cannot both achieve subexponential scaling simultaneously. The hardness of SAIM can be interpreted as follows. Although H is assumed to be sparse (O(1) nonzero elements in each row and column), there are still \(2^n\times O(1)=O(2^n)\) independent variables in total. Therefore, the quantum circuit must contain an exponential number of elementary gates.

We note that the capacity of quantum circuits in the ancilla-free case has been studied in Section 4.5.4 of ref. 1. Moreover, a result related to Theorem 1 was obtained in ref. 25, which gives a lower bound on the number of distinct quantum circuits with a fixed qubit number and shows that there exists a table of size N requiring Ω(N) gates. Ref. 1 allows approximate implementations but does not consider ancillary qubits. Ref. 25 implicitly allows ancillary qubits but does not consider approximate implementations. In contrast, our results are more general, because both ancillary qubits and approximate implementations are allowed. Our results can also be generalized from unitaries to quantum channels. In Supplementary Discussion 1, we show that the circuit capacity and the circuit complexity lower bound are similar if we take two-qubit quantum channels, which can include measurements and feedback control, as the elementary operations.

Quantum state preparation

Quantum state preparation is a critical step in our access model construction and is of independent interest. We say that an (n + nanc)-qubit unitary G prepares the n-qubit quantum state \(\vert \psi \rangle\) with accuracy ε if

$$\begin{array}{r}G(\vert {0}^{n}\rangle \otimes \vert {0}^{{n}_{{{{\rm{anc}}}}}}\rangle )=\vert \tilde{\psi }\rangle \otimes\vert {0}^{{n}_{{{{\rm{anc}}}}}}\rangle \end{array}$$
(5)

for some \(\Vert \vert \psi \rangle -\vert \tilde{\psi }\rangle \Vert\, \leqslant \,\varepsilon\).

This problem has been studied extensively16,26,27,28,29,30,31,32,33,34,35,36,37. Given a sufficiently large number of ancillary qubits, the optimal Clifford + T depth \(O(n+\log (1/\varepsilon ))\) can be achieved37. However, with a restricted number of ancillary qubits, the optimal circuit depth has not been reached. For example, with O(n) ancillary qubits, the best-known Clifford + T construction achieves a circuit depth of \(O((N/n)\log (N/\varepsilon ))\)32. Moreover, in terms of gate count, all existing algorithms have either O(N poly(n)) or O(N polylog(n)) Clifford + T count. Whether a gate count scaling linearly with the data dimension N can be reached has remained an outstanding question.

Here, we provide a family of improved quantum state preparation protocols with a tunable number of ancillary qubits. The result is summarized below and follows directly from Theorem 8 in Methods.

Theorem 3

With nanc ancillary qubits where Ω(n) ≤ nanc ≤ O(N), an arbitrary n-qubit quantum state can be prepared to accuracy ε with \(O(N\log (1/\varepsilon ))\) count and \(\tilde{O}\left(N\log (1/\varepsilon )\frac{\log ({n}_{{{{\rm{anc}}}}})}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates, where \(\tilde{O}\) suppresses the doubly logarithmic factors of nanc.

Theorem 3 achieves a Clifford + T count that scales linearly with N, and this holds for arbitrary space complexity. When nanc = O(n), the circuit depth is lower than the best-known result of \(O\left(N\frac{\log (N/\varepsilon )}{n}\right)\). Moreover, compared with ref. 32, which also studies the space-time trade-off of state preparation, our method improves the circuit depth by a factor of \(\tilde{O}({n}_{{{{\rm{anc}}}}}/\log {n}_{{{{\rm{anc}}}}})\). Some representative state preparation protocols are summarized in Table 1 and Table 2.

Table 1 Clifford+T complexities of n-qubit state preparation protocols with fixed accuracy ε and total qubit (data qubit + ancillary qubit) number O(n)
Table 2 Clifford+T complexities of n-qubit state preparation protocols with fixed ε and exponential ancillary qubits

The main idea of our construction is as follows (see also Fig. 1). For nanc = O(n), we construct the quantum state with a sequence of uniformly controlled rotations (UCRs) using the method in ref. 28. Instead of decomposing each UCR to identical accuracy, we distribute the decomposition error in an optimized way. A UCR with m control qubits, denoted m-UCR, is decomposed into \(2^m\) m-qubit-controlled rotations. In the Clifford + T decomposition, we allow a larger decomposition error (i.e., a lower accuracy) as m becomes larger, which reduces the total circuit complexity.

Fig. 1: State preparation achieving O(N) Clifford + T count for the few-qubit case.

The operation is decomposed into uniformly controlled Z- and Y-rotations, whose control and rotation parts are shown in red and blue, respectively. Each m-UCR is decomposed into \(2^m\) multi-qubit-controlled single-qubit rotations, and m increases with the opacity of the control part (red). Each Z- or Y-rotation is decomposed into Clifford + T gates, and the decomposition accuracy increases with the opacity of the rotation part (blue).
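The benefit of distributing the error can be seen with a toy cost model for one family of UCRs (m = 0, ..., n − 1). The sketch rests on two assumptions. First, the Clifford + T cost of one rotation is proportional to \(\log_2(1/\varepsilon_m)\), in line with the scaling of ref. 48 but with the constant set to 1. Second, the errors \(\varepsilon_m\) of the n UCRs simply add. Minimizing \(\sum_m 2^m\log(1/\varepsilon_m)\) subject to \(\sum_m\varepsilon_m=\varepsilon\) with a Lagrange multiplier gives \(\varepsilon_m\propto 2^m\), which is the geometric choice below. The numbers are illustrative and are not the paper's exact accounting; the function names are hypothetical.

```python
import math

def total_rotation_cost(eps_levels):
    # Hypothetical cost model: the m-UCR uses 2**m rotations, each synthesized to
    # accuracy eps_m at a cost proportional to log2(1/eps_m) (constant set to 1).
    return sum(2 ** m * math.log2(1 / e) for m, e in enumerate(eps_levels))

def uniform_allocation(n, eps):
    # Identical accuracy eps/n for each of the n UCRs (m = 0, ..., n-1)
    return [eps / n] * n

def geometric_allocation(n, eps):
    # eps_m proportional to 2**m: looser accuracy for larger m; the sum stays below eps
    return [eps * 2.0 ** (m - n) for m in range(n)]

eps = 1e-3
for n in (8, 12, 16, 20):
    N = 2 ** n
    u = total_rotation_cost(uniform_allocation(n, eps)) / N
    g = total_rotation_cost(geometric_allocation(n, eps)) / N
    print(f"n={n:2d}  uniform cost/N={u:6.2f}  geometric cost/N={g:6.2f}")
```

Under these assumptions, the uniform allocation costs about \(N\log_2(n/\varepsilon)\), which keeps growing with n. The geometric allocation stays below \(N(\log_2(1/\varepsilon)+2)\), i.e., \(O(N\log(1/\varepsilon))\).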

For nanc = O(N), we improve the Clifford + T decomposition of the method in ref. 34 in a similar way. In both cases, the gate count scaling \(O(N\log (1/\varepsilon ))\) is achieved. For an arbitrary number of ancillary qubits between these two extremes, we provide a scheme combining the two protocols, which allows a space-time trade-off. Details of our state preparation scheme and the corresponding complexity analysis are provided in Methods. We also note that our protocol for the few-qubit case can be combined with the depth-optimal scheme in ref. 37. The circuit depth can then be improved to \(O(N\log (1/\varepsilon )\log ({n}_{{{{\rm{anc}}}}})/{n}_{{{{\rm{anc}}}}})\), at the cost of a higher gate count.

We note that when the quantum state is sparse, the circuit complexity is significantly lower. Sparse state preparation is useful for the block-encoding of sparse matrices. Details about sparse state preparation and sparse matrix block-encoding are provided in Supplementary Discussion 2.

Other useful subroutines

Before discussing the construction of the access models in Eq. (1) and Eq. (2), we introduce some other useful subroutines, namely the select oracle and the quantum sparse Boolean memory. These operations may also be useful on their own in other scenarios. For both operations, we obtain space-time trade-off constructions whose Clifford + T complexities improve on, or are comparable to, the best-known realizations (see also Table 3).

Table 3 Summary of the Clifford + T circuit complexities of the operations serving as subroutines in this work

Select oracle for Pauli strings

We consider a function of Pauli strings \({H}_{x}{ = \bigotimes }_{l = 1}^{L}{H}_{x,l}\), where \(x\in\{0,1,\ldots,2^m-1\}\) and \(H_{x,l}\in\{\pm I,\pm X,\pm Y,\pm Z\}\). We introduce two registers: the index register contains m qubits, and the word register contains L qubits. The select oracle for \(H_x\) is defined as

$$\begin{array}{r}\,{{\mbox{Select}}}\,({H}_{x})=\mathop{\sum }\limits_{x=0}^{{2}^{m}-1}\left\vert x\right\rangle \left\langle x\right\vert \otimes {H}_{x},\end{array}$$
(6)

where \(\left\vert x\right\rangle\) denotes the computational basis states of the index register, and the unitary \(H_x\) is applied to the word register. In other words, the state of the index register controls the operation applied to the word register.
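On a computational basis state, Eq. (6) simply applies the signed Pauli string \(H_x\) to \(\vert z\rangle\): X and Y flip bits, while Z and Y contribute phases. The sketch below models this action on basis labels and cross-checks it against the dense block-diagonal matrix \(\sum_x\vert x\rangle\langle x\vert\otimes H_x\). The table format `{x: (sign, string)}` and all function names are hypothetical, and qubit 1 is taken as the most significant bit. This is a classical check of the definition, not the circuit construction of Theorem 4.

```python
from functools import reduce
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_on_basis(sign, string, z):
    """Apply sign * (P_1 (x) ... (x) P_L) to the L-qubit basis state |z>.

    Returns (phase, z_out) with (sign * P)|z> = phase * |z_out>.
    """
    L = len(string)
    phase, z_out = complex(sign), z
    for l, p in enumerate(string):
        shift = L - 1 - l                       # qubit 1 is the most significant bit
        bit = (z >> shift) & 1
        if p in ("X", "Y"):
            z_out ^= 1 << shift                 # X and Y flip the bit
        if p == "Z" and bit:
            phase *= -1                         # Z|1> = -|1>
        elif p == "Y":
            phase *= 1j if bit == 0 else -1j    # Y|0> = i|1>, Y|1> = -i|0>
    return phase, z_out

def select_on_basis(table, x, z):
    # Eq. (6): Select(H_x)|x>|z> = |x> (H_x|z>); the index register is a pure control
    sign, string = table[x]
    phase, z_out = pauli_on_basis(sign, string, z)
    return phase, x, z_out

def dense_select(table, m, L):
    # Block-diagonal matrix sum_x |x><x| (x) H_x, with the index register most significant
    S = np.zeros((2 ** (m + L), 2 ** (m + L)), dtype=complex)
    for x, (sign, string) in table.items():
        Hx = sign * reduce(np.kron, [PAULI[p] for p in string])
        S[x * 2**L:(x + 1) * 2**L, x * 2**L:(x + 1) * 2**L] = Hx
    return S

# Example with m = 2 index qubits and L = 2 word qubits (hypothetical table)
table = {0: (1, "XZ"), 1: (-1, "YI"), 2: (1, "ZZ"), 3: (1, "IY")}
S = dense_select(table, m=2, L=2)
ok = all(
    np.isclose(S[x * 4 + z_out, x * 4 + z], phase)
    for x in range(4) for z in range(4)
    for phase, _, z_out in [select_on_basis(table, x, z)]
)
print(ok)
```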

Several proposals for implementing Eq. (6) have been introduced in the literature. For example, with nanc = m ancillary qubits, ref. 38 (Appendix G.4) proposed a method achieving O(ML) circuit depth and gate count, where \(M=2^m\). With nanc = O(ML) ancillary qubits, Eq. (6) is a special form of the “product unitary memory” in ref. 34, which can be constructed with \(O(\log (ML))\) depth and O(ML) count of Clifford + T gates. We provide an algorithm with a tunable number of ancillary qubits that achieves the following circuit complexity.

Theorem 4

With nanc ancillary qubits where Ω(m + L) ≤ nanc ≤ O(ML), Eq. (6) can be realized with O(ML) count and \(O\left(ML\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates.

Compared with the result in ref. 38, our protocol reduces the circuit depth by a factor of \(O\left(\frac{{n}_{{{{\rm{anc}}}}}}{\log {n}_{{{{\rm{anc}}}}}}\right)\) while maintaining the gate count scaling. The proof of Theorem 4 and details of the circuit constructions are provided in Methods.

Sparse Boolean memory

We consider a sparse Boolean function \(B:{\{0,1\}}^{n}\to {\{0,1\}}^{\tilde{n}}\) with a total of s inputs q satisfying \(B(q)\neq 0\cdots 0\). Given an n-qubit index register (denoted idx) and an \(\tilde{n}\)-qubit word register (denoted wrd), we define the sparse Boolean memory Select(B) as the unitary satisfying

$$\begin{array}{r}\,{{\mbox{Select}}}\,(B){\vert q\rangle }_{{{{\rm{idx}}}}}{\vert z\rangle }_{{{{\rm{wrd}}}}}={\vert q\rangle }_{{{{\rm{idx}}}}}{\vert z\oplus B(q)\rangle }_{{{{\rm{wrd}}}}}.\end{array}$$
(7)

We have the following result (see Methods for proof).

Theorem 5

With nanc ancillary qubits where \(\Omega (n)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(ns\tilde{n})\), Select(B) can be realized with \(O(ns\tilde{n})\) count and \(O\left(ns\tilde{n}\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates.

Unlike SAIM, Eq. (7) contains only s nonzero outputs, so when s is small its construction requires far fewer resources.
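Functionally, Eq. (7) is a lookup that XORs the stored word into wrd only when idx matches one of the s stored inputs. The basis-label sketch below uses a simple compare-then-XOR loop to make this behaviour and the self-inverse property explicit. The loop is one straightforward realization chosen for illustration, not necessarily the construction used in Methods; the dictionary input and function names are hypothetical.

```python
def make_select_B(entries, n, n_word):
    """Basis-state model of Select(B) in Eq. (7).

    entries: dict {q: B(q)} listing the s inputs with B(q) != 0...0
    (hypothetical container). Returns select_B(q, z) -> (q, z XOR B(q)).
    """
    for q, b in entries.items():
        assert 0 <= q < 2 ** n and 0 < b < 2 ** n_word
    def select_B(q, z):
        # Compare-then-XOR: for each stored q_i, if idx == q_i, XOR B(q_i) into wrd
        for q_i, b_i in entries.items():
            if q == q_i:
                z ^= b_i
        return q, z
    return select_B

select_B = make_select_B({3: 0b101, 9: 0b011}, n=4, n_word=3)
print(select_B(3, 0b000))   # (3, 5)
print(select_B(5, 0b110))   # (5, 6): B(5) = 0, so wrd is unchanged
```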

Construction of SAIM

With all the necessary tools in place, we now discuss the construction of the SAIM in Eq. (1). We have the following result.

Theorem 6

Given nanc ancillary qubits where Ω(n) ≤ nanc ≤ O(Nnds), OH can be constructed with O(Nnds) count and \(O\left(Nnds\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates.

Given nanc ancillary qubits where \(\Omega (n)\leqslant {n}_{{{{\rm{anc}}}}}\leqslant O(Nns\log s)\), \({O}_{F}\) can be constructed with \(O(Nns\log s)\) count and \(O\left(Nns\log s\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates.

Proof

OH corresponds to a 2n-index, d-word, Ns-sparse Boolean function, so the construction of OH follows directly from Theorem 5.

OF can be constructed in three steps. We introduce an n-qubit ancillary register (denoted anc). In the first step, we perform the transformation

$$\begin{array}{r}{\vert x,k\rangle }_{{{{\rm{idx}}}}}{\vert 0\rangle }_{{{{\rm{anc}}}}}\to {\vert x,k\rangle }_{{{{\rm{idx}}}}}{\vert F(x,k)\rangle }_{{{{\rm{anc}}}}}.\end{array}$$
(8)

According to Theorem 4, this step can be constructed with \(O({2}^{n}ns)\) count and \(O({2}^{n}ns\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}})\) depth using nanc ancillary qubits. In the second step, we apply swap gates between the ancillary register and the half of the index register that encodes k, i.e.,

$$\begin{array}{r}{\left\vert x,k\right\rangle }_{{{{\rm{idx}}}}}{\left\vert F(x,k)\right\rangle }_{{{{\rm{anc}}}}}\to {\left\vert x,F(x,k)\right\rangle }_{{{{\rm{idx}}}}}{\left\vert k\right\rangle }_{{{{\rm{anc}}}}}\end{array}$$
(9)

This step can be realized with O(n) count and O(1) depth of Clifford + T gates. In the final step, we perform the transformation

$$\begin{array}{r}{\vert x,F(x,k)\rangle }_{{{{\rm{idx}}}}}{\vert k\rangle }_{{{{\rm{anc}}}}}\to {\vert x,F(x,k)\rangle }_{{{{\rm{idx}}}}}{\vert 0\rangle }_{{{{\rm{anc}}}}}\end{array}$$
(10)

which can be realized by a 2n-index, \(\lceil {\log }_{2}s\rceil\)-word, Ns-sparse Boolean memory. According to Theorem 5, this step can be constructed with \(O(Nns\log s)\) count and \(O\left(Nns\log s\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates. Summing the costs of the three steps gives the stated complexity. □
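The three steps in Eqs. (8) to (10) can be traced on basis labels with a short classical simulation, treating register contents as integers. The table `F` plays the role of the data loaded in Eq. (8). The inverse lookup `B`, with \(B(x,F(x,k))=k\), plays the role of the \(\lceil\log_2 s\rceil\)-word sparse Boolean memory in Eq. (10). Function names are hypothetical, and k is assumed to range only over existing nonzeros of row x.

```python
import numpy as np

def build_F_and_B(H):
    """F[(x, k)]: column of the k-th nonzero in row x (data loaded in Eq. (8)).
    B[(x, y)]: k for each nonzero (x, y), i.e. the sparse Boolean memory of Eq. (10)."""
    F, B = {}, {}
    for x in range(H.shape[0]):
        for k, y in enumerate(np.flatnonzero(H[x])):
            F[(x, k)] = int(y)
            B[(x, int(y))] = k
    return F, B

def O_F_three_steps(F, B, x, k):
    anc = 0
    anc ^= F[(x, k)]            # Eq. (8):  |x,k>|0>        -> |x,k>|F(x,k)>
    col, anc = anc, k           # Eq. (9):  swap the k-half of idx with anc
    anc ^= B.get((x, col), 0)   # Eq. (10): XOR B(x,F(x,k)) = k, so anc -> |0>
    return x, col, anc

H = np.array([[0, 3, 0, 0],
              [3, 0, 0, 5],
              [0, 0, 7, 0],
              [0, 5, 0, 0]])
F, B = build_F_and_B(H)
for (x, k), y in sorted(F.items()):
    print((x, k), "->", O_F_three_steps(F, B, x, k))
```

Every output has the form (x, F(x, k), 0), so the ancillary register is returned clean. This is what makes the composition a valid implementation of Eq. (1b).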

Compared with the circuit complexity lower bound in Theorem 1, our protocol has nearly optimal circuit complexity with respect to the matrix dimension, up to a factor of n. As mentioned before, SAIM is a standard access model in many quantum algorithms, and the query complexity of SAIM has been studied extensively for various tasks. With Theorem 6, one can directly obtain the concrete circuit complexity of those algorithms. Further discussion is provided in the Discussion section.

Construction of LCU-based block-encoding

The LCU-based block-encoding can be constructed from quantum state preparation and a select oracle13. We define \({{{\boldsymbol{\alpha }}}}=[{\alpha }_{0},\ldots ,{\alpha }_{P-1}]\) and \(\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle =\mathop{\sum }\nolimits_{p = 0}^{P-1}\sqrt{{\alpha }_{p}}\left\vert p\right\rangle\). Let \({G}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }\) be the state preparation unitary for \(\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle\), and define \({\mathbb{G}}\equiv {G}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }\otimes {{\mathbb{I}}}_{{2}^{n}}\). We then define a select oracle corresponding to Eq. (3) as \(\,{{\mbox{Select}}}\,({u}_{p})=\mathop{\sum }\nolimits_{p = 0}^{P-1}\vert p\rangle \langle p\vert \otimes {u}_{p}\). Since \({\mathbb{G}}(\vert 0\rangle \otimes {{\mathbb{I}}}_{{2}^{n}})=\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle \otimes {{\mathbb{I}}}_{{2}^{n}}\), we have \((\langle 0\vert \otimes {{\mathbb{I}}}_{{2}^{n}}){{\mathbb{G}}}^{{\dagger} }\,{{\mbox{Select}}}\,({u}_{p}){\mathbb{G}}(\vert 0\rangle \otimes {{\mathbb{I}}}_{{2}^{n}})=\mathop{\sum }\nolimits_{p}{\alpha }_{p}{u}_{p}=H\). Hence \({{\mathbb{G}}}^{{\dagger} }\,{{\mbox{Select}}}\,({u}_{p}){\mathbb{G}}\) is a block-encoding of H with normalization factor α = 1 (ref. 14). The construction of the LCU-based block-encoding thus reduces to quantum state preparation and Select(up), both of which can be realized with polynomial-size quantum circuits.
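The identity above can be verified numerically for a small Pauli LCU at the matrix level. In the sketch, a Householder reflection serves as a hypothetical choice of \(G_{\vert\boldsymbol{\alpha}\rangle}\); any unitary whose first column is \(\sqrt{\boldsymbol{\alpha}}\) works. The index register is the most significant one, so the top-left \(2^n\times 2^n\) block of \(\mathbb{G}^\dagger\,\mathrm{Select}(u_p)\,\mathbb{G}\) should equal H. This checks the algebra only and does not reflect the Clifford + T construction of Theorem 7.

```python
from functools import reduce
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_matrix(string):
    # Tensor product P_1 (x) ... (x) P_n, with qubit 1 the most significant
    return reduce(np.kron, [PAULI[c] for c in string])

def prep_unitary(alphas):
    """Real orthogonal G with G|0> = sum_p sqrt(alpha_p)|p>, via a Householder
    reflection (illustrative choice; any unitary with this first column works)."""
    a = np.sqrt(np.asarray(alphas, dtype=float))
    v = -a.copy()
    v[0] += 1.0                                     # v = e_0 - a
    norm = np.linalg.norm(v)
    if norm < 1e-12:                                # a is already e_0
        return np.eye(len(a))
    v /= norm
    return np.eye(len(a)) - 2.0 * np.outer(v, v)    # reflection mapping e_0 to a

def lcu_block_encoding(alphas, strings):
    """Matrix of G^dagger Select(u_p) G with G = G_alpha (x) I_{2^n}."""
    N = 2 ** len(strings[0])
    G = np.kron(prep_unitary(alphas), np.eye(N))    # index register most significant
    P = len(alphas)
    select = np.zeros((P * N, P * N), dtype=complex)
    for p, s in enumerate(strings):
        select[p * N:(p + 1) * N, p * N:(p + 1) * N] = pauli_matrix(s)
    return G.conj().T @ select @ G

alphas = [0.4, 0.3, 0.2, 0.1]                        # sum to 1, P = 4
strings = ["XZ", "YY", "ZI", "II"]
H = sum(a * pauli_matrix(s) for a, s in zip(alphas, strings))
U = lcu_block_encoding(alphas, strings)
N = H.shape[0]
print(np.allclose(U.conj().T @ U, np.eye(U.shape[0])))  # U is unitary
print(np.allclose(U[:N, :N], H))                        # top-left block equals H
```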

The exact circuit complexity of the block-encoding depends on the specific form of up. We take the LCU of Pauli strings (Eq. (4)) as an example. Based on our improved quantum state preparation (Theorem 3) and select oracle for Pauli strings (Theorem 4), we obtain the following result, where (nanc, ε)-block-encoding abbreviates (1, nanc, ε)-block-encoding (see Methods for the proof).

Theorem 7

With nanc ancillary qubits where \(\Omega ({\log }_{2}P)\,\leqslant\, {n}_{{{{\rm{anc}}}}}\,\leqslant \,O(NP)\), the (nanc, ε)-block-encoding of H defined in Eq. (4) can be constructed with \(O\left(P(n+\log (1/\varepsilon ))\right)\) count and \(\tilde{O}\left(Pn\log (1/\varepsilon )\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates, where \(\tilde{O}\) suppresses the doubly logarithmic factors of nanc.

The block-encoding of an LCU can thus be constructed with circuit complexity polylogarithmic in the data dimension, as opposed to SAIM, which requires a gate count polynomial in N. Therefore, exponential quantum advantage can be expected for structured classical data of the form of Eq. (3). Below, we discuss our results further.

Discussion

As demonstrated in Theorem 1, a general SAIM cannot be implemented with a quantum circuit of size O(poly(n)). In the language of complexity classes, this implies that \(\mathrm{BQP}^{\mathrm{SAIM}}\neq \mathrm{BQP}\), where SAIM denotes quantum oracles of the form of Eq. (1). In other words, if a problem A can be solved with a polynomial number of queries to SAIM, A is not necessarily solvable with polynomial-size quantum circuits. In fact, it is reasonable to conjecture that \(\mathrm{BQP}^{\mathrm{SAIM}}\neq \mathrm{PSPACE}\) when considering the scaling with n. The reason is that storing all elements of a general \(2^n\)-dimensional matrix requires exponential space, even when the matrix is sparse. The same argument applies to the block-encoding of sparse matrices.

This argument is consistent with results on classical dequantization algorithms39,40, which demonstrate that sublinear classical runtime can be achieved for tasks such as recommendation systems and solving linear systems. Note that these algorithms assume a classical oracle similar to SAIM.

On the other hand, our study of sparse matrix encoding is still valuable. First, it is rare for classical data to have enough structure to be encoded with logarithmic complexity; in many cases, a sparse matrix is the most compact representation of the classical data of interest. Second, with SAIM or block-encoding, a polynomial quantum speedup with respect to the matrix dimension N is still possible. Our constructions are nearly optimal and can be used to estimate the concrete Clifford + T complexities of many quantum algorithms of practical interest. Finally, the techniques developed here may serve as subroutines for encoding larger matrices with special structures, for which exponential quantum advantage may be possible.

An open question is how to determine whether a given matrix can be efficiently block-encoded. This problem can be considered a generalization of the unitary complexity problem41,42,43,44, and it is important because of the broad applications of block-encoding15. According to Theorem 7, an LCU of efficient unitaries [Eq. (3)] is a sufficient condition for efficient block-encoding. Given the generality and simplicity of LCU, it is reasonable to conjecture that whether a matrix admits a decomposition of the form of Eq. (3) is closely related to the efficiency of its block-encoding. We expect the block-encoding of H to be challenging when H cannot be well approximated by Eq. (3) with P = O(poly(n)).

In conclusion, we have studied the circuit complexities of typical quantum access models, namely SAIM and block-encoding. We show that the circuit complexity lower bound for encoding sparse matrices is polynomial in the matrix dimension, and we provide nearly optimal construction protocols that achieve this lower bound. For LCU-based block-encoding, we develop a construction protocol based on improved implementations of quantum state preparation and the select oracle for Pauli strings. Our protocols use Clifford + T gates and allow a tunable number of ancillary qubits. We expect our results to be useful for processing classical data with quantum devices45,46,47. Future work may include the study of the circuit complexity lower bound for block-encoding and further improvements of our protocols to achieve the lower bounds. Another interesting topic is the power of quantum circuits with global quantum channels, for example, when feedback control depends on the outcomes of many measurements. In this case, the elementary operations may no longer be described by local operations, and the computational power of the circuit is expected to be enhanced. On the application side, it would be interesting to find practical classical problems whose data can be represented in the form of an LCU; in such scenarios, exponential quantum advantage can be expected.

Methods

Quantum state preparation

We first consider state preparation with n ancillary qubits. Some state preparation protocols, such as that of ref. 28, have optimal single- and two-qubit gate counts. However, with a direct Clifford + T decomposition, the gate complexity becomes suboptimal. With an optimized Clifford + T decomposition, we achieve a gate count and circuit depth linear in the state dimension. The result is as follows.

Lemma 3

With n ancillary qubits, an arbitrary quantum state can be prepared to precision ε with \(O(N\log (1/\varepsilon ))\) depth and \(O(N\log (1/\varepsilon ))\) count of Clifford + T gates.

Proof

According to ref. 28, with single- and two-qubit gates, an arbitrary quantum state \(\vert {\psi }_{{{{\rm{targ}}}}}\rangle\) can be expressed, up to a global phase, as

$$\begin{array}{r}\vert {\psi }_{{{{\rm{targ}}}}}\rangle =\left(\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{z}\right)\left(\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{y}\right)\left\vert 0\cdots 0\right\rangle ,\end{array}$$
(11)

where \({F}_{j}^{z}\) and \({F}_{j}^{y}\) are uniformly controlled Z- and Y-rotations

$${F}_{j}^{z}=\mathop{\sum }\limits_{k=0}^{{2}^{j-1}-1}\left\vert k\right\rangle \left\langle k\right\vert \otimes {R}_{z}({\alpha }_{j,k}^{z})\otimes {{\mathbb{I}}}_{{2}^{n-j}},$$
(12a)
$${F}_{j}^{y}=\mathop{\sum }\limits_{k=0}^{{2}^{j-1}-1}\vert k\rangle\langle k\vert \otimes {R}_{y}({\alpha }_{j,k}^{y})\otimes {{\mathbb{I}}}_{{2}^{n-j}},$$
(12b)

with single-qubit rotation gates \({R}_{y}(\theta )={e}^{-i\theta {\sigma }_{y}/2},{R}_{z}(\theta )={e}^{-i\theta {\sigma }_{z}/2}\). Here \({\alpha }_{j,k}^{y}\in {\mathbb{R}}\) and \({\alpha }_{j,k}^{z}\in {\mathbb{R}}\) are rotation angles whose exact values are not important for our analysis.
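The structure of the uniformly controlled rotations can be checked on a small statevector. The sketch below computes one valid set of angles (the usual choice from a tree of partial norms and a tree of averaged phases; this is our assumption, since the text leaves the angles unspecified), applies all \({F}_{j}^{y}\) for j = 1 to n and then all \({F}_{j}^{z}\) to \(\left\vert 0\cdots 0\right\rangle\), and confirms that the target state is recovered up to a global phase. Qubit 1 is taken as the most significant bit, and exact rotations are used, so no Clifford + T approximation is involved.

```python
import cmath
import math


def ry(theta):
    """R_y(theta) = exp(-i theta sigma_y / 2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def rz(alpha):
    """R_z(alpha) = exp(-i alpha sigma_z / 2)."""
    return [[cmath.exp(-0.5j * alpha), 0], [0, cmath.exp(0.5j * alpha)]]


def apply_uniformly_controlled(state, n, j, gates):
    """Apply F_j = sum_k |k><k| (x) gates[k] (x) I, where k is the value of
    qubits 1..j-1 and gates[k] acts on qubit j (qubit 1 = most significant bit)."""
    shift = n - j                          # bit position of qubit j
    out = list(state)
    for x in range(len(state)):
        if (x >> shift) & 1 == 0:
            y = x | (1 << shift)           # partner index with qubit j = 1
            g = gates[x >> (shift + 1)]    # prefix value k selects the rotation
            out[x] = g[0][0] * state[x] + g[0][1] * state[y]
            out[y] = g[1][0] * state[x] + g[1][1] * state[y]
    return out


def angles(psi, n):
    """Y angles from the tree of partial norms, Z angles from the tree of averaged phases."""
    mags = {n: [abs(a) for a in psi]}
    phases = {n: [cmath.phase(a) for a in psi]}
    for j in range(n, 0, -1):
        half = 2 ** (j - 1)
        mags[j - 1] = [math.hypot(mags[j][2 * k], mags[j][2 * k + 1]) for k in range(half)]
        phases[j - 1] = [(phases[j][2 * k] + phases[j][2 * k + 1]) / 2 for k in range(half)]
    theta = {j: [2 * math.atan2(mags[j][2 * k + 1], mags[j][2 * k])
                 for k in range(2 ** (j - 1))] for j in range(1, n + 1)}
    alpha = {j: [phases[j][2 * k + 1] - phases[j][2 * k]
                 for k in range(2 ** (j - 1))] for j in range(1, n + 1)}
    return theta, alpha


def prepare(psi):
    n = len(psi).bit_length() - 1
    theta, alpha = angles(psi, n)
    state = [1 + 0j] + [0j] * (len(psi) - 1)     # |0...0>
    for j in range(1, n + 1):                     # magnitudes: F_1^y acts first
        state = apply_uniformly_controlled(state, n, j, [ry(t) for t in theta[j]])
    for j in range(1, n + 1):                     # relative phases (diagonal, they commute)
        state = apply_uniformly_controlled(state, n, j, [rz(a) for a in alpha[j]])
    return state


raw = [1, 2j, -1, 0.5, 0, 1 - 1j, 3, -2j]
norm = math.sqrt(sum(abs(a) ** 2 for a in raw))
psi = [a / norm for a in raw]
out = prepare(psi)
overlap = abs(sum(a.conjugate() * b for a, b in zip(psi, out)))
print(overlap)   # equal to 1 up to floating-point rounding
```

The Z stage only produces relative phases, which is why it must act after the Y stage; applied to \(\left\vert 0\cdots 0\right\rangle\) alone it would contribute only a global phase.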

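Single-qubit rotations can be approximated with Clifford + T gates. According to ref. 48, a unitary \({\widetilde{u}}_{z}\) satisfying \(\Vert {\widetilde{u}}_{z}-{R}_{z}({\alpha }_{j,k}^{z}/2)\Vert\, \leqslant \,{\varepsilon }_{j}/2\) can be constructed with \(O(\log (1/{\varepsilon }_{j}))\) single-qubit Clifford + T gates without ancilla. Accordingly, we can implement a single-qubit-controlled-\({\widetilde{R}}_{z}({\alpha }_{j,k}^{z},{\varepsilon }_{j})\) such that

$$\begin{array}{r}\left\Vert {\widetilde{R}}_{z}\left({\alpha }_{j,k}^{z},{\varepsilon }_{j}\right)-{R}_{z}\left({\alpha }_{j,k}^{z}\right)\right\Vert \,\leqslant \,{\varepsilon }_{j}\end{array}$$
(13)

with a circuit that applies \({\widetilde{u}}_{z}\) to the target qubit, then a CNOT from the control qubit to the target, then \({\widetilde{u}}_{z}^{{\dagger} }\), and finally a second CNOT. If the control qubit is in \(\left\vert 0\right\rangle\), the target undergoes \({\widetilde{u}}_{z}^{{\dagger} }{\widetilde{u}}_{z}={\mathbb{I}}\); if it is in \(\left\vert 1\right\rangle\), the target undergoes \(X{\widetilde{u}}_{z}^{{\dagger} }X{\widetilde{u}}_{z}\), which approximates \(X{R}_{z}(-{\alpha }_{j,k}^{z}/2)X{R}_{z}({\alpha }_{j,k}^{z}/2)={R}_{z}({\alpha }_{j,k}^{z})\) to within \({\varepsilon }_{j}/2+{\varepsilon }_{j}/2={\varepsilon }_{j}\).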
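A quick numerical check of this controlled-rotation construction, with exact rotations in place of \({\widetilde{u}}_{z}\) and assuming the gate order u = R(α/2), CNOT, u† = R(−α/2), CNOT on the target qubit (all helper names are hypothetical). Because \(X{R}_{y}(\theta )X={R}_{y}(-\theta )\) as well, the same circuit also covers the Y-rotations.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    """Kronecker product of two matrices stored as nested lists."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb] for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]


def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]


I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]   # control = first qubit


def controlled(u):
    """|0><0| (x) I + |1><1| (x) u."""
    return [[1, 0, 0, 0], [0, 1, 0, 0],
            [0, 0, u[0][0], u[0][1]], [0, 0, u[1][0], u[1][1]]]


def ctrl_rotation_circuit(rot, alpha):
    """Time order: u on target, CNOT, u^dagger on target, CNOT, with u = rot(alpha / 2).
    rot(-t) is the adjoint of rot(t) for both R_z and R_y."""
    u, u_dag = kron(I2, rot(alpha / 2)), kron(I2, rot(-alpha / 2))
    return matmul(CNOT, matmul(u_dag, matmul(CNOT, u)))


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a[0])))


for alpha in (0.3, 1.7, -2.4):
    print(alpha,
          max_abs_diff(ctrl_rotation_circuit(rz, alpha), controlled(rz(alpha))),
          max_abs_diff(ctrl_rotation_circuit(ry, alpha), controlled(ry(alpha))))
```

Only the two single-qubit factors are approximated in practice, so each can contribute at most half of the error budget, which is consistent with the bound in Eq. (13).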

Note that \({\widetilde{u}}_{z}^{{\dagger} }\) can be realized by reversing the Clifford + T gate sequence of \({\widetilde{u}}_{z}\) and replacing each gate by its inverse. A similar argument applies to \({R}_{y}({\alpha }_{j,k}^{y})\). Then, according to Lemma 6, which is introduced in the subsection on select oracles below, one can construct the following unitaries

$${\widetilde{F}}_{j}^{y}=\mathop{\sum }\limits_{k=0}^{{2}^{j-1}-1}\left\vert k\right\rangle \left\langle k\right\vert \otimes {\widetilde{R}}_{y}\left({\alpha }_{j,k}^{y},{\varepsilon }_{j}\right)\otimes {{\mathbb{I}}}_{{2}^{n-j}}$$
(14)
$${\widetilde{F}}_{j}^{z}=\mathop{\sum }\limits_{k=0}^{{2}^{j-1}-1}\left\vert k\right\rangle \left\langle k\right\vert \otimes {\widetilde{R}}_{z}\left({\alpha }_{j,k}^{z},{\varepsilon }_{j}\right)\otimes {{\mathbb{I}}}_{{2}^{n-j}}$$
(15)

with j ancillary qubits, \(O({2}^{j}\log (1/{\varepsilon }_{j}))\) depth and \(O({2}^{j}\log (1/{\varepsilon }_{j}))\) count of Clifford + T gates. We therefore approximate the target state with the following

$$\begin{array}{r}\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle =\left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{z}\right)\left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{y}\right)\left\vert 0\cdots 0\right\rangle .\end{array}$$
(16)

Below, we first bound the distance between \(\left\vert {\psi }_{{{{\rm{targ}}}}}\right\rangle\) and \(\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle\). It can be verified that \(\Vert {\widetilde{F}}_{j}^{y}\vert \psi \rangle -{F}_{j}^{y}\vert \psi \rangle \Vert \,\leqslant \,{\varepsilon }_{j}\) and \(\Vert {\widetilde{F}}_{j}^{z}\vert \psi \rangle -{F}_{j}^{z}\vert \psi \rangle \Vert\, \leqslant \,{\varepsilon }_{j}\) for any quantum state \(\left\vert \psi \right\rangle\). In other words, we have \(\Vert {\widetilde{F}}_{j}^{y}-{F}_{j}^{y}\Vert\, \leqslant\, {\varepsilon }_{j}\) and \(\Vert {\widetilde{F}}_{j}^{z}-{F}_{j}^{z}\Vert \,\leqslant \,{\varepsilon }_{j}\). Therefore,

$$\begin{array}{ll}\quad\left\Vert \left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{y}-\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{y}\right)\right\Vert \\ \leqslant \,\left\Vert {\widetilde{F}}_{n}^{y}\left(\mathop{\prod }\limits_{j=1}^{n-1}{\widetilde{F}}_{j}^{y}-\mathop{\prod }\limits_{j=1}^{n-1}{F}_{j}^{y}\right)\right\Vert +\left\Vert \left({\widetilde{F}}_{n}^{y}-{F}_{n}^{y}\right)\mathop{\prod }\limits_{j=1}^{n-1}{F}_{j}^{y}\right\Vert \\ \leqslant \left\Vert \left(\mathop{\prod }\limits_{j=1}^{n-1}{\widetilde{F}}_{j}^{y}-\mathop{\prod }\limits_{j=1}^{n-1}{F}_{j}^{y}\right)\right\Vert +{\varepsilon }_{n}\\ \cdots \\ \leqslant \mathop{\sum }\limits_{j=1}^{n}{\varepsilon }_{j}.\end{array}$$
(17)

In a similar way, we can obtain \(\left\Vert \left(\mathop{\prod }\nolimits_{j = 1}^{n}{\widetilde{F}}_{j}^{z}-\mathop{\prod }\nolimits_{j = 1}^{n}{F}_{j}^{z}\right)\right\Vert\, \leqslant \,\mathop{\sum }\nolimits_{j = 1}^{n}{\varepsilon }_{j}\). So we have

$$\begin{array}{ll}\quad\Vert \vert {\psi }_{{{{\rm{targ}}}}}\rangle -\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle \Vert \\\leqslant\left\Vert \left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{z}\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{y}-\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{z}\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{y}\right)\right\Vert \\ \leqslant \left\Vert \left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{z}-\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{z}\right)\right\Vert +\left\Vert \left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{y}-\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{y}\right)\right\Vert \\ \leqslant 2\mathop{\sum }\limits_{j=1}^{n}{\varepsilon }_{j}.\end{array}$$
(18)

According to Eq. (18), to bound the total error by ε, i.e. \(\parallel \vert {\psi }_{{{{\rm{targ}}}}}\rangle -\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle \parallel \,\leqslant \,\varepsilon\), it suffices to set \({\varepsilon }_{j}=\varepsilon /{2}^{n-j+2}\), because then \(2\mathop{\sum }\nolimits_{j=1}^{n}{\varepsilon }_{j}=\varepsilon (1-{2}^{-n}) < \varepsilon\). Because each \({\widetilde{F}}_{j}^{y}\) or \({\widetilde{F}}_{j}^{z}\) requires \(O({2}^{j}\log (1/{\varepsilon }_{j}))\) gate count and circuit depth, the total gate count is

$$\begin{array}{ll}\quad{C}\,=\,\mathop{\sum }\limits_{j=1}^{n}O({2}^{j}\log (1/{\varepsilon }_{j}))\\ \qquad=\,\mathop{\sum }\limits_{j=1}^{n}O({2}^{j}\log ({2}^{n-j+2}/\varepsilon ))\\ \qquad=\,O(N\log (1/\varepsilon )),\end{array}$$
(19)

where the last step uses \(\mathop{\sum }\nolimits_{j=1}^{n}{2}^{j}(n-j+2)=O({2}^{n})\). Similarly, the total circuit depth is

$$\begin{array}{r}D=\mathop{\sum }\limits_{j=1}^{n}O({2}^{j}\log (1/{\varepsilon }_{j}))=O(N\log (1/\varepsilon )).\end{array}$$
(20)
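The error and cost bookkeeping of Eqs. (18) and (19) can be checked numerically. The sketch below assumes the exponentially decaying per-level schedule \({\varepsilon }_{j}=\varepsilon /{2}^{n-j+2}\) and sets every constant hidden in the O(·) notation to 1, so it is an illustration of the scaling rather than a gate count of an actual circuit.

```python
import math


def level_precisions(n, eps):
    """Per-level precisions eps_j = eps / 2**(n - j + 2) for j = 1..n."""
    return [eps / 2 ** (n - j + 2) for j in range(1, n + 1)]


def lemma3_cost_proxy(n, eps):
    """Eq. (19) with every hidden constant set to 1: sum_j 2**j * log2(1/eps_j)."""
    return sum(2 ** j * math.log2(1 / e)
               for j, e in enumerate(level_precisions(n, eps), start=1))


eps = 1e-3
for n in (4, 8, 16, 24):
    error_bound = 2 * sum(level_precisions(n, eps))   # right-hand side of Eq. (18)
    ratio = lemma3_cost_proxy(n, eps) / (2 ** n * math.log2(1 / eps))
    print(n, error_bound <= eps, round(ratio, 3))
```

The ratio to N log(1/ε) stays bounded because the finest precisions are assigned to the levels near the root, where \({2}^{j}\) is small, so the extra term \(\mathop{\sum }\nolimits_{j}{2}^{j}(n-j+2)\) is only \(O({2}^{n})\).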

We then consider quantum state preparation with exponentially many ancillary qubits. Our protocol follows the idea of ref. 34, with improvements.

Lemma 4

An arbitrary n-qubit quantum state can be prepared with O(N) ancillary qubits, \(O(n\log (n/\varepsilon ))\) depth and \(O(N\log (1/\varepsilon ))\) count of Clifford + T gates.

Proof

Our construction is based on the protocol of ref. 34, with revisions and an improved Clifford + T decomposition.

General procedure. The hardware layout of our method contains a binary tree of qubits with n + 1 layers, denoted as H. The lth layer of H (\(0\leqslant l\leqslant n\)) is denoted as Hl. For \(1\leqslant l\leqslant n\), Hl connects to another binary tree of qubits, denoted as Vl. The root of the tree Vl serves as the lth data qubit, which we denote as dl.

Our protocol for preparing the target state \(\vert {\psi }_{{{{\rm{targ}}}}}\rangle =\mathop{\sum }\nolimits_{k = 0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert k\right\rangle }_{{{{\rm{d}}}}}\) works as follows. We initialize the root of H as \({\left\vert 1\right\rangle }_{{H}_{1}}\), while all other qubits are in the state \(\left\vert 0\right\rangle\). In the first stage, H is prepared in the quantum state (qubits in the state \(\left\vert 0\right\rangle\) are not shown)

$$\begin{array}{r}{\left\vert 1\right\rangle }_{{H}_{1}}\to \mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}.\end{array}$$
(21)

Here, \(\left\vert {\varphi }_{k}\right\rangle\) is a computational basis state of H, to be defined later. In the second stage, the data qubits are set to the n-qubit computational basis state \({\left\vert k\right\rangle }_{{{{\rm{d}}}}}\) conditioned on \(\left\vert {\varphi }_{k}\right\rangle\), i.e.

$$\begin{array}{r}\mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}\to \mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}{\left\vert k\right\rangle }_{{{{\rm{d}}}}}.\end{array}$$
(22)

Finally, the binary tree H is uncomputed

$$\begin{array}{r}\mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}{\left\vert k\right\rangle }_{{{{\rm{d}}}}}\to \mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert 0\right\rangle }_{H}{\left\vert k\right\rangle }_{{{{\rm{d}}}}}.\end{array}$$
(23)

The target state is then obtained after tracing out H. The reader is referred to ref. 34 for more details. The transformations in Eq. (22) and Eq. (23) can be realized exactly using Clifford circuits with O(n) depth and \(O({2}^{n})\) gate count. In contrast, the first stage, which yields Eq. (21), contains rotations that must be approximated with Clifford + T gates and is therefore more involved. We thus focus on Eq. (21) below.

Realization of Eq. (21). We first show how Eq. (21) can be realized with single-qubit and CNOT gates, using a method slightly different from that of ref. 34, and then introduce its Clifford + T decomposition.

We define \({\alpha }_{n,k}\equiv {\alpha }_{k}\) and \({\alpha }_{L,k}={e}^{i\,{{\mbox{arg}}}\,({\alpha }_{L+1,2k})}\sqrt{| {\alpha }_{L+1,2k}{| }^{2}+| {\alpha }_{L+1,2k+1}{| }^{2}}\) for all \(0\leqslant L\leqslant n-1\). Note that we can assume arg(α0) = 0 without loss of generality. For \(0\leqslant L\leqslant n\), we define

$$\left\vert {\Psi }_{L}\right\rangle =\mathop{\sum }\limits_{k=0}^{{2}^{L}-1}{\alpha }_{L,k}\bigotimes\limits_{l = 0}^{L}{\left\vert (k,l)\right\rangle }_{{H}_{l}}^{{\prime} }.$$
(24)

The realization of Eq. (21) contains n steps, with the Lth step corresponding to \(\left\vert {\Psi }_{L-1}\right\rangle \to \left\vert {\Psi }_{L}\right\rangle\).

In Eq. (24), we have defined (0, 0) ≡ 0 and \((k,l)\equiv {k}_{n}{k}_{n-1}\cdots {k}_{n-l+1}\) for \(l\geqslant 1\); \({\vert (k,l)\rangle }^{{\prime} }\equiv {\vert 0\rangle }^{\otimes (k,l)}\vert 1\rangle {\vert 0\rangle }^{\otimes {2}^{l}-(k,l)-1}\); and Hl represents the lth layer of H. Eq. (21) and Eq. (24) have the correspondence \(\vert {\varphi }_{k}\rangle { = \bigotimes }_{l = 0}^{n}{\vert (k,l)\rangle }_{{H}_{l}}^{{\prime} }\) and \(\mathop{\sum }\nolimits_{k = 0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}=\left\vert {\Psi }_{n}\right\rangle\). So \(\left\vert {\Psi }_{n}\right\rangle\) is the target state of stage 1 introduced in Eq. (21).

We then introduce the realization of \(\left\vert {\Psi }_{L-1}\right\rangle \to \left\vert {\Psi }_{L}\right\rangle\). We define the single-qubit rotations \({r}_{y}(\theta )=\left(\begin{array}{rc}\cos \theta &\sin \theta \\ -\sin \theta &\cos \theta \end{array}\right)\) and \({r}_{z}(\phi )=\left(\begin{array}{rc}{e}^{-i\phi }&0\\ 0&{e}^{i\phi }\end{array}\right)\), and a three-qubit controlled operation as follows.

a, b, c are labels of the corresponding qubits. Let \({\theta }_{l,j}\equiv \arccos ({b}_{l,2j}/{b}_{l-1,j})\) and ϕl,j = ϕl,2j+1 − ϕl,2j. At the Lth step (\(1\leqslant L\leqslant n\)), we implement the parallel rotation

$$\begin{array}{r}{W}_{L}=\mathop{\prod }\limits_{j=0}^{{2}^{L-1}-1}w({\theta }_{L,j};{\phi }_{L,j};{H}_{L-1,j};{H}_{L,2j};{H}_{L,2j+1})\end{array}$$
(25)

which costs O(1) depth and \(O({2}^{L})\) count of single-qubit and CNOT gates. It can be verified that

$$\begin{array}{r}{W}_{L}\left\vert {\Psi }_{L-1}\right\rangle =\left\vert {\Psi }_{L}\right\rangle .\end{array}$$
(26)

The total single-qubit + CNOT depth and gate count are O(n) and \(O({2}^{n})\), respectively.
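The path encoding of Eq. (24) and the amplitude tree \({\alpha }_{L,k}\) can be illustrated classically. The sketch below assumes that \({k}_{n}\) is the most significant bit, so that (k, l) is given by the l leading bits of an n-bit k; labels the qubits of layer l as (l, p) with \(0\leqslant p < {2}^{l}\); reads the phase factor in the definition of \({\alpha }_{L,k}\) as \({e}^{i\,{{\mbox{arg}}}\,({\alpha }_{L+1,2k})}\); and takes \({b}_{l,k}\) to be \(| {\alpha }_{l,k}|\). It then checks that each angle \({\theta }_{L,j}\) splits the modulus of a parent node correctly between its two children.

```python
import cmath
import math


def path_qubits(k, n):
    """Excited qubits of |phi_k>: one per layer l = 0..n, labelled (l, p) with p = (k, l)."""
    return [(l, k >> (n - l)) for l in range(n + 1)]


def amplitude_tree(alpha):
    """alpha_{n,k} = alpha_k; for L = n-1..0,
    alpha_{L,k} = exp(i arg alpha_{L+1,2k}) * sqrt(|alpha_{L+1,2k}|^2 + |alpha_{L+1,2k+1}|^2)."""
    n = len(alpha).bit_length() - 1
    tree = {n: list(alpha)}
    for L in range(n - 1, -1, -1):
        child = tree[L + 1]
        tree[L] = [cmath.exp(1j * cmath.phase(child[2 * k]))
                   * math.hypot(abs(child[2 * k]), abs(child[2 * k + 1]))
                   for k in range(2 ** L)]
    return tree


def split_angles(tree, L):
    """theta_{L,j} = arccos(b_{L,2j} / b_{L-1,j}) with b_{l,k} = |alpha_{l,k}|."""
    angles = []
    for j, parent in enumerate(tree[L - 1]):
        if abs(parent) < 1e-15:
            angles.append(0.0)                 # empty subtree: any angle works
        else:
            angles.append(math.acos(min(1.0, abs(tree[L][2 * j]) / abs(parent))))
    return angles


alpha = [0.5, 0.5j, -0.5, 0.5]                 # n = 2 and arg(alpha_0) = 0
tree = amplitude_tree(alpha)
print(path_qubits(5, 3))                      # [(0, 0), (1, 1), (2, 2), (3, 5)]
print(tree[0], split_angles(tree, 1), split_angles(tree, 2))
```

With arg(α0) = 0 the root amplitude \({\alpha }_{0,0}\) equals 1, which matches initializing the root of H in \(\left\vert 1\right\rangle\).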

Clifford + T decomposition. The WL are assumed to be constructed with single- and two-qubit gates. Below, we discuss how to decompose them into Clifford + T gates with high accuracy. According to ref. 48, one can always construct unitaries \({\tilde{r}}_{y}(\theta ;\varepsilon ),{\tilde{r}}_{z}(\phi ;\varepsilon )\) with \(O(\log (1/\varepsilon ))\) depth of gates in {H, S, T} that satisfy

$$\begin{array}{r}\parallel {\tilde{r}}_{y}(\theta ;\varepsilon )-{r}_{y}(\theta )\parallel \,\leqslant \,\varepsilon ,\quad \parallel {\tilde{r}}_{z}(\phi ;\varepsilon )-{r}_{z}(\phi )\parallel\, \leqslant \,\varepsilon .\end{array}$$
(27)

Accordingly, we define \(\widetilde{w}(\theta ;\phi ;\varepsilon ;a;b;c)\) as the following transformation

We have

$$\begin{array}{ll}\quad\,\,\widetilde{w}(\theta ;\phi ;\varepsilon ;a;b;c)(a{\left\vert 0\right\rangle }_{a}{\left\vert 0\right\rangle }_{b}{\left\vert 0\right\rangle }_{c}+b{\left\vert 1\right\rangle }_{a}{\left\vert 0\right\rangle }_{b}{\left\vert 0\right\rangle }_{c})\\ \,=\,a{\left\vert 0\right\rangle }_{a}{\left\vert 0\right\rangle }_{b}{\left\vert 0\right\rangle }_{c}+{b}_{1}(\varepsilon ){\left\vert 1\right\rangle }_{a}{\left\vert 1\right\rangle }_{b}{\left\vert 0\right\rangle }_{c}+{b}_{2}(\varepsilon ){\left\vert 1\right\rangle }_{a}{\left\vert 0\right\rangle }_{b}{\left\vert 1\right\rangle }_{c},\end{array}$$
(28)

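for some \({b}_{1}(\varepsilon )\) and \({b}_{2}(\varepsilon )\) satisfying \(\sqrt{| {b}_{1}(\varepsilon )-{b}_{1}(0){| }^{2}+| {b}_{2}(\varepsilon )-{b}_{2}(0){| }^{2}}\,\leqslant \,| b| \varepsilon\), where \({b}_{1}(0)\) and \({b}_{2}(0)\) are the amplitudes produced by the exact operation w. We then define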

$$\begin{array}{r}{\widetilde{W}}_{L}(\varepsilon )=\mathop{\prod }\limits_{j=0}^{{2}^{L-1}-1}\widetilde{w}({\theta }_{L,j};{\phi }_{L,j};\varepsilon ;{H}_{L-1,j};{H}_{L,2j};{H}_{L,2j+1}),\end{array}$$
(29)

which is used to approximate WL. From Eq. (28), it can be verified that \(\left\Vert {\widetilde{W}}_{L}(\varepsilon )\left\vert {\Psi }_{L-1}\right\rangle -{W}_{L}\left\vert {\Psi }_{L-1}\right\rangle \right\Vert\, \leqslant \,\varepsilon\). We set the accuracy at the Lth layer as εL, and define

$$\begin{array}{r}\vert {\widetilde{\Psi }}_{0}\rangle =\vert {\Psi }_{0}\rangle ,\quad \vert {\widetilde{\Psi }}_{L}\rangle ={\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\widetilde{\Psi }}_{L-1}\rangle.\end{array}$$
(30)

We have

$$\begin{array}{ll}\quad\Vert \vert {\widetilde{\Psi }}_{L}\rangle -\vert {\Psi }_{L}\rangle \Vert \\ =\Vert {\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\widetilde{\Psi }}_{L-1}\rangle -{W}_{L}\vert {\Psi }_{L-1}\rangle \Vert \\ \leqslant \Vert {\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\widetilde{\Psi }}_{L-1}\rangle -{\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\Psi }_{L-1}\rangle \Vert \\ \quad+\,\Vert {\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\Psi }_{L-1}\rangle -{W}_{L}\vert {\Psi }_{L-1}\rangle \Vert \\ \leqslant \Vert {\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\widetilde{\Psi }}_{L-1}\rangle -{\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\Psi }_{L-1}\rangle \Vert +{\varepsilon }_{L}\\ =\Vert \vert {\widetilde{\Psi }}_{L-1}\rangle -\vert {\Psi }_{L-1}\rangle \Vert +{\varepsilon }_{L}.\end{array}$$
(31)

By applying the inequality above iteratively from L = 1 to L = n, we have

$$\begin{array}{r}\Vert \vert {\widetilde{\Psi }}_{n}\rangle -\vert {\Psi }_{n}\rangle \Vert\, \leqslant \,\mathop{\sum }\limits_{L=1}^{n}{\varepsilon }_{L}.\end{array}$$
(32)

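According to Eq. (32), to keep the total error below ε, it suffices to set \({\varepsilon }_{L}=K\varepsilon /{(n-L+1)}^{2}\) for a suitable constant K; for instance, K = 6/π² works because \(\mathop{\sum }\nolimits_{L=1}^{n}{(n-L+1)}^{-2} < \mathop{\sum }\nolimits_{m=1}^{\infty }{m}^{-2}={\pi }^{2}/6\). This is the key step of our improved construction.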
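The quadratic schedule is what lets the count bound of Eq. (33) and the depth bound of Eq. (34) hold simultaneously. The sketch below compares it with two alternative schedules, a uniform split ε/n and an exponential split ε/2^{n−L+1} (both hypothetical comparisons, not used in this work), with K = 6/π² and every hidden constant set to 1.

```python
import math


def schedules(n, eps):
    """Three ways to split the error budget eps over the layers L = 1..n
    (uniform and exponential are hypothetical baselines for comparison)."""
    K = 6 / math.pi ** 2                       # since sum_{m>=1} 1/m^2 = pi^2/6
    return {
        "uniform": [eps / n] * n,
        "exponential": [eps / 2 ** (n - L + 1) for L in range(1, n + 1)],
        "quadratic": [K * eps / (n - L + 1) ** 2 for L in range(1, n + 1)],
    }


def cost_proxies(n, eps, eps_list):
    """Eqs. (33)-(34) with unit constants, normalized by N log(1/eps) and n log(n/eps)."""
    count = sum(2 ** L * math.log2(1 / e) for L, e in enumerate(eps_list, start=1))
    depth = sum(math.log2(1 / e) for e in eps_list)
    return count / (2 ** n * math.log2(1 / eps)), depth / (n * math.log2(n / eps))


eps = 1e-3
for n in (20, 200):
    for name, eps_list in schedules(n, eps).items():
        count_ratio, depth_ratio = cost_proxies(n, eps, eps_list)
        within_budget = sum(eps_list) <= eps * (1 + 1e-12)
        print(n, name, within_budget, round(count_ratio, 2), round(depth_ratio, 2))
```

Under these assumptions, the uniform split inflates the gate count by a log n factor, the exponential split pushes the depth up to \(O({n}^{2})\), and only the quadratic schedule keeps both normalized ratios bounded as n grows.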

Circuit complexity. Each \({\widetilde{W}}_{L}\) can be realized with \(O({2}^{L}\log (1/{\varepsilon }_{L}))\) count and \(O(\log (1/{\varepsilon }_{L}))\) depth of Clifford + T gates. Therefore, the total gate count at stage 1 (Eq. (21)) is

$$\begin{array}{ll}C\,=\,O\left(\mathop{\sum }\limits_{L=1}^{n}{2}^{L}\log (1/{\varepsilon }_{L})\right)\\ \quad\,=\,O\left(\mathop{\sum }\limits_{L=1}^{n}{2}^{L}\log ({(n-L+1)}^{2}/\varepsilon )\right)\\ \quad\,=\,O\left({2}^{n+1}\mathop{\sum }\limits_{m=1}^{n}\frac{\log (m)}{{2}^{m}}\right)+O\left({2}^{n}\log (1/\varepsilon )\right)\\ \quad\,=\,O\left({2}^{n}\right)+O\left({2}^{n}\log (1/\varepsilon )\right)\\ \quad\,=\,O\left(N\log (1/\varepsilon )\right).\end{array}$$
(33)

The total circuit depth at stage 1 is

$$\begin{array}{ll}D\,=\,O\left(\mathop{\sum }\limits_{L=1}^{n}\log (1/{\varepsilon }_{L})\right)\\ \quad\,=\,O\left(\mathop{\sum }\limits_{L=1}^{n}\log ({(n-L+1)}^{2}/\varepsilon )\right)\\ \quad\,=\,O\left(\mathop{\sum }\limits_{m=1}^{n}\log ({m}^{2})\right)+O\left(n\log (1/\varepsilon )\right)\\ \quad\,=\,O\left(\log (n!)\right)+O\left(n\log (1/\varepsilon )\right)\\ \quad\,=\,O\left(n\log (n/\varepsilon )\right).\end{array}$$
(34)

Recall that Eqs. (22) and (23) require O(N) count and O(n) depth of Clifford + T gates. Therefore, the total gate count and circuit depth are \(O\left(N\log (1/\varepsilon )\right)\) and \(O\left(n\log (n/\varepsilon )\right)\), respectively. □

We also consider controlled quantum state preparation. In our preparation scheme, the initial state is \({\left\vert 1\right\rangle }_{{H}_{1}}\), i.e. the root of H is set to \(\left\vert 1\right\rangle\). If we instead set H1 to \({\left\vert 0\right\rangle }_{{H}_{1}}\), it can be verified that the output state is \({\left\vert 0\cdots 0\right\rangle }_{{{{\rm{d}}}}}\). Therefore, to implement controlled state preparation, one can simply replace the root qubit H1 by the control qubit, and the circuit complexity remains unchanged. In other words, we have the following result.

Lemma 5

An arbitrary single-qubit-controlled n-qubit state preparation unitary can be constructed with O(N) ancillary qubits, \(O(n\log (n/\varepsilon ))\) depth and \(O(N\log (1/\varepsilon ))\) count of Clifford + T gates.

Based on Lemmas 3, 4 and 5, we have the following result for an intermediate number of ancillary qubits. Note that Theorem 3 in the main text follows directly from Theorem 8.

Theorem 8

(space-time tradeoff QSP). With nanc ancillary qubits, where \(\Omega (n)\leqslant {n}_{{{{\rm{anc}}}}}\leqslant O({2}^{n})\), state preparation and controlled state preparation of an arbitrary n-qubit quantum state can be realized to precision ε with \(O(N\log (1/\varepsilon ))\) count and \(O\left(N\frac{\log ({n}_{{{{\rm{anc}}}}})\log (\log ({n}_{{{{\rm{anc}}}}})/\varepsilon )}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates.

Proof

We separate all data qubits into two registers. Register A contains the last \({n}_{a}=n-\lfloor {\log }_{2}m\rfloor\) data qubits, and register B contains the first \({n}_{b}=\lfloor {\log }_{2}m\rfloor\) qubits, for some \(n\leqslant m\leqslant {2}^{n}\). We define \({N}_{a}={2}^{{n}_{a}}\) and \({N}_{b}={2}^{{n}_{b}}\). The target state can be rewritten as

$$\begin{array}{r}\vert {\psi }_{{{{\rm{targ}}}}}\rangle =\mathop{\sum }\limits_{k=0}^{{N}_{a}-1}{\beta }_{k}{\left\vert k\right\rangle }_{A}{\left\vert {\phi }_{k}\right\rangle }_{B}\end{array}$$
(35)

for some normalized βk, and normalized quantum states \(\left\vert {\phi }_{k}\right\rangle\). We define \({\left\vert {\psi }_{a}\right\rangle }_{A}=\mathop{\sum }\nolimits_{k = 0}^{{N}_{a}-1}{\beta }_{k}{\left\vert k\right\rangle }_{A}\).
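The register split of Eq. (35) is classical preprocessing of the amplitudes: \({\beta }_{k}\) is the norm of the block of amplitudes that share the value k on register A, and \(\left\vert {\phi }_{k}\right\rangle\) is that block after normalization. A minimal sketch, assuming the bit ordering stated in the docstring; in the protocol, step 1 then prepares \({\left\vert {\psi }_{a}\right\rangle }_{A}\) and step 2 loads each \(\left\vert {\phi }_{k}\right\rangle\) through Select\(({\widetilde{G}}_{k})\).

```python
import math


def split_registers(psi, n_b):
    """Write psi = sum_k beta_k |k>_A |phi_k>_B as in Eq. (35).

    Convention assumed here: register B holds the first n_b qubits (the most
    significant bits) and A the last n_a = n - n_b, so index x = x_B * N_a + x_A."""
    n = len(psi).bit_length() - 1
    N_a, N_b = 2 ** (n - n_b), 2 ** n_b
    beta, phi = [], []
    for k in range(N_a):
        column = [psi[b * N_a + k] for b in range(N_b)]   # amplitudes with A in |k>
        norm = math.sqrt(sum(abs(c) ** 2 for c in column))
        beta.append(norm)
        # When beta_k = 0 any normalized |phi_k> works; |0...0> is used here.
        phi.append([c / norm for c in column] if norm > 0 else [1.0] + [0.0] * (N_b - 1))
    return beta, phi


def recombine(beta, phi):
    """Inverse of split_registers: rebuild the full amplitude vector."""
    N_a, N_b = len(beta), len(phi[0])
    out = [0j] * (N_a * N_b)
    for k in range(N_a):
        for b in range(N_b):
            out[b * N_a + k] = beta[k] * phi[k][b]
    return out


raw = [1, 2j, -1, 0.5, 0, 1 - 1j, 3, -2j]
norm = math.sqrt(sum(abs(a) ** 2 for a in raw))
psi = [a / norm for a in raw]
beta, phi = split_registers(psi, n_b=1)       # N_a = 4 amplitudes beta_k, N_b = 2
print([round(b, 4) for b in beta])
```

Larger n_b moves more of the state into the select step, which is where the extra ancillary qubits are spent.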

In the first step, we prepare register A to a quantum state

$$\begin{array}{r}{\left\vert {\widetilde{\psi }}_{a}\right\rangle }_{A}=\mathop{\sum }\limits_{k=0}^{{N}_{a}-1}{\tilde{\beta }}_{k}{\left\vert k\right\rangle }_{A}\end{array}$$
(36)

which satisfies \(\left\Vert \left\vert {\psi }_{a}\right\rangle -\left\vert {\widetilde{\psi }}_{a}\right\rangle \right\Vert \,\leqslant \,\varepsilon /2\). According to Lemma 3, this step can be realized with \(O({N}_{a}\log (1/\varepsilon ))=O\left(\frac{N}{m}\log (1/\varepsilon )\right)\) count and depth of Clifford + T gates.

In the second step, we implement

$$\begin{array}{c}\,{{\mbox{Select}}}\,({\widetilde{G}}_{k})\mathop{\sum }\limits_{k=0}^{{N}_{a}-1}{\widetilde{\beta }}_{k}{\left\vert k\right\rangle }_{A}{\left\vert {0}^{{n}_{b}}\right\rangle }_{B}\,=\,\mathop{\sum }\limits_{k=0}^{{N}_{a}-1}{\widetilde{\beta }}_{k}{\left\vert k\right\rangle }_{A}{\vert {\widetilde{\phi }}_{k}\rangle }_{B}\\ \qquad\qquad\qquad\qquad\equiv \vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle \end{array}$$
(37)

where \({\widetilde{G}}_{k}\) is a state preparation unitary satisfying \({\widetilde{G}}_{k}\vert 0\rangle =\vert {\widetilde{\phi }}_{k}\rangle\) with \(\Vert \vert {\phi }_{k}\rangle -\vert {\widetilde{\phi }}_{k}\rangle \Vert\, \leqslant \,\varepsilon /2\). It can then be verified that \(\Vert \vert {\psi }_{{{{\rm{targ}}}}}\rangle -\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle \Vert\, \leqslant \,\varepsilon\). According to Lemma 5, controlled-\({\widetilde{G}}_{k}\) with \({\widetilde{G}}_{k}\vert {0}^{{n}_{b}}\rangle =\vert {\widetilde{\phi }}_{k}\rangle\) can be constructed with O(m) ancillary qubits, \(O({N}_{b}\log (1/\varepsilon ))\) count and \(O({n}_{b}\log ({n}_{b}/\varepsilon ))=O(\log (m)\log (\log (m)/\varepsilon ))\) depth of Clifford + T gates. Then, according to Lemma 6, with O(m) ancillary qubits, Select\(({\widetilde{G}}_{k})\) can be constructed with

$$\begin{array}{r}C=O({N}_{a}\times {N}_{b}\log (1/\varepsilon ))=O(N\log (1/\varepsilon ))\end{array}$$
(38)

gate count, and

$$\begin{array}{ll}D\,=\,O\left({N}_{a}\times \log (m)\log (\log (m)/\varepsilon )\right)\\ \quad\,=\,O\left(N\frac{\log (m)\log (\log (m)/\varepsilon )}{m}\right)\end{array}$$
(39)

depth of Clifford + T gates. By setting \({n}_{{{{\rm{anc}}}}}=O(m)\) for some \({n}_{{{{\rm{anc}}}}}\geqslant n\), we complete the proof. □

Select oracle for general unitary functions

Suppose x is an m-bit bitstring, and Ux are general unitaries. We consider the unitary

$$\begin{array}{r}\,{{\mbox{Select}}}\,({U}_{x})=\mathop{\sum }\limits_{x=0}^{M-1}\left\vert x\right\rangle \left\langle x\right\vert \otimes {U}_{x},\end{array}$$
(40)

where \(M={2}^{m}\). Below, we discuss how to construct Select(Ux) from implementations of single-qubit-controlled-Ux, together with the corresponding upper bound on circuit complexity. We define Cctrl(Ux, r) and Dctrl(Ux, r) as the count and depth of Clifford + T gates required to construct controlled-Ux, given r ancillary qubits. The following result corresponds to the case with m + r ancillary qubits.

Lemma 6

(Appendix G.4 of ref. 38). With m + r ancillary qubits, Select(Ux) can be constructed with O(MCctrl(Ux, r)) count and O(MDctrl(Ux, r)) depth of Clifford + T gates.

Proof

We introduce an ancillary register with m qubits. We denote the jth qubit of the index register (encoding \(\left\vert x\right\rangle\)) and of the ancillary register as Cj and Aj, respectively. We also denote C = [C1, C2, ⋯, Cm] and A = [A0, A1, A2, ⋯, Am]. A0 is initialized as \(\left\vert 1\right\rangle\), while all other ancillary qubits are initialized as \(\left\vert 0\right\rangle\).

Eq. (40) can be realized by querying Select(C, A, m, 0), which is defined recursively by Algorithm 1. In Algorithm 1, Toffoli(a, b; c) is the Toffoli gate with qubits a and b as the control qubits and c as the target qubit, where an overline on a control qubit means that the gate is activated when that qubit is in \(\left\vert 0\right\rangle\); C-Ux(a) is the controlled-Ux with qubit a as the control qubit and the corresponding word register as the target qubits; dim(v) represents the dimension of the vector v (for example, dim(C) = m); vj represents the jth element of v, and vj: = [vj, vj+1, ⋯, vdim(v)].

Algorithm 1

Select(y, q, l, x)

1: if l ≠ 0:

2:  Toffoli(\(\overline{{y}_{1}},{q}_{1};{q}_{2}\))

3:  Select(y2:, q2:, l − 1, x)

4:  Toffoli(\(\overline{{y}_{1}},{q}_{1};{q}_{2}\))

5:  Toffoli(y1, q1; q2)

6:  Select(y2:, q2:, l − 1, \(x+{2}^{l-1}\))

7:  Toffoli(y1, q1; q2)

8: elseif l = 0:

9:  C-Ux(q1)

10: end if

In our implementation, the controlled-Ux are queried M times in total, sequentially for x ∈ {0, ⋯, M − 1}. Moreover, there are O(M) Toffoli gates in total, acting sequentially. Therefore, the total gate count and circuit depth are O(MCctrl(Ux, r)) and O(MDctrl(Ux, r)), respectively. □

We note that Algorithm 1 can be further simplified by merging some adjacent gates38, but the asymptotic scaling here is already optimal.
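Because every gate in Algorithm 1 is either a Toffoli gate or a controlled-Ux, its action on computational-basis inputs can be traced classically. The sketch below replaces the slices y2: and q2: by an integer offset (an implementation choice; function names are hypothetical), records which C-Ux fire with their control in \(\left\vert 1\right\rangle\), and counts Toffoli gates.

```python
def select(y, q, i, l, x, fired, toffolis):
    """Basis-state trace of Algorithm 1, i.e. Select(y_{i:}, q_{i:}, l, x).

    y: index bits (y[0] = most significant), q: ancilla bits A_0..A_m,
    fired: collects x whenever C-U_x(q_i) is reached with q_i = 1,
    toffolis: one-element list used as a mutable Toffoli counter."""
    if l != 0:
        q[i + 1] ^= (1 - y[i]) & q[i]        # Toffoli(not y_1, q_1; q_2)
        select(y, q, i + 1, l - 1, x, fired, toffolis)
        q[i + 1] ^= (1 - y[i]) & q[i]        # uncompute
        q[i + 1] ^= y[i] & q[i]              # Toffoli(y_1, q_1; q_2)
        select(y, q, i + 1, l - 1, x + 2 ** (l - 1), fired, toffolis)
        q[i + 1] ^= y[i] & q[i]              # uncompute
        toffolis[0] += 4
    elif q[i] == 1:                          # l = 0: C-U_x acts nontrivially only if q_i = 1
        fired.append(x)


def trace(index, m):
    y = [(index >> (m - 1 - b)) & 1 for b in range(m)]
    q = [1] + [0] * m                        # A_0 = |1>, A_1..A_m = |0>
    fired, toffolis = [], [0]
    select(y, q, 0, m, 0, fired, toffolis)
    return fired, q, toffolis[0]


m = 4
for index in range(2 ** m):
    fired, q, n_toffoli = trace(index, m)
    print(index, fired, q, n_toffoli)
```

For every index exactly one C-Ux fires, namely the one with matching x, the ancillas return to their initial values, and the recursion T(l) = 2T(l − 1) + 4 gives 4(M − 1) = O(M) Toffoli gates.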

We then consider the construction with exponentially many ancillary qubits. In Algorithms 4 and 5 of ref. 34, based on the bucket-brigade architecture for quantum random access memory49,50,51, it has been shown that any Select(Ux) can be constructed with 4M − 1 ancillary qubits, O(M) Clifford + T gates arranged in O(m) circuit depth, and queries to all single-qubit-controlled-Ux for x ∈ {0, ⋯, M − 1} in parallel. If each controlled-Ux uses r ancillary qubits, we require M(4 + r) − 1 ancillary qubits in total, because they are implemented in parallel. To sum up, we have the following result.

Lemma 7

(many qubit Select oracle). With M(4 + r) − 1 ancillary qubits, Select(Ux) can be constructed with O(MCctrl(Ux, r)) count and O(m + Dctrl(Ux, r)) depth of Clifford + T gates.

Select oracle for Pauli strings

Below, we give the proof of Theorem 4 on the construction of select oracles for the Pauli strings defined in Eq. (6). Note that Eq. (6) is a special case of Eq. (40) with \({U}_{x}\in {\{\pm I,\pm X,\pm Y,\pm Z\}}^{\otimes L}\).

Proof of Theorem 4

Recall that the select oracle for Pauli strings corresponds to Eq. (40) with Ux = Hx, where \({H}_{x}{ = \bigotimes }_{l = 1}^{L}{H}_{x,l}\) and \({H}_{x,l}\in \{\pm I,\pm X,\pm Y,\pm Z\}\).

Given L ancillary qubits, controlled-Hx can be constructed with the following circuit.

where the control qubit is denoted as c; the ancillary qubits, all initialized as \(\left\vert 0\right\rangle\), are denoted as a1, a2, ⋯, aL; and the target qubits are denoted as t1, t2, ⋯, tL. Each of the two L-Toffoli gates can be constructed with O(L) count and \(O(\log L)\) depth of Clifford + T gates. All controlled Pauli gates together can be constructed with O(L) count and O(1) depth of Clifford + T gates. In other words, we have Cctrl(Hx, L) = O(L) and \({D}_{{{{\rm{ctrl}}}}}({H}_{x},L)=O(\log L)\).

Our protocol for constructing Select(Hx) uses at least Ω(m + L) ancillary qubits. We divide the m-qubit index register into two subregisters A and B with \({m}_{a}\,\geqslant \,{\log }_{2}(m+L)\) and mb = m − ma qubits, respectively. Letting \({M}_{a}={2}^{{m}_{a}}\) and \({M}_{b}={2}^{{m}_{b}}\), Select(Hx) can be rewritten as

$$\,{{\mbox{Select}}}\,({H}_{x})=\mathop{\sum }\limits_{{x}_{b}=0}^{{M}_{b}-1}\left\vert {x}_{b}\right\rangle \left\langle {x}_{b}\right\vert \otimes {V}_{{x}_{b}},$$
(41)
$${V}_{{x}_{b}}=\mathop{\sum }\limits_{{x}_{a}=0}^{{M}_{a}-1}\left\vert {x}_{a}\right\rangle \left\langle {x}_{a}\right\vert \otimes {H}_{{x}_{a}{x}_{b}}.$$
(42)

Here xa and xb are bit strings with ma and mb bits, respectively, x ≡ xaxb denotes their concatenation, and the projectors act on subregisters B and A. According to Lemma 7, each \({V}_{{x}_{b}}\) can be constructed with Ma(4 + L) − 1 ancillary qubits, O(MaL) count and \(O({m}_{a}+\log L)\) depth of Clifford + T gates. According to Lemma 6, with nanc = Ma(4 + L) − 1 + mb ancillary qubits in total, the Clifford + T gate count of Select(Hx) is

$$\begin{array}{r}C=O({M}_{b}{M}_{a}L)=O(ML).\end{array}$$
(43)

The Clifford + T depth is

$$\begin{array}{ll}D\,=\,O({M}_{b}({m}_{a}+\log L))\\ \quad\,=\,O\left(M\frac{\log ({M}_{a}L)}{{M}_{a}}\right)\\ \quad\,=\,O\left(M\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}/L}\right)\\ \quad\,=\,O\left(ML\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right),\end{array}$$
(44)

where we have used \({n}_{{{{\rm{anc}}}}}=\Theta ({M}_{a}L)\), which follows from \({n}_{{{{\rm{anc}}}}}={M}_{a}(4+L)-1+{m}_{b}\) and \({m}_{b} < m+L\leqslant {M}_{a}\). This completes the proof. □
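The choice of ma in Eqs. (41) to (44) can be explored numerically. The sketch below takes, for a given budget nanc, the largest ma satisfying Ma(4 + L) − 1 + mb ≤ nanc and Ma ≥ m + L (a hypothetical greedy rule), sets all hidden constants to 1, and reports the depth divided by ML log₂(nanc)/nanc, which should stay roughly constant if Eq. (44) holds.

```python
import math


def pauli_select_cost(m, L, n_anc):
    """Largest m_a with M_a(4 + L) - 1 + m_b <= n_anc and M_a >= m + L (a hypothetical
    greedy choice); returns (m_a, count, depth) with unit constants in Eqs. (43)-(44)."""
    feasible = [m_a for m_a in range(m + 1)
                if 2 ** m_a >= m + L and 2 ** m_a * (4 + L) - 1 + (m - m_a) <= n_anc]
    if not feasible:
        raise ValueError("too few ancillary qubits for this split")
    m_a = max(feasible)
    m_b = m - m_a
    count = 2 ** m_b * 2 ** m_a * L                # Eq. (43): O(M_b M_a L) = O(M L)
    depth = 2 ** m_b * (m_a + math.log2(L))        # first line of Eq. (44)
    return m_a, count, depth


m, L = 16, 8
for n_anc in (2 ** 9, 2 ** 11, 2 ** 14, 2 ** 17, 2 ** 19):
    m_a, count, depth = pauli_select_cost(m, L, n_anc)
    scaled = depth * n_anc / (2 ** m * L * math.log2(n_anc))
    print(n_anc, m_a, count == 2 ** m * L, round(scaled, 2))
```

The gate count is independent of the split, while each doubling of nanc roughly halves Mb and hence the depth, which is the space-time tradeoff stated in Theorem 4.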

Details about LCU-based Block-encoding

Without loss of generality, we assume that \(m={\log }_{2}P\) is an integer. We let \({\tilde{G}}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }\) be a state preparation unitary satisfying \(\Vert {\tilde{G}}_{\vert {{{\boldsymbol{\alpha }}}}\rangle }{\vert {0}^{m}\rangle }_{{{{\rm{anc}}}}}-{G}_{\vert {{{\boldsymbol{\alpha }}}}\rangle }{\vert {0}^{m}\rangle }_{{{{\rm{anc}}}}}\Vert \,\leqslant\, \varepsilon /3\). Let \({\tilde{u}}_{p}\) be unitaries satisfying \(\parallel {\tilde{u}}_{p}-{u}_{p}\parallel\, \leqslant\, \varepsilon /3\). We then define

$$\tilde{{\mathbb{G}}}\equiv {\tilde{G}}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }\otimes {{\mathbb{I}}}_{N},$$
(45)
$$U\equiv {{\mathbb{G}}}^{{\dagger} }\,{{\mbox{Select}}}\,({u}_{p}){\mathbb{G}},$$
(46)
$$\tilde{U}\equiv {\tilde{{\mathbb{G}}}}^{{\dagger} }\,{{\mbox{Select}}}\,({\tilde{u}}_{p})\tilde{{\mathbb{G}}},$$
(47)

and \(W\equiv \tilde{U}-U\). With a similar argument to Eq. (31), we have

$$\begin{array}{r}\left\Vert W\left\vert \Psi \right\rangle \right\Vert\, \leqslant \,\varepsilon ,\end{array}$$
(48)

where \(\left\vert \Psi \right\rangle =\left\vert {0}^{m}\right\rangle \otimes \left\vert \psi \right\rangle\) and \(\left\vert \psi \right\rangle\) is an arbitrary N-dimensional quantum state. We may rewrite W as

$$W=\left(\begin{array}{cc}\delta H&{W}_{1,2}\\ {W}_{2,1}&{W}_{2,2}\end{array}\right)$$
(49)

where \(\delta H\in {{\mathbb{C}}}^{N\times N},{W}_{1,2}\in {{\mathbb{C}}}^{N\times (P-1)N},{W}_{2,1}\in {{\mathbb{C}}}^{(P-1)N\times N}\) and \({W}_{2,2}\in {{\mathbb{C}}}^{(P-1)N\times (P-1)N}\). Note that if \(\parallel \delta H\parallel \,\leqslant \,\varepsilon\), \(\tilde{U}\) is an (m, ε)-block-encoding of H. We have

$$W\left\vert \Psi \right\rangle =\left(\begin{array}{cc}\delta H&{W}_{1,2}\\ {W}_{2,1}&{W}_{2,2}\end{array}\right)\left(\begin{array}{c}\left\vert \psi \right\rangle \\ 0\end{array}\right)=\left(\begin{array}{c}\delta H\left\vert \psi \right\rangle \\ {W}_{2,1}\left\vert \psi \right\rangle \end{array}\right)$$
(50)

Combining Eq. (48) with Eq. (50), we have

$$\begin{array}{r}\parallel \delta H\vert \psi \rangle \parallel\, \leqslant\, \parallel W\vert \Psi \rangle \parallel\, \leqslant\, \varepsilon .\end{array}$$
(51)

Because Eq. (51) holds for arbitrary \(\left\vert \psi \right\rangle\), we have \(\parallel \delta H\parallel \,\leqslant \,\varepsilon\). Therefore, \(\tilde{U}\) is an (m, ε)-block-encoding of H. We can now study the efficiency of block-encoding.
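For a small case, the block structure in Eqs. (49) to (51) can be inspected directly. The sketch below builds the exact LCU unitary of Eq. (46) for one system qubit, assuming the usual convention \({\mathbb{G}}={G}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }\otimes {{\mathbb{I}}}_{N}\) with \({G}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }\vert {0}^{m}\rangle =\mathop{\sum }\nolimits_{p}\sqrt{{\alpha }_{p}/\lambda }\vert p\rangle\), nonnegative αp (signs absorbed into up) and λ = Σp αp. Here \({G}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }\) is realized as a Householder reflection, which is our choice for the illustration and not the Clifford + T construction of this work.

```python
import math

PAULI = {"I": [[1, 0], [0, 1]], "X": [[0, 1], [1, 0]],
         "Y": [[0, -1j], [1j, 0]], "Z": [[1, 0], [0, -1]]}


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb] for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def prep(weights):
    """Real orthogonal G with G|0> = sum_p sqrt(w_p / lambda)|p> (a Householder reflection)."""
    lam = sum(weights)
    a = [math.sqrt(w / lam) for w in weights]
    v = [float(i == 0) - a[i] for i in range(len(a))]
    vv = sum(x * x for x in v)
    P = len(a)
    if vv < 1e-15:                       # target is already |0>: use the identity
        return [[float(i == j) for j in range(P)] for i in range(P)]
    return [[float(i == j) - 2 * v[i] * v[j] / vv for j in range(P)] for i in range(P)]


def lcu_block_encoding(weights, paulis):
    """U = (G^dag (x) I) Select(u_p) (G (x) I) for a single system qubit."""
    P = len(weights)                     # assumed to be a power of two, P = 2^m
    select = [[0] * (2 * P) for _ in range(2 * P)]
    for p, name in enumerate(paulis):    # block-diagonal: sum_p |p><p| (x) u_p
        for r in range(2):
            for c in range(2):
                select[2 * p + r][2 * p + c] = PAULI[name][r][c]
    G = kron(prep(weights), PAULI["I"])
    return matmul(dagger(G), matmul(select, G))


weights, paulis = [0.5, 0.3, 0.2, 0.1], ["Z", "X", "Y", "I"]
lam = sum(weights)
U = lcu_block_encoding(weights, paulis)
block = [row[:2] for row in U[:2]]       # the <0^m| U |0^m> block
target = [[sum(w * PAULI[s][r][c] for w, s in zip(weights, paulis)) / lam
           for c in range(2)] for r in range(2)]
print(max(abs(block[r][c] - target[r][c]) for r in range(2) for c in range(2)))
```

With exact gates the top-left block reproduces H/λ, so δH = 0; replacing G or the up by approximations and recomputing the same block gives a direct numerical view of δH.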

The actual circuit complexity depends on the form of up. We now prove Theorem 7, which corresponds to \({u}_{p}\in {\{\pm I,\pm X,\pm Y,\pm Z\}}^{\otimes n}\).

Proof of Theorem 7

With nanc ancillary qubits, where \({\log }_{2}P\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(P)\), \(\tilde{{\mathbb{G}}}\) can be constructed with \(O(P\log (1/\varepsilon ))\) count and \(O\left(P\frac{\log ({n}_{{{{\rm{anc}}}}})\log (\log ({n}_{{{{\rm{anc}}}}})/\varepsilon )}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates. With \(\Omega ({\log }_{2}P)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(Pn)\), Select(Hx) can be constructed with O(nP) count and \(O\left(nP\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\) depth of Clifford + T gates. Therefore, the total gate count of the block-encoding is \(O(P(n+\log (1/\varepsilon )))\). For \(\Omega ({\log }_{2}P)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(P)\), the circuit depth is

$$O\left(P\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\left(n+\log \left(\log ({n}_{{{{\rm{anc}}}}})/\varepsilon \right)\right)\right)$$
(52)
$$=O\left(P\left(n+\log (1/\varepsilon )\right)\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right).$$
(53)

For \(\Omega (P)\leqslant {n}_{{{{\rm{anc}}}}}\leqslant O(Pn)\), the circuit depth for \(\tilde{{\mathbb{G}}}\) is \(O(\log P\log (\log (P)/\varepsilon ))=O(\log n\log (\log n/\varepsilon ))\), where we have used the assumption P = O(Poly(n)). Combining this with the circuit depth of Select(Hx), the total circuit depth for block-encoding is

$${\displaystyle{\begin{array}{ll}\quad\,\,{O}\left(\left(\frac{{n}_{{{{\rm{anc}}}}}\log (n)\log (\log (n)/\varepsilon )}{\log {n}_{{{{\rm{anc}}}}}}+nP\right)\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\\ \,=\,O\left(\left(\frac{nP\log (n)\log (\log (n)/\varepsilon )}{\log (nP)}+nP\right)\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\\ \,=\,O\left(\left(nP\log (\log (n))+nP\log (1/\varepsilon )\right)\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\\ \,=\,\tilde{O}\left(nP\log (1/\varepsilon )\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right),\end{array}}}$$
(54)

which completes the proof. □

Sparse Boolean memory

Recall that sparse Boolean memory performs the transformation \(\,{{\mbox{Select}}}\,(B){\vert q\rangle }_{{{{\rm{idx}}}}}{\vert z\rangle }_{{{{\rm{wrd}}}}}={\vert q\rangle }_{{{{\rm{idx}}}}}{\vert z\oplus B(q)\rangle }_{{{{\rm{wrd}}}}}\), where idx represents an n-qubit index register, wrd represents an \(\tilde{n}\)-qubit word register, and there are at most s inputs q satisfying B(q) ≠ 0⋯0. We define qk as the kth input with nonzero output, and \({{{{{Q}}}}}_{B}\equiv \{{q}_{1},{q}_{2},\cdots \,,{q}_{s}\}\). In ref. 34, we developed a construction of SBM with \(O(ns\tilde{n})\) ancillary qubits. The result is as follows.

Lemma 8

(Sec. III B in the Supplemental Material of ref. 34). With \(O(ns\tilde{n})\) ancillary qubits, Select(B) can be realized with \(O(ns\tilde{n})\) count and \(O(\log (ns\tilde{n}))\) depth of Clifford + T gates.
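Select(B) maps computational basis states to computational basis states, so its action can be checked classically. The following is a minimal Python sketch of that action only, not a circuit construction. It assumes bit strings are encoded as integers, with the lth word qubit stored in bit l − 1. It also assumes a dictionary `table` (a hypothetical data layout) that holds the at most s pairs (q, B(q)) with nonzero B(q).

```python
from typing import Dict, Tuple


def select_b(table: Dict[int, int], q: int, z: int) -> Tuple[int, int]:
    """Classical action of Select(B) on the basis state |q>_idx |z>_wrd.

    `table` maps each q in Q_B to its nonzero word B(q); any q absent
    from `table` has B(q) = 0, so the word register is left unchanged.
    """
    return q, z ^ table.get(q, 0)


# Example with n = 3 index qubits, n~ = 4 word qubits and s = 2.
table = {0b010: 0b1001, 0b111: 0b0110}
print(select_b(table, 0b010, 0b0011))  # (2, 10): 0011 XOR 1001 = 1010
print(select_b(table, 0b101, 0b0011))  # (5, 3): q is not in Q_B
```

Applying `select_b` twice returns the original pair (q, z), which reflects that Select(B) is its own inverse.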

Based on Lemma 8, we can obtain the gate complexity with an intermediate number of ancillary qubits, \(O(n)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(ns\tilde{n})\). The proof of Lemma 5 is given as follows.

Proof of Lemma 5

Let \({{{\mbox{wrd}}}}_{l}\) be the lth qubit of the word register, and let \({z}_{l}\) be the lth digit of z, so that \({\vert z\rangle }_{{{{\rm{wrd}}}}}=\mathop{\prod }\nolimits_{l = 1}^{\tilde{n}}{\vert {z}_{l}\rangle }_{{{{\mbox{wrd}}}}_{l}}\). Select(B) can be decomposed into operations that apply different components of B to different qubits of the word register. Let \({B}_{l}(q)\) be the lth digit of B(q), and \({B}_{{l}_{\min }:{l}_{\max }}(q)\equiv {B}_{{l}_{\max }}(q)\cdots {B}_{{l}_{\min }+1}(q){B}_{{l}_{\min }}(q)\). We define \(\,{{\mbox{Select}}}\,({B}_{{l}_{\min }:{l}_{\max }})\) as a unitary satisfying

$$\,{{\mbox{Select}}}\,({B}_{{l}_{\min }:{l}_{\max }}){\left\vert q\right\rangle }_{{{{\rm{idx}}}}}\mathop{\prod }\limits_{l={l}_{\min }}^{{l}_{\max }}{\left\vert {z}_{l}\right\rangle }_{{{{\mbox{wrd}}}}_{l}}$$
(55)
$$={\left\vert q\right\rangle }_{{{{\rm{idx}}}}}\mathop{\prod }\limits_{l={l}_{\min }}^{{l}_{\max }}{\left\vert {z}_{l}\oplus {B}_{l}(q)\right\rangle }_{{{{\mbox{wrd}}}}_{l}}.$$
(56)

For any \(1={l}_{0}\, < \,{l}_{1}\, < \,\cdots\, < \,{l}_{{n}^{{\prime} }}=\tilde{n}+1\), it can be verified that

$$\begin{array}{r}\,{{\mbox{Select}}}(B)=\mathop{\prod }\limits_{r=1}^{{n}^{{\prime} }}{{\mbox{Select}}}\,({B}_{{l}_{r-1}:{l}_{r}-1}).\end{array}$$
(57)
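Eq. (57) holds because all factors are controlled by the same index register and act on disjoint sets of word qubits. The Python sketch below checks this classically on every basis state of a small instance. It encodes bit strings as integers, with word qubit l stored in bit l − 1; this encoding and the helper names `slice_mask`, `select_slice` and `select_by_blocks` are assumptions of the sketch.

```python
from typing import Dict, List, Tuple


def slice_mask(l_min: int, l_max: int) -> int:
    """Ones on word qubits l_min..l_max (1-indexed, inclusive)."""
    return ((1 << (l_max - l_min + 1)) - 1) << (l_min - 1)


def select_slice(table: Dict[int, int], l_min: int, l_max: int,
                 q: int, z: int) -> Tuple[int, int]:
    """Select(B_{l_min:l_max}) of Eqs. (55)-(56): flips only word qubits l_min..l_max."""
    return q, z ^ (table.get(q, 0) & slice_mask(l_min, l_max))


def select_by_blocks(table: Dict[int, int], bounds: List[int],
                     q: int, z: int) -> Tuple[int, int]:
    """Right-hand side of Eq. (57) for bounds = [l_0, ..., l_{n'}],
    with l_0 = 1 and l_{n'} = n~ + 1."""
    for l_prev, l_next in zip(bounds, bounds[1:]):
        q, z = select_slice(table, l_prev, l_next - 1, q, z)
    return q, z


# Check Eq. (57) on every basis state for n = 3, n~ = 5 and three partitions.
table = {0b001: 0b10110, 0b100: 0b01011, 0b110: 0b11111}
for bounds in ([1, 6], [1, 3, 4, 6], [1, 2, 3, 4, 5, 6]):
    for q in range(2 ** 3):
        for z in range(2 ** 5):
            assert select_by_blocks(table, bounds, q, z) == (q, z ^ table.get(q, 0))
```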

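We also define \(\,{{\mbox{Select}}}\,({B}_{l})=\,{{\mbox{Select}}}\,({B}_{l:l})\). For each \({B}_{l}\), we further define the Boolean functions \({B}_{l,{k}_{\min }:{k}_{\max }}(q)={B}_{l}(q)\wedge ({k}_{\min }\,\leqslant \,k\,\leqslant\, {k}_{\max })\) for \({k}_{\min }\leqslant {k}_{\max }\). Here, for \(q\in {{{{{Q}}}}}_{B}\), k denotes the zero-based position of q in \({{{{{Q}}}}}_{B}\), i.e., \(q={q}_{k+1}\); for \(q\notin {{{{{Q}}}}}_{B}\), \({B}_{l}(q)=0\) regardless of k. For any \(0={k}_{0}\, < \,{k}_{1}\, < \,\cdots \,< \,{k}_{{s}^{{\prime} }}=s\), it can be verified that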

$$\begin{array}{r}\,{{\mbox{Select}}}({B}_{l})=\mathop{\displaystyle{\prod }}\limits_{j=1}^{{s}^{{\prime} }}{{\mbox{Select}}}\,({B}_{l,{k}_{j-1}:{k}_{j}-1}).\end{array}$$
(58)
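In the same way, Eq. (58) splits the s nonzero entries of \({B}_{l}\) into consecutive groups. The Python sketch below checks it classically. It assumes that \({{{{{Q}}}}}_{B}\) is given as an ordered list and that k is the zero-based position of q in that list. This is the convention under which the boundaries \({k}_{0}=0\) and \({k}_{{s}^{{\prime} }}=s\) cover every element exactly once. The function names are illustrative.

```python
from typing import List, Tuple


def b_l_range(qs: List[int], outs: List[int], l: int,
              k_min: int, k_max: int, q: int) -> int:
    """B_{l,k_min:k_max}(q): the lth bit of B(q), kept only if q = qs[k]
    with k_min <= k <= k_max (k zero-based); 0 if q is not in Q_B."""
    if q not in qs:
        return 0
    k = qs.index(q)
    return (outs[k] >> (l - 1)) & 1 if k_min <= k <= k_max else 0


def select_bl_grouped(qs: List[int], outs: List[int], l: int,
                      ks: List[int], q: int, z: int) -> Tuple[int, int]:
    """Right-hand side of Eq. (58) for ks = [k_0, ..., k_{s'}], k_0 = 0, k_{s'} = s."""
    for k_prev, k_next in zip(ks, ks[1:]):
        z ^= b_l_range(qs, outs, l, k_prev, k_next - 1, q) << (l - 1)
    return q, z


# Q_B = (q_1, ..., q_4) with their nonzero 3-bit outputs (n = 3, n~ = 3, s = 4).
qs, outs = [1, 4, 6, 7], [0b101, 0b011, 0b110, 0b001]
for ks in ([0, 4], [0, 2, 4], [0, 1, 2, 3, 4]):
    for l in range(1, 4):
        for q in range(8):
            for z in range(8):
                full = outs[qs.index(q)] if q in qs else 0
                expected = z ^ (((full >> (l - 1)) & 1) << (l - 1))  # Select(B_l)
                assert select_bl_grouped(qs, outs, l, ks, q, z) == (q, expected)
```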

We first consider the construction with ancillary qubit number \(O(ns)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(ns\widetilde{n})\). In this case, we decompose Select(B) with Eq. (57). We let d = nanc/(ns) and \({n}^{{\prime} }=\lceil \tilde{n}/d\rceil\), and

$$\begin{array}{r}{l}_{r}=\left\{\begin{array}{ll}rd+1&r \,< \,{n}^{{\prime} }\\ \tilde{n}+1&r={n}^{{\prime} }\end{array}\right.\,.\end{array}$$
(59)

According to Lemma 8, with \({n}_{{{{\rm{anc}}}}}\) ancillary qubits, each \(\,{{\mbox{Select}}}\,({B}_{{l}_{r-1}:{l}_{r}-1})\) can be constructed with O(nsd) count and \(O(\log (nsd))=O(\log {n}_{{{{\rm{anc}}}}})\) depth of Clifford + T gates. So the total gate count is \(O(nsd)\times {n}^{{\prime} }=O(ns\tilde{n})\), and the total circuit depth is \(O(\log {n}_{{{{\rm{anc}}}}})\times {n}^{{\prime} }=O\left(ns\tilde{n}\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\).
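To make the counting in this case concrete, the sketch below builds the boundaries of Eq. (59) and tallies cost with a constants-free stand-in for Lemma 8. For a block of d word qubits, it uses count nsd and depth ⌈log2(nsd)⌉. This cost model, the integer choice \(d=\lfloor {n}_{{{{\rm{anc}}}}}/(ns)\rfloor\), and the function names are assumptions of the sketch. It illustrates the scaling only and does not give actual gate counts.

```python
import math
from typing import List, Tuple


def word_bounds(n_tilde: int, d: int) -> List[int]:
    """Boundaries of Eq. (59): l_r = r*d + 1 for r < n', and l_{n'} = n~ + 1."""
    n_prime = math.ceil(n_tilde / d)
    return [r * d + 1 for r in range(n_prime)] + [n_tilde + 1]


def case1_cost(n: int, s: int, n_tilde: int, n_anc: int) -> Tuple[int, int]:
    """Toy (count, depth) for O(ns) <= n_anc <= O(ns n~), constants dropped."""
    d = max(1, n_anc // (n * s))           # word qubits per block (assumed floor)
    bounds = word_bounds(n_tilde, d)
    count = depth = 0
    for l_prev, l_next in zip(bounds, bounds[1:]):
        size = n * s * (l_next - l_prev)   # toy Lemma 8 resources for this block
        count += size
        depth += max(1, math.ceil(math.log2(size)))  # blocks applied in sequence
    return count, depth


print(word_bounds(10, 3))        # [1, 4, 7, 10, 11]
print(case1_cost(4, 3, 10, 24))  # (120, 25): d = 2, five blocks of depth 5
```

In this toy model the count stays at nsñ = 120 for every admissible \({n}_{{{{\rm{anc}}}}}\). The depth drops from 40 at \({n}_{{{{\rm{anc}}}}}=12\) to 7 at \({n}_{{{{\rm{anc}}}}}=120\), consistent with the \(\log {n}_{{{{\rm{anc}}}}}/{n}_{{{{\rm{anc}}}}}\) scaling.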

We then consider the construction with ancillary qubit number \(O(n)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(ns)\). In this case, we first perform the decomposition \(\,{{\mbox{Select}}}\,(B)=\mathop{\prod }\nolimits_{l = 1}^{\tilde{n}}\,{{\mbox{Select}}}\,({B}_{l})\). Then, we decompose each \(\,{{\mbox{Select}}}\,({B}_{l})\) with Eq. (58). We let \(w={n}_{{{{\rm{anc}}}}}/n\) and \({s}^{{\prime} }=\lceil s/w\rceil\), and

$$\begin{array}{r}{k}_{j}=\left\{\begin{array}{ll}jw&j\, < \,{s}^{{\prime} }\\ s&j={s}^{{\prime} }\end{array}\right.\,.\end{array}$$
(60)

According to Lemma 8, with \({n}_{{{{\rm{anc}}}}}\) ancillary qubits, each \(\,{{\mbox{Select}}}\,({B}_{l,{k}_{j-1}:{k}_{j}-1})\) can be constructed with O(nw) count and \(O(\log (nw))=O(\log {n}_{{{{\rm{anc}}}}})\) depth of Clifford + T gates. So each \(\,{{\mbox{Select}}}\,({B}_{l})\) requires gate count \(O(nw)\times {s}^{{\prime} }=O(ns)\) and circuit depth \(O(\log {n}_{{{{\rm{anc}}}}})\times {s}^{{\prime} }=O\left(ns\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\). In this case, we have \({n}^{{\prime} }=\tilde{n}\) in Eq. (57), so the total gate count and circuit depth of Select(B) are \(O(ns\tilde{n})\) and \(O\left(ns\tilde{n}\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\), respectively. □
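The second case can be tallied the same way. The sketch below builds the group boundaries \({k}_{j}\) of Eq. (60) for one \({B}_{l}\) and repeats the cost for each of the \(\tilde{n}\) word qubits. As before, it uses a constants-free stand-in for Lemma 8 with a one-qubit word: count nw and depth ⌈log2(nw)⌉ per factor. The integer choice \(w=\lfloor {n}_{{{{\rm{anc}}}}}/n\rfloor\) and the function names are assumptions of this illustration.

```python
import math
from typing import List, Tuple


def entry_bounds(s: int, w: int) -> List[int]:
    """Boundaries of Eq. (60): k_j = j*w for j < s', and k_{s'} = s."""
    s_prime = math.ceil(s / w)
    return [j * w for j in range(s_prime)] + [s]


def case2_cost(n: int, s: int, n_tilde: int, n_anc: int) -> Tuple[int, int]:
    """Toy (count, depth) for O(n) <= n_anc <= O(ns), constants dropped."""
    w = max(1, n_anc // n)                # nonzero entries per factor (assumed floor)
    bounds = entry_bounds(s, w)
    count = depth = 0
    for k_prev, k_next in zip(bounds, bounds[1:]):
        size = n * (k_next - k_prev)      # toy Lemma 8 resources, one-qubit word
        count += size
        depth += max(1, math.ceil(math.log2(size)))
    # Select(B_l) is repeated for each of the n~ word qubits (n' = n~ in Eq. (57)).
    return n_tilde * count, n_tilde * depth


print(entry_bounds(10, 4))       # [0, 4, 8, 10]
print(case2_cost(4, 10, 3, 16))  # (120, 33)
```

Here too the count equals nsñ for every admissible \({n}_{{{{\rm{anc}}}}}\), while the depth falls as more ancillary qubits allow larger groups per factor.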