Circuit complexity of quantum access models for encoding classical data

Zhang, Xiao-Ming; Yuan, Xiao

doi:10.1038/s41534-024-00835-8


Article
Open access
Published: 23 April 2024


npj Quantum Information volume 10, Article number: 42 (2024)


Abstract

How to efficiently encode classical data is a fundamental task in quantum computing. While many existing works treat classical data encoding as a black box in oracle-based quantum algorithms, their explicit constructions are crucial for the efficiency of practical algorithm implementations. Here, we unveil the mystery of the classical data encoding black box and study the Clifford + T complexity in constructing several typical quantum access models. For general matrices (even including sparse ones), we prove that sparse-access input models and block-encoding both require nearly linear circuit complexities relative to the matrix dimension. We also give construction protocols achieving near-optimal gate complexities. On the other hand, the construction becomes efficient with respect to the number of data qubits when the matrix is a linear combination of polynomially many efficiently implementable unitaries. As a typical example, we propose improved block-encoding when these unitaries are Pauli strings. Our protocols are built upon improved quantum state preparation and a select oracle for Pauli strings, which are of independent value. Our access model constructions provide considerable flexibility, allowing for tunable ancillary qubit numbers and offering corresponding space-time trade-offs.


Introduction

Quantum computing offers speedups over its classical counterpart in various tasks, including factoring, searching, and simulation¹. However, the speedups, in many cases, rely on the existence of efficient oracles or access models to encode the related classical data². In this context, a function f(x) representing the classical data of interest is encoded using a unitary operation U_f, which acts as an oracle in the computation. To study quantum advantages, the number of queries to U_f in a quantum algorithm is compared to the number of queries to f(x) in classical algorithms. Quantum computing provides a substantial reduction in query complexity for many problems of practical importance^3,4,5.

There are various access models to encode classical data. One commonly used access model is the sparse-access input model (SAIM)^{4,5,6,7,8,9,10,11,12}, which encodes general sparse matrices and outputs the value or position of the non-zero elements when provided with appropriate inputs. SAIM was initially introduced for Hamiltonian simulation and discrete quantum walks^4,6,7,8, and has since found broad applications in other fields such as machine learning^5,9,11 and classical oscillator simulations¹². For example, the quantum linear system problem can be solved with $\tilde{O}(\kappa )$ queries to SAIM^5,9,10, where κ represents the condition number of the matrix to be inverted.

Another important access model is block-encoding, which serves as a crucial subroutine for quantum signal processing^13,14 and its generalization—quantum singular-value transformations (QSVT)^10,15. The success of block-encoding enables the realization of Hamiltonian simulation with an optimal query complexity^13,14. Furthermore, many seminal quantum algorithms, including Grover’s algorithm, quantum Fourier transformation, and the HHL algorithm, could be viewed as special cases of QSVT, where the problem of interest is encoded using block-encoding¹⁵.

Many existing works treat access models as black boxes for convenience. However, the actual circuit complexity of the algorithm also depends on the cost of each query to these access models. Despite its importance, this problem has drawn attention only recently, and many basic questions remain open. In particular, ref. ¹⁶ presents a nearly time-optimal protocol for block-encoding of general dense matrices of 2ⁿ × 2ⁿ dimension. A circuit depth of $\tilde{O}(n)$ can be achieved at the expense of exponentially many ancillary qubits. Ref. ¹⁷ examines matrices with D data, each appearing M times, and considers examples including checkerboard matrices and tridiagonal matrices with polynomial circuit complexities. However, the cost of block-encoding of more general matrices remains unexplored. Moreover, it is still unclear if there is a fundamental limit to the resources required by data encoding.

In this work, we provide a framework for constructing quantum access models in the fault-tolerant setting using Clifford + T gates. The protocol works for general classical data and takes the underlying structure of the data, such as sparsity and linear combination of unitaries (LCU), into consideration. Our results represent a direct mapping from the query complexity of quantum algorithms to their practical circuit complexity. Our protocols allow tunable ancillary qubit numbers and offer space-time trade-offs. For general sparse matrices of dimension 2ⁿ = N, we investigate the SAIM and block-encoding. For both access models, we first show that the gate count lower bound increases approximately linearly with respect to N. We then develop construction algorithms with varying ancillary qubit numbers ranging from Ω(n) to O(N). Across the entire range of qubit numbers, we achieve nearly optimal circuit complexity. We next study the block-encoding of LCU. Efficient block-encoding is achievable when the matrix can be represented as a linear combination of a polynomial number of unitaries, each of which can be implemented using a polynomial-size quantum circuit.

Our access model construction relies on optimized realizations of various subroutines that are independently valuable, including quantum state preparation, selective oracles for Pauli strings, and sparse Boolean functions. In all the listed operations, we achieve improved or at least comparable circuit complexities compared to the best-known realizations.

We now introduce the definitions of SAIM and block-encoding. Let N = 2ⁿ and consider a sparse matrix $H\in {{\mathbb{C}}}^{N\times N}$ with at most s = O(1) nonzero elements in each row and column. Let H_x,y be the value of the element at the xth row and yth column, where each H_x,y is a d-digit integer (d = O(1)). Let idx denote a 2n-qubit index register and wrd denote an n-qubit word register. The sparse-access input model (SAIM) corresponds to two unitaries O_H, O_F, which satisfy

$${O}_{H}{\vert x,y\rangle }_{{{{\rm{idx}}}}}{\vert z\rangle }_{{{{\rm{wrd}}}}}={\vert x,y\rangle }_{{{{\rm{idx}}}}}{\vert z\oplus {H}_{x,y}\rangle }_{{{{\rm{wrd}}}}},$$

(1a)

$${O}_{F}{\vert x,k\rangle }_{{{{\rm{idx}}}}}={\vert x,F(x,k)\rangle }_{{{{\rm{idx}}}}}.$$

(1b)

Here, F(x, k) is the column index of the kth nonzero element in row x. Due to its simplicity and generality, Eq. (1) has become one of the standard access models in quantum computing, and it is usually assumed to be available when processing classical data.
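As a minimal illustration of Eq. (1), the action of O_H and O_F on computational basis states can be mimicked classically. The 4 × 4 matrix, the convention that k is counted from 0, and the function names below are assumptions made for this sketch; it says nothing about the circuit cost studied in this work.

```python
# Toy illustration of Eq. (1): basis-state action of O_H and O_F.
# The matrix H and all helper names are hypothetical.

# Nonzero entries H[x][y] of a 4 x 4 matrix with s = 2 nonzeros per row.
H = {
    0: {0: 3, 1: 1},
    1: {0: 1, 1: 2},
    2: {2: 3, 3: 1},
    3: {2: 1, 3: 2},
}

def F(x, k):
    """Column index of the k-th nonzero element in row x (k counted from 0)."""
    return sorted(H[x])[k]

def O_H(x, y, z):
    """|x, y>|z>  ->  |x, y>|z XOR H_{x,y}>   (Eq. 1a)."""
    return x, y, z ^ H[x].get(y, 0)

def O_F(x, k):
    """|x, k>  ->  |x, F(x, k)>   (Eq. 1b)."""
    return x, F(x, k)
```

Because the word register is updated by a bitwise XOR, applying `O_H` twice returns the input, which reflects the fact that O_H is its own inverse.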

We call a unitary U the block encoding of H if we have

$$\alpha \left(\left\langle {0}^{{n}_{{{{\rm{anc}}}}}}\right\vert \otimes {{\mathbb{I}}}_{N}\right)U\left(\left\vert {0}^{{n}_{{{{\rm{anc}}}}}}\right\rangle \otimes {{\mathbb{I}}}_{N}\right)=H,$$

where α > 0 is the normalization factor, n_anc is the number of ancillary qubits, and ${{\mathbb{I}}}_{N}$ is the N-dimensional identity. In practice, we may consider an approximate construction of the block encoding. More specifically, we call unitary $\tilde{U}$ an (α, n_anc, ε)-block-encoding of H if

$$\begin{array}{r}\left\Vert H-\alpha \left(\left\langle {0}^{{n}_{{{{\rm{anc}}}}}}\right\vert \otimes {{\mathbb{I}}}_{N}\right)\tilde{U}\left(\left\vert {0}^{{n}_{{{{\rm{anc}}}}}}\right\rangle \otimes {{\mathbb{I}}}_{N}\right)\right\Vert\, \leqslant \,\varepsilon \end{array}$$

(2)

for error parameter ε ≥ 0. Throughout our manuscript, ∥⋅∥ represents either the spectral norm for matrices or Euclidean norm for vectors. For a general N-dimensional matrix H, the construction of its block-encoding requires Ω(Poly(N)) gate count. This is true even for sparse H as we show in Supplementary Discussion 2.
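The definition in Eq. (2) can be checked numerically on a small example. The sketch below assumes a Hermitian H with ∥H∥ ≤ 1, α = 1 and a single ancillary qubit, and uses a standard dense unitary dilation; it is not the circuit construction of this work, and the function names are hypothetical.

```python
import numpy as np

def block_encode(H):
    """Return a unitary U whose top-left block equals H.

    Assumes H is Hermitian with spectral norm at most 1.
    """
    H = np.asarray(H, dtype=complex)
    evals, evecs = np.linalg.eigh(H)
    # sqrt(I - H^2) computed in the eigenbasis of H; clip guards rounding.
    root = np.sqrt(np.clip(1.0 - evals**2, 0.0, None))
    S = (evecs * root) @ evecs.conj().T
    # H and S commute and H^2 + S^2 = I, so this block matrix is unitary.
    return np.block([[H, S], [S, -H]])

def block_encoding_error(H, U, alpha=1.0):
    """Left-hand side of Eq. (2), with the ancilla as the leading qubit."""
    H = np.asarray(H, dtype=complex)
    N = H.shape[0]
    return np.linalg.norm(H - alpha * U[:N, :N], ord=2)
```

Here the projection onto the ancilla state |0⟩ simply selects the top-left N × N block of U, so the error in Eq. (2) vanishes up to floating-point rounding.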

On the other hand, when H has some other structures, the resource may be significantly reduced. In particular, we consider H in the form of a linear combination of unitaries (LCU) as

$$\begin{array}{r}H=\mathop{\sum }\limits_{p=0}^{P-1}{\alpha }_{p}{u}_{p},\end{array}$$

(3)

where u_p are n-qubit unitaries that can be implemented with polynomial-size quantum circuits, and P = O(poly(n)). The concept of “LCU” first appeared in ref. ¹⁸. The main purpose of ref. ¹⁸ and the follow-up work ref. ¹⁹ is to realize non-unitary transformations on quantum computers. In the context of Hamiltonian simulation, ref. ²⁰ has shown that LCU-based methods can outperform product-formula-based methods. Many subsequent works with different applications have since been inspired^{14,15,21,22,23}.

Without loss of generality, we may assume that ${\log }_{2}P$ is an integer, and $\mathop{\sum }\nolimits_{p = 0}^{P-1}{\alpha }_{p}=1$. This can always be satisfied by adding terms with zero amplitude, and rescaling the Hamiltonian. In particular, the linear combination of Pauli strings

$$\begin{array}{r}H=\mathop{\sum }\limits_{p=0}^{P-1}{\alpha }_{p}{H}_{p}\end{array}$$

(4)

will be studied in detail. Here, ${\alpha }_{p}\, > \,0,P\,\geqslant \,1,{H}_{p}{ = \bigotimes }_{l = 1}^{n}{H}_{p,l}$, and H_p,l ∈ { ± I, ± X, ± Y, ± Z} are single-qubit Pauli operators. Eq. (4) is important as it corresponds to the Hamiltonian of almost all physical quantum systems, such as spin and molecular systems.

In our constructions, we consider the fault-tolerant quantum computing setting. More specifically, we only use two-qubit Clifford gate and single-qubit T gate, which is equivalent to the elementary gate set ${{{{\mathcal{G}}}}}_{{{{\rm{clf}}}}+T}\equiv \{H,S,T,\,{{\mbox{CNOT}}}\,\}$. All gates in ${{{{\mathcal{G}}}}}_{{{{\rm{clf}}}}+T}$ are error-correctable with surface code²⁴. We benchmark the circuit complexity of a given quantum circuit with three quantities: total number of elementary gates, total qubit number (including data qubits and ancillary qubits), and circuit depth. We will also discuss the space-time trade-off of our algorithm, i.e. the circuit depth under a certain number of ancillary qubits. We also allow at least O(n) ancillary qubits, because this does not increase the total space complexity.

Results

Circuit complexity lower bound

Before discussing the access model construction, we first study the lower bound of the circuit complexity. We will focus on the encoding of sparse matrices. The methodology here is general and can be readily applied to other related problems.

Our strategy is as follows. First, we analyze the capacity of a quantum circuit with bounded resources, i.e., how many unique unitaries can be constructed given a fixed number of elementary gates or a fixed circuit depth. Second, we analyze the size of the access model, i.e., the number of unique unitaries required to approximate the access model with arbitrary parameters. The circuit complexity can then be estimated by comparing the capacity of a quantum circuit with the size of the access model. All proofs of the lemmas and theorems in this section are provided in Supplementary Discussion 1.

Quantum circuit capacity

Assume that we are given a finite two-qubit elementary gate set ${{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}$. We define $g\equiv | {{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}| =O(1)$, with ∣ ⋅ ∣ the number of elements in the set. Our first result is that the capacity can be bounded in terms of the number of elementary gates alone, independent of the space and time resources.

Lemma 1

Let ${{{{\mathcal{G}}}}}_{C}$ be the set containing all n-qubit unitaries that can be constructed with C elementary gates in ${{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}$. Then, we have $\log | {{{{\mathcal{G}}}}}_{C}| =O\left((C\log (C+n))\right)$, even with unlimited ancillary qubit number.

Lemma 1 implies that the capacity does not always increase with the ancillary qubit number, which can be understood as follows. All ancillary qubits should be uncomputed at the end of the circuit. When C is fixed, only a finite number of unitaries can satisfy this requirement while being constructible from those elementary gates. We also note that the circuit depth D is bounded by C, so Lemma 1 also implies a relation between capacity and circuit depth.

On the other hand, when the ancillary qubit number and circuit depth are finite, the bound on the capacity can be tightened as follows.

Lemma 2

Let ${{{{\mathcal{G}}}}}_{{n}_{{{{\rm{anc}}}}},D}^{{\prime} }$ be the set containing all unitaries that can be constructed with n_anc ancillary qubits and D circuit depth. Then, we have $\log \left\vert {{{{\mathcal{G}}}}}_{{n}_{{{{\rm{anc}}}}},D}^{{\prime} }\right\vert =O\left(D(n+{n}_{{{{\rm{anc}}}}})\right)$.

Lemmas 1 and 2 represent the ultimate representational power of quantum circuits constructed with local gates. They can be used to estimate the circuit complexity lower bound whenever the task imposes a requirement on $| {{{{\mathcal{G}}}}}_{C}|$ or $| {{{{\mathcal{G}}}}}_{{n}_{{{{\rm{anc}}}}},D}^{{\prime} }|$. Moreover, similar results can be obtained straightforwardly for other types of elementary gate sets, such as k-local operations with k > 2.

Circuit complexity for encoding sparse matrices

With Lemmas 1 and 2, we now estimate the circuit complexity lower bound for accessing sparse matrices. For SAIM, it turns out that at least Ω(N!) unique unitaries are required to cover the set of all SAIM for 1-sparse matrices. So according to Lemmas 1 and 2, we have the following result.

Theorem 1

Consider an arbitrary finite two-qubit elementary gate set ${{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}$. Let n_anc, D and C be the number of ancillary qubits, circuit depth and total number of gates in ${{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}$ required to approximate SAIM in Eq. (1) with any accuracy ε < 1. Then, we have (n + n_anc)D = Ω(2ⁿn) and C = Ω(2ⁿ).

A similar result is also obtained for the block-encoding of sparse matrix as follows.

Theorem 2

Consider an arbitrary finite two-qubit elementary gate set ${{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}$. Let n_anc, D and C be the number of ancillary qubits, circuit depth and total number of gates in ${{{{\mathcal{G}}}}}_{{{{\rm{ele}}}}}$ required to construct the block-encoding of H with any accuracy ε < 2. Then, we have (n + n_anc)D = Ω(N) and C = Ω(N^α) for arbitrary α ∈ (0, 1).

Theorems 1 and 2 imply that a general sparse matrix cannot be encoded with a subexponential number of quantum gates, for both SAIM and block-encoding. It is possible to trade ancillary qubit numbers for circuit depth. However, the space and time complexities cannot achieve sub-exponential scaling simultaneously. The hardness of SAIM can be interpreted as follows. Although H is assumed to be sparse (O(1) nonzero elements in each row and column), there are still 2ⁿ × O(1) = O(2ⁿ) independent variables in total. Therefore, the quantum circuit must be large enough to contain an exponential number of elementary gates.
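The counting argument behind the gate-count bound of Theorem 1 can be illustrated numerically: the capacity bound of Lemma 1, C log(C + n), must be at least log(N!) to cover the 1-sparse SAIM instances. The sketch below sets the constant hidden in the O(·) of Lemma 1 to 1, which is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math

def log2_factorial(N):
    """log2(N!) via the log-gamma function."""
    return math.lgamma(N + 1) / math.log(2)

def min_gate_count(n):
    """Smallest C with C * log2(C + n) >= log2((2**n)!).

    Assumption: the constant hidden in Lemma 1's O(.) is set to 1.
    """
    target = log2_factorial(2**n)
    lo, hi = 1, 2**(n + 1)          # hi = 2N always satisfies the inequality
    while lo < hi:
        mid = (lo + hi) // 2
        if mid * math.log2(mid + n) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

for n in range(4, 13, 2):
    C = min_gate_count(n)
    print(n, C, round(C / 2**n, 3))  # the ratio C / N stays of order one
```

Under this normalization the minimal C grows in proportion to N = 2ⁿ, in line with the scaling C = Ω(2ⁿ) stated in Theorem 1.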

We note that the quantum circuit capacity for the ancilla-free case has been studied in Section 4.5.4 of ref. ¹. Moreover, a result related to Theorem 1 has been obtained in ref. ²⁵, which gives a distinct quantum circuit number lower bound with fixed qubit number and shows that there exists a table of size N requiring Ω(N) gate count. Ref. ¹ allows approximate implementations but does not consider ancillary qubit usage. Ref. ²⁵ implicitly allows ancillary qubits but does not consider approximate implementations. By contrast, our results are more general, because both ancillary qubit usage and approximate implementations are allowed. Our results can also be generalized from unitaries to quantum channels. In Supplementary Discussion 1, we show that the circuit capacity and circuit lower bound are similar if we consider two-qubit quantum channels as elementary quantum operations, which can include measurement and feedback controls.

Quantum state preparation

Quantum state preparation is a critical step of our access model construction and is of independent interest. We say that an (n + n_anc)-qubit unitary G prepares the n-qubit quantum state $\vert \psi \rangle$ with accuracy ε if

$$\begin{array}{r}G(\vert {0}^{n}\rangle \otimes \vert {0}^{{n}_{{{{\rm{anc}}}}}}\rangle )=\vert \tilde{\psi }\rangle \otimes\vert {0}^{{n}_{{{{\rm{anc}}}}}}\rangle \end{array}$$

(5)

for some $\Vert \vert \psi \rangle -\vert \tilde{\psi }\rangle \Vert\, \leqslant \,\varepsilon$.

Such a problem has been studied extensively^{16,26,27,28,29,30,31,32,33,34,35,36,37}. When given a sufficiently large number of ancillary qubits, the optimal Clifford + T depth $O(n+\log (1/\varepsilon ))$ can be achieved³⁷. However, with a restricted ancillary qubit number, the optimal circuit depth has not been reached. For example, with O(n) ancillary qubits, the best-known Clifford + T construction achieves $O((N/n)\log (N/\varepsilon ))$ circuit depth³². Besides, for gate count scaling, all existing algorithms have either O(Npoly(n)) or O(Npolylog(n)) Clifford + T count. It remains an outstanding question whether a linear gate count scaling with respect to the data dimension N can be reached.

Here, we provide a family of improved quantum state preparation protocols with tunable ancillary qubit number. The result is summarized below (it follows directly from Theorem 8 in Methods).

Theorem 3

With n_anc ancillary qubits where Ω(n) ⩽ n_anc ⩽ O(N), an arbitrary n-qubit quantum state can be prepared to accuracy ε with $O(N\log (1/\varepsilon ))$ count and $\tilde{O}\left(N\log (1/\varepsilon )\frac{\log ({n}_{{{{\rm{anc}}}}})}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates, where $\tilde{O}$ suppresses the doubly logarithmic factors of n_anc.

Theorem 3 achieves linear scaling of the Clifford + T count with respect to N, and this holds for arbitrary space complexity. When n_anc = O(n), the circuit depth is lower than the best-known result of $O\left(N\frac{\log (N/\varepsilon )}{n}\right)$. Moreover, compared to ref. ³², which also studies the space-time trade-off of state preparation, our method improves the circuit depth scaling by a factor of $\tilde{O}({n}_{{{{\rm{anc}}}}}/\log {n}_{{{{\rm{anc}}}}})$. Summaries of some representative state preparation protocols are provided in Table 1 and Table 2.

Table 1 Clifford+T complexities of n-qubit state preparation protocols with fixed accuracy ε and total qubit (data qubit + ancillary qubit) number O(n)


Table 2 Clifford+T complexities of n-qubit state preparation protocols with fixed ε and exponential ancillary qubits


The main idea of our construction is as follows (see also Fig. 1). For n_anc = O(n), we construct the quantum state with a set of uniformly controlled rotations (UCR) using the method in ref. ²⁸. Instead of decomposing each UCR with identical accuracy, we distribute the decomposition error in an optimized way. A UCR with m control qubits, denoted as m-UCR, is decomposed into 2^m m-qubit-controlled rotations. When performing the Clifford + T decomposition, to reduce the total circuit complexity, we allow a larger decomposition error as m becomes larger.
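The benefit of a non-uniform error budget can be seen with a rough cost proxy. The sketch below is an assumption-laden illustration rather than the schedule used in this work: it charges log₂(1/ε_j) per rotation at level j (which has 2^(j−1) rotations), treats the error of a whole UCR as ε_j, and compares a uniform budget ε_j = ε/n with a hypothetical geometric budget ε_j = ε·2^−(n−j+1). Overheads from the control qubits are ignored.

```python
import math

def t_count_proxy(n, eps_per_level):
    """Sum over levels j of (#rotations at level j) * log2(1 / eps_j)."""
    return sum(2**(j - 1) * math.log2(1.0 / eps_per_level[j - 1])
               for j in range(1, n + 1))

def uniform_budget(n, eps):
    """Every level gets the same error eps / n."""
    return [eps / n] * n

def geometric_budget(n, eps):
    """Hypothetical schedule: levels with more rotations get a looser error."""
    return [eps * 2.0**(-(n - j + 1)) for j in range(1, n + 1)]

n, eps = 10, 1e-3
print(t_count_proxy(n, uniform_budget(n, eps)))    # grows like N*log(n/eps)
print(t_count_proxy(n, geometric_budget(n, eps)))  # at most N*(log2(1/eps) + 2)
```

With the geometric budget the per-level errors still sum to less than ε, while the proxy stays below N(log₂(1/ε) + 2), i.e. linear in N without the extra log n factor of the uniform budget.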

**Fig. 1: State preparation achieving O(N) Clifford + T count for few qubit case.**

For n_anc = O(N), we improve the Clifford + T decomposition of the method in ref. ³⁴ in a similar way. In both cases, the gate count scaling $O(N\log (1/\varepsilon ))$ is achieved. For an arbitrary ancillary qubit number between the two extreme cases, we provide a scheme combining the two protocols, which allows a space-time trade-off. Details of our state preparation scheme and the corresponding complexity analysis are provided in Methods. We also note that our protocol for the few-qubit case can be combined with the depth-optimal scheme in ref. ³⁷. The circuit depth can then be improved to $O(N\log (1/\varepsilon ){n}_{{{{\rm{anc}}}}}/\log ({n}_{{{{\rm{anc}}}}}))$, at the cost of higher gate count.

We note that when the quantum state is sparse, the circuit complexity will be significantly lower. The construction of sparse state preparation is useful for sparse block-encoding. Details about sparse state preparation and sparse matrix block-encoding are provided in Supplementary Discussion 2.

Other useful subroutines

Before discussing the construction of access models in Eq. (1) and Eq. (2), we introduce some other useful subroutines, including select oracle and quantum sparse Boolean memory. These operations may have applications individually in some other scenarios. For both operations, we obtain their space-time trade-off constructions, which have improved or comparable Clifford + T complexities compared to the best-known realizations (see also Table 3).

Table 3 Summary of the Clifford + T circuit complexities of the operations serving as subroutines in this work


Select oracle for Pauli strings

We consider a function of Pauli strings ${H}_{x}{ = \bigotimes }_{l = 1}^{L}{H}_{x,l}$, where x ∈ {0, 1, ⋯ , 2^m − 1} and H_x,l ∈ { ± I, ± X, ± Y, ± Z}. We introduce two registers: the index register contains m qubits and the word register contains L qubits. The select oracle for H_x is defined as

$$\begin{array}{r}\,{{\mbox{Select}}}\,({H}_{x})=\mathop{\sum }\limits_{x=0}^{{2}^{m}-1}\left\vert x\right\rangle \left\langle x\right\vert \otimes {H}_{x},\end{array}$$

(6)

where $\left\vert x\right\rangle$ represents the computational basis of index register, and the unitary H_x is applied at the word register. In other words, the state of index register controls the operations applied at the word register.
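For small m and L, Eq. (6) can be written out as a dense block-diagonal matrix. The sketch below is only a numerical illustration of the definition, not the circuit construction; the label format (an optional overall sign followed by one letter per word qubit) and the function names are assumptions.

```python
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Dense matrix of a Pauli string such as 'XZ' or '-YI'."""
    sign = 1.0
    if label[0] in "+-":
        sign = -1.0 if label[0] == "-" else 1.0
        label = label[1:]
    out = np.array([[1.0 + 0j]])
    for ch in label:
        out = np.kron(out, PAULI[ch])
    return sign * out

def select_oracle(labels):
    """Block-diagonal sum_x |x><x| (x) H_x; len(labels) should equal 2**m."""
    dim = pauli_string(labels[0]).shape[0]
    M = len(labels)
    S = np.zeros((M * dim, M * dim), dtype=complex)
    for x, label in enumerate(labels):
        S[x * dim:(x + 1) * dim, x * dim:(x + 1) * dim] = pauli_string(label)
    return S

S = select_oracle(["XI", "ZZ", "-YX", "IZ"])   # m = 2, L = 2
```

Each diagonal block is the Pauli string selected by the corresponding index value, so the resulting 16 × 16 matrix is unitary by construction.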

Several proposals for implementing Eq. (6) have been introduced in the literature. For example, with n_anc = m ancillary qubits, ref. ³⁸ (Appendix G.4) proposed a method achieving O(ML) circuit depth and gate count with M = 2^m. With n_anc = O(ML) ancillary qubits, Eq. (6) is a special form of the “product unitary memory” in ref. ³⁴, which can be constructed with $O(\log (ML))$ depth and O(ML) count of Clifford + T gates. We provide an algorithm with tunable ancillary qubit number achieving the circuit complexity as follows.

Theorem 4

With n_anc ancillary qubits where Ω(m + L) ⩽ n_anc ⩽ O(ML), Eq. (6) can be realized with O(ML) count and $O\left(ML\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates.

Compared to the result in ref. ³⁸, our protocol reduces the circuit depth by a factor of $O\left(\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$ while maintaining the gate count scaling. The proof of Theorem 4 and details of circuit constructions are provided in Methods.

Sparse Boolean memory

We consider a sparse Boolean function $B:{\{0,1\}}^{n}\to {\{0,1\}}^{\tilde{n}}$, which has s inputs q in total satisfying B(q) ≠ 0 ⋯ 0. Given an n-qubit index register (denoted as idx) and an $\tilde{n}$-qubit register (denoted as wrd), we define the sparse Boolean memory Select(B) as a unitary satisfying

$$\begin{array}{r}\,{{\mbox{Select}}}\,(B){\vert q\rangle }_{{{{\rm{idx}}}}}{\vert z\rangle }_{{{{\rm{wrd}}}}}={\vert q\rangle }_{{{{\rm{idx}}}}}{\vert z\oplus B(q)\rangle }_{{{{\rm{wrd}}}}}.\end{array}$$

(7)

We have the following result (see Methods for proof).

Theorem 5

With n_anc ancillary qubits where $\Omega (n)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(ns\tilde{n})$, Select(B) can be realized with $O(ns\tilde{n})$ count and $O\left(ns\tilde{n}\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates.

Unlike SAIM, Eq. (7) contains a constant number of nonzero outputs, so its construction requires far fewer resources.
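At the matrix level, Eq. (7) describes a permutation of computational basis states. The sketch below builds this permutation matrix for a small example to confirm that Select(B) is unitary and self-inverse; the table contents and the function name are hypothetical, and the dense matrix is for illustration only.

```python
import numpy as np

def sparse_boolean_memory(n, n_out, table):
    """Permutation matrix of Select(B) for B given by its nonzero entries.

    table maps an n-bit input q to a nonzero n_out-bit output B(q);
    every input absent from the table has B(q) = 0.
    """
    dim = 2**(n + n_out)
    U = np.zeros((dim, dim))
    for q in range(2**n):
        b = table.get(q, 0)
        for z in range(2**n_out):
            src = q * 2**n_out + z          # basis state |q>|z>
            dst = q * 2**n_out + (z ^ b)    # basis state |q>|z XOR B(q)>
            U[dst, src] = 1.0
    return U

# n = 3 index bits, 2 output bits, s = 2 inputs with nonzero output.
U = sparse_boolean_memory(3, 2, {1: 0b10, 6: 0b11})
```

Only the s blocks listed in the table differ from the identity, which is the structure that the construction of Theorem 5 exploits.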

Construction of SAIM

With all necessary tools ready, we now discuss the construction of the SAIM in Eq. (1). We have the following result.

Theorem 6

Given n_anc ancillary qubits where Ω(n) ⩽ n_anc ⩽ O(Nnds), O_H can be constructed with O(Nnds) count and $O\left(Nnds\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates.

Given n_anc ancillary qubits where $\Omega (n)\leqslant {n}_{{{{\rm{anc}}}}}\leqslant O(Nns\log s),{O}_{F}$ can be constructed with $O(Nns\log s)$ count and $O\left(Nns\log s\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates.

Proof

O_H corresponds to a 2n-index, d-word and Ns-sparse Boolean function. So the construction of O_H follows directly from Theorem 5.

The construction of O_F can be realized in three steps. We introduce an n-qubit ancillary register (denoted as anc). In the first step, we perform the following transformation

$$\begin{array}{r}{\vert x,k\rangle }_{{{{\rm{idx}}}}}{\vert 0\rangle }_{{{{\rm{anc}}}}}\to {\vert x,k\rangle }_{{{{\rm{idx}}}}}{\vert F(x,k)\rangle }_{{{{\rm{anc}}}}}.\end{array}$$

(8)

According to Theorem 4, this step can be constructed with O(2ⁿns) count and $O({2}^{n}ns\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}})$ depth with n_anc ancillary qubits. In the second step, we apply swap gates between the ancillary register and half of the index register which encodes k, i.e.

$$\begin{array}{r}{\left\vert x,k\right\rangle }_{{{{\rm{idx}}}}}{\left\vert F(x,k)\right\rangle }_{{{{\rm{anc}}}}}\to {\left\vert x,F(x,k)\right\rangle }_{{{{\rm{idx}}}}}{\left\vert k\right\rangle }_{{{{\rm{anc}}}}}\end{array}$$

(9)

This step can be realized with O(n) count and O(1) depth of Clifford + T gates. In the final step, we perform the transformation

$$\begin{array}{r}{\vert x,F(x,k)\rangle }_{{{{\rm{idx}}}}}{\vert k\rangle }_{{{{\rm{anc}}}}}\to {\vert x,F(x,k)\rangle }_{{{{\rm{idx}}}}}{\vert 0\rangle }_{{{{\rm{anc}}}}}\end{array}$$

(10)

which can be realized by a 2n-index, $\lceil {\log }_{2}s\rceil$-word and Ns-sparse Boolean memory. According to Theorem 5, this step can be constructed with $O(Nns\log s)$ count and $O\left(Nns\log s\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates. The total gate complexity is therefore the combination of the three steps above. □

Compared to the circuit complexity lower bound obtained in Theorem 1, our protocol has nearly optimal circuit complexities with respect to the matrix dimension, up to a factor of n. As mentioned before, SAIM is a standard access model in many quantum algorithms, and the query complexity to SAIM has been studied extensively for various tasks. With Theorem 6, one can directly obtain the corresponding circuit complexity of those algorithms. Further discussions are provided in the Discussion section.
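The three steps of Eqs. (8)-(10) can be traced on computational basis states. The sketch below is a classical walk-through under assumed data: the lookup table, the 0-based convention for k, and all function names are hypothetical, and the inverse table plays the role of the Boolean memory used for uncomputation.

```python
# Basis-state walk-through of Eqs. (8)-(10) for a toy 4 x 4 matrix with s = 2.

F_TABLE = {            # F_TABLE[x][k] = column of the k-th nonzero in row x
    0: [0, 1],
    1: [0, 1],
    2: [2, 3],
    3: [2, 3],
}
# Inverse lookup used for uncomputation: (x, F(x, k)) -> k.
K_TABLE = {(x, y): k for x, cols in F_TABLE.items() for k, y in enumerate(cols)}

def step1_compute(x, k, anc):
    """Eq. (8): XOR F(x, k) into the ancillary register."""
    return x, k, anc ^ F_TABLE[x][k]

def step2_swap(x, k, anc):
    """Eq. (9): swap the k half of the index register with the ancilla."""
    return x, anc, k

def step3_uncompute(x, y, anc):
    """Eq. (10): XOR k = K(x, y) into the ancilla, resetting it to 0."""
    return x, y, anc ^ K_TABLE[(x, y)]

def O_F(x, k):
    """Compose the three steps, starting from a zeroed ancilla."""
    x, y, anc = step3_uncompute(*step2_swap(*step1_compute(x, k, 0)))
    assert anc == 0, "ancilla must be returned to |0>"
    return x, y
```

The final step works because, for fixed x, the map k ↦ F(x, k) is injective, so k can be recomputed from (x, F(x, k)) and erased from the ancillary register.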

Construction of LCU-based block-encoding

The construction of LCU-based block-encoding can be realized with quantum state preparation and a select oracle¹³. We define α = [α₀, ⋯ , α_{P−1}] and $\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle =\mathop{\sum }\nolimits_{p = 0}^{P-1}\sqrt{{\alpha }_{p}}\left\vert p\right\rangle$. Let ${G}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }$ be the state preparation unitary for $\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle$, and we define ${\mathbb{G}}\equiv {G}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }\otimes {{\mathbb{I}}}_{{2}^{n}}$. We then define a Select oracle corresponding to Eq. (3) as $\,{{\mbox{Select}}}\,({u}_{p})=\mathop{\sum }\nolimits_{p = 0}^{P-1}\vert p\rangle \langle p\vert \otimes {u}_{p}$. It can be verified that ${{\mathbb{G}}}^{{\dagger} }\,{{\mbox{Select}}}\,({u}_{p}){\mathbb{G}}$ is a block-encoding of H with normalization factor α = 1 (see ref. ¹⁴). The construction of LCU-based block-encoding is then reduced to quantum state preparation and Select(u_p), both of which can be constructed with polynomial-size quantum circuits.
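The identity that 𝔾† Select(u_p) 𝔾 block-encodes H can be verified numerically for a small Pauli LCU. In the sketch below the state preparation unitary is realized as a dense Householder reflection, which is an assumption made for convenience and not the state preparation protocol of this work; the coefficients, the Pauli terms and the function names are likewise hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def prepare_unitary(alpha):
    """Householder reflection G with G|0> = sum_p sqrt(alpha_p)|p>.

    Assumes alpha_p >= 0 and sum_p alpha_p = 1.
    """
    a = np.sqrt(np.asarray(alpha, dtype=float))
    v = np.zeros_like(a)
    v[0] = 1.0
    v = v - a
    if np.linalg.norm(v) < 1e-12:      # target already equals |0>
        return np.eye(len(a), dtype=complex)
    return (np.eye(len(a)) - 2.0 * np.outer(v, v) / (v @ v)).astype(complex)

def lcu_block_encoding(alpha, unitaries):
    """Return G^dagger Select G on (index register) (x) (data register)."""
    P, dim = len(alpha), unitaries[0].shape[0]
    select = np.zeros((P * dim, P * dim), dtype=complex)
    for p, u in enumerate(unitaries):
        select[p * dim:(p + 1) * dim, p * dim:(p + 1) * dim] = u
    G = np.kron(prepare_unitary(alpha), np.eye(dim))
    return G.conj().T @ select @ G

alpha = [0.5, 0.25, 0.125, 0.125]               # sums to 1, P = 4
terms = [np.kron(X, X), np.kron(Z, I2), np.kron(I2, Z), np.kron(Z, Z)]
U = lcu_block_encoding(alpha, terms)
H = sum(a * u for a, u in zip(alpha, terms))
```

With the index register as the leading register, the top-left 4 × 4 block of `U` equals `H`, which is Eq. (2) with α = 1 and ε = 0 up to rounding.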

The exact circuit complexity of block-encoding depends on the specific form of u_p. We take the LCU for Pauli strings (Eq. (4)) as an example. Based on our improved quantum state preparation (Theorem 3) and Select oracle for Pauli strings (Theorem 4), we have the following result, where (n_anc, ε)-block-encoding is the abbreviation of (1, n_anc, ε)-block-encoding (see Methods section for proof).

Theorem 7

With n_anc ancillary qubits where $\Omega ({\log }_{2}P)\,\leqslant\, {n}_{{{{\rm{anc}}}}}\,\leqslant \,O(NP)$, the (n_anc, ε)-block-encoding of H defined in Eq. (4) can be constructed with $O\left(P(n+\log (1/\varepsilon ))\right)$ count and $\tilde{O}\left(Pn\log (1/\varepsilon )\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates, where $\tilde{O}$ suppresses the doubly logarithmic factors of n_anc.

The block-encoding of LCU can be constructed with polylogarithmic circuit complexity with respect to the data dimension, as opposed to the SAIM, which requires polynomial gate count. Therefore, for structured classical data in the form of Eq. (3), exponential quantum advantage can be expected. Below, we further discuss our results.

Discussion

As demonstrated in Theorem 1, a general SAIM cannot be implemented with an O(Poly(n))-size quantum circuit. In the language of complexity classes, this implies that BQP^SAIM ≠ BQP, where SAIM represents the quantum oracles in the form of Eq. (1). In other words, if problem A can be solved with a polynomial number of queries to the SAIM, A is not necessarily solvable with polynomial-size quantum circuits. In fact, it is reasonable to conjecture that BQP^SAIM ≠ PSPACE when considering the scaling with n. The reason is that for a general matrix of dimension 2ⁿ, storing all its elements requires exponentially large space, and this is true even for sparse matrices. The same argument applies to the block-encoding of sparse matrices as well.

This argument is consistent with the results about classical dequantization algorithms^39,40, which demonstrate that sub-linear classical runtime can be achieved for tasks such as recommendation systems and solving linear systems. Note that these algorithms assume a classical oracle similar to SAIM.

On the other hand, our study of sparse matrix encoding is still of great value. First of all, it is rare to have structured classical data that can be encoded with logarithmic complexity. In many cases, a sparse matrix is the most compact representation of the classical data of interest. Second, with SAIM or block-encoding, polynomial quantum speedup with respect to the matrix dimension N is still possible. Our constructions are nearly optimal, and can be used to estimate the concrete Clifford + T complexities of many quantum algorithms of practical interest. Finally, techniques developed here may serve as a subroutine for encoding a larger matrix with special structures, with which exponential quantum advantage may be possible.

An open question is how to determine whether a given matrix is efficiently block-encodable. This problem can be considered a generalization of the unitary complexity problem^41,42,43,44, which is important due to the broad applications of block-encoding¹⁵. According to Theorem 7, LCU for efficient unitaries [Eq. (3)] is a sufficient condition for efficient block-encoding. Due to the generality and simplicity of LCU, it is reasonable to conjecture that the decomposition of a matrix in the form of Eq. (3) is closely related to the efficiency of its block-encoding. The block-encoding of H is challenging when it cannot be well approximated by Eq. (3) with P = O(Poly(n)).

In conclusion, we have studied the circuit complexities of typical quantum access models, such as SAIM and block-encoding. We show that the circuit complexity lower bound for encoding sparse matrices is polynomial with respect to the matrix dimension. We provide nearly optimal construction protocols that approach the lower bound. For LCU-based block-encoding, we develop a construction protocol based on the improved implementation of quantum state preparation and select oracle for Pauli strings. Our protocols are based on Clifford + T gates and allow tunable ancillary qubit number. We expect that our results are useful for processing classical data with quantum devices^45,46,47. Future work may include the study of the circuit complexity lower bound for block-encoding, and how to further improve our protocols to achieve the lower bounds. Another interesting topic is the power of quantum circuits with global quantum channels, for example, when the feedback controls depend on the outcomes of many measurements. In this case, the elementary operations may no longer be described by local operations, and the computation power of the circuit is expected to be enhanced. In the direction of applications, it is interesting to find practical classical problems whose data structures can be represented in the LCU form. In those scenarios, exponential quantum advantage can be expected.

Methods

Quantum state preparation

We first consider the preparation with n ancillary qubits. There are some state preparation protocols with optimal single- and two-qubit gate count, such as ref. ²⁸. However, with direct Clifford + T decomposition, the gate complexity becomes suboptimal. We achieve gate count and circuit depth linear in the state dimension (up to a logarithmic factor in the precision) with an optimized Clifford + T decomposition. The result is as follows.

Lemma 3

With n ancillary qubits, an arbitrary quantum state can be prepared to precision ε with $O(N\log (1/\varepsilon ))$ depth and $O(N\log (1/\varepsilon ))$ count of Clifford + T gates.

Proof

According to²⁸, with single- and two-qubit gates, an arbitrary quantum state $\vert {\psi }_{{{{\rm{targ}}}}}\rangle$ can be expressed as

$$\begin{array}{r}\vert {\psi }_{{{{\rm{targ}}}}}\rangle =\left(\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{y}\right)\left(\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{z}\right)\left\vert 0\cdots 0\right\rangle ,\end{array}$$

(11)

where ${F}_{j}^{z}$ and ${F}_{j}^{y}$ are uniformly controlled Z- and Y-rotations

$${F}_{j}^{z}=\mathop{\sum }\limits_{k=0}^{{2}^{j-1}-1}\left\vert k\right\rangle \left\langle k\right\vert \otimes {R}_{z}({\alpha }_{j,k}^{z})\otimes {{\mathbb{I}}}_{{2}^{n-j}},$$

(12a)

$${F}_{j}^{y}=\mathop{\sum }\limits_{k=0}^{{2}^{j-1}-1}\vert k\rangle\langle k\vert \otimes {R}_{y}({\alpha }_{j,k}^{y})\otimes {{\mathbb{I}}}_{{2}^{n-j}},$$

(12b)

with single qubit rotation gates ${R}_{y}(\theta )={e}^{-i\theta {\sigma }_{y}/2},{R}_{z}(\theta )={e}^{-i\theta {\sigma }_{z}/2}$. Here ${\alpha }_{j,k}^{y}\in {\mathbb{R}}$ and ${\alpha }_{j,k}^{z}\in {\mathbb{R}}$ are some rotation angles, the exact values of which are not important for our analysis.

Single-qubit rotations can be approximated with Clifford + T gates. According to ref. ⁴⁸, unitary ${\widetilde{u}}_{z}$ satisfying $\Vert {\widetilde{u}}_{z}-{R}_{z}({\alpha }_{j,k}^{z}/2)\Vert\, \leqslant \,{\varepsilon }_{j}/2$ can be constructed with $O(\log (1/{\varepsilon }_{j}))$ single-qubit Clifford + T gates without ancilla. Accordingly, we can implement single-qubit-controlled-${R}_{z}({\alpha }_{j,k}^{z},{\varepsilon }_{j})$, such that

$$\begin{array}{r}\left\Vert {R}_{z}\left({\alpha }_{j,k}^{z},{\varepsilon }_{j}\right)-{R}_{z}\left({\alpha }_{j,k}^{z}\right)\right\Vert \,\leqslant \,{\varepsilon }_{j}\end{array}$$

(13)

with the following circuit.

Note that ${\widetilde{u}}_{z}^{{\dagger} }$ can be realized by reversing the Clifford + T gate sequence of ${\widetilde{u}}_{z}$ and replacing each gate with its inverse. A similar argument applies to ${R}_{y}({\alpha }_{j,k}^{y})$. Then, according to Lemma 6, which will be introduced in the next section, one can construct the following unitaries
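The circuit referred to above is not reproduced in this text. As a minimal sketch, assuming it is the standard two-CNOT construction of a controlled rotation (an assumption, not confirmed by the source), the following pure-Python check verifies that applying CNOT, $u^{\dagger}$, CNOT, $u$ on the target qubit, with $u={R}_{z}(\alpha /2)$, yields controlled-${R}_{z}(\alpha )$. All helper names are illustrative. Since $u$ and $u^{\dagger}$ each appear once, replacing them by approximations of error ${\varepsilon }_{j}/2$ gives a total error of at most ${\varepsilon }_{j}$, consistent with Eq. (13).

```python
import cmath

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rz(theta):
    """R_z(theta) = exp(-i * theta * sigma_z / 2), as defined in the text."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

def on_target(u):
    """I (x) u in the basis |control, target>."""
    return [[u[0][0], u[0][1], 0, 0],
            [u[1][0], u[1][1], 0, 0],
            [0, 0, u[0][0], u[0][1]],
            [0, 0, u[1][0], u[1][1]]]

# CNOT with the first qubit as control and the second as target.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def controlled_rz_two_cnot(alpha):
    """Assumed circuit, in time order: CNOT, u^dagger, CNOT, u."""
    u, u_dag = rz(alpha / 2), rz(-alpha / 2)
    circuit = CNOT
    for gate in (on_target(u_dag), CNOT, on_target(u)):
        circuit = matmul(gate, circuit)  # later gates multiply from the left
    return circuit

def controlled_rz_exact(alpha):
    """Reference: |0><0| (x) I + |1><1| (x) R_z(alpha)."""
    r = rz(alpha)
    return [[1, 0, 0, 0], [0, 1, 0, 0],
            [0, 0, r[0][0], 0], [0, 0, 0, r[1][1]]]

def max_abs_diff(a, b):
    """Largest entrywise deviation between two 4x4 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))

print(max_abs_diff(controlled_rz_two_cnot(0.7), controlled_rz_exact(0.7)) < 1e-12)
```

The same check goes through for ${R}_{y}$, because $X{R}_{y}(\theta )X={R}_{y}(-\theta )$ as well.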

$${\widetilde{F}}_{j}^{y}=\mathop{\sum }\limits_{k=0}^{{2}^{j-1}-1}\left\vert k\right\rangle \left\langle k\right\vert \otimes {\widetilde{R}}_{y}\left({\alpha }_{j,k}^{y},{\varepsilon }_{j}\right)\otimes {{\mathbb{I}}}_{{2}^{n-j}}$$

(14)

$${\widetilde{F}}_{j}^{z}=\mathop{\sum }\limits_{k=0}^{{2}^{j-1}-1}\left\vert k\right\rangle \left\langle k\right\vert \otimes {\widetilde{R}}_{z}\left({\alpha }_{j,k}^{z},{\varepsilon }_{j}\right)\otimes {{\mathbb{I}}}_{{2}^{n-j}}$$

(15)

with j ancillary qubits, $O({2}^{j}\log (1/{\varepsilon }_{j}))$ depth and $O({2}^{j}\log (1/{\varepsilon }_{j}))$ count of Clifford + T gates. We therefore approximate the target state with the following

$$\begin{array}{r}\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle =\left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{z}\right)\left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{y}\right)\left\vert 0\cdots 0\right\rangle .\end{array}$$

(16)

Below, we first bound the distance between $\left\vert {\psi }_{{{{\rm{targ}}}}}\right\rangle$ and $\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle$. It can be verified that $\Vert {\widetilde{F}}_{j}^{y}\vert \psi \rangle -{F}_{j}^{y}\vert \psi \rangle \Vert \,\leqslant \,{\varepsilon }_{j}$ and $\Vert {\widetilde{F}}_{j}^{z}\vert \psi \rangle -{F}_{j}^{z}\vert \psi \rangle \Vert\, \leqslant \,{\varepsilon }_{j}$ for any quantum state $\left\vert \psi \right\rangle$. In other words, we have $\Vert {\widetilde{F}}_{j}^{y}-{F}_{j}^{y}\Vert\, \leqslant\, {\varepsilon }_{j}$ and $\Vert {\widetilde{F}}_{j}^{z}-{F}_{j}^{z}\Vert \,\leqslant \,{\varepsilon }_{j}$. Therefore,

$$\begin{array}{ll}\quad\left\Vert \left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{y}-\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{y}\right)\right\Vert \\ \leqslant \,\left\Vert {\widetilde{F}}_{n}^{y}\left(\mathop{\prod }\limits_{j=1}^{n-1}{\widetilde{F}}_{j}^{y}-\mathop{\prod }\limits_{j=1}^{n-1}{F}_{j}^{y}\right)\right\Vert +\left\Vert \left({\widetilde{F}}_{n}^{y}-{F}_{n}^{y}\right)\mathop{\prod }\limits_{j=1}^{n-1}{F}_{j}^{y}\right\Vert \\ \leqslant \left\Vert \left(\mathop{\prod }\limits_{j=1}^{n-1}{\widetilde{F}}_{j}^{y}-\mathop{\prod }\limits_{j=1}^{n-1}{F}_{j}^{y}\right)\right\Vert +{\varepsilon }_{n}\\ \cdots \\ \leqslant \mathop{\sum }\limits_{j=1}^{n}{\varepsilon }_{j}.\end{array}$$

(17)

In a similar way, we can obtain $\left\Vert \left(\mathop{\prod }\nolimits_{j = 1}^{n}{\widetilde{F}}_{j}^{z}-\mathop{\prod }\nolimits_{j = 1}^{n}{F}_{j}^{z}\right)\right\Vert\, \leqslant \,\mathop{\sum }\nolimits_{j = 1}^{n}{\varepsilon }_{j}$. So we have

$$\begin{array}{ll}\quad\Vert \vert {\psi }_{{{{\rm{targ}}}}}\rangle -\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle \Vert \\\leqslant\left\Vert \left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{y}\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{z}-\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{y}\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{z}\right)\right\Vert \\ \leqslant \left\Vert \left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{z}-\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{z}\right)\right\Vert +\left\Vert \left(\mathop{\prod }\limits_{j=1}^{n}{\widetilde{F}}_{j}^{y}-\mathop{\prod }\limits_{j=1}^{n}{F}_{j}^{y}\right)\right\Vert \\ \leqslant 2\mathop{\sum }\limits_{j=1}^{n}{\varepsilon }_{j}.\end{array}$$

(18)

According to Eq. (18), to control the total error, i.e. $\parallel \vert {\psi }_{{{{\rm{targ}}}}}\rangle -\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle \parallel \,\leqslant \,\varepsilon$, it suffices to set ${\varepsilon }_{j}=\varepsilon /{2}^{n-j+1}$. Because each ${\widetilde{F}}_{j}^{y}$ or ${\widetilde{F}}_{j}^{z}$ requires $O({2}^{j}\log (1/{\varepsilon }_{j}))$ gate count and circuit depth, the total gate count is

$$\begin{array}{ll}\quad{C}\,=\,\mathop{\sum }\limits_{j=0}^{n-1}O({2}^{j}\log (1/{\varepsilon }_{j}))\\ \qquad=\,\mathop{\sum }\limits_{j=0}^{n-1}O({2}^{j}\log ({2}^{n-j}/\varepsilon ))\\ \qquad=\,O(N\log (1/\varepsilon )).\end{array}$$

(19)

Similarly, the total circuit depth is

$$\begin{array}{r}D=\mathop{\sum }\limits_{j=0}^{n-1}O({2}^{j}\log (1/{\varepsilon }_{j}))=O(N\log (1/\varepsilon )).\end{array}$$

(20)

□
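To make the counting in Eqs. (19) and (20) concrete, the sketch below evaluates the per-layer precisions ${\varepsilon }_{j}=\varepsilon /{2}^{n-j+1}$ and the resulting sum numerically. It is an illustration only: the hidden constant in the $O(\cdot )$ is set to 1, logarithms are taken base 2, and the function names are hypothetical. With this schedule $\sum_{j}{\varepsilon }_{j}<\varepsilon$, so the bound of Eq. (18) evaluates to less than $2\varepsilon$; halving every ${\varepsilon }_{j}$ removes the factor of 2 without changing the scaling.

```python
import math

def lemma3_precisions(n, eps):
    """eps_j = eps / 2**(n - j + 1) for j = 1, ..., n."""
    return [eps / 2 ** (n - j + 1) for j in range(1, n + 1)]

def lemma3_count_model(n, eps):
    """sum_j 2**j * log2(1 / eps_j), with the hidden O() constant set to 1."""
    return sum(2 ** j * math.log2(1 / e)
               for j, e in zip(range(1, n + 1), lemma3_precisions(n, eps)))

def lemma3_count_ratio(n, eps):
    """Model count divided by N * log2(1 / eps), with N = 2**n."""
    return lemma3_count_model(n, eps) / (2 ** n * math.log2(1 / eps))

# The ratio stays bounded as n grows, which is the content of Eq. (19).
for n in (4, 8, 12, 16):
    print(n, round(lemma3_count_ratio(n, 1e-3), 3))
```

The ratio remains bounded because $\sum_{j}{2}^{j}(n-j+1)=O({2}^{n})$: the layers with the tightest precision requirement are the ones with the fewest rotations.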

We then consider quantum state preparation with exponentially many ancillary qubits. Our protocol follows the same idea as in ref. ³⁴, with improvements.

Lemma 4

An arbitrary n-qubit quantum state can be prepared to precision ε with O(N) ancillary qubits, $O(n\log (n/\varepsilon ))$ depth and $O(N\log (1/\varepsilon ))$ count of Clifford + T gates.

Proof

Our construction is based on the protocol in³⁴ with revision and improved Clifford+T decomposition.

General procedure. The hardware layout of our method contains a binary tree of qubits with n + 1 layers, which is denoted as H. The lth (with 0 ⩽ l ⩽ n) layer of H is denoted as H_l. For 1 ⩽ l ⩽ n, H_l connects to another binary tree of qubits, denoted as V_l. The root of the tree V_l serves as the lth data qubit, and we denote it as d_l here.

Our protocol for preparing target state $\vert {\psi }_{{{{\rm{targ}}}}}\rangle =\mathop{\sum }\nolimits_{k = 0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert k\right\rangle }_{{{{\rm{d}}}}}$ works as follows. We initialize the root of H as ${\left\vert 1\right\rangle }_{{H}_{1}}$ while all other qubits are at state $\left\vert 0\right\rangle$. In the first stage, H is prepared at the quantum state (qubits at state $\left\vert 0\right\rangle$ are not shown)

$$\begin{array}{r}{\left\vert 1\right\rangle }_{{H}_{1}}\to \mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}.\end{array}$$

(21)

Here, $\left\vert {\varphi }_{k}\right\rangle$ is a computational basis state of H, to be defined later. In the second stage, the data qubits are set to the n-qubit computational basis state ${\left\vert k\right\rangle }_{{{{\rm{d}}}}}$ conditioned on $\left\vert {\varphi }_{k}\right\rangle$, i.e.

$$\begin{array}{r}\mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}\to \mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}{\left\vert k\right\rangle }_{{{{\rm{d}}}}}.\end{array}$$

(22)

Finally, the binary tree H is uncomputed

$$\begin{array}{r}\mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}{\left\vert k\right\rangle }_{{{{\rm{d}}}}}\to \mathop{\sum }\limits_{k=0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert 0\right\rangle }_{H}{\left\vert k\right\rangle }_{{{{\rm{d}}}}}.\end{array}$$

(23)

The target state is then obtained after tracing out H. The readers are referred to ref. ³⁴ for more details. The transformations in Eq. (22) and Eq. (23) can be realized exactly using Clifford circuits with O(n) depth and O(2ⁿ) gate count. On the other hand, the first stage for obtaining Eq. (21) contains rotations that have to be approximated with T gates, and is hence more complicated. So we focus on Eq. (21) below.

Realization of Eq. (21). We will first show how Eq. (21) can be realized with single-qubit and CNOT gates with a method slightly different from³⁴, and then introduce its Clifford + T decomposition.

We define ${\alpha }_{n,k}\equiv {\alpha }_{k}$ and ${\alpha }_{L,k}=\,{{\mbox{arg}}}\,({\alpha }_{L+1,2k})\sqrt{| {\alpha }_{L+1,2k}{| }^{2}+| {\alpha }_{L+1,2k+1}{| }^{2}}$ for all 0 ⩽ L ⩽ n − 1. Note that we can assume arg(α₀) = 0 without loss of generality. For 0 ⩽ L ⩽ n, we define

$$\left\vert {\Psi }_{L}\right\rangle =\mathop{\sum }\limits_{k=0}^{{2}^{L}-1}{\alpha }_{L,k}\bigotimes\limits_{l = 0}^{L}{\left\vert (k,l)\right\rangle }_{{H}_{l}}^{{\prime} }.$$

(24)

The realization of Eq. (21) contains n steps, with the Lth step corresponding to $\left\vert {\Psi }_{L-1}\right\rangle \to \left\vert {\Psi }_{L}\right\rangle$.

In Eq. (24), we have defined (0, 0) ≡ 0, and $(k,l)\equiv {k}_{n}{k}_{n-1}\cdots {k}_{n-l+1}$ for l ⩾ 1; ${\vert (k,l)\rangle }^{{\prime} }\equiv {\vert 0\rangle }^{\otimes (k,l)}\vert 1\rangle {\vert 0\rangle }^{\otimes {2}^{l}-(k,l)-1}$; H_l represents the lth layer of H. Eq. (21) and Eq. (24) have the correspondence $\vert {\varphi }_{k}\rangle =\bigotimes_{l = 0}^{n}{\vert (k,l)\rangle }_{{H}_{l}}^{{\prime} }$ and $\mathop{\sum }\nolimits_{k = 0}^{{2}^{n}-1}{\alpha }_{k}{\left\vert {\varphi }_{k}\right\rangle }_{H}=\left\vert {\Psi }_{n}\right\rangle$. So $\left\vert {\Psi }_{n}\right\rangle$ is the target state of stage 1 introduced in Eq. (21).
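The one-hot encoding $\vert {\varphi }_{k}\rangle$ can be illustrated classically. The sketch below assumes that ${k}_{n}$ is the most significant bit of k, so that (k, l) is the integer formed by the l leading bits of k; this reading and the function names are assumptions made for illustration. Layer l of the tree H holds ${2}^{l}$ qubits, exactly one of which is in state $\vert 1\rangle$.

```python
def prefix(k, l, n):
    """(k, l): integer value of the l leading bits of the n-bit index k."""
    return k >> (n - l)

def one_hot_layer(k, l, n):
    """Bit pattern of |(k, l)>' on the 2**l qubits of layer H_l."""
    bits = [0] * (2 ** l)
    bits[prefix(k, l, n)] = 1
    return bits

def phi(k, n):
    """Layer-by-layer bit patterns of |phi_k> for l = 0, ..., n."""
    return [one_hot_layer(k, l, n) for l in range(n + 1)]

# Example: k = 5 = 101 in binary, n = 3.
for layer in phi(5, 3):
    print(layer)
```

The excited qubit in layer l + 1 is always one of the two children of the excited qubit in layer l, which is why each step only needs rotations between a parent and its two children.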

We then introduce the realization of $\left\vert {\Psi }_{L-1}\right\rangle \to \left\vert {\Psi }_{L}\right\rangle$. We define single qubit rotation ${r}_{y}(\theta )=\left(\begin{array}{rc}\cos \theta &\sin \theta \\ -\sin \theta &\cos \theta \end{array}\right)$ and ${r}_{z}(\phi )=\left(\begin{array}{rc}{e}^{-i\phi }&0\\ 0&{e}^{i\phi }\end{array}\right)$, and a three-qubit controlled operation as follows.

a, b, c are labels of the corresponding qubits. Let ${\theta }_{l,j}\equiv \arccos ({b}_{l,2j}/{b}_{l-1,j})$ and ${\phi }_{l,j}={\phi }_{l,2j+1}-{\phi }_{l,2j}$. At the Lth step (1 ⩽ L ⩽ n), we implement the parallel rotation

$$\begin{array}{r}{W}_{L}=\mathop{\prod }\limits_{j=0}^{{2}^{L-1}-1}w({\theta }_{L,j};{\phi }_{L,j};{H}_{L-1,j};{H}_{L,2j};{H}_{L,2j+1})\end{array}$$

(25)

which costs O(1) depth and O(2^L) count of single-qubit and CNOT gates. It can be verified that

$$\begin{array}{r}{W}_{L}\left\vert {\Psi }_{L-1}\right\rangle =\left\vert {\Psi }_{L}\right\rangle .\end{array}$$

(26)

The total single-qubit + CNOT depth and gate count are O(n) and O(2ⁿ) respectively.
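The recursion defining ${\alpha }_{L,k}$ and the angles ${\theta }_{l,j}$ can be sketched as a classical preprocessing routine. Two readings are assumed here and are not confirmed by the source: "arg" is taken to denote the phase factor ${e}^{i\arg (\cdot )}$ of the left child, and ${b}_{l,j}$ is taken to be $| {\alpha }_{l,j}|$. The function names are hypothetical.

```python
import cmath
import math

def amplitude_tree(amps):
    """Return levels[L][k] = alpha_{L,k} for L = 0..n from the 2**n amplitudes."""
    levels = [list(amps)]
    while len(levels[0]) > 1:
        child = levels[0]
        parent = []
        for k in range(len(child) // 2):
            left, right = child[2 * k], child[2 * k + 1]
            norm = math.sqrt(abs(left) ** 2 + abs(right) ** 2)
            # Assumption: "arg" in the text is the phase factor of the left child.
            phase = cmath.exp(1j * cmath.phase(left)) if left != 0 else 1.0
            parent.append(phase * norm)
        levels.insert(0, parent)
    return levels

def rotation_angles(levels):
    """theta_{L,j} = arccos(b_{L,2j} / b_{L-1,j}), assuming b = |alpha|."""
    thetas = {}
    for L in range(1, len(levels)):
        for j, parent in enumerate(levels[L - 1]):
            b_parent, b_left = abs(parent), abs(levels[L][2 * j])
            # Clamp to 1.0 to guard against rounding just above 1.
            ratio = min(1.0, b_left / b_parent) if b_parent > 0 else 1.0
            thetas[(L, j)] = math.acos(ratio)
    return thetas

levels = amplitude_tree([0.5, 0.5j, -0.5, 0.5])
print(rotation_angles(levels))
```

Each level of the tree requires one pass over the amplitudes, so the classical preprocessing costs O(N) arithmetic operations in this sketch.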

Clifford + T decomposition. W_L are assumed to be constructed with single- and two-qubit gates. Below, we discuss how to decompose them into Clifford + T gates with high accuracy. According to ref. ⁴⁸, one can always construct unitaries ${\tilde{r}}_{y}(\theta ;\varepsilon ),{\tilde{r}}_{z}(\phi ;\varepsilon )$, with $O(\log (1/\varepsilon ))$ depth of gates in {H, S, T}, which satisfy

$$\begin{array}{r}\parallel {\tilde{r}}_{y}(\theta ;\varepsilon )-{r}_{y}(\theta )\parallel \,\leqslant \,\varepsilon ,\quad \parallel {\tilde{r}}_{z}(\phi ;\varepsilon )-{r}_{z}(\phi )\parallel\, \leqslant \,\varepsilon .\end{array}$$

(27)

Accordingly, we define $\widetilde{w}(\theta ;\phi ;\varepsilon ;a;b;c)$ as the following transformation

We have

$$\begin{array}{ll}\quad\,\,\widetilde{w}(\theta ;\phi ;\varepsilon ;a;b;c)(a{\left\vert 0\right\rangle }_{a}{\left\vert 0\right\rangle }_{b}{\left\vert 0\right\rangle }_{c}+b{\left\vert 1\right\rangle }_{a}{\left\vert 0\right\rangle }_{b}{\left\vert 0\right\rangle }_{c})\\ \,=\,a{\left\vert 0\right\rangle }_{a}{\left\vert 0\right\rangle }_{b}{\left\vert 0\right\rangle }_{c}+{b}_{1}(\varepsilon )\left\vert 1\right\rangle \left\vert 10\right\rangle +{b}_{2}(\varepsilon )\left\vert 01\right\rangle ,\end{array}$$

(28)

for some $\sqrt{| {b}_{1}(\varepsilon )-{b}_{1}(0){| }^{2}+| {b}_{2}(\varepsilon )-{b}_{2}(0){| }^{2}}\,\leqslant \,| b| \varepsilon$. We then define

$$\begin{array}{r}{\widetilde{W}}_{L}(\varepsilon )=\mathop{\prod }\limits_{j=0}^{{2}^{L-1}-1}\widetilde{w}({\theta }_{L,j};{\phi }_{L,j};\varepsilon ;{H}_{L-1,j};{H}_{L,2j};{H}_{L,2j+1}),\end{array}$$

(29)

which is used to approximate W_L. From Eq. (28), it can be verified that $\left\Vert {\widetilde{W}}_{L}(\varepsilon )\left\vert {\Psi }_{L-1}\right\rangle -{W}_{L}\left\vert {\Psi }_{L-1}\right\rangle \right\Vert\, \leqslant \,\varepsilon$. We set the accuracy at the Lth layer as ε_L, and define

$$\begin{array}{r}\vert {\widetilde{\Psi }}_{0}\rangle =\vert {\Psi }_{0}\rangle ,\quad \vert {\widetilde{\Psi }}_{L}\rangle ={\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\widetilde{\Psi }}_{L-1}\rangle.\end{array}$$

(30)

We have

$$\begin{array}{ll}\quad\Vert \vert {\widetilde{\Psi }}_{L}\rangle -\vert {\Psi }_{L}\rangle \Vert \\ =\Vert {\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\widetilde{\Psi }}_{L-1}\rangle -{W}_{L}\vert {\Psi }_{L-1}\rangle \Vert \\ \leqslant \Vert {\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\widetilde{\Psi }}_{L-1}\rangle -{\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\Psi }_{L-1}\rangle \Vert \\ \quad+\,\Vert {\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\Psi }_{L-1}\rangle -{W}_{L}\vert {\Psi }_{L-1}\rangle \Vert \\ \leqslant \Vert {\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\widetilde{\Psi }}_{L-1}\rangle -{\widetilde{W}}_{L}({\varepsilon }_{L})\vert {\Psi }_{L-1}\rangle \Vert +{\varepsilon }_{L}\\ =\Vert \vert {\widetilde{\Psi }}_{L-1}\rangle -\vert {\Psi }_{L-1}\rangle \Vert +{\varepsilon }_{L}.\end{array}$$

(31)

By applying the inequality above iteratively from L = 1 to L = n, we have

$$\begin{array}{r}\Vert \vert {\widetilde{\Psi }}_{n}\rangle -\vert {\Psi }_{n}\rangle \Vert\, \leqslant \,\mathop{\sum }\limits_{L=1}^{n}{\varepsilon }_{L}.\end{array}$$

(32)

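According to Eq. (32), to bound the total error by ε, it suffices to set ${\varepsilon }_{L}=K\varepsilon /{(n-L+1)}^{2}$ for some constant K (any K ⩽ 6/π² works, since $\mathop{\sum }\nolimits_{m\geqslant 1}1/{m}^{2}={\pi }^{2}/6$). This is the key step of our improved construction.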

Circuit complexity. Each ${\widetilde{W}}_{L}$ can be realized with $O({2}^{L}\log (1/{\varepsilon }_{L}))$ count and $O(\log (1/{\varepsilon }_{L}))$ depth of Clifford + T gates. Therefore, the total gate count at stage 1 (Eq. (21)) is

$$\begin{array}{ll}C\,=\,O\left(\mathop{\sum }\limits_{L=1}^{n}{2}^{L}\log (1/{\varepsilon }_{L})\right)\\ \quad\,=\,O\left(\mathop{\sum }\limits_{L=1}^{n}{2}^{L}\log ({(n-L+1)}^{2}/\varepsilon )\right)\\ \quad\,=\,O\left({2}^{n+1}\mathop{\sum }\limits_{m=1}^{n}\frac{\log (m)}{{2}^{m}}\right)+O\left({2}^{n}\log (1/\varepsilon )\right)\\ \quad\,=\,O\left({2}^{n}\right)+O\left({2}^{n}\log (1/\varepsilon )\right)\\ \quad\,=\,O\left(N\log (1/\varepsilon )\right).\end{array}$$

(33)

The total circuit depth at stage 1 is

$$\begin{array}{ll}D\,=\,O\left(\mathop{\sum }\limits_{L=1}^{n}\log (1/{\varepsilon }_{L})\right)\\ \quad\,=\,O\left(\mathop{\sum }\limits_{L=1}^{n}\log ({(n-L+1)}^{2}/\varepsilon )\right)\\ \quad\,=\,O\left(\mathop{\sum }\limits_{m=1}^{n}\log ({m}^{2})\right)+O\left(n\log (1/\varepsilon )\right)\\ \quad\,=\,O\left(\log (n!)\right)+O\left(n\log (1/\varepsilon )\right)\\ \quad\,=\,O\left(n\log (n/\varepsilon )\right).\end{array}$$

(34)

Recall that Eqs. (22) and (23) have O(N) count and O(n) depth of Clifford + T gates. So the total gate count and circuit depth are $O\left(N\log (1/\varepsilon )\right)$ and $O\left(n\log (n/\varepsilon )\right)$ respectively. □
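The effect of the inverse-square precision schedule can be checked numerically. The sketch below is a cost model only: it uses ${\varepsilon }_{L}=K\varepsilon /{(n-L+1)}^{2}$ with the illustrative choice K = 6/π² (so that the precisions sum to less than ε), sets the hidden $O(\cdot )$ constants to 1, and uses base-2 logarithms. Function names are hypothetical.

```python
import math

def lemma4_precisions(n, eps, K=6 / math.pi ** 2):
    """eps_L = K * eps / (n - L + 1)**2 for L = 1, ..., n."""
    return [K * eps / (n - L + 1) ** 2 for L in range(1, n + 1)]

def stage1_cost_model(n, eps):
    """Model of Eqs. (33) and (34): returns (gate count, circuit depth)."""
    precisions = lemma4_precisions(n, eps)
    count = sum(2 ** L * math.log2(1 / e)
                for L, e in zip(range(1, n + 1), precisions))
    depth = sum(math.log2(1 / e) for e in precisions)
    return count, depth

for n in (4, 8, 16):
    count, depth = stage1_cost_model(n, 1e-3)
    print(n,
          round(count / (2 ** n * math.log2(1e3)), 3),   # vs N log(1/eps)
          round(depth / (n * math.log2(n / 1e-3)), 3))   # vs n log(n/eps)
```

The loosest precisions are assigned to the shallow layers, which contain few rotations, while the deepest layer with ${2}^{n-1}$ rotations is approximated to precision Kε. This keeps the count at $O(N\log (1/\varepsilon ))$ while the depth picks up only a $\log n$ term per layer.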

We are also interested in controlled quantum state preparation. In our preparation scheme, the initial state is ${\left\vert 1\right\rangle }_{{H}_{1}}$, i.e. the root of H is set as $\left\vert 1\right\rangle$. If we set H₁ as ${\left\vert 0\right\rangle }_{{H}_{1}}$ instead, it can be verified that the output state is ${\left\vert 0\cdots 0\right\rangle }_{{{{\rm{d}}}}}$. Therefore, to implement controlled state preparation, one can simply replace the root qubit H₁ by the control qubit, and the circuit complexity remains unchanged. In other words, we have the following result.

Lemma 5

An arbitrary single-qubit-controlled n-qubit state preparation unitary can be constructed to precision ε with O(N) ancillary qubits, $O(n\log (n/\varepsilon ))$ depth and $O(N\log (1/\varepsilon ))$ count of Clifford + T gates.

Based on Lemma 3, Lemma 4 and Lemma 5, we have the following result for an intermediate number of ancillary qubits. Note that Theorem 3 in the main text follows directly from Theorem 8.

Theorem 8

(space-time tradeoff QSP). With n_anc ancillary qubits where Ω(n) ⩽ n_anc ⩽ O(2ⁿ), state preparation and controlled state preparation of an arbitrary n-qubit quantum state can be realized with precision ε with $O(N\log (1/\varepsilon ))$ count and $O\left(N\frac{\log ({n}_{{{{\rm{anc}}}}})\log (\log ({n}_{{{{\rm{anc}}}}})/\varepsilon )}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates.

Proof

We separate all data qubits into two registers. Register A contains the last ${n}_{a}=n-\lfloor {\log }_{2}m\rfloor$ data qubits, and register B contains the first ${n}_{b}=\lfloor {\log }_{2}m\rfloor$ qubits for some n ⩽ m ⩽ 2ⁿ. We define ${N}_{a}={2}^{{n}_{a}}$ and ${N}_{b}={2}^{{n}_{b}}$. The target state can be rewritten as

$$\begin{array}{r}\vert {\psi }_{{{{\rm{targ}}}}}\rangle =\mathop{\sum }\limits_{k=0}^{{N}_{a}-1}{\beta }_{k}{\left\vert k\right\rangle }_{A}{\left\vert {\phi }_{k}\right\rangle }_{B}\end{array}$$

(35)

for some normalized β_k, and normalized quantum states $\left\vert {\phi }_{k}\right\rangle$. We define ${\left\vert {\psi }_{a}\right\rangle }_{A}=\mathop{\sum }\nolimits_{k = 0}^{{N}_{a}-1}{\beta }_{k}{\left\vert k\right\rangle }_{A}$.

In the first step, we prepare register A to a quantum state

$$\begin{array}{r}{\left\vert {\widetilde{\psi }}_{a}\right\rangle }_{A}=\mathop{\sum }\limits_{k=0}^{{N}_{a}-1}{\tilde{\beta }}_{k}{\left\vert k\right\rangle }_{A}\end{array}$$

(36)

which satisfies $\left\Vert \left\vert {\psi }_{a}\right\rangle -\left\vert {\widetilde{\psi }}_{a}\right\rangle \right\Vert \,\leqslant \,\varepsilon /2$. According to Lemma 3, this step can be realized with $O({N}_{a}\log (1/\varepsilon ))=O\left(\frac{N}{m}\log (1/\varepsilon )\right)$ count and depth of Clifford + T circuit.

In the second step, we implement

$$\begin{array}{c}\,{{\mbox{Select}}}\,({\widetilde{G}}_{k})\mathop{\sum }\limits_{k=0}^{{N}_{a}-1}{\widetilde{\beta }}_{k}{\left\vert k\right\rangle }_{A}{\left\vert {0}^{{n}_{b}}\right\rangle }_{B}\,=\,\mathop{\sum }\limits_{k=0}^{{N}_{a}-1}{\widetilde{\beta }}_{k}{\left\vert k\right\rangle }_{A}{\vert {\widetilde{\phi }}_{k}\rangle }_{B}\\ \qquad\qquad\qquad\qquad\equiv \vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle \end{array}$$

(37)

where ${\widetilde{G}}_{k}$ is a state preparation unitary satisfying ${\widetilde{G}}_{k}\vert 0\rangle =\vert {\widetilde{\phi }}_{k}\rangle$ for some $\Vert \vert {\phi }_{k}\rangle -\vert {\widetilde{\phi }}_{k}\rangle \Vert\, \leqslant \,\varepsilon /2$. It can then be verified that $\Vert \vert {\psi }_{{{{\rm{targ}}}}}\rangle -\vert {\widetilde{\psi }}_{{{{\rm{targ}}}}}\rangle \Vert\, \leqslant \,\varepsilon$. According to Lemma 5, controlled-${\widetilde{G}}_{k}$ such that ${\widetilde{G}}_{k}\vert {0}^{{n}_{b}}\rangle =\vert {\widetilde{\phi }}_{k}\rangle$ can be constructed with O(m) ancillary qubits, $O({N}_{b}\log (1/\varepsilon ))$ count and $O({n}_{b}\log ({n}_{b}/\varepsilon ))=O(\log (m)\log (\log (m)/\varepsilon ))$ depth of Clifford + T gates. Then, according to Lemma 6, with O(m) ancillary qubits, Select$({\widetilde{G}}_{k})$ can be constructed with

$$\begin{array}{r}C=O({N}_{a}\times {N}_{b}\log (1/\varepsilon ))=O(N\log (1/\varepsilon ))\end{array}$$

(38)

gate count, and

$$\begin{array}{ll}D\,=\,O\left({N}_{a}\times \log (m)\log (\log (m)/\varepsilon )\right)\\ \quad\,=\,O\left(N\frac{\log (m)\log (\log (m)/\varepsilon )}{m}\right)\end{array}$$

(39)

depth of Clifford + T gates. By setting n_anc = O(m) for some n_anc ⩾ n, we complete the proof. □
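The space-time trade-off in this proof can be tabulated with a small cost model. The sketch below follows the register split ${n}_{b}=\lfloor {\log }_{2}m\rfloor$, ${n}_{a}=n-{n}_{b}$ and adds the cost of the first step (Lemma 3 on register A) to Eqs. (38) and (39). All $O(\cdot )$ constants are set to 1 and logarithms are base 2, so the numbers indicate scaling only; the function name is hypothetical.

```python
import math

def qsp_tradeoff_model(n, m, eps):
    """Cost model for Theorem 8 with roughly m ancillary qubits."""
    if n < 2 or not n <= m <= 2 ** n:
        raise ValueError("expected n >= 2 and n <= m <= 2**n")
    n_b = m.bit_length() - 1            # floor(log2(m))
    n_a = n - n_b
    N_a, N_b = 2 ** n_a, 2 ** n_b
    log_eps = math.log2(1 / eps)
    step1 = N_a * log_eps                               # Lemma 3 on register A
    count = step1 + N_a * N_b * log_eps                 # plus Eq. (38)
    depth = step1 + N_a * n_b * math.log2(n_b / eps)    # plus Eq. (39)
    return {"n_a": n_a, "n_b": n_b, "count": count, "depth": depth}

for m in (16, 64, 256, 1024):
    print(m, qsp_tradeoff_model(10, m, 1e-3))
```

In this model the gate count is essentially independent of m, while the depth falls roughly as $\log (m)/m$, which is the trade-off stated in Theorem 8.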

Select oracle for general unitary functions

Suppose x is an m-bit bitstring, and U_x are general unitaries. We consider the unitary

$$\begin{array}{r}\,{{\mbox{Select}}}\,({U}_{x})=\mathop{\sum }\limits_{x=0}^{M-1}\left\vert x\right\rangle \left\langle x\right\vert \otimes {U}_{x},\end{array}$$

(40)

where M = 2^m. Below, we discuss how to construct Select(U_x) based on the implementation of single-qubit-controlled-U_x, and the corresponding circuit complexity upper bound. We define C_ctrl(U_x, r) and D_ctrl(U_x, r) as the count and depth of Clifford + T gates required to construct the controlled-U_x, given r ancillary qubits. The following result corresponds to the case with m + r ancillary qubits.

Lemma 6

(Appendix G.4 of³⁸). With m + r ancillary qubits, Select(U_x) can be constructed with O(MC_ctrl(U_x, r)) count and O(MD_ctrl(U_x, r)) depth of Clifford + T gates.

Proof

We introduce an ancillary register with m qubits. We denote the jth qubit at the index register (encoding $\left\vert x\right\rangle$) and ancillary registers as C_j, A_j respectively. We also denote C = [C₁, C₂, ⋯ , C_m] and A = [A₀, A₁, A₂, ⋯ , A_m]. A₀ is initialized as $\left\vert 1\right\rangle$ while all other ancillary qubits are initialized as $\left\vert 0\right\rangle$. □

Eq. (40) can be realized by querying Select(C, A, m, 0), which is defined recursively by Algorithm 1. In Algorithm 1, Toffoli(a, b; c) is the Toffoli gate with qubits a and b as the control qubits and c as the target qubit (an overline on a control qubit, as in $\overline{a}$, indicates that the gate is conditioned on that qubit being $\left\vert 0\right\rangle$); C-U_x(a) is the controlled-U_x with qubit a as the control qubit and the corresponding word register as target qubits; dim(v) represents the dimension of the vector v (for example, dim(C) = m); ${v}_{j}$ represents the jth element of v and ${v}_{j:}=[{v}_{j},{v}_{j+1},\cdots \,,{v}_{\dim (v)}]$.

Algorithm 1

Select(y, q, l, x)

if l ≠ 0:
    Toffoli($\overline{{y}_{1}},{q}_{1};{q}_{2}$)
    Select(${y}_{2:}$, ${q}_{2:}$, l − 1, x)
    Toffoli($\overline{{y}_{1}},{q}_{1};{q}_{2}$)
    Toffoli(y₁, q₁; q₂)
    Select(${y}_{2:}$, ${q}_{2:}$, l − 1, $x+{2}^{l-1}$)
    Toffoli(y₁, q₁; q₂)
else:
    C-U_x(q₁)
end if

In our implementation, the controlled-U_x are queried M times in total, with x ∈ {0, ⋯ , M − 1} sequentially. Moreover, there are O(M) Toffoli gates in total, acting sequentially. Therefore, the total gate count and circuit depth are O(MC_ctrl(U_x, r)) and O(MD_ctrl(U_x, r)) respectively.
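The recursion of Algorithm 1 can be traced classically on computational-basis index states. The sketch below generates the gate sequence and tracks the ancilla bits for a fixed index x, recording which controlled-U_x is triggered. It assumes that C₁ is the most significant bit of the index and that an overlined control means "conditioned on 0"; the tuple format for gates and the function names are illustrative choices, not part of the source.

```python
def select_gates(y, q, l, x, gates):
    """Append the gate sequence of Select(y, q, l, x) from Algorithm 1."""
    if l != 0:
        # First branch: control on y1 = 0, offset 0; second: y1 = 1, offset 2**(l-1).
        for control_value, offset in ((0, 0), (1, 2 ** (l - 1))):
            gates.append(("toffoli", y[0], control_value, q[0], q[1]))
            select_gates(y[1:], q[1:], l - 1, x + offset, gates)
            gates.append(("toffoli", y[0], control_value, q[0], q[1]))
    else:
        gates.append(("c-u", x, q[0]))
    return gates

def run_on_basis_state(m, index):
    """Return the list of triggered U_x and the final ancilla bits."""
    y = [f"C{j}" for j in range(1, m + 1)]
    q = [f"A{j}" for j in range(m + 1)]
    bits = {f"C{j}": (index >> (m - j)) & 1 for j in range(1, m + 1)}
    bits.update({a: 0 for a in q})
    bits["A0"] = 1                      # A0 starts in |1>, all others in |0>
    fired = []
    for gate in select_gates(y, q, m, 0, []):
        if gate[0] == "toffoli":
            _, ctrl, value, anc, target = gate
            if bits[ctrl] == value and bits[anc] == 1:
                bits[target] ^= 1
        elif bits[gate[2]] == 1:        # controlled-U_x fires only if its ancilla is 1
            fired.append(gate[1])
    return fired, [bits[a] for a in q]

for index in range(8):
    print(index, run_on_basis_state(3, index))
```

For every index, exactly one controlled-U_x fires and the ancillary qubits return to their initial values. The gate list contains M controlled-U_x calls and 4(M − 1) Toffoli gates, in line with the O(M) scaling stated above.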

We note that Algorithm 1 can be further simplified by combining some concatenated gates³⁸. But the asymptotic scaling here is optimal.

We then consider the construction with exponentially many ancillary qubits. In Algorithms 4 and 5 of ref. ³⁴, based on the bucket-brigade architecture for quantum random access memory^49,50,51, it has been shown that any Select(U_x) can be constructed with 4M − 1 ancillary qubits, O(M) Clifford + T gates arranged in O(m) circuit depth, and queries to all single-qubit-controlled-U_x for x ∈ {0, ⋯ , M − 1} in parallel. If each controlled-U_x uses r ancillary qubits, we require M(4 + r) − 1 ancillary qubits in total, because they are implemented in parallel. To sum up, we have the following result.

Lemma 7

(many qubit Select oracle). With M(4 + r) − 1 ancillary qubits, Select(U_x) can be constructed with O(MC_ctrl(U_x, r)) count and O(m + D_ctrl(U_x, r)) depth of Clifford + T gates.

Select oracle for Pauli strings

Below, we give the proof of Theorem 4 about the construction of select oracles for Pauli strings defined in Eq. (6). Note that Eq. (6) is a special case of Eq. (40) with ${U}_{x}\in {\{\pm I,\pm X,\pm Y,\pm Z\}}^{\otimes L}$.

Proof of Theorem 4

Recall that the Select oracle for Pauli strings corresponds to Eq. (40) with U_x = H_x, where ${H}_{x}=\bigotimes_{l = 1}^{L}{H}_{x,l}$ and H_x,l ∈ { ± I, ± X, ± Y, ± Z}.

Given L ancillary qubits, controlled-H_x can be constructed with the following circuit.

where control qubit is denoted as c, ancillary qubits, all initialized as $\left\vert 0\right\rangle$, are denoted as a₁, a₂, ⋯ , a_L and target qubits are denoted as t₁, t₂, ⋯ , t_L respectively. Two of the L-Toffoli gates can be effectively constructed with O(L) count and $O(\log L)$ depth of Clifford + T gates. All controlled Pauli gates can be constructed with totally O(L) count and O(1) depth of Clifford + T gates. In other words, we have C_ctrl(H_x, L) = O(L) and ${D}_{{{{\rm{ctrl}}}}}({H}_{x},L)=O(\log L)$.

Our protocol of constructing Select(H_x) uses at least Ω(m + L) ancillary qubits. We divide the m-qubit index registers into two subregisters A and B with ${m}_{a}\,\geqslant \,{\log }_{2}(m+L)$ and m_b = m − m_a qubits respectively. Let ${M}_{a}={2}^{{m}_{a}},{M}_{b}={2}^{{m}_{b}},\,{{\mbox{Select}}}\,({H}_{x})$ can be rewritten as

$$\,{{\mbox{Select}}}\,({H}_{x})=\mathop{\sum }\limits_{{x}_{a}=0}^{{M}_{a}-1}\left\vert {x}_{a}\right\rangle \left\langle {x}_{a}\right\vert \otimes {V}_{{x}_{a}}$$

(41)

$${V}_{{x}_{a}}=\mathop{\sum }\limits_{{x}_{b}=0}^{{M}_{b}-1}\left\vert {x}_{b}\right\rangle \left\langle {x}_{b}\right\vert \otimes {H}_{{x}_{a}\oplus {x}_{b}}.$$

(42)

x_a and x_b are bit strings with m_a and m_b bits respectively, and x ≡ x_a ⊕ x_b. According to Lemma 7, ${V}_{{x}_{a}}$ can be constructed with M_a(4 + L) − 1 ancillary qubits, O(M_aL) count and $O({m}_{a}+\log L)$ depth of Clifford + T gates. According to Lemma 6, with totally n_anc = M_a(4 + L) − 1 + m_b ancillary qubits, the Clifford + T gate count of Select(H_x) is

$$\begin{array}{r}C=O({M}_{b}{M}_{a}L)=O(ML).\end{array}$$

(43)

The Clifford + T depth is

$$\begin{array}{ll}D\,=\,O({M}_{b}({m}_{a}+\log L))\\ \quad\,=\,O\left(M\frac{\log ({M}_{a}L)}{{M}_{a}}\right)\\ \quad\,=\,O\left(M\frac{\log (({n}_{{{{\rm{anc}}}}}+1)/4)}{({n}_{{{{\rm{anc}}}}}+1)/4L}\right)\\ \quad\,=\,O\left(ML\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right),\end{array}$$

(44)

which completes the proof. □
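The trade-off in Eqs. (43) and (44) can be explored with a short cost model. The sketch below takes the split size ${m}_{a}$ as input and returns the ancilla number ${n}_{{{{\rm{anc}}}}}={M}_{a}(4+L)-1+{m}_{b}$ together with the count and depth expressions, with all $O(\cdot )$ constants set to 1 and base-2 logarithms. The function name and the example parameters are hypothetical.

```python
import math

def select_pauli_cost_model(m, L, m_a):
    """Return (n_anc, count, depth) for index size m, string length L, split m_a."""
    if not 0 <= m_a <= m or 2 ** m_a < m + L:
        raise ValueError("expected log2(m + L) <= m_a <= m")
    m_b = m - m_a
    M_a, M_b = 2 ** m_a, 2 ** m_b
    n_anc = M_a * (4 + L) - 1 + m_b         # ancillas for Lemma 7 plus Lemma 6
    count = M_a * M_b * L                   # Eq. (43): O(M L)
    depth = M_b * (m_a + math.log2(L))      # first line of Eq. (44)
    return n_anc, count, depth

for m_a in (5, 7, 10):
    print(m_a, select_pauli_cost_model(10, 8, m_a))
```

Increasing ${m}_{a}$ doubles the ancilla budget with each step and roughly halves the depth, while the gate count stays at ML.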

Details about LCU-based Block-encoding

Without loss of generality, we assume that $m={\log }_{2}P$ is an integer. We let ${\tilde{G}}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }$ be a state preparation unitary satisfying $\Vert {\tilde{G}}_{\vert {{{\boldsymbol{\alpha }}}}\rangle }{\vert {0}^{m}\rangle }_{{{{\rm{anc}}}}}-{G}_{\vert {{{\boldsymbol{\alpha }}}}\rangle }{\vert {0}^{m}\rangle }_{{{{\rm{anc}}}}}\Vert \,\leqslant\, \varepsilon /3$. Let ${\tilde{u}}_{p}$ be unitaries satisfying $\parallel {\tilde{u}}_{p}-{u}_{p}\parallel\, \leqslant\, \varepsilon /3$. We then define

$$\tilde{{\mathbb{G}}}\equiv {\tilde{G}}_{\left\vert {{{\boldsymbol{\alpha }}}}\right\rangle }\otimes {{\mathbb{I}}}_{N},$$

(45)

$$U\equiv {{\mathbb{G}}}^{{\dagger} }\,{{\mbox{Select}}}\,({u}_{p}){\mathbb{G}},$$

(46)

$$\tilde{U}\equiv {\tilde{{\mathbb{G}}}}^{{\dagger} }\,{{\mbox{Select}}}\,({\tilde{u}}_{p})\tilde{{\mathbb{G}}},$$

(47)

and $W\equiv \tilde{U}-U$. With a similar argument to Eq. (31), we have

$$\begin{array}{r}\left\Vert W\left\vert \Psi \right\rangle \right\Vert\, \leqslant \,\varepsilon ,\end{array}$$

(48)

where $\left\vert \Psi \right\rangle =\left\vert {0}^{m}\right\rangle \otimes \left\vert \psi \right\rangle$ and $\left\vert \psi \right\rangle$ is an arbitrary N-dimensional quantum state. We may rewrite W as

$$W=\left(\begin{array}{cc}\delta H&{W}_{1,2}\\ {W}_{2,1}&{W}_{2,2}\end{array}\right)$$

(49)

where $\delta H\in {{\mathbb{C}}}^{N\times N}$, ${W}_{1,2}\in {{\mathbb{C}}}^{N\times (P-1)N}$, ${W}_{2,1}\in {{\mathbb{C}}}^{(P-1)N\times N}$ and ${W}_{2,2}\in {{\mathbb{C}}}^{(P-1)N\times (P-1)N}$. Note that if $\parallel \delta H\parallel \,\leqslant \,\varepsilon$, then $\tilde{U}$ is an (m, ε)-block-encoding of H. We have

$$W\left\vert \Psi \right\rangle =\left(\begin{array}{cc}\delta H&{W}_{1,2}\\ {W}_{2,1}&{W}_{2,2}\end{array}\right)\left(\begin{array}{c}\left\vert \psi \right\rangle \\ 0\end{array}\right)=\left(\begin{array}{c}\delta H\left\vert \psi \right\rangle \\ {W}_{2,1}\left\vert \psi \right\rangle \end{array}\right)$$

(50)

Combining Eq. (48) with Eq. (50), we have

$$\begin{array}{r}\parallel \delta H\vert \psi \rangle \parallel\, \leqslant\, \parallel W\vert \Psi \rangle \parallel\, \leqslant\, \varepsilon .\end{array}$$

(51)

Because Eq. (51) holds for arbitrary $\left\vert \psi \right\rangle$, we have ∥δH∥ ⩽ ε. Therefore, $\tilde{U}$ is an (m, ε)-block-encoding of H. We can now study the efficiency of block-encoding.
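The block structure used in this argument can be checked on the smallest nontrivial example. The sketch below assumes the usual LCU form $H=\mathop{\sum }\nolimits_{p}{\alpha }_{p}{u}_{p}$ with ${\alpha }_{p}\geqslant 0$, $\mathop{\sum }\nolimits_{p}{\alpha }_{p}=1$, and a preparation unitary with ${G}_{\vert {{{\boldsymbol{\alpha }}}}\rangle }\vert {0}^{m}\rangle =\mathop{\sum }\nolimits_{p}\sqrt{{\alpha }_{p}}\vert p\rangle$; Eq. (3) is not reproduced in this section, so this form is an assumption. With P = 2, N = 2, u₀ = X and u₁ = Z, it builds the exact $U={{\mathbb{G}}}^{{\dagger} }\,{{\mbox{Select}}}\,({u}_{p}){\mathbb{G}}$ in pure Python and reads off its top-left N × N block.

```python
import math

def matmul(a, b):
    """Matrix product for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    """Kronecker product a (x) b."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

I2 = [[1, 0], [0, 1]]
# Select(u_p) = |0><0| (x) X + |1><1| (x) Z in the basis |ancilla, data>.
SELECT = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

def lcu_unitary(a0, a1):
    """U = (G (x) I)^dagger Select (G (x) I) for H = a0 * X + a1 * Z, a0 + a1 = 1."""
    c, s = math.sqrt(a0), math.sqrt(a1)
    G = [[c, -s], [s, c]]              # G|0> = sqrt(a0)|0> + sqrt(a1)|1>
    GG = kron(G, I2)
    return matmul(dagger(GG), matmul(SELECT, GG))

U = lcu_unitary(0.3, 0.7)
block = [row[:2] for row in U[:2]]     # rows and columns with the ancilla in |0>
print(block)                           # close to 0.3 * X + 0.7 * Z
```

Replacing G or the u_p by approximations perturbs every block of U, but only the top-left block δH enters the block-encoding error, which is the point of Eqs. (49) to (51).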

The actual circuit complexity depends on the form of u_p. We now prove Theorem 7, which corresponds to ${u}_{p}\in {\{\pm I,\pm X,\pm Y,\pm Z\}}^{\otimes n}$.

Proof of Theorem 7

With n_anc ancillary qubits where ${\log }_{2}P\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(P)$, $\tilde{{\mathbb{G}}}$ can be constructed with $O(P\log (1/\varepsilon ))$ count and $O\left(P\frac{\log ({n}_{{{{\rm{anc}}}}})\log (\log ({n}_{{{{\rm{anc}}}}})/\varepsilon )}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates. With $\Omega ({\log }_{2}P)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(Pn)$, Select(H_x) can be constructed with O(nP) count and $O\left(nP\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$ depth of Clifford + T gates. Therefore, the total gate count of the block-encoding is $O(P(n+\log (1/\varepsilon )))$. For $\Omega ({\log }_{2}P)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(P)$, the circuit depth is

$$O\left(P\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\left(n+\log \left(\log ({n}_{{{{\rm{anc}}}}})/\varepsilon \right)\right)\right)$$

(52)

$$=O\left(P\left(n+\log (1/\varepsilon )\right)\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right).$$

(53)

For Ω(P) ⩽ n_anc ⩽ O(Pn), the circuit depth for $\tilde{{\mathbb{G}}}$ is $O(\log P\log (\log (P)/\varepsilon ))=O(\log n\log (\log n/\varepsilon ))$, where we have used the assumption P = O(Poly(n)). Combining with circuit depth of Select(H_x), the total circuit depth for block-encoding is

$${\displaystyle{\begin{array}{ll}\quad\,\,{O}\left(\left(\frac{{n}_{{{{\rm{anc}}}}}\log (n)\log (\log (n)/\varepsilon )}{\log {n}_{{{{\rm{anc}}}}}}+nP\right)\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\\ \,=\,O\left(\left(\frac{nP\log (n)\log (\log (n)/\varepsilon )}{\log (nP)}+nP\right)\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\\ \,=\,O\left(\left(nP\log (\log (n))+nP\log (1/\varepsilon )\right)\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)\\ \,=\,\tilde{O}\left(nP\log (1/\varepsilon )\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right),\end{array}}}$$

(54)

which completes the proof. □
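
As a reading aid, the elementary bounds behind the simplifications in Eq. (54) can be written out explicitly. This is only a sketch with constants suppressed, assuming $P\geqslant 1$ and ${n}_{{\rm{anc}}}\leqslant O(nP)$ as in the proof; it is not part of the original argument.

$$\begin{aligned}\frac{{n}_{{\rm{anc}}}}{\log {n}_{{\rm{anc}}}}&\leqslant O\left(\frac{nP}{\log (nP)}\right)&&(x/\log x\ \text{is increasing for}\ x\geqslant e),\\ \frac{\log n}{\log (nP)}&\leqslant 1&&(P\geqslant 1),\\ \log \left(\log (n)/\varepsilon \right)&=\log \log n+\log (1/\varepsilon ).\end{aligned}$$

The first bound gives the second line of Eq. (54), and the remaining two together give the third line.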

Sparse Boolean memory

Recall that sparse Boolean memory performs the transformation $\,{{\mbox{Select}}}\,(B){\vert q\rangle }_{{{{\rm{idx}}}}}{\vert z\rangle }_{{{{\rm{wrd}}}}}={\vert q\rangle }_{{{{\rm{idx}}}}}{\vert z\oplus B(q)\rangle }_{{{{\rm{wrd}}}}}$, where idx represents an n-qubit index register, wrd represents an $\tilde{n}$-qubit word register, and there are at most s input digits q satisfying B(q) ≠ 0 ⋯ 0. We define q_k as the kth input digit with nonzero output, and ${{{{{Q}}}}}_{B}\equiv \{{q}_{1},{q}_{2},\cdots \,,{q}_{s}\}$. In ref. ³⁴, we developed a construction of SBM with $O(ns\tilde{n})$ ancillary qubits. The result is as follows.

Lemma 8

(Sec. III B in the Supplemental Material of ref. ³⁴). With $O(ns\tilde{n})$ ancillary qubits, Select(B) can be realized with $O(ns\tilde{n})$ count and $O(\log (ns\tilde{n}))$ depth of Clifford + T gates.
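
To make the definition of Select(B) concrete, its action on computational basis states can be simulated classically. The following is a minimal sketch, not a circuit construction: the function name `select_b`, the dictionary `memory`, and the example values are all hypothetical and chosen only for illustration.

```python
def select_b(memory, q, z):
    """Map the basis state |q>_idx |z>_wrd to |q>_idx |z XOR B(q)>_wrd.

    `memory` stores only the (at most s) inputs with nonzero output;
    every other input q implicitly has B(q) = 0...0.
    """
    return q, z ^ memory.get(q, 0)


# Hypothetical instance: n = 4 index qubits, n_tilde = 3 word qubits, s = 2.
memory = {0b0101: 0b110, 0b1100: 0b001}

print(select_b(memory, 0b0101, 0b011))  # stored input: the word is XORed with 0b110
print(select_b(memory, 0b0000, 0b011))  # unstored input: the word is unchanged
```

Applying `select_b` twice returns the original pair, reflecting that Select(B) is its own inverse.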

Based on Lemma 8, we can obtain the gate complexity with an intermediate number of ancillary qubits. The proof of Lemma 5 is given as follows.

Proof of Lemma 5

Let wrd_l be the lth qubit of the word register, and z_l be the lth digit of z, so that ${\vert z\rangle }_{{{{\rm{wrd}}}}}=\mathop{\prod }\nolimits_{l = 1}^{\tilde{n}}{\vert {z}_{l}\rangle }_{{{{\mbox{wrd}}}}_{l}}$. Select(B) can be separated into multiple Boolean functions acting on different qubits of the word register. Let B_l(q) be the lth digit of B(q), and ${B}_{{l}_{\min }:{l}_{\max }}(q)\equiv {B}_{{l}_{\max }}(q)\cdots {B}_{{l}_{\min }+1}(q){B}_{{l}_{\min }}(q)$. We define $\,{{\mbox{Select}}}\,({B}_{{l}_{\min }:{l}_{\max }})$ as a unitary satisfying

$$\,{{\mbox{Select}}}\,({B}_{{l}_{\min }:{l}_{\max }}){\left\vert q\right\rangle }_{{{{\rm{idx}}}}}\mathop{\prod }\limits_{l={l}_{\min }}^{{l}_{\max }}{\left\vert {z}_{l}\right\rangle }_{{{{\mbox{wrd}}}}_{l}}$$

(55)

$$={\left\vert q\right\rangle }_{{{{\rm{idx}}}}}\mathop{\prod }\limits_{l={l}_{\min }}^{{l}_{\max }}{\left\vert {z}_{l}\oplus {B}_{l}(q)\right\rangle }_{{{{\mbox{wrd}}}}_{l}}.$$

(56)

For any $1={l}_{0}\, < \,{l}_{1}\, < \,\cdots\, < \,{l}_{{n}^{{\prime} }}=\tilde{n}+1$, it can be verified that

$$\begin{array}{r}\,{{\mbox{Select}}}(B)=\mathop{\prod }\limits_{r=1}^{{n}^{{\prime} }}{{\mbox{Select}}}\,({B}_{{l}_{r-1}:{l}_{r}-1}).\end{array}$$

(57)
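
The factorization in Eq. (57) can be checked classically on basis states: XORing the word slice by slice gives the same result as XORing it with the full B(q). The sketch below assumes that digit 1 is the least significant bit; the helper names and the example memory are hypothetical.

```python
def digits(value, l_min, l_max):
    """Digits l_min..l_max of value, kept at their original positions.

    Digits are 1-based and digit 1 is taken as the least significant bit
    (an ordering assumption made for this sketch).
    """
    width = l_max - l_min + 1
    return value & (((1 << width) - 1) << (l_min - 1))


def select_b_slices(memory, q, z, bounds):
    """Apply Select(B_{l_{r-1}:l_r - 1}) for r = 1..n', one slice at a time."""
    for lo, hi in zip(bounds, bounds[1:]):
        z ^= digits(memory.get(q, 0), lo, hi - 1)
    return q, z


# Hypothetical instance with n_tilde = 5 and boundaries 1 = l_0 < l_1 < l_2 = 6.
memory = {3: 0b10110, 9: 0b00111}
bounds = [1, 3, 6]
for q in (3, 9, 4):
    for z in range(2 ** 5):
        # The slice-by-slice product agrees with the full Select(B).
        assert select_b_slices(memory, q, z, bounds) == (q, z ^ memory.get(q, 0))
```

Because the slices act on disjoint word qubits, the order of the factors does not matter.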

We also define Select(B_l) = Select(B_{l:l}). For each B_l, we further define Boolean functions ${B}_{l,{k}_{\min }:{k}_{\max }}(q)={B}_{l}(q)\wedge ({k}_{\min }\,\leqslant \,k\,\leqslant\, {k}_{\max })$ for ${k}_{\min }\leqslant {k}_{\max }$, where k is the index for which q = q_k. For any $0={k}_{0}\, < \,{k}_{1}\, < \,\cdots \,< \,{k}_{{s}^{{\prime} }}=s$, it can be verified that

$$\begin{array}{r}\,{{\mbox{Select}}}({B}_{l})=\mathop{\displaystyle{\prod }}\limits_{j=1}^{{s}^{{\prime} }}{{\mbox{Select}}}\,({B}_{l,{k}_{j-1}:{k}_{j}-1}).\end{array}$$

(58)
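
Eq. (58) can be illustrated in the same classical way for a single word digit l: each factor only responds to inputs q_k whose index k lies in its range, so exactly one factor can flip the target bit. The sketch below uses zero-based indices k = 0, …, s − 1 to match the boundaries 0 = k_0 < ⋯ < k_{s′} = s; the function names and example data are hypothetical.

```python
def select_bl_range(nonzero_inputs, bl_values, q, z_l, k_min, k_max):
    """Flip z_l iff q == q_k for some k in [k_min, k_max] and B_l(q_k) = 1."""
    for k in range(k_min, k_max + 1):
        if nonzero_inputs[k] == q:
            z_l ^= bl_values[k]
    return q, z_l


def select_bl(nonzero_inputs, bl_values, q, z_l, bounds):
    """Product over j of Select(B_{l, k_{j-1}:k_j - 1}), as in Eq. (58)."""
    for lo, hi in zip(bounds, bounds[1:]):
        q, z_l = select_bl_range(nonzero_inputs, bl_values, q, z_l, lo, hi - 1)
    return q, z_l


# Hypothetical instance: s = 5 inputs with nonzero B(q), and the l-th digit of each.
nonzero_inputs = [2, 5, 7, 11, 12]
bl_values = [1, 0, 1, 1, 0]
bounds = [0, 2, 4, 5]  # 0 = k_0 < k_1 < k_2 < k_3 = s

for q in range(16):
    expected = bl_values[nonzero_inputs.index(q)] if q in nonzero_inputs else 0
    for z_l in (0, 1):
        assert select_bl(nonzero_inputs, bl_values, q, z_l, bounds) == (q, z_l ^ expected)
```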

We first consider the construction with ancillary qubit number $O(ns)\,\leqslant \,{n}_{{{{\rm{anc}}}}}\,\leqslant \,O(ns\widetilde{n})$. In this case, we decompose Select(B) with Eq. (57). We let d = ⌊n_anc/(ns)⌋ and ${n}^{{\prime} }=\lceil \tilde{n}/d\rceil$, and

$$\begin{array}{r}{l}_{r}=\left\{\begin{array}{ll}rd+1&r \,< \,{n}^{{\prime} }\\ \tilde{n}+1&r={n}^{{\prime} }\end{array}\right.\,.\end{array}$$

(59)

According to Lemma 8, with n_anc ancillary qubits, each $\,{{\mbox{Select}}}\,({B}_{{l}_{r-1}:{l}_{r}-1})$ can be constructed with O(nsd) count and $O(\log (nsd))=O(\log {n}_{{{{\rm{anc}}}}})$ depth of Clifford + T gates. So the total gate count is $O(nsd)\times {n}^{{\prime} }=O(ns\tilde{n})$, and, since ${n}^{{\prime} }=O(ns\tilde{n}/{n}_{{{{\rm{anc}}}}})$, the total circuit depth is $O(\log {n}_{{{{\rm{anc}}}}})\times {n}^{{\prime} }=O\left(ns\tilde{n}\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$.
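
The slice boundaries of Eq. (59) are easy to tabulate for concrete parameters. The sketch below treats the asymptotic condition O(ns) ⩽ n_anc literally as n_anc ⩾ ns, which is an assumption made only so that d ⩾ 1; the function name `word_partition` and the example numbers are hypothetical.

```python
def word_partition(n, s, n_tilde, n_anc):
    """Return (d, n_prime, bounds) with bounds = [l_0, ..., l_{n'}] from Eq. (59)."""
    d = n_anc // (n * s)
    if d < 1:
        raise ValueError("this regime needs n_anc >= n * s")
    n_prime = -(-n_tilde // d)  # integer ceiling of n_tilde / d
    bounds = [r * d + 1 if r < n_prime else n_tilde + 1 for r in range(n_prime + 1)]
    return d, n_prime, bounds


d, n_prime, bounds = word_partition(n=4, s=3, n_tilde=10, n_anc=36)
print(d, n_prime, bounds)
# Each slice holds at most d word qubits, and the slices tile digits 1..n_tilde.
assert all(0 < hi - lo <= d for lo, hi in zip(bounds, bounds[1:]))
assert bounds[0] == 1 and bounds[-1] == 10 + 1
```

The n′ slices are applied one after another, which is where the factor n′ in the total depth comes from.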

We then consider the construction with ancillary qubit number O(n) ⩽ n_anc ⩽ O(ns). In this case, we first perform the decomposition $\,{{\mbox{Select}}}\,(B)=\mathop{\prod }\nolimits_{l = 1}^{\tilde{n}}\,{{\mbox{Select}}}\,({B}_{l})$. Then, we decompose each Select(B_l) with Eq. (58). We let w = ⌊n_anc/n⌋ and ${s}^{{\prime} }=\lceil s/w\rceil$, and

$$\begin{array}{r}{k}_{j}=\left\{\begin{array}{ll}jw&j\, < \,{s}^{{\prime} }\\ s&j={s}^{{\prime} }\end{array}\right.\,.\end{array}$$

(60)

According to Lemma 8, with n_anc ancillary qubits, each $\,{{\mbox{Select}}}\,({B}_{l,{k}_{j-1}:{k}_{j}-1})$ can be constructed with O(nw) count and $O(\log (nw))=O(\log {n}_{{{{\rm{anc}}}}})$ depth of Clifford + T gates. So each Select(B_l) requires gate count $O(nw)\times {s}^{{\prime} }=O(ns)$, and, since ${s}^{{\prime} }=O(ns/{n}_{{{{\rm{anc}}}}})$, circuit depth $O(\log ({n}_{{{{\rm{anc}}}}}))\times {s}^{{\prime} }=O\left(ns\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$. In this case, we have ${n}^{{\prime} }=\tilde{n}$ in Eq. (57), so the total gate count and circuit depth of Select(B) are $O(ns\tilde{n})$ and $O\left(ns\tilde{n}\frac{\log {n}_{{{{\rm{anc}}}}}}{{n}_{{{{\rm{anc}}}}}}\right)$, respectively. □
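
The index blocks of Eq. (60) can be tabulated in the same spirit. The sketch below takes w = ⌊n_anc/n⌋, reads the condition O(n) ⩽ n_anc literally as n_anc ⩾ n so that w ⩾ 1, and counts one sequential Lemma 8 call per pair (l, j); the function name `index_partition` and the example numbers are hypothetical.

```python
def index_partition(n, s, n_anc):
    """Return (w, s_prime, bounds) with bounds = [k_0, ..., k_{s'}] from Eq. (60)."""
    w = n_anc // n
    if w < 1:
        raise ValueError("this regime needs n_anc >= n")
    s_prime = -(-s // w)  # integer ceiling of s / w
    bounds = [j * w if j < s_prime else s for j in range(s_prime + 1)]
    return w, s_prime, bounds


n, s, n_tilde, n_anc = 4, 10, 6, 12
w, s_prime, bounds = index_partition(n, s, n_anc)
sequential_calls = n_tilde * s_prime  # one Lemma 8 call per (l, j) pair
print(w, s_prime, bounds, sequential_calls)
# Each block covers at most w of the s nonzero inputs.
assert all(0 < hi - lo <= w for lo, hi in zip(bounds, bounds[1:]))
```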

References

1. Nielsen, M. A. & Chuang, I. Quantum computation and quantum information (Cambridge University Press, Cambridge, 2000).
2. Aaronson, S. Open problems related to quantum query complexity. ACM Trans. Quantum Comput. 2, 1–9 (2021).
3. Grover, L. K. Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev. Lett. 79, 325 (1997).
4. Berry, D. W., Ahokas, G., Cleve, R. & Sanders, B. C. Efficient quantum algorithms for simulating sparse hamiltonians. Commun. Math. Phys. 270, 359–371 (2007).
5. Harrow, A. W., Hassidim, A. & Lloyd, S. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 150502 (2009).
6. Childs, A. M. & Kothari, R. Simulating sparse hamiltonians with star decompositions. In Theory of Quantum Computation, Communication, and Cryptography: 5th Conference, TQC 2010, Leeds, UK, April 13-15, 2010, Revised Selected Papers 5, 94–103 (Springer, 2011).
7. Childs, A. M. On the relationship between continuous-and discrete-time quantum walk. Commun. Math. Phys. 294, 581–603 (2010).
8. Berry, D. W., Childs, A. M., Cleve, R., Kothari, R. & Somma, R. D. Exponential improvement in precision for simulating sparse hamiltonians. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, 283–292 (2014).
9. Childs, A. M., Kothari, R. & Somma, R. D. Quantum algorithm for systems of linear equations with exponentially improved dependence on precision. SIAM J. Comput. 46, 1920–1950 (2017).
10. Gilyén, A., Su, Y., Low, G. H. & Wiebe, N. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 193–204 (2019).
11. Chakraborty, S., Gilyén, A. & Jeffery, S. The power of block-encoded matrix powers: Improved regression techniques via faster hamiltonian simulation. In Proceedings of the 46th International Colloquium on Automata, Languages and Programming (ICALP) (2019).
12. Babbush, R., Berry, D. W., Kothari, R., Somma, R. D. & Wiebe, N. Exponential quantum speedup in simulating coupled classical oscillators. Phys. Rev. X 13, 041041 (2023).
13. Low, G. H. & Chuang, I. L. Optimal hamiltonian simulation by quantum signal processing. Phys. Rev. Lett. 118, 010501 (2017).
14. Low, G. H. & Chuang, I. L. Hamiltonian simulation by qubitization. Quantum 3, 163 (2019).
15. Martyn, J. M., Rossi, Z. M., Tan, A. K. & Chuang, I. L. Grand unification of quantum algorithms. PRX Quantum 2, 040203 (2021).
16. Clader, B. D. et al. Quantum resources required to block-encode a matrix of classical data. IEEE Trans. Quantum Eng. 3, 1–23 (2022).
17. Sünderhauf, C., Campbell, E. & Camps, J. Block-encoding structured matrices for data input in quantum computing. Quantum 8, 1226 (2024).
18. Gui-Lu, L. General quantum interference principle and duality computer. Commun. Theor. Phys. 45, 825 (2006).
19. Long, G. L. Duality quantum computing and duality quantum information processing. Int. J. Theor. Phys. 50, 1305–1318 (2011).
20. Childs, A. M. & Wiebe, N. Hamiltonian simulation using linear combinations of unitary operations. Quantum Inf. Comput. 12, 901–924 (2012).
21. Cong, I., Choi, S. & Lukin, M. D. Quantum convolutional neural networks. Nat. Phys. 15, 1273–1278 (2019).
22. Wei, S., Chen, Y., Zhou, Z. & Long, G. A quantum convolutional neural network on nisq devices. AAPPS Bull. 32, 1–11 (2022).
23. Li, H.-S., Fan, P., Xia, H. & Long, G.-L. The circuit design and optimization of quantum multiplier and divider. Sci. China Phys. Mech. Astronomy 65, 260311 (2022).
24. Fowler, A. G., Mariantoni, M., Martinis, J. M. & Cleland, A. N. Surface codes: Towards practical large-scale quantum computation. Phys. Rev. A 86, 032324 (2012).
25. Jaques, S. & Rattew, A. G. Qram: A survey and critique. Preprint at https://arxiv.org/abs/2305.10310 (2023).
26. Long, G.-L. & Sun, Y. Efficient scheme for initializing a quantum register with an arbitrary superposed state. Phys. Rev. A 64, 014303 (2001).
27. Grover, L. & Rudolph, T. Creating superpositions that correspond to efficiently integrable probability distributions. Preprint at https://arxiv.org/abs/quant-ph/0208112 (2002).
28. Möttönen, M., Vartiainen, J. J., Bergholm, V. & Salomaa, M. M. Transformation of quantum states using uniformly controlled rotations. Quantum Inf. Comput. 5, 467–473 (2005).
29. Plesch, M. & Brukner, Č. Quantum-state preparation with universal gate decompositions. Phys. Rev. A 83, 032302 (2011).
30. Zhang, Z., Wang, Q. & Ying, M. Parallel quantum algorithm for hamiltonian simulation. Quantum 8, 1228 (2024).
31. Zhang, X.-M., Yung, M.-H. & Yuan, X. Low-depth quantum state preparation. Phys. Rev. Res. 3, 043200 (2021).
32. Sun, X., Tian, G., Yang, S., Yuan, P. & Zhang, S. Asymptotically optimal circuit depth for quantum state preparation and general unitary synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2023).
33. Rosenthal, G. Query and depth upper bounds for quantum unitaries via grover search. Preprint at https://arxiv.org/abs/2111.07992 (2021).
34. Zhang, X.-M., Li, T. & Yuan, X. Quantum state preparation with optimal circuit depth: Implementations and applications. Phys. Rev. Lett. 129, 230504 (2022).
35. Yuan, P. & Zhang, S. Optimal (controlled) quantum state preparation and improved unitary synthesis by quantum circuits with any number of ancillary qubits. Quantum 7, 956 (2023).
36. Ashhab, S. Quantum state preparation protocol for encoding classical data into the amplitudes of a quantum information processing register’s wave function. Phys. Rev. Res. 4, 013091 (2022).
37. Gui, K., Dalzell, A. M., Achille, A., Suchara, M. & Chong, F. T. Spacetime-efficient low-depth quantum state preparation with applications. Quantum 8, 1257 (2024).
38. Childs, A. M., Maslov, D., Nam, Y., Ross, N. J. & Su, Y. Toward the first quantum simulation with quantum speedup. Proc. Natl. Acad. Sci. 115, 9456–9461 (2018).
39. Tang, E. A quantum-inspired classical algorithm for recommendation systems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (ACM Press, New York, 2019).
40. Chia, N.-H., Li, T., Lin, H.-H. & Wang, C. Quantum-inspired sublinear algorithm for solving low-rank semidefinite programming. In 45th International Symposium on Mathematical Foundations of Computer Science (Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2020).
41. Nielsen, M. A. A geometric approach to quantum circuit lower bounds. Preprint at https://arxiv.org/abs/quant-ph/0502070 (2005).
42. Bu, K., Garcia, R. J., Jaffe, A., Koh, D. E. & Li, L. Complexity of quantum circuits via sensitivity, magic, and coherence. arXiv:2204.12051 (2022).
43. Eisert, J. Entangling power and quantum circuit complexity. Phys. Rev. Lett. 127, 020501 (2021).
44. Li, L., Bu, K., Koh, D. E., Jaffe, A. & Lloyd, S. Wasserstein complexity of quantum circuits. Preprint at https://arxiv.org/abs/2208.06306 (2022).
45. Biamonte, J. et al. Quantum machine learning. Nature 549, 195–202 (2017).
46. Liu, J., Hann, C. T. & Jiang, L. Data centers with quantum random access memory and quantum networks. Phys. Rev. A 108, 032610 (2023).
47. Liu, J. & Jiang, L. Quantum data center: Perspectives. Preprint at https://arxiv.org/abs/2309.06641 (2023).
48. Selinger, P. Efficient Clifford+T approximation of single-qubit operators. Preprint at https://arxiv.org/abs/1212.6253 (2012).
49. Giovannetti, V., Lloyd, S. & Maccone, L. Quantum random access memory. Phys. Rev. Lett. 100, 160501 (2008).
50. Hann, C. T. et al. Hardware-efficient quantum random access memory with hybrid quantum acoustic systems. Phys. Rev. Lett. 123, 250501 (2019).
51. Hann, C. T., Lee, G., Girvin, S. & Jiang, L. Resilience of quantum random access memory to generic noise. PRX Quantum 2, 020311 (2021).

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 12175003, 12247124 and 12361161602), NSAF (Grant No. U2330201) and Project funded by China Postdoctoral Science Foundation (Grant No. 2023T160004).

Author information

Authors and Affiliations

Key Laboratory of Atomic and Subatomic Structure and Quantum Control (Ministry of Education), Guangdong Basic Research Center of Excellence for Structure and Fundamental Interactions of Matter, School of Physics, South China Normal University, Guangzhou, 510006, China
Xiao-Ming Zhang
Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, Guangdong-Hong Kong Joint Laboratory of Quantum Matter, Frontier Research Institute for Physics, South China Normal University, Guangzhou, 510006, China
Xiao-Ming Zhang
Center on Frontiers of Computing Studies, Peking University, 100871, Beijing, China
Xiao-Ming Zhang & Xiao Yuan
School of Computer Science, Peking University, 100871, Beijing, China
Xiao-Ming Zhang & Xiao Yuan

Authors

Xiao-Ming Zhang
Xiao Yuan

Contributions

Xiao-Ming Zhang conceived the project. All authors contributed to the preparation of the manuscript.

Corresponding author

Correspondence to Xiao-Ming Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Materials

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhang, XM., Yuan, X. Circuit complexity of quantum access models for encoding classical data. npj Quantum Inf 10, 42 (2024). https://doi.org/10.1038/s41534-024-00835-8

Received: 23 July 2023
Accepted: 28 March 2024
Published: 23 April 2024
DOI: https://doi.org/10.1038/s41534-024-00835-8
