Introduction

The theory of quantum information processing promises to accelerate certain computational tasks substantially. In practice, the computational cost of generating an arbitrary quantum input state1,2 must be addressed to ensure the speedup. The ability to efficiently convert classical data into quantum states is essential in many algorithms with complex data sets, such as quantum searching3, collision finding4, and quantum Fourier transform5. The demand for such ability has continued to grow with recent discoveries of quantum algorithms for data analysis and machine learning applications with large classical data6,7,8,9,10. Quantum simulation also requires the preparation of a quantum register in the initial physical state of the simulated system11. One promising avenue is to use quantum random access memory (QRAM)12, a device that stores either classical or quantum data with the ability to query the data with respect to superposition of addresses. The bucket brigade (BB) model for QRAM proposed in refs12,13 requires \(O({\mathrm{log}}_{{\rm{2}}}(M))\) address qubits, O(M) qutrits for routing, and O(M) classical or quantum memory cells for M binary data. The content of multiple data cells can be returned in superposition with only \(O({\mathrm{log}}_{2}(M))\) qutrits activated after \(O({\mathrm{log}}_{2}^{2}(M))\) time steps. A critical assumption for the practicality of this scheme is that the inactive routing elements do not render noticeable errors.

Since quantum operations are applied directly to the qubits that form a QRAM as the state preparation is followed by a quantum algorithm, it is favorable to build a QRAM based on the quantum circuit model. At the same time, a QRAM should be a good interface to classical data for big data applications. In this work, we propose a flip-flop (FF) QRAM, which is constructed with the standard circuit-based quantum computation and without relying on a routing algorithm. The FF-QRAM can read unsorted classical data stored in memory cells, and superpose them in the computational basis states with non-uniform probability amplitudes to create a specific input state required in a quantum algorithm. Also, the classical information stored in the quantum state can easily be updated with the same FF-QRAM process. The cost for writing or modifying M classical data represented as \(D=\{({\overrightarrow{d}}^{(l)},{b}_{l})|0\le l < M\}\), where \({\overrightarrow{d}}^{(l)}\) represents n bits of information and bl is the attribute of \({\overrightarrow{d}}^{(l)}\), is O(n) qubits and O(Mn) quantum operations that are commonly found in many known algorithms. The probability amplitudes can be modified by post-selection at an additional cost of repeating the process and single qubit measurement. In addition, the FF-QRAM architecture can serve as a building block for the classical-quantum interface.

A quantum state prepared by QRAM as an input to a specific algorithm cannot be reused once measured (the quantum measurement postulate), nor be copied (the no-cloning theorem) for another task. Thus, in general, the QRAM cost seems unavoidable per algorithm run, even when performing a set of algorithms with an identical input state. Here we introduce a process of quantum forking inspired by process forking in computer operating systems, which creates a child process that can evolve independently14. Quantum forking is a framework to split unitary processes in superposition, with which the number of QRAM queries can be reduced in certain applications.

Results

A quantum superposition state prepared with respect to classical data for quantum computation can be referred to as quantum database (QDB). In most general form, a QDB can be expressed as

$$|\psi {\rangle }_{{\rm{QDB}}}=\sum _{l=0}^{M-1}\,{b}_{l}|{\overrightarrow{d}}^{(l)}\rangle ,$$
(1)

where \({\overrightarrow{d}}^{(l)}={d}_{0}^{(l)}{d}_{1}^{(l)}\ldots {d}_{n-1}^{(l)}\in {\{0,1\}}^{n}\) denotes a string of quantum bits in the computational basis to represent a classical data, and M is the number of data. For big data applications, a typical quantum representation of \({\overrightarrow{d}}^{(l)}\) can be decomposed as \(|{\overrightarrow{d}}^{(l)}\rangle =|{\overrightarrow{x}}^{(l)}\rangle |l\rangle \), where \({\overrightarrow{x}}^{(l)}\) and l denote a data entry and the corresponding label, respectively7,10. The probability amplitude, bl, can encode continuous data as required in some applications10,15, or represent the normalized occurrence of the data entry \({\overrightarrow{x}}^{(l)}\). For M numbers of n-bit data, n + m qubits are sufficient to realize the QDB, where n and \(m=\lceil {\mathrm{log}}_{2}(M)\rceil \) qubits encode the data and the label, respectively. The number of label qubits can be reduced by labelling only the data that appears more than once. The label qubits are unnecessary when all data entries, \({\overrightarrow{x}}^{(l)}\), are unique.

Flip-flop QRAM

The FF-QRAM is used to generate a QDB as follows. Consider a quantum computer with an (n + m)-qubit bus state to encode a big data class database. The bus qubit state can be arbitrary, and it defines which computational basis states, |jB, are accessed with probability amplitudes, \({\psi }_{j}\). A QRAM operation on the bus qubit superposes a set of classical data D as

$${\rm{QRAM}}(D)\,\sum _{j}\,{\psi }_{j}|j{\rangle }_{B}|0{\rangle }_{R}\equiv \sum _{l}\,{\psi }_{l}|{\overrightarrow{d}}^{(l)}{\rangle }_{B}|{b}_{l}{\rangle }_{R},$$
(2)

where the subscript B (R) indicates the bus (register) qubit, and the register qubit can include the probability amplitudes for encoding the analog data. The FF-QRAM is implemented systematically with standard quantum circuit elements, which include the classically-controlled Pauli X gate, \(\bar{c}X\), and the n-qubit controlled rotation gate, CnRp(θ). The \(\bar{c}X\) flips the target qubit only when the classical control bit is zero. The CnRp(θ) gate rotates the target qubit by θ around the p-axis of the Bloch sphere only if all n control qubits are 1.

The underlying idea of the FF-QRAM model is depicted in Fig. 1, describing the procedure to superpose two independent bit strings \({\overrightarrow{d}}^{(l)}\) and \({\overrightarrow{d}}^{(l+1)}\) with target probability amplitudes in the bus qubit state, \(|\psi {\rangle }_{B}\). In this example, the label qubits, \(|l\rangle \), are omitted without loss of generality to deliver the main idea. The initial state can be expressed with a focus on \({\overrightarrow{d}}^{(l)}\) as

$$|{\psi }_{0}{\rangle }_{l}={\psi }_{{\overrightarrow{d}}^{(l)}}|{\overrightarrow{d}}^{(l)}\rangle |0{\rangle }_{R}+\sum _{j\ne {\overrightarrow{d}}^{(l)}}\,{\psi }_{j}|j\rangle |0{\rangle }_{R},$$
(3)

where \(|{\psi }_{s}{\rangle }_{l}\) denotes the state of the (n + 1) qubits in the process of writing the lth data entry, observed at the sth step in Fig. 1. The \(\bar{c}X\) gates controlled by \({\overrightarrow{d}}^{(l)}\) rearrange the computational basis states of the bus qubit so that \(|{\overrightarrow{d}}^{(l)}\rangle \) becomes \(|1{\rangle }^{\otimes n}\), and the rest of the quantum bits flip accordingly:

$$|{\psi }_{1}{\rangle }_{l}={\psi }_{{\overrightarrow{d}}^{(l)}}|1{\rangle }^{\otimes n}|0{\rangle }_{R}+\sum _{|\overline{j\oplus {\overrightarrow{d}}^{(l)}}\rangle \ne |1{\rangle }^{\otimes n}}\,{\psi }_{j}|\overline{j\oplus {\overrightarrow{d}}^{(l)}}\rangle |0{\rangle }_{R}.$$
(4)
Figure 1
figure 1

Quantum circuit for FF-QRAM that writes bit strings \({\overrightarrow{d}}^{(l)}\) and \({\overrightarrow{d}}^{(l+\mathrm{1)}}\) as a quantum superposition state with probability amplitudes using multi-qubit controlled rotation gates determined by θ(l) and θ(l+1), respectively. The double lines indicate classically controlled operations, and the empty (filled) circle indicates that the gate is activated when the control bit (qubit) is 0 (1). The dotted and numbered arrows indicate the various steps described in the main text.

The overline in the last term indicates that the bit flip occurs if the control bit is 0. After step 1, the controlled qubit rotation, \({C}^{n}{R}_{y}({\theta }^{(l)})\), denoted as θ(l) in the figure, is applied to the register qubit. The quantum state at step 2 becomes

$$|{\psi }_{2}{\rangle }_{l}={\psi }_{{\overrightarrow{d}}^{(l)}}|1{\rangle }^{\otimes n}|{\theta }^{(l)}{\rangle }_{R}+\sum _{|\overline{j\oplus {\overrightarrow{d}}^{(l)}}\rangle \ne |1{\rangle }^{\otimes n}}\,{\psi }_{j}|\overline{j\oplus {\overrightarrow{d}}^{(l)}}\rangle |0{\rangle }_{R},$$
(5)

where \(|\theta \rangle =\,\cos \,\theta |0\rangle +\,\sin \,\theta |1\rangle \). The \(\bar{c}X\) gates conditioned on \({\overrightarrow{d}}^{(l)}\) are applied again to revert the bus state:

$$|{\psi }_{3}{\rangle }_{l}={\psi }_{{\overrightarrow{d}}^{(l)}}|{\overrightarrow{d}}^{(l)}\rangle |{\theta }^{(l)}{\rangle }_{R}+\sum _{j\ne {\overrightarrow{d}}^{(l)}}\,{\psi }_{j}|j\rangle |0{\rangle }_{R}.$$
(6)

The second round registers the next data \({\overrightarrow{d}}^{(l+\mathrm{1)}}\) and θ(l+1):

$$|{\psi }_{4}{\rangle }_{l,l+1}={\psi }_{{\overrightarrow{d}}^{(l)}}|{\overrightarrow{d}}^{(l)}\rangle |{\theta }^{(l)}{\rangle }_{R}+{\psi }_{{\overrightarrow{d}}^{(l+1)}}|{\overrightarrow{d}}^{(l+1)}\rangle |{\theta }^{(l+1)}{\rangle }_{R}+\sum _{j\ne {\overrightarrow{d}}^{(l)},{\overrightarrow{d}}^{(l+1)}}\,{\psi }_{j}|j\rangle |0{\rangle }_{R}.$$
(7)

This process can be repeated as many times as the number of data entries. In this way, M data entries can be registered with non-uniform weights to generate a state,

$$\sum _{l=0}^{M-1}\,{\psi }_{{\overrightarrow{d}}^{(l)}}|{\overrightarrow{d}}^{(l)}\rangle \,{[\cos {\theta }^{(l)}|0\rangle }_{R}+\,\sin \,{\theta }^{(l)}|1{\rangle }_{R}]+\sum _{j\notin \{{\overrightarrow{d}}^{(l)}\}}\,{\psi }_{j}|j\rangle |0{\rangle }_{R}.$$
(8)

Finally, the queried QDB derived from Eq. (1) can be obtained by selecting an appropriate angle θ(l) to match the desired probability amplitude bl, and post-selecting the measurement outcome \(\mathrm{|1}{\rangle }_{R}\). The probability to measure \(\mathrm{|1}{\rangle }_{R}\) is

$$P(1)=\sum _{l=0}^{M-1}\,|{\psi }_{{\overrightarrow{d}}^{(l)}}\,\sin \,{\theta }^{(l)}{|}^{2}.$$
(9)

The post-selection increases the total runtime by a factor of ~1/P(1), which is data dependent. In some instances, such as in the distance-based quantum classifier10, the post-selection success probability can be improved by pre-processing the classical data so that θ(l) is close to /2 for all l where k is an odd-integer.

For the quantum state containing data as equal superposition, the controlled rotations can be replaced with the controlled-NOT gate. Then, the classical data is encoded only in the digital form. If the bus qubit is not in the basis state that corresponds to a data entry \({\overrightarrow{d}}^{(j)}\), then the jth data entry cannot be written in the queried QDB. Moreover, when the same bit string appears more than once, the register qubit accumulates the rotation.

Updating desired data entries of the existing quantum database using our scheme is straightforward. The update can be done by inserting the QDB state as the bus qubit and addressing only the target basis states that are to be updated with the selective flip-flop process.

It is important to note that the post-selection process is not always necessary. For example, the post-selection is not needed when all or some of the bus qubit states are addressed to write or modify the binary data to generate the transformation,

$${\rm{QRAM}}(D)\,\sum _{j}\,{\psi }_{j}|j{\rangle }_{B}|{D}_{j}^{0}{\rangle }_{R}=\sum _{j}\,{\psi }_{j}|j{\rangle }_{B}|{D}_{j}{\rangle }_{R},$$
(10)

where \({D}_{j}^{0}\), \({D}_{j}\in \{0,1\}\). With r register qubits, this process can be easily generalized to encode r-bit data.

In most of the big data applications, real-valued data is encoded in the probability amplitude as discussed thus far. However, if desired, our method can also encode complex probability amplitudes by using a controlled rotation around an arbitrary axis.

In addition to O(Mn) flip-register-flop steps, the total FF-QRAM cost must include the resource overhead for the register operations. In fact, the number of elementary gates needed for this step can dominate the runtime of the entire QRAM process. Thus efficient realization of CnRp(θ) is critical for the practicality of our scheme. Though the optimal circuit depth reduction can be carried out based on the naturally available set of gates in a specific experimental setup, and is beyond the primary scope of this paper, we briefly mention some examples on how to implement CnRp(θ) here. If energy splittings between all pairs of the computational basis states are distinct, then in principle, a resonant pulse at the frequency corresponding to the energy difference between \(|1{\rangle }^{\otimes n}|0\rangle \) and \(\mathrm{|1}{\rangle }^{\otimes n+1}\) can realize the desired CnRp(θ). But this condition becomes exponentially challenging to satisfy in practice as the number of qubits increases. On the other hand, we can decompose the controlled rotation as \({C}^{n}{R}_{y}(\theta )={{\rm{C}}}^{n}{\rm{NOT}}\) Ry(θ/2) CnNOT \({R}_{y}^{\dagger }(\theta /\mathrm{2)}\). The CnNOT gate can be further decomposed into 2n − 3 Toffoli gates with n − 2 ancilla qubits prepared in \(\mathrm{|0}\rangle \) (see Methods). A Toffoli gate can be realized by applying a frequency-selective on-resonance pulse as described above if a set of three qubits is fully addressable while decoupled from the rest of the qubits in the system. Alternatively, a Toffoli gate can be decomposed into five two-qubit gates without requiring ancilla qubits16. Other methods for implementing CnNOT using O(n) number of elementary gates and ancillary space are discussed in refs16,17,18. The circuit optimization in terms of Clifford and T gates can be performed using the techniques presented in refs19,20.

We investigate the robustness of the FF-QRAM shown in Fig. 1 under imperfections using a simple but relevant error model. We assume a typical depolarizing error, in which the state at each time step becomes the maximally mixed classical state with probability ε, and remains unchanged with probability \(1-\varepsilon \). Here, we use the Toffoli gate as an example to count the number of time steps, while further gate decomposition and optimization can be required depending on the experimental setup as mentioned above. When implementing the CnNOT, 2n − 1 qubits undergo \(2\lceil {\mathrm{log}}_{2}(n)\rceil -1\) time steps. Therefore, the success probability after writing M classical bit strings of length n with arbitrary probability amplitudes is \({(1-\varepsilon )}^{O(Mn{\mathrm{log}}_{2}(n))}\). As an illustrative example, solid lines in Fig. 2 show the individual error rate at each time step necessary for writing M classical bit strings with arbitrary probability amplitudes, assuming \(n={\mathrm{log}}_{2}(M)\) without loss of generality, with the success probability ps of the total QRAM process. A milder assumption that the imperfect CnRp(θ) operation causes independent errors on n + 1 qubits yields a better success probability, \({(1-\varepsilon )}^{O(Mn)}\). This case is plotted as dashed lines with open symbols in the figure. The Methods section elaborates on how the number of noisy operations is counted.

Figure 2
figure 2

Error rate at each time step required for writing M classical data and \(n={\mathrm{log}}_{2}(M)\) with the success probability ps. Solid lines with filled symbols are obtained when CnRp(θ) can fail at any of its decomposed locations independently, while dashed lines with open symbols are obtained when the multi-qubit controlled rotation only causes independent errors on n + 1 qubit, as described in the main text.

Since our scheme is based on the quantum circuit model, fault-tolerant quantum error correction techniques21,22,23 can be employed to enhance the accuracy. In contrast, if quantum error correction is applied to BB-QRAM, all routing components are activated at a physical level and make the scheme equivalent to the conventional fanout architecture24. In addition, depending on the physical setup, the quantum circuit can be further optimized using various gate decomposition techniques16,19,25,26.

Application to quantum support vector machine

As an example, we demonstrate the FF-QRAM process for preparing a training data state in the quantum support vector machine. The classified training examples, \({\overrightarrow{x}}^{(i)}\in {{\mathbb{R}}}^{N}\), need to be encoded in the quantum format as

$$|\chi \rangle =\frac{1}{\sqrt{{\sum }_{i=0}^{M-1}\,|{\overrightarrow{x}}^{(i)}{|}^{2}}}\,\sum _{i=0}^{M-1}\,\sum _{k=0}^{N-1}\,{x}_{k}^{(i)}|k\rangle |i\rangle ,$$
(11)

where M denotes the number of training data samples8. The quantum circuit for preparing this QDB is depicted in Fig. 3 for a particular training data with an index i, where without loss of generality, M and N are assumed to be powers of two. An equal superposition state for log2(M) and log2(N) qubits is used as the bus qubit for providing the computational basis states k and i, respectively. The jth element of \({\overrightarrow{x}}^{(i)}\) determines the angle \({\theta }_{j}^{(i)}\) for the y-axis rotation of the register qubit. Note that the gates shaded in gray are included only to show the full flip-flop process for addressing a specific computational basis state, but they can be omitted in the implementation. The state shown in Eq. (11) is obtained after post-selecting the measurement outcome \(\mathrm{|1}{\rangle }_{R}\) on the register qubit. This scheme constructs the QDB shown in Eq. (11) using \(O({\mathrm{log}}_{2}(MN))\) hardware resources and O(MN) flip-register-flop operations.

Figure 3
figure 3

Quantum circuit for preparing the QDB for a quantum support vector machine. The gates are shown for writing an ith training data only. The gates shaded in gray are added solely for illustrating the flip-flop process, and are not implemented in practice.

Quantum forking

Here, we introduce a concept of quantum forking (QF) with which a qubit can undergo independent processes in superposition. This can be utilized as a means to reduce the number of QRAM queries in certain applications. Let us consider a quantum state \(|{{\rm{\Psi }}}_{0}\rangle =|0\rangle |{\rm{\Phi }}\rangle |a\rangle \) with an n-qubit QDB state \(|{\rm{\Phi }}\rangle \) generated by a QRAM process and an arbitrary n-qubit state \(|a\rangle \), where \(|{{\rm{\Psi }}}_{s}\rangle \) denotes the state at step s in Fig. 4(a). An n-qubit swap gate between \(|{\rm{\Phi }}\rangle \) and \(|a\rangle \) controlled by a qubit in \((|0\rangle +|1\rangle )/\sqrt{2}\) forms an entangled state,

$$|{{\rm{\Psi }}}_{1}\rangle =\frac{1}{\sqrt{2}}\,(|0\rangle |{\rm{\Phi }}\rangle |a\rangle +|1\rangle |a\rangle |{\rm{\Phi }}\rangle ).$$
(12)
Figure 4
figure 4

(a) Quantum forking to reduce the number of QRAM calls for evaluating the functions of inner products. The swap operation is represented by two × symbols connected with a vertical line. (b) Quantum forking circuit for evolving the target state \(|{\rm{\Phi }}\rangle \) under three different unitary operations in each branch (subspace). Thicker horizontal lines indicate multi-qubit channels, \(|a\rangle \) is an arbitrary ancilla state, and H (H3) represents the Hadamard gate on a qubit (qutrit).

In other words, the QDB is encoded in the first n-qubit data block if the control qubit is 0, and in the second n-qubit data block if the control qubit is 1. Then by applying two unitary evolutions activated by different computational basis states of the control qubit to each n-qubit block, \(|{\rm{\Phi }}\rangle \) forks into two different states in superposition:

$$|{{\rm{\Psi }}}_{2}\rangle =\frac{1}{\sqrt{2}}\,(|0\rangle {U}_{1}|{\rm{\Phi }}\rangle |a\rangle +|1\rangle |a\rangle {U}_{2}|{\rm{\Phi }}\rangle ).$$
(13)

Evidently, it is not possible to create correlations between \(|{{\rm{\Phi }}}_{1}\rangle ={U}_{1}|{\rm{\Phi }}\rangle \) and \(|{{\rm{\Phi }}}_{2}\rangle ={U}_{2}|{\rm{\Phi }}\rangle \) via linear operations. Nonetheless, QF can speedup certain tasks, such as ensemble averaging27 and the inner product calculation. Here we focus on the inner product evaluation problem as an example. The inner product between \(|{{\rm{\Phi }}}_{1}\rangle \) and \(|{{\rm{\Phi }}}_{2}\rangle \) can be evaluated by preparing these two states individually by making queries to the QRAM and performing the swap test28. Alternatively, starting from the state shown in Eq. (13), another controlled swap followed by a Hadamard operation on the control qubit yields the state

$$|{{\rm{\Psi }}}_{3}\rangle =\frac{1}{2}[|0\rangle \,(|{{\rm{\Phi }}}_{1}\rangle +|{{\rm{\Phi }}}_{2}\rangle )+|1\rangle \,(|{{\rm{\Phi }}}_{1}\rangle -|{{\rm{\Phi }}}_{2}\rangle )]\,|a\rangle .$$
(14)

Finally, the probability of measuring the control qubit in \(\mathrm{|0}\rangle \) is given by \(P(0)=[1+{\rm{Re}}(\langle {{\rm{\Phi }}}_{1}|{{\rm{\Phi }}}_{2}\rangle )]/2\). This procedure only reveals the real part of the inner product. The imaginary part can be evaluated by adding a phase gate to the control qubit in front of the final Hadamard gate. Since the ancilla qubit is in an arbitrary state, QRAM is used only for preparing \(|{\rm{\Phi }}\rangle \) once. The ancilla qubit can even be in the maximally mixed state, and we assume that the cost of preparing such a state is negligible. This method consumes O(n) additional gates, but reduces the number of QRAM queries by a factor of ~1/2. Note that the conventional swap test can only estimate the magnitude of the inner product. Thus the QF based approach not only reduces the number of QRAM queries, but also allows for the determination of the sign of the inner product. This is a consequence of an important property of the QF circuit; since different unitary operators can be applied to each branch (subspace), the global phase that a unitary operator introduces become distinguishable. Clearly, the quantum circuit shown on the left side of Fig. 4(a) can be rewritten more compactly without the controlled swap gates and the ancilla qubit by applying both controlled unitary operators directly to the data qubit as shown on the right side. The quantum circuit on the left illustrates the general quantum forking framework that can also be adapted for other applications by replacing the swap test with other measurement schemes27.

Generalizing above idea, a quantity such as \({\sum }_{1\le i,j\le d}\,{\rm{Re}}(\langle {{\rm{\Phi }}}_{i}|{{\rm{\Phi }}}_{j}\rangle )\) can be evaluated by repeating the modified swap test for which only one QRAM state preparation is needed. The modified swap test based on QF requires a control qudit of dimension d (or log2(d) qubits), and O(nd) gates.

The quantum forking for implementing three different unitary processes in superposition is depicted in Fig. 4(b). A qutrit is used as the control, and H3 represents the qutrit Hadamard operation for preparing the equal superposition of the three computational basis states. This circuit produces an entangled state \((|0\rangle |{{\rm{\Phi }}}_{1}\rangle |a\rangle |a\rangle +|1\rangle |a\rangle |{{\rm{\Phi }}}_{2}\rangle |a\rangle +|2\rangle |a\rangle |a\rangle |{{\rm{\Phi }}}_{3}\rangle )\)/\(\sqrt{3}\).

Discussion

Encoding large classical data into a quantum database must be done efficiently in a way that the potential advantages of the quantum algorithms for big data applications do not vanish. We proposed the flip-flop QRAM, a systematic architecture, for preparing a quantum database using the quantum circuit model. The circuit-based construction is imperative since it provides flexibility and compatibility with existing quantum computing techniques. Our process can register n-bit classical data with arbitrary probability amplitudes stored in M memory cells into quantum format using O(n) qubits and O(Mn) flip-register-flop steps. The versatility of the architecture allows to create a complex data structure via encoding any classical information, either discrete or continuous, as quantum bits or as probability amplitudes of a quantum state. An example of the amplitude encoding is the application to a quantum state generation for a quantum support vector machine algorithm in which the training data is represented with the probability amplitudes as shown in Fig. 3. Qubit encoding can be achieved by beginning with the quantum bus state as \(|\psi {\rangle }_{B}=|+{\rangle }^{\otimes n}\), and inserting the weights (e.g., the normalized occurrence) of the data by adjusting the multi-qubit controlled rotation CnRp(θ). For the uniform weight which is, for example, encountered in the parity learning algorithm29,30, the multi-qubit controlled gate is simply CnNOT. For the amplitude encoding, the final QDB state is obtained by post-selecting on the register qubit being \(\mathrm{|1}{\rangle }_{R}\). Hence the amplitude encoding introduces additional resource overhead for repeating the entire algorithm. However, for some tasks, the classical data can be pre-processed to increase the success probability of the post-selection. It is an interesting open problem whether the post-selection can be avoided in certain amplitude encoding schemes by utilizing the fact that the probability amplitudes are determined by cosines instead of sines if the register qubit is \(\mathrm{|0}{\rangle }_{R}\). Also, the post-selection can be avoided for the qubit encoding if the bus qubit state only contains the basis states that are to be queried and all classical data are registered with an equal weight. Note that BB-QRAM also employs the post-selection for preparing an arbitrary QDB state with the amplitude encoding. With some limitations, the amplitude encoding can be done without relying on the post-selection. For example, ref.20 introduces a procedure inspired by classical alias sampling to assign the probability amplitude \(\sqrt{\tilde{\rho }}\) using 2μ + 2n + O(1) ancilla qubits, where \(\tilde{\rho }\) is μ-bit binary approximation to the desired non-negative real value, ρ. In ref.11, adiabatic-diabatic state preparation is used to generate superposition states with squared amplitudes.

We point out potential solutions to several issues for meaningful applications of QRAM. First, when the FF-QRAM process leaves the last term in Eq. (8) that corresponds to the states without data entries, the rate of producing the desired QDB can be reduced. This issue can be partially circumvented by running L identical QRAM processes in parallel. Then the success probability of the post-selection improves by a factor of L while also increasing the number of qubits and the gates by the same factor. Note that the time complexity remains the same. Second, the QDB is not reusable once it is consumed by a quantum algorithm since the measurement collapses the state. Motivated by the above, we introduced the concept of quantum forking that allows to reduce the number of QRAM queries in some instances, in particular, when evaluating the inner product. Finding other instances for which the quantum forking can reduce the number of QRAM calls, even by a constant amount, remains an interesting open problem.

Methods

Error analysis

The CnNOT gate can be decomposed into a Cn−1NOT and a C2NOT (Toffoli) using an ancilla qubit prepared in \(\mathrm{|0}\rangle \) as shown in step (1) of Fig. 5 for n = 4 as an example. Then by recursion, CnNOT can be realized using 2n − 3 Toffoli gates and n − 2 ancilla qubits prepared in \(\mathrm{|0}\rangle \) (step (2) in Fig. 5). Note that n − 2 Toffoli gates are added after the Toffoli gate for conditionally flipping the target qubit in order to unentangle the ancilla qubits from the system. The quantum circuit can be rearranged to further reduce the depth as shown in the last step in Fig. 5.

Figure 5
figure 5

Quantum circuit for implementing a C4NOT gate using 5 Toffoli gates and 2 ancilla qubits prepared in \(\mathrm{|0}\rangle \) in 3 steps. The control bits are a, b, c and d, and the target qubit is \(|t\rangle \). The gates after the Toffoli gate for conditionally flipping the target qubit uncompute the ancila qubits in order to unentangle the ancillae from the system.

We assume that the \(\bar{c}X\) operations can occur simultaneously on all target qubits, but even when the control bit is 1, the target qubit may undergo an erroneous identity operation. To reduce the circuit depth, the \(\bar{c}X\) operations after CnRy(θ(l)) and before CnRy(θ(l+1)) can be merged. The combined operation flips the jth qubit only if \({d}_{j}^{(l)}\oplus {d}_{j}^{(l+1)}=1\), and otherwise does nothing, for \(j=0,\ldots ,n-1\). Thus, the total number of single qubit gates used for writing M classical data of length n is n(M + 1). Each CnRy(θ) uses two single qubit gates and two CnNOT gates. In the CnNOT implementation described above (Fig. 5), 2n − 1 qubits (n control qubits + 1 target qubit + n − 2 ancillae) undergo \(2\lceil {\mathrm{log}}_{2}(n)\rceil -1\) time steps.

Therefore, the total number of time steps τ that are subject to noise can be counted as

$$\tau =2M[(2n-1)\,(2\lceil {\mathrm{log}}_{2}\,(n)\rceil -1)+1]+n(M+1).$$
(15)

If CnRy(θ) can be implemented with only n + 1 independent errors, then τ can be further reduced to \((n+1)M+n(M+1)\).