Circuit-Based Quantum Random Access Memory for Classical Data

A prerequisite for many quantum information processing tasks to truly surpass classical approaches is an efficient procedure to encode classical data in quantum superposition states. In this work, we present a circuit-based flip-flop quantum random access memory to construct a quantum database of classical information in a systematic and flexible way. For registering or updating classical data consisting of M entries, each represented by n bits, the method requires O(n) qubits and O(Mn) steps. With post-selection at an additional cost, our method can also store continuous data as probability amplitudes. As an example, we present a procedure to convert classical training data for a quantum supervised learning algorithm to a quantum state. Further improvements can be achieved by reducing the number of state preparation queries with the introduction of quantum forking.

The theory of quantum information processing promises to accelerate certain computational tasks substantially. In practice, the computational cost of generating an arbitrary quantum input state 1,2 must be addressed to ensure the speedup. The ability to efficiently convert classical data into quantum states is essential in many algorithms with complex data sets, such as quantum searching 3 , collision finding 4 , and quantum Fourier transform 5 . The demand for such ability has continued to grow with recent discoveries of quantum algorithms for data analysis and machine learning applications with large classical data [6][7][8][9][10] . Quantum simulation also requires the preparation of a quantum register in the initial physical state of the simulated system 11 . One promising avenue is to use quantum random access memory (QRAM) 12 , a device that stores either classical or quantum data with the ability to query the data with respect to superposition of addresses. The bucket brigade (BB) model for QRAM proposed in refs 12 time steps. A critical assumption for the practicality of this scheme is that the inactive routing elements do not render noticeable errors.
Since quantum operations are applied directly to the qubits that form a QRAM as the state preparation is followed by a quantum algorithm, it is favorable to build a QRAM based on the quantum circuit model. At the same time, a QRAM should be a good interface to classical data for big data applications. In this work, we propose a flip-flop (FF) QRAM, which is constructed with the standard circuit-based quantum computation and without relying on a routing algorithm. The FF-QRAM can read unsorted classical data stored in memory cells, and superpose them in the computational basis states with non-uniform probability amplitudes to create a specific input state required in a quantum algorithm. Also, the classical information stored in the quantum state can easily be updated with the same FF-QRAM process. The cost for writing or modifying M classical data represented as $D = \{(\vec{d}^{(l)}, b_l)\}$, where $\vec{d}^{(l)}$ represents n bits of information and $b_l$ is the attribute of $\vec{d}^{(l)}$, is O(n) qubits and O(Mn) quantum operations that are commonly found in many known algorithms. The probability amplitudes can be modified by post-selection at an additional cost of repeating the process and single qubit measurement. In addition, the FF-QRAM architecture can serve as a building block for the classical-quantum interface.
A quantum state prepared by QRAM as an input to a specific algorithm cannot be reused once measured (the quantum measurement postulate), nor be copied (the no-cloning theorem) for another task. Thus, in general, the QRAM cost seems unavoidable per algorithm run, even when performing a set of algorithms with an identical input state. Here we introduce a process of quantum forking inspired by process forking in computer operating systems, which creates a child process that can evolve independently 14 . Quantum forking is a framework log ( ) 2 qubits encode the data and the label, respectively. The number of label qubits can be reduced by labelling only the data that appears more than once. The label qubits are unnecessary when all data entries, $\vec{x}^{(l)}$, are unique.
Flip-flop QRAM. The FF-QRAM is used to generate a QDB as follows. Consider a quantum computer with an (n + m)-qubit bus state to encode a big data class database. The bus qubit state can be arbitrary, and it defines which computational basis states, $|j\rangle_B$, are accessed with probability amplitudes, $\psi_j$. A QRAM operation on the bus qubit superposes a set of classical data D as

$$\mathrm{QRAM}(D) \sum_j \psi_j |j\rangle_B |0\rangle_R \equiv \sum_l \psi_l |\vec{d}^{(l)}\rangle_B |b_l\rangle_R, \qquad (1)$$

where the subscript B (R) indicates the bus (register) qubit, and the register qubit can include the probability amplitudes for encoding the analog data. The FF-QRAM is implemented systematically with standard quantum circuit elements, which include the classically-controlled Pauli X gate, cX, and the n-qubit controlled rotation gate, $C^nR_p(\theta)$. The cX flips the target qubit only when the classical control bit is zero. The $C^nR_p(\theta)$ gate rotates the target qubit by θ around the p-axis of the Bloch sphere only if all n control qubits are 1.
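The underlying idea of the FF-QRAM model is depicted in Fig. 1, describing the procedure to superpose two independent bit strings $\vec{d}^{(l)}$ and $\vec{d}^{(l+1)}$ with target probability amplitudes in the bus qubit state, $|\psi\rangle_B$. In this example, the label qubits, $|l\rangle$, are omitted without loss of generality to deliver the main idea. The initial state can be expressed with a focus on $\vec{d}^{(l)}$ as

$$|\psi_0\rangle_l = \psi_{\vec{d}^{(l)}} |\vec{d}^{(l)}\rangle |0\rangle_R + \sum_{j \neq \vec{d}^{(l)}} \psi_j |j\rangle |0\rangle_R,$$

where $|\psi_s\rangle_l$ denotes the state of the (n + 1) qubits in the process of writing the lth data entry, observed at the sth step in Fig. 1. In step 1, the cX gates controlled by the classical bits of $\vec{d}^{(l)}$ flip the bus qubits so that $|\vec{d}^{(l)}\rangle$ becomes $|1\rangle^{\otimes n}$:

$$|\psi_1\rangle_l = \psi_{\vec{d}^{(l)}} |1\rangle^{\otimes n} |0\rangle_R + \sum_{j \neq \vec{d}^{(l)}} \psi_j |\overline{j \oplus \vec{d}^{(l)}}\rangle |0\rangle_R.$$

The overline in the last term indicates that the bit flip occurs if the control bit is 0. After step 1, the controlled qubit rotation, $C^nR_y(\theta^{(l)})$, denoted as $\theta^{(l)}$ in the figure, is applied to the register qubit. The quantum state at step 2 becomes

$$|\psi_2\rangle_l = \psi_{\vec{d}^{(l)}} |1\rangle^{\otimes n} |\theta^{(l)}\rangle_R + \sum_{j \neq \vec{d}^{(l)}} \psi_j |\overline{j \oplus \vec{d}^{(l)}}\rangle |0\rangle_R,$$

where $|\theta\rangle = \cos\theta\,|0\rangle + \sin\theta\,|1\rangle$. In step 3, the cX gates are applied again to revert the bus state:

$$|\psi_3\rangle_l = \psi_{\vec{d}^{(l)}} |\vec{d}^{(l)}\rangle |\theta^{(l)}\rangle_R + \sum_{j \neq \vec{d}^{(l)}} \psi_j |j\rangle |0\rangle_R.$$

The second round registers the next data $\vec{d}^{(l+1)}$ and $\theta^{(l+1)}$:

$$|\psi_3\rangle_{l+1} = \psi_{\vec{d}^{(l)}} |\vec{d}^{(l)}\rangle |\theta^{(l)}\rangle_R + \psi_{\vec{d}^{(l+1)}} |\vec{d}^{(l+1)}\rangle |\theta^{(l+1)}\rangle_R + \sum_{j \neq \vec{d}^{(l)}, \vec{d}^{(l+1)}} \psi_j |j\rangle |0\rangle_R.$$

This process can be repeated as many times as the number of data entries. In this way, M data entries can be registered with non-uniform weights to generate a state,

$$\sum_{l=0}^{M-1} \psi_{\vec{d}^{(l)}} |\vec{d}^{(l)}\rangle \left[\cos\theta^{(l)} |0\rangle_R + \sin\theta^{(l)} |1\rangle_R\right] + \sum_{j \notin \{\vec{d}^{(l)}\}} \psi_j |j\rangle |0\rangle_R. \qquad (8)$$

Finally, the queried QDB derived from Eq. (1) can be obtained by selecting an appropriate angle $\theta^{(l)}$ to match the desired probability amplitude $b_l$, and post-selecting the measurement outcome $|1\rangle_R$. The probability to measure $|1\rangle_R$ is

$$P(1) = \sum_{l=0}^{M-1} \left|\psi_{\vec{d}^{(l)}} \sin\theta^{(l)}\right|^2.$$

The post-selection increases the total runtime by a factor of ~1/P(1), which is data dependent. In some instances, such as in the distance-based quantum classifier 10 , the post-selection success probability can be improved by pre-processing the classical data so that $\theta^{(l)}$ is close to kπ/2 for all l, where k is an odd integer.

Figure 1.
FF-QRAM circuit that writes the bit strings $\vec{d}^{(l)}$ and $\vec{d}^{(l+1)}$ as a quantum superposition state with probability amplitudes using multi-qubit controlled rotation gates determined by $\theta^{(l)}$ and $\theta^{(l+1)}$, respectively. The double lines indicate classically controlled operations, and the empty (filled) circle indicates that the gate is activated when the control bit (qubit) is 0 (1). The dotted and numbered arrows indicate the various steps described in the main text.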
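The flip-register-flop procedure can be checked numerically with a small state-vector sketch. The code below is an illustration under stated assumptions, not the authors' implementation: the function name `ff_qram` and its arguments are hypothetical, bit strings are passed as integers, and the register rotation is taken to map $|0\rangle$ to $\cos\theta\,|0\rangle + \sin\theta\,|1\rangle$, which matches the statement that angles near odd multiples of π/2 maximise the post-selection probability.

```python
import numpy as np

def ff_qram(bus_amps, entries, n):
    """Simulate flip-register-flop writes on an n-qubit bus plus one register qubit.

    bus_amps : length-2**n array of bus amplitudes psi_j
    entries  : list of (d, theta) pairs; d is an n-bit string given as an int
    Returns an array of shape (2**n, 2); state[j, r] is the amplitude of |j>_B |r>_R.
    """
    state = np.zeros((2**n, 2), dtype=complex)
    state[:, 0] = bus_amps                 # register qubit starts in |0>
    all_ones = 2**n - 1
    for d, theta in entries:
        mask = all_ones ^ d                # bus qubits whose classical control bit is 0
        perm = np.arange(2**n) ^ mask      # |j> <-> |j XOR mask>
        state = state[perm]                # flip: |d> is moved to |1...1>
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s], [s, c]])  # assumed convention: |0> -> cos|0> + sin|1>
        state[all_ones] = rot @ state[all_ones]  # register: rotate only on |1...1>
        state = state[perm]                # flop: undo the bit flips
    return state

# Usage: two entries written into a uniform two-qubit bus state.
n = 2
bus = np.full(2**n, 0.5)
entries = [(0b01, np.pi / 2), (0b10, np.pi / 6)]
state = ff_qram(bus, entries, n)
p1 = np.sum(np.abs(state[:, 1]) ** 2)      # probability of measuring the register in |1>
qdb = state[:, 1] / np.sqrt(p1)            # bus state after post-selecting |1>_R
```

In this sketch the expected number of repetitions for a successful post-selection is 1/`p1`. Writing the same bit string twice simply applies the rotation twice, so the angles accumulate.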
For the quantum state containing data as equal superposition, the controlled rotations can be replaced with the controlled-NOT gate. Then, the classical data is encoded only in the digital form. If the bus qubit is not in the basis state that corresponds to a data entry $\vec{d}^{(j)}$, then the jth data entry cannot be written in the queried QDB. Moreover, when the same bit string appears more than once, the register qubit accumulates the rotation.
Updating desired data entries of the existing quantum database using our scheme is straightforward. The update can be done by inserting the QDB state as the bus qubit and addressing only the target basis states that are to be updated with the selective flip-flop process.
It is important to note that the post-selection process is not always necessary. For example, the post-selection is not needed when all or some of the bus qubit states are addressed to write or modify the binary data to generate the transformation, $\sum_j \psi_j |j\rangle_B |0\rangle_R \rightarrow \sum_j \psi_j |j\rangle_B |D_j\rangle_R$, where $D_j \in \{0, 1\}$ denotes the bit written at address j. With r register qubits, this process can be easily generalized to encode r-bit data. In most of the big data applications, real-valued data is encoded in the probability amplitude as discussed thus far. However, if desired, our method can also encode complex probability amplitudes by using a controlled rotation around an arbitrary axis.
In addition to O(Mn) flip-register-flop steps, the total FF-QRAM cost must include the resource overhead for the register operations. In fact, the number of elementary gates needed for this step can dominate the runtime of the entire QRAM process. Thus efficient realization of $C^nR_p(\theta)$ is critical for the practicality of our scheme. Though the optimal circuit depth reduction can be carried out based on the naturally available set of gates in a specific experimental setup, and is beyond the primary scope of this paper, we briefly mention some examples on how to implement $C^nR_p(\theta)$ here. If energy splittings between all pairs of the computational basis states are distinct, then in principle, a resonant pulse at the frequency corresponding to the energy difference between $|1\rangle^{\otimes n}|0\rangle$ and $|1\rangle^{\otimes n+1}$ can realize the desired $C^nR_p(\theta)$. But this condition becomes exponentially challenging to satisfy in practice as the number of qubits increases. On the other hand, we can decompose the controlled rotation as

$$C^nR_p(\theta) = R_p(\theta/2)\; C^n\mathrm{NOT}\; R_p(-\theta/2)\; C^n\mathrm{NOT},$$

where the single-qubit rotations act on the target qubit and the axis p is perpendicular to x (for example, p = y). The $C^n$NOT gate can be further decomposed into 2n − 3 Toffoli gates with n − 2 ancilla qubits prepared in $|0\rangle$ (see Methods). A Toffoli gate can be realized by applying a frequency-selective on-resonance pulse as described above if a set of three qubits is fully addressable while decoupled from the rest of the qubits in the system. Alternatively, a Toffoli gate can be decomposed into five two-qubit gates without requiring ancilla qubits 16 . Other methods for implementing $C^n$NOT using O(n) elementary gates and ancillary space are discussed in refs [16][17][18] . The circuit optimization in terms of Clifford and T gates can be performed using the techniques presented in refs 19,20 .
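The decomposition of the controlled rotation into two single-qubit rotations and two $C^n$NOT gates can be verified with explicit matrices. The sketch below assumes p = y and the standard convention $R_y(\alpha) = \exp(-i\alpha Y/2)$; the helper names `ry`, `controlled_on_all_ones` and `decomposed_cn_ry` are hypothetical. It relies on the identity $X R_y(\beta) X = R_y(-\beta)$, so the two half rotations add up when the controls are all 1 and cancel otherwise.

```python
import numpy as np

def ry(alpha):
    """Standard single-qubit rotation R_y(alpha) = exp(-i * alpha * Y / 2)."""
    c, s = np.cos(alpha / 2), np.sin(alpha / 2)
    return np.array([[c, -s], [s, c]])

def controlled_on_all_ones(gate, n):
    """n-qubit-controlled single-qubit gate; the target is the last qubit."""
    full = np.eye(2 ** (n + 1))
    full[-2:, -2:] = gate          # acts only when all n control qubits are 1
    return full

def decomposed_cn_ry(theta, n):
    """C^n R_y(theta) built from two C^n NOT gates and two target rotations."""
    x = np.array([[0.0, 1.0], [1.0, 0.0]])
    cn_not = controlled_on_all_ones(x, n)

    def on_target(g):
        # Single-qubit gate on the target, identity on the n controls.
        return np.kron(np.eye(2 ** n), g)

    # The rightmost factor acts first.
    return on_target(ry(theta / 2)) @ cn_not @ on_target(ry(-theta / 2)) @ cn_not

# Usage: compare with the directly constructed controlled rotation for n = 3.
matches = np.allclose(decomposed_cn_ry(0.7, 3), controlled_on_all_ones(ry(0.7), 3))
```

The same check fails for p = x, because X commutes with rotations about the x-axis; this is why the sketch restricts the axis.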
We investigate the robustness of the FF-QRAM shown in Fig. 1 under imperfections using a simple but relevant error model. We assume a typical depolarizing error, in which the state at each time step becomes the maximally mixed classical state with probability ε, and remains unchanged with probability 1 − ε. Here, we use the Toffoli gate as an example to count the number of time steps, while further gate decomposition and optimization can be required depending on the experimental setup as mentioned above. When implementing the $C^n$NOT, 2n − 1 qubits undergo $2\lceil \log_2(n) \rceil - 1$ time steps. Therefore, the success probability after writing M classical bit strings of length n with arbitrary probability amplitudes is

$$p_s = (1 - \varepsilon)^{\,n(M+1) + 2M\left[1 + (2n-1)\left(2\lceil \log_2(n) \rceil - 1\right)\right]}.$$

As an illustrative example, solid lines in Fig. 2 show the individual error rate at each time step necessary for writing M classical bit strings with arbitrary probability amplitudes, assuming $n = \log_2(M)$ without loss of generality, with the success probability $p_s$ of the total QRAM process. A milder assumption that the imperfect $C^nR_p(\theta)$ operation causes independent errors on n + 1 qubits yields a better success probability, $p_s = (1 - \varepsilon)^{\,n(M+1) + M(n+1)}$. This case is plotted as dashed lines with open symbols in the figure. The Methods section elaborates on how the number of noisy operations is counted.
Since our scheme is based on the quantum circuit model, fault-tolerant quantum error correction techniques [21][22][23] can be employed to enhance the accuracy. In contrast, if quantum error correction is applied to BB-QRAM, all routing components are activated at a physical level and make the scheme equivalent to the conventional fanout architecture 24 . In addition, depending on the physical setup, the quantum circuit can be further optimized using various gate decomposition techniques 16,19,25,26 .
Application to quantum support vector machine. As
Quantum forking. Here, we introduce a concept of quantum forking (QF) with which a qubit can undergo independent processes in superposition. This can be utilized as a means to reduce the number of QRAM queries in certain applications. Let us consider a quantum state $|\Psi_0\rangle = |0\rangle|\Phi\rangle|a\rangle$ with an n-qubit QDB state $|\Phi\rangle$ generated by a QRAM process and an arbitrary n-qubit state $|a\rangle$, where $|\Psi_s\rangle$ denotes the state at step s in Fig. 4(a). An n-qubit swap gate between $|\Phi\rangle$ and $|a\rangle$ controlled by a qubit in $(|0\rangle + |1\rangle)/\sqrt{2}$ forms an entangled state,

$$|\Psi_1\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle|\Phi\rangle|a\rangle + |1\rangle|a\rangle|\Phi\rangle\right).$$

In other words, the QDB is encoded in the first n-qubit data block if the control qubit is 0, and in the second n-qubit data block if the control qubit is 1. Then by applying two unitary evolutions activated by different computational basis states of the control qubit to each n-qubit block, $|\Phi\rangle$ forks into two different states, $|\Phi_1\rangle$ and $|\Phi_2\rangle$, in superposition:

$$|\Psi_2\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle|\Phi_1\rangle|a\rangle + |1\rangle|a\rangle|\Phi_2\rangle\right), \qquad (13)$$

via linear operations. Nonetheless, QF can speed up certain tasks, such as ensemble averaging 27 and the inner product calculation. Here we focus on the inner product evaluation problem as an example. The inner product between $|\Phi_1\rangle$ and $|\Phi_2\rangle$ can be evaluated by preparing these two states individually by making queries to the QRAM and performing the swap test 28 . Alternatively, starting from the state shown in Eq. (13), another controlled swap followed by a Hadamard operation on the control qubit yields the state

$$\frac{1}{2}\left[|0\rangle\left(|\Phi_1\rangle + |\Phi_2\rangle\right) + |1\rangle\left(|\Phi_1\rangle - |\Phi_2\rangle\right)\right]|a\rangle.$$

Finally, the probability of measuring the control qubit in $|0\rangle$ is given by $P(0) = [1 + \mathrm{Re}(\langle\Phi_1|\Phi_2\rangle)]/2$. This procedure only reveals the real part of the inner product. The imaginary part can be evaluated by adding a phase gate to the control qubit in front of the final Hadamard gate. Since the ancilla qubit is in an arbitrary state, QRAM is used only for preparing $|\Phi\rangle$ once. The ancilla qubit can even be in the maximally mixed state, and we assume that the cost of preparing such a state is negligible. This method consumes O(n) additional gates, but roughly halves the number of QRAM queries. Note that the conventional swap test can only estimate the magnitude of the inner product. Thus the QF based approach not only reduces the number of QRAM queries, but also allows for the determination of the sign of the inner product. This is a consequence of an important property of the QF circuit: since different unitary operators can be applied to each branch (subspace), the global phase that a unitary operator introduces becomes distinguishable. Clearly, the quantum circuit shown on the left side of Fig. 4(a) can be rewritten more compactly without the controlled swap gates and the ancilla qubit by applying both controlled unitary operators directly to the data qubit as shown on the right side. The quantum circuit on the left illustrates the general quantum forking framework that can also be adapted for other applications by replacing the swap test with other measurement schemes 27 .
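The forking-based inner product estimate can be reproduced with a short state-vector sketch. The code below is illustrative only: `forked_swap_test` and `random_unitary` are hypothetical names, the two branch evolutions are passed as explicit matrices `u1` and `u2`, and the state is stored as a tensor indexed by (control, first block, second block). It starts from the state after the first controlled swap and returns the probability of finding the control qubit in $|0\rangle$.

```python
import numpy as np

def forked_swap_test(phi, a, u1, u2):
    """Return P(0) for the quantum-forking inner-product circuit.

    phi    : QDB state |Phi>, prepared once (length d)
    a      : arbitrary ancilla state |a> (length d)
    u1, u2 : d x d unitaries applied in the control-0 and control-1 branches
    """
    d = len(phi)
    psi = np.zeros((2, d, d), dtype=complex)   # psi[c, x, y]
    # After the first controlled swap: (|0>|Phi>|a> + |1>|a>|Phi>) / sqrt(2)
    psi[0] = np.outer(phi, a) / np.sqrt(2)
    psi[1] = np.outer(a, phi) / np.sqrt(2)
    # Branch-dependent evolutions: u1 on block 1 (control 0), u2 on block 2 (control 1)
    psi[0] = u1 @ psi[0]
    psi[1] = psi[1] @ u2.T
    # Second controlled swap exchanges the two blocks in the control-1 branch
    psi[1] = psi[1].T.copy()
    # Hadamard on the control qubit; keep the control-0 component
    zero_branch = (psi[0] + psi[1]) / np.sqrt(2)
    return float(np.sum(np.abs(zero_branch) ** 2))

rng = np.random.default_rng(0)

def random_unitary(d):
    """Unitary from the QR decomposition of a random complex matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q

# Usage: compare with [1 + Re(<Phi_1|Phi_2>)] / 2 for random inputs.
d = 4
phi = rng.normal(size=d) + 1j * rng.normal(size=d)
phi /= np.linalg.norm(phi)
a = rng.normal(size=d) + 1j * rng.normal(size=d)
a /= np.linalg.norm(a)
u1, u2 = random_unitary(d), random_unitary(d)
p0 = forked_swap_test(phi, a, u1, u2)
expected = (1 + np.vdot(u1 @ phi, u2 @ phi).real) / 2
```

The result does not depend on the choice of `a`, in line with the remark that the ancilla can be in an arbitrary state.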
Generalizing the above idea, a quantity such as $\sum_{i,j} \mathrm{Re}(\langle\Phi_i|\Phi_j\rangle)$ can be evaluated by repeating the modified swap test for which only one QRAM state preparation is needed. The modified swap test based on QF requires a control qudit of dimension d (or $\log_2(d)$ qubits), and O(nd) gates.

Discussion
Encoding large classical data into a quantum database must be done efficiently in a way that the potential advantages of the quantum algorithms for big data applications do not vanish. We proposed the flip-flop QRAM, a systematic architecture for preparing a quantum database using the quantum circuit model. The circuit-based construction is imperative since it provides flexibility and compatibility with existing quantum computing techniques. Our process can register n-bit classical data with arbitrary probability amplitudes stored in M memory cells into quantum format using O(n) qubits and O(Mn) flip-register-flop steps. The versatility of the architecture allows one to create a complex data structure by encoding any classical information, either discrete or continuous, as quantum bits or as probability amplitudes of a quantum state. An example of the amplitude encoding is the application to a quantum state generation for a quantum support vector machine algorithm in which the training data is represented with the probability amplitudes as shown in Fig. 3. Qubit encoding can be achieved by beginning with the quantum bus state as $|\psi\rangle_B = |+\rangle^{\otimes n}$, and inserting the weights (e.g., the normalized occurrence) of the data by adjusting the multi-qubit controlled rotation $C^nR_p(\theta)$. For the uniform weight, which is encountered, for example, in the parity learning algorithm 29,30 , the multi-qubit controlled gate is simply $C^n$NOT. For the amplitude encoding, the final QDB state is obtained by post-selecting on the register qubit being $|1\rangle_R$. Hence the amplitude encoding introduces additional resource overhead for repeating the entire algorithm. However, for some tasks, the classical data can be pre-processed to increase the success probability of the post-selection.
It is an interesting open problem whether the post-selection can be avoided in certain amplitude encoding schemes by utilizing the fact that the probability amplitudes are determined by cosines instead of sines if the register qubit is $|0\rangle_R$. Also, the post-selection can be avoided for the qubit encoding if the bus qubit state only contains the basis states that are to be queried and all classical data are registered with an equal weight. Note that BB-QRAM also employs the post-selection for preparing an arbitrary QDB state with the amplitude encoding. With some limitations, the amplitude encoding can be done without relying on the post-selection. For example, ref. 20 introduces a procedure inspired by classical alias sampling to assign the probability amplitude $\tilde{\rho}$ using 2μ + 2n + O(1) ancilla qubits, where $\tilde{\rho}$ is a μ-bit binary approximation to the desired non-negative real value, ρ. In ref. 11 , adiabatic-diabatic state preparation is used to generate superposition states with squared amplitudes.
We point out potential solutions to several issues for meaningful applications of QRAM. First, when the FF-QRAM process leaves the last term in Eq. (8) that corresponds to the states without data entries, the rate of producing the desired QDB can be reduced. This issue can be partially circumvented by running L identical QRAM processes in parallel. Then the success probability of the post-selection improves by a factor of L while also increasing the number of qubits and the gates by the same factor. Note that the time complexity remains the same. Second, the QDB is not reusable once it is consumed by a quantum algorithm since the measurement collapses the state. Motivated by the above, we introduced the concept of quantum forking, which makes it possible to reduce the number of QRAM queries in some instances, in particular, when evaluating the inner product.

Methods
Error analysis. The $C^n$NOT gate can be decomposed into a $C^{n-1}$NOT and a $C^2$NOT (Toffoli) using an ancilla qubit prepared in $|0\rangle$ as shown in step (1) of Fig. 5 for n = 4 as an example. Then by recursion, $C^n$NOT can be realized using 2n − 3 Toffoli gates and n − 2 ancilla qubits prepared in $|0\rangle$ (step (2) in Fig. 5). Note that n − 2 Toffoli gates are added after the Toffoli gate for conditionally flipping the target qubit in order to unentangle the ancilla qubits from the system. The quantum circuit can be rearranged to further reduce the depth as shown in the last step in Fig. 5.
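The gate and ancilla counts of the recursive construction can be checked on classical bit strings, since a Toffoli network only permutes computational basis states. The sketch below uses a simple linear ladder (compute the ancillas, flip the target, uncompute), which is an assumed arrangement and not necessarily the depth-reduced ordering of Fig. 5; the function names `toffoli` and `cn_not` are hypothetical.

```python
from itertools import product

def toffoli(bits, c1, c2, t):
    """Flip bits[t] in place when bits[c1] and bits[c2] are both 1."""
    bits[t] ^= bits[c1] & bits[c2]

def cn_not(controls, target):
    """Apply C^n NOT (n >= 2) with 2n-3 Toffoli gates and n-2 zeroed ancillas.

    Returns the final bit list [controls, ancillas, target] and the gate count.
    """
    n = len(controls)
    bits = list(controls) + [0] * (n - 2) + [target]
    anc = list(range(n, 2 * n - 2))      # ancilla positions
    tgt = 2 * n - 2                      # target position
    if n == 2:
        gates = [(0, 1, tgt)]
    else:
        # anc[0] = c0 AND c1, then anc[i] = anc[i-1] AND c_{i+1}
        compute = [(0, 1, anc[0])]
        compute += [(anc[i - 1], i + 1, anc[i]) for i in range(1, n - 2)]
        # flip the target, then uncompute the ancillas in reverse order
        gates = compute + [(anc[-1], n - 1, tgt)] + compute[::-1]
    for gate in gates:
        toffoli(bits, *gate)
    return bits, len(gates)

# Usage: exhaustive check for n = 4 (5 Toffoli gates, 2 ancillas).
n = 4
all_correct = True
for ctrl in product((0, 1), repeat=n):
    for t in (0, 1):
        out, count = cn_not(ctrl, t)
        ok = (count == 2 * n - 3 and out[:n] == list(ctrl)
              and not any(out[n:2 * n - 2]) and out[-1] == t ^ int(all(ctrl)))
        all_correct = all_correct and ok
```

The uncompute half returns every ancilla to 0 for all inputs, which is what keeps the ancillas unentangled from the data when the circuit acts on superpositions.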
We assume that the cX operations can occur simultaneously on all target qubits, but even when the control bit is 1, the target qubit may undergo an erroneous identity operation. To reduce the circuit depth, the cX operations after $C^nR_y(\theta^{(l)})$ and before $C^nR_y(\theta^{(l+1)})$ can be merged. The combined operation flips the jth qubit only if $d_j^{(l)} \oplus d_j^{(l+1)} = 1$, and otherwise does nothing, for $j = 0, \ldots, n-1$. Thus, the total number of single qubit gates used for writing M classical data of length n is n(M + 1). Each $C^nR_y(\theta)$ uses two single qubit gates and two $C^n$NOT gates. In the $C^n$NOT implementation described above (Fig. 5), 2n − 1 qubits (n control qubits + 1 target qubit + n − 2 ancillae) undergo $2\lceil \log_2(n) \rceil - 1$ time steps. Therefore, the total number of time steps τ that are subject to noise can be counted as

$$\tau = n(M+1) + 2M\left[1 + (2n-1)\left(2\lceil \log_2(n) \rceil - 1\right)\right].$$

If $C^nR_y(\theta)$ can be implemented with only n + 1 independent errors, then τ can be further reduced to $n(M+1) + M(n+1)$.
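The counting rules above translate into a short calculation. The sketch below is assembled from those rules as an assumption: n(M + 1) single-qubit cX locations, plus, for each of the M controlled rotations, two single-qubit gates and two $C^n$NOT gates in which 2n − 1 qubits each undergo $2\lceil \log_2(n) \rceil - 1$ time steps. The function names are hypothetical, and n ≥ 2 is assumed.

```python
import math

def noisy_steps(M, n, independent_rotation_errors=False):
    """Number of noisy time steps tau for writing M bit strings of length n (n >= 2)."""
    flips = n * (M + 1)                       # merged cX layers acting on n bus qubits
    if independent_rotation_errors:
        return flips + M * (n + 1)            # milder model: n + 1 errors per rotation
    per_cn_not = (2 * n - 1) * (2 * math.ceil(math.log2(n)) - 1)
    return flips + M * (2 + 2 * per_cn_not)   # two rotations and two C^n NOT per entry

def success_probability(eps, M, n, independent_rotation_errors=False):
    """Probability that no depolarizing error occurs in any noisy time step."""
    return (1 - eps) ** noisy_steps(M, n, independent_rotation_errors)

# Usage: M = 16 entries of n = log2(M) = 4 bits, per-step error rate 1e-4.
tau = noisy_steps(16, 4)
tau_mild = noisy_steps(16, 4, independent_rotation_errors=True)
p_s = success_probability(1e-4, 16, 4)
```

For this example the Toffoli-based count gives 772 noisy steps against 148 in the milder model, which illustrates how strongly the $C^n$NOT implementation dominates the error budget.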

Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Figure 5.
Quantum circuit for implementing a $C^4$NOT gate using 5 Toffoli gates and 2 ancilla qubits prepared in $|0\rangle$ in 3 steps. The control bits are a, b, c and d, and the target qubit is $|t\rangle$. The gates after the Toffoli gate for conditionally flipping the target qubit uncompute the ancilla qubits in order to unentangle the ancillae from the system.