A quantum algorithm for string matching

Algorithms that search for a pattern within a larger data set appear ubiquitously in text and image processing. Here, we present an explicit, circuit-level implementation of a quantum pattern-matching algorithm that matches a search string (pattern) of length M inside a longer text of length N. Our algorithm has a time complexity of $\tilde{O}(\sqrt{N})$, while the space complexity remains modest at O(N + M). We report the quantum gate counts relevant for both the pre-fault-tolerant and fault-tolerant regimes.


INTRODUCTION
Pattern matching is one of the core algorithms in computer science that stand to benefit from quantum computers 1,2. Pattern matching algorithms are used ubiquitously in image processing 3,4, the study of DNA sequences 5, and data compression and statistics 6, to name a few. Thus, accelerating pattern matching using a quantum computer would be a boon to all these areas.
The simplest form of pattern matching is string matching. In string matching, given a long string T of length N, we search for a pattern P of length M with M ≤ N 7. Depending on the application, we may need to search for an exact match, a fuzzy match, or a match with some wildcards 8.
The best-known classical algorithm for string matching is the Knuth-Morris-Pratt algorithm, which has a worst-case time complexity of Θ(N + M) 9,10. The best-known algorithms for approximate string matching have a similar run time of Θ(N + M). For random strings, the complexity of exact matching is lower bounded by $\Omega((N/M)\log M)$ 11.
Ramesh and Vinay developed a quantum algorithm for exact string matching with a query complexity of $\tilde{O}(\sqrt{N}+\sqrt{M})$ 1. This algorithm uses Grover's search to identify the position at which a length-M segment of T matches the pattern P, where each check is itself done using a nested Grover search. However, this work does not explicitly construct the required oracles, and the total time complexity, measured in units of gate depth, is bound to increase once we account for the gate-level cost of accessing the text and pattern from a database. Another approach, which relies on a quantum solver for the dihedral hidden subgroup problem 12, has a time complexity of $\tilde{O}\big((N/M)^{1/2}\,2^{O(\sqrt{\log M})}\big)$ for average-case matching 13. This work also assumes that M is larger than the logarithm of the text length N, i.e., $M = \omega(\log N)$, and fails with high probability for certain worst-case inputs. In our work, we make no assumptions on the length of the pattern or the distribution of inputs.
In this paper, we present a string-matching algorithm, based on generalized Grover amplitude amplification 14, with a time complexity of $\tilde{O}(\sqrt{N})$ for arbitrary text length N and pattern length M ≤ N. Note that our algorithm does not rely on a quantum database and therefore incurs no database-initialization overhead; that overhead is expected to be O(N) and would overshadow any quantum advantage. The techniques we develop for our algorithm can readily be extended to pattern-matching problems in higher dimensions. While detailing each step of our algorithm, we also provide gate-by-gate instructions for constructing the relevant quantum circuits. This allows us to obtain a concrete estimate of the total gate counts in a straightforward way. The gate counts we report help establish when we may expect quantum computers to be useful for pattern matching.
Our paper is organized as follows. To motivate the reader, we first compare our main results, derived in the remainder of the paper, with the current state of the art. After the comparison, we provide an outline of our string-matching algorithm. In Section "Results", we provide the details of the algorithm, including explicit circuits for all necessary oracles. We then calculate the overall complexity of our algorithm. We provide an estimate of gate counts in terms of CNOT and T gates, useful for the pre-fault-tolerant and fault-tolerant regimes, respectively. We summarize our paper in Section "Discussion" and discuss the implications of our results.
We start by pointing out that our work differs from 13, which targets average-case inputs: like 1, we provide a quantum algorithm for pattern matching on worst-case inputs. The work in 13 further assumes $M = \omega(\log N)$, whereas the work in 1 and the work reported in this manuscript do not. We rely on a Grover oracle (see Section "Grover oracle") that simply checks whether a state is the all-zero state in the computational basis, whereas the oracles in refs 1,13 are random memory access oracles of the form $\sum_i |i\rangle|0\rangle \to \sum_i |i\rangle|t_i\rangle$, where $t_i$ is the ith bit of the text. We are unaware of an efficient quantum circuit that implements such an oracle (see Section G.4 of the appendix of ref. 15 for the best-known construction) without resorting to quantum random access memory (QRAM) 16. The known blueprints for QRAM 16 have time complexity polylogarithmic in the size of the memory to be accessed. In our case, the memory size is O(N), and therefore QRAM queries will incur an additional multiplicative cost of at least $O((\log N)^2)$. Moreover, we would also have to account for the cost of initializing the quantum memory; this is expected to take a number of operations linear in N 17. In contrast, our algorithm does not assume any random access oracles. We also provide an explicit circuit for the Grover oracle we need using elementary quantum gates, specifically single-qubit Clifford, T, and CNOT gates.
Note that the algorithm in ref. 13 fails with probability O(1/N) over the choice of T and P. For certain worst-case T and P, the algorithm inherently fails to return a match. In addition, there is internal randomness in the algorithm, which contributes an additional probability of failure. Our algorithm also fails with probability O(1/N) if there is a match between T and P, but this is purely due to the internal randomness of Grover's algorithm. We can simply repeat the algorithm to suppress the failure probability arbitrarily, with an average number of repetitions of N/(N − 1). We make no assumptions on the distribution of text and pattern, and the algorithm works for all possible inputs. This may be contrasted with the impossibility of suppressing the failure probability by repeated use of the algorithm for worst-case inputs in ref. 13.
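The average repetition count follows from a geometric distribution. As a sketch, assume that each run fails independently with probability exactly $p_{\mathrm{fail}} = 1/N$, and that a returned index can be checked (for example, by comparing the M bits classically) so that the algorithm is rerun only on failure. The expected number of runs R is then
$$\mathbb{E}[R] = \sum_{r=1}^{\infty} r\,p_{\mathrm{fail}}^{\,r-1}\,(1-p_{\mathrm{fail}}) = \frac{1}{1-p_{\mathrm{fail}}} = \frac{1}{1-1/N} = \frac{N}{N-1},$$
while the probability that r consecutive runs all fail is $p_{\mathrm{fail}}^{\,r} = N^{-r}$, which can be made arbitrarily small by increasing r.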
Our algorithm has a space complexity of O(N + M), since we need N (M) qubits to store the text (pattern). Since M ≤ N, we may omit the M dependence and simplify it to O(N). The space complexities of 1,13 depend on the space complexity of the oracle. Assuming that an N-bit register containing the text to be searched is prepared in QRAM in the bucket-brigade model, the bulk of the space complexity comes from routing qutrits, where random access over N bits of information requires O(N) routing qutrits. Expending a constant number of qubits for each qutrit, the space complexities of 1,13 are Ω(N), and likely Θ(N).
Finally, unlike the two prior works, our algorithm is simple enough that we can provide not only an explicit circuit-level blueprint but also an estimate of the quantum resources needed to implement it. A summary of the comparison between our work and 1,13 is given in Table 1.
In the remainder of this section, we outline the steps of our algorithm.The detailed implementation is presented in Section "Results".

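1. Initialize two quantum registers to
$$|t_0 t_1 \cdots t_{N-1}\rangle\,|p_0 p_1 \cdots p_{M-1}\rangle, \tag{1}$$
where $t_i$ and $p_i$ denote the ith bit of string T and pattern P, respectively.
2. Transform the first register, containing the string T, into a superposition of N states, where each state is a bit-shifted copy of the original state of the first register, shifted by 0, 1, 2, ..., N − 1 bits. Assuming modulo-N arithmetic for the bit indices, this results in
$$\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} |t_k t_{k+1} \cdots t_{k+N-1}\rangle\,|p_0 p_1 \cdots p_{M-1}\rangle. \tag{2}$$
3. Compute the XOR between the first M bits of the first register and all M bits of the second register, storing the result in the second register, to obtain
$$\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} |t_k t_{k+1} \cdots t_{k+N-1}\rangle\,|t_k \oplus p_0,\ t_{k+1} \oplus p_1,\ \ldots,\ t_{k+M-1} \oplus p_{M-1}\rangle. \tag{3}$$
4. The second register is all zeros if the pattern matches the first M bits of the shifted copy of T. The register contains d ones if the string and the pattern differ in d bit positions.
5. Use generalized Grover search, or amplitude amplification 14, to isolate the state in which the second register is all zeros (when searching for an exact match) or contains fewer than D ones, i.e., fewer than D mismatches (in the case of fuzzy search).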
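Steps 2 to 4 can be checked classically by enumerating the N branches of the superposition one at a time. The following Python sketch is a classical illustration only: it loops over all N shifts, so it has none of the quantum speed-up, and step 5 is not simulated. The function name `outline_match_positions` is hypothetical, and bits are passed as strings of '0' and '1'.

```python
def outline_match_positions(text: str, pattern: str) -> list[int]:
    """Classically enumerate the branches k = 0, ..., N-1 of steps 2-4.

    In branch k the text register holds T cyclically left-shifted by k,
    and the pattern register holds the XOR of the first M shifted text
    bits with P. Branches whose pattern register is all zeros are the
    exact matches that step 5 amplifies.
    """
    n, m = len(text), len(pattern)
    if m > n:
        raise ValueError("pattern must not be longer than the text")
    t = [int(b) for b in text]
    p = [int(b) for b in pattern]
    matches = []
    for k in range(n):
        shifted = [t[(k + i) % n] for i in range(n)]   # step 2, indices mod N
        xor = [shifted[i] ^ p[i] for i in range(m)]    # step 3
        if sum(xor) == 0:                              # step 4: zero Hamming weight
            matches.append(k)
    return matches


print(outline_match_positions("0110100110", "101"))  # [2]
print(outline_match_positions("1001", "11"))          # [3], wraps around the end of T
```

Because the bit indices are taken modulo N, a branch can also report a match that wraps around the end of T, as in the second call; whether such matches are wanted depends on the application.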

RESULTS
In this section, we lay out the detailed implementation of the algorithm outlined above. Specifically, we detail the transformations and registers used to implement the algorithm. One of the central transformations in our algorithm is the cyclic-shift operator. We present the details of its construction in Section "Construction of the cyclic-shift operator". We also present the construction of the necessary Grover oracle in Section "Grover oracle" for completeness.
To encode a binary string T of length N and a binary pattern P of length M, we use quantum registers of N and M qubits, respectively. This can be done by applying identity and bit-flip gates to a quantum register initialized as $|0\rangle^{\otimes(N+M)}$. Denoting the encoded states as
$$|T\rangle = |t_0 t_1 \cdots t_{N-1}\rangle, \qquad |P\rangle = |p_0 p_1 \cdots p_{M-1}\rangle,$$
where $t_i$ ($p_i$) is the ith bit of string T (P), together with an index register of n qubits in the zero state, we prepare on a quantum computer the composite initial state
$$|0\rangle^{\otimes n}\,|T\rangle\,|P\rangle, \tag{4}$$
where, for convenience, we assume $N = 2^n$. Next, we apply an n-qubit Hadamard transform $H^{\otimes n}$ (or a Fourier transform in the case $N \neq 2^n$ for $n \in \mathbb{N}$) to the index register to produce a uniform superposition of the index states $|0\rangle, |1\rangle, \ldots, |N-1\rangle$:
$$\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} |k\rangle\,|T\rangle\,|P\rangle. \tag{5}$$
We now apply a cyclic-shift operator S that left-circularly shifts the qubits of the target register by k positions, where the value of k is encoded in the control register (see Section "Construction of the cyclic-shift operator" for details). Applying S to the first two registers results in
$$\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} |k\rangle\,|t_k t_{k+1} \cdots t_{k+N-1}\rangle\,|P\rangle. \tag{6}$$
At this point, we check for a match between the cyclically shifted text strings in the second register and the pattern string stored in the third register. We apply an XOR operation between each of the first M bits of the second register and the corresponding bit of the third register. If the XOR results are all zeros, the strings match. Using CNOT gates, we then obtain, with an abuse of notation,
$$\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} |k\rangle\,|t_k t_{k+1} \cdots t_{k+N-1}\rangle\,|t_k \oplus p_0,\ t_{k+1} \oplus p_1,\ \ldots,\ t_{k+M-1} \oplus p_{M-1}\rangle. \tag{7}$$
The final register now contains a 1 at every position where the pattern and the first M bits of the shifted string differ, so its Hamming weight equals the number of mismatches. In particular, it is all zeros if and only if the two segments match completely.

Note to Table 1: The oracles for refs 13 and 1 provide random access to bits in the text and pattern. This random-access oracle is not needed in our work. Instead, in our work, "oracle" means a Grover oracle that checks whether a register is in the all-zero state. We provide an explicit construction for such an oracle. The time complexities for refs 1 and 13 are unknown because the execution time depends on the random-access oracles, which do not have a circuit-level construction in the respective papers.
We may now use generalized Grover search, or amplitude amplification 14, to search for the state in which the pattern register is in the $|0\rangle^{\otimes M}$ state (in the case of exact search). If this state is found, we know that the pattern occurs in the string, and we also obtain from the index register the position at which the match occurs. In addition to exact matches, we can use this method to search for fuzzy matches or matches with wildcards by constructing appropriate Grover oracles.
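In the exact-search case, the text and pattern registers are a deterministic function of the index k, so amplitude amplification behaves like textbook Grover search over the N index values, with the match positions as the marked set. The pure-Python statevector sketch below illustrates this; it tracks only the index amplitudes, the function names are hypothetical, and it assumes the number of matches m is known in advance, which need not hold in practice.

```python
import math


def grover_success_probability(n: int, marked: set[int], iterations: int) -> float:
    """Probability of measuring a marked index after `iterations` Grover steps.

    Amplitudes are real and indexed by k; the oracle flips the sign of marked
    indices, and the diffusion step reflects every amplitude about the mean,
    i.e., about the uniform superposition prepared by the Hadamard transform.
    """
    amps = [1 / math.sqrt(n)] * n
    for _ in range(iterations):
        amps = [-a if k in marked else a for k, a in enumerate(amps)]  # oracle
        mean = sum(amps) / n
        amps = [2 * mean - a for a in amps]                             # diffusion
    return sum(amps[k] ** 2 for k in marked)


def optimal_iterations(n: int, num_marked: int) -> int:
    """Textbook iteration count floor(pi/4 * sqrt(N / m)) for m marked items."""
    return math.floor(math.pi / 4 * math.sqrt(n / num_marked))


n = 64
marked = {37}                    # hypothetical single match position
r = optimal_iterations(n, len(marked))
print(r, round(grover_success_probability(n, marked, r), 3))  # 6 0.997
```

Starting from the uniform state, the success probability is 1/N; after about $\frac{\pi}{4}\sqrt{N}$ iterations it is close to 1, which is the $O(\sqrt{N})$ repetition count used in the complexity analysis.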

Construction of the cyclic-shift operator
In this subsection, we explicitly construct a circuit that implements the cyclic-shift operator S. The two-register operator S is defined according to
$$S\,|k\rangle|t_0 t_1 \cdots t_{N-1}\rangle = |k\rangle\,S_k|t_0 t_1 \cdots t_{N-1}\rangle = |k\rangle|t_k t_{k+1} \cdots t_{k+N-1}\rangle. \tag{8}$$
To implement the k-controlled circular shift operator $S_k$, we write k in its binary-encoded form $|k\rangle = |k_0\rangle|k_1\rangle\cdots|k_{n-1}\rangle$, such that $2^0k_0 + 2^1k_1 + \cdots + 2^{n-1}k_{n-1} = k$. The circular bitwise rotation by k in the second register can then be implemented by a product of controlled-shift operators that shift the target qubits by $2^j$ bits, conditioned on the qubit $|k_j\rangle$. Using $S_a S_b = S_{a+b}$, we may now write
$$S = \prod_{j=0}^{n-1} S^{(k_j)}_{2^j}, \tag{9}$$
where $S^{(k_j)}_{2^j}$ applies a shift of $2^j$ bits to the second register, which encodes the text T, controlled by the jth qubit $|k_j\rangle$ of the index register. The circuit decomposition is shown as a visual guide in Fig. 1.
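The identity $S_aS_b = S_{a+b}$ and the decomposition (9) can be checked on classical bit lists. In this Python sketch, the function names are hypothetical, a list element stands in for a text qubit, N is assumed to be a power of two, and each controlled shift is applied only when the corresponding binary digit $k_j$ of k is 1.

```python
def left_shift(bits: list, s: int) -> list:
    """Cyclic left shift S_s: position i receives the bit from position (i + s) mod N."""
    n = len(bits)
    return [bits[(i + s) % n] for i in range(n)]


def shift_by_binary_digits(bits: list, k: int) -> list:
    """Apply S_k as the product of S_{2^j} over the set bits k_j of k, as in (9)."""
    n_index = (len(bits) - 1).bit_length()   # n = log2(N) for N = 2^n
    out = list(bits)
    for j in range(n_index):
        if (k >> j) & 1:                     # shift by 2^j only if k_j = 1
            out = left_shift(out, 2 ** j)
    return out


text = list("abcdefgh")                      # labels standing in for t_0 .. t_7
print("".join(shift_by_binary_digits(text, 6)))  # ghabcdef
```

For k = 6 (binary 110), only the shifts by 2 and by 4 are applied, and their composition equals a single shift by 6.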
The decomposition in (9), together with (8), reveals that to implement the cyclic-shift operator S it suffices to consider the controlled bit-shift operators $S^{(c)}_{2^j}$, which circularly shift by $2^j$ bits for some j, conditioned on a qubit c. To construct the circuit for $S^{(c)}_{2^j}$, we first consider an operator $S_{2^j}$ without any controls, which, as we show below, can be implemented using SWAP gates. We later promote the SWAP gates to a controlled version, effectively replacing them with controlled-SWAP (Fredkin) gates.
A circular shift operator $S_s$ by s bits applies a permutation $P_s$, in modulo-N space, of the form
$$P_s:\ (0,\ 1,\ \ldots,\ N-1) \mapsto (N-s,\ N-s+1,\ \ldots,\ N-1,\ 0,\ 1,\ \ldots,\ N-s-1), \tag{10}$$
where the qubit at position i is moved to position $P_s(i)$: the zeroth qubit moves to position N − s, the first to position N − s + 1, and so on, so that the sth qubit ends up in the zeroth position, consistent with (8). Any such permutation can be decomposed into a product of transpositions. As a result, each circular shift appearing in (9) can be decomposed into a product of SWAP operations. We now calculate how many layers of SWAP operations are needed to apply a permutation of the form (10) efficiently. In a register of N qubits, we can apply N/2 SWAP operations in parallel. Using this N/2-parallel SWAP layer, we can move N/2 qubits to their correct positions in a single time step. This leaves the remaining N/2 qubits to be sorted. At each subsequent time step, the number of qubits that need to be swapped decreases by half. Therefore, we can arbitrarily permute N qubits in $O(\log N)$ time steps using parallel SWAP operations. A sample diagrammatic representation of this unitary operation is shown in Fig. 2. This implies that each of the controlled shift operators $S^{(k_j)}_{2^j}$ in (9) can be implemented in $O(\log N)$ time steps using parallel controlled-SWAP operations.
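The bound of at most N − 1 transpositions per shift, used in Section "Gate counts", can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch uses a simple left-to-right decomposition that only counts and checks the transpositions; it does not schedule them into the parallel SWAP layers described above, which is where the depth savings come from.

```python
from math import gcd


def shift_transpositions(n: int, s: int) -> list[tuple[int, int]]:
    """Decompose the cyclic left shift by s on n positions into transpositions.

    Position i must end up holding the qubit that started at (i + s) mod n.
    Working left to right, each SWAP places one qubit at its final position,
    and the last position is then correct automatically, so at most n - 1
    SWAPs are emitted (exactly n - gcd(n, s) for a cyclic shift).
    """
    target = [(i + s) % n for i in range(n)]
    current = list(range(n))        # current[i]: original index of the qubit now at i
    swaps = []
    for i in range(n):
        if current[i] != target[i]:
            j = current.index(target[i])
            current[i], current[j] = current[j], current[i]
            swaps.append((i, j))
    return swaps


def apply_swaps(bits: list, swaps: list[tuple[int, int]]) -> list:
    """Apply the transpositions in order to a copy of the register contents."""
    out = list(bits)
    for i, j in swaps:
        out[i], out[j] = out[j], out[i]
    return out


swaps = shift_transpositions(8, 6)
print(len(swaps), "".join(apply_swaps(list("abcdefgh"), swaps)))  # 6 ghabcdef
```

In the controlled version, each of these SWAPs becomes a Fredkin gate controlled on a copy of $k_j$.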
We next discuss a method to apply as many as N/2 parallel swap operations, controlled on the same qubit in the index register.As shown below, we achieve this at the cost of N/2 clean ancilla qubits.
We start by considering a fan-out CNOT operation, with the control qubit in the state $|k_j\rangle$ and N/2 clean ancilla qubits initialized to $|0\rangle$ as targets. This results in N/2 copies of $k_j$, which can then be used to implement up to N/2 Fredkin gates in a single time step. Once all necessary Fredkin gates have been implemented, we undo the fan-out operation and return all ancilla qubits to the $|0\rangle$ state. We recycle the freed-up ancilla qubits for the subsequent control qubits, one at a time.

Fig. 1 Circuit diagram for the circular bitwise rotation operator $S_k$. A shift by k bits can be achieved by a product of log(N) controlled shift operations, one per index qubit.

Fig. 2 A diagrammatic representation of the circular shift operator. In this example, we left-circularly shift a register of 8 qubits by 6 positions in two time steps. This kind of operation can in general be performed in depth log(N) − 1 using parallel SWAP operations, where N is the size of the qubit register.
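One way to realize the fan-out in $O(\log N)$ depth is a doubling tree, in which every qubit that already holds a copy of $k_j$ serves as a CNOT control in the next layer. The Python sketch below only builds the CNOT schedule; the function name and qubit labels are hypothetical, with qubit 0 standing for the index qubit and qubits 1 to N/2 for the clean ancillas. Uncomputing the fan-out applies the same layers in reverse order.

```python
def fanout_schedule(num_targets: int) -> list[list[tuple[int, int]]]:
    """Layers of CNOTs copying qubit 0 (holding k_j) onto qubits 1..num_targets.

    Every qubit that already holds a copy acts as a control in the next layer,
    so the number of copies doubles per layer.
    """
    holders = [0]                 # qubits that currently hold a copy of k_j
    next_target = 1
    layers = []
    while next_target <= num_targets:
        layer = []
        for control in holders:
            if next_target > num_targets:
                break
            layer.append((control, next_target))   # CNOT(control -> target)
            next_target += 1
        holders.extend(target for _, target in layer)
        layers.append(layer)
    return layers


layers = fanout_schedule(8)       # e.g., N = 16 gives N/2 = 8 ancilla targets
print(len(layers), sum(len(layer) for layer in layers))  # 4 8
```

With N/2 targets, the schedule uses N/2 CNOTs in $\lceil \log_2(N/2 + 1) \rceil$ layers, and no qubit appears twice within a layer.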
The time cost of the fan-out operation is $O(\log N)$. Since $O(\log N)$ parallel SWAP layers are required to implement the qubit permutation discussed in Section "Construction of the cyclic-shift operator", the overall time complexity of $S^{(c)}_{2^j}$ is $O(\log N)$.

Grover oracle
To complete our algorithm, we need a Grover oracle $U_w$ that acts on the pattern register, required to amplify and help identify exact or close matches. The oracle may be defined according to
$$U_w|x\rangle = \begin{cases} -|x\rangle, & |x| \le d,\\ \phantom{-}|x\rangle, & |x| > d, \end{cases} \tag{11}$$
where $|x|$ denotes the Hamming weight of the bit string x, and d is zero if we want to find exact matches and a small number if we want to find close matches. Assuming an architecture with long-range interactions, we can implement this oracle in $O(\log M)$ depth using O(M) ancilla qubits. We note in passing that there have also been proposals to implement a single-step n-controlled Toffoli gate in O(1) time in trapped-ion and neutral-atom architectures 18. For the remainder of the paper, however, we take the circuit depth of this oracle to be $O(\log M)$.
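For the exact-match case d = 0, one generic way to reach $O(\log M)$ depth with O(M) ancillas is a binary tree of pairwise ORs, whose root is 0 exactly when the register is all zeros; the phase is applied at the root and the tree is then uncomputed. The Python sketch below mimics this tree classically to count levels and ancillas. The function names are hypothetical, and this is not the construction of Supplementary Note 2, whose ancilla bound of M − 3 (quoted in Section "Gate counts") is tighter than the M − 1 used here. `oracle_phase` simply restates (11) for general d.

```python
def zero_check_tree(bits: list[int]) -> tuple[bool, int, int]:
    """Classically mimic a pairwise OR-reduction tree for the d = 0 oracle.

    Returns (is_all_zero, depth, ancillas). Quantum mechanically, each pairwise
    OR would be written into a fresh ancilla by a Toffoli gate conjugated with
    X gates; pairs within one level are disjoint and can run in parallel.
    """
    level = list(bits)
    depth = 0
    ancillas = 0
    while len(level) > 1:
        nxt = [level[i] | level[i + 1] for i in range(0, len(level) - 1, 2)]
        ancillas += len(nxt)
        if len(level) % 2:
            nxt.append(level[-1])       # odd element passes to the next level
        level = nxt
        depth += 1
    return level[0] == 0, depth, ancillas


def oracle_phase(bits: list[int], d: int = 0) -> int:
    """Phase applied by U_w in (11): -1 if the Hamming weight is at most d."""
    return -1 if sum(bits) <= d else 1


print(zero_check_tree([0] * 8))         # (True, 3, 7)
print(oracle_phase([0, 1, 0, 0], d=1))  # -1
```

The tree has $\lceil \log_2 M \rceil$ levels and uses M − 1 ancillas, matching the $O(\log M)$ depth and O(M) ancilla scaling stated above.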

Time complexity
In this subsection, we compute the time complexity of our algorithm. Encoding the strings T and P takes O(1) time. The Hadamard transform applied to the index register takes O(1) time as well. The cyclic-shift operator S takes time $O((\log N)^2)$, since each $S^{(k_j)}_{2^j}$ operator, including the fan-out and its uncompute operation, takes $O(\log N)$ time and $j = 0, 1, 2, \ldots, \log N - 1$. The evaluation of the XOR via CNOT gates takes O(1) time, since the CNOTs act on disjoint qubit pairs and can be applied in parallel. Lastly, the Grover oracle has complexity $O(\log M)$. The overall complexity of the steps considered so far, which constitute a single Grover step, is then $O((\log N)^2 + \log M)$.
For the Grover search to succeed, we need to repeat the Grover step $O(\sqrt{N})$ times. This brings the total complexity to $O(\sqrt{N}\,((\log N)^2 + \log M))$.
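The depth bound can be evaluated for concrete sizes. This Python sketch (function name hypothetical) drops all constant prefactors, so its output indicates scaling only, not an actual circuit depth; the Discussion quotes a prefactor C < 20 for the base-2 form.

```python
import math


def depth_scaling(n_bits: int, m_bits: int) -> float:
    """sqrt(N) * ((log2 N)^2 + log2 M): the leading-order depth, constants dropped."""
    grover_steps = math.sqrt(n_bits)          # O(sqrt N) Grover iterations
    shift_depth = math.log2(n_bits) ** 2      # log N controlled shifts, O(log N) each
    oracle_depth = math.log2(m_bits)          # O(log M) all-zero check
    return grover_steps * (shift_depth + oracle_depth)


for n_bits in (2**20, 2**30, 2**40):
    print(n_bits, f"{depth_scaling(n_bits, 160):.3g}")
```

The $(\log N)^2$ term from the cyclic shift dominates each Grover step for any pattern length M ≤ N, so the pattern length barely affects the depth.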

Space complexity
In addition to the N and M qubits needed to encode the search string and the pattern, we need $O(\log N)$ qubits for the index register. For the depth-optimized implementation of our algorithm, we need N/2 ancilla qubits to fan out the control qubits of the index register. Furthermore, O(M) ancilla qubits are required for the depth-optimized Grover oracle implementation. Therefore, the space complexity of our string-matching algorithm is O(N + M).
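Adding up the registers gives a concrete qubit count. In this Python sketch (function name hypothetical), the fan-out uses N/2 clean ancillas as in Section "Construction of the cyclic-shift operator", and the oracle ancillas are taken from the upper bound of M − 3 quoted in Section "Gate counts"; the total is roughly 1.5N + 2M + log2 N, consistent with O(N + M).

```python
def qubit_count(n_bits: int, m_bits: int) -> int:
    """Qubits for the depth-optimized layout, following the terms in this subsection."""
    index = (n_bits - 1).bit_length()      # ceil(log2 N) index qubits
    fanout_ancillas = n_bits // 2          # clean ancillas for the CNOT fan-out
    oracle_ancillas = max(m_bits - 3, 0)   # assumed upper bound M - 3 for the oracle
    return n_bits + m_bits + index + fanout_ancillas + oracle_ancillas


print(qubit_count(1024, 16))   # 1575
```

The ancillas are reused across the $\log N$ controlled shifts, so they contribute only once to the total.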

Gate counts
In this section, we estimate the gate count in terms of CNOT and T gates. We chose these two gates as metrics because two-qubit gates, such as CNOT, are widely expected to dominate the implementation cost in the pre-fault-tolerant regime, whereas T gates are expected to dominate the cost in the fault-tolerant regime, assuming the standard Clifford + T gate set.
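The per-component costs derived in this subsection can be combined numerically. The Python sketch below (function name hypothetical) assumes about $\frac{\pi}{4}\sqrt{N}$ Grover iterations, the textbook count for a single match, doubles the cost of the shift and XOR to account for U and $U^\dagger$, and neglects any gates not listed in this subsection; it is an order-of-magnitude estimate, not an exact count.

```python
import math


def gate_counts(n_bits: int, m_bits: int) -> tuple[float, float]:
    """Approximate total (CNOT, T) counts from the per-component costs."""
    log_n = math.log2(n_bits)
    shift_cnot = (8 * n_bits - 9) * log_n    # fan-outs + Fredkins over all log N shifts
    shift_t = 7 * (n_bits - 1) * log_n       # 7 T gates per Fredkin
    xor_cnot = m_bits                        # one CNOT per pattern bit
    oracle_cnot, oracle_t = 6 * m_bits - 12, 8 * m_bits - 17
    iterations = math.pi / 4 * math.sqrt(n_bits)   # assumption: a single match
    # Factor 2: the state preparation U (shift + XOR) and its inverse U^dagger.
    cnot = iterations * (2 * (shift_cnot + xor_cnot) + oracle_cnot)
    t = iterations * (2 * shift_t + oracle_t)
    return cnot, t


one_mb = 8 * 2**20                   # bits in 1 MB, taking 1 MB = 2^20 bytes
cnot, t = gate_counts(one_mb, 160)   # 20 ASCII characters
print(f"{cnot:.1e} CNOT, {t:.1e} T")
```

For the 1 MB text and 160-bit pattern, both counts come out at roughly $10^{13}$, and a 1 GB text with a kilobyte-long pattern gives more than $10^{17}$ of each.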
The strings T and P can be encoded in qubits initially in the $|0\rangle$ state using only identity and bit-flip (X) gates, and thus the encoding step has zero cost. A Hadamard transform of the index register in (5) needs log(N) Hadamard gates, also at zero cost. The cyclic shift operator S in (6) consists of log(N) applications of $S^{(c)}_{2^j}$ operators. Each $S^{(c)}_{2^j}$ operator consists of a CNOT fan-out to N/2 target qubits, its inverse, and at most N − 1 Fredkin gates, since the permutation specified in (10), of size as large as N, can be decomposed into at most N − 1 transpositions. As shown explicitly in Supplementary Note 1, based on circuit identities reported in refs 19,20, each Fredkin gate costs 7 CNOT gates and 7 T gates. Thus, the cyclic shift operator costs at most $(8N-9)\log N$ CNOT gates and $7(N-1)\log N$ T gates. Next, the XOR operation in (7) takes M CNOT gates. Lastly, the Grover oracle of (11), using a parallelized version of the results reported in 21 (see Supplementary Note 2 for details), can be implemented with 6M − 12 CNOT gates and 8M − 17 T gates, with a linear ancilla overhead upper bounded by M − 3. Summing these contributions over the approximately $\frac{\pi}{4}\sqrt{N}$ Grover iterations gives
$$N_{\mathrm{CNOT}} \approx \frac{\pi}{4}\sqrt{N}\,\Big[2\big((8N-9)\log N + M\big) + 6M - 12\Big], \qquad N_{T} \approx \frac{\pi}{4}\sqrt{N}\,\Big[14(N-1)\log N + 8M - 17\Big], \tag{12}$$
where the factor of 2 comes from the fact that, for amplitude amplification, we need to apply a unitary U to produce a state $|\psi\rangle = U|0\rangle$ and also its inverse $U^\dagger$. Based on (12), we see that searching for a pattern of 20 ASCII characters (160 bits) in a text file that is 1 MB long would require about $10^{13}$ CNOT and T gates. Similarly, searching for a kilobyte-long pattern of a genetic signature in a 1 GB genome sequence would require more than $10^{17}$ CNOT and T gates. We expect classical computers to outperform quantum computers for datasets of such size. However, for applications such as matching templates in data generated by gravitational-wave experiments, which may be petabytes long (matching a megabyte-long signature in a petabyte-long text would require $10^{25}$ CNOT and T gates), we may expect to see a quantum advantage.

DISCUSSION
In this paper, we have constructed a quantum string-matching algorithm with a circuit-depth complexity of $O(\sqrt{N}\,((\log N)^2 + \log M))$. We also provide an explicit gate-level implementation of our algorithm, enabling a concrete estimate of the quantum resources needed. The direct use cases of the matching algorithm range from a simple text search in a large file to detecting patterns in an image. The simple matching procedure can help, for example, in making intelligent recommendations based on pictures in a consumer device 22, detecting defects in industrial lithography 23, and detecting signals in large time-series data collected in experiments like the Laser Interferometer Gravitational-Wave Observatory 24. In these applications, the typical size of the data to be searched varies between $\sim 10^6$ and $\sim 10^{15}$ bytes. Our algorithm can process data of this size in $\sim C(\log_2 N)^2\sqrt{N}$ time steps, where C < 20 and N is the number of bits in the data. We hope the speed-up provided by the quantum algorithm contributes to further advances in these areas.

Table 1. Comparison of our work with prior algorithms discussed in this paper.