Better-than-classical Grover search via quantum error detection and suppression

Grover's search algorithm is one of the first quantum algorithms to exhibit a provable quantum advantage. It forms the backbone of numerous quantum applications and is widely used in benchmarking efforts. Here, we report better-than-classical success probabilities for a complete Grover search algorithm on the largest scale demonstrated to date, of up to five qubits, using two different IBM superconducting transmon qubit platforms. This is enabled, on the four and five-qubit scale, by error suppression via robust dynamical decoupling pulse sequences, without which we do not observe better-than-classical results. Further improvements arise after the use of measurement error mitigation, but the latter is insufficient by itself for achieving better-than-classical performance. For two qubits, we demonstrate a success probability of 99.5% via the use of the [[4,2,2]] quantum error-detection (QED) code. This constitutes a demonstration of quantum algorithmic breakeven via QED. Along the way, we introduce algorithmic error tomography, a method of independent interest that provides a holistic view of the errors accumulated throughout an entire quantum algorithm, filtered via the errors detected by the QED code used to encode the circuit. We demonstrate that algorithmic error tomography provides a stringent test of an error model based on a combination of amplitude damping, dephasing, and depolarization.


I. INTRODUCTION
The best possible classical strategy for finding a particular "marked" element in an unsorted list of length $N$ requires querying half of the elements in the list on average; a quantum computer (QC) can do this in quadratically fewer queries using Grover's search algorithm [1]. This algorithm is optimal and provably better than all classical strategies [2]. As one of the first algorithms with a provable quantum speedup, Grover search is often used as a subroutine for other quantum algorithms [3,4]. Over the last two decades, Grover search has been implemented on various quantum computing platforms [5-8], albeit for relatively small $N$.
Encoding a list of length $N$ requires $n = \log_2(N)$ qubits. The list can be queried classically or using quantum queries; in both cases, one finds the marked element with some probability, which we refer to as the classical or quantum success probability. The largest implementation of Grover's algorithm to date is for $n = 8$ qubits, but without demonstrating a better-than-classical quantum success probability [5]. Such better-than-classical performance has been achieved for $n = 3$ [6,7] and $n = 4$ [8] qubits. Here, employing two seven-qubit IBM Quantum Experience (IBMQE) transmon qubit platforms, ibm_nairobi (Nairobi) and ibmq_jakarta (Jakarta), we demonstrate higher success probabilities than all previous implementations, for $n \leq 5$. For $n = 2$, we use the [[4,2,2]] quantum error-detecting code [9,10], which encodes $k = 2$ logical qubits into $n = 4$ physical qubits and detects arbitrary single-qubit errors, to demonstrate a significant success probability enhancement relative to using two copies of $n = 2$ physical qubits. These success probabilities are further improved by combining error detection with measurement error mitigation [11,12]. We use the quantum error detection results to perform what we call algorithmic error tomography: for each algorithm execution we compute the probability of an output $X$, $Y$, or $Z$ error (corresponding to the three Pauli matrices) on one of the four physical qubits, or a logical error. This allows us to compute a detailed map of the errors that arise after executing the entire algorithm. In this sense, algorithmic error tomography provides a holistic and complementary perspective to techniques such as gate set tomography [13,14], which instead focus on individual gates.
We compare the experimentally obtained results for Grover's algorithm with an error model based on the concatenation of amplitude damping, phase damping, and depolarization maps. Each map is parameterized by the calibration metrics provided by the IBMQE backend [22]. We test this model using the observed success probabilities and the algorithmic error tomography results; the latter provides a much more stringent test. We find good agreement with the model, but only after using dynamical decoupling (DD). We interpret this in terms of the suppression of crosstalk by DD [23,24], which is unaccounted for by the error model.
In summary, we demonstrate a better-than-classical Grover search for up to 5 qubits, enabled by quantum error detection and dynamical decoupling. That is, we demonstrate algorithmic performance that is enhanced beyond both the break-even point, where protected operations outperform their unprotected counterparts, and the capabilities of the best possible classical algorithm executing the same task. Along the way, we introduce algorithmic error tomography, a characterization of errors afflicting an entire quantum algorithm based on the syndromes of a quantum error-detecting code.
The structure of this paper is as follows. In Section II, we summarize the salient aspects of Grover's algorithm and discuss its implementation. In Section III, we describe the open system model we use to compute the theoretically expected algorithmic performance. Details about our dynamical decoupling implementation are in Section IV. Section V focuses on the performance of Grover's algorithm on $n = 2$ qubits with and without error detection. Algorithmic error tomography is introduced in Section V as well. The results for $2 < n \leq 5$, where DD plays a crucial role in achieving better-than-classical performance, are given in Section VI. We conclude with observations and the implications of our results in Section VII.

II. GROVER'S ALGORITHM: BACKGROUND AND IMPLEMENTATION
A. Problem Description
Informally, the Grover problem is to search an unsorted list with $N = 2^n$ elements for a marked element. Formally, the goal is to find the marked $n$-bit bitstring $m$ using the smallest number of queries of an oracle that implements a function $f_m : \{0,1\}^n \to \{0,1\}$ defined as $f_m(x) = \delta_{x,m}$. Classically, after $q$ queries, the probability of correctly identifying the marked element, which hereafter we refer to as the success probability, is $p^C_s(q, N) = (q+1)/N$ (see Appendix A). Consequently, the classical algorithm requires $O(N)$ queries.
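One way to arrive at $(q+1)/N$ is sketched below, assuming the classical strategy queries $q$ distinct elements and, if none of them is marked, guesses uniformly among the remaining $N - q$ elements. The function name is illustrative and not taken from the paper; the derivation in Appendix A may proceed differently.

```python
from fractions import Fraction


def classical_success_probability(q: int, n_items: int) -> Fraction:
    """Success probability after q distinct classical queries plus one final guess.

    Assumption: the marked item is uniformly distributed over the list.
    """
    if not 0 <= q < n_items:
        raise ValueError("q must satisfy 0 <= q < N")
    # Probability that one of the q queried items was the marked one.
    p_found = Fraction(q, n_items)
    # Otherwise, guess uniformly among the N - q items not yet queried.
    p_lucky_guess = (1 - p_found) * Fraction(1, n_items - q)
    return p_found + p_lucky_guess  # simplifies to (q + 1) / N


if __name__ == "__main__":
    for q, n_items in [(1, 4), (2, 8), (2, 16), (2, 32)]:
        print(q, n_items, classical_success_probability(q, n_items))
```

Exact rational arithmetic is used so that the result can be compared with $(q+1)/N$ without rounding.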
Grover's algorithm provides a quadratic quantum speedup, requiring only $O(\sqrt{N})$ queries [1]. This scaling remains valid with more than one marked element [25], or even for an arbitrary initial amplitude distribution over the list elements [26]. In the original setting of a single marked element, the state after $q$ queries to the oracle is
$$|\psi_q\rangle = \sin[(2q+1)\theta]\,|m\rangle + \cos[(2q+1)\theta]\,|m^\perp\rangle, \qquad (1)$$
where $|m^\perp\rangle = \frac{1}{\sqrt{N-1}}\sum_{x \neq m} |x\rangle$ and $\theta = \arcsin(1/\sqrt{N})$. Thus, the quantum success probability is $p^Q_s(q, N) = \sin^2[(2q+1)\theta]$, and the theoretically optimal number of queries is $q_{\mathrm{opt}} = \frac{\pi}{4}\sqrt{N}$. Note that $p^C_s(q, N) < p^Q_s(q, N)$ for all $1 \leq q < q_{\mathrm{opt}}$ (the two coincide at $q = 0$, where both equal $1/N$). However, the theoretically optimal $q$ is often not experimentally optimal. As circuit depth increases with the number of queries and the problem size, there is a trade-off between the added decoherence and the increase in the success probability. Most experimental implementations of Grover's algorithm have focused on a single query [5-8], but this strategy does not scale well, as both $p^C_s(1, N)$ and $p^Q_s(1, N)$ decrease exponentially with $n$. We adopt an empirical approach to identify the number of queries that maximizes $p_s$. We set $q = 2$ for all problem sizes other than $n = 2$, where $q_{\mathrm{opt}} = 1$. We justify our choice of the number of queries in Appendix B.
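The ideal quantum and classical success probabilities can be compared numerically with a few lines of Python. This is a sketch of the closed-form expressions quoted in the text, using the query counts stated there ($q = 1$ for $n = 2$, $q = 2$ otherwise); the function names are illustrative.

```python
import math


def quantum_success_probability(q: int, n_items: int) -> float:
    """Ideal (noise-free) Grover success probability sin^2((2q + 1) * theta)."""
    theta = math.asin(1.0 / math.sqrt(n_items))
    return math.sin((2 * q + 1) * theta) ** 2


def classical_success_probability(q: int, n_items: int) -> float:
    """Best classical success probability (q + 1) / N after q queries."""
    return (q + 1) / n_items


if __name__ == "__main__":
    for n in range(2, 6):
        n_items = 2 ** n
        q = 1 if n == 2 else 2  # query counts used in the text
        p_q = quantum_success_probability(q, n_items)
        p_c = classical_success_probability(q, n_items)
        print(f"n={n} q={q} quantum={p_q:.4f} classical={p_c:.4f}")
```

For $n = 5$ and $q = 2$ the ideal quantum value evaluates to about 0.602, while the classical threshold is $3/32 \approx 0.094$.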

B. Implementation
A schematic illustrating the implementation of the $n$-qubit Grover algorithm is shown in Fig. 1. The only multi-qubit operation is the $n$-qubit controlled-phase gate $C^{n-1}Z$, which needs to be implemented twice for each oracle query: once for the oracle and again for the amplitude amplification step. Different marked elements are represented by sandwiching the $C^{n-1}Z$ gate with $X_i$ or $I_i$, depending on whether the corresponding bit $b_i$ in the marked bitstring $m$ is 0 or 1. That is, letting $m = b_1 b_2 \ldots b_n$, $C^{n-1}Z$ in the oracle layer is preceded and followed by $\bigotimes_{i=1}^{n} X_i^{1-b_i}$. Likewise, amplitude amplification is implemented as $H^{\otimes n} X^{\otimes n}\, C^{n-1}Z\, X^{\otimes n} H^{\otimes n}$. For all problem sizes and oracles, we repeated each circuit for the maximum number of shots allowed on the QPU: 20000 and 32000 for Nairobi and Jakarta, respectively. The reported success probabilities were extracted by bootstrapping over these trials and all $N$ possible marked states. All error bars reflect 95% confidence intervals obtained after bootstrapping unless specified otherwise.
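The construction above can be checked with a small noise-free statevector simulation. The sketch below assumes the standard textbook form of the amplitude amplification step, $H^{\otimes n} X^{\otimes n}\, C^{n-1}Z\, X^{\otimes n} H^{\otimes n}$ (equal to the usual diffusion operator up to a global phase), and builds dense matrices with NumPy; it is not the compiled circuit run on hardware, and all function names are illustrative.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)


def kron_all(mats):
    """Tensor product of a list of matrices; the first factor is the leftmost bit."""
    out = np.array([[1.0]])
    for m in mats:
        out = np.kron(out, m)
    return out


def cn_z(n):
    """C^{n-1}Z: phase -1 on |1...1>, identity elsewhere."""
    diag = np.ones(2 ** n)
    diag[-1] = -1.0
    return np.diag(diag)


def oracle(marked):
    """Sandwich C^{n-1}Z with X on every qubit whose marked bit is 0."""
    flips = kron_all([X if b == "0" else I2 for b in marked])
    return flips @ cn_z(len(marked)) @ flips


def diffusion(n):
    """H^n X^n C^{n-1}Z X^n H^n (assumed standard form, up to a global phase)."""
    hx = kron_all([H] * n) @ kron_all([X] * n)
    return hx @ cn_z(n) @ hx.T  # hx is real, so its transpose is its adjoint


def grover_success_probability(marked, queries):
    n = len(marked)
    state = np.full(2 ** n, 1.0 / np.sqrt(2 ** n))  # H^n |0...0>
    step = diffusion(n) @ oracle(marked)
    for _ in range(queries):
        state = step @ state
    return float(abs(state[int(marked, 2)]) ** 2)


if __name__ == "__main__":
    print(grover_success_probability("10", 1))
    print(grover_success_probability("01101", 2))
```

Because the ideal success probability does not depend on which bitstring is marked, any choice of `marked` of a given length yields the same value.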

III. OPEN SYSTEM MODEL
The QPUs used here are calibrated daily, and the following calibration metrics are recorded: the gate error $e_g$ and gate duration $\tau_g$, the qubit damping timescale $T_1$ and dephasing timescale $T_2$, and the response matrix $M$ for readout errors (see Appendices C and D). In this section, we describe how we estimate the theoretical performance of Grover's algorithm using these metrics. The model described here is mathematically equivalent to the one used in Qiskit's Aer API (see the supplementary information of Ref. [27]). In a closed system described by a state $\rho$, a unitary gate $U$ acts as $\mathcal{U}(\rho) = U\rho U^\dagger$. In reality, the system is open, so we model gate $U$ as a CPTP map $\mathcal{E} = \mathcal{D} \circ \Phi \circ \mathcal{A} \circ \mathcal{U}$, where $\mathcal{D}$, $\mathcal{A}$, and $\Phi$ are depolarizing, amplitude damping, and phase damping maps, respectively [28]. The amplitude damping and phase damping maps account for thermal relaxation, which we represent as $\mathcal{R} = \Phi \circ \mathcal{A}$. The single-qubit maps $\mathcal{A}$ and $\Phi$ have Kraus operators $\{A_0, A_1\}$ and $\{F_0, F_1\}$, respectively, and the $n$-qubit depolarizing map $\mathcal{D}$ has $4^n$ Kraus operators. We parameterize these maps by their respective error probabilities $p_A$, $p_\Phi$, and $p_D$, which in turn depend on the calibration metrics $e_g$, $\tau_g$, $T_1$, and $T_2$. Throughout, $F(\mathcal{E}) = [d\,F_{\mathrm{pro}}(\mathcal{E}) + 1]/(d+1)$ is the average gate fidelity for a CPTP map $\mathcal{E}$. Here $F_{\mathrm{pro}}(\mathcal{E})$ is the process fidelity of the map $\mathcal{E}$ with the target map $\mathcal{U}$, and $d$ is the dimension of the map [29].
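As an illustration of the relaxation map, the sketch below builds single-qubit amplitude damping and phase damping Kraus operators in their standard textbook form and applies them to a density matrix. The parameterization used here, $p_A = 1 - e^{-\tau/T_1}$ and $p_\Phi = 1 - e^{-2\tau/T_\phi}$ with $1/T_\phi = 1/T_2 - 1/(2T_1)$, is an assumption chosen so that populations decay with $T_1$ and coherences with $T_2$; it may differ in form from the expressions used by the authors.

```python
import numpy as np


def relaxation_kraus(tau, t1, t2):
    """Return (A, F): amplitude damping and phase damping Kraus operators.

    Assumed parameterization (not taken from the paper):
      p_A   = 1 - exp(-tau / T1)
      p_Phi = 1 - exp(-2 * tau / T_phi), with 1/T_phi = 1/T2 - 1/(2*T1)
    """
    if t2 > 2.0 * t1:
        raise ValueError("physical relaxation requires T2 <= 2 * T1")
    p_a = 1.0 - np.exp(-tau / t1)
    rate_phi = 1.0 / t2 - 1.0 / (2.0 * t1)  # pure dephasing rate
    p_phi = 1.0 - np.exp(-2.0 * tau * rate_phi)
    a_ops = [np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - p_a)]]),
             np.array([[0.0, np.sqrt(p_a)], [0.0, 0.0]])]
    f_ops = [np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - p_phi)]]),
             np.array([[0.0, 0.0], [0.0, np.sqrt(p_phi)]])]
    return a_ops, f_ops


def apply_channel(kraus_ops, rho):
    """Apply a channel given by its Kraus operators to a density matrix."""
    return sum(k @ rho @ k.conj().T for k in kraus_ops)


def relax(rho, tau, t1, t2):
    """Relaxation R = Phi o A: amplitude damping first, then phase damping."""
    a_ops, f_ops = relaxation_kraus(tau, t1, t2)
    return apply_channel(f_ops, apply_channel(a_ops, rho))


if __name__ == "__main__":
    plus = np.full((2, 2), 0.5)  # |+><+|
    print(relax(plus, tau=1.0, t1=100.0, t2=80.0))
```

With this choice, the off-diagonal element of $|+\rangle\langle+|$ decays as $e^{-\tau/T_2}$ and the excited-state population as $e^{-\tau/T_1}$.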
For each gate, we know the total gate error $e_g$, and so we compute $p_D$ by setting $1 - e_g = F(\mathcal{D} \circ \mathcal{R})$, which gives us Eq. (5c) (see Appendix E for more details). In other words, we assign any gate error not accounted for by relaxation to depolarization. Since $p_D \geq 0$, we must have $F(\mathcal{R}) \geq F(\mathcal{D} \circ \mathcal{R})$, i.e., the error due to relaxation alone cannot exceed the error due to relaxation followed by depolarization. If this condition is not met, then we assume that the error is entirely due to depolarization, so that $F(\mathcal{R}) = 1$ and hence we set $p_D = e_g d/(d-1)$ and $\mathcal{E} = \mathcal{D} \circ \mathcal{U}$. For idle intervals in the circuit, no gate error $e_g$ is reported by IBMQE [22], and therefore we model idle intervals with duration $\tau$ as identity operations where only the relaxation $\mathcal{R}$ matters. This is equivalent to setting the gate error for idle intervals to $e_{\mathrm{idle}} = 1 - F(\mathcal{R})$.
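The assignment of residual gate error to depolarization can be sketched as follows. The sketch assumes the depolarizing map has the form $\mathcal{D}(\rho) = (1 - p_D)\rho + p_D I/d$, for which $F(\mathcal{D} \circ \mathcal{R}) = (1 - p_D)F(\mathcal{R}) + p_D/d$; this convention reproduces the limiting value $p_D = e_g d/(d-1)$ quoted in the text when $F(\mathcal{R}) = 1$, but the function name and the closed form below are illustrative and not copied from the authors' Eq. (5c).

```python
def depolarizing_probability(gate_error: float,
                             relaxation_fidelity: float,
                             dim: int) -> float:
    """Solve 1 - e_g = (1 - p_D) * F(R) + p_D / d for p_D.

    Assumes D(rho) = (1 - p_D) * rho + p_D * I / d (a convention chosen here).
    """
    target_fidelity = 1.0 - gate_error
    if relaxation_fidelity < target_fidelity:
        # Relaxation alone would exceed the reported gate error: attribute the
        # whole error to depolarization, i.e. act as if F(R) = 1.
        return gate_error * dim / (dim - 1)
    return (relaxation_fidelity - target_fidelity) / (relaxation_fidelity - 1.0 / dim)


if __name__ == "__main__":
    # Illustrative numbers only, not calibration data from the paper.
    print(depolarizing_probability(gate_error=0.01, relaxation_fidelity=0.995, dim=4))
```

The fallback branch mirrors the rule stated above for the case where the condition $F(\mathcal{R}) \geq F(\mathcal{D} \circ \mathcal{R})$ is violated.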
In summary, single-qubit gates $U_{1Q}$ and two-qubit gates $U_{2Q}$ acting on qubits $j, k$, each with gate duration $\tau_g$, are modeled as the ideal unitary map followed by relaxation and depolarization, as described above.
FIG. 2. DD insertion scheme for a single query of the 4-qubit Grover circuit. Recall that each oracle query for 4-qubit Grover requires two $C^3Z$ gates. $C^3Z$ requires 14 CNOTs, and the entire circuit uses 28 CNOTs; see Appendix F for circuit compilation details. The pre-DD circuit elements are grayed out, and the colored lines represent the DD pulses. The DD sequence exemplified here uses four pulses for illustration purposes; in reality, we used longer sequences. The scheme highlights four primary features of our implementation: (1) all idle intervals, including the ones on inactive qubits, are filled; (2) only one repetition of each sequence is performed, and the pulse interval is adjusted accordingly; (3) each pulse in the sequence can be unique; (4) a single qubit can experience multiple DD repetitions if there are multiple idle intervals.
The quantum circuit for each experiment is first compiled into the QPU's native gate set and then scheduled using IBMQE's API [22].We use this circuit to determine the order of operations and then replace each unitary map U with the corresponding CPTP channel E. In the end, we acquire a probability distribution corresponding to the theoretical estimate of the circuit's output as measured in the computational basis.

IV. DYNAMICAL DECOUPLING
DD is an open-loop quantum control technique wherein a sequence of pulses is strategically inserted between gates to suppress unwanted system-bath interactions [15-18]. While DD is fully compatible with quantum error correction [30], its most economical form requires no encoding, measurements, or post-processing. It is, therefore, perhaps the least resource-intensive error suppression strategy. Error suppression via DD has a long history of experimental demonstrations on various quantum devices (see Ref. [31] for a review). Here, we employ a "decouple then compute" strategy [32,33], whereby control pulses constituting short but complete DD sequences are interleaved with the quantum circuit, exploiting intervals when individual qubits in the corresponding quantum circuits are idle. A scheme demonstrating our strategy is shown in Fig. 2. This interleaving strategy has been used to improve quantum volume [34], variational quantum algorithms [35], and most recently to demonstrate an algorithmic quantum speedup [36].
In addition to the popular basic DD sequences, CPMG [37] and XY4 [16], we consider three robust sequence families: universally robust (UR) DD [19], concatenated DD (CDD) [20], and robust genetic algorithm (RGA) DD [21] (see Appendix B for more details). Other than CPMG, these are all high-order, multi-axis sequences that are universal for single qubits, i.e., they suppress arbitrary single-qubit errors beyond first order in the Magnus or Dyson expansion [38]. Robustness refers to the mitigation of axis-angle and over/under-rotation errors. In addition, these sequences can cancel crosstalk errors [23,24]. Our sequence choice is informed by the results of Ref. [39], which reported on a significantly more comprehensive survey of sequences using superconducting qubits and concluded that robust sequences are preferred default choices. Here we do not utilize the OpenPulse functionality of the IBMQE platforms, nor do we implement Uhrig-type [40] non-uniform pulse interval DD sequences such as quadratic DD (QDD) [41], which were also found to perform well in the survey [39]. Both have the potential to enhance our results and are attractive options for future studies.

V. TWO-QUBIT ENCODED GROVER ALGORITHM PROTECTED BY QUANTUM ERROR DETECTION
The [[4, 2, 2]] code [9] is the smallest possible qubit-based error-detecting code [10] and has been invoked for proof-of-principle demonstrations of quantum error detection [42]. Notably, it has been used to improve Clifford gate set fidelities [43] and the performance of variational algorithms [44]. However, measurement error mitigation (MEM) played a dominant role in Ref. [44], and it is unclear if error detection alone would have improved performance in that work. Here, we compare the performance of the two-qubit Grover algorithm with and without the [[4, 2, 2]] code and MEM. The unencoded version needs two qubits, while the encoded version requires four. To equalize resources, we simultaneously use two copies of the unencoded circuit and report the best fidelity of the two copies. We incorporate MEM using iterative Bayesian unfolding via the pyIBU package [45] (see Appendix D for details about MEM). Ultimately, we demonstrate a conclusive improvement in algorithmic performance due to quantum error detection.

A. Encoding into the [[4, 2, 2]] code
The stabilizers of the [[4, 2, 2]] code are $XXXX$ and $ZZZZ$. The logical operators of this code can be chosen as $\bar{X}_1 = XIXI$, $\bar{X}_2 = XXII$, $\bar{Z}_1 = ZZII$, and $\bar{Z}_2 = ZIZI$. Two-qubit Grover also requires the encoded Hadamard $\bar{H}$ and controlled-phase $\overline{CZ}$. The decomposition of the logical circuit into physical components is detailed in Appendix F, and the resultant encoded two-qubit Grover circuit is shown in Fig. 3.
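The algebraic properties of this operator choice can be verified with a short script. The sketch below represents Pauli operators as strings, checks the expected commutation relations between the stabilizers and logical operators, and computes the stabilizer syndrome of every single-qubit Pauli error. Helper names are illustrative.

```python
def commutes(p: str, q: str) -> bool:
    """Pauli strings commute iff they clash (differ, both non-identity) on an even number of sites."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


STABILIZERS = ("XXXX", "ZZZZ")
LOGICALS = {"X1": "XIXI", "X2": "XXII", "Z1": "ZZII", "Z2": "ZIZI"}


def single_qubit_error(pauli: str, site: int, n: int = 4) -> str:
    """Pauli string with `pauli` on qubit `site` and identity elsewhere."""
    return "I" * site + pauli + "I" * (n - site - 1)


def syndrome(error: str):
    """Stabilizer eigenvalues (XXXX, ZZZZ) after `error` acts on a code state."""
    return tuple(+1 if commutes(error, s) else -1 for s in STABILIZERS)


if __name__ == "__main__":
    for pauli in "XYZ":
        # The syndrome is the same for all four sites: the code reveals the
        # error type but not the affected qubit.
        print(pauli, {syndrome(single_qubit_error(pauli, site)) for site in range(4)})
```

Every weight-1 error flips at least one stabilizer, so it is detected, while errors of the same type on different qubits share a syndrome and cannot be located.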
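The encoding and decoding circuits, $U_{\mathrm{enc}}$ and $U_{\mathrm{enc}}^\dagger$, are also depicted in Fig. 3. With the logical operators chosen above, the logical basis states are:
$$|\overline{00}\rangle = \frac{|0000\rangle + |1111\rangle}{\sqrt{2}}, \quad |\overline{01}\rangle = \frac{|0011\rangle + |1100\rangle}{\sqrt{2}}, \quad |\overline{10}\rangle = \frac{|0101\rangle + |1010\rangle}{\sqrt{2}}, \quad |\overline{11}\rangle = \frac{|0110\rangle + |1001\rangle}{\sqrt{2}}.$$
These are also the four possible marked states in the two-qubit Grover problem. Consequently, after applying $U_{\mathrm{enc}}^\dagger$ to decode the results, only states from the set $I = \{|0000\rangle, |0010\rangle, |0111\rangle, |0101\rangle\}$ could have arisen from valid logical states. Therefore, we postselect by removing any of the 12 measurement outcomes that do not correspond to valid logical states, i.e., states in $I^\perp = \{0,1\}^4 \setminus I$.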
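The postselection step can be sketched as a filter over measured bitstring counts. The sketch assumes counts are stored in a dictionary keyed by 4-bit strings written in the same bit order as the set $I$ in the text (hardware SDKs may use a different bit ordering); the example counts are made up for illustration and are not experimental data.

```python
VALID_DECODED_STATES = {"0000", "0010", "0111", "0101"}  # the set I from the text


def postselect(counts):
    """Keep only outcomes that could have arisen from valid logical states.

    Returns the filtered counts and the fraction of shots that survive.
    """
    kept = {b: c for b, c in counts.items() if b in VALID_DECODED_STATES}
    total = sum(counts.values())
    return kept, sum(kept.values()) / total


def postselected_success_probability(counts, target):
    """Success probability for decoded target bitstring `target` after postselection."""
    kept, _ = postselect(counts)
    kept_total = sum(kept.values())
    return kept.get(target, 0) / kept_total


if __name__ == "__main__":
    # Made-up counts for illustration only.
    toy_counts = {"0000": 900, "0010": 20, "1000": 50, "1111": 30}
    print(postselect(toy_counts))
    print(postselected_success_probability(toy_counts, "0000"))
```

Outcomes discarded here are exactly the ones that algorithmic error tomography later uses as diagnostic information.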

B. Algorithmic error tomography
Even though we discard any states not in the set I during postselection, there is important information in such outcomes.They allow us to diagnose the frequency with which different Pauli errors appear at the end of the circuit for a given encoded marked state, i.e., to perform algorithmic error tomography (AET), which we now describe.
Let us first consider the different ways errors can occur before the measurement outcome is obtained. In principle, errors can occur anywhere during the circuit, including between the decoding and measurement steps or even during the measurement. There can be multiple errors at multiple locations of arbitrary weight. Given an $[[n, k, d]]$ code $C$, in AET, we treat all these errors as either the effective errors that this code can detect or as logical errors, and assign a location to these errors as if they happened just before decoding. Namely, if $|\bar{b}\rangle \in C$ is a code basis state, where $b \in \{0,1\}^k$, $E$ is an error, $U_{\mathrm{dec}} = U_{\mathrm{enc}}^\dagger$ is the decoding unitary for $C$, and $M$ denotes a projective measurement in the computational basis, then in AET we attribute every observed outcome to applying $M$ to $U_{\mathrm{dec}} E |\bar{b}\rangle$. An outcome that lies outside the decoded code space signals a detected error. In that case, we can further identify the error type it arose from (i.e., associate it to a particular error subspace $C^\perp_E$). By finding the empirical relative frequency of this error type, we can compute this error's probability $p_E$. For logical errors, the procedure is similar; we need to find the empirical relative frequency of $|\bar{b}'\rangle$, with $b' \neq b$, arising from a given code basis state $|\bar{b}\rangle$.
Let us now illustrate this formal description of AET using the [[4, 2, 2]] code.
In this case $C = \mathrm{span}(\{|0000\rangle, |0010\rangle, |0111\rangle, |0101\rangle\})$. It is clear from Eq. (11) that the decoding circuit provides a map between encoded states $|\overline{b_1 b_2}\rangle$ and unencoded states. Table I lists, for the [[4, 2, 2]] code, $U_{\mathrm{dec}} E |\overline{b_1 b_2}\rangle$ for all single-qubit Pauli errors and logical computational basis elements; each subspace uniquely determines the error type, which we use to perform algorithmic error tomography. As the probability of each outcome (i.e., each row in Table I) depends on the error $E$, the bitstring observed after applying the decoding circuit tells us the frequency with which different single-qubit Pauli errors occurred. More specifically, since the [[4, 2, 2]] code is a distance-2 code, we can only infer which weight-1 error type occurred ($X$, $Y$, or $Z$), but not which specific qubit was affected. In other words, we can associate each observed state $|b\rangle$ ($b \in \{0,1\}^4 \setminus I$) with an error subspace $C^\perp_X$, $C^\perp_Y$, or $C^\perp_Z$. Each observed state $|b\rangle$ will have an empirical probability $p_b = N_b/N_{\mathrm{tot}}$, where $N_b$ is the number of times $|b\rangle$ is observed out of a total of $N_{\mathrm{tot}}$ observations. The probability of single-qubit $X$-type errors, $p_X$, is the sum of $p_b$ in the first four rows, etc.
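The bookkeeping behind these error probabilities can be sketched as follows. The function takes measured counts and a lookup from decoded bitstring to error type; in the paper that lookup is given by Table I, which is not reproduced here, so the mapping used in the example is purely hypothetical, as are the counts.

```python
from collections import defaultdict


def aet_probabilities(counts, error_type_of):
    """Aggregate empirical probabilities p_b = N_b / N_tot by error type.

    `error_type_of` maps a decoded bitstring to a label such as "X", "Y", or "Z".
    Bitstrings missing from the map are counted as "undetected" (outcomes in I,
    which are either correct or logical errors).
    """
    n_tot = sum(counts.values())
    probs = defaultdict(float)
    for bitstring, n_b in counts.items():
        label = error_type_of.get(bitstring, "undetected")
        probs[label] += n_b / n_tot
    return dict(probs)


if __name__ == "__main__":
    # HYPOTHETICAL mapping and counts, for illustration only; the real
    # assignment of bitstrings to error subspaces is given by Table I.
    toy_map = {"1000": "X", "0100": "Z", "0001": "Y"}
    toy_counts = {"0000": 80, "1000": 10, "0100": 6, "0001": 4}
    print(aet_probabilities(toy_counts, toy_map))
```

Repeating this for each encoded marked state yields one row of an error outcome table of the kind discussed in the text.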

Algorithmic error tomography
Figs. 4 and 5 show the results of AET after implementing or simulating the encoded two-qubit Grover algorithm on Jakarta and Nairobi, respectively. Each of the four panels corresponds to a different error outcome table, with the top row representing experiments and the bottom row representing simulations using the model of Section III. The left and right columns of Fig. 4 exclude or include DD, respectively. We did not use DD in the Nairobi case (Fig. 5). Within each error outcome table, each row corresponds to a different encoded basis state, i.e., encoded marked state. Other than the logical errors counted during postselection, all other columns represent outcomes ignored during the postselection step. We discuss these results in more detail in the following subsection, showing how AET allows us to identify and mitigate qubit crosstalk.

DD protection and comparison with the open system model
Recall that we define the success probability $p_s$ as the probability of correctly identifying the marked element. We denote the empirical success probability obtained for a list of $N$ elements after $q$ oracle queries by $p^e_s(q, N)$. Our results are summarized in Fig. 6, which shows the failure probability $1 - p^e_s(1, 4)$ for the unencoded and the encoded implementations on two different QPUs.
Before comparing the results with and without encoding, we analyze whether the observed performance matches the model of Section III in both cases. Let us focus first on Jakarta, where without DD, the empirical failure probabilities in the unencoded case are slightly higher than predicted; see the leftmost column of Fig. 6. In the encoded case, Jakarta's failure probability overlaps with the prediction bands (Fig. 6, third column from the left). However, a closer look at the detected errors via AET reveals a different discrepancy. The simulated results for Jakarta (bottom-left table in Fig. 4) do not match the empirical error profile (top-left table of Fig. 4), which has significantly stronger $Z$ errors and also a state-dependent asymmetry in these errors. In other words, Jakarta does not match the simulations for unencoded or encoded circuits without DD. In contrast, for Nairobi, Fig. 5 shows that the AET simulation results agree with the empirically observed ones. This also holds for the simulated failure probabilities (Fig. 6).
To investigate Jakarta's observed discrepancy, we first attempt to systematically rescale $p_D$, $T_1$, and $T_2$ by multiplying each quantity by a phenomenologically determined variable $\lambda_i$ (see Appendix G). This leads to a better overlap between predicted and observed success probabilities but does not reproduce the AET asymmetry seen in Fig. 4. This shows the limitations of the phenomenological model of Section III and highlights the level of detail provided by AET.
However, the Jakarta discrepancy is effectively removed after the application of DD. Our two-qubit Grover implementation uses four qubits, leaving three inactive qubits in the 7-qubit QPUs used in our experiments. As there are no idle intervals in the unencoded two-qubit Grover circuit, we applied the XY4 sequence on the inactive qubits ($q_2$, $q_4$, and $q_6$; see Appendix C). We applied the XY4 sequence to both the active and inactive qubits for the encoded case. Due to the relative sparsity of idle intervals in the two-qubit Grover circuits, we did not attempt to implement robust sequences, which require more pulses than XY4.
FIG. 5. Algorithmic error tomography on Nairobi. Data entries are as in Fig. 4, except that only data without DD is shown. Good agreement is observed between the results of our error model and the experiment. In particular, compared to the error tomography table for Jakarta (Fig. 4), we do not observe an asymmetry in $Z$ errors across marked states.
Fig. 6 shows how the failure probability and the rates of various detected errors on Jakarta are affected by the presence of DD. For the unencoded case (the first two columns from the left of Fig. 6), DD improves the performance slightly, and the discrepancy between the predicted and observed failure probabilities is removed. The improvement by DD in the unencoded two-qubit Grover case is consistent with Refs. [23,24], which showed the efficacy of the XY4 sequence in suppressing static $ZZ$ crosstalk in superconducting qubits. In other words, these results indicate that $ZZ$ crosstalk, which is well documented for superconducting QCs [46], likely contributes to the observed performance being slightly worse than expected from the model.
Adding the XY4 sequence removes most of the empirical-theoretical discrepancies in both the magnitude and the asymmetry of the errors exhibited by the AET profiles, as seen by comparing the top and bottom right of Fig. 4. With DD, the encoded circuits have a weaker state-wise asymmetry in $Z$ errors than seen in the left column of Fig. 4. Moreover, the DD-protected circuits more closely reproduce the distribution of detected errors predicted by the model of Section III than the same circuits without DD. This observation, that the agreement between our theoretical model and the experimental results improves under DD, is further validated below.
The close agreement we found for Nairobi between our (crosstalk-free) model and the experimental results without DD or MEM (Fig. 5 and the first and third columns from the left of Fig. 6) suggests that crosstalk does not play a significant role in this QPU. Fig. 6 does, however, exhibit a significant discrepancy between the model and the experimental Nairobi results when MEM is included (second and last columns of Fig. 6). As we show in Appendix D, this discrepancy arises from the choice to mitigate readout errors using iterative Bayesian unfolding (IBU) [12].
Finally, Fig. 7 complements the first and last columns of Fig. 6, as well as the AET results, and shows the output distributions for Jakarta and Nairobi for the two-qubit Grover case, with and without encoding and MEM. The main observation is that for the unencoded case, the maximum success probability is obtained for the marked state $|00\rangle$, which is also the QPU's ground state; this is unsurprising given the dominance of amplitude damping errors. With encoding plus error mitigation, the overall performance increases and becomes independent of the marked state.
FIG. 6. [...] (light green for Jakarta with DD and Nairobi, or pink for Jakarta without DD) correspond to 95% confidence intervals after bootstrapping. Dark green appears where the pink and light green colors (i.e., Jakarta with and without DD) overlap. In the Unenc case, we run two identical copies of the two-qubit Grover problem to equalize resources with the Enc case and choose the copy with the highest success probability. Also shown are the results with MEM using iterative Bayesian unfolding (see Appendix D for details). Failure probabilities with and without DD protection are shown for Jakarta but not for Nairobi, where the simulated and observed error tomography and failure probabilities are in agreement (see Section V C 2). The presence of DD does not affect the success probability in the encoded implementation, and as a result, the pink bars are mostly hidden behind the green bars. However, the nature of detected errors, even in the encoded case, is affected by DD (see Fig. 4). All data for different runs on the same QPU were collected on the same day; data from different QPUs were collected on different days.

Success probability: beyond break-even improvement
We now focus on the effect of error detection on two-qubit Grover performance as seen in Fig. 6. Due to the shallow circuit depth, even without any error detection, $p^e_s(1, 4) \sim 93.0\%$, already much higher than the classical success probability $p^C_s(1, 4) = 1/2$. Adding error detection improves the success probability to $\sim 96.0\%$. The effect of MEM is similar to that of error detection: the success probability increases to $\sim 97.0\%$. Combining error detection with MEM results in additional improvement: we obtain success probabilities of $\sim 98.5\%$ on Nairobi and $\sim 99.5\%$ on Jakarta. Due to error detection and MEM, Jakarta's failure probability decreases by about an order of magnitude.
This improvement over the unencoded case is nontrivial, considering that the [[4, 2, 2]] code can only detect weight-1 errors, and the encoded circuit requires six two-qubit gates. In contrast, the unencoded version requires only two. The relatively high success probabilities we observe in the encoded case suggest that most errors, even those due to the two-qubit gates, manifest as weight-1 errors. This shows, albeit for a relatively small problem size, that error detection can more than offset the extra errors introduced due to increased circuit depth and complexity.
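The size of the improvement is easiest to see in terms of failure probability, $1 - p_s$. The short sketch below uses the approximate success probabilities quoted in the text (about 93.0% unprotected and about 99.5% with error detection plus MEM on Jakarta); the function name is illustrative.

```python
def failure_reduction_factor(p_success_before: float, p_success_after: float) -> float:
    """Ratio of failure probabilities, (1 - p_before) / (1 - p_after)."""
    return (1.0 - p_success_before) / (1.0 - p_success_after)


if __name__ == "__main__":
    # Approximate values quoted in the text.
    print(failure_reduction_factor(0.930, 0.995))  # roughly a 14-fold reduction
```

A change from about 7% to about 0.5% failure probability corresponds to roughly a 14-fold reduction, i.e., about one order of magnitude.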
We have demonstrated an algorithmic beyond-break-even improvement using error detection in the sense that the protected algorithm clearly outperforms its unprotected counterpart. Previous break-even improvements were at the individual gate level [47,48]. Here we have demonstrated such an improvement at the level of the execution of an entire algorithm, albeit of a fixed size. The holy grail is to demonstrate the implementation of an algorithm for a family of problem sizes at the logical level with higher fidelity than the same algorithm executed at the physical level. Achieving this in our setting would require increasing the problem and code sizes. The family of $[[2k + 2, k, 2]]$ subsystem quantum error-detecting codes is an attractive option in this regard since all their logical operators can be chosen to be 2-local [49], which simplifies the circuit design. An experimental implementation of such larger codes and problem sizes remains a coveted goal.
[Figure caption] The DD sequences are ranked in order of decreasing success probability. The two dotted lines represent success probabilities corresponding to a random and classical strategy, respectively. RGA8a and RGA8c are tied as the best-performing sequences. Free denotes the result of an unprotected implementation. Error bars correspond to 99% confidence intervals.

VI. 3-QUBIT TO 5-QUBIT GROVER PROTECTED BY DYNAMICAL DECOUPLING
Crossing the classical threshold in Grover's search for an increasingly larger number of qubits is a meaningful goal, not only because the quadratic speedup offered by Grover's algorithm leads to a more dramatic improvement as the problem size increases but also because it becomes more challenging to realize the speedup experimentally, as the controlled-phase gate $C^{n-1}Z$ is an $n$-qubit entangling operation. In the implementation of Ref. [5], 5-qubit Grover required nearly a thousand two-qubit gates, and for 8-qubit Grover, nearly 15000 gates were used. Notably, this exponential increase in the number of two-qubit gates with problem size arises because Ref. [5] did not use ancilla qubits to make the circuits shallower. It is possible to implement $C^{n-1}Z$ with circuits where two-qubit gates scale linearly with $n$ (see Appendix F). Ref. [8] employed shallower circuits for $C^{n-1}Z$ and solved 5-qubit Grover with slightly better-than-random success probabilities but without better-than-classical performance. We use an efficient, ancilla-assisted implementation of generalized Toffoli-type gates [8,50] to implement $C^{n-1}Z$ (see Appendix F). Our implementation, which builds upon the circuits from Ref. [8], uses 8, 14, and 22 CNOTs for a single $C^{n-1}Z$ gate for $n = 3$, 4, and 5, respectively. The deepest circuit we implement is two oracle queries for 5-qubit Grover, totaling 88 CNOTs. Despite being far shallower than the implementation of Ref. [5], this is still a relatively deep circuit; e.g., the quantum supremacy demonstration of Ref. [51] and the algorithmic quantum speedup demonstration of Ref. [36] involved circuits of depth up to 40 and 44, respectively. As we detail in this section, owing to error suppression via DD, we crossed the classical probability threshold for all problem sizes, including for 5-qubit Grover.
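The CNOT totals quoted above follow from simple counting: each query uses two $C^{n-1}Z$ gates (one in the oracle, one in amplitude amplification). The sketch below reproduces that arithmetic, assuming the $C^{n-1}Z$ gates are the only source of CNOTs in the circuit; the names are illustrative.

```python
# CNOTs per single C^{n-1}Z gate, as quoted in the text for n = 3, 4, 5.
CNOTS_PER_CNZ = {3: 8, 4: 14, 5: 22}


def total_cnots(n_qubits: int, queries: int) -> int:
    """Two C^{n-1}Z gates per query: oracle plus amplitude amplification."""
    return 2 * queries * CNOTS_PER_CNZ[n_qubits]


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, total_cnots(n, queries=2))
```

For a single query on four qubits this gives 28 CNOTs, and for two queries on five qubits it gives 88, matching the figures stated in the text.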
A. Implementation with dynamical decoupling
DD sequences are inserted into idle intervals of a quantum circuit using the "decouple then compute" strategy demonstrated in Fig. 2, which shows the DD insertion scheme for a single query on a 4-qubit Grover circuit. In contrast to two-qubit Grover, where we restricted DD implementation to XY4, $n \geq 3$ has ample idle intervals. Therefore we can implement robust sequences (RGA, CDD, UR) requiring more than four pulses. Each of these families has multiple members that are parameterized by the number of pulses in the sequence. We restrict our implementation to DD sequences with fewer than 32 pulses and, for each circuit, only consider sequences that we can fit in the idle intervals available in the quantum circuit.
FIG. 10. Success probabilities vs problem size. Nairobi (green) and Jakarta (orange) success probabilities for $n \in \{3, 4, 5\}$ are shown for DD-protected and unprotected implementations (legend entries: Jakarta Theory, Nairobi Theory, Jakarta Free, Jakarta DD, Nairobi Free, Nairobi DD). The translucent bands indicate the theoretically estimated success probabilities using the model described in Section III. We performed $q = 2$ queries to the quantum oracle in all cases. The ideal success probabilities are 0.945, 0.908, and 0.602 for $n = 3$, 4, and 5, respectively. The white lines correspond to the success probabilities for the classical strategy and random sampling from the unsorted list ($q = 0$). Error bars correspond to 99% confidence intervals.
At each problem size $n$, there are $2^n$ possible oracles, each corresponding to one marked state $|b\rangle$, where $b \in \{0, 1\}^n$. We proceed as follows to avoid implementing this exponentially large set of oracles. Given $0 \le k \le n$, there are $\binom{n}{k}$ distinct bitstrings that are identical to $0^k 1^{n-k}$ up to qubit permutation. Recall that marked states differ only by whether X or I gates surround the $C^{n-1}Z$ gate. Thus, we only consider the $n + 1$ oracles with marked states $|0^k 1^{n-k}\rangle$, $k \in \{0, \dots, n\}$. We then estimate the average success probability $p(n)$ from the results for these $n + 1$ oracles. We use $p(n)$ as the metric for selecting the optimal DD sequences among those we tested and to identify the experimentally optimal number of queries $q^{e}_{\mathrm{opt}}$. Once the optimal DD sequence and $q^{e}_{\mathrm{opt}}$ are identified for each $n$, we run the unprotected and the DD-protected Grover's algorithm again at $q^{e}_{\mathrm{opt}}$, but this time for all $2^n$ oracles.
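One plausible way to turn the $n + 1$ representative results into an average over all $2^n$ marked states is to weight each representative by the number of bitstrings it stands for. The Python sketch below assumes exactly this binomial weighting, which is an assumption on our part rather than a formula stated above; the function name is hypothetical.

```python
from math import comb


def average_success(n, p_by_k):
    """Average success probability over all 2^n marked states, estimated from
    n + 1 representatives. p_by_k[k] is the success probability measured for
    the marked state 0^k 1^(n-k); each is weighted by comb(n, k) (assumed)."""
    if len(p_by_k) != n + 1:
        raise ValueError("expected one success probability per k = 0..n")
    return sum(comb(n, k) * p for k, p in enumerate(p_by_k)) / 2**n
```

Under this weighting, a representative such as $0^n$, which stands for a single bitstring, contributes far less to the average than the balanced-weight representatives.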
B. Results

Optimal DD sequence and number of queries
Our first goal is to identify the best DD sequence and $q_{\mathrm{opt}}$. The determinations made in this step inform our choices for the next step. For conciseness, in this section, we focus on the results of the largest problem size we implemented, i.e., $n = 5$. Appendix B shows the results for $3 \le n \le 5$ on both Nairobi and Jakarta. The performance of various DD sequences on Nairobi for 5-qubit Grover is compared in Fig. 8 by computing $p(5)$. The unprotected evolution (Free) is marginally better than choosing an element randomly and does not cross the classical threshold. DD protection is necessary to cross this threshold, but the two-pulse sequences RGA2x and CPMG still result in worse-than-classical performance. The RGA and UR sequences perform well, particularly those with fewer than 12 pulses. RGA8a and RGA8c are tied as the best-performing sequences; we choose RGA8a for the next step, where we implement all of the $2^5$ oracles. The performance improvement seen due to robust sequences is consistent across problem sizes and devices, as detailed in Appendix B.
We also use $p(n)$ to identify the experimentally optimal number of oracle queries $q^{e}_{\mathrm{opt}}$. The theoretically optimal number of repetitions for $n = 3, 4, 5$ is $q_{\mathrm{opt}} = 2, 3, 4$, respectively. However, Appendix B shows that in reality, the theoretically expected $q_{\mathrm{opt}}$ often leads to worse performance than $q^{e}_{\mathrm{opt}}$. For the DD-protected implementation, $q = 2$ maximizes the success probability $p_s$ in all cases other than 5-qubit Grover on Jakarta, where the performance at $q = 2$ is comparable to $q = 1$. As DD protection is necessary to cross the classical threshold, for simplicity of analysis and to maximize $p_s$ we set $q = 2$ from here on.
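For reference, the ideal (noise-free) numbers quoted in this section follow from the standard Grover rotation picture, in which $q$ queries yield a success probability $\sin^2((2q+1)\theta)$ with $\sin\theta = 2^{-n/2}$. The short Python sketch below reproduces them alongside the classical value $(q+1)/N$ derived in Appendix A. Rounding $\frac{\pi}{4} 2^{n/2}$ down to an integer is an assumption made here; it matches the values 2, 3, 4 stated above.

```python
import math


def grover_success(n, q):
    """Ideal success probability of n-qubit Grover search after q queries."""
    theta = math.asin(1.0 / math.sqrt(2**n))
    return math.sin((2 * q + 1) * theta) ** 2


def classical_success(n, q):
    """Best classical success probability after q queries (see Appendix A)."""
    return min(1.0, (q + 1) / 2**n)


def q_opt(n):
    """Theoretically optimal query count, assuming the floor of (pi/4) 2^(n/2)."""
    return math.floor(math.pi / 4 * math.sqrt(2**n))


for n in (3, 4, 5):
    print(n, q_opt(n), round(grover_success(n, 2), 3), classical_success(n, 2))
```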

Better-than-classical performance
Fig. 9 shows our results for the 5-qubit Grover problem on Nairobi with and without DD. Even in our relatively shallow-depth implementation, before error suppression via DD (whether Free or Free + MEM), the final results are indistinguishable from randomly guessing the marked state. The results change significantly when we implement DD. With DD, the classical threshold is crossed by all marked states. Adding MEM improves the results slightly, but only when accompanied by DD.
This dramatic improvement due to DD holds for other problem sizes as well. Fig. 10 shows the success probabilities after two oracle queries on both devices for $3 \le n \le 5$ (see Appendix H for the role of postselection in these results). At the two smaller problem sizes ($n = 3, 4$), the unprotected implementations are better than random sampling, but the success probability is relatively low. For $n = 4$, the unprotected quantum Grover circuit does not exceed the classical single-query threshold. It is effectively on par with random sampling for $n = 5$. In contrast, for all problem sizes, the DD-protected quantum strategy at $q = 2$ outperforms the classical strategy for $q \le 3$. DD-protected Grover performance at $n = 3, 4, 5$ is equivalent to classical $q = 4, 5, 3$ for Jakarta and $q = 4, 5, 4$ for Nairobi, respectively. Thus, DD is essential in attaining a better-than-classical performance.
The translucent bands in Fig. 10 and the boxes in Fig. 9 (left) show the theoretically expected results computed from the open system model with the IBMQE-supplied parameters. The success probability in the unprotected Grover case (dashed lines in Fig. 10) is considerably lower than the theoretical expectation. This discrepancy is likely due to crosstalk and non-Markovianity, which are well-documented for IBMQE's superconducting qubit-based QPUs. Once we use DD, the observed fidelities improve and are close to the theoretical predictions. This improvement is expected given DD's ability to reduce the effect of crosstalk [23, 24] and non-Markovian effects.
With DD, the algorithmic performance approaches the expectations based on our error model. However, we emphasize that this model does not predict the QPU's performance under DD; it simply tells us what the performance would be if the reported calibration metrics corresponded to observed dynamics. The overlap between the theoretically predicted (translucent) and the DD-protected (solid) performance implies that DD successfully mitigates the errors that our simple model does not account for. However, the model does not provide an upper bound on the possible performance improvement due to error suppression. For instance, better-optimized sequences could suppress idle-time errors further, and dynamically corrected gates can suppress errors during operations [52, 53].
We note that the restriction to $n \le 5$ arose not because of circuit width but depth. In particular, we used two oracle queries, though theoretically $q_{\mathrm{opt}} = 4$ at $n = 5$. The gap between the theoretically and experimentally optimal number of queries is expected to grow with problem size. As is true for any quantum algorithm, optimizing circuit compilation and increasing metrics such as $T_1$, $T_2$, and gate fidelities are all vital for scalability.

VII. DISCUSSION AND CONCLUSIONS
We implemented Grover's algorithm of various sizes on multiple superconducting qubit devices. To our knowledge, this is the largest successful demonstration of Grover's algorithm for which the quantum strategy outperforms its classical counterpart. For two-qubit Grover, we focused on error detection via the [[4, 2, 2]] code and showed that it allowed us to achieve near-optimal performance. Along the way, we introduced the method of algorithmic error tomography. We showed that it provides a wealth of information complementary to previous protocols, such as gate set tomography or just measuring the success probability of an algorithm. We showed that error suppression via DD is essential in attaining better-than-classical performance for larger problem sizes.
Grover's algorithm is a demanding algorithm [5] as it requires multiple implementations of $C^{n-1}Z$, a fully entangling operation. The superconducting trimon device [6], which prior to our results achieved the highest success probability for 3-qubit Grover, is an example of algorithm-tailored hardware where $C^2Z$ is a native gate. Constructing hardware that can natively perform such entangling operations may be one path to realizing the full potential of Grover's algorithm. Still, it is desirable to achieve this goal with more general-purpose quantum hardware, as we have strived to do here.
Today's quantum experimentalists have various error mitigation tools at their disposal. Measurement error mitigation [12, 54], dynamical decoupling, zero noise extrapolation [11], and quantum error detection [38] are parallel strategies that address different kinds of errors. Whether and which error mitigation method to employ must be decided based on the problem and available resources. In this work, we combined MEM with DD and quantum error detection. As expected, these strategies complement each other. However, we found that often, MEM only became useful after DD was employed. Dynamical decoupling, which arguably has the lowest resource overhead and requires no postprocessing, was the single most effective strategy in improving the performance of our implementation of Grover's algorithm. Our work adds to the growing literature [23, 34-36, 39, 55] on the effectiveness of error suppression through DD.
While we demonstrated a crossing of the classical threshold at every problem size we tested, better-than-classical success probabilities are not enough to claim a provable quantum speedup [56]. Such a claim would require computing the scaling of the time-to-solution metric as a function of problem size and extending it to the largest possible problem size that can be embedded on the device. Here we could not go to the largest possible problem size as even at $n = 5$, our circuit is quite deep: two queries required 88 two-qubit gates, and for a larger number of queries or qubits, we no longer observed a quantum advantage. Achieving quantum speedup for Grover search will require devices that can implement circuits much deeper than those used here without a catastrophic drop in fidelity. Recent results [57, 58] suggest that without significant improvements in the surface code implementation, the latter will not necessarily provide an advantage over the type of error suppression and mitigation methods we have explored here. Thus, our results are likely to be necessary (but not sufficient) stepping stones toward a quantum speedup for Grover's algorithm.
Appendix A: Classical success probability
Let $p^C_s(q, N)$ be the classical success probability after $q$ oracle queries for an unsorted list with $N$ elements. If the oracle is never consulted, then we are simply picking an element at random, and the probability of finding the marked element is $p^C_s(0, N) = 1/N$. At the other extreme, after $N - 1$ queries with negative replies from the oracle, we are guaranteed to identify the marked element by selecting the last remaining element so that $p^C_s(N - 1, N) = 1$. Clearly, the probability grows in proportion to $q$, so we may conclude that $p^C_s(q, N) = \frac{q+1}{N}$.
A complete argument is the following. Suppose there are $N$ elements. With zero queries, we pick the marked element with probability $p^C_s(0, N) = 1/N$ and stop. With one query, either we already picked the correct element with probability $p^C_s(0, N) = 1/N$ and are so informed by the oracle, or we are told this was the wrong element, so we pick again from the remaining set of $N - 1$ elements. The probability that the marked element was in the set of $N - 1$ is $(N - 1)/N$, and the probability of now picking the correct element is $1/(N - 1)$:
$$p^C_s(1, N) = p^C_s(0, N) + \frac{N-1}{N} \times \frac{1}{N-1} = \frac{2}{N}.$$
With another query, we are now told whether our second pick was correct or wrong; if the latter, we pick again from the remaining set of $N - 2$ elements. The probability that the marked element was in the set of $N - 2$ is $(N - 2)/N$, and the probability of now picking the correct element is $1/(N - 2)$:
$$p^C_s(2, N) = p^C_s(1, N) + \frac{N-2}{N} \times \frac{1}{N-2} = \frac{3}{N}.$$
Each time we increase the probability of success by $1/N$. Thus, continuing, we have
$$p^C_s(q, N) = p^C_s(q - 1, N) + \frac{N-q}{N} \times \frac{1}{N-q} = \frac{q+1}{N}.$$
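The recursion above can be checked exactly with rational arithmetic. The following Python sketch (function name illustrative) accumulates the per-query increments term by term and confirms the closed form $(q+1)/N$.

```python
from fractions import Fraction


def classical_success_exact(q, N):
    """Exact classical success probability after q oracle queries,
    accumulated from the recursion p(q) = p(q-1) + (N-q)/N * 1/(N-q)."""
    if not 0 <= q <= N - 1:
        raise ValueError("q must satisfy 0 <= q <= N - 1")
    p = Fraction(1, N)  # q = 0: a single random guess
    for j in range(1, q + 1):
        # The marked element is among the N - j untested ones with
        # probability (N - j)/N; the next guess finds it with 1/(N - j).
        p += Fraction(N - j, N) * Fraction(1, N - j)
    return p


# For the 5-qubit problem (N = 32), two queries give 3/32.
print(classical_success_exact(2, 32))
```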

Appendix B: Survey of dynamical decoupling sequences
In addition to the well-known XY4 and CPMG sequences, we consider three families of robust dynamical decoupling sequences. These sequences are expected to work well on a superconducting device with finite pulse width and flip-angle errors. The first sequence family is concatenated DD (CDD). CDD comprises sequences generated recursively by concatenating a base sequence such as the XY4 sequence: each level $\mathrm{CDD}_k$ is built from the base sequence and the previous level $\mathrm{CDD}_{k-1}$. Here we could only proceed as far as $\mathrm{CDD}_2$, as the idle intervals in the circuit were too short to incorporate higher-order CDD sequences.
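To make the recursive construction concrete, the Python sketch below generates the pulse list of a concatenated sequence under a common convention: each free-evolution interval of the base sequence is replaced by the previous level. The pulse ordering of XY4 used here (Y, X, Y, X) and the absence of any simplification of adjacent pulses are assumptions for illustration, not a statement of the exact sequences run on the devices.

```python
XY4 = ["Y", "X", "Y", "X"]  # assumed pulse ordering of the base sequence


def cdd(level, base=XY4):
    """Pulse list for concatenated DD at the given level (level 1 = base)."""
    if level < 1:
        raise ValueError("level must be at least 1")
    if level == 1:
        return list(base)
    inner = cdd(level - 1, base)
    sequence = []
    for pulse in base:
        sequence.append(pulse)   # outer-level pulse
        sequence.extend(inner)   # previous level fills the following interval
    return sequence


print(len(cdd(1)), len(cdd(2)), len(cdd(3)))  # 4 20 84
```

Without simplification, the pulse count grows from 20 at level 2 to 84 at level 3, which illustrates why higher levels quickly become impractical in short idle intervals.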
The second family comprises the robust genetic algorithm (RGA) sequences [21]. These were found by assuming a generic single-qubit error term and a numerical optimization using genetic algorithms. A subset of the sequences was enforced to be robust against flip-angle errors. Therefore, these sequences are called robust genetic algorithm sequences. Due to duration constraints, we only attempted sequences up to 32 pulses, even though longer sequences were identified in Ref. [21].
Here $X$ means a $\pi$-rotation about the $+x$ axis. In contrast, $\bar{X}$ means a $\pi$-rotation about the $-x$ axis (see Ref. [39] for a concise and detailed summary with more explicit definitions, including the effect of pulse width and the associated errors). Finally, the third family comprises the universally robust (UR) sequences [19], which are built from pulses $(\pi)_\phi$, where $(\pi)_\phi$ is a rotation about the axis at an angle $\phi$ from the $+x$ axis. We choose $\phi_1 = 0$ and $\phi_2 = \Phi(n)$ so that all $\mathrm{UR}_n$ sequences are palindromic. Once again, we constrained our survey to sequences with up to 32 pulses.
FIG. 8. Average success probability for $n = 3, 4, 5$ with two oracle queries on Jakarta (top) and Nairobi (bottom). The DD sequences are ranked in order of decreasing success probability. The two dotted lines represent success probabilities corresponding to a random and classical strategy, respectively. For $n > 3$, the unprotected evolution (Free) is marginally better than choosing an element randomly and does not cross the classical threshold. DD protection is necessary to cross the classical threshold, and the RGA and UR sequences with fewer than 12 pulses are the best performers. Error bars correspond to 99% confidence intervals.
FIG. 12. Performance under different oracle query numbers. Success probabilities are shown as a function of the number of oracle queries for Jakarta (left) and Nairobi (right). All results included MEM, and error bars represent 99% confidence intervals. Dashed red lines correspond to the optimal classical success probability. Except for $n = 3$, the classical threshold is crossed only with DD. In the main text, we set $q = 2$, which is the optimal number of repetitions for all instances other than $n = 5$ on Jakarta.
Our results from testing these three robust families of DD sequences are shown in Fig. 11. For $n > 3$, almost all DD sequences improved the success probability, but even among the sequences tried, there was a considerable variation. Robust sequences with fewer than 12 pulses per DD cycle were the best performers. The eventual decrease in the performance of sequences with an increasing number of pulses is to be expected as they are implemented using noisy gates, and there is a trade-off between the protection provided by DD and the accumulation of gate errors. The RGA8c and RGA8a sequences performed consistently well and are the only sequences to cross the classical threshold on Jakarta for $n = 5$. RGA8c is also commonly known as the Eulerian DD (EDD) sequence, and RGA8a is a slightly modified version of EDD. These palindromic sequences are known to be robust against flip-angle and finite-width errors. The best sequence at each problem size is shown in Table II.
TABLE II. Best DD sequence at each problem size (recoverable entries: RGA8c, RGA8a).
Fig. 12 shows the experimental success probabilities for the unprotected and DD-protected Grover circuits for all queries $q$. Here, only the best DD sequence from the survey above (listed in Table II) is used in each case. Theoretically, for $n = 3, 4, 5$, $q_{\mathrm{opt}} = 2, 3, 4$, respectively. Unfortunately, for 5-qubit Grover, software restrictions prevented us from going beyond $q = 2$ and 3 on Jakarta and Nairobi, respectively. However, it is already clear that the experimentally optimal value was reached in both cases. Recall that we restricted our results to $q = 2$ oracle queries in the main text. For $n = 3$, this is both experimentally and theoretically optimal. For Nairobi, two queries have the highest experimental success probability for all problem sizes. For Jakarta and $n = 5$, a single query has a slightly higher success probability, but the difference between $q = 1$ and $q = 2$ is not substantial.
Overall, for simplicity of analysis, in the main text, we focused only on results for q = 2.
Finally, Fig. 13 shows the results for all oracles at two queries using the DD sequence found from the survey above. The results are qualitatively identical on both devices. We have already clarified that DD is necessary to cross the classical threshold. One might suspect that majority voting may suffice to declare a detected state as the marked state if it is the mode of its corresponding probability distribution. However, even under this criterion, for 5-qubit Grover, there is no way to detect the marked state without DD.
The entries of the response matrix $M$ are given by probability(prepare $|j\rangle$ | measure bitstring $k$). $M$ is used to extract the "true" mitigated probability vector $t = f(p, M)$ from the observed probability vector $p$. Different error mitigation methods differ in how they define $f$. The most commonly used MEM method, response matrix inversion (Inv) [34, 60-63], sets $t = M^{-1} p$. However, Inv is inherently flawed in the sense that $M^{-1}$ need not be stochastic, and as a result $t$ can have negative elements. Ref. [12] notes that iterative Bayesian unfolding (IBU), a well-established method to correct detector defects in high-energy physics experiments, can address readout noise without compromising on stochasticity. IBU is a simplified form of the expectation-maximization method from machine learning that maximizes the likelihood function. In particular, starting with a prior "truth spectrum" $t^0$, the error-mitigated distribution $t^n$ is obtained by repeatedly applying Bayes' rule. We rely on IBU, in particular the pyIBU package [45], to perform MEM and thereby avoid dealing with negative probabilities. In Ref. [45], the optimal $n$ at which the iteration stops is determined by placing a lower limit on the $\ell_1$-distance between $t^n$ and $t^{n-1}$.
In Fig. 6, there is a slight discrepancy between the observed and predicted fidelities in the Unenc + MEM and Enc + MEM columns for Nairobi: the success probability is lower than expected. The bar plots in Figs. 15 and 16 show that this discrepancy is reduced if the mitigated distribution is computed using Inv instead of IBU. However, Inv leads to unphysical results, as seen in the rightmost output distribution map in Fig. 16, where $p_s > 1$ for the marked states $|01\rangle$ and $|10\rangle$. In particular, when using Inv, the terms in $t$ sum to 1, but the elements $t_i$ are not guaranteed to be in $[0, 1]$. The off-diagonal terms under Enc + Inv in Fig. 16 are indeed negative. So while Enc + Inv has the best reported average success probability, which is also higher than our model predicts, the underlying output distribution is unphysical. IBU, on the other hand, always returns a valid probability distribution but does not fully invert the response matrix $M$. The discrepancy in simulated and observed values of Nairobi's measurement error mitigated failure probabilities reflects this. Despite its limitations, we err on the side of caution and use IBU as the default MEM method. A more systematic and recent critique of MEM is provided in Ref. [64].
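The contrast between the two mitigation methods can be illustrated with a small, dependency-free Python sketch. It uses the standard iterative Bayesian unfolding update with a uniform prior and a stopping rule on the $\ell_1$-distance between successive iterates. This is not the pyIBU interface: the function names, the index convention `response[j][i]` = Pr(measure $j$ | prepared $i$), the tolerance, and the single-qubit numbers are all assumptions chosen for illustration.

```python
def ibu(response, observed, max_iter=1000, tol=1e-10):
    """Iterative Bayesian unfolding with a uniform prior.
    response[j][i] = Pr(measure j | prepared i); each column sums to 1."""
    dim = len(observed)
    truth = [1.0 / dim] * dim
    for _ in range(max_iter):
        # Distribution the current estimate would produce after readout noise.
        predicted = [sum(response[j][i] * truth[i] for i in range(dim))
                     for j in range(dim)]
        # Bayes' rule update of every component of the estimate.
        updated = [truth[i] * sum(response[j][i] * observed[j] / predicted[j]
                                  for j in range(dim) if predicted[j] > 0)
                   for i in range(dim)]
        change = sum(abs(u - t) for u, t in zip(updated, truth))  # l1-distance
        truth = updated
        if change < tol:
            break
    return truth


def invert_2x2(response, observed):
    """Response matrix inversion for a single qubit (2x2 case only)."""
    (a, b), (c, d) = response
    det = a * d - b * c
    return [(d * observed[0] - b * observed[1]) / det,
            (-c * observed[0] + a * observed[1]) / det]


# Hypothetical single-qubit readout noise and observed distribution.
response = [[0.9, 0.2], [0.1, 0.8]]
observed = [0.15, 0.85]
t_inv = invert_2x2(response, observed)  # sums to 1 but has a negative entry
t_ibu = ibu(response, observed)         # stays a valid probability vector
```

In this toy case the inverted vector leaves the physical region, while the unfolded one converges to the boundary of the probability simplex, mirroring the trade-off described above.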

3-qubit to 5-qubit Grover circuits
The problem of transpilation increases in complexity with problem size. $C^{n-1}Z$ can be achieved by finding a circuit decomposition for the $n$-qubit Toffoli gate $C^{n-1}X$. It is known that the three-qubit Toffoli gate, $C^2X$, can be implemented using six CNOTs [65]. However, this requires a fully connected architecture. As no fully connected group of three qubits can be found in the QPUs we used, we rely on the 8-CNOT decomposition [8] of $C^2Z$ shown in Eq. (F5).
Here $G = R_y(\pi/4)$, and $C^3Y$ is shown in Eq. (F7). Finally, using the relative-phase Toffoli gates, $C^3Z$ can be written as in Eq. (F8), and likewise, $C^4Z$ can be constructed as in Eq. (F9). This scheme, where relative-phase Toffoli gates [50] are sewn together to generate a circuit for $C^{n-1}Z$ ($n > k + 2$), can be generalized. In particular, $C^{n-1}Z$ can be implemented using $C^2Y$, $C^2Y^\dagger$, and $C^{n-2}Z$, which in turn uses $C^{n-3}Z$. As a result of this recursion, the number of CNOTs for a $C^{n-1}Z$ gate grows linearly with $n$. Thus, the number of CNOTs required to implement a single query of $n$-qubit Grover scales as $O(n)$. At the same time, the number of necessary ancillas is $n - 2$, i.e., it also scales linearly with $n$. As we did by using Eq. (F9), this linear scaling of ancillas could be avoided by considering $C^kY$ with $k > 2$ while increasing the number of CNOTs. Whether entangling fewer qubits by allowing for deeper circuits is worthwhile will depend on the QPU architecture under consideration. Note that the theoretically optimal number of queries is $q_{\mathrm{opt}} = O(2^{n/2})$, so at $q_{\mathrm{opt}}$, the number of CNOTs scales as $O(2^{n/2} n)$, where the exponential component will dominate. However, as we noted before, the experimentally allowed number of queries before decoherence takes over might be less than $q_{\mathrm{opt}}$.
Appendix G: Open system model optimization
The model presented in Section III only accounts for amplitude damping, dephasing, and depolarization, which are a subset of the errors in a superconducting device. Notably, this model does not include crosstalk, leakage to higher energy levels, and non-Markovian system-environment interaction. As we saw in the main text, introducing error suppression improves the agreement between our model and the observations. However, in the absence of DD, the model results often provide an upper bound on the performance, and the device can perform worse than the model's predictions.
FIG. 17. Distance $D$ as a function of the rescaling parameters $\lambda_i$, where $T_1 \to \lambda_1^{-1} T_1$, $T_2 \to \lambda_2^{-1} T_2$, and $p_D \to \lambda_g p_D$. The left and right columns represent unencoded and encoded implementations, respectively. The dotted black line represents the default setting $\lambda_i = 1$. The grey band in the Jakarta case (top row) is $D$ for the DD-protected circuit versions. For Nairobi (bottom row), even without error suppression, the fit between theory and experiment is already quite good, and our optimization aims to improve the fit further. Thus, in the bottom row (Nairobi), the grey band corresponds to the distance between simulated and observed at $\lambda_i = 1$.
FIG. 18. Algorithmic error tomography (AET) at $\lambda_g = 6$. Each row of the error tomography table corresponds to a marked state, and each column represents logical errors and X, Y, Z type errors. Even after rescaling the model parameters, compared to the error tomography table for Jakarta (Fig. 4), we do not see an asymmetry in Z errors across marked states.
It is natural to ask whether rescaling the calibration metrics provided by IBMQE results in an improved phenomenological model and better predictability of the device performance. To do this, we quantify model predictability by computing the $\ell_1$-norm-based distance between the experimentally observed probability distribution $p_m$ and the simulated distribution $q_m$ for marked states $|m\rangle$. We define $D$ as this distance between the probability distributions averaged over all the marked states $|m\rangle$. $D = 0$ implies a perfect match between the simulated and the observed distributions, and $D = 1$ means completely distinguishable distributions. We then consider $D$ as a function of rescaling parameters $\lambda_i$ such that $T_1 \to \lambda_1^{-1} T_1$, $T_2 \to \lambda_2^{-1} T_2$, and $p_D \to \lambda_g p_D$. To evaluate the feasibility of this approach, we focus on the smallest problem size, $n = 2$. The most significant error contribution comes from the depolarizing channel, as $D$ is most responsive to $\lambda_g$. With this in mind, we start from the default setting $\lambda_i = 1$, then optimize $\lambda_g$, $\lambda_1$, $\lambda_2$ in that order, using the optimal value from the previous scan as the input for the next parameter. Fig. 17 shows the distance $D$ as a function of $\lambda_i$. Recall that in the absence of DD, we reported discrepancies in both the AET of the encoded circuit (Fig. 4) and the failure probabilities of the unencoded circuit (Fig. 6) for Jakarta. To directly compare the result of optimizing $\lambda$ to the effectiveness of error suppression, Fig. 17 also shows $D$ for the DD-protected circuits (grey band).
For the unencoded experiment, increasing $p_D$ by a factor of $\lambda_g = 10$ suffices for the model to match the experiment. $D$ is minimized at $\lambda_1 = \lambda_2 = 1$, i.e., there is no change from the default setting for $T_1$ and $T_2$. However, even with the optimized values (i.e., setting $\lambda_g = 10$, $\lambda_1 = \lambda_2 = 1$), the new model barely improves upon activating DD, corresponding to the grey band. Similarly, for the encoded two-qubit Grover on Jakarta, only optimizing $p_D$ leads to a noticeable decrease in $D$, with the best fit found at $\lambda_g = 6$. The optimized model does match the success probability acquired by activating DD (grey band, top right). However, AET at $\lambda_g = 6$, shown in Fig. 18, does not exhibit the state-wise asymmetry seen in Fig. 4. A similar optimization on Nairobi confirms our earlier observation: the model that uses the provided calibration metrics fits with the observed results, and Nairobi does not benefit from parameter rescaling.
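A distance with the stated properties ($D = 0$ for a perfect match and $D = 1$ for fully distinguishable distributions) can be sketched as the total variation distance averaged over marked states. The factor of 1/2 that normalizes the $\ell_1$-norm to the interval $[0, 1]$ is an assumption here, as are the function name and the dictionary-based data layout.

```python
def average_l1_distance(observed, simulated):
    """Average over marked states m of (1/2) * || p_m - q_m ||_1.
    observed[m] and simulated[m] map output bitstrings to probabilities."""
    total = 0.0
    for m in observed:
        p, q = observed[m], simulated[m]
        outcomes = set(p) | set(q)
        total += 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in outcomes)
    return total / len(observed)


# Toy two-qubit example with a single marked state (hypothetical numbers).
obs = {"11": {"11": 0.9, "00": 0.1}}
sim = {"11": {"11": 0.8, "00": 0.1, "01": 0.1}}
print(average_l1_distance(obs, sim))
```

A sequential scan over $\lambda_g$, $\lambda_1$, $\lambda_2$ would then evaluate this distance on a grid for one parameter at a time, keeping the previous optimum fixed.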
To summarize, rescaling the calibration metrics to improve predictability is mildly effective: the agreement with the experimental $p_s$ results improves by increasing the depolarization. However, the optimized model does not match the AET results for Jakarta. A more sophisticated model that includes qubit crosstalk and leakage to/from higher energy levels appears necessary for a better agreement. However, as highlighted in the main text, introducing error suppression via DD provides a reliable way to simultaneously improve performance and agreement with a model that accounts only for Markovian amplitude damping, dephasing, and depolarization.
The circuits for $C^3Z$ and $C^4Z$ require 5 and 6 qubits, respectively [see Eqs. (F8) and (F9)], with one ancilla qubit used to link the Toffoli and relative-phase Toffoli gates. The ancilla qubit is initialized in $|0\rangle$ and should be in that state at the end of the algorithm. Consequently, when measured in the Z-basis, observing the ancilla qubit in $|1\rangle$ implies that an error occurred in the implementation of the $C^3Z$ or $C^4Z$ gate. We postselect the 4-qubit and 5-qubit Grover experiments by only considering experiments for which the ancilla qubit (q1 in Fig. 14) is measured to be in state $|0\rangle$. Fig. 19 shows the effect of postselection on success probabilities. The better-than-classical performance for Nairobi holds even if postselection is avoided. However, without postselection, we see a tie with the classical result for 5-qubit Grover on Jakarta at $q = 2$. Overall, there is a nontrivial increase in success probabilities due to postselection. Using ancilla qubits for postselection is convenient, [...]
FIG. 19. Effect of postselection on success probabilities for Nairobi (green) and Jakarta (orange). Here we only consider the DD-protected circuits. 3-qubit Grover does not undergo postselection. For 4-qubit and 5-qubit Grover, we postselect to count only the experiments for which the ancilla qubit (q1 in Fig. 14) is in $|0\rangle$. The white lines correspond to the success probabilities for the classical strategy and random sampling from the unsorted list ($q = 0$). With postselection, for all problem sizes, the DD-protected quantum $q = 2$ strategy outperforms the classical strategy for $q \le 3$. Without postselection, the better-than-classical requirement is met by all implementations other than 5-qubit Grover on Jakarta, where we achieve a breakeven. Error bars correspond to 99% confidence intervals.
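The ancilla-based postselection described in this appendix amounts to a simple filter on the measured bitstrings. The Python sketch below discards every shot in which the ancilla reads 1 and renormalizes the rest. The counts dictionary format, the position of the ancilla bit within each bitstring, and the function name are hypothetical and would need to match the actual measurement ordering.

```python
def postselect(counts, ancilla_index):
    """Keep only shots whose ancilla bit is '0', drop that bit from the key,
    and return the renormalized distribution over the remaining qubits."""
    kept = {}
    for bitstring, shots in counts.items():
        if bitstring[ancilla_index] == "0":
            key = bitstring[:ancilla_index] + bitstring[ancilla_index + 1:]
            kept[key] = kept.get(key, 0) + shots
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no shots survive postselection")
    return {key: shots / total for key, shots in kept.items()}


# Hypothetical counts; the ancilla is the first character of each bitstring.
raw_counts = {"0101": 60, "1101": 20, "0111": 20}
filtered = postselect(raw_counts, ancilla_index=0)
```

In this toy data set, 20 of 100 shots are rejected, so the surviving distribution is built from the remaining 80 shots.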

FIG. 1. Circuit description for Grover's algorithm. The relative amplitudes of all the states at each stage of the algorithm are shown. Starting with an equal superposition state, the oracle assigns a relative phase difference of $\pi$ to the marked state. The amplitude amplification step then performs an inversion about the mean, allowing $|m\rangle$ to have a larger probability amplitude than all other states. This round of querying and amplifying is repeated $q$ times. The optimal number of rounds for the $n$-qubit Grover problem is $q_{\mathrm{opt}} = \frac{\pi}{4} 2^{n/2}$. The only multi-qubit operation required to implement both the oracle and the amplitude amplification step is $C^{n-1}Z$ (vertical line in the Oracle and Amplitude Amplification boxes).

FIG. 2. Grover and DD. The timeline for one oracle query for 4-qubit Grover with the marked state $|1111\rangle$ is shown. Qubits q4 and q6 are spectators in this example. Recall that each oracle query for 4-qubit Grover requires two $C^3Z$ gates. $C^3Z$ requires 14 CNOTs, and the entire circuit uses 28 CNOTs; see Appendix F for circuit compilation details. The pre-DD circuit elements are grayed out, and the colored lines represent the DD pulses. The DD sequence exemplified here uses four pulses for illustration purposes; in reality, we used longer sequences. The scheme demonstrated highlights four primary features of our implementation: (1) all idle intervals, including the ones on inactive qubits, are filled, (2) only one repetition of each sequence is performed, and the pulse interval is adjusted accordingly, (3) each pulse in the sequence can be unique, (4) a single qubit can experience multiple DD repetitions if there are multiple idle intervals.

FIG. 6. Two-qubit Grover results. Two-qubit single-query Grover failure probability results without (Unenc) and with (Enc) postselection using the [[4, 2, 2]] code on Jakarta and Nairobi are shown. The transparent boxes represent the theoretically expected failure probabilities from the model described in Section III, which does not include DD; their centers correspond to the average over marked states, and their boundaries correspond to 95% confidence intervals after bootstrapping. The colored bars represent the experimental results (see the legend), and the experimental error bars (black for Jakarta with DD and Nairobi, or pink for Jakarta without DD) correspond to 95% confidence intervals after bootstrapping. Dark green appears where the pink and light green colors (i.e., Jakarta with and without DD) overlap. In the Unenc case, we run two identical copies of the two-qubit Grover problem to equalize resources with the Enc case and choose the copy with the highest success probability. Also shown are the results with MEM using iterative Bayesian unfolding (see Appendix D for details). Failure probabilities with and without DD protection are shown for Jakarta but not for Nairobi, where the simulated and observed error tomography and failure probabilities are in agreement (see Section V C 2). The presence of DD does not affect the success probability in the encoded implementation, and as a result, the pink bars are mostly hidden behind the green bars. However, the nature of detected errors, even in the encoded case, is affected by DD (see Fig. 4). All data for different runs on the same QPU were collected on the same day; data from different QPUs were collected on different days.

FIG. 7. Two-qubit Grover results for Jakarta (top) and Nairobi (bottom). The left and right panels show the output distribution for all possible oracles for two setups: unencoded and encoded with MEM. As in Fig. 6, Unenc corresponds to two copies of unencoded two-qubit Grover, of which the best result is reported. Enc corresponds to the results encoded using the [[4, 2, 2]] code. The Enc results are reported after postselection. Let p_s(m, b, e) be the observed success probability for marked state m, detected state b, and experiment type e ∈ {Enc, Unenc+MEM, Enc+MEM}. The success probability changes from orange to green when p_s(m, b, e) > max_{m,b} p_s(m, b, Unenc). Error bars correspond to 95% confidence intervals.

FIG. 9. 5-qubit Grover results on Nairobi. Left: average success probability with and without DD or MEM for 5-qubit Grover implemented on Nairobi. The boxes correspond to the theoretically expected success probabilities. The quantum oracle is queried twice; in the ideal case, the success probability is 0.602. The unprotected (Free) evolution is on par with a random guess, significantly worse than the optimal classical strategy (dashed vertical line), and just adding MEM does not change the result. In contrast, the DD-assisted implementation crosses the classical threshold, and the results improve even more with MEM, up to a success probability of 0.15. Error bars correspond to 99% confidence intervals. Middle and right: the complete input-output maps for all 2^5 marked states, without and with DD + MEM, are shown. States are sorted by increasing Hamming weight; in the Free case, low Hamming weight states have a higher success probability (more green on the left). This is likely to be a consequence of amplitude damping (spontaneous emission), which favors the |0⟩ state of each qubit. In the unprotected case (Free, middle), there is no discernible correlation between the input marked state and the output detected state. In the protected case (DD + MEM, right), black-to-purple signifies better-than-classical success probability, and this threshold is crossed for all 32 marked states. The DD sequence used here is RGA8a [21], which was the top-performing sequence in our DD survey (see Fig. 8).
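The ideal value of 0.602 quoted above follows from the standard Grover formula sin^2((2q + 1)θ) with sin θ = 1/√N for a single marked item. The sketch below reproduces it and compares it with two baselines. The classical baseline (q + 1)/N (q distinct queries followed by one guess among the remaining items) is an assumption about how the classical threshold is defined, chosen so that q = 0 reduces to random guessing; the function names are illustrative.

```python
import math


def grover_success(n_qubits, queries):
    """Ideal Grover success probability for one marked item among N = 2**n."""
    theta = math.asin(1 / math.sqrt(2 ** n_qubits))
    return math.sin((2 * queries + 1) * theta) ** 2


def classical_success(n_qubits, queries):
    """Assumed classical baseline: q distinct queries, then one guess.

    P(found within q queries) + P(not found) * P(guess correct)
      = q/N + (1 - q/N) * 1/(N - q) = (q + 1)/N.
    """
    return (queries + 1) / 2 ** n_qubits


ideal_5q = grover_success(5, 2)         # two oracle queries, n = 5
classical_5q = classical_success(5, 2)  # 3/32
random_5q = classical_success(5, 0)     # 1/32, random guess
```

Under this assumed baseline, the classical threshold for n = 5 and q = 2 is 3/32 ≈ 0.094, which is consistent with the reported DD + MEM success probability of 0.15 lying above it.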

FIG. 11. Performance of DD sequences, expanding on the results shown in Fig. 8. Average success probability for n = 3, 4, 5 with two oracle queries on Jakarta (top) and Nairobi (bottom). The DD sequences are ranked in order of decreasing success probability. The two dotted lines represent success probabilities corresponding to a random and classical strategy, respectively. For n > 3, the unprotected evolution (Free) is marginally better than choosing an element randomly and does not cross the classical threshold. DD protection is necessary to cross the classical threshold, and the RGA and UR sequences with fewer than 12 pulses are the best performers. Error bars correspond to 99% confidence intervals.
FIG. 12. Performance under different oracle query numbers. Success probabilities are shown as a function of the number of oracle queries for Jakarta (left) and Nairobi (right). All results include MEM, and error bars represent 99% confidence intervals. Dashed red lines correspond to the optimal classical success probability. Except for n = 3, the classical threshold is crossed only with DD. In the main text, we set q = 2, which is the optimal number of repetitions for all instances other than n = 5 on Jakarta.
FIG. 13. 3-qubit, 4-qubit, and 5-qubit Grover on Jakarta (left) and Nairobi (right) after two oracle queries, complementing Fig. 9, which only shows Nairobi for n = 5. Each row represents a problem size in ascending order. In a row, the horizontal bar plot on the left shows the success probability under no error suppression and mitigation (Free), with measurement error mitigation (Free + MEM), with DD protection (DD), and with DD protection and measurement error mitigation (DD + MEM). The dashed horizontal line and the boxes represent the classical and the theoretically expected success probability, respectively. The second and third columns show the input-output map for Free and DD + MEM, highlighting the improvement offered by these strategies. The states are sorted by increasing Hamming weight. The transition from green to black occurs at the classical success probability threshold. With DD protection, the classical threshold is crossed in all cases.

FIG. 14. Device connectivity. The Jakarta and Nairobi devices are built using IBM Quantum Falcon r5.11H processors and have seven qubits each.

FIG. 15. Nairobi and MEM for unencoded two-qubit Grover. The horizontal bar plot on the left shows the failure probability for the unencoded two-qubit Grover algorithm for the unmitigated data and under two MEM techniques: iterative Bayesian unfolding (IBU) and response matrix inversion (Inv). The three heat maps show the output distribution under no mitigation, IBU, and Inv, going from left to right. The input marked states are on the vertical axis, and the printed numbers on the diagonal represent p_s for the respective marked states. The off-diagonal elements in the output distribution are written explicitly to emphasize the presence of negative probabilities under Inv in the right-most figure.
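The contrast between the two MEM techniques can be seen in a small sketch. It uses the textbook forms of both methods (a linear solve for Inv, and the standard multiplicative Bayesian update from a uniform prior for IBU); the single-qubit response matrix and observed distribution below are hypothetical numbers chosen for illustration, not calibration data from the devices, and the iteration count is an arbitrary choice.

```python
def solve(R, y):
    """Response-matrix inversion: solve R x = y by Gauss-Jordan elimination."""
    n = len(y)
    A = [list(map(float, R[i])) + [float(y[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def ibu(R, p_obs, iterations=10):
    """Iterative Bayesian unfolding; R[i][j] = P(measure i | true j)."""
    m, n = len(R), len(R[0])
    t = [1.0 / n] * n  # uniform prior
    for _ in range(iterations):
        pred = [sum(R[i][j] * t[j] for j in range(n)) for i in range(m)]
        t = [t[j] * sum(R[i][j] * p_obs[i] / pred[i] for i in range(m))
             for j in range(n)]
    return t


# Hypothetical readout response matrix (columns sum to 1) and observed data.
R = [[0.95, 0.10],
     [0.05, 0.90]]
p_obs = [0.03, 0.97]

p_inv = solve(R, p_obs)  # can leave the probability simplex
p_ibu = ibu(R, p_obs)    # stays non-negative and normalized
```

In this example the inverted distribution has a negative first entry, whereas each IBU update multiplies a non-negative estimate by a non-negative factor and preserves normalization, so the IBU result remains a valid probability distribution.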
Here T = Z^{1/4}. For C^3Z and C^4Z, we use relative-phase Toffoli gates [50]. Breaking down C^kZ using C^aZ and C^bY such that a + b = k + c allows C^kZ to be implemented with fewer CNOTs as long as we use c ancillas. In our construction, we use only one ancilla for C^3Z and C^4Z. C^2Y is shown in Eq. (F6),

FIG. 17. Scaling calibration metrics. The 1-norm based distance D between the observed and simulated outputs for two-qubit Grover is shown as a function of the scaling parameters λ_i. The model parameters are scaled by setting T1 → λ_1^{-1} T1, T2 → λ_2^{-1} T2, and p_D → λ_g p_D. The left and right columns represent unencoded and encoded implementations, respectively. The dotted black line represents the default setting λ_i = 1. The grey band in the Jakarta case (top row) is D for the DD-protected circuit versions. For Nairobi (bottom row), even without error suppression, the fit between theory and experiment is already quite good, and our optimization aims to improve the fit further. Thus, in the bottom row (Nairobi), the grey band corresponds to the distance between simulated and observed at λ_i = 1.
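The rescaling and the distance used in this calibration can be sketched as follows. The exact definition of D is not given in the caption; the unnormalized sum of absolute differences below is an assumption, as are the function names and all numerical values.

```python
def scale_parameters(t1, t2, p_d, lam1=1.0, lam2=1.0, lam_g=1.0):
    """Apply T1 -> T1/lam1, T2 -> T2/lam2, p_D -> lam_g * p_D."""
    return t1 / lam1, t2 / lam2, lam_g * p_d


def one_norm_distance(p_obs, p_sim):
    """Assumed form of D: sum over bitstrings of |p_obs(b) - p_sim(b)|."""
    keys = set(p_obs) | set(p_sim)
    return sum(abs(p_obs.get(k, 0.0) - p_sim.get(k, 0.0)) for k in keys)


# Hypothetical model parameters (arbitrary units) and output distributions.
scaled = scale_parameters(100.0, 80.0, 0.001, lam_g=6)
d = one_norm_distance({"00": 0.9, "01": 0.1}, {"00": 0.8, "11": 0.2})
```

Setting all λ_i = 1 leaves the parameters unchanged, which corresponds to the default setting marked by the dotted black line; λ > 1 shortens the corresponding coherence time or increases the depolarizing probability.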

FIG. 18. Algorithmic error tomography on Jakarta for optimized parameters. The plot shows the results of AET on the encoded two-qubit Grover algorithm after setting (λ_1, λ_2, λ_g) = (1, 1, 6). Each row of the error tomography table corresponds to a marked state, and each column represents logical errors and X, Y, Z type errors. Even after rescaling the model parameters, compared to the error tomography table for Jakarta (Fig. 4), we do not see an asymmetry in Z errors across marked states.
(top row) also shows the distance D between theory and experiment after DD.

FIG. 19. Postselection for 4-qubit and 5-qubit Grover: Success probabilities are shown with and without postselection for Nairobi (green) and Jakarta (orange). Here we only consider the DD-protected circuits. 3-qubit Grover does not undergo postselection. For 4-qubit and 5-qubit Grover, we postselect to count only the experiments for which the ancilla qubit (q1 in Fig. 14) is in |0⟩. The white lines correspond to the success probabilities for the classical strategy and random sampling from the unsorted list (q = 0). With postselection, for all problem sizes, the DD-protected quantum q = 2 strategy outperforms the classical strategy for q ≤ 3. Without postselection, the better-than-classical requirement is met by all implementations other than 5-qubit Grover on Jakarta, where we achieve a breakeven. Error bars correspond to 99% confidence intervals.
-qubit GHZ state) from the physical initial state |0000⟩. The encoded Grover circuit is implemented by converting each physical gate of the n = 2 case of Fig. 1 into its logical counterpart, which is then converted into a physical 4-qubit implementation (middle left and right dashed boxes); see Appendix F for details. P = diag(1, i) is the phase gate. We postselect the measured results by decoding (right dashed box) and discarding any measurement outcome for which the code detects errors, i.e., does not result in one of the four decoded states {|0000⟩, |0010⟩, |0111⟩, |0101⟩}.
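The postselection step described above amounts to filtering measured bitstrings against the four decoded states. The sketch below assumes measurement counts are stored as a dictionary keyed by 4-bit strings written in the same bit order as the decoded states listed above; the counts themselves are hypothetical.

```python
# Decoded states that signal "no detected error", as listed in the caption.
VALID_DECODED = {"0000", "0010", "0111", "0101"}


def postselect(counts):
    """Discard outcomes flagged by the code; renormalize the remainder.

    Returns (postselected distribution, acceptance fraction).
    """
    total = sum(counts.values())
    kept = {b: c for b, c in counts.items() if b in VALID_DECODED}
    kept_total = sum(kept.values())
    if kept_total == 0:
        raise ValueError("no shots survived postselection")
    probs = {b: c / kept_total for b, c in kept.items()}
    return probs, kept_total / total


# Hypothetical counts from 1000 shots.
counts = {"0111": 900, "0000": 20, "1100": 60, "0100": 20}
probs, acceptance = postselect(counts)
```

The acceptance fraction is worth tracking alongside the postselected success probability, since it quantifies how many shots are discarded by error detection.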

compute the probability p_b of each measurement outcome |b⟩, and from here the probability p_E of each error E by constructing the error outcome table {U_dec E |b⟩} for all b ∈ {0, 1}^k and all the errors
we interpret each measured computational basis state |b⟩ as having arisen from M U_dec E |b⟩, where b ∈ {0, 1}^n. If E is an error the code can detect, then |b⟩ = M U_dec E |b⟩ ∈ C^⊥_E ⊂ C^⊥, while if E is a logical error, then |b⟩ = M U_dec E |b⟩ ∈ C. Here C ⊕ C^⊥ = H, the full system Hilbert space, and C^⊥ = ⊕_E C^⊥_E, where E runs over the errors the code can detect. With this table in hand, we can check for each measurement outcome |b⟩ whether it is in the table (if so, it corresponds to an effective error the code can detect) or not (in which case it corresponds to a logical error, or no error). Suppose |b⟩ is in the error outcome table

TABLE I. Error outcome table for the [[4, 2, 2]] code. The table describes the effect of single-qubit Pauli errors E on each of the encoded computational basis elements |b1b2⟩. The X, Y, and Z-type errors map each |b1b2⟩ to a distinct subspace after decoding:
Logical 0100 0110 0011 0001 1100 1110 1011 1001 1000 1010 1111 1101

FIG. 4. Algorithmic error tomography on Jakarta. The bitstring observed after U_dec either corresponds to a marked entry or an error tabulated in Table I. Top: experimental results. The numbers in each box are the empirical percentage probabilities for detected X, Y, or Z-type errors, with 2σ standard deviation. Logical error percentage probabilities are shown in the first column of each table. Each row corresponds to a different marked state. The probabilities in each row do not sum to unity since we do not display the probability of obtaining the correct marked state. Left: without DD protection. Right: with DD protection. Bottom: the same for the simulated model.
FIG. 8. Performance of DD sequences. Average success probability for 5-qubit Grover with two oracle queries on Nairobi.
. UR sequences are defined such that

TABLE II. The best-performing DD sequence at each problem size for both QPUs. These sequences were determined by implementing n + 1 oracles of the form 0^k 1^{n−k} for the n-qubit Grover problem.
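The n + 1 marked states of the form 0^k 1^{n−k} can be enumerated with a short sketch; the function name is illustrative, and the range k = 0, ..., n is inferred from the count of n + 1 oracles.

```python
def calibration_marked_states(n):
    """Marked states 0^k 1^(n-k) for k = 0, ..., n (n + 1 bitstrings)."""
    return ["0" * k + "1" * (n - k) for k in range(n + 1)]


states_3q = calibration_marked_states(3)
```

This set contains exactly one marked state of each Hamming weight from 0 to n, so a sequence survey over it samples every Hamming weight with only n + 1 oracles instead of all 2^n.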