Scalable algorithm simplification using quantum AND logic

Implementing quantum algorithms on realistic devices requires translating high-level global operations into sequences of hardware-native logic gates, a process known as quantum compiling. Physical limitations, such as constraints in connectivity and gate alphabets, often result in unacceptable implementation costs. To enable successful near-term applications, it is crucial to optimize compilation by exploiting the capabilities of existing hardware. Here we implement a resource-efficient construction for a quantum version of AND logic that can reduce the compilation overhead, enabling the execution of key quantum circuits. On a high-scalability superconducting quantum processor, we demonstrate low-depth synthesis of high-fidelity generalized Toffoli gates with up to 8 qubits and Grover’s search algorithm in a search space of up to 64 entries. Our experimental demonstration illustrates a scalable and widely applicable approach to implementing quantum algorithms, bringing more meaningful quantum applications on noisy devices within reach. To run algorithms on a computer they are broken down into logical operations that are implemented in hardware. A quantum logical AND gate has now been demonstrated, which could substantially improve the efficiency of near-term quantum computers.

Quantum algorithms are predicted to provide a computational speed-up over their classical counterparts. To be implemented, these algorithms need to be compiled on specific quantum hardware, which decomposes global operations into the naturally available elementary gates. Given the stringent resource constraints imposed by the noisy intermediate-scale quantum (NISQ) technology foreseeable in the next 5-10 years [1], it is essential to optimize the use of every qubit and every gate cycle to enable successful near-term applications [2]. One effective strategy is to fully explore the hardware capabilities and diversify the available gate alphabets to optimize compilation [3][4][5].
Several global or multi-qubit operations are textbook circuit components essential for building quantum algorithms [6]. The best-known examples are the quantum arithmetic circuits used in Shor's factoring algorithm [7] and the multiply controlled gates used in Grover's search algorithm [8]. The latter are nontrivial multi-qubit quantum logic operations that perform unitary operations on target qubits conditioned on the states of all the control qubits. Relevant applications include quantum error correction [9][10][11], quantum simulation [12], and quantum machine learning [13]. One brute-force approach for an extensible implementation of these large operations is to decompose them into a finite set of universal gates. For example, the generalized Toffoli gate, i.e., the n-qubit controlled-NOT (CNOT) gate, can be constructed using quadratically many (O(n^2)) two-qubit CNOT gates plus single-qubit gates on a qubit array with all-to-all connections [14] and even more gates on devices with nearest-neighbor couplings [15]. A more efficient approach is to concatenate small Toffoli gates, assisted by ancilla qubits [6,16]. Leaving aside the extra resources needed, it is challenging to achieve high-quality small Toffoli gates. Apart from brute-force decomposition, small Toffoli gates may be obtained via one-step manipulations [17][18][19][20][21][22][23] or by leveraging either, again, ancilla qubits [24,25] or ancilla levels [26][27][28][29][30]. Despite successful demonstrations of single small Toffoli gates in various systems, a scalable synthesis has never been experimentally realized because of the prohibitive implementation cost. A scheme that is, at the same time, hardware-efficient, low-depth, easy to control, and compatible with state-of-the-art hardware [31][32][33] is yet to be realized.
In this study, we introduce a quantum version of the AND (QuAND) gate, a novel gate type that, as inspired by [28], utilizes an ancilla level for temporary information storage only. The QuAND gate enables a scaling advantage in the circuit depth when synthesizing arithmetic circuits and multiply controlled gates. We experimentally implement QuAND gates on a superconducting quantum processor featuring simplified wiring and low crosstalk and demonstrate a linear-depth synthesis of generalized Toffoli gates with up to 8 qubits, i.e., a total of 7 control qubits. To the best of our knowledge, our demonstration is the largest in size and the highest in performance to date (truth-table fidelity: 89.1%, 53.2%, and 39.1% for n = 4, 6, and 8, respectively). Using these gates, we perform Grover's search algorithm with multiple amplification cycles and achieve significant success probabilities (46.8% and 3.9% for search spaces of 16 and 64 entries, respectively), demonstrating the feasibility of our method for scaled applications. Note that alternative efficient compilation schemes have been proposed in recent theoretical studies [11,34]; however, these schemes generally require the manipulation of a multi-level system with high degrees of freedom, adding considerable operational complexity.
The logic AND operation is a basic ingredient for designing both classical and quantum algorithms. Unfortunately, it cannot be directly implemented on qubits because quantum operations must be reversible. One workaround is to extract it from a Toffoli gate at the cost of an extra qubit [6]; such overhead hinders scaled implementation on realistic hardware. Here, we propose a resource-efficient QuAND gate scheme (Fig. 1a) in which one of the two outputs registers the AND result of the inputs, i.e., |A&B⟩, and the other output |C⟩ spans three different states, in our case, |1⟩, |2⟩, and |0⟩ for the input states |00⟩, |01⟩, and |10⟩, respectively. The use of the ancilla level |2⟩ preserves the reversibility; the reverse QuAND gate simply switches the inputs and outputs. We refer to the AND-value qubit as the "parent" and the other qubit as the "child". The circuit notation, truth table, and decomposition schemes of the QuAND gate and its reversal are illustrated in Fig. 1a. Here, we decompose the QuAND gate (or its reversal) into a single-qubit X gate in front of (or after) a SWAP operation between |11⟩ and |20⟩ (denoted SWAP_{11−20}), which is naturally available on our hardware, as shown later.
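As an illustration, the truth table described above can be reproduced with a minimal classical sketch that tracks basis-state labels only (no amplitudes or phases). It assumes the ordering (child, parent), that the X gate acts on the child, and that X leaves level 2 untouched; all function names are hypothetical. The child output for the input |11⟩ is not listed in the text and is simply what this decomposition yields.

```python
def swap_11_20(child, parent):
    """Exchange the basis states |11> and |20>; leave all others unchanged."""
    if (child, parent) == (1, 1):
        return 2, 0
    if (child, parent) == (2, 0):
        return 1, 1
    return child, parent


def x_on_child(child):
    """X gate on the qubit subspace {0, 1}; level 2 is assumed untouched."""
    return {0: 1, 1: 0, 2: 2}[child]


def quand(child, parent):
    """QuAND: X on the child, then SWAP_{11-20}. Returns (child_out, parent_out)."""
    return swap_11_20(x_on_child(child), parent)


def reverse_quand(child, parent):
    """Reverse QuAND: SWAP_{11-20}, then X on the child."""
    child, parent = swap_11_20(child, parent)
    return x_on_child(child), parent


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            c_out, p_out = quand(a, b)
            print(f"|{a}{b}> -> child |{c_out}>, parent |{p_out}>")
```

In this sketch the parent output equals the AND of the two inputs, and applying `reverse_quand` to the outputs restores the original inputs.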
One direct application of a QuAND gate is to simplify the compilation of large gate operations, in particular, multiply controlled gates, which are experimentally challenging to realize and the focus of this study. Figure 1b shows the circuit decomposition for an n-qubit CZ gate on a one-dimensional qubit chain divided into three stages: embedding, the controlled-unitary operation, and recovery. Let |s⟩ = |s_1 s_2 ... s_n⟩ (s_i = 0, 1) denote a basis state at the input. During embedding, we apply QuAND gates sequentially to the chain from both ends inward. At the end of the QuAND sequence, the two root parents in the middle, Q_k and Q_{k+1}, temporarily register the AND result of all the qubits from the upper and lower halves of the chain, respectively. That is, Q_k holds s_1 & s_2 & ... & s_k and Q_{k+1} holds s_{k+1} & ... & s_n. A CZ gate between the two root parents is then only effective when all qubits are in state |1⟩, and a sequence of reverse QuAND gates releases the embedded information, which recovers the original binary encoding and completes an n-qubit CZ gate. There are a total of 2n − 3 two-qubit gates in this sequence. The linear (O(n)) circuit depth (the number of two-qubit gate cycles) as a result of using QuAND gates manifests a scaling advantage over the quadratic depth when using only CNOT gates [14]. Note that the ancilla levels are used here only for the temporary storage of the state information and that only a state-transfer operation is needed, which is in contrast to schemes that require more complex, hard-to-engineer operations with ancilla levels [29,30,34]. Moreover, other types of multiply controlled gates such as the generalized Fredkin (controlled-SWAP) gate can be similarly synthesized. In fact, any classical circuit, such as Boolean logic and arithmetic circuits, can be constructed efficiently using QuAND and single-qubit gates because the classical NAND gate is universal. Examples of quantum adder circuits are shown in [35].

FIG. 1. Simplifying compilation using the quantum version of the AND (QuAND) gate. a, Circuit notation, truth table, and decomposition of the QuAND gate and its reversal. The AND-value qubit, indicated by an &, is referred to as the "parent" and the other qubit is referred to as the "child." A QuAND gate is indicated by an arrow pointing from the child to the parent, with an arrow in the opposite direction indicating a reverse QuAND gate. Both can be synthesized with a single-qubit X gate and a SWAP operation between |11⟩ and |20⟩, which is indicated by a double-cross sign with a dashed cross on the child qubit. b, Circuit decomposition of an n-qubit controlled-Z (CZ) gate on a one-dimensional qubit chain using a sequence of QuAND gates, a CZ gate, and a sequence of reverse QuAND gates, shown here with time progressing from left to right. During embedding, the sequentially applied QuAND gates register the AND results of all the qubits from the upper and lower halves of the chain onto the two root parents, Q_k and Q_{k+1}, respectively. The embedded information is later released via the reverse QuAND gates to recover the original binary encoding. The CZ gate is only effective when all qubits are in state |1⟩. c, Sketch showing a quantum processor with qubits connected in an arbitrary topology. A branching tree is enacted for implementing the QuAND gate sequence (arrows) with time progressing from dark blue to light green. The CZ gate is performed between the two root parents. The QuAND gate could also be performed across multiple processors (arrows pointing from outside) to efficiently implement global operations on a larger quantum network.
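The three-stage n-qubit CZ construction on a chain (embedding, CZ between the two root parents, recovery) can be checked on computational basis states with a short sketch. This is a classical bookkeeping model under stated assumptions: levels are tracked as integers, the X gate acts on the child, the two halves of the chain are processed one after the other rather than in parallel, and the function names and the choice k = n // 2 are hypothetical.

```python
from itertools import product


def _swap_11_20(child, parent):
    # Exchange |11> and |20>, identity otherwise.
    if (child, parent) == (1, 1):
        return 2, 0
    if (child, parent) == (2, 0):
        return 1, 1
    return child, parent


def _flip(level):
    # X on the qubit subspace; level 2 assumed untouched.
    return 1 - level if level in (0, 1) else level


def apply_quand(state, child, parent):
    state[child] = _flip(state[child])
    state[child], state[parent] = _swap_11_20(state[child], state[parent])


def apply_reverse_quand(state, child, parent):
    state[child], state[parent] = _swap_11_20(state[child], state[parent])
    state[child] = _flip(state[child])


def n_qubit_cz_on_chain(bits):
    """Return (phase, final_state, two_qubit_gate_count) for one basis state."""
    n = len(bits)
    k = n // 2                      # root parents sit at indices k-1 and k
    state = list(bits)
    # (child, parent) pairs: upper half moving down, lower half moving up.
    pairs = [(i, i + 1) for i in range(0, k - 1)]
    pairs += [(i, i - 1) for i in range(n - 1, k, -1)]
    for child, parent in pairs:                 # embedding
        apply_quand(state, child, parent)
    phase = -1 if (state[k - 1], state[k]) == (1, 1) else 1   # CZ on the roots
    for child, parent in reversed(pairs):       # recovery
        apply_reverse_quand(state, child, parent)
    return phase, tuple(state), 2 * len(pairs) + 1


if __name__ == "__main__":
    for bits in product((0, 1), repeat=5):
        phase, out, gates = n_qubit_cz_on_chain(bits)
        print(bits, phase, out, gates)
```

For every input the final state equals the input, the phase is −1 only for the all-ones string, and the two-qubit gate count matches 2n − 3.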
An even more impressive scaling advantage can be achieved on qubit arrays with higher connectivity. To see this, it is helpful to first identify the key idea of our proposal: enact a branching tree graph on an arbitrarily connected qubit array and apply QuAND gates sequentially to register the AND results of neighboring qubits onto the parents layer by layer, from the leaves up to the root, as illustrated in Fig. 1c. Ignoring the constant, the optimal circuit depth is then equivalent to the depth of the tree. For example, the circuit depth can be reduced to O(√n) on a two-dimensional square array and to O(log_2 n) on a binary tree [35]; such polynomial or exponential speed-up in compiling global operations can constitute a huge boost for relevant quantum applications. In addition, because this scheme only requires that qubits be connected, it is well suited for a distributed quantum network where only sparse connections are likely to be available.

Our experimental device (Fig. 2a), tested inside a dilution refrigerator at a base temperature of 10 mK, consists of 8 fixed-frequency transmon qubits [36], known for long coherence and simplified control, arranged in a ladder array and interconnected via 10 frequency-tunable couplers. The two couplers in the middle have no control lines, resulting in the qubit array having a ring topology. Each qubit has a dedicated readout resonator, and all the resonators share a common feed line enabling a multiplexed dispersive readout. The qubit frequencies alternate between a red band (6.2-6.5 GHz) and a blue band (7.0-7.3 GHz) along the ring; such frequency planning helps suppress the microwave crosstalk. The qubits are strongly coupled (with an interaction strength of g/2π ≈ 100 MHz) to their adjacent couplers, which are tunable via their flux biases Φ_e. The couplers are designed to turn off the inter-qubit coupling via multipath interference [37] near their maximum frequencies (8.0-8.4 GHz) at Φ_e = 0, which resolves the frequency-crowding problem and reduces the nearest-neighbor ZZ crosstalk down to about 50 kHz. In addition, the use of tunable couplers enables fast two-qubit gates between the fixed-frequency qubits, e.g., the adiabatic CZ gate [38,39]. We use a shared control line to deliver the diplexed signals for both the qubit (4-8 GHz) and coupler (DC-1 GHz) control; these signals are synthesized at room temperature and transmitted to the device inside the refrigerator. This design substantially simplifies the wiring effort both on the chip and inside the refrigerator, promising higher scalability. See [35] for details concerning the device and experimental setup.
The QuAND and SWAP_{11−20} gates on our device were implemented using coupler-assisted level transitions. According to the tri-mode (|qubit, coupler, qubit⟩) notation, the SWAP_{11−20} gate is a full swap operation between |101⟩ and |200⟩, which is realized by a flux pulse sent to the coupler. To activate such a transition, we applied a flux pulse, consisting of an adiabatic rise and fall (40 ns each) separated by a sinusoidal pulse (30 ns), to the coupler, as illustrated in Fig. 2b. Under this pulse, as shown by the thin black line with embedded arrows, the system state first follows an adiabatic excursion on state |101⟩ from the idling bias Φ_e = 0 to Φ_e = 0.26 Φ_0, then transits to |200⟩ via a parametric drive resonant with the instantaneous frequency gap between |101⟩ and |200⟩, and eventually adiabatically returns to the idling bias. There are two major concerns when choosing the transition bias. First, the flux-induced |101⟩ ↔ |200⟩ transition is inhibited at Φ_e = 0 but is significantly enhanced at a sufficiently large bias as a result of wavefunction hybridization, as is evident from the strong bending of the energy levels [40]. Second, a proper bias is critical to avoid spurious transitions [35].
In the experiment, we calibrated the SWAP_{11−20} gate by optimizing both the frequency and the amplitude of the parametric pulse. An example of the continuous swapping between |11⟩ and |20⟩ as a function of the pulse amplitude A_p is shown in Fig. 2c. The average observed transition error of 2.7% is primarily caused by energy relaxation during the pulse. All data presented here were corrected to account for the state preparation and measurement error. See [35] for details concerning the gate scheme, pulse calibration, and data processing.
Using calibrated QuAND gates, we demonstrate the low-depth synthesis of a generalized Toffoli gate, which is equivalent to the n-qubit CZ circuit described in Fig. 1b with two additional single-qubit gates. Figure 3a illustrates how we compile, on the 8-qubit ring, an n-qubit CZ gate with incremental size (n = 4, 6, and 8) in linear time steps. We characterize these large gates by measuring their truth tables U_exp, i.e., the output state probability distribution for each of the 2^n input states, which are shown in Fig. 3b. The truth-table fidelities $F_{tt} = \frac{1}{2^n}\mathrm{Tr}(U_{exp} U_{ideal})$ are 89.1%, 53.2%, and 39.1% for n = 4, 6, and 8, respectively. To the best of our knowledge, the 4-qubit result is by far the highest reported in any system and there have been no reports of generalized Toffoli gates with more than 4 qubits. The relaxation-limited gate fidelities (total duration) for the 4-qubit, 6-qubit, and 8-qubit Toffoli gates are 92.5% (0.4 µs), 66.7% (1.3 µs, staggered pulses), and 62.3% (1.1 µs), respectively, and are responsible for approximately 70% of the total error; the remaining error is due in part to dephasing and in part to stray inter-qubit coupling [40][41][42].
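To make the truth-table fidelity concrete, the following sketch evaluates F_tt = Tr(U_exp U_ideal)/2^n with NumPy on synthetic data. The ideal table is the permutation matrix of a generalized Toffoli gate with the last qubit as target (an ordering chosen here for illustration), and the "measured" table is a made-up mixture of the ideal table with a uniform distribution; neither the data nor the function names come from the experiment.

```python
import numpy as np


def ideal_toffoli_table(n_qubits):
    """Permutation matrix of an n-qubit Toffoli gate (last qubit is the target)."""
    dim = 2 ** n_qubits
    table = np.eye(dim)
    # Only the two states with all controls in 1 are exchanged.
    table[[dim - 2, dim - 1]] = table[[dim - 1, dim - 2]]
    return table


def truth_table_fidelity(u_exp, u_ideal):
    """F_tt = Tr(U_exp U_ideal) / 2^n for square probability tables."""
    dim = u_ideal.shape[0]
    return float(np.trace(u_exp @ u_ideal) / dim)


if __name__ == "__main__":
    n = 3
    ideal = ideal_toffoli_table(n)
    p_error = 0.2  # hypothetical fraction of uniformly scrambled outcomes
    synthetic = (1 - p_error) * ideal + p_error * np.full_like(ideal, 1 / 2 ** n)
    print(truth_table_fidelity(ideal, ideal))
    print(truth_table_fidelity(synthetic, ideal))
```

Because the ideal Toffoli table is a symmetric permutation matrix, the trace here equals the element-wise overlap of the two tables, i.e., the average probability of obtaining the correct output.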
Finally, we performed Grover's search algorithm as a complementary method to benchmark our multi-qubit gates. The core steps of this algorithm encode a solution bit-string j with a phase oracle $O_j = \sum_{s \neq j} |s\rangle\langle s| - |j\rangle\langle j|$, a unitary that accesses the input function, and amplify the probability of finding |j⟩ via phase diffusion, with each step containing an n-qubit CZ gate (Fig. 4a); these two steps may be repeated for further amplification. Note that the phase oracle performs a conditional phase flip on |j⟩; therefore, an arbitrary oracle can be constructed from an n-qubit CZ gate with additional pairs of X gates applied to qubits being conditioned on |0⟩ instead of |1⟩. We ran 4-qubit and 6-qubit single-solution Grover's search algorithms with one oracle-amplification cycle and arranged the measured output probabilities for each solution as a matrix (see [35] for extended data of the multi-solution Grover's search). The diagonal matrix elements correspond to the probabilities of finding the correct states, i.e., the algorithm success probability (ASP), and are substantially higher than the other elements, on average 34.2% versus 4.4% for the 4-qubit Grover's search and 3.9% versus 1.5% for the 6-qubit Grover's search, showing the effectiveness of the amplification. Because of its insufficient fidelity, the 8-qubit Grover result (not shown) does not display a significant ASP gain.
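For reference, the noise-free success probability of a single-solution Grover search can be obtained from a small state-vector sketch. It applies the phase oracle O_j (a sign flip on |j⟩) followed by the standard diffusion step 2|ψ⟩⟨ψ| − I about the uniform superposition; this is an idealized model with hypothetical function names, not a simulation of the experiment or its error sources.

```python
import numpy as np


def grover_success_probability(n_qubits, solution, cycles):
    """Ideal probability of measuring `solution` after `cycles` Grover iterations."""
    dim = 2 ** n_qubits
    uniform = np.full(dim, 1 / np.sqrt(dim))
    psi = uniform.copy()
    for _ in range(cycles):
        psi[solution] *= -1                            # phase oracle O_j
        psi = 2 * uniform * (uniform @ psi) - psi      # diffusion about |psi_0>
    return float(psi[solution] ** 2)


if __name__ == "__main__":
    for n_qubits in (4, 6):
        for cycles in (1, 2, 3):
            p = grover_success_probability(n_qubits, solution=5, cycles=cycles)
            print(f"n = {n_qubits}, M = {cycles}: ideal ASP = {p:.4f}")
```

The values agree with the textbook closed form sin²((2M + 1)θ) with sin θ = 2^(−n/2), which gives an upper reference against which measured success probabilities can be compared.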
To optimize ASP, we tested Grover's search algorithm with multiple rounds of amplification. As shown in Fig. 4c, the average ASP in the 4-qubit case shows a clear improvement to 46.8% with one additional cycle (M = 2), and a clear dependence is visible up to 10 cycles, that is, a total of 20 CCCZ gates, owing to the high gate fidelity. Ignoring contributions from the single-qubit gate error, which is estimated to be 0.14% from simultaneous randomized benchmarking, we developed a simplified model for estimating ASP [35], Eq. (1), where F is the n-qubit CZ gate fidelity. Fitting the data to Eq. (1) gives F = 84.4% and 50.9% for the 4-qubit and 6-qubit cases, respectively, which are close to the above-measured truth-table fidelities.
The low-depth circuit synthesis using novel QuAND logic enabled our implementation of multiply controlled gates and Grover's search algorithm at record scale, confirming the feasibility of a scalable and resource-efficient approach to simplify algorithm compilation. This study should not only stimulate interest in exploring alternative compilation schemes using QuAND logic but also help reduce hardware-related challenges, in particular, the connectivity problem for which solid-state devices have long been criticized. Our work marks an essential step toward closing the gap between most anticipated near-term applications and available NISQ devices.

... conceived and designed the experiment. Y.Z. and F.Y. designed the device. J.C. conducted the measurements. J.C., X.H., and J.Y. analyzed the data. Y.Z., H.J., and L.Z. performed sample fabrication. J.C., X.H., X.S. and F.Y. wrote the manuscript. X.S. and F.Y. supervised the project. All authors discussed the results and contributed to revising the manuscript and the supplementary information. All authors contributed to the experimental and theoretical infrastructure to enable the experiment.

In the main text, we have shown the efficient decomposition for the multi-qubit controlled-Z (n-CZ) gate using the QuAND gate, which extends the classical AND logic to qubits. In general, any multi-qubit controlled-unitary (CU) gate can be implemented efficiently in the same three-step procedure: embedding, controlled-unitary, and recovery (Fig. S1a). Figures S1b and S1c also show an alternative way to synthesize the generalized Toffoli and Fredkin (controlled-SWAP) gates, which are important quantum logic operations.
Since our QuAND gate is a quantum implementation of AND logic leveraging an ancilla level and since the NAND gate is universal for classical circuits, all classical logic circuits can be efficiently constructed by adapting classical circuit optimization techniques with single-qubit, two-qubit CZ, and QuAND gates. In fact, compared to the traditional Toffoli decomposition scheme, our scheme requires fewer ancilla qubits and gate operations. The QuAND gate is readily applicable to a large category of circuits and useful in simplifying circuit synthesis. Here we show three examples of leveraging QuAND gates for efficient synthesis of basic arithmetic circuits.
Figure S2 shows an efficient decomposition for the incrementer using QuAND. The first half of the circuit, a sequence of QuAND gates, computes the carry information. The second half, a sequence of CNOT and reverse QuAND gates, recovers the original binary encoding and completes the incrementation. Figure S3 shows an efficient decomposition for the constant adder using QuAND. The first half of the circuit, a sequence of G_0 or G_1 gates (constructed from QuAND and single-qubit X gates) chosen according to the corresponding bit value of b, computes the carry information. The second half of the circuit, a sequence of reverse G_0 or reverse G_1 gates and CNOT gates, recovers the original binary encoding and completes the addition operation. Figure S4 shows an efficient decomposition for the adder using QuAND, as inspired by [1]. The M gate, constructed from QuAND and CNOT gates, computes the majority function and the carry information. The U gates undo the M gates and complete the addition of the two integers.
The depth of these circuits may be further reduced on topologies with higher connectivity by the carry-lookahead technique. Other arithmetic and Boolean logic circuits can be constructed in a similar way by replacing the AND gates in classical circuits with QuAND gates.

II. MULTI-QUBIT TOFFOLI DECOMPOSITION
To compare different schemes for synthesizing the n-qubit Toffoli gate, we list relevant references and their main properties in Table S1 for all-to-all connection and in Table S2 for the 1-D chain topology.
In prior works, the multi-qubit Toffoli gate can be decomposed into a qubit-only circuit with linear circuit depth and size by using ancilla qubits. The textbook approach [2] reduces a big Toffoli gate to standard (3-qubit) Toffoli gates and costs n − 2 ancilla qubits for concatenating the AND results. He et al. [3] provide a way to trade off between the number of ancilla qubits and circuit depth, but their scheme incurs additional cost for feedback control or a large constant factor. A similar approach from Barenco et al. [4] uses the last control qubit as an ancilla, saving ancilla qubits at the cost of circuit depth. Other works have focused on simplifying the n-qubit Toffoli gate by using ancilla levels. Ralph et al. [5] and Lanyon et al. [6] utilize an n-level qudit system to achieve 2n circuit depth and size by swapping the target state out of the qubit space when the control qubits are not |1⟩. Gokhale et al. [7] and Inada et al. [8] propose leveraging qutrit control to achieve at most 2n circuit depth and size. In particular, Gokhale et al. [7] propose a novel approach that utilizes the |2⟩ state for storing the AND result of the control qubits and propagates the results with a Toffoli-like gate conditioned on the |2⟩ state, achieving logarithmic depth.
In our scheme, the circuit depth and size are both in line with the best value from previous works. However, it requires only one additional operation with the ancilla level, i.e., the |11⟩⟨20| + |20⟩⟨11| SWAP gate, which is naturally available in state-of-the-art hardware. The scheme features resource-efficient implementation in the sense that it is low-depth, free from ancilla qubits, and simple in control. These advantages plus compatibility with state-of-the-art hardware are the key to our successful realization of the large-scale multi-qubit Toffoli gate and Grover's search algorithm.
Our scheme shows better scalability on qubit arrays with higher connectivity. The circuit depth can be reduced to 2√n on a 2-D square array and to 2 log_2 n on a binary tree, as shown in Fig. S5.

FIG. S3. A circuit decomposition of the constant adder using QuAND. The n-qubit binary input |a⟩ = |a_{n−1} ... a_1 a_0⟩ is added to a known constant integer b = b_{n−1} ... b_1 b_0 (b_0 = 1) at the output.
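For a sense of scale, the expressions quoted in the text can be evaluated directly. The sketch below tabulates the 2n − 3 two-qubit gate count on a chain (from the main text) together with the 2√n and 2 log_2 n depths quoted here for a square array and a binary tree; the function name is hypothetical and the numbers are plain arithmetic, not measured circuit depths.

```python
import math


def quand_cz_scaling(n_qubits):
    """Evaluate the scaling expressions quoted in the text for an n-qubit CZ gate."""
    return {
        "chain_two_qubit_gates": 2 * n_qubits - 3,
        "square_array_depth": 2 * math.sqrt(n_qubits),
        "binary_tree_depth": 2 * math.log2(n_qubits),
    }


if __name__ == "__main__":
    for n in (8, 16, 64, 256):
        print(n, quand_cz_scaling(n))
```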
TABLE S1. Comparison of multi-qubit Toffoli gate decomposition assuming all-to-all connectivity.
Depth Size Constant Ancilla qubits Control Requirement Intuition Nielsen and Chuang [2] n

III. DEVICE AND EXPERIMENTAL SETUP

A. Wiring
The processor is made of aluminum on sapphire following a similar recipe as described in Ref. [9]. It is mounted inside a dilution refrigerator at a base temperature of 10 mK. We magnetically shield the processor with two Cryoperm cylinders. Inside the refrigerator, we use a total of 10 coaxial lines for the qubit/coupler control, and 1 input and 1 output line for qubit readout. Attenuators and filters are installed at different temperature stages for thermalization and noise attenuation. At the lowest-temperature stage, we use customized low-pass filters for attenuating noise on all the control lines. The output signals are amplified by a high electron mobility transistor (HEMT) amplifier (40 dB gain) at the 4K stage and another low-noise amplifier (50 dB gain) at room temperature. Circulators and filters are placed on the output line to block noise from higher temperature stages. The output signals are finally down-converted to intermediate frequency and demodulated by two analog-to-digital converters (ADCs). Microwave signals for single-qubit XY control and dispersive readout are up-converted from carriers generated by a microwave source using IQ mixing. We use diplexers to combine the XY and the Z signals at room temperature. For better impedance matching, an isolator is added to the XY port.

B. Device parameters
We summarize the measured device parameters in Table S3. Over time, we observe coherence fluctuations for some qubits, likely due to coupling to spurious two-level systems (TLSs).
We observe significant dephasing (Ramsey decay time <200 ns) from flux noise when performing two-qubit operations by adjusting the coupler frequency close to the qubit frequency. We fit the dephasing time to the slope of the corresponding energy spectrum in order to extract the flux noise amplitude σ_Φe = 116 µΦ_0 from the flux dependence of the dephasing rate Γ^s_φ(Φ_e).
We use randomized benchmarking (RB) to characterize gate errors. For single-qubit gates, the gate fidelity in both isolated and simultaneous benchmarking is near the coherence limit. For two-qubit gates, we observe a worse error rate, typically by a factor of 2-3, when performing simultaneous two-qubit RB experiments on all eight qubits. We attribute this to the spectator effect in the presence of unwanted stray coupling, as discussed in [10].

C. Crosstalk
Crosstalk is a major technical challenge for large-scale processors. In Fig. S7, we show the measured flux (Z) crosstalk and residual ZZ interaction. The Z crosstalk is strongest between neighboring qubits (average: 0.23%, standard deviation: 0.16%). The ZZ crosstalk is also strongest between neighboring qubits (average: 52 kHz, standard deviation: 26 kHz). Both levels are very low. In addition, microwave (XY) crosstalk between neighboring qubits (not shown) is negligible in this device owing to the large detuning between red-band and blue-band qubits, as is evident from the isolated and simultaneous single-qubit gate RB results (Table S3).

D. Readout correction
To correct state preparation and measurement (SPAM) error, we first find the transfer matrix R by preparing all computational states of the joint system and measuring the final probability distribution, as shown in Fig. S8. Given the relatively small single-qubit gate error (0.14%), we have ignored the error of R itself. We then use R to correct readout results in subsequent experiments according to Eq. (1).

TABLE S3. Device parameters.
b The coupling strength between the qubits and the coupler is frequency dependent, as g_qc = r_qc √(ω_q ω_c). We extract the coupling coefficient by fitting the measured level spectra versus the flux bias on the coupler.
c The gate errors are measured using single(two)-qubit randomized benchmarking (RB). For two-qubit RB, we extract CZ gate errors by subtracting single-qubit errors from the average errors of two-qubit Cliffords. The simultaneous two-qubit RB is performed in group of (Q
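The SPAM correction with the transfer matrix R described in this subsection can be sketched as follows. The sketch assumes that column j of R holds the measured outcome distribution when basis state j is prepared, and that the correction is a plain linear inversion; the exact form of the correction used in the experiment is not reproduced here. The numbers and function name are made up for illustration, and the toy two-qubit R is built as a Kronecker product only for convenience, whereas the experiment measures the full joint matrix.

```python
import numpy as np


def correct_readout(transfer_matrix, measured_probs):
    """Solve R p_true = p_measured for p_true (linear-inversion assumption)."""
    r = np.asarray(transfer_matrix, dtype=float)
    p = np.asarray(measured_probs, dtype=float)
    return np.linalg.solve(r, p)


# Hypothetical single-qubit transfer matrix: columns are prepared |0>, |1>.
r_single = np.array([[0.98, 0.05],
                     [0.02, 0.95]])
r_two = np.kron(r_single, r_single)            # toy two-qubit transfer matrix

true_probs = np.array([0.1, 0.2, 0.3, 0.4])    # made-up underlying distribution
measured = r_two @ true_probs                  # what a noisy readout would report
corrected = correct_readout(r_two, measured)

if __name__ == "__main__":
    print("measured :", np.round(measured, 4))
    print("corrected:", np.round(corrected, 4))
```

With finite sampling, a plain inversion can return small negative entries; constrained fitting is a common alternative, though whether it was used here is not stated in the text.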

IV. COUPLER-ASSISTED SWAP GATE
A. Theory
The tri-mode (Q-C-Q) system Hamiltonian in the laboratory frame under a flux drive H_drive(t) is (ħ = 1) , where and The drive Hamiltonian can be divided into the adiabatic part H_adia(t) and the parametric drive . Note that we have absorbed a drive-amplitude-dependent frequency shift from the nonlinear relation between coupler frequency and applied flux into the adiabatic part. Following the instantaneous eigenbasis defined by H_static + H_adia(t), we may rewrite an approximate Hamiltonian concerning only two levels of interest (|i⟩ and |j⟩), where σ_z = |i⟩⟨i| − |j⟩⟨j|, σ_− = |i⟩⟨j|, Δ_ij is the instantaneous level spacing, n_ij = ⟨i|a_c† a_c|j⟩, and δ_ij = n_ii − n_jj. Defining the following unitary operator, where ζ(t) = ∫_0^t ξ(t′)dt′ and assuming a constant drive amplitude A(t) = b, we can express the effective Hamiltonian

FIG. S7. Z crosstalk coefficient (left) between couplers and ZZ interaction strength (right) between qubits. The ZZ interaction strength is measured with all the couplers biased at the sweet spot.

FIG. S8. Multi-qubit readout correction matrix. Readout matrix for the 8-qubit system |Q0Q1Q2Q3Q4Q5Q6Q7⟩. The readout matrix is measured by traversing 2^8 = 256 eigenstates of the system (lowest two states for each qubit). For each prepared state, we repeat the multiplexed state measurement 50000 times.
(towards the left in the Fig. S9A and D) are preferred. It is worth noting that there is generally a small avoided level crossing (Δ_gap ≈ 2 MHz) between |110⟩ and |002⟩, which we want to pass as fast as possible to avoid transition to |002⟩. Shortening the parametric pulse requires a stronger parametric modulation amplitude and/or a larger transition matrix element |⟨s|a_c† a_c|101⟩| (s = 200 or 002) according to Eq. 5. Too strong a parametric modulation results in high-order interaction terms that cannot be ignored in the Jacobi-Anger expansion. For small modulation amplitude and short length, the working point towards lower coupler frequency is preferred as the transition matrix element becomes significantly stronger due to wavefunction hybridization.
To avoid spurious transitions to other states, we compare their impact on the target transition across the bias range. Figure S9B and E show the calculated AC Stark shift δ = √(Δ² + (rΩ_0)²) − |Δ|, a metric to quantify the spurious effect by taking into consideration both the detuning Δ from the targeted to the unwanted transition and the drive strength rΩ_0, where Ω_0 is the Rabi frequency of the targeted transition given a certain drive amplitude, and r is the relative strength of the unwanted transition. In the experiments, we choose the extremum (black dot) as our operation point, balancing both the spurious effect and the adiabatic constraint. The working points can be roughly identified from the transition spectroscopy experiment by comparing theory and experimental results, as shown in Fig. S9B and E.
At the selected operation points, we identify the |101⟩ ↔ |200⟩ transition through a swap-spectroscopy experiment (Fig. S10A). The targeted transition is well separated from other spurious transitions. Having found the parametric frequency, we calibrate the SWAP gate (pulse width: 30 ns) by sweeping the pulse amplitude, as shown in Fig. S10B. We check the integrity of the selected transition by counting the probability at each computational state, and repeat it for four different input states. Residual transition errors (2.7% on average) are mainly caused by energy relaxation during the pulse.
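The spurious-transition metric can be evaluated numerically as a quick sanity check. The sketch below implements δ = √(Δ² + (rΩ_0)²) − |Δ| as the expression reads in the text (any prefactor lost in the source is not reinstated), with all quantities assumed to be in the same frequency unit; the function name and sample values are hypothetical.

```python
import math


def ac_stark_shift(detuning, rabi_frequency, relative_strength):
    """delta = sqrt(Delta^2 + (r * Omega_0)^2) - |Delta|, all in the same unit."""
    drive = relative_strength * rabi_frequency
    return math.hypot(detuning, drive) - abs(detuning)


if __name__ == "__main__":
    # Made-up values: shift shrinks as the unwanted transition is detuned further.
    for detuning_mhz in (5.0, 20.0, 80.0):
        shift = ac_stark_shift(detuning_mhz, rabi_frequency=16.0, relative_strength=0.5)
        print(f"Delta = {detuning_mhz:5.1f} MHz -> delta = {shift:.3f} MHz")
```

The metric vanishes when the unwanted transition is not driven (r = 0) and decreases monotonically with |Δ|, which is why operation points far from competing transitions are favored.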
V. IMPLEMENTATION OF n-QUBIT CZ GATES

A. Gate decomposition
The n-CZ gate scheme can be decomposed into SWAP gates calibrated for connected qubit pairs on the 8-qubit ring and single-qubit X gates. The circuits for implementing the n-CZ gate with n = 4, 6, and 8 are illustrated in Fig. S11A. There are two considerations when selecting the qubits in each case. First, the qubit pairs at both ends of the ladder, Q_0-Q_7 and Q_3-Q_4, have a relatively weak qubit-coupler interaction strength (g/2π ≈ 40 MHz), leading to a tighter adiabatic condition and an inferior gate performance. Second, the frequency of Q_3 is unstable, showing random telegraph behavior, possibly as a result of spurious two-level systems. The set of control waveforms for the 8-CZ gate

B. Reverse QuAND gate
Here we discuss the SWAP gate phase ignored in the main text and explain how we effectively calibrate it away in the experiment. We only consider the case of a SWAP between |11⟩ and |20⟩. The other case, a SWAP between |11⟩ and |02⟩, can be analyzed in the same way and is not repeated here.
First, we show that the reverse QuAND can be implemented by changing the phase of the second parametric modulation. According to the time sequence shown in Fig. S12A, the lab-frame unitary operator for the SWAP gate between
Here $t_a$ denotes the rising and falling time of the adiabatic pulse and $t_d$ the length of the parametric pulse. ω s , ωs , ω s denote the idle frequency, the average frequency during the parametric modulation, and the average frequency during the adiabatic rising/falling edge of eigenstate $|s\rangle$, respectively. The combined unitary (in the lab frame) for two SWAP operations with an idling time $t_\mathrm{idle}$ in between is

where the asterisk indicates the phase factor on the non-computational level, which we do not care about. In the rotating frame, which is equivalent to the logical basis, the unitary is
Here $\phi_{zz}$ is the conditional phase accumulated on state $|101\rangle$. This conditional phase comes from the adiabatic modulation and the idling part in between, so it is important to take both into account when calibrating the phase. The phase of the first parametric pulse can simply be set to 0. The reverse QuAND gate is then implemented by calibrating the phase of the second parametric pulse until $\phi_{zz} = 0$. The single-qubit (local) phases can be conveniently compensated by virtual Z gates.
FIG. S12. (A) The phase of the second parametric pulse is calibrated to cancel the conditional phase accumulation. (B) Circuit diagram for n-qubit CZ gates using experimental unitary operations. The phases of the SWAP gates and the compensatory Z gates are iteratively calibrated. (C) Propagation of the (virtual) Z phase. We utilize the commutation relationship between Z gates and SWAP or n-qubit CZ gates (top) to modify the circuit in (B). The phases of the Z gates propagate to the end of the circuit and change the phases of subsequent X rotations and SWAP operations.

C. Phase calibration
The entire calibration process starts with the CZ gate in the middle and then progressively extends to the outermost part of the circuit, as shown in Fig. S12B. In each iteration, we calibrate the phase of the parametric pulse in the reverse QuAND gate through a conditional Ramsey experiment and the Z gate phases through single-qubit Ramsey experiments.
We use virtual-Z gates to compensate the local phases induced by the flux pulses. Note that the SWAP gate and the Z gate do not commute. When the order of the two operations is swapped, the phase of the Z gate is absorbed into the phase of the parametric pulse. How the virtual-Z phases propagate through the circuit is illustrated in Fig. S12C. The phase calibration is verified by multi-qubit conditional Ramsey experiments (Fig. S13). The circuit diagram is shown on the left. $Q_0$ is chosen as the target qubit in all the cases shown (n = 4, 6, 8), while the other qubits serve as control qubits. There is approximately a π phase shift on the target qubit when all the control qubits are in the excited state, shown as the red lines, verifying the phase calibration.
To characterize the synthesized gate, we perform standard process tomography for the 4-qubit CZ (CCCZ) gate (larger process matrices are unattainable due to limited memory). The reconstructed process matrix $\chi_\mathrm{exp}$ from $4^4 = 256$ distinct input states gives a process fidelity $F_p = \mathrm{Tr}(\chi_\mathrm{exp}\chi_\mathrm{ideal}) = 82.6\%$ (Fig. S14). Note that the process fidelity of the 4-qubit CZ is lower than the 4-qubit Toffoli truth-table fidelity shown in the main text (both using $Q_4 Q_5 Q_6 Q_7$), partly as a result of underestimated phase errors in the truth-table measurements.
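To illustrate how the figure of merit $F_p = \mathrm{Tr}(\chi_\mathrm{exp}\chi_\mathrm{ideal})$ is evaluated, the following sketch builds process matrices in the $n$-qubit Pauli basis for unitary channels and compares an ideal 4-qubit CZ with one carrying a coherent phase error. This is a toy model under stated assumptions: the helper names and the 0.3 rad phase error are hypothetical, and no tomographic reconstruction from measured data is attempted, so the experimental value of 82.6% is not reproduced.

```python
import itertools

import numpy as np

# Single-qubit Pauli operators I, X, Y, Z.
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def pauli_basis(n):
    """Return the 4**n tensor-product Pauli operators on n qubits."""
    basis = []
    for combo in itertools.product(PAULIS, repeat=n):
        op = combo[0]
        for p in combo[1:]:
            op = np.kron(op, p)
        basis.append(op)
    return basis


def chi_of_unitary(u, basis):
    """Process matrix chi_mn = c_m * conj(c_n), where U = sum_m c_m P_m."""
    d = u.shape[0]
    coeffs = np.array([np.trace(p.conj().T @ u) / d for p in basis])
    return np.outer(coeffs, coeffs.conj())


def n_qubit_cz(n, phase_error=0.0):
    """Diagonal n-qubit CZ; phase_error (rad) is a hypothetical coherent error."""
    diag = np.ones(2**n, dtype=complex)
    diag[-1] = -np.exp(1j * phase_error)
    return np.diag(diag)


def process_fidelity(chi_a, chi_b):
    """Process fidelity Tr(chi_a chi_b)."""
    return float(np.real(np.trace(chi_a @ chi_b)))


basis = pauli_basis(4)  # 4**4 = 256 basis operators
chi_ideal = chi_of_unitary(n_qubit_cz(4), basis)
chi_error = chi_of_unitary(n_qubit_cz(4, phase_error=0.3), basis)
print("process fidelity with 0.3 rad phase error:", process_fidelity(chi_error, chi_ideal))
```

For two unitary channels this expression reduces to $|\mathrm{Tr}(U^\dagger V)|^2 / d^2$ with $d = 2^n$, which provides a quick consistency check on the basis construction.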
We estimate the T1-limited gate fidelity ($F_{T1}$) by segmenting the circuit (in Fig. S15) and taking the product of the T1-limited fidelities of all segments, $F_{T1} = \prod_{j,k} F_{T1}^{j,k}$. The relaxation rate of an instantaneous eigenstate generally varies during a flux pulse because of the varying wavefunction participation of different bare states [10]. The average relaxation rate of the eigenstate is calculated by summing the contributions from all bare states. The rate is larger than that in the idle state because of the strong overlap with the relatively short-lived coupler (see Table .

VII. GROVER'S SEARCH ALGORITHM
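Grover's search algorithm generally includes four steps (as shown in Fig. 3A in the main text): (i) initialize the n-qubit system into an equal superposition of all possible bit-string states, $\frac{1}{2^{n/2}}\sum_{s=0}^{2^n-1}|s\rangle$; (ii) encode the solution $j$ with a phase oracle $O_j = \sum_{s\neq j}|s\rangle\langle s| - |j\rangle\langle j|$, i.e., a conditional π-phase shift on state $|j\rangle$; (iii) diffuse the encoded phase and amplify the probability of finding $|j\rangle$; (iv) measure the final state. Steps (ii) and (iii) may be repeated M times for further amplification. The algorithm promises a quadratic speedup, reaching the optimal amplification at $M \sim 2^{n/2}$.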
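The four steps can be sketched as an ideal, noise-free state-vector simulation. This is an illustration only: the function names are hypothetical, the diffusion step is written in its textbook form $2|u\rangle\langle u| - I$ (with $|u\rangle$ the uniform superposition) rather than as the n-qubit CZ circuit used on the device, and the solution 110101 is borrowed from the example in Fig. 4.

```python
import math

import numpy as np


def grover_success_probability(n, solution, cycles):
    """Ideal probability of measuring `solution` after `cycles` Grover iterations."""
    dim = 2**n
    uniform = np.full(dim, 1.0 / math.sqrt(dim))
    state = uniform.copy()                    # (i) equal superposition of all bit strings
    for _ in range(cycles):
        state[solution] *= -1.0               # (ii) phase oracle: pi phase on |j>
        overlap = uniform @ state
        state = 2.0 * overlap * uniform - state   # (iii) diffusion, 2|u><u| - I
    return float(state[solution] ** 2)        # (iv) probability of measuring |j>


def ideal_formula(n, cycles):
    """Closed form sin^2((2M + 1) arcsin(2^(-n/2))) for a single solution."""
    return math.sin((2 * cycles + 1) * math.asin(2 ** (-n / 2))) ** 2


n, solution = 6, 0b110101
for m in range(1, 9):
    sim = grover_success_probability(n, solution, m)
    print(f"M = {m}: simulated = {sim:.4f}, closed form = {ideal_formula(n, m):.4f}")
```

The simulated probability follows the closed-form expression for a single solution, so the loop also shows where further cycles start to over-rotate and reduce the success probability.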

A. Error model for algorithm success probability
Here we provide a simple model for estimating the algorithm success probability (ASP) of Grover's search algorithm with non-ideal gate fidelity. To find one solution in an unstructured list of size $2^n$ using Grover's algorithm, the ideal ASP after $M$ cycles of oracle queries is $\sin^2\left[(2M+1)\arcsin(2^{-n/2})\right]$. When the coherence is completely destroyed, each measured bit is uniformly random between 0 and 1 because the last layer of the circuit is a layer of Hadamard gates. Hence, the ASP is $1/2^n$. Since the circuit of each encoding-diffusing cycle contains two n-qubit CZ gates with fidelity $F$ (ignoring single-qubit gate errors), the fidelity of the whole circuit with $M$ cycles is $F^{2M}$. An empirical ASP model can then be written as
$$\mathrm{ASP} = F^{2M}\sin^2\left[(2M+1)\arcsin(2^{-n/2})\right] + \frac{1-F^{2M}}{2^n}.$$
FIG. 1. Simplifying compilation using the quantum version of the AND (QuAND) gate. a, Circuit notation, truth table, and decomposition of the QuAND gate and its reversal. The AND-value qubit, indicated by an &, is referred to as the "parent" and the other qubit is referred to as the "child." A QuAND gate is indicated by an arrow pointing from the child to the parent, with an arrow in the opposite direction indicating a reverse QuAND gate. Both can be synthesized with a single-qubit X gate and a SWAP operation between $|11\rangle$ and $|20\rangle$, which is indicated by a double-cross sign with a dashed cross on the child qubit. b, Circuit decomposition of an n-qubit controlled-Z (CZ) gate on a one-dimensional qubit chain using a sequence of QuAND gates, a CZ gate, and a sequence of reverse QuAND gates, shown here with time progressing from left to right. During embedding, the sequentially applied QuAND gates register the AND results of all the qubits from the upper and lower halves of the chain onto the two root parents, $Q_k$ and $Q_{k+1}$, respectively. The embedded information is later released via the reverse QuAND gates to recover the original binary encoding. The CZ gate is only effective when all qubits are in state $|1\rangle$. c, Sketch showing a quantum processor with qubits connected in an arbitrary topology. A branching tree is enacted for implementing the QuAND gate sequence (arrows) with time progressing from dark blue to light green. The CZ gate is performed between the two root parents. The QuAND gate could also be performed across multiple processors (arrows pointing from outside) to efficiently implement global operations on a larger quantum network.
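The error model for the algorithm success probability in Section VII A can be sketched numerically as follows. The sketch assumes that a fraction $F^{2M}$ of runs stays coherent and yields the ideal ASP, while the remaining fraction $1 - F^{2M}$ yields a uniformly random outcome with probability $2^{-n}$; this second term is an assumption made here for illustration. The function names and the fidelity values are hypothetical and are not fitted to the measured data.

```python
import math


def ideal_asp(n, cycles):
    """Ideal success probability sin^2((2M + 1) arcsin(2^(-n/2)))."""
    theta = math.asin(2 ** (-n / 2))
    return math.sin((2 * cycles + 1) * theta) ** 2


def empirical_asp(n, cycles, fidelity):
    """Assumed mixture of the coherent result and a uniformly random guess.

    Each cycle contains two n-qubit CZ gates of fidelity `fidelity`, so the
    coherent fraction after `cycles` cycles is fidelity ** (2 * cycles).
    """
    coherent = fidelity ** (2 * cycles)
    return coherent * ideal_asp(n, cycles) + (1.0 - coherent) / 2**n


# Hypothetical gate fidelities, used only to show the trend.
for n, fidelity in ((4, 0.90), (6, 0.80)):
    values = {m: empirical_asp(n, m, fidelity) for m in range(1, 8)}
    best = max(values, key=values.get)
    print(f"n = {n}, F = {fidelity}: best M = {best}, ASP = {values[best]:.3f}")
```

With imperfect gates the maximum of this model shifts to fewer cycles than in the ideal case, because each additional cycle costs a factor $F^2$ in coherent weight.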

FIG. 2. Implementing the QuAND gate on a high-scalability superconducting quantum processor. a, False-color micrograph of the device. Here, red and blue indicate the lower and higher fixed-frequency transmon qubits, respectively. b, Eigenenergies of the states $|101\rangle$ and $|200\rangle$ (trimode notation) in a qubit-coupler-qubit subsystem versus the coupler flux bias, $\Phi_e$. The thin black line with the embedded arrows indicates the state trajectory for the SWAP$_{11-20}$ pulse sent to the coupler (inset), where $A_p$ is the amplitude of the parametric drive on the flux pulse plateau. c, Measured final-state probabilities in states $|11\rangle$ and $|20\rangle$ after the SWAP$_{11-20}$ pulse versus the parametric drive amplitude. The dashed line indicates a full SWAP$_{11-20}$ operation.
FIG. 3. Low-depth synthesis of generalized Toffoli gates. a, Schematic diagram showing the compiled sequence for implementing the 4-qubit (left), 6-qubit (center), and 8-qubit (right) CZ gate circuit described in Fig. 1b on the 8-qubit processor with qubits indexed from 0 to 7. The arrows denote the QuAND gate sequence with time progressing from dark blue to light green. We have omitted the reverse QuAND sequence. b, Measured truth tables of the corresponding generalized Toffoli gates. In these examples, the controlled-NOT operation is performed on $Q_7$ in all cases.

Figure 4b shows the results of the 4-qubit and 6-

FIG. 4. Demonstration of Grover's search algorithm with multiple amplification cycles. a, Circuit diagram implementing Grover's search algorithm. In this example, the encoded solution is 110101. Y±1/2 indicates a ±π/2 rotation about the Y axis. b, Measured output state probability distribution for each of the $2^n$ encoded states in the 4-qubit and 6-qubit Grover's search algorithms. c, Average algorithm success probabilities (dots) and four times the standard deviations (error bars) of all $2^n$ cases versus the number of oracle-amplification cycles. The solid lines are fits to Eq. (1) given finite gate fidelity F.
FIG. S1. Circuit decomposition of generalized Fredkin gate. (a) A circuit decomposition of a multi-qubit controlled-U gate (left) using QuAND, reverse QuAND, and controlled-U gates (right). (b) A circuit decomposition of a multi-qubit Toffoli gate (left) using n-qubit CZ and Hadamard gates (right). (c) A circuit decomposition of a generalized Fredkin gate (left) using n-qubit Toffoli and CNOT gates.

FIG. S2. A circuit decomposition of an incrementer using QuAND. The n-qubit binary input $|a\rangle = |a_{n-1} \ldots a_1 a_0\rangle$ is incremented to $|a+1\rangle$ at the output. The subscript indicates the index of the binary digit.
$\Phi_e$. The relaxation times of the couplers are measured by swapping an excitation back and forth between the coupler and its neighboring qubit. The strong flux noise and shorter relaxation times of the couplers explain the majority of the two-qubit gate error.
FIG. S8. Multi-qubit readout correction matrix. Readout matrix for the 8-qubit system $|Q_0Q_1Q_2Q_3Q_4Q_5Q_6Q_7\rangle$. The readout matrix is measured by traversing the $2^8 = 256$ eigenstates of the system (lowest two states for each qubit). For each prepared state, we repeat the multiplexed state measurement 50,000 times.
FIG. S9. Optimizing the operation point for the SWAP gate. (A) Energy level spectra of the qubit ($Q_4$)-coupler-qubit ($Q_5$) system versus the coupler flux bias $\Phi_e$ (solid lines). The inset shows the small avoided level crossing between $|002\rangle$ and $|110\rangle$. (B) AC Stark shift induced by spurious transitions from the parametric drive versus flux bias. At each bias, we assume a drive amplitude that corresponds to a 10 MHz $|101\rangle$-$|002\rangle$ swapping rate, i.e., Ω = 10 MHz. Several prominent transitions are identified. (C) Transition spectroscopy (initial state: $|101\rangle$, measured qubit: $Q_4$) obtained by sweeping the frequency of the parametric pulse and the flux bias amplitude. The experimental sequence is shown on the left. Here we identify transitions by measuring the probability of $|0\rangle$, since the chosen readout frequency does not discriminate between the first and second excited states. The black dots sharing the same horizontal axis indicate the optimal operation point for the parametric swap between $|101\rangle$ and $|002\rangle$. (D) Same as (A) but the transition is to $|200\rangle$. (E) Same as (B) but the resonant transition is between $|101\rangle$ and $|200\rangle$. (F) Same as (C) but the measured qubit is $Q_5$.

FIG. S10. Implementation of the coupler-assisted SWAP gate. (A) Swap spectroscopy (starting from $|101\rangle$) obtained by sweeping the length and frequency of the parametric pulse at $\Phi_e = 0.26\,\Phi_0$, using a weak parametric pulse amplitude ($A_p \approx 0.01$). Several prominent transitions are identified. (B) Measured probability distribution of the final state after the SWAP pulse (coupler traced out) versus the parametric pulse amplitude, repeated for four different input states. The parametric pulse length is 30 ns.
FIG. S11. Circuit diagram and 8-qubit pulse sequence for n-qubit CZ gates. (A) Circuit diagram for implementing the 4-qubit (blue), 6-qubit (light gray), and 8-qubit (dark gray) controlled-Z gate. (B) Experimental pulse sequence for the 8-Toffoli gate. The black (purple) curve represents the qubit-XY (coupler-Z) signal. The couplers are idly biased at the sweet spot, and the second flux pulse is inverted to suppress low-frequency noise.

$\frac{1}{2^{n/2}}\sum_{s=0}^{2^n-1}|s\rangle$; (ii) encode the solution $j$ with a phase oracle $O_j = \sum_{s\neq j}|s\rangle\langle s| - |j\rangle\langle j|$, i.e., a conditional π-phase shift on state $|j\rangle$; (iii) diffuse the encoded phase and amplify the probability of finding $|j\rangle$; (iv) measure the final state.

FIG. S16. Multiple-solution Grover's search algorithm. (A) Circuit diagram for implementing the n-solution Grover's search algorithm. During the encoding process, we apply n phase oracles in succession. An example of a 4-qubit phase oracle is shown on the right. (B) Four-qubit two-solution Grover's search result (M = 1). With the first encoded state fixed as $|0100\rangle$ (left) or $|1000\rangle$ (right), the matrix shows how the measured probabilities of the $2^4 = 16$ states vary with the second encoded state (y-axis). When state $|0000\rangle$ is encoded twice (bottom), the net effect is equivalent to no encoding and all states are measured with approximately equal probability. (C) Four-qubit three-solution Grover's search result (M = 1). The first and second encoded states are fixed as $|0000\rangle$, $|1000\rangle$ (left) or $|0100\rangle$, $|1100\rangle$ (right). The third one traverses all 16 logical states (y-axis).
Data availability: The data that support the plots within this paper and other findings of this study are available from the corresponding authors upon reasonable request.
I. LOGIC CIRCUIT CONSTRUCTION USING QUAND

TABLE S2. Comparison of multi-qubit Toffoli gate decomposition assuming a 1-D chain.
FIG. S5. Schematics of synthesizing multi-qubit CZ gates on a 2-D square array (left) and a binary tree (right). Circles indicate qubits and arrows indicate QuAND gates, pointing from child to parent. The color gradient indicates the temporal order. The reverse QuAND sequence is omitted.
Pulse sequence for QuAND and reverse QuAND gate.
Multi-qubit conditional Ramsey experiments for phase checking of n-qubit CZ gates. In the experiments, Q0 (target qubit) is initialized in a superposition state, while the others (control qubits) are either in the excited or in the ground state. The phase of the target qubit is flipped after the n-qubit CZ if all control qubits are excited, shown as the red lines.