Realization of a quantum neural network using repeat-until-success circuits in a superconducting quantum processor

Artificial neural networks are becoming an integral part of digital solutions to complex problems. However, employing neural networks on quantum processors faces challenges related to the implementation of non-linear functions using quantum circuits. In this paper, we use repeat-until-success circuits enabled by real-time control-flow feedback to realize quantum neurons with non-linear activation functions. These neurons constitute elementary building blocks that can be arranged in a variety of layouts to carry out deep learning tasks quantum coherently. As an example, we construct a minimal feedforward quantum neural network capable of learning all 2-to-1-bit Boolean functions by optimization of network activation parameters within the supervised-learning paradigm. This model is shown to perform non-linear classification and effectively learns from multiple copies of a single training state consisting of the maximal superposition of all inputs.


I. INTRODUCTION
Deep learning is an established field with pervasive applications ranging from image classification to speech recognition [1]. Among the most intriguing recent developments is its extension to the quantum regime and the search for an advantage based on quantum mechanical effects [2]. This effort is pursued in a variety of ways, often inspired by the diversity of classical models and based on the concept of artificial neural networks. Prior works have proposed quantum versions of perceptrons [3,4], support vector machines [5,6], Boltzmann machines [7], autoencoders [8], and convolutional neural networks [9][10][11]. The potential advantages range from reducing the model size, by exploiting the exponentially large number of amplitudes defining multi-qubit states, to speeding up either training or inference with efficient quantum algorithms such as HHL [12] for solving systems of linear equations, or to reducing the number of samples needed for accurate learning.
A promising implementation is based on variational quantum algorithms, in which parametrized quantum circuits prepare approximate solutions to the problem at hand. These solutions are then refined by classically optimizing the circuit parameters [13]. However, fundamental questions remain open regarding the parameter landscape [14], the cost of the classical optimization loop, and the expressive power of circuit ansätze [15]. Encouraging results suggest that quantum convolutional neural networks are trainable [16,17]. Still, loading training data into a quantum machine accurately and efficiently is recognized as an unsolved problem [18] and, despite promising results [19][20][21], current solutions work only under specific assumptions. Nevertheless, the exponential complexity of the states generated by ever larger quantum computers [22] suggests that machine-learning techniques will become increasingly important for directly processing large-scale quantum states [23].
It has been noted in the classical machine-learning literature that non-linear activation functions for neurons are superior [24]. To carry this observation over to the design of quantum neural networks (QNNs), several methods have been proposed to break the intrinsic linearity of quantum mechanics. These range from the use of quantum measurements and dissipative quantum gates [25], to the quadratic form of the kinetic term [26], reversible circuits [27], recurrent neural networks [28], and the SWAP test [29] with phase estimation [30].
Previous work in this context [10] has implemented neural networks that post-process the classical results of measurements. Here, we experimentally demonstrate a quantum neural network architecture based on variational repeat-until-success (RUS) circuits [31,32] that is implemented in a fully coherent way, handles quantum data directly, and uses real-time feedback to perform the internal update of neurons. In this model, each artificial neuron is represented by a single qubit [33]. The neuron update is achieved with a quantum circuit that generates a non-linear activation function using control-flow feedback based on mid-circuit measurements. This activation function is periodic but locally resembles a sigmoid function. Despite the mid-circuit measurement, this approach does not collapse the relevant quantum information. Rather, the measurement outcome signals either that the neuron update was successfully implemented or that a fixed, input-independent operation was performed instead. This other operation can be undone by feedback and the circuit rerun as necessary until success, leading to a constant, rather than exponential, overhead in the number of elementary operations required by RUS. Note that the overall fidelity of RUS circuits critically depends on the architecture and speed of the active feedback mechanism.

Our experiment uses 4 of the 7 transmons in a circuit QED processor [34] to implement a feedforward QNN with two inputs, one output, and no intermediate layers. We demonstrate that the QNN can learn each of the 16 2-to-1-bit Boolean functions by adjusting the weights and bias associated with the output neuron. It is particularly noteworthy that this architecture implements the XOR Boolean function using a single neuron. XOR is a fundamental example of the limitations of classical artificial neurons: it is not linearly separable, so a single classical neuron, whose decision boundary is linear, cannot represent it.
We follow the supervised-learning paradigm, in which a set of training examples informs the network about the specific function to learn. Our experiment uses multiple copies of a single input state (the maximal superposition of the 4 inputs), demonstrating that the QNN can learn from a superposition. Finally, we investigate the specificity of the parameters learned for each Boolean function by characterizing how well the values learned for one function perform for every other. This indicates how the QNN could be used to discriminate between Boolean functions provided as a quantum black box.

II. RESULTS
Synthesizing non-linear functions using conditional gearbox circuits. The conditional gearbox circuit [35] belongs to a class of RUS circuits [31] that use one ancilla qubit $Q_A$ and mid-circuit measurements to implement a desired operation. The three-qubit version (Fig. 1a) has input qubit $Q_I$, output qubit $Q_O$, and angles $w$ and $b$ as classical input parameters. For an ideal processor starting with $Q_A$ in $|0\rangle_A$, $Q_I$ in computational state $|k\rangle_I$ ($k \in \{0, 1\}$), and $Q_O$ in an arbitrary state $|\psi\rangle_O$, the coherent operations produce the state
$$|\Psi\rangle = \sqrt{p_S(\theta_k)}\,|0\rangle_A|k\rangle_I\,R_x^{g(\theta_k)}|\psi\rangle_O + \sqrt{1-p_S(\theta_k)}\,|1\rangle_A|k\rangle_I\,R_x^{-\pi/2}|\psi\rangle_O,$$
where $\theta_k = kw + b$ and $p_S(\theta_k) = \cos^4(\theta_k/2) + \sin^4(\theta_k/2)$. A measurement of $Q_A$ in its computational basis produces outcome $m_A = +1$ (projection onto $|0\rangle_A$) with probability $p_S(\theta_k)$. In this case, the net effect on $Q_O$ is a rotation around the $x$ axis of its Bloch sphere by angle $g(\theta_k)$, where
$$g(\theta) = 2\arctan\!\left(\tan^2(\theta/2)\right)$$
is a non-linear function with sigmoid shape (Fig. 1d). This outcome constitutes success.
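As a numerical check of these expressions, the sketch below builds a textbook two-qubit gearbox (ancilla $Q_A$ and output $Q_O$) and confirms that the success branch applies $R_x^{g(\theta)}$ with probability $p_S(\theta)$, while the failure branch applies $R_x^{-\pi/2}$ independently of $\theta$. It assumes the standard construction (an $R_y^{\theta}$ rotation on $Q_A$, a controlled-$(-iX)$ from $Q_A$ to $Q_O$, then $R_y^{-\theta}$ on $Q_A$) with $\theta$ applied directly rather than through input-controlled rotations; this is not the compiled circuit of Fig. 1b, and the helper names are illustrative.

```python
import numpy as np

def ry(theta):
    """Rotation about y by angle theta on the Bloch sphere."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rx(theta):
    """Rotation about x by angle theta on the Bloch sphere, exp(-i theta X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)   # |0><0| on the ancilla
P1 = np.diag([0, 1]).astype(complex)   # |1><1| on the ancilla

def gearbox_unitary(theta):
    """Assumed textbook gearbox on Q_A (x) Q_O: R_y(theta) on A, C-(-iX) A->O, R_y(-theta) on A."""
    controlled = np.kron(P0, I2) + np.kron(P1, -1j * X)
    return np.kron(ry(-theta), I2) @ controlled @ np.kron(ry(theta), I2)

def g(theta):
    """Activation 2 arctan(tan^2(theta/2)), written with arctan2 to avoid the poles of tan."""
    return 2 * np.arctan2(np.sin(theta / 2) ** 2, np.cos(theta / 2) ** 2)

def p_success(theta):
    return np.cos(theta / 2) ** 4 + np.sin(theta / 2) ** 4

def branch(theta, psi_o, outcome):
    """Probability and normalized Q_O state for ancilla outcome 0 (success) or 1 (failure)."""
    state = gearbox_unitary(theta) @ np.kron([1, 0], psi_o)   # Q_A starts in |0>
    amp = state.reshape(2, 2)[outcome]                          # rows: Q_A, columns: Q_O
    prob = np.vdot(amp, amp).real
    return prob, amp / np.sqrt(prob)

# Usage: at theta = pi/2 the success probability reaches its minimum of 0.5.
p_s, _ = branch(np.pi / 2, np.array([1, 0], dtype=complex), 0)
```

The success amplitude works out to $\cos^2(\theta/2)\,I - i\sin^2(\theta/2)\,X$, which is $\sqrt{p_S}\,R_x^{g(\theta)}$; the check does not depend on the input state of $Q_O$.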
For failure (i.e., outcome $m_A = -1$ and projection onto $|1\rangle_A$), the effect on $Q_O$ is an $x$ rotation by $-\pi/2$, independent of $k$, $w$, and $b$. In this case, the effect of the circuit can be undone using feedback, specifically $R_x^{\pi}$ and $R_x^{\pi/2}$ gates on $Q_A$ and $Q_O$, respectively. The circuit can then be re-run with feedback corrections until success is achieved. For an ideal processor, the average number of runs to success, $N_{\mathrm{RTS}}$, is bounded by $1 \le N_{\mathrm{RTS}} = 1/p_S(\theta_k) \le 2$. This bound holds even when $Q_I$ is initially in a superposition state $|\psi\rangle_I = \alpha|0\rangle_I + \beta|1\rangle_I$. In this general case, the output state upon success is still a superposition, but with potentially different amplitudes:
$$|0\rangle_A\left(\alpha'_0\,|0\rangle_I\,R_x^{g(\theta_0)} + \alpha'_1\,|1\rangle_I\,R_x^{g(\theta_1)}\right)|\psi\rangle_O.$$
The probability amplitudes can change from $\alpha_k$ to $\alpha'_k$ (with $\alpha_0 = \alpha$ and $\alpha_1 = \beta$), depending on the initial $|\psi\rangle_I$, $w$, $b$, and $N_{\mathrm{RTS}}$. This distortion of probability amplitudes can be mitigated using amplitude amplification [36], which we do not employ here.
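Because each attempt succeeds with probability $p_S$ once the correction has been applied, the number of runs is geometrically distributed with mean $1/p_S$. A minimal Monte Carlo sketch, assuming ideal corrections and independent attempts (the function names are hypothetical):

```python
import numpy as np

def p_success(theta):
    """Ideal success probability of one conditional-gearbox attempt."""
    return np.cos(theta / 2) ** 4 + np.sin(theta / 2) ** 4

def runs_to_success(theta, rng):
    """One RUS execution: count attempts until m_A = +1, assuming ideal corrections."""
    p = p_success(theta)
    runs = 1
    while rng.random() >= p:   # failure: undo with R_x^pi (Q_A), R_x^{pi/2} (Q_O), then retry
        runs += 1
    return runs

def mean_runs(theta, shots=20000, seed=1):
    rng = np.random.default_rng(seed)
    return np.mean([runs_to_success(theta, rng) for _ in range(shots)])

# Bound 1 <= N_RTS = 1 / p_S <= 2 over a full period of theta.
thetas = np.linspace(0, 2 * np.pi, 1001)
n_rts = 1 / p_success(thetas)
```

Since $p_S(\theta) = 1 - \tfrac{1}{2}\sin^2\theta \ge 1/2$, the expected overhead never exceeds two attempts.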
We compile the three-qubit conditional gearbox circuit into the native gate set of our processor (Fig. 1b) and verify its action after one round using state tomography of $Q_O$ conditioned on success and failure. Figure 2 shows experimental results when preparing $Q_I$ in $|1\rangle_I$ and $Q_O$ in $|0\rangle_O$, setting $b = 0$ and sweeping $w$, alongside simulations of both an ideal and a noisy processor. Qualitatively, the experimental results reproduce the key features of the ideal circuit: we observe a $\pi$-periodic oscillation in $p_S(w)$ with a minimal value of 0.5 at $w = \pi/2$, and a sharp variation in $\langle Z_O\rangle$ from +1 to −1 centered at $w = \pi/2$. However, the nonzero $\langle X_O\rangle$ components observed for both success and failure indicate that the action on $Q_O$ in both cases is not purely an $x$-axis rotation. The noisy simulation captures all key non-idealities observed. This simulation includes non-linearity in single-qubit microwave driving, cross-resonance [37] effects between $Q_A$ and $Q_O$, phase errors in CZ gates, readout error in $Q_A$, and qubit decoherence [38].
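The ideal curves of Fig. 2 follow directly from $p_S$ and $g$. For $Q_I$ in $|1\rangle$, $b = 0$, and $Q_O$ assumed to start in $|0\rangle$, success leaves $Q_O$ at Bloch vector $(0, -\sin g, \cos g)$ and failure at $(0, 1, 0)$, assuming the convention $R_x^{\phi} = e^{-i\phi X/2}$. The sketch below tabulates these ideal values; in this picture the $X$ component vanishes identically, so any measured $X$ component signals a rotation axis other than $x$.

```python
import numpy as np

def g(theta):
    return 2 * np.arctan2(np.sin(theta / 2) ** 2, np.cos(theta / 2) ** 2)

def ideal_fig2(w, b=0.0):
    """Ideal first-round p_S and Bloch components of Q_O, for Q_I = |1> and Q_O initially |0>."""
    theta = np.asarray(w, dtype=float) + b
    p_s = np.cos(theta / 2) ** 4 + np.sin(theta / 2) ** 4
    phi = g(theta)
    # Success: R_x(phi)|0> has Bloch vector (0, -sin(phi), cos(phi)).
    success = {"X": np.zeros_like(phi), "Y": -np.sin(phi), "Z": np.cos(phi)}
    # Failure: R_x(-pi/2)|0> regardless of w, i.e. Bloch vector (0, 1, 0).
    failure = {"X": np.zeros_like(phi), "Y": np.ones_like(phi), "Z": np.zeros_like(phi)}
    return p_s, success, failure

w = np.linspace(0, 2 * np.pi, 721)
p_s, success, failure = ideal_fig2(w)
```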
Control-flow feedback on a programmable superconducting quantum processor. Active feedback is important for many quantum computing applications …
Past demonstrations of quantum error correction (QEC) relied on storing measurement results without real-time feedback [39,40]. Real-time feedback has also been demonstrated using data-flow mechanisms, in which individual operations are applied conditionally [41]. In contrast, the implementation of RUS hinges on support for control-flow mechanisms in the control setup (Fig. 4), in which the entire subsequent sequence of operations must be selected and executed in real time, depending on the results of measurements.
In our quantum control architecture, a controller sequences in real time the sets of operations to be performed, controlling various arbitrary waveform generators (AWGs) and digitizers to implement the desired program. Our implementation of control-flow feedback therefore focuses on this controller, which achieves a maximum latency of 160 ns. The latency of the full feedback loop of the overall control system (controller, analog-interface devices, and the entire analog chain) was measured to be 980 ns. This represents 3% of the worst coherence time (see Table S1) and sets an upper bound on the efficiency of RUS execution on the quantum processor. Further improvements could be achieved by optimizing the trigger latency of our readout (RO) AWG and by speeding up digital signal processing within the digitizers.
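A back-of-the-envelope sketch of what the 980 ns loop latency means for RUS. It assumes (our assumption) that every attempt, successful or not, waits for one full feedback loop before branching, so the expected waiting time per neuron update is $N_{\mathrm{RTS}} \times 980$ ns, which stays below about 2 µs because $N_{\mathrm{RTS}} \le 2$. It also back-computes the worst coherence time implied by the quoted 3% figure.

```python
LOOP_LATENCY_NS = 980.0    # measured full feedback-loop latency (from the text)
COHERENCE_FRACTION = 0.03  # "3% of the worst coherence time" (from the text)

def expected_wait_ns(p_success, loop_latency_ns=LOOP_LATENCY_NS):
    """Expected feedback waiting time per neuron update.

    Assumption: every attempt waits one full loop before branching, and the
    number of attempts is geometric with mean N_RTS = 1 / p_success.
    """
    n_rts = 1.0 / p_success
    return n_rts * loop_latency_ns

# Worst coherence time implied by 980 ns being 3% of it, in microseconds (about 33 us).
implied_worst_coherence_us = LOOP_LATENCY_NS / COHERENCE_FRACTION / 1000.0
```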
Note that the critical feedback path consists of the entire readout chain plus the slowest instrument, whose latency must also be accounted for before the branching condition is assessed and implemented. In our control setup, the slowest instrument is the Flux AWG, owing to the latency introduced by the various finite impulse response and exponential filters implemented in hardware to correct on-chip distortion of control pulses [42].
Constructing a QNN using RUS circuits. The characteristic threshold shape of $g$ makes it useful in the context of neural networks: the conditional gearbox circuit can be seen as a non-linear activation function whose rotations are controlled by the input qubits, mimicking the propagation of information between network layers. We use these concepts [33] to implement a minimal QNN capable of learning any of the 2-to-1-bit Boolean functions (see [38] for their definitions and naming convention). These 16 functions can be separated into three categories (Table S2): two constant functions (NULL and IDENTITY) have the same output for all inputs; 6 balanced functions (e.g., XOR) output 0 for exactly two inputs; and 8 unbalanced functions (e.g., AND) have the same output for exactly three inputs.
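The three categories can be checked by enumerating the 16 truth tables and counting ones: constant functions have 0 or 4 ones, balanced functions exactly 2, and unbalanced functions 1 or 3. The input ordering (00, 01, 10, 11) is an arbitrary choice for this sketch.

```python
from collections import Counter
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # (I1, I2) ordering assumed for this sketch

def all_truth_tables():
    """All 16 functions {0,1}^2 -> {0,1}, as output tuples aligned with INPUTS."""
    return list(product((0, 1), repeat=4))

def category(table):
    ones = sum(table)
    if ones in (0, 4):
        return "constant"       # same output for all inputs
    if ones == 2:
        return "balanced"       # outputs 0 for exactly two inputs
    return "unbalanced"         # same output for exactly three inputs

counts = Counter(category(table) for table in all_truth_tables())
```

The counts follow from binomial coefficients: $\binom{4}{0} + \binom{4}{4} = 2$, $\binom{4}{2} = 6$, and $\binom{4}{1} + \binom{4}{3} = 8$.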
The 4-qubit circuit shown in Fig. 3 corresponds to a 3-neuron feedforward network. Two quantum inputs ($Q_{I1}$ and $Q_{I2}$) are initialized in a maximal superposition state. Next, the RUS-based conditional gearbox circuit (now with three input angles $w_1$, $w_2$, and $b$) is applied, ideally rotating $Q_O$ by $g(\theta_{kl})$ for input state $|k\rangle_{I1}|l\rangle_{I2}$, where $\theta_{kl} = k w_1 + l w_2 + b$. The training set is then prepared by an oracle that encodes the output of the chosen Boolean function on $Q_A$ (Fig. S3). Finally, $Q_A$ and $Q_O$ are compared by mapping their parity onto $Q_A$ and performing a final measurement of $Q_A$ in the computational basis; the cost $C$ is estimated from the outcomes $m_A$.
Training a QNN from superpositions of data. To train the QNN, we employ an adaptive learning algorithm [43] to minimize $C$ over the full 3-D parameter space. Figure 6 shows the training process for NAND, chosen for the complexity of its feature space. The parameters evolve with each training step: starting from a randomly chosen initial point, they first explore the bounds of the parameter space and then converge to the global minimum in ∼50 training steps. This satisfactory behavior is observed for all the Boolean functions.
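To make the cost concrete, the sketch below evaluates an ideal version of $C$ under stated assumptions: $Q_O$ starts in $|0\rangle$, RUS runs until success, and $C$ is the probability that the parity check finds $Q_O$ disagreeing with the oracle output, averaged over the four equally weighted inputs. Because $Q_{I1}$ and $Q_{I2}$ stay in the computational basis, the input branches do not interfere, and summing over all RUS histories restores each input's weight of 1/4. The function names are hypothetical, and this is not the adaptive algorithm of [43].

```python
import numpy as np

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # (k, l) = (I1, I2), assumed ordering

def p_output_one(theta):
    """Ideal P(Q_O = 1) after success from |0>: sin^2(g/2) = sin^4(theta/2) / p_S(theta)."""
    s4, c4 = np.sin(theta / 2) ** 4, np.cos(theta / 2) ** 4
    return s4 / (s4 + c4)

def ideal_cost(truth_table, w1, w2, b):
    """Assumed cost C: mismatch probability between Q_O and f(k, l), averaged over inputs."""
    cost = 0.0
    for (k, l), f_kl in zip(INPUTS, truth_table):
        theta_kl = k * w1 + l * w2 + b
        p1 = p_output_one(theta_kl)
        cost += p1 if f_kl == 0 else 1.0 - p1   # odd parity <=> disagreement
    return cost / len(INPUTS)

XOR = (0, 1, 1, 0)
AND = (0, 0, 0, 1)
```

For example, $(w_1, w_2, b) = (\pi, \pi, 0)$ maps the inputs to $\theta_{kl} \in \{0, \pi, \pi, 2\pi\}$, so XOR is reproduced with $C \approx 0$, while its complement gives $C \approx 1$.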
Following training of the QNN for each Boolean function, we investigate the specificity of the learned parameters by preparing all 256 pairs of trained parameters and function oracles and measuring $C$ for each pair. To understand the structure of the experimental specificity matrix (Fig. 7), it is worthwhile to first consider the case of an ideal processor (see [38]). Along the diagonal, we expect $C = 0$ for constant and balanced functions, which can be perfectly learned, and $C \approx 0.029$ for unbalanced functions, which cannot be perfectly learned because of the finite width of the activation function $g(\theta)$.
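Where the residual value for unbalanced functions comes from can be seen by evaluating AND at one illustrative parameter choice, $(w_1, w_2, b) = (\pi/2, \pi/2, -\pi/4)$. This assumes that $C$ is the mismatch probability between $Q_O$ and the function output averaged over the four inputs, and that $P(Q_O = 1 \mid \theta) = \sin^2(g(\theta)/2)$ for $Q_O$ starting in $|0\rangle$. All four inputs end up with the same mismatch probability:

```latex
\begin{aligned}
P(Q_O = 1 \mid \theta) &= \sin^2\!\big(\tfrac{g(\theta)}{2}\big)
  = \frac{\tan^4(\theta/2)}{1+\tan^4(\theta/2)},\\
(w_1, w_2, b) = \big(\tfrac{\pi}{2}, \tfrac{\pi}{2}, -\tfrac{\pi}{4}\big)
  &\;\Rightarrow\; (\theta_{00}, \theta_{01}, \theta_{10}, \theta_{11})
  = \big(-\tfrac{\pi}{4}, \tfrac{\pi}{4}, \tfrac{\pi}{4}, \tfrac{3\pi}{4}\big),\\
C_{\mathrm{AND}} &= \frac{1}{4}\left[\,3\,\frac{\tan^4(\pi/8)}{1+\tan^4(\pi/8)}
  + \frac{1}{1+\tan^4(3\pi/8)}\right]
  = \frac{\tan^4(\pi/8)}{1+\tan^4(\pi/8)}\\
&= \frac{17-12\sqrt{2}}{18-12\sqrt{2}} \approx 0.0286,
\end{aligned}
```

using $\tan(3\pi/8) = 1/\tan(\pi/8)$ and $\tan(\pi/8) = \sqrt{2}-1$, so that $\tan^4(\pi/8) = 17 - 12\sqrt{2}$. This reproduces $C \approx 0.029$; the point is illustrative rather than a proof of optimality.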
For off-diagonal terms, we expect $C$ at or close to multiples of 0.25, the multiple being set by the number of 2-bit inputs for which the paired training function and oracle function have different 1-bit outputs. For example, NAND and XOR have different outputs only for input 00, while TRANSFER1 and NOT1, which are complementary functions, have different outputs for all inputs. Note that every constant or balanced function, when compared to any unbalanced function, has different outputs for an odd number (one or three) of inputs. While this pattern is discernible in the experimental specificity matrix, deviations result from the compounding of decoherence, gate-calibration, crosstalk, and measurement errors. These errors affect the 256 pairs differently for two main reasons. First, the average circuit depth of the RUS-based conditional gearbox circuit is higher for unbalanced functions. Second, the fixed circuit depth of the oracles is also significantly higher for unbalanced functions, as these all require a Toffoli-like gate, which we realize using CZ and single-qubit gates. A noisy simulation [38] modeling the main known sources of error in our processor closely matches Fig. 7. Despite the evident imperfections, we have shown that it is possible to train the network across all functions, arriving at parameters that individually optimize each landscape. The circuit is thus able to learn different functions using multiple copies of a single training state corresponding to the superposition of all inputs, despite the complexity of the feature-space landscapes of various Boolean functions.
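In the ideal limit (ignoring the residual $C \approx 0.029$ of unbalanced functions), each specificity-matrix entry reduces to the fraction of inputs on which the training and testing functions disagree. The sketch below builds that 16×16 matrix from truth tables. Constant and balanced functions have an even number of 1s and unbalanced functions an odd number, so any mixed pair disagrees on an odd number of inputs. The ordering of functions is an arbitrary choice here.

```python
from itertools import product

TABLES = list(product((0, 1), repeat=4))    # outputs for inputs 00, 01, 10, 11

def mismatch_fraction(train, test):
    """Ideal-limit specificity entry: fraction of inputs on which two functions disagree."""
    return sum(a != b for a, b in zip(train, test)) / 4

def is_unbalanced(table):
    return sum(table) % 2 == 1              # one or three outputs equal to 1

# Rows: testing function; columns: training function.
SPECIFICITY = [[mismatch_fraction(train, test) for train in TABLES] for test in TABLES]
```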

III. DISCUSSION
We have seen that RUS is an effective strategy to address the probabilistic nature of the conditional gearbox circuit, allowing the deterministic synthesis of non-linear rotations. Even at the error rates of current superconducting quantum processors, it enabled the implementation of a QNN that reproduces a variety of classical neural-network mechanisms while preserving quantum coherence and entanglement. Moreover, we have shown that this QNN architecture can be trained to learn all 2-to-1-bit Boolean functions using superpositions of training data.
This minimal QNN is a fundamental building block for larger QNNs. With more qubits, these neurons could form multi-layer feedforward networks containing hidden layers between inputs and outputs. Beyond feedforward networks, this minimal QNN is amenable to the implementation of various other network architectures, from Hopfield networks to quantum autoencoders [33].
Finally, this work highlights the importance of real-time feedback control performed within the qubit coherence time and of the quantum-classical interactions governing RUS algorithms. The ability to implement RUS circuits is in itself a useful result, as the active-feedback architecture demonstrated here is crucial for various other applications of a quantum computer, including active-reset protocols and the synthesis of circuits of shorter depth than purely unitary designs [31], which is of value in areas such as quantum chemistry. Moreover, recent work on quantum error correction (QEC) highlights the importance of real-time quantum control in protocols for magic-state distillation or, when coupled to a real-time decoder, for the correction of errors. As with real-time feedback, building a real-time decoder that meets the stringent requirements of QEC with superconducting qubits requires application-specific hardware developments, which are the focus of ongoing work.
This supplement provides additional information in support of statements and claims made in the main text.

I. DEVICE CHARACTERISTICS
The device used has been introduced and described in previously published experiments [S1-S3]. Selected metrics for the four transmon qubits used in this work are provided in Table S1. Figure S1 highlights the circuit QED elements enabling coherent control and measurement. Each qubit has a dedicated flux-control line, microwave drive line, and readout resonator with a dedicated Purcell filter. The four qubits employed in this experiment are read out through a single common feedline. We note that $Q_A$ is driven through this feedline because of an issue with its dedicated microwave drive line. This leads to cross-resonance effects during single-qubit gates on $Q_A$. The extra amplification required to overcome the filtering effect of the readout and Purcell resonators also leads to non-linearity when driving $Q_A$ (Section VI).

II. 2-TO-1-BIT BOOLEAN FUNCTIONS
The definitions and nomenclature used for the 16 2-to-1-bit Boolean functions are presented in Table S2. The corresponding quantum oracles needed to prepare the training datasets are presented in Fig. S3. These circuits are compiled into the native gate set of the processor, with simplifications wherever possible. For example, we substitute all CC-NOT gates with CC-iX gates (Fig. S6), as these can be implemented with lower circuit depth. This is possible because $Q_{I1}$ and $Q_{I2}$ are not reused after training-set preparation in the QNN circuit (Fig. 3); therefore, the difference between CC-NOT and CC-iX gates is not relevant in this context.
Table S1: Summary of selected parameters and performance metrics of the four transmon qubits used in the experiment. Coherence times are obtained using standard time-domain measurements [S4]. The multiplexed readout fidelity, $F_{RO}$, is the average assignment fidelity extracted from single-shot readout histograms [S5]. The single-qubit gate fidelity, $F_{1Q}$, is extracted from individual single-qubit randomized benchmarking. The two-qubit gate fidelity, $F_{2Q}$, is obtained through interleaved randomized benchmarking with modifications to quantify leakage, $L_1$ [S6, S7]. With the exception of frequency values, the quantities listed are subject to drift. For example, relaxation and dephasing times typically vary by several µs, and readout fidelity and residual excitation vary by a few percent.

III. ERROR MITIGATION STRATEGIES
A. Characterization and optimization of CZ gates
As observed in previous work using this quantum processor [S1-S3], the residual ZZ coupling between qubit pairs constitutes a significant source of error. Spectator qubits coupled to either of the qubits involved in a CZ gate lead to increased leakage and phase errors when the spectators are not in $|0\rangle$. To assess the phase impact of spectators, we fit the action of each CZ gate (between pairs $Q_A$-$Q_{I1}$, $Q_A$-$Q_{I2}$, and $Q_A$-$Q_O$) to the model
$$U_{\mathrm{CZ}} = \exp\Big(-i\sum_{S}\phi_S \prod_{q\in S} Z_q\Big), \qquad (\mathrm{S1})$$
where the sum runs over all 15 non-empty subsets $S$ of $\{Q_A, Q_O, Q_{I1}, Q_{I2}\}$. This model includes all single-, two-, three-, and four-qubit phase terms. To extract the 15 terms, we first measure the quantum phase imparted on each qubit for each of the 8 computational states of the other three qubits and then perform a least-squares fit to the model. Results are shown in Fig. S4. We observe single-qubit and two-qubit phase errors for CZ($Q_A$, $Q_{I1}$) and CZ($Q_A$, $Q_{I2}$), particularly on the terms $Z_O$, $Z_A$, and $Z_O Z_A$. These are consistent with measurements of residual ZZ couplings between all qubit pairs in this quantum processor (Fig. S2), which show the strongest coupling between $Q_A$ and $Q_O$.
Table S2: 2-to-1-bit Boolean functions. Naming, definition, truth table, and characteristic of the 16 2-to-1-bit Boolean functions. The functions have bit inputs $I_1$ and $I_2$. Symbols ¬, ⊕, ∧, and ∨ denote NOT, XOR (exclusive or), AND, and OR operations, respectively. Constant functions have the same output for all inputs. Balanced functions output 0 for exactly two inputs. Unbalanced functions have the same output for exactly three inputs.
Figure S3: Training-set preparation circuits for 2-to-1-bit Boolean functions. Three-qubit circuits (inputs $Q_{I1}$ and $Q_{I2}$, and output $Q_A$) implementing oracles for the preparation of training sets of each 2-to-1-bit Boolean function. The circuits here are not written in the native gate set. When compiling them into the native gate set, we perform additional simplifications and add dynamical decoupling pulses for error mitigation.
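The least-squares extraction of the 15 phase terms can be sketched as follows. It assumes that Eq. (S1) has the diagonal form $U = \exp(-i\sum_S \phi_S \prod_{q \in S} Z_q)$ over the 15 non-empty qubit subsets, and that the phase measured on qubit $j$ is the phase of its $|1\rangle$ component minus that of its $|0\rangle$ component, namely $2\sum_{S \ni j}\phi_S \prod_{q \in S\setminus j} (\pm 1)$. Four qubits times 8 spectator states give 32 equations for 15 unknowns. The qubit ordering and function names are illustrative, and phase wrapping is ignored.

```python
import numpy as np
from itertools import combinations, product

N_QUBITS = 4   # illustrative order: 0 = Q_A, 1 = Q_O, 2 = Q_I1, 3 = Q_I2
TERMS = [s for r in range(1, N_QUBITS + 1) for s in combinations(range(N_QUBITS), r)]  # 15 subsets

def z_eigenvalue(subset, bits):
    """Eigenvalue (+1 or -1) of the product of Z_q over `subset` on computational state `bits`."""
    return int(np.prod([1 - 2 * bits[q] for q in subset]))

def design_matrix():
    """Rows: (qubit j, 8 spectator states); columns: phi_S for the 15 terms."""
    rows = []
    for j in range(N_QUBITS):
        others = [q for q in range(N_QUBITS) if q != j]
        for spectator_bits in product((0, 1), repeat=N_QUBITS - 1):
            bits = [0] * N_QUBITS
            for q, v in zip(others, spectator_bits):
                bits[q] = v
            # For U = exp(-i sum_S phi_S Z_S), the phase of |1>_j minus that of |0>_j
            # is 2 * sum over S containing j of phi_S * eigenvalue(Z_{S without j}).
            rows.append([2 * z_eigenvalue([q for q in S if q != j], bits) if j in S else 0
                         for S in TERMS])
    return np.array(rows, dtype=float)

def fit_phase_terms(conditional_phases):
    """Least-squares estimate of the 15 phi_S from 32 measured conditional phases."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(), conditional_phases, rcond=None)
    return dict(zip(TERMS, coeffs))
```

The columns of this design matrix are mutually orthogonal, so the fit is well conditioned and each $\phi_S$ is effectively averaged over the $8|S|$ measurements in which it appears.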
To mitigate these phase errors, two $R_x^{\pi}$ gates are performed on $Q_O$ during the $Q_I$-$Q_A$ CZ gates (Fig. 1b). This symmetrizes the population of the spectator qubits during the CZ gates, while the added gates compile to identity, leaving the overall effect of the circuit unchanged. The addition notably reduces phase errors (Fig. S4b-c). Characterization of CZ($Q_A$, $Q_O$) (Fig. S4a) showed accurate performance without similar error mitigation, which is likely due to low residual couplings of both … We note that these phase errors vary in time and were captured here after CZ-gate calibration. The simulation efforts described below extracted different values for these angles, hinting at drift between calibration and data collection.
Figure S5: Characterization of microwave control on $Q_A$ through the effective rotation angle implemented on the qubit. Small inaccuracies stemming from non-linearity of the microwave-drive chain can be accounted for in this way to ensure fine control. These measurements are carried out while preparing all spectator qubits in $|0\rangle$.
To ensure proper calibration of the required arbitrary single-qubit rotations despite known non-linearities of the amplifiers and microwave-drive lines used to implement these gates, the Rabi oscillation of $Q_A$ is thoroughly characterized using quantum state tomography (Fig. S5). From this dataset, the effective rotation angle of $Q_A$ is computed and used to correct for these effects. Despite our best efforts, errors consistent with over-rotations on $Q_A$ are still observed, evident as the horizontal compression in Fig. 2.

C. Characterization and optimization of CC-iX gate circuit
The implementation of oracles for unbalanced Boolean functions requires three-qubit operations (Fig. S3). We use the CC-iX gate (Fig. S6) as a proxy for the CC-NOT (Toffoli) gate, as it can be implemented with lower depth. CC-iX and CC-NOT differ only by a two-qubit phase, which is of no relevance in the QNN circuit (Fig. 3).
Figure S7: Characterization of CC-iX gate decomposition. Full tomography of $Q_A$ after the circuit implementing the CC-iX gate, before (a-g) and after (h-n) an optimization meant to symmetrize the population of $Q_{I1}$ and $Q_{I2}$ during the circuit.
The effect of this circuit on $Q_A$ is characterized through tomography for various input states (Fig. S7a). In particular, we study the result of optimizing the circuit against residual ZZ effects by symmetrizing the population of $Q_{I1}$ and $Q_{I2}$ using $R_x^{\pi}$ gates (Fig. S7b). This optimization produced only minor improvements, most likely owing to the reduced residual ZZ couplings observed between $Q_A$, $Q_{I1}$, and $Q_{I2}$.
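The claim that CC-iX and CC-NOT differ only by a two-qubit phase can be checked numerically: CC-iX equals a controlled-$S$ (phase $i$ on $|11\rangle$) acting on the two controls, combined with CC-NOT. Because that extra phase is diagonal in the computational basis of the controls, all computational-basis probabilities, and in particular those of $Q_A$, are unchanged. A sketch in plain NumPy (the gate names are ours):

```python
import numpy as np

# Qubit order: Q_I1, Q_I2 (controls), Q_A (target); basis index = 4*i1 + 2*i2 + a.
X = np.array([[0, 1], [1, 0]], dtype=complex)

def doubly_controlled(u):
    """Apply the 2x2 matrix u to the target only when both controls are |1>."""
    gate = np.eye(8, dtype=complex)
    gate[6:, 6:] = u
    return gate

CCNOT = doubly_controlled(X)
CCIX = doubly_controlled(1j * X)
# Controlled-S on the two controls (phase i on |11>), identity on the target.
CS_ON_CONTROLS = np.kron(np.diag([1, 1, 1, 1j]), np.eye(2))
```

Since $Q_{I1}$ and $Q_{I2}$ are not acted on again after the oracle, this residual phase cannot influence the final measurement of $Q_A$.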

D. Characterization of RUS correction pulse
The use of the gearbox circuit with RUS is contingent on the ability to recover $Q_A$ and $Q_O$ in case of failure. This can be done with $R_x^{\pi}$ and $R_x^{\pi/2}$ rotations, respectively. However, inaccuracies in the CZ gates stemming from residual ZZ couplings make the optimal $Q_O$ correction pulse depend on $w_1$ and $w_2$. This is studied further by means of simulation using realistic parameters extracted from hardware (Fig. 2). Furthermore, spectator toggling effects during the idling time of $Q_O$ are expected to lead to a coherent rotation of the qubit, effectively changing the axis of the $\pi/2$ rotation required to bring the qubit back to $|0\rangle$. To characterize the optimal correction pulse, tomography is performed on $Q_O$ after running the conditional gearbox circuit through the first measurement (Fig. 1b) and post-selecting on $m_A = -1$. To maximize the probability of failure, and therefore the statistical significance of the results, this measurement is performed for $(w_1, w_2, b) = (\pi, -\pi, \pi/2)$. The results (Fig. S8) showed a coherent effective rotation $R$ …
… each based on 8000 repetitions. These data correspond to the diagonal of the specificity matrix (Fig. 7). The increased $C$ observed for unbalanced functions (right half) results from higher circuit depth in both the implementation of the function oracles and the RUS-based conditional gearbox circuit (note the higher $N_{\mathrm{RTS}}$ for these functions).
… $R_x^{w_1}$, $R_x^{w_2}$, and $R_x^{b}$ gates on $Q_A$. This circuit is deterministic and, contrary to the circuit presented in Fig. 1, does not require an ancilla qubit. Instead, the training-set preparation is effected directly on $Q_A$, after which $C$ is assessed through $m_A$.
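The choice $(w_1, w_2, b) = (\pi, -\pi, \pi/2)$ used for the correction-pulse characterization places every input at $\theta_{kl} \equiv \pm\pi/2 \pmod{2\pi}$, where the ideal success probability reaches its minimum of 0.5. A quick check under ideal-circuit assumptions:

```python
import numpy as np

def p_success(theta):
    return np.cos(theta / 2) ** 4 + np.sin(theta / 2) ** 4

w1, w2, b = np.pi, -np.pi, np.pi / 2
theta = {(k, l): k * w1 + l * w2 + b for k in (0, 1) for l in (0, 1)}
p_fail = {kl: 1 - p_success(t) for kl, t in theta.items()}
# theta = pi/2, -pi/2, 3pi/2, pi/2 for inputs 00, 01, 10, 11: all at the p_S minimum.
```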

V. COMPARING GEARBOX ZERO ITERATION, ONE ITERATION AND ORIGINAL ACTIVATION
To study the effectiveness of the correction and the practical usefulness of employing an RUS strategy with the gearbox circuit, we implement a variation of the gearbox circuit in which $Q_A$ always completes after the first iteration, with no correction in case of failure. Furthermore, following the observation that a simple Rabi oscillation has a non-linear $\langle Z\rangle(\theta) = \cos\theta$ (although $g(\theta) = \theta$ in the analogy to Fig. 1), we propose a 3-qubit version of the quantum neuron that uses this property as its activation function. Although this circuit follows a slightly modified construction (Fig. S11), it should still be able to implement a three-neuron feedforward network with parameters $(w_1, w_2, b)$ without needing an extra ancilla qubit. However, this should come at the cost of a softer activation function. Indeed, the difference between sinusoidal and sigmoid-like activation functions gives the original QNN circuit an advantage in ideal simulation. The effectiveness of the two new circuit variations in experiment (no correction and Rabi activation) is illustrated through their feature-space landscapes (Fig. S12) for XOR, IMPLICATION 2, and NAND, a set representative of the complexity of the feature spaces of all the functions considered. For functions whose minimum is expected at parameters for which ideally $N_{\mathrm{RTS}} = 1$ (XOR is the only such example here), all circuit variations appear to work equally well. However, for functions whose minimum is expected at parameters leading to a higher $N_{\mathrm{RTS}}$, the no-correction variation already shows several distortions, leading to minima that favor always outputting 1 regardless of the input, i.e., $w_1 = w_2 = 0$ and $b = \pi$ (Fig. S12e). This departure from the expected behavior of the network highlights its failure, in this configuration, to properly weigh and learn the output for all inputs equally.
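The softer Rabi activation can be quantified in an ideal-model sketch. The Rabi neuron gives $P(1) = \sin^2(\theta/2)$ and the gearbox neuron gives $\sin^4(\theta/2)/p_S(\theta)$. A coarse grid search over $(w_1, w_2, b)$ then shows that both reach $C \approx 0$ for XOR, whereas for AND the gearbox neuron reaches $C \approx 0.029$ and the Rabi neuron only about 0.15. The sketch assumes that $C$ is the mismatch probability averaged over the four inputs and that the output qubit starts in $|0\rangle$. It ignores noise and the no-correction variant, and uses a 5° grid chosen for convenience.

```python
import numpy as np

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def p_one_gearbox(theta):
    """Ideal gearbox neuron: P(1) = sin^4(theta/2) / (sin^4(theta/2) + cos^4(theta/2))."""
    s4, c4 = np.sin(theta / 2) ** 4, np.cos(theta / 2) ** 4
    return s4 / (s4 + c4)

def p_one_rabi(theta):
    """Rabi neuron: plain rotation R_x(theta)|0>, so P(1) = sin^2(theta/2)."""
    return np.sin(theta / 2) ** 2

def min_ideal_cost(truth_table, p_one, n=73):
    """Grid search of the assumed cost over (w1, w2, b) in [-pi, pi]^3 (5-degree steps)."""
    grid = np.linspace(-np.pi, np.pi, n)
    w1, w2, b = np.meshgrid(grid, grid, grid, indexing="ij")
    cost = np.zeros_like(w1)
    for (k, l), f_kl in zip(INPUTS, truth_table):
        p1 = p_one(k * w1 + l * w2 + b)
        cost += p1 if f_kl == 0 else 1.0 - p1
    return cost.min() / len(INPUTS)

XOR, AND = (0, 1, 1, 0), (0, 0, 0, 1)
results = {(name, f.__name__): min_ideal_cost(table, f)
           for name, table in (("XOR", XOR), ("AND", AND))
           for f in (p_one_gearbox, p_one_rabi)}
```

For the Rabi neuron the AND cost is bounded below by $(2-\sqrt{2})/4 \approx 0.146$, since the four rotation angles obey the same constraint $\theta_{11} = \theta_{10} + \theta_{01} - \theta_{00}$ that underlies the Tsirelson bound.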
Having performed training with the same procedure for all three circuit variations, we compile the minimum cost-function values achieved and the learned parameters in Table S3 for a quantitative comparison. In all instances, $C$ is higher for the original activation circuit without correction than for the same circuit making full use of RUS, demonstrating the usefulness of this strategy already in this limited …
… where $f_{\mathrm{NL}}$ is a non-linearity factor. This form captures the dominant third-order non-linearity. The smaller the value of $f_{\mathrm{NL}}$, the stronger the non-linearity. We set $f_{\mathrm{NL}}$ from a fit to Fig. 2.
7. Coherent phase errors during CZ gates. We simulate the phase action of each CZ gate as a four-qubit operation according to Eq. (S1), but truncate terms with negligible phase. In practice, the dominant phase errors are on the terms $Z_A$, $Z_O$, and $Z_A Z_O$. We obtain these errors for CZ($Q_A$, $Q_{I1}$) and CZ($Q_A$, $Q_O$) from a fit to Fig. 2. We do not include errors on CZ($Q_A$, $Q_{I2}$).
8. Increased dephasing of flux-pulsed qubits during CZ gates. The higher-frequency transmon in each pair is pulsed away from its sweetspot. In addition, for CZ($Q_A$, $Q_{I1}$) (respectively CZ($Q_A$, $Q_{I2}$)), the spectator qubit $Q_{I2}$ (respectively $Q_{I1}$) must also be pulsed away (an action sometimes referred to as "parking"). All such pulsing suppresses $T_2$. For simplicity, we take the $T_2$ suppression to be the same for all pulsed qubits, setting its value from a fit to Fig. 2.
9. Measurement-induced phase shift. Mid-circuit measurements of …
The stochastic nature of the RUS-backed conditional gearbox circuit is taken into account in simulation as follows. The measurement of $Q_A$ as part of the neuron update collapses the state to one of two density matrices, $\rho_0$ and $\rho_1$, corresponding to the ancilla collapsing to $|0\rangle$ or $|1\rangle$, respectively. Since the simulator maintains a complete representation of the quantum state at each point of the circuit, we have full access to the two (un-normalized) density matrices. We apply the measurement-induced phase to $Q_O$. Then we apply misclassification of the measurement outcome with probability $p$, leading to the density matrices
$$\rho_{\mathrm{succ}} = (1-p)\,\rho_0 + p\,\rho_1, \qquad \rho_{\mathrm{fail}} = (1-p)\,\rho_1 + p\,\rho_0,$$
corresponding to declared success and failure, respectively. At this point we apply the remaining circuit (parity-check comparison and training-set preparation) to $\rho_{\mathrm{succ}}$, and the correction sub-circuit followed by a repetition of the neuron-update step to $\rho_{\mathrm{fail}}$. The simulation results are obtained as the incoherent sum of the $\rho_{\mathrm{succ}}$ at each attempt. Note that $\rho_{\mathrm{succ}}$ and $\rho_{\mathrm{fail}}$ are not normalized; their trace represents the probability of the corresponding history of failures and success. Figure S13 helps visualize the method.
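The branching bookkeeping described above can be sketched with density matrices for the two-qubit core ($Q_A$, $Q_O$) of the neuron. This is a simplified sketch, not our simulator. It assumes the textbook gearbox construction, ideal gates, a readout misclassification probability `p_misclass`, and a cap on the number of attempts. It omits the measurement-induced phase and the rest of the circuit (parity check and training-set preparation).

```python
import numpy as np

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rx(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
M0 = np.kron(np.diag([1.0, 0.0]), I2)   # projector onto |0>_A (order: Q_A, Q_O)
M1 = np.kron(np.diag([0.0, 1.0]), I2)   # projector onto |1>_A

def gearbox(theta):
    """Assumed textbook gearbox unitary on (Q_A, Q_O)."""
    controlled = M0 + np.kron(np.diag([0.0, 1.0]), -1j * X)
    return np.kron(ry(-theta), I2) @ controlled @ np.kron(ry(theta), I2)

def rus_success_state(theta, p_misclass=0.0, max_attempts=40):
    """Incoherent sum of declared-success (un-normalized) density matrices over attempts."""
    U = gearbox(theta)
    correction = np.kron(rx(np.pi), rx(np.pi / 2))    # R_x^pi on Q_A, R_x^{pi/2} on Q_O
    rho = np.zeros((4, 4), dtype=complex)
    rho[0, 0] = 1.0                                   # |0>_A |0>_O
    total = np.zeros_like(rho)
    for _ in range(max_attempts):
        rho = U @ rho @ U.conj().T
        rho0, rho1 = M0 @ rho @ M0, M1 @ rho @ M1      # ancilla collapsed to |0> or |1>
        rho_succ = (1 - p_misclass) * rho0 + p_misclass * rho1
        rho_fail = (1 - p_misclass) * rho1 + p_misclass * rho0
        total += rho_succ                             # declared success ends this history
        rho = correction @ rho_fail @ correction.conj().T   # declared failure: correct, retry
    return total

def reduced_output(rho):
    """Partial trace over Q_A, leaving the 2x2 state of Q_O."""
    return np.einsum("aiaj->ij", rho.reshape(2, 2, 2, 2))
```

With `p_misclass=0` the accumulated trace approaches 1 and $Q_O$ ends in $R_x^{g(\theta)}|0\rangle$; a nonzero misclassification probability mixes failure-branch states into the declared successes and lowers the fidelity.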

VII. SIMULATION RESULTS
In this section we provide simulation results for an ideal processor and a noisy one, and compare them to experimental data. We show the feature landscapes for NAND in Fig. S14. The noisy simulation qualitatively matches the experiment, with similar distortion and reduced contrast of the feature-space landscape relative to the ideal simulation. Quantitative discrepancies between noisy simulation and experiment likely result from additional errors not included in the simulation, most notably transmon leakage during CZ gates. Note that the minimal $C$, indicated by the black dot in the left panels, is achieved for different values of $(w_1, w_2, b)$. For an ideal processor, there are multiple $(w_1, w_2, b)$ … in noisy simulation and experiment.
Finally, we compare the specificity matrices obtained in simulation and experiment (Fig. S15). The horizontal axis corresponds to the Boolean function used in training, while the vertical axis corresponds to the Boolean function used to test the network. We use ideal simulation to gain familiarity with the expected structure of the specificity matrix. For the optimal parameters $(w_1, w_2, b)$, we use any one of the several choices that minimize $C$. The lowest values of $C$ are found along the diagonal. This is expected, as learning and testing functions match in this case. Constant and balanced functions can be perfectly learned, and thus $C = 0$ for these. Unbalanced functions cannot be perfectly learned, and for these we find $C \approx 0.029$. Half of the next-to-diagonal elements have $C$ equal to or close to unity, since the testing function is the complement of the learned function (the specific function whose output differs for all four inputs). For all entries, $C$ is equal to or close to a multiple of 0.25, the multiple corresponding to the number of input states for which the outputs of the learned and testing functions differ. For any two constant or balanced functions, the outputs differ for 0, 2, or 4 inputs. The same holds for any two unbalanced functions. This explains the structure of the lower-left and top-right quadrants. Outputs for any constant or balanced function differ from those of any unbalanced function for either 1 or 3 inputs. This explains the structure of the top-left and bottom-right quadrants.
For noisy simulation we use the optimal parameters obtained in simulated training.We observe a qualitative similarity between experiment and noisy simulation.
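The quadrant structure of the specificity matrix follows from counting alone. The sketch below enumerates all 16 truth tables and collects the idealized C values (the fraction of disagreeing inputs) in each quadrant. It assumes every learned function is reproduced exactly, so it ignores the residual C ≈ 0.029 of unbalanced functions; the helper names are illustrative only.

```python
from itertools import product

# Each 2-to-1-bit Boolean function is identified by its truth table:
# the outputs for inputs (x1, x2) = 00, 01, 10, 11, in that order.
TABLES = list(product((0, 1), repeat=4))  # all 16 functions


def is_unbalanced(table):
    # Unbalanced functions output 1 for exactly one or three of the four inputs;
    # constant (0 or 4 ones) and balanced (2 ones) functions have an even count.
    return sum(table) % 2 == 1


def ideal_cost(learned, tested):
    # Idealized C: fraction of inputs on which the learned function and the
    # testing (oracle) function disagree, assuming perfect learning.
    return sum(a != b for a, b in zip(learned, tested)) / 4


def quadrant_values(train_unbalanced, test_unbalanced):
    # Distinct idealized C values in one quadrant of the specificity matrix.
    return sorted({
        ideal_cost(learned, tested)
        for learned in TABLES if is_unbalanced(learned) == train_unbalanced
        for tested in TABLES if is_unbalanced(tested) == test_unbalanced
    })


for train_u in (False, True):
    for test_u in (False, True):
        print(f"train unbalanced={train_u}, test unbalanced={test_u}: "
              f"{quadrant_values(train_u, test_u)}")
```

Quadrants pairing two functions of the same class contain only 0, 0.5 and 1, while mixed quadrants contain only 0.25 and 0.75: the number of disagreeing inputs always has the same parity as the difference in the number of ones between the two truth tables.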

Figure 1: Conditional gearbox circuit using repeat-until-success (RUS). (a) Three-qubit circuit with input parameters (w, b) ideally implementing $R_x^{g(w+b)}$ on $Q_O$ for $Q_I = |1\rangle$ and $R_x^{g(b)}$ for $Q_I = |0\rangle$, heralded by $m_A = +1$ (success). For $m_A = -1$ (failure), the circuit ideally implements $R_x^{-\pi/2}$ on $Q_O$. The probabilistic nature of the circuit is rectified using RUS: in case of failure, $Q_A$ and $Q_O$ are first reset (with $R_x^{\pi}$ and $R_x^{\pi/2}$, respectively) and the circuit is re-run. (b) Compilation into the native gate set after circuit optimization, with added error mitigation (two refocusing pulses on $Q_O$ during the $Q_I$-$Q_A$ CZ gates). (c) Illustration of the ideal action of the conditional gearbox circuit on $Q_O$ when starting in $|0\rangle_O$. (d) Comparison of the ideal $g(\theta)$ to a Rabi oscillation of $Q_O$, showing the non-linearity of $g$.
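As a numerical check on the success and failure branches described in this caption, here is a pure-Python state-vector sketch of one gearbox attempt in the textbook RUS quantum-neuron form (ancilla rotated by $R_x(2\theta)$, a controlled $-iX$ onto the output, then $R_x(-2\theta)$ on the ancilla). This is not the compiled circuit of Fig. 1b; identifying $\theta$ with $w + b$ (for $Q_I = |1\rangle$) or $b$ (for $Q_I = |0\rangle$), and the factor of 2, are assumptions about the convention. In this form, success applies $R_x(2q)$ with $q = \arctan(\tan^2\theta)$, and failure applies $R_x(-\pi/2)$ regardless of $\theta$.

```python
import math


def rx(angle):
    """R_x(angle) = exp(-i * angle * X / 2) as a 2x2 nested list."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -1j * s], [-1j * s, c]]


def apply_1q(u, v):
    """Apply a 2x2 matrix to a single-qubit state vector."""
    return [u[0][0] * v[0] + u[0][1] * v[1], u[1][0] * v[0] + u[1][1] * v[1]]


def apply_to_ancilla(state, u):
    """Apply a 2x2 gate to the ancilla; the state index is 2 * ancilla + output."""
    out = [0j] * 4
    for a in (0, 1):
        for o in (0, 1):
            for a_in in (0, 1):
                out[2 * a + o] += u[a][a_in] * state[2 * a_in + o]
    return out


def controlled_minus_i_x(state):
    """Apply -iX to the output qubit when the ancilla is |1>."""
    return [state[0], state[1], -1j * state[3], -1j * state[2]]


def gearbox_attempt(theta, psi):
    """One ideal gearbox attempt on output state psi (ancilla starts in |0>).

    Returns (p_success, output_if_success, output_if_failure); a branch with
    zero probability is returned as None."""
    state = [psi[0], psi[1], 0j, 0j]
    state = apply_to_ancilla(state, rx(2 * theta))
    state = controlled_minus_i_x(state)
    state = apply_to_ancilla(state, rx(-2 * theta))
    branches = []
    for a in (0, 1):  # a = 0 <-> m_A = +1 (success); a = 1 <-> failure
        amps = state[2 * a: 2 * a + 2]
        p = sum(abs(z) ** 2 for z in amps)
        branches.append((p, [z / math.sqrt(p) for z in amps] if p > 1e-15 else None))
    return branches[0][0], branches[0][1], branches[1][1]


def fidelity(u, v):
    """Overlap |<u|v>|^2, insensitive to global phase."""
    return abs(sum(x.conjugate() * y for x, y in zip(u, v))) ** 2


theta = 0.6          # e.g. theta = b + w * x_I for a classical input bit x_I
psi = [1 + 0j, 0j]   # Q_O starts in |0>
p, ok, fail = gearbox_attempt(theta, psi)
q = math.atan(math.tan(theta) ** 2)
print(p, math.cos(theta) ** 4 + math.sin(theta) ** 4)
print(fidelity(ok, apply_1q(rx(2 * q), psi)))            # success branch vs R_x(2q)
print(fidelity(fail, apply_1q(rx(-math.pi / 2), psi)))   # failure branch vs R_x(-pi/2)
```

Because the failure branch is a fixed rotation, a single $R_x^{\pi/2}$ on $Q_O$ undoes it, which is what allows the circuit to be re-run until it succeeds.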

Figure 2: Synthesis of non-linear functions using a conditional gearbox circuit. (a) Probability of success and failure at the first iteration of the conditional gearbox circuit (Fig. 1) as a function of w (b = 0). (b,c) Pauli components of $Q_O$ assessed by quantum state tomography conditioned on (b) success and (c) failure, for $Q_I$ prepared in $|1\rangle$. (d) Purity of $Q_O$ for success and failure. All panels include experimental results (symbols), ideal simulation (dashed curves), and noisy simulation (solid curves).
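Under the textbook RUS parametrization (ancilla rotated by $2\theta$, an assumption about the convention), the first-attempt success probability is $\cos^4\theta + \sin^4\theta = 1 - \tfrac{1}{2}\sin^2 2\theta$, which never drops below 1/2. If every failure is perfectly corrected, attempts are independent, the number of attempts is geometrically distributed, and its mean is $1/p \le 2$. The Monte Carlo sketch below checks this; real hardware, with imperfect correction and readout errors, would deviate from it.

```python
import math
import random


def p_success(theta):
    # First-attempt success probability of the ideal gearbox circuit in the
    # 2*theta parametrization (an assumption): cos^4 + sin^4 = 1 - sin^2(2 theta) / 2.
    return math.cos(theta) ** 4 + math.sin(theta) ** 4


def attempts_until_success(p, rng):
    # Assumes ideal correction after each failure, so every attempt starts
    # from the same state and succeeds independently with probability p.
    attempts = 1
    while rng.random() >= p:
        attempts += 1
    return attempts


rng = random.Random(1234)
for theta in (0.0, math.pi / 8, math.pi / 4):
    p = p_success(theta)
    runs = [attempts_until_success(p, rng) for _ in range(50_000)]
    mean = sum(runs) / len(runs)
    print(f"theta={theta:.3f}  p={p:.3f}  mean attempts={mean:.3f}  1/p={1 / p:.3f}")
```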

Figure 4: Quantum control setup. (a) Schematic of the wiring and control electronics, highlighting the critical feedback path between the quantum-processor outputs, the analog-interface devices, the controller, and the flux-drive lines. (b) Timing diagram for the critical feedback path. Latency includes the processing times necessary for synchronicity; hashed regions indicate idling operations for each instrument.

Figure 5: Feature-space landscapes of three Boolean functions. 2-D slices of C and $N_{\mathrm{RTS}}$ for XOR (a, b), IMPLICATION 2 (c, d), and NAND (e, f). For each function, the slice includes the $(w_1, w_2, b)$ parameters that minimize C for an ideal quantum processor. Black dots indicate the experimental parameters achieving minimal C within each slice.
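To make the notion of a feature-space landscape concrete, the sketch below evaluates an idealized cost on a coarse $(w_1, w_2, b)$ grid. It assumes (i) the ideal heralded gearbox activation in the $2\theta$ parametrization, so that $P(Q_O = 1) = \sin^4\theta / (\sin^4\theta + \cos^4\theta)$ with $\theta = b + w_1 x_1 + w_2 x_2$, and (ii) that C is the input-averaged probability that $Q_O$ disagrees with $f(x_1, x_2)$. Neither assumption is taken from the paper's definitions. Because this activation is periodic in $\theta$, XOR reaches C = 0 (for example at $w_1 = w_2 = \pi/2$, $b = 0$), whereas AND stays above zero: at $w_1 = w_2 = \pi/4$, $b = -\pi/8$ it gives C ≈ 0.0286, close to the C ≈ 0.029 quoted for unbalanced functions in Sec. VII.

```python
import math
from itertools import product


def p_one(theta):
    # P(Q_O = 1) after the ideal heralded rotation R_x(2 * arctan(tan(theta)^2))|0>,
    # written in closed form; the 2*theta parametrization is an assumption.
    s4, c4 = math.sin(theta) ** 4, math.cos(theta) ** 4
    return s4 / (s4 + c4)


def cost(truth_table, w1, w2, b):
    # Assumed C: average over the four inputs of the probability that Q_O
    # disagrees with f(x1, x2); truth_table = (f(00), f(01), f(10), f(11)).
    total = 0.0
    for (x1, x2), f in zip(product((0, 1), repeat=2), truth_table):
        p1 = p_one(b + w1 * x1 + w2 * x2)
        total += p1 if f == 0 else 1.0 - p1
    return total / 4


def grid_minimum(truth_table, n=32):
    # Brute-force minimum of C over an n^3 grid covering [-pi, pi)^3.
    grid = [-math.pi + 2 * math.pi * k / n for k in range(n)]
    return min((cost(truth_table, w1, w2, b), (w1, w2, b))
               for w1 in grid for w2 in grid for b in grid)


for name, table in (("XOR", (0, 1, 1, 0)), ("AND", (0, 0, 0, 1)), ("NAND", (1, 1, 1, 0))):
    c_min, (w1, w2, b) = grid_minimum(table)
    print(f"{name}: min C = {c_min:.4f} at w1={w1:.3f}, w2={w2:.3f}, b={b:.3f}")
```

A 2-D slice like those in the figure is obtained by fixing one parameter and evaluating `cost` over the other two.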

Figure 6: Learning the NAND function. (a) Training the QNN to learn NAND over the full parameter space $(w_1, w_2, b)$ by minimizing C with an adaptive algorithm. Training starts from a randomly chosen point, then explores the boundaries, and ultimately converges within ∼50 steps. (b-e) Evolution of the training parameters $(w_1, w_2, b)$ and of C as a function of training step. The best setting achieved so far is marked by a star.
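The caption does not specify the adaptive algorithm, so as a stand-in the sketch below uses a compass (pattern) search, a simple derivative-free method that probes each parameter in both directions and halves its step size whenever no probe improves C. The cost model (ideal gearbox activation in the $2\theta$ parametrization, C as the input-averaged mismatch probability) and the starting point are assumptions for illustration, not the training procedure used in the experiment.

```python
import math
from itertools import product


def p_one(theta):
    # Ideal gearbox activation, 2*theta parametrization (assumption).
    s4, c4 = math.sin(theta) ** 4, math.cos(theta) ** 4
    return s4 / (s4 + c4)


def cost(truth_table, params):
    # Assumed C: mean probability that Q_O disagrees with f over the 4 inputs.
    w1, w2, b = params
    total = 0.0
    for (x1, x2), f in zip(product((0, 1), repeat=2), truth_table):
        p1 = p_one(b + w1 * x1 + w2 * x2)
        total += p1 if f == 0 else 1.0 - p1
    return total / 4


def compass_search(func, start, step=0.5, min_step=1e-3, max_evals=600):
    """Adaptive, derivative-free search: try +/- step along each parameter,
    accept any improvement, and halve the step when no move helps."""
    best = list(start)
    best_val = func(best)
    history = [best_val]
    evals = 1
    while step > min_step and evals < max_evals:
        improved = False
        for i in range(len(best)):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[i] += sign * step
                val = func(trial)
                evals += 1
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if improved:
            history.append(best_val)
        else:
            step /= 2
    return best, best_val, history


NAND = (1, 1, 1, 0)
start = (0.3, -0.2, 1.0)  # hypothetical starting point
params, c_final, history = compass_search(lambda p: cost(NAND, p), start)
print(f"C: {history[0]:.4f} -> {c_final:.4f} after {len(history) - 1} improving sweeps")
print("w1, w2, b =", [round(x, 3) for x in params])
```

Compass search only accepts improvements, so the recorded C never increases. The idealized cost here is noise-free; a training loop on hardware would also have to cope with the statistical scatter of each C estimate.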

Figure 7: Specificity of the quantum neural network. Cost function of the optimized parameter set for every training function (horizontal axis) against all oracle functions (vertical axis). On each axis, the functions are ordered from constant, to balanced, to unbalanced, and each function is placed alongside its complement (NULL and IDENTITY, TRANSFER1 and NOT1, etc.). For an ideal processor, C values are expected at or close to multiples of 0.25, owing to the varying overlap between the 16 Boolean functions (i.e., the number of 2-bit inputs producing a different 1-bit outcome). Further differences arise in experiment from variation in the average circuit depth of the RUS-based activation functions and in the fixed circuit depth of the oracle functions.
.M. performed the experiment and data analysis. M.B., N.H. and L.D.C. designed the device. N.M., C.Z. and A.B. fabricated the device. M.S.M., J.F.M. and H.A. calibrated the device. M.S.M., W.V., J.S., J.S. and C.G.A. designed the control electronics. G.G.G. and L.D.C. performed the numerical simulations and motivated the project. M.S.M., G.G.G. and L.D.C. wrote the manuscript with input from A.Y.M., S.P.P. and X.Z. A.Y.M. and L.D.C. supervised the theory and experimental components of the project, respectively.

SUPPLEMENTAL MATERIAL FOR 'REALIZATION OF A QUANTUM NEURAL NETWORK USING REPEAT-UNTIL-SUCCESS CIRCUITS IN A SUPERCONDUCTING QUANTUM PROCESSOR'

Figure S1: Superconducting quantum processor with overlaid circuit topology. Optical image of the quantum processor with added false color to emphasize the different circuit QED elements. Qubit names are overlaid to indicate the four transmons used in this work. The green and red patches mark the transmons used in the 3-qubit conditional gearbox circuit (Figs. 1 and 2) and in the QNN (Figs. 3 to 7), respectively.

Figure S2: Residual ZZ coupling. Characterization of the residual ZZ coupling between all qubit pairs at the bias point (simultaneous flux sweet spot). The matrix elements indicate the frequency shift experienced by one qubit (target qubit) when another (spectator qubit) changes from $|0\rangle$ to $|1\rangle$. The procedure used for this measurement is similar to the one described in [S1].

Figure S4: Characterization of native two-qubit gates. Characterization of the phase action of the CZ gates between $Q_A$ and each of $Q_O$, $Q_{I1}$, and $Q_{I2}$, separated into one-, two-, three-, and four-qubit phase terms.
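One common way to split a diagonal phase pattern into one-, two-, three- and four-qubit terms is a Walsh-Hadamard transform over the computational basis. The sketch assumes that the phases $\varphi(s)$ acquired by every basis state $s$ are known and uses the convention $\varphi(s) = \sum_S a_S (-1)^{|s \wedge S|}$, where $a_S$ multiplies the product of Z operators on the qubit subset $S$ (encoded as a bitmask). The paper's exact convention and extraction procedure may differ. For an ideal CZ this gives $a_\emptyset = a_{12} = \pi/4$ and $a_1 = a_2 = -\pi/4$.

```python
import math


def popcount(x):
    return bin(x).count("1")


def phase_terms(phases):
    """Decompose a diagonal phase pattern into Z-string coefficients.

    phases[s] is the phase of basis state |s> (s an n-bit integer).  Returns
    {mask: a_mask} with phases[s] = sum_mask a_mask * (-1) ** popcount(s & mask)."""
    dim = len(phases)
    return {
        mask: sum(phases[s] * (-1) ** popcount(s & mask) for s in range(dim)) / dim
        for mask in range(dim)
    }


def group_by_weight(terms):
    """Group coefficients by how many qubits each Z-string acts on."""
    grouped = {}
    for mask, a in terms.items():
        grouped.setdefault(popcount(mask), {})[mask] = a
    return grouped


# Ideal two-qubit CZ: only |11> picks up a phase of pi.
cz = phase_terms([0.0, 0.0, 0.0, math.pi])
print(group_by_weight(cz))
```

With four qubits (16 phases), spectator-dependent errors of a CZ would appear as nonzero weight-3 and weight-4 entries.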

Figure S5: Characterization of arbitrary-angle rotations of $Q_A$. Characterization of the microwave control of $Q_A$ through the effective rotation angle implemented on the qubit. Small inaccuracies stemming from non-linearity of the microwave-drive chain can be accounted for in this way, ensuring fine control of the rotation angle. These measurements are carried out with all spectator qubits prepared in $|0\rangle$.
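A minimal version of this calibration inverts a measured amplitude-to-angle curve. The sketch uses a hypothetical, mildly compressive drive response (`toy_drive`) in place of measured data, tabulates it, and finds the amplitude for a target angle by linear interpolation; the functional form and table resolution are assumptions.

```python
import bisect
import math


def toy_drive(amplitude):
    # Hypothetical drive chain: nominally linear (pi per unit amplitude) with a
    # small compression at large amplitude.  Stands in for a measured curve.
    return math.pi * amplitude * (1 - 0.05 * amplitude ** 2)


def build_table(measure, amplitudes):
    # Pair each commanded amplitude with its measured effective rotation angle.
    return [(a, measure(a)) for a in amplitudes]


def amplitude_for_angle(table, target):
    # Invert a monotonic amplitude -> angle table by linear interpolation.
    angles = [angle for _, angle in table]
    if not angles[0] <= target <= angles[-1]:
        raise ValueError("target angle outside the calibrated range")
    i = max(1, bisect.bisect_left(angles, target))
    (a0, t0), (a1, t1) = table[i - 1], table[i]
    return a0 + (target - t0) * (a1 - a0) / (t1 - t0)


table = build_table(toy_drive, [k / 200 for k in range(201)])
for target in (math.pi / 4, math.pi / 2, 2.5):
    naive = target / math.pi  # assumes a perfectly linear drive chain
    calibrated = amplitude_for_angle(table, target)
    print(f"target {target:.4f}: naive -> {toy_drive(naive):.4f}, "
          f"calibrated -> {toy_drive(calibrated):.4f}")
```

With this toy response, the naive linear choice for $\pi/2$ (amplitude 0.5) undershoots by about 0.02 rad, while the interpolated amplitude lands well within $10^{-4}$ rad of the target.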

Figure S6: CC-iX gate decomposition.Decomposition of the CC-iX gate into the native gate set of the quantum processor.To maximize fidelity, dynamical decoupling pulses are added to mitigate the effect of residual ZZ coupling.
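The purpose of decoupling pulses against residual ZZ can be seen with simple phase bookkeeping. Assume, hypothetically, that the target qubit's frequency shifts by ζ whenever a spectator is in $|1\rangle$, as in the residual-ZZ characterization of Fig. S2. Over an idle window τ the target then picks up a spectator-dependent phase $2\pi\zeta\tau$. A π pulse on the spectator halfway through (and another at the end to restore it) makes the spectator spend half the window in each state, leaving only a common phase $\pi\zeta\tau$ that a single-qubit Z correction can absorb. The ζ and τ values are placeholders, and which qubits receive echo pulses in this decomposition is not reproduced here.

```python
import math


def zz_phase(zeta_hz, tau_s, spectator_bit, echo):
    """Phase (rad) of the target's |1> relative to |0> from residual ZZ alone.

    Model assumption: the target frequency is shifted by zeta_hz whenever the
    spectator is in |1>.  With echo=True, the spectator receives a pi pulse
    halfway through the idle window and another one at the end."""
    if not echo:
        return 2 * math.pi * zeta_hz * tau_s * spectator_bit
    first_half = spectator_bit        # spectator state before the pi pulse
    second_half = 1 - spectator_bit   # flipped state after the pi pulse
    return 2 * math.pi * zeta_hz * (tau_s / 2) * (first_half + second_half)


zeta = 50e3   # hypothetical residual ZZ, 50 kHz
tau = 60e-9   # hypothetical idle window, 60 ns
for echo in (False, True):
    conditional = zz_phase(zeta, tau, 1, echo) - zz_phase(zeta, tau, 0, echo)
    print(f"echo={echo}: conditional phase = {math.degrees(conditional):.3f} deg")
```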

Figure S8: Characterization of the correction pulse. Characterization of $Q_O$ after one iteration of the RUS conditional gearbox circuit, performed through full tomography conditioned on failure. This experiment allowed the calibration of an optimal pulse to recover $Q_O$ before another iteration of the conditional gearbox circuit is attempted. This characterization highlights a coherent phase error in $Q_O$.
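To illustrate why a calibrated correction helps, the sketch models the failure branch as the ideal $R_x(-\pi/2)$ followed by an extra coherent phase error $R_z(\epsilon)$; this error model and the value of ε are assumptions. The naive correction $R_x(\pi/2)$ then leaves a residual rotation by ε about an axis in the xy-plane, giving fidelity $\cos^2(\epsilon/2)$ for $|0\rangle$, whereas $R_x(\pi/2) R_z(-\epsilon)$ restores the state exactly.

```python
import cmath
import math


def rx(angle):
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -1j * s], [-1j * s, c]]


def rz(angle):
    return [[cmath.exp(-1j * angle / 2), 0j], [0j, cmath.exp(1j * angle / 2)]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def apply(u, v):
    return [u[0][0] * v[0] + u[0][1] * v[1], u[1][0] * v[0] + u[1][1] * v[1]]


def fidelity(u, v):
    return abs(sum(x.conjugate() * y for x, y in zip(u, v))) ** 2


def recovery_fidelity(psi, eps, correction):
    # Failure branch: ideal R_x(-pi/2), then a hypothetical phase error R_z(eps),
    # then the correction unitary; returns the overlap with the original state.
    after_failure = apply(rz(eps), apply(rx(-math.pi / 2), psi))
    return fidelity(apply(correction, after_failure), psi)


psi = [1 + 0j, 0j]
eps = 0.3  # hypothetical coherent phase error (rad)
naive = rx(math.pi / 2)
calibrated = matmul(rx(math.pi / 2), rz(-eps))
print(recovery_fidelity(psi, eps, naive), math.cos(eps / 2) ** 2)
print(recovery_fidelity(psi, eps, calibrated))
```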

Figure S10: Metrics of quantum-neural-network training. (a) C and (b) $N_{\mathrm{RTS}}$ with optimized parameters for each Boolean function. Error bars represent the standard deviation over 50 function evaluations, each based on 8000 repetitions. These data correspond to the diagonal of the specificity matrix (Fig. 7). The increased C observed for unbalanced functions (right half) results from higher circuit depth in both the implementation of the function oracles and the RUS-based conditional gearbox circuit (note the higher $N_{\mathrm{RTS}}$ for these functions).
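A lower bound on these error bars is the binomial shot noise of each C estimate. Treating each of the 8000 repetitions as an independent Bernoulli trial with mismatch probability C (an idealization that ignores drift between evaluations), the standard deviation is $\sqrt{C(1-C)/8000} \approx 0.0019$ for C = 0.03. The sketch simulates 50 such evaluations for a hypothetical true C and compares the sample spread with that prediction.

```python
import math
import random
import statistics


def estimate_cost(c_true, shots, rng):
    # One function evaluation: fraction of runs that report a mismatch, with each
    # run modelled as an independent Bernoulli trial of probability c_true.
    return sum(rng.random() < c_true for _ in range(shots)) / shots


def repeated_evaluations(c_true, n_evals=50, shots=8000, seed=0):
    # Mean and sample standard deviation over n_evals independent evaluations.
    rng = random.Random(seed)
    values = [estimate_cost(c_true, shots, rng) for _ in range(n_evals)]
    return statistics.mean(values), statistics.stdev(values)


c_true = 0.03  # hypothetical true cost
mean, std = repeated_evaluations(c_true)
predicted = math.sqrt(c_true * (1 - c_true) / 8000)
print(f"mean C = {mean:.4f}, spread = {std:.4f}, binomial prediction = {predicted:.4f}")
```

Measured error bars noticeably larger than this floor would point to additional sources of variation, such as drift in the device parameters.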

Figure S11: Quantum neural network with sinusoidal activation. 3-qubit circuit implementing a Rabi activation function with control parameters $(w_1, w_2, b)$. The circuit takes advantage of the Rabi oscillation to implement a softer activation function through controlled $R_x^{w_1}$, $R_x^{w_2}$ and $R_x^{b}$ gates on $Q_A$. This circuit is deterministic and, unlike the circuit presented in Fig. 1, does not require an ancilla qubit. Instead, the training-set preparation is effected directly on $Q_A$, after which C is assessed through $m_A$.
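For the Rabi circuit, $R_x(\theta)|0\rangle$ yields $P(1) = \sin^2(\theta/2)$, a softer, sinusoidal activation. The sketch compares it with the ideal gearbox activation (in the $2\theta$ parametrization, an assumption) on the AND function, using the input-averaged mismatch probability as a stand-in for C. At the symmetric points chosen here, the Rabi activation gives C ≈ 0.146 and the gearbox activation C ≈ 0.029, while both reach C = 0 for XOR; in this toy model the steeper gearbox activation therefore mainly helps with unbalanced functions. The parameter values are illustrative, not the optimized experimental ones.

```python
import math
from itertools import product


def p_one_rabi(theta):
    # P(|1>) after R_x(theta)|0>: a plain Rabi (sinusoidal) activation.
    return math.sin(theta / 2) ** 2


def p_one_gearbox(theta):
    # Heralded gearbox activation in the 2*theta parametrization (assumption).
    s4, c4 = math.sin(theta) ** 4, math.cos(theta) ** 4
    return s4 / (s4 + c4)


def cost(p_one, truth_table, w1, w2, b):
    # Assumed C: mean mismatch probability; truth_table = (f(00), f(01), f(10), f(11)).
    total = 0.0
    for (x1, x2), f in zip(product((0, 1), repeat=2), truth_table):
        p1 = p_one(b + w1 * x1 + w2 * x2)
        total += p1 if f == 0 else 1.0 - p1
    return total / 4


AND = (0, 0, 0, 1)
# Symmetric choices placing three inputs below threshold and one above it.
rabi = cost(p_one_rabi, AND, math.pi / 2, math.pi / 2, -math.pi / 4)
gear = cost(p_one_gearbox, AND, math.pi / 4, math.pi / 4, -math.pi / 8)
print(f"Rabi C = {rabi:.4f}, gearbox C = {gear:.4f}")
```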

Figure S12: Comparison of feature space landscapes obtained for variations of the activation function circuits.Landscapes of C for functions XOR (a, b), IMPLICATION 2 (c, d), and NAND (e, f), representative of different Boolean functions, obtained with variations of the activation function circuit.(a, c, e) represent activation through a single iteration of the gearbox circuit (sigmoid-like activation), without correction on failure, and (b, d, f) use the Rabi oscillation (sinusoidal-like activation) as a softer non-linear function.These represent 2-D slices of the 3-D landscapes, chosen at specific cuts where simulation indicated the minimum to be located.The points corresponding to minimal C for each of these cuts are represented in all subplots.

Figure S13: Simulation technique to account for the stochasticity in repeat-until-success circuits. The red arrows indicate how we incorporate errors due to $Q_A$ readout misclassification in the RUS procedure.

Figure S14: Simulation of feature-space landscapes for the NAND function. Top panels show noiseless simulation of (a) C and (b) $N_{\mathrm{RTS}}$ for NAND along 2-D slices of the 3-parameter space. Similarly, panels (c and d) show noisy simulation results and panels (e and f) the corresponding experimental results (same as Figs. 5e and 5f, respectively), reproduced here to facilitate comparison. Black dots indicate the parameters minimizing C within the slice.
Figure 3: Quantum neural network using the repeat-until-success conditional gearbox circuit. (a) Schematic representation of the simplest feedforward network, highlighting the role played by the parameters $(w_1, w_2, b)$ in weighting the sum of input signals before the result is passed through a non-linear activation function. $Q_{I1}$ and $Q_{I2}$ are input nodes, $Q_O$ is the output node, and $Q_A$ is an ancilla used first within the RUS circuit and then as the expected output for the training set. (b) Quantum circuit for a 3-neuron feedforward network. The circuit is divided into four steps: preparation of the inputs ($Q_{I1}$, $Q_{I2}$) in a maximal superposition; threshold activation into $Q_O$ using the RUS conditional gearbox circuit with $(w_1, w_2, b)$; unitary encoding of the Boolean function (AND, in this case) using an oracle; and comparison of $Q_A$ with $Q_O$. The symbol denotes parking of spectator qubit $Q_{I2}$.
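The four steps of panel (b) can be followed explicitly on the 16 basis states $|x_1 x_2\, o\, a\rangle$. The sketch assumes an ideal, already-heralded gearbox activation in the $2\theta$ parametrization (the RUS loop itself is not simulated), that $Q_A$ is back in $|0\rangle$ before the oracle, and that the comparison is a CNOT from $Q_O$ onto $Q_A$ followed by measuring $Q_A$; these are modelling choices, not the compiled circuit. Because the input labels stay in orthogonal basis states, the probability of reading $Q_A = 1$ reduces to the input-averaged probability that $Q_O$ disagrees with $f(x_1, x_2)$.

```python
import math
from itertools import product


def activation_amplitudes(theta):
    # Q_O amplitudes after the ideal heralded gearbox rotation R_x(2q)|0>,
    # with tan(q) = tan(theta)^2 (2*theta parametrization is an assumption).
    q = math.atan2(math.sin(theta) ** 2, math.cos(theta) ** 2)
    return [complex(math.cos(q)), -1j * math.sin(q)]


def qnn_mismatch_probability(truth_table, w1, w2, b):
    """Follow the four circuit steps on basis states |x1 x2 o a>."""
    # Steps 1-2: inputs in equal superposition (amplitude 1/2 each), then the
    # input-dependent activation on Q_O; Q_A is assumed to be back in |0>.
    state = {}
    for x1, x2 in product((0, 1), repeat=2):
        amps = activation_amplitudes(b + w1 * x1 + w2 * x2)
        for o in (0, 1):
            state[(x1, x2, o, 0)] = 0.5 * amps[o]
    # Step 3: oracle, a -> a XOR f(x1, x2).  Step 4: comparison, a -> a XOR o.
    # Both are basis permutations, so amplitudes are only relabelled.
    final = {}
    for (x1, x2, o, a), amp in state.items():
        f = truth_table[2 * x1 + x2]
        final[(x1, x2, o, a ^ f ^ o)] = amp
    # Reading a = 1 on Q_A flags a mismatch between Q_O and f(x1, x2).
    return sum(abs(amp) ** 2 for (_, _, _, a), amp in final.items() if a == 1)


AND = (0, 0, 0, 1)
print(qnn_mismatch_probability(AND, math.pi / 4, math.pi / 4, -math.pi / 8))
```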