## Introduction

Fault-tolerant quantum computation offers the potential for vast computational advantages over classical computing for a variety of problems1. The implementation of quantum error correction (QEC) is the first step toward practical realization of any of these applications. This typically requires detecting the occurrence of an error by performing a stabilizer measurement, followed by either a corrective action on the physical device (active QEC), or a frame update in software (passive QEC)2,3,4,5,6,7,8,9,10,11,12,13,14. In either case, these checkpoints must occur regularly to protect a quantum state throughout the computation. To do that, one needs to rapidly measure the stabilizers with high fidelity and without disrupting the encoded qubit. Despite these challenges, repeated stabilizer measurements have recently been demonstrated across a diverse range of physical architectures, including trapped ions15,16, superconducting qubits17,18,19,20,21, and defects in diamond22.

Active feed-forward control is useful not only for active error correction23,24,25,26, but also for other QEC schemes employing state injection and magic-state distillation3,27,28,29,30,31,32,33,34,35,36,37,38, both of which may be used in implementations of a universal logical gate set. In all cases, the ability to determine the appropriate control and implement it in real time, with minimal latency, is particularly attractive for efficient error correction techniques10.

Furthermore, since the stabilizer measurements are themselves error prone and cannot be blindly trusted, one can introduce a decoder that uses information from multiple rounds of stabilizer measurements39,40, or from the spatial connectivity of the device41, to determine the appropriate correction. Unfortunately, performing the decoding calculation in software, at a high level of the hardware stack, hinders low-latency correction and fast feed-forward control due to the communication and computation overhead17.

In this work, we overcome this bottleneck by performing both QEC decoding and control with custom low-latency hardware, which acts as a classical co-processor to our quantum processor. We demonstrate repeated active correction, as well as real-time decoding of multi-round stabilizer measurements. We show that the decoding strategy successfully mitigates stabilizer errors, and identifies the encoded state with a latency far below the qubit coherence times, while matching the results obtained by post-processing on a conventional computer.

## Results

### Repeated stabilizer measurements on a five-qubit device

For our demonstration, we implement a three-qubit code that corrects bit-flip errors ($$\hat{X}$$), and is sufficient to encode one logical bit of classical memory. We use an IBM five-transmon device similar to ibmqx2 (refs 42,43; Fig. 1), of which three transmons (D1, D2, D3) are used as data qubits, and two (At, Ab) as ancilla qubits to measure the stabilizers. Each qubit is coupled to a dedicated resonator for readout and control. Additional resonators dispersively couple D1, D2 with At and D2, D3 with Ab.

We perform CNOT gates between data and ancilla qubits by using a sequence of single-qubit gates, and a ZX90 rotation driven by the cross-resonance interaction44. By applying two CNOT gates in succession controlled by two different data qubits with a single ancilla as the target, the parity of the data qubit pair is mapped onto the ancilla state. The same protocol is applied simultaneously to both data qubit pairs, with the shared qubit D2 interacting first with At, then with Ab (Fig. 1b). The ancilla measurement result at,b = 0(1) ideally corresponds to even (odd) parity for the corresponding pair. We refer to the complete sequence comprising four CNOT gates and ancilla measurement as a single error correction cycle. The result of each cycle (the measurements at, ab) is a syndrome that identifies which data qubit (if any) has most likely been subjected to an $$\hat{X}$$ error.
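
The syndrome logic described above can be summarized in a few lines. The sketch below is illustrative (the table itself, not firmware from the experiment): At checks the parity of (D1, D2) and Ab checks (D2, D3), so each two-bit syndrome points to at most one data qubit.

```python
# Single-round syndrome table for the three-qubit bit-flip code:
# a_t flags odd parity of (D1, D2); a_b flags odd parity of (D2, D3).
SYNDROME_TABLE = {
    (0, 0): None,  # both parities even: no error detected
    (1, 0): "D1",  # only the top stabilizer fired
    (1, 1): "D2",  # both stabilizers fired: the shared qubit flipped
    (0, 1): "D3",  # only the bottom stabilizer fired
}

def decode_single_round(a_t: int, a_b: int):
    """Return the data qubit most likely hit by an X error, or None."""
    return SYNDROME_TABLE[(a_t, a_b)]
```

A single $$\hat{X}$$ error on any data qubit thus produces a unique, correctable syndrome, while the trivial syndrome (0, 0) requires no action.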

Key to preserving a logical state is the capability of repeating such stabilizer measurements16,17,20,21, which has two technical requirements. First, the two ancilla qubits must be reused at every cycle, either by resetting them to the ground state45,46 or by tracking their state. For either reset or state tracking by measurement, we need to ensure that the readout process is nondestructive, i.e., the result is consistent with the qubit state at the end of the measurement. This sets an upper limit to the allowable photon number, and therefore to the readout fidelity (Supplementary Methods). Second, the readout cavities must be depleted of photons before starting the new cycle to prevent gate errors. To accelerate the cavity relaxation to its steady state (near vacuum), we employ the CLEAR technique47 for the slower resonator coupled to Ab (Supplementary Table 2), reducing its average photon population to <0.1 in 600 ns. Altogether, we measure a single-round joint stabilizer readout fidelity of 0.61, averaged over the eight computational states of the data qubits (Supplementary Fig. 3).

### Real-time processing of measurement results

An integral part of our experiment is the interdependence of qubit readout and control, mediated by fast processing of the measurement results. For this purpose, we use a combination of custom-made and off-the-shelf hardware, consisting of pulse sequencers, receivers, and a processor (Fig. 1c), all based on field-programmable gate arrays (FPGAs). In particular, the storage capability of the processor FPGA enables expansion beyond one-time feedback protocols18,48,49, in which conditional actions rely on single, or joint but simultaneous, measurements. During each QEC cycle, digital-to-analog converters in the pulse sequencers produce a pre-programmed series of gate and measurement pulse envelopes. Each returning readout signal is captured by a receiver channel via an analog-to-digital converter, where it is integrated and compared against a calibrated threshold to determine the qubit state48. The processor collects all the digitized results and stores them in memory. After a preset number N of cycles has been executed, the processor feeds the stored values to an internal custom calculation engine. The engine's result is broadcast to the pulse sequencers, which conditionally apply a corresponding set of gates. The overall latency to store and process the classical data and to issue a conditional pulse is 590 ns, a small fraction of the qubit coherence times (Supplementary Methods).
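
The store-then-process flow can be modeled schematically as follows. This is a host-side sketch only (the real system runs on FPGAs); `run_cycle`, `engine`, and `apply_conditional_gates` are illustrative stand-ins for the hardware functions described above.

```python
def run_experiment(n_cycles, run_cycle, engine, apply_conditional_gates):
    """Store one digitized syndrome per cycle, then decode them all at once.

    run_cycle: plays the gate/measurement envelopes and returns the
               thresholded ancilla bits (a_t, a_b) for one QEC cycle.
    engine:    the calculation engine, mapping the stored syndrome list
               to a correction (e.g., via a precomputed lookup table).
    apply_conditional_gates: broadcasts the result to the pulse sequencers.
    """
    stored = []
    for _ in range(n_cycles):
        stored.append(run_cycle())      # processor memory holds each result
    correction = engine(stored)         # one engine call after N cycles
    apply_conditional_gates(correction)
    return correction
```

The key architectural point this captures is that the per-cycle path only stores a result, while the (slower) decoding call happens once, after all N cycles.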

### Preserving an encoded qubit state by active error correction

We explore three distinct approaches to the bit-flip code. In all cases, we first prepare the logical excited state $$\left|111\right\rangle$$. Next, we apply one of the following schemes: (i) uncorrected, in which the cycle is performed but no correction is applied to the data qubits based on the syndrome measurements, and the ancilla qubits are not reset; (ii) repeated error correction (REC), in which, at the end of each cycle, a correction gate is conditionally applied to the data qubits based on the last syndrome results only, and the ancillas are reset; (iii) decoder error correction (DEC), in which N cycles are performed without ancilla reset or corrective gates, and the set of syndromes from all N cycles is used to determine and apply the optimal correction via a decoder40. To assess how well each scheme has protected the prepared state after a desired number of cycles, we perform a logical data measurement. This involves measuring the constituent physical data qubits and computing the majority function over the digitized results {d1, d2, d3}. In cases (i) and (ii), the majority function is calculated offline. In case (iii), the processor computes both the decoding and majority functions sequentially, making the result available for further conditional operations.
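
The logical data measurement reduces to a two-out-of-three majority vote over the digitized data-qubit results; a minimal reference implementation:

```python
def majority(d1: int, d2: int, d3: int) -> int:
    """Logical readout of the bit-flip code: 1 iff at least two bits are 1."""
    return 1 if d1 + d2 + d3 >= 2 else 0
```

A single bit-flip among {d1, d2, d3} therefore does not change the logical outcome, which is exactly the protection the three-qubit code provides.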

We begin by comparing the REC protocol to the uncorrected case (Fig. 2). For REC, there is a one-to-one relation between the two-bit syndrome value {at, ab} and one of the three possible corrective $$\hat{X}$$ gates (in blue in Fig. 2a), or no gate at all. The same syndrome value is used to actively reset the ancilla qubits for use in subsequent cycles (Fig. 2b).

When error detection is based on a single round of stabilizer measurements, it is impossible to distinguish between the targeted data errors and stabilizer measurement errors, largely caused by imperfect CNOT gates and ancilla readout. Thus, the resulting active correction directly propagates syndrome errors to the data qubits. These errors dominate in the case of d1 and d3, whose average values decay faster with the number of cycles than without active correction (Fig. 2b). Conversely, the larger intrinsic error per cycle for d2 (due to its shorter T1) is partially compensated by the protocol. Overall, this gain nearly balances out the errors introduced by the active error correction, as shown by comparing the results of the majority function (Fig. 2b). REC can be thought of as repeated one-time feedback, where the processor storage and calculation engine are unused. The added latency is considerable: for each cycle, the stabilizer results are aggregated by the processor and forwarded to the pulse sequencers (400 ns), followed by correction and reset operations (160 ns).

Improvements in logical state protection are achieved by correlating multiple stabilizer measurements using the DEC protocol. In ref. 17, a simplified minimum-weight perfect-matching decoder40 was used to post-process the syndrome results, and differentiate between true data bit flips and false positives. We apply the same method (Fig. 3), but with the crucial difference that the results are processed in real time. Specifically, the processor acquires stabilizer measurement results for N cycles, and uses the engine to decode them into the appropriate set of $$\hat{X}$$ gates using a precomputed lookup table. These corrections are then applied by the pulse sequencers on the data qubits. Finally, the data qubits are measured as in Fig. 2, with the majority function also computed on the processor. Whereas for N ≤ 2 the decoder is ineffective—as there are not enough records to identify ancilla readout errors—a gap emerges at larger N (Fig. 3c) in favor of the decoder. Furthermore, this scheme eliminates the per-cycle latency cost; the latest ancilla results can be processed while the next cycle is executing. The total additional latency becomes fixed at 1300 ns (590 ns for each processor engine call, 120 ns for corrective gates), approximately equal to that accrued over two REC cycles (Table 1).
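
To illustrate how correlating rounds suppresses false positives, the sketch below uses a deliberately simplified decoder: a per-stabilizer majority vote over the N rounds, followed by a single-round table lookup. This is not the minimum-weight matching decoder used in the experiment, and it assumes the stored results have already been converted to per-round parities (in the experiment the ancillas are not reset, so raw outcomes must first be differenced).

```python
# Consensus syndrome -> correction, for the three-qubit bit-flip code.
SYNDROME_TABLE = {(0, 0): None, (1, 0): "D1", (1, 1): "D2", (0, 1): "D3"}

def decode_rounds(syndromes):
    """syndromes: list of per-round (a_t, a_b) pairs; returns the correction.

    Majority-voting each stabilizer bit across rounds rejects isolated
    measurement errors, which a single-round decoder would act on.
    """
    n = len(syndromes)
    a_t = 1 if sum(s[0] for s in syndromes) > n / 2 else 0
    a_b = 1 if sum(s[1] for s in syndromes) > n / 2 else 0
    return SYNDROME_TABLE[(a_t, a_b)]
```

For example, a syndrome that fires in only one of three rounds is discarded as a likely readout error, whereas a persistent syndrome is attributed to a genuine data bit flip.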

In an effort to minimize errors by avoiding unnecessary quantum gates, it is important to move as many operations as possible to the classical hardware. In this case, we note that measuring the data qubits immediately after conditional $$\hat{X}$$ gates is equivalent to inverting the classical measurement results. Therefore, in a final experiment (Fig. 4, triangles), we dispense with the active correction and instead filter the di results based on the decoder output. This corresponds to a Pauli frame update (PFU)2 applied just before the measurement. Omitting these pulses slightly reduces both the latency (by 120 ns) and the error rate, yielding a consistent 1–2% improvement over the actively corrected case for all N. The results match those obtained by post-processing all the data in software (diamonds), confirming that the fast classical loop works as expected.
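
The equivalence exploited here is that an $$\hat{X}$$ gate immediately before a Z-basis measurement simply flips the classical outcome, so the correction can be folded into software as an XOR. A minimal sketch (illustrative names, not the processor firmware):

```python
def frame_update_majority(d, correction_mask):
    """Pauli frame update followed by logical readout.

    d:               measured data bits (d1, d2, d3)
    correction_mask: one bit per data qubit; 1 where the decoder would
                     have applied an X gate before measurement
    """
    corrected = [bit ^ flip for bit, flip in zip(d, correction_mask)]
    return 1 if sum(corrected) >= 2 else 0
```

The quantum gates disappear entirely; only the classical results are reinterpreted in the updated frame.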

Finally, we evaluate the decoder against the majority result we would obtain by replacing the QEC gates and measurements with an idle time of equal duration. The result shows that, for all N > 1, DEC has a higher success probability of identifying the initial state than free decay (Fig. 4, gray curve).

## Discussion

Although the experiment ends with the measurement of the data qubits, the readily available majority result may be used to condition additional operations on a second encoded qubit. This would be the case, for instance, when teleporting an S gate using a logical ancilla27. More generally, the ability to update the Pauli frame in real time will be essential to implement quantum algorithms at the logical level. Since not all gates can be transversal in any given code50, conditional operations based on the current frame can be used to complete the universal gate set (e.g., T gates in the surface code30).

In conclusion, we have demonstrated the repeated measurement and real-time processing of stabilizers for a minimal bit-flip code. An intertwined readout and control system provides a real-time interface to the quantum processor, converting a series of stabilizer results to the current Pauli frame without interrupting the execution of a potential algorithm. This approach is not limited to superconducting qubits, but is applicable to any quantum computing platform that faces coherence-limited operation.

Finally, we touch upon the applicability of the presented control architecture to larger circuits. Clearly, the use of a lookup table as a decoder is a limiting factor, with the number of entries scaling exponentially with both circuit depth and width. However, we predict that 3–4 cycles of a small surface code40 are within reach of this approach, provided that the processor is upgraded with commercially available, albeit significantly larger, memory. Developing efficient decoders for the fault-tolerant scale is an active area of research, with promising results addressing both low-latency and scalability requirements51.
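
The scaling argument can be made concrete with back-of-the-envelope arithmetic. This is our own estimate, not a figure from the experiment: with s stabilizer bits measured per cycle, a table indexed by N cycles of syndromes needs 2^(sN) entries.

```python
def table_entries(n_cycles: int, stabilizers_per_cycle: int = 2) -> int:
    """Number of lookup-table entries addressed by the stored syndromes.

    The default of two stabilizer bits per cycle matches the bit-flip
    code above; a small surface code would have more, so the table
    grows exponentially in both depth (cycles) and width (stabilizers).
    """
    return 2 ** (stabilizers_per_cycle * n_cycles)
```

For the present code, 8 cycles already address 65,536 entries, which is why larger processor memory, or an algorithmic decoder, is needed beyond a few cycles.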

## Methods

### Classical control hardware

The real-time protocols presented above rely on the interconnection between the receivers (two Innovative Integration X6-1000M digitizers), the processor (BBN Trigger Distribution Module—TDM in ref. 48), and the pulse sequencers (BBN Arbitrary Pulse Sequencers—APS2). The event sequence and the communication between those instruments, as well as their interface with the qubit device (hosted in a Bluefors BF-LD400), are detailed in the Supplementary Methods.

### Numerical simulations

In this section, we describe the model and methods used to obtain the numerical simulation results shown in the main text. We chose not to do a full time-dependent master equation simulation of the error correction, as such an open system simulation is numerically intensive. Further, for the real-time error correction with ancilla reset, interspersing strong measurement and conditional operations within a time-dependent evolution is a nontrivial task. Instead, we use a simulation model that is approximate, but with well-controlled error that does not significantly reduce the accuracy of our results. We aim for qualitative agreement with the experimental results, using a model with no fit parameters (i.e., we do not search among models for a best fit to the data), with all free parameters in our model determined by independent characterization of the device.

Each round of the error correction can be thought of as an entangling operation (CNOT gates), followed by a measurement operation, followed by an optional correction and ancilla reset operation. This can be represented by the following composition of linear maps

$${\rho }_{i+1}={{\mathcal{R}}}_{{\bf{m}}}\circ {\mathcal{M}}\circ {\mathcal{E}}\left({\rho }_{i}\right),$$
(1)

where ρi is the state after the ith round of error correction, and $${\mathcal{E}}$$, $${\mathcal{M}}$$, and $${{\mathcal{R}}}_{{\bf{m}}}$$ are the entangling, measurement, and correction/reset operations, respectively, with the correction/reset operation being conditional on the vector of ancilla measurement outcomes m. The modeling of these operations is described in detail in the Supplementary Methods.
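
The structure of Eq. (1) can be illustrated on a single qubit; the sketch below is a toy model (the actual simulation acts on the full register and is detailed in the Supplementary Methods). Each operation is a completely positive trace-preserving map in Kraus form, and one round is their composition. The bit-flip probability p and the identity reset are illustrative assumptions.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)   # projector onto |0>
P1 = np.diag([0, 1]).astype(complex)   # projector onto |1>

def apply_kraus(rho, kraus_ops):
    """rho -> sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def entangle(rho, p=0.05):
    """Stand-in for E: a bit-flip channel with probability p."""
    return apply_kraus(rho, [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * X])

def measure(rho):
    """Stand-in for M: a non-selective Z-basis measurement (dephasing)."""
    return apply_kraus(rho, [P0, P1])

def reset(rho):
    """Stand-in for R_m: trivial here (no conditional operation)."""
    return rho

def one_round(rho):
    """rho_{i+1} = R_m . M . E (rho_i), as in Eq. (1)."""
    return reset(measure(entangle(rho)))

rho = np.diag([0.0, 1.0]).astype(complex)   # start in |1><1|
for _ in range(3):
    rho = one_round(rho)
```

Because every map is trace preserving, the composed round is as well, and repeated application reproduces the round-by-round state evolution used in the simulations.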