Overcoming leakage in quantum error correction

The leakage of quantum information out of the two computational states of a qubit into other energy states represents a major challenge for quantum error correction. During the operation of an error-corrected algorithm, leakage builds over time and spreads through multi-qubit interactions. This leads to correlated errors that degrade the exponential suppression of the logical error with scale, thus challenging the feasibility of quantum error correction as a path towards fault-tolerant quantum computation. Here, we demonstrate a distance-3 surface code and distance-21 bit-flip code on a quantum processor for which leakage is removed from all qubits in each cycle. This shortens the lifetime of leakage and curtails its ability to spread and induce correlated errors. We report a tenfold reduction in the steady-state leakage population of the data qubits encoding the logical state and an average leakage population of less than 1 × 10⁻³ throughout the entire device. Our leakage removal process efficiently returns the system back to the computational basis. Adding it to a code circuit would prevent leakage from inducing correlated error across cycles. With this demonstration that leakage can be contained, we have resolved a key challenge for practical quantum error correction at scale. Physical realizations of qubits are often vulnerable to leakage errors, where the system ends up outside the basis used to store quantum information. A leakage removal protocol can suppress the impact of leakage on quantum error-correcting codes.

Leakage of quantum information out of computational states into higher energy states represents a major challenge in the pursuit of quantum error correction (QEC). In a QEC circuit, leakage builds over time and spreads through multi-qubit interactions. This leads to correlated errors that degrade the exponential suppression of logical error with scale, challenging the feasibility of QEC as a path towards fault-tolerant quantum computation. Here, we demonstrate the execution of a distance-3 surface code and distance-21 bit-flip code on a Sycamore quantum processor where leakage is removed from all qubits in each cycle. This shortens the lifetime of leakage and curtails its ability to spread and induce correlated errors. We report a ten-fold reduction in steady-state leakage population on the data qubits encoding the logical state and an average leakage population of less than 1 × 10⁻³ throughout the entire device. The leakage removal process itself efficiently returns leakage population back to the computational basis, and adding it to a code circuit prevents leakage from inducing correlated error across cycles, restoring a fundamental assumption of QEC. With this demonstration that leakage can be contained, we resolve a key challenge for practical QEC at scale.
Quantum error correction (QEC) promises to exponentially suppress uncorrelated errors in quantum computing devices, bridging the gap between achievable physical error rates and the low logical error rates required for useful quantum algorithms [1][2][3]. The surface code is a promising candidate for experimental implementations of QEC, where a repetitive stabilizer circuit protects a logical qubit state.
Leakage is particularly dangerous in the context of QEC [27][28][29][30]. A key underlying assumption of QEC is that the physical errors to be suppressed are sufficiently uncorrelated in both space and time. Contrary to this requirement, a qubit in a leakage state can induce errors on multiple neighboring qubits, even causing them to leak as well [29]. The correlated spread of errors through the device represents a major problem for experimental QEC. Identifying and post-selecting out leakage events has permitted cutting-edge experiments on the surface code [15,16], and partial leakage removal has been integrated into surface code circuits [14,17]. However, all these experiments displayed a characteristic rise in the number of detected errors as the code progressed, indicative of accumulating leakage population in the device. A demonstration of leakage removal from all qubits in a surface code circuit has not yet been reported. Further, stabilizing the leakage populations such that error rates do not grow over time is a requirement for scalable QEC, and this remains an important open challenge.
Here, we study and remove the effects of leakage in a surface code circuit on an array of transmon qubits. First, we detail the dynamics of leakage in the QEC circuit and the spread of errors through space and time. We quantify the effect of leaked qubits undergoing multi-qubit interactions, which is the primary vehicle for spatial propagation of leakage. Second, we demonstrate the effective removal of leakage from all qubits involved in the surface code circuit. We show residual leakage populations averaged over all qubits are suppressed to below 1 × 10⁻³, and do not grow as the code is extended in time. Finally, we show that removing leakage improves logical performance. Using a distance-21 bit-flip code with leakage removal, injected leakage impacts logical performance equivalently to injected Pauli errors. This confirms that leakage removal is effective in suppressing the correlated nature of leakage-induced errors. Then, using a distance-3 surface code, we show that leakage removal both decreases the rate of logical errors and prevents the code performance from declining over time, proving that QEC can be stable when carried out over many cycles. We extrapolate this behavior to larger code distances operating well below threshold, where we find that injected leakage impacts logical error rates in the same fashion as uncorrelated Pauli errors. In summary, leakage removal overcomes an important obstacle to growing QEC to algorithmically relevant scales.

CHARACTERIZING THE SPREAD OF LEAKAGE
Leakage states (Figure 1a) are particularly problematic in structured QEC circuits because they are long-lived and spread through the device, inducing correlated errors in both space and time. The surface code circuit displayed in Figure 1b shows a single cycle, which consists of a number of moments. A moment is a grouping of gates operated concurrently in time. Four such moments correspond to CZ gates used to measure the surface code stabilizers. When a qubit in the circuit leaks, subsequent gates involving that qubit produce additional errors. Figure 1c illustrates the dynamics of leakage in a distance-3 surface code circuit. At the cycle labeled 0, we inject a full |1⟩ → |2⟩ rotation on the central data qubit, producing an expected near-50% |2⟩ population. It takes many surface code cycles before this injected leakage population decays sufficiently, with an exponential decay constant around 4.4 cycles. However, this decay is somewhat faster than the expected decay from T_1 relaxation of |2⟩ alone. The insets show that the leakage population does not stay on the injected qubit, but is also transported to neighboring qubits as the circuit progresses. At the small code distance being considered, this transport is enough to affect every qubit involved in the circuit.

Figure 1. a) The energy potential of a transmon qubit, illustrating the computational energy levels |0⟩ and |1⟩ (blue) and the leakage levels |2⟩ and higher (red). b) The circuit for surface code QEC, showing a square grid composed of measure qubits (light blue circles) and data qubits (orange squares). The cycle consists of four layers of entangling gates, along with intervening single-qubit rotations, followed by the measurement (M) and reset (R). The reset operation here is shown across all qubits; it may be implemented as single-qubit operations on the measure qubit, or include entangling operations with various neighboring data qubits. c) The time decay (main, blue) and spatial spread (inset) of leakage in a distance-3 surface code following the injection of |2⟩ on the central data qubit. Each cycle takes approximately 1 µs, and leakage population is measured at the end of each cycle. The expected decay of |2⟩ from T_1 relaxation on the leaked qubit alone is indicated (dashed red). Excess leakage population is defined as the leakage population in the presence of injection minus the leakage population in the absence of injection.
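The excess leakage population defined for Figure 1c, and an exponential decay constant of the kind quoted above, could be extracted along the following lines. This is a minimal sketch rather than the analysis used for the figure: the function names, the log-linear least-squares fit, and the synthetic numbers in the example are all assumptions made for illustration.

```python
import math

def excess_leakage(with_injection, without_injection):
    """Per-cycle excess leakage: population with injection minus population without."""
    return [p_inj - p_ref for p_inj, p_ref in zip(with_injection, without_injection)]

def fit_decay_constant(excess):
    """Fit excess[n] ~ p0 * exp(-n / tau) by least squares on log(excess).

    Returns tau in units of cycles. Non-positive points are skipped because
    their logarithm is undefined.
    """
    points = [(n, math.log(p)) for n, p in enumerate(excess) if p > 0]
    mean_n = sum(n for n, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    slope = (sum((n - mean_n) * (y - mean_y) for n, y in points)
             / sum((n - mean_n) ** 2 for n, _ in points))
    return -1.0 / slope

# Synthetic example (hypothetical numbers, not measured data):
# a flat 1% background plus an injected population decaying over 4.4 cycles.
baseline = [0.01] * 10
injected = [0.01 + 0.5 * math.exp(-n / 4.4) for n in range(10)]
tau = fit_decay_constant(excess_leakage(injected, baseline))
```

Subtracting the no-injection baseline first matters because the background leakage would otherwise bias the fitted decay constant towards longer values.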
Without any attempt to remove it, a single leakage event persists for many rounds and spreads a significant distance through the device, affecting many measurements and inducing many error detection events. The number of uncorrelated errors required to produce the same effect is the decomposed weight of the leakage event [27]. This high weight of leakage events when decomposed into uncorrelated errors makes them especially problematic for QEC.
The precise dynamics of leakage depends primarily on the details of the entangling gate used in the circuit. Here, we focus on the diabatic CZ gate used in the Sycamore architecture [14,17,31]. This gate involves biasing qubits to satisfy the resonance conditions indicated in Figure 2a, and tuning the interaction strength to achieve a rotation of 2π in |11⟩ ↔ |20⟩. We maintain the convention that the higher-energy qubit state is listed first in two-qubit states |HL⟩. This resonance condition also aligns other resonances that involve leakage states. In particular, the |30⟩ ↔ |12⟩ resonance enables a two-photon process which allows |2⟩ on the lower-energy qubit to move to |3⟩ on the higher-energy qubit. Similarly, the |31⟩ ↔ |22⟩ resonance enables |3⟩ on the higher-energy qubit to cause the lower-energy qubit to leak to |2⟩, while the higher-energy qubit remains leaked in |2⟩. These so-called leakage transport processes are what allow leakage to spread, even in a single QEC cycle.
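The alignment of these resonances can be checked with a simple level model. The sketch below assumes each transmon is a weakly anharmonic (Duffing) oscillator with level energies n·f + η·n(n−1)/2 and that both qubits share the same nonlinearity η; the frequency values are hypothetical round numbers, chosen only so the arithmetic is exact.

```python
def transmon_energy(n, freq, eta):
    """Energy of level n in a Duffing-oscillator model (same units as freq)."""
    return n * freq + eta * (n * (n - 1) // 2)

def pair_energy(state, freq_high, freq_low, eta):
    """Energy of the two-qubit state (n_high, n_low), higher-energy qubit first."""
    n_high, n_low = state
    return transmon_energy(n_high, freq_high, eta) + transmon_energy(n_low, freq_low, eta)

# Hypothetical parameters in MHz: the qubits are detuned by |eta|,
# which is the CZ resonance condition described in the text.
ETA = -200
FREQ_HIGH = 6000
FREQ_LOW = FREQ_HIGH + ETA

PAIRS = [((1, 1), (2, 0)), ((3, 0), (1, 2)), ((3, 1), (2, 2)), ((4, 2), (3, 3))]

# Energy difference for each pair of states; zero means the pair is resonant.
detunings = {
    pair: pair_energy(pair[0], FREQ_HIGH, FREQ_LOW, ETA)
          - pair_energy(pair[1], FREQ_HIGH, FREQ_LOW, ETA)
    for pair in PAIRS
}

# The intermediate state (2, 1) of the two-photon process sits |eta| away from (3, 0).
intermediate_detuning = (pair_energy((2, 1), FREQ_HIGH, FREQ_LOW, ETA)
                         - pair_energy((3, 0), FREQ_HIGH, FREQ_LOW, ETA))
```

Within this model all four listed pairs come out degenerate once the computational resonance |11⟩ ↔ |20⟩ is satisfied, which is why calibrating the CZ gate brings the leakage resonances along with it.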
The amount of leakage transport a gate produces is not normally calibrated, and so depends on the chosen gate length and effective coupling between levels. Figure 2b shows how a calibrated CZ gate affects populations, as measured by the circuits shown in Figure 2c. In this device, we find around 18% of the population of |30⟩ is transported to |12⟩ and vice versa. The transport population is around 61% for |31⟩ ↔ |22⟩. We can also see the first indications of expected higher resonances such as |42⟩ ↔ |33⟩. Data for each individual experiment and further characterisation of the readout can be found in Supplementary Information Section S1.
Even in the absence of leakage transport, we find that leakage induces additional errors in the CZ gate. When the higher-energy qubit is in |2⟩ and the lower-energy qubit is in the computational basis, leakage transport is not possible but a significant phase error is imparted on the non-leaked qubit. When a CZ gate is applied as in Figure 2d with the higher-energy qubit in |0⟩, we expect to see no phase shift (φ = 0) on the lower-energy qubit. With the higher-energy qubit prepared in |1⟩, we expect to see a phase shift φ = π, indicating a well-calibrated CZ gate. Figure 2e shows the relative phase for 20 pairs of qubits. When computational states |0⟩ and |1⟩ are prepared we see tight groupings around the expected phase shifts φ = 0 and φ = π, respectively. However, when a leakage state is prepared on the higher-energy qubit, we see a phase shift near φ ≈ 0.65π. This represents a significant computational error on the non-leaked qubit, and is a significant source of errors to be detected and corrected as leakage spreads.

Figure 2. Leakage transport and phase errors in CZ gates. a) The eigenenergy ladder for a pair of qubits satisfying the resonance condition for a diabatic CZ gate, where the qubits are detuned by their common nonlinearity |η|. We denote the two-qubit states |HL⟩ with the higher (lower) energy qubit first (second). In addition to the intended resonance (|20⟩ ↔ |11⟩), higher levels also satisfy a resonance condition, either directly (|31⟩ ↔ |22⟩) or mediated by a two-photon process (|30⟩ ↔ |12⟩). b) The relative population transport (net change in state populations) ΔP_t for the diabatic CZ gate, including the first two leakage levels. The rotation in |20⟩ ↔ |11⟩ has been calibrated to 2π. Highlighted are the off-diagonal elements due to the couplings between higher levels, with average relative population transport |ΔP_t| shown below. c) The two circuits used to measure the relative population transport shown in (b). We subtract the population transport P_t in the baseline experiment without a CZ gate (right) from the experiment with a CZ gate (left). d) The circuit for the modified Ramsey experiment shown in (e) with an interleaved CZ gate to a neighboring qubit at a higher frequency, followed by tomography on the lower-frequency qubit. e) The measured phase shift φ during the modified Ramsey experiment with the neighboring qubit prepared in |0⟩, |1⟩, or |2⟩, shown as an empirical cumulative distribution function (ECDF) over 20 qubit pairs, with the mean value indicated by the dashed line. The CZ gate should produce a phase shift of φ = 0 for an input |0⟩, and a shift of φ = π for an input |1⟩. A spurious phase shift near φ ≈ 0.65π is produced when the higher-energy qubit is prepared in |2⟩.
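The baseline subtraction that defines the relative population transport can be sketched as below. The matrix layout (rows as prepared input states, columns as measured output states), the function names, and the toy two-state numbers are assumptions for illustration; averaging over the two transport directions is likewise an assumed convention.

```python
def relative_population_transport(with_cz, baseline):
    """Elementwise difference between the 'with CZ' and baseline population matrices.

    Each matrix is a list of rows; row i holds the measured output-state
    populations when input state i was prepared.
    """
    return [[p_cz - p_base for p_cz, p_base in zip(row_cz, row_base)]
            for row_cz, row_base in zip(with_cz, baseline)]

def mean_abs_transport(delta, i, j):
    """Mean magnitude of transport between states i and j, over both directions."""
    return (abs(delta[i][j]) + abs(delta[j][i])) / 2

# Toy example restricted to two states, index 0 = |30> and index 1 = |12>.
# All numbers are hypothetical.
baseline = [[0.95, 0.01],
            [0.02, 0.93]]
with_cz = [[0.78, 0.18],
           [0.20, 0.75]]

delta = relative_population_transport(with_cz, baseline)
transport_30_12 = mean_abs_transport(delta, 0, 1)
```

Subtracting the baseline removes state preparation and readout errors that are present with or without the gate, so the remaining off-diagonal entries can be attributed to the CZ gate itself.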
These results illuminate the dangers of leakage: a single leakage event on any qubit will expose many CZ gates to a leaked input state before it decays sufficiently. Each of these interactions has a significant probability to introduce new computational errors, move the leakage to another qubit, or induce additional leakage on previously non-leaked qubits. In QEC circuits, these effects are damaging enough that they must be included in simulations to achieve good agreement with experimental performance [17]. Accordingly, we are motivated to remove leakage in the code circuit so as to suppress these effects.

SUPPRESSING LEAKAGE POPULATIONS DURING A QEC CIRCUIT
Having better understood the dangers of leakage in QEC circuits, we turn to removing it. An unconditional reset gate can remove all energy from a qubit, including when it starts in a leakage state, and can be applied to the measure qubits at the end of each cycle [32][33][34][35][36]. However, our study of leakage transport motivates the need to remove leakage from the data qubits as well. Leaving the computational state intact is incompatible with unconditional reset and requires a more delicate leakage removal operation.
Three broad approaches for leakage removal have been proposed: swap-type [27,37], where the roles of measure and data qubits are exchanged at a regular interval by the use of additional operations; feedback-type [29,30], where the leakage is identified classically from measurement patterns and feedback is applied to return the qubit to the computational subspace; and direct-type [38], where an operation is used to remove leakage from a qubit without disturbing the computational states. In light of our findings on leakage transport, swap-type strategies become more difficult to justify; only half the qubits are reset in each cycle, and so leakage may still move between qubits and thereby spread through time. Similarly, the conditional nature of feedback-type approaches prevents them from fully solving the leakage problem: leakage states cause several errors before they are noticed and corrected. Hence, we pursue a direct removal approach.
In the following sections, we present and compare three leakage removal strategies. First, No Reset forgoes any operations at the end of the cycle, representing the best case for a simple Pauli error model, but the worst case for leakage. Second, MLR applies multi-level reset (MLR) gates [35] on measure qubits immediately after measurement at the end of every cycle. This introduces additional error into the cycle, because the data qubits idle while the gate is performed, but has been previously shown to remove leakage population and improve logical performance compared with the baseline No Reset strategy [35]. Finally, in DQLR we perform a multi-level reset on the measure qubits followed by data qubit leakage removal (DQLR), consisting of a two-qubit interaction to transport leakage from data to measure qubits and a reset gate on measure qubits. Additional details on the DQLR process and constituent operations are included in Supplementary Information Section S2.

Figure 3. Leakage population during surface code execution. a) Average leakage populations for data qubits (squares) and measure qubits (circles) measured at the end of each surface code cycle with No Reset (red), MLR (green), and DQLR (blue). b) Top: The surface code circuit shown for a pair of neighboring measure and data qubits. Each surface code cycle is highlighted (rounded rectangles). Bottom: A single surface code cycle showing each moment in the cycle. c) Leakage populations after each moment in the cycle for MLR (green) and DQLR (blue) leakage removal strategies, averaged over data qubits (squares) and measure qubits (circles) and over cycles 25-30.

To compare the leakage dynamics for the three strategies, we implement a distance-3 surface code on a Sycamore processor. We measure the evolution of leakage population as the surface code progresses by truncating the circuit in time and performing a measurement that can resolve |2⟩ on all qubits [35]. In Figure 3a, we perform this truncation at the end of each surface code cycle (top of Figure 3b). Using No Reset, we observe a gradual rise in leakage populations over all qubits, reaching nearly 5% average leakage population for data qubits and nearly 3% for measure qubits over 30 cycles. We note that, even after 30 cycles, leakage populations have not stabilized and continue to grow. Using MLR reduces average measure qubit leakage populations to about 3 × 10⁻⁴, but average data qubit populations still rise to over 1.5%. Using DQLR suppresses average leakage populations to around 10⁻³ for data qubits and less than 10⁻⁴ for measure qubits. Most importantly, DQLR maintains these levels throughout the full 30 cycles.
We can use the same technique to study the dynamics of leakage within a surface code cycle, by truncating the circuit at each moment midway through a cycle (bottom of Figure 3b). Figure 3c shows the leakage population measured after each moment in the cycle, averaged over cycles 25-30, where the leakage populations have stabilized. We neglect the No Reset strategy here, as leakage populations do not stabilize. With MLR, the average leakage population on the data qubits saturates to a stable value around 1.5%, consistent with Figure 3a. However, the average measure qubit leakage population starts each cycle at a very low value near 2 × 10⁻⁴, grows over the course of the cycle as operations produce leakage, and is then reduced back to its initial low value by the reset procedure. This lets us estimate that the operations produce around 5 × 10⁻³ leakage in each cycle. With DQLR, we see that leakage populations for both measure and data qubits grow over the course of the cycle, and are removed by the reset procedure. The data qubits start each cycle with around 1 × 10⁻³ leakage population, again increasing to around 5 × 10⁻³ immediately following measurement, before it is removed. The measure qubits attain even lower leakage populations compared to MLR.
These results demonstrate that our DQLR procedure successfully suppresses steady-state leakage populations to previously unachievable levels and stabilizes those levels over the course of a long QEC circuit. The removal strategy also contains the leakage dynamics to a single cycle. However, the residual ability for leakage to spread and induce correlated errors within a single round [28] should be the subject of further study.

EFFECT ON QEC LOGICAL PERFORMANCE
Having achieved low leakage populations in both data qubits and measure qubits with our DQLR procedure, we turn to evaluating logical performance. We consider two codes providing complementary information: a distance-21 bit-flip code and a distance-3 surface code. Our physical qubit error rates place the surface code close to threshold, whereas the bit-flip code is well below threshold [14,17]. The vastly lower logical error rates for the bit-flip code give us finer resolution on the effect of leakage within the code. In contrast, the surface code is a more challenging circuit for calibration and operation, and is sensitive to both bit-flip and phase-flip errors, providing an environment where more potentially adverse effects from reset can be detected and measured.
Figure 4a shows the logical error probability of a

[Figure 4b: Pauli injection]
We compare P_L to injected Pauli error "population" P_P, which is produced by X and Z rotations on the data and measure qubits (Figure 4b, right), respectively, taking advantage of the classical nature of the bit-flip code. The Pauli error rotation angle θ_P is where the missing factor of 2 relative to the definition of leakage population accounts for Pauli rotations always affecting the qubit state in the computational basis, whereas leakage injection only applies to qubit population in |1⟩. We fit the experimental data and numerical simulations to an offset power law as a guide, as detailed in Section S5 of the Supplementary Information.
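The relation between a rotation angle and the injected population can be illustrated with a short sketch. It rests on assumptions that are not confirmed by the text: a resonant rotation by angle θ between two levels is taken to transfer a fraction sin²(θ/2) of the population, the injected Pauli population is taken to equal that fraction, and the leakage population is taken to be that fraction multiplied by the |1⟩ population, here set to one half. The function names are hypothetical.

```python
import math

def pauli_rotation_angle(p_pauli):
    """Rotation angle whose transferred fraction sin^2(theta / 2) equals p_pauli.

    Assumed model: the rotation acts on whatever computational state the qubit is in.
    """
    return 2 * math.asin(math.sqrt(p_pauli))

def leakage_rotation_angle(p_leak, p_one=0.5):
    """Angle of a |1> <-> |2> rotation that yields leakage population p_leak.

    Assumed model: only the population in |1> (p_one, taken as one half) can be
    rotated into |2>, so the transferred fraction must be p_leak / p_one.
    """
    return 2 * math.asin(math.sqrt(p_leak / p_one))

# Under these assumptions a full (pi) leakage rotation gives a 50% leakage
# population, in line with the near-50% figure quoted for full injection.
full_injection_angle = leakage_rotation_angle(0.5)
```

In this picture the same rotation angle yields twice as much Pauli population as leakage population, which is one way to read the factor of 2 mentioned above.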
With No Reset, even small amounts of injected leakage population less than 1% cause the logical error probability to rise above 40%. This is in contrast with correctable Pauli errors, which can be introduced to around 5% population before similar logical error probabilities are encountered. With MLR, the logical error probability is drastically lowered without injection, consistent with prior measurements in bit-flip codes [35]. Still, the logical error probability rises much more rapidly when injecting leakage compared to injecting Pauli errors. We attribute this to unmitigated leakage accumulation on the data qubits, which leads to high decomposed weight of uncorrelated errors and ultimately logical errors. When we prevent this leakage buildup with DQLR, we observe a much smaller difference between the code's response to injected leakage compared to injected Pauli errors. This is strong evidence that the DQLR operation has successfully reduced the decomposition weight of a leakage event to near 1. In this situation, leakage has around the same influence on logical performance as an equivalent amount of Pauli error, and has been prevented from effectively spreading and inducing correlated errors.
We also note the good agreement between data and numerical simulation for injected leakage and Pauli errors, quantifying our understanding of the effects of leakage in the code with both MLR and DQLR strategies. In both cases, we note that we slightly underestimate the logical error induced by injected leakage, illustrating the difficulties of fully capturing the effect of correlated errors even with DQLR preventing substantial spread across cycles, and emphasising the importance of future work on leakage dynamics inside a single cycle. Nonetheless, the close correspondence of the Pauli simulation to the injected leakage experimental data for DQLR helps justify future Pauli simulations as useful estimates of final code performance when leakage is removed each cycle.
Figure 5a shows the average detection probabilities corresponding to the weight-4 stabilizers in the distance-3 surface code. Detection probabilities are the fraction of the total number of experiments where an error was detected on a given stabilizer. With No Reset, the buildup of leakage population produces more errors as the code progresses, creating a rising pattern of detection probability. With MLR, a large portion of this rise is mitigated, but the detection probability still rises by 2.5% over the course of the first 15 cycles. With DQLR the detection probability immediately stabilizes to around 18% and remains steady throughout the code duration. We attribute this to the recurrent removal of leakage on all qubits preventing growth in leakage populations and resulting correlated errors over time. This resolves a key concern in state-of-the-art QEC work [15][16][17] where detection probabilities were found to rise even with partial leakage removal or post-selection. These results confirm the relationship between rising detection probability and rising leakage populations and demonstrate the resolution of this effect.

Figure 5. a) The detection probability averaged for the weight-4 stabilizers in a distance-3 surface code under the three leakage removal strategies studied in this work. b) Logical error probability for a distance-3 surface code run for 15 cycles, under varying injected leakage population and the three different leakage removal strategies studied in this work. The inset shows that the circuit has an included layer where leakage is injected by performing a |1⟩ ↔ |2⟩ rotation. c) Dependence of projected surface code error budget 1/Λ_{5/7} (the inverse of the exponential error suppression factor between a distance-5 and distance-7 surface code) under injected leakage population compared between MLR (green) and DQLR (blue). Solid lines are fits to a ratio of offset power laws, while the dotted light blue line is a linear fit of the data using DQLR.
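The detection probability used in Figure 5a, the fraction of experiments in which a detector fired, can be computed per cycle as in the following sketch. The data layout (one list of 0/1 detection events per experimental shot) and the function name are assumptions, and the example events are made up.

```python
def detection_probability(detection_events):
    """Per-cycle fraction of shots in which the stabilizer's detector fired.

    detection_events[shot][cycle] is 1 if an error was detected on the
    stabilizer in that cycle of that shot, and 0 otherwise.
    """
    n_shots = len(detection_events)
    n_cycles = len(detection_events[0])
    return [sum(shot[cycle] for shot in detection_events) / n_shots
            for cycle in range(n_cycles)]

# Four hypothetical shots of a three-cycle experiment for one stabilizer.
events = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
probabilities = detection_probability(events)
```

A flat curve of this quantity against cycle number is the signature of stable operation described above, while an upward drift indicates accumulating errors.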
In Figure 5b, we evaluate the three leakage removal strategies by measuring the logical error probability of a distance-3 surface code after 15 cycles. At 0% injected leakage the circuit corresponds to the standard code circuit with an additional idle where the injection is otherwise inserted. Over the range of injected leakage population values, No Reset exhibits the worst logical performance, followed by MLR, with DQLR having the lowest logical error probability. This confirms that DQLR improves logical errors by suppressing correlated errors from leakage, despite the additional cycle time and errors introduced by the DQLR operations. Further, No Reset and MLR degrade in logical performance faster with more injected leakage when compared to DQLR.
In order to study surface code performance in a regime further below threshold, we turn to numerical simulations of distance-5 and distance-7 surface codes. To consider scaling performance, we use the exponential error suppression factor Λ_{5/7}, defined as Λ_{5/7} = ε_5/ε_7, where ε_5 and ε_7 are the logical error rates for a distance-5 and distance-7 surface code, respectively. In Figure 5c, we investigate Λ_{5/7} for a hypothetical device with lower component errors than what is currently realizable (see Supplementary Information Section S6 for details). In particular, we set intrinsic leakage rates to zero and vary the probability of leakage injection. With no leakage in the system, Λ_{5/7} ≈ 7.2, independent of leakage removal strategy. However, when injecting up to 4 × 10⁻³ leakage population per round (comparable to intrinsic leakage rates in current devices), the surface code error budget 1/Λ_{5/7} [17] rises rapidly and nonlinearly for MLR. In contrast, with DQLR, leakage increases 1/Λ_{5/7} much more slowly and with a near-linear dependence on injected leakage population, characteristic of an uncorrelated error source [14,17]. With this ability to maintain effective error suppression in the presence of leakage, DQLR successfully mitigates the dangers of correlated leakage-induced errors to scalable QEC.
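The suppression factor and error budget follow directly from the two logical error rates. The sketch below uses hypothetical function names and made-up error rates chosen so that the ratio lands close to the no-leakage value of 7.2 quoted above.

```python
def suppression_factor(eps_smaller_code, eps_larger_code):
    """Lambda: ratio of the logical error rate at distance d to that at distance d + 2."""
    return eps_smaller_code / eps_larger_code

def error_budget(lam):
    """Error budget 1 / Lambda; smaller values mean stronger suppression."""
    return 1.0 / lam

# Hypothetical logical error rates for distance 5 and distance 7.
eps_5 = 3.6e-6
eps_7 = 5.0e-7

lam_5_7 = suppression_factor(eps_5, eps_7)  # close to 7.2
budget_5_7 = error_budget(lam_5_7)          # close to 0.139
```

Working with 1/Λ rather than Λ is convenient when comparing error sources, since a near-linear rise of 1/Λ with the injected error is what the text identifies as the behaviour of an uncorrelated error source.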

SUMMARY AND OUTLOOK
We have demonstrated the effective removal of leakage from all qubits involved in a surface code QEC circuit. Moreover, we have shown that when leakage is removed on all qubits, correlated leakage-induced errors are suppressed. At the same time, the logical performance of the code improves outright and stabilizes in time. We confirm the conjecture that growth in logical errors is attributable to leakage, and we do not uncover other major sources of logical error that grow as the code continues in time.
With these findings, we unequivocally resolve the longstanding concern that qubits with weak nonlinearity cannot successfully implement QEC at long times due to correlated leakage-induced errors. As such, we confirm that large arrays of transmon qubits are a viable and promising architecture for QEC at scale.

DECLARATIONS
During the diabatic CZ gate, additional levels are placed on resonance and contribute to the leakage transport phenomenon depicted in the main text Figure 2. The resonance |30⟩ ↔ |12⟩ allows a two-photon transition mediated by |21⟩, which is detuned by around the nonlinearity η. If g is the induced coupling between |11⟩ and |20⟩, then the effective couplings are: Then, for a CZ gate where g is maintained for time t, the population transport P_t for |30⟩ ↔ |12⟩ can be estimated as

To measure the leakage transport in the diabatic CZ gate, we calibrate a readout pulse capable of distinguishing all of the four lowest qubit energy levels, as shown in Figure S1a. When we encounter |4⟩ during measurement using this readout pulse, we assign and identify it as |3⟩. The baseline experiment consists of preparing a given two-qubit state using microwave drives and then performing simultaneous readout of the qubits. The "Baseline" matrix of Figure S1b shows the results of this experiment, illustrating that performing readout simultaneously does not impact the high distinguishability between all two-qubit states. We can also see that the majority of the error in this simultaneous readout is due to T_1 decay during the readout process. These decay channels reduce the populations on the main diagonal by a few percent, and become more prominent for higher levels. The "With CZ gate" matrix of Figure S1b shows the same experiment with a CZ gate inserted between state preparation and measurement. We then see new off-diagonal processes corresponding to leakage transport. We subtract the "Baseline" matrix from the "With CZ gate" matrix to produce the matrix shown in Figure 2b of the main text. The leakage transport processes that are most relevant to the spread of leakage in our system correspond to the |30⟩ ↔ |12⟩ and |31⟩ ↔ |22⟩ rotations. In Figure S1c, we numerically show the individual relative population transport values ΔP_t for these two processes. We take the mean of the absolute value of the relative population transport values to determine the values at the bottom of Figure 2b of the main text.
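As a rough numerical illustration of how leakage transport could be estimated from the CZ coupling, the sketch below makes several assumptions that are not taken from the text: harmonic-oscillator matrix elements (the coupling for |n, m⟩ ↔ |n−1, m+1⟩ scales as √(n(m+1))), a second-order effective coupling for the two-photon process through |21⟩ detuned by |η|, resonant population transfer of sin²(coupling × t), and a gate duration t = π/g for a full 2π rotation in |11⟩ ↔ |20⟩. Level shifts and other corrections are ignored, the parameter values are hypothetical, and the expressions are not necessarily those used by the authors.

```python
import math

def hop_coupling(n_high, n_low, g):
    """Coupling for |n_high, n_low> <-> |n_high - 1, n_low + 1>.

    Uses harmonic-oscillator matrix elements, normalised so that the
    |20> <-> |11> coupling equals g (an assumed model).
    """
    return g * math.sqrt(n_high * (n_low + 1) / 2)

def two_photon_coupling(g, eta):
    """Second-order coupling for |30> <-> |12> via |21>, detuned by |eta|."""
    return hop_coupling(3, 0, g) * hop_coupling(2, 1, g) / abs(eta)

def transport_population(coupling, t):
    """Population transferred between two resonant levels after time t."""
    return math.sin(coupling * t) ** 2

# Hypothetical parameters in rad/ns.
g = 2 * math.pi * 0.019
eta = -2 * math.pi * 0.200
t_cz = math.pi / g  # duration of a full 2*pi rotation in |11> <-> |20>

p_30_12 = transport_population(two_photon_coupling(g, eta), t_cz)
p_31_22 = transport_population(hop_coupling(3, 1, g), t_cz)
```

In this simplified model the direct |31⟩ ↔ |22⟩ transport depends only on the ratio of matrix elements, whereas the two-photon |30⟩ ↔ |12⟩ transport also depends on g/|η| and hence on the chosen gate length.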
To measure the spurious phase shift generated by leaked higher-energy qubits undergoing the CZ gate, we perform the experiment detailed in Figure S2a. The higher-energy qubit in the pair is prepared in each of |0⟩, |1⟩, and |2⟩, while a Ramsey experiment is performed on the lower-energy qubit with an interleaved CZ gate. On the lower-energy qubit, we vary the tomography phase φ_T of the second pulse relative to the first pulse in the Ramsey experiment to record data as shown in Figure S2b for each input state. We fit a sinusoid to the experimental data and extract the phase offset φ. We perform this experiment on 20 pairs of qubits, and show the empirical cumulative distribution function of the extracted phase offsets in Figure 2e of the main text.
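One simple way to extract a phase offset from such a Ramsey scan is sketched below. It assumes the measured population follows offset + amplitude·cos(φ_T + φ) with positive amplitude, and that φ_T is sampled uniformly over one full period, in which case the fit reduces to projecting the data onto cosine and sine. The sign convention, the function name, and the synthetic data are assumptions, and the fitting method actually used is not specified in the text.

```python
import math

def fit_phase_offset(tomography_phases, populations):
    """Estimate phi in P = offset + amp * cos(phi_T + phi), with amp > 0.

    Assumes the tomography phases are uniformly spaced over one full period,
    so the constant offset averages out of both projections.
    """
    cos_sum = sum(p * math.cos(x) for x, p in zip(tomography_phases, populations))
    sin_sum = sum(p * math.sin(x) for x, p in zip(tomography_phases, populations))
    # cos_sum is proportional to cos(phi) and sin_sum to -sin(phi).
    return math.atan2(-sin_sum, cos_sum) % (2 * math.pi)

# Synthetic scan with a known phase offset (hypothetical numbers).
n_points = 16
true_phase = 0.65 * math.pi
phases = [2 * math.pi * k / n_points for k in range(n_points)]
data = [0.5 + 0.4 * math.cos(x + true_phase) for x in phases]

fitted_phase = fit_phase_offset(phases, data)
```

For noisy data or non-uniform sampling a general least-squares fit would be more appropriate; the projection is used here only because it is short and exact for the uniform case.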

S2. LEAKAGE REMOVAL STRATEGY DETAILS
We studied three leakage removal strategies: No Reset, MLR and DQLR. We now describe them in greater detail.
For No reset, we add no additional operations at the end of each cycle. Because this prepares the qubit for the next cycle in whichever state was measured rather than deterministically in $|0\rangle$, this also requires the redefinition of the detectors in the surface and bit-flip code circuits: rather than comparing time-neighboring measurements, we compare time-next-neighboring measurements on the same measure qubit to detect errors [10]. This redefinition has an insignificant impact on code performance, especially when compared to the studied effects of leakage, and so we neglect it from our analysis.
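The detector redefinition can be illustrated with a short sketch. It assumes an idealized picture in which, without reset, each measurement outcome is the previous outcome XORed with the current stabilizer parity; the function name `detection_events` is hypothetical, and the first-cycle boundary detectors are ignored.

```python
import numpy as np

def detection_events(measurements, reset):
    """Detection events for one measure qubit from its per-cycle outcomes (0/1).

    With reset, the outcome equals the stabilizer parity, so consecutive
    outcomes are compared. Without reset, the outcome accumulates parities,
    so outcomes two cycles apart are compared instead.
    """
    m = np.asarray(measurements, dtype=np.uint8)
    lag = 1 if reset else 2
    return m[lag:] ^ m[:-lag]

# Error-free example with a constant stabilizer parity of 1.
with_reset = detection_events([1, 1, 1, 1, 1], reset=True)      # outcomes repeat
without_reset = detection_events([1, 0, 1, 0, 1], reset=False)  # outcomes toggle
```

In the error-free example both definitions report no detection events, whereas comparing consecutive outcomes in the no-reset record would flag every cycle.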
For MLR, we add the multi-level reset (MLR) operation introduced in [35] on the measure qubits at the end of each cycle. Additional pulse shaping on the diabatic return and calibration improvements allow us to achieve gate times of 160 ns without impacting performance.
For DQLR, we first perform the MLR operation on all measure qubits, as in MLR. Following that, we perform the DQLR procedure, which consists of a LeakageISWAP gate between pairs of measure and data qubits, and a second reset of the measure qubits. The LeakageISWAP gate is similar to the diabatic CZ gate used in the surface code and bit-flip code cycles, but executes an ISWAP gate in the $\{|11\rangle, |20\rangle\}$ subspace. We note that this DQLR procedure relies on the high fidelity of the preceding MLR operation; when the measure qubit is prepared in $|0\rangle$, the LeakageISWAP gate removes $|2\rangle$ on the data qubits. Any reset error leaving $|1\rangle$ on the measure qubits will be converted into leakage on the data qubit by the LeakageISWAP gate. Our results show that this error path is sufficiently low in probability so as not to increase the leakage population on the data qubits.
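The action of the LeakageISWAP gate on the relevant states can be sketched with an idealized two-qutrit unitary. This is a toy model under stated assumptions: the state ordering (data, measure), the $-i$ phase convention of the swap, and the helper names `ket`, `leakage_iswap`, and `data_leakage` are all illustrative, and the model ignores any phases or errors of the physical gate.

```python
import numpy as np

DIM = 3  # three levels per transmon: |0>, |1>, |2>

def ket(data, measure):
    """Basis state |data, measure> of two qutrits as a length-9 vector."""
    v = np.zeros(DIM * DIM, dtype=complex)
    v[DIM * data + measure] = 1.0
    return v

def leakage_iswap():
    """Identity everywhere except an iSWAP-like exchange of |11> and |20>."""
    u = np.eye(DIM * DIM, dtype=complex)
    a, b = DIM * 1 + 1, DIM * 2 + 0  # indices of |11> and |20>
    u[a, a] = u[b, b] = 0.0
    u[a, b] = u[b, a] = -1j          # assumed phase convention
    return u

def data_leakage(state):
    """Population of |2> on the data qubit."""
    probs = np.abs(state) ** 2
    return float(probs.reshape(DIM, DIM)[2, :].sum())

U = leakage_iswap()
removed = data_leakage(U @ ket(2, 0))   # leaked data, measure in |0>
injected = data_leakage(U @ ket(1, 1))  # reset error left measure in |1>
```

In this model a leaked data qubit paired with a measure qubit in $|0\rangle$ ends with no data leakage, while a residual $|1\rangle$ on the measure qubit turns a data qubit in $|1\rangle$ into a leaked one, which is the error path described above.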
Ideally, DQLR should not induce additional errors on the data qubit. In particular, when the lower-energy measure qubit is in $|0\rangle$, the LeakageISWAP gate should act as an identity operation on the data qubit computational basis. However, the non-zero time to execute the DQLR procedure introduces incoherent errors caused by relaxation, in addition to coherent errors from miscalibration. We evaluate the impact of the DQLR procedure on the data qubit state using cross-entropy benchmarking (XEB). The inset of Figure S3 shows the experimental circuit used to evaluate XEB error. The upper and lower qubits mimic the role of data and measure qubits, respectively. The section of the circuit within the parentheses is repeated a variable number of times, and the final state of the upper qubit is measured. The cross-entropy of the measured and expected distributions of states is calculated as a function of repetitions, and then the XEB error per repetition is extracted. In a given repetition, a random unitary U is executed on the upper qubit, followed by a reset operation R. In the case of DQLR, R is substituted with the DQLR procedure. We compare this to the Idle case, where R is replaced with waiting for the duration of the DQLR procedure. We carry this measurement out over the 9 pairs of data and measure qubits corresponding to the pairings used in the distance-3 surface code experiment. By comparing the resulting distribution of XEB error per cycle for DQLR and Idle, we conclude that the DQLR operation does not induce significantly more errors than idling for the equivalent duration. Furthermore, the mean error rate of less than $2.5 \times 10^{-3}$ per cycle when using the DQLR operation is low enough that the operation is suitable to be added to a sensitive QEC circuit such as the surface code. We note that leakage is still a relevant consideration in this XEB experiment, and is captured by XEB error per cycle as an incoherent error. Thus, it is possible that the leakage removal properties of DQLR result in underreported XEB error per cycle when compared to Idle, which allows for leakage to accumulate over the course of the XEB circuit.

S3. EFFECT OF RESET STRATEGIES ON LEAKAGE DYNAMICS
As we have demonstrated in the main text, leakage transport can move leakage population from qubit to qubit in a structured circuit such as the surface code. Once a data qubit is leaked, leakage removal strategies must be employed or the leakage population may remain for many QEC cycles and cause additional leakage and leakage-induced error through leakage transport. In Figure S4, we evaluate the dynamics of leakage population in a surface code under the three leakage removal strategies discussed in this work. We fully inject leakage at the beginning of the first cycle by performing a $|1\rangle \to |2\rangle$ rotation on the central data qubit to obtain near-50% population of $|2\rangle$. We measure excess leakage population, which is defined as the leakage population without injection subtracted from the leakage population with injection. This allows us to separate the contribution of intrinsic heating to leakage population dynamics from the injected leakage population.
For No reset, leakage transport allows for leakage to move freely throughout the qubits surrounding the central data qubit, eventually resulting in measurable excess leakage population in nearly all of the 17 qubits involved in the distance-3 surface code. The lack of leakage removal procedures on either the measure qubits or data qubits leaves $T_1$ energy relaxation of $|2\rangle$ as the primary mechanism for leakage population mitigation. As we showed in Figure 1c of the main text, $T_1$ of $|2\rangle$ can be longer than 10 surface code cycles, and this number is expected to increase as qubit coherence improves and as surface code cycle durations shorten. Hence, relying on energy relaxation is not a viable strategy for leakage removal in QEC. This is partially addressed by MLR by applying a multi-level reset gate on all measure qubits at the end of each cycle. In Figure S4, the effect of this operation appears as reduced excess leakage populations on all measure qubits at the end of every cycle. However, leakage transport within a QEC cycle can still move leakage population beyond nearest-neighbor qubits. This is readily observed in the excess leakage population of the data qubit two sites away from the central data qubit, which exhibits increasing population even though the measure qubit between the two data qubits has its leakage population removed by the MLR operation at the end of every cycle.
DQLR suppresses the ability of leakage to hop between data qubits by directly removing a large fraction of all the data qubits' leakage populations at the end of each cycle. In particular, the central data qubit has its excess leakage population reduced to less than 1% after the first cycle. Similarly, other data qubits that previously leaked due to leakage transport have their excess leakage populations reduced close to the measurement floor. The shortened lifetime of leakage on all qubits is clearly seen for DQLR after two QEC cycles, where excess leakage population over all qubits is less than $1 \times 10^{-3}$.

S4. EFFECT OF RESET STRATEGIES ON QEC ERROR DETECTION
In Figure S5, we compare the time dynamics of the bit-flip code under the three different reset strategies presented in the main text. We execute a distance-21 bit-flip code over 60 cycles, as described by the circuit in Figure 4d of the main text. For MLR and DQLR, we inject both 0% and 1% leakage population in each cycle, whereas for No reset we do not inject leakage. In the first few cycles of code execution, differences in the logical performance of the bit-flip code between the various leakage removal strategies and injection populations are difficult to distinguish; we attribute this to time-boundary effects, where physical errors have not sufficiently accumulated to manifest as a logical error, and to statistical limitations, where the logical error probability is much smaller than what is resolvable by the number of trials. However, the logical performance for the three leakage removal strategies begins to diverge after about 10 cycles. The accumulation of leakage on measure and data qubits causes a rapid rise of logical error probability for No reset, where it exceeds $1 \times 10^{-2}$ by 25 cycles. This is in contrast to MLR and DQLR without leakage injection, which continue to have less than $3 \times 10^{-3}$ logical error probability through 60 cycles; DQLR has the best performance with about $1 \times 10^{-3}$ logical error probability at 60 cycles.
Turning to the cases where we inject 1% leakage population per cycle for MLR and DQLR, we observe a notable distinction in logical error probability scaling over cycle number. With MLR, the logical error probability rises to about $1 \times 10^{-2}$ by 30 cycles, with a similar qualitative behavior to No reset without leakage injection. However, when we use DQLR, the code sustains logical error probabilities of less than $5 \times 10^{-3}$ over 60 cycles. This is a signature that QEC scaling in time can be more easily achieved when using DQLR.
In Figure S6, we present the average weight-2 stabilizer detection probabilities in addition to the average weight-4 stabilizer values already shown in Figure 5a of the main text. Additionally, we show the detection probabilities associated with the individual stabilizers. We can draw parallel conclusions for the behavior of weight-2 stabilizers as we did for weight-4 stabilizers in the main text. For No reset, the average weight-2 stabilizer detection probabilities rise and do not stabilize over the course of 30 surface code cycles, and even exhibit damped oscillations at early cycles. When using MLR, the detection probability stability improves significantly and the average probability only rises by about $1 \times 10^{-2}$ over 10 cycles before flattening. However, the best performance and stability are achieved with DQLR, where the average weight-2 stabilizer detection probability remains at 11% over the course of the entire 30-cycle experiment.

S5. FITTING TECHNIQUES FOR LOGICAL ERROR PROBABILITY
We employ phenomenological models to fit experimental and simulated data. The models are implemented on the logical error per cycle $\varepsilon$ of the QEC code. In experimental and simulated data, we measure the logical error probability $p_L$ after $n$ QEC cycles, which is then converted to $\varepsilon$. With respect to injected error population $P$, $\varepsilon$ can be modeled as a power law with an offset, where $a$, $b$, and $P_0$ are phenomenological free parameters. We do not ascribe a physical meaning to $P_0$ even though it may appear to be "intrinsic" error at $P = 0$. We use this power law model to fit the experimental data in Figures 4a and 5b of the main text, as well as the numerical simulations in Figure 4a of the main text.
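The conversion and the offset power law can be sketched numerically. Both functional forms below are assumptions rather than forms confirmed by this text: the conversion $p_L = \tfrac{1}{2}\left[1 - (1 - 2\varepsilon)^n\right]$ is the relation commonly used for a logical error applied independently in each cycle, and $\varepsilon(P) = a\,(P + P_0)^b$ is one natural reading of "a power law with an offset" given the parameter names. The function names are hypothetical.

```python
import numpy as np

def logical_error_probability(eps, n):
    """Assumed conversion: p_L = (1 - (1 - 2*eps)**n) / 2."""
    return 0.5 * (1.0 - (1.0 - 2.0 * eps) ** n)

def logical_error_per_cycle(p_l, n):
    """Inverse of the assumed conversion, valid for p_l < 0.5."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p_l) ** (1.0 / n))

def offset_power_law(p, a, b, p0):
    """Assumed form of the offset power law: eps(P) = a * (P + p0)**b."""
    return a * (np.asarray(p, dtype=float) + p0) ** b

# Round trip with illustrative numbers (not taken from the experiment).
eps_in = 1e-3
p_l_60 = logical_error_probability(eps_in, 60)
eps_out = logical_error_per_cycle(p_l_60, 60)
```

Under the assumed conversion, $p_L$ saturates at 0.5 for large $n$, and for small $\varepsilon$ it is approximately $n\varepsilon$.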
In order to model $1/\Lambda_{5/7}$ as a function of injected leakage population $P_L$ in Figure 5c of the main text, we take the inverse of the ratio between distance-5 and distance-7 logical error per cycle,

$$\Lambda_{5/7}(P_L) = \frac{\varepsilon_5(P_L)}{\varepsilon_7(P_L)}. \tag{S4}$$

For logical error probability $p_L$ with respect to cycle $n$ of a bit-flip code operated well below threshold and in the presence of leakage dynamics (Figure S5), we use a Gompertz model to describe the logical error rate $\varepsilon$, where $a$, $b$, and $c$ are phenomenological free parameters. A Gompertz model can partly capture the transient dynamics of $\varepsilon$ at small $n$, where time-boundary effects and still-increasing leakage populations make $\varepsilon$ highly dependent on $n$. At large $n$, $\varepsilon$ tends to a constant as leakage populations stabilize.
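As a sketch of the two models in this paragraph, the snippet below evaluates the ratio of Equation S4 and a Gompertz curve. The Gompertz parameterization $\varepsilon(n) = a\,e^{-b\,e^{-c n}}$ is the textbook form and is an assumption here, as are the function names and the illustrative parameter values.

```python
import numpy as np

def inverse_lambda(eps5, eps7):
    """1/Lambda_{5/7}: inverse of the ratio eps5/eps7 from Equation S4."""
    return np.asarray(eps7, dtype=float) / np.asarray(eps5, dtype=float)

def gompertz(n, a, b, c):
    """Assumed textbook Gompertz form: eps(n) = a * exp(-b * exp(-c * n))."""
    n = np.asarray(n, dtype=float)
    return a * np.exp(-b * np.exp(-c * n))

# Illustrative parameters only; not fitted to any data.
a, b, c = 1e-3, 2.0, 0.2
cycles = np.arange(60)
eps_curve = gompertz(cycles, a, b, c)
```

With positive $b$ and $c$, this curve starts at $a\,e^{-b}$, rises monotonically through the transient at small $n$, and approaches the constant $a$ at large $n$, matching the qualitative behavior described above.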

S6. SURFACE CODE SIMULATIONS WELL BELOW THRESHOLD
To gain insight into the future importance of leakage removal for scaling quantum error correction, we performed numerical simulations of distance-5 and distance-7 surface codes and evaluated their logical performance subject to different levels of leakage injection carried out using the circuit shown in the inset of Figure 5b of the main text. We performed these simulations using a Kraus operator simulation detailed in [17]. We include operators that accurately reflect leakage transport, leakage phase errors, and the leakage removal parameters for both the MLR and DQLR strategies, but do not include any sources of leakage in the baseline error model at zero leakage injection. The error model for the baseline simulations is detailed in Table S1. We repeat these simulations with varying amounts of injected leakage population under two leakage removal strategies, MLR and DQLR; the results are presented in Figure 5c of the main text.
We fit the data from the numerical simulations in Figure 5c of the main text with a power law ratio model (Equation S4). Additionally, we fit the data for DQLR with a line. The high degree of linearity ($R^2 = 0.983$, $1/\Lambda_{5/7} \approx 111 \times P_L + 0.2$) suggests that the surface code error budget $1/\Lambda_{5/7}$ for leakage under DQLR can be linearized and modeled as an uncorrelated error source.
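A linear fit with its coefficient of determination can be sketched as follows. The helper name `linear_fit_r2` is hypothetical, and the points are synthetic: they are generated exactly on the quoted line $111 \times P_L + 0.2$ for illustration and are not the simulation results.

```python
import numpy as np

def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Synthetic points lying exactly on the quoted line (hypothetical P_L grid).
p_leak = np.array([0.0, 0.002, 0.004, 0.006, 0.008, 0.010])
inv_lambda = 111.0 * p_leak + 0.2
slope, intercept, r2 = linear_fit_r2(p_leak, inv_lambda)
```

On real data the residuals are non-zero, and $R^2$ below 1 quantifies how much of the variation in $1/\Lambda_{5/7}$ the line leaves unexplained.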

S7. ERROR CORRELATIONS IN THE SURFACE CODE EXPERIMENT
One technique to evaluate error correlations in the surface code is to employ $p_{ij}$ correlation matrices [14,17,35,39]. By analyzing the data presented in Figure S6 with $p_{ij}$ correlation matrices, we can elucidate the presence of error correlations in space and time over the course of the QEC experiment. For $p_{ij}$ matrices, $i$ and $j$ correspond to nodes in the detection graph, each of which has stabilizer (or space, $s$) and time ($t$) coordinates; $i = (s, t)$, $j = (s', t')$. To first focus on correlations in time, we average the matrix elements $p_{ij}$ with $s = s'$ over all $t$ and $t'$, producing autocorrelation matrices $\bar{p}_{t,t'}$, defined as

$$\bar{p}_{t,t'} = \frac{\sum_{s=s'} p_{ij}}{\sum_{s=s'} 1}. \tag{S6}$$

The autocorrelation matrix $\bar{p}_{t,t'}$ reports the average of the probabilities of error graph edges between nodes $i$ and $j$ associated with the same stabilizer $s$ and for arbitrary time separation $|t - t'|$.
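The averaging in Equation S6 can be sketched in code. The text does not give the $p_{ij}$ estimator itself; the closed form below is a commonly used two-point estimator for independent edge errors and is an assumption here, as are the function names and the array layout `(shots, stabilizers, cycles)`.

```python
import numpy as np

def pij_matrix(events):
    """Estimate p_ij from 0/1 detection events of shape (shots, nodes).

    Assumed estimator:
    p_ij = 1/2 - 1/2 * sqrt(1 - 4(<xi xj> - <xi><xj>) / (1 - 2<xi> - 2<xj> + 4<xi xj>)).
    """
    x = np.asarray(events, dtype=float)
    mean = x.mean(axis=0)
    joint = x.T @ x / x.shape[0]
    num = 4.0 * (joint - np.outer(mean, mean))
    den = 1.0 - 2.0 * mean[:, None] - 2.0 * mean[None, :] + 4.0 * joint
    p = 0.5 - 0.5 * np.sqrt(np.clip(1.0 - num / den, 0.0, None))
    np.fill_diagonal(p, 0.0)  # self-pairs are not edges
    return p

def autocorrelation(events):
    """Average p_ij over pairs on the same stabilizer (s = s'), as in Equation S6.

    events has shape (shots, stabilizers, cycles); returns a (cycles, cycles) matrix.
    """
    shots, n_stab, n_cycles = events.shape
    p = pij_matrix(events.reshape(shots, n_stab * n_cycles))
    p = p.reshape(n_stab, n_cycles, n_stab, n_cycles)
    same = np.stack([p[s, :, s, :] for s in range(n_stab)])
    return same.mean(axis=0)
```

For detection events produced only by independent errors that flip a stabilizer's detectors on consecutive cycles, the returned matrix is non-zero only on the first off-diagonals, up to sampling noise.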
In Figure S7a, we show averaged autocorrelation matrices $\bar{p}_{t,t'}$ for both X- and Z-basis stabilizers over the 30-cycle experiment under the three leakage removal strategies investigated in this work. Z-basis stabilizers do not report detection probabilities at the time-boundary cycles 0 and 30 and thus do not have $\bar{p}_{t,t'}$ elements associated with those cycles. Independent Pauli errors present themselves as detection events on consecutive cycles, $|t - t'| = 1$. In such an ideal setting with only fully independent errors, we expect non-local correlations to vanish, i.e. $\bar{p}_{t,t'} = 0$ where $|t - t'| > 1$. Leakage and other time-non-local error sources will break this assumption and force those $\bar{p}_{t,t'}$ elements to be non-zero.
For No reset, non-local correlations immediately appear at cycle 1 and increase in intensity over the course of the 30-cycle experiment. The high values for non-local $\bar{p}_{t,t'}$ matrix elements, greater than 1% at large correlation distances $|t - t'|$, indicate that non-local correlations are significant contributors to logical performance degradation, and that QEC cannot be practically scaled in time under these conditions [27].
For MLR, non-local correlations are visibly reduced when compared to No reset. Still, an impactful degree of non-local correlation remains, primarily stemming from data qubit leakage and resultant leakage-induced correlated errors, such as CZ gate phase shifts highlighted in Figure 2e of the main text.
For DQLR, nearly all non-local correlations are heavily suppressed in the experiment. This is seen by very small correlation magnitudes, less than 0.2%, at correlation distances greater than 1. Qualitatively, the effective suppression of time-non-local correlations suggests we are much closer to fulfilling the QEC requirement of uncorrelated errors. Furthermore, this is evidence that our DQLR procedure does not introduce additional unwanted correlations within the experiment.
We isolate and average $\bar{p}_{t,t'}$ matrix elements for cycles $t$ and $t'$ in 19-29, inclusive, in Figure S7b. This offers a quantitative profile of the degree of non-local correlations present in the surface code as a function of time correlation distance $t - t'$, which corresponds to the probability of encountering long-time error edges in the detection graph. Again, under ideal circumstances with fully independent errors, the correlation strength $\bar{p}_{t,t'}$ should be 0 at all $t - t'$ greater than 1. DQLR most closely approximates this condition, where $|\bar{p}_{t,t'}|$ exceeds $2 \times 10^{-3}$ only for distance-2 correlation and otherwise is below 0.2% for all distances up through 10. The 1 SD error bars for DQLR suggest that we cannot resolve variations in $|\bar{p}_{t,t'}| < 1 \times 10^{-3}$ for these non-local correlations. However, for MLR, $|\bar{p}_{t,t'}|$ is about an order of magnitude higher at 1% for distance-2 correlation, and slowly decays with distance, remaining above 0.1% even at distance-10. Finally, No reset never has $\bar{p}_{t,t'} < 1\%$ for any of the correlation distances considered here.
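Averaging the autocorrelation matrix by time distance within a cycle window can be sketched as below. The function name `mean_correlation_by_distance` is hypothetical, and the matrix used in the example is a synthetic placeholder with a known decay, not measured data.

```python
import numpy as np

def mean_correlation_by_distance(pbar, first, last, max_distance):
    """Mean of pbar[t, t'] over pairs with t' - t = k, for t and t' in [first, last]."""
    window = np.asarray(pbar, dtype=float)[first:last + 1, first:last + 1]
    return {
        k: float(np.mean(np.diagonal(window, offset=k)))
        for k in range(1, max_distance + 1)
    }

# Synthetic 30-cycle matrix whose entries decay as 0.1**|t - t'| (illustration only).
idx = np.arange(30)
toy = 0.1 ** np.abs(idx[:, None] - idx[None, :])
profile = mean_correlation_by_distance(toy, first=19, last=29, max_distance=10)
```

Each entry of the returned dictionary corresponds to one point of a correlation-versus-distance profile; with an 11-cycle window the distance-10 entry rests on a single matrix element, so its statistical uncertainty is the largest.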
In order to evaluate correlations across space, we can perform a similar analysis but now with nearest-neighbor stabilizers, averaging $p_{ij}$ over pairs with $s - s' = 1$ in place of $s = s'$ in Equation S6. The resulting average time correlations, which indicate the probability of encountering a long-diagonal error edge, are shown in the right panel of Figure S7b. By only considering time correlation distances of $t - t' > 1$, we primarily exclude contributions from CZ gate errors, which manifest as diagonal edges at $s - s' = 1$ and $t - t' = 1$ [17]. The remaining correlations at $t - t' > 1$ then predominantly arise from leakage and crosstalk and should be zero under ideal circumstances without these error sources. Similar to the average correlation $\bar{p}_{t,t'}$ for $s - s' = 0$, we observe the same hierarchy of correlation strengths for $s - s' = 1$ under No reset, MLR, and DQLR. No reset performs the worst, with correlation strengths at about 1% for all correlation distances studied here. MLR performs considerably better, but still has measurable correlations at time distances as large as 6 cycles. Given that we desire zero correlation at all time distances, DQLR most closely approximates this condition, with $|\bar{p}_{t,t'}|$ at 0.1% for distance-2 correlations, and otherwise well below 0.1% for longer time distances.
These observed correlation strengths suggest that if scalable QEC requires near-independent errors, complete leakage removal on all qubits must be carried out in some form. The DQLR strategy presented in this work provides one pathway to reaching that requirement. As a corollary, we show that No reset and MLR cannot support this requirement under existing leakage rates and transport mechanisms.

Figure 1. Leakage in a structured QEC circuit. a) The energy potential of a transmon qubit, illustrating the computational energy levels $|0\rangle$ and $|1\rangle$ (blue) and the leakage levels $|2\rangle$ and higher (red). b) The circuit for surface code QEC, showing a square grid composed of measure qubits (light blue circles) and data qubits (orange squares). The cycle consists of four layers of entangling gates, along with intervening single-qubit rotations, followed by the measurement (M) and reset (R). The reset operation here is shown across all qubits; it may be implemented as single-qubit operations on the measure qubit, or include entangling operations with various neighboring data qubits. c) The time decay (main, blue) and spatial spread (inset) of leakage in a distance-3 surface code following the injection of $|2\rangle$ on the central data qubit. Each cycle takes approximately 1 µs, and leakage population is measured at the end of each cycle. The expected decay of $|2\rangle$ from $T_1$ relaxation on the leaked qubit alone is indicated (dashed red). Excess leakage population is defined as the subtraction of leakage population in the absence of injection from the leakage population in the presence of injection.
Figure 3. Leakage population during surface code execution. a) Average leakage populations for data qubits (squares) and measure qubits (circles) measured at the end of each surface code cycle with No reset (red), MLR (green), and DQLR (blue). b) Top: The surface code circuit shown for a pair of neighboring measure and data qubits. Each surface code cycle is highlighted (rounded rectangles). Bottom: A single surface code cycle showing each moment in the cycle. c) Leakage populations after each moment in the cycle for MLR (green) and DQLR (blue) leakage removal strategies, averaged over data qubits (squares) and measure qubits (circles) and over cycles 25-30.

Figure 4. Bit-flip code logical performance and dependence on injected errors. a) Logical error probability for a distance-21 bit-flip code run for 60 cycles, under the effect of either injected leakage (dark circles) or injected Pauli errors (light squares). Three leakage removal strategies, No reset (red, unfilled), MLR (green, semi-filled), and DQLR (blue, filled), are considered. Lines are fits to experimental data using a power law with an offset. Below: Highlights of fits to experimental data (solid) and numerical simulations (dashed) for the MLR and DQLR strategies. b) Circuits for the bit-flip code, showing the error injection locations for both leakage (left) and Pauli errors (right).

Figure 5. Surface code logical performance and dependence on injected errors. a) The detection probability averaged for the weight-4 stabilizers in a distance-3 surface code under the three leakage removal strategies studied in this work. b) Logical error probability for a distance-3 surface code run for 15 cycles, under varying injected leakage population and the three different leakage removal strategies studied in this work. The inset shows that the circuit has an included layer where leakage is injected by performing a $|1\rangle \leftrightarrow |2\rangle$ rotation. c) Dependence of projected surface code error budget $1/\Lambda_{5/7}$ (the inverse of the exponential error suppression factor between a distance-5 and distance-7 surface code) under injected leakage population compared between MLR (green) and DQLR (blue). Solid lines are fits to a ratio of offset power laws, while the dotted light blue line is a linear fit of the data using DQLR.

Figure S1. Measuring leakage transport. a) Readout clouds for the pair of qubits used in the leakage transport experiment, showing good distinguishability between all of the lowest four qubit energy states. b) Raw measured population transport matrices for the two transport experiments. In both matrices, we can see the effect of $T_1$ decay during measurement, which is enhanced for the higher levels. Subtracting "Baseline" from "With CZ gate" produces the matrix shown in Figure 2b of the main text. c) Individual relative population transport values $\Delta P_t$ for relevant leakage transport mechanisms $|30\rangle \leftrightarrow |12\rangle$ and $|31\rangle \leftrightarrow |22\rangle$. The average of the norm of these values produces the population transport values at the bottom of Figure 2b of the main text.

Figure S2. Measuring leakage phases. a) Raw data for the leakage phase experiment on a single pair of qubits. Each of $|0\rangle$, $|1\rangle$, and $|2\rangle$ is prepared on the higher-energy qubit, and then a Ramsey experiment with an interleaved CZ gate is performed on the lower-energy qubit. The solid lines are sinusoidal fits and the dashed lines indicate the extracted phase shifts $\phi$ shown in the main text. b) The hardware circuit executed in the above experiment in the $|2\rangle$ case. The two initial rotations (X and L) on the higher-energy qubit (top) are removed when preparing $|0\rangle$, and the second rotation L is removed when preparing $|1\rangle$.
Figure S3. Cross-entropy benchmarking (XEB) of DQLR operation. Inferred XEB error per cycle for 9 data qubits when the DQLR operation is applied on the data qubit, compared to XEB error per cycle when the data qubit idles for the equivalent time. Shaded region indicates 1 SD error of inferred XEB error per cycle. Vertical dashed lines indicate the mean XEB error per cycle over all data qubits. (Inset) XEB circuit where random single-qubit unitary rotations (U) are repeatedly applied to the target qubit, interleaved with a reset operation (R), which is either DQLR or idle (Idle). The final state of the target qubit is measured. The cross-entropy between the measured and expected distribution of states is calculated and the resulting XEB error per cycle is inferred.

Figure S4. Leakage dynamics in a surface code with different removal strategies. Comparison of excess leakage population dynamics over 5 cycles for all qubits in a distance-3 surface code after a full $|1\rangle \to |2\rangle$ leakage injection during the first cycle. With No reset, leakage transport mechanisms lead to increasing leakage population over nearly all qubits involved in the code. With the introduction of MLR, a sink for leakage population is present on measure qubits, mitigating the spread of leakage from leakage transport effects. The central leakage-injected data qubit still remains significantly leaked, even after 5 cycles. Using DQLR, leakage populations on all qubits in the code are brought to about $1 \times 10^{-3}$ or lower within 2 cycles.
Figure S5. Logical performance of bit-flip code in time. Logical error probability of the distance-21 bit-flip code over 60 cycles for the three reset strategies No reset (red), MLR (green), and DQLR (blue). For MLR and DQLR, we also execute the code while injecting 1% leakage population per cycle. Early in the code (fewer than 5 cycles), boundary effects cause all three reset strategies to perform similarly. As the code progresses through more and more cycles, however, logical performance for the three strategies diverges as leakage populations are handled differently.
Figure S6. Distance-3 surface code detection probabilities. Average detection probabilities over 30 surface code cycles for weight-2 (circles) and weight-4 (squares) stabilizers for three leakage removal strategies No reset (red), MLR (green), and DQLR (blue). Individual stabilizer detection probabilities are shown as lighter lines.

Figure S7. Surface code $p_{ij}$ correlations. a) Average autocorrelated $p_{ij}$ matrices $\bar{p}_{t,t'}$ of X and Z stabilizers under different leakage removal strategies. Correlations along the upper and lower diagonals ($\bar{p}_{t,t'}$ where $|t - t'| = 1$) represent independent timelike errors, whereas non-local correlations ($\bar{p}_{t,t'}$ where $|t - t'| > 1$) can be primarily attributed to leakage-induced correlated errors. For No reset, non-local correlations intensify as the code is executed in time, suggesting increasing leakage-induced correlated errors from growing leakage population. With MLR, non-local correlations are reduced but remain, which are manifestations of data qubit leakage-induced correlated errors. For DQLR, complete leakage removal over all qubits results in suppression of non-local correlations to about $1 \times 10^{-3}$ or lower. b) (Left) Averaged autocorrelation $\bar{p}_{t,t'}$; $s - s' = 0$ over all stabilizers for cycles $t$ and $t'$ in 19-29, inclusive, under different leakage removal strategies. Average non-local correlation magnitudes do not exceed $2 \times 10^{-3}$ for DQLR, whereas No reset and MLR have exponential decays in correlation magnitude with respect to correlation distance. The inset shows an example of a distance-4 long-time error edge on a detection graph. (Right) Averaged nearest-neighbor correlation $\bar{p}_{t,t'}$; $s - s' = 1$ over all stabilizers, extracted under the same conditions as the averaged autocorrelation values. The inset shows an example of a distance-4 long-diagonal error edge on a detection graph.

Table S1. Hypothetical device error model for simulations carried out in Figure 5c of the main text.