Leakage detection for a transmon-based surface code

Leakage outside of the qubit computational subspace, present in many leading experimental platforms, constitutes a threatening error for quantum error correction (QEC) for qubits. We develop a leakage-detection scheme via Hidden Markov models (HMMs) for transmon-based implementations of the surface code. By performing realistic density-matrix simulations of the distance-3 surface code (Surface-17), we observe that leakage is sharply projected and leads to an increase in the surface-code defect probability of neighboring stabilizers. Together with the analog readout of the ancilla qubits, this increase enables the accurate detection of the time and location of leakage. We restore the logical error rate below the memory break-even point by post-selecting out leakage, discarding about 47% of the data. Leakage detection via HMMs opens the prospect for near-term QEC demonstrations, targeted leakage reduction and leakage-aware decoding and is applicable to other experimental platforms.

The presence of leakage errors has motivated investigations of its effect on the code performance and of strategies to mitigate it. A number of previous studies have focused on a stochastic depolarizing model of leakage [38,[40][41][42][43], allowing to explore large-distance surface codes and the reduction of the code threshold using simulations. These models, however, do not capture the full details of leakage, even though a specific adapta-tion has been used in the case of trapped-ion qubits [41][42][43]. Complementary studies have considered a physically realistic leakage model for transmons [36,39], which was only applied to a small parity-check unit due to the computational cost of many-qutrit density-matrix simulations. In either case, leakage was found to have a strong impact on the performance of the code, resulting in the propagation of errors, in the increase of the logical error rate and in a reduction of the effective code distance. In order to mitigate these effects, there have been proposals for the introduction of leakage reduction units (LRUs) [37,39,40,45] beyond the natural relaxation channel, for modifications to the decoding algorithms [17,38,40], as well as for the use of different codes altogether [42]. Many of these approaches rely on the detection of leakage or introduce an overhead in the execution of the code. Recently, the indirect detection of leakage in a 3-qubit parity-check experiment [20] was realized via a Hidden Markov Model (HMM), allowing for subsequent mitigation via post-selection. Given that current experimental platforms are within reach of quantummemory demonstrations, detailed simulations employing realistic leakage models are vital for a comprehensive understanding of the effect of leakage on the code performance, as well as for the development of a strategy to detect leakage without additional overhead.
In this work we demonstrate the use of computationally efficient HMMs to detect leakage in a transmon implementation of the distance-3 surface code (Surface-17) [46]. Using full-density-matrix simulations [27,47] we first show that repeated stabilizer measurements sharply project data qubits into the leakage subspace, justifying the use of classical HMMs with only two hidden states (computational or leaked) for leakage detection. We observe a considerable increase in the surface-code defect probability of neighboring stabilizers while a data  Schematic of the relevant interactions and the CZ error model for two transmons. a Sketch of the considered avoided crossings, arranged vertically versus energy and horizontally versus the frequency ω flux of Q flux . Inset: Schematic diagram of the frequency excursion taken by Q flux during a Net-Zero pulse [9]. b The parametrized CZ error model. Phase errors, SWAP-like exchanges, relaxation and decoherence are taken into account (see Section I A).
or ancilla qubit is leaked, a clear signal that may be detected by the HMMs. For ancilla qubits, we further consider the information available in the analog measurement outcomes, even when the leaked state |2 can be discriminated from the computational states |0 and |1 with limited fidelity. We demonstrate that a set of twostate HMMs, one HMM for each qubit, can accurately detect both the time and the location of a leakage event in the surface code. By post-selecting on the detected leakage, we restore the logical performance of Surface-17 (using targeted experimental parameters) below the memory break-even point, while discarding only ≈ 47% of the data. Finally, we outline a minimal set of conditions for our leakage-detection scheme to apply to other quantum-computing platforms. These results open the prospect for near-term demonstrations of fault tolerance even in the presence of leakage, as well as leakage-aware decoding [17,40] and real-time targeted application of LRU schemes [37,39,40].

A. Leakage error model
We develop an error model for leakage in superconducting transmons, for which two-qubit gates constitute the dominant source of leakage [5,6,9,11,12,[29][30][31][32][33][34], while single-qubit gates have negligible leakage probabilities [8,44]. We thus focus on the former, while the latter is assumed to induce no leakage at all. We assume that single-qubit gates act on a leaked state as the identity. Measurement-induced leakage is also assumed to be negligible.
We use full-trajectory simulations to characterize leakage in the Net-Zero implementation [9] of the controlledphase gate (CZ), considered as the native two-qubit gate in a transmon-based Surface-17, with experimentally targeted parameters (see Table I and Table II). This gate uses a flux pulse such that the higher frequency qubit (Q flux ) is fluxed down from its sweetspot frequency ω max to the vicinity of the interaction frequency ω int = ω stat − α, where ω stat is the frequency of the other qubit (Q stat ), which remains static, and α is the transmon anharmonicity. The inset in Fig. 1 a shows a schematic diagram of the frequency excursion taken by Q flux . A (bipolar) 30 ns pulse tunes twice the qubit on resonance with the |11 ↔ |02 avoided crossing, corresponding to the interaction frequency ω int . This pulse is followed by a pair of 10 ns single-qubit phasecorrection pulses. The relevant crossings around ω int are shown in Fig. 1 a and are all taken into account in the full-trajectory simulations. The two-qubit interactions give rise to population exchanges towards and within the leakage subspace and to the phases acquired during gates with leaked qubits, which we model as follows.
The model in Fig. 1 b considers a general CZ rotation, characterized by the two-qubit phase φ 11 for state |11 and φ = 0 for the other three computational states. The single-qubit relative phases φ 01 and φ 10 result from imperfections in the phase corrections. The conditional phase is defined as φ CZ = φ 11 − φ 01 − φ 10 + φ 00 , which for an ideal CZ is φ CZ = π. In this work, we assume φ 00 = φ 01 = φ 10 = 0 and φ CZ = φ 11 = π. We set φ 02 = −φ 11 in the rotating frame of the qutrit, as it holds for flux-based gates [35].
Interactions between leaked and non-leaked qubits lead to extra phases, which we call leakage conditional phases. We consider first the interaction between a leaked Q flux and a non-leaked Q stat . In this case the gate restricted to the {|02 , |12 } subspace has the effect diag e iφ02 , e iφ12 , which up to a global phase corresponds to a Z rotation on Q stat with an angle given by the leakage conditional phase φ L stat := φ 02 − φ 12 . Similarly, if Q stat is leaked, then Q flux acquires a leakage conditional phase φ L flux := φ 20 − φ 21 . These rotations are generally nontrivial, i.e., φ L stat = π and φ L flux = 0, due to the interactions in the 3-excitation manifold which cause a shift in the energy of |12 and |21 (see Appendix H). If the only interaction leading to non-trivial φ L stat , φ L flux is the interaction between |12 and |21 , then it can be expected that φ 12 = −φ 21 in the rotating frame of the qutrit, leading to φ L stat = π − φ L flux . Leakage is modeled as an exchange between |11 and |02 , i.e., |11 → √ [48]. We observe that the phase φ and the off-diagonal elements |11 02| and |02 11| do not affect the results presented in this work, so we set them to 0 for computational efficiency (see Appendix B). The SWAP-like exchange between |12 and |21 with probability L m , which we call leakage mobility, as well as the possibility of further leaking to |3 , are analyzed in Appendix H. The described operations are implemented in quantumsim [47] as instantaneous, while the amplitude and phase damping experienced by the transmon during the application of the gate are symmetrically introduced around them, indicated by light-orange arrows in Fig. 1 b. The dark-orange arrows indicate the increased dephasing rate of Q flux far away from ω max during the Net-Zero pulse. The error parameters considered in this work are summarized in Appendix B. In particular, unless otherwise stated, L 1 is set to 0.125% and φ L flux and φ L stat are randomized for each qubit pair across different runs, but not across CZ gates during the same run. We randomize φ L flux and φ L stat as they have not been characterized in experiment and we instead capture an average behavior.
Some potential errors are found to be small from the full-trajectory simulations of the CZ gate and thus are not included in the parametrized error model. The population exchange between |01 ↔ |10 , with coupling J 1 , is suppressed (< 0.5%) since this avoided crossing is offresonant by one anharmonicity α with respect to ω int . While |12 ↔ |21 is also off-resonant by α, the coupling between these two levels is stronger by a factor of 2, hence potentially leading to a larger population exchange (see Appendix H). The |11 ↔ |20 crossing is 2α away from ω int and it thus does not give any substantial phase accumulation or population exchange (< 0.1%). We have compared the average gate fidelity of CZ gates simulated with the two methods and found differences below ±0.1%, demonstrating the accuracy of the parametrized model.

B. Effect of leakage on the code performance
We implement density-matrix simulations [47] to study the effect of leakage in Surface-17 (Fig. 2). We follow the frequency arrangement and operation scheduling proposed in [46], which employs three qubit frequencies for the surface-code lattice, arranged as shown in Fig. 2 a. The CZ gates are performed between the high-mid and mid-low qubit pairs, with the higher frequency qubit of the pair taking the role of Q flux (see Fig. 1). Based on the leakage model in Section I A, only the high and mid frequency qubits are prone to leakage (assuming no leakage mobility). Thus, in the simulation those qubits are included as three-level systems, while the lowfrequency ones are kept as qubits. Ancilla-qubit measurements are modeled as projective in the {|0 , |1 , |2 } basis and ancilla qubits are not reset between QEC cycles. As a consequence, given the ancilla-qubit mea-  [46]. Pink (resp. red) circles with D labels represent low-(high-) frequency data qubits, while blue (resp. green) circles with X (Z) labels represent ancilla qubits of intermediate frequency, performing X-type (Z-type) parity checks. b Dependence of the logical error rate εL on the leakage probability L1 for a MWPM decoder (green) and for the decoding upper bound (red). The black solid line shows the physical error rate of a single transmon qubit. The dashed line corresponds to the recently achieved L1 in experiment [9]. Logical error rate εL for MWPM (c) and upper bound (d) as a function of the leakage conditional phases φ L flux and φ L stat (for L1 = 0.5%). Here, these phases are not randomized but fixed to the given values across all runs. The logical error rates are extracted from an exponential fit of the logical fidelity over 20 QEC cycles and averaged over 10 5 runs for b and 2×10 4 runs for c,d. Error bars correspond to 95% confidence intervals estimated by bootstrapping (not included in b due the error bars being smaller than the symbol size). decoder, whose weights are trained on simulated data without leakage [27,49] and we benchmark its logical performance in the presence of leakage errors. The logical qubit is initialized in |0 L and the logical fidelity is calculated at each QEC cycle, from which the logical error rate ε L can be extracted [27]. Figure 2 b shows that the logical error rate ε L is sharply pushed above the memory break-even point by leakage. We compare the MWPM decoder to the decoding upper bound (UB), which uses the complete densitymatrix information to infer a logical error. A strong increase in ε L is observed for this decoder as well. Furthermore, the logical error rate has a dependence on the leakage conditional phases for both decoders, as shown in Fig. 2 c,d.

C. Projection and signatures of leakage
We now characterize leakage in Surface-17 and the effect that a leaked qubit has on its neighboring qubits. From the density matrix (DM), we extract the probability p L DM (Q) = P(Q ∈ L) = 2|ρ Q |2 of a qubit Q being in the leakage subspace L at the end of a QEC cycle, after the ancilla-qubit measurements, where ρ Q is the reduced density matrix of Q.
In the case of data-qubit leakage, p L DM (Q) sharply rises to values near unity, where it remains for a finite number of QEC cycles (on average 9 QEC cycles for the considered parameters, see Table I). An example showing this projective behavior (in the case of qubit D 4 ), as observed from the density-matrix simulations, is reported in Fig. 3 a. This is the typical behavior of leakage, as shown in Fig. 3 b by the bi-modal density distribution of the probabilities p L DM (Q) for all the high-frequency data qubits Q. Given this projective behavior, we identify individual events by introducing a threshold p L th (Q), above which a qubit is considered as leaked. Here we focus on leakage on D 4 , sketched in Fig. 3 c. Based on a threshold p L th (D 4 ) = 0.5, we select leakage events and extract the average dynamics shown in Fig. 3 d. Leakage grows over roughly 3 QEC cycles following a logistic function, reaching a maximum probability of approximately 0.8. We observe this behavior for all three high-frequency data qubits D 3 , D 4 , D 5 .
We observe an increase in the defect probability of the neighboring ancilla qubits during data-qubit leakage. We extract the probability p d of observing a defect d = 1 on the neighboring stabilizers during the selected data-qubit leakage events, as shown in Fig. 3 e. As p L DM (D 4 ) reaches its maximum, p d goes to a constant value of approximately 0.5. This can be explained by data-qubit leakage reducing the stabilizer checks it is involved in to effective weight-3 anti-commuting checks, illustrated in Fig. 3 c and as observed in [20]. This anti-commutation leads to some of the increase in ε L for the MWPM and UB decoders in Fig. 2 b. Furthermore, we attribute the observed projection of leakage (see Fig. 3 d) to a back-  c-e Signatures of dataqubit leakage. c Sketch of how leakage on a data qubit, e.g. D4, alters the interactions with neighboring stabilizers, leading to their anti-commutation (see Appendix D). d The average projection of the leakage probability p L DM of D4 over all events, where this probability is first below and then above a threshold of p L th = 0.5 for at least 5 and 8 QEC cycles, respectively. e The average number of defects on the neighboring stabilizers of D4 over the selected rounds, showing a jump when leakage rises above p L th . f-h Signatures of ancillaqubit leakage. f Sketch of how leakage on an ancilla qubit, e.g. Z1, effectively disables the stabilizer check and probabilistically introduces errors on the neighboring data qubits. g We select realizations where Z1 was in the computational subspace for at least 5 QEC cycles, after which it was projected into |2 by the readout and remained in that state for at least 5 QEC cycles. h The corresponding defect rate on neighboring stabilizers during the period of leakage. The error bars, which were estimated by bootstrapping, are smaller than the symbol sizes.
action effect of the measurements of the neighboring stabilizers, whose outcomes are nearly randomized when the qubit is leaked (see Appendix D). The weight-3 checks can also be interpreted as gauge operators, whose pairwise product results in weight-6 stabilizer checks, which can be used for decoding [50][51][52][53], effectively reducing the code distance from 3 to 2.
We also find a local increase in the defect probability during ancilla-qubit leakage. For ancilla qubits, p L DM is defined as the leakage probability after the ancilla projection during measurement. Since in the simulations ancilla qubits are fully projected, p L DM (Q) = 0, 1 for an ancilla qubit Q, allowing to directly obtain the individual leakage events, as shown in Fig. 3 g. We note that an ancilla qubit remains leaked for 11 QEC cycles on average for the considered parameters (see Table I). We extract p d during the selected events, as shown in Fig. 3 h. In the QEC cycle after the ancilla qubit leaks, p d abruptly rises to a high constant value. We attribute this to the Z rotations acquired by the neighboring data qubits during interactions with the leaked ancilla qubit, as sketched in Fig. 3 f and described in Section I A. The angle of rotation is determined by φ L flux or φ L stat , depending on whether the leaked ancilla qubit takes the roles of Q stat or Q flux , respectively (see Appendix A for the scheduling of operations). In the case of Z-type parity checks, these phase errors are detected by the X-type stabilizers. In the case of X-type checks, the phase errors on data qubits are converted to bit-flip errors by the Hadamard gates applied on the data qubits, making them detectable by the Z-type stabilizers. Furthermore, while the ancilla qubit is leaked, the corresponding stabilizer measurement does not detect any errors on the neighboring data qubits, effectively disabling the stabilizer, as sketched in Fig. 3 f. This, combined with the spread of errors, defines the signature of ancilla-qubit leakage and explains part of the observed increase in ε L for the MWPM and UB decoders in Fig. 2 For both data and ancilla qubits, a leakage event is correlated with a local increase in the defect rate, albeit due to different mechanisms. However, interpreting the spread of defects as signatures of leakage suggests the possible inversion of the problem, allowing for effective leakage detection.

D. Hidden Markov Models
We use a set of HMMs, one HMM for each leakageprone qubit, to detect leakage, similarly to what recently demonstrated in a minimal system [20]. An HMM (see Fig. 4 and Appendix E) models the time evolution of a discrete set of hidden states, the transitions between which are assumed to be Markovian. At each time step a set of observable bits is generated with state-dependent emission probabilities. Depending on the observed outcomes, the HMM performs a Bayesian update of the predicted probability distribution over the hidden states.
We apply the concept of HMMs to leakage inference and outline their applicability for an accurate, scalable and run-time executable leakage-detection scheme. This is made possible by two observations. The first is that both data-and ancilla-qubit leakage are sharply pro-leakage seepage Hidden states

Observables Comp
Leak Figure 4: Schematic representation of an HMM for leakage detection. For both ancilla and data qubits only two hidden states are considered, corresponding to the qubit being either in the computational (teal) or leakage subspace (orange). Transitions between these states occur each QEC cycle, depending on the leakage and seepage probabilities. The statedependent observables are the defects d (Q) on the neighboring stabilizers. For ancilla qubits, the in-phase component Im of the analog measurement is also used as an observable.
jected (see Section I C) to high p L DM (Q). This justifies the use of classical HMMs with only two hidden states, corresponding to the qubit being in the computational or leakage subspace.
The second observation is the sharp local increase in p d associated with leakage (see Section I C), which we identify as the signature of leakage. This allows us to consider only the defects on the neighboring stabilizers as relevant observables and to neglect correlations between pairs of defects associated with qubit errors. In the case of ancilla-qubit leakage, in addition to the defects, we consider the state information obtained from the analog measurement as input to the HMMs. Each transmon is dispersively coupled to a dedicated readout resonator. The state-dependent shift in the single-shot readout produces an output voltage signal, with in-phase and quadrature components (see Appendix C).
The transition probabilities between the two hidden states are determined by the leakage and seepage probabilities per QEC cycle, which are, to lowest order, a function only of the leakage probability L 1 per CZ gate and of the relaxation time T 1 (see Appendix E). We extract the state-dependent emission probabilities from simulation. When a qubit is not leaked, the probability of observing a defect on each of the neighboring stabilizers is determined by regular errors. When a data qubit is leaked, the defect probability is fixed to a nearly constant value by the stabilizer anti-commutation, while when an ancilla qubit is leaked, this probability depends on φ L flux and φ L stat . Furthermore, the analog measurement outcome can be used to extract a probability of the transmon being in |0 , |1 or |2 using a calibrated measurement (see Section I F and Appendix C). . The average is computed by selecting single realizations where p L DM (Q) was below a threshold p L th = 0.5 for at least 5 QEC cycles and then above it for 5 or more rounds. Error bars, estimated by bootstrapping, are smaller than the symbol sizes. b Precision-recall curves for the data qubits over 4 × 10 4 runs of 50 QEC cycles each using the HMM predictions (solid) and the leakage probability from the density matrix (dashed). The dotted line corresponds to a random guess classifier for which P is equal to the fraction of leakage events (occurring with probability given by the density matrix) over all QEC cycles and runs.

E. Data-qubit leakage detection
We assess the ability of the data-qubit HMMs to accurately detect both the time and the location of a leakage event. The average temporal response p L HMM (Q) of the HMMs to an event is shown in Fig. 5 and compared to the leakage probabilities p L DM (Q) extracted from the density-matrix simulation. Events are selected when crossing a threshold p L th , as described in Section I C, and the response is averaged over these events. For the data-qubit HMMs, the response p L HMM (Q) closely follows the probability p L DM (Q) from the density matrix, reaching the same maximum leakage probability but with a smaller logistic growth rate. This slightly slower response is expected to translate to an average delay of about 1 QEC cycles in the detection of leakage.
We now explore the leakage-detection capability of the HMMs. The precision P of an HMM tracking leakage on a qubit Q is defined as and can be computed as where i runs over all runs and QEC cycles and θ is the Heaviside step function. The precision is then the fraction of correctly identified leakage events (occurring with probability given by the density matrix), over all of the HMM detections of leakage. The recall R of an HMM for a qubit Q is defined as and can be computed as The recall is the fraction of detected leakage by the HMM over all leakage events (occurring with probability given by the density matrix). The precision-recall (PR) of an HMM (see Fig. 5 b) is a parametric curve obtained by sweeping p L th (Q) and plotting the value of P and R. Since the PR curve is constructed from p L HMM (Q) over all QEC cycles and runs, it quantifies the detection ability in both time and space. The detection ability of an HMM manifests itself as a shift of the PR curve towards higher values of P and R simultaneously. We define the optimality O (Q) of the HMM corresponding to qubit Q as where AUC HMM (Q) is the area under the PR curve of the HMM and AUC DM (Q) is the area for the optimal model that predicts leakage with probability p L DM (Q), achieving the best possible P DM and R DM . An average optimality of O (Q) ≈ 74.0% is extracted for the data-qubit HMMs. Given the few QEC-cycle delay in the data-qubit HMM response to leakage, the main limitation to the observed HMM optimality O (Q) is false detection when a neighboring qubit is leaked (see Appendix F).

F. Ancilla-qubit leakage detection
We now assess the ability of the ancilla-qubit HMMs to accurately detect both the time and the location of a leakage event. These HMMs take as observables the defects on the neighboring stabilizers at each QEC cycle as well as the analog measurement outcome of the ancilla qubit itself.
We first consider the case when the HMMs rely only on the increase in the defect probability p d and show their PR curves in Fig. 6 a,b. Bulk ancilla qubits have a low O (Q) ≈ 35%, while boundary ancilla qubits possess no ability to detect leakage at all. We attribute this to the boundary ancilla qubits having only a single neighboring stabilizer, compared to bulk ancilla qubits having 3 in Surface-17. The HMMs corresponding to pairs of sametype (X or Z) bulk ancilla qubits exhibit significantly different PR curves (see Fig. 6 a,b), despite the apparent symmetry of Surface-17. This symmetry is broken by the use of high-and low-frequency transmons as data qubits, leading to differences in the order in which an ancilla qubit interacts with its neighboring data qubits (see [46] and Fig. 8), together with the fact that CZs with L 1 = 0 do not commute in general. In addition to a low O (Q), the errors propagated by the leaked ancilla qubits (and hence the signatures of ancilla-qubit leakage) depend on φ L stat and φ L flux (randomized in the simulations). The values of these phases generally lead to different p d than the ones parameterizing the HMM. The latter is extracted based on the average p d observed over the runs (see Appendix E). In the worst-case (for leakage detection), these phases can lead to no errors being propagated onto the neighboring data qubits, resulting in the undetectability of leakage. The mismatch between the fluctuating p d (over φ L stat and φ L flux ) and the average value hinders the ability of the ancilla-qubit HMMs to detect leakage. Even if these phases were individually controllable, tuning them to maximize the detection capability of the HMMs would also lead to an undesirable increase in ε L of a (leakage-unaware) decoder (see Fig. 2).
To alleviate these issues, we consider the statedependent information obtained from the analog measurement outcome.
The discrimination fidelity between |1 and |2 is defined as where P (i | j) is the conditional probability of declaring the measurement outcome i given that the qubit has been prepared in state |j , assuming that no excitation or relaxation occur during the measurement (accounted for in post-processing). Here we assume that P (0 | 2) = P (2 | 0) = 0, as observed in experiment (see Fig. 9). We focus on the discrimination fidelity as in our simulations relaxation is already accounted for in the measurement outcomes (see Appendix A). We extract F L from recent experimental data [20], where the readout pulse was only optimized to discriminate between the computational states. By taking the in-phase component of the analog measurement, for each state |j a Gaussian distribution N j is obtained, from which we get F L = 88.4% (see Appendix C).
In order to emulate the analog measurement in simulation, given an ancilla-qubit measurement outcome m ∈ {0, 1, 2}, we sample the in-phase response I m from the corresponding distribution N m . The probability of the ancilla qubit being leaked given I m is computed as The ancilla-qubit HMMs use the sampled responses I m , in combination with the observed defects, to detect leakage.
The PR curves of the HMMs using the analog readout are shown in Fig. 6 c,d, from which an average O (Q) ≈ 96% can be extracted for the ancilla-qubit HMMs. Given that projective measurements are used in the simulations, AUC DM (Q) = 1 for ancilla qubits. The temporal responses of the HMMs to leakage are shown in Fig. 6 e,f, showing a sharp response to a leakage event, with an expected delay in the detection of at most 2 QEC cycles. A further benefit of the inclusion of the analogmeasurement information is that the detection capability of the HMMs is now largely insensitive to the fluctuations in φ L stat and φ L flux . We explore O (Q) as a function of F L , as shown in the inset of Fig. 6 c,d. To do this, we model N j for each state as symmetric and having the same standard deviation, in which case F L is a function of their signal-to-noise ratio only (see Appendix C). At low F L ( 60%) the detection of leakage is possible but very limited, especially for the boundary ancilla qubits. On the other hand, even at moderate values of F L (≈ 80%), corresponding to experimentally achievable values, ancilla-qubit leakage can be accurately identified for both bulk and boundary ancilla qubits. Furthermore, relying solely on the analog measurements would allow for the potential minimization of the error spread associated with ancilla-qubit leakage, given controllability over φ L stat and φ L flux , without compromising the capability of the HMMs to detect leakage.

G. Improving code performance via post-selection
We use the detection of leakage to reduce the logical error rate ε L via post-selection on leakage detection, with the selection criterion defined as Improvement in the logical error rate εL via post-selecting on the detection of leakage for a MWPM decoder (green) and the decoder upper bound (red). The postselection is based on the probabilities predicted by the HMMs (solid) or on those extracted from the density-matrix simulation (dashed), for 2 × 10 4 runs of 20 QEC cycles each. The physical error rate of a single transmon qubit under decoherence is also given (solid black). Detection of leakage allows for the restoration of the performance of the MWPM decoder, reaching the memory break-even point by discarding about ≈ 47% of the data. The logical error rates obtained from simulations without leakage (and without post-selection) are indicated by diamonds.
We thus post-select any run for which the leakage probability of any qubit exceeds the defined threshold in any of the QEC cycles. Note that, while this criterion is insensitive to overestimation of the leakage probability due to a leaked neighboring qubit (see Appendix F), it is sensitive to the correct detection of leakage in the first place and to the HMM response in time (especially for short-lived leakage events).
We perform the multi-objective optimization where f is the fraction of discarded data. The inequality constraint on the feasible space is helpful for the fitting procedure, required to estimate ε L . This optimization uses an evolutionary algorithm (NGSA-II ), suitable for conflicting objectives, for which the outcome is the set of lowest possible ε L for a given f . This set is known as the Pareto front and is shown in Fig. 7 for both the MWPM and UB decoders. In Fig. 7 we also compare post-selection based on the HMMs against post-selection based on the density-matrix simulation. Here we use the predictions of the HMMs which include the analog measurement outcome with the experimentally extracted F L (see Section I F). The observed agreement between the two post-selection methods proves that the performance gain is due to discarding runs with leakage instead of runs with only regular errors. The performance of the MWPM decoder is restored below the quantum memory break-even point by discarding f ≈ 47%. The logical error rates extracted from simulations without leakage are achieved by post-selection of f ≈ 57% and f ≈ 43% of the data for the MWPM and UB decoders, respectively, when leakage is included.

II. DISCUSSION
We have investigated the effects of leakage and its detectability using density-matrix simulations of a transmon-based implementation of Surface-17. Data and ancilla qubits tend to be sharply projected onto the leakage subspace, either indirectly by a back-action effect of stabilizer measurements for data qubits or by the measurement itself for ancilla qubits. During leakage, a large, but local, increase in the defect rate of neighboring qubits is observed. For data qubits we attribute this to the anticommutation of the involved stabilizer checks, while for ancilla qubits we find that it is due to an interactiondependent spread of errors to the neighboring qubits. We have developed a low-cost and scalable approach based on HMMs, which use the observed signatures together with the analog measurements of the ancilla qubits to accurately detect the time and location of leakage events. The HMM predictions are used to post-select out leakage, allowing for the restoration of the performance of the logical qubit below the memory break-even point by discarding f ≈ 47%, opening the prospect of near-term QEC demonstrations even in the absence of a dedicate leakage-reduction mechanism.
A few noise sources have not been included in the simulations. First, we have not included readout-declaration errors, corresponding to the declared measurement outcome being different from the state in which the ancilla qubit is projected by the measurement itself. These errors are expected to have an effect on the performance of the MWPM decoder, as well as a small effect on the observed optimality of the HMMs. We have also ignored any crosstalk effects, such as residual couplings, crossdriving or dephasing induced by measurements on other qubits. While the presence of these crosstalk mechanisms is expected to increase the error rate of the code, it is not expected to affect the HMM leakage-detection capability. We have assumed measurements to be perfectly projective. However, for small deviations, we do not expect a significant effect on the projection of leakage and on the observation of the characteristic signatures.
We now discuss the applicability of HMMs to other quantum-computing platforms subject to leakage and determine a set of conditions under which leakage can be efficiently detected. First, we assume single-and twoqubit gates to have low leakage probabilities, otherwise QEC would not be possible in general. In this way, singleand two-qubit leakage probabilities can be treated as perturbations to block-diagonal gates, with one block for the computational subspace C and one for the leakage subspace L. We focus on the gates used in the sur-face code, i.e., CZ and Hadamard H (or R Y (π/2) rotations or equivalent gates). We consider data-qubit leakage first. We have observed that it is made detectable by the leakage-induced anti-commutation of neighboring stabilizers. The only condition ensuring this anticommutation is that H acts as the identity in L or that it commutes with the action of CZ within the leakage block (see Appendix D), regardless of the specifics of such action. Thus, data-qubit leakage is detectable via HMMs if this condition is satisfied. In particular, it is automatically satisfied if L is 1-dimensional. We now consider ancilla-qubit leakage. Clearly, ancilla-qubit leakage detection is possible if the readout discriminates computational and leakage states perfectly or with high fidelity. If this is not the case, the required condition is that leaked ancilla qubits spread errors according to non-trivial leakage conditional phases, constituting signatures that can be used by an HMM. If even a limited-fidelity readout is available, it can still be used to strengthen this signal, as demonstrated in Section I F. An issue is the possibility of the readout to project onto a superposition of computational and leakage subspaces. In that case, the significance of ancilla-qubit leakage is even unclear. However, for non-trivial leakage conditional phases, we expect a projection effect to the leakage subspace by a back-action of the stabilizer measurements, due to leakage-induced errors being detected onto other qubits, similarly to what observed for data qubits.
The capability to detect the time and location of a leakage event demonstrated by the HMMs could be used in conjunction with leakage-reductions units (LRUs) [37]. These are of fundamental importance for fault tolerance in the presence of leakage, since in [40] a threshold for the surface code was not found if dedicated LRUs are not used to reduce the leakage lifetime beyond the one set by the relaxation time. While the latter constitutes a natural LRU by itself, we do not expect it to ensure a threshold since, together with a reduction in the leakage lifetime, it leads to an increase in the regular errors due to relaxation. A few options for LRUs in superconducting qubits are the swap scheme introduced in [36], or the use of the readout resonator to reset a leaked data-qubit into the computational subspace, similarly to [54,55]. An alternative is to use the |02 ↔ |11 crossing to realize a "leakage-reversal" gate that exchanges the leakage population in |02 to |11 . An even simpler gate would be a single-qubit π pulse targeting the |1 ↔ |2 transition. All these schemes introduce a considerable overhead either in hardware (swap, readout resonator), or time (swap, readout resonator, leakage-reversal gate), or they produce leakage when they are applied in the absence of it (leakage-reversal gate, π pulse). Thus, all these schemes would benefit from the accurate identification of leakage, allowing for their targeted application, reducing the average circuit depth and minimizing the probability of inadvertently inducing leakage.
We discuss how decoders might benefit from the detection of leakage. Modifications to MWPM decoders have been developed for the case when ancilla-qubit leakage is directly measured [17,40], and when data-qubit leakage is measured in the LRU circuits [40]. Further decoder modifications might be developed to achieve a lower logical error rate relative to a leakage-unaware decoder, by taking into account the detected leakage and the probability of leakage-induced errors, as well as the stabilizer information that can still be extracted from the superchecks (see Appendix D). In the latter case, a decoder could switch back and forth from standard surface-code decoding to e.g. the partial subsystem-code decoding in [50][51][52]. Given control of the leakage conditional phases, the performance of this decoder can be optimized by setting φ L stat = π and φ L flux = 0, minimizing the spread of phase errors on the neighboring data qubits by a leaked ancilla qubit, as well as the noise on the weight-6 stabilizer extraction in the case of a leaked data qubit (see Appendix D). Given a moderate discrimination fidelity of the leaked state, this is not expected to compromise the detectability of leakage, as discussed in Section I F. At the same time, for such a decoder we expect the improvement in the logical error rate to be limited in the case of low-distance codes such as Surface-17, as single-qubit errors can result in a logical error. This is because leakage effectively reduces the code distance, either because a leaked data qubit is effectively removed from the code, or because of the fact that a leaked ancilla qubit is effectively disabled and in addition spreads errors onto neighboring data qubits. Large codes, for which leakage could be well tolerated (depend-ing on the distribution of leakage events), cannot be studied with density-matrix simulations, as done in this work for Surface-17. However, the observed sharp projection of leakage and the probabilistic spread of errors justify the stochastic treatment of this error [40]. Under the assumption that amplitude and phase damping can be modeled stochastically as well, we expect that the performance of decoders and LRUs in large surface codes can be well approximated in the presence of leakage. For the Surface-17 simulations we use the open-source density-matrix simulation package quantumsim [27], available at [47]. For decoding we use a MWPM decoder [27], for which the weights of the possible error pairings are extracted from Surface-17 simulations via adaptive estimation [56] without leakage (L 1 = 0) and an otherwise identical error model (described in Appendix B).
The logical performance of the surface code as a quantum memory is the ability to maintain a logical state over a number of QEC cycles. We focus on the Z-basis logical |0 L , but we have observed nearly identical performance for |1 L . We have not performed simulations for the X-basis logical states |± L = 1 √ 2 (|0 L ± |1 L ), as previous studies did not observe a significant difference between the two bases [27]. The state |0 L is prepared by initializing all data qubits in |0 , after which it is maintained for a fixed number of QEC cycles (maximum 20 or 50 in this work), with the quantum circuit given in Fig. 8. The first QEC cycle projects the logical qubit into a simultaneous eigenstate of the X-and Ztype stabilizers [28], with the Z measurement outcomes  Figure 8: The quantum circuit for a single QEC cycle employed in simulation, for the unit-cell scheduling defined in [46]. The qubit labels and frequencies correspond to the lattice arrangement shown in Fig. 2. Gray elements correspond to operations belonging to the previous or the following QEC cycle. The X-type parity checks are performed at the start of the cycle, while the Z-type parity checks are executed immediately after the Z-type stabilizer measurements from the previous cycle are completed. The duration of each operation is given in Table I. The arrow at the bottom indicates the repetition of QEC cycles.
being +1 in the absence of errors, while the X outcomes are random. The information about the occurred errors is provided by the stabilizer measurement outcomes from each QEC cycle, as well as by a Z-type stabilizer measurements obtained by measuring the data qubits in the computational basis at the end of the run. This information is provided to the MWPM decoder, which estimates the logical state at the end of the experiment by tracking the Pauli frame. For decoding, we assume that the |2 state is measured as a |1 , as in most current experiments. In Section I F we considered the discrimination of |2 in readout, which can be used for leakage detection. While this information can be also useful for decoding, we do not consider a leakage-aware decoder in this work.
The logical fidelity F L (n) at a final QEC cycle n is defined as the probability that the decoder guess for the final logical state matches the initially prepared one. The logical error rate ε L is extracted by fitting the decay as where n 0 is a fitting parameter (usually close to 0) [27].

Appendix B: Error model and parameters
In the simulations we include qubit decoherence via amplitude-damping and phase-damping channels. The time evolution of a single qubit is given by the Lindblad where H is the transmon Hamiltonian with a the annihilation operator, ω and α the qubit frequency and anharmonicity, respectively, and L i the Lindblad operators. Assuming weak anharmonicity, we model amplitude damping for a qutrit by The |2 lifetime is then characterized by a relaxation time T 1 /2. Dephasing is described by leading to a dephasing time T φ between |0 (resp. |1 ) and |1 (|2 ), and to a dephasing time T φ /2 between |0 and |2 [9]. The Lindblad equation is integrated for a time t to obtain an amplitude-and phase-damping superoperator R ↓,t , expressed in the Pauli Transfer Matrix representation. For a gate R gate of duration t gate , decoherence is accounted by applying R ↓,tgate/2 R gate R ↓,tgate/2 . For idling periods of duration t idle , R ↓,t idle is applied. For single-qubit gates we only include the amplitude and phase damping experienced over the duration t single of the gate. These gates are assumed to not induce any leakage, motivated by the low leakage probabilities achieved [8,44], and to act trivially in the leakage subspace. For two-qubit gates, namely the CZ, we further consider the increased dephasing rate experienced by qubits when fluxed away from their sweetspot. In superconducting qubits, flux noise shows a typical power spectral density S f = A/f , where f is the frequency and √ A is a constant. In this work we consider √ A = 4 µΦ 0 , where Φ 0 is the flux quantum. Both low-and highfrequency components are contained in S f , which we define relative to the CZ gate duration t CZ . Away from the sweetspot frequency ω max , a flux-tunable transmon has first-order flux-noise sensitivity D φ = 1 2π ∂ω ∂Φ . The highfrequency components are included as an increase in the dephasing rate Γ φ = 1/T φ (compared to the sweetspot), given by Γ φ = 2π √ ln 2AD φ [57], while the low-frequency components are not included due to the built-in echo effect of Net-Zero pulses [9]. High-and mid-frequency qubits are fluxed away to different frequencies, with dephasing rates computed with the given formula. Furthermore, during a few gates low-frequency qubits are fluxed away to a "parking" frequency in order to avoid unwanted interactions [46]. The computed dephasing times at the interaction point are given in Table I. For the CZ gates, we include this increased dephasing during the time t int spent at the interaction point, while for the phase-correction pulses of duration t cor we consider the same dephasing time as at the sweetspot. We do not include deviations in the ideal single-qubit phases of the CZ gate φ 01 = 0 and φ 10 = 0 and the two-qubit phase φ 11 = π, under the assumption that gates are well tuned and that the low-frequency components of the flux noise are echoed out [9].
We now consider the coherence of leakage in the CZ gates, which in the rotating frame of the qutrit is modeled as the exchanges with L 1 the leakage probability [48]. The phase φ can lead to an interference effect between consecutive applications of the CZ gate across pairs of data and ancilla qubits. In terms of the full density matrix, the dynamics of Eqs. (B7) and (B8) leads to a coherent superposition of computational and leaked states where ρ C (resp. ρ L ) is the density matrix restricted to the computational (leakage) subspace, while ρ coh are the off-diagonal elements between these subspaces. We observe that varying the phase φ does not have an effect on the dynamics of leakage or on the logical error rate. We attribute this to the fact that each ancilla qubit interacts with a given data qubit only once during a QEC cycle and it is measured at the end of it (and as such it is dephased). Thus, the ancilla-qubit measurement between consecutive CZ gates between the same pair prevents any interference effect. Furthermore, setting ρ coh = 0, does not affect the projection and signatures of leakage nor the logical error rate (at least for the logical state prepared in the Z basis), leading to an incoherent leakage model. We attribute this to the projection of leakage itself, which leaves the qubit into a mostly incoherent mixture between the computational and leakage subspaces.
In the harmonic rotating frame, |2 is expected to acquire an additional phase during periods of idling, proportional to the anharmonicity. However, following the reasoning presented above, we also believe that this phase is irrelevant. An incoherent leakage model offers significant computational advantage for density-matrix simulations. For the case where ρ coh = 0, the size of the stored density matrix at any time is 4 6 × 9 4 (6 low-frequency data qubits, 3 high-frequency data qutrits plus 1 ancilla qutrit currently performing the parity check). Setting ρ coh = 0 reduces the size of the density matrix to 4 6 × 5 4 , since for each qutrit only the |2 2| matrix element is stored in addition to the computational subspace. Thus, for the simulations in this work we rely on an incoherent model of leakage.
Measurements of duration t m are modeled by applying R ↓,tm/2 R proj R ↓,tm/2 , where R ↓,tm/2 are periods of amplitude and phase damping and R proj is a projection operator. This projector is chosen according to the Born rule and leaves the ancilla qubit in either |0 , |1 or |2 . We do not include any declaration errors, which are defined as the measurement outcome being different from the state of the ancilla qubit immediately after the projection. Furthermore, we do not include any measurementinduced leakage, any decrease in the relaxation time via the Purcell effect or any measurement-induced dephasing via broadband sources. We do not consider non-ideal projective measurements (leaving the ancilla in a superposition of the computational states) due to the increased size of the stored density matrix that this would lead to.

Appendix C: Transmon measurements in experiment
We consider the measurements of transmons in experiment [20], which is enabled by the dispersive coupling between a transmon and a dedicated readout resonator. The resonator is connected to a common feedline via a dedicated Purcell filter [16]. Measurement is performed by applying a readout pulse to the feedline, populating the resonator with photons. Each transmon induces a state-dependent shift of the frequency of the readout res-  Figure 9: The analog measurement of transmons as extracted from experiment. a Histograms of the in-phase I and quadrature Q components of the measured readout for a transmon prepared in |0 , |1 or |2 . b The histograms of the responses for the transmon initially prepared in |0 or |1 , projected along the rotated quadrature maximizing the discrimination fidelity F 01 = 99.6%. c The histograms of the responses for the transmon initialized in |1 or |2 , projected along the I axis, in which case discrimination is achieved with a fidelity F 12 = 88.4%.
onator, changing the amplitude and phase of the outgoing photons. This outgoing signal is amplified and the in-phase (I) and quadrature (Q) components are extracted. For calibration of the single-shot readout, the transmon is prepared in either |0 , |1 or |2 and subsequently measured. Repeating this experiment characterizes the spread of the I and Q components of each state |i , which typically follow a two-dimensional Gaussian distribution N i with mean µ i and standard deviation σ i in the IQ plane [16,58], as exemplified in Fig. 9 a.
Given an analog measurement of I and Q, the probability of a transmon being in state |i can be expressed as where We assume that the prior state probabilities are equally likely. Furthermore, given the typically observed Gaussian distributions, it holds that P (I, Q | i) = N i (I, Q), which leads to .

(C3)
In experiment, one is typically interested in discriminating between pairs of states |i and |j , for which the discrimination fidelity is defined as where P (i | j) is the probability of declaring a measurement outcome i given a prepared state |j , under the assumption of no excitation or relaxation during the measurement (accounted for in post-processing), and where P (i) is the prior probability of the qubit being in state |i . Hence, the discrimination fidelity corresponds to the probability of correctly declaring the projected state. We focus on the discrimination fidelity as in our simulations relaxation is already accounted for in the measurement outcomes (see Appendix A). We assume P (i) = P (j) = 1 2 , which leads to This can be related to the signal-to-noise ratio SNR = | µ i − µ j | /2σ, assuming symmetric Gaussian distributions, as The IQ response can be projected onto the axis joining the centers of a pair of two-dimensional Gaussian distributions, allowing to consider a single quadrature while maximizing the discrimination fidelity. Without loss of generality, we consider this optimal axis to be along I. This allows to express Eq. (C3) as where N i (I) is the marginal of N i (I, Q). In experiment, in order to declare a binary measurement outcome, a threshold value for I is introduced, separating the regions for declaring either outcome. Following this approach, for a 3-outcome measurement, three projection axes are needed in general. However, since the Gaussian distributions for |1 and |2 are typically well-separated from the one for |0 , it is possible to use only two axes, i.e., one to discriminate |0 from |1 , and one to further discriminate |2 from the rest. For the measurement calibration from experiment [20], shown in Fig. 9 a, the discrimination between |0 and |1 can be achieved by projecting the analog responses along a rotated quadrature axis which maximizes the discrimination fidelity F 01 = 99.6%. Discriminating between |1 and |2 is performed with F 12 = 88.4% by projecting along a rotated in-phase axis, maximizing this fidelity.

Appendix D: Leakage-induced anti-commutation
We study the behavior of neighboring stabilizers in the presence of a leaked data qubit. We focus on a paritycheck operator in the bulk of the surface code. For the The effects of data-qubit leakage on the stabilizers of the code. a Sketch of how data-qubit leakage in the bulk (e.g. on D4) effectively defines weight-3 gauge operators, whose product forms a weight-6 X-type (purple) or Z-type (teal) "supercheck" stabilizer, in addition to the standard weight-2 X-type (blue) and Z-type (green) stabilizers. b,c The average probability p d of observing a defect on the supercheck stabilizers during leakage on D4 (defined by the leakage probability being above a threshold of 0.5) as a function of the leakage conditional phase φ L stat .
frequency scheme of Fig. 2, this involves two leakageprone high-frequency transmons and two low-frequency transmons, modeled as qutrits and qubits, respectively. The ancilla qubit used to perform the parity checks is leakage prone as well. However, here we do not consider this possibility, given the low probability of a pair of neighboring data and ancilla qubits to be leaked simultaneously. We consider the CZ for transmons described in Section I A, without including any decoherence. In the limit of the leakage probability L 1 → 0 (and leakage mobility L m → 0), for an ancilla qubit A and a high-frequency data qubit D, the CZ can be decomposed as Note thatĨ| C = I andZ| C = Z, where I and Z are the standard identity and Pauli Z operators, respectively, and C is the qubit computational subspace. For a CZ between an ancilla qubit and a low-frequency data qubit, it simply holds |0 0| A ⊗ I D + |1 1| A ⊗ Z D . Small values of L 1 , as observed in experiment [9], can be treated as a perturbation to this. For a parity-check measurement, the back-action on the state of the data qubits is given by either one of two operators, depending on the outcome. In the case of a Z-type parity-check unit, these operators are given by whereĨ abcd :=Ĩ aĨb I c I d andZ abcd :=Z aZb Z c Z d . Under the assumption that single-qubit gates, namely the Hadamard gate, do not induce any leakage and act trivially on the leakage subspace, for the X-type parity-check unit these operators are instead given by whereX abcd :=X aXb X c X d and in which caseX| C = X with X the standard Pauli operator. The X-type and Z-type parity checks commute if and only if M Z ± and M X ± commute, as it holds To compute the commutator we first evaluate Z ,X = 2 It follows that Z ,X when restricted to the computational and leakage subspaces, respectively. Notice that, when all data qubits are in the computational subspace, it holds Furthermore, we note that Eq. (D10) holds solely because H acts trivially in L (as we assume) and it would continue to hold as long as H commutes with CZ on L.
We now consider the case in which one of the highfrequency data qubits is in L (say a) and the remaining ones are in C. In this case Z abcd ,X abcd This shows that, in the presence of data-qubit leakage, M Z ± and M X ± do not commute. In particular,Z abcd andX abcd anti-commute and this result is independent of the leakage conditional phase. Furthermore, it holds and similarly for M X ± | La . For φ L stat = 0, π, M X ± | La are projectors onto the ±-eigenspaces of Z bcd or X bcd , constituting effective weight-3 parity checks. In this case the anticommutation [Eq. (D12)] leads to fully randomized ancilla-qubit measurement outcomes, corresponding to a probability p d = 50% of observing a defect each QEC cycle on each of the neighboring stabilizers. However, the product of two weight-3 same-type checks is a weight-6 stabilizer of the surface code, thus the product of the two ancilla-qubit measurement outcomes corresponds to the parity of the 6 data qubits involved. In particular, the stabilizer group can be redefined as including the standard weight-4 checks which do not involve the leaked qubit, together with the defined weight-6 "superchecks", while the weight-3 checks are gauge operators [50][51][52][53], as illustrated in Fig. 10 a. For the superchecks to be correctly obtained, both X-type gauge operators need to be measured before any of the two Z-type gauge operators (or viceversa), which already holds true for the circuit schedule we consider [46]. In the case of a leaked qubit on the boundary, only one supercheck operator can be defined (for a rotated surface code, this is a weight-4 X-or Z-type boundary supercheck), while the other one must be ignored for decoding [50][51][52]. In the case of one leaked data-qubit in Surface-17, the minimum weight of a dressed logical operator is 2, reducing the code distance by 1. For example, if D 4 is leaked, two X errors on D 2 and D 7 constitute a logical X. In a larger surface code, the reduction of the distance depends on the number of leaked qubits, as well as their distribution on the lattice [50].
In the general case where φ L stat = 0, π, while the anticommutation still holds, M Z ± | La and M X ± | La are not projectors and thus the ancilla-qubit measurement outcomes are not fully randomized, which is expected to have an effect on the observed p d . However, in the simulations p d ≈ 50% for both the case when φ L stat is randomized across runs (see Fig. 3) or when it is fixed, independently of the specific value. Since the defects d are computed is the measurement outcome at QEC cycle n, even a moderate imbalance between the probabilities of measuring m [n] = 0 and m [n] = 1 (fluctuating across QEC cycles) can lead a defect probability p d ≈ 50%. Furthermore, the phase rotations depending on φ L stat affect the measurement of each of the two weight-3 gauge operators independently, which in turn undermines the correct extraction of the weight-6 stabilizer parity. This effect is observed in Fig. 10 b,c, where in the case of φ L stat = 0, π the observed defect probability roughly corresponds to the expected one from a weight-6 check (relative to the observed one for the standard weight-4 and weight-2 checks in the absence of leakage), while a higher defect probability is observed otherwise, reaching up to 50%. Hence, the control of φ L stat in experiment would be beneficial for decoding in the presence of data-qubit leakage whenever the superchecks are considered. In the context of leakage detection, we consider only two hidden states, S = {C, L}, namely whether the qubit is in the computational (C) or the leakage subspace (L). The transition matrix is parameterized in terms of the leakage and seepage probabilities per QEC cycle. The leakage probability is estimated as Γ C→L ≈ N flux L 1 (for low L 1 ), where N flux is in how many CZ gates the qubit is fluxed during a QEC cycle and L 1 is the leakage probability per CZ gate. The seepage probability is estimated by Γ L→C ≈ N flux L 2 + 1 − e tc T 1 /2 , where t c is the QEC cycle duration and T 1 the relaxation time (see Table I), while L 2 is the seepage contribution from the gate, where L 2 = 2L 1 due to the dimensionality ratio between C and L for a qubit-qutrit pair [48]. The transition matrix A is then given by We assume that all qubits are initialized in C, which defines the initial state distribution p C [n = 0] = 1 used by the HMMs. The HMMs consider the defects d (Q i ) ≡ d i on the neighboring ancilla qubits Q i at each QEC cycle, occurring with probability p di , as the observables for leakage detection. Explicitly, the emission probabilities are parameterized in terms of the conditional probabilities B di[n],s = P (d i [n] | s) of observing a defect when the modeled qubit is in s = C or s = L. We extract B di[n],C directly from simulation, by averaging over all runs and all QEC cycles, motivated by the possible extraction of this probability in experiment. While this includes runs when the modeled qubit was leaked, we observe no significant differences in the HMM performance when we instead post-select out these periods of leakage, which we attribute to the low L 1 per CZ gate. We extract B di[n],L from simulation over the QEC cycles when the leakage probability p L DM (Q i ) as observed from the density matrix is above a threshold of p L th = 0.5. In the case of ancillaqubit leakage, B di[n],L depends on the values of the leakage conditional phases φ L stat and φ L flux . Thus, in the case of randomized leakage conditional phases, the HMMs are parameterized by the average B di[n],L . In the case of data-qubit leakage, the extracted B di[n],L is ≈ 0.5 regardless of the leakage conditional phases, as expected from the anti-commuting stabilizers (see Section I C).
For ancilla-qubit leakage detection, the analog measurement outcome I m can be additionally considered as an observable, in which case o = {d i , I m }. In this case, the state-dependent probability is further parametrized by B Im  two QEC cycles, respectively. As the HMMs take an increase in the defect probability as a signature of leakage, this is expected to result in the HMMs overestimating the probability of the tracked qubit being leaked. In addition, each HMM only takes the defects on the neighboring stabilizers as observables. Despite each HMM sharing observables with the neighboring ones, the probability of leakage at each QEC cycle is estimated independently by each HMM. While this choice minimizes the computational overhead, as a result each HMM is additionally prone to overestimating the probability of leakage when a neighboring qubit is leaked instead (leading to an increased defect probability observed on only a subset of the stabilizers taken as observables by the HMM). The HMMs can be expanded to account for these limitations, either by increasing the number of hidden states to model regular errors [20] or by expanding the set of observables to include next-nearest neighbor stabilizers, in order to account for leakage on neighboring qubits, in which case the HMMs would be still local and hence scalable. As either solution would increase the complexity and overhead of the models, we evaluate the contributions of each of these limitation to the detection capabilities of the HMMs.
We first focus on the overestimation of the leakage probability predicted by the HMMs in the pres-ence of leakage on a neighboring qubit, which we refer to as "HMM crosstalk". We consider the detection scheme taking into account the analog measurements (with the currently achieved experimental discrimination fidelity F L , see Section I F). The average responses of all HMMs to leakage events on any qubit and the predicted leakage probability 1 QEC cycle after detection (defined by the predicted probability crossing a threshold of 0.5) are shown in Fig. 11 b. The responses of the neighboring HMMs immediately (1-2 QEC cycles) after crossing this threshold is indicative of the likelihood of leakage being declared on a neighboring qubit (based on the extracted HMM responses shown in Figs. 5 and 6). Across individual runs, these parasitic responses can lead to false detections. Ancilla-qubit HMMs are insensitive to leakage on other data or ancilla qubits (see Fig. 11). We attribute this to the use of the analog measurement outcomes which discriminate between |1 and |2 with moderate fidelity and between |0 and |2 ) with very high fidelity. Instead, data-qubit HMMs are prone to overestimating the response in the case of leakage on other qubits. The crosstalk is proportional to the number of shared observables between the pairs of HMMs and depends on the expected defect probabilities during leakage by each model.
We further break down the relative contributions to the optimality O (defined in Section I E) of each of the dataqubit HMMs due to the crosstalk, shown in Fig. 11 b. Post-selecting out runs where ancilla-qubit leakage is detected from the density matrix increases the average O of the three data-qubit HMMs from O ≈ 74.0% to O ≈ 80.9%. Further post-selecting out events where leakage is detected on any of the other data qubits (which are not tracked by the given HMM) increases the average optimality to O ≈ 95.9%. The larger contribution from neighboring data-qubit leakage is consistent with the higher crosstalk (see Fig. 11 a) between data-qubit HMMs relative to the ancilla-qubit ones and constitutes the dominant limitation behind the HMM optimality. We attribute the remaining suboptimality to the presence of regular errors, caused by qubit relaxation and dephasing, and to the parametrization of the transition and output probabilities.
Appendix G: An alternative scheme for enhancing ancilla-qubit leakage detection We consider an alternative scheme (to the one considering the analog measurement outcomes) allowing for enhancing ancilla-qubit leakage detection beyond that achievable by only considering the increase in the defect probability on neighboring stabilizers. In this scheme a π pulse is applied to each ancilla qubit every other QEC cycle, accounted for in post-processing. Under the assumption that a π rotation has a trivial effect on a leaked qubit, the post-processed measurement outcomes (in the absence of errors) would show a flip every other Parameter D low D mid D high ω/2π at sweet spot (GHz) 4. 9 6.0 6.7 α/2π (MHz) −300 −300 −300 J1/2π at int. point (MHz) 15 15 Table II: Parameters used in the CZ full-trajectory simulations, with α the anharmonicity and J1 the coupling. We follow the frequency scheme of [46] with the arrangement shown in Fig. 2.
QEC cycle during the period of leakage, which corresponds to a defect every QEC cycle. This scheme would require minimal overhead, as these rotations can be integrated with the existing single-qubit gates applied to the ancilla qubits at the start of each QEC cycle. A downside is that ancilla qubits would spend more time in the first excited state on average, increasing the effect of amplitude damping. We have not simulated this scheme, but we have investigated it entirely in post-processing by only applying flips to the measurement outcomes during periods of ancilla-qubit leakage (as extracted from the density matrix). Although this does not capture the increase in the ancilla-qubit error rate due to amplitude damping, we expect that it captures the effect of the scheme on the detection of leakage.
The average HMM optimality for the bulk X and Z ancilla qubits is O (X) ≈ 58.5% and O (Z) ≈ 51.5%, respectively. For the boundary X and Z ancilla qubits, it is O (X) ≈ 74.0% and O (Z) ≈ 43.0%, respectively. This constitutes a considerable increase in optimality relative to the scheme relying only on the observed defects (see Fig. 6 a,b). However, the artificially induced defects on leaked ancilla qubits lead to the increase in the crosstalk between ancilla-and data-qubit HMMs. This has the effect of lowering the average data-qubit HMM optimality from O (D) ≈ 74.0% (see Section I E) to O (D) ≈ 39.8%. While such scheme may be beneficial for the post-selection-based scheme defined in Section I G (as in that case leakage detected on any qubits leads to discarding the run), it would be detrimental for leakageaware decoding or targeted leakage-reduction units as these rely on the accurate detection in both time and space.

Appendix H: Second-order leakage effects
In this section we consider exchanges between states in the leakage subspace as a result of a CZ gate acting on an already leaked qubit. We focus on the exchange between |12 and |21 , referred to as "leakage mobility" in Section I A. We also expand the model to include |3 on the fluxing qubit and consider the exchange between |03 and |12 , which we call "superleakage".
The Hamiltonian of two transmons dispersively coupled via a bus resonator in the rotating-wave approxima-  [9,33] for definitions). The interaction point is located at θ f = 90 deg. The inset (topright) schematically shows the direct and effective couplings between levels in the 3-excitation manifold at the interaction point. The states |03 and |21 are on resonance, while |12 is detuned by one anharmonicity α. tion is given by [9] where a is the annihilation operator, ω and α are the qubit frequency and anharmonicity, respectively, and J 1 is the effective coupling mediated by virtual excitations through the bus resonator. We assume that this Hamiltonian is a valid approximation up to the included states. For this Hamiltonian, multiple avoided crossings are found when sweeping ω flux , as schematically shown in Fig. 1. We perform full-trajectory simulations (following the same structure as in [9], excluding distortions and quasi-static flux noise) using the parameters reported in Table I and Table II. Note that extending these simulations to |3 does not affect the leakage probability L 1 from the computational (C) to the leakage subspace (L), nor the fidelity within C. We define the superleakage probability L 3 as where S CZ is the superoperator corresponding to the simulated noisy CZ. L 3 can be high depending on the specific parameters of the flux pulse and of the system, as Fig. 12 b shows for the high-mid qubit pair, even when φ CZ = π (see Fig. 12 a). We attribute this to the avoided crossing between |12 ↔ |03 occurring at ω int + |α|, where ω int is the fluxing-qubit frequency at the interaction point. For fast-adiabatic flux pulses [33] (with respect to the |11 ↔ |02 avoided crossing), pulsing the higher frequency qubit to the interaction point results in the near-diabatic passage through |12 ↔ |03 , inducing a Landau-Zener transition in which a small but finite population is transferred from |12 to |03 . At the CZ interaction point, the off-resonant interaction between |12 and |03 leads to a further population exchange, with a coupling strength √ 3J 1 . Compared to the off-resonant exchange between |01 and |10 , this interaction is stronger by a factor √ 3, which can lead to large values of L 3 when combined with the initial population transfer to |03 on the way to the avoided crossing. Furthermore, the phases acquired during the two halves of a Net-Zero pulse can lead to interference [9], increasing or decreasing the |12 ↔ |03 exchange population. Including the |12 ↔ |03 crossing leads also to differences in the values of the leakage conditional phases.
We now focus on leakage mobility, which occurs with probability L m , defined as If |3 is not included, L m takes small but non-negligible values, as shown in Fig. 12 c. We attribute this to the off-resonant interaction between |12 and |21 , with coupling strength 2J 1 . Even though this coupling is stronger than for |12 ↔ |03 , L m is generally smaller than L 3 due to the fluxing qubit not passing through the |12 ↔ |21 avoided crossing (located at ω int − |α|) on its way to the CZ interaction point. Including |3 , L m can take higher values, as shown in Fig. 12, which we associate to a twoexcitation exchange between |03 and |21 , virtually mediated by |12 . While this is a two-excitation process, |21 and |03 are on resonance at the interaction point, in which case the effective coupling can be estimated as the product of the bare couplings divided by the detuning with |12 , i.e.
1 2π in analogy to the excitation exchange between a pair of transmons mediated virtually via the bus resonator.
Since |03 and |21 are on resonance exactly at the interaction point only when α flux = α stat , differences in the anharmonicities affect the strength of this exchange.
populations is given by the exchanges from and to each subspace:ṗ The steady-state condition isṗ i = 0 for i = C, L, resulting in the steady-state populations p i ss : Considering the CZ error model in Section I A, for a qubit it approximately holds that where N flux is in how many CZ gates the qubit is fluxed during a QEC cycle, t c is the duration of a QEC cycle and L 1 (resp. L 2 ) is the average leakage (seepage) probability between C and L [48]. The use of the average leakage and seepage probabilities per gate is justified for the surface code because, in the case of data-qubit leakage, ancilla qubits are put in an equal superposition during the parity checks, while, in the case of ancillaqubit leakage, data qubits are in simultaneous entangled eigenstates of the code stabilizers. The seepage probability [Eq. (J4)] has one contribution from the unitary CZ-gate interaction and one from relaxation during the entire QEC cycle. Regarding the gate contribution, one has L 2 = 2L 1 due to the dimensionality ratio between C and L for a qubit-qutrit pair [48]. The expected steady-state populations in the simulations can be now computed. We focus on highfrequency data qubits since the low-frequency ones cannot leak without leakage mobility. We have N CZ = N flux = 4 (for D 4 ) or 3 (for D 3 , D 5 ), L 1 = 0.125%, t c = 800 ns and T 1 = 30 µs. The result is p L ss (D 4 ) = 7.5% and p L ss (D 3 ) = p L ss (D 5 ) = 5.9%. Furthermore, Eq. (J1) can be solved to find that the time evolution of p L towards the steady state is where n is the QEC cycle number, shown in Fig. 13 for the three high-frequency data qubits. We find a good agreement (within error bars) between these predictions and the average leakage population extracted from the density matrix (see Fig. 13).
We now extend the model to the |3 state, despite the fact that we have not included it in simulation due to computational constraints. To do this, we divide the leakage subspace L into the sub-parts L 2 and L 3 corresponding to leakage in |2 and |3 , respectively. The The steady-state populations {p i ss } then become: In addition to Eqs. (J3) and (J4), in this model we have The factor of 1/2 in Eq. (J8) comes from the fact that superleakage from L 2 to L 3 is possible only when the qubit pair performing the CZ is in |12 and not in |02 . For L 3 = 10%, for example, the expected steady-state populations are p L2 ss (D 4 ) = 7.1%, p L3 ss (D 4 ) = 5.1% and p L2 ss (D 3 ) = p L2 ss (D 5 ) = 5.7%, p L3 ss (D 3 ) = p L3 ss (D 5 ) = 3.8%. While p L2 ss is almost unchanged with respect to the case without superleakage, p L3 ss has a comparable magnitude to p L2 ss , suggesting that superleakage needs to be taken into account in optimizing the surface-code performance over many QEC cycles.