Exponential suppression of bit or phase errors with cyclic error correction

Realizing the potential of quantum computing requires sufficiently low logical error rates1. Many applications call for error rates as low as 10−15 (refs. 2–9), but state-of-the-art quantum platforms typically have physical error rates near 10−3 (refs. 10–14). Quantum error correction15–17 promises to bridge this divide by distributing quantum logical information across many physical qubits in such a way that errors can be detected and corrected. Errors on the encoded logical qubit state can be exponentially suppressed as the number of physical qubits grows, provided that the physical error rates are below a certain threshold and stable over the course of a computation. Here we implement one-dimensional repetition codes embedded in a two-dimensional grid of superconducting qubits that demonstrate exponential suppression of bit-flip or phase-flip errors, reducing logical error per round more than 100-fold when increasing the number of qubits from 5 to 21. Crucially, this error suppression is stable over 50 rounds of error correction. We also introduce a method for analysing error correlations with high precision, allowing us to characterize error locality while performing quantum error correction. Finally, we perform error detection with a small logical qubit using the 2D surface code on the same device18,19 and show that the results from both one- and two-dimensional codes agree with numerical simulations that use a simple depolarizing error model. These experimental demonstrations provide a foundation for building a scalable fault-tolerant quantum computer with superconducting qubits.


I. INTRODUCTION
Many quantum error correction schemes can be classified as stabilizer codes [20], where a single bit of quantum information is encoded in the joint state of many physical qubits, which we refer to as data qubits. Interspersed among the data qubits are measure qubits, which periodically measure the parity of chosen combinations of data qubits. These projective measurements turn undesired perturbations of the data qubit states into discrete errors, which we track by looking for changes in the parity measurements. The history of parity measurements can then be decoded to determine the most likely correction for such errors. The error rate of the logical qubit is determined by the error rate of the physical qubits as well as the effectiveness of decoding. If physical error rates are below a certain threshold determined by the decoder, then the probability of logical error per round of error correction (ε_L) should scale as

ε_L = C / Λ^((d+1)/2),    (1)

where Λ is the exponential suppression factor, C is a fitting constant, and d is the code distance, which is related to the maximum number of physical errors allowed and increases with the number of physical qubits [3,21].
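As a concrete illustration of this scaling (Eqn. 1), the following sketch evaluates ε_L for a few values of the code distance; the values of C and Λ here are hypothetical, not fitted to the experiment:

```python
def logical_error_per_round(C, Lam, d):
    """Eqn. 1: eps_L = C / Lambda**((d + 1) / 2)."""
    return C / Lam ** ((d + 1) / 2)

# With Lambda > 1, each increase of the distance d by 2 suppresses the
# logical error per round by another factor of Lambda.
for d in (3, 5, 7, 11):
    print(d, logical_error_per_round(0.03, 4.0, d))
```

For Λ = 4, going from d = 3 to d = 11 increases the exponent (d + 1)/2 from 2 to 6, suppressing ε_L by a factor of 4^4 = 256.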
Many previous experiments have demonstrated the principles of stabilizer measurements in various platforms such as NMR [22,23], ion traps [24][25][26], and superconducting qubits [19,21,27,28]. However, achieving exponential error suppression in large systems is not a given, because typical error models for QEC do not include effects such as crosstalk errors. Moreover, exponential error suppression has never previously been demonstrated with cyclic stabilizer measurements, which are a key requirement for fault-tolerant computing but put into play error mechanisms such as state leakage, heating, and data qubit decoherence during the measurement cycle [21,29].
In this work, we focus on two stabilizer codes. First, in the repetition code, qubits are laid out in a 1D chain which alternates between measure qubits and data qubits. Each measure qubit checks the parity of its two neighbors, and all of the measure qubits check the same basis so that the logical qubit is protected from either X or Z errors, but not both. In the surface code [3,30], the qubits are laid out in a 2D grid which alternates between measure and data qubits in a checkerboard pattern. The measure qubits further alternate between X and Z types, allowing for protection against both types of errors. The repetition code will serve as a probe for exponential error suppression with the number of qubits, while a small (d = 2) primitive of the surface code will test the forward compatibility of our device with larger 2D codes.

II. QEC WITH THE SYCAMORE PROCESSOR
We implement QEC using a Sycamore processor [31], consisting of a 2D array of transmon qubits [32] where each qubit is tunably coupled to four nearest neighbors, the connectivity required for the surface code. Compared with Ref. [31], this device has an improved design of the readout circuit, allowing for faster readout with less crosstalk and a factor-of-2 reduction in readout error per qubit. While this processor has 54 qubits like its predecessor, we used at most 21. Figure 1a shows the layout of the d = 11 repetition code and d = 2 surface code in the Sycamore device, while Fig. 1b summarizes the error rates of the components which make up the stabilizer circuits. Additionally, the typical coherence times for each qubit are T1 = 15 µs and T2 = 19 µs.

FIG. 1. Stabilizer circuits on Sycamore. a, Layout of the distance-11 repetition code and distance-2 surface code in the Sycamore architecture. In the experiment, the two codes use overlapping sets of qubits, which are offset in the figure for clarity. b, Pauli error rates for gates and identification error rates for measurement. All benchmarks are for simultaneous operation. c, Circuit schematic for the phase flip code. Data qubits are randomly initialized into |+⟩ or |−⟩, followed by repeated application of XX stabilizer measurements and finally X-basis measurements of the data qubits. d, Illustration of error detection events, which occur when a measurement disagrees with the previous round. e, Fraction of measurements which detected an error versus measurement round for the d = 11 phase flip code. The dark line is an average of the individual traces (gray lines) for each of the 10 measure qubits. The first (last) round also uses data qubit initialization (measurement) values to identify parity errors and generate detection events.
We note here two advancements in gate calibration. First, we use the reset protocol introduced in Ref. [33], which removes population from excited states (including non-computational states) by sweeping the transmon past the readout resonator. This reset gate is appended after each measurement during QEC operation, and produces the ground state within 280 ns with a typical error below 0.5%. Second, we implement a 26 ns controlled-Z gate using a direct swap between the states |11⟩ and |02⟩, similar to the gates described in [14,34]. As in Ref. [31], the tunable qubit-qubit couplings allow these CZ gates to be executed with high parallelism, and up to 10 CZ gates are executed simultaneously for the 21-qubit repetition code. Using simultaneous cross-entropy benchmarking [31], we find that the median Pauli error for the CZ gates is 0.62% (or an average error of 0.50%).
We focused our repetition code experiments on the phase flip code, where data qubits occupy superposition states and are sensitive to both energy relaxation and dephasing, making it more challenging to implement and more predictive of the performance of a surface code. A 5-qubit unit of the phase flip code is shown in Fig. 1c. This stabilizer circuit maps the X-basis parity of the data qubits onto the measure qubit, which is measured and then reset, and this circuit is repeated in both space (across the 1D chain) and time. During measurement and reset, the data qubits are dynamically decoupled to protect them from various sources of dephasing [35]. In a single shot of the experiment, we initialize the data qubits into a random string of |+⟩ or |−⟩ on each qubit. Then, we repeat stabilizer measurements across the chain over many rounds, and finally, we measure the state of the data qubits in the X basis.
Our first pass at analyzing the experimental data is to turn measurements into error detection events, which we find by comparing stabilizer measurements of the same measure qubit between adjacent measurement rounds. We refer to each possible spacetime location of a detection event (i.e. a specific measure qubit and measurement round) as a detection node.
In Fig. 1e, for each detection node in a 50-round, 21-qubit phase flip code, we plot the fraction of experiments (76,000 total) where a detection event was observed on that node, or the detection event fraction. Overall, roughly 11% of measurements signaled a detection event, except in the first and last round. At these two time boundary rounds, detections are determined by comparing the first (last) stabilizer measurement with data qubit initialization (measurement). Importantly, the time boundary rounds are not subject to errors accumulated by the data qubits during measure qubit readout, illustrating the importance of running QEC for multiple rounds to accurately extract performance [35]. Aside from these boundary effects, we find that the detection event fraction is stable across all 50 rounds of the experiment, a key finding for the feasibility of QEC. Previous experiments had observed rising detection event fractions [21], and we attribute the stability of our system to our use of reset to remove leakage in every round [33].
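The conversion from raw stabilizer outcomes to detection events can be sketched as follows. This is a minimal illustration with random placeholder data, not the analysis code used in the experiment; the array shapes and the all-zero initial parity are assumptions.

```python
import numpy as np

# Placeholder stabilizer outcomes: 0/1 for each shot, round, measure qubit.
rng = np.random.default_rng(0)
shots, rounds, measure_qubits = 1000, 50, 10
meas = rng.integers(0, 2, size=(shots, rounds, measure_qubits))

# A detection event is a change in a stabilizer outcome between adjacent
# rounds. The first round is compared with the initial parity, taken
# here to be 0 for simplicity.
prev = np.concatenate(
    [np.zeros((shots, 1, measure_qubits), dtype=meas.dtype), meas[:, :-1, :]],
    axis=1)
detections = meas ^ prev  # XOR: shape (shots, rounds, measure_qubits)

# Detection event fraction per detection node, averaged over shots,
# analogous to the quantity plotted in Fig. 1e.
detection_fraction = detections.mean(axis=0)
print(detection_fraction.shape)
```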

III. CORRELATIONS IN ERROR DETECTION EVENTS
We next characterize the pairwise correlations between detection events. A Pauli error affecting any operation in the repetition code should produce exactly two detections (except at the spatial boundaries of the code), which come in three flavors [21]. First, an error on a data qubit usually produces a detection on the two neighboring measure qubits in the same round: a spacelike error. The exception is an error during the CZ gates, which may cause detection events offset by 1 unit in time and space: a spacetimelike error. Finally, an error on a measure qubit which does not propagate to a data qubit will produce detections in two subsequent rounds: a timelike error. These rules are represented in the planar graph shown in Fig. 2a, where expected correlations are drawn as graph edges between detection nodes. We check how well Sycamore conforms to these expectations by computing the correlation probabilities between arbitrary pairs of detection nodes. Under the assumption that all correlations are pairwise and that error rates are sufficiently low, the probability of simultaneously triggering two detection nodes i and j can be estimated as

p_ij ≈ (⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩) / (1 − 2⟨x_i⟩ − 2⟨x_j⟩ + 4⟨x_i x_j⟩),    (2)

where x_i = 1 if there is a detection event on node i and x_i = 0 otherwise, and ⟨·⟩ denotes an average over all experiments [35]. The numerator can be understood as the covariance between detections on i and j, while the denominator is an adjustment factor. Note that p_ij is symmetric between i and j. In Fig. 2c, we plot the correlation matrix for the data shown in Fig. 1e. In the upper triangle, we show the full scale of the data, where the only visible correlations are either spacelike or timelike, demonstrating that error correlations in the device behave mostly as expected. However, the sensitivity of this technique allows us to find features which do not fit the expected categories. In the lower triangle, we plot the same data with the scale truncated by nearly an order of magnitude. The next most prominent correlations are spacetimelike, as we expect, but we also find two additional categories of correlations. First, we observe detection correlations between non-adjacent measure qubits in the same measurement round. While these non-adjacent qubits are far apart in the repetition code chain, they are in fact spatially close [35] since the 1D chain is embedded in a 2D array, which suggests that while crosstalk exists in our system, it is short range. Optimization of the qubit frequencies in our system already mitigates crosstalk errors to a large extent [35,36], but further research is required to suppress these errors further. Second, we find excess correlations between measurement rounds that differ by more than 1.
We attribute these long-lived correlations to the presence of leakage on the data qubits, which may be generated by a number of sources including gates [12], measurement, and heating [37,38]. For the observed crosstalk and leakage errors, the excess correlations are around 3 × 10−3, an order of magnitude below the measured spacelike and timelike errors but well above the noise floor of the measurement of 2 × 10−4.
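The estimator for p_ij described above can be written compactly. This is a sketch assuming the detection data is stored as a binary array of shape shots × nodes; it is not the paper's analysis code.

```python
import numpy as np

def correlation_matrix(x):
    """Estimate p_ij from binary detection data x of shape (shots, nodes),
    assuming purely pairwise correlations and low error rates."""
    x = np.asarray(x, dtype=float)
    xi = x.mean(axis=0)                       # <x_i>
    xij = (x.T @ x) / x.shape[0]              # <x_i x_j>
    num = xij - np.outer(xi, xi)              # covariance of x_i and x_j
    den = 1 - 2 * xi[:, None] - 2 * xi[None, :] + 4 * xij
    return num / den                          # symmetric in i and j

# Toy check: two nodes that always fire together in 10% of shots.
x = np.zeros((1000, 2))
x[:100, :] = 1
p = correlation_matrix(x)
print(p[0, 1])  # c * (1 - c) = 0.09 for a shared error probability c = 0.1
```

For a shared error of probability c and no background errors, the estimator returns c(1 − c) ≈ c, consistent with the low-error-rate assumption in the text.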
Having established that, on average, the errors are mostly well behaved, we now highlight a different kind of error correlation. In Fig. 2d, we plot a time series of detection event fractions averaged over all measure qubits for each shot of an experiment. We clearly observe a sharp spike in the errors at a specific point in time, followed by an exponential decay. These types of events introduce significant correlated errors for roughly 0.5% of all data taken [35], and we attribute them to high-energy particles such as cosmic rays striking the quantum processor, as also recently observed in Ref. [39]. For the purposes of understanding the typical behavior of our system, we remove data near these events (Fig. 2d), but note that these errors will need to be understood and mitigated [40,41] for large-scale fault-tolerant computers.

FIG. 3. Logical errors in the repetition code. a, Logical error probability versus number of detection rounds and number of qubits for the phase flip code. Smaller code sizes are subsampled from the 21-qubit code as shown in the inset; small dots are data from subsamples and large dots are averages. b, Semilog plot of the averages from a, showing even spacing in log(error probability) between the code sizes. Error bars are estimated standard error from binomial sampling. The lines are exponential fits to data for rounds greater than 10. c, Logical error per round (ε_L) versus number of qubits, showing exponential suppression of the error rate for both bit flip and phase flip codes, with extracted Λ factors. The fit excludes n_qubits = 3 to reduce the influence of spatial boundary effects [35].

IV. LOGICAL ERRORS IN THE REPETITION CODE
We decode detection events and determine logical error probabilities following the procedure outlined in Ref. [21]. Briefly, we use a minimum-weight perfect matching algorithm to determine which errors were most likely to have occurred given the observed detection events, and we correct the final measured state of the data qubits in post-processing. A logical error occurs if the corrected final state is not equal to the initial state. We repeat the experiment and analysis while varying the number of detection rounds from 1 to 50 with a fixed number of qubits, 21. We determine the logical performance of smaller code sizes by analyzing spatial subsets of the 21-qubit data, which reduces the amount of data required [35]. These results are shown in Fig. 3a, where we clearly observe a decrease in the logical error probability with increasing code size. Figure 3b plots the same data on a semilog scale and illustrates the exponential nature of the error reduction.
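The matching step can be illustrated with a toy decoder. This is a deliberate simplification: a single round with perfect stabilizer measurements, whereas the experiment runs minimum-weight perfect matching over the full space-time graph of Fig. 2a. In 1D an optimal pairing never crosses, so each detection event pairs either with the next event along the chain or with the nearer chain boundary, and a short dynamic program finds the minimum total cost.

```python
from functools import lru_cache

def matching_cost(events, n_data):
    """Minimum total weight to pair up detection events on a 1D chain.

    events: positions of detection events (1..n_data on the chain).
    Each event pairs with another event (cost = separation) or with the
    nearer boundary (cost = distance to that boundary).
    """
    events = sorted(events)

    @lru_cache(maxsize=None)
    def best(i):
        if i >= len(events):
            return 0
        # Option 1: route event i to the closer boundary.
        cost = min(events[i], n_data + 1 - events[i]) + best(i + 1)
        # Option 2: pair event i with the next event along the chain.
        if i + 1 < len(events):
            cost = min(cost, events[i + 1] - events[i] + best(i + 2))
        return cost

    return best(0)

# Two adjacent events are cheaper to pair with each other (cost 1)
# than to route each to a boundary.
print(matching_cost([5, 6], n_data=10))
```

The matching with the smallest total cost identifies the most likely set of physical errors, which is then used to correct the final data qubit state in post-processing.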
To extract the logical error per round (ε_L), we fit the data for each number of qubits (averaged over spatial subsets) to 2P_error = 1 − (1 − 2ε_L)^n_rounds, which expresses an exponential decay in logical fidelity with the number of rounds. In Fig. 3c, we show ε_L for the phase flip and bit flip codes versus qubit number. The data clearly demonstrate exponential suppression of logical errors, with more than 100× suppression in ε_L from 5 qubits (ε_L = 8.7 × 10−3) to 21 qubits (ε_L = 6.7 × 10−5). Additionally, we fit ε_L versus code distance to Eqn. 1 to extract Λ, which we plot in Fig. 3c. We find Λ_X = 3.18 ± 0.08 for the phase flip code and Λ_Z = 2.99 ± 0.09 for the bit flip code [35].
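The fit formula above can be inverted directly; the following sketch checks the round-trip and reproduces the >100× suppression quoted in the text (illustrative, not the fitting code used for Fig. 3):

```python
def eps_per_round(p_error, n_rounds):
    """Invert 2 * P_error = 1 - (1 - 2 * eps_L) ** n_rounds for eps_L."""
    return (1 - (1 - 2 * p_error) ** (1.0 / n_rounds)) / 2

# Round-trip check of the formula at eps_L = 1e-3 over 30 rounds.
p30 = (1 - (1 - 2e-3) ** 30) / 2
print(eps_per_round(p30, 30))  # recovers ~1e-3

# Suppression quoted in the text: eps_L falls from 8.7e-3 (5 qubits)
# to 6.7e-5 (21 qubits), roughly 130-fold.
print(8.7e-3 / 6.7e-5)
```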

V. ERROR BUDGETING AND PROJECTING QEC PERFORMANCE
To better understand our repetition code results and project surface code performance on the Sycamore architecture, we simulated our experiments using a depolarizing noise model, meaning that we inject a random Pauli error (X, Y or Z) with some probability after each operation [35]. The Pauli error probability for each type of operation is computed using averages of the data in Fig. 1b and shown in Fig. 4a. We perform two different types of simulations to compare our model to the data. First, we run a direct simulation using the error rates in Fig. 4a to obtain a value of Λ which should correspond to our measured values. Second, we simulate the experiment while individually sweeping operational error rates and observing how 1/Λ changes. The relationship between 1/Λ and the component error rates is roughly linear [35], and the sensitivity coefficients obtained from the second simulation allow us to estimate how much each operation in the circuit increases 1/Λ (decreases Λ). The resulting error budgets for the phase and bit flip codes are shown in Fig. 4b. Overall, measured values of Λ are roughly 20% lower than simulated values, which we attribute to mechanisms such as the leakage and crosstalk errors which are shown in Fig. 2c and were not included in the simulations. Of the modeled contributions to 1/Λ, the dominant sources of error are the CZ gate and decoherence of the data qubits during measurement and reset. In the same plot, we show the projected error budget for a surface code, where we find that overall performance must be improved to observe error suppression in a d = 5 surface code compared with d = 3.

FIG. 4. Error budgeting. a, Pauli error rates for each operation, based on the error rates in Fig. 1b. Note that the idle gate (I) and dynamical decoupling (DD) values depend on the code being run because the data qubits occupy different states. b, Estimated error budgets for the bit flip and phase flip codes, and projected error budget for the surface code, based on the depolarizing errors from a. The repetition code budgets slightly underestimate the experimental errors, and the discrepancy is labeled stray error. For the surface code, the estimated 1/Λ corresponds to the difference in ε_L between a d = 3 and a d = 5 surface code. c, For the d = 2 surface code, fraction of runs that had no detection events versus number of rounds, plotted with the prediction from a similar error model as the repetition code (dashed line). Inset: physical qubit layout of the d = 2 surface code, 7 qubits embedded in a 2D array. d, Surface code logical error probability among runs with no detection events versus number of rounds. Simulations from the same model as c (dashed lines) show good agreement. Error bars for c (not visible) and d are estimated standard error from binomial sampling with 240,000 experimental shots, minus the shots removed by post-selection in d.
Finally, we test our model against a distance-2 surface code logical qubit [19]. We use seven qubits in the same Sycamore device to implement one weight-4 X stabilizer and two weight-2 Z stabilizers, as depicted in Fig. 1a. This encoding can detect any single error, but contains ambiguity in which correction corresponds to a given detection, so we discard any runs in which we observe a detection event. We show the fraction of runs where no errors were detected in Fig. 4c for both logical X and Z preparations; we discard 27% of runs each round, in good agreement with the model prediction. Logical errors can still occur after post-selection, for example from two simultaneous errors. Following post-selection, we compute the logical error probability in the final measured state of the data qubits, shown in Fig. 4d, where we find roughly 2 × 10−3 error probability per round [35]. The model slightly underestimates the logical error, with stray error similar to the repetition code case, giving us confidence that our surface code projections are accurate up to small corrections for crosstalk and leakage.

VI. CONCLUSION AND OUTLOOK
In this work, we show that a system with 21 superconducting qubits is stable when undergoing many repetitive stabilizer measurement cycles. By computing the probabilities of detection event pairs, we find that the physical errors detected on the device are localized in space and time to the 3 × 10−3 level. Logical errors in the repetition code are exponentially suppressed when increasing the number of qubits from 5 to 21, even after 50 rounds of operation. Finally, we corroborate experimental results on both 1D and 2D codes with depolarizing model simulations and show that the Sycamore architecture is within striking distance of the surface code threshold.
Nevertheless, many challenges remain on the path towards scalable quantum error correction. In the short term, our error budgets point to the salient research directions required to reach the surface code threshold: reducing the CZ gate error, and reducing data qubit errors during the measurement and reset cycle. Reaching this threshold will be an important milestone in quantum computing, but practical quantum computation will require Λ ∼ 10 for the physical qubit overhead to be reasonable [35]. Achieving this performance will require significant reductions in operational error rates, and maintaining a stable system over the course of a computation will require further research into mitigation of novel error mechanisms such as high-energy particles.


I. THE BIT FLIP CODE

In addition to the phase flip code that is primarily described in the main text, we also ran a bit flip code, for which the logical error rates are shown in Fig. 3c of the main text. The experimental implementation of the bit flip code is similar to the phase flip code except for the following differences:

• Initialization and measurements are performed in the Z basis instead of X.
• The stabilizers used are Z type instead of X type, which means that the data qubits do not have Hadamards at the beginning and end of each stabilizer round, and parity is measured in the Z basis rather than X.
• We do not run dynamical decoupling pulses on the data qubits during measurement.
• Finally, prior to measurement in every round, we flip all of the data qubits with a π pulse to ensure that the data qubits do not collapse into the ground state and remain there, which would artificially reduce logical error probabilities.
In Fig. S1, we show detection fractions and two point correlations for the 50 round bit flip code, and in Fig. S3, we show the logical error probabilities for rounds 1-50 of the bit flip code.

II. LOGICAL ERROR PROBABILITIES WITHOUT POST-SELECTION
Logical error probabilities shown in Fig. 3 of the main text were computed while excluding device-wide correlated error events which we attributed to high energy particles.In Fig. S3, we show the fraction of data that was discarded for every number of rounds in the phase and bit flip codes, as well as the logical error probabilities.To within the uncertainty from fitting, values of Λ X and Λ Z do not change when we do not discard data.

III. THE d = 2 SURFACE CODE
We implement a logical qubit in the distance-2 surface code, the smallest non-trivial example of a surface code logical qubit [42,43]. The physical layout is depicted in Fig. S4a-b, consisting of a 2 × 2 array of data qubits, indexed 0 to 3, subject to three stabilizer measurements Z0Z1, X0X1X2X3, and Z2Z3.
Since there are only four data qubits, it is straightforward to write explicit quantum states for the Z_L and X_L eigenstates. We can isolate specific logical states using the logical operators Z_L = Z0Z2 and X_L = X0X1 shown in Fig. S4c. For example, |0_L⟩ (the +1 eigenstate of Z_L) is the unique ground state of H − Z_L, where H = −Z0Z1 − Z2Z3 − X0X1X2X3 sums the negated stabilizers. An alternative way to identify |0_L⟩ is to start with |ψ0ψ1ψ2ψ3⟩ = |0000⟩, which is a +1 eigenstate of Z_L and both Z stabilizers, and then project it into the X0X1X2X3 = +1 subspace with the projection operator (1 + X0X1X2X3)/2. The logical states are

|0_L⟩ = (|0000⟩ + |1111⟩)/√2,  |1_L⟩ = (|1100⟩ + |0011⟩)/√2.

It is also possible for some stabilizer values to be −1. For example, if X0X1X2X3 = −1 but the others are +1, then we identify |0_L⟩ = (|0000⟩ − |1111⟩)/√2, differing from the +1 case by Z0 (or any Z_i). Initializing to |0000⟩ and projectively measuring X0X1X2X3, this would be the outcome half the time (also see Fig. S6a).
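The projection above can be checked numerically. This small sketch builds |0_L⟩ by projecting |0000⟩ into the X0X1X2X3 = +1 subspace and verifies the stabilizer and Z_L eigenvalues (an illustration, not code from the experiment):

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

def op(paulis):
    """Tensor product of single-qubit operators over the 4 data qubits."""
    return reduce(np.kron, paulis)

XXXX = op([X, X, X, X])
Z0Z1 = op([Z, Z, I2, I2])
Z2Z3 = op([I2, I2, Z, Z])
ZL = op([Z, I2, Z, I2])  # Z_L = Z0 Z2

# Project |0000> into the X0X1X2X3 = +1 subspace and normalize.
psi = np.zeros(16)
psi[0] = 1.0  # |0000>
psi = (psi + XXXX @ psi)
psi /= np.linalg.norm(psi)

# |0_L> = (|0000> + |1111>)/sqrt(2): a +1 eigenstate of all three
# stabilizers and of Z_L.
for stab in (XXXX, Z0Z1, Z2Z3, ZL):
    assert np.allclose(stab @ psi, psi)
print(psi[0], psi[15])  # equal amplitudes on |0000> and |1111>
```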
In our experiments, we explore all 8 stabilizer value combinations, which is representative of stabilizer values that would be encountered by a long-lived logical qubit. In particular, we initialize the data qubits to each of the 16 possible bitstrings, such as |0111⟩. For experiments in the logical Z basis, we proceed directly with stabilizer measurements, and the Z stabilizers and Z_L are already well-defined (for |0111⟩, Z0Z1 = −1, Z2Z3 = +1, and Z_L = −1), while the first X stabilizer measurement is random. For experiments in the logical X basis, we perform Hadamards on all four data qubits before proceeding with the stabilizer measurements, so |0111⟩ becomes |+−−−⟩. Now the X stabilizer and X_L are well-defined (for |+−−−⟩, X0X1X2X3 = −1 and X_L = −1), and the first Z stabilizer measurements are each randomly ±1. We show the specific quantum circuit for these experiments, analogous to Fig. 1c, in Fig. S5.

FIG. S4. Stabilizers and logical operators. a, Layout of the distance-2 logical qubit as depicted in Fig. 1a, with the data qubits labeled 0, 1, 2, 3, and the measure qubits labeled A, B, C. b, The same logical qubit depicted in a more standard lattice-surgery surface code notation, as in Ref. [45]. The Z stabilizers are light tiles (Z0Z1 and Z2Z3), and the X stabilizer is a dark tile (X0X1X2X3). c, The logical operators X_L = X0X1 and Z_L = Z0Z2, which cross at qubit 0 and therefore anticommute, {X_L, Z_L} = 0. d, A distance-3 logical qubit and its logical operators, analogous to c, with 9 data qubits and 8 stabilizers.

FIG. S5. Surface code quantum circuit. Quantum circuit implementing repeated Z (green) and X (blue) stabilizers, analogous to Fig. 1c. The stabilizer circuit is longer (four CZ layers) because of the weight-4 X stabilizer. For X_L logical measurements, we include Hadamard gates on each data qubit prior to measurement, shown in gray; these are omitted for Z_L logical measurements.
Note that to prepare a logical X L or Z L eigenstate, it is important to initialize all the data qubits in the same basis (X or Z) as the intended logical qubit state.Then, the data qubit state is an eigenstate of all the stabilizers of the same type as the logical operator, and any errors of the opposite type can be detected in the first round.
FIG. S6. Error detection. a, Example initialization to |0000⟩ prior to the first round of stabilizer measurements. This is a +1 eigenstate of Z_L and both Z stabilizers. In the first round, any X error can be detected. However, the first X stabilizer measurement will be random, so no Z errors can be detected. b, |++++⟩ is a +1 eigenstate of X_L and the X stabilizer. In the first round, any Z error can be detected, but the two Z stabilizers will have random values. c, |++00⟩ is a +1 eigenstate of X_L and the lower Z stabilizer. As in a, the first X stabilizer measurement will be random, so no Z errors can be detected, risking a logical error X_L = −1. d, Illustration of the detected syndrome for one X error. Note X0 and X1 have the same syndrome, but X0 flips Z_L while X1 does not. X2 and X3 are similar. e, Illustration of the detected syndrome for one Z error. All four have the same syndrome, but Z0 and Z1 flip X_L while Z2 and Z3 do not. In d-e, there is an implicit decoding procedure: for flipped X0X1X2X3, insert a Z0 correction; for flipped Z0Z1, insert an X0 correction; and for flipped Z2Z3, insert an X2 correction. When this correction is the wrong choice, which happens for about half of error events, we get logical errors.
We show standard Z and X initializations in Fig. S6a-b. Alternatively, consider |++00⟩, shown in Fig. S6c, which is employed in Ref. [43]. The first X0X1X2X3 measurement will be random, so no Z errors can be detected on the first round, risking a logical error X_L = −1. This encoding can detect any single error, but because it is only distance-2, the code cannot be used to correct errors, as shown in Fig. S6d-e. Any single error on a data qubit leads to an ambiguous syndrome, where it is unclear if a logical operator has been affected. This is distinct from the larger distance-3 logical qubit (see Fig. S4d), where any single error can be corrected unambiguously (distance d can accommodate any ⌊(d − 1)/2⌋ errors).
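The ambiguity of Fig. S6d-e can be made concrete by enumerating the syndromes of all single-qubit errors. This is a small sketch in which stabilizer supports are written as qubit-index sets; an error flips a stabilizer when they overlap on an odd number of qubits, which here is always at most one.

```python
# Stabilizer supports for the d = 2 surface code.
Z_stabilizers = {"Z0Z1": {0, 1}, "Z2Z3": {2, 3}}
X_stabilizer = {"X0X1X2X3": {0, 1, 2, 3}}

def syndrome(error_qubit, error_type):
    """Names of the stabilizers flipped by a single X or Z error."""
    # X errors flip Z stabilizers, Z errors flip the X stabilizer.
    checks = X_stabilizer if error_type == "Z" else Z_stabilizers
    return tuple(name for name, support in checks.items()
                 if error_qubit in support)

for q in range(4):
    print("X%d flips %s, Z%d flips %s" % (q, syndrome(q, "X"),
                                          q, syndrome(q, "Z")))
# X0 and X1 trigger the same syndrome (Z0Z1), yet only X0 flips
# ZL = Z0Z2; the detection cannot tell them apart.
```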
Consequently, any time we observe a detection event in a run, we simply discard that run.As we increase the number of rounds, we increase the probability that there has been a detection event, so the fraction of runs we keep decreases exponentially, as shown in Fig. 4c of the main text.Empirically, we remove about 27% of runs each round, which agrees well with simulations of the experiment.
At the end of each run, we measure the data qubits in the basis matching the logical basis of the experiment, either X or Z, and evaluate the appropriate logical operator. We identify a logical error if the logical measurement outcome differs from the value we initialized. By post-selecting only runs without detection events, we avoid most logical errors. However, two simultaneous errors can be undetectable and lead to logical errors, such as X0X1, which flips Z_L. Following post-selection, the probability of a logical error is about 0.002 each round, as shown in Fig. 4d. Specifically, for the X basis, we observe 0.0016 ± 0.0001 error per round, and for the Z basis, 0.0027 ± 0.0001 (linear fit uncertainties). For comparison, in Ref. [43], about 60% of runs are removed each round, and the logical error probability is about 0.03 each round.
In Fig. 4b, we project the error suppression factor Λ for the surface code. Modest performance improvements will be needed to achieve Λ > 1, which would be a clear demonstration of operating below threshold error rates, where making the code larger makes it better (even if the absolute error rate is worse than that of a physical qubit). However, a practical surface code quantum computer would benefit from Λ ∼ 10, which vastly decreases the required physical qubits per logical qubit for a given logical error rate. For example, suppose we want an overall logical error suppression 1/Λ^((d+1)/2) = 10−12 for a practical computation. For a given Λ, we can solve for the distance d and estimate the required number of physical qubits per logical qubit as roughly 2d², as shown in Fig. S7. For Λ = 10, this corresponds to roughly 1000 physical qubits (distance-23).
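The overhead estimate above can be reproduced in a few lines (a sketch assuming the rough 2d² qubit count used in the text and odd surface code distances):

```python
import math

def overhead(Lam, target_exponent=12):
    """Smallest odd distance d with Lambda**((d + 1) / 2) >= 10**target_exponent,
    plus the rough count of 2 * d**2 physical qubits per logical qubit."""
    half = math.ceil(target_exponent / math.log10(Lam))  # required (d + 1) / 2
    d = 2 * half - 1
    return d, 2 * d ** 2

for Lam in (2.0, 4.0, 10.0):
    print(Lam, overhead(Lam))
# For Lambda = 10 this gives distance 23 and roughly 1000 physical qubits,
# matching the estimate in the text.
```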
FIG. S7. Physical qubits per logical qubit. We estimate the physical qubits required for one logical qubit to achieve an overall logical error suppression of 10−12 as a function of the inverse error suppression factor 1/Λ, marking Λ = 10 with a vertical line. Left: semilog; right: log-log.

IV. QUANTIFYING LAMBDA
Accurately benchmarking the performance of quantum error correction can be confounded by artifacts if experiments are not carefully designed. In particular, boundary effects can introduce different error characteristics that must be understood. Here, we study two types of boundary effects. First, qubits at the code boundaries interact with a reduced number of stabilizers, and thus participate in fewer entangling gates, which can reduce the number of physical errors they experience. Second, data qubits are subject to fewer errors in the first round of the code than in the steady state, and data qubit measurement errors are only relevant in the final round of measurements.
In our analysis of the repetition code, we use the technique of subsampling outlined in the supplementary materials of Ref. [21]. In order to, for example, compare the performance of a d = 11 repetition code with a d = 3 repetition code, we take a single dataset for the d = 11 code, perform matching analysis, then subsample this dataset into a collection of d = 3 datasets and perform matching analysis on each sub-dataset. Generally, a repetition code is subsampled by taking the full chain (for d = 11) and uniquely choosing a contiguous line of 5 qubits (for d = 3) along it, as shown in Fig. S8. Subsampling has a number of practical advantages. First and foremost, the experimental burden of acquiring data is reduced. In order to quantify the performance of a distance-d repetition code as well as all possible configurations of smaller code distances, without subsampling we would need to perform

n_experiments = Σ_{n=1}^{(d−1)/2} (d − 2n)

experiments. In the case of d = 11, subsampling reduces the datasets needed by a factor of 25. Additionally, by using only a single source dataset, we enforce self-consistency in error rates between code distances and reduce sensitivity to systematic errors and system drift that may occur between data acquisition runs. Alternatively, one could collect only a single dataset for each code distance. However, qubits typically have performance variations, and the choice of which qubits to use for which code distance, and when, would introduce bias or noise into benchmarking.
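Subsampling amounts to array slicing over the chain. This sketch assumes detection data stored as shots × rounds × measure qubits (illustrative shapes, not the paper's analysis code); a distance-d repetition code has d − 1 measure qubits, so a d = 3 sub-code uses 2 adjacent measure qubits.

```python
import numpy as np

# Placeholder detection data for the d = 11 code (10 measure qubits).
shots, rounds, n_measure = 500, 50, 10
detections = np.zeros((shots, rounds, n_measure), dtype=np.int8)

d_sub = 3
m_sub = d_sub - 1  # measure qubits in a distance-d repetition code: d - 1
subsets = [detections[:, :, start:start + m_sub]
           for start in range(n_measure - m_sub + 1)]
print(len(subsets), subsets[0].shape)  # 9 contiguous d = 3 sub-datasets
```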
In order to understand boundary effects and their impact on repetition code data, we perform simulations using an uncorrelated depolarizing Pauli error model. Here, we use a simple error model described by Table S1, where every qubit shares identical error rates. Given these probabilities, we simulate 100,000 runs of a 21-qubit repetition code over 10 QEC rounds.
We process this simulated data to explore the detection event fraction as a function of round, per qubit. We find that the first and last rounds deviate from the steady-state detection event fraction, as seen in Fig. S9. This discrepancy comes from a difference in circuit structure as well as initial conditions. Before initialization, all qubits begin in the |0⟩ state and suffer no idling error during the measurement and reset (M + R) operations that subsequent rounds do. In the last round, the stabilizer outcomes are determined from the final data qubit measurements, and require no data qubit idling or entangling gates. These differences manifest in smaller error rates, and thus smaller detection event fractions, associated with these rounds.
This non-uniformity in detection event fraction must be accounted for when analyzing Λ. In benchmarking QEC, we seek to quantify the logical error rate in the steady state, but these boundary effects indicate the error rate is slightly different at the beginning and end of the code. Due to this effect, the logical error probabilities will deviate slightly from an exponential decay. To mitigate this behavior, we choose to fit an exponential decay only to experiments with a large number of rounds (greater than 10), where this effect is minimized. This can be seen in Fig. S10, where in this simple model we see logical error probabilities that deviate from an exponential model (dashed, solid lines) at small numbers of rounds. In this regime, the logical error probabilities outperform the steady state and are not predictive of future QEC performance. This discrepancy, here up to a factor of 2, can vary depending on circuit construction and hardware.
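The fitting procedure described here (restricting the exponential fit to rounds greater than 10) can be sketched as follows; the function name and the log-linear least-squares approach are ours:

```python
import math

# Sketch: fit the decay 1 - 2*P(n) = A * exp(slope * n) by least squares on
# ln(1 - 2*P), using only rounds n > min_round where early-time boundary
# effects have died out, then read off the logical error per round.
def fit_logical_error(rounds, probs, min_round=10):
    pts = [(n, math.log(1.0 - 2.0 * p))
           for n, p in zip(rounds, probs) if n > min_round]
    xs = [n for n, _ in pts]
    xbar = sum(xs) / len(xs)
    ybar = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - xbar) * (y - ybar) for x, y in pts)
             / sum((x - xbar) ** 2 for x in xs))
    return 0.5 * (1.0 - math.exp(slope))  # logical error per round
```

Because only the slope is used, an overall early-round offset (the boundary transient) does not bias the extracted error per round.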
In addition to time boundary effects, spatial boundary effects also exist for qubits located at the edge of the code, which participate in fewer entangling gates. This can be seen in Fig. S11, where the measure qubits at the edges of a simulated 21-qubit repetition code have lower detection event fraction. This introduces a small but systematic difference when comparing subsampled data to experiments that are run in isolation.
FIG. S10. Logical error probabilities fitted to an exponential model of logical error rate (dashed, solid lines) (distance-3, repetitions = 10,000). At low rounds, we see deviations from the exponential fit due to boundary effects at the start and end of the code, where error rates are reduced as compared to the steady state of the experiment. This can be seen in the lower graph, where we plot fitted error over simulated error. At low rounds, we find up to nearly a factor of 2 discrepancy. To mitigate this effect, we fit the exponential only to rounds greater than 10. Similar fits can be seen in Fig. 3 of the main text and in Fig. S3.

FIG. S11. Detection event fraction vs measure qubit index for a 21-qubit repetition code. Measure qubits at the edge of the code (index = 0, 9) have lower detection event fraction, as data qubits on the boundary participate in fewer entangling gates.

V. CIRCUIT SIMULATIONS WITH PAULI NOISE
This section describes simulations that approximate errors in the experiment as Pauli errors sampled from probability distributions and inserted into a circuit of Clifford gates. In many quantum error-correcting codes, including repetition codes and surface codes, the bulk of the encoded operations consists only of gates from the Clifford group [46]; the exception is the need to enact logical non-Clifford gates, such as through magic-state distillation [47], which is needed in a fault-tolerant quantum computer but is beyond the scope of logical memory experiments like this work. A circuit composed entirely of Clifford gates can be simulated efficiently using the Gottesman-Knill theorem [48], and this description includes noisy circuits where the noise is a probability distribution for randomly inserting a Pauli operator after each gate. Moreover, for stabilizer codes [46], the stabilizers are Pauli operators which can be measured by Clifford gates, so it is convenient to represent errors as a distribution of Pauli errors. We employ this model here (Clifford circuits with Pauli errors) because the simulations can easily scale to modeling large surface codes, such as a distance-23 surface code requiring at least 1057 qubits.
We employ circuit simulations to understand the relative contributions of errors from different operations, a process also known as error budgeting. This proceeds in two stages. First, we run simulations of the repetition codes with circuit-noise parameters informed by benchmarking of the component operations, such as CZ gate error from cross-entropy benchmarking and idling qubit error from measuring T1 and T2. We compare the logical error rate in the simulations with the logical errors in the experiment, and see close agreement. We also discuss possible explanations for the gap between experiment and simulation.
Second, we use simulations to estimate the relative contributions of component errors to the logical error rate. We construct an error budget for Λ (see Eqn. (1) of the main text) by representing its inverse Λ⁻¹ as a linear function of the component errors, which we motivate by arguing that Λ⁻¹ is approximately linear in the component errors. For such a model, the fraction budgeted to each component is simply the weighted contribution of that component's error, divided by the total Λ⁻¹. However, Λ⁻¹ is not a perfectly linear function, and we discuss our approach to dealing with this. Our intent with the error budgeting is to determine what component error rates are necessary to implement a working demonstration of a surface code. We can forecast how a small surface code might perform if run on a device with current error rates, and we can use the error budget to compare tradeoffs in component errors and make design decisions for future devices.

A. A Description of a Component-Error Model for Simulations
We simulate the repetition and surface code experiments in a simplified "circuit noise" model. A circuit is constructed from component operations, including Clifford gates and related operations like initialization or measurement in the eigenbasis of a Pauli operator. A circuit composed of these components can be simulated efficiently, and this set of instructions is sufficient to implement stabilizer codes such as repetition codes and surface codes.
Noise in the circuit is simulated by sampling random Pauli errors and inserting them into the circuit according to the following probability model. For each component, there is a "Pauli error channel," which is a distribution over the possible Pauli errors to insert, including identity for no error (e.g. the distribution has 4 elements for a single-qubit operation, or 16 for a two-qubit operation). For each component in the circuit, a Pauli error is sampled according to the distribution associated with that component, and this Pauli operator is inserted after the component. Measurement errors are treated slightly differently, as follows. The binary measurement result is flipped with a probability p, i.e. it goes through a classical binary symmetric channel instead of a Pauli channel. For the circuits used in this work, when a qubit is measured, it is always reset before being used again; this means we do not assume that a measured qubit is left in the state consistent with the measurement result, because we unconditionally reset that qubit before using it again.
The effect of the randomly sampled Pauli errors injected into the simulated circuit is to change some of the measurement outcomes from their expected values. For example, an X (bit-flip) error that occurs on a data qubit will be detected by the next syndrome circuits that interrogate this data qubit. We collect the syndrome measurements and final data-qubit measurements in the simulation, and process them in the same way as the experiment, using minimum-weight matching to infer the most likely locations of errors.
Our simulations make some simplifying assumptions about the Pauli error channels. First, we assume that each use of a component of the same type (e.g. every CZ gate) has the same error channel. Of course, it would be straightforward to simulate different error channels for each gate in the circuit. This would also be computationally efficient, but we opt to keep the number of parameters in the simulation relatively small. Second, we further simplify the error channels to be parameterized by a single scalar parameter. The error channel for each gate or idle is a depolarizing channel parametrized by a single probability p for any error to occur; for a single-qubit depolarizing channel, each of the X, Y, or Z errors has probability p/3 to occur; for a two-qubit depolarizing channel, each of the 15 non-identity Paulis has probability p/15 to occur. Each reset operation is followed by a quantum bit-flip channel (random insertion of Pauli X), and each measurement operation is followed by a classical bit-flip channel (random flip of the measurement bit). All components of the same type (e.g. every CZ gate) have the same error channel, but different components can have different error probabilities (i.e. the measurement error p_m can be distinct from the CZ error p_CZ).
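A minimal sketch of this sampling scheme (names are ours, not from the experiment's codebase):

```python
import random

# Sketch of the single-parameter Pauli-sampling noise model described above.
PAULIS_1Q = ["X", "Y", "Z"]

def sample_1q_depolarizing(p, rng=random):
    """With probability p insert an error; each of X, Y, Z has probability p/3."""
    return rng.choice(PAULIS_1Q) if rng.random() < p else "I"

def sample_2q_depolarizing(p, rng=random):
    """Each of the 15 non-identity two-qubit Paulis has probability p/15."""
    if rng.random() >= p:
        return "II"
    two_q = [a + b for a in "IXYZ" for b in "IXYZ" if a + b != "II"]
    return rng.choice(two_q)

def sample_measurement_flip(bit, p_m, rng=random):
    """Classical binary symmetric channel applied to the measurement result."""
    return bit ^ (rng.random() < p_m)
```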
There are six types of component operations in our model, which are listed in Tab. S2. Since the error channel on each component has a single parameter, the noise in the simulator has six parameters. We refer to these parameters collectively as a vector denoted x, which we use to relate the component-error probabilities to performance measures of the repetition and surface codes, such as the logical-error probability or Λ, the ratio by which logical error improves when the code distance is increased by 2.

B. Comparing Component-Error Simulations to the Experiments
To reproduce experimental conditions in the simplified simulator, we try to approximate the error rate in each component with data from benchmarking of those components. The methods for characterizing error are:
• Single- and two-qubit gates: cross-entropy benchmarking [49], averaged over the gates used in the experiment. Averages treat one-qubit and two-qubit gates separately.
• Idle operations: modeled as a memoryless depolarizing channel with decay time constant given by the relevant experiment, meaning "T1 decay" for the bit-flip code and "T2 decay" for the phase-flip code. T1 decay means initializing |1⟩ and measuring the probability of the state remaining |1⟩ as a function of time; T2 decay means initializing |+⟩ and measuring the decay of this state to the mixed state with time, while applying CPMG echoing to remove low-frequency phase noise (this dynamical decoupling is also done during idle operations in the phase-flip experiments).
• Reset and measurement: These errors are difficult to distinguish; measurement error presents a noise floor for reset characterization. However, for simulation purposes, only the sum of the two error probabilities is important. We characterize reset by performing the reset gate between measurement pulses, preparing the qubit in |0⟩ or |1⟩; the error is the probability of finding |1⟩ after reset. For measurement, we benchmark individual qubits by preparing |0⟩ or |1⟩ and immediately measuring, identifying the error probability. We also benchmark simultaneous readout on all the measure qubits and on all the qubits, as in Ref. [31].
It is important to note that the model is limited to simulating Markovian Pauli channels. The associated probability distributions are independent and identically distributed for each type of component. Other important physical effects that we suspect to be present are not included in the model, such as leakage, cross-talk during gates, cosmic rays, parameter drift with time, or any other non-Markovian noise source. The reason for choosing such a limited noise model is that it scales to large problem sizes and allows us to make forecasts for surface codes. In future work, we will improve the simulations to incorporate approximations to effects like leakage that remain computationally efficient at large numbers of qubits.
The simulation conditions mirror the experiments in simulating bit-flip and phase-flip error-correcting codes with the following parameters. The values of the component-error probabilities are those given in the main text, Fig. 4a. The syndrome circuits are executed n_rounds times, for n_rounds being every integer in the range [1, 50]. At each value of n_rounds, the simulation is executed M = 160,000 times. A logical error has occurred if the logical measurement at the end of an error-correction circuit gives an encoded qubit state different from the initial encoded state. We count the number of simulated logical errors m_e(n_rounds) at each value of n_rounds, and the logical error probability is calculated as

P_error(n_rounds) = m_e(n_rounds)/M.

For each value of code distance d ∈ {3, 5, 7, 9, 11}, we determine the logical error rate ε_logical by fitting

P_error(n_rounds) = [1 − (1 − 2 ε_logical)^n_rounds]/2

to the sampled data. This fitting ansatz has the properties that P_error(n_rounds = 0) = 0, it saturates as P_error(n_rounds → ∞) = 0.5, and the error after one round is P_error(1) = ε_logical. As in the main text, we calculate Λ as the ratio by which logical error improves when increasing the code distance by 2:

Λ = ε_logical(d) / ε_logical(d + 2).

The simulated logical error vs. number of syndrome rounds, and fits to this data, are shown in Fig. S12. The simulated logical error rates match well but not perfectly to the experimental results. Figure S13 shows the fitted logical error per round vs.
code distance and fits to determine Λ. The simulated error rates are lower, and the Λ values are higher, than what is seen in the experiments. We attribute this discrepancy to one or more of the assumptions of the simulator not holding in the experiment. For example, Section VI discusses evidence for cross-talk errors happening during the experiment, as well as long-time correlations in detection events due to the presence of leakage states in the data qubits. Another possibility is that parameter drift during the experiment leads to higher error rates when running error correction than during the component benchmarking that determines the component error probabilities used in the simulation. Said another way, this method of forecasting Λ accounts for about 85% of the error: it predicts Λ⁻¹ values that are about 0.85 of the experimentally measured values, leaving weighted error contributions of about 15% of the total unaccounted for. This method was also used to simulate the d = 2 surface code, producing the "model" traces in Fig. 4c-d of the main text.
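The fitting ansatz and the Λ ratio above can be checked numerically; the closed form below is our reconstruction, consistent with the three stated properties:

```python
# Numeric check of the stated properties of the fitting ansatz and of the
# Lambda ratio (a sketch; symbol names are ours).
def p_error(n_rounds, eps):
    """P_error(n) = [1 - (1 - 2*eps)^n] / 2."""
    return 0.5 * (1.0 - (1.0 - 2.0 * eps) ** n_rounds)

def lambda_factor(eps_by_distance, d):
    """Lambda: ratio by which logical error improves when d increases by 2."""
    return eps_by_distance[d] / eps_by_distance[d + 2]
```

With these definitions, P_error(0) = 0, P_error(1) = ε, and P_error saturates at 0.5 for large round number, as stated in the text.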

C. Error Budgeting: Constructing a Linear Model Relating Component Errors to Inverse of Lambda
The quantity Λ is used to forecast the logical error rate for a quantum code of a given size, so we extend this reasoning to determine what component error rates are needed to realize a target Λ value. We use the convention that Λ is the factor by which logical error is suppressed by increasing code size, where Λ > 1 means logical error decreases when code size increases. As a ratio, its inverse Λ⁻¹ has the same meaning (the factor by which logical error changes when code size increases one step). Moreover, we argue that Λ⁻¹ is approximately a linear function of the component errors. As in the main text, we say that the logical error rate is related to code distance d by ε_logical(d) ∝ Λ^(−(d+1)/2) for d odd. It has been seen in numerical simulations with Pauli-channel noise [50,51] that for a single physical-error parameter p, ε_logical ∝ (p/p_th)^((d+1)/2), where p_th is the threshold error rate for the chosen code and error model parameterized by p. Hence, a naive comparison of the two approximate expressions gives Λ⁻¹ = p/p_th, meaning that Λ⁻¹ is (approximately) linear in p.
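The naive comparison above can be illustrated with hypothetical numbers (the function name and the values of p and p_th are ours, chosen only for illustration):

```python
# Illustration: if eps_logical(d) = C * (p / p_th) ** ((d + 1) // 2),
# then Lambda = eps(d) / eps(d + 2) = p_th / p, so 1/Lambda is linear in p.
def eps_logical(d, p, p_th, C=1.0):
    return C * (p / p_th) ** ((d + 1) // 2)

p, p_th = 1e-3, 1e-2          # hypothetical physical and threshold error rates
lam = eps_logical(3, p, p_th) / eps_logical(5, p, p_th)
# lam equals p_th / p regardless of which pair of distances is used
```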
For notational simplicity, denote the vector of component error rates as x and let there be a function of component error rates f(x) such that Λ⁻¹ = f(x). We will assume throughout that f(0) = 0, meaning Λ approaches ∞ in the limit that errors go to zero. If f(x) were a truly linear function in its arguments, we could calculate the gradient g = ∇f anywhere to determine f exactly. However, numerical simulations show that this is not the case, and the gradient changes for different choices of the point to linearize around. Since we desire a linear model to form an error budget, we need to make a choice of how to do so; since f(x) is not linear, there is no single "correct" answer.
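One natural choice, examined below in the text, evaluates the gradient at the midpoint x/2: for any quadratic f with f(0) = 0, the linear model built from the gradient at a/2 reproduces f(a) exactly. A toy numeric check with arbitrary coefficients (all names and values are ours):

```python
# Numeric check of the midpoint-linearization property on a toy two-parameter
# quadratic f with f(0) = 0 (coefficients are arbitrary).
def f(x):
    g = [2.0, 5.0]                   # gradient at x = 0
    H = [[0.6, 0.3], [0.3, 1.2]]     # symmetric Hessian
    lin = sum(gi * xi for gi, xi in zip(g, x))
    quad = 0.5 * sum(H[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return lin + quad

def grad(func, x, h=1e-6):
    """Central finite-difference gradient."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((func(xp) - func(xm)) / (2.0 * h))
    return out

a = [0.04, 0.01]                            # stand-in component error rates
weights = grad(f, [ai / 2.0 for ai in a])   # gradient evaluated at a/2
budget = [w * ai for w, ai in zip(weights, a)]
# for a quadratic f, the weighted contributions sum exactly to f(a)
```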
Our approach is to treat f(x) as if it were a second-order function in its arguments,

f(x) = g·x + (1/2) xᵀH x,

where g is the gradient of f, (H)_ij = ∂²f/∂x_i∂x_j is the Hessian matrix of f, and both are evaluated at x → 0⁺. By doing so, we are saying that the second-order terms capture enough of the nonlinearity in f to provide a good approximation in the domain of interest. We then exploit the following property. For any second-order function f with f(0) = 0, the linear function given by the first-order Taylor series evaluated at the point a/2 coincides with the second-order function at a:

∇f(a/2)·a = f(a).

To make an error budget for the experimental component-error vector x (values in Fig. 4a of the main text), we use simulations to numerically evaluate the gradient of f at x/2, which determines the weights on the error components. From the weights in this linear model, we can produce an estimate of f = Λ⁻¹ that shows the weighted contribution of each component error. These results are summarized in Tab. S3 and Tab. S4. We see in these tables that the major source of logical error (more than 50% of the budget) is idling error during the measurement and reset process. This is simply due to T1 decay times around 15 µs and idle times of 880 ns during measurement and reset, leading to an error probability of 4-5% during each such operation. CZ gates and the combined effect of reset and measurement account for most of the remaining errors, with very small contributions from one-qubit gates and idle operations during gates.

TAB. S4. Error budget for the phase-flip code. *Note that "I" gates are assigned zero weight. The term in the gradient of Λ⁻¹ for this component is actually a small negative number that depends on code distance, for example about −1 for Λ between d = 3 and d = 5. The reason it is negative is that "I" gates only appear on data qubits at the endpoints of the linear chain, and not across the data qubits like the other components. This is why the derivative of Λ⁻¹
with respect to the "I"-gate probability is negative: errors in this component affect d = 3 more than d = 5, and the trend continues to higher distances. For the experimentally measured error rate in this component, it has a negligible contribution to the logical error and hence to Λ⁻¹, so we choose to set its weight to zero for the purposes of an error budget.

VI. ANALYSIS OF ERROR CORRELATIONS

In this section, we discuss a technique that allows us to characterize error processes in repetition code experiments using correlations between detection events. We refer to this technique as the p_ij correlation matrix method. We use it to estimate the probability p_ij of conventional (e.g., bit or phase flips) and unconventional (e.g., leakage and crosstalk) error processes that produce pairs of detection events at the error graph nodes i and j. We use this technique to produce in-situ diagnostics for QEC operation, and because it extracts detailed error information, it can also inform weights for the decoder.

A. Error graph and correlation matrix pij
Figure S14 shows an example of the error graph of a bit-flip or phase-flip repetition code. It contains (N_r + 1)N_mq nodes (vertices), where N_r is the number of rounds (0, 1, ..., N_r − 1) and N_mq is the number of measure qubits (the number of data qubits is then N_mq + 1, which is also the code distance d). Each node i corresponds to a readout of a measure qubit (except for the last column of nodes; see below) and can be associated with a pair of error graph coordinates: i = {s, t}, where s = 0, 1, ..., N_mq − 1 is the space coordinate (measure qubit index) and t = 0, 1, ..., N_r is the time coordinate (round number). The nodes can also be indexed linearly, e.g., in the "time-first" manner,

i = s(N_r + 1) + t, (S7)

or in the "space-first" manner,

i = t N_mq + s. (S8)

In each experiment, some of the nodes experience error detection events [21] (or simply "detection events"), denoted by red dots in Fig. S14 (black dots denote the absence of detection events). By definition, a detection event at node i = {s, t} occurs when the corresponding measurement result m_{s,t} differs from the previous measurement of the same qubit, x_{s,t} = m_{s,t} ⊕ m_{s,t−1}, where x_i = 1 means a detection event at node i, while x_i = 0 means no detection event (here ⊕ denotes XOR). There are two exceptions to this rule. First, for the column with t = 0, instead of the non-existing m_{s,−1} we use the parity of the two neighboring data qubits in the initial state (if there is no error, we are supposed to get x_{s,0} = 0). The second special case is the last column of nodes, t = N_r, which does not correspond to a physical round (physical rounds are t = 0, 1, ..., N_r − 1); in this case, instead of the non-existing m_{s,N_r}, we use the parity of the neighboring data qubit readouts at the end (after round N_r − 1), so that x_{s,N_r} = 0 again indicates the expected no-error situation.
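The detection-event rules above, including the two boundary exceptions, can be sketched as follows (our notation; m[s][t] is the measure-qubit record, and the two parity arrays encode the initial state and the final data-qubit readouts):

```python
# Sketch: build detection events x[s][t] from a record m[s][t] of measure-qubit
# outcomes, the initial data-qubit parities (used at round t = 0), and the
# final data-qubit measurement parities (used for the extra column t = N_r).
def detection_events(m, init_parity, final_parity):
    n_mq, n_r = len(m), len(m[0])
    x = [[0] * (n_r + 1) for _ in range(n_mq)]
    for s in range(n_mq):
        x[s][0] = m[s][0] ^ init_parity[s]           # compare with prepared parity
        for t in range(1, n_r):
            x[s][t] = m[s][t] ^ m[s][t - 1]          # change vs previous round
        x[s][n_r] = final_parity[s] ^ m[s][n_r - 1]  # final data-qubit parities
    return x
```

For example, a single measurement error in one round flips two consecutive detection nodes, illustrating the pairwise structure the matching decoder relies on.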
A decoder's task is to use the detection events on the error graph to choose one of two given complementary initial states of the data qubits (the initial parities of neighboring data qubits are given, so the decoder needs to determine only one bit of information). The decoder for this experiment uses the minimum-weight perfect matching algorithm [21,50,52], which connects detection events to each other (pairwise) or to a space-boundary.
In the conventional Pauli error model assumed by the decoder [21], detection events can be produced only in pairs, corresponding to the edges of the error graph (for the space-boundary edges, only one detection event near the boundary is produced). There are 3 types of such edges; see Fig. S14. Spacelike (S) edges connect nodes {s, t} and {s + 1, t} (the boundary S-edges connect nodes {0, t} and {N_mq − 1, t} to the corresponding space-boundaries), timelike (T) edges connect nodes {s, t} and {s, t + 1}, and spacetimelike (ST, "diagonal") edges connect nodes {s, t} and {s + 1, t + 1}. In the conventional Pauli error model, a single physical error corresponds to an edge of the error graph.
Note that if two physical errors occur on edges sharing a node (see Fig. S14), then there will be no detection event at this node: two detection events at the same node cancel each other. Therefore it is better to say that a physical error flips the color (black↔red, x_i → 1 − x_i) of two nodes, instead of producing two detection events. Now let us discuss how to find the probability p_ij of a physical error that flips the colors of both nodes i and j, using the experimental statistics of detection events. From experimental data we see that such processes may occur not only when a pair of nodes is connected by a conventional edge on the error graph; therefore, we treat i and j as arbitrary nodes. However, we still assume that such pairs (edges) are uncorrelated with each other. In reality, sometimes there is a correlation between the edges (discussed later); so the assumption of the absence of correlation is a first approximation.
As mentioned above, p_ij denotes the probability that the two nodes i and j flip color simultaneously. These nodes can also flip color because of other edges connected to i and j separately. However, it is important that these additional flips are independent (uncorrelated) for i and j because they are caused by different physical errors. Therefore, we can consider three uncorrelated processes: node i flips color (x_i → 1 − x_i) with some probability p_i, similarly node j flips color with probability p_j, and both nodes flip color with probability p_ij. Since we start with the black color (x_i = x_j = 0), the joint probabilities P(x_i, x_j) of detection or no-detection events at nodes i and j are

P(1,1) = p_ij (1 − p_i)(1 − p_j) + (1 − p_ij) p_i p_j,
P(1,0) = p_ij (1 − p_i) p_j + (1 − p_ij) p_i (1 − p_j),
P(0,1) = p_ij p_i (1 − p_j) + (1 − p_ij)(1 − p_i) p_j,
P(0,0) = p_ij p_i p_j + (1 − p_ij)(1 − p_i)(1 − p_j). (S9)

These formulas have an obvious meaning, describing combinations of the three processes occurring or not occurring. Note that P(0,0) + P(0,1) + P(1,0) + P(1,1) = 1. The relations (S9) can also be expressed via the fractions of the detection events (often abbreviated as DEF: detection event fraction) for each node, ⟨x_i⟩ = P(1,0) + P(1,1) and ⟨x_j⟩ = P(0,1) + P(1,1), and the probability of both detection events, ⟨x_i x_j⟩ = P(1,1), which gives

⟨x_i⟩ = p_i + p_ij − 2 p_i p_ij,
⟨x_j⟩ = p_j + p_ij − 2 p_j p_ij,
⟨x_i x_j⟩ = p_ij (1 − p_i)(1 − p_j) + (1 − p_ij) p_i p_j. (S10)

Solving these equations for p_ij, p_i, and p_j, we obtain

p_ij = 1/2 − (1/2) [1 + 4(⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩) / ((1 − 2⟨x_i⟩)(1 − 2⟨x_j⟩))]^(−1/2), (S11)

p_i = (⟨x_i⟩ − p_ij)/(1 − 2 p_ij), p_j = (⟨x_j⟩ − p_ij)/(1 − 2 p_ij). (S12)

We can think about p_ij as a symmetric matrix, p_ji = p_ij, with indices corresponding to the nodes ordered either in the "time-first" way (S7) or in the "space-first" way (S8); see Figs. S15 and S16 discussed later. Formally, in Eqn. (S11) the diagonal elements are the detection event fractions, p_ii = ⟨x_i⟩; however, we usually set them to zero, p_ii ≡ 0, for clarity of graphical presentation.
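A quick Monte-Carlo check of this three-process model; the closed-form estimator below is our reconstruction of the solution (S11) in terms of ⟨x_i⟩, ⟨x_j⟩, and ⟨x_i x_j⟩, and may differ in presentation from the paper's expression:

```python
import random

# Monte-Carlo check: generate detection events from three independent flip
# processes (i alone, j alone, joint), then recover p_ij from the moments.
def estimate_pij(xi, xj):
    n = len(xi)
    ei = sum(xi) / n
    ej = sum(xj) / n
    eij = sum(a * b for a, b in zip(xi, xj)) / n
    ratio = 4.0 * (eij - ei * ej) / ((1.0 - 2.0 * ei) * (1.0 - 2.0 * ej))
    return 0.5 * (1.0 - 1.0 / (1.0 + ratio) ** 0.5)

rng = random.Random(1)
p_i, p_j, p_ij = 0.08, 0.10, 0.03    # hypothetical edge probabilities
xi, xj = [], []
for _ in range(200_000):
    joint = rng.random() < p_ij
    xi.append(int((rng.random() < p_i) ^ joint))
    xj.append(int((rng.random() < p_j) ^ joint))
# estimate_pij(xi, xj) recovers p_ij up to sampling noise
```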
Note that in the experimentally relevant case when p_ij ≪ 1/4, Eqn. (S11) can be approximated as (i ≠ j)

p_ij ≈ (⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩) / ((1 − 2⟨x_i⟩)(1 − 2⟨x_j⟩)). (S13)

Equation (S13) for p_ij is Eqn. (2) of the main text. This form shows a clear relation of p_ij to the covariance ⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩; however, the correction due to the denominator is typically quite significant. For example, for ⟨x_i⟩ ≈ ⟨x_j⟩ ≈ 0.11 (see Fig. 1 of the main text), the denominator in Eqn. (S13) is about 0.6. The approximation (S13) slightly overestimates Eqn. (S11); the correction factor is roughly (1 − 3 p_ij).
Equation (S11) allows us to find accurate individual error probabilities for the S, T, and ST edges of the error graph, which are needed for the minimum-weight decoder. However, there is an important exception: the error probability for a boundary S-edge cannot be obtained in this way because it contains only one node. To find the error probability p_iB for the boundary edge at node i, we use Eqn. (S12), p_{i,Σ} = (⟨x_i⟩ − p_iB)/(1 − 2 p_iB), in which the "individual flip" probability p_{i,Σ} is calculated from the already-computed error probabilities for the S, T, and ST edges connected to node i. We essentially sum up the known error probabilities of the connected edges and find the missing error probability (due to the boundary edge) needed to bring the sum to the DEF ⟨x_i⟩. Note, however, that it is not a simple sum of the probabilities because of the "color flipping" procedure, so that the errors p_ij1, p_ij2, ..., p_ijk due to the k connected edges produce the total flip probability

p_{i,Σ} = g(p_ijk, ... g(p_ij3, g(p_ij2, p_ij1)) ...), (S14)

g(p, q) ≡ p(1 − q) + (1 − p)q = p + q − 2pq. (S15)

Thus, after finding p_{i,Σ}, we calculate the boundary S-edge probability as

p_iB = (⟨x_i⟩ − p_{i,Σ})/(1 − 2 p_{i,Σ}). (S16)

Note that this procedure for boundary edges assumes that the error processes corresponding to different edges are uncorrelated. In reality this is not a very good assumption (this is why we actually use a slightly different procedure for boundary edges). A natural way to estimate the effect of correlation between the edges is to use Eqn. (S14) for a node i not close to a boundary, summing up the contributions from all connected edges and then comparing the result with the DEF ⟨x_i⟩. Doing this test for the phase-flip experiment, we typically find a relative inaccuracy of about 4% (median value), which indicates a reasonably small but still nonzero correlation between the main edges (for the bit-flip experiment the median relative inaccuracy is about 9%). A natural way of thinking about positive correlations between the edges is to assume that some error processes flip the color of 4, 6, ... nodes on the error graph, so that the same process increases p_ij for several pairs of nodes (this also produces unconventional edges on the error graph reported by p_ij). To study correlations between edges, we have generalized the p_ij method to 3-point and 4-point correlators (essentially "hyperedges"), extending the approach of Eqn. (S9) to account for more nodes and more error processes. This generalization will be described in a future publication.
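The composition rule g and the boundary-edge inversion can be sketched and self-checked as follows (function names are ours):

```python
from functools import reduce

# Sketch of the boundary-edge bookkeeping: independent edge flips compose
# through g(p, q) = p + q - 2pq (probability that an XOR of two independent
# coins is 1), and the boundary S-edge probability is recovered by inverting
# that relation against the detection event fraction <x_i>.
def g(p, q):
    return p + q - 2.0 * p * q

def total_flip_probability(edge_probs):
    """p_{i,Sigma}: net flip probability from a list of independent edges."""
    return reduce(g, edge_probs, 0.0)

def boundary_edge_probability(def_i, p_i_sigma):
    """Solve <x_i> = g(p_iB, p_{i,Sigma}) for the boundary edge p_iB."""
    return (def_i - p_i_sigma) / (1.0 - 2.0 * p_i_sigma)
```

Composing a known boundary probability into a DEF and then inverting recovers it exactly, which is the consistency the text relies on.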

B. Fluctuations of the pij elements
When evaluating Eqn. (S11) using experimental data, the p_ij values exhibit statistical fluctuations because the averages ⟨x_i x_j⟩, ⟨x_i⟩, and ⟨x_j⟩ are estimated from a large but finite number N_expt of experimental realizations (typical values of N_expt are between 10³ and 10⁵). In this section we estimate the standard deviation σ_pij of the statistical fluctuations of the p_ij elements.
For the estimate, let us use the approximation (S13) and assume the usual experimental case when ⟨x_i⟩ ≪ 1, ⟨x_j⟩ ≪ 1, and p_ij ≪ 1. Then the effect of fluctuations of the denominator is negligible in comparison with fluctuations of the numerator (the covariance C_ij = ⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩), so

σ_pij ≈ σ_Cij / ((1 − 2⟨x_i⟩)(1 − 2⟨x_j⟩)). (S17)

Using the form C_ij = ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)⟩, and using in it the true averages ⟨x_i⟩ and ⟨x_j⟩ instead of the averages over N_expt realizations (the effect of the change is negligible), we find

σ²_Cij = Var[(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)] / N_expt. (S18)

Evaluating this variance using the properties x_i² = x_i and x_j² = x_j, keeping the leading terms, and inserting the result into Eqn. (S17), we obtain

σ_pij ≈ sqrt[ p_ij(1 − p_ij)(1 − 2⟨x_i⟩)²(1 − 2⟨x_j⟩)² + ⟨x_i⟩⟨x_j⟩(1 − ⟨x_i⟩)(1 − ⟨x_j⟩) ] / [ (1 − 2⟨x_i⟩)(1 − 2⟨x_j⟩) sqrt(N_expt) ]. (S19)

Note that the first and second terms in the numerator of Eqn. (S19) have a clear meaning and can be obtained separately. When p_ij is well above the statistical noise floor, σ_pij mainly comes from fluctuation of the number of realizations in which the edge error (color flipping event) has occurred: N_expt p_ij ± sqrt(N_expt p_ij(1 − p_ij)), as follows from binomial statistics. It is easy to see that this leads to the first term in Eqn. (S19). The second term is the noise floor, coming from the fluctuations of ⟨x_i⟩, ⟨x_j⟩, and ⟨x_i x_j⟩ when p_ij = 0. It can be obtained, e.g., by considering the binomial fluctuations of the numbers of realizations with x_i = 1 and with x_j = 1 (with uncorrelated ±), then calculating the apparent value of the covariance C_ij and using it in Eqn. (S17); this gives the second term in Eqn. (S19).
As a final simplification, let us neglect the factors (1 − p_ij) and (1 − ⟨x_i⟩)(1 − ⟨x_j⟩) in Eqn. (S19) (this slightly increases σ_pij, so we are on the safe side), thus obtaining

σ_pij ≈ sqrt[ p_ij(1 − 2⟨x_i⟩)²(1 − 2⟨x_j⟩)² + ⟨x_i⟩⟨x_j⟩ ] / [ (1 − 2⟨x_i⟩)(1 − 2⟨x_j⟩) sqrt(N_expt) ]. (S20)

In our repetition phase-flip code experiments, we have N_expt = 76,000 realizations and the detection event fractions are ⟨x_i⟩ ≈ 0.11 (slightly bigger, ≈ 0.12, in the bit-flip experiments). Thus, the standard deviation of the experimental p_ij values that are nominally zero (the noise floor) is roughly

σ_pij ≈ sqrt(⟨x_i⟩⟨x_j⟩/N_expt) / ((1 − 2⟨x_i⟩)(1 − 2⟨x_j⟩)) ≈ 7 × 10⁻⁴. (S21)

In particular, this is the noise floor seen in the p_ij matrix plots shown in Figs. S15 and S16. Additional averaging over the rounds leads to an even smaller noise floor (< 2 × 10⁻⁴) in Fig. 2(c) of the main text.
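The noise-floor arithmetic can be verified directly; the expression below follows our reconstruction of Eqns. (S20)-(S21), so the numeric value is a check of that reconstruction rather than an independent result:

```python
import math

# Arithmetic check of the p_ij noise floor (p_ij = 0 case of the estimate
# above) for the quoted phase-flip experiment parameters.
def pij_noise_floor(def_i, def_j, n_expt):
    denom = (1.0 - 2.0 * def_i) * (1.0 - 2.0 * def_j)
    return math.sqrt(def_i * def_j / n_expt) / denom

floor_phase = pij_noise_floor(0.11, 0.11, 76_000)
# comes out just under 7e-4 for <x_i> = <x_j> = 0.11 and N_expt = 76,000
```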

C. Experimental results for pij
Figure S15 shows the correlation matrix p_ij for a phase-flip code experiment with 21 qubits (N_mq = 10 measure qubits and 11 data qubits) and N_r = 30 rounds. In this particular experiment, no cosmic ray events were detected, so no data were discarded from the N_expt = 76,000 runs. The error graph nodes i and j are ordered in the "time-first" way given by Eqn. (S7). Figure S15 contains 310 × 310 pixels, with the color of each pixel determined by the value of the corresponding p_ij element. Each axis contains N_mq = 10 blocks (see grid lines) corresponding to the 10 measure qubits indicated on the axes; each block contains N_r + 1 = 31 points (small ticks on the axes) corresponding to time rounds.
We see that most pixels in Fig. S15 (those away from the features discussed below) have values close to zero. The fluctuations are consistent with the expected noise floor given by Eqn. (S21). The figure is symmetric across the main diagonal (which runs from bottom-left to top-right) because p_ji = p_ij. The values on the main diagonal are set to zero.
The most visible features are 4 diagonal lines (2 on each side of the main diagonal), which correspond to the S and T edges of the error graph: the T-edge line contains pixels next to the main diagonal, while the S-edge line is N_r + 1 pixels away from the main diagonal. The color scale for the S and T lines is saturated because the values of p_ij for these lines are around 0.03; they are shown in Fig. S17, discussed in more detail below. There is also a less visible line in Fig. S15 next to the S line (one pixel farther, N_r + 2, from the main diagonal), which corresponds to ST edges. The typical values of p_ij for the ST line are around 0.004. Another well-visible feature in Fig. S15 is reddish "dirt" near the S and T lines for qubits mq1 and mq2 and, to a lesser extent, for some other qubits; we attribute this feature to leakage to state |2⟩ in a data qubit. One more feature is short lines ("scars") parallel to the main diagonal, which we attribute to crosstalk. The leakage and crosstalk are discussed later.
S, T, and ST edges. In the conventional theory of the repetition QEC code, the errors are associated only with S, T, and ST edges. The elements of p_ij show the probabilities of these errors individually for each edge of the error graph. We emphasize that these probabilities are obtained in situ, during the actual operation of the code, in contrast to estimates based on qubit coherence and gate fidelities.
As expected from the conventional theory, S, T, and ST edges are the main features in Fig. S15. The values of the p_ij elements for these edges are shown in Fig. S17: blue markers for S edges, red markers for T edges, and green markers for ST edges; the lines are a guide for the eye. The S-edge error probabilities for the boundary edges (denoted dq0 and dq10 in Fig. S17) are calculated using Eqs. (S14)-(S16); we see that their values are consistent with the other S edges. Each block of blue markers corresponds to a particular data qubit (indicated at the top), and the markers within a block correspond to time rounds (from 0 to 30, see the horizontal axis). Note that the S-edge probabilities for rounds t = 0 and t = 30 are significantly smaller than for other rounds (emphasizing the need for many rounds in an experiment). This is because S-edge errors in our phase-flip code are mainly due to dephasing of data qubits during readout and reset (or due to energy relaxation for a bit-flip code), while the special rounds t = 0 and t = N_r do not have these parts of the cycle. For other rounds, the error probability p_ij can be crudely estimated as τ/2T_2, where τ is the readout-and-reset time (the expected contribution from CZ gates is significantly smaller). In our experiment τ = 0.88 µs and on average T_2 ≈ 16 µs, which gives τ/2T_2 ≈ 0.028. We see that the p_ij values for S edges (blue symbols) are close to this estimate, though they differ between data qubits, mostly reflecting variation in T_2 times, with additional contributions from gate errors. The integrated histogram for the S edges is shown by the blue line in the left panel of Fig. S18; the median p_ij value is given in Eqn. (S22).

The T-edge errors (red symbols in Fig. S17) are grouped in blocks corresponding to the measure qubits indicated below the red symbols. The T-edge errors are expected to come mainly from readout errors, but there are also contributions from gate errors and reset errors. Our median readout error is around 0.018; however, the T-edge p_ij values are considerably higher, with the median value given in Eqn. (S23) (see the integrated histogram in Fig. S18). The error probabilities for ST edges (green symbols in Fig. S17) are much lower than for S or T edges; they are expected to come mainly from CZ gate errors. The integrated histogram in Fig. S18 (green line) shows for the ST edges a median value of 1.3 × 10⁻³.

Unconventional edges. Figure S15 clearly shows that, in contrast to what is expected from the conventional QEC theory, some correlations between detection events correspond to error graph edges different from the S, T, and ST types. In particular, there are significantly non-zero p_ij values near the lines corresponding to T and S edges, separated from them by a few rounds. The integrated histogram for some types of these edges is shown in the right panel of Fig. S18. As illustrated by the inset, one such type is the "diagonal" edges similar to the ST edges but going in the other direction. With 2T, 3T, etc. we denote the edges spanning 2, 3, etc. rounds for the same measure qubit. We see that, of the unconventional edges, the 2T edges have the highest typical probability (a median of 1.7 × 10⁻³), which is still smaller than the typical ST-edge probability by more than a factor of two. The relatively small probability of unconventional edges indicates the high quality of the experiment. Note that before qubit reset [33] was implemented, the unconventional-edge probabilities were much higher, with 2T probabilities exceeding ST probabilities.
The negative values of p_ij for a small fraction of unconventional edges shown in Fig. S18 are consistent with the statistical noise level of Eqn. (S21). Note, however, that in some cases, for example for 2T edges in a high-quality bit-flip experiment, the p_ij values can actually be slightly negative. This can be understood using Eqn. (S13) as a negative correlation. Indeed, a negative correlation between the nodes can be caused by a negative correlation between the edges. An example is the second-order anticorrelation due to data qubit energy relaxation (an energy relaxation event cannot be immediately followed by another relaxation event), which may cause slightly negative p_ij in a bit-flip repetition code experiment [33].
Figure S16 shows the same data as Fig. S15 but with a different ordering of the nodes: here we use the "space-first" ordering from Eqn. (S8). Each axis then contains N_r + 1 = 31 blocks corresponding to time rounds (grid lines), while the N_mq = 10 points within each block correspond to measure qubits. The S edges are next to the main diagonal, the T edges form diagonal lines separated by 10 pixels from the main diagonal, and the ST edges are on the next diagonal line (11 pixels from the main diagonal). The parallel lines in Fig. S16 separated by 20, 30, etc. pixels from the main diagonal correspond to 2T, 3T, etc. edges. The figure clearly shows that temporal correlations can survive for over 5 rounds.
Leakage to state |2⟩. We attribute the detection-event correlations lasting for several rounds, as seen in Fig. S16, to leakage to state |2⟩ in data qubits. The same effect causes the "dirt" in Fig. S15 close to the S and T lines, with the magnitude of the correlations for several edge types shown in the right panel of Fig. S18. Note that measure qubits are reset to |0⟩ at every round, so non-computational states can survive only in data qubits. For a typical qubit energy relaxation time of T_1 ≈ 15 µs and a round duration of 960 ns, we would expect state |2⟩ to survive on a data qubit for about 8 rounds. Examining Figs. S16 and S18, we see that this estimate is in the right ballpark, but the actual decay of state |2⟩ can be significantly faster due to hopping of leakage, a subject of ongoing research.
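The 8-round estimate follows from simple arithmetic, assuming (as is typical for weakly anharmonic transmons) that the |2⟩→|1⟩ relaxation rate is roughly twice the |1⟩→|0⟩ rate:

```python
T1 = 15.0        # µs, typical data qubit energy relaxation time
t_round = 0.96   # µs, duration of one error correction round

# |2> decays roughly twice as fast as |1>, so its e-folding time is ~T1/2.
leakage_lifetime = T1 / 2             # µs
rounds = leakage_lifetime / t_round   # roughly 8 rounds
print(f"state |2> survives for about {rounds:.0f} rounds")
```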
We have found that the amount of leakage is sensitive to minor experimental details. The p_ij technique can be used as a fast diagnostic to estimate the level of leakage and to find which qubits suffer larger leakage. Specialized experiments have shown [33] that a typical probability of state |2⟩ in a data qubit is around 4 × 10⁻³. This magnitude is consistent with the values we extract from the p_ij analysis. While this analysis is somewhat involved, we note that ST edges have somewhat similar (though smaller) p_ij values due to leakage. For our phase-flip code experiment, the median value for ST-edge errors is 1.3 × 10⁻³, while the biggest value (averaged over rounds) is 3.3 × 10⁻³, for data qubit dq2 (as can be seen from Fig. S15, dq2 has the biggest leakage). So, as a crude proxy for leakage, we can use the ST-edge probabilities averaged over rounds. The 2T edges can also be used to estimate leakage; the biggest 2T-edge value (averaged over rounds) is 3.6 × 10⁻³, for measure qubit mq2. (All these values are for the phase-flip code; for the bit-flip code there is an additional contribution from "odd-even correlations" due to energy relaxation of data qubits.) Note that during the several rounds while a data qubit is in state |2⟩, there is a relatively high probability of detection events at the neighboring measure qubits [33]. This leads to a significant correlation between S edges (and also T edges), which negatively affects the performance of the minimum-weight matching decoder. This is why leakage is dangerous for quantum error correction even at a relatively low leakage probability.
Crosstalk features. The short parallel lines ("scar" features) in Fig. S15, far away from the main diagonal, indicate correlations between detection events at qubits that are far apart along the 1D chain of qubits used in the experiment. However, these qubits are actually close to each other on the Sycamore chip; see the top panel of Fig. S19, which shows 10 pairs of measure qubits (indicated by arrows) for which there are visible scars in Fig. S15. We attribute these scar features to crosstalk.
The lower panel of Fig. S19 shows the values of the same-round p_ij elements, averaged over rounds, for all pairs of measure qubits except nearest neighbors. While most values are within the statistical noise level, the elements corresponding to the scar features are significantly above the noise floor (the bigger values are indicated by orange and green cells). The magnitude of the crosstalk correlations is on the order of 10⁻³, with the biggest value of 2.2 × 10⁻³ (see Fig. S19). For the crosstalk pairs shifted in time by one round, we find roughly twice smaller edge probabilities.
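The quantity plotted in the lower panel of Fig. S19 can be sketched as follows; here a synthetic noise-only p_ij matrix stands in for the measured one, and all names are illustrative:

```python
import numpy as np

N_MQ, N_R = 10, 30
rng = np.random.default_rng(2)

# Stand-in for the measured 310x310 matrix: pure statistical noise at
# the Eqn. (S21) level. In practice this would be the measured p_ij.
p = rng.normal(0.0, 4e-4, (310, 310))
p = (p + p.T) / 2  # p_ij is symmetric

def node(q, t):
    """'Time-first' node ordering, Eqn. (S7)."""
    return q * (N_R + 1) + t

def crosstalk(a, b):
    """Same-round correlation of measure qubits a and b, averaged over rounds."""
    return float(np.mean([p[node(a, t), node(b, t)] for t in range(N_R + 1)]))

# All measure qubit pairs except nearest neighbors along the chain.
pairs = {(a, b): crosstalk(a, b)
         for a in range(N_MQ) for b in range(a + 2, N_MQ)}
```

For this noise-only input every pair stays near the statistical floor; in the experiment the crosstalk pairs stand out at the 10⁻³ level.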
The long-range correlations between detection events caused by crosstalk are dangerous to the code operation because they can effectively reduce the code distance. However, we see that in our device the crosstalk is quite small and, most importantly, local in physical distance on the chip. Therefore, we expect that it will not present a serious problem for surface code operation in the future.

VII. COMPARISON OF EDGE WEIGHTING METHODS FOR MATCHING
To decode the error detection events obtained in the experiment, we use a minimum-weight perfect matching algorithm to determine which physical errors were most likely given the observed detections. A key component of this algorithm is the weighting of the edges in the error graph, which corresponds to the expected correlated probabilities of pairs of nodes. The weight of a particular edge (W) and the expected probability for that edge (p) are related by

W = −ln(p),

which satisfies the property that adding the weights of two edges corresponds to multiplying their probabilities. We considered four candidate strategies for determining expected edge probabilities and weights:

1. Uniform weighting - assume that all edges in the matching graph are equally likely.

2. Bootstrapping - run matching on a training dataset with uniform weights; then, for a given edge, count the number of times it was matched and divide by the total number of experiments to compute the expected probability for future matches.

3. Node correlations (p_ij) - use the node correlation technique described in Section VI to determine the correlated probabilities for edges from a training dataset.

4. First principles - from the measured gate, measurement, and reset error probabilities, compute the edge probabilities by propagating possible errors through the circuit.
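The additive-weight property can be verified directly; the sketch below uses the logarithmic relation W = −ln(p), which satisfies that property (a minimal illustration, not the decoder implementation):

```python
import math

def weight(p):
    """Edge weight from edge probability: W = -ln(p)."""
    return -math.log(p)

# Adding the weights of two edges is equivalent to multiplying their
# probabilities, as required for minimum-weight perfect matching.
p_s, p_st = 0.03, 0.004  # typical S- and ST-edge probabilities (Section VI)
combined = weight(p_s) + weight(p_st)
assert math.isclose(combined, weight(p_s * p_st))
```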
For methods 2 and 3, we use the data at 50 rounds to determine the matching weights for all other datasets. While these methods can in general produce a unique weight for each edge in the 50-round graph, we average together all rounds so that the edge weights used during matching are uniform in time. Phase-flip and bit-flip edge weights, as well as weights for each of the smaller subsampled codes, are determined separately.
In Table S5, we show the fitted values of Λ using the different weighting methods, for both the bit-flip and phase-flip codes. To within the uncertainty from fitting, we find that methods 2, 3, and 4 all give the same result for Λ_X and Λ_Z, while uniform weighting reduces Λ_X to 2.7 and Λ_Z to 2.5. The primary effect of the more sophisticated weighting methods is to increase the weights of spacetimelike edges relative to spacelike and timelike edges.
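The Λ fits reported in Table S5 follow Eqn. (1) of the main text, ε_L = C/Λ^((d+1)/2). A minimal sketch of such a fit on synthetic data (illustrative numbers, not the experimental values):

```python
import math

# Generate synthetic logical error per round from eps = C / Lambda**((d+1)/2),
# then refit Lambda with a log-linear least-squares fit.
C_true, lam_true = 0.1, 3.2          # illustrative values only
ds = [3, 5, 7, 9, 11]
eps = [C_true / lam_true ** ((d + 1) / 2) for d in ds]

xs = [(d + 1) / 2 for d in ds]       # log(eps) is linear in (d+1)/2
ys = [math.log(e) for e in eps]
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
lam_fit = math.exp(-slope)           # recovers lam_true on noiseless data
```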

VIII. DYNAMICAL DECOUPLING OF DATA QUBITS
The measurement and reset operations take 880 ns to complete and account for approximately 92% of the duration of the phase-flip code cycle (see Fig. S20). If the data qubits are left to idle during these operations, they undergo energy relaxation in addition to dephasing, accounting for a large portion of the total error budget. The process of measurement and reset on the measure qubits introduces additional avenues of error, including measurement-induced dephasing from photon crosstalk between readout resonators [53], as well as frequency detuning errors incurred from any flux crosstalk between qubits. While energy relaxation is irreversible and cannot be mitigated here, dephasing can be mitigated using dynamical decoupling techniques. We employ multi-pulse sequences developed within the field of NMR, which have been shown to mitigate low-frequency noise in superconducting qubits [54]: Carr-Purcell (CP) [55], Carr-Purcell-Meiboom-Gill (CPMG) [56], XY4, and XY8 [57].
Using CPMG, we verified independently via phase coherence measurements, with and without adversarial readout tones and with and without large frequency excursions on neighboring qubits, that we can effectively decouple the intrinsic low-frequency noise, the measurement-induced dephasing of the data qubits caused by crosstalk from measure qubits, and any flux crosstalk effects. We then evaluated the performance of each dynamical decoupling protocol within the context of the repetition code. For all of the decoupling sequences, we fix the time between pulses such that every sequence has the same total idle time and executes the same number of gates (see Fig. S21). The fixed idle time was set such that each sequence performed eight gates. Using decoupling, we see an approximately 1.7× increase in the error suppression factor Λ (Fig. S22). To compare the performance of the different decoupling schemes, the experiment was run and analyzed a total of five times for each scheme (Idle, CP, CPMG, XY4, and XY8). The performance of the schemes was comparable, with the CPMG and XY4 sequences slightly outperforming the CP and XY8 sequences.
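The four pulse sequences can be sketched as fixed-length gate lists (an illustration of the pulse patterns only; actual pulse phases and timings are calibrated on the device):

```python
# Each scheme is padded to the same gate count (eight pulses here), so
# all sequences share the same total idle time, as in Fig. S21.
BASES = {
    "CP":   ["X"],                                   # Carr-Purcell
    "CPMG": ["Y"],                                   # Carr-Purcell-Meiboom-Gill
    "XY4":  ["X", "Y", "X", "Y"],
    "XY8":  ["X", "Y", "X", "Y", "Y", "X", "Y", "X"],
}

def sequence(scheme, n_pulses=8):
    """Repeat the base pattern of a decoupling scheme up to n_pulses gates."""
    base = BASES[scheme]
    assert n_pulses % len(base) == 0, "pad to a whole number of repetitions"
    return base * (n_pulses // len(base))
```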

IX. QUBIT FREQUENCY OPTIMIZATION
Our processor employs frequency-tunable qubits [31]. Quantum logic gates are executed at two distinct types of frequencies: idle and interaction frequencies, collectively referred to as gate frequencies. Qubits idle and execute single-qubit gates at their respective idle frequencies. Neighboring qubit pairs execute CZ gates at their respective interaction frequencies. All gate frequencies are explicitly or implicitly interdependent, due to engineered interactions and/or crosstalk, according to the repetition code circuit and its mapping onto our processor. Since many error mechanisms are frequency dependent, we can mitigate errors by constructing and optimizing an error model with respect to gate frequencies.
To construct an error model, we combine error contributions from Z pulse distortion, relaxation, dephasing, and qubit crosstalk. The Z pulse-distortion model penalizes CZ gates for large frequency excursions. The relaxation and dephasing models penalize SQ and CZ gates for approaching relaxation and dephasing hotspots, while incorporating coupler physics, qubit hybridization, state-dependent transitions, and hardware-accurate frequency trajectories. Finally, the qubit-crosstalk model penalizes frequency collisions between nearest-neighbor (NN) and diagonal next-nearest-neighbor (NNN) qubits, while incorporating qubit hybridization and the mapping of the repetition code circuit onto our processor. These constituent models are determined via theory and/or experiment, consolidated, and then trained via machine learning to be predictive of experimentally measured error benchmarks.
To determine a frequency configuration that mitigates error, we optimize the error model with respect to the gate frequencies. Optimization is complex since the error model spans 41 frequency variables and is non-convex and time-dependent [58]. Furthermore, since each frequency variable is constrained to ~10² values by the control hardware and qubit-circuit parameters, the optimization search space is ~2²⁷², which significantly exceeds the Hilbert-space dimension 2²¹. Given this complexity, exhaustive search is intractable and global optimization is too slow and inefficient. To quickly and efficiently find locally optimal gate-frequency configurations, and to maintain them in the presence of drift, we use our Snake optimizer [36].
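The quoted search-space size is simple counting: roughly 10² allowed values for each of the 41 gate-frequency variables.

```python
import math

n_vars = 41          # gate-frequency variables in the error model
vals_per_var = 100   # ~10^2 allowed values per variable

search_space_bits = n_vars * math.log2(vals_per_var)  # log2(100**41) ~ 272
hilbert_bits = 21                                     # 21-qubit Hilbert space
print(f"search space ~2^{search_space_bits:.0f} vs Hilbert space 2^{hilbert_bits}")
```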
To illustrate the performance of our error mitigation strategy, we conduct a qubit-crosstalk mitigation experiment (see Fig. S23). In this experiment, we first optimize our processor employing one of three qubit-crosstalk mitigation strategies. We then calibrate the processor and run the bit-flip repetition code. The three mitigation strategies are labelled "none", "partial", and "full", according to the expected degree of crosstalk protection. In the "none" strategy, we do not penalize for crosstalk. In the "partial" strategy, we penalize for crosstalk according to the cross-entropy benchmarking (XEB) circuit [31], which we often use in calibration. Although XEB and the repetition code have different circuits and serve different purposes, their respective circuits have similar gate patterns (see Fig. S25 of Ref. [31]). Because of this similarity, penalizing for crosstalk according to XEB should also offer partial crosstalk protection for the repetition code. Finally, in the "full" strategy, we penalize for crosstalk according to the repetition code circuit that we run.
To quantify the efficacy of the three mitigation strategies, we inspect the bit-flip repetition-code detection event fraction (DEF). We see that by increasing the degree of crosstalk mitigation from "none" to "partial" and then to "full", the median DEF is reduced by 33% and 7%, respectively. Furthermore, the DEF standard deviation is reduced by 82% and 51%, respectively. In total, this amounts to a 38% reduction in the median DEF and a 91% reduction in the DEF standard deviation, representing a significant performance boost. We delegate error mitigation data for other error mechanisms to a future publication.
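The stage-wise and total reductions are consistent, since the two stages compose multiplicatively:

```python
# "none" -> "partial" -> "full" reductions in DEF median and std. dev.
median_total = 1 - (1 - 0.33) * (1 - 0.07)  # 33% then 7% -> 38% in total
std_total = 1 - (1 - 0.82) * (1 - 0.51)     # 82% then 51% -> 91% in total
print(f"median reduced {median_total:.0%}, std. dev. reduced {std_total:.0%}")
```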

X. OVERVIEW OF ERROR CORRECTION EXPERIMENTS
In Table S6, we list experimental implementations of quantum error correction as a reference.
FIG. 1. Stabilizer circuits on Sycamore. a, Layout of the distance-11 repetition code and distance-2 surface code in the Sycamore architecture. In the experiment, the two codes use overlapping sets of qubits, which are offset in the figure for clarity. b, Pauli error rates for gates and identification error rates for measurement. All benchmarks are for simultaneous operation. c, Circuit schematic for the phase-flip code. Data qubits are randomly initialized into |+⟩ or |−⟩, followed by repeated application of XX stabilizer measurements and finally X-basis measurements of the data qubits. d, Illustration of error detection events, which occur when a measurement disagrees with the previous round. e, Fraction of measurements that detected an error versus measurement round for the d = 11 phase-flip code. The dark line is an average of the individual traces (gray lines) for each of the 10 measure qubits. The first (last) round also uses data qubit initialization (measurement) values to identify parity errors and generate detection events.

FIG. 2. Analysis of error detections. a, Detection event graph. Errors in the code trigger two detection events (except at the ends of the chain), each represented by a node; edges represent the expected correlations due to data qubit errors (spacelike and spacetimelike) and measure qubit errors (timelike). b, Ordering of the measure qubits in the repetition code. c, Measured two-point correlations (p_ij) between detection events, represented as a symmetric matrix. The axes correspond to possible locations of detection events, with major ticks marking measure qubits (space) and minor ticks marking differences in rounds (time). For the purposes of illustration, we have averaged together the matrices for 4-round segments of the 50-round experiment shown in Fig. 1e, and also set p_ij = 0 if i = j. The upper triangle shows the full scale, where only the expected spacelike and timelike correlations are apparent. The lower triangle shows a truncated color scale, highlighting unexpected correlations due to crosstalk and leakage. Note that crosstalk errors are still local in the 2D array. d, (Top) Observed high-energy event in a time series of repetition code runs. (Bottom) Zoom-in on the high-energy event, showing the rapid rise and exponential decay of device-wide correlated errors, and the data that are removed when computing logical error probabilities.
VII. AUTHOR CONTRIBUTIONS. Z. Chen, K. Satzinger, H. Putterman, A. Fowler, A. Korotkov, and J. Kelly designed the experiment. Z. Chen, K. Satzinger, and J. Kelly performed the experiment and analyzed the data. C. Quintana, K. Satzinger, A. Petukhov, and Y. Chen developed the controlled-Z gate. M. McEwen, D. Kafri, A. Petukhov, and R. Barends developed the reset operation. M. McEwen and R. Barends performed experiments on leakage, reset, and high-energy events in error correcting codes. D. Sank and Z. Chen developed the readout operation. A. Dunsworth, B. Burkett, S. Demura, and A. Megrant led the design and fabrication of the processor. J. Atalaya and A. Korotkov developed and performed the p_ij analysis. C.
FIG. S1. a, Detection event fraction for a 50-round bit-flip code, similar to Fig. 1d of the main text. b, p_ij correlation matrix for the 50-round bit-flip code, similar to Fig. 2c of the main text.
FIG. S2. a, Logical error probabilities vs. number of detection rounds for the bit-flip code, similar to Fig. 3a of the main text. b, Semilog plot of logical error probabilities, similar to Fig. 3b of the main text. Lines depict fits to 2P_error = 1 − (1 − 2ε_L)^n_rounds, as in the main text, for rounds greater than 10.
FIG. S3. a, How much data was discarded for each run of the repetition code, in both X and Z bases. b, Logical error probabilities for the phase-flip code if high-energy events are kept; compare with Fig. 3b of the main text. c, Logical error probabilities for the bit-flip code if high-energy events are kept.

FIG. S8. Example of subsampling a d = 5 repetition code dataset into 3 d = 3 repetition code datasets.
FIG. S9. Simulated repetition code data for 10 QEC rounds and 21 qubits. The plot shows the detection event fraction as a function of round. We find uniform behavior of the detection fraction in the intermediate rounds, and different values at the first and last rounds of the code, which differ in circuit structure.
FIG. S12. Simulations of logical-error probability for repetition codes using Pauli-channel noise calibrated to component errors measured in the device. a, Logical error vs. number of syndrome rounds for the bit-flip code. b, Same data as panel a (bit-flip code), plotted on a log-scaled vertical axis. c, Logical error vs. number of syndrome rounds for the phase-flip code. d, Same data as panel c (phase-flip code), plotted on a log-scaled vertical axis.
FIG. S13. Logical error vs. code distance for the repetition codes, and a fit to estimate Λ for the two codes.
FIG. S14. Error graph and main edges. An example of the error graph for N_mq = 4 measure qubits (5 data qubits) and N_r = 8 time rounds. The horizontal axis shows the numbering of rounds (t coordinate); the vertical axis shows the numbering of measure qubits mq0-mq3 (s coordinate). The dots denote the graph nodes; red dots indicate detection events. The vertical, horizontal, and diagonal edges are denoted as spacelike (S) [including boundary (B)], timelike (T), and spacetimelike (ST) edges. Positions of data qubits dq0-dq4 (not used in the error graph) are indicated at the left.
FIG. S15. Correlation matrix p_ij. A graphical representation of the 310×310 symmetric matrix p_ij [Eqn. (S11)] for a phase-flip repetition code experiment with N_mq = 10 measure qubits (11 data qubits) and N_r = 30 rounds. The color of each pixel depicts the probability p_ij for an error process involving error graph nodes i and j. The nodes are ordered in the "time-first" fashion, Eqn. (S7), with 10 blocks (separated by grid lines) corresponding to measure qubits (mq0, mq1, ..., mq9) and 31 ticks within each block corresponding to time rounds (from t = 0 to t = N_r). The main features are the diagonal lines corresponding to T, S, and ST edges, which are shifted from the main diagonal by 1, 31, and 32 pixels, respectively (the ST line is fainter than the T and S lines). Additional features are reddish ("dirty") patches near the S and T lines, which are due to leakage to state |2⟩ in data qubits, and short parallel lines ("scars") due to crosstalk. Note that the color bar ranges to 0.007, while the probabilities for S and T edges are above this truncation.
FIG. S16. Matrix p_ij in space-first node ordering. The figure shows the same data as in Fig. S15, but with the nodes ordered in the "space-first" fashion of Eqn. (S8). Each axis contains N_r + 1 = 31 blocks with N_mq = 10 points (ticks) within each block. The lines for S, T, and ST edges are shifted from the main diagonal by 1, 10, and 11 pixels, respectively. Short dashed lines correspond to 2T, 3T, ... edges, which connect nodes separated by ∆t = 2, 3, ... rounds. The well-visible diagonal stripes indicate the presence of long-time correlations in detection events lasting for over 5 rounds.

FIG. S17. S, T, and ST errors. The plot shows the error probabilities p_ij for S (spacelike), T (timelike), and ST (spacetimelike) edges for the data in Fig. S15 (phase-flip code, 10+11 qubits, 30 rounds). For S edges (blue symbols) the corresponding data qubits dq0-dq10 are indicated at the top; the 31 points within each block correspond to rounds. The S-edge probabilities for the boundary data qubits dq0 and dq10 are calculated using Eqs. (S14)-(S16). For T edges (red symbols), the corresponding measure qubits mq0-mq9 are indicated below the red symbols; each block contains 30 points. ST edges (green symbols) are positioned in the same way as S edges (without boundaries), with 30 points per block. Lines are a guide for the eye.
FIG. S19. Crosstalk error probabilities. Top panel: layout of the 10 measure qubits (black circles with integer labels) and 11 data qubits (gray-filled circles) on the Sycamore device. Arrows indicate the pairs of measure qubits that exhibit stronger (red arrows) and weaker (orange arrows) detection-event correlations due to crosstalk. Bottom panel: effective crosstalk probabilities between pairs of measure qubits (except for nearest neighbors). We show the values of p_ij × 10³ for same-round p_ij elements averaged over rounds. Cells are colored according to their values: yellow and green indicate significant crosstalk, blue indicates statistical noise. The biggest crosstalk of 2.2 × 10⁻³ is between mq4 and mq6 (leftmost arrow in the top panel).

FIG. S20. Stabilizer circuit. a, Circuit schematic representation of the stabilizer circuit. Layers of single-qubit and two-qubit gates are highlighted in blue; measurement, reset, and dynamical decoupling operations are highlighted in yellow, corresponding to the waveforms in b. b, Rendered waveforms showing that the majority of the time spent during the stabilizer cycle is in the measurement and reset operations. Lines represent microwave control (XY), flux control (Z), and readout for the stabilizer circuit for one data qubit (blue) and one measure qubit (red).
FIG. S22. Benchmarking phase-flip code performance with and without dynamical decoupling. a, Detection event fractions vs. qubit and round for each of the data qubit Idle, CP, CPMG, XY4, and XY8 operations during measure qubit readout and reset. The median detection event fraction by round is plotted in black. b, Logical error rate vs. number of qubits, showing exponential suppression of the error rate in all cases. c, Boxplot of extracted error suppression factors (Λ) from fits like those shown in b, for five iterations of the experiment for each decoupling scheme. Overall, we see a ~1.7× increase in Λ for all decoupling schemes. The performance of the various decoupling schemes is comparable.
FIG. S23. Qubit-crosstalk mitigation. a, The repetition code, with three distinct temporal slices indicated by dashed boxes. The empty boxes in the lowest temporal slice are either H or I, depending on whether we run the bit- or phase-flip code. b, Simultaneously active SQ (H or I) and CZ gates (blue nodes and edges, respectively) at each temporal slice. The geometry of active gates is determined by the repetition code circuit and its mapping onto our processor. Simultaneously active gates can crosstalk due to parasitic interactions between NN and NNN qubits. c, Crosstalking SQ and CZ gates (orange nodes and edges, respectively) for one active SQ or CZ gate (blue nodes and edges, respectively) at each temporal slice. We mitigate crosstalk and other error mechanisms by constructing and optimizing an error model with respect to gate frequencies. d, Three crosstalk mitigation strategies, illustrated for one active CZ gate in the upper temporal slice in a-c. The strategies are labelled "full", "partial", and "none", according to the degree of expected crosstalk protection. Each strategy can be characterized by domains (red) in which crosstalk is penalized. e, Bit-flip repetition code benchmarks for each mitigation strategy. The points and error bars represent the DEF median and standard deviation, respectively. By increasing the mitigation strength from "none" to "full", the DEF median and standard deviation are reduced by 38% and 91%, respectively.

TABLE S1. Pauli error rates (bit-flip error rates for measurement and reset) used in subsequent simulations.

A code of distance d_s can be subsampled from a larger code of distance d, where n = d − d_s + 1 is the number of unique datasets one can produce. This can be understood by considering a line of 9 qubits (for d = 5), in which any 5 consecutive qubits form a d = 3 code (see Fig. S8).
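The subsampling of Fig. S8 can be sketched as sliding windows over the qubit chain (names are illustrative); stepping by two physical qubits keeps the data/measure roles aligned:

```python
def subsample_windows(d, ds):
    """Index windows for subsampling distance-ds codes from a distance-d chain."""
    n_qubits = 2 * d - 1       # data + measure qubits in the 1D chain
    window = 2 * ds - 1
    return [list(range(start, start + window))
            for start in range(0, n_qubits - window + 1, 2)]

# A d = 5 chain of 9 qubits yields n = d - ds + 1 = 3 distance-3 datasets.
windows = subsample_windows(5, 3)
```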

TABLE S2. Error rates used in bit-flip and phase-flip simulations.

TABLE S3. Error budget for the bit-flip code.

TABLE S5. Error suppression factors (Λ_X and Λ_Z, for the phase-flip and bit-flip codes, respectively) and multiplicative constants (C_X and C_Z) fit to logical error rates vs. code distance (Eqn. 1 of the main text) for the four different edge weighting methods.