High-threshold and low-overhead fault-tolerant quantum memory

The accumulation of physical errors1–3 prevents the execution of large-scale algorithms in current quantum computers. Quantum error correction4 promises a solution by encoding k logical qubits onto a larger number n of physical qubits, such that the physical errors are suppressed enough to allow running a desired computation with tolerable fidelity. Quantum error correction becomes practically realizable once the physical error rate is below a threshold value that depends on the choice of quantum code, syndrome measurement circuit and decoding algorithm5. We present an end-to-end quantum error correction protocol that implements fault-tolerant memory on the basis of a family of low-density parity-check codes6. Our approach achieves an error threshold of 0.7% for the standard circuit-based noise model, on par with the surface code7–10 that for 20 years was the leading code in terms of error threshold. The syndrome measurement cycle for a length-n code in our family requires n ancillary qubits and a depth-8 circuit with CNOT gates, qubit initializations and measurements. The required qubit connectivity is a degree-6 graph composed of two edge-disjoint planar subgraphs. In particular, we show that 12 logical qubits can be preserved for nearly 1 million syndrome cycles using 288 physical qubits in total, assuming the physical error rate of 0.1%, whereas the surface code would require nearly 3,000 physical qubits to achieve said performance. Our findings bring demonstrations of a low-overhead fault-tolerant quantum memory within the reach of near-term quantum processors.


Introduction
Quantum computing attracted attention due to its ability to offer asymptotically faster solutions to a set of computational problems compared to the best known classical algorithms [1]. It is believed that a scalable functioning quantum computer may help solve computational problems in such areas as scientific discovery, materials research, chemistry, and drug design, to name a few [2,3,4,5].
The main obstacle to building a quantum computer is the fragility of quantum information, owing to various sources of noise affecting it. Since isolating a quantum computer from external effects and controlling it to induce a desired computation are in conflict with each other, noise appears to be inevitable. The sources of noise include imperfections in qubits, materials used, controlling apparatus, State Preparation and Measurement (SPAM) errors, and a variety of external factors ranging from local man-made ones, such as stray electromagnetic fields, to those inherent to the Universe, such as cosmic rays. See Ref. [6] for a summary. While some sources of noise can be eliminated with better control [7], materials [8], and shielding [9,10,11], a number of other sources appear to be difficult, if at all possible, to remove. The latter kind can include spontaneous and stimulated emission in trapped ions [12,13], and the interaction with the bath (Purcell Effect) [14] in superconducting circuits, covering both leading quantum technologies. Thus, error correction becomes a key requirement for building a functioning scalable quantum computer.
The possibility of quantum fault tolerance was established earlier [15]. Encoding a logical qubit redundantly into many physical qubits enables one to diagnose and correct errors by repeatedly measuring syndromes of parity check operators. However, error correction is only beneficial if the hardware error rate is below a certain threshold value that depends on a particular error correction protocol. The first proposals for quantum error correction, such as concatenated codes [16,17,18], focused on demonstrating the theoretical possibility of error suppression. As understanding of quantum error correction and the capabilities of quantum technologies matured, the focus shifted to finding practical quantum error correction protocols. This resulted in the development of the surface code [19,20,21,22] that offers a high error threshold close to 1%, fast decoding algorithms, and compatibility with the existing quantum processors relying on 2-dimensional (2D) square lattice qubit connectivity. Small examples of the surface code with a single logical qubit have already been demonstrated experimentally by several groups [23,24,25,26,27]. However, scaling up the surface code to a hundred or more logical qubits would be prohibitively expensive due to its poor encoding efficiency. This spurred interest in more general quantum codes known as Low-Density Parity-Check (LDPC) codes [28]. Recent progress in the study of LDPC codes suggests that they can achieve quantum fault tolerance with a much higher encoding efficiency [29]. Here, we focus on the study of LDPC codes, as our goal is to find quantum error correction codes and protocols that are both efficient and possible to demonstrate in practice, given the limitations of quantum computing technologies.
A quantum error correcting code is of LDPC type if each check operator of the code acts only on a few qubits and each qubit participates only in a few checks. Multiple variants of the LDPC codes have been proposed recently including hyperbolic surface codes [30,31,32], hypergraph product [33], balanced product codes [34], two-block codes based on finite groups [35,36,37,38], and quantum Tanner codes [39,40]. The latter were shown [39,40] to be asymptotically "good" in the sense of offering a constant encoding rate and linear distance, a parameter quantifying the number of correctable errors. In contrast, the surface code has an asymptotically zero encoding rate and only square-root distance. Replacing the surface code with a high-rate, high-distance LDPC code could have major practical implications. First, fault-tolerance overhead (the ratio between the number of physical and logical qubits) could be reduced dramatically. Secondly, high-distance codes exhibit a very sharp decrease in the logical error rate: as the physical error probability crosses the threshold value, the amount of error suppression achieved by the code can increase by orders of magnitude even with a small reduction of the physical error rate. This feature makes high-distance LDPC codes attractive for near-term demonstrations which are likely to operate in the near-threshold regime. However, it was previously believed that outperforming the surface code for realistic noise models including memory, gate, and SPAM errors may require very large LDPC codes with more than 10,000 physical qubits [31].
Here we present several concrete examples of high-rate LDPC codes with a few hundred physical qubits equipped with a low-depth syndrome measurement circuit, an efficient decoding algorithm, and a fault-tolerant protocol for addressing individual logical qubits. These codes exhibit an error threshold close to 1%, show excellent performance in the near-threshold regime, and offer more than 10X reduction of the encoding overhead compared with the surface code. Hardware requirements for realizing our error correction protocols are relatively mild, as each physical qubit is coupled by two-qubit gates with only six other qubits. Although the qubit connectivity graph is not locally embeddable into a 2D grid, it can be decomposed into two planar degree-3 subgraphs. As we argue below, such qubit connectivity is well-suited for architectures based on superconducting qubits. Before stating our results, let us describe several must-have features for a quantum error-correcting code to be suitable for near-term experimental demonstrations and formally pose the problem addressed in this work.

Code selection criteria
In this work, we study the problem of realizing a fault-tolerant quantum memory with a small qubit overhead and a large code distance. Our goal is to construct a combination of the LDPC code, syndrome measurement circuitry, and the decoding (error correction) algorithms, suitable for a near-term demonstration, but also offering long-term utility, while taking into account the capabilities and limitations of the superconducting circuits quantum hardware. In other words, we seek to develop a practical error correction protocol. Our selection criteria reflect this goal.
We focus on encoding $k \gg 1$ logical qubits into $n$ data qubits and use $c$ ancillary check qubits to measure the error syndrome. In total, the code relies on $n + c$ physical qubits. The net encoding rate is therefore
$$r = \frac{k}{n + c}.$$
For example, the standard surface code architecture encodes $k = 1$ logical qubit into $n = d^2$ data qubits for a distance-$d$ code and uses $c = n - 1$ check qubits for syndrome measurements. The net encoding rate is $r \approx 1/(2d^2)$, which quickly becomes impractical as one is forced to choose a large code distance, due to, for instance, the physical errors being close to the threshold value. In contrast, we seek a high-rate LDPC code with $r \gg 1/d^2$.
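As a quick numerical illustration of the net encoding rate, the ratio of logical qubits to all physical qubits (data plus check), the following minimal sketch evaluates it for the surface code parameters quoted above. The function names `net_rate` and `surface_code_rate` are illustrative only and do not come from the paper.

```python
from fractions import Fraction


def net_rate(k: int, n: int, c: int) -> Fraction:
    """Net encoding rate: k logical qubits over n data plus c check qubits."""
    return Fraction(k, n + c)


def surface_code_rate(d: int) -> Fraction:
    # Standard surface code as described in the text:
    # k = 1, n = d^2 data qubits, c = n - 1 check qubits.
    n = d * d
    return net_rate(1, n, n - 1)


for d in (3, 7, 13):
    r = surface_code_rate(d)
    # The exact rate is 1/(2 d^2 - 1), which the text approximates by 1/(2 d^2).
    print(f"d = {d:2d}: r = {r} (approximation 1/{2 * d * d})")
```

A code with 12 logical qubits, 144 data qubits and 144 check qubits would give `net_rate(12, 144, 144) == Fraction(1, 24)` under the same definition.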
To prevent the accumulation of errors one must be able to measure the error syndrome frequently enough. This is accomplished by a syndrome measurement (SM) circuit that couples data qubits in the support of each check operator with the respective ancillary qubit by a sequence of CNOT gates. Check qubits are then measured revealing the value of the error syndrome. The time it takes to implement the SM circuit is proportional to its depth, that is, the number of gate layers composed of non-overlapping CNOTs. Since new errors continue to occur while the SM circuit is executed, its depth should be minimized. Thus we seek an LDPC code with a high rate $r$ and low-depth SM circuit.
A noisy version of the SM circuit may include several types of faulty operations such as memory errors on data or check qubits, faulty CNOT gates, qubit initializations and measurements. We consider the circuit-based noise model [22] where each operation fails with the probability $p$. Faults on different operations are independent. A logical error occurs when the final error-corrected state of $k$ logical qubits differs from the initial encoded state. The probability of a logical error $p_L$ depends on the error rate $p$, details of the SM circuits, and a decoding algorithm. A pseudo-threshold $p_0$ of an error correction protocol is defined as a solution of the break-even equation $p_L(p) = kp$.
Here $kp$ is an estimate of the probability that at least one of $k$ unencoded qubits suffers from an error. To achieve a significant error suppression in the regime $p \sim 10^{-3}$, which is relevant for near-term demonstrations, it is desirable to have a pseudo-threshold close to 1% or higher. For example, the surface code architecture achieves pseudo-threshold $p_0 \approx 1\%$ for a large enough code distance [22]. We seek a high-rate LDPC code with a low-depth SM circuit and a high pseudo-threshold.
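The break-even condition (logical error probability equal to $k$ times the physical error rate) can be solved numerically once a model or a fit of the logical error probability is available. The sketch below uses bisection on a logarithmic scale. The function `toy_model` is a made-up placeholder, chosen only so that the answer is known in advance; it is not a fit to any data from this work.

```python
import math


def pseudo_threshold(p_logical, k, lo=1e-5, hi=0.5, iters=200):
    """Solve p_logical(p) = k * p for p by bisection on [lo, hi].

    Assumes encoding helps at lo (p_logical(lo) < k * lo) and hurts at hi.
    """
    def gap(p):
        return p_logical(p) - k * p

    if not (gap(lo) < 0 < gap(hi)):
        raise ValueError("break-even point is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits error rates
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def toy_model(p, k=12, p0=0.007, t=6):
    # Hypothetical ansatz (an assumption for illustration): crosses k * p at p = p0.
    return k * p0 * (p / p0) ** t


p0_estimate = pseudo_threshold(toy_model, k=12)
print(p0_estimate)
```

In practice `p_logical` would be replaced by an interpolation of simulated logical error rates; the bisection itself does not depend on the model.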
A logical error is undetectable if it can be generated without triggering any syndromes. Such errors span at least $d$ data qubits for a distance-$d$ code. Let us say that a SM circuit has distance $d_{\mathrm{circ}}$ if it takes at least $d_{\mathrm{circ}}$ faulty operations in the circuit to generate an undetectable logical error. By definition, $d_{\mathrm{circ}} \le d$ for any distance-$d$ code and typically $d_{\mathrm{circ}} < d$ since a few faulty operations in the SM circuit may create a high-weight error on the data qubits. We say that a SM circuit is distance-preserving if $d_{\mathrm{circ}} = d$, meaning the circuit is designed so as to avoid accumulating high-weight errors, which is the best one can hope for. It is preferred (but not required) that the SM circuit is distance-preserving.
Another criterion is dictated by the limited qubit connectivity of near-term quantum devices. Each quantum code can be described by a Tanner graph $G$ such that each vertex of $G$ represents either a data qubit or a check operator. A check vertex $i$ and a data vertex $j$ are connected by an edge if the $i$-th check operator acts non-trivially on the $j$-th data qubit (by applying Pauli X or Z). Figure 1 A) shows the Tanner graph describing a distance-3 surface code. To keep the SM circuit depth small, it is desirable that two-qubit gates such as CNOT can be applied along every edge of the Tanner graph. By construction, the Tanner graph of any LDPC code has a small degree. One drawback of high-rate LDPC codes is that their Tanner graphs may not be locally embeddable into the 2D grid [41,42]. This poses a challenge for hardware implementation with superconducting qubits coupled by microwave resonators. A useful VLSI design concept is graph thickness, see [43,29] for details. A graph $G = (V, E)$ is said to have thickness $\theta$ if one can partition its set of edges $E$ into a disjoint union of $\theta$ sets $E_1 \sqcup E_2 \sqcup \dots \sqcup E_\theta = E$ such that each subgraph $(V, E_i)$ is planar. Informally, a graph with thickness $\theta$ can be viewed as a vertical stack of $\theta$ planar graphs. Qubit connectivity described by a planar graph (thickness $\theta = 1$) is the simplest one from a hardware perspective since the couplers do not cross. Graphs with thickness $\theta = 2$ might still be implementable since two planar layers of couplers and their control lines can be attached to the top and the bottom side of the chip hosting qubits, and the two sides mated (see Section 10 for a detailed discussion). Graphs with thickness $\theta \ge 3$ are much harder to implement. Thus we seek a high-rate LDPC code with a low-depth SM circuit, high pseudo-threshold, and a low-degree Tanner graph with thickness $\theta \le 2$.
Finally, the code must perform a useful function within a larger architecture for quantum computation, the simplest of which is a quantum memory. In a quantum memory it must be possible to measure every logical qubit in at least one Pauli basis, permitting initialization and readout of individual qubits. Furthermore, it should be possible to connect the code to another error correction code and facilitate Pauli product measurements between their logical qubits. This enables load-store operations that transfer quantum data out of and into the code via quantum teleportation. For the purpose of the shorter-term goal of demonstrating the code in practice, the code should also feature enough logical operations to facilitate experiments to verify correct operation.
Our code selection criteria are summarized below.
1. We desire a code with a large distance $d$ and a high encoding rate $r \gg 1/d^2$,
2. that is complemented by a short-depth syndrome measurement circuit,
3. offers a pseudo-threshold close to 1% (or higher) for the circuit-based noise model,
4. is constructed over a Tanner graph of thickness 2 or less,
5. and possesses fault-tolerant load-store operations as well as readout and initialization of individual qubits.

Main results
Here we give concrete examples of LDPC codes equipped with syndrome measurement circuits and efficient decoding algorithms that meet all above conditions. Our examples fall into the family of tensor product generalized bicycle codes proposed by Kovalev and Pryadko [35]. We named our codes Bivariate Bicycle (BB) since they are based on bivariate polynomials, as detailed below. These are stabilizer codes of CSS-type [44,45] that can be described by a collection of few-qubit check (stabilizer) operators composed of Pauli X and Z. At a high level, a BB code is similar to the two-dimensional toric code [19]. In particular, physical qubits of a BB code can be laid out on a two-dimensional grid with periodic boundary conditions such that all check operators are obtained from a single pair of X- and Z-checks by applying horizontal and vertical shifts of the grid. However, in contrast to the plaquette and vertex stabilizers describing the toric code, check operators of a BB code are not geometrically local. Furthermore, each check acts on six qubits rather than four qubits. See Figure 1 B) for an example Tanner graph of a BB code. We give a formal definition of BB codes in Section 4. The Tanner graph of any BB code has vertex degree six. Although this graph may not be locally embeddable into a 2D grid, we show that it has thickness $\theta = 2$, as desired. This result may be surprising since it is known that a general degree-6 graph can have thickness $\theta = 3$, see [43].
Figure 1: Tanner graph of a BB code with parameters [[144, 12, 12]] embedded into a torus. Any edge of the Tanner graph connects a data and a check vertex. Data qubits associated with the registers q(L) and q(R) are shown by bLue and oRange circles. Each vertex has six incident edges including four short-range edges (pointing north, south, east, and west) and two long-range edges. Only a few of the long-range edges are shown to avoid clutter. Dashed and solid edges indicate two planar subgraphs spanning the Tanner graph, see Section 4. B) Sketch of a Tanner graph extension for measuring Z and X following [46]. The ancilla corresponding to the X measurement can be connected to a surface code, enabling load-store operations for all logical qubits via quantum teleportation and some logical unitaries. This extended Tanner graph has a thickness-2 implementation, see Section 9.
Below we use the standard notation [[n, k, d]] for code parameters. Here $n$ is the code length (the number of data qubits), $k$ is the number of logical qubits, and $d$ is the code distance. Table 1 shows small examples of BB codes along with several metrics of the error suppression achieved by each code. The distance-12 code [[144, 12, 12]] may be the most promising for near-term demonstrations, as it combines large distance and high net encoding rate $r = 1/24$. For comparison, the distance-13 surface code has net encoding rate $r = 1/338$. Below we show that the distance-12 BB code outperforms the distance-13 surface code for the experimentally relevant range of error rates, see Figure 2 B).
To the best of our knowledge, all codes shown in Table 1 are new.
To quantify the level of error suppression achieved by a code we introduce SM circuits that repeatedly measure the syndrome of each check operator. The full cycle of syndrome measurement for a length-$n$ BB code requires $n$ ancillary check qubits to store the measured syndromes. Accordingly, the net encoding rate is $r = k/(2n)$. Check qubits are coupled with the data qubits by applying a sequence of CNOT gates. The full cycle of syndrome measurement requires only 7 layers of CNOTs regardless of the code length. The check qubits are initialized and measured at the beginning and at the end of the syndrome cycle respectively, see Section 5 for details. We emphasize that our SM circuit applies to any BB code beyond those listed in Table 1. The circuit respects the cyclic shift symmetry of the underlying code. Assuming that the physical qubits (data or check) are located at vertices of the Tanner graph, all CNOT gates in the SM circuit act on nearest-neighbor qubits. Thus the required qubit connectivity is described by a degree-6 thickness-2 graph, as desired. We conjecture, based on the numerical simulations, that our SM circuit is distance-preserving for the code [[72, 12, 6]], see Table 1 for the upper bounds on $d_{\mathrm{circ}}$ (the upper bound $d_{\mathrm{circ}} \le 18$ for the 288-qubit code is unlikely to be tight and this affects the fit and extrapolations).
The full error correction protocol performs $N_c \gg 1$ syndrome measurement cycles and calls a decoder, a classical algorithm that takes as input the measured syndromes and outputs a guess of the final error on the data qubits. Error correction succeeds if the guessed and the actual error coincide modulo a product of check operators. In this case the two errors have the same action on any encoded (logical) state. Thus applying the inverse of the guessed error would return data qubits to the initial logical state. Otherwise, if the guessed and the actual error differ by a non-trivial logical operator, error correction fails resulting in a logical error. Our numerical experiments are based on the Belief Propagation with an Ordered Statistics Decoder (BP-OSD) proposed by Panteleev and Kalachev [36]. The original work [36] described BP-OSD in the context of a toy noise model with memory errors only. Here we show how to extend BP-OSD to the circuit-based noise model. Our approach closely follows Refs. [47,48,49,50]. We also show that BP-OSD can be applied to other problems in quantum fault-tolerance such as estimating the distance of a quantum LDPC code, see Section 6 for details. These tasks can be accomplished with a relatively minor extension of the publicly available BP-OSD software developed by Roffe et al. [51].
Let $P_L(N_c)$ be the logical error probability after performing $N_c$ syndrome cycles. Define the logical error rate as $p_L = 1 - (1 - P_L(N_c))^{1/N_c} \approx P_L(N_c)/N_c$. Informally, $p_L$ can be viewed as the logical error probability per syndrome cycle. Following common practice, we choose $N_c = d$ for a distance-$d$ code. Figure 2 A) shows the logical error rate achieved by codes from Table 1. The logical error rate was computed numerically for $p \ge 10^{-3}$ and extrapolated to lower error rates using a fitting formula $p_L = p^{d'_{\mathrm{circ}}/2}\, e^{c_0 + c_1 p + c_2 p^2}$, where $c_0, c_1, c_2$ are fitting parameters and $d'_{\mathrm{circ}}$ is an upper bound on $d_{\mathrm{circ}}$ from Table 1. The observed pseudo-threshold for the 144-qubit and 288-qubit codes is close to 0.007, which is nearly the same as the error threshold of the surface code [52]. To the best of our knowledge, this provides the first example of high-rate, large-distance LDPC codes achieving the pseudo-threshold close to 1% under the circuit-based noise model. For example, suppose that the physical error rate is $p = 10^{-3}$, which is a realistic goal for near-term demonstrations. Encoding 12 logical qubits using the distance-12 code from Table 1 would offer the logical error rate $2 \times 10^{-7}$, which is enough to preserve 12 logical qubits for nearly one million syndrome cycles. The total number of physical qubits required for this encoding is 288. The distance-18 code from Table 1 would require 576 physical qubits while suppressing the error rate from $10^{-3}$ to $2 \times 10^{-12}$, enabling roughly a hundred billion syndrome cycles. For comparison, encoding 12 logical qubits into separate patches of the surface code would require nearly 3000 physical qubits to suppress the error rate from $10^{-3}$ to $10^{-6}$, see Figure 2 B). In this example the distance-12 BB code offers more than 10X saving in the number of physical qubits compared with the surface code.
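The conversion between the total logical error probability after a number of syndrome cycles and the per-cycle logical error rate can be sketched as follows. The helper names are illustrative, and the independence of successive cycles used in `survival_probability` is a simplifying assumption of this sketch rather than a claim made in the text.

```python
import math


def logical_error_rate(P_L: float, N_c: int) -> float:
    """Per-cycle rate p_L = 1 - (1 - P_L)^(1/N_c), computed stably for small P_L."""
    return -math.expm1(math.log1p(-P_L) / N_c)


def survival_probability(p_L: float, cycles: int) -> float:
    """Probability of no logical error in `cycles` cycles, assuming independent cycles."""
    return math.exp(cycles * math.log1p(-p_L))


# For small P_L the per-cycle rate is close to P_L / N_c.
P_L, N_c = 1.2e-3, 12
print(logical_error_rate(P_L, N_c), P_L / N_c)

# With the quoted per-cycle rate 2e-7, one million cycles succeed with
# probability about exp(-0.2) under the independence assumption.
print(survival_probability(2e-7, 10**6))
```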
We also find that BB LDPC codes admit extensions that allow them to function as a logical memory with load-store operations. In Section 9 we show how to use methods from [46] to attach two ancilla systems to the code that permit logical measurement of all logical qubits in the X and Z bases. Which logical qubit is being measured can be controlled via a set of fault-tolerant unitary operations. The extended Tanner graph is not only thickness-2, but the extension from the X ancilla system is "effectively planar" (in a sense we define later), facilitating interconnection with other codes on the same chip.
Our findings bring experimental demonstration of high-rate LDPC codes within the reach of near-term quantum processors which are expected to offer a few hundred physical qubits, gate error rates close to $10^{-3}$, and long-range qubit connectivity [53].
The rest of this paper is organized as follows. Section 4 formally defines BB LDPC codes and proves their basic properties. The construction of the syndrome measurement circuit is detailed in Section 5. The circuit-based noise model and BP-OSD decoder for this noise model are discussed in Section 6 with some implementation details deferred to Section 8. We describe fault-tolerant memory capabilities in Section 9. A summary of our findings and some open questions can be found in Section 10.

Bivariate Bicycle quantum LDPC codes
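Let $I_\ell$ and $S_\ell$ be the identity matrix and the cyclic shift matrix of size $\ell \times \ell$ respectively. The $i$-th row of $S_\ell$ has a single nonzero entry equal to one at the column $i + 1 \pmod{\ell}$. For example,
$$S_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad S_3 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$
Consider matrices $x = S_\ell \otimes I_m$ and $y = I_\ell \otimes S_m$.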
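Note that $xy = yx$ and $x^\ell = y^m = I_{\ell m}$. A BB code is defined by a pair of matrices
$$A = A_1 + A_2 + A_3 \quad\text{and}\quad B = B_1 + B_2 + B_3,$$
where each matrix $A_i$ and $B_j$ is a power of $x$ or $y$. Here and below the addition and multiplication of binary matrices is performed modulo two, unless stated otherwise. Thus, we also assume the $A_i$ are distinct and the $B_j$ are distinct to avoid cancellation of terms. For example, one could choose $A = x^3 + y + y^2$ and $B = y^3 + x + x^2$. Note that $A$ and $B$ have exactly three non-zero entries in each row and each column. Furthermore, $AB = BA$ since $xy = yx$. The above data defines a BB LDPC code denoted QC(A, B) with length $n = 2\ell m$ and check matrices
$$H^X = [A|B] \quad\text{and}\quad H^Z = [B^T|A^T].$$
Here the vertical bar indicates stacking matrices horizontally and $T$ stands for the matrix transposition. Both matrices $H^X$ and $H^Z$ have size $(n/2) \times n$. Each row $v$ of $H^X$ defines an X-type check operator $X(v) = \prod_{j=1}^n X_j^{v_j}$, and each row $v$ of $H^Z$ defines a Z-type check operator $Z(v) = \prod_{j=1}^n Z_j^{v_j}$. Any X-check and Z-check commute since they overlap on an even number of qubits (note that $H^X (H^Z)^T = AB + BA = 0 \pmod 2$). To describe the code parameters we use certain linear subspaces associated with the check matrices, see Table 1 for our notations. Then the code QC(A, B) has parameters [[n, k, d]] with
$$n = 2\ell m, \quad k = 2\cdot\dim(\ker(A) \cap \ker(B)), \quad d = \min\{|v| : v \in \ker(H^X) \setminus \mathrm{rs}(H^Z)\},$$
see Lemma 1. Here $\mathrm{rs}(\cdot)$ denotes the row space and $|v| = \sum_{i=1}^n v_i$ is the Hamming weight of a vector $v \in \mathbb{F}_2^n$. We note that the code QC(A, B) can be viewed as a special case of the Lifted Product construction [54] based on the abelian group $\mathbb{Z}_\ell \times \mathbb{Z}_m$. Here $\mathbb{Z}_j$ denotes the cyclic group of order $j$.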
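The construction can be checked numerically. The sketch below builds the rows of the X-type and Z-type check matrices for the example polynomials $A = x^3 + y + y^2$ and $B = y^3 + x + x^2$ with $\ell = 12$ and $m = 6$ (so that $n = 2\ell m = 144$), verifies that all checks commute, and computes $k$ from the binary ranks. Representing each monomial $x^p y^q$ by the exponent pair `(p, q)`, storing rows as sets of column indices, and the helper names `bb_checks` and `gf2_rank` are all choices made for this illustration.

```python
def bb_checks(l, m, a_terms, b_terms):
    """Rows of H^X = [A|B] and H^Z = [B^T|A^T] as sets of column indices.

    Row (a, b) of the monomial x^p y^q has its single 1 in column (a + p, b + q),
    matching x = S_l (x) I_m and y = I_l (x) S_m. Terms are assumed distinct.
    """
    N = l * m

    def col(a, b):
        return (a % l) * m + (b % m)

    hx, hz = [], []
    for a in range(l):
        for b in range(m):
            hx.append({col(a + p, b + q) for p, q in a_terms}
                      | {N + col(a + p, b + q) for p, q in b_terms})
            # A transposed monomial shifts in the opposite direction.
            hz.append({col(a - p, b - q) for p, q in b_terms}
                      | {N + col(a - p, b - q) for p, q in a_terms})
    return hx, hz


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as sets of column indices."""
    pivots = {}  # highest set bit -> reduced row stored as an integer bitmask
    for row in rows:
        v = sum(1 << c for c in row)
        while v:
            high = v.bit_length() - 1
            if high not in pivots:
                pivots[high] = v
                break
            v ^= pivots[high]
    return len(pivots)


A_TERMS = [(3, 0), (0, 1), (0, 2)]  # x^3 + y + y^2
B_TERMS = [(0, 3), (1, 0), (2, 0)]  # y^3 + x + x^2
hx, hz = bb_checks(12, 6, A_TERMS, B_TERMS)

n = 2 * 12 * 6
checks_commute = all(len(xr & zr) % 2 == 0 for xr in hx for zr in hz)
k = n - gf2_rank(hx) - gf2_rank(hz)
print(n, k, checks_commute)
```

Sets of column indices keep the example dependency-free; a production implementation would more likely use sparse binary matrices.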
Table 3 describes the polynomials $A$ and $B$ that give rise to examples of high-rate, high-distance BB codes found by a numerical search. This includes all codes from Table 1 and two examples of higher distance codes. To the best of our knowledge, all these examples are new. The code [[360, 12, ≤ 24]] improves upon a code [[882, 24, ≤ 24]] with weight-6 checks found by Panteleev and Kalachev in [36] (assuming that our distance upper bound is tight). Indeed, taking two independent copies of the 360-qubit code gives parameters [[720, 24, ≤ 24]].
By construction, the code QC(A, B) has weight-6 check operators and each qubit participates in six checks (three X-type plus three Z-type checks). Accordingly, the code QC(A, B) has a degree-6 Tanner graph. Below we show that the Tanner graph has thickness $\theta \le 2$, as desired, see Lemma 2. We note that the recent work by Wang, Lin, and Pryadko [38,37] described examples of group-based codes closely related to the codes considered here. Some of the group-based codes with weight-8 checks found in [37] outperform our BB codes with weight-6 checks in terms of the parameters $n, k, d$. It remains to be seen whether group-based codes can achieve a similar or better level of error suppression for the circuit-based noise model.
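In the rest of this section we establish some properties of BB LDPC codes.
Lemma 1. The code QC(A, B) has parameters [[n, k, d]], where
$$n = 2\ell m, \quad k = 2\cdot\dim(\ker(A) \cap \ker(B)), \quad d = \min\{|v| : v \in \ker(H^X) \setminus \mathrm{rs}(H^Z)\}.$$
The code offers equal distance for X-type and Z-type errors.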
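Proof. It is known [44,45] that $k = n - \mathrm{rk}(H^X) - \mathrm{rk}(H^Z)$. We claim that $\mathrm{rk}(H^X) = \mathrm{rk}(H^Z)$. Indeed, define a self-inverse permutation matrix $C_\ell$ of size $\ell \times \ell$ such that the $i$-th column of $C_\ell$ has a single nonzero entry equal to one at the row $j = -i \pmod{\ell}$. Define $C_m$ similarly and let $C = C_\ell \otimes C_m$. Since $C_\ell S_\ell C_\ell = S_\ell^T$, one has $CxC = x^T$ and $CyC = y^T$, and hence $A^T = CAC$ and $B^T = CBC$. Therefore one can write
$$H^Z = [B^T|A^T] = [CBC|CAC] = C\,[A|B] \begin{bmatrix} 0 & C \\ C & 0 \end{bmatrix} = C H^X \begin{bmatrix} 0 & C \\ C & 0 \end{bmatrix}.$$
Thus $H^Z$ is obtained from $H^X$ by multiplying on the left and on the right by invertible matrices. This implies $\mathrm{rk}(H^X) = \mathrm{rk}(H^Z)$. Therefore
$$k = n - 2\,\mathrm{rk}(H^Z) = n - 2\left(\frac{n}{2} - \dim\ker((H^Z)^T)\right) = 2\cdot\dim(\ker(A) \cap \ker(B)).$$
Here we noted that $H^Z$ has size $(n/2) \times n$ and $\ker((H^Z)^T) = \ker(A) \cap \ker(B)$.
It is known [44,45] that a CSS code with check matrices $H^X$ and $H^Z$ has distance $d = \min(d_X, d_Z)$, where $d_X$ and $d_Z$ are the code distances for X-type and Z-type errors defined as
$$d_X = \min\{|f| : f \in \ker(H^Z) \setminus \mathrm{rs}(H^X)\} \quad\text{and}\quad d_Z = \min\{|g| : g \in \ker(H^X) \setminus \mathrm{rs}(H^Z)\}.$$
We claim that $d_X = d_Z$. Let $X(f)$ be a minimum-weight logical X-type operator, so that $H^Z f = 0$, $f \notin \mathrm{rs}(H^X)$, and $|f| = d_X$. Thus there exists a logical Z-type operator $Z(g) = \prod_{j=1}^n Z_j^{g_j}$ anti-commuting with $X(f)$. In other words, $H^X g = 0$ and $f^T g = 1$. Here, $f$ and $g$ are length-$n$ binary vectors. Write $f = (\alpha, \beta)$ and $g = (\gamma, \delta)$, where $\alpha, \beta, \gamma, \delta$ are length-$(n/2)$ vectors. Conditions $H^Z f = 0$ and $H^X g = 0$ are equivalent to
$$B^T\alpha + A^T\beta = 0 \quad\text{and}\quad A\gamma + B\delta = 0.$$
Here and below all arithmetic is modulo two. Define length-$n$ vectors $e = (C\delta, C\gamma)$ and $h = (C\beta, C\alpha)$.
From Eqs. (13,14) one gets
$$H^X h = AC\beta + BC\alpha = C(A^T\beta + B^T\alpha) = 0.$$
Likewise,
$$H^Z e = B^T C\delta + A^T C\gamma = C(B\delta + A\gamma) = 0 \quad\text{and}\quad e^T h = \delta^T\beta + \gamma^T\alpha = g^T f = 1.$$
Thus $X(e)$ and $Z(h)$ are non-identity logical operators. It follows that $d_Z \le |h|$. We get
$$d_Z \le |h| = |\alpha| + |\beta| = |f| = d_X.$$
The same argument with the roles of X and Z exchanged gives $d_X \le d_Z$, that is, $d_X = d_Z$. We note that the equality $d_X = d_Z$ can also be established using the machinery of Ref. [54] by viewing QC(A, B) as a Lifted Product code.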
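The symmetry between X-type and Z-type checks that the proof relies on can be checked directly in index form. The permutation $C$ sends the qubit or check labeled $x^a y^b$ to $x^{-a} y^{-b}$. The sketch below verifies, for the example $A = x^3 + y + y^2$, $B = y^3 + x + x^2$ with $\ell = 12$, $m = 6$, that every Z-check is obtained from an X-check by applying $C$ to the check label, applying $C$ within each block of data qubits, and swapping the left and right blocks. The set-of-columns representation and the names `bb_checks`, `reflect` and `conjugated_row` are assumptions of this illustration.

```python
def bb_checks(l, m, a_terms, b_terms):
    """Rows of H^X = [A|B] and H^Z = [B^T|A^T] as sets of column indices."""
    N = l * m

    def col(a, b):
        return (a % l) * m + (b % m)

    hx, hz = [], []
    for a in range(l):
        for b in range(m):
            hx.append({col(a + p, b + q) for p, q in a_terms}
                      | {N + col(a + p, b + q) for p, q in b_terms})
            hz.append({col(a - p, b - q) for p, q in b_terms}
                      | {N + col(a - p, b - q) for p, q in a_terms})
    return hx, hz


def reflect(i, l, m):
    """Action of C on an index: x^a y^b -> x^(-a) y^(-b)."""
    a, b = divmod(i, m)
    return ((-a) % l) * m + ((-b) % m)


def conjugated_row(x_row, l, m):
    """Apply C inside each block of data qubits and swap the L and R blocks."""
    N = l * m
    out = set()
    for c in x_row:
        if c < N:
            out.add(N + reflect(c, l, m))   # L block moves to the R block
        else:
            out.add(reflect(c - N, l, m))   # R block moves to the L block
    return out


L_SIZE, M_SIZE = 12, 6
hx, hz = bb_checks(L_SIZE, M_SIZE, [(3, 0), (0, 1), (0, 2)], [(0, 3), (1, 0), (2, 0)])

symmetric = all(
    hz[r] == conjugated_row(hx[reflect(r, L_SIZE, M_SIZE)], L_SIZE, M_SIZE)
    for r in range(L_SIZE * M_SIZE)
)
print(symmetric)
```

Because the map is a permutation of qubits composed with a relabeling of checks, it preserves Hamming weights, which is why a minimum-weight logical operator of one type yields a logical operator of the other type with the same weight.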
In the following, we partition the set of data qubits as $[n] = LR$, where $L$ and $R$ are the left and right blocks of $n/2 = \ell m$ data qubits. Then, data qubits $L$ and $R$ and checks $X$ and $Z$ may each be labeled by integers $\mathbb{Z}_{\ell m} = \{0, 1, \dots, \ell m - 1\}$ which are indices into the matrices $A$, $B$. Alternatively, qubits and checks can be labeled by monomials from $M = \{1, y, \dots, y^{m-1}, x, xy, \dots, xy^{m-1}, \dots, x^{\ell-1}y^{m-1}\}$ in this order, so that $i \in \mathbb{Z}_{\ell m}$ labels the same qubit or check as $x^{a_i} y^{i - m a_i}$ for $a_i = \lfloor i/m \rfloor$. Using the monomial labeling, $L$ data qubit $\alpha \in M$ is part of X checks $A_i^T\alpha$ and Z checks $B_i\alpha$ for $i = 1, 2, 3$. Similarly, $R$ data qubit $\beta \in M$ is part of X checks $B_i^T\beta$ and Z checks $A_i\beta$. A unified notation assigns each qubit or check a label $q(T, \alpha)$ where $T \in \{L, R, X, Z\}$ denotes its type and $\alpha \in M$ its monomial label.
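The correspondence between integer labels and monomial labels is a mixed-radix conversion, sketched below. The function names are illustrative, and a monomial $x^a y^b$ is represented by its exponent pair `(a, b)`.

```python
def index_to_monomial(i, l, m):
    """Integer label i in {0, ..., l*m - 1} -> exponents (a, b) of x^a y^b."""
    a = i // m             # a_i = floor(i / m)
    return a, i - m * a    # the y exponent is i - m * a_i


def monomial_to_index(a, b, l, m):
    """Exponents (a, b), taken modulo (l, m), -> integer label."""
    return (a % l) * m + (b % m)


def multiply(alpha, beta, l, m):
    """Product of two monomials; exponents add modulo l and m."""
    return (alpha[0] + beta[0]) % l, (alpha[1] + beta[1]) % m


l, m = 12, 6
# The listing order 1, y, ..., y^(m-1), x, xy, ... corresponds to indices 0, 1, ..., m - 1, m, m + 1, ...
print([index_to_monomial(i, l, m) for i in (0, 1, m - 1, m, m + 1)])
```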
Lemma 2. The Tanner graph G of the code QC(A, B) has thickness θ ≤ 2. A decomposition of G into two planar layers can be computed in time O(n).Each planar layer of G is a degree-3 graph.
Proof. Let $G_A$ and $G_B$ be the subgraphs of $G$ on the full vertex set whose edges are described by the following check matrices.
Tanner graph $G_A$: $H^X_A = [A_2 + A_3 \,|\, B_3]$ and $H^Z_A = [B_3^T \,|\, A_2^T + A_3^T]$.
Tanner graph $G_B$: $H^X_B = [A_1 \,|\, B_1 + B_2]$ and $H^Z_B = [B_1^T + B_2^T \,|\, A_1^T]$,
where the two subgraphs are named by whether they contain more $A_i$ edges or more $B_i$ edges. Then $G_A$ and $G_B$ are regular degree-3 graphs (since $A_i$ and $B_j$ are permutation matrices). Consider the graph $G_A$. Each X-check vertex is connected to a pair of data vertices $i_1, i_2 \in L$ via the matrices $A_2, A_3$ and a data vertex $i_3 \in R$ via the matrix $B_3$. Each Z-check vertex is connected to a pair of data vertices $i_1, i_2 \in R$ via the matrices $A_2^T, A_3^T$ and a data vertex $i_3 \in L$ via the matrix $B_3^T$.
Figure 3 (partial caption): Edge labels indicate adjacency matrices that generate the respective edges. By extracting either horizontal or vertical strips from these grids, we obtain planar 'wheel graphs' whose union contains all edges in the Tanner graph. The 'A' wheels (dashed lines) cover $A_2, A_3, B_3$ and the 'B' wheels (solid lines) cover $B_1, B_2, A_1$. To avoid clutter, each grid shows only a subset of edges present in the Tanner graph.
We claim that each connected component of $G_A$ can be represented by a "wheel graph" illustrated in Figure 3. A wheel graph consists of two disjoint cycles of the same length $p$ interconnected by $p$ radial edges. The outer cycle alternates between X-check and L-data vertices.
Edges of the outer cycle alternate between those generated by $A_3$ (as one moves from a check to a data vertex) and $A_2^T$ (as one moves from a data to a check vertex). The length of the outer cycle is equal to the order of the matrix $A_3 A_2^T$, that is, the smallest positive integer $p$ such that $(A_3 A_2^T)^p = I_{\ell m}$. For example, consider the code [[144, 12, 12]] from Table 3. Then $A = x^3 + y + y^2$, $A_2 = y$, and $A_3 = y^2$. Thus $A_3 A_2^T = y^2 y^{-1} = y$ which has order $m = 6$. The inner cycle of a wheel graph alternates between Z-check and R-data vertices.
Edges of the inner cycle alternate between those generated by $A_3^T$ (as one moves from a check to a data vertex) and $A_2$ (as one moves from a data to a check vertex). The length of the inner cycle is equal to the order of the matrix $A_3^T A_2$ which is just the transpose of $A_3 A_2^T$ considered earlier. Thus both inner and outer cycles have the same length $m$. The two cycles are interconnected by $m$ radial edges as shown in Figure 3 A). Radial edges are generated by the matrix $B_3$, as one moves towards the center of the wheel. The wheel graph contains 4-cycles generated by tuples of edges $(B_3, A_2, B_3^T, A_2^T)$ and $(B_3^T, A_3, B_3, A_3^T)$. Commutativity between $A_i$ and $B_j$ ensures that traversing any of these 4-cycles implements the identity matrix, that is, the graph is well defined. Clearly, the wheel graph is planar. Since $G_A$ is a disjoint union of wheel graphs, $G_A$ is planar. The same argument shows that $G_B$ is planar: see Figure 3 B).
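The order of a monomial, which fixes the cycle length of a wheel, follows from the orders of its two exponents. The sketch below computes it in closed form and cross-checks it by repeated multiplication. Representing a monomial $x^a y^b$ by the pair `(a, b)` and the function names are assumptions made for illustration.

```python
from math import gcd


def monomial_order(a, b, l, m):
    """Order of x^a y^b in the group Z_l x Z_m (closed form)."""
    order_x = l // gcd(a % l, l)   # gcd(0, l) = l gives order 1
    order_y = m // gcd(b % m, m)
    return order_x * order_y // gcd(order_x, order_y)   # least common multiple


def monomial_order_bruteforce(a, b, l, m):
    """Same quantity obtained by multiplying until the identity is reached."""
    power, steps = (a % l, b % m), 1
    while power != (0, 0):
        power = ((power[0] + a) % l, (power[1] + b) % m)
        steps += 1
    return steps


# Example from the text: A_2 = y and A_3 = y^2, so A_3 A_2^T = y^2 y^(-1) = y.
l, m = 12, 6
a3, a2 = (0, 2), (0, 1)
product = ((a3[0] - a2[0]) % l, (a3[1] - a2[1]) % m)
print(product, monomial_order(*product, l, m))
```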
We empirically observed that BB codes reported in Table 3 have no weight-4 stabilizers. The presence of such stabilizers is known to have a negative impact on the performance of belief propagation decoders [36], which we use here.
The definition of code QC(A, B) does not guarantee that its Tanner graph is connected. Some choices of $A$ and $B$ lead to a code that is actually several separable code blocks. This manifests as a Tanner graph with several connected components. For instance, although all codes in Table 3 are connected, taking any of them with even $\ell$ and replacing every instance of $x$ with $x^2$ creates a code with two connected components.
Lemma 3. The Tanner graph of the code QC(A, B) is connected if and only if $S = \{A_i A_j^T : i, j \in \{1, 2, 3\}\} \cup \{B_i B_j^T : i, j \in \{1, 2, 3\}\}$ generates the group $M$. The number of connected components in the Tanner graph is $\ell m/|\langle S \rangle|$, and all components are graph isomorphic to one another.
Proof. Figure 4 is helpful for following the arguments in this proof.We start by proving the reverse implication of the first statement.Note that there is a length 2 path in the Tanner graph from L qubit α ∈ M to L qubit A i A T j α and another length 2 path to L qubit B i B T j α.These travel through X and Z checks, respectively.Thus, because the A i A T j and B i B T j generate M, there is some path from α to any other L qubit β.A similar argument shows the existence of a path connecting any pair of R qubits.Since each X check and each Z check are connected to at least one L qubit and at least one R qubit, this implies that the entire Tanner graph is connected.The forward implication of the first statement follows after noticing that, for all T ∈ {L, R, X, Z}, the path from a type T node to any other T node is necessarily described as a product of elements from S. Connectivity of the Tanner graph implies the existence of all such paths, and so S must generate M.
If $S$ does not generate $\mathcal{M}$, it necessarily generates a subgroup $\langle S \rangle$, and nodes in connected components of the Tanner graph are labeled by elements of the cosets of this subgroup. This implies the lemma's second statement.
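The component count in Lemma 3 can be checked numerically. The sketch below is illustrative only: it represents each monomial $x^a y^b$ by its exponent pair $(a, b)$ in $\mathbb{Z}_\ell \times \mathbb{Z}_m$, and it assumes that the [[144, 12, 12]] code uses $\ell = 12$, $m = 6$, $A = x^3 + y + y^2$ and $B = y^3 + x + x^2$ (the values of $\ell$, $m$ and $B$ are taken to be those listed in Table 3). The function names are hypothetical.

```python
from itertools import product

def generated_subgroup(gens, ell, m):
    """Subgroup of Z_ell x Z_m generated by `gens` (exponent pairs, additive notation)."""
    seen = {(0, 0)}
    frontier = [(0, 0)]
    while frontier:
        a, b = frontier.pop()
        for ga, gb in gens:
            nxt = ((a + ga) % ell, (b + gb) % m)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def num_components(A, B, ell, m):
    """Number of Tanner-graph components, ell*m / |<S>|, as stated in Lemma 3."""
    # S contains A_i A_j^T and B_i B_j^T, i.e. monomial_i times the inverse of monomial_j.
    S = [((p[0] - q[0]) % ell, (p[1] - q[1]) % m)
         for terms in (A, B) for p, q in product(terms, repeat=2)]
    return (ell * m) // len(generated_subgroup(S, ell, m))

def square_x(terms, ell):
    """Replace every x by x^2 in a list of monomials."""
    return [((2 * a) % ell, b) for a, b in terms]

# Assumed data for the [[144, 12, 12]] code (ell = 12, m = 6).
A_144 = [(3, 0), (0, 1), (0, 2)]   # x^3 + y + y^2
B_144 = [(0, 3), (1, 0), (2, 0)]   # y^3 + x + x^2

connected = num_components(A_144, B_144, 12, 6)
after_squaring = num_components(square_x(A_144, 12), square_x(B_144, 12), 12, 6)
```

Under these assumptions `connected` evaluates to 1 and `after_squaring` to 2, in line with the remark that replacing $x$ by $x^2$ for even $\ell$ splits the code into two blocks.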
For the next part, we establish some terminology. A spanning sub-graph of a graph $G$ is a sub-graph containing all the vertices of $G$. Also, the undirected Cayley graph of a finite Abelian group $G$ (with identity element 0) generated by a set $S \subset G$ is the graph with vertex set $G$ and undirected edges $(g, g + s)$ for all $g \in G$ and all $s \in S$, $s \neq 0$. We say the Cayley graph of $\mathbb{Z}_a \times \mathbb{Z}_b$ when we mean the Cayley graph of $\mathbb{Z}_a \times \mathbb{Z}_b$ generated by $\{(1, 0), (0, 1)\}$. The order $\mathrm{ord}(g)$ of an element $g$ in a multiplicative group is the smallest positive integer such that $g^{\mathrm{ord}(g)} = 1$.

Definition 1. Code QC(A, B) is said to have a toric layout if its Tanner graph has a spanning sub-graph isomorphic to the Cayley graph of $\mathbb{Z}_{2\mu} \times \mathbb{Z}_{2\lambda}$ for some integers $\mu$ and $\lambda$.
Note that only codes with connected Tanner graphs can have a toric layout according to this definition. An example toric layout is depicted in Figure 1 B).

Lemma 4. A code QC(A, B) has a toric layout if there exist $i, j, g, h \in \{1, 2, 3\}$ such that (i) $\langle A_i A_j^T, B_g B_h^T \rangle = \mathcal{M}$ and (ii) $\mathrm{ord}(A_i A_j^T)\,\mathrm{ord}(B_g B_h^T) = \ell m$.

Proof. We let $\mu = \mathrm{ord}(A_i A_j^T)$ and $\lambda = \mathrm{ord}(B_g B_h^T)$. We associate qubits and checks in the Tanner graph of QC(A, B) with elements of the group $G = \mathbb{Z}_{2\mu} \times \mathbb{Z}_{2\lambda}$. Consider an L qubit with label $\alpha \in \mathcal{M}$. Because of (i), there exist integers $a \in \{0, \ldots, \mu - 1\}$ and $b \in \{0, \ldots, \lambda - 1\}$ such that $\alpha = (A_i A_j^T)^a (B_g B_h^T)^b$. Because of (ii) and the pigeonhole principle, this choice of $(a, b)$ is unique. We associate L qubit $\alpha$ with $(2a, 2b) \in G$. Similarly, an R qubit with label $\alpha A_j^T B_g$ is associated with $(2a + 1, 2b + 1) \in G$, X-check $\alpha A_j^T$ with $(2a + 1, 2b)$, and Z-check $\alpha B_g$ with $(2a, 2b + 1)$. Edges in the Tanner graph $A_i$, $A_j^T$, $B_g$, and $B_h^T$ can now be drawn as in Figure 4 (B) and correspond to edges in the Cayley graph of $G$. For instance, to get from $(2a + 1, 2b + 1)$, an R qubit, to $(2a + 2, 2b + 1)$, a Z check, we apply $A_i$, taking the R qubit labeled $\alpha A_j^T B_g$ to the Z check labeled $(\alpha A_i A_j^T) B_g$, which is associated with $(2(a + 1), 2b + 1)$.

Table 4: Elementary operations used for syndrome measurements.
CNOT c t    CNOT with control qubit c and target qubit t
InitX q     Initialize qubit q in the state |+⟩
InitZ q     Initialize qubit q in the state |0⟩
MeasX q     Measure qubit q in the X-basis |+⟩, |−⟩
MeasZ q     Measure qubit q in the Z-basis |0⟩, |1⟩
Idle q      Identity gate on qubit q
All codes in Table 3 have a toric layout with $\mu = m$ and $\lambda = \ell$. Most of these codes satisfy Lemma 4 with $i = g = 2$ and $j = h = 3$. The exception is the [[90, 8, 10]] code, for which we can take $i = 2$, $g = 1$ and $j = h = 3$.
However, we also note two interesting cases. First, there are codes with connected Tanner graphs that do not satisfy the conditions for a toric layout given in Lemma 4. One example of such a code is QC(A, B) with $\ell, m = 28, 14$, $A = x^{26} + y^6 + y^8$, and $B = y^7 + x^9 + x^{20}$, which has parameters [[784, 24, ≤ 24]]. Second, for a code satisfying the conditions of Lemma 4, it need not be the case that the set $\{\mathrm{ord}(A_i A_j^T), \mathrm{ord}(B_g B_h^T)\}$ and the set $\{\ell, m\}$ are equal. For example, the [[432, 4, ≤ 22]] code with $\ell, m = 18, 12$ and $A = x + y^{11} + y^3$, $B = y^2 + x^{15} + x$ only satisfies Lemma 4 with $\mu, \lambda = 36, 6$ (take $i = g = 1$ and $j = h = 2$ for instance).
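The toric-layout parameters quoted above can be reproduced with a few lines of arithmetic in $\mathbb{Z}_\ell \times \mathbb{Z}_m$. This is a hedged sketch, not the authors' code: it assumes the two conditions of Lemma 4 are that $A_i A_j^T$ and $B_g B_h^T$ generate $\mathcal{M}$ and that the product of their orders equals $\ell m$, and the helper names are hypothetical. Monomials $x^a y^b$ are stored as exponent pairs $(a, b)$; the data for the [[144, 12, 12]] code ($\ell, m = 12, 6$, $B = y^3 + x + x^2$) are assumed to match Table 3.

```python
from math import gcd

def order(g, ell, m):
    """Order of the monomial x^a y^b, i.e. of (a, b) in Z_ell x Z_m."""
    a, b = g
    oa = ell // gcd(a % ell, ell)
    ob = m // gcd(b % m, m)
    return oa * ob // gcd(oa, ob)      # least common multiple

def toric_layout_params(A, B, i, j, g, h, ell, m):
    """Return (mu, lam) if the assumed conditions (i), (ii) hold for 1-based indices, else None."""
    u = ((A[i - 1][0] - A[j - 1][0]) % ell, (A[i - 1][1] - A[j - 1][1]) % m)  # A_i A_j^T
    v = ((B[g - 1][0] - B[h - 1][0]) % ell, (B[g - 1][1] - B[h - 1][1]) % m)  # B_g B_h^T
    mu, lam = order(u, ell, m), order(v, ell, m)
    span = {((s * u[0] + t * v[0]) % ell, (s * u[1] + t * v[1]) % m)
            for s in range(mu) for t in range(lam)}
    generates = len(span) == ell * m          # condition (i)
    orders_match = mu * lam == ell * m        # condition (ii)
    return (mu, lam) if generates and orders_match else None

# [[432, 4, <= 22]] code from the text: ell, m = 18, 12.
A_432 = [(1, 0), (0, 11), (0, 3)]    # x + y^11 + y^3
B_432 = [(0, 2), (15, 0), (1, 0)]    # y^2 + x^15 + x
params_432 = toric_layout_params(A_432, B_432, 1, 2, 1, 2, 18, 12)

# [[144, 12, 12]] code (assumed data): ell, m = 12, 6, with i = g = 2 and j = h = 3.
A_144 = [(3, 0), (0, 1), (0, 2)]
B_144 = [(0, 3), (1, 0), (2, 0)]
params_144 = toric_layout_params(A_144, B_144, 2, 3, 2, 3, 12, 6)
```

With these inputs `params_432` is `(36, 6)` and `params_144` is `(6, 12)`, that is, $\mu = m$ and $\lambda = \ell$ for the [[144, 12, 12]] code.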

Syndrome measurement circuit
The next step is to furnish the code QC(A, B) with a syndrome measurement (SM) circuit that repeatedly measures the syndrome of each check operator.Here we describe a SM circuit that requires 2n physical qubits in total: n data qubits and n ancillary check qubits used to record the measured syndromes.The circuit only applies CNOTs to pairs of qubits that are connected in the Tanner graph.
The SM circuit is defined as a periodically repeated sequence of syndrome cycles (SC).A single SC is responsible for measuring syndromes of all n check operators of the code.Let N c be the number of syndrome cycles.We envision that N c > 1.The circuit begins and ends with a special initialization and measurement cycle responsible for initializing logical qubits in a suitable initial state and measuring logical qubits in a suitable basis.Here we focus on the optimization of the SC circuit.Logical initialization and measurements are discussed in Section 9.
The SC circuit is divided into $N_r$ rounds such that each round is a depth-1 circuit composed of CNOTs and single-qubit operations. The latter include initializing a qubit in the X or Z basis and measuring a qubit in the X or Z basis. CNOTs can be applied only to pairs of qubits which are nearest neighbors in the Tanner graph. Some qubits remain idle during some rounds, although we try to minimize such occurrences by fitting as much useful computation as possible into as few rounds as possible. Our notations are summarized in Table 4.
Below we describe a SC circuit with effectively $N_r = 8$ rounds. Ignoring single-qubit initialization and measurement operations, the SC circuit is a depth-7 CNOT circuit. By designing the circuit for an explicit family of LDPC codes we are able to leverage the symmetries and reduce the computational depth to 7 from what otherwise would be $14 = 2 \cdot 6 + 2$, as shown by previous authors [29, Theorem 1].

Our notations are as follows. We divide the $n$ data qubits into the left and the right registers q(L) and q(R) of size $n/2$ each. Each check operator acts on three data qubits from q(L) and three data qubits from q(R). The SM circuit uses $2n$ physical qubits in total: $n$ data qubits and $n$ ancillary check qubits that record the syndrome of each check operator. Let q(X) and q(Z) be the ancillary registers of size $n/2$ that span X-check and Z-check qubits respectively. Thus the physical qubits are partitioned into four registers, q(X), q(L), q(R), and q(Z), of size $n/2$ each. Label qubits in each register by integers $i = 1, 2, \ldots, n/2$. We write q(X, i) for the $i$-th qubit of the register q(X), with similar notations for q(L), q(R), and q(Z). Each permutation matrix $A_p$ and $B_q$ from Eq. (1) defines a one-to-one map from the set $\{1, 2, \ldots, n/2\}$ onto itself.

Circuit
We identify a permutation matrix and the corresponding one-to-one map. For example, we write $j = A_1(i)$ if the matrix $A_1$ has a one at row $i$ and column $j$ (this is well defined since $A_1$ is a permutation matrix). Likewise, we write $j = A_1^T(i)$ if the transposed matrix $A_1^T$ has a one at row $i$ and column $j$. In this notation, the $i$-th X-check operator acts on data qubits $q(L, A_p(i))$ and $q(R, B_p(i))$ with $p = 1, 2, 3$. The $i$-th Z-check operator acts on data qubits $q(L, B_p^T(i))$ and $q(R, A_p^T(i))$ with $p = 1, 2, 3$.

Our depth-8 SC circuit is described in Table 5 and illustrated in Figure 5. Note that within each round all operations act over non-overlapping sets of qubits. In particular, each round applies at most one layer of CNOT gates between q(X) and q(L) registers (Rounds 2, 6, and 7), at most one layer of CNOTs between q(X) and q(R) registers (Rounds 3, 4, and 5), at most one layer of CNOTs between q(Z) and q(L) registers (Rounds 3, 4, and 5), and at most one layer of CNOTs between q(Z) and q(R) registers (Rounds 1, 2, and 6). Qubits from q(Z) are always targets for CNOTs. Accordingly, X-type errors propagate from data qubits to check qubits in q(Z). The latter are measured in the Z-basis in Round 7, revealing the syndrome of X-type errors. Qubits from q(X) are always controls for CNOTs. Accordingly, Z-type errors propagate from data qubits to check qubits in q(X). The latter are measured in the X-basis in Round 8, revealing the syndrome of Z-type errors.

We envision that the syndrome cycles are repeated periodically. This justifies applying CNOTs to q(Z) at Round 1 even though q(Z) is initialized only at Round 8. Indeed, Round 8 of the previous syndrome cycle goes immediately before Round 1 of the current cycle. Thus q(Z) has already been initialized at the beginning of Round 1. We were not able to find a depth-8 (or smaller depth) syndrome cycle in which X-check and Z-check qubits are initialized and measured synchronously.
Let us now prove that the above SC circuit has the desired functionality.Since the circuit involves only Clifford operations, its action can be compactly described using stabilizer tableau [56].We track how the tableau changes as each layer of CNOTs in the circuit is applied.Since the CNOT gates do not mix Pauli X and Z operators, one may consider tableau describing the action of the circuit on X-type and Z-type Pauli operators separately.
Let us begin with X-type Paulis. The corresponding tableau $T$ is a binary matrix of size $n \times 2n$ such that each row of $T$ defines an X-type stabilizer of the underlying quantum state. We partition columns of $T$ into four blocks that represent qubit registers q(X), q(L), q(R), and q(Z). We partition rows of $T$ into two blocks such that initially the top $n/2$ rows represent weight-1 check operators on qubits of the register q(X) initialized in the state |+⟩ while the bottom $n/2$ rows represent weight-6 check operators on data qubits associated with the chosen code QC(A, B). Thus, at the beginning of Round 1, when all check qubits in the register q(X) have been initialized in the state |+⟩, while data qubits are in some logical state of the code QC(A, B), the binary matrix is

$$T = \begin{bmatrix} I & 0 & 0 & 0 \\ 0 & A & B & 0 \end{bmatrix}.$$

Here $I \equiv I_{n/2}$ is the identity matrix. The SC circuit (ignoring qubit initialization and measurements) enacts the transformation

$$\begin{bmatrix} I & 0 & 0 & 0 \\ 0 & A & B & 0 \end{bmatrix} \;\longrightarrow\; \begin{bmatrix} I & A & B & 0 \\ 0 & A & B & 0 \end{bmatrix}.$$

Indeed, the circuit must map a single-qubit X stabilizer $X_j$ on a check qubit $j \in q(X)$ to a product of $X_j$ and the $j$-th X-type check operator on the data qubits determined by the $j$-th row of $H_X = [A|B]$. The eigenvalue measurement of $X_j$ at the final round then reveals the syndrome of the $j$-th check operator. The bottom $n/2$ rows must be unchanged since the check operators of the code must be the same before and after the syndrome measurement.
Let us verify that the circuit defined in Table 5 enacts the desired transformation.To accomplish this, we rewrite the SC circuit by removing notations irrelevant to showing the correctness of X-checks.Specifically, we write each CNOT in Table 5 as CNOT M (a, b), where a, b ∈ {1, 2, 3, 4} = {q(X), q(L), q(R), q(Z)}, and M ∈ {A 1 , A 2 , A 3 , B 1 , B 2 , B 3 }.Note that the CNOT instructions where the matrix M T is used instead of M can be written using matrix M by performing the variable renaming i ← M (i) in the corresponding for loop in Table 5.
Using the above compact notation, the unitary part of the SC circuit becomes: Round 1: In the following we apply all seven unitary rounds to verify the correctness of the performed transformation: Round 1: Here, we use the identity (A , which holds since the sum of first summands and second summands on both sides of the equation gives AB, and AB + AB = 0.
Round 5: Round 6: Round 7: This is the desired transformation.
So far, we have not considered the action of the SC circuit on the logical qubits of the code.Let us show that this action is trivial.Indeed, consider some X-type logical operator X(v), where v ∈ F n 2 .Write v = (u, w) where u and w are restrictions of v onto the registers 2 and 3 respectively.Commutativity between X(v) and any Z-type check operator implies uB + wA = 0.
Here we consider $u$ and $w$ as row vectors. Extending $v$ by zeroes on registers 1 and 4 gives the row vector $(0\ u\ w\ 0)$, where 0 stands for the all-zero row vector of length $n/2$. Let us follow the same chain of transformations as above starting from the initial vector $(0\ u\ w\ 0)$. All CNOTs controlled by the register 1, such as $\mathrm{CNOT}_{A_2}(1, 2)$ or $\mathrm{CNOT}_{B_2}(1, 3)$ in Eq. (9), have trivial action on the vector $(0\ u\ w\ 0)$ since all entries of the control register are zeroes. Such CNOTs can be omitted. The remaining CNOTs in Eq. (9), such as $\mathrm{CNOT}_{A_1}(3, 4)$ or $\mathrm{CNOT}_{B_1}(2, 4)$, map the initial vector $(0\ u\ w\ 0)$ to $(0\ u\ w\ t)$ for some vector $t$, since the registers 2 and 3 always serve as the controls and the register 4 always serves as the target. Rounds 1, 2, and 6 in Eq. (9) are equivalent to XORing vectors $wA_1$, $wA_3$, and $wA_2$ respectively to the register 4. Rounds 3, 4, and 5 in Eq. (9) are equivalent to XORing vectors $uB_1$, $uB_2$, and $uB_3$ respectively to the register 4. Thus

$$t = w(A_1 + A_2 + A_3) + u(B_1 + B_2 + B_3) = wA + uB = 0.$$

We have shown that the SC circuit maps the vector $(0\ u\ w\ 0)$ to itself. Hence the circuit acts trivially on logical X-type operators.
To prove the correctness of Z-checks, observe that Z-checks can be mapped into X-checks by conjugation with Hadamards.When the unitary circuit in Figure 5 is conjugated with Hadamards, this flips controls and targets of all CNOT gates.Thus, to verify Z-checks, it suffices to perform a very similar calculation to the one already shown for X-checks.We omit this calculation here.
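The X-type tableau argument can be replayed symbolically. Because every block of the tableau is a polynomial in the commuting matrices $x$ and $y$, each $n/2 \times n/2$ block can be tracked as an element of the group algebra $\mathbb{F}_2[\mathbb{Z}_\ell \times \mathbb{Z}_m]$, stored here as a set of exponent pairs. The sketch below is an illustration under stated assumptions: the round-by-round gate schedule is a reconstruction chosen to be consistent with the register and round assignments quoted in the text (the authoritative schedule is the one in Table 5 and Eq. (9)), and the [[144, 12, 12]] data ($\ell, m = 12, 6$, $B = y^3 + x + x^2$) are assumed to match Table 3.

```python
ELL, EM = 12, 6   # assumed lattice dimensions of the [[144, 12, 12]] code

def mono(a, b):
    """The monomial x^a y^b as a one-element polynomial."""
    return frozenset({(a % ELL, b % EM)})

def add(p, q):
    """Addition modulo two is the symmetric difference of monomial sets."""
    return p ^ q

def mul(p, q):
    """Product in the group algebra F_2[Z_ELL x Z_EM]."""
    out = set()
    for a, b in p:
        for c, d in q:
            out ^= {((a + c) % ELL, (b + d) % EM)}
    return frozenset(out)

ZERO, ONE = frozenset(), mono(0, 0)
A1, A2, A3 = mono(3, 0), mono(0, 1), mono(0, 2)   # A = x^3 + y + y^2
B1, B2, B3 = mono(0, 3), mono(1, 0), mono(2, 0)   # B = y^3 + x + x^2 (assumed)
A, B = A1 ^ A2 ^ A3, B1 ^ B2 ^ B3

X, L, R, Z = 0, 1, 2, 3   # register indices (registers 1..4 in the text)

# Assumed schedule: each entry (M, control, target) is a layer CNOT_M(control, target).
SCHEDULE = [
    [(A1, R, Z)],
    [(A2, X, L), (A3, R, Z)],
    [(B2, X, R), (B1, L, Z)],
    [(B1, X, R), (B2, L, Z)],
    [(B3, X, R), (B3, L, Z)],
    [(A1, X, L), (A2, R, Z)],
    [(A3, X, L)],
]

def run(row):
    """Propagate one block row of the X-type tableau through the seven CNOT rounds."""
    row = list(row)
    for layer in SCHEDULE:
        for mat, ctrl, tgt in layer:   # layers within a round act on disjoint registers
            row[tgt] = add(row[tgt], mul(row[ctrl], mat))
    return row

top = run([ONE, ZERO, ZERO, ZERO])   # ancilla stabilizers X_j on q(X)
bottom = run([ZERO, A, B, ZERO])     # X-type check operators of the code
```

Under these assumptions `top` equals `[ONE, A, B, ZERO]` and `bottom` equals `[ZERO, A, B, ZERO]`, matching the target tableau. The bottom row also illustrates the logical-operator argument, since $(u, w) = (A, B)$ satisfies $uB + wA = 0$ and the register q(Z) returns to zero.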
The SC circuit shown in Table 5 is not unique in the following sense: we found 935 depth-7 alternatives to the unitary part of the SC circuit via a computer search.These alternatives are obtained from the circuit defined in Eq. ( 9) by applying the gate layers CNOT Ai and CNOT Bj in a different order.In the special case of the [[144, 12, 12]] code, numerical simulations show that all 936 variants of the syndrome cycle give rise to syndrome measurement circuits with distance d circ ≤ 10 explaining our focus on a specific circuit Eq. ( 9) which we conjecture to have distance d circ = 10.The short depth of the single cycle, relying on only seven computational stages, helps to keep the spread of errors under control.Details of calculating upper bounds on the circuit-level distance are provided in Section 6.

Decoder for the circuit-based noise model
So far we assumed that the SM circuit is noiseless.As shown in Section 5, in this case all measured syndromes are zero and the circuit implements the logical identity gate.Consider now what happens when each operation in the circuit including CNOT gates, qubit initializations, measurements, and idle qubits is subject to noise.To enable efficient decoding and numerical simulations, we use the standard circuit-based depolarizing noise model [22].It assumes that each operation in the circuit is ideal or faulty with the probability 1 − p or p respectively.Here p is a model parameter called the error rate.Faults on different operations occur independently.We define faulty operations as follows.A faulty CNOT is an ideal CNOT followed by one of 15 non-identity Pauli errors on the control and the target qubits picked uniformly at random.A faulty initialization prepares a single-qubit state orthogonal to the correct one.A faulty measurement is an ideal measurement followed by a classical bit-flip error applied to the measurement outcome.A faulty idle qubit suffers from a Pauli error X or Y or Z picked uniformly at random.
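The fault model described above can be summarized as a small sampler. This is a minimal sketch for illustration only: the operation names follow Table 4, while the function name and the string labels used for initialization and measurement faults are hypothetical.

```python
import random
from itertools import product

# The 15 non-identity two-qubit Paulis on (control, target).
TWO_QUBIT_ERRORS = [a + b for a, b in product("IXYZ", repeat=2) if a + b != "II"]

def sample_fault(op, p, rng):
    """Return None if the operation is ideal, otherwise a label describing the fault."""
    if rng.random() >= p:                 # ideal with probability 1 - p
        return None
    if op == "CNOT":
        return rng.choice(TWO_QUBIT_ERRORS)   # uniform over 15 Pauli errors
    if op == "Idle":
        return rng.choice("XYZ")              # uniform over X, Y, Z
    if op in ("InitX", "InitZ"):
        return "orthogonal_state"             # prepares the state orthogonal to the intended one
    if op in ("MeasX", "MeasZ"):
        return "flip_outcome"                 # classical bit flip of the measurement outcome
    raise ValueError(f"unknown operation {op!r}")

rng = random.Random(0)
faults = [sample_fault("CNOT", 0.001, rng) for _ in range(10_000)]
num_faulty = sum(f is not None for f in faults)
```

Each specific Pauli error on a CNOT therefore occurs with probability $p/15$ and each specific Pauli on an idle qubit with probability $p/3$.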
To perform error correction one needs a decoder, that is, a classical algorithm that takes as input the measured error syndrome and outputs a guess of the final Pauli error on the data qubits resulting from all faults in the SM circuit. The error syndrome may itself be faulty due to measurement errors. The decoder succeeds if the guessed Pauli error coincides with the actual error up to a product of check operators. In this case the guessed and the actual error have the same action on any logical state.
Let us show how to adapt Belief Propagation with an Ordered Statistics postprocessing step Decoder (BP-OSD) proposed in [36,51] to the circuit-based noise model.The decoder consists of two stages.The first stage takes as input a BB code QC(A, B) equipped with a SM circuit U and an error rate p.It outputs a certain linearized noise model that ignores possible cancellations between errors generated by two or more faulty operations in U.This stage is analogous to computing the decoding graph in error correction algorithms based on the surface code [57,58].The second (online) stage of the decoder takes as input an error syndrome measured in the experiment and outputs a guess of the final error on the data qubits.This stage decodes the linearized noise model using BP-OSD method [36,51].Our linearized noise model is conceptually similar to spacetime codes studied by Delfosse and Paetznick [47] and detector-based noise model proposed by McEwen, Bacon, and Gidney [48].The online stage of our decoder closely follows Refs.[49,50].In particular, Gehér, Crawford, and Campbell [50] applied BP-OSD to study tangled syndrome measurement circuits capable of measuring certain non-local check operators on a hardware with short-range qubit connectivity.Higgott et al [49] showed that the performance of the standard minimum-weight matching decoder can be enhanced by computing prior error probabilities using BP-decoder as a preprocessing step.
We begin by describing the offline stage.Consider a BB code with parameters [[n, k, d]] and let U be the SM circuit constructed in Section 5 with N c syndrome cycles.The circuit U contains 6nN c CNOTs, nN c initializations and measurements, and 2nN c idle qubit locations.Let U 1 , U 2 , . . ., U M be the list of all possible faulty realizations of U with exactly one faulty operation.If the faulty operation happens to be CNOT or an idle qubit, one of the admissible Pauli errors for this operation is specified.A simple counting shows that M = 98nN c , where 98 = 15 • 6 + 1 + 1 + 3 • 2 accounts for 15 noisy realizations of each CNOT, 3 realizations of memory errors on idle qubits, noisy initializations and measurements.By definition, the list U 1 , U 2 , . . ., U M includes all realizations of U that can occur with the probability O(p) in the limit p → 0. We simulate each circuit U j by propagating the chosen Pauli error towards the final time step taking into account qubit initialization and measurement errors (if any).This simulation can be performed efficiently using the stabilizer formalism.Let s U j ∈ {0, 1} nNc be the full measured syndrome of U j and E j be the final n-qubit Pauli error on the data qubits generated by U j .Let s F j ∈ {0, 1} n be the syndrome of the final error E j .In other words, if we write E j = X(α j )Z(β j ) for some vectors α j , β j ∈ {0, 1} n , then Here H X and H Z are the check matrices of the chosen code.Finally, let s L j ∈ {0, 1} 2k be a logical syndrome of the final error E j defined as follows.Fix some basis set of logical Pauli operators P 1 , P 2 , . . ., P 2k for the chosen code.For example, P 1 , P 2 , . . ., P k could be logical X-type operators and P k+1 , P k+2 , . . ., P 2k could be logical Z-type operators.The i-th bit of s L j is defined as for i = 1, . . 
., 2k.Note that the pair of syndromes s F j , s L j uniquely determines the final error E j modulo check operators.Define a pair of decoding matrices D and D L of size (nN c + n) × M and 2k × M respectively such that the j-th column of D is s U j s F j and the j-th column of D L is s L j .Let p j be the probability of a Pauli error that occurred in the circuit U j .We have p j = p/15 if U j contains a faulty CNOT, p j = p/3 if U j contains a faulty idle qubit, and p j = p if U j contains a faulty qubit initialization or measurement.Suppose I ⊆ {1, 2, . . ., M } is a subset of columns of D such that triples of syndromes (s U j , s F j , s L j ) are the same for all j ∈ I.We merge all columns in I to a single column and assign the value j∈I p j to the bit-flip error probability associated with the merged column.Let M be the number of columns of D after the merging step and p 1 , p 2 , . . ., p M be the respective error probabilities.
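The column-merging step can be sketched as follows. The example assumes each single-fault circuit $U_j$ has already been simulated and summarized by a hashable triple of syndromes $(s^U_j, s^F_j, s^L_j)$; the function name and the toy data are hypothetical.

```python
from collections import defaultdict

def merge_columns(triples, probs):
    """Merge faults with identical (s_U, s_F, s_L) triples, summing their probabilities."""
    merged = defaultdict(float)
    for triple, p in zip(triples, probs):
        merged[triple] += p
    keys = list(merged)                      # dicts preserve first-seen order
    return keys, [merged[k] for k in keys]

# Toy data: two CNOT faults with identical syndromes and one measurement fault.
p = 0.003
triples = [
    ((1, 0, 0), (0, 1), (0,)),   # faulty CNOT, probability p / 15
    ((1, 0, 0), (0, 1), (0,)),   # another CNOT fault with the same triple
    ((0, 1, 0), (0, 0), (0,)),   # faulty measurement, probability p
]
probs = [p / 15, p / 15, p]
columns, merged_probs = merge_columns(triples, probs)
```

Here the two CNOT faults collapse into a single column with probability $2p/15$, so the merged matrix has two columns instead of three.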
Next, the decoding matrix D is converted to a sparse form.To this end consider a faulty circuit U j and a sequence of syndromes measured by U j on some check operator.Let this sequence be m = (m 1 , m 2 , . . ., m Nc ) ∈ {0, 1} Nc .Since U j contains a single fault, the sequence m has only a few locations where the measured syndromes differ at two consecutive cycles.For example, if U j contains a Pauli error on some idle data qubit between two syndrome cycles, the m-sequence may look as (0, 0, . . ., 0, 1, 1, . . ., 1).Such sequence can be made sparse if we represent it by a binary vector In other words, m ′ stores changes in the measured syndrome at a given check operators at each cycle.We convert the matrix D to a sparse form by applying the map m → m ′ to the syndromes measured by each check operator for each faulty circuit U j .Let ξ 1 , ξ 2 , . . ., ξ M ∈ {0, 1} be independent random variables such that ξ j takes values 0 and 1 with the probability 1 − p j and p j respectively.Define a linearized noise model that outputs a random triple (s U , s F , E), where is an n-qubit Pauli error and is a binary vector that represents the error syndrome.The linearized model is a simplified version of the circuit-based noise that ignores possible cancellations among errors generated by two or more faulty operations in U. Note that such errors occur with the probability only O(p 2 ).The decoder attempts to guess the final error E acting on the data qubits based on the syndrome s U measured in the experiment making a simplifying assumption that that the pair (s U , E) was generated using the linearized noise model.We additionally assume that the decoder knows the syndrome s F of the final error E. This syndrome can be acquired by adding one noiseless cycle at the end of the syndrome measurement circuit, which is a common practice in numerical simulations of error correction.By definition, we have Here ξ = (ξ 1 , ξ 2 , . . 
., ξ M ) is a column vector and matrix-vector multiplication is modulo two.Define a minimum weight error ξ * = ξ * (s) ∈ {0, 1} M as a solution of an optimization problem This problem is equivalent to the minimum weight decoding for a length-M linear code with the check matrix D, bit-flip error probabilities p 1 , p 2 , . . ., p M , and noiseless syndromes.Our guess of the unknown logical syndrome is Let E * be any n-qubit Pauli operator with the syndrome s F and the logical syndrome s L .Note that E * is defined uniquely modulo multiplication by check operators.The Pauli E * is our guess of the final error on the data qubits.Let E be the actual final error on the data qubits generated by a noisy realization of U without making any simplifications of the noise model.By definition, Pauli operators E and E * have the same syndrome but they may differ by a logical Pauli operator.We declare a logical error if E and E * differ by any non-identity logical operator (there are 4 k − 1 choices of this logical operator).Otherwise the decoding is deemed successful.It remains to explain how to solve the optimization problem Eq. (10).Since the minimum weight decoding for a linear code is known to be NP-hard problem [59], finding the exact solution of Eq. ( 10) might be practically impossible for problem instances with several thousand variables that we have to deal with.Furthermore, estimation of the logical error probability p L by the Monte Carlo method requires solving O(1/p L ) instances of the problem Eq. 
( 10).This number can be quite large since p L is a small parameter.To address these challenges, we employ the BP-OSD algorithm [36,51].Recall that belief propagation (BP) is a heuristic message passing algorithm aimed at computing single-bit marginals of a probability distribution Here ξ ∈ {0, 1} M and Z is a normalization factor chosen such that ξ∈{0,1} M P (ξ|σ) = 1.In our case ξ represents an unknown error in the linearized noise model, D is the decoding matrix constructed above, and σ = s U s F is the measured error syndrome.Let q j ∈ [0, 1] be an estimate of the marginal probability Pr[ξ j = 1] obtained by the belief propagation method with some fixed number of message passing iterations.The ordered statistics post-processing step examines information sets which are subsets of bits I ⊆ [M ] such that the linear system Dξ = σ has a unique solution ξ supported on I, that is, ξ j = 0 for all j / ∈ I. Information sets are ranked according to their reliability which is defined as ρ(I) = j∈I max (q j , 1 − q j ).
BP-OSD finds an information set I with the largest reliability using a greedy algorithm [36].The final output of BP-OSD is a solution of the system Dξ = σ supported on the most reliable information set I. We replace the minimum weight error ξ * in Eq. ( 10) by the solution ξ proposed by BP-OSD.
Since BB LDPC codes are of CSS-type, it is natural to decode X-type and Z-type errors independently.Accordingly, we solve the minimum weight decoding problem Eq. ( 10) twice with a pair of decoding matrices D X and D Z constructed as above but including only the syndromes of X-type and Z-type check operators respectively.This results in guessed X-type and Z-type errors E * X and E * Z .The guessed final error is E * = E * X E * Z .We empirically observed that the resulting decoding matrices D X and D Z are (6, 35)-sparse for any BB code, meaning that there are at most 6 nonzeros in each column and at most 35 nonzeros in each row of D X and D Z .The number of columns scales as O(nN c ) where the constant coefficient depends on a particular code.For example, decoding matrices D X and D Z describing the code [[144, 12, 12]] with N c = 12 syndrome cycles have 8857 and 8785 columns respectively.
We also employed BP-OSD to compute an upper bound on the code distance $d$. Consider a CSS-type LDPC code [[n, k, d]] with check matrices $H_X$ and $H_Z$. Assume for simplicity that this code has the same distance for X- and Z-type errors (this assumption is satisfied for BB LDPC codes due to Lemma 1). Suppose $Z(\xi)$ is a minimum weight logical Z-type operator. Then $\xi \in \ker(H_X)$ and $\xi \notin \mathrm{rs}(H_Z)$. Let $X(\eta)$ be any logical X-type operator. Here $\eta \in \ker(H_Z) \setminus \mathrm{rs}(H_X)$. Consider the following optimization problem:

$$d(\eta) = \min_{\xi \in \{0,1\}^n} |\xi| \quad \text{subject to} \quad H_X \xi = 0 \ \text{ and } \ \eta^T \xi = 1. \qquad (11)$$

Let $d_{BP}(\eta)$ be the weight of the solution obtained by solving the optimization problem Eq. (11) using the BP-OSD method with a parity check matrix $\begin{bmatrix} H_X \\ \eta^T \end{bmatrix}$ and a syndrome $(0, 0, \ldots, 0, 1)^T$. We have $d_{BP}(\eta) \ge d$ with certainty and $d_{BP}(\eta) = d$ with probability at least 1/2 whenever BP-OSD finds the optimal solution. Choose the number of trials $T \gg 1$ and pick vectors $\eta_1, \eta_2, \ldots, \eta_T \in \ker(H_Z) \setminus \mathrm{rs}(H_X)$ uniformly at random. Then

$$d_{BP} = \min \{ d_{BP}(\eta_1), \ldots, d_{BP}(\eta_T) \}$$

is an upper bound on the distance $d$ that can be systematically improved by increasing the number of trials $T$.
Using the quantity d BP as an efficiently computable proxy for the code distance enabled us to search over a large number of candidate BB codes with n = O(100) qubits.The vast majority of these candidates was discarded due to an insufficiently large upper bound d BP .This left only a few viable candidates with a sufficiently large value of d BP .The actual distance of each candidate code was computed using the integer linear programming method [55].
We similarly computed an upper bound on the circuit-level distance d circ .Since the SM circuit can break the symmetry between X-and Z-type errors, the circuit-level distance has to be computed for both types of errors.For concreteness, let us discuss the circuit-level distance d Z circ for Z-type errors.The latter is defined as the minimum number of faulty operations in the SM circuit that can generate an undetectable Z-type logical error.The optimization problem Eq. ( 11) is replaced by where D X is the decoding matrix constructed above and η ∈ {0, 1} M is a random linear combination of rows of D X and rows of D L that represent logical X-type operators.Then d circ (η) ≥ d Z circ with certainty and d circ (η) = d Z circ with the probability at least 1/2.Solving the optimization problem Eq. ( 12) using BP-OSD method for many random choices of the vector η and taking the minimum value of d Z circ (η) provides an upper bound on d Z circ .One can similarly compute an upper bound on the circuit-level distance d X circ for X-type errors.This provides an upper bound on

Proof of Lemma 1
For convenience of the reader we restate the lemma below.The code offers equal distance for X-type and Z-type errors.
Proof.It is known [44,45] that We claim that rk(H X ) = rk(H Z ).Indeed, define a self-inverse permutation matrix C ℓ of size ℓ × ℓ such that the ith column of C ℓ has a single nonzero entry equal to one at the row j = −i (mod ℓ).Define C m similarly and let Therefore one can write Thus H Z is obtained from H X by multiplying on the left and on the right by invertible matrices.This implies rk(H X ) = rk(H Z ).Therefore Here we noted that H Z has size (n/2) × n and ker(( It is known [44,45] that a CSS code with check matrices H X and H Z has distance d = min (d X , d Z ), where d X and d Z are the code distances for X-type and Z-type errors defined as We claim that d Z ≤ d X .Indeed, let X(f ) = n j=1 X fj j be a minimum weight logical X-type Pauli operator such that |f | = d X .Then H Z f = 0 and f / ∈ rs(H X ).Thus there exists a logical Z-type operator Z(g) = n j=1 Z gj j anti-commuting with X(f ).In other words, H X g = 0 and f T g = 1.Here, f and g are length-n binary vectors.Write f = (α, β) and g = (γ, δ), where α, β, γ, δ are length-(n/2) vectors.Conditions H Z f = 0 and H X g = 0 are equivalent to Here and below all arithmetics is modulo two.Define length-n vectors e = (Cδ, Cγ) and h = (Cβ, Cα).
From Eqs. (13,14) one gets Likewise, Thus X(e) and Z(h) are non-identity logical operators.It follows that d Z ≤ |h|.We get We note that the equality d X = d Z can also be established using the machinery of Ref. [54] by viewing QC(A, B) as a Lifted Product code.

Numerical simulation details
Data reported in Figure 2 A) was generated using BP-OSD software developed by Roffe et al. [51,60]. The decoder was extended to the circuit-based noise model as described in Section 6. The simulations were performed using MIN-SUM belief propagation with the limit of 10,000 iterations and the combination sweep version of OSD, as described in [51]. All data points except for those with the smallest error rate accumulated at least 100 logical errors to estimate the logical error rate $p_L$ with the error bars $\approx p_L/10$. The fitting formula $p_L(p) = p^{d_{circ}/2} e^{c_0 + c_1 p + c_2 p^2}$ with fitting parameters $c_0, c_1, c_2$ was proposed in [61] in the context of surface code simulations. We observed that the same fitting formula works well for BB LDPC codes. The fitting parameters $c_i$ of the considered codes are provided in Table 6. We note that the logical error rate achieved by the combination of a distance-preserving SM circuit and an optimal decoder is expected to follow an exponential decay $p_L \sim e^{-d f(p)}$ in the sub-threshold regime. The function $f(p)$ must have a logarithmic singularity $f(p) \approx -(1/2) \log p$ for small $p$ since one expects degree-$(d/2)$ error suppression for a distance-$d$ code. The fitting formula for $p_L(p)$ approximates the remaining non-singular terms in $f(p)$ by a low-degree polynomial in $p$. Coefficients of the polynomial are considered as fitting parameters. Since our SM circuit is not distance-preserving, the code distance $d$ in the fitting formula of Ref. [61] is replaced by the circuit-level distance $d_{circ}$.
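For concreteness, the fitting formula can be evaluated as in the sketch below. The coefficient values used in the example are placeholders chosen for illustration, not the fitted values of Table 6, and the function name is hypothetical.

```python
import math

def fitted_logical_error_rate(p, d_circ, c0, c1, c2):
    """Evaluate p_L(p) = p^(d_circ/2) * exp(c0 + c1*p + c2*p^2)."""
    return p ** (d_circ / 2) * math.exp(c0 + c1 * p + c2 * p * p)

def exponent_form(p, d_circ, c0, c1, c2):
    """Same quantity written as exp(-d_circ * f(p)) with f(p) = -(1/2) log p - (c0 + c1 p + c2 p^2)/d_circ."""
    f = -0.5 * math.log(p) - (c0 + c1 * p + c2 * p * p) / d_circ
    return math.exp(-d_circ * f)

# Placeholder coefficients (NOT taken from Table 6), with d_circ = 10.
example = fitted_logical_error_rate(1e-3, 10, c0=20.0, c1=1000.0, c2=0.0)
```

The second function makes the logarithmic singularity explicit: the leading term of $f(p)$ is $-(1/2)\log p$, and the polynomial in $p$ only corrects the non-singular part.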
Surface code data reported in Figure 2 B) was generated using software developed by one of the authors and Alexander Vargo in [61].The simulation was performed for the rotated surface code with parameters [[d 2 , 1, d]], where d ∈ {9, 11, 13, 15}, and the standard SM circuit [22].Let P L,1 be the logical error probability for the surface code encoding one logical qubit and SM circuit with N c = d syndrome cycles.Encoding k = 12 logical qubits into 12 separate patches of the surface code results in a logical error probability Figure 2 B) shows the logical error rate p L defined as the logical error probability per syndrome cycle,

Logical memory capabilities
In this section we give evidence that BB LDPC codes have the required features for an effective quantum memory or storage unit. Although there are few ways of performing computations on stored qubits, there are fault tolerant operations for initialization and measurement of individual qubits, and most importantly for the transfer of data into and out of the code via quantum teleportation. These capabilities are based on a combination of two different techniques. First, we follow [62] to derive fault tolerant unitary operations that require only the connectivity already necessary to perform syndrome measurements. Second, we give low-overhead extensions of the Tanner graph based on work by [46] which enable measurement of a single logical operator while preserving the thickness-2 implementability criterion. Together, these capabilities allow us to address any logical qubit.
A conceptual representation of the logical operators is shown in Figure 6. We first derive logical Pauli operators for BB LDPC codes, and find that the logical qubits divide into an "unprimed" and a "primed" block with equal size and symmetrical structure. We visualize the primed and unprimed block as two sheets featuring a 2D grid of logical operators. Some of these grid cells contain one of the k/2 logical qubits per sheet.
Next, we show that there exists a set of fault tolerant depth-four circuits that implement a small family of commuting logical CNOT circuits. These gates are based on automorphisms of the code: permutations of the data qubits that commute with the stabilizer. Based on their group structure we can think of the automorphism gates as translations of a 2D grid of operators within each of the primed and unprimed blocks. Furthermore, we follow [62] to derive a fault tolerant operation based on a ZX-duality that swaps the two blocks while also applying Hadamard gates to all qubits.
Finally, we show how to leverage techniques from [46] to extend the Tanner graph to a larger Tanner graph allowing fault-tolerant measurement of one logical X and one logical Z operator. Various subgraphs of this extended Tanner graph contain either this logical X or Z operator as a stabilizer. This construction acts as a "probe" that gives us access to one of the logical qubits.
Measurements of both logical X and Z operators on any qubit can be realized by conjugating this measurement by gates based on automorphisms and the ZX-duality. We can think of this as shifting any desired qubit to be the target of the probe using translation and exchange of the two blocks.
Logical X and Z measurement of any logical qubit also enables transfer of data into and out of the code using a teleportation circuit. This teleportation can be realized through a product measurement of the logical Pauli in the BB code, and a logical Pauli in another quantum error correction code. While the Tanner graph of the BB code demands thickness-2, we show how the ancilla system corresponding to the logical X measurement can be implemented in an "effectively planar" Tanner graph. This makes it possible to connect this ancilla system to another quantum error correction code, like a surface code, within a thickness-2 implementation. This capability indicates the suitability of BB LDPC codes as a fault tolerant quantum memory.
On the other hand, this construction incurs rather significant resource overhead that undercuts the compactness of the error correction codes introduced in this paper. For example, to equip the [[144, 12, 12]] code with ancilla systems capable of measuring X and Z, we require $2 \times 30 \times (2d - 1) = 1380$ additional qubits on top of the original 288. However, the argument for fault-tolerance from [46] is designed to be very broadly applicable, and hence may demand excessively many resources for any particular error correction code. We consider it very likely that the size of these systems can be significantly reduced. We leave resource optimization of this scheme for future work.

Logical Pauli Operators
In this section we show that the logical Pauli matrices of BB LDPC codes split into a "primed" and an "unprimed" block with $|M| = \ell m$ many X operators and Z operators each. Operators in the primed block commute with operators in the unprimed block, and the two blocks have identical commutation structure.
We begin by introducing some new notation for Pauli matrices acting on the data qubits. We denote by $\mathbb{F}_2^M$ the set of polynomials over $\mathbb{F}_2$ with monomials from $M$. This is equivalent to the quotient ring obtained from $\mathbb{F}_2[x, y]$ by identifying $x^\ell = y^m = 1$. With $x = S_\ell \otimes I_m$ and $y = I_\ell \otimes S_m$, the elements of $\mathbb{F}_2^M$ have natural matrix representations, and can also be interpreted as sets since the coefficient of any particular monomial $\alpha \in M$ is either 0 or 1.
Figure 6: Conceptual diagram depicting the manner by which logical operators can be loaded into and out of a BB LDPC code. In Subsection 9.1 we derive that there are two blocks of logical Pauli operators corresponding to a 2D grid. Some subset of these grid elements can be chosen as logical qubits (large dots) and the other elements correspond to various Pauli products (small dots). In Subsection 9.2 we show that there are fault tolerant logical gates based on automorphisms that translate the grid of operators within each block, and in Subsection 9.3 we give a fault tolerant gate based on a ZX-duality [62] that swaps the two blocks. Finally, in Subsection 9.4 we show that there exists an ancilla system based on [46] that can measure one logical X operator. We can think of this system as a probe that can access one logical qubit. Together, these operations allow external access of every logical qubit and many of their products. [Figure labels: Primed Block; ZX Duality Gate: exchange blocks; Ancilla Systems: measure X and Z; Automorphism gates: translate grid; Logical Qubit X and Z; Redundant Logical X and Z.]
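For $P, Q \in \mathbb{F}_2^M$, we can consider the set of qubits $q(L, \alpha)$ for $\alpha \in P$ and $q(R, \beta)$ for $\beta \in Q$. We write $X(P, Q)$ to denote a Pauli matrix acting as X on this collection of qubits, and identity elsewhere. Similarly, $Z(P, Q)$ denotes Z acting on $q(L, \alpha)$ for $\alpha \in P$ and $q(R, \beta)$ for $\beta \in Q$. For example, we can recall that $q(L, \beta)$ is connected to $q(X, \alpha)$ whenever $\beta \in A\alpha$, and see that the stabilizer corresponding to $q(X, \alpha)$ becomes $X(\alpha A, \alpha B)$. Similarly, the stabilizer corresponding to $q(Z, \alpha)$ can be written as $Z(\alpha B^T, \alpha A^T)$. There is also the following useful fact:
Lemma 2. $X(P, Q)$ anticommutes with $Z(\bar{P}, \bar{Q})$ if and only if $1 \in P\bar{P}^T + Q\bar{Q}^T$.
Proof. Write $P = \sum_{\alpha \in M} p_\alpha \alpha$, $\bar{P} = \sum_{\alpha \in M} \bar{p}_\alpha \alpha$, $Q = \sum_{\alpha \in M} q_\alpha \alpha$ and $\bar{Q} = \sum_{\alpha \in M} \bar{q}_\alpha \alpha$, where $p_\alpha, \bar{p}_\alpha, q_\alpha, \bar{q}_\alpha \in \mathbb{F}_2$ are coefficients. Pauli operators $X(P, Q)$ and $Z(\bar{P}, \bar{Q})$ overlap on a qubit $q(L, \alpha)$ iff $p_\alpha \bar{p}_\alpha = 1$ and overlap on a qubit $q(R, \alpha)$ iff $q_\alpha \bar{q}_\alpha = 1$. Thus $X(P, Q)$ and $Z(\bar{P}, \bar{Q})$ anti-commute iff $\sum_{\alpha \in M} (p_\alpha \bar{p}_\alpha + q_\alpha \bar{q}_\alpha)$ is odd.
We have $P\bar{P}^T = \sum_{\alpha \in M} p_\alpha \bar{p}_\alpha \cdot 1 + \ldots$ and $Q\bar{Q}^T = \sum_{\alpha \in M} q_\alpha \bar{q}_\alpha \cdot 1 + \ldots$, where dots represent all monomials different from 1. By linearity, $P\bar{P}^T + Q\bar{Q}^T = \sum_{\alpha \in M} (p_\alpha \bar{p}_\alpha + q_\alpha \bar{q}_\alpha) \cdot 1 + \ldots$. Thus $X(P, Q)$ and $Z(\bar{P}, \bar{Q})$ anti-commute iff $P\bar{P}^T + Q\bar{Q}^T$ contains the monomial 1.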
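The criterion of Lemma 2, read as "the monomial 1 appears in $P\bar{P}^T + Q\bar{Q}^T$", can be checked numerically against a direct count of overlapping qubits. The sketch below stores a polynomial as a set of exponent pairs $(i, j)$ for $x^i y^j$; the group orders `ELL`, `EM` and all helper names are assumptions made for illustration.

```python
import random
from itertools import product

ELL, EM = 12, 6  # assumed orders: x^ELL = y^EM = 1

def mul(P, Q):
    """Product of two polynomials over F_2, each a set of (i, j) exponent pairs."""
    out = set()
    for (a, b), (c, d) in product(P, Q):
        out ^= {((a + c) % ELL, (b + d) % EM)}  # toggling adds coefficients mod 2
    return out

def transpose(P):
    """Transpose of a monomial is its inverse, so negate all exponents."""
    return {((-a) % ELL, (-b) % EM) for (a, b) in P}

def anticommute_lemma(P, Q, Pb, Qb):
    """Polynomial criterion: does P*Pb^T + Q*Qb^T contain the monomial 1?"""
    return (0, 0) in (mul(P, transpose(Pb)) ^ mul(Q, transpose(Qb)))

def anticommute_overlap(P, Q, Pb, Qb):
    """Direct criterion: X(P,Q) and Z(Pb,Qb) overlap on an odd number of qubits."""
    return (len(P & Pb) + len(Q & Qb)) % 2 == 1

def random_poly(rng, density=0.2):
    return {(a, b) for a in range(ELL) for b in range(EM) if rng.random() < density}

rng = random.Random(0)
for _ in range(20):
    P, Q, Pb, Qb = (random_poly(rng) for _ in range(4))
    assert anticommute_lemma(P, Q, Pb, Qb) == anticommute_overlap(P, Q, Pb, Qb)
```

The set representation mirrors the remark that elements of the polynomial ring can be interpreted as sets of monomials.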
Without loss of generality, we can express logical Pauli matrices as either $X(P, Q)$ or $Z(Q^T, P^T)$. The operator $X(P, Q)$ commutes with the stabilizer $Z(\alpha B^T, \alpha A^T)$ whenever $1 \notin P(\alpha B^T)^T + Q(\alpha A^T)^T = \alpha^T (PB + QA)$. This is equivalent to $\alpha \notin PB + QA$. Since we must have $\alpha \notin PB + QA$ for all $\alpha$, we see that $X(P, Q)$ commutes with the stabilizer whenever $PB + QA$ vanishes. Similarly we can derive that $Z(Q^T, P^T)$ commutes with the stabilizer when $PB + QA = 0$.
We aim to construct a family of solutions to $PB + QA = 0$ which give rise to a basis of logical qubits defined by a set of operators $\{\bar{X}_1, \bar{X}_2, \ldots, \bar{Z}_1, \bar{Z}_2, \ldots\}$ with the correct commutation relations. To do so, let us make some observations about Pauli operators defined via solutions to $PB + QA = 0$. First, if $P, Q$ are a solution, then so are $\alpha P, \alpha Q$ for any $\alpha \in M$, so each $P, Q$ immediately gives rise to a family of $|M| = \ell m$ logical operators for both X and Z. Second, consider using the same $P, Q$ to define both $X(\alpha P, \alpha Q)$ and $Z(\beta Q^T, \beta P^T)$. Then these operators always commute because $\beta \alpha^T \in PQ + QP = 0$ never holds. So we require at least two solutions to $PB + QA = 0$ to define a set of operators with nontrivial commutation relations.
For reasons described later in Subsection 9.4, we would like a logical X operator with no support on $q(R)$. To this end, we select $f, g, h \in \mathbb{F}_2^M$ that satisfy $Bf = 0$ and $gB + hA = 0$, yielding two solutions to the equation $PB + QA = 0$ with $(P, Q) = (f, 0)$ and $(P, Q) = (g, h)$. These yield the following family of logical operators for all $\alpha \in M$:
$\bar{X}_\alpha := X(\alpha f, 0)$, $\bar{Z}_\alpha := Z(\alpha h^T, \alpha g^T)$, $\bar{X}'_\alpha := X(\alpha g, \alpha h)$, $\bar{Z}'_\alpha := Z(0, \alpha f^T)$. (16)
For all $\alpha, \beta$, we see that $\bar{X}_\alpha, \bar{Z}'_\beta$ always commute because $f \cdot 0^T + 0 \cdot f^T = 0$, and $\bar{X}'_\beta, \bar{Z}_\alpha$ always commute because $gh + hg = 0$. Furthermore, $\bar{X}_\alpha, \bar{Z}_\beta$ and $\bar{X}'_\alpha, \bar{Z}'_\beta$ form anticommuting pairs when $\alpha^T \beta \in fh$. We see that we have constructed two independent blocks of logical operators with symmetrical structure. It follows that each of these blocks must contain a set of operators that define $k/2$ qubits. We name these the "unprimed" and "primed" logical blocks with $\bar{X}_\alpha, \bar{Z}_\beta$ and $\bar{X}'_\alpha, \bar{Z}'_\beta$ respectively. Not all choices of $f, g, h$ span all $k$ logical qubits, but valid choices are readily enumerated in software. Solutions to [...] to the stabilizer. We find all codes in Table 3 admit several such choices of $f, g, h$. In Table 7 we show a particularly favorable choice of $f, g, h$ for the [[144, 12, 12]] code where the resulting logical operators have minimum weight.
Table 7: Choices of polynomials $f, g, h$ such that $\bar{X}_\alpha := X(\alpha f, 0)$ and $\bar{Z}_\alpha = Z(\alpha h^T, \alpha g^T)$ as defined in equation 16 are minimum-weight logical Pauli operators. The dots represent the monomials of the form $x^i y^j$ with coefficient 1. If we let $\{n_i\} = \{1, y, x^2 y, x^2 y^5, x^3 y^2, x^4\}$ and $\{m_i\} = \{y, y^5, xy, 1, x^4, x^5 y^2\}$, then $\bar{X}_{n_i}, \bar{Z}_{m_j}$ anticommute exactly when $i = j$. When used to construct an ancilla system as in Section 9.4, these polynomials give a system with 60 qubits per layer.
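Candidate polynomials satisfying $Bf = 0$ and $PB + QA = 0$ can be enumerated by linear algebra over $\mathbb{F}_2$. The sketch below is a minimal illustration, not the search used for Table 7. It assumes $\ell = 12$, $m = 6$, $A = x^3 + y + y^2$ and $B = y^3 + x + x^2$ for the [[144, 12, 12]] code (these polynomials are not restated in this section), stores polynomials as integer bitmasks, and uses hypothetical helper names. The returned solutions include products of X stabilizers, which the sketch does not filter out.

```python
ELL, EM = 12, 6                   # assumed [[144,12,12]] parameters
N = ELL * EM
A = [(3, 0), (0, 1), (0, 2)]      # assumed: A = x^3 + y + y^2
B = [(0, 3), (1, 0), (2, 0)]      # assumed: B = y^3 + x + x^2

def bit(a, b):
    """Bitmask of the monomial x^a y^b."""
    return 1 << ((a % ELL) * EM + (b % EM))

def mul_mask(mask, poly):
    """Multiply the polynomial stored in `mask` by `poly` over F_2."""
    out = 0
    for idx in range(N):
        if (mask >> idx) & 1:
            a, b = divmod(idx, EM)
            for c, d in poly:
                out ^= bit(a + c, b + d)
    return out

def kernel_gf2(images):
    """Kernel basis of the linear map sending basis vector i to images[i]."""
    pivots, kernel = {}, []
    for i, v in enumerate(images):
        t = 1 << i                      # invariant: the map sends t to v
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = (v, t)
                break
            pv, pt = pivots[lead]
            v ^= pv
            t ^= pt
        else:
            kernel.append(t)            # v reduced to zero: t lies in the kernel
    return kernel

# Solutions of B f = 0 (candidates for f).
ker_B = kernel_gf2([mul_mask(1 << i, B) for i in range(N)])

# Solutions of P B + Q A = 0; low N bits encode P, high N bits encode Q.
images = [mul_mask(1 << i, B) for i in range(N)] + [mul_mask(1 << i, A) for i in range(N)]
ker_PQ = kernel_gf2(images)

def split(t):
    return t & ((1 << N) - 1), t >> N

print(len(ker_B), len(ker_PQ))
```

Selecting minimum-weight representatives, and discarding solutions equivalent to stabilizers, would require an additional search on top of these kernels.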
To identify logical qubits we can enumerate choices of monomials $\{n_1, n_2, \ldots, n_{k/2}\}$ and $\{m_1, m_2, \ldots, m_{k/2}\}$ such that $n_i^T m_j \in fh$ exactly when $i = j$. That way, $\bar{X}_{n_i}, \bar{Z}_{m_i}$ as well as $\bar{X}'_{n_i}, \bar{Z}'_{m_i}$ for $i = 1, \ldots, k/2$ form a set of $k$ logical qubits: $\bar{X}_{n_i}$ anticommutes with $\bar{Z}_{m_j}$ exactly when $i = j$. A brute force search readily finds choices of $\{n_i\}, \{m_i\}$.

Logical Gates based on Automorphisms
An automorphism of an error correction code is a permutation of the physical qubits that is equivalent to a permutation of the checks (more generally, an automorphism can map a check operator to a product of check operators). We focus on permutations that are implementable using fault tolerant circuits within the connectivity already required for syndrome measurements.
The existing connectivity admits some natural fault tolerant circuits implementing a particular family of permutations on the data qubits. BB LDPC codes feature two data registers $q(L)$, $q(R)$ and two check registers $q(X)$, $q(Z)$. We consider circuits that transfer the qubits from the data registers to the ancilla registers, and back again on a different path. The adjacency matrices describing the connectivity between the data and the ancilla registers are given by $A$ and $B$, which are sums of three monomials $A_1, A_2, A_3$ and $B_1, B_2, B_3$ in $M$, respectively. Each monomial is a permutation and thus describes a vertex-disjoint set of edges between the data and ancilla block. Hence, all swaps along these edges can be parallelized. In a single circuit we can either swap along the edges defined by $A$, which are $q(L) \leftrightarrow q(X)$ and $q(R) \leftrightarrow q(Z)$, or along edges defined by $B$, which are $q(L) \leftrightarrow q(Z)$ and $q(R) \leftrightarrow q(X)$. See also Figure 4 A).
The monomial defining the particular set of edges in each of these sets of swaps can be chosen independently for each stage of the permutation (data → ancilla or ancilla → data), and on each side of the Tanner graph. For example, we can select any $A_j, A_k, A_{j'}, A_{k'}$ and move $q(L) \xrightarrow{A_j^T} q(X) \xrightarrow{A_k} q(L)$ and simultaneously move $q(R) \xrightarrow{A_{k'}} q(Z) \xrightarrow{A_{j'}^T} q(R)$. However, we will see later that it is necessary to select $A_j = A_{j'}$ and $A_k = A_{k'}$. Furthermore, these swaps admit a standard optimization: if we initialize the check registers $q(X)$, $q(Z)$ to the $|0\rangle$ state, then circuits implementing these permutations have CNOT depth four. If we also reset qubits to the $|0\rangle$ state in between the swaps wherever possible, we obtain circuits whose errors cannot propagate between physical qubits, and are hence fault tolerant. See Table 8.
We now verify that the permutations implemented by the circuits described above are indeed automorphisms. After having applied an 'A' type permutation based on $A_j, A_k$, the qubits are permuted by $q(L, \alpha) \leftrightarrow q(L, A_k^T A_j \alpha)$ and $q(R, \alpha) \leftrightarrow q(R, A_k^T A_j \alpha)$. We see that this transforms a Pauli matrix by $X(P, Q) \to X(A_j A_k^T P, A_j A_k^T Q)$. Consequently, the stabilizers are transformed as $X(\alpha A, \alpha B) \to X(\alpha A_j A_k^T A, \alpha A_j A_k^T B)$, which is the same as permuting the X checks by $\alpha \to \alpha A_j A_k^T$. The Z stabilizers are also permuted by $\alpha \to \alpha A_j A_k^T$, so the described circuit indeed implements an automorphism. Notice also that this only works because the $q(L)$ and $q(R)$ blocks were transformed by the same $A_j A_k^T$. The 'B' type permutations can be verified to be automorphisms in the same manner, permuting the checks by some $B_j B_k^T$.
These automorphisms allow us to fault tolerantly implement a subgroup of the Clifford gates. As we saw in Lemma 3, shifts of the form $A_j A_k^T$ or $B_j B_k^T$ generate the entire group $M$ whenever the Tanner graph is connected. Therefore, by leveraging these permutations as generators, we can perform all translations of the tori containing $q(L)$, $q(R)$ using fault tolerant circuits of varying depth. An automorphism defined by an element $s \in M$ transforms $\bar{X}_{n_i} \to \bar{X}_{s n_i}$, $\bar{Z}_{m_i} \to \bar{Z}_{s m_i}$ and similarly for the primed logical Pauli matrices. This capability is critical for addressing all logical qubits.
Table 8: Circuits implementing automorphisms of a BB LDPC code within the connectivity already present for syndrome checks. These circuits are fault tolerant and have CNOT depth four. If $s = A_j A_k^T$ or $s = B_j B_k^T$, then the logical gate implemented by these automorphisms performs the transformation $\bar{X}_\alpha, \bar{Z}_\alpha, \bar{X}'_\alpha, \bar{Z}'_\alpha \to \bar{X}_{s\alpha}, \bar{Z}_{s\alpha}, \bar{X}'_{s\alpha}, \bar{Z}'_{s\alpha}$.
'A' type automorphism based on any A_j, A_k:
    for α ∈ M do
        InitZ q(X, α)
        CNOT q(L, A_j α) q(X, α)
        CNOT q(X, α) q(L, A_j α)
        InitZ q(L, A_k^T α)
        CNOT q(X, α) q(L, A_k α)
        CNOT q(L, A_k α) q(X, α)
    end for
    for α ∈ M do
        InitZ q(Z, α)
        CNOT q(R, α) q(Z, A_j α)
        CNOT q(Z, A_j α) q(R, α)
        InitZ q(R, A_k α)
        CNOT q(Z, A_k α) q(R, α)
        CNOT q(R, α) q(Z, A_k α)
    end for
'B' type automorphism based on any B_j, B_k:
    for α ∈ M do
        InitZ q(X, α)
        CNOT q(R, B_j α) q(X, α)
        CNOT q(X, α) q(R, B_j α)
        InitZ q(R, B_k^T α)
        CNOT q(X, α) q(R, B_k α)
        CNOT q(R, B_k α) q(X, α)
    end for
    for α ∈ M do
        InitZ q(Z, α)
        CNOT q(L, α) q(Z, B_j α)
        CNOT q(Z, B_j α) q(L, α)
        InitZ q(L, B_k α)
        CNOT q(Z, B_k α) q(L, α)
        CNOT q(L, α) q(Z, B_k α)
    end for
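The statement that the shifts $A_j A_k^T$ and $B_j B_k^T$ generate all of $M$ can be checked directly for a given code. The sketch below assumes the [[144, 12, 12]] parameters $\ell = 12$, $m = 6$, $A = x^3 + y + y^2$, $B = y^3 + x + x^2$ (not restated in this section) and represents a monomial $x^a y^b$ as the pair $(a, b)$; the function names are hypothetical.

```python
ELL, EM = 12, 6                   # assumed [[144,12,12]] parameters
A = [(3, 0), (0, 1), (0, 2)]      # assumed: A = x^3 + y + y^2
B = [(0, 3), (1, 0), (2, 0)]      # assumed: B = y^3 + x + x^2

def ratios(terms):
    """All shifts T_j T_k^T with j != k, as exponent pairs."""
    return {((a - c) % ELL, (b - d) % EM)
            for (a, b) in terms for (c, d) in terms if (a, b) != (c, d)}

def generated_subgroup(gens):
    """Closure of the identity under repeated multiplication by the generators."""
    seen = {(0, 0)}
    frontier = [(0, 0)]
    while frontier:
        a, b = frontier.pop()
        for c, d in gens:
            nxt = ((a + c) % ELL, (b + d) % EM)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

shifts = ratios(A) | ratios(B)
print(len(generated_subgroup(shifts)), "of", ELL * EM, "translations reachable")
```

Because $M$ is finite, closure under multiplication by the generators already yields a subgroup, so no explicit inverses are needed.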
We can also comment on the nature of these operations as logical gates, although they are less useful in this sense. There is one such operation per element in $M$, and since $M$ is Abelian the subgroup of Clifford gates implemented by these automorphisms must be Abelian as well. A transformation of this form cannot act like the logical identity, so all of these gates (except $s = 1$) are nontrivial. Since automorphism operations take X to X and Z to Z, they must be logical CNOT circuits up to a logical Pauli correction. While it is not clear how to use these CNOT circuits to facilitate useful computations, they may make for interesting test cases in an implementation.

Accessing the Primed Block via a ZX-duality
A ZX-duality is a permutation of the data qubits that preserves the stabilizer group, except that it turns X checks into Z checks and Z checks into X checks. A physical circuit implementing this permutation and then applying Hadamard to all data qubits always acts as a logical gate [62]. In this section we focus on a particular ZX-duality that is present in all BB LDPC codes and has applications for readout, and we derive a general method for constructing fault tolerant circuits that implement it. We leave discovery and implementation of other ZX-dualities for future work. While the circuits from this construction are generally quite expensive, they may be amenable to further optimization and can be used sparingly in practice.
Consider a permutation of data qubits that swaps $q(L, \alpha)$ with $q(R, \alpha^T)$ for all $\alpha \in M$. A check qubit $q(X, \beta)$ which previously implemented the stabilizer $X(\beta A, \beta B)$ now is connected to the qubits $q(L, (\beta B)^T)$ and $q(R, (\beta A)^T)$ instead, corresponding to the check $Z(\beta^T B^T, \beta^T A^T)$. We see that this permutation switches the stabilizer implemented by $q(X, \beta)$ with the stabilizer implemented by $q(Z, \beta^T)$, so this permutation is indeed a ZX-duality.
Figure 7: Diagrams for the description of the implementation of the ZX-duality permutation. A) A subgraph of the Tanner graph providing enough connectivity to fault tolerantly swap $q(L, \alpha)$ and $q(L, A_j A_i^T \alpha)$. B) A sequence of shifts of the data on the qubits in the $q(L)$ block that performs the desired exchange without interacting qubits directly. A naive implementation of this sequence has CNOT depth 12. C) D) Decomposition of the generators of $M$ via the classification of finite Abelian groups for two different codes. Drawing the Cayley graph of the subgroup for each generator reveals the ratios defining pairs of qubits that must be exchanged to implement the permutation $q(L, \alpha) \leftrightarrow q(L, \alpha^T)$. [Figure labels: for the [[90,8,10]] code, $x^{15} = y^3 = 1$, $p^3 = q^5 = r^3 = 1$, $x = pq$, $y = r$; for the [[144,12,12]] code, $x^{12} = y^6 = 1$, $p^4 = q^3 = r^2 = s^3 = 1$, $x = pq$, $y = rs$.]
We can also see that implementing this permutation and applying Hadamard to all qubits takes logical Pauli matrices to logical Pauli matrices. In particular, the operation swaps $\bar{X}_\alpha = X(\alpha f, 0)$ with $\bar{Z}'_{\alpha^T} = Z(0, \alpha^T f^T)$, as well as $\bar{Z}_\alpha = Z(\alpha h^T, \alpha g^T)$ with $\bar{X}'_{\alpha^T} := X(\alpha^T g, \alpha^T h)$. This operation swaps the primed and unprimed logical blocks, transposes the grid of operators, and applies logical Hadamard to all qubits. Since we can measure logical X for all qubits in the unprimed block using the ancilla system described in Subsection 9.4, we can use this operation to measure logical Z for qubits in the primed block.
For the rest of this section we describe a fault tolerant method for implementing this operation. We begin with exchanging $q(L)$ and $q(R)$: since these blocks are connected by pairs of edges in $q(X)$ and $q(Z)$, for any $A_i \in A$ and $B_j \in B$ there exists a loop connecting the qubits $q(L, \alpha) \to q(X, A_i^T \alpha) \to q(R, B_j A_i^T \alpha) \to q(Z, B_j \alpha) \to q(L, \alpha)$. A circuit identical in shape to those in Table 8 hence performs a fault tolerant exchange of $q(L, \alpha)$ and $q(R, B_j A_i^T \alpha)$ for all $\alpha$. The additional shift of $B_j A_i^T$ can be removed via an additional automorphism gate. It remains to exchange $q(L, \alpha) \leftrightarrow q(L, \alpha^T)$, as well as $q(R, \alpha) \leftrightarrow q(R, \alpha^T)$ for all $\alpha$, which is significantly more complicated. We focus on $q(L, \alpha) \leftrightarrow q(L, \alpha^T)$ in our discussion, but it will be clear that the exact same transformations are implementable on $q(R)$ in parallel with those on $q(L)$.
Fault tolerant implementation of the permutation $q(L, \alpha) \leftrightarrow q(L, \alpha^T)$ can be achieved using a more sophisticated version of the fault tolerant circuits in Table 8 used for implementing automorphisms. These circuits relied on the existence of a connected loop of alternating check and data qubits, enabling a short depth fault tolerant circuit implementing a cyclic permutation of the data qubits therein. The same connectivity can be leveraged to implement a fault tolerant nearest neighbor swap of two data qubits connected by a check qubit. The fault tolerance of these circuits relies on the same principle: while a swap gate acting on two qubits containing data is not fault tolerant, moving a data qubit onto a blank qubit is. Figure 7 A) shows a subgraph of the Tanner graph consisting of several connected qubits in the $q(L)$ and $q(X)$ block, and Figure 7 B) shows a sequence of operations where two data qubits can be exchanged without ever interacting directly. This gives us the following capability: whenever the circuits in Table 8 can implement the cyclic permutation $q(L, \alpha) \to q(L, s\alpha)$ for all $\alpha$, there also exists a circuit that can swap $q(L, \alpha)$ and $q(L, s\alpha)$ for a particular $\alpha$. Matching circuits exist for $q(R)$, and can be implemented simultaneously.
To decompose $q(L, \alpha) \leftrightarrow q(L, \alpha^T)$ into a sequence of swaps, it will be helpful to consider the group structure of $M$. Consider for example the [[90, 8, 10]] code with $x^{15} = y^3 = 1$. Following the classification of finite Abelian groups we see that $M \cong \mathbb{Z}_3 \times \mathbb{Z}_5 \times \mathbb{Z}_3$. We can re-express elements of $M$ using generators $p, q, r$ with $p^3 = q^5 = r^3 = 1$ where $x = pq$ and $y = r$. Transforming $\alpha$ to $\alpha^T$ amounts to decomposing $\alpha$ as $\alpha = p^i q^j r^k$ and exchanging the qubit with $\alpha^T = p^{-i} q^{-j} r^{-k}$. This exchange $p^i q^j r^k \leftrightarrow p^{-i} q^{-j} r^{-k}$ can be split into a sequence of swaps that are implementable with the method described above using Figure 7 A) and B). It suffices to be able to exchange for any $\alpha = q^j r^k$ the qubits $q(L, p^i \alpha) \leftrightarrow q(L, \alpha p^{-i})$, as well as for any $\alpha = p^i r^k$ the qubits $q(L, q^j \alpha) \leftrightarrow q(L, \alpha q^{-j})$, and finally for any $\alpha = p^i q^j$ the qubits $q(L, r^k \alpha) \leftrightarrow q(L, \alpha r^{-k})$. This, for any $i, j, k$, creates a sequence of qubits $q(L, p^i q^j r^k) \leftrightarrow q(L, p^{-i} q^j r^k) \leftrightarrow q(L, p^{-i} q^{-j} r^k) \leftrightarrow q(L, p^{-i} q^{-j} r^{-k})$ where swaps are possible along each nearest neighbor. This is sufficient for swapping the first and last qubit in the chain. The implementation of swaps for the individual generators, like $q(L, \alpha q^i) \leftrightarrow q(L, \alpha q^{-i})$, may also involve additional intermediate qubits, but this only lengthens the chain and does not prohibit implementation.
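The chain of intermediate qubits for the [[90, 8, 10]] example can be written out explicitly. The sketch below converts $x^a y^b$ into exponents $(i, j, k)$ of $p, q, r$ using $x = pq$ and $y = r$, inverts one generator at a time, and converts back via the Chinese remainder theorem. The function names are hypothetical.

```python
def to_pqr(a, b):
    """x^a y^b -> exponents (i, j, k) of p, q, r, using x = p q and y = r."""
    return (a % 3, a % 5, b % 3)

def from_pqr(i, j, k):
    """Inverse map: find a mod 15 with a = i (mod 3) and a = j (mod 5)."""
    a = next(t for t in range(15) if t % 3 == i % 3 and t % 5 == j % 5)
    return (a, k % 3)

def transpose_chain(a, b):
    """Labels (a, b) of the qubits q(L, x^a y^b) visited when inverting p, then q, then r."""
    i, j, k = to_pqr(a, b)
    steps = [(i, j, k), (-i, j, k), (-i, -j, k), (-i, -j, -k)]
    return [from_pqr(*s) for s in steps]

# Example: alpha = x^7 y^2.
print(transpose_chain(7, 2))
```

Consecutive entries of the chain coincide whenever the corresponding exponent is zero (or, more generally, equal to its own negative), in which case that step requires no swap.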
The resources required for swapping $q(L, p^i \alpha) \leftrightarrow q(L, \alpha p^{-i})$ where $p^3 = 1$, and similarly for other generators, depend on the order of the generator $p$ as well as the ratios $A_i A_j^T$ that can be formed using terms $A_i, A_j \in A$ or similar ratios from $B$. See Figure 7 C). Plotting the Cayley graph of the cyclic subgroup spanned by $p$ immediately reveals that since $p$ is of order three, only a single ratio $B_i B_j^T = p$ is needed in order to swap any qubits marked $p^1$ and $p^2$, while leaving $p^0$ qubits in place. Indeed, since $B = 1 + x^2 + x^7 = 1 + p^2 q^2 + p q^2$ we can implement $p = (p^2 q^2)(p q^2)^T$ in a single layer of transforms in Figure 7 B).
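The arithmetic behind the ratio $p = (p^2 q^2)(p q^2)^T$ is a short modular computation. The sketch below uses $x = pq$ with $p^3 = q^5 = 1$, so that $x^a$ has exponents $(a \bmod 3, a \bmod 5)$; the helper name is hypothetical.

```python
def pq_exponents(a):
    """Exponents (i, j) with x^a = p^i q^j, using x = p q, p^3 = q^5 = 1."""
    return (a % 3, a % 5)

B_exponents = [0, 2, 7]            # B = 1 + x^2 + x^7

# Each term of B written in terms of p and q.
print([pq_exponents(a) for a in B_exponents])

# The ratio x^2 * (x^7)^T = x^(2 - 7), reduced modulo 15.
ratio = (2 - 7) % 15
print(ratio, pq_exponents(ratio))
```

The same helper can be applied to every pair of terms in $A$ or $B$ to list which generator powers are available as single ratios.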
The exchange $q(L, \alpha q^i) \leftrightarrow q(L, \alpha q^{-i})$ with $q^5 = 1$ requires two such ratios $q$ and $q^2$, the minimal depth expression of which demands the chaining together of two such transforms each. In other codes, like the [[144, 12, 12]] code, we encounter generators $p, q, r, s$ of order $p^4 = q^3 = r^2 = s^3 = 1$. Elements of order two like $r$ require no swaps at all, and elements of order four like $p$ can be implemented either using the ratio $p^2$ or just $p$, as shown in Figure 7 D). Numerical searches can quickly compute the most efficient decompositions of the required swaps. We give the orders of the generators, the ratios defining the required swaps, and the number of transforms required to implement them in Table 9.
We emphasize that the swap $q(L, p^i \alpha) \leftrightarrow q(L, \alpha p^{-i})$ can be performed for all $\alpha = q^j r^k$ simultaneously in parallel. This stems from the structure of the exchange circuit in Figure 7 B). This circuit performs the swap $q(L, \alpha) \leftrightarrow q(L, A_j A_i^T \cdot \alpha)$ while using the qubit $q(L, A_k A_i^T \cdot \alpha)$ as scratch space. However, we may simultaneously want to swap the data held on $q(L, A_k A_i^T \cdot \alpha)$ as part of a different exchange. This is possible since the first step of the exchange circuit in Figure 7 B) is to move the data marked 'A' away from the qubit holding it, just as if it were a piece of data marked 'C' for a different exchange.
For clarity, we compute the total depth of the circuit for the [[144, 12, 12]] code without any further optimization. The ratios $p, q, s$ can be implemented using two ratios each via $p = x^{-3} y \cdot y^{-2} y$, $q = y^{-3} x^2 \cdot y^{-3} x^2$ and $s = y^{-2} y \cdot y^{-2} y$. This results in a chain $q(L, \alpha) \leftrightarrow q(L, \alpha') \leftrightarrow q(L, \alpha'') \ldots \leftrightarrow q(L, \alpha^T)$ of length six (counting the number of $\leftrightarrow$s). We can swap the qubits at the ends of a chain of length $n$ using $2n - 1$ many nearest neighbor swaps. Each swap circuit of the form Figure 7 B) can be implemented in CNOT depth twelve, resulting in CNOT depth $(2 \cdot 6 - 1) \cdot 12 = 132$ to implement $q(L, \alpha) \leftrightarrow q(L, \alpha^T)$.
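The count of $2n - 1$ nearest neighbor swaps can be reproduced with a small simulation. The sketch below moves the first element of a chain to the far end and then moves the former last element back to the front; the swap schedule and the function names are illustrative assumptions, and the depth of twelve per swap is taken from the text.

```python
def swap_ends(items):
    """Exchange the two ends of a chain using only adjacent swaps; return (result, count)."""
    items = list(items)
    n = len(items) - 1                 # chain length = number of links
    count = 0
    for i in range(n):                 # carry the first element to the end
        items[i], items[i + 1] = items[i + 1], items[i]
        count += 1
    for i in range(n - 2, -1, -1):     # carry the former last element to the front
        items[i], items[i + 1] = items[i + 1], items[i]
        count += 1
    return items, count

def zx_duality_depth(chain_length, swap_depth=12):
    """CNOT depth of (2n - 1) sequential nearest neighbor swaps."""
    return (2 * chain_length - 1) * swap_depth

chain = list(range(7))                 # seven qubits, i.e. a chain of length six
result, swaps = swap_ends(chain)
print(result, swaps, zx_duality_depth(6))
```

All intermediate qubits return to their original positions, so only the two ends are exchanged.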
Despite its fault tolerance, the implementation of this logical operation is clearly significantly more expensive than that of the automorphisms. Since the intermediate permutations corresponding to each of the generators $p, q, r$ are not ZX dualities in general, it will not be possible in general to perform error correction during this long operation. However, the capability provided by this operation may be worth its significant overhead, since it grants us access to the primed block of qubits, effectively doubling the storage capacity of the code. This operation can also be used significantly more sparingly than the automorphism gates, and may be amenable to additional optimization. Alternatively, additional connections and qubits beyond those necessary for the Tanner graph could be introduced to more directly implement the ZX-duality, though it is likely this will sacrifice the thickness-2 property.

Logical Measurements
In this section we describe how to leverage methods from [46] to implement fault-tolerant measurements of the operators $\bar{X}_1 = X(f, 0)$ and $\bar{Z}_1 = Z(h^T, g^T)$. As described above, this capability suffices to measure X and Z for all logical qubits. We can also use this technique to measure various Pauli product operators by measuring $\bar{X}_\alpha, \bar{Z}_\alpha, \bar{X}'_\alpha, \bar{Z}'_\alpha$ for $\alpha$ not corresponding to logical qubits.
The measurement is facilitated by an ancilla system that extends the Tanner graph of the original code. The code defined by this extended Tanner graph contains the logical operator of interest as a stabilizer, enabling its fault tolerant measurement. A sketch of the structure of this ancilla system is given in Figure 1 C). For the logical operator $X(f, 0)$, we consider a subgraph of the Tanner graph consisting of $q(L, f)$ as well as $q(Z, \alpha)$ operators corresponding to checks with support on $q(L, f)$. Similarly for the logical operator $Z(h^T, g^T)$ we consider a subgraph consisting of $q(L, h^T)$, $q(R, g^T)$ as well as $q(X, \alpha)$ for the relevant $\alpha$. These subgraphs are copied several times and are connected together as shown in the figure: we call the resulting construction an ancilla system. With enough copies, the code defined by the extended Tanner graph has the same distance as the original code.
Furthermore, the ancilla system for measuring $X(f, 0)$ can be connected to another quantum error correction code, such as a surface code. This enables a joint XX measurement between a surface code qubit and any qubit within the BB code. A subsequent measurement of $Z(h^T, g^T)$ and some additional Pauli corrections then achieves a quantum teleportation circuit.
The main challenge of implementing these ancilla systems, in addition to minimizing their size, is to show that the extended Tanner graph satisfies the thickness-2 constraint. If our goal is to leverage the $X(f, 0)$ ancilla system to perform a Pauli product measurement with a surface code qubit, then arguably a thickness-2 extension of the Tanner graph does not suffice since there is no obvious way of connecting it to the surface code qubit as in Figure 1 C). To this end, we show how to make the subgraph corresponding to the $X(f, 0)$ ancilla system "effectively planar": while the graph has thickness-2, the planar graph in one plane consists entirely of connected components with two vertices. Given this property of the embedding of the $X(f, 0)$ ancilla system, a connection between this system and a surface code may be facilitated by a construction that is thickness-2 overall.
An effectively planar embedding of the ancilla system relies on the fact that the logical operator $X(f, 0)$ has no support on the $q(R)$ block. An implementation of more general logical operators is possible, but would require a graph that renders many ancilla qubits inaccessible from the outside.
We briefly give a self-contained description for the construction of the ancilla system from [46], following their notation. Suppose we are interested in measuring a logical operator $\bar{X}$ that is supported on some set of qubits $V_X$. Then, let $C_X$ be the collection of Pauli-Z checks that have support on any of the $V_X$. If we view these as sets of vertices in a Tanner graph, and let $E_X$ contain the edges between $V_X$ and $C_X$, then $G_X := (V_X, C_X, E_X)$ forms a subgraph of the Tanner graph of the BB code.
The ancilla system is constructed out of copies of 'primal layers' isomorphic to $G_X$, and 'dual layers' isomorphic to $G_X^T := (V_X^T, C_X^T, E_X^T)$ defined as follows: each $v \in V_X$ has a corresponding $v^T \in C_X^T$, each $c \in C_X$ has a corresponding $c^T \in V_X^T$, and each edge $(v, c) \in E_X$ has a corresponding $(v^T, c^T) \in E_X^T$. For some parameter $r$, the final Tanner graph is that of the BB code, plus $r$ additional copies of the dual graph labeled $G_X^T[j]$ for $1 \le j \le r$, and $r - 1$ additional copies of the primal graph labeled $G_X[j]$ for $2 \le j \le r$. We regard the $G_X$ within the original code as $G_X[1]$. We also add additional connections between $G_X[j]$ and $G_X^T[j]$ for $j \le r$, as well as $G_X^T[j]$ and $G_X[j + 1]$ for $j < r$: specifically, we connect the associated pairs of $v, v^T$ and $c, c^T$.
It is shown by [46] that the resulting Tanner graph defines an error correction code of distance $d$ when $r = d$. We construct two such ancilla systems: one for $\bar{X} := X(f, 0)$ and one for $\bar{Z} := Z(h^T, g^T)$. Table 7 shows a choice of $f, g, h$ for the [[144, 12, 12]] code, defining $X(f, 0)$ and $Z(h^T, g^T)$ such that these operators are all minimum weight, and define $G_X$ and $G_Z$ with 30 qubits each. To achieve $d = 12$ we hence require $2 \times 30 \times (2d - 1) = 1380$ additional qubits. We suspect that significantly more efficient variations of this construction are possible, but leave their development for future work.
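The layered construction and its qubit count can be sketched as follows. The toy graph, the node labels and the function name are assumptions made for illustration; only the layer structure (r dual layers, r − 1 added primal layers, vertex-to-vertex connections between neighboring layers) is taken from the description above.

```python
def build_ancilla_system(V, C, E, r):
    """Return (added_nodes, edges) for r dual layers and r - 1 added primal layers.

    Nodes are labeled (kind, j, u); ("primal", 1, u) refers to G_X inside the
    original code and is therefore not counted among the added nodes.
    """
    layer = list(V) + list(C)
    added_nodes, edges = [], []
    for j in range(1, r + 1):
        # Dual layer j: a copy of G_X with the roles of qubits and checks exchanged.
        added_nodes += [("dual", j, u) for u in layer]
        edges += [(("dual", j, v), ("dual", j, c)) for (v, c) in E]
        # Connect G_X[j] to G_X^T[j] vertex by vertex.
        edges += [(("primal", j, u), ("dual", j, u)) for u in layer]
        if j < r:
            # Primal layer j + 1, connected to dual layer j.
            added_nodes += [("primal", j + 1, u) for u in layer]
            edges += [(("primal", j + 1, v), ("primal", j + 1, c)) for (v, c) in E]
            edges += [(("dual", j, u), ("primal", j + 1, u)) for u in layer]
    return added_nodes, edges

# Toy G_X (an assumption): two qubits sharing one check.
V, C, E = ["v0", "v1"], ["c0"], [("v0", "c0"), ("v1", "c0")]
nodes, edges = build_ancilla_system(V, C, E, r=3)
print(len(nodes), len(edges))

# Qubit count for two systems with 30 vertices per layer and r = d = 12.
d, layer_size = 12, 30
print(2 * layer_size * (2 * d - 1))
```

Each added layer contributes one physical qubit per vertex of $G_X$, regardless of whether that vertex plays the role of a data qubit or a check in that layer.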
The construction presented above is complicated by the fact that vertices in the Tanner graph take on alternating roles in each layer: in the primal layers the vertices $v$ are physical qubits, whereas in the dual layers the $v^T$ are checks. However, for the purposes of giving a thickness-2 decomposition we need not concern ourselves with this. If we do not distinguish between checks and physical qubits, then the primal layers $G_X$ and dual layers $G_X^T$ have isomorphic Tanner graphs. Hence, for the purposes of the following, we view all layers as identical.
We now show why a thickness-2 embedding of the $Z(h^T, g^T)$ ancilla system, and an effectively planar embedding of the $X(f, 0)$ ancilla system, is possible. This argument is best understood in reference to Figure 8. We begin by understanding the thickness-2 decomposition of each layer of the ancilla systems, leveraging Figure 3. In Figure 8 A), we can see that $G_Z$ for the $Z(h^T, g^T)$ system decomposes into 'hairy rings' in both the 'A' plane and the 'B' plane since it has no support on $q(Z)$. $G_X$ for the $X(f, 0)$ system is a collection of connected pairs in the 'A' plane and a collection of rings in the 'B' plane, since it has no support on $q(X)$.
Since $G_X, G_Z$ are subgraphs of the BB code's Tanner graph, and its Tanner graph has thickness-2, and since $G_X, G_Z$ and $G_X^T, G_Z^T$ are isomorphic if we do not distinguish between qubits and checks, we see that each layer of the ancilla construction must be thickness-2 individually. The main challenge is to show that the connections between the layers can be facilitated without introducing any crossings.
Figure 8 B) shows how to connect several layers of the two ancilla systems to both the wheel graphs of the BB code, and also an ancillary surface code. We arrange the wheels of the BB code such that $q(X)$, $q(L)$ are on the inside of the 'A' wheels, and that $q(X)$, $q(R)$ are on the inside of the 'B' wheels. $G_Z$ and $G_Z^T$ of the $Z(h^T, g^T)$ system can be repeatedly nested inside of the wheels of the BB code. The $q(L)$ qubits can be connected together on the 'A' plane, and the $q(R)$ and $q(X)$ qubits can be connected on the 'B' plane. As for $G_X$ and $G_X^T$ for the $X(f, 0)$ system, the rings in the 'B' plane can be wrapped around the wheels of the BB code, which already allows connection of the required $q(L)$ and $q(Z)$ qubits. This leaves the pairs of connected qubits in the 'A' plane completely free of any connections between the layers, making them available to be connected to a surface code system.
We have considered just two ancilla systems here for measuring $X(f, 0)$ and $Z(h^T, g^T)$. However, using additional ancilla systems (especially if their size can be reduced), or equipping these two ancilla systems with additional connections to the $X(g, h)$ and $Z(0, f^T)$ logical operators, are potential ways to eliminate the need for the error-prone ZX-duality from Subsection 9.3 while still accessing all logical qubits. On the other hand, it is not clear that either approach would preserve the thickness-2 property.

Conclusion
In summary, we offered a new perspective on how a fault-tolerant quantum memory could be realized using near-term quantum processors with a small qubit overhead. Our approach complements a concatenation-based scheme by Pattison, Krishna, and Preskill [63] where each data qubit of a high-rate LDPC code is additionally encoded by the surface code. Although the concatenation approach makes use of the high error threshold of the surface code and its geometric locality to address quantum hardware limitations such as a relatively high noise rate and limited qubit connectivity, the additional surface code encoding incurs a significant qubit overhead, partially negating the advantages offered by LDPC codes. Here we have shown that the concatenation step can be avoided by introducing examples of high-rate LDPC codes which have nearly the same error threshold as the surface code itself. Although these LDPC codes are not geometrically local, the qubit connectivity required for syndrome measurements is described by a thickness-two graph which can be implemented using two planar degree-3 layers of qubit couplers. This is a valid architectural solution for platforms based on superconducting qubits. Numerical simulations performed for the circuit-based noise model indicate that the proposed LDPC codes compare favorably with the surface code in the practically relevant range of error rates p ≥ 0.1%, offering the same level of error suppression with nearly 15x reduction in the qubit overhead.
The key hardware challenges to enable the new codes with superconducting qubits are: 1. the development of a low-loss second layer, 2. the development of qubits that can be coupled to 7 connections (6 buses and 1 control line), and 3. the development of long-range couplers.
These are all difficult to solve but not impossible. For the first challenge, we can imagine a small change to the packaging [53] which was developed for the IBM Quantum Eagle processor [64]. The simplest would be to place the extra buses on the opposite side of the qubit chip. This would require the development of high-Q through-substrate vias (TSVs), which would be part of the coupling buses and as such would require intensive microwave simulation to make sure these TSVs could support microwave propagation while not introducing large unwanted crosstalk.
The second challenge is an extension of the number of couplers from the heavy hex lattice arrangement [65], which is 4 (3 couplers and 1 control), to 7. The implication of this is that the cross-resonance gate, which has been the core gate used in large quantum systems for the past few years, would not be the path forward. This is because the qubits in the cross-resonance gate are not tunable, and as such, for a large device with a large number of connections, the probability of energy collisions (not just the qubit levels but also higher levels of the transmon) tends to one very quickly [66]. The collisions arise from the frequency requirements for the gate to work properly combined with intrinsic device variability, which is fundamental to Josephson junction fabrication. However, with the tunable coupler [67,68], which was used in the IBM Quantum Egret and is now being developed for the IBM Quantum Heron, this problem no longer exists as the qubits can be designed to be further apart. This new gate is also similar to the gates used by Google Quantum AI [69], which have shown that a square lattice arrangement is possible. Extending the coupling map to 7 connections will require significant microwave modeling; however, typical transmons have about 60 fF of capacitance and each gate is around 5 fF to get the appropriate coupling strengths to the buses, so it is fundamentally possible to develop this coupling map without changing the properties of the transmon qubits, which have been shown to have larger coherence and are stable.
The final challenge is the most difficult. For the buses that are short enough so that the fundamental mode can be used, the standard circuit QED model holds. However, to demonstrate the 144-qubit code some of the buses will be long enough that we will require frequency engineering. One way to achieve this is with filtering resonators, and a proof of principle experiment was demonstrated in Ref. [70].
Our work leaves several open questions concerning BB LDPC codes and their applications.
5. How much would the error threshold change for a noise biased towards measurement errors? Note that measurements are the dominant source of noise for superconducting qubits. Since the considered BB codes have a highly redundant set of check operators, one may expect that they offer extra protection against measurement errors.
6. The general-purpose BP-OSD decoder used here may not be fast enough to perform error correction in real time. Is there a faster decoder making use of the special structure of BB codes?
7. How to apply logical gates? While our work gives a fault-tolerant implementation of certain logical gates, these gates offer very limited computational power and are primarily useful for implementing memory capabilities.

Figure 1: A) Tanner graph of a surface code, for comparison. C) Tanner graph of a Bivariate Bicycle code with parameters [[144, 12, 12]] embedded into a torus. Any edge of the Tanner graph connects a data and a check vertex. Data qubits associated with the registers q(L) and q(R) are shown by bLue and oRange circles. Each vertex has six incident edges including four short-range edges (pointing north, south, east, and west) and two long-range edges. Only a few of the long-range edges are shown to avoid clutter. Dashed and solid edges indicate two planar subgraphs spanning the Tanner graph, see Section 4. B) Sketch of a Tanner graph extension for measuring Z and X following [46]. The ancilla corresponding to the X measurement can be connected to a surface code, enabling load-store operations for all logical qubits via quantum teleportation and some logical unitaries. This extended Tanner graph has a thickness-2 implementation, see Section 9.

Figure 4: A) "Compass" diagram that shows the direction in which matrices A, B are applied to travel between different nodes. B) The unit cell of the construction of a toric layout in the proof of Lemma 4.

$\xi_j$ subject to $\eta^T \xi = 1$. (11)
Then $d(\eta) \ge d$ for any logical operator $X(\eta)$ and $d(\eta) = d$ if $X(\eta)$ anti-commutes with some minimum-weight logical operator $Z(\xi)$. The latter event occurs with probability 1/2 if one picks $\eta \in \ker(H_Z)$ uniformly at random. In this case $d(\eta) = d$ with probability at least 1/2 and $d(\eta) \ge d$ with certainty. Let $d_{BP}(\eta)$ be an upper bound on $d(\eta)$

$Bf = 0$ and $gB + hA = 0$ correspond to null spaces of B and B A respectively, which can be constructed by Gaussian elimination. Gaussian elimination can also be used to check if the operators $X_\alpha$, $Z_\alpha$, $X'_\alpha$, $Z'_\alpha$ together span k qubits up
Polynomials for Logical Pauli Matrices in the [[144, 12, 12]]
Tanner Graph of the [[144, 12, 12]] Bivariate Bicycle Code
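The null-space computations mentioned above can be illustrated with a minimal sketch of Gaussian elimination over the binary field. The function name `gf2_nullspace` and the toy matrix in the usage note are assumptions made for illustration; they are not the matrices A, B of any particular code, and this is not necessarily how the original computation was organized.

```python
import numpy as np

def gf2_nullspace(M):
    """Return rows spanning {v : M v = 0 (mod 2)} using Gaussian elimination."""
    M = np.array(M, dtype=np.uint8) % 2
    rows, cols = M.shape
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for c in range(cols):
        if r == rows:
            break
        hits = np.nonzero(M[r:, c])[0]
        if hits.size == 0:
            continue  # no pivot in this column: it is a free variable
        p = r + hits[0]
        M[[r, p]] = M[[p, r]]  # move the pivot row into position r
        for i in range(rows):
            if i != r and M[i, c]:
                M[i] ^= M[r]  # clear column c in every other row
        pivots.append(c)
        r += 1
    free = [c for c in range(cols) if c not in pivots]
    basis = []
    for f in free:
        v = np.zeros(cols, dtype=np.uint8)
        v[f] = 1
        for i, pc in enumerate(pivots):
            v[pc] = M[i, f]  # pivot variable is fixed by the free one
        basis.append(v)
    return np.array(basis, dtype=np.uint8).reshape(len(basis), cols)
```

For a toy input such as `[[1, 1, 0], [0, 1, 1]]` the returned basis consists of vectors that the matrix maps to zero modulo 2. The same routine applied to a binary matrix representing multiplication by B would give candidate solutions of $Bf = 0$.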

Table 1: Small examples of Bivariate Bicycle LDPC codes and their performance for the circuit-based noise model. All codes have weight-6 checks, thickness-2 Tanner graph, and a depth-7 syndrome measurement circuit. A code with parameters [[n, k, d]] requires 2n physical qubits in total and achieves the net encoding rate $r = k/2n$ (we round r down to the nearest inverse integer). Circuit-level distance $d_{\mathrm{circ}}$ is the minimum number of faulty operations in the syndrome measurement circuit required to generate an undetectable logical error. The pseudo-threshold $p_0$ is a solution of the break-even equation $p_L(p) = kp$, where $p$ and $p_L$ are the physical and logical error rates respectively. The logical error rate $p_L$ was computed numerically for $p \ge 10^{-3}$ and extrapolated to lower error rates.
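As a hedged illustration of the break-even equation $p_L(p) = kp$, the sketch below solves it by bisection, assuming the fitted form $p_L(p) = p^{d_{\mathrm{circ}}/2} e^{c_0 + c_1 p + c_2 p^2}$ used for the extrapolation. The function names, the bracket [1e-6, 0.5], and the coefficient values in the usage note are placeholders chosen for illustration; they are not fitted values from this work.

```python
import math

def log_pL(p, d_circ, c0, c1, c2):
    """Log of the fitted logical error rate p^(d_circ/2) * exp(c0 + c1*p + c2*p^2)."""
    return 0.5 * d_circ * math.log(p) + c0 + c1 * p + c2 * p * p

def pseudo_threshold(k, d_circ, c0, c1, c2, lo=1e-6, hi=0.5, iters=200):
    """Bisection on log p_L(p) - log(k p); assumes one sign change in [lo, hi]."""
    def g(p):
        return log_pL(p, d_circ, c0, c1, c2) - math.log(k * p)
    if g(lo) > 0 or g(hi) < 0:
        raise ValueError("break-even point is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits rates spanning decades
        if g(mid) < 0:
            lo = mid  # below break-even: encoded qubits still win
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Placeholder inputs (hypothetical, not taken from any table in this work).
p0 = pseudo_threshold(k=12, d_circ=10, c0=20.0, c1=0.0, c2=0.0)
```

Working with logarithms avoids underflow when $p^{d_{\mathrm{circ}}/2}$ becomes very small, which matters once the fit is extrapolated well below $p = 10^{-3}$.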
A) Logical vs physical error rate for small examples of Bivariate Bicycle LDPC codes. A numerical estimate of $p_L$ (diamonds) was obtained by simulating d syndrome cycles for a distance-d code. Most of the data points have error bars ≈ $p_L/10$ due to sampling errors. B) Comparison between the Bivariate Bicycle LDPC code [[144, 12, 12]] and surface codes with 12 logical qubits and distance d ∈ {9, 11, 13, 15}. The distance-d surface code with 12 logical qubits has length $n = 12d^2$ since each logical qubit is encoded into a separate d×d patch of the surface code lattice.
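The qubit counts behind this comparison follow from two formulas stated in the captions: a BB code with parameters [[n, k, d]] uses 2n physical qubits with net rate $r = k/2n$ rounded down to the nearest inverse integer, and the surface code with 12 logical qubits has length $n = 12d^2$. The sketch below evaluates both; the helper names are hypothetical, and the surface code figure is the code length only, as given in the caption.

```python
from fractions import Fraction

def bb_total_qubits(n):
    """Total physical qubits (data plus check) for a length-n BB code."""
    return 2 * n

def bb_net_rate(n, k):
    """Net rate k/(2n), rounded down to the nearest inverse integer 1/t."""
    t = -(-2 * n // k)  # ceiling of 2n/k using integer arithmetic
    return Fraction(1, t)

def surface_code_length(d, k=12):
    """Length of k separate d x d surface code patches."""
    return k * d * d

bb_qubits = bb_total_qubits(144)
bb_rate = bb_net_rate(144, 12)
surface_lengths = {d: surface_code_length(d) for d in (9, 11, 13, 15)}
```

Exact rational arithmetic via `Fraction` keeps the rounding rule explicit: the rate is reported as 1/t with t the smallest integer satisfying 1/t ≤ k/2n.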

Table 2: Notations for linear spaces associated with a binary matrix H. Here the linear span, orthogonality, and dimension are computed over the binary field $\mathbb{F}_2 = \{0, 1\}$. If H has size $s \times n$ then $\mathrm{rs}(H) \subseteq \mathbb{F}_2^n$
$+\, y^{10} + y^{17}$, $y^5 + x^3 + x^{19}$

Table 3: Small examples of Bivariate Bicycle LDPC codes and their parameters. All codes have weight-6 checks, thickness-2 Tanner graph, and a depth-7 syndrome measurement circuit. Code distance was computed by the mixed integer programming approach of Ref. [55]. Notation ≤ d indicates that only an upper bound on the code distance is known at the time of this writing. We round r down to the nearest inverse integer. The codes have check matrices $H_X = [A|B]$ and $H_Z = [B^T|A^T]$ with A and B defined in the last two columns. The matrices x, y obey $x^\ell = y^m = 1$ and $xy = yx$.
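The check matrices described in this caption can be assembled in a few lines. The sketch below assumes the realization $x = S_\ell \otimes I_m$ and $y = I_\ell \otimes S_m$ with $S$ a cyclic shift matrix, which satisfies $x^\ell = y^m = 1$ and $xy = yx$ but is not spelled out in the caption. The values of ℓ, m and the two polynomials are illustrative inputs, not entries read from the table, and the function names are hypothetical.

```python
import numpy as np

def shift(n):
    """n x n cyclic shift (permutation) matrix."""
    return np.roll(np.eye(n, dtype=int), 1, axis=1)

def bb_check_matrices(ell, m, a_terms, b_terms):
    """Build H_X = [A|B] and H_Z = [B^T|A^T]; each term (i, j) means x^i y^j."""
    x = np.kron(shift(ell), np.eye(m, dtype=int))
    y = np.kron(np.eye(ell, dtype=int), shift(m))

    def poly(terms):
        total = np.zeros((ell * m, ell * m), dtype=int)
        for i, j in terms:
            total += np.linalg.matrix_power(x, i) @ np.linalg.matrix_power(y, j)
        return total % 2  # coefficients live in the binary field

    A, B = poly(a_terms), poly(b_terms)
    HX = np.hstack([A, B])
    HZ = np.hstack([B.T, A.T])
    return HX, HZ

# Illustrative input (assumed): A = x^3 + y + y^2, B = y^3 + x + x^2, ell = 12, m = 6.
HX, HZ = bb_check_matrices(12, 6, [(3, 0), (0, 1), (0, 2)], [(0, 3), (1, 0), (2, 0)])
commute = not ((HX @ HZ.T) % 2).any()
```

The X and Z checks commute because $H_X H_Z^T = AB + BA$, which vanishes modulo 2 whenever A and B commute, as any two polynomials in x and y do.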

Table 6: Parameters $c_0, c_1, c_2$ in the fitting formula $p_L(p) = p^{d_{\mathrm{circ}}/2} e^{c_0 + c_1 p + c_2 p^2}$ for BB LDPC codes shown in Table