Very low overhead fault-tolerant magic state preparation using redundant ancilla encoding and flag qubits

Fault-tolerant quantum computing promises significant computational speedup over classical computing for a variety of important problems. One of the biggest challenges for realizing fault-tolerant quantum computing is preparing magic states with sufficiently low error rates. Magic state distillation is one of the most efficient schemes for preparing high-quality magic states. However, since magic state distillation circuits are not fault-tolerant, all the operations in the distillation circuits must be encoded in a large distance error-correcting code, resulting in a significant resource overhead. Here, we propose a fault-tolerant scheme for directly preparing high-quality magic states, which makes magic state distillation unnecessary. In particular, we introduce a concept that we call redundant ancilla encoding. The latter combined with flag qubits allows for circuits to both measure stabilizer generators of some code, while also being able to measure global operators to fault-tolerantly prepare magic states, all using nearest neighbor interactions. We apply such schemes to a planar architecture of the triangular color code family and demonstrate that our scheme requires at least an order of magnitude fewer qubits and space–time overhead compared to the most competitive magic state distillation schemes. Since our scheme requires only nearest-neighbor interactions in a planar architecture, it is suitable for various quantum computing platforms currently under development.

$g^{(3)}_X, \cdots, g^{(9)}_X$ and $g^{(3)}_Z, \cdots, g^{(9)}_Z$, as well as the four weight-2 stabilizers of Eq. (2), the first of which is $X_{11}X_{15}$. We define $S_{st}$ to be the group generated by the operators in Eqs. (1) and (2). Note that $|S_t\rangle$ can be prepared by first preparing all qubits in the $|0\rangle$ state and then measuring only the X-type generators in $S_{st}$. As such, we obtain the expression for $|S_t\rangle$ in Eq. (3). One can readily check that $|S_t\rangle$ is stabilized by all 18 stabilizers in Eqs. (1) and (2), as desired.
Once $|S_t\rangle$ is prepared, we measure the four missing stabilizer generators of the distance-5 triangular color code given in Eq. (4), the first of which is $g^{(1)}_X = X_5X_8X_{11}X_{12}$. These operators are the white plaquettes shown in Supplementary Fig. 1a. We define $S_{b1}$ to be the group generated by the operators in Eq. (4). Initially, the system is in the state of Eq. (5). After measuring the X-type stabilizers of $S_{b1}$, we obtain the state of Eq. (6), conditioned on obtaining the outcomes $((-1)^{m_1}, (-1)^{m_2})$, where $m_1, m_2 \in \{0, 1\}$, when measuring $g^{(1)}_X$ and $g^{(2)}_X$. In any case, we can always convert this state to the one with $m_1 = m_2 = 0$ by applying the correction operators of Eq. (7), since $g^{(1)}_Z$ anti-commutes with $g^{(1)}_X$ and $g^{(2)}_X$, and $g^{(2)}_Z$ anti-commutes with $g^{(2)}_X$. Note that the correction operators in Eq. (7) can be determined by implementing MWPM on the matching graph $G^{(5)}_{1x}$ shown in Fig. 7b in the main paper. Thus, after the correction, we are always left with the state of Eq. (8), where we used $\alpha|0\rangle + \beta|1\rangle = (\alpha I + \beta X_1)|0\rangle$.

Supplementary Figure 1: a An input physical state $\alpha|0\rangle + \beta|1\rangle$ is grown to a logical state $\alpha|0\rangle_{d=5} + \beta|1\rangle_{d=5}$ encoded in the distance-5 triangular color code. b An input logical state $\alpha|0\rangle_{d=3} + \beta|1\rangle_{d=3}$ encoded in the distance-3 triangular color code is grown to a logical state $\alpha|0\rangle_{d=7} + \beta|1\rangle_{d=7}$ encoded in the distance-7 triangular color code.

Next, we measure the two Z-type stabilizers of $S_{b1}$. After the measurement, the state becomes that of Eq. (9), conditioned on the measurement outcomes $((-1)^{m_1}, (-1)^{m_2})$. Performing the same steps as above, this state can be converted to the state $|\psi^{[00]}_2\rangle$ with $m_1 = m_2 = 0$ by applying an appropriate correction operator, which is determined by implementing MWPM on the corresponding matching graph. The resulting state is given in Eq. (10), where $|0\rangle_{d=5}$ is the logical zero state of the distance-5 triangular color code. Since $\overline{X} = X_1X_3X_5X_{11}X_{15}$ is the logical X operator of the distance-5 triangular color code, we can conclude from Eqs. (11) and (12) that the output state $|\psi^{[00]}_2\rangle$ is the desired state $\alpha|0\rangle + \beta|1\rangle$ encoded in the $d = 5$ triangular color code.

We now move on to the growing scheme shown in Supplementary Fig. 1b, which converts an input state $\alpha|0\rangle_{d=3} + \beta|1\rangle_{d=3}$ (encoded in the $d = 3$ triangular color code) into a logical state $\alpha|0\rangle_{d=7} + \beta|1\rangle_{d=7}$ encoded in the $d = 7$ triangular color code. As in the previous scheme, we first prepare a stabilizer state $|S_t\rangle$ that is stabilized by 24 of the 36 generators of the $d = 7$ triangular color code, as well as by the 6 weight-2 stabilizers given in Eq. (13). To initiate the growing scheme, we measure the 6 stabilizers of the $d = 7$ triangular color code given in Eq. (14), which are represented by the white plaquettes of Supplementary Fig. 1b. Note that the operators in Eq. (14) are stabilizers of neither the stabilizer state $|S_t\rangle$ nor the $d = 3$ triangular color code (which is being merged with $|S_t\rangle$).
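The generator counts quoted for the two growing schemes can be cross-checked with a short bookkeeping sketch. It assumes $n_d = (3d^2 + 1)/4$ data qubits for the distance-$d$ triangular color code (a single qubit for an unencoded input), and that the generators of the output code split into those of the input code, those shared with the stabilizer state, and the newly measured ones. The function and argument names are illustrative only.

```python
def data_qubits(d):
    """Data qubits of the distance-d triangular color code, n_d = (3 d^2 + 1) / 4.

    d = 1 is treated as a single unencoded physical qubit.
    """
    if d < 1 or d % 2 == 0:
        raise ValueError("distance must be a positive odd integer")
    return (3 * d * d + 1) // 4


def growing_counts(d_in, d_out, measured):
    """Generator bookkeeping for growing a distance-d_in state to distance d_out.

    `measured` is the number of output-code generators that are measured to
    merge the input with the stabilizer state (the white plaquettes).
    """
    n_in, n_out = data_qubits(d_in), data_qubits(d_out)
    new_qubits = n_out - n_in            # qubits supporting the stabilizer state
    out_generators = n_out - 1           # one logical qubit is encoded
    in_generators = n_in - 1
    # Output-code generators that already stabilize the stabilizer state.
    shared = out_generators - in_generators - measured
    # The stabilizer state needs one generator per qubit; the rest are weight-2.
    weight_two = new_qubits - shared
    return {"new_qubits": new_qubits, "shared": shared, "weight_two": weight_two}


print(growing_counts(1, 5, measured=4))
print(growing_counts(3, 7, measured=6))
```

Under these assumptions the sketch gives 14 shared generators and four weight-2 stabilizers for the $1 \to 5$ scheme, and 24 shared generators and six weight-2 stabilizers for the $3 \to 7$ scheme, in agreement with the counts stated in the text.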
Once the stabilizer state is prepared, and prior to measuring the operators in Eq. (14), the system is in the state of Eq. (15). After measuring the three X-type stabilizers in Eq. (14), we obtain the state of Eq. (16), where the values of $m_1, m_2, m_3 \in \{0, 1\}$ depend on the measurement outcomes of the three X-type stabilizers. As in the case where a physical state was grown to an encoded state of the $d = 5$ triangular color code, we can convert any output state to the one with $m_1 = m_2 = m_3 = 0$ by applying the correction operators of Eq. (17). Note that the correction operators of Eq. (17) can be determined by implementing MWPM on the corresponding matching graph, leaving the system in the state of Eq. (18). The final step consists of measuring the three Z-type stabilizers in Eq. (14). The state in Eq. (18) then becomes that of Eq. (19), where the values of $m_1, m_2, m_3 \in \{0, 1\}$ depend on the measurement outcomes of the three Z-type stabilizers. Repeating an analysis similar to that of Eq. (11), we obtain the desired result, namely the logical state $\alpha|0\rangle_{d=7} + \beta|1\rangle_{d=7}$.

Here, we show that in order for the magic state preparation protocol of Fig. 1 in the main paper to be fault-tolerant (see Definition 2 in the main paper), the $H_m$ and $EC$ circuits must come in pairs. To do so, we analyze the $H_m$ circuit given an input error $E_{in}$ (see Supplementary Fig. 2). The input error $E_{in}$ arises from faults which occur during the implementation of the $G^{(1\to d)}$ circuit (see for instance Fig. 7 in the main paper or Supplementary Fig. 1). We illustrate all possible cases for $E_{in}$, and for each case we show the resulting output errors. Note that in what follows, the state $|GHZ\rangle = \frac{1}{\sqrt{2}}(|0\rangle^{\otimes m} + |1\rangle^{\otimes m})$ (where $m$ is the number of ancilla qubits) can be replaced by the single-qubit $|+\rangle$ state without changing the final result. We also write the controlled-Hadamard gate as $C_H$. Lastly, given an error $E$, $s(E)$ denotes the error syndrome of $E$ obtained by measuring all the stabilizer generators of the underlying stabilizer code used to encode the data.
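Case 1: $E_{in} = X$ or $E_{in} = Z$. For $E_{in} = X$, prior to performing the $C_H$ gate, we have $|\psi_1\rangle = |+\rangle \otimes X|H\rangle$. Applying the $C_H$ gate and using the identity $HX = ZH$, $|\psi_1\rangle$ transforms to $|\psi_2\rangle$, which is given by

$|\psi_2\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle X|H\rangle + |1\rangle Z|H\rangle\right) = \frac{1}{2}\left[|+\rangle (X + Z)|H\rangle + |-\rangle (X - Z)|H\rangle\right]$. (23)

Similarly, if $E_{in} = Z$, performing the same steps shows that

$|\psi_2\rangle = \frac{1}{2}\left[|+\rangle (X + Z)|H\rangle - |-\rangle (X - Z)|H\rangle\right]$. (24)

From Eqs. (23) and (24), and using $\frac{1}{\sqrt{2}}(X + Z)|H\rangle = H|H\rangle = |H\rangle$, we see that after performing the X-basis measurement and discarding the ancilla, the final output state is $|\psi_f\rangle = |H\rangle$ if a $+1$ outcome is obtained (which is the desired state), and $|\psi_f\rangle = \frac{X - Z}{\sqrt{2}}|H\rangle$ if a $-1$ outcome is obtained (each occurring with 50% probability).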
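Case 2: $E_{in} = Y$. Prior to performing the $C_H$ gate, we have $|\psi_1\rangle = |+\rangle \otimes Y|H\rangle$. Applying the $C_H$ gate and using the identity $HY = -YH$, we have

$|\psi_2\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right) \otimes Y|H\rangle = |-\rangle \otimes Y|H\rangle$. (26)

Hence, the ancilla measurement outcome will always be $-1$, with the final output state $|\psi_f\rangle = Y|H\rangle$.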
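Cases 1 and 2 can be checked numerically for a single data qubit. The sketch below assumes a single-qubit $|+\rangle$ ancilla, a fault-free $C_H$ gate, and $|H\rangle = \cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$ (the $+1$ eigenstate of the Hadamard gate). The helper names are illustrative and not part of the protocol.

```python
import math

SQ = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
H = [[SQ, SQ], [SQ, -SQ]]
# |H> is the +1 eigenstate of the Hadamard gate.
H_STATE = [math.cos(math.pi / 8), math.sin(math.pi / 8)]


def apply(matrix, vector):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [sum(matrix[r][c] * vector[c] for c in range(2)) for r in range(2)]


def hadamard_measurement(error):
    """Ancilla X-basis measurement statistics for the input state error|H>.

    Returns {outcome: (probability, normalized data state or None)}.
    """
    data = apply(error, H_STATE)   # branch attached to ancilla |0>
    h_data = apply(H, data)        # branch attached to ancilla |1> after C_H
    results = {}
    for outcome in (+1, -1):
        # Projecting the ancilla onto |+> or |-> leaves (data +/- H data) / 2.
        branch = [(a + outcome * b) / 2 for a, b in zip(data, h_data)]
        prob = sum(abs(a) ** 2 for a in branch)
        state = [a / math.sqrt(prob) for a in branch] if prob > 1e-12 else None
        results[outcome] = (prob, state)
    return results


def overlap(u, v):
    """Absolute value of the inner product <u|v>."""
    return abs(sum(complex(a).conjugate() * b for a, b in zip(u, v)))


for name, error in (("X", X), ("Z", Z), ("Y", Y)):
    probs = {k: round(p, 6) for k, (p, _) in hadamard_measurement(error).items()}
    print(name, probs)
```

For input errors $X$ and $Z$ both outcomes occur with probability 1/2 and the $+1$ branch returns $|H\rangle$, whereas for an input $Y$ error only the $-1$ outcome occurs and the data is left in $Y|H\rangle$.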
In this case, we assume that $s(E') \neq 0$, where $0$ is the all-zeros bit string of length $n - 1$ (where $n$ is the number of data qubits of the underlying stabilizer code encoding the data). Further, we define $\tilde{E}' = HE'H^\dagger$.
Performing a calculation analogous to the one leading to Eq. (26), we find that, for either ancilla measurement outcome, the errors afflicting the output state are detectable errors.
Again, we assume that $s(E') \neq 0$. Performing the same calculations as above for $E_{in} = E'X$, we find Eq. (28). Again, the ancilla measurement outcomes will be $\pm 1$, each occurring with 50% probability. A $+1$ outcome yields the output state $|\psi_f\rangle = \frac{1}{\sqrt{2}}(E'X + \tilde{E}'Z)|H\rangle$, and a $-1$ outcome yields the same expression with a relative minus sign between the two terms. In both cases, the errors afflicting the state $|H\rangle$ will be detected by a fault-free $EC^{(d)}$ circuit. The case where $E_{in} = E'Z$ yields identical output states, up to a global sign.

We will now explain why the $H^{(d)}_m$ and $EC^{(d)}$ circuits need to come in pairs. In the example leading to Eq. (28), if the $+1$ measurement outcome is obtained, there can be a fault resulting in the error $E'$ at the very beginning of the subsequent $EC^{(d)}$ circuit, cancelling the term multiplying $X$ and thus resulting in the error $\frac{1}{\sqrt{2}}(X + E'\tilde{E}'Z)$. As such, there is a 50% chance that a trivial syndrome is obtained when implementing the $EC^{(d)}$ circuit, resulting in the output error $X$. However, by applying the pair of $H^{(d)}_m$ and $EC^{(d)}$ circuits $(d - 1)/2$ times, and given that the $H^{(d)}_m$ and $EC^{(d)}$ circuits are $t$-flag circuits (with $t = (d - 1)/2$), the final output error $E_{final}$ must have $\mathrm{wt}(E_{final}) \le s$ and, as such, must be a correctable error.
We note that in the general case, where a $|GHZ\rangle$ state is used instead of the $|+\rangle$ state, we can replace the $+1$ and $-1$ measurement outcomes with even and odd parity measurement outcomes, and the same conclusions follow.

SUPPLEMENTARY NOTE 3: TWIRLING APPROXIMATION
Here, we show that a non-Pauli error after a $T^\dagger$ or $T$ gate can be converted via a noise twirling operation into an incoherent mixture of Pauli errors. To be clear, we do not propose to physically perform the noise twirling after the $T^\dagger$ and $T$ gates as part of the protocol, as this can reduce the performance of our scheme. Instead, we aim to show that the approximations made in our numerical simulations are justified. Suppose, for instance, that there is an input $Z$ error to a $T$ gate. Recall that the input Pauli error $Z$ is converted through the $T$ gate into the non-Pauli error $H = \frac{1}{\sqrt{2}}(X + Z)$. We turn this non-Pauli error into an incoherent mixture of Pauli errors by applying $Y(\theta)$ and $Y(-\theta)$ before and after the noisy $T$ gate, where the rotation angle $\theta$ is drawn uniformly from the range $[0, 2\pi]$. Since conjugation by a $Y$ rotation rotates operators within the $X$-$Z$ plane, we have $Y(-\theta) H Y(\theta) = \cos\phi\, X + \sin\phi\, Z$, where the angle $\phi$ depends linearly on $\theta$ and sweeps the full circle as $\theta$ ranges over $[0, 2\pi]$. The averages of $\cos^2\phi$ and $\sin^2\phi$ over $\theta$ are both $1/2$, while the average of $\cos\phi \sin\phi$ vanishes, and thus

$\frac{1}{2\pi}\int_0^{2\pi} d\theta\, (\cos\phi\, X + \sin\phi\, Z)\, \rho\, (\cos\phi\, X + \sin\phi\, Z) = \frac{1}{2} X \rho X + \frac{1}{2} Z \rho Z$.

That is, the output Hadamard error $H = \frac{1}{\sqrt{2}}(X + Z)$ is converted via the noise twirling into an incoherent mixture of the Pauli $X$ and $Z$ errors, each with 50% probability (see Supplementary Fig. 3). The same reasoning holds for any output error $\cos\varphi\, X + \sin\varphi\, Z$ with $\varphi \in [0, 2\pi]$. On the other hand, since a Pauli $Y$ error commutes with the $T$ gate and the $Y(\theta)$ gates, it is unaffected by the noise twirling and remains a Pauli $Y$ error.
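The twirling statement can be verified numerically. The following sketch assumes the convention $Y(\theta) = \exp(-i\theta Y/2)$ and replaces the uniform average over $\theta$ by an average over an equally spaced grid, which is exact here because only low harmonics of $\theta$ appear. All function names are illustrative.

```python
import math

SQ = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[SQ, SQ], [SQ, -SQ]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[c][r]).conjugate() for c in range(2)] for r in range(2)]


def y_rotation(theta):
    """Y(theta) = exp(-i theta Y / 2); this sign convention is an assumption."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def conjugate(rho, op):
    """Return op rho op^dagger."""
    return matmul(matmul(op, rho), dagger(op))


def twirled_channel(rho, error, samples=16):
    """Average of E(theta) rho E(theta)^dagger, with E(theta) = Y(-theta) error Y(theta)."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        e = matmul(matmul(y_rotation(-theta), error), y_rotation(theta))
        term = conjugate(rho, e)
        total = [[total[r][c] + term[r][c] / samples for c in range(2)] for r in range(2)]
    return total


def pauli_mixture(rho):
    """Equal mixture of Pauli X and Pauli Z errors acting on rho."""
    x_term, z_term = conjugate(rho, X), conjugate(rho, Z)
    return [[0.5 * (x_term[r][c] + z_term[r][c]) for c in range(2)] for r in range(2)]


def max_difference(a, b):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(a[r][c] - b[r][c]) for r in range(2) for c in range(2))


rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]
print(max_difference(twirled_channel(rho, H), pauli_mixture(rho)))
```

The printed deviation is at the level of floating-point round-off, for the Hadamard error as well as for any other error of the form $\cos\varphi\, X + \sin\varphi\, Z$ passed as `error`.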

SUPPLEMENTARY NOTE 4: PREPARING AN H-TYPE MAGIC STATE WITH PHYSICAL STABILIZER OPERATIONS
Here, we discuss more details on the simulation of non-Clifford gates. Note that an $X$ error propagating through the control qubit of a controlled-Hadamard gate results in a Hadamard error applied to the data (see Supplementary Fig. 4). Therefore, an $X \otimes X$ error on the first CNOT between the ancillas $|+\rangle$ and $|0\rangle$ in the circuit implementing $H^{(d)}_m$ results in the error $H^{\otimes n}$ (which acts trivially on $|H\rangle$) without any flag qubits flagging (see Supplementary Fig. 5 for an illustration). However, since the controlled-Hadamard is decomposed as in Fig. 5b in the main paper, an $X$ error on the control qubit of the controlled-$Z$ gate results in a $Z$ error on the data. Therefore, if we propagate $Z^{\otimes n}$ (arising from the $X \otimes X$ error at the CNOT mentioned above) through the $T$ gates as described above, the output will not be a benign error. As such, prior to the application of the $T$ gates, let $E_Z$ be the $Z$ component of the data qubit errors. For instance, if the data qubit errors are $E = Z \otimes X \otimes Y$, then $E_Z = Z \otimes I \otimes Z$. The $Z$ component that is propagated through the $T$ gates is chosen to be whichever of $E_Z$ and $E_Z Z^{\otimes n}$ has the smaller weight, i.e. the one achieving $\min(\mathrm{wt}(E_Z), \mathrm{wt}(E_Z Z^{\otimes n}))$. This prevents a single fault from causing a logical error without any flag qubits flagging in our simulations.

Here, we provide a detailed overhead analysis of our magic state preparation scheme with stabilizer operations encoded in the triangular color code.
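Before turning to the overhead analysis, the rule for selecting the $Z$ component that is propagated through the $T$ gates can be sketched in a few lines. This is a minimal illustration rather than the simulation code itself: Pauli errors are represented as strings over I, X, Y, Z, signs are ignored, and the tie-breaking rule is an assumption.

```python
def z_component(pauli):
    """Z component of a Pauli string, e.g. 'ZXY' -> 'ZIZ' (signs are ignored)."""
    if any(p not in "IXYZ" for p in pauli):
        raise ValueError("expected a string over the alphabet I, X, Y, Z")
    return "".join("Z" if p in "YZ" else "I" for p in pauli)


def weight(pauli):
    """Number of non-identity tensor factors."""
    return sum(p != "I" for p in pauli)


def propagated_z_component(pauli):
    """Lower-weight representative among E_Z and E_Z * Z^(tensor n)."""
    e_z = z_component(pauli)
    # Multiplying by Z on every qubit swaps I and Z at each position.
    e_z_flipped = "".join("I" if p == "Z" else "Z" for p in e_z)
    # Ties keep e_z; this tie-breaking rule is an assumption of the sketch.
    return e_z if weight(e_z) <= weight(e_z_flipped) else e_z_flipped


print(z_component("ZXY"))             # ZIZ
print(propagated_z_component("ZXY"))  # IZI
```

For the example $E = Z \otimes X \otimes Y$ from the text, $E_Z = Z \otimes I \otimes Z$ has weight 2 while $E_Z Z^{\otimes 3} = I \otimes Z \otimes I$ has weight 1, so the latter is the component propagated through the $T$ gates under this rule.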
Recall that to prepare an $|H\rangle_f$ state with some target logical error rate, we require all stabilizer operations to be encoded in a triangular color code of distance $d_2$. First, $|H\rangle_{d_1}$ states are prepared using the scheme described in Fig. 1 in the main paper (with $d_1 \in \{3, 5, 7\}$ and with physical stabilizer operations). The distance $d_1$ is chosen such that the $|H\rangle_{d_1}$ states have smaller logical failure rates than the distance-$d_2$ encoded stabilizer operations. The $|H\rangle_{d_1}$ states are used to implement the logical $T$ gates (see Fig. 5a in the main paper) and for injection in the circuit $\tilde{G}^{(1\to d_f)}$. If $d_1 < d_2$, the $|H\rangle_{d_1}$ states must first be grown into $|H\rangle_{d_2}$ states using the growing circuit $G^{(d_1\to d_2)}$ shown in Fig. 7c in the main paper or in Supplementary Fig. 1b. Note that we did not put a tilde on $G^{(d_1\to d_2)}$ since all of its operations are implemented with physical gates. Using encoded $|H\rangle_{d_2}$ states ensures that stabilizer operations encoded in a distance-$d_2$ triangular color code can be used with the prepared magic states.
Once enough $|H\rangle_{d_2}$ states have been prepared (see below), such states are injected into the circuits of Fig. 5a in the main paper to perform the logical $T$ gates, in addition to being injected in the circuit $\tilde{G}^{(1\to d)}$ (see for instance the circuit used in Fig. 7a in the main paper or Supplementary Fig. 1a, but with distance-$d_2$ encoded stabilizer operations). The final $|H\rangle_f$ state used for computation is then prepared by repeating the same steps as in Fig. 1 in the main paper. In Supplementary Fig. 6, we provide a schematic illustration of the full scheme described above.
To compute the overhead for preparing the state $|H\rangle_f$, we consider the case where all the $T^\dagger$ gates are simultaneously implemented during the second time step of the $H^{(d)}_m$ circuit. We can thus prepare $m_{d_f}$ $|H\rangle_{d_1}$ states, which are used for implementing the $T^\dagger$ gates in addition to injecting one of these states into the circuit $\tilde{G}^{(1\to d)}$. The probability that at least $(3d_f^2 + 1)/4 + 1 = n_{d_f} + 1$ $|H\rangle_{d_1}$ states pass the verification test is given by Eq. (33), where $p^{(d_1)}_{acc}(p)$ is the probability of acceptance for preparing the state $|H\rangle_{d_1}$. An accepted $|H\rangle_{d_1}$ state is then grown into an encoded $|H\rangle_{d_2}$ state, since the Clifford operations are chosen to be encoded in the distance-$d_2$ triangular color code. Since the weight-six stabilizers of the stabilizer state (and all encoded Clifford gates) are obtained from the circuit in Supplementary Fig. 7, the total number of qubits required for the stabilizer state $|S_t\rangle$ is given by Eq. (34), and the number of qubits for each $|H\rangle_{d_1}$ state is given by Eq. (35). Lastly, since the qubits in the implementation for preparing $|H\rangle_f$ using the protocol of Fig. 1 in the main paper are encoded in the triangular color code with distance $d_2$, we require the additional qubits counted in Eq. (36). Hence, the total average number of qubits $n_f$ required to prepare $|H\rangle_f$ is given by Eq. (37), where $P_{A,H_f}(p)$ is the acceptance probability for preparing $|H\rangle_f$ with Clifford operations encoded in a distance-$d_2$ triangular color code. For a fixed value of $p$, $m_{d_f}$ is chosen to minimize Eq. (37).

Supplementary Figure 6: General scheme for fault-tolerantly preparing an encoded magic state. Unlike in Fig. 1 in the main paper, each gate is encoded in a triangular color code of distance $d_2$. Magic states that are encoded in a triangular color code of distance $d_1$ (i.e., $|H\rangle_{d_1}$) are directly prepared by using the scheme in Fig. 1 in the main paper. These magic states are then grown to the distance-$d_2$ triangular color code (i.e., $|H\rangle_{d_2}$) by using the growing scheme in Fig. 7c in the main paper or in Supplementary Fig. 1b. Initially, $m_{d_f} \ge n_{d_f} + 1$ encoded magic states $|H\rangle_{d_2}$ are prepared, where $n_{d_f} \equiv (3d_f^2 + 1)/4$ (see Eq. (33) for the definition of $m_{d_f}$). One of these encoded magic states is further grown to $|H\rangle$ via a growing circuit $\tilde{G}^{(1\to d_f)}$, where each gate is encoded in the distance-$d_2$ color code. The remaining $n_{d_f}$ encoded magic states are used to implement the $n_{d_f}$ encoded $T^\dagger$ gates in the $\tilde{H}_m$ circuit.

Note that we assume that all the qubits used to prepare the $m_{d_f}$ $|H\rangle_{d_1}$ states can be reused to implement the $T$ gates at the end of the $H^{(d)}_m$ circuit. In doing so, it is assumed that the time scale required to prepare the $|H\rangle_{d_2}$ states is less than or equal to the time scale required to implement all the encoded operations prior to applying the $T$ gates at the end of the $\tilde{H}^{(d)}_m$ circuit. Also, the form of the probability factor in the denominator of Eq. (37) arises for the following reasons. First, in the first time step, an extra magic state is used in the $\tilde{G}^{(1\to d_f)}$ circuit (see Supplementary Fig. 6). Second, only $n_{d_f}$ $|H\rangle_{d_2}$ states are required when implementing the sequence of $T$ gates (which are implemented after the $T^\dagger$ gates). Third, the $\tilde{H}^{(d)}_m$ circuit is repeated $(d_f - 1)/2$ times (and each circuit requires the injection of $n_{d_f}$ magic states for both the $T^\dagger$ and $T$ gates).
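The probability that at least $n_{d_f} + 1$ of the $m_{d_f}$ prepared states pass the verification test can be evaluated as a binomial tail. The sketch below assumes that the preparation attempts are independent and share the same acceptance probability; the function names and the numerical values of `m` and `p_acc` in the example are hypothetical and only illustrate the trade-off involved in choosing $m_{d_f}$.

```python
from math import comb


def data_qubits(d):
    """n_d = (3 d^2 + 1) / 4, as used for n_{d_f} in the text."""
    if d < 1 or d % 2 == 0:
        raise ValueError("distance must be a positive odd integer")
    return (3 * d * d + 1) // 4


def prob_enough_accepted(m, d_f, p_acc):
    """Probability that at least n_{d_f} + 1 of m independent attempts are accepted."""
    if not 0.0 <= p_acc <= 1.0:
        raise ValueError("p_acc must be a probability")
    needed = data_qubits(d_f) + 1
    # Binomial tail; the sum is empty (zero) when m < needed.
    return sum(
        comb(m, k) * p_acc**k * (1 - p_acc) ** (m - k)
        for k in range(needed, m + 1)
    )


# Hypothetical example: d_f = 3 needs 8 accepted states.
for m in (8, 10, 12, 16):
    print(m, prob_enough_accepted(m, d_f=3, p_acc=0.8))
```

Preparing more states raises this probability but also increases the qubit count entering the numerator of the average overhead, which is why $m_{d_f}$ is chosen by minimization for each value of $p$.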
If one is willing to significantly increase the time overhead required for preparing all the magic states used in the preparation scheme of $|H\rangle_f$, the number of qubits required to prepare $|H\rangle_f$ can be reduced compared to the requirements given by Eq. (37). In particular, one can repeat the $|H\rangle_{d_1}$ protocol until $n_{d_f}$ magic states are simultaneously accepted and ready to be grown to $|H\rangle_{d_2}$ states. Such qubits are then reused to prepare $|H\rangle_{d_1}$ states prior to implementing the $T^\dagger$ and $T$ gates every time the $H^{(d)}_m$ circuit is repeated. If any of the $|H\rangle_{d_1}$ states do not pass the verification test described in Fig. 1 in the main paper, the protocol is aborted. In this case, the minimum number of qubits required to prepare $|H\rangle_f$ is simply given by Eq. (38).

We now describe how to compute the space-time overhead for preparing $|H\rangle_f$. We first need to consider the space-time overhead of preparing the $m_{d_f}$ $|H\rangle_{d_1}$ states and then growing them to $|H\rangle_{d_2}$ states. The space-time overhead for preparing a single $|H\rangle_{d_1}$ state is obtained in a similar way to Eq. (6) in the main paper and is given by Eq. (39), which depends on the number of time steps of the $H^{(d_1)}_m$ and $EC^{(d_1)}$ circuits. When growing a state $|H\rangle_{d_1}$ to $|H\rangle_{d_2}$, the measurements of the plaquettes are repeated $d_1$ times, with the maximum number of time steps for each round of measurement being $t^{(d_1)}_{EC}$ (which comes from measuring the stabilizers for the state $|H\rangle_{d_1}$ using the circuits of Fig. 2 in the main paper). Therefore, the space-time overhead for growing an $|H\rangle_{d_1}$ state to an $|H\rangle_{d_2}$ state is given by Eq. (40). Since we prepare $m_{d_f}$ such states, the space-time overhead for preparing all of the $|H\rangle_{d_2}$ states which are injected in the $T^\dagger$ gates and the circuit $\tilde{G}^{(1\to d_f)}$ is given by Eq. (41). The last step consists of implementing the $(d_f - 1)/2$ pairs of $\tilde{H}_m$ and $\widetilde{EC}^{(d_f)}$ circuits. Each implementation of an encoded Clifford gate requires $d_2$ rounds of error correction, and an encoded block consists of $(3d_2 - 1)^2/4$ qubits. Furthermore, if $d_2 > 3$, the total number of time steps for one round of error correction is 16.
Thus, the space-time overhead for preparing $|H\rangle_f$ with encoded Clifford gates is given by Eq. (42). Note that in Eq. (42), we choose a large enough time window for the implementation of the circuits $\tilde{G}^{(1\to d_f)}$, $\tilde{H}_m$ [...]. [...] (this is true for gates applied in between the $T^\dagger$ and $T$ gates) and performed using lattice surgery [1][2][3][4], one can perform the appropriate Pauli frame updates to incorporate the correct decoding scheme over the full triangular color code cycle. In our numerical simulations, we pessimistically added an error at each logical Clifford gate location using the failure probabilities obtained in ref. [5], instead of using such probabilities for adding failures over a full distance-$d_2$ triangular color code cycle. Hence, the logical failure rates $p^{(d_f)}_L$ reported in Supplementary Table 1 are upper bounds on the exact values that would be obtained using our scheme. Combining Eqs. (41) and (42), the total space-time overhead for the $|H\rangle_f$ preparation scheme is given by Eq. (43).

Supplementary Table 1: Here, $d_2$ is the triangular color code distance used to encode the logical Clifford gates, $d_f$ is the distance used for the $|H\rangle$ state preparation scheme of Fig. 1 in the main paper, $d_1$ is the distance of $|H\rangle$ prior to being grown into $|H\rangle_{d_2}$, and $p$ is the physical error rate. We provide both the average number of qubits (given by Eq. (37)) and the minimum number of qubits (see Eq. (38)) for preparing $|H\rangle_f$. The space-time overhead is given by Eq. (43).
In Supplementary Table 1 we provide the average number of qubits, minimum number of qubits and space-time overhead required to prepare $|H\rangle_f$ states with logical failure rates $p^{(d_f)}_L < 4 \times 10^{-15}$. To obtain such results, we assume that each Clifford gate encoded in the triangular color code fails according to the logical error rate polynomials obtained in ref. [5] (which we call $p^{(d_2)}_{LC}(p)$). Hence, when preparing the stabilizer state $|S_t\rangle$ and for all encoded Clifford operations, the ancilla qubit layout for the weight-six checks is chosen as in Supplementary Fig. 7 (note that the weight-four checks remain unchanged). Further, the distance $d_1$ is chosen to be the smallest $d_1$ which ensures that $|H\rangle_{d_1}$ has a lower logical error rate than $p^{(d_2)}_{LC}(p)$.

The numerical values in Supplementary Table 1 show that the least costly scheme to prepare the state $|H\rangle_f$ with $p^{(d_f)}_L < 4 \times 10^{-15}$ when $p = 10^{-4}$ is to first prepare the states $|H\rangle_{d_1}$ with $d_1 = 7$, and to grow these states to encoded $d_2 = 11$ states. Using distance $d_2 = 11$ encoded stabilizer operations, the final magic state $|H\rangle_f$ is prepared using the distance $d_f = 3$ scheme of Fig. 1 in the main paper. On average, the number of qubits required to prepare such a state is 10,917 and the space-time overhead is $3.91 \times 10^6$. If the time cost for preparing the $|H\rangle_{d_1}$ states is of lesser concern, the minimum number of qubits required to prepare $|H\rangle_f$ at this logical failure rate is also listed in Supplementary Table 1.

To obtain a state $|H\rangle_f$ with $p^{(d_f)}_L \approx 10^{-15}$ when $p = 10^{-3}$ using a small amount of resources requires encoded stabilizer operations with much lower logical failure rates than what is achieved with the triangular color code family. One viable option is to use stabilizer operations encoded in the surface code, due to the low error rates that can be achieved when $p = 10^{-3}$ [7]. However, in such a setting, after the states $|H\rangle_{d_2}$ have been prepared, they must be teleported to the surface code before they can be injected in the circuit of Fig. 5a in the main paper and in the circuit implementing $\tilde{G}^{(1\to d_f)}$. In particular, one can convert the color code encoded state to the surface code using lattice surgery techniques, as was done in Ref. [8]. In Supplementary Table 2 we provide estimates of the qubit overhead for preparing $|H\rangle_f$ when the stabilizer operations are encoded in the surface code. The cost of first encoding the states $|H\rangle_{d_2}$ in the color code and then using extra qubits to convert such states into the surface code is taken into account. However, we assume that the quality of the encoded $|H\rangle_{d_2}$ states does not change when performing lattice surgery. Although such an omission is optimistic, we verified numerically that when only the $T$ and $T^\dagger$ gate locations are allowed to fail, the protocol of Fig. 1 in the main paper produces $|H\rangle_f$ states with logical error rates two to four orders of magnitude (depending on the value of $p$) lower than when all stabilizer operations fail according to the noise model described in the Methods section in the main paper. As such, the simulation provides evidence that the logical error rates obtained in Supplementary Table 2 are good estimates of the error rates that would be obtained when considering errors introduced when performing lattice surgery to obtain surface code encoded $|H\rangle_{d_2}$ states.

Supplementary Table 2: Qubit overhead estimates for preparing $|H\rangle_f$ with the scheme of Fig. 1 in the main paper, but with logical stabilizer operations encoded in a distance-$d_2$ surface code. Note that the states $|H\rangle_{d_2}$ are first encoded in the color code, and lattice surgery is performed to obtain an $|H\rangle_{d_2}$ state encoded in the surface code as in Ref. [8].
Instead of performing lattice surgery to convert a color code encoded $|H\rangle_{d_2}$ state to one encoded in the surface code, another option would be to initially prepare an encoded $|H\rangle_{d_2}$ state in a small-distance surface code using some other method, such as a magic state distillation protocol. The $|H\rangle_{d_2}$ states would then be injected in the $T$ gate circuits of Fig. 5a in the main paper, in addition to the $\tilde{G}^{(1\to d_f)}$ circuit, in order to prepare an $|H\rangle_f$ state using the methods presented in Fig. 1 in the main paper [...] if the $d_f = 7$ scheme were chosen. Note that the logical error rate of the surface code is given by $10^{-9}$ for $d_2 = 15$, $10^{-8}$ for $d_2 = 13$, and $10^{-7}$ for $d_2 = 11$ [7]. In contrast, $d_2 \ge 27$ would be required if conventional non-fault-tolerant magic state distillation schemes are used, since in this case the stabilizer operations should have a logical error rate as low as $10^{-15}$. A careful analysis of the overhead would require choosing the appropriate magic state distillation protocol (or some other scheme which uses fault-tolerant circuits to prepare encoded magic states), and therefore such an analysis is left for future work.
One could also directly prepare an encoded $|H\rangle_{d_2}$ state in a distance-$d_2$ surface code using the non-fault-tolerant state injection methods of ref. [9]. Such states would then be injected in the scheme for preparing $|H\rangle_f$ (see Supplementary Fig. 6) with encoded stabilizer operations in a distance-$d_2$ surface code. In this case, the injected $|H\rangle_{d_2}$ states would have much higher failure rates than the encoded stabilizer operations. In such a setting, extending our scheme to $d_f \ge 9$ could potentially be very beneficial. However, to avoid preparing a state $|H\rangle_f$ with $d_f > 7$, one could consider a two-level approach. As a first step, one could inject $|H\rangle_{d_2}$ states to first prepare an $|H\rangle_f$ state with $d_f \le 5$. Afterwards, the obtained $|H\rangle$ state could be further injected to prepare a new $|H\rangle_f$ state with $d_f \ge 5$.
Note also that the underlying codes used in our work belong to the triangular color code family. One avenue of exploration would be to consider the 4.8.8 color code family (see for instance Refs. [10,11]) for potentially better performance. In addition, the color codes used to encode the Clifford operations required two and three ancillas for the weight-four and weight-six stabilizers, respectively. Using edge weight renormalization schemes similar to those described in Ref. [5], one could use fewer ancillas for measuring each stabilizer while maintaining the full effective code distance of the Lift decoder. Due to the smaller number of fault locations and reduced ancilla requirements, such an implementation could potentially significantly reduce the overhead for preparing encoded $|H\rangle$ states.
When considering the implementation of our scheme with encoded stabilizer operations using the surface code, the $d_f = 7$ version of our scheme was optimal for both $p = 10^{-4}$ and $p = 10^{-3}$. A clear direction for future work would be to find a $v$-flag circuit (with $v \ge 4$) allowing a fault-tolerant implementation of a $d_f \ge 9$ scheme. Such a scheme could potentially further reduce the overhead for preparing $|H\rangle$ states with very low error rates.
The schemes considered in this work to prepare $|H\rangle$ states are error detection schemes. In particular, for $p = 10^{-3}$ and $d_f > 3$, the acceptance probability for preparing an $|H\rangle$ state is very low (for instance, only 12% when $d_f = 5$). One way to improve the acceptance probability could be to use qubits encoded in a bosonic code [12] (such as a GKP code [13]) and concatenate such qubits with the color code (the GKP code concatenated with the surface code was considered in Refs. [14][15][16] for quantum memories). By using bosonic qubits, repeated rounds of error correction at the bosonic level, prior to measuring the logical Hadamard operator and stabilizers of the color code, could be performed to reduce some of the errors afflicting the data and ancilla qubits. Another possibility would be to develop an error correction scheme for preparing an $|H\rangle$ state which applies directly to the color code family. Such a scheme would have higher logical error rates compared to an error detection scheme, and the scheduling of the controlled-Hadamard gates would have to be considered more carefully. However, since an error correction scheme would not require any post-selection, there could be an interval of physical error rates where it achieves better performance than the error detection scheme considered in this work.