Mitigating Coherent Noise Using Pauli Conjugation

Coherent noise can be much more damaging than incoherent (probabilistic) noise in the context of quantum error correction. One solution is to use twirling to turn coherent noise into incoherent Pauli channels. In this article, we argue that if twirling can improve the logical fidelity versus a given noise model, we can always achieve an even higher logical fidelity by simply sandwiching the noise with a chosen pair of Pauli gates, which we call Pauli conjugation. We devise a way to search for the optimal Pauli conjugation scheme and apply it to the Steane code, the 9-qubit Shor code and the distance-3 surface code under global coherent $Z$ noise. The optimal conjugation schemes show improvement in logical fidelity over twirling, while the weights of the conjugation gates we need to apply are lower than the average weight of the twirling gates. In our example noise and codes, the concatenated threshold obtained using conjugation is consistently higher than the twirling threshold and can be up to 1.5 times higher than the original threshold where no mitigation is applied. Our simulations show that Pauli conjugation can be robust against gate errors and that its advantages over twirling persist as we go to multiple rounds of quantum error correction. Pauli conjugation can be viewed as dynamical decoupling applied to the context of quantum error correction, in which our objective changes from maximising the physical fidelity to maximising the logical fidelity. The approach may be helpful in adapting other noise tailoring techniques from quantum control theory to quantum error correction.


I. INTRODUCTION
The quantum fault-tolerant threshold theorem states that when the error rate of the physical components is below a certain threshold value for a given quantum error correction code, we can reduce the error rate of the logical qubits indefinitely by scaling up our code [1][2][3]. Thus, for a given code, its threshold value is the target hardware error rate that experimentalists will aim for. However, the threshold error rate is defined using a worst-case error measure such as the diamond distance, while experimentally we can only efficiently measure an average-case quantity such as the fidelity. For Pauli channels, the worst-case error rate is similar to the average-case error rate. However, for coherent (unitary) errors, the worst-case error rate can scale as the square root of the average error rate, making them potentially more damaging to quantum error correction codes [4][5][6][7][8][9][10].
At the physical qubit level, coherent noise can be mitigated using dynamical decoupling [11,12], although this is limited by imperfect control pulses and by finite pulse durations and intervals. In the context of quantum error correction, local physical coherent noise is decohered at the logical level as the code scales up [13], and its damage to the encoded state can be mitigated by using better decoders [14]. Gate-level coherent errors in quantum error correction circuits can be mitigated by splitting the stabiliser check into two oppositely rotating halves [15], subject to some requirements on the gates available to the given architecture. A more general solution would involve using Pauli twirling to turn the coherent noise into a Pauli channel [16][17][18][19], which as mentioned before can be much less damaging to the fault-tolerant threshold. Twirling generally involves using all possible Pauli gates to sandwich the noise channel and averaging over the results. The average weight of the extra twirling gates we need to apply scales with the total number of qubits, so the gate errors introduced by the twirling gates are not negligible. While such problems can be mitigated by combining twirling gates during circuit compilation [20], it may still be challenging to implement twirling in practice.
* zhenyu.cai@materials.ox.ac.uk † simon.benjamin@materials.ox.ac.uk
In this article, instead of using twirling to combat coherent errors, we propose to deterministically sandwich the noise channel using a chosen pair of Pauli gates, which we call Pauli conjugation. We start by introducing some background concepts in Section II. In Section III, we find ways to reduce the search space for the optimal Pauli conjugation scheme. This is then used in Section IV to compare the logical fidelity and concatenated threshold of Pauli conjugation with those of twirling for several quantum error correction codes under global $Z$ rotation noise. In Section V, we discuss the extension of our technique to multiple rounds of error correction and conjugation. This is followed by a conclusion and a discussion of possible future directions in Section VI.

II. LOGICAL FIDELITY IN QUANTUM ERROR CORRECTION

A. Pauli Transfer Matrix Formalism
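In the Pauli transfer matrix formalism [21], density operators are written in vector form by decomposing them in the Pauli basis $G \in \mathbb{G}$:
$$|\rho\rangle\rangle = \sum_{G \in \mathbb{G}} |G\rangle\rangle \langle\langle G|\rho\rangle\rangle,$$
where we have defined the inner product as
$$\langle\langle G|\rho\rangle\rangle = \frac{1}{\sqrt{2^n}} \mathrm{Tr}(G\rho).$$
We have added a scaling factor $\frac{1}{\sqrt{2^n}}$ when we use the Pauli operators as a basis, where $n$ is the number of qubits. This ensures the normalisation of the basis set $\{|G\rangle\rangle\}$.
In this way, a general quantum channel $\mathcal{E}$ can be written in matrix form:
$$\mathcal{E} = \sum_{G, G' \in \mathbb{G}} \mathcal{E}_{G'G}\, |G'\rangle\rangle \langle\langle G|,$$
with the matrix elements given by $\mathcal{E}_{G'G} = \langle\langle G'|\mathcal{E}|G\rangle\rangle = \langle\langle G'|\mathcal{E}(G)\rangle\rangle = \frac{1}{2^n}\mathrm{Tr}(G'\,\mathcal{E}(G))$.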
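The matrix elements above can be evaluated numerically. The following is a minimal sketch for a single qubit ($n = 1$), assuming the channel is supplied as a list of Kraus operators; the helper names `pauli_transfer_matrix` and `z_rotation` and the angle 0.1 are illustrative choices, not part of the original text.

```python
import numpy as np

# Single-qubit Pauli matrices in the order I, X, Y, Z.
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def pauli_transfer_matrix(kraus_ops):
    """Return the 4x4 matrix with entries Tr(G' E(G)) / 2 for one qubit."""
    ptm = np.zeros((4, 4))
    for col, g in enumerate(PAULIS):
        # Apply the channel E to the Pauli operator G.
        out = sum(k @ g @ k.conj().T for k in kraus_ops)
        for row, g_prime in enumerate(PAULIS):
            ptm[row, col] = np.real(np.trace(g_prime @ out)) / 2
    return ptm


def z_rotation(theta):
    """Coherent Z rotation exp(-i theta Z) as a single Kraus operator."""
    return [np.diag([np.exp(-1j * theta), np.exp(1j * theta)])]


ptm = pauli_transfer_matrix(z_rotation(0.1))
```

For this coherent rotation the $X$ and $Y$ rows and columns carry off-diagonal entries $\pm\sin 2\theta$, which is the signature of coherence in the Pauli transfer matrix.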

B. Quantum Error Correction
For a code defined by the set of stabilisers $\mathbb{S}$, we will denote the stabiliser generators as $\widetilde{\mathbb{S}}$. In this article, the generators of a set are denoted using a tilde. When we talk about the generators of a Pauli set, the composition operation used in the generation ignores all phase factors.
We perform stabiliser measurements for all $S_i \in \widetilde{\mathbb{S}}$ to extract the error syndrome $\vec{m}$, whose element $m_i \in \{0, 1\}$ is the measurement outcome of the stabiliser generator $S_i$. This projects the noisy state into the corresponding $\vec{m}$-syndrome subspace using the syndrome projection operators
$$\Pi_{\vec{m}} = \prod_i \frac{I + (-1)^{m_i} S_i}{2}.$$
For each measured syndrome $\vec{m}$, we apply the corresponding recovery operator $R_{\vec{m}}$, which is usually chosen to be the most likely Pauli error that leads to the given syndrome. Writing the operations as super-operators acting on the density operator, the overall quantum error correction process can be written as:
$$\mathcal{C}(\rho) = \sum_{\vec{m}} R_{\vec{m}} \Pi_{\vec{m}}\, \rho\, \Pi_{\vec{m}} R_{\vec{m}}^\dagger.$$
If we start within the logical subspace, the error correction process $\mathcal{C}$ will always project the state back to the logical subspace, even after going through a noisy channel $\mathcal{N}$. Thus, the effective channel $\mathcal{N}_0 = \mathcal{C}\mathcal{N}$ is an error channel that takes one logical state to another, i.e. it is a logical noise channel. The effective logical noise channel $\mathcal{N}_0$ is defined as the average over all logically equivalent starting and final states.
Twirling is a technique for converting an arbitrary error channel into a Pauli channel [16,17,23]. It is carried out by taking the average of the error channel conjugated with different gates chosen from a set of Pauli gates $\mathbb{W} \subseteq \mathbb{G}$ that we call the twirling set. Conventionally, twirling is carried out using the full set of Pauli gates as the twirling set: $\mathbb{W} = \mathbb{G}$. However, it is possible to find a smaller $\mathbb{W}$ that is equivalent to the full Pauli set, as we will see later (also shown in [24]).
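Twirling a noise channel $\mathcal{N}$ is just
$$\mathcal{N}_T = \frac{1}{|\mathbb{W}|} \sum_{W \in \mathbb{W}} \mathcal{W}\mathcal{N}\mathcal{W},$$
where $\mathcal{W}$ denotes conjugation by the Pauli gate $W$. Twirling can decohere the Pauli components in the noise channel and turn it into a Pauli channel. This corresponds to removing the off-diagonal elements of the Pauli transfer matrix of the channel. By removing the coherent components of the physical noise channel $\mathcal{N}$, it is possible to improve the logical fidelity of the code. Using (1) and (2), the effective logical channel after twirling is the average of $\mathcal{R}\mathcal{W}\mathcal{N}\mathcal{W}$ over all $W \in \mathbb{W}$. Writing the logical channel with the noise process conjugated by the Pauli gate $W$ as $\mathcal{N}(W) = \mathcal{R}\mathcal{W}\mathcal{N}\mathcal{W}$, the twirled logical channel is $\frac{1}{|\mathbb{W}|}\sum_{W \in \mathbb{W}} \mathcal{N}(W)$. The logical fidelity of $\mathcal{N}(W)$ is
$$F(W) = \int d\rho\, \langle\langle \rho|\mathcal{N}(W)|\rho\rangle\rangle,$$
where $\rho$ is a logical state and the integral is over the pure state surface using the Haar measure. Since the fidelity $F$ is a linear function of the noise process $\mathcal{N}$, we can similarly obtain the original logical fidelity $F_0$ and the twirled logical fidelity $F_T$:
$$F_0 = F(I), \qquad F_T = \frac{1}{|\mathbb{W}|} \sum_{W \in \mathbb{W}} F(W).$$
There exists a $W_{\max} \in \mathbb{W}$ such that $F(W_{\max})$ is the maximum $F(W)$ that we can achieve. Since a maximum is never smaller than an average, we have
$$F(W_{\max}) \geq \frac{1}{|\mathbb{W}|} \sum_{W \in \mathbb{W}} F(W) = F_T.$$
Thus if we can find such a $W_{\max}$ and deterministically apply it to the noise instead of randomly applying all $W \in \mathbb{W}$, we will obtain a fidelity $F(W_{\max})$ that is at least as high as the twirled fidelity $F_T$.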
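To illustrate twirling as an average over Pauli conjugations, the sketch below works at the physical level on a single qubit only (it does not model a code or the logical fidelity). It assumes a coherent $Z$ rotation with an arbitrary angle of 0.3 and uses the fact that conjugation by a Pauli gate is a diagonal sign matrix in the Pauli transfer matrix picture; all function names are illustrative.

```python
import numpy as np


def z_rotation_ptm(theta):
    """PTM of exp(-i theta Z) in the Pauli basis ordered (I, X, Y, Z)."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([
        [1.0, 0.0, 0.0, 0.0],
        [0.0, c, -s, 0.0],
        [0.0, s, c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])


# PTM of conjugation by a Pauli gate W: diagonal, with -1 on every basis
# element that anti-commutes with W.
PAULI_CONJUGATIONS = {
    "I": np.diag([1.0, 1.0, 1.0, 1.0]),
    "X": np.diag([1.0, 1.0, -1.0, -1.0]),
    "Y": np.diag([1.0, -1.0, 1.0, -1.0]),
    "Z": np.diag([1.0, -1.0, -1.0, 1.0]),
}


def conjugated(noise_ptm, label):
    """Noise sandwiched between the Pauli gate `label` on both sides."""
    w = PAULI_CONJUGATIONS[label]
    return w @ noise_ptm @ w


def twirled(noise_ptm):
    """Average of the conjugated noise over the full single-qubit Pauli set."""
    terms = [conjugated(noise_ptm, label) for label in PAULI_CONJUGATIONS]
    return sum(terms) / len(terms)


noise = z_rotation_ptm(0.3)
twirled_noise = twirled(noise)
```

In this single-qubit case every conjugation has the same diagonal, so all choices of $W$ give the same fidelity; the differences between conjugations discussed in the text only appear once the recovery step of a code is included.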

III. FINDING THE OPTIMAL CONJUGATION GATE
The usual Pauli twirling has $\mathbb{W} = \mathbb{G}$. For $n$ qubits, this means that there are $4^n$ elements in $\mathbb{W}$ that we need to search over to find $W_{\max}$, which is exponentially difficult for large $n$. Hence, we first need to reduce the size of $\mathbb{W}$ in order to find $W_{\max}$ efficiently.
Rather than dealing with the twirling set $\mathbb{W}$, we will first work with its generators $\widetilde{\mathbb{W}}$. The reason we can work with the generators for our later purposes is outlined in Appendix C.
The generators of the conventional twirling set are just $\widetilde{\mathbb{W}} = \widetilde{\mathbb{G}}$. For a given quantum error correction code, the generators of the Pauli basis $\widetilde{\mathbb{G}}$ can be divided into the following partitions:
• Stabiliser generators $\widetilde{\mathbb{S}}$: the set of Pauli operators that define the stabiliser checks of the code.
• Logical generators $\widetilde{\overline{\mathbb{G}}}$: together with the stabiliser generators, they generate the set of logical operators $\overline{\mathbb{G}}$, which is just the normaliser of the set of stabilisers $\mathbb{S}$.
• Error generators $\widetilde{\mathbb{E}}$: all the remaining generators needed to generate the whole Pauli set. Each error generator $E$ anti-commutes with a different subset of stabiliser generators and thus produces a different syndrome.
Hence, we have $\widetilde{\mathbb{W}} = \widetilde{\mathbb{G}} = \widetilde{\mathbb{S}} + \widetilde{\mathbb{E}} + \widetilde{\overline{\mathbb{G}}}$. Note that we have used the label 'error generators' since each such element creates a code violation, but a physical error process can give rise to elements of any of these sets, and in particular those in $\overline{\mathbb{G}}$, which create undetectable logical errors.

A. Removing Stabilisers and Logical Operators
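$\mathcal{R}$ and $\mathcal{S}$ commute because they are both Pauli channels, which are diagonal in the Pauli transfer matrix form. Hence, for any channel $\mathcal{N}$ and logical states $|\rho\rangle\rangle$ and $|\rho'\rangle\rangle$, we have:
$$\langle\langle \rho'|\mathcal{R}\mathcal{S}\mathcal{N}\mathcal{S}|\rho\rangle\rangle = \langle\langle \rho'|\mathcal{S}\mathcal{R}\mathcal{N}\mathcal{S}|\rho\rangle\rangle = \langle\langle \rho'|\mathcal{R}\mathcal{N}|\rho\rangle\rangle,$$
where the last step uses the fact that stabilisers act trivially on logical states. This means that conjugation using stabilisers on any noise channel has a trivial effect on the effective logical channel. Hence, we can remove all stabilisers from the twirling generator set and reduce it to $\widetilde{\mathbb{W}} = \widetilde{\mathbb{E}} + \widetilde{\overline{\mathbb{G}}}$.
Now if we are calculating the logical fidelity, we are integrating over all logical pure states using the unitary Haar measure, which is by definition invariant under any unitary transformation. Thus, for any logical Pauli operator $G \in \overline{\mathbb{G}}$, we have:
$$\int d\rho\, \langle\langle \rho|\mathcal{G}\mathcal{R}\mathcal{N}\mathcal{G}|\rho\rangle\rangle = \int d\rho\, \langle\langle \rho|\mathcal{R}\mathcal{N}|\rho\rangle\rangle.$$
$\mathcal{R}$ and $\mathcal{G}$ again commute since they are both Pauli channels. Hence we have:
$$\int d\rho\, \langle\langle \rho|\mathcal{R}\mathcal{G}\mathcal{N}\mathcal{G}|\rho\rangle\rangle = \int d\rho\, \langle\langle \rho|\mathcal{R}\mathcal{N}|\rho\rangle\rangle.$$
Hence, when calculating the logical fidelity, conjugation with logical Pauli operators also acts trivially and can be removed from the twirling generating set. The remaining non-trivial twirling generators are $\widetilde{\mathbb{W}} = \widetilde{\mathbb{E}}$. The way to construct an $\widetilde{\mathbb{E}}$ that consists only of single-qubit $X$ and $Z$ gates is outlined in Appendix A.

B. Twirling Set Reduction Using the Structure of the Noise
We will use $\eta(A, B)$ to denote the commutator between operators $A$ and $B$:
$$\eta(A, B) = A B A^{-1} B^{-1}.$$
Two super-operators $\mathcal{A}$ and $\mathcal{B}$ commute if $\eta(A, B) = e^{i\phi}$, i.e. the commutator of the underlying operators is some phase factor.
We will write our noise channel $\mathcal{N}$ in terms of its noise elements $N$. If a twirling generator $W$ satisfies $\eta(W, N) = e^{i\phi}\ \forall N$, then $\mathcal{W}\mathcal{N}\mathcal{W} = \mathcal{N}$, i.e. it acts trivially on the noise $\mathcal{N}$ and hence can be removed.
After such reduction, the twirling generating set consists only of the error generators in $\widetilde{\mathbb{E}}$ that fail to commute with at least one noise element $N$.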
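The filtering step described above can be sketched as follows. This is a minimal illustration that represents Pauli operators as strings over `I`, `X`, `Y`, `Z` and ignores phases; the three-qubit example, the choice of generators and the function names are hypothetical and only serve to show the commutation test.

```python
from itertools import product


def commutes(p, q):
    """True if the Pauli strings p and q commute (phase factors ignored)."""
    # Two Pauli strings commute iff they differ on an even number of
    # positions where both act non-trivially.
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def nontrivial_generators(generators, noise_elements):
    """Keep only generators that anti-commute with at least one noise element."""
    return [
        w for w in generators
        if not all(commutes(w, n) for n in noise_elements)
    ]


# Noise elements of a pure Z noise on three qubits: all tensor products of I and Z.
z_noise = ["".join(t) for t in product("IZ", repeat=3)]

# Hypothetical generating set made of single-qubit X and Z gates.
generators = ["XII", "IXI", "IIX", "ZII", "IZI", "IIZ"]

kept = nontrivial_generators(generators, z_noise)
```

For a pure $Z$ noise only the $X$-type generators survive, in line with the argument that pure $Z$ generators act trivially.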

C. Symmetry in Code and Noise
The twirling set $\mathbb{W}$ can be generated from $\widetilde{\mathbb{W}}$ following Appendix B. Based on the symmetry existing in both the code and the noise, we can prove the equivalence between different elements in $\mathbb{W}$.
Suppose we manage to find a Clifford operation $U$ such that the code state basis $\Pi_0 \overline{G}$ and the physical noise channel $\mathcal{N}$ are invariant under its transformation. Then we can prove that (see Appendix D)
$$\mathcal{N}(W) = \mathcal{N}(U^\dagger W U),$$
i.e. the effective logical channel conjugated with $W$ is the same as that with $U^\dagger W U$. All such $U$ form a group $\mathbb{U}$. Hence, we can define an equivalence relation:
$$W \sim U^\dagger W U \quad \forall\, U \in \mathbb{U}.$$
In this way, conjugacy with elements in $\mathbb{U}$ splits $\mathbb{W}$ into several equivalence classes. The elements in the same equivalence class produce the same logical fidelity when used to conjugate the noise. The simplest type of Clifford transformation to consider is qubit permutation, for which $U$ consists of swap gates. Permutation symmetry of quantum error correction codes has been studied in [10] and [14]. Note that qubit permutation preserves the weights of the operators, so it is crucial to construct $\mathbb{W}$ to have elements with the lowest weight possible (see Appendix B), so that more of them can be proven to be in the same equivalence class.
If a code has one logical qubit and its logical Pauli gates consist of applying physical Pauli gates to all the qubits, then such transversal logical Pauli gates are invariant [25] under any qubit permutation $U$, i.e. $[U, \overline{G}] = 0\ \forall\, \overline{G} \in \overline{\mathbb{G}}$. For such codes, we only need to further make sure that the set of stabilisers is invariant under the given qubit permutation $U$:
$$U\, \mathbb{S}\, U^\dagger = \mathbb{S},$$
to ensure the code symmetry requirement in (3) is satisfied. Furthermore, if some of the stabilisers commute with the noise, then these stabilisers have a trivial effect in the error correction process and thus can be safely ignored. In such a case, we only need to consider the symmetry of the stabilisers that do not commute with the noise. For example, for a pure $Z$ noise, we can safely ignore the $Z$ stabilisers when we are considering code symmetry.

IV. MITIGATING COHERENT Z NOISE USING PAULI CONJUGATION
In this section we will try to find the optimal Pauli conjugation gate for different quantum error correction codes under the global $Z$ rotation noise:
$$N(\theta) = e^{-i\theta \sum_{j=1}^{J} Z_j} = \prod_{j=1}^{J} \left(\cos\theta\, I - i \sin\theta\, Z_j\right),$$
where $J$ is the number of qubits. This noise is a coherent superposition of all possible $Z$ operators (tensor products of $I$ and $Z$). The weight-$n$ $Z$ operators in the superposition have the amplitude $(-i\sin\theta)^n (\cos\theta)^{J-n}$. Since the noise only consists of $Z$ components, all pure $Z$ twirling generators act trivially on the noise and can therefore be removed. This noise is symmetric under any qubit permutation. Hence, any permutation symmetry of the quantum error correction code will also exist for the noise.
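The stated amplitudes can be checked numerically. The sketch below assumes the noise operator $e^{-i\theta\sum_j Z_j}$, which is diagonal in the computational basis, and extracts the coefficient of a given $Z$ operator from its trace overlap; the function names and the example values are illustrative.

```python
import itertools

import numpy as np


def global_z_rotation_diagonal(theta, num_qubits):
    """Diagonal of exp(-i theta sum_j Z_j) and the Z_j eigenvalues per basis state."""
    bits = np.array(list(itertools.product([0, 1], repeat=num_qubits)))
    z_eigenvalues = 1 - 2 * bits  # +1 for |0>, -1 for |1>
    diagonal = np.exp(-1j * theta * z_eigenvalues.sum(axis=1))
    return diagonal, z_eigenvalues


def z_operator_amplitude(theta, num_qubits, support):
    """Coefficient of the Z operator acting on `support` in the noise unitary."""
    diagonal, z_eigenvalues = global_z_rotation_diagonal(theta, num_qubits)
    columns = np.array(support, dtype=int)
    # Diagonal of the tensor product of Z on `support` and I elsewhere.
    z_string = np.prod(z_eigenvalues[:, columns], axis=1)
    # Trace overlap Tr(Z_S U) / 2^J picks out the coefficient of Z_S.
    return np.sum(z_string * diagonal) / 2 ** num_qubits


amplitude = z_operator_amplitude(0.4, 3, (0, 2))
expected = (-1j * np.sin(0.4)) ** 2 * np.cos(0.4) ** (3 - 2)
```

Here `amplitude` and `expected` agree up to floating-point error, and the same holds for any other support.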
For all the codes that we will discuss in this section, the logical Pauli gates consist of applying physical Pauli gates to all the qubits. Thus the code symmetry condition in Section III C can be reduced to (5). Along with the fact that we have pure $Z$ noise, we only need to focus on the symmetry of the $X$ stabilisers in this section when we talk about the symmetry of a code, except for the five-qubit code. Global logical Pauli gates also mean that $N(\frac{\pi}{2})$ is the logical $Z$ operator. Thus the logical fidelity curve against $\theta$ has a rotational symmetry about $\theta = \frac{\pi}{4}$ (see Appendix E), which means that we only need to look at $0 \leq \theta \leq \frac{\pi}{4}$ to see the effect of the noise on logical fidelity.

A. Steane Code
In the Steane code, we have
• Stabiliser generators $\widetilde{\mathbb{S}}$: $X$ or $Z$ checks on plaquettes (1, 4, 6, 7), (2, 4, 5, 7) and (3, 5, 6, 7).
• Logical generators $\widetilde{\overline{\mathbb{G}}}$: $X$ or $Z$ on all qubits.
Following Section III A, we can construct our twirling generators out of single-qubit $X$ and $Z$ gates. Since the noise only consists of $Z$ components, all pure $Z$ twirling generators act trivially on the noise and can be removed. The remaining $X$-type generators generate the twirling set
$$\mathbb{W} = \{I, X_1, X_2, X_3, X_4, X_5, X_6, X_7\}.$$
Note that here we have transformed the error operators to their lowest-weight equivalents that produce the same error syndromes. The Steane code has the same symmetry as the Fano plane [10], whose permutation symmetry group will be denoted as $\mathbb{U}$. Since our noise model is symmetric under any qubit permutation, all $U \in \mathbb{U}$ satisfy (3). Now for every pair of single-qubit $X$ operators $X_i, X_j \in \mathbb{W}$, we can find at least one $U \in \mathbb{U}$ such that
$$U^\dagger X_i U = X_j.$$
Hence, using (4) we know that all the remaining single-qubit $X$ twirling operators are equivalent.
There are two equivalence classes of twirling gates here: one is equivalent to $I$, while the other is equivalent to $X_1$ (or any single-qubit $X$ gate).
The effect of different strategies on the logical fidelity of the Steane code is shown in Figure 2. We can see that twirling is consistently better than doing nothing, while $X_1$ conjugation yields even higher fidelity than twirling.
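The equivalence of all single-qubit $X$ conjugations can be checked by brute force. The sketch below enumerates all qubit permutations of the seven qubits, keeps those that map the $X$ stabiliser group generated by the plaquettes (1, 4, 6, 7), (2, 4, 5, 7) and (3, 5, 6, 7) onto itself, and computes the orbit of qubit 1. It is only an illustration under these assumptions, not the procedure of the appendices; the function names are hypothetical.

```python
from itertools import permutations

# Supports of the X stabiliser generators (qubits numbered from 1).
PLAQUETTES = [(1, 4, 6, 7), (2, 4, 5, 7), (3, 5, 6, 7)]


def span(generators):
    """Supports of all products of the given X-type generators."""
    group = {frozenset()}
    for g in generators:
        # Multiplying two X-type operators gives the symmetric difference
        # of their supports.
        group |= {s ^ frozenset(g) for s in group}
    return group


X_STABILISERS = span(PLAQUETTES)


def code_symmetries():
    """Qubit permutations that map every X stabiliser to an X stabiliser."""
    symmetries = []
    for perm in permutations(range(1, 8)):
        mapping = dict(zip(range(1, 8), perm))
        if all(
            frozenset(mapping[q] for q in g) in X_STABILISERS
            for g in PLAQUETTES
        ):
            symmetries.append(mapping)
    return symmetries


symmetries = code_symmetries()
# Qubits that X_1 can be moved to by a symmetry of the code.
orbit_of_qubit_1 = {u[1] for u in symmetries}
```

Because the orbit of qubit 1 covers all seven qubits, every single-qubit $X$ gate can be mapped to $X_1$ by a symmetry, which is why a single non-trivial equivalence class remains.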

B. Other Codes
In this section, we will explore the effect of Pauli conjugation using other codes under the same noise model. The details of finding the equivalent class of conjugating gates for different codes are outlined in Appendix F. Here we will just look at the effect of using conjugating gates in different equivalence classes and compare their effects to doing nothing and twirling.

Five-qubit code
The structure of the five-qubit code is shown in Figure 3. There is just one non-trivial conjugating strategy in the five-qubit code, which is conjugation with any single-qubit $X$ gate, the same as we found in the Steane code. However, in our noise model, we found that this strategy makes no difference to the logical fidelity compared to doing nothing. Consequently, the twirled logical fidelity is also the same. Hence, rather interestingly, under our noise model none of the strategies works for the five-qubit code.

Nine-qubit Shor code
The structure of the nine-qubit Shor code is shown in Figure 4. The non-trivial conjugations in the nine-qubit Shor code for our noise model include:
• Single-qubit flip: $X_1$
• Two-qubit flip (in different rows)
The effects of these strategies on the logical fidelity are shown in Figure 5. We see that doing nothing results in a dip at $\theta = \frac{\pi}{6}$, where our noise turns into a logical operator. Twirling can definitely mitigate such a problem, leading to a great jump in fidelity. Superior improvements can be achieved by conjugating the noise with $X_1 X_4 X_7$.
The result for the other nine-qubit Shor code with the X and Z checks exchanged is shown in Appendix H.

Distance-3 surface code
The structure of the distance-3 surface code is shown in Figure 6. The non-trivial conjugating strategies and their effects on the logical fidelity are shown in Figure 7. Again we see an improvement of the twirled fidelity over doing nothing, and a marked improvement from conjugating the noise with $X_1 X_8$ over twirling.

C. Gate Error
In reality, applying extra Pauli gates does not come free, due to the errors associated with the gates. We should expect the effect of such errors due to Pauli conjugation to be small, since the quantum error correction circuits involve far more gates than Pauli conjugation and also contain two-qubit gates, which usually have much lower fidelity than single-qubit Pauli gates. Here we have simulated the performance of different schemes using different codes with depolarising gate error rates of 0.5% and 1% for the encoding circuit, the quantum error correction circuit and the Pauli conjugation gates. From the results in Figure 8 we can see that as we increase the gate error rate, the fidelity curves shift downward without much change to their shapes. Hence, the optimal Pauli conjugation schemes maintain their advantages over doing nothing when we take gate errors into account. The fidelity curves using twirling are not shown. However, we should expect the advantage of Pauli conjugation over twirling to increase with increasing gate error rates, since the average weights of the twirling gates are higher than those of the conjugation gates in our examples.

D. Concatenated Threshold
As discussed by Rahn et al. [26], after finding the map between the physical noise channel and the logical noise channel with one level of encoding, composing this map gives us the physical-logical noise map for the concatenated code. Here we have assumed that we are using a hard decoder, which only takes into account the syndrome information of the current concatenation level. Finding such maps allows us to compute the performance of a code with different levels of concatenation and hence find its concatenated threshold. Such analysis was carried out in [10] for a variety of codes. Here we use the local $Z$ noise map obtained in [10] to calculate the concatenated threshold for different codes when we apply different kinds of noise tailoring schemes at the physical level (not at any subsequent levels of concatenation). From the results in Figure 9, we can see that the logical fidelities at the threshold crossing points of different noise tailoring schemes are essentially the same. Hence, when we try to achieve the threshold logical fidelity with one level of encoding, if one scheme has a higher tolerance of the physical error than another scheme, we should expect a similar improvement in the concatenated threshold. The improvement of the conjugated threshold over the original threshold is 40%, 160% and 110% for the Steane code, 9-qubit Shor code and distance-3 surface code respectively. All of them also show improvements of the conjugated thresholds over the twirled thresholds.

A. Multiple Rounds of Error Correction
Up to now, we have only considered a noise channel consisting of a discrete global $Z$ rotation $N(\theta) = e^{-i\theta \sum_j Z_j}$. Next, we will consider an error channel that consists of many time steps, with each step being a coherent global $Z$ rotation.
In one extreme, each step can be a rotation of the same angle in the same direction. Then all we need to do is flip all the qubits right in the middle of the channel, which flips the direction of the rotation and cancels the coherent error. This is just a simple case of dynamical decoupling.
In the other extreme, the error channel can be a random walk. At each time step, there is an equal probability of a positive or negative rotation of angle $\theta$: $N(\pm\theta) = e^{\pm i\theta \sum_j Z_j}$.
We will focus on the random walk case first, which cannot be removed via dynamical decoupling if we cannot apply gates within each time step.
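For all the codes we considered, when they undergo a global $Z$ rotation $e^{-i\theta \sum_j Z_j}$, the effective logical error for a given measured syndrome $\vec{m}$ after correction is a logical $Z$ rotation of angle $\phi_{\vec{m}}$. Hence, the effective logical error channel $\mathcal{N}_0$ averaged over all syndrome measurements is a probabilistic mixture of the logical $Z$ rotations of angle $\phi_{\vec{m}}$, weighted by the probability of measuring each syndrome $\vec{m}$. When the sign of rotation of the physical error $\theta$ is flipped, the sign of the logical rotation $\phi_{\vec{m}}$ is also flipped for all the codes that we are considering (see Appendix G). For each time step, since we have equal probability of positive and negative physical rotations, we also have equal probability of positive and negative logical rotations, leading to an effective logical channel $\mathcal{N}_{0,\pm}$ in which each logical rotation angle $\phi_{\vec{m}}$ appears with both signs with equal probability. Note that $\mathcal{N}_{0,\pm}$ is just the logically twirled version of $\mathcal{N}_0$. Hence, they have the same diagonal elements in their Pauli transfer matrices and thus the same logical fidelity:
$$F(\mathcal{N}_0) = F(\mathcal{N}_{0,\pm}).$$
In the previous sections, the original noise channel we considered was $\mathcal{N}_0$; the corresponding conjugated noise channel $\mathcal{N}_c$ will be in a form similar to (7). In the random walk picture, this leads to a logical dephasing channel $\mathcal{N}_{c,\pm}$, which also follows $F(\mathcal{N}_c) = F(\mathcal{N}_{c,\pm})$. The corresponding physically twirled noise channel $\mathcal{N}_T$ is also a logical dephasing channel. Our previous simulations for one round of error correction show that:
$$F(\mathcal{N}_c) \geq F(\mathcal{N}_T) \geq F(\mathcal{N}_0).$$
Using (9) and its equivalent for the conjugated channel, we have:
$$F(\mathcal{N}_{c,\pm}) \geq F(\mathcal{N}_T) \geq F(\mathcal{N}_{0,\pm}),$$
in which $\mathcal{N}_{c,\pm}$, $\mathcal{N}_T$ and $\mathcal{N}_{0,\pm}$ are all logical dephasing channels with different dephasing probabilities $p_d$. For a single-qubit dephasing channel $\mathcal{N}_d$ with dephasing probability $p_d$, the eigenvalues of its Pauli transfer matrix are $\{1, 1 - 2p_d, 1 - 2p_d, 1\}$. Hence, the logical fidelity of $k$ rounds of $\mathcal{N}_d$ is [27,28]:
$$F(\mathcal{N}_d^k) = \frac{2 + (1 - 2p_d)^k}{3}.$$
For $p_d \leq \frac{1}{2}$ this is a decreasing function of $p_d$ for every $k$, so the ordering of the single-round fidelities of two dephasing channels is preserved after $k$ rounds. Combining with (10), we then have:
$$F(\mathcal{N}_{c,\pm}^k) \geq F(\mathcal{N}_T^k) \geq F(\mathcal{N}_{0,\pm}^k)$$
for any positive integer $k$. Hence, the improvement of logical fidelity using Pauli conjugation over twirling (or doing nothing) with a single round of error correction under coherent $Z$ rotation will indeed persist when we go to multiple rounds of error correction for the random walk noise model.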
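The multi-round argument for dephasing channels can be checked with a short numerical sketch. It assumes a single-qubit dephasing channel whose Pauli transfer matrix is diagonal with eigenvalues $1, 1-2p_d, 1-2p_d, 1$, and uses the standard relation between the Haar-averaged fidelity of a single-qubit trace-preserving channel and the trace of its Pauli transfer matrix, $F = (\mathrm{Tr} + 2)/6$. The function names and the probabilities used are illustrative.

```python
import numpy as np


def dephasing_ptm(p_d):
    """PTM of a single-qubit dephasing channel with dephasing probability p_d."""
    lam = 1 - 2 * p_d
    return np.diag([1.0, lam, lam, 1.0])


def average_fidelity(ptm):
    """Haar-averaged fidelity of a single-qubit channel from its PTM trace."""
    return (np.trace(ptm) + 2) / 6


def k_round_fidelity(p_d, k):
    """Closed form for k successive rounds of the same dephasing channel."""
    return (2 + (1 - 2 * p_d) ** k) / 3


def k_round_fidelity_from_ptm(p_d, k):
    """Same quantity obtained by composing the channel k times."""
    return average_fidelity(np.linalg.matrix_power(dephasing_ptm(p_d), k))


# Two hypothetical dephasing probabilities, e.g. conjugated versus twirled noise.
p_low, p_high = 0.02, 0.05
ordering_preserved = all(
    k_round_fidelity(p_low, k) >= k_round_fidelity(p_high, k)
    for k in range(1, 51)
)
```

The ordering check reflects the statement in the text: a channel with the smaller dephasing probability keeps the higher fidelity for every number of rounds tested.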
One might have the following concern: Perhaps replacing the comprehensive randomisation afforded by standard twirling with deterministic interventions may result in some degree of coherent noise persisting on the logical qubit, which can systematically accumulate. Then, even though the logical fidelity may be higher when monitored over a single cycle (of environmental exposure and subsequent error correction), nevertheless over a large number of cycles the conjugation approach may be inferior to twirling. A noise model for which such an effect could occur is the following, albeit artificial, scenario: Returning to the model just considered, we replace the random walk (whereby the direction of phase acquisition will invert from time to time) with a continual positive phase acquisition; as noted earlier the natural solution here is simply to flip all qubits to realise a 'Hahn echo', but suppose we forbid ourselves that strategy and insist on comparing twirling with a Pauli conjugation strategy derived from analysing only a single cycle. This is simulated in Appendix I, showing the advantage of Pauli conjugation against twirling indeed decreases with multiple cycles of quantum error correction, though it is still consistently better than doing nothing.
Fortunately this can be overcome by injecting 'just enough' randomness: the solution might be called 'logical twirling' of the error channel (instead of twirling at the physical level). In our example, we have coherent logical $Z$ noise, which needs to be twirled using logical $X$ operators. Hence, if the optimal conjugation gate we found is $W$, then when we run the circuit we will randomly choose between the conjugation gates $W$ and $W\overline{X}$ in each round of error correction. $W\overline{X}$ and $W$ have the same performance for single-round error correction since $\overline{X}$ is a logical operator (Section III A). Hence, such a new scheme maintains the superior performance of our conjugation scheme and at the same time destroys coherence in the logical noise channel. In this way, the advantage of our conjugation scheme in single-round error correction can be reliably extended to multi-round error correction. For more general error channels and one logical qubit, we just need to twirl over all four logical Pauli operators.
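The effect of logical twirling on accumulated coherent noise can be seen in a toy model. The sketch below assumes a single logical qubit that acquires the same logical $Z$ rotation of angle $\phi$ in every round (ignoring any syndrome dependence), and compares letting the rotation accumulate coherently with randomly conjugating by the logical $X$ operator in each round. The function names and the values $\phi = 0.05$, $k = 20$ are illustrative assumptions.

```python
import numpy as np


def z_rotation_ptm(phi):
    """PTM of a logical Z rotation exp(-i phi Z) in the basis (I, X, Y, Z)."""
    c, s = np.cos(2 * phi), np.sin(2 * phi)
    return np.array([
        [1.0, 0.0, 0.0, 0.0],
        [0.0, c, -s, 0.0],
        [0.0, s, c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])


# PTM of conjugation by the logical X operator.
X_CONJUGATION = np.diag([1.0, 1.0, -1.0, -1.0])


def average_fidelity(ptm):
    """Haar-averaged fidelity of a single-qubit channel from its PTM trace."""
    return (np.trace(ptm) + 2) / 6


def coherent_rounds(phi, k):
    """Fidelity after k rounds in which the same rotation accumulates."""
    return average_fidelity(np.linalg.matrix_power(z_rotation_ptm(phi), k))


def logically_twirled_rounds(phi, k):
    """Fidelity after k rounds with a random choice between I and X conjugation."""
    rotation = z_rotation_ptm(phi)
    flipped = X_CONJUGATION @ rotation @ X_CONJUGATION  # rotation by -phi
    one_round = 0.5 * (rotation + flipped)
    return average_fidelity(np.linalg.matrix_power(one_round, k))


coherent = coherent_rounds(0.05, 20)
twirled = logically_twirled_rounds(0.05, 20)
```

In this toy model the two schemes agree for a single round, while after many rounds the logically twirled channel retains a much higher fidelity because the rotation angles no longer add up coherently.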

B. Multiple Rounds of Noise Tailoring
Instead of applying both noise tailoring and error correction at each time step, we can apply just noise tailoring in each time step and only do one round of error correction at the very end.
The effective noise channel with $K$ rounds of twirling is constructed as follows. We divide the noise process into $K$ steps and apply a random Pauli gate $W_k$ at the beginning of each step. At the end, we undo all these random Pauli gates by applying their inverse $\prod_{k=1}^{K} W_k$ and then perform quantum error correction. We denote the set of $K$ Pauli gates chosen using a vector $\vec{W}$. Similar to our arguments in Section II C, multiple rounds of twirling correspond to the average of all the Pauli conjugation schemes, so one of the Pauli conjugations will be optimal and will perform at least as well as twirling.
As detailed in Appendix J, if we want to find the equivalent conjugations to reduce the search space for multi-round conjugation, we can use similar arguments about the structure of the noise (Section III B) and the symmetries in both the noise and the code (Section III C), while the arguments about interaction with the code space to remove stabilisers and logical operators (Section III A) can only be applied to the outermost round of conjugation.
The search space of possible conjugations grows exponentially with the number of rounds, while the number of symmetries that we can utilise is smaller than in the one-round case (since we cannot remove all the stabilisers and logical operators from the twirling generating set). Hence, iterating over the whole search space might not be practical for a large number of rounds. However, we can still sample different conjugation schemes in our reduced search space to find a better scheme than doing nothing or even twirling, though such a scheme might not be optimal.

VI. CONCLUSION
In this article, we have shown that if the fidelity for a noise channel can be improved via a finite number of twirling operations, there exists a Pauli conjugation that can further improve the fidelity. To search for the optimal Pauli conjugation under a given noise model using a given quantum error correction code, we use the properties and the symmetries of the noise and the code to identify the equivalent conjugations and reduce our search space. We applied our techniques to the Steane code, the Shor code and the surface code under a global $Z$ rotation noise, reducing the $4^n$ possibilities of Pauli conjugation to 1, 3 and 5 equivalence classes respectively for those three codes. Iterating over these different classes of conjugations, we managed to find the optimal conjugations for each code, which resulted in higher logical fidelities than the twirled and original noise channels. We have shown via simulation that the advantages of the optimal Pauli conjugation schemes remain with gate errors present. Conjugation can also lead to higher concatenated thresholds than the twirled threshold. The conjugated threshold showed improvements over the original thresholds of 40%, 160% and 110% for the three codes we considered under the coherent $Z$ noise. We showed that the advantages of Pauli conjugation can remain for multiple rounds of error correction with the possibility of incorporating logical twirling, and we briefly discussed how to extend our arguments to multiple rounds of Pauli conjugation.
Compared to twirling, Pauli conjugation does not require the implementation of a random circuit, and the weights of the gates that we need to implement can be on average much smaller than for twirling, as shown by our examples. Being a deterministic scheme, it can be implemented in hardware systems in which modifying the circuit at each run is hard. It can also be used in quantum communication to combat coherent noise in the communication channel without needing to transmit the extra random bits needed by twirling. Single-qubit Pauli gates are usually the gates with the highest fidelity. Combined with the fact that the Pauli conjugation gates we need to implement can be low-weight, this should make the scheme resilient to gate errors, as shown by our simulation. Hence, Pauli conjugation can be a practical way forward to mitigate errors in real experiments.
The way we reduce the Pauli conjugation search space is highly dependent on the code we use and the noise model we have. Though our techniques work for the simple examples that we have considered, searching over all possible Pauli conjugations may not be feasible when the size of our system increases, when there are very few symmetries in the noise or when we are considering multiple rounds of Pauli conjugation. We can still sample over the different Pauli conjugations to find a scheme with better performance than the original noise channel instead of finding the optimal one. However, a better way may be to dig deeper into the reason why the optimal Pauli conjugation works for a given noise and code, which in turn may give insights into how to construct the optimal conjugation or at least lead us to a better searching strategy than random sampling.
Since we know that the coherence in the noise can be quite destructive to the threshold of the codes, we can look into how the different Pauli conjugation schemes affect the unitarity of the effective logical channels [29], to see if the removal of coherent components is one of the reasons that our technique works. Such investigations might give us insights into how to construct Pauli conjugation to reduce the coherent components in the noise or how to better search for such a strategy. We can also look into applying Pauli conjugation to more general error channels beyond the global Z rotation. An example will be the general local Z noise channel considered in [10] or some non-biased noise models like those considered in [30]. To see if the conjugation technique is valuable in fault-tolerant computation, it will also be interesting to see how Pauli conjugation will perform against gate-level coherent noise and whether it can improve the surface code threshold (instead of the concatenated threshold) given a realistic noise model.
There are several degrees of freedom we can add to further optimise our noise tailoring schemes. Firstly, throughout this article we have been focusing on conjugation using Pauli gates; it will be interesting to extend our technique to Clifford gates or even general unitaries. We can also look into the case where we allow Clifford correction [14]. We have certainly not exhausted all the ways to reduce the Pauli conjugation search space. For example, we have only been focusing on the permutation symmetry of the code and the noise, which at best can only prove that operators with the same weight are equivalent. A next step could be to include other Clifford symmetries such as CZ gates.
Our conjugation scheme, especially the multi-round variant, can in a way be viewed as bang-bang dynamical decoupling tailored to a given quantum error correction code. It will be a fruitful area to adapt more schemes from the established literature of dynamical decoupling [12] into the context of quantum error correction. We may get a fuller understanding of how to search for better multi-round conjugation schemes from the way dynamical decoupling is optimised using average Hamiltonian arguments [31] and group theoretic arguments [32]. Ideas like non-equidistant pulses [33], robust decoupling sequences and higher-order decoupling [12] can also be extended to multi-round conjugation.
Besides applications in quantum error correction for memory, the conjugation technique can also be extended to other fields like quantum metrology and quantum simulation. For quantum metrology with error correction [34][35][36], we hope to find conjugation schemes that can tailor the noise into a form that is less damaging to the code and/or tailor the signal into a form that the code is more sensitive towards. When applied to symmetry verification in quantum simulation [37][38][39], conjugation may enable more noise to be detected via transformation of the previously undetected noise components. In the above applications, it is likely that we need to develop more complex conjugation schemes beyond one-round Pauli conjugation.
ZC acknowledges support from Quantum Motion Technologies Ltd.
3. The full set of elements that can be generated by E does not contain any elements in S or G, otherwise we can replace the generators in E with the elements in S or G.
The full Pauli set G can be generated using all the single-physical-qubit X and Z gates. After removing the elements that are dependent on each other through composition with stabiliser generators and/or logical generators, we are left with the generating set E. Hence, we can always find an E that consists of only single-qubit X or Z operators. Here we show how to construct it. Any practical stabiliser error correction code will be able to detect and correct all single-qubit X and Z errors, hence all of these single-qubit errors will violate different subsets of stabiliser checks. The way we construct E is:
1. Find all single-qubit X and Z errors that violate only one stabiliser check and add them to E. We will denote the set of stabiliser checks that they violate as $S_E$.
2. Starting with n = 2, search among the physical qubits checked by the stabiliser checks in $S_E$. There will be X or Z errors on these qubits that fail n stabiliser checks, with one and only one of the failed stabiliser checks not in $S_E$. For each such error that we find, we add it to E and add the one additional violated stabiliser check to $S_E$. Note that for each new element added to $S_E$, we will have more physical qubits to check.
3. Repeat step 2 with n increasing by 1 in each iteration until $S_E = S$, i.e. until $S_E$ contains all the stabiliser checks (or equivalently until $|E| = |S|$).
In the case of a topological code with boundaries, the above scheme simply starts by adding the stabiliser checks at the boundary to $S_E$ and slowly progresses inwards, adding the inner stabiliser checks to $S_E$ until all stabiliser checks are within $S_E$. With this construction, there is no composition of elements in E for which no stabiliser check fails, hence there is no way to compose stabilisers or logical operators out of these elements.
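The three construction steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: stabiliser generators are given as Pauli strings, the function names `violated_checks` and `build_error_generators` are hypothetical, and the case n = 1 plays the role of step 1. Unlike a literal reading of step 1, the sketch skips a candidate whose single violated check is already covered, so that the chosen errors stay independent.

```python
def violated_checks(pauli, qubit, stabilisers):
    """Indices of the stabiliser generators that anticommute with a
    single-qubit X or Z error acting on `qubit`."""
    return {i for i, s in enumerate(stabilisers) if s[qubit] not in ("I", pauli)}


def build_error_generators(stabilisers):
    """Greedy construction of E: accept an error that fails n checks when
    exactly one of the failed checks is not yet covered."""
    n_qubits = len(stabilisers[0])
    candidates = [(p, q) for p in "XZ" for q in range(n_qubits)]
    chosen, covered = [], set()
    for n in range(1, len(stabilisers) + 1):
        progress = True
        while progress:  # covering a new check can unlock further candidates
            progress = False
            for p, q in candidates:
                failed = violated_checks(p, q, stabilisers)
                new = failed - covered
                if len(failed) == n and len(new) == 1:
                    chosen.append((p, q))
                    covered |= new
                    progress = True
        if len(covered) == len(stabilisers):
            break
    return chosen, covered


# Steane code: three X checks and three Z checks on the same supports.
supports = ["IIIXXXX", "IXXIIXX", "XIXIXIX"]
steane = supports + [s.replace("X", "Z") for s in supports]
generators, covered = build_error_generators(steane)
print(generators)  # (Pauli type, 0-indexed qubit) pairs
```

For the Steane code the search already terminates at n = 1, because three qubits each sit in exactly one X check and one Z check; codes with boundaries, such as the surface code, would need larger n.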

Appendix B: Construction of W
With the generating set of twirling gates obtained in Section III B, we can now generate the full twirling set W. The elements in the twirling set W correspond to the error operators that are detectable by our quantum error correction code, and they all have different syndromes. For the purpose of twirling, we would want to replace these operators with the lowest-weight error operators that produce the same syndrome (i.e. equivalent up to composition with stabilisers and logical operators), since operators with lower weight are easier to implement and induce fewer errors. These are usually just the recovery operators of the given syndromes, in which case we can simply obtain them from the decoder. For a distance-d code, by definition all errors with weight $d - 1$ or lower produce non-trivial error syndromes, and all errors with weight $(d-1)/2$ or below have different syndromes (and are thus correctable). Hence, all $W \in$ W with weight $(d-1)/2$ or below are already the lowest-weight operators that can produce the given syndrome, while for the others in W it may be possible to find a lower-weight equivalent. This is not guaranteed, since some correctable errors can have higher weight than $(d-1)/2$, e.g. in a surface code with very long Z boundaries and very short X boundaries.
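When no decoder is at hand, the lowest-weight representative of each syndrome can be found by brute force for small codes. The sketch below is an assumption-laden illustration rather than the method used for the results in this article: it only treats X errors against Z checks of a CSS code, checks are given as 0-indexed qubit supports, the checks are assumed independent, and the names `syndrome` and `lowest_weight_representatives` are hypothetical. The cost grows exponentially with the number of qubits.

```python
from itertools import combinations


def syndrome(error_qubits, checks):
    """Parity of the overlap between an X-error support and each Z check."""
    return tuple(len(set(error_qubits) & set(c)) % 2 for c in checks)


def lowest_weight_representatives(n_qubits, checks):
    """Map each syndrome to the first, hence lowest-weight, X error producing it."""
    table = {}
    for weight in range(n_qubits + 1):
        for qubits in combinations(range(n_qubits), weight):
            table.setdefault(syndrome(qubits, checks), qubits)
        if len(table) == 2 ** len(checks):  # every syndrome has been reached
            break
    return table


# Z checks of the Steane code, given as qubit supports (0-indexed).
steane_z_checks = [(3, 4, 5, 6), (1, 2, 5, 6), (0, 2, 4, 6)]
table = lowest_weight_representatives(7, steane_z_checks)
for s, qubits in sorted(table.items()):
    print(s, qubits)
```

Because the weights are enumerated in increasing order, the first operator stored for a syndrome is automatically one of minimal weight.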

Appendix C: Twirling Generators Reduction
Using to denote 'super-super-operators': we can rewrite the twirling process as: Here we have implicitly assumed that W only acts on N : Hence, the matrix elements for the twirled logical channel are All Pauli super-operators commute, thus all Pauli super-super-operators also commute. Hence, we can arrange the order of the twirling generators in $\prod_{W \in G} (I + W)$ in any way we want. We will arrange it in the following way where $E_c$ is a subset of E that acts trivially on the noise N when used for twirling, as discussed in Section III B. $\prod_{E_c} (I + E_c)$ will act trivially on N and can be absorbed by N when we put it closest to N.
On the other hand, the twirling of S and G will act trivially on the error correction code and the logical states (see Section III A), hence we can put them nearest to the logical states to remove them.
In Section III C, the equivalence of the twirling gates is obtained via interaction with both the noise elements and the logical states. To allow a generator to interact with both the noise elements and the logical states, we need to permute the symmetry operator of the noise elements or the logical states through the other generators, which will modify them. Hence instead of proving equivalence of generators, we expand out the product of generators to obtain a linear combination of the elements in the twirling set, and prove their equivalence instead.

Appendix D: Derivation of Eqn (4)
If all the code state basis Π 0 G and the physical noise channel N are invariant under the Clifford transformation U : then the recovery channel R will also be invariant under the same transformation, since it is completely determined by the code and the error channel: $[U, R] = 0$.
Hence, we have: Since W and R are both Pauli channels, they commute, hence

For the noise model in (6) and for all the codes we considered, which have global logical Z gates, we have: For the worst-case fidelity Q for such pure Z noise, we start and measure in the logical $|+\rangle_L$ eigenstate: For the fidelity of one qubit, we have: Hence, the logical fidelity curve $F(\theta)$ is rotationally symmetric about a point at $\theta = \pi/4$.

2. Fidelity at $\theta = \pi/4$

For the noise model in (6), at $\theta = \pi/4$ we have: For an operator U consisting of a tensor product of single-qubit Z, we will write the set of qubit indices on which we apply the Z gate as U : Hence, the terms in the expansion of (E1) will be In the case of measuring the zero syndrome, $N(\pi/4)$ will collapse into a superposition of stabilisers and logical Z operators. For each stabiliser term $(-i)^{|S|} S$ in the expansion, there will be a corresponding logical Z operator term that differs by applying Z to all qubits: $(-i)^{J-|S|} S \prod_j Z_j = (-i)^{J-|S|} S \overline{Z}$, hence the terms in the expansion corresponding to the zero syndrome are: If all stabilisers have even weights, i.e. $|S| = 2n$, then Thus if all the stabilisers have even weights, and the logical Z operator consists of applying Z to all the qubits, then the terms in the expansion of (E1) are: and similarly for other syndromes. Note that terms of other syndromes will also result in the same amplitude $|S|$. Hence, we have the same probability of collapsing into any syndrome. For an odd number of qubits J, we then have a logical Z rotation of angle $\pi/2$ (or $-\pi/2$), which means a worst-case fidelity of $1/2$ and an average fidelity of $\frac{2}{3} \cdot \frac{1}{2} + \frac{1}{3} = \frac{2}{3}$.
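As a worked illustration of the argument above, the following sketch spells out the expansion under the assumption that the noise model in (6) is the global rotation $e^{-i\theta \sum_j Z_j}$ on $J$ qubits (the form quoted in the later appendix on syndrome-conditioned channels). The symbols $|U|$ (the weight of a tensor product $U$ of single-qubit $Z$ operators) and $\overline{Z}$ (the logical $Z$ operator acting on all qubits) are notational assumptions made for this illustration.

$$
N(\pi/4) = \prod_{j=1}^{J} \left( \cos\frac{\pi}{4}\, I - i \sin\frac{\pi}{4}\, Z_j \right) = 2^{-J/2} \sum_{U} (-i)^{|U|}\, U .
$$

For a stabiliser $S$ of even weight $|S| = 2n$, the stabiliser term and its logical partner combine as

$$
(-i)^{|S|} S + (-i)^{J-|S|} S\, \overline{Z} = (-1)^{n}\, S \left( I + (-i)^{J}\, \overline{Z} \right),
$$

using $(-i)^{2n} = (-i)^{-2n} = (-1)^n$. For odd $J$ we have $(-i)^{J} = \mp i$, so every zero-syndrome term carries the same factor $I \mp i \overline{Z} \propto e^{\mp i (\pi/4) \overline{Z}}$, i.e. a rotation by $\pi/2$ on the logical Bloch sphere. Starting in $|+\rangle_L$ this gives a worst-case fidelity of $\cos^2(\pi/4) = 1/2$, and the average fidelity then follows as $(2 \cdot \frac{1}{2} + 1)/3 = \frac{2}{3}$, in agreement with the value stated above.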

Appendix F: Equivalent Conjugation Classes for Codes in Global Z Rotation
Recall the arguments in Section IV. We can make the following simplifications based on the codes and the noise model we consider.
• Our noise model is global, and thus has all possible qubit permutation symmetries. Together with the fact that our codes have one logical qubit and global Pauli gates, this means that we only need to consider the permutation symmetry of the stabilisers when we try to reduce the twirling set.
• The noise model is pure Z noise, thus all Z twirling generators can be removed. All Z stabiliser checks will also have trivial effects (except for the five-qubit code, which does not have pure Z checks), thus we only need to consider the permutation symmetry of the X stabilisers.

Five-qubit code
In the five-qubit code, the generators are
• Stabiliser generators S: XZZXI and three of its cyclic permutations IXZZX, XIXZZ, ZXIXZ
• Logical generators G: X or Z on all qubits.
Following Section III A, we can construct the twirling generators: As mentioned above, the Z twirling generators can be safely removed since we have pure Z noise: which generates the twirling set: Here we have transformed the error operator $X_1 X_2$ to its lowest-weight equivalent with the same error syndrome, $Z_4$. Conjugating the noise with $Z_4$ has a trivial effect since we have pure Z noise. The five-qubit code has cyclic permutation symmetry (there are also additional symmetries that we do not need to use here [10]). As discussed in Section III B, using these symmetry transformations, we can easily prove that conjugating the noise with $X_1$ is equivalent to conjugating with $X_2$, since $X_2 = U^\dagger X_1 U$ where U is one of the cyclic qubit permutation operators.
Hence, there are two equivalence classes of twirling gates: one is equivalent to I, while the other is equivalent to $X_1$ (or any single-qubit X gate by cyclic permutation).
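The claim that $X_1 X_2$ and $Z_4$ share the same syndrome in the five-qubit code can be checked directly from the stabiliser generators listed above. The following is a minimal sketch; the helper names `anticommute` and `syndrome` are hypothetical, Pauli operators are written as strings, and phases are ignored.

```python
def anticommute(p, q):
    """True if Pauli strings p and q anticommute (phases ignored)."""
    clashes = sum(a != "I" and b != "I" and a != b for a, b in zip(p, q))
    return clashes % 2 == 1


def syndrome(error, stabilisers):
    """One bit per stabiliser generator: 1 if the error violates that check."""
    return tuple(int(anticommute(error, s)) for s in stabilisers)


five_qubit = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]
s_x1x2 = syndrome("XXIII", five_qubit)  # X on qubits 1 and 2
s_z4 = syndrome("IIIZI", five_qubit)    # Z on qubit 4
print(s_x1x2, s_z4)
```

Both operators violate the first and the last generator only, so they are equivalent up to composition with stabilisers and logical operators.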

Nine-qubit Shor code
a. Local Z checks
As shown in Figure 4, in the nine-qubit Shor code, the generators are
• Stabiliser generators S:
• Logical generators G: X or Z on all qubits.
Following Section III A, we can construct our twirling generators: $W = E = \{X_1, X_3, X_4, X_6, X_7, X_9, Z_1, Z_7\}$. As mentioned above, the Z twirling generators can be safely removed since we have pure Z noise: $W = \{X_1, X_3, X_4, X_6, X_7, X_9\}$. When looking at the Z stabiliser checks, which produce the syndromes for these X error operators, we see that they are divided into 3 non-overlapping sets (no shared checked qubits), which are the individual rows in Figure 4. All the error syndromes within each row can be produced by single-qubit X errors within that row. The Z-check syndromes of different rows are independent of each other since they do not share any qubits. Hence, to produce all possible syndromes using error operators with the lowest weight, we will have zero or one single-qubit X error in each row. This set of error operators is the full twirling set W that is generated.
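The "zero or one single-qubit X error in each row" description can be enumerated explicitly. The sketch below assumes that the rows of Figure 4 are qubits (1, 2, 3), (4, 5, 6) and (7, 8, 9), which is inferred from the generators $X_1, X_3, X_4, X_6, X_7, X_9$ and is not confirmed by the figure itself; the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # assumed row layout of Figure 4

# Per row: no error (None) or a single-qubit X on one of the three qubits.
per_row_choices = [(None,) + row for row in rows]

# Each element is the tuple of qubits carrying an X gate.
twirling_set = [
    tuple(q for q in choice if q is not None)
    for choice in product(*per_row_choices)
]
weight_counts = Counter(len(op) for op in twirling_set)
print(len(twirling_set), dict(weight_counts))
```

The enumeration gives $4^3 = 64 = 2^6$ operators, matching the six X generators, and the operators fall into weights 0, 1, 2 and 3.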
The permutation symmetries that exist in the 9-qubit Shor code are any permutation of the elements within each row and any permutation between the rows shown in Figure 4. In such a case, all the operators with the same weight in our twirling set can be shown to be equivalent, leaving us with the following four equivalence classes of twirling operators.

b. Local X checks

It is still a nine-qubit Shor code, with a swap between the X and Z stabilisers.
Following Section III A, we can construct our twirling generators: As mentioned above, the Z twirling generators can be safely removed since we have pure Z noise: which generates the twirling set: Here we have transformed the error operator $X_1 X_7$ to its lowest-weight equivalent with the same error syndrome, $X_4$.
We have the same code symmetry as the other Shor code, which allows us to prove the equivalence of conjugation with $X_1$, $X_4$ and $X_7$.
Hence, there are two equivalence classes of twirling gates: one is equivalent to I, while the other is equivalent to $X_1$ (or any single-qubit X gate).

Distance-3 surface code
The stabiliser generators of the distance-3 surface code are shown in Figure 6.
Following Section III A, we can construct our twirling generators: As mentioned above, the Z twirling generators can be safely removed since we have pure Z noise: which generates the twirling set: The logical Pauli gates of the code are just the corresponding Pauli gates applied to all the physical qubits (because it is an odd-distance surface code), hence we only need to look at the symmetry of its stabilisers as discussed in Section III C. We can group the operators that are equivalent due to the rotational symmetry of the code: (X 1 , Since we have only pure Z noise, we only need to look at the symmetry that exists in the X stabilisers, leading to additional symmetry in the exchange between qubits (1, 2) and between qubits (8, 9). Applying this on top of the rotational symmetry, we have the following classes of equivalent conjugations:

Conditioned on Syndrome

When we expand the physical noise $e^{-i\theta \sum_j Z_j}$ into a sum of tensor products of Z, all the odd-weight terms will form the imaginary part, while all the even-weight terms will form the real part. Hence, when we flip the sign of θ, which is equivalent to taking the complex conjugate of our noise channel, all the odd-weight terms (the imaginary part) will flip their signs while all the even-weight terms (the real part) will remain the same.
By saying the operator is real (imaginary) here, we mean that the operator has real (imaginary) amplitude.
Since we are only considering Z noise, composing two Z operators together will not lead to any extra phase factor i. Thus when we compose an imaginary Z operator with a real Z operator, we will get an imaginary Z operator, while composing two imaginary or two real operators will give a real Z operator.
All the codes we consider here have stabiliser generators that are all even-weight, which means that the stabilisers are all real in the noise expansion. For our codes, one of the logical Z operators is odd-weight, which corresponds to an imaginary amplitude in the expansion, thus all logical Z operators are imaginary since they can be obtained by composing this imaginary Z with the real stabilisers. Hence, when we measure the 0 syndrome, the noise is collapsed into a coherent superposition of I with real amplitude and Z with imaginary amplitude. Flipping the sign of the physical error angle θ is the same as taking the complex conjugate, which will flip the sign of Z since its amplitude is purely imaginary. For other syndromes with error E, we still have one of EZ and EI being real and the other being imaginary, and hence a similar argument follows.
For codes with odd-weight stabiliser generators, we will have a complex amplitude (a mix of real and imaginary) for I. On the other hand, codes with even-weight logical operators (even-distance codes) will give real Z for real I and imaginary Z for imaginary I. In both cases, since the phases of I and Z are no longer guaranteed to differ by a factor of i, the logical channel for a given syndrome can no longer be written as a logical Z rotation, but is instead a combination of a logical Z rotation and a logical dephasing channel. This was observed by Huang et al. [10] for the even-distance repetition code and the distance-4 surface code.
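The parity argument used in this appendix can be checked numerically. The sketch below assumes the global noise $e^{-i\theta \sum_j Z_j} = \prod_j (\cos\theta\, I - i \sin\theta\, Z_j)$, so that every weight-$w$ tensor product of Z operators carries the amplitude $\cos^{J-w}\theta\, (-i \sin\theta)^w$; the function name `z_term_amplitude` and the values of `theta` and `J` are arbitrary illustrative choices.

```python
import math


def z_term_amplitude(theta, n_qubits, weight):
    """Amplitude of any weight-`weight` Z string in exp(-i*theta*sum_j Z_j)."""
    return (math.cos(theta) ** (n_qubits - weight)) * ((-1j * math.sin(theta)) ** weight)


theta, J = 0.3, 7
amplitudes = {w: z_term_amplitude(theta, J, w) for w in range(J + 1)}
flipped = {w: z_term_amplitude(-theta, J, w) for w in range(J + 1)}

for w in range(J + 1):
    kind = "real" if w % 2 == 0 else "imaginary"
    print(w, kind, amplitudes[w], flipped[w])
```

Even-weight terms come out real and unchanged under $\theta \to -\theta$, while odd-weight terms are purely imaginary and change sign, which is the complex-conjugation behaviour described above.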
Appendix H: Pauli Conjugation for the Other 9-qubit Shor Code
In the main text, we have only shown results for the 9-qubit Shor code with local Z checks. For the 9-qubit Shor code with local X checks, the average fidelity of different conjugation schemes and the concatenated threshold plots are shown in Figures 10 and 11.
For the Z noise we considered, the 9-qubit Shor code that has local X checks will actually have better performance since it has more X checks, which are sensitive to Z noise. This is shown by the large gap between the two original fidelity curves in Figure 12. However, after using conjugation to tailor the noise to fit the code, the Shor code with local Z checks receives a huge boost in fidelity such that it even exceeds the fidelity of the other Shor code with conjugation. This further exemplifies the power of Pauli conjugation when there is a misfit between the code and the noise.

All the codes that we have considered in this section have one logical qubit and transversal Z gates. When they undergo coherent Z noise, the effective logical error for a given measured syndrome m after correction will be a logical Z rotation of angle $\theta_m$. Hence, the effective logical error channel averaged over all the syndrome If we twirl the Z noise channel, then we will have a logical dephasing channel instead. Thus $N_T$ is a diagonal matrix with eigenvalues $1, 1 - 2p_d, 1 - 2p_d, 1$, where $p_d$ is the dephasing probability.
Hence, for k rounds of error correction (each round undergoing the same noise as before), we have $\mathrm{Tr}(N_T^k) = 2 + 2(1 - 2p_d)^k$ with the logical fidelity being: Using these formulae, in Figure 13 we have plotted the logical fidelity of different schemes for different codes after 100 cycles; in each cycle, error correction follows a period of exposure to the environment which induces a global Z rotation with angle θ. In all codes, we can see the improvements of the Pauli conjugation schemes and the twirling scheme over doing nothing at small θ. In the Shor code and the surface code, we can see that the advantage of our optimal conjugation scheme over twirling still remains for a small error angle θ in each round even after 100 rounds. Nevertheless, we can see that in many cases twirling becomes superior to even our best Pauli conjugation after sufficient cycles have occurred; fortunately this can be entirely remedied by an adaptation we term 'logical twirling', which we explained earlier in Section V.
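The trace formula for the twirled channel can be checked numerically with a short sketch. It assumes NumPy is available and orders the Pauli transfer matrix basis as (I, X, Y, Z). The conversion from trace to fidelity uses the standard relation $F = (\mathrm{Tr}\,N + d)/(d(d+1))$ for the average fidelity of a trace-preserving channel on a $d$-dimensional system; this relation is an assumption made here for illustration and is not necessarily the expression used for Figure 13. The function names are hypothetical.

```python
import numpy as np


def twirled_ptm(p_d):
    """Pauli transfer matrix of a logical dephasing channel (basis I, X, Y, Z)."""
    return np.diag([1.0, 1.0 - 2.0 * p_d, 1.0 - 2.0 * p_d, 1.0])


def trace_after_rounds(p_d, k):
    """Trace of the k-fold composition of the twirled logical channel."""
    return float(np.trace(np.linalg.matrix_power(twirled_ptm(p_d), k)))


def average_fidelity_from_trace(trace, dim=2):
    """Assumed standard relation F = (Tr(N) + d) / (d (d + 1))."""
    return (trace + dim) / (dim * (dim + 1))


p_d, k = 0.01, 100
trace_k = trace_after_rounds(p_d, k)
closed_form = 2 + 2 * (1 - 2 * p_d) ** k
print(trace_k, closed_form, average_fidelity_from_trace(trace_k))
```

Under this assumed relation the multi-round fidelity of the twirled channel is $\frac{2}{3} + \frac{1}{3}(1 - 2p_d)^k$, which decays monotonically towards $2/3$ as the number of rounds grows.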

Appendix J: Multi-round Twirling Set Reduction
The effective error channel with K rounds of twirling is: The argument about the structure of the noise (Section III B) can still be applied to the twirling within each individual round here, giving us a smaller set of twirling generators W, from which we can obtain a reduced twirling set W: Since we are summing over all possible W and the twirling set is a group under super-operator composition, we can make the following change of variables: $W_k W_{k+1} \Rightarrow W_{k+1}$, which gives: In this form, our arguments about the interaction of twirling with the code space in Section III A can be applied to the outermost twirling set, obtaining a reduced twirling set $W_1$. Hence, we have (N T K ) G,G = 1

Within each round the noise is the same coherent rotation of strength θ. These figures illustrate the issue that is tackled through 'logical twirling', as we explain in Section V.
Similar arguments to Section III C can be made about the symmetries in both noise and code. However, rather than proving the equivalence of using two different Pauli operators in conjugation, we will now prove the equivalence of using two different sets of Pauli operators in conjugation: i.e. after finding the symmetry $U$, we can say that a Pauli conjugation set $W$ is equivalent to $W'$ when $U W U^\dagger = W'$.
Here $U W U^\dagger$ is defined as: Note that this is not a simple tensor product of the single-round case. For example, if $W_1$ is equivalent to $W_1'$ due to symmetry $U$ and $W_2$ is equivalent to $W_2'$ due to another symmetry $U'$, this does not mean that $W = (W_1, W_2)$ is equivalent to $W' = (W_1', W_2')$, since the two elements are related by different symmetries: $U W U^\dagger \neq W' \neq U' W U'^\dagger$.
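The point that a single symmetry must relate every round at once can be illustrated with a toy example. The sketch below assumes that the symmetries are qubit permutations acting on phase-free Pauli strings and that the same symmetry is applied to every round of the conjugation set; the helper names `permute` and `conjugate_sequence` and the three-qubit strings are purely hypothetical and are not taken from any of the codes studied here.

```python
def permute(pauli, perm):
    """Conjugate a Pauli string by a qubit permutation: qubit i moves to perm[i]."""
    out = ["I"] * len(pauli)
    for i, p in enumerate(pauli):
        out[perm[i]] = p
    return "".join(out)


def conjugate_sequence(sequence, perm):
    """Apply the SAME permutation symmetry to every round of a multi-round scheme."""
    return tuple(permute(w, perm) for w in sequence)


shift = (1, 2, 0)      # plays the role of U
inv_shift = (2, 0, 1)  # plays the role of U'

W = ("XII", "XII")
# Round 1 is related by `shift`, round 2 by `inv_shift`.
W_prime = (permute(W[0], shift), permute(W[1], inv_shift))

print(W_prime)
print(conjugate_sequence(W, shift) == W_prime)
print(conjugate_sequence(W, inv_shift) == W_prime)
```

Each round of `W_prime` is individually related to the corresponding round of `W` by one of the two symmetries, yet neither symmetry applied to both rounds maps `W` to `W_prime`, so the round-by-round equivalences alone do not establish the equivalence of the two multi-round schemes.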
: Generating set. Note that A means that A can be generated from A, but does not mean that A is the complete set of elements that can be generated from A. In our paper, all compositions are carried out ignoring the irrelevant phase factors of the Pauli operators.
G: The Pauli set. It is not the Pauli group since we are ignoring all the phase factors.
W: The twirling set.