Introduction

The nemesis of quantum information processing is decoherence, the outcome of the inevitable interaction of a quantum system with its environment, or bath. Several methods exist that are capable of mitigating this undesired effect. Of particular interest to us here are quantum error correction (QEC)1,2,3,4 and dynamical decoupling (DD)5,6,7,8. QEC is a closed-loop control scheme which encodes information and flushes entropy from the system via a continual supply of fresh ancilla qubits, which carry off error syndromes. DD is an open-loop control scheme that reduces the rate of entropy growth by means of pulses applied to the system, which stroboscopically decouple it from the environment. QEC and DD have complementary strengths and weaknesses. QEC is relatively resource-heavy, but can be extended into a fully fault-tolerant scheme, complete with an accuracy threshold theorem9,10,11,12,13,14. DD demands significantly more modest resources, can theoretically achieve arbitrarily high decoherence suppression15,16,17,18,19,20,21,22, but cannot by itself be made fully fault-tolerant23.

A natural question is whether a hybrid QEC-DD scheme is advantageous relative to using each method separately in the setting of fault-tolerant quantum computing (FTQC). Typically, improvements in gate accuracy achieved by DD mean that more noise can be tolerated by a hybrid QEC-DD scheme than by QEC alone and that invoking DD can reduce the overhead cost of QEC. While early studies identified various advantages24,25,26, they did not address fault tolerance. A substantial step forward was taken in Ref. 27, which analyzed “DD-protected gates” (DDPGs) in the FTQC setting. Such gates are obtained by preceding every physical gate (i.e., a gate acting directly on the physical qubits) in a fault tolerant quantum circuit by a DD sequence. DDPGs can be less noisy than the bare, unprotected gates, since DD sequences can substantially reduce the strength of the effective system-environment interaction just at the moment before the physical gate is applied. The gains can be very substantial if the intrinsic noise per gate is sufficiently small and can make quantum computing scalable with DDPGs, where it was not with unprotected gates27.

The analysis in Ref. 27 assumed a “local” perspective. Rather than analyzing the complete FT quantum circuit, each single- or multi-qubit gate was separately DD-protected. This required a strong locality constraint limiting the spatial correlations in the noise, known as the “local bath” assumption. Unfortunately, many physically relevant error models violate this assumption12,13,14.

Here we aim to integrate DD with FTQC using a global perspective. This appears to be necessary in order to achieve high order decoupling in a multi-qubit setting, under general noise models. Rather than protecting individual gates we shall show how an entire FT quantum register, including data and ancilla qubits, can be enhanced using DD. This will allow us to relax the restrictive local bath assumption. Along the way, we identify a DD strategy that takes into account the basic structure and building blocks of FT quantum circuits and identify optimal DD pulse sequences compatible with this structure, that drastically reduce the number of pulses required compared with previous designs. Such a reduction is crucial in order to reap the benefits of DD protection, for if a DD sequence becomes too long, noise can accumulate to such an extent as to outweigh any DD enhancements.

Results

The noise model

We assume a completely general noise Hamiltonian H acting on the joint system-bath Hilbert space, the only assumption being that ||H|| < ∞, where ||·|| denotes the sup-operator norm (the largest singular value, or largest eigenvalue for positive operators). Some noise models, such as bosonic baths, violate the ||H|| < ∞ assumption. In this case our analysis still applies, but operator norms must be replaced by frequency cutoffs5,14,27. Informally, H contains a “good” and a “bad” part, the latter being the one we wish to decouple. H is k-local, i.e., involves up to k-body interactions, with k ≥ 1. We allow for arbitrary interactions between the system and the bath, as well as between different parts of the system or between different parts of the bath. See Fig. 1.

Figure 1

Qubits and corresponding baths represented as white and black circles respectively.

Bath operators corresponding to different operators inside a box do not necessarily commute, while they do if the baths are in different boxes. The Hamiltonians considered are general within each box, but not between them. In (a) a diagram of the “local bath assumption” used in Ref. 27 is shown, while (b) represents the general scenario considered in fault-tolerance12,13,14. In (c) we illustrate one of our key results: domains are allowed to grow logarithmically in the size of the problem the FTQC is solving. The dark grey boxes represent such domains, each containing O[log(ktot)] physical qubits at the highest level of concatenation, where ktot is the total number of logical qubits. When two domains need to interact (light grey box), then the joint DD generator set is used and the locality of the bath is updated accordingly.

Dynamical decoupling

DD pulse sequences comprise a series of rapid unitary rotations of the system qubits about different axes, separated by certain pulse intervals and generated by a control Hamiltonian $H_C(t)$. They are designed to suppress decoherence arising from the “bad” terms in H. This is typically manifested in the suppression or even vanishing of the first N orders, in powers of the total evolution time T, of the time-dependent perturbation expansion (Dyson or Magnus series28) of the evolution operator $U(T) = \mathcal{T}\exp[-i\int_0^T \tilde{H}(t)\,dt]$, where $\tilde{H}(t)$ is H in the “toggling frame” (the interaction picture generated by the DD pulse sequence Hamiltonian $H_C(t)$)8 and $\mathcal{T}$ denotes time-ordering. When the first non-identity system-term of U(T) appears at $O(T^{N+1})$ one speaks of Nth order decoupling. Such DD sequences are now known and well understood.

Most DD sequences can be defined in terms of pulses chosen from a mutually orthogonal operator set (MOOS), i.e., a set of unitary and Hermitian operators $\Omega = \{\Omega_i\}_{i=1}^{|\Omega|}$, with $\Omega_i^2 = 1$ (identity) for all i and such that any pair of operators either commute or anticommute20. The generator set of a MOOS (gMOOS), $\hat\Omega$, is defined as the minimal subset such that every element of Ω is a product of elements of $\hat\Omega$ but no element in $\hat\Omega$ is itself a product of other elements in $\hat\Omega$ (throughout this work we denote the generator set of a set S by $\hat S$ and the cardinality of a set S by |S|). All deterministic DD sequences are finitely generated, meaning that the pulses are elements, or products of elements, of a finite DD generator set (DGS), which we identify with the gMOOS $\hat\Omega$.

The centralizer of the MOOS Ω is $C_\Omega = \{A : [A, \Omega_i] = 0\ \forall\, \Omega_i \in \Omega\}$, i.e., the set of operators which commute with all MOOS elements. A good example of a gMOOS is the generator set $\hat P_n = \{X^{(i)}, Z^{(i)}\}_{i=1}^{n}$, where $X^{(i)}$ ($Z^{(i)}$) denotes the Pauli-x (z) matrix acting on the ith qubit, of the Pauli group $P_n$ on n qubits (the group of all n-fold tensor products of the standard Pauli matrices $P_1 = \{1, X, Y, Z\}$, modulo the phases $\{\pm 1, \pm i\}$). For simplicity, since we will be dealing with qubits and are particularly interested in decoupling sequences that allow for bitwise pulses, we shall assume henceforth that $\Omega \subseteq P_n$. It is necessary to recast the notion of decoupling order in the MOOS scenario, since the previously mentioned notion turns out to be too strong for our purposes.
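
The centralizer can be made concrete for Pauli-type pulses. The following is a minimal sketch, not taken from the original work: it represents Pauli operators as strings over I, X, Y, Z (overall phases ignored) and enumerates by brute force the Pauli strings that commute with every element of a candidate generator set. The helper names `commutes` and `centralizer` are illustrative assumptions.

```python
from itertools import product

def commutes(p, q):
    """Return True if Pauli strings p and q commute.

    Two Pauli strings commute iff they differ on an even number of
    positions where both act non-trivially.
    """
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 0

def centralizer(generators, n):
    """All n-qubit Pauli strings (phases ignored) commuting with every generator."""
    return [
        "".join(s)
        for s in product("IXYZ", repeat=n)
        if all(commutes("".join(s), g) for g in generators)
    ]

# Full bitwise generator set on two qubits: only the identity survives.
print(centralizer(["XI", "ZI", "IX", "IZ"], 2))  # ['II']

# A smaller generator set leaves non-identity operators undecoupled.
print(centralizer(["XX", "ZZ"], 2))  # ['II', 'XX', 'YY', 'ZZ']
```

In this picture, the non-identity operators returned by `centralizer` are exactly the terms that a sequence built from the given generators cannot remove.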

Note that any operator A can be decomposed as $A = A_0 + A_r$, where $A_0$ ($A_r$) denotes the component that commutes (does not commute) with all elements of a MOOS, i.e., $[A_0, \Omega_i] = 0$ for all $\Omega_i \in \Omega$. We shall say that a pulse sequence with generator set $\hat\Omega$ lasting total time T achieves “Nth order $\hat\Omega$-decoupling” if the joint system-bath unitary evolution operator at the conclusion of the sequence becomes

where the effective Hamiltonian is

When this holds we likewise say that the subspace invariant under and the subgroup have been decoupled to order N, in the sense that any operator not in appears only in O(TN+1). Thus the choice of the pulse generator set determines what subspace(s) can be decoupled and conversely a subspace one is interested in decoupling to arbitrary order implies a choice of .

Optimization of the DGS

We define the cost of a DD sequence as the total number of pulse intervals it uses to achieve Nth order $\hat\Omega$-decoupling. While this definition is motivated by the bang-bang limit of DD, i.e., zero-width pulses with finite pulse intervals, it can be modified to accommodate other types of constraints, such as finite bandwidth. The core of the arguments we develop here is unchanged for different cost functions, as long as there is an underlying group structure. For all known DD sequences (even those optimized for multiple qubits29), the cost is at least

$$c = [f(N)]^{|\hat\Omega|}, \tag{3}$$

where f(N) depends on the particular DD sequence. Pulse interval optimization has already reduced f(N) from $2^N$ for CDD to N + 1 for NUDD. Here we are concerned instead with the optimization of the cost exponent $|\hat\Omega|$, to which end the following theorem will prove to be crucial:
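
To get a feel for why the exponent matters more than the base, here is a small numerical sketch. It assumes the cost takes the form of f(N) raised to the number of generators, which matches the NUDD pulse count $(N+1)^{2n}$ quoted in Methods for the 2n-element bitwise generator set; the function names are hypothetical.

```python
def dd_cost(f_of_n, num_generators):
    """Lower bound on pulse intervals: f(N) raised to the generator-set size."""
    return f_of_n ** num_generators

def f_cdd(order):
    """Per-generator factor for CDD, f(N) = 2^N (as stated in the text)."""
    return 2 ** order

def f_nudd(order):
    """Per-generator factor for NUDD, f(N) = N + 1 (as stated in the text)."""
    return order + 1

n_qubits, order = 3, 2
full_dgs = 2 * n_qubits  # bitwise X and Z on every qubit

print(dd_cost(f_cdd(order), full_dgs))   # 4096
print(dd_cost(f_nudd(order), full_dgs))  # 729

# Removing generators shrinks the cost geometrically, whatever f(N) is.
print(dd_cost(f_nudd(order), full_dgs - 2))  # 81
```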

Theorem 1: Let B be a subgroup of the Pauli group $P_n$, generated by $\hat B$. Consider a DGS $\hat\Omega$ which decouples B. Then $|\hat\Omega| \ge |\hat B|$. Moreover, the DGS $\hat\Omega = \hat B$ decouples B in the desired sense and automatically saturates the bound.

As an immediate application, we reproduce the well-known result that and hence , is optimal for n qubits without encoding8. Indeed, in this case the most general noise Hamiltonian is spanned by the elements of the “error group” B = Pn, so and thus by Theorem 1 for any DGS it must be that . On the other hand indeed decouples Pn since . Note also that Eq. (2c) yields . Moreover, since DD sequences are known that achieve Nth order -decoupling for n ≥ 1 qubits (specifically CDD15 and NUDD20, with explicit -based constructions given in Ref. 20), the generating set is the smallest one capable of achieving Nth order decoupling of a general n-qubit Hamiltonian. However, as we discuss next, there is a better choice for the purpose of protecting a code subspace.

DD generator set for a QEC code

Consider a set of n physical qubits encoding k logical and r gauge qubits via some distance d code, i.e., an [[n, k, r, d]] subsystem code30,31,32 (or an [[n, k, d]] stabilizer code4 for r = 0), subject to the general noise model described above. Let $\hat S = \{S_i\}_{i=1}^{Q}$ denote the stabilizer generators, where Q = n − (k + r), let $\hat L$ denote the logical-operator generators of the code and $\hat G$ the gauge operator generators. In the [[n, k, d]] code case, each error correctable by the code maps a codeword to a syndrome subspace labelled by an error syndrome, i.e., a sequence of ±1 eigenvalues of the stabilizer generators4. In order to properly integrate DD with QEC, we require a set of DD generators $\hat\Omega$ which preserves the error syndromes to order N, i.e., such that the effective evolution acts trivially on each of the syndrome subspaces and does not mix them, so that at the conclusion of the sequence the original noise model for which the code was chosen is preserved (again, to order N). This form of the Nth order $\hat\Omega$-decoupling requirement will enable error correction to function as intended. A simple but important observation is that, in light of this, we do not need to protect the complete $2^n$-dimensional Hilbert space, but rather the $2^{n-k}$ syndrome subspaces. We seek to minimize the impact of decoupling an encoded state.

An intuitive choice for the DGS is the complete set of stabilizer and logical operator generators, i.e., let $\hat\Omega_{\rm SL} = \hat S \cup \hat L$, with $\hat S$ the stabilizer generators and $\hat L$ the logical-operator generators. Then $|\hat\Omega_{\rm SL}| = Q + 2k = n + k - r$. We refer to any DD sequence having a DGS of this type as a “stabilizer-logical DD” (SLDD) sequence. While this choice is natural, it is not obvious that $\hat\Omega_{\rm SL}$ is also optimal, and this is where Theorem 1 will prove to be useful. (One might try instead to choose as a DD sequence generator set the stabilizers only, i.e., let $\hat\Omega = \hat S$ (Ref. 24). However, since the logical operators of the same code commute with these stabilizer DD pulses, they are not decoupled and hence have non-trivial action on the code subspace, thus causing logical errors. Formally, when $\hat\Omega = \hat S$, the centralizer $C_\Omega$ will contain logical operators.) Now note that if $\hat S \subseteq \hat\Omega$, then the elements in $\hat S$ commute and they define subspaces characterized by their eigenvalues under the action of $\hat S$. In this case we have independent Nth order $\hat\Omega$-decoupling of each of these subspaces. In other words, the decoupled evolution leaves each of the syndrome subspaces invariant and does not mix them, as desired. Note that the choice $\hat\Omega_{\rm SL}$ also applies to subsystem codes30,31,32. In this case each of the syndrome subspaces can be decomposed as $\mathcal{H}_L \otimes \mathcal{H}_G$, where $\mathcal{H}_L$ is invariant under the gauge operators, since the gauge operators act non-trivially on $\mathcal{H}_G$ only.

Optimal DGS for concatenated QEC codes

Many FTQC constructions are based on concatenated QEC codes33, so what is the optimal DGS for this case, cost-wise? Suppose an [[n, k, r, d]] code is concatenated R times. A complete generator set for all the stabilizers of such a code is given by $\hat S(R) = \bigcup_{q=1}^{R} \hat S_q$, where $\hat S_q$ is the stabilizer generator set of concatenation level q. Let $\hat L(R)$ denote the set of Rth-concatenation level logical generators.

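Theorem 2: The optimal DGS for decoupling all the syndrome subspaces at concatenation level R is the SLDD set $\hat\Omega_{\rm SL}(R) = \hat S(R) \cup \hat L(R)$, i.e., the union of the stabilizer generators of all levels and the top-level logical generators, where $|\hat\Omega_{\rm SL}(R)| = n^R - (k + r)^R + 2k^R$.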

Note that the above theorem is general in the sense that it is not limited to concatenated QEC codes, as it only requires a stabilizer structure and a set of logical generators satisfying the MOOS structure. The only thing that will change for other stabilizer codes, such as surface or topological codes34, is the actual expression in terms of n, k. Note also that by setting R = 1 Theorem 2 reduces to the optimality of SLDD for subspace or subsystem codes, with $|\hat\Omega_{\rm SL}| = n + k - r$ as claimed above. The subspace case is recovered by setting r = 0. Let us now quantify the gain achieved by choosing the optimal DGS for QEC codes.

Relative cost of SLDD

For an [[n, k, d]] code and an SLDD sequence, the number of stabilizer generators (n − k) plus logical operator generators (2k) yields $|\hat\Omega_{\rm SL}| = n + k$, which means that $|\hat\Omega_{\rm SL}| \le 2n$, the size of the full bitwise generator set. Often $k \ll n$, so that $|\hat\Omega_{\rm SL}| \approx n$. In the case of [[n, k, r, d]] subsystem codes32 the advantage is more pronounced: the number of stabilizers is n − k − r, so $|\hat\Omega_{\rm SL}| = n + k - r$. As an example consider the Bacon-Shor [[m × m, 1, $(m-1)^2$, m]] subsystem code30, which has the highest (analytically) known fault-tolerant threshold for error correction routines with12 and without measurements35. In this case one would have $|\hat\Omega_{\rm SL}| = 2m$ as opposed to $2m^2$ for full decoupling, a polynomial advantage that grows with the block size m.

Let us illustrate this using the 9-qubit Bacon-Shor code. This code is defined via the stabilizer generator set {XXXXXXIII, XXXIIIXXX, ZZIZZIZZI, IZZIZZIZZ} and has logical operators $Z_L$ = ZIIZIIZII, $X_L$ = XXXIIIIII. Using the above result we can decouple the relevant subspaces using just the six-element DGS

$$\{XXXXXXIII,\ XXXIIIXXX,\ ZZIZZIZZI,\ IZZIZZIZZ,\ ZIIZIIZII,\ XXXIIIIII\},$$

as opposed to the full-decoupling case which would use the eighteen-element DGS containing $\{X_i, Z_i\}_{i=1}^{9}$, where $A_i$ denotes the operator A acting on qubit i.
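
The 9-qubit example can be checked mechanically. The sketch below is illustrative and not part of the original analysis: it encodes each Pauli string as a binary (symplectic) vector, computes the rank of a generator set over GF(2), and uses the standard fact that the number of n-qubit Pauli strings (phases ignored) commuting with a set of rank m is $4^n / 2^m$. All function names are assumptions made for this example.

```python
def symplectic(p):
    """Encode a Pauli string as an integer bitmask (x bits first, then z bits)."""
    n, v = len(p), 0
    for i, c in enumerate(p):
        if c in "XY":
            v |= 1 << i
        if c in "ZY":
            v |= 1 << (n + i)
    return v

def gf2_rank(vectors):
    """Rank over GF(2) of bitmask vectors, via incremental elimination."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v if present
        if v:
            basis.append(v)
    return len(basis)

def commutes(p, q):
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 0

def undecoupled_count(dgs, n):
    """Number of Pauli strings (phases ignored) commuting with every DGS element."""
    return 4 ** n // 2 ** gf2_rank([symplectic(g) for g in dgs])

stabilizers = ["XXXXXXIII", "XXXIIIXXX", "ZZIZZIZZI", "IZZIZZIZZ"]
logicals = ["ZIIZIIZII", "XXXIIIIII"]
sldd = stabilizers + logicals
full = ["I" * i + p + "I" * (8 - i) for i in range(9) for p in "XZ"]

# The only anticommuting pair inside the SLDD set is (Z_L, X_L).
pairs = [(p, q) for i, p in enumerate(sldd) for q in sldd[i + 1:] if not commutes(p, q)]
print(pairs)                                  # [('ZIIZIIZII', 'XXXIIIIII')]
print(len(sldd), undecoupled_count(sldd, 9))  # 6 4096
print(len(full), undecoupled_count(full, 9))  # 18 1
```

The 4096 = 2^12 operators left untouched by the six-element set are consistent with a group generated by the 4 stabilizer generators and 2 × 4 gauge generators, i.e., operators that do not act as errors on the encoded information.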

Choice of DGS for protecting ancilla states

The protection of certain ancilla states is also an important part of fault tolerance. Such states can be thought of as QEC codes with small stabilizer sets. E.g., is often used for fault-tolerant stabilizer measurements or for teleportation of encoded information. The stabilizer is generated by and equals the DGS.

Decoupling multiple subgroups or invariant subspaces

Keeping in mind our goal of integrating DD with FTQC for a complete quantum register, a common scenario is that in which multiple subgroups have to be simultaneously decoupled. While in principle the optimal DGS can be found for any particular subgroup, as we have already done for QEC codes and certain ancillary states, one would like a way to build the optimal DGS for a collection of subgroups. How can this be done optimally? Assume that there are distinct and non-overlapping sets of {ni} physical qubits comprising a quantum register, e.g., a complete register comprising k logical qubits, along with the corresponding ancillas. Assume that they are partitioned into sets of sizes , such that and that each set i is encoded in some subsystem (or subspace) code [[ni, ki, ri, di]]. For each block of ki logical qubits we have an SLDD sequence with DGS .

Let the Hamiltonians of the different sets be {Hi} and spanned by the error groups . Using Theorem 1, it follows that if is optimal for error group Bi then optimally decouples the joint Hamiltonian spanned by . This form of composing a larger DGS out of the union of smaller, independent DGSs guarantees that each term of a general Hamiltonian acting on the whole register must anticommute with at least one element in , which in turn implies that , used to construct, e.g., a CDD or NUDD sequence, is capable of independent Nth order -decoupling of each of the subgroups. This can be formalized as follows:

Corollary 1: Let Bi be a subgroup of the Pauli group , generated by . If and satisfies the MOOS conditions, then the DGS which decouples the collection of subgroups {Bi} satisfies . Moreover, the DGS decouples in the desired sense and automatically saturates the bound.

This result is important since it allows the modular construction of optimal DGSs, such as the one required for decoupling two or more encoded qubits, or an encoded qubit and an ancilla.
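
The modular construction can be sketched for a toy register. The example below is an illustration under simplifying assumptions (two unencoded blocks of one and two qubits, Pauli strings with phases ignored, hypothetical helper names): each block's generators are padded with identities to act on the full register, and the union is checked against every Pauli string on the register, including terms that couple the two blocks.

```python
from itertools import product

def commutes(p, q):
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 0

def embed(op, offset, total):
    """Pad a block operator with identities so it acts on the whole register."""
    return "I" * offset + op + "I" * (total - offset - len(op))

def undecoupled(dgs, n):
    """Pauli strings on n qubits that commute with every DGS element."""
    return [
        "".join(s)
        for s in product("IXYZ", repeat=n)
        if all(commutes("".join(s), g) for g in dgs)
    ]

block_a = ["X", "Z"]                 # one-qubit block
block_b = ["XI", "ZI", "IX", "IZ"]   # two-qubit block
total = 3

joint = [embed(g, 0, total) for g in block_a] + [embed(g, 1, total) for g in block_b]
print(joint)                           # ['XII', 'ZII', 'IXI', 'IZI', 'IIX', 'IIZ']
print(undecoupled(joint, total))       # ['III']

# Using only block A's generators leaves every operator on block B untouched.
print(len(undecoupled(joint[:2], total)))  # 16
```

The size of the joint set is the sum of the block sizes, so the cost exponent is additive across blocks.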

Discussion

We have now assembled and described all the ingredients for optimally combining DD with FTQC for protection of a complete quantum register. However, we must ensure that the cost of implementing the DD sequence does not spoil quantum speedups. To this end we consider once more an [[n, k, r, d]] subsystem code concatenated R times, used to encode an entire quantum register, and divide the register into d(R) domains (e.g., a code block along with ancillas) of size $k_D(R) = O(k^R)$ logical qubits, such that the total number of logical qubits in the register is $k_{\rm tot} = d(R)\,k_D(R)$. We then optimally decouple the ith domain using an SLDD sequence generated by $\hat S_i(R) \cup \hat L_i(R)$ (where the stabilizer generators $\hat S_i(R)$ and logical generators $\hat L_i(R)$ act non-trivially only on the qubits in the domain i) and ask for the maximal allowed size of each domain such that the DD sequence cost scales polynomially in $k_{\rm tot}$, as this will ensure that any exponential quantum speedup is retained.

Corollary 2: In a fault-tolerant quantum computation the maximal allowed domain size compatible with a DGS having cost $c = {\rm poly}(k_{\rm tot})$ is $O[\log(k_{\rm tot})]$.

Corollary 2 means that we can relax the local bath assumption27, an assumption tantamount to assuming constant domain size kD ≤ 2; instead we find that domains are allowed to grow logarithmically with problem size. Note that this scaling law for the domain size is the same as one would obtain in the absence of encoding. I.e., the code structure does not impose a fundamental limit on the profitable combination of DD and QEC. The scaling law is thus a statement regarding known DD sequences, in particular CDD or NUDD, even when the DGS is optimized, as in SLDD. When two domains i and j are required to interact, the joint DGS should be used [see Fig. 1(c)]. The size of these domains is large enough that they can sustain a full logical qubit, i.e., an encoded logical qubit with all the necessary ancillas for quantum correction and single qubit gates. Moreover, the size of such a joint DGS is compatible with retaining exponential quantum speedups. If the result is that at the highest concatenation level the noise per gate has been reduced (as shown explicitly for the local bath setting in Ref. 27), then a reduction in the number of required concatenation levels is enabled, hence reducing the overall overhead, or the effective noise threshold.

So far we discussed the problem of protecting stored quantum information; what about computation? Quantum logic operations can be combined with DD, e.g., using “decouple while compute” schemes36,37, or (concatenated) dynamically corrected gates [(C)DCGs] for finite-width pulses38, or dynamically protected gates27 in the zero-width (ideal) pulse limit. The optimal SLDD scheme introduced here is directly portable into the latter two schemes, since they use the same DD building blocks and the associated group structure. It is important to emphasize that SLDD sequences require only bitwise (i.e., transversal) pulses and can be generated by one-local Hamiltonians, thus not altering the assumptions of the CDCG construction. More importantly, the polynomial scaling guaranteed by Corollary 2 also applies in the quantum logic scenario, thus allowing, in principle, a fidelity improvement without sacrificing the speedup of quantum computing.

In conclusion, in this work we identified the optimal decoupling generating set in the general context of protection of subspaces invariant under the action of a group and showed how this can be applied in the context of information encoded into a quantum error correcting code. This allowed us to show how DD and FTQC can be optimally integrated. In doing so we showed that one can simultaneously protect disjoint domains growing logarithmically with problem size, thus improving over the constant-size domains associated with the local-bath assumption made in earlier work on hybrid DD-FTQC strategies. Future work should focus on demonstrating that DD-enhanced FTQC results in improved resource overheads and lower noise thresholds and identify, or rule out, multi-qubit DD sequences with sub-exponential scaling in the cardinality of their generating sets.

Methods

Dynamical decoupling background

Concatenated DD (CDD)15, the first explicit arbitrary order DD method, uses a recursive nesting of elementary pulse sequences and (provided pulse intervals can be made arbitrarily small) can be used to achieve Nth order decoupling of n qubits with both N and n arbitrary, but requires a number of pulses that is exponential in both N and n15. Pulse-interval optimized sequences are now known for purely longitudinal or purely transversal system-bath coupling, requiring only N + 1 pulses for Nth order decoupling16. The Uhrig DD (UDD) sequence that accomplishes this was generalized to the quadratic DD (QDD) sequence for general decoherence of a single qubit19, which uses a nesting of the transversal and longitudinal UDD sequences to achieve Nth order decoupling using $(N + 1)^2$ pulses, an exponential improvement over CDD and concatenated UDD18. Both UDD and QDD are essentially optimal in terms of the number of pulses required and are provably universal for arbitrary, bounded baths17,21,22. Generalizing from QDD, nested UDD (NUDD) pulse sequences were proposed for arbitrary system-environment coupling involving n qubits or even higher-dimensional systems20. NUDD requires $(N + 1)^{2n}$ pulses to decouple n qubits to Nth order from an arbitrary environment. For more details see, e.g., the recent review39.

Proof of Theorem 1

Let B be generated by , so that and consider the DD generating set . One can associate to each bi a string where encodes the effect the pulse Ωα has on the error term bi (commutes or anticommutes), via if ΩαbiΩα = ±bi, i.e., . The total number of such strings is |B|, i.e., . Note that if is associated with the “identity string” {+,…,+} then it will not be decoupled since it commutes with all decoupling pulses. Now, we can associate bits (over the ± alphabet) to the DD sequence generators. From these bits we can construct exactly distinct strings , where and . Let us map the r(i′) strings, , to the s(i) strings, . Clearly, if B has “too many” elements, i.e., if , then the mapping will be one-to-many, i.e., some of the r(i′) strings will have to be repeated, meaning that the set will contain duplicates. The product of any two duplicate strings is the identity string {+,…,+}. But since B is a group, this means that the product of the two distinct elements of B associated with a duplicated string is also a group element and moreover is associated with the identity string. Since the elements of B are in the Pauli group, the product of any two distinct elements cannot be the identity operator. Thus we have shown that there is a non-identity element of B which is associated with the identity-string and hence is not decoupled. On the other hand, a DD generating set of cardinality exists and is just itself.
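
The counting argument can be illustrated numerically. The sketch below is a toy reading of the proof rather than the authors' construction: it generates a small Pauli error group (phases ignored), assigns each element its string of ± signs under conjugation by the pulses, and lists the non-identity elements whose string is all plus signs, which are the ones left undecoupled. All helper names are hypothetical.

```python
BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
CHARS = {bits: c for c, bits in BITS.items()}

def multiply(p, q):
    """Product of two Pauli strings, ignoring the overall phase."""
    return "".join(
        CHARS[(BITS[a][0] ^ BITS[b][0], BITS[a][1] ^ BITS[b][1])]
        for a, b in zip(p, q)
    )

def commutes(p, q):
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 0

def generate_group(generators):
    """All products of the generators (phases ignored)."""
    group = {"I" * len(generators[0])}
    for g in generators:
        group |= {multiply(g, e) for e in group}
    return group

def sign_string(element, dgs):
    """'+' where the pulse commutes with the element, '-' where it anticommutes."""
    return "".join("+" if commutes(w, element) else "-" for w in dgs)

def undecoupled(generators, dgs):
    identity = "I" * len(generators[0])
    return sorted(
        b for b in generate_group(generators)
        if b != identity and set(sign_string(b, dgs)) <= {"+"}
    )

error_gens = ["XI", "ZI", "IX", "IZ"]   # generates all 16 two-qubit Pauli strings

# Three pulses give at most 2^3 = 8 distinct sign strings for 16 group elements,
# so some strings repeat and a non-identity element shares the identity string.
print(undecoupled(error_gens, ["XI", "ZI", "IX"]))  # ['IX']

# Four pulses suffice: every non-identity element anticommutes with some pulse.
print(undecoupled(error_gens, error_gens))          # []
```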

Theorem 1 can in fact be generalized by allowing B to not be a subgroup of Pn, although we do not require or use this more general version here. The proof is similar: if a DGS satisfying the MOOS properties exists such that the only element of B that commutes with all elements in is 1 and, if each element in has a unique inverse then, following an argument similar to the one used in the proof of Theorem 1, such a DGS decoupling B satisfies . This more general result applies to higher dimensional subsystems, such as qudits. The existence of such a DGS is guaranteed, in particular, for subgroups of Pn.

Counting the number of stabilizer generators in a concatenated subsystem code

Before proving Theorem 2 we explain how the number of stabilizer generators behaves under concatenation. The total number of physical qubits n in an [[n, k, d]] stabilizer subspace code equals the sum of the Q = n − k stabilizer and k logical qubits4. After concatenating R times, $n(R) = n^R$ and $L(R) = k^R$, and hence $Q(R) = n^R - k^R$. Likewise, the total number of physical qubits n in an [[n, k, r, d]] stabilizer subsystem code equals the sum of the Q = n − (k + r) stabilizer, k logical and r gauge qubits32. One can always view an [[n, k, r, d]] subsystem code as an [[n, k′, d′]] subspace code with k′ = k + r and distance d′ ≤ d: in a subsystem code only the k qubits designated as logical qubits are associated with the code distance d, whereas the gauge qubits have distance at most d. For example, in the [[9, 1, 4, 3]] Bacon-Shor code30 the gauge qubits have distance 2 while the logical qubit has distance 3. Thus, after concatenating an [[n, k, r, d]] stabilizer subsystem code R times, the number of physical qubits is $n(R) = n^R$, which equals the sum of the Q(R) stabilizer qubits, $L(R) = k^R$ logical qubits (with distance d) and G(R) gauge qubits (with distance ≤ d). Alternatively, viewed as an [[n, k′, d′]] subspace code concatenated R times, it has $L'(R) = (k')^R$ logical qubits. However, these logical qubits are the logical and gauge qubits of the original code, i.e., L′(R) = L(R) + G(R), so that $L(R) + G(R) = (k + r)^R$.
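
The bookkeeping above is easy to tabulate. The following sketch assumes the counting relations stated in this section (physical qubits $n^R$, logical qubits $k^R$, logical plus gauge qubits $(k+r)^R$) and, as a further assumption, takes the SLDD generator count to be the number of stabilizer generators plus two logical generators per logical qubit, as in the single-level count. The function name and dictionary keys are hypothetical.

```python
def concatenated_counts(n, k, r, levels):
    """Qubit and generator counts for an [[n, k, r, d]] code concatenated `levels` times."""
    physical = n ** levels
    logical = k ** levels
    gauge = (k + r) ** levels - logical
    stabilizer = physical - logical - gauge
    return {
        "physical": physical,
        "stabilizer": stabilizer,
        "logical": logical,
        "gauge": gauge,
        "sldd_generators": stabilizer + 2 * logical,  # assumed SLDD count
        "full_generators": 2 * physical,              # bitwise X and Z on every qubit
    }

# [[9, 1, 4, 3]] Bacon-Shor code, one and two levels of concatenation.
print(concatenated_counts(9, 1, 4, 1))
# {'physical': 9, 'stabilizer': 4, 'logical': 1, 'gauge': 4,
#  'sldd_generators': 6, 'full_generators': 18}
print(concatenated_counts(9, 1, 4, 2)["sldd_generators"])  # 58
print(concatenated_counts(9, 1, 4, 2)["full_generators"])  # 162
```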

Proof of Theorem 2

The number of physical qubits after R levels of concatenation of any [[n, k, r, d]] subsystem stabilizer code is $n(R) = n^R$ and the error group for the entire Hilbert space is the Pauli group $P_{n(R)}$. We need to protect the $2^{Q(R)}$ syndrome subspaces, where Q(R) is the total number of stabilizer generators after the code is concatenated R times. Here Q(R) = n(R) − L(R) − G(R), where, as shown above, $L(R) = k^R$ [G(R)] is the number of logical (gauge) qubits at level R and $L(R) + G(R) = (k + r)^R$.

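The SLDD sequence generated by the stabilizer generators of all levels together with the top-level logical generators, $\hat\Omega_{\rm SL}(R) = \hat S(R) \cup \hat L(R)$, satisfies the requirement of independent Nth order $\hat\Omega_{\rm SL}(R)$-decoupling of the $2^{Q(R)}$ syndrome subspaces since the stabilizers (as DD pulses) remove the errors at each level q, logical included (recall that a logical error at level q − 1 anticommutes with at least one level q stabilizer generator), but not the logical errors at the top level, for which we need $\hat L(R)$ as DD pulses. Moreover, for this sequence $|\hat\Omega_{\rm SL}(R)| = Q(R) + 2k^R$ as claimed. Now, any operator in $P_{n(R)}$ which is not a stabilizer or gauge operator acts as an error either within or between syndrome subspaces. Thus our choice of code dictates which elements of $P_{n(R)}$ act as errors and clearly this error set is precisely $B = P_{n(R)}/C$, where the centralizer C is generated by the stabilizer and gauge generators. We have $|\hat P_{n(R)}| = 2n^R = 2[Q(R) + L(R) + G(R)]$. On the other hand $|\hat C| = Q(R) + 2G(R)$, so that $|\hat B| = Q(R) + 2L(R) = Q(R) + 2k^R$ and $|\hat\Omega_{\rm SL}(R)| = |\hat B|$, which proves the optimality of $\hat\Omega_{\rm SL}(R)$ by virtue of Theorem 1.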

This implies that the SLDD set $\hat\Omega_{\rm SL}(R)$ is not only the natural choice, since it decouples exactly this error set, but is also the optimal choice.

We note that formally, the QEC structure and the corresponding set separate Pn(R) into families40,41 , such that commutes with the sth element of the DGS and anticommutes with the sth element of the set . Operators with αs = 1, for s corresponding to any of the elements in generate transitions between the syndrome subspaces, while operators with for corresponding to any of the elements in generate transitions within the syndrome subspace. So is a subgroup of Pn with the product rule , where denotes bitwise sum modulo two and with generators. This effectively maps the problem into one in which one has to decouple the subspace invariant under elements of .

Proof of Corollary 2

We assume that the total cost per domain is Eq. (3) as it captures all known DD sequences. Theorem 2 shows that (the O symbol is used since we allow for the presence of ancillas in the domain). We may assume that the code has parameters such that n ~ r ~ k, so that . Now recall that R = O[log log(ktot)] in a fault-tolerant simulation of a quantum circuit42. Therefore requires kD(R) = O[log(poly(ktot))] = O[log(ktot)].
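
The logarithmic scaling can be seen in a small numerical sketch. It rests on simplifying assumptions that are not spelled out in the proof: the number of DD generators in a domain is taken to be proportional to the domain size (constant `gens_per_logical`), the cost is f raised to that number, and the allowed budget is `k_tot ** poly_degree`. The function and parameter names are hypothetical.

```python
def max_domain_size(k_tot, f, gens_per_logical=1, poly_degree=1):
    """Largest domain size k_D with f**(gens_per_logical * k_D) <= k_tot**poly_degree."""
    budget = k_tot ** poly_degree
    k_d = 0
    while f ** (gens_per_logical * (k_d + 1)) <= budget:
        k_d += 1
    return k_d

# Squaring the register size only doubles the allowed domain size.
print(max_domain_size(1024, 2))        # 10
print(max_domain_size(1024 ** 2, 2))   # 20

# A larger per-generator factor f shrinks the allowed domain, but the
# growth with k_tot stays logarithmic.
print(max_domain_size(1024, 3))        # 6
```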