Optimal Two-Qubit Circuits for Universal Fault-Tolerant Quantum Computation

We study two-qubit circuits over the Clifford+CS gate set, which consists of the Clifford gates together with the controlled-phase gate CS = diag(1, 1, 1, i). The Clifford+CS gate set is universal for quantum computation and its elements can be implemented fault-tolerantly in most error-correcting schemes through magic state distillation. Since non-Clifford gates are typically more expensive to perform in a fault-tolerant manner, it is often desirable to construct circuits that use few CS gates. In the present paper, we introduce an efficient and optimal synthesis algorithm for two-qubit Clifford+CS operators. Our algorithm inputs a Clifford+CS operator U and outputs a Clifford+CS circuit for U, which uses the least possible number of CS gates. Because the algorithm is deterministic, the circuit it associates to a Clifford+CS operator can be viewed as a normal form for that operator. We give an explicit description of these normal forms and use this description to derive a worst-case lower bound of 5 log_2(1/ε) + O(1) on the number of CS gates required to ε-approximate elements of SU(4). Our work leverages a wide variety of mathematical tools that may find further applications in the study of fault-tolerant quantum circuits.


Introduction
In the context of fault-tolerant quantum computing, operations from the Clifford group are relatively easy to perform and are therefore considered inexpensive. In contrast, operations that do not belong to the Clifford group are complicated to execute fault-tolerantly because they require resource intensive distillation protocols [29]. Since non-Clifford operations are necessary for universal quantum computing, it has become standard to use the number of non-Clifford gates in a circuit as a measure of its cost. This fault-tolerant perspective on the cost of circuits has profoundly impacted the field of quantum compiling and significant efforts have been devoted to minimizing the number of non-Clifford operations in circuits.
An important problem in quantum compiling is the problem of exact synthesis: given an operator U known to be exactly representable over some gate set G, find a circuit for U over G. An exact synthesis algorithm is a constructive solution to this problem. When the gate set G is an extension of the Clifford group, it is desirable that the exact synthesis algorithm for G be efficient and produce circuits that use as few non-Clifford gates as possible.
In the past few years, methods from algebraic number theory have been successfully applied to the exact synthesis problem associated to a variety of single-qubit [4,6,9,22,24,30,31] and single-qutrit [5,15,21,28] gate sets. In many cases, the resulting exact synthesis algorithms efficiently produce circuits that are optimal in their use of non-Clifford gates.
The single-qubit Pauli gates X, Y, and Z are defined as

X = [0 1; 1 0],  Y = [0 −i; i 0],  Z = [1 0; 0 −1].

These gates generate the single-qubit Pauli group {i^a P ; a ∈ Z_4 and P ∈ {I, X, Y, Z}}. The two-qubit Pauli group, which we denote by P, is defined as P = {i^a (P ⊗ Q) ; a ∈ Z_4 and P, Q ∈ {I, X, Y, Z}}. The Clifford gates H, S, and CZ are defined as

H = (1/√2)[1 1; 1 −1],  S = [1 0; 0 i],  CZ = diag(1, 1, 1, −1).

These gates are known as the Hadamard gate, the phase gate, and the controlled-Z gate, respectively. The single-qubit Clifford group is generated by H and S and contains the primitive 8-th root of unity ω = e^{iπ/4}. The two-qubit Clifford group, which we denote by C, consists of the operators which can be represented by a two-qubit circuit over the gate set {H, S, CZ}. Equivalently, C is generated by H ⊗ I, I ⊗ H, S ⊗ I, I ⊗ S, and CZ. Up to global phases, the Clifford groups are the normalizers of the Pauli groups.
Clifford gates are well-suited for fault-tolerant quantum computation but the Clifford group is not universal. One can obtain a universal group by extending C with the controlled-phase gate CS = diag(1, 1, 1, i). In what follows, we focus on the group G of operators which can be represented by a two-qubit circuit over the universal gate set {H, S, CZ, CS}. Equivalently, G is the group generated by H ⊗ I, I ⊗ H, S ⊗ I, I ⊗ S, CZ, and CS. We have P ⊆ C ⊆ G. We sometimes refer to G as the Clifford+CS group or Clifford+controlled-phase group. We know from [1] that G is the group of 4 × 4 unitary matrices of the form

U = (1/√2^k) M,  (1)

where k ∈ N and the entries of M belong to Z[i]. In the fault-tolerant setting, the CS gate is considered vastly more expensive than any of the Clifford gates. As a result, the cost of a Clifford+CS circuit is determined by its CS-count: the number of CS gates that appear in the circuit. Our goal is to find circuits for the elements of G that are optimal in CS-count.
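As a concrete illustration of this matrix form, the following Python sketch (our own, independent of the authors' Mathematica package [16]) builds the generators of G and checks that a random word over them is a Gaussian-integer matrix divided by √2^k:

```python
import numpy as np

# The gates defined in the text; only H carries a 1/sqrt(2) factor.
I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S = np.diag([1, 1j])
CZ = np.diag([1, 1, 1, -1])
CS = np.diag([1, 1, 1, 1j])

kron = np.kron

# Generators of the two-qubit Clifford+CS group G.
generators = [kron(H, I2), kron(I2, H), kron(S, I2), kron(I2, S), CZ, CS]

# A random word over the generators has the form M / sqrt(2)^k with M over Z[i].
rng = np.random.default_rng(0)
U = np.eye(4, dtype=complex)
k = 0
for g in rng.choice(len(generators), size=20):
    U = generators[g] @ U
    if g in (0, 1):  # each Hadamard contributes one factor of 1/sqrt(2)
        k += 1
M = U * np.sqrt(2) ** k
# Entries of M should be (numerically) Gaussian integers.
assert np.allclose(M, np.round(M.real) + 1j * np.round(M.imag), atol=1e-9)
```

The check mirrors Equation (1): clearing the power of √2 accumulated by the Hadamards leaves a matrix over Z[i].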
We start by introducing a generalization of the CS gate which will be helpful in describing the elements of G.
Definition 2.1. Let P and Q be distinct elements of P \ {I} such that P and Q are Hermitian and PQ = QP. Then R(P, Q) is defined as

R(P, Q) = exp( (iπ/2) · ((I − P)/2) · ((I − Q)/2) ).

We have R(Z ⊗ I, I ⊗ Z) = CS. Moreover, since C normalizes P and CR(P, Q)C† = R(CPC†, CQC†) for every C ∈ C, we know that R(P, Q) ∈ G for every appropriate P, Q ∈ P. We record some important properties of the R(P, Q) gates in the lemma below. Because the proof of the lemma is tedious but relatively straightforward, it is given in Appendix B.

Lemma 2.2. Let C ∈ C and let P, Q, and L be distinct elements of P \ {I}. Assume that P, Q, and L are Hermitian and that PQ = QP, PL = LP, and QL = −LQ. Then the following relations hold:

CR(P, Q)C† = R(CPC†, CQC†),  (2)
R(P, Q) = R(Q, P),  (3)
R(P, −PQ) = R(P, Q),  (4)
R(P, −Q) ∈ R(P, Q)C,  (5)
R(P, Q)^2 ∈ C,  (6)
R(P, L)R(P, Q) = R(P, Q)R(P, iQL).  (7)
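Since ((I − P)/2)((I − Q)/2) is a projector when P and Q are commuting Hermitian Paulis (as noted in Appendix B), the exponential in Definition 2.1 collapses to a closed form, which keeps the following sketch (ours, not the paper's code) free of a matrix-exponential dependency. It verifies R(Z ⊗ I, I ⊗ Z) = CS and two relations of Lemma 2.2:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.diag([1, -1])
kron = np.kron

def R(P, Q):
    """R(P,Q) = exp((i*pi/2) * (I-P)/2 * (I-Q)/2).  For commuting Hermitian
    Paulis the exponent is (i*pi/2) times a projector Pi, so the power series
    collapses to I + (e^{i*pi/2} - 1) * Pi = I + (i - 1) * Pi."""
    I4 = np.eye(4)
    Pi = (I4 - P) @ (I4 - Q) / 4
    return I4 + (1j - 1) * Pi

CS = np.diag([1, 1, 1, 1j])
P, Q = kron(Z, I2), kron(I2, Z)
assert np.allclose(R(P, Q), CS)            # R(Z⊗I, I⊗Z) = CS
assert np.allclose(R(P, Q), R(Q, P))       # symmetry relation
assert np.allclose(R(P, -P @ Q), R(P, Q))  # R(P, -PQ) = R(P, Q)
```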
We will use the R(P, Q) gates of Definition 2.1 to define normal forms for the elements of G. The equivalences given by Lemma 2.2 show that it is not necessary to use every R(P, Q) gate, and the following definition specifies the ones we will be using.

Definition 2.3. Let T_1 and T_2 be the subsets of P × P given below.
The set S contains 15 elements which are explicitly listed in Figure 1. It can be verified that all of the elements of S are distinct, even up to right-multiplication by a Clifford gate. It will be helpful to consider the set S ordered as in Figure 1, which is to be read left-to-right and row-by-row. We then write S_j to refer to the j-th element of S. For example, S_1 is in the top left of Figure 1, S_5 is in the top right, and S_15 is in the bottom right. The position of R(P, Q) in this ordering roughly expresses the complexity of the Clifford circuit required to conjugate CS to R(P, Q).
We close this section by showing that every element of G can be expressed as a sequence of elements of S followed by a single element of C.

Lemma 2.4. Let P and Q be distinct elements of P \ {I} such that P and Q are Hermitian and PQ = QP. Then there exist P′, Q′ ∈ P and C ∈ C such that R(P′, Q′) ∈ S and R(P, Q) = R(P′, Q′)C.
Proof. Let P = i^p (P_1 ⊗ P_2) and Q = i^q (Q_1 ⊗ Q_2) with P_1, P_2, Q_1, Q_2 ∈ {I, X, Y, Z}. Since P and Q are Hermitian, p and q must be even. Moreover, by Equations (3) and (5) of Lemma 2.2, we can assume without loss of generality that p = q = 0, so that P = P_1 ⊗ P_2 and Q = Q_1 ⊗ Q_2. Now, if one of P_1, P_2, Q_1, or Q_2 is I, then we can use Equations (3), (4) and (5) of Lemma 2.2 to rewrite R(P, Q) as R(P′, Q′)C with C ∈ C and (P′, Q′) ∈ T_1 as in Definition 2.3. If, instead, none of P_1, P_2, Q_1, or Q_2 are I, then we can reason similarly to rewrite R(P, Q) as R(P′, Q′)C with C ∈ C and (P′, Q′) ∈ T_2.
Proposition 2.5. Let V ∈ G. Then V can be written as V = R_1 ⋯ R_n C, where R_j ∈ S for j ∈ [n] and C ∈ C.

Proof. Let V ∈ G. Then V can be written as V = C_1 · CS · C_2 · CS · ⋯ · C_n · CS · C_{n+1} where C_j ∈ C for j ∈ [n + 1]. Since CS = R(Z ⊗ I, I ⊗ Z), we have

V = C_1 · R(Z ⊗ I, I ⊗ Z) · C_2 · R(Z ⊗ I, I ⊗ Z) · ⋯ · C_n · R(Z ⊗ I, I ⊗ Z) · C_{n+1}.  (8)

Now, by Equation (2) of Lemma 2.2, C_1 R(Z ⊗ I, I ⊗ Z) = C_1 R(Z ⊗ I, I ⊗ Z)C_1† C_1 = R(P, Q)C_1 for some P, Q ∈ P. We can then apply Lemma 2.4 to get C_1 R(Z ⊗ I, I ⊗ Z) = R(P, Q)C_1 = R(P′, Q′)CC_1 = R(P′, Q′)C′ with C′ = CC_1 ∈ C and R(P′, Q′) ∈ S. Hence, setting R_1 = R(P′, Q′) and C_2′ = C′C_2, Equation (8) becomes

V = R_1 · C_2′ · R(Z ⊗ I, I ⊗ Z) · ⋯ · C_n · R(Z ⊗ I, I ⊗ Z) · C_{n+1},

and we can proceed recursively to complete the proof.
The Isomorphism SU(4) ≅ Spin(6)

In this section, we describe the exceptional isomorphism SU(4) ≅ Spin(6), which will allow us to rewrite two-qubit operators as elements of SO(6). Consider some element U of SU(4). Then U acts on C^4 by left-multiplication. Moreover, this action is norm-preserving. Now let {e_j} be the standard orthonormal basis of C^4. From this basis, we construct an alternative six-component basis using the wedge product.
Definition 3.1 (Wedge product). Let a ∧ b denote the wedge product of a and b. Given vectors a, b, c ∈ C^n and scalars α, β ∈ C, wedge products are bilinear and anticommutative:

(αa + βb) ∧ c = α(a ∧ c) + β(b ∧ c)  and  a ∧ b = −(b ∧ a).

Note that the anticommutation of wedge products implies that a ∧ a = 0. We write v_1 ∧ ⋯ ∧ v_k ∈ Λ^k C^n for v_j ∈ C^n. To compute the inner product of two wedge products v_1 ∧ ⋯ ∧ v_k and w_1 ∧ ⋯ ∧ w_k, we compute the determinant of the k × k matrix whose entry in the q-th row and r-th column is ⟨v_q, w_r⟩.
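The determinant rule for inner products of wedge products is easy to check numerically. The following sketch (ours, using numpy) confirms it on standard basis vectors of C^4:

```python
import numpy as np

def wedge_inner(vs, ws):
    """<v1∧...∧vk, w1∧...∧wk> = det(G), where G[q, r] = <v_q, w_r>.
    np.vdot conjugates its first argument, matching the Hermitian
    inner product convention."""
    G = np.array([[np.vdot(v, w) for w in ws] for v in vs])
    return np.linalg.det(G)

e = np.eye(4)
# <e_i ∧ e_j, e_k ∧ e_l> = δ_ik δ_jl − δ_il δ_jk:
assert np.isclose(wedge_inner([e[0], e[1]], [e[0], e[1]]), 1)
assert np.isclose(wedge_inner([e[0], e[1]], [e[1], e[0]]), -1)  # antisymmetry
assert np.isclose(wedge_inner([e[0], e[1]], [e[2], e[3]]), 0)
# a ∧ a = 0 has zero norm:
assert np.isclose(wedge_inner([e[0], e[0]], [e[0], e[0]]), 0)
```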
Remark 3.2. The magnitude of a wedge product of n vectors can be thought of as the n-dimensional volume of the parallelotope constructed from those vectors. The orientation of the wedge product defines the direction of circulation around that parallelotope by those vectors.
The wedge product of two vectors in C^4 can be decomposed into a six-component basis, as anticommutativity reduces the 16 potential wedge products of elements of {e_j} to six. We choose this basis as B = {s_{−,12,34}, s_{+,12,34}, s_{−,23,14}, s_{+,24,13}, s_{−,24,13}, s_{+,23,14}}, where each s_{±,jk,lm} is the indicated normalized combination of e_j ∧ e_k and e_l ∧ e_m given in Equation (9). We note that B is an orthonormal basis and we assume that B is ordered as in Equation (9).

Definition 3.3. Let U ∈ SU(4) and Ū be its representation in the transformed basis. Let v, w ∈ C^4 with v ∧ w ∈ Λ^2 C^4. Then the actions of U and Ū are related by

Ū(v ∧ w) = (Uv) ∧ (Uw).

To avoid confusion, we use an overline, as in Ō, to denote the SO(6) representation of an operator or set of operators O. We are now equipped to define the transformation from SU(4) to SO(6).

Definition 3.4. Let U ∈ SU(4) and let j, k ∈ [6]. Then the entry in the j-th row and k-th column of the SO(6) representation Ū of U is

Ū_{j,k} = ⟨B_j, Ū(B_k)⟩,

where B_j is the j-th element in the ordered basis B, the action of Ū on B_k is defined by Definitions 3.1 and 3.3, and the inner product is defined by Definition 3.1.
As an illustration of the process specified in Definition 3.4 we explicitly calculate the SO(6) representation of a Clifford+CS operator in Appendix A. Moreover, we provide code to compute this isomorphism for any input with our Mathematica package [16].
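To see the mechanics without committing to the specific basis B (whose exact phases we leave to the paper and to [16]), one can compute the action of U on the raw wedge basis {e_i ∧ e_j}, i < j. The result is a 6 × 6 unitary of determinant one; the fixed change of basis to B is what makes Clifford+CS images real orthogonal. A numpy sketch (function name ours):

```python
import numpy as np
from itertools import combinations

def lambda2(U):
    """6x6 matrix of the action of U on the wedge basis {e_i ∧ e_j}, i < j.
    Entry (j, k) is <e_a ∧ e_b, (U e_c) ∧ (U e_d)>
                  = U[a,c] U[b,d] - U[a,d] U[b,c]."""
    pairs = list(combinations(range(4), 2))
    V = np.empty((6, 6), dtype=complex)
    for j, (a, b) in enumerate(pairs):
        for k, (c, d) in enumerate(pairs):
            V[j, k] = U[a, c] * U[b, d] - U[a, d] * U[b, c]
    return V

# Sanity check on a random special unitary: the image is unitary and,
# since det(Λ²U) = det(U)³, has determinant one.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Q, _ = np.linalg.qr(A)
U = Q / np.linalg.det(Q) ** 0.25  # force det(U) = 1
V = lambda2(U)
assert np.allclose(V.conj().T @ V, np.eye(6))
assert np.isclose(np.linalg.det(V), 1)
```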
Remark 3.5. The fact that this isomorphism yields special orthogonal operators is ultimately due to the fact that the Dynkin diagrams for the Lie algebras of SU(4), Spin(6), and SO(6) are equivalent. However, this fact can also be illustrated through the Euler decomposition of SU(4) [36]. Direct calculation of Ū for one of the Euler angle rotations yields a matrix which is explicitly in SO(6). Computation of the other 14 Euler angle rotations required for an SU(4) parameterization yields similar matrices, likewise in SO(6). Since SO(6) is a group under multiplication, the isomorphism applied to any U ∈ SU(4) yields Ū ∈ SO(6).
We close this section by explicitly calculating the SO(6) representation of each of the generators of G. We multiply the generators by overall phase factors to ensure that each operator has determinant one, and furthermore that single-qubit operators have determinant one on their single-qubit subspace. Later, when referring to gates or their SO(6) representation, we omit overall phases for readability.
Proposition 3.6. The images of the generators of C in SO(6) are as follows.

Proposition 3.7. The SO(6) representations of the elements of S are given in Figure 2.

Exact Synthesis
In this section, we leverage the isomorphism SU(4) ≅ Spin(6) described in the previous section to find optimal decompositions for the elements of G. We will be working extensively with the matrix group

H = {V ∈ SO(6) ; √2^k V is an integer matrix for some k ∈ N}.

Note that H ⊆ SO(6). Our interest in H stems from the following observation.

Proposition 4.1. If U ∈ G, then Ū ∈ H.

Proof. The property holds for the generators of G by Propositions 3.6 and 3.7. Since H is closed under multiplication, it holds for arbitrary products of generators.
In the remainder of this section, we prove the converse of Proposition 4.1 by defining an algorithm which inputs an element of H and outputs a product of generators. We start by introducing a few notions that are useful in discussing the elements of H.
Definition 4.2. Let V ∈ H. A natural number k is a denominator exponent for V if √2^k V is an integer matrix. The least such k is the least denominator exponent of V, which we denote by lde(V).

Lemma 4.3. Let U ∈ G and let k = lde(Ū). Then any Clifford+CS circuit for U contains at least k CS gates.

Proof. The only generators with a factor of 1/√2 in their SO(6) representation are the elements of S. Thus, for a least denominator exponent of k there must be at least k of these operators, each of which requires a single CS gate.
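For intuition, the least denominator exponent can be computed by trial: find the smallest k such that √2^k V is an integer matrix. A floating-point sketch (ours; an exact-arithmetic version would work over the ring Z[1/√2] instead):

```python
import numpy as np

def lde(V, k_max=64, tol=1e-9):
    """Least denominator exponent: smallest k >= 0 such that sqrt(2)^k * V
    has (near-)integer entries.  Purely numeric, so tol must dominate the
    accumulated floating-point error."""
    for k in range(k_max):
        M = V * np.sqrt(2) ** k
        if np.allclose(M, np.round(M), atol=tol):
            return k
    raise ValueError("no denominator exponent found up to k_max")

assert lde(np.eye(6)) == 0  # signed permutations have lde 0
assert lde(np.array([[1, 1], [1, -1]]) / np.sqrt(2)) == 1
assert lde(np.array([[1, 1], [1, -1]]) / 2) == 2
```

By Lemma 4.3, this quantity (computed on the SO(6) representation) lower-bounds the CS-count of any circuit for the operator.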
Definition 4.4. Let V ∈ H and let k be a denominator exponent for V. The k-residue of V, denoted ρ_k(V), is the binary matrix ρ_k(V) = √2^k V mod 2.

The residue matrices introduced in Definition 4.4 are important in the definition of the exact synthesis algorithm. Indeed, the residue of a Clifford+CS operator U at its least denominator exponent determines the element of S to use in order to reduce the least denominator exponent of Ū (although not uniquely, as we discuss below). Similar residue matrices are used in the study of other fault-tolerant circuits [1,14].
Recall that if A is a set, then a partition of A is a collection of disjoint nonempty subsets of A whose union is equal to A. The set of all partitions of a set A is denoted B_A. Let p and p′ be two partitions of A. If every element of p is a subset of an element of p′, then we say that p′ is coarser than p and that p is finer than p′.

Definition 4.5. Let N be a binary matrix with rows r_1, ..., r_6 and let p = {p_1, ..., p_q} be a partition of the set [6]. Then N has the pattern p if for any p_j in p and any j_1, j_2 ∈ p_j we have r_{j_1} = r_{j_2}. In this case we also say that N has a |p_1| × ⋯ × |p_q| pattern.

Definition 4.6. Let V ∈ H with lde(V) = ℓ. We define the pattern map p : H → B_{[6]} as the function which maps V to the pattern of ρ_ℓ(V). We say that p = p(V) is the pattern of V. If V_1 and V_2 are two elements of H, we say that V_1 is finer than V_2 or that V_2 is coarser than V_1 if these statements hold for p(V_1) and p(V_2).
Remark 4.7. In a slight abuse of notation, we extend the pattern map to any valid representation of a Clifford+CS operator. Given a Clifford+CS operator with SU(4) representation U which can be written as a word W over the generators and with SO(6) representation Ū, we set p(U) = p(W) = p(Ū). This extension is unambiguous after fixing our transformation from SU(4) to SO(6), as p is insensitive to relative phase changes in U. We incorporate all relational notions described in Definition 4.6 in this extension.
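The residues and patterns of Definitions 4.4–4.6 are straightforward to compute. The following sketch (helper names are ours) extracts the row pattern of a binary residue and tests the finer/coarser relation:

```python
import numpy as np

def residue(M):
    """Binary residue of an integer matrix: entries taken mod 2."""
    return np.asarray(M, dtype=int) % 2

def row_pattern(N):
    """Partition of the row indices of a binary matrix grouping equal rows."""
    groups = {}
    for j, row in enumerate(N):
        groups.setdefault(tuple(row), []).append(j)
    return sorted(map(tuple, groups.values()))

def is_finer(p, q):
    """p is finer than q if every block of p lies inside some block of q."""
    return all(any(set(a) <= set(b) for b in q) for a in p)

# A toy 6x6 integer matrix whose residue has a 2 x 2 x 2 row pattern.
M = np.array([[1, 0, 1, 0, 0, 0]] * 2 + [[0, 1, 0, 1, 0, 0]] * 2
             + [[0, 0, 0, 0, 1, 1]] * 2)
p = row_pattern(residue(M))
assert p == [(0, 1), (2, 3), (4, 5)]
assert is_finer(p, [(0, 1, 2, 3), (4, 5)])  # 2x2x2 is finer than 2x4
```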
We now analyze the image in SO(6) of certain subsets of G. We start by showing that the image of the Clifford group C is exactly the collection of elements of H with least denominator exponent 0. In other words, C̄ is the group of 6-dimensional signed permutation matrices.
The operators C̄_1 and C̄_2 generate {V ∈ H ; lde(V) = 0}. Hence, if V ∈ H and lde(V) = 0, then V can be expressed as a product of the images of Clifford gates. Furthermore, V has a 2 × 2 × 2 pattern.
Proof. The rows of V have unit norm and are pairwise orthogonal. Hence, up to a signed permutation of rows and columns, there is only one such matrix. By Proposition 2.5 the proof is complete, since Clifford operators correspond to signed permutations by Lemma 4.8.
Since k ≥ 2, this implies that the inner product of any column of √2^k V with itself is congruent to 0 modulo 4. Similarly, the inner product of any two distinct columns is congruent to 0 modulo 4, and the analogous relations hold for rows. For x ∈ Z, x^2 ≡ 0 (mod 4) if and only if x ≡ 0 (mod 2). Hence, there must be exactly zero or four odd entries in every column (or row) of M by Equation (14). By Equation (15), we see that the inner product of any two distinct rows must be even. Up to a permutation of rows and columns, we can then deduce that M is one of the two matrices below, which completes the proof.
Proof. For simplicity, we write the rows of the residue matrix as r_1, ..., r_6, where each r_j is a vector of integers.
Proof. By inspection of Figure 2 we see that for every 2 × 2 × 2 pattern q there exists R ∈ S such that p(R) = q. As a result, if p(V ) is a 2 × 2 × 2 or a 2 × 4 pattern, then there exists R ∈ S such that R has a pattern finer than p(V ). By Corollary 4.11, p(V ) is in fact a 2 × 2 × 2 row-pattern or a 2 × 4 row-pattern and thus there exists R ∈ S such that R is finer than V . We can then conclude by Lemma 4.12.
Theorem 4.14. We have G = H.
Proof. Ḡ ⊆ H by Proposition 4.1. We now show H ⊆ Ḡ. Let V ∈ H. We proceed by induction on the least denominator exponent of V. If lde(V) = 0 then, by Lemma 4.8, V ∈ C̄ and therefore V ∈ Ḡ. Now if lde(V) = k > 0, let R be the element of S with the lowest index such that lde(R̄^T V) = k − 1. Such an element exists by Lemma 4.13. By the induction hypothesis we have R̄^T V ∈ Ḡ, which implies that V = R̄(R̄^T V) ∈ Ḡ.

The proof of Theorem 4.14 provides an algorithm to decompose an arbitrary element of Ḡ into a product of elements of S̄, followed by an element of C̄. In the proof, there is freedom in choosing the element of S used to reduce lde(V). If there is more than one generator with a finer pattern than V, we must make a choice. The ordering imposed on S in Section 2 is used to make this choice in a uniform manner: we always choose the element of S of lowest index. As a result, the exact synthesis algorithm becomes deterministic. The ambiguity in the choice of generator is a consequence of the relations given in Lemma 2.2. In particular, we have

R(P, L)R(P, Q) = R(P, Q)R(P, iQL) = R(P, iQL)R(P, L),

and these three distinct sequences of generators denote the same operator. This is the source of the three-fold ambiguity in choosing a finer 2 × 2 × 2 pattern for a given 2 × 4 pattern.

Table 1: The elements of S and the explicit row patterns they are associated with under FFP.

We will sometimes refer to the association between elements of S and patterns used in the exact synthesis algorithm of Theorem 4.14 as the first finer partition association, or FFP for short. The association is explicitly described in Table 1.

Theorem 4.15. Let U ∈ G and let k = lde(Ū). Then there is a deterministic algorithm which, on input Ū, produces a circuit for U over S ∪ C of CS-count k, and this CS-count is optimal.

Proof. Let U be as stated. If k = 0, then Ū belongs to C̄ and U is therefore a Clifford. If k > 0, then, as in Theorem 4.14, there is a unique R_k ∈ S given by FFP such that lde(R̄_k^T Ū) = k − 1. By induction on the least denominator exponent, we have a deterministic synthesis algorithm to find a sequence such that Ū = R̄_k ⋯ R̄_1 C̄, which then implies that U = R_k ⋯ R_1 C. Each of these k steps involves a constant number of basic arithmetic operations. This circuit has CS-count k, which is optimal by Lemma 4.3.
Our Mathematica package [16] implements the algorithm referred to in Theorem 4.15, as well as a number of other tools for two-qubit Clifford+CS circuits. The performance of this algorithm on a modest device is presented in Table 2.

Normal Forms
In the previous section, we introduced a synthesis algorithm for Clifford+CS operators. The algorithm takes as input a Clifford+CS matrix and outputs a circuit for the corresponding operator. The circuit produced by the synthesis algorithm is a word over the alphabet S ∪ C. Because the algorithm is deterministic, the word it associates to each operator can be viewed as a normal form for that operator. In the present section, we use the language of automata to give a detailed description of the structure of these normal forms. We include the definitions of some basic concepts from the theory of automata for completeness. The reader looking for further details is encouraged to consult [35].

Automata
In what follows we sometimes refer to a finite set Σ as an alphabet. In such a context, the elements of Σ are referred to as letters, Σ* denotes the set of words over Σ (which includes the empty word ε), and the subsets of Σ* are called languages over Σ. If w ∈ Σ* is a word over the alphabet Σ, we write |w| for the length of w. Finally, if L and L′ are two languages over an alphabet Σ, then their concatenation L • L′ is defined as L • L′ = {ww′ ; w ∈ L and w′ ∈ L′}.

Definition 5.1. A nondeterministic finite automaton is a 5-tuple (Σ, Q, In, Fin, δ) where Σ and Q are finite sets, In and Fin are subsets of Q, and δ : Q × (Σ ∪ {ε}) → P(Q) is a function whose codomain is the power set of Q. We call Σ the alphabet, Q the set of states, In and Fin the sets of initial and final states, and δ the transition function.
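Definition 5.1 translates directly into code. The following sketch (class and toy alphabet are ours) implements acceptance for a nondeterministic finite automaton with ε-transitions, using the usual ε-closure construction:

```python
from itertools import chain

class NFA:
    """Nondeterministic finite automaton (Sigma, Q, In, Fin, delta) as in
    Definition 5.1; delta maps (state, letter) to a set of states, with the
    empty string '' playing the role of the empty word epsilon."""
    def __init__(self, states, initial, final, delta):
        self.states, self.initial, self.final, self.delta = states, initial, final, delta

    def eps_closure(self, S):
        """All states reachable from S by epsilon edges alone."""
        stack, seen = list(S), set(S)
        while stack:
            q = stack.pop()
            for r in self.delta.get((q, ''), set()) - seen:
                seen.add(r)
                stack.append(r)
        return seen

    def accepts(self, word):
        current = self.eps_closure(self.initial)
        for a in word:
            step = set(chain.from_iterable(self.delta.get((q, a), set()) for q in current))
            current = self.eps_closure(step)
        return bool(current & self.final)

# Toy automaton recognizing words over {a, b} that end in 'ab'.
A = NFA({0, 1, 2}, {0}, {2},
        {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}})
assert A.accepts('aab') and A.accepts('ab')
assert not A.accepts('aba') and not A.accepts('')
```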
Remark 5.2. Definition 5.1 is slightly non-standard. Indeed, automata are typically defined as having a single initial state, rather than a collection of them. One can then think of Definition 5.1 as introducing a collection of automata: one for each element of In. Alternatively, Definition 5.1 can also be recovered from the usual definition by assuming that every automaton in the sense of Definition 5.1 in fact has a single initial state s_0 related to the elements of In by δ(s_0, ε) = In. We chose to introduce automata as in Definition 5.1 because this results in a slightly cleaner presentation.
It is common to define an automaton A = (Σ, Q, In, Fin, δ) by specifying a directed labelled graph called the state graph of A. The vertices of the graph are labelled by states and there is an edge labelled by a letter w ∈ Σ between vertices labelled q and q′ if q′ ∈ δ(q, w). The initial and final states are distinguished using arrows and double lines, respectively. For brevity, parallel edges are drawn only once, with their labels separated by a comma. An automaton A = (Σ, Q, In, Fin, δ) can be used to specify a language L(A) ⊆ Σ*. Intuitively, L(A) is the collection of all the words over Σ that specify a well-formed walk along the state graph of A. The following definition makes this intuition more precise. The set of words accepted by A is called the language recognized by A and is denoted L(A). If a language is recognized by some nondeterministic finite automaton then that language is called regular. The collection of regular languages is closed under a variety of operations. In particular, regular languages are closed under concatenation.

Definition 5.6. Let A = (Σ, Q, In, Fin, δ) and A′ = (Σ, Q′, In′, Fin′, δ′) be two automata. Then the concatenation of A and A′ is the automaton A • A′ = (Σ, Q′′, In, Fin′, δ′′), where Q′′ = Q ⊔ Q′ is the disjoint union of Q and Q′ and

δ′′(q, s) = δ(q, s) if q ∈ Q, and q ∉ Fin or s ≠ ε,
δ′′(q, s) = δ(q, s) ∪ In′ if q ∈ Fin and s = ε,
δ′′(q, s) = δ′(q, s) if q ∈ Q′.
Proposition 5.7. Let A and A′ be automata recognizing languages L and L′, respectively. Then A • A′ recognizes L • L′.
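The construction of Definition 5.6 can be sketched in a few lines: keep the initial states of the first automaton and the final states of the second, and add an ε-edge from every final state of the first into every initial state of the second. The toy languages and helper names below are ours:

```python
from itertools import chain

def accepts(nfa, word):
    """nfa = (initial, final, delta); delta maps (state, letter or '') -> set."""
    def closure(S):
        stack, seen = list(S), set(S)
        while stack:
            q = stack.pop()
            for r in nfa[2].get((q, ''), set()) - seen:
                seen.add(r)
                stack.append(r)
        return seen
    cur = closure(nfa[0])
    for a in word:
        cur = closure(set(chain.from_iterable(nfa[2].get((q, a), set()) for q in cur)))
    return bool(cur & nfa[1])

def concat(n1, n2):
    """Definition 5.6: disjoint union of states (tagged 0/1), initial states
    from n1, final states from n2, plus epsilon edges Fin(n1) -> In(n2)."""
    d = {((0, q), s): {(0, r) for r in R} for (q, s), R in n1[2].items()}
    for (q, s), R in n2[2].items():
        d[((1, q), s)] = {(1, r) for r in R}
    for f in n1[1]:
        key = ((0, f), '')
        d[key] = d.get(key, set()) | {(1, i) for i in n2[0]}
    return ({(0, i) for i in n1[0]}, {(1, f) for f in n2[1]}, d)

# L1 = a-strings of even length, L2 = {'b'}; L1 • L2 = (aa)*b.
n1 = ({0}, {0}, {(0, 'a'): {1}, (1, 'a'): {0}})
n2 = ({0}, {1}, {(0, 'b'): {1}})
n = concat(n1, n2)
assert accepts(n, 'b') and accepts(n, 'aab')
assert not accepts(n, 'ab') and not accepts(n, 'aa')
```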
An example of the concatenation of two automata is provided in Figure 3 and Example 5.11, based on the automata defined in Definitions 5.9 and 5.10 below.

The Structure of Normal Forms
We now consider the alphabet S ∪ C and describe the words over S ∪ C that are output by the synthesis algorithm of Theorem 4.15.
Definition 5.8. Let U ∈ G. The normal form of U is the unique word over S ∪ C output by the synthesis algorithm of Theorem 4.15 on input U . We write N for the collection of all normal forms.
To describe the elements of N, we introduce several automata. It will be convenient for our purposes to enumerate the elements of C. We therefore assume that a total ordering of the 92160 elements of C is chosen and we write C_j for the j-th element of C.

Example 5.11. To illustrate Definitions 5.6, 5.9 and 5.10, the automaton S_{1,3} • C is represented in Figure 3. It can be verified that the words C_2, S_2S_1C_1, and S_3S_1S_2C_k are accepted by S_{1,3} • C, while the words S_1S_1C_4 and S_3C_7S_1 are not. Note in particular that if C_1 is the symbol for the identity, then S_3C_1 is distinct (as a word) from S_3. The former is accepted by S_{1,3} • C while the latter is not. Despite the state graph of S_{1,3} being fully connected, full connectivity does not necessarily hold for the state graphs of other S_{n,m} automata.
We will use the automata introduced in Definitions 5.9 and 5.10 to describe the elements of N. Our goal is to show that

N = L(S_{1,3} • S_{4,9} • S_{10,15} • C).  (17)

We start by establishing a few propositions.

Proof. By Definitions 5.9 and 5.10.
We emphasize that the inclusions in Proposition 5.12 are strict. This implies that L(S_{1,3} • S_{4,9} • S_{10,15} • C) can be written as the disjoint union of L(C), L(S_{1,15} • C), and L(S_{1,9} • S_{10,15} • C). The lemmas below show that these languages correspond to disjoint subsets of N and, in combination, suffice to prove Equation (17).
Lemma 5.13. Let U be a word over S ∪ C. Then U ∈ L(C) if and only if U ∈ N and U has length 1, i.e., U ∈ C.
Proof. By Definition 5.9 and Theorem 4.15.
Lemma 5.14. Let U be a word over S ∪ C. Then U ∈ L(S_{1,15} • C) \ L(C) if and only if U ∈ N and U has a 2 × 2 × 2 pattern.
Proof. First, note that L(C) is the set of words of length 1 accepted by S_{1,15} • C. This means that L(S_{1,15} • C) \ L(C) consists of all the words of length k ≥ 2 accepted by S_{1,15} • C. Furthermore, by Lemma 4.8, there are no normal forms of length 1 which have a 2 × 2 × 2 pattern. Thus, to prove our lemma it suffices to establish the following equality of sets

{U ∈ L(S_{1,15} • C) ; |U| = k} = {U ∈ N ; |U| = k and p(U) is a 2 × 2 × 2 pattern}  (18)

for all k ≥ 2. We proceed by induction on k.
Thus, SC must also be the unique word produced by the synthesis algorithm on input U and hence U ∈ N . This accounts for all words of length 2 in N . Therefore Equation (18) holds when k = 2.
• Now suppose that Equation (18) holds for some k ≥ 2. Let U ∈ L(S_{1,15} • C) be a word of length k whose first letter is S ∈ S. Then U ∈ N and p(U) = p(S) is a 2 × 2 × 2 pattern. Furthermore, the least denominator exponent of U is k − 1. We will show that Equation (18) holds for words of length k + 1. We conclude that {a, b} is a subset of an element of p(U). Furthermore, by Lemma 4.10 and Equation (16) we see that, since r_a + r_b = 0, p(U) cannot be a 2 × 4 pattern, and therefore {a, b} ∈ p(U). As this holds for all {a, b} ∈ p(S′), we conclude that p(S′) = p(U). Thus, by the induction hypothesis, S′U will be the word produced by the synthesis algorithm when applied to the operator it denotes. Hence, S′U ∈ N and p(S′U) is a 2 × 2 × 2 pattern.
⊇: Suppose that U is a normal form of length k + 1 with a 2 × 2 × 2 pattern. Write U as U = S′V for some unknown normal form V. We then have p(S′) = p(U). Let {a, b} ∈ p(S′) and let the corresponding rows of the residue matrix of V be r_a and r_b. Direct calculation of the corresponding rows of the residue matrix for U relates them to r_a and r_b. Since p(U) is not a 2 × 4 pattern, we conclude that r_a + r_b = 0 and thus that r_a = r_b. Therefore, there is no element of cardinality four in p(V). Since lde(V) > 0, p(V) must then be a 2 × 2 × 2 pattern, so by the induction hypothesis V is accepted by S_{1,15} • C; let S = S_j be its first letter. Because {a, b} ∈ p(V) = p(S), we know p(S′) ∩ p(S) ≠ ∅. Given that S = S_j and S′ = S_{j′}, we conclude that j ∈ δ_{S_{1,15}}(j′, S_j). Because S′ = S_{j′} is the first letter of the word U, we know the initial state of U must be j′. Therefore, by the induction hypothesis, U = S′V is accepted by S_{1,15} • C.
We have shown that Equation (18) holds for words of length k + 1 if it holds for words of length k. This completes the inductive step.
Lemma 5.14 characterized the normal forms that have a 2 × 2 × 2 pattern. The two lemmas below jointly characterize the normal forms that have a 2 × 4 pattern. Because their proofs are similar in spirit to that of Lemma 5.14, they have been relegated to Appendix C.

Theorem 5.17. A word U over S ∪ C is a normal form if and only if U ∈ L(S_{1,3} • S_{4,9} • S_{10,15} • C); that is, N = L(S_{1,3} • S_{4,9} • S_{10,15} • C).

Proof. If |U| = 1 then the result follows from Lemma 5.13. If |U| > 1, then U has a 2 × 2 × 2 or a 2 × 4 pattern and the result follows from Proposition 5.12 and Lemmas 5.14, 5.15 and 5.16.

Lower Bounds
Recall that the distance between operators U and V is defined as the operator norm ‖U − V‖ = sup{‖(U − V)v‖ ; ‖v‖ = 1}. Because G is universal, for every ε > 0 and every element U ∈ SU(4), there exists V ∈ G such that ‖U − V‖ ≤ ε. In such a case we say that V is an ε-approximation of U. We now take advantage of Theorem 5.17 to count Clifford+CS operators and use these results to derive a worst-case lower bound on the CS-count of approximations.

Proof. Each Clifford+CS operator is represented by a unique normal form and this representation is CS-optimal. Hence, to count the number of Clifford+CS operators of CS-count n, it suffices to count the normal forms of CS-count n. By Theorem 5.17, and since Clifford operators have CS-count 0, a normal form of CS-count n is a word

w = w_1 w_2 w_3 w_4  (19)

such that w_1 ∈ L(S_{1,3}), w_2 ∈ L(S_{4,9}), w_3 ∈ L(S_{10,15}), w_4 ∈ L(C), and the CS-counts of w_1, w_2, and w_3 sum to n. There are

(6 · 8^{n−1} + 6 · 4^{n−1} + 3 · 2^{n−1}) · |C|  (20)

words of the form of Equation (19) such that exactly one of w_1, w_2, or w_3 is not ε. Similarly, the number of words of the form of Equation (19) such that exactly two of w_1, w_2, or w_3 are not ε is given by Equation (21). Finally, the number of words of the form of Equation (19) such that none of w_1, w_2, and w_3 is ε is given by Equation (22). Summing Equations (20), (21) and (22) and applying the geometric series formula then yields the desired result.
Proof. Recall that the Clifford+CS operators of CS-count 0 are exactly the Clifford operators and that |C| = 92160. The result then follows from Lemma 6.1 and the geometric series formula.

Proposition 6.3. For every ε ∈ R_{>0}, there exists U ∈ SU(4) such that any Clifford+CS ε-approximation of U has CS-count at least 5 log_2(1/ε) − 0.67.
Proof. By a volume counting argument. Each operator must occupy an ε-ball worth of volume in the 15-dimensional space SU(4), and the sum of all these volumes must add to the total volume of SU(4), which is (√2 π^9)/3. The number of circuits up to CS-count n is taken from Corollary 6.2 (we must divide the result by two to account for the absence of the overall phase ω in the special unitary group) and a 15-dimensional ε-ball has a volume of

(π^{15/2} / Γ(15/2 + 1)) · ε^{15}.
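The volume-counting step can be sketched numerically. In the snippet below (ours), the exact operator count of Corollary 6.2 is replaced by a hypothetical bound c0 · growth^n; the dominant growth rate 8^n from Equation (20) is what produces the slope 5 in log_2(1/ε), since 15/log_2(8) = 5:

```python
import math

def min_cs_count(eps, growth=8.0, c0=1.0):
    """Volume-counting sketch: if there are at most c0 * growth**n operators
    of CS-count <= n (c0 is a hypothetical constant; Corollary 6.2 gives the
    true count), then covering SU(4) with eps-balls forces
    c0 * growth**n * vol_ball(eps) >= vol(SU(4)).  Solve for n."""
    vol_su4 = math.sqrt(2) * math.pi ** 9 / 3                 # volume of SU(4)
    vol_ball = math.pi ** 7.5 / math.gamma(7.5 + 1) * eps ** 15  # 15-dim ball
    return math.log(vol_su4 / (c0 * vol_ball), growth)

# The slope in log2(1/eps) is 15 / log2(8) = 5, matching Proposition 6.3.
slope = (min_cs_count(1e-4) - min_cs_count(1e-3)) / (math.log2(1e4) - math.log2(1e3))
assert abs(slope - 5.0) < 1e-6
```

The additive constant (−0.67 in Proposition 6.3) comes from the exact count and volumes, which this sketch only approximates.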
Let U be an element of G of determinant 1. By Equation (1) of Section 2, U can be written as U = (1/√2^k) M, where k ∈ N and the entries of M belong to Z[i]. We can therefore talk about the least denominator exponent of the SU(4) representation of U. We finish this section by relating the least denominator exponent of the SU(4) representation of U to the CS-count of the normal form of U.
Proposition 6.4. Let U be an element of G of determinant 1, let k be the least denominator exponent of the SU(4) representation of U, and let k′ be the CS-count of the normal form of U. Then k′ is bounded above and below by linear functions of k.

Proof. The CS-count of the normal form of U is equal to the least denominator exponent of the SO(6) representation of U. Equation (11) then implies the upper bound for k′. Likewise, examination of Theorem 5.17 reveals that the CS operators in the circuit for U must be separated from one another by a Clifford with a least denominator exponent of at most 2 in its unitary representation. Combining this with the fact that the largest least denominator exponent of an operator in C is 3, we arrive at the lower bound for k′.
Remark 6.5. It was established in [31] that, for single-qubit Clifford+T operators of determinant 1, there is a simple relation between the least denominator exponent of an operator and its T-count: if the least denominator exponent of the operator is k, then its T-count is 2k − 2 or 2k. Interestingly, this is not the case for Clifford+CS operators in SU(4), as suggested by Proposition 6.4. Clearly, the CS-count of an operator always scales linearly with the least denominator exponent of its unitary representation. For large k, computational experiments with our code [16] suggest that most operators are such that k′ ≈ k, though there are examples of operators with k′ ≈ 2k. One example of such an operator is [R(X ⊗ I, I ⊗ Z)R(X ⊗ I, I ⊗ X)R(Z ⊗ I, I ⊗ X)R(Z ⊗ I, I ⊗ Z)]^m for m ∈ N.

Conclusion
We described an exact synthesis algorithm for a fault-tolerant multi-qubit gate set which is simultaneously optimal, practically efficient, and explicitly characterizes all possible outputs. The algorithm establishes the existence of a unique normal form for two-qubit Clifford+CS circuits. We showed that the normal form for an operator can be computed with a number of arithmetic operations linear in the gate-count of the output circuit. Finally, we used a volume counting argument to show that, in the typical case, ε-approximations of two-qubit unitaries will require a CS-count of at least 5 log_2(1/ε). We hope that the techniques developed in the present work can be used to obtain optimal multi-qubit normal forms for other two-qubit gate sets, such as the two-qubit Clifford+T gate set. Indeed, it can be shown that the SO(6) representations of Clifford+T operators are exactly the set of SO(6) matrices with entries in the ring Z[1/√2]. Further afield, the exceptional isomorphism for SU(8) could potentially be leveraged to design good synthesis algorithms for three-qubit operators. Such algorithms would provide a powerful basis for more general quantum compilers.
An interesting avenue for future research is to investigate whether the techniques and results presented in this paper can be used in the context of synthillation. Quantum circuit synthesis and magic state distillation are often kept separate. But it was shown in [7] that performing synthesis and distillation simultaneously (synthillation) can lead to overall savings. The analysis presented in [7] uses T gates and T states. Leveraging higher-dimensional synthesis methods such as the ones presented here, along with distillation of CS states, could yield further savings.

Acknowledgments
We would like to thank Matthew Amy, Xiaoning Bian, and Peter Selinger for helpful discussions. In addition, we would like to thank the anonymous reviewers whose comments greatly improved the paper.

A Computing SO(6) Representations
Consider a unitary matrix U ∈ G with det(U) = 1. Suppose we want to compute the entry in the third row and fourth column of its equivalent SO(6) representative Û, i.e., Û_{3,4}. Note that since det(U) = 1 we have U ∈ SU(4), so we do not have to multiply by a phase before mapping U to Û. To calculate Û_{3,4} we need the inner product ⟨B₃, U B₄⟩. We compute U B₄ directly and then use the anticommutativity of the wedge product to simplify the resulting expression. The rule for computing inner products of wedge products gives

⟨e_i ∧ e_j, e_k ∧ e_ℓ⟩ = ⟨e_i, e_k⟩⟨e_j, e_ℓ⟩ − ⟨e_i, e_ℓ⟩⟨e_j, e_k⟩ = δ_{i,k} δ_{j,ℓ} − δ_{i,ℓ} δ_{j,k},

and using this property together with the linearity of the inner product we obtain Û_{3,4}. Computing every entry in the same way yields the full SO(6) representative Û.
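This entry-by-entry computation can be mechanized. The following sketch computes the induced action of a 4 × 4 unitary on Λ²C⁴ in the raw basis {e_i ∧ e_j : i < j}; this basis choice is an illustrative assumption, since the paper's orthonormal basis B₁, …, B₆ (which makes the image of every Clifford+CS operator real) is not reproduced here, so the result agrees with the SO(6) representative only up to that change of basis:

```python
import numpy as np
from itertools import combinations

# Basis e_i ∧ e_j of Λ²C⁴, ordered lexicographically with i < j.
PAIRS = list(combinations(range(4), 2))

def wedge_square(U):
    """6x6 matrix of the action induced by U on Λ²C⁴:
    (e_k ∧ e_l) → (U e_k) ∧ (U e_l) = Σ_{i<j} (U_ik U_jl − U_il U_jk) e_i ∧ e_j."""
    W = np.empty((6, 6), dtype=complex)
    for a, (i, j) in enumerate(PAIRS):
        for b, (k, l) in enumerate(PAIRS):
            W[a, b] = U[i, k] * U[j, l] - U[i, l] * U[j, k]
    return W

# The map is a homomorphism: Λ²(AB) = Λ²(A) Λ²(B).
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
B, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
assert np.allclose(wedge_square(A @ B), wedge_square(A) @ wedge_square(B))

# For a real U in SO(4) ⊂ SU(4), the image is a real matrix in SO(6).
c, s = np.cos(0.3), np.sin(0.3)
U = np.block([[np.array([[c, -s], [s, c]]), np.zeros((2, 2))],
              [np.zeros((2, 2)), np.eye(2)]])
W = wedge_square(U)
assert np.allclose(W.imag, 0) and np.allclose(W @ W.T, np.eye(6))
assert np.isclose(np.linalg.det(W), 1)
```

The determinant check uses the general fact that det(Λ²U) = det(U)³ for 4 × 4 matrices, so determinant-1 inputs map to determinant-1 outputs.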

B Proof of Lemma 2.2
This appendix contains a proof of Lemma 2.2, whose statement we reproduce below for completeness.
Lemma. Let C ∈ C and let P, Q, and L be distinct elements of P \ {I}. Assume that P, Q, and L are Hermitian and that PQ = QP, PL = LP, and QL = −LQ. Then the following relations hold: CR(P, Q)C† = R(CPC†, CQC†), R(P, Q) = R(Q, P), R(P, −PQ) = R(P, Q), R(P, Q)² ∈ C, and R(P, L)R(P, Q) = R(P, Q)R(P, iQL).
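The relations in the lemma can be checked numerically. The sketch below assumes the explicit form R(P, Q) = exp(iπ/2 · Π), with Π the projector onto the joint −1 eigenspace of P and Q; this form is an assumption (Definition 2.1 is not reproduced here), but it is consistent with CS = R(Z ⊗ I, I ⊗ Z):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def R(P, Q):
    """R(P, Q) = exp(i*pi/2 * Pi), with Pi = (I - P)(I - Q)/4 the projector onto
    the joint -1 eigenspace of commuting Hermitian Paulis P and Q. Since Pi is
    idempotent, the exponential is simply I + (i - 1) * Pi."""
    Pi = (np.eye(4) - P) @ (np.eye(4) - Q) / 4
    return np.eye(4) + (1j - 1) * Pi

P = np.kron(Z, I2)   # commutes with Q and with L
Q = np.kron(I2, Z)   # anticommutes with L
L = np.kron(I2, X)

# Sanity check of the assumed form: R(Z ⊗ I, I ⊗ Z) is the CS gate diag(1,1,1,i).
assert np.allclose(R(P, Q), np.diag([1, 1, 1, 1j]))
# Relations from the lemma:
assert np.allclose(R(P, Q), R(Q, P))
assert np.allclose(R(P, -P @ Q), R(P, Q))
assert np.allclose(R(P, L) @ R(P, Q), R(P, Q) @ R(P, 1j * Q @ L))
# R(P, Q)^2 = diag(1,1,1,-1) = CZ, a Clifford operator.
assert np.allclose(R(P, Q) @ R(P, Q), np.diag([1, 1, 1, -1]))
```

The final assertion illustrates the relation R(P, Q)² ∈ C for this particular choice of P and Q rather than proving Clifford membership in general.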
Proof. Since C is the normalizer of P, CPC† and CQC† are Hermitian and commuting elements of P \ {I}. Equation (23) then yields the conjugation relation CR(P, Q)C† = R(CPC†, CQC†). We now show that the last two terms in Equation (29) belong to the Clifford group. By the definition of the Clifford group, and as exp((iπ/4)I) = exp(iπ/4)I ∈ C, we therefore conclude that R(P, Q)² ∈ C. As both Q and −PQ are Hermitian Paulis, we conclude from Equations (29) and (30) that R(P, −PQ) = R(P, Q). Finally, note that if R(P, Q) is as in Definition 2.1, then (I − P)/2 and (I − Q)/2 are idempotent. We can therefore explicitly compute the exponential to obtain an explicit matrix form for R(P, Q).

We use Lemma C.1 to concisely describe the pattern of the rightmost letter of a word U ∈ L(S_{a,b}) ⊆ L(S_{1,15}) throughout the remaining lemmas.
• Consider a length-k accepting word U of the above form such that ℓ = 1. Then U = SU′ where, given that the first letter of U′ is S′, we have S ∈ {S1, . . . , S9}, S′ ∈ {S10, . . . , S15}, and U′ ∈ L(S10,15 • C).
We know that we must have p(S′) ∩ p(S) ≠ ∅ and that p(S′) is finer than p(U). For each such 2 × 4 pattern, there is one and only one element of {S1, . . . , S9} which is a finer partition than p(U), and therefore S is the leftmost syllable of the normal form equivalent to U under FFP. As U′ ∈ N, we thus conclude that U = SU′ ∈ N.
• Now suppose that Equation (31) holds for some k ≥ 3. We will show that Equation (31) holds for k + 1 by establishing two inclusions.
• Consider a length-k accepting word U of the above form such that ℓ = 1. Then U = SU′ where, given that the first letter of U′ is S′, we have S ∈ {S1, . . . , S3}, S′ ∈ {S4, . . . , S9}, and U′ ∈ L(S4,9 • S10,15 • C). We know that we must have p(S′) ∩ p(S) ≠ ∅, and by inspection the only way this is achieved is if p(S′) ∩ p(S) = {{a, b}} ⊆ {{x, y} ; (x, y) ∈ [3] × [4, 6]}. As p(S⁻¹) = p(S) is such that p(S⁻¹) ∩ p(U) = ∅, we conclude that p(S⁻¹) is not finer than p(U) and that the least denominator exponent of U is k − 1 by Lemmas 4.13 and 5.14. In the case that U has a 2 × 2 × 2 pattern, we know that p(U) = p(S) and so {a, b} ∈ p(U). In the case that U has a 2 × 4 pattern, we know that there exists {c, d} ∈ {{x, y} ; (x, y) ∈ [3] × [4, 6]} and that p(S′) is finer than p(U). For each such 2 × 4 pattern, there is one and only one element of {S1, . . . , S3} which is a finer partition than p(U), and therefore S is the leftmost syllable of the normal form equivalent to U under FFP. As U′ ∈ N, we thus conclude that U = SU′ ∈ N.
• Now suppose that Equation (32) holds for some k ≥ 3. We will show that Equation (32) holds for k + 1 by establishing two inclusions.

⊆:
We have already proven this inclusion in the case ℓ = 1 for all k. We therefore need only consider the case ℓ > 1. Let U ∈ L(S1,3 • S4,9 • S10,15 • C) \ L(S1,9 • S10,15 • C) be a word of length k whose first letter is S ∈ {S1, . . . , S3}. We conclude that {c, f} ∈ p(U) and that U has a 2 × 4 pattern. As p(S) is the lowest-indexed element of S finer than p(U), under FFP we conclude that S is the leftmost syllable of the normal form equivalent to U. Since U′ ∈ N by assumption, we therefore conclude U = SU′ ∈ N, and have established that U has a 2 × 4 pattern such that p(U) ∩ {{x, y} ; (x, y) ∈