Abstract
Quantum compilation is the task of translating a high-level description of a quantum algorithm into a sequence of low-level quantum operations. We propose and motivate the use of Xor-And-Inverter Graphs (XAG) to specify Boolean functions for quantum compilation. We present three different XAG-based compilation algorithms to synthesize quantum circuits in the Clifford + T library, hence targeting fault-tolerant quantum computing. The algorithms are designed to minimize relevant cost functions, such as the number of qubits, the T-count, and the T-depth, while allowing the flexibility of exploring different solutions. We present novel resource estimation results for relevant cryptographic and arithmetic benchmarks. The achieved results show a significant reduction in both T-count and T-depth when compared with the state-of-the-art.
Introduction
Different programming languages are currently available to program quantum computers at a high level of abstraction, with the purpose of enabling a wide community to exploit their exceptional computational capabilities. Relevant examples are: Q# (Microsoft)^{1}, Qiskit (IBM)^{2}, PyQuil/Quil (Rigetti)^{3}, Cirq (Google)^{4}, Quipper^{5}, Scaffold/ScaffCC^{6}, and ProjectQ^{7}. These languages require fast and reliable methods to compile the program into hardware-specific low-level quantum operations. The compilation result is evaluated by the number of qubits used, as well as by the number and the type of low-level operations obtained.
Many quantum algorithms, such as Grover’s^{8}, Shor’s^{9}, and HHL^{10}, require the computation of combinational logic functions, e.g., arithmetic functions, which usually need large amounts of resources. Methods capable of generating quantum circuits for such logic designs are needed to run these algorithms on a quantum computer. For example, HHL requires the reciprocal operation, which causes a significant overhead in the number of qubits with respect to the other components of the algorithm. In some cases, the resources required to perform logic operations may dominate the overall resources and exceed the available computing power. Moreover, quantum circuits performing combinational logic, called oracles, find application in post-quantum cryptography. It has been shown how Grover’s algorithm can be used to break symmetric encryption schemes such as the Advanced Encryption Standard (AES), if the quantum circuit for the encryption function is known^{11,12}. The resources required to break a newly proposed post-quantum encryption scheme depend on the resources required to build the corresponding quantum oracle. Consider, for example, the security categories for public-key schemes defined by the National Institute of Standards and Technology (NIST) in its call to standardize post-quantum cryptography^{13}. Shor’s algorithm also requires combinational logic and can be used to construct quantum algorithms for integer factorization, finite field discrete logarithms, and elliptic curve discrete logarithms. As a consequence, cryptosystems based on these problems cannot be considered secure in a post-quantum setting.
Even if the technology is still far from achieving the system sizes and performance that these applications require, estimating the resources needed to perform combinational functions has a relevant impact on the design and applicability of advanced quantum algorithms. The resource footprint of these operations, e.g., a large number of quantum operations and qubits, can exceed the resources actually available, hence preventing some algorithms from being run. Consequently, there is a large interest in compilation methods that minimize the impact of combinational logic on the cost of quantum algorithms.
Several research works focus on improving (often manually) quantum implementations of cryptographic functions. As Shor’s algorithm can be used to break elliptic curve cryptography, the authors of ref. ^{14} have optimized the quantum circuit that computes the costly elliptic curve scalar multiplication. The authors of ref. ^{11} present Clifford + T implementations of AES (key sizes 128, 192, and 256) used to evaluate the resources needed to run an exhaustive key search with Grover’s algorithm. In ref. ^{15}, the authors present resource estimates of quantum preimage attacks on SHA-2 and SHA-3. They present quantum oracles for SHA-256 and SHA3-256, improve the reversible implementations derived in ref. ^{16}, and evaluate the cost of running the attack on a surface-code-based fault-tolerant quantum computer. In ref. ^{17}, the authors focus on improving the implementation of the AES S-box to simplify Grover-based key search. Similarly, the authors of ref. ^{18} provide implementations of SHA-256 and AES-128, results that were subsequently improved by Jaques et al. ^{12}.
In this work, we focus on the problem of automatically compiling arbitrary logic functions for fault-tolerant quantum computing, starting from a multilevel logic network representation. Unlike the previously cited works, we do not rely on manual and design-specific optimizations. Our automatic compilation strategies are designed to minimize qubits and gates, with an emphasis on exploring the trade-off between the two cost functions. The algorithms are inspired by methods currently applied in classical multilevel logic synthesis, a 50-year-old research field focused on the optimization and mapping of combinational designs^{19}. Algorithms and data structures developed in this field can be borrowed, adapted, and expanded to the synthesis of quantum circuits. In particular, we exploit a convenient graph-based data structure called Xor-And-Inverter Graphs (XAG). As we target fault-tolerant quantum computing, we compile into the Clifford + T universal library and focus on the following cost functions: the T-count, i.e., the number of generated T gates; the T-depth, i.e., the maximum number of T gates to be performed sequentially, also referred to as the number of T-stages; and the number of qubits. We identify how the characteristics of the network impact the resource footprint of the compiled circuit and elaborate on how the network can be modified to achieve better compilation results using state-of-the-art minimization strategies^{20,21}.
Logic networks are often used as a convenient representation to develop scalable reversible synthesis algorithms^{22,23,24}. A recent work^{25} presents an automatic hierarchical synthesis method that leverages lookup table (LUT) decomposition. Such a method has the advantage of being applicable to any logic network, independently of the Boolean function implemented by its nodes. More importantly, it enables control over the number of generated qubits: the network is decomposed into several single-output subnetworks whose results are stored in extra qubits, so that the number of extra qubits can be controlled through the size of the subnetworks. Nevertheless, the method cannot efficiently optimize the gate count. Typically, when the number of qubits is heavily constrained, the number of gates increases significantly. This happens because large subnetworks are generated and, with no control over the Boolean functions they implement, they are likely to be compiled into large circuits. In addition, LUT decomposition causes a windowing effect: parts of the network are prevented from being synthesized together, resulting in more gates. To address this issue, the work in ref. ^{26} implements a LUT decomposition strategy that allows some control over the grouped logic, reducing the T-count.
The present work is based on a different synthesis approach, which we first introduced in ref. ^{27}, that enables better control over all the cost functions. This approach is based on identifying repeated patterns in the network that translate into quantum circuits with few gates. In particular, the graph is decomposed into parts that can each be implemented by a single Toffoli gate. Hence, a direct correlation can be established between the features of the network and the cost in terms of T gates (T-count and T-depth) and number of qubits.
In this work, we present the latest improvements to XAG-based compilation, which are reflected in the algorithms collected in the open-source library caterpillar. We propose XAG-based compilation as the method of choice to automatically synthesize quantum circuits implementing cryptographic and arithmetic logic functions with application in post-quantum cryptography and fault-tolerant quantum computing. Through the detailed description of the algorithms provided, the reader can identify (i) the most suitable algorithm and (ii) the best XAG preprocessing steps for a specific compilation problem.
The first algorithm presented, originally proposed in ref. ^{27}, minimizes the T-count by correlating it with the number of AND nodes in the XAG (multiplicative complexity). Indeed, the number of T gates in the final circuit is bounded by four times the multiplicative complexity of the input network. In ref. ^{27}, we demonstrated an average 20× reduction in T-count with respect to LUT-based methods. The second algorithm minimizes the T-depth by relating it to (i) the maximum number of levels of AND nodes in the graph, i.e., the multiplicative depth, and (ii) the number of AND nodes in the same level sharing input signals. This algorithm achieves a T-depth equal to the multiplicative depth of the graph and was originally used in ref. ^{28} to synthesize designs with at most 5 inputs. We provide a detailed algorithmic description of both algorithms. Furthermore, we present synthesis results for relevant cryptographic benchmarks (https://homes.esat.kuleuven.be/~nsmart/MPC/ and http://cs-www.cs.yale.edu/homes/peralta/CircuitStuff/CMT.html), which can serve as resource estimates for post-quantum attacks. These results are compared with the state-of-the-art estimates available in the literature for some of the designs, showing improvements in both T-count and T-depth. Unlike ref. ^{28}, we provide resource estimates for very large designs, demonstrating the scalability of the proposed methods. We discuss and compare the results that the two methods achieve and explain how properties of the XAGs can be modified to tune the obtained results. For example, we identify node scheduling as a key tool to minimize the number of qubits when using the second algorithm.
Finally, in this paper we propose a third compilation algorithm that performs quantum memory management to explore the trade-off between qubits and T-count. The number of available helper qubits is a parameter of the algorithm, which returns a valid compilation solution that does not exceed the given qubit constraint; an optimization procedure then reduces the number of T gates. In particular, it exploits SAT solvers to find a strategy to fit the logic into a constrained number of qubits. The idea is to enable the reuse of helper qubits by uncomputing intermediate results, solving the so-called reversible pebbling game^{29}. In a previous work^{30}, we introduced the problem of quantum memory management and proposed a solution based on SAT. With respect to the first attempt to apply this idea to XAGs in ref. ^{27}, here we propose to work at a coarser level of granularity. In other words, while the previous method enabled the computation and uncomputation of every single node in the XAG separately, in this approach we group selected sets of nodes together. This allows us to control the overhead in the number of gates generated when constraining the number of qubits. We present a SAT encoding that, by reducing the number of variables and the size of clauses, is applicable to larger designs and enables a second optimization algorithm to further improve the T-count of the compiled results. We demonstrate the ability of this method to trade off qubits for gates on a selection of our benchmarks.
In classical logic synthesis, a good method is based on the synergy between data structure and algorithm, working together to minimize the target functions. Multilevel logic networks have proven to be both scalable and compact data structures. For example, the And-Inverter Graph (AIG) is a popular network used in both academic and industrial frameworks^{31,32}.
In this work, we present different algorithms for the synthesis of quantum circuits that rely on the convenient representation of the logic as an XAG. This is a logic network over the gate basis {∧, ⊕, ¬}, meaning that each node of the network computes either the 2-input AND operation, the exclusive-OR operation, i.e., the 2-input XOR, or the inversion operation \(\neg x=1\oplus x=\bar{x}\). We use \(\bar{x}=1-x\) to denote the Boolean complement of x, and define \({x}^{0}=\bar{x}\) and x^{1} = x. A simple XAG computing the majority-of-three Boolean function is shown in Fig. 1a.
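A Boolean chain is a formal notation for logic networks. Given primary inputs x_{1}, …, x_{n}, a logic network consisting of r local functions is represented by a sequence called a Boolean chain
\[ x_i = f_i\bigl(x_{i_1}, \ldots, x_{i_{\mathrm{ar}(f_i)}}\bigr) \quad \text{for } n < i \le n + r, \]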
where f_{i} is a gate function with ar(f_{i}) inputs and 0 ≤ i_{j} < i for 1 ≤ j ≤ ar(f_{i}) are indices to primary inputs or previous steps in the sequence, as defined in ref. ^{33}. An XAG logic network representing an n-variable Boolean function with inputs x_{1}, …, x_{n} is modeled as a Boolean chain with steps
\[ x_i = x_{j(i)} \oplus x_{k(i)} \quad \text{or} \quad x_i = x_{j(i)}^{p(i)} \wedge x_{k(i)}^{q(i)} \]
for n < i ≤ n + r, depending on whether the step computes the 2-input XOR or the 2-input AND operation, where r is the number of steps. The constant values 1 ≤ j(i) < k(i) < i point to inputs or previous steps in the chain. When a step computes the AND operation, the Boolean constants p(i) and q(i) are used to possibly complement the gate’s fanins. Note that complemented inputs of XOR gates can be propagated to their outputs; hence, we do not define p(i) and q(i) for the XOR steps. The value of a single-output function is computed by the last step of the chain, \(f={x}_{n+r}^{p}\), which may be complemented. In the case of multi-output functions, there is a set of steps that compute the function’s values: \({f}_{o}={x}_{o}^{p}\), where o ∈ O and O is the set of all output indices. We write ∘_{i} = ∧ if step i computes an AND gate, and ∘_{i} = ⊕ if step i computes an XOR gate.
We define the multiplicative complexity of the logic network as the number of AND gates it contains: \(\tilde{c}=|\{i \mid {\circ}_{i}=\wedge\}|\). We also define the multiplicative complexity of the Boolean function, which is the minimum number of AND nodes required to represent it as an XAG. Clearly, the multiplicative complexity of a network is an upper bound on the multiplicative complexity of the Boolean function it realizes.
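In this work, we exploit the fact that every AND node acts on two multi-input parity functions. When the input to the AND node is either a primary input, another AND gate, or a network’s output, the arity of this function is equal to 1. Formally, let the linear transitive fanin of a node x_{i} in the logic network be defined using the recursive function
\[ \mathrm{ltfi}(x_i) = \begin{cases} \{x_i\} & \text{if } i \le n \text{ or } \circ_i = \wedge \text{ or } x_i \text{ is an output,} \\ \mathrm{ltfi}(x_{j(i)}) \mathbin{\Delta} \mathrm{ltfi}(x_{k(i)}) & \text{otherwise,} \end{cases} \]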
where ‘Δ’ denotes the symmetric difference of two sets. It is easy to see that all elements in ltfi(x_{i}) are either inputs, outputs, or steps that compute an AND gate. Figure 4 illustrates an AND node and its two linear transitive fanin cones.
Example 1
The network in Fig. 1a, in which dotted lines represent inversion, implements the majority-of-three function \(\left\langle {x}_{1}{x}_{2}{x}_{3}\right\rangle ={x}_{1}{x}_{2}\vee {x}_{1}{x}_{3}\vee {x}_{2}{x}_{3}\). The network corresponds to a Boolean chain with four steps:
For this network
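Finally, we introduce the concept of level in the XAG network. Every step x_{i} of the network, with 1 ≤ i ≤ n + r, is characterized by a quantity called level, defined as:
\[ L(x_i) = \begin{cases} 0 & \text{if } i \le n, \\ 1 + \max\{L(x_j) \mid x_j \in \mathrm{ltfi}(x_{j(i)}) \cup \mathrm{ltfi}(x_{k(i)})\} & \text{otherwise.} \end{cases} \]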
In other words, a network’s node x_{i} is at level L(x_{i}) = l only if the node with the maximum level among all the nodes in the linear transitive fanin cones of x_{i} is at level l − 1. This means that only AND nodes and outputs count to define the depth of the network, because only AND nodes and outputs (besides primary inputs, which are at level 0) appear in the ltfi sets. We define \(\max_{n < i \le n+r} L(x_i)\) as the multiplicative depth of the network.
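The definitions of ltfi, level, multiplicative complexity, and multiplicative depth can be made concrete with a short sketch. It assumes a hypothetical dictionary encoding of the Boolean chain (inputs 1..n, steps as ("xor", j, k) or ("and", j, k), not the caterpillar API) and a four-step example network of our own choosing, f = (x_1 ⊕ x_2) ∧ ((x_1 ∧ x_3) ⊕ x_2), rather than the network of Fig. 1a.

```python
from functools import lru_cache

def analyze(n, steps, outputs):
    """Return ltfi, level, multiplicative complexity, and multiplicative depth.

    Hypothetical encoding (not the caterpillar API): primary inputs are 1..n and
    each step i > n is ("xor", j, k) or ("and", j, k). Fan-in polarities are
    left out because they affect neither the ltfi sets nor the levels.
    """
    @lru_cache(maxsize=None)
    def ltfi(i):
        if i <= n or steps[i][0] == "and" or i in outputs:
            return frozenset({i})
        _, j, k = steps[i]
        return ltfi(j) ^ ltfi(k)  # symmetric difference of the two cones

    @lru_cache(maxsize=None)
    def level(i):
        if i <= n:
            return 0
        _, j, k = steps[i]
        return 1 + max(level(u) for u in ltfi(j) | ltfi(k))

    mult_complexity = sum(op == "and" for op, _, _ in steps.values())
    mult_depth = max(level(i) for i in steps)
    return ltfi, level, mult_complexity, mult_depth

# f = (x1 XOR x2) AND ((x1 AND x3) XOR x2), a four-step example chain
steps = {4: ("xor", 1, 2), 5: ("and", 1, 3), 6: ("xor", 5, 2), 7: ("and", 4, 6)}
ltfi, level, mc, depth = analyze(3, steps, outputs={7})
print(sorted(ltfi(6)), level(7), mc, depth)  # [2, 5] 2 2 2
```

The XOR step x_6 is not materialized: its ltfi set {x_2, x_5} is what the AND node x_7 sees as one of its two parity inputs.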
In addition to providing a very compact representation for Boolean functions, XAG networks have another characteristic that makes them excellent data structures for quantum compilation: each node represents a logic function for which a convenient quantum circuit implementation exists. This allows us to recognize the existence of a dependency between the network characteristics, e.g., the multiplicative complexity/depth, and the synthesized quantum circuit. It is indeed possible to derive an upper bound on the number of expensive gates from characteristics of the XAG.
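Given a logic network computing an n-variable Boolean function f(x), a compilation algorithm finds a quantum circuit that implements the unitary operation
\[ U_f : |x\rangle|y\rangle|0\rangle^{\otimes k} \mapsto |x\rangle|y \oplus f(x)\rangle|0\rangle^{\otimes k}, \]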
where k is the number of extra qubits internally used by the circuit and restored back to \(|0\rangle\), also referred to as helper qubits. This circuit is often called an oracle. Automatic compilation of logic designs requires two steps, illustrated in Fig. 1: (i) transforming a possibly non-reversible Boolean function into a reversible circuit, and (ii) translating the reversible circuit into a quantum circuit.
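As a small illustration of the oracle definition, the sketch below builds U_f as an explicit permutation matrix for the case k = 0, using majority-of-three as f. The basis-index convention (x << m) | y and the function names are our own choices. Because XOR-ing f(x) into y is undone by a second application, the matrix is a permutation that is its own inverse, which is why the construction is reversible even when f is not.

```python
import numpy as np

def oracle_matrix(f, n, m):
    """Permutation matrix of U_f: |x>|y> -> |x>|y XOR f(x)>, with k = 0 helpers.

    f maps an n-bit integer x to an m-bit integer; basis index = (x << m) | y.
    """
    dim = 2 ** (n + m)
    U = np.zeros((dim, dim), dtype=int)
    for x in range(2 ** n):
        for y in range(2 ** m):
            U[(x << m) | (y ^ f(x)), (x << m) | y] = 1
    return U

def majority(x):
    return int(bin(x).count("1") >= 2)

U = oracle_matrix(majority, n=3, m=1)
# XOR-ing f(x) into y makes U_f a permutation that is its own inverse
print(np.array_equal(U @ U, np.eye(16, dtype=int)))  # True
```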
The first step is responsible for mapping the Boolean function into a reversible circuit. A reversible circuit is a logic representation characterized by a fixed number of lines that store inputs, outputs, and intermediate data, acted upon by reversible gates. For example, Fig. 1b shows the reversible circuit performing the function specified by the XAG in Fig. 1a. Such a circuit is built using 2-input Toffoli gates, CNOT gates, and X (or NOT) gates. The Toffoli gate is characterized by a set of two controls x_{1}, x_{2} and a single target y_{1}. It performs the transformation:
\[ |x_1\rangle|x_2\rangle|y_1\rangle \mapsto |x_1\rangle|x_2\rangle|y_1 \oplus x_1 x_2\rangle. \]
In other words, it inverts the target only if the logic AND of the two controls evaluates to one. In practice, if y_{1} is initialized to \(|0\rangle\), the Toffoli gate performs the AND operation. The CNOT is specified by a target and a control qubit: it complements the target if the state of the control is \(|1\rangle\). If applied to a target in the state \(|0\rangle\), the CNOT gate copies the computational-basis value of the control.
Once the Boolean function is expressed using reversible gates, it needs to be compiled into a quantum circuit. Quantum circuits are a way to describe quantum programs: a sequence of operations performed on qubits, represented by quantum gates. We expect the reader to be familiar with the quantum circuit representation and gate abstractions and refer to ref. ^{34} for a detailed description. In fault-tolerant quantum computing, we consider gates from the Clifford + T universal library. This consists of the CNOT gate, the Hadamard gate (H), the phase gate (S), as well as the T gate and its inverse T^{†}. The T gate is particularly expensive to implement fault-tolerantly. As a consequence, the T-count (number of T gates) is a good measure for the cost of a fault-tolerant implementation of a given quantum program^{35,36}.
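To make the T-count concrete, the sketch below multiplies out a widely used textbook Clifford + T gate sequence for the Toffoli gate, which contains 7 T/T^{†} gates, and checks it against the Toffoli permutation matrix. The gate order may differ from the circuit of ref. ^{37}; the NumPy simulation helpers (one_qubit, cnot) are our own illustration.

```python
import numpy as np
from functools import reduce

N = 3  # qubit 0 and qubit 1 are controls, qubit 2 is the target
I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
TDG = T.conj().T

def one_qubit(gate, q):
    """Embed a single-qubit gate on qubit q (qubit 0 is the most significant bit)."""
    return reduce(np.kron, [gate if i == q else I2 for i in range(N)])

def cnot(c, t):
    """Permutation matrix that flips bit t of the basis index when bit c is 1."""
    U = np.zeros((2 ** N, 2 ** N))
    for s in range(2 ** N):
        flip = (s >> (N - 1 - c)) & 1
        U[s ^ (flip << (N - 1 - t)), s] = 1
    return U

# Textbook 7-T Toffoli sequence: H on the target, a CCZ phase built from
# CNOTs and T/T-dagger gates, then H on the target again
circuit = [("h", 2), ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 2),
           ("cx", 1, 2), ("tdg", 2), ("cx", 0, 2), ("t", 1), ("t", 2),
           ("h", 2), ("cx", 0, 1), ("t", 0), ("tdg", 1), ("cx", 0, 1)]

single = {"h": H, "t": T, "tdg": TDG}
U = np.eye(2 ** N)
for name, *qubits in circuit:
    gate = cnot(*qubits) if name == "cx" else one_qubit(single[name], qubits[0])
    U = gate @ U  # later gates multiply from the left

toffoli = np.eye(8)
toffoli[[6, 7]] = toffoli[[7, 6]]  # swap |110> and |111>
t_count = sum(name in ("t", "tdg") for name, *_ in circuit)
print(np.allclose(U, toffoli), t_count)  # True 7
```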
Our algorithms exploit well-known state-of-the-art quantum implementations of the 2-input Toffoli gate. The Toffoli gate has a Clifford + T implementation that requires 7 T gates^{37}, which is optimal^{38,39}:
This implementation has been used to derive the quantum circuit for the majority-of-three function shown in Fig. 1c. When the Toffoli gate targets a qubit initialized to \(|0\rangle\), it can be implemented using 4 T gates, with a T-depth of 2, and without requiring any additional qubit^{40,41}:
where H_{Y} = SH and \(|T\rangle = TH|0\rangle\). Moreover, when the result of the Toffoli gate is uncomputed, this can be done without any T gate by exploiting measurement-based uncomputation^{40}, as shown:
There also exists another AND gate implementation with T-depth = 1, which combines the AND circuit from ref. ^{41} and the Toffoli gate implementation with T-depth = 1 in ref. ^{42}. The circuit requires one extra qubit with respect to the implementation in (8):
where \(|+\rangle = H|0\rangle\).
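The T-free measurement-based uncomputation of an AND result can be checked on a small state vector. The sketch assumes the standard recipe (apply H to the target, measure it in the computational basis, and apply a CZ between the two controls if the outcome is 1); the function name uncompute_and and the index layout are ours. Both outcomes restore the controls exactly, and only Clifford operations are used.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CZ = np.diag([1, 1, 1, -1])  # phase flip on the control state |11>

def uncompute_and(alpha, outcome):
    """Measurement-based uncomputation of t = a AND b (no T gates).

    alpha holds the amplitudes of the control register |ab>; the target holds
    a AND b. We apply H to the target, project it onto |outcome>, and apply
    CZ on the controls when the outcome is 1. Returns the controls' state.
    """
    psi = np.zeros(8, dtype=complex)
    for a in (0, 1):
        for b in (0, 1):
            psi[(a << 2) | (b << 1) | (a & b)] = alpha[2 * a + b]
    psi = np.kron(np.eye(4), H) @ psi  # H on the target (least significant)
    branch = psi.reshape(4, 2)[:, outcome]  # controls, given the outcome
    branch = branch / np.linalg.norm(branch)
    return CZ @ branch if outcome == 1 else branch

rng = np.random.default_rng(7)
alpha = rng.normal(size=4) + 1j * rng.normal(size=4)
alpha /= np.linalg.norm(alpha)
for outcome in (0, 1):
    # both outcomes occur with probability 1/2 and leave the controls intact
    print(outcome, np.allclose(uncompute_and(alpha, outcome), alpha))
```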
Results
In this section, we report the statistics of the quantum circuits generated by our XAG-based algorithms. We selected two publicly available benchmark suites, including arithmetic, cryptographic (e.g., AES), and floating-point designs, with applications in post-quantum cryptography and fault-tolerant quantum computing.
The first benchmark suite contains the best-known versions of logic networks in terms of multiplicative complexity and depth, collected by the Computer Security Resource Center (CSRC) at the National Institute of Standards and Technology (NIST). We synthesize: (i) finite field multiplication in GF(2^{6}) using the irreducible polynomial x^{6} + x^{3} + 1 (m × 6 × 31), and in GF(2^{7}) using the irreducible polynomials x^{7} + x^{4} + 1 (m × 7 × 41) and x^{7} + x^{3} + 1 (m × 7 × 31); (ii) binary multiplication with different input sizes n (bm_n); (iii) a 16-bit and an 8-bit S-box (s16, s8); (iv) finite field multiplication in GF(2^{8}) using the AES polynomial x^{8} + x^{4} + x^{3} + x + 1 (×8 × 4 × 31).
In addition, we evaluate our method on a set of circuits used in the context of Multi-Party Computation and Fully Homomorphic Encryption. From the benchmarks available online, we synthesize: (i) the block cipher DES in its expanded and non-expanded variants (the latter meaning that the input key is assumed non-expanded); (ii) the block cipher AES with 128-, 192-, and 256-bit keys; (iii) the cryptographic hash functions MD5, Keccak, SHA-256, and SHA-512; (iv) arithmetic functions such as adders, multipliers, and comparators; (v) IEEE floating-point operations. We preprocess the XAGs using the toolbox for reducing multiplicative complexity proposed by the authors of ref. ^{20}. This enables us to further improve the provided resource estimates for these designs.
Improving the T-count versus T-depth
Table 1 shows the synthesis results of the first two proposed algorithms. Alg. 1 minimizes the T-count, while Alg. 2 minimizes the T-depth without increasing the number of T gates, at the cost of additional qubits. For both algorithms, the number of T gates equals 4 times the multiplicative complexity of the network. The second algorithm obtains a T-depth equal to the multiplicative depth of the network. The last two columns of Table 1 compare the algorithms by reporting the absolute percentage change in T-depth (%Td) and in number of qubits (%Q) of Alg. 2 with respect to Alg. 1.
Figure 2 compares the results automatically obtained using Alg. 2 with some resource estimates available in the literature^{11,12,15,17}. The comparison shows a significant reduction in both T-count and T-depth, at the cost of a less significant increase in the number of qubits. Nevertheless, it is important to note that, once mapped into an error-correcting code, T gates require a large number of dedicated qubits. Note that the authors of ref. ^{17} only report the number of Toffoli gates and the Toffoli-depth. We obtain the corresponding T-count and T-depth by considering the Clifford+T implementation of the Toffoli gate with 7 T gates and a T-depth equal to 3, which is optimal^{38}.
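The conversion applied to the figures of ref. ^{17} amounts to the following, where N_{Tof} and D_{Tof} denote the reported Toffoli count and Toffoli-depth (our notation); the T-depth expression assumes that every layer of Toffoli gates is replaced by one 3-stage Toffoli implementation:
\[ \text{T-count} = 7\,N_{\mathrm{Tof}}, \qquad \text{T-depth} = 3\,D_{\mathrm{Tof}}. \]
For instance, a hypothetical circuit with 100 Toffoli gates and a Toffoli-depth of 10 would be counted as 700 T gates and a T-depth of 30.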
Qubits/T-count trade-off
In this section, we show the results generated by our third algorithm, which manages the memory resources during the compilation of the logic design. Our method allows us to force the compilation to synthesize a circuit with a limited number of helper qubits. Figure 3 shows the compilation results obtained by setting the number of available helper qubits to different values for a selection of designs. The plots show the number of qubits on the x-axis and the obtained T-count on the y-axis. For every fixed number of qubits, we report two points: the non-optimized and the optimized results. The latter are obtained by running a post-optimization procedure, encoded as a SAT problem, on the initial (non-optimized) result. It can be seen how the procedure allows us to choose between different qubit/T-count trade-off solutions and how the optimization reduces the T-count.
Discussion
In the previous section, we reported the characteristics of the quantum circuits compiled using our three XAG-based algorithms. In particular, the first two techniques achieve results that are predictable by inspecting the characteristics of the logic network. In detail, given a logic network characterized by a multiplicative complexity \(\tilde{c}\), i.e., the number of AND nodes, and by a multiplicative depth:
• both algorithms achieve a T-count equal to \(4\tilde{c}\);
• Alg. 2 achieves a T-depth equal to the multiplicative depth;
• the qubit overhead needed to achieve such a T-depth depends on the number of shared inputs in the linear transitive fanins of the AND nodes in a level.
This suggests that optimizing a network with respect to these parameters can strongly improve the synthesized quantum circuits, e.g., as done in ref. ^{21}, where the T-depth is reduced by reducing the multiplicative depth of the network.
Inspecting the results of the comparison in Table 1 reveals a trade-off between T-depth and number of qubits. Indeed, while Alg. 1 is far from achieving the T-depth of Alg. 2, it requires fewer qubits. There are two reasons for the increase in qubits that characterizes Alg. 2. The first is that it employs the AND implementation with a single T-stage presented in Section “Introduction” (10), which requires one more qubit than implementation (8) used by Alg. 1. This means that the compilation requests this extra qubit whenever an AND node is computed. In addition, the implementation of AND nodes used by the second algorithm applies T gates to the controls as well as to the target qubit. For this reason, if two AND nodes share the same input signal, the corresponding quantum circuit has a T-depth equal to 2, as each AND implementation adds a T gate to the shared qubit. If the AND nodes at the same level of an XAG do not share any input, they can all be computed within a single T-stage. To achieve this result, our second algorithm copies inputs that are shared among several AND nodes in a level onto new qubits. Hence, the compilation requests a new qubit whenever inputs are shared among AND nodes at the same level of the XAG. In conclusion, summing the number of AND nodes in a level and the number of inputs shared among them gives the number of helper qubits required to compile that level. Since helper qubits are cleaned up after all the nodes in the level are computed, the level for which this quantity is largest dominates and determines the total number of helper qubits for the synthesis of the entire network. Further details on the algorithm, including detailed pseudocode, can be found in Section “Methods”.
We chose to report in Table 1 the two extremes that can be reached using our constructive algorithms. It is also possible to obtain results ‘in-between’, i.e., a smaller improvement in T-depth and a smaller qubit overhead with respect to Alg. 2, e.g., by modifying Alg. 1 to use the implementation with T-depth equal to one. In addition, as the connectivity of each AND node in a level has an impact on the T-depth, different results can be obtained by changing how the level of each node is computed. For example, it is possible to change the scheduling of the nodes to reduce the T-depth while minimizing the qubit overhead of Alg. 2.
Our third algorithm focuses on exploring the trade-off between T-count and number of qubits. Figure 3 shows how our method is capable of providing different compiled solutions by taking the number of helper qubits as a parameter. Our method searches for the best way of reusing memory space by computing and uncomputing helper qubits that store intermediate results. This problem corresponds to the reversible pebbling game. Its complexity has been studied in ref. ^{43}, where the author proves that finding the minimum number of pebbles is PSPACE-complete, as in the case of the non-reversible pebbling game. Moreover, the problem is PSPACE-hard to approximate up to an additive constant^{44}. An explicit asymptotic expression for the best time-space product is given in ref. ^{45}. This is a global problem, hard to approximate and decompose, and hence difficult to tackle with heuristic techniques. Here, the problem is encoded as a SAT problem and solved globally, returning a valid memory cleanup strategy that guarantees the upper bound on the number of helper qubits while also aiming to minimize the T-count.
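To illustrate the reversible pebbling game itself, the sketch below solves tiny instances exactly by breadth-first search over pebble configurations. It is a stand-in for illustration only, not the SAT encoding used by our third algorithm: it minimizes the number of moves rather than the T-count and is exponential in the number of nodes. Each pebble models a helper qubit, and the function name reversible_pebbling is ours.

```python
from collections import deque

def reversible_pebbling(preds, outputs, max_pebbles):
    """Shortest reversible pebbling strategy found by breadth-first search.

    preds maps every internal node to the internal nodes it depends on
    (primary inputs are always available and are omitted). A pebble may be
    placed (compute) or removed (uncompute) on a node only while all of the
    node's predecessors are pebbled. The goal state has pebbles on the outputs
    only. Returns the list of moves, or None if the pebble budget is too small.
    """
    nodes = sorted(preds)
    bit = {v: 1 << i for i, v in enumerate(nodes)}
    goal = sum(bit[v] for v in outputs)
    parent = {0: None}
    queue = deque([0])
    while queue:
        state = queue.popleft()
        if state == goal:
            moves = []
            while parent[state] is not None:
                state, move = parent[state]
                moves.append(move)
            return moves[::-1]
        for v in nodes:
            if any(not state & bit[u] for u in preds[v]):
                continue  # some predecessor is unpebbled: v cannot be toggled
            nxt = state ^ bit[v]
            if bin(nxt).count("1") > max_pebbles or nxt in parent:
                continue
            parent[nxt] = (state, ("compute" if nxt & bit[v] else "uncompute", v))
            queue.append(nxt)
    return None

chain = {"a": [], "b": ["a"], "c": ["b"]}  # a -> b -> c, output c
print(reversible_pebbling(chain, ["c"], 3))
print(reversible_pebbling(chain, ["c"], 2))  # None: two pebbles are not enough
```

With three pebbles, the strategy computes a, b, and c and then uncomputes b and a; with only two, b can never be uncomputed once c is pebbled, because uncomputing b requires a pebble on a.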
With respect to the SAT-based technique in ref. ^{27}, the algorithm proposed in this work exploits a completely different SAT encoding, which is more compact in both the number of variables and the number of clauses. With this method, it is possible to obtain competitive results for larger designs while achieving better results for smaller designs. For example, consider the compilation of the small design s8 on 20 helper qubits: our method achieves a T-count of 164, while the results in ref. ^{27} show a T-count of about 280.
In Fig. 3, we show non-optimized versus optimized pebbling solutions. The non-optimized solution is provided by the SAT solver without any constraints on the number of T gates generated. The optimized solution is obtained by starting from the initial solution and running optimization rounds, which iteratively add clauses to the SAT problem to minimize the T-count. The more time is spent in the optimization procedure, the better the solution can become. The optimized points shown in Fig. 3 are either optimal or the best result found after 1.5 hours of running the optimization procedure on a machine with two Intel Xeon E5-2680 v3 (Haswell) CPUs with 2.5 GHz clock frequency and 16 GB of main memory.
The optimization procedure removes unnecessary steps that the solver may insert in the solution. Indeed, none of the clauses used to encode the problem prevents the solver from uncomputing nodes even when the pebble limit has not been reached. Preventing this at the encoding level would require an impractical increase in the size of the SAT problem. The optimization reveals the trade-off between qubits and T-count.
Methods
Algorithm 1: minimizing the T-count
Our first algorithm guarantees a number of T gates that is proportional to the multiplicative complexity \(\tilde{c}\) of the input network. Indeed, the final quantum circuit has \(4\tilde{c}\) T gates.
The key insight is that each AND node in the logic network is driven by two multi-input parity functions of variables that are either inputs or other AND nodes in the lower levels of the logic network. Figure 4 shows the node x_{i} and the two parity functions with the respective linear transitive fanins. The polarity variables p(i) and q(i) account for a possible inversion of the inputs of the AND node. The pseudocode of the algorithm is provided in Alg. 1. Since the algorithm dedicates one helper qubit to each node of the XAG to store its computed Boolean function, we use node identifiers, e.g., x_{i}, as parameters for quantum operations, e.g., NOT(x_{i}), meaning that the operation is performed on the corresponding qubit.
Lines 19–22 show that the algorithm first computes all the steps of the network that perform an AND (or compute an output) using the function compute. Then, all the intermediate results are restored to \(|0\rangle\) by uncomputing ‘compute’. In lines 23–24, NOT gates are placed on negated outputs. The function compute (lines 2–18) builds the circuit for each step x_{i} as illustrated in Fig. 4. In particular, it identifies two qubits corresponding to nodes in the ltfi cones that are not shared between the cones, namely t_{1} and t_{2}. Then, the parity functions are computed in-place onto the qubits t_{1} and t_{2}. Next, the complemented edges are evaluated and NOT gates are applied if necessary (see Fig. 4). In lines 13–14, the step x_{i} is finally computed on a new qubit, using a CNOT gate in the case of an XOR output and, otherwise, the implementation of the AND node described in (8), which has a T-count equal to 4 and a T-depth equal to 2. Finally, the parity functions are uncomputed.
Algorithm 1
Low T-count compilation algorithm.
Note that we assume \(L_1 \neq L_2\); otherwise the two fan-in functions of the AND gate would be equal, making the AND gate redundant. Note also that the intersection of \(L_1\) and \(L_2\) may be non-empty. Since we want to compute the value of \(L_1\) in place on some signal \(t_1 \in L_1\) that does not appear in \(L_2\), we must ensure that \(L_1 \not\subseteq L_2\). If instead \(L_1 \subseteq L_2\), it is sufficient to swap \(L_1\) and \(L_2\): because \(L_1 \neq L_2\), the swapped pair then satisfies the condition.
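The ordering argument can be illustrated with a short sketch. The helper choose_targets below is hypothetical: it swaps the two sets when \(L_1 \subseteq L_2\), takes \(t_1\) from \(L_1 \setminus L_2\), and computes the parity of \(L_1\) first, so that \(t_2\) may also belong to \(L_1\) because it is read before being modified. An exhaustive classical check confirms that both targets end up holding the intended parities.

```python
from itertools import product

def choose_targets(L1, L2):
    """Order the fan-in sets and pick the in-place targets.
    Returns (first, second, t1, t2): parity(first) is computed onto t1, then
    parity(second) onto t2; t1 is taken from first minus second."""
    L1, L2 = set(L1), set(L2)
    if L1 == L2:
        raise ValueError("equal fan-in parities: the AND node is redundant")
    if L1 <= L2:                 # L1 is a subset of L2: no valid t1, so swap
        L1, L2 = L2, L1
    t1 = min(L1 - L2)            # non-empty: L1 is not a subset of L2 at this point
    only_second = L2 - L1
    t2 = min(only_second) if only_second else min(L2)  # may lie in L1 if L2 <= L1
    return L1, L2, t1, t2

def parities_in_place(L1, L2, values):
    """Apply the CNOTs of both in-place parity computations to a copy of values."""
    first, second, t1, t2 = choose_targets(L1, L2)
    s = dict(values)
    for u in sorted(first - {t1}):    # may read t2, which is still unmodified
        s[t1] ^= s[u]
    for u in sorted(second - {t2}):   # never reads t1, since t1 is not in second
        s[t2] ^= s[u]
    return first, second, t1, t2, s

def check(L1, L2):
    """Exhaustively confirm that t1 and t2 end up holding the two parities."""
    names = sorted(set(L1) | set(L2))
    for bits in product((0, 1), repeat=len(names)):
        v = dict(zip(names, bits))
        first, second, t1, t2, s = parities_in_place(L1, L2, v)
        assert s[t1] == sum(v[u] for u in first) % 2
        assert s[t2] == sum(v[u] for u in second) % 2
    return True

print(check({"a", "b"}, {"a", "b", "c"}))   # True (needs the swap)
print(check({"a", "b", "c"}, {"b", "d"}))   # True
```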
In addition, when \(L_2 \subseteq L_1\), the value computed for \(L_2\) can be reused to compute \(L_1\). This is achieved by replacing \(L_1\) with \((L_1 \setminus L_2) \cup \{x_k\}\), where \(x_k\) is the fan-in whose linear transitive fan-in is \(L_2\). An example is shown in Fig. 5: there, \(\mathrm{ltfi}(x_j)\) includes \(\mathrm{ltfi}(x_k)\) and \(\mathrm{ltfi}(x_j) \setminus \mathrm{ltfi}(x_k) = \{t_0\}\). Since the parity of \(L_2\) is computed only once, this reduces the number of CNOT operations.
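A sketch of the CNOT saving, under the assumption that each parity is computed in place with one CNOT per additional element: without reuse, the two parities cost \((|L_1|-1)+(|L_2|-1)\) CNOTs, whereas with reuse they cost \((|L_2|-1)+|L_1\setminus L_2|\), a saving of \(|L_2|-1\). The set names mimic Fig. 5 but are hypothetical, and the functions cnot_count and reuse_schedule are illustrative only; the qubit \(t_2\), which holds the parity of \(L_2\) after the first CNOTs, plays the role of \(x_k\).

```python
from itertools import product

def cnot_count(L1, L2, reuse):
    """CNOTs needed to compute both fan-in parities in place (compute half only)."""
    L1, L2 = set(L1), set(L2)
    if reuse and L2 < L1:
        # parity(L2) already sits on t2 and stands in for x_k:
        # L1 is replaced by (L1 - L2) plus the single qubit holding x_k
        return (len(L2) - 1) + len(L1 - L2)
    return (len(L1) - 1) + (len(L2) - 1)

def reuse_schedule(L1, L2):
    """CNOT list (control, target) for the reuse variant; assumes L2 is a proper subset of L1."""
    L1, L2 = set(L1), set(L2)
    t2, t1 = min(L2), min(L1 - L2)
    gates = [(u, t2) for u in sorted(L2 - {t2})]         # parity(L2) onto t2
    gates += [(u, t1) for u in sorted(L1 - L2 - {t1})]   # parity(L1 - L2) onto t1
    gates.append((t2, t1))                               # plus x_k, read from t2
    return gates, t1, t2

# Fig. 5-like situation (hypothetical names): ltfi(x_k) = {a, b, c}, ltfi(x_j) = {t0, a, b, c}.
Lj, Lk = {"t0", "a", "b", "c"}, {"a", "b", "c"}
gates, t1, t2 = reuse_schedule(Lj, Lk)
assert len(gates) == cnot_count(Lj, Lk, reuse=True) == 3
assert cnot_count(Lj, Lk, reuse=False) == 5

# Classical check that the reuse schedule still produces both parities.
for bits in product((0, 1), repeat=4):
    s = dict(zip(sorted(Lj), bits))
    v = dict(s)
    for control, target in gates:
        s[target] ^= s[control]
    assert s[t1] == sum(v[u] for u in Lj) % 2
    assert s[t2] == sum(v[u] for u in Lk) % 2
```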
Algorithm 2: minimizing the T-depth
Our second algorithm targets the reduction of the T-depth. Unlike the previous algorithm, it uses the implementation of the AND operation in (10), which has 4 T gates, 4 qubits, and 1 T-stage.
We denote by \(X_l = \{x_i \mid L(x_i) = l\}\) the set of all nodes at level \(l\). The key idea is that if two AND nodes in the same level do not share any inputs in their ltfi sets, they can be computed in a single T-stage using implementation (10). In general this is not the case, as AND nodes often share inputs. To overcome this problem, the algorithm copies every overlapping set of inputs onto a new helper qubit. This procedure, described in Alg. 2, produces circuits whose number of T-stages equals the multiplicative depth of the network. While the previous algorithm proceeds in topological order, this one proceeds level by level (see lines 10–17). For each level, the function copy_overlaps assigns to each node a pair of qubits on which the parities of its two fan-in cones are computed, defining the mapping CP. If the node shares some inputs with another node, a new qubit is assigned to compute the corresponding parity function; otherwise, a qubit corresponding to a node in the fan-in cone is used. For example, if a node \(x_i \in X_l\) has inputs \(t_1, t_3, t_5\) (on qubits \(q_1, q_3, q_5\)) in common with node \(x_j \in X_l\), then a new qubit \(q_i\) is used as the target of three CNOT gates with the shared input qubits as controls. As can be seen in line 11, the copies are performed before computing any of the nodes in the level, so that the actual AND implementations act on non-overlapping qubits, resulting in a single T-stage. Once the copies have been computed, each node is passed to the function compute_on_copies (lines 1–9), which uses the qubits associated by the mapping CP with each fan-in parity function as controls to compute the AND. Once all AND nodes in the level are computed, the parities are uncomputed (line 14). Finally, the levels of the XAG are uncomputed from top to bottom.
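The grouping into levels can be sketched as follows, assuming (as an interpretation, since the text does not spell out the definition of \(L\)) that the level of a node is the number of AND nodes on the longest path from the primary inputs to it. With one T-stage per level, as stated above for Alg. 2, the number of T-stages of the compute part equals the largest level, i.e., the multiplicative depth. The dictionary-based XAG omits complemented edges, which do not affect levels, and the function names are hypothetical.

```python
def and_levels(xag, inputs):
    """L(x): number of AND nodes on the longest path from the inputs to x."""
    level = {x: 0 for x in inputs}
    for node, (op, fanins) in xag.items():          # dict order = topological order
        deepest = max(level[s] for s in fanins)
        level[node] = deepest + 1 if op == "AND" else deepest
    return level

def levels_of_and_nodes(xag, inputs):
    """X_l for every l: the AND nodes grouped by level."""
    level = and_levels(xag, inputs)
    X = {}
    for node, (op, _) in xag.items():
        if op == "AND":
            X.setdefault(level[node], []).append(node)
    return X

# Hypothetical network: n4 = (n1 ^ c) & n2 with n1 = a & b and n2 = c & d.
xag = {
    "n1": ("AND", ("a", "b")),
    "n2": ("AND", ("c", "d")),
    "n3": ("XOR", ("n1", "c")),
    "n4": ("AND", ("n3", "n2")),
}
X = levels_of_and_nodes(xag, ["a", "b", "c", "d"])
print(X)        # {1: ['n1', 'n2'], 2: ['n4']}
print(max(X))   # multiplicative depth: 2
```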
Every node, regardless of whether it has shared fan-ins, can be uncomputed without using copies (lines 15–17) by applying the function compute defined in Alg. 1. Finally, in lines 18–end, NOT gates are placed on complemented outputs. An illustrative example is shown in Fig. 6, where the algorithm is applied to a simple level \(X_l = \{x_i, x_s\}\) with one overlapping input \(t_0\), such that \(\mathrm{ltfi}(x_{j(i)}) \cap \mathrm{ltfi}(x_{k(s)}) = \emptyset\) and \(\mathrm{ltfi}(x_{j(s)}) = \mathrm{ltfi}(x_{k(i)}) = \{t_0\}\). The figure shows how the overlapping input is copied to a new qubit before the parity functions are computed; the two ANDs can then be computed in parallel with a T-depth of 1.
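The following sketch is a simplified, hypothetical stand-in for copy_overlaps, not the function of Alg. 2. A fan-in parity is computed in place when one of its inputs is used by no other fan-in in the level; a single shared input is used directly as a control the first time it is needed; otherwise the whole parity is computed onto a fresh qubit (Alg. 2 copies only the overlapping inputs, which this sketch simplifies). The usage reproduces a situation like Fig. 6 with hypothetical input names: one copy of \(t_0\) suffices, and all AND controls are distinct, so the two ANDs can share one T-stage.

```python
from collections import Counter
from itertools import count, product

def assign_parity_qubits(level_fanins):
    """Simplified, hypothetical stand-in for copy_overlaps.
    level_fanins: {and_node: (F1, F2)}, each F a set of qubit names (the ltfi sets).
    Returns (targets, cnots): targets[(node, k)] is the qubit that will hold the
    parity of the k-th fan-in; cnots are the (control, target) gates applied first."""
    uses = Counter(t for pair in level_fanins.values() for F in pair for t in F)
    fresh = (f"copy{i}" for i in count())
    claimed, targets, cnots = set(), {}, []
    for node, pair in level_fanins.items():
        for k, F in enumerate(pair):
            exclusive = sorted(t for t in F if uses[t] == 1)
            single = next(iter(F)) if len(F) == 1 else None
            if exclusive:                 # nobody else reads it: compute in place
                tgt = exclusive[0]
                cnots += [(u, tgt) for u in sorted(F - {tgt})]
            elif single is not None and single not in claimed:
                tgt = single              # a shared input used directly as a control
            else:                         # overlap: compute the parity on a new qubit
                tgt = next(fresh)
                cnots += [(u, tgt) for u in sorted(F)]
            claimed.add(tgt)
            targets[(node, k)] = tgt
    return targets, cnots

# Fig. 6-like level: x_i and x_s overlap only in t0.
level = {"x_i": ({"t1", "t2"}, {"t0"}), "x_s": ({"t0"}, {"t3"})}
targets, cnots = assign_parity_qubits(level)
assert len(set(targets.values())) == len(targets)   # disjoint controls: one T-stage

# Classical check: every target holds the parity of its fan-in set.
names = ["t0", "t1", "t2", "t3"]
for bits in product((0, 1), repeat=4):
    s = dict(zip(names, bits))
    v = dict(s)
    s.update({q: 0 for q in targets.values() if q not in s})
    for control, target in cnots:
        s[target] ^= s[control]
    for (node, k), q in targets.items():
        assert s[q] == sum(v[u] for u in level[node][k]) % 2
```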
Algorithm 2
Low T-depth compilation algorithm.
Algorithm 3: minimizing the number of qubits
All the algorithms described so far compute and uncompute every AND node at most once, and the compiled circuit is uniquely determined by the structure of the input network. In this section, we present a method that instead allows us to explore the solution space by computing and uncomputing nodes several times.
The third algorithm seeks the best strategy to uncompute intermediate results in order to optimize memory usage. The problem is equivalent to the reversible pebbling game. The game is played on a directed acyclic graph (DAG) using a limited number of pebbles. The player places pebbles on, or removes pebbles from, the DAG nodes according to the following rule: a pebble can be placed on (or removed from) a node only if all the inputs of that node carry a pebble. The game is won when all outputs of the network are pebbled and no other node is. A sequence of moves that leads to a winning configuration is called a pebbling strategy. Every pebble in the game corresponds to a helper qubit. Placing a pebble on a node corresponds to computing the logic of that node on the helper qubit; removing a pebble corresponds to uncomputing the value stored on that qubit. As a consequence, a pebbling strategy directly corresponds to a sequence of compute/uncompute operations, and the definition of a winning configuration (no pebbles on internal nodes) guarantees that performing these operations uncomputes all intermediate results. As demonstrated in ref. ^{30}, SAT solvers can be used to solve the reversible pebbling game and find a synthesis strategy for any Boolean function represented as a DAG.
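The rules of the game and the correspondence with compute/uncompute operations can be sketched in a few lines. Here the DAG is given as a map from each node to its children (its fan-in nodes); primary inputs are omitted because they are not nodes of the DAG. The function names are hypothetical, and each move toggles a single pebble, as in the description above.

```python
def is_winning_strategy(children, outputs, moves, P):
    """Check a sequence of single moves in the reversible pebbling game.
    children: {node: [fan-in nodes]}; primary inputs are always available.
    Each move toggles the pebble on one node."""
    pebbled = set()
    for v in moves:
        if any(c not in pebbled for c in children[v]):
            return False                  # an input of v carries no pebble
        pebbled ^= {v}                    # place it, or remove it if present
        if len(pebbled) > P:
            return False                  # more pebbles (helper qubits) than allowed
    return pebbled == set(outputs)        # outputs pebbled, nothing else

def to_operations(moves):
    """Translate moves into compute/uncompute calls on the helper qubits."""
    pebbled, ops = set(), []
    for v in moves:
        ops.append(("uncompute" if v in pebbled else "compute", v))
        pebbled ^= {v}
    return ops

children = {"n1": [], "n2": [], "out": ["n1", "n2"]}
strategy = ["n1", "n2", "out", "n2", "n1"]
print(is_winning_strategy(children, ["out"], strategy, P=3))      # True
print(is_winning_strategy(children, ["out"], strategy, P=2))      # False
print(is_winning_strategy(children, ["out"], strategy[:3], P=3))  # False: n1, n2 left pebbled
```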
The compilation problem is transformed into the following problem:
Problem 1
Given a DAG and a number of pebbles, find a valid pebbling strategy using the minimum number of moves.
To address this problem with a SAT solver, it is decomposed into a sequence of decision problems:
Problem 2
Given a DAG and P pebbles, does a valid pebbling strategy with K moves exist?
The solver either finds a solution and returns a pebbling strategy, or states that no solution exists. To solve Problem 1, whenever the SAT solver returns unsat, K is incremented and the solver is asked to find a strategy again, until a satisfying solution is found. Since K starts from a lower bound and is incremented by one at each step, the first solution found is guaranteed to be the one with the smallest K.
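The loop over \(K\) can be illustrated without a SAT solver. In this sketch, an exhaustive search over pebble configurations replaces the SAT call for Problem 2, which is only practical for tiny DAGs. A step follows the move clauses of the SAT encoding given in the next subsection, so several pebbles may be placed or removed in the same step as long as the children of every changing node are pebbled before and after it. The function names are hypothetical.

```python
from itertools import combinations

def successors(config, nodes, children, P):
    """Configurations reachable in one step: any set of nodes may flip at once,
    provided all their children are pebbled before and after the step."""
    for r in range(len(nodes) + 1):
        for flip in combinations(nodes, r):
            nxt = config ^ frozenset(flip)
            if len(nxt) <= P and all(c in config and c in nxt
                                     for v in flip for c in children[v]):
                yield nxt

def exists_strategy(children, outputs, P, K):
    """Problem 2 by exhaustive search (a stand-in for the SAT call)."""
    nodes, frontier = list(children), {frozenset()}
    for _ in range(K):
        frontier = {n for c in frontier for n in successors(c, nodes, children, P)}
    return frozenset(outputs) in frontier

def min_steps(children, outputs, P, K_max=20):
    """Problem 1: increment K until Problem 2 becomes satisfiable."""
    for K in range(1, K_max + 1):
        if exists_strategy(children, outputs, P, K):
            return K
    return None

children = {"n1": [], "n2": [], "out": ["n1", "n2"]}
print(min_steps(children, ["out"], P=3))   # 3
print(min_steps(children, ["out"], P=2))   # None
```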
SAT encoding
Here we give a quick overview of the basic encoding. The input DAG \(G = (V, E)\) contains nodes that compute output values; we refer to them as the elements of the set \(O \subseteq V\). Note that the primary inputs are not nodes of the DAG. Problem 2 is encoded in terms of the pebble state variables \(p_{v,i}\): for \(v \in V\) and \(0 \le i \le K\), \(p_{v,i}\) is a Boolean variable that evaluates to true if node \(v\) is pebbled at time \(i\). The SAT formula thus encodes \(K + 1\) pebble configurations, with \(K\) steps describing the transitions from one configuration to the next. The following sets of clauses describe the reversible pebbling problem:

Initial and final clauses. At time 0 all nodes are unpebbled, and at time K all outputs must be pebbled and all intermediate results unpebbled:
$$\mathop{\bigwedge}\limits_{v\in V}{\bar{p}}_{v,0} \wedge \mathop{\bigwedge}\limits_{v\in O}{p}_{v,K}\wedge \mathop{\bigwedge}\limits_{v\notin O}{\bar{p}}_{v,K}$$ 
Move clauses. If a node is pebbled or unpebbled at time i + 1, then all its children, i.e., all nodes w with (v, w) ∈ E, are pebbled at time i and at time i + 1:
$$\mathop{\bigwedge }\limits_{i=0}^{K-1}\mathop{\bigwedge}\limits_{(v,w)\in E}(({p}_{v,i}\oplus {p}_{v,i+1})\to ({p}_{w,i}\wedge {p}_{w,i+1}))$$ 
Cardinality clauses. At each step, at most P pebbles are used:
$$\mathop{\bigwedge }\limits_{i=0}^{K}(\mathop{\sum}\limits_{v\in V}{p}_{v,i} \le P)$$
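The three clause families can be generated mechanically. The sketch below produces clauses as lists of signed integers in the style of the DIMACS CNF format and checks a given sequence of configurations against them instead of calling a solver. The move clause for each edge and step is expanded into four equivalent CNF clauses; the cardinality constraint is encoded naively by forbidding every set of \(P + 1\) pebbled nodes, an assumption of this sketch (the text does not state which cardinality encoding is used) that is exponential and suitable only for tiny examples.

```python
from itertools import combinations

def encode(nodes, edges, outputs, P, K):
    """Clauses (lists of signed ints, DIMACS-style) for Problem 2.
    edges: (v, w) pairs meaning w is a child (fan-in) of v."""
    var = {}
    for i in range(K + 1):
        for v in nodes:
            var[v, i] = len(var) + 1                 # p_{v,i}
    cnf = [[-var[v, 0]] for v in nodes]              # initial: nothing pebbled
    cnf += [[var[v, K]] if v in outputs else [-var[v, K]] for v in nodes]  # final
    for i in range(K):                               # move clauses, i = 0 .. K-1
        for v, w in edges:
            a, b = var[v, i], var[v, i + 1]
            for c in (var[w, i], var[w, i + 1]):
                cnf += [[-a, b, c], [a, -b, c]]      # (a xor b) -> c
    for i in range(K + 1):                           # cardinality, naive: no P+1 pebbles
        for group in combinations(nodes, P + 1):
            cnf.append([-var[v, i] for v in group])
    return var, cnf

def satisfies(var, cnf, configs):
    """True if the pebble configurations configs[0..K] satisfy every clause."""
    value = {n: v in configs[i] for (v, i), n in var.items()}
    return all(any(value[abs(l)] == (l > 0) for l in clause) for clause in cnf)

nodes, outputs = ["n1", "n2", "out"], {"out"}
edges = [("out", "n1"), ("out", "n2")]
good = [set(), {"n1", "n2"}, {"n1", "n2", "out"}, {"out"}]
bad = [set(), {"n1", "n2"}, {"out"}, {"out"}]       # out and its children flip together
var, cnf = encode(nodes, edges, outputs, P=3, K=3)
print(satisfies(var, cnf, good), satisfies(var, cnf, bad))   # True False
```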
Example 2
Figure 7 illustrates how a network with only AND nodes can be compiled into a reversible network of Toffoli gates from a pebbling solution with 3 pebbles and 6 steps. Note that the final circuit uses only 2 helper qubits, i.e., the number of pebbles minus the number of outputs. The overall width is 7: the number of inputs plus the number of pebbles.
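A sketch of the translation from a pebbling strategy to Toffoli gates for an AND-only network, assuming single-pebble moves and one helper qubit per pebble. The network below is a hypothetical example with 4 inputs and 1 output (it is not the network of Fig. 7): a strategy with 3 pebbles yields a circuit of width \(4 + 3 = 7\) that uses \(3 - 1 = 2\) helper qubits besides the output qubit, consistent with the counting rule stated above. The function name is hypothetical.

```python
from itertools import product

def strategy_to_toffolis(ands, inputs, moves, P):
    """ands: {node: (fanin, fanin)} for an AND-only network; moves toggle pebbles.
    Each pebble is one helper qubit; placing or removing a pebble on v is the
    same Toffoli onto v's qubit. Returns the gates and the final qubit map."""
    free = [f"h{k}" for k in range(P)]      # one qubit per pebble
    where = {x: x for x in inputs}          # primary inputs live on their own qubits
    gates = []
    for v in moves:
        a, b = ands[v]
        if v in where:                      # remove pebble: uncompute v
            q = where.pop(v)
            free.append(q)
        else:                               # place pebble: compute v on a free qubit
            q = free.pop(0)
            where[v] = q
        gates.append(("CCX", where[a], where[b], q))
    return gates, where

ands = {"n1": ("a", "b"), "n2": ("c", "d"), "out": ("n1", "n2")}
inputs, P = ["a", "b", "c", "d"], 3
gates, where = strategy_to_toffolis(ands, inputs, ["n1", "n2", "out", "n2", "n1"], P)
print(len(gates), len(inputs) + P, P - 1)   # 5 7 2

# Classical check: the output qubit holds a & b & c & d, other helpers return to 0.
for bits in product((0, 1), repeat=4):
    s = {**dict(zip(inputs, bits)), **{f"h{k}": 0 for k in range(P)}}
    for _, c1, c2, t in gates:
        s[t] ^= s[c1] & s[c2]
    assert s[where["out"]] == all(bits)
    assert all(s[h] == 0 for h in s if h.startswith("h") and h != where["out"])
```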
XAGs are DAGs in which each node computes either the AND or the XOR function. It follows that the reversible pebbling game can be played directly on the XAG, as done in ref. ^{27}. Nevertheless, this approach does not exploit the structural properties of the XAG. In addition, the SAT encoding required for such an approach must distinguish between the different types of XAG nodes; for example, several clauses are required to enable in-place computation of XOR nodes. The resulting SAT problem features many variables and clauses and is only applicable to small designs.
For these reasons, we construct a different DAG from the XAG, which we call the abstract graph. Each AND node (together with its two input parity functions) corresponds to a box node of the abstract graph, as shown in Fig. 8. Once a strategy for pebbling the abstract graph is found, each time a pebble is placed on the box node that compresses \(x_i\), the function compute\((x_i)\) is called; whenever a pebble is removed from that node, the function compute\(^{\dagger}(x_i)\) is called to uncompute it.
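The construction of the abstract graph can be sketched as follows. For each AND node, the sketch collects the leaves of the XOR-only cones of its two fan-ins (primary inputs and AND nodes); leaves reached an even number of times are cancelled, which is an assumption about how reconvergent XORs are handled. The AND nodes among these leaves become the children of the box node. The weight \(|L_1| + |L_2|\) is one possible reading of "the number of inputs" used in the next subsection. Complemented edges are dropped because they only add NOT gates, and the representation and names are hypothetical.

```python
def parity_support(xag, signal, memo):
    """Leaves of the XOR-only cone of signal (primary inputs and AND nodes);
    a leaf reached an even number of times cancels out of the parity."""
    if signal not in xag or xag[signal][0] == "AND":
        return frozenset({signal})
    if signal not in memo:
        _, (a, b) = xag[signal]
        memo[signal] = parity_support(xag, a, memo) ^ parity_support(xag, b, memo)
    return memo[signal]

def abstract_graph(xag):
    """One box node per AND node: the AND nodes it depends on (its children in
    the abstract graph) and a weight |L1| + |L2| from its two fan-in parities."""
    memo, boxes = {}, {}
    for node, (op, (a, b)) in xag.items():
        if op == "AND":
            L1, L2 = parity_support(xag, a, memo), parity_support(xag, b, memo)
            boxes[node] = {
                "children": sorted(u for u in L1 | L2 if u in xag),
                "weight": len(L1) + len(L2),
            }
    return boxes

# Hypothetical XAG; fan-ins are plain signal names.
xag = {
    "n1": ("AND", ("a", "b")),
    "n2": ("XOR", ("n1", "c")),
    "n3": ("XOR", ("a", "c")),
    "n4": ("AND", ("n2", "n3")),
    "n5": ("XOR", ("n4", "n1")),
    "n6": ("AND", ("n5", "d")),
}
for box, info in abstract_graph(xag).items():
    print(box, info)
```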
Optimizing the pebbling solution
When the XAG is compressed into the abstract graph, we lose some information about the number of quantum gates required to compute each node. Indeed, the strategy found does not take into account that some box nodes require more gates than others. In addition, the SAT encoding of the standard reversible pebbling game does not include any clause that controls the number of moves, which is reflected in the number of generated T gates. An optimization step is introduced to overcome both problems.
The key idea is to associate with each box node \(v\) of the abstract graph a weight \(w_v\) equal to the number of inputs of the node. Indeed, the number of inputs is related to the number of CNOT gates needed to compute the parity functions 'hidden' in the compressed node. We then define a new set of variables for the SAT encoding, the activation variables \(a_{v,i}\): for \(v \in V\) and \(0 < i \le K\), \(a_{v,i}\) is a Boolean variable that evaluates to true if node \(v\) has changed its state at time \(i\), i.e., \(a_{v,i} \leftrightarrow (p_{v,i-1} \oplus p_{v,i})\). Once a weight-agnostic solution has been found, the following quantity represents the total weight of the strategy:
$$W_{s}=\mathop{\sum}\limits_{i=1}^{K}\mathop{\sum}\limits_{v\in V}{w}_{v}\,{a}_{v,i}\qquad (11)$$
The SAT solver is then asked to find a solution with a total weight of at most \(W_s - 1\) by adding a cardinality clause that expresses equation (11). This procedure is repeated until the solver returns unsat or hits a timeout.
As shown in the Results section, this optimization procedure reduces the number of T gates with respect to the initial solution. This result can be achieved even if every node has a weight equal to one. Indeed, the optimization introduces a cardinality constraint on the activation variables and hence eliminates all pebbling moves that are not essential to finish the game. As a consequence, fewer compute and uncompute operations, and hence fewer T gates, are required. If the weights are set to reflect the actual size of the parity functions, the number of CNOT gates in the solution is also reduced.
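The total weight \(W_s\) is easy to evaluate on a given sequence of configurations. The sketch below, with hypothetical names, computes the activations as the nodes whose state differs between consecutive configurations and sums their weights. Both strategies are valid with \(K = 5\) and \(P = 3\), but the second one removes and re-places a pebble unnecessarily, so its total weight is higher; bounding the total weight below that of the wasteful strategy excludes it, whether unit weights or size-based weights are used.

```python
def activations(configs):
    """a_{v,i} for i = 1..K: the nodes whose pebble state changes at step i."""
    return [configs[i - 1] ^ configs[i] for i in range(1, len(configs))]

def total_weight(configs, w):
    """W_s: sum over steps i and nodes v of w_v * a_{v,i}."""
    return sum(w[v] for changed in activations(configs) for v in changed)

def is_valid(configs, children, outputs, P):
    """Initial/final, move and cardinality conditions of the SAT encoding."""
    ok_ends = not configs[0] and configs[-1] == set(outputs)
    ok_moves = all(c in before and c in after
                   for before, after in zip(configs, configs[1:])
                   for v in before ^ after for c in children[v])
    return ok_ends and ok_moves and all(len(c) <= P for c in configs)

children = {"n1": [], "n2": [], "out": ["n1", "n2"]}
lean = [set(), {"n1", "n2"}, {"n1", "n2", "out"}, {"out"}, {"out"}, {"out"}]
wasteful = [set(), {"n1", "n2"}, {"n1"}, {"n1", "n2"}, {"n1", "n2", "out"}, {"out"}]
unit = {v: 1 for v in children}
for s in (lean, wasteful):
    print(is_valid(s, children, ["out"], P=3), total_weight(s, unit))
# True 5
# True 7
```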
Data availability
The circuits we synthesized have been collected by NIST and Yale University (http://cs-www.cs.yale.edu/homes/peralta/CircuitStuff/CMT.html) and by the Department of Electrical Engineering (ESAT) at KU Leuven (https://homes.esat.kuleuven.be/~nsmart/MPC/). For some entries of our benchmark, we used circuit implementations with low multiplicative complexity obtained at EPFL and available online at https://github.com/lsils/date2020_experiments.
Code availability
All the algorithms that we discussed in this work are part of the C++ open-source library caterpillar (https://github.com/gmeuli/caterpillar), which is one of the LSI logic synthesis libraries^{46}.
References
Svore, K. M. et al. Q#: Enabling scalable quantum computing and development with a high-level DSL. In Real World Domain Specific Languages Workshop, 7:1–7:10 (2018).
Aleksandrowicz, G. et al. Qiskit: An Opensource Framework for Quantum Computing (2019). Zenodo. https://doi.org/10.5281/zenodo.2562111.
Smith, R. S., Curtis, M. J. & Zeng, W. J. A practical quantum instruction set architecture. Preprint at https://arxiv.org/abs/1608.03355 (2017).
Ho, A. & Bacon, D. Announcing Cirq: An open source framework for NISQ algorithms. Google AI Blog (2018).
Green, A. S., Lumsdaine, P. L., Ross, N. J., Selinger, P. & Valiron, B. Quipper: a scalable quantum programming language. In ACM SIGPLAN Conference on Programming Language Design and Implementation, 333–342 (2013).
Javadi-Abhari, A. et al. ScaffCC: a framework for compilation and analysis of quantum computing programs. Proceedings of the 11th ACM Conference on Computing Frontiers, CF 2014 (2014).
Steiger, D. S., Häner, T. & Troyer, M. ProjectQ: an open source software framework for quantum computing. Quantum 2, 49 (2018).
Grover, L. K. Quantum computers can search arbitrarily large databases by a single query. Phys. Rev. Lett. 79, 4709 (1997).
Shor, P. W. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41, 303–332 (1999).
Harrow, A. W., Hassidim, A. & Lloyd, S. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 150502 (2009).
Grassl, M., Langenberg, B., Roetteler, M. & Steinwandt, R. Applying Grover’s algorithm to AES: quantum resource estimates. In: Post-Quantum Cryptography. PQCrypto 2016 (ed. Takagi, T.), vol. 9606, 29–43 (2016).
Jaques, S., Naehrig, M., Roetteler, M. & Virdia, F. Implementing Grover oracles for quantum key search on AES and LowMC. In Annual Int’l Conf. on the Theory and Applications of Cryptographic Techniques, 280–310 (Springer, 2020).
NIST. Submission requirements and evaluation criteria for the post-quantum cryptography standardization process (2016). Online at https://csrc.nist.gov/CSRC/media/Projects/LightweightCryptography/documents/finallwcsubmissionrequirementsaugust2018.pdf.
Häner, T., Jaques, S., Naehrig, M., Roetteler, M. & Soeken, M. Improved quantum circuits for elliptic curve discrete logarithms. In Int’l Conf. on Post-Quantum Cryptography, 425–444 (Springer, 2020).
Amy, M. et al. Estimating the cost of generic quantum preimage attacks on SHA-2 and SHA-3. In: Selected Areas in Cryptography – SAC 2016. (eds. Avanzi, R. & Heys, H.), vol. 10532, 317–337 (2017).
Parent, A., Roetteler, M. & Svore, K. M. Reversible circuit compilation with space constraints. Preprint at https://arxiv.org/abs/1510.00377 (2015).
Langenberg, B., Pham, H. & Steinwandt, R. Reducing the cost of implementing the advanced encryption standard as a quantum circuit. IEEE Trans. Quantum Eng. 1, 1–12 (2020).
Kim, P., Han, D. & Jeong, K. C. Time-space complexity of quantum search algorithms in symmetric cryptanalysis: applying to AES and SHA-2. Quantum Inf. Process. 17, 339 (2018).
Brayton, R. K., Hachtel, G. D. & Sangiovanni-Vincentelli, A. L. Multilevel logic synthesis. Proc. IEEE 78, 264–300 (1990).
Testa, E., Soeken, M., Riener, H., Amaru, L. & De Micheli, G. A logic synthesis toolbox for reducing the multiplicative complexity in logic networks. In Design, Automation and Test in Europe Conference (2020).
Häner, T. & Soeken, M. Lowering the T-depth of quantum circuits by reducing the multiplicative depth of logic networks. Preprint at https://arxiv.org/abs/2006.03845 (2020).
Rawski, M. Application of functional decomposition in synthesis of reversible circuits. In Reversible Computation. RC 2015. (eds. Krivine, J. & Stefani, J. B.), vol. 9138, 285–290 (2015).
Markov, I. L. & Saeedi, M. Faster quantum number factoring via circuit synthesis. Phys. Rev. A 87, 012310 (2013).
Shende, V. V., Prasad, A. K., Markov, I. L. & Hayes, J. P. Synthesis of reversible logic circuits. IEEE Trans. Comput. Aided Design Integrated Circuits Syst. 22, 710–722 (2003).
Soeken, M., Roetteler, M., Wiebe, N. & De Micheli, G. LUT-based hierarchical reversible logic synthesis. IEEE Trans. Comput. Aided Design Integrated Circuits Syst. 38, 1675–1688 (2018).
Meuli, G., Soeken, M., Roetteler, M. & De Micheli, G. ROS: Resource constrained oracle synthesis for quantum circuits. In Quantum Physics and Logic (2019).
Meuli, G., Soeken, M., Campbell, E., Roetteler, M. & De Micheli, G. The role of multiplicative complexity in compiling low T-count oracle circuits. Int’l Conf. on Computer-Aided Design (2019).
Meuli, G., Soeken, M., Roetteler, M. & De Micheli, G. Enumerating optimal quantum circuits using spectral classification. In Int’l Symp. on Circuits and Systems (2020).
Bennett, C. H. Time/space trade-offs for reversible computation. SIAM J. Comput. 18, 766–776 (1989).
Meuli, G., Soeken, M., Roetteler, M., Bjorner, N. & Micheli, G. D. Reversible pebbling game for quantum memory management. In Design, Automation and Test in Europe Conference, 288–291 (2019).
Brayton, R. & Mishchenko, A. ABC: An academic industrial-strength verification tool. In Int’l Conf. on Computer Aided Verification, 24–40 (Springer, 2010).
Synopsys. Design compiler graphical. Online at https://www.synopsys.com/implementationandsignoff/rtlsynthesistest/designcompilergraphical.html (2020). Accessed Apr 2020.
Knuth, D. E. The Art of Computer Programming, vol. 4A (Addison-Wesley, 2011).
Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge University Press, 2000).
Campbell, E. T. & Howard, M. Unified framework for magic state distillation and multiqubit gate synthesis with reduced resource cost. Phys. Rev. A 95, 022316 (2017).
Fowler, A. G., Mariantoni, M., Martinis, J. M. & Cleland, A. N. Surface codes: Towards practical large-scale quantum computation. Phys. Rev. A 86, 032324 (2012).
Maslov, D. Advantages of using relative-phase Toffoli gates with an application to multiple control Toffoli optimization. Phys. Rev. A 93, 022311 (2016).
Amy, M., Maslov, D., Mosca, M. & Roetteler, M. A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. CAD Integrated Circuits Syst. 32, 818–830 (2013).
Gosset, D., Kliuchnikov, V., Mosca, M. & Russo, V. An algorithm for the T-count. Quantum Inf. Comput. 14, 1261–1276 (2014).
Jones, C. Low-overhead constructions for the fault-tolerant Toffoli gate. Phys. Rev. A 87, 022328 (2013).
Gidney, C. Halving the cost of quantum addition. Quantum 2, 74 (2018).
Selinger, P. Quantum circuits of T-depth one. Phys. Rev. A 87, 042302 (2013).
Chan, S. M. Pebble games and complexity. Ph.D. thesis, University of California, Berkeley (2013).
Chan, S. M., Lauria, M., Nordstrom, J. & Vinyals, M. Hardness of approximation in PSPACE and separation results for pebble games. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, 466–485 (2015).
Knill, E. An analysis of Bennett’s pebble game. Preprint at https://arxiv.org/abs/math/9508218 (1995).
Soeken, M. et al. The EPFL logic synthesis libraries. Preprint at https://arxiv.org/abs/1805.05121 (2018).
Acknowledgements
This research was supported by the Swiss National Science Foundation (200021169084 MAJesty).
Author information
Contributions
G.M. and M.S. conceived the algorithms and planned the experimental evaluation. G.M. implemented the algorithms, performed the experiments and analyzed the data. G.D.M. coordinated the project. G.M. wrote the manuscript. All authors revised and approved the content of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Meuli, G., Soeken, M. & De Micheli, G. Xor-And-Inverter Graphs for Quantum Compilation. npj Quantum Inf 8, 7 (2022). https://doi.org/10.1038/s41534-021-00514-y
DOI: https://doi.org/10.1038/s41534-021-00514-y