Xor-And-Inverter Graphs for Quantum Compilation

Meuli, Giulia; Soeken, Mathias; De Micheli, Giovanni

doi:10.1038/s41534-021-00514-y

Article
Open access
Published: 27 January 2022

npj Quantum Information volume 8, Article number: 7 (2022)

Abstract

Quantum compilation is the task of translating a high-level description of a quantum algorithm into a sequence of low-level quantum operations. We propose and motivate the use of Xor-And-Inverter Graphs (XAG) to specify Boolean functions for quantum compilation. We present three different XAG-based compilation algorithms to synthesize quantum circuits in the Clifford + T library, hence targeting fault-tolerant quantum computing. The algorithms are designed to minimize relevant cost functions, such as the number of qubits, the T-count, and the T-depth, while allowing the flexibility of exploring different solutions. We present novel resource estimation results for relevant cryptographic and arithmetic benchmarks. The achieved results show a significant reduction in both T-count and T-depth when compared with the state-of-the-art.

Introduction

Different programming languages are currently available to program quantum computers at a high level of abstraction, with the purpose of enabling a wide community to exploit their exceptional computation capabilities. Relevant examples are: Q# (Microsoft)¹, Qiskit (IBM)², PyQuil/Quil (Rigetti)³, Cirq (Google)⁴, Quipper⁵, Scaffold/ScaffCC⁶, and ProjectQ⁷. These languages require fast and reliable methods to compile the program into hardware-specific low-level quantum operations. The compilation result is evaluated by the number of qubits used, as well as by the number and type of low-level operations obtained.

Many quantum algorithms, such as Grover’s⁸, Shor’s⁹ and HHL¹⁰, require the computation of some combinational logic functions, e.g., arithmetic functions, which usually need large amounts of resources to be computed. Methods capable of generating quantum circuits for such logic designs are needed to run these algorithms on a quantum computer. For example, HHL requires the reciprocal operation, which causes a significant overhead in the number of qubits with respect to the other components of the algorithm. In some cases, the resources required to perform logic operations may dominate the overall resources and exceed the available computing power. Besides, quantum circuits performing combinational logic, called oracles, find application in post-quantum cryptography. It has been shown how Grover’s algorithm can be used to break symmetric encryption schemes such as the Advanced Encryption Standard (AES), if the quantum circuit for the encryption function is known^11,12. The number of resources required to break a newly proposed post-quantum encryption scheme depends on the resources required to build the corresponding quantum oracle. Consider for example the categories for public-key schemes proposed by the National Institute of Standards and Technology (NIST) in their proposal to standardize post-quantum cryptography¹³. Shor’s algorithm also requires combinational logic and can be used to construct quantum algorithms for integer factorization, finite field discrete logarithms, and elliptic curve discrete logarithms. As a consequence, cryptosystems based on these problems cannot be considered secure in a post-quantum environment.

Even though the technology is still far from achieving the system sizes and performance that these applications require, estimating the resources needed to perform combinational functions has a relevant impact on the design and applicability of advanced quantum algorithms. The resource footprint of these operations, e.g., a large number of quantum operations and qubits, can exceed the actual resources available, hence preventing some algorithms from being run. Consequently, there is great interest in compilation methods that minimize the impact of combinational logic on the cost of quantum algorithms.

Several research works focus on improving (often manually) quantum implementations of cryptographic functions. As Shor’s algorithm can be used to break elliptic curve cryptography, the authors of ref. ¹⁴ have optimized the quantum circuit that computes the costly elliptic curve scalar multiplication. The authors of ref. ¹¹ present Clifford + T implementations of AES (key size 128, 192, and 256) used to evaluate the resources needed to run an exhaustive key search with Grover’s algorithm. In ref. ¹⁵, the authors present resource estimations of quantum pre-image attacks on SHA-2 and SHA-3, with quantum oracles for SHA-256 and SHA3-256. They improve the reversible implementations derived in ref. ¹⁶ and evaluate the cost of running the attack on a surface-code-based fault-tolerant quantum computer. In ref. ¹⁷, the authors focus on improving the implementation of the S-box of AES to simplify Grover-based key search. Similarly, the authors of ref. ¹⁸ provide implementations for SHA-256 and AES-128, results subsequently improved by Jaques et al. ¹².

In this work, we focus on the problem of automatically compiling arbitrary logic functions for fault-tolerant quantum computing, starting from a multilevel logic network representation. With respect to the previously cited works, we do not rely on manual and design-specific optimizations. Our automatic compilation strategies are designed to minimize qubits and gates, with an emphasis on exploring the trade-off between the two cost functions. The algorithms are inspired by methods currently applied in classical multilevel logic synthesis, a 50-year-old research field focused on optimization and mapping of combinational designs¹⁹. Algorithms and data structures developed in this field can be borrowed, adapted, and expanded to the synthesis of quantum circuits. In particular, we exploit a convenient graph-based data structure called Xor-And-Inverter Graphs (XAG). As we target fault-tolerant quantum computing, we compile into the Clifford + T universal library and focus on the following cost functions: the T-count, i.e., the number of generated T gates; the T-depth, i.e., the maximum number of T gates to be performed sequentially, also referred to as the number of T-stages; and the number of qubits. We identify how the characteristics of the network impact the resource footprint of the compiled circuit and elaborate on how the network could be modified to achieve better compilation results using state-of-the-art minimization strategies^20,21.

Logic networks are often used as convenient representation to develop scalable reversible synthesis algorithms^22,23,24. A recent work²⁵ presents an automatic hierarchical synthesis method that leverages look-up table (LUT) decomposition. Such a method has the advantage of being applicable to any logic network, independently of the Boolean function implemented by its nodes. More importantly, it enables us to control the number of generated qubits: the network is decomposed into several single-output sub-networks whose results are stored into extra qubits. By controlling the size of the sub-networks, it is possible to control the extra qubits generated. Nevertheless, the method is not able to efficiently optimize the gate count. Typically, when the number of qubits is heavily constrained, the number of gates significantly increases. This happens because large sub-networks will be generated and, with no control on the Boolean functions they implement, they will likely be compiled into a large circuit. In addition, LUT decomposition causes a windowing effect: parts of the networks are prevented from being synthesized together, resulting in more gates. To address this issue, the work in ref. ²⁶ implements an LUT decomposition strategy which allows some control on the grouped logic, reducing the T-count.

The present work is based on a different synthesis approach that enables better control over all the cost functions, which we introduced for the first time in ref. ²⁷. This approach is based on identifying repeated patterns in the network, which conveniently translate into quantum circuits with few gates. In particular, the graph is decomposed into parts that can be implemented by one single Toffoli gate. Hence, a direct correlation can be established between the features of the networks and the cost in terms of T gates (T-count and T-depth) and number of qubits.

In this work, we present all the latest improvements on XAG-based compilation, which are reflected in the algorithms collected in the open-source library caterpillar. We propose XAG-based compilation as the method of choice to automatically synthesize quantum circuits implementing cryptographic and arithmetic logic functions with application in post-quantum cryptography and fault-tolerant quantum computing. Through the provided detailed description of the algorithms, the reader can identify (i) the most suitable algorithm and (ii) the best XAG pre-processing steps for a specific compilation problem.

The first algorithm presented, which was originally proposed in ref. ²⁷, minimizes the T-count by correlating it with the number of AND nodes in the XAG (multiplicative complexity). Indeed, the final circuit achieves an upper bound on the number of T gates of four times the multiplicative complexity of the input network. We demonstrated in ref. ²⁷ an average 20 × reduction in T-count with respect to LUT-based methods. The second algorithm proposed minimizes the T-depth by relating it to (i) the maximum number of levels in the graph with AND nodes, i.e., the multiplicative depth, and (ii) the number of AND nodes in the same level sharing input signals. This algorithm achieves a T-depth equal to the multiplicative depth of the graph and was originally used in ref. ²⁸ to synthesize designs with at most 5 inputs. We provide a detailed algorithmic description of both algorithms. Furthermore, we present synthesis results for relevant cryptographic benchmarks (https://homes.esat.kuleuven.be/~nsmart/MPC/ and http://cs-www.cs.yale.edu/homes/peralta/CircuitStuff/CMT.html), which can serve as resource estimation for post-quantum attacks. Such results are compared with the state-of-the-art estimates available in the literature for some of the designs, showing improvement with respect to both T-count and T-depth. Unlike ref. ²⁸, we provide resource estimation results for very large designs, demonstrating the scalability of the proposed methods. We discuss and compare the results that the two methods achieve and explain how properties of the XAGs can be modified to tune the obtained results. For example, we identify the node scheduling as a key tool to minimize the number of qubits when using the second algorithm.

Finally, in this paper we propose a third compilation algorithm that performs quantum memory management to explore the trade-off between qubits and T-count. The number of available helper qubits can be selected as a parameter of the algorithm, which returns a valid compilation solution that does not exceed the given qubit constraint; an optimization procedure then reduces the number of T gates. In particular, it exploits SAT solvers to find a strategy to fit the logic into a constrained number of qubits. The idea is to enable the reuse of helper qubits by uncomputing intermediate results, solving the so-called reversible pebbling game²⁹. In a previous work³⁰ we introduced the problem of quantum memory management and proposed a solution based on SAT. With respect to the first attempt to apply this idea to XAGs in ref. ²⁷, here we propose to work at a coarser level of granularity. In other words, while the previous method enabled computation and uncomputation of every single node in the XAG separately, in this approach we group selected sets of nodes together. This allows us to control the overhead in the number of gates generated when constraining the number of qubits. We present a SAT encoding that, by reducing the number of variables and the size of clauses, is applicable to larger designs and enables a second optimization algorithm to further improve the T-count of the compiled results. We demonstrate the ability of this method to trade off qubits for gates on a selection of our benchmarks.

In classical logic synthesis, a good method is based on the synergy between data structure and algorithm, working together to minimize the target functions. Multilevel logic networks proved to be both scalable and compact data structures. For example, the And-Inverter Graph (AIG) is a popular network used both in academic and industrial frameworks^31,32.

In this work, we present different algorithms for the synthesis of quantum circuits that rely on the convenient representation of the logic as an XAG. This is a logic network over the gate basis {∧, ⊕, ¬}, meaning that each node of the network computes either the 2-input AND operation, the exclusive-OR operation, i.e., the 2-input XOR, or the inversion operation $\neg x=1\oplus x=\bar{x}$. We use $\bar{x}=1-x$ to denote the Boolean complement of $x$, and define ${x}^{0}=\bar{x}$ and ${x}^{1}=x$. A simple XAG computing the majority-of-three Boolean function is shown in Fig. 1a.

**Fig. 1: The three steps performed to compile an XAG representing the majority-of-three function.**

A Boolean chain is a formal notation for logic networks. Given primary inputs x₁, …, x_n, a logic network consisting of r local functions is represented by a sequence called a Boolean chain

$${x}_{i}={f}_{i}({x}_{{i}_{1}},\ldots ,{x}_{{i}_{ar({f}_{i})}}) \quad {{\rm{for}}}\quad n \, < \, i \, \le \, n+r$$

(1)

where f_i is a gate function with ar(f_i) inputs and 0 ≤ i_j < i for 1 ≤ j ≤ ar(f_i) are indexes to primary inputs or previous steps in the sequence, as defined in ref. ³³. An XAG logic network representing an n-variable Boolean function with inputs x₁, …, x_n is modeled as a Boolean chain with steps

$${x}_{i}={x}_{j(i)}\oplus {x}_{k(i)}\qquad \,{{\mbox{or}}}\,\qquad {x}_{i}={x}_{j(i)}^{p(i)}\wedge {x}_{k(i)}^{q(i)},$$

(2)

for n < i ≤ n + r, depending on whether the step computes the 2-input XOR or the 2-input AND operation, where r is the number of steps. The constant values 1 ≤ j(i) < k(i) < i point to input or previous steps in the chain. When a step computes the AND operation, the Boolean constants p(i) and q(i) are used to possibly complement the gate’s fan-in. Please note that complemented inputs of XOR gates can be propagated to their outputs, hence we do not define p(i) and q(i) for the XOR steps. The value of a single-output function is computed by the last step of the chain $f={x}_{n+r}^{p}$, which may be complemented. In the case of multi-output functions, there will be a set of steps that computes the function’s values: ${f}_{o}={x}_{o}^{p}$, where o ∈ O is the list of all the output indices. We write ∘_i = ∧ , if step i computes an AND gate, and ∘_i = ⊕, if step i computes an XOR gate.

We define the multiplicative complexity of the logic network as the number of AND gates it contains: $\tilde{c}=| \{i| {\circ }_{i}=\wedge \}|$. We also define the multiplicative complexity of the Boolean function, which is the minimum number of AND nodes required to represent it as an XAG. Clearly, the multiplicative complexity of a network is an upper bound on the multiplicative complexity of the Boolean function it realizes.

In this work, we exploit the fact that every AND node acts on two multi-input parity functions. When the input to the AND node is either a primary input, another AND gate, or a network’s output, the arity of this function is equal to 1. Formally, let the linear transitive fan-in of a node x_i in the logic network be defined using the recursive function

$${{\mathrm{ltfi}}}\,({x}_{i})=\left\{\begin{array}{ll}\{{x}_{i}\}\quad &\,{{\mbox{if}}}\,i\ \le \ n\ {{{\rm{or}}}}\,{\circ }_{i}=\wedge \ {{{\rm{or}}}}\ i\ \in O,\\ {{\mathrm{ltfi}}}\,({x}_{j(i)})\bigtriangleup {{\mathrm{ltfi}}}\,({x}_{k(i)})\quad &\,{{\mbox{otherwise}}}\,,\end{array}\right.$$

(3)

where ‘Δ’ denotes the symmetric difference of two sets. It is easy to see that all elements in ltfi(x_i) are either inputs, outputs, or steps that compute an AND gate. Figure 4 illustrates an AND node and its two linear transitive fan-in cones.

Example 1

The network in Fig. 1a, in which dotted lines represent inversion, implements the majority-of-three function $\left\langle {x}_{1}{x}_{2}{x}_{3}\right\rangle ={x}_{1}{x}_{2}\vee {x}_{1}{x}_{3}\vee {x}_{2}{x}_{3}$. The network corresponds to a Boolean chain with four steps:

$$\begin{array}{lll}{x}_{4}={x}_{1}\oplus {x}_{2},\qquad {x}_{5}={x}_{2}\oplus {x}_{3},\\ {x}_{6}={\overline{x}}_{4}\wedge {x}_{5},\qquad {x}_{7}={x}_{3}\oplus {x}_{6}.\end{array}$$

For this network

$$\begin{array}{ll}{{\mathrm{ltfi}}}\,({x}_{4})=\{{x}_{1},\,{x}_{2}\},\\ {{\mathrm{ltfi}}}\,({x}_{5})=\{{x}_{2},\,{x}_{3}\},\\ {{\mathrm{ltfi}}}\,({x}_{6})=\{{x}_{6}\},\\ {{\mathrm{ltfi}}}\,({x}_{7})=\{{x}_{3},\,{x}_{6}\}.\end{array}$$
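The chain of Example 1 can be checked with a short script. The following is only a minimal sketch: the tuple encoding of the steps, the function names, and the optional `outputs` argument are assumptions made for illustration and are not taken from the caterpillar library. Leaving `outputs` empty reproduces the sets listed above.

```python
from itertools import product

# Hypothetical encoding: step index -> (op, j, p, k, q), following Eq. (2).
# p and q are polarities (1 = plain, 0 = complemented); XOR steps use 1, 1.
N_INPUTS = 3
MAJ_CHAIN = {
    4: ("xor", 1, 1, 2, 1),  # x4 = x1 xor x2
    5: ("xor", 2, 1, 3, 1),  # x5 = x2 xor x3
    6: ("and", 4, 0, 5, 1),  # x6 = not(x4) and x5
    7: ("xor", 3, 1, 6, 1),  # x7 = x3 xor x6
}


def evaluate(chain, inputs):
    """Return the values of all signals of the chain as a dict."""
    x = {i + 1: bit for i, bit in enumerate(inputs)}
    for i in sorted(chain):
        op, j, p, k, q = chain[i]
        a = x[j] ^ (1 - p)  # x^1 = x, x^0 = complement of x
        b = x[k] ^ (1 - q)
        x[i] = (a ^ b) if op == "xor" else (a & b)
    return x


def multiplicative_complexity(chain):
    """Number of AND steps in the chain."""
    return sum(1 for step in chain.values() if step[0] == "and")


def ltfi(chain, n, i, outputs=frozenset()):
    """Linear transitive fan-in of x_i, following Eq. (3)."""
    if i <= n or chain[i][0] == "and" or i in outputs:
        return {i}
    _, j, _, k, _ = chain[i]
    # Symmetric difference of the two fan-in sets.
    return ltfi(chain, n, j, outputs) ^ ltfi(chain, n, k, outputs)


# The last step computes the majority of the three inputs.
for bits in product((0, 1), repeat=N_INPUTS):
    assert evaluate(MAJ_CHAIN, bits)[7] == int(sum(bits) >= 2)

print(multiplicative_complexity(MAJ_CHAIN))    # 1
print(sorted(ltfi(MAJ_CHAIN, N_INPUTS, 7)))    # [3, 6]
```

The symmetric difference matters when a signal reaches a node through two XOR paths: it then cancels out and is dropped from the fan-in set.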

Finally, we introduce the concept of level in the XAG network. Every step x_i of the network, with 1 ≤ i ≤ n + r, is characterized by a quantity called level, defined as:

$$L({x}_{i})=\left\{\begin{array}{ll}\mathop{\max }\limits_{t\in C}(L(t))+1\,{{\mbox{with}}}\,C\!:= {{\mathrm{ltfi}}}\,({x}_{j(i)})\cup {{\mathrm{ltfi}}}\,({x}_{k(i)}),&\,{{\mbox{if}}}\,i > \,n\\ 0,&\,{{\mbox{otherwise}}}\,\end{array}\right.$$

(4)

In other words, a network’s node x_i is at level L(x_i) = l only if the node with the maximum level among all the ones in the linear transitive fan-in cones of x_i is at level l − 1. This means that only AND nodes and outputs count to define the depth of the network, because only AND and output nodes appear in the ltfi sets. We define ${\max }_{n\,{ < }\,i\le n+r}L({x}_{i})$ as the multiplicative depth of the network.
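As an illustration of Eq. (4), the sketch below computes the levels of a small hypothetical chain with two AND nodes in sequence. The chain, the encoding, and the function names are assumptions introduced for this example only; polarities are omitted because they do not affect levels.

```python
from functools import lru_cache

N = 3  # number of primary inputs x1..x3
# Hypothetical chain: step index -> (op, j, k).
CHAIN = {
    4: ("and", 1, 2),  # x4 = x1 and x2
    5: ("xor", 3, 4),  # x5 = x3 xor x4
    6: ("and", 1, 5),  # x6 = x1 and x5 (the only output)
}
OUTPUTS = frozenset({6})


def ltfi(i):
    """Linear transitive fan-in, Eq. (3)."""
    if i <= N or CHAIN[i][0] == "and" or i in OUTPUTS:
        return frozenset({i})
    _, j, k = CHAIN[i]
    return ltfi(j) ^ ltfi(k)


@lru_cache(maxsize=None)
def level(i):
    """Level of x_i, Eq. (4): inputs are at level 0."""
    if i <= N:
        return 0
    _, j, k = CHAIN[i]
    cone = ltfi(j) | ltfi(k)
    return max(level(t) for t in cone) + 1


multiplicative_depth = max(level(i) for i in CHAIN)
print({i: level(i) for i in CHAIN})  # {4: 1, 5: 2, 6: 2}
print(multiplicative_depth)          # 2
```

The second AND node x6 sees x4 through the XOR step x5, so it sits one level above x4 even though a XOR node lies between them.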

In addition to providing a very compact representation for Boolean functions, XAG networks have another characteristic that makes them excellent data structures for quantum compilation: each node represents a logic function for which a convenient quantum circuit implementation exists. This allows us to recognize the existence of a dependency between the network characteristics, e.g., the multiplicative complexity/depth, and the synthesized quantum circuit. It is indeed possible to derive an upper bound on the number of expensive gates from characteristics of the XAG.

Given a logic network computing an n-variable Boolean function f(x), a compilation algorithm finds a quantum circuit that implements the unitary operation

$${U}_{f}:\left|x\right\rangle \left|y\right\rangle {\left|0\right\rangle }^{k}\,\mapsto\, \left|x\right\rangle \left|y\oplus f(x)\right\rangle {\left|0\right\rangle }^{k},$$

(5)

where k is the number of extra qubits internally used by the circuit and restored back to $\left|0\right\rangle$, also referred to as helper qubits. This circuit is often called an oracle. Automatic compilation of logic designs requires two steps, illustrated in Fig. 1: (i) transforming a possibly non-reversible Boolean function into a reversible quantum circuit, and (ii) translating the reversible circuit into a quantum circuit.

The first step is responsible for mapping the Boolean function into a reversible circuit. A reversible circuit is a logic representation characterized by a fixed number of lines that store inputs, outputs, and intermediate data, acted upon by reversible gates. For example, Fig. 1b shows the reversible circuit performing the function specified by the XAG in Fig. 1a. Such a circuit is built using 2-input Toffoli gates, CNOT gates, and X gates (or NOT). The Toffoli gate is characterized by a set of two controls x₁, x₂ and by a single target y₁. It performs the transformation:

$$\left|{x}_{1}\right\rangle \left|{x}_{2}\right\rangle \left|{y}_{1}\right\rangle \,\mapsto\, \left|{x}_{1}\right\rangle \left|{x}_{2}\right\rangle \left|{y}_{1}\oplus {x}_{1}{x}_{2}\right\rangle .$$

(6)

In other words, it inverts the target only if the logic AND of the two controls evaluates to one. In practice, if y₁ is initialized to $\left|0\right\rangle$, the Toffoli gate performs the AND operation. The CNOT is specified by a target and by a control qubit: it complements the target if the state of the control is $\left|1\right\rangle$. If applied to a target in the state $\left|0\right\rangle$, the CNOT gate copies the state of the control.
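On computational basis states, these gates act as classical bit operations, so a reversible circuit can be checked by simulating bits. The sketch below builds one possible reversible circuit for the majority chain of Example 1 with a single helper line and verifies the mapping of Eq. (5) on every basis state. The gate order and line names are assumptions of this sketch, and the circuit is not claimed to be the one drawn in Fig. 1b.

```python
from itertools import product


def x_gate(state, t):
    state[t] ^= 1


def cnot(state, c, t):
    state[t] ^= state[c]


def toffoli(state, c1, c2, t):
    state[t] ^= state[c1] & state[c2]


def majority_oracle(x1, x2, x3, y):
    """Map |x>|y>|0> to |x>|y xor maj(x)>|0> on classical bits."""
    s = {"x1": x1, "x2": x2, "x3": x3, "y": y, "h": 0}
    compute = [
        (cnot, ("x2", "x1")),          # line x1 holds x4 = x1 xor x2
        (cnot, ("x3", "x2")),          # line x2 holds x5 = x2 xor x3
        (x_gate, ("x1",)),             # complement x4
        (toffoli, ("x1", "x2", "h")),  # helper holds x6 = not(x4) and x5
    ]
    for gate, args in compute:
        gate(s, *args)
    cnot(s, "h", "y")                  # y ^= x6
    cnot(s, "x3", "y")                 # y ^= x3, so y ^= x7 overall
    for gate, args in reversed(compute):
        gate(s, *args)                 # every gate here is its own inverse
    return s


for x1, x2, x3, y in product((0, 1), repeat=4):
    out = majority_oracle(x1, x2, x3, y)
    maj = int(x1 + x2 + x3 >= 2)
    assert out == {"x1": x1, "x2": x2, "x3": x3, "y": y ^ maj, "h": 0}
```

Replaying the compute steps in reverse order restores the inputs and returns the helper line to 0, which is what allows it to be reused.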

Once the Boolean function is expressed using reversible gates, it needs to be compiled into a quantum circuit. Quantum circuits are a way to describe quantum programs: a sequence of operations performed on qubits, represented by quantum gates. We expect the reader to be familiar with the quantum circuit representation and gate abstractions and refer to ref. ³⁴ for a detailed description. In fault-tolerant quantum computing, we consider gates from the Clifford + T universal library. This consists of the CNOT gate, the Hadamard gate (H), as well as the T gate and its inverse T^†. The T gate is particularly expensive to apply. As a consequence, the T-count (number of T gates) is a good measure for the cost of a fault-tolerant implementation of a given quantum program^35,36.

Our algorithms exploit well known state-of-the-art quantum implementations of the 2-input Toffoli gate. The Toffoli gate has a Clifford + T implementation that requires 7 T gates³⁷, which is optimum^38,39:

(7)

This implementation has been used to derive the quantum circuit for the majority-of-three function shown in Fig. 1c. When the Toffoli gate is computed on a qubit initialized to $\left|0\right\rangle$, it can be implemented using 4 T gates, with a T-depth of 2, and without requiring any additional qubit^40,41:

(8)

where H_Y = SH and $\left|T\right\rangle =TH\left|0\right\rangle$. Besides, when the result of the Toffoli is uncomputed, this can be performed without the use of any T gate, exploiting measurement-based uncomputation⁴⁰, as shown:

(9)

There exists also another AND gate implementation with T-depth = 1, which combines the AND circuit from ref. ⁴¹ and the Toffoli gate implementation with T-depth = 1 in ref. ⁴². The circuit requires one extra qubit with respect to the implementation in (8):

(10)

where $\left|+\right\rangle =H\left|0\right\rangle$.

Results

In this section, we report the statistics of the quantum circuits generated by our XAG-based algorithms. We selected two publicly available benchmark suites, including arithmetic, cryptographic (e.g., AES), and floating-point operations with applications in post-quantum cryptography and fault-tolerant quantum computing.

The first benchmark contains the best-known versions of logic networks in terms of multiplicative complexity and depth, collected by the Computer Security Resource Center (CSRC) at the National Institute of Standards and Technology (NIST). We synthesize: (i) finite field multiplication in GF(2⁶) using irreducible polynomial x⁶ + x³ + 1 (m × 6 × 31), multiplication in GF(2⁷) using irreducible polynomial x⁷ + x⁴ + 1 (m × 7 × 41) and using x⁷ + x³ + 1 (m × 7 × 31); (ii) binary multiplication with different input sizes n (bm_n); (iii) a 16-bit and an 8-bit S-box (s16, s8); (iv) finite field multiplication in GF(2⁸) using the AES polynomial x⁸ + x⁴ + x³ + x + 1 (×8 × 4 × 31).

In addition, we evaluate our method on a set of circuits used in the context of Multi-Party Computation and Fully Homomorphic Encryption. From the benchmarks available online we synthesize: (i) block ciphers DES in its expanded and non-expanded variant (the latter meaning that the input key is assumed non-expanded); (ii) block cipher AES with 128, 192, and 256 key length; (iii) cryptographic hash functions MD5, Keccak, SHA-256, and SHA-512; (iv) arithmetic functions such as adders, multipliers, and comparators; (v) IEEE floating point operations. We pre-process the XAGs exploiting the toolbox to reduce the multiplicative complexity proposed by the authors of ref. ²⁰. This enables us to further improve the provided resource estimates for these designs.

Improving the T-count versus T-depth

Table 1 shows the synthesis results of the first two proposed algorithms. Alg. 1 minimizes the T-count, while Alg. 2 minimizes the T-depth without increasing the number of T gates, but relying on an increased number of additional qubits. The number of T gates achieved is equal to 4 times the multiplicative complexity of the network for both algorithms. The second algorithm obtains a T-depth equal to the multiplicative depth of the network. The last two columns of Table 1 compare the algorithms by reporting: the percentage of absolute change in T-depth (%Td) and in number of qubits (%Q) of Alg. 2 with respect to Alg. 1.

Table 1 Compilation results.

Figure 2 compares the results automatically obtained using Alg. 2 with some resource estimates available in the literature^11,12,15,17. The comparison shows a significant reduction in both T-count and T-depth, at the cost of a less significant increase in the number of qubits. Nevertheless, it is important to note that once mapped into an error-correcting code, T gates require a large number of dedicated qubits. Note that the authors of ref. ¹⁷ only report the number of Toffoli gates and the Toffoli-depth. We obtain the corresponding T-count and T-depth by considering the Clifford+T implementation of the Toffoli gate with 7 T gates and a T-depth equal to 3, which is optimal³⁸.
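The conversion described above is plain arithmetic, sketched below next to the 4 T gates per AND node obtained by Alg. 1 and Alg. 2. The function names and the input numbers are hypothetical and serve only to illustrate the calculation; they are not results from this work or from the cited references.

```python
def t_cost_from_toffoli(num_toffoli, toffoli_depth):
    """T-count and T-depth when each Toffoli costs 7 T gates in 3 T-stages."""
    return 7 * num_toffoli, 3 * toffoli_depth


def t_count_from_xag(multiplicative_complexity):
    """T-count when each AND node of the XAG costs 4 T gates."""
    return 4 * multiplicative_complexity


# Hypothetical design with 100 Toffoli gates arranged in 10 Toffoli layers.
print(t_cost_from_toffoli(100, 10))  # (700, 30)
# Hypothetical XAG with 100 AND nodes.
print(t_count_from_xag(100))         # 400
```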

**Fig. 2: Resource estimates for AES-128/192/256 and SHA-256 compared with the state of the art: Jaques et al.¹², Grassl et al.¹¹, Langenberg et al.¹⁷ and Amy et al.¹⁵.**

Qubits/T-count trade-off

In this section, we show the results generated by our third algorithm to manage the memory resources during the compilation of the logic design. Our method allows us to force the compilation to synthesize a circuit with a limited number of helper qubits. Figure 3 shows the compilation results obtained by setting the number of available helper qubits to different values, for a selection of designs. The plots show the number of qubits on the x-axis and the obtained T-count on the y-axis. For every fixed number of qubits we report two points: the non-optimized and the optimized results. The latter is obtained by running a post-optimization procedure, encoded as a SAT problem, on the initial (non-optimized) result. It can be seen how the procedure allows us to choose between different qubit/T-count trade-off solutions and how the optimization manages to minimize the T-count.

**Fig. 3: Results of pebbling selected logic networks using different number of pebbles: comparison between optimized and non-optimized solutions.**

Discussion

In the previous section, we reported the specifics of quantum circuits compiled using our three XAG-based algorithms. In particular, the first two techniques achieve results that are predictable by inspecting the characteristics of the logic network. In detail, given a logic network characterized by a multiplicative complexity $\tilde{c}$, i.e., the number of AND nodes, and by a multiplicative depth:

both algorithms achieve a T-count equal to $4\tilde{c}$;
Alg. 2 achieves a T-depth equal to the multiplicative depth;
the qubit overhead to achieve such T-depth depends on the number of shared inputs in the linear transitive fan-ins of the AND nodes in a level.

This suggests that improving a network with respect to the named parameters can strongly and positively impact the synthesized quantum circuits, e.g., as done in ref. ²¹, to reduce the T-depth by reducing the multiplicative depth of the network.

Inspecting the results of the comparison in Table 1 reveals a trade-off between T-depth and number of qubits. Indeed, while Alg. 1 is far from achieving the T-depth performance of Alg. 2, it requires fewer qubits. There are two reasons for the increase in qubits that characterizes Alg. 2. The first is that it employs the AND implementation characterized by a single T-stage and presented in Section “Introduction” (10), which requires one qubit more than implementation (8) used by Alg. 1. This means that the compilation will request this extra qubit whenever an AND node is computed. The second is that the implementation of AND nodes used by the second algorithm applies a T gate to the controls, as well as to the target qubit. For this reason, if two AND nodes share the same input signal, the corresponding quantum circuit will have a T-depth equal to 2, as each AND implementation will add a T gate to the shared qubit. If the AND nodes at the same level of an XAG do not share any input, they can be computed within a single T-stage. In order to achieve this result, our second algorithm copies inputs that are shared among multiple AND nodes in a level onto new qubits. Hence, the compilation will request a new qubit whenever inputs are shared among AND nodes at the same level in the XAG. In conclusion, if we sum the number of AND nodes in a level with the number of shared inputs among them, we obtain a quantity equal to the number of helper qubits required to compile that level. Since helper qubits are cleaned up after all the nodes in the level are computed, the level for which this amount is greatest will dominate and give the total number of helper qubits for the synthesis of the entire network. Further details on the algorithm, including detailed pseudo-code, can be found in Section “Methods”.

We chose to report in Table 1 the two extremes that can be reached using our constructive algorithms. It is also possible to obtain results ‘in-between’, i.e., a smaller improvement in T-depth and a smaller qubit overhead with respect to Alg. 2, e.g., by modifying Alg. 1 to use the implementation with T-depth equal to one. In addition, as the connectivity of each AND node in a level has an impact on the T-depth, different results can be found by changing how the level of each node is computed. For example, it is possible to change the scheduling of the nodes to reduce the T-depth while minimizing the qubit overhead of Alg. 2.

Our third algorithm focuses on exploring the trade-off between T-count and number of qubits. Figure 3 shows how our method is capable of providing different compiled solutions, by taking the number of helper qubits as a parameter. Our method finds the best way of reusing memory space, by computing and uncomputing helper qubits that store intermediate results. This problem corresponds to the reversible pebbling game. The problem complexity has been studied in ref. ⁴³, where the author proves that finding the minimum number of pebbles is PSPACE-complete, as in the case of the non-reversible pebbling game. Besides, the problem is PSPACE-hard to approximate up to an additive constant⁴⁴. An explicit asymptotic expression for the best time-space product is given in ref. ⁴⁵. This is a global problem, hard to approximate and decompose, hence difficult to be tackled by heuristic techniques. Here, the problem is encoded as a SAT problem and solved globally, returning a valid memory clean-up strategy that guarantees the upper bound on the number of helper qubits while also aiming to minimize the T-count.
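The trade-off behind the reversible pebbling game can be illustrated on a toy instance. The sketch below is not the SAT encoding used in this work: it is a brute-force breadth-first search over pebble configurations, practical only for very small graphs, and the function name, graph encoding, and four-node path are assumptions chosen for illustration. A pebble may be placed on or removed from a node only while all its predecessors are pebbled, and the game ends when exactly the outputs are pebbled.

```python
from collections import deque


def min_pebbling_moves(preds, outputs, max_pebbles):
    """Fewest moves to pebble exactly `outputs`, or None if impossible.

    preds maps each node to its predecessor nodes; primary inputs are
    always available and are therefore omitted.
    """
    start, goal = frozenset(), frozenset(outputs)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        for v, ps in preds.items():
            if not all(p in state for p in ps):
                continue            # moving v needs all predecessors pebbled
            nxt = state ^ {v}       # place or remove the pebble on v
            if len(nxt) <= max_pebbles and nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return None


# Hypothetical dependency path 1 -> 2 -> 3 -> 4 with node 4 as the output.
PATH = {1: [], 2: [1], 3: [2], 4: [3]}
print(min_pebbling_moves(PATH, [4], 4))  # 7
print(min_pebbling_moves(PATH, [4], 3))  # 9
print(min_pebbling_moves(PATH, [4], 2))  # None
```

With one pebble fewer, node 1 has to be pebbled twice, which is the kind of recomputation overhead that appears when the number of helper qubits is constrained.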

With respect to the SAT-based technique in ref. ²⁷, the algorithm proposed in this work exploits a completely different SAT encoding, which is more compact in both number of variables and clauses. With this method it is possible to obtain competitive results for larger designs while guaranteeing better results for smaller designs. For example, consider the compilation of the small design s8 on 20 helper qubits: our method achieves a T-count of 164 while the results in ref. ²⁷ show a T-count of about 280.

In Fig. 3 we show non-optimized versus optimized pebbling solutions. The non-optimized solution is provided by the SAT solver without any constraints on the number of T gates generated. The optimized solution is obtained starting from the initial solution and running optimization rounds, which iteratively add clauses to the SAT problem to minimize the T-count. The longer the optimization procedure runs, the better the solution. The optimized points shown in Fig. 3 are either optimal or the best result found after 1.5 h of running the optimization procedure on a machine with two Intel Xeon E5-2680 v3 (Haswell) CPUs with 2.5 GHz clock frequency and 16 GB of main memory.

The optimization procedure removes unnecessary steps that the solver may insert in the solution. Indeed, none of the clauses used to encode the problem prevents the solver from uncomputing nodes even if the limit on the number of pebbles is not reached. Preventing this at the encoding level would require an impractical increase in the size of the SAT problem. The optimization reveals the trade-off between qubits and T-count.

Methods

Algorithm 1: minimizing the T-count

Our first algorithm achieves an upper bound on the number of T gates that is proportional to the multiplicative complexity $\tilde{c}$ of the input network. Indeed, the final quantum circuit has $4\tilde{c}$ T gates.

The key insight is that each AND node in the logic network is driven by two multi-input parity functions of variables which are either inputs or other AND nodes in the lower levels of the logic network. Figure 4 shows the node x_i and the two parity functions with the respective linear transitive fan-ins. The polarity variables p(i) and q(i) take into account possible inversion of the inputs of the AND node. The pseudo-code of the algorithm is provided by Alg. 1. Since the algorithm dedicates one helper qubit for each node of the XAG to store its computed Boolean function, we use nodes’ identifiers, e.g. x_i, as parameters for quantum operations, e.g., NOT(x_i), meaning that the operation is performed on the corresponding qubits.

**Fig. 4: Illustration of the general idea in which the fan-in nodes of an AND gate are considered as large XOR gates, computed in-place using CNOT gates.**

Lines 19–22 show that the algorithm first computes all the steps of the network that perform an AND (or compute an output) using the function compute. All the intermediate results are then restored to $\left|0\right\rangle$ by uncomputing the corresponding compute steps. In lines 23–24, NOT gates are placed on negated outputs. The function compute (lines 2–18) builds the circuit for each step x_i as illustrated in Fig. 4. In particular, it identifies two qubits, t₁ and t₂, corresponding to nodes in the ltfi cones that are not shared between the cones. The parity functions are then computed in-place onto t₁ and t₂, after which the complemented edges are evaluated and NOT gates are applied if necessary (see Fig. 4). In lines 13–14 the step x_i is finally computed on a new qubit, using a CNOT gate in the case of an XOR output, or otherwise the implementation of the AND node described in (8), which has T-count equal to 4 and T-depth equal to 2. Finally, the parity functions are uncomputed.

Algorithm 1

Low T-count compilation algorithm.

Note that we assume that L₁ ≠ L₂. If this is not the case, the functions computed by the two fan-ins of the AND gate are equal, making the AND gate redundant. Also, note that the intersection of L₁ and L₂ may not be empty. Since we want to compute the value of L₁ in-place on some signal t₁ ∈ L₁, we must ensure that L₁ ⊈ L₂. If instead L₁ ⊆ L₂, it is sufficient to swap L₁ and L₂.

In addition, when L₂ ⊆ L₁, the value computed by L₂ could be reused to compute L₁. This is achieved by modifying the elements in L₁ such that L₁ = (L₁\L₂) ∪ {x_k}. An example is shown in Fig. 5. In this case ltfi(x_j) includes ltfi(x_k) and ltfi(x_j)\ltfi(x_k) = {t₀}. This leads to a reduction in the number of CNOT operations.

**Fig. 5: A special configuration with one transitive fan-in included in the other.**
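The per-node construction described above (choice of t₁ and t₂, swap of L₁ and L₂, and reuse of L₂ when L₂ ⊆ L₁) can be illustrated with a classical bit-level simulation. The following Python sketch is not the caterpillar implementation: the function names, the gate-tuple format, and the use of a classical Toffoli-like AND in place of circuit (8) are assumptions made for illustration only.

```python
def parity_in_place(cone, target):
    """CNOTs accumulating the parity of `cone` onto `target` (a member of cone)."""
    return [("CNOT", src, target) for src in sorted(cone) if src != target]

def compute_and_node(L1, L2, p, q, out):
    """Gates for out ^= (parity(L1) ^ p) AND (parity(L2) ^ q), assuming L1 != L2.

    L1, L2 are sets of qubit names (the two ltfi cones); p, q are 0/1 polarities.
    """
    L1, L2 = set(L1), set(L2)
    if L1 <= L2:                       # ensure L1 is not contained in L2
        L1, L2, p, q = L2, L1, q, p
    if L2 <= L1:                       # reuse: L1 = (L1 \ L2) U {t2}
        t2 = min(L2)
        L1 = (L1 - L2) | {t2}
        t1 = min(L1 - {t2})
    else:                              # targets not shared between the cones
        t2 = min(L2 - L1)
        t1 = min(L1 - L2)
    pre = parity_in_place(L2, t2) + parity_in_place(L1, t1)
    pre += [("NOT", t1)] * p + [("NOT", t2)] * q
    # every gate in `pre` is self-inverse, so reversing it uncomputes the parities
    return pre + [("AND", t1, t2, out)] + pre[::-1]

def simulate(gates, state):
    """Apply the gate list to a dict of classical bits and return the new dict."""
    state = dict(state)
    for g in gates:
        if g[0] == "CNOT":
            state[g[2]] ^= state[g[1]]
        elif g[0] == "NOT":
            state[g[1]] ^= 1
        else:                          # ("AND", control1, control2, target)
            state[g[3]] ^= state[g[1]] & state[g[2]]
    return state

gates = compute_and_node({"a", "b", "c"}, {"b", "c"}, 0, 1, "out")
cnots = sum(g[0] == "CNOT" for g in gates) // 2   # CNOTs on the compute side
```

In this usage example L₂ ⊆ L₁, so the parity of L₂ is first accumulated on one of its own qubits and then reused as a single input of L₁, which is the source of the CNOT saving mentioned in the text.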

Algorithm 2: minimizing the T-depth

Our second algorithm targets the reduction of the T-depth. Unlike the previous algorithm, it uses the implementation of the AND operation that has 4 T gates, 4 qubits, and 1 T-stage (10).

We refer to X_l = {x_i ∣ L(x_i) = l} as the set of all the nodes at level l. The key idea is that if two AND nodes in the same level do not share any input in their ltfi sets, then they can be computed with only one T-stage using implementation (10). This is not always the case, as AND nodes often share inputs. To overcome this problem, the algorithm copies every overlapping set of inputs onto a new helper qubit. This procedure, described in Alg. 2, obtains circuits with a number of T-stages equal to the multiplicative depth of the network. While the previously described algorithm proceeds in topological order, this one proceeds level by level (see lines 10–17). For each level, the function copy_overlaps assigns to each node a set of two qubits on which it computes the parities of the two fan-in cones, defining the mapping CP. If the node shares some inputs with another node, a new qubit is assigned to compute the corresponding parity function; otherwise a qubit corresponding to a node in the fan-in cone is used. This means that if a node x_i ∈ X_l has inputs t₁, t₃, t₅ (on qubits q₁, q₃, q₅) in common with node x_j ∈ X_l, then a new qubit q_i is used as the target of three CNOT gates with the shared input qubits as controls. As can be seen in line 11, the copies are performed before computing any of the nodes in the level, thus allowing the actual AND implementations to act on non-overlapping qubits, resulting in a single T-stage. Once the copies have been computed, each node is passed to the function compute_on_copies (lines 1–9), which uses the qubits associated by the mapping CP with each fan-in parity function as controls to compute the AND. Once all AND nodes in the level are computed, the parities are uncomputed (line 14). Finally, the levels in the XAG are uncomputed from top to bottom.
Every node, regardless of whether it has shared fan-ins, can be uncomputed without using copies (lines 15–17) by applying the function compute defined in Alg. 1. Finally, in lines 18–end, NOT gates are placed on complemented outputs. An illustrative example is shown in Fig. 6, where the algorithm is applied to a simple level X_l = {x_i, x_s} with one overlapping input t₀, such that ltfi(x_j(i)) ∩ ltfi(x_k(s)) = {} and ltfi(x_j(s)) = ltfi(x_k(i)) = {t₀}. The figure shows how the overlapping input is copied to a new qubit before computing the parity functions; the two AND nodes can then be computed in parallel with a T-depth equal to 1.
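The role of the copies can be made concrete with a small sketch of the qubit assignment for one level. This is a simplified, conservative reading of copy_overlaps written in Python, not the code of Alg. 2: the data layout, the helper names `h0`, `h1`, … and the rule "any cone containing a shared input gets a fresh helper" are assumptions for illustration.

```python
from collections import Counter

def copy_overlaps(level):
    """Assign one parity qubit per fan-in cone of every AND node in a level.

    level: dict node -> (cone1, cone2), each cone a frozenset of qubit names.
    Returns (CP, gates): CP maps (node, k) to the qubit holding the k-th parity,
    gates is the list of CNOTs that computes all the parities.
    """
    # how many cones in this level read each qubit
    use = Counter(q for cones in level.values() for cone in cones for q in cone)
    cp, gates, fresh = {}, [], 0
    for node, cones in level.items():
        for k, cone in enumerate(cones):
            if any(use[q] > 1 for q in cone):
                target = f"h{fresh}"          # hypothetical new helper qubit
                fresh += 1
                sources = sorted(cone)        # copy the whole parity onto it
            else:
                target = min(cone)            # compute in place inside the cone
                sources = sorted(cone - {target})
            gates += [("CNOT", s, target) for s in sources]
            cp[(node, k)] = target
    return cp, gates

# Level in the style of Fig. 6: two AND nodes sharing the single input t0.
level = {
    "x_i": (frozenset({"a1", "a2"}), frozenset({"t0"})),
    "x_s": (frozenset({"t0"}), frozenset({"b1", "b2"})),
}
cp, gates = copy_overlaps(level)
single_t_stage = len(set(cp.values())) == len(cp)   # all AND operands distinct
```

Because every AND node ends up with operands that no other node in the level touches, the T gates of all the AND implementations in the level can be placed in the same stage. A less conservative assignment could reuse t₀ itself for one of the two nodes and save a helper qubit.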

Algorithm 2

Low T-depth compilation algorithm.

Algorithm 3: minimizing the number of qubits

All the algorithms described so far compute and uncompute every AND node at most once, and the compiled circuit is uniquely determined by the features of the input network. In this section, we show a method that instead allows us to explore the solution space by allowing nodes to be computed and uncomputed several times.

The third algorithm seeks the best strategy to uncompute the intermediate results in order to optimize the memory usage. The problem is equivalent to the reversible pebbling game. The game is played on a directed acyclic graph (DAG) using a limited number of pebbles. The player places pebbles on or removes them from the DAG nodes according to one rule: a pebble can be placed on (or removed from) a node only if all the inputs of that node have a pebble. The game is won when the only pebbled nodes are the network's outputs. The sequence of moves that leads to a winning configuration is called a pebbling strategy. Every pebble in the game corresponds to a helper qubit. The move of placing a pebble on a node corresponds to computing the logic of that node on this helper qubit. When a pebble is removed, it corresponds to uncomputing the value stored on the helper qubit. As a consequence, the pebbling strategy directly corresponds to a sequence of compute/uncompute operations. The definition of a winning configuration (no pebbles on internal nodes) guarantees that performing this sequence of operations uncomputes all intermediate results. As demonstrated in ref. ³⁰, SAT solvers can be used to solve the reversible pebbling game and find a synthesis strategy for any Boolean function represented using a DAG.
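The rules of the game are compact enough to be checked mechanically. The Python sketch below validates a candidate strategy given as a list of pebble configurations; the function name, the data layout, and the three-node example DAG are hypothetical and are not taken from the paper's figures.

```python
def is_valid_strategy(children, outputs, configs, max_pebbles):
    """Check a reversible pebbling strategy.

    children:    dict node -> list of fan-in nodes inside the DAG
                 (primary inputs are not nodes and are always available).
    configs:     list of sets; configs[i] is the set of pebbled nodes at time i.
    """
    # start with no pebbles, end with pebbles exactly on the outputs
    if configs[0] or configs[-1] != set(outputs):
        return False
    # never exceed the pebble budget
    if any(len(conf) > max_pebbles for conf in configs):
        return False
    for before, after in zip(configs, configs[1:]):
        for v in before ^ after:                 # nodes pebbled or unpebbled
            # all inputs of v must hold a pebble before and after the move
            if not all(w in before and w in after for w in children[v]):
                return False
    return True

# Hypothetical DAG: C depends on A and B, which depend on primary inputs only.
children = {"A": [], "B": [], "C": ["A", "B"]}
strategy = [set(), {"A"}, {"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"C"}]
ok = is_valid_strategy(children, ["C"], strategy, max_pebbles=3)
```

Read as a circuit, this strategy computes A, B and C on three helper qubits and then uncomputes A and B, leaving only the output qubit in a non-zero state.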

The compilation problem is transformed into the following problem:

Problem 1

Given a DAG and a number of pebbles, find a valid pebbling strategy using the minimum number of moves.

To address this problem using a SAT solver, it needs to be decomposed into many SAT problems:

Problem 2

Given a DAG and P pebbles, does a valid pebbling strategy with K moves exist?

The solver can either find a solution and return a pebbling strategy, or state that no solution exists. In order to solve Problem 1, whenever the SAT solver returns unsat, K is incremented and the solver is asked to find a strategy again. This is repeated until a satisfying solution is found. Since K is incremented by one at each step, the first solution found is guaranteed to be the one with the smallest K.
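The incremental loop over K can be sketched independently of the solver. In the Python sketch below an exhaustive search over configurations stands in for the SAT solver, which is a deliberate substitution that is only practical for tiny DAGs; the function names, the `max_K` cut-off, and the restriction to one node changing per step are assumptions for illustration.

```python
def exists_strategy(children, outputs, P, K):
    """Problem 2 oracle: is the winning configuration reachable in exactly K moves?

    Exhaustive reachability search used here in place of a SAT solver.
    """
    goal = frozenset(outputs)
    frontier = {frozenset()}                     # time 0: nothing is pebbled
    for _ in range(K):
        reached = set()
        for conf in frontier:
            for v in children:
                # v may be toggled only if all its inputs are pebbled
                if all(w in conf for w in children[v]):
                    new = conf ^ {v}
                    if len(new) <= P:
                        reached.add(new)
        frontier = reached
    return goal in frontier

def min_moves(children, outputs, P, max_K=50):
    """Problem 1: increment K until the oracle answers yes (or give up at max_K)."""
    for K in range(max_K + 1):
        if exists_strategy(children, outputs, P, K):
            return K
    return None

# Hypothetical DAG: C depends on A and B.
children = {"A": [], "B": [], "C": ["A", "B"]}
best = min_moves(children, ["C"], P=3)
```

With three pebbles the smallest K for this DAG is 5 (compute A, B, C, then uncompute A and B). With only two pebbles no strategy exists, which is why the sketch needs a cut-off to terminate.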

SAT encoding

Here we give a quick overview of the basic encoding. The input DAG G = (V, E) contains nodes that compute output values; we refer to them as the elements of the set O ⊆ V. Note that the primary inputs are not nodes of the DAG. Problem 2 is encoded in terms of the pebble state variables p_v,i. For v ∈ V and 0 ≤ i ≤ K, those are Boolean variables that evaluate to true if the node v is pebbled at time i. Note that the SAT formula encodes K + 1 pebble configurations, with K steps describing the transitions from one configuration to the next. The following set of clauses describes the reversible pebbling problem:

Initial and final clauses. At time 0 all the nodes are unpebbled; at time K all the outputs are pebbled and all the intermediate results are unpebbled:
$$\bigwedge_{v\in V}\bar{p}_{v,0}\;\wedge\;\bigwedge_{v\in O}p_{v,K}\;\wedge\;\bigwedge_{v\notin O}\bar{p}_{v,K}$$
Move clauses. If a node is pebbled or unpebbled between time i and time i + 1, then all its children are pebbled at both time i and time i + 1:
$$\bigwedge_{i=0}^{K-1}\;\bigwedge_{(v,w)\in E}\left((p_{v,i}\oplus p_{v,i+1})\to(p_{w,i}\wedge p_{w,i+1})\right)$$
Cardinality clauses. At each time, at most P pebbles are used:
$$\bigwedge_{i=0}^{K}\left(\sum_{v\in V}p_{v,i}\le P\right)$$
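The three clause sets translate directly into conjunctive normal form. The Python sketch below generates the clauses as lists of signed integers and checks a known strategy against them without calling a solver. Several choices are assumptions for illustration: the variable numbering, the indexing of transitions by i = 0, …, K − 1 (so that every p_{v,i+1} refers to an existing variable), and the naive cardinality encoding, which forbids every subset of P + 1 simultaneously pebbled nodes and would be replaced by a compact encoding in practice.

```python
from itertools import combinations

def encode(nodes, edges, outputs, P, K):
    """CNF for the pebbling problem; edges are pairs (v, w) with w a child of v."""
    var = {(v, i): 1 + n * (K + 1) + i
           for n, v in enumerate(nodes) for i in range(K + 1)}
    cnf = [[-var[v, 0]] for v in nodes]                          # initial clauses
    cnf += [[var[v, K] if v in outputs else -var[v, K]] for v in nodes]  # final
    for i in range(K):                                           # move clauses
        for v, w in edges:
            a, b = var[v, i], var[v, i + 1]
            for c in (var[w, i], var[w, i + 1]):
                # (a xor b) -> c  ==  (-a or b or c) and (a or -b or c)
                cnf += [[-a, b, c], [a, -b, c]]
    for i in range(K + 1):                                       # cardinality
        for subset in combinations(nodes, P + 1):
            cnf.append([-var[v, i] for v in subset])
    return var, cnf

def satisfies(var, cnf, configs):
    """True if the pebble configurations satisfy every clause."""
    true = {var[v, i] for i, conf in enumerate(configs) for v in conf}
    return all(any((lit > 0) == (abs(lit) in true) for lit in clause)
               for clause in cnf)

# Hypothetical DAG: C depends on A and B; C is the only output.
nodes, edges = ["A", "B", "C"], [("C", "A"), ("C", "B")]
strategy = [set(), {"A"}, {"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"C"}]
var, cnf = encode(nodes, edges, {"C"}, P=3, K=5)
valid = satisfies(var, cnf, strategy)
```

The clause lists use the signed-integer convention of common SAT solver interfaces, so they could be handed to a solver to search for a strategy instead of checking a given one.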

Example 2

Figure 7 illustrates how a network with only AND nodes can be compiled as a reversible network of Toffoli gates out of a pebbling solution with 3 pebbles and 6 steps. Note that the final circuit will use only 2 helper qubits, which is the number of pebbles used, minus the number of outputs. The overall width will be equal to 7: the number of inputs plus the number of pebbles.

XAGs are DAGs in which each node computes the AND or the XOR function. It follows that it is possible to play the reversible pebbling game directly on the XAG, as done in ref. ²⁷. Nevertheless, this does not exploit the structural properties of the XAG. In addition, the SAT encoding required for such an approach must be capable of discriminating between the different properties of the XAG nodes. For example, several clauses are required to enable in-place computing of XOR nodes. The resulting SAT problem features many variables and clauses and is only applicable to small designs.

**Fig. 7: Illustration of a pebbling strategy using 3 pebbles and 6 moves.**

For these reasons, we choose to construct a different DAG from the XAG, which we call the abstract graph. Each AND node (and its two input parity functions) corresponds to a box node of the abstract graph, as shown in Fig. 8. Once a strategy for pebbling the abstract graph is found, each time a pebble is placed on a box node that compresses x_i the function compute(x_i) is called, while whenever a pebble is removed from that node the function compute^†(x_i) is called to uncompute it.
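The compression of the XAG into box nodes can be sketched as follows. The Python code assumes a hypothetical XAG representation (a dict from node name to an `(op, fanin, fanin)` tuple, with any name missing from the dict treated as a primary input), ignores complemented edges, and assumes that all outputs are AND nodes; none of this mirrors the data structures of caterpillar.

```python
def ltfi(xag, node):
    """Linear transitive fan-in: primary inputs and AND nodes reached through XORs."""
    if node not in xag or xag[node][0] == "AND":
        return {node}
    _, a, b = xag[node]
    # symmetric difference: a signal appearing twice cancels out of the parity
    return ltfi(xag, a) ^ ltfi(xag, b)

def abstract_graph(xag):
    """Map every AND node to the AND nodes found in its two fan-in cones."""
    boxes = {}
    for node, (op, a, b) in xag.items():
        if op == "AND":
            cones = ltfi(xag, a) | ltfi(xag, b)
            # primary inputs are not nodes of the abstract graph
            boxes[node] = sorted(c for c in cones if c in xag)
    return boxes

# x3 = (x1 XOR c) AND d, with x1 = a AND b; the XOR node x2 is absorbed by x3.
xag = {
    "x1": ("AND", "a", "b"),
    "x2": ("XOR", "x1", "c"),
    "x3": ("AND", "x2", "d"),
}
boxes = abstract_graph(xag)
```

The result has one box node per AND node, and XOR nodes disappear into the boxes they feed, so the pebbling game is played on a graph whose size is governed by the number of AND nodes.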

Optimizing the pebbling solution

When the XAG is compressed into the abstract graph, we lose some information about the number of quantum gates required to compute each node. Indeed, the strategy found does not take into account the fact that one box node may require more gates than another. In addition, the SAT encoding of the standard reversible pebbling game does not include any clause that controls the number of moves, which is reflected in the number of generated T gates. An optimization step is introduced to overcome both problems.

The key idea is that it is possible to associate with each box node v of the abstract graph a weight w_v, which is equal to the number of inputs to the node itself. Indeed, the number of inputs is related to the number of CNOT gates that are needed to compute the parity functions ‘hidden’ in the compressed node. Then, we define a new set of variables for the SAT encoding: the activation variables a_v,i. For v ∈ V and 0 < i ≤ K, those are Boolean variables that evaluate to true if the node v has changed its state at time i. Once a weight-agnostic solution has been found, the following quantity represents the total weight of the strategy:

$${W}_{s}=\sum_{i=1}^{K}\sum_{v\in V}{w}_{v}{a}_{v,i} \tag{11}$$

The SAT solver is then asked to find a solution with a total weight W = W_s − 1 by adding a cardinality clause that expresses equation (11). This procedure is repeated until the solver returns ‘unsat’ or hits a timeout.
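The total weight and the tightening loop can be sketched in a few lines of Python. Here the activation of a node at time i is derived from two consecutive configurations, and `solve` is a hypothetical callback standing in for the SAT solver with the added cardinality constraint; it is assumed to return a strategy of weight at most the given bound, or `None` when the solver answers unsat or times out.

```python
def total_weight(configs, weights):
    """W_s of equation (11): sum of w_v over every state change of every node."""
    return sum(weights[v]
               for before, after in zip(configs, configs[1:])
               for v in before ^ after)          # a_{v,i} is true for these v

def optimize(solve, weights, first):
    """Ask for a strictly lighter strategy until the solver gives up."""
    best = first
    while True:
        better = solve(total_weight(best, weights) - 1)
        if better is None:
            return best
        best = better

# Hypothetical example: the second strategy pebbles A twice for no reason.
weights = {"A": 2, "B": 2, "C": 2}
tight = [set(), {"A"}, {"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"C"}]
wasteful = [set(), {"A"}, set(), {"A"}, {"A", "B"}, {"A", "B", "C"},
            {"B", "C"}, {"C"}]

def stub_solver(bound):
    """Stand-in for the SAT call: knows a single strategy of weight 10."""
    return tight if total_weight(tight, weights) <= bound else None

result = optimize(stub_solver, weights, wasteful)
```

In this example the first bound is 13, which the weight-10 strategy meets; the next bound of 9 cannot be met, so the loop stops. Setting all weights to one makes the same loop minimize the plain number of moves.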

As shown in the results section, this optimization procedure succeeds at reducing the number of T gates with respect to the initial solution. This result can be achieved even if every node has weight equal to one. Indeed, the optimization introduces a cardinality constraint on the activation variables, and hence eliminates all the pebbling moves that are not necessary to complete the game. As a consequence, fewer helper qubits are required. If the weights are set to reflect the actual size of the parity functions, then the number of CNOT gates in the solution is also reduced.

Data availability

The circuits we synthesized have been collected by the NIST and the University of Yale (http://cs-www.cs.yale.edu/homes/peralta/CircuitStuff/CMT.html) and by the Department of Electrical Engineering (ESAT) at KU Leuven (https://homes.esat.kuleuven.be/~nsmart/MPC/). For some entries of our benchmark we used circuit implementations with low multiplicative complexity obtained at EPFL and available online at https://github.com/lsils/date2020_experiments.

Code availability

All the algorithms that we discussed in this work are part of the C++ open-source library caterpillar (https://github.com/gmeuli/caterpillar), which is one of the LSI logic synthesis libraries⁴⁶.

References

Svore, K. M. et al. Q#: Enabling scalable quantum computing and development with a high-level DSL. In Real World Domain Specific Languages Workshop, 7:1–7:10 (2018).
Aleksandrowicz, G. et al. Qiskit: An Open-source Framework for Quantum Computing (2019). Zenodo. https://doi.org/10.5281/zenodo.2562111.
Smith, R. S., Curtis, M. J. & Zeng, W. J. A practical quantum instruction set architecture. Preprint at https://arxiv.org/abs/1608.03355 (2017).
Ho, A. & Bacon, D. Announcing Cirq: An open source framework for NISQ algorithms. Google AI Blog (2018).
Green, A. S., Lumsdaine, P. L., Ross, N. J., Selinger, P. & Valiron, B. Quipper: a scalable quantum programming language. In ACM SIGPLAN Conference on Programming Language Design and Implementation, 333–342 (2013).
Javadi-Abhari, A. et al. ScaffCC: a framework for compilation and analysis of quantum computing programs. Proceedings of the 11th ACM Conference on Computing Frontiers, CF 2014 (2014).
Steiger, D. S., Häner, T. & Troyer, M. ProjectQ: an open source software framework for quantum computing. Quantum 2, 49 (2018).
Grover, L. K. Quantum computers can search arbitrarily large databases by a single query. Phys. Rev. Lett. 79, 4709 (1997).
Shor, P. W. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41, 303–332 (1999).
Harrow, A. W., Hassidim, A. & Lloyd, S. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 150502 (2009).
Grassl, M., Langenberg, B., Roetteler, M. & Steinwandt, R. Applying Grover’s algorithm to AES: quantum resource estimates. In: Post-Quantum Cryptography. PQCrypto 2016 (ed. Takagi, T.), vol. 9606, 29–43 (2016).
Jaques, S., Naehrig, M., Roetteler, M. & Virdia, F. Implementing grover oracles for quantum key search on AES and LowMC. In Annual Int’l Conf. on the Theory and Applications of Cryptographic Techniques, 280–310 (Springer, 2020).
NIST. Submission requirements and evaluation criteria for the post-quantum cryptography standardization process (2016). Online at https://csrc.nist.gov/CSRC/media/Projects/Lightweight-Cryptography/documents/final-lwc-submission-requirements-august2018.pdf.
Häner, T., Jaques, S., Naehrig, M., Roetteler, M. & Soeken, M. Improved quantum circuits for elliptic curve discrete logarithms. In Int’l Conf. on Post-Quantum Cryptography, 425–444 (Springer, 2020).
Amy, M. et al. Estimating the cost of generic quantum pre-image attacks on sha-2 and sha-3. In: Selected Areas in Cryptography – SAC 2016. (eds. Avanzi, R. & Heys, H.), vol. 10532, 317–337 (2017).
Parent, A., Roetteler, M. & Svore, K. M. Reversible circuit compilation with space constraints. Preprint at https://arxiv.org/abs/1510.00377 (2015).
Langenberg, B., Pham, H. & Steinwandt, R. Reducing the cost of implementing the advanced encryption standard as a quantum circuit. IEEE Trans. Quantum Eng. 1, 1–12 (2020).
Kim, P., Han, D. & Jeong, K. C. Time-space complexity of quantum search algorithms in symmetric cryptanalysis: applying to AES and SHA-2. Quantum Inf. Process. 17, 339 (2018).
Brayton, R. K., Hachtel, G. D. & Sangiovanni-Vincentelli, A. L. Multilevel logic synthesis. Proc. IEEE 78, 264–300 (1990).
Testa, E., Soeken, M., Riener, H., Amaru, L. & De Micheli, G. A logic synthesis toolbox for reducing the multiplicative complexity in logic networks. In Design, Automation and Test in Europe Conference (2020).
Häner, T. & Soeken, M. Lowering the T-depth of quantum circuits by reducing the multiplicative depth of logic networks. Preprint at https://arxiv.org/abs/2006.03845 (2020).
Rawski, M. Application of functional decomposition in synthesis of reversible circuits. In Reversible Computation. RC 2015. (eds. Krivine, J. & Stefani, J. B.), vol. 9138, 285–290 (2015).
Markov, I. L. & Saeedi, M. Faster quantum number factoring via circuit synthesis. Phys. Rev. A 87, 012310 (2013).
Shende, V. V., Prasad, A. K., Markov, I. L. & Hayes, J. P. Synthesis of reversible logic circuits. IEEE Trans. Comput. Aided Design Integrated Circuits Syst. 22, 710–722 (2003).
Soeken, M., Roetteler, M., Wiebe, N. & De Micheli, G. LUT-based hierarchical reversible logic synthesis. IEEE Trans. Comput. Aided Design Integrated Circuits Syst. 38, 1675–1688 (2018).
Meuli, G., Soeken, M., Roetteler, M. & De Micheli, G. ROS: Resource constrained oracle synthesis for quantum circuits. In Quantum Physics and Logic (2019).
Meuli, G., Soeken, M., Campbell, E., Roetteler, M. & De Micheli, G. The role of multiplicative complexity in compiling low T-count oracle circuits. Int’l Conf. on Computer-Aided Design (2019).
Meuli, G., Soeken, M., Roetteler, M. & De Micheli, G. Enumerating optimal quantum circuits using spectral classification. In Int’l Symp. on Circuits and Systems (2020).
Bennett, C. H. Time/space trade-offs for reversible computation. SIAM J. Comput. 18, 766–776 (1989).
Meuli, G., Soeken, M., Roetteler, M., Bjorner, N. & Micheli, G. D. Reversible pebbling game for quantum memory management. In Design, Automation and Test in Europe Conference, 288–291 (2019).
Brayton, R. & Mishchenko, A. ABC: An academic industrial-strength verification tool. In Int’l Conf. on Computer Aided Verification, 24–40 (Springer, 2010).
Synopsys. Design compiler graphical. Online at https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/design-compiler-graphical.html (2020). Accessed Apr 2020.
Knuth, D. E. The Art of Computer Programming, vol. 4A (Addison-Wesley, 2011).
Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge University Press, 2000).
Campbell, E. T. & Howard, M. Unified framework for magic state distillation and multiqubit gate synthesis with reduced resource cost. Phys. Rev. A 95, 022316 (2017).
Fowler, A. G., Mariantoni, M., Martinis, J. M. & Cleland, A. N. Surface codes: Towards practical large-scale quantum computation. Phys. Rev. A 86, 032324 (2012).
Maslov, D. Advantages of using relative-phase Toffoli gates with an application to multiple control Toffoli optimization. Phys. Rev. A 93, 022311 (2016).
Amy, M., Maslov, D., Mosca, M. & Roetteler, M. A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. CAD Integrated Circuits Syst. 32, 818–830 (2013).
Gosset, D., Kliuchnikov, V., Mosca, M. & Russo, V. An algorithm for the T-count. Quantum Inf. Comput. 14, 1261–1276 (2014).
Jones, C. Low-overhead constructions for the fault-tolerant Toffoli gate. Phys. Rev. A 87, 022328 (2013).
Gidney, C. Halving the cost of quantum addition. Quantum 2, 74 (2018).
Selinger, P. Quantum circuits of T-depth one. Phys. Rev. A 87, 042302 (2013).
Chan, S. M. Pebble games and complexity. Ph.D. thesis, University of California, Berkeley (2013).
Chan, S. M., Lauria, M., Nordstrom, J. & Vinyals, M. Hardness of approximation in PSPACE and separation results for pebble games. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, 466–485 (2015).
Knill, E. An analysis of Bennett’s pebble game. Preprint at https://arxiv.org/abs/math/9508218 (1995).
Soeken, M. et al. The EPFL logic synthesis libraries. Preprint at https://arxiv.org/abs/1805.05121 (2018).


Acknowledgements

This research was supported by the Swiss National Science Foundation (200021-169084 MAJesty).

Author information

Authors and Affiliations

Integrated Systems Laboratory, EPFL, Lausanne, Switzerland
Giulia Meuli, Mathias Soeken & Giovanni De Micheli
Synopsys Italia, Silicon Realization Group, Agrate Brianza, Italy
Giulia Meuli

Authors

Giulia Meuli
Mathias Soeken
Giovanni De Micheli

Contributions

G.M. and M.S. conceived the algorithms and planned the experimental evaluation. G.M. implemented the algorithms, performed the experiments and analyzed the data. G.D.M. coordinated the project. G.M. wrote the manuscript. All authors revised and approved the content of the manuscript.

Corresponding author

Correspondence to Giulia Meuli.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Meuli, G., Soeken, M. & De Micheli, G. Xor-And-Inverter Graphs for Quantum Compilation. npj Quantum Inf 8, 7 (2022). https://doi.org/10.1038/s41534-021-00514-y


Received: 14 August 2020
Accepted: 03 December 2021
Published: 27 January 2022
DOI: https://doi.org/10.1038/s41534-021-00514-y
