Introduction

As the field of quantum computing has been gaining momentum over the last few years, the need for optimizing quantum circuits is also growing. Quantum adders are one of the most important basic components of quantum computing circuits. The continuous development of quantum adders has not only improved the efficiency of small basic quantum computing circuits such as multiplication circuits but also has a significant effect on various prominent large-scale quantum circuits. On the one hand, an efficient quantum adder can increase the speed of quantum addition and multiplication operations and reduce the cost of the required resources. On the other hand, quantum adders are widely used in Shor’s algorithm1 which plays an important role in the field of public key cryptography. Therefore, an efficient quantum adder not only has significant financial benefits but also makes a crucial contribution to the development of quantum computing.

Even though the adders are highly analyzed in classical computing, we observe that the niche is still not properly studied in the quantum paradigm. In the field of quantum adders, the quantum ripple carry adder (RCA)2,3 was first proposed. However, the T-depth of quantum RCAs increases linearly with the number of input qubits, which means they need a long time to perform the operation. Then some quantum carry look-ahead adder (CLA) designs such as Draper’s logarithmic adder4 have been proposed to achieve further efficiency gains, with the T-depth increasing logarithmically with the number of input qubits. In the field of the classical adder, Gürkaynak5 et al. found that increasing the radix of the CLA can effectively decrease the computation time of classical CLAs. However, the problem of quantum CLA implementation has not been explored yet, to the best of our finding.

The objective of this work is to explore the potential of the higher radix strategy in improving the performance of quantum arithmetic circuits. As we will see later, high fan-out is a challenge for quantum adder. Using the idea of separating the propagation and summation in the Manchester Carry Chain (MCC), we avoided this problem and proposed an innovative quantum higher radix adder. Specifically, this circuit can be divided into two parts. Firstly, in the higher radix part, we use Gidney’s Logical-And6 and Selinger’s Multi-control Toffoli construction method7 to implement a general quantum higher radix circuit. Secondly, in the MCC part, we chose the Brent-Kung structure as the carry path and Gidney’s RCA as the sum path after carefully analysing all possible carry propagation structures and sum paths. By integrating the quantum higher radix and MCC parts, we propose a quantum higher radix adder.

This work improves the efficiency of quantum circuits at different scales by proposing an innovative higher radix adder circuit. It is hoped that our paper can contribute to a deeper understanding of the great potential of multi-control Toffoli gates and the higher radix strategy in improving the performance of quantum arithmetic circuits.

The remainder of this paper is divided into eight parts. The next part (“Section Previous works”) introduces prominent previous research works. Following that, the second part (“Section Method”) describes the implementation details of the higher radix layer and the whole structure of the quantum higher radix adder. We present the evaluation results in the third part (“Section Results and discussions”). We conclude the paper in thereafter (“Section Conclusion”), though some additional information, discussion, and examples can be found in the supplementary material.

Previous works

In this section, we describe the previous relevant research. These related works can be divided into the following three types.

Firstly, we start by introducing some important quantum adders based on the structure of classical adders. Over the past few decades, a large number of quantum adder designs have been published. These works can be divided into quantum RCA and quantum CLA. The structures of quantum RCA are very simple, but the computation time tends to grow linearly with the qubit number. In 1995, Vedral2 et al. designed a simple reversible quantum adder that is based on the classical RCA structure. However, VBE adder requires linear ancilla qubits, which incurs a very large cost. In order to reduce the cost of the VBE adder, Cuccaro3 et al. designed a new structure that only requires one ancilla qubit with lower T-depth and T-count. Subsequently, various RCA structures8,9,10,11 have been proposed. Unlike the quantum RCA, the quantum CLA tends to be more efficient and is capable of performing addition in exponential time. In 2004, Draper4 et al. borrowed the classical CLA structure and then designed a quantum logarithmic T-depth adder based on the Brent-Kung structure12. Takahashi13,14 et al. made further optimizations on the Draper’s adder, resulting in the designs of a variety of cheaper quantum CLA adders.

Besides, there are also some quantum adders that use the unique properties of qubits so that they cannot be implemented in the classical world. For example, Gidney6 proposed a Logical-And structure based on the properties of qubits which significantly reduces the cost required to build a pair of Toffolis. Compared to Cuccaro’s adder, Gidney’s design greatly reduces the T-count and T-depth required to construct a quantum RCA adder. In the later sections, Gideney’s RCA will be used as our sum path in the MCC part.

Most importantly, we draw the higher radix strategy from classical adders. In 2000, Gürkaynak et al.5 found that higher radix adders, which have larger fan-in and fan-out, tend to be more efficient than radix-2 CLAs. However, the higher radix idea has not been applied in the quantum world so far. This is perhaps due to the fact that qubits are not copyable, which means the fan-out of the propagation part in a quantum CLA cannot be larger than 2. Therefore, we cannot construct higher radix quantum adders with higher fan-out in the quantum world. To solve this problem, we borrow the idea of separating the carry chain and sum chain from the classical MCC adder. Specifically, our quantum adder is devided into two parts. The first part is called the carry path, which is used to calculate specific intermediate carry bits. We use Brent-Kung structure12 to construct the carry path. The second part is called the sum path, which is used to compute the result of the addition according to those specific intermediate carry bits obtained from the carry path. In this paper, we use multiple parallel carry-select adders (CSAs) or quantum RCAs to construct the sum path. Overall we show that various quantum adders can be categorized as qubit count(QC), T-count, T-depth, similar to Harris’ classification of classical adders15.

Method

In this section, we first introduce C\(_n\)NOT, which is an important basic component of the quantum higher radix adder. Specifically, our method is divided into three steps. In the first step, we describe how to construct the higher radix layer based on the C\(_n\)NOT gate. In the second and third steps, the Brent-Kung structure and Gidney’s RCA are chosen as the Carry path and Sum path after analyzing five carry-propagate structures and two sum structures, respectively. At the end of this section, we describe how to construct the overall higher radix circuit in detail.

Basic component: C\(_n\)NOT

C\(_n\)NOT is a basic component of the quantum higher radix adder. In order to construct a cheap C\(_n\)NOT gate, we use and optimize Gidney’s Logical-And structure in Fig. 1a,b.

$$\begin{aligned} |T\rangle = {\left\{ \begin{array}{ll} \frac{1}{\sqrt{2}}(|0\rangle +e^{i\pi /4}|1\rangle )&{} \text {~if~} ancilla=|0\rangle ; \\ \frac{1}{\sqrt{2}}(|0\rangle -e^{i\pi /4}|1\rangle ) &{} \text {~if~} ancilla=|1\rangle \\ \end{array}\right. } \end{aligned}$$
(1)
Figure 1
figure 1

Gidney’s logical-and structure.

In Gidney’s paper6, the first formula in Equation (1) is used to define the special state |T\(\rangle \) in the Logical-And structure (Fig. 1a). According to it, we can apply a Hadamard gate first and then a T gate on an ancilla with state |0\(\rangle \) to obtain |T\(\rangle \). However, we found that it is also possible to use an ancilla with state |1\(\rangle \) instead of |0\(\rangle \) to construct this structure. The related formula is shown in the lower part of Equation (1). In brief, we can perform operations such as NOT before Logical-And structure, which can be used to reduce the number of qubits required for our quantum adder in later sections. After expanding the scope of Logical-And’s application, we then describe the specific structure and decomposition method of the proposed multi-control Toffoli gate.

As shown in Table 1, we found that as the number of control qubits increases, the C\(_n\)NOT gate can effectively reduce the average T-count, T-depth, and QC per control qubit. In order to reduce the cost required by the circuit, we use multi-control Toffolis in this paper. In 2013, Selinger7 proposed a general method for constructing C\(_n\)NOT gates using Clifford gates and T gates. This paper optimizes Selinger’s general method by incorporating ideas from Logical-And structure6 and another Toffoli decomposition method16. Figure 2 shows the specific structure of our multi-control Toffoli gate. Based on Logical-And structure, we take C\(_3\)NOT as an example to show how to construct multi-control Toffoli gates. In Fig. 2a, we show how to construct an unpaired C\(_3\)NOT gate with Clifford + T gates. For a pair of C\(_3\)NOT gates, Fig. 2b,c introduce the computation and uncomputation circuits, respectively. In Fig. 2a–c, the first circuits from left to right represent the original designs, and the second circuits show how to use Toffolis to decompose circuits in the first column. Besides, the third circuits use Logical-And structure to decompose the original circuits.

Figure 2
figure 2

Construction of multi-control Toffoli.

Table 1 Cost of C\(_n\)NOT.

Specifically, using computation and uncomputation structures of Logical-And to decompose the rest of the Toffoli gates except the middle one can effectively reduce the T-count and T-depth of our multi-control Toffolis. As a result, the efficiency of the whole circuit is increased.

As shown in Fig. 3, there are five general decomposition methods for the middle unpaired Toffoli gate7,16. According to Table 2, we can observe that all decomposition methods require 7 T gates. Since the proposed higher radix adder has a high QC, we do not wish to introduce more ancilla qubits by using decomposition methods with ancilla bits. Although the T-depth of the quantum circuit that we obtain using Method 5 to decompose one unpaired Toffoli is only 1, it introduces 4 additional ancillae, which will greatly increase qubits to construct a CLA, so we do not choose this decomposition method. Similarly, Method 4 also introduces extra ancilla qubits. Among Methods 1, 2, and 3, Method 3 is the most efficient because it has the smallest T-depth. Hence, we choose Method 3 to decompose all the unpaired Toffolis in our work.

Figure 3
figure 3

Different methods to decompose unpaired Toffoli with Clifford \(+\) T gates7,16.

Table 2 Summary table of Toffoli decomposition.

Step 1: Higher radix layer

In this section, we introduce the higher radix strategy and implementation details for applying it to quantum circuits. Radix is the bit-width of each CLA block in carry look-ahead adders. For convenience, we use r to represent radix. In the classical world, the higher radix adder has shown its advantages5. By increasing the radix, an adder with higher fan-in and fan-out is constructed, which can reduce the propagation time of propagates (p) as well as generates (g), thus improve the efficiency of classical adder.

We firstly take the radix 2 CLA as an example to calculate the sum of two binary numbers a and b. We need to get the carry c after 2-step calculations. In the first step, we calculate the p and g of each bit. For the ith bit, we use the formulas 1 and 2 to compute the corresponding \(p_i\) and \(g_i\), respectively. In the second step, after computing the propagation of p and g to obtain the intermediate quantities P and G, we can calculate the carry c. Here, we assume that j is less than or equal to i. Equations (4), (5) and (6) describe how to propagate p and g from bit i to bit j. After using the formulas (7), we are able to obtain the final carry c. The standard notations of \(\oplus \) for logical XOR, \(\cdot \), \(+\), \(\circ \) for logical AND, logical OR and propagation, respectively are here.

$$\begin{aligned}&p_i&= a_i \oplus b_i \end{aligned}$$
(2)
$$\begin{aligned}&g_i&=a_i \cdot b_i \end{aligned}$$
(3)
$$\begin{aligned}&(G_{0:0}, P_{0:0})&=(g_0, p_0) \end{aligned}$$
(4)
$$\begin{aligned}&(G_{0:i}, P_{0:i})&=(g_i, p_i)\circ (G_{0:i-1}, P_{0:i-1}) \end{aligned}$$
(5)
$$\begin{aligned}&(g_x, p_x) \circ (g_y, p_y)&=(g_x+p_x\cdot g_y, p_x \cdot p_y) \end{aligned}$$
(6)
$$\begin{aligned}&c_i&=g_i+p_i \cdot c_{i-1} \end{aligned}$$
(7)

As shown in Fig. 4, we use multi-control Toffoli gates to construct the quantum propagation structures with radix 2, 3, and 4 respectively. Similarly, the higher radix strategy also reduces the number of propagation layers of p and g, thereby reducing T-count required for the addition operations, and further effectively reducing the operation time. However, due to the extremely high cost of fan-out larger than 2, the quantum higher radix strategy proposed only focuses on increasing the fan-in of the adder.

Figure 4
figure 4

Quantum circuits for higher radix layer.

In this paper, we try to construct a quantum higher radix adder, which means we need to construct the higher radix layer based on multi-control Toffolis for the propagation of p and g. Figure 4b, c show the specific circuits for the higher radix structure with radix 3 and radix 4 as respective examples.

Step 2: Carry path

In this section, we describe the details of our carry path. Through the calculation in the previous section, we have already obtained p and g for each bit. In order to get the carries, we need to select a particular carry-propagate structure as the carry path to propagate p and g. As shown in Fig. 5, five different propagation structures are proposed in the literature, which are subsequently discussed. The details of these structures are shown in Fig. 6.

Figure 5
figure 5

Chronology of publication of carry-propagate structures.

  • Sklansky. In 1960, J. Sklansky17 proposed a conditional CLA adder with high fan-out nodes and minimal depth. The structure of it is shown in Fig. 6a.

  • Kogge-Stone. The Kogge-Stone structure18 was published in 1973, which has a low depth but a high number of nodes. The structure is shown in Fig. 6c.

  • Ladner-Fisher. As shown in Fig. 6b, the topology of Ladner-Fisher19 is similar to the Sklanskly structure. Hence, it also has low depth but high fan-out nodes. However, there are some differences between these two structures in the application.

  • Brent-Kung. The Brent-Kung structure12 is one of the most important propagation structures. Compared to other structures, this structure has a very small number of nodes as well as low fan-in and fan-out, despite having a large logic depth. Therefore, it is widely used in quantum CLA designs.

  • Han-Carlson. In 1987, the Han-Carlson structure was first proposed20. In order to improve the overall efficiency of the propagation, it combines elements from both the Brent-Kung and Kogge-Stone structures.

The propagation operations are the main cost of the carry path. Furthermore, since qubit can not be copied, the carry-propagate structures with fan-out larger than 2 introduce additional cost. Therefore, as shown in Fig. 6, the Brent-Kung structure which has the smallest number of propagate operations and low fan-out is selected as the carry path for the p and g propagation in our paper.

Figure 6
figure 6

Carry-propagate structures, where the green node represents one propagate operation.

Step 3: Sum path

We then discuss the implementation details of the sum path. As described in the previous sections, the final sum can be calculated by feeding carries into the sum path. For the sum path, we can choose between CSA and RCA.

  • RCA. The general structure of RCA is shown in Fig. 7a. As discussed in “Section Previous works”, various quantum RCAs have been proposed so far. In this paper, we choose the most efficient Gidney adder, which has the minimum T-count and T-depth, as our RCA structure.

  • CSA. As shown in Fig. 7b, the Carry Select adder consists of two Ripple Carry adders and one select circuit.

    In the first part, we built two quantum RCAs with the same structure. The input carry bit of them are set to 0 and 1. Therefore, using these sub-circuits, we can obtain the sum when the inputs are 0 and 1, respectively. In both the classical and quantum worlds, this part can be computed in parallel with the carry path, thus effectively reducing the time cost.

    In the second part, we construct a select sub-circuit. Suppose we know that the real input carry is c, which can only be 0 or 1. Depending on c, the sum calculated by the corresponding RCA is then chosen as the final result. More specifically, when c is equal to 0, we choose the sum of the quantum RCA whose input carry is 0 as the final sum. Similarly, when \(c=1\), the final result is the sum of quantum RCA whose input carry is 1.

    It is worth noting that the quantum CSA contains expensive CSWAP gates, which can be decomposed by Clifford+T gates. As shown in Fig. 8, there are 2 decomposition methods. For Method 116, the CSWAP gate is decomposed into a Clifford+T circuit with 7 T gates and T-depth of 4. If we use Method 2 (this is adopted from Quipper documentation (https://www.mathstat.dal.ca/~selinger/quipper/doc/Quipper-Libraries-GateDecompositions.html.)), the CSWAP gate is decomposed into a circuit with 7 T gates incurring the T-depth of 3.

    In this paper, we use CSA1 to denote the CSA structure whose CSWAPs are decomposed by Method 1. Similarly, CSA2 represents the CSA whose CSWAPs are decomposed by Method 2.

After determining the design details of these sum paths, we performed a systematic analysis of their performance. As shown in Table 3, the RCA structure is always cheaper than the CSA structures in terms of T-count, T-depth and QC. Therefore, we choose Gidney’s RCA as our sum path.

Figure 7
figure 7

Sum paths.

Figure 8
figure 8

Decomposition methods for quantum CSWAP gate.

Table 3 Different structures of sum path (\(r=n\)). When calculating T-depth, we assume that Part 1 of the CSA has been completed when calculating the carry path. Therefore, for CSA1 and CSA2, T-depth equals to the T-depth of Part 2, which is the minimum T-depth of sum path for CSAs.

Overall structure of quantum higher radix adder

One higher radix layer

In this paper, we only use one higher radix layer. As shown in Fig. 9a (This diagram was inspired from https://web.stanford.edu/class/archive/ee/ee371/ee371.1066/lectures/lect_04.pdf), the classical higher radix adder applies the higher radix strategy to every layer of the Brent-Kung tree, which reduces the computation depth from \(log2^n\) to \(log_r^n\) .

In this part, we explain why do we not follow the same example from the classical computing. This is due to the fact that if the strategy is used at every layer as shown in Fig. 9b, RCAs with large T-depth will be introduced in the sum path, which deprives our higher radix adder of the significant advantage of low T-depth according to Fig. 10. Therefore, we recommend using the higher radix strategy only for the first few layers of the quantum Brent-Kung Tree.

Figure 9
figure 9

Brent-Kung structure with radix 4. (The small bounding box represents radix-4 RCA, and the large bounding box represents radix-16 RCA.).

Figure 10
figure 10

The T-depth of the sum path with increasing layers for the radix-4 higher radix adder. The formula represented by this curve is \(O(T-depth)=r^{layers}\).

Complete circuit

The structure of the proposed quantum higher radix circuit is shown in Fig. 11. It can be divided into 7 stages.

  • Notations. The binary bit-width of the addends is denoted as n, and r denotes the value of the radix. The binary expansion of the number a is denoted as \(a=a_{n-1}a_{n-2}\cdots a_0\), where \(a_{n-1}\) is the most significant bit and \(a_0\) is the least significant bit. For circuit decomposition, paired Toffoli gates are decomposed into Logical-And, while unpaired Toffoli gates are decomposed using Method 3 shown in Fig. 3. Thus, \(TC_3\) is equal to 7, and \(TD_3\) is equal to 3.

  • Step 1. In this step, our task is to calculate the p and g. Since we do not need the most significant carry, no operation is performed on the most significant group. By taking \(a_i\), \(b_i\) as control qubits and an ancilla with initial state \(|0\rangle \) as the controlled qubit, we apply the CCNOT gate to compute \(g_i\) and then store the result in the corresponding ancilla. After that, we use \(a_i\) as the control and \(b_i\) as the controlled qubit to apply the CNOT gate. As a result, the corresponding \(p_i\) is stored in the corresponding \(b_i\) position. Some Toffoli gates are unpaired in the whole circuit, we decompose those Toffoli gates by using Method 3.

    For convenience, \(\alpha \) is introduced to denote the addend qubits in the most significant group. This step requires \(TC_3\cdot (n-\alpha )\) T-count, \(TD_3\) T-depth, and extra ancilla qubits are \(3\cdot n-\alpha \).

    $$\begin{aligned} \alpha = {\left\{ \begin{array}{ll} r &{} \text {~if~} n \pmod r = 0;\\ n \pmod {r} &{}\text {~otherwise} \end{array}\right. } \end{aligned}$$
    (8)
  • Step 2. In the second step, we group the initially obtained p and g by using the higher radix structure. Specifically, we construct the corresponding higher radix structure according to the method shown in Fig. 4, and then apply it to the corresponding \(g_i\) and \(p_i\) calculated in step 1 to obtain \(g_{group}\) and \(p_{group}\). Since the controlled qubits of the last Toffoli will be used to store the carry later, no uncomputation is performed on it. Hence, the last Toffoli is always unpaired, we only decompose it using Method 3, but decompose the rest of the Toffolis into Logical-And.

    For convenience, \(\beta \) and \(\rho \) are introduced to represent a complex intermediate variable for constructing multi-control Toffolis and the number of groups divided, respectively. In step 2, the required T-count is \(\rho \cdot [TC_3+4\cdot (2\cdot r-3)]\), the T-depth is \( TD_3+r+\beta -1\), and extra ancilla is \(\rho \cdot (r-1)\).

    $$\begin{aligned} \beta = {\left\{ \begin{array}{ll} 0 &{} \text { if } r\le 2;\\ 2+\lfloor \log (r-2)\rfloor &{}\text {~otherwise} \end{array}\right. } \end{aligned}$$
    (9)
    $$\begin{aligned} \rho = {\left\{ \begin{array}{ll} \frac{n}{r}-1 &{} \text {~if~} n\pmod r=0;\\ \lfloor \frac{n}{r} \rfloor &{}\text { otherwise} \end{array}\right. } \end{aligned}$$
    (10)
  • Step 3. In step 3, we construct the Brent-Kung tree using the \(p_{group}\) and \(g_{group}\) processed by the higher radix structure to calculate the carry path. We tried using Logical-And here, but found no benefit. Hence, here all the Toffolis are decomposed by Method 3.

    In this step, the required T-count is \(2\cdot TC_3 \cdot [2\cdot \rho -1 -\omega (\rho )-\lfloor \log (\rho )\rfloor ]\), the T-depth is \(TD_3\cdot ( \left\lfloor \log (\rho )\right\rfloor + \left\lfloor \log \frac{\rho }{3}\right\rfloor +2)\), and the number of extra ancilla qubits is \(2\cdot \rho -1 - \omega (\rho )-\lfloor \log (\rho )\rfloor \).

  • Step 4. In this step we uncompute the operation of calculating the intermediate p in step 3. We repeat the calculation of all Toffolis for the intermediate variable p in reverse order. In step 4, the required T-count is \( TC_3 \cdot [2\cdot \rho -1 -\omega (\rho )-\lfloor \log (\rho )\rfloor ]\), the T-depth is \(TD_3\cdot ( \left\lfloor \log \rho \right\rfloor + \left\lfloor \log \frac{\rho }{3}\right\rfloor +1)\). Here we do not need any extra ancilla qubit.

  • Step 5. Here we uncompute Step 2. We just repeat the same Toffolis from Step 2 in reverse order, except for the last one. Since we decompose them into Logical-And structures, no additional cost is needed in this step.

  • Step 6. In step 6, we restore the original binary addends a and b by applying the NOT gate and Toffoli on p and g. In order to store the corresponding \(b_i\) in the corresponding qubits, we apply the CNOT gate by taking \(a_i\) as the control qubit and \(b_i\) as the controlled qubit. Since the Logical-And that used previously introduces the measurement operation, so we then uncompute only the Toffoli gates which are applied to the least significant qubits of each group.

    In this step, the required T-count is \(\rho \cdot TC_3\), and the T-depth is \(TD_3\). We do not need any extra ancilla qubits.

  • Step 7. In this step, we construct the Gidney’s RCA6 to calculate the sum for each group.

    In step 7, the required T-count is \(4\cdot (n-\lceil \frac{n}{r} \rceil )\), the T-depth is r, and the number of extra ancilla qubits is \(\alpha -1+(r-2)\cdot \rho \).

An example of an addition operation performed by the higher radix adder is shown in Fig. 11. We use seven colors to divide the whole circuit from step 1 to step 7 from left to right. The radix of this adder is set to 3 and the inputs (i.e., addends) are two 15-bit binary numbers denoted by a and b. By using this quantum circuit we can correctly get the sum of these two numbers. In order to show the overall structure more clearly, we use \(\circ \) to represent the controlled-NOT operation of the Logical-And structure in Fig. 11.

Figure 11
figure 11

Radix-3 addition circuit with two 15-bit addends a and b.

Interestingly, there are two special cases. When \(r \ge n\), our adder is Gidney’s RCA. When radix is equal to one, our adder is a simple CLA. In summary, the overall cost of our circuit is shown below.

$$\begin{aligned}&\text {T-count}&=(8r+40)\cdot \left\lceil \llceil \frac{n}{r}\right\rceil \rrceil +11n -72 -7\cdot (n-1)\pmod r -8r -21\omega (\left\lceil \llceil \frac{n}{r}\right\rceil \rrceil -1) -21\left\lfloor \log (\left\lceil \llceil \frac{n}{r}\right\rceil \rrceil -1) \right\rfloor )\end{aligned}$$
(11)
$$\begin{aligned}&\text {T-depth}&= 6\cdot (\left\lfloor \log (\left\lceil \llceil \frac{n}{r}\right\rceil \rrceil -1) \right\rfloor +4+\left\lfloor \log (\frac{1}{3}(\left\lceil \llceil \frac{n}{r}\right\rceil \rrceil -1)) \right\rfloor ) +\left\lfloor \log (r-2) \right\rfloor +2r-5 \end{aligned}$$
(12)
$$\begin{aligned}&\text {Qubit Count}&= 3n-1-2r-\omega (\left\lceil \llceil \frac{n}{r}\right\rceil \rrceil -1) +(2r-1)\cdot \left\lceil \llceil \frac{n}{r}\right\rceil \rrceil -\left\lfloor \log (\left\lceil \llceil \frac{n}{r}\right\rceil \rrceil -1) \right\rfloor \end{aligned}$$
(13)

It can be observed that the circuit structure of the quantum higher radix adder varies with radix. In the next section, we will discuss how radix affects the performance of our adder and compare it with other well-known work.

Figure 12
figure 12

Comparision of the cost required by quantum higher radix adders with different radix and different sizes.

Results and discussions

  • Experiment 1: The effect of radix.

    Figure 12 shows T-count, T-depth, and QC for nine different higher radix adders with radix from 1 to 9, respectively. It is clear that when the radix is fixed, the performance of our adder varies for different input sizes. As the input size increases, the overall cost also increases, which means that the larger the input size, the more complex and expensive any adder tends to be. Since an increase in input size means an increase in the number of operations, this can directly result in an increase in circuit scale. For larger circuit, more expensive cost is often required in terms of T-depth, T-count and QC.

    Interestingly, for a fixed input size, increasing the radix does not always reduce the cost monotonically. In this paper, the higher radix adder with r equal to 1 is a CLA, while r equal to 2 represents the higher radix adder without multi-control Toffolis. Compared to higher radix adder with r equal to 1, the cost of it with radix 2 is reduced in T-count, T-depth and QC, which means that the higher radix layer can effectively optimize the circuit even without introducing multi-control Toffoli. When r is larger than 2, our adder is a hybrid of quantum RCA and CLA. T-count and T-depth decrease first and then increase as the radix increases in the range where r is less than the input size. Meanwhile, QC increases steadily in the fluctuation. When r is equal to input size, QC drops abruptly. This dramatic change is caused by the transformation of our higher radix adder from the hybrid to Gidney’s RCA.

    It can be seen that the performance of the higher radix adder is significantly influenced by the choice of radix. Therefore, by changing it, we can adapt the proposed adder to the specific requirements of different scenarios as well as minimize the overall cost. In general, when QC is more expensive, a small radix should be set to avoid introducing too much ancillas, while when T-depth or T-count is more costly, setting a larger radix can help minimize T-depth or T-count.

  • Selection of Best Radix.

    The best radix for T-depth, T-count and QC is defined as the radix that leads to the lowest cost of our adder in terms of T-depth, T-count and QC, respectively. As shown in Fig. 12; the fluctuations of T-depth, T-count, and QC vary as r increases. Hence, the corresponding best radix may be different for T-depth, T-count, and QC. According to the formulae for cost (given in Equations (11), (12) and (13)), the corresponding optimum radix can be determined. See supplementary material for more details.

  • Experiment 2: Comparison with well-known quantum adders.

    The performance of our adder is compared to other well-known quantum adders as shown below. We summarize the cost formulas of them in Tables 4 and 5. Based on them, the relevant data is visualized in Figs. 13 and 14.

    Firstly, we describe some important experimental details. In order to evaluate our adder more objectively, three Toffoli decomposition methods are used to decompose adders into three different versions. For adders which are denoted by \(\diamond \), all the Toffoli pairs are decomposed using the Logical-And structure, and then the rest are decomposed using Method 3 mentioned in “Section Method”. For adders which are denoted by \(\star \), all the Toffolis are decomposed by Method 3. For adders which are denoted by \(\bullet \), only Gidney’s RCAs6 are decomposed using the Logical-And structure, and the rest are decomposed by Method 3. Among them, the adders which are denoted by \(\diamond \) and \(\star \) have smaller qubits because that Logical-And structure is not used before the sum path, which means some ancilla can be reused in sum path after the uncomputation. For the adders which are denoted by \(\bullet \), the overall T-depth and T-count is reduced at the cost of QC. This is due to that it uses Logical-And structures wherever possible.

    Then we compare the proposed adder with other well-known works. Compared to quantum RCAs, our adder consistently has a significant advantage in terms of T-depth, despite having more T-count and QC. It is interesting that various new RCA designs are proposed recent years. In a comparision conducted by Orts et al.8, the performance of Wang RCA8 and Gayathri RCA9 was evaluated. Moreover, Gayathri et al.10 utilized a variant of Gidney RCA6 in their quantum circuit, referred to as Gayathri Adder in this paper. Overall, Gidney RCA is observed has the lowest cost in T-count, T-depth, and QC among all these new RCA designs. Therefore, it is widely applied in various quantum designs, such as21. Specifically, Gidney RCA achieves a T-count of \(4n-4\), lower than Wang RCA, Gayathri RCA and Gayathri Adder. Moreover, Gidney RCA is equal to the T-depth of Wang RCA and Gayathri RCA, while Gayathri Adder exhibits a slightly higher T-depth of 2n. For QC, Gidney RCA excels with a value of \(3n-1\), while Wang RCA and Gayathri RCA have \(3n+1\) qubits, and Gayathri Adder has a QC of 3n. To maintain conciseness, the subsequent comparisons will only focus on the Gidney RCA among all the new RCAs which are proposed after 2016. It is evident that even when compared to Gidney RCA, our adder maintains a significant advantage in T-depth. Compared to quantum CLAs, our adders which are denoted by \(\diamond \) and \(\bullet \) have similar T-count and T-depth, but significantly smaller T-count. For our adder which is denoted by \(\star \), it significantly reduces the T-depth and further reduces the T-count at the cost of a slight increase in qubit. Since Draper’s out-of-place adder4 does not need to be complexly uncomputed like the in-place one. Therefore, we construct a simplified version of our higher radix adder to objectively compare with. According to Fig. 14, our adder slightly increases T-depth and QC, but significantly decreases T-count. Moreover, when compared to Takahashi adder13 which is a special quantum CLA that introduces grouping idea, all the versions of the higher radix adder have similar QC and significantly smaller T-count. For T-depth, our adders which are denoted by \(\diamond \) and \(\bullet \) are similar to Takahashi adder, but our adder which is denoted by \(\star \) has a huge reduction. Besides, the higher radix adder is also compared with Takahashi combination adder14, which also combines RCA and CLA. Although our QC is larger, the T-count and T-depth of our adder have different degrees of reduction. It is obvious that our structure is more general and flexible, and further improves the overall efficiency.

    In general, the higher radix adder needs more qubits as a cost to significantly reduce the overall T-count and T-depth compared to other adders.

Table 4 Performance analysis of different quantum adders. To simplify the representation, we compare the costs of different higher radix adders with r from 3 to \(n-1\) and then use the lowest cost to represent the performance of our adder. The formula for \(\omega (n)\) is \(\omega (n)=n-\sum _{y=1}^\infty \left\lfloor \frac{n}{2^y}\right\rfloor \) and the range for r is \(2 < r \le n\).
Table 5 Performance analysis of quantum out-of-place CLAs. Since Draper’s out-of-place adder does not need to be complexly uncomputed like the in-place adder, it is unfair to compare our adder directly to it. Therefore, we construct a simplified version of our higher radix adder to compare with. The relevant formulas are shown below.
Figure 13
figure 13

Comparison of the cost required by different quantum adders.

Figure 14
figure 14

Comparison of the cost required by Draper’s out-of-place CLAs and our adders.

  • Connecting with existing quantum adders.

    Our work can be seen as a bridge to connect existing quantum adders. Figure 15 illustrates the general framework of it, whose key parts are the carry path and the sum path.

    For the carry path, quantum CLAs can be used to compute specific carries. For the sum path, any quantum adder can be used to calculate the final result based on those carries. It is interesting to note that the cost contribution of the carry path and sum path in the total circuit can be adjusted by changing radix. More specifically, when the radix is large, the number of groups divided by the higher radix layer is small. Hence, the carry chain is short, which means the sum path is a larger cost contribution of the overall circuit than the carry path. On the contrary, when the radix is small, the carry chain is longer, which means the carry path accounts for a large portion of the total circuit.

    According to this general framework, this work can be seen as a specific example based on Draper’s CLA and Gidney’s RCA. Specifically, our carry path uses the same Brent-Kung tree structure as Draper’s CLA, and our sum path is Gidney’s RCA. Apart from these two adders, other quantum adders can also be used to construct a higher radix adder.

    In order to support one to construct the cheapest quantum adder in different scenarios quickly and easily, we summarize the performance of well-known quantum adders in Fig. 16. When QC is more expensive, it is more suitable to use adders with less ancilla such as Takahashi RCA. For T-count, using RCAs such as Gidney’s RCA can effectively reduce the overall cost. For reducing T-depth, it is recommended to integrate Draper’s In-place CLA or other quantum CLAs within higher radix framework described in Fig. 15.

Figure 15
figure 15

A general framework of higher radix adder.

Figure 16
figure 16

Visualization of quantum adders.

Conclusion

Quantum adder is one of the most fundamental components in quantum computing. Therefore, designing a quantum adder with a lower cost is of great significance for establishing a more efficient and cheaper large-scale quantum circuit. This paper proposed an efficient quantum circuit for integer addition by introducing techniques from classical higher radix carry-lookahead adder and Manchester Carry Chain adder. In terms of T-depth and T-count, the proposed circuit is superior to all the existing quantum carry-lookahead adders except Draper Out-of-place CLA. Compared with Draper’s Out-of-place CLA, the proposed higher radix adder has significantly lower T-count with comparable QC and T-depth. Due to practical constraints, we focused our analysis on three main quantum circuit complexity metrics, T-count, T-depth, and QC.

In the future, one may be interested in how to automatically design the best adder based on specific cost constraints and how to accurately and quickly tune the radix to obtain the most efficient adder. Additionally, exploring the limits of T-count, T-depth, and QC in quantum addition is also a meaningful and challenging problem. Finally, comparing various adder designs considering practical constraints, such as quantum error correction (QEC) and topological structures, is an important open problem.