Introduction

Quantum computation is a very promising method to perform information processing. Several types of problems, such as prime factorization1 or search,2 can be sped up considerably. The first physical realizations have been built,3,4,5,6,7 where error rates are small enough to allow for effective error-correction and fault-tolerant quantum computation.8 Many diverse systems can already run small quantum algorithms, and projects such as the IBM Quantum Experience9 have connected small prototype computers to the cloud, both for educational purposes and to allow other researchers to test small-scale protocols.

One of the remaining tasks for experimentalists is to scale up the number of qubits, while maintaining low error rates, in order to allow more complex algorithms to be performed. The task for theorists is now to build a quantum compiler,10,11,12,13 that can translate high-level algorithms to individual hardware instructions. This compiler has to be aware of the hardware faults and should introduce error-correction to protect logical qubits from physical influences, in order to ensure a completely fault-tolerant computation. Several parts of such a compiler have already been created for both the offline component (before a quantum computer is initiated)10,11,12,13 and the online component (classical software run in tandem with the quantum algorithm),14, 15 but a complete software package has yet to be developed. Notably, optimization algorithms,16 which operate at the error-correction level, are still lacking to optimize physical resources in the most commonly used error-correction models. To this end, we inspect the optimization of a specific topologically based operational model called lattice surgery (LS).17 This representation was chosen particularly because of its applicability to a wide range of hardware models,18,19,20,21,22 and the applicability of LS approaches using other topological coding techniques.23,24,25

For a practical fault-tolerant computer using LS, both the physical and logical levels are arranged in a 2D nearest-neighbor array, on which a universal gate set can be realized.17, 26 This 2D, nearest-neighbor environment is enforced by the connectivity of the physical qubit array. For LS this is the planar code;27 for braiding it is generally the surface code.8, 27, 28 The common feature of all these representations is that physical qubits are connected via a graph that indicates their possible interactions. Even non-fault-tolerant implementations suffer from the restricted connectivity of the underlying physical qubits, and methods to perform computation on these had to be developed.29

Conceptually, algorithmic compilation and optimization is similar to more traditional measurement-based quantum computation, but at the level of error-corrected qubits. The LS translation26 of an arbitrary circuit creates an algorithmically specific graph state at the encoded level, using the native parity checks of LS. After this encoded graph is created, a time-ordered sequence of non-Clifford measurements is performed on each encoded node in the graph to realize the algorithm. This is akin to traditional measurement-based quantum computation30 (which is not error-corrected), where a 2D, universal graph state (commonly referred to as a cluster state) is prepared, a quantum circuit mapped to this 2D array and all associated Clifford measurements are performed. The 2D cluster state is then converted to an algorithmically specific graph state where the only subsequent operations needed are a time-ordered sequence of non-Clifford measurements and feed-forward.26, 31

Here, we want to evaluate the complexity of the creation of such an encoded graph state. In complexity theory, problems are divided into categories, which determine their hardness. A famous class consists of non-deterministic polynomial complete (NP-complete) problems.32 Such problems lie in the complexity class NP, such that a solution can be verified in polynomial time, and are at least as hard as the most difficult problems in NP. A common way to determine NP-completeness is to map an already known NP-complete problem to the problem of interest.32 We were inspired by the proof of NP-hardness of Tetris,33 and map the 3-partitioning problem34 to the optimization of LS patches using the translation devised in ref. 26. This implies that it is also NP-hard to optimize the complete problem including measurements, because this only adds an additional layer of complexity to the system.

We will prove that the circuit optimization of a particular fault-tolerant implementation of topological error-correction is NP-hard. Similarly to our result, it has been shown that it can be NP-hard for a compiler to optimize classical code, such that its execution is time optimal.35 Our results, thus, urge the development of heuristics that can optimize quantum circuits not exactly, but at least reasonably well, for implementation on realistic quantum hardware. Furthermore, we derive general estimates on best and worst performance. We also discuss the benefit of optimization given a sample algorithm and estimate the hardness of the optimization problem for an exact, classical solver.

The main idea of the LS translation26 is to encode an algorithmically specific graph state in the square lattice of the planar code, which will then use a measurement-based quantum computational approach to perform any calculation. The implementation of this encoded state needs to respect the underlying structure of the planar code. Many square patches that encode individual qubits17 are aligned on a 2D lattice. Connections between nearest-neighbor logical qubits are possible using physical qubits that lie on the boundary between the patches. These operations constitute merges and splits that act as parity checks between the two encoded qubits, and can be used together with injection to enable universal quantum computation.26

The analysis performed here is rooted in the LS translation given in ref. 26. First, patches are initialized to \(\left| + \right\rangle \), then using parity checks an algorithmically specific stabilizer state is generated. This stabilizer state is measured in the bases \(\left| Z \right\rangle \), \(\left| X \right\rangle \), \(\left| Y \right\rangle = P\left| + \right\rangle \), and \(\left| A \right\rangle = T\left| + \right\rangle \), where P = \(\sqrt Z \) and T = \(\sqrt P \). However, for planar codes the rotated basis measurements (e.g., Y, A) are not protected fault-tolerantly, and magic states must be injected.17, 26 Our description is only concerned with the creation of the initial algorithmically specific stabilizer state and shows that even the optimization of this less complicated problem is already NP-hard.

An arbitrary circuit can be rearranged into the ICM format,12 which is already divided into (I)nitializations, (C)NOTs, and (M)easurements. The first two steps can be interpreted as a circuit to generate the stabilizer state. The translation to LS first merges all CNOTs of this circuit into multi-target CNOTs, which can then be easily implemented in LS: For each multi-target CNOT a column in the planar code is created, which is later split into individual encoded qubits (Fig. 1). Then, the qubits that are targeted by two or more CNOTs have to be combined through LS merge operations.
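The merging step described above can be illustrated with a minimal sketch. It assumes that the CNOT layer is given as a list of `(control, target)` pairs and that CNOTs sharing a control qubit can be commuted next to each other, which is an assumption of this example rather than something guaranteed for every circuit. The function names `merge_cnots` and `shared_targets` are hypothetical helpers introduced only for illustration.

```python
from collections import OrderedDict


def merge_cnots(cnots):
    """Group single-target CNOTs, given as (control, target), by control.

    Assumption: CNOTs with a common control can be moved next to each
    other, so that they form one multi-target CNOT.
    """
    merged = OrderedDict()
    for control, target in cnots:
        merged.setdefault(control, []).append(target)
    return list(merged.items())


def shared_targets(multi_cnots):
    """Return the qubits targeted by two or more multi-target CNOTs.

    These are the qubits that would have to be combined through
    LS merge operations.
    """
    counts = {}
    for _, targets in multi_cnots:
        for t in targets:
            counts[t] = counts.get(t, 0) + 1
    return sorted(q for q, n in counts.items() if n >= 2)


multi = merge_cnots([(0, 1), (0, 2), (3, 2)])
print(multi)                  # one column per multi-target CNOT
print(shared_targets(multi))  # qubits needing a merge
```

Each entry of `multi` would correspond to one column in the planar code, and each qubit returned by `shared_targets` marks a place where two columns have to be connected.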

Fig. 1

LS translation. Here, multi-target CNOTs are implemented using LS. During initialization three patches of surface code with N by 7N qubits are created, which are then split and merged to perform the computation. The faded boxes indicate ancillary qubits. An optimized version of this circuit is shown at the bottom, where the placement of the patches ensures a minimal bounding box of the whole circuit area. This circuit can achieve the theoretical optimum

Due to different CNOTs targeting the same qubits multiple times in a general quantum circuit, the compiler naturally produces ancillary qubits, which are inherent to the structure of the algorithm and the compiler. During our calculations these are disregarded and we only study the problem of how to optimally place the patches in the 2D-nearest neighbor environment of the planar code error-correction model using LS.

Results

We will prove that the problem of deciding whether a perfectly optimizable layout in LS exists is NP-complete, by mapping a known NP-complete problem to the problem of interest.

Inspired by the proof of NP-hardness of Tetris,33 we chose to encode the 3-partition problem into a circuit, which then gets translated to LS. We will show that, with polynomial overhead, a solution to the 3-partition problem can be obtained by the optimization of the placement of LS patches.

The hardness of deciding whether theoretical optimality is reachable implies that the optimization problem itself is NP-hard. This can be seen by using the optimization problem as a subroutine for the decision problem. If the optimization problem were easier, the decision problem would be solvable in polynomial time. With the described mapping, this would mean that any problem in NP could be solved in polynomial time, which is widely assumed to be false.

The proof itself is given in the methods section.

Discussion

We want to give an estimate on how much of an improvement can be expected from the optimization of a double \(\left| Y \right\rangle \)-state distillation circuit.8, 36 This circuit is only illustrative for the optimization and we are aware of better proposals to implement \(\left| Y \right\rangle \)-states in surface codes.37 For reference, we provide the circuit of one distillation step in Fig. 2. In our calculation we give bounds for the best case by calculating the theoretical optimum. Furthermore, the worst case bounds are given by calculating an unoptimized placement, where each qubit corresponds to one row of patches in LS. The definitions for both worst-case and best-case bounds are given in the “Methods” section. However, a previous manual optimization26 has shown that the \(\left| Y \right\rangle \)-state distillation circuit cannot reach theoretical optimality, such that the best possible solution lies somewhere in between these bounds. In the following back-of-the-envelope calculation we assume that the basis transformation of the measurement step can be applied without movement, which would correspond to a solution of a more complex optimization problem.

Fig. 2

Steane code. This circuit is the Steane code, to be used for the distillation of \(\left| Y \right\rangle \) states. This is an iterative procedure where the error-prone \(\left| Y \right\rangle \) states are used during the application of the S-gates

The first round of the \(\left| Y \right\rangle \)-state distillation circuit consists of seven distillations. Each distillation consists of four CNOTs from the Steane code with three target qubits each. Furthermore, for the application of one S-gate an additional qubit is needed for the injection procedure. Thus, an additional 7 × 7 qubits are needed. In a second round, an additional distillation needs to be performed, requiring eight more qubits and four more CNOTs. Furthermore, each distillation circuit consists of eight qubits, and initially 7 × 7 noisy \(\left| Y \right\rangle \)-states need to be injected. Thus, the total number of qubits needed in this double distillation is \(N_Q = 8 \times 7 + 7 \times 7 + 8 = 113\).

The optimal costs can be calculated with Eq. (2) and lead to 32 × 4 + 7 × 7 = 177 encoded patches of the planar code. A suboptimal placement, Eq. (3), where each qubit is fixed to one row, requires 32 × 113 = 3616 patches. This difference is a factor of roughly 20, with the difference only growing for larger circuits.

Since this optimization has to be performed by a compiler, which is likely to run on classical hardware, the NP-hardness of the optimization will restrict the size of the instances that can be optimized exactly. To show that it would be infeasible to exactly optimize even a small number of individual CNOTs, we look at an exact solution of the number-partitioning problem. An exact algorithm has to loop through all valid configurations to find the best one. We do not consider a dynamic programming solution here, because such an algorithm is unlikely to be devised for the optimization of LS. The reason is that dynamic programming relies on the solution of subproblems. However, connections between different surface code patches required by merges break the structure exploited by dynamic programming approaches. Furthermore, one should note that this optimization needs to be general, such that each circuit can be optimized. Any circuit-specific optimization is therefore discouraged, which makes our claims valid despite the symmetry of the current exemplary circuit. Assuming the same double \(\left| Y \right\rangle \)-state distillation circuit as before, we would have 46 numbers and want to partition these into 15 subsets. The nature of the 3-partitioning problem only allows three numbers per set, such that we only have to consider these configurations. The assignment of 3N elements to N sets such that each set contains exactly three elements has

$$N_{\rm config} = \prod_{i = 0}^{N - 1} \frac{\left( 3N - 3i \right)!}{3!\left( 3N - 3i - 3 \right)!} = \frac{\left( 3N \right)!}{\left( 3! \right)^N}$$
(1)

configurations. This equals \(\sim 10^{44}\) possible configurations for the distillation circuit. A computer running at 3.5 GHz and able to check one configuration per cycle would still need \(\sim 10^{34}\) processor hours to complete this task. Thus, it is not feasible to find the optimal solution with exact algorithms. The scaling of LS should be even worse, because individual qubits of the CNOTs have to be checked for eventual merges with horizontal neighbors, adding an additional layer of complexity. Thus, this rough calculation indicates that the NP-hardness of this problem makes it infeasible to optimize any meaningful quantum algorithm exactly, and that efficient heuristic algorithms have to be developed.
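The configuration count of Eq. (1) can be checked numerically. The following sketch evaluates both the product form and the closed form with exact integer arithmetic. The function names are hypothetical, and the choice N = 15 follows the 15 subsets quoted above.

```python
from math import comb, factorial


def n_config_product(n_sets):
    """Product form of Eq. (1): choose 3 of the remaining elements per set."""
    result = 1
    for i in range(n_sets):
        result *= comb(3 * n_sets - 3 * i, 3)
    return result


def n_config_closed(n_sets):
    """Closed form of Eq. (1): (3N)! / (3!)^N."""
    return factorial(3 * n_sets) // factorial(3) ** n_sets


n = n_config_closed(15)
print(n_config_product(15) == n)  # both forms agree
print(f"{n:.2e}")                 # order of magnitude of the search space
```

For N = 15 the result is a 45-digit integer, i.e., of order \(10^{44}\), in line with the estimate in the text.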

We have proven that the decision problem of whether a circuit is perfectly optimizable using the LS-translation devised in ref. 26 is NP-complete and that the optimization problem has to be NP-hard. Furthermore, we have given some rough estimates on how hard exact optimization for LS would be, and have shown that even small circuits cannot be optimized exactly. For practical purposes, however, the optimal configuration is not needed because an optimization protocol can get reasonably close. This urges further research in the development of efficient heuristics to optimize circuits in LS, that are as close to optimality as possible. Furthermore, the inclusion of the measurement step introduces an additional layer of complexity which has not yet been considered in our analysis. This will increase the space of possible configurations and likely decrease even further the efficiency of prospective optimization algorithms.

Methods

In this section we will detail the proof of this work. In order to do so, we need to define optimality in the context of the LS translation. Furthermore, the optimization problem is presented in an abstracted, non-physical description.

Optimality

A usual definition of optimality is reaching a (computational) goal with minimal physical requirements. In our case these physical requirements correspond to a minimal space-time volume, which is defined by the product of error-correcting cycles and physical qubits. We will further restrict this definition such that the bounding box of this space-time volume (within which all computation happens) needs to be minimal, while the placement still retains the same output state. Another way of looking at this definition is that every patch of the surface code inside the bounding box is initialized to a computational qubit and no ancillary patches are needed. We focus on the generation of the algorithmically specific stabilizer state and prove that even the optimization of this part is NP-hard. Such a stabilizer state can be prepared in constant time (as all circuit elements are Clifford), which allows a simplified optimality condition to be the mapping that results in the “least surface area”.

(Non-physical) problem description

The LS translation creates a problem where each CNOT has to be fitted into a surface code area that contains all computations. This area should be minimized. However, this can be viewed as an abstract problem, completely detached from the LS picture. We will now introduce the problem that needs to be solved.

The problem consists of minimizing the surface area of a square lattice which consists of individual patches. Some of these patches are assigned an integer \(q_{ij}\), but they do not necessarily need one. Furthermore, multiple patches can have the same integer. A horizontal (vertical) neighbor of a patch is defined as the next non-empty patch (i.e., a patch that contains an integer) to the right or left (up or down). A set of boxes \(C_i\) containing patches with integers \(q_{ij}\) are given and can be implemented on the lattice by a chain of vertical neighbors, where the order of the \(\{q_{ij}\}_j\) can be chosen freely. Furthermore, empty patches can be added freely. The following criteria have to be met to obtain a valid configuration: (1) Patches for all boxes need to be placed (vertical neighbors); (2) Patches with the same integers need to be placed such that they are horizontal neighbors.

Thus, the problem consists of an optimal placement of these numbered patches such that the area of the bounding box of the total arrangement is minimized. The fewer empty patches are required, the more optimized the configuration is.
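To make the cost function of this abstract problem concrete, the following sketch scores a given placement. It assumes a hypothetical representation in which a placement is a dictionary mapping `(row, column)` lattice coordinates to the integer stored in that patch, with empty patches simply absent. The horizontal-neighbor check encodes one reading of criterion (2), namely that equal integers share a row with no other non-empty patch between them. Criterion (1) is not checked here.

```python
def bounding_box_area(placement):
    """Area of the smallest axis-aligned rectangle containing all patches."""
    rows = [r for r, _ in placement]
    cols = [c for _, c in placement]
    return (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)


def empty_patches(placement):
    """Number of patches inside the bounding box that hold no integer."""
    return bounding_box_area(placement) - len(placement)


def same_integers_are_horizontal_neighbors(placement):
    """Check criterion (2) under the interpretation stated above."""
    by_integer = {}
    for (r, c), q in placement.items():
        by_integer.setdefault(q, []).append((r, c))
    for cells in by_integer.values():
        if len({r for r, _ in cells}) > 1:
            return False  # equal integers must share one row
        row = cells[0][0]
        cols = sorted(c for _, c in cells)
        for left, right in zip(cols, cols[1:]):
            # no non-empty patch may sit between two equal integers
            if any((row, c) in placement for c in range(left + 1, right)):
                return False
    return True


placement = {(0, 0): 1, (1, 0): 2, (1, 2): 2, (0, 2): 3}
print(bounding_box_area(placement), empty_patches(placement))
print(same_integers_are_horizontal_neighbors(placement))
```

A placement with zero empty patches corresponds to the theoretical optimum in the sense used here.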

For a circuit that has been prepared in the universal, inverted ICM-representation each multi-target CNOT operation will contribute to one box \(C_i\). The numbers \(q_{ij}\) of box \(C_i\) are given by the qubits that partake in this operation. Figure 1 shows how a sample circuit is mapped to the LS representation.

If a circuit reaches optimality, the number of patches needed in LS can be calculated by:

$$N_{\rm Patch} = \sum_{i = 1}^{\# {\rm CNOT}} \left( N_{{\rm target}_i} + 1 \right).$$
(2)

Here, #CNOT denotes the number of multi-target CNOTs with different qubits as their control, in the original circuit specification, and \(N_{{\rm target}_i}\) is the number of target qubits for the ith CNOT. However, due to incompatibility during the merge step, this can (in the worst case) lead to a non-optimal placement with a patch requirement of

$${N_{{\rm{Patch}}}} = {N_Q} \cdot \# {\rm{CNOT}},$$
(3)

where \(N_Q\) represents the total number of qubits in the circuit.
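The two bounds of Eqs. (2) and (3) are simple to evaluate. The following sketch, with hypothetical function names, computes both and applies them to the numbers of the double \(\left| Y \right\rangle \)-state distillation circuit from the Discussion (32 CNOTs with three targets each and 113 qubits).

```python
def optimal_patches(targets_per_cnot):
    """Eq. (2): one patch per target plus one for the control, per CNOT."""
    return sum(n_target + 1 for n_target in targets_per_cnot)


def worst_case_patches(n_qubits, n_cnots):
    """Eq. (3): every qubit occupies a full row across all CNOT columns."""
    return n_qubits * n_cnots


cnot_part = optimal_patches([3] * 32)
print(cnot_part)                    # CNOT contribution, 32 x 4
print(cnot_part + 7 * 7)            # plus the 7 x 7 injected states
print(worst_case_patches(113, 32))  # unoptimized placement
```

The 7 × 7 term is added separately because it accounts for the injected states rather than for CNOT columns, mirroring the calculation in the Discussion.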

Due to the structure of the high-level circuit that needs to be compiled, it is not always possible to reach the theoretical optimum. A general optimized algorithm needs resources between the two bounds given above.

Proof

In the 3-partitioning problem34 a set of non-negative integers \(\{a_i\}_{1 \le i \le 3s}\) is given. With another non-negative integer L, two further requirements are: (i) \(\frac{L}{4} \le a_i \le \frac{L}{3}\) for all i such that 1 ≤ i ≤ 3s, and (ii) \(\sum_{i = 1}^{3s} a_i = sL\).

The NP-complete decision problem for 3-partitioning answers the following question: Can \(\{a_i\}_{1 \le i \le 3s}\) be partitioned into s disjoint subsets \(A_1, \ldots, A_s\), such that \(\sum_{i \in A_j} a_i = L\) for \(j \in \{1, \ldots, s\}\)?
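For small instances the decision problem can be answered by exhaustive search. The following backtracking sketch is an illustration only, not an efficient algorithm. It takes the subset size of three as given, as noted in the Discussion, and does not check requirement (i). The function name `three_partition` is hypothetical.

```python
from itertools import combinations


def three_partition(values, target):
    """Return index triples that each sum to `target`, or None.

    Exhaustive backtracking: the first unused element is always placed,
    and every pair of remaining elements is tried as its partners.
    """
    def solve(remaining):
        if not remaining:
            return []
        first, rest = remaining[0], remaining[1:]
        for j, k in combinations(rest, 2):
            if values[first] + values[j] + values[k] == target:
                left = [i for i in rest if i not in (j, k)]
                tail = solve(left)
                if tail is not None:
                    return [(first, j, k)] + tail
        return None

    # Requirement (ii): 3s elements whose total is s * L.
    if len(values) % 3 or sum(values) != target * (len(values) // 3):
        return None
    return solve(list(range(len(values))))


print(three_partition([4, 5, 6, 5, 5, 5], 15))
print(three_partition([1, 1, 1, 1, 1, 7], 6))
```

The running time grows with the number of configurations counted in Eq. (1), which is why such a search is only practical for very small s.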

Mapping

We can translate the problem of 3-partitioning to the problem of deciding whether a corresponding circuit can reach optimality in LS. The main idea of this mapping is to encode each of the integers \(a_i\) of the 3-partitioning problem into a single multi-target CNOT, where the number of qubits that partake in the ith CNOT is given by \(a_i\). Therefore, a box \(C_i\) of the non-physical problem description contains \(a_i\) integers, which will then be translated to blocks of width 1 and height \(a_i\) in the LS model. Furthermore, each qubit is only acted on by one CNOT, such that no further constraints apply to the placement of these boxes. The solution of the 3-partitioning problem is given by finding an arrangement of these CNOT blocks in a rectangle of height L and width s. We will call this rectangle the compute area. In Fig. 3 we show a possible circuit, where the qubits in part (b) implement the CNOTs corresponding to \(a_i\).
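The construction of the boxes for part (b) can be sketched as follows. The helper `boxes_from_integers` is hypothetical, and the consecutive qubit labels are an arbitrary choice made for this example. Only the box sizes and their disjointness matter for the mapping.

```python
def boxes_from_integers(integers, first_label=0):
    """Build one box of distinct qubit labels per integer a_i.

    Each box has height a_i, and no label appears in two boxes,
    so no horizontal constraints arise between boxes.
    """
    boxes = []
    label = first_label
    for a in integers:
        boxes.append(list(range(label, label + a)))
        label += a
    return boxes


integers = [4, 5, 6, 5, 5, 5]  # example instance with s = 2, L = 15
boxes = boxes_from_integers(integers)
print([len(box) for box in boxes])        # box heights equal the a_i
print(sum(len(box) for box in boxes))     # total equals s * L
```

Because the box heights sum to sL, the boxes fill the compute area without holes exactly when they can be stacked into s columns of height L.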

The qubits from part (a) are needed to ensure that a compute area of L by s is optimal. To ensure a width of at least s, one can devise a chain of CNOTs that have different control qubits but operate on the same target qubit. This is encoded in the qubits starting from the second and ending at qubit (s + 2) of part (a). An additional column has to be created here, because one also has to ensure a height of L in the compute area.

This can be performed by adding a single multiple target qubit CNOT with L + 1 target qubits. One of these qubits is used to link the vertical with the horizontal constraints. This qubit is the (s + 2)nd qubit in the circuit of Fig. 3. The following L qubits are used to increase the height by L. This results in the optimal placement of LS patches shown in Fig. 4. If that circuit cannot reach theoretical optimality, the compute area cannot contain all 3-partitioning CNOT-patches and thus additional qubits are needed. Holes (i.e., patches of surface code that do not correspond to any qubit in the circuit) are created and the bounding box of the calculation increases.

Fig. 3

Circuit for the optimization problem. The optimization of both parts of this circuit corresponds to solving the 3-partitioning problem. Part (b) implements the 3-partitioning problem only. Each CNOT corresponds to a number \(a_i\) and will be translated into a separate patch of variable height in the LS representation of Fig. 4. The compute area is the area in the LS representation that consists only of the CNOTs from part (b). Part (a) is used to force the compute area to be a rectangle of height L and width s, while the qubits in the compute area are responsible for encoding the original NP-complete problem

Fig. 4

Translated circuit. The circuit from Fig. 3 is now translated to the LS model of quantum computing. Here, the numbers indicate which qubit of the original circuit each patch represents. If the circuit can reach the theoretical optimum, the last sL qubits can be fit in the compute area space. Each column then consists of L patches of surface code, which are all filled with qubits that partake in CNOT operations. If each of these columns is completely filled, s sets are found, which have elements that sum to L

The number of qubits that are needed for this mapping is s(L + 1) + L + 2. With the algorithmic ancillary patches, the circuit requires (L + 2)(s + 1) patches in LS. Thus, this mapping only needs resources linearly in the number of integers of the original problem.

By construction, each column in the compute area corresponds to one of the sets \(A_i\), such that the requirement of each set summing to L is equivalent to the requirement that each column in the compute area has a height of exactly L. Furthermore, checking whether each column contains exactly L qubits can be performed in polynomial time, such that the problem is in the complexity class NP.
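The polynomial-time check mentioned above can be sketched as a simple verifier. It assumes a hypothetical certificate format in which `column_of[i]` names the column of the compute area that holds the block for \(a_i\). A single pass over the integers suffices.

```python
def verify_columns(integers, column_of, s, L):
    """Check that every one of the s columns is filled to height exactly L.

    Runs in time linear in the number of integers.
    """
    if len(column_of) != len(integers):
        return False
    heights = [0] * s
    for a, col in zip(integers, column_of):
        if not 0 <= col < s:
            return False  # block placed outside the compute area
        heights[col] += a
    return all(h == L for h in heights)


integers = [4, 5, 6, 5, 5, 5]
print(verify_columns(integers, [0, 0, 0, 1, 1, 1], s=2, L=15))
print(verify_columns(integers, [0, 0, 1, 0, 1, 1], s=2, L=15))
```

A certificate that passes this check directly yields the sets \(A_1, \ldots, A_s\) of the 3-partitioning instance.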

Data availability

This work does not rely on any further data.