Introduction

Recent advances in quantum hardware1,2,3 open the path for practical applications of quantum algorithms. A particularly promising target application domain is combinatorial optimization. Problems in this space are prominent in many industrial sectors, such as logistics4, supply-chain design5, drug discovery6, and finance7. However, many of the most promising quantum algorithms for optimization are heuristic and lack provable performance guarantees. Moreover, limited capabilities of near-term quantum computers further constrain the power of such algorithms to address practically-relevant problems. Therefore, it is crucial to perform thorough evaluations of promising quantum algorithms on state-of-the-art hardware to assess their potential to provide quantum advantage in optimization.

When conducting such evaluations, the choice of the target optimization problem is of particular importance. Many well-studied theoretical problems—such as maximum cut (MaxCut)8,9,10,11,12,13,14, maximum independent set15 and maximum k-colorable subgraph16—have been used to evaluate the performance of quantum optimization algorithms. These problems have several advantages: they are well-characterized theoretically, have strong hardness guarantees, and are easy to map to quantum hardware. At the same time, however, they do not correspond directly to the practically-relevant problems that are solved daily in an industrial setting.

In this work, we consider the problem of text summarization, where the goal is to generate a shortened representation of an input document without altering its original meaning. This process is commonly used to produce summaries of news articles17 and voluminous legal documents18. Specifically, we focus on a version of text summarization known as extractive summarization (ES), wherein the summary is produced by selecting sentences verbatim from the original text. In our experiments, we map the ES problem to an \(\mathsf {NP}\)-hard constrained-optimization problem using the formulation introduced by McDonald19. The resulting optimization problem is then solved on a quantum computer. We impose a constraint on the number of sentences in the summary, which is enforced using a penalty term in the objective, or natively by limiting the quantum evolution to a constraint-restricted subspace.

ES is a particularly interesting problem to consider since it has challenges that are similar to those of many other industrially-relevant use cases. First, it is constrained, making it necessary to either restrict the quantum evolution to the corresponding subspace, or introduce large penalty terms into the formulation. Second, it lacks simple structures, such as symmetries20. Third, unlike commonly considered toy problems, such as MaxCut, the coefficients in its objective are not necessarily integers, which can make the optimization of quantum algorithm parameters hard21.

In this paper, we present experimental and numerical results demonstrating the challenges associated with solving constrained-optimization problems with near-term quantum computers. Our contribution is twofold.

First, we demonstrate experimental results showing successful execution of the Quantum Alternating Operator Ansatz algorithm with a Hamming-weight-preserving XY mixer (XY-QAOA)22,23 on the quantum processor Quantinuum H1-1. We use all 20 qubits of H1-1 and execute circuits with two-qubit gate depth of up to 159 and two-qubit gate count of up to 765, which is the largest such demonstration to date. We additionally report results from the execution of the Layer Variational Quantum Eigensolver (L-VQE)24, which is a recently-introduced hardware-efficient variational algorithm for optimization. We obtain approximation ratios of up to \(92.1\%\) and in-constraint probability of up to \(91.4\%\) on the H1-1 device.

Second, we motivate our algorithm choice by highlighting the trade-off between the quality of the solution and the in-constraint probability which is implicit in applying unconstrained near-term quantum optimization algorithms to constrained problems. This trade-off suggests the need to carefully engineer the parameter optimization strategy, which is difficult in general. We show how this trade-off can be avoided by either using sufficiently expressive circuits such as L-VQE (at the cost of the increased difficulty of parameter optimization) or, more naturally, by encoding the constraints directly into the circuits as in the case of XY-QAOA.

The remainder of this paper is organized as follows. Section “Problem description” introduces the quantum algorithms used. Section “Extractive summarization as an optimization problem” describes how the ES problem is formulated as an optimization problem. Section “Methods” details the methodology we adopted to generate the optimization problem instances and solve them on real quantum hardware. Section “Results” presents the experimental results we obtained by executing the algorithms and discusses their advantages and drawbacks. Section “Related work” discusses previous hardware demonstrations. Finally, Section “Discussion” summarizes our findings and their significance. Additional technical details on the algorithms, hyperparameters and the ES problem are presented in the Appendix at the end of the paper.

Problem description

For a given objective function f defined on the N-dimensional Boolean cube and a set of feasible solutions \(\mathscr {F}\subseteq \{0,1\}^N\), consider the problem of finding a binary string \(\mathbf {x}\in {\mathscr {F}}\) that maximizes it:

$$\begin{aligned} \max _{\mathbf {x} \in {\mathscr {F}}} f(\mathbf {x}). \end{aligned}$$
(1)

The set of feasible solutions \(\mathscr {F}\) is typically given by constraints of the form \(g(\mathbf {x})=0\) or \(g(\mathbf {x})\le 0\). A binary string \(\mathbf {x}\in \mathscr {F}\) is said to be “in-constraint”. Let \(C\in \mathbb {C}^{{2^N} \times {2^N}}\) denote the Hamiltonian (Hermitian operator) encoding f on qubits. This operator is diagonal in the computational basis (\(C = \text {diag}(f(\mathbf {x}))\)) and is defined by its action on the computational basis: \(C|\mathbf {x}\rangle = f(\mathbf {x})|\mathbf {x}\rangle , \forall \mathbf {x}\in \{0,1\}^N\).

QAOA25,26 solves the problem (1) by preparing a parameterized quantum state

$$\begin{aligned} \prod _{j=1}^p \left[ e^{-i\beta _j \sum _{k=1}^N \textsc {x}_k}e^{-i\gamma _j C} \right] |+\rangle ^{\otimes N}, \end{aligned}$$
(2)

where \(\textsc {x}_k\) denotes a single-qubit Pauli \(\textsc {x}\) acting on qubit k and the initial state \(|+\rangle ^{\otimes N}\) is a uniform superposition over all computational basis states. The parameters \(\mathbf {\beta }, \mathbf {\gamma }\) are chosen using a classical algorithm, typically an optimization routine27, with the goal of maximizing the expected objective value of QAOA state-measurement outcomes. The depth of a QAOA circuit is controlled by a free parameter p. In the limit \(p\rightarrow \infty\), QAOA can solve the problem exactly via adiabatic evolution25. Additionally, there exist lower bounds on the performance of QAOA when solving MaxCut in finite depth12,28, and QAOA achieves performance competitive with the best-known classical algorithms on the Sherrington-Kirkpatrick model29 in the infinite-size limit and finite depth30,31.
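The state in Eq. (2) can be reproduced for small N with a dense statevector simulation. The following is a minimal sketch, not the implementation used in the experiments (which rely on Qiskit): the function names `qaoa_state` and `expected_objective` are illustrative, the objective is assumed to be supplied as a vector of values \(f(\mathbf {x})\) indexed by the integer encoding of \(\mathbf {x}\), and the layers are applied in the order in which the parameters are listed.

```python
import numpy as np

def qaoa_state(f_values, betas, gammas):
    """Statevector of Eq. (2) for a diagonal cost with entries f_values[x]."""
    f_values = np.asarray(f_values, dtype=float)
    dim = len(f_values)
    n = dim.bit_length() - 1  # number of qubits, assuming dim == 2**n
    # Initial state |+>^n: uniform superposition over all basis states.
    state = np.full(dim, 1 / np.sqrt(dim), dtype=complex)
    for beta, gamma in zip(betas, gammas):
        # Phase operator e^{-i gamma C} is diagonal in the computational basis.
        state = np.exp(-1j * gamma * f_values) * state
        # Mixer: e^{-i beta X} = cos(beta) I - i sin(beta) X on every qubit.
        c, s = np.cos(beta), -1j * np.sin(beta)
        psi = state.reshape([2] * n)
        for q in range(n):
            psi = c * psi + s * np.flip(psi, axis=q)  # flip = action of X on qubit q
        state = psi.reshape(dim)
    return state

def expected_objective(f_values, state):
    """Expected objective value of measurement outcomes."""
    probs = np.abs(state) ** 2
    return float(np.sum(probs * np.asarray(f_values, dtype=float)))
```

A classical optimizer or a grid search would then call `expected_objective` repeatedly to choose \(\mathbf {\beta }, \mathbf {\gamma }\).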

In the remainder of this paper, XY-QAOA refers to the Quantum Alternating Operator Ansatz algorithm with a Hamming-weight-preserving XY mixer22, whereas QAOA indicates the Quantum Approximate Optimization Algorithm25.

L-VQE24 solves problem (1) by preparing a parameterized state:

$$\begin{aligned} \prod _{j=1}^p \big [U(\mathbf {\theta }_j)\big ]V(\mathbf {\theta }_0)|0\rangle , \end{aligned}$$
(3)

where U is a circuit composed of linear nearest-neighbor cnot gates and single-qubit rotations, V is a tensor product of single-qubit rotations and \(|0\rangle\) is the N-qubit vacuum state. Due to the structure of the circuit, the two-qubit gate depth is very low (\(4 \times p\) for a circuit with p layers). We refer interested readers to Ref.24 for the precise definition of the circuit.

While QAOA and L-VQE can tackle constrained-optimization problems in which the constraint is enforced by a penalty term, the output of these algorithms is not guaranteed to satisfy the constraint. Moreover, a penalty term introduces a trade-off between the in-constraint probability and the quality of the in-constraint solution, as discussed in “Quantum circuit needs to preserve constraints”. The Quantum Alternating Operator Ansatz algorithm22 overcomes this limitation by using a parameterized circuit which limits the quantum evolution to a constraint-preserving subspace. The general form of this circuit is given by

$$\begin{aligned} \prod _{j=1}^p \left[ U_M(\beta _j)U_C(\gamma _j) \right] |s\rangle , \end{aligned}$$
(4)

where \(U_C\) is the phase operator, which encodes the objective and is diagonal in the computational basis, \(U_M\) is a non-diagonal mixer operator, and \(|s\rangle\) is some initial state. The QAOA circuit can be recovered as a special case by setting \(U_M(\beta _j) = e^{-i\beta _j \sum _{k=1}^N \textsc {x}_k}\), \(U_C(\gamma _j)=e^{-i\gamma _j C}\) and \(|s\rangle =|+\rangle ^{\otimes N}\). In this paper, we focus in particular on the Hamming-weight constraint of the form \(\sum _{j=1}^N x_j = M\), which in our case corresponds to fixing the size of the summary. While many variations of this algorithm exist22, we only consider the XY-QAOA version, which lends itself well to implementation on near-term hardware. We let the initial state be the uniform superposition over all binary strings with Hamming weight M, \(U_C(\gamma _j)=e^{-i\gamma _j C}\) and \(U_{M}^{\textsc {x}\textsc {y}}(\beta _j) = \prod _{k=1}^N e^{-i\frac{\beta _j}{2}(\textsc {x}_k\textsc {x}_{k+1}+\textsc {y}_k\textsc {y}_{k+1})}\)23. Given that each term \(\textsc {x}_k\textsc {x}_{k+1}+\textsc {y}_k\textsc {y}_{k+1}\) commutes with \(\sum _{k=1}^N\textsc {z}_k\), the evolution produced by the resulting circuit is restricted to the span of computational basis states with Hamming weight M, as desired.
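The Hamming-weight preservation of the XY mixer can be checked numerically on a few qubits. The sketch below is illustrative only: it builds dense matrices, assumes the index \(k+1\) wraps around so that the qubits form a ring, and uses hypothetical helper names (`embed`, `xy_term`, `xy_ring_mixer`, `dicke_state`, `weight_distribution`). It does not reflect the short-depth circuits executed on hardware.

```python
import numpy as np
from functools import reduce
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(ops, n):
    """Tensor product acting with ops[q] on qubit q and identity elsewhere."""
    return reduce(np.kron, [ops.get(q, I2) for q in range(n)])

def xy_term(k, n):
    """X_k X_{k+1} + Y_k Y_{k+1}, with k+1 taken modulo n (ring assumption)."""
    l = (k + 1) % n
    return embed({k: X, l: X}, n) + embed({k: Y, l: Y}, n)

def xy_ring_mixer(beta, n):
    """Product over ring edges of exp(-i beta/2 (XX + YY))."""
    U = np.eye(2 ** n, dtype=complex)
    for k in range(n):
        # Exponentiate each Hermitian term through its eigendecomposition.
        w, V = np.linalg.eigh(xy_term(k, n))
        U = (V * np.exp(-0.5j * beta * w)) @ V.conj().T @ U
    return U

def dicke_state(n, m):
    """Uniform superposition over n-bit strings of Hamming weight m."""
    psi = np.zeros(2 ** n, dtype=complex)
    for ones in combinations(range(n), m):
        psi[sum(1 << (n - 1 - q) for q in ones)] = 1
    return psi / np.linalg.norm(psi)

def weight_distribution(state, n):
    """Probability of measuring each Hamming weight 0..n."""
    probs = np.abs(state) ** 2
    weights = np.array([bin(x).count("1") for x in range(2 ** n)])
    return np.bincount(weights, weights=probs, minlength=n + 1)
```

Applying `xy_ring_mixer` to `dicke_state(n, m)` and inspecting `weight_distribution` should leave all probability on weight m in the noiseless case.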

Extractive summarization as an optimization problem

Extractive summarization (ES) is an interesting problem to evaluate the performance of quantum optimization algorithms due to its practical importance and complex structure. The goal of ES is to pick a subset of the sentences present in a large document to form a smaller document such that this subset preserves the information content in the original document, i.e., summarizes it. While many approaches to solving ES exist (see e.g.32,33,34,35), in this work we focus on a particular formulation of ES as an optimization problem. Specifically, we consider the problem of maximizing the centrality and minimizing the redundancy of the sentences in the summary under the constraint that the total number of sentences in the summary is fixed. This formulation of ES has been proposed by R. McDonald and shown to be \(\mathsf {NP}\)-hard to solve exactly19.

We now introduce the necessary notation and formally define the problem. An ES algorithm maps a document of N sentences to a summary of \(M < N\) sentences. Let the sentences be denoted by integers \(i \in [N] := \{0, 1, \dots , N-1\}\) according to the order in which they appear in the document. An extractive summary is a vector \(\mathbf {x} \in \{0, 1\}^N\), where \(x_i = 1\) if and only if sentence i is included in the summary text associated with \(\mathbf {x}\). The summary should identify sentences that are central, meaning important, to the document. The salience of a sentence is measured with a centrality measure, which is a map \(\mu : [N] \xrightarrow []{} \mathbb {R}\) satisfying the following property: \(\mu (i) > \mu (j)\) if and only if sentence i contains more information about the document than j. At the same time, to keep the summary short, it is desirable to ensure that the sentences in the summary are not redundant. The overlap in information content between two sentences is measured with a pairwise similarity, which is a symmetric map \(\beta : [N] \times [N] \xrightarrow []{} \mathbb {R}\) that satisfies the following property: \(\beta (i, j) > \beta (i, k)\) if and only if sentence j is more similar to sentence i than sentence k is. We discuss the particular choices of measures of sentence centrality and pairwise similarity in Appendix 1.

The extractive summarization is formulated as the following optimization problem19:

$$\begin{aligned} \begin{aligned} \max _{\mathbf {x} \in \{0,1\}^N }&\sum _{i=0}^{N-1} \mu (i)x_i - \lambda \sum _{i\ne j} \beta (i, j) x_i x_j, \\ \text {s.t.}&\sum _{i=0}^{N-1} x_i = M \end{aligned} \end{aligned}$$
(5)

where the parameter \(\lambda\) controls how the inclusion of similar sentences in the summary is penalized. The objective of this maximization problem is to increase the information content of the sentences in the summary while ensuring that the total pairwise similarity between sentences is low, relative to \(\lambda\). Appendix 5 shows how this parameter affects the quality of the summaries obtained.
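For small documents, problem (5) can be solved by exhaustive enumeration, which is useful as a reference when evaluating heuristic solvers. The following is a minimal sketch under the assumption that the centralities are given as a vector `mu` and the similarities as a symmetric matrix `beta`; the function names are illustrative and do not come from the original work.

```python
import numpy as np
from itertools import combinations

def es_objective(x, mu, beta, lam):
    """Objective of Eq. (5): centrality minus lam times pairwise similarity."""
    x = np.asarray(x, dtype=float)
    off_diag = beta - np.diag(np.diag(beta))  # the sum runs over i != j only
    return float(mu @ x - lam * x @ off_diag @ x)

def best_summary(mu, beta, lam, m):
    """Enumerate all summaries with exactly m sentences and keep the best."""
    n = len(mu)
    best_x, best_val = None, -np.inf
    for chosen in combinations(range(n), m):
        x = np.zeros(n, dtype=int)
        x[list(chosen)] = 1
        val = es_objective(x, mu, beta, lam)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

The enumeration visits \(\binom{N}{M}\) candidates, so it is only practical for the modest problem sizes considered here.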

This problem can be solved directly by a quantum algorithm that can preserve constraints. On the other hand, to solve this problem using an unconstrained-optimization algorithm, we must convert this problem to an unconstrained one by adding a penalty term to enforce the constraint. The penalty term is minimized when exactly M sentences are selected. This term is weighted with the parameter \(\Gamma\). Including the constraint on the lengths of summaries, the optimization problem becomes the following:

$$\begin{aligned} \begin{aligned} \max _{\mathbf {x} \in \{0,1\}^N } \sum _{i=0}^{N-1} \mu (i)x_i - \lambda \sum _{i\ne j} \beta (i, j) x_i x_j - \Gamma \left( \sum _{i=0}^{N-1}x_i - M\right) ^2, \end{aligned} \end{aligned}$$
(6)

which can be simplified further by ignoring constant terms to get a quadratic objective:

$$\begin{aligned} \begin{aligned} \max _{\mathbf {x} \in \{0,1\}^N} \sum _{i=0}^{N-1}\underbrace{(\mu (i) + 2\Gamma M - \Gamma )}_{\mu _{\Gamma , M}(i)}x_i - \sum _{i\ne j} \underbrace{(\lambda \beta (i, j)+\Gamma )}_{{\beta _{\Gamma }(i, j)}} x_i x_j. \end{aligned} \end{aligned}$$
(7)
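The step from (6) to (7) uses \(x_i^2 = x_i\) for binary variables, so that \(\left( \sum _i x_i\right) ^2 = \sum _i x_i + \sum _{i\ne j} x_i x_j\); the dropped constant is \(-\Gamma M^2\). The sketch below checks this numerically by brute force on a small random instance. The function names and the argument name `penalty` (standing for \(\Gamma\)) are illustrative assumptions.

```python
import numpy as np
from itertools import product

def penalized_objective(x, mu, beta, lam, penalty, m):
    """Eq. (6): objective of Eq. (5) minus penalty * (sum(x) - m)**2."""
    off_diag = beta - np.diag(np.diag(beta))
    return float(mu @ x - lam * x @ off_diag @ x - penalty * (x.sum() - m) ** 2)

def quadratic_coefficients(mu, beta, lam, penalty, m):
    """Linear and pairwise coefficients appearing in Eq. (7)."""
    n = len(mu)
    linear = mu + 2 * penalty * m - penalty
    pairwise = (lam * beta + penalty) * (1 - np.eye(n))  # keep i != j only
    return linear, pairwise

def quadratic_objective(x, linear, pairwise):
    """Eq. (7) evaluated on a binary vector x."""
    return float(linear @ x - x @ pairwise @ x)

def constant_offsets(mu, beta, lam, penalty, m):
    """Difference Eq. (6) - Eq. (7) for every binary string."""
    linear, pairwise = quadratic_coefficients(mu, beta, lam, penalty, m)
    offsets = []
    for bits in product([0, 1], repeat=len(mu)):
        x = np.array(bits, dtype=float)
        offsets.append(penalized_objective(x, mu, beta, lam, penalty, m)
                       - quadratic_objective(x, linear, pairwise))
    return np.array(offsets)
```

Every entry returned by `constant_offsets` should equal \(-\Gamma M^2\), confirming that the two objectives share the same maximizers.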

Methods

To generate the optimization problems to be solved, we use articles from the CNN/DailyMail dataset36. This dataset contains just over 300k unique news articles written by journalists at CNN and the Daily Mail in English. The optimization-problem instances consist of two sets of 10 instances: one set with \(N=20\) and a required summary length of \(M=8\) and another set with \(N=14\) and a required summary length of \(M=8\). We use sentence embeddings produced by BERT37 to compute similarities, and tf-idf38 to compute centralities, both of which are discussed in detail in Appendix 1.

To quantify the quality of the solution for the optimization problem, we report the approximation ratio given by

$$\begin{aligned} \frac{f_{\text {observed}}-f_{\min }}{f_{\max }-f_{\min }}, \end{aligned}$$
(8)

where \(f_{\text {observed}}\) is the objective value for the solution produced by a given algorithm, \(f_{\max }\) is the maximum value of the objective function and \(f_{\min }\) the minimum. The maximum and minimum are computed over all possible in-constraint solutions. For quantum algorithms, \(f_{\text {observed}}\) is computed as the average value of the objective function over all in-constraint samples. We additionally report the probability of the solution being in-constraint, which for the experimental results is estimated as the ratio of in-constraint samples to the total number of samples. In practice, the running time of the algorithm is inversely proportional to the in-constraint probability, since at least one in-constraint sample must be obtained.
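Both metrics can be estimated directly from measured bitstrings. The following sketch assumes the samples are given as rows of 0/1 values and that the objective is a Python callable; `feasible_extremes` and `approximation_ratio` are illustrative names, and the extremes are found by brute-force enumeration, which is feasible at the sizes considered here.

```python
import numpy as np
from itertools import combinations

def feasible_extremes(objective, n, m):
    """Minimum and maximum of the objective over all weight-m bitstrings."""
    values = []
    for chosen in combinations(range(n), m):
        x = np.zeros(n, dtype=int)
        x[list(chosen)] = 1
        values.append(objective(x))
    return min(values), max(values)

def approximation_ratio(samples, objective, n, m):
    """Return (approximation ratio of Eq. (8), in-constraint probability)."""
    samples = np.asarray(samples)
    feasible = samples[samples.sum(axis=1) == m]
    p_in = len(feasible) / len(samples)
    if len(feasible) == 0:
        return float("nan"), p_in  # ratio undefined without feasible samples
    # f_observed: average objective over in-constraint samples only.
    f_obs = np.mean([objective(x) for x in feasible])
    f_min, f_max = feasible_extremes(objective, n, m)
    return float((f_obs - f_min) / (f_max - f_min)), p_in
```

Under this estimate, roughly `1 / p_in` shots are needed on average to obtain one in-constraint sample.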

For unconstrained quantum optimization solvers, the cost function to maximize contains a penalty term with weight \(\Gamma\) to constrain the number of sentences in the summary to be M (last term in (6)). The value of this hyperparameter \(\Gamma\) was chosen to ensure the value of the cost in (6) corresponding to in-constraint binary strings is greater than the cost corresponding to out-of-constraint binary strings. For each article, after having calculated the similarity and centrality measures, we set \(\Gamma = \sum _{i=0}^{N-1} \mu (i) + \lambda \sum _{i\ne j} \beta (i,j)\), where \(\lambda =0.075\). See Appendix 5 for additional numerical experiments showing the impact of the value of \(\lambda\) on the performance.
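The effect of this choice of \(\Gamma\) can be verified by brute force on a small instance. The sketch below assumes non-negative centralities and similarities (an assumption made here for the check, not a statement about the measures in Appendix 1); under that assumption every in-constraint string scores at least as high in (6) as every out-of-constraint string. The function names are illustrative.

```python
import numpy as np
from itertools import product

def penalty_weight(mu, beta, lam):
    """Gamma = sum_i mu(i) + lam * sum_{i != j} beta(i, j)."""
    off_diag = beta - np.diag(np.diag(beta))
    return float(mu.sum() + lam * off_diag.sum())

def penalized_objective(x, mu, beta, lam, penalty, m):
    """Eq. (6) evaluated on a binary vector x."""
    off_diag = beta - np.diag(np.diag(beta))
    return float(mu @ x - lam * x @ off_diag @ x - penalty * (x.sum() - m) ** 2)

def feasible_always_wins(mu, beta, lam, m):
    """True if the worst in-constraint value is >= the best out-of-constraint one."""
    penalty = penalty_weight(mu, beta, lam)
    inside, outside = [], []
    for bits in product([0, 1], repeat=len(mu)):
        x = np.array(bits, dtype=float)
        value = penalized_objective(x, mu, beta, lam, penalty, m)
        (inside if x.sum() == m else outside).append(value)
    return min(inside) >= max(outside)
```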

We execute QAOA, XY-QAOA and L-VQE on both the 14-qubit and the 20-qubit instances. We use one layer for QAOA and XY-QAOA (\(p=1\)) and for L-VQE we use \(p=1\) for 14-qubit problems and \(p=2\) for 20-qubit problems. We optimize the parameters in noiseless simulation and then execute the circuits with optimized parameters on hardware with 2000 shots. We use Qiskit39 for circuit manipulation and noiseless simulation. For the hardware experiments, we transpile and optimize the circuits to H1-1’s native gate set using Quantinuum’s t\(|\text {ket}\rangle\) transpiler40. For comparison, we also run the quantum algorithms on an emulator provided by Quantinuum, which approximates the noise of H1-1.

For the XY-QAOA circuit, a significant part of the two-qubit gate depth comes from the circuit preparing the initial state, which is a uniform superposition of all in-constraint states (Dicke state). To obtain circuits that are shallow enough to be executable on hardware, we leverage recently developed techniques for the short-depth Dicke-state preparation41,42,43. Specifically, we use the divide-and-conquer approach43 to generate circuits targeting a device with an all-to-all connectivity.

Figure 1

The approximation ratios (top) and in-constraint probabilities (bottom) obtained in the optimization of the instances on 14 (left) and 20 (right) qubits using different quantum optimization algorithms executed in a noiseless simulator, the Quantinuum H1-1 machine and its emulator. In each box plot, the box shows quartiles, the line is the median and the whiskers show minimum and maximum. Each plot is showing the statistics over 10 problem instances, with the exception of XY-QAOA where only three 14-qubit and three 20-qubit runs have been executed due to high circuit depth. We observe that all results significantly improve on random guess. The in-constraint probability of QAOA is below random guess for 14 qubits due to the choice of parameters; see discussion in “Quantum circuit needs to preserve constraints”.

Results

We now present experimental results obtained in simulation and on the Quantinuum H1-1 quantum processor. We show the largest demonstration to date of constrained quantum optimization on gate-based quantum computers, using up to 20 qubits and 765 native two-qubit gates (see “Related work” for a review of previous demonstrations). Note that in the results presented below, we do not perform any error mitigation for the results obtained from hardware or noisy simulations. The specifications of the H1-1 processor are given in Appendix 3.

As previously discussed, we use three quantum optimization algorithms to solve a constrained-optimization problem with the eventual goal of generating document summaries. The optimization algorithms we use are QAOA, L-VQE and XY-QAOA. See “Problem description” for the definitions and discussion of the algorithms, and Appendix 2 for the implementation details. All the statistics we report are computed over 10 problem instances for each number of qubits, with the exception of XY-QAOA on H1-1 where only three 14-qubit and three 20-qubit instances are solved due to the high circuit depth and correspondingly high running time on trapped-ion hardware. “Random” and “Random in-constraint” always refer to statistics computed over all binary strings and all in-constraint binary strings, respectively. This is equivalent to computing them with respect to the uniform random distribution over the corresponding sets.

Figure 2

Hamming weight of the bitstrings sampled from the initial uniform superposition of in-constraint states (Dicke state) and a full XY-QAOA circuit. The in-constraint Hamming weight is highlighted in bold. Unlike the QAOA or L-VQE, in XY-QAOA a significant amount of noise is incurred in the initial state preparation step. Observe that the initial Dicke state, due to hardware noise, includes out-of-constraint states.

Figure 3

The probability of sampling bitstrings with a given Hamming weight distance to the in-constraint subspace for 20 qubits. The Hamming weight distance is given by \(|\text{wt}(x)-M|\), where \(\text{wt}(x)\) is the Hamming weight of x and M is the constraint value. The shaded background represents the distribution of Hamming weight distance for random bitstrings sampled from a uniform distribution. Note that the overlap with the in-constraint subspace is lower on hardware due to the presence of noise.

Experiments on hardware

In Fig. 1, we present the approximation ratios and the in-constraint probabilities for the three algorithms obtained from the execution in a noiseless simulator, with approximated noise in the emulator of H1-1 and on the real H1-1 device. For comparison, we also present the expected approximation ratio of a random feasible solution. We observe that all the approximation ratios obtained, including those from the largest circuit executions on hardware, significantly improve upon random guess. Additionally, we observe that the results on hardware are on average at least as good as those obtained from the emulator, making us confident that the emulator gives a good lower bound on the solution quality that can be expected on hardware.

Examining Fig. 1 makes apparent the relative advantages and limitations of the three algorithms we consider. We begin by noting that QAOA gives a relatively high approximation ratio, but a very low in-constraint probability. This is due to the QAOA parameters being chosen to trade off the two metrics of success; we discuss this issue in detail in “Quantum circuit needs to preserve constraints” below. Here we simply note that the low in-constraint probability of QAOA makes the high approximation ratio less significant. Combined with the complexity of parameter setting arising from the trade-off, this means that QAOA is not a good algorithm for the problem we consider here despite having a relatively high approximation ratio. This trade-off is avoided by L-VQE and XY-QAOA in two different ways.

L-VQE uses a very expressive parameterized circuit that can in principle solve the problem exactly with no two-qubit gates, just by optimizing the parameters of the initial single-qubit-gate layer \(V(\mathbf {\theta }_0)\). At the same time, the expressiveness of the L-VQE circuit makes the parameters hard to optimize, both due to their high number and, in some cases, due to the gradients vanishing as the number of qubits grows44,45. As we consider modestly sized problems in this work, we are able to optimize the parameters and obtain solutions with very high approximation ratio and in-constraint probability. However, as the number of qubits grows, this will become increasingly infeasible.

XY-QAOA natively encodes the constraints by restricting the quantum evolution to the subspace equal to the span of the computational basis states of Hamming weight M. This leads to an in-constraint probability of one and a high approximation ratio, if no noise is present. At the same time, XY-QAOA requires deeper circuits as compared to QAOA and L-VQE. This is due to the higher gate count cost of the XY-QAOA mixer operator and the initial state preparation. The two-qubit gate counts and two-qubit gate depths of the executed circuits are given in Table 1.

The unconstrained QAOA uses a mixer unitary which is a product of single-qubit rotations, i.e., \(U_M(\beta _j) = \prod _{k=1}^N e^{-i\beta _j \textsc {x}_k}\), where N is the number of qubits. For most near-term circuit architectures, including H1-1, the single-qubit gates are relatively less noisy than the two-qubit entangling gates46,47. As a result, the mixer unitary does not add much noise to the evolution. Each time-step of QAOA, therefore, has at most \(N(N-1)/2\) entangling gates, all of which come from the pairwise interactions in the problem Hamiltonian. On the other hand, XY-QAOA uses a more complex mixer operator, which preserves the Hamming weight of the states it acts on. This operator is defined as: \(U_{M}^{XY}(\beta _j) = \prod _{k=1}^N e^{-i\frac{\beta _j}{2}(\textsc {x}_k\textsc {x}_{k+1}+\textsc {y}_k\textsc {y}_{k+1})}\). It adds O(N) entangling gates to each layer of the QAOA circuit, since it requires entangling gates on all adjacent pairs.

The implementation of XY-QAOA requires preparing an initial state, which is a uniform superposition of all in-constraint states (Dicke state). This state is non-trivial to implement, as it requires a circuit with O(N) two-qubit gate depth41. This is in sharp contrast with the initial state used by QAOA or L-VQE, which requires no two-qubit gates to prepare. Therefore in XY-QAOA a significant cost is incurred before any optimization is performed. Figure 2 shows this by plotting the probabilities of sampling bitstrings with different Hamming weights from the output of the initial state preparation circuit and the full XY-QAOA circuit on the H1-1 device. Unlike the noiseless case, the initial state is not fully contained in the in-constraint subspace. At the same time, the noise is sufficiently low so that the XY-QAOA output distribution is still concentrated on the in-constraint subspace. Improved hardware fidelity would lead to more accurate initial state preparation and higher overall in-constraint probability.

Finally, we examine the impact of noise on the in-constraint probability of the output of all three algorithms. As discussed above, for our problem of text summarization, it is crucial that the Hamming weight of the output solution is constrained. Both the algorithm design and the hardware noise affect the in-constraint probability. In Fig. 3, we visualize this behavior for 20-qubit instances, with the analogous figure for 14 qubits given in Appendix 2. Concretely, we plot the probability of obtaining a bitstring \(\varvec{x}\) with Hamming distance k to in-constraint bitstrings for each \(k \in \{0, \dots , N-M\}\). Hamming distance k means that at least k bitflips are required to transform the bitstring into an in-constraint one. We can see that as the noise is added, the in-constraint probability decreases and the output begins to include out-of-constraint bitstrings. Note that for the short L-VQE circuits, the amount of noise accumulated during the circuit execution is small, and only the bitstrings that are one or two bitflips away are included in the output. On the other hand, for deeper XY-QAOA circuits, bitstrings as far as 10 bitflips away are included. For QAOA, even the noiseless output includes primarily out-of-constraint bitstrings due to the choice of parameters.
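The distribution of Hamming-weight distances can be computed from measured bitstrings as sketched below, together with the uniform-random baseline. This is an illustrative sketch that follows the distance definition \(|\text{wt}(x)-M|\) used in the caption of Fig. 3; the function names are assumptions, and the samples are assumed to be rows of 0/1 values.

```python
import numpy as np
from math import comb

def distance_distribution(samples, m):
    """Empirical probability of each Hamming-weight distance |wt(x) - m|."""
    samples = np.asarray(samples)
    n = samples.shape[1]
    dist = np.abs(samples.sum(axis=1) - m)
    # The largest possible distance is max(m, n - m).
    return np.bincount(dist, minlength=max(m, n - m) + 1) / len(samples)

def uniform_distance_distribution(n, m):
    """Same distribution for bitstrings drawn uniformly from {0,1}^n."""
    probs = np.zeros(max(m, n - m) + 1)
    for w in range(n + 1):
        probs[abs(w - m)] += comb(n, w) / 2 ** n
    return probs
```

The first entry of `distance_distribution` is the estimated in-constraint probability.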

Figure 4

Pareto frontier of QAOA parameters with respect to approximation ratio and the in-constraint probability for 14 (a) and 20 (b) qubits. Each color plots a different problem instance for a given number of qubits. When solving a constrained-optimization problem using QAOA, the parameter optimization strategy has to be tuned to arrive at the desired point on the Pareto frontier. Directly optimizing the objective (6) would lead to choosing the rightmost point of the frontier, which in the case of the 20-qubit instance presented in (c) corresponds to approximation ratio equal to that of random guess. In (c), each dot corresponds to one value of \(\mathbf {\beta }, \mathbf {\gamma }\) from a grid search in parameter space. Thick dashed line shows the Pareto frontier and the thin dotted line marks the approximation ratio of a random solution. The red \(\times\) marker indicates the parameters chosen for the experiments shown above.

Table 1 Two-qubit gate depths (2Q depth) and two-qubit gate counts (2Q count) of the circuits executed on emulator and on the H1-1 device.

Quantum circuit needs to preserve constraints

As discussed above, the output distribution of QAOA circuits has a relatively small overlap with the in-constraint space even in the absence of noise. This is due to the trade-off between the in-constraint probability and the approximation ratio, which is implicit in the choice of the parameter optimization strategy. This trade-off is an important weakness of unconstrained quantum algorithms applied to constrained problems, necessitating the development and implementation of quantum algorithms that natively preserve constraints. In the QAOA experiments presented in Fig. 1, we choose the parameters with the goal of increasing the approximation ratio of in-constraint solutions at the expense of reduced in-constraint probability. We now examine this trade-off numerically.

We perform a grid search over the parameter space \(\mathbf {\beta }, \mathbf {\gamma }\) of a single-layer QAOA. In Fig. 4 we plot the Pareto frontier with respect to approximation ratio and in-constraint probability. Specifically, we plot the approximation ratios and in-constraint probabilities for QAOA with parameter values \(\mathbf {\beta }, \mathbf {\gamma }\) such that there do not exist parameters \(\hat{\mathbf {\beta }}, \hat{\mathbf {\gamma }}\) that improve either of the metrics without decreasing the other. We plot such frontiers for all 20 instances considered.
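Given the two metrics evaluated at every grid point, the frontier can be extracted with a simple dominance check. The sketch below is a generic quadratic-time filter, not necessarily the procedure used to produce Fig. 4; it assumes each grid point is summarized as a pair (approximation ratio, in-constraint probability), and the function name is illustrative.

```python
import numpy as np

def pareto_frontier(points):
    """Indices of points not dominated by any other point.

    A point is dominated if another point is at least as good in both
    metrics and strictly better in at least one of them.
    """
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        at_least_as_good = np.all(pts >= p, axis=1)
        strictly_better = np.any(pts > p, axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep
```

Picking the parameters for hardware execution then amounts to choosing one of the returned indices according to how the two metrics should be weighed.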

This trade-off behavior is not specific to QAOA and indeed applies to all unconstrained quantum algorithms that are not expressive enough to solve the problem exactly. Examining the Pareto frontiers makes clear the challenge of choosing parameters for such algorithms. Whereas in this work we perform the full grid search and may therefore choose any point on the frontier for execution on hardware, in practice an objective function must be carefully designed to optimally trade off the two objectives. This is hard in general. As an example, optimizing parameters with respect to (6) would lead to prioritizing the in-constraint probability. Figure 4c shows an example of an instance where this leads to approximation ratio equal to that of random guess.

One potential solution to this challenge is using a circuit that is sufficiently expressive to solve the problem exactly, such as the circuit used in L-VQE. However, such circuits by necessity would have many parameters that are hard to optimize. Additionally, in many cases they would suffer from gradients of the objective vanishing exponentially with the number of qubits (“barren plateaus”)45, making optimization impossible. Therefore the most practical solution is encoding constraints directly into the quantum circuit, as is the case for XY-QAOA.

Table 2 Comparison of the hardware demonstrations shown in this work with previous hardware demonstrations of QAOA on gate-based devices.

Related work

In this work, we show the largest demonstration to date of constrained optimization on a gate-based quantum computer. We now briefly review the state-of-the-art, which we summarize in Table 2. We include the unconstrained-optimization demonstrations for completeness, and emphasize that the circuits used in this work are deeper than any quantum optimization circuits executed previously for any problem.

There have been various quantum hardware demonstrations of QAOA applied to unconstrained-optimization problems. Using Google’s “Sycamore” superconducting processor, Harrigan et al. 13 ran QAOA to find the ground states of Ising models that mapped to the 23-qubit Sycamore topology and Sherrington–Kirkpatrick models with 17 vertices. Additionally, they solved MaxCut on 3-regular graphs with up to 22 vertices. Otterbach et al. solved MaxCut on a 19-vertex graph that obeys the hardware topology of Rigetti’s 19-qubit “Acorn” superconducting processor48. Lacroix et al. 49 executed QAOA, up to \(p=6\), on a superconducting gate-based quantum computer to solve the exact-cover problem on at most seven vertices. The device supported two-qubit controlled arbitrary-phase gates. This enhanced gate set resulted in a two-qubit-gate depth of 42. There have been other hardware demonstrations of solving unconstrained-optimization problems, with either lower qubit counts or shallower circuits14,51,52,53,54,55,56,57,58,59.

An important step in implementing XY-QAOA is the preparation of Dicke states. Aktar et al. 43, who developed the divide-and-conquer Dicke-state preparation approach used in our experiments, implemented their algorithm using up to six qubits on two IBM Q devices.
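For reference, a Dicke state on n qubits with Hamming weight k is the equal superposition of all n-bit strings containing exactly k ones. The sketch below builds its amplitude vector classically. It only illustrates the target state and is not the divide-and-conquer preparation circuit. The function name `dicke_state` is an assumption introduced here.

```python
import math
from itertools import combinations

def dicke_state(n, k):
    """Amplitude vector of the n-qubit Dicke state with Hamming weight k."""
    amp = 1.0 / math.sqrt(math.comb(n, k))
    state = [0.0] * (2 ** n)
    for ones in combinations(range(n), k):
        # Basis index whose set bits are exactly the chosen qubit positions.
        index = sum(1 << q for q in ones)
        state[index] = amp
    return state

# Small illustrative case: three qubits, one excitation.
d31 = dicke_state(3, 1)
support = [b for b, a in enumerate(d31) if a != 0.0]  # bitstrings 001, 010, 100
```

Starting XY-QAOA from such a state places the initial amplitude uniformly over all in-constraint bitstrings.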

To the authors’ knowledge, there has only been one other demonstration of QAOA with constrained mixers and Dicke-state initialization on quantum hardware. This was done by Baker and Radha50, who solved problems using at most five qubits and \(p=5\). They executed circuits using both the XY complete-graph mixer with Dicke-state initialization and the XY ring mixer with initialization to a random in-constraint state. They utilized the Rigetti “Aspen-10” superconducting processor, five IBM Q superconducting processors, and the 11-qubit IonQ trapped-ion device. They reported that, on the Rigetti device, QAOA with both the XY complete-graph mixer and the XY ring mixer beat random guess for up to three qubits and \(p=5\). Lastly, their results on IonQ with the XY ring mixer beat random guess for up to three qubits and \(p=4\).

While there have been other hardware demonstrations of QAOA using alternative mixers to the transverse field, they were either not applied to hard-constrained problems or did not use an in-constraint initial state. For example, Golden et al. 60 solved six Ising problems on 10 IBM Q backends using QAOA with Grover mixers61. While Grover mixers can be used to incorporate hard constraints, the problems Golden et al. solved were unconstrained. Pelofske et al. 62 solved five Ising problems, still unconstrained, with QAOA using Grover mixers on seven IBM Q backends, Rigetti’s “Aspen-9” device, and the 11-qubit IonQ device. The instances required fewer than seven qubits. For a protein folding problem, Fingerhuth et al. 63 executed QAOA with an XY mixer using four qubits and \(p=1\) on Rigetti’s Acorn device. However, the initial state was a uniform superposition over all bitstrings, and thus the Hamming weight constraint was not obeyed.
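To make the last point concrete: in a uniform superposition over all n-bit strings, the probability of measuring a bitstring with Hamming weight exactly k is \(\binom{n}{k}/2^n\), so most of the initial amplitude lies outside the feasible subspace. The sketch below evaluates this ratio. The values of n and k are illustrative assumptions and are not taken from the cited works.

```python
import math

def in_constraint_probability(n, k):
    """Probability that a uniformly random n-bit string has exactly k ones."""
    return math.comb(n, k) / 2 ** n

# Hypothetical sizes chosen only for illustration.
p4 = in_constraint_probability(4, 2)     # 6 of 16 bitstrings, i.e. 0.375
p20 = in_constraint_probability(20, 10)  # 184756 of 1048576 bitstrings
```

Because an XY mixer preserves Hamming weight, this out-of-constraint amplitude is never moved into the feasible subspace during the evolution.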

Lastly, there have been very large-scale demonstrations of QAOA on analog quantum simulators. First, Pagano et al. 64 demonstrated QAOA on a trapped-ion analog quantum simulator using up to 40 qubits at \(p=1\) and 20 qubits at \(p=2\) on unconstrained problems. Second, Ebadi et al. 15 applied an analog quantum simulator, using QAOA, to maximum independent set (MIS) problems on graphs with up to 179 vertices. This variant of QAOA was performed by controlling the timing of global laser pulses applied to a 2D array of 289 cold atoms. The global pulses induce Rydberg excitations, which result in a blockade effect that ensures that only independent sets are sampled. However, since these analog devices do not implement universal gate sets, the results are not directly comparable to ours.

Discussion

In this work, we present the largest demonstration to date of constrained optimization on quantum hardware. Our results demonstrate how algorithmic and hardware progress are bringing closer the prospect of quantum advantage in constrained optimization, an advantage that could be leveraged in many industries, including finance65,66.

In our experiments on the 20-qubit Quantinuum H1-1 system, we observe that XY-QAOA with up to 20 qubits provides results that are significantly better than random guess despite the circuit depth exceeding 100 two-qubit gates. This progress can be clearly observed by comparing the size and the complexity of the circuits used in our experiments to the previous results discussed in “Related work” above. The results we present here were obtained with no error mitigation, which is not the case for many of the previous demonstrations. Our execution of complex circuits for constrained optimization benefits from the underlying hardware’s all-to-all connectivity, as the circuit depth would increase significantly if the circuit had to be compiled to a nearest-neighbor architecture.

We additionally show the necessity of embedding the constraints directly into the quantum circuit being used. If the circuit does not preserve the constraints, the in-constraint probability and the quality of the in-constraint solution have to be traded off against each other. Striking this balance is hard in general. This observation further motivates our investigation of XY-QAOA on H1-1, and gives additional weight to our results. At the same time, we show that further advances are needed to reduce the hardware requirements of implementing such circuits and to improve the fidelities of the hardware.