Combinatorial optimization is widely considered to be one of the most promising problem domains for quantum algorithms. The ubiquity of hard optimization problems in science and industry amplifies the impact of any improvements in algorithmic performance. In practice, the optimization problems often have many constraints, such as the regulatory constraints when optimizing a portfolio or logistic constraints when optimizing flight crew assignments. Being able to incorporate a diverse range of constraints is an essential criterion for the applicability of a quantum algorithm to industrial problems.

A commonly considered class of quantum optimization algorithms uses a parameterized quantum evolution to drive the quantum system towards a state encoding the solution of the optimization problem. This class of algorithms includes the quantum approximate optimization algorithm (QAOA)1,2 and variational algorithms for optimization3,4. These algorithms are often discussed as promising approaches for noisy near-term devices5,6,7,8,9. In this paper, however, we primarily view these algorithms as targeting fault-tolerant quantum processors.

One of the main challenges in applying these quantum algorithms to commercially-relevant optimization problems is the need to enforce the constraints. Concretely, the goal is to prepare a quantum state such that upon measuring it, a high-quality solution that satisfies the constraints is obtained with high probability. Two commonly considered approaches are to encode the constraint into the objective using a penalty term and to directly restrict the parameterized quantum evolution to the in-constraint subspace. In the first approach, a penalty term is added to the objective so that optimizing the objective requires satisfying the constraint. While such approaches are flexible enough to satisfy most constraints, the quality of the result is sensitive to the choice of the penalty strength10. As tuning the penalty strength is difficult in general, this approach often leads to sub-optimal performance in practice11. This observation motivates the second approach, i.e., restricting the quantum evolution to the in-constraint subspace.

A number of techniques have been proposed to ensure that the parameterized quantum evolution respects the constraints of the problem. Hadfield et al.12,13 proposed the quantum alternating operator ansatz algorithm, which applies pairs of alternating operators to an in-constraint initial state. The first alternating operator (phase operator) is diagonal in the computational basis and encodes the objective, and the second operator (mixing operator or mixer) is non-diagonal and restricts the transitions of probability amplitudes to the computational basis states corresponding to the in-constraint solutions. The problem of constructing a Hamiltonian preserving arbitrary constraints is NP-complete even for linear constraints14, though explicit constructions are available for some combinatorial optimization problems12,13,15,16. In general, constraint-preserving mixers are difficult to implement, even when constructions are available17,18. The cost of implementing the algorithm on hardware can be reduced for a restricted class of problems by combining the phase and mixing operators19. If a uniform superposition of in-constraint states can be prepared efficiently, a Grover operator can be used as the mixer20,21,22. Finally, for problems with an indexable set of feasible states (such as those with Hamming-weight constraints), a continuous-time quantum walk in the solution space can be used as a mixer23,24,25. However, none of these techniques are sufficiently flexible to handle the general case of multiple arbitrary constraints directly. The parity optimization framework26,27,28,29,30 can natively handle polynomial equality constraints for QAOA-like circuits. However, this framework introduces an auxiliary qubit for every unique monomial term that appears, leading to large space overhead for complex objectives and constraints. 
All of the techniques mentioned above consider QAOA-like alternating operator circuits, and are not easy to generalize to other variational algorithms.

In this work, we introduce an approach for enforcing multiple arbitrary constraints in quantum optimization. We restrict the quantum evolution to the in-constraint subspace by repeated projective measurements. In each measurement, the value of the constraint is computed onto an auxiliary register, which is then measured. Our technique uses quantum Zeno dynamics, wherein the evolution of the system is restricted to the subspace defined by the repeated projective measurements and transitions outside of this subspace are suppressed. Our approach is applicable to any problem in NPO (the NP optimization complexity class), as the only restriction we impose on the constraints is the existence of an efficient oracle for testing them. We provide explicit constructions for arbitrary combinatorial constraints. We demonstrate the effectiveness of the proposed technique by using it to enforce constraints in QAOA with various, unconstrained, mixing operators and the layer variational quantum eigensolver (L-VQE)31, which is a variational quantum algorithm for optimization. We show analytically that our technique is guaranteed to obtain the optimal in-constraint solution when applied to the digital simulation of the quantum adiabatic algorithm, or equivalently to QAOA in the constrained subspace with sufficiently large depth. We derive an analytical form of the scaling of the number of measurements required to maintain a constant minimum success probability for any parameterized quantum evolution. Furthermore, we provide numerical evidence that our technique, applied to QAOA for the portfolio optimization problem with a budget constraint, provides significant performance improvements over the state-of-the-art method of enforcing the constraint by introducing a penalty term. While the results we derive are for fault-tolerant quantum processors, high-fidelity near-term devices may be able to implement the algorithms without realizing full error-correction. 
To demonstrate an end-to-end realization of our technique, we implement QAOA with Zeno dynamics on the Quantinuum H1-2 trapped-ion quantum processor for proof-of-concept portfolio optimization problems. These experiments complement our numerical simulations by using explicit constructions and compilations of circuits, including those for checking the constraints. In the hardware experiments, we observe performance improvements from increasing the number of measurements, up to a two-qubit circuit depth of 148.


Quantum Zeno dynamics for constrained optimization

We now introduce our approach to enforcing constraints in quantum optimization by repeated non-selective projective measurements. Our method is general, though here we focus on algorithms utilizing parameterized states of the form

$$\left\vert \psi ({{{{{{{\boldsymbol{\theta }}}}}}}})\right\rangle =U({{{{{{{\boldsymbol{\theta }}}}}}}})\left\vert s\right\rangle =\mathop{\prod }\limits_{j=1}^{m}{e}^{-i{\theta }_{j}{H}_{j}}\left\vert s\right\rangle ,$$

where Hj is some Hamiltonian, e.g., a tensor product of single-qubit Pauli operators, and \(\left\vert s\right\rangle\) is the initial state, which lies in the system Hilbert space \({{{{{{{\mathcal{H}}}}}}}}\).

A constrained combinatorial optimization problem has a set of feasible states \(\mathcal{F}\), which is a subset of the n-dimensional Boolean cube \({\mathbb{B}}^{n}\). Let \({P}_{\mathcal{F}}\) denote the orthogonal projector onto the subspace spanned by computational basis states corresponding to feasible solutions in \(\mathcal{F}\). We discuss the construction of this operator in the Methods Section. The measurement \(\mathcal{P}\) is a super-operator defined as

$${{{{{{{\mathcal{P}}}}}}}}\rho =\mathop{\sum }\limits_{j=1}^{k}{P}_{j}\rho {P}_{j},$$

where \(\mathop{\sum }\nolimits_{j = 1}^{k}{P}_{j}={{{{{{{\rm{I}}}}}}}}\), and Pj is a projection onto some subspace \({{{{{{{{\mathcal{H}}}}}}}}}_{j}={P}_{j}{{{{{{{\mathcal{H}}}}}}}}\) of dimensionality \({{{{{{{\rm{Tr}}}}}}}}({P}_{j})\ge 1\). Without loss of generality, we can assume \({P}_{1}={P}_{{{{{{{{\mathcal{F}}}}}}}}}\), and define \({P}_{{{{{{{{\mathcal{G}}}}}}}}}:= {{{{{{{\rm{I}}}}}}}}-{P}_{{{{{{{{\mathcal{F}}}}}}}}}=\mathop{\sum }\nolimits_{j = 2}^{k}{P}_{j}\).
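As an illustration of the measurement in Equation (2), the following minimal sketch builds the diagonal projector onto a feasible set and applies the non-selective measurement to a density matrix. The helper names `feasibility_projector` and `measure`, and the Hamming-weight-one constraint on two qubits, are hypothetical choices made only for this example; the construction actually used in this work is described in the Methods Section.

```python
import itertools
import numpy as np

def feasibility_projector(n, is_feasible):
    """Diagonal projector onto basis states |x> for which is_feasible(x) is True."""
    diag = [1.0 if is_feasible(bits) else 0.0
            for bits in itertools.product((0, 1), repeat=n)]
    return np.diag(diag)

def measure(rho, projectors):
    """Non-selective projective measurement: rho -> sum_j P_j rho P_j."""
    return sum(P @ rho @ P for P in projectors)

n = 2
# Hypothetical constraint for illustration: exactly one bit is set.
P_F = feasibility_projector(n, lambda bits: sum(bits) == 1)
P_G = np.eye(2 ** n) - P_F

plus = np.full(2 ** n, 0.5)            # uniform superposition on two qubits
rho = np.outer(plus, plus.conj())
rho_after = measure(rho, [P_F, P_G])

# Probability of finding the state in the feasible subspace.
p_feasible = float(np.trace(P_F @ rho_after).real)
print(p_feasible)
```

The measurement preserves the trace and removes all coherences between the feasible and infeasible subspaces, while coherences within each subspace are kept.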

We give our main result in Theorem 1, which we use to derive the number of measurements required to enforce constraints in parameterized evolutions of the form given by Equation (1).

Theorem 1

Let \({{{{{{{\mathcal{P}}}}}}}}\) be the measurement defined in Equation (2). Suppose a system is evolved from some initial state ρ0 = Pjρ0Pj under the action of a Hamiltonian H, whose distinct eigenvalues are \({\xi }_{\min }={\xi }_{1} \, < \, {\xi }_{2} \, < \cdots < \, {\xi }_{d}={\xi }_{\max }\), for time θ. For δ≤0.19, if N applications of \({{{{{{{\mathcal{P}}}}}}}}\) are performed at equally-spaced time intervals with

$$N=\left\lceil\frac{{\left[\theta ({\xi }_{\max }-{\xi }_{\min })\right]}^{2}}{\ln {\left(1-2\delta \right)}^{-2}}\right\rceil,$$

then the probability of measuring a state in \({{{{{{{{\mathcal{H}}}}}}}}}_{j}\) at time θ is lower bounded by 1 − δ, i.e.,

$${\rm{Tr}}\left[{P}_{j}\rho (\theta )\right]\ge 1-\delta ,$$

where

$$\rho (\theta )=\mathcal{U}(\theta ){\rho }_{0}\mathcal{U}{(\theta )}^{\dagger },\quad \mathcal{U}(\theta )={[\mathcal{P}{e}^{-iH\theta /N}]}^{N}.$$


For the proof, see the Methods Section.

Remark 1

Note that since \(2{\parallel } H{\parallel }_{2}\ge | {\xi }_{\max }-{\xi }_{\min }|\), the bound can be reformulated in terms of the spectral norm of the Hamiltonian. This may be useful as the spectral norm may be easier to bound in practice for complicated Hamiltonians.
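The measurement count of Equation (3) is straightforward to evaluate. The sketch below is a direct transcription of that formula; the function name `zeno_measurements` is a hypothetical label introduced here, and the spectral-norm variant simply uses the bound from Remark 1.

```python
import math

def zeno_measurements(theta, xi_min, xi_max, delta):
    """Number of measurements N from Theorem 1 for evolution time theta."""
    if not 0 < delta <= 0.19:
        raise ValueError("Theorem 1 requires 0 < delta <= 0.19")
    spread = xi_max - xi_min
    return math.ceil((theta * spread) ** 2 / math.log((1 - 2 * delta) ** -2))

def zeno_measurements_from_norm(theta, spectral_norm, delta):
    """Same bound with the spectral range replaced by 2 * ||H||_2 (Remark 1)."""
    return zeno_measurements(theta, -spectral_norm, spectral_norm, delta)

print(zeno_measurements(1.0, -1.0, 1.0, 0.1))
```

Because the spectral range is at most twice the spectral norm, the norm-based count is never smaller than the count obtained from the exact extreme eigenvalues.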

Assume that the initial state \(\left\vert s\right\rangle\) respects the constraints, that is \({P}_{{{{{{{{\mathcal{F}}}}}}}}}\left\vert s\right\rangle =\left\vert s\right\rangle\). We apply a parameterized unitary U(θ) to the initial state following Equation (1). To enforce the constraints, we can insert measurements into the parameterized evolution as follows:

$${{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})=\mathop{\prod }\limits_{k=1}^{L}{\left[{{{{{{{\mathcal{P}}}}}}}}\mathop{\prod }\limits_{j = 1}^{{m}_{k}}{e}^{-i({\theta }_{r(k,j)}/{N}_{k}){H}_{r(k,j)}}\right]}^{{N}_{k}},$$

where \(r(k,j)=\mathop{\sum }\nolimits_{t = 1}^{k-1}{m}_{t}+j\) and each sequence of mk parameterized evolutions, without a measurement, is called a block. We define Nk = 0 to mean that no measurement is performed and θr(k, j) is not scaled for that block. The following corollary provides a sufficient Nk for each block to ensure a desired minimum in-constraint probability. The asymptotic dynamics, i.e., when Nk → ∞ for all k (also called the Zeno limit), will differ depending on how the blocks are chosen.

Corollary 1

Let \(\mathcal{P}\) be the measurement defined in Equation (2). Let the parameterized evolution defined in Equation (6) evolve the system from some initial state ρ0 = Pjρ0Pj. Then, in order to ensure that

$${{{{{{{\rm{Tr}}}}}}}}[{P}_{j}{{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}}){\rho }_{0}{{{{{{{{\mathcal{U}}}}}}}}}_{Z}{({{{{{{{\boldsymbol{\theta }}}}}}}})}^{{{{\dagger}}} }]\ge 1-\delta ,$$

it suffices to choose

$${N}_{k}=\left\lceil\frac{4L{[\mathop{\sum }\nolimits_{j = 1}^{{m}_{k}}| {\theta }_{r(k,j)}| ]}^{2}{\max }_{j}{\parallel } {H}_{r(k,j)}{\parallel }_{2}^{2}}{\tau (\delta )}\right\rceil,$$

where

  • \(\tau (\delta )=\ln {(1-2\delta )}^{-2}\) if Hr(k, j) pairwise commute,

  • \(\tau (\delta )=\ln {\left(1-\delta \right)}^{-1.78}\) otherwise,

and δ≤0.19. In addition, the asymptotic dynamics is

$$\mathop{\prod }\limits_{k=1}^{L}{e}^{-i{{{{{{{\mathcal{P}}}}}}}}{{{{{{{{\boldsymbol{H}}}}}}}}}_{k}\cdot {{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}}{{{{{{{\mathcal{P}}}}}}}},$$

where \({{{{{{{\mathcal{P}}}}}}}}\) acts element-wise on the vector \({{{{{{{{\boldsymbol{H}}}}}}}}}_{k}={({H}_{(k,1)},\ldots ,{H}_{(k,{m}_{k})})}^{{\mathsf{T}}}\) and \({{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}=({\theta }_{(k,1)},\ldots ,{\theta }_{(k,{m}_{k})})\).


For the proof, see the Methods Section.
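The per-block count of Corollary 1 can be evaluated in the same way. The following sketch transcribes the formula for Nk together with the two cases for τ(δ); the function names and the argument layout (a list of angles and a list of spectral norms for one block) are assumptions made for illustration.

```python
import math

def tau(delta, commuting):
    """tau(delta) from Corollary 1 for commuting or non-commuting generators."""
    if not 0 < delta <= 0.19:
        raise ValueError("Corollary 1 requires 0 < delta <= 0.19")
    if commuting:
        return math.log((1 - 2 * delta) ** -2)
    return math.log((1 - delta) ** -1.78)

def block_measurements(thetas, norms, num_blocks, delta, commuting=False):
    """Sufficient N_k for one block with angles `thetas` and spectral norms `norms`."""
    theta_sum = sum(abs(t) for t in thetas)
    numerator = 4 * num_blocks * theta_sum ** 2 * max(norms) ** 2
    return math.ceil(numerator / tau(delta, commuting))

# One block containing a single generator with unit spectral norm.
print(block_measurements([0.5], [1.0], num_blocks=1, delta=0.1, commuting=True))
print(block_measurements([0.5], [1.0], num_blocks=1, delta=0.1, commuting=False))
```

For the same δ, the non-commuting case has a smaller τ(δ) and therefore requires at least as many measurements as the commuting case.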

Remark 2

For combinatorial optimization problems, constraint-preserving measurements that correspond to different constraints always commute. Thus \({{{{{{{{\mathcal{P}}}}}}}}}_{{{{{{{{\mathcal{F}}}}}}}}}\) can be implemented as a composition of measurements corresponding to different constraints.

While the previous results indicate that Nk can grow inverse polynomially with the desired error probability, the following result (Corollary 2) shows that fixing δ and applying a simple repetition scheme suffices to suppress the failure probability arbitrarily below δ with only logarithmic overhead. Thus, the overall procedure can be made efficient. The purpose of the Zeno framework is to ensure that we can obtain a state that has an overlap with \({{{{{{{{\mathcal{H}}}}}}}}}_{j}\) that is lower bounded by a constant and prepare this state with an overhead that is \(O({{{{{{{\rm{polylog}}}}}}}}(\dim {{{{{{{\mathcal{H}}}}}}}}))\).

Corollary 2

Let \(\mathcal{P}\) be the measurement defined in Equation (2). Let the parameterized evolution defined in Equation (6) evolve the system from some initial state ρ0 = Pjρ0Pj. In addition, suppose that the number of measurements Nk was chosen, using Corollary 1, to ensure that \({\rm{Tr}}[{P}_{j}{\rho }_{Z}({\boldsymbol{\theta }})]={\rm{Tr}}[{P}_{j}{\mathcal{U}}_{Z}({\boldsymbol{\theta }}){\rho }_{0}{\mathcal{U}}_{Z}{({\boldsymbol{\theta }})}^{\dagger }]\) is lower bounded by a constant independent of the system size. Then, in order to ensure that \(\mathcal{P}\) applied to ρZ(θ) prepares a state in \({\mathcal{H}}_{j}\) with probability at least 1 − ϵ, it suffices to prepare and measure at most \(\log (1/\epsilon )\) copies of ρZ(θ).


Suppose \({{{{{{{\rm{Tr}}}}}}}}[{P}_{j}{\rho }_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})]=c\). Since we can efficiently check whether the post-measurement state obtained from applying \({{{{{{{\mathcal{P}}}}}}}}\) to ρZ(θ) is in \({{{{{{{{\mathcal{H}}}}}}}}}_{j}\), \(\log (1/\epsilon )/\log (1/(1-c)) < \log (1/\epsilon )\) repetitions suffice to ensure that the outcome of at least one of the repetitions is in \({{{{{{{{\mathcal{H}}}}}}}}}_{j}\) with probability at least 1 − ϵ.
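The repetition count in this argument can be checked numerically. The sketch below, with the hypothetical helper name `repetitions`, returns the smallest number of independent preparations for which the probability that every outcome is out-of-constraint drops to at most ϵ, given a per-shot in-constraint probability c.

```python
import math

def repetitions(c, eps):
    """Smallest number of shots k such that (1 - c)**k <= eps."""
    if not (0 < c < 1 and 0 < eps < 1):
        raise ValueError("c and eps must lie strictly between 0 and 1")
    return math.ceil(math.log(1 / eps) / math.log(1 / (1 - c)))

# With delta = 0.19, each shot succeeds with probability at least c = 0.81.
k = repetitions(0.81, 1e-6)
print(k, (1 - 0.81) ** k)
```

With c = 0.81, nine repetitions already push the failure probability below one in a million, which illustrates the logarithmic overhead.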

These results imply that for most practical cases, e.g. when Hj are Pauli operators as in the cases of QAOA and hardware-efficient parameterized circuits, the number of measurements scales at most quadratically in the circuit depth and width, i.e., as \(O({{{{{{{\rm{polylog}}}}}}}}(\dim {{{{{{{\mathcal{H}}}}}}}}))\). Thus, QZD can be used to efficiently constrain parameterized evolution for quantum optimization.

Constrained QAOA via Zeno dynamics

We now discuss the application of QZD to QAOA. In a QAOA circuit, the phase operator UC(γ) is diagonal in the computational basis and cannot violate constraints. More specifically, it evolves the current state, for time γ, under the diagonal operator \(C={\sum }_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{\mathbb{B}}}^{n}}f({{{{{{{\boldsymbol{x}}}}}}}})\left\vert {{{{{{{\boldsymbol{x}}}}}}}}\right\rangle \left\langle {{{{{{{\boldsymbol{x}}}}}}}}\right\vert\), which encodes the values of the objective function f on \({{\mathbb{B}}}^{n}\). The Hermitian mixing operator B transitions probability amplitude between elements of \({{\mathbb{B}}}^{n}\) and, in general, does not respect the problem constraints. Therefore the measurements only need to be added to the mixing operator. Since a p-layer QAOA circuit consists of p applications of the phase and mixing operators in an alternating fashion, the full circuit combined with the Zeno framework then becomes

$${{{{{{{{\mathcal{U}}}}}}}}}_{Z-{{{{{{{\rm{QAOA}}}}}}}}}({{{{{{{\boldsymbol{\beta }}}}}}}},{{{{{{{\boldsymbol{\gamma }}}}}}}})=\mathop{\prod }\limits_{j=1}^{p}\left[{{{{{{{{\mathcal{U}}}}}}}}}_{B}({\beta }_{j},{N}_{j}){U}_{C}({\gamma }_{j})\right],$$


$${{{{{{{{\mathcal{U}}}}}}}}}_{B}({\beta }_{j},{N}_{j})={\left[{{{{{{{\mathcal{P}}}}}}}}{e}^{-i\frac{{\beta }_{j}}{{N}_{j}}B}\right]}^{{N}_{j}}.$$

In the notation of Equation (6), this corresponds to setting all mk = 1, and setting Nk = 0 for blocks containing the cost operator. While there are other valid choices for the blocks, the decomposition we have chosen is sufficient to achieve an efficient scheme.
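To make the action of the Zeno mixer concrete, the following density-matrix simulation applies a mixer split into N slices with a non-selective measurement after each slice. It is a minimal sketch under stated assumptions, not the circuit construction used in this work: the two-qubit mixer, the feasible set {|01⟩, |10⟩}, the angle β = 0.6, and all function names are hypothetical choices for illustration.

```python
import numpy as np

def evolve(H, t):
    """Return exp(-i t H) for a Hermitian matrix H via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

def zeno_mixer(rho, B, beta, N, P_F):
    """Apply [P exp(-i beta B / N)]^N, with P the measurement {P_F, I - P_F}."""
    P_G = np.eye(B.shape[0]) - P_F
    U = evolve(B, beta / N)
    for _ in range(N):
        rho = U @ rho @ U.conj().T                  # one slice of the mixer
        rho = P_F @ rho @ P_F + P_G @ rho @ P_G     # non-selective measurement
    return rho

def in_constraint_probability(B, beta, N, P_F, rho0):
    return float(np.trace(P_F @ zeno_mixer(rho0, B, beta, N, P_F)).real)

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
B = np.kron(X, I2) + np.kron(I2, X)                 # transverse-field mixer on 2 qubits
P_F = np.diag([0.0, 1.0, 1.0, 0.0])                 # feasible states |01> and |10>
rho0 = np.zeros((4, 4), dtype=complex)
rho0[1, 1] = 1.0                                    # in-constraint initial state |01>

beta = 0.6
probs = {N: in_constraint_probability(B, beta, N, P_F, rho0) for N in (1, 13, 64)}
for N, p in probs.items():
    print(N, round(p, 4))
```

For these values, Theorem 1 with δ = 0.1 and eigenvalue range 4 gives N = 13, so the in-constraint probability at N = 13 is expected to be at least 0.9, and it increases further as N grows.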

As the mixing operator B is known, we can explicitly derive the number of measurements required to maintain a constant success probability. We observe that for any mixer this number of measurements grows linearly with the number of QAOA layers, and for commonly considered mixers, the number of measurements grows no more than quadratically with the number of qubits.

Corollary 3

Let \({\mathcal{U}}_{Z-{\rm{QAOA}}}({\boldsymbol{\beta }},{\boldsymbol{\gamma }})\) denote the QAOA circuit on n qubits with Nj measurements added to the j-th mixing operator, as defined in Equation (9). Let the initial state \({\rho }_{0}=\left\vert s\right\rangle \left\langle s\right\vert\) be in-constraint. Then Nj measurements suffice to maintain at least a 1 − δ probability of obtaining an in-constraint measurement outcome, where

  • if \(B=\mathop{\sum }\nolimits_{k = 1}^{n}{{{{{{{{\rm{x}}}}}}}}}_{k}\), then \({N}_{j}=\left\lceil\frac{p{\beta }_{j}^{2}{n}^{2}}{\ln {\left[1-2\delta \right]}^{-\frac{1}{2}}}\right\rceil\)

  • if \(B=\left\vert +\right\rangle \left\langle +\right\vert\), then \({N}_{j}=\left\lceil\frac{p{\beta }_{j}^{2}}{\ln {\left[1-2\delta \right]}^{-2}}\right\rceil\),

and δ ≤ 0.19.


The proof follows from Theorem 1 by noting that for \(B=\mathop{\sum }\nolimits_{k = 1}^{n}{\rm{x}}_{k}\) the minimum and maximum eigenvalues are −n and n, respectively, and for \(B=\left\vert +\right\rangle \left\langle +\right\vert\) the only eigenvalues are one and zero. For QAOA with p layers, the number of measurements increases by a factor of p. Note that while we could have instead used Corollary 1, using Theorem 1 directly results in Nk being lower by a constant factor for \(B=\left\vert +\right\rangle \left\langle +\right\vert\).

Note that the scaling rule of Corollary 3 implies that the number of measurements will change with βj and thus each mixer layer.
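The two scaling rules of Corollary 3 can be transcribed as follows. The function names are hypothetical labels; the formulas are those stated in the corollary for the transverse-field mixer B = ∑kxk and the complete-graph mixer B = |+⟩⟨+|.

```python
import math

def measurements_transverse(p, beta, n, delta):
    """N_j for the mixer B = sum_k x_k on n qubits in a p-layer QAOA circuit."""
    if not 0 < delta <= 0.19:
        raise ValueError("Corollary 3 requires 0 < delta <= 0.19")
    return math.ceil(p * beta ** 2 * n ** 2 / math.log((1 - 2 * delta) ** -0.5))

def measurements_complete(p, beta, delta):
    """N_j for the mixer B = |+><+| in a p-layer QAOA circuit."""
    if not 0 < delta <= 0.19:
        raise ValueError("Corollary 3 requires 0 < delta <= 0.19")
    return math.ceil(p * beta ** 2 / math.log((1 - 2 * delta) ** -2))

print(measurements_transverse(p=1, beta=0.6, n=2, delta=0.1))
print(measurements_complete(p=1, beta=0.6, delta=0.1))
```

For the same angle, the transverse-field mixer needs more measurements because its eigenvalue range grows with the number of qubits, whereas the range of the complete-graph mixer is fixed at one.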

Figure 1 visualizes how the number of measurements required to maintain a given minimum in-constraint probability, according to Corollary 3, grows with the evolution time β for the B = ∑jxj (✖ marker) and \(B=\left\vert +\right\rangle \left\langle +\right\vert\) (✚ marker) mixing operators for p = 1 QAOA with a 3-qubit initial state \(\left\vert s\right\rangle\). As the phase operator is diagonal, there is no dependency on it. We note that the number of measurements for the mixer B = ∑jxj grows with number of qubits and is therefore larger than for \(B=\left\vert +\right\rangle \left\langle +\right\vert\). Note that when following the scaling rules of Corollary 3, the number of measurements is multiplied by the number of QAOA layers p.

Fig. 1: Scaling of the number of Zeno measurements.

Number of measurements, obtained from Corollary 3, required in QAOA with Zeno dynamics to maintain a maximum out-of-constraint probability of \({\delta }_{\max }\) (hence, a minimum in-constraint probability of \(1-{\delta }_{\max }\)) for the B = ∑jxj (✖ marker) and \(B=\left\vert +\right\rangle \left\langle +\right\vert\) (✚ marker) mixers with 3 qubits. Color denotes the minimum in-constraint probability \(1-{\delta }_{\max }\), as indicated by the legend. Note that this is the scaling required to ensure the desired minimum in-constraint probability for the worst-case initial state (i.e., Equation (37)) and is potentially more pessimistic than what is observed in practice. Many more measurements are required for B = ∑jxj as the number of measurements grows quadratically with number of qubits. Note that due to periodicity, the evolution time, β, can be constrained to \(| \beta | \le \frac{\pi }{2}\) for B = ∑jxj and \(| \beta | \le \pi\) for \(B=\left\vert +\right\rangle \left\langle +\right\vert\).

In the Results Section, we observe that for realistic constraints, the number of measurements is significantly lower. This is because the worst-case \({P}_{{{{{{{{\mathcal{F}}}}}}}}}\) and \(\left\vert s\right\rangle\), i.e., from Equation (37) in the proof of Lemma 1, are far from those encountered in practice. Specifically, the worst-case \({P}_{{{{{{{{\mathcal{F}}}}}}}}}\) is rank one (i.e., only one state is in-constraint). A larger in-constraint subspace leads to a lower sufficient number of measurements. Moreover, in practice the initial state is unlikely to align perfectly with the worst case presented in Equation (37). We also observe in our experiments that the required number of measurements has only a weak dependence on the number of QAOA layers p for the problem instances considered. Therefore, one could consider a significantly relaxed and simplified version of the rules provided in Corollary 3 as follows:

$${N}_{j}=\left\lceil\frac{{\beta }_{j}^{2}}{\eta }\right\rceil,$$

where η is some hyperparameter to be fine tuned. One could always efficiently estimate the in-constraint probability of a QAOA circuit with a fixed η by measuring a single auxiliary qubit indicating whether the final state output by the circuit is in-constraint. In the portfolio optimization experiments, we successfully use an η for the B = ∑jxj mixer that is orders of magnitude larger than predicted by Corollary 3, requiring a correspondingly smaller number of measurements.
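The relaxed rule is simple enough to state in a few lines. In this sketch the function name `relaxed_measurements` and the example values of β and η are hypothetical; in practice η would be tuned as described above.

```python
import math

def relaxed_measurements(betas, eta):
    """Per-layer measurement counts N_j = ceil(beta_j**2 / eta)."""
    if eta <= 0:
        raise ValueError("eta must be positive")
    return [math.ceil(beta * beta / eta) for beta in betas]

print(relaxed_measurements([0.5, 0.6, 1.0], eta=0.25))
```

A larger η gives fewer measurements per layer, at the cost of a weaker guarantee on the in-constraint probability.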

QAOA with Zeno dynamics in the adiabatic limit

If the initial state \(\left\vert s\right\rangle\) is the ground state of the mixer Hamiltonian B, QAOA is known to be able to prepare the ground state of the cost Hamiltonian C and thereby solve the problem exactly in the limit of an infinite number of QAOA layers by approximating adiabatic evolution2. We now show that this limiting behavior is preserved for constrained QAOA with Zeno dynamics.

Now consider QAOA with constraints enforced by the measurement \(\mathcal{P}\) defined in Equation (2). In the Zeno limit, when the number of measurements is taken to infinity, the operator describing the asymptotic dynamics is a sum of the original mixer B projected onto the subspaces defined by the projectors constituting \(\mathcal{P}\), i.e.,

$${H}_{Z}={{{{{{{\mathcal{P}}}}}}}}B=\mathop{\sum }\limits_{j=1}^{k}{P}_{j}B{P}_{j}.$$

Concretely, consider the task of using QAOA to approximate the adiabatic evolution under the following time-dependent Hamiltonian:

$${H}_{s}(t)=\left(1-s(t)\right)B+s(t)C,$$

where s: [0, T] → [0, 1] is the interpolating schedule function. A common schedule function is the linear schedule defined by

$$s(t)=\frac{t}{T},$$

where T is the evolution time scale. Suppose \(T\gg O({(\mathop{\min }\nolimits_{s}{\Delta }_{n}(s))}^{-2})\), where Δn(s) is the instantaneous minimum difference between the n-th eigenvalue and any other eigenvalue of H(s). If Δn(s) ≠ 0 for all s, then the quantum adiabatic theorem32 implies:

$$\mathcal{T}\exp \left(i\int\nolimits_{0}^{T}{H}_{s}(t)dt\right)\left\vert {\phi }_{n}(0)\right\rangle =\left\vert {\phi }_{n}(T)\right\rangle .$$

In the Zeno case, we consider

$${H}_{s}(t)=\left(1-s(t)\right)\mathcal{P}B+s(t)\mathcal{P}C.$$

Consider the QAOA operator with only one measurement per layer, i.e., Nj = 1 for all j in (9):

$${{{{{{{\mathcal{U}}}}}}}}(p)=\mathop{\prod }\limits_{j=1}^{p}{{{{{{{\mathcal{P}}}}}}}}{{{{{{{{\mathcal{U}}}}}}}}}_{B}\big({\beta }_{j}\big){{{{{{{{\mathcal{U}}}}}}}}}_{C}\big({\gamma }_{j}\big).$$

Now it is easy to recover the parameters βj, γj giving the limit. From the definition of the product integral33 it follows that

$$ {{{{{{{\mathcal{T}}}}}}}}\exp \left(i\int\nolimits_{0}^{T}{H}_{s}(t)dt\right)\\ = \mathop{\lim }\limits_{p\to \infty }\mathop{\prod }\limits_{j=1}^{p}\exp \left(i\frac{T}{p}{H}_{s}\left(\frac{jT}{p}\right)\right)\\ = \mathop{\lim }\limits_{p\to \infty }\mathop{\prod }\limits_{j=1}^{p}\exp \left(i\frac{T}{p}\left[\left(1-\frac{j}{p}\right){{{{{{{\mathcal{P}}}}}}}}B+\left(\frac{j}{p}\right){{{{{{{\mathcal{P}}}}}}}}C\right]\right)\\ = \mathop{\lim }\limits_{p\to \infty }\mathop{\prod }\limits_{j=1}^{p}{{{{{{{\mathcal{P}}}}}}}}\exp \left(i\frac{T}{p}\left(1-\frac{j}{p}\right)B\right)\exp \left(i\frac{jT}{{p}^{2}}C\right),$$

where the third equality follows from expanding to first order in \(\frac{T}{p}\) and from the fact that \(\frac{j}{p}\) and \(1-\frac{j}{p}\) are bounded by 1. Also, since the evolution is in a finite-dimensional space, B and C have bounded operator norms.

Thus if \({\rho }_{n}(0)=\left\vert {\psi }_{n}(0)\right\rangle \left\langle {\psi }_{n}(0)\right\vert\) is an n-th eigenstate of HZ then

$${\rho }_{n}(T)=\mathop{\lim }\limits_{p\to \infty }{{{{{{{\mathcal{U}}}}}}}}(p){\rho }_{n}(0),$$

where ρn(T) is pure and is an n-th eigenstate of \(\mathcal{P}C\). Thus with \({\beta }_{j}=-\frac{T}{p}(1-\frac{j}{p})\) and \({\gamma }_{j}=-\frac{jT}{{p}^{2}}\) as p → ∞, QAOA with Zeno dynamics approaches the adiabatic limit and recovers the optimal solution.
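The angles stated above follow a linear ramp and can be generated directly. The sketch below uses the hypothetical function name `adiabatic_qaoa_angles` and returns the lists of βj and γj for a given time scale T and depth p.

```python
def adiabatic_qaoa_angles(T, p):
    """Return (betas, gammas) with beta_j = -(T/p)(1 - j/p) and gamma_j = -jT/p**2."""
    betas = [-(T / p) * (1 - j / p) for j in range(1, p + 1)]
    gammas = [-(j * T) / (p ** 2) for j in range(1, p + 1)]
    return betas, gammas

betas, gammas = adiabatic_qaoa_angles(T=4.0, p=4)
print(betas)
print(gammas)
```

In each layer the two angles sum to −T/p, so the mixer weight decreases linearly while the cost weight increases linearly over the p layers.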

Mitigating mixer limitations in the Zeno limit

While the evolution under \({P}_{{{{{{{{\mathcal{F}}}}}}}}}B{P}_{{{{{{{{\mathcal{F}}}}}}}}}\) is guaranteed to preserve the in-constraint subspace, it may inhibit transitions between states in \({{{{{{{\mathcal{F}}}}}}}}\) that were allowed with B. This is because states in \({{{{{{{\mathcal{F}}}}}}}}\) may be connected by B through a path that passes through states not in \({{{{{{{\mathcal{F}}}}}}}}\). To see this, consider a simple example of the two-qubit mixer B2 = x1 + x2 and the in-constraint space \({{{{{{{\mathcal{F}}}}}}}}=\{\left\vert 01\right\rangle ,\left\vert 10\right\rangle \}\). In the Zeno limit, the mixing operator evolution in the in-constraint subspace is generated by \({P}_{{{{{{{{\mathcal{F}}}}}}}}}{B}_{2}{P}_{{{{{{{{\mathcal{F}}}}}}}}}\), which equals the zero matrix. Thus, the propagator corresponding to the projected mixer becomes the identity operator and the dynamics become trivial. In general, if there is no path between two computational basis states \(\left\vert j\right\rangle ,\left\vert k\right\rangle \in {{{{{{{\mathcal{F}}}}}}}}\) in the graph defined by B, the continuous-time quantum walk defined by the mixing operator cannot move probability amplitude from \(\left\vert k\right\rangle\) to \(\left\vert j\right\rangle\). Whether the transitions between in-constraint states are suppressed in the Zeno limit is in general dependent on the in-constraint space \({{{{{{{\mathcal{F}}}}}}}}\).

One way to avoid the issue of suppressed transitions is by choosing a mixer B with a complete connectivity graph among computational basis states, i.e., \(B=\left\vert +\right\rangle \left\langle +\right\vert\). This mixer is also known as the complete-graph mixer20,34. It has been conjectured34 that mixers with high connectivity, such as \(B=\left\vert +\right\rangle \left\langle +\right\vert\), can at best produce a Grover-like speedup since they do not make use of the structure of the cost operator. While it is unclear if this conjecture is true, we emphasize that our approach can utilize any mixer and can efficiently enforce constraints as long as the difference between the maximum and minimum eigenvalues of the mixer is polynomial in the number of qubits.
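The two-qubit example can be verified numerically. The sketch below projects both the transverse-field mixer B2 and the complete-graph mixer onto the in-constraint subspace spanned by |01⟩ and |10⟩; the variable names are hypothetical and the check is purely illustrative.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

B2 = np.kron(X, I2) + np.kron(I2, X)       # B2 = x1 + x2
P_F = np.diag([0.0, 1.0, 1.0, 0.0])        # projector onto span{|01>, |10>}

plus = np.full(4, 0.5)                     # two-qubit state |+>|+>
B_complete = np.outer(plus, plus)          # complete-graph mixer |+><+|

proj_B2 = P_F @ B2 @ P_F
proj_complete = P_F @ B_complete @ P_F

print(np.allclose(proj_B2, 0.0))           # projected B2 vanishes
print(proj_complete[1, 2])                 # |01> <-> |10> coupling survives
```

The projected B2 is the zero matrix because every single bit flip leaves the feasible set, whereas the projected complete-graph mixer retains a direct coupling between |01⟩ and |10⟩.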

Numerical experiments

We now present the numerical experiments showing the power of the proposed method. The technique we propose is general, though in this section we consider only the problem of portfolio optimization (with both equality and inequality constraints) and only the QAOA and L-VQE algorithms. The parameters in QAOA and VQE were optimized using COBYLA35 initialized with a large number of random initial points. We compare the results to the state-of-the-art method of encoding constraints by introducing a penalty into the objective, and observe significant improvements in both approximation ratio and in-constraint probability. In addition to better performance, the proposed method does not require complicated tuning of the penalty factor.

Benchmark: portfolio optimization

The daily operation of a large financial institution requires solving many classically-hard optimization problems36,37,38. Among such problems, one of the most important is portfolio optimization. Modern portfolio theory39 considers the task of finding a portfolio with a desired trade-off between risk and expected return. This task is typically formulated as an optimization problem, which is hard to solve classically in many settings, such as when the variables are required to only take on a discrete set of values. When designing an algorithm for portfolio optimization, a central consideration is the ability to incorporate a general class of constraints. Such constraints can come from regulatory or business considerations, with examples ranging from portfolio-level constraints (including budget and total number of assets) to asset-level constraints (such as minimum holding size).

The particular constrained portfolio optimization problems we study numerically arise from the discrete mean-variance Markowitz model39 and have the following objective function

$$\mathop{\min }\limits_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{{{{{{\mathcal{F}}}}}}}}}q{{{{{{{{\boldsymbol{x}}}}}}}}}^{{\mathsf{T}}}\Sigma {{{{{{{\boldsymbol{x}}}}}}}}-{{{{{{{{\boldsymbol{\mu }}}}}}}}}^{{\mathsf{T}}}{{{{{{{\boldsymbol{x}}}}}}}},$$

where \(\mathcal{F}\) is defined by some set of constraints on the portfolio. We consider two sets of problems. In the first set, we impose an inequality constraint on the total size of the portfolio (∑jxj ≤ C). In the second set of problems, in addition to the inequality constraint on portfolio size, we include a constraint on the total expected return (∑jμjxj ≥ R). For each of the two sets of constraints, we consider seven instances with between four and ten assets, for a total of fourteen instances. In all problem instances \(\mathcal{F}\subset {\mathbb{B}}^{n}\), where n is the number of assets.
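For small instances the objective and the feasible set can be enumerated classically, which is useful as a reference when evaluating a quantum solver. The sketch below is an illustration only: the three-asset covariance matrix, the expected returns, the value of q, and the thresholds are made-up numbers, the function names are hypothetical, and the constraint directions (at most a given number of assets, at least a given expected return) are assumptions of this example.

```python
import itertools

def objective(x, sigma, mu, q):
    """Mean-variance objective q * x^T Sigma x - mu^T x for a binary portfolio x."""
    n = len(x)
    risk = sum(x[i] * sigma[i][j] * x[j] for i in range(n) for j in range(n))
    expected_return = sum(mu[i] * x[i] for i in range(n))
    return q * risk - expected_return

def feasible_set(n, max_assets, mu=None, min_return=None):
    """Enumerate binary portfolios meeting the size and optional return constraints."""
    feasible = []
    for x in itertools.product((0, 1), repeat=n):
        if sum(x) > max_assets:
            continue
        if min_return is not None and sum(m * xi for m, xi in zip(mu, x)) < min_return:
            continue
        feasible.append(x)
    return feasible

# Hypothetical three-asset instance, for illustration only.
sigma = [[0.10, 0.02, 0.00],
         [0.02, 0.08, 0.01],
         [0.00, 0.01, 0.05]]
mu = [0.3, 0.2, 0.1]
q = 1.0

single = feasible_set(3, max_assets=2)
best = min(single, key=lambda x: objective(x, sigma, mu, q))
double = feasible_set(3, max_assets=2, mu=mu, min_return=0.45)
print(best, objective(best, sigma, mu, q))
print(double)
```

Brute-force enumeration scales exponentially with the number of assets, so it serves only as a ground truth for instances of the size considered here.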

Zeno dynamics improves quantum optimization performance

Figure 2 presents the comparison between QAOA with Zeno dynamics and QAOA with constraints enforced using a penalty factor on the fourteen problem instances described in the previous subsection. The penalty method is described in the Methods Section. The solution quality is measured in terms of the approximation ratio r, a value between 0 and 1, with larger r being better. The approximation ratio is formally defined in the Methods Section. We consider QAOA with mixers B = ∑jxj (✖ marker) and \(B=\left\vert +\right\rangle \left\langle +\right\vert\) (✚ marker), and optimize the QAOA parameters exhaustively. To improve the performance of parameter optimization, we follow ref. 40 and rescale the cost function so that the gradients with respect to β and γ are roughly of the same magnitude.

Fig. 2: Performance of QAOA with Zeno dynamics and QAOA with constraints enforced using penalty terms.

Approximation ratio r and out-of-constraint probability δ (correspondingly 1 − δ in-constraint probability) achieved by QAOA with constraints enforced using penalty terms (dotted lines) on problems (a–d) with a single constraint, and by QAOA with Zeno dynamics (solid lines) on problems with a single (a–d) and multiple (e, f) constraint(s). The markers ✖ and ✚ indicate whether QAOA used the B = ∑jxj mixer or \(B=\left\vert +\right\rangle \left\langle +\right\vert\) mixer, respectively. For all single constraint problems, QAOA with Zeno dynamics produces a superior approximation ratio and in-constraint probability (solid line is above dotted line with the same color). As penalty factor tuning is prohibitively difficult for problems with multiple constraints (see the Results Section), for these problems only Zeno dynamics results are presented.

For instances with a single constraint (see dotted lines in Fig. 2a–d) we perform extensive tuning of the penalty factor λ. For multi-constraint problems, the tuning becomes prohibitively expensive. Therefore, we exclude QAOA with constraints enforced through penalties from the comparison for problems with multiple constraints. The choice of the penalty factor and the difficulty of its optimization are discussed in detail in the next subsection.

We observe that Zeno dynamics (see solid lines in Fig. 2a–d) enables consistently better solution quality and in-constraint probability as compared to QAOA with constraints enforced using a penalty (dotted lines) for all problems considered. Furthermore, Fig. 2b shows that for 6 and 10 assets the in-constraint probability drops off rapidly with the number of QAOA layers if the penalty factor is kept constant. This highlights an important limitation of enforcing the constraints via penalties, namely that the penalty factor must be tuned independently for each QAOA depth. In contrast, for QAOA with Zeno dynamics we obtain an explicit rule for how η, from (11), should change with the QAOA depth (see Corollary 3). However, for the numerics shown in Fig. 2, we fix η to ensure a constant minimum in-constraint probability per layer. We observe good performance despite η being a depth-independent constant in this case. We note that since η was held constant while p varied, the in-constraint probability slowly decreases with the number of layers as predicted by Corollary 3. For \(B=\left\vert +\right\rangle \left\langle +\right\vert\) mixer, this results in an average number of measurements of ≈ 77 for 6 assets and ≈ 35 for 7 assets.

Since multiple constraints can be efficiently handled in the Zeno framework, in Fig. 2e, f, we include the performance of QAOA with Zeno dynamics on problems with multiple constraints (one on the budget and one on the total expected return). The results show that the Zeno-enhanced QAOA is able to achieve a similar performance as it did for the single-constraint problems, with sufficiently high p.

We note that the in-constraint probability can be improved arbitrarily for the Zeno dynamics approach by decreasing η, without the need to re-optimize the QAOA parameters. This is due to the objective function landscape becoming independent of η as the Zeno limit is approached. In fact, we observe that transferring parameters from a smaller to a larger number of measurements (larger to smaller η) works well even for practically relevant values of η. Figure 3 shows the approximation ratio r and in-constraint probability with directly optimized QAOA parameters and with pre-optimized parameters transferred from a fixed value of η = 1.6 (marked with a star in the plot). We observe that for sufficiently small η, transfer works well and the difference in approximation ratio is negligible. Specifically, parameter transfer using the B = ∑jxj mixer and a total of 33, 75, and 200 measurements results in in-constraint probabilities of at least 85%, 89%, and 96%, respectively, for the nine-asset, single-constraint problem at p = 5. At the same time, if the number of measurements is very small (η large), the objective function landscape is very different from the landscape in the Zeno limit, and the parameter transfer does not work well. We remark that while the in-constraint probability increases monotonically as η decreases, no such guarantee is given for the approximation ratio r. In fact, in Fig. 3 we observe that depending on the problem and the circuit depth, r can either increase or decrease with η.

Fig. 3: Transferability of parameters in QAOA with Zeno dynamics.

Performance of a 1-layer (a) and 5-layer (b) QAOA with Zeno dynamics and mixer B = ∑jxj with directly optimized parameters (ropt, 1 − δopt) and with parameters transferred from a fixed value of η = 1.6 (rtran, 1 − δtran). The source is marked with a star. Corresponding to each case, r signifies the approximation ratio and δ the out-of-constraint probability. For values of the hyperparameter η, which controls the number of measurements and is defined in Equation (11), smaller than 1.6, the difference between performance with optimized and transferred parameters is negligible (dashed line very close to the solid line).

Note that the same approach of boosting the in-constraint probability without re-optimizing the QAOA parameters does not work if the constraints are enforced using penalties. Figure 4 shows that transferring parameters from a fixed value of penalty factor (marked with a star) leads to the approximation ratio rapidly dropping off to random guess. It is however possible that better performance may be achieved by leveraging more sophisticated parameter transfer strategies, such as the rescaling rule proposed for the weighted MaxCut problem41,42 or machine learning methods43.

Fig. 4: Transferability of parameters in QAOA with penalty terms.

Performance of QAOA with B = ∑jxj mixer and constraints enforced through penalties with parameters transferred from a fixed value of penalty factor λ = 0.1 (source marked with a star). The out-of-constraint probability is δ. The approximation ratio r (Equation (23)), unlike rpenalty (Equation (24)), excludes the penalty objective and drops off to random guess if transferring parameters to values of λ sufficiently different from source.

While for QAOA with Zeno dynamics the approximation ratio r given in Equation (23) increases monotonically with the number of QAOA layers, this is not guaranteed for QAOA with constraints enforced through penalties. This is because the QAOA parameters are chosen with respect to the objective with penalties and the increased expressivity of the higher-depth circuit is only guaranteed to improve the performance with respect to that objective. Figure 5 shows that this is indeed the case and the approximation ratio rpenalty given in Equation (24) increases with the number of QAOA layers as expected.

Fig. 5: Approximation ratio of QAOA with penalty terms using different numbers of QAOA layers.

The approximation ratio (as defined in Equation (24)) for the full objective with penalty terms increases monotonically with the number of QAOA layers, as expected. However, the in-constraint approximation ratio (as defined in Equation (23)) is not guaranteed to change monotonically, as seen in Fig. 2a, c. Color denotes the number of assets in the optimization problem, as shown in the legend. The markers ✖ and ✚ indicate whether QAOA used the B = ∑jxj mixer or \(B=\left\vert +\right\rangle \left\langle +\right\vert\) mixer, respectively.

Finally, we include the results for Zeno-enhanced L-VQE with L = 1 in Equation (6). The structure of L-VQE is presented in Equation (26) and further described in the Methods Section. However, instead of using Corollary 1 to determine a sufficient value for the number of measurements N, we heuristically set N = 100. Table 1 presents the results. As expected, L-VQE achieves high approximation ratio, while Zeno dynamics enables high in-constraint probability. As the total number of measurements is kept fixed for all problems and parameter values, slightly lower in-constraint probability is observed for higher qubit counts. As is the case for QAOA, the in-constraint probability can be increased by increasing the number of measurements.

Table 1 Performance of L-VQE with Zeno dynamics on the benchmark problems.

Penalty factor tuning is difficult

An important advantage of our method is the simplicity of hyperparameter tuning, as only η in Equation (11) needs to be chosen. This choice is made easy by Theorem 1 and its corollaries, which imply the monotonic increase of in-constraint probability with decrease in η. This is in sharp contrast with the penalty approach, where the performance crucially depends on the penalty strength, which is hard to tune in general. We now present how the penalty strength was chosen for the experiments above, and discuss the challenges that arose in doing so.

Figure 6 presents the performance of QAOA on a single-constraint problem enforced using a penalty term with varying penalty factors λ. In the plot, the in-constraint probability 1 − δ monotonically increases with λ, while the approximation ratio r decreases. This indicates a trade-off between r and the out-of-constraint probability δ, and hence hyperparameter tuning on λ must be performed in order to obtain a good approximation ratio while meeting requirements on the minimum in-constraint probability. We also observe that for QAOA with small p, 1 − δ tends to level off at a value far below what is achievable by using Zeno dynamics. For example, the top panel of Fig. 6 shows that the highest in-constraint probability achievable with p = 1 is around 80% for the problem tested. Given that the approximation ratio with the penalty term rpenalty is above 0.9 for the high λ regime, it indicates that the maximum achievable in-constraint probability may be limited by the expressivity of the variational circuit. On the other hand, constraints enforced by Zeno dynamics do not suffer from such problems, as the in-constraint probability can be arbitrarily boosted regardless of the expressivity of the variational circuit (see Fig. 3). In the numerical experiments, we choose the value of λ independently for each problem instance with the goal of obtaining a high in-constraint probability 1 − δ. Since we show that the factor λ trades off r and δ, both cannot be improved at the same time. This suggests that there does not exist a choice of λ such that QAOA with the penalty method outperforms QAOA with Zeno dynamics.

Fig. 6: Difficulty of penalty factor tuning for QAOA with a single penalty term.

Performance of QAOA with a single constraint enforced through a penalty term with varying penalty factors λ. A trade-off occurs between the approximation ratio r (Equation (23)) and the in-constraint probability 1 − δ. As shown in a, the maximum in-constraint probability is limited by the expressivity of the QAOA circuit at low depth (1 QAOA layer, or p = 1). With 5 layers (b), QAOA is able to achieve better performance in terms of the penalized objective, as indicated by the approximation ratio rpenalty (Equation (24)). However, there is still a significant trade-off between the true objective r and in-constraint probability.

For problems with multiple constraints, hyperparameter tuning should generally be performed on each penalty factor λj included in the relaxed objective (Equation (21)). This means that hyperparameter tuning can quickly become infeasible, as the search space for all λj’s grows exponentially with the number of penalty terms. We show in Fig. 7 how hyperparameter tuning works with two penalty factors: λ1 and λ2, which correspond to penalty terms enforcing the budget constraint and the return constraint respectively. The figure shows the in-constraint probability of the optimal solution obtained with varying λ1 and λ2. Similar to the single-constraint case, maximal approximation ratio r and maximal in-constraint probability 1 − δ cannot be simultaneously achieved. Specifically, the solutions with the maximal r and maximal 1 − δ have very different values in λ1 and λ2. Moreover, unlike Fig. 6, Fig. 7 clearly shows the non-monotonic behavior of 1 − δ in both λ1 and λ2. In fact, we observe a similar behavior across many of the single- and multi-constraint problems that we have tested, and for both the B = ∑jxj and \(B=\left\vert +\right\rangle \left\langle +\right\vert\) mixers. This indicates that tuning the penalty factors is indeed difficult in the general case.

Fig. 7: Difficulty of penalty factor tuning for QAOA with two penalty terms.

In-constraint probability of the optimized solution using QAOA applied to an objective with two penalty functions, associated with separate constraints. The corresponding penalty factors are indicated by λ1 and λ2, respectively. One is a maximum budget constraint and the other is a minimum return constraint. The value δ is the out-of-constraint probability, and r is the approximation ratio (as defined in Equation (23)). The square highlighted in red corresponds to the maximum in-constraint probability (1 − δ) over all combinations of the two penalty factors, and the square highlighted in green corresponds to the maximum r. This highlights that a large in-constraint probability and a large approximation ratio cannot be obtained simultaneously. The figure shows results for the \(B=\left\vert +\right\rangle \left\langle +\right\vert\) mixer and 3-layer QAOA, though we observe similar behavior for all mixers and QAOA depths considered.

Hardware experiments

While the numerical experiments presented earlier show evidence of the performance of our technique, they do not make use of any concrete circuit implementations of the constraint-checking oracles. In this section, we consider optimized circuit implementations of constraint-checking oracles for two proof-of-concept portfolio optimization problems on noisy quantum hardware. This enables us to validate all of the hardware features, such as mid-circuit measurements and quantum conditional logic (QCL), that are required to implement the efficient oracle construction presented in the Methods Section.

We execute QAOA with Zeno dynamics on the Quantinuum H1-2 trapped-ion quantum processor. Our implementation uses constraint-checking oracles that perform quantum arithmetic in the Fourier domain, following directly the construction in the Methods Section. We observe that increasing the number of measurements improves the in-constraint probability 1 − δ, as expected. The improvement from additional measurements continues up to a two-qubit gate depth of 148, at which point the hardware noise prevents further improvements.

The experiments presented in this Section utilize p = 1 QAOA and the B = ∑jxj mixer. We use the cost function of the four-asset portfolio optimization problem used in the numerics described in the Results Section, but apply different constraints. We consider two instances with linear constraints, one with an equality constraint and one with an inequality constraint. Figure 8 shows a high-level circuit diagram. For each problem, the QAOA parameters are first optimized using a noiseless simulator. All circuit executions use 2000 shots and no error mitigation.

Fig. 8: Quantum circuit for QAOA with Zeno dynamics.

QAOA circuit with Zeno dynamics used in hardware runs for one-layer QAOA (p = 1) on four-asset problems. The operator S prepares a uniform superposition over feasible states.

The first portfolio optimization instance we consider has an equality constraint on the four binary variables x1, x2, x3, x4: 2x1 − x2 − x3 = 0. As discussed in the Methods Section, the semiclassical quantum Fourier transform (QFT) can be utilized for equality constraints. The semiclassical QFT makes use of QCL and mid-circuit measurements, which are features supported by the H1-2 device. This results in an oracle that uses only one auxiliary qubit, and thus the circuit uses five qubits in total. The circuit for the oracle is shown in Fig. 9. We note that the uncomputation step consists of resetting the one auxiliary qubit to the \(\left\vert +\right\rangle\) state.
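As a classical illustration of this instance, the sketch below enumerates the bitstrings satisfying 2x1 − x2 − x3 = 0 and builds the uniform superposition over them (the state that the operator S in Fig. 8 is described as preparing). The bit ordering, with x1 as the most significant bit of the state index, is an assumption made for illustration and is not specified by the text.

```python
from itertools import product
import numpy as np

# Coefficients of the equality constraint 2*x1 - x2 - x3 + 0*x4 = 0.
a = (2, -1, -1, 0)

feasible = [
    x for x in product((0, 1), repeat=4)
    if sum(ai * xi for ai, xi in zip(a, x)) == 0
]

def basis_index(x):
    """Map a bitstring to a state-vector index (x1 = most significant bit; an assumption)."""
    return int("".join(str(b) for b in x), 2)

# Uniform superposition over the feasible computational basis states.
state = np.zeros(2 ** 4)
for x in feasible:
    state[basis_index(x)] = 1.0 / np.sqrt(len(feasible))

print(feasible)
```

Only four of the sixteen basis states are feasible here, since the constraint forces x1 = x2 = x3 while leaving x4 free.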

Fig. 9: QFT adder with QCL.

Quantum circuits for (a) semiclassical quantum Fourier transform adder with quantum conditional logic (QCL) used in the hardware experiments involving equality constraints and (b) the four-qubit rotation gate used in a. Note that R(α) denotes a phase gate. For the equality-constraint experiment executed on the H1-2 quantum device, we set a = (a1, a2, a3, a4) = (2, −1, −1, 0). The uncomputation step consists of resetting the auxiliary qubit to the \(\left\vert +\right\rangle\) state.

As a comparison, we also implement the coherent QFT (Fig. 10) on three qubits, resulting in seven qubits in total. After applying the oracle and measuring, all auxiliary qubits are reset to the ground state for the uncompute step. Figure 11a shows the in-constraint probability as a function of the number of projective measurements. Figure 12a shows the distributions of measurement outcomes of QAOA for varying numbers of measurements (N), with the outcomes (computational basis states) ordered by the objective function value. For both implementations, the in-constraint probability improves with the number of measurements up to N ≈ 15. For a higher number of measurements, the hardware noise arising from high circuit depth prevents further improvements in the in-constraint probability 1 − δ.

Fig. 10: QFT adder without QCL.

Quantum circuits for (a) the quantum Fourier transform (QFT) adder used in the hardware experiments and (b) the four-qubit rotation gate used in a. Note that R(α) denotes a phase gate. For the inequality-constraint experiment, we set a1 = a2 = a3 = a4 = 1, d = −3 and used four qubits for precision. For the equality-constraint experiment, without quantum conditional logic (QCL), we set a1 = 2, a2 = a3 = −1, a4 = d = 0 and used only three qubits for precision. For the inequality constraint, the inverse of the oracle is applied after measuring the qubit encoding the sign. However, for the equality constraint, since all auxiliary qubits are measured, we do not need to apply the inverse QFT operator and can simply reset all auxiliary qubits to the ground state. Note that here the inverse QFT operator (QFT†) does not include swaps, as the reordering has been done by rearranging the banks of controlled rotations.

Fig. 11: Simulation and hardware experiment results using QAOA with Zeno dynamics.

QAOA with p = 1 and Zeno dynamics was applied to solve a four-asset problem with an equality constraint 2x1 − x2 − x3 = 0 (a) and inequality constraint \(\mathop{\sum }\nolimits_{j = 1}^{4}{x}_{j}\le 2\) (b). The circuits were executed on a classical simulator and on the H1-2 quantum device. The oracles are implemented using arithmetic in the Fourier domain. For the equality constraint (a), the quantum conditional logic (QCL) implementation of the Fourier adder used one auxiliary qubit, and the version without QCL used three auxiliary qubits. The Fourier adder used for the inequality constraint (b) used four auxiliary qubits. Error bars indicate the standard error of the mean arising from finite sampling (2000 shots). The in-constraint probability 1 − δ grows with the number of measurements (N).

Fig. 12: Effectiveness of constraint enforcement using QAOA with Zeno dynamics in simulation and hardware experiments.

Distribution of final measurement results obtained from QAOA applied to the equality- (a) and inequality-constrained (b) problem for different numbers of measurements (N). For the equality-constrained problem, experiments were executed both with and without quantum conditional logic (QCL). Each column corresponds to a computational basis state (either in-constraint or out-of-constraint), and the columns are ordered by objective value (to the right is better). The circuits were executed on a classical simulator and on the H1-2 quantum device. There is strong agreement between the hardware results and results from noise-free simulation.

While the QCL and non-QCL implementations both perform similarly, we do note a reduction in the number of two-qubit gates and auxiliary qubits. For QCL and N = 15, the two-qubit gate depth was 122 and the count was 123. Without QCL, for N = 15, the two-qubit gate depth was 148 and the count was 165. The similar performance between QCL and non-QCL versions despite the difference in gate count may be due to the higher impact of measurement error on the QCL implementation.

The second portfolio optimization instance we consider has a cardinality (Hamming-weight) inequality constraint \(\mathop{\sum }\nolimits_{j = 1}^{4}{x}_{j}\le 2\). For this problem, it is necessary to utilize the coherent QFT, and thus QCL does not lead to a resource-requirement reduction. The QFT adder is used to compute ∑jxj − 3, which requires four qubits to accommodate the range. In addition, unlike the equality-constraint case, the inverse oracle is necessary for uncomputation. The system is in-constraint when the most-significant qubit, i.e., the sign bit, is a one. The circuit for the oracle is shown in Fig. 10. Similar to the previous run, we plot the in-constraint probability for varying numbers of measurements (Fig. 11b), as well as the measurement distributions obtained from QAOA (Fig. 12b). For N = 3, the two-qubit gate depth is 112 and the count is 186. Similarly to the experiments with the equality constraint, the in-constraint probability 1 − δ improves until N = 3. For a higher number of measurements, the hardware noise prevents further improvements.
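The sign-bit test can be checked classically. The sketch below assumes that the four-qubit register holds ∑jxj − 3 modulo 2^4, i.e., in a two's-complement encoding, which is how modular addition represents negative values; this is an illustrative model of the arithmetic only, not of the quantum circuit in Fig. 10. The function name `sign_bit` is hypothetical.

```python
from itertools import product

N_BITS = 4   # register width stated in the text
OFFSET = -3  # constant added to the Hamming weight

def sign_bit(x, offset=OFFSET, n_bits=N_BITS):
    """Most-significant bit of (sum(x) + offset) in n_bits two's complement."""
    value = (sum(x) + offset) % (1 << n_bits)  # wrap negative values modulo 2^n
    return value >> (n_bits - 1)

# The sign bit is 1 exactly when the cardinality constraint sum(x) <= 2 holds.
agreement = all(
    (sign_bit(x) == 1) == (sum(x) <= 2)
    for x in product((0, 1), repeat=4)
)
print(agreement)
```

Measuring only this one bit is what allows the rest of the register to be uncomputed coherently with the inverse oracle.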

Note that the performance deteriorates at a significantly lower N for the inequality-constraint problem than for the equality-constraint problem. This occurs even though the two-qubit circuit depth is lower for the inequality case and the two-qubit gate count is not significantly higher. Besides the inclusion of an additional qubit, one potential reason for this is that for the inequality constraint, only one of the auxiliary qubits is measured and then the inverse oracle is applied. This allows errors to accumulate and propagate to the rest of the circuit. In the equality-constraint case, by contrast, all auxiliary qubits are measured after applying the oracle and then reset to the ground state. In addition, the total gate count happens to be significantly higher for the inequality-constraint case.


In this work, we propose an approach for enforcing constraints in quantum optimization and demonstrate its effectiveness by applying it to constrained instances of portfolio optimization in simulation and on a trapped-ion quantum processor. Our technique has two major advantages: the ability to enforce a very general class of constraints and the simplicity of hyperparameter tuning. Two important downsides of our approach are the complexity of implementing the measurement and the possibility of the measurements resulting in trivial dynamics.

Implementing the oracle for a constraint in general requires quantum arithmetic and may lead to high gate count for more complex constraints. However, the asymptotic efficiency of our approach makes it viable for fault-tolerant quantum devices. Additionally, reductions in the cost of implementing quantum arithmetic, such as techniques utilizing quantum conditional logic, can further reduce the overhead of the proposed method.

Moreover, for noisy quantum devices, additional performance improvements can be obtained by leveraging advanced algorithm-specific error mitigation techniques such as the ones recently proposed for QAOA44,45. Such techniques may help bridge the gap between the noisy near-term devices and the error correction likely required to execute circuits of sufficient depth to provide performance improvements over classical algorithms46,47,48.

As discussed in the Results Section, restricting the evolution to the Zeno subspace may result in trivial dynamics for certain mixers. Therefore an important consideration when applying the proposed technique is evaluating whether the particular choice of mixer has this behavior. As this effect would apply generally to all instances with a given class of constraints, the mixer only needs to be analyzed once for a class of problems.



We begin by briefly introducing the relevant concepts and setting the notation. We undertake the task of minimizing an objective function f defined on the Boolean cube, \({{\mathbb{B}}}^{n}\), over the set of feasible solutions \({{{{{{{\mathcal{F}}}}}}}}\subseteq {{\mathbb{B}}}^{n}\):

$$\mathop{\min }\limits_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{{{{{{\mathcal{F}}}}}}}}}f({{{{{{{\boldsymbol{x}}}}}}}}).$$

We consider sets \({\mathcal{F}}\) of the form \({\mathcal{F}}=\{{\boldsymbol{x}}\in {{\mathbb{B}}}^{n}\ | \ {\bar{g}}_{j}({\boldsymbol{x}})=0\ \forall j\}\), where \({\bar{g}}_{j}({\boldsymbol{x}})\) is an oracle that returns 0 if x satisfies the j-th constraint and a value strictly greater than 0 otherwise. This general definition includes most commonly considered problems, such as those with equality and inequality constraints.

This constrained optimization problem can be solved by relaxing the constraints and introducing penalty terms as follows:

$$\mathop{\min }\limits_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{\mathbb{B}}}^{n}}{f}_{{{{{{{{\rm{penalty}}}}}}}}}=\mathop{\min }\limits_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{\mathbb{B}}}^{n}}f({{{{{{{\boldsymbol{x}}}}}}}})+\mathop{\sum}\limits_{j}{\lambda }_{j}{\bar{g}}_{j}({{{{{{{\boldsymbol{x}}}}}}}}),$$

where \({\lambda }_{j}\in {{\mathbb{R}}}^{+}\) are the penalty factors.

Specifically, for an equality constraint g(x) = 0, the penalty function may be written as

$$\bar{g}({\boldsymbol{x}})={\left[g({\boldsymbol{x}})\right]}^{2}.$$

On the other hand, an inequality constraint g(x) ≥ 0 can be converted into an equivalent equality constraint \(g({\boldsymbol{x}})-\hat{s}=0\) by introducing a slack variable \(\hat{s}\in [0,{g}_{\max }]\), where \({g}_{\max }=\mathop{\max }\nolimits_{{\boldsymbol{x}}\in {\mathcal{F}}}g({\boldsymbol{x}})\). If we assume g(x) can be discretized with a spacing of Δg, then \(\hat{s}\) can be implemented using \({n}_{{\rm{slack}}}=\lceil {\log }_{2}({g}_{\max }/{\Delta }_{g})\rceil\) binary variables \({\boldsymbol{s}}={({s}_{1},\ldots ,{s}_{{n}_{{\rm{slack}}}})}^{{\mathsf{T}}}\), and the resultant equality constraint is \(g({\boldsymbol{x}})-{\Delta }_{g}{\sum }_{j=1}^{{n}_{{\rm{slack}}}}{2}^{j-1}{s}_{j}=0\). Therefore the penalty function for an inequality constraint can be written as

$$\bar{g}({{{{{{{\boldsymbol{x}}}}}}}};{{{{{{{\boldsymbol{s}}}}}}}})={\left[g({{{{{{{\boldsymbol{x}}}}}}}})-{\Delta }_{g}\mathop{\sum }\limits_{j = 1}^{{n}_{{{{{{{{\rm{slack}}}}}}}}}}{2}^{j-1}{s}_{j}\right]}^{2}.$$
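A minimal sketch of the slack-variable penalty is given below for a toy inequality constraint g(x) = 3 − ∑jxj ≥ 0 on four binary variables with Δg = 1. The constraint, the spacing, and all function names are hypothetical choices for illustration. The check confirms that the penalty, minimized over the slack bits, vanishes exactly on the feasible bitstrings.

```python
import math
from itertools import product

def num_slack_bits(g_max, delta_g):
    """n_slack = ceil(log2(g_max / delta_g)), as in the text."""
    return math.ceil(math.log2(g_max / delta_g))

def inequality_penalty(g_value, s, delta_g):
    """[g(x) - delta_g * sum_j 2^(j-1) s_j]^2 with j = 1..n_slack (enumerate is 0-based)."""
    slack = delta_g * sum(2 ** j * s_j for j, s_j in enumerate(s))
    return (g_value - slack) ** 2

def min_penalty(g_value, n_slack, delta_g):
    """Smallest penalty attainable over all slack-bit assignments."""
    return min(
        inequality_penalty(g_value, s, delta_g)
        for s in product((0, 1), repeat=n_slack)
    )

# Toy constraint g(x) = 3 - sum(x) >= 0, so g_max = 3 and delta_g = 1.
n_slack = num_slack_bits(g_max=3, delta_g=1)
consistent = all(
    (min_penalty(3 - sum(x), n_slack, 1) == 0) == (3 - sum(x) >= 0)
    for x in product((0, 1), repeat=4)
)
print(n_slack, consistent)
```

In this toy case gmax/Δg = 3 is not a power of two. When the ratio is an exact power of two, the largest slack value representable with nslack bits falls one step short of gmax, so an additional slack bit may be needed.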

The magnitudes of the penalty factors λj control how much the constraint violations are penalized. Intuitively, a higher value of λj should lead to a higher in-constraint probability. However, in practice, the relationship between the penalty factor, the in-constraint probability and the solution quality may be non-monotonic. This makes choosing λj harder. We discuss the difficulty of tuning the penalty factors in the Results Section.

Quantum algorithms for approximate optimization

In this work, we focus on the class of quantum optimization algorithms that use a parameterized quantum evolution to prepare a state, such that the corresponding measurement outcomes contain a high-quality, valid solution to the original optimization problem with high probability. This parameterized state, a restatement of Equation (1), is prepared by applying a parameterized evolution U(θ) to some initial state \(\left\vert s\right\rangle\):

$$\left\vert \psi ({{{{{{{\boldsymbol{\theta }}}}}}}})\right\rangle =U({{{{{{{\boldsymbol{\theta }}}}}}}})\left\vert s\right\rangle =\mathop{\prod }\limits_{j=1}^{m}{e}^{-i{\theta }_{j}{H}_{j}}\left\vert s\right\rangle ,$$

where Hj is some Hamiltonian, e.g., a tensor product of single-qubit Pauli operators.

Let \(C={\sum }_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{\mathbb{B}}}^{n}}f({{{{{{{\boldsymbol{x}}}}}}}})\left\vert {{{{{{{\boldsymbol{x}}}}}}}}\right\rangle \left\langle {{{{{{{\boldsymbol{x}}}}}}}}\right\vert\) be the operator encoding the objective function f on qubits and \({C}_{{{{{{{{\rm{penalty}}}}}}}}}={\sum }_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{\mathbb{B}}}^{n}}{f}_{{{{{{{{\rm{penalty}}}}}}}}}({{{{{{{\boldsymbol{x}}}}}}}})\left\vert {{{{{{{\boldsymbol{x}}}}}}}}\right\rangle \left\langle {{{{{{{\boldsymbol{x}}}}}}}}\right\vert\) be the operator encoding the relaxed objective function (21). The figures of merit used to evaluate the quality of a parameter θ* obtained by algorithms that employ parameterized circuit (22) are approximation ratios, defined as follows:

$$r=\frac{\left\langle \psi ({{{{{{{\boldsymbol{{\theta }}}}}}}^{* }}})\right\vert {C}_{{{{{{{{\mathcal{F}}}}}}}}}\left\vert \psi ({{{{{{{\boldsymbol{{\theta }}}}}}}^{* }}})\right\rangle -{f}^{\max }}{{f}^{\min }-{f}^{\max }}$$


$${r}_{{{{{{{{\rm{penalty}}}}}}}}}=\frac{\left\langle \psi ({{{{{{{\boldsymbol{{\theta }}}}}}}^{* }}})\right\vert {C}_{{{{{{{{\rm{penalty}}}}}}}}}\left\vert \psi ({{{{{{{\boldsymbol{{\theta }}}}}}}^{* }}})\right\rangle -{f}_{{{{{{{{\rm{penalty}}}}}}}}}^{\max }}{{f}_{{{{{{{{\rm{penalty}}}}}}}}}^{\min }-{f}_{{{{{{{{\rm{penalty}}}}}}}}}^{\max }},$$

where \({C}_{{{{{{{{\mathcal{F}}}}}}}}}={\sum }_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{{{{{{\mathcal{F}}}}}}}}}f({{{{{{{\boldsymbol{x}}}}}}}})\left\vert {{{{{{{\boldsymbol{x}}}}}}}}\right\rangle \left\langle {{{{{{{\boldsymbol{x}}}}}}}}\right\vert\), \({f}^{\min }=\mathop{\min }\nolimits_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{{{{{{\mathcal{F}}}}}}}}}f({{{{{{{\boldsymbol{x}}}}}}}})\), \({f}^{\max }=\mathop{\max }\nolimits_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{{{{{{\mathcal{F}}}}}}}}}f({{{{{{{\boldsymbol{x}}}}}}}})\), \({f}_{{{{{{{{\rm{penalty}}}}}}}}}^{\min }=\mathop{\min }\nolimits_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{\mathbb{B}}}^{n}}{f}_{{{{{{{{\rm{penalty}}}}}}}}}({{{{{{{\boldsymbol{x}}}}}}}})\), and \({f}_{{{{{{{{\rm{penalty}}}}}}}}}^{\max }=\mathop{\max }\nolimits_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{\mathbb{B}}}^{n}}{f}_{{{{{{{{\rm{penalty}}}}}}}}}({{{{{{{\boldsymbol{x}}}}}}}})\).
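The following sketch evaluates the approximation ratio r and the in-constraint probability 1 − δ from a measurement distribution, applying the definition of CF as written (it is supported only on feasible states, so out-of-constraint outcomes contribute zero to the expectation). The objective values, feasibility mask, and distribution are hypothetical toy data.

```python
import numpy as np

def approximation_ratio(probs, f_values, feasible_mask):
    """r = (<C_F> - f_max) / (f_min - f_max), with f_min, f_max taken over feasible x."""
    probs = np.asarray(probs, dtype=float)
    f_values = np.asarray(f_values, dtype=float)
    mask = np.asarray(feasible_mask, dtype=bool)
    expectation = np.sum(probs[mask] * f_values[mask])  # C_F vanishes outside F
    f_min, f_max = f_values[mask].min(), f_values[mask].max()
    return (expectation - f_max) / (f_min - f_max)

def in_constraint_probability(probs, feasible_mask):
    """1 - delta: total probability of measuring a feasible bitstring."""
    probs = np.asarray(probs, dtype=float)
    return probs[np.asarray(feasible_mask, dtype=bool)].sum()

# Toy data: four basis states, the last one infeasible.
f_values = [0.0, -2.0, -1.0, 5.0]
feasible_mask = [True, True, True, False]
uniform = [0.25, 0.25, 0.25, 0.25]

r_uniform = approximation_ratio(uniform, f_values, feasible_mask)
p_feasible = in_constraint_probability(uniform, feasible_mask)
print(r_uniform, p_feasible)
```

A distribution concentrated on the optimal feasible state gives r = 1, while probability mass outside the feasible set lowers both r and 1 − δ in this toy example.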

This class of algorithms includes QAOA1,2,48 and its generalization, the quantum alternating operator ansatz algorithm13. In both algorithms, the parameterized quantum evolution is performed by applying pairs of alternating operators:

$$\left\vert \psi ({{{{{{{\boldsymbol{\beta }}}}}}}},{{{{{{{\boldsymbol{\gamma }}}}}}}})\right\rangle =\mathop{\prod }\limits_{j=1}^{p}\left[{U}_{B}({\beta }_{j}){U}_{C}({\gamma }_{j})\right]\left\vert s\right\rangle ,$$

where \({U}_{C}({\gamma }_{j})={e}^{-i{\gamma }_{j}C}\) is the phase operator, and UB(βj) is the mixing operator. In the special case of QAOA, the initial state \(\left\vert s\right\rangle\) is the uniform superposition over all computational basis states and the mixing operator UB is set to be \({U}_{B}({\beta }_{j})={e}^{-i{\beta }_{j}B}\), where B = ∑kxk is a sum of single-qubit Pauli-x operators. In quantum alternating operator ansatz, UB and \(\left\vert s\right\rangle\) are allowed to be arbitrary, and are typically set such that the resulting state \(\left\vert \psi ({{{{{{{\boldsymbol{\beta }}}}}}}},{{{{{{{\boldsymbol{\gamma }}}}}}}})\right\rangle\) preserves the constraints, in the sense that every measurement outcome x belongs to \({{{{{{{\mathcal{F}}}}}}}}\). In this paper, we consider QAOA with an arbitrary mixing Hamiltonian B, defined in ref. 13 as Hamiltonian-based QAOA. In all other sections of this paper, unless it is specified otherwise, the acronym QAOA is used to denote this version of the algorithm.
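For small instances, the alternating-operator state can be simulated directly. The sketch below is a minimal state-vector simulation of standard QAOA with the mixer B = ∑kxk, assuming the objective is supplied as a vector of values f(x) indexed by computational basis state and that, within each layer, the phase operator acts before the mixer. It does not include constraints or Zeno measurements, and the function names are illustrative.

```python
import numpy as np

def apply_x_mixer(state, beta, n):
    """Apply exp(-i*beta*sum_k X_k), which factorizes into exp(-i*beta*X) on each qubit."""
    c, s = np.cos(beta), np.sin(beta)
    rx = np.array([[c, -1j * s], [-1j * s, c]])
    psi = state.reshape([2] * n)
    for q in range(n):
        # Contract the single-qubit gate with axis q, then restore the axis order.
        psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def qaoa_state(f_values, betas, gammas):
    """Return the p-layer QAOA state for a diagonal objective with values f_values."""
    f_values = np.asarray(f_values, dtype=float)
    n = len(f_values).bit_length() - 1               # number of qubits
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # uniform superposition |s>
    for beta, gamma in zip(betas, gammas):
        state = np.exp(-1j * gamma * f_values) * state  # phase operator U_C(gamma)
        state = apply_x_mixer(state, beta, n)           # mixing operator U_B(beta)
    return state

# One qubit, f(0) = 0 and f(1) = 1: these angles rotate the state onto the minimizer x = 0.
probs = np.abs(qaoa_state([0.0, 1.0], betas=[np.pi / 4], gammas=[-np.pi / 2])) ** 2
print(probs)
```

Because the phase operator is diagonal and the mixer factorizes over qubits, each layer costs O(n 2^n) operations, which is adequate for the four-to-ten-qubit instances considered here.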

In addition to QAOA, we consider the layer variational quantum eigensolver (L-VQE)31, which is a version of VQE with the hardware-efficient layered parameterized circuit tailored towards optimization problems. L-VQE uses the parameterized circuit of the form

$$\mathop{\prod }\limits_{j=1}^{p}\left[{U}_{{{{{{{{\rm{NN}}}}}}}}}({{{{{{{{\boldsymbol{\theta }}}}}}}}}_{j})\right]V({{{{{{{{\boldsymbol{\theta }}}}}}}}}_{0})\left\vert 0\right\rangle ,$$

where UNN consists of nearest-neighbor CNOT’s and single-qubit Ry’s, and V is a layer of single-qubit Ry’s. The reader is referred to ref. 31 for the precise definition of the circuit. While the circuit includes non-parameterized CNOT’s, it is easy to write it equivalently in the form of Equation (22) by pushing Ry through the control of the CNOT and noting that \({{{{{{{\rm{Ry}}}}}}}}(\theta )={e}^{-i\frac{\theta }{2}{{{{{{{\rm{y}}}}}}}}}\) and \({{{{{{{{\rm{cnot}}}}}}}}}_{1,2}{{{{{{{{\rm{Ry}}}}}}}}}_{2}(\theta ){{{{{{{{\rm{cnot}}}}}}}}}_{1,2}={e}^{-i\frac{\theta }{2}{{{{{{{{\rm{z}}}}}}}}}_{1}{{{{{{{{\rm{y}}}}}}}}}_{2}}\). Here, yj and zj denote a single-qubit Pauli-y and Pauli-z, respectively, acting on the j-th qubit.
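The conjugation identity used above can be verified numerically. The sketch below assumes qubit 1 is the control (first tensor factor) and qubit 2 is the target, and uses the closed form exp(−iθP/2) = cos(θ/2) I − i sin(θ/2) P, valid for any operator P with P² = I.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

# CNOT with qubit 1 (first tensor factor) as control and qubit 2 as target.
CNOT = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]],
    dtype=complex,
)

def pauli_rotation(pauli, theta):
    """exp(-i * theta / 2 * pauli) for an operator that squares to the identity."""
    dim = pauli.shape[0]
    return np.cos(theta / 2) * np.eye(dim) - 1j * np.sin(theta / 2) * pauli

theta = 0.7  # arbitrary test angle
lhs = CNOT @ np.kron(I2, pauli_rotation(Y, theta)) @ CNOT  # cnot Ry_2(theta) cnot
rhs = pauli_rotation(np.kron(Z, Y), theta)                  # exp(-i theta/2 z_1 y_2)
print(np.allclose(lhs, rhs))
```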

Quantum Zeno dynamics

The quantum Zeno effect (QZE)49,50 is named after Zeno’s paradox51, which regards the continuous observation of a moving arrow. Zeno’s paradox states that an arrow cannot move if no time has elapsed since the point it was last observed. If the time difference between observations is Δt, continuous observation occurs in the limit of Δt → 0. Under continuous observation, no time elapses between observations, and during each observation the arrow is not moving; thus, no overall movement is possible. The analog in quantum mechanics is a consequence of the Schrödinger equation. We first introduce a simpler one-dimensional version, in which the quantum state is restricted from evolving due to repeated measurements, and then present a more general case in which the dynamics of the system are restricted to a particular subspace, called a Zeno subspace.

Suppose a time-dependent quantum state is evolved in a finite-dimensional Hilbert space \({{{{{{{\mathcal{H}}}}}}}}\) from some initial state \(\left\vert {\psi }_{0}\right\rangle\) under the action of some Hamiltonian H for time t. Define a projective measurement \({{{{{{{\mathcal{P}}}}}}}}\) given by a pair of complement projections \(P=\left\vert {\psi }_{0}\right\rangle \left\langle {\psi }_{0}\right\vert\) and Q = I − P, which acts on a density operator ρ as

$${{{{{{{\mathcal{P}}}}}}}}\rho =P\rho P+Q\rho Q.$$

If we carry out N repeated projective measurements \({{{{{{{\mathcal{P}}}}}}}}\) at a time interval of t/N, then the probability that the system remains in the initial state is

$$\begin{array}{rcl}p(t)&=&\parallel P{e}^{-iHt/N}\left\vert {\psi }_{0}\right\rangle {\parallel }_{2}^{2N}\\ &=&{\left[| \left\langle {\psi }_{0}\right\vert {e}^{-iHt/N}\left\vert {\psi }_{0}\right\rangle {| }^{2}\right]}^{N}\\ &=&{\left[1-{(t/N{\tau }_{Z})}^{2}\right]}^{N}+O({N}^{-2})\mathop{\to }\limits^{N\to \infty }1,\end{array}$$

where \({\tau }_{Z}\), defined by \({\tau }_{Z}^{-2}=\left\langle {\psi }_{0}\right\vert {H}^{2}\left\vert {\psi }_{0}\right\rangle -{\left\langle {\psi }_{0}\right\vert H\left\vert {\psi }_{0}\right\rangle }^{2}\), is called the Zeno time and quantifies how often the measurements need to be taken. As the frequency at which the measurements are performed increases without bound, the probability of remaining in the initial state approaches one.
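As an illustration of this limit, the sketch below evaluates the survival probability for a single qubit. The choices H = x (Pauli-x), initial state |0⟩ and t = 1 are assumptions made only for this example, and the function names are hypothetical.

```python
import numpy as np

def evolve(H, t):
    """Return exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

def survival_probability(H, psi0, t, N):
    """Probability of finding psi0 in all N measurements spaced t/N apart."""
    amplitude = psi0.conj() @ evolve(H, t / N) @ psi0
    return abs(amplitude) ** (2 * N)

def zeno_time(H, psi0):
    """tau_Z with tau_Z^{-2} equal to the energy variance of H in psi0."""
    mean = np.real(psi0.conj() @ H @ psi0)
    mean_sq = np.real(psi0.conj() @ H @ H @ psi0)
    return 1.0 / np.sqrt(mean_sq - mean**2)

H = np.array([[0, 1], [1, 0]], dtype=complex)  # Pauli-x (example choice)
psi0 = np.array([1, 0], dtype=complex)         # |0>
t = 1.0
probs = {N: survival_probability(H, psi0, t, N) for N in (1, 10, 100, 1000)}
tau_z = zeno_time(H, psi0)
```

For this example the survival probability is cos(t/N) raised to the power 2N, which increases towards one as N grows.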

Quantum Zeno dynamics (QZD)52,53,54,55 considers the more general case where the evolution of the state is constrained to a subspace of dimension greater than one. Thus the projective measurement \({{{{{{{\mathcal{P}}}}}}}}\) can contain multiple projections with ranks all greater than one. Specifically, a restatement of Equation (2),

$${{{{{{{\mathcal{P}}}}}}}}\rho =\mathop{\sum }\limits_{j=1}^{k}{P}_{j}\rho {P}_{j},$$

where \(\mathop{\sum }\nolimits_{j = 1}^{k}{P}_{j}={{{{{{{\rm{I}}}}}}}}\), and Pj is a projection onto some subspace \({{{{{{{{\mathcal{H}}}}}}}}}_{j}={P}_{j}{{{{{{{\mathcal{H}}}}}}}}\) of dimensionality \({{{{{{{\rm{Tr}}}}}}}}({P}_{j})\ge 1\). Informally, QZD states that if the evolution starts in \({{{{{{{{\mathcal{H}}}}}}}}}_{j}\) and the measurement \({{{{{{{\mathcal{P}}}}}}}}\) is performed sufficiently often, then the system will remain in \({{{{{{{{\mathcal{H}}}}}}}}}_{j}\) with high probability.

Consider an initial state ρ0. After N projective measurements by \({{{{{{{\mathcal{P}}}}}}}}\), the state of the system is given by

$$\rho (t)={{{{{{{\mathcal{U}}}}}}}}(t){\rho }_{0}{{{{{{{\mathcal{U}}}}}}}}{(t)}^{{{{\dagger}}} },$$

where \({{{{{{{\mathcal{U}}}}}}}}(t)={({{{{{{{\mathcal{P}}}}}}}}{e}^{-iHt/N})}^{N}\) and \(p(t)={{{{{{{\rm{Tr}}}}}}}}({P}_{j}\rho (t))\) is the probability of the system remaining in \({{{{{{{{\mathcal{H}}}}}}}}}_{j}\) after evolving for time t. Note that

$$\begin{array}{rcl}{{{{{{{\mathcal{U}}}}}}}}(t)&=&{\left({{{{{{{\mathcal{P}}}}}}}}{e}^{-iHt/N}\right)}^{N}\\ &=&{\left({{{{{{{\mathcal{P}}}}}}}}[{{{{{{{\rm{I}}}}}}}}-iHt/N+O({N}^{-2})]\right)}^{N}\\ &=&{\left({{{{{{{\rm{I}}}}}}}}-i{{{{{{{\mathcal{P}}}}}}}}Ht/N+O({N}^{-2})\right)}^{N}\\ &=&{\left({{{{{{{\rm{I}}}}}}}}-i{{{{{{{\mathcal{P}}}}}}}}Ht/N\right)}^{N}+O({N}^{-1})\end{array}$$
$$\mathop{\to }\limits^{N\to \infty }{e}^{-i{{{{{{{\mathcal{P}}}}}}}}Ht}{{{{{{{\mathcal{P}}}}}}}},$$

and the dynamics of the system are governed by \({H}_{Z}={{{{{{{\mathcal{P}}}}}}}}H\), called the Zeno Hamiltonian. Moreover, as N → ∞, transitions between different subspaces \(\{{{{{{{{{\mathcal{H}}}}}}}}}_{1},\ldots ,{{{{{{{{\mathcal{H}}}}}}}}}_{k}\}\) of \({{{{{{{\mathcal{H}}}}}}}}\) are suppressed. This implies that if ρ0 = Pjρ0Pj for some j ∈ [k] ≔ {1, …, k}, then in the limit N → ∞, called the Zeno limit, it follows that p(t) → 1, and thus the state remains in \({{{{{{{{\mathcal{H}}}}}}}}}_{j}\) throughout the evolution. For a more detailed discussion, the reader is referred to refs. 54,55.
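The suppression of transitions between Zeno subspaces can be illustrated with a small density-matrix simulation. The sketch below is a minimal example under assumed choices: two qubits, H equal to the transverse-field mixer x₁ + x₂, and a two-outcome measurement that checks whether the first qubit is 0. All names are illustrative.

```python
import numpy as np

def evolve(H, t):
    """Return exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

def zeno_probability(H, projectors, rho0, t, N, j=0):
    """Tr(P_j rho) after N rounds of evolution for t/N followed by the
    non-selective measurement rho -> sum_k P_k rho P_k."""
    U = evolve(H, t / N)
    rho = rho0.copy()
    for _ in range(N):
        rho = U @ rho @ U.conj().T
        rho = sum(P @ rho @ P for P in projectors)
    return float(np.real(np.trace(projectors[j] @ rho)))

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
H = np.kron(X, I2) + np.kron(I2, X)        # example mixer Hamiltonian
P1 = np.diag([1, 1, 0, 0]).astype(complex)  # subspace with first qubit = 0
P2 = np.eye(4, dtype=complex) - P1
rho0 = np.zeros((4, 4), dtype=complex)
rho0[0, 0] = 1.0                            # start in |00>, inside Im(P1)

t = 1.0
p_by_N = {N: zeno_probability(H, [P1, P2], rho0, t, N) for N in (1, 10, 100, 1000)}
```

In this example the second qubit evolves independently of the measurement, so the in-subspace probability reduces to the two-state expression (1 + cos(2t/N)^N)/2, which approaches one as N grows.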

QZE has many applications in algorithms and error mitigation. Childs et al.56 propose a version of Grover’s search based on QZD that utilizes frequent measurements instead of slow adiabatic evolution. This alternative approach to slow evolution was also observed in ref. 57. Somma et al.58,59 develop a quantum-enhanced version of the simulated annealing algorithm. Their approach makes use of QZD to ensure that the evolution remains in the instantaneous quantum Gibbs state for varying temperature. Boixo et al.60 show that for Grover’s algorithm and simulated annealing based on QZD, one could use frequent randomized evolutions instead of measurements (the randomization method). The randomization method has also been used to implement algorithms for quantum linear systems61,62. Finally, dynamical decoupling, also called bang-bang decoupling63, is a popular error-mitigation technique that uses QZE to suppress decoherence55,64,65,66,67,68.

Proof of Theorem 1

In this Section we derive our main result, Theorem 1, for the number of measurements required to maintain a constant success probability. We start by deriving the required lemmas.

Lemma 1

Let H be a Hermitian matrix. Then

$$\begin{array}{c}\mathop{\min }\limits_{P,\left\vert \psi \right\rangle \in {{\mbox{Im}}} (P)}{\left\Vert P{e}^{-i\theta H}\left\vert \psi \right\rangle \right\Vert }_{2}^{2}={\cos }^{2}\left(\frac{{\xi }_{\max }-{\xi }_{\min }}{2}\theta \right)\\ \forall \theta \in {\mathbb{R}},| \theta | \le \frac{\pi }{{\xi }_{\max }-{\xi }_{\min }},\end{array}$$

where P is an orthogonal projector and \({\xi }_{\max }\) and \({\xi }_{\min }\) are the largest and smallest eigenvalues of H.


Suppose H has the following eigendecomposition

$$H=\mathop{\sum }\limits_{k=1}^{d}{\xi }_{k}{Q}_{k},$$

where ξk are the distinct eigenvalues of H (including 0 if H is not full rank) and \({\{{Q}_{k}\}}_{k = 1}^{d}\) is the complete set of projectors onto the corresponding eigenspaces. Therefore

$$\begin{array}{rcl}p(\theta )&=&{\left\Vert P{e}^{-i\theta H}\left\vert \psi \right\rangle \right\Vert }_{2}^{2}\\ &\ge &{\left\Vert \left\vert \psi \right\rangle \left\langle \psi \right\vert {e}^{-i\theta H}\left\vert \psi \right\rangle \right\Vert }_{2}^{2}\\ &=&{\left| \left\langle \psi \right\vert {e}^{-i\theta H}\left\vert \psi \right\rangle \right| }^{2}\\ &=&\mathop{\sum }\limits_{j,k=1}^{d}{e}^{i\theta ({\xi }_{j}-{\xi }_{k})}\left\langle \psi \right\vert {Q}_{j}\left\vert \psi \right\rangle \left\langle \psi \right\vert {Q}_{k}\left\vert \psi \right\rangle \\ &=&\mathop{\sum }\limits_{j,k=1}^{d}\cos (\theta ({\xi }_{j}-{\xi }_{k}))\left\langle \psi \right\vert {Q}_{j}\left\vert \psi \right\rangle \left\langle \psi \right\vert {Q}_{k}\left\vert \psi \right\rangle \\ &=&\mathop{\sum }\limits_{j,k=1}^{d}{c}_{jk}{x}_{j}{x}_{k},\end{array}$$

where \({c}_{jk}=\cos (\theta ({\xi }_{j}-{\xi }_{k}))\) and \({x}_{j}=\left\langle \psi \right\vert {Q}_{j}\left\vert \psi \right\rangle \ge 0\). Note that the second-to-last equality follows from

$${e}^{i\theta ({\xi }_{j}-{\xi }_{k})}{x}_{j}{x}_{k}+{e}^{i\theta ({\xi }_{k}-{\xi }_{j})}{x}_{k}{x}_{j}=\cos (\theta ({\xi }_{j}-{\xi }_{k})){x}_{j}{x}_{k}+\cos (\theta ({\xi }_{k}-{\xi }_{j})){x}_{k}{x}_{j}.$$

Let C be the matrix with elements cjk at the j-th row and k-th column. Then using simple trigonometric identities, it can be shown that

$$C={{{{{{{\boldsymbol{v}}}}}}}}(\theta ){{{{{{{\boldsymbol{v}}}}}}}}{(\theta )}^{{\mathsf{T}}}+{{{{{{{\boldsymbol{v}}}}}}}}\left(\frac{\pi }{2}-\theta \right){{{{{{{\boldsymbol{v}}}}}}}}{\left(\frac{\pi }{2}-\theta \right)}^{{\mathsf{T}}}$$


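where

$${{{{{{{\boldsymbol{v}}}}}}}}(\theta )={(\cos ({\xi }_{1}\theta ),\ldots ,\cos ({\xi }_{d}\theta ))}^{{\mathsf{T}}}$$

and, with a slight abuse of notation, \({{{\boldsymbol{v}}}}\left(\frac{\pi }{2}-\theta \right)\) denotes the vector with entries \(\cos \left(\frac{\pi }{2}-{\xi }_{j}\theta \right)=\sin ({\xi }_{j}\theta )\), so that \({c}_{jk}=\cos ({\xi }_{j}\theta )\cos ({\xi }_{k}\theta )+\sin ({\xi }_{j}\theta )\sin ({\xi }_{k}\theta )\).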

Since C is the sum of positive semi-definite matrices, it too is positive semi-definite.

Therefore, minimizing p(θ) is equivalent to solving the following convex constrained minimization problem

$$\mathop{\min }\limits_{{{{{{{{\boldsymbol{x}}}}}}}}\in {{{{{{{\mathcal{S}}}}}}}}}{{{{{{{{\boldsymbol{x}}}}}}}}}^{{\mathsf{T}}}C{{{{{{{\boldsymbol{x}}}}}}}},\,{{\mbox{where}}}\,\,{{{{{{{\mathcal{S}}}}}}}}:= \{{{{{{{{\boldsymbol{x}}}}}}}}\in {{\mathbb{R}}}_{+}^{d}\,| \,{{{{{ \Vert {{\boldsymbol{x}}}}}}}} \Vert _{1}=1\},$$

where \({{{{{{{\boldsymbol{x}}}}}}}}={({x}_{1},\ldots ,{x}_{d})}^{{\mathsf{T}}}\). A sufficient condition (ref. 69, Theorem 2.2.5) for \({{{{{{{{\boldsymbol{x}}}}}}}}}^{\star }\) to be the optimum is

$${{{{{{{{{\boldsymbol{x}}}}}}}}}^{\star }}^{{\mathsf{T}}}C({{{{{{{\boldsymbol{x}}}}}}}}-{{{{{{{{\boldsymbol{x}}}}}}}}}^{\star })\ge 0,\forall {{{{{{{\boldsymbol{x}}}}}}}}\in {{{{{{{\mathcal{S}}}}}}}}$$

Consider the following trial solution

$$\begin{array}{l}{x}_{\min }^{\star }={x}_{\max }^{\star }=\frac{1}{2},\\ {x}_{j}^{\star }=0\quad \forall \,j\, \notin \, \{\min ,\max \}.\end{array}$$

We have that \(\forall {{{{{{{\boldsymbol{x}}}}}}}}\in {{{{{{{\mathcal{S}}}}}}}}\)

$$\begin{array}{l}2{{{{{{{{{\boldsymbol{x}}}}}}}}}^{\star }}^{{\mathsf{T}}}C({{{{{{{\boldsymbol{x}}}}}}}}-{{{{{{{{\boldsymbol{x}}}}}}}}}^{\star })\hfill\\ =(1+{c}_{\max ,\min })({x}_{\max }+{x}_{\min }-1)+\mathop{\sum}\limits_{j\notin \{\min ,\max \}}{x}_{j}({c}_{\max ,j}+{c}_{\min ,j})\hfill\\ =\mathop{\sum}\limits_{j\notin \{\min ,\max \}}{x}_{j}\left[({c}_{\max ,j}+{c}_{\min ,j})-(1+{c}_{\max ,\min })\right],\hfill\end{array}$$

where the last equality uses \({\sum }_{j\notin \{\min ,\max \}}{x}_{j}=1-{x}_{\max }-{x}_{\min }\), which holds because \(\Vert {{{\boldsymbol{x}}}}{\Vert }_{1}=1\).

Also for \(| \theta | \le \frac{\pi }{{\xi }_{\max }-{\xi }_{\min }}\), we have \({c}_{j,k}\ge {c}_{\max ,\min }\), and thus

$$\begin{array}{rcl}1+{c}_{\min ,\max }&=&2{\cos }^{2}\left(\frac{{\xi }_{\max }-{\xi }_{\min }}{2}\theta \right)\\ &\le &2\cos \left(\frac{{\xi }_{\max }-{\xi }_{\min }}{2}\theta \right)\cos \left(\frac{{\xi }_{\max }+{\xi }_{\min }-2{\xi }_{j}}{2}\theta \right)\\ &=&{c}_{\max ,j}+{c}_{\min ,j}.\end{array}$$

Combining the above results, we obtain that \(2{{{{{{{{{\boldsymbol{x}}}}}}}}}^{\star }}^{{\mathsf{T}}}C({{{{{{{\boldsymbol{x}}}}}}}}-{{{{{{{{\boldsymbol{x}}}}}}}}}^{\star })\ge 0\). Thus our choice is optimal.

After plugging in the optimal choice and noting that all steps in (31) are equalities when \(P=\left\vert \psi \right\rangle \left\langle \psi \right\vert\), we obtain:

$$\mathop{\min }\limits_{P,\left\vert \psi \right\rangle \in {{\mbox{Im}}} (P)}{\left\Vert P{e}^{-i\theta H}\left\vert \psi \right\rangle \right\Vert }_{2}^{2}={{{{{{{{{\boldsymbol{x}}}}}}}}}^{\star }}^{{\mathsf{T}}}C{{{{{{{{\boldsymbol{x}}}}}}}}}^{\star }={\cos }^{2}\left(\frac{{\xi }_{\max }-{\xi }_{\min }}{2}\theta \right).$$

Additionally, the result implies that minimization occurs when

$$\left\vert \psi \right\rangle =\left\vert {\pm }_{H}\right\rangle := \frac{1}{\sqrt{2}}\left\vert {\xi }_{\max }\right\rangle \pm \frac{1}{\sqrt{2}}\left\vert {\xi }_{\min }\right\rangle$$

for any \(\left\vert {\xi }_{\max }\right\rangle \in {{{{{\rm{Im}}}}}} ({Q}_{\max })\) and \(\left\vert {\xi }_{\min }\right\rangle \in {{{{{\rm{Im}}}}}} ({Q}_{\min })\).

Note that, as observed in the proof of Lemma 1, the lower bound on the in-constraint probability is saturated when the initial state is chosen to be either \(\left\vert {+}_{H}\right\rangle\) or \(\left\vert {-}_{H}\right\rangle\) in Equation (37), and P is the projector onto the chosen initial state.
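Lemma 1 can be sanity-checked numerically. The sketch below is a minimal example assuming a random 5-dimensional Hermitian H and random rank-2 projectors; it is an empirical spot check under those assumptions, not a proof, and the names are illustrative.

```python
import numpy as np

def evolve(H, theta):
    """Return exp(-i theta H) for a Hermitian matrix H via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * theta)) @ V.conj().T

def in_subspace_probability(P, H, theta, psi):
    """|| P exp(-i theta H) |psi> ||_2^2."""
    return float(np.linalg.norm(P @ evolve(H, theta) @ psi) ** 2)

rng = np.random.default_rng(1)
d = 5
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2                    # random Hermitian test matrix
w, V = np.linalg.eigh(H)                    # eigenvalues in ascending order
gap = w[-1] - w[0]                          # xi_max - xi_min
theta = 0.5 * np.pi / gap                   # satisfies |theta| <= pi / gap
bound = np.cos(gap * theta / 2) ** 2        # right-hand side of Lemma 1

# Saturating choice: |+_H> with P the projector onto it.
plus_H = (V[:, -1] + V[:, 0]) / np.sqrt(2)
p_saturating = in_subspace_probability(np.outer(plus_H, plus_H.conj()), H, theta, plus_H)

# Random rank-2 projectors with a random state inside their image.
p_random = []
for _ in range(200):
    B = rng.normal(size=(d, 2)) + 1j * rng.normal(size=(d, 2))
    basis, _ = np.linalg.qr(B)              # orthonormal basis of a 2-dim subspace
    P = basis @ basis.conj().T
    psi = basis @ (rng.normal(size=2) + 1j * rng.normal(size=2))
    psi = psi / np.linalg.norm(psi)
    p_random.append(in_subspace_probability(P, H, theta, psi))
```

In this sketch `p_saturating` should coincide with `bound`, while every entry of `p_random` should lie at or above it.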

Lemma 2

Let H be a Hermitian matrix. Then

$$\begin{array}{c}\mathop{\min }\limits_{P,\left\vert \psi \right\rangle \in {{\mbox{Im}}} (P)}{\left\Vert P{\left({{{{{{{\mathcal{P}}}}}}}}{e}^{-i\frac{\theta }{N}H}\right)}^{N}\left\vert \psi \right\rangle \right\Vert }_{2}^{2}=\frac{1}{2}+\frac{1}{2}{\left[2{p}^{* }\left(\frac{\theta }{N}\right)-1\right]}^{N},\\ \forall \theta \in {\mathbb{R}},| \theta | \le \frac{\pi N}{{\xi }_{\max }-{\xi }_{\min }},\end{array}$$

where \({{{{{{{\mathcal{P}}}}}}}}\) is a projective measurement as defined in Equation (27) with projectors P and I − P,

$${p}^{* }\left(\frac{\theta }{N}\right)={\cos }^{2}\left(\frac{{\xi }_{\max }-{\xi }_{\min }}{2N}\theta \right),$$

and \({\xi }_{\max }\) and \({\xi }_{\min }\) are the largest and smallest eigenvalues of H.


Consider a fixed θ and some N that satisfies the hypothesis. The stochastic process formed by the random variables indicating whether the system is in Im(P) or its complement after each evolution segment \({{{{{{{\mathcal{P}}}}}}}}{e}^{-i\frac{\theta }{N}H}\) forms a two-state Markov chain. According to Lemma 1, the probability of remaining in a state on the chain at any point in time is at least

$${p}^{* }\left(\frac{\theta }{N}\right):= {\cos }^{2}\left(\frac{{\xi }_{\max }-{\xi }_{\min }}{2N}\theta \right),$$

and this minimum probability is attained at each segment when \(\left\vert \psi \right\rangle\) is given by Equation (37) and \(P=\left\vert \psi \right\rangle \left\langle \psi \right\vert\). Because, in this case, the evolution lies in the two-dimensional space spanned by \(\left\vert {\pm }_{H}\right\rangle\), the result is a Markov chain with transition matrix

$$A(k)=\bar{A}=\left(\begin{array}{cc}{p}^{* }&1-{p}^{* }\\ 1-{p}^{* }&{p}^{* }\end{array}\right),\forall k\in [N],$$

and A(k) = I for k > N.

Therefore the probability of the state remaining in Im(P) after N steps of the chain is \({\bar{A}}_{0,0}^{N}\), or the first diagonal element of the matrix \(\bar{A}\) after raising it to the N-th power. Applying diagonalization on \(\bar{A}\), we obtain

$${\bar{A}}_{0,0}^{N}=\frac{1+{(2{p}^{* }-1)}^{N}}{2}.$$
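The closed form above can be compared against both the N-th power of the transition matrix and a direct density-matrix simulation of the worst case. The sketch below assumes an example diagonal H with eigenvalues −1, 0.3 and 2; the function names are hypothetical.

```python
import numpy as np

def closed_form(gap, theta, N):
    """1/2 + 1/2 * (2 p* - 1)^N with p* = cos^2(gap * theta / (2 N))."""
    p_star = np.cos(gap * theta / (2 * N)) ** 2
    return 0.5 + 0.5 * (2 * p_star - 1) ** N

def markov_chain(gap, theta, N):
    """First diagonal element of the N-th power of the transition matrix."""
    p_star = np.cos(gap * theta / (2 * N)) ** 2
    A_bar = np.array([[p_star, 1 - p_star], [1 - p_star, p_star]])
    return float(np.linalg.matrix_power(A_bar, N)[0, 0])

def simulated(H, theta, N):
    """Simulate N measured segments with |psi> = |+_H> and P = |psi><psi|."""
    w, V = np.linalg.eigh(H)
    psi = (V[:, -1] + V[:, 0]) / np.sqrt(2)
    P = np.outer(psi, psi.conj())
    Q = np.eye(len(w)) - P
    U = (V * np.exp(-1j * w * theta / N)) @ V.conj().T
    rho = P.copy()
    for _ in range(N):
        rho = U @ rho @ U.conj().T
        rho = P @ rho @ P + Q @ rho @ Q   # non-selective measurement
    return float(np.real(np.trace(P @ rho)))

H = np.diag([-1.0, 0.3, 2.0]).astype(complex)  # example spectrum, gap = 3
gap, theta, N = 3.0, 2.0, 7                    # |theta| <= pi * N / gap holds
values = (closed_form(gap, theta, N), markov_chain(gap, theta, N), simulated(H, theta, N))
```

All three entries of `values` are expected to agree up to floating-point error, because the worst-case evolution never leaves the span of the extremal eigenvectors.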

We now proceed to derive Theorem 1 using the above lemmas.

Proof of Theorem 1

For all \(\theta \in {\mathbb{R}}\), such that

$$| \theta | < \frac{N}{{\xi }_{\max }-{\xi }_{\min }},$$

it follows that

$$\begin{array}{rcl}{\cos }^{2}\left(\frac{{\xi }_{\max }-{\xi }_{\min }}{2N}\theta \right)&\ge &{\left(1-\frac{1}{2}{\left[\frac{\theta ({\xi }_{\max }-{\xi }_{\min })}{2N}\right]}^{2}\right)}^{2}\\ &\ge &1-\frac{{\left[\theta ({\xi }_{\max }-{\xi }_{\min })\right]}^{2}}{4{N}^{2}}.\end{array}$$

If we combine this result with Lemma 2, then we obtain

$$\begin{array}{rcl}\frac{1}{2}+\frac{1}{2}{\left[2{p}^{* }\left(\frac{\theta }{N}\right)-1\right]}^{N}&\ge &\frac{1}{2}+\frac{1}{2}{\left(1-\frac{{\left[\theta ({\xi }_{\max }-{\xi }_{\min })\right]}^{2}}{2{N}^{2}}\right)}^{N}\\ &\ge &\frac{1}{2}+\frac{1}{2}\exp \left(-\frac{{\left[\theta ({\xi }_{\max }-{\xi }_{\min })\right]}^{2}}{2N}\right)\end{array}$$

To lower bound this by 1 − δ, we can choose N as stated in Theorem 1. Note that to ensure Equation (41) we must have

$$\frac{{\left[\theta ({\xi }_{\max }-{\xi }_{\min })\right]}^{2}}{N} \, < \, N,$$

and thus

$$\frac{1}{2}+\frac{1}{2}\exp \left(-\frac{{\left[\theta ({\xi }_{\max }-{\xi }_{\min })\right]}^{2}}{2N}\right) > \frac{1}{2}+\frac{1}{2}\exp \left(-\frac{N}{2}\right).$$

At the minimum value of N (N = 1), we have

$$\frac{1}{2}+\frac{1}{2}\exp \left(-\frac{1}{2}\right)\, \lesssim \, 0.81.$$

Proof of Corollary 1


For simplicity, consider a single block of size m:

$${{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})={\left[{{{{{{{\mathcal{P}}}}}}}}\mathop{\prod }\limits_{j = 1}^{m}{e}^{-i({\theta }_{j}/N){H}_{j}}\right]}^{N}.$$

First, suppose that the elements of \({\{{H}_{j}\}}_{j = 1}^{m}\) do not all pairwise commute. Then, according to ref. 70, Proposition 9:

$${\left\Vert \mathop{\prod }\limits_{j = 1}^{m}{e}^{-i({\theta }_{j}/N){H}_{j}}-{e}^{-i\mathop{\sum }\nolimits_{j = 1}^{m}({\theta }_{j}/N){H}_{j}}\right\Vert }_{2}\le \frac{1}{2{N}^{2}}\mathop{\sum }\limits_{j=1}^{m}{\left\Vert \left[\mathop{\sum }\nolimits_{{j}^{{\prime} } = j+1}^{m}{\theta }_{{j}^{{\prime} }}{H}_{{j}^{{\prime} }},{\theta }_{j}{H}_{j}\right]\right\Vert }_{2}$$

This implies that

$$\begin{array}{rcl}{\left\Vert {{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})-{\left[{{{{{{{\mathcal{P}}}}}}}}{e}^{-i\mathop{\sum }\nolimits_{j = 1}^{m}({\theta }_{j}/N){H}_{j}}\right]}^{N}\right\Vert }_{2}&\le &\frac{1}{2N}\mathop{\sum }\limits_{j=1}^{m}{\left\Vert \left[\mathop{\sum }\limits_{{j}^{{\prime} } = j+1}^{m}{\theta }_{{j}^{{\prime} }}{H}_{{j}^{{\prime} }},{\theta }_{j}{H}_{j}\right]\right\Vert }_{2}\\ &\le &\frac{{\left[\mathop{\sum }\nolimits_{j = 1}^{m}| {\theta }_{j}| \right]}^{2}\mathop{\max }\nolimits_{j}{\left\Vert {H}_{j}\right\Vert }_{2}^{2}}{N}.\end{array}$$

Therefore, by the triangle inequality,

$${\left\Vert {P}_{{{{{{{{\mathcal{G}}}}}}}}}{{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})\left\vert \psi \right\rangle \right\Vert }_{2}^{2}\le {\left({\left\Vert {P}_{{{{{{{{\mathcal{G}}}}}}}}}{\left[{{{{{{{\mathcal{P}}}}}}}}{e}^{-i\mathop{\sum }\nolimits_{j = 1}^{m}({\theta }_{j}/N){H}_{j}}\right]}^{N}\left\vert \psi \right\rangle \right\Vert }_{2}+\frac{{\left[\mathop{\sum }\nolimits_{j = 1}^{m}| {\theta }_{j}| \right]}^{2}{\max }_{j}{\left\Vert {H}_{j}\right\Vert }_{2}^{2}}{N}\right)}^{2}.$$
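The single-step Trotter bound that drives this estimate can be checked numerically. The following sketch assumes three random 4-dimensional Hermitian generators and arbitrary angles; it evaluates the error of one product-formula step against the commutator bound and against the looser bound in terms of the sum of |θj|. The measurement is omitted here, since only the Trotter step itself is being tested.

```python
import numpy as np

def evolve(H, t):
    """Return exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

def spectral_norm(A):
    return float(np.linalg.norm(A, 2))

def trotter_step_error_and_bound(hams, thetas, N):
    """Error of prod_j exp(-i theta_j H_j / N) versus exp(-i sum_j theta_j H_j / N),
    together with the commutator bound (1 / 2N^2) sum_j ||[sum_{j'>j} ..., theta_j H_j]||."""
    d, m = hams[0].shape[0], len(hams)
    product = np.eye(d, dtype=complex)
    for th, H in zip(thetas, hams):
        product = product @ evolve(H, th / N)
    total = sum(th * H for th, H in zip(thetas, hams))
    error = spectral_norm(product - evolve(total, 1.0 / N))
    commutator_sum = 0.0
    for j in range(m):
        later = sum((thetas[k] * hams[k] for k in range(j + 1, m)),
                    np.zeros((d, d), dtype=complex))
        A = thetas[j] * hams[j]
        commutator_sum += spectral_norm(later @ A - A @ later)
    return error, commutator_sum / (2 * N**2)

rng = np.random.default_rng(7)
hams = []
for _ in range(3):
    A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    hams.append((A + A.conj().T) / 2)
thetas = [0.7, -0.4, 1.1]
N = 3

error, bound = trotter_step_error_and_bound(hams, thetas, N)
loose_bound = sum(abs(th) for th in thetas) ** 2 * max(spectral_norm(H) for H in hams) ** 2 / N**2
```

Accumulating the per-step bound over the N steps gives the 1/(2N) and 1/N prefactors that appear in the estimate above.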

If we choose

$$N=\left\lceil\frac{4{\left[\mathop{\sum }\nolimits_{j = 1}^{m}| {\theta }_{j}| \right]}^{2}\mathop{\max }\nolimits_{j}{\left\Vert {H}_{j}\right\Vert }_{2}^{2}}{\ln {\left(1-\delta \right)}^{-2\alpha }}\right\rceil,$$

then for α≤1, Theorem 1 with Remark 1 implies that the out-of-constraint probability is at most

$${\left\Vert {P}_{{{{{{{{\mathcal{G}}}}}}}}}{{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})\left\vert \psi \right\rangle \right\Vert }_{2}^{2}\le \frac{\delta }{2}+\alpha \frac{\sqrt{\delta }}{2}\ln {\left(1-\delta \right)}^{-2}+\frac{{\alpha }^{2}}{16}{\ln }^{2}{\left(1-\delta \right)}^{-2}$$
$$\le \frac{\delta }{2}+\frac{\delta }{2}\left[\alpha +\frac{{\alpha }^{2}}{8}\right],$$

where δ ≤ 0.19. If α = 0.89, then

$${\left\Vert {P}_{{{{{{{{\mathcal{G}}}}}}}}}{{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})\left\vert \psi \right\rangle \right\Vert }_{2}^{2} \, < \, \delta .$$

To compensate for the decay of the success probability after L blocks, each Nk must be multiplied by L.

Lastly, for the asymptotic dynamics, from Equations (29) and (30) we get

$$\begin{array}{rcl}{{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})&=&{\left[{{{{{{{\mathcal{P}}}}}}}}\mathop{\prod }\limits_{j = 1}^{m}\left({{{{{{{\rm{I}}}}}}}}-i({\theta }_{j}/N){H}_{j}+O({N}^{-2})\right)\right]}^{N}\\ &=&{\left[{{{{{{{\mathcal{P}}}}}}}}\left({{{{{{{\rm{I}}}}}}}}-i\mathop{\sum }\limits_{j = 1}^{m}({\theta }_{j}/N){H}_{j}+O({N}^{-2})\right)\right]}^{N}\end{array}$$
$$\mathop{\longrightarrow} \limits^{N\to \infty }{e}^{-i\mathop{\sum }\nolimits_{j = 1}^{m}{{{{{{{\mathcal{P}}}}}}}}{H}_{j}{\theta }_{j}}{{{{{{{\mathcal{P}}}}}}}}={e}^{-i{{{{{{{\mathcal{P}}}}}}}}{{{{{{{\boldsymbol{H}}}}}}}}\cdot {{{{{{{\boldsymbol{\theta }}}}}}}}}{{{{{{{\mathcal{P}}}}}}}}.$$

Thereby the dynamics are described by the Zeno Hamiltonian \({{{{{{{{\boldsymbol{H}}}}}}}}}_{{{{{{{{\boldsymbol{Z}}}}}}}}}={{{{{{{\mathcal{P}}}}}}}}{{{{{{{\boldsymbol{H}}}}}}}}\), where \({{{{{{{\mathcal{P}}}}}}}}\) acts element-wise on the vector \({{{{{{{\boldsymbol{H}}}}}}}}={({H}_{1},\ldots ,{H}_{m})}^{{\mathsf{T}}}\). The limiting dynamics of L blocks is the product of these limits.

If the elements of \({\{{H}_{j}\}}_{j = 1}^{m}\) pairwise commute, then there is no Trotter error, and α = 1 without the need to halve δ. The limiting dynamics follows trivially as well.

Realizing oracles for combinatorial constraints

In this Section, we review the constructions of quantum oracles for implementing polynomial inequality and equality constraints. We use the constructions provided in this Section in the experiments on a trapped-ion quantum computer described in the Results Section. Since any function on the Boolean cube can be expressed as a polynomial, it suffices to demonstrate constructions for polynomial constraints71. In addition, since we are considering problems in NPO, we can assume the existence of a polynomially-sized classical circuit for evaluating any constraint to sufficient precision. Given that all classical basis gates can be represented as polynomials, we can represent our constraint as the composition of polynomially many polynomial functions. Of course, one could also directly and efficiently implement the classical circuit in a reversible fashion on a quantum device. For the remainder of this Section, we consider a polynomial function g:

$$g({{{{{{{\boldsymbol{b}}}}}}}})=\mathop{\sum }\limits_{k=1}^{K}{d}_{k}\mathop{\prod}\limits_{l\in {S}_{k}}{b}_{l},$$

where \({S}_{k}\subseteq [n]\) and \({d}_{k}\in {\mathbb{R}}\). In addition, for \({S}_{k}=\varnothing\), \({\prod }_{l\in {S}_{k}}{b}_{l}:= 1\).

Without loss of generality, we can assume that equality constraints are of the form g(b) = 0 and inequality constraints are of the form g(b) ≥ 0. We assume that there exists an oracle that computes the value of g(b) into a quantum register (constructions of such oracles are briefly reviewed in the Methods Section). For an equality constraint, we implement the constraint-enforcing measurement by simply measuring the entire register. A projection onto the in-constraint subspace implies that we have observed a 0 in the register. For an inequality constraint, we measure the qubit corresponding to the sign, where a 0 corresponds to a successful projection, and apply the inverse of the oracle after the measurement.
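To make the in-constraint projection concrete, the sketch below evaluates a polynomial g classically on every bitstring and builds the corresponding diagonal projector. It is a brute-force illustration for small n, not the oracle construction itself. The representation of g as a list of (d_k, S_k) pairs and the Hamming-weight example are assumptions, and integer coefficients are used so that the test g(b) = 0 is exact.

```python
import itertools
import numpy as np

def evaluate_polynomial(terms, b):
    """g(b) = sum_k d_k * prod_{l in S_k} b_l, with terms given as (d_k, S_k) pairs.
    An empty S_k contributes the constant d_k, since all(()) is True."""
    return sum(d * int(all(b[l] for l in S)) for d, S in terms)

def feasible_projector(terms, n, kind):
    """Diagonal projector onto {b : g(b) = 0} for kind='eq'
    or onto {b : g(b) >= 0} for kind='ineq'."""
    diagonal = []
    for b in itertools.product((0, 1), repeat=n):  # b[0] is the most significant bit
        g = evaluate_polynomial(terms, b)
        feasible = (g == 0) if kind == "eq" else (g >= 0)
        diagonal.append(1.0 if feasible else 0.0)
    return np.diag(diagonal)

# Hypothetical example: g(b) = b0 + b1 + b2 - 1 on n = 3 bits.
terms = [(1, (0,)), (1, (1,)), (1, (2,)), (-1, ())]
P_eq = feasible_projector(terms, 3, "eq")      # Hamming weight exactly one
P_ineq = feasible_projector(terms, 3, "ineq")  # Hamming weight at least one
feasible_eq = [i for i in range(8) if P_eq[i, i] == 1.0]
```

For the equality constraint, the feasible basis states are 001, 010 and 100, which correspond to indices 1, 2 and 4 in this ordering.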

While the above procedure works in general, there are further optimizations that can be made by utilizing quantum conditional logic (QCL). We give an example of such an optimization in the Results Section. Further optimizations are possible for double-sided inequalities of the form 0 ≤ g(b) < a, where a is a power of 2. To implement the measurement corresponding to this double-sided inequality, we only need to measure higher-order bits. Since the results of these high-order bits are now classical, we can replace the part of the inverse-oracle circuit controlled on these bits with classically-conditioned single-qubit gates. Lastly, because all constraint-preserving measurements can be implemented separately and thus auxiliary qubits can be reused, the required number of auxiliary qubits to implement all constraint-preserving measurements is equal to the maximum amount of auxiliary qubits required by any oracle call.
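The claim about double-sided inequalities can be illustrated classically. The sketch below assumes that the register holds an integer in m-bit two's complement and that a = 2^r; under those assumptions, 0 ≤ g < a holds exactly when bits r through m − 1 (including the sign bit) are all zero. The function names are illustrative.

```python
def twos_complement_bits(value, m):
    """Bits of value in m-bit two's complement, least-significant bit first."""
    return [(value >> i) & 1 for i in range(m)]

def satisfies_double_sided(value, m, r):
    """Decide 0 <= value < 2**r by inspecting only bits r, ..., m-1."""
    return all(bit == 0 for bit in twos_complement_bits(value, m)[r:])

m, r = 5, 2  # example register size and bound a = 2**r = 4
results = {v: satisfies_double_sided(v, m, r) for v in range(-2 ** (m - 1), 2 ** (m - 1))}
```

Only the m − r high-order bits enter the decision, which is why only those qubits need to be measured.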

In the subsections that follow, we present efficient constructions of oracles that can be used to implement polynomial functions. Both of these use techniques that have been presented in prior work. Here we include a brief review for completeness and present the resource analysis for our setting.

Review of classical reversible arithmetic circuits

The design of reversible versions of classical arithmetic circuits has been extensively explored and highly optimized constructions are available72,73,74. Such constructions allow one to implement unitary operations for performing arithmetic on quantum registers. Consider fixed-point arithmetic of m bits including digits both before and after the decimal point. Suppose polynomial g has K terms. For each coefficient dk, we require an n-qubit controlled m-bit adder. A controlled m-bit adder can be implemented with O(m) T gates75. A multi-controlled Toffoli can be implemented with a T count of O(n)76,77, and thus the overall multi-controlled adder can be implemented with a T count of O(n + m). The T count for implementing g is therefore O(K(n + m)).

Review of quantum Fourier arithmetic

For smaller quantum devices, a more resource efficient approach is to switch to the Fourier basis using the quantum Fourier transform (QFT) and perform the arithmetic in the Fourier basis. This approach has worse asymptotic complexity in terms of T-gate counts, but requires fewer qubits and CNOT gates. We use this approach in the hardware experiments discussed in the Results Section. The discussion in this Section is based on ref. 21, though the idea of using the QFT for quantum arithmetic is well-known, see e.g.78,79,80.

For \(s\in [{2}^{m}]\), the QFT on \({{\mathbb{Z}}}_{{2}^{m}}\) is defined as follows:

$${{{{{{{{\rm{QFT}}}}}}}}}_{{2}^{m}}:\left\vert s\right\rangle \mapsto \frac{1}{\sqrt{{2}^{m}}}\mathop{\sum}\limits_{k\in [{2}^{m}]}{e}^{-i2\pi ks/{2}^{m}}\left\vert k\right\rangle .$$

It can be shown81 that the right-hand side of (57) is a product state and can be expressed in the following form:

$$\mathop{\bigotimes }\limits_{k=1}^{m}\frac{\left\vert 0\right\rangle +{e}^{-i\pi \frac{s}{{2}^{m-k}}}\left\vert 1\right\rangle }{\sqrt{2}}={F}_{m}\left(\frac{s}{{2}^{m}}\right){\left\vert +\right\rangle }^{\otimes m},$$


where

$${F}_{m}(\theta ):= \mathop{\bigotimes }\limits_{k=1}^{m}R(\pi {2}^{k}\theta )$$

implements the desired operation. In addition, R(α) denotes the phase gate \(\left\vert 0\right\rangle \left\langle 0\right\vert +{e}^{i\alpha }\left\vert 1\right\rangle \left\langle 1\right\vert\). The angle θ is restricted to \(\left[-\frac{1}{2},\frac{1}{2}\right)\) to avoid overflow and allow for representing negative numbers. Thus, when implementing a polynomial g, we require that its range match the range of θ, i.e., \(\parallel g{\parallel }_{\infty }\le \frac{1}{2}\). This can always be satisfied by scaling g accordingly.

As an example, we can add two integers a and b, with the conditions \(a,b,a+b\in \{-{2}^{m-1},\ldots ,0,\ldots ,{2}^{m-1}-1\}\), as follows:

$${{{{{{{{\rm{QFT}}}}}}}}}_{{2}^{m}}^{{{{\dagger}}} }{F}_{m}\left(\frac{a}{{2}^{m}}\right){F}_{m}\left(\frac{b}{{2}^{m}}\right){\left\vert +\right\rangle }^{\otimes m}=\left\vert a+b\right\rangle .$$

Note, the value in the quantum register is really the two’s complement of a + b. We define the following controlled operation:

$${F}_{m}({{{{{{{\boldsymbol{b}}}}}}}},\theta ):= \left\vert {{{{{{{\boldsymbol{b}}}}}}}}\right\rangle \left\langle {{{{{{{\boldsymbol{b}}}}}}}}\right\vert \otimes {F}_{m}(\theta )+(I-\left\vert {{{{{{{\boldsymbol{b}}}}}}}}\right\rangle \left\langle {{{{{{{\boldsymbol{b}}}}}}}}\right\vert )\otimes I,$$

where \({{{{{{{\boldsymbol{b}}}}}}}}\in {{\mathbb{B}}}^{n}\). For \({S}_{k}\subseteq [n]\), let \({{{{{{{{\boldsymbol{1}}}}}}}}}_{{S}_{k}}\in {{\mathbb{B}}}^{n}\) denote the indicator vector of Sk. The process for (approximately) loading the value of the polynomial (56) into a quantum register is:

$$(I\otimes {{{{{{{{\rm{QFT}}}}}}}}}_{{2}^{m}}^{{{{\dagger}}} })\mathop{\prod }\limits_{k=1}^{K}{F}_{m}({{{{{{{{\boldsymbol{1}}}}}}}}}_{{S}_{k}},{d}_{k})\left\vert {{{{{{{\boldsymbol{b}}}}}}}}\right\rangle {\left\vert +\right\rangle }^{\otimes m}=\left\vert {{{{{{{\boldsymbol{b}}}}}}}}\right\rangle \left\vert \tilde{g}({{{{{{{\boldsymbol{b}}}}}}}})\right\rangle ,$$

where, by the assumption on the range of g, \(| \tilde{g}({{{{{{{\boldsymbol{b}}}}}}}})-g({{{{{{{\boldsymbol{b}}}}}}}})| \le {2}^{-m}\). The result is stored in an auxiliary quantum register of size O(m). The operation Fm(b, θ) requires m rotation gates, each controlled on n qubits. Thus, overall the procedure requires Km such n-controlled rotation gates. An O(n)-controlled Toffoli can be implemented with O(n) T gates76,77, and each controlled rotation can be ϵ-approximately implemented with \(O(\log (1/\epsilon ))\) T gates82,83. Thus, assuming a fixed rotation-gate approximation error, the total cost is O(Kmn).
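The Fourier-basis arithmetic can be emulated with small matrices. The sketch below uses several assumptions: the phase operator is represented directly as a diagonal matrix on the m-qubit register, which stands in for F_m up to the sign and qubit-ordering conventions; the control is applied whenever all bits in S_k equal 1, so that each monomial is reproduced; and the coefficients are chosen as multiples of 2^−m so that the result is exact. The function names and example polynomial are hypothetical.

```python
import numpy as np

def qft_matrix(m):
    """QFT on Z_{2^m}: |s> -> 2^{-m/2} sum_k exp(-2 pi i k s / 2^m) |k>."""
    M = 2**m
    k = np.arange(M)
    return np.exp(-2j * np.pi * np.outer(k, k) / M) / np.sqrt(M)

def fourier_phase(m, theta):
    """Diagonal operator multiplying |k> by exp(-2 pi i k theta)."""
    return np.diag(np.exp(-2j * np.pi * np.arange(2**m) * theta))

def read_register(m, state):
    """Apply the inverse QFT and return the most likely basis-state index."""
    return int(np.argmax(np.abs(qft_matrix(m).conj().T @ state)))

def fourier_add(m, values):
    """Add integers in the Fourier basis; the result is an index modulo 2^m."""
    M = 2**m
    state = np.full(M, 1 / np.sqrt(M), dtype=complex)  # |+>^{(x) m}
    for v in values:
        state = fourier_phase(m, v / M) @ state
    return read_register(m, state)

def load_polynomial(m, terms, b):
    """Load g(b) = sum_k d_k prod_{l in S_k} b_l, assumed to lie in [-1/2, 1/2)."""
    M = 2**m
    state = np.full(M, 1 / np.sqrt(M), dtype=complex)
    for d, S in terms:
        if all(b[l] for l in S):        # classical stand-in for the control
            state = fourier_phase(m, d) @ state
    index = read_register(m, state)
    signed = index - M if index >= M // 2 else index  # decode two's complement
    return signed / M

m = 4
sum_index = fourier_add(m, [3, -5])     # -2 is stored as 14 in two's complement
terms = [(0.25, (0,)), (0.125, (0, 1)), (-0.25, (2,))]  # hypothetical g
g_110 = load_polynomial(m, terms, (1, 1, 0))
g_001 = load_polynomial(m, terms, (0, 0, 1))
```

Because the phases accumulate additively, the order in which the terms are applied does not matter, and negative results appear in the register in two's complement form.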

The operation \({{{{{{{{\rm{QFT}}}}}}}}}_{{2}^{m}}\) requires \(O({m}^{2})\) gates to be implemented exactly81 and can be implemented approximately, for a fixed approximation error, on a fault-tolerant device with \(O(m\log (m))\) T gates83. For equality constraints, since we will be measuring the entire register containing the value \(\tilde{g}({{{{{{{\boldsymbol{b}}}}}}}})\), we swap the coherent implementation of the inverse QFT for the semiclassical variant84,85. This semiclassical version of the QFT replaces all two-qubit gates with classically-controlled single-qubit gates and requires only a single auxiliary qubit that is repeatedly measured and reset to compute the bits of \(\tilde{g}({{{{{{{\boldsymbol{b}}}}}}}})\). Thus, this approach benefits from both mid-circuit measurements and QCL. A fault-tolerant version of this circuit can be approximately implemented with \(O(m\log (m))\) T gates86. Thus, in a fault-tolerant setting the overall T count of the QFT-based approach is \(O(Kmn+m\log (m))\).

Initial state construction

Our proposed approach is flexible with regard to the choice of the initial state: any initial state that is in-constraint suffices. Thus, unlike ref. 20, when using the complete-graph mixer our approach does not require repeated applications of a unitary and its inverse for preparing the uniform superposition of in-constraint states. However, the initial state we use in experiments discussed in the Results Section is the uniform superposition over all computational basis states encoding in-constraint solutions. In general, this superposition is hard to prepare. However, there exist constructions for a wide range of practically relevant cases. If the set of feasible solutions is efficiently indexable, ref. 24 (Section IIIB) gives an efficient procedure for the initial state preparation. In the specific case of a Hamming-weight equality or inequality constraint, the uniform superposition over feasible states is a superposition of Dicke states with corresponding Hamming weights, which can be constructed efficiently87. Since our technique does not require the state preparation method to be reversible, we can make use of repeat-until-success schemes.
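For reference, the target state in the Hamming-weight case can be written down classically for small n. The sketch below only builds the state vector by brute force and is not the circuit construction of ref. 87; the function name and the example constraint are assumptions.

```python
import numpy as np

def uniform_feasible_state(n, allowed_weights):
    """Uniform superposition over n-bit strings whose Hamming weight is allowed."""
    weights = np.array([bin(i).count("1") for i in range(2**n)])
    amplitudes = np.isin(weights, list(allowed_weights)).astype(float)
    return amplitudes / np.linalg.norm(amplitudes)

# Hypothetical constraint: Hamming weight between 1 and 2 on n = 4 bits.
state = uniform_feasible_state(4, {1, 2})
support_size = int(np.count_nonzero(state))
```

Here the support contains C(4,1) + C(4,2) = 10 basis states, each with amplitude 1/√10.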

Parameter optimization

The Zeno framework we propose works well with standard techniques used to optimize parameterized quantum circuits. Specifically, as long as each Nr is large enough to ensure the desired minimum in-constraint probability is 1 − δ (c.f. Corollary 1) for the given parameter range, the direction of steepest descent will still result in a circuit with the same minimum in-constraint probability. Here we make an assumption that θ remains bounded throughout optimization, which is a valid assumption in practice. This means that both gradient-based and gradient-free local optimization methods can be used with Zeno-augmented hybrid quantum-classical algorithms. A commonly used way to optimize parameterized quantum circuits is to use the parameter-shift rule88,89 in conjunction with a gradient-based optimizer. We now show that the Zeno framework works efficiently with the parameter-shift rule.

We consider the task of finding a minimum-eigenvalue state of an observable M using a parameterized quantum evolution whose generating Hamiltonians are also unitary, e.g., L-VQE. We utilize the measurement scheme presented in Equation (6) with the condition that mk = 1 for all k. Following arguments similar to those in ref. 88, Section 3, we obtain

$$\begin{array}{rcl}\frac{\partial }{\partial {\theta }_{r}}{{{{{{{\rm{Tr}}}}}}}}\left\{M{{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})\rho {{{{{{{{\mathcal{U}}}}}}}}}_{Z}^{{{{\dagger}}} }({{{{{{{\boldsymbol{\theta }}}}}}}})\right\}&=&\mathop{\sum }\limits_{k=1}^{{N}_{r}}{{{{{{{\rm{Tr}}}}}}}}\left\{{M}_{k}{{{{{{{\mathcal{P}}}}}}}}\frac{{H}_{r}}{{N}_{r}}{e}^{-i\frac{{\theta }_{r}}{{N}_{r}}{H}_{r}}{\rho }_{k}+\,{{\mbox{h.c.}}}\,\right\}\hfill\\ &=&\frac{1}{{N}_{r}}\mathop{\sum }\limits_{k=1}^{{N}_{r}}\left[{{{{{{{\rm{Tr}}}}}}}}\left\{M{{{{{{{{\mathcal{U}}}}}}}}}_{Z}^{+(r,k)}\rho {{{{{{{{\mathcal{U}}}}}}}}}_{Z}^{{{{\dagger}}} ,+(r,k)}\right\}-{{{{{{{\rm{Tr}}}}}}}}\left\{M{{{{{{{{\mathcal{U}}}}}}}}}_{Z}^{-(r,k)}\rho {{{{{{{{\mathcal{U}}}}}}}}}_{Z}^{{{{\dagger}}} ,-(r,k)}\right\}\right],\end{array}$$

where Mk and ρk contain terms that have not been differentiated, and \({{{{{{{{\mathcal{U}}}}}}}}}_{Z}^{\pm (r,k)}\) is the same as \({{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})\) except that the evolution at the \(\mathop{\sum }\nolimits_{t = 1}^{r-1}{N}_{t}+k\)-th step has a phase shift of \(\pm \frac{\pi }{4{N}_{r}}\). Thus, whereas the normal parameter-shift requires two expectation evaluations per parameter, Zeno would require 2Nr. This is the same additional overhead as in the case of a circuit with gates that share parameters.
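The 2Nr evaluation count can be illustrated with a small simulation. The sketch below makes several assumptions: two qubits, the generator H = x ⊗ I (so that H² = I), the observable M = z ⊗ I, a measurement onto the Hamming-weight-one subspace and its complement, and a single parameter θ shared by N segments. It uses the general two-point rule on a single segment angle φ, ∂f/∂φ = [f(φ + s) − f(φ − s)]/sin(2s), which reduces to a plain difference for s = π/4. The normalization of the rule depends on how the shift is parameterized.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

H = np.kron(X, I2)                            # generator with H @ H = I
M = np.kron(Z, I2)                            # example observable
P = np.diag([0, 1, 1, 0]).astype(complex)     # Hamming-weight-one subspace
Q = np.eye(4, dtype=complex) - P
rho0 = np.zeros((4, 4), dtype=complex)
rho0[1, 1] = 1.0                              # initial state |01>

def expectation(phis):
    """Tr(M rho) after segments exp(-i phi_k H), each followed by the measurement."""
    rho = rho0
    for phi in phis:
        U = np.cos(phi) * np.eye(4) - 1j * np.sin(phi) * H
        rho = U @ rho @ U.conj().T
        rho = P @ rho @ P + Q @ rho @ Q
    return float(np.real(np.trace(M @ rho)))

def zeno_parameter_shift(theta, N, shift=np.pi / 4):
    """Gradient with respect to theta using 2N shifted expectation evaluations."""
    base = np.full(N, theta / N)
    gradient = 0.0
    for k in range(N):
        plus, minus = base.copy(), base.copy()
        plus[k] += shift
        minus[k] -= shift
        gradient += (expectation(plus) - expectation(minus)) / np.sin(2 * shift)
    return gradient / N

def finite_difference(theta, N, eps=1e-6):
    f_plus = expectation(np.full(N, (theta + eps) / N))
    f_minus = expectation(np.full(N, (theta - eps) / N))
    return (f_plus - f_minus) / (2 * eps)

theta, N = 0.8, 5
grad_shift = zeno_parameter_shift(theta, N)
grad_fd = finite_difference(theta, N)
```

In this example the expectation value equals cos(2θ/N)^N, so the gradient can also be compared with the analytic value −2 sin(2θ/N) cos(2θ/N)^(N−1).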

It is also easy to see that the gradient is biased towards minimizing \({M}_{{{{{{{{\mathcal{F}}}}}}}}}={P}_{{{{{{{{\mathcal{F}}}}}}}}}M{P}_{{{{{{{{\mathcal{F}}}}}}}}}\), i.e., the in-constraint Hamiltonian, as follows:

$$\begin{array}{rcl}\frac{\partial }{\partial {\theta }_{r}}{{{{{{{\rm{Tr}}}}}}}}\left\{M{{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})\rho {{{{{{{{\mathcal{U}}}}}}}}}_{Z}^{{{{\dagger}}} }({{{{{{{\boldsymbol{\theta }}}}}}}})\right\}&=&{\Pr }_{{{{{{{{\mathcal{F}}}}}}}}}\frac{\partial }{\partial {\theta }_{r}}{{{{{{{\rm{Tr}}}}}}}}\left\{{M}_{{{{{{{{\mathcal{F}}}}}}}}}\frac{{{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})\rho {{{{{{{{\mathcal{U}}}}}}}}}_{Z}^{{{{\dagger}}} }({{{{{{{\boldsymbol{\theta }}}}}}}})}{{\Pr }_{{{{{{{{\mathcal{F}}}}}}}}}}\right\}\\ &&+{\Pr }_{{{{{{{{\mathcal{G}}}}}}}}}\frac{\partial }{\partial {\theta }_{r}}{{{{{{{\rm{Tr}}}}}}}}\left\{{M}_{{{{{{{{\mathcal{G}}}}}}}}}\frac{{{{{{{{{\mathcal{U}}}}}}}}}_{Z}({{{{{{{\boldsymbol{\theta }}}}}}}})\rho {{{{{{{{\mathcal{U}}}}}}}}}_{Z}^{{{{\dagger}}} }({{{{{{{\boldsymbol{\theta }}}}}}}})}{{\Pr }_{{{{{{{{\mathcal{G}}}}}}}}}}\right\},\end{array}$$

where \({\Pr }_{{{{{{{{\mathcal{F}}}}}}}}}\) is the probability of projecting onto \({{{{{{{\mathcal{F}}}}}}}}\) when measuring the parameterized evolution with \({{{{{{{\mathcal{P}}}}}}}}\). Lastly, Corollary 1 can be used to ensure \({\Pr }_{{{{{{{{\mathcal{F}}}}}}}}} \, > \, 1-\delta\).