Introduction

Combinatorial optimization is one of the most promising applications of quantum computers due to its broad applicability in science and industry and the availability of promising quantum algorithms with the potential for speedups over the classical state-of-the-art1,2. A leading quantum algorithm for combinatorial optimization is the quantum approximate optimization algorithm3,4 and its generalization, quantum alternating operator ansatz (QAOA)5. QAOA solves the optimization problem by preparing a parameterized quantum state using a quantum circuit consisting of layers of phase and mixing (mixer) operators applied in alternation, with parameters optimized to extremize a chosen measure of solution quality. QAOA has promising applications in optimization6,7,8,9,10,11,12, finance13,14 and machine learning15,16,17, and has been adapted to be applicable to quantum chemistry18. Among the numerous QAOA variants that have been introduced, our focus is on the widely-studied local Hamiltonian-based QAOA (LH-QAOA)5. In this variant, the phase and mixing operators correspond to the time evolution under phase Hamiltonian HP and mixing Hamiltonian HM, with HM being the sum of polynomially many local terms. Note that HP does not have to be local. In the remainder of the paper, we use QAOA to refer to LH-QAOA.

QAOA has an important connection to adiabatic quantum algorithm (AQA)19,20. AQA prepares ground states of Hamiltonians by performing a slow interpolation between an easy-to-prepare ground state of some simple Hamiltonian and the ground state of the target Hamiltonian. The speed of interpolation is governed by the minimum spectral gap of the instantaneous system Hamiltonian during the evolution. AQA can be applied to optimization problems by choosing an appropriately constructed diagonal Hamiltonian as the target. If the alternating operators in QAOA are chosen to be time evolution with the target and a simple Hamiltonian (e.g., the commonly used transverse field Hamiltonian), and the initial state is set to be the ground state of the simple Hamiltonian, QAOA can approximate the AQA evolution with an approximation error that depends on the number of the alternating layers (QAOA depth).

While the connection between AQA and QAOA is simple and well-known, QAOA is typically used with parameters that are different from the AQA schedule and with a depth that is too small to approximate AQA meaningfully. This creates ambiguity regarding the extent to which the QAOA mechanism is related to AQA, as well as how to leverage the techniques for boosting AQA performance in the QAOA setting. In this paper, we show that in one important aspect, the lessons of the adiabatic algorithm indeed apply clearly. Namely, we show that QAOA performance is improved if the ground state of HM and the initial state are aligned. Specifically, we show that QAOA gives better performance when the initial state matches the ground state of HM compared to other setups. This choice of initial state and HM also aligns with that in the AQA. We refer to this setup as initial-mixer alignment or alignment for short.

We note that in some cases the mixer as well as the ground state of HM are difficult to implement on the quantum hardware. Therefore is may be desirable to use alternative initial states and mixers that are not well-aligned but are easy to implement. Moreover, previous studies have found that the performance of QAOA may be improved by carefully preparing a ‘warm-start’ initial state different from the ground state of HM21,22. In general, it may be hard to modify the mixer to make sure that the warm-start initial state is exactly the ground state of the mixer. These examples motivate the current study of misaligned combinations of mixer and initial state.

In this work, we study QAOA with Hamming-weight-preserving XY mixers6, where the mixing operator is a time evolution governed by Heisenberg XY models23. This variant of QAOA is of particular interest as evidence suggests that it has the potential to provide exponential speedup over unstructured search on certain problems24. We choose the various XY models as the constraint-preserving mixing Hamiltonian in QAOA: ring-XY, complete-XY, and several XY models with arbitrary connectivity. We apply QAOA with XY mixers to the portfolio optimization problem with an equality constraint on the portfolio size, which corresponds to a constraint on the Hamming weight of the binary string. This is a well-studied toy financial problem, which is commonly considered as a benchmark for quantum optimization heuristics14,25,26,27,28,29. Such heuristic algorithms often aim at approximately solving the portfolio optimization problem with a goal of maximizing the approximation ratio. We will follow the same convention and use approximation ratio as the primary metric for evaluating the performance of QAOA. To quantify the impact of alignment on QAOA performance, we design two sets of numerical experiments. First, we compare the performance of various initial-mixer pairs. Here, we isolate the impact of alignment by considering the exact implementation of the mixer. Second, we fix the initial state as the ground state of HM and implement the XY mixer with various fidelities by varying the step of Trotter approximation. This setup highlights the practical considerations when implementing complex mixers with non-commuting terms in HM.

The main conclusion of the paper is illustrated in Fig. 1. Our results show that, in most cases, the alignment boosts the QAOA performance. The only exception is when the mixing XY Hamiltonian is relatively simple, e.g., a chain or ring. In these cases, QAOA performance is more robust under Trotter approximation error. However, given a high Trotter error, a more accurate Trotter approximation still improves the QAOA performance. Across all simulations, we observe a consistent trend in QAOA with both low and high depth. The first set of results clearly shows the alignment effect without considering the circuit implementation.

Fig. 1: An overview of the results.
figure 1

We show that the QAOA performance depends on the alignment between the initial state \(\left\vert {\psi }_{0}\right\rangle\) of QAOA and the ground state of the mixing Hamiltonian HM. Right: the approximation ratios (ARs) obtained by QAOA applied to constrained portfolio optimization with N = 6 assets and Hamming weight constraint K = 3 with the complete-XY mixer and the initial state set to be the ground state of complete-XY (`Aligned') and ring-XY (`Misaligned') mixing Hamiltonian. The error bars represent the standard error of the mean approximation ratio estimated from 10 problem instances of portfolio optimization.

While the improvement in performance from alignment is consistent across the many settings we consider, its absolute effect is relatively small. Therefore, when executing on noisy intermediate-scale quantum era (NISQ) devices, enhancing the alignment at the cost of increased circuit depth is unlikely to improve the results significantly. To illustrate this observation, we apply QAOA with ring-XY mixer to portfolio optimization using all 32 qubits of the Quantinuum H2-1 trapped-ion processor. We observe that even the step-1 Trotter approximation of the ring-XY mixer gives a high-quality solution on hardware, and further improvements in alignment do not significantly increase the solution quality on hardware. This contrasts with the noiseless case, where a more accurate Trotter approximation results in better performance.

To the best of our knowledge, this is also the first study of the impact of mixer Trotterization on QAOA performance. Recent works have developed various techniques to improve the performance of QAOA. Specifically, non-standard initial states have been used, such as the ‘warm-start’ initial state constructed using a solution produced by a classical solver21,22 abd the randomly sampled computational basis states with a given Hamming weight for QAOA with Hamming-weight preserving mixers30. In addition, alternative ansätze have also been proposed, such as initial-state-dependent custom mixers for warm-started QAOA31. Here, we do not aim to propose an optimal ansatz with minimal depth but try to systematically demonstrate the mechanism that the alignment effect from the adiabatic theorem applies to QAOA in the low-depth regime. Therefore, our study will not include techniques in which the initial state and the ground state of the mixing Hamiltonian are purposefully not aligned, such as the ones mentioned above. Beyond the investigation of the QAOA mechanism, we also discuss the techniques we used to improve the convergence of local QAOA parameter optimizers, which may be of independent interest. We note that our results are expected to apply broadly beyond the particular problem considered and may be particularly impactful in applications where the target ground state must be prepared with high fidelity, such as in quantum chemistry18.

Results

Background

In this section, we will briefly review the relevant technical background around the portfolio optimization problem, the quantum alternating operator ansatz (QAOA) and the adiabatic quantum algorithm (AQA). We will also discuss parameter optimization for QAOA and the connection between QAOA and AQA.

Portfolio optimization problem. We focus on the mean-variance portfolio optimization problem32 with objective f given by

$$\begin{array}{c}\mathop{\min }\limits_{{{{\bf{x}}}}\in {\{0,1\}}^{N}}f({{{\bf{x}}}})=q{{{{\bf{x}}}}}^{T}{{{\bf{W}}}}{{{\bf{x}}}}-{{{{\boldsymbol{\mu }}}}}^{T}{{{\bf{x}}}},\\ {{{\rm{s}}}}.{{{\rm{t}}}}.\quad {{{{\bf{1}}}}}^{T}{{{\bf{x}}}}=K,\end{array}$$
(1)

where x {0, 1}N denotes a vector of binary decision variables indicating whether a given asset is included in (1) or excluded from (0) the portfolio, \({{{\boldsymbol{\mu }}}}\in {{\mathbb{R}}}^{N}\) denotes the vector of expected returns for the assets, \({{{\bf{W}}}}\in {{\mathbb{R}}}^{N\times N}\) is the covariance matrix between N assets and q > 0 is a risk factor to balance the importance of risk and return in the objective. The equality constraint corresponds to a fixed budget requiring the manager to pick exactly K assets. This equality constraint is also called a Hamming-weight-preserving constraint since it restricts the Hamming weight of x to a constant K.

Quantum alternating operator ansatz (QAOA). In order to apply QAOA for solving (1), we must define the Hamiltonians in the two alternating operators: HP encoding the classical objective function and HM mixing the probability amplitudes while preserving the constraints.

To encode the objective function (1), we construct a diagonal Hamiltonian HP = diag(f(x)) by mapping each binary variable xi to a quantum spin using xi → (I − Zi)/2, giving

$${{{{\bf{H}}}}}_{P}=\frac{1}{2}q\mathop{\sum}\limits_{i < j}{W}_{ij}{{{{\bf{Z}}}}}_{i}{{{{\bf{Z}}}}}_{j}-\frac{1}{2}\mathop{\sum}\limits_{i}\left(q\mathop{\sum}\limits_{j}{W}_{ij}-{\mu }_{i}\right){{{{\bf{Z}}}}}_{i}+c,$$
(2)

where \(c=\frac{1}{2}{\sum }_{i}(q{\sum }_{j = i}{W}_{ij}-{\mu }_{i})\) is a constant. We denote the time evolution under HP given by \({e}^{-i\beta {{{{\bf{H}}}}}_{P}}\) as the phase operator.

In order to enforce the Hamming weight constraint on the quantum state, we follow refs. 5,6 and use the Heisenberg XY model as the mixing Hamiltonian

$${{{{\bf{H}}}}}_{M}:= {{{{\bf{H}}}}}_{S}^{{{{\rm{XY}}}}}=\mathop{\sum}\limits_{(i,j)\in S}{{{{\bf{X}}}}}_{i}{{{{\bf{X}}}}}_{j}+{{{{\bf{Y}}}}}_{i}{{{{\bf{Y}}}}}_{j},$$
(3)

where S is a set of index pairs describing the interaction among qubits, X and Y are the Pauli matrices. We denote the time evolution under \({{{{\bf{H}}}}}_{S}^{{{{\rm{XY}}}}}\) given by \({e}^{-i\beta {{{{\bf{H}}}}}_{S}^{{{{\rm{XY}}}}}}\) as the XY mixer. The performance and the implementation difficulty of an XY mixer depend on the choice of the connectivity defined by S. Two commonly used XY models are ring-XY and complete-XY. Specifically, the ring-XY model includes one-dimensional nearest-neighbor interactions with a periodic boundary condition, i.e.,

$${S}_{{{{\rm{ring}}}}}=\left\{(i,j)\,| \,j=(i+1)\,{{\mathrm{mod}}}\,\,N;i\in [N]\right\}.$$

On the other hand, the complete-XY model contains interactions between all pairs of qubits, i.e.,

$${S}_{{{{\rm{complete}}}}}=\left\{(i,j)\,| \,i \,<\, j;i,j\in [N]\right\}.$$

It is easy to see that the evolution with the XY mixers preserves the Hamming weight. In other words, if we start from a superposition of states of Hamming weight K, the measurement outcomes of the final state are also guaranteed to have Hamming weight K.

For a given pair of HP and HM, QAOA with depth p consists of the following three steps. First, QAOA prepares a feasible initial state \(\left\vert {{{{\boldsymbol{\psi }}}}}_{0}\right\rangle\). Then the phase operator and the mixer are applied p times to obtain the state

$$\left\vert {{{\boldsymbol{\psi }}}}({{{\boldsymbol{\gamma }}}},{{{\boldsymbol{\beta }}}})\right\rangle ={e}^{-i{\beta }_{p}{{{{\bf{H}}}}}_{M}}{e}^{-i{\gamma }_{p}{{{{\bf{H}}}}}_{P}}\ldots {e}^{-i{\beta }_{1}{{{{\bf{H}}}}}_{M}}{e}^{-i{\gamma }_{1}{{{{\bf{H}}}}}_{P}}\left\vert {{{{\boldsymbol{\psi }}}}}_{0}\right\rangle ,$$
(4)

where γ and β are vectors of free parameters obtained using some classical procedure. Finally, the state \(\left\vert {{{\boldsymbol{\psi }}}}({{{\boldsymbol{\gamma }}}},{{{\boldsymbol{\beta }}}})\right\rangle\) is measured in the computational basis to obtain solutions to the original problem.

QAOA is typically used as a hybrid quantum-classical algorithm wherein a classical optimizer is used to optimize the parameters γ and β to minimize the energy of HP. We denote it as unrestricted optimization:

$$\mathop{\min }\limits_{{{{\boldsymbol{\gamma }}}},{{{\boldsymbol{\beta }}}}}\langle {{{\boldsymbol{\psi }}}}({{{\boldsymbol{\gamma }}}},{{{\boldsymbol{\beta }}}})| {{{{\bf{H}}}}}_{P}| {{{\boldsymbol{\psi }}}}({{{\boldsymbol{\gamma }}}},{{{\boldsymbol{\beta }}}})\rangle .$$
(5)

Usually, the parameter optimization is nontrivial since the energy landscape is known to contain many local optima. Many advanced methods have been developed for QAOA training33,34,35,36. We will discuss our techniques for accelerating the parameter optimization in the method section.

When p is large, parameter optimization becomes hard. Restricting the QAOA parameters can allow faster parameter optimization for large p. For example, it has been shown that good solution quality can be achieved with reduced optimization complexity by using a linear ramp schedule for the QAOA parameters given by4,18,37,38,39

$$\gamma (l)=\Delta l,\quad \beta (l)=\Delta (1-l),$$
(6)

where Δ is a constant and l (0, 1). In QAOA with depth p, the linear schedule may be applied with the QAOA parameters for each layer set as follows:

$$\begin{array}{l} {\gamma }_{i}=\gamma ({l}_{i}),{\beta }_{i}=\beta ({l}_{i})\quad {{{\rm{with}}}}\\ {l}_{i}=\frac{i}{p+1},\quad \forall i=1,2,\ldots ,p.\end{array}$$
(7)

With Δ > 0 and p → , the QAOA approaches the adiabatic limit40. In this case, if the initial state is the ground state of HM, the resulting final state will converge to the ground state of HP18.

The linear schedule can be optimized by setting Δ to be a free parameter39. We denote this setting as the optimized linear schedule (OLS). Specifically, in QAOA with OLS, we fix the depth to a large value (e.g., p = 100) and optimize Δ to minimize the energy:

$$\mathop{\min }\limits_{\Delta }\langle {{{\boldsymbol{\psi }}}}({{{\boldsymbol{\gamma }}}}(\Delta ),{{{\boldsymbol{\beta }}}}(\Delta ))| {{{{\bf{H}}}}}_{P}| {{{\boldsymbol{\psi }}}}({{{\boldsymbol{\gamma }}}}(\Delta ),{{{\boldsymbol{\beta }}}}(\Delta ))\rangle .$$
(8)

Compared with (5) with 2p variables, QAOA with OLS has only one free parameter Δ to optimize regardless of p, and hence is much easier to search for the optimum. Given the effectiveness of OLS in large-depth QAOA where regular QAOA parameter optimization becomes intractable18,41, we include QAOA with OLS in this study.

Adiabatic quantum algorithm (AQA). AQA20,42 prepares the ground state of some target Hamiltonian by performing adiabatic evolution. Specifically, it proceeds from an initial Hamiltonian whose ground state is easy to prepare to a final Hamiltonian whose ground state encodes the solution to the computational problem.

For a system evolving under a time-dependent Hamiltonian H(t), its time-evolution is governed by the Schrödinger equation

$$i\frac{\partial \left\vert {{{\boldsymbol{\psi }}}}(t)\right\rangle }{\partial t}={{{\bf{H}}}}(t)\left\vert {{{\boldsymbol{\psi }}}}(t)\right\rangle .$$
(9)

The quantum adiabatic theorem guarantees that if the initial state \(\left\vert {{{\boldsymbol{\psi }}}}(0)\right\rangle\) is the ground state of H(0) and H(t) varies sufficiently slowly with t, the quantum state \(\left\vert {{{\boldsymbol{\psi }}}}(t)\right\rangle\) will remain in the ground state of the instantaneous Hamiltonian H(t) for all t.

Connection between QAOA and AQA. QAOA has important connections to AQA and its non-adiabatic variant. ref. 43 shows that with a sufficiently large depth, QAOA with optimal angles can become a digitization of quantum annealing. ref. 44 applies optimal control theory to solving the protocol for controlling the Hamiltonian evolution in both quantum annealing and QAOA. The optimal QAOA parameter schedule matches the optimal control protocol for AQA45. ref. 46 shows that both AQA with tuned scaling and QAOA with an optimal control protocol can solve a quantum linear system problem. An analog version of the QAOA by parameterizing and optimizing the schedule function is proposed in ref. 47. ref. 48 shows the possibility of running QAOA in a customized device with the digital analog paradigm. ref. 49 derives the lower bound of annealing time beyond the adiabatic regime.

While adiabatic evolution is a promising approach for quantum optimization, it suffers from potential non-adiabatic transitions between eigenstates of the system at time points where the Hamiltonian has small energy gaps and it is often infeasible for near-term devices due to noise and limited coherence times. Counterdiabatic driving is a method that compensates for the non-adiabatic effects by adding an additional term to the evolved Hamiltonian. The counterdiabatic evolution has been shown to improve adiabatic quantum optimization in50. In addition, the counterdiabatic term and counterdiabatic-inspired ansatz have also been found to benefit QAOA performance51,52,53,54.

Motivated by the connection between QAOA and AQA, the initial state \(\left\vert {{{{\boldsymbol{\psi }}}}}_{0}\right\rangle\) is typically set to be the ground state of the mixing Hamiltonian HM3. However, in many cases, either the ground state of HM or HM itself is difficult to implement exactly. The behavior in the adiabatic regime suggests that if HM or the initial state is not implemented exactly (meaning that the initial state is not aligned with the ground state of HM), the QAOA performance may be affected. However, the impact of such alignment on the performance of low-depth QAOA has received little attention to date. In this work, we systematically study this alignment effect and demonstrate that it can significantly benefit QAOA performance far from the adiabatic limit, even in the low-depth regime.

In this section, we describe the results from numerical simulations applying QAOA to ten portfolio optimization problem instances with the number of assets N = 6. Unless otherwise specified, we set p = 1, 2, 3 for unrestricted QAOA given by Eq. (5) and set p = 100 for QAOA with OLS (8). All the circuits were simulated using the qiskit_aer_statevector simulator. To optimize both QAOA parameter schedules, we use the BFGS optimizer built in the Scipy55 package, running with multiple initial guesses (50-250 depending on the problem dimension).

Given a solution x (portfolio selection) to the problem, we use approximation ratio (AR) to quantify the quality of the solution, defined as

$${{{\rm{AR}}}}({{{\bf{x}}}})=\left\{\begin{array}{ll}\frac{f({{{\bf{x}}}})-{f}_{\max }}{{f}_{\min }-{f}_{\max }},\quad &{\sum }_{i}{x}_{i}=K,\\ 0,\quad &{\sum }_{i}{x}_{i}\,\ne \,K,\end{array}\right.$$
(10)

where \({f}_{\min }\) and \({f}_{\max }\) are the maximum and minimum value of f(x) among all feasible portfolios, i.e.,

$$\begin{array}{l}{f}_{\min }\,=\,\mathop{\min }\limits_{{\sum }_{i}{x}_{i}=K}f({{{\bf{x}}}}),\\ {f}_{\max }\,=\,\mathop{\max }\limits_{{\sum }_{i}{x}_{i}=K}f({{{\bf{x}}}}).\end{array}$$
(11)

Alignment effect with exact mixers

To investigate the alignment effect between the initial state and the ground state of HM, we conduct numerical simulations comparing circuits with different pairs of initial states and exact mixers. Our simulations studied various XY mixers, including the exact ring-XY mixer, complete-XY mixer, and arbitrary mixers that will be explained later. The exact mixers are implemented by a unitary operator constructed from directly exponentiating the corresponding mixing Hamiltonian. Correspondingly, we prepare the initial state by assigning it as the ground state of a mixing Hamiltonian.

Exact ring-XY and complete-XY mixers. We first look at the exact ring and complete mixers for comparison. We separate the results by the combination of initial state and mixer type, and we label such combinations by ‘S-H’ pairs. For example, we use Scomplete − Hring to denote that the initial state is the ground state of a complete-XY mixing Hamiltonian whereas the mixing Hamiltonian is a ring-XY model.

As shown in Fig. 2, for all p studied, the Scomplete − Hcomplete pair gives significantly better AR than the Scomplete − Hring pair, which aligns with the results reported in a previous study30. Similarly, the Sring − Hring pair performs better than the Sring − Hcomplete pair. This indicates that alignment between the initial state and the ground state of HM improves the QAOA performance for these cases, enabling the algorithm to converge more effectively to high-quality solutions.

Fig. 2: Comparisons of the exact ring- and complete-XY mixers in QAOA with unrestricted optimization at p = 1, 2, 3 and with the OLS method at p = 100.
figure 2

We reported the mean approximation ratios over 10 instances with N = 6 and K = 3. The error bars represent standard errors of the mean. The alignment enhances performance in both low and high-depth QAOA.

We note that it is not meaningful to directly compare the solution AR given by the unrestricted QAOA (5) with that from QAOA with OLS (8) since they have different depths and parameter schedules. The OLS method will gradually converge to the global optimum with a high enough depth (as shown in Fig. 3). The performance of the linear schedule could be less regular at a relative small p (like p = 100 and 200 for Scomplete − Hcomplete), which is referred as the ridge region in41. For more detailed discussions on the linear parameter schedule, we refer the readers to ref. 41. We only show the p = 100 results in Fig. 2 and in the following sections as a sanity check to demonstrate that the alignment effect holds in QAOA with both a low and high depth. We also note that the performance improvement does not result from the warm-start effect, as the Scomplete − Hcomplete pair consistently outperforms the Sring − Hcomplete pair, and the Sring − Hring pair also consistently performs better than the Scomplete − Hring pair. This means that given different mixers, there is no such a fixed best initial state.

Fig. 3: An example demonstrating the convergence of the OLS method for QAOA with exact mixers with the instances from Fig. 2.
figure 3

For both mixers, the initial states are aligned. As the QAOA depth increases, the final state of the OLS method (8) will gradually converge to the ground state of the problem Hamiltonian HP. The complete-XY mixer needs a larger depth to converge with the OLS schedule. The error bars represent standard errors of the mean approximation ratios.

Exact mixers with arbitrary connectivity

Next, we investigate the impact of alignment on some XY mixers that have arbitrary connectivity beyond ring and complete. XY models can be viewed as graphs with edges (i, j) representing the indices (i, j) of the interacting qubits. To satisfy the Hamming wight constraint of a solution, we have significant freedom to select edges and construct different variants of XY model. Here, we introduce an option for constructing mixers by selecting chains.

For a complete-XY Hamiltonian with N qubits (suppose N is even), we can decompose the interaction terms as a summation of N/2 chains:

$${{{{\bf{H}}}}}_{{S}_{{{{\rm{complete}}}}}}^{{{{\rm{XY}}}}}=\mathop{\sum }\limits_{v=1}^{N/2}{{{{\bf{H}}}}}_{{C}_{v}}^{{{{\rm{XY}}}}}=\mathop{\sum }\limits_{v=1}^{N/2}\mathop{\sum}\limits_{(i,j)\in {C}_{v}}{{{{\bf{X}}}}}_{i}{{{{\bf{X}}}}}_{j}+{{{{\bf{Y}}}}}_{i}{{{{\bf{Y}}}}}_{j},$$
(12)

where Cv is the set of qubit indices in a chain. Inspired by the above decomposition, we can construct different XY mixing Hamiltonians by selecting a subset of chains from the complete graph. Notably, there are many possible ways to decompose a complete graph into chains, each of which may have different implications for QAOA performance. In our simulations, we arbitrarily select a decomposition of a 6-node complete graph into 3 chains as shown in Fig. 4. This approach allows us to compare the performance of QAOA with and without initial-mixer alignment across a range of XY mixers. We expect our conclusions to hold for other XY mixers as well.

Fig. 4: An example of the six-qubit complete-XY model.
figure 4

The complete graph is constructed by three separate chains, denoted by different colors and line styles.

Figure 5 illustrates the comparison of results from XY mixers constructed using the decomposition in Fig. 4, with the x-axis and y-axis indicating the initial state and mixing Hamiltonian labels, respectively. For example, the label S12 on the x-axis indicates that the initial state is the ground state of the mixing Hamiltonian built with chain-1 and chain-2, while the label H12 on the y-axis indicates that the mixing Hamiltonian is built with chain-1 and chain-2. Our results show that the ARs of diagonal pairs (i.e., with initial-mixer alignment) are significantly better than the non-diagonal pairs (without initial-mixer alignment), complementing the results observed for the exact ring-XY and complete-XY mixers. It suggests that the alignment effect applies to a wide range of XY mixers. Specifically, we found that the alignment effect is more pronounced for simpler mixers with less connectivity, such as those constructed using a single chain.

Fig. 5: Comparisons of exact XY-mixers in QAOA with unrestricted optimization at p = 2 and with OLS at p = 100.
figure 5

The heatmaps display the average AR over the 10 instances considered with N = 6 and K = 3. The mixers are constructed using one or two chains, as shown in Fig. 4. The alignment improves performance in both low and high-depth QAOA, as the diagonal pairs outperform others in the corresponding row and column.

Alignment effect with trotterized mixers

Next, we explore the alignment effect in the practical circuits. To achieve it, we must decompose the mixing operator \({e}^{-i\beta {{{{\bf{H}}}}}_{M}}\) into a series of 1-qubit and 2-qubit gates. One of the widely used approaches is Trotterization. In the following, we will first describe the Trotterization procedure for various XY mixers, and discuss how the resulting Trotter error can impact the quality of the solution.

Mixer Trotterization. For ease of notation, we define XYi = XiXj + YiYj, where \(j=(i+1)\,{{\mathrm{mod}}}\,\,N\). For a one-dimensional (i.e., chain or ring) mixer, we use the popular parity partition strategy to Trotterize it:

$${e}^{-i\beta {\sum }_{i}{{{{\rm{XY}}}}}_{i}}\approx {\left[\mathop{\prod}\limits_{j{{{\rm{is}}}}{{{\rm{odd}}}}}{e}^{-i\frac{\beta }{T}{{{{\rm{XY}}}}}_{j}}\mathop{\prod}\limits_{j{{{\rm{is}}}}{{{\rm{even}}}}}{e}^{-i\frac{\beta }{T}{{{{\rm{XY}}}}}_{j}}\right]}^{T},$$
(13)

where T denotes the number of Trotter steps, called the Trotter number.

When constructing a mixer using multiple XY chains, we apply a two-level approximation strategy. Firstly, we Trotterize the chains in sequential order with a Trotter number T1. Secondly, we apply the parity partition strategy within each chain with a Trotter number T2. Given a mixing Hamiltonian constructed by k chains, \(\mathop{\sum }\nolimits_{v = 1}^{k}{{{{\bf{H}}}}}_{{{{{\rm{C}}}}}_{v}}^{{{{\rm{XY}}}}}\), we approximate its unitary as follows:

$$\begin{array}{l}{e}^{-i\beta \mathop{\sum }\nolimits_{v = 1}^{k}{{{{\bf{H}}}}}_{{{{{\rm{C}}}}}_{v}}^{{{{\rm{XY}}}}}}\approx {\left[\mathop{\prod }\limits_{v = 1}^{k}{e}^{-i\frac{\beta }{{T}_{1}}{{{{\bf{H}}}}}_{{{{{\rm{C}}}}}_{v}}^{{{{\rm{XY}}}}}}\right]}^{{T}_{1}}\quad {{{\rm{with}}}}\\ {e}^{-i\frac{\beta }{{T}_{1}}{{{{\bf{H}}}}}_{{{{{\rm{C}}}}}_{v}}^{{{{\rm{XY}}}}}}\approx {\left[\mathop{\prod}\limits_{j = {{{\rm{odd}}}}}{e}^{-i\frac{\beta }{{T}_{1}{T}_{2}}{{{{\rm{XY}}}}}_{j}}\mathop{\prod}\limits_{j = {{{\rm{even}}}}}{e}^{-i\frac{\beta }{{T}_{1}{T}_{2}}{{{{\rm{XY}}}}}_{j}}\right]}^{{T}_{2}}.\end{array}$$
(14)

The choices of T1 and T2 control the Trotter error. Given a Hamiltonian H = H1 + H2 with evolution time t, the commutator-type error bound for its first-order Trotter approximation is as follows56:

$$\parallel {e}^{it{{{\bf{H}}}}}-{e}^{it{{{{\bf{H}}}}}_{1}}{e}^{it{{{{\bf{H}}}}}_{2}}{\parallel }_{2}\le \frac{{t}^{2}}{2}\parallel \left[{{{{\bf{H}}}}}_{1},{{{{\bf{H}}}}}_{2}\right]{\parallel }_{2}.$$
(15)

Intuitively, the spectral norm of the commutator between two chain Hamiltonians \({{{{\bf{H}}}}}_{{{{{\rm{C}}}}}_{v}}^{{{{\rm{XY}}}}}\) will be significantly larger than the one between the two parity-partitioned parts of one chain Hamiltonian. Therefore, in our implementation, we fix T2 = 1 and adjust T1 to control the approximation accuracy. Generally, a T-step Trotterization will enlarge the mixer circuit for T times. However, using a large Trotter number can be computationally costly, even in simulation. Therefore, we implement the Trotterized mixer operators with steps up to six, which we find is sufficient for our purposes.

In the following simulations analyzing the impact of alignment on QAOA performance, we will fix the initial state as the ground state of the exact mixing Hamiltonian and approximate the mixer operator via different Trotter numbers. While some XY mixers can be implemented exactly, such as the ring-XY mixer can be realized by diagonalization57 or other algebraic compression techniques58, and a 2m-sized complete-XY mixer (with m being a positive integer) can be realized efficiently for a Hamming-weight K = 1 problem6, we chose to use the Trotterization to implement all the mixers in our simulations. The reason for this choice is that the Trotterization is more flexible and allows us to analyze the QAOA performance under mixers with various approximation accuracies more easily. It is worth noting that there also exist other Trotter strategies56,59,60, which are out of the scope of this paper.

Trotterized ring-XY and complete-XY mixers. First, we focus on the Trotterized ring-XY and complete-XY mixers. Fig. 6 shows the QAOA performance under different approximated mixers with low and high QAOA depths. In the case of the complete-XY mixer, an increase in the Trotter number consistently enhances QAOA performance, converging to the Scomplete − Hcomplete results depicted in Fig. 2. For the ring-XY mixer, QAOA performance exhibits greater robustness in terms of the Trotter number. However, an initially more accurate mixer continues to contribute to performance improvement, such as the performance observed when increasing Trotter number from 1 to 2. The results of low-depth QAOA parameter optimization and high-depth linear schedule simulations show a consistent relationship, specifically, their AR results follow the same trend with respect to Trotter number.

Fig. 6: Comparisons of Trotterized ring- and complete-XY mixers in QAOA with unrestricted optimization at p = 1, 2, 3 and with the OLS method at p = 100.
figure 6

We report the mean approximation ratio over 10 instances with N = 6 and K = 3 with error bars denoting the standard errors of the mean estimation. A larger Trotter number consistently results in better performance for the complete mixer. However, for QAOA with an approximated ring-XY mixer, we observe a more robust QAOA performance.

We hypothesize that the distinct behavior observed between Trotterized ring-XY and complete-XY mixers arises from the intricacy of their respective mixing structures. To substantiate this hypothesis, we conduct subsequent numerical experiments employing Trotterized variants of XY mixers.

We also demonstrate the alignment effect in noisy simulation via the Quantinuum’s H2-1 device emulator. Considering the practical circuit implementation, we prepare the circuits with the Dicke state (a uniform superposition over bitstrings with a fixed Hamming weight) and Trotter-step-1 approximated ring-XY and complete-XY mixers. We report both the noisy and noiseless simulation results in Fig. 7. We observe that for this problem with the total number of 2-qubit gates less than 200, given an initial Dicke state, the alignment effect still holds where the complete-XY mixer achieves better performance than the ring-XY mixer in the presence of realistic noise.

Fig. 7: Given an initial Dicke state, the comparisons of Trotter-step-1 approximated ring- and complete-XY mixers in QAOA at p = 1, 2, 3.
figure 7

We report the mean approximation ratio over 10 instances with N = 6 and K = 3 with error bars denoting the standard errors of the mean estimation. Under the noisy simulation, the complete-XY mixer still outperforms the ring-XY mixer, which demonstrates the alignment effect.

Trotterized arbitrary mixers. Next, we analyze the alignment effect for the Trotterized XY mixers. To investigate the impact of mixer structure, we did the same simulations for six XY mixers, including three whose Hamiltonians are built with two chains and three whose Hamiltonians are built with one chain. As illustrated in Fig. 8, we observe consistent trends between the 2-chain mixers and the complete-XY mixer, as well as between the one-chain mixers and the ring-XY mixer. Based on these observations, we argue that when the initial state aligns with the ground state of the exact mixing Hamiltonian, a more precise implementation of the mixing Hamiltonian with complex connectivity leads to improved performance. In contrast, for a less connected mixing Hamiltonian, a Trotterized implementation with a few steps attains optimal performance, which then stabilizes.

Fig. 8: Comparisons of Trotterized XY mixers in QAOA with unrestricted optimization at p = 1, 2, 3 and with OLS method at p = 100.
figure 8

We report the mean approximation ratio over 10 instances with N = 6 and K = 3 with error bars denoting the standard errors of the mean estimation. A larger Trotter step consistently leads to better performance for mixing Hamiltonians built with two chains. In contrast, for the approximated mixing Hamiltonians built with one chain, the performance improves when moving from Trotter step 1 to 2 and then stabilizes.

In summary, our results demonstrate that the alignment effect positively impacts QAOA performance across both low- and high-depth regimes. This observation was validated through the application of exact and Trotterized mixers on various XY mixers.

Experiments on a trapped-ion quantum processor

While the improvement in performance from alignment between the initial state and the ground state of HM is robust in the noiseless simulation, its absolute value is relatively small. Intuitively, this suggests that noise will likely affect it when executed on near-term hardware. We now demonstrate this intuition by executing QAOA with Trotterized ring-XY mixer on Quantinuum H2-1 trapped-ion processor using 32 qubits.

For the ring-XY mixer, the ground state of the exact Hamiltonian can be difficult to prepare in a quantum circuit, especially on noisy hardware. Therefore, in our experiments, we use the Dicke state as a proxy for the ground state of the exact ring-XY Hamiltonian, and use it as the initial state of the QAOA circuit as well as the target state in evaluating the overlap with the effective ground state. The Dicke state is prepared using the divide-and-conquer approach of61. Figure 9 shows that increasing the Trotter number from 1 to 2 improves the fidelity between the Dicke state and the effective ground state, subsequently improving QAOA performance in the noiseless simulation. However, we do not expect this to hold strictly as the Trotter number increases, where an exact ground state would need to be prepared.

Fig. 9: Left: an example of achieving a larger overlap between the Dicke state and the effective ground state of the Trotterized ring mixer at β = 0.5 by increasing the Trotter number from 1 to 2.
figure 9

The Dicke state is prepared with different N values but a fixed K = 3. Right: the quality of QAOA solution in noiseless simulation. The initial state is prepared as the Dicke state with a fixed K = 5, and the ring-XY mixer is approximated with Trotter numbers 1 and 2. For various problem sizes, we consistently observe a performance improvement when transitioning the Trotter number from 1 to 2.

The QAOA circuit is compiled to H2-1 and optimized using pytket62, resulting in the total numbers of 2-qubit gates of 1, 159 for T = 1 and 1, 223 for T = 2. We note that the Dicke state preparation needs 581 CNOT gates. As shown in Fig. 10, we observe that in the hardware experiments, the performance of the T = 2 circuit is worse than the T = 1 one. It validates that the improvement of Trotter approximation can be impacted by the hardware noise. However, the hardware results are still significantly better than the random guess, shedding light on the power of advanced quantum devices. The hardware results can be further improved by performing error mitigation techniques such as symmetry verification by parity checks63,64,65.

Fig. 10: Experimental results for N = 32 in trapped-ion quantum hardware: The AR from the random guess (uniform over all feasible solutions) is 0.7801.
figure 10

For ARs from Trotter-number-1 (T = 1) and Trotter-number-2 (T = 2) QAOA, hardware results are 0.8638 and 0.8424, while noiseless simulator results are 0.8808 and 0.8816. The hardware results, though impacted by noise, are significantly better than the random guess. To evaluate ARs, we post-selected feasible solutions in the hardware experiments, selecting 213 and 172 feasible samples out of 2500 shots in T = 1 and T = 2 experiments. The post-selection ratio is significantly better than a random selection of 0.0047%. The p-value of the independent two-sample t-test for the two groups of feasible samples is 0.0344, indicating a significant difference between the means of the two groups. For visualization purposes, we exclude one outlier with AR < 0.35 in the swarm plot of both T = 1 and T = 2 experiments.

Discussion

In this paper, we demonstrate that the alignment effect is impactful even at very small QAOA depth, suggesting a strong connection between QAOA and adiabatic quantum algorithms. We show the evidence of the alignment effect by studying QAOA performance with various XY mixers in two ways: with the exact mixers and varying initial states, and with a fixed initial state and varying fidelity of the mixer implementation. We use portfolio optimization problems as the benchmark, but we expect the findings to apply broadly to other combinatorial optimization problems. To the best of our knowledge, this is the first study of the impact of Trotter approximation error in mixer implementation on QAOA performance. For simple one-dimensional XY mixers, the QAOA performance is relatively robust to the Trotter error. Meanwhile, for the more complicated XY mixers, a larger Trotter number leads to better performance since the effective ground state is approaching the initial state (the ground state of the exact mixing Hamiltonian).

While we show that better alignment improves performance, for small system sizes accessible numerically the absolute value of the improvement is relatively small. For instance, in hardware experiments on the H2-1 device, we do not observe the anticipated improvement in solution quality when increasing Trotter number from 1 to 2. This highlights the centrality of minimizing the circuit depth when executing QAOA on NISQ devices.

Beyond demonstrating the alignment effect, designing constraint-preserving mixers is of independent interest66,67. In a recent paper66, the authors studied the mixer design from the perspective of a transition matrix. For a Trotterized XY mixer, some transitions between feasible states may be suppressed when the Trotter number is not large enough. However, our results show that QAOA performance is not explicitly related to the transition path. For instance, in the Trotterized complete mixers, all the possible transitions between feasible states have been filled with Trotter number one. Meanwhile, the Trotter number still greatly influences the QAOA performance, as shown in Figs. 6 and 8. This underscores the importance of both the connectivity and probability of transitions in mixer design for achieving high performance in QAOA.

Methods

In this section, we introduce and discuss some implementation details that enabled the simulations presented in the results section.

Problem instances selection

We first generate a pool of portfolio optimization problem instances by randomly generating the mean return vector and covariance matrix using RandomDataProvider in qiskit_finance68. To make the performance of different QAOA variants clearly distinguishable, we intentionally select the ‘hard’ instances. These instances are chosen by roughly examining depth-1 QAOA performance with the initial state set to be the Dicke state and Trotter-step-1 approximated ring-XY and complete-XY mixers. Only instances that have relatively low AR (AR < 0.8 for ring-XY and AR < 0.85 for complete-XY) are included in the benchmark. We choose a total of 10 instances as our benchmark: 5 from QAOA with a Trotterized ring-XY mixer and 5 from QAOA with a Trotterized complete-XY mixer.

Improving the trainability

One of the challenges in solving the portfolio optimization problem (1) with QAOA is that due to the non-integer weights (mean returns and covariances between assets) assigned to the problem Hamiltonian terms, the QAOA objective (5) is not periodic. A larger parameter search space will correspondingly require more initial points in a classical numerical optimizer to converge to a high-quality local optimum. Different orders of γ and β, and consequently different orders of their gradients, can also introduce difficulties to the classical optimizer.

To address it, we multiply the objective function (1) by a rescaling factor λ, which is a predefined instance-dependent constant. Such a rescaling factor does not influence the true solution to the problem, but it allows us to control the search range in the energy landscape and rescale the order of γ and its gradient. In QAOA, it is equivalent to scaling the γ to \({{{{\boldsymbol{\gamma }}}}}^{{\prime} }=\lambda {{{\boldsymbol{\gamma }}}}\). In a numerical optimizer, if we fix a bounded search range of γ, such as \({\left[0,2\pi \right]}^{p}\), scaling by λ is equivalent to extending the search range to \({\left[0,2\lambda \pi \right]}^{p}\). In general, we are not guaranteed to find a global optimum in this fixed interval; in fact, adversarial examples can be constructed with a global optimum far from origin34. However, in practice, we observe that this technique always gives a high-quality local optimum. In this paper, we use the following protocol to select the rescaling factor λ:

  • Implement a QAOA with p = 1 for the specific problem instance.

  • Plot the heatmap of the circuit performance by conducting a grid search over γ and β in a bounded range (e.g., \(\left[0,2\pi \right]\) and \(\left[-\frac{\pi }{2},\frac{\pi }{2}\right]\)). A coarse search grid may be used for efficiency.

  • Select the rescaling factor λ by controlling the range of γ to cover the regions on the heatmap with high-quality local optima.

An example of selecting the rescaling factor is depicted in Fig. 11. The selection of λ is not sensitive to the circuit structures discussed in the results section. Similar protocols for rescaling the QAOA objective have been proposed in Refs. 25,34,69. Similarly to previous results, we observe that using one rescaling factor for all p works well. When the problem size is large, we can make use of some advanced QAOA simulators to obtain the expected energy, such as70,71,72.

Fig. 11: This example demonstrates how to select the rescaling factor for one N = 6 instance.
figure 11

The figures from left to right display the depth-1 QAOA energy landscape with gamma search space [0, 2000π], [0, 100π], and [0, 2π]. By applying a rescaling factor and fixing the search space as [0, 2π], they are equivalent to setting the rescaling factors as 1000, 50, and 1, respectively. In this example, a rescaling factor of 50 encompasses high-quality local minima in the landscape.

To study the alignment effect with Trotterized mixers, we try to explore the performance within the same landscape. To avoid the numerical optimizer driving the solution out of the targeted landscape, we fix the same search range of β for all Trotterized implementations and set a hard boundary constraint on the solved γ. In the regular QAOA parameter optimization, for the circuits with the same setup but different Trotter numbers, a solution from one circuit could be a good initial guess for other circuits.

Initial state preparation

Circuit realization. In our alignment effect simulations, we need to prepare the initial state as the ground state of the corresponding mixing Hamiltonian. However, in general, the state preparation circuit for constructing the initial state can be costly to implement. In our implementation, we skip the gate-based circuit realization by assigning an exact state vector as the initial state in the simulator. One special case is the complete-XY mixer. The ground state of its Hamiltonian is the Dicke state, whose efficient circuit implementation is known61,73 but could still be costly in the near term devices. For example, ref. 74 studies fidelity lower bounds of Dicke state preparation on Quantinuum H1 devices.

Determination of the ground state of the mixing Hamiltonian

In the case of the mixer implemented exactly, we can determine the ground state by numerically performing the eigenvalue decomposition on the mixing Hamiltonian. Since we are considering a Hamming weight constraint, we only consider the ground state in the feasible subspace.

However, when the mixer is implemented using Trotter approximation, statements about the spectral properties of the exact mixer may not be valid. Trotterization can make notions like the ‘ground state of the mixing Hamiltonian’ ambiguous. Specifically, if a mixer U(β) is the product of non-commuting operators, its eigenvectors, and consequently the eigenvectors of any Hermitian operator H(β) such that U(β) = eiH(β), become dependent on β. For this reason, we prepare the initial state as the ground state of the exact mixing Hamiltonian and try to approximate the mixer with a larger Trotter number T.

In addition, even for a fixed β, the periodicity of the eigenvalues of U(β) (which are all unit complex numbers) allows each eigenvalue of H(β) to be shifted by a multiple of 2π, while still corresponding to the same eigenvalue and eigenvector of U(β). This phenomenon is also discussed in41. It renders the notion of the ‘smallest eigenvalue and the associated eigenstate’ ill-defined. To quantify the alignment level between the initial state and the mixer, similarly to ref. 75, we define the effective Hamiltonian for a Trotterized unitary operator and its associated effective ground state, based on the intuition from the adiabatic limit. Specifically, the effective Hamiltonian associated with evolution time β is defined as \({{{{\bf{H}}}}}_{{{{\rm{eff}}}}}(\beta )=i\log ({{{\bf{U}}}}(\beta ))\) and the corresponding effective ground state is the eigenstate of Heff(β) that exhibits maximal overlap with the ground state of the exact Hamiltonian. We refer to the value of the maximal overlap as the ‘GS fidelity’. As the Trotter number increases, the effective ground state at each QAOA step should converge to the ground state of the exact Hamiltonian, since the Trotterized mixer becomes more accurate. As demonstrated in Fig. 12, even in the presence of potentially large Trotter error, a small Trotter number is sufficient for the effective ground state to be very close to the ground state of an exact mixing Hamiltonian. In other words, the GS fidelity converges much faster than the Trotter error with respect to the Trotter number. This observation is also reported in75.

Fig. 12: An example of Trotterizing the unitary \({{{\bf{U}}}}={e}^{-i\beta {{{{\bf{H}}}}}_{S}^{{{{\rm{XY}}}}}}\) with N = 6 using different numbers of step with β = 0.5.
figure 12

The blue lines represent the relative error in approximating the unitary, while the orange lines depict the fidelity between the ground states of exact and effective Hamiltonians.