Space-efficient binary optimization for variational quantum computing

In the era of Noisy Intermediate-Scale Quantum (NISQ) computers it is crucial to design quantum algorithms which do not require many qubits or deep circuits. Unfortunately, most of the well-known quantum algorithms are too demanding to be run on currently available quantum devices. Moreover, even the state-of-the-art algorithms developed for the NISQ era often suffer from high space complexity requirements for particular problem classes. In this paper, we show that it is possible to greatly reduce the number of qubits needed for the Travelling Salesman Problem (TSP), a paradigmatic optimization task, at the cost of having deeper variational circuits. While the focus is on this particular problem, we claim that the approach can be generalized for other problems where the standard bit-encoding is highly inefficient. Finally, we also propose encoding schemes which smoothly interpolate between the qubit-efficient and the circuit depth-efficient models. All the proposed encodings have the same volume up to polylogarithmic factors and remain efficient to implement within the Quantum Approximate Optimization Algorithm framework.


INTRODUCTION
During the past few decades, a wealth of quantum algorithms have been designed for various problems, many of which offered a speedup over their classical equivalents [1][2][3][4] . The theoretical developments have also been complemented by progress on the experimental side. Indeed, the demonstration of quantum supremacy by Google 5 indicates that in the near future useful quantum technology may be available. However, current Noisy Intermediate-Scale Quantum (NISQ) devices are too small and too noisy to implement complicated algorithms like Shor's factorization algorithm 2 or Grover-search-based optimizers 6,7 . This resulted in a new field of quantum computation that focuses on designing new algorithms requiring significantly less noisy qubits.
Optimization is a problem that seems to be particularly suitable for current NISQ devices. In particular, the Variational Quantum Eigensolver (VQE) [8][9][10] seems to be the state-of-the-art algorithm for solving molecule Hamiltonians. Although it can solve optimization problems defined over discrete spaces, socalled combinatorial optimization problems, quantum annealing and Quantum Approximate Optimization Algorithm (QAOA) 11 are considered to be more suitable.
For all of these algorithms, the original combinatorial optimization problem has to be transformed into the Ising model. Typically, one starts with a high-level description, like the Max-Cut problem, where nodes in the graph G = (V, E) have to be colored either red or black, so that certain function is minimized. Then, one has to transform it to a pseudo-Boolean polynomial P fu;vg2E ðb u À b v Þ 2 . Each binary variable (bit) b v denotes the color of the node v in the graph. For example we can choose b v = 1 for red color, and b v = 0 for black color. For quantum algorithms it is convenient to change the representation into Ising model via transformation b v ← (1 − Z v )/2. Here Z v is a Pauli operator acting on a qubit corresponding to the node v. By transforming the original objective function into Hamiltonian, we also change the domain of the problem into the space of quantum states.
Quantum optimization algorithms differ in the way how they solve the problem. Variational Quantum Eigensolver (VQE) is a heuristic algorithm in which the quantum circuit is optimized using classical procedure. More precisely, we are given an ansatz U(θ) which, after fixing the parameter θ, produces a state θ j i ¼ Q kÀ1 i¼0 U i ðθ i Þ 0 j i. The vector θ is optimized using classical optimization procedure like gradient descent, simultaneous perturbation stochastic approximation (SPSA), and other 12,13 , so that θ j i will be localized at high-quality answer. Due to its generality, VQE is commonly used for molecule Hamiltonians, however its usability to the classical optimization problems may be limited.
Quantum annealing theoretically can also be applied for chemistry Hamiltonians, however current machines restrict their usability to combinatorial optimization problems. The algorithm turns the ground state of initial Hamiltonian H mix = ∑ i X i into a ground state of objective Hamiltonian H through adiabatic evolution g(t)H mix + (1 − g(t))H. Adiabatic theorem provides a good premise for high-quality solutions of the problem. Furthermore, while available D-Wave's annealers have thousands of qubits, the topology restrictions may limit the size of tractable problems to cases solvable by classical procedures 14,15 .
Quantum Approximate Optimization Algorithm (QAOA) is a mixture of the methods described above 11 . While quantum annealing is a continuous process, QAOA interchangeably applies both H mix and H for some time. The evolution time is a parameter of the evolution, and as it was in the case of VQE, they are adjusted by external classical procedure. Here the resulting state takes the form where r is the number of levels. The algorithm can be implemented as long as both mixing and objective Hamiltonians can be implemented, which in particular allows for applying it to combinatorial problems. Many studies have been performed to 1 characterize properties of QAOA algorithms, including both rigorous proofs of computational power and reachability properties [16][17][18][19] as well as characterization through heuristics and numerical experiments and extensions of the algorithm [20][21][22][23][24] . While the proposed quantum algorithms can theoretically solve arbitrary combinatorial optimization problems, not all pseudo-Boolean polynomials can be natively considered for current quantum devices. A general pseudo-Boolean polynomial takes the form where α I 2 R defines the optimization problem. The problem of finding optimum of such function is called Higher-Order Binary Optimization (HOBO). An order of such Hamiltonian is a maximum size of S for which α S is nonzero. Current D-Wave machines are restricted to polynomials of order 2, hence if one would like to solve Hamiltonians of higher order, first a quadratization procedure has to be applied, in general at cost of extra qubits 25 . Note that optimization of quadratic polynomials, Quadratic Unconstrained Binary Optimization (QUBO) is NP-complete, hence it encapsulates most of the relevant problems.
The objective Hamiltonian for Max-Cut requires n qubits for graph of order n. Hence it can encode 2 n solutions, which is equal to the number of all possible colorings. However, this is not the case in general. For example for Traveling Salesman Problem (TSP) over N cities, the QUBO requires N 2 bits 26 . However, to store N! permutations only dlog 2 ðN!Þe % N log N bits are needed. We consider this as a waste of computational resources. Unfortunately, in general polynomials with optimal number of qubits have order larger than two, thus we are actually dealing with higherorder binary optimization, which is currently not possible using D-Wave machines.
The idea of using higher-order terms is not new. In fact, in the original work of Farhi et al. 11 , the authors have not restricted the model to two-local model. Furthermore, Hamiltonian of order 4 was used for variational quantum factoring 27 , while Hamiltonian of order l was constructed for Max-l-SAT problem 20 . Since the terms of arbitrary order can be implemented efficiently, QAOA for the problem can reduce the number of required logical qubits. In general, if objective polynomial is of constant order α, then the circuit of depth O n α log α implements the objective Hamiltonian exactly. While the number may be large, it is still polynomial, which makes the implementation tractable. However, even for slowly growing α (say Oðlog nÞ), in general the number of terms grows exponentially, which could be the case for l → ∞ in Max-l-SAT. Note that even an encoding that requires only logarithmic number of qubits has been introduced 28 , however, the minimizer of this encoding does not necessarily map to the minimizer of the original problem. Furthermore, when dealing with unbounded order, one has to be careful when transforming QUBO into the Ising model. Let us consider a polynomial 2 n Q nÀ1 i¼0 b i . Default transformation b i ← (1 − Z i )/2 will produce Hamiltonian P I⊆{0, …, n−1} (−1) |I| Q i2I Z i , which consists of 2 n terms 29 . For this example, one can easily find a better Hamiltonian À P nÀ1 i¼0 Z i , that shares the same global minimizer, however, in general finding such a transformation requires a higher-level understanding of the problem. Note that this is not an issue for constant-order polynomials, as the number of terms is guaranteed to be polynomial even in the worst case scenario.
Despite the potential issues coming from utilizing unboundedorder polynomials, we present a polynomial (encoding) for TSP problem with unbounded order, which can be efficiently implemented using approximately optimal number of qubits. Furthermore, our model requires fewer measurements for estimating energy. We also developed a transition encoding, where one can adjust the improvement in the required number of qubits and circuit's depth. Finally, QAOA optimizes our encoding with similar or better efficiency compared to the state-of-the-art QUBO encoding of TSP.
Current state-of-the-art encoding of TSP problem can be found in the paper by Lucas 26 . Let us consider the Traveling Salesman Problem (TSP) over N cities. Let W be a cost matrix, and b ti be a binary variable such that b ti = 1 iff the i-th city is visited at time t. The QUBO encoding takes the form 26 t¼0 bti btþ1;j: Here A 1 ; A 2 > B max i≠j W ij are parameters that have to be adjusted during the optimization. We also assume N → 0 simplification for the indices. Note that any route can be represented in N different ways, depending on which city is visited at time t = 0. Such redundancy can be solved by fixing that the first city should be visited at time t = 0. Thanks to that, n = (N−1) 2 (qu)bits in total are required.
In the scope of this paper we will take advantage of various quality measures of encodings. First, since the Hamiltonian has to be implemented directly, we prefer encodings with possibly small depth. In this manner, QUBO can be simulated with a circuit of depth OðnÞ using round-robin scheduling, see Fig. 2.
However, QUBO encoding for TSP is inefficient in the number of qubits. Using Stirling formula one can show that dlogðN!Þe ¼ N logðNÞ À N logðeÞ þ ΘðlogðNÞÞ qubits are sufficient to encode all possible routes, which is significantly smaller than N 2 . Note that the number of qubits also has an impact on the volume of the circuit, defined as a product of the number of qubits and the circuit's depth. In case of this encoding, the volume is of order OðN 3 Þ.
In the paper we also consider the required number of measurements to estimate the energy within constant additive error. Instead of estimating each term of the Hamiltonian independently, which has to be done for VQE, we consider the measurement's output as a single sample. This way, using Hoeffding's theorem, QUBO encoding requires OðN 3 Þ measurements.
Finally, we would also like to mention some other approaches of reducing the number of qubits required for optimization problems. The authors of 30 proposed a similar approach of changing the numeric system, however, they have not considered its applications to combinatorial problems, and particularly focused on qudits. The authors of Fuchs et al. 31 independently considered Max-K-Cut, which is adealing with unbounded order, one has to be generalization of Max-Cut with K-partitions of the graph. They obtained a similar reduction from NK to $ N log K qubits, however, they did not provide a transition model which provides a trade-off between the circuit depth and the required number of qubits. Finally, there are known QAOA versions in which one reduces the size of infeasible space for TSP problems based on the generalization of QAOA 20 , however, they still require N 2 number of qubits, which is the same as for the state-of-the-art encoding 32 .

RESULTS Preliminaries
Travelling Salesman Problem is natively defined over the permutations of {0, …, N − 1}. A simple encoding can be defined as follows. We make a partition of all bits into N collections b t , where each collection encodes a city visited in a particular time. Then, for each collection we choose a number encoding that represents the city.
QUBO is an example of such an encoding, where each city is represented by an one-hot vector, see Fig. 1. Instead, each city can be encoded as a number using binary numbering system. Using binary numbering system is a state-of-the-art way to encode inequalities 26 , however, it is new in the context of encoding elements of a feasible space.
The Hamiltonian takes the form Hamiltonian H valid checks whether a vector of bits b t encodes a valid city. For example, for QUBO it checks whether exactly one bit is equal to 1.
Hamiltonian H ≠ verifies whether two collections encode the same city. Note that QUBO encoding falls into this representation by choosing Hamiltonian H δ plays a similar role as H ≠ . If the inputs are different, then both H δ and H ≠ give zeros. If the inputs are the same, then the outputs are nonzero and moreover we expect that the Hamiltonian H δ outputs 1. This is in order to preserve the costs of routes. In case of QUBO we took H δ (b t , i) = b ti . Note that in particular, H δ = H ≠ may be a good choice, however, later we will show that choosing different H ≠ may be beneficial.

Simple HOBO encoding
The simplest encoding is the one in which each collection b t encodes a city in a binary system, see Fig. 1. In this case, for each time we need K :¼ dlog Ne qubits. In total we need $ N logðNÞ qubits, which match the lower bound. Moreover, we have to design H valid in such a way that b t represents the number at most N − 1.
Following Eq. (2), it is easy to note that HOBO defined in a way described above, is of polynomial size. Note that the sum of . Hence in total, we have OðN 4 Þ terms, which implies the polynomial size and depth, and thus volume.
Let us now present an exemplary encoding.
Note that if b 0 is a fixed number like it is in the case of objective function implementation in Eq. (2), then we simply take consecutive bits from binary representation. After transforming the polynomial above into Ising model, one can see that This means that all the Pauli terms are of the form where spins are in pairs with the same index k. This allows reducing the initial estimate OðN 4 Þ of number of Pauli terms into OðN 3 Þ.
Let us estimate the cost of this encoding. As it was previously stated, the number of factors is of order OðN 3 Þ. Using round-robin technique for distributing gates and Gray code for ordering the applications of Pauli ZZ…Z operators, see Figs. 2 and 3, one can show that the depth of the circuit is OðN 2 Þ, which gives us the volume OðN 3 logðNÞÞ, which is almost the same as it was for QUBO. Note that the Gray code scheduling requires additional ⌊N/2⌋ qubits, which does not change the final result. Finally, in order to achieve a similar quality of energy measurements, we need OðN 2 Þ measurements. One can expect that Higher-Order Binary Optimization may lead to difficult landscapes, harder to investigate for optimization algorithm. We have investigated TSP encodings with W ≡ 0 and random W matrices. The results are presented in Fig. 4. One can see that with the same number of Hamiltonians applied, the results are either similar or in favor for higher-order encodings.
Mixed QUBO-HOBO approach While QUBO encoding requires significantly more qubits compared to HOBO, the latter produces much deeper circuits. It is not clear whether the number of qubits or the depth of the circuit will be more challenging, and in fact we claim that both may produce significant difficulties when designing quantum computers.
One can consider a simple mixing of the proposed QUBO and HOBO approaches in the following way: let R ∈ {1, …, N − 1} be a free parameter of our model. Exactly R of collections b t will be encoded as one-hot vectors (in QUBO's fashion), while the remaining N − R collections will be encoded using the binary representation, see Fig. 1d).
Unfortunately, this approach combines flaws of both models introduced before. For R = Ω(N), the mixed approach requires Θ (N 2 ) qubits. On the other hand, for R ¼ OðNÞ the depth of the circuit is the same as in the HOBO approach due to numerous HOBO-encoded b t .
Instead, we propose another encoding. Suppose N = (2 K − 1)L for suitable integers K and L. Each b t consists of KL qubits of the form b tlk . The cities are encoded as follows. First K qubits (first bunch) decodes numbers 0, …, 2 K − 2. The second bunch decodes 2 K − 1, …, 2 ⋅ 2 K − 3, and so on. Note that QUBO and HOBO encodings introduced before are special instances with L = N and L = 1 respectively.
We add the following assumptions, which also define H valid . All bits being zero is an invalid assignment, which is equivalent to ∑ k, l b tlk ≥ 1. This can be forced by using standard techniques for transforming inequalities to QUBO 26 . Secondly, if in some bunch there are nonzero bits, then in all other bunches bits have to be zeros. Note that this assumption is equivalent to the fact that for all l 0 = 0, …, L − 1 we have that either P k b tl0k is zero or P l≠l0 P k b tlk is zero. The Hamiltonian H MIX valid takes the form Here ξ i are additional bits for encoding the first assumption. In total, there will be additional N logðKLÞ N logðNÞ qubits. Now let us define H MIX 1 À ðb t;l;k À b t 0 ;l;k Þ 2 : Note that the first factor checks whether the bunches are nonzero, while the latter is the Kronecker delta implementation as before. A description of the schedule can be found in 40 . We assumed that each b i is defined over 3 qubits. Each gate defined over a pair b i , b j is an implementation of the Hamiltonian defined over these variables. ; N À 1g ! f0; ; L À 1g be a function that outputs a bunch index denoting the i-th city. Then it is enough to apply the Kronecker delta on lðiÞ-th bunch. Hence H MIX δ will take the form where b i k is k-th bit of binary representation of i. Let us now calculate the efficiency of the encoding. We will consider only the scenario k ¼ α logðNÞ for α ∈ (0, 1). First we note that L ¼ N 2 K À1 ¼ ΘðN 1Àα Þ. For the proposed mixed encoding, we need NKL þ N logðKLÞ ¼ ΘðN 2Àα logðNÞÞ qubits. Hamiltonian can be encoded in a circuits of depth OðN 1þα log 2 ðNÞÞ. This finally gives us the volume OðN 3 log 3 ðNÞÞ. All these parameters show a perfect, up to poly-log factors, transition between HOBO and QUBO approaches. Finally, to achieve constant error of energy estimation, we require OðN 3Àα log NÞ max i≠j W ij runs of the circuit.
Optimal encoding So far we assumed that all binary variables are split into collections of binary variables, such that each collection defines a particular time point. We heavily used this assumption, so that the encoding was particularly simple. Therefore, it was implementable on a quantum computer, which is necessary for QAOA. Nevertheless, dropping this assumption can save us from even more qubits.
Let H be a diagonal Hamiltonian. Then ψ h jH ψ j i ¼ P b2f0;1g n E b j bjψ h ij 2 , where E i is the energy value corresponding to the bit string i. Hence, the statistics from the measurements are sufficient to estimate the energy.
Suppose we are given a general combinatorial optimization problem of function f : A ! R, where A is a natural feasible space for the problem. In the case of TSP, A would be a collection of all permutations of some fixed order. Let g : A ! B f0; 1g n be a bijection function where n ¼ dlogðjAjÞe. Letg inv ðbÞ be an extension of g −1 such that it maps some penalty energy E pen > min a2A f ðaÞ, i.e. g inv ðbÞ ¼ g À1 ðbÞδ b2B þ E pen δ b∉B . Then provided that g inv can be efficiently computed, we can use it to estimate the expected energy directly from the measurement's statistics. Since converting binary representation into numbers takes negligible time, it is enough to provide a procedure for numbering elements of A.
We can incorporate this technique to TSP problem as well. In this case, the simplest way is to use a factorial numbering system in which i-th digit starting from the least significant one can be any number between 0 and i − 1. In general The opposite transformation can be done by computing the modulo by consecutive natural numbers. Then such representation can be transformed to permutation via Lehmer codes which, starting with the most significant factoradic digit, takes (k + 1)-th digit of the sequence (0, 1, …, k). The used digit is removed and the procedure repeats for next digit. The taken digits in given order directly encodes a permutation.
Since the procedure described above maps consecutive natural numbers to routes, we require only dlogðN!Þe qubits, which is optimal for each N. Since arbitrary pseudo-Boolean function can be transformed to pseudo-Boolean polynomial, it is as well the case for f∘g inv . Hence there exists a diagonal Hamiltonian representing the same optimization problem. However, in general such encoding may require exponential number of terms, which makes it intractable for QAOA. Hence, for such an approach VQE is at the moment the preferable quantum algorithm. The numbers of required measurements will in general depend on the choice of E pen , however, they can be equal to the length of any route, or N max i≠j W ij . By this we can show that the number of measurements is approximately OðNÞ rangeðWÞ, which is significantly smaller than for any encoding described before.

DISCUSSION
The presented results, summarized in Table 1, show that it is possible to significantly reduce the number of required qubits at the cost of having deeper circuits. Since reducing both the depth and the number of qubits are challenging tasks, we claim that it is necessary to provide alternative representation allowing tradeoffs between the different measures. Our numerical results hint that the increase of the depth might not be that significant for larger system sizes, as one needs fewer levels in the spaceefficient embedded version. Thus, it would be interesting to investigate how many fewer levels one needs in the spaceefficient encoding scheme.
Note that the approach cannot be applied for general problems. For example, the state-of-the-art representation of the Max-Cut problem over a graph of order N requires exactly N qubits. Since the natural space in general is of order 2 N , it seems unlikely to further reduce the number of qubits. However, one can expect Fig. 4 The dependence between the probability of measuring the state in feasible space and the number of levels for Travelling Salesman Problem. Analysis was done for a 3 cities, b 4 cities, c 5 cities for W ≡ 0, and d 3 cities, e 4 cities, f 5 cities for randomly chosen W. For most cases HOBO and QUBO present similar quality, while HOBO clearly outperforms QUBO approach for random W for 4 cities. Vertical line denotes the change of optimization method. For 5 cities with random W we were only capable of estimating up to 10 levels applied due to convergence issues. The area describes the range between the worst and the best cases of the best probabilities over all TSP instances. similar improvements for other permutation-based problems like max-k-coloring problem.
On top of that, while arbitrary HOBO can be turned into QUBO by automatic quadratization techniques, it remains an open question whether there are simple techniques that reduce the number of qubits at the cost of additional Pauli terms. This is due to the fact that quadratization is well-defined: if H : B X ! R is a general pseudo-Boolean function, then quadratic pseudo-Boolean function H 0 : B X B Y ! R is its quadratization iff for all x 2 B X HðxÞ ¼ min y2BY H 0 ðx; yÞ: Note that y does not have any meaning in the context of original problem H. However, when removing qubits from binary function, we may not be able to reproduce the original solution. Thus, such (automatic) procedure requires context of the problem being solved. One could consider recent results showing that QAOA with polynomial depth is exposed to the negative side-effect induced by the noisy system 33,34 . This threatens the possible application of QAOA not only to this model, but also to the state-of-the-art model. However, if we want to compare the efficiency of QAOA with QUBO against HOBO or mixed encoding, we should take into account that smaller systems would likely give more faithful results. This effect is in fact already observed in the current quantum devices: e.g., when comparing superconducting qubits and trapped ion technologies, one can note that while for the former it is much easier to construct larger computers, the latter technology allows us to run deep quantum algorithms with much higher fidelity as long as the memory resources are significantly smaller. Thus, the applicability of our results against the state-ofthe-art model may depend on how the current technologies will evolve in the nearest future, and it is difficult to estimate whether small-depth large-width computation will be more plausible than large-depth short-width ones.
Finally, using 2-complete linear swap network presented in O'Gorman et al. 35 we can implement HOBO on Linear Nearest Neighbour (LNN) topology with the same depth up to polylog overhead, by swapping whole registers b t instead of single qubits. This comes from the fact that swapping two neighboring registers of size $ log N requires only $Oðlog NÞ depth. Furthermore, by using decomposition from Fig. 3a and proper reordering of qubits, one can implement up to $ 2 log N-local Pauli operators with only extra Oðlog NÞ depth. On the other hand, swap network presented in 35 guarantees only depth proportional to the number of qubits for QUBO. For this reason for the QUBO formulation considered in this paper, it gives only OðN 2 Þ depth. This in fact is optimal, as one cannot construct swap network achieving depth N 2−ε for any ε > 0. One can show it as follows: if we take any initial arrangement on the LNN network for QUBO, then all of the spins which interact should be with distance N 2−ε . However, all spins presented in the formulation have to interact with the spins interacting with the first spin (in other words, graph of spin interactions has radius 2 for any node). This would mean that all spins have to be within distance N 2−ε from the first spin, which is not possible since there are N 2 of them. Hence we see that Θ(N 2 ) depth is the best achievable for TSP. Hence for such LNN topology HOBO depth is larger only by poly-logarithmic factors, while having significant reduction on qubits.

The analysis of circuits' depths
Let us begin with HOBO and mixed approaches. According to Eq. (2) we can split all the terms into those defined over pairs ðb t ; b t 0 Þ for t≠t 0 . For pairwise different t 0 , t 1 , t 2 , t 3 , if we have polynomials defined over b t0 ; b t1 , and b t2 ; b t3 , then we can implement them independently. Using roundrobin schedule, we can implement those N 2 polynomials in N − 1 (N) steps for even (odd) N, as it is described in Fig. 2.
Let H be a general Hamiltonian defined over K bits. If we implemented each term independently, then it would require P K k¼1 2i K i Θð2 K KÞ ¼ 2 K K À 1 controlled NOTs according to the decomposition presented in Fig. 3a. Adding single auxiliary qubit and using the decomposition from Fig. 3b, and ordering terms according to Gray code, we can do it using 2 K qubits. Following the reasoning from previous paragraph we can apply only d N 2 e Hamiltonians at once. We have an additional cost of d N 2 e qubits, however reducing the depth cost by K ¼ Θðlog NÞ for both mixed and simple HOBO approaches.
For H HOBO ≠ all terms are formed of pair of spins Z t;k Z t 0 ;k . Here we can again use the Gray code schedule, however each wire in Fig. 3c has a meaning of the presented pair of spins. This means that each application of Pauli term will results in 2 new controlled NOTs, instead of a single one as in Fig. 3d. The situation for mixed formulation is much more complicated, as there are terms which consist of the mentioned pairs of spins plus one spin w/o match, hence the application of Gray code is not straightforward. Hence we did not assume any CNOT simplification.
As far as QUBO is concerned, we have to make separate analysis, since only 2-local terms are present. Note that 1-local terms Z ti can be implemented with circuit of depth 1. Moreover, for each 0 ≤ t < N, terms {Z ti Z tj : 0 ≤ i < j < N} can be implemented with a circuit of depth ≈ N using round-robin schedule. We can similarly implement fZ ti Z t 0 i : 0 t<t 0 <Ng, which implement first two addends of H QUBO . For the last addend we have to implement for each 0 ≤ t < N terms Z t ¼ fZ ti Z tþ1;j : i ≠ jg. For even N, note that we can first implement them for even t, then for odd t, which doubles the depth of the circuit for single t. For odd N we have to include extra layer for t = N − 1, which will only increase the depth by 3/2. Finally, note that Z t ¼ S NÀ1 k¼0 Z t;k , where each Z t;k ¼ fZ ti Z tþ1;iþk : 0 i<Ng can be implemented with circuit of depth 1. Eventually, the depth of Hamiltonian H QUBO is of order Θ(N).
The detailed analysis for each encoding can be found in Supplementary Methods.

Numerical analysis
Below we describe the optimization algorithm used to generate the result presented in Fig. 4. We have emulated the quantum evolution and take an exact expectation energy of the state during the optimization. As a classical subroutine we used a L-BFGS algorithm implemented in Julia's  36 . Independently of the instance of algorithm, we assume that the parameters θ mix for mixing Hamiltonian could be from the interval [0, π]. For objective Hamiltonian we assumed the parameter θ obj will be from [0, R]. For both we assume periodic domain, mainly if π + ε or respectively R + ε was encountered, the parameter was changed to ε, which changed the hypercube domain to hypertorus. Let r be the number of levels of the circuit. For r < 5, each run was started from randomly chosen vector within range of the parameters. For r ≥ 5, we used a trajectories method similar to the one proposed in 37 . First, we optimized the algorithm for r = 5, as described in previous paragraph.
Then, for each r ≥ 5 we took the optimized parameter vectors θ ! ðrÞ mix , θ ! ðrÞ obj of length r, and constructed new vectors θ ! ðrþ1Þ mix , θ ! ðrþ1Þ obj of length r + 1 by copying the last element at the end. We took these vectors as initial points for r + 1. Therefore, we obtained a trajectory of length 11 (6 for Fig. 4f) of locally optimal parameter vectors, one for each r ≥ 5.
Sometimes the algorithm has not converged to the local optimum in reasonable time. We claim that the reason came from periodicity of the domain, which for general TSP breaks the smoothness of the Hamiltonian landscape. We only accepted runs for which: for r < 5 the gradient was below 10 −5 ; for r ≥ 5, for each parameters vector from the trajectory, the gradient was supposed to be below 10 −5 .
Since the energy values for both QUBO and HOBO are incomparable, we decided to present the probability of measuring the feasible solution, i.e. the solution describing a valid route.
Figures a, b, c from Fig. 4 were generated as follows. We took QUBO and HOBO encodings of TSP with W ≡ 0 for QAOA algorithm. One can consider it as a Hamiltonian problem on a complete graph. We took A 1 = A 2 = 1 for both encodings, and R = π (R = 2π) for QUBO (HOBO). For each r = 1, …, 15 we generated 100 locally optimal parameter vectors, and for each r we chose the maximum probability.
Subfigures d,e,f from Fig. 4 were generated as follows. We generated 100 matrices W to be W = X + X ⊤ , where X is a random matrix with i. i. d. elements from the uniform distribution over [0, 1]. For each TSP instance we repeated the procedure as in W ≡ 0 case, however, generating 40 samples for each r. For each TSP instance we chose the maximal probability of measuring the state in the feasible space. The line describes the mean of the best probabilities over all TSP instances. The area describes the range between the worst and the best cases of the best probabilities over all TSP instances.

The number of measurements
For estimating the number of measurements we applied the Hoeffding's inequality. Let X ¼ 1 M P M i¼1 X i be the mean of i.i.d. random variables such that X i ∈ [a, b]. Then PðjX À EXj ! tÞ 2 exp À 2Mt 2 b À a : In our case X is the energy estimation of the energy samples X i . Provided that we expect both probability error and estimation error to be constant, we require M = Ω(b − a). The values of a, b depend not only on the cost matrix W, but also on the form of the encoding and the values of constants A 1 , A 2 , B in Eq. (2). For simplicity, we take the following assumptions when estimating the samples number. First, w.l.o.g. we assume B = 1. Furthermore, we assumed C max i≠j W ij A i C 0 max i≠j W ij where C; C 0 do not depend on N and W. This matches the minimal requirement for QUBO. While various measures on W could be considered, we presented the results in the form f ðNÞ max i6 ¼j W i;j , where f does not depend on W. Furthermore, our analysis for each model is tight in N assuming that max i≠j W ij ; min i≠j W ij ¼ Θð1Þ independently on N. Note that using this assumption, a ! N min i≠j W ij is valid for any correctly chosen A 1 , A 2 .
Furthermore, Hamiltonians H ≠ and H valid are integer-valued, and the spectral gap is of constant order. For QUBO, the spectral gap is at most two, which can be generated by adding superfluous one-bit to any valid encoding. For HOBO, it can be generated by choosing the same number for different b t ; b t 0 . Finally, for the mixed approach we can generate incorrect assignments to slack ξ t,i variables. Theoretically, there is no upperbound for A 1 , A 2 . However, in general it is not encouraged to make them too large, as classical optimization algorithm may focus too much on pushing the quantum state to feasible state instead of optimizing over feasible space. For this reason and to make the presentation of our results simpler, we assumed that A i are of order max i≠j W i;j .
During the optimization, one could expect that the quantum states will finally have large, expectedly 1 − o(1) overlap with the feasible space. Thus, one could expect that the estimated energy will be of typical, and later close to minimal, route. Thus, for these points one could expect that OðNÞ rangeðWÞ samples would be enough to estimate the energy accurately. We agree that it is a valid approach, especially when the gradient is calculated using (f(θ 0 + ε) − f(θ 0 − ε))/(2ε) formula. However, recently a huge and justified effort has been made on analytical gradient estimation, which is calculated based on θ parameters that are far from the current θ 0 point 38,39 . In this scenario, we can no longer assume that the energy will be of the order of the typical route. Thus we believe that our approach for number of measurements estimation is justified.
The detailed analysis for each encoding can be found in the Supplementary Methods.

DATA AVAILABILITY
The datasets generated during and/or analyzed during the current study are available at the link https://doi.org/10.5281/zenodo.5638267.