Improving solutions by embedding larger subproblems in a D-Wave quantum annealer

Quantum annealing is a heuristic algorithm that solves combinatorial optimization problems, and D-Wave Systems Inc. has developed hardware implementation of this algorithm. However, in general, we cannot embed all the logical variables of a large-scale problem, since the number of available qubits is limited. In order to handle a large problem, qbsolv has been proposed as a method for partitioning the original large problem into subproblems that are embeddable in the D-Wave quantum annealer, and it then iteratively optimizes the subproblems using the quantum annealer. Multiple logical variables in the subproblem are simultaneously updated in this iterative solver, and using this approach we expect to obtain better solutions than can be obtained by conventional local search algorithms. Although embedding of large subproblems is essential for improving the accuracy of solutions in this scheme, the size of the subproblems are small in qbsolv since the subproblems are basically embedded by using an embedding of a complete graph even for sparse problem graphs. This means that the resource of the D-Wave quantum annealer is not exploited efficiently. In this paper, we propose a fast algorithm for embedding larger subproblems, and we show that better solutions are obtained efficiently by embedding larger subproblems.


Introduction
Combinatorial optimization problems, the minimization of cost functions with discrete variables, have significant real-world applications.The cost function of a combinatorial optimization problem can generally be mapped to the Hamiltonian of a classical Ising model 1 .Inspired by simulated annealing 2 , quantum annealing (QA) 3 was proposed as a method for searching the ground state of a Hamiltonian with a complicated energy landscape.While SA employs thermal fluctuations to escape local minima, QA utilizes quantum fluctuations.Numerous studies have investigated whether QA outperforms SA in terms of the computational time required to obtain a high-accuracy solution.Most of the studies have shown that QA is superior to SA [4][5][6] , while a few have also suggested that it is inferior 7 .Recently, a commercial quantum annealer based on superconducting flux qubits 8 has been developed by D-Wave Systems Inc.Experimental studies using the D-Wave quantum annealer have been performed to compare the performance of QA and SA 6,9 and to demonstrate the applicability of the D-Wave quantum annealer to practical problems [10][11][12] .
The generic form of a time-dependent Hamiltonian in QA is where Ĥ0 is the classical Hamiltonian which represents the cost function to be minimized, and Ĥq is the quantum fluctuation term for which the ground state is trivial.At the beginning of QA, the coefficients of the time-dependent Hamiltonian are set to A(0) = 1, B(0) = 0, and the system is in trivial ground state determined by Ĥq .At the end of QA, the coefficients are set to A(τ) = 0 and B(τ) = 1 where τ is the annealing time.The system evolves according to the Schrödinger equation: where |ψ(t) is a state vector of the system and h is set to 1 for simplicity.The system will remain closed to the instantaneous ground state of the time-dependent Hamiltonian if the system changes sufficiently slowly and if the adiabatic condition 13 , is always satisfied during QA.Here |0(t) , |1(t) , ε 0 (t) and ε 1 (t) are eigen vectors and eigen energies of the instantaneous ground state and first excited state, respectively.Thus, by setting the annealing time τ large enough, we ultimately obtain the ground state of the classical Hamiltonian Ĥ0 , which represents the optimal solution.The current version of D-Wave quantum annealer (D-Wave 2000Q) implements QA with a transverse magnetic field: where N q represents the total number of qubits.A cost function that the D-Wave quantum annealer can handle is: where the interactions between qubits are restricted to Chimera graph, which is constructed as an M × N grid of complete bipartite graphs K L,L 14 .Chimera graph for (M, N, L) = (3, 3, 4) is shown in Fig. 1, where the vertices and edges represent qubits and the interactions between them, respectively.Although the Chimera graph for D-Wave 2000Q is (M, N, L) = (16, 16, 4), the number of operable qubits is less than N q = 2MNL = 2048, since there are defects in the qubits and connectivities.
Limited connectivity between the qubits is a restriction to employing the D-Wave quantum annealer for real-world applications.Before solving an optimization problem, it is necessary to map a problem graph onto a subgraph of the hardware graph.This process is called "minor embedding".The problem graph is defined as a graph in which the vertices and edges represent the logical variables and interactions between them, respectively.The hardware graph is defined as a graph for which the vertices and edges represent the qubits and interactions between them, respectively.It is known to be NP-hard 15 to find an optimal minor embedding of an arbitrary problem graph into an arbitrary hardware graph.There exist various algorithms to find the minor embeddings, and a heuristic algorithm proposed by Cai et al. 16 is the most versatile option so far.While this general algorithm searches for a minor embedding of an arbitrary problem graph into an arbitrary hardware graph, the computational time increases drastically with the number of qubits, especially for sparse problem graphs.To reduce the computational time for the minor embedding, some algorithms that exploit features of the hardware graphs and specific problem graphs have been developed.Although the number of logical variables embeddable into hardware graphs is small, utilizing complete graph embedding 17,18 is the simplest way to reduce the computation time.Complete graph embedding can be applied to arbitrary problem graphs with less than 64 logical variables for a Chimera graph with (M, N, L) = (16, 16, 4) without defects.This is basically the embedding used in qbsolv 19 .Other embedding algorithms 20 can find the minor embedding efficiently in reasonably dense problem graphs by exploiting the bipartite structure of the Chimera graph 21 , and it is possible to embed a larger number of logical variables than for a complete graph embedding 17,18 .A minor embedding of the Cartesian product of two complete graphs, which often appears in real-world optimization problems, has been proposed in the literature 22 .However, efficient embedding algorithms for sparse problem graphs do not exist, despite the fact that sophisticated minor embeddings for sparse problem graphs are more important than for dense problem graphs in order to exploit the potential of the D-Wave quantum annealer.More logical variables can be embedded with shorter-length chains for sparse problem graphs.
In the methods section below, we propose a fast algorithm for embedding larger subproblems based on Cai's algorithm 16 .We do not need to embed all of the logical variables of a problem graph in embedding of subproblems, and the logical variables that can be embedded easily are selected as a part of the subproblem in our proposed algorithm.As a result, the proposed algorithm can embed larger subproblems than complete graph embedding 17,18 , with shorter computational time than the Cai's algorithm 16 , not only for dense problem graphs but also for sparse problem graphs.
In the following section, we show the improvement in solutions achieved by embedding larger subproblems for a ferromagnetic model and a spin-glass model on a three-dimensional ±J Ising model with 1,000 logical variables.Since the  cubic lattice with 1,000 variables is not embeddable into D-Wave 2000Q, we extract embeddable subproblems into D-Wave 2000Q and iteratively optimize them using a quantum annealer like qbsolv.In this study, we utilized two algorithms to embed subproblems into the Chimera graph of D-Wave 2000Q, with few defects in the qubits and connectivities.One is the proposed algorithm, which can embed 380 logical variables, and the other is complete graph embedding 18 , which can embed only 63 logical variables.We have confirmed that better solutions can be achieved with less number of iterations for both the ferromagnetic and spin-glass models by embedding large subproblems.

Results
In this section, we demonstrate that better solutions are obtained efficiently by embedding larger subproblems for the ferromagnetic and spin-glass models on the three-dimensional ±J Ising model with 1, 000 logical variables The optimization process implemented in this study is shown in Fig. 2. The problem graph is the cubic lattice with 10 × 10 × 10 logical variables.We partition the original large problem into subproblems and then iteratively optimize the subproblems using a quantum annealer.Two algorithms are utilized to embed subproblems in this study.One is the proposed algorithm, which can embed 380 logical variables, and the other is a complete-graph embedding 18 , which can embed only 63 logical variables into the Chimera graph of the D-Wave 2000Q with few defects in the qubits and connectivities.After embedding and optimizing the subproblem, the logical variables of the subproblem are updated in the output of D-Wave 2000Q.Then, a greedy algorithm is executed by a conventional digital computer to get to an exact (local) minimum.In this greedy algorithm, one logical variable is randomly selected, and it is flipped if the energy decreases.We finish refining the solution using the greedy algorithm if the energy change caused by flipping each logical variable is completely non-negative.Finally, the best solution obtained by this procedure is updated.These processes are iterated, and we confirm that the solutions are improved by embedding larger subproblems.
The Hamiltonian optimized in this study is shown below: where x i ∈ (−1, +1) represents a logical variable, J i j is the interaction between nearest neighbors in the cubic lattice, and p F is the probability that J i j = +J, the anti-ferromagnetic interaction.We evaluated solutions for a ferromagnetic model with p F = 0.0 and a spin-glass model with p F = 0.5 23 .The ferromagnetic model has no frustration, so that  Carlo algorithms 24 address domain walls well by flipping logical variables in the same domain simultaneously.Although the structures of clusters in these algorithms 24 are different from those of the subproblems extracted by the proposed algorithm, we expect that domain walls can be eliminated efficiently by embedding larger subproblems.In the spin-glass model with p F = 0.5, there are many frustrations, and the energy landscape is rugged with many local minima.In order to obtain better solutions, it is essential to search for as many local minima as possible.By embedding larger subproblems, the phase space searched by optimizing the subproblem grows exponentially, and it is possible to search for better local minima that could be distant from the current solution in the phase space.As a result, we expect that better solutions can be obtained efficiently by embedding larger subproblems for both the ferromagnetic and the spin-glass models.
The energies obtained for p F = 0.0 and p F = 0.5 are shown in Figs.3(a) and 3(b), respectively.The average energies for 32 trials are plotted, and the same initial states are used for each run [p F = 0.0 and 0.5, with the two embedding algorithms illustrated in Fig. 2].For both p F = 0.0 and p F = 0.5, lower energies are obtained with a smaller number of iterations by embedding larger subproblems.The ground state energy for p F = 0.0 is −3 and the ground state is obtained for all the trials after 45 iterations.

Discussion
In the present paper, we showed that better solutions are obtained efficiently by embedding larger subproblems for the spinglass and ferromagnetic models on the cubic lattice with 10 × 10 × 10 logical variables.The energy landscape of the spin-glass model is rugged with many local minima.It is essential to search for as many local minima as possible, and this can be achieved by embedding larger subproblems, for which the phase space is exponentially larger than that of small subproblems.For the ferromagnetic model, although there are no frustrations and a trivial ground state exists, eliminating the domain walls from which single-spin-flip algorithms suffer is essential to obtain the ground state.The logical variables in small domains can be flipped simultaneously by embedding larger subproblems, and as a result the ground state can be obtained efficiently with a smaller number of iterations.Although we demonstrated the improvements in the solutions specifically for the spin-glass and ferromagnetic models on a cubic lattice, we expect that better solutions can be obtained efficiently for a wide range of optimization problems by embedding larger subproblems.
For practical applications, it is essential to utilize the D-Wave quantum annealer as a part of an iterative solver like qbsolv, as long as the problem size embeddable in the D-Wave quantum annealer remains limited.A hybrid use of classical algorithms and the D-Wave quantum annealer is inevitable for this scheme.Although we simply adopted a greedy algorithm as a classical optimization algorithm, a myriad of classical algorithms can be combined with the D-Wave quantum annealer 19,25,26 .One guideline for selecting a classical solver is to exploit the complementary advantages of QA and classical algorithms 19 .For example, a more versatile optimization algorithm may be constructed by combining QA and SA 27 , since QA performs well for the energy landscape with many high and thin barriers, while SA efficiently explores the phase space with low and wide barriers 28 .
Although QA was initially proposed as an optimization method, the D-Wave quantum annealer has recently been considered as a sampling machine.It has been assumed that the output of D-Wave quantum annealer is close to a Boltzmann distribution of the Hamiltonian at a freeze-out point during annealing 29 , and applications that utilize the quantum annealer as a sampling machine have been reported [30][31][32][33] .In addition, a local search around a specific initial state using the D-Wave quantum annealer has been proposed in the literature 27 and it is implemented in D-Wave 2000Q.This is called "reverse annealing" 34 .
By combining reverse annealing and the embedding algorithm proposed in this paper, it may be possible to implement Markov chain Monte Carlo methods efficiently for large problems.
In future work, we will evaluate the utility of embedding larger subproblems for various optimization problems and construct high-performance optimization algorithms that exploit the proposed embedding algorithm.

Methods
In this section, we describe a fast algorithm for embedding larger subproblems into a hardware graph.

Definition of minor embedding
In general, a problem graph is not a subgraph of a Chimera graph, and the problem graph must be mapped onto a subgraph of a Chimera graph in order to solve optimization problems using the D-Wave quantum annealer.This process is called "minor embedding" of the problem graph into the hardware graph, and this is achieved by representing one logical variable with multiple qubits.For example, more than two qubits must be assigned to represent a logical variable that interacts with ten logical variables, since the maximum degree of a Chimera graph is six.
2 by setting J 12 < 0. If |J 12 | is large enough, the optimal solutions of the embedded problem will be identical to that of the original optimization problem.Note that the assignment of multiple qubits to one logical variable is allowed, while the assignment of multiple logical variables to one qubit is forbidden.As shown in the literature 17 , a minor embedding of a problem graph G p into a hardware graph G q is defined as follows: A connected subtree T v is often called a "chain".

A conventional heuristic algorithm
A conventional heuristic algorithm for finding a minor embedding of an arbitrary problem graph into an arbitrary hardware graph has been proposed by Cai et al. in the literature 16 .The embedding process of this algorithm is divided into two stages.In the initial stage, logical variables are embedded one by one into the hardware graph, and all of the logical variables are embedded by allowing multiple assignments of the logical variables to one qubit.For example, suppose that the logical variables x 1 , ..., x k are already embedded in the hardware graph, and a logical variable x add that is adjacent to x 1 , ..., x k in the problem graph is selected to be additionally embedded.In this case, as shown in Fig. 4(a), an unused qubit to which no logical variables are assigned is selected as the root of x add , and the shortest paths from the root of x add to T x 1 , ..., T x k are calculated on the hardware graph using Dijkstra's algorithm.Then, by assigning x add or x i (i = 1, 2, ..., k) to qubits in the shortest paths, the adjacency between x add and x 1 , ..., x k will be represented on the hardware graph.However, it will often be the case that a path with only unused qubits does not exist.For example, as shown in Fig. 4(b), if a logical variable z is assigned to all of the qubits adjacent to T x 1 , a path from the root of x add to T x 1 with only unused qubits does not exist.In such a case, by assigning multiple logical variables [x add or x 1 and z in Fig. 4(b)] to one qubit, x add will be embedded once.After embedding all of the logical variables by allowing multiple assignments in the initial stage, the minor embedding obtained in the initial stage is refined so that only one logical variable is assigned to one qubit in the last stage.
The computational time for this algorithm is dominated by Dijkstra's algorithm.The computational time T (1) conv and the number N (1) Dijkstra of shortest paths searched by Dijkstra's algorithm in the initial stage are given by

N
(1) conv and N (2) Dijkstra in the last stage are given by

N
(2) 5/11 x add or x i , i=1, 2 ,…, k, is assigned to the qubits in the shortest paths where T Dijkstra represents the computational time for Dijkstra's algorithm: Here, |V p | and |E p | are the number of vertices and edges in the problem graph, and |V q | and |E q | are the number of vertices and edges in the hardware graph, respectively.In this algorithm, the vertex-weighted shortest paths are searched in order to distinguish used and unused qubits.The computational time in the last stage is obviously dominant.So we expect that the computational time will be drastically reduced by avoiding multiple assignments of logical variables in the initial stage, since the implementation of the last stage becomes unnecessary.

Proposed algorithm
Here we focus on the embedding of subproblems and propose a fast algorithm to find minor embeddings of subproblems.
In embedding a subproblem, we can select logical variables that are embeddable without multiple assignments as a part of the subproblem, since it is not necessary to embed all of the logical variables included in the problem graph.While in a conventional algorithm, the search for a minor embedding is subject to a strong restriction, that all of the logical variables included in the problem graph must be embedded.The multiple assignments of logical variables are mainly caused by this restriction.However, for dense problem graphs, the logical variables embeddable without multiple assignments become extinct before all the qubits are exploited.To mitigate this issue, the proposed algorithm includes a reservation system that leaves space to extend the chains.As shown in Fig. 5(a), if a qubit on the left side of a complete bipartite graph K 4,4 in Chimera graph is selected as the root of x add , qubits to extend the chain vertically are reserved by x add , and assignment of other logical variables to these qubits are forbidden.If a qubit on the right side is selected as the root of x add , qubits to extend the chain horizontally are reserved by x add , as shown in Fig. 5(b).The reserved qubits are released after the embedding of all the logical variables adjacent to x add are completed.
The refinement of the embedding in the last stage of the conventional algorithm can be eliminated in the proposed algorithm.In addition, the breadth-first search in a subgraph of a hardware graph consisting only of unused qubits is sufficient to search the shortest paths, since multiple assignments are forbidden.The computational time T prop for the proposed algorithm and the number N breadth of the shortest paths searched by the breadth-first search are given by The computational time T breadth for the breadth-first search is given by where |e q | is the number of edges included in the subgraph of the hardware graph with unused qubits.For Chimera graph, as shown in Fig. 6, it is sufficient to consider unit cells with used qubits and adjacent unit cells without used qubits.The maximum number of edges included in the subgraph is limited to The embedding algorithm proposed in this section does not strongly depend on the structure of the hardware graphs, except for the reservation system, and we can easily adapt the embedding algorithm for other hardware graphs.

Experimental results
In order to confirm the scalability of our algorithm, we evaluated the |V q | dependence of the number N breadth of the shortest paths searched by the breadth-first search and the size N sub of the embedded subproblems.We have used the proposed algorithm to embed subproblems of a grid graph with 300 × 300 variables and a complete graph with 1, 000 variables into a Chimera graph with 10 2 ∼ 10 5 qubits.The results for N breadth are shown in Fig. 7(a).Linear fits to the experimental results yield The |V q | dependence of N breadth is less than O |V q | 1.3 , even for a grid graph with sparse connectivity.As the exponent is not large, we expect the proposed algorithm to be feasible even if the number of qubits is increased in a future version of the D-Wave quantum annealer.The results for N sub are shown in Fig. 7(b).Linear fits to the experimental results yield

Figure 2 .
Figure 2. The optimization process implemented in this study.The solutions obtained by utilizing the proposed algorithm and by using complete-graph embedding are compared.

Figure 3 .
Figure 3. Comparisons of the average energy from 32 trials.(a) The ferromagnetic model with p F = 0.0.(b) The spin-glass model with p F = 0.5.

Figure 4 .
Figure 4.An embedding of a logical variable x add .(a) A case for which paths to the adjacent logical variables already exist.(b) A case for which multiple assignments of logical variables is necessary.

Figure 5 .Figure 6 .Figure 7 .
Figure 5. Examples of reserved qubits associated to the root of x add .(a) An example showing vertically reserved qubits associated to the root of x add .(b) An example showing horizontally reserved qubits associated to the root of x add .