Introduction

Quantum computers, initially proposed by Feynman1, were reported by Benioff2, Deutsch3,4, Grover5, and Shor6 to have great potential that could overwhelmingly surpass that of classical computers. Furthermore, Google experimentally demonstrated quantum supremacy, which is the refutation of the extended Church–Turing thesis, proving the feasibility of quantum computers and raising the expectation for solving practical problems that a classical computer cannot address7. Quantum computers can efficiently solve problems in the BQP (bounded-error quantum polynomial) complexity class8 and verify an answer to a problem in the QMA (quantum Merlin–Arthur) complexity class9. One of the actively researched problems for quantum computers is combinatorial optimization, which is an NP-hard problem10. Combinatorial optimization problems are closely related to our daily lives, and they include the traveling salesman problem11, the scheduling problem12, and the satisfiability (SAT) problem13. Although combinatorial optimization problems are NP-hard, some quantum algorithms have been shown to be superior to their classical counterparts. Grover’s algorithm is known to provide a quadratic speedup over classical computation14,15. Quantum annealing has been reported to be faster than simulated annealing in several cases16,17,18,19,20. Recently, the quantum approximate optimization algorithm (QAOA) has been studied owing to its superiority over classical algorithms, which was demonstrated at the time of its proposal21. However, with the subsequent development of classical algorithms22, the quantum advantage of QAOA is now an open question.

Under these circumstances, researchers all over the world face the challenge of employing existing or near-future quantum computers to achieve tasks that are very difficult or impossible using classical computers. Currently available quantum computers are noisy intermediate-scale quantum (NISQ) devices23. Conventional quantum algorithms, such as Grover’s algorithm, require many gate operations and cannot be implemented on NISQ devices without error correction because of the short coherence time. Recently, classical-quantum hybrid algorithms called the variational quantum eigensolver (VQE)24,25 and QAOA21,26,27,28,29,30 have been proposed for NISQ devices. In these methods, ansatz states with parameters are implemented on quantum circuits, and the parameters included in the ansatz states are optimized on a classical computer. While VQE and QAOA can be realized with a limited number of quantum operations and have good noise tolerance, it is difficult to choose the ansatz states properly and to converge the optimization of high-dimensional parameters31.

For quantum many-body problems, an imaginary-time evolution method is a known computational method to identify the ground state. The imaginary-time evolution method selectively extracts the ground-state component by performing time evolution in the direction of imaginary time. Various combinatorial optimization problems are converted to a Hamiltonian format, and their corresponding Hamiltonian is derived32. Thus, it is possible to solve the combinatorial optimization problem using the imaginary-time evolution method.
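The selection of the ground-state component can be illustrated numerically. The following is a minimal NumPy sketch, not the code used in this work: the function names `imaginary_time_state` and `energy` and the 2 × 2 example matrix are illustrative assumptions. It applies the normalized operator e^{−τH} to an initial state and evaluates the energy expectation value for increasing τ.

```python
import numpy as np

def imaginary_time_state(hamiltonian, psi0, tau):
    """Return the normalized state e^{-tau H}|psi0> for a Hermitian matrix H."""
    evals, evecs = np.linalg.eigh(hamiltonian)
    coeffs = evecs.conj().T @ psi0            # components in the eigenbasis
    # Shifting by the lowest eigenvalue avoids overflow; normalization removes the shift.
    damped = np.exp(-tau * (evals - evals[0])) * coeffs
    psi = evecs @ damped
    return psi / np.linalg.norm(psi)

def energy(hamiltonian, psi):
    """Expectation value <psi|H|psi> of a normalized state."""
    return float(np.real(psi.conj() @ hamiltonian @ psi))

# Hypothetical two-level Hamiltonian, used only for illustration.
H = np.array([[0.0, 0.5], [0.5, 1.0]])
psi0 = np.array([1.0, 1.0]) / np.sqrt(2.0)

taus = (0.0, 1.0, 5.0, 20.0)
energies = [energy(H, imaginary_time_state(H, psi0, t)) for t in taus]
ground_energy = float(np.linalg.eigvalsh(H)[0])
```

In this sketch the energy decreases with τ and approaches the lowest eigenvalue, provided the initial state has a nonzero overlap with the ground state.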

Implementing the imaginary-time evolution method on a quantum computer faces a critical problem: the imaginary-time evolution operator is nonunitary and therefore cannot be implemented directly as gate operations on a quantum computer. To overcome this challenge, two quantum imaginary-time evolution (QITE) methods, one that assumes an ansatz state and another that does not, were proposed in previous studies33,34. The method that assumes the ansatz state traces the imaginary-time evolution of the parameters contained in the ansatz state33,35,36. The other method introduces a unitary operation that accurately reproduces the state on which the imaginary-time evolution operator has acted, without assuming an ansatz state34,37,38.

We focus on the QITE method without the ansatz assumption and apply it to the optimization problems. The QITE method requires defining a domain size, which determines the accuracy of reproducing the imaginary-time evolution operator. The quantum circuit of an imaginary-time step scales exponentially with respect to this domain size. Besides, as an additional quantum circuit is added at each imaginary-time step, the quantum circuit becomes deeper in proportion to the lapse of imaginary time34. These two features make it difficult to implement the QITE method on NISQ devices.

Therefore, we propose two approximations and one computational technique to overcome this difficulty. We succeeded in significantly reducing the quantum circuit depth of the QITE method, and we applied the developed algorithms to the max-cut problem, which is an NP-hard problem. For the max-cut problem, we chose an unweighted 3-regular graph and a weighted fully connected graph. The latter is a problem known as the classification problem in the context of unsupervised machine learning39,40.

Results

Unitarization of imaginary-time evolution operators

Consider a scenario wherein a Hamiltonian \(\hat{H}\) is given for the optimization problem considered in this study. The Hamiltonian \(\hat{H}\) is expressed as the summation of some partial Hamiltonians \(\hat{h}[m]\) as \(\hat{H}=\mathop{\sum }\nolimits_{m = 1}^{{N}_{{\rm{ham}}}}\hat{h}[m]\), where Nham is the number of the partial Hamiltonians. The max-cut problem, which is a computational target of this work, is represented by the Hamiltonian in the form of Ising spins and can be mapped to the Pauli-operator representation for qubits in a straightforward manner. In the case of the Hamiltonian of quantum chemistry, each partial Hamiltonian can be mapped to the Pauli-operator representation on qubits via the Bravyi–Kitaev representation41 or Jordan–Wigner representation42.

For a given Hamiltonian, the ground state is obtained by using the imaginary-time evolution method. We apply the imaginary-time evolution operator \({{\rm{e}}}^{-\tau \hat{H}}\), where τ is the imaginary time, to the initial (τ = 0) state of the system, \(\left|{{\Psi }}(\tau =0)\right\rangle\), to obtain \({{\rm{e}}}^{-\tau \hat{H}}\left|{{\Psi }}(\tau =0)\right\rangle\). The imaginary-time evolution operator is decomposed by a first-order Suzuki–Trotter decomposition into a product of operators with a small imaginary-time step Δτ (τ ≡ Δτ × Nstep) for the individual partial Hamiltonians \(\hat{h}[m]\).

$${{\rm{e}}}^{-\tau \hat{H}}=\mathop{\prod }\limits_{n=1}^{{N}_{{\rm{step}}}}\mathop{\prod }\limits_{m=1}^{{N}_{{\rm{ham}}}}{{\rm{e}}}^{-{{\Delta }}\tau \hat{h}[m]}+{\mathcal{O}}({{\Delta }}{\tau }^{2})\ \ .$$
(1)
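The size of the decomposition error can be checked numerically for a single step. The sketch below is an illustration under stated assumptions, not part of the original calculations: the helper names `expm_hermitian` and `trotter_step_error` and the choice of test operators are hypothetical. It compares one first-order Trotter step with the exact exponential for two non-commuting terms and for two commuting terms. The commuting case is relevant to the max-cut Hamiltonian, whose partial Hamiltonians are all products of Pauli-Z operators, so their exponentials factorize without error.

```python
import numpy as np

def expm_hermitian(matrix, scale):
    """exp(scale * matrix) for a Hermitian matrix and real scale, via eigendecomposition."""
    evals, evecs = np.linalg.eigh(matrix)
    return (evecs * np.exp(scale * evals)) @ evecs.conj().T

def trotter_step_error(terms, dtau):
    """Spectral-norm error of one first-order Trotter step of exp(-dtau * sum(terms))."""
    exact = expm_hermitian(sum(terms), -dtau)
    product = np.eye(terms[0].shape[0], dtype=complex)
    for h in terms:
        product = product @ expm_hermitian(h, -dtau)
    return np.linalg.norm(exact - product, ord=2)

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
ZZ = np.kron(Z, Z)
ZI = np.kron(Z, np.eye(2))

# Non-commuting terms: halving the step should reduce the one-step error about fourfold.
err_coarse = trotter_step_error([X, Z], 0.02)
err_fine = trotter_step_error([X, Z], 0.01)

# Commuting (diagonal) terms: the product formula is exact up to rounding.
err_commuting = trotter_step_error([ZZ, ZI], 0.1)
```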

Because the operators of the imaginary-time evolution are nonunitary, they cannot be directly implemented as gate operations on a quantum computer. In the QITE method, the unitary operator \({e}^{-i{{\Delta }}\tau {\hat{A}}_{n}[m]}\) is defined such that it reproduces the normalized state obtained by applying \({{\rm{e}}}^{-{{\Delta }}\tau \hat{h}[m]}\) to a given state \(\left|{{{\Psi }}}_{n}\right\rangle \equiv \left|{{\Psi }}(\tau =n{{\Delta }}\tau )\right\rangle\). We determine the Hermitian operator \({\hat{A}}_{n}[m]\) that minimizes the following residual norm.

$${\left|\left|\frac{{{\rm{e}}}^{-{{\Delta }}\tau \hat{h}[m]}\left|{{{\Psi }}}_{n}\right\rangle }{\sqrt{\left\langle {{{\Psi }}}_{n}\right|{{\rm{e}}}^{-2{{\Delta }}\tau \hat{h}[m]}\left|{{{\Psi }}}_{n}\right\rangle }}-{{\rm{e}}}^{-i{{\Delta }}\tau {\hat{A}}_{n}[m]}\left|{{{\Psi }}}_{n}\right\rangle \right|\right|}^{2}\ \ .$$
(2)
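The minimization in Eq. (2) can be sketched with a small state-vector simulation. The code below is a minimal illustration, not the implementation used in this work: the function names, the first-order least-squares form with real coefficients, and the sign and normalization conventions of the right-hand side are assumptions made for this example. The operator is expanded in Pauli strings, the residual is linearized in Δτ, and the resulting linear system is solved classically.

```python
import itertools
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Tensor product of single-qubit Pauli matrices, e.g. 'ZZ' or 'XI'."""
    op = np.eye(1, dtype=complex)
    for ch in label:
        op = np.kron(op, PAULI[ch])
    return op

def normalized_ite_target(h, psi, dtau):
    """Normalized state e^{-dtau h}|psi> (the first term inside the norm of Eq. (2))."""
    evals, evecs = np.linalg.eigh(h)
    out = evecs @ (np.exp(-dtau * evals) * (evecs.conj().T @ psi))
    return out / np.linalg.norm(out)

def qite_step(psi, h, dtau, labels):
    """One unitarized imaginary-time step with A expanded in the given Pauli strings."""
    ops = [pauli_string(label) for label in labels]
    target = normalized_ite_target(h, psi, dtau)
    delta0 = (target - psi) / dtau                 # finite-difference change of the state
    sigma_psi = np.array([op @ psi for op in ops])
    # Minimizing ||delta0 + i A psi||^2 over real coefficients gives Re(S) a = b.
    S = (sigma_psi.conj() @ sigma_psi.T).real
    b = -(sigma_psi.conj() @ delta0).imag
    a, *_ = np.linalg.lstsq(S, b, rcond=1e-10)     # minimum-norm solution (S is singular)
    A = sum(coef * op for coef, op in zip(a, ops))
    evals, evecs = np.linalg.eigh(A)
    U = (evecs * np.exp(-1j * dtau * evals)) @ evecs.conj().T
    return U @ psi, target

# Two-qubit example: h = -(1 - Z Z)/2 acting on an equal-weight superposition.
labels = ["".join(p) for p in itertools.product("IXYZ", repeat=2)]
h = -0.5 * (np.eye(4) - pauli_string("ZZ"))
psi0 = np.full(4, 0.5, dtype=complex)
psi1, target = qite_step(psi0, h, dtau=0.1, labels=labels)
fidelity = abs(np.vdot(target, psi1)) ** 2
```

Here all 16 two-qubit Pauli strings are used, so the expansion covers the full operator space and the unitary step reproduces the normalized target state up to higher-order terms in Δτ. Restricting `labels` to a subset of strings corresponds to a truncated expansion.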

Nonlocal condition for imaginary-time evolution operators

We express the Hermitian operator \({\hat{A}}_{n}[m]\) as a linear combination of the Dth order tensor products of Pauli operators \(\{{\hat{I}}_{l},{\hat{\sigma }}_{X,l},{\hat{\sigma }}_{Y,l},{\hat{\sigma }}_{Z,l}\}\) acting on the lth qubit as

$${\hat{A}}_{n}[m]={\mathop{\sum}\limits_{{l}_{k+1},\cdots ,{l}_{D}\in {{\mathbb{L}}}_{m}}}^{\prime}\mathop{\sum}\limits_{{i}_{1}\cdots {i}_{D}}{a}_{{i}_{1}\cdots {i}_{D},{l}_{1}\cdots {l}_{D}}^{(n)}[m]{\hat{\sigma }}_{{i}_{1},{l}_{1}(m)}\otimes \cdots \otimes {\hat{\sigma }}_{{i}_{D},{l}_{D}},$$
(3)

where the prime on the first summation symbol indicates that double counting of repeated tensor products is removed. We define \({{\mathbb{L}}}_{m}\) as the set of \({N}_{{{\mathbb{L}}}_{m}}\) qubits that are directly connected to the qubits acted on by the partial Hamiltonian \(\hat{h}[m]\); \({{\mathbb{L}}}_{m}\) does not contain the qubits acted on by \(\hat{h}[m]\) themselves [see Fig. 1a]. The parameter D, which is called the domain size, satisfies \(k\,\leqq \,D\,\leqq \,k+{N}_{{{\mathbb{L}}}_{m}}\), where we assume that the partial Hamiltonian \(\hat{h}[m]\) is written as a tensor product of the kth order. \(\{{l}_{1}(m),\ldots ,{l}_{k}(m)\}\) is the set of qubits contained in the partial Hamiltonian \(\hat{h}[m]\). The summation in Eq. (3) is taken over all combinations of D − k qubits, \(\{{l}_{k+1}(m),\ldots ,{l}_{D}(m)\}\), chosen from \({{\mathbb{L}}}_{m}\). D is an input parameter that represents the level of approximation; a larger D means that the imaginary-time evolution operator is expressed using higher-order tensor products, so that the residual norm in Eq. (2) becomes smaller and the approximation improves. Note that for D = Nbit, with Nbit being the number of qubits, the residual norm in Eq. (2) vanishes when minimized, yielding the exact imaginary-time evolution operator. In this context, the parameter D represents a truncation level. When the domain size D incorporates all elements in \({{\mathbb{L}}}_{m}\), namely \(D=k+{N}_{{{\mathbb{L}}}_{m}}\), Eq. (3) reproduces the operator \({\hat{A}}_{n}[m]\) introduced in reference34. This implies that Eq. (3) is a natural extension of the approximation introduced in reference34. We call the method for determining the operator \({\hat{A}}_{n}[m]\) defined in reference34 the local approximation (LA), for comparison with the approximations introduced later, and we refer to the method defined in Eq. (3) as the extended local approximation (eLA). The following notation is used to indicate the domain size D: e.g., LA with D = 6 is denoted by LA-D6.
Note that LA is a well-defined approximation only when the domain size is \(D=k+{N}_{{{\mathbb{L}}}_{m}}\), so the values of D that can be taken are limited by the Hamiltonian. With an ill-defined domain size D in LA, we found that the calculation accuracy decreased; this case is called “Inexact QITE” in reference34. In contrast, eLA removes such constraints imposed by the Hamiltonian and allows the parameter D to be chosen flexibly by considering the linear combination over qubits. This flexibility is particularly evident in the max-cut problem of the fully connected graph. Solving the minimization problem in Eq. (2) to determine the coefficients \({a}_{\{i,l\}}^{(n)}[m]\) results in the linear equation \({S}^{(n)}{a}^{(n)}[m]={b}^{(n)}[m]\), which can be solved using a classical computer. Here, \({S}_{\{i,{l}_{i}\}\{j,{l}_{j}\}}^{(n)}=\left\langle {{{\Psi }}}_{n}\right|{\hat{\sigma }}_{\{i,{l}_{i}\}}^{\dagger }{\hat{\sigma }}_{\{j,{l}_{j}\}}\left|{{{\Psi }}}_{n}\right\rangle\) and \({b}_{\{i,{l}_{i}\}}^{(n)}[m]=\left\langle {{{\Psi }}}_{n}\right|{\hat{\sigma }}_{\{i,{l}_{i}\}}^{\dagger }\hat{h}[m]\left|{{{\Psi }}}_{n}\right\rangle\). Figure 1b shows a schematic of the quantum circuit representing one imaginary-time step of LA. In LA, the operator of the imaginary-time evolution is approximated by the tensor products of Pauli operators up to the Dth order; therefore, \({4}^{D}\) gate operations are required for each partial Hamiltonian. The total number of gate operations for one step of the imaginary-time evolution is \({N}_{{\rm{ham}}}{4}^{D}\). Table 1 summarizes the size of the linear equation of the LA per step of the imaginary-time evolution and the number of gate operations per qubit, where Nbit is the total number of qubits.

Fig. 1: Quantum circuit diagrams and quantum circuit depth.

a Schematic of a 3-regular graph. A partial Hamiltonian \(\hat{h}[m]\) acts on qubits represented with red vertices, i.e., \(\hat{h}[m]\propto {\hat{\sigma }}_{Z,i}\otimes {\hat{\sigma }}_{Z,j}\). Outer blue vertices represent a domain directly connected to the red vertices. Blue vertices contained in the pale blue region comprise the set \({{\mathbb{L}}}_{m}\) in Eq. (3). b Quantum circuit diagram for one imaginary-time step of LA. The horizontal line represents each qubit, and the yellow box represents \({4}^{D}\) gate operations on the straddling qubits. c Quantum circuit diagram for one imaginary-time step of the NLA (domain size D = 2). The green boxes and vertical lines connecting them represent a second-order tensor product operation on the two straddling qubits, with one imaginary-time step containing \({}_{{N}_{{\rm{bit}}}}{C}_{D}\) second-order tensor products. The detailed quantum circuit of the two-qubit unitary gate is described in Supplementary Note 2. The dependence of the quantum circuit depth for one imaginary-time step of the max-cut problem in the 3-regular graph (d) and the fully connected graph (e) as a function of the number of qubits.

Table 1 Scaling of the size of the matrix \({S}^{(n)}\) of the linear equation and of the number of gate operations per qubit for LA and NLA per imaginary-time step.

Furthermore, this study proposes another approximation method for \({\hat{A}}_{n}\) in the following form:

$${\hat{A}}_{n}={\mathop{\sum}\limits_{{l}_{1},\cdots {l}_{D}}}^{\prime}\mathop{\sum}\limits_{{i}_{1},\cdots {i}_{D}}{a}_{{i}_{1}\cdots {i}_{D},{l}_{1}\cdots {l}_{D}}^{(n)}{\hat{\sigma }}_{{i}_{1},{l}_{1}}\otimes \cdots \otimes {\hat{\sigma }}_{{i}_{D},{l}_{D}}.$$
(4)

The difference from Eq. (3) is that we remove the restriction on the set \(\{{l}_{1}(m),\ldots ,{l}_{k}(m)\}\) and extend the summation over qubits to incorporate all possible combinations of D qubits \(\{{l}_{1},\ldots ,{l}_{D}\}\). We call this the nonlocal approximation (NLA). As per this definition, we expand the Hermitian operator \({\hat{A}}_{n}\) using tensor products of Pauli operators over all qubit combinations. In LA and eLA, the tensor product space describing \({\hat{A}}_{n}[m]\) depends on m, i.e., on the partial Hamiltonian. The NLA has the notable feature that the tensor product space describing \({\hat{A}}_{n}[m]\) is the same for all m. Table 1 lists the size of the linear equations of the NLA per step of the imaginary-time evolution and the number of gate operations per qubit, where the NLA requires only \({4}^{D}\) unitary operators in \({}_{{N}_{{\rm{bit}}}}{C}_{D}\) combinations for the quantum circuit in the first step of the imaginary-time evolution. Figure 1c shows the schematic of the quantum circuit of the NLA for one step of the imaginary-time evolution (for D = 2).
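To make the operator counts quoted in the text concrete, the following short sketch evaluates them for a small example. It is only an illustration: the function names are hypothetical, and the numbers are counts of Pauli tensor products taken from the expressions in the text (the number of partial Hamiltonians times 4^D for LA, and the number of D-qubit combinations times 4^D for NLA), not circuit depths. The removal of double counting indicated by the primed sums is ignored, so these are upper bounds.

```python
from math import comb

def la_operator_count(n_ham, domain_size):
    """Pauli tensor products per imaginary-time step in LA: N_ham * 4^D."""
    return n_ham * 4 ** domain_size

def nla_operator_count(n_bit, domain_size):
    """Pauli tensor products per imaginary-time step in NLA: C(N_bit, D) * 4^D."""
    return comb(n_bit, domain_size) * 4 ** domain_size

# A 3-regular graph with ten vertices has 3 * 10 / 2 = 15 edges (partial Hamiltonians).
n_bit = 10
n_ham = 3 * n_bit // 2

la_d6 = la_operator_count(n_ham, 6)     # 15 * 4096
nla_d2 = nla_operator_count(n_bit, 2)   # 45 * 16
nla_d3 = nla_operator_count(n_bit, 3)   # 120 * 64
```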

Reduction effect of circuit depth

To clarify the accuracy and effectiveness of NLA, we applied it to the max-cut problem, which is an NP-hard problem. The Hamiltonian of the max-cut problem in qubit representation is given in the following form containing second-order tensor products32.

$$\hat{H}=-\mathop{\sum}\limits_{(i,j)\in E}{d}_{i,j}\frac{1-{\hat{\sigma }}_{Z,i}{\hat{\sigma }}_{Z,j}}{2}$$
(5)

For the max-cut problem, we considered two typical classes of graphs: 3-regular and fully connected graphs. In Eq. (5), E is the set of edges contained in the graph and di,j is the weight of the edge connecting the ith and jth vertices. In a 3-regular graph, every vertex has exactly three edges.
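Because the Hamiltonian in Eq. (5) contains only Pauli-Z operators, it is diagonal in the computational basis, and its spectrum can be enumerated directly for small graphs. The sketch below is a minimal illustration with hypothetical names (`maxcut_energies` and the two example graphs are not taken from this work); it evaluates the energy of every bit string and shows that the lowest energy equals minus the maximum cut weight.

```python
import itertools
import numpy as np

def maxcut_energies(n_vertices, weighted_edges):
    """Diagonal of Eq. (5): E(z) = -sum_{(i,j) in E} d_ij * (1 - z_i z_j) / 2, z_i = +1 or -1."""
    energies = []
    for bits in itertools.product((0, 1), repeat=n_vertices):
        spins = [1 - 2 * b for b in bits]          # bit 0 -> +1, bit 1 -> -1
        cut = sum(d * (1 - spins[i] * spins[j]) / 2 for i, j, d in weighted_edges)
        energies.append(-cut)
    return np.array(energies)

# Unweighted ring with four vertices: the maximum cut contains all four edges.
ring = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
ring_energies = maxcut_energies(4, ring)

# Unweighted triangle: at most two of the three edges can be cut.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
triangle_energies = maxcut_energies(3, triangle)
```

Each edge is listed once as a tuple (i, j, d), with d playing the role of the weight di,j.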

The circuit depths obtained when LA and NLA are applied to the max-cut problem are shown in Fig. 1d for the 3-regular graph and in Fig. 1e for the fully connected graph. The two graphs are treated separately because the graph structure determines the number of partial Hamiltonians Nham, and the circuit depth required for each approximation changes correspondingly. In Fig. 1d, e, the circuit depth calculated using Qiskit43 is plotted with points, and the plotted points are extrapolated. In the case of k-regular graphs, the number of partial Hamiltonians is given by Nham = kNbit/2. It increases linearly with the number of vertices Nbit, so that in LA the number of gate operations per qubit does not depend on the number of qubits, as listed in Table 1. Thus, we extrapolated the LA results using \(y={\rm{const.}}\). In NLA, regardless of the structure of the Hamiltonian, the number of gate operations per qubit scales as \({\mathcal{O}}({{N}_{{\rm{bit}}}}^{D-1})\) with respect to the number of qubits Nbit because all \({}_{{N}_{{\rm{bit}}}}{C}_{D}\) combinations are taken for gate operations including the Dth-order tensor product. In Fig. 1d, the circuit depth of the NLA is extrapolated by fitting the function \(f(x)={x}^{D-1}\).

Note that in LA, D = 3, 4, and 5 are not well-defined in the 3-regular graph. Thus, D = 6 is required, and \({4}^{6}=4096\) gate operations are necessary for the imaginary-time evolution of one partial Hamiltonian, which leads to a deeper circuit and difficulty in implementation on NISQ devices. In addition, the circuit depth required for LA-D6 is considerably larger than that for NLA-D2, NLA-D3, etc., in the region with a small number of qubits. The circuit depth of the NLA becomes larger than that of LA only as the number of qubits increases.

In Fig. 1e, LA-D2 and eLA-D3 are not shown for the fully connected graph (\({N}_{{\rm{ham}}}={}_{{N}_{{\rm{bit}}}}{C}_{2}\)) because the circuit depth of LA-D2 is equal to that of NLA-D2, and that of eLA-D3 is equal to that of NLA-D3. In addition, in a fully connected graph the domain size has to be D = Nbit in LA, which corresponds to the exact imaginary-time evolution, and the circuit depth therefore increases exponentially with respect to the number of qubits. In NLA, it can be scaled down to a linear or quadratic function of the number of qubits. This result indicates that the NLA and eLA are efficient in reducing the circuit depth, especially when the number of partial Hamiltonians increases; further, these algorithms are effective for NISQ devices.

Calculation accuracy

Simulations were performed by modifying the code provided in reference34. As the initial state, we adopted an equal-weight superposition of all computational basis states. We adopt the following figure of merit to discuss the accuracy of the QITE method.

$$r={{\rm{lim}}}_{\tau \to \infty }\frac{\left\langle {{\Psi }}(\tau )\right|\hat{H}\left|{{\Psi }}(\tau )\right\rangle }{{E}_{{\rm{GS}}}}.$$
(6)

Here, EGS is the energy of the ground state, which is obtained from exact diagonalization. The first target of the max-cut problem is an unweighted 3-regular graph with ten vertices, for which the ground-state energy is EGS = −12. It is known that designing a classical algorithm that achieves r > 331/332 for an unweighted 3-regular graph is an NP-hard problem44. Further, the approximation accuracy of the current classical algorithm is r ≈ 0.9326, as reported in reference45. Figure 2a shows the imaginary-time dependence of the energy. The imaginary-time step was set to Δτ = 0.01. In LA-D2, as the imaginary time τ increased, the energy decreased exponentially in the beginning and converged to around −9, which is higher than the exact solution by about 3. Another important point is that the energy does not decrease monotonically along the imaginary-time evolution. This behavior indicates that the conversion of the imaginary-time evolution operator to unitary operators is less accurate when it is expanded in the space of LA-D2. In contrast, the LA-D6 calculation gives E = −11.99, which is almost equal to the exact solution. We found that the converged energy is E = −11.17 (r = 0.93) in eLA-D3 (the lowest value is E = −11.33 (r = 0.94)), E = −11.42 (r = 0.95) in NLA-D2, and E = −12.00 (r = 1.00) in NLA-D3. Thus, eLA-D3 has an approximation accuracy similar to that of the classical algorithm, and NLA-D2 already exceeds the approximation accuracy of the classical algorithm. NLA-D3 shows better accuracy than NLA-D2 and can reach a nearly exact solution. Note that, in contrast to LA-D2, eLA and NLA decrease the energy monotonically along the imaginary-time evolution with sufficiently good accuracy. This behavior was confirmed not only for NLA-D2 but also for NLA-D3 and others. As can be seen from Fig. 1d, in LA-D6, the circuit depth of one imaginary-time step is 369,757, while the circuit depth in NLA-D2 is 789. This implies that the circuit depth of NLA can be significantly shallower than that of LA.

Fig. 2: Numerical simulation of QITE method.

Energy E (a) and component proportions of the state n(E) (b) in the QITE method to the max-cut problem for an unweighted 3-regular graph with ten vertices. The ground state is denoted by GS and the first excited state by ES. The energy E (c) and component proportions n(E) (d) of the QITE method for a weighted fully connected graph with ten vertices. e The total energy level diagram of the weighted fully connected graph and the eigenstates corresponding to the ground state and the first excited state (divided into two regions, red and blue).

While NLA-D3 has extremely high accuracy, its circuit depth increases quadratically with the number of qubits. We therefore developed NLA-D2.5 to keep the scaling of the circuit depth linear, as in NLA-D2, while maintaining an accuracy close to that of NLA-D3. NLA-D2.5 is an approximation that expands the space of \({\hat{A}}_{n}\) to the space involving the second-order tensor products incorporated by NLA-D2 and the third-order tensor products incorporated by eLA-D3. Thus, by incorporating some portions of the bases of eLA-D3 into those of NLA-D2, the computational scaling can be made linear with respect to the number of qubits, which makes the method applicable even in regions with a large number of qubits. Figure 1d shows that the circuit depth is almost the same as that of NLA-D2 for 50 qubits or more, which means that the circuit depth can be significantly reduced compared to that of NLA-D3. In addition, the calculation result of NLA-D2.5 is E = −11.95 and r = 0.99, which gives a good approximation accuracy with a small circuit depth.

Here, for further consideration, we decomposed the state \(\left|{{\Psi }}(\tau )\right\rangle ={{\rm{e}}}^{-\tau \hat{H}}\left|{{\Psi }}(\tau =0)\right\rangle\) into the eigenstate components of the Hamiltonian, and the calculated \(n(E)\equiv {\sum }_{i}| \langle i| {{\Psi }}(\tau )\rangle {| }^{2}\delta (E-{E}_{i})\) as a function of energy E at each imaginary time τ is plotted in Fig. 2b, where \(\left|i\right\rangle\) is the eigenstate of \(\hat{H}\) and Ei is the eigenenergy of \(\left|i\right\rangle\). Here, we note that the ground-state component n(EGS) is equal to the so-called fidelity defined as \(F=| \langle {{\Psi }}(\tau )| {{{\Psi }}}_{{\rm{GS}}}\rangle {| }^{2}\). The ground state can be observed with probabilities of n(EGS) = 0.60 for eLA-D3 (at maximum, n(EGS, τ = 2.87) = 0.65), n(EGS) = 0.69 for NLA-D2, n(EGS) = 0.97 for NLA-D2.5, and n(EGS) = 1.00 for NLA-D3. The imaginary-time dependence of the probability of the first excited state is also plotted. For the first excited state, the probability is amplified up to τ = 1 and then starts to decrease, which increases the ground-state probability.
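The component analysis can be sketched for a diagonal Hamiltonian as follows. This is a minimal illustration rather than the analysis code of this work: the function names and the 16-level example spectrum are hypothetical, and the state is the exact imaginary-time evolution of an equal-weight superposition, not a QITE result. For a degenerate level, the sum in n(E) runs over all eigenstates sharing that energy.

```python
import numpy as np

def component_proportions(energies, psi, decimals=9):
    """n(E): total probability |<i|psi>|^2 of the eigenstates at each energy level."""
    probs = np.abs(psi) ** 2
    probs = probs / probs.sum()
    levels = {}
    for e, p in zip(np.round(energies, decimals), probs):
        levels[float(e)] = levels.get(float(e), 0.0) + float(p)
    return dict(sorted(levels.items()))

def exact_ite_state(energies, tau):
    """Normalized e^{-tau H} applied to the equal-weight superposition, for diagonal H."""
    amps = np.exp(-tau * (energies - energies.min()))
    return amps / np.linalg.norm(amps)

# Hypothetical diagonal spectrum: 2 states at E = -4, 12 at E = -2, and 2 at E = 0.
energies = np.array([-4.0] * 2 + [-2.0] * 12 + [0.0] * 2)

n_start = component_proportions(energies, exact_ite_state(energies, 0.0))
n_late = component_proportions(energies, exact_ite_state(energies, 5.0))
```

In this sketch the weight of the lowest level grows from 2/16 at τ = 0 to nearly 1 at τ = 5, while the excited levels decay.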

Next, we deal with another computational model called a weighted fully connected graph (classification problem). The coupling constants di,j were given by random numbers. The ground-state energy is EGS = −57.993. In addition, the imaginary-time step is set to Δτ = 0.01. In the classification problem, as shown in Fig. 2e, each graph vertex is colored red or blue. In LA-D2, as in the 3-regular graph, we observed that the energy does not necessarily decrease monotonically. The energy of eLA-D3 is lower than that of NLA-D2; E = −57.504 (r = 0.99) for eLA-D3, E = −57.026 (r = 0.98) for NLA-D2, and E = −57.985 (r = 0.99) for NLA-D3 (Fig. 2c). From the viewpoint of the component analyses of the states, the ground state and the first excited state are pseudo-degenerate (Fig. 2e), and therefore, the probability of the first excited state remains at the same level as the ground state even around τ = 2 when the energy converges sufficiently (Fig. 2d). In NLA, the first excited state gradually decays along with the imaginary-time evolution; however, a sufficiently long imaginary-time evolution is necessary. In particular, NLA-D2 behaves similarly to NLA-D3, and NLA-D2 is sufficiently accurate to obtain the ground state in actual applications.

We now consider why the accuracy of eLA and NLA is better than that of LA with a relatively small domain size D. From the actual application results of eLA-D3, we found that \({b}_{\{i,{l}_{i}\}}^{(n)}[m]=\left\langle {{{\Psi }}}_{n}\right|{\hat{\sigma }}_{\{i,{l}_{i}\}}^{\dagger }\hat{h}[m]\left|{{{\Psi }}}_{n}\right\rangle \approx 0\) when the Pauli operator \({\hat{\sigma }}_{\{i,{l}_{i}\}}^{\dagger }\) and \(\hat{h}[m]\) do not act on any common qubit. With the rough approximation \({b}_{\{i,{l}_{i}\}}^{(n)}[m]=\left\langle {{{\Psi }}}_{n}\right|{\hat{\sigma }}_{\{i,{l}_{i}\}}^{\dagger }\hat{h}[m]\left|{{{\Psi }}}_{n}\right\rangle =0\) for such cases, a sparsity in the coefficients of An can be deduced, which eLA exploits. This means that the terms that would require a large domain size D in LA can be efficiently captured with a smaller domain size D in eLA, leading to its high accuracy. Furthermore, because the definition of NLA is an extension of that of eLA, NLA can further improve on the accuracy of eLA.

Compression of imaginary-time steps

The approximation accuracy of the NLA and its circuit depth have been discussed. The “compression of imaginary-time steps” is introduced in this section for further reduction of the number of gate operations in NLA. Figure 3a shows a schematic of the compression technique. When the imaginary-time step Δτ is sufficiently small, the time-evolution operators can be compressed into a single exponential form via the reverse Suzuki–Trotter decomposition

$$\mathop{\prod }\limits_{n=1}^{{N}_{{\rm{comp}}}}\exp (-i{{\Delta }}\tau \hat{{A}_{n}})=\exp (-i{{\Delta }}\tau \mathop{\sum }\limits_{n=1}^{{N}_{{\rm{comp}}}}\hat{{A}_{n}})+{\mathcal{O}}({{\Delta }}{\tau }^{2}),$$
(7)

where Ncomp is the number of compressed steps. It is necessary to choose an appropriate Ncomp within the range that guarantees sufficient accuracy for the Suzuki–Trotter decomposition because its accuracy decreases if the Ncomp becomes large. To determine the specific Ncomp in this work, we increased the Ncomp parameter by one at every time-evolution step until the total energy increases. In actual QITE calculations, Ncomp is not necessarily a constant throughout the calculation. This method enables the reduction of quantum circuits to as small as 1/Ncomp. We discussed the error of the second order for Δτ in the compression method in Supplementary Note 1.
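The error of the compression in Eq. (7) can be examined numerically on small matrices. The following sketch is an illustration under stated assumptions: the helper names are hypothetical, and the generators are random Hermitian matrices standing in for the operators determined in a QITE calculation. It compares the ordered product of Ncomp unitary steps with the single compressed exponential.

```python
import numpy as np

def unitary_step(A, dtau):
    """exp(-i * dtau * A) for a Hermitian matrix A, via eigendecomposition."""
    evals, evecs = np.linalg.eigh(A)
    return (evecs * np.exp(-1j * dtau * evals)) @ evecs.conj().T

def compression_error(generators, dtau):
    """Spectral-norm difference between the step-by-step product and the compressed step."""
    product = np.eye(generators[0].shape[0], dtype=complex)
    for A in generators:
        product = unitary_step(A, dtau) @ product   # later steps act after earlier ones
    compressed = unitary_step(sum(generators), dtau)
    return np.linalg.norm(product - compressed, ord=2)

rng = np.random.default_rng(0)

def random_hermitian(dim):
    """Random Hermitian matrix used as a stand-in generator (illustration only)."""
    M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (M + M.conj().T) / 2

generators = [random_hermitian(4) for _ in range(4)]   # N_comp = 4 steps on two qubits

err_coarse = compression_error(generators, 0.002)
err_fine = compression_error(generators, 0.001)
```

Halving Δτ reduces the error by roughly a factor of four in this example, consistent with the second-order error term in Eq. (7). The prefactor grows with the number of compressed steps, which is why Ncomp cannot be chosen arbitrarily large.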

Fig. 3: Numerical simulation of compression of imaginary-time steps.

a Schematic of the compression of the imaginary-time step. Ncomp steps are compressed into one step by 1st-order Suzuki–Trotter decomposition. Energy E (b) and component of the eigenstate n(E) (c) in the imaginary-time evolution with and without compression of the imaginary-time step in the max-cut problem of an unweighted 3-regular graph with ten vertices. The compressed point Ncomp is plotted with circles. d Results of the simulation with noise for the max-cut problem in an unweighted graph with four vertices.

The graph used for the calculation is the same as that in Fig. 2a, b, which is a 3-regular graph with ten vertices. Figure 3 shows the results of the compression technique for the QITE. In Fig. 3b, the time at which each compression ended is plotted as a blue circle. In the case of Fig. 3b, the quantum circuit depth is significantly reduced by the compression technique to four compressed imaginary-time steps, and the energy at τ = 10 is E = −11.43 (r = 0.95) without and E = −11.59 (r = 0.97) with the compression technique. We found that sufficient accuracy was achieved with or without compression, which indicates that compression does not degrade the results. The compressed calculation yields a slightly lower energy than the uncompressed calculation; a detailed investigation revealed that this is attributable to an accidental acceleration of the convergence by compression. Figure 3c plots the component analyses of the wavefunctions during the imaginary-time evolution with and without the compression method. At the end of the evolution, the probability of obtaining a ground state is \(n({E}_{\min })=0.76\) with and \(n({E}_{\min })=0.73\) without the compression technique.

The “compression of imaginary-time steps” is effective in reducing the circuit depth, and simultaneously, it reduces the noise associated with the gate operations. We discuss the results of the simulation with noise. On actual devices, qubits are currently coupled only to neighboring sites; however, in this study, we simulated a fully connected model. For implementation on an actual quantum computer, in which only adjacent sites are connected, SWAP gates can be used with an overhead of \({\mathcal{O}}(\sqrt{{N}_{{\rm{bit}}}})\)46. For example, QAOA uses a SWAP network47,48 with an \({\mathcal{O}}({N}_{{\rm{bit}}})\) overhead49. We describe the quantum circuit of NLA-D2 of the QITE method for an adjacent-coupling circuit using the SWAP network in Supplementary Note 2. Figure 3d shows the simulation results of the max-cut problem for an unweighted graph with four vertices. The coefficients \({a}_{\{i,l\}}^{(n)}\) in Eq. (4) for the noisy calculation are the same as those for the non-noisy calculation. The noiseless calculation without compression results in E = −3.94 around τ = 5, which is close to the exact solution E = −4.00. However, the circuit depth is 922 (Δτ = 0.5), and the simulation result with noise is E = −3.13, which is far from the exact solution. This gap is attributed to the accumulation of errors caused by the increase in circuit depth. The result with compression is E = −3.85 in the case without noise; however, the circuit depth is only 163, and the calculation is expected to be less sensitive to noise. In fact, the simulation result with noise is E = −3.63, which shows that the effect of noise can be reduced with compression. Thus, the “compression” method of quantum circuits has the advantage of reducing the accumulation of errors.

Discussion

In this study, we proposed two approximation methods based on nonlocality, eLA and NLA. We applied them to the max-cut problem of an unweighted 3-regular graph and a weighted fully connected graph, and comparatively validated the performances of LA, eLA, and NLA. We found that NLA requires significantly less circuit depth than LA while maintaining the same level of computational accuracy. For example, when we require the QITE calculations to reach the classical approximation limit, the circuit depth required for a single imaginary-time step can be significantly reduced from 369,757 for LA to 789 for NLA when applying it to a 3-regular graph, and from about 314,000 for LA to 789 for NLA when applying it to a fully connected graph. Further, we developed a “compression” technique for the imaginary-time evolution steps, with which we succeeded in further reducing the circuit depth in the QITE method. We showed that the reduction in circuit depth using this compression method has a secondary effect of reducing the accumulation of error caused by the gate operations. Thus, it is an effective method for realization on NISQ devices. The eLA, NLA, and compression methods introduced in this study enable us to significantly reduce the circuit depth and the accumulation of error caused by the gate operations, and they pave the way for the realization of the QITE method on NISQ devices.

Methods

Noisy simulation of QITE method

Our numerical simulations were performed after implementing the eLA, NLA, and compression method on the code provided in reference34. The simulation in the presence of quantum noise was performed by implementing the QITE method at the level of NLA-D2 on the IBM Qiskit quantum simulator. Although the qubit connectivity of almost all actual quantum devices is restricted, we simulated the QITE method based on fully connected coupling. For implementation on a device connected only with neighboring qubits, we provide a circuit of the QITE method using the SWAP network in Supplementary Note 2. The error model of the gate was constructed from the thermal relaxation time (T1, T2) = (100 μs, 80 μs), and the gate time (Tg1, Tg2) = (0.02 ns, 0.1 ns). The noise simulation also included the readout errors (p00, p01, p10, p11) = (0.995, 0.005, 0.02, 0.98). These parameters were assumed to be close to the actual values of IBMQ50.
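The effect of the readout errors on measured probabilities can be sketched independently of any simulator. The following NumPy example is an illustration only and does not reproduce the Qiskit noise model used here: it assumes that (p00, p01, p10, p11) are the entries of a single-qubit confusion matrix whose rows label the prepared state and whose columns label the measured outcome, and that the errors act independently on each qubit. The function name `apply_readout_error` is hypothetical.

```python
import numpy as np

# Assumed interpretation: rows are the prepared state (0, 1), columns the measured outcome.
READOUT = np.array([[0.995, 0.005],
                    [0.020, 0.980]])

def apply_readout_error(probs, n_qubits, confusion=READOUT):
    """Apply independent single-qubit readout errors to a 2^n probability vector."""
    tensor = np.asarray(probs, dtype=float).reshape((2,) * n_qubits)
    for q in range(n_qubits):
        # Contract the prepared-state index of qubit q with the confusion matrix,
        # then move the resulting measured-outcome index back to position q.
        contracted = np.tensordot(confusion.T, tensor, axes=([1], [q]))
        tensor = np.moveaxis(contracted, 0, q)
    return tensor.reshape(-1)

# Ideal two-qubit distribution concentrated on the bit string "01".
ideal = np.array([0.0, 1.0, 0.0, 0.0])
noisy = apply_readout_error(ideal, n_qubits=2)
```

With these values, the probability of reading the correct string "01" becomes 0.995 × 0.98, and the remaining weight is spread over the other three outcomes.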

Note added to proof

During the review of this paper, we became aware of an independent work on a related “compression method” that was carried out in parallel51.