Implementation of quantum imaginary-time evolution method on NISQ devices: Nonlocal approximation

The imaginary-time evolution method is widely known to be efficient for obtaining the ground state in quantum many-body problems on a classical computer. A recently proposed quantum imaginary-time evolution method (QITE) faces problems of deep circuit depth and difficulty in the implementation on noisy intermediate-scale quantum (NISQ) devices. In this study, a nonlocal approximation is developed to tackle this difficulty. We found that by removing the locality condition or local approximation (LA), which was imposed when the imaginary-time evolution operator is converted to a unitary operator, the quantum circuit depth is significantly reduced. We propose two-step approximation methods based on a nonlocality condition: extended LA (eLA) and nonlocal approximation (NLA). To confirm the validity of eLA and NLA, we apply them to the max-cut problem of an unweighted 3-regular graph and a weighted fully connected graph; we comparatively evaluate the performances of LA, eLA, and NLA. The eLA and NLA methods require far fewer circuit depths than LA to maintain the same level of computational accuracy. Further, we developed a ``compression'' method of the quantum circuit for the imaginary-time steps as a method to further reduce the circuit depth in the QITE method. The eLA, NLA, and the compression method introduced in this study allow us to reduce the circuit depth and the accumulation of error caused by the gate operation significantly and pave the way for implementing the QITE method on NISQ devices.


Quantum
computers, initially proposed by Feynmann [1], were unveiled by Deutsch [2], Grover [3], and Shor [4] to have great potential that could overwhelmingly surpasses classical computers.In addition, the news of Google's demonstration of quantum supremacy in 2019 has spread around the world [5] and expectations for the realization of practical quantum computers are increasing.One of the most promising problems for quantum computers is combinatorial optimization, which is a NP-hard problem [6].Combinatorial optimization problems are closely related to our daily lives, and they include the traveling salesman problem [7], scheduling problem [8], SAT (satisfiability problem) solver [9], among others.For these combinatorial optimization problems, Grover's algorithm is already known to improve the computational cost with quadratic speedup compared to classical computers [10,11].
Under these circumstances, it is challenging for researchers all over the world to employ existing or near-future quantum computers to achieve tasks that are very difficult or impossible using classical computers.Currently available quantum computers are noisy intermediate-scale quantum (NISQ) devices [12].Further, conventional quantum algorithms require many gate operations, such as Grover's algorithm, and they cannot be implemented on NISQ devices with no error correction and short coherence time.Recently, classical-quantum hybrid algorithms called variational quantum eigensolver (VQE) [13,14], and quantum approximate optimization algorithm (QAOA) [15][16][17][18][19][20] have been proposed for NISQ devices.In these methods, ansatz states with parameters are implemented as quantum circuits, and the parameters included in the ansatz states are optimized on a classical computer.While VQE and QAOA can be realized with a limited number of quantum operations and have good noise tolerance, it is difficult to determine the ansatz states properly and converge high-dimensional parameters [21].
For quantum many-body problems, an imaginary-time evolution method is a known computational method to identify the ground state.The imaginary-time evolution method selectively extracts the ground state component by performing time evolution in the direction of imaginary time.Various combinatorial optimization problems are converted to a Hamiltonian format, and their corresponding Hamiltonian is derived [22].Thus, it is possible to solve the combinatorial optimization problem using the imaginary-time evolution method.
The implementation of the imaginary-time evolution method on a quantum computer involves a critical problem in that the imaginary-time evolution operator is a nonunitary operator, and therefore, it cannot implement the imaginary-time evolution method on a quantum computer in its current state.To overcome this challenge, two imaginary-time evolution methods-one that assumes an ansatz state and another that does not-were proposed in this study.The method that assumes the ansatz state traces the imaginary-time evolution of the parameters contained in the ansatz state [23][24][25].The other method introduces a unitary operation to reproduce the state on which the imaginary-time evolution operator has acted accurately [26][27][28].The latter quantum imaginary-time evolution (QITE) method is considered an efficient approach to the optimization problem because it does not need to assume an ansatz state; further, there is no problem of convergence of high-dimensional parameters, even when compared to QAOA.
We focus on the QITE method without the ansatz assumption and apply it to the optimization problems.The QITE method proposed in the previous research has problems with circuit depth and computational cost; even a simple one-dimensional Ising model requires 4 4 = 256 fourth-order tensor-product operators [26].Further, more complex problems are challenging to implement on NISQ devices.
Therefore, we propose two approximations and one computational technique to overcome this difficulty.We succeeded in significantly reducing the quantum circuit depth of the QITE method, and we applied the developed algorithms to the max-cut problem, which is an NP-hard problem.For the max-cut problem, we chose an unweighted 3-regular graph and a weighted fully connected graph.The latter is a problem known as the classification problem in the context of unsupervised machine learning [29,30].

METHOD Unitarization of imaginary-time evolution operators
Consider a scenario wherein a Hamiltonian Ĥ is given for the optimization problem considered in this study.The Hamiltonian Ĥ is expressed as the summation of some partial Hamiltonians ĥ[m] as Ĥ = where N ham is the number of the partial Hamiltonians.The max-cut problem, which is a computational target of this work, is represented by the Hamiltonian in the form of Ising spins and can be mapped to the Pauli operator representation for qubits in a straightforward manner.In the case of the Hamiltonian of quantum chemistry, each partial Hamiltonian can be mapped to the Pauli-operator representation on qubits via the Bravyi-Kitaev representation [31] or Jordan-Wigner representation [32].
For a given Hamiltonian, the ground state is obtained by using the imaginary-time evolution method.We apply the imaginary-time evolution operator defined by e −τ Ĥ , where τ is the imaginary time to reach the initial (τ = 0) state of the system, |Ψ(τ = 0) ; and e −τ Ĥ |Ψ(τ = 0) .The imaginary-time evolution operator is decomposed by a first-order Suzuki-Trotter decomposition into ones with a small imaginary-time step ∆τ (τ ≡ ∆τ × N step ) of the individual partial Hamiltonians ĥ[m].
Because the operators of the imaginary-time evolution are nonunitary, they cannot be directly implemented as a gate operation on a quantum computer.In the QITE method, the unitary operator e −i∆τ Ân [m] is defined such that it reproduces the state e −∆τ Ĥ |Ψ n for a given state |Ψ n ≡ |Ψ(τ = n∆τ ) .We determine the Hermitian operator Ân [m] that minimizes the following residual norm.

Nonlocal condition for imaginary-time evolution operators
We express the Hermitian operator Ân [m] as a linear combination of the D-th order tensor products of Pauli operators { Îl , σX,l , σY,l , σZ,l } acting on the l-th qubit as where the prime on the first summation symbol indicates removing the double counting of the repeated tensors.We defined L m as the set of N Lm qubits, each of which interact with those in the partial Hamiltonian ĥ[m]; however, it is not contained in ĥ[m].The parameter D, which is called the domain size, satisfies k D k + N Lm , where we assumed the partial Hamiltonian ĥ[m] to be written by a tensor product of the k-th order.{l 1 (m), • • • , l k (m)} is the set of qubits contained in the partial Hamiltonian ĥ[m].The summation in Eq. ( 2) is taken over all combinations of D − k qubits, {l k+1 (m), • • • , l D (m)}, and chosen from L m .D is an input parameter that represents the level of approximation; a larger D indicates that the imaginary-time evolution operator is expressed using higher-order tensor products and the residual norm in Eq. ( 1) shows a smaller value, which leads to a better approximation.We consider a scenario where the domain size D incorporates all elements in L m , namely D = k + N Lm , and then Eq. ( 2) reproduces the operator A n [m] introduced in Ref. [26].This implies that Eq. ( 2) is a natural extension of the approximation introduced in Ref. [26].We call the method for determining the operator A n [m] defined in Ref. [26] local approximation (LA) for comparison with later approximation.Then, we refer to the method defined in Eq. ( 2) as extended local-approximation (eLA).The following notation is used to indicate the domain size D: e.g., LA with D = 6 is denoted by LA-D6.Note that, for LA, it is a well-defined approximation only when the domain size D = k+N Lm , and the value of D that can be taken is limited by the Hamiltonian.In addition, note that eLA can remove such constraints on the Hamiltonian and flexibly determine the parameter D by considering the linear combination for qubits.This flexibility is obvious in the max-cut problem of the fully connected graph.Solving the minimization problem in Eq. ( 1) to determine the coefficients a which can be solved using a classical computer.Here, S  1(a) shows a schematic of the quantum circuit representing one imaginary-time step of LA.In LA, the operator of the imaginary-time evolution is approximated by the tensor products of Pauli operators up to the D-th order; therefore, 4 D gate operations are required for each partial Hamiltonian.The total number of gate operations for one step of the imaginary-time evolution is N ham 4 D .Table I summarizes the size of the linear equation of the LA per step of the imaginary-time evolution and the number of gate operations per qubit, where N bit is the total number of qubits.
Furthermore, this study proposes another approximation method for Ân in the following form: The difference from Eq. ( 2) is that we remove the restriction on the set {l 1 (m), • • • l k (m)} and extend the summation over qubits to incorporate all possible combinations of D qubits {l 1 (m), • • • l D (m)}.We call this an NLA.

RESULTS
To clarify the accuracy and effectiveness of NLA, we applied it to the max-cut problem, which is an NP-hard problem.The Hamiltonian of the max-cut problem in qubit representation is given in the following form containing second-order tensor products [22].
As for the max-cut problem, we considered typical graphs such as 3-regular and fully connected graphs.The 3regular graphs have three connected edges at every vertex, where E is the set of edges contained in the graph and d i,j is the weight of the edges connecting the ith and jth vertices.

Reduction effect of circuit depth
The circuit depths when LA and NLA are applied to the max-cut problem are shown in Fig. 1(c) for the 3regular graph and Fig. 1(d) for the fully-connected graph because different graphs of the max-cut problem change the number of the partial Hamiltonian N ham ; the necessary circuit depths for each approximation change correspondingly.In Fig. 1(c) and (d), the circuit depth calculated using Qiskit [33] is plotted with points, and the plotted points are extrapolated.In the case of kregular graphs, the number of the partial Hamiltonians is given by N ham = kN bit /2.It increases linearly with the number of vertices N bit so that the number of gate operations per qubit does not depend on the number of qubits, as listed in Table I.Thus, we extrapolated using y = const..In NLA, regardless of the structure of the Hamiltonian, the number of gate operations per qubit is scaled by O(N bit D−1 ) with respect to the number of qubits N bit because all combinations of N bit C D are taken for gate operations including the D-th order tensor product.In Figure 1(c), the circuit depth of the NLA is extrapolated by the function fitted by f (x) = x D−1 .
Note that in LA, D = 3, 4, and 5 are not well defined in the 3-regular graph.Thus, D = 6 is required, and 4 6 = 4096 gate operations are necessary for the imaginary-time evolution of one partial Hamiltonian, which leads to a deeper circuit depth and difficulty in implementation on NISQ devices.In addition, the circuit depth required for LA-D6, compared to NLA-D2, NLA-D3, etc., is considerably higher in the region with a small number of qubits.The circuit depth of the NLA becomes deeper than that of LA in the region where the number of qubits increases.
In Figure 1(d), LA-D2 and eLA-D3 are not shown for the fully connected graph (N ham = N bit C 2 ) because the circuit depth of LA-D2 is equal to that of the NLA-D2, and that of eLA-D3 is equal to that of NLA-D3.In addition, because the domain size has to be D = N bit in LA, which is the exact imaginary-time evolution in a fully connected graph, and the circuit depth increases exponentially with respect to the number of qubits.In NLA, it can be scaled down to the linear or quadratic function with respect to the number of qubits.This result indicates that the NLA and eLA are efficient in reducing the circuit depth, especially when the number of partial Hamiltonians increases; further, these algorithms are effective for NISQ devices.

Calculation accuracy
Simulations were performed after modifying the code provided in Ref. [26].As an initial state, we adopted a state in which all states were superimposed with equal a priori weights.We adopt a figure of merit to discuss the accuracy of the QITE method.
The first target of the max-cut problem is an unweighted 3-regular graph with ten vertices, where E GS is the energy of the ground state, and it is obtained from the exact diagonalization.The energy of the ground state is E GS = −12.It is known that designing a classical algorithm that achieves r > 331/332 for an unweighted 3-regular graph is an NP-hard problem [34].Further, the approximation accuracy of the current classical algorithm is r ≈ 0.9326 [35].Figure 2 (a) shows the imaginary-time dependence of the energy.The imaginary-time step was set to ∆τ = 0.01.In LA-D2, as the imaginary-time τ increased, the energy decreased exponentially in the beginning and converged to around −9, which is higher than the exact solution by about 3. Another important point is that the energy does not monotonically decreases along the imaginary-time evolution.This behavior indicates that the conversion of the operator of the imaginarytime evolution to the unitary operators is less accurate in expanding it in the space of LA-D2.Furthermore, the LA-D6 calculation result shows E = −11.99,which is the energy almost equal to the exact solution.We found that an approximation accuracy in the eLA-D3 is E = −11.17(r = 0.93) (the lowest value is E = −11.33(r = 0.94)); in NLA-D2, E = −11.42(r = 0.95); and in NLA-D3, E = −12.00(r = 1.00).We found that eLA-D3 had an approximation accuracy similar to that of the classical algorithm, and NLA-D2 had already exceeded the approximation accuracy of the classical algorithm.Note that eLA and NLA monotonically decrease the energy along the imaginary-time evolution with sufficiently good accuracy compared to LA-D2.This behavior was confirmed not only for NLA-D2 but also for NLA-D3 and others.As can be seen from Fig. 1 (c), in LA-D6, the circuit depth of one imaginary-time step is 369757, while the circuit depth in the NLA-D2 is 789.This implies the circuit depth of NLA can be significantly shallower than that of LA.
While NLA-D3 has extremely high accuracy, its circuit depth increases with a quadratic function with respect to the number of qubits.Then, we developed NLA-D2.5 to keep the scaling of the circuit depth as linear as NLA-D2 while maintaining the accuracy of NLA-D3, which is an approximation to expand the space of Ân to the space involving the second-order tensor products incorporated by NLA-D2 and the third-order tensor products by eLA-D3.Thus, by incorporating some portions of bases of eLA-D3 into those of NLA-D2, computational scaling can be made linear with respect to the number of qubits, which makes it applicable even in regions with a large number of qubits.Fig. 1(c) shows that the circuit depth is almost the same as that of NLA-D2 for 50 qubits or more, which means that the circuit depth can be significantly reduced compared to that of NLA-D3.In addition, the calculation result of NLA-D2.5 is E = −11.95and r = 0.99, which gives a good approximation accuracy with a small circuit depth.
Here, for further consideration, we decomposed the state |Ψ(τ ) = e −τ Ĥ |Ψ(τ = 0) into the eigenstate components of the Hamiltonian, and the calculated n(E) as a function of energy E at each imaginary-time step τ is plotted in Fig. 2(b) where |i is the eigenstate of Ĥ and E i is the eigen-energy of |i .The ground state can be observed with probabilities of n(E GS ) = 0.60 for eLA-D3 (at maximum, n(E GS , τ = 2.87) = 0.65), n(E GS ) = 0.69 for NLA-D2, n(E GS ) = 0.97 for NLA-D2.5, and n(E GS ) = 1.00 for NLA-D3.The imaginary-time dependence of the probability of the first excited state is also plotted.For the first excited state, it is observed that the probability is amplified up to τ = 1, and it starts to decrease, which increases the ground state probability.
Next, we deal with another computational model called a weighted fully connected graph (classification problem).The coupling constants d i,j were given by random numbers.The ground state energy is E GS = −57.993.In addition, the imaginary-time step is set to ∆τ = 0.01.
In the classification problem, as shown in Fig. 2(e), each graph vertex is colored red or blue.In LA-D2, as in the 3-regular graph, we observed that the energy does not necessarily decrease monotonically.The energy of eLA-D3 is lower than that of NLA-D2; E = −57.504(r = 0.99) for eLA-D3, E = −57.026(r= 0.98) for NLA-D2, and E = −57.985(r= 0.99) for NLA-D3 (Figure 2 (c)).From the viewpoint of the component analyses of the states, the ground state and the first excited state are pseudo-degenerate (Fig. 2.(e)), and therefore, the probability of the first excited state remains at the same level as the ground state even around τ = 2 when the energy converges sufficiently (Fig. 2.(d)).In NLA, the first excited state gradually decays along with the imaginarytime evolution; however, a sufficiently long imaginarytime evolution is necessary.In particular, NLA-D2 behaves similarly to NLA-D3, and NLA-D2 is sufficiently accurate to obtain the ground state in actual applications.

Compression of imaginary-time steps
The approximation accuracy of the NLA and its circuit depth have been discussed.The "compression of imaginary-time steps" is introduced in this section for further reduction of the number of gate operations in NLA. Figure 3  where N comp is the number of compressed steps.It is necessary to choose an appropriate N comp within the range that guarantees sufficient accuracy for the Suzuki-Trotter decomposition because its accuracy decreases if the N comp becomes large.To determine the specific N comp in this work, we increased the N comp parameter by one at every time-evolution step until the total energy increases.In actual QITE calculations, N comp is not necessarily a constant throughout the calculation.This method enables the reduction of quantum circuits to be as small as 1/N comp .
The graph used for the calculation is the same as that in Figures 2(a) and (b), which is a 3-regular graph with ten vertices.Figure 3 shows the results of the compression technique for the QITE.In Figure 3(b), the time the compression ended is plotted as a blue circle.In the case of Fig. 3(b), the quantum circuit depth is significantly reduced by the compression technique to four compressed imaginary-time steps, and the energy at τ = 10 is E = −11.43(r = 0.95) without and E = −11.59(r = 0.97) with the compression technique.We found that sufficient accuracy was achieved regardless of the compression, which indicates compression does not affect the results.It may be assumed that the compressed technique has a lower energy than that of the uncompressed calculation; a detailed investigation revealed that this was attributed to the accidental acceleration of the convergence by compression.Figure 3 (c) plots the component analyses of the wavefunctions during the imaginary-time evolution with and without the compression method.Finally, the probability of obtaining a ground state is n(E min ) = 0.76 with and n(E min ) = 0.73 without the compression technique.
The "compression of imaginary-time steps" is effective in reducing the circuit depth, and simultaneously, it reduces the noise associated with the gate operations.We discuss the results of the simulation with noise.The actual qubits are currently connected only with neighboring sites; however, in this study, we simulated a fully connected model.For implementation on an actual quantum computer, in which only adjacent sites are connected, a SWAP gate can be used with an overhead of O( √ N bit ) [36].For example, QAOA uses a SWAP network [37,38] to implement a O(N bit ) overhead [39].The error model of the gate was constructed from the thermal relaxation time (T 1 , T 2 ) = (100µs, 80µs), and the gate time (T g1 , T g2 ) = (0.02ns, 0.1ns).The noise simulation was performed by introducing the readout errors (p 00 , p 01 , p 10 , p 11 ) = (0.995, 0.005, 0.02, 0.98).These parameters were assumed to be close to the actual values of IBMQ [40]. Figure 3 (d) shows the simulation results of the max-cut problem for an unweighted graph with four vertices.The coefficients a (n) {i,l} in Eq. (3) for the noisy calculation are the same as those for the non-noisy calculation.The noiseless condition without compression results in E = −3.94,which is close to the exact solution E = −4.00around τ = 5.However, the circuit depth is 922 (∆τ = 0.5), and the simulation result with noise is E = −3.13,which is far from the exact solution.This gap was attributed to the accumulation of errors caused by an increase in circuit depth.The result with compression is E = −3.85 in the case without noise; however, the circuit depth is 163, and the effect of noise is expected to be less sensitive.In fact, the simulation result with noise is E = −3.63,which shows that the noise can be reduced with compression.Thus, it has been shown that the "compression" method of quantum circuits has an advantage of reducing the accumulation of errors.

CONCLUDING REMARKS
In this study, we proposed two-step approximation methods based on nonlocality: eLA and NLA.We applied them to the Max-cut problem of an unweighted 3regular graph and a weighted fully-connected graph, and comparatively validated the performances of LA, eLA, and NLA.We found that NLA requires significantly less circuit depth than LA while maintaining the same level of computational accuracy.For example, when we request Results of the simulation with noise for the max-cut problem in an unweighted graph with four vertices.
the classical approximation limit in the QITE calculations, the circuit depth required for a single imaginarytime step can be significantly reduced from 369757 for LA to 789 for NLA when applying it to a 3-regular graph, and from about 314000 for LA to 789 for NLA when applying it to a fully connected graph.Further, we developed a "compression" technique of the imaginary-time evolution steps to further reduce the circuit depth in the QITE method.With this compression method, we succeeded in further reducing the circuit depth.We showed that the reduction in circuit depth using this compression method has a secondary effect of reducing the accumu-lation of error caused by the gate operation.Thus, it is an effective method for realization on NISQ devices.The eLA, NLA, and compression methods introduced in this study enable us to reduce the circuit depth and the accumulation of error caused by the gate operation significantly and have paved the way for the realization of the QITE method on NISQ devices.
FIG. 1: (a) Quantum circuit diagram for one imaginary-time step of LA.The horizontal line represents each qubit, and the yellow box represents 4 D gating operations on the straddling qubits.(b) Quantum circuit diagram for one imaginary-time step of the NLA (domain size D = 2).The green boxes and vertical lines connecting them represent a second-order tensor product operation on the two straddling qubits, with one imaginary-time step containing N bit C D of second-order tensor products.The dependence of the quantum circuit depth for one imaginary-time step of the max-cut problem in the 3-regular graph (c) and the fully connected graph (d) as a function of the number of qubits.

N
bit C D combinations for the quantum circuit in the first step of the imaginary-time evolution.Figure1(b)shows the schematic of the quantum circuit of the NLA for one step of the imaginary-time evolution (for D = 2).

FIG. 2 :
FIG. 2: Energy E(a) and component proportions of the state n(E)(b) in the QITE method to the max-cut problem for an unweighted 3-regular graph with ten vertices.The ground state is denoted by GS and the first excited state by ES.The energy E(c) and component proportions n(E)(d) of the QITE method for a weighted fully connected graph with ten vertices.(e) The total energy level diagram of the weighted fully connected graph and the eigenstates corresponding to the ground state and the first excited state (divided into two regions, red and blue).
(a)  shows a schematic of the compression technique.When the imaginary-time step ∆τ is sufficiently small, the time-evolution operators can be compressed into a single exponential form via the reverse Suzuki-Trotter decomposition

FIG. 3 :
FIG. 3: (a) Schematic of the compression of the imaginary-time step.energy E(b) and component of the eigenstate n(E)(c) in the imaginary-time evolution with and without compression of the imaginary-time step in the max-cut problem of an unweighted 3-regular graph with ten vertices.The compressed point N comp is plotted with circles.(d) Results of the simulation with noise for the max-cut problem in an unweighted graph with four vertices.

TABLE I :
Scaling of the size of the matrix S (n) and the number of gate operations per qubit of the linear equation of LA and NLA per imaginary-time step feature in that the tensor product space that describes Â[m] is the same for all m.Table I lists the size of the linear equations of the NLA per step of the imaginary-time evolution and the number of gate operations per qubit, where the NLA requires only 4 D unitary operators in