Simulated bifurcation assisted by thermal fluctuation

Various kinds of Ising machines based on unconventional computing have recently been developed for practically important combinatorial optimization. Among them, the machines implementing a heuristic algorithm called simulated bifurcation have achieved high performance, where Hamiltonian dynamics are simulated by massively parallel processing. To further improve the performance of simulated bifurcation, here we introduce thermal fluctuation to its dynamics relying on the Nosé–Hoover method, which has been used to simulate Hamiltonian dynamics at finite temperatures. We find that a heating process in the Nosé–Hoover method can assist simulated bifurcation to escape from local minima of the Ising problem, and hence lead to improved performance. We thus propose heated simulated bifurcation and demonstrate its performance improvement by numerically solving instances of the Ising problem with up to 2000 spin variables and all-to-all connectivity. Proposed heated simulated bifurcation is expected to be accelerated by parallel processing. Ising machines aim to find the global minimum of Ising energy in the fewest time steps. Here, simulated thermal fluctuations are added to simulated bifurcation to quickly escape from local minima.

To further improve the performance of SB, here we introduce thermal fluctuation to SB. SA can yield high-accuracy solutions by modeling thermal fluctuation [33], while quantum annealing utilizes quantum fluctuation [3].These fluctuations can assist escape from local minima of the Ising problem and lead to higher solution accuracy.
Our method is based on the Nosé-Hoover method [50,51], which enables simulations of Hamiltonian dynamics at finite temperatures [52].Unlike SA, the Nosé-Hoover method does not use random numbers, namely, is deterministic, and thus the simplicity of SB is preserved.We find that a simplified dynamics with only a heating process can improve the performance of SB, where an ancillary dynamical variable in the Nosé-Hoover method is replaced by a constant.We numerically demonstrate this improvement by solving instances of the Ising problem with up to 2000 spin variables and all-to-all connectivity, which corresponds to the Sherrington-Kirkpatrick (SK) model introduced in studies of spin glasses [53][54][55].The SK model has been widely used to measure the performance of Ising machines [12, 14, 28-30, 32, 34, 38, 56].Proposed heated SB is also suitable for massively parallel implementations with, e.g., FPGAs.

RESULTS AND DISCUSSION
SB with thermal fluctuation.First, we briefly explain the Ising problem and SB.The Ising problem is to find N Ising spins s i = ±1 minimizing a dimensionless Ising energy, where J ij represents the interactions between s i and s j (J ij = J ji and J ii = 0).The SB has two latest variants, ballistic SB (bSB) and discrete SB (dSB) [38].Both bSB where x i and y i are respectively the positions and momenta corresponding to s i , the dots denote time derivatives, a(t) is a control parameter, and a 0 and c 0 are constants.The force due to the interactions, f i , are given by where sgn(x j ) is the sign of x j .Time evolutions of x i are calculated by solving Eqs. ( 2) and (3) with the symplectic Euler method [52], where the positions x i are confined within |x i | ≤ 1 by perfectly inelastic walls at x i = ±1, that is, if |x i | > 1 after each time step, x i and y i are set to x i = sgn (x i ) and y i = 0.With increasing a(t) from zero to a 0 , bifurcations to x i = ±1 occur, and the signs s i = sgn (x i ) yield a solution to the Ising problem.A solution at the final time is at least a local minimum of the Ising problem [38].Ballistic behavior in bSB leads to fast convergence to a local or approximate solution, while the discretized f i in dSB enable higher solution accuracy with a longer time.
Here we apply the Nosé-Hoover method [50][51][52] with a finite temperature T to Eqs. ( 2) and (3), obtaining ẋi = a 0 y i , where ξ is an ancillary variable playing a role of thermal fluctuation, and M is a parameter (mass).The variable ξ controls an instantaneous temperature defined by to be a given T as follows.When T inst is smaller than T , ξ is negative according to Eq. ( 8), which makes ξ negative.Then |y i | increase owing to the last term in Eq. ( 7), and thus T inst increases and approaches T , which can be regarded as heating.To the contrary, when T inst > T , cooling occurs.We found that SB gives better solutions when ξ is kept negative by negative initial ξ and large M (leading to ξ 0).This observation suggests that the heating can improve SB but the cooling is unnecessary.This may be because increased |y i | by the heating can lead to escape from local minima of the Ising problem.Furthermore, small | ξ| due to large M above implies that constant ξ can play a similar role, and then we found that ξ replaced by a negative constant −γ (γ > 0) can rather yield higher performance.The constant γ is regarded as a rate of the heating.
Thus, in this paper, we propose SB with a heating term, which we call heated bSB (HbSB) and dSB (HdSB), as follows, ẋi = a 0 y i , ( We numerically solve Eqs. ( 10) and ( 11) by discretizing the time by t k+1 = t k + ∆t with a time interval ∆t, and by calculating x i (t k+1 ) and y i (t k+1 ) from x i (t k ) and y i (t k ) in each time step.Here note that the symplectic Euler method is not applicable for Eqs. ( 10) and (11), because these equations are no longer Hamiltonian equations owing to the term γy i [52].We empirically found that solution accuracy can be improved by the same update as previous bSB and dSB [38] followed by an update corresponding to the term γy i .This ordering results in nonzero momenta by the heating and can prevent from getting stuck at the walls.See Methods for a detailed algorithm.
In the following, we compare heated SB with previous SB by solving instances of the Ising problem with allto-all connectivity, where J ij are randomly chosen from ±1 with equal probabilities (corresponding to the SK model).The control parameter a(t) is linearly increased from 0 to a 0 .The constant parameters are set as a 0 = 1 and where c 1 is a parameter tuned around 0.5, which is based on random matrix theory [34,38].x i and y i are initialized by uniform random numbers in the interval (−1, 1).
Performance for a 2000-spin Ising problem.We first solve a benchmark instance called K 2000 , which is a 2000-spin instance of the Ising problem with all-to-all connectivity [12,30,34,38].K 2000 is often expressed as a MAX-CUT problem.The MAX-CUT problem is given by weights w ij with w ij = w ji , and the following cut value C is maximized, where in Eq. ( 14), C has been related to E Ising [Eq.( 1)] by w ij = −J ij .Thus the MAX-CUT problem can be reduced to the Ising problem.
To confirm the effect of the heating, we calculate the instantaneous temperature T inst in Eq. ( 9) and the cut value C in Eq. ( 14) at every time step.Figure 1a shows typical examples of time evolutions of T inst .Here parameters are the ones optimized in advance, which are explained later.For bSB, T inst rapidly decreases owing to collisions with the perfectly inelastic walls, while T inst for HbSB is kept much higher owing to the heating, as expected.For dSB, T inst is also higher than bSB, because the discretized forces f i in Eq. ( 5) increase energy, violating conservation of energy [38].In comparison between HbSB and dSB, T inst for HbSB is higher than dSB around the end of the evolution.HdSB shows similar T inst to dSB, because the optimal rate of the heating γ for HdSB is small.dSB and HdSB.Next, we solve K 2000 in 10 4 trials with random initial x i and y i .In one trial, C is evaluated every 100 time steps, and the best value is output.Figure 2 shows examples of cumulative distributions of C, where the number of trials giving cut values lower than C are normalized by the total number of trials, 10 4 .For bSB, the majority of trials results in one of a few values of C between 33200 and 33300.On the other hand, dSB, HbSB, and HdSB yield broader distributions, owing to fluctuations (higher instantaneous temperatures) in their dynamics.It is notable that the heating makes the distribution of bSB similar to dSB (and HdSB).Inset in Fig. 2 shows a magnification around the best known cut value, C best = 33337 [38].The distributions for HbSB and HdSB are shifted to larger C than dSB, indicating improvement by the heating.
We then evaluate average and maximum C in the 10 4 trials and probability P for obtaining C best .Here, P is estimated by dividing the number of trials obtaining C best by the total number of trials.Also, using P and the number of time steps, N s , we calculate the number of time steps required to find C best with a probability of 99%, which we call step-to-solution S, given by Step-to-solution is a useful measure of performance of an algorithm [32].(Time-to-solution is often used to measure performance of Ising machines [28,31,32,38], but it depends on not only algorithms but also implementations to hardware devices.Time-to-solution equals stepto-solution multiplied by a computation time for one time step.)Here the parameters ∆t, c 1 , and γ are set such that S are minimized.For bSB, instead, average C is maximized, because we found P = 0 for bSB and could not estimate S. Step-tosolution.The black cross is step-to-solution obtained from previously reported data for dSB [38].Parameters ∆t, c1, and γ here are the same as in Fig. 1.SB means simulated bifurcation, and b, d, and H denote ballistic, discrete, and heated, respectively.
Figure 3a shows average and maximum C as functions of N s .For large N s , average C for dSB, HbSB, and HdSB are larger than that for bSB.C best is reached by three SBs other than bSB. Figure 3b shows that, for large N s , P are the highest for HbSB, followed by HdSB, dSB, and bSB in this ordering.
Figure 3c shows S as functions of N s .Each SB has a minimum of S at certain N s , and in the following we compare S minimized with respect to N s .The black cross represents a value obtained from previously reported data for dSB [38].The present value of S for dSB is smaller than the previous value, because in this study the parameters ∆t and c 1 are optimized for K 2000 while not in the previous study.In Fig. 3c, at optimal N s , S for HbSB is the smallest.Compared with dSB, S are reduced by 32.7% for HbSB and 19.7% for HdSB.These results demonstrate that the heating improves the performance.Although dSB performs better than bSB, bSB is much more improved by the heating than dSB, and resulting HbSB shows higher performance than HdSB.
Performance for 100 instances of a 700-spin Ising problem.Finally we examine the performance for instances other than K 2000 by solving 100 instances of the Ising problem with 700 spin variables and all-to-all connectivity [32,38].For these instances, reference solutions for estimating step-to-solution were obtained by SA with sufficiently long annealing times and many iterations, which are expected to be close to optimal solutions [38].Here each instance is solved in 10 4 trials, and C (or E Ising ) is evaluated at the last time step.We set the parameters to the values optimized in K 2000 .
Figure 4 shows the medians of S for the 100 instances [32,38] as functions of N s .HbSB results in the smallest S among four SBs (S by bSB is much larger than those by the others).In comparison with dSB, HbSB reduces S by 9.55%.This result demonstrate that HbSB can improve the performance for not only K 2000 but also the other instances.

CONCLUSION
We have demonstrated that SB can be improved by introducing fluctuation with a heating term, which has been obtained by replacing an ancillary dynamical variable in the Nosé-Hoover method by a constant rate of heating.This heating can be effective for both bSB and dSB, but the improvements are much larger for bSB.We have solved all-to-all connected 2000-spin and 700-spin instances of the Ising problem (the SK model) and have found that HbSB gives better step-to-solution than bSB, dSB, and HdSB.Since proposed heated SB shares the simple dynamics with previous SB, we expect that heated SB will be accelerated by massively parallel processing implemented by, e.g., FPGAs.This study also implies that further improvements of SB will be possible by simple physics-inspired modifications like the heating term introduced here.METHODS Heated simulated bifurcation.First, the symplectic Euler method [52] is formally applied to the terms other than γy i in Eqs.(10) and (11), ỹi = y i (t k )+{−[a 0 −a(t k )]x i (t k )+c 0 f i }∆t, (16) xi = x i (t k )+a 0 ỹi ∆t, (17) where f i are calculated from x i (t k ) with Eqs. ( 4) and (5), and the variables with the tildes denote temporary variables used within a time step.Then, the perfectly inelastic walls work as x i (t k+1 ) = g(x i ) , (18) ỹi = h(x i , ỹi ) , (19) where the functions g(x) and h(x, y) are given by h(x, y) = y, |x| ≤ 1, 0, |x| > 1.
Finally, we include the heating term, referring to the usual Euler method, as y i (t k+1 ) = ỹi +γy i (t k )∆t.( 22) Equations ( 16)-( 19) and ( 22) are numerically solved, where the variables are represented as single precision floating-point real numbers.

FigureFIG. 2 .
Figure 1b shows last parts of typical time evolutions of C. For bSB, C is almost constant for the last 1000 time steps, while C for HbSB continues to fluctuate until nearly the end of the evolution owing to the heating.The fluctuation for HbSB looks similar to the fluctuations for

FIG. 3 .
FIG. 3. Comparison of performance for K2000.a Average (lines) and maximum (symbols) cut values C as functions of the number of time steps, Ns.The horizontal dashed line indicates the best known cut value C best = 33337.The maximum C for dSB, HbSB, and HdSB coincide with C best for Ns ≥ 800.b Probabilities for obtaining C best .cStep-tosolution.The black cross is step-to-solution obtained from previously reported data for dSB[38].Parameters ∆t, c1, and γ here are the same as in Fig.1.SB means simulated bifurcation, and b, d, and H denote ballistic, discrete, and heated, respectively.

FIG. 4 .
FIG. 4. Comparison of performance for 100 instances of a 700-spin Ising problem.The medians of step-tosolution for the 100 instances are shown as functions of the number of time steps, Ns.Parameters ∆t, c1, and γ are the same as in Fig. 1.SB means simulated bifurcation, and b, d, and H denote ballistic, discrete, and heated, respectively.