Introduction

Many complex systems such as spin glasses, interacting proteins, large-scale hardware, and financial portfolios, can be described as ensembles of disordered elements that have competing frustrated interactions1 and rugged energy landscapes. There has been a growing interest in using physical simulators called Ising machines in order to reduce time and resources needed to identify configurations that minimize their total interaction energy, notably that of the Ising Hamiltonians \({{{{{{{\mathcal{H}}}}}}}}\) with \({{{{{{{\mathcal{H}}}}}}}}({{{{{{{\boldsymbol{\sigma }}}}}}}})=-\frac{1}{2}{\sum }_{ij}{\omega }_{ij}{\sigma }_{i}{\sigma }_{j}\) (with ωij the symmetric Ising couplings, i.e., ωij = ωji, and σi = ±1) that is related to many nondeterministic polynomial-time hard (NP-hard) combinatorial optimization problems and various real-world applications2 (see ref. 3 for a review). Recently proposed implementations include memresistor networks4, micro- or nano-electromechanical systems5, micro-magnets6,7, coherent optical systems8, hybrid opto-electronic hardware9,10,11, integrated photonics12,13,14, flux qubits15, and Bose-Einstein condensates16. In principle, these physical systems often possess unique properties, such as coherent superposition in flux qubits17 and energy efficiency of memresistors4,18, which could lead to a distinctive advantage compared to conventional computers (see Fig. 1(a) and (b)) for the sampling of low energy states. In practice, the difficulty in constructing connections between constituting elements of the hardware is often the main limiting factor to scalability and performance for these systems15,19. Moreover, these devices often implement schemes that are directly related to the concept of annealing (either simulated20,21, mean-field22,23, chaotic18,24, and quantum17,25) in which the escape from the numerous local minima26 and saddle points27 of the free energy function can only be achieved under very slow modulation of the control parameter (see Fig. 1(c)). These methods are dependent on non-equilibrium dynamics called aging that, according to recent numerical studies28, is strongly non-ergodic and seems to explore only a confined subspace determined by the initial condition rather than wander in the entire configurational space29 for mean-field spin glass models. In general, such systems find the solutions of minimal energy only after many repetitions of the relaxation process.

Fig. 1: Schematic representation of the proposed chip.
figure 1

Schematic representation of a conventional central processing unit (CPU) and separate memory with the von Neumann bottleneck problem and b the proposed neuromorphic chip for combinatorial optimization. Universal Asynchronous Receiver/Transmitter (UART), phase-locked loops (PLL), and global clock buffer with clock enable (BUFGCE) are used for input/output of data, clock management, and clock gating, respectively. Neurons and synapses are shown in the bottom to illustrate the analogy between the organization of the chip and biological neural networks. Schema of dynamics in analog state-space x of algorithms based on c annealing on a potential function shown at different times ti and d a trajectory of proposed chaotic amplitude control scheme shown by the curved gray line with an arrow. Red circles show the discrete space σ.

Alternative dynamics that is not based on the concepts of annealing and relaxation may perform better for solving hard combinatorial optimization problems30,31,32. Various kinds of dynamics have been proposed3,33,34,35,36, notably chaotic dynamics18,37,38,39,40, but have either not been implemented onto specialized hardware37,41 or use chaotic dynamics merely as a replacement to random fluctuations18,38. It was recently shown that the control of amplitude in mean-field dynamics can improve the performance of Ising machines by introducing error-correction terms (see Fig. 1(d)), effectively doubling the dimensionality of the system, whose role is to correct the amplitude heterogeneity30. Because of the similarity of such dynamics with that of a neural network, it can be implemented especially efficiently in electronic neuromorphic hardware where memory is distributed with the processing42,43,44.

In this paper, we show that the addition of the nonrelaxational part to the relaxation process makes the dynamics able to escape at a much faster rate than relaxational ones from local minima and saddles of the corresponding energy function. The exponential scaling factor with respect to system size of the time needed to reach optimal configurations of spin glasses that can be found after a very long computation time using state-of-the-art heuristics, called time to solution, is smaller in the former case, which raises the question whether the nonrelaxational component makes the dynamics qualitatively different from the very slow relaxation observed in classic Monte-Carlo simulations of spin glasses. In order to extend numerical analysis to large problem sizes and limit finite-size effects, we implement a scheme that we name chaotic amplitude control (CAC) on a field programmable gate array (FPGA, see Fig. 1(b)) and show that the developed hardware can be faster for finding these optimal configurations in the limit of large problem sizes than many state-of-the-art algorithms and Ising machines for some reference benchmarks with enhanced energy efficiency.

Results

Mean-field dynamics

For the sake of simplicity, we consider the classical limit of Ising machines for which the state space is often naturally described by analog variables (i.e., real numbers) noted xi in the following. The variables xi represent measured physical quantities such as voltage4 or optical field amplitude8,9,10,11,12,13 and these systems can often be simplified to networks of interacting nonlinear elements whose time evolution can be written as follows:

$$\frac{\,{{{{{{{\rm{d}}}}}}}}{x}_{i}}{\,{{{{{{{\rm{d}}}}}}}}t}={f}_{i}({x}_{i})+{\beta }_{i}(t)\mathop{\sum}\limits_{j}{\omega }_{ij}{g}_{j}({x}_{j})+{\sigma }_{0}{\eta }_{i},$$
(1)

where fi and gi represent the nonlinear gain and interaction, respectively, and are assumed to be monotonic, odd, and invertible “sigmoidal” functions for the sake of simplicity; ηi, experimental white noise of standard deviation σ0 with \(\langle {\eta }_{i}(t){\eta }_{j}(t^{\prime} )\rangle ={\delta }_{ij}\delta (t-t^{\prime} )\); and N, the number of spins. δij and δ(t) are the Kronecker delta symbol and Dirac delta function, respectively. Ordinary differential equations similar to eq. (1) have been used in various computational models that are applied to NP-hard combinatorial optimization problems such as Hopfield-Tank neural networks45, coherent Ising machines46, and correspond to the “soft” spin description of frustrated spin systems47. Moreover, the steady states of eq. (1) correspond to the solutions of the “naive” Thouless-Anderson-Palmer (nTAP) equations that arise from the mean-field description of Sherrington-Kirkpatrick spin glasses when the Onsager reaction term has been discarded48. In the case of neural networks in particular, the variables xi and constant parameters ωij correspond to firing rates of neurons and synaptic coupling weights, respectively.

It is well known that, when βi = β for all i and the noise is not taken into account (σ0 = 0), the time evolution of this system is motion in the state space that seeks out minima of a potential function49 (or Lyapunov function) V given as \(V=\beta {{{{{{{\mathcal{H}}}}}}}}({{{{{{{\boldsymbol{y}}}}}}}})+{\sum }_{i}{V}_{b}({y}_{i})\), where Vb is a bistable potential with \({V}_{b}({y}_{i})=-\int\nolimits_{0}^{{y}_{i}}{f}_{i}({g}_{i}^{-1}(y))\,{{{{{{{\rm{d}}}}}}}}y\) and \({{{{{{{\mathcal{H}}}}}}}}({{{{{{{\boldsymbol{y}}}}}}}})=-\frac{1}{2}{\sum }_{ij}{\omega }_{ij}{y}_{i}{y}_{j}\) is the extension of the Ising Hamiltonian in the real space with yi = gi(xi) (see Supplementary Note 1.1). The bifurcation parameter β, which can be interpreted as the inverse temperature of the naive TAP equations48, the steepness of the neuronal transfer function in Hopfield-Tank neural networks45, or to the coupling strength in coherent Ising machines8,9,10, is usually decreased gradually in order to improve the quality of solutions found. This procedure has been called mean-field annealing23, and can be interpreted as a quasi-static deformation of the potential function V (see Fig. 1(c)). There is, however, no guarantee that a sufficiently slow deformation of the landscape V will ensure convergence to the lowest energy state contrary to the quantum adiabatic theorem50 or the convergence theorem of simulated annealing51. At fixed β, global convergence to the minimum of the potential V can be assured if σ0 is gradually decreased with \({\sigma }_{0}{(t)}^{2} \sim \frac{c}{log(2+t)}\) and c sufficiently large52. The parameter \({\sigma }_{0}^{2}\) is analogous to the temperature in simulated annealing in this case. The global minimum of the potential V does not, however, generally correspond to that of the Ising Hamiltonians \({{{{{{{\mathcal{H}}}}}}}}\) at finite β. Moreover, the statistical analysis of spin glasses suggests that the potential V is highly non-convex at low temperature and that simple gradient descent very unlikely reaches the global minimum of \({{{{{{{\mathcal{H}}}}}}}}({{{{{{{\boldsymbol{\sigma }}}}}}}})\) because of the presence of exponentially numerous local minima26 and saddle points27 as the size of the system increases. The slow relaxation time of Monte-Carlo simulations of spin glasses, such as when using simulated annealing, might also be explained by similar trapping dynamics during the descent of the free energy landscape obtained from the TAP equations27. In the following, we consider in particular the soft spin description obtained by taking \({f}_{i}({x}_{i})=(-1+p){x}_{i}-{x}_{i}^{3}\) and yi = g(xi) = xi, where p is the gain parameter, which is the canonical model of the system described in eq. (1) at proximity of a pitchfork bifurcation with respect to the parameter p (see theory of weakly connected neural networks53). In this case, the potential function Vb is given as \({V}_{b}({x}_{i})=(1-p)\frac{{x}_{i}^{2}}{2}+\frac{{x}_{i}^{4}}{4}\) and eq. (1) can be written as \(\frac{\,{{{{{{{\rm{d}}}}}}}}{x}_{i}}{\,{{{{{{{\rm{d}}}}}}}}t}=-\frac{\partial V}{\partial {x}_{i}}\), i.

Chaotic amplitude control

In order to define deterministic dynamics that is inclined to visit spin configurations associated with lower Ising Hamiltonian without relying entirely on the descent of a potential function, we introduce error signals, noted \({e}_{i}\in {\mathbb{R}}\), that modulate the strength of coupling βi to the ith nonlinear element such that βi(t) defined in eq. (1) is expressed as βi(t) = βei(t) with β > 0. The time evolution of the error signals ei are given as follows30:

$$\frac{\,{{{{{{{\rm{d}}}}}}}}{e}_{i}}{\,{{{{{{{\rm{d}}}}}}}}t}=-\xi (g{({x}_{i})}^{2}-a){e}_{i},$$
(2)

where a and ξ are the target amplitude and the rate of change of error variables, respectively, with a > 0 and ξ > 0. If the system settles to a steady state, the values \({y}_{i}^{* }=g({x}_{i}^{* })\) become exactly binary with \({y}_{i}^{* }=\pm\!\sqrt{a}\). When p < 1, the internal fields hi at the steady state, defined as hi = ∑jωijσj with \({\sigma }_{j}={y}_{j}^{* }/| {y}_{j}^{* }|\), are such that hiσi > 0, i30. Thus, each equilibrium point of the analog system corresponds to that of a zero-temperature local minimum of the binary spin system.

The dynamics described by the coupled equations (1) and (2) is not derived from a potential function because error signals ei introduce asymmetric interactions between the xi and the computational principle is not related to a gradient descent. Rather, the addition of error variables results in additional dimensions in the phase space via which the dynamics can escape local minima. The mechanism of this escape can be summarized as follows. It can be shown (see the Supplementary Note 1.2) that the dimension of the unstable manifold at equilibrium points corresponding to local minima σ of the Ising Hamiltonian depends on the number of eigenvalues μ(σ) with μ(σ) > F(a) where μ(σ) are the eigenvalues of the matrix \({\{\frac{{\omega }_{ij}}{| {h}_{i}| }\}}_{ij}\) (with internal field hi) and F a function given as \(F(y)=\frac{\psi ^{\prime} (y)}{\psi (y)}y\) and \(\psi (y)=\frac{f({g}^{-1}(y))}{({g}^{-1})^{\prime} (y)}\). Thus, there exists a value of a such that all local minima (including the ground state) are unstable and for which the system exhibits chaotic dynamics that explores successively candidate boolean configurations. The energy is evaluated at each step and the best configuration visited is kept as the solution of a run. This chaotic search is particularly efficient for sampling configurations of the Ising Hamiltonian close to that of the ground state using a single run although the distribution of sampled states is not necessarily given by the Boltzmann distribution. Note that the use of chaotic dynamics for solving Ising problems has been discussed previously18,54, notably in the context of neural networks, and it has been argued that chaotic fluctuations may possess better properties than Brownian noise for escaping from local minima traps. In the case of the proposed scheme, the chaotic dynamics is not merely used as a replacement for noise. Rather, the interaction between nonlinear gain and error-correction results in the destabilization of states associated with lower Ising Hamiltonian. Increasing the amplitude of additive noise σ0 in eq. (1) does not, in fact, significantly increase the efficiency of the system and σ0 is thus set to zero.

Ensuring that fixed points are locally unstable does not guarantee that the system does not relax to periodic and chaotic attractors. We have previously proposed that non-trivial attractors can also be destabilized by ensuring the positive rate of Gibbs entropy production using a modulation of the target amplitude30. In this paper, we propose an alternative heuristic modulation of the target amplitude that is more suitable for a digital implementation on FPGA than the one proposed in30 without significant decrease in performance for most problem instances (see Supplementary Note 1.3 for a comparison of the two schemes). Because the value of a for which all local minima is unstable is not known a priori, we propose instead to destabilize the local minima traps by dynamically modulating a depending on the visited configurations σ as follows:

$$a(t)=\alpha -\rho \,{{\mbox{tanh}}}\,(\delta {{\Delta }}{{{{{{{\mathcal{H}}}}}}}}(t)),$$
(3)

where \({{\Delta }}{{{{{{{\mathcal{H}}}}}}}}(t)={{{{{{{{\mathcal{H}}}}}}}}}_{{{\mbox{opt}}}}-{{{{{{{\mathcal{H}}}}}}}}(t)\); \({{{{{{{\mathcal{H}}}}}}}}(t)\), the Ising Hamiltonian of the configuration visited at time t; and \({{{{{{{{\mathcal{H}}}}}}}}}_{{{\mbox{opt}}}}\), a given target energy. In practice, we set \({{{{{{{{\mathcal{H}}}}}}}}}_{{{\mbox{opt}}}}\) to the lowest energy visited during the current run, i.e., \({{{{{{{{\mathcal{H}}}}}}}}}_{{{\mbox{opt}}}}(t)={{{\mbox{min}}}}_{t^{\prime} \le t}{{{{{{{\mathcal{H}}}}}}}}(t^{\prime} )\). The function tanh is the tangent hyperbolic. ρ and δ are positive real constants. Lastly, the parameter ξ (see eq. (2)) is modulated as follows: \(\frac{\,{{{{{{{\rm{d}}}}}}}}\xi }{\,{{{{{{{\rm{d}}}}}}}}t}={{\Gamma }}\) when t − tr < Δt, where tr is the last time for which either the best known energy \({{{{{{{{\mathcal{H}}}}}}}}}_{{{\mbox{opt}}}}\) was updated or ξ was reset. Otherwise, ξ is reset to 0 if t − tr ≥ Δt and tr is set to t. Numerical simulations shown in the following suggest that this modulation results in the destabilization of non-trivial attractors (periodic, chaotic, etc.) for typical problem instances.

Hardware independent time-to-solution scaling

In order to test if the nonrelaxational dynamics of chaotic amplitude control might be able to accelerate the search of mean-field dynamics for finding the ground state of typical frustrated systems, we look for the ground states of Sherrington-Kirkpatrick (SK) spin glass instances using the numerical simulation of eqs. (1) to (3) and compare time to solutions with those of two closely related relaxational schemes: noisy mean-field annealing (NMFA)22 and the simulation of the coherent Ising machine (simCIM, see Supplementary Note 4.1-2). Because finding the ground states of SK spin glasses is a NP-hard problem, it cannot be assured that the states found by heuristic solvers, even after a very long computation time, are the real ground states. In order to compare the performance of a set of heuristic solvers, we can define for each instance the “optimal” energy as the one equal to the lowest energy found repeatedly (>106 times) by these heuristics after a very long computation time. States that have this optimal energy are called solution states in the following. Because the arithmetic complexity of calculating one step of these three schemes is dominated by the matrix-vector multiplication (MVM), it is sufficient for the sake of comparison to count the number of MVM, noted ν, to find the solution state energy of a given instance with 99% success probability, with \(\nu (K)=K\frac{\,{{\mbox{ln}}}(1-0.99)}{{{\mbox{ln}}}\,(1-{p}_{0}(K))}\) and p0(K) the probability of visiting a solution state configuration at least once after a number K of MVMs in a single run. In Fig. 2, NMFA (a) and the CAC (b) are compared using the averaged success probability 〈p0〉 of finding the solution state for 100 randomly generated SK spin glass instances per problem size N (see Supplementary Note 3 for details about the benchmark set). Note that the success probability of the mean-field annealing method does not seem to converge to 1 even for large annealing time (see Fig. 2(a)). Because the success probability of NMFA and simCIM remains low at larger problem size, its correct estimation requires simulating a larger number of runs, which we achieved by using GPU implementations of these methods. On the other hand, the average success probability 〈p0〉 of CAC is of order 1 when the maximal number of MVM is large enough, suggesting that the system rarely gets trapped in local minima of the Ising Hamiltonian or non-trivial attractors. In Fig. 2(c) and (d) are shown the qth percentile (with q = 50, i.e., the median) of the MVM to solution distribution, noted νq(K; N), for various duration of simulation K, where K is the number of MVMs of a single run. The minimum of these curves, noted \({\nu }_{q}^{* }(N)\) with \({\nu }_{q}^{* }(N)={{{\mbox{min}}}}_{K}{\nu }_{q}(K;N)\), represents the optimal scaling of MVM to solution vs. problem size N15. Using the hypothesis of an exponential scaling with the square root of problem size N, CAC exhibits significantly smaller scaling exponent (γ = 0.18 ± 0.06) than the NMFA (γ = 0.47 ± 0.04) and simCIM (γ = 0.54 ± 0.03, see inset in Fig. 2(e)). We have verified that this scaling advantage holds for various parameters of the mean-field annealing (see Supplementary Note 4.1). Note that a root-exponential scaling behavior of the median time to solution has been previously reported for SK spin glass problems15,55 and other NP-Hard problems56. We consider both cases of a root-exponential and exponential scaling as possible hypotheses in the following. Note that a similar root-exponential scaling at finite size N is observed when we apply the CAC algorithm to solving problems from the recently proposed Wishart planted instances57,58 in the “easy” regime (see Supplementary Note 3.3).

Fig. 2: Matrix-vector multiplication to solution for Sherrington-Kirkpatrick instances.
figure 2

a, b Average success probability 〈p0〉 of finding the solution state configuration of 100 Sherrington-Kirkpatrick spin glass instances and c, d 50th percentile (median) of the matrix-vector multiplication (MVM) to solution distribution ν50 vs. system size N. a, c noisy mean-field annealing (NMFA). b, d CAC. Thicker lines correspond to the duration of simulation K = 105.0. Dotted black lines show the minimum of the median MVM to solution ν50 with respect to the number of MVMs of a single run K. e MVM to solution \({\nu }_{q}^{* }(N)\) distribution. Lower, higher, and upper whisker of boxes show the 50th, 80th, and 90th percentiles of the distribution. The upper right inset shows the exponential scaling factor γ of the 50th percentile with \({\nu }_{50} \sim {e}^{\gamma \sqrt{N}}\) for CAC, NMFA, and simCIM.

Benchmark of the FPGA implementation

Although comparison of algebraic complexity indicates that CAC has a scaling advantage over mean-field annealing, it is in practice necessary to compare its physical implementation against other state-of-the-art methods because the performance of hardware depends on other factors such as memory access and information propagation delays. To this end, CAC is implemented into a FPGA because its similarity with neural networks makes it well-fitted for a design where memory is distributed with processing (see Supplementary Note 2 for the details of the FPGA implementation). The organization of the electronic circuit can be understood using the following analogy. Pairs of analog values xi and ei, which represent averaged activity of two types of neurons, are encoded within neighboring circuits. This micro-structure is repeated N times on the whole surface of the chip, which resembles the columnar organization of the brain. The nonlinear processes fi(xi), which model the local-population activation functions and are independent for i ≠ j, are calculated in parallel. The coupling between elements i and j {1, …, N} that is achieved by the dot product in eq. (1) is implemented by circuits that are at the periphery of the chip and are organized in a summation tree reminiscent of dendritic branching (see Fig. 1(b)). The power consumption of the developed hardware never exceeds 5W because of limitations of the development board that we have used.

First, we compare the FPGA implementation of CAC against state-of-the-art CPU algorithms: break-out local search59 (BLS) that has been used to find many of the best known maximum-cuts (equivalently, Ising energies) from the GSET benchmark set (https://web.stanford.edu/~yyye/yyye/Gset/), a well-optimized single-core CPU implementation of parallel tempering (or random replica exchange Monte-Carlo Markov chain sampling)60,61 (PT, courtesy of S. Mandrà), simulated annealing (SA)62. Figure 3(a) shows that the CAC on FPGA has the smallest real time to solution \({\tau }_{q}^{* }\) against most other state-of-the-art algorithms despite just 5W power-consumption where \({\tau }_{q}^{* }(N)\) is the optimal qth percentile of time to solution with 99% success probability and is given as \({\tau }_{q}^{* }(N)={{{\mbox{min}}}}_{T}{\tau }_{q}(T;N)\) where τ(T) of a given instance is \(\tau (T)=T\frac{\,{{\mbox{ln}}}(1-0.99)}{{{\mbox{ln}}}\,(1-{p}_{0}(T))}\) and T is the duration in seconds of a run. The probability p0(T) is evaluated using 100 runs per instance. The results of the CAC algorithm run on a CPU are also included in Fig. 3. The CPU implementation of CAC written in python for this work is not optimized and is consequently slower than other algorithms. However, its scaling of time to solution with problem size is consistent with that of CAC on FPGA. Figure 3 shows that CAC implemented on either CPU or FPGA has a significantly smaller increase of time to solution with problem size than SA run on CPU. Note that the power consumption of transistors in the FPGA and CPU scales proportionally to their clock frequencies. In order to compare different hardware despite the heterogeneity in their power consumption, the qth percentile of energy-to-solution \({E}_{q}^{* }\), i.e., the energy \({E}_{q}^{* }\) required to solve SK instances with \({E}_{q}^{* }=P{\tau }_{q}^{* }\) and P the power consumption, is plotted in Fig. 3(b). For the sake of simplicity, we assume a 20 watts power consumption for the CPU. These numbers represent typical orders to magnitude for contemporary digital systems. CAC on FPGA is 102 to 103 times more energy efficient than state-of-the-art algorithms running on classical computers.

Fig. 3: Time to solution of the field programmable gate array (FPGA) implementation for Sherrington-Kirkpatrick instances.
figure 3

a Lower, higher, and upper whisker of boxes show the 50th, 80th, and 90th percentiles of the time to solution \({\tau }_{q}^{* }\) distribution in seconds vs. the square root of problem size \(\sqrt{N}\) for the FPGA implementation of chaotic amplitude control (CAC) with a maximum of 5W power consumption and the following algorithms running on a CPU (20W): CAC, simulated annealing (SA), and parallel tempering (PT). b The same as a for the energy-to-solution E*. Power consumption of the FPGA is considered constant with respect to N because of the pipelined implementation.

The Monte-Carlo methods SA and PT have moreover been recently implemented on a special-purpose electronic chip called Digital Annealer (DA)20. In Fig. 4, we show the scaling exponents of 50th and 80th percentiles of the MVM and time to solution distribution for problem sizes N = 800 to N = 1100 based on the hypothesis of scaling in eγN obtained by fitting data shown in Figs. 2 and 3 (see “CAC fully parallel” and ‘CAC 100 × 100”, respectively, in Fig. 4 for the numerical values of the scaling exponents), and compare them to that reported for SA and PT implemented on CPU and DA. Note that we replicated the benchmark method of ref. 20 for the sake of the comparison by fitting time to solution from N = 800 up to N = 1100. The scaling obtained from fitting the MVM to solution in Fig. 2 is based on the assumption that the matrix-vector multiplication can be calculated fully in parallel in a time that scales as log(N) instead of N2 (see Methods section) at least up to N = 1100. We include this hypothesis because many other Ising machines exploit the parallelization of matrix-vector multiplication for speed up20,63, whereas the current implementation of CAC iterates on block matrices of size 100 by 100 and is thus only partially parallel because of resource limitations specific to the downscale FPGA used in this work. Note that the number of matrix-vector multiplications necessary to find the solution state dominates the exponential scaling rather than the time to compute one matrix-vector multiplication. The scaling of time to solution for chaotic amplitude control observed is significantly smaller than the ones of standard Monte-Carlo methods SA and PT20, especially in the case of a fully parallel implementation. The scaling exponents of fully parallel CAC is smaller than that of DA and on par with that of PT on DA (PTDA), although CAC does not require simulating replica of the system and is thus faster in absolute time than PTDA20.

Fig. 4: Scaling exponents of the proposed method.
figure 4

Scaling exponents \(\frac{\gamma }{\,{{\mbox{log}}}\,(10)}\) of the 50th (a) and 80th (b) percentiles of the time to solution distribution based on the hypotheses of scaling in eγN obtained by fitting data of the number of matrix-vector multiplication (MVM) to solution vs. problem size N shown in Fig. 2 of the proposed chaotic amplitude control dynamics and the scaling exponents reported in ref. 20. Colored boxes show the 90% confidence interval in the scaling exponents. CAC: chaotic amplitude control; SA: simulated annealing; PT: parallel tempering; DA: digital annealer; PTDA: parallel tempering on DA. Exponents of SA, PT, DA, and PTDA are taken from ref. 20. CAC (100 × 100) and CAC (fully parallel) are estimated using time (see Fig. 3) and MVM (see Fig. 2) to solution, respectively.

Next, the proposed implementation of chaotic amplitude control is compared to other recently developed Ising machines (see Fig. 5). The relatively slow increase of time to solution with respect to the number of spins N when solving larger SK problems using CAC suggests that our FPGA implementation is faster than the Hopfield neural network implemented using memresistors (mem-HNN)4, the restricted Boltzmann machine using a FPGA (FPGA-RBM)55 at large N. Extrapolations are based on the hypotheses of scaling in eγN and \({e}^{\gamma \sqrt{N}}\) by fitting the available experimental data up to N = 100 for mem-HNN and FPGA-RBM, N = 150 for NTT CIM, and N = 1100 for FPGA-CAC. Figure 5 shows that mem-HNN, FPGA-RBM, and NTT CIM have similar scaling exponents, although FPGA-RBM tends to exhibit a scaling in eγN rather than \({e}^{\gamma \sqrt{N}}\) for N ≈ 10055. It can be nonetheless expected that the algorithm implemented in mem-HNN, which is similar to mean-field annealing, has the same scaling behavior as simCIM and NMFA (see Fig. 2). Note that the lines showing the scaling in \({e}^{\gamma \sqrt{N}}\) in Fig. 5 can be seen as a lower bound of the TTS if the sub-exponential scaling is finite-size effect.

Fig. 5: Median time to solution and extrapolations for the exponential and sub-exponential scaling hypotheses on different Ising machines.
figure 5

For chaotic amplitude control on field programmable gate array (CAC, FPGA) and digital annealer (DA), the lower, higher, and upper whisker of boxes show the 50th, 80th, and 90th percentiles of the real time to solution distribution. Markers show the experimental and simulated data. Dashed and full straight lines show the fit of the logarithm of median time to solution vs. problem size N and its square root \(\sqrt{N}\), respectively, and extrapolations up to N = 2000.

It is noteworthy to mention that a recent implementation of the simulated bifurcation machine63 (SBM), which is not based on the gradient descent, similarly to CAC, but based on adiabatic evolutions of energy conservative systems performs well in solving SK problems. Both SBM and CAC exhibit smaller time to solution than other gradient based methods. SBM has been implemented on a FPGA (the Intel Stratix 10 GX) that has approximately 5 to 10 times more adaptive logic modules than the KU040 FPGA used to implement CAC. In order to compare SBM and CAC if implemented on an equivalent FPGA hardware, we plot in Fig. 5 the estimation of the time to solution for a fully parallel implementation of CAC using the hypothesis that one matrix-vector multiplication of size 1100 × 1100 can be achieved in 0.3 μs. This is the same time to compute a MVM that we can infer from time to solution reported in ref. 63 for SBM with binary connectivity given that problems of size N = 100 (N = 700) are solved in 29 μs (55 ms) and 94 (81000) MVMs, respectively. Note that SBM can reach the solution states after approximately 20 times less MVMs than CAC at N = 100 but only 5 times less MVM at N = 1000, suggesting that the speed of SBM depends largely on hardware rather than an algorithmic scaling advantage. Moreover, the simulated bifurcation machine63 does not perform significantly better than our current implementation of CAC for solving instances of the reference MAXCUT benchmark set called GSET (https://web.stanford.edu/~yyye/yyye/Gset/) (see Table 1) because CAC can find better cuts and exhibits smaller time to solution for multiple GSET instances even compared to the case of the implementation on the smaller KU040 FPGA with the probability of finding maximum cuts of the GSET in a single run that is much smaller with SBM. Comparison of the scaling behavior of time to solution between CAC and SBM is unfortunately not possible based on available data63.

Table 1 Performance of the field programmable gate array (FPGA) implementation of chaotic amplitude control (CAC) on the GSET. Performance of the FPGA implementation of CAC in finding the maximum cuts known, i.e., lowest Ising Hamiltonian known, of graphs in the GSET benchmark. id, Copt, CCAC, CSBM are the name of instances, best maximum cuts known from ref. 73, the proposed method after 100 runs, and Toshiba bifurcation machine on FPGA63 (FPGA-SBM), respectively. CSBM is evaluated using >100 runs but shorter time per run. The difference in cuts between CAC and SMB is given by ΔCCAC/SBM = CCAC − CSBM. \({p}_{0}^{\,{{\mbox{CAC}}}\,}\) and \({p}_{0}^{\,{{\mbox{SBM}}}\,}\) are the probabilities that FPGA-CAC and FPGA-SBM find the cut CCAC and CSBM in a single run, respectively. Moreover, < tBLS > and < tCAC > are the averaged time over 20 and 100 runs, respectively, to find the corresponding best cut in successful runs using BLS written C++ and running on a Xeon E5440 2.83 GHz59 and the proposed scheme implemented on the KU040 FPGA. Moreover, TTSCAC and TTSSBM are the time to find the best cut with 99% success probability using CAC and SBM, respectively, on FPGA. Times are written with parentheses around them if the cuts found are not the best known ones. The times are in seconds. For each instance, the smaller TTS from the two last columns of the table is shown in bold when the comparison is applicable.

Estimation of time to solution distribution

Next, we consider the whole distribution of time to solution in order to compare the ability of various methods to solve harder instances. As shown in Fig. 6(a), the cumulative distribution function (CDF) P(τ; T) of time to solution with 99% success probability τ is not uniquely defined as it depends on the duration T of the runs. We can define an optimal CDF P*(τ) that is independent of the runtime T as follows: P*(τ) = maxTP(τ; T). Numerical simulations show that this optimal CDF is well described by lognormal distribution, that is \({P}^{* }(log(\tau )) \sim {{{{{{{\mathcal{N}}}}}}}}(\mu ,\sqrt{v})\) where \(\sqrt{v}\) is the standard deviation of log(τ) (see Fig. 6(b), (c), and (d) for the cases of CAC, SA, and NMFA, respectively). In Fig. 6(e), it is shown that the scaling of the standard deviation \(\sqrt{v}(N)\) with the problem size N is significantly smaller for CAC, which implies that harder instances can be solved relatively more rapidly than using other methods. This result confirms the advantageous scaling of higher percentiles for CAC that was observed in Figs. 2 and 3.

Fig. 6: Estimation of time to solution distribution of chaotic amplitude control for Sherrington-Kirkpatrick (SK) instances.
figure 6

a Cumulative distribution of the time to solution P(τ) for N = 400 SK problems. The black line corresponds to the maximum of P(τ; T) with respect to the duration of the runs T. b, c, d Optimal cumulative distribution P*(τ) with \({P}^{* }(log(\tau )) \sim {{{{{{{\mathcal{N}}}}}}}}(\mu (N),\sqrt{v}(N))\) for chaotic amplitude control (CAC) (b), simulated annealing (SA) (c), and noisy mean-field annealing (NMFA) (d), respectively. e Standard deviation \(\sqrt{v}\) of the logarithm of time to solution distribution vs. problem size N. Shaded regions show the 99% confidence interval in the standard deviation. simCIM: simulation of the coherent Ising machine.

Conclusions

The framework described in this paper can be extended to solve other types of constrained combinatorial optimization problems such as the traveling salesman45, vehicle routing, and lead optimization problems. Moreover, it can be adapted to a variety of recently proposed Ising machines4,5,6,7,8,9,10,11,12,13,15, which would benefit from implementing a scheme that does not rely solely on the descent of a potential function. In particular, the performance of CIM9,10, mem-HNN4, and chip-scale photonic Ising machine14, which have small time to solution for small problem sizes (N ≈ 100) for which it may be sufficient to do rapid sampling based on convex optimization64 but with a relatively large scaling exponent, which limits their scalability, could be improved by adapting the implementation we propose if these hardware can be shown to be able to simulate larger numbers of spins experimentally. Rapid progress in the growing field of Ising machines may allow to verify scaling behaviors of the various methods at larger problem sizes and, thus, limit further finite-size effects.

The scaling exponents we have reported in this paper are based on the integration of the chaotic amplitude control dynamics using a Euler approximation of its ODEs. We have noted that the scaling of MVMs to solution of SK spin glass problems is reduced when the Euler time step is decreased (see Supplementary Note 2.8). The scaling exponents of CAC might thus be smaller than reported in this paper in the limit of a more accurate numerical integration over continuous time. It is therefore important future work to evaluate the scaling for N 1000 using a faster numerical simulation method. Such numerical calculations require a careful analysis of the integration method of ODEs, numerical precision, and tuning of parameters. It is also of considerable interest to implement CAC on multiple FPGAs for very high parallelization65 and on analog physical systems for further reduction of power consumption.

Nonrelaxational dynamics described herein is not limited to artificial simulators and likely also emerge in natural complex systems. In particular, it has been hypothesized that the brain operates out of equilibrium at large scales and produces more entropy when performing physically and cognitively demanding tasks66,67 in particular for the ones involving creativity for maximizing reward rather than memory retrieval. Such neural processes cannot be explained simply by the relaxation to fixed point attractors49, periodic limit cycles68, or even low-dimensional chaotic attractors whose self-similarity may not be equivalent to complexity but set the conditions for its emergence67,69. Similarly, evolutionary dynamics that is characterized as non-ergodic when the time required for representative genome exploration is longer than available evolutionary time70 may benefit from nonrelaxational dynamics rather than slow glassy relaxation for faster identification of high-fitness solutions. A detailed analytic comparison between the slow relaxation dynamics observed in Monte-Carlo simulations of spin glasses with the one proposed in this paper is needed in order to explain the apparent difference in their scaling of time to solution exhibited by our numerical results.

Methods

We target the implementation of a low-power system with maximum power supply of 5W using a XCKU040 Kintex Ultrascale Xilinx FPGA integrated on an Avnet board. The implemented circuit can process Ising problems of up to at least 1100 spins fully connected and of >2000 spins sparsely connected within the 5W power supply limit. Data is encoded into 18 bits fixed point vectors with 1 sign, 5 integer and 12 decimal bits to optimize computation time and power consumption. An important feature of our FPGA implementation of CAC is the use of several clock frequencies to concentrate the electrical power on the circuits that are the bottleneck of computation and require a high-speed clock. For the realization of the matrix-vector multiplication, each element of the matrix is encoded with 2 bits precision (wij is − 1, 0 or 1). An approximation based on the combination of logic equations describing the behavior of a multiplexer allows to achieve 104 multiplications within one clock cycle. The results of these multiplications are summed using cascading DSP and CARRY8 connected in a tree structure. Using pipelining, a matrix-vector multiplication for a squared matrix of size N is computed in \(2+5\frac{\log(N-4)}{\log\,(5)}+(\frac{N}{u})^{2}\) clock cycles (see Supplementary Note 2.4) at a clock frequency of 50MHz with u = 100, which is determined by the limitation of the number of available electronic component of the XCKU040 FPGA. The block size u can be made at least 3 times larger using commercially available FPGAs, which implies that the number of clock cycles needed to compute a dot product can scale almost logarithmically for problems of size N ≈ 1000 (see Supplementary Note 2.4 for discussions of scalability) and that the calculation time can be further significantly decreased using a higher-end FPGA. The calculation of the nonlinearity fi and error terms is achieved at higher frequency (300MHz and 100Mhz) using DSP in 8 + (N/u) and 9 + (N/u) clock cycles, respectively. In order to minimize energy resources and maximize speed, the nonlinear and error terms are calculated multiple times during the calculation of a single matrix-vector multiplication (see Supplementary Note 2).

Prior to computing the benchmark on the Sherrington-Kirkpatrick instances, the parameters of the system (see Supplementary Note 2.8) are optimized automatically using Bayesian optimization and bandit-based methods71. The automatic tuning of parameters for some previously unseen instances is out of the scope of this work but can be achieved to some extent using machine learning techniques72.