Scaling advantage of chaotic amplitude control for high-performance combinatorial optimization

The development of physical simulators, called Ising machines, that sample from low energy states of the Ising Hamiltonian has the potential to transform our ability to understand and control complex systems. However, most of the physical implementations of such machines have been based on a similar concept that is closely related to relaxational dynamics such as in simulated, mean-field, chaotic, and quantum annealing. Here we show that dynamics that includes a nonrelaxational component and is associated with a finite positive Gibbs entropy production rate can accelerate the sampling of low energy states compared to that of conventional methods. By implementing such dynamics on field programmable gate array, we show that the addition of nonrelaxational dynamics that we propose, called chaotic amplitude control, exhibits exponents of the scaling with problem size of the time to find optimal solutions and its variance that are smaller than those of relaxational schemes recently implemented on Ising machines. Finding the ground state of a variety of complex systems can be formulated as the minimization of the total interaction energy of Ising machines, posing a challenge as computational cost increases exponentially with system size. In this paper, the authors propose an algorithm to find the ground states of Ising-type problems by destabilising non-trivial attractors in combinatorial optimisation solvers through a heuristic modulation of the target amplitude, and show that this provides an improved scaling with respect to several existing methods.

M any complex systems such as spin glasses, interacting proteins, large-scale hardware, and financial portfolios, can be described as ensembles of disordered elements that have competing frustrated interactions 1 and rugged energy landscapes. There has been a growing interest in using physical simulators called Ising machines in order to reduce time and resources needed to identify configurations that minimize their total interaction energy, notably that of the Ising Hamiltonians H with HðσÞ ¼ À 1 2 ∑ ij ω ij σ i σ j (with ω ij the symmetric Ising couplings, i.e., ω ij = ω ji , and σ i = ±1) that is related to many nondeterministic polynomial-time hard (NP-hard) combinatorial optimization problems and various real-world applications 2 (see ref. 3 for a review). Recently proposed implementations include memresistor networks 4 , micro-or nano-electromechanical systems 5 , micro-magnets 6,7 , coherent optical systems 8 , hybrid opto-electronic hardware [9][10][11] , integrated photonics [12][13][14] , flux qubits 15 , and Bose-Einstein condensates 16 . In principle, these physical systems often possess unique properties, such as coherent superposition in flux qubits 17 and energy efficiency of memresistors 4,18 , which could lead to a distinctive advantage compared to conventional computers (see Fig. 1(a) and (b)) for the sampling of low energy states. In practice, the difficulty in constructing connections between constituting elements of the hardware is often the main limiting factor to scalability and performance for these systems 15,19 . Moreover, these devices often implement schemes that are directly related to the concept of annealing (either simulated 20,21 , mean-field 22,23 , chaotic 18,24 , and quantum 17,25 ) in which the escape from the numerous local minima 26 and saddle points 27 of the free energy function can only be achieved under very slow modulation of the control parameter (see Fig. 1(c)). These methods are dependent on non-equilibrium dynamics called aging that, according to recent numerical studies 28 , is strongly non-ergodic and seems to explore only a confined subspace determined by the initial condition rather than wander in the entire configurational space 29 for mean-field spin glass models. In general, such systems find the solutions of minimal energy only after many repetitions of the relaxation process.
Alternative dynamics that is not based on the concepts of annealing and relaxation may perform better for solving hard combinatorial optimization problems [30][31][32] . Various kinds of dynamics have been proposed 3,[33][34][35][36] , notably chaotic dynamics 18,[37][38][39][40] , but have either not been implemented onto specialized hardware 37,41 or use chaotic dynamics merely as a replacement to random fluctuations 18,38 . It was recently shown that the control of amplitude in mean-field dynamics can improve the performance of Ising machines by introducing error-correction terms (see Fig. 1(d)), effectively doubling the dimensionality of the system, whose role is to correct the amplitude heterogeneity 30 . Because of the similarity of such dynamics with that of a neural network, it can be implemented especially efficiently in electronic neuromorphic hardware where memory is distributed with the processing [42][43][44] .
In this paper, we show that the addition of the nonrelaxational part to the relaxation process makes the dynamics able to escape at a much faster rate than relaxational ones from local minima and saddles of the corresponding energy function. The exponential scaling factor with respect to system size of the time needed to reach optimal configurations of spin glasses that can be Fig. 1 Schematic representation of the proposed chip. Schematic representation of a conventional central processing unit (CPU) and separate memory with the von Neumann bottleneck problem and b the proposed neuromorphic chip for combinatorial optimization. Universal Asynchronous Receiver/ Transmitter (UART), phase-locked loops (PLL), and global clock buffer with clock enable (BUFGCE) are used for input/output of data, clock management, and clock gating, respectively. Neurons and synapses are shown in the bottom to illustrate the analogy between the organization of the chip and biological neural networks. Schema of dynamics in analog state-space x of algorithms based on c annealing on a potential function shown at different times t i and d a trajectory of proposed chaotic amplitude control scheme shown by the curved gray line with an arrow. Red circles show the discrete space σ.
found after a very long computation time using state-of-the-art heuristics, called time to solution, is smaller in the former case, which raises the question whether the nonrelaxational component makes the dynamics qualitatively different from the very slow relaxation observed in classic Monte-Carlo simulations of spin glasses. In order to extend numerical analysis to large problem sizes and limit finite-size effects, we implement a scheme that we name chaotic amplitude control (CAC) on a field programmable gate array (FPGA, see Fig. 1(b)) and show that the developed hardware can be faster for finding these optimal configurations in the limit of large problem sizes than many state-of-the-art algorithms and Ising machines for some reference benchmarks with enhanced energy efficiency.

Results
Mean-field dynamics. For the sake of simplicity, we consider the classical limit of Ising machines for which the state space is often naturally described by analog variables (i.e., real numbers) noted x i in the following. The variables x i represent measured physical quantities such as voltage 4 or optical field amplitude [8][9][10][11][12][13] and these systems can often be simplified to networks of interacting nonlinear elements whose time evolution can be written as follows: where f i and g i represent the nonlinear gain and interaction, respectively, and are assumed to be monotonic, odd, and invertible "sigmoidal" functions for the sake of simplicity; η i , experimental white noise of standard deviation σ 0 with hη i ðtÞη j ðt 0 Þi ¼ δ ij δðt À t 0 Þ; and N, the number of spins. δ ij and δ(t) are the Kronecker delta symbol and Dirac delta function, respectively. Ordinary differential equations similar to eq. (1) have been used in various computational models that are applied to NP-hard combinatorial optimization problems such as Hopfield-Tank neural networks 45 , coherent Ising machines 46 , and correspond to the "soft" spin description of frustrated spin systems 47 . Moreover, the steady states of eq. (1) correspond to the solutions of the "naive" Thouless-Anderson-Palmer (nTAP) equations that arise from the mean-field description of Sherrington-Kirkpatrick spin glasses when the Onsager reaction term has been discarded 48 . In the case of neural networks in particular, the variables x i and constant parameters ω ij correspond to firing rates of neurons and synaptic coupling weights, respectively. It is well known that, when β i = β for all i and the noise is not taken into account (σ 0 = 0), the time evolution of this system is motion in the state space that seeks out minima of a potential function 49 (or Lyapunov function) V given as dy and HðyÞ ¼ À 1 2 ∑ ij ω ij y i y j is the extension of the Ising Hamiltonian in the real space with y i = g i (x i ) (see Supplementary Note 1.1). The bifurcation parameter β, which can be interpreted as the inverse temperature of the naive TAP equations 48 , the steepness of the neuronal transfer function in Hopfield-Tank neural networks 45 , or to the coupling strength in coherent Ising machines [8][9][10] , is usually decreased gradually in order to improve the quality of solutions found. This procedure has been called mean-field annealing 23 , and can be interpreted as a quasi-static deformation of the potential function V (see Fig. 1(c)). There is, however, no guarantee that a sufficiently slow deformation of the landscape V will ensure convergence to the lowest energy state contrary to the quantum adiabatic theorem 50 or the convergence theorem of simulated annealing 51 . At fixed β, global convergence to the minimum of the potential V can be assured if σ 0 is gradually decreased with σ 0 ðtÞ 2 $ c logð2þtÞ and c sufficiently large 52 . The parameter σ 2 0 is analogous to the temperature in simulated annealing in this case. The global minimum of the potential V does not, however, generally correspond to that of the Ising Hamiltonians H at finite β. Moreover, the statistical analysis of spin glasses suggests that the potential V is highly non-convex at low temperature and that simple gradient descent very unlikely reaches the global minimum of HðσÞ because of the presence of exponentially numerous local minima 26 and saddle points 27 as the size of the system increases. The slow relaxation time of Monte-Carlo simulations of spin glasses, such as when using simulated annealing, might also be explained by similar trapping dynamics during the descent of the free energy landscape obtained from the TAP equations 27 . In the following, we consider in particular the soft spin description obtained by taking f i ðx i Þ ¼ ðÀ1 þ pÞx i À x 3 i and y i = g(x i ) = x i , where p is the gain parameter, which is the canonical model of the system described in eq. (1) at proximity of a pitchfork bifurcation with respect to the parameter p (see theory of weakly connected neural networks 53 ). In this case, the potential function V b is given as V b ðx i Þ ¼ ð1 À pÞ Chaotic amplitude control. In order to define deterministic dynamics that is inclined to visit spin configurations associated with lower Ising Hamiltonian without relying entirely on the descent of a potential function, we introduce error signals, noted e i 2 R, that modulate the strength of coupling β i to the ith nonlinear element such that β i (t) defined in eq. (1) is expressed as β i (t) = βe i (t) with β > 0. The time evolution of the error signals e i are given as follows 30 : where a and ξ are the target amplitude and the rate of change of error variables, respectively, with a > 0 and ξ > 0. If the system settles to a steady state, the values . When p < 1, the internal fields h i at the steady state, defined as h i = ∑ j ω ij σ j with σ j ¼ y Ã j =jy Ã j j, are such that h i σ i > 0, ∀i 30 . Thus, each equilibrium point of the analog system corresponds to that of a zero-temperature local minimum of the binary spin system.
The dynamics described by the coupled equations (1) and (2) is not derived from a potential function because error signals e i introduce asymmetric interactions between the x i and the computational principle is not related to a gradient descent. Rather, the addition of error variables results in additional dimensions in the phase space via which the dynamics can escape local minima. The mechanism of this escape can be summarized as follows. It can be shown (see the Supplementary Note 1.2) that the dimension of the unstable manifold at equilibrium points corresponding to local minima σ of the Ising Hamiltonian depends on the number of eigenvalues μ(σ) with μ(σ) > F(a) where μ(σ) are the eigenvalues of the matrix f ω ij jh i j g ij (with internal field h i ) and F a function given as FðyÞ ¼ ψ 0 ðyÞ ψðyÞ y and ψðyÞ ¼ f ðg À1 ðyÞÞ ðg À1 Þ 0 ðyÞ . Thus, there exists a value of a such that all local minima (including the ground state) are unstable and for which the system exhibits chaotic dynamics that explores successively candidate boolean configurations. The energy is evaluated at each step and the best configuration visited is kept as the solution of a run. This chaotic search is particularly efficient for sampling configurations of the Ising Hamiltonian close to that of the ground state using a single run although the distribution of sampled states is not necessarily given by the Boltzmann distribution. Note that the use of chaotic dynamics for solving Ising problems has been discussed previously 18,54 , notably in the context of neural networks, and it has been argued that chaotic fluctuations may possess better properties than Brownian noise for escaping from local minima traps. In the case of the proposed scheme, the chaotic dynamics is not merely used as a replacement for noise. Rather, the interaction between nonlinear gain and error-correction results in the destabilization of states associated with lower Ising Hamiltonian. Increasing the amplitude of additive noise σ 0 in eq. (1) does not, in fact, significantly increase the efficiency of the system and σ 0 is thus set to zero.
Ensuring that fixed points are locally unstable does not guarantee that the system does not relax to periodic and chaotic attractors. We have previously proposed that non-trivial attractors can also be destabilized by ensuring the positive rate of Gibbs entropy production using a modulation of the target amplitude 30 . In this paper, we propose an alternative heuristic modulation of the target amplitude that is more suitable for a digital implementation on FPGA than the one proposed in 30 without significant decrease in performance for most problem instances (see Supplementary Note 1.3 for a comparison of the two schemes). Because the value of a for which all local minima is unstable is not known a priori, we propose instead to destabilize the local minima traps by dynamically modulating a depending on the visited configurations σ as follows: where ΔHðtÞ ¼ H opt À HðtÞ; HðtÞ, the Ising Hamiltonian of the configuration visited at time t; and H opt , a given target energy. In practice, we set H opt to the lowest energy visited during the current run, i.e., H opt ðtÞ ¼ min t 0 ≤ t Hðt 0 Þ. The function tanh is the tangent hyperbolic. ρ and δ are positive real constants. Lastly, the parameter ξ (see eq. (2)) is modulated as follows: dξ dt ¼ Γ when t − t r < Δt, where t r is the last time for which either the best known energy H opt was updated or ξ was reset. Otherwise, ξ is reset to 0 if t − t r ≥ Δt and t r is set to t. Numerical simulations shown in the following suggest that this modulation results in the destabilization of non-trivial attractors (periodic, chaotic, etc.) for typical problem instances.
Hardware independent time-to-solution scaling. In order to test if the nonrelaxational dynamics of chaotic amplitude control might be able to accelerate the search of mean-field dynamics for finding the ground state of typical frustrated systems, we look for the ground states of Sherrington-Kirkpatrick (SK) spin glass instances using the numerical simulation of eqs. (1) to (3) and compare time to solutions with those of two closely related relaxational schemes: noisy mean-field annealing (NMFA) 22 and the simulation of the coherent Ising machine (simCIM, see Supplementary Note 4.1-2). Because finding the ground states of SK spin glasses is a NP-hard problem, it cannot be assured that the states found by heuristic solvers, even after a very long computation time, are the real ground states. In order to compare the performance of a set of heuristic solvers, we can define for each instance the "optimal" energy as the one equal to the lowest energy found repeatedly (>10 6 times) by these heuristics after a very long computation time. States that have this optimal energy are called solution states in the following. Because the arithmetic complexity of calculating one step of these three schemes is dominated by the matrix-vector multiplication (MVM), it is sufficient for the sake of comparison to count the number of MVM, noted ν, to find the solution state energy of a given instance with 99% success probability, with νðKÞ ¼ K lnð1À0:99Þ ln ð1Àp 0 ðKÞÞ and p 0 (K) the probability of visiting a solution state configuration at least once after a number K of MVMs in a single run. In Fig. 2, NMFA (a) and the CAC (b) are compared using the averaged success probability 〈p 0 〉 of finding the solution state for 100 randomly generated SK spin glass instances per problem size N (see Supplementary Note 3 for details about the benchmark set). Note that the success probability of the mean-field annealing method does not seem to converge to 1 even for large annealing time (see Fig. 2(a)). Because the success probability of NMFA and simCIM remains low at larger problem size, its correct estimation requires simulating a larger number of runs, which we achieved by using GPU implementations of these methods. On the other hand, the average success probability 〈p 0 〉 of CAC is of order 1 when the maximal number of MVM is large enough, suggesting that the system rarely gets trapped in local minima of the Ising Hamiltonian or non-trivial attractors. In Fig. 2(c) and (d) are shown the q th percentile (with q = 50, i.e., the median) of the MVM to solution distribution, noted ν q (K; N), for various duration of simulation K, where K is the number of MVMs of a single run. The minimum of these curves, noted ν Ã q ðNÞ with ν Ã q ðNÞ ¼ min K ν q ðK; NÞ, represents the optimal scaling of MVM to solution vs. problem size N 15 . Using the hypothesis of an exponential scaling with the square root of problem size N, CAC exhibits significantly smaller scaling exponent (γ = 0.18 ± 0.06) than the NMFA (γ = 0.47 ± 0.04) and simCIM (γ = 0.54 ± 0.03, see inset in Fig. 2(e)). We have verified that this scaling advantage holds for various parameters of the mean-field annealing (see Supplementary Note 4.1). Note that a root-exponential scaling behavior of the median time to solution has been previously reported for SK spin glass problems 15,55 and other NP-Hard problems 56 . We consider both cases of a root-exponential and exponential scaling as possible hypotheses in the following. Note that a similar root-exponential scaling at finite size N is observed when we apply the CAC algorithm to solving problems from the recently proposed Wishart planted instances 57,58 in the "easy" regime (see Supplementary Note 3.3).
Benchmark of the FPGA implementation. Although comparison of algebraic complexity indicates that CAC has a scaling advantage over mean-field annealing, it is in practice necessary to compare its physical implementation against other state-of-the-art methods because the performance of hardware depends on other factors such as memory access and information propagation delays. To this end, CAC is implemented into a FPGA because its similarity with neural networks makes it well-fitted for a design where memory is distributed with processing (see Supplementary Note 2 for the details of the FPGA implementation). The organization of the electronic circuit can be understood using the following analogy. Pairs of analog values x i and e i , which represent averaged activity of two types of neurons, are encoded within neighboring circuits. This microstructure is repeated N times on the whole surface of the chip, which resembles the columnar organization of the brain. The nonlinear processes f i (x i ), which model the local-population activation functions and are independent for i ≠ j, are calculated in parallel. The coupling between elements i and j ∈ {1, …, N} that is achieved by the dot product in eq. (1) is implemented by circuits that are at the periphery of the chip and are organized in a summation tree reminiscent of dendritic branching (see Fig. 1(b)). The power consumption of the developed hardware never exceeds 5W because of limitations of the development board that we have used.
First, we compare the FPGA implementation of CAC against state-of-the-art CPU algorithms: break-out local search 59 (BLS) that has been used to find many of the best known maximumcuts (equivalently, Ising energies) from the GSET benchmark set (https://web.stanford.edu/~yyye/yyye/Gset/), a well-optimized single-core CPU implementation of parallel tempering (or random replica exchange Monte-Carlo Markov chain sampling) 60,61 (PT, courtesy of S. Mandrà), simulated annealing (SA) 62 . Figure 3(a) shows that the CAC on FPGA has the smallest real time to solution τ Ã q against most other state-of-the-art algorithms despite just 5W power-consumption where τ Ã q ðNÞ is the optimal q th percentile of time to solution with 99% success probability and is given as τ Ã q ðNÞ ¼ min T τ q ðT; NÞ where τ(T) of a given instance is τðTÞ ¼ T lnð1À0:99Þ ln ð1Àp 0 ðTÞÞ and T is the duration in seconds of a run. The probability p 0 (T) is evaluated using 100 runs per instance. The results of the CAC algorithm run on a CPU are also included in Fig. 3. The CPU implementation of CAC written in python for this work is not optimized and is consequently slower than other algorithms. However, its scaling of time to solution with problem size is consistent with that of CAC on FPGA. Figure 3 shows that CAC implemented on either CPU or FPGA has a significantly smaller increase of time to solution with problem size than SA run on CPU. Note that the power consumption of transistors in the FPGA and CPU scales proportionally to their clock frequencies. In order to compare different hardware despite the heterogeneity in their power consumption, the q th percentile of energy-to-solution E Ã q , i.e., the energy E Ã q required to solve SK instances with E Ã q ¼ Pτ Ã q and P the power consumption, is plotted in Fig. 3(b). For the sake of simplicity, we assume a 20 watts power consumption for the CPU. These numbers represent typical orders to magnitude for contemporary digital systems. CAC on FPGA is 10 2 to 10 3 times more energy efficient than state-of-the-art algorithms running on classical computers.
The Monte-Carlo methods SA and PT have moreover been recently implemented on a special-purpose electronic chip called Digital Annealer (DA) 20 . In Fig. 4, we show the scaling exponents of 50th and 80th percentiles of the MVM and time to solution distribution for problem sizes N = 800 to N = 1100 based on the hypothesis of scaling in e γN obtained by fitting data shown in Figs. 2 and 3 (see "CAC fully parallel" and 'CAC 100 × 100", respectively, in Fig. 4 for the numerical values of the scaling exponents), and compare them to that reported for SA and PT implemented on CPU and DA. Note that we replicated the benchmark method of ref. 20 for the sake of the comparison by fitting time to solution from N = 800 up to N = 1100. The scaling obtained from fitting the MVM to solution in Fig. 2 is based on the assumption that the matrix-vector multiplication can be calculated fully in parallel in a time that scales as log(N) instead of N 2 (see Methods section) at least up to N = 1100. We include this hypothesis because many other Ising machines exploit the parallelization of matrix-vector multiplication for speed up 20,63 , whereas the current implementation of CAC iterates on block matrices of size 100 by 100 and is thus only partially parallel because of resource limitations specific to the downscale FPGA used in this work. Note that the number of matrixvector multiplications necessary to find the solution state dominates the exponential scaling rather than the time to compute one matrixvector multiplication. The scaling of time to solution for chaotic amplitude control observed is significantly smaller than the ones of standard Monte-Carlo methods SA and PT 20 , especially in the case of a fully parallel implementation. The scaling exponents of fully parallel CAC is smaller than that of DA and on par with that of PT on DA (PTDA), although CAC does not require simulating replica of the system and is thus faster in absolute time than PTDA 20 .
Next, the proposed implementation of chaotic amplitude control is compared to other recently developed Ising machines (see Fig. 5). The relatively slow increase of time to solution with respect to the number of spins N when solving larger SK problems using CAC suggests that our FPGA implementation is faster than the Hopfield neural network implemented using memresistors (mem-HNN) 4 , the restricted Boltzmann machine using a FPGA (FPGA-RBM) 55 Figure 5 shows that mem-HNN, FPGA-RBM, and NTT CIM have similar scaling exponents, although FPGA-RBM tends to exhibit a scaling in e γN rather than e γ ffiffiffi N p for N ≈ 100 55 . It can be nonetheless expected that the algorithm implemented in mem-HNN, which is similar to mean-field annealing, has the same scaling behavior as simCIM and NMFA (see Fig. 2). Note that the lines showing the scaling in e γ ffiffiffi N p in Fig. 5 can be seen as a lower bound of the TTS if the sub-exponential scaling is finite-size effect.
It is noteworthy to mention that a recent implementation of the simulated bifurcation machine 63 (SBM), which is not based on the gradient descent, similarly to CAC, but based on adiabatic evolutions of energy conservative systems performs well in solving SK problems. Both SBM and CAC exhibit smaller time to solution than other gradient based methods. SBM has been implemented on a FPGA (the Intel Stratix 10 GX) that has approximately 5 to 10 times more adaptive logic modules than the KU040 FPGA used to implement CAC. In order to compare SBM and CAC if implemented on an equivalent FPGA hardware, we plot in Fig. 5 the estimation of the time to solution for a fully parallel implementation of CAC using the hypothesis that one matrix-vector multiplication of size 1100 × 1100 can be achieved in 0.3 μs. This is the same time to compute a MVM that we can infer from time to solution reported in ref. 63 for SBM with binary connectivity given that problems of size N = 100 (N = 700) are solved in 29 μs (55 ms) and 94 (81000) MVMs, respectively. Note that SBM can reach the solution states after approximately 20 times less MVMs than CAC at N = 100 but only 5 times less MVM at N = 1000, suggesting that the speed of SBM depends largely on hardware rather than an algorithmic scaling advantage. Moreover, the simulated bifurcation machine 63 does not perform significantly better than our current implementation of CAC for solving instances of the reference MAXCUT benchmark set called GSET (https://web.stanford.edu/~yyye/yyye/Gset/) (see Table 1) because CAC can find better cuts and exhibits smaller time to solution for multiple GSET instances even compared to the case of the implementation on the smaller KU040 FPGA with the probability of finding maximum cuts of the GSET in a single run that is much smaller with SBM. Comparison of the scaling behavior of time to solution between CAC and SBM is unfortunately not possible based on available data 63 .
Estimation of time to solution distribution. Next, we consider the whole distribution of time to solution in order to compare the ability of various methods to solve harder instances. As shown in Fig. 6(a), the cumulative distribution function (CDF) P(τ; T) of time to solution with 99% success probability τ is not uniquely defined as it depends on the duration T of the runs. We can define an optimal CDF P * (τ) that is independent of the runtime T as follows: P * (τ) = max T P(τ; T). Numerical simulations show that this optimal CDF is well described by lognormal distribution, that is P Ã ðlogðτÞÞ $ N ðμ; ffiffi ffi v p Þ where ffiffi ffi v p is the standard deviation of log(τ) (see Fig. 6(b), (c), and (d) for the cases of CAC, SA, and NMFA, respectively). In Fig. 6(e), it is shown that the scaling of the standard deviation ffiffi ffi v p ðNÞ with the problem size N is significantly smaller for CAC, which implies that harder instances can be solved relatively more rapidly than using other methods. This result confirms the advantageous scaling of higher percentiles for CAC that was observed in Figs. 2 and 3.

Conclusions
The framework described in this paper can be extended to solve other types of constrained combinatorial optimization problems such as the traveling salesman 45 , vehicle routing, and lead optimization problems. Moreover, it can be adapted to a variety of recently proposed Ising machines [4][5][6][7][8][9][10][11][12][13]15 , which would benefit from implementing a scheme that does not rely solely on the descent of a potential function. In particular, the performance of CIM 9,10 , mem-HNN 4 , and chip-scale photonic Ising machine 14 , which have small  time to solution for small problem sizes (N ≈ 100) for which it may be sufficient to do rapid sampling based on convex optimization 64 but with a relatively large scaling exponent, which limits their scalability, could be improved by adapting the implementation we propose if these hardware can be shown to be able to simulate larger numbers of spins experimentally. Rapid progress in the growing field of Ising machines may allow to verify scaling behaviors of the various methods at larger problem sizes and, thus, limit further finite-size effects.
The scaling exponents we have reported in this paper are based on the integration of the chaotic amplitude control dynamics using a Euler approximation of its ODEs. We have noted that the scaling of MVMs to solution of SK spin glass problems is reduced when the Euler time step is decreased (see Supplementary  Table 1 Performance of the field programmable gate array (FPGA) implementation of chaotic amplitude control (CAC) on the GSET. Performance of the FPGA implementation of CAC in finding the maximum cuts known, i.e., lowest Ising Hamiltonian known, of graphs in the GSET benchmark. id, C opt , C CAC , C SBM are the name of instances, best maximum cuts known from ref. 73 , the proposed method after 100 runs, and Toshiba bifurcation machine on FPGA 63 (FPGA-SBM), respectively. C SBM is evaluated using >100 runs but shorter time per run. The difference in cuts between CAC and SMB is given by ΔC CAC/SBM = C CAC − C SBM . p CAC 0 and p SBM 0 are the probabilities that FPGA-CAC and FPGA-SBM find the cut C CAC and C SBM in a single run, respectively. Moreover, < t BLS > and < t CAC > are the averaged time over 20 and 100 runs, respectively, to find the corresponding best cut in successful runs using BLS written C++ and running on a Xeon E5440 2.83 GHz 59 and the proposed scheme implemented on the KU040 FPGA. Moreover, TTS CAC and TTS SBM are the time to find the best cut with 99% success probability using CAC and SBM, respectively, on FPGA. Times are written with parentheses around them if the cuts found are not the best known ones. The times are in seconds. For each instance, the smaller TTS from the two last columns of the table is shown in bold when the comparison is applicable. Note 2.8). The scaling exponents of CAC might thus be smaller than reported in this paper in the limit of a more accurate numerical integration over continuous time. It is therefore important future work to evaluate the scaling for N ≫ 1000 using a faster numerical simulation method. Such numerical calculations require a careful analysis of the integration method of ODEs, numerical precision, and tuning of parameters. It is also of considerable interest to implement CAC on multiple FPGAs for very high parallelization 65 and on analog physical systems for further reduction of power consumption. Nonrelaxational dynamics described herein is not limited to artificial simulators and likely also emerge in natural complex systems. In particular, it has been hypothesized that the brain operates out of equilibrium at large scales and produces more entropy when performing physically and cognitively demanding tasks 66,67 in particular for the ones involving creativity for maximizing reward rather than memory retrieval. Such neural processes cannot be explained simply by the relaxation to fixed point attractors 49 , periodic limit cycles 68 , or even low-dimensional chaotic attractors whose self-similarity may not be equivalent to complexity but set the conditions for its emergence 67,69 . Similarly, evolutionary dynamics that is characterized as non-ergodic when the time required for representative genome exploration is longer than available evolutionary time 70 may benefit from nonrelaxational dynamics rather than slow glassy relaxation for faster identification of high-fitness solutions. A detailed analytic comparison between the slow relaxation dynamics observed in Monte-Carlo simulations of spin glasses with the one proposed in this paper is needed in order to explain the apparent difference in their scaling of time to solution exhibited by our numerical results.

Methods
We target the implementation of a low-power system with maximum power supply of 5W using a XCKU040 Kintex Ultrascale Xilinx FPGA integrated on an Avnet board. The implemented circuit can process Ising problems of up to at least 1100 spins fully connected and of >2000 spins sparsely connected within the 5W power supply limit. Data is encoded into 18 bits fixed point vectors with 1 sign, 5 integer and 12 decimal bits to optimize computation time and power consumption. An important feature of our FPGA implementation of CAC is the use of several clock frequencies to concentrate the electrical power on the circuits that are the bottleneck of computation and require a high-speed clock. For the realization of the matrix-vector multiplication, each element of the matrix is encoded with 2 bits precision (w ij is − 1, 0 or 1). An approximation based on the combination of logic equations describing the behavior of a multiplexer allows to achieve 10 4 multiplications within one clock cycle. The results of these multiplications are summed using cascading DSP and CARRY8 connected in a tree structure. Using pipelining, a matrix-vector multiplication for a squared matrix of size N is computed in 2 þ 5 logðNÀ4Þ log ð5Þ þ ð N u Þ 2 clock cycles (see Supplementary Note 2.4) at a clock frequency of 50MHz with u = 100, which is determined by the limitation of the number of available electronic component of the XCKU040 FPGA. The block size u can be made at least 3 times larger using commercially available FPGAs, which implies that the number of clock cycles needed to compute a dot product can scale almost logarithmically for problems of size N ≈ 1000 (see Supplementary Note 2.4 for discussions of scalability) and that the calculation time can be further significantly decreased using a higher-end FPGA. The calculation of the nonlinearity f i and error terms is achieved at higher frequency (300MHz and 100Mhz) using DSP in 8 + (N/u) and 9 + (N/u) clock cycles, respectively. In order to minimize energy resources and maximize speed, the nonlinear and error terms are calculated multiple times during the calculation of a single matrix-vector multiplication (see Supplementary Note 2).
Prior to computing the benchmark on the Sherrington-Kirkpatrick instances, the parameters of the system (see Supplementary Note 2.8) are optimized automatically using Bayesian optimization and bandit-based methods 71 . The automatic tuning of parameters for some previously unseen instances is out of the scope of this work but can be achieved to some extent using machine learning techniques 72 .

Data availability
Sherrington-Kirkpatrick instances used in this paper are available upon request to T. Leleu. The GSET instances are available at https://web.stanford.edu/~yyye/yyye/Gset/.

Code availability
Requests for code availability should be addressed to T. Leleu.