Abstract
Spin glasses are disordered magnets with random interactions that are, generally, in conflict with each other. Finding the ground states of spin glasses is not only essential for understanding the nature of disordered magnets and many other physical systems, but also useful for solving a broad array of hard combinatorial optimization problems across multiple disciplines. Despite decades-long efforts, an algorithm with both high accuracy and high efficiency is still lacking. Here we introduce DIRAC – a deep reinforcement learning framework, which can be trained purely on small-scale spin glass instances and then applied to arbitrarily large ones. DIRAC displays better scalability than other methods and can be leveraged to enhance any thermal annealing method. Extensive calculations on 2D, 3D and 4D Edwards-Anderson spin glass instances demonstrate the superior performance of DIRAC over existing methods. The presented framework will help us better understand the nature of the low-temperature spin-glass phase, which is a fundamental challenge in statistical physics. Moreover, the gauge transformation technique adopted in DIRAC builds a deep connection between physics and artificial intelligence. In particular, this opens up a promising avenue for reinforcement learning models to explore the enormous configuration space, which would be extremely helpful in solving many other hard combinatorial optimization problems.
Introduction
The Ising spin glass is a classical disordered system that has been studied for decades^{1,2}. Its spectacular behaviors have attracted considerable interest in several branches of science, including physics, mathematics, computer science, and biology. Because of the quenched disorder inherent in spin glasses, it is hard to find the ground state of such a system due to frustration (i.e., the impossibility of simultaneously minimizing all the interactions), despite its seemingly simple Hamiltonian^{3}:

$$H = -\sum_{\langle i,j \rangle} J_{ij}\,\sigma_i \sigma_j.$$
In general, this Hamiltonian can be defined on arbitrary graphs. Here, we will focus on the most heavily studied lattice realization of the nearest-neighbor Ising spin glass, where the sites lie on a D-dimensional hypercubic lattice with N = L^{D} sites (see Fig. 1 for 2D instances) and σ_{i} = ± 1 represents the binary Ising spin value at site i. The coupling J_{ij} is a Gaussian random variable that represents the interaction strength between two neighboring spins i and j. In the literature, this is often referred to as the Edwards-Anderson (EA) spin glass model. The EA model aims at capturing the quintessential character of real, physically occurring, spin glasses^{4}. Compared with other short-range models, such as the mean-field Bethe lattice^{5}, the EA model seems more challenging in the sense that it contains a vast number of short loops, which lead to much more frustration.
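To make the model concrete, the energy of an EA instance can be evaluated directly from {σ_{i}} and {J_{ij}}. The sketch below stores couplings per lattice direction and assumes periodic boundaries; both choices are illustrative conventions, not specifications taken from this paper.

```python
import numpy as np

def ea_energy(sigma, J):
    """Energy H = -sum_<ij> J_ij * s_i * s_j of an EA instance.

    sigma : +/-1 spins, shape (L,) * D
    J     : couplings, shape (D,) + (L,) * D, where J[d] couples each
            site to its neighbor in the +d direction (periodic
            boundaries are an assumption of this sketch).
    """
    E = 0.0
    for d in range(sigma.ndim):
        E -= np.sum(J[d] * sigma * np.roll(sigma, -1, axis=d))
    return E

# A random 2D instance with Gaussian couplings, as in the EA model.
rng = np.random.default_rng(0)
L, D = 4, 2
sigma = rng.choice([-1, 1], size=(L,) * D)
J = rng.standard_normal((D,) + (L,) * D)
print(ea_energy(sigma, J))
```

Note that the energy is invariant under a global spin flip σ → −σ, since every term contains a product of two spins.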
There are at least three strong motivations for finding the ground states of spin glasses. First of all, finding the spin glass ground states is a key to the mysteries behind the strange and complex behaviors of spin glasses (and many other disordered systems), such as the glassy phase^{6} and ergodicity breaking^{7}. In particular, ground-state energies under different boundary conditions can be used to compute the stiffness exponent of spin glasses, which can help us ascertain the existence of a spin glass phase at finite temperatures^{8,9}. Second, finding ground states of Ising spin glasses in three or higher dimensions is a non-deterministic polynomial-time (NP) hard problem^{10}, which is closely related to many other hard combinatorial optimization problems^{11}. For example, all of Karp's 21 NP-complete problems and many NP-hard problems (such as the max-cut problem, the traveling salesman problem, the protein folding problem, etc.) have Ising spin glass formulations^{11,12,13}. Therefore, finding the Ising spin glass ground states may help us solve many other NP-hard problems. Finally, the celebrated Hopfield model^{14} and other pioneering models of neural networks drew deep connections with Ising magnets^{15} (and spin glasses in particular^{16,17}) on general networks. The study of spin glasses and their ground states has led to (and will continue to lead to) the development of powerful optimization tools, such as the cavity method and Belief Propagation, that will further shed new light on computational complexity transitions^{2,18}.
Given the NP-hard nature of finding the spin glass ground states in three or higher dimensions, the exact branch-and-bound approach can only be used for very small systems^{19}. For two-dimensional lattices with periodic boundary conditions in at most one direction (or planar graphs in general), the Ising spin glass ground states can be calculated by mapping to the minimum-weight perfect matching problem, which can be solved exactly in polynomial time^{20,21}. However, for general cases with large system sizes, we lack a method with both high accuracy and high efficiency, and must rely on heuristic methods. In particular, Monte Carlo methods based on thermal annealing, e.g., simulated annealing (SA)^{22}, population annealing^{23} and parallel tempering (PT)^{24,25,26,27}, have been well studied in the statistical physics community.
Recently, reinforcement learning (RL) has proven to be a promising tool for tackling many combinatorial optimization problems, such as the minimum vertex cover problem^{28}, the maximum independent set problem^{29}, the network dismantling problem^{30}, the travelling salesman problem^{31}, the vehicle routing problem^{32}, etc. Compared to traditional methods, RL-based algorithms are believed to achieve a more favorable trade-off between accuracy and efficiency. We note that RL was recently used to devise a smart temperature-control scheme for simulated annealing in finding ground states of the 2D spin glass system, which enabled small systems to better escape local minima and reach their ground states with high probability^{33}. However, this RL-enhanced simulated annealing still fails to find ground states for larger spin glass systems in three or higher dimensions.
In this work, we introduce DIRAC (Deep reinforcement learning for spIn-glass gRound-stAte Calculation), an RL-based framework that can directly calculate spin glass ground states. DIRAC has several advantages. First, it demonstrates superior performance (in terms of accuracy) over the state-of-the-art thermal annealing methods, especially when the gauge transformation (GT) technique is adopted. Second, it displays better scalability than other methods. Finally, it can be leveraged to enhance any thermal annealing method and offer much better solutions.
Results
Reinforcement learning formulation
Following many other RL formulations for solving combinatorial optimization problems^{31,34,35,36}, DIRAC treats the spin glass ground state search as a Markov decision process (MDP), which involves an agent interacting with its environment (i.e., the input instance) and learning an optimal policy that sequentially takes long-sighted actions so as to maximize its cumulative reward. To better describe this process, we first define state, action and reward in the context of Ising spin glass ground state calculation. State: a state s represents the observed spin glass instance, including both the spin configuration {σ_{i}} and the coupling strengths {J_{ij}}, based on which the optimal action will be chosen. The terminal state s_{T} is met when the agent has tried to flip each spin once. Action: an action a^{(i)} means flipping spin i. Reward: the reward \(r(s,a^{(i)},s^{\prime})\) is defined as the energy change after flipping spin i in state s to reach a new state \(s^{\prime}\), i.e., \(r(s,a^{(i)},s^{\prime})=2\sum_{j\in \partial i}J_{ij}\sigma_{i}\sigma_{j}\), where ∂i represents the set of nearest neighbors of spin i.
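The reward above is a purely local quantity: only the bonds touching spin i contribute. A minimal sketch follows; the per-direction coupling storage and periodic boundaries are assumptions of the sketch, and whether the computed quantity equals the energy change or its negative depends on the Hamiltonian's sign convention, which we leave open here.

```python
import numpy as np

def flip_reward(sigma, J, site):
    """r(s, a^(i), s') = 2 * sum_{j in di} J_ij * s_i * s_j, evaluated
    at the current configuration, as in the reward definition above.

    sigma : +/-1 spins, shape (L,) * D
    J     : couplings, shape (D,) + (L,) * D; J[d] is the +d-direction
            bond (storage convention assumed, periodic boundaries)
    site  : lattice coordinates of spin i, as a tuple
    """
    D, Lsize = sigma.ndim, sigma.shape[0]
    local = 0.0
    for d in range(D):
        fwd = list(site); fwd[d] = (site[d] + 1) % Lsize
        bwd = list(site); bwd[d] = (site[d] - 1) % Lsize
        local += J[(d,) + tuple(site)] * sigma[tuple(fwd)]  # bond to +d neighbor
        local += J[(d,) + tuple(bwd)] * sigma[tuple(bwd)]   # bond to -d neighbor
    return 2.0 * sigma[tuple(site)] * local
```

For a uniform ferromagnet (J = +1 everywhere) in the all-up state, each of the 2D bonds around a site contributes +1, so the formula gives 4D per flip.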
Through the RL formulation, we seek to learn a policy π_{Θ}(a^{(i)}∣s) that takes any observed state s and produces the action a^{(i)} corresponding to the optimal spin flip that maximizes the expected future cumulative reward. Here \(\Theta=\{\Theta_{\mathcal{E}},\Theta_{\mathcal{D}}\}\) represents a collection of learnable encoding parameters \(\Theta_{\mathcal{E}}\) and decoding parameters \(\Theta_{\mathcal{D}}\), which will be updated through RL.
DIRAC architecture
We design DIRAC to learn the policy π_{Θ} automatically. As shown in Fig. 2, DIRAC consists of two phases: offline training and online application. For offline training, the DIRAC agent is self-taught on randomly generated small-scale EA spin glass instances. For each instance, the agent interacts with its environment through a sequence of states, actions and rewards (Fig. 2a). Meanwhile, the agent gains experience to update its parameters, which enhances its ability to find the ground states of EA spin glasses (Fig. 2b,c). For online application, the well-trained DIRAC agent can be used either directly (DIRAC^{1}, Fig. 2d), iteratively (DIRAC^{m}, Fig. 2e), or as a plug-in to a thermal annealing method (DIRAC-SA and DIRAC-PT), on EA spin glass instances of much larger sizes than the training ones.
DIRAC’s success is mainly determined by two key questions: (1) How to represent states and actions effectively? (2) How to leverage these representations to compute a Q-value, which predicts the long-term gain of an action under a state? We refer to these two questions as the encoding and decoding problem, respectively.
Encoding
Since a hypercubic lattice can be regarded as a special graph, we design an encoder based on graph neural networks^{37,38,39,40,41}, namely SGNN (Spin Glass Neural Network), to represent states and actions. As shown in Fig. 3, to capture the coupling strengths {J_{ij}}, which are crucial for determining the spin glass ground states, SGNN performs two updates at each of the K iterations: the edge-centric update and the node-centric update. Here, the hyperparameter K represents the number of message-passing steps in SGNN, and we set K = 5 in our calculations. The edge-centric update (Fig. 3b, Fig. S1a) aggregates, for each edge, the embedding vectors of its adjacent nodes; edge embeddings are initialized as edge input features (SI Sec. IA). The node-centric update (Fig. 3c, Fig. S1b) aggregates, for each node, the embedding vectors of its adjacent edges; node embeddings are initialized as node input features (SI Sec. IA). Both updates concatenate the self embedding with the neighborhood embedding and then apply a nonlinear transformation (e.g., the rectified linear unit, \(\mathrm{ReLU}(z)=\max(0,z)\)). Traditional graph neural network architectures often carry out only node-centric updates^{37,38,39,41}, with edge weights treated as part of a node's neighborhood if needed. Yet this would fail in our case, where edge weights play a vital role, and would lead to unsatisfactory performance (see the ablation study in SI Sec. IF and Fig. S2).
SGNN repeats K iterations of both edge-centric and node-centric updates, and finally obtains an embedding vector for each node (or spin) (Fig. 3e). Essentially, each node's embedding vector after K iterations captures both its position in the lattice and its long-range couplings with neighbors within K hops (see Fig. 3f for an example with K = 5). In our RL setting, each node is subject to a potential action, so we also call the embedding vector of node i, denoted as z_{i}, its action embedding. Collectively, we denote z_{a} = {z_{i}}, which includes the embedding vectors of all nodes i = 1, ⋯ , N. To represent the whole lattice (i.e., the state in our setting) and obtain the state embedding, denoted as z_{s}, we sum over all node embedding vectors^{41}, which is a straightforward and empirically effective way of graph-level encoding. (SI Sec. IA and Algo. S3 describe SGNN in more detail.)
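A single SGNN iteration can be sketched as follows. The concatenate-then-ReLU structure follows the description above, but the specific aggregations (sum over the two endpoints, sum over incident edges) and the weight matrices `W_e`, `W_n` are simplifying assumptions, since the excerpt defers the exact update rules to the SI.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sgnn_step(h_node, h_edge, edges, W_e, W_n):
    """One SGNN iteration (sketch): an edge-centric update followed by
    a node-centric update, each concatenating the self embedding with a
    neighborhood aggregate and applying a ReLU transformation."""
    # Edge-centric update: each edge aggregates its two endpoint nodes.
    agg_e = np.array([h_node[i] + h_node[j] for i, j in edges])
    h_edge = relu(np.concatenate([h_edge, agg_e], axis=1) @ W_e)
    # Node-centric update: each node aggregates its incident edges.
    agg_n = np.zeros((h_node.shape[0], h_edge.shape[1]))
    for k, (i, j) in enumerate(edges):
        agg_n[i] += h_edge[k]
        agg_n[j] += h_edge[k]
    h_node = relu(np.concatenate([h_node, agg_n], axis=1) @ W_n)
    return h_node, h_edge
```

Keeping the output dimensions equal to the input dimensions lets the step be applied K times in a row, as in the paper.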
Decoding
Once the action embeddings z_{a} and the state embedding z_{s} have been computed, DIRAC leverages these representations to compute the state-action value function Q(s, a^{(i)}; Θ), which predicts the expected future cumulative reward of taking action a^{(i)} in state s and following the policy π_{Θ}(a^{(i)}∣s) till the end of the episode (i.e., till all spins have been flipped once). Hereafter, we will refer to this function as the Q-value of spin i. Specifically, we concatenate the embeddings of state and action, and apply a neural network with nonlinear transformations to map the concatenation [z_{s}, z_{i}] to a scalar value. In theory, any neural network architecture can be used. Here, for the sake of simplicity, we adopt the classical multi-layer perceptron (MLP) with ReLU activations (see SI Sec. IB for more details):
Note that here \(\Theta=\{\Theta_{\mathcal{E}},\Theta_{\mathcal{D}}\}\), where \(\Theta_{\mathcal{E}}\) are the SGNN encoder parameters (see SI Eq. 1–Eq. 2) and \(\Theta_{\mathcal{D}}\) are the MLP decoder parameters (see SI Eq. 3).
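A minimal decoder in this spirit: sum the node embeddings to get z_{s}, concatenate [z_{s}, z_{i}] for every spin, and map each concatenation to a scalar with a two-layer ReLU MLP. The layer sizes and weight names below are assumptions; the paper's exact MLP is specified in SI Sec. IB.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def q_values(z_nodes, W1, w2):
    """Decode one Q-value per spin from [z_s, z_i] (sketch).

    z_nodes : per-spin action embeddings, shape (N, d)
    W1, w2  : assumed weights of a minimal 2-layer ReLU MLP
    """
    z_s = z_nodes.sum(axis=0)                  # graph-level state embedding
    n = z_nodes.shape[0]
    cat = np.concatenate([np.tile(z_s, (n, 1)), z_nodes], axis=1)
    return relu(cat @ W1) @ w2                 # one scalar per spin
```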
Offline training
We will adopt the above Q function to calculate the spin glass ground state. Prior to that, we first need to optimize the Q function so that it predicts the future gain more accurately.
We define the n-step Q-learning loss as:

$$\mathcal{L}(\Theta)=\mathbb{E}_{(s_t,\,a_t,\,r_{t,t+n},\,s_{t+n})\sim \mathcal{B}}\Big[\big(r_{t,t+n}+\gamma^{n}\max_{a^{\prime}}Q(s_{t+n},a^{\prime};\hat{\Theta})-Q(s_t,a_t;\Theta)\big)^{2}\Big],$$
and we perform mini-batch gradient descent to update the parameters Θ over large amounts of experience transitions, which are represented by 4-tuples (s_{t}, a_{t}, r_{t,t+n}, s_{t+n}) in the DIRAC framework. The transitions are randomly sampled from the experience replay buffer \(\mathcal{B}\); s_{t} and a_{t} denote the state and action at time step t, respectively. \(r_{t,t+n}=\sum_{k=0}^{n-1}\gamma^{k}r(s_{t+k},a_{t+k},s_{t+k+1})\) represents the n-step accumulated reward, where the discount factor γ is a hyperparameter that controls how much future rewards are discounted. \(\hat{\Theta}\) is the target parameter set, which is synchronized with Θ only every certain number of episodes (see SI Sec. IC and Algo. S4 for more details on training).
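The two ingredients of this training step, the replay buffer \(\mathcal{B}\) and the n-step accumulated reward r_{t,t+n}, can be sketched in a few lines (the capacity, uniform sampling, and class names are implementation assumptions):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay holding (s_t, a_t, r_{t,t+n}, s_{t+n})
    4-tuples, with uniform random sampling (a sketch of the buffer B)."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)   # oldest transitions drop out

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buf), batch_size)

def n_step_return(rewards, gamma, n, t):
    """r_{t,t+n} = sum_{k=0}^{n-1} gamma^k * r_{t+k}, as in the loss."""
    return sum(gamma ** k * rewards[t + k] for k in range(n))
```

For example, with γ = 0.5 and unit rewards, the 3-step return from t = 0 is 1 + 0.5 + 0.25 = 1.75.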
Online application
DIRAC is trained over a large number of small random spin glass instances. Once the training phase is finished, we perform the Q-based ground state search. A traditional Q-based strategy greedily takes the highest-Q action at each step till the end. Here we adopt the batch node selection strategy^{30}, i.e., at each step we flip a top fraction (e.g., 1%) of the spins with the highest Q-values. Similar to the training phase, we start from the all-spins-up configuration, end at the all-spins-down configuration, and flip each spin only once. Hereafter we refer to this process as DIRAC^{1}. The spin configuration of the lowest energy encountered during this process is returned as the predicted ground state. Note that starting from the same configuration forces the agent to learn a strategy with the same starting point, which drastically reduces the potential trajectory space and thus requires less training data. Ending at the same configuration makes the agent always finish the MDP within a finite number of steps. This finite-horizon MDP forces the agent to pick the right move without allowing too many regrets.
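The DIRAC^{1} procedure with batch node selection can be sketched as follows. Here `q_fn` and `energy_fn` are assumed callables standing in for the trained SGNN/MLP and the EA energy, and the flat 1-D spin indexing is an implementation convenience, not the paper's code.

```python
import numpy as np

def dirac1_sweep(sigma, q_fn, energy_fn, batch_frac=0.01):
    """One DIRAC^1 pass (sketch): repeatedly flip the top batch_frac
    fraction of not-yet-flipped spins ranked by Q-value until every
    spin has been flipped once; return the best configuration seen."""
    sigma = sigma.copy()
    flat = sigma.reshape(-1)                 # view: flipping flat flips sigma
    unflipped = set(range(flat.size))
    best_e, best_sigma = energy_fn(sigma), sigma.copy()
    k = max(1, int(batch_frac * flat.size))
    while unflipped:
        q = q_fn(sigma)                      # one Q-value per (flat) spin
        for i in sorted(unflipped, key=lambda i: -q[i])[:k]:
            flat[i] *= -1                    # flip a batch of top-Q spins
            unflipped.discard(i)
        e = energy_fn(sigma)
        if e < best_e:                       # keep the lowest-energy state
            best_e, best_sigma = e, sigma.copy()
    return best_sigma, best_e
```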
We emphasize that the vanilla strategy DIRAC^{1} has several limitations. First, it can only handle one single uniform initialization {↑, ⋯ , ↑}, rather than multiple random initializations. This drastically hinders DIRAC's performance, as significant improvements could be achieved by simply taking the best solution found across multiple initializations. Second, starting from the all-spins-up configuration and ending at the all-spins-down configuration (with each spin flipped only once) is certainly not ideal. Ideally, the agent would “revisit” its earlier decisions so as to search for an ever-improving solution. After all, due to the inherent complexity of combinatorial optimization problems, a policy that produces only a single “best-guess” solution is often suboptimal. However, most existing RL algorithms are unable to revisit their earlier decisions when solving combinatorial optimization problems, because they typically use a greedy strategy to construct the solution incrementally, adding one element at a time. To address this issue, several recent attempts designed complex neural network architectures with many tedious and ad hoc input features^{42,43}. Yet their reported results are far from satisfactory: they could not achieve the exact ground truth on even small instances, and they lack validation on large instances.
To resolve the limitations of DIRAC^{1}, we employ the technique of GT in Ising systems^{44}. The GT between one spin glass instance {σ_{i}, J_{ij}} and another instance \(\{\sigma_i^{\prime},J_{ij}^{\prime}\}\) is given by^{45,46}:

$$\sigma_i^{\prime}=\sigma_i t_i, \qquad J_{ij}^{\prime}=J_{ij}\,t_i t_j,$$
where t_{i} = ± 1 are independent auxiliary variables, so that \(\sigma_i^{\prime}\) can take any desired Ising spin value. This technique switches the spin glass system between any two configurations while keeping the system energy invariant (since \(J_{ij}^{\prime}\sigma_i^{\prime}\sigma_j^{\prime}=J_{ij}\sigma_i\sigma_j\)), which also means any input configuration can be gauge transformed to the all-spins-up configuration. In this way, DIRAC^{1} is able to handle any random input spin configuration. Note that if there exists an external field h_{i}, we only need to set \(h_i^{\prime}=h_i t_i\) so that GT still works.
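A sketch of the GT, with couplings stored per lattice direction on a periodic hypercubic lattice (a storage assumption of the sketch). The per-bond invariance J′_{ij}σ′_{i}σ′_{j} = J_{ij}σ_{i}σ_{j} follows because each t_{i} appears squared, and choosing t_{i} = σ_{i} maps any configuration to all-spins-up:

```python
import numpy as np

def gauge_transform(sigma, J, t):
    """sigma'_i = sigma_i * t_i and J'_ij = J_ij * t_i * t_j (t_i = +/-1).
    Couplings stored as J[d] for the +d-direction bond on a periodic
    hypercubic lattice (a storage assumption of this sketch)."""
    J_p = np.stack([J[d] * t * np.roll(t, -1, axis=d)
                    for d in range(sigma.ndim)])
    return sigma * t, J_p

def to_all_up(sigma, J):
    """Pick t_i = sigma_i, gauge transforming any state to all-spins-up."""
    return gauge_transform(sigma, J, sigma)
```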
With the aid of GT, we can design a more powerful strategy beyond DIRAC^{1}, referred to as DIRAC^{m} hereafter. As the name suggests (see also Fig. 2e), DIRAC^{m} repeats m iterations of DIRAC^{1}. In each iteration, DIRAC^{1} starts from an instance with the all-spins-up configuration, obtained by gauge transforming the lowest-energy configuration found in the previous iteration; the procedure stops when the system energy no longer decreases. (As shown in SI Fig. S3, initializing from the lowest-energy configuration found in the previous iteration is much better than initializing from a random one.) Notably, GT allows DIRAC^{m} to revisit its earlier decisions by computing a new set of Q-values (so as to re-evaluate the states and actions) at each iteration. The Q-value can be seen as a function of {J_{ij}, σ_{i}}, i.e., Q(J_{ij}, σ_{i}). DIRAC will generate different Q-values for different instances, as long as they have different bond signs and spin values (even if those instances are connected by GTs and hence share the same physics). This also explains why GT only works for DIRAC, but fails for any other energy-based method, such as Greedy, SA or PT: those methods only consider the energy of each bond, while GT does not change the energy of any bond at all \((J_{ij}\sigma_i\sigma_j=J_{ij}^{\prime}\sigma_i^{\prime}\sigma_j^{\prime})\). (See SI Sec. ID for more details on DIRAC^{m}.)
Another prime use of GT is the so-called gauge randomization^{47}, where one executes many runs (or randomizations) of DIRAC (either DIRAC^{1} or DIRAC^{m}) on an input spin glass instance; in each run the instance is randomly initialized with a different spin configuration. The lowest-energy configuration among these runs is then returned as the predicted ground state of the input instance.
DIRAC can also serve as a plug-in to Monte Carlo based methods, such as SA and PT. The key ingredient of these methods is the so-called Metropolis-Hastings criterion:

$$P_{\mathrm{accept}}=\min\big(1,\,e^{-\beta {{\Delta }}E}\big),$$
which means that the probability of accepting a move with energy change ΔE at inverse temperature β = 1/T is the minimum of 1 and e^{−βΔE}. A move is a small perturbation of the system; in our case it is a single-spin flip. At high temperatures, the Metropolis-Hastings criterion tends to accept all possible moves (including “bad” moves with ΔE > 0). At low temperatures, however, it mostly accepts only those “good” moves that lower the energy (i.e., with ΔE < 0), rendering the move-and-accept iteration more like a greedy search. Hereafter we refer to the process of using the Metropolis-Hastings criterion to accept moves as the energy-based Metropolis-Hastings (EMH) procedure. The art of these Monte Carlo based methods, in some sense, lies in the balance between exploration at high temperatures and exploitation (energy descent) at low temperatures.
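The EMH acceptance step is essentially one line; the function below is a generic sketch of the criterion, not code from the paper:

```python
import math
import random

def metropolis_accept(delta_e, beta, rng=random):
    """Metropolis-Hastings criterion: accept a single-spin flip with
    probability min(1, exp(-beta * delta_e)). Moves that do not raise
    the energy (delta_e <= 0) are always accepted."""
    return delta_e <= 0 or rng.random() < math.exp(-beta * delta_e)
```

At β → 0 (high temperature) every move is accepted; at large β an energy-raising move is accepted only exponentially rarely.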
The general idea of using DIRAC as a plug-in to Monte Carlo based methods is to replace the EMH procedure with DIRAC. (Later in this paper we will demonstrate the long-sighted greediness of DIRAC with respect to a purely energy-based greediness.) Specifically, we design a DIRAC-based Metropolis-Hastings (DMH) procedure. At each iteration, we let the systems or replicas choose randomly between DMH and EMH for a more effective configuration search. The DMH procedure uses one run of DIRAC^{1} (with the assistance of GT) to lower the system energy. When the system energy reaches a local minimum (i.e., ΔE = 0), DMH perturbs the spin configuration by flipping each spin with a temperature-dependent probability (SI Eq. 6). Applying this plug-in idea to SA and PT yields DIRAC-SA and DIRAC-PT, respectively. See SI Sec. IH, Algo. S6, Algo. S7 and Fig. S4 for more details about these two DIRAC-enhanced algorithms.
Performance of finding the ground state
To demonstrate the power of DIRAC in finding the ground states of Ising spin glasses, we first calculated the probability P_{0} of finding the ground states of small-scale EA spin glass instances as a function of the number of initial spin configurations (denoted as n_{initial}) (see Fig. 4(a–c)). Here we want to point out the difference between the concepts of initial configuration and run. An initial configuration can be equated with a sweep, i.e., N spin-flip attempts. A run refers to a complete execution of an algorithm, and it contains a certain number of initial configurations. For example, PT consists of N_{e} epochs and N_{r} replicas; in each epoch, each replica performs a single sweep, namely N spin-flip attempts, so a run of PT contains N_{e} × N_{r} initial configurations. DIRAC^{m} contains m iterations of DIRAC^{1}, and each DIRAC^{1} contains N spin-flips, so each run of DIRAC^{m} is counted as m initial configurations. The instances were chosen to be small so that their exact ground states can be calculated by the branch-and-bound based solver Gurobi within tolerable computing time. For each given n_{initial}, we empirically computed P_{0} as the fraction of 1000 random instances for which the ground state is found by DIRAC (and confirmed by Gurobi^{48}). We found that DIRAC enables a much faster finding of ground states than the Greedy, SA, and PT algorithms. In fact, all DIRAC variants (DIRAC^{1}, DIRAC^{m}, DIRAC-SA and DIRAC-PT) facilitate the finding of ground states. For example, in the case of D = 2 and L = 10 (Fig. 4a), \(n_{\mathrm{initial}}^{*}\) (the minimum value of n_{initial} at which P_{0} reaches 1.0) is 322,800 for PT, while it is only 600 for DIRAC^{m}. In fact, for some instances DIRAC^{m} finds the ground states with only one gauge randomization.
To systematically compare these algorithms in terms of their ability to find the ground states, we investigated the system-size scaling of \(n_{\mathrm{initial}}^{*}\). As shown in Fig. 4(d–f), DIRAC's superior performance in facilitating the finding of ground states persists across systems of varying sizes, rather than holding only for the three sizes presented in Fig. 4(a–c).
We notice that the P_{0} ~ n_{initial} curve of DIRAC-SA contains very few scatter points, and its \(n_{\mathrm{initial}}^{*}\) is almost independent of the system size N. This is because, in our implementation, each result of SA or DIRAC-SA is calculated using 5000 initial configurations (one run), and for these small systems, one run of DIRAC-SA is able to reach their ground states. Due to the NP-hard nature of this problem, we believe the \(n_{\mathrm{initial}}^{*}\) of DIRAC-SA will eventually grow exponentially with N for large N.
We also notice that in Fig. 4(b, c) the performances of SA and PT seem to be worse than that of the simple Greedy algorithm. We suspect this is because, for those small systems, the Greedy algorithm, which greedily flips the spin yielding the largest energy drop, can reach the ground states much faster than SA or PT. Indeed, annealing-based algorithms often require multiple energetically unfavorable spin-flips in order to ultimately reach a lower energy state. For large systems, the Greedy algorithm easily gets stuck in local minima and thus needs more initial configurations to reach the ground state, as shown in Fig. 4(a, d–f).
Performance of minimizing the energy
For larger systems, it is hard to compute the probability of finding the ground states for any algorithm, because we would need to confirm the calculated “ground state” against the true ground state obtained by an exact solver, and even the best branch-and-bound solver cannot calculate the ground states of very large instances within acceptable time. In this case, a more practical way of benchmarking various algorithms is to compare the energies of their predicted “ground states”, denoted as E_{0}, which is not necessarily the true ground state energy but the lowest energy provided by each algorithm for each particular instance. In particular, we are interested in the disorder-averaged “ground-state” energy per spin, denoted as e_{0}, which is computed as E_{0}/N averaged over many instances. In Fig. 5, we show e_{0} as a function of n_{initial} for several large systems. To the best of our knowledge, some of these systems, such as D = 3, L = 20, have never been considered in previous studies, and we are not aware of any results on 4D systems in the literature.
From Fig. 5 we make the following observations. First, DIRAC-SA reaches the lowest e_{0} in all cases. In some cases, DIRAC-SA comes very close to the ground state reported in existing studies. For example, for D = 3 and L = 10, Ref.^{49} reported e_{0} = −1.6981 (with n_{initial} = 3.2 × 10^{7}, obtained by PT), while DIRAC-SA obtained e_{0} = −1.6906 (with n_{initial} = 2 × 10^{4}, fewer than one thousandth of the number of initial configurations used in Ref.^{49}) (Fig. 5d). Second, the performance of SA in minimizing the system energy is surprisingly good: it is comparable to, and sometimes even better than, that of PT (up to n_{initial} = 2 × 10^{4}). PT has long been considered the state-of-the-art algorithm for the spin glass ground state problem^{27,49,50}. However, our observation suggests that we should revisit the potential of SA in solving this problem. Third, DIRAC as a general plug-in can greatly improve annealing-based Monte Carlo methods, such as SA and PT. For the nine systems studied in Fig. 5, DIRAC-SA reaches energies on average 0.79% lower than SA, and DIRAC-PT reaches energies on average 2.01% lower than PT. Statistical tests indicate that these improvements are not marginal but statistically significant: p-value < 10^{−4} for most cases, and < 0.05 for all cases (Wilcoxon signed-rank test, see SI, Fig. S5). Finally, there is a clear performance gap between DIRAC^{1} and DIRAC^{m} in Fig. 5. This is simply because DIRAC^{m} (a sequential running of m iterations of DIRAC^{1} connected by GTs) can jump out of local minima and finally reach a much lower energy state than DIRAC^{1}.
Efficiency
Besides being effective, DIRAC is also computationally efficient. For example, during the application phase, at each step DIRAC^{1} flips a small fraction (e.g., 1%) of the highest-Q spins, rather than just flipping the spin with the highest Q-value as we did in the training phase. In our numerical experiments, we found that this batch node selection strategy^{30} reduces the running time significantly without sacrificing much accuracy (Fig. S6). Both the time complexity analysis (SI Sec. IE, Tab. S1, Fig. S7) and the performance analysis of finding the ground state (Fig. 4) suggest that DIRAC displays better scalability than other methods.
We should admit that DIRAC needs to be trained offline while other methods do not. Yet, we think it is reasonable to compare DIRAC's efficiency without counting its training time, as DIRAC needs to be trained offline only once for each dimension (Fig. S8), and can then be applied any number of times to systems (of the same dimension) with different sizes. Besides, DIRAC's training time is often affordable. For some large systems, the total cost of its training and application is still lower than that of PT. For instance, although DIRAC needs about 2.5 h to finish training on the 3D system, it takes only 417 s on average for DIRAC^{m} to solve a random spin glass instance with D = 3, L = 20 (n_{initial} = 10). However, to obtain the same energy, PT needs on average 3 h (n_{initial} = 5360), which is higher than the total time cost of DIRAC^{m} (about 2.62 h) (Fig. 5f). Note that all the calculations were conducted on a 20-core computer server with 512 GB memory and a 16 GB Tesla V100 GPU.
Since the biggest computational cost of DIRAC comes from the matrix multiplications in SGNN, graphics processing unit (GPU) acceleration can be applied to DIRAC more easily than to other methods, as matrix multiplication is particularly suitable for parallelization. Still, for the sake of fairness, here we report DIRAC's CPU testing time only, and do not deploy GPU acceleration in the application phase; we only utilized the GPU to speed up the training process. Hence, the efficiency of DIRAC presented here is rather conservative.
Application on general NPhard problems
It has been shown that many NP-hard problems, including all of Karp's 21 NP-complete problems (such as the max-cut problem, the 3-SAT problem and the graph coloring problem), have Ising formulations^{11}. Hence, we anticipate that DIRAC (with some modifications) could help us solve a wide range of NP-hard problems that have Ising spin glass formulations. We emphasize that making the current DIRAC framework fully compatible with more complex Ising formulations is non-trivial. In the EA spin glass model, we only have pairwise (two-body) interactions, which can be represented by “edges” connecting spins. When the Ising formulation involves k-body interactions (with k > 2), we have to leverage the notion of a hypergraph and replace the “edge feature” in DIRAC with a “hyperedge feature”^{51}, which has been heavily studied in the field of hypergraphs and hypergraph learning^{52,53,54}.
We emphasize that GT can be applied to any optimization problem (such as k-SAT^{55} and graph coloring^{11}) with an Ising formulation. Consider a general Ising formulation:

$$H=-\sum_{a=1}^{M}J_{a}\prod_{i\in \partial a}\sigma_i.$$
Here we have M interactions, and the a-th interaction involves k_{a} ⩾ 1 Ising spins, denoted by the set ∂a (k_{a} = 1 corresponds to an external field, k_{a} = 2 corresponds to the two-body interaction we considered in this paper). Note that GT still works (with t_{i} = ± 1):

$$\sigma_i^{\prime}=\sigma_i t_i, \qquad J_{a}^{\prime}=J_{a}\prod_{i\in \partial a}t_i,$$
So that the Hamiltonian/energy remains the same:

$$J_{a}^{\prime}\prod_{i\in \partial a}\sigma_i^{\prime}=J_{a}\prod_{i\in \partial a}t_i^{2}\,\sigma_i=J_{a}\prod_{i\in \partial a}\sigma_i.$$
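The same invariance is easy to verify numerically for k-body terms; the (J_a, index-list) hypergraph representation below is an illustrative choice, not the paper's data structure:

```python
import numpy as np

def hyper_energy(sigma, interactions):
    """H = -sum_a J_a * prod_{i in partial a} sigma_i, with each
    interaction given as a (J_a, [spin indices]) pair."""
    return -sum(J_a * np.prod(sigma[idx]) for J_a, idx in interactions)

def hyper_gauge_transform(sigma, interactions, t):
    """k-body GT: sigma'_i = sigma_i * t_i, J'_a = J_a * prod_{i in a} t_i,
    leaving every term J_a * prod sigma_i (and hence H) invariant."""
    inter_p = [(J_a * np.prod(t[idx]), idx) for J_a, idx in interactions]
    return sigma * t, inter_p
```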
As a concrete example, we applied DIRAC to the max-cut problem (SI Sec. II), a canonical example of the mapping between Ising spin glasses and NP-hard problems^{11}. The results are shown in SI Fig. S9. We found that DIRAC consistently outperforms other competing max-cut solvers.
Interpreting the superior performance of DIRAC
In this section, we offer a systematic interpretation of the superior performance of DIRAC. In Fig. 6, we compare the system's energy difference between DIRAC^{1} and the Greedy algorithm at each greedy step. Note that both methods proceed greedily; the difference is that the former flips the highest-Q-value spin at each step, while the latter flips the spin with the largest energy drop at each step. (Note that, for a fair comparison, here DIRAC^{1} flips only the single spin with the highest Q-value at each step, instead of flipping a fraction of the spins with the highest Q-values.) The (energy-based) Greedy algorithm represents an extremely short-sighted strategy, since it focuses only on each step's maximal energy drop. Fig. 6 clearly shows that, compared to this short-sighted strategy, DIRAC^{1} always passes through a temporarily higher-energy state in the early steps, so as to reach a much lower energy state in the long run. This result implies that DIRAC has learned to make short-term sacrifices for long-term gains. In other words, DIRAC has been trained to be mindful of its long-term goals.
In Fig. S10, we demonstrate that, during each iteration of DIRAC^{m} (a sequential run of m iterations of DIRAC^{1} connected by GTs), two interesting phenomena occur: (1) the fraction of antiferromagnetic bonds in the gauge-transformed instances keeps decreasing (Fig. S10k); and (2) the Q-value distribution becomes more homogeneous (Fig. S10l). In Fig. S3, we show clear evidence that DIRAC^{m} significantly outperforms m independent runs of DIRAC^{1} (where each DIRAC^{1} deals with a random instance with ≈ 50% antiferromagnetic bonds). These results imply that the superior performance of DIRAC^{m} is related to the decreasing fraction of antiferromagnetic bonds and the more homogeneous Q-value distribution produced by the sequential GTs.
The results shown in Fig. S3 and Fig. S10 prompt us to ask whether the superior performance of DIRAC over other methods can be better visualized in an extreme case, i.e., an antiferromagnetic Ising model where J_{ij} = − 1 for all the bonds. It is well known that this simple model has a ground state with a checkerboard pattern in the spin configuration (as shown in Fig. 7, first row, step = 200, where the red/white sites represent −1/+1 spins, respectively). However, for classical energy-based heuristic algorithms (e.g., Greedy, SA and PT), this ground state cannot be found easily. Consider an antiferromagnetic Ising model on a 20 × 20 square lattice with periodic boundary conditions. The last column in Fig. 7 shows the energy vs. the number of steps taken by these heuristic algorithms. The Greedy algorithm ran for 191 steps and got stuck in a local minimum whose energy is significantly higher than the ground-state energy. SA and PT (the coldest replica) took 21,333 and 10,808 steps, respectively, to finally reach the ground state. By contrast, DIRAC took only 200 = 20 × 20/2 steps (i.e., flipping exactly half of the spins) to reach the ground state for this lattice. In other words, DIRAC did not make any wrong decision in the whole process, which is remarkable.
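The checkerboard ground state and its energy can be verified directly. The minimal script below (our own illustration, not part of DIRAC) evaluates the EA Hamiltonian H = −Σ J_{ij} σ_{i} σ_{j} with J_{ij} = −1 on the 20 × 20 periodic lattice discussed above:

```python
def lattice_bonds(L):
    """Bonds of an L x L square lattice with periodic boundary conditions."""
    bonds = []
    for x in range(L):
        for y in range(L):
            bonds.append(((x, y), ((x + 1) % L, y)))   # horizontal bond
            bonds.append(((x, y), (x, (y + 1) % L)))   # vertical bond
    return bonds

def energy(sigma, bonds):
    """EA energy H = -sum_<ij> J_ij s_i s_j with antiferromagnetic J_ij = -1,
    which reduces to +sum_<ij> s_i s_j."""
    return sum(sigma[a] * sigma[b] for a, b in bonds)

L = 20
bonds = lattice_bonds(L)                               # 2 * L**2 = 800 bonds
uniform = {(x, y): +1 for x in range(L) for y in range(L)}
checker = {(x, y): (-1) ** (x + y) for x in range(L) for y in range(L)}
e_uniform, e_checker = energy(uniform, bonds), energy(checker, bonds)
```

Every bond of the checkerboard state joins opposite spins, giving the ground-state energy −2N = −800; the uniform start sits at +800, and flipping exactly one sublattice (N/2 = 200 spins) connects the two, matching DIRAC's 200 perfect moves.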
To further explain why DIRAC is so “smart” in this case, we look at the snapshots shown in Fig. 7. All the algorithms start from a uniform initial state where all the spin values are set to +1. Note that in the snapshots, red sites represent spin value − 1. Sites with grayscale colors represent spin value + 1, and the grayscale of each site is determined by its Q-value or site-energy. For DIRAC, a darker site corresponds to a higher Q-value; for Greedy, SA and PT, a darker site corresponds to a higher site-energy. All the algorithms tend to flip a darker site, i.e., one with a higher Q-value or site-energy. But DIRAC differs from the other algorithms in the following way. Since the instance consists purely of antiferromagnetic bonds, the Q-values of different nodes (spins) are all the same at the beginning. After the first spin is flipped (see the center node in the step = 1 snapshot of DIRAC in Fig. 7), there are two consequences: (1) the Q-values of all its first nearest neighbors are “smartly” decreased, rendering them less likely to be flipped in the future; (2) the Q-values of all its second nearest neighbors are “smartly” increased, rendering them more likely to be flipped in the next step. In a sense, DIRAC has a long-sighted vision that cleverly leverages the nature of a purely antiferromagnetic Ising model. As a result, the intermediate snapshots (e.g., at step = 50) display a clear “stripe” pattern that “grows” from the first flipped spin. By contrast, the other algorithms are short-sighted: they try almost random flips at the beginning, then make incorrect flips and get stuck in local minima. The Greedy algorithm got stuck in a local minimum forever. SA and PT can jump out of their local minima, but it took them a very long time to reach the final ground state.
Taken together, DIRAC seeks to mimic human intelligence in solving the ground state problem. For the spin glass ground state problem, it learns to sacrifice short-term satisfaction for long-term gains. For the antiferromagnetic Ising model, it demonstrates a remarkable long-sighted vision by making a smart move every time.
Discussion
This work reports an effective and efficient deep RL-based algorithm, DIRAC, that can directly calculate the ground states of EA spin glasses. Extensive numerical calculations demonstrate that DIRAC outperforms state-of-the-art algorithms in terms of both solution quality and running time. We also evaluate DIRAC’s superior performance under different scenarios, e.g., different coupling distributions (Gaussian vs. bimodal vs. uniform) (Fig. S11); different topological structures (trees vs. loopy trees vs. lattices) (Fig. S12); different hardness regimes (Fig. S13, Fig. S14); and different spin glass models (EA vs. Sherrington-Kirkpatrick) (Fig. S15); see SI Sec. III for more details. In a purely data-driven way and without any domain-specific guidance, DIRAC smartly mimics human intelligence in solving the spin glass ground state problem. In particular, DIRAC enables a much faster finding of ground states than existing algorithms, and it can greatly improve annealing-based methods (e.g., SA and PT) to reach the lowest system energy for all dimensions and system sizes.
Note that in our implementations of annealing-based methods (e.g., SA and PT), we took the parameters of SA and PT from Ref. ^{50} and Ref. ^{49}, respectively. We found that our implementations of SA and PT were able to generate similar results to (or, arguably, slightly better performance than) those reported in existing works (see SI, Fig. S16 and Fig. S17). We emphasize that even if SA or PT itself can be further improved, we can still use DIRAC as a plug-in to enhance the improved version of SA or PT. Hence, we are interested not only in comparing DIRAC with the state-of-the-art implementation of SA or PT, but also in comparing DIRAC-enhanced thermal annealing algorithms with their corresponding vanilla algorithms (as shown in SI Fig. S5).
In the future, advances in deep graph representations may enable us to design a better encoder, and developments in RL techniques may enable more efficient training. Both would further improve DIRAC’s performance in finding the ground states of Ising spin glasses. The utilization of GT in DIRAC, and the way DIRAC is combined with annealing-based methods, may also inspire other physics-guided AI research. Our current framework is just the beginning of such a promising adventure.
Methods
DIRAC
For DIRAC, Tab. S2 lists the values of its hyperparameters, which were determined by an informal grid search. We only tuned a few hyperparameters, including the discount factor γ, the delayed-reward steps n, and the number of message-passing steps K. The results are shown in Fig. S18, Fig. S19, and Fig. S20. Therefore, DIRAC’s performance could be further improved by a more systematic grid search. For example, in Fig. S20, we found that the agent trained using the same K value as that used in testing often yields the best performance, and this observation holds for different system sizes. This contradicts our intuition that a larger K should always yield better performance on large systems, owing to a better capture of long-range correlations. We suspect that this may be due to an inconsistency between K and the embedding dimension d (i.e., the size of the node embedding vector, which is always set to d = 64 in all our calculations). We anticipate that d should be higher for higher K, so that longer-range correlations can be encoded in the node embedding vector. Systematically testing this idea is beyond the scope of the current study. For more implementation details, please see SI Sec. I.
SA
For SA, we linearly annealed the temperature from a high value to a low one; the number of temperatures is set to N_{t}. At each temperature, we performed N_{s} sweeps of exploration, where each sweep contains N (the number of spins) random moves. The hyperparameter values are taken from Ref. ^{50}, i.e., the maximal inverse temperature \({\beta }_{\max }=5\) and the minimal inverse temperature \({\beta }_{\min }=0\). We set N_{t} = 100, which is consistent with the first row of Table I in Ref. ^{50}. Ref. ^{50} also pointed out that the optimized value of N_{t} × N_{s} should be around 5000; accordingly, we set N_{s} = 50. For more implementation details, please see SI Sec. IV.
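The SA schedule above can be sketched as follows. The ferromagnetic ring instance and the adjacency-list representation are our own toy illustration (not the paper's implementation); the schedule parameters follow the values stated above:

```python
import math
import random

def simulated_annealing(n, adj, n_t=100, n_s=50, beta_max=5.0, seed=0):
    """SA sketch: the inverse temperature beta rises linearly from beta_min = 0
    to beta_max over n_t temperatures; at each temperature we perform n_s sweeps
    of n single-spin Metropolis moves. adj[i] lists (j, J_ij) pairs for the
    neighbors of spin i, under the EA energy H = -sum_<ij> J_ij s_i s_j."""
    rng = random.Random(seed)
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    for t in range(n_t):
        beta = beta_max * t / (n_t - 1)
        for _ in range(n_s * n):
            i = rng.randrange(n)
            h = sum(jij * sigma[j] for j, jij in adj[i])
            delta = 2 * sigma[i] * h          # energy change of flipping spin i
            if delta <= 0 or rng.random() < math.exp(-beta * delta):
                sigma[i] *= -1
    return sigma

# Toy instance: a ferromagnetic ring of 8 spins (J_ij = +1),
# whose ground states are the two fully aligned configurations.
n = 8
adj = {i: [((i - 1) % n, 1.0), ((i + 1) % n, 1.0)] for i in range(n)}
sigma = simulated_annealing(n, adj)
```

At beta = 0 every move is accepted (pure exploration); by beta = 5 the chain is effectively frozen into a low-energy state.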
PT
For PT, we chose N_{r} = 20 replicas, whose temperatures are equally spaced from 0.1 to 1.6^{49}, initialized with random configurations. Within each epoch, we attempted N (the number of spins) random flips. After these random flips, we randomly picked two replicas and exchanged their spin configurations. The lowest energy and the corresponding spin configuration across all the replicas were recorded throughout the whole process. For more implementation details, please see SI Sec. IV.
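A PT scheme of this kind can be sketched as follows. Two caveats: the toy ring instance is our own, and the exchange step below uses the standard Metropolis swap criterion between an adjacent-temperature pair, which is a common variant and an assumption on our part (the text above exchanges a randomly chosen pair):

```python
import math
import random

def energy(sigma, J):
    """EA energy H = -sum_<ij> J_ij s_i s_j over the bonds in J."""
    return -sum(jij * sigma[a] * sigma[b] for (a, b), jij in J.items())

def parallel_tempering(n, J, n_r=20, t_lo=0.1, t_hi=1.6, epochs=100, seed=0):
    """PT sketch: n_r replicas at equally spaced temperatures, random initial
    configurations; each epoch attempts n single-spin Metropolis flips per
    replica, then one replica-exchange attempt. Tracks the lowest energy seen."""
    rng = random.Random(seed)
    temps = [t_lo + k * (t_hi - t_lo) / (n_r - 1) for k in range(n_r)]
    reps = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n_r)]
    adj = {i: [] for i in range(n)}
    for (a, b), jij in J.items():
        adj[a].append((b, jij))
        adj[b].append((a, jij))
    best_e, best_s = math.inf, None
    for _ in range(epochs):
        for r, T in zip(reps, temps):
            for _ in range(n):
                i = rng.randrange(n)
                h = sum(jij * r[j] for j, jij in adj[i])
                delta = 2 * r[i] * h
                if delta <= 0 or rng.random() < math.exp(-delta / T):
                    r[i] *= -1
            e = energy(r, J)
            if e < best_e:
                best_e, best_s = e, list(r)
        # Metropolis swap of an adjacent-temperature pair (assumed variant).
        k = rng.randrange(n_r - 1)
        arg = (1 / temps[k] - 1 / temps[k + 1]) * \
              (energy(reps[k], J) - energy(reps[k + 1], J))
        if arg >= 0 or rng.random() < math.exp(arg):
            reps[k], reps[k + 1] = reps[k + 1], reps[k]
    return best_e, best_s

# Toy instance: antiferromagnetic ring of 8 spins; the alternating
# ground state has energy -8 under H = -sum_<ij> J_ij s_i s_j.
J = {(i, (i + 1) % 8): -1.0 for i in range(8)}
best_e, best_s = parallel_tempering(8, J)
```

The hot replicas explore freely while the cold ones refine; the swaps let low-energy configurations migrate toward low temperatures.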
Data availability
The data used to reproduce the results in this paper are publicly available through Zenodo^{56} (https://doi.org/10.5281/zenodo.7562380).
Code availability
The source code of DIRAC (and its variants), as well as that of the two baseline methods, SA and PT, is publicly available through Zenodo^{56} (https://doi.org/10.5281/zenodo.7562380) or on GitHub (https://github.com/FFrankyy/DIRAC.git).
References
Binder, K. & Young, A. P. Spin glasses: Experimental facts, theoretical concepts, and open questions. Rev. Mod. Phys. 58, 801 (1986).
Mézard, M., Parisi, G. & Virasoro, M. Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, Vol. 9 (1987).
Hartmann, A. K. & Rieger, H. Optimization Algorithms in Physics (2002).
Edwards, S. F. & Anderson, P. W. Theory of spin glasses. J. Phys. F: Met. Phys. 5, 965 (1975).
Chayes, J. T., Chayes, L., Sethna, J. P. & Thouless, D. J. A mean field spin glass with short range interactions. Comm. Math. Phys. 106, 41–89 (1986).
Ceccarelli, G., Pelissetto, A. & Vicari, E. Ferromagnetic-glassy transitions in three-dimensional Ising spin glasses. Phys. Rev. B 84, 134202 (2011).
Cugliandolo, L. F. & Kurchan, J. Weak ergodicity breaking in mean-field spin-glass models. Philos. Mag. B 71, 501–514 (1995).
Carter, A., Bray, A. & Moore, M. Aspect-ratio scaling and the stiffness exponent θ for Ising spin glasses. Phys. Rev. Lett. 88, 077201 (2002).
Hartmann, A. K. Scaling of stiffness energy for three-dimensional ±J Ising spin glasses. Phys. Rev. E 59, 84 (1999).
Barahona, F. On the computational complexity of Ising spin glass models. J. Phys. A: Math. Gen. 15, 3241 (1982).
Lucas, A. Ising formulations of many NP problems. Front. Phys. 2, 5 (2014).
Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75 (2010).
Goldstein, R. A., Luthey-Schulten, Z. A. & Wolynes, P. G. Optimal protein-folding codes from spin-glass theory. Proc. Natl Acad. Sci. 89, 4918–4922 (1992).
Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl Acad. Sci. (USA) 79, 2554 (1982).
Little, W. A. The existence of persistent states in the brain. Math. Biosci. 19, 101 (1974).
Amit, D. J., Gutfreund, H. & Sompolinsky, H. Spinglass models of neural networks. Phys. Rev. A 32, 1007 (1985).
Sompolinsky, H. Statistical mechanics of neural networks. Phys. Today 40, 70 (1988).
Mézard, M., Parisi, G. & Zecchina, R. Analytic and algorithmic solution of random satisfiability problems. Science 297, 812–815 (2002).
De Simone, C. et al. Exact ground states of Ising spin glasses: new experimental results with a branch-and-cut algorithm. J. Stat. Phys. 80, 487–496 (1995).
Hartmann, A. K. Ground states of two-dimensional Ising spin glasses: fast algorithms, recent developments and a ferromagnet-spin glass mixture. J. Stat. Phys. 144, 519 (2011).
Khoshbakht, H. & Weigel, M. Domain-wall excitations in the two-dimensional Ising spin glass. Phys. Rev. B 97, 064410 (2018).
Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220, 671–680 (1983).
Gubernatis, J. E. The Monte Carlo method in the physical sciences: celebrating the 50th anniversary of the Metropolis algorithm. The Monte Carlo Method in the Physical Sciences 690 (2003).
Swendsen, R. H. & Wang, J.-S. Replica Monte Carlo simulation of spin-glasses. Phys. Rev. Lett. 57, 2607 (1986).
Geyer, C. J. et al. Computing science and statistics: proceedings of the 23rd symposium on the interface. American Statistical Association 156, (1991).
Hukushima, K. & Nemoto, K. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Jpn. 65, 1604–1608 (1996).
Earl, D. J. & Deem, M. W. Parallel tempering: theory, applications, and new perspectives. Phys. Chem. Chem. Phys. 7, 3910–3916 (2005).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Li, Z., Chen, Q. & Koltun, V. Combinatorial optimization with graph convolutional networks and guided tree search. In Advances in Neural Information Processing Systems 31, 539–548 (2018).
Fan, C., Zeng, L., Sun, Y. & Liu, Y.-Y. Finding key players in complex networks through deep reinforcement learning. Nat. Mach. Intell. 2, 317–324 (2020).
Bello, I., Pham, H., Le, Q. V., Norouzi, M. & Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940 (2016).
Nazari, M., Oroojlooy, A., Snyder, L. & Takác, M. Reinforcement learning for solving the vehicle routing problem. In Advances in Neural Information Processing Systems 31, 9861–9871 (2018).
Mills, K., Ronagh, P. & Tamblyn, I. Finding the ground state of spin Hamiltonians with reinforcement learning. Nat. Mach. Intell. 2, 509–517 (2020).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Khalil, E., Dai, H., Zhang, Y., Dilkina, B. & Song, L. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems 30, 6348–6358 (2017).
Mazyavkina, N., Sviridov, S., Ivanov, S. & Burnaev, E. Reinforcement learning for combinatorial optimization: a survey. Comput. Oper. Res. 134, 105400 (2021).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (2017).
Hamilton, W., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems 30, 1024–1034 (2017).
Velickovic, P. et al. Graph attention networks. In International Conference on Learning Representations, (2018).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In International Conference on Machine Learning, 1263–1272 (PMLR, 2017).
Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? In International Conference on Learning Representations, (2018).
Barrett, T., Clements, W., Foerster, J. & Lvovsky, A. Exploratory combinatorial optimization with reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence 34, 3243–3250 (2020).
Yao, F., Cai, R. & Wang, H. Reversible action design for combinatorial optimization with reinforcement learning. In AAAI-22 Workshop on Machine Learning for Operations Research (ML4OR) (2021).
Wegner, F. J. Duality in generalized Ising models and phase transitions without local order parameter. J. Math. Phys. 12, 2259 (1971).
Ozeki, Y. Gauge transformation for dynamical systems of Ising spin glasses. J. Phys. A: Math. Gen. 28, 3645 (1995).
Batista, C. D. & Nussinov, Z. Generalized Elitzur’s theorem and dimensional reductions. Phys. Rev. B 72, 045137 (2005).
Hamze, F. et al. From near to eternity: spin-glass planting, tiling puzzles, and constraint-satisfaction problems. Phys. Rev. E 97, 043303 (2018).
Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual (2021).
Romá, F., Risau-Gusman, S., Ramirez-Pastor, A. J., Nieto, F. & Vogel, E. E. The ground state energy of the Edwards-Anderson spin glass model with a parallel tempering Monte Carlo algorithm. Phys. A: Stat. Mech. Appl. 388, 2821–2838 (2009).
Wang, W., Machta, J. & Katzgraber, H. G. Comparing Monte Carlo methods for finding ground states of Ising spin glasses: population annealing, simulated annealing, and parallel tempering. Phys. Rev. E 92, 013303 (2015).
Feng, Y., You, H., Zhang, Z., Ji, R. & Gao, Y. Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence 33, 3558–3565 (2019).
Yu, C.-A., Tai, C.-L., Chan, T.-S. & Yang, Y.-H. Modeling multi-way relations with hypergraph embedding. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 1707–1710 (2018).
Gao, Y. et al. Hypergraph learning: methods and practices. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 2548–2566 (2020).
Pu, L. & Faltings, B. Hypergraph learning with hyperedge expansion. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 410–425 (2012).
Ercsey-Ravasz, M. & Toroczkai, Z. Optimization hardness as transient chaos in an analog approach to constraint satisfaction. Nat. Phys. 7, 966–970 (2011).
Fan, C. et al. Searching for spin glass ground states through deep reinforcement learning (v1.0.1). Zenodo (2023). https://doi.org/10.5281/zenodo.7562380.
Acknowledgements
We are grateful to L. Zeng for valuable discussions. C.F. and Z.L. are supported by the National Natural Science Foundation of China (NSFC, 62206303, 62273352, 62073333, 72025405, 72088101) and the Ministry of Science and Technology of China (MSTC, 2022YFB3102600).
Author information
Authors and Affiliations
Contributions
Y.-Y.L. conceived and designed the project. Y.-Y.L. and Y.S. managed the project. C.F. and M.S. performed all the numerical calculations and analyzed the results; Y.-Y.L., Y.S., Z.N., and Z.L. interpreted the results. C.F., M.S., and Y.-Y.L. wrote the paper; Y.S. and Z.N. edited the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fan, C., Shen, M., Nussinov, Z. et al. Searching for spin glass ground states through deep reinforcement learning. Nat Commun 14, 725 (2023). https://doi.org/10.1038/s41467-023-36363-w
DOI: https://doi.org/10.1038/s41467-023-36363-w
This article is cited by

Deep reinforced learning heuristic tested on spin-glass ground states: The larger picture. Nature Communications (2023)

Reply to: Deep reinforced learning heuristic tested on spin-glass ground states: The larger picture. Nature Communications (2023)

Novel multiple access protocols against Q-learning-based tunnel monitoring using flying ad hoc networks. Wireless Networks (2023)