Deep reinforced learning heuristic tested on spin-glass ground states: The larger picture

In Changjun Fan et al. [Nature Communications https://doi.org/10.1038/s41467-023-36363-w (2023)], the authors present a deep reinforcement learning approach to augment combinatorial optimization heuristics. In particular, they present results for several spin glass ground state problems, for which instances on non-planar networks are generally NP-hard, in comparison with several Monte Carlo based methods, such as simulated annealing (SA) or parallel tempering (PT). Indeed, those results demonstrate that the reinforcement learning improves the results over those obtained with SA or PT, or at least allows for reduced runtimes for the heuristics before results of comparable quality have been obtained relative to those other methods. To facilitate the conclusion that their method is "superior", the authors pursue two basic strategies: (1) a commercial GUROBI solver is called on to procure a sample of exact ground states as a testbed to compare with, and (2) a head-to-head comparison between the heuristics is given for a sample of larger instances where exact ground states are hard to ascertain. Here, we put these studies into a larger context, showing that the claimed superiority is at best marginal for smaller samples and becomes essentially irrelevant with respect to any sensible approximation of true ground states in the larger samples. For example, this method becomes irrelevant as a means to determine stiffness exponents $\theta$ in $d>2$, as mentioned by the authors, where the problem is not only NP-hard but requires the subtraction of two almost equal ground-state energies, and systematic errors in each of $\approx 1\%$ found here are unacceptable. This larger picture on the method arises from a straightforward finite-size corrections study over the spin glass ensembles the authors employ, using data that has been available for decades.

In our investigation here, we focus mainly on two ensembles of NP-hard problems the authors utilize: the Edwards-Anderson spin glass on a cubic lattice (EA in $d = 3$) with periodic boundary conditions [7] and the mean-field (all-to-all connected) Sherrington-Kirkpatrick spin glass (SK) [8]. The ensemble for both models consists of instances where all bonds are chosen randomly from a normal distribution of zero mean and unit variance. The ensemble is parametrized by its size, i.e., the number of variables $N$ in a spin configuration $\vec{\sigma}$, where $N = L^3$ in the case of EA. For such hard combinatorial problems, there are many ways to find exact solutions for instances of small $N$, such as a solver like GUROBI. However, for any practical application at large $N$, the super-polynomial rise in complexity necessitates the use of heuristic methods. Thus, the scalability of a heuristic is of particular concern. In the formal study of computational complexity, this is typically addressed by establishing bounds on an all-encompassing worst-case scenario [9]. For many complicated meta-heuristics [10], such as the method presented here, insights into the capability of a heuristic can be gained only from comparative studies over widely accepted testbeds of instances or those selected from specific ensembles. The authors have clearly adopted the ensemble approach [1].
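To make the ensemble picture concrete, the following is a minimal sketch of how a single small SK instance could be drawn and solved exactly by enumeration. It is an illustration only, not the procedure of Ref. [1] or of any solver mentioned here. The sign convention and the $1/\sqrt{N}$ normalization of the Hamiltonian are assumptions (the standard SK choice for unit-variance bonds), and the function names `sk_instance`, `energy_density`, and `brute_force_ground_state` are hypothetical.

```python
import itertools
import math
import random


def sk_instance(n, rng):
    """Draw one bond J_ij per pair i < j from a normal distribution
    with zero mean and unit variance (all-to-all connectivity)."""
    return {(i, j): rng.gauss(0.0, 1.0)
            for i in range(n) for j in range(i + 1, n)}


def energy_density(bonds, spins):
    """Energy per spin, ASSUMING H = -(1/sqrt(N)) * sum_{i<j} J_ij s_i s_j."""
    n = len(spins)
    h = -sum(j * spins[a] * spins[b] for (a, b), j in bonds.items())
    return h / math.sqrt(n) / n


def brute_force_ground_state(bonds, n):
    """Enumerate all 2^(N-1) configurations with spin 0 fixed to +1;
    fixing one spin is enough because H is invariant under a global flip."""
    best_e, best_s = math.inf, None
    for tail in itertools.product((1, -1), repeat=n - 1):
        spins = (1,) + tail
        e = energy_density(bonds, spins)
        if e < best_e:
            best_e, best_s = e, spins
    return best_e, best_s


rng = random.Random(0)   # fixed seed so the instance is reproducible
n = 10                   # tiny on purpose: the cost doubles with every spin
bonds = sk_instance(n, rng)
e0, sigma0 = brute_force_ground_state(bonds, n)
print(f"N = {n}: e0 = {e0:.4f}")
```

The exponential growth of the enumeration with $N$ is what restricts exact methods to small instances and motivates the heuristics discussed here.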
Especially with regard to scalability, the ensemble picture deserves particular attention, for the following reasons. Those ensembles typically have a "thermodynamic limit", i.e., their averages are well-defined and possess a clear meaning for $N \to \infty$, which a typical large instance approaches. At times, that limit may even be exactly solvable, such as in the case of SK [11], but that is not essential here, as exemplified by EA. More importantly, that limit is usually attained in an equally well-defined manner through finite-size corrections (FSC). To be specific in this context, for the cost function a heuristic is trying to minimize, the authors have chosen the ground state energy density, $e_0 = \min_{\vec{\sigma}} H(\vec{\sigma})/N$, of the Hamiltonian $H$ for each of their (physically motivated) spin glass ensembles. Instances are generated via random choices of bonds $J_{ij}$ from a characteristic distribution $P(J)$, see Eq. (1) in Ref. [1]. If the thermodynamic limit for the ensemble-averaged ground-state energy density $\langle e_0 \rangle_{N=\infty}$ exists, FSC assumes the asymptotic scaling form

$$\langle e_0 \rangle_N \sim \langle e_0 \rangle_{N=\infty} + \frac{A}{N^{\omega}}, \qquad (N \to \infty), \tag{1}$$

for a constant $A$ and a correction exponent $\omega > 0$.
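As a rough numerical illustration of the scaling form $\langle e_0 \rangle_{N=\infty} + A/L^{d\omega}$ for EA in $d = 3$, the sketch below evaluates it at the sizes $L = 10, 15, 20$ using the fit values quoted in the caption of Fig. 1 ($\langle e_0 \rangle_{N=\infty} \approx -1.701$, $A \approx 1.641$, $d\omega = 2.76$). The function name `fsc_energy` is hypothetical, and the printed 1% column is only a yardstick for the error level discussed in the text, not data from Ref. [1].

```python
E_INF = -1.701   # fitted thermodynamic limit, as quoted in the Fig. 1 caption
A = 1.641        # fitted amplitude, as quoted in the Fig. 1 caption
D_OMEGA = 2.76   # d * omega for d = 3, the exponent of the Fig. 1 x-axis


def fsc_energy(L):
    """Ensemble-averaged ground-state energy density predicted by the fit."""
    return E_INF + A / L ** D_OMEGA


for L in (10, 15, 20):
    e = fsc_energy(L)
    correction = e - E_INF
    print(f"L = {L:2d}: <e0> = {e:.5f}, "
          f"FSC term = {correction:.5f}, "
          f"1% of |<e0>| = {0.01 * abs(e):.5f}")
```

Under these fit values, the entire finite-size correction at $L = 10$ is already several times smaller than a 1% error in the energy density, which is the scale against which any heuristic result at these sizes has to be judged.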
Clearly, other forms of corrections might exist and higher-order terms could well obscure the assumed behavior deep into the large-$N$ regime. Yet, self-consistency with the form in Eq. (1) of the actual data for small $N$, where reliable (or exact) results can be ascertained, often provides a powerful baseline to assess the scalability of a heuristic [12,13]. This is certainly the case here, and it provides a larger picture for the results in Ref. [1].
Long before the PT results [3] that the authors reference in their study of EA in $d = 3$, virtually identical results had been found by Pal [5] using a genetic algorithm (GA). Despite the doubts the authors raise (in the caption of their Fig. 5), both the PT and the GA data exhibit a consistent scaling picture, shown here in Fig. 1. While the authors do not provide any tabulated data for their corresponding results, at least for the larger samples we can extract estimated values for their best results (for DIRAC-SA, shown as red circles in Fig. 1) from the plots provided in their Fig. S5 (d-f). There, the authors take the fact that the DIRAC-SA data is better than either PT or SA as evidence of the superiority of their method. However, considering how far every one of the datasets employed in this comparison is from any actual ground states, this advantage, whether in speed or in accuracy, is rather inconsequential in the larger picture of Fig. 1.
Similarly, the results the authors provide for SK prove inconclusive in the larger picture of long-established results for this case [6,14,13]. Here, Ref. [1] merely provides results of their method for quite small instances, where GUROBI allows one to obtain exact ground states for comparison. While these results are indeed consistent with the predicted scaling, as shown in Fig. 2, the sizes bounded by $N \leq 216$ considered in their study have very limited predictive power about the scalability of their method for any size that would make their method competitive, either in speed or in accuracy, with state-of-the-art heuristics at larger $N$. After all, with an ensemble approach, it is not necessary to rely on exactly solved instances to make impactful comparisons, as our discussion of EA demonstrates.
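The extrapolation underlying such a scaling comparison amounts to a linear least-squares fit in the variable $x = N^{-\omega}$. The sketch below shows one way this could be done for SK with $\omega = 2/3$. It uses synthetic, noise-free data generated from the scaling form itself, so it only demonstrates the mechanics of the fit and reproduces no published numbers. The amplitude `A_TRUE`, the list of sizes, and the function name `fit_fsc` are hypothetical choices made for this example.

```python
def fit_fsc(sizes, energies, omega):
    """Ordinary least squares for e = e_inf + A * x with x = N**(-omega).
    Returns the intercept (thermodynamic limit) and the slope (amplitude)."""
    xs = [n ** (-omega) for n in sizes]
    m = len(xs)
    x_mean = sum(xs) / m
    e_mean = sum(energies) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxe = sum((x - x_mean) * (e - e_mean) for x, e in zip(xs, energies))
    slope = sxe / sxx
    intercept = e_mean - slope * x_mean
    return intercept, slope


OMEGA = 2 / 3          # correction exponent used for SK
E_INF_TRUE = -0.7632   # RSB value for the SK limit, truncated as quoted
A_TRUE = 0.70          # HYPOTHETICAL amplitude, chosen only for this demo

# SYNTHETIC data: exact values of the scaling form at illustrative sizes.
sizes = [63, 127, 255, 511, 1023]
energies = [E_INF_TRUE + A_TRUE * n ** (-OMEGA) for n in sizes]

e_inf_fit, a_fit = fit_fsc(sizes, energies, OMEGA)
print(f"fitted limit = {e_inf_fit:.4f}, fitted amplitude = {a_fit:.3f}")
```

With real ensemble averages in place of the synthetic values, the distance of the fitted intercept from the known limit gives a simple check on how close a heuristic's results are to typical ground states.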
Figure 2: Extrapolation plot according to Eq. (1) with $\omega = 2/3$. For SK, theory (RSB [11]) predicts an exact result for the limit $N \to \infty$, $\langle e_0 \rangle_{N=\infty} = -0.7632\ldots$, marked by ×. The reference data (open circles) for up to $N = 1023$, averaged over at least $10^5$ instances each, was obtained with the extremal optimization heuristic (EO) [6]. That the asymptotic fit (line) of this data predicts $\langle e_0 \rangle_{N=\infty}$ with high accuracy adds confidence in the scaling. The data for 50 instances each at $N = 64$, 125, and 216 from Ref. [1] (red squares) matches within errors a similar random sample of 50 instances each optimized with EO (filled circles). Note that the ground-state energy variances for SK are typically broader than standard deviations [6]. The DIRAC50 data here was obtained from Fig. S15 (d-f) in Ref. [1], which required the additional division by $\sqrt{N}$ in the ground state energy densities when a univariate ($\langle J^2 \rangle = 1$) bond distribution is used.

In conclusion, a comparison with existing data shows little evidence for the claimed superiority of the deep reinforcement learning strategy to enhance optimization heuristics proposed in Ref. [1]. The comparison provided here for both a sparse short-range and a dense infinite-range spin glass model is quite exemplary for all the ensembles the authors discuss, so that this conclusion is likely not particular to these two cases. The authors should be lauded for having demonstrated some gains relative to simple greedy algorithms for EA [15], but their results remain too far from optimality, even if within the $< 1\%$ level we found in Fig. 1, to be of any use in the applications to the physics of spin glasses the authors imply. For example, in the stiffness problem one determines a ground state of an instance in EA and again for reversed boundary conditions, which inserts a relative domain wall between the ground states with separate energies $e_0^{1,2}(L) \sim \langle e_0 \rangle_{L=\infty} + A_{1,2}/L^{d\omega} + \ldots$. That domain wall has a much smaller energy, $\Delta e = |e_0^{1} - e_0^{2}| \sim \Delta A/L^{d\omega} \to 0$, which relates FSC to the stiffness exponent via $d\omega = d - \theta$ [12], as used in Fig. 1. These exponents were determined for EA in dimensions $d = 3, \ldots, 7$ by finding ground states for millions of dilute lattices with up to $N = 10^7$ using a hybrid EO algorithm [16,4]. Hence, the set of heuristics chosen as a base for their comparison is surprisingly narrow, considering that the authors refer to Ref. [2] for the use of heuristics for spin glasses, which also discusses GA and EO.
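The relation $d\omega = d - \theta$ used for the stiffness problem can be made explicit in a few lines. The following is a short worked derivation, assuming the usual definition of the stiffness exponent through the scaling of the total (extensive) domain-wall energy, $\Delta E \sim L^{\theta}$. The symbol $\Delta E$ is introduced here for illustration only.

```latex
% Total domain-wall energy from the difference in energy densities (N = L^d):
\Delta E \;=\; N\,\Delta e \;=\; L^{d}\,\bigl|e_0^{1} - e_0^{2}\bigr|
\;\sim\; L^{d}\,\frac{\Delta A}{L^{d\omega}} \;=\; \Delta A\, L^{\,d - d\omega}.

% Comparing with the assumed definition \Delta E \sim L^{\theta}:
\theta \;=\; d - d\omega
\qquad\Longleftrightarrow\qquad
\omega \;=\; 1 - \frac{\theta}{d}.

% For EA in d = 3 with d\omega = 2.76, as used in Fig. 1:
\theta \;\approx\; 3 - 2.76 \;=\; 0.24,
\qquad
\omega \;\approx\; 0.92 .
```

Since $\Delta e$ vanishes as $L^{-d\omega}$ while a systematic error of fixed relative size in each energy density does not, such an error eventually exceeds the difference that the stiffness calculation is trying to resolve.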

Figure 1: Extrapolation plot according to the finite-size corrections form in Eq. (1) for the ensemble-averaged ground state energy densities obtained with various heuristics for EA in $d = 3$. Previous data obtained with GA [5] or PT [3] for a range of system sizes $N = L^d$ up to $L = 14$ exhibit a consistent asymptotic scaling with corrections $\sim 1/N^{\omega}$ and $\omega = 1 - \theta/d \approx 0.92$, as discussed in Ref. [12]. The linear fit (dashed line) with $x = 1/L^{2.76}$ has the form $\langle e_0 \rangle_{N=\infty} + Ax$ with $\langle e_0 \rangle_{N=\infty} \approx -1.701$ and $A \approx 1.641$. The corresponding data for $L = 10, 15, 20$ from Ref. [1] (red circles) diverges increasingly from the expected values for typical ground states.