## Abstract

Decades of exponential scaling in high-performance computing (HPC) efficiency are coming to an end. Transistor-based logic in complementary metal-oxide semiconductor (CMOS) technology is approaching physical limits beyond which further miniaturization will be impossible. Future HPC efficiency gains will necessarily rely on new technologies and paradigms of computing. The Ising model shows particular promise as a future framework for highly energy-efficient computation. Ising systems are able to operate at energies approaching thermodynamic limits for energy consumption of computation, and they can function as both logic and memory. Thus, they have the potential to significantly reduce energy costs inherent to CMOS computing by eliminating costly data movement. The challenge in creating Ising-based hardware is in optimizing useful circuits that produce correct results on fundamentally nondeterministic hardware. The contribution of this paper is a novel machine learning approach, a combination of deep neural networks and random forests, for efficiently solving optimization problems that minimize sources of error in the Ising model. In addition, we provide a process to express a Boltzmann probability optimization problem as a supervised machine learning problem.

## Introduction

Since the creation of the first microprocessor in 1971, the number of transistors on a chip has grown at an exponential rate that has only recently begun to abate^{1}. This trend was first observed in 1975 by engineer Gordon Moore^{2}, and has been known as Moore’s law ever since. Today there is a growing consensus that Moore’s law is coming to an end. Growth in the number of transistors is reaching physical limits, processing frequency has largely stagnated, and power consumption has effectively plateaued. The usability of systems based on CMOS devices creates additional challenges: deeply scaled technologies exhibiting unexplained aging effects, greater hardware variability, increased soft error susceptibility, and decreased noise immunity because of operating at near-threshold voltages^{3}. All told, the path forward for continued improvement in HPC performance and efficiency must consider architectures beyond von Neumann and technologies beyond CMOS, where nondeterminism is not an accident but “an essential part of the process under consideration”^{4}. Along these lines, Ising-based hardware has shown particular promise as an extremely energy-efficient approach to solving particular classes of hard problems^{5,6,7,8,9,10}.

The architecture of Ising-based hardware arises from the Ising model, a famous model from statistical mechanics. The model is a lattice structure, where each site on the (potentially infinite) lattice is referred to as a spin. Unlike the quantum world, the spins are discrete, taking only two values {−1, +1}. The spins are interdependent and the probability of the spins taking on a set of positions (known as the state) is given by the Boltzmann distribution, also known as the Gibbs distribution.

By increasing the spin space of the system such that each spin can take values in {1, …, *q*} for some integer *q* ≥ 2, we obtain the Potts model^{11}. Note that for *q* = 2, the Potts model is the Ising model. The Potts model is applied across multiple disciplines. For example, in statistical mechanics and computational biology fields, the model is used to study various macroscopic properties of the (biological or physical) system such as phase transitions, entropy, and thermodynamic properties^{12,13,14}. For more complete surveys of various applications of the Potts model, see refs. ^{15,16}.

For *q* = 2, the Potts / Ising model has been used to design non-von Neumann logical circuits, which is the application motivating this work^{17}. These circuits are inherently nondeterministic: for a given logic operation, every possible output has some probability of occurring. To design an Ising-based circuit, we define an optimization problem of the Boltzmann probability of the system. The objective is to choose system parameters such that the configurations corresponding to correct outputs have a high probability.

An obstacle for working with the Ising model is the computational complexity involved in calculating the Boltzmann distribution and its requisite partition function. This is computationally intractable for even simple problems. Consequently, various approximations to compute this distribution efficiently are of continued research interest in many disciplines, e.g., refs. ^{18,19,20}. Our approach bypasses this challenge. We craft our problem as an optimization problem that is tractable and possible to solve with standard solvers. We successfully train deep machine learning models to solve this optimization problem.

We first introduce the problem of solving for the parameters of the Ising system, known as the reverse Ising problem, and motivate the associated Boltzmann probability optimization problem. We detail the process of converting this optimization problem to one amenable to efficient data generation required for supervised machine learning, and describe our machine learning models. We then analyze the performance improvements of our models over traditional optimization methods. The contribution of this work is a novel approach to solving the otherwise intractable problem of optimizing Ising system parameters to produce desired outputs from specified inputs with high probability. The overarching process of obtaining optimal design parameters is visualized in Fig. 1. In this figure, our work replaces the slow left-hand side gradient-based optimizers with neural network models. More specifically, our work provides a method of efficiently predicting the dynamics of Ising systems, enabling the tackling of more complex problems on this exciting next-gen class of hardware.

## Results

### Preliminaries

In this section, we establish notation used throughout the paper. We use bold characters to denote vectors. An *n*-dimensional vector will be notated as **v** = (*v*_{1}, *v*_{2}, …, *v*_{n}). *v*_{i} denotes the *i*th element of vector **v**. In the case of multiple vectors, we use super-indices within parentheses. For example, **v**^{(1)} and **v**^{(2)} denote two different vectors and \({v}_{1}^{(1)}\) denotes the first element of vector **v**^{(1)}.

As a quick reference guide to the paper, Table 1 is provided. These variables will be described in detail throughout the paper.

### The reverse Ising problem

The *Ising Model* represents a system of spins. In the classical setting, these spins take on the values ±1 and can be representative of bit values 0 and 1. The *state* of the system is defined as the set of spin values.

As with magnets, neighboring spins interact with one another and have a ground state, the state to which the spins naturally settle. In the Ising model, the ground state is that with the lowest energy, which is analogous to magnets aligning to their ground state due to the interaction of their magnetic fields. Since the Ising model is a nondeterministic model derived from statistical mechanics, the probability of settling to the ground state is a function of that state’s energy. The resulting probability distribution is discussed further in the next section.

The total energy, both kinetic and potential, of an Ising model with state **s** = {*s*_{i}} ∈ {±1}^{N} is defined by the Hamiltonian of the state:

\[ H({\bf{s}}) = \sum_{i} {h}_{i}{s}_{i} + \sum_{i<j} {J}_{i,j}{s}_{i}{s}_{j}, \tag{1} \]

where *h*_{i}, \({J}_{i,j}\in {\mathbb{R}}\) are the energy fields affecting individual spins and the coefficients representing the interactions between the spins, respectively, and the second sum is over pairs of neighboring spins (every pair is counted once).
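As a concrete illustration, the energy of a state can be evaluated directly from the coefficients. The sketch below assumes the sign convention H(s) = Σ_i h_i s_i + Σ_{i<j} J_{i,j} s_i s_j (under the opposite convention the signs of h and J simply flip). The function name `hamiltonian`, the dictionary layout used for J, and the numerical values are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def hamiltonian(s, h, J):
    """Energy of state s for local fields h and pairwise couplings J.

    Assumed convention: H(s) = sum_i h[i]*s[i] + sum_{i<j} J[(i, j)]*s[i]*s[j].
    J is a dict keyed by index pairs (i, j) with i < j; absent pairs count as 0.
    """
    field_term = sum(h_i * s_i for h_i, s_i in zip(h, s))
    pair_term = sum(
        J.get((i, j), 0.0) * s[i] * s[j]
        for i, j in combinations(range(len(s)), 2)  # every pair counted once
    )
    return field_term + pair_term


# Hypothetical coefficients for a 3-spin system.
h = [0.5, -1.0, 0.25]
J = {(0, 1): 1.0, (1, 2): -0.5}
energy = hamiltonian((-1, 1, -1), h, J)
```

Because H is linear in (h, J) for a fixed state, constraints that compare the energies of two fixed states are linear in the coefficients.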

The classic Ising problem is as follows: given coefficients *h*_{i}, \({J}_{i,j}\in {\mathbb{R}}\), find state **s** ∈ {±1}^{N} where **s** has the minimum Hamiltonian value. The state with the smallest Hamiltonian value, i.e., the lowest energy, will be the resulting ground state with higher probability than any other possible state of the system. This problem has been shown to be NP-complete^{21}.
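For very small systems the classic problem can be solved by exhaustive search, which also makes the exponential cost visible, since all 2^N states are enumerated. The following is a minimal sketch, assuming the convention H(s) = Σ_i h_i s_i + Σ_{i<j} J_{i,j} s_i s_j; the helper names and coefficient values are hypothetical.

```python
from itertools import combinations, product


def hamiltonian(s, h, J):
    """Assumed convention: sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    return sum(hi * si for hi, si in zip(h, s)) + sum(
        J.get((i, j), 0.0) * s[i] * s[j]
        for i, j in combinations(range(len(s)), 2)
    )


def ground_states(h, J):
    """Return all minimum-energy states and their energy by brute force."""
    energies = {s: hamiltonian(s, h, J) for s in product((-1, 1), repeat=len(h))}
    e_min = min(energies.values())
    minimizers = [s for s, e in energies.items() if abs(e - e_min) < 1e-12]
    return minimizers, e_min


# Hypothetical coefficients: the field on spin 3 favours -1, and the
# coupling between spins 1 and 2 favours opposite values.
h = [0.0, 0.0, 1.0]
J = {(0, 1): 1.0}
states, e_min = ground_states(h, J)
```

In this toy instance the ground state is degenerate, which is one reason the reverse problem asks for strict inequalities between the desired state and its competitors.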

The reverse Ising problem puts a slight spin on the classic. Suppose we have a system with *N* spins partitioned into two sets of sizes *n* and *m*, that is, *N* = *n* + *m*. Without loss of generality, suppose the first *n* spins (*s*_{1}, …, *s*_{n}) are fixed, i.e., the values of the spins are known and programmed into the system. Suppose further that the last *m* spins are not fixed, that is, not programmed into the system, but have known preferred values (*s*_{n+1}, …, *s*_{N}). The question then is whether we can find Hamiltonian coefficients such that the desired state **s** = (*s*_{1}, …, *s*_{n}, *s*_{n+1}, …, *s*_{N}) becomes the ground state with high probability, that is, **s** has the smallest total energy of all states sharing the same fixed spins, given Hamiltonian coefficients *h*_{i}, *J*_{i,j}. More specifically, given state **s** = (*s*_{1}, …, *s*_{n}, *s*_{n+1}, …, *s*_{N}) ∈ {±1}^{N}, find coefficients *h*_{i}, \({J}_{i,j}\in {\mathbb{R}}\) such that

\[ H({s}_{1},\ldots ,{s}_{n},{s}_{n+1},\ldots ,{s}_{N}) \,<\, H({s}_{1},\ldots ,{s}_{n},{t}_{1},\ldots ,{t}_{m}) \]

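for *t*_{k} ∈ {±1}, 1 ≤ *k* ≤ *m*, where *t*_{k} ≠ *s*_{n+k} for at least one *k*. In addition, since Ising systems are nondeterministic, we maximize the probability of state **s** occurring by minimizing the energy of that state:

\[ \mathop{\min }\limits_{\psi }\,{H}_{\psi }({\bf{s}}), \]

where

\[ \psi =({h}_{1},\ldots ,{h}_{N},{J}_{1,2},\ldots ,{J}_{N-1,N}) \]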

is the vector representing the Hamiltonian coefficients of a system with *N* spins and *H*_{ψ} the Hamiltonian of a state **s** with Hamiltonian coefficients *ψ*.

### Example 1

Let **s** = (−1, 1, −1) where the third spin can vary. We then desire to find Hamiltonian coefficients that ensure the state **s** = (−1, 1, −1) has the smaller energy of the two states (−1, 1, −1) and (−1, 1, 1):

\[ {H}_{\psi }(-1,1,-1) \,<\, {H}_{\psi }(-1,1,1). \]
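To make Example 1 concrete, note that only the terms containing the third spin differ between the two states. Assuming the convention H(s) = Σ_i h_i s_i + Σ_{i<j} J_{i,j} s_i s_j, the requirement reduces to the single linear inequality h_3 − J_{1,3} + J_{2,3} > 0. The sketch below checks this numerically; the function name and the coefficient values are hypothetical.

```python
def energy_gap(h3, J13, J23, s1=-1, s2=1):
    """H(s1, s2, +1) - H(s1, s2, -1) under the assumed sign convention.

    Terms that do not involve the third spin cancel in the difference,
    leaving 2 * (h3 + J13*s1 + J23*s2).
    """
    return 2 * (h3 + J13 * s1 + J23 * s2)


# The desired state (-1, 1, -1) has the lower energy exactly when the gap
# is positive.
gap_good = energy_gap(h3=1.0, J13=0.0, J23=0.0)   # desired state favoured
gap_bad = energy_gap(h3=-1.0, J13=0.0, J23=0.0)   # undesired state favoured
```

A positive gap means the desired state (−1, 1, −1) is favoured; a negative gap means the system prefers (−1, 1, 1).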

We generalize Example 1 to the higher dimension *N* = *n* + *m*, and obtain the following optimization problem:

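Let **s** = (**u**, **v**) ∈ {±1}^{N} be the desired state, where **u** ∈ {±1}^{n} and **v** ∈ {±1}^{m}. Then, the problem of interest becomes

\[ \mathop{\min }\limits_{\psi }\,{H}_{\psi }({\bf{u}},{\bf{v}})\quad {\text{subject to}}\quad {H}_{\psi }({\bf{u}},{\bf{v}}) \,<\, {H}_{\psi }({\bf{u}},{\bf{t}})\ \ {\text{for all}}\ {\bf{t}}\in {\{\pm 1\}}^{m},\ {\bf{t}}\ne {\bf{v}}. \tag{2} \]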

However, the system of inequalities \({\{{H}_{\psi }({\bf{u}},{\bf{v}}) \,<\, {H}_{\psi }({\bf{u}},{\bf{t}})\}}_{{\bf{t}}\ne {\bf{v}}}\) is not feasible in certain scenarios. Consequently, in such cases extra degrees of freedom are introduced by adding more spins, referred to as auxiliary spins. Let *α* represent the number of auxiliary spins added to the system and let **a** ∈ {±1}^{α} represent the vector of auxiliary spins. From now on, we consider the number of spins to be partitioned as *N* = *n* + *m* + *α*. For each state of desired spins *s*_{1}, …, *s*_{n+m} we want to find the position of auxiliary spins that give us the lowest energy *H*(**s**). Hence, we minimize over the auxiliary spins in addition to the Hamiltonian coefficients, and Eq. (2) becomes

In the reverse Ising problem, the above is generalized to a set of desired states: for a set of states *S* = {**s**^{(i)}}, 1 ≤ *i* ≤ *ℓ*, find Hamiltonian coefficients *ψ* = {*h*_{1}, …, *h*_{N}, *J*_{1,2}, …, *J*_{N−1,N}} such that for each state \({{\bf{s}}}^{(i)}=({s}_{1}^{(i)},\ldots ,{s}_{N}^{(i)})\) the total energy of the system is smaller than that of all possible sets of spins with the *n* spins being the same. That is, for 1 ≤ *i* ≤ *ℓ*,

for *t*_{k} ∈ {±1} where \({t}_{k}\ne {s}_{n+k}^{(i)}\) for at least one value of *k*. We note that for each desired state **s** = (*s*_{1}, …, *s*_{n+m}), the auxiliary spins *a*_{1}, …, *a*_{α} can take on different spin values. For example, if there are 3 desired states, then there are 3 auxiliary spin vectors **a** that are completely independent of one another.
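A standard illustration of why the inequality system can be infeasible without auxiliary spins is an XOR-like truth table with *n* = 2 input spins and *m* = 1 output spin. This example is not taken from the paper, and the sketch assumes the convention H(s) = Σ_i h_i s_i + Σ_{i<j} J_{i,j} s_i s_j (the conclusion is the same for the opposite sign). Each desired state contributes one linear constraint of the form row · (h_3, J_{1,3}, J_{2,3}) > 0. If the rows sum to the zero vector, the left-hand sides sum to zero, so they cannot all be strictly positive.

```python
from itertools import product


def flip_gap_row(u, v):
    """Coefficients of (h_3, J_13, J_23) in H(u, -v) - H(u, v).

    Only terms containing the output spin change when it is flipped:
    H(u, -v) - H(u, v) = -2 * v * (h_3 + J_13*u_1 + J_23*u_2).
    """
    return (-2 * v, -2 * v * u[0], -2 * v * u[1])


# XOR in the +/-1 encoding (-1 for bit 0, +1 for bit 1): output = -u_1 * u_2.
rows = [flip_gap_row(u, -u[0] * u[1]) for u in product((-1, 1), repeat=2)]

# Feasibility would need row . x > 0 for every row simultaneously.
column_sums = tuple(sum(column) for column in zip(*rows))
```

Here `column_sums` is the zero vector, so no choice of coefficients satisfies all four strict inequalities, which motivates the extra degrees of freedom provided by auxiliary spins.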

Recall that the Ising model is a nondeterministic system where the probability of settling to a state, including that with the lowest energy, is a function of that state’s energy. As a result, rather than minimizing the energy of a set of states, we turn to the probability of the states.

### The Boltzmann probability objective function

In 1868, Ludwig Boltzmann discovered a probability distribution, the Boltzmann distribution, that models the likelihood of states occurring within a given system^{22}. The probability of the system taking on state \(\tilde{s}\) given Hamiltonian coefficients *ψ* is defined by the Boltzmann probability:

\[ {P}_{\psi }(\tilde{s})=\frac{{e}^{-{H}_{\psi }(\tilde{s})/(\beta T)}}{{\sum }_{s\in {\{\pm 1\}}^{N}}{e}^{-{H}_{\psi }(s)/(\beta T)}}, \tag{4} \]

where *β* represents the Boltzmann constant and *T* the temperature. In this paper, we consider the temperature to be fixed. More specifically, we let *β**T* = 1.
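For a handful of spins, the Boltzmann probability can be computed exactly by enumerating all 2^N states, which also shows why the partition function becomes intractable as N grows. The sketch below assumes the convention H(s) = Σ_i h_i s_i + Σ_{i<j} J_{i,j} s_i s_j and weights proportional to exp(−H/(βT)) with βT = 1; function names and coefficient values are hypothetical.

```python
import math
from itertools import combinations, product


def hamiltonian(s, h, J):
    """Assumed convention: sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    return sum(hi * si for hi, si in zip(h, s)) + sum(
        J.get((i, j), 0.0) * s[i] * s[j]
        for i, j in combinations(range(len(s)), 2)
    )


def boltzmann_distribution(h, J, beta_T=1.0):
    """Exact Boltzmann probabilities of all 2^N states (small N only)."""
    states = list(product((-1, 1), repeat=len(h)))
    weights = [math.exp(-hamiltonian(s, h, J) / beta_T) for s in states]
    partition_function = sum(weights)
    return {s: w / partition_function for s, w in zip(states, weights)}


# Hypothetical coefficients for a 3-spin system.
h = [0.5, -1.0, 0.25]
J = {(0, 1): 1.0, (1, 2): -0.5}
probs = boltzmann_distribution(h, J)
most_likely = max(probs, key=probs.get)
```

The most probable state coincides with the minimum-energy state, but every other state retains a nonzero probability, which is the nondeterminism the optimization has to control.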

Viewing the probability of a system taking on state \(\tilde{s}\) in terms of the probability of all other states, Eq. (4) is equivalent to

\[ {P}_{\psi }(\tilde{s})=1-\sum _{s\ne \tilde{s}}{P}_{\psi }(s). \]

In light of the reverse Ising problem, we desire to maximize the probability of obtaining a set of desired states, which we will notate as {**s**^{(i)} = (**u**^{(i)}, **v**^{(i)}, **a**^{(i)})}, where we note that the auxiliary spins \({{\bf{a}}}^{(i)}=({a}_{1}^{(i)},\ldots ,{a}_{\alpha }^{(i)})\) specific to each state are free to take on any value. Equivalently, we want to minimize the maximum probability of obtaining a state with an incorrect second portion **v** over all the possible first portions **u**.

More specifically, let \(S={\{{{\bf{s}}}^{(i)} = ({{\bf{u}}}^{(i)},{{\bf{v}}}^{(i)},{{\bf{a}}}^{(i)})\}}_{1\le i\le \ell }\) represent the set of states desired, where **a**^{(i)} is free. Let \({W}^{(i)}={\{({{\bf{u}}}^{(i)},{\bf{t}},{{\bf{a}}}^{(i)})\}}_{{\bf{t}}}\), where **t** ranges over all the possible vectors in {±1}^{m} such that **t** ≠ **v**^{(i)}. That is, *W*^{(i)} represents the set of states where the second portion differs from **v**^{(i)} in at least one position. Then, the probability of not being in state **s**^{(i)} = (**u**^{(i)}, **v**^{(i)}, **a**^{(i)}) but rather in a state in *W*^{(i)} is

The resulting objective function of interest, which will be used throughout the rest of the paper, is as follows:

where *ψ* is the set of Hamiltonian coefficients and **a**^{(i)} ∈ {±1}^{α} for 1 ≤ *i* ≤ *ℓ*, each *i* corresponding to the desired state **s**^{(i)}. Throughout the rest of the paper, (**a**^{(1)}, …, **a**^{(ℓ)}) is referred to as the auxiliary array.

In this work, we demonstrate the successful application of machine learning algorithms to solve our optimization problem (Eq. (5)). The result is a computationally efficient approach for evaluating Ising system design parameters. This is a dramatic advancement over the prohibitively slow state-of-the-art methods for performing this evaluation (see Table 5 for details). Our results demonstrate the successful implementation of the workflow described in the right-hand side of Fig. 1; that is, we implement a deep learning model that solves a complex optimization problem both accurately and efficiently. This result enables the study of larger and more complex Ising circuits. The sophisticated process required to generate a training dataset for this problem and the machine learning architectures used are discussed next, followed by a discussion of the performance of the ML models.

## Methods

### Size of the system and the complexity of the problem

The size and complexity of the reverse Ising problem are determined by the number of fixed, non-fixed, and auxiliary spins of the system under consideration. For *N* = *n* + *m* + *α* total spins, the total number of possible states considered when computing the Boltzmann probability is 2^{N}, and the vector *ψ* of Hamiltonian coefficients has \(\frac{1}{2}({N}^{2}-N+2)\) variables. The number of linear inequality constraints in our resulting optimization problem is 2^{n+α}(2^{m} − 1).

Finally, the number of possible combinations of auxiliary spin values scales double exponentially as \({2}^{\alpha {2}^{n}}\). This relationship is obtained because the auxiliary spins for each desired state are independent of one another and can take on any of the 2^{α} values. As a result, we obtain 2^{α} possibilities for each of the 2^{n} equations, giving us the relationship \({\left({2}^{\alpha }\right)}^{\left({2}^{n}\right)}={2}^{\alpha {2}^{n}}\). For example, a system with 6 total spins (*n* = 2, *m* = 2, and *α* = 2) will have 48 inequality constraints and 256 possible auxiliary arrays. In this scenario, one auxiliary array takes on the form (**a**^{(1)}, …, **a**^{(4)}), where each vector **a**^{(i)} can take on four different values. A problem with twice as many total spins (*n* = 4, *m* = 4, and *α* = 4) will have 3840 inequality constraints and 2^{64} ≈ 1.8 × 10^{19} possible auxiliary arrays.
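The counts quoted above follow directly from the stated formulas and can be reproduced with a few lines. The sketch below assumes one desired state per input configuration (so 2^n desired states); the function names are illustrative.

```python
from itertools import product


def problem_size(n, m, alpha):
    """Return (number of states, inequality constraints, auxiliary arrays)."""
    num_states = 2 ** (n + m + alpha)
    num_constraints = 2 ** (n + alpha) * (2 ** m - 1)
    num_aux_arrays = (2 ** alpha) ** (2 ** n)  # equals 2 ** (alpha * 2 ** n)
    return num_states, num_constraints, num_aux_arrays


def auxiliary_arrays(n, alpha):
    """Iterate over every auxiliary array: one vector in {-1, +1}^alpha
    for each of the 2^n desired states."""
    vectors = list(product((-1, 1), repeat=alpha))
    return product(vectors, repeat=2 ** n)


small = problem_size(2, 2, 2)   # the 6-spin example
large = problem_size(4, 4, 4)   # the 12-spin example
num_enumerated = sum(1 for _ in auxiliary_arrays(2, 2))
```

Exhaustive enumeration is feasible for the 6-spin case but not for the 12-spin case, which is why the larger problems have to be sampled.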

If the system has no auxiliary spins, the constraints are linear with respect to the set of variables *ψ* = (*h*_{1}, …, *h*_{N}, *J*_{1,2}, …, *J*_{N−1,N}) due to the linearity of the Hamiltonian (Eq. (1)). With the addition of auxiliary spins, the constraints become nonlinear and non-convex. Further, the objective function (Eq. (5)) is a nonlinear, non-differentiable function with an intractable number of local minima. This rules out the possibility of using standard simplex methods to find a global minimum. Because of this, faster methods for evaluating the Boltzmann probability for arbitrarily sized systems with large numbers of auxiliary spins are essential to optimization problems like the reverse Ising problem.

### Modeling the Boltzmann probability optimization problem

We continue to assume there are a total of *N* = *n* + *m* + *α* spins in the system, where *α* is the number of auxiliary spins and *n* spins are fixed, leaving *m* spins free to vary. Recall our min-max optimization problem:

where auxiliary spin values **a**^{(i)} ∈ {±1}^{α} are the input, {**s**^{(i)} = (**u**^{(i)}, **v**^{(i)}, **a**^{(i)})} represent the set of states desired and \({W}^{(i)}={\{({{\bf{u}}}^{(i)},{\bf{t}},{{\bf{a}}}^{(i)})\}}_{{\bf{t}}\ne {{\bf{v}}}^{(i)}}\) represents the set of states that are the complement of state **s**^{(i)}. For a given problem size and a desired set of states, our goal is to create a model that provides a means of calculating *ρ* both quickly and accurately, within some epsilon.

To do this, we first need to generate training data. Eq. (6) presents several challenges for generating such data. First, it is numerically unstable; second, it is not everywhere differentiable, which rules out the direct use of desirable gradient-based methods. In the next section, we describe the process by which we construct a numerically stable approximation to Eq. (6). This development is what makes it possible to create the high-quality training data required to train supervised machine learning models, and it proved essential to making progress towards our stated goal of optimizing Ising system design parameters. Following the description of this process, we describe the machine learning models that we consider in this work.

### Generating the training data

We desire to transform Eq. (6) such that we can easily and reliably compute solutions. The nonlinear optimization solver we use is based on the sequential least squares programming (SLSQP) algorithm, proposed by Dieter Kraft in 1988^{23}. The solver is available through the open-source Python library SciPy.

For the majority of choices for Hamiltonian coefficients *ψ* = (*h*_{1}, …, *h*_{N}, *J*_{1,2}, …, *J*_{N−1,N}), the maximum probability of landing in an undesired state will be extremely close to one. To assist in distinguishing the probabilities, we minimize the maximum of the \(\log\) of the probability of an undesired state:

This has the effect of amplifying small changes to the probability resulting from updates to the Hamiltonian coefficients.

An additional transformation is applied to the maximum function. For a fixed set of Hamiltonian coefficients *ψ*, the maximum function

is not guaranteed to be continuously differentiable. Because SLSQP requires the gradient to be defined at every point, we must approximate the maximum function. For a sufficiently large scaling parameter *λ*, the following approximation holds for an arbitrary function *g* evaluated at points *x*_{i}:

\[ \mathop{\max }\limits_{i}\,g({x}_{i})\approx \frac{1}{\lambda }\log \sum _{i}{e}^{\lambda g({x}_{i})}. \]
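One common smooth surrogate for the maximum, and the one assumed in this sketch, is the scaled log-sum-exp (1/λ) log Σ_i exp(λ g(x_i)). It overestimates the true maximum by at most log(k)/λ for k points, so the error shrinks as λ grows. The function name and sample values below are hypothetical.

```python
import math


def smooth_max(values, lam):
    """Scaled log-sum-exp: (1/lam) * log(sum(exp(lam * v))).

    The maximum is factored out before exponentiating so that large
    lam * v products do not overflow.
    """
    v_max = max(values)
    shifted_sum = sum(math.exp(lam * (v - v_max)) for v in values)
    return v_max + math.log(shifted_sum) / lam


# Hypothetical log-probabilities of three undesired-state sets.
values = [-3.0, -0.5, -0.7]
approximations = {lam: smooth_max(values, lam) for lam in (1.0, 10.0, 100.0)}
```

Unlike the hard maximum, this function is differentiable everywhere in the values, which is what a gradient-based solver such as SLSQP needs.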

Applying this approximation, we obtain a continuously differentiable version of Eq. (7):

This completes our transformation process. The Boltzmann probability optimization function can now be approximated in a numerically stable, continuously differentiable form as follows:

where,

To speed up the work done by the SLSQP solver and to further ensure numerical stability, we also apply standard log-sum-exp transformations and vectorize *f*(*ψ*). Details are found in the supplementary information.
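The log-sum-exp idea can be illustrated on the log of a Boltzmann probability: writing log P(s) = −H(s) − log Σ exp(−H) and factoring the largest exponent out of the sum avoids overflow even for very large coefficients. This is a generic sketch under the assumptions βT = 1 and H(s) = Σ_i h_i s_i + Σ_{i<j} J_{i,j} s_i s_j; it is not the vectorized implementation described in the supplementary information, and all names and values are hypothetical.

```python
import math
from itertools import combinations, product


def hamiltonian(s, h, J):
    """Assumed convention: sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    return sum(hi * si for hi, si in zip(h, s)) + sum(
        J.get((i, j), 0.0) * s[i] * s[j]
        for i, j in combinations(range(len(s)), 2)
    )


def log_boltzmann_probability(state, h, J):
    """log P(state) computed with the log-sum-exp trick (beta*T = 1)."""
    exponents = [-hamiltonian(s, h, J) for s in product((-1, 1), repeat=len(h))]
    largest = max(exponents)
    # log Z = largest + log(sum(exp(exponent - largest))); every term in the
    # sum is at most 1, so nothing overflows.
    log_partition = largest + math.log(sum(math.exp(e - largest) for e in exponents))
    return -hamiltonian(state, h, J) - log_partition


# Deliberately extreme fields: a direct exp(1600) would overflow a float.
h = [800.0, -800.0]
J = {}
log_p = log_boltzmann_probability((-1, 1), h, J)
```

With these fields the state (−1, 1) dominates the distribution, so its log-probability is numerically zero even though the unnormalized weight cannot be represented as a float.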

Our training data is of the form

\[ \{({\bf{a}},\rho ({\bf{a}}))\}, \]

where 0 ≤ *ρ*(**a**) ≤ 1 and **a** = (**a**^{(1)}, …, **a**^{(ℓ)}) is the auxiliary array, where **a**^{(i)} ∈ {±1}^{α} is the vector of *α* auxiliary spins.

For a given **a**, we use the SLSQP solver to minimize *f*(*ψ*). From this we calculate the associated target value using Eq. (10). We use this procedure to generate training data for the four problems listed in Table 2. We note that there are two groupings within the set of problems. The first two problems (Problems 1 and 2) are small enough that we exhaust the entire space. The latter two problems are much larger. As a result, we decided on 10,000 samples for both Problems 3 and 4.

### Random forest regression model

Using the data collected as dictated in the previous section, we train a random forest regressor, a collection of decision trees trained in parallel (independent from one another) such that, given an input, the output is the value most common (or average) to all of the decision trees^{24}. For problem size (*N*, *n*, *α*), the input to the model is an auxiliary array, a set \({\bf{a}}=({{\bf{a}}}^{(1)},\ldots ,{{\bf{a}}}^{({2}^{n})})\,{\text{such}}\,{\text{that}}\,{{\bf{a}}}^{(i)}\in {\{\pm 1\}}^{\alpha }\), and our target is *ρ*(**a**) as dictated by Eq. (10).

Due to the structure of random forests, the decisions to traverse the tree are discretized. That is, at each node of the tree, the choice of going left or right down the tree is dependent on the value of a single element of the auxiliary array **a** (see Fig. 2).

Despite this discreteness of the random forest, we obtain relatively good predictions as can be seen in Fig. 3. Table 3 exhibits the mean squared error (MSE) of the estimated optimization function *ρ*(**a**) over the test data. For each problem, one hundred trees were used as an ensemble vote with a maximum tree depth of 27.
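The regression setup can be sketched with scikit-learn's `RandomForestRegressor`, using the ensemble size and depth limit quoted above. The paper does not state which library was used, so scikit-learn is an assumption. The target below is a synthetic placeholder in [0, 1], not the paper's *ρ*(**a**), which would come from the SLSQP solver. The inputs are all 256 flattened auxiliary arrays for *n* = 2, *α* = 2.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

n, alpha = 2, 2
# Each row is one auxiliary array flattened to alpha * 2^n entries in {-1, +1}.
X = np.array(list(product((-1, 1), repeat=alpha * 2 ** n)))

# Placeholder target (an assumption for illustration only): a simple function
# of the first three entries, scaled to lie in [0, 1].
y = 0.5 + 0.25 * X[:, 0] + 0.125 * X[:, 1] + 0.125 * X[:, 2]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 100 trees with a maximum depth of 27, as described in the text.
forest = RandomForestRegressor(n_estimators=100, max_depth=27, random_state=0)
forest.fit(X_train, y_train)

predictions = forest.predict(X_test)
mse = mean_squared_error(y_test, predictions)
```

Each split in each tree tests a single entry of the flattened array against a threshold, which matches the discretized traversal described above.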

For the interested reader, other ML classifiers were studied, including support vector machines (SVM), various flavors of logistic classifiers, and stochastic gradient classifiers. The random forest classifier triumphed in terms of both accuracy and performance results.

### Deep neural network model

In ref. ^{25}, Humbird et al. showed that a random forest can be modeled by a deep neural network (DNN). They provide a framework (DJINN) to go from a random forest to a DNN, using the random forest as an initialization of the DNN parameters. As our optimization function is nonlinear and random forests are linear, we apply the DJINN framework to take our random forest regression model to a DNN, a nonlinear model. Once the DNN parameters are initialized with the random forest, the model continues to learn and improves upon the initial random forest regressor. The MSE results can be seen in Table 4 and Fig. 3.

The results are from DNNs grown from random forests with only three trees and a maximum tree depth of 10. For example, the neural network used to model the data of Problem 1 has the following layer sizes:

Despite truncating the number of trees and the tree depth of the random forests used to initialize this DNN, one can see that the resulting neural network is able to model the data with similar MSE results to the random forest regression models. Furthermore, the random forest regressor in Problem 1 has 100 trees each at a depth of sixteen, which is approximately 65,535 nodes per tree. Consequently, the aggregated number of trainable parameters of the random forest regression models far exceeds that of the DNN models without significant improvement in prediction accuracy.

## Discussion

In Table 5, we provide performance evaluations of four methods for computing Problem 4 from Table 2. The first two are implementations of the SLSQP algorithm, the latter two are our ML models.

The default behavior of SciPy’s SLSQP algorithm involves approximating the gradient numerically. This version of the solver is referred to as “SLSQP (approx grad)”. For high-dimensional problems such as the reverse Ising problem, this approximation is computationally expensive because the number of objective evaluations required scales with the number of dimensions (spins). To mitigate this, we explicitly implemented the gradient in NumPy and passed it to the solver. This version of the code is referred to as “SLSQP (explicit grad)”. See the supplementary information for details.
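The difference between the two solver variants can be sketched on a toy problem. Everything problem-specific here is an assumption for illustration and not the paper's *f*(*ψ*): a NOT gate with one input and one output spin, no auxiliary spins, the convention H(s) = h_1 s_1 + h_2 s_2 + J_{1,2} s_1 s_2, probabilities normalized over the output spin for a fixed input, a log-sum-exp soft maximum with λ = 10, and box bounds to keep the minimizer finite. The only SciPy feature being demonstrated is the `jac` argument of `scipy.optimize.minimize`.

```python
import numpy as np
from scipy.optimize import check_grad, minimize


def features(s):
    """Terms multiplying (h_1, h_2, J_12) in the assumed Hamiltonian."""
    return np.array([s[0], s[1], s[0] * s[1]], dtype=float)


desired = [(-1, 1), (1, -1)]   # (input, correct output) for a NOT gate
wrong = [(-1, -1), (1, 1)]     # same input, flipped output
# Row i gives H(wrong_i) - H(desired_i) as a linear function of psi.
C = np.array([features(w) - features(d) for w, d in zip(wrong, desired)])
LAM = 10.0                     # smoothing parameter of the soft maximum


def objective(psi):
    """Soft maximum over inputs of log P(wrong output | input)."""
    log_p_wrong = -np.logaddexp(0.0, C @ psi)   # -log(1 + exp(delta))
    return np.logaddexp.reduce(LAM * log_p_wrong) / LAM


def gradient(psi):
    """Analytic gradient of `objective` via the chain rule."""
    delta = C @ psi
    log_p_wrong = -np.logaddexp(0.0, delta)
    weights = np.exp(LAM * log_p_wrong - np.logaddexp.reduce(LAM * log_p_wrong))
    sigmoid = np.exp(delta - np.logaddexp(0.0, delta))
    return -(weights * sigmoid) @ C


x0 = np.zeros(3)
bounds = [(-2.0, 2.0)] * 3     # assumed box bounds on the coefficients

# Without `jac`, SLSQP estimates the gradient by finite differences.
res_approx = minimize(objective, x0, method="SLSQP", bounds=bounds)
# With `jac`, the analytic gradient is used instead.
res_explicit = minimize(objective, x0, method="SLSQP", jac=gradient, bounds=bounds)

grad_error = check_grad(objective, gradient, np.array([0.3, -0.4, 0.5]))
```

Both calls should reach essentially the same objective value. The practical difference is cost: finite differences need additional objective evaluations per gradient, roughly one per coefficient, while the analytic gradient needs none.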

The DJINN framework enabled us to construct DNNs that predict optimal Boltzmann probabilities for the reverse Ising problem in significantly less time than the current state of the art. With an MSE of 0.02 or less, we conclude that it should be possible to construct acceptably accurate predictors of Ising system dynamics based on deep learning models trained with the output of our Boltzmann probability solver. Those trained models can be used to analyze much larger spin configurations than is currently possible in order to design better ground-state solutions for the reverse Ising problem.

The contribution of this paper is a novel machine learning approach, a combination of deep neural networks and random forests, for efficiently solving optimization problems that minimize sources of error in the Ising model. Consequently, this will enable the construction of larger circuits using far fewer spins than are currently needed.

## Data availability

Data is available upon request.

## References

1. Theis, T. N. & Wong, H.-S. P. The end of Moore’s law: a new beginning for information technology. *Comput. Sci. Eng.* **19**, 41–50 (2017).
2. Moore, G. E. Progress in digital integrated electronics [technical literature, copyright 1975 IEEE. Reprinted, with permission. Technical digest. International Electron Devices Meeting, IEEE, 1975, pp 11–13.]. *IEEE Solid-State Circuits Soc. Newsl.* **11**, 36–37 (2006).
3. Seifert, N. et al. Soft error susceptibilities of 22 nm tri-gate devices. In *Proceedings of 2012 IEEE Nuclear and Space Radiation Effects Conference* (NSREC, 2012).
4. von Neumann, J. Probabilistic logics and the synthesis of reliable organisms from unreliable components. Lecture delivered at the California Institute of Technology (January, 1952).
5. Cai, B. et al. Unconventional computing based on magnetic tunnel junction. *Appl. Phys. A* **129**, 236 (2023).
6. Haribara, Y., Ishikawa, H., Utsunomiya, S., Aihara, K. & Yamamoto, Y. Performance evaluation of coherent Ising machines against classical neural networks. *Quantum Sci. Technol.* **2**, 044002 (2017).
7. Hong, J., Lambson, B., Dhuey, S. & Bokor, J. Experimental test of Landauer’s principle in single-bit operations on nanomagnetic memory bits. *Sci. Adv.* **2**, e1501492 (2016).
8. Huckaba, L. The Ising machine-a probabilistic processing-in-memory computer. *Wave. Natl. Security Agency’s Rev. Emerg. Technol.* **23**, 19–24 (2022).
9. Aadit, N. A. et al. Physics-inspired Ising computing with ring oscillator activated p-bits. In *2022 IEEE 22nd International Conference on Nanotechnology (NANO)* 393–396 (NANO, 2022).
10. Yamamoto, Y. et al. Coherent Ising machines-optical neural networks operating at the quantum limit. *npj Quantum Inf.* **3**, 49 (2017).
11. Wu, F.-Y. The Potts model. *Rev. Mod. Phys.* **54**, 235 (1982).
12. Baxter, R. J. *Exactly Solved Models in Statistical Mechanics* (Elsevier, 2016).
13. Broderick, T., Dudik, M., Tkacik, G., Schapire, R. E. & Bialek, W. Faster solutions of the inverse pairwise Ising problem. Preprint at https://doi.org/10.48550/arXiv.0712.2437 (2007).
14. Dubois, J.-M., Ouanounou, G. & Rouzaire-Dubois, B. The Boltzmann equation in molecular biology. *Prog. Biophys. Mol. Biol.* **99**, 87–93 (2009).
15. Nguyen, H. C., Zecchina, R. & Berg, J. Inverse statistical problems: from the inverse Ising problem to data science. *Adv. Phys.* **66**, 197–261 (2017).
16. Rozikov, U. Gibbs measures of Potts model on Cayley trees: a survey and applications. *Rev. Math. Phys.* **33**, 10 (2021).
17. Martin, I., Moore, A., Daly, J., Meyer, J. & Ranadive, T. Design of general purpose minimal-auxiliary Ising machines. In *2023 IEEE International Conference on Rebooting Computing, ICRC 2023, San Diego, CA, USA, December 5–6, 2023 (to appear)* (IEEE, 2023).
18. Anandakrishnan, R. A partition function approximation using elementary symmetric functions. *PLoS One* https://doi.org/10.1371/journal.pone.0051352 (2012).
19. Haddadan, S., Zhuang, Y., Cousins, C. & Upfal, E. Fast doubly-adaptive MCMC to estimate the Gibbs partition function with weak mixing time bounds. In *NIPS'21 Proc of the 35th International Conference on Neural Information Processing Systems*, 25760–25772 (ACM, 2021).
20. Wocjan, P., Chiang, C.-F., Nagaj, D. & Abeyesinghe, A. Quantum algorithm for approximating partition functions. *Phys. Rev. A* **80**, 022340 (2009).
21. Cipra, B. A. The Ising model is NP-complete. *SIAM News* **33**, 1–3 (2000).
22. Boltzmann, L. On the relationship between the second fundamental theorem of the mechanical theory of heat and probability calculations regarding the conditions for thermal equilibrium. *Entropy* **17**, 1971–2009 (2015).
23. Kraft, D. A software package for sequential quadratic programming. *Forschungsbericht-Deutsche Forschungs-und Versuchsanstalt fur Luft-und Raumfahrt* (1988).
24. Breiman, L. Random forests. *Mach. Learn.* **45**, 5–32 (2001).
25. Humbird, K. D., Peterson, J. L. & McClarren, R. G. Deep neural network initialization with decision trees. *IEEE Trans. Neural Netw. Learn. Syst.* **30**, 1286–1295 (2018).

## Author information

### Contributions

F.K., J.D. and J.M. all took part in writing the manuscript. In addition, each person was crucial in brainstorming ideas and finding ways to navigate the many roadblocks. F.K. prepared all the figures.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Knoll, F., Daly, J. & Meyer, J. Solving Boltzmann optimization problems with deep learning.
*npj Unconv. Comput.* **1**, 5 (2024). https://doi.org/10.1038/s44335-024-00005-1
