## Abstract

Calculating thermodynamic potentials and observables efficiently and accurately is key for the application of statistical mechanics simulations to materials science. However, naive Monte Carlo approaches, on which such calculations are often dependent, struggle to scale to complex materials in many state-of-the-art disciplines such as the design of high entropy alloys or multi-component catalysts. To address this issue, we adapt sampling tools built upon machine learning-based generative modeling to the materials space by transforming them into the semi-grand canonical ensemble. Furthermore, we show that the resulting models are transferable across wide ranges of thermodynamic conditions and can be implemented with any internal energy model *U*, allowing integration into many existing materials workflows. We demonstrate the applicability of this approach to the simulation of benchmark systems (AgPd, CuAu) that exhibit diverse thermodynamic behavior in their phase diagrams. Finally, we discuss remaining challenges in model development and promising research directions for future improvements.

## Introduction

Reliable methods for the assessment of thermodynamic stability can accelerate materials design in at least two ways, one considering only energy and the other considering free energy. Identifying low-energy structures that are stable with respect to phase decomposition is needed to ensure that computer-designed materials are synthesizable and stable in operation conditions. In addition, including the role of temperature and entropy is required to understand phase transitions and to predict phase diagrams de novo.

The difficulty in quantifying the free energy difference between phases arises because, in principle, the evaluation of potentials that govern phase stability requires a summation over all possible states of the system that satisfy the corresponding thermodynamic constraints. In practice, Monte Carlo (MC) methods approximate equilibrium properties by identifying a relatively small number of representative system configurations from which ensemble averages can be estimated, and thus implement a strategy to compute relative free energies and determine stable phases. The broad applicability of MC approaches has led to the development of numerous software packages specifically geared toward the materials domain^{1,2,3,4,5,6,7,8,9}. Generally, the most common algorithm to quantify phase stability is to: (1) consider a coarse-grained representation of a phase consisting of a supercell of fixed size and space group where states can be defined by a set of occupation variables **S** describing the atom at each site, (2) use a set of DFT (structure, energy) pairs to fit an empirical model that predicts the internal energy *U*(**S**) of a state with occupancy **S**, and (3) draw samples from the equilibrium distribution defined by *U*(**S**) using a Markov Chain. Each step of the chain, and thus the resulting representative configurations, is obtained through the stochastic proposal of a new state followed by an acceptance/rejection criterion determined by the relative probabilities of the new and previous states according to the equilibrium distribution.
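
As a minimal sketch of step (3), the proposal and acceptance loop can be written for a binary lattice as follows. The function name `metropolis_chain`, the ±1 encoding of the occupation variables, and the single-site proposal are illustrative assumptions rather than the implementation of any of the cited packages; `energy` stands for any callable returning the effective energy of a state.

```python
import numpy as np

def metropolis_chain(energy, state, beta, n_steps, rng):
    """Single-site Metropolis sampling of binary occupations encoded as +1/-1.

    `energy` is any callable mapping a state to its (effective) energy;
    `beta` is 1/(k_b T). Returns the visited states, one per step.
    """
    state = np.array(state, dtype=float)
    e_old = energy(state)
    samples = []
    for _ in range(n_steps):
        site = rng.integers(state.size)      # propose: change the species on one site
        state.flat[site] *= -1
        e_new = energy(state)
        # accept with probability min(1, exp(-beta * dE))
        if rng.random() < np.exp(min(0.0, -beta * (e_new - e_old))):
            e_old = e_new
        else:
            state.flat[site] *= -1           # reject: restore the previous state
        samples.append(state.copy())
    return np.array(samples)
```

Because each state is generated from its predecessor, the loop is inherently serial, which is the limitation discussed next.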

While this method has demonstrated widespread utility, Markov Chain Monte Carlo (MCMC)^{10} requires serial computation and can suffer from critical slowing down near phase transitions, and results from simulations run at one set of fixed constraints are not generally transferable to other conditions. These issues can be partially mitigated by the design of specialized proposal/acceptance moves^{11,12}, exchange between parallel simulations^{13}, and random walks through the density of states^{14}, but studies characterizing the mixing thermodynamics of complex, multi-component alloys often still incur significant computational cost for large system sizes^{15}.

These limitations have prompted the development of a number of MC methods specifically designed for multi-phase equilibria. Multi-cell Monte Carlo (MC^{2}) implements carefully designed proposal/acceptance steps such that atoms can be exchanged between separate supercells. The impact of phase interfaces on these finite-size simulations is significantly reduced as multiple phases can coexist across different cells^{16,17,18}. Variance-constrained semi-grand canonical simulations rely on a modified thermodynamic ensemble that can be leveraged to compute the free energies of systems within two-phase regions and improve the accuracy of recovered phase boundaries^{19}. Furthermore, Wang–Landau methods^{14} have been adapted to the materials domain and applied to characterize benchmark systems^{20}.

Alternatively, machine learning approaches can be used to produce realistic high-likelihood samples from complex distributions without explicit parametrization in so-called generative models^{21,22}. The application of generative models to scientific calculations is a promising avenue to overcome the challenges of naive MC methods. Intuitively, these models are trained to draw samples by learning the typical values of the system’s physical variables at equilibrium. A perfectly tuned model could then simulate the system by simply averaging over a batch of ML-proposed samples. Generative models such as variational autoencoders (VAEs)^{22} and generative adversarial networks (GANs)^{23,24} have become standard tools for materials design by transforming samples drawn from Gaussian noise to resemble a target distribution. In recent years, VAEs and GANs have often been outperformed by autoregressive generative models^{25,26} that, instead of using a latent space, sample the output variables sequentially in a series of steps. When the probability distribution of each step is tractable, the probability of sampling any fully generated state can be computed exactly. Critically, when restricted to this class of exact-density models, the generative framework benefits from both a loss function that relies on a variational estimate of the thermodynamic potential and from reweighting^{27} and importance sampling techniques^{28} that can correct for sample distributions that deviate slightly from those at equilibrium.

The rigorous basis of these models and the explicit connection between exact likelihood and free energy have inspired a large number of physics-based applications. For continuous systems, exact-density flow models have been applied to reducing autocorrelations in lattice field theory^{29,30,31}, sampling free energy barriers of biomolecules^{27}, and studying relaxations of Ising models^{32,33}. In discrete cases, autoregressive generative models have been used to extract thermodynamic quantities^{28,34} and determine ground states^{35,36} of spin models.

In this work, we introduce Semi-grand Ensemble Generation by Autoregressive Lattices (SEGAL), a generative approach to lattice simulations of phase stability in materials science. In particular, we demonstrate the applicability of exact-density generative models to the semi-grand canonical thermodynamic ensemble; assess model performance on well-known benchmark systems such as spin models, copper-gold and silver-palladium alloys; and extract estimates of phase stability of multi-component systems.

## Results

### Autoregressive sampling for materials simulation

We seek to build a generative model that can successfully identify the representative states of the semi-grand canonical ensemble and their dependence on thermodynamic constraints, providing an alternative to traditional MC approaches. We refer to this model as SEGAL.

SEGAL associates each microstate the system can occupy with a predicted probability *P*_{AR}. Due to the discrete structure of the coarse-grained crystal representation, we decompose the probability of a particular decoration of the crystal prototype as a product of site probabilities that can represent any possible distribution over microstates. This mathematical decomposition requires defining an ordering over sites whereby the atomic identity of a particular site is dependent on its predecessors^{28,34}. Inspired by previous generative models that change the sampled distribution with temperature^{27,37,38}, the dependencies between sites are also functions of the thermodynamic constraints, allowing the conditions to control the microstate probabilities:
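
Written with the conditional notation used in the following paragraph, and assuming *N* sites indexed in the chosen order (the symbol *N* is introduced here for illustration), the factorization takes the form:

```latex
P_{\mathrm{AR}}(\mathbf{S} \mid \Delta\mu, T)
  = \prod_{i=1}^{N} P\left(\mathbf{S}_{i} \mid \mathbf{S}_{j<i}, \Delta\mu, T\right)
```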

We parameterize these conditional probabilities using a neural network, whose general architecture is shown in Fig. 1 and whose specific details per application are given in Supplementary Notes 1–5. The parameters of the network are thus trained to capture the underlying correlations of the atomic orderings. In order to generalize easily to arbitrary numbers of components and increase the capacity of the model, we represent each site as a vector **S**_{i} with length equal to the number of components. New decorations of the lattice prototype can be drawn from the model by sequentially sampling each site *i* from the categorical distribution *P*(**S**_{i}∣**S**_{j<i}, Δ*μ*, *T*) such that, after sampling, **S**_{i} is a one-hot encoded vector corresponding to the identity of the probabilistically chosen atom. The full state describing atomic labels over all sites is simply the concatenation of the **S**_{i} vectors. Note that the first sampled site still depends on the set Δ*μ* and on *T*.
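
The sequential sampling procedure can be sketched as follows. The function `toy_conditional_logits` is a hypothetical stand-in for the neural network and is not the SEGAL architecture; it is only meant to show how one-hot site vectors are drawn one at a time while the exact log-probability of the full state is accumulated.

```python
import numpy as np

def toy_conditional_logits(prev_onehots, dmu, temperature, n_species):
    """Hypothetical stand-in for the network: logits for the next site."""
    logits = np.zeros(n_species)
    # Species 1..n-1 are biased by their chemical potential difference to species 0.
    logits[1:] += np.asarray(dmu, dtype=float) / temperature
    if prev_onehots:
        # Weak, arbitrary coupling to the previously sampled site.
        logits += 0.5 * prev_onehots[-1]
    return logits

def sample_state(n_sites, n_species, dmu, temperature, rng):
    """Draw one state site by site; return (concatenated one-hots, log P_AR)."""
    sites, log_prob = [], 0.0
    for _ in range(n_sites):
        logits = toy_conditional_logits(sites, dmu, temperature, n_species)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                       # categorical distribution for this site
        k = rng.choice(n_species, p=probs)
        onehot = np.zeros(n_species)
        onehot[k] = 1.0
        sites.append(onehot)
        log_prob += np.log(probs[k])               # exact density: sum of conditional log-probs
    return np.concatenate(sites), log_prob
```

Since every conditional is a normalized categorical distribution, the accumulated `log_prob` is the exact log-probability of the sampled state under the model, which is what later enables importance sampling.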

### Ising model in a magnetic field

To demonstrate the use of SEGAL for a binary alloy, we first studied 10 × 10 periodic Ising spins in a magnetic field *B*. Through analysis of this model system, equivalences are drawn from spin variables to atomic site labels and from the magnetic field to the chemical potential difference. In particular, the long-range ordering of spins below the critical temperature is analogous to the opening of a two-phase miscibility gap in an alloy with unfavorable mixing. The internal energy function *U*(**S**) is the well-known nearest neighbor model with *J* = −1, working in units where *k*_{b} = 1:
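
Assuming the sign convention in which *J* = −1 favors aligned spins, which is the convention consistent with the ferromagnetic states reported for this system, the energy reads:

```latex
U(\mathbf{S}) = J \sum_{(i,j) \in \mathrm{NN}} s_{i} s_{j}
```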

where NN is the set of nearest neighbor pairs and *s*_{i} is the spin at site *i*. In the presence of a field *B*, an additional magnetic potential *B*∑_{i}*s*_{i} plays the role of the chemical work Δ*μ* *N* for our model system. SEGAL is trained with *T* ∈ [1.5, 3.5] and *B* set to values [−0.4, −0.2, 0.0, 0.2, 0.4], a range over which both first-order and second-order phase transitions are known to occur. Qualitatively, samples from the trained network exhibit behavior consistent with expectations (Fig. 2a). At low temperature, ferromagnetic states are observed and demonstrate a first-order discontinuity at the critical magnetic field *B* = 0. In addition, with increasing temperature, the samples demonstrate an order-disorder transition. Below the critical temperature, some magnetization values are not sampled, which is indicative of thermodynamically unstable alloy compositions that decompose into a linear combination of two purer phases.
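
For concreteness, the energy of a periodic lattice can be evaluated as in the sketch below. The function names are illustrative, and two conventions are assumed rather than taken from the text: the pair energy is written as *J* times the sum of nearest-neighbor spin products, and the field term enters with a minus sign so that positive *B* favors up spins.

```python
import numpy as np

def ising_energy(spins, J=-1.0):
    """Nearest-neighbor energy J * sum_<ij> s_i s_j with periodic boundaries.

    Each bond is counted once by pairing every site with its right and lower
    neighbor (valid for lattices with more than two sites per side).
    """
    right = np.roll(spins, -1, axis=1)
    down = np.roll(spins, -1, axis=0)
    return J * np.sum(spins * (right + down))

def semi_grand_energy(spins, B, J=-1.0):
    """Effective energy U - B * sum_i s_i (assumed sign convention for the field)."""
    return ising_energy(spins, J) - B * np.sum(spins)
```

On a 10 × 10 lattice there are 200 bonds, so the fully aligned state has `ising_energy` equal to −200 in these units.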

To quantitatively assess the validity of the model, we compared free energies estimated using self-normalized importance sampling (SNIS) on the output of SEGAL to those obtained from a Wang–Landau method that can interpolate between different temperatures but only at a fixed magnetic field^{39,40,41}. When available, we also compared with exact results on finite-size Ising models^{42,43,44}. The SEGAL-estimated free energies are obtained using 20,000 samples at each set of constraints (9 values of *B* ∈ [0.0, 0.4] and 101 values of *T* ∈ [1.5, 3.5]). Over the analyzed conditions, the differences in the free energy per site between the two methods are *O*(10^{−4}) and comparable in magnitude with the standard deviations of 50 separate SEGAL estimates of *F*(*T*, *B*) (Fig. 2b and Supplementary Fig. 2). The total cost to train and perform one sampling iteration using this SEGAL model is 4.7 × 10^{7} energy evaluations. When comparing to the exact values at *B* = 0, the magnitude of the errors of the SEGAL estimates is similar to that of the benchmark Wang–Landau algorithm^{39,40} when run for 10^{9} evaluations and restricted to zero magnetic field strength (Supplementary Fig. 3). While this suggests that this SEGAL model is sample efficient in learning the typical ensemble configurations, we note that this reduction in energy evaluations does not translate exactly to acceleration in wall clock time, because of the overhead of the neural network operations, the ability of SEGAL to leverage batches to evaluate energies in parallel, and the Wang–Landau algorithm’s exploitation of the local structure of *U* to efficiently compute changes in energy between simulation steps. Though state-of-the-art exact-density approaches have achieved accuracies of ≈10^{−5} on 16 × 16 lattices at a single temperature^{28}, sacrificing optimal performance for generalizability over the space of constraints may have more practical utility in regimes where many sets of conditions are of interest, as is the case in predicting materials phase diagrams.
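
A minimal sketch of how exact model densities enable such estimates is given below. It assumes units with *k*_{b} = 1 and uses one common form of the estimators (the partition function as the mean importance weight, and observables as weight-normalized averages); the exact estimator used for the reported numbers may differ in detail, and the function names are illustrative.

```python
import numpy as np

def log_mean_exp(x):
    """Numerically stable log of the mean of exp(x)."""
    m = np.max(x)
    return m + np.log(np.mean(np.exp(x - m)))

def free_energy_estimate(energies, log_q, temperature):
    """F ~ -T log( mean_i exp(-E_i / T) / q(S_i) ) for samples S_i drawn from q.

    `energies` are the effective energies of the samples and `log_q` their
    exact log-probabilities under the generative model.
    """
    log_w = -np.asarray(energies) / temperature - np.asarray(log_q)
    return -temperature * log_mean_exp(log_w)

def snis_average(observable, energies, log_q, temperature):
    """Self-normalized importance sampling estimate of an ensemble average."""
    log_w = -np.asarray(energies) / temperature - np.asarray(log_q)
    w = np.exp(log_w - log_w.max())            # weights up to a common factor
    return np.sum(w * np.asarray(observable)) / np.sum(w)
```

If the model density matched the ensemble exactly, all weights would be equal and the estimate would have zero variance; deviations of the model from equilibrium show up as a spread in the weights.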

In order to provide another estimate on the quality of the SNIS, we measured the normalized effective sample size (NESS) over the conditions the model saw during training (Fig. 2c). While the NESS cannot be used to guarantee accurate model performance, it indicates where the model performs poorly. Over a wide range of conditions, SEGAL performs adequately, with a minimum NESS of 0.40. Areas with lower NESS give some intuition on the limitations of conditional generation. For instance, there are regions of lower NESS near the boundary of the training region, which is likely an artifact of the strategy used to sample different conditions during training. NESS is also lower near the first-order phase transition where the typical configurations sampled by SEGAL change rapidly. Interestingly, above the critical temperature, performance no longer degrades significantly near *B* = 0, which can be interpreted through the disappearance of the first-order phase transition.
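
The NESS can be computed directly from the importance weights. The sketch below assumes the usual definition, (∑*w*)² / (*M* ∑*w*²) for *M* samples, which may differ from the exact normalization used in Fig. 2c; the function name is illustrative.

```python
import numpy as np

def normalized_ess(log_w):
    """Normalized effective sample size from log importance weights.

    Returns 1 when all weights are equal and 1/M when a single sample
    out of M carries all of the weight.
    """
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())            # rescaling leaves the ratio unchanged
    return w.sum() ** 2 / (w.size * np.sum(w ** 2))
```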

The NESS is not a foolproof metric for performance, because a model suffering from mode collapse—that is, repeatedly producing only a very small set of unique outputs—can still have high NESS. To address this concern, we further investigated potential mode collapse of the generative model. In particular, symmetry-related microstates must have the same unnormalized probability in the semi-grand canonical ensemble and that invariance should be preserved by SEGAL:
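
In the notation used for the model probabilities, and assuming the constraints (Δ*μ*, *T*) are the same on both sides, this requirement can be written as:

```latex
P_{\mathrm{AR}}(\mathbf{S} \mid \Delta\mu, T) = P_{\mathrm{AR}}(G \times \mathbf{S} \mid \Delta\mu, T)
```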

where *U* is invariant under the operation *G*. A poorly regularized model could prefer samples with a particular translational or rotational orientation that would break the physical symmetry. In order to test our model, we generated samples over the full range of conditions and recorded their probabilities *P*_{AR}(**S**). We then applied a random symmetry operation and recorded the model probability of the symmetry-transformed sample *P*_{AR}(*G* × **S**). If the generating field was non-zero, *G* was a random *C*_{4} rotation composed with random translations in the horizontal and vertical directions. If the *B*-field was 0.0 (10% of the tests), an additional spin-flip operation was applied half the time. The values of \(\log P_{\mathrm{AR}}(\mathbf{S})\) and \(\log P_{\mathrm{AR}}(G\times \mathbf{S})\) showed close agreement (*R*^{2} > 0.999), suggesting that the model captures the underlying physical symmetries without the use of data augmentation or invariances being explicitly encoded in the network (Fig. 2d). One possible explanation for this performance is that accurately capturing the ensemble under a range of (*B*, *T*) constraints forces the neural network toward varying regions of the system's order parameters, including composition and site correlations. In this way, the training procedure may act as a natural regularizer of the generative model that incentivizes exploration and avoids mode collapse. Lastly, in Supplementary Fig. 4, we explore how automatic differentiation^{45} can be used to extract thermodynamic quantities by taking derivatives of the neural network predicted probabilities *P*_{AR}(**S**) instead of relying on fluctuations.
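
The symmetry operations used in this test can be sketched as follows. The function names are illustrative, and `nn_energy` is included only to show that the operations leave the nearest-neighbor energy unchanged; in the actual test the quantity compared before and after the operation is the model log-probability.

```python
import numpy as np

def random_symmetry(spins, rng, allow_spin_flip=False):
    """Random C4 rotation composed with random periodic translations.

    The global spin flip is a symmetry only at zero field, so it is optional.
    """
    out = np.rot90(spins, k=int(rng.integers(4)))
    shift = (int(rng.integers(spins.shape[0])), int(rng.integers(spins.shape[1])))
    out = np.roll(out, shift=shift, axis=(0, 1))
    if allow_spin_flip and rng.random() < 0.5:
        out = -out
    return out

def nn_energy(spins, J=-1.0):
    """Periodic nearest-neighbor energy, used here only to check invariance."""
    return J * np.sum(spins * (np.roll(spins, -1, axis=0) + np.roll(spins, -1, axis=1)))
```

A trained model would pass the test when its log-probability of `spins` and of `random_symmetry(spins, rng)` agree across many samples.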

### Ground states of CuAu

In order to test the ability of SEGAL to detect low internal energy phases on realistic materials, we analyzed its performance detecting the stable ordered structures in a copper-gold alloy, a widely studied system for MC algorithms and software^{46,47,48,49}. As is standard in materials science workflows, we trained a cluster expansion *U*(**S**) model to predict the energy of new decorations of fcc lattices with the aid of the CLEASE^{4} package.

Density functional theory (DFT) energies were computed for copper-gold fcc structures with varying cell sizes that were generated using CLEASE^{4}. We observed that including all the data in the training process resulted in cluster expansions with relatively low accuracy. Prediction errors could be reduced by building models using a set of 41 training examples with at most 18 atoms and with formation energies below 0.02 eV/atom. This set included the pure phases as well as the CuAu, Cu_{3}Au, and Au_{3}Cu ground states. As predictions of thermodynamic stability rely predominantly on the properties of low-energy structures, it is reasonable to improve model accuracy for the most relevant system configurations by filtering out high-energy structures from the training data. Previous work also found that depending on the application context, cluster expansion performance can be sensitive to the choice of training data^{50}. A final cluster expansion was trained using L2 regularization and obtained a leave-one-out cross-validation score of 8.15 meV/atom (Supplementary Fig. 5). The effective cluster interactions parameters and convex hull for a 16-site supercell are shown in Fig. 3a, b, predicting Cu_{3}Au, CuAu, and Au_{3}Cu as stable intermetallics. We note that Au_{3}Cu was not stable according to our DFT calculations, which is consistent with the results of the Materials Project^{51}. However, the presence of a phase that is stable over only a narrow range of chemical potentials increases the challenge of the sampling problem and can more clearly illustrate SEGAL’s performance.

To sample ground states of varying composition, SEGAL was trained on a 16-site fcc lattice prototype over a range of chemical potential differences bounded by values where the pure phases are stable, Δ*μ* ∈ [−0.24 eV, 0.24 eV]. The temperature was steadily decreased over each epoch in a simulated annealing-based approach to increase the likelihood of converging to the correct structures. A similar method was employed by Wu et al. to minimize the energy of spin systems^{34}. Note that in contrast to SEGAL, the minimization of energy alone would only result in the detection of the structure with minimum formation energy (CuAu). The total number of energy evaluations required to train SEGAL on the CuAu system was 1,000,000, which exceeds the number of possible states on the 16-site lattice (65,536), but the resulting model can still be used to examine SEGAL’s behavior in the context of real materials systems.

Once trained, modifying the chemical potential difference allows SEGAL to sample stable alloy structures of varying composition, successfully identifying the pure phases as well as the Cu_{3}Au, CuAu, and Au_{3}Cu intermetallics. Furthermore, when stability is determined by the minimum value of the grand potential at 0 K over a batch of 1000 samples, the critical chemical potentials between stable structures closely match those predicted by the convex hull of the cluster expansion, suggesting that SEGAL has learned to approximate the location of phase transitions (Fig. 3c).
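
The relation between the convex hull and the critical chemical potentials can be illustrated with a short worked example. At 0 K the stable structure minimizes the grand potential per site, *e* − Δ*μ* *x*, where *x* is the Au fraction and Δ*μ* is taken as *μ*_{Au} − *μ*_{Cu} (an assumed convention). The energies below are made-up placeholder values chosen only to form a convex hull; they are not the cluster expansion results of this work.

```python
import numpy as np

# Placeholder hull: Au fraction and formation energy (eV/atom). NOT data from the paper.
x_hull = np.array([0.00, 0.25, 0.50, 0.75, 1.00])
e_hull = np.array([0.000, -0.040, -0.060, -0.035, 0.000])

def stable_phase(x, e, dmu):
    """Index of the structure minimizing the 0 K grand potential e - dmu * x."""
    return int(np.argmin(e - dmu * x))

def transition_dmus(x, e):
    """Critical dmu between adjacent hull structures: the slope of each hull segment.

    Valid only when every listed structure lies on the convex hull.
    """
    return np.diff(e) / np.diff(x)

critical = transition_dmus(x_hull, e_hull)   # slopes increase with x for a convex hull
```

Scanning Δ*μ* through each critical value switches the minimizer from one hull structure to the next, which is the behavior a sampler has to reproduce.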

We observe a greater degree of mode collapse than in the Ising model case, as the model finds the Cu_{3}Au, CuAu, and Au_{3}Cu ground states with degeneracy 1, where the exact values determined through a brute force enumeration are 4, 6, and 4, respectively. The increased difficulty of this task could be due to the more complicated symmetry relationships between ground states or the convergence of the training temperature to 0 K, which reduces the regularization effects of temperature variability, from which the Ising SEGAL model may have benefited more significantly.

We compared the effectiveness of the trained SEGAL model to a benchmark random algorithm that samples all configurations with equal frequency by recording the percent of samples that correctly identify the grand potential minima at 0 K (Fig. 3d). During the test, 1000 samples were drawn from each method at 36 separate values of Δ*μ*. In total, only 1 of the 36,000 random samples identified the correct structure, whereas 73.5% of the SEGAL samples correspond to the grand potential minima. Therefore, we conclude that SEGAL is capable of extracting stability-relevant thermodynamic information from a model of a real material’s internal energy after being trained. Similar to the observation of the Ising model’s NESS, the lowest probability of sampling the correct structure occurs as Δ*μ* approaches phase transitions, where two competing ground states have very similar grand potentials and SEGAL-generated structures must rapidly switch between phases. This effect is largest in the case of Au_{3}Cu, which is only stable for a narrow range of chemical potentials (Supplementary Fig. 6).

### AgPd alloy

We further explored the ability of SEGAL to capture the physics of a real metal alloy at finite temperature. As an example, we considered a 27-site fcc prototype (3 × 3 × 3 supercell) of silver and palladium, whose phase diagram features a miscibility gap extending to temperatures of up to 600 K. Below the top of the miscibility gap, unfavorable mixing interactions cause ranges of alloy compositions to be thermodynamically unstable. The gap exhibits a characteristic asymmetry, as palladium is highly soluble in silver, but silver has virtually no solubility in palladium at low temperatures^{52,53}.

A cluster expansion approximation of the formation energy *U*_{CE} was built using a dataset of 625 AgPd structures from the ICET^{3} tutorial database and obtained a ten-fold cross-validation error of 2.3 meV/atom (Supplementary Fig. 7a). SEGAL was trained using *U*_{CE} over a temperature range of [200 K, 900 K] extending within and above the expected miscibility gap. Benchmark semi-grand canonical MCMC simulations using the same cluster expansion were run using CLEASE. In order to show the flexibility of SEGAL with regard to the energy model *U*, we also trained a crystal graph convolutional model for the formation energy *U*_{CGC} over the same dataset, which achieved a test error of 1.49 meV/atom (Supplementary Fig. 7b)^{54}. For the crystal graph convolutional model, we wrote our own CGC MCMC implementation to obtain reference values. While SEGAL is compatible with any parametrization of *U*, and benchmarks of graph-based neural networks^{55} seem to demonstrate similar performance across architectures when trained on large datasets, we found that model selection can be an important factor. In particular, a SchNet^{56} model trained on the AgPd dataset obtained a test error of 8.1 meV/atom, much higher than *U*_{CE} or *U*_{CGC}. Because our dataset is small, the initial atom featurizations developed for CGCNN models may lead to improved accuracy when compared with the simpler atomic-number encoding in SchNet. Furthermore, SchNet is typically used as an interatomic potential and is highly sensitive to the parameterization of interatomic distances, which are irrelevant in our case, since the input representation to the *U* models is an idealized lattice with an arbitrary fixed lattice constant at all compositions. As a result, though SEGAL can be trained with any *U*, the ability of later sampling tasks to predict physical properties will depend on the choice of internal energy model.

Results from SNIS and the Markov Chain estimates show strong numerical agreement across multiple temperatures for both *U*_{CE} and *U*_{CGC} energy models, with most deviations in composition being on the order of 10^{−3} (Supplementary Fig. 8). These errors are sufficiently small to recover the physical properties and phase stability of the alloy over the training region. At 250 K, the discontinuity in compositions indicates thermodynamically unstable compositions and confirms the presence of the two-phase region, separating a nearly pure Pd phase and a 60/40 mixture of Pd and Ag (Fig. 4a, b, d). At 750 K, both methods show continuous variation in composition with chemical potential, suggesting that the top of the miscibility gap has been exceeded (Fig. 4a, b). Importantly, SEGAL is applicable as a sampling method for both *U*_{CGC} and *U*_{CE} potentials, and can be readily generalized to any developed models for alloy energy that achieve sufficient accuracy.

The NESS of SEGAL (Fig. 4c) is reasonable over a large range of conditions, but indicates lower performance near the critical values of Δ*μ*, at which the discontinuity in composition is observed and the typical lattice configurations at equilibrium change rapidly. These uncertainties near phase transitions can introduce deviations in the bounds of the two-phase region such as those observed at 250 K. We further note that above the miscibility gap (≈600 K), stable compositions change more continuously, and the subsequent decrease in the NESS metric is significantly less pronounced. By identifying regions of constraint space where typical states of the system change rapidly, NESS calculations of SEGAL models show some promise for the automatic detection of phase transitions.

### Predicting phase stability

Finally, we give examples of how the SEGAL model can be used to extract information on phase stability. To reduce the artificial effects of a finite simulation cell, we trained SEGAL on larger cells for the AgPd (125-site) and CuAu (128-site) systems. After drawing 5000 samples from the AgPd model at each set of constraints for 15 temperatures between 200 and 900 K and 81 values of Δ*μ* between −0.4 eV and +0.4 eV, a region of thermodynamically unstable compositions was visible and attributed to the miscibility gap. The top of the gap was approximated by using the Felzenszwalb–Huttenlocher image segmentation algorithm^{57,58} on \(\log_{10}[\mathrm{NESS}(\Delta\mu, T)]\) to estimate the values of Δ*μ* and *T* where the phase transition occurs (Supplementary Fig. 9a). The boundary of the gap was computed using a polynomial fit to selected points (%Ag, *T*) among those obtained in the sampling procedure above. Below the critical temperature, the points exhibiting the greatest change in composition when Δ*μ* changed by 0.02 eV were selected. At the critical temperature, the point with the largest composition change between Δ*μ* − 0.01 eV and Δ*μ* + 0.01 eV was selected. For the CuAu model, 5000 samples were drawn at each set of constraints for 21 temperatures from 200 to 1200 K and 36 values of Δ*μ* from −0.2 eV to +0.15 eV. Observed discontinuities in stable compositions suggested the presence of a Cu_{3}Au − CuAu two-phase region for temperatures below 700 K. Estimated bounds were determined from the maximum difference in composition between Δ*μ* values separated by 0.02 eV, restricted to the composition range 0.2 < %Au < 0.6. Following previous work by Takeuchi et al.^{20}, bounds for order-disorder two-phase regions were estimated by locating the temperature with maximal heat capacity *T*_{C} for each constant value of Δ*μ* and approximating the bounds of the two-phase regions as the compositions at (Δ*μ* − *δ*, *T*_{C}) and (Δ*μ* + *δ*, *T*_{C}) with *δ* = 0.01 eV (Supplementary Fig. 9b). Results for both systems agree favorably with reference metadynamics simulations (Fig. 5a, b).
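
The selection of two-phase bounds from composition discontinuities can be sketched as below. The function name and the assumption of a uniformly spaced Δ*μ* grid are illustrative; the composition restriction and the polynomial fit described above are not included.

```python
import numpy as np

def largest_composition_jump(dmu, composition, window=0.02):
    """Locate the largest composition change between dmu values `window` apart.

    Assumes `dmu` is a uniformly spaced, increasing grid. Returns the midpoint
    dmu of the jump and the compositions on either side, which serve as
    estimates of the bounds of the two-phase region at this temperature.
    """
    dmu = np.asarray(dmu, dtype=float)
    x = np.asarray(composition, dtype=float)
    stride = max(1, int(round(window / (dmu[1] - dmu[0]))))
    jumps = np.abs(x[stride:] - x[:-stride])
    i = int(np.argmax(jumps))
    return 0.5 * (dmu[i] + dmu[i + stride]), x[i], x[i + stride]
```

Applied at each temperature below the critical point, the two returned compositions trace out the two sides of the gap.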

The total numbers of cluster expansion energy evaluations required to train and sample the AgPd and CuAu models with SEGAL were 1.3 × 10^{7} and 3.0 × 10^{7}, respectively. The baseline metadynamics simulations required 4.1 × 10^{7} (AgPd) and 3.6 × 10^{8} (CuAu) energy evaluations. However, we note that due to the increased accuracy of the metadynamics simulations, highlighted by the detection of the Au_{3}Cu phase, these values are not directly comparable. The NESS values of these larger models (Supplementary Fig. 10) exhibit many of the same trends as in the previous experiments, such as low values in the vicinity of phase transitions. In contrast, NESS values in the disordered phases can be less than 10^{−2}, significantly lower than those observed for the smaller AgPd alloy and the Ising system. As a result, the efficient scaling of SEGAL models to large cell sizes of complex alloys is an outstanding challenge but holds promise for the simulation of multi-component systems.

## Discussion

We have shown that general-purpose generative models for statistical physics can be readily modified for applications computing thermodynamic quantities in materials science. In particular, transforming to the semi-grand canonical ensemble avoids interfaces between competing phases and allows for greater control over the exploration of experimental order parameters such as composition and atomic ordering. Furthermore, a single model with no training examples from previous simulations can generalize across a wide range of constraints and accurately determine thermodynamic potentials, observables, and stable phases. SEGAL does not restrict the form of the potential *U*(**S**) in any way and can be trained with crystal graph convolution networks^{54} or other approaches capable of modeling complex multi-component systems^{59}. As a result, generative models have the potential to become a useful tool alongside standard lattice simulation techniques.

While the approach is promising, a number of algorithmic changes are needed to improve its scalability and performance. The current architecture can be more sample efficient than baseline methods but does not scale to cell sizes comparable to those of typical simulations, which introduces finite-size effects and limits the precision of the final estimates. Though the problem of designing exact-density generative models capable of performing state-of-the-art calculations has not been completely solved^{34,60}, research directions for further improvement have been proposed, including implementing the autoregressive network using graph convolutional layers to utilize the symmetry of the crystal system^{34} or exploiting the local structure of the energy model to improve the scalability of the generation process^{61,62}. Another crucial step is to refine SEGAL’s sampling performance near phase transitions. SEGAL’s ability to identify these regions through a change in typical states and the associated decrease in NESS values could allow for modified training strategies. In particular, training batches can be focused more frequently on regions with low NESS so that additional examples can help the model improve in cases where the learning task is difficult. Alternatively, SEGAL could be supplemented with standard MCMC simulations run with constraints close to the critical values of temperature and chemical potential or with strategies to account for exponentially suppressed configurations that increase the variance of importance sampling estimates^{63}.

## Methods

### Thermodynamic ensembles

To draw samples from a particular equilibrium ensemble, lattice Monte Carlo simulations must be run under a chosen set of thermodynamic constraints. In the canonical ensemble, temperature and composition are fixed and system configurations are sampled according to their relative Boltzmann weight \(\propto \mathrm{e}^{-U/(k_{\mathrm{b}}T)}\). Free energies obtained through this approach can characterize a wide range of phenomena in statistical physics. However, when investigating multi-component materials thermodynamics, the free energy minimum can be achieved by any linear combination of phases that satisfies the composition constraints. Therefore, at equilibrium, multiple phases can coexist in a manner that cannot be represented with a single fixed lattice prototype without introducing phase boundaries. The presence of these multi-phase regions must then be inferred from non-convex regions of the free energy as a function of composition observed in the simulation. In order to alleviate this challenge, materials scientists often work in the grand canonical ensemble with fixed chemical potentials and temperature. In this ensemble, for each set of constraints only a single phase will be present at equilibrium, except at the critical values where phase transitions occur. As a result, simulations avoid multi-phase equilibria and are better suited to a single lattice cell. While GANs have been applied to the grand canonical ensemble in the context of scalar field theory^{64}, most previous exact-density approaches^{27,28,34} have modeled the canonical ensemble.

The grand potential and resulting microstate probabilities can be derived for a system of *i* species through a Legendre transform of the canonical ensemble. With a fixed total number of sites ∑_{i}*N*_{i} = *N*_{tot}, the system is in the semi-grand canonical ensemble and is determined by a set of *i* − 1 chemical potential differences, Δ*μ*_{i} = *μ*_{i} − *μ*_{0}, and the temperature:
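For reference, one standard way to write the semi-grand canonical quantities described here is sketched below. It assumes the sign convention implied by the expression 〈*U* − *S**T* − ∑_{i≠0}Δ*μ*_{i}*N*_{i}〉 used in the Training section; writing \(N_i(\mathbf{S})\) for the number of sites occupied by species *i* in configuration **S** is a notational assumption, and the expressions may differ in form from the original equations.

\[
Z_{\rm SG} = \sum_{\mathbf{S}} \exp\left(-\frac{U(\mathbf{S}) - \sum_{i\neq 0}\Delta\mu_i N_i(\mathbf{S})}{k_{\rm b}T}\right),
\qquad
P_{\rm SG}(\mathbf{S}) = \frac{1}{Z_{\rm SG}}\exp\left(-\frac{U(\mathbf{S}) - \sum_{i\neq 0}\Delta\mu_i N_i(\mathbf{S})}{k_{\rm b}T}\right),
\qquad
\Phi_{\rm SG} = -k_{\rm b}T\,\ln Z_{\rm SG}
\]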

The relative probabilities, and thus the representative configurations that the system occupies at equilibrium, change in response to the above constraints. In particular, varying the chemical potential differences creates driving forces for changes in composition, and increasing the temperature leads to a greater contribution to the grand potential from configurational entropy and greater system disorder. We demonstrate the dependence of composition on chemical potential for a toy system in Supplementary Fig. 1.
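The dependence of composition on chemical potential can be illustrated with a small system that is solved by exhaustive enumeration. The sketch below is not the toy system of Supplementary Fig. 1: the eight-site binary ring, the nearest-neighbor coupling of 0.02 eV, and all function names are assumptions chosen only to make the example self-contained.

```python
import itertools
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K


def internal_energy(config, coupling=0.02):
    """Hypothetical U(S): nearest-neighbor pair energy on a periodic ring (eV)."""
    spins = [2 * s - 1 for s in config]  # map occupations {0, 1} to {-1, +1}
    n = len(spins)
    return coupling * sum(spins[i] * spins[(i + 1) % n] for i in range(n))


def mean_composition(delta_mu, temperature, n_sites=8):
    """Exact semi-grand canonical average fraction of species 1."""
    beta = 1.0 / (K_B * temperature)
    partition = 0.0
    weighted_fraction = 0.0
    for config in itertools.product((0, 1), repeat=n_sites):
        n_one = sum(config)
        weight = math.exp(-beta * (internal_energy(config) - delta_mu * n_one))
        partition += weight
        weighted_fraction += weight * n_one / n_sites
    return weighted_fraction / partition


# Sweep the chemical potential difference (eV) at a fixed temperature of 300 K.
sweep = [mean_composition(dmu, 300.0) for dmu in (-0.2, -0.1, 0.0, 0.1, 0.2)]
```

Because this toy energy is symmetric under exchanging the two species, the composition is 0.5 at Δ*μ* = 0 and increases monotonically with Δ*μ*, since the derivative of the mean occupation with respect to Δ*μ* is proportional to its variance.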

### Training

If the sampler were perfect, all microstate configurations would appear with the same relative probabilities as they do in the studied thermodynamic ensemble. One approach to encourage the model probability distribution to converge on the correct values is to minimize the KL divergence, a measure of the difference between two probability distributions, between the model and the ensemble, KL(*P*_{AR}∣*P*_{SG}). It can be shown (Supplementary Methods) that the resulting minimization objective can be expressed as:

The true grand potential is the minimum of 〈*U* − *S**T* − ∑_{i≠0}Δ*μ*_{i}*N*_{i}〉 over all possible probability distributions over microstates and therefore provides a lower bound on the training loss function such that Φ_{AR} ≥ Φ_{SG}. While Eq. (6) is not differentiable due to the discrete, stochastic sampling step, gradients can be estimated through^{34,65}:

where \({\hat{{{\Phi }}}}_{{{{\rm{AR}}}}}\) is an estimate of Eq. (6) over the whole batch of samples. Intuitively, the model will seek to lower the likelihood of configurations for which \({P}_{{{{\rm{AR}}}}} \,>\, {\hat{P}}_{{{{\rm{SG}}}}}\) and increase the likelihood of configurations for which \({P}_{{{{\rm{AR}}}}} \,<\, {\hat{P}}_{{{{\rm{SG}}}}}\). Because *U*(**S**) is not required to be differentiable, a wide range of standard energy models can be easily incorporated into this approach.
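For orientation, a variational objective and score-function gradient estimator of the kind described in this section can be written in the following standard form. This is a sketch in the spirit of refs. 34 and 65, not a transcription of the original equations: the batch size *n*, the model parameters *θ*, the per-sample quantity \(f(\mathbf{S})\), and the notation \(N_i(\mathbf{S})\) for species counts are assumptions introduced here.

\[
\Phi_{\rm AR} = \mathbb{E}_{\mathbf{S}\sim P_{\rm AR}}\left[f(\mathbf{S})\right],
\qquad
f(\mathbf{S}) = U(\mathbf{S}) - \sum_{i\neq 0}\Delta\mu_i N_i(\mathbf{S}) + k_{\rm b}T\,\ln P_{\rm AR}(\mathbf{S}),
\qquad
{\rm KL}(P_{\rm AR}\mid P_{\rm SG}) = \frac{\Phi_{\rm AR}-\Phi_{\rm SG}}{k_{\rm b}T} \geq 0
\]

\[
\nabla_{\theta}\Phi_{\rm AR} \approx \frac{1}{n}\sum_{j=1}^{n}\left[f(\mathbf{S}_j) - \hat{\Phi}_{\rm AR}\right]\nabla_{\theta}\ln P_{\rm AR}(\mathbf{S}_j)
\]

In this form, the bracketed term equals \(k_{\rm b}T\,[\ln P_{\rm AR}(\mathbf{S}_j) - \ln \hat{P}_{\rm SG}(\mathbf{S}_j)]\), which is positive exactly when the model over-represents a configuration, so a descent step lowers its likelihood.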

Training SEGAL does not require any example configurations, only an energy function *U*(**S**) to model. Batches of samples are iteratively drawn and used to estimate the loss function and update model parameters. As training continues, the estimated grand potential \({\hat{{{\Phi }}}}_{{{{\rm{AR}}}}}\) decreases toward the true minimum Φ_{SG}, and the relative probabilities of the samples approach their equilibrium values. We found that several procedures allow the model to capture the condition-dependent equilibrium distribution. The chemical potential differences Δ*μ*_{batch} and temperature *T*_{batch} of each batch can be drawn from a uniform distribution within the bounds being investigated, \({T}_{{{{\rm{batch}}}}}\in [{T}_{\min },{T}_{\max }]\), \({{\Delta }}{\mu }_{{{{\rm{batch}}}}}\in [{\mu }_{\min },{\mu }_{\max }]\), or set to specific values chosen as hyperparameters. Training can be stabilized by computing the loss over several sets of conditions [*T*_{batch}, Δ*μ*_{batch}] simultaneously before updating parameters. In this case, estimates of \({\hat{{{\Phi }}}}_{{{{\rm{AR}}}}}\) are computed separately over constant conditions. In addition, because the magnitude of thermodynamic potentials can differ significantly depending on the constraints, when combining samples generated under different conditions the gradients were further normalized by the absolute value of \(\frac{{\hat{{{\Phi }}}}_{{{{\rm{AR}}}}}(T,\{{{\Delta }}{\mu }_{i}\})}{{k}_{{{{\rm{b}}}}}T}\). Following the learning procedure, the model can draw samples over the entire range of conditions it was exposed to during training.
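The training loop described above can be sketched on a deliberately simple problem. The example below is not the SEGAL architecture: it replaces the autoregressive network with a two-parameter model that assigns the same Bernoulli probability to every site, uses a hypothetical non-interacting energy with on-site energy `EPS`, works in reduced units with *k*_b = 1, and expresses the loss in units of *k*_b*T* instead of applying the gradient normalization used in this work. For this toy problem the exact solution is `theta = (1, -EPS)`, which makes the result easy to check.

```python
import math
import random

K_B = 1.0      # reduced units (assumption)
EPS = 0.3      # hypothetical on-site energy of species 1
N_SITES = 8


def energy(config):
    """Toy U(S): non-interacting sites; need not be differentiable."""
    return EPS * sum(config)


def site_probability(theta, temp, dmu):
    """Condition-dependent occupation probability shared by all sites."""
    beta = 1.0 / (K_B * temp)
    logit = theta[0] * beta * dmu + theta[1] * beta
    return 1.0 / (1.0 + math.exp(-logit))


def batch_gradient(theta, temp, dmu, n_samples, rng):
    """Score-function gradient of Phi_AR / (k_b T) at fixed (T, dmu)."""
    beta = 1.0 / (K_B * temp)
    p = site_probability(theta, temp, dmu)
    features = (beta * dmu, beta)           # d(logit) / d(theta)
    losses, scores = [], []
    for _ in range(n_samples):
        config = [1 if rng.random() < p else 0 for _ in range(N_SITES)]
        n_one = sum(config)
        log_p = n_one * math.log(p) + (N_SITES - n_one) * math.log(1.0 - p)
        losses.append(beta * (energy(config) - dmu * n_one) + log_p)
        scores.append(n_one - N_SITES * p)  # d(log P) / d(logit)
    baseline = sum(losses) / n_samples       # batch estimate of Phi_AR / (k_b T)
    grad = [0.0, 0.0]
    for loss, score in zip(losses, scores):
        for k in range(2):
            grad[k] += (loss - baseline) * score * features[k] / n_samples
    return grad


def train(n_steps=600, conditions_per_step=4, n_samples=64, lr=0.05, seed=0):
    """Average gradients over several random conditions before each update."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(n_steps):
        total = [0.0, 0.0]
        for _ in range(conditions_per_step):
            temp = rng.uniform(0.5, 2.0)    # T_batch in [T_min, T_max]
            dmu = rng.uniform(-1.0, 1.0)    # dmu_batch in [mu_min, mu_max]
            grad = batch_gradient(theta, temp, dmu, n_samples, rng)
            total = [t + g / conditions_per_step for t, g in zip(total, grad)]
        theta = [t - lr * g for t, g in zip(theta, total)]
    return theta
```

Note that the baseline is computed separately for each set of conditions, mirroring the statement that estimates of the grand potential are computed over constant conditions, and that only samples drawn from the model and evaluations of `energy` are needed.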

### Self-normalized importance sampling

Despite the physics-informed training procedure, generative models will not achieve perfect performance for any ensemble, and estimates of thermodynamic observables can be significantly biased^{28,34}. However, if the probability of the proposed samples *P*_{AR} is known exactly, the statistical power of numerical estimates can be improved by weighting samples using the relation:

where, for example, samples that appear more frequently in the generated distribution than in the target distribution are given less weight to compensate for their increased rate of appearance. While the normalizing constant of *P*_{SG} is unknown in many practical problems, samples can still be treated as a well-designed proposal distribution for a Markov Chain^{29} or used as a biasing distribution for histogram reweighting^{27}. Nicoli et al.^{28} introduced the use of generative models with self-normalized importance sampling (SNIS), which offers the added benefit of providing estimates of both normalizing constants and observables. Defining *w*(**S**) as the unnormalized ensemble probability divided by the generative model probability *P*_{AR}:

Because an estimate of *Z*_{SG} must be used in Eq. (10), SNIS is still biased in practice, but the biases can be substantially smaller than those obtained by simply averaging over samples of the generative model. One metric to evaluate this approach is the effective sample size (ESS), which provides an estimate of the number of samples from the true target distribution required to match the performance of SNIS. The ESS can be normalized (NESS) to evaluate the typical quality of generated samples when compared with the target distribution:

Note that if the generated distribution closely resembles the target distribution and all *w*_{i} are close to *Z*_{SG}, the NESS will approach 1. As the generated distribution deviates from the target and the variation in *w*_{i} increases, the NESS will approach \(\frac{1}{n}\).
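The quantities in this section can be computed from per-sample log-weights, as in the minimal sketch below. It assumes the common definitions \(\hat{Z}_{\rm SG} = \frac{1}{n}\sum_i w_i\), \(\langle O\rangle \approx \sum_i w_i O_i / \sum_i w_i\), and \({\rm NESS} = (\sum_i w_i)^2 / (n\sum_i w_i^2)\), which may differ in form from the original equations; all function names are hypothetical, and the log-domain shift is an implementation choice to avoid overflow.

```python
import math


def log_weight(energy, dmu_times_n, log_p_model, k_b_t):
    """log w(S) = -(U(S) - sum_i dmu_i N_i(S)) / (k_b T) - log P_AR(S)."""
    return -(energy - dmu_times_n) / k_b_t - log_p_model


def _normalized_weights(log_w):
    """Weights rescaled to sum to one, shifted by max(log_w) for stability."""
    shift = max(log_w)
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def snis_estimate(log_w, observable):
    """Self-normalized importance sampling estimate of an observable."""
    weights = _normalized_weights(log_w)
    return sum(w * o for w, o in zip(weights, observable))


def log_partition_estimate(log_w):
    """log of the estimated Z_SG, i.e. log of the mean weight."""
    shift = max(log_w)
    mean_weight = sum(math.exp(lw - shift) for lw in log_w) / len(log_w)
    return shift + math.log(mean_weight)


def ness(log_w):
    """Normalized effective sample size, between 1/n and 1."""
    weights = _normalized_weights(log_w)
    return 1.0 / (len(weights) * sum(w * w for w in weights))
```

With equal weights `ness` returns 1, and when a single weight dominates it returns 1/*n*, matching the limits noted above. An estimate of the grand potential follows from the estimated partition function as −*k*_b*T* times `log_partition_estimate(log_w)`.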

### Density functional theory calculations

DFT calculations were carried out using the Vienna Ab initio Simulation Package^{66,67} v. 5.4.4, within the projector-augmented wave method^{68,69}. The Perdew–Burke–Ernzerhof functional within the generalized gradient approximation^{70} was employed as the exchange-correlation functional, including dispersion corrections through Grimme’s D3 method^{71,72}. The kinetic energy cutoff for plane waves was set to 520 eV. Integrations over the Brillouin zone were performed using Monkhorst-Pack *k*-point meshes^{73} with a uniform density of 64 *k*-points/Å^{−3}. A stopping criterion of 10^{−6} eV was adopted for the electronic convergence within the self-consistent field cycle. Optimization of unit cell parameters and atomic positions was performed until the Hellmann–Feynman forces on atoms were smaller than 10 meV/Å.

## Data availability

The CuAu DFT training data used to fit the *U*(**S**) models as well as all trained SEGAL models needed to reproduce this work are available at https://github.com/learningmatter-mit/Segal. The AgPd dataset developed for ICET^{3} can be found in their public repository https://gitlab.com/materials-modeling/icet.

## Code availability

The algorithms reported in this work for training and analyzing SEGAL models are available at https://github.com/learningmatter-mit/Segal.

## References

1. Thomas, J. C. et al. CASM, v0.2.1. https://github.com/prisms-center/CASMcode/tree/v0.2.1 (2021).
2. Van der Ven, A., Thomas, J., Puchala, B. & Natarajan, A. First-principles statistical mechanics of multicomponent crystals. *Annu. Rev. Mater. Res.* **48**, 27–55 (2018).
3. Ångqvist, M. et al. ICET—a Python library for constructing and sampling alloy cluster expansions. *Adv. Theory Simul.* **2**, 1900015 (2019).
4. Chang, J. H. et al. CLEASE: a versatile and user-friendly implementation of cluster expansion method. *J. Phys. Condens. Matter* **31**, 325901 (2019).
5. Lerch, D., Wieckhorst, O., Hart, G. L., Forcade, R. W. & Müller, S. UNCLE: a code for constructing cluster expansions for arbitrary lattices with minimal user-input. *Model. Simul. Mater. Sci. Eng.* **17**, 55003 (2009).
6. van de Walle, A. & Ceder, G. Automating first-principles phase diagram calculations. *J. Phase Equilib.* **23**, 348–359 (2002).
7. van de Walle, A., Asta, M. & Ceder, G. The alloy theoretic automated toolkit: a user guide. *Calphad* **26**, 539–553 (2002).
8. Bäker, M. Calculating phase diagrams with ATAT. Preprint at https://arxiv.org/abs/1907.10151 (2019).
9. Troppenz, M., Rigamonti, S. & Draxl, C. Predicting ground-state configurations and electronic properties of the thermoelectric clathrates Ba_{8}Al_{x}Si_{46−x} and Sr_{8}Al_{x}Si_{46−x}. *Chem. Mater.* **29**, 2414–2424 (2017).
10. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. *J. Chem. Phys.* **21**, 1087 (1953).
11. Swendsen, R. H. & Wang, J. S. Nonuniversal critical dynamics in Monte Carlo simulations. *Phys. Rev. Lett.* **58**, 86–88 (1987).
12. Wolff, U. Collective Monte Carlo updating for spin systems. *Phys. Rev. Lett.* **62**, 361–364 (1989).
13. Swendsen, R. H. & Wang, J. S. Replica Monte Carlo simulation of spin-glasses. *Phys. Rev. Lett.* **57**, 2607–2609 (1986).
14. Wang, F. & Landau, D. P. Efficient, multiple-range random walk algorithm to calculate the density of states. *Phys. Rev. Lett.* **86**, 2050–2053 (2001).
15. Widom, M. Modeling the structure and thermodynamics of high-entropy alloys. *J. Mater. Res.* **33**, 2881–2898 (2018).
16. Antillon, E. & Ghazisaeidi, M. Efficient determination of solid-state phase equilibrium with the multicell Monte Carlo method. *Phys. Rev. E* **101**, 063306 (2020).
17. Niu, C., Windl, W. & Ghazisaeidi, M. Multi-cell Monte Carlo relaxation method for predicting phase stability of alloys. *Scr. Mater.* **132**, 9–12 (2017).
18. Niu, C., Rao, Y., Windl, W. & Ghazisaeidi, M. Multi-cell Monte Carlo method for phase prediction. *Npj Comput. Mater.* **5**, 1–5 (2019).
19. Sadigh, B. & Erhart, P. Calculation of excess free energies of precipitates via direct thermodynamic integration across phase boundaries. *Phys. Rev. B* **86**, 134204 (2012).
20. Takeuchi, K., Tanaka, R. & Yuge, K. New Wang-Landau approach to obtain phase diagrams for multicomponent alloys. *Phys. Rev. B* **96**, 144202 (2017).
21. Schwalbe-Koda, D. & Gómez-Bombarelli, R. *Generative Models for Automatic Chemical Design* 445–467 (Lecture Notes in Physics, Vol. 968, Springer, 2020).
22. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. *ACS Cent. Sci.* **4**, 268–276 (2018).
23. Dan, Y. et al. Generative adversarial networks (GAN) based efficient sampling of chemical composition space for inverse design of inorganic materials. *Npj Comput. Mater.* **6**, 84 (2020).
24. Kim, B., Lee, S. & Kim, J. Inverse design of porous materials using artificial neural networks. *Sci. Adv.* **6**, eaax9324 (2020).
25. Roy, A., Saffar, M., Vaswani, A. & Grangier, D. Efficient content-based sparse attention with routing transformers. *Trans. Assoc. Comput.* **9**, 53–68 (2021).
26. Salimans, T., Karpathy, A., Chen, X. & Kingma, D. P. PixelCNN++: improving the PixelCNN with discretized logistic mixture likelihood and other modifications. In *Proc. of the 5th International Conference on Learning Representations* (2017).
27. Noé, F., Olsson, S., Köhler, J. & Wu, H. Boltzmann generators: sampling equilibrium states of many-body systems with deep learning. *Science* **365**, eaaw1147 (2019).
28. Nicoli, K. A. et al. Asymptotically unbiased estimation of physical observables with neural samplers. *Phys. Rev. E* **101**, 23304 (2020).
29. Albergo, M. S., Kanwar, G. & Shanahan, P. E. Flow-based generative models for markov chain monte carlo in lattice field theory. *Phys. Rev. D.* **100**, 034515 (2019).
30. Kanwar, G. et al. Equivariant flow-based sampling for lattice gauge theory. *Phys. Rev. Lett.* **125**, 121601 (2020).
31. Pawlowski, J. M. & Urban, J. M. Reducing autocorrelation times in lattice simulations with generative adversarial networks. *Mach. Learn.: Sci. Technol.* **1**, 045011 (2020).
32. Li, S. H. & Wang, L. Neural Network Renormalization Group. *Phys. Rev. Lett.* **121**, 260601 (2018).
33. Zhang, L., E, W. & Wang, L. Monge-Ampère flow for generative modeling. Preprint at https://arxiv.org/abs/1809.10188 (2018).
34. Wu, D., Wang, L. & Zhang, P. Solving statistical mechanics using variational autoregressive networks. *Phys. Rev. Lett.* **122**, 080602 (2019).
35. Mcnaughton, B., Milošević, M. V., Perali, A. & Pilati, S. Boosting Monte Carlo simulations of spin glasses using autoregressive neural networks. *Phys. Rev. E* **101**, 53312 (2020).
36. Hibat-Allah, M., Inack, E. M., Wiersema, R., Melko, R. G. & Carrasquilla, J. Variational neural annealing. *Nat. Mach. Intell.* **3**, 952–961 (2021).
37. Singh, J., Scheurer, M. S. & Arora, V. Conditional generative models for sampling and phase transition indication in spin systems. *SciPost Phys.* **11**, 43 (2021).
38. Dibak, M., Klein, L. & Noé, F. Temperature-steerable flows. In *Proc. of the 34th Conference on Neural Information Processing Systems—ML4PS Workshop* (2020).
39. Belardinelli, R. E. & Pereyra, V. D. Wang-Landau algorithm: a theoretical analysis of the saturation of the error. *J. Chem. Phys.* **127**, 184105 (2007).
40. Belardinelli, R. E. & Pereyra, V. D. Fast algorithm to calculate density of states. *Phys. Rev. E* **75**, 046701 (2007).
41. Haule, K. Wang-Landau algorithm for 2D Ising model. http://www.physics.rutgers.edu/~haule/681/src_MC/python_codes/wangLand.py (2010).
42. Kaufman, B. Crystal statistics. II. Partition function evaluated by spinor analysis. *Phys. Rev.* **76**, 1232–1243 (1949).
43. Beale, P. D. Exact distribution of energies in the two-dimensional ising model. *Phys. Rev. Lett.* **76**, 78–81 (1996).
44. Pathria, R. K. & Beale, P. D. *Statistical Mechanics* 3rd edn (Elsevier Ltd, 2011).
45. Wang, W., Axelrod, S. & Gómez-Bombarelli, R. Differentiable molecular simulations for control and learning. Preprint at https://arxiv.org/abs/2003.00868 (2020).
46. Fontaine, D. D. *Cluster Approach to Order-Disorder Transformations in Alloys* 33–176 (Solid State Physics, Vol. 47, Academic Press, 1994).
47. Lu, Z. W., Wei, S. H., Zunger, A., Frota-Pessoa, S. & Ferreira, L. G. First-principles statistical mechanics of structural stability of intermetallic compounds. *Phys. Rev. B* **44**, 512–544 (1991).
48. Ozoliņš, V., Wolverton, C. & Zunger, A. Cu-Au, Ag-Au, Cu-Ag, and Ni-Au intermetallics: first-principles study of temperature-composition phase diagrams and structures. *Phys. Rev. B* **57**, 6427–6443 (1998).
49. Zhang, Y., Kresse, G. & Wolverton, C. Nonlocal first-principles calculations in Cu-Au and other intermetallic alloys. *Phys. Rev. Lett.* **112**, 075502 (2014).
50. Kleivan, D., Akola, J., Peterson, A. A., Vegge, T. & Chang, J. H. Training sets based on uncertainty estimates in the cluster-expansion method. *J. Phys. Energy* **3**, 034012 (2021).
51. Jain, A. et al. The Materials Project: a materials genome approach to accelerating materials innovation. *APL Mater.* **1**, 011002 (2013).
52. Ghosh, G., Kanter, C. & Olson, G. Thermodynamic modeling of the Pd-X (X=Ag, Co, Fe, Ni) systems. *J. Phase Equilib.* **20**, 295–308 (1999).
53. Dinsdale, A. et al. *Atlas of Phase Diagrams for Lead-Free Soldering* (Cost Action 531, European Cooperation in Science and Technology, 2008).
54. Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. *Phys. Rev. Lett.* **120**, 145301 (2018).
55. Fung, V., Zhang, J., Juarez, E. & Sumpter, B. G. Benchmarking graph neural networks for materials chemistry. *Npj Comput. Mater.* **7**, 84 (2021).
56. Schütt, K. T. et al. SchNetPack: a deep learning toolbox for atomistic systems. *J. Chem. Theory Comput.* **15**, 448–455 (2019).
57. Felzenszwalb, P. F. & Huttenlocher, D. P. Efficient graph-based image segmentation. *Int. J. Comput. Vis.* **59**, 167–181 (2004).
58. van der Walt, S. et al. scikit-image: image processing in Python. *PeerJ* **2**, e453 (2014).
59. Liu, X. et al. Monte Carlo simulation of order-disorder transition in refractory high entropy alloys: A data-driven approach. *Comput. Mater. Sci.* **187**, 110135 (2021).
60. Boyda, D. et al. Sampling using SU(*n*) gauge equivariant flows. *Phys. Rev. D.* **103**, 074504 (2021).
61. Pan, F., Zhou, P., Zhou, H. J. & Zhang, P. Solving statistical mechanics on sparse graphs with feedback-set variational autoregressive networks. *Phys. Rev. E* **103**, 012103 (2021).
62. Dai, H., Nazi, A., Li, Y., Dai, B. & Schuurmans, D. Scalable deep generative modeling for sparse graphs. In *Proc. of the 37th International Conference on Machine Learning, PMLR 119*, 2302–2312 (2020).
63. Wu, D., Rossi, R. & Carleo, G. Unbiased monte carlo cluster updates with autoregressive neural networks. *Phys. Rev. Res.* **3**, L042024 (2021).
64. Zhou, K., Endrödi, G., Pang, L.-G. & Stöcker, H. Regressive and generative neural networks for scalar field theory. *Phys. Rev. D.* **100**, 011501 (2019).
65. Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. *Mach. Learn.* **8**, 229–256 (1992).
66. Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. *Comput. Mater. Sci.* **6**, 15–50 (1996).
67. Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. *Phys. Rev. B* **54**, 11169–11186 (1996).
68. Blöchl, P. E. Projector augmented-wave method. *Phys. Rev. B* **50**, 17953–17979 (1994).
69. Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. *Phys. Rev. B* **59**, 1758–1775 (1999).
70. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. *Phys. Rev. Lett.* **77**, 3865–3868 (1996).
71. Grimme, S., Antony, J., Ehrlich, S. & Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. *J. Chem. Phys.* **132**, 154104 (2010).
72. Grimme, S., Ehrlich, S. & Goerigk, L. Effect of the damping function in dispersion corrected density functional theory. *J. Comput. Chem.* **32**, 1456–1465 (2011).
73. Monkhorst, H. J. & Pack, J. D. Special points for Brillouin-zone integrations. *Phys. Rev. B* **13**, 5188–5192 (1976).
74. Towns, J. et al. XSEDE: accelerating scientific discovery. *Comput. Sci. Eng.* **16**, 62–74 (2014).

## Acknowledgements

This work was supported by ARPAe DIFFERENTIATE (Award No DE-AR0001220) and by Zapata Computing Inc. J.D. acknowledges support from the National Defense Science and Engineering Graduate Fellowship. D.S.-K. was additionally supported by the MIT Energy Fellowship. The DFT calculations from this paper were executed at the Massachusetts Green High-Performance Computing Center with support from MIT Research Computing, and at the Extreme Science and Engineering Discovery Environment (XSEDE)^{74} Expanse through allocation TG-DMR200068.

## Author information

### Contributions

J.D. implemented the SEGAL algorithms and ran the Monte Carlo simulations. D.S.-K. prepared and ran the DFT calculations. R.G.-B. conceived the project and supervised the research. All authors contributed to the data analysis and manuscript writing.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Damewood, J., Schwalbe-Koda, D. & Gómez-Bombarelli, R. Sampling lattices in semi-grand canonical ensemble with autoregressive machine learning.
*npj Comput Mater* **8**, 61 (2022). https://doi.org/10.1038/s41524-022-00736-4

DOI: https://doi.org/10.1038/s41524-022-00736-4