Introduction

Recent years have seen a dramatic increase in the use of machine learning tools in materials science.1 They have been combined with large databases and high-throughput computations2,3,4,5,6,7 in the search for novel materials chemistries and to learn global trends.8,9,10,11,12,13 In parallel, machine learning tools are increasingly used to construct interatomic force-fields that can represent diverse local environments.14,15,16,17,18 A major challenge in the application of machine learning in materials science is the identification of suitable structural and chemical descriptors that are invariant to underlying symmetries of the problem. Many such descriptors have already been formulated, including local descriptors that use distances and angles between atoms or expand local environments in terms of spherical harmonics.14,15,16,17,18,19,20

Alloys, where properties are sensitive to the degree of order or disorder of different chemical species on a parent crystal structure, have received less attention as a machine learning problem. Here, we address the alloy problem from a machine learning perspective and show that suitable and robust descriptors of the degree of configurational order can be formulated using mathematical tools that have been developed in the context of lattice model Hamiltonians.

Lattice model Hamiltonians play a central role in first-principles statistical mechanics schemes to predict thermodynamic potentials and diffusion coefficients of alloys and off-stoichiometric compounds.21 They were put on a firm theoretical footing by Sanchez et al.22 with the rigorous derivation of the cluster expansion, an effective Hamiltonian expressed in terms of orthonormal basis functions of configurational occupation variables. The cluster expansion formalism sets up a natural mathematical framework with which to represent the properties of a crystal as a function of site degrees of freedom.22,23 Since it is expressed in terms of a complete and orthonormal basis, it enables a systematic tuning of truncation errors when parameterizing expansion coefficients to first-principles training data. In this way, complex energy landscapes as a function of configurational degrees of freedom can be reproduced by fitting to a relatively small number of first-principles electronic structure calculations. The approach has enabled accurate first-principles predictions of temperature-composition phase diagrams,23,24,25,26,27,28,29,30,31,32,33,34 order–disorder phenomena,35,36,37,38,39,40,41,42,43,44,45 and composition-dependent diffusion coefficients in alloys and complex inorganic compounds.46,47,48,49,50,51,52

A cluster expansion is formulated as a linear series of cluster basis functions multiplied by constant expansion coefficients that are determined by the underlying chemistry and crystal structure of the multicomponent solid. Cluster expansions, while formally exact, must be truncated in practice. Many advanced methods have been developed to aid in the accurate and efficient parameterization of a truncated cluster expansion. These include genetic algorithms to select a cluster basis set,53 schemes to reduce over-fitting using cross-validation26 and regularizers,54 and the use of Bayesian priors to incorporate physical intuition during model development.55 Recently, methods have also been developed to determine the ground states of a cluster expansion56 and to impose constraints as part of the regression step to ensure that the cluster expansion predicts ground states correctly.57

Here, we build on the cluster expansion approach, but relax the constraint of linearity and leverage advanced machine learning tools such as neural networks and Gaussian process regressions to represent crystal properties that depend on alloy configuration in terms of symmetry invariant descriptors of order. As descriptors, we use site-centric correlation functions, which are related to the correlation functions introduced by Sanchez and De Fontaine58,59 and are at the core of the cluster expansion approach.22 We illustrate the method by modeling the formation energies of a synthetic multi-body binary Hamiltonian on the FCC crystal and of Li-vacancy disorder in spinel LiTiS2, a compound that is crystallographically more complex than most, having two symmetrically non-equivalent sites. We find that accurate Hamiltonians can be built with a relatively small number of ab-initio calculations and only a few correlation functions as descriptors.

Fig. 1

Prototype square lattice with two distinct pair clusters highlighted. The pair marked in red corresponds to the nearest neighbor pair cluster, while the pair marked in green is the next nearest neighbor pair. Equivalent clusters are marked in the same color. The orbit of a particular cluster centered around site i consists of all the equivalent clusters

Results

The cluster expansion formalism re-visited

We start by reviewing essential ingredients to the cluster expansion approach as applied to a simple binary alloy. A particular ordering of the components of a binary crystal of N sites can be represented as an unrolled vector of occupation variables, \(\vec \sigma = \{ \sigma _1,...,\sigma _i,...,\sigma _N\}\), where σi is +1 or −1 depending on the occupant of site i. Sanchez et al.22 showed that any scalar property of a binary crystal that depends on \(\vec \sigma\), such as its fully relaxed formation energy, can be expressed as an expansion in terms of polynomials of occupation variables according to

$$E(\vec \sigma ) = NV_0 + \mathop {\sum}\limits_\alpha V_\alpha {\mathrm{\Phi }}_\alpha (\vec \sigma )$$
(1)

where the sum extends over all clusters of sites α within the crystal (e.g., point clusters, pair clusters, triplet clusters, etc.) and where

$${\mathrm{\Phi }}_\alpha (\vec \sigma ) = \mathop {\prod}\limits_{i\, \in \,\alpha } \sigma _i$$
(2)

are cluster functions, defined as the product of occupation variables belonging to the cluster α. Sanchez et al.22 showed that the cluster functions Φα form a complete and orthonormal basis with respect to a particular scalar product defined on the space of configurations \(\vec \sigma\). The expansion coefficients Vα in Eq. (1) are constant and are determined by the chemistry and crystal structure of the alloy.

The symmetry of the undecorated parent crystal structure imposes constraints on the expansion coefficients Vα in Eq. (1). Any two cluster functions \(\Phi _\alpha (\vec \sigma )\) and \(\Phi _\delta (\vec \sigma )\) that can be mapped onto each other by a space group operation of the crystal must have the same expansion coefficients (i.e., Vα = Vδ). All cluster functions \(\Phi _\delta (\vec \sigma )\) that are related by a symmetry operation of the crystal to a prototype cluster function \(\Phi _\alpha (\vec \sigma )\) can be grouped together into an orbit of cluster functions \(\Omega _\alpha = \{ \Phi _\alpha (\vec \sigma ),...,\Phi _\delta (\vec \sigma ),...\}\). For example, all cluster functions associated with nearest neighbor pair clusters that are related by a symmetry operation to a prototype nearest neighbor pair cluster belong to the same orbit. For a binary alloy, there exists an orbit of cluster functions for each symmetrically distinct cluster type. The set of all cluster functions can be divided among different orbits Λ = {Ωα, Ωβ,…} where α, β, etc. correspond to symmetrically distinct cluster prototypes.

Since the expansion coefficients belonging to symmetrically equivalent clusters are all equal to each other, there is only one expansion coefficient Vα for each orbit Ωα. This makes it possible to rewrite Eq. (1) as a sum first over orbits followed by a sum over cluster functions within each orbit according to

$$E(\vec \sigma ) = NV_0 + \mathop {\sum}\limits_{{\mathrm{\Omega }}_\alpha \, \in \,{\mathrm{\Lambda }}} V_\alpha \mathop {\sum}\limits_{\delta \, \in \,{\mathrm{\Omega }}_\alpha } {\mathrm{\Phi }}_\delta (\vec \sigma )$$
(3)

Eq. (3) can be normalized by the number of atoms within the crystal and recast as

$$\frac{{E(\vec \sigma )}}{N} = V_0 + \mathop {\sum}\limits_{{\mathrm{\Omega }}_\alpha } V_\alpha m_\alpha \left\langle {{\mathrm{\Phi }}_\alpha (\vec \sigma )} \right\rangle$$
(4)

upon introducing correlation functions defined as22,23

$$\left\langle {\Phi _\alpha (\vec \sigma )} \right\rangle = \frac{{\left( {\mathop {\sum}\limits_{\delta \, \in \,{\mathrm{\Omega }}_\alpha } {\mathrm{\Phi }}_\delta (\vec \sigma )} \right)}}{{Nm_\alpha }}$$
(5)

where mα is the multiplicity of the cluster per site. A correlation function \(\langle {\mathrm{\Phi }}_\alpha (\vec \sigma )\rangle\) is the average value of the cluster function \({\mathrm{\Phi }}_\alpha (\vec \sigma )\) over the orbit Ωα for the ordering \(\vec \sigma\).

For a binary alloy, each symmetrically distinct cluster type (e.g., nearest neighbor pair cluster, second nearest neighbor pair cluster, nearest neighbor triplet cluster, etc.) has a correlation function \(\langle {\mathrm{\Phi }}_\alpha (\vec \sigma )\rangle\) associated with it. The values of all correlation functions, \(\{ \langle {\mathrm{\Phi }}_\alpha (\vec \sigma )\rangle ,\langle {\mathrm{\Phi }}_\beta (\vec \sigma )\rangle ,...\}\) for a particular ordering \(\vec \sigma\) can serve as a fingerprint of that ordering. Since the correlation functions are averages over all symmetrically equivalent cluster functions, they are invariant to an application of any space group operation of the parent crystal applied to the ordering \(\vec \sigma\). Hence the correlation functions will have the same values for all orderings \(\vec \sigma \prime\) that are related by symmetry to \(\vec \sigma\). They are a measure of a particular state of configurational order on a crystal that is invariant to a space group operation of the underlying crystal.
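The fingerprint idea can be illustrated with a minimal sketch on the prototype square lattice of Fig. 1. The lattice size, the helper names (`correlation`, `fingerprint`, `translate`, `rotate90`), and the two example orderings are assumptions made for illustration only; they are not part of the formalism above. The sketch evaluates the point, nearest neighbor pair, and next nearest neighbor pair correlation functions of Eq. (5) under periodic boundary conditions and checks that they are unchanged by a translation or a 90° rotation of the ordering.

```python
# Sketch: correlation functions (Eq. 5) on a periodic L x L square lattice.
# Lattice size, helper names and example orderings are illustrative assumptions.

NN = [(1, 0), (0, 1)]     # nearest neighbor pairs, multiplicity m = 2 per site
NNN = [(1, 1), (1, -1)]   # next nearest neighbor pairs, multiplicity m = 2 per site


def correlation(sigma, displacements):
    """Average of sigma_i * sigma_j over one orbit of pair clusters.

    `displacements` holds one vector per symmetrically equivalent pair
    direction, so every pair of the orbit is visited exactly once.
    """
    size = len(sigma)
    total = 0
    for x in range(size):
        for y in range(size):
            for dx, dy in displacements:
                total += sigma[x][y] * sigma[(x + dx) % size][(y + dy) % size]
    return total / (size * size * len(displacements))


def fingerprint(sigma):
    """Point, nearest neighbor and next nearest neighbor correlations."""
    size = len(sigma)
    point = sum(map(sum, sigma)) / (size * size)
    return (point, correlation(sigma, NN), correlation(sigma, NNN))


def translate(sigma, tx, ty):
    """Apply a lattice translation with periodic boundary conditions."""
    size = len(sigma)
    return [[sigma[(x + tx) % size][(y + ty) % size] for y in range(size)]
            for x in range(size)]


def rotate90(sigma):
    """Apply a four-fold rotation of the square lattice."""
    size = len(sigma)
    return [[sigma[y][size - 1 - x] for y in range(size)] for x in range(size)]


L = 4
checkerboard = [[1 if (x + y) % 2 == 0 else -1 for y in range(L)] for x in range(L)]
stripes = [[1 if x % 2 == 0 else -1 for y in range(L)] for x in range(L)]

print(fingerprint(checkerboard))                      # (0.0, -1.0, 1.0)
print(fingerprint(stripes))                           # (0.0, 0.0, -1.0)
print(fingerprint(rotate90(stripes)) == fingerprint(stripes))
```

The two orderings share the same composition (point correlation of zero) but are distinguished by their pair correlations, while symmetrically equivalent variants of each ordering map onto the same fingerprint.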

It is instructive to recast the cluster expanded energy as a sum of site energies. To this end, we define \(\Omega _\alpha ^i\) as the set of all cluster functions, Φδ, related by symmetry to Φα in which one of the sites of the cluster δ is site i. \(\Omega _\alpha ^i\) is a subset of Ωα and consists of cluster functions associated with clusters radiating out of site i (Fig. 1). The set of all cluster orbits that radiate from site i will be denoted \(\Lambda ^i = \{ \Omega _\alpha ^i,\Omega _\beta ^i, \cdots \}\), where as previously, the clusters α, β, etc. refer to symmetrically distinct cluster prototypes such as the nearest neighbor pair, the second nearest neighbor pair, etc. In terms of the site orbits, we can rewrite the total energy, Eq. (3), as

$$E(\vec \sigma ) = NV_0 + \mathop {\sum}\limits_i E_i(\vec \sigma )$$
(6)

where the site energies are defined as

$$E_i(\vec \sigma ) = \mathop {\sum}\limits_{{\mathrm{\Omega }}_\alpha ^i\, \in \,{\mathrm{\Lambda }}^i} \frac{{V_\alpha }}{{\left| \alpha \right|}}\mathop {\sum}\limits_{\delta \, \in \,{\mathrm{\Omega }}_\alpha ^i} {\mathrm{\Phi }}_\delta (\vec \sigma )$$
(7)

The factor |α|, which denotes the number of sites in cluster α, appears in Eq. (7) to avoid over-counting each cluster function \(\Phi _\delta (\vec \sigma )\) when the site energies are summed over every site i of the crystal in Eq. (6).

Just as the form of the energy expression in Eq. (3) makes clear that the correlation functions defined by Eq. (5) are a measure of the global degree of ordering within the crystal, Eq. (7) for the site energies suggests the importance of local site-centric correlation functions defined as

$$G_\alpha ^i = \mathop {\sum}\limits_{\delta \, \in \,{\mathrm{\Omega }}_\alpha ^i} {\mathrm{\Phi }}_\delta (\vec \sigma )$$
(8)

in measuring a local degree of ordering relative to site i. Since the sum in Eq. (8) is over all symmetrically equivalent clusters having site i in common, it is invariant to any change in orientation of the local ordering around site i that is permitted by the space group of the parent crystal.
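As a consistency check on Eqs. (3) and (6)–(8), the following sketch evaluates the site-centric correlation functions on a periodic square lattice and confirms that summing the site energies reproduces the orbit-based cluster expansion energy. The expansion coefficients in `ECI`, the lattice size, and all function names are hypothetical values chosen for illustration.

```python
import random

# Sketch: site-centric correlations (Eq. 8) and site energies (Eq. 7) on a
# periodic square lattice. ECI values and lattice size are made-up assumptions.

STAR_NN = [(1, 0), (-1, 0), (0, 1), (0, -1)]      # all NN pairs containing site i
STAR_NNN = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # all NNN pairs containing site i

ECI = {"empty": -0.10, "point": 0.05, "nn": 0.03, "nnn": -0.01}
CLUSTER_SIZES = (1, 2, 2)                          # |alpha| for point, NN, NNN


def site_correlations(sigma, x, y):
    """Return (G_point, G_nn, G_nnn) for the site at (x, y)."""
    size = len(sigma)
    s = sigma[x][y]
    g_nn = sum(s * sigma[(x + dx) % size][(y + dy) % size] for dx, dy in STAR_NN)
    g_nnn = sum(s * sigma[(x + dx) % size][(y + dy) % size] for dx, dy in STAR_NNN)
    return (s, g_nn, g_nnn)


def site_energy(sigma, x, y):
    """Eq. (7): each orbit contributes V_alpha / |alpha| times G_alpha^i."""
    v = (ECI["point"], ECI["nn"], ECI["nnn"])
    g = site_correlations(sigma, x, y)
    return sum(vk / nk * gk for vk, nk, gk in zip(v, CLUSTER_SIZES, g))


def total_energy_from_sites(sigma):
    """Eq. (6): constant term plus the sum of site energies."""
    size = len(sigma)
    sites = sum(site_energy(sigma, x, y) for x in range(size) for y in range(size))
    return size * size * ECI["empty"] + sites


def total_energy_from_clusters(sigma):
    """Eq. (3): every cluster function is counted exactly once."""
    size = len(sigma)
    energy = size * size * ECI["empty"]
    for x in range(size):
        for y in range(size):
            s = sigma[x][y]
            energy += ECI["point"] * s
            for dx, dy in [(1, 0), (0, 1)]:
                energy += ECI["nn"] * s * sigma[(x + dx) % size][(y + dy) % size]
            for dx, dy in [(1, 1), (1, -1)]:
                energy += ECI["nnn"] * s * sigma[(x + dx) % size][(y + dy) % size]
    return energy


random.seed(0)
L = 6
sigma = [[random.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
difference = total_energy_from_sites(sigma) - total_energy_from_clusters(sigma)
print(abs(difference) < 1e-9)
```

Each pair cluster is visited from both of its sites when the site energies are accumulated, which is why the division by |α| = 2 is needed for the two totals to agree.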

Developing features for neural network alloy Hamiltonians

The correlation functions defined by Eqs. (5) and (8) form a set of descriptors of the degree of order over the sites of a binary crystal that are invariant to the translational and orientational symmetries of the underlying parent crystal structure. As first shown by Sanchez et al.22 the configurational energy of the crystal can be expressed as a linear expansion of the correlation functions as in Eq. (4), which can trivially be recast into the forms of Eqs. (6) and (7). However, a linear expansion is only guaranteed to be an exact description of the configurational energy if a correlation function is included for every symmetrically distinct cluster type in the crystal. In practice, cluster expansions must be truncated beyond some maximal sized cluster, leading to truncation errors.

Here, we relax the restriction of a linear expansion in terms of correlation functions, and instead allow for a non-linear dependence of the energy on the correlation functions. Similar to Eq. (6), we express the energy of the crystal as a sum of site energies, but the site energies are now allowed to be a non-linear function of the local correlation functions defined by Eq. (8) according to

$$E(\vec \sigma ) = \mathop {\sum}\limits_i E_i\left( {G_\alpha ^i,G_\beta ^i,...} \right)$$
(9)

To be tractable, the site energies will only depend on a finite set of local correlation functions corresponding to short-range and compact clusters. The fact that symmetrically equivalent configurations \(\vec \sigma\) have the same correlations ensures that Eq. (9) is also invariant to the underlying symmetries of the undecorated parent crystal structure and will evaluate to the same energy for all orderings that are equivalent by a space group operation of the crystal.

While the optimal functional dependence of the site energies Ei on a finite set of local correlation function descriptors is not a priori clear, it can be learned with a neural network. Neural networks (NN) are powerful machine learning tools that can replicate complex functions of multiple input variables, also called features. Figure 2 schematically shows a neural net that can describe the site energies Ei relying on inputs corresponding to the different local correlation functions \(\{ G_\alpha ^i,G_\beta ^i,...\}\). Choices of activation function at each node include rectified linear units (ReLU), sigmoids, and hyperbolic tangents.60,61

Fig. 2

Schematic of neural network architecture and features \((\{ G_\alpha ^i,G_\beta ^i, \cdots \} )\) that are fed into the model for the site energy \((E_i(\vec \sigma ))\)

The neural nets can be trained using first-principles energies, \(E(\vec \sigma )\), calculated for a large number of configurations \(\vec \sigma\) within periodic supercells (Fig. 3). Training neural networks to reproduce the local energy can be accomplished by using conventional backpropagation techniques62 with the following loss function:

$$\begin{array}{*{20}{l}} {\mathrm{\Gamma }} \hfill & = \hfill & {\frac{1}{M}\mathop {\sum}\limits_{\vec \sigma } \left( {E(\vec \sigma ) - E_{{\mathrm{DFT}}}(\vec \sigma )} \right)^2} \hfill \\ {} \hfill & = \hfill & {\frac{1}{M}\mathop {\sum}\limits_{\vec \sigma } \left( {\mathop {\sum}\limits_i E_i\left( {\vec \sigma ;{\mathbf{w}},{\mathbf{b}}} \right) - E_{{\mathrm{DFT}}}(\vec \sigma )} \right)^2} \hfill \end{array}$$
(10)

where w and b are the weight and bias parameters within the neural network, respectively. In this study, we used a fully-connected neural network with three layers consisting of 4, 4, and 2 nodes, respectively. The weights for each network are initialized with values drawn from a uniform distribution as described by Glorot et al.63 We use advanced gradient descent techniques such as ADAM64 that adaptively change the learning rates for each weight parameter with an initial decay rate of 0.001. Further, we use mini-batch training, where the gradients are calculated over a subset of the training data before updating the weights. We then run several epochs (at least 2000) of batch training across our training data set.
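The structure of Eqs. (9) and (10) can be sketched with a small forward pass written in plain Python. This is not the TensorFlow implementation used in this work; it assumes tanh activations, zero initial biases, and a single linear output node appended after the 4, 4, and 2 node layers, and it omits the ADAM training loop entirely. All function names are hypothetical.

```python
import math
import random

# Sketch of Eqs. (9) and (10): one network evaluated per site, energies summed.
# Assumptions: tanh hidden activations, zero biases, one linear output node.

HIDDEN_LAYERS = (4, 4, 2)


def init_network(n_features, rng):
    """Glorot-uniform weights: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    sizes = (n_features,) + HIDDEN_LAYERS + (1,)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        limit = math.sqrt(6.0 / (n_in + n_out))
        weights = [[rng.uniform(-limit, limit) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def site_energy(layers, features):
    """E_i(G_alpha^i, G_beta^i, ...) for a single site."""
    activations = list(features)
    for k, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        is_output = k == len(layers) - 1
        activations = z if is_output else [math.tanh(v) for v in z]
    return activations[0]


def total_energy(layers, site_features):
    """Eq. (9): sum of site energies over all sites of one configuration."""
    return sum(site_energy(layers, g) for g in site_features)


def loss(layers, dataset):
    """Eq. (10): mean squared error over M configurations.

    `dataset` is a list of (site_features, reference_energy) pairs.
    """
    return sum((total_energy(layers, feats) - e_ref) ** 2
               for feats, e_ref in dataset) / len(dataset)


rng = random.Random(0)
network = init_network(n_features=3, rng=rng)
# One toy configuration with three sites, each described by (G_point, G_nn, G_nnn).
toy_sites = [(1, 4, 4), (-1, -4, 4), (1, 0, -4)]
toy_data = [(toy_sites, -0.25)]
print(loss(network, toy_data) >= 0.0)
```

Because the same network is applied to every site and the outputs are summed, the predicted energy does not depend on the order in which the sites are listed, and only the total energy (not the individual site energies) enters the loss.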

Generalization to multi-component arbitrarily complex crystals

The treatment so far relies on a particular functional form for the cluster basis functions, Eq. (2), and is valid for the simplest binary crystals consisting of only one type of site for alloying. There is some flexibility in the choice of cluster basis functions, which, for a binary system can be expressed more generally as

$${\mathrm{\Phi }}_\alpha (\vec \sigma ) = \mathop {\prod}\limits_{i\, \in \,\alpha } \phi \left( {\sigma _i} \right)$$
(11)

where ϕ(σi) represents a function of the occupation variable σi. For example, the commonly used lattice-gas Hamiltonian emerges when \(\phi \left( {\sigma _i} \right) = \frac{1}{2}\left( {1 + \sigma _i} \right)\).65 Sanchez66 has shown how to construct a family of functions ϕ(σi) that are orthogonal under a particular definition of a scalar product in the discrete occupation variable space. For a ternary system, the occupation variables σi assume one of three discrete values (e.g., −1, 0, and +1). Furthermore, for a ternary system multiple cluster basis functions exist for each crystallographic cluster of sites α, and take the form

$${\mathrm{\Phi }}_{\alpha ,\vec n}(\vec \sigma ) = \mathop {\prod}\limits_{i\, \in \,\alpha } \phi _{n_i}\left( {\sigma _i} \right)$$
(12)

where each \(\phi _{n_i}\left( {\sigma _i} \right)\) is one of two site basis functions, ϕ1 or ϕ2, and where \(\vec n\) is a vector collecting the indices ni that specify which site basis function appears for each site i in the cluster basis function. As before, symmetry can be applied to a prototype cluster basis function \({\mathrm{\Phi }}_{\alpha ,\vec n}\) to generate all symmetrically equivalent cluster basis functions forming the orbit \({\mathrm{\Omega }}_{\alpha ,\vec n}\). Site-centric orbits of cluster functions, \({\mathrm{\Omega }}_{\alpha ,\vec n}^i\), can be collected in a similar way as was described for a simple binary system.
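The bookkeeping behind Eq. (12) can be made concrete with a short sketch that enumerates the index vectors \(\vec n\) for a cluster and evaluates the corresponding cluster basis functions. The site basis functions used here, ϕ1(σ) = σ and ϕ2(σ) = σ², are a simple illustrative assumption and are not orthonormalized; the enumeration is also performed before any symmetry reduction, so index vectors related by a symmetry operation of the cluster are listed separately.

```python
from itertools import product
from math import prod

# Sketch of Eq. (12) for a ternary system with occupation values -1, 0, +1.
# The two site basis functions below are illustrative, non-orthonormal choices.
SITE_BASIS = {
    1: lambda s: s,
    2: lambda s: s * s,
}


def basis_index_vectors(cluster_size):
    """All index vectors n = (n_1, ..., n_|alpha|) with each n_i in {1, 2}."""
    return list(product((1, 2), repeat=cluster_size))


def cluster_function(n, occupations):
    """Product over the cluster sites of phi_{n_i}(sigma_i)."""
    return prod(SITE_BASIS[ni](si) for ni, si in zip(n, occupations))


pair_occupations = (-1, 1)
for n in basis_index_vectors(2):
    print(n, cluster_function(n, pair_occupations))
```

A cluster of |α| sites therefore carries 2^|α| basis functions in a ternary system before symmetry is taken into account, compared with a single function in the binary case of Eq. (2).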

Another complexity is that many crystal structures have more than one symmetrically distinct site that can be alloyed. For these crystals, a separate neural network needs to be trained for each symmetrically distinct site.

Fig. 3

Schematic picture illustrating how the total energy of a configuration may be estimated by summing the local energy contributions across all sites in the crystal

Case studies

We explore the ability of neural networks to predict configurational formation energies of multi-component crystals. In the first example, we use a neural network to model the formation energies generated with a synthetic cluster expansion Hamiltonian on a face centered cubic lattice. In the second example, we train a neural network to predict the formation energies of Li-vacancy disorder over the interstitial sites of spinel LixTiS2, which contains two symmetrically distinct types of sites that can host Li ions or vacancies.

We generated a synthetic cluster expansion Hamiltonian for the FCC lattice that includes multi-body interactions up to four-point clusters. We used a lattice-gas type expansion for the synthetic Hamiltonian (i.e., \(\phi \left( {\sigma _i} \right) = \frac{1}{2}\left( {1 + \sigma _i} \right)\) in Eq. (11)). The expansion coefficients were generated randomly for each cluster and are shown in Fig. 4. These interactions were used to generate a training data set of energies for 1000 randomly generated but symmetrically distinct configurations, drawn from orderings on supercells of up to 10 times the volume of the primitive FCC cell. The energies were input into the ADAM optimizer to estimate parameters for different neural networks having a varying number of local correlation functions as input features. We validated our model against the energies of the 1346 symmetrically distinct configurations on supercells of up to 10 times the volume of the primitive FCC cell. A comparison of the training, testing, and maximum errors across the linear cluster expansion model and the neural network model is shown in Fig. 5. The neural network consistently performs better in terms of the root mean square error as compared to the linear model, with the two methods converging when all the features of the synthetic cluster expansion are included.

Fig. 4

Effective cluster interaction (ECI) values for the linear model were generated randomly by considering multi-body interactions up to four-point clusters

Fig. 5

The errors on the testing and training data from a synthetic cluster expansion are compared across a neural network model and a linear model with the same number of features. a Training and testing errors while varying the number of input features to the neural network and regular least squares fit. b Maximum errors across the training and testing datasets while varying the number of input features

The neural network predicts the overall energy of the test dataset to within an error of 0.006 eV/atom with six local features (one point feature and five pair descriptors) as shown in Fig. 6. The linear regression model with the same number of features has an error of 0.01 eV/atom. Remarkably, the neural network also predicts the overall shape of the convex hull in agreement with that of the synthetic dataset.

Fig. 6

The figure shows the predicted values for the formation energies of a test data set based on models that are fit using a neural network. The model is fit with features consisting of five pair correlations and one point feature. The full data set from the synthetic cluster expansion is plotted as circles while predicted test data is plotted as green crosses. The orange circles represent configurations on the convex hull of the synthetic cluster expansion while the orange crosses represent configurations on the predicted convex hull

We also investigate the ability of a neural network to predict DFT formation energies of lithium-vacancy orderings within a spinel TiS2 crystal which contains two distinct Li sites. The spinel primitive cell contains four octahedral interstitial sites and two tetrahedral interstitial sites that can be occupied by Li. The formation energies calculated with density functional theory on 129 symmetrically distinct orderings are shown in Fig. 7.67 Since there are two crystallographically distinct sites that can host Li-vacancy disorder, two independent neural networks are necessary (one for the tetrahedral sites and one for the octahedral sites) to describe the local energy contributions to the total energy of the crystal.

Fig. 7

Formation energies predicted with a least squares regression and b neural network potential for lithium-vacancy orderings on the tetrahedral and octahedral sites of spinel TiS2. Local features around each site are generated from only pair correlations for the neural network, while the least squares model uses the average correlations for the same clusters. The figure shows DFT calculated formation energies as circles, while the predictions from the model are shown as green crosses. Configurations on the DFT convex hull are shown as orange circles and the configurations on the predicted convex hull are shown as orange crosses. The figure on top shows the full composition range, while the figure below spans a smaller composition range up to \(x = \frac{2}{3}\)

The DFT formation energies of 66 configurations were used as training data while the energies of the remaining 63 orderings were used to test the models. The predictions of the neural network and linear regression for this system are shown in Fig. 7. Both models were only trained with local pair cluster correlations having lengths less than 10 Å. The root mean square error over the training data set is 7 meV/f.u. for the neural network, while a regression model with the same clusters had an error of 65 meV/f.u. On the holdout set of 63 formation energies, the error is 36 meV/f.u. for the neural network and 89 meV/f.u. for the linear regression model. The maximum training (testing) errors for the neural network and regression are 82 (553) and 331 (570) meV/f.u., respectively. Remarkably, as seen in Fig. 7b, the shape of the convex hull reproduced with the neural network model is almost identical to that predicted with the DFT calculations, while the linear regression model shown in Fig. 7a struggles to reproduce the ground states. The training error of the neural network is roughly a tenth of that of an equivalent cluster expansion model with only pair interactions, and its testing error is less than half. This is especially remarkable since the neural net input feature vector only has information about pair cluster correlations. The linear regression model can be greatly improved by adding multi-body clusters. The high quality of the neural network fit using only pair interactions likely stems from the fact that the contributions from the multi-body interactions can be approximated within the neural network through nonlinearities in the activation function and the dense connectivity of the layers.

Discussion

We have shown how neural networks can be implemented to describe the formation energy of a multi-component crystal. Like cluster expansion Hamiltonians, the approach can be generalized to describe any scalar property of a multi-component crystal, such as its formation energy or volume, as a function of configurational degrees of freedom. The approach relies on local variants of the alloy correlation functions introduced by Sanchez and De Fontaine,58,59 which are expressed in terms of site occupation variables that track the chemical occupants at each crystal site. The site-centric correlation functions serve as elements of the input feature vector of the neural network assigned to each symmetrically distinct site within the parent crystal. They are defined in a way to ensure their invariance to any symmetry operation of the undecorated parent crystal structure. The local features are, therefore, guaranteed to have the same value across all local orderings that are related by a symmetry of the underlying crystal.

Neural networks as a function of the local correlation functions can be viewed as non-linear extensions of the cluster expansion formalized by Sanchez et al.22 As such, they should enable a more rapid convergence than traditional cluster expansions, with contributions from multi-body interactions approximated to some degree with non-linear dependencies on correlations belonging to smaller subclusters (e.g., point and pair clusters). While linear cluster expansions have been augmented by non-linear functions in the past,36,68 the non-linear terms have predefined functional forms and usually only depend on a global property, such as the concentration of the solid. The present approach relaxes linearity on all local correlation functions and does not presuppose a functional form.

The approach presented here is not limited to neural network-based tools. Alternative machine learning models such as Gaussian process regression can also be used to estimate site-based energies. In this method, the site-based energy can be interpolated using the similarity of an arbitrary local ordering to the points in the training data set. The similarity is estimated using the kernel trick, while comparing the values of the local symmetry-adapted cluster functions. The method is similar in spirit to the Gaussian approximation potentials.17

A cluster expansion has a local spatial dependence when it is truncated. Similarly, a neural-network model of alloy properties will also have a local spatial dependence if the feature vector of site-centric correlation functions is restricted to short-ranged and compact clusters. The scalar properties of some materials, however, may have contributions from long-range interactions that cannot be neglected. These include strain effects, which are especially important in spatially inhomogeneous crystals,68,69 and electrostatic interactions in ionic crystals. Neural-network alloy Hamiltonians can be adapted to account for long-range interactions by supplementing the local features with long-range descriptors. These could include the overall alloy composition and long wave-length Fourier modes of the composition profile.

While a model of the configurational energy should be quantitatively accurate, it must also reproduce important qualitative features including the first-principles predicted ground states. Quadratic programming methods were recently introduced by Huang et al.57 to enforce ground state constraints as part of the regression scheme to determine cluster expansion interaction coefficients. These included constraints that enforce a positive distance from the convex hull for metastable configurations and negative values for configurations on the hull. Similar constraints can be imposed as part of the construction of neural network models of the configurational energy.

As described by Huang et al.,57 both the metastability constraint and the constraints for stable configurations can be summarized as:

$$c(\vec \sigma ) \ge 0$$
(13)

where the constraint takes the form:

$$E(\vec \sigma ) - \mathop {\sum}\limits_{h\, \in \,H} x_hE(\vec \sigma _h) \ge 0$$
(14)

for metastable configurations, with the sum being over all the convex hull points H, and for stable configurations:

$$- \left( {E(\vec \sigma ) - \mathop {\sum}\limits_{h\, \in \,H\backslash \vec \sigma } x_hE(\vec \sigma _h)} \right) \ge 0$$
(15)

where the sum extends over all the configurations on the hull except the configuration \(\vec \sigma\) itself. Minimizing the loss function of Eq. (10) subject to the ground state constraints, Eqs. (13)–(15), can be achieved with the help of Lagrange multipliers:

$${\mathrm{\Gamma }} = \frac{1}{M}\mathop {\sum}\limits_{\vec \sigma } \left( {\mathop {\sum}\limits_i E_i(\vec \sigma ;{\mathbf{w}},{\mathbf{b}}) - E_{{\mathrm{DFT}}}(\vec \sigma )} \right)^2 - \mathop {\sum}\limits_{\vec \sigma } \lambda _{\vec \sigma }c(\vec \sigma )$$
(16)

where the Lagrange multipliers, \(\lambda _{\vec \sigma }\), are required to be positive. Neural networks can then be constructed using standard backpropagation techniques, with the updates of the Lagrange multipliers performed with projected gradients.
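The constraints of Eqs. (13)–(15) and the multiplier update can be sketched for a binary composition axis as follows. The sketch is a simplified illustration rather than the scheme of Huang et al.: it assumes a single composition variable x, uses the lever rule between neighboring ground states for the weights x_h, leaves the end members unconstrained, and uses made-up energies. The function names and the step size are hypothetical.

```python
# Sketch of the ground state constraints (Eqs. 13-15) on a binary composition
# axis. All energies, names and the step size are illustrative assumptions.


def lower_hull(points):
    """Lower convex hull of (x, E) points (monotone chain)."""
    best = {}
    for x, e in points:
        best[x] = min(e, best.get(x, e))       # keep the lowest energy per x
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2:
            (x1, e1), (x2, e2) = hull[-2], hull[-1]
            # Drop hull[-1] if it lies on or above the segment hull[-2] -> p.
            if (x2 - x1) * (p[1] - e1) - (e2 - e1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def interpolate(vertices, x):
    """Lever-rule energy at x between neighboring vertices, or None if outside."""
    for (x1, e1), (x2, e2) in zip(vertices[:-1], vertices[1:]):
        if x1 <= x <= x2:
            w = (x - x1) / (x2 - x1)
            return (1.0 - w) * e1 + w * e2
    return None


def constraints(model, hull_ids):
    """c(sigma) for each configuration; `model` maps id -> (x, model energy)."""
    c = {}
    for cid, (x, e) in model.items():
        vertices = sorted(model[h] for h in hull_ids if h != cid)
        ref = interpolate(vertices, x)
        if ref is None:                        # end members: no spanning tie line
            continue
        c[cid] = (ref - e) if cid in hull_ids else (e - ref)
    return c


def update_multipliers(c, lam, step):
    """Projected gradient ascent: raise lambda where c < 0, clip at zero."""
    return {k: max(0.0, lam[k] - step * c[k]) for k in c}


dft = {"a": (0.0, 0.0), "b": (0.5, -0.4), "c": (1.0, 0.0),
       "d": (0.25, -0.1), "e": (0.75, -0.15)}
hull_points = lower_hull(dft.values())
hull_ids = {k for k, v in dft.items() if v in hull_points}

# Hypothetical model predictions for the same configurations.
model = {"a": (0.0, 0.0), "b": (0.5, -0.35), "c": (1.0, 0.0),
         "d": (0.25, -0.2), "e": (0.75, -0.1)}
c = constraints(model, hull_ids)
lam = update_multipliers(c, {k: 0.0 for k in c}, step=1.0)
print(sorted(k for k, v in c.items() if v < 0))   # violated constraints
```

In this toy data set the model places configuration d below the tie line between the ground states a and b, so its constraint is violated and its multiplier becomes positive, while the multipliers of the satisfied constraints are projected back to zero.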

Methods

Local cluster functions around each site were calculated with the CASM (a Clusters Approach to Statistical Mechanics) software package.30,70,71,72 All graphs were made with the matplotlib73 library. The machine learning tools were implemented with TensorFlow.74 The neural network fitting code and cluster expansion parameterization will be released in a future version of CASM.70