Abstract
Machine learning tools such as neural networks and Gaussian process regression are increasingly being used in the development of atomistic potentials. Here, we develop a formalism to leverage such nonlinear interpolation tools in describing properties dependent on occupation degrees of freedom in multicomponent solids. Symmetry-adapted cluster functions are used to differentiate distinct local orderings. These local features are used as input to neural networks that reproduce local properties such as the site energy. We apply the technique to reproduce a synthetic cluster expansion Hamiltonian with multibody interactions, as well as the formation energies calculated from first principles for the intercalation of lithium into TiS_{2}. The formalism and results presented here show that complex multibody interactions may be approximated by nonlinear models involving smaller clusters.
Introduction
Recent years have seen a dramatic increase in the use of machine learning tools in materials science.^{1} They have been combined with large databases and high-throughput computations^{2,3,4,5,6,7} in the search for novel materials chemistries and to learn global trends.^{8,9,10,11,12,13} In parallel, machine learning tools are increasingly used to construct interatomic force fields that can represent diverse local environments.^{14,15,16,17,18} A major challenge in the application of machine learning in materials science is the identification of suitable structural and chemical descriptors that are invariant to underlying symmetries of the problem. Many such descriptors have already been formulated, including local descriptors that use distances and angles between atoms or expand local environments in terms of spherical harmonics.^{14,15,16,17,18,19,20}
Alloys, where properties are sensitive to the degree of order or disorder of different chemical species on a parent crystal structure, have received less attention as a machine learning problem. Here, we address the alloy problem from a machine learning perspective and show that suitable and robust descriptors of the degree of configurational order can be formulated using mathematical tools that have been developed in the context of lattice model Hamiltonians.
Lattice model Hamiltonians play a central role in first-principles statistical mechanics schemes to predict thermodynamic potentials and diffusion coefficients of alloys and off-stoichiometric compounds.^{21} They were put on a firm theoretical footing by Sanchez et al.^{22} with the rigorous derivation of the cluster expansion, an effective Hamiltonian expressed in terms of orthonormal basis functions of configurational occupation variables. The cluster expansion formalism sets up a natural mathematical framework with which to represent the properties of a crystal as a function of site degrees of freedom.^{22,23} Since it is expressed in terms of a complete and orthonormal basis, it enables a systematic tuning of truncation errors when parameterizing expansion coefficients to first-principles training data. In this way, complex energy landscapes as a function of configurational degrees of freedom can be reproduced by fitting to a relatively small number of first-principles electronic structure calculations. The approach has enabled accurate first-principles predictions of temperature-composition phase diagrams,^{23,24,25,26,27,28,29,30,31,32,33,34} order–disorder phenomena,^{35,36,37,38,39,40,41,42,43,44,45} and composition-dependent diffusion coefficients in alloys and complex inorganic compounds.^{46,47,48,49,50,51,52}
A cluster expansion is formulated as a linear series of cluster basis functions multiplied by constant expansion coefficients that are determined by the underlying chemistry and crystal structure of the multicomponent solid. Cluster expansions, while formally exact, must be truncated in practice. Many advanced methods have been developed to aid in the accurate and efficient parameterization of a truncated cluster expansion. These include genetic algorithms to select a cluster basis set,^{53} schemes to reduce overfitting using cross-validation^{26} and regularizers,^{54} and the use of Bayesian priors to incorporate physical intuition during model development.^{55} Recently, methods have also been developed to determine the ground states of a cluster expansion^{56} and to impose constraints as part of the regression step to ensure that the cluster expansion predicts ground states correctly.^{57}
Here, we build on the cluster expansion approach, but relax the constraint of linearity and leverage advanced machine learning tools such as neural networks and Gaussian process regressions to represent crystal properties that depend on alloy configuration in terms of symmetry-invariant descriptors of order. As descriptors, we use site-centric correlation functions, which are related to the correlation functions introduced by Sanchez and De Fontaine^{58,59} and are at the core of the cluster expansion approach.^{22} We illustrate the method by modeling the formation energies of a synthetic multibody binary Hamiltonian on the FCC crystal and of Li-vacancy disorder in spinel LiTiS_{2}, a compound that is crystallographically more complex than most, having two symmetrically nonequivalent sites. We find that accurate Hamiltonians can be built with a relatively small number of ab initio calculations and only a few correlation functions as descriptors.
Results
The cluster expansion formalism revisited
We start by reviewing essential ingredients of the cluster expansion approach as applied to a simple binary alloy. A particular ordering of the components of a binary crystal of N sites can be represented as an unrolled vector of occupation variables, \(\vec \sigma = \{ \sigma _1,...,\sigma _i,...,\sigma _N\}\), where σ_{i} is +1 or −1 depending on the occupant of site i. Sanchez et al.^{22} showed that any scalar property of a binary crystal that depends on \(\vec \sigma\), such as its fully relaxed formation energy, can be expressed as an expansion in terms of polynomials of occupation variables according to
\[E(\vec \sigma ) = V_0 + \sum\limits_\alpha V_\alpha \Phi _\alpha (\vec \sigma ), \tag{1}\]
where the sum extends over all clusters of sites α within the crystal (e.g., point clusters, pair clusters, triplet clusters, etc.) and where
\[\Phi _\alpha (\vec \sigma ) = \prod\limits_{i \in \alpha } \sigma _i \tag{2}\]
are cluster functions, defined as the product of occupation variables belonging to the cluster α. Sanchez et al.^{22} showed that the cluster functions Φ_{α} form a complete and orthonormal basis with respect to a particular scalar product defined on the space of configurations \(\vec \sigma\). The expansion coefficients V_{α} in Eq. (1) are constant and are determined by the chemistry and crystal structure of the alloy.
The symmetry of the undecorated parent crystal structure imposes constraints on the expansion coefficients V_{α} in Eq. (1). Any two cluster functions \(\Phi _\alpha (\vec \sigma )\) and \(\Phi _\delta (\vec \sigma )\) that can be mapped onto each other by a space group operation of the crystal must have the same expansion coefficients (i.e., V_{α} = V_{δ}). All cluster functions \(\Phi _\delta (\vec \sigma )\) that are related by a symmetry operation of the crystal to a prototype cluster function \(\Phi _\alpha (\vec \sigma )\) can be grouped together into an orbit of cluster functions \(\Omega _\alpha = \{ \Phi _\alpha (\vec \sigma ),...,\Phi _\delta (\vec \sigma ),...\}\). For example, all cluster functions associated with nearest neighbor pair clusters that are related by a symmetry operation to a prototype nearest neighbor pair cluster belong to the same orbit. For a binary alloy, there exists an orbit of cluster functions for each symmetrically distinct cluster type. The set of all cluster functions can be divided among different orbits Λ = {Ω_{α}, Ω_{β},…} where α, β, etc. correspond to symmetrically distinct cluster prototypes.
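Since the expansion coefficients belonging to symmetrically equivalent clusters are all equal to each other, there is only one expansion coefficient V_{α} for each orbit Ω_{α}. This makes it possible to rewrite Eq. (1) as a sum first over orbits followed by a sum over cluster functions within each orbit according to
\[E(\vec \sigma ) = V_0 + \sum\limits_{\Omega _\alpha \in \Lambda } V_\alpha \sum\limits_{\Phi _\delta \in \Omega _\alpha } \Phi _\delta (\vec \sigma ). \tag{3}\]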
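Eq. (3) can be normalized by the number of atoms N within the crystal and recast as
\[\frac{E(\vec \sigma )}{N} = \frac{V_0}{N} + \sum\limits_{\Omega _\alpha \in \Lambda } m_\alpha V_\alpha \langle \Phi _\alpha (\vec \sigma )\rangle \tag{4}\]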
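upon introducing correlation functions defined as^{22,23}
\[\langle \Phi _\alpha (\vec \sigma )\rangle = \frac{1}{N m_\alpha }\sum\limits_{\Phi _\delta \in \Omega _\alpha } \Phi _\delta (\vec \sigma ), \tag{5}\]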
where m_{α} is the multiplicity of the cluster per site. A correlation function \(\langle {\mathrm{\Phi }}_\alpha (\vec \sigma )\rangle\) is the average value of the cluster function \({\mathrm{\Phi }}_\alpha (\vec \sigma )\) over the orbit Ω_{α} for the ordering \(\vec \sigma\).
For a binary alloy, each symmetrically distinct cluster type (e.g., nearest neighbor pair cluster, second nearest neighbor pair cluster, nearest neighbor triplet cluster, etc.) has a correlation function \(\langle {\mathrm{\Phi }}_\alpha (\vec \sigma )\rangle\) associated with it. The values of all correlation functions, \(\{ \langle {\mathrm{\Phi }}_\alpha (\vec \sigma )\rangle ,\langle {\mathrm{\Phi }}_\beta (\vec \sigma )\rangle ,...\}\) for a particular ordering \(\vec \sigma\) can serve as a fingerprint of that ordering. Since the correlation functions are averages over all symmetrically equivalent cluster functions, they are invariant to an application of any space group operation of the parent crystal applied to the ordering \(\vec \sigma\). Hence the correlation functions will have the same values for all orderings \(\vec \sigma \prime\) that are related by symmetry to \(\vec \sigma\). They are a measure of a particular state of configurational order on a crystal that is invariant to a space group operation of the underlying crystal.
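The fingerprint property described above can be illustrated on a toy system. The following is a minimal sketch, assuming a one-dimensional periodic chain (a ring) as the parent crystal and four hypothetical cluster prototypes; the helper names `orbit`, `correlation`, and `fingerprint` are introduced here for illustration only and are not part of any package used in this work. On a ring, translating a mirror-symmetric prototype over all sites generates its full orbit.

```python
from math import prod

# Hypothetical cluster prototypes on a ring, given as site offsets.
# All are mirror-symmetric, so translations alone generate the full orbit.
PROTOTYPES = {
    "point": (0,),
    "nn_pair": (0, 1),
    "2nn_pair": (0, 2),
    "nn_triplet": (0, 1, 2),
}

def orbit(prototype, n_sites):
    """All translations of a prototype cluster on a ring of n_sites sites."""
    return [tuple((i + d) % n_sites for d in prototype) for i in range(n_sites)]

def correlation(sigma, prototype):
    """Average of the cluster function prod(sigma_i) over the orbit of the prototype."""
    clusters = orbit(prototype, len(sigma))
    return sum(prod(sigma[s] for s in c) for c in clusters) / len(clusters)

def fingerprint(sigma):
    """Tuple of correlation functions, one per symmetrically distinct cluster type."""
    return tuple(correlation(sigma, p) for p in PROTOTYPES.values())

sigma = [1, 1, -1, 1, -1, -1, 1, -1]   # occupation variables, +1 or -1
shifted = sigma[3:] + sigma[:3]        # translated ordering
mirrored = sigma[::-1]                 # mirrored ordering

print(fingerprint(sigma))
print(fingerprint(sigma) == fingerprint(shifted) == fingerprint(mirrored))
```

Symmetrically equivalent orderings give identical fingerprints because each symmetry operation only permutes the cluster functions within an orbit, leaving the orbit average unchanged.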
It is instructive to recast the cluster expanded energy as a sum of site energies. To this end, we define \(\Omega _\alpha ^i\) as the set of all cluster functions, Φ_{δ}, related by symmetry to Φ_{α} in which one of the sites of the cluster δ is site i. \(\Omega _\alpha ^i\) is a subset of Ω_{α} and consists of cluster functions associated with clusters radiating out of site i (Fig. 1). The set of all cluster orbits that radiate from site i will be denoted \(\Lambda ^i = \{ \Omega _\alpha ^i,\Omega _\beta ^i, \cdots \}\), where as previously, the clusters α, β, etc. refer to symmetrically distinct cluster prototypes such as the nearest neighbor pair, the second nearest neighbor pair, etc. In terms of the site orbits, we can rewrite the total energy, Eq. (3), as
where the site energies are defined as
The factor |α|, which denotes the number of sites in cluster α, appears in Eq. (6) to avoid overcounting each cluster function \(\Phi _\delta (\vec \sigma )\) when summing Eq. (7) over each site i of the crystal.
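The role of the overcounting factor can be checked numerically. The sketch below assumes a ring as the parent crystal, hypothetical expansion coefficients, and a constant term V_0 kept outside the site sum (one possible convention, chosen here for illustration). Each cluster function is shared by as many sites as the cluster contains, so dividing its coefficient by the number of sites in the cluster makes the site energies add up to the total energy.

```python
from math import prod, isclose

def orbit(prototype, n_sites):
    """All translations of a prototype cluster on a ring of n_sites sites."""
    return [tuple((i + d) % n_sites for d in prototype) for i in range(n_sites)]

# Hypothetical expansion coefficients, keyed by cluster prototype (site offsets).
ECI = {(0,): 0.10, (0, 1): -0.05, (0, 2): 0.02, (0, 1, 2): 0.03}
V0 = -1.0  # constant term, kept outside the site sum (assumed convention)

def total_energy(sigma):
    """Sum over orbits, then over the cluster functions within each orbit."""
    n = len(sigma)
    return V0 + sum(
        v * sum(prod(sigma[s] for s in c) for c in orbit(p, n))
        for p, v in ECI.items()
    )

def site_energy(sigma, i):
    """Sum over clusters radiating from site i, each weighted by V / (number of sites)."""
    n = len(sigma)
    energy = 0.0
    for p, v in ECI.items():
        local = [c for c in orbit(p, n) if i in c]
        energy += v / len(p) * sum(prod(sigma[s] for s in c) for c in local)
    return energy

sigma = [1, 1, -1, 1, -1, -1, 1, -1]
from_sites = V0 + sum(site_energy(sigma, i) for i in range(len(sigma)))
print(isclose(total_energy(sigma), from_sites, abs_tol=1e-12))
```

Omitting the division by `len(p)` would count every pair twice and every triplet three times, which is the overcounting the factor is meant to prevent.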
Just as the form of the energy expression in Eq. (3) makes clear that the correlation functions defined by Eq. (5) are a measure of the global degree of ordering within the crystal, Eq. (7) for the site energies suggests the importance of local site-centric correlation functions defined as
in measuring a local degree of ordering relative to site i. Since the sum in Eq. (8) is over all symmetrically equivalent clusters having a site i in common, it is invariant to any reorientation of the local ordering around site i that is permitted by the space group of the parent crystal.
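A small numerical check of this local invariance is sketched below. It assumes a ring as the parent crystal and normalizes each site-centric correlation by the number of clusters radiating from the site; that normalization and the helper names are assumptions made for illustration, not necessarily the definition used in Eq. (8).

```python
from math import prod

def orbit(prototype, n_sites):
    """All translations of a prototype cluster on a ring of n_sites sites."""
    return [tuple((i + d) % n_sites for d in prototype) for i in range(n_sites)]

def local_correlation(sigma, prototype, i):
    """Average cluster function over the clusters of one orbit that contain site i."""
    clusters = [c for c in orbit(prototype, len(sigma)) if i in c]
    return sum(prod(sigma[s] for s in c) for c in clusters) / len(clusters)

def reflect_about(sigma, i):
    """Mirror the ordering about site i; site i itself stays in place."""
    n = len(sigma)
    return [sigma[(2 * i - j) % n] for j in range(n)]

prototypes = [(0,), (0, 1), (0, 2), (0, 1, 2)]
sigma = [1, 1, -1, 1, -1, -1, 1, -1]
site = 2

features = [local_correlation(sigma, p, site) for p in prototypes]
mirrored = [local_correlation(reflect_about(sigma, site), p, site) for p in prototypes]
print(features, mirrored)
```

The feature vector of a site is unchanged by the mirror operation, while different sites of the same ordering generally carry different feature vectors.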
Developing features for neural network alloy Hamiltonians
The correlation functions defined by Eqs. (5) and (8) form a set of descriptors of the degree of order over the sites of a binary crystal that are invariant to the translational and orientational symmetries of the underlying parent crystal structure. As first shown by Sanchez et al.,^{22} the configurational energy of the crystal can be expressed as a linear expansion of the correlation functions as in Eq. (4), which can trivially be recast into the forms of Eqs. (6) and (7). However, a linear expansion is only guaranteed to be an exact description of the configurational energy if a correlation function is included for every symmetrically distinct cluster type in the crystal. In practice, cluster expansions must be truncated beyond some maximum cluster size, leading to truncation errors.
Here, we relax the restriction of a linear expansion in terms of correlation functions, and instead allow for a nonlinear dependence of the energy on the correlation functions. Similar to Eq. (6), we express the energy of the crystal as a sum of site energies, but the site energies are now allowed to be a nonlinear function of the local correlation functions defined by Eq. (8) according to
\[E(\vec \sigma ) = \sum\limits_i E_i\left( G_\alpha ^i,G_\beta ^i,... \right). \tag{9}\]
To be tractable, the site energies will only depend on a finite set of local correlation functions corresponding to short-range and compact clusters. The fact that symmetrically equivalent configurations \(\vec \sigma\) have the same correlations ensures that Eq. (9) is also invariant to the underlying symmetries of the undecorated parent crystal structure and will evaluate to the same energy for all orderings that are equivalent by a space group operation of the crystal.
While the optimal functional dependence of the site energies E_{i} on a finite set of local correlation function descriptors is not a priori clear, it can be learned with a neural network. Neural networks (NN) are powerful machine learning tools that can replicate complex functions of multiple input variables, also called features. Figure 2 schematically shows a neural net that can describe the site energies E_{i} relying on inputs corresponding to the different local correlation functions \(\{ G_\alpha ^i,G_\beta ^i,...\}\). Activation function choices at each node include rectified linear units (ReLU), sigmoid, and hyperbolic tangent functions.^{60,61}
The neural nets can be trained using first-principles energies, \(E(\vec \sigma )\), calculated for a large number of configurations \(\vec \sigma\) within periodic supercells (Fig. 3). Training neural networks to reproduce the local energy can be accomplished by using conventional backpropagation techniques^{62} with the following loss function:
where w and b are the weight and bias parameters within the neural network, respectively. In this study, we used a fully connected neural network with three layers consisting of 4, 4, and 2 nodes, respectively. The weights for each network are initialized with values drawn from a uniform distribution as described by Glorot et al.^{63} We use advanced gradient descent techniques such as ADAM^{64} that adaptively change the learning rates for each weight parameter with an initial decay rate of 0.001. Further, we use mini-batch training, where the gradients are calculated over a subset of the training data before updating the weights. We then run several epochs (at least 2000) of batch training across our training data set.
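The structure of such a model can be sketched in a few lines. The following is a minimal illustration in plain Python rather than the TensorFlow implementation used in this work; it assumes hyperbolic tangent activations, a single linear output node appended after the 4, 4, and 2 node layers, and a mean squared error per site as the loss. All three choices, and every function name, are assumptions made for illustration. Gradient-based training (for example with ADAM) is omitted.

```python
import math
import random

def glorot_uniform(n_in, n_out, rng):
    """Weights drawn uniformly from [-limit, limit], limit = sqrt(6 / (n_in + n_out))."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]

def init_network(n_features, hidden=(4, 4, 2), seed=0):
    """List of (weights, biases) per layer; the final linear output node is an assumption."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden, 1]
    return [(glorot_uniform(a, b, rng), [0.0] * b) for a, b in zip(sizes[:-1], sizes[1:])]

def site_energy(net, features):
    """Forward pass: tanh on hidden layers (assumed), linear output."""
    x = list(features)
    for k, (weights, biases) in enumerate(net):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = z if k == len(net) - 1 else [math.tanh(v) for v in z]
    return x[0]

def configuration_energy(net, site_features):
    """Energy of a configuration as the sum of its site energies."""
    return sum(site_energy(net, f) for f in site_features)

def loss(net, dataset):
    """Mean squared error per site over (site_features, reference_energy) pairs (assumed form)."""
    total = 0.0
    for site_features, e_ref in dataset:
        error = (configuration_energy(net, site_features) - e_ref) / len(site_features)
        total += error ** 2
    return total / len(dataset)

net = init_network(n_features=3)
site_features = [[1.0, 0.5, -0.5], [-1.0, 0.5, -0.5]]  # hypothetical local correlations
print(configuration_energy(net, site_features))
```

Because the same network is applied to every site and the outputs are summed, the predicted energy does not depend on the order in which the sites are listed, and only total energies are needed as training targets.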
Generalization to multicomponent arbitrarily complex crystals
The treatment so far relies on a particular functional form for the cluster basis functions, Eq. (2), and is valid for the simplest binary crystals consisting of only one type of site for alloying. There is some flexibility in the choice of cluster basis functions, which, for a binary system can be expressed more generally as
\[\Phi _\alpha (\vec \sigma ) = \prod\limits_{i \in \alpha } \phi \left( {\sigma _i} \right), \tag{11}\]
where ϕ(σ_{i}) represents a function of the occupation variable σ_{i}. For example, the commonly used lattice-gas Hamiltonian emerges when \(\phi \left( {\sigma _i} \right) = \frac{1}{2}\left( {1 + \sigma _i} \right)\).^{65} Sanchez^{66} has shown how to construct a family of functions ϕ(σ_{i}) that are orthogonal under a particular definition of a scalar product in the discrete occupation variable space. For a ternary system, the occupation variables σ_{i} assume one of three discrete values (e.g., −1, 0, and +1). Furthermore, for a ternary system multiple cluster basis functions exist for each crystallographic cluster of sites α, and take the form
\[\Phi _{\alpha ,\vec n}(\vec \sigma ) = \prod\limits_{i \in \alpha } \phi _{n_i}\left( {\sigma _i} \right), \tag{12}\]
where the \(\phi _{n_i}\left( {\sigma _i} \right)\) refers to one of two site basis functions, ϕ_{1} or ϕ_{2}, and where \(\vec n\) is a vector collecting the indices n_{i} that specify the particular site basis function for site i that is to appear in the cluster basis function. As before, symmetry can be applied to a prototype cluster basis function \({\mathrm{\Phi }}_{\alpha ,\vec n}\) to generate all symmetrically equivalent cluster basis functions forming the orbit \({\mathrm{\Omega }}_{\alpha ,\vec n}\). Site-centric orbits of cluster functions, \({\mathrm{\Omega }}_{\alpha ,\vec n}^i\), can be collected in a similar way as was described for a simple binary system.
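One way to obtain orthogonal site basis functions for a ternary system is sketched below. It assumes the scalar product is a uniform average over the three occupation values and applies Gram-Schmidt orthogonalization to the monomials 1, σ, and σ²; both choices are illustrative assumptions and may differ from the construction in ref. 66. The helper names are hypothetical.

```python
import math

STATES = (-1, 0, 1)  # occupation values of a ternary site

def inner(u, v):
    """Scalar product: uniform average over the occupation values (assumed definition)."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def site_basis(states):
    """Gram-Schmidt on the monomials 1, sigma, sigma^2, ... tabulated over the states."""
    basis = []
    for power in range(len(states)):
        vec = [s ** power for s in states]
        for b in basis:
            projection = inner(vec, b)
            vec = [x - projection * y for x, y in zip(vec, b)]
        norm = math.sqrt(inner(vec, vec))
        basis.append([x / norm for x in vec])
    return basis

def cluster_function(basis, occupations, indices):
    """Product over the cluster sites of phi_{n_i}(sigma_i)."""
    return math.prod(basis[n][STATES.index(s)] for s, n in zip(occupations, indices))

basis = site_basis(STATES)  # basis[0] is the constant function, basis[1] and basis[2] follow
for n, phi in enumerate(basis):
    print(n, [round(v, 4) for v in phi])

# Pair cluster with occupations (+1, 0) decorated with site functions (phi_1, phi_2)
print(cluster_function(basis, occupations=(1, 0), indices=(1, 2)))
```

The constant function is included so that the two non-trivial functions are orthogonal to it, which is what allows sites outside a cluster to be left out of the product.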
Another complexity is that many crystal structures have more than one symmetrically distinct site that can be alloyed. For these crystals, a separate neural network needs to be trained for each symmetrically distinct site.
Case studies
We explore the ability of neural networks to predict configurational formation energies of multicomponent crystals. In the first example, we use a neural network to model the formation energies generated with a synthetic cluster expansion Hamiltonian on a face-centered cubic lattice. In the second example, we train a neural network to predict the formation energies of Li-vacancy disorder over the interstitial sites of spinel Li_{x}TiS_{2}, which contains two symmetrically distinct types of sites that can host Li ions or vacancies.
We generated a synthetic cluster expansion Hamiltonian for the FCC lattice that includes multibody interactions up to four-point clusters. We used a lattice-gas type expansion for the synthetic Hamiltonian (i.e., \(\phi \left( {\sigma _i} \right) = \frac{1}{2}\left( {1 + \sigma _i} \right)\) in Eq. (11)). The expansion coefficients were generated randomly for each cluster and are shown in Fig. 4. These interactions were used to generate a training data set of energies for 1000 randomly generated but symmetrically distinct configurations. This encompasses orderings on supercells of up to 10 times the volume of the primitive FCC cell. The energies were input into the ADAM optimizer to estimate parameters for different neural networks having a varying number of local correlation functions as input features. We validated our model against the energies of the 1346 symmetrically distinct configurations with up to 10 volumes of the primitive FCC cell. A comparison of the training, testing, and maximum errors across the linear cluster expansion model and the neural network model is shown in Fig. 5. The neural network consistently performs better in terms of the root mean square error as compared to the linear model, with the two methods converging when all the features of the synthetic cluster expansion are included.
The neural network predicts the overall energy of the test dataset to within an error of 0.006 eV/atom with six local features (one point feature and five pair descriptors) as shown in Fig. 6. The linear regression model with the same number of features has an error of 0.01 eV/atom. Remarkably, the neural network also predicts the overall shape of the convex hull in agreement with that of the synthetic dataset.
We also investigate the ability of a neural network to predict DFT formation energies of lithium-vacancy orderings within a spinel TiS_{2} crystal which contains two distinct Li sites. The spinel primitive cell contains four octahedral interstitial sites and two tetrahedral interstitial sites that can be occupied by Li. The formation energies calculated with density functional theory on 129 symmetrically distinct orderings are shown in Fig. 7.^{67} Since there are two crystallographically distinct sites that can host Li-vacancy disorder, two independent neural networks are necessary (one for the tetrahedral sites and one for the octahedral sites) to describe the local energy contributions to the total energy of the crystal.
The DFT formation energies of 66 configurations were used as training data while the energies of the remaining 63 orderings were used to test the models. The predictions of the neural network and linear regression for this system are shown in Fig. 7. Both models were only trained with local pair cluster correlations having lengths less than 10 Å. The root mean square error over the training data set is 7 meV/f.u. for the neural network, while a regression model with the same clusters had an error of 65 meV/f.u. The models were tested on a holdout set of 63 formation energies, resulting in a 36 meV/f.u. error for the neural network and an 89 meV/f.u. error for the linear regression model. The maximum training (testing) errors for the neural network and regression are 82 (553) and 331 (570) meV/f.u., respectively. Remarkably, as seen in Fig. 7b, the shape of the convex hull reproduced with the neural network model is almost identical to that predicted with the DFT calculations, while the linear regression model shown in Fig. 7a struggles to reproduce the ground states. The training error of the neural network is roughly a tenth that of an equivalent cluster expansion model with only pair interactions. This is especially remarkable since the neural net input feature vector only has information about pair cluster correlations. The linear regression model can be greatly improved by adding multibody clusters. The high quality of the neural network fit using only pair interactions likely stems from the fact that the contributions from the multibody interactions can be approximated within the neural network through nonlinearities in the activation function and the dense connectivity of the layers.
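The error measures and the convex hull comparison used above can be computed with a few short helpers. The sketch below is illustrative only: the data points are hypothetical, the helper names are introduced here, and the lower hull is built with Andrew's monotone chain algorithm, which is one common choice rather than the procedure used in this work.

```python
import math

def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference))

def max_error(predicted, reference):
    """Largest absolute deviation."""
    return max(abs(p - r) for p, r in zip(predicted, reference))

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull of (composition, formation energy) points."""
    lowest = {}
    for x, e in points:  # keep only the lowest energy at each composition
        lowest[x] = min(e, lowest.get(x, e))
    hull = []
    for p in sorted(lowest.items()):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()  # the previous point lies on or above the new hull segment
        hull.append(p)
    return hull

# Hypothetical (composition, formation energy) pairs
points = [(0, 0), (0.25, -0.2), (0.5, -1.0), (0.5, -0.4), (0.75, -0.3), (1, 0)]
print(lower_hull(points))
```

Comparing the hull vertices obtained from model energies with those obtained from reference energies gives a direct test of whether a model reproduces the ground states, which a low root mean square error alone does not guarantee.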
Discussion
We have shown how neural networks can be implemented to describe the formation energy of a multicomponent crystal. Similar to cluster expansion Hamiltonians, it can be generalized to describe any scalar property of a multicomponent crystal, such as its formation energy or volume, as a function of configurational degrees of freedom. The approach relies on local variants of the alloy correlation functions introduced by Sanchez and De Fontaine,^{58,59} which are expressed in terms of site occupation variables that track the chemical occupants at each crystal site. The site-centric correlation functions serve as elements of the input feature vector of the neural network assigned to each symmetrically distinct site within the parent crystal. They are defined in a way to ensure their invariance to any symmetry operation of the undecorated parent crystal structure. The local features are, therefore, guaranteed to have the same value across all local orderings that are related by a symmetry of the underlying crystal.
Neural networks as a function of the local correlation functions can be viewed as nonlinear extensions of the cluster expansion formalized by Sanchez et al.^{22} As such, they should enable a more rapid convergence than traditional cluster expansions, with contributions from multibody interactions approximated to some degree with nonlinear dependencies on correlations belonging to smaller subclusters (e.g., point and pair clusters). While linear cluster expansions have been augmented by nonlinear functions in the past,^{36,68} the nonlinear terms have predefined functional forms and usually only depend on a global property, such as the concentration of the solid. The present approach relaxes linearity on all local correlation functions and does not presuppose a functional form.
The approach presented here is not limited to neural-network-based tools. Alternative machine learning models such as Gaussian process regression can also be used to estimate site-based energies. In this method, the site-based energy can be interpolated using the similarity of an arbitrary local ordering to the points in the training data set. The similarity is estimated using the kernel trick, while comparing the values of the local symmetry-adapted cluster functions. The method is similar in spirit to the Gaussian approximation potentials.^{17}
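The kernel-based interpolation can be sketched as follows. This is a simplified illustration, not the Gaussian approximation potential scheme: it assumes a squared-exponential kernel, hypothetical feature vectors of local correlation functions, and, for simplicity, that reference site energies are directly available as training targets (in practice only total energies are known). Only the posterior mean is computed.

```python
import math

def rbf(x, y, length=1.0):
    """Squared-exponential kernel measuring the similarity of two feature vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * length ** 2))

def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def gp_fit(features, targets, noise=1e-8, length=1.0):
    """Weights alpha = (K + noise * I)^(-1) y of the Gaussian process mean."""
    kernel = [
        [rbf(a, b, length) + (noise if i == j else 0.0) for j, b in enumerate(features)]
        for i, a in enumerate(features)
    ]
    return solve(kernel, targets)

def gp_predict(features, alpha, x_new, length=1.0):
    """Posterior mean: similarity-weighted sum over the training points."""
    return sum(a * rbf(x, x_new, length) for a, x in zip(alpha, features))

X = [[1.0, 1.0], [0.0, -1.0], [-1.0, 0.5]]   # hypothetical local correlation features
y = [-0.2, 0.1, 0.3]                         # hypothetical site energies
alpha = gp_fit(X, y)
print(gp_predict(X, alpha, [0.5, 0.0]))
```

With a small noise term, the prediction reproduces the training targets at the training points and interpolates smoothly between them.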
A cluster expansion has a local spatial dependence when it is truncated. Similarly, a neural-network model of alloy properties will also have a local spatial dependence if the feature vector of site-centric correlation functions is restricted to short-ranged and compact clusters. The scalar properties of some materials, however, may have contributions from long-range interactions that cannot be neglected. These include strain effects, which are especially important in spatially inhomogeneous crystals,^{68,69} and electrostatic interactions in ionic crystals. Neural-network alloy Hamiltonians can be adapted to account for long-range interactions by adding long-range descriptors to the local features. These could include the overall alloy composition and long-wavelength Fourier modes of the composition profile.
While a model of the configurational energy should be quantitatively accurate, it must also reproduce important qualitative features including the first-principles predicted ground states. Quadratic programming methods were recently introduced by Huang et al.^{57} to enforce ground state constraints as part of the regression scheme to determine cluster expansion interaction coefficients. These included constraints that enforce a positive distance from the convex hull for metastable configurations and negative values for configurations on the hull. Similar constraints can be imposed as part of the construction of neural network models of the configurational energy.
As described by Huang et al.,^{57} both the metastability constraint and the constraints for stable configurations can be summarized as:
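As an illustration of such long-range descriptors, the sketch below computes the magnitudes of the lowest Fourier modes of a one-dimensional composition profile. The one-dimensional setting, the use of mode magnitudes, and the function names are assumptions made for illustration.

```python
import cmath

def fourier_modes(profile, n_modes):
    """Discrete Fourier coefficients c_k = (1/N) sum_j p_j exp(-2 pi i k j / N)."""
    n = len(profile)
    return [
        sum(p * cmath.exp(-2j * cmath.pi * k * j / n) for j, p in enumerate(profile)) / n
        for k in range(n_modes)
    ]

def long_range_descriptors(profile, n_modes=3):
    """Mode magnitudes; the k = 0 entry is the overall composition."""
    return [abs(c) for c in fourier_modes(profile, n_modes)]

# Site occupations (1 = occupied, 0 = vacant) along one direction of a supercell
profile = [1, 1, 1, 1, 0, 0, 0, 0]
print(long_range_descriptors(profile))
```

Taking the magnitude of each mode discards its phase, so the descriptors are unchanged when the whole profile is translated along the chain.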
where c(σ) has the form:
for metastable configurations, with the sum being over all the convex hull points H, and for stable configurations:
where the sum extends over all the configurations on the hull, except the configuration, \(\vec \sigma\). The loss function of Eq. (10) can be minimized subject to the ground state constraints, Eqs. (13)–(15), with the help of Lagrange multipliers:
where the Lagrange multipliers, \(\lambda _{\vec \sigma }\), are required to be positive. Neural networks can then be constructed using standard backpropagation techniques, with the updates of the Lagrange multipliers performed with projected gradients.
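The projected-gradient update of the multipliers can be illustrated on a toy problem that involves no neural network. The sketch below assumes a single inequality constraint written as c(x) ≤ 0, a sign convention chosen here for illustration, and minimizes (x − 2)² subject to x − 1 ≤ 0. The model parameter is updated by gradient descent on the Lagrangian, while the multiplier is updated by gradient ascent and then projected back onto non-negative values.

```python
def constrained_minimize(steps=5000, learning_rate=0.05):
    """Minimize (x - 2)^2 subject to x - 1 <= 0 with a projected multiplier update."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Descent step on the Lagrangian (x - 2)^2 + lam * (x - 1) with respect to x
        x -= learning_rate * (2.0 * (x - 2.0) + lam)
        # Ascent step on the multiplier, followed by projection onto lam >= 0
        lam = max(0.0, lam + learning_rate * (x - 1.0))
    return x, lam

x_opt, lam_opt = constrained_minimize()
print(x_opt, lam_opt)
```

The unconstrained minimum at x = 2 violates the constraint, so the iteration settles on the boundary x = 1 with a strictly positive multiplier. In the neural network setting, the descent step would be replaced by a backpropagation update of the weights and biases, with one multiplier per constrained configuration.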
Methods
Local cluster functions around each site were calculated with the CASM (a Clusters Approach to Statistical Mechanics) software package.^{30,70,71,72} All graphs were made with the matplotlib^{73} library. The machine learning tools were implemented with TensorFlow.^{74} The neural network fitting code and cluster expansion parameterization will be released in a future version of CASM.^{70}
Data availability
The data for the synthetic FCC cluster expansion and Li–TiS_{2} are provided as CASM projects in the Supplementary Information. The data points of Fig. 5 are also provided within the same file.
References
 1.
Mueller, T., Kusne, A. G. & Ramprasad, R. Machine learning in materials science: recent progress and emerging applications. Rev. Comput. Chem. 29, 186–273 (2016).
 2.
Ghiringhelli, L. M., Vybiral, J., Levchenko, S. V., Draxl, C. & Scheffler, M. Big data of materials science: critical role of the descriptor. Phys. Rev. Lett. 114, 105503 (2015).
 3.
Ward, L. et al. Including crystal structure attributes in machine learning models of formation energies via Voronoi tessellations. Phys. Rev. B 96, 024104 (2017).
 4.
Seko, A., Hayashi, H., Nakayama, K., Takahashi, A. & Tanaka, I. Representation of compounds for machine-learning prediction of physical properties. Phys. Rev. B 95, 144110 (2017).
 5.
Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
 6.
Huo, H. & Rupp, M. Unified representation for machine learning of molecules and crystals. arXiv preprint arXiv:1704.06439 (2017).
 7.
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
 8.
Fischer, C. C., Tibbetts, K. J., Morgan, D. & Ceder, G. Predicting crystal structure by merging data mining with quantum mechanics. Nat. Mater. 5, 641–646 (2006).
 9.
Jain, A. et al. A high-throughput infrastructure for density functional theory calculations. Comput. Mater. Sci. 50, 2295–2310 (2011).
 10.
Curtarolo, S. et al. AFLOWLIB.ORG: a distributed materials properties repository from high-throughput ab initio calculations. Comput. Mater. Sci. 58, 227–235 (2012).
 11.
Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the Open Quantum Materials Database (OQMD). JOM 65, 1501–1509 (2013).
 12.
Kirklin, S. et al. The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies. npj Comput. Mater. 1, 15010 (2015).
 13.
Ghiringhelli, L. M. Towards efficient data exchange and sharing for big-data driven materials science: Metadata and data formats. npj Comput. Mater. 3, 46 (2017).
 14.
Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
 15.
Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 134, 074106 (2011).
 16.
Behler, J. Constructing high-dimensional neural network potentials: a tutorial review. Int. J. Quantum Chem. 115, 1032–1050 (2015).
 17.
Bartók, A. P., Payne, M. C., Kondor, R. & Csányi, G. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett. 104, 136403 (2010).
 18.
Artrith, N., Urban, A. & Ceder, G. Efficient and accurate machine-learning interpolation of atomic energies in compositions with many species. Phys. Rev. B 96, 014112 (2017).
 19.
Artrith, N. & Urban, A. An implementation of artificial neural-network potentials for atomistic materials simulations: Performance for TiO_{2}. Comput. Mater. Sci. 114, 135–150 (2016).
 20.
Artrith, N., Urban, A. & Ceder, G. Constructing first-principles phase diagrams of amorphous Li_{x}Si using machine-learning-assisted sampling with an evolutionary algorithm. J. Chem. Phys. 148, 241711 (2018).
 21.
Van der Ven, A., Thomas, J. C., Puchala, B. & Natarajan, A. R. First-principles statistical mechanics of multicomponent crystals. Annu. Rev. Mater. Res. 48, 27–55 (2018).
 22.
Sanchez, J., Ducastelle, F. & Gratias, D. Generalized cluster description of multicomponent systems. Phys. A: Stat. Mech. Appl. 128, 334–350 (1984).
 23.
de Fontaine, D. Cluster approach to order–disorder transformations in alloys. Solid State Phys. 47, 33–176 (1994).
 24.
Asta, M., McCormack, R. & de Fontaine, D. Theoretical study of alloy phase stability in the Cd–Mg system. Phys. Rev. B 48, 748–766 (1993).
 25.
Van der Ven, A., Aydinol, M. K., Ceder, G., Kresse, G. & Hafner, J. First-principles investigation of phase stability in Li_{x}CoO_{2}. Phys. Rev. B 58, 2975–2987 (1998).
 26.
van de Walle, A. & Ceder, G. Automating first-principles phase diagram calculations. J. Phase Equilibria 23, 348 (2002).
 27.
Zhou, F., Maxisch, T. & Ceder, G. Configurational electronic entropy and the phase diagram of mixed-valence oxides: The case of Li_{x}FePO_{4}. Phys. Rev. Lett. 97, 155704 (2006).
 28.
Mueller, T. Ab initio determination of structure–property relationships in alloy nanoparticles. Phys. Rev. B 86, 144201 (2012).
 29.
Ravi, C., Panigrahi, B. K., Valsakumar, M. C. & van de Walle, A. First-principles calculation of phase equilibrium of V–Nb, V–Ta, and Nb–Ta alloys. Phys. Rev. B 85, 054202 (2012).
 30.
Puchala, B. & Van der Ven, A. Thermodynamics of the Zr–O system from first-principles calculations. Phys. Rev. B Condens. Matter Mater. Phys. 88, 094108 (2013).
 31.
Natarajan, A. R., Solomon, E. L. S., Puchala, B., Marquis, E. A. & Van der Ven, A. On the early stages of precipitation in dilute Mg–Nd alloys. Acta Mater. 108, 367–379 (2016).
 32.
Natarajan, A. R. & Van der Ven, A. First-principles investigation of phase stability in the Mg–Sc binary alloy. Phys. Rev. B 95, 214107 (2017).
 33.
Goiri, J. G. & Van der Ven, A. Phase and structural stability in Ni–Al systems from first principles. Phys. Rev. B 94, 094111 (2016).
 34.
Hart, G. L. et al. Revisiting the revised Ag–Pt phase diagram. Acta Mater. 124, 325–332 (2017).
 35.
Ducastelle, F. Order and Phase Stability in Alloys. Cohesion and Structure (Elsevier, 1991).
 36.
Ozolinš, V., Wolverton, C. & Zunger, A. Cu–Au, Ag–Au, Cu–Ag, and Ni–Au intermetallics: first-principles study of temperature-composition phase diagrams and structures. Phys. Rev. B 57, 6427 (1998).
 37.
Wolverton, C., Ozolins, V. & Zunger, A. First-principles theory of short-range order in size-mismatched metal alloys: Cu–Au, Cu–Ag, and Ni–Au. Phys. Rev. B 57, 4332–4348 (1998).
 38.
van de Walle, A. & Asta, M. First-principles investigation of perfect and diffuse antiphase boundaries in HCP-based Ti–Al alloys. Metall. Mater. Trans. A 33, 735–741 (2002).
 39.
Ghosh, G., van de Walle, A. & Asta, M. First-principles calculations of the structural and thermodynamic properties of bcc, fcc and hcp solid solutions in the Al–TM (TM = Ti, Zr and Hf) systems: A comparison of cluster-expansion and supercell methods. Acta Mater. 56, 3202–3221 (2008).
 40.
Predith, A., Ceder, G., Wolverton, C., Persson, K. & Mueller, T. Ab initio prediction of ordered ground-state structures in ZrO_{2}–Y_{2}O_{3}. Phys. Rev. B 77, 144104 (2008).
 41.
Seko, A., Koyama, Y. & Tanaka, I. Cluster expansion method for multicomponent systems based on optimal selection of structures for density-functional theory calculations. Phys. Rev. B 80, 165112 (2009).
 42.
Kim, H. et al. Structural order–disorder transitions and phonon conductivity of partially filled skutterudites. Phys. Rev. Lett. 105, 265901 (2010).
 43.
Cao, L. & Mueller, T. Rational design of Pt_{3}Ni surface structures for the oxygen reduction reaction. J. Phys. Chem. C 119, 17735–17747 (2015).
 44.
Decolvenaere, E., Gordon, M. J. & Van der Ven, A. Testing predictions from density functional theory at finite temperatures: β-like ground states in CoPt. Phys. Rev. B 92 (2015).
 45.
Natarajan, A. R., Thomas, J. C., Puchala, B. & Van der Ven, A. Symmetry-adapted order parameters and free energies for solids undergoing order–disorder phase transitions. Phys. Rev. B 96, 134204 (2017).
 46.
Van der Ven, A., Ceder, G., Asta, M. & Tepesch, P. D. First-principles theory of ionic diffusion with nondilute carriers. Phys. Rev. B 64, 184307 (2001).
 47.
Van der Ven, A. & Ceder, G. First principles calculation of the interdiffusion coefficient in binary alloys. Phys. Rev. Lett. 94, 045901 (2005).
 48.
Van der Ven, A., Thomas, J., Xu, Q., Swoboda, B. & Morgan, D. Nondilute diffusion from first principles: Li diffusion in Li_{x}TiS_{2}. Phys. Rev. B 78, 104306 (2008).
 49.
Van der Ven, A., Yu, H. C., Ceder, G. & Thornton, K. Vacancy mediated substitutional diffusion in binary crystalline solids. Prog. Mater. Sci. 55, 61–105 (2010).
 50.
Xu, Q. & Van der Ven, A. Atomic transport in ordered compounds mediated by local disorder: diffusion in B2-Ni_{x}Al_{1−x}. Phys. Rev. B 81, 064303 (2010).
 51.
Bhattacharya, J. & Van der Ven, A. First-principles study of competing mechanisms of nondilute Li diffusion in spinel Li_{x}TiS_{2}. Phys. Rev. B 83, 144302 (2011).
 52.
Van der Ven, A., Bhattacharya, J. & Belak, A. A. Understanding Li diffusion in Li-intercalation compounds. Acc. Chem. Res. 46, 1216–1225 (2013).
 53.
Hart, G. L. W., Blum, V., Walorski, M. J. & Zunger, A. Evolutionary approach for determining first-principles Hamiltonians. Nat. Mater. 4, 391–394 (2005).
 54.
Nelson, L. J., Ozolinš, V., Reese, C. S., Zhou, F. & Hart, G. L. W. Cluster expansion made easy with Bayesian compressive sensing. Phys. Rev. B 88, 155105 (2013).
 55.
Mueller, T. & Ceder, G. Bayesian approach to cluster expansions. Phys. Rev. B 80, 024103 (2009).
 56.
Huang, W. et al. Finding and proving the exact ground state of a generalized Ising model by convex optimization and MAX-SAT. Phys. Rev. B 94, 134424 (2016).
 57.
Huang, W. et al. Construction of ground-state preserving sparse lattice models for predictive materials simulations. npj Comput. Mater. 3, 30 (2017).
 58.
Sanchez, J. M. & De Fontaine, D. The fcc Ising model in the cluster variation approximation. Phys. Rev. B 17, 2926 (1978).
 59.
Sanchez, J. M. & De Fontaine, D. Ising model phase-diagram calculations in the fcc lattice with first- and second-neighbor interactions. Phys. Rev. B 25, 1759 (1982).
 60.
Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 807–814 (2010).
 61.
Montavon, G., Orr, G. B. & Müller, K.-R. (eds). Neural networks: tricks of the trade. In Lecture Notes in Computer Science, Vol. 7700 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012).
 62.
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
 63.
Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256 (2010).
 64.
Kingma, D. P. & Ba, J. ADAM: A Method for Stochastic Optimization (2015).
 65.
Inden, G. Atomic ordering. In Phase Transformations in Materials (ed. Kostorz, G.) 519–582 (Wiley-VCH, 2001).
 66.
Sanchez, J. M. Cluster expansion and the configurational theory of alloys. Phys. Rev. B 81, 224202 (2010).
 67.
Kolli, S. K. & Van der Ven, A. First principles study of spinel MgTiS_{2} as a cathode material. Chem. Mater. 30, 2346–2442 (2018).
 68.
Laks, D. P., Ferreira, L. & Zunger, A. Efficient cluster expansion for substitutional systems. Phys. Rev. B 46, 12587–12605 (1992).
 69.
Khachaturyan, A. G. Theory of Structural Transformations in Solids. (John Wiley & Sons Inc., New York, 1983).
 70.
CASM Developers. CASM: A Clusters Approach to Statistical Mechanics (2016).
 71.
Thomas, J. C. & Van der Ven, A. Finite-temperature properties of strongly anharmonic and mechanically unstable crystal phases from first principles. Phys. Rev. B 88, 214111 (2013).
 72.
Van der Ven, A., Thomas, J. C., Xu, Q. & Bhattacharya, J. Linking the electronic structure of solids to their thermodynamic and kinetic properties. Math. Comput. Simul. 80, 1393–1410 (2010).
 73.
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
 74.
Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016).
Acknowledgements
The authors are grateful to Sanjeev Kolli for providing the first-principles data of Li-vacancy orderings in LiTiS_{2}. This work was carried out under an NSF DMREF grant: DMR-1436154 “DMREF: Integrated Computational Framework for Designing Dynamically Controlled Alloy-Oxide Heterostructure”. Computing resources were provided by the Center for Scientific Computing at the CNSI and MRL under NSF CNS-1725797 and NSF MRSEC (DMR-1720256).
Author information
Contributions
A.R.N. and A.V.d.V. conceived and designed the project. A.R.N. implemented the formalism and performed the calculations. Both authors were involved in the writing of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Natarajan, A.R., Van der Ven, A. Machine-learning the configurational energy of multicomponent crystalline solids. npj Comput Mater 4, 56 (2018). https://doi.org/10.1038/s41524-018-0110-y