Towards Accurate Prediction of Configurational Disorder Properties in Materials using Graph Neural Networks

The prediction of configurational disorder properties, such as configurational entropy and order-disorder phase transition temperature, of compound materials relies on efficient and accurate evaluations of configurational energies. Previous cluster expansion methods are not applicable to configurationally-complex material systems, including those with atomic distortions and long-range orders. In this work, we propose to leverage the versatile expressive capabilities of graph neural networks (GNNs) for efficient evaluations of configurational energies and present a workflow combining attention-based GNNs and Monte Carlo simulations to calculate the disorder properties. Using the dataset of face-centered tetragonal gold copper without and with local atomic distortions as an example, we demonstrate that the proposed data-driven framework enables the prediction of phase transition temperatures close to experimental values. We also elucidate that the variance of the energy deviations among configurations controls the prediction accuracy of disorder properties and can be used as the target loss function when training and selecting the GNN models. The work serves as a fundamental step toward a new data-driven paradigm for the accelerated design of configurationally-complex functional material systems.


Introduction
Disordered materials have attracted much attention in the community in recent years due to their exotic structural and electronic properties such as Anderson localization and Mott-like conduction [1][2][3], novel phonon scattering channels and lattice dynamics [4][5][6], enhanced ductility and mechanical strength over a wide temperature range in high-entropy alloys [7][8][9], and regulated electronic states and atomic sites for catalysts [10,11], endowing them with promising applications in electronic devices and energy materials.Depending on its chemical nature, disorder effects in materials can be classified into structural disorder characterized by disrupted chemical bonding network (such as vacancies, dislocations, and dangling bonds) [12], and configurational (compositional) disorder characterized by crystallographic sites being occupied by irregular atomic species [13].In this work, we focus on the latter type of disorder.
To numerically access the configurational disorder properties, such as the orderdisorder phase transition temperature [14,15] and the configurational entropy [16,17], Monte Carlo (MC) simulations are often carried out with Metropolis sampling [18] or Wang-Landau sampling [19,20].With the Metropolis sampling, the free energy is obtained by performing thermodynamic integration numerically using the average energies at each temperature.As for the Wang-Landau sampling, the density of states, instead of the average energies, is estimated, and the configurational entropy and the heat capacity at any arbitrary temperature can then be evaluated.However, both sampling methods require evaluating a large number of supercell configurational energies efficiently and accurately in order to achieve convergence, thus not applicable to first-principles methods especially when the cell size is very large.
One commonly used approach to this problem is the cluster expansion (CE) method [21][22][23][24][25][26], where the cell is decomposed into different atomic clusters up to a cutoff size, and the total energy of the cell is expanded into the effective cluster interactions of these clusters.This method has been applied to evaluate the total energies of different configurations and further to calculate the disorder properties effectively.However, it suffers from several limitations.Due to the limited cluster size in practice, it cannot capture long-range orders in materials, which may affect the phase stability and electronic structures [27].Besides, the definitions of clusters strongly depend on the atomic positions, restricting this method from adapting to lattice distortions and local atomic displacements induced by atomic relaxations or thermal effects [28].
Since 2018, graph neural networks (GNNs) have been applied to studying the structure-property correlation in complex solid-state materials.Instead of relying on human-selected descriptors, GNNs can autonomously learn latent representations of materials and make fast and accurate atom-, bond-, and material-level predictions.[29][30][31] Therefore, in this work, we propose to employ GNNs to evaluate the configurational energies and to access the disorder properties in configurationally-complex compound materials accurately, because of their high representation capability and versatile adaptability to realistic simulation scenarios, including lattice distortions, atomic displacements, and various types of defects [30,31].Especially, we construct attention-based GNN models from Transformer neural networks [32], leveraging masked self-attentional layers to obtain configurational energies efficiently.The model is trained on the face-centered tetragonal (fct) gold copper (Au x Cu 1−x ) alloy dataset obtained from density functional theory (DFT) calculations.The trained model can well reproduce the DFT configurational energies, with a mean absolute error (MAE) of 2.76 meV/atom, which eventually leads to the prediction of order-disorder phase transition temperature that is comparable to experimental observation.When random atomic displacements are introduced, which is beyond the capability of the CE method, the GNN model can still evaluate the configurational energies accurately (with MAE being 5.02 meV/atom).The predicted phase transition temperature is slightly lower than the undistorted case, suggesting structural disorder can enhance the configurational disorder.Furthermore, we reveal the connections between the variance of the energy deviations among configurations and the accuracy of configurational entropy and heat capacity predictions, providing new guidance on future data-driven studies of the configurational disorder properties in materials.
Fig. 1 A schematic plot of the workflow to use GNN to calculate the disorder properties of compound materials.The input crystal structure is converted to a graph, whose nodes and edges represent the atomic species and the interatomic distances, respectively.Through several attention-based convolution layers, where the node features aggregate and interact with each other, global features are extracted and further processed by linear layers to predict the energy.The well-trained GNN model is subsequently utilized in Monte Carlo simulations to acquire the final disorder-related properties, such as the configurational density of states, configurational entropy, and heat capacity.

Pristine Au x Cu 1−x Structure Dataset
We choose fct AuCu to construct our dataset.Although both experimental and numerical results on the configurational entropy and the phase transition temperature (683 K) were reported [25,33], those results are based on the face-centered cubic (fcc) structure, which is higher in energy than the ground-state fct structure by 0.016 eV per formula unit and can only be stable under pressure.However, these two structures differ only in the lattice constants (∆a/a = 6.9% and ∆c/c = 15%), and thus we expect their disorder properties, especially the phase transition temperature, to be similar.Using first-principles calculations, we construct our dataset for pristine Au  .The MAE of the best GNN model built upon Transformer convolution layers on the testing set of our fct Au x Cu 1−x dataset is 2.76 meV/atom, similar to previously reported cluster expansion methods on fcc-structure AuCu (4.49 meV/atom) [25].The comparison between the predicted energy and the DFT energy is shown in Fig. 2(b), indicating the capability of our attention-based GNN model to evaluate the configurational energies accurately for all compositions of Au x Cu 1−x .
We compare this result with other commonly used GNNs with attention mechanism, such as the graph attention network [34,35], and those without attention mechanism, including the crystal graph neural network (CGNN) [36] and the edgeconditioned neural network (ECNN) [37,38] (details about these convolution layers can be found in Supplementary Note S1).The MAEs of the best GNN model built for each type of convolution layers are summarized in Tab. 1, suggesting that GNN models with the attention mechanism perform better than those lacking this mechanism when evaluating configurational energies.In attention-based neural networks, node features are updated by summing up adjacent node features weighted by the attention coefficients, calculated from the node features and the edge features and thus containing the information on the similarity between the two nodes.Specifically in Transformer convolution layers, the central node and the adjacent nodes are considered as queries and keys respectively, and the overlap between the queries and the keys reflects the similarity between the two nodes (atoms).This attention mechanism updates the central node features by a linear combination of adjacent node features weighted by the overlap between the query vector and key vectors in the latent space, thus allowing the network to effectively capture the chemical distinctions between neighboring atoms and increasing its efficacy in predicting properties related to crystal structure.Besides, since the attention coefficients depend on the node features, they are dynamic and can vary across different convolution layers, as compared to the static weight matrices that are fixed across the convolution layers in models without the attention mechanism.Therefore, we expect that in general models with the attention mechanism are better than those without it.Finally, we also observe the superior performance of the ECNN model compared to the CGNN model, possibly because the ECNN model adopts a multilayer perceptron layer to process edge features while the CGNN model employs only a linear transformation on edge features.This distinction makes the former more adaptable in evaluating configurational energies.1 The MAE (meV/atom) and the variance (meV 2 /atom) on the pristine AuxCu 1−x datasets using different types of graph neural network models.

GAT
We then apply the optimal GNN model based on Transformer attention mechanism to MC simulations to obtain the configurational properties of stoichiometric AuCu systems.Because the configurational space is complicated for such large supercells, we choose a bin size of 0.2 eV in the Wang-Landau sampling method to expedite convergence.Since the MAE of the trained GNN model is around 2.76 meV/atom, we anticipate that each configuration has an energy deviation of 0.6 eV/cell on average, justifying our choice of bin size.With this choice of the bin size, the MC simulation takes around 4 × 10 7 steps to converge, beyond the capability of DFT methods.We show the intermediate and final density of states that are fitted to a Gaussian distribution in Fig. 2(c).The density of states is normalized such that the sum of the density of states equals to one; while the normalization constant is the total number of configurations of stoichiometric AuCu in a 200-atom supercell, i.e., Ω max = 100 200 .The peak of the density of states is higher in energy than the ground state by 3.2 eV/cell.From the density of states, we can calculate the configurational entropy and the heat capacity according to Eq. 4, shown in Fig. 2(d).The position of the heat capacity peak, indicating the order-disorder phase transition temperature, is at 870 K (for fct AuCu), similar to the experimental values.As temperature further increases, the configurational entropy gradually approaches the theoretical limit for the fully disordered phase (within a 200-atom supercell) S max = k B ln Ω max = 1.17 × 10 −2 eV/K.
We also note that while training and selecting the best GNN model for MC simulations, batch normalization features can affect the predictions of disorder properties significantly and should be disallowed.When training GNN models, configurations are grouped into batches of a reasonable size to accelerate the training process.In general, too small batches can lead to instability of gradients and optimization process, while too large batches can lead to huge memory requirements and possible overfitting.However, in MC simulations, energy evaluations are performed on a single configuration consecutively.When using trained models with batch normalized features to predict the energy of one configuration, incorrect predictions may arise because in this case the batch mean and batch variance are highly influenced by the specific configuration in the batch, but not reflecting the distribution of the features of the whole dataset.In Supplementary Note S2, we show benchmark calculations for the GNN model with batch normalization features.By evaluating the configurations in batches (containing 58 configurations), the MAE of the best model is 3.52 meV/atom, but when evaluated consecutively, the MAE loss for the model increases drastically to 2.76 eV/atom, and the corresponding prediction on the order-disorder phase transition is incorrect (183 K).

Distorted Au x Cu 1−x Structure Dataset
To showcase the adaptability of GNN models to realistic simulation scenarios, such as atomic relaxations where cluster expansion methods face convergence challenges, we constructed another dataset containing 4500 configurations with random atomic displacements.The maximum amount of displacement for each atom is chosen to be 0.35 Å.
The MAE on the testing set of this dataset containing distorted structures is 6.43 meV/atom, and the comparison between the DFT and GNN energies is shown in Fig. 3(a).Larger deviations from DFT energies are primarily observed in configurations with lower Au concentration.We further apply the trained model to calculate the disorder properties, shown in Fig. 3(b).We generate one supercell with random atomic displacements and apply it for MC simulations while keeping all the atomic positions fixed.The calculated phase transition temperature is 788 K, lower than that for the pristine structure, suggesting that structural disorder can enhance the configurational disorder; the introduction of structural disorder increases the likelihood of the material being in the configurational disordered phase.
A potential avenue for enhancing the performance of GNN models for energy predictions of disordered systems with distorted structures lies in the modification of how crystal structures are represented in the form of graphs.In our current method, the nodes represent the atomic species and the edges contain only bond distance information.But other local chemical information can also be included in the graphs, such as the directional information (for instance, bond directions expanded in spherical harmonic coefficients) and the multiplet interactions that can be included in hypergraphs [39] or higher-order graphs [40].These graph structures may capture the much more complex chemical environment of each atom after introducing random atomic displacements.An alternative avenue for improvement is to modify the network structure.For example, the Au concentration can be treated as a global feature for each configuration.This feature could be concatenated with the global features from the nodes and jointly passed through the linear layers to determine the total energy of the configuration.Despite room for these improvements, our current model already suffices for the effective predictions of disorder properties of complex alloy systems with distorted structures.

Discussions
Next, we would like to draw the reader's attention to the correlation between disorder properties and the optimization process of GNN models.From Eq. 4, it can be shown that neither the heat capacity nor the configurational entropy are affected by an overall energy shift ∆E to all configurations (see Supplementary Note S3).Based on this observation, we postulate that when training and selecting the best GNN model to predict configurational entropy and heat capacity, the variance of the differences between E DFT and E GNN among all configurations, instead of the mean of the energy differences, controls the accuracy of predictions and thus can be chosen as the target quantity for optimization.
In the following, we denote Êi as the energy predicted by GNN models and E i as the "ground-truth" energy obtained by DFT for configuration i, and their difference as e i = Êi − E i (in the following, thermodynamic quantities with hats are calculated from Êi , and those without hats are from E i ).We assume that those energy differences are independent and identically distributed random variables, following the normal distribution N (µ, σ 2 ), with mean µ and variance σ 2 ; thus we have E[e i ] = µ, E[e 2 i ] = µ 2 + σ 2 , and E[e i e j ] = µ 2 for all i and j ̸ = i.
To derive the expected deviations of configurational entropy and the heat capacity due to e i 's, we first focus on the difference of the log-partition-function, defined as ∆(ln Z) ≡ ln Ẑ − ln Z = ln i e −β Êi − ln i e −βEi .By assuming that the errors e i are smaller than the energies E i , we can perform Taylor expansions on the logsum-exp function ln i e −β Êi = ln i e −β(Ei+ei) with respect to e i .As shown in the Supplementary Note S3, after taking the expectation value with respect to the random .variables e i , the first-order term vanishes, and we have where α = i e −2βE i ( i e −βE i ) 2 .Based on this result, according to Eq. ( 4), the expected deviation of the configurational entropy due to the random energy deviations e i , defined as where we use the fact that α, though containing β, does not depend on β explicitly.Similarly, the expected deviation of the heat capacity is given by The derivation details can be found in Supplementary Note S3.Therefore, both E[∆S] and E[∆C v ] are linearly dependent on the variance σ 2 , but independent on the mean µ, suggesting that the variance σ 2 of the energy deviations e i among the configurations must be minimized to accurately predict configurational entropy and heat capacity.
To numerically verify this, we train our GNN model that is based on Transformer convolution layers on the pristine Au x Cu 1−x dataset, using the variance as the target loss function.The variance of the optimal model on the testing set is 15.82 meV 2 /atom, but the MAE is 4.89 eV/atom; as a comparison, the variance of our previous model using MSE as the loss function is 13.2 meV 2 /atom and the MAE loss is 2.76 meV/atom.In Fig. 3(a), we show the comparison between the configurational energies predicted by GNN and DFT, suggesting a global shift of energies for all configurations.We then apply this model to MC simulations, and the predicted ordering-disorder phase transition temperature is 837 K, in good agreement with previous results using MSE as the loss function (870 K).Therefore, as long as the variance is minimized, the predicted configurational entropy and the heat capacity are reliable.
In conclusion, we demonstrate the capability of GNN with an attention mechanism to accurately predict the configurational disorder properties in compound materials, including configurational entropy and order-disorder phase transition temperature.Using the face-centered tetragonal Au x Cu 1−x dataset as an example, the predicted phase transition temperature from attention-based GNN models and MC simulations is close to that obtained in experiments.Even when random atomic displacements are introduced, reliable predictions are still achievable.Furthermore, we show that the variance of the configurational energy deviations between GNN and DFT controls the prediction accuracy of these disorder-related properties.These results provide new perspectives on the efficient and accurate evaluation of disordered properties in configurationally complex materials.This contributes significantly to future research focused on the phase stability of such materials and advances the exploration of highentropy alloys and related material systems.

Density Functional Theory Calculations
The first-principles calculations are performed based on DFT, as implemented in Vienna Ab initio Simulation Package [41,42].We use the projector augmented wave pseudopotentials [43,44], where 5d and 6s electrons are treated as valence electrons for Au and 3d and 4s electrons are treated as valence electrons for Cu.We used the GGA-PBE functional for all calculations [45] and 550 eV for the kinetic energy cutoff of the plane-wave basis sets.The k-point grid density is taken to be 0.03 2π/ Å, and the energy convergence threshold is 10 −7 eV.
After variable-cell relaxation, the ground state structure of stoichiometric AuCu is the face-centered-tetragonal structure.The relaxed lattice constants are a = 2.86 Å and c = 3.55 Å.From the relaxed unit cell, we generate the 5 × 5 × 4 supercell, such that the lattice constant along each direction is close to 15 Å.Using the supercell, we generate the Au x Cu 1−x dataset covering the whole concentration range 0 < x < 1.

Constructing and Training GNN models
In each layer of a GNN, the node features are updated by the features of neighboring nodes according to x i = ϕ(x i , x N (i) ), where x i is the feature of the ith node and N (i) is the adjacent nodes of i.This message aggregation process is repeated multiple times as the convolution layers are stacked, allowing GNNs to capture the long-range interactions in the crystal.Depending on the aggregate function ϕ, various types of GNNs are proposed.In this work, we choose the attention-based Transformer network [32] to construct our GNN and generate the main results in the manuscript.
For each layer, we use global mean pooling to extract the global graph feature from all nodes, as we choose the total energy per atom as the target quantity to predict.These global features are added together, forming shortcut connections that allow the gradient to flow more easily during training (benchmark results using only the global features from the last convolution layer can be found in the SM).Finally, the global features are passed to two fully-connected linear layers to predict the total energy per atom.
For training the GNN models, we shuffle the dataset and divide it into the training set, the validation set, and the testing set, allocating them in a 60:20:20 ratio.Unless otherwise stated, the target loss function is the MSE.The validation set is used to prevent overfitting on the training set.While training the GNN models, we keep track of the validation loss and use the model with the smallest validation loss as our final model.
To choose the optimal set of hyperparameters, we use the Bayesian optimization method as implemented in Optuna [46], which calculates the expected improvement of the current set of hyperparameters based on results from previous trial runs using Tree-structured Parzen Estimator method [47].The convergence is achieved when no significantly different hyperparameter values are proposed for consecutive 10 trial runs.The hyperparameter set includes the number of convolution layers, the number of hidden channels, the number of attention heads (used only in GNN models with attention mechanisms), learning rate, weight decay, and batch size.

Monte Carlo Simulations
MC simulations with the Wang-Landau sampling method [19] are then carried out to obtain the density of states g(E), with the configurational energies evaluated from the trained GNN models.The flatness criterion is achieved when the minimum value of the histogram is no smaller than 80% of the mean value, and the convergence is achieved when the modification factor satisfies ln f < 10 −7 .Since the configurational space of the supercell is complicated, at least around 10 8 configurations are necessary for the MC simulation to converge, beyond the capabilities of first-principles methods.
From the density of states, the configurational entropy and the heat capacity at an arbitrary temperature are calculated as where the partition function is Z = g(E)e −βE dE, the free energy is F = −k B T ln Z, the expected value for quantity Q is ⟨Q⟩ = 1 Z g(E)Q(E)e −βE dE, and β = 1 k B T .

Fig. 2
Fig. 2 The performance of the Transformer-based GNN model on the pristine AuxCu 1−x structure dataset.(a) The histogram of the number of configurations for each Au concentration.(b) The comparison between the configurational energies (per atom) predicted from DFT (E DFT ) and the optimal GNN model (E GNN ).The color of each point represents the concentration x of Au atoms in the configuration.(c) The calculated normalized density of states of pristine AuCu during the MC simulation process (per cell).The total number of MC steps is 4 × 10 7 .(d) The calculated configurational entropy (black curve) and the heat capacity (red curve) at different temperatures.

Fig. 3
Fig. 3 The performance on the dataset with random atomic displacements, using MSE as the target loss function.(a) The comparison between the configurational energies (per atom) obtained from DFT (E DFT ) and the optimal GNN model (E GNN ).(b) The calculated configurational entropy (black curve) and the heat capacity (red curve) at different temperatures.(c, d) Same as (a, b), but on a dataset without displacements and using variance as the target loss function.