Introduction

Graphs are a powerful non-Euclidean data structure for representing entities (nodes) and their relationships (edges)1,2. Graph neural networks (GNNs)3,4 have immense potential for modeling complex phenomena. Common applications of GNNs include community detection and link prediction in social networks5,6, functional time series on brain structures7, gene regulatory networks8, information flow through telecommunications networks9, and property prediction for molecular and solid materials10. From a quantum chemistry point of view, GNNs provide a unique opportunity to predict properties of solids, molecules, and proteins much faster than by solving the computationally expensive Schrödinger equation11,12,13,14.

There has been rapid progress in the development of GNN architectures for predicting material properties, including SchNet10, Crystal Graph Convolutional Neural Networks (CGCNN)15, MatErials Graph Network (MEGNet)16, improved Crystal Graph Convolutional Neural Networks (iCGCNN)17, OrbNet18, and similar variants19,20,21,22,23,24,25,26,27,28,29,30,31. This family of models represents a molecule or crystalline material as a graph with one node for each constituent atom and edges corresponding to interatomic bonds. A common theme is the use of elemental properties as node features and interatomic distances and/or bond valences as edge features. Through multiple layers of graph convolution that update node features based on the local chemical environment, these models can implicitly represent many-body interactions. However, many important material properties (especially electronic properties such as band gaps) are highly sensitive to structural features such as bond angles and local geometric distortions, and these models may not efficiently learn the importance of such many-body interactions. Explicit inclusion of angle-based information has already been shown to improve models with hand-crafted features such as classical force-field inspired descriptors (CFID)32. Recently, there has been growing interest in the explicit incorporation of bond angles and other many-body features17,19,20.

In this work, we use line graph neural networks inspired by those proposed in ref. 6 to develop an alternative way of including angular information and providing high-accuracy models. Briefly, the line graph L(g) is a graph derived from another graph g that describes the connectivity of the edges of g. While the nodes of an atomistic graph correspond to atoms and its edges correspond to bonds, the nodes of an atomistic line graph correspond to interatomic bonds and its edges correspond to bond angles. Our model alternates between graph convolutions on these two graphs, propagating bond angle information through interatomic bond representations to the atom-wise representations and vice versa. Using both the bond distances and the angles in the line graph incorporates finer details of the atomic structure and leads to higher model performance. Our Atomistic Line Graph Neural Network (ALIGNN) models are implemented using the deep graph library (DGL)33, which allows efficient graph construction and neural message passing for different types of graphs. ALIGNN is a part of the Joint Automated Repository for Various Integrated Simulations (JARVIS) infrastructure34. We train ALIGNN models for several crystalline material properties from the JARVIS-density functional theory (DFT)34,35,36,37,38,39,40,41,42,43,44 and Materials Project45 (MP) datasets as well as molecular properties from the QM946 database.

Results and discussion

Atomistic graph representation

ALIGNN performs Edge-gated graph convolution4 message passing updates on both the atomistic bond graph (atoms are nodes, bonds are edges) and its line graph (bonds are nodes, bond pairs with one common atom are edges). The Edge-gated graph convolution variant has the distinct advantage of updating both node and edge features. Because each edge in the bond graph directly corresponds to a node in the line graph, ALIGNN can aggregate features from bond pairs to efficiently update atom and bond representations by alternating between message passing updates on the bond graph and its line graph.

For crystals, we use a periodic 12-nearest-neighbor graph construction, expanded to include edges to all atoms in the neighbor shell of the 12th-nearest neighbor. Each node in the atomistic graph is assigned 9 input node features based on its atomic species: electronegativity, group number, covalent radius, valence electrons, first ionization energy, electron affinity, block, and atomic volume. This feature set is inspired by the CGCNN15 model. The initial edge features are interatomic bond distances, expanded in a radial basis function (RBF) basis with support between 0 and 8 Å for crystals and between 0 and 5 Å for molecules. This undirected graph can be represented as G = (V, E), where V is the set of nodes and E the set of edges, i.e., a collection of pairs (vi, vj) linking vertices vi and vj. G has an associated node feature set H = {h1, …, hN}, where hi is the feature vector associated with node vi.
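
For concreteness, the following is a minimal sketch of this graph construction and RBF edge featurization, assuming pymatgen for the periodic neighbor search and DGL for the graph object. The function names, Gaussian width, and 8 Å search cutoff are illustrative choices, not the reference JARVIS-Tools/ALIGNN implementation.

```python
import numpy as np
import torch
import dgl
from pymatgen.core import Structure


def radial_basis(d, vmin=0.0, vmax=8.0, bins=80):
    """Gaussian RBF expansion of interatomic distances (0-8 Angstrom, 80 bins)."""
    centers = np.linspace(vmin, vmax, bins)
    gamma = 1.0 / (centers[1] - centers[0]) ** 2  # illustrative width choice
    return np.exp(-gamma * (d[:, None] - centers[None, :]) ** 2)


def atomistic_graph(structure: Structure, k: int = 12, search_cutoff: float = 8.0) -> dgl.DGLGraph:
    """Periodic k-nearest-neighbor graph expanded to the full k-th neighbor shell."""
    src, dst, dists = [], [], []
    for i, site in enumerate(structure):
        # assumes the search cutoff captures at least k periodic neighbors
        neighbors = sorted(structure.get_neighbors(site, search_cutoff),
                           key=lambda n: n.nn_distance)
        # the distance of the 12th-nearest neighbor defines the shell radius
        shell = neighbors[min(k, len(neighbors)) - 1].nn_distance
        for n in neighbors:
            if n.nn_distance <= shell + 1e-8:
                src.append(i)
                dst.append(n.index)
                dists.append(n.nn_distance)
    g = dgl.graph((torch.tensor(src), torch.tensor(dst)), num_nodes=len(structure))
    g.edata["e"] = torch.tensor(radial_basis(np.array(dists)), dtype=torch.float32)
    return g
```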

Atomistic line graph representation

The atomistic line graph is derived from the atomistic graph. Each node in the line graph corresponds to an edge in the original atomistic graph; both entities represent interatomic bonds, and in our work they share latent representations. Edges in the line graph correspond to triplets of atoms, i.e., pairs of interatomic bonds. The initial line graph edge features are an RBF expansion of the bond angle cosines, with \(\theta_{ijk} = \arccos\left(\frac{r_{ij} \cdot r_{jk}}{\left|r_{ij}\right|\,\left|r_{jk}\right|}\right)\), where rij and rjk are atomic displacement vectors between atoms i, j, and k. A schematic of an atomistic graph and the corresponding atomistic line graph is shown in Fig. 1.
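
A minimal sketch of this line graph construction and bond-angle featurization is shown below, assuming DGL and a bond graph g that stores per-edge displacement vectors in g.edata["r"]; the feature keys and helper names are illustrative rather than the reference implementation.

```python
import torch
import dgl


def compute_angle_cosine(edges):
    # each line-graph edge joins two bonds that share an atom; flip the first
    # displacement vector so both bonds point away from the shared atom
    r1 = -edges.src["r"]
    r2 = edges.dst["r"]
    cos_theta = torch.nn.functional.cosine_similarity(r1, r2, dim=1)
    return {"cos_theta": cos_theta}


def atomistic_line_graph(g: dgl.DGLGraph) -> dgl.DGLGraph:
    # nodes of L(g) are the bonds of g (shared=True reuses g's edge features);
    # edges of L(g) are bond pairs, i.e., atom triplets
    lg = dgl.line_graph(g, shared=True)
    lg.apply_edges(compute_angle_cosine)
    # an RBF expansion of cos_theta (not shown) gives the initial triplet features t
    return lg
```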

Fig. 1: Schematic showing undirected crystal graph representation and corresponding line graph construction for a SiO4 polyhedron.
figure 1

For simplicity, only Si–O bonds are illustrated. The ALIGNN convolution layer alternates between message passing on the bond graph (left) and its line graph (or bond adjacency graph, right).

Edge gated graph convolution

ALIGNN uses edge-gated graph convolution4 for updating both node and edge features. This convolution is similar to the CGCNN update, except that edge features are only incorporated into normalized edge gates. Furthermore, edge-gated graph convolution reuses the pre-aggregated edge messages to update the edge representations.

Edge-gated graph convolution updates the node representations $h_i^l$ from layer $l$ according to:

$$h_i^{l + 1} = f\left( h_i^l, \left\{ h_j^l \right\}_{j \in N_i} \right)$$
(1)
$$h_i^{l + 1} = h_i^l + \mathrm{SiLU}\left( \mathrm{Norm}\left( W_{\mathrm{src}}^l h_i^l + \sum\limits_{j \in N_i} \hat{e}_{ij}^l \odot W_{\mathrm{dst}}^l h_j^l \right) \right)$$
(2)
$$\hat{e}_{ij}^l = \frac{\sigma\left( e_{ij}^l \right)}{\sum\nolimits_{k \in N_i} \sigma\left( e_{ik}^l \right) + \epsilon}$$
(3)
$$e_{ij}^l = e_{ij}^{l - 1} + \mathrm{SiLU}\left( \mathrm{Norm}\left( A^l h_i^{l - 1} + B^l h_j^{l - 1} + C^l e_{ij}^{l - 1} \right) \right)$$
(4)

The edge messages in Eq. (4) are equivalent to the gating term in the CGCNN update15, which coalesces the weight matrices A, B, and C into a single matrix Wgate acting on the augmented edge representation

$$z_{ij} = h_i \oplus h_j \oplus e_{ij}$$
(5)
$$e_{ij}^l = e_{ij}^{l - 1} + \mathrm{SiLU}\left( \mathrm{Norm}\left( W_{\mathrm{gate}}^l z_{ij}^{l - 1} \right) \right)$$
(6)
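
The following is a minimal sketch of an edge-gated graph convolution layer implementing Eqs. (2)-(4), written with PyTorch and DGL; the class and tensor names are illustrative, and this is a simplified reading of the update rather than the reference ALIGNN code.

```python
import torch
import torch.nn as nn
import dgl
import dgl.function as fn


class EdgeGatedGraphConv(nn.Module):
    """Edge-gated graph convolution (Eqs. 2-4): residual updates of node and edge features."""

    def __init__(self, dim: int):
        super().__init__()
        self.W_src = nn.Linear(dim, dim)
        self.W_dst = nn.Linear(dim, dim)
        self.A = nn.Linear(dim, dim)
        self.B = nn.Linear(dim, dim)
        self.C = nn.Linear(dim, dim)
        self.norm_h = nn.BatchNorm1d(dim)
        self.norm_e = nn.BatchNorm1d(dim)

    def forward(self, g: dgl.DGLGraph, h: torch.Tensor, e: torch.Tensor):
        with g.local_scope():
            # Eq. (4): raw edge messages from source node, destination node, and edge features
            g.ndata["A_h"] = self.A(h)
            g.ndata["B_h"] = self.B(h)
            g.apply_edges(fn.u_add_v("A_h", "B_h", "m"))
            m = g.edata.pop("m") + self.C(e)

            # Eq. (3): sigmoid gates, normalized over the neighbors of each destination node
            g.edata["sigma"] = torch.sigmoid(m)
            g.ndata["Wh"] = self.W_dst(h)
            g.update_all(fn.u_mul_e("Wh", "sigma", "msg"), fn.sum("msg", "agg"))
            g.update_all(fn.copy_e("sigma", "s"), fn.sum("s", "sigma_sum"))
            agg = g.ndata["agg"] / (g.ndata["sigma_sum"] + 1e-6)

            # Eq. (2): residual node update; the residual edge update reuses the messages m
            h_new = h + nn.functional.silu(self.norm_h(self.W_src(h) + agg))
            e_new = e + nn.functional.silu(self.norm_e(m))
            return h_new, e_new
```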

ALIGNN update

One ALIGNN layer composes an edge-gated graph convolution on the bond graph (g) with an edge-gated graph convolution on the line graph (L(g)), as illustrated in Fig. 2. To avoid ambiguity between the node and edge features of the atomistic graph and its line graph, we write atom, bond, and triplet representations as h, e, and t. The line graph convolution produces bond messages m that are propagated to the atomistic graph, which further updates the bond features in combination with atom features h.

$$m^l, t^l = \mathrm{EdgeGatedGraphConv}\left( L(g), e^{l - 1}, t^{l - 1} \right)$$
(7)
$$h^l, e^l = \mathrm{EdgeGatedGraphConv}\left( g, h^{l - 1}, m^l \right)$$
(8)
Fig. 2: Schematic of the ALIGNN layer structure.
figure 2

The ALIGNN layer first performs edge-gated graph convolution on the line graph to update pair and triplet features. The newly updated pair features are propagated to the edges of the direct graph and further updated with the atom features in a second edge-gated graph convolution applied to the direct graph.
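
A minimal sketch of one ALIGNN layer composing the two edge-gated graph convolutions of Eqs. (7)-(8), reusing the EdgeGatedGraphConv sketch above; names are illustrative.

```python
import torch.nn as nn


class ALIGNNConv(nn.Module):
    """One ALIGNN layer: line-graph convolution (Eq. 7) then bond-graph convolution (Eq. 8)."""

    def __init__(self, dim: int):
        super().__init__()
        self.line_graph_conv = EdgeGatedGraphConv(dim)   # updates bonds (e) and triplets (t)
        self.graph_conv = EdgeGatedGraphConv(dim)        # updates atoms (h) and bonds (e)

    def forward(self, g, lg, h, e, t):
        # Eq. (7): convolution on the line graph L(g) produces bond messages m
        m, t = self.line_graph_conv(lg, e, t)
        # Eq. (8): convolution on the bond graph g combines m with the atom features
        h, e = self.graph_conv(g, h, m)
        return h, e, t
```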

Overall model architecture and training

We use N layers of ALIGNN updates followed by M layers of edge-gated graph convolution (GCN) updates on the bond graph. We use Sigmoid Linear Unit (SiLU, also known as Swish) activations instead of rectified linear unit (ReLU) or Softplus activations because SiLU is twice differentiable like Softplus but can yield better empirical performance, like ReLU, on many tasks. After the N + M graph convolution layers, our networks perform global average pooling over nodes and finally predict the target property with a single fully connected regression or classification layer. Table 1 presents the default hyperparameters of the ALIGNN model used to train the models reported in the "Model performance" section. These hyperparameters were selected through a combination of hypothesis-driven experiments and random hyperparameter search, as discussed in detail in the "Methods" section. The "Model analysis" section provides a detailed analysis of the sensitivity of model performance and computational cost to these hyperparameters.
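
A minimal sketch of this overall architecture (N ALIGNN layers, M GCN layers, average pooling, and a linear readout), building on the sketches above; the initial embedding layers that map atom, bond, and angle features to the hidden dimension are omitted for brevity, and the class is illustrative rather than the reference implementation.

```python
import torch.nn as nn
from dgl.nn import AvgPooling


class ALIGNN(nn.Module):
    def __init__(self, dim: int = 256, n_alignn: int = 4, n_gcn: int = 4, out_dim: int = 1):
        super().__init__()
        # N ALIGNN layers followed by M GCN (edge-gated graph convolution) layers
        self.alignn_layers = nn.ModuleList([ALIGNNConv(dim) for _ in range(n_alignn)])
        self.gcn_layers = nn.ModuleList([EdgeGatedGraphConv(dim) for _ in range(n_gcn)])
        self.pool = AvgPooling()                  # global average pooling over atoms
        self.readout = nn.Linear(dim, out_dim)    # single fully connected output layer

    def forward(self, g, lg, h, e, t):
        # h, e, t are assumed to be already embedded to `dim` features
        for layer in self.alignn_layers:
            h, e, t = layer(g, lg, h, e, t)
        for layer in self.gcn_layers:
            h, e = layer(g, h, e)
        return self.readout(self.pool(g, h))
```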

Table 1 ALIGNN model configuration used for both solid-state and molecular machine learning models.

Model performance

Model performance can vary substantially depending on the dataset and task. To evaluate ALIGNN, we use two solid-state property datasets (Materials Project and JARVIS-DFT) as well as the QM9 molecular property dataset. Because the solid-state datasets are continuously updated, we use time-versioned snapshots of them, specifically selecting the MP version used by previous works to facilitate a direct comparison of model performance with the literature. As these datasets grow, model performance will likely improve further. We select the MP 2018.6.1 version, which consists of 69,239 materials with properties such as Perdew-Burke-Ernzerhof functional (PBE)47 bandgaps and formation energies. Similarly, we use the 2021.8.18 version of the JARVIS-DFT dataset, which consists of 55,722 materials with several properties such as van der Waals corrected optimized Becke88 functional (OptB88vdW)48 bandgaps and formation energies, dielectric constants, Tran-Blaha modified Becke-Johnson potential (MBJ)49 bandgaps and dielectric constants, bulk and shear moduli, magnetic moments, density functional perturbation theory (DFPT) based maximum piezoelectric coefficients, BoltzTraP50 based Seebeck coefficients and power factors, maximum absolute values of the electric field gradient, and exfoliation energies of two-dimensional materials. All of these properties are critical for functional materials design. For the MP dataset we use a train-validation-test split of 60,000–5000–4239, as used by SchNet10 and MEGNet16. For the JARVIS-DFT dataset and its properties, we use 80%:10%:10% splits. For the QM9 dataset we use a train-validation-test split of 110,000–10,000–10,829, as used by SchNet10, DimeNet++20, and MEGNet16.

Performance of ALIGNN models on MP is shown in Table 2, which reports the regression model performance in terms of the mean absolute error (MAE). The best MAEs for formation energy (Ef) and band gap (Eg) with ALIGNN are 0.022 eV(atom)−1 and 0.218 eV, respectively. In terms of Ef, ALIGNN outperforms the reported values of the CGCNN, MEGNet, and SchNet models by 43.6%, 21.4%, and 37.1%, respectively. For Eg, ALIGNN outperforms CGCNN and MEGNet by 43.8% and 33.9%, respectively. Good performance on well-known and well-characterized datasets supports the prediction accuracy of ALIGNN models. Because each property has different units and in general a different variance, we also report the mean absolute deviation (MAD) for each property to facilitate an unbiased comparison of model performance between different properties. The MAD values represent the performance of a random-guessing model that predicts the average value for each data point. We also report the CFID-based predictions for comparison. Clearly, all the neural networks, especially ALIGNN, perform much better than the corresponding MAD of the dataset as well as the CFID baseline. Analyzing the MAD:MAE (ALIGNN) ratio, we observe that it can be as high as 42.27. Generally, a model with a high MAD:MAE ratio (such as 5 or above) is considered a good predictive model51.

Table 2 Test set performance on the Materials Project dataset.

Similarly, we train ALIGNN models on the JARVIS-DFT34,35,36,37,38,39,40,41,42,43,44 dataset, which consists of data for 55,722 materials. In addition to properties such as formation energies and bandgaps, it also contains several unique quantities such as solar-cell efficiency (spectroscopic limited maximum efficiency, SLME), topological spin-orbit spillage, dielectric constants with (єx (DFPT)) and without ionic contributions (єx (OPT, MBJ)), exfoliation energies of two-dimensional (2D) materials, electric field gradients (EFG), Voigt bulk (Kv) and shear (Gv) moduli, energy above the convex hull (ehull), maximum piezoelectric stress (eij) and strain (dij) tensors, n-type and p-type Seebeck coefficients and power factors (PF), and crystallographic averages of electron (me) and hole (mh) effective masses. Because we converge the plane-wave cutoff (ENCUT) and the k-points used in Brillouin zone integration (Kpoint-length), we attempt to make machine learning predictions for these quantities as well. Such a large variety of properties allows a thorough test of our ALIGNN models. More details on the individual properties, their precision with respect to experimental measurements, applicability, and limitations can be found in the respective works. However, it is important to mention that many important issues, such as tackling the systematic underestimation of bandgaps by DFT methods, the inclusion of van der Waals bonding, and the inclusion of spin-orbit coupling interactions, all critically important from a materials-design perspective, have been key areas of improvement for the JARVIS-DFT dataset. For instance, meta-GGA (generalized gradient approximation) based Tran-Blaha modified Becke-Johnson potential (TBmBJ) band gaps are more reliable and closer to experimental data than Perdew-Burke-Ernzerhof functional (PBE) or OptB88vdW bandgaps, but they are computationally expensive and hence underrepresented in the dataset. In addition to the ALIGNN performance, we also include the hand-crafted classical force-field inspired descriptors (CFID) and CGCNN MAE performances for these properties using identical data splits.

In Table 3 we show the performance of regression models for different properties in the JARVIS-DFT database. We observe that ALIGNN models outperform CFID descriptors by up to a factor of 4, suggesting that GNNs can be a very powerful method for multiple material property predictions. ALIGNN also outperforms CGCNN by more than a factor of 2 (such as for the OptB88vdW total energy). Cross-dataset comparison of corresponding property entries in Tables 2 and 3 shows that models generally obtain better performance on the MP dataset, which we attribute primarily to the larger size of MP. For example, the MAE for the formation energy target on the MP dataset is 50% lower than for JARVIS-DFT. However, for some targets, differences in the DFT method and settings, as well as potential differences in the material-space distribution, might significantly contribute to the difficulty of a prediction task. For example, the MAE on high-throughput band gaps is lower (by 35.7%) for the JARVIS-DFT dataset, which is interesting in light of MP's dataset size advantage over JARVIS-DFT. One potential source of this discrepancy is the differing computational methodologies used, such as different functionals (PBE vs OptB88vdW), use of the DFT+U method, and settings for various DFT hyperparameters like smearing and k-point settings, all of which can influence the values of computed bandgaps, as discussed in ref. 37. Another potential contributing factor could be differing levels of dataset bias in the MP and JARVIS-DFT datasets stemming from differing distributions in material space. Clarifying this situation is beyond the scope of the present work, though it is of great importance for the atomistic modeling community to resolve.

Table 3 Regression model performances on JARVIS-DFT dataset for 29 properties using CFID, CGCNN and ALIGNN models on 55,722 materials.

Nevertheless, application of ALIGNN models to different datasets shows consistent improvements for materials-property predictions. The CFID, CGCNN, and ALIGNN models' MAEs are all lower than the corresponding MADs. The MAD:MAE ratios for energy-related quantities can be as high as 48.11 (total energy) and 26.06 (formation energy), while low values are found for quantities such as the DFPT-based piezoelectric strain coefficients (1.19) and the dielectric constant with ionic contributions (1.63). These results indicate that there is still much room for improvement in the GNN models, especially for electronic properties.

Because, as noted above, the regression tasks for some of the electronic properties do not show very high MAD:MAE ratios, we train classification models for some of them. Classification tasks predict labels such as high value/low value (based on a selected threshold) as 1 and 0 instead of predicting the actual values as in regression tasks. Such models can be useful for fast screening38 ahead of computationally expensive methods. We evaluate the performance of these classifiers using the area under the receiver operating characteristic curve (ROC AUC). A random-guessing model has a ROC AUC of 0.5, while a perfect model would have a ROC AUC of 1.0. Interestingly, most of our classification models (as shown in Table 4) have high ROC AUCs, up to a maximum of 0.94 (for convex hull stability), showing their usefulness for materials classification applications. All results are based on the performance on the 10% test data, which is never used during the training or model selection procedures.
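
As an illustration of this evaluation protocol, the snippet below thresholds a continuous property into binary labels and scores a classifier with ROC AUC using scikit-learn; the data, threshold, and scores are synthetic placeholders rather than values from this work.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# synthetic placeholder data standing in for DFT values and model probabilities
rng = np.random.default_rng(0)
dft_values = rng.uniform(0.0, 1.0, size=1000)                       # e.g., energy above convex hull
model_scores = np.clip(1.0 - dft_values + rng.normal(0, 0.2, 1000), 0, 1)

threshold = 0.1                                                     # illustrative class cutoff
y_true = (dft_values <= threshold).astype(int)                      # 1 = low-ehull ("stable") class
print("ROC AUC:", roc_auc_score(y_true, model_scores))              # 0.5 = random, 1.0 = perfect
```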

Table 4 Classification task ROC AUC performance on JARVIS-DFT dataset for ALIGNN models.

Next, we evaluate the ALIGNN model on the QM9 molecular property dataset (130,829 molecules) and compare it with other well-known models such as SchNet10, MEGNet16, and DimeNet++20, as shown in Table 5. The results from models other than ALIGNN are reported as given in the corresponding papers and are not necessarily reproduced by us. QM9 provides DFT-calculated molecular properties such as the highest occupied molecular orbital (HOMO) energy, lowest unoccupied molecular orbital (LUMO) energy, energy gap, zero-point vibrational energy (ZPVE), dipole moment, isotropic polarizability, electronic spatial extent, internal energy at 0 K, internal energy at 298 K, enthalpy at 298 K, and Gibbs free energy at 298 K. ALIGNN outperforms competing methods on the HOMO and dipole moment tasks, while the other accuracies are similar to those of the SchNet model. Most importantly, all ALIGNN results reported here use the same set of hyperparameters, obtained by tuning to validation performance on the JARVIS-DFT bandgap target, suggesting that ALIGNN provides robust performance across different datasets and material types.

Table 5 Regression model performances on QM9 dataset for 11 properties using ALIGNN.

Model analysis

We ablate individual components of the ALIGNN model to evaluate their contribution to the overall architecture. Keeping the other parameters of the ALIGNN model intact (as specified in Table 1), we vary the number of ALIGNN and GCN layers as shown in Table 6 and Supplementary Table 1 for the JARVIS-DFT OptB88vdW formation energies and bandgaps, respectively. We find that without any graph convolution layers the MAEs for the formation energy and bandgap are 1248.5% and 453.6% higher than for the default model. Adding even a single ALIGNN or GCN layer can reduce the MAE by 102.9%, illustrating the importance of these layers. However, further increasing the number of ALIGNN/GCN layers does not scale well, and performance quickly saturates at a depth of 4. Excluding GCN layers while increasing ALIGNN layers, and vice versa, shows the individual importance of these layers. The performance of GCN-only models saturates at 4 layers with a 44 meV(atom)−1 MAE on the JARVIS-DFT formation energy task, while ALIGNN-only models saturate at 34 meV(atom)−1, a relative reduction of 29.14%. Each of these models, along with the other highlighted configurations in Table 6, performs four atom feature updates via graph convolution modules. At least two ALIGNN updates are needed to obtain peak performance; additional atom feature updates provide little marginal increase in performance. This is consistent with the widely reported difficulty of scaling GCN architectures in depth beyond a few layers52.

Table 6 Effect of changing ALIGNN and GCN layers on machine learning models for JARVIS-DFT OptB88vdW formation energy database in ALIGNN models.

Figure 3 shows in detail the tradeoff between the performance benefit of including ALIGNN layers and their computational overhead relative to GCN layers. Per-epoch timing for each configuration is reported in Supplementary Table 2. All GCN-only configurations (annotated with the number of GCN layers) lie on the low-computation portion of the Pareto frontier, but the high-accuracy portion of the Pareto frontier is dominated by ALIGNN/GCN combinations with at least two ALIGNN updates. The ALIGNN-2/GCN-2 configuration obtains peak performance (again, a relative MAE reduction of 29.14%) with a computational overhead of roughly 2× relative to the GCN-4 configuration. Supplementary Table 1 and Supplementary Fig. 53 present layer ablation results yielding similar conclusions on the JARVIS-DFT OptB88vdW band gap target.

This layer ablation study clearly demonstrates that inclusion of bond angle information and propagation of bond and pair features through the node updates improves the generalization ability of atomistic GCN models. This is satisfying from a materials science perspective, as interatomic bonding theory clearly motivates the notion that inclusion of bond angles should improve accuracy of the model.

Similarly, we vary the number of hidden features (i.e., the width of the graph convolution layers), edge input features, and embedding input features to evaluate the MAE performance of the JARVIS-DFT formation energy and bandgap models relative to the default model in Table 1. In Supplementary Table 3, we observe that the marginal performance gain from increasing the number of hidden features saturates at 256 for both properties. Supplementary Table 4 shows that the optimal number of edge input features is 80 for the formation energy model, while for the bandgap model performance saturates at 40. Similarly, the optimal number of embedding features is 64 for the formation energy model and 32 for the bandgap model (Supplementary Table 5). Additionally, we compared three different node attribute sets, (1) CFID chemical features (438 features), (2) the atomic number only (1 feature), and (3) the default CGCNN-type attributes (92 features), for the formation energy model in Supplementary Table 6. We observe that the default node attributes give the lowest MAE.

Next, we study the time taken per epoch by several models on the QM9 and JARVIS-DFT formation energy datasets in Supplementary Table 7. To facilitate a fair comparison, we train all models with the same computational resources using the reference implementations and configurations reported in the literature. We note that the timing code for the reference implementations of different methods may include differing amounts of overhead. For example, the ALIGNN timings reported in Supplementary Table 7 amortize the overhead of initial atomistic graph construction across 300 epochs, and each epoch includes the overhead of evaluating the model on the full training and validation sets for performance tracking. Additionally, the computational cost of deep learning models, in general, is not independent of certain hyperparameters; in particular, larger batch sizes can better leverage modern accelerator hardware by exposing more parallelism. We find that ALIGNN requires less training time per epoch than other models except DimeNet++ and MEGNet. However, it is important to note that DimeNet++ and other models usually take around 1000 epochs or more to reach the desired accuracy, while ALIGNN can converge in about 300 epochs, resulting in lower overall training cost for similar or better accuracy.

While we report timing comparisons using the standard hyperparameter configuration used to train the models reported in the "Model performance" section, through subsequent model analysis we have identified several strategies that substantially reduce the computational workload without incurring a large performance penalty. We observe in Supplementary Fig. 54 that model performance converges after 300 epochs; shorter training budgets incur a modest performance reduction and slightly increased variance with respect to the training split. The performance tradeoff presented in Table 6 and Fig. 3 indicates that switching from the default configuration of 4 layers each of ALIGNN and GCN updates to 2 layers each could offer a speedup of ~1.5× with negligible reduction in accuracy. Finally, we performed a drop-in replacement study comparing batch normalization and layer normalization in Supplementary Table 8, finding that switching to layer normalization provides an additional ~1.7× speedup with a slight degradation in validation loss and negligible degradation in validation MAE. Because the cost of retraining models for all reported targets is still high, and because some of these strategies apply equally to competing models, we defer a more comprehensive performance-cost study to future work.

Fig. 3: ALIGNN accuracy-cost ablation study on JARVIS-DFT formation energy target.
figure 3

The red and blue markers represent the number of layers in GCN-only and ALIGNN-only models.

Finally, we simultaneously investigate the effects of dataset size and of different train-validation-test splits by performing a learning curve study in cross-validation for the JARVIS-DFT formation energy (Fig. 4 and Supplementary Table 9) and bandgap (Supplementary Fig. 55 and Supplementary Table 9) targets. We perform the cross-validation splitting by merging the standard JARVIS-DFT train and validation sets and randomly sampling, without replacement, Ntrain training samples and 5000 validation samples. The learning curve shows no sign of diminishing marginal returns for additional data up to the full size of the JARVIS-DFT dataset. On the full training set size (44,577) we obtain an average validation MAE of 0.0316 ± 0.0004 eV(atom)−1 (the uncertainty corresponds to the standard error of the mean over five cross-validation (CV) iterates). The standard deviation over CV iterates is 0.0009 eV(atom)−1, indicating that model performance is relatively insensitive to the dataset split.
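
A minimal sketch of this cross-validation sampling procedure is shown below, using placeholder sample ids; the partition sizes follow the text, but the ids, seed, and loop structure are illustrative.

```python
import numpy as np

# placeholder ids standing in for the standard JARVIS-DFT train (80%) and validation (10%) partitions
train_ids = np.arange(44577)
val_ids = np.arange(44577, 44577 + 5572)
pool = np.concatenate([train_ids, val_ids])

rng = np.random.default_rng(seed=0)
n_train, n_val = 44577, 5000                  # full training size from the text; 5000 validation samples
for cv_iter in range(5):                      # five cross-validation iterates
    perm = rng.permutation(len(pool))
    train_split = pool[perm[:n_train]]
    val_split = pool[perm[n_train:n_train + n_val]]
    # ... train a model on train_split and record the validation MAE on val_split
```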

Fig. 4: Learning curve for JARVIS-DFT formation energy regression target.
figure 4

Blue markers indicate validation set MAE scores for individual cross-validation iterates. Error bars indicate the mean cross-validation MAE ± one standard error of the mean.

In summary, we have developed the ALIGNN model, which uses a line graph neural network to improve the performance of GNN predictions for solids and molecules. We have demonstrated that explicit inclusion of angle-based information in GNNs can significantly improve model performance. A key contribution of this work is the development and joint use of both the undirected atomistic graph and its line graph counterpart for solid-state and molecular materials. We developed regression and classification ALIGNN models for several well-known pre-existing databases, and the approach can easily be applied to other datasets as well. Our models significantly improve accuracies over prior GNN models. We believe the ALIGNN model will rapidly improve machine learning predictions for many material properties and classes.

Methods

JARVIS-DFT dataset

The JARVIS-DFT34,35,36,37,38,39,40,41,42,43,44 dataset is developed using the Vienna Ab initio Simulation Package (VASP)53 (please note that commercial software is identified to specify procedures; such identification does not imply recommendation by the National Institute of Standards and Technology (NIST)). Most of the properties are calculated using the OptB88vdW functional48. For a subset of the data we use TBmBJ49 to obtain better band gaps. We use density functional perturbation theory (DFPT)54 to predict piezoelectric and dielectric constants with both electronic and ionic contributions. The linear-response-theory-based55 frequency-dependent dielectric function is calculated using both OptB88vdW and TBmBJ, and the zero-energy values are used as machine learning targets. Note that the linear-response-based dielectric constants lack ionic contributions. The TBmBJ frequency-dependent dielectric functions are used to calculate the spectroscopic limited maximum efficiency (SLME)38. The magnetic moments are calculated using spin-polarized calculations considering only ferromagnetic initial configurations and neglecting any DFT+U effects. The thermoelectric coefficients, such as Seebeck coefficients and power factors, are calculated with the BoltzTraP50 software using the constant relaxation time approximation. Exfoliation energies for van der Waals bonded two-dimensional materials are calculated as the energy per atom difference between the bulk and the corresponding monolayer counterparts. The spin-orbit spillage40 is calculated using the difference in wavefunctions of a material with and without inclusion of spin-orbit coupling effects. All the JARVIS-DFT data and classical force-field inspired descriptors (CFID)32 are generated using the JARVIS-Tools package. The CFID baseline models are trained using the LightGBM package56, following the models developed in ref. 32.

ALIGNN model implementation and training

The ALIGNN model is implemented in PyTorch57 and deep graph library (DGL)33; the training code heavily relies on PyTorch-ignite58. For regression targets we minimize the mean squared error (MSE) loss, and for classification targets we minimize the standard negative log likelihood loss. We train all models for 300 epochs using the AdamW59 optimizer with normalized weight decay of 10−5 and a batch size of 64. The learning rate is scheduled according to the one-cycle policy60 with a maximum learning rate of 0.001. We use the same model configuration for each regression and classification target. We use the initial atom representations from the CGCNN paper, 80 initial bond radial basis function (RBF) features, and 40 initial bond angle RBF features. The atom, bond, and bond angle feature embedding layers produce 64-dimensional inputs to the graph convolution layers. The main body of the network consists of 4 ALIGNN and 4 graph convolution (GCN) layers, each with hidden dimension 256. The final atom representations are reduced by atom-wise average pooling and mapped to regression or classification outputs by a single linear layer. These hyperparameters are selected to optimize validation MAE on the JARVIS-DFT band gap task through a combination of manual hypothesis-driven experiments and random hyperparameter search facilitated and scheduled through Ray Tune61; hyperparameter ranges are given in Supplementary Table 10. The random search results indicate that model performance is most highly sensitive to the learning rate, weight decay, and convolution layer width, and beyond a relatively low threshold is insensitive to the sizes of the initial feature embedding layers.
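
For illustration, the optimizer, one-cycle schedule, and loss described above could be set up as follows; `model` and `train_loader` are assumed to exist (e.g., the ALIGNN sketch above and a DGL graph data loader), and the loop is a simplified sketch rather than the PyTorch-Ignite training code used in this work.

```python
import torch

model = ALIGNN()                               # illustrative; see the sketch above
# simplified: the normalized weight decay scheme of AdamW59 is approximated by weight_decay=1e-5
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=300, steps_per_epoch=len(train_loader)
)
criterion = torch.nn.MSELoss()                 # MSE for regression; NLL loss for classification

for epoch in range(300):
    for g, lg, y in train_loader:              # batched bond graph, line graph, target (assumed)
        h, e, t = g.ndata["h"], g.edata["e"], lg.edata["t"]   # embedded features (assumed keys)
        optimizer.zero_grad()
        loss = criterion(model(g, lg, h, e, t).squeeze(), y)
        loss.backward()
        optimizer.step()
        scheduler.step()                       # the one-cycle policy steps once per batch
```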

We used NIST’s Nisaba cluster to train all ALIGNN models, and we reproduce results from the literature using the reference implementations for each competing method on the same hardware. Each model is trained on a single Tesla V100 SXM2 32 gigabyte Graphics processing unit (GPU), with 8 Intel Xeon E5-2698 v4 CPU cores for concurrently fetching and preprocessing batches of data during training (please note commercial software is identified to specify procedures. Such identification does not imply recommendation by National Institute of Standards and Technology (NIST)). For the MP dataset we use a train-validation-test split of 60,000–5000–4239. For the JARVIS-DFT dataset, we use 80%:10%: 10% splits. The 10% test data is never used during training procedures. For QM9 dataset we use a train-validation-test split of 110,000–10,000–10,829.