Abstract
This work presents an E(3) equivariant graph neural network called HamGNN, which can fit the electronic Hamiltonian matrix of molecules and solids in a completely data-driven manner. Unlike invariant models that achieve equivariance only approximately through data augmentation, HamGNN employs E(3) equivariant convolutions to construct the Hamiltonian matrix, ensuring strict adherence to all equivariant constraints inherent in the physical system. In contrast to previous models with limited transferability, HamGNN demonstrates exceptional accuracy on various datasets, including the QM9 molecular dataset, carbon allotropes, silicon allotropes, SiO_{2} isomers, and Bi_{x}Se_{y} compounds. The trained HamGNN models accurately predict the electronic structures of large crystals beyond the training set, including Moiré twisted bilayer MoS_{2} and silicon supercells with dislocation defects, showcasing remarkable transferability and generalization capabilities. The HamGNN model, trained on small systems, can serve as an efficient alternative to density functional theory (DFT) for accurately computing the electronic structures of large systems.
Introduction
Nowadays, machine learning (ML) has a wide range of applications in molecular and materials science, including the direct prediction of various material properties^{1,2,3}, the construction of machine learning interatomic potentials (MLIPs) with quantum mechanical precision^{4,5,6,7}, the high-throughput generation of molecular and crystal structures^{8,9,10}, and the construction of more precise exchange-correlation functionals^{11,12}. However, the determination of materials’ electronic structure still heavily relies on calculations based on density functional theory (DFT). Unfortunately, these methods require a significant amount of time to obtain the electronic Hamiltonian matrices through self-consistent iterations and exhibit poor scalability with respect to system size. Semi-empirical tight-binding (TB) approximations^{13}, such as the Slater-Koster method^{14}, can greatly reduce the computational cost compared to DFT methods. However, this approach often relies on existing or manually fine-tuned TB parameters and thus cannot accurately reproduce the electronic structure of general systems. Developing truly transferable, fully data-driven TB models that can be applied across various materials, geometries, and boundary conditions could reconcile accuracy with speed but is rather challenging.
Hegde and Brown first used kernel ridge regression (KRR) to learn semi-empirical tight-binding Hamiltonian matrices^{15}. They successfully fitted the Hamiltonian matrix of a Cu system containing only rotation-invariant s orbitals and a diamond system consisting only of s and p orbitals. Similarly, Wang et al. designed a neural network model to obtain semi-empirical TB parameters by fitting the band structures calculated by DFT^{16}. An important feature of the Hamiltonian matrix is that its components transform equivariantly under rotation of the coordinate system. However, neither of these approaches deals with the rotational equivariance of the Hamiltonian matrix. Moreover, the two methods are limited to fitting empirical model Hamiltonian matrices rather than the true ab initio tight-binding Hamiltonian matrices generated by the self-consistent iterations of ab initio tight-binding packages such as OpenMX^{17,18} and Siesta^{19,20}.
Because of the spherical harmonic part of the atomic basis functions, the TB Hamiltonian matrix must satisfy two fundamental constraints: rotational equivariance and parity symmetry. Due to the presence of periodic boundary conditions, the tight-binding Hamiltonian matrix for solids also exhibits translational invariance. The fundamental symmetry constraints satisfied by the tight-binding Hamiltonian matrix belong to the E(3) group. The notation E(3) denotes the Euclidean group in three-dimensional (3D) space, i.e., the group of rotations, translations, and inversion in 3D space. If a mapping L(q) is E(3) equivariant, then the following equivariance condition is satisfied for any rotation or inversion operation \(\hat{g}\in O(3)\) and any translation: \(L(\hat{g}\circ {\bf{q}}+{\bf{t}})=\hat{g}\circ L({\bf{q}})\), where t is the translation vector. In other words, the mapping L(q) remains invariant under translation while being equivariant under rotation and inversion. The tight-binding Hamiltonian matrix H({τ_{i}}) can be viewed as a function of the atomic positions {τ_{i}} of the system. Here we take the rotational equivariance of the Hamiltonian matrix as an example. When a rotation \(\hat{Q}\in SO(3)\) is applied to the system, the equivariance between the rotated and original Hamiltonian matrices can be expressed as \(H(\{{{\boldsymbol{\tau }}^{\prime}}_{i}\})=D(\hat{Q})H(\{{{\boldsymbol{\tau }}}_{i}\})D{(\hat{Q})}^{\dagger }\), where \(\{{{\boldsymbol{\tau }}^{\prime}}_{i}\}\) is the set of rotated atomic positions and D is the Wigner D matrix^{21,22}. It is crucial for the mapping from atomic coordinates to the Hamiltonian matrix to satisfy equivariance. If equivariance is not achieved, the Hamiltonian matrix may undergo an unphysical transformation, resulting in \(H(\{{{\boldsymbol{\tau }}^{\prime}}_{i}\})\,\ne\, D(\hat{Q})H(\{{{\boldsymbol{\tau }}}_{i}\})D{(\hat{Q})}^{\dagger }\) when the system undergoes a rotation.
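This transformation rule can be checked numerically on a toy example. The sketch below (not the model in this work) builds a hypothetical Slater-Koster-style p-p hopping block from a bond direction, with made-up hopping parameters `v_sigma` and `v_pi`, and verifies H(Qn) = D(Q) H(n) D(Q)^† for l = 1, where in a Cartesian (p_x, p_y, p_z) basis the Wigner D matrix of a proper rotation is the rotation matrix itself:

```python
import numpy as np

def pp_block(n, v_sigma=-1.0, v_pi=0.3):
    # Toy Slater-Koster p-p hopping block for bond direction n (Cartesian basis):
    # a sigma contribution along the bond plus a pi contribution perpendicular to it.
    n = n / np.linalg.norm(n)
    P = np.outer(n, n)
    return v_sigma * P + v_pi * (np.eye(3) - P)

def rot_z(theta):
    # Rotation matrix about z; for l = 1 in a Cartesian basis this is the Wigner D matrix.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R = rot_z(0.7)
n = np.array([1.0, 2.0, 3.0])
H = pp_block(n)
H_rot = pp_block(R @ n)                    # Hamiltonian block of the rotated geometry
assert np.allclose(H_rot, R @ H @ R.T)     # equivariance: H(Qn) = D(Q) H(n) D(Q)^T
```

A mapping built from invariant lengths and directional outer products, as here, satisfies the equivariance condition by construction; this is exactly the property a learned mapping must preserve.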
Non-equivariant models necessitate a substantial amount of training data to achieve equivariance approximately, by learning to predict the Hamiltonian matrix in all possible orientations of the system. Non-equivariant models often struggle with transferability, especially when encountering structures or configurations that differ significantly from the training set. As a consequence, they may lack generalization capability when applied to new or complex materials. In contrast, equivariant models are more robust for complex structures because they capture and preserve the underlying symmetries.
Zhang et al.^{23} and Nigam et al.^{24} proposed methods to predict the ab initio TB Hamiltonians of small molecules and simple solid systems by constructing an equivariant kernel in Gaussian process regression (GPR) to parameterize the Hamiltonian matrices. Due to the fixed kernel and representation used in GPR^{25,26,27,28}, its training accuracy and multi-element generalization ability are typically inferior to those of deep neural networks such as message-passing neural networks (MPNNs)^{29,30,31,32,33,34} when a sufficient number of training samples are available. Hence, developing graph neural networks (GNNs) capable of predicting the Hamiltonian matrix of both general periodic and aperiodic systems emerges as the most favorable approach.
However, traditional GNNs are limited to predicting rotation-invariant scalars such as energy and band gap. To predict equivariant directional properties, GNNs need to encode the directional information of the system suitably. To make the predicted Hamiltonian matrices satisfy rotational equivariance, Schütt et al. designed the SchNorb neural network architecture, embedding the direction information of the bonds into the message-passing function^{35}. This network constructs the TB Hamiltonian of molecules from the directional edge features of atom pairs. However, SchNorb needs to learn the rotational equivariance of the Hamiltonian matrix through data augmentation, which greatly increases the amount of training data and the number of redundant network parameters. Unke et al. proposed the PhiSNet model^{36}, which achieved an SE(3) equivariant parameterization of the TB Hamiltonian matrix using a GNN based on SO(3) representations. Their model demonstrated high accuracy on the Hamiltonian matrices of small molecules such as water and ethanol. However, the PhiSNet model cannot be considered a universal representation of Hamiltonian matrices due to its neglect of parity symmetry, which can lead to significant issues in predicting periodic systems.
Recently, Li et al. proposed a GNN model named DeepH to predict the TB Hamiltonian matrix by constructing local coordinate systems in a crystal^{37}. DeepH successfully predicted the TB Hamiltonian matrices of some simple periodic systems such as graphene and carbon nanotubes. The original intention of introducing the local coordinate systems was to address the rotational equivariance of the Hamiltonian, but DeepH still embeds the local directional information of interacting atom pairs in an invariant message-passing function, which inevitably increases the number of redundant network parameters, although it may require less data augmentation than a fully invariant model without local coordinate systems. In addition, the hopping distance between two interacting atoms far exceeds typical chemical bond lengths. Taking the smallest atom, hydrogen, as an example: the cutoff radius of the numerical atomic orbital of hydrogen used by OpenMX is 6 Bohr, so the furthest hopping between any two atomic bases used by OpenMX in periodic systems can exceed 12 Bohr (~6.4 Å), a distance that even exceeds the lattice parameters of certain crystals. It is therefore difficult to describe such long-range hopping in a well-defined local coordinate system.
In this work, we developed a general framework to parameterize the Hamiltonian matrix by decomposing each block of the Hamiltonian matrix into a vector coupling of equivariant irreducible spherical tensors^{38} (ISTs) with the appropriate parity symmetry. The parameterized Hamiltonian matrix strictly adheres to rotational equivariance and parity symmetry, and can be further extended to a parameterized Hamiltonian that satisfies SU(2) and time-reversal equivariance to accurately fit Hamiltonian matrices with spin-orbit coupling (SOC) effects. Based on this universal parameterized Hamiltonian, we designed the E(3) equivariant HamGNN model for predicting the TB Hamiltonian matrices of molecules and solids. HamGNN has demonstrated its universality for the Hamiltonian matrix, showcasing exceptional accuracy across various datasets, including the QM9 molecular dataset, carbon allotropes, silicon allotropes, SiO_{2} isomers, and Bi_{x}Se_{y} compounds. The trained HamGNN model can predict the Hamiltonian matrices, energy bands, and wavefunctions of structures not present in the training set. The powerful fitting and generalization ability exhibited by HamGNN enables researchers to efficiently compute the electronic structures of large-scale crystal systems that were previously deemed challenging or inaccessible.
Results
E(3) equivariant parametrized Hamiltonian
The core of the electronic structure problem in DFT is to solve the Kohn-Sham equation for electrons. If the Kohn-Sham Hamiltonian is represented by numerical atomic orbitals centered on each atom, such as those defined in packages like OpenMX and Siesta, then the Kohn-Sham equation can be formulated as a generalized eigenvalue problem:
where \({H}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}^{({\bf{k}})}=\sum _{{n}_{c}}{e}^{i{\bf{k}}\cdot {{\bf{R}}}_{{n}_{c}}}{H}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}^{({{\bf{R}}}_{{n}_{c}})}\) and \({S}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}^{({\bf{k}})}=\sum _{{n}_{c}}{e}^{i{\bf{k}}\cdot {{\bf{R}}}_{{n}_{c}}}{S}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}^{({{\bf{R}}}_{{n}_{c}})}\) are the KohnSham Hamiltonian and overlap matrices at the point k in the reciprocal space. \({H}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}^{({\bf{k}})}\) and \({S}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}^{({\bf{k}})}\) are obtained by Fourier transform of the realspace TB Hamiltonian matrix \({H}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}^{({{\bf{R}}}_{{n}_{c}})}=\langle {\phi }_{{n}_{i}{l}_{i}{m}_{i}}({\bf{r}}{{\boldsymbol{\tau}}}_{i})\hat{H}{\phi }_{{n}_{j}{l}_{j}{m}_{j}}({\bf{r}}{{\boldsymbol{\tau}}}_{j}{{\bf{R}}}_{{n}_{c}})\rangle\) and overlap matrix \({S}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}^{({{\bf{R}}}_{{n}_{c}})}=\langle {\phi }_{{n}_{i}{l}_{i}{m}_{i}}({\bf{r}}{{\boldsymbol{\tau }}}_{i}){\phi }_{{n}_{j}{l}_{j}{m}_{j}}({\bf{r}}{{\boldsymbol{\tau }}}_{j}{{\bf{R}}}_{{n}_{c}})\rangle\), respectively, in the basis of atomic orbitals \({\phi }_{{n}_{i}{l}_{i}{m}_{i}}\) at the site τ_{i} and \({\phi }_{{n}_{j}{l}_{j}{m}_{j}}\) at the site τ_{j} + R_{nc}, where R_{nc} is the shift vector of the n_{c}th periodic image cell. Therefore, once we have obtained the Hamiltonian matrix and overlap matrix in real space, we can further solve the electronic structure in the whole reciprocal space.
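The Fourier transform and the subsequent generalized eigenvalue problem can be sketched for a toy one-orbital 1D chain (the H(R) and S(R) values below are hypothetical and purely illustrative):

```python
import numpy as np
from scipy.linalg import eigh

# Toy 1D chain, one orbital per cell: real-space blocks H(R), S(R) for R = -a, 0, +a.
a = 1.0
H_R = {0.0: np.array([[0.0]]), a: np.array([[-1.0]]), -a: np.array([[-1.0]])}
S_R = {0.0: np.array([[1.0]]), a: np.array([[0.1]]), -a: np.array([[0.1]])}

def bloch_sum(blocks, k):
    # 1D analogue of the Fourier transform in the text: M(k) = sum_R exp(i k R) M(R)
    return sum(np.exp(1j * k * R) * M for R, M in blocks.items())

k = 0.3
Hk, Sk = bloch_sum(H_R, k), bloch_sum(S_R, k)
# Generalized eigenvalue problem H(k) c = eps S(k) c, as in the Kohn-Sham equation
eps = eigh(Hk, Sk, eigvals_only=True)
```

For this single-orbital chain the band reduces analytically to eps(k) = -2 cos(k) / (1 + 0.2 cos(k)), which the numerical result reproduces.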
Due to the spherical symmetry of the atomic potential, the atomic orbital bases, as its eigenfunctions, not only satisfy rotational equivariance under an operation Q ∈ SO(3) but also have a definite parity under the inversion operation g ∈ {E, I}. Under a rotation-inversion operation gQ ∈ O(3), the TB Hamiltonian matrix element \({H}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}\) in real space becomes (we omit the notation R_{nc} for convenience in the following discussion):
The irreducible representation of gQ is σ_{p}(g)⊗D(Q), where D(Q) is the Wigner D matrix and σ_{p}(g) is the scalar irreducible representation of the inversion operation, which is defined as follows
Substituting the irreducible representation of gQ into Eq. (2), we get
where \({\sigma }_{{p}_{i}{p}_{j}}(g)={\sigma }_{{p}_{i}}(g){\sigma }_{{p}_{j}}(g)\). We further write the righthand side of Eq. (4) in the form of matrixvector multiplication:
It can be seen from the above equation that each sub-block \({H}_{{n}_{i}{l}_{i}{\mu }_{i},{n}_{j}{l}_{j}{\mu }_{j}}\) (\(|{\mu }_{i}|\le {l}_{i}\), \(|{\mu }_{j}|\le {l}_{j}\)) of the TB Hamiltonian matrix based on atomic orbitals can be regarded as a spherical tensor^{22,38} \({{\bf{T}}}_{{\boldsymbol{\mu }},{p}_{i}{p}_{j}}^{{n}_{i}{l}_{i},{n}_{j}{l}_{j}}\equiv {H}_{{n}_{i}{l}_{i}{\mu }_{i},{n}_{j}{l}_{j}{\mu }_{j}}\) with the parity \({p}_{i}{p}_{j}\), which is rotationally equivariant according to the generalized Wigner D matrix \({{\boldsymbol{D}}}_{{\boldsymbol{\mu }}{\boldsymbol{m}}}^{{\boldsymbol{l}}}(Q)={D}_{{\mu }_{i}{m}_{i}}^{{l}_{i}}(Q){D}_{{\mu }_{j}{m}_{j}}^{{l}_{j}}(Q)\), where \({\boldsymbol{l}}\equiv ({l}_{i},{l}_{j})\), \({\boldsymbol{\mu }}\equiv ({\mu }_{i},{\mu }_{j})\), \({\boldsymbol{m}}\equiv ({m}_{i},{m}_{j})\).
According to the angular momentum theory^{21,22}, \({D}^{{l}_{i}}(Q)\otimes {D}^{{l}_{j}}(Q)\) is a reducible representation and can be further decomposed into the direct sum of several irreducible Wigner D matrices:
Combining the parity of the Hamiltonian matrix block \(({n}_{i}{l}_{i},{n}_{j}{l}_{j})\), we can get
According to Eq. (7), \({{\boldsymbol{T}}}_{{{\boldsymbol{\mu }}},{p}_{i}{p}_{j}}^{{n}_{i}{l}_{i},{n}_{j}{l}_{j}}\) is reducible, and the coupled irreducible spherical tensor \({{T}}_{L,{p}_{i}{p}_{j},m}^{{n}_{i}{l}_{i},{n}_{j}{l}_{j}}\) of each order \(L=|{l}_{i}-{l}_{j}|,\cdots ,{l}_{i}+{l}_{j}\) can be obtained by the vector coupling of \({{\boldsymbol{T}}}_{{\boldsymbol{\mu }},{p}_{i}{p}_{j}}^{{n}_{i}{l}_{i},{n}_{j}{l}_{j}}\):
where \({C}_{m{\mu }_{i}{\mu }_{j}}^{L{l}_{i}{l}_{j}}\) is the vector coupling coefficient, namely the Clebsch-Gordan coefficient. Each IST \({T}_{L,{p}_{i}{p}_{j},m}^{{n}_{i}{l}_{i},{n}_{j}{l}_{j}}\) has the parity \({p}_{i}{p}_{j}\) and satisfies rotational equivariance of order L. By the inverse linear transformation of Eq. (8), \({{\boldsymbol{T}}}_{{\boldsymbol{\mu }},{p}_{i}{p}_{j}}^{{n}_{i}{l}_{i},{n}_{j}{l}_{j}}\) can be reconstructed from the ISTs \({T}_{L,{p}_{i}{p}_{j},m}^{{n}_{i}{l}_{i},{n}_{j}{l}_{j}}\):
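For the simplest nontrivial case, l_i = l_j = 1, the decomposition of Eqs. (8)-(9) has a familiar Cartesian form: a 3x3 p-p block splits into a trace part (L = 0, 1 component), an antisymmetric part (L = 1, 3 components), and a symmetric-traceless part (L = 2, 5 components). A minimal numpy check of this decomposition and its exact inverse (the block here is a random stand-in, not data from this work):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(3, 3))   # generic p-p Hamiltonian block in a Cartesian basis

# Cartesian analogue of the vector coupling in Eq. (8): project onto L = 0, 1, 2 parts.
T0 = np.trace(T) / 3.0 * np.eye(3)     # L = 0: scalar (trace) part, 1 component
T1 = (T - T.T) / 2.0                   # L = 1: antisymmetric part, 3 components
T2 = (T + T.T) / 2.0 - T0              # L = 2: symmetric traceless part, 5 components

# Inverse transformation, Eq. (9): the block is rebuilt exactly from its ISTs.
assert np.allclose(T0 + T1 + T2, T)
assert abs(np.trace(T2)) < 1e-12       # the L = 2 part carries no trace
```

The component count 1 + 3 + 5 = 9 matches the 3x3 block, illustrating why the decomposition is lossless.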
Therefore, as long as we find all ISTs corresponding to each block of the Hamiltonian matrix, we can construct the entire Hamiltonian matrix in a blockwise manner through Eq. (9). We construct two O(3) equivariant vectors \({\mathbf{\Omega} }_{i}^{on}\) and \({\mathbf{\Omega} }_{ij}^{off}\) by the direct summation of all the ISTs required by the onsite (\(i=j\)) Hamiltonian and the offsite (\(i\,\ne\, j\)) Hamiltonian respectively:
The prediction of the Hamiltonian matrix is transformed into the prediction of \({\mathbf{\Omega} }_{i}^{on}\) and \({\mathbf{\Omega}}_{ij}^{off}\), which can be obtained by mapping from the equivariant features of the nodes and the pair interactions, respectively. The final parameterized Hamiltonian can be expressed as:
The above formula is O(3) equivariant. Since GNN naturally has translational symmetry, the parameterized Hamiltonian represented by \({\mathbf{\Omega}}_{i}^{on}\) and \({\mathbf{\Omega }}_{ij}^{off}\) obtained from GNN has E(3) equivariance.
When spin-orbit coupling (SOC) effects are considered, the real-space Hamiltonian matrices are complex-valued and can be divided into four sub-blocks \({\hat{{\boldsymbol{H}}}}_{{s}_{i}{s}_{j}}\) (\({s}_{i},{s}_{j}=\,\uparrow\) or \(\downarrow\)) by the spin degrees of freedom. In this case, the complete Hamiltonian matrices satisfy the following SU(2) rotational equivariance:
Although each sub-block \({\hat{{\boldsymbol{H}}}}_{{s}_{i}{s}_{j}}\) satisfies O(3) rotational equivariance, the sub-blocks are coupled to each other under rotation. Therefore, the four sub-blocks predicted independently with the O(3) equivariant parameterized Hamiltonian cannot be used to construct the complete SU(2) equivariant Hamiltonian with the SOC effect. In addition, the real and imaginary parts of the SU(2) equivariant Hamiltonian matrices are also coupled under rotation, so the complete Hamiltonian cannot be constructed from independently predicted real and imaginary parts. Such approaches not only rely on a large number of fitting parameters but also cannot make the constructed Hamiltonian matrices strictly satisfy SU(2) equivariance. To ensure that the SOC effect learned by the network complies with the physical rules and SU(2) equivariance, we explicitly express the complete Hamiltonian as the sum of a spinless part and an SOC part:
where \({\hat{H}}_{SOC}=\frac{1}{2}\xi (r)\hat{{\bf{L}}}\cdot \hat{{\boldsymbol{\sigma }}}=\frac{1}{2}\xi (r)({\hat{L}}_{x}{\hat{\sigma }}_{x}+{\hat{L}}_{y}{\hat{\sigma }}_{y}+{\hat{L}}_{z}{\hat{\sigma }}_{z})\), which satisfies SU(2) rotational equivariance and timereversal equivariance. ξ(r) is an invariant coefficient describing the strength of the SOC effects^{39}. According to the above equation, we can get the following parameterized Hamiltonian matrices with SOC effect in the atomic orbitals^{39,40}:
Since the matrix representation of the angular momentum operator \(\langle {\phi }_{{n}_{i}{l}_{i}{m}_{i}}|{\hat{L}}_{\alpha }|{\phi }_{{n}_{j}{l}_{j}{m}_{j}}\rangle =\langle {\phi }_{{n}_{i}{l}_{i}{m}_{i}}({{\bf{r}}}_{i})|{\hat{L}}_{\alpha }|{\phi }_{{n}_{j}{l}_{j}{m}_{j}}({{\bf{r}}}_{j}-{{\boldsymbol{\tau }}}_{ji})\rangle\) under the atomic orbital basis can be calculated analytically, the only learnable parameters in Eq. (15) are \({\tilde{H}}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}\) and \({\xi }_{{n}_{i}{l}_{i},{n}_{j}{l}_{j}}\). \({\tilde{H}}_{{n}_{i}{l}_{i}{m}_{i},{n}_{j}{l}_{j}{m}_{j}}\) can be directly expressed by Eq. (12), and \({\xi }_{{n}_{i}{l}_{i},{n}_{j}{l}_{j}}\) is an invariant scalar coefficient that can be mapped from the features of the atom pair ij.
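As a minimal sketch of the SOC term (ξ/2) L·σ for a single p shell, with ξ treated as a constant (the value 1.0 is purely illustrative), the code below builds the l = 1 angular momentum matrices and the Pauli matrices explicitly and recovers the textbook j = 1/2 / j = 3/2 splitting:

```python
import numpy as np

# l = 1 angular momentum matrices in the |m> basis (m = 1, 0, -1 ordering), hbar = 1.
sqrt2 = np.sqrt(2.0)
Lp = np.array([[0, sqrt2, 0], [0, 0, sqrt2], [0, 0, 0]], dtype=complex)  # raising operator
Lx = (Lp + Lp.conj().T) / 2
Ly = (Lp - Lp.conj().T) / 2j
Lz = np.diag([1.0, 0.0, -1.0]).astype(complex)

# Pauli matrices acting on the spin degree of freedom.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

xi = 1.0   # SOC strength xi, a constant stand-in for the learned invariant coefficient
H_soc = 0.5 * xi * (np.kron(sx, Lx) + np.kron(sy, Ly) + np.kron(sz, Lz))

ev = np.sort(np.linalg.eigvalsh(H_soc))
# j = 1/2 doublet at -xi and j = 3/2 quadruplet at +xi/2, as expected for a p shell
assert np.allclose(ev, [-1.0, -1.0, 0.5, 0.5, 0.5, 0.5])
```

Because the L matrices are fixed and analytic, only the scalar ξ needs to be learned, which is exactly what makes this parameterization SU(2) equivariant by construction.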
Network Implementation
As can be seen from the above discussion, each Hamiltonian matrix block satisfies rotational equivariance and has a definite parity under the inversion operation, so we designed an E(3) equivariant HamGNN model based on an MPNN to fit the TB Hamiltonian matrix. This framework directly captures the electronic structure without expensive self-consistent iterations by constructing local equivariant representations of each atomic orbital. The network architecture of HamGNN is shown in Fig. 1a. HamGNN achieves a direct mapping from atomic species {Z_{i}} and positions {r_{i}} to the TB Hamiltonian matrix.
Based on the equivariant parameterized Hamiltonian matrix introduced in this work, the prediction of the Hamiltonian matrix is converted into predicting two reducible equivariant tensors: \({\mathbf{\Omega}}_{i}^{on}\) for onsite Hamiltonians and \({\mathbf{\Omega} }_{ij}^{off}\) for offsite Hamiltonians. These tensors satisfy the rotation and inversion equivariance of the O(3) group; therefore we need to construct them in representations of the O(3) group. HamGNN utilizes equivariant atomic features that are constructed as direct sums of O(3) representations with different rotation orders l. The O(3) representation of rotation order l can characterize atomic orbitals with angular quantum number l, as they possess the same rotational equivariance. The atomic features are refined through orbital convolution layers (shown in Fig. 2b), which update the atomic features by aggregating equivariant messages constructed from the local chemical environments. After being refined by T orbital convolution layers, the atomic features are used to construct equivariant pair interaction features via the pair interaction layer (shown in Fig. 2c). \({\mathbf{\Omega}}_{i}^{on}\) and \({\mathbf{\Omega}}_{ij}^{off}\) are obtained by representation transformation of the atomic features and pair interaction features, respectively. The Hamiltonian matrix is finally constructed from \({\mathbf{\Omega}}_{i}^{on}\) and \({\mathbf{\Omega}}_{ij}^{off}\) via Eq. (12).
HamGNN first encodes the elements, interatomic distances, and relative directions as initial graph embeddings. The atomic numbers \({Z}_{i}\) are encoded as one-hot vectors and subsequently transformed into initial atomic features through a multilayer perceptron (MLP) layer. The distance between atom i and its neighboring atom j within the cutoff radius r_{c} is expanded using Bessel basis functions^{41}:
where f_{c} is the cosine cutoff function^{42}, which guarantees physical continuity for neighbor atoms close to the cutoff sphere. The list of atomic neighbors is determined by the cutoff radius of each atom’s orbital basis. The interatomic distance is expanded by a set of Bessel functions with \(n=1,2,\cdots ,{N}_{b}\), where N_{b} is the number of Bessel basis functions. In this work, N_{b} is set to 8. The directional information between atom i and atom j is embedded in a set of real spherical harmonics \(\{{Y}_{{m}_{f}}^{{l}_{f}}({\hat{{\bf{r}}}}_{ij})\}\), which are used to construct the rotation-equivariant filters^{43} in the equivariant convolution functions.
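A minimal implementation of this radial embedding, assuming the DimeNet-style form b_n(r) = sqrt(2/r_c) sin(nπr/r_c)/r together with the cosine cutoff (the cutoff radius and distance values are illustrative, not those used in HamGNN):

```python
import numpy as np

def cosine_cutoff(r, r_c):
    # Smoothly decays to zero at the cutoff sphere, ensuring continuous features.
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def bessel_basis(r, r_c, n_b=8):
    # Spherical Bessel (sinc-like) radial basis, n = 1 .. N_b, scaled by the cutoff.
    n = np.arange(1, n_b + 1)
    b = np.sqrt(2.0 / r_c) * np.sin(n * np.pi * r / r_c) / r
    return b * cosine_cutoff(r, r_c)

feats = bessel_basis(2.5, r_c=6.0)    # N_b = 8 features for one interatomic distance
assert feats.shape == (8,)
```

The cutoff factor makes every feature, and hence every message, vanish continuously as a neighbor leaves the cutoff sphere.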
The atomic feature tensor \({\rm{V}}={V}_{{l}_{0}}\oplus {V}_{{l}_{1}}\oplus \cdots \oplus {V}_{{l}_{\max }}\) in HamGNN is represented as a direct sum of different irreducible representations of the O(3) group up to a maximum rotation order l_{max}. The features corresponding to each order can characterize the atomic orbitals with distinct angular quantum numbers. If the input structure is rotated, the atomic features are transformed by the direct sum of Wigner D matrices, which is represented as a block diagonal matrix \({{D}}={D}_{{{{l}}}_{0}}\oplus {D}_{{{{l}}}_{1}}\oplus \cdots \oplus {D}_{{{{l}}}_{\max }}\). \({V}_{{l}_{i},{p}_{i},{c}_{i},{m}_{i}}^{i,t}\) denotes the tensor element of the equivariant features of atom i in orbital convolution layer t, where l_{i} ≤ l_{max} is the rotation order of the O(3) irreducible representation, \({p}_{i}\in \{-1,1\}\) denotes the parity of the equivariant components of order l_{i}, \(-{l}_{i}\le {m}_{i}\le {l}_{i}\) is the index of each projection of the equivariant representation, and c_{i} is the channel index (the dimension of the features). We use T orbital convolution layers to construct the equivariant features of the atomic orbitals in the local environment. In each orbital convolution layer, the input atomic feature tensor is updated by aggregating the equivariant messages from neighboring atoms. Each equivariant message is generated through the coupling of the feature tensor of a neighboring atom j and the convolution filters in an equivariant manner. The rotation-equivariant convolution filters are constrained to be a product of learnable radial functions and spherical harmonic functions:
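The total length of such a direct-sum feature vector follows directly from the irrep dimensions; a small helper (with a hypothetical channel count, not the HamGNN hyperparameters) illustrates the bookkeeping:

```python
def feature_dim(l_max, channels):
    # Length of a feature vector that is a direct sum of irreps of order 0 .. l_max,
    # each repeated `channels` times: an order-l irrep contributes 2l + 1 components.
    return sum(channels * (2 * l + 1) for l in range(l_max + 1))

# e.g. l_max = 2 with 16 channels per order: 16 * (1 + 3 + 5) = 144 components
assert feature_dim(2, 16) == 144
```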
where the radial function \(S({{\bf{r}}}_{ij})=MLP[B({{\bf{r}}}_{ij})]\) is a mapping that transforms the radial basis \(B({{\bf{r}}}_{ij})\) into rotation-invariant scalar weights through a multilayer perceptron. The tensor product between the feature tensor of the neighboring atom j and the convolution filter is used to generate the equivariant messages sent to atom i. A tensor product of representations is a mathematical operation that combines two given representations \({x}^{({l}_{1})}\in {l}_{1}\) and \({y}^{({l}_{2})}\in {l}_{2}\) to form another equivariant feature \({x}^{({l}_{1})}\otimes {y}^{({l}_{2})}\). The tensor product \({x}^{({l}_{1})}\otimes {y}^{({l}_{2})}\) is reducible and can be expanded into a direct sum of irreducible representations. The value for the irreducible representation of rotation order \(l\in \{|{l}_{1}-{l}_{2}|,\cdots ,{l}_{1}+{l}_{2}\}\) in the direct-sum representation of the tensor product \({x}^{({l}_{1})}\otimes {y}^{({l}_{2})}\) is given by
where \({C}_{m,{m}_{1},{m}_{2}}^{l,{l}_{1},{l}_{2}}\) are Clebsch-Gordan coefficients. Finally, the atomic features in each orbital convolution layer are updated by aggregating the equivariant messages of neighboring atoms using the following formulas:
The invariant scalar features of the interatomic distances are used to scale the equivariant output of each rotation order. \({S}_{{l}_{f},{p}_{f},{l}_{j},{p}_{j}}^{{l}_{i},{p}_{i},c}({{\bf{r}}}_{ij})\) is a learnable scalar weight for each filter-feature tensor product path \(({l}_{f},{{{p}}}_{f})\otimes ({l}_{j},{p}_{j},c)\to ({l}_{i},{p}_{i},c)\). To respect parity equivariance, each filter-feature tensor product path \(({l}_{f},{p}_{f})\otimes ({l}_{j},{p}_{j},c)\to ({l}_{i},{p}_{i},c)\) satisfies the parity selection rule \({p}_{i}={p}_{f}{p}_{j}\). Equation (20) is the update function, which aggregates equivariant messages from neighboring atoms to update the input atomic features. The updated atomic features are passed to a nonlinear gate activation function^{44}, which equivariantly scales the equivariant (l ≠ 0) components of the input features, using their invariant (l = 0) components as gates.
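The selection rules for a tensor product path can be enumerated explicitly; the small helper below (illustrative, not HamGNN code) lists the allowed output irreps of one filter-feature path, combining the triangle rule on rotation orders with the parity rule p_i = p_f p_j:

```python
def tensor_product_paths(l_f, p_f, l_j, p_j):
    # Allowed output irreps of (l_f, p_f) x (l_j, p_j): rotation orders run over
    # |l_f - l_j| .. l_f + l_j, and the parity follows the rule p_i = p_f * p_j.
    return [(l, p_f * p_j) for l in range(abs(l_f - l_j), l_f + l_j + 1)]

# An odd-parity l = 1 filter (spherical harmonic Y^1) acting on an even-parity
# l = 1 feature yields odd-parity outputs of order 0, 1, and 2:
assert tensor_product_paths(1, -1, 1, 1) == [(0, -1), (1, -1), (2, -1)]
```

Any path violating either rule simply does not appear in the network, which is how parity equivariance is enforced structurally rather than learned.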
The pair interaction features, which are utilized to construct the offsite Hamiltonian, are generated by the pair interaction layer. The pair interaction features are the sum of two parts. The first part is the tensor product between the features of two interacting atoms i and j: \(({l}_{i},{p}_{i},c)\otimes ({l}_{j},{p}_{j},c)\to (l,p,c)\), where l and p denote the rotation order and the parity of the pair interaction features, respectively. The second part is the tensor product between the mixed features of atom pairs ij and the convolution filters: \((l{\prime} ,p{\prime} ,c)\otimes ({l}_{f},{p}_{f})\to (l,p,c)\), where l′ and p′ denote the rotation order and the parity of the mixed features of atom pairs ij, respectively. The pair interaction features are finally constructed by the following equation:
where \({W}_{{l}_{i},{p}_{i},c,c{\prime} }^{i}\) and \({W}_{{l}_{j},{p}_{j},c,c{\prime} }^{j}\) are learnable weight matrices used to linearly couple the equivariant features from different channels, \({V}_{l{\prime} ,p{\prime} ,c,m{\prime} }^{ij}=\sum _{c{\prime} }{W{\prime} }_{l{\prime} ,p{\prime} ,c,c{\prime} }^{i}{V}_{l{\prime} ,p{\prime} ,c{\prime} ,m{\prime} }^{i,T}+\sum _{c{\prime} }{W{\prime} }_{l{\prime} ,p{\prime} ,c,c{\prime\;} }^{j}{V}_{l{\prime} ,p{\prime} ,c{\prime} ,m{\prime} }^{j,T}\) is the mixed feature tensor of the atomic features \({V}_{{l}_{i},{p}_{i},{c}_{i},{m}_{i}}^{i,T}\) and \({V}_{{l}_{j},{p}_{j},{c}_{j},{m}_{j}}^{j,T}\).
The onsite layer and offsite layer are used to convert the node features \({V}_{{l}_{i},{p}_{i},{c}_{i},{m}_{i}}^{i,T}\) and pair interaction features \({\omega }_{l,p,c,m}^{ij}\) into the direct sums \({\mathbf{\Omega}}_{i}^{on}\) and \({\mathbf{\Omega}}_{ij}^{off}\) of the ISTs required to construct onsite and offsite Hamiltonian blocks, respectively. We add shortcut connections in the onsite layer and offsite layer and use a norm activation function that scales the modulus of the irreducible representations of each order nonlinearly to increase the nonlinear fitting ability of the network. In the last step, the network uses the ISTs in \({\mathbf{\Omega} }_{i}^{on}\) and \({\mathbf{\Omega} }_{ij}^{off}\) to construct the onsite and offsite Hamiltonian blocks through Eq. (12). The final predicted Hermitian Hamiltonian is obtained by the following symmetrization:
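The symmetrization step itself is a one-liner; in the minimal sketch below a random complex matrix stands in for the raw network output (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
H_pred = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))  # raw network output

# Hermitian symmetrization: average the prediction with its conjugate transpose.
H = 0.5 * (H_pred + H_pred.conj().T)

assert np.allclose(H, H.conj().T)   # Hermitian, so its eigenvalues are real
```

Symmetrizing the output, rather than constraining the network internally, guarantees a Hermitian Hamiltonian and hence real band energies regardless of prediction error.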
Tests and applications
To assess the accuracy and transferability of HamGNN, we trained and tested it on the Hamiltonian matrices and electronic structures of periodic and aperiodic systems, including various molecules, periodic solids, a nanoscale dislocation defect, and a Moiré superlattice. Previously reported models such as SchNorb^{35}, PhiSNet^{36}, and DeepH^{45} were trained and tested on slightly perturbed structures derived from a single configuration at a time. Since HamGNN is based on the universal parameterized Hamiltonian proposed in this work, our model can be trained and tested on structures with the same atomic species but different configurations, in the same way as MLIPs.
Molecules
The QM9 dataset^{46,47} contains 134k stable small organic molecules made up of C, H, O, N, and F. These small organic molecules are important candidates for drug discovery. The development of general ML models for rapid screening of the electronic structure properties of drug molecules is beneficial for understanding drug mechanisms and shortening the drug development cycle. We calculated the real-space TB Hamiltonian matrices using OpenMX for 10,000 randomly selected molecules from the QM9 dataset. We divided the whole dataset into training, validation, and test sets with a ratio of 0.8:0.1:0.1. As shown in Fig. 2a, the predicted values exhibit a high degree of agreement with the DFT-calculated values of the Hamiltonian matrices for various configurations in the test set. The mean absolute error (MAE) of the Hamiltonian matrix elements predicted by the trained HamGNN model on the test set is only 1.49 meV. Please note that all MAEs mentioned in our manuscript refer to the mean absolute errors between predicted values and the ground truth (DFT values), unless otherwise stated. Each orbital energy calculated by diagonalizing the predicted Hamiltonian matrix coincides almost exactly with the corresponding DFT-calculated orbital energy (see Fig. 2b), showing high precision and transferability. The orbital energies calculated by HamGNN and OpenMX for the four molecules shown in Fig. 2b are listed in Supplementary Table 1.
We also trained HamGNN on the Hamiltonians of several specific small molecules generated by ab initio molecular dynamics (MD) and compared the accuracy of HamGNN with two recently reported models, PhiSNet^{36} and DeepH^{45}. The Hamiltonian matrices of these molecules were calculated by OpenMX and divided into the training, validation, and test sets in the same way as PhiSNet in ref. ^{36}. The MAEs of the Hamiltonian matrices predicted by DeepH are from ref. ^{45}. As can be seen from Table 1, HamGNN achieves the highest accuracy among the three models. The error of HamGNN as a function of training epochs on both the training and validation sets is highly consistent (see Supplementary Discussion 1). The prediction error of DeepH is higher than that of PhiSNet and HamGNN because the local coordinate system used by DeepH is not strictly equivariant. Although SE(3) equivariant PhiSNet shows high accuracy in predicting the molecules, it is not a universal equivariant model because it does not satisfy the parity symmetry of the Hamiltonian matrix strictly. Our tests have revealed that neglecting parity symmetry can significantly impair its fitting ability for solid materials (see Supplementary Discussion 2).
Periodic solids
We collected 426 carbon allotropes from the Samara Carbon Allotrope Database (SACADA)^{48} and 30 silicon allotropes from the Materials Project^{49}, each containing no more than 60 atoms in its unit cell. We also collected 221 SiO_{2} isomers from the Materials Project, each containing no more than 80 atoms in its unit cell. We perturbed each of the collected SiO_{2} isomers three times with random atomic displacements of 0.1 Å, resulting in 663 perturbed SiO_{2} structures. We performed DFT calculations using OpenMX to obtain the ab initio TB Hamiltonian matrices for these carbon allotropes, silicon allotropes, and perturbed SiO_{2} isomers. The Hamiltonian matrices in each dataset were divided into training, validation, and test sets with a ratio of 0.8:0.1:0.1. Three separate HamGNN models were trained on the carbon allotropes, silicon allotropes, and perturbed SiO_{2} isomers, respectively.
The MAEs of the Hamiltonian matrices predicted by HamGNN for the structures in the test sets of carbon allotropes, silicon allotropes, and SiO_{2} isomers are 1.55 meV, 2.01 meV, and 2.29 meV, respectively. Only tiny variations in model accuracy occur for HamGNN models trained with different initial random network weights (see Supplementary Discussion 3), indicating the high stability of HamGNN. The MAE of HamGNN on the carbon allotropes is even lower than the error (2.0 meV) of DeepH on a training dataset containing only graphene structures^{45}. Most importantly, the HamGNN model trained on the carbon allotropes is transferable and can fit the Hamiltonian matrices of carbon allotropes of arbitrary sizes and configurations beyond the training set. Achieving transferable predictions is difficult for the DeepH model, which relies on local coordinate systems, as discussed in Supplementary Discussion 4.
We used pentadiamond^{50}, Moiré twisted bilayer graphene (TBG), Si (MP-1199894), and SiO_{2} (MP-1257168) to test the accuracy and transferability of the HamGNN models trained on the three datasets. The test structures are shown in Fig. 3a–d. Pentadiamond is a three-dimensional carbon foam constructed from carbon pentagons and contains 88 carbon atoms in the unit cell^{50}. There are 1084 carbon atoms in the twisted bilayer graphene with a Moiré angle θ ≈ 3.48°. The interlayer spacing of TBG is about 4.0 Å. The carbon atoms within each layer are connected by strong covalent bonds, while the layers are stacked through weak van der Waals forces. The Si structure labeled MP-1199894 contains 82 atoms in the unit cell and crystallizes in the monoclinic C2/c space group. The SiO_{2} structure labeled MP-1257168 is characterized by complex porous structures built from SiO_{4} tetrahedra and has 180 atoms in the unit cell. The MAE of the Hamiltonian matrix elements predicted by HamGNN for pentadiamond, TBG, Si (MP-1199894), and SiO_{2} (MP-1257168) is only 1.54 meV, 3.23 meV, 1.39 meV, and 1.67 meV, respectively. The high accuracy can be seen in the scatter plots of the predicted Hamiltonian matrices versus the DFT-calculated Hamiltonian matrices shown in Fig. 3e–h. Although our carbon allotrope dataset does not contain any bilayer graphene structures, HamGNN automatically learns van der Waals interactions from this dataset and successfully predicts the Hamiltonian matrix of TBG. The energy bands obtained by diagonalizing the predicted Hamiltonians closely align with those from the DFT calculations, as shown in Fig. 3i–l.
Dislocation defect
In general, point defects only induce local structural distortions, so the supercell used for point defect simulation does not need a very large size and can be calculated directly using the DFT method^{51,52}. However, as one-dimensional line defects, edge dislocations induce elastic stress fields in their surroundings^{53,54,55}, resulting in significantly greater lattice distortions compared to point defects. Simulating an isolated edge dislocation necessitates the utilization of a large supercell to mitigate the strong elastic interactions between the dislocation and its periodic image. The high density of dislocations in small supercells may lead to unrealistic lattice distortion and inaccurate electronic structures. However, the computational complexity of the large supercells required to simulate the edge dislocations is a great challenge for DFT methods. HamGNN enables the direct mapping from structure to the Hamiltonian matrix, providing the possibility to compute electronic structures of dislocation defects in large supercells.
In this work, we take an isolated edge dislocation in a large silicon supercell as an example to demonstrate the high efficiency of the HamGNN model in simulating nanoscale defects in large systems. Crystalline silicon has a diamond-type crystal structure, whose most favorable slip system is of the type ½〈110〉{111}. Taking the {111} plane of silicon as the slip plane, we constructed an isolated edge dislocation with Burgers vector ½〈110〉, as shown in Fig. 4a. There are 4284 atoms in this supercell containing an isolated edge dislocation. We utilized the HamGNN model trained on Si allotropes to predict the Hamiltonian matrix of this supercell. Subsequently, we computed its band structure and the charge density of the valence band maximum (VBM) using the predicted Hamiltonian matrix.
The defect energy bands introduced by the dislocation appear within the band gap, as shown in Fig. 4b, significantly narrowing the band gap of bulk Si. The predicted VBM at the Γ point is located on an occupied defect energy band. The VBM wave function shown in Fig. 4a is mainly distributed in the dislocation core, indicating that this occupied band arises from the dangling bonds and structural distortion in the core. The predicted electronic structure in the dislocation core is analogous to that obtained from a DFT simulation of a non-periodic dislocation model containing only 358 atoms^{56}. However, the dangling bonds at the boundaries of the non-periodic supercell are saturated with hydrogen atoms^{56}, which may introduce unrealistic electronic states from the hydrogen. We use HamGNN to directly compute the electronic structures of dislocations in large periodic supercells, thus overcoming the potential issues arising from the use of small or non-periodic supercells. With the ability to establish a shortcut from structure to ab initio Hamiltonian matrix, HamGNN enables direct calculation of electronic structures for large supercells without costly SCF iterations. HamGNN demonstrated exceptional speed and efficiency, taking only 36 seconds to calculate the Hamiltonian matrix of the silicon supercell containing 4284 atoms on a node equipped with 80 Intel(R) Xeon(R) Gold 6248 CPU cores. To verify the reliability of the trained HamGNN model in predicting the electronic structures of silicon dislocation defects, we constructed a small silicon supercell containing an edge dislocation and compared the Hamiltonian matrix, energy bands, and wave function predicted by HamGNN with those calculated by OpenMX (see Supplementary Discussion 5).
Moiré superlattice of bilayer MoS_{2}
MoS_{2} is a 2D transition metal dichalcogenide (TMD) that has attracted much attention because it is an excellent semiconductor with a wide range of applications in electronics and optoelectronics^{57,58,59,60}. Unlike monolayer or untwisted bilayer MoS_{2}, twisted bilayer MoS_{2} with Moiré angles has been found to host flat bands and shear solitons^{57,61,62,63}, which may lead to interesting physical phenomena such as superconducting states, quantum Hall insulators, and Mott-insulating phases. However, the electronic structure calculation of Moiré twisted bilayer MoS_{2} by DFT is relatively expensive due to its large size. To demonstrate the accuracy and efficiency of HamGNN in replacing DFT for calculating the electronic structure of Moiré twisted two-dimensional materials, we conducted a quantitative comparison between the HamGNN prediction and the DFT calculation for twisted bilayer MoS_{2} with a Moiré angle of 3.5°. Before predicting the electronic structure of the Moiré twisted bilayer MoS_{2} superlattice, we trained a HamGNN model using a dataset consisting of 500 untwisted bilayer MoS_{2} structures, each containing 54 atoms. Each MoS_{2} bilayer structure in the dataset has a random interlayer slide of up to 2 Å along a random direction, and its layer spacing was randomly shifted by a maximum of 0.5 Å. We used OpenMX to calculate the Hamiltonian matrices for the untwisted MoS_{2} bilayer structures and divided the dataset into training, validation, and test sets with a ratio of 0.8:0.1:0.1. The MAE of the trained HamGNN on the test set is only 0.82 meV.
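The dataset-generation recipe described above (random in-plane slide of up to 2 Å, interlayer-spacing shift of up to 0.5 Å) can be sketched as follows; this is an illustrative reconstruction, not the authors' script, and the function and variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_bilayer(pos, top_mask, max_slide=2.0, max_dz=0.5):
    """Generate one training structure from an untwisted bilayer.

    pos: (N, 3) Cartesian coordinates in Angstrom; top_mask: boolean mask
    selecting the atoms of the upper layer. Applies a rigid in-plane slide
    of up to max_slide Angstrom in a random direction and shifts the layer
    spacing by up to max_dz Angstrom.
    """
    new = pos.copy()
    theta = rng.uniform(0.0, 2.0 * np.pi)          # random slide direction
    slide = rng.uniform(0.0, max_slide) * np.array([np.cos(theta), np.sin(theta), 0.0])
    dz = rng.uniform(-max_dz, max_dz) * np.array([0.0, 0.0, 1.0])
    new[top_mask] += slide + dz                     # move the whole top layer rigidly
    return new
```

Repeating this 500 times with different random draws produces the kind of sliding/spacing-augmented dataset used to train the bilayer model.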
We used OpenMX and the trained HamGNN model to calculate the Hamiltonian matrix of the twisted bilayer MoS_{2} with a Moiré angle of 3.5°, which contains 1626 atoms in a unit cell. HamGNN and OpenMX performed the calculations on a node with 80 Intel(R) Xeon(R) Gold 6248 CPU cores. HamGNN took only about 21 s, while OpenMX required approximately 73,000 s to complete this task, indicating that HamGNN significantly improves the efficiency of electronic structure calculations. The MAE between the Hamiltonian matrix predicted by HamGNN and that calculated by OpenMX is only 0.89 meV. The energy bands obtained by diagonalizing the Hamiltonian matrices from HamGNN and OpenMX are shown in Fig. 5a. The predicted bands show a high level of agreement with those computed by OpenMX. Moreover, the appearance of flat bands at the valence band edge and the Dirac cone at the K point agrees well with the energy bands calculated by VASP in ref. ^{61}. The spatial distribution of the predicted VBM wave function, as illustrated in Fig. 5b, exhibits a high degree of localization around the Moiré patterns, which is consistent with the computational results obtained from VASP calculations using the LDA and PBE functionals^{61}.
Through the quantitative comparisons, we demonstrate that HamGNN models trained on small systems can serve as an efficient alternative to DFT for accurately calculating the electronic structure of large systems. After being trained on the Hamiltonian matrices of small untwisted bilayer structures, the HamGNN model can effectively predict the electronic structure of smallangle twisted materials, which is computationally expensive for DFT calculations. The high accuracy and efficiency of HamGNN demonstrate its immense potential in accelerating the electronic structure calculations for smallangle twisted bilayers of various transition metal dichalcogenides (TMDs). In Supplementary Discussion 6, we have discussed the computational scalability of HamGNN with respect to system sizes and also compared its computational efficiency with OpenMX for bilayer MoS_{2} slabs of varying twist angles.
Bi_{x}Se_{y} quantum materials
Bi and Se have multiple chemical valences and can form a set of binary compounds Bi_{x}Se_{y} with various stoichiometric ratios^{64,65}. Bi is a heavy element whose d electrons have strong SOC effects. A total of 19 Bi_{x}Se_{y} compounds can be found on the Materials Project^{49}. The compound Bi_{8}Se_{7} (MP-680214) shown in Fig. 6a, which contains 45 atoms in the unit cell, was used to test the transferability and accuracy of HamGNN. The remaining 18 Bi_{x}Se_{y} compounds, each containing no more than 40 atoms in the unit cell, were used to generate the training set for the network. To enlarge the training set, we applied random perturbations of up to 0.02 Å to the atoms of each Bi_{x}Se_{y} structure, generating 50 perturbed structures per compound and 900 structures in total. These structures were randomly divided into training, validation, and test sets with a ratio of 0.8:0.1:0.1. The MAE of the real part of the SOC Hamiltonian predicted by the trained model for Bi_{8}Se_{7} is 1.29 meV, and the MAE of the imaginary part is only 5.0 × 10^{−7} meV. As the SOC effect primarily manifests in the imaginary component of the Hamiltonian matrix, such a low MAE for the imaginary part indicates that our parameterized SOC Hamiltonian has a strong fitting ability across various structures. As shown in Fig. 6b, the predicted energy bands of Bi_{8}Se_{7} are in good agreement with those calculated by OpenMX. Although the training set does not contain any compound with the stoichiometric ratio of Bi_{8}Se_{7}, the HamGNN model still accurately predicts the SOC Hamiltonian matrix and energy bands of this structure, demonstrating its high transferability.
Bi_{2}Se_{3} is a widely recognized 3D topological insulator, which serves as an exceptional platform for exploring quantum phenomena related to the effects of spin-orbit coupling (SOC)^{66,67,68,69,70}. Bulk Bi_{2}Se_{3} is stacked from quintuple layers (QLs) through the van der Waals (vdW) interaction, as shown in Fig. 6c. Each QL is composed of five atomic layers in the sequence Se-Bi-Se-Bi-Se, bound by strong covalent bonds. As shown in Fig. 6d, the Γ-point band gaps predicted by HamGNN exhibit a high degree of agreement with those obtained from DFT calculations for Bi_{2}Se_{3} slabs comprising 1 to 7 QLs. As shown in Fig. 6e, adding more QLs results in a gradual decrease of the Γ-point band gap and a tendency towards linear band dispersion at the Γ point, ultimately forming a Dirac cone. The spin textures of the lowest unoccupied states, located 0.07 eV and 0.23 eV above the conduction band minimum (CBM), were computed using the Hamiltonian matrix predicted by HamGNN, as illustrated in Fig. 6f. The predicted spin textures are in good agreement with the features of the Dirac cone, a topological surface state protected by time-reversal symmetry and characterized by spin-momentum locking.
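Spin textures of the kind shown in Fig. 6f follow from Pauli-matrix expectation values of the spinor eigenvectors obtained from the SOC Hamiltonian. A minimal sketch is given below; the (up, down) block ordering of the spinor is an assumption for illustration only, since the actual ordering follows the basis convention of the DFT code:

```python
import numpy as np

def spin_texture(psi):
    """Spin expectation values (<sigma_x>, <sigma_y>, <sigma_z>) of a spinor
    eigenvector psi of length 2*n_orb, assumed ordered as (up, down) blocks.

    Uses <sigma_x> = 2 Re<up|dn>, <sigma_y> = 2 Im<up|dn>,
    <sigma_z> = <up|up> - <dn|dn>.
    """
    n = psi.size // 2
    up, dn = psi[:n], psi[n:]
    sx = 2.0 * np.real(np.vdot(up, dn))
    sy = 2.0 * np.imag(np.vdot(up, dn))
    sz = np.real(np.vdot(up, up) - np.vdot(dn, dn))
    return np.array([sx, sy, sz])
```

Evaluating this for the eigenstates on a loop of k points around Γ gives the in-plane spin winding characteristic of spin-momentum locking.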
Discussion
DFT methods are now widely used to calculate various properties of molecules and materials. However, successful DFT calculations on large systems remain infrequent due to the significant computational resources and running time required. A typical DFT calculation often requires tens to hundreds of self-consistent iterations to obtain the final Hamiltonian and wave function, with the diagonalization of the Hamiltonian matrix performed on a dense k-point grid at each iteration step. This process takes up most of the running time of DFT calculations and cannot be circumvented. The emergence of deep learning in recent years has enabled efficient atomic simulations with DFT accuracy, as evidenced by the widespread use of machine learning interatomic potentials (MLIPs) that offer quantum mechanical precision to accelerate long-time molecular dynamics simulations of large systems. As the potential energy is just an invariant scalar, the implementation of transferable MLIP models is relatively straightforward. However, due to the rotational equivariance and parity symmetry of the Hamiltonian matrix, developing a transferable model for directly predicting the Hamiltonian matrix is highly challenging.
In this work, we propose an analytical E(3) equivariant parameterized Hamiltonian matrix that explicitly takes into account rotation equivariance and parity symmetry. Furthermore, we extend it to a parameterized Hamiltonian matrix satisfying SU(2) and time-reversal equivariance to fit Hamiltonian matrices with SOC effects. Based on this parameterized Hamiltonian matrix, we developed an E(3) equivariant deep neural network called HamGNN to fit the Hamiltonian matrices of various molecules and solids. Previously reported models were trained and tested on datasets consisting of slightly perturbed molecules or crystals derived from a single configuration. To demonstrate the accuracy and transferability of our parameterized Hamiltonian matrix, we used trained HamGNN models to predict the electronic structures of molecules, periodic solids, a silicon dislocation defect, Moiré twisted bilayer MoS_{2}, and Bi_{x}Se_{y} quantum materials. These tests demonstrate that our model achieves a high level of accuracy in comparison to DFT, while also exhibiting a degree of transferability similar to that of MLIPs. This performance provides a crucial foundation for the broad application of machine learning to electronic structure methods. Since our model establishes a direct mapping from the structure to the self-consistent Hamiltonian matrix without the time-consuming self-consistent iterations of DFT, it can be used to accelerate the electronic structure calculations of large systems and other costly advanced calculations, such as the electron-phonon coupling matrix via the automatic differentiation capability of the neural network.
Methods
Hamiltonian datasets and DFT calculation details
The QM9 structure set and the molecules in Table 1 are available from http://quantummachine.org/datasets/, the SACADA structure set is available from https://www.sacada.info/sacada_3D.php, and the structures of Si allotropes, SiO_{2} isomers, and Bi_{x}Se_{y} crystals were downloaded from the Materials Project site^{49}. To prepare the training set of untwisted bilayer MoS_{2}, a random perturbation of up to 0.02 Å was applied to each atom, and a slide of up to 2 Å was performed in a random direction within the XY plane. The layer spacing of the bilayer MoS_{2} was randomly shifted by a maximum of 0.5 Å. We performed DFT calculations on the structures in the above datasets to obtain TB Hamiltonian matrices via OpenMX^{17}, a software package for nanoscale material simulations based on norm-conserving pseudopotentials and pseudoatomic localized basis functions. The PBE (Perdew-Burke-Ernzerhof)^{71} functional was employed for all OpenMX calculations presented in this study. H6.0-s2p1, C6.0-s2p2d1, N6.0-s2p2d1, O6.0-s2p2d1, F6.0-s2p2d1, Si7.0-s2p2d1, Mo7.0-s3p2d2, Bi8.0-s3p2d2, and Se7.0-s3p2d2 pseudoatomic orbitals (PAOs) were used as the basis sets. The truncation radius of the atomic orbitals is 6.0 Bohr for H, C, N, O, and F; 7.0 Bohr for Si, Mo, and Se; and 8.0 Bohr for Bi. The cutoff energy and k-point grid used by OpenMX in the calculation of the Hamiltonian matrices for each dataset are listed in Supplementary Table 6. The Si dislocation model was built by Atomsk^{72} and relaxed by GPUMD^{73} with a force criterion of 0.1 eV ∙ Å^{−1}. The VASP^{74,75} results for the Moiré superlattice of bilayer MoS_{2}, which serve to validate the accuracy of our HamGNN predictions, are taken from Naik et al.'s work^{61}, in which the LDA and PBE functionals were used to calculate the electronic structure of twisted bilayer MoS_{2} with a Moiré angle of 3.5°.
For the OpenMX calculation on the Moiré twisted bilayer MoS_{2}, we used a cutoff energy of 150 Ry and a 4 × 4 × 1 k-point mesh to sample the Brillouin zone.
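As an illustration, these settings correspond to an OpenMX input fragment along the following lines. The keyword names follow the OpenMX manual; the pseudopotential tags (e.g. Mo_PBE19) and the S species line are assumptions, since the exact species definitions depend on the potential database version used:

```text
scf.XcType                GGA-PBE     # PBE exchange-correlation functional
scf.energycutoff          150.0       # cutoff energy in Ry
scf.Kgrid                 4 4 1       # k-point mesh for the Moire supercell

<Definition.of.Atomic.Species
  Mo   Mo7.0-s3p2d2   Mo_PBE19       # 7.0 Bohr truncation radius, s3p2d2 PAOs
  S    S7.0-s3p2d2    S_PBE19        # assumed analogous to the PAOs listed above
Definition.of.Atomic.Species>
```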
Hamiltonian construction details
The majority of computational time in DFT calculations is dedicated to self-consistent iterations aimed at obtaining the self-consistent charge density \(\rho\), which is used to determine the final Hamiltonian matrix and wave functions. The Kohn-Sham Hamiltonian can be written as \(H=T+{V}_{eff}\), where \({V}_{eff}=\sum _{l}{V}_{ec}({\bf{r}}-{{\boldsymbol{\tau }}}_{l})+{V}_{scf}[\rho ]\). \({V}_{ec}({\bf{r}}-{{\boldsymbol{\tau }}}_{l})\) is the Coulomb potential of the core charges of each atom l, and \({V}_{scf}[\rho ]=\frac{\delta {E}_{ee}}{\delta \rho }+\frac{\delta {E}_{xc}}{\delta \rho }\) denotes the sum of the potentials that arise from the electron-electron interaction energy \({E}_{ee}[\rho ]\) and the exchange-correlation energy \({E}_{xc}[\rho ]\). The matrix elements of the kinetic energy operator \(\langle {\phi }_{{n}_{i}{l}_{i}{m}_{i}}({\bf{r}}-{{\boldsymbol{\tau }}}_{i})|\hat{T}|{\phi }_{{n}_{j}{l}_{j}{m}_{j}}({\bf{r}}-{{\boldsymbol{\tau }}}_{j}-{{\bf{R}}}_{{n}_{c}})\rangle\) and of the Coulomb potential of the atomic cores \(\langle {\phi }_{{n}_{i}{l}_{i}{m}_{i}}({\bf{r}}-{{\boldsymbol{\tau }}}_{i})|\sum _{l}{V}_{ec}({\bf{r}}-{{\boldsymbol{\tau }}}_{l})|{\phi }_{{n}_{j}{l}_{j}{m}_{j}}({\bf{r}}-{{\boldsymbol{\tau }}}_{j}-{{\bf{R}}}_{{n}_{c}})\rangle\) are not functionals of the self-consistent charge density and can be calculated analytically without SCF iterations. The complete Hamiltonian can therefore be obtained as the sum of the SCF-independent matrices calculated analytically from the atomic coordinates and the SCF-dependent matrix \({V}_{scf}[\rho ]\), which can be fitted by the equivariant neural network. The general parameterized Hamiltonian proposed in this work can be applied to all tight-binding Hamiltonian matrices calculated using various exchange-correlation functionals (see Supplementary Discussion 7).
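Once the real-space blocks H(R) and the corresponding overlap blocks S(R) are assembled, band energies at any k point follow from a generalized eigenproblem via Bloch sums. A minimal sketch is shown below; the array layout and function names are illustrative, not HamGNN's internal API:

```python
import numpy as np

def bands_at_k(HR, SR, Rvecs, k):
    """Band energies at fractional k from real-space TB blocks.

    HR[i], SR[i]: (n_orb, n_orb) Hamiltonian and overlap blocks for the
    integer lattice vector Rvecs[i]. Builds H(k) = sum_R exp(2*pi*i k.R) H(R)
    and S(k) likewise, then solves H(k) c = E S(k) c.
    """
    phases = [np.exp(2j * np.pi * np.dot(k, R)) for R in Rvecs]
    Hk = sum(p * H for p, H in zip(phases, HR))
    Sk = sum(p * S for p, S in zip(phases, SR))
    # Reduce the generalized problem to a standard one via Cholesky of S(k)
    L = np.linalg.cholesky(Sk)
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ Hk @ Linv.conj().T)
```

For a one-orbital chain with on-site energy 0, nearest-neighbor hopping −1, and orthonormal orbitals, this reproduces the textbook dispersion E(k) = −2 cos(2πk).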
Network and training details
The PyTorch 1.11.0^{76}, PyG 2.0.4^{77}, Pymatgen^{78}, NequIP 0.5.3^{33}, and e3nn 0.5.0^{79} libraries are used to implement HamGNN. All models used in this work have five orbital convolution layers, a pair interaction layer, an onsite layer, and an offsite layer. The edge distance between each atom and its neighbors is expanded in eight Bessel function bases. Spherical harmonics with a maximum degree L_{max} = 4 are used to embed the directions of the edges. The O(3) representations used for the atom features and pair interaction features have N_{fea} channels with a maximum degree L_{max} and parities p = ±1. The number of feature channels N_{fea} and the maximum degree L_{max} for each dataset are listed in Supplementary Table 7. A two-layer MLP with 64 neurons per layer is used to map the invariant edge embeddings to the weights of each tensor product path in Eqs. (19) and (21). The shifted softplus^{31} function is used as the activation function in the MLP. The gate activation function scales the input features \((\mathop{\oplus }\limits_{i}{u}_{{p}_{i}}^{(0)})\oplus (\mathop{\oplus }\limits_{j}{v}_{{p{\prime} }_{j}}^{(0)})\oplus (\mathop{\oplus }\limits_{j}{w}_{{p}_{j}}^{({l}_{j} > 0)})\) using its invariant fields \((\mathop{\oplus }\limits_{i}{u}_{{p}_{i}}^{(0)})\) and \((\mathop{\oplus }\limits_{j}{v}_{{p{\prime} }_{j}}^{(0)})\) as the gates. The output equivariant features of the gate nonlinearity are \([\mathop{\oplus }\limits_{i}{\phi }_{{p}_{i}}^{1}({u}_{{p}_{i}}^{(0)})]\oplus [\mathop{\oplus }\limits_{j}{\phi }_{{p{\prime} }_{j}}^{2}({v}_{{p{\prime} }_{j}}^{(0)}){w}_{{p}_{j}}^{({l}_{j})}]\), where \({\phi }_{{p}_{i}}^{1}\) and \({\phi }_{{p{\prime} }_{j}}^{2}\) are activation functions that vary with the parity of the scalar input^{33}.
\({\phi }_{{p}_{i}}^{1}(x)\) has the same parity as the input scalar x, while \({\phi }_{{p{\prime} }_{j}}^{2}(x)\) always has even parity. This ensures that the output features of the gate activation function transform equivariantly under parity.
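These parity rules can be illustrated with scalar activations commonly used in e3nn-based models. The specific function choices below (tanh, shifted softplus, abs) are illustrative assumptions consistent with the stated parity constraints, not necessarily the exact functions used in HamGNN:

```python
import numpy as np

def ssp(x):
    """Shifted softplus, ssp(x) = ln(0.5*e^x + 0.5); ssp(0) = 0."""
    return np.logaddexp(x, 0.0) - np.log(2.0)

def phi1(x, parity):
    """Activation whose output parity matches the input scalar's parity:
    tanh is an odd function, so it preserves odd parity (parity = -1);
    ssp is used for even scalars (parity = +1)."""
    return np.tanh(x) if parity == -1 else ssp(x)

def phi2(x, parity):
    """Gate activation with strictly even-parity output: abs(.) maps an odd
    scalar to an even one, so the gated feature keeps the parity of w alone."""
    return np.abs(x) if parity == -1 else ssp(x)
```

Numerically, phi1(-x, -1) = -phi1(x, -1) (odd in, odd out) while phi2(-x, -1) = phi2(x, -1) (odd in, even out), which is exactly the behavior the text requires.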
To increase transferability and avoid overfitting, we include the error of the calculated energy bands as a regularization term in the loss function:

\(L=\frac{1}{N}{\sum }_{i}\left|{H}_{i}-{\tilde{H}}_{i}\right|+\frac{\lambda }{{N}_{k}{N}_{b}}{\sum }_{n,{\bf{k}}}\left|{\varepsilon }_{n{\bf{k}}}-{\tilde{\varepsilon }}_{n{\bf{k}}}\right|\)

Here N is the number of Hamiltonian matrix elements, while N_{k} and N_{b} denote the numbers of sampled k points and bands, respectively.
where the variables marked with a tilde refer to the corresponding predictions and λ denotes the loss weight of the band energy error; λ equals 0.001 in our training. When the training of the network has not yet converged, the error of the predicted Hamiltonian is large, resulting in poor or even divergent predictions of the energy bands. Adding the band loss at the beginning of training may therefore cause the total loss to diverge, so we train the network in two steps. First, only the mean absolute error of the Hamiltonian matrices is used as the loss to train the network until the network weights converge. The parameters are optimized with the AdamW^{80,81} optimizer using an initial learning rate of 10^{−3}. Then the mean absolute error of each band calculated at N_{k} random points in reciprocal space is added to the loss function, and training restarts with an initial learning rate of 10^{−4}. When the accuracy of the model on the validation set has not improved for N_{patience} successive epochs, the learning rate is reduced by a factor of 0.5. When the accuracy on the validation set has not improved for N_{stop} successive epochs, or the learning rate falls below 10^{−6}, training is stopped and the model with the best accuracy on the validation set is used on the test set. The values of key network and training parameters for each dataset are listed in Supplementary Table 7. All models were trained on a single NVIDIA A100 GPU.
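The learning-rate schedule and early-stopping policy described above can be replayed on a sequence of per-epoch validation errors as follows; this is an illustrative re-implementation of the stated rules, and the function and parameter names are ours:

```python
def train_schedule(val_errors, lr0=1e-4, factor=0.5,
                   n_patience=5, n_stop=20, lr_min=1e-6):
    """Replay the schedule: halve the learning rate after n_patience epochs
    without validation improvement; stop after n_stop stagnant epochs or
    when the learning rate drops below lr_min.

    Returns (lr_history, best_epoch)."""
    lr, best, best_epoch, stagnant = lr0, float("inf"), -1, 0
    history = []
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, stagnant = err, epoch, 0
        else:
            stagnant += 1
            if stagnant % n_patience == 0:   # another patience window elapsed
                lr *= factor
        history.append(lr)
        if stagnant >= n_stop or lr < lr_min:
            break
    return history, best_epoch
```

The same policy is what PyTorch's ReduceLROnPlateau scheduler plus a manual early-stopping counter implement in the actual training loop.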
Data availability
The pretrained models and the test examples are available on Zenodo (https://doi.org/10.5281/zenodo.8147631). The training datasets are available on Zenodo (https://doi.org/10.5281/zenodo.8157128).
Code availability
The HamGNN code used in the current study is available on GitHub (https://github.com/QuantumLabZY/HamGNN).
References
Choudhary, K. et al. Highthroughput density functional perturbation theory and machine learning predictions of infrared, piezoelectric, and dielectric responses. npj Comput. Mater. 6, 64 (2020).
Liu, Y. T., Zhou, Q. & Cui, G. L. Machine learning boosting the development of advanced lithium batteries. Small Methods 5, 2100442 (2021).
Zhang, N. et al. Machine learning in screening high performance electrocatalysts for CO_{2} reduction. Small Methods 5, 2100987 (2021).
Manzhos, S. & Carrington, T. Jr Neural network potential energy surfaces for small molecules and reactions. Chem. Rev. 121, 10187–10217 (2021).
Choudhary, K. et al. Unified graph neural network forcefield for the periodic table: solid state applications. Digit. Discov. 2, 346–355 (2023).
Cheng, Z. et al. Building quantum mechanics quality force fields of proteins with the generalized energybased fragmentation approach and machine learning. Phys. Chem. Chem. Phys. 24, 1326–1337 (2022).
Unke, O. T. et al. SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects. Nat. Commun. 12, 7273 (2021).
Cheng, G. J., Gong, X. G. & Yin, W. J. Crystal structure prediction by combining graph network and optimization algorithm. Nat. Commun. 13, 1492 (2022).
Ganea, O. et al. GeoMol: torsional geometric generation of molecular 3D conformer ensembles. Adv. Neural Inf. Process. Syst. 34, 13757–13769 (2021).
Xie, T., Fu, X., Ganea, O.E., Barzilay, R. & Jaakkola, T. Crystal diffusion variational autoencoder for periodic material generation. In International Conference on Learning Representations (ICLR, 2019).
Kirkpatrick, J. et al. Pushing the frontiers of density functionals by solving the fractional electron problem. Science 374, 1385–1389 (2021).
Nagai, R., Akashi, R. & Sugino, O. Machinelearningbased exchange correlation functional with physical asymptotic constraints. Phys. Rev. Res. 4, 013106 (2022).
Goringe, C. M., Bowler, D. R. & Hernandez, E. Tightbinding modelling of materials. Rep. Prog. Phys. 60, 1447–1512 (1997).
Papaconstantopoulos, D. A. & Mehl, M. J. The SlaterKoster tightbinding method: a computationally efficient and accurate approach. J. Phys. Condens. Matter 15, R413–R440 (2003).
Hegde, G. & Bowen, R. C. Machinelearned approximations to density functional theory Hamiltonians. Sci. Rep. 7, 42669 (2017).
Wang, Z. F. et al. Machine learning method for tightbinding Hamiltonian parameterization from abinitio band structure. npj Comput. Mater. 7, 11 (2021).
Ozaki, T. Variationally optimized atomic orbitals for largescale electronic structures. Phys. Rev. B 67, 155108 (2003).
Ozaki, T. & Kino, H. Numerical atomic basis orbitals from H to Kr. Phys. Rev. B 69, 195113 (2004).
Artacho, E. et al. The SIESTA method; developments and applicability. J. Phys. 20, 064208 (2008).
Garcia, A. et al. Siesta: recent developments and applications. J. Chem. Phys. 152, 204108 (2020).
Morrison, M. A. & Parker, G. A. A guide to rotations in quantummechanics. Aust. J. Phys. 40, 465–497 (1987).
Weinert, U. Spherical tensor representation. Arch. Ration. Mech. 74, 165–196 (1980).
Zhang, L. et al. Equivariant analytical mapping of first principles Hamiltonians to accurate and transferable materials models. npj Comput. Mater. 8, 158 (2022).
Nigam, J., Willatt, M. J. & Ceriotti, M. Equivariant representations for molecular Hamiltonians and Ncenter atomicscale properties. J. Chem. Phys. 156, 014115 (2022).
Unke, O. T. & Meuwly, M. A reactive, scalable, and transferable model for molecular energies from a neural network approach based on local information. J. Chem. Phys. 148, 241708 (2018).
Christensen, A. S., Bratholm, L. A., Faber, F. A. & Anatole von Lilienfeld, O. FCHL revisited: Faster and more accurate quantum machine learning. J. Chem. Phys. 152, 044107 (2020).
Faber, F. A., Christensen, A. S., Huang, B. & von Lilienfeld, O. A. Alchemical and structural distribution based representation for universal quantum machine learning. J. Chem. Phys. 148, 241717 (2018).
Bartok, A. P., Kondor, R. & Csanyi, G. On representing chemical environments. Phys. Rev. B 87, 184115 (2013).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. Proc. 34th Int. Conf. Mach. Learn. 70, 1263–1272 (2017).
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
Schutt, K. T., Sauceda, H. E., Kindermans, P. J., Tkatchenko, A. & Muller, K. R. SchNet  A deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
Choudhary, K. & DeCost, B. Atomistic line graph neural network for improved materials property predictions. npj Comput. Mater. 7, 185 (2021).
Batzner, S. et al. E(3)equivariant graph neural networks for dataefficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
Wang, Z. et al. Heterogeneous relational message passing networks for molecular dynamics simulations. npj Comput. Mater. 8, 53 (2022).
Schutt, K. T., Gastegger, M., Tkatchenko, A., Muller, K. R. & Maurer, R. J. Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nat. Commun. 10, 5024 (2019).
Unke, O. et al. SE(3)equivariant prediction of molecular wavefunctions and electronic densities. Adv. Neural Inf. Process. Syst. 34, 14434–14447 (2021).
Li, H. C., Collins, C., Tanha, M., Gordon, G. J. & Yaron, D. J. A density functional tight binding layer for deep learning of chemical Hamiltonians. J. Chem. Theory Comput. 14, 5764–5776 (2018).
Grisafi, A., Wilkins, D. M., Csanyi, G. & Ceriotti, M. Symmetryadapted machine learning for tensorial properties of atomistic systems. Phys. Rev. Lett. 120, 036002 (2018).
Jones, M. D. & Albers, R. C. Spinorbit coupling in an felectron tightbinding model: Electronic properties of Th, U, and Pu. Phys. Rev. B 79, 045107 (2009).
Hemstreet, L. A., Fong, C. Y. & Nelson, J. S. Firstprinciples calculations of spinorbit splittings in solids using nonlocal separable pseudopotentials. Phys. Rev. B: Condens. Matter 47, 4238–4243 (1993).
Klicpera, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In International Conference on Learning Representations (ICLR, 2019).
Unke, O. T. & Meuwly, M. PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).
Thomas, N. et al. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at https://doi.org/10.48550/arXiv.1802.08219 (2018).
Weiler, M., Geiger, M., Welling, M., Boomsma, W. & Cohen, T. S. 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. Adv. Neural Inf. Process. Syst. 31, 10381–10392 (2018).
Li, H. et al. Deep-learning density functional theory Hamiltonian for efficient ab initio electronic-structure calculation. Nat. Comput. Sci. 2, 367–377 (2022).
Ruddigkeit, L., van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).
Hoffmann, R., Kabanov, A. A., Golov, A. A. & Proserpio, D. M. Homo citans and carbon allotropes: for an ethics of citation. Angew. Chem. Int. Ed. 55, 10962–10976 (2016).
Jain, A. et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Fujii, Y., Maruyama, M., Cuong, N. T. & Okada, S. Pentadiamond: a hard carbon allotrope of a pentagonal network of sp^{2} and sp^{3} C atoms. Phys. Rev. Lett. 125, 016001 (2020).
Puska, M. J. Point defects in silicon, first-principles calculations. Comput. Mater. Sci. 17, 365–373 (2000).
McCluskey, M. D. & Janotti, A. Defects in semiconductors. J. Appl. Phys. 127, 190401 (2020).
Blumenau, A. T. et al. Dislocations in diamond: Dissociation into partials and their glide motion. Phys. Rev. B 68, 014115 (2003).
Li, Z. & Picu, R. C. Shuffle-glide dislocation transformation in Si. J. Appl. Phys. 113, 083519 (2013).
Das, S. & Gavini, V. Electronic structure study of screw dislocation core energetics in Aluminum and core energetics informed forces in a dislocation aggregate. J. Mech. Phys. Solids 104, 115–143 (2017).
Wang, J. W., Xu, W. W., Wang, R., Laref, A. & Wu, X. Z. Structural and electronic properties of 90° dislocations in silicon nanorods: A first-principles calculation. Comput. Mater. Sci. 149, 243–249 (2018).
Wu, F., Lovorn, T., Tutuc, E., Martin, I. & MacDonald, A. H. Topological insulators in twisted transition metal dichalcogenide homobilayers. Phys. Rev. Lett. 122, 086402 (2019).
Li, X. & Zhu, H. Twodimensional MoS_{2}: Properties, preparation, and applications. J. Mater. 1, 33–44 (2015).
Conley, H. J. et al. Bandgap engineering of strained monolayer and bilayer MoS_{2}. Nano Lett. 13, 3626–3630 (2013).
Xu, K. et al. The role of Anderson’s rule in determining electronic, optical and transport properties of transition metal dichalcogenide heterostructures. Phys. Chem. Chem. Phys. 20, 30351–30364 (2018).
Naik, M. H. & Jain, M. Ultraflatbands and shear solitons in Moiré patterns of twisted bilayer transition metal dichalcogenides. Phys. Rev. Lett. 121, 266401 (2018).
Devakul, T., Crepel, V., Zhang, Y. & Fu, L. Magic in twisted transition metal dichalcogenide bilayers. Nat. Commun. 12, 6730 (2021).
Liu, Y. et al. Moiré superlattices and related Moiré excitons in twisted van der Waals heterostructures. Chem. Soc. Rev. 50, 6401–6422 (2021).
Majhi, K. et al. Emergence of a weak topological insulator from the Bi_{x}Se_{y} family. Appl. Phys. Lett. 110, 162102 (2017).
Goncalves, P. H. R. et al. Formation of Bi_{x}Se_{y} phases upon annealing of the topological insulator Bi_{2}Se_{3}: Stabilization of in-depth bismuth bilayers. J. Phys. Chem. Lett. 9, 954–960 (2018).
Yazyev, O. V., Moore, J. E. & Louie, S. G. Spin polarization and transport of surface states in the topological insulators Bi_{2}Se_{3} and Bi_{2}Te_{3} from first principles. Phys. Rev. Lett. 105, 266806 (2010).
Zhang, Y. et al. Crossover of the threedimensional topological insulator Bi_{2}Se_{3} to the twodimensional limit. Nat. Phys. 6, 584–588 (2010).
Crowley, J. M., Tahir-Kheli, J. & Goddard, W. A. III. Accurate ab initio quantum mechanics simulations of Bi_{2}Se_{3} and Bi_{2}Te_{3} topological insulator surfaces. J. Phys. Chem. Lett. 6, 3792–3796 (2015).
Mazumder, K. & Shirage, P. M. A brief review of Bi_{2}Se_{3} based topological insulator: From fundamentals to applications. J. Alloy. Compd. 888, 161492 (2021).
Shirali, K., Shelton, W. A. & Vekhter, I. Importance of van der Waals interactions for ab initio studies of topological insulators. J. Phys. Condens. Matter 33, 035702 (2020).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
Hirel, P. Atomsk: A tool for manipulating and converting atomic data files. Comput. Phys. Comm. 197, 212–219 (2015).
Fan, Z. et al. GPUMD: A package for constructing accurate machine-learned potentials and performing highly efficient atomistic simulations. J. Chem. Phys. 157, 114801 (2022).
Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50 (1996).
Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999).
Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8026–8037 (2019).
Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. In Workshop of International Conference on Learning Representations (ICLR, 2019).
Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source Python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
Geiger, M. et al. e3nn/e3nn: 2022-04-13 (0.5.0). Zenodo https://doi.org/10.5281/zenodo.6459381 (2022).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR, 2019).
Reddi, S. J., Kale, S. & Kumar, S. On the convergence of Adam and beyond. Preprint at https://doi.org/10.48550/arXiv.1904.09237 (2019).
Acknowledgements
We thank Dr. Hongli Guo for providing the structure of the Moiré superlattice of bilayer MoS_{2}. We acknowledge financial support from the Ministry of Science and Technology of the People's Republic of China (No. 2022YFA1402901), NSFC (grants No. 11825403, 11991061, 12188101), and the Guangdong Major Project of the Basic and Applied Basic Research (Future functional materials under extreme conditions–2021B0301030005).
Author information
Contributions
H.J.X. and X.G.G. supervised the project. Y.Z. and H.J.X. proposed the parameterization framework of Hamiltonian. Y.Z. coded the framework and carried out the experiments with the help of H.Y.Y. and M.S. Y.Z. and H.J.X. prepared the manuscript. All authors discussed the results and commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhong, Y., Yu, H., Su, M. et al. Transferable equivariant graph neural networks for the Hamiltonians of molecules and solids. npj Comput. Mater. 9, 182 (2023). https://doi.org/10.1038/s41524-023-01130-4
This article is cited by
- Accelerating the calculation of electron–phonon coupling strength with machine learning. Nature Computational Science (2024)
- Designing semiconductor materials and devices in the post-Moore era by tackling computational challenges with data-driven strategies. Nature Computational Science (2024)
- Accelerating predictions of electronic transport and superconductivity. Nature Computational Science (2024)
- A machine learning tool to efficiently calculate electron–phonon coupling. Nature Computational Science (2024)