Abstract
The combination of deep learning and ab initio calculation has shown great promise in revolutionizing future scientific research, but how to design neural network models that incorporate a priori knowledge and symmetry requirements is a key challenge. Here we propose an E(3)-equivariant deep-learning framework to represent the density functional theory (DFT) Hamiltonian as a function of material structure, which naturally preserves Euclidean symmetry even in the presence of spin–orbit coupling. Our DeepH-E3 method enables efficient electronic structure calculation at ab initio accuracy by learning from DFT data of small-sized structures, making the routine study of large-scale supercells (>10^{4} atoms) feasible. The method reaches sub-meV prediction accuracy at high training efficiency, showing state-of-the-art performance in our experiments. The work is not only of general significance to deep-learning method development but also creates opportunities for materials research, such as building a Moiré-twisted material database.
Introduction
It has been well recognized that deep-learning methods could offer a potential solution to the accuracy-efficiency dilemma of ab initio material calculations. Deep-learning potentials^{1,2} and a series of other neural network models^{3,4,5,6,7} are capable of predicting the total energies and atomic forces of given material structures, enabling molecular dynamics simulations at large length and time scales. The paradigm has been used for deep-learning research of various kinds of physical and chemical properties^{8,9,10,11,12,13,14,15,16,17,18,19}. During the development of these methods, people have gradually come to realize that introducing symmetry considerations as a priori knowledge into neural networks is of crucial importance to deep-learning approaches. For this purpose, insights have been drawn from a class of neural networks called equivariant neural networks (ENNs)^{20,21,22,23,24}. The key innovation of ENNs is that all the internal features transform under the same symmetry group as the input; thus, the symmetry requirements are explicitly treated and exactly satisfied. Symmetry is of fundamental importance in physics, so ENNs are especially advantageous when applied to the modeling of physical systems, as shown by a series of neural network models for various material properties^{6,7,13,14,15}.
Recently, a deep neural network representation of the density functional theory (DFT) Hamiltonian (named DeepH) was developed by employing the locality of electronic matter, localized basis, and local coordinate transformation^{25}. With the DeepH approach, the computationally demanding self-consistent field iterations can be bypassed, and all the electron-related physical quantities in the single-particle picture can, in principle, be efficiently derived. This opens opportunities for the electronic structure calculation of large-scale material systems. However, it is highly nontrivial to incorporate symmetry considerations into DeepH. Specifically, the property that the Hamiltonian matrix changes covariantly (i.e., equivariantly) under rotations or gauge transformations should be preserved by the neural network model for efficient learning and accurate prediction (Fig. 1). A strategy was developed in DeepH to apply a local coordinate transformation, which turns the rotation-covariant problem into an invariant one, so that the transformed Hamiltonian matrices can be learned flexibly via rotation-invariant neural networks^{25}. Nevertheless, the large amount of local coordinate information seriously increases the computational load, and the model performance depends critically on a proper selection of local coordinates, which relies on human intuition and is not easy to optimize. Therefore, we think that the combination of DeepH with ENN might open new possibilities for the deep-learning modeling of Hamiltonians.
People have already made attempts to model the Hamiltonian using equivariant methods. Unke et al. designed PhiSNet^{26}, which used an ENN to predict the Hamiltonians of molecules with fixed system size. Nigam et al. used rotationally equivariant N-center features in the kernel ridge regression method to fit molecular Hamiltonians^{27}. Zhang et al. proposed an equivariant scheme to parameterize the Hamiltonians of crystals based on the atomic cluster expansion descriptor^{28}. However, the key capability of DeepH, namely learning from DFT results on small-sized material systems and predicting the electronic structures of much larger ones, has not been demonstrated by these methods. More critically, the existing equivariant methods have neglected the equivariance in the spin degrees of freedom, although electronic spin and spin–orbit coupling (SOC) play a key role in modern condensed matter physics and materials science. With SOC, one should take care of the spin–orbital Hamiltonian, whose spin and orbital degrees of freedom are coupled and transform together under a change of coordinate system or basis set, as illustrated in Fig. 1. This raises critical difficulties in designing ENN models due to a fundamental change of symmetry group. In this context, the incorporation of ENN models into DeepH is essential but remains elusive.
In this work, we propose DeepH-E3, a universal E(3)-equivariant deep-learning framework to represent the spin–orbital DFT Hamiltonian \({\hat{H}}_{{{\rm{DFT}}}}\) as a function of atomic structure \(\{{{\mathcal{R}}}\}\) by neural networks, which enables efficient electronic structure calculations of large-scale materials at ab initio accuracy. A general theoretical basis is developed to explicitly incorporate the covariant transformation requirements of \(\{{{\mathcal{R}}}\}\mapsto {\hat{H}}_{{{\rm{DFT}}}}\) into neural network models that properly take the electronic spin and SOC into account, and a code implementation of DeepH-E3 based on the message-passing neural network is also presented. Since the principle of covariance is automatically satisfied, efficient learning and accurate prediction become feasible via the DeepH-E3 method. Our systematic experiments demonstrate the state-of-the-art performance of DeepH-E3, which shows sub-meV accuracy in predicting the DFT Hamiltonian. The method works well for various kinds of material systems, such as magic-angle twisted bilayer graphene or twisted van der Waals materials in general, and the computational costs are reduced by several orders of magnitude compared to direct DFT calculations. Benefiting from its high efficiency and accuracy as well as its good transferability, DeepH-E3 could find promising applications in electronic structure calculations. Also, we expect that the proposed neural network framework can be generally applied to develop deep-learning ab initio methods and that these interdisciplinary developments will eventually revolutionize future materials research.
Results
Realization of equivariance
It has long been established as one of the fundamental principles of physics that all physical quantities must transform equivariantly between reference frames. Formally, a mapping f : X → Y is equivariant for vector spaces X and Y with respect to a group G if D_{Y}(g) ∘ f = f ∘ D_{X}(g), ∀g ∈ G, where D_{X}, D_{Y} are representations of G over the vector spaces X, Y, respectively. The problem considered in this work is the equivariance of the mapping from the material structure \(\{{{\mathcal{R}}}\}\), including atom types and positions, to the DFT Hamiltonian \({\hat{H}}_{{{\rm{DFT}}}}\) with respect to the E(3) group. The E(3) group is the Euclidean group in three-dimensional (3D) space, which contains translations, rotations, and inversion. Translation symmetry is manifest since we work only with the relative positions between atoms, not their absolute positions. Rotations of coordinates introduce nontrivial transformations, which should be carefully investigated. Suppose the same point in space is specified in two coordinate systems by r and \({{{\bf{r}}}}^{{\prime} }\). If the coordinate systems are related to each other by a rotation, the transformation rule between the coordinates of the point is \({{{\bf{r}}}}^{{\prime} }={{\bf{R}}}{{\bf{r}}}\), where R is a 3 × 3 orthogonal matrix.
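To make the definition concrete, the cross product of two vectors provides a minimal numerical check of equivariance: with D_X(g) acting on each input vector and D_Y(g) on the output, both given by the same proper rotation R, the two sides of the defining relation agree. The NumPy sketch below is purely illustrative and not part of the DeepH-E3 code:

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def f(x, y):
    """An equivariant map: the cross product of two vectors."""
    return np.cross(x, y)

R = rotation_z(0.7)
x = np.array([1.0, 2.0, 3.0])
y = np.array([-0.5, 0.4, 1.2])

# Equivariance: D_Y(g) o f = f o D_X(g); here D_X = D_Y = R (proper rotation).
lhs = R @ f(x, y)
rhs = f(R @ x, R @ y)
assert np.allclose(lhs, rhs)
```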
In order to take advantage of the nearsightedness of electronic matter^{29}, the Hamiltonian operator is expressed in the picture of the localized pseudo-atomic orbital (PAO) basis. The basis is separated into radial and angular parts, having the form \({\phi }_{i\alpha }({{\bf{r}}})={R}_{ipl}(r){Y}_{lm}(\hat{r})\). Here i is the site index, α ≡ (plm), where p is the multiplicity index, Y_{lm} is the spherical harmonic having angular momentum quantum number l and magnetic quantum number m, r ≡ ∣r − r_{i}∣ and \(\hat{r}\equiv ({{\bf{r}}}-{{{\bf{r}}}}_{i})/|{{\bf{r}}}-{{{\bf{r}}}}_{i}|\), where r_{i} is the position of the ith atom. The transformation rule for the Hamiltonian matrix between the two coordinate systems described above is

$$H'_{ip_1l_1m_1,\,jp_2l_2m_2}=\sum_{m_1'm_2'}D^{l_1}_{m_1m_1'}(\mathbf{R})\,D^{l_2}_{m_2m_2'}(\mathbf{R})\,H_{ip_1l_1m_1',\,jp_2l_2m_2'},\qquad(1)$$
where \({D}_{m{m}^{{\prime} }}^{l}({{\bf{R}}})\) is the Wigner D-matrix. The equivariance of the mapping \(\{{{\mathcal{R}}}\}\,\mapsto\, {\hat{H}}_{{{\rm{DFT}}}}\) requires that, if the change of coordinates causes the positions of the atoms to transform, the corresponding Hamiltonian matrix must transform covariantly according to Eq. (1).
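For l_1 = l_2 = 1 with real p orbitals in a Cartesian (p_x, p_y, p_z) ordering, the Wigner D-matrix reduces to the rotation matrix R itself, so the covariance of a Hamiltonian block can be checked directly. The following NumPy sketch uses a random stand-in block and verifies that frame-independent quantities such as eigenvalues survive the transformation:

```python
import numpy as np

# Toy p-p Hamiltonian block in a Cartesian (p_x, p_y, p_z) ordering, where
# the l = 1 Wigner D-matrix is simply the rotation matrix R itself.
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 3))          # hypothetical H_ij block between two p shells

theta = 0.9
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Covariant transformation for l1 = l2 = 1 and real orbitals: H' = D^1 H (D^1)^T
H_rot = R @ H @ R.T

# Covariance check: the rotated block must encode the same physics,
# e.g. identical eigenvalues of the symmetrized block.
w0 = np.linalg.eigvalsh(H + H.T)
w1 = np.linalg.eigvalsh(H_rot + H_rot.T)
assert np.allclose(w0, w1)
```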
An ENN is applied to construct the mapping \(\{{{\mathcal{R}}}\}\,\mapsto\, {\hat{H}}_{{{\rm{DFT}}}}\) in order to preserve equivariance. The input, output, and internal feature vectors of ENNs all belong to a special set of vectors that have the form x_{l} = (x_{l,l}, …, x_{l,−l}) and transform according to the following rule:

$$x'_{l,m}=\sum_{m'=-l}^{l}D^{l}_{mm'}(\mathbf{R})\,x_{l,m'}.\qquad(2)$$
This vector is said to carry the irreducible representation of the SO(3) group of dimension 2l + 1. If the input vectors are transformed according to Eq. (2), then all the internal features and the output vectors of the ENN will also be transformed accordingly. Under this constraint, the ENN incorporates learnable parameters in order to model equivariant relationships between inputs and outputs.
The method of constructing the equivariant mapping \(\{{{\mathcal{R}}}\}\,\mapsto\, {\hat{H}}_{{{\rm{DFT}}}}\) is illustrated in Fig. 2. The atomic numbers Z_{i} and interatomic distances ∣r_{ij}∣ ≡ ∣r_{i} − r_{j}∣ are used to construct the l = 0 input vectors (scalars). Spherical harmonics acting on the unit vectors of the relative positions \({\hat{r}}_{ij}\) constitute input vectors of l = 1, 2, …. The output vectors of the ENN are passed through the Wigner–Eckart layer before representing the final Hamiltonian. This layer exploits the essential concept of the Wigner–Eckart theorem:

$$l_1\otimes l_2=|l_1-l_2|\oplus(|l_1-l_2|+1)\oplus\cdots\oplus(l_1+l_2).\qquad(3)$$
“ ⊕ ” and “ ⊗ ” stand for the direct sum and tensor product of representations, respectively. “=” denotes equivalence of representations, i.e., they differ from each other by a change of basis. The coefficients in the change of basis are exactly the celebrated Clebsch–Gordan coefficients. The representation l_1 ⊗ l_2 is carried by the tensor \({{{\bf{x}}}}_{{l}_{1}{l}_{2}}\), which transforms according to the rule

$$x'_{l_1m_1,\,l_2m_2}=\sum_{m_1'm_2'}D^{l_1}_{m_1m_1'}(\mathbf{R})\,D^{l_2}_{m_2m_2'}(\mathbf{R})\,x_{l_1m_1',\,l_2m_2'}.\qquad(4)$$
Notice that Eq. (4) has the same form as Eq. (1), so the tensor \({{{{{{{{\bf{x}}}}}}}}}_{{l}_{1}{l}_{2}}\) can exactly represent the output Hamiltonian satisfying the equivariant requirements.
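As an aside, the decomposition 1 ⊗ 1 = 0 ⊕ 1 ⊕ 2 can be made explicit for a Cartesian rank-2 tensor, whose trace, antisymmetric, and symmetric-traceless parts transform independently under rotation. A purely illustrative NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(3, 3))   # a rank-2 tensor carrying 1 (x) 1

# Decompose 1 (x) 1 = 0 (+) 1 (+) 2 in Cartesian form:
trace = np.trace(T) / 3.0 * np.eye(3)           # l = 0 (scalar) part
anti = (T - T.T) / 2.0                          # l = 1 (antisymmetric) part
sym = (T + T.T) / 2.0 - trace                   # l = 2 (symmetric traceless) part
assert np.allclose(trace + anti + sym, T)

# Each part stays in its own subspace under the rotation T -> R T R^T.
theta = 0.4
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
T_rot = R @ T @ R.T
assert np.isclose(np.trace(T_rot), np.trace(T))                 # scalar invariant
assert np.allclose(R @ anti @ R.T, (T_rot - T_rot.T) / 2.0)     # l = 1 maps to l = 1
assert np.isclose(np.trace(R @ sym @ R.T), 0.0)                 # tracelessness preserved
```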
Equivariance of the spin–orbital Hamiltonian
If we further consider the spin degrees of freedom, the transformation rule for the Hamiltonian becomes

$$H'_{ip_1l_1m_1\sigma_1,\,jp_2l_2m_2\sigma_2}=\sum_{m_1'\sigma_1'm_2'\sigma_2'}D^{l_1}_{m_1m_1'}(\mathbf{R})\,D^{\frac{1}{2}}_{\sigma_1\sigma_1'}(\mathbf{R})\,D^{l_2*}_{m_2m_2'}(\mathbf{R})\,D^{\frac{1}{2}*}_{\sigma_2\sigma_2'}(\mathbf{R})\,H_{ip_1l_1m_1'\sigma_1',\,jp_2l_2m_2'\sigma_2'},\qquad(5)$$
where σ_{1}, σ_{2} are the spin indices (spin up or down). The construction of the spin–orbital DFT Hamiltonian is a far more complicated issue. Electron spin has angular momentum l = 1/2, so it seems that tedious coding and debugging are unavoidable, because we would have to introduce complex-valued half-integer representations into the neural network, which typically only supports real-valued integer representations for the time being. Furthermore, a 2π rotation brings a vector in 3D space back to itself but introduces a factor of −1 to a spin-1/2 vector. This means that any mapping from 3D input vectors to l = 1/2 output vectors will be discontinuous and cannot be modeled by neural networks, which poses a serious threat to our approach since we only have 3D vectors as input to the neural network (Fig. 2).
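The sign flip under a 2π rotation is easy to verify numerically from the spin-1/2 rotation matrix exp(−iθ n·σ/2); the sketch below is illustrative only:

```python
import numpy as np

def d_half(axis, theta):
    """Spin-1/2 rotation matrix exp(-i * theta * (n . sigma) / 2)."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    n_sigma = n[0] * sx + n[1] * sy + n[2] * sz
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * n_sigma

# A 2*pi rotation is the identity on ordinary 3D vectors,
# but multiplies a spin-1/2 vector by -1:
D = d_half([0.0, 0.0, 1.0], 2 * np.pi)
assert np.allclose(D, -np.eye(2))
```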
Fortunately, we observe that the l = 1/2 appearing in the DFT Hamiltonian does not necessarily mean that half-integer representations must be inserted everywhere into the neural network. In fact, they can be restricted to the final output layer once we employ the transformation rule:

$$\Big(l_1\otimes\tfrac{1}{2}\Big)\otimes\Big(l_2\otimes\tfrac{1}{2}\Big)=(l_1\otimes l_2)\otimes\Big(\tfrac{1}{2}\otimes\tfrac{1}{2}\Big)=(l_1\otimes l_2)\otimes(0\oplus 1).\qquad(6)$$
There is no half-integer representation on the right-hand side; thus, it can be further decomposed into integer representations by repeatedly applying Eq. (3).
Another problem is associated with the introduction of complex numbers. Generally, the spin–orbital Hamiltonian matrix elements are complex-valued, and the ENN cannot simply predict the real and imaginary parts separately, because doing so would violate equivariance. Neural networks over complex numbers are mostly still at an experimental stage of development, so the use of a complex-valued ENN is practically difficult, if not impossible. Nevertheless, we have discovered a way to sidestep this problem. Under basis functions that are eigenvectors of the time-reversal operator, the D-matrices of integer l become purely real. Consequently, for a vector with integer l under such a basis, the real and imaginary parts never mix with each other when the vector is multiplied by a real transformation matrix. One complex vector can therefore be technically treated as two real vectors while preserving equivariance. Note that this does not hold for half-integer representations, so we must recombine the real and imaginary parts before the integer representations are converted to half-integer representations in the Wigner–Eckart layer (Fig. 2).
Yet another subtle issue arises in Eq. (5). It is not exactly the same as Eq. (4), in that two of the D-matrices in the former equation are taken as complex conjugates, but those in the latter are not. In fact, instead of constructing a vector with representation \(({l}_{1}\otimes \frac{1}{2})\otimes ({l}_{2}\otimes \frac{1}{2})\), we must construct \(({l}_{1}\otimes \frac{1}{2})\otimes ({l}_{2}^{*}\otimes {\frac{1}{2}}^{*})\) to represent the spin–orbital Hamiltonian described in Eq. (5). Here, l* denotes the representation whose representation matrix is replaced by its complex conjugate. This is not a problem for integer l, but is critical for l = 1/2. If not treated properly, the overall equivariance will be violated. In order to solve this problem, we first notice that the representation l* is still a representation of the SU(2) group with dimension 2l + 1. In fact, it is guaranteed to be equivalent to the representation l without the complex conjugate. In other words, there must exist a unitary matrix P^{l} for each integer or half-integer l satisfying

$$\mathbf{D}^{l*}(\mathbf{R})=\mathbf{P}^{l}\,\mathbf{D}^{l}(\mathbf{R})\,(\mathbf{P}^{l})^{\dagger}.\qquad(7)$$
This is guaranteed by the fact that the quantum rotation operator \(\hat{U}(g)\) commutes with the time-reversal operator \({{\mathcal{T}}}\): \(\langle lm|\hat{U}(g)|l{m}^{{\prime} }\rangle=\langle lm|{{{\mathcal{T}}}}^{{{\dagger}}}\hat{U}(g){{\mathcal{T}}}|l{m}^{{\prime} }\rangle={(-1)}^{m-{m}^{{\prime} }}{\langle l,-m|\hat{U}(g)|l,-{m}^{{\prime} }\rangle }^{*}\). The matrix P in Eq. (7) is thus given by

$$P^{l}_{mm'}=(-1)^{l-m}\,\delta_{m,-m'}.\qquad(8)$$

Therefore, we only need to apply a change of basis to convert a vector carrying representation l to a vector carrying l*. Notice that this property holds even for material systems without time-reversal symmetry.
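For l = 1/2 the change-of-basis matrix is the 2 × 2 antisymmetric matrix familiar from time-reversal, and the equivalence between l* and l can be checked numerically. A NumPy sketch (the zyz Euler-angle convention and basis ordering (m = +1/2, −1/2) are chosen here for illustration):

```python
import numpy as np

def d_half(alpha, beta, gamma):
    """Spin-1/2 Wigner D-matrix in zyz Euler angles, basis (m = +1/2, -1/2)."""
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    rot = lambda angle, s: (np.cos(angle / 2) * np.eye(2)
                            - 1j * np.sin(angle / 2) * s)
    return rot(alpha, sz) @ rot(beta, sy) @ rot(gamma, sz)

# P^l_{mm'} = (-1)^{l-m} * delta_{m,-m'} for l = 1/2, rows/cols (+1/2, -1/2):
P = np.array([[0.0, 1.0], [-1.0, 0.0]], dtype=complex)

D = d_half(0.3, 1.1, -0.6)
# The conjugate representation is equivalent to the original one: D* = P D P^dagger
assert np.allclose(np.conj(D), P @ D @ P.conj().T)
```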
The workflow for constructing the DFT Hamiltonian is summarized and illustrated in Fig. 2. In order to construct a Hamiltonian with SOC, the output vectors from the ENN are first separated into two real components, which are then combined into complex vectors and passed to the Wigner–Eckart layer. The Wigner–Eckart layer uses the rules in Eq. (3) and Eq. (6) to convert these vectors to tensors of the form in Eq. (4), except that the tensors here have rank 4 instead of 2. After that, the last spin index is converted to its complex conjugate counterpart by the change of basis of Eq. (8) for l = 1/2. The output tensors follow the same transformation rule under coordinate rotation as the DFT Hamiltonian in Eq. (5), and can thus be used to represent the DFT Hamiltonian matrix.
Finally, we discuss two remaining issues. To include parity, we consider E(3) = SE(3) ⊗ {E, I}, where E is the identity and I is the spatial inversion. Under a coordinate transformation, a vector is multiplied by −1 if it has odd parity and the coordinate transformation involves spatial inversion. The parity of the Hamiltonian is determined by \({(-1)}^{{l}_{1}+{l}_{2}}\). In addition, there is a possible ambiguity in Eq. (5), since the mapping from a classical rotation R to a quantum rotation \({{{\bf{D}}}}^{\frac{1}{2}}\) is not single-valued. However, the possible factor of −1 always cancels between the two D-matrices in that equation, which eliminates the potential problem.
The neural network architecture of DeepH-E3
Here we present the neural network architecture of the DeepH-E3 method. An illustration of the architecture can be found in Fig. 3. The general structure is based on the message-passing neural network^{9,30} that has been widely used in materials research^{6,7,14,15,16,17,18,19,25}. The material structure is represented by a graph, where each atom is associated with a vertex (or node). Edges connect atom pairs with nonzero intersite hopping, and self-loop edges are included to describe intrasite coupling. Every vertex i is associated with a feature v_{i} and every edge ij with e_{ij}. These features are composed of several vectors defined in Eq. (2). As illustrated in Fig. 3a, the initial feature \({{{\bf{v}}}}_{i}^{(0)}\) of vertex i is the trainable embedding of the atomic number Z_{i}, and the initial e_{ij} is the interatomic distance ∣r_{ij}∣ expanded using the Gaussian basis e_{B}(∣r_{ij}∣) as defined in Eq. (11). The features of vertices and edges are iteratively updated using features of their neighborhood as incoming messages. Finally, the edge feature e_{ij} is passed through a linear layer and used to construct the Hamiltonian matrix block H_{ij} between atoms i and j using the method illustrated in Fig. 2. It is worth mentioning that, under the message-passing scheme, the output Hamiltonian is only influenced by information from its neighborhood environment. The nearsightedness property^{29} ensures efficient linear-scaling calculations as well as good generalization ability^{25}.
The equivariant building blocks of the neural network are implemented using the scheme provided by Tensor Field Networks^{21} and e3nn^{24,31}. The feature vectors \({x}_{cm}^{(l)}\) processed by these neural network blocks are implemented as dictionaries with key l, an integer which is the order of the representation of the SO(3) group. c is the “channel index” ranging from 1 to n^{(l)}, where n^{(l)} is the number of channels at order l, and each channel refers to a vector defined in Eq. (2).
The E3Linear layer defined in Eq. (12) possesses learnable weights and biases, which is similar to linear layers in conventional neural networks, but only connects vectors of the same representation to preserve equivariance. The gate layer introduces equivariant nonlinearity, as proposed in ref. ^{22}, where nonlinearly activated l = 0 vectors (i.e., scalars) are used as scaling factors (“gates”) to the norms of l ≠ 0 vectors.
We propose a normalization scheme, E3LayerNorm, that normalizes the feature vectors using mean and variance obtained from the layer statistics while preserving equivariance:
where ϵ is introduced to maintain numerical stability, \({g}_{c}^{(l)},{b}_{c}^{(l)}\) are learnable affine parameters, the mean \({\mu }_{m}^{(l)}=\frac{1}{N{n}^{(l)}}\sum_{i=1}^{N}\sum_{c=1}^{{n}^{(l)}}{({{{\bf{v}}}}_{i})}_{cm}^{(l)}\), the variance \({({\sigma }^{(l)})}^{2}=\frac{1}{N{n}^{(l)}}\sum_{i=1}^{N}\sum_{c=1}^{{n}^{(l)}}\sum_{m=-l}^{l}{\left[{({{{\bf{v}}}}_{i})}_{cm}^{(l)}-{\mu }_{m}^{(l)}\right]}^{2}\), and N is the total number of vertices. Here only the E3LayerNorm for vertex update blocks is described. The corresponding E3LayerNorm for edge update blocks is similar, with the mean and variance obtained from edge features instead of vertex features. We find that E3LayerNorm significantly stabilizes the training process. A discussion of the use of E3LayerNorm can be found in Supplementary Note 5.
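The normalization described above can be sketched in a few lines of NumPy for a single order l. The handling of the affine bias (applied only to l = 0 scalars, so as not to break equivariance) is our assumption here, and the actual DeepH-E3 implementation may differ:

```python
import numpy as np

def e3_layernorm(v, gain, bias, eps=1e-5):
    """Sketch of E3LayerNorm for one order l.

    v: array of shape (N, n_l, 2l+1) -- N vertices, n_l channels.
    gain, bias: per-channel affine parameters of shape (n_l,).
    Note: the bias is applied only to l = 0 (scalar) features here, since a
    constant shift of an l != 0 vector would break equivariance (assumption).
    """
    N, n_l, dim = v.shape
    mu = v.mean(axis=(0, 1))                       # mean over vertices and channels
    var = ((v - mu) ** 2).mean(axis=(0, 1)).sum()  # layer variance, summed over m
    out = gain[None, :, None] * (v - mu) / (np.sqrt(var) + eps)
    if dim == 1:                                   # l = 0: scalars may carry a bias
        out = out + bias[None, :, None]
    return out

rng = np.random.default_rng(2)
v = rng.normal(loc=3.0, size=(16, 8, 1))           # l = 0 features
out = e3_layernorm(v, gain=np.ones(8), bias=np.zeros(8))
assert abs(out.mean()) < 1e-6                      # centered after normalization
```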
The previously discussed blocks do not include coupling between different l’s. This problem is resolved by the tensor product layer:

$$z^{(l_1)}_{cm_1}=\sum_{l_2m_2}\sum_{l_3m_3}C^{l_1m_1}_{l_2m_2;l_3m_3}\Big(\sum_{c'}U^{(l_2)}_{cc'}x^{(l_2)}_{c'm_2}\Big)\Big(\sum_{c'}V^{(l_3)}_{cc'}y^{(l_3)}_{c'm_3}\Big),\qquad(10)$$
where \({C}_{{l}_{2}{m}_{2};{l}_{3}{m}_{3}}^{{l}_{1}{m}_{1}}\) are Clebsch–Gordan coefficients, \({U}_{c{c}^{{\prime} }}^{(l)},{V}_{c{c}^{{\prime} }}^{(l)}\) are learnable weights. This is abbreviated as z = (Ux) ⊗ (Vy).
The neural network architecture is illustrated in Fig. 3. The equivariant convolution block (EquiConv, Fig. 3c) encodes the information of an edge and the vertices connected to that edge. The core component of the equivariant convolution is the tensor product (Eq. (10)) of the vertex and edge features (v_{i}∣∣v_{j}∣∣e_{ij}) and the spherical harmonics of the edge ij (\({{\bf{Y}}}({\hat{r}}_{ij})\)). Here ∣∣ stands for vector concatenation. The tensor product introduces directional information of the material structure into the neural network. Propagating directional information through neural networks is important, as emphasized by previous works^{12,14}; here it is realized in an elegant way via the tensor product. The interatomic distance information is also encoded into the neural network. It is expanded using the Gaussian basis expansion and then fed into a fully connected neural network, whose output is multiplied element-wise with the output of the gate nonlinearity.
The vertex update block (Fig. 3d) aggregates information from the neighboring environment. To update a vertex, every edge connected to that vertex contributes a “message” generated by the equivariant convolution (EquiConv) block. All the “messages” are summed and normalized to update the vertex feature. The edge update block (Fig. 3e) is similar, except that only the output of EquiConv on edge ij is used for updating e_{ij}. After several updates, the final edge feature vectors serve as the neural network output and are passed into the Wigner–Eckart layer to construct the Hamiltonian matrix blocks, as illustrated in Fig. 2. More details are described in “Methods”.
Capability of DeepH-E3
The incorporation of global Euclidean symmetry as a priori knowledge in the message-passing deep-learning framework of DeepH-E3 leads to outstanding performance in terms of efficiency and accuracy. A remarkable capability of DeepH-E3 is to learn from DFT data on small structures and make predictions on varying structures of different sizes without having to perform further DFT calculations. This enables highly efficient electronic structure calculations of large-scale material systems at ab initio accuracy. All the DFT Hamiltonian matrices used for deep learning in this work are computed by the OpenMX code using the PAO basis. After example studies on monolayer graphene and MoS_{2} datasets, we will first demonstrate the capability of DeepH-E3 by investigating twisted bilayer graphene (TBG), especially the well-known magic-angle TBG, whose DFT calculation is important but quite challenging due to its huge Moiré supercell. Next, we will apply DeepH-E3 to study twisted van der Waals (vdW) materials with strong SOC, including bilayers of bismuthene, Bi_{2}Se_{3}, and Bi_{2}Te_{3}, to demonstrate the effectiveness of our equivariant approach for constructing the spin–orbital DFT Hamiltonian. Finally, we will use our model to illustrate the SOC-induced topological quantum phase transition in twisted bilayer Bi_{2}Te_{3}, giving an example of exploring exotic physical properties in large-scale material systems.
Study of monolayer graphene and MoS_{2}
Before going to large-scale materials, we first validate our method on the datasets used in ref. ^{25} to benchmark DeepH-E3’s performance. The datasets comprise DFT supercell calculation results for monolayer graphene and MoS_{2}, with different geometric configurations sampled from ab initio molecular dynamics. The test results are summarized in Table 1 and compared with those of the original DeepH method^{25}, which, instead of using an explicitly equivariant approach, applied the local coordinate technique to handle the covariant transformation property of the Hamiltonian. Our experiments show that the mean absolute errors (MAEs) of Hamiltonian matrix elements averaged over atom pairs are all within a fraction of a meV, reduced approximately by a factor of 2 or more for all prediction targets compared with DeepH. Benefiting from the high accuracy of the deep-learning DFT Hamiltonian, band structures predicted by DeepH-E3 accurately reproduce DFT results (Supplementary Fig. 1).
Application to twisted vdW materials
Our deep-learning method is particularly useful for studying the electronic structure of twisted vdW materials. This class of materials has attracted great interest for research and applications, since their Moiré superperiodicity offers a new degree of freedom to tune many-body interactions and brings in emergent quantum phenomena, such as correlated states^{32}, unconventional superconductivity^{33}, and higher-order band topology^{34}. Traditionally, it is challenging to perform computationally demanding DFT calculations on large Moiré structures. However, this challenge could be largely overcome by DeepH-E3. One may train the neural network models on DFT data for small, non-twisted, randomly perturbed structures and predict the DFT Hamiltonian of arbitrarily twisted structures via deep learning, bypassing DFT, as illustrated in Fig. 4a. This procedure demands far less computational resources than directly performing DFT calculations on large twisted superstructures.
Once the model is trained, it can be applied to study TBGs of varying twist angles. The performance is compared with that of DeepH. The test data include DFT results for systems containing more than one thousand atoms per supercell. As summarized in Fig. 4b, DeepH-E3 reduces the averaged MAEs of DFT Hamiltonian matrix elements by more than a factor of 2 compared to DeepH, consistent with the above conclusion. Moreover, the MAEs reach ultralow values of 0.2–0.3 meV and gradually decrease with increasing Moiré supercell size (or decreasing twist angle). This demonstrates the good generalizability of DeepH-E3. The method is thus expected to be suitable for studying TBGs with the small twist angles that are of current interest^{35}.
We take the magic-angle TBG with θ = 1.08^{∘} and 11,164 atoms per supercell as a special example. The discoveries of novel physics related to the flat bands in this system have triggered enormous interest in investigating twisted vdW materials. Due to the large supercell, DFT study of magic-angle TBG is a formidable task, but DeepH-E3 can routinely study such material systems in a particularly accurate and efficient way. As shown in Fig. 4c, the electronic bands of magic-angle TBG with relaxed structure computed by DeepH-E3 agree well with published results obtained by DFT and a low-energy effective continuum model^{35}. The flat bands near the Fermi level are well reproduced. Some minor discrepancies appear away from the Fermi level, which could be partially explained by methodological differences: the benchmark work uses the plane-wave basis, whereas our work employs the atomic-like basis, and the pseudopotentials used also differ. Detailed discussions of the influence of basis set and pseudopotential are included in Supplementary Note 2.
Most remarkably, DeepH-E3 has the capability to reduce the computational cost of studying these large material systems by several orders of magnitude. The DFT calculation (including structural relaxation) on magic-angle TBG performed in ref. ^{35} took around 1 month on about five thousand CPU cores. In contrast, the major computational cost of DeepH-E3 comes from neural network training. Typically, only a few hundred DFT training calculations are needed, and the training process usually takes tens of GPU hours, but all of this only needs to be done once. After that, DFT Hamiltonian matrices can be constructed very efficiently via neural network inference. Inference takes on the order of minutes on a single GPU for magic-angle TBG and grows linearly with Moiré supercell size. Generalized eigenvalue problems are solved for 60 bands near the Fermi level to obtain the band dispersion, which requires only about 8 min per k-point for magic-angle TBG using 64 CPU cores. The low computational cost and high accuracy of DeepH-E3 demonstrate its potential power in resolving the accuracy-efficiency dilemma of ab initio calculation methods, which would be highly favorable for future scientific research.
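The band-structure step amounts to solving a generalized eigenvalue problem H C = E S C in the nonorthogonal PAO basis. A toy SciPy sketch with random stand-ins for H and S (illustrative only):

```python
import numpy as np
from scipy.linalg import eigh

# Toy generalized eigenvalue problem H C = E S C in a nonorthogonal
# (e.g. pseudo-atomic orbital) basis; H, S here are random stand-ins.
rng = np.random.default_rng(3)
n = 6
A = rng.normal(size=(n, n))
H = (A + A.T) / 2                                  # Hermitian Hamiltonian
B = rng.normal(size=(n, n))
S = B @ B.T + n * np.eye(n)                        # positive-definite overlap

E, C = eigh(H, S)                                  # band energies and coefficients
# Eigenvectors are S-orthonormal: C^T S C = I
assert np.allclose(C.T @ S @ C, np.eye(n), atol=1e-8)
assert np.allclose(H @ C, S @ C * E, atol=1e-8)
```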
Study of twisted vdW materials with strong SOC
We have tested the performance of DeepH-E3 in studying twisted vdW materials with strong SOC, including twisted bilayers of bismuthene, Bi_{2}Se_{3}, and Bi_{2}Te_{3}. The latter two materials are more complicated, as they include two quintuple layers and two kinds of elements (Fig. 5a for Bi_{2}Te_{3}). The strong SOC introduces additional complexity into their electronic structure problems. Despite all these difficulties, the capability of DeepH-E3 is not compromised. Our method reaches sub-meV accuracy in predicting the DFT Hamiltonians of test material samples, including non-twisted and twisted structures of bismuthene, Bi_{2}Se_{3}, and Bi_{2}Te_{3} bilayers. Impressively, the band structures predicted by DeepH-E3 match well with those obtained from DFT (Supplementary Fig. 3). Moreover, we observe the remarkable ability of our model to fit a tremendous amount of data with moderate model capacity and relatively small computational complexity. For instance, the neural network model is able to fit 2.8 × 10^{9} nonzero complex-valued Hamiltonian matrix elements in the dataset with about 10^{5} real parameters. The training time is about one day on a single GPU to reach sub-meV accuracy. More details are presented in Supplementary Note 3. Through these experiments, the capability of DeepH-E3 to represent the spin–orbital DFT Hamiltonian is well demonstrated.
In physics, SOC can induce many exotic quantum phenomena, underlying the emergent research fields of spintronics, unconventional superconductivity, topological states of matter, etc. Investigation of SOC effects is thus of fundamental importance to research in condensed matter physics and materials science. The functionality of analyzing SOC effects is easily implemented in DeepH-E3. Specifically, we apply two neural network models to learn the DFT Hamiltonians with full SOC (\({\hat{H}}_{1}\)) and without SOC (\({\hat{H}}_{0}\)) separately for the same material system. Then, we define a virtual Hamiltonian as a function of the SOC strength λ: \({\hat{H}}_{\lambda }={\hat{H}}_{0}+\lambda {\hat{H}}_{{{\rm{SOC}}}}\), where \({\hat{H}}_{{{\rm{SOC}}}}={\hat{H}}_{1}-{\hat{H}}_{0}\). By studying the virtual Hamiltonian at different λ, we can systematically analyze the influence of SOC effects on material properties.
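The idea behind the λ-interpolation can be illustrated with a toy two-band model, in which the gap of \({\hat{H}}_{\lambda}\) closes and reopens as the mass term is inverted; all numbers below are illustrative and not material-specific:

```python
import numpy as np

# Toy two-band model illustrating the lambda-interpolation idea: the gap of
# H_lambda = H0 + lambda * (H1 - H0) closes and reopens as SOC is turned on.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def H0(k):                     # no SOC: trivial gapped bands
    return 1.0 * sz + k * sx

def H1(k):                     # full SOC: mass term inverted
    return -1.0 * sz + k * sx

def gap(lam):
    """Minimum direct gap of H_lambda over a k grid."""
    ks = np.linspace(-2, 2, 401)
    return min(np.ptp(np.linalg.eigvalsh(H0(k) + lam * (H1(k) - H0(k))))
               for k in ks)

gaps = [gap(lam) for lam in (0.0, 0.5, 1.0)]
assert gaps[0] > 0.1 and gaps[2] > 0.1     # gapped at both endpoints
assert gaps[1] < 1e-2                      # gap closes at the transition
```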
As an example application, we employ this approach to investigate the topological properties of twisted bilayer Bi_{2}Te_{3}. DeepH-E3 accurately predicts the DFT Hamiltonian both with and without SOC, as confirmed by band structure calculations using the predicted \({\hat{H}}_{{{\rm{DFT}}}}\) (Fig. 5b, c). Herein the SOC is extremely strong, as caused by the heavy elements in the material. Consequently, the band structure changes considerably when SOC is turned on. The evolution of the band structure as a function of SOC strength (Fig. 5d) provides rich information on the SOC effects. Importantly, the band gap closes and reopens upon increasing the SOC strength, indicating a topological quantum phase transition from Z_{2} = 0 to Z_{2} = 1. This is further confirmed by applying symmetry indicators based on Kohn–Sham orbital analysis and by performing Brillouin-zone integration of the Berry connection and curvature over all occupied states via the Fukui–Hatsugai–Suzuki formalism^{36}. The topological invariant Z_{2} turns out to be nonzero for the spin–orbit coupled system, suggesting that twisted bilayer Bi_{2}Te_{3} (θ = 21.8^{∘}) is topologically nontrivial. As DeepH-E3 works well for varying twist angles, the dependence of band topology on twist angle can be systematically computed, which will enrich the research of twisted vdW materials.
Discussion
Since the DFT Hamiltonian \({\hat{H}}_{{\rm{DFT}}}\) transforms covariantly between reference frames, it is natural and advantageous to construct the mapping from the crystal structure \(\{{\mathcal{R}}\}\) to \({\hat{H}}_{{\rm{DFT}}}\) in an explicitly equivariant manner. In this context, we have developed a general framework to represent \({\hat{H}}_{{\rm{DFT}}}\) with a deep neural network, DeepH-E3, that fully respects the principle of covariance even in the presence of SOC. We have presented the theoretical basis, code implementation, and practical applications of DeepH-E3. The method enables accurate and efficient electronic structure calculations of large-scale material systems beyond the scope of traditional ab initio approaches, opening possibilities to investigate rich physics and novel material properties at a particularly low computational cost.
However, as the structure becomes larger, it becomes increasingly difficult to diagonalize the Hamiltonian matrix in order to obtain wavefunction-related physical quantities. This difficulty, rather than any limitation of the DeepH-E3 method itself, will eventually become the bottleneck of accurate electronic structure prediction. Nevertheless, benefiting from the sparsity of the DFT Hamiltonian matrix under a localized atomic-orbital basis, many efficient O(N) algorithms with high parallel efficiency are available for studying large-scale systems (e.g., supercells including up to 10^{7} atoms^{37}). Combining the DeepH-E3 method with such efficient linear-algebra algorithms will be a promising direction for future study.
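The sparsity can already be exploited with standard iterative eigensolvers. As a minimal illustration (a toy tight-binding chain standing in for a DeepH-E3 Hamiltonian in a localized-orbital basis; the matrix size and energy window are arbitrary assumptions), shift-invert Lanczos extracts only the states near a chosen energy without dense diagonalization of the full matrix:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Sparse nearest-neighbour hopping chain (real Hamiltonians from a
# localized atomic-orbital basis have the same banded/sparse structure).
n = 2000
H = sp.diags([np.full(n - 1, -1.0), np.zeros(n), np.full(n - 1, -1.0)],
             offsets=[-1, 0, 1], format="csc")

# Shift-invert mode targets the spectrum near sigma (e.g., the Fermi
# level) and only ever factorizes and applies the sparse matrix,
# avoiding the O(n^3) cost of dense diagonalization.
vals, vecs = eigsh(H, k=8, sigma=0.0, which="LM")
```

For wavefunction-free quantities, linear-scaling alternatives (polynomial filtering, Green's-function methods) avoid even this partial diagonalization.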
The unique abilities of DeepH-E3, together with the general framework of incorporating symmetry requirements and physical insights into neural network model design, might find wide applications in various directions. For example, the method can be applied to build a material database for a diverse family of Moiré-twisted materials. For each kind of material, only one trained neural network model is needed to gain full access to the electronic properties of all its twisted structures, which is a great advantage for high-throughput material discovery. Moreover, since the deep-learning method does not rely on periodic boundary conditions, 2D materials with incommensurate twist angles can also be investigated, making the ab initio study of quasicrystal phases possible. In addition, one could go a step further by calculating the derivative of the electronic Hamiltonian with respect to atomic positions via automatic differentiation. This enables deep-learning investigation of electron–phonon coupling physics in large-scale materials, with the potential to outperform the computationally expensive traditional methods of frozen phonons or density functional perturbation theory^{38}. Furthermore, one may combine the deep-learning method with advanced methods beyond the DFT level, such as hybrid functionals, many-body perturbation theory, and time-dependent DFT. These important generalizations, if realized, would greatly enlarge the research scope of ab initio calculation.
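The automatic-differentiation step can be illustrated on a toy distance-dependent Hamiltonian (a hypothetical stand-in for the network output, not the DeepH-E3 model itself). PyTorch's autograd returns ∂H_{ij}/∂R exactly, which is the matrix-element derivative entering electron–phonon coupling calculations:

```python
import torch

def toy_hamiltonian(positions):
    """Hypothetical stand-in for a neural-network Hamiltonian: hopping
    amplitudes that decay exponentially with interatomic distance."""
    n = len(positions)
    H = torch.zeros(n, n)
    for i in range(n):
        for j in range(i + 1, n):
            d = torch.linalg.norm(positions[i] - positions[j])
            t = torch.exp(-d)  # distance-dependent hopping
            H[i, j] = t
            H[j, i] = t
    return H

positions = torch.tensor([[0.0, 0.0, 0.0],
                          [1.5, 0.0, 0.0],
                          [3.0, 0.0, 0.0]], requires_grad=True)
H = toy_hamiltonian(positions)
# Derivative of one matrix element with respect to every atomic
# coordinate, obtained in a single backward pass.
grad, = torch.autograd.grad(H[0, 1], positions)
```

The same mechanism applied to a trained network would yield dH/dR without finite-difference (frozen-phonon) displacements.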
Methods
Datasets
Data generated in this study are available in public repositories at Zenodo^{39,40,41}.
Monolayer graphene: The dataset is taken from ref. ^{25}. It consists of 450 graphene structures with 6 × 6 supercells, generated by ab initio molecular dynamics performed with the Vienna ab initio simulation package (VASP)^{42}, using the PBE^{43} exchange-correlation functional and projector-augmented wave (PAW) pseudopotentials^{44,45}. The plane-wave cutoff energy is 450 eV, and only the Γ point is used in the k-mesh. Five thousand frames are obtained at 300 K with a time step of 1 fs, and one frame is extracted every 10 frames starting from the 500th frame, yielding the 450 structures in the dataset. The Hamiltonians for training are calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotentials with C6.0-s2p2d1 PAOs and 5 × 5 Γ-centered k-sampling. Here 6.0 denotes the orbital cutoff radius in Bohr, and s2p2d1 means there are 2 × 1 = 2 s-orbitals, 2 × 3 = 6 p-orbitals, and 1 × 5 = 5 d-orbitals.
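The orbital-count arithmetic behind such PAO specifications (shell multiplicity times 2l + 1, giving 2 + 6 + 5 = 13 basis orbitals per carbon atom here) can be sketched with a small hypothetical helper; this is illustrative only and not part of OpenMX or DeepH-E3:

```python
import re

def pao_orbital_count(spec):
    """Count basis orbitals in an OpenMX-style PAO spec such as 's2p2d1':
    each shell contributes multiplicity * (2l + 1) orbitals."""
    degeneracy = {"s": 1, "p": 3, "d": 5, "f": 7}  # 2l + 1 per shell type
    return sum(int(count) * degeneracy[shell]
               for shell, count in re.findall(r"([spdf])(\d+)", spec))
```

The per-atom orbital count fixes the block dimensions of the Hamiltonian matrix that the network must predict.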
Monolayer MoS_{2}: The dataset is also taken from ref. ^{25}. Five hundred structures with 5 × 5 supercells are generated by ab initio molecular dynamics performed with VASP using PAW pseudopotentials and the PBE functional. The plane-wave cutoff energy is 450 eV, and only the Γ point is used in the k-mesh. One thousand frames are obtained at 300 K with a time step of 1 fs; the first 500 unequilibrated structures are discarded, and the remaining 500 structures form the dataset. The Hamiltonians for training are calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotentials with Mo7.0-s3p2d2 and S7.0-s2p2d1 PAOs and 5 × 5 Γ-centered k-sampling.
Bilayer graphene: The dataset is also taken from ref. ^{25}. Three hundred structures with 4 × 4 non-twisted supercells are generated by uniformly shifting one of the two vdW layers while simultaneously applying random perturbations to the atomic positions. The perturbations are within 0.1 Å along the three Cartesian directions. The supercells are constructed from bilayer unit-cell structures relaxed with VASP^{42} using the PBE functional with vdW interactions corrected by the DFT-D3 method with Becke–Johnson damping^{46}. The optimal interlayer spacing is found to be 3.35 Å. The Hamiltonians of the dataset and of the twisted structures are all calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotentials with C6.0-s2p2d1 PAOs.
Bilayer bismuthene, Bi_{2}Se_{3}, and Bi_{2}Te_{3}: The same procedure is used to generate non-twisted 3 × 3 bilayer supercells. The numbers of structures are 576, 576, and 256 for bismuthene, Bi_{2}Se_{3}, and Bi_{2}Te_{3}, respectively, but only a randomly selected subset is used for training (details can be found in Supplementary Note 4). The interlayer spacing, defined as the vertical distance between the lowest atom in the upper layer and the highest atom in the lower layer, is 3.20 Å, 2.50 Å, and 2.61 Å for bismuthene, Bi_{2}Se_{3}, and Bi_{2}Te_{3}, respectively. The Hamiltonians of the dataset and of the twisted structures are all calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotentials with Bi8.0-s3p2d2, Se7.0-s3p2d1, and Te7.0-s3p2d2 PAOs.
Details of neural network models
All the neural network models presented in this article are trained by directly minimizing the mean-squared error between the model output and the Hamiltonian matrices computed by DFT packages; the reported MAEs are likewise obtained by comparing the model output to the DFT results. All physical quantities of the materials are derived from the output Hamiltonian matrix.
Some details of the neural network building blocks are described here. The Gaussian basis is adapted from ref. ^{4} and is defined as
\[ e_{n}(r)=\exp \left(-\frac{(r-r_{n})^{2}}{2{\Delta }^{2}}\right), \]
where the centers r_{n}, n = 0, 1, …, are evenly spaced with interval Δ. The E3Linear layer is defined as
\[ x_{cm}^{\prime (l)}=\sum _{c^{\prime}}W_{cc^{\prime}}^{(l)}\,x_{c^{\prime}m}^{(l)}+b_{c}^{(l)}, \]
where \(W_{cc^{\prime}}^{(l)}\) and \(b_{c}^{(l)}\) are learnable weights and biases, with \(b_{c}^{(l)}=0\) for l ≠ 0. In the gate layer, the l = 0 part of the input feature is separated into two parts, denoted \(x_{1c}^{(0)}\) and \(x_{2c}^{(0)}\) (the index m is omitted because l = 0). The output feature is calculated by
\[ x_{c}^{\prime (0)}={\phi }_{1}\big(x_{1c}^{(0)}\big),\qquad x_{cm}^{\prime (l)}={\phi }_{2}\big(x_{2c}^{(0)}\big)\,x_{cm}^{(l)}\quad (l\neq 0). \]
Here ϕ_{1} and ϕ_{2} are activation functions; following ref. ^{7}, we use ϕ_{1} = SiLU and ϕ_{2} = Sigmoid.
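The Gaussian basis expansion and the gate nonlinearity can be sketched in NumPy. This is illustrative only: the actual implementation uses e3nn/PyTorch, and the grid range `r_max` and the default channel count are assumptions, not values stated here:

```python
import numpy as np

def gaussian_basis(r, r_max=6.0, n_basis=128, delta=None):
    """Expand an interatomic distance r on evenly spaced Gaussian
    centers r_n; delta defaults to the grid spacing."""
    centers = np.linspace(0.0, r_max, n_basis)
    if delta is None:
        delta = centers[1] - centers[0]
    return np.exp(-(r - centers) ** 2 / (2 * delta ** 2))

def gate(x1_scalar, x2_scalar, x_higher,
         phi1=lambda t: t / (1 + np.exp(-t)),   # SiLU
         phi2=lambda t: 1 / (1 + np.exp(-t))):  # Sigmoid
    """Equivariant gate: l = 0 scalars pass through phi1, while each
    l != 0 channel is rescaled by phi2 of its gating scalar. Rescaling
    by a rotation-invariant factor preserves equivariance."""
    return phi1(x1_scalar), phi2(x2_scalar)[:, None] * x_higher
```

Because the gate only multiplies higher-l features by invariant scalars, nonlinearity is introduced without mixing the m components within an irreducible representation.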
The ENN is implemented with the e3nn library^{31} (version 0.3.5) and PyTorch^{47} (version 1.9.0). The Gaussian basis expansion used as input to the EquiConv layer has a length of 128. The fully connected neural network in the EquiConv layer is composed of two hidden layers of 64 neurons each, with the SiLU function as the nonlinear activation and a linear output layer. A description of the neural network hyperparameters for each material system, and the strategy for selecting them, can be found in Supplementary Note 4.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The datasets for monolayer graphene, monolayer MoS_{2}, bilayer graphene, and bilayer bismuthene are available in ref. ^{39}. The dataset for bilayer Bi_{2}Se_{3} is available in ref. ^{40}, and the dataset for bilayer Bi_{2}Te_{3} in ref. ^{41}. Instructions for reproducing the DeepH-E3 models on these datasets can also be found in the corresponding repositories. Source data are provided with this paper.
Code availability
The code used in the current study is available at GitHub (https://github.com/XiaoxunGong/DeepHE3) and Zenodo^{48}.
References
Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
Zhang, L., Han, J., Wang, H., Car, R. & E, W. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192 (2017).
Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet—a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
Unke, O. T. et al. SpookyNet: learning force fields with electronic degrees of freedom and nonlocal effects. Nat. Commun. 12, 7273 (2021).
Gasteiger, J., Becker, F. & Günnemann, S. GemNet: universal directional graph neural networks for molecules. Adv. Neural Inf. Process. Syst. 34, 6790–6802 (2021).
Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).
Duvenaud, D. K. et al. Convolutional networks on graphs for learning molecular fingerprints. Adv. Neural Inf. Process. Syst. 28, 2224–2232 (2015).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. PMLR 70, 1263–1272 (2017).
Chandrasekaran, A. et al. Solving the electronic structure problem with machine learning. NPJ Comput. Mater. 5, 22 (2019).
Schütt, K. T., Gastegger, M., Tkatchenko, A., Müller, K.-R. & Maurer, R. J. Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nat. Commun. 10, 5024 (2019).
Gasteiger, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In Proceedings of the Eighth International Conference on Learning Representations, https://openreview.net/forum?id=B1eWbxStPH (ICLR, 2020).
Anderson, B., Hy, T. S. & Kondor, R. Cormorant: covariant molecular neural networks. Adv. Neural Inf. Process. Syst. 32, 14537–14546 (2019).
Schütt, K., Unke, O. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. PMLR 139, 9377–9388 (2021).
Qiao, Z. et al. Informing geometric deep learning with electronic interactions to accelerate quantum chemistry. Proc. Natl Acad. Sci. USA 119, e2205221119 (2022).
Jørgensen, P. B., Jacobsen, K. W. & Schmidt, M. N. Neural message passing with edge updates for predicting properties of molecules and materials. Preprint at https://arxiv.org/abs/1806.03146 (2018).
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678 (2019).
Su, M., Yang, J.H., Xiang, H.J. & Gong, X.G. Efficient prediction of density functional theory Hamiltonian with graph neural network. Preprint at https://arxiv.org/abs/2205.05475 (2022).
Cohen, T. & Welling, M. Group equivariant convolutional networks. PMLR 48, 2990–2999 (2016).
Thomas, N. et al. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at https://arxiv.org/abs/1802.08219 (2018).
Weiler, M., Geiger, M., Welling, M., Boomsma, W. & Cohen, T. S. 3D steerable CNNs: learning rotationally equivariant features in volumetric data. Adv. Neural Inf. Process. Syst. 31, 10381–10392 (2018).
Kondor, R., Lin, Z. & Trivedi, S. Clebsch–Gordan nets: a fully Fourier space spherical convolutional neural network. Adv. Neural Inf. Process. Syst. 31, 10117–10126 (2018).
Geiger, M. & Smidt, T. e3nn: Euclidean neural networks. Preprint at https://arxiv.org/abs/2207.09453 (2022).
Li, H. et al. Deep-learning density functional theory Hamiltonian for efficient ab initio electronic-structure calculation. Nat. Comput. Sci. 2, 367 (2022).
Unke, O. T. et al. SE(3)-equivariant prediction of molecular wavefunctions and electronic densities. Adv. Neural Inf. Process. Syst. 34, 14434–14447 (2021).
Nigam, J., Willatt, M. J. & Ceriotti, M. Equivariant representations for molecular Hamiltonians and N-center atomic-scale properties. J. Chem. Phys. 156, 014115 (2022).
Zhang, L. et al. Equivariant analytical mapping of first principles Hamiltonians to accurate and transferable materials models. NPJ Comput. Mater. 8, 158 (2022).
Prodan, E. & Kohn, W. Nearsightedness of electronic matter. Proc. Natl Acad. Sci. USA 102, 11635 (2005).
Battaglia, P. W. et al. Relational inductive biases, deep learning, and graph networks. Preprint at https://arxiv.org/abs/1806.01261 (2018).
Geiger, M. et al. e3nn/e3nn: 2021-08-27. Zenodo https://doi.org/10.5281/zenodo.5292912 (2021).
Cao, Y. et al. Correlated insulator behaviour at half-filling in magic-angle graphene superlattices. Nature 556, 80 (2018).
Cao, Y. et al. Unconventional superconductivity in magic-angle graphene superlattices. Nature 556, 43 (2018).
Liu, B. et al. Higher-order band topology in twisted Moiré superlattice. Phys. Rev. Lett. 126, 066401 (2021).
Lucignano, P., Alfè, D., Cataudella, V., Ninno, D. & Cantele, G. Crucial role of atomic corrugation on the flat bands and energy gaps of twisted bilayer graphene at the magic angle θ ~ 1.08^{∘}. Phys. Rev. B 99, 195419 (2019).
Fukui, T., Hatsugai, Y. & Suzuki, H. Chern numbers in discretized Brillouin zone: efficient method of computing (spin) Hall conductances. J. Phys. Soc. Jpn. 74, 1674 (2005).
Hoshi, T., Yamamoto, S., Fujiwara, T., Sogabe, T. & Zhang, S.-L. An order-N electronic structure theory with generalized eigenvalue equations and its application to a ten-million-atom system. J. Phys. Condens. Matter 24, 165502 (2012).
Giustino, F. Electron-phonon interactions from first principles. Rev. Mod. Phys. 89, 015003 (2017).
Gong, X. et al. Dataset1 for “General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian”. Zenodo https://doi.org/10.5281/zenodo.7553640 (2023).
Gong, X. et al. Dataset2 for “General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian”. Zenodo https://doi.org/10.5281/zenodo.7553827 (2023).
Gong, X. et al. Dataset3 for “General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian”. Zenodo https://doi.org/10.5281/zenodo.7553843 (2023).
Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169 (1996).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 78, 1396 (1997).
Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953 (1994).
Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758 (1999).
Becke, A. D. & Johnson, E. R. A density-functional model of the dispersion interaction. J. Chem. Phys. 123, 154101 (2005).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019).
Gong, X. et al. Code for “General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian”. Zenodo https://doi.org/10.5281/zenodo.7554314 (2023).
Acknowledgements
This work was supported by the Basic Science Center Project of NSFC (grant no. 52388201), the National Science Fund for Distinguished Young Scholars (grant no. 12025405), the National Natural Science Foundation of China (grant no. 11874035), the Ministry of Science and Technology of China (grant nos. 2018YFA0307100 and 2018YFA0305603), the Beijing Advanced Innovation Center for Future Chip (ICFC), and the Beijing Advanced Innovation Center for Materials Genome Engineering. R.X. was funded by the China Postdoctoral Science Foundation (grant no. 2021TQ0187).
Author information
Authors and Affiliations
Contributions
Y.X. and W.D. proposed the project and supervised X.G. and H.L. in carrying out the research, with the help of N.Z. and R.X. All authors discussed the results. Y.X. and X.G. prepared the manuscript with input from the other coauthors.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gong, X., Li, H., Zou, N. et al. General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian. Nat. Commun. 14, 2848 (2023). https://doi.org/10.1038/s41467-023-38468-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-023-38468-8
This article is cited by
- Accelerating the calculation of electron–phonon coupling strength with machine learning. Nature Computational Science (2024)
- Designing semiconductor materials and devices in the post-Moore era by tackling computational challenges with data-driven strategies. Nature Computational Science (2024)
- A deep equivariant neural network approach for efficient hybrid density functional calculations. Nature Communications (2024)
- Scalable crystal structure relaxation using an iteration-free deep generative model with uncertainty quantification. Nature Communications (2024)
- Generalizing deep learning electronic structure calculation to the plane-wave basis. Nature Computational Science (2024)