Introduction

It has been well recognized that deep-learning methods offer a potential solution to the accuracy-efficiency dilemma of ab initio material calculations. Deep-learning potentials1,2 and a series of other neural network models3,4,5,6,7 are capable of predicting the total energies and atomic forces of given material structures, enabling molecular dynamics simulations at large length and time scales. The paradigm has been extended to the deep learning of various kinds of physical and chemical properties8,9,10,11,12,13,14,15,16,17,18,19. During the development of these methods, researchers have gradually come to realize that introducing symmetry considerations into neural networks as a priori knowledge is of crucial importance to the deep-learning approaches. For this purpose, insights have been drawn from a class of neural networks called equivariant neural networks (ENNs)20,21,22,23,24. The key innovation of ENNs is that all internal features transform under the same symmetry group as the input; thus, the symmetry requirements are explicitly treated and exactly satisfied. Symmetry is of fundamental importance in physics, so ENNs are especially advantageous when applied to the modeling of physical systems, as demonstrated by a series of neural network models for various material properties6,7,13,14,15.

Recently, a deep neural network representation of the density functional theory (DFT) Hamiltonian (named DeepH) was developed by employing the locality of electronic matter, localized basis sets, and local coordinate transformations25. With the DeepH approach, the computationally demanding self-consistent field iterations can be bypassed, and all electron-related physical quantities in the single-particle picture can, in principle, be derived efficiently. This opens opportunities for the electronic structure calculation of large-scale material systems. However, it is highly nontrivial to incorporate symmetry considerations into DeepH. Specifically, the property that the Hamiltonian matrix changes covariantly (i.e., equivariantly) under rotations or gauge transformations should be preserved by the neural network model for efficient learning and accurate prediction (Fig. 1). DeepH addresses this by applying local coordinate transformations that turn the rotation-covariant problem into an invariant one, so that the transformed Hamiltonian matrices can be learned flexibly by rotation-invariant neural networks25. Nevertheless, the large amount of local coordinate information considerably increases the computational load, and the model performance depends critically on a proper selection of local coordinates, which relies on human intuition and is not easy to optimize. Therefore, combining DeepH with ENNs might open new possibilities for the deep-learning modeling of Hamiltonians.

Fig. 1: Equivariance in electronic structure calculations.

a Schematic wavefunctions and b Hamiltonian matrices are shown for systems neglecting or including spin–orbit coupling (SOC). Structures a1 and a2 are related to each other by a 90° rotation (see (a)); their hopping parameters (i.e., Hamiltonian matrix elements) between px orbitals are related by a unitary transformation (see (b)). This equivariant property of the Hamiltonian must be preserved in all electronic structure calculations. When SOC is taken into account, the spin and orbital degrees of freedom are coupled and must transform together under global rotations, as shown for structures b1 and b2.

Attempts have already been made to model the Hamiltonian using equivariant methods. Unke et al. designed PhiSNet26, which uses an ENN to predict the Hamiltonians of molecules with fixed system size. Nigam et al. used rotationally equivariant N-center features in the kernel ridge regression method to fit molecular Hamiltonians27. Zhang et al. proposed an equivariant scheme to parameterize the Hamiltonians of crystals based on the atomic cluster expansion descriptor28. However, the key capability of DeepH, namely learning from DFT results on small material systems and predicting the electronic structures of much larger ones, has not been demonstrated by these methods. More critically, the existing equivariant methods have neglected equivariance in the spin degrees of freedom, although electronic spin and spin–orbit coupling (SOC) play a key role in modern condensed matter physics and materials science. With SOC, one must handle the spin–orbital Hamiltonian, whose spin and orbital degrees of freedom are coupled and transform together under a change of coordinate system or basis set, as illustrated in Fig. 1. This raises critical difficulties in designing ENN models due to a fundamental change of symmetry group. In this context, the incorporation of ENN models into DeepH is essential but remains elusive.

In this work, we propose DeepH-E3, a universal E(3)-equivariant deep-learning framework to represent the spin–orbital DFT Hamiltonian \(\hat{H}_{\mathrm{DFT}}\) as a function of atomic structure \(\{\mathcal{R}\}\) by neural networks, which enables efficient electronic structure calculations of large-scale materials at ab initio accuracy. A general theoretical basis is developed to explicitly incorporate the covariant transformation requirements of \(\{\mathcal{R}\}\mapsto \hat{H}_{\mathrm{DFT}}\) into neural network models that properly take electronic spin and SOC into account, and a code implementation of DeepH-E3 based on the message-passing neural network is also presented. Since the principle of covariance is automatically satisfied, efficient learning and accurate prediction become feasible with the DeepH-E3 method. Our systematic experiments demonstrate the state-of-the-art performance of DeepH-E3, which achieves sub-meV accuracy in predicting the DFT Hamiltonian. The method works well for various kinds of material systems, such as magic-angle twisted bilayer graphene and twisted van der Waals materials in general, and the computational costs are reduced by several orders of magnitude compared to direct DFT calculations. Benefiting from its high efficiency and accuracy as well as its good transferability, DeepH-E3 could find promising applications in electronic structure calculations. We also expect that the proposed neural network framework can be generally applied to develop deep-learning ab initio methods and that these interdisciplinary developments will eventually revolutionize future materials research.

Results

Realization of equivariance

It has long been established as one of the fundamental principles of physics that all physical quantities must transform equivariantly between reference frames. Formally, a mapping f : X → Y between vector spaces X and Y is equivariant with respect to a group G if DY(g) ∘ f = f ∘ DX(g), ∀g ∈ G, where DX, DY are representations of G over X, Y, respectively. The problem considered in this work is the equivariance of the mapping from the material structure \(\{\mathcal{R}\}\) (atom types and positions) to the DFT Hamiltonian \(\hat{H}_{\mathrm{DFT}}\) with respect to the E(3) group, the Euclidean group in three-dimensional (3D) space, which contains translations, rotations, and inversion. Translation symmetry is manifest since we work only with the relative positions between atoms, not their absolute positions. Rotations of coordinates introduce nontrivial transformations, which should be carefully investigated. Suppose the same point in space is specified in two coordinate systems by r and r′. If the coordinate systems are related by a rotation, the transformation rule between the coordinates of the point is r′ = Rr, where R is a 3 × 3 orthogonal matrix.

In order to take advantage of the nearsightedness of electronic matter29, the Hamiltonian operator is expressed in a basis of localized pseudo-atomic orbitals (PAOs). The basis separates into radial and angular parts, having the form \(\phi_{i\alpha}(\mathbf{r})=R_{ipl}(r)\,Y_{lm}(\hat{r})\). Here i is the site index, α ≡ (plm), where p is the multiplicity index, Ylm is the spherical harmonic with angular momentum quantum number l and magnetic quantum number m, r ≡ ∣r − ri∣ and \(\hat{r}\equiv (\mathbf{r}-\mathbf{r}_{i})/|\mathbf{r}-\mathbf{r}_{i}|\), where ri is the position of the ith atom. The transformation rule for the Hamiltonian matrix between the two coordinate systems described above is

$$\left(H'_{ip_1,jp_2}\right)^{l_1l_2}_{m_1m_2}=\sum_{m_1'=-l_1}^{l_1}\sum_{m_2'=-l_2}^{l_2}D^{l_1}_{m_1m_1'}(\mathbf{R})\,D^{l_2}_{m_2m_2'}(\mathbf{R})^{*}\left(H_{ip_1,jp_2}\right)^{l_1l_2}_{m_1'm_2'},$$
(1)

where \(D^{l}_{mm'}(\mathbf{R})\) is the Wigner D-matrix. The equivariance of the mapping \(\{\mathcal{R}\}\mapsto \hat{H}_{\mathrm{DFT}}\) requires that, if a change of coordinates transforms the positions of the atoms, the corresponding Hamiltonian matrix must transform covariantly according to Eq. (1).
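
To make the rule concrete, the following minimal sketch (in Python, using the e3nn library adopted later in this work; the modern o3 API is assumed) applies Eq. (1) to a single orbital block. In e3nn's real spherical-harmonics basis, the Wigner D-matrices of integer l are real orthogonal, so the complex conjugate in Eq. (1) reduces to a transpose.

```python
import torch
from e3nn import o3

def rotate_H_block(H, R, l1, l2):
    """Eq. (1) for one orbital block: H' = D^{l1}(R) H D^{l2}(R)^dagger."""
    D1 = o3.Irrep(l1, (-1) ** l1).D_from_matrix(R)   # (2l1+1, 2l1+1), real
    D2 = o3.Irrep(l2, (-1) ** l2).D_from_matrix(R)
    return D1 @ H @ D2.T                             # real basis: dagger -> transpose

l1, l2 = 1, 2
H = torch.randn(2 * l1 + 1, 2 * l2 + 1)              # a random p-d hopping block
R1, R2 = o3.rand_matrix(), o3.rand_matrix()

# Consistency check: rotating by R1 and then by R2 equals rotating by R2 @ R1.
H_a = rotate_H_block(rotate_H_block(H, R1, l1, l2), R2, l1, l2)
H_b = rotate_H_block(H, R2 @ R1, l1, l2)
assert torch.allclose(H_a, H_b, atol=1e-5)
```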

An ENN is applied to construct the mapping \(\{\mathcal{R}\}\mapsto \hat{H}_{\mathrm{DFT}}\) in order to preserve equivariance. The input, output, and internal feature vectors of ENNs all belong to a special set of vectors of the form xl = (xl,l, …, xl,−l) that transform according to the following rule:

$$x'_{lm}=\sum_{m'=-l}^{l}D^{l}_{mm'}(\mathbf{R})\,x_{lm'}.$$
(2)

This vector is said to carry the irreducible representation of the SO(3) group of dimension 2l + 1. If the input vectors are transformed according to Eq. (2), then all the internal features and the output vectors of the ENN transform accordingly. Under this constraint, the ENN incorporates learnable parameters in order to model equivariant relationships between inputs and outputs.
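
As a quick check of Eq. (2), the sketch below (again assuming the e3nn o3 API) verifies that spherical-harmonics features, which serve as inputs in the construction described next, transform exactly by the Wigner D-matrix: Y(Rr) = Dl(R)Y(r).

```python
import torch
from e3nn import o3

l = 2
r = torch.randn(3)
R = o3.rand_matrix()                              # random proper rotation
D = o3.Irrep(l, (-1) ** l).D_from_matrix(R)       # D^l(R) of Eq. (2)

# Rotating the input first, or the output feature after, gives the same result.
y1 = o3.spherical_harmonics(l, R @ r, normalize=True)
y2 = D @ o3.spherical_harmonics(l, r, normalize=True)
assert torch.allclose(y1, y2, atol=1e-5)
```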

The method of constructing the equivariant mapping \(\{\mathcal{R}\}\mapsto \hat{H}_{\mathrm{DFT}}\) is illustrated in Fig. 2. The atomic numbers Zi and interatomic distances rij ≡ ∣ri − rj∣ are used to construct the l = 0 input vectors (scalars). Spherical harmonics acting on the unit vectors of relative positions \(\hat{r}_{ij}\) constitute input vectors of l = 1, 2, …. The output vectors of the ENN are passed through the Wigner–Eckart layer before representing the final Hamiltonian. This layer exploits the essential concept underlying the Wigner–Eckart theorem:

$${l}_{1}\otimes {l}_{2}=|{l}_{1}-{l}_{2}|\oplus \cdots \oplus ({l}_{1}+{l}_{2}).$$
(3)

The “⊕” and “⊗” signs stand for the direct sum and tensor product of representations, respectively. “=” denotes equivalence of representations, i.e., the two sides differ from each other by a change of basis. The coefficients of this change of basis are exactly the celebrated Clebsch–Gordan coefficients. The representation l1 ⊗ l2 is carried by the tensor \(\mathbf{x}_{l_1l_2}\), which transforms according to the rule

$$x'_{l_1l_2m_1m_2}=\sum_{m_1'=-l_1}^{l_1}\sum_{m_2'=-l_2}^{l_2}D^{l_1}_{m_1m_1'}(\mathbf{R})\,D^{l_2}_{m_2m_2'}(\mathbf{R})\,x_{l_1l_2m_1'm_2'}.$$
(4)

Notice that Eq. (4) has the same form as Eq. (1), so the tensor \(\mathbf{x}_{l_1l_2}\) can exactly represent the output Hamiltonian while satisfying the equivariance requirements.
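
A minimal sketch of such a Wigner–Eckart layer for one spinless orbital pair is given below (assuming the e3nn o3 API; o3.wigner_3j provides real coefficients proportional to the Clebsch–Gordan coefficients, and the normalization factors are assumed to be absorbed into the learnable weights of the preceding layers).

```python
import torch
from e3nn import o3

def wigner_eckart_block(vectors, l1, l2):
    """Combine ENN output vectors x_l, l = |l1-l2| .. l1+l2, into one block.

    vectors: dict {l: tensor of shape (2l+1,)} produced by the ENN.
    returns: tensor of shape (2l1+1, 2l2+1) transforming as Eq. (1).
    """
    H = torch.zeros(2 * l1 + 1, 2 * l2 + 1)
    for l in range(abs(l1 - l2), l1 + l2 + 1):
        C = o3.wigner_3j(l1, l2, l)            # shape (2l1+1, 2l2+1, 2l+1)
        H = H + torch.einsum("ijk,k->ij", C, vectors[l])
    return H

# Example of Fig. 2: a block between l1 = 1 and l2 = 2 orbitals is assembled
# from output vectors of l = 1, 2, 3 (the rule 1 + 2 + 3 = 1 x 2).
vecs = {l: torch.randn(2 * l + 1) for l in (1, 2, 3)}
block = wigner_eckart_block(vecs, 1, 2)        # a 3 x 5 matrix block
```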

Fig. 2: Method of constructing an equivariant mapping \(\{\mathcal{R}\}\mapsto \hat{H}_{\mathrm{DFT}}\).

Take the Hamiltonian matrix between l = 1 and l = 2 orbitals, for example. a The atomic numbers Zi and interatomic distances rij are used to construct the l = 0 vectors, and the unit vectors of relative positions \(\hat{r}_{ij}\) are used to construct vectors of l = 1, 2, …. b These vectors are passed to the equivariant neural network. c If neglecting spin–orbit coupling (SOC), the output vectors of the neural network are converted to the Hamiltonian via the Wigner–Eckart layer using the rule 1 ⊕ 2 ⊕ 3 = 1 ⊗ 2. If including SOC, the output consists of two sets of real vectors, which are combined to form complex-valued vectors. These vectors are converted to the spin–orbital DFT Hamiltonian according to a different rule, \((1\oplus 2\oplus 3)\oplus (0\oplus 1\oplus 2)\oplus (1\oplus 2\oplus 3)\oplus (2\oplus 3\oplus 4)=(1\otimes \frac{1}{2})\otimes (2\otimes {\frac{1}{2}}^{*})\).

Equivariance of the spin–orbital Hamiltonian

If we further consider the spin degrees of freedom, the transformation rule for the Hamiltonian becomes

$$\left(H'_{ip_1,jp_2}\right)^{l_1\frac{1}{2}l_2\frac{1}{2}}_{m_1\sigma_1m_2\sigma_2}=\sum_{m_1'=-l_1}^{l_1}\sum_{m_2'=-l_2}^{l_2}\sum_{\sigma_1'=\uparrow,\downarrow}\sum_{\sigma_2'=\uparrow,\downarrow}D^{l_1}_{m_1m_1'}(\mathbf{R})\,D^{\frac{1}{2}}_{\sigma_1\sigma_1'}(\mathbf{R})\,D^{l_2}_{m_2m_2'}(\mathbf{R})^{*}\,D^{\frac{1}{2}}_{\sigma_2\sigma_2'}(\mathbf{R})^{*}\left(H_{ip_1,jp_2}\right)^{l_1\frac{1}{2}l_2\frac{1}{2}}_{m_1'\sigma_1'm_2'\sigma_2'},$$
(5)

where σ1, σ2 are the spin indices (up or down). The construction of the spin–orbital DFT Hamiltonian is a far more complicated issue. The electron spin carries angular momentum l = 1/2, so it seems that tedious coding and debugging are unavoidable: complex-valued half-integer representations would have to be introduced into the neural network, which for the time being typically supports only real-valued integer representations. Furthermore, a 2π rotation brings a vector in 3D space back to itself but introduces a factor of −1 to a spin-1/2 vector. This means that any mapping from 3D input vectors to l = 1/2 output vectors is discontinuous and cannot be modeled by neural networks, which poses a serious threat to our approach since we only have 3D vectors as input to the neural network (Fig. 2).

Fortunately, we observe that the appearance of l = 1/2 in the DFT Hamiltonian does not mean that half-integer representations must be inserted throughout the neural network. In fact, they can be confined to the final output layer once we employ the transformation rule:

$$\left({l}_{1}\otimes \frac{1}{2}\right)\otimes \left({l}_{2}\otimes \frac{1}{2}\right)=\left({l}_{1}\otimes {l}_{2}\right)\otimes (0\oplus 1).$$
(6)

There is no half-integer representation on the right-hand side; thus, it can be further decomposed into integer representations by repeatedly applying Eq. (3).
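
The bookkeeping of this decomposition can be made explicit with a few lines of Python (a helper written for this text, not part of the DeepH-E3 code):

```python
def couple(l1, l2):
    """Eq. (3): l1 x l2 = |l1 - l2| + ... + (l1 + l2)."""
    return list(range(abs(l1 - l2), l1 + l2 + 1))

def integer_content(l1, l2):
    """Integer-l content of (l1 x 1/2) x (l2 x 1/2), via Eq. (6)."""
    out = []
    for l in couple(l1, l2):       # decompose l1 x l2 first ...
        for s in (0, 1):           # ... then couple with 0 + 1 from the spins
            out += couple(l, s)
    return sorted(out)

# For l1 = 1, l2 = 2 this reproduces the multiset quoted in Fig. 2,
# (1+2+3) + (0+1+2) + (1+2+3) + (2+3+4):
print(integer_content(1, 2))       # [0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4]
```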

Another problem is associated with the introduction of complex numbers. In general, the spin–orbital Hamiltonian matrix elements are complex-valued, and the ENN cannot simply predict the real and imaginary parts separately, because doing so would violate equivariance. Neural networks over complex numbers are still largely at an experimental stage of development, so using a complex-valued ENN is practically difficult, if not impossible. Nevertheless, we have found a way to sidestep this problem. In a basis of eigenvectors of the time-reversal operator, the D-matrices of integer l become purely real. Consequently, for a vector with integer l in that basis, the real and imaginary parts never mix when the vector is multiplied by a real transformation matrix. One complex vector can therefore be treated technically as two real vectors while preserving equivariance. Note that this is not true for half-integer representations, so the real and imaginary parts must be recombined before the integer representations are converted to half-integer ones in the Wigner–Eckart layer (Fig. 2).

Yet another subtle issue arises in Eq. (5). It is not exactly of the same form as Eq. (4): two of the D-matrices in the former are complex conjugated, whereas those in the latter are not. In fact, instead of constructing a vector carrying the representation \((l_1\otimes \frac{1}{2})\otimes (l_2\otimes \frac{1}{2})\), we must construct \((l_1\otimes \frac{1}{2})\otimes (l_2^{*}\otimes {\frac{1}{2}}^{*})\) to represent the spin–orbital Hamiltonian of Eq. (5). Here, l* denotes the representation whose representation matrices are replaced by their complex conjugates. This is not a problem for integer l but is critical for l = 1/2; if not treated properly, the overall equivariance is violated. To solve this problem, we first notice that the representation l* is still a representation of the SU(2) group with dimension 2l + 1. In fact, it is guaranteed to be equivalent to the representation l without the complex conjugate. In other words, there must exist a unitary matrix Pl for each integer or half-integer l satisfying

$$\mathbf{D}^{l}(g)^{*}=\mathbf{P}^{l}\,\mathbf{D}^{l}(g)\,(\mathbf{P}^{l})^{\dagger},\quad \forall g\in \mathrm{SU}(2).$$
(7)

This is guaranteed by the fact that the quantum rotation operator \(\hat{U}(g)\) commutes with the time-reversal operator \(\mathcal{T}\): \(\langle lm|\hat{U}(g)|lm'\rangle=\langle lm|\mathcal{T}^{\dagger}\hat{U}(g)\mathcal{T}|lm'\rangle=(-1)^{m-m'}\langle l,-m|\hat{U}(g)|l,-m'\rangle^{*}\). The matrix Pl in Eq. (7) is thus given by

$$P^{l}_{mm'}=(-1)^{l-m}\,\delta_{m,-m'}.$$
(8)

Therefore, we only need to apply a change of basis to convert a vector carrying representation l to a vector carrying l*. Notice that this property holds even for material systems without time-reversal symmetry.
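
For l = 1/2, Eqs. (7) and (8) can be checked numerically with ordinary spin-1/2 rotation matrices (a sketch using NumPy and SciPy only; the basis is ordered m = +1/2, −1/2 as in the main text, for which Eq. (8) gives P = [[0, 1], [−1, 0]]):

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def D_half(axis, theta):
    """Spin-1/2 rotation matrix D^{1/2} = exp(-i theta (n . sigma) / 2)."""
    n = np.asarray(axis, dtype=float)
    n /= np.linalg.norm(n)
    return expm(-0.5j * theta * (n[0] * sx + n[1] * sy + n[2] * sz))

P = np.array([[0, 1], [-1, 0]], dtype=complex)   # Eq. (8) for l = 1/2

rng = np.random.default_rng(0)
D = D_half(rng.normal(size=3), rng.uniform(0, 4 * np.pi))
# Eq. (7): the conjugate representation is related to D by the basis change P.
assert np.allclose(D.conj(), P @ D @ P.conj().T)
```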

The workflow of constructing the DFT Hamiltonian is summarized in Fig. 2. To construct a Hamiltonian with SOC, the output vectors from the ENN are first separated into two real components, then combined into complex vectors and passed to the Wigner–Eckart layer. The Wigner–Eckart layer uses the rules in Eqs. (3) and (6) to convert these vectors to tensors of the form in Eq. (4), except that the tensors here have rank 4 instead of 2. After that, the last spin index is converted to its complex-conjugate counterpart by the change of basis of Eq. (8) with l = 1/2. The output tensors follow the same transformation rule under coordinate rotation as the DFT Hamiltonian in Eq. (5) and can thus be used to represent the DFT Hamiltonian matrix.

Finally, we discuss two remaining issues. To include parity, we consider E(3) = SE(3) ⋊ {E, I}, where E is the identity and I is the spatial inversion. Under a coordinate transformation involving spatial inversion, a vector is multiplied by −1 if it has odd parity. The parity of the Hamiltonian block is determined by \((-1)^{l_1+l_2}\). In addition, there is a possible ambiguity in Eq. (5), since the mapping from a classical rotation R to a quantum rotation \(\mathbf{D}^{\frac{1}{2}}\) is not single-valued. However, the possible factor of −1 always cancels between the two D-matrices in that equation, which eliminates the potential problem.

The neural network architecture of DeepH-E3

Here we present the neural network architecture of the DeepH-E3 method; an illustration can be found in Fig. 3. The general structure is based on the message-passing neural network9,30 that has been widely used in materials research6,7,14,15,16,17,18,19,25. The material structure is represented by a graph, where each atom is associated with a vertex (or node). Edges connect atom pairs with nonzero inter-site hopping, and self-loop edges are included to describe intra-site coupling (see the sketch after this paragraph). Every vertex i is associated with a feature vi and every edge ij with a feature eij; these features are composed of several vectors of the form defined in Eq. (2). As illustrated in Fig. 3a, the initial feature \(\mathbf{v}_i^{(0)}\) of vertex i is the trainable embedding of the atomic number Zi, and the initial eij is the interatomic distance rij expanded in the Gaussian basis eB(rij) defined in Eq. (11). The features of vertices and edges are iteratively updated using the features of their neighborhoods as incoming messages. After the last update, the edge feature eij is passed through a linear layer and used to construct the Hamiltonian matrix block Hij between atoms i and j by the method illustrated in Fig. 2. It is worth mentioning that, under the message-passing scheme, the output Hamiltonian is influenced only by the information of its neighborhood environment. The nearsightedness property29 ensures efficient linear-scaling calculations as well as good generalization ability25.
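
A minimal sketch of this graph construction follows (plain NumPy; periodic images are ignored for brevity, and in practice the cutoff would be set by the ranges of the PAO basis functions):

```python
import numpy as np

def build_graph(positions, cutoff):
    """positions: (N, 3) array of atomic positions; returns directed edges."""
    n = len(positions)
    edges = [(i, i) for i in range(n)]               # self-loops: H_ii blocks
    for i in range(n):
        for j in range(n):
            if i != j and np.linalg.norm(positions[i] - positions[j]) < cutoff:
                edges.append((i, j))                 # inter-site H_ij blocks
    return edges
```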

Fig. 3: Neural network architecture of DeepH-E3.

a Overall network structure. Elemental embeddings and Gaussian expansions (see (b)) serve as initial vertex and edge features, respectively. The vertex and edge features are updated L times by update blocks (see (d) and (e)), which encode the interatomic distances and directional information through equivariant convolutions (EquiConv, see (c)). The “⊙” sign stands for channel-wise multiplication and “∥” for vector concatenation. \(\mathcal{N}_i\) is the neighborhood of vertex i. The final edge vectors \(\{\mathbf{e}_{ij}^{(L)}\}\) are passed into the Wigner–Eckart layer depicted in Fig. 2 to represent the DFT Hamiltonian.

The equivariant building blocks of the neural network are implemented using the scheme provided by Tensor-Field Networks21 and e3nn24,31. The feature vectors \(x^{(l)}_{cm}\) processed by these blocks are implemented as dictionaries keyed by the integer l, the order of the SO(3) representation. The channel index c ranges from 1 to n(l), where n(l) is the number of channels at order l, and each channel refers to a vector of the form defined in Eq. (2).

The E3Linear layer defined in Eq. (12) possesses learnable weights and biases, similar to the linear layers of conventional neural networks, but connects only vectors of the same representation in order to preserve equivariance. The gate layer introduces equivariant nonlinearity, as proposed in ref. 22: nonlinearly activated l = 0 vectors (i.e., scalars) are used as scaling factors (“gates”) for the norms of l ≠ 0 vectors.

We propose a normalization scheme, E3LayerNorm, that normalizes the feature vectors using mean and variance obtained from the layer statistics while preserving equivariance:

$$\mathrm{E3LayerNorm}(\mathbf{v}_i)^{(l)}_{cm}=g^{(l)}_{c}\,\frac{(\mathbf{v}_i)^{(l)}_{cm}-\mu^{(l)}_{m}}{\sigma^{(l)}+\epsilon}+b^{(l)}_{c},$$
(9)

where ϵ is introduced to maintain numerical stability, \(g^{(l)}_c, b^{(l)}_c\) are learnable affine parameters, the mean is \(\mu^{(l)}_m=\frac{1}{Nn^{(l)}}\sum_{i=1}^{N}\sum_{c=1}^{n^{(l)}}(\mathbf{v}_i)^{(l)}_{cm}\), the variance is \((\sigma^{(l)})^2=\frac{1}{Nn^{(l)}}\sum_{i=1}^{N}\sum_{c=1}^{n^{(l)}}\sum_{m=-l}^{l}|(\mathbf{v}_i)^{(l)}_{cm}-\mu^{(l)}_m|^2\), and N is the total number of vertices. Only the E3LayerNorm for vertex update blocks is described here; the corresponding E3LayerNorm for edge update blocks is analogous, with the mean and variance obtained from edge features instead of vertex features. We find that E3LayerNorm significantly stabilizes the training process. A discussion of the use of E3LayerNorm can be found in Supplementary Note 5.
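
A condensed PyTorch sketch of E3LayerNorm for the features of a single order l reads as follows (written for this text; as with E3Linear in Eq. (12), the bias is assumed to act only on l = 0 features, since a constant shift of l ≠ 0 components would break equivariance):

```python
import torch

def e3_layer_norm(v, g, b, eps=1e-5):
    """Eq. (9) for one order l.

    v: (N, n_l, 2l+1) stacked vertex features; g, b: (n_l,) affine parameters.
    """
    mu = v.mean(dim=(0, 1))                      # mu_m^(l), shape (2l+1,)
    var = (v - mu).pow(2).sum(-1).mean()         # (sigma^(l))^2, a scalar
    out = g.view(1, -1, 1) * (v - mu) / (var.sqrt() + eps)
    if v.shape[-1] == 1:                         # l = 0 only: add the bias
        out = out + b.view(1, -1, 1)
    return out
```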

The previously discussed blocks do not include coupling between different l’s. This problem is resolved by the tensor product layer:

$$z^{(l)}_{cm}=\sum_{l_1l_2}\sum_{m_1m_2}\sum_{c_1c_2}C^{lm}_{l_1m_1;l_2m_2}\left(U^{(l_1)}_{cc_1}x^{(l_1)}_{c_1m_1}\right)\left(V^{(l_2)}_{cc_2}y^{(l_2)}_{c_2m_2}\right),$$
(10)

where \(C^{lm}_{l_1m_1;l_2m_2}\) are the Clebsch–Gordan coefficients and \(U^{(l)}_{cc'}, V^{(l)}_{cc'}\) are learnable weights. This is abbreviated as z = (Ux) ⊗ (Vy).
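
Up to conventions, this layer corresponds to a fully connected tensor product with learnable channel-mixing weights, e.g., in e3nn (channel counts below are illustrative):

```python
from e3nn import o3

irreps_x = o3.Irreps("8x0e + 8x1o + 8x2e")   # illustrative channel counts
irreps_y = o3.Irreps("1x0e + 1x1o + 1x2e")
irreps_z = o3.Irreps("8x0e + 8x1o + 8x2e")

tp = o3.FullyConnectedTensorProduct(irreps_x, irreps_y, irreps_z)
x = irreps_x.randn(10, -1)                   # features on 10 edges
y = irreps_y.randn(10, -1)
z = tp(x, y)                                 # z = (Ux) tensor-product (Vy), Eq. (10)
```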

The neural network architecture is illustrated in Fig. 3. The equivariant convolution block (EquiConv, Fig. 3c) encodes the information of an edge and the vertices connected to it. Its core component is the tensor product (Eq. (10)) of the concatenated vertex and edge features (vi ∥ vj ∥ eij) with the spherical harmonics of edge ij, \(\mathbf{Y}(\hat{r}_{ij})\). Here ∥ stands for vector concatenation. The tensor product introduces the directional information of the material structure into the neural network. Propagating directional information through neural networks is important, as emphasized by previous works12,14, and is realized here in an elegant way via the tensor product. The interatomic distance information is also encoded into the neural network: it is expanded in the Gaussian basis and fed into a fully connected neural network, whose output is multiplied element-wise with the output of the gate nonlinearity.
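
For orientation, a minimal sketch of a distance-conditioned equivariant convolution in this spirit is given below. It follows the Tensor-Field Networks/e3nn pattern, where the distance MLP parameterizes the tensor-product weights; DeepH-E3's EquiConv instead multiplies the MLP output channel-wise with the gate output, but the equivariance logic is the same, and all names and sizes here are illustrative.

```python
import torch
from e3nn import o3

irreps_feat = o3.Irreps("16x0e + 8x1o + 4x2e")       # illustrative sizes
irreps_sh = o3.Irreps.spherical_harmonics(lmax=2)    # 1x0e + 1x1o + 1x2e

tp = o3.FullyConnectedTensorProduct(
    irreps_feat, irreps_sh, irreps_feat,
    shared_weights=False, internal_weights=False)    # weights supplied per edge
mlp = torch.nn.Sequential(                           # maps e_B(|r_ij|) to weights
    torch.nn.Linear(128, 64), torch.nn.SiLU(),
    torch.nn.Linear(64, 64), torch.nn.SiLU(),
    torch.nn.Linear(64, tp.weight_numel))

def equiconv(z_ij, r_ij, r_basis):
    """z_ij: (n_edges, dim) concatenated v_i, v_j, e_ij features;
    r_ij: (n_edges, 3) relative positions; r_basis: (n_edges, 128), Eq. (11)."""
    sh = o3.spherical_harmonics(irreps_sh, r_ij, normalize=True)
    return tp(z_ij, sh, mlp(r_basis))                # distance-conditioned output
```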

The vertex update block (Fig. 3d) aggregates information from the neighboring environment. To update a vertex, every edge connected to that vertex contributes a “message” generated by the EquiConv block. All messages are summed and normalized to update the vertex feature. The edge update block (Fig. 3e) is similar, except that only the output of EquiConv on edge ij is used to update eij. After several updates, the final edge feature vectors serve as the neural network output and are passed into the Wigner–Eckart layer to construct the Hamiltonian matrix blocks, as illustrated in Fig. 2. More details are described in “Methods”.

Capability of DeepH-E3

The incorporation of global Euclidean symmetry as a priori knowledge into the message-passing deep-learning framework gives DeepH-E3 outstanding performance in terms of efficiency and accuracy. A remarkable capability of DeepH-E3 is to learn from DFT data on small structures and make predictions on structures of different and larger sizes without further DFT calculations. This enables highly efficient electronic structure calculations of large-scale material systems at ab initio accuracy. All the DFT Hamiltonian matrices used for deep learning in this work are computed by the OpenMX code using the PAO basis. After example studies on monolayer graphene and MoS2 datasets, we first demonstrate the capability of DeepH-E3 by investigating twisted bilayer graphene (TBG), especially the well-known magic-angle TBG, whose DFT calculation is important but quite challenging due to its huge Moiré supercell. Next, we apply DeepH-E3 to twisted van der Waals (vdW) materials with strong SOC, including bilayers of bismuthene, Bi2Se3, and Bi2Te3, demonstrating the effectiveness of our equivariant approach for constructing the spin–orbital DFT Hamiltonian. Finally, we use the model to illustrate the SOC-induced topological quantum phase transition in twisted bilayer Bi2Te3, giving an example of exploring exotic physical properties in large-scale material systems.

Study of monolayer graphene and MoS2

Before moving to large-scale materials, we first validate our method on the datasets used in ref. 25 to benchmark DeepH-E3’s performance. The datasets comprise DFT supercell calculation results for monolayer graphene and MoS2, with different geometric configurations sampled from ab initio molecular dynamics. The test results are summarized in Table 1 and compared with those of the original DeepH method25, which, instead of using an explicitly equivariant approach, applied the local coordinate technique to handle the covariant transformation property of the Hamiltonian. Our experiments show that the mean absolute errors (MAEs) of Hamiltonian matrix elements averaged over atom pairs are all within a fraction of a meV and are reduced by roughly a factor of 2 or more for all prediction targets compared with DeepH. Benefiting from the high accuracy of the deep-learning DFT Hamiltonian, band structures predicted by DeepH-E3 accurately reproduce DFT results (Supplementary Fig. 1).

Table 1 MAEs of DFT Hamiltonian matrix elements averaged over atom pairs for monolayer graphene and MoS2, all in units of meV.a

Application to twisted vdW materials

Our deep-learning method is particularly useful for studying the electronic structure of twisted vdW materials. This class of materials has attracted great interest for research and applications, since their Moiré superperiodicity offers a new degree of freedom to tune many-body interactions and brings in emergent quantum phenomena, such as correlated states32, unconventional superconductivity33, and higher-order band topology34. Traditionally, it is challenging to perform computationally demanding DFT calculations on large Moiré structures. This challenge can be largely overcome by DeepH-E3: one may train the neural network models on DFT data of small, nontwisted, randomly perturbed structures and then predict the DFT Hamiltonian of arbitrarily twisted structures via deep learning, bypassing DFT, as illustrated in Fig. 4a. This procedure demands far less computational resources than direct DFT calculations on large twisted superstructures.

Fig. 4: Application of DeepH-E3 to study twisted bilayer graphene (TBG).

a Workflow of DeepH-E3. The neural network model is first trained on DFT data of small, nontwisted, randomly perturbed structures and then generalized to study arbitrarily twisted structures without invoking DFT anymore. b Performance of DeepH-E3 vs. the original DeepH method25 on TBGs of varying twist angle θ. The averaged mean absolute errors (MAEs) of the DFT Hamiltonian are displayed for Moiré supercells of varied sizes. c Band structure of the magic-angle TBG (θ = 1.08°, 11,164 atoms per supercell; structure relaxed in previous work35) computed by DeepH-E3, DFT, and the continuum model35. The DFT benchmark calculations were performed with a different code using a plane-wave basis instead of an atomic-like basis and a different pseudopotential, which could introduce numerical differences with respect to DeepH-E3. Source data are provided with this paper.

Once trained, the model can be applied to study TBGs of varying twist angles. Its performance is compared with that of DeepH. The test data include DFT results for systems containing more than one thousand atoms per supercell. As summarized in Fig. 4b, DeepH-E3 reduces the averaged MAEs of DFT Hamiltonian matrix elements by more than a factor of 2 compared to DeepH, consistent with the conclusion above. Moreover, the MAEs reach ultralow values of 0.2–0.3 meV and gradually decrease with increasing Moiré supercell size (i.e., decreasing twist angle). This demonstrates the good generalizability of DeepH-E3. The method is thus expected to be suitable for studying TBGs with small twist angles, which are of current interest35.

We take the magic-angle TBG with θ = 1.08° and 11,164 atoms per supercell as a special example. The discoveries of novel flat-band physics in this system have triggered enormous interest in twisted vdW materials. Owing to the large supercell, a DFT study of magic-angle TBG is a formidable task, but DeepH-E3 can routinely study such material systems in an accurate and efficient way. As shown in Fig. 4c, the electronic bands of magic-angle TBG with relaxed structure computed by DeepH-E3 agree well with the published results obtained by DFT and the low-energy effective continuum model35. The flat bands near the Fermi level are well reproduced. Some minor discrepancies appear away from the Fermi level, which can be partially explained by methodological differences: the benchmark work uses a plane-wave basis, whereas our work employs an atomic-like basis, and the pseudopotentials also differ. Detailed discussions of the influence of basis set and pseudopotential are included in Supplementary Note 2.

Most remarkably, DeepH-E3 can reduce the computational cost of studying these large material systems by several orders of magnitude. The DFT calculation (including structural relaxation) on magic-angle TBG performed in ref. 35 took around one month on about five thousand CPU cores. In contrast, the major computational cost of DeepH-E3 comes from neural network training. Typically, only a few hundred DFT training calculations are needed, and the training process usually takes tens of GPU hours; all of this needs to be done only once. Afterwards, DFT Hamiltonian matrices can be constructed very efficiently via neural network inference: for magic-angle TBG, the inference time is on the order of minutes on a single GPU and grows linearly with Moiré supercell size. Generalized eigenvalue problems are solved for 60 bands near the Fermi level to obtain the band dispersion, which requires only about 8 min per k-point for magic-angle TBG using 64 CPU cores. The low computational cost and high accuracy of DeepH-E3 demonstrate its potential for resolving the accuracy-efficiency dilemma of ab initio calculation methods, which would be highly valuable for future scientific research.

Study of twisted vdW materials with strong SOC

We have tested the performance of DeepH-E3 on twisted vdW materials with strong SOC, including twisted bilayers of bismuthene, Bi2Se3, and Bi2Te3. The latter two materials are more complicated, comprising two quintuple layers and two kinds of elements (Fig. 5a for Bi2Te3), and the strong SOC introduces additional complexity into their electronic structure. Despite all these difficulties, the capability of DeepH-E3 is unaffected. Our method reaches sub-meV accuracy in predicting the DFT Hamiltonians of test material samples, including nontwisted and twisted structures of bismuthene, Bi2Se3, and Bi2Te3 bilayers. Impressively, the band structures predicted by DeepH-E3 match well with those obtained from DFT (Supplementary Fig. 3). Moreover, we observe the remarkable ability of our model to fit a tremendous amount of data with moderate model capacity and relatively small computational cost. For instance, the neural network model is able to fit the 2.8 × 10⁹ nonzero complex-valued Hamiltonian matrix elements in the dataset with about 10⁵ real parameters. The training time to reach sub-meV accuracy is about one day on a single GPU. More details are presented in Supplementary Note 3. These experiments clearly demonstrate the capability of DeepH-E3 to represent the spin–orbital DFT Hamiltonian.

Fig. 5: Application of DeepH-E3 to study spin–orbit coupling (SOC) effects of twisted vdW materials.

a Schematic structure of twisted bilayer Bi2Te3. b, c Band structures of bilayer Bi2Te3 with twist angle θ = 21.8° predicted by DeepH-E3 without SOC (λ = 0) and with full SOC (λ = 1), compared to DFT results. d Evolution of the band structure as a function of SOC strength predicted by DeepH-E3. The closing and reopening of the band gap at the Γ point are visualized, indicating a topological quantum phase transition from Z2 = 0 to Z2 = 1 driven by SOC. Source data are provided with this paper.

In physics, SOC can induce many exotic quantum phenomena, underlying the research fields of spintronics, unconventional superconductivity, topological states of matter, etc. The investigation of SOC effects is thus of fundamental importance to condensed matter physics and materials science. The functionality of analyzing SOC effects is easily implemented with DeepH-E3. Specifically, we apply two neural network models to learn the DFT Hamiltonians with full SOC (\(\hat{H}_1\)) and without SOC (\(\hat{H}_0\)) separately for the same material system. We then define a virtual Hamiltonian as a function of the SOC strength λ: \(\hat{H}_{\lambda}=\hat{H}_0+\lambda\hat{H}_{\mathrm{SOC}}\), where \(\hat{H}_{\mathrm{SOC}}=\hat{H}_1-\hat{H}_0\). By studying the virtual Hamiltonian at different λ, we can systematically analyze the influence of SOC on material properties.
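
Operationally, this analysis amounts to a linear interpolation of the two predicted Hamiltonians followed by a generalized eigenvalue solve at each λ (a sketch; names are illustrative, and H0 is assumed already expanded into the spinful basis so that the two matrices have equal dimensions):

```python
import numpy as np
from scipy.linalg import eigh

def bands_vs_soc(H0, H1, S, lambdas):
    """H0: predicted Hamiltonian without SOC (expanded to the spinful basis);
    H1: predicted Hamiltonian with full SOC; S: PAO overlap matrix.
    All matrices are taken at one k-point."""
    H_soc = H1 - H0
    for lam in lambdas:
        H_lam = H0 + lam * H_soc                       # H_lambda = H_0 + lambda H_SOC
        yield lam, eigh(H_lam, S, eigvals_only=True)   # generalized eigenproblem

# e.g., track the gap closing/reopening of Fig. 5d:
# for lam, bands in bands_vs_soc(H0_k, H1_k, S_k, np.linspace(0, 1, 11)): ...
```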

As an example application, we employ this approach to investigate the topological properties of twisted bilayer Bi2Te3. DeepH-E3 accurately predicts the DFT Hamiltonian both with and without SOC, as confirmed by band structure calculations using the predicted \(\hat{H}_{\mathrm{DFT}}\) (Fig. 5b, c). Here the SOC is extremely strong owing to the heavy elements in the material; consequently, the band structure changes considerably when SOC is turned on. The evolution of the band structure as a function of SOC strength (Fig. 5d) provides rich information on the SOC effects. Importantly, the band gap closes and reopens with increasing SOC strength, indicating a topological quantum phase transition from Z2 = 0 to Z2 = 1. This is further confirmed by applying symmetry indicators based on Kohn–Sham orbital analysis and by performing a Brillouin-zone integration of the Berry connection and curvature over all occupied states via the Fukui–Hatsugai–Suzuki formalism36. The topological invariant Z2 turns out to be nonzero for the spin–orbit-coupled system, indicating that twisted bilayer Bi2Te3 (θ = 21.8°) is topologically nontrivial. As DeepH-E3 works well for varying twist angles, the dependence of band topology on twist angle can be computed systematically, which will enrich the research of twisted vdW materials.

Discussion

Since the DFT Hamiltonian \(\hat{H}_{\mathrm{DFT}}\) transforms covariantly between reference frames, it is natural and advantageous to construct the mapping from crystal structure \(\{\mathcal{R}\}\) to \(\hat{H}_{\mathrm{DFT}}\) in an explicitly equivariant manner. In this context, we have developed a general framework to represent \(\hat{H}_{\mathrm{DFT}}\) with a deep neural network, DeepH-E3, that fully respects the principle of covariance even in the presence of SOC. We have presented the theoretical basis, code implementation, and practical applications of DeepH-E3. The method enables accurate and efficient electronic structure calculations of large-scale material systems beyond the scope of traditional ab initio approaches, opening possibilities to investigate rich physics and novel material properties at a particularly low computational cost.

However, as the structure becomes larger, it becomes increasingly difficult to diagonalize the Hamiltonian matrix to obtain wavefunction-related physical quantities. This difficulty, rather than any limitation of the DeepH-E3 method itself, will eventually become the bottleneck of accurate electronic structure predictions. Nevertheless, benefiting from the sparseness of the DFT Hamiltonian matrix under a localized atomic orbital basis, many efficient O(N) algorithms with high parallel efficiency are available for studying large-scale systems (e.g., supercells including up to 10⁷ atoms37). Combining DeepH-E3 with such efficient linear-algebra algorithms is a promising direction for future study.

The unique abilities of DeepH-E3, together with the general framework of incorporating symmetry requirements and physical insights into neural network design, might find wide applications in various directions. For example, the method could be applied to build a material database for a diverse family of Moiré-twisted materials: for each kind of material, a single trained neural network model would give full access to the electronic properties of all twisted structures, a great advantage for high-throughput material discovery. Moreover, since the deep-learning method does not rely on periodic boundary conditions, 2D materials with incommensurate twist angles can also be investigated, making ab initio studies of quasicrystal phases possible. In addition, one could go a step further by calculating the derivative of the electronic Hamiltonian with respect to atomic positions via automatic differentiation, enabling deep-learning investigations of electron-phonon coupling in large-scale materials, with the potential to outperform the computationally expensive traditional methods of frozen phonons or density functional perturbation theory38. Furthermore, one may combine the deep-learning method with advanced methods beyond the DFT level, such as hybrid functionals, many-body perturbation theory, and time-dependent DFT. These important generalizations, if realized, would greatly enlarge the scope of ab initio calculation.

Methods

Datasets

Data generated in this study is available in public repositories at Zenodo39,40,41.

Monolayer graphene: The dataset is taken from ref. 25. It consists of 450 graphene structures with 6 × 6 supercells, generated by ab initio molecular dynamics performed with the Vienna ab initio simulation package (VASP)42, using the PBE43 exchange-correlation functional and projector-augmented wave (PAW) pseudopotentials44,45. The plane-wave cutoff energy is 450 eV, and only the Γ point is used in the k-mesh. Five thousand frames are obtained at 300 K with a time step of 1 fs, and one frame is taken every 10 frames starting from the 500th frame, giving the 450 structures in the dataset. The Hamiltonians for training are calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotentials with C6.0-s2p2d1 PAOs and 5 × 5 Γ-centered k-sampling. Here 6.0 denotes the orbital cutoff radius in Bohr, and s2p2d1 means there are 2 × 1 = 2 s-orbitals, 2 × 3 = 6 p-orbitals, and 1 × 5 = 5 d-orbitals.

Monolayer MoS2: The dataset is also taken from ref. 25. Five hundred structures with 5 × 5 supercells are generated by ab initio molecular dynamics performed with VASP using PAW pseudopotentials and the PBE functional. The plane-wave cutoff energy is 450 eV, and only the Γ point is used in the k-mesh. One thousand frames are obtained at 300 K with a time step of 1 fs; the first 500 unequilibrated structures are discarded, and the remaining 500 structures constitute the dataset. The Hamiltonians for training are calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotentials with Mo7.0-s3p2d2 and S7.0-s2p2d1 PAOs and 5 × 5 Γ-centered k-sampling.

Bilayer graphene: The dataset is also taken from ref. 25. Three hundred structures with 4 × 4 nontwisted supercells are generated by uniformly shifting one of the two vdW layers while adding random perturbations to the atomic positions. The perturbations are within 0.1 Å along the three Cartesian directions. The supercells are constructed from bilayer unit-cell structures relaxed with VASP42 using the PBE functional with vdW interactions corrected by the DFT-D3 method with Becke–Johnson damping46. The optimal interlayer spacing is found to be 3.35 Å. The Hamiltonians of the dataset and of the twisted structures are all calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotentials with C6.0-s2p2d1 PAOs.

Bilayer bismuthene, Bi2Se3, and Bi2Te3: The same procedure is used to generate nontwisted 3 × 3 bilayer supercells. The numbers of structures are 576, 576, and 256 for bismuthene, Bi2Se3, and Bi2Te3, respectively, but only a randomly selected subset is used for training (details can be found in Supplementary Note 4). The interlayer spacing, defined as the vertical distance between the lowest atom of the upper layer and the highest atom of the lower layer, is 3.20 Å, 2.50 Å, and 2.61 Å for bismuthene, Bi2Se3, and Bi2Te3, respectively. The Hamiltonians of the dataset and of the twisted structures are all calculated with the OpenMX code using the PBE functional and norm-conserving pseudopotentials with Bi8.0-s3p2d2, Se7.0-s3p2d1, and Te7.0-s3p2d2 PAOs.

Details of neural network models

All the neural network models presented in this article are trained by directly minimizing the mean-squared errors of the model output compared to the Hamiltonian matrices computed by DFT packages, and the reported MAEs are also obtained from comparing model output to the DFT results. All physical quantities of materials are derived from the output Hamiltonian matrix.

Some details of neural network building blocks are described here. The Gaussian basis is adapted from ref. 4, which is defined as:

$$\mathbf{e}_{\mathrm{B}}(|\mathbf{r}_{ij}|)_{n}=\exp\left(-\frac{\left(|\mathbf{r}_{ij}|-r_{n}\right)^{2}}{2\Delta^{2}}\right),$$
(11)

where rn, n = 0, 1, …  are evenly spaced, with intervals equal to Δ. The E3Linear layer is defined as:

$$\mathrm{E3Linear}(\mathbf{x})^{(l)}_{cm}=\sum_{c'=1}^{n^{(l)}}W^{(l)}_{cc'}\,x^{(l)}_{c'm}+b^{(l)}_{c},$$
(12)

where \(W^{(l)}_{cc'}, b^{(l)}_{c}\) are learnable weights and biases, with \(b^{(l)}_{c}=0\) for l ≠ 0. In the gate layer, the l = 0 part of the input feature is separated into two parts, denoted \(x^{(0)}_{1c}\) and \(x^{(0)}_{2c}\) (the index m is omitted because l = 0). The output feature is calculated as

$$\mathrm{Gate}(\mathbf{x})^{(l)}_{cm}=\begin{cases}\phi_{1}(x^{(0)}_{1c}),&l=0\\ \phi_{2}(x^{(0)}_{2c})\,x^{(l)}_{cm},&l\neq 0\end{cases},$$
(13)

Here ϕ1 and ϕ2 are activation functions. In this work, we use ϕ1=SiLU and ϕ2=Sigmoid following ref. 7.
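
Minimal sketches of these three building blocks, using e3nn's built-in layers, are given below (channel counts and the distance range are illustrative assumptions; e3nn's Gate realizes Eq. (13) directly):

```python
import torch
from e3nn import o3
from e3nn.nn import Gate

def gaussian_basis(r, r_min=0.0, r_max=8.0, n=128):
    """Eq. (11): expand interatomic distances r (any shape) into n Gaussians."""
    centers = torch.linspace(r_min, r_max, n)
    delta = centers[1] - centers[0]              # interval between the r_n
    return torch.exp(-((r[..., None] - centers) ** 2) / (2 * delta ** 2))

# Eq. (12): a representation-wise linear map; channels are mixed only within
# the same irrep, and the bias acts on l = 0 channels only.
e3linear = o3.Linear("8x0e + 8x1o", "16x0e + 8x1o")

# Eq. (13): phi_1 = SiLU acts on the first set of scalars; phi_2 = sigmoid of
# the second set of scalars ("gates") rescales the l != 0 vectors.
gate = Gate("16x0e", [torch.nn.functional.silu],
            "8x0e", [torch.sigmoid],
            "8x1o")
```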

The ENN is implemented with the e3nn library31 (version 0.3.5) and PyTorch47 (version 1.9.0). The Gaussian basis expansion used as input to the EquiConv layer has length 128. The fully connected neural network in the EquiConv layer is composed of two hidden layers of 64 neurons each, with SiLU activation and a linear output layer. A description of the neural network hyperparameters for each material system and the strategy for selecting them can be found in Supplementary Note 4.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.