Transferable E(3) equivariant parameterization for Hamiltonian of molecules and solids

Using the message-passing mechanism in machine learning (ML) instead of self-consistent iterations to directly build the mapping from structures to electronic Hamiltonian matrices can greatly improve the efficiency of density functional theory (DFT) calculations. In this work, we propose a general analytic Hamiltonian representation in an E(3) equivariant framework, which can fit the ab initio Hamiltonian of molecules and solids by a completely data-driven method and is equivariant under rotation, space inversion, and time-reversal operations. Our model reaches state-of-the-art precision in the benchmark test and accurately predicts the electronic Hamiltonian matrices and related properties of various periodic and aperiodic systems, showing high transferability and generalization ability. This framework provides a general transferable model that can be used to accelerate the electronic structure calculations on different large systems with the same network weights trained on small structures.


Introduction
Machine learning (ML) now has a wide range of applications in molecular and materials science, including the direct prediction of various properties of materials [1][2][3], the construction of machine learning force fields (MLFFs) with quantum mechanical precision [4][5][6][7], the high-throughput generation of molecular and crystal structures [8][9][10], and the construction of more precise exchange-correlation functionals 11,12. However, the acquisition of the electronic structure of materials still relies almost exclusively on density functional theory (DFT) based calculations. Unfortunately, these methods obtain the Hamiltonian of a system through time-consuming self-consistent iterations and scale poorly with system size. Semiempirical tight-binding (TB) approximations 13, such as the Slater-Koster method 14, require far less computation than ab initio DFT methods. However, this approach often directly uses existing or manually fine-tuned TB parameters and thus cannot accurately reproduce the electronic structure of general systems. Developing truly transferable, fully data-driven TB models applicable across materials, geometries, and boundary conditions can reconcile accuracy with speed but is rather challenging.
Hegde and Brown first used kernel ridge regression (KRR) to learn semi-empirical tight-binding Hamiltonian matrices 15. They successfully fitted the Hamiltonian of the Cu system containing only rotation-invariant s orbitals and the diamond system consisting only of s and p orbitals. Similarly, Wang et al. designed a neural network model to obtain semi-empirical TB parameters by fitting the ab initio band structure 16. An important feature of the Hamiltonian matrix is that its components transform equivariantly with the rotation of the coordinate system. However, neither of these approaches deals with the rotational equivariance of the Hamiltonian matrix. Moreover, the two methods can only fit an empirical model Hamiltonian rather than the true ab initio tight-binding Hamiltonian matrices generated by the self-consistent iterations of ab initio tight-binding methods such as OpenMX 17,18 and Siesta 19,20.
Zhang et al. 21 and Nigam et al. 22 proposed methods to predict the ab initio TB Hamiltonian of small molecules and simple solid systems by constructing an equivariant kernel in Gaussian Process Regression (GPR) to parameterize the Hamiltonian. Since GPR uses a fixed kernel and representation [23][24][25][26], its training accuracy and multi-element generalization ability are usually lower than those of deep neural networks such as message passing neural networks (MPNNs) [27][28][29][30][31][32] when the number of training samples is sufficient. Therefore, developing graph neural networks (GNNs) capable of predicting the Hamiltonian of general periodic and aperiodic systems would be the best option.
However, traditional GNNs can only predict rotation-invariant scalars such as the energy or the band gap. GNNs must encode the directional information of the system in an appropriate way to predict equivariant directional properties. To make the predicted Hamiltonian matrices satisfy rotational equivariance, Schütt et al. designed the SchNOrb neural network architecture by embedding the direction information of the bonds into the message-passing function 33. This network constructs the ab initio Hamiltonian of molecules from the directional edge features of atom pairs. However, SchNOrb needs to learn the rotational equivariance of the Hamiltonian matrix through data augmentation, which greatly increases the amount of training data and the number of redundant network parameters. Unke et al. proposed the PhiSNet model 34, which realized an SE(3) equivariant parameterization of the Hamiltonian matrices with a GNN based on SO(3) representations and achieved state-of-the-art accuracy on the Hamiltonian of small molecules such as water and ethanol. However, the PhiSNet model is not the most universal representation of the Hamiltonian because it ignores the parity of the Hamiltonian, which may lead to serious problems when predicting periodic systems of infinite size.

Recently, Li et al. proposed a GNN model called DeepH to predict the ab initio Hamiltonian by constructing a local coordinate system in a crystal 35. DeepH successfully predicted the tight-binding Hamiltonian of some simple periodic systems such as graphene and carbon nanotubes. The local coordinate system was introduced to solve the rotational equivariance problem of the Hamiltonian, but DeepH still embeds the local directional information of interacting atom pairs in an invariant message passing function. This increases the number of redundant network parameters, although it may require less data augmentation than a fully invariant model without local coordinate systems. In addition, the hopping distance between two interacting atoms far exceeds the length of a typical chemical bond. Taking the smallest atom, hydrogen, as an example, the cutoff radius of the numerical atomic orbital of the hydrogen atom used by OpenMX is 6 Bohr, so the furthest hopping between any two atomic bases used by OpenMX in periodic systems can reach 12 Bohr (~6.4 Å), a distance that even exceeds the lattice parameters of some crystals. Therefore, it is difficult to describe such long hopping in a well-defined local coordinate system.
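The distance estimate above is simple unit arithmetic. A minimal sketch, assuming a Bohr radius of about 0.529177 Å and assuming that two orbitals can interact whenever their cutoff spheres overlap (the function name is hypothetical):

```python
# Approximate conversion factor; an assumption of this sketch.
BOHR_TO_ANGSTROM = 0.529177

def max_hopping_distance(rc_i_bohr, rc_j_bohr):
    """Largest separation (in Angstrom) at which two orbitals with the
    given cutoff radii (in Bohr) can still overlap."""
    return (rc_i_bohr + rc_j_bohr) * BOHR_TO_ANGSTROM

# Two hydrogen atoms, each with the 6 Bohr cutoff quoted in the text.
d_hh = max_hopping_distance(6.0, 6.0)
print(f"{d_hh:.2f} Angstrom")
```

For heavier elements with larger cutoff radii, the same estimate gives correspondingly longer hopping ranges.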
Because of the spherical harmonic part of the atomic basis functions, the TB Hamiltonian matrix must satisfy two fundamental constraints: rotational equivariance and parity symmetry. When the spin-orbit coupling (SOC) effects or the ionic magnetic moments are taken into account, rotational equivariance in the spin degrees of freedom and additional time-reversal equivariance need to be fulfilled. It is hard for a model to learn the physically correct dependence on the direction of the input structures from the data alone.
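The two constraints can be made concrete with a two-centre Slater-Koster hopping block, which is not the parameterization proposed in this work but is the simplest analytic example. The sketch below assumes real (px, py, pz) orbitals, for which the rotation acts on the orbital indices as the 3×3 rotation matrix itself; the hopping strengths and function names are hypothetical.

```python
import numpy as np

def sk_pp_block(r, v_sigma=-1.0, v_pi=0.3):
    """Two-centre p-p hopping block (3x3) in the real (px, py, pz) basis."""
    u = r / np.linalg.norm(r)
    proj = np.outer(u, u)
    return v_sigma * proj + v_pi * (np.eye(3) - proj)

def sk_sp_block(r, v_sp=0.7):
    """Two-centre s-p hopping block (1x3) in the same basis."""
    return v_sp * (r / np.linalg.norm(r))[None, :]

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# A proper rotation: rotation about z composed with a cyclic axis permutation.
cyclic = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
rot = rotation_z(0.4) @ cyclic
r = np.array([0.3, -1.1, 0.8])

# Rotational equivariance: rotating the bond rotates the block.
pp_equivariant = np.allclose(sk_pp_block(rot @ r), rot @ sk_pp_block(r) @ rot.T)
sp_equivariant = np.allclose(sk_sp_block(rot @ r), sk_sp_block(r) @ rot.T)

# Parity: a block between orbitals l_i and l_j picks up (-1)^(l_i + l_j).
pp_even = np.allclose(sk_pp_block(-r), sk_pp_block(r))    # (-1)^(1+1) = +1
sp_odd = np.allclose(sk_sp_block(-r), -sk_sp_block(r))    # (-1)^(0+1) = -1
print(pp_equivariant, sp_equivariant, pp_even, sp_odd)
```

A model that is equivariant by construction satisfies these identities for every input, instead of having to approximate them from augmented data.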
In this work, we constructed a general parametrized Hamiltonian by decomposing each block of the Hamiltonian into a vector coupling of equivariant irreducible spherical tensors 36

E(3) equivariant parametrized Hamiltonian
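The core of the electronic structure problem in DFT is to solve the Kohn-Sham equation for electrons in reciprocal space. If the Kohn-Sham Hamiltonian is represented by numerical atomic orbitals centered on each atom (such as those defined in the OpenMX and Siesta packages), then the Kohn-Sham equation can be expressed as a generalized eigenvalue problem as follows:

$$H(\mathbf{k})\,\psi_{n\mathbf{k}} = \epsilon_{n\mathbf{k}}\,S(\mathbf{k})\,\psi_{n\mathbf{k}} \qquad (1)$$

where $H(\mathbf{k})$ and $S(\mathbf{k})$ are the Hamiltonian and overlap matrices at the point $\mathbf{k}$ in reciprocal space. In the basis of atomic orbitals, $H(\mathbf{k}) = \sum_{\mathbf{R}} e^{i\mathbf{k}\cdot\mathbf{R}} H(\mathbf{R})$ is the Fourier transform of the real-space TB Hamiltonian matrix $H(\mathbf{R})$, with $\mathbf{R}$ the shift vector of the periodic image cell, and $S(\mathbf{k})$ follows from the real-space overlap matrix in the same way. Therefore, once we have obtained the Hamiltonian matrix and overlap matrix in real space, we can further solve the electronic structure in the whole reciprocal space.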
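To illustrate how real-space matrices determine the bands, the sketch below builds $H(\mathbf{k})$ and $S(\mathbf{k})$ from real-space blocks and solves the generalized eigenvalue problem by Löwdin orthogonalization. The phase convention $e^{i\mathbf{k}\cdot\mathbf{R}}$, the toy one-dimensional chain, and all names are assumptions of this example rather than details of OpenMX or HamGNN.

```python
import numpy as np

def bloch_sum(blocks, k):
    """Return sum_R exp(i k.R) M(R) for blocks given as {R tuple: matrix}."""
    return sum(np.exp(1j * np.dot(k, R)) * M for R, M in blocks.items())

def band_energies(h_real, s_real, k):
    """Solve H(k) c = e S(k) c via Loewdin orthogonalisation."""
    h_k = bloch_sum(h_real, k)
    s_k = bloch_sum(s_real, k)
    w, u = np.linalg.eigh(s_k)            # S(k) must be positive definite
    x = (u / np.sqrt(w)) @ u.conj().T     # X = S(k)^(-1/2)
    return np.linalg.eigvalsh(x @ h_k @ x)

# Toy 1D chain (lattice constant 1): one orbital per cell,
# nearest-neighbour hopping t and overlap s.
e0, t, s = -1.0, -0.5, 0.1
h_real = {(0,): np.array([[e0]]), (1,): np.array([[t]]), (-1,): np.array([[t]])}
s_real = {(0,): np.array([[1.0]]), (1,): np.array([[s]]), (-1,): np.array([[s]])}

k = np.array([0.3])
energy = band_energies(h_real, s_real, k)[0]
analytic = (e0 + 2 * t * np.cos(k[0])) / (1 + 2 * s * np.cos(k[0]))
print(energy, analytic)
```

Once a network supplies the real-space blocks, the same two functions can be evaluated at any k point without further self-consistent iterations.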
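Due to the spherical symmetry of the atomic potential, the atomic orbital bases, as its eigenfunctions, not only satisfy rotational equivariance under a rotation $Q$ but also have a definite parity under the inversion operation $g$. The irreducible representation of the combined operation $gQ$ is the Wigner D matrix of $Q$ multiplied by the scalar irreducible representation of the inversion operation. Substituting the irreducible representation of $gQ$ into Eq. (2) gives Eq. (4), whose right-hand side can be written in the form of a matrix-vector multiplication, Eq. (5).

It can be seen from Eq. (5) that each sub-block $H_{n_i l_i, n_j l_j}$ of the TB Hamiltonian based on atomic orbitals can be regarded as a spherical tensor 36,37. According to angular momentum theory 37,38, it transforms under the direct-product representation $D^{l_i} \otimes D^{l_j}$, which is reducible and can be further decomposed into the direct sum of several irreducible Wigner D matrices:

$$D^{l_i} \otimes D^{l_j} = \bigoplus_{L=|l_i-l_j|}^{l_i+l_j} D^{L} \qquad (6)$$

Combining this decomposition with the parity of the Hamiltonian matrix block $H_{n_i l_i, n_j l_j}$ gives Eq. (7). According to Eq. (7), each block can be coupled into irreducible spherical tensors (ISTs) through the Clebsch-Gordan coefficients $C^{LM}_{l_i m_i, l_j m_j}$:

$$T^{L}_{M} = \sum_{m_i, m_j} C^{LM}_{l_i m_i, l_j m_j}\, H_{n_i l_i m_i, n_j l_j m_j} \qquad (8)$$

where $T^{L}$ has the parity $(-1)^{l_i+l_j}$ and satisfies the rotational equivariance of order $L$. By the inverse linear transformation of Eq. (8),

$$H_{n_i l_i m_i, n_j l_j m_j} = \sum_{L=|l_i-l_j|}^{l_i+l_j} \sum_{M=-L}^{L} C^{LM}_{l_i m_i, l_j m_j}\, T^{L}_{M} \qquad (9)$$

Therefore, as long as we find all ISTs corresponding to each block of the Hamiltonian matrix, we can construct the entire Hamiltonian matrix in a block-wise manner through Eq. (9). We collect the ISTs required by the on-site and off-site Hamiltonian blocks into two O(3) equivariant vectors, $\Omega_i^{on}$ and $\Omega_{ij}^{off}$, respectively. The prediction of the Hamiltonian is thus transformed into the prediction of $\Omega_i^{on}$ and $\Omega_{ij}^{off}$, which can be obtained by mapping from the equivariant features of the nodes and the pair interactions, respectively. The final parameterized Hamiltonian, Eq. (12), is O(3) equivariant. Since a GNN naturally has translational symmetry, the parameterized Hamiltonian represented by Eq. (12) is E(3) equivariant.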
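The coupling of a Hamiltonian block into ISTs and its inverse can be checked numerically. The sketch below is only an illustration of the linear algebra: it assumes standard Clebsch-Gordan coefficients in the complex spherical basis computed with Racah's formula, whereas a production code would typically work with real spherical harmonics. All function names are hypothetical.

```python
import numpy as np
from math import factorial, sqrt

def clebsch_gordan(j1, m1, j2, m2, J, M):
    """<j1 m1 j2 m2 | J M> for integer angular momenta (Racah's formula)."""
    if m1 + m2 != M or not abs(j1 - j2) <= J <= j1 + j2:
        return 0.0
    if abs(m1) > j1 or abs(m2) > j2 or abs(M) > J:
        return 0.0
    pref = sqrt((2 * J + 1) * factorial(J + j1 - j2) * factorial(J - j1 + j2)
                * factorial(j1 + j2 - J) / factorial(j1 + j2 + J + 1))
    pref *= sqrt(factorial(J + M) * factorial(J - M)
                 * factorial(j1 - m1) * factorial(j1 + m1)
                 * factorial(j2 - m2) * factorial(j2 + m2))
    total = 0.0
    for k in range(j1 + j2 - J + 1):
        args = (k, j1 + j2 - J - k, j1 - m1 - k, j2 + m2 - k,
                J - j2 + m1 + k, J - j1 - m2 + k)
        if min(args) < 0:
            continue  # factorial of a negative number: term is absent
        denom = 1
        for a in args:
            denom *= factorial(a)
        total += (-1) ** k / denom
    return pref * total

def decompose(block, li, lj):
    """Couple a (2li+1) x (2lj+1) block into ISTs T^L, L = |li-lj|..li+lj."""
    ists = {}
    for L in range(abs(li - lj), li + lj + 1):
        t = np.zeros(2 * L + 1)
        for M in range(-L, L + 1):
            for mi in range(-li, li + 1):
                mj = M - mi
                if abs(mj) <= lj:
                    t[M + L] += (clebsch_gordan(li, mi, lj, mj, L, M)
                                 * block[mi + li, mj + lj])
        ists[L] = t
    return ists

def reconstruct(ists, li, lj):
    """Inverse transformation: rebuild the block from its ISTs."""
    block = np.zeros((2 * li + 1, 2 * lj + 1))
    for L, t in ists.items():
        for M in range(-L, L + 1):
            for mi in range(-li, li + 1):
                mj = M - mi
                if abs(mj) <= lj:
                    block[mi + li, mj + lj] += (
                        clebsch_gordan(li, mi, lj, mj, L, M) * t[M + L])
    return block

li, lj = 1, 2                                   # a p-d block
block = np.random.default_rng(0).normal(size=(3, 5))
ists = decompose(block, li, lj)
rebuilt = reconstruct(ists, li, lj)
print(sorted(ists), np.allclose(rebuilt, block))
```

The p-d block has 15 entries, and its ISTs of order 1, 2, and 3 have 3 + 5 + 7 = 15 components, so no information is lost in either direction.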
When the spin-orbit coupling (SOC) effects are considered, the real-space Hamiltonian matrices are complex-valued and can be divided into four sub-blocks $\hat H_{s_i s_j}$ ($s_i, s_j = \uparrow$ or $\downarrow$) by the spin degrees of freedom. In this case, the complete Hamiltonian matrices satisfy SU(2) rotational equivariance. Although each sub-block $\hat H_{s_i s_j}$ satisfies the O(3) rotational equivariance, the sub-blocks are coupled to each other under rotations. Therefore, four sub-blocks predicted independently with the O(3) equivariant parameterized Hamiltonian cannot be used to construct the complete SU(2) equivariant Hamiltonian with the SOC effect. In addition, the real and imaginary parts of the SU(2) equivariant Hamiltonian matrices are also coupled during rotation, so the complete Hamiltonian cannot be constructed from independently predicted real and imaginary parts. These methods not only rely on a large number of fitting parameters but also cannot make the constructed Hamiltonian matrices strictly satisfy the SU(2) equivariance.
To ensure that the SOC effect learned by the network complies with the physical rules and SU(2) equivariance, we explicitly express the complete Hamiltonian as the sum of the spin-less part and the SOC part:

$$\hat H = \hat H_0 + \hat H_{SOC}, \qquad \hat H_{SOC} = \xi(r)\, \hat{\mathbf{L}} \cdot \hat{\mathbf{S}}$$

where $\hat H_{SOC}$ satisfies SU(2) rotational equivariance and time-reversal equivariance, and $\xi(r)$ is an invariant coefficient describing the strength of the SOC effects 39. From this expression, we obtain the parameterized Hamiltonian matrices with the SOC effect in the atomic orbital basis 39,40, Eq. (15). Since the matrix representation of the angular momentum operator in the atomic orbital basis can be calculated analytically, the only learnable parameters in Eq. (15) are $H_0$ and $\xi$: $H_0$ can be directly expressed by Eq. (12), and $\xi$ is an invariant scalar coefficient that can be mapped from the features of atom pairs $ij$.
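The statement that only the spin-less part and one scalar per atom pair need to be learned can be illustrated for a single p shell. The sketch below assumes the textbook form of the SOC term, a scalar strength times $\hat{\mathbf{L}}\cdot\hat{\mathbf{S}}$, written in the complex $|l, m\rangle$ basis with $\hbar = 1$; the spin-major ordering and the names `soc_hamiltonian` and `xi` are choices of this example.

```python
import numpy as np

def angular_momentum(l):
    """Lx, Ly, Lz (hbar = 1) in the complex |l, m> basis, ordered m = l..-l."""
    m = np.arange(l, -l - 1, -1)
    dim = 2 * l + 1
    lz = np.diag(m).astype(complex)
    lplus = np.zeros((dim, dim), dtype=complex)
    for col in range(1, dim):
        # <l, m+1 | L+ | l, m> = sqrt(l(l+1) - m(m+1))
        lplus[col - 1, col] = np.sqrt(l * (l + 1) - m[col] * (m[col] + 1))
    lx = 0.5 * (lplus + lplus.conj().T)
    ly = -0.5j * (lplus - lplus.conj().T)
    return lx, ly, lz

PAULI = (np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex))

def soc_hamiltonian(h0, xi, l):
    """Spin-less block h0 on the spin diagonal plus xi * L.S (spin-major order)."""
    spinless = np.kron(np.eye(2), h0)
    l_dot_s = sum(np.kron(0.5 * sigma, l_a)
                  for sigma, l_a in zip(PAULI, angular_momentum(l)))
    return spinless + xi * l_dot_s

# A degenerate p shell (h0 = 0) with unit SOC strength.
h = soc_hamiltonian(h0=np.zeros((3, 3)), xi=1.0, l=1)
levels = np.linalg.eigvalsh(h)
print(np.round(levels, 6))
```

The six levels split into a doublet at $-\xi$ (j = 1/2) and a quartet at $+\xi/2$ (j = 3/2), the familiar result for a p shell, although the angular part contains no fitted parameter.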
We can further derive the parameterized Hamiltonian satisfying the time-reversal equivariance for magnetic systems. The classical Heisenberg model can be written in a general matrix form containing all possible second-order interactions 41:

$$E = \sum_{i<j} \sum_{\alpha\beta} J_{ij}^{\alpha\beta} S_i^{\alpha} S_j^{\beta} + \sum_{k} \sum_{\alpha\beta} A_k^{\alpha\beta} S_k^{\alpha} S_k^{\beta}$$

where the 3×3 tensors $J_{ij}^{\alpha\beta}$ and $A_k^{\alpha\beta}$ are called the J matrix and the single-ion anisotropy (SIA) matrix, respectively.
, which is a functional of the electron density ρ. The magnetization of the system is partitioned into the local magnetic moments on each atom by a pre-defined weight operator $\hat W_i$, which is commonly defined through a radial cutoff function $f$ centered on atom $i$. The matrix elements $w_{n_i l_i m_i, n_j l_j m_j}$ of the weight operator $\hat W_i$ 42,43 vary with the choice of the cutoff function $f$ and can be parameterized by Eq. (12). The magnetic part of the parameterized Hamiltonian for magnetic systems equals the corresponding variational derivative, Eq. (18). The $J_{ij}^{\alpha\beta}$ and $A_k^{\alpha\beta}$ tensors are learnable and can be mapped from the features of the edges and nodes, respectively. Since the spin magnetic moment is odd under the time-reversal operation and even under the spatial inversion operation, the tensors $J_{ij}^{\alpha\beta}$ and $A_k^{\alpha\beta}$ should be even under time-reversal and spatial inversion operations so that the parameterized Hamiltonian matrix constructed by Eq. (18) satisfies the time-reversal equivariance and the parity symmetry. In addition, Eq. (18) still satisfies all the equivariance when only the ionic magnetic moments are rotated.
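The symmetry argument can be checked on the bilinear spin energy itself. The sketch below assumes the generic second-order form, a sum of $\mathbf{S}_i^T J_{ij} \mathbf{S}_j$ over pairs plus $\mathbf{S}_k^T A_k \mathbf{S}_k$ over sites, with random tensors; prefactor conventions and all names are assumptions of this example.

```python
import numpy as np

def spin_energy(spins, j_tensors, a_tensors):
    """E = sum_{(i,j)} S_i^T J_ij S_j + sum_k S_k^T A_k S_k."""
    pair = sum(spins[i] @ j_ij @ spins[j] for (i, j), j_ij in j_tensors.items())
    single_ion = np.einsum("ka,kab,kb->", spins, a_tensors, spins)
    return pair + single_ion

rng = np.random.default_rng(0)
spins = rng.normal(size=(3, 3))                       # three classical spins
j_tensors = {(0, 1): rng.normal(size=(3, 3)), (1, 2): rng.normal(size=(3, 3))}
a_tensors = rng.normal(size=(3, 3, 3))                # one 3x3 SIA tensor per site

e_ref = spin_energy(spins, j_tensors, a_tensors)

# Time reversal flips every spin; the tensors are even, so E is unchanged.
e_reversed = spin_energy(-spins, j_tensors, a_tensors)

# Rotating spins and tensors together also leaves E unchanged.
c, s = np.cos(0.7), np.sin(0.7)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
j_rot = {pair: rot @ j @ rot.T for pair, j in j_tensors.items()}
a_rot = np.einsum("ab,kbc,dc->kad", rot, a_tensors, rot)
e_rotated = spin_energy(spins @ rot.T, j_rot, a_rot)
print(e_ref, e_reversed, e_rotated)
```

Because the energy is quadratic in the spins, any tensor that is even under time reversal keeps the energy invariant when all moments are flipped.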

Network Implementation
As can be seen from the above discussion, each Hamiltonian matrix block satisfies the rotational

HamGNN first encodes elements, inter-atomic distances, and relative directions as initial graph embeddings. The distance between atom $i$ and its neighboring atom $j$ within the cutoff radius $r_c$ is expanded using Bessel basis functions together with a cosine cutoff function $f_c$, which guarantees physical continuity for the neighbor atoms close to the cutoff sphere. The directional information between atom $i$ and atom $j$ is embedded in a set of real spherical harmonics $Y_{lm}(\hat{\mathbf{r}}_{ij})$, which are used to construct the rotation-equivariant filter 44 in the equivariant message passing.
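A radial embedding of this kind can be sketched as follows. The exact expression used by HamGNN is not recoverable from the text here, so the code assumes the commonly used form $\sqrt{2/r_c}\,\sin(n\pi r/r_c)/r$ multiplied by a cosine cutoff; the number of basis functions and the function names are likewise assumptions.

```python
import numpy as np

def cosine_cutoff(r, r_c):
    """Smoothly decays from 1 at r = 0 to 0 at r = r_c; zero beyond r_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def bessel_basis(r, r_c, n_basis=8):
    """Expand distances r > 0 (any shape) into n_basis radial features."""
    r = np.asarray(r, dtype=float)[..., None]
    n = np.arange(1, n_basis + 1)
    radial = np.sqrt(2.0 / r_c) * np.sin(n * np.pi * r / r_c) / r
    return radial * cosine_cutoff(r, r_c)

distances = np.array([0.5, 1.0, 6.0])      # same length unit as r_c
features = bessel_basis(distances, r_c=6.0)
print(features.shape)
```

Every feature vanishes continuously at the cutoff radius, so a neighbor entering or leaving the cutoff sphere does not produce a jump in the embedding.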
The atomic features are refined in the orbital convolution blocks. Eq. (20) is composed of contributions with parity $p_f = p_i p_j$ to satisfy parity symmetry. Eq. (21) is the update function of the orbital equivariant features. The updated orbital equivariant features are passed to a nonlinear gate activation function 45, which scales the input features equivariantly, using the invariant field ($l = 0$) of the input features as the gate.
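The idea behind a gate activation can be sketched for features that contain only scalars and vectors. The example below is a simplification: it assumes tanh for the scalars and a sigmoid for the gates, and it ignores the parity-dependent choice of activation functions; the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_activation(scalars, gates, vectors):
    """Apply a nonlinearity to l = 0 features and gate l = 1 features.

    scalars: (n_s,) invariant features passed through tanh
    gates:   (n_v,) invariant features used as multiplicative gates
    vectors: (n_v, 3) equivariant l = 1 features
    """
    new_scalars = np.tanh(scalars)
    new_vectors = sigmoid(gates)[:, None] * vectors   # scale, never mix components
    return new_scalars, new_vectors

rng = np.random.default_rng(1)
scalars, gates = rng.normal(size=4), rng.normal(size=2)
vectors = rng.normal(size=(2, 3))

c, s = np.cos(0.9), np.sin(0.9)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Gate then rotate versus rotate then gate.
_, gated = gate_activation(scalars, gates, vectors)
_, gated_rotated_input = gate_activation(scalars, gates, vectors @ rot.T)
print(np.allclose(gated @ rot.T, gated_rotated_input))
```

Applying a nonlinearity component-wise to the vectors would break this identity, which is why only invariant quantities are allowed to pass through the nonlinear functions.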
The pair interaction layer adjusts the pair interaction features (used to construct the off-site Hamiltonian) based on the features of the atomic orbitals of the two interacting atoms as well as the direction and strength of their interaction. The update acts on a mixed feature vector built from these features.

The predicted Hermitian Hamiltonian is obtained by the following symmetrization:

$$H = \frac{1}{2}\left(H_{\mathrm{raw}} + H_{\mathrm{raw}}^{\dagger}\right)$$

where $H_{\mathrm{raw}}$ denotes the Hamiltonian matrix assembled from the network output.
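Symmetrizing a predicted matrix into a Hermitian one can be sketched in a few lines. The example assumes the standard average of a matrix with its conjugate transpose and a dense matrix for a finite system; for periodic systems the block for shift vector $\mathbf{R}$ would have to be paired with the block for $-\mathbf{R}$, which this sketch does not cover.

```python
import numpy as np

def symmetrize(h_raw):
    """Return the Hermitian part of a square (possibly complex) matrix."""
    return 0.5 * (h_raw + h_raw.conj().T)

rng = np.random.default_rng(2)
h_raw = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
h_sym = symmetrize(h_raw)

print(np.allclose(h_sym, h_sym.conj().T))                 # Hermitian
print(np.allclose(np.linalg.eigvals(h_sym).imag, 0.0))    # real spectrum
```

A matrix that is already Hermitian is left unchanged, so the step only removes the anti-Hermitian part of the raw prediction.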

Tests and applications
To

Molecules
The QM9 dataset 47,48 contains 134k stable small organic molecules made up of C, H, O, N, and F. These small organic molecules are important candidates for drug discovery. The development of general ML models for rapid screening of the electronic structure properties of drug molecules is beneficial for understanding the mechanisms of drugs and shortening the cycle of drug development. We calculated the real-space ab initio TB Hamiltonian matrices using OpenMX for 10,000 randomly selected molecules from the QM9 dataset. We divided the whole dataset into training, validation, and test sets with a ratio of 0.8:0.1:0.1. As shown in Fig. 2

We also trained HamGNN on the Hamiltonian of several specific small molecules generated by ab initio molecular dynamics (MD) and compared the accuracy of HamGNN with two recently reported models, PhiSNet 34 and DeepH 46, in predicting the Hamiltonian of these small molecules. The Hamiltonian matrices of these molecules were calculated by OpenMX and divided into training, validation, and test sets in the same way as for PhiSNet in ref. 34. As can be seen from Table S1

Periodic solids
We collected 426 carbon allotropes from the Samara Carbon Allotrope Database (SACADA) 49, and 30 silicon allotropes and 187 SiO$_2$ isomers from the Materials Project 50. Each of these structures contains no more than 60 atoms in its unit cell. We performed DFT calculations using OpenMX

We used pentadiamond 51, Si (MP-1204046), and SiO$_2$ (MP-667371) to test the accuracy and transferability of the HamGNN models trained on the three datasets. The test structures are shown in Fig. 3(a-c). Pentadiamond is a three-dimensional carbon foam constructed from carbon pentagons and contains 88 carbon atoms in the unit cell 51. The Si structure labeled MP-1204046 contains 106 atoms in the unit cell and belongs to the tetragonal system. The SiO$_2$ structure labeled MP-667371 is characterized by the complex porous structures built by SiO$_4$

Moiré superlattice of bilayer MoS$_2$
MoS$_2$ is a 2D transition metal dichalcogenide (TMD) that has attracted much attention because it is an excellent semiconductor with a wide range of applications in electronics and optoelectronics [55][56][57][58]. Unlike monolayer or untwisted bilayer MoS$_2$, twisted bilayer MoS$_2$ with Moiré angles has been found to have flat bands and shear solitons 55,[59][60][61], which could lead to novel physical phenomena such as superconducting states, quantum Hall insulators, and Mott-insulating phases. HamGNN was trained on the dataset of the untwisted bilayer

Bi$_x$Se$_y$ quantum materials
Bi and Se have multiple chemical valences and can form a set of binary compounds Bi$_x$Se$_y$ with various stoichiometric ratios 62,63. Bi is a heavy element whose d electrons have strong SOC

accurately predicted the SOC Hamiltonian and the energy band of this structure, showing very high transferability.

Bi$_2$Se$_3$ is a well-known 3D topological insulator and a good platform to study the quantum effects related to SOC [64][65][66][67][68]. Bulk Bi$_2$Se$_3$ is an insulator, while a metallic state protected by time-reversal symmetry is formed on the surface. Bulk Bi$_2$Se$_3$ is stacked by quintuple layers (QLs) through the van der Waals (vdW) interaction, as is shown in Fig. 6(c

$N_{fea}$ and the maximum degree $L_{max}$ for each dataset are listed in Table S2. A two-layer MLP with 64 neurons is used to map the invariant edge embeddings to the weights of each tensor product path in Eqs. 17

To increase transferability and avoid overfitting, we include the error of the calculated energy bands as a regularization term in the loss function: where the variables marked with a tilde refer to the corresponding predictions and λ denotes the

(ISTs) with correct parity symmetry. This parametrized Hamiltonian strictly satisfies the rotational equivariance and parity symmetry and can be extended to a parameterized Hamiltonian satisfying SU(2) and time-reversal equivariance to fit the Hamiltonian with SOC effects or ionic magnetic moments. Based on this universal parametrized Hamiltonian, we designed the E(3) equivariant HamGNN model for predicting the ab initio TB Hamiltonian of molecules and solids. HamGNN has reached state-of-the-art accuracy on the benchmark test and shows high efficiency and transferability in the prediction of various periodic and aperiodic systems. The trained HamGNN model can predict the Hamiltonian matrices, energy bands, and wavefunctions of structures not present in the training set. The high transferability and precision of our model enable this ML electronic structure method to replicate the success of MLFFs and be widely used in practical electronic structure calculations.
overlap matrices at the point $\mathbf{k}$ in the reciprocal space.

Fourier transform of real-space TB Hamiltonian matrix

shift vector of periodic image cell.

construct two O(3) equivariant vectors $\Omega_i^{on}$ and $\Omega_{ij}^{off}$ by direct summation of all the ISTs required by the on-site ($i = j$) Hamiltonian and the off-site ($i \neq j$) Hamiltonian respectively:

equivariance and has a definite parity under the inversion operation, so we designed the E(3) equivariant HamGNN deep neural network based on MPNN to fit the ab initio TB Hamiltonian. This framework directly captures the electronic structure without expensive self-consistent iterations by constructing local equivariant representations of each atomic orbital. The network architecture of HamGNN is shown in Fig. 1(a). HamGNN can achieve a direct mapping from the atomic species $\{Z_i\}$ and positions $\{\mathbf{r}_i\}$ to the ab initio TB Hamiltonian matrix.

Fig. 1. HamGNN architecture and the illustration of its subnetworks. (a) The overall architecture of HamGNN. This neural network architecture predicts the Hamiltonian matrix through 5 steps. The prediction starts from the initial graph embedding of the species, interatomic distances, and interatomic directions of molecules and crystals. The atomic orbital features with angular momentum l in the local environment are included in the l-order components of the E(3) equivariant atom features and are refined through T orbital convolution blocks. In the third step, the pair interaction features are constructed. In the fourth step, the on-site layer and off-site layer are used to convert the features into the ISTs required to construct the on-site and off-site Hamiltonian blocks, respectively. We add shortcut connections in the on-site layer and off-site layer and also use a norm activation function, which scales the modulus of the irreducible representations of each order nonlinearly, to increase the nonlinear fitting ability of the network. In the last step, the network uses the ISTs in $\Omega_i^{on}$ and $\Omega_{ij}^{off}$ to construct the on-site and off-site Hamiltonian blocks through Eq. (12).
assess the precision and transferability of HamGNN, we trained and tested HamGNN on the ab initio Hamiltonian matrices and electronic structures of periodic and aperiodic systems including various molecules, periodic solids, a nanoscale dislocation defect, and a Moiré superlattice. Previously reported models such as SchNOrb 33, PhiSNet 34, and DeepH 46 are trained and tested on only one configuration at a time, and predicting the Hamiltonian matrix of a different configuration requires additional training on the perturbed structures of that configuration. Since HamGNN is based on the universal parameterized Hamiltonian proposed in this work, our model can be trained and tested on structures with the same atomic species but different configurations in the same way as ML force field models.
(a), the predicted values coincide quite well with the DFT calculated values of the Hamiltonian matrices

Fig. 2. Application of HamGNN on molecules in the QM9 dataset.(a) Comparison of the HamGNN predicted Hamiltonian matrix elements with the OpenMX calculated Hamiltonian matrix elements on the QM9 test set.(b) Comparison of predicted and calculated energy levels for 4 molecules randomly selected from the QM9 test set.
to obtain the ab initio Hamiltonian matrices for these structures and divided the Hamiltonian matrices in each dataset into training, validation, and test sets with a ratio of 0.8:0.1:0.1. The MAEs of the Hamiltonian predicted by HamGNN for the structures in the test sets of carbon allotropes, silicon allotropes, and SiO$_2$ isomers are 1.84 meV, 2.60 meV, and 3.75 meV, respectively. The MAE of HamGNN on the carbon allotropes is even lower than the error (2.0 meV) of DeepH on the training dataset of only the graphene structures 46. Most importantly, our HamGNN model trained on the SACADA dataset is transferable and can fit the Hamiltonian of carbon allotropes of arbitrary sizes and configurations outside the training set.
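A random 0.8:0.1:0.1 split of the kind described above can be sketched as follows. The rounding rule, the fixed seed, and the function name are assumptions of this example; the text does not specify how the split was drawn.

```python
import numpy as np

def split_indices(n_structures, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle structure indices and cut them into train/validation/test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_structures)
    n_train = int(round(fractions[0] * n_structures))
    n_val = int(round(fractions[1] * n_structures))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

# The 426 carbon allotropes mentioned in the text.
train, val, test = split_indices(426)
print(len(train), len(val), len(test))
```

Splitting by whole structures, rather than by individual matrix elements, keeps the test structures entirely unseen during training.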

Fig. 5. The electronic structure prediction on the twisted bilayer MoS$_2$ with a Moiré angle of 3.5°. (a) The band structure of the twisted bilayer MoS$_2$. (b) The spatial distribution of the VBM wave function.
effects. A total of 19 Bi$_x$Se$_y$ compounds can be found on the Materials Project 50. The compound Bi$_8$Se$_7$ (id: MP-680214) shown in Fig. 6(a), which contains 45 atoms in the unit cell, was used to test the transferability and accuracy of HamGNN. The remaining 18 Bi$_x$Se$_y$ compounds, which contain no more than 40 atoms in the unit cell, were used to generate the training set for the network. To increase the size of the training set, we applied a random perturbation of up to 0.02 Å to the atoms of each Bi$_x$Se$_y$ structure to generate 50 new perturbed structures per compound and obtained 900 structures in total. These structures were randomly divided into training, validation, and test sets with a ratio of 0.8:0.1:0.1. The MAE of the real part of the SOC Hamiltonian predicted by the trained model for Bi$_8$Se$_7$ is 1.29 meV, and the MAE of the imaginary part of the SOC Hamiltonian is only 5.0×10$^{-7}$ meV. As shown in Fig. 6(b), the predicted and calculated energy bands of Bi$_8$Se$_7$ are very close. Since the SOC effect is mainly reflected in the imaginary part of the Hamiltonian, such a low MAE for the imaginary part indicates that our proposed parameterized SOC Hamiltonian can describe the SOC effect of different systems very accurately. The training set contains only perturbed structures of the 18 other Bi$_x$Se$_y$ compounds, and no compound with the stoichiometric ratio of Bi$_8$Se$_7$ is present, but the HamGNN model still

Fig. 6. The electronic structure prediction on the Bi$_x$Se$_y$ quantum materials. (a) The crystal structure of Bi$_8$Se$_7$. (b) Comparison of HamGNN predicted energy bands (solid line) and DFT calculated energy bands (dashed line) of Bi$_8$Se$_7$. (c) Schematic diagram of the layered crystal structure of Bi$_2$Se$_3$. (d) Comparison of HamGNN prediction and DFT calculations of the energy gap at the Γ point. (e) Comparison of HamGNN predicted energy bands (solid line) and DFT calculated energy bands (dashed line) of Bi$_2$Se$_3$ with 6 QLs. (f) The predicted spin textures on the lowest unoccupied state at 0.07 eV and 0.23 eV above the conduction band minimum (CBM).
). Each QL is composed of five atomic layers of Se-Bi-Se-Bi-Se combined by strong covalent bonds. The HamGNN predicted and DFT calculated Γ-point band gaps of the Bi$_2$Se$_3$ slab models with 1 to 7 QLs are very close, as shown in Fig. 6(d). It can be seen from Fig. 6(e) that the slab model with a single QL has the largest Γ-point band gap. When a new QL is added to the slab, the Γ-point band gap decreases rapidly under the influence of van der Waals interactions. As the number of QLs increases, $E_g(\Gamma)$ gradually decreases, and the band dispersion at the Γ point gradually becomes linear to form a Dirac cone. As shown in Fig. 6(e), a Dirac cone with a small gap appears near the Fermi surface at the Γ point. The spin textures on the lowest unoccupied state at 0.07 eV and 0.23 eV above the conduction band minimum (CBM) were calculated using the HamGNN predicted Hamiltonian matrix, as shown in Fig. 6(f). The predicted spin textures are in good agreement with the fact that the Dirac cone is a topological surface state protected by time-reversal symmetry and that spin and momentum on the topological surface state are locked.

Discussion
DFT methods are now widely used to calculate various properties of molecules and materials. However, successful DFT calculations on large systems are still rare because of the prohibitive computational resources and running time required. A typical DFT calculation often requires tens to hundreds of self-consistent iterations to obtain the final Hamiltonian and wave function, and the diagonalization of the Hamiltonian on a dense k-point grid is carried out in each iteration step. This process takes up most of the running time of a DFT calculation and cannot be skipped. In recent years, the emergence of deep learning has enabled efficient atomic simulations with DFT accuracy. Machine learning force fields (MLFFs) with quantum mechanical precision are now widely used to accelerate long-time molecular dynamics simulations of large systems. Since the potential energy is just an invariant scalar, the implementation of universal MLFF models is relatively easy. In contrast, because the Hamiltonian is a matrix with rotational equivariance and parity symmetry, implementing a transferable model that directly predicts the Hamiltonian is much more difficult. In this work, an analytical E(3) equivariant parameterized Hamiltonian that explicitly takes into account rotational equivariance and parity symmetry is proposed and further extended to a parameterized Hamiltonian satisfying SU(2) and time-reversal equivariance to fit the Hamiltonian with SOC effects or ionic magnetic moments. Based on this parameterized Hamiltonian, we develop an E(3) equivariant deep neural network called HamGNN to fit the Hamiltonian of arbitrary molecules and solids. Previously reported models were trained and tested on datasets of molecular-dynamics-perturbed molecules and solids with a single configuration. To demonstrate the accuracy and transferability of this parameterized Hamiltonian, we used the trained HamGNN model to predict the electronic structures of molecules, periodic solids, the silicon dislocation defect, Moiré bilayer MoS$_2$, and Bi$_x$Se$_y$ quantum materials. These tests show that our model has a high accuracy compared with DFT and a high transferability similar to that of machine learning force fields. These features are an important foundation for the wide application of machine learning electronic structure methods. Since our model can establish a direct mapping from the structure to the self-consistent Hamiltonian without the time-consuming self-consistent iterative process in DFT, it can be used to accelerate the electronic structure calculation of large systems and other costly advanced calculations, such as the electron-phonon coupling matrix via the automatic differentiation ability of the neural network.

and 19. The shifted softplus 29 function is used as the activation function in the MLP. The gate activation function scales the input features

the activation functions that vary with the parity of the scalar input, defined as follows 31:

parity. This ensures that the parity of the output features of the gate activation function is equivariant.

loss weight of the band energy error. λ equals 0.001 in our training. When the training of the network has not converged, the error of the predicted Hamiltonian is large, resulting in poor or even divergent predictions of the energy bands. Adding the band loss at the beginning of training may cause the total loss to diverge. Therefore, we train the network in two steps. First, only the mean absolute error of the Hamiltonian matrices is used as the loss to train the network until the network weights converge. The parameters are optimized with the AdamW 78,79 optimizer using an initial learning rate of $10^{-3}$. Then the mean absolute error of each band calculated at $N_k$ random points in reciprocal space is added to the loss function, and training is restarted with an initial learning rate of $10^{-4}$. When the accuracy of the model on the validation set has not improved after $N_{patience}$ successive epochs, the learning rate is reduced by a factor of 0.5. When the accuracy of the model on the validation set has not improved after $N_{stop}$ successive epochs or the learning rate is lower than $10^{-6}$, the training is stopped and the model that has the best accuracy on the validation set is used on the test set. The values of some key network and training parameters on each dataset are listed in Table

All models were trained on a single NVIDIA A100 GPU.
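The two-step loss and the learning-rate rule can be sketched with plain NumPy. The sketch assumes that the total loss is the MAE of the Hamiltonian plus λ times the MAE of the bands, which is an assumption about the exact form of the loss; the function names and the way the plateau counter is passed in are hypothetical, and no optimizer is included.

```python
import numpy as np

def total_loss(h_pred, h_ref, bands_pred=None, bands_ref=None, lam=1e-3):
    """MAE of the Hamiltonian, plus lam * MAE of the bands in stage two."""
    loss = np.mean(np.abs(h_pred - h_ref))
    if bands_pred is not None:                  # stage two only
        loss += lam * np.mean(np.abs(bands_pred - bands_ref))
    return loss

def update_learning_rate(lr, epochs_without_improvement, patience,
                         factor=0.5, min_lr=1e-6):
    """Halve lr after `patience` stagnant epochs; signal a stop below min_lr."""
    if epochs_without_improvement >= patience:
        lr *= factor
    return lr, lr < min_lr

h_ref = np.zeros((2, 2))
h_pred = np.array([[1.0, 2.0], [3.0, 4.0]])
bands_ref = np.array([0.0, 1.0])
bands_pred = np.array([1.0, 2.0])

stage_one = total_loss(h_pred, h_ref)
stage_two = total_loss(h_pred, h_ref, bands_pred, bands_ref)
new_lr, stop = update_learning_rate(1e-4, epochs_without_improvement=5, patience=5)
print(stage_one, stage_two, new_lr, stop)
```

With a weight as small as 0.001, the band term acts as a gentle regularizer on an already converged Hamiltonian fit rather than as the main training signal.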
, HamGNN achieves the highest accuracy among the models.The accuracy of DeepH is lower than that of PhiSNet and HamGNN because the local coordinate system used by DeepH is not strictly equivariant.Although SE(3) equivariant PhiSNet shows high accuracy in predicting the molecules, it is not a universal equivariant model because it does not satisfy the parity symmetry of the Hamiltonian matrix strictly.Failure will occur in fitting the periodic solid materials containing much more hopping terms or edges (see Appendix