## Introduction

In physics, chemistry, materials science, and biology, multi-scale modeling1,2 helps us understand the properties of materials at multiple scales of time and space. Molecular dynamics (MD) simulation is an essential tool for modeling the dynamical evolution of a many-body system. The trajectories of interacting particles are determined by solving Newton’s equations of motion with complex interatomic potentials. There are two mainstream approaches to MD simulation: classical MD3 and ab initio molecular dynamics (AIMD)4. The potential energy surface in classical MD is given by parameterized force fields of a presumed functional form, which facilitates large-scale calculations but transfers poorly across tasks. AIMD, on the other hand, computes the total energy of a system with quantum mechanical methods such as density functional theory (DFT)5, which provides applicability and accuracy under a wide variety of conditions. However, owing to the cost of rigorously treating the electronic degrees of freedom, AIMD is currently limited to physical and chemical systems of modest scale. With the rapid development of technology for chemical and material synthesis, the need to construct force fields for large-scale calculations with accuracy comparable to that of first-principles methods has become ever more urgent.

One recent development to address this issue is the use of machine learning methods6,7 to facilitate MD simulations, with neural networks among the most widely used tools. The first neural-network framework for MD simulations was proposed by Behler and Parrinello8 and is based on fully connected neural networks. Considerable success has been achieved along this route. In particular, Deep Potential (DeePMD)9,10 has been developed into a comprehensive software suite and has been used in simulations of crystal nucleation11,12 and in the construction of phase diagrams13. Traditional neural networks, for example, fully connected neural networks and convolutional neural networks, are most useful when the input data are Euclidean. However, atoms are intrinsically indistinguishable and have no natural ordering. As a result, heavy data preprocessing has to be performed in the above-mentioned frameworks. To alleviate this preprocessing burden, graph neural networks (GNNs)14,15 were introduced. The power of the graph formalism lies in its focus on relationships among entities (or nodes) rather than on the properties of individual nodes. In particular, message passing neural networks (MPNNs)16 provide a unified formulation of GNNs in the spatial domain. With atoms represented as nodes and interactions or bonds between them represented as edges, molecules and crystals can be transformed naturally into molecular graphs and crystal graphs. GNN-based frameworks for MD simulations, including DTNN17, SchNet18,19, DimeNet20,21, PAINN22, and MDGNN23, have accurately predicted the potential energy surfaces of small molecules and crystals. Current GNN-based MD simulations mostly use homogeneous graphs, in which the message passing network is the same regardless of the types of the atoms. On the other hand, it is now common practice to use the hybrid pair style in MD simulations, which applies different force fields to atom pairs of different types. The hybrid pair style is very useful for complex material systems, such as polymers on metal surfaces, polymers with nanoparticles, and solid-solid interfaces between two different materials. This motivates us to explore whether heterogeneous graphs can improve the performance of GNN-based MD simulations.

In this work, we propose a framework to model diverse interactions in a single MD simulation, termed heterogeneous relational message passing networks (HermNet). The model follows an idea similar to the hybrid pair style in the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) software24. HermNet splits the molecular or crystal graph into several subgraphs and uses a different message passing network for each subgraph. Within each subgraph, we choose a modified version of the polarizable atom interaction neural network (PAINN)22 as the sub-network. Experiments on molecular and extended systems were performed, and the results compare favorably with those of prior models on most benchmark tasks. HermNet provides a general method to design heterogeneous GNNs for MD simulations.

## Results

### Preliminary

In graph theory25, a graph is a data structure composed of a set of vertices and a set of edges. Graphs can be classified as undirected graphs or directed graphs (digraphs) according to whether the orientations of the edges are explicitly designated. Because an undirected edge can be interpreted as a bidirectional link between a pair of nodes, undirected graphs can be regarded as a special case of digraphs.

Graphs can be further classified as homogeneous or heterogeneous according to the types of nodes and edges. A homogeneous graph is a special case of a heterogeneous graph. MPNN16, a universal spatial-domain GNN framework, was proposed for homogeneous graphs. With $${h}_{v}$$ and $${e}_{vw}$$ denoting, respectively, the node features and edge features of a graph, MPNN is summarized as

$${m}_{v}^{t+1}=\mathop{\sum}\limits_{w\in {{{\mathcal{N}}}}(v)}{M}_{t}({h}_{v}^{t},{h}_{w}^{t},{e}_{vw}^{t}),$$
(1)
$${h}_{v}^{t+1}={U}_{t}({h}_{v}^{t},{m}_{v}^{t+1}),$$
(2)

where the forward propagation consists of a message passing phase, given by Eqs. (1) and (2), followed by a readout phase. $${M}_{t}$$ and $${U}_{t}$$ are a message function and an update function, respectively. The hidden states $${h}_{w}$$ of all the neighbors $${{{\mathcal{N}}}}(v)$$ of vertex v are aggregated and then used to update the hidden state of vertex v in the next step. A heterogeneous graph supports sophisticated multi-type relations and inherently enables richer semantic relations. The relational graph convolutional network (R-GCN)26 is an extension of MPNN to such graphs. $${{{\mathcal{G}}}}=({{{\mathcal{V}}}},{{{\mathcal{E}}}},{{{\mathcal{R}}}})$$ denotes a heterogeneous graph with nodes (entities) $${v}_{i}\in {{{\mathcal{V}}}}$$ and labeled edges (relations) $$({v}_{i},r,{v}_{j})\in {{{\mathcal{E}}}}$$, where $$r\in {{{\mathcal{R}}}}$$ is a relation type that covers both canonical and inverse directional relations. A generalized forward process of an entity $${v}_{i}$$ in a relational graph takes the form

$${h}_{i}^{(l+1)}=\mathop{\sum}\limits_{r\in {{{\mathcal{R}}}}}{U}_{r}^{(l)}\left({h}_{i}^{(l)},\mathop{\sum}\limits_{j\in {{{{\mathcal{N}}}}}_{i}^{r}}{M}_{r}^{(l)}({h}_{i}^{(l)},{h}_{j}^{(l)},{e}_{ij}^{(l)})\right),$$
(3)

where $${{{{\mathcal{N}}}}}_{i}^{r}$$ denotes the set of neighbor indices of vertex i of relation r. Eq. (3) implies that a heterogeneous graph can be decomposed into several homogeneous graphs of distinct relations $${{{\mathcal{R}}}}$$. Typically, each homogeneous graph is a directed graph. In other words, an R-GCN layer is made up of multiple MPNN layers, each of which is associated with a homogeneous graph of relation r.
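As a minimal sketch of how Eqs. (1)-(3) fit together, the following pure-Python example applies one MPNN step per relation and sums the results over relations. Scalar node features, the toy message and update functions (`make_message`, `toy_update`) and the relation labels are illustrative assumptions and are not taken from HermNet.

```python
from collections import defaultdict


def mpnn_step(h, edges, message, update):
    """One homogeneous message passing step, Eqs. (1)-(2).

    h     : dict mapping node -> scalar feature
    edges : list of (w, v, e_vw); node w sends a message to node v
    """
    m = defaultdict(float)
    for w, v, e_vw in edges:
        m[v] += message(h[v], h[w], e_vw)          # Eq. (1): sum over neighbours of v
    return {v: update(h[v], m[v]) for v in h}      # Eq. (2): node update


def relational_step(h, edges_by_relation, messages, updates):
    """One relational step, Eq. (3): an MPNN step per relation, summed over relations."""
    out = {v: 0.0 for v in h}
    for r, edges in edges_by_relation.items():
        h_r = mpnn_step(h, edges, messages[r], updates[r])
        for v in h:
            out[v] += h_r[v]
    return out


# Hypothetical toy functions, chosen only so the arithmetic is easy to follow.
def make_message(scale):
    return lambda h_v, h_w, e_vw: scale * e_vw * h_w


def toy_update(h_v, m_v):
    return 0.5 * h_v + m_v


# Node 0 is A-type, nodes 1 and 2 are B-type (assumed labels).
h = {0: 1.0, 1: 2.0, 2: 3.0}
edges_by_relation = {"A->B": [(0, 1, 0.5)], "B->B": [(2, 1, 1.0)]}
messages = {"A->B": make_message(1.0), "B->B": make_message(2.0)}
updates = {"A->B": toy_update, "B->B": toy_update}

new_h = relational_step(h, edges_by_relation, messages, updates)
print(new_h)
```

Each relation owns its own message and update function, which is the sense in which a relational layer is a collection of homogeneous MPNN layers.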

### Architecture

Diverse forms of force fields are manifestly responsible for the intricate interactions, especially in systems with multiple elements. GNNs for homogeneous graphs model interactions of different atomic pairs with shared parameters, which limits the expressive power for neural-network-based force fields. For example, as shown in Fig. 1(a), there are three kinds of particles, i.e. A-, B- and C-type atoms. The graph is constructed via linking central nodes with their adjacent nodes within a cutoff radius. In a classical MD simulation for this system, six different force fields can be allocated for A–A pairs, A–B pairs, A–C pairs, etc., provided only two-body interactions are considered. If a homogeneous GNN is employed to model different interactions by fitting a single function, it is expected to generate a mean force field. On the other hand, equipped with multiple types of nodes and edges, a heterogeneous GNN is a natural choice to model these interactions with a more detailed resolution.
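The construction described above can be sketched in a few lines: link every central atom to its neighbors within a cutoff radius, then group the edges by the unordered pair of element types. The coordinates, the cutoff value and the helper names (`build_edges`, `group_by_pair`) are assumptions made for illustration only.

```python
import math
from collections import defaultdict
from itertools import combinations_with_replacement


def build_edges(positions, cutoff):
    """Directed edges (source, destination) between distinct atoms closer than `cutoff`."""
    edges = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and math.dist(p, q) < cutoff:
                edges.append((j, i))  # j is a neighbour (source) of central atom i
    return edges


def group_by_pair(edges, species):
    """Group edges by the unordered pair of element types they connect."""
    groups = defaultdict(list)
    for src, dst in edges:
        key = tuple(sorted((species[src], species[dst])))
        groups[key].append((src, dst))
    return dict(groups)


# Hypothetical four-atom configuration; the last atom lies outside every cutoff sphere.
species = ["A", "B", "C", "A"]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]

edges = build_edges(positions, cutoff=2.0)
groups = group_by_pair(edges, species)

# With three element types there are six possible two-body pair types.
pair_types = list(combinations_with_replacement("ABC", 2))
print(len(pair_types), sorted(groups))
```

A homogeneous GNN would process all of these groups with one set of parameters, whereas a pair-resolved model could assign a separate sub-network to each group.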

As shown in the following, we develop a universal framework, HermNet, to model diverse many-body interactions simultaneously by extracting appropriate subgraphs, which are subsequently processed by heterogeneous GNNs. An overview of the HermNet architecture is displayed in Fig. 2a; it takes the atomic numbers Z as the scalar node features and a vector of zeros as the vectorial node features. HermNet is composed of several message passing layers, termed HermConv layers, which model interactions hierarchically. The RMConv modules (Fig. 2b) of different relations constitute the HermConv module (Fig. 2c, d). We introduce three variants of HermNet: heterogeneous pair networks (HPNet), heterogeneous triadic networks (HTNet), and heterogeneous vertex networks (HVNet). An HPNet layer for central nodes of A-type is displayed in Fig. 2c, where all the sub-networks with an A-type destination contribute to the local environment of the A-type node. A HermNet layer for HVNet and HTNet is displayed in Fig. 2d. If the parameters of its sub-networks (RMConv, see Fig. 3) are shared for the same kind of central node, the framework is referred to as HVNet. When the parameters are not shared, the framework is an HTNet. We only test and report the performance of HVNet in the following sections, as the other two models (HPNet and HTNet) have higher complexity and would require more training time and data for a proper assessment.

Most machine learning frameworks for MD simulation take into account only the interatomic distances in feature engineering and ignore bond angle information, which is an important characteristic of both molecules and crystals. In principle, bond angles can be deduced from interatomic distances. However, it is advantageous to include bond angle information explicitly in feature engineering to achieve better performance. Directional message passing networks (DimeNet)20,21 introduced three-body interactions explicitly by combining radial and angular information from the edges of the original graph and the corresponding line graph, respectively. PAINN22 is a rotationally equivariant MPNN framework that reduces the cost of calculating angular information. In this work, we incorporate angular information by choosing PAINN as the sub-network in HermNet. This message passing setup can be implemented directly in HVNet, while slight modifications are required in HTNet to distinguish the types of source nodes. We note that HPNet cannot incorporate all angular information explicitly. For example, the bond angle A → B ← B is lost in HPNet because A → B and B ← B are processed by different sub-networks.
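The reason summed direction vectors can carry angular information is a simple geometric identity: the squared norm of a sum of unit vectors equals the sum of the cosines of all pairwise angles between them. The sketch below checks this numerically for a hypothetical central atom with three neighbors; it illustrates the identity only and is not the PAINN or HermNet layer itself.

```python
import math
from itertools import product


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical central atom and neighbour positions.
center = (0.0, 0.0, 0.0)
neighbours = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 0.0)]
directions = [unit(tuple(q - c for q, c in zip(nb, center))) for nb in neighbours]

# Left-hand side: |sum_j r_ij|^2, computed from one summed vector (linear in neighbours).
summed = tuple(sum(d[k] for d in directions) for k in range(3))
norm_of_sum = dot(summed, summed)

# Right-hand side: sum over all neighbour pairs (j, k) of cos(theta_jik).
sum_of_cosines = sum(dot(a, b) for a, b in product(directions, repeat=2))

print(norm_of_sum, sum_of_cosines)
```

The left-hand side needs a single pass over the neighbors, while the right-hand side enumerates neighbor pairs, which is why an inner product of aggregated vectors is a cheap route to angle-dependent features.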

As discussed above, a heterogeneous graph could be decomposed into several homogeneous subgraphs. To describe the method of extracting these subgraphs, we use $${{{\mathcal{G}}}}$$, $${\hat{{{{\mathcal{Q}}}}}}_{s}$$, and $${\hat{{{{\mathcal{Q}}}}}}_{d}$$ to denote the input heterogeneous graph, the operator that returns the subgraphs with specific source nodes, and the operator that returns the subgraphs with specific destination nodes, respectively. As indicated in Fig. 1b, c, the directed subgraphs for HVNet could be extracted via selecting inbound edges of a given A-type destination node, i.e. $${\hat{{{{\mathcal{Q}}}}}}_{d}^{A}{{{\mathcal{G}}}}$$, while those for HTNet are extracted via selecting inbound edges of a given B-type destination node firstly and then choosing out-bound edges of its A-type and C-type source nodes simultaneously, i.e. $${\hat{{{{\mathcal{Q}}}}}}_{s}^{A\cup C}{\hat{{{{\mathcal{Q}}}}}}_{d}^{B}{{{\mathcal{G}}}}$$ for triadic relation A → B ← C. We note that if the two destination nodes are extracted sequentially for HTNet, the result is generally an empty graph.
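The two extraction operators can be sketched as plain edge filters. In this illustrative example the graph is a list of directed (source, destination) edges, and `select_by_destination` and `select_by_source` are hypothetical stand-ins for the destination and source operators; the node labels and edges are made up.

```python
def select_by_destination(edges, species, dst_types):
    """Destination operator: keep edges whose destination node type is in dst_types."""
    return [(s, d) for s, d in edges if species[d] in dst_types]


def select_by_source(edges, species, src_types):
    """Source operator: keep edges whose source node type is in src_types."""
    return [(s, d) for s, d in edges if species[s] in src_types]


# Hypothetical graph: node 0 is A-type, nodes 1 and 3 are B-type, node 2 is C-type.
species = {0: "A", 1: "B", 2: "C", 3: "B"}
edges = [(0, 1), (1, 0), (2, 1), (1, 2), (3, 1), (1, 3), (2, 0), (0, 2)]

# HVNet-style subgraph: all inbound edges of A-type destination nodes.
hv_subgraph_A = select_by_destination(edges, species, {"A"})

# HTNet-style subgraph for A -> B <- C: B-type destinations first,
# then A-type and C-type sources selected simultaneously (as a union).
inbound_B = select_by_destination(edges, species, {"B"})
ht_subgraph_ABC = select_by_source(inbound_B, species, {"A", "C"})

# Applying two single-type source filters one after the other leaves nothing,
# because no edge has both an A-type and a C-type source.
sequential = select_by_source(select_by_source(inbound_B, species, {"A"}), species, {"C"})

print(hv_subgraph_A, ht_subgraph_ABC, sequential)
```

The last line shows why the source types have to be selected as a union rather than in sequence when a triadic subgraph is built this way.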

In the following, we report tests of HVNet against prior frameworks on three well-established benchmark datasets. As detailed below, HVNet outperforms most of the prior methods.

### Benchmarks on MD17 dataset

The MD17 dataset17,27,28 provides non-equilibrium structures sampled (at a time resolution of 0.5 fs) from AIMD trajectories of eight small molecules at a background temperature of 500 K. The potential energy and force labels are computed with the PBE+vdW-TS method. Christensen and von Lilienfeld29 found that the energies in the original MD17 dataset are contaminated with substantial numerical noise and published a revised version of the dataset. Distinct HVNet models were trained on this revised dataset, with a 1000-frame training set and a 1000-frame validation set selected at random. The learning rate was initially set to 3 × 10⁻⁴ and adaptively reduced when the loss on the validation set reached a plateau. The cutoff radius was set to 5 Å for the construction of molecular graphs. Additional details can be found in the Supplementary Methods. Table 1 compares the mean absolute errors (MAEs) of three benchmarked models and HVNet. It should be noted that PAINN and HVNet were trained on the revised MD17 dataset, whereas SchNet and DimeNet were trained on the original MD17 dataset. HVNet outperforms the other models by a comfortable margin on three-quarters of the predictive tasks, and its results on the remaining tasks are comparable to the best results among all four frameworks. We also attempted to train an HTNet on the MD17 dataset; however, the parameter space of HTNet is too large, and obvious overfitting was observed after only a few training epochs. We then trained the HTNet model on the HfO₂ dataset, which was proposed to fit Gaussian approximation potential models30,31,32,33,34,35, and found that no obvious overfitting occurred when more than 1500 data points were used for training (a detailed discussion of training the HTNet model is provided in the Supplementary Notes 2). This suggests that HTNet can exploit its expressive power once sufficient data are provided.
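The error metric and the plateau-based learning rate reduction described above can be sketched as follows. The initial rate of 3 × 10⁻⁴ comes from the text; the reduction factor, the patience and the class name `PlateauSchedule` are assumptions for illustration, since the actual settings are deferred to the Supplementary Methods.

```python
def mean_absolute_error(pred, ref):
    """MAE between predicted and reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


class PlateauSchedule:
    """Reduce the learning rate when the validation loss stops improving.

    `factor` and `patience` are assumed values, not taken from the paper.
    """

    def __init__(self, lr, factor=0.5, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor   # plateau detected: shrink the step size
                self.bad_epochs = 0
        return self.lr


schedule = PlateauSchedule(lr=3e-4)
# Made-up validation losses that stall at 0.8 for several epochs.
for val_loss in [1.0, 0.8, 0.8, 0.8, 0.8, 0.7]:
    current_lr = schedule.step(val_loss)

print(current_lr, mean_absolute_error([1.0, 2.0], [1.5, 1.0]))
```

In a PyTorch training loop the same behavior is usually obtained from `torch.optim.lr_scheduler.ReduceLROnPlateau`; whether that scheduler was used here is not stated in the text.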

### Benchmarks on QM9 dataset

The QM9 dataset36,37 consists of computed geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of carbon, hydrogen, oxygen, nitrogen, and fluorine. All properties were calculated at the B3LYP/6-31G(2df,p) level of quantum chemistry. This dataset provides quantum chemical insights into the relevant chemical space of small organic molecules and has been widely adopted as a benchmark to calibrate, analyze, and evaluate new methods in this field. HVNet was trained on 110k molecules and validated on another 10k molecules. The properties of the 134k molecules include the dipole moment (μ), isotropic polarizability (α), energy of the highest occupied molecular orbital ($${\varepsilon }_{{{\rm{HOMO}}}}$$), energy of the lowest unoccupied molecular orbital ($${\varepsilon }_{{{\rm{LUMO}}}}$$), band gap (Δε), electronic spatial extent ($$\langle {R}^{2}\rangle$$), zero point vibrational energy (ZPVE), internal energy at 0 K ($${U}_{0}$$), internal energy at 298.15 K (U), enthalpy at 298.15 K (H), free energy at 298.15 K (G), and heat capacity at 298.15 K ($${c}_{v}$$). It must be emphasized that the HVNet models were trained with atomization energies, i.e., the original energies minus the atomic reference energies, rather than with the original internal energies, enthalpy, and free energy; this is the protocol advocated in the DimeNet work of Klicpera et al.20. These adjusted values are more reasonable because absolute energies carry little physical meaning, whereas relative energies convey essentially all of the physical content. Table 2 reports the MAEs of HVNet for the 12 tasks in the QM9 dataset in comparison with eight other models. HVNet outperforms all baselines on 10 out of 12 tasks. For the other two tasks, $$\langle {R}^{2}\rangle$$ and ZPVE, the MAEs of HVNet are on par with some of the baselines. Additional settings and the definitions of the physical quantities for the models and datasets are provided in the Supplementary Methods and Supplementary Discussion 1.
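The conversion from an absolute energy to an atomization energy amounts to subtracting one reference energy per atom. The sketch below shows the arithmetic; the function name and all numerical values are placeholders chosen for illustration and are not QM9 reference energies.

```python
from collections import Counter


def atomization_energy(total_energy, symbols, atomic_reference):
    """Total energy minus the sum of the atomic reference energies of its atoms."""
    counts = Counter(symbols)
    reference_sum = sum(n * atomic_reference[element] for element, n in counts.items())
    return total_energy - reference_sum


# Placeholder numbers for illustration only (arbitrary units, not QM9 values).
reference = {"H": -0.5, "C": -37.8, "O": -75.0}
methane_like = ["C", "H", "H", "H", "H"]

e_atomization = atomization_energy(-40.5, methane_like, reference)
print(e_atomization)
```

The same subtraction applies to U₀, U, H and G, each with its own set of per-element reference values.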

### Benchmark on extended systems

Predicting properties of extended systems is a more ambitious task because of their intricate chemical environments. Since HermNet is capable of handling extended systems, we conducted this more challenging benchmark on the extended-system datasets provided in ref. 10. The datasets contain properties of eight different systems, among which bulk C₅H₅N, bulk TiO₂, the system consisting of MoS₂ and Pt, and the high entropy alloy (HEA) are the four most difficult tasks: the bulk C₅H₅N and bulk TiO₂ datasets include multiple phases, and the MoS₂+Pt system includes five different datasets. Unfortunately, training on the MoS₂+Pt dataset required too much computational time, so we did not pursue this benchmark beyond some preliminary tuning (and no corresponding results are shown). The HEA data are explicitly divided into two datasets: the model is trained on the first dataset, which includes 40 kinds of five-element equimolar CoCrFeMnNi HEA with random occupations, and is then tested on the test set of the first dataset and on the entire second dataset, which includes another 16 kinds of HEA with random occupations. Table 3 compares the root mean square errors (RMSEs) of DeepPot-SE/DeePMD10 and HVNet. Since the potential energy is an extensive quantity, the RMSEs of the energies were normalized by the system size, consistent with how the results were presented for DeepPot-SE and DeePMD10. As shown in Table 3, HVNet achieved lower RMSEs than DeepPot-SE on all tasks except the MoS₂+Pt dataset, which was not evaluated because of the excessive training time required. Details of additional settings and specific discussions are provided in the Supplementary Methods and Supplementary Discussion 2. In addition, we calculated the vacancy formation energy of bulk Cu with the trained model. An arbitrary Cu atom was removed and the configuration was relaxed with DFT and with HVNet, respectively. The chemical potential of Cu was calculated from DFT, and the vacancy formation energies from DFT and HVNet are 1.03 eV and 1.07 eV, respectively, which are consistent with previous computational and experimental results (1.14 eV and 1.17–1.28 eV, respectively)38.
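Two of the quantities above reduce to short formulas, sketched here under stated assumptions: the energy RMSE is normalized by the number of atoms in each frame, and the vacancy formation energy is taken in its common form, the energy of the relaxed defective cell plus the chemical potential of the removed atom minus the energy of the perfect cell. The function names and all numbers are placeholders, not values from Table 3 or from the Cu calculation.

```python
import math


def rmse_per_atom(pred_energies, ref_energies, n_atoms):
    """RMSE of total energies after dividing each error by the frame's atom count."""
    squared = [((p - r) / n) ** 2 for p, r, n in zip(pred_energies, ref_energies, n_atoms)]
    return math.sqrt(sum(squared) / len(squared))


def vacancy_formation_energy(e_defect, e_perfect, mu):
    """E_f = E(relaxed cell with one atom removed) + mu - E(perfect cell)."""
    return e_defect + mu - e_perfect


# Placeholder energies in eV, for illustration only.
e_rmse = rmse_per_atom([10.0, 20.0], [12.0, 16.0], [2, 4])
e_vacancy = vacancy_formation_energy(e_defect=-395.2, e_perfect=-400.0, mu=-3.7)
print(e_rmse, e_vacancy)
```

Normalizing per atom keeps errors comparable between cells of different size, which matters when datasets mix supercells.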

### Molecular dynamics simulation and phonon dispersion

To demonstrate the performance of HermNet, an MD simulation of a MoSe₂ monolayer was performed. The dataset was generated with the Vienna ab initio Simulation Package (VASP)39 using projector-augmented wave40,41 pseudopotentials. The Perdew-Burke-Ernzerhof exchange-correlation functional42 was used. The plane-wave cutoff was 260 eV, and a 2 × 2 × 1 Γ-centered k-point mesh was adopted to sample the Brillouin zone of the 6 × 6 × 1 supercell. The simulation was carried out in the canonical ensemble with the temperature increasing from 100 K to 1500 K, and 5000 frames were obtained with a time step of 1 fs. The dataset was randomly shuffled and split into training, validation, and test sets in the ratio 8:1:1. The MAEs of the energy and forces on the test set were 0.09 meV per atom and 2.93 meV Å⁻¹, respectively. The radial distribution functions at 300 K obtained from AIMD and from i-PI43, a classical MD simulation software package, with HVNet as the force field are compared in Fig. 4a. Furthermore, the phonon dispersion was calculated by interfacing HermNet with phonopy44. The acoustic sum rule was enforced so that the frequencies of the three acoustic modes vanish at the Γ point. As shown in Fig. 4b, the phonon dispersion obtained with HermNet demonstrates that even the second-order derivatives of the potential energy are reproduced with high precision.
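The shuffle-and-split step can be sketched as follows. The 8:1:1 ratio and the 5000 frames come from the text; the fixed random seed and the function name `split_frames` are assumptions added so that the example is reproducible.

```python
import random


def split_frames(n_frames, ratios=(8, 1, 1), seed=0):
    """Shuffle frame indices and split them into train/validation/test index lists."""
    indices = list(range(n_frames))
    random.Random(seed).shuffle(indices)          # seed is an assumption, for reproducibility
    total = sum(ratios)
    n_train = n_frames * ratios[0] // total
    n_val = n_frames * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]              # remainder goes to the test set
    return train, val, test


train, val, test = split_frames(5000)
print(len(train), len(val), len(test))
```

Because consecutive MD frames are strongly correlated, shuffling before splitting spreads all temperatures of the 100 K to 1500 K ramp across the three sets.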

## Discussion

The complexity of a sub-network generally scales as $${{{\mathcal{O}}}}(| {{{\mathcal{N}}}}| )$$, where $$| {{{\mathcal{N}}}}|$$ is typically the number of neighbors captured within the cutoff radius. The numbers of sub-networks for HVNet, HPNet, and HTNet are $${{{\mathcal{O}}}}({N}_{e})$$, $${{{\mathcal{O}}}}({N}_{e}^{2})$$ and $${{{\mathcal{O}}}}({N}_{e}^{3})$$, respectively, where $${N}_{e}$$ is the number of element types present in the system. Therefore, HVNet, whose number of sub-networks grows only linearly with $${N}_{e}$$, is the most practical variant when the number of distinct elements is large. Further discussion of the complexity analysis is deferred to the Supplementary Notes 2.
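To make the scaling concrete, the sketch below enumerates candidate relations for a five-element system. Counting ordered tuples of element types is an assumption that reproduces the quoted leading-order scaling; the exact number of sub-networks in an implementation may be smaller if symmetric relations are merged.

```python
from itertools import product


def count_subnetworks(elements):
    """Count relations per variant, assuming one sub-network per ordered type tuple."""
    hv = [(dst,) for dst in elements]               # one per destination type
    hp = list(product(elements, repeat=2))          # (source, destination)
    ht = list(product(elements, repeat=3))          # (source, destination, source')
    return {"HVNet": len(hv), "HPNet": len(hp), "HTNet": len(ht)}


counts = count_subnetworks(["Co", "Cr", "Fe", "Mn", "Ni"])
print(counts)
```

For the five-element alloy the gap between the variants is already more than an order of magnitude, which is consistent with restricting the benchmarks to HVNet.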

To construct an accurate force field for classical MD simulations, the potential energy surface needs to be reproduced to first-principles precision. The potential energy has a hierarchical structure and can be decomposed into several terms as follows,

$$U=\mathop{\sum}\limits_{i}{E}_{i}+\mathop{\sum}\limits_{i < j}{E}_{ij}+\mathop{\sum}\limits_{i < j < k}{E}_{ijk}+\cdots \,,$$
(4)

where the first term represents the energies of single atoms and the second term is the sum of all pairwise interactions, such as the energy contributed by bonds. The third term denotes the three-body interactions, which typically entail angular dependence. Higher-order many-body interactions can be included to build a more accurate potential energy surface. The layers shown in Fig. 3b, c, which are equivalent to the message layer in the original PAINN proposal22, can be viewed as a single MPNN layer that models two-body interactions, since they process only radial information. The inner products of the positional vectors in the modules of Fig. 3d, e or f are responsible for modeling three-body interactions. Thus the sub-network, i.e., the concatenation of these layers, as shown in Figs. 2b and 3a, conforms to the hierarchy in Eq. (4).
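The hierarchy in Eq. (4) can be written out directly as sums over atoms, pairs and triples. In this sketch the one-, two- and three-body terms are hypothetical toy functions (a constant, the pair distance, and the cosine of the angle at the first atom of each triple); they only illustrate the bookkeeping and are not a physical potential.

```python
import math
from itertools import combinations


def cos_angle(a, b, c):
    """Cosine of the angle at point a between the directions a->b and a->c."""
    u = [x - y for x, y in zip(b, a)]
    v = [x - y for x, y in zip(c, a)]
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def many_body_energy(positions, one_body, two_body, three_body):
    """Truncated expansion of Eq. (4): sums over i, over i < j, and over i < j < k."""
    idx = range(len(positions))
    e1 = sum(one_body(i) for i in idx)
    e2 = sum(two_body(math.dist(positions[i], positions[j]))
             for i, j in combinations(idx, 2))
    e3 = sum(three_body(positions[i], positions[j], positions[k])
             for i, j, k in combinations(idx, 3))
    return e1 + e2 + e3


# Hypothetical right-angle triatomic arrangement and toy energy terms.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
total = many_body_energy(
    positions,
    one_body=lambda i: -1.0,      # toy single-atom energy
    two_body=lambda r: r,         # toy radial pair term
    three_body=cos_angle,         # toy angular term
)
print(total)
```

The two-body sum needs only distances, whereas the three-body sum needs an angle, mirroring the split between radial layers and inner-product modules described above.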

On the other hand, graphs are constructed with a specific cutoff radius, and only information from 1-hop neighbors is aggregated in a single MPNN layer. The final energy prediction is obtained with a global pooling operation over all local environments. This suggests that locality is an essential property that facilitates the learning of potential energies. The DFT total energy can be expressed as the sum of the eigenvalues of the electronic Hamiltonian and the interaction of the nuclei, with a correction to avoid double counting45. To take advantage of a localized basis, as in a graph, we discuss the total energy within the tight-binding framework, which provides more physical insight. When the density is expressed as a superposition of spherical atomic densities46, the total energy in the tight-binding representation is written as

$${E}_{{{\rm{total}}}}=\mathop{\sum}\limits_{m,{m}^{\prime}}{\rho }_{m,{m}^{\prime}}{H}_{m,{m}^{\prime}}+\mathop{\sum}\limits_{I < J}f(| {{{{\rm{R}}}}}_{I}-{{{{\rm{R}}}}}_{J}| ),$$
(5)

where $${\rho }_{m,{m}^{\prime}}$$ is the density matrix and $${H}_{m,{m}^{\prime}}$$ is the matrix element of the Hamiltonian between states m and $${m}^{\prime}$$, with $$m=1,\ldots ,{N}_{{{\rm{basis}}}}$$ labeling the states in the basis. $${{{{\rm{R}}}}}_{I}$$ is the position of atom I, and J is a neighboring site of I. The formula shows that the total energy can be decomposed into pairwise contributions, which is consistent with the radial message passing layers in Fig. 2a. Generally, both terms in Eq. (5) are short-range interactions47,48,49 and can be extended to higher-order interactions. The total energy can then be expressed as $${E}_{{{\rm{total}}}}=\mathop{\sum }\nolimits_{I = 1}^{N}{\varepsilon }_{I}^{\prime}$$, a sum of local contributions from the central particles. This indicates the locality of a system’s overall energy, consistent with the idea underlying the seminal work of ref. 8, which has been widely adopted in follow-up works in this field.
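The step from a pairwise sum to a sum of local contributions can be checked numerically for the second term of Eq. (5): assigning half of every pair energy to each partner gives per-atom energies whose sum equals the pair sum. The short-range function `pair_energy`, its cutoff and the coordinates are hypothetical, and the band-structure term of Eq. (5) is not modeled here.

```python
import math
from itertools import combinations


def pair_energy(r, cutoff=3.0):
    """Hypothetical short-range pair term f(r) that vanishes beyond the cutoff."""
    return (1.0 - r / cutoff) ** 2 if r < cutoff else 0.0


def total_from_pairs(positions):
    """Sum of f(|R_I - R_J|) over all pairs I < J."""
    return sum(pair_energy(math.dist(p, q)) for p, q in combinations(positions, 2))


def local_energies(positions):
    """Per-atom energies: half of each pair term is assigned to each partner."""
    return [
        0.5 * sum(pair_energy(math.dist(p, q)) for j, q in enumerate(positions) if j != i)
        for i, p in enumerate(positions)
    ]


# Made-up coordinates; the last atom interacts only with its nearest neighbour.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.5, 0.0, 0.0)]
eps = local_energies(positions)
print(sum(eps), total_from_pairs(positions))
```

Summing the entries of `eps` plays the role of the global pooling operation: each entry depends only on the atoms inside one cutoff sphere.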

In principle, the parameters of the sub-networks in DeePMD10 are not shared between different element types, which is similar to heterogeneous GNNs. Thus the improved performance on extended systems results from the capability of the sub-networks used in this work. Other heterogeneous GNN frameworks designed for MD simulations also exist, but their design principles are very different. MXMNet50 utilizes multiplex graphs, which can be viewed as heterogeneous graphs with a single node type and two edge types, to capture global and local geometric information with different cutoff radii. Heterogeneous molecular GNNs51 introduce heterogeneous graphs for molecules by grouping the original graph and a line graph into a single heterogeneous graph with two kinds of nodes, and process the node information of the original graph and of the line graph with two different GNNs. The heterogeneity in these two works is equivalent to distinguishing original graphs from line graphs, so the original graph is still treated as a homogeneous graph.

In conclusion, we develop HermNet, a framework based on heterogeneous GNNs, to learn multiple kinds of force fields in a single MD simulation by extracting the required subgraphs. Unlike previous works, HermNet introduces heterogeneous graphs to describe the different interactions between element types rather than to distinguish the hierarchy of the interactions. Among the three variants of HermNet, we tested HVNet on a variety of systems, covering both molecular and extended systems. Some discussions based on quantum mechanics and DFT have been provided to justify our model designs. Although we primarily focus on experiments with HVNet, in principle, HTNet is capable of modeling sophisticated interactions once enough data are provided. HVNet outperforms the state-of-the-art benchmark models on most of the tasks for small molecules. In the experiments on extended systems, HVNet also outperforms DeePMD10. These results demonstrate the representational power and promising application potential of HVNet for diverse and intricate systems such as HEA. Finally, we emphasize that HermNet is a universal framework whose sub-networks can be replaced by other advanced or specialized models. For example, the unitary N-body tensor equivariant neural network (UNiTE)52, another remarkable framework based on group theory, was proposed recently and performed impressively on molecular datasets. We believe that HermNet could deliver improved results if the current sub-networks were replaced with UNiTE52. In addition, the many-body expansion in the sub-networks of HermNet could be truncated at a higher order, for example to include dihedral angle information53. HermNet can also be extended to model higher-order contributions by extracting higher-order subgraphs and invoking frameworks that model such contributions properly. More information can be found in Supplementary Discussion 3.

## Methods

### Architecture implementation

HermNet is implemented with the PyTorch54 and Deep Graph Library55 Python libraries. Neighbors of the central particle are found with the Scikit-Learn56 library, and the node features are extracted with the Atomic Simulation Environment57 and Pymatgen58 libraries. In our work, a simplified PAINN22 is implemented as the sub-network in both HVNet and HTNet. The angular formula in HVNet is the same as that in PAINN22, while that in HTNet is slightly different. The proof that angular information can be introduced naturally in HVNet and HTNet with PAINN is provided in Supplementary Notes 1.