Atomic-scale tailoring of materials is one of the most promising paths towards achieving new, both quantum and classical properties. Controllable defect engineering, i.e., introduction of vacancies or desired impurities, enables properties modifications and new functionalities in crystalline materials1. The opportunities for such controlled material engineering methods got a dramatic boost in the past two decades with the development of the methods for exfoliation of crystal into two-dimensional atomic layer2. The reduced dimensionality in layered two-dimensional materials makes it possible to manipulate defects atom by atom and tune their properties down to quantum mechanics limits3. Such atomic-scale preparation and fabrication techniques hold promise for the continual development of the semiconductor industry in the post-Moore age and the development of novel technologies such as quantum computing4, catalysts5, and photovoltaics6.

Despite decades of research efforts, knowledge of the structure-property relation for defects in crystals is still limited. Only a small subset of defects in the vast configuration space have been investigated7. The properties of complexes of multiple point defects, where quantum phenomena dominate, depend on the composition and configuration of such defects in a strongly non-trivial manner, making their prediction a very hard problem. The diversity and complexity of the problem come from the exchange interaction of defect orbitals separated by discrete lattices8. On the other hand, the vast chemical and configuration space prohibits a thorough exploration of such structures by traditional trial and error experiments and even for computationally expensive state-of-the-art quantum mechanic simulations.

The recent development of large materials databases has stimulated the application of deep learning methods for atomistic predictions. Machine learning (ML) methods trained on density functional theory (DFT) calculations have been used to identify materials for batteries, catalysts, and many other applications. Machine learning methods accelerate the design of the new materials by predicting material properties with accuracy comparable to ab-initio calculations, but with orders of magnitude lower computational costs9,10. A series of fast and accurate deep learning architectures have been presented during the last few years. The most successful of them are graph neural networks, such as MEGNet11, CGCNN12, SchNet13, GemNet14, etc.

In this work, we propose a method for predicting the energetic and electronic structures of defects in 2D materials with machine learning. Firstly, a machine learning-friendly 2D Material Defect database (2DMD) was established employing high throughput DFT calculations15. The database is composed of both structured datasets and dispersive datasets of defects in represented 2D materials such as MoS2, WSe2, h-BN, GaSe, InSe, and black phosphorous (BP). We use the datasets to evaluate the performance of the previously reported approaches along with ours’ which was specially designed to provide accurate description of materials with defects. Our computational experiments show that our approach provides a significant increase in prediction accuracy compared to the state-of-the-art general methods. The high accuracy allows to reproduce the nonlinear non-monotonic property-distance correlation of defects which is a combination of the quantum mechanic effect and the periodic lattice nature in 2D materials. The general methods, on the other hand, mostly fail to predict such property-structure functionals, as we show in subsection “Aggregate performance”. Most importantly, our method shows great transferability for a wide range of defect concentrations in the various 2D materials we studied.

Machine learning offers two principal approaches to predicting atomistic properties: graph neural networks (GNN) and physics-based descriptors. Graph neural networks have several valuable properties that make them uniquely suitable for modeling atomic systems: invariance to permutations, rotations, and translation; natural encoding of the locality of interactions. In the recent Open Catalyst benchmark5, GNNs solidly outperform the physics-based descriptors. Therefore, in this section, we only focus on GNNs.

Xie et al.12 is one of the first works to propose applying a convolutional GNN to materials. Wolverton et al.16 improves on it, by incorporating Voronoi-tessellated atomic structure, 3-body correlations of neighboring atoms, and chemical representation of interatomic bonds. Schütt et al.13 (Schnet) proposes continuous-filter convolutional layers. Chen et al.11 (MEGNet) uses a more advanced GNN: message-passing, instead of convolutional. Klicpera et al.14,17 (GemNet) redresses an important shortcoming of the previous message-passing GNN’s: loosing of geometric information due to considering only distances as edge features. Zitnick et al.18 improves handling of angular information. Cloudhary et al. presents an Atomistic Line Graph Neural Network (ALIGNN)19, a GNN architecture that performs message passing on both the interatomic bond graph and its line graph corresponding to bond angles. Ying et al.20,21 introduces Graphormer, a hybrid model between Transformers and GNNs allows for more expressive aggregation operations.

Even though the described models are not evaluated on crystals with defects - they are in-principle capable of handling any atomistic structures, and thus we use the most established ones as baselines. Moreover, we demonstrate our approach using one of the most renowned GNN architectures for materials: MEGNet (see details in subsection “General message-passing graph neural networks for materials”).

The introduction of a defect site in general creates disturbed electronic states and the wave function associated with such states fluctuates over a distance of a few lattice constants depending on the localization of the electrons in the host lattice, Fig. 1. This results in some localized defect levels in the energy spectrum of the solid. From a quantum embedding theory point of view22, the defects could be seen as the active region of interest embedded in the periodic lattice. Accordingly, the defect levels are governed by the interactions of the unsaturated electrons in the background of the valence band electrons. The properties of a defect complex composed of more than one defect sites are governed by the interference of the wave functions of such electrons23. As the result, the formation energy, positions of defect levels, and the HOMO-LUMO gap are non-trivially dependent on the defect configurations. As schematically shown in Fig. 1, two defect states interfere with each other, and the separation of the bonding and antibonding states is governed by the exchange integral of the two states in the screening background of valence electrons. The exchange integral is subtly dependent on the positions of defect components, and such a HOMO-LUMO gap is a complex functional of the defect configuration. It is still a challenge for machine learning to precisely predict such nonlinear quantum mechanic behavior of defects.

Fig. 1: The quantum mechanic nature of defects.
figure 1

a DFT-simulated defect wave function centered at a single Mo site of MoS2 crystal lattice. Red and green colors represent the opposite phases of the wave function isosurfaces. The dash circles are an illustration of the electronic orbital shells. b A schematic representation of the wave function interference of two defect sites in a crystal lattice. c The defect levels are governed by the exchange interaction of defect components, and hence the separation of the defect levels is dominated by the exchange integral according to the valence bond theory.

Machine learning methods have been proposed for prediction of the formation energies of single point defects24,25,26 across different materials, but the authors didn’t consider configurations with multiple interacting defects. In a recent preprint27, the authors use a model based on CGCNN12 to conduct a large scale screening of single-vacancy structures for diverse energy applications. In ref. 4, the authors use MEGNet architecture for prediction of the properties of pristine 2D materials and choosing the ones that make optimal hosts for engineered point defects. Then they use matminer28 combined with Random Forest29 for predicting the properties of structures with point defects. A similar descriptor-based approach is used in ref. 30. We evaluate the descriptor-based approach for our data, as described in subsection “Physics-based descriptors”.

ReaxFF31 potential has been developed for dichalcogenides, and is used for studying defect dynamics32,33. The potential is very computationally efficient, and thus allows to probe dynamics on a larger time scale. However, it is not as precise as DFT, and doesn’t offer a way to predict the electronic properties.

The paper is structured as follows. Section “Sparse representation of crystals with defects” presents our proposed method for representing structures with defects to machine learning algorithms. Section “Dataset” provides the description of the dataset we use for evaluation. Sections “Aggregate performance”, “Quantum oscillations prediction” present the computational experiments, where we compare performances of different methods. Finally, section “Discussion” summarizes our work.


Sparse representation of crystals with defects

For machine learning algorithms, an atomic structure is a so-called point cloud: a set of points in 3D space. Each point is associated with a vector of properties, which at the least contains the atomic number, but may also include more physics-based features, such as radius, the number of valence electrons, etc.

The structures with defects present a challenge to machine learning algorithms. The neighborhoods of the majority of the atoms are not affected by the point defects. In principle, this shouldn’t be an obstacle for a perfect algorithm. In practice, however, this comparatively small difference in the full structures is hard to learn. As we demonstrate in section “Aggregate performance” with our computational experiments, state-of-the-art algorithms underperform on crystal structures with defects.

We propose a way to represent structures with defects that makes the problem of predicting properties easy for the ML algorithms leading to better performance. The core idea is presented in Fig. 2: instead of treating a crystal structure as a point cloud of atoms, we treat it as a point cloud of defects. To obtain it, we take the structure with defects, remove all the atoms that are not affected by substitution defects, and add virtual atoms on the vacancy sites.

Fig. 2: The transition from full to sparse representation for an example MoS2 supercell.
figure 2

a A full MoS2 structure with one Mo → W, two S → Se substitutions, and two S vacancies; b sparse representation, containing only defects types and coodrinates; c intermediate point of the graph construction: cutoff radius centered on the metal substitution defect; d the final graph with sparse representation. A dashed green line indicates a virtual graph edge that effectively connects one node with another via a periodic boundary condition.

Each point has two parameters in addition to the coordinates: the atomic number of the atom on the site in the pristine structure and the atomic number of the atom in the structure with the defect. Vacancies are considered to have atomic number 0.

The structure of the pristine unit cell is encoded as a global state for each structure using a vector with the set of atomic numbers of the pristine material, e.g. (42, 16) for MoS2. This simple approach is sufficient on our structures. As a future direction, in case generalization between materials is desired, graph embedding would be a logical choice: pristine material unit cell as an input to a different GNN, which outputs a vector of fixed dimensionality.

Secondly, we propose an augmentation specific to graph neural networks and 2D crystals: adding the difference in z coordinate (perpendicular to the material surface) as an edge feature. Normally, such a feature would break the rotational symmetry. But in the case of a 2D crystal, the direction perpendicular to the material surface is physically defined and thus can be used.

In a crystal, the replacement of an atom or the introduction of a vacancy causes a major disruption of the electronic states. Given the wave nature of electrons, the introduction of a localized defect creates oscillations in the electronic wave functions at the atomic level similar to a rock thrown into a pond. In the case of crystals, the wave function oscillations may involve one or several electronic orbitals, and the amplitude of those oscillations decays away from the defect at a rate that depends on the nature of those orbitals. This oscillatory nature of the electronic states away from a defect leads to the formation of electronic orbital shells (EOS). We ascribe an EOS index to such shells, that labels the amplitude of the wave function in decreasing order, the S atom labeled 1 filling the largest amplitude, as we show in Fig. 1. Formally we define EOS orbitals as follows. Firstly, we project all atoms on the x-y plane, making a truly 2D representation of the material. For binary crystals, for each atom, we draw circles centered on it and passing through the atoms of the other species, numbering them in the order of radius increase. For unary materials (BP in our dataset), the circle radii are multiples of the unit cell size. The circle number is the EOS index of the site with respect to the central atom. The intuition behind those indices is described in the paper34, which claims that the atomic electron shells’ interaction strength is not monotonic with respect to atom distance, but it oscillates in a way such that minima and maxima coincide with the crystal lattice nodes. To represent those oscillations, we also add parity of the EOS index as a separate feature, which we call EOS parity.

Incorporating sparse representation into a graph neural network

Our proposed representation fits into the graph neural networks (GNN) framework (described in subsection “General message-passing graph neural networks for materials” as follows:

  1. 1.

    Graph nodes correspond to point defects, not to all the atoms in a structure;

  2. 2.

    Threshold for connecting nodes with edges is increased;

  3. 3.

    Node attributes contain the atomic number of the atom on this site in the pristine structure, and the atomic number of the atom in the structure with defect, with 0 for vacancies;

  4. 4.

    Edge attributes contain not only the Euclidean distance between point defects corresponding to the adjacent vertices, but also EOS index, EOS parity index, and Z plane distance;

  5. 5.

    Input global state contains the chemical composition of the crystal as a vector of atomic numbers.


We established a machine learning friendly 2D material defect database (2DMD)15 for the training and evaluation of models. The datasets contain structures with point defects for the most widely used 2D materials: MoS2, WSe2, hexagonal boron nitride (h-BN), GaSe, InSe, and black phosphorous (BP). The types of point defects are listed in the Table 1. Supercell details are available in Supplementary Table 1 and example defect depictions in Supplementary Fig. 1.

Table 1 Point defect types present in the 2DMD datasets.

The datasets consist of two parts: low defect concentration of structured configurations and high defect concentration of random configurations. The low defect concentration part consists of 5933 MoS2 structures and 5933 WSe2 with all possible configurations in the 8x8 supercell for defect types depicted in Fig. 3. We used pymatgen35 to find the configurations, taking into account symmetry. The high-density dataset contains a sample of randomly generated substitution and vacancy defects for all the materials. For each total defect concentration 2.5%, 5%, 7.5%, 10%, and 12.5% 100 structures were generated, totaling 500 configurations for each material and 3000 in total. Overall, the dataset contains 14866 structures with 120–192 atoms each. The datasets as designed could provide training data for AI methods both the fine features of quantum mechanic nature and those features associated with different elements, crystal structures, and defect concentrations. We used Density Functional Theory (DFT) for computing the properties, the details are described in the subsection “DFT computations”

Fig. 3: Defect types in low density dataset.
figure 3

The ‘Mo’ and two ‘S’ columns denote the the type of site that is being perturbed either by substituting the listed element, or a vacancy (vac). ‘Num’ column contains the number of structures with defects of the type in the dataset. Finally, ‘Example’ column presents a structure with such defect. Note that the actual supercell size is 8 × 8, here we draw 4 × 4 window centred on the defects to conserve space. Drawings of example structures with 8 × 8 supercells are available in Supplementary Fig. 1.

We use two target variables for evaluating machine learning methods: defect formation energy per site and HOMO–LUMO gap.

Formation energy, i.e., the energy required to create a defect is defined as

$${E}_{{{{\rm{f}}}}}={E}_{{{{\rm{D}}}}}-{E}_{{{{\rm{pristine}}}}}+\mathop{\sum }\limits_{i}{n}_{i}{\mu }_{i}$$

where ED is the total energy of the structure with defects, Epristine is the total energy of the pristine base material, ni is the number of the i-th atoms removed from (ni > 0) or added to (ni < 0) the supercell to/from a chemical reservoir, and μi is the chemical potential of the i-th element, computed with the same DFT settings. Finally, to make the results better comparable across examples with different numbers of defects, we normalize the formation energy by dividing it by the number of defect sites:

$${E}_{{{{\rm{f}}}}}^{{\prime} }={E}_{{{{\rm{f}}}}}/{N}_{{{{\rm{d}}}}},$$

where Nd is the number of defects in the structure.

The electronic properties of defects are characterized by the energy spacing between the highest occupied states and the lowest unoccupied states. For the sake of representation, we adopt the terminologies of HOMO-LUMO gap for the separation of defect levels. Defects in some of the materials (BP, GaSe, InSe, h-BN) have unpaired electrons and hence non-zero magnetic momentum. Therefore, DFT was computed taking into account two channels of spin-up and spin-down bands, resulting in the majority and minority HOMO-LUMO gaps. For evaluating the machine learning algorithms, we took the minimum of those gaps as the target variable.

Aggregate performance

We split the dataset into 3 parts: train (60%), validation (20%), and test (20%). The split is random and stratified with the respect to each base material. For each model, we use random search for hyperparameter optimization; we generate 50 hyperparameter configurations, train the model with each configuration on the train part, and select the best-performing configuration by evaluating quality on the validation part. The search spaces and optimal configurations are present in Supplementary Discussion 1. To obtain the final result, we train each model with the optimal parameters on the combination of train and validation parts and evaluate the quality on the unseen test part. We do this 12 times to estimate the effects of the random initialization. We use unrelaxed structures as inputs and predict the energy and HOMO–LUMO gap after relaxation. To account for the material class imbalance, we use weighted mean absolute error (MAE) as the quality metric during both training and evaluation:

$$\,{{\mbox{MAE}}}\,=\frac{\mathop{\sum }\nolimits_{i = 1}^{N}| {\bar{y}}_{i}-{y}_{i}| {w}_{i}}{\mathop{\sum }\nolimits_{i = 1}^{N}{w}_{i}},$$

where wi is the weight assigned to each example; yi is the predicted value; \({\bar{y}}_{i}\) is the true value; N is the number of the structures in the dataset. The purpose of using weights is to prevent the combined error value from being dominated by the low defect density dataset part, as it’s 4 times more numerous compared to the high defect density part. The weights are computed as follows:


where wdataset is the weight associated with each example in a dataset part, Ntotal = 14866 is the total number of examples, Cparts = 8 is the total number of dataset parts (2 low-density and 6 high-density), Npart {500, 5933} is the number of examples in the part (500 for low defect density parts, 5933 for the high defect density parts).

We compare the performance of our sparse representation combined with MEGNet11 to several baseline methods: MEGNet, SchNet13, and GemNet14 on full representation, and CatBoost36 with matminer-generated features 4.3. The results are presented in Table 2. For energy prediction, our model achieves 3.7× less combined MAE compared to the best baseline, with 2.2×–6.0× difference in individual dataset parts. For HOMO-LUMO gap, using sparse representation doesn’t lead to an increase in overall prediction quality. The prediction quality for MoS2 and WSe2 is improved by a factor of 1.3–4.8, but this is outweighted by a factor of 1.06–1.15 increase in MAE for the other materials. Coincidentally, the combined MAEs are similar, being averaged over the absolute error values.

Table 2 Performance of the different methods in terms of the mean absolute error (MAE).

In terms of computation time, when trained on a Tesla V100 GPU, MegNet with sparse representation took 45 minutes; MegNet with full representation 105 minutes; GemNet 210 minutes; SchNet 100 minutes; CatBoost 0.5 minutes. Low memory footprint and GPU utilization allow to fit 4 simultaneous runs with sparse representation on the same GPU (16 GiB RAM) without loosing speed, this is not possible for the GNNs running on full representation. Computing matminer features can’t be done on a GPU, and costs 7.5 CPU core-minutes per structure and 1860 core-hours for the whole 2DMD dataset. Model configurations are listed in Supplementary Discussion 1.

Quantum oscillations prediction

In addition to the overall performance, we specifically evaluate the models with the respect to learning quantum oscillations. We use the MoS2 with one Mo and one S vacancy as the test dataset, and the rest of the 2DMD dataset as the training dataset. No sample weighting is used. We train every model 12 times with both optimal hyperparameters found via random search and the default parameters.

As seen in the Table 2, sparse representation performs especially well on the low-density data. This behavior extends nicely to the 2-vacancy data, as shown in Fig. 4. The baseline approaches fail to meaningfully learn the dependence, while sparse representation succeeds perfectly, including the non-monotonous reduction at 5 Å.

Fig. 4: Predicted and DFT formation energy as a function of the distance between the defects for MoS2 with one Mo and one S vacancy.
figure 4

Error bars represent the standard deviation across 12 models trained from different random initialisation.

As shown in the Supplementary Discussion 3, the result is similar for untuned hyperparameters.

Ablation study

The ablation study investigates how much each proposed improvement contributes to the final result. The performance values are presented in Table 3.

Table 3 Performance of various combinations of our proposed features.

To conduct the ablation study, we took the optimal configurations for MEGNet with sparse and full representations found by random search. We then took the configuration for the sparse representation turned off our enchantments one-by-one, trained and evaluated the resulting models. We use a value averaged over 12 experiments, same as in Table 2 to estimate training stability.

For formation energy, just enabling the z coordinate difference in sparse representation edges allows the Sparse-Z model to outperform the Full model everywhere except h-BN; adding pristine atom species (Sparse-Z-Were) as the node features contributes the most of the remaining gain. The most likely explanation for the importance of the pristine species for h-BN is that both atoms can be substituted to C, without this additional information, the model can’t distinguish between B and N substitutions. Adding EOS improves expected prediction quality and stability by a small amount for the low-density datasets.

For HOMO–LUMO gap, Sparse-Z-Were and Sparse-Z-Were-EOS perform similarly to Full in terms of the combined metric, outperform it by a factor of 4 for low-density data. EOS again improves prediction quality and stability by a small amount for the low-density datasets.


2D crystals present an incredible potential for the future of material design. Their two-dimensional nature makes them prone to chemical modification, which further increases their tunability for a variety of applications. However, the search space for possible configurations is vast. Thus, the ability to predict the properties of such crystals efficiently becomes a vital task. In this paper, we focus on predicting the properties of such crystals blended with defects, substitutions, and vacancies. State-of-the-art machine learning algorithms struggle to learn the properties of crystals’ defects accurately. We propose using a sparse representation combined with graph neural network architectures like MEGnet and show that it dramatically improves energy prediction quality. Our studies demonstrate that the prediction error drops 3.7 times compared to the nearest contender. Moreover, the representation is compatible with any machine learning algorithm based on point clouds. Computationally the training of a graph neural network using sparse representation takes at least 4x less memory and 8x less GPU operations compared to the full representation. Thus, we conclude that our approach gives a practical and sound way to explore a vast domain of possible crystal configurations confidently.

We see two principal directions for future work. Firstly, 3D materials. Sparse representation can be used as is for ordinary 3D crystals with point defects. Secondly, generalization to unseen materials. In our paper, we consider setup where each base material is present in the training dataset. Combining sparse defect representation with advanced base material representation24,25,27 opens up an enticing possibility for predicting properties of defect complexes in new materials, without having to prepare a training dataset with defects in those new materials.


DFT computations

Our calculations are based on density functional theory (DFT) using the PBE functional as implemented in the Vienna Ab Initio Simulation Package (VASP)37,38,39. The interaction between the valence electrons and ionic cores is described within the projector augmented (PAW) approach40 with a plane-wave energy cutoff of 500 eV. The initial crystal structures were obtained from the Material Project database, and the supercell sizes and the computational parameters for each material can be found in Supplementary Table 1. Since very large supercells are used for the calculation of defects, the Brillouin zone was sampled using Γ-point only Monkhorst-Pack grid for structural relaxation and denser grids for further electronic structure calculations. A vacuum space of at least 15 Å was used to avoid interaction between neighboring layers. In the structural energy minimization, the atomic coordinates are allowed to relax until the forces on all the atoms are less than 0.01 eV/Å. The energy tolerance is 10−6 eV. For defect structures with unpaired electrons, we utilize standard collinear spin-polarized calculations with magnetic ions in a high-spin ferromagnetic initialization (the ion moments can of course relax to a low spin state during the ionic and electronic relaxations). Currently, we are focusing on basic properties of defects at the level of single-particle physics and did not include spin-orbit coupling (SOC) and charged states calculations. Since the materials we considered are normal nonmagnetic semiconductors and none of them are strongly correlated systems, we did not employ the GGA+U method. A comparison of a few selected the computed values to the ones available in the literature is available in Supplementary Table 2.

General message-passing graph neural networks for materials

There are many types of graph neural networks (GNN). In this section, we outline the message-passing neural network proposed by Battaglia et al.41. Those became rather popular for analyzing material structure11.

To prepare a training sample, a graph is constructed out of a crystal configuration: atoms become graph nodes, and graph edges connect nodes at distances less than a predefined threshold. The connections respect periodic boundary conditions, i.e., for a significantly large threshold, an edge can connect a node to its image in an adjacent supercell. Specific property vectors also characterize nodes and edges. A node contains the atomic number, and an edge contains the Euclidean distance between the atoms it connects.

A layer of a message-passing neural network transforms a graph into another graph with the same connectivity structure, changing only the nodes, edges, and global attributes. The layers are stacked to provide an expressive deep architecture. Let G = (V, E, u) be a crystal graph from the previous step, the nodes are represented by vectors \(V={\{{{{{\bf{v}}}}}_{i}\}}_{i = 0}^{| V| }\), where \({{{{\bf{v}}}}}_{i}\in {{\mathbb{R}}}^{{d}_{{{{\rm{v}}}}}}\) and V is the number of atoms in the supercell. The edge states are represented by vectors \({\{{{{{\bf{e}}}}}_{k}\}}_{k = 0}^{| E| }\), where \({{{{\bf{e}}}}}_{k}\in {{\mathbb{R}}}^{{d}_{e}}\). Each edge has a sender node index vs {0,  , V − 1}, a receiver node vr {0,  , V − 1} and a vector of edge attributes. An edge is represented by a tuple \(({{{{\bf{v}}}}}_{k}^{s},{{{{\bf{v}}}}}_{k}^{r},{{{{\bf{e}}}}}_{k})\), where the superscripts s, r denote the sender and the receiver nodes respectively. The global state vector \({{{\bf{u}}}}\in {{\mathbb{R}}}^{{d}_{u}}\) represents the global state of the system. In the input graph, the global state is used to provide the algorithm with information about the system as a whole, in the case of our sparse representation, the composition of the base material. In the output graph, the global state contains the model predictions of the target variables. A message-passing layer is a mapping from G = (V, E, u) to \({G}^{{\prime} }=({V}^{{\prime} },{E}^{{\prime} },{{{{\bf{u}}}}}^{{\prime} })\), this mapping is based on update rules for nodes, edges, and global state. Edge update rule operates on the information from the sender \({{{{\bf{v}}}}}_{k}^{s}\), receiver nodes \({{{{\bf{v}}}}}_{k}^{r}\), edge itself ek, and the global state u. We can represent this rule by a function φe:

$${{{{\bf{e}}}}}_{k}^{{\prime} }={\phi }_{{{{\rm{e}}}}}\left({{{{\bf{v}}}}}_{i}^{s},{{{{\bf{v}}}}}_{i}^{r},{{{{\bf{e}}}}}_{k},{{{\bf{u}}}}\right).$$

Node update rule aggregates the information from all the edges \({E}_{{{{{\bf{v}}}}}_{i}}=\{{{{{\bf{e}}}}}_{k}^{{\prime} }| {{{{\bf{e}}}}}_{k}^{{\prime} }\in \,{{\mbox{neighbors}}}\,({{{{\bf{v}}}}}_{i})\}\) connected to the node vi, the node itself vi and the global state u. We can represent this rule by function ϕv :

$$\bar{{{{{\bf{e}}}}}_{i}}=\frac{1}{| {E}_{{{{{\bf{v}}}}}_{i}}| }\mathop{\sum}\limits_{{{{\rm{neighbors}}}}({{{{\bf{v}}}}}_{i})}{{{{\bf{e}}}}}_{k}^{{\prime} }$$
$${{{{\bf{v}}}}}_{i}^{{\prime} }={\phi }_{{{{\rm{v}}}}}(\bar{{{{{\bf{e}}}}}_{i}},{{{{\bf{v}}}}}_{i},{{{\bf{u}}}})$$

Finally, the global state u is updated based on the aggregation of both nodes and edges alongside the global state itself and process them with ϕu:

$${\bar{{{{\bf{u}}}}}}_{{{{\rm{v}}}}}=\frac{1}{| V| }\mathop{\sum }\limits_{i=0}^{| V| -1}{{{{\bf{v}}}}}_{i}^{{\prime} }$$
$${\bar{{{{\bf{u}}}}}}_{{{{\rm{e}}}}}=\frac{1}{| E| }\mathop{\sum }\limits_{k=0}^{| E| -1}{{{{\bf{e}}}}}_{k}^{{\prime} }$$
$${{{{\bf{u}}}}}^{{\prime} }={\phi }_{{{{\rm{u}}}}}({\bar{{{{\bf{u}}}}}}_{{{{\rm{v}}}}},{\bar{{{{\bf{u}}}}}}_{{{{\rm{e}}}}},{{{\bf{u}}}})$$

The functions ϕv, ϕe, ϕu are fully-connected neural networks. The model is trained with ordinary backpropagation, minimizing the mean squared error (MSE) loss between the predicted values in u of the output graph, and the target values in the training dataset.

Physics-based descriptors

To make a complete comparison, we also evaluate a classic setup, where physics-based descriptors are combined with a classic machine learning algorithm for tabular data, CatBoost36. The numerical features we extract from the crystal structures using the matminer package28 are outlined in Table 4.

Table 4 Matminer feature groups used for a defect configuration description.