Introduction

Almost a century ago1,2,3, dislocations were identified as carriers of plastic deformation in crystalline materials. A key construct of the theory is the response of dislocations to an applied stress state, the dislocation mobility law, which describes the relationship between the force experienced by the dislocation (f) and the resultant velocity (v). In crystals with low lattice resistance and planar core structure, e.g., Face-Centered Cubic (FCC) metals, the mobility law is usually simplified to a viscous damping relationship v = M(T) f, where M is the matrix of mobility coefficients and T is the temperature4, under the assumption of over-damped dynamics. In metals, M encodes dissipation mechanisms due to both electrons and phonons, with dominant phonon contribution except at very low temperatures. Theoretical calculations of phonon scattering predict a linear dependence of the mobility coefficients on temperature5 with a temperature-independent leading term proposed recently6,7. While this picture holds for edge dislocations in Body-Centered Cubic (BCC) metals, the complicated core structure of the BCC screw dislocations results in additional complexity. Screw dislocations in BCC are known to have high lattice resistance. This in turn results in a strong temperature dependence of the flow stress as the mobility of screw dislocations in BCC are controlled by kink-pair nucleation and migration enthalpy. In addition, screw dislocation motion is strongly influenced by multiple stress components, in addition to the resolved shear stress in the direction of the Burgers vector, resulting in non-Schmid effects8,9,10. These complexities are typically accounted for in the mobility law by introducing functional dependencies on the local stress tensor σ, line orientation ξ, temperature and possibly other local internal variables, that is11

$${\boldsymbol{v}}={{\boldsymbol{g}}}_{\alpha }({\boldsymbol{\sigma }},{\boldsymbol{\xi }},T,\ldots ).$$
(1)

The function gα is known as the (glide) mobility law for slip system α12,13,14. The functional dependencies of the mobility law are in turn informed by phenomenological approaches based on kink-pair nucleation and kink diffusion. A review of the phenomenological approach for deriving dislocation mobility laws can be found in Ref. 15. State-of-the art methods to develop such models for dislocation mobility (e.g., in BCC materials)16,17,18 have generally focused on screw and edge type dislocations and smooth interpolation between these for other line orientations. Researchers have explored calibrating an orientation-dependent phenomenological model using several intuition-based data points, e.g. Ref. 19 for BCC crystals20, for FCC Aluminum, and for superalloys21. However, this approach presents multiple pitfalls. For example, it has been shown previously that “singular” line orientations exist in BCC metals19,22 which result in cusps in the mobility. Dislocations in these singular orientations, and those vicinal to them, move by motion and propagation of kink pairs. Therefore, their mobility cannot be accurately represented using a smooth interpolation between screw and edge dislocations. Moreover, the mobility becomes a complex function of local chemical environments in the case of conventional and high entropy alloys (HEAs). The atomically-rough energy landscape for dislocation motion, which is believed to be an important characteristic of HEAs23,24, results in Arrhenius-type mobility laws which are highly non-linear in stress and temperature15. It has been demonstrated in BCC-based HEAs that the mobility of mixed dislocations25 and multiplicity of glide planes26 can drive the overall plastic behavior, in contrast to BCC metals. The traditional phenomenological approach informed by a handful of intuition-driven data-points to derive the mobility law can quickly become error-prone and cumbersome in such scenarios.

These challenges can in principle be addressed by using data-driven approaches, whereby active learning techniques are used to efficiently explore the parameter space for learning the dislocation mobility law with minimal human intervention. Such approaches are especially relevant with the advent of exascale computing platforms where hierarchical multi-scale methods, deployed at scale with automated model refinement, can potentially be used to drive materials design for targeted functionality. Moreover, the improved accuracy of Machine-Learning (ML) interatomic potentials27 promises affordable near-quantum accurate predictions, which could enable upscaling of atomic-level defect dynamics to predictive mesoscale models such as Discrete Dislocation Dynamics (DDD)28,29,30,31,32,33. To date, the exploration of these ideas is in a relatively nascent stage. Previous studies have explored the possibility of using graph-theoretical techniques for learning specific aspects of dislocation glide. Moraes et al.34 developed a graph-based surrogate model for glide of edge dislocations in BCC Fe, trained to molecular dynamics (MD) data. In this representation, sub-domains within the MD model are represented using nodes on a ring graph. The dislocations are then represented as random walkers which jump between neighboring nodes following a Poisson process. More recently, Bertin and Zhou35 developed a Graph Neural Network (GNN)36 approach to model DDD, with the aim of replacing its time-integration procedure. They demonstrated that this DDD-GNN approach can learn ground-truth (GT) DDD data of a dislocation gliding in a field of obstacles. While these studies show the potential for accelerating materials modeling through machine learning, bridging mesoscale dynamics (i.e. DDD) with information from higher fidelity (i.e. atomistic) simulations involving more general defect characteristics (i.e. mixed orientations of dislocations) remains to be explored.

Our objective in this study is to develop a generalizable framework for data-driven modeling of dislocation dynamics mobility law. Our proposed framework integrates high-throughput MD simulation datasets, physics-inspired machine learning model structures, and uncertainty quantification, accelerating the development of mesoscale models of dislocation mobility. A schematic diagram with different components of our proposed modeling workflow is shown in Fig. 1. To demonstrate our approach, GT data was generated from classical MD simulations of dislocation motion using an Embedded Atom Method (EAM) type interatomic potential (Fig. 1a); the workflow would remain formally identical, albeit more expensive, if ML interatomic potentials were used instead. The MD simulation data consists of dislocation dipoles and shear loops moving under different values of resolved shear stress (RSS) at various temperatures. Note that, while Non-Schmid effects are important in BCC materials, our aim here is to demonstrate the ML workflow supported by high-throughput large-scale atomistic trajectory generation protocols for a relatively simple data set with only RSS. Coarse-grained representations of dislocations were obtained from this atomistic representation using the Dislocation Extraction Algorithm (DXA) as implemented in the Open Visualisation Tool (OVITO)37 software. The DXA representation, which consists of nodes with line segments between them, was converted into a numerical representation with nodes, quadrature points or Gauss points (GPs), and line segments as shown in Fig. 1b. Relevant input features of these dislocation configurations, which will serve as input for an ML procedure, such as the Cauchy stress tensor (σ, shown schematically in Fig. 1c), segment tangent vector (s), and the Burger’s vector (b) were computed using a recent version of the open-source Mechanics of Defects Evolution Library (MoDELib) supporting periodic boundary conditions38. We remark that the computation of stress tensor by MoDELib is based on linear elasticity theory and the resulting stress can deviate from the true stress experienced by the dislocation. Nevertheless, we hypothesize that the stress tensor computed by MoDELib contains sufficient information for us to make accurate predictions, and the error due to the linear assumption, if any, will be corrected by our proposed GNN-based ML procedure below. We also note that the physical features of the dislocation lines, such as the Burger’s vector, segment tangent vector, and local stress, are defined on both the GPs and nodes, while the displacements (Δx) are defined on the nodes alone, with several GPs being present between any given pair of nodes in general. Dislocation network, thus represented as a set of nodes and Gauss points (GPs) in DDD can be naturally encoded as a graph. In Fig. 1d and e, we present the visualization of the dislocation network, showcasing both its physical representation extracted from DDD for ML and its graph representation. The physical network serves as an abstract depiction of the dislocation network, which consists of nodes (Ni) and GPs (GPi), connected by the dislocation segments. Figure 1e illustrates the corresponding heterogeneous graph, a type of graph containing vertices with different features, for learning purposes. Within this heterogeneous graph, we introduce two types of vertices, vN representing the nodes and vGP representing the GPs in the physical network. In the graph representation, instead of connecting vGP’s sequentially into a segment between nodes vN, we directly link GPs to their nearest nodes. This heterogeneous graph enables the direct exchange of information between GPs and their neighboring nodes, facilitating efficient information transfer.

Fig. 1: Schematic of ground truth data generation.
figure 1

a Atomistic representation of a dislocation segment (green atoms represent the defect core while the purple denotes pristine atoms). b Idealized nodal representation of the dislocation segment obtained from Dislocation Extraction Algorithm (DXA). Blue spheres represent dislocation nodes (denoted as N1, N2, …) and yellow cubes represent the Cauchy stress tensor, one of the features, defined at Gauss Points (denoted as GP1, GP2, …) using the linear elasticity kernel in MoDELib Discrete Dislocation Dynamics (DDD) code. c Zoom-in of the local stress state (σ) at a given Gauss point, with the definition of segment tangent vector (s) and the Burger’s vector (b). d General representation of the dislocation configuration with multiple Gauss points between nodes. e Graph representation of the dislocation configuration.

To achieve the goal of the proposed work, we present a general physics-informed data-driven approach for modeling dislocation mobility law. Our approach incorporates a physics-inspired mathematical structure based on the theoretical descriptions of dislocation mobility into the proposed GNN model. Features are designed to respect physical constraints, e.g. rotational and translational invariance. The proposed physics-informed GNN mobility law (PI-GNN) is trained using a coarse-grained MD dataset, with a loss function defined in “Modeling mobility law using Physics-informed Graph Neural Network (PI-GNN)” in “Methods”. To quantify the predictive uncertainty and facilitate accelerated learning, we employ an uncertainty quantification-driven active learning framework39 that is capable of providing on-the-fly uncertainty measures and automatically query new MD simulations to improve the model’s predictability and uncertainty. By studying the mobility of {110} dislocations in BCC Fe, we demonstrate the effectiveness of this PI-GNN framework to accurately capture the relevant physics of dislocations systematically.

Results

Learning the mobility function using Graph Neural Networks in the Physics-informed Machine Learning (PIML) framework

A GNN is a type of neural network designed to act on graph-structured data, where nodes represent entities (e.g., the nodes and GPs on the dislocation) and edges represent relationships (e.g., the spatial proximity of the nodes) between them. GNNs learn powerful representations of graph data by propagating information between neighboring nodes. Unlike convolutional neural networks on grids, GNNs respect rotational and translational invariance, making them well-suited for learning on irregular graph domains. As such, GNNs serve as an ideal neural architecture for the coarse-grained representation of dislocation structures generated by DXA. Furthermore, the design of the PI-GNN architecture was informed by established physics in linear elasticity theory pertaining to dislocation dynamics in continuum media15, i.e.,

$$\frac{{\rm{d}}{\bf{x}}}{{\rm{d}}t}=\frac{{\bf{f}}({\boldsymbol{\sigma }},{\bf{b}},{\bf{s}})}{B(T,{\bf{b}},{\bf{s}})}\exp \left(-\frac{\epsilon ({\boldsymbol{\sigma }},{\bf{b}},{\bf{s}},T)}{{k}_{B}T}\right),$$
(2)

where f, B, and ϵ are separate networks modeling the contribution of the Peach-Koehler force, drag, and thermally activated processes in dislocation dynamics. These networks, referred to as F-net, B-net, and E-net, are jointly trained on time-series data of the dislocation dynamics, generated by MD and processed through the DXA-MoDELib pipeline. Through message passing40, the PI-GNN not only uses the local information but also information on nearby nodes to learn higher-order corrections to linear theories. More details of the PI-GNN model we adopted can be found in “Methods” and in the Supplementary Note 1.

Uncertainty quantification-driven active learning (UQ-AL) framework to facilitate efficient learning

We propose a workflow, outlined in Fig. 2, for training the PI-GNN while aiming to minimize its predictive uncertainty. Rather than generating one large dataset all at once, the workflow begins by learning from an initial dataset, and then progressively queries new datasets where the trained models exhibit the most uncertainty. In each iteration, we independently train Men identical GNN models with randomized initialization of weights and biases. These models are then evaluated along the trajectories of a set (8) of newly generated datasets with randomly sampled simulation parameters Λ = {T, σ, …}, and the uncertainties of the models’ single-step predictions are quantified. This can be considered as an ensemble learning39 method. We integrate the top 50% of new configurations that induce the highest predictive uncertainty into the existing training dataset. The PI-GNN ensemble is retrained on this augmented dataset, and the process repeats. We note that this method is an approximation of more rigorous but computationally expensive UQ-AL frameworks (e.g., Bayesian Optimization41) for two reasons. First, the variability of the Men model predictions is only a lower bound to the true uncertainty39. Second, we select the most uncertain candidates from randomly sampled simulation parameters, rather than directly querying the single most uncertain point across the entire simulation parameter space. More details of the proposed UQ-AL framework can be found in “Methods”.

Fig. 2: Flow chart of UQ-driven active learning framework.
figure 2

The UQ-AL framework is composed of two computational platforms, the high-throughput microscopic MD simulations, and PIML-UQ platform. The UQ-AL workflow begins by generating and learning from an initial dataset, and then progressively queries new datasets where the trained models exhibit the most uncertainty. In each iteration, we independently train an ensemble of identical GNN models with randomized initialization of weights and biases. These models are then evaluated on newly generated datasets with randomly sampled simulation parameters, and the uncertainties of the models' single-step predictions are quantified. We integrate the top 50% of new configurations that induce the highest predictive uncertainty into the existing training dataset. The PI-GNN ensemble is retrained on this augmented dataset, and the process repeats.

We illustrate the advantages of the proposed UQ-AL framework in learning the mobility law using the dipole dataset generated from MD simulations as described in “Automated on-the-fly generation of dislocation mobility data in MD” in “Methods”. In this numerical experiment, the PI-GNN models are continuously retrained as more data are acquired and integrated into the training set, and we compare the performance of the model when the new datasets are either actively selected by the UQ-AL or passively queried by randomization. We fix the total number of ensemble models Men at 20. The initial dataset D(Λ) are generated on the set of simulation parameters: temperature T  {300 K, 500 K}, RSS σ  {0.25 GPa, 0.5 GPa}, and the line orientation (i.e., the angle between the tangent vector s and the Burger’s vector b) angle(s, b)  {45°, 135°}. In each iteration, we maintain a candidate pool of 8 new trajectories by sampling the simulation parameters, T ~ Unif(100 K, 1000 K), σ ~ Unif(0.1 GPa, 1.0 GPa), and angle(s, b)  (0°, 180°), to create new trajectories of dislocation dynamics. With UQ-AL, the trained PI-GNN ensemble is evaluated along the trajectories to quantify their averaged single-step predictive uncertainty; we select and integrate the top 50% (4) trajectories with the highest predictive uncertainties (i.e, the PI-GNN ensemble models disagree with the most) into the existing training dataset. With passive learning, we randomly select 4 new trajectories from the pool and integrate them into the training dataset. For both methods, the selected trajectories are removed from the candidate pool, which is then replenished by 4 newly sampled parameter settings to generate the trajectories for the next iteration.

Figure 3 illustrates how the model performance improves as more data are acquired into the training set, for both active and passive selection of the datasets. Evidently, with the proposed UQ-AL framework, the prediction uncertainty is consistently reduced through the iterations. In comparison, the predictive uncertainty with passive selection is higher and fluctuates more between iterations, even though it eventually decreases in the large-data limit. Suppose we set the uncertainty threshold to be 25% of the mean nodal displacement, σ(Δx) = 25%μ(Δx), the active and passive acquisition of the new training dataset requires 8 and 15 iterations, or equivalently 36 and 64 queries (initial 8 fixed, then 8 randomly sampled ones to populate the candidate pool in the first iteration, and 4 randomly sampled ones in each of the subsequent iterations) of new trajectories, respectively. As such, UQ-AL saves more than 40% of computational resources for generating training dataset in this experiment. Because the proposed UQ-AL can efficiently query computationally expensive high-fidelity datasets to learn mesoscale models and thus accelerate the development of data-driven mesosocpic models for DDD, we deploy this framework in the following analysis as part of the data acquisition pipeline.

Fig. 3: Comparison of the proposed UQ-AL framework with a passive learning method.
figure 3

The uncertainty measure of predictions for nodal displacement as a function of the (a) active (AL) and (b) passive learning (PL) iterations.

Predictability and generalizability of the PI-GNN mobility law

In the following analysis, the proposed UQ-AL is used to train the PI-GNN model until the predicted uncertainty is reduced to 25% of the mean nodal displacement. The 50% top-performing models in the ensemble are selected for the following validation analysis, in which we compare the single-step (Δt ≈ 1 ps) prediction of the trained models to a validation dataset. The model used for prediction was trained using dipole dataset with RSS  {0.1 GPa, 0.25 GPa, 0.5 GPa, 0.75 GPa, 1.0G Pa}, T {100 K, 300 K, 500 K, 700 K, 1000 K}, and angle(s, b)  (0°, 180°). A separate validation dataset was used to demonstrate the prediction capability of the PI-GNN model. See “Methods” for details on the train/test/validation dataset. We designed three different experiments: (a) the PI-GNNs ensemble is trained on a dataset comprised of nominally straight dislocation dipoles and compared against the dipole validation dataset, (b) the PI-GNNs ensemble is trained on a dataset comprised of expanding shear loops and tested on a loop validation dataset, and (c) the PI-GNNs ensemble is trained on the dipole dataset and tested on the loop dataset. Figure 4 shows selected examples of the single-step prediction; more results can be found in Supplementary Note 2.

Fig. 4: The predicted single-step evolution for various dislocation configurations and learning method.
figure 4

a PI-GNN mobility law is trained on coarse-grained dipole dataset with different RSS values, temperatures, and angles between Burger’s vector and tangent vector. The predictions are also made on dipole configurations with different line orientations at 500 K and 0.9 GPa. b PI-GNN mobility law trained on dislocation loop expansion data and prediction on dislocation loops at 500 K and 0.9, 1.2, and 1.5 GPa. c Testing the generalizability of the PI-GNN mobility law by training on dipole dataset and predicting on loop expansion.

We observe that the PI-GNN is capable of learning an effective mobility law for various dislocation dipole configurations. Experiment (a) (Fig. 4a) shows that the PI-GNN model trained on various conditions of temperature, RSS, and line orientation of a dipole configuration, can learn an effective mobility law to predict the evolution of the dislocation with an out-of-sample dipole configuration. In contrast to experiment (a), where we explicitly sampled the line orientation of the dislocation for training, the PI-GNN learns the dependence of the line orientation directly from the angular distribution on the evolving loops in experiment (b), and accurately predicts the evolution of out-of-sample loop configurations. Experiment (c) shows that the PI-GNN is able to learn the mobility law from the dipole dataset and generalize to the out-of-distribution prediction task for loop configurations. Such generalizability indicates that the PI-GNN is learning the physics of the dislocation dynamics, instead of memorizing the training data.

Quantitative error analyses of the learned PI-GNN mobility law are reported in Fig. 5, where the panels’ layout is analogous to Fig. 4. The prediction error is quantified by both the percentage and absolute errors, the computation of which is based on the proposed loss function (See “Modeling mobility law using Physics-informed Graph Neural Network (PI-GNN)” in “Methods” for more information). For the dislocation dipole, the median prediction error is in general very small for different orientations, temperatures and stress levels (500 K and 0.9 GPa is shown in the figure). For screw-like dislocation where the angle(b, s) is close to 0° and 180°, the percentage error is comparatively larger. This is mainly caused by an imbalanced training dataset, due to the timescale separation between the dynamics of screw dislocations and mixed/edge dislocations. At the timescales of the dataset (several picoseconds) used in this study, the screw dislocation dynamics are dominated by thermal fluctuations. Therefore, the signal-to-noise ratio of the nodal displacement, which is the prediction goal of the PI-GNN mobility law, is much smaller than that of the edge. The small signal-to-noise ratio can only be seen for configurations near the pure screw configuration (i.e., angle\(\left({\bf{b}},{\bf{s}}\right)={0}^{\circ }\) and 180°), while the majority of the data points do not have this signature. Consequently, the trained PI-GNN model predicts the edge and mixed configurations better than the pure screw configuration. Future work to rectify the training data imbalance will be discussed in “Discussion”. When we compare the absolute error in Fig. 5b, the absolute errors for different angles are at approximately the same range. For dislocation loops, the single-step prediction percentage error is also very small. The errors in generalization are plotted in Fig. 5d. As explained above, we train the model in this case using dislocation dipoles and make predictions on shear loops. Compared to Fig. 5c, even though the percentage error is larger, it still falls in an acceptable range, with the median at ~20%. This shows that with the PIML structure and features, the PI-GNN model can still capture roughly 80% of the total motion even without any additional training on the complex dataset. These numerical experiments demonstrate the potential generalizability of the proposed UQ-AL PI-GNN framework for modeling complex dislocation networks using simple datasets.

Fig. 5: The predicted single-step error for various dislocation configurations and learning method.
figure 5

PI-GNN mobility law is trained on coarse-grained dipole dataset with different applied stress levels, temperatures, and angles between Burger’s vector and tangent vector. The predictions are also made on dipole configurations with different orientations at 500 K and 0.9 GPa. Both the (a) percentage error and (b) absolute unnormalized error are shown. c PI-GNN mobility law trained on dislocation loop expansion data and prediction on dislocation loops at 500 K and 700 K and various applied stress levels. d Testing the generalizability of the PI-GNN mobility law by training on dipole dataset and predicting on loop expansion.

Integrating PI-GNN mobility law with DDD for iterative multi-step predictions

In Fig. 6, the predicted iterative multi-step evolution of both dipole and loop configurations are visualized at different consecutive time steps. We show results at 0.9 GPa and 500 K, and angle(b, s) ≈ 80° for the dislocation dipole. For each row in Fig. 6, the panels from left to right show the forward evolution in time, where configurations from preceding time steps are also visualized as translucent markers. As observed in Fig. 6, the PI-GNN mobility law enables us to achieve qualitatively accurate long-time predictions. The coarse-grained model prediction will eventually deviate from the GT, but within the visualized prediction horizon, the relative error remains <10%. Quantitative analysis of the errors in multi-step prediction is provided in Supplementary Fig. 13.

Fig. 6: The predicted multi-step evolution for various dislocation configurations and learning method.
figure 6

For a given snapshot, the GT is represented using blue-filled circles, the PI-GNN prediction using red-filled circles with translucency applied to configurations from all preceding time steps. a A dislocation dipole(angle (b, s) ≈ 80°) at 0.9 GPa and 500 K predicted using PI-GNN mobility law trained on dislocation dipole data. b An expanding dislocation loop at 500 K and 1.2 GPa, predicted using PI-GNN mobility law trained on dislocation loop expansion data.

Interpretability of the PI-GNN mobility law

The phenomenological model, represented by Eq. (2), involves three distinct physical aspects of dislocation dynamics, namely the force experienced by the dislocation given a stress state, dissipative mechanisms resulting in drag, and thermal activation, each of these making a unique contribution. We model each aspect using a separate GNN as shown in Eq. (2) and train them altogether with a single loss function. Even though quantitatively isolating each process from the trained PI-GNN mobility law is challenging, the proposed framework still offers qualitative insights into their sensitivity and dependence on the input features. For instance, scaling both F-net and B-net by a constant factor does not alter the output. To resolve the identifiability issue arising from the linear constant factor, we compute the ratio of F-net over B-net, as well as other components of the PI-GNN mobility law, against various input features.

In Fig. 7a, we plot the output of F-net/B-net at different angles and applied stress levels. Despite the identifiability challenge, combining the outputs of both networks allows us to study their collective contributions to the prediction. With the increase of applied stress level, the predicted contributions to nodal displacement become larger as expected. For different angles representing edge, screw or mixed dislocations, the magnitude of the output is larger around 90° (i.e., edge dislocation), and small at 0° and 180° (i.e., screw dislocation), accurately capturing the well-known disparities in the mobility of screw and edge dislocations in BCC metals. An interesting observation is that the output is not symmetric with respect to 90° and is skewed toward larger angles. In Fig. 7b, the dependence of the combined F-net/B-net on temperatures is plotted. We observe that the increase in phonon drag with increasing temperature is also captured by the GNNs. It is important to note that the temperature-dependence of phonon drag is known to be stronger for edge-like dislocations17 and the GNN learns this behavior; as shown in Fig. 7b that the temperature-dependence is stronger close to 90° or edge dislocation as it is apparent from the GT MD data (see Supplementary Fig. 11 for dislocation velocities as a function of stress and temperature for screw, edge, and mixed configurations).

Fig. 7: Physics-informed interpretation of the three components of the PI-GNN mobility law.
figure 7

a Box plots of the \(\frac{{\bf{F}}}{{\bf{B}}}\) for different orientations, applied stress levels at 500 K. b Box plots of the \(\frac{{\bf{F}}}{{\bf{B}}}\) for different orientations, and different temperatures at 0.5 GPa. c Box plots of the E for different orientations, and different temperatures at 0.5 GPa. d Box plots of the total predicted displacement for different orientations, and different temperatures at 0.25 GPa.

Figure 7c illustrates the dependence of the contribution from E-net output on the dislocation orientations. We note that for screw dislocations, with angles(b, s) ≈ 0°, 180°, the PI-GNN mobility law qualitatively captures the comparatively higher activation energy, leading to lower output from the E-net. Conversely, at the mixed configurations, the E-net output becomes larger. In addition, we observe a cusp at 110°, predicting a reduced contribution from the E-net for that singular orientation. The combined output from all three networks in Fig. 7d shows that the PI-GNN mobility law is capable of predicting expected dependencies on temperatures and angles. For instance, the increased phonon drag from high temperatures causes the dislocation lines to move slower at some orientations. In the case of screw dislocations with a low applied stress level, the nodal displacement is dominated by thermal activation, which has a reversed order compared to other angles (see the inset in Fig. 7d). This is also apparent for the singular orientation at to 110°, where thermal activation dominates at low temperatures. Such cusps in dislocation mobility is known to be present in BCC metals19,22 due to singular dislocation orientations which move by kink nucleation and propagation. The predicted trends are aligned very well with what is observed in MD simulations (see Supplementary Fig. 10), demonstrating the accuracy of the PI-GNN mobility law and the ability of GNNs to learn complex dependence of dislocation mobility on different parameters.

Discussion

The central objective of this work is to provide a general, physics-informed data-driven framework for modeling the mobility law in discrete dislocation dynamics simulations autonomously from higher fidelity molecular dynamics data. When constructing the modeling framework, we focused on the following aspects: generalizability, interpretability, and uncertainty quantification. To achieve generalizability for the model, we adopt the flexible graph-based data structure to represent the dislocation network, which can be easily extended to dislocation configurations beyond those studied in this paper. We demonstrate that this framework is capable of learning many complex aspects of dislocation motion in BCC metals, such as highly anisotropic dislocation mobility between screw and edge configurations, cusps in dislocation mobility at singular line orientations, and temperature dependence of phonon drag. The interpretability of the proposed framework results from the physics-inspired structure of the PI-GNN mobility law and physics-informed feature engineering. We illustrate the interpretability of the proposed framework by analyzing learned components of the PI-GNN, which showed qualitative agreement with various physical aspects of dislocation mobility. Uncertainty quantification is built into the proposed framework to inform data selection from MD simulations for active learning, which is shown to significantly accelerate the development of PI-GNN mobility laws. We acknowledge that with the limited dataset used for training, this specific model may not perform well on large DDD simulations with complex configurations. However, the proposed uncertainty-driven AL, along with PI-GNN architecture will help accelerate the model development and refinement for complex configurations. We also expect the proposed framework to be beneficial when the dimensionality of the simulation parameters is even larger, e.g. for learning Non-Schmid effects in BCC.

Further work focusing on improving the PI-GNN mobility law using the proposed UQ-AL framework include (1) mixed modeling of the screw and mixed/edge dislocations, trained by MD data generated at different timescales for capturing both the fast-evolving mixed/edge dislocations and slow-evolving screw dislocations, (2) time scale-aware feature engineering for mobility law input, for enabling variable time-step integration, and (3) stochastic modeling of screw-like dislocations. In addition to the mobility law, an important component of DDD simulations is the description of discrete events resulting in topological re-arrangements of the dislocation network. These consist of cross-slip of screw dislocations, junction formation and annihilation, and dislocation nucleation. Our PI-GNN mobility law is capable of predicting a 3-D displacement and thus, it is well set up to predict cross-slip events. Extension of the PI-GNN mobility law to describe dislocation junctions and nucleation is possible using advanced methodologies like dynamic graphs42,43, for example. We also note that the extension of our framework to more complex systems like alloys is possible by using additional features for local environment dependence. These advancements will be considered in future studies.

We are aware of a pre-print by Bertin et al.44, independently advocating a GNN-based method for learning dislocation mobilities. While the basic concept is similar, namely representing dislocations as graphs, there are multiple distinctions between the two approaches as follows: (1) Training protocols: we tested multiple, UQ-driven, training protocols, with relatively simple MD simulations of straight dislocations and expanding shear loops in broad parameter settings. In contrast, Bertin et al. used large-scale MD simulations of high strain-rate deformation, using a narrow set of loading conditions; (2) Our PI-GNN approach allows the development of a physically interpretable non-linear mobility law, while such an interpretation is not straightforward in the approach by Bertin et al.; (3) Features for our PI-GNN model are composed of local stress tensor and line geometry (defined by Burgers vector and line tangent) as opposed to nodal forces in Bertin et al.; (4) Loss function in our approach is defined using a distance metric between nodal positions, as opposed to a Nye tensor based approach in Bertin et al. While there are pros and cons to the two approaches, a detailed evaluation of their relative merits will be the subject of a future study.

Methods

In this section, we describe the generation of GT data from MD simulations, calculation of relevant features using the linear elasticity kernel in MoDELib Discrete Dislocation Dynamics (DDD) code, physics-informed graph neural networks (PI-GNN) framework for modeling mobility law of discrete dislocation dynamics, including the GNN architecture, physics-informed feature engineering, generalized loss function, and uncertainty-driven active learning (AL).

Automated on-the-fly generation of dislocation mobility data in MD

Generation of dislocation motion data from molecular dynamics simulations has been widely addressed in past16,17,18,45. It is a common practice to use free surface boundary conditions in the direction normal to the glide plane while building atomistic models with specific dislocation configurations moving under externally applied stress and temperature. Evaluation of the effect of free surfaces, on the stress experienced by the dislocation, requires solution of a boundary value problem coupled with the DDD framework. To circumvent this, we used periodic boundary conditions and devised an automated model generation protocol where straight dislocation dipoles on a slip system \([11\bar{1}](101)\) are created for an arbitrary line orientation (defined by angle θ) with the Burger’s vector (±b). The simulation box is aligned as \(b\sin \theta\) (x), \(b\cos \theta\) (y) and [101] (z) with approximate lengths of 30 nm, 18 nm and 40 nm in these directions, adjusted based on the crystallography of Fe BCC unit cell. The dislocation line is oriented parallel to y-axis. Extra half-plane of atoms are removed along x with a thickness of \(b\sin \theta\). Note, for screw dipoles (θ = 0°) this results in no deletion, while for an edge (θ = 90°) a complete half-plane with a thickness of b is removed. We use the periodic image correction superimposed on the Volterra displacement field with 4 periodic images on both sides along Cartesian directions x and z of the box, i.e., 63 [\(\left.{(4\times 2)}^{2}-1\right)\)] images in total following Cai et al.46. We report a few sample box-orientations in Table 1, chosen out of a total 25 cases ranging from 0° to 180°.

Table 1 List of a few sample atomistic supercell orientations out of a total of 25 cases

For simulating the motion of a shear loop, an initial loop of radius 15 nm is created on a \([11\bar{1}](101)\) system. The simulation box is then affinely deformed according to the linear elastic solution corresponding to the applied stress to prevent the loop from shrinking due to self interactions. To generate high-throughput dislocation trajectories, we subject both the loop and dipole configurations to a resolved shear stress range of σ  {0.1, 0.25, 0.5, 0.75, 1.0} GPa and a temperature range of T {100, 300, 500, 700, 1000} K. All MD simulations were performed using the EAM potential for iron (Fe) developed by Mendelev et al.47.

To efficiently perform active learning, on-the-fly processing of million atom MD trajectories is crucial. Our dipole models have 3-4 million atoms and MD simulations were performed using the kokkos-implementation of LAMMPS48. We apply a predefined global stress tensor and temperatures of the simulation box by running simulations in the NPT ensemble for 40 ps with a timestep of 1 fs. On NVIDIA A100 GPUs, 1 million EAM atoms/GPU can be simulated at a rate of up to 10 ns/day, excluding any input/output (I/O) overhead. To avoid reducing the simulation rate due to I/O as well as the storage of large volumes of MD data, we implement an in memory and on-the-fly analysis of dislocation trajectories via the Dislocation Extraction Algorithm (DXA) from Ovito-API49. Simulations were analyzed every 0.1 ps, with nodal spacings of 1.5 nm. This is schematically shown in Fig. 1a, b.

Augmenting coarse-grained MD data with local features

Coarse-grained representations of dislocations present in the MD simulations are generated using the DXA algorithm. Using the same nodes identified by DXA and connecting the nodes via segments that are further discretized using Gauss points (GPs) (as shown in Fig. 1b), an exact geometrical representation of the MD dislocation can be replicated in MoDELib, which is used to compute the local stress state at the GPs (Fig. 1c). The stress state along the dislocation network is computed using a non-singular isotropic linear elastic theory of dislocations33, and it includes dislocation-dislocation interactions as well as the external stress50,51,52,53,54,55,56,57,58,59. MoDELib supports a periodic dislocation topology38, and the stress field of the periodic dislocation network is computed using an Ewald summation strategy60, which tapers the real-space long-range stress field by a user-defined characteristic length. Convergence of the stress calculation is typically realized using eight periodic images in each direction. Augmented coarse-grained MD snapshots are finally obtained, which consist of discretized dislocation lines using nodes, connectivities, and stress information at the GPs including both dislocation-dislocation interactions and external contributions. It is from these snapshots, containing both the geometry from MD and the local stress state computed by DDD that we move forward to construct the physical network and train the PI-GNN. More details about the methodology can be found in Supplemental Information.

Modeling mobility law using Physics-informed Graph Neural Network (PI-GNN)

Representing dislocations using heterogeneous graphs

Graphs are a type of data structure that describes a set of objects, often referred to as vertices, and their relationships, or edges. Graphs are expressive and powerful at representing complex non-Euclidean data structures in social science, engineering, and physical science applications40,61. In the context of dislocation dynamics, we demonstrate the mapping between the physical and computational representation of a dislocation network in Fig. 1d, e. Physical representation consists of nodes and GPs, connected by the dislocation segments, illustrated by Fig. 1d. The physical features of the dislocation lines are defined on the GPs between two nodes, such as the Burger’s vector, segment tangent vector, or local stress, while the features of the nodes consist only of positions. The data structure of a dislocation network can be easily mapped to a heterogeneous graph, where we introduce two types of vertices: vN representing nodes and vGP representing GPs in the physical network.

Physics-informed graph neural network

Figure 8a presents the structure of the graph neural network inspired by the physics-based phenomenological model of dislocation dynamics15 as shown in Eq. (2). We decompose the PI-GNN mobility law into three components: F-net, B-net, and E-net, each with distinct input features and network structures, depending on the physics-based mobility function. As shown in Fig. 8a, we model the three network components with two types of layers, hidden GNN layers, which consist of a series of black-box layers for learning feature embedding from the graph data structure, and physics layers, in which physical constraints for each component of the mobility function can be injected. With these two types of layers, we aim to combine the functional expressiveness of GNNs and the interpretability of physical structure to construct a powerful, robust, and predictive model for dislocation dynamics. For the hidden layers, we extend the widely used GraphSAGE62, which were designed to operate on homogeneous graphs to sample and aggregate features from large graphs, to one that can operate on heterogeneous graphs. We refer to this extended model as HeteroSAGE in this work. Figure 8b illustrates the message-passing and feature operations of each HeteroSAGE layer: the features defined on the neighboring vertices are aggregated using an aggregator function, which is then concatenated with local features and passed through a fully-connected network. This process is then repeated in the opposite direction to generate the output of one HeteroSAGE layer. By stacking multiple HeteroSAGE layers, the model can learn complex feature embeddings from the graph. This enables the PI-GNN model to learn non-local configuration-dependent corrections to the stress computations using linear theory. For problems tested in this work, we used two of HeteroSAGE layers, each with 20 nodes for each network. For the physics layers, we encode the mathematical structure of physics-based mobility law using differentiable functions. This design ensures that the output of each component adheres to the corresponding constraints dictated by the physics-based model. More details on the structure can be found in the Supplementary Information.

Fig. 8: Overview of the PI-GNN architecture and training loss function.
figure 8

a Physics-informed graph neural network structure for learning the mobility law in Discrete Dislocation Dynamics (DDD), which consists of three different networks: F-net, B-net, and E-net. Each network consists of hidden layers, which are stacked HeteroSage GNNs, and physical layers, which are physically inspired functions adopted from a phenomenological description of dislocation mobility. b The proposed HeteroSage GNN structure and message passing diagram for heterogeneous graphs. c, d Illustrations of the proposed generalized loss function for measuring the distance between two dislocation configurations with different total number of nodes resulting from different discretization.

Feature engineering

In modern ML applications, deep neural networks, coupled with large datasets are commonly used to discover informative feature embeddings for effective predictions. However, in scientific applications, a set of carefully designed features can inject physics into the model, leading to improved prediction robustness and reduced dataset size requirements. In this work, we use a set of specially designed features from nodes and GPs as the input of the PI-GNN mobility law. We leverage translational invariance and rotational invariance, which are important in physical systems, for guiding the feature engineering processes. The input features of the PI-GNN are designed such that the aforementioned constraints are automatically satisfied by the model, thus injecting crucial physics-based induction biases into the model. Details on the selected features can be found in the Supplementary Note 1.

Generalized loss function

After constructing the GNN model, we define a loss function that quantifies the differences between the model-predicted evolution of the dislocation and the ground truth, whose minimization will constrain the parameters of the PI-GNN during learning. However, a challenge arises when formulating the loss function from coarse-grained MD simulation snapshots, as the total number of nodes in dislocations separated by the time step Δt may differ and there is no clear one-to-one mapping between nodes. This hinders a straightforward computation of their differences using the commonly used Mean-Squared-Error (MSE) in traditional ML tasks. To resolve this issue, we propose a generalized loss function to measure the distances between two dislocations. As illustrated in Fig. 8c, d, the dislocations can be described as a collection of nodes and segments/edges connecting them. Instead of using the traditional point-wise positional difference to construct the loss function, we define a node-to-segment distance based on the geometric projection as:

$$l\left({N}_{i}^{{D}_{1}},{e}_{j,k}^{{D}_{2}}\right)=\left\{\begin{array}{l}\left| {{\bf{x}}}_{{N}_{i}}^{{\prime} {D}_{1}}-{{\bf{x}}}_{{N}_{i}}^{{D}_{1}}\right| ,{{\bf{x}}}_{{N}_{i}}^{{\prime} {D}_{1}}\in {e}_{j,k}^{{D}_{2}},\quad \\ \min \left\{\left| {{\bf{x}}}_{{N}_{i}}^{{D}_{1}}-{{\bf{x}}}_{{N}_{j}}^{{D}_{2}}\right| ,\left| {{\bf{x}}}_{{N}_{i}}^{{D}_{1}}-{{\bf{x}}}_{{N}_{k}}^{{\prime} {D}_{2}}\right| \right\},{{\bf{x}}}_{{N}_{i}}^{{\prime} {D}_{1}}\notin {e}_{j,k}^{{D}_{2}}.\quad \end{array}\right.$$
(3)

With the node-to-segment distance \(l({N}_{i}^{{D}_{1}},{e}_{j,k}^{{D}_{2}})\) defined, we use the minimum value of the node \({N}_{i}^{{D}_{1}}\) to all the segments \({e}_{j,k}^{{D}_{2}}\in {D}_{2}\) as the node-to-dislocation distance \(l({N}_{i}^{{D}_{1}},{D}_{2})\), which can then be used to compute the dislocation-to-dislocation distance as the sum of the node-to-dislocation distances \(l({D}_{1},{D}_{2})=\sqrt{{\sum }_{i}l{({N}_{i}^{{D}_{1}},{D}_{2})}^{2}}\). We notice that the proposed distance measure based on equations (3), is not symmetric, i.e. l(D1, D2) ≠ l(D2, D1). In this work, we adopt a loss function based on a symmetric distance measure, which is achieved by averaging the distances computed in both directions to avoid the potential issue due to the smoothing effects (see Supplementary Note 1 for more details):

$$L({D}_{1},{D}_{2})=\frac{1}{2}\left[l({D}_{1},{D}_{2})+l({D}_{2},{D}_{1})\right],$$
(4)
$$Loss=\sum _{i}L({D}_{i}^{{\rm{GT}}},{D}_{i}^{{\rm{Pred}}}).$$
(5)

Uncertainty quantification-driven active learning

Ensemble learning for uncertainty quantification

In the context of machine learning for scientific applications, particularly in cases where fully-resolved simulation data is sparse, the neural network-based model demonstrates competitive prediction accuracy. However, this may become problematic when a single, over-parameterized NN provides predictions for new datasets with unwarranted confidence. To address this issue, we adopt the ensemble learning approach to properly quantify the prediction uncertainty for modeling the mobility law using PI-GNN.

Ensemble learning is a machine learning training approach aimed at achieving improved prediction performance by combining the predictions from multiple independently-learned models. The objective of the ensemble learning approach lies in generating multiple instances of model parameters that span the local minima within the parameter-loss landscape of the NN-based model. We then approximate the PDF of the predictions using a Gaussian distribution, whose mean and variance are computed using the predictions from the top-performing ensemble models:

$$\frac{\widehat{d{\bf{x}}}}{dt}\,\approx\, \overline{\mu }\left(\frac{d{\bf{x}}}{dt}\right)=\frac{2}{{M}_{en}}\mathop{\sum }\limits_{i=1}^{{M}_{en}/2}{\left.\frac{d{\bf{x}}}{dt}\right| }_{{\theta }_{i}},$$
(6)
$${\sigma }^{2}\left(\frac{d{{\bf{x}}}_{j}}{dt}\right)\approx \frac{2}{{M}_{en}}\mathop{\sum }\limits_{i=1}^{{M}_{en}/2}{\left({\left.\frac{d{{\bf{x}}}_{j}}{dt}\right| }_{{\theta }_{i}}-\frac{{\widehat{d{\bf{x}}}}_{j}}{dt}\right)}^{2},$$
(7)

where \(d{\bf{x}}/dt{| }_{{\theta }_{i}}\) refer to the PI-GNN prediction of the nodal displacement using the parameter θi. Through the ensemble learning procedure, we can obtain an estimate of the distribution of PI-GNN outputs, which can inform the UQ of predictions.

Active learning

Active learning is a special procedure in ML that can interactively query a new dataset based on a pre-determined strategy. The active learning approach is commonly used in supervised learning setups where it is very expensive to manually enumerate new data points for the query. When a proper strategy for selecting new data points is used, the total number of data points to learn a model can be dramatically reduced compared to a passive learning strategy with a randomly queried dataset.

Figure 2 shows the workflow for the UQ-based active learning loop, which includes data generation, model training, UQ and data point selection. The details of the active learning procedure are described below:

  1. 1.

    Generate initial MD dataset with a set of randomly selected simulation input features Λ = {λi}, where λi can be different combinations of temperature, applied stress, orientations, Burger’s vector, etc. The simulated MD trajectories are then coarse-grained and post-processed to obtain snapshots of the dislocations nodes/GPs and their features. We denote the MD dataset with parameters Λ as D(Λ).

  2. 2.

    Train an ensemble of Men PI-GNNs on the dataset D(Λ) each with random initialization of model parameters. We employ a train/test/validation data split ratio of 70%/20%/10%.

  3. 3.

    Randomly sample a new set of MD simulation input features \({\Lambda }^{{\prime} }\) and run new MD simulations with them and regroup with existing test dataset to generate a test dataset \({{\bf{D}}}_{t}({\Lambda }^{{\prime} })\).

  4. 4.

    Take the top 50% (Men/2) performing trained PI-GNN models and use the ensemble variance of the nodal displacement prediction as a measure of prediction uncertainty.

  5. 5.

    If the highest uncertainty level is lower than a preset threshold, stop training. Otherwise, combine the most uncertain set of the test data with the training data to form a new set of training data, i.e. \({\bf{D}}(\Lambda \cup {\Lambda }^{{\prime} })\), which is then used to re-train (using previously converged model parameters as the initial weights) the ensemble models.

  6. 6.

    Move to step 3.