Introduction

Supported metals constitute an important class of heterogeneous catalysts widely used by the chemical and automotive industry, in controlling environmental pollution, and in energy conversion technology1,2,3,4. The importance of strong metal–support interactions has long been recognized5. Of special interest is the stabilization of single-metal atoms on a support, which is rapidly becoming a new frontier in materials chemistry and catalysis6,7,8,9,10,11. The single-atom nature of the catalysts can overcome the scarcity of particular elements, such as precious group metals, ubiquitous in catalysis1,8,12.

A general requirement for successful supported metal catalysts is high stability of the active phase in terms of the exposed metal surface. Sintering through Ostwald ripening leads to a reduction of the number of exposed metal atoms and is a common deactivation pathway of heterogeneous catalysts13,14,15. Diffusion into lattice sites of a strongly interacting support with defects can also lead to catalyst deactivation16. Obtaining stable single-atom catalysts (SACs) is a particular challenge and requires thorough knowledge about metal–support interactions. Several studies attempted to describe SAC stability in terms of the binding energy (Ebind) of a single-metal atom to the support17,18,19. All these investigations presume that stronger binding of a metal on a support will make it less prone to sintering. However, the diffusion activation barrier (Ea) of a metal atom on a support is the most relevant factor to its stability. Although several reports dealt with diffusion pathways of metal atoms on a support20,21, the thermodynamic and kinetic aspects of stability of SACs have not been systematically explored yet.

Stable SACs should exhibit diffusion barriers substantially higher than the thermal energy to avoid rapid sintering under realistic reaction conditions22. Despite significant advances in computing power, the calculation of activation barriers remains more demanding than that of binding energies of reaction intermediates. Moreover, some supports such as oxides with specific magnetic properties (mostly Cr, Mn, Fe, and Co-based systems, e.g., LaFeO3 perovskite) pose significant challenges in convergence23,24,25,26, making investigations at the density functional theory (DFT) level often very expensive. Thus, it would be desirable to identify correlations between the diffusion activation barrier and the binding strength to assess the stability of a wide range of SACs. A universal scaling relation that holds for diverse metal–support combinations would be preferred.

In the present study, we aim at identifying a correlation between the kinetic stability (Ea) and the thermodynamic stability (Ebind) of a range of single transition metal atoms on various supports. As there is usually no (or little) change in the energy of the initial and final states for diffusion events between adjacent similar adsorption sites, the Brønsted–Evans–Polanyi principle is not applicable. We turn to machine-learning (ML) methods to identify relevant physical descriptors and draw new correlations. We choose two reducible metal oxides (CeO2, TiO2), two stable metal oxides (MgO, ZnO), a perovskite (SrTiO3), and two 2-dimensional materials (MoS2 and graphene) as shown in Supplementary Fig. 1, relevant to many topics in contemporary heterogeneous catalysis. Stepped CeO2(111) was included as a support model with a corrugated and more reactive surface. A total of 11 transition metals (Cu, Ag, Au, Ni, Pd, Pt, Co, Rh, Ir, Fe, and Ru) were included as catalytically active single-atom centers, resulting in a dataset of 99 points. We identify a universal correlation of Ea with the cohesive energy (Ec) of the bulk metal and Ebind of metal atoms to a support. This scaling law provides an accurate description of the stability of SACs and allows rapid screening of other systems solely on the basis of easily computable physical descriptors.

Results

DFT investigation

We computed Ebind and the corresponding Ea at the generalized gradient approximation (GGA)-DFT level (Perdew–Burke–Ernzerhof (PBE) functional). In this study, we focused on the diffusion of metal adatoms between two adjacent equivalent sites on idealized support surfaces without vacancies, decoration by functional groups such as hydroxyls, or other chemical impurities. The corresponding values are collected in the Supplementary Information (Supplementary Table 1). The adopted binding energies correspond to the most stable adsorption configurations determined by considering different adsorption sites of metal SAs on the supports and possible spin states where necessary (see Supplementary Table 2). Previously, it has been suggested that Ebind is a suitable descriptor for the stability of SACs17,27. Obviously, a stronger metal–support interaction reflected by a higher binding energy implies a lower mobility on the support. However, Fig. 1 shows that Ea correlates only moderately (R2 = 0.83) with Ebind. Graphene and the basal planes of MoS2 only weakly interact with metal atoms and, therefore, will not likely yield stable SACs. The correlation is especially poor for supports that exhibit relatively strong interaction with single-metal atoms. Therefore, such a linear correlation is not suitable for making useful predictions for a wider range of single-metal atom/support pairs. Our goal is to identify a more accurate correlation to improve predictability and general applicability for the diffusion barrier of metal–support systems.

Fig. 1: Activation barrier for diffusion (Ea) versus the binding energy (Ebind) of single-metal atoms.

A range of supports is considered. Only a moderate correlation between these two parameters exists.

Machine-learning study

We applied three different ML algorithms commonly used for regularization and descriptor selection28,29, namely ridge30, least absolute shrinkage and selection operator (LASSO)31, and elastic net32 regression. As an additional descriptor, we selected the (computed) cohesive energy (Ec) of the metal, because this parameter is an intrinsic metal property, representing the reactivity of the metal. Moreover, Ec can be looked up for all transition metals and is in principle also easily computed accurately at the DFT level. Previous work explored the description of Ebind in terms of parameters such as Ec17,27,33. We assessed the dependence of Ebind on Ec, as shown in Supplementary Fig. 2. Although on a specific surface, Ebind generally increases with Ec, there is no strong correlation between these two parameters when other surfaces and materials are also considered. This lack of correlation is because Ebind strongly depends on other properties of the metal–support systems, such as the ionization energy of the bulk metal and the work function and oxygen vacancy formation energy of support surfaces. We found that linear correlations between the two primary descriptors and Ea were not satisfactory either (see Supplementary Fig. 3). Therefore, we populated the hypothesis space by introducing polynomial and natural logarithm terms of the primary descriptors (Ec and Ebind) as secondary descriptors. The mathematical form of the model contains 87 unique descriptors xj for j = 1,…, p = 87 in the hypothesis space (see the complete list in Supplementary Tables 3 and 4), and is written as,

$${\it{E}}_{\it{a}} = \hat{\beta }_0 \,+\, x_1\hat{\beta } _1 + \cdots + x_p\hat{\beta }_p.$$
(1)

The magnitudes of the regression coefficient weights across models, obtained for the standardized dataset, are visualized in the heat maps shown in Fig. 2 and Supplementary Fig. 4. The ridge, LASSO, and elastic net algorithms select descriptors containing (Ebind)2 as the most heavily weighted and informative descriptors. In this work, all models are trained using 80% of the dataset (the training set). For the models requiring hyperparameter tuning, including LASSO, ridge, and elastic net, a tenfold cross validation with ten repeats on the training set is used to prevent overfitting. The quality of a predictive model is usually evaluated through the prediction accuracy on future data and the interpretability of the model. Therefore, the root mean square error (RMSE) of the withheld 20% of the dataset (the testing set, not used in training of the models) is calculated as the best discriminant of model performance, as shown in Supplementary Fig. 5. In addition, we use R2 values of all data to describe the general goodness-of-fit of the model.

Fig. 2: Coefficient weights in the standardized form.

a Ridge and b LASSO machine-learning (ML) models.

Instead of populating the descriptors manually, genetic programming (GP) based on symbolic regression can also be used to efficiently explore the hypothesis space34. By running the GP program with different random seeds, we found that the fittest model with the lowest testing RMSE of 0.262 eV takes a particularly simple form:

$${\it{E}}_{\it{a}} = 0.565 \times \left( {{\it{E}}_{{\rm{bind}}}} \right)^2/{\it{E}}_{\it{c}}.$$
(2)

Thus, both the feature-selection algorithms and GP identify (Ebind)2/Ec as the most significant descriptor for the diffusion barrier of a single-metal atom. R2 given by Eq. (2) is as high as 0.93, indicating a strong correlation. Next, we employ (Ebind)2/Ec as the sole descriptor in an ordinary least squares fitting procedure to further reduce the error (see Supplementary Tables 5 and 6). The resulting correlation, with a testing RMSE of 0.220 eV, takes the following form:

$${\it{E}}_{\it{a}} = 0.636 \,\times \,\left( {{\it{E}}_{{\rm{bind}}}} \right)^2/{\it{E}}_{\it{c}} - 0.203.$$
(3)
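As a minimal sketch of how Eq. (3) might be used in a screening script, the function below evaluates the fitted scaling law for a given binding energy and cohesive energy. The function name `dsl_sac_barrier` and the example inputs are hypothetical and chosen only for illustration; all energies are assumed to be positive and expressed in eV.

```python
def dsl_sac_barrier(e_bind: float, e_coh: float,
                    slope: float = 0.636, intercept: float = -0.203) -> float:
    """Estimate the diffusion barrier Ea (eV) from Eq. (3).

    e_bind: binding energy of the single atom to the support (eV).
    e_coh:  cohesive energy of the bulk metal (eV).
    The default slope and intercept are the fitted values of Eq. (3).
    """
    if e_coh <= 0:
        raise ValueError("cohesive energy must be positive")
    return slope * e_bind ** 2 / e_coh + intercept


# Hypothetical inputs (not taken from the dataset): Ebind = 4.0 eV, Ec = 5.0 eV.
example_barrier = dsl_sac_barrier(4.0, 5.0)
print(f"Estimated Ea = {example_barrier:.3f} eV")
```

Keeping the slope and intercept as keyword arguments makes it easy to swap in the coefficients of Eq. (2) (slope 0.565, intercept 0) for comparison.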

Diffusion scaling-law for single-atom catalysts (DSL-SAC)

The strong correlation in the underlying computational data is visualized in Fig. 3a, and the strong quadratic dependency of Ea on Ebind is shown in Fig. 3b. While Eq. (3) does not necessarily imply an underlying physical concept, it is useful to consider the correlation in the form

$${\it{E}}_{\it{a}} \sim {\it{E}}_{{\rm{bind}}} \times \left( {E_{\rm{bind}}/E_c} \right) = {\it{E}}_{{\rm{bind}}} \times \sigma.$$
(4)

Fig. 3: Diffusion scaling-law (DSL)-SAC model for the diffusion activation barrier (Ea).

a Scaling relation between diffusion activation energy Ea and (Ebind)2/Ec. b Quadratic scaling relations between Ea and Ebind for metals (indicated) on various supports. c Model performance. d Parity plot showing the DFT-computed Ea against different model predictions.

The ratio σ (Ebind/Ec), which lies between 0.16 and 1.06 for all metal–support combinations explored in this study, measures the binding energy of the metal as an isolated atom to the support surface with respect to the intrinsic binding strength of the metal atom in the bulk metal (Ec), and can be considered as a correction factor to Ebind. For a reactive surface like CeO2(100), σ lies between 0.73 and 1.04 and thus the correction is small. On the other hand, the correction is much larger, i.e., σ lies between 0.16 and 0.34, when the same metals are on a more inert support, e.g., graphene. Low σ values mean that Ea is lowered with respect to its intuitively expected proportionality to Ebind. The physical meaning of σ remains to be explored, but we anticipate that this term is related to the degree of the metal–support interaction.

Figure 3c shows the performance of various ML algorithms. The LASSO, elastic net, and ridge algorithms have RMSE values of 0.198, 0.198, and 0.264 eV, respectively. R2 values do not vary much across the models. A comparison between the LASSO algorithm, which has the lowest RMSE, and our diffusion scaling-law for single-atom catalysts (DSL-SAC) is presented in Fig. 3d. Both models predict satisfactorily the diffusion barrier Ea with the majority of points scattered around the parity line. Therefore, we suggest the simple form of DSL-SAC to estimate diffusion behaviors of isolated metal atoms on a support. Overall, this ML analysis validates statistically the intrinsic correlation between Ea and (Ebind)2/Ec for SACs. Additionally, some efforts were made to estimate binding energies of metal SAs on supports using several physical descriptors17,27. Therefore, we expect that the prediction of binding energies (Ebind) can further accelerate the screening of diffusion barriers (Ea) of metal SAs on supports and save computational expenses.

Validation of DSL-SAC model

To further validate our DSL-SAC model, we determined the stability of isolated Cu, Ru, Pd, and Pt atoms on the (100) surface of LaFeO3 perovskite. Given its complex magnetic properties and the Jahn–Teller effect, it is computationally expensive to optimize the geometry of a given starting configuration of LaFeO3 (see Supplementary Table 7)23. Based on the more easily accessible Ebind of Cu, Ru, Pd, and Pt on LaFeO3 (Cu: 2.40 eV, Ru: 6.29 eV, Pd: 2.48 eV, Pt: 3.93 eV), our DSL-SAC model predicts Ea values of 0.69, 2.87, 0.80, and 1.49 eV, respectively. The corresponding DFT-computed barriers were 0.87, 2.85, 0.85, and 1.68 eV, respectively; the deviations are all within the testing RMSE determined above. An additional test of the stability of metal SAs on the ZrO2(100) surface further corroborates the validity of our DSL-SAC model (see Supplementary Fig. 6 and Table 8).

Stability assessment

To assess the stability of single-metal atoms on a support, we estimate the characteristic time of their diffusion (τdiffusion) using the following equations:

$${\it{k}}_{{\rm{diffusion}}} = \frac{{k_BT}}{h}\exp \left( { - \frac{{{\it{E}}_{\it{a}}}}{{{\it{k}}_{\it{B}}{\it{T}}}}} \right),$$
(5)
$$\tau _{{\rm{diffusion}}} = 1/{\it{k}}_{{\rm{diffusion}}},$$
(6)

where kdiffusion is the rate constant, kB is Boltzmann’s constant, T is the temperature, and h is Planck’s constant. In this study, the discussed lifetime is the characteristic time of diffusion of metal adatoms. The actual lifetime of a single atom on a support strongly depends on this characteristic diffusion time but also on the metal loading. In Supplementary Table 9, we estimate the SAC lifetime for 0.5 wt% Pt/CeO2(111), demonstrating that this time is typically 1–2 orders of magnitude higher than τdiffusion. Given the strong variation of τdiffusion with temperature, we employ τdiffusion as a conservative metric to compare the lifetime of SACs. It is worth noting that a high concentration of metal SAs on support surfaces may influence the lifetime prediction, especially for weakly bonded systems in which metal SAs easily diffuse and aggregate into particles via Ostwald ripening.
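Eqs. (5) and (6) can be evaluated directly, as in the following minimal sketch. The function names are hypothetical, the barrier of 1.0 eV is an arbitrary illustrative input, and the physical constants are expressed in eV-based units so that Ea can be supplied in eV.

```python
import math

K_B = 8.617333262e-5   # Boltzmann constant in eV/K
H = 4.135667696e-15    # Planck constant in eV*s


def diffusion_rate(e_a: float, temperature: float) -> float:
    """Rate constant k_diffusion (1/s) from Eq. (5); e_a in eV, temperature in K."""
    thermal_energy = K_B * temperature
    return (thermal_energy / H) * math.exp(-e_a / thermal_energy)


def diffusion_time(e_a: float, temperature: float) -> float:
    """Characteristic diffusion time tau_diffusion (s) from Eq. (6)."""
    return 1.0 / diffusion_rate(e_a, temperature)


# Arbitrary illustrative barrier of 1.0 eV at the two temperatures discussed.
for temperature in (300.0, 1073.0):
    tau = diffusion_time(1.0, temperature)
    print(f"T = {temperature:.0f} K: tau_diffusion = {tau:.3e} s")
```

Because the barrier enters through an exponential, a modest change in Ea or T shifts τdiffusion by many orders of magnitude, which is why the lifetime maps are most readable on a logarithmic scale.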

Figure 4 shows the estimated SAC lifetime as a function of Ebind and Ec at room temperature and at a typical high temperature of 1073 K, to which SACs may be exposed for several hours. This figure predicts the minimum required binding energies of single atoms (examples given for Pt, Pd, and Ni) on a support to obtain SACs that are stable for a day at the two indicated temperatures. In this study, we focused on idealized surfaces without defects (except for steps), hydroxyl groups or dopants, which may act as nucleation sites for sintering and in this way alter diffusion processes of metal adatoms35,36,37. Despite this simplification, the framework introduced is general and the correlations provide a guiding tool for initial materials selection. To put this into practice, we discuss the work of Datye and colleagues, who prepared Pt/CeO2 SACs by a vapor-phase synthesis at 1073 K for 12 h1,38. At this temperature, the minimum binding energy required to avoid Pt agglomeration is estimated to be 6.3 eV. We note that the highest binding energies on CeO2 are provided by CeO2(100) (5.47 eV) and steps of CeO2(111) (5.54 eV), the binding energy on the most stable CeO2(111) surface being much lower (see Supplementary Fig. 7). Thus, we predict that the single Pt atoms would agglomerate under the given conditions. However, we note that the synthesis is carried out in oxygen, i.e., Pt is present as PtO239. The computed binding energy for a PtO2 on a step of CeO2(111) is 7.5 eV, explaining the high stability of the SAC in the recent experimental work of Datye40. Recently, Lopez and co-workers found that Pt single atoms can be trapped on CeO2(100) in the form of Pt2+ with four O ligands owing to the inherent surface oxygen mobility21. Dynamic charge transfer can evidently affect the binding strength and diffusion behavior of single Pt atoms on CeO2(100)21. Doping of a metal like Pt in the surface of ceria also leads to a much higher binding energy (>10 eV), representing a very stable catalyst under most conditions (see Supplementary Fig. 8)1,35,41.
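The minimum binding energy for a target lifetime follows from inverting Eqs. (5) and (6) for Ea and then inverting Eq. (3) for Ebind. The sketch below illustrates that inversion; the function names and the cohesive energy of 5.0 eV are hypothetical placeholders, so the printed value is not expected to reproduce the estimate quoted in the text, which depends on the computed Ec of the metal.

```python
import math

K_B = 8.617333262e-5   # Boltzmann constant in eV/K
H = 4.135667696e-15    # Planck constant in eV*s


def barrier_for_lifetime(tau: float, temperature: float) -> float:
    """Barrier Ea (eV) that gives tau_diffusion = tau (s), from Eqs. (5) and (6)."""
    thermal_energy = K_B * temperature
    return thermal_energy * math.log(tau * thermal_energy / H)


def min_binding_energy(tau: float, temperature: float, e_coh: float,
                       slope: float = 0.636, intercept: float = -0.203) -> float:
    """Binding energy (eV) at which Eq. (3) yields the barrier required for tau."""
    e_a = barrier_for_lifetime(tau, temperature)
    return math.sqrt((e_a - intercept) * e_coh / slope)


# One day at 1073 K with a placeholder cohesive energy of 5.0 eV (assumption).
one_day = 24 * 3600.0
required = min_binding_energy(one_day, 1073.0, e_coh=5.0)
print(f"Required Ebind = {required:.2f} eV for a placeholder Ec of 5.0 eV")
```

Since the required Ebind scales with the square root of Ec, metals with a higher cohesive energy need a stronger anchoring site to reach the same lifetime.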

Fig. 4: Stability assessment of single-atom catalysis.

Lifetime of SACs estimated by the characteristic time of diffusion depending on Ebind and Ec at a 300 K and b 1073 K.

Discussion

Recently, O’Connor et al. employed machine learning to describe the stability of metal single atoms on oxide supports in terms of the binding energy17. In the present work, we extended this thermodynamic approach by determining scaling relations for the activation barrier of diffusion of the single atom, which is the key kinetic step in the sintering process. Modern ML approaches identify a correlation which includes, in addition to the binding strength, also the cohesive energy of the metal, which reflects the intrinsic chemical reactivity of the single-metal atom. Previously, Mavrikakis and co-workers found that the diffusion barrier of adsorbates such as atomic O and N and molecular CO species depends in a linear fashion on the corresponding adsorption energies on transition metal surfaces42. Our work emphasizes a more complex correlation for the diffusion activation barrier of metal adatoms on typical supports. We expect that the diffusion scaling-law defined for single transition metal atoms on supports will find general application and can be improved by taking also into account support defects, support dopants, and hydroxyl group terminations present on typical metal oxides. The model is validated by accurate predictions of diffusion activation barriers of single Cu, Ru, Pd, and Pt atoms on LaFeO3(100). The computational framework was also used to discuss the stability of Pt atoms trapped on a stepped CeO2(111) surface, highlighting the role of defects in stabilization of this relevant case study under experimental conditions.

In summary, we started from DFT calculations and determined a meaningful correlation of the activation barriers for diffusion of single-metal atoms on a support with two easily accessible parameters, Ebind and Ec. Various ML methods were used to assist the physical descriptor discovery without assuming the functional form of the model explicitly. In contrast to many complex ML or black-box deep learning models, the developed diffusion scaling-law (DSL)-SAC, consisting of a single descriptor (Ebind)2/Ec, offers an interpretable and generalizable model that provides a facile approach to screen large numbers of metal–support combinations. Our approach provides a step toward understanding the stability of SACs by properly considering the activation barrier for diffusion rather than the simple thermodynamic metric of binding energies. Our study also provides a powerful strategy to rationally design SACs with promising stability.

Methods

DFT calculations

Spin-polarized DFT calculations were carried out with the Vienna ab initio simulation package43. The ion–electron interactions were represented by the projector-augmented wave method and the electron exchange–correlation by the GGA with the PBE exchange–correlation functional44.

For ceria systems, the DFT + U approach was used with U = 4.5 eV for Ce atoms. A (4 × 4) periodic expansion of the ceria surface unit cell was employed. For Brillouin zone integration, a 1 × 1 × 1 Monkhorst–Pack mesh was used. The ceria slab models consist of three Ce–O–Ce layers. The atoms in the bottom layer were frozen to their bulk positions and only the top two Ce–O–Ce layers were relaxed. For TiO2(110), the DFT + U approach was used with U = 4.0 eV for Ti atoms. A (5 × 4) periodic expansion of the titania surface unit cell was employed. For Brillouin zone integration, a 1 × 1 × 1 Monkhorst–Pack mesh was used. The titania slab models consist of three O–Ti–O layers. The atoms in the bottom layer were frozen to their bulk positions and only the top two O–Ti–O layers were relaxed. For MgO(100), a (3 × 3) periodic expansion of the surface unit cell was used with four Mg–O layers. For Brillouin zone integration, a 3 × 3 × 1 Monkhorst–Pack mesh was used. The bottom two layers were frozen to their bulk positions and only the top two layers were relaxed. A (2 × 3) periodic expansion of the surface unit cell with six Zn–O layers was constructed for ZnO(100). The corresponding Brillouin zone integration was performed with a 3 × 2 × 1 Monkhorst–Pack mesh. The bottom two layers were frozen to their bulk positions and only the top four layers were relaxed. For SrTiO3(100) and LaFeO3(100), a (2 × 2) periodic expansion of the surface unit cell was used with eight atomic layers. For Brillouin zone integration, a 1 × 1 × 1 Monkhorst–Pack mesh was used. The bottom four atomic layers were frozen to their bulk positions and only the top four atomic layers were relaxed. For graphene and MoS2, a (4 × 4) periodic expansion of the surface unit cell was used with one layer. For Brillouin zone integration, a 1 × 1 × 1 Monkhorst–Pack mesh was used. All atoms were relaxed. For all models, a vacuum gap of 15 Å was used.

The climbing image nudged-elastic band algorithm was used to identify the transition states of the metal atom diffusion on a support45,46. The convergence criterion for the total energy was 10−4 eV and the convergence criterion for the relaxation was set at 0.05 eV/Å.

Regularization and feature-selection algorithms

We consider the usual linear regression model31,32,47 with n observations and p descriptors x1, …, xp, where xj = (x1j, …, xnj)T for j = 1, …, p. The response y = (y1, …, yn)T is predicted by

$${\boldsymbol{\hat y}} = \hat \beta _0 + {\boldsymbol{x}}_{\boldsymbol{1}}\hat \beta _1 + \cdots + {\boldsymbol{x}}_{\boldsymbol{p}}\hat \beta _p.$$
(7)

A model training procedure produces a vector of coefficients \(\hat{\boldsymbol{\beta }} = \left( {\hat \beta _0, \ldots ,\hat \beta _p} \right)\) by minimizing a loss function. For the ordinary least squares (OLS), the loss function is the residual sum of squares.

Standardization of a dataset, which rescales all descriptor values to a mean of 0 and a variance of 1, is a requirement for many ML algorithms. It ensures that all descriptors vary within the same order of magnitude so that the selection algorithms can identify the dominant descriptors correctly. For descriptor xj, the scaled values zj are

$${\boldsymbol{z}}_{\it{j}} = \frac{{{\it{x}}_{\it{j}} - {\it{\upmu }}_j}}{{\sigma _j}},$$
(8)

where μj is the mean of descriptor xj over the training samples and σj is the corresponding standard deviation. The standardized form of the model can be written as

$${\boldsymbol{\hat y}} = \hat \beta _0^ \ast + {\boldsymbol{z}}_{\boldsymbol{1}}\hat \beta _1^ \ast + \cdots + {\boldsymbol{z}}_{\boldsymbol{p}}\hat \beta _p^ \ast.$$
(9)

The coefficient weights \(\hat {\boldsymbol{\beta }}^ \ast = \left( {\hat {\beta} _0^ \ast , \ldots ,\hat{\beta}_p^ \ast } \right)\) in the standardized form can be obtained from the original coefficients by,

$$\hat \beta _0^ \ast = \hat \beta _0 + \mathop {\sum }\nolimits_{j = 1}^p \hat \beta _j\mu _j;\,{\mathrm{and}}\,\hat \beta _j^ \ast = \sigma _j\hat \beta _j,\,{\mathrm{for}}\,j = 1, \ldots ,\,p.$$
(10)
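The relation between the two coefficient sets can be checked numerically by substituting xj = σj zj + μj from Eq. (8) into Eq. (7), which gives an intercept of β0 plus the sum of βj μj and slopes of σj βj. The following minimal sketch uses hypothetical descriptor values and coefficients to confirm that both forms give identical predictions; the population standard deviation is assumed for σj.

```python
from statistics import fmean, pstdev

# Hypothetical training data: two descriptor columns with three samples each.
columns = [[1.0, 2.0, 4.0], [10.0, 30.0, 20.0]]
# Hypothetical coefficients of the unstandardized model, Eq. (7).
beta0, betas = 0.5, [2.0, -0.1]

mus = [fmean(col) for col in columns]
sigmas = [pstdev(col) for col in columns]  # population standard deviation

# Standardized descriptors, Eq. (8).
z_columns = [[(x - mu) / sigma for x in col]
             for col, mu, sigma in zip(columns, mus, sigmas)]

# Coefficients of the standardized model, Eq. (9), obtained by substitution.
beta0_star = beta0 + sum(b * mu for b, mu in zip(betas, mus))
betas_star = [sigma * b for b, sigma in zip(betas, sigmas)]

n_samples = len(columns[0])
y_raw = [beta0 + sum(b * col[i] for b, col in zip(betas, columns))
         for i in range(n_samples)]
y_std = [beta0_star + sum(b * col[i] for b, col in zip(betas_star, z_columns))
         for i in range(n_samples)]

max_difference = max(abs(a - b) for a, b in zip(y_raw, y_std))
print(f"Largest difference between the two forms: {max_difference:.2e}")
```

Multiplying each coefficient by σj is what makes the standardized weights comparable across descriptors with very different numerical ranges.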

One should note that only \(\hat{\boldsymbol{\beta }}^ \ast\) reflects the weight of each descriptor in producing the responses. Here we introduce three regularization techniques, namely the elastic net, the LASSO, and ridge regression, with the aim of selecting significant descriptors and preventing the overfitting that can occur in OLS. The coefficient weights \(\hat {\boldsymbol{\beta }}^ \ast\) are obtained by minimizing a loss function containing the residual sum of squares and the L2 and L1 norms of the coefficients:

$$\hat{\boldsymbol{\beta}}^\ast \equiv \mathop {\mathrm{argmin}}\limits_{{\boldsymbol{\beta }}^\ast } \left( {\left\Vert {{\boldsymbol{y}} - {\boldsymbol{X}}{\boldsymbol{\beta }}^\ast } \right\Vert^2\, +\, \lambda _2\left\Vert {{\boldsymbol{\beta }}^\ast } \right\Vert^2 \,+\, \lambda _1\left\Vert {{\boldsymbol{\beta }}^\ast } \right\Vert_1} \right).$$
(11)

The above equation is the general formula of the elastic net, which contains both L1 and L2 penalty functions. Rewriting Eq. (11) gives,

$$\hat{\boldsymbol{\beta }}^\ast \equiv \mathop {\mathrm{argmin}}\limits_{{\boldsymbol{\beta }}^ \ast } \left( {\left\| {{\boldsymbol{y}} - {\boldsymbol{X}}{\boldsymbol{\beta }}^ \ast } \right\|^2 \,+\, 0.5{{\lambda }}\left( {1 - r_1} \right)\left\| {{\boldsymbol{\beta }}^ \ast } \right\|^2\, +\, {{\lambda }}r_1\left\| {{\boldsymbol{\beta }}^ \ast } \right\|_1} \right),{\mathrm{where}}\,{{\lambda }} = {{\lambda }}_1 + {{\lambda }}_2\,{\rm{and}}\,r_1 = \frac{{{{\lambda }}_1}}{{{{\lambda }}_1 + {{\lambda }}_2}}.$$
(12)

The LASSO and ridge regression are special cases of the elastic net, where λ1 = λ, λ2 = 0, r1 = 1 (L1 penalty only) or \(\lambda _1 = 0,\,\lambda _2 = \lambda ,\,r_1 = 0\) (L2 penalty only), respectively (see Supplementary Fig. 9). The training of ML models requires the tuning of hyperparameters including the degree of shrinkage λ and the amount of L1 penalty r1.
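To make the special cases concrete, the sketch below evaluates the loss of Eq. (11) for a fixed coefficient vector and obtains the LASSO and ridge losses by switching off one of the two penalties. The function names and the toy numbers are hypothetical, and the intercept is omitted on the assumption that the data are centered.

```python
def elastic_net_loss(y, X, beta, lam1, lam2):
    """Loss of Eq. (11): residual sum of squares plus L2 and L1 penalties."""
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta))
                 for yi, row in zip(y, X)]
    rss = sum(r * r for r in residuals)
    l2_penalty = sum(b * b for b in beta)    # squared L2 norm
    l1_penalty = sum(abs(b) for b in beta)   # L1 norm
    return rss + lam2 * l2_penalty + lam1 * l1_penalty


def lasso_loss(y, X, beta, lam):
    """LASSO as a special case: L1 penalty only (lam2 = 0)."""
    return elastic_net_loss(y, X, beta, lam1=lam, lam2=0.0)


def ridge_loss(y, X, beta, lam):
    """Ridge as a special case: L2 penalty only (lam1 = 0)."""
    return elastic_net_loss(y, X, beta, lam1=0.0, lam2=lam)


# Toy numbers chosen only to make the three losses easy to verify by hand.
y_toy = [1.0, 2.0]
X_toy = [[1.0, 0.0], [0.0, 1.0]]
beta_toy = [1.0, 3.0]
print(lasso_loss(y_toy, X_toy, beta_toy, 0.5),
      ridge_loss(y_toy, X_toy, beta_toy, 0.5),
      elastic_net_loss(y_toy, X_toy, beta_toy, 0.5, 0.5))
```

For these toy values the residual sum of squares is 1, the L1 norm is 4, and the squared L2 norm is 10, so the relative size of each penalty contribution can be read off directly.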

Overfitting is prevented by shrinking the number of descriptors included in the model (L1) or the magnitude of the coefficients (L2). Removing descriptors from the model can be seen as setting their coefficients to zero. Ridge regression only shrinks the magnitude of the coefficients but keeps all descriptors in the model. The LASSO shrinks the number of descriptors, and the nonzero coefficients indicate the significance of the corresponding descriptors after selection. The elastic net performs both L1 and L2 regularization, which shrinks both the number of descriptors included and the magnitude of the remaining coefficients. The amount of L1 penalty r1 determines the degree of shrinkage on the number of descriptors.

The primary descriptors are Ec and Ebind. We also included terms of the order −2, −1, −0.5, 0.5, and 2 as well as the natural logarithm of the primary descriptors (Ec and Ebind) as secondary descriptors. As tertiary descriptors, we added pair interactions between primary and secondary descriptors (see Supplementary Table 2). The training was performed with the scikit-learn ML package in Python29. The dataset was first randomly shuffled and split into the training set (80% of the dataset) and the testing set (20% of the dataset). We then performed tenfold cross validation on the training set. The loss function was evaluated by leaving 10% of the training set out and using the rest to fit the coefficients. This process was repeated ten times. The set of coefficients giving the minimal value of the loss function during the entire training procedure was selected. The final prediction errors were determined by the RMSE of the testing set.
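A workflow of this kind could be set up in scikit-learn roughly as follows. This is a sketch under stated assumptions rather than the script used in the study: the data are synthetic (generated from Eq. (3) plus noise), the descriptor set is a small illustrative subset, and the hyperparameter grids and random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV
from sklearn.model_selection import RepeatedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the DFT dataset): 99 points generated from
# Eq. (3) plus Gaussian noise, purely to make the sketch runnable.
rng = np.random.default_rng(0)
e_bind = rng.uniform(0.5, 7.0, size=99)
e_coh = rng.uniform(2.5, 7.0, size=99)
e_a = 0.636 * e_bind**2 / e_coh - 0.203 + rng.normal(0.0, 0.1, size=99)

# A small illustrative descriptor set (the study used a much larger one).
X = np.column_stack([e_bind, e_coh, e_bind**2, e_bind**2 / e_coh,
                     np.log(e_bind), np.log(e_coh)])

# Shuffle and split: 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, e_a, test_size=0.2, shuffle=True, random_state=0)

# Tenfold cross validation with ten repeats for hyperparameter tuning.
cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)

models = {
    "ridge": RidgeCV(alphas=np.logspace(-3, 3, 13), cv=cv),
    "lasso": LassoCV(cv=cv, max_iter=50_000),
    "elastic_net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=cv,
                                max_iter=50_000),
}

results = {}
for name, model in models.items():
    # The scaler is fitted on the training set only, then applied to the test set.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    residuals = y_test - pipeline.predict(X_test)
    results[name] = float(np.sqrt(np.mean(residuals**2)))
    print(f"{name}: testing RMSE = {results[name]:.3f} eV")
```

The testing RMSE is computed only once, after tuning, so that it remains an estimate of the error on data the models have not seen.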

Genetic programming (GP)

GP with symbolic regression is a supervised learning technique to identify an underlying mathematical expression that best describes the relationship between input and output data. The search space consists of combinations of simple mathematical operators on the input descriptors. An evolutionary algorithm is used to evolve a population of randomly generated candidate models according to natural-selection rules (selection, crossover, and mutation). Each model is associated with a fitness value, which in our case is the RMSE value of the diffusion barrier Ea. The advantage of GP is that no manual combination of descriptors is needed. The population of models evolves according to these rules toward the optimum, which can be seen as a stochastic optimization process. We include addition, subtraction, multiplication, division, square root, and natural logarithm as operators for two descriptors, Ebind and Ec. We perform the GP analysis in the Python package gplearn6. The same set of training data (comprising 80% of the dataset) was used for training the GP models and the remaining 20% of the dataset was used as the testing set. We performed five parallel runs initialized with different random seeds (see Supplementary Fig. 10). In each run, the population size was 5000. The fittest models were selected after a sufficient number of generations (here we use 100), when the evolution process had converged to one solution.
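The fitness evaluation at the heart of this procedure can be illustrated without the evolutionary machinery. The sketch below does not use gplearn and performs no selection, crossover, or mutation; it only scores a small, hand-picked set of hypothetical candidate expressions by RMSE on synthetic data generated from Eq. (2), after fitting one least-squares scale factor per candidate.

```python
import math
import random


def rmse(predicted, target):
    """Root mean square error, used here as the fitness value."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target))
                     / len(target))


def best_scale(values, target):
    """Least-squares scale factor c minimizing the error of c * values."""
    return (sum(v * t for v, t in zip(values, target))
            / sum(v * v for v in values))


# Hypothetical candidate expressions built from the allowed operators.
CANDIDATES = {
    "Ebind": lambda eb, ec: eb,
    "Ebind/Ec": lambda eb, ec: eb / ec,
    "Ebind^2": lambda eb, ec: eb ** 2,
    "Ebind^2/Ec": lambda eb, ec: eb ** 2 / ec,
    "sqrt(Ebind)*ln(Ec)": lambda eb, ec: math.sqrt(eb) * math.log(ec),
}

# Synthetic data (NOT the DFT dataset), generated noise-free from Eq. (2).
rng = random.Random(0)
data = [(rng.uniform(0.5, 7.0), rng.uniform(2.5, 7.0)) for _ in range(50)]
target = [0.565 * eb ** 2 / ec for eb, ec in data]

scores = {}
for name, expression in CANDIDATES.items():
    values = [expression(eb, ec) for eb, ec in data]
    scale = best_scale(values, target)
    scores[name] = rmse([scale * v for v in values], target)

fittest = min(scores, key=scores.get)
print(f"Fittest candidate: {fittest}")
```

In an actual GP run the candidate set is not fixed in advance: expressions are generated and recombined over many generations, with this kind of fitness score deciding which survive.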