Using statistical learning to predict interactions between single metal atoms and modified MgO(100) supports

Metal/oxide interactions mediated by charge transfer influence reactivity and stability in numerous heterogeneous catalysts. In this work, we use density functional theory (DFT) and statistical learning (SL) to derive models for predicting how the adsorption strength of metal atoms on MgO(100) surfaces can be enhanced by modifications of the support. MgO(100) in its pristine form is relatively unreactive, and thus is ideal for examining ways in which its electronic interactions with metals can be enhanced, tuned, and controlled. We find that the charge transfer characteristics of MgO are readily modified either by adsorbates on the surface (e.g., H, OH, F, and NO2) or dopants in the oxide lattice (e.g., Li, Na, B, and Al). We use SL methods (i.e., LASSO, Horseshoe prior, and Dirichlet–Laplace prior) that are trained against DFT data to identify physical descriptors for predicting how the adsorption energy of metal atoms will change in response to support modification. These SL-derived feature selection tools are used to screen through more than one million candidate descriptors that are generated from simple chemical properties of the adsorbed metals, MgO, dopants, and adsorbates. Among the tested SL tools, we demonstrate that Dirichlet–Laplace prior predicts metal adsorption energies on MgO most accurately, while also identifying descriptors that are most transferable to chemically similar oxides, such as CaO, BaO, and ZnO.


INTRODUCTION
Transition metals (TMs) supported on oxide surfaces are ubiquitous heterogeneous catalysts in chemical processes that produce fuel and value-added chemicals, as well as in processes central to renewable energy and environmental protection technologies. Among supported TM catalysts, single-atom catalysts (SACs) have attracted attention due to their enhanced performance in many applications 1 , including CO oxidation 2-10 , water-gas shift [11][12][13] , selective hydrogenation [14][15][16][17] , dehydrogenation [18][19][20][21] , and photocatalytic 22 reactions. SACs with uniform TM dispersion expose every TM atom to the reaction environment, thus maximizing utilization of the expensive TM component. Key to achieving such uniform dispersion is the ability to control metal cluster sizes, which in turn is largely dictated by the TM's strength of interaction with the support. Predicting the strength of metal/support interactions is a challenging task because there are multiple factors at play, such as the reducibility of the oxide, the electronegativity of the metal, and the structure of the interface. Herein, we use density functional theory (DFT) and statistical learning (SL) to identify physical descriptors that build a predictive model for describing metal atom binding on MgO(100) surfaces. Focus is placed on understanding and predicting how modifications of the unreactive MgO(100) surface, through the introduction of surface adsorbates or dopants, can enhance metal binding energies. We also focus on comparing the performance of various SL approaches, where we show that the physical descriptors identified with Dirichlet-Laplace prior 23 are strong descriptors for predicting metal binding on MgO(100). Furthermore, we show that descriptors identified with MgO training sets also can be used to describe TM binding on modified CaO(100), BaO(100), and ZnO (100) surfaces-demonstrating the transferability of SL-derived descriptors beyond the MgO system they were trained to describe, albeit for very closely related systems.
Metal/support interactions impact both the morphology of the catalyst surface and oxidation states at the active site, which both influence catalytic performance 24,25 . The size of metal clusters on oxide supports is controlled by thermodynamic driving forces that usually cause small clusters to agglomerate (e.g., through Ostwald ripening) 26 . Metal/support interactions alter the chemical potential of each metal atom in the cluster, and thus influence thermodynamic stability with respect to cluster size 27,28 . These interactions also influence the kinetics of cluster formation, where an increase in the metal adsorption energy on the oxide surface generally reduces sintering rates leading to smaller particle sizes 29,30 . Metal/support interactions affect not only particle size distributions, but also alter the electronic state of the adsorbed metal [31][32][33][34] . This phenomenon, known as electronic metal-support interaction 35 , is caused by charge transfer between the metal and the support, which can alter reaction rates by affecting how strongly intermediates adsorb at the active site.
Adsorbates from the reaction environment, or dopants present in the oxide structure, can enhance charge transfer at the metal/ support interface 31,[36][37][38][39][40] . Addou et al. 41 demonstrated that adsorbed hydrogen atoms on the rutile TiO 2 (011)-(2 × 1) surface alters the binding energy of Pd atoms to the support, which stabilizes small Pd n clusters (i.e., n~1-3 atoms). Babucci et al. 42 used Fourier-transform infrared spectroscopy to show that the electronic state of Ir(CO) 2 can be altered on oxide supports that are modified with different ligands. They demonstrated that Ir complexes exhibit different reactivity toward 1,3-butene hydrogenation if the surface modifications shift the Ir oxidation state at the active site. Kumar et al. 43 used Hammett reactivity studies, together with DFT, to show that the electronic state of Au during benzyl alcohol oxidation is influenced by electron donation from the supporting oxide. Similar effects have been observed on surfaces that are modified by introducing dopants in the lattice of the oxide. For example, Shao et al. 44 reported that the electronic structure of CaO surfaces can be controlled by introducing Mo dopants that replace Ca atoms in the surface lattice. The additional valence electrons supplied by Mo migrate toward adsorbed Au clusters, thus enhancing Au binding on the modified CaO surfaces. This effect was shown to be less pronounced over Cr-doped MgO surfaces 45,46 , which demonstrates that the dopant effect varies between different types of dopants and oxide supports.
Given the impact that metal/support interactions can have on catalyst morphology and activity, it is clear that identifying physical descriptors for predicting interaction trends will be of high value. Toward this end, Campbell and Sellers 47 proposed that the adsorbed metal's enthalpy of oxide formation computed relative to the isolated metal atom (ΔH f,ox,atom ) should be an effective descriptor for predicting metal adsorption energies on oxide surfaces; ΔH f,ox,atom captures the bonding strength between the metal atom and oxygen, and it should therefore correlate with the metal's binding strength to oxide surfaces. They demonstrated a linear correlation between ΔH f,ox,atom and metal adhesion energies measured by adsorption calorimetry, thus deriving a simple model for predicting metal adsorption energies based on readily available reference data 27 . Using both isothermal titration calorimetry and DFT, Strayer et al. 30 verified that ΔH f,ox,atom is also an effective descriptor for predicting metal adsorption energies on HCa 2 Nb 3 O 10 perovskite surfaces. Although successful in these cases, identifying physical descriptors purely from chemical intuition is not a simple task. Complex charge transfer between the metals, the oxide supports, and surface modifiers (i.e., adsorbates and dopants) is challenging to describe with closed physical forms motivated from chemical intuition alone, as multiple phenomena are occurring simultaneously. As such, in this work we apply SL to identify useful physical descriptors that capture charge transfer at complex, multicomponent interfaces. We choose MgO for our initial study to demonstrate the effectiveness of the SL approach, since charge transfer characteristics can be anticipated from the non-reducible characteristics of MgO. Although we do not expect all surface modifications of MgO studied here to be experimentally feasible because dopants/ adsorbates can generate charge-compensating defects that would counteract charge transfer to the adsorbed metal 45 , the idealized nature of pristine MgO is a desirable testbed for evaluating the performance of various SL approaches and for identifying physical descriptors to predict how charge transfer affects metal adsorption in the idealized case. Once the strengths and weaknesses of the SL approaches are known, then we can pursue studies that use well-chosen SL approaches to understand more complex, but experimentally realizable, oxide systems containing defects.
Multiple feature selection (FS) methods have been introduced in the materials community to identify physical descriptors from extremely large sets of candidate descriptors. FS methods are preceded by gathering the chemical properties of the system's components from various databases and using these properties to generate a large feature space of candidate descriptors. This step, referred to as feature engineering, is achieved by applying a series of mathematical operations on all descriptor pairs to enumerate millions (or even billions) of possible descriptor combinations and functional forms. FS methods are then applied to identify the features from this pool that are the strongest predictors of the property of interest (Fig. 1). There is a rich menu of FS methods in SL, ranging from traditional principle analysis component (PCA) 48 or kernel ridge regression (KRR) 49,50 to large scale methods, such as least absolute shrinkage and selection operator with l 0 norm regression (LASSO + l 0 ) 51,52 or sure independence screening and sparsifying operator (SISSO) 53,54 . The success of these FS approaches in the materials community is demonstrated by many examples 55,56 , including the work of Ghiringhelli et al. 51 , who used LASSO + l 0 regression to identify physical descriptors that can predict crystal structures based on material composition. Similarly, Andersen et al. 54 used SISSO to derive descriptor subsets that can predict molecular adsorption energies on metal alloys. Alternative to these frequentist approaches, Bayesian FS has emerged in SL as a primary tool that provides a coherent and principled framework to quantify uncertainties [57][58][59][60] . In addition to inheriting the advantage of Bayesian methods to conveniently incorporate domain knowledge via prior distributions, state-of-the-art Bayesian FS is able to adapt to unknown sparsity levels in the feature space 61,62 and to achieve automatic multiplicity correction 63,64 . In this work, we will explore the performance of both LASSO-based FS methods and state-of-the-art Bayesian FS methods 23,65 for finding descriptors that can predict changes in metal binding energy caused by MgO surface modification.
In the present work, DFT binding energies of single metal atoms on modified MgO surfaces are used to train our SL models. Training and validation sets are built from data collected for MgO (100) surfaces that are modified by either adsorbates or dopants. MgO(100) is an irreducible oxide that, in its pristine form, binds TMs weakly compared to other supports 52 . It therefore is ideal for investigating the effects of support modifications, as changes in its binding interaction with the supported metal can be attributed solely to the effects of the surface modification. Indeed, MgO has been used as a template for many investigations of metal/support interactions, such as demonstrating how the oxidation state of gold varies upon surface modification 39,66 and how linear scaling relationships can be derived for a wide variety of adsorbates at Au/oxide interfaces 67 . We modified the MgO substrate with surface dopants and adsorbates to induce electron-poor and electron-rich conditions, where we find that both lead to an enhancement of the metal binding energies due to an increase in charge transfer between the metal and the support, as expected from the irreducible nature of pristine MgO.
We identify descriptors for predicting the effects of surface modifications by using multiple FS methods: LASSO 68 , Dirichlet-Laplace prior 23 , and Horseshoe prior 65 (Fig. 1). The methods are applied to identify physical descriptors for predicting the enhancement of metal binding energies on modified MgO based on readily available chemical properties of the system's components (i.e., the adsorbed metal, the MgO surface, and the adsorbate or dopant). The selected feature spaces are then refined with l 0 norm 51 regression to build predictive models that describe single metal atom binding on MgO surfaces. We find that Dirichlet-Laplace prior shows outstanding FS properties, as it mitigates many disadvantages evident in the performance of LASSO and Horseshoe prior. We evaluate the transferability of the  identified features of each method by applying them to predict changes in metal adsorption over CaO(100), BaO(100), and ZnO (100) surfaces. We find that features identified by Dirichlet-Laplace prior using the MgO training data are effective descriptors for predicting metal binding on these irreducible surfaces as well, which again demonstrates the robust nature of Dirichlet-Laplace prior because no CaO, BaO, and ZnO data were included in the FS procedure. It also demonstrates the transferability of SL-derived features within oxide families, which widens model applicability to systems that were not used explicitly to generate the training set.

Effects of dopants and adsorbates on metal adsorption energy
Our objective is to build a model that captures the relationship between metal adsorption energy and readily available chemical properties of the adsorbed metals, MgO, dopants, and adsorbates. A range of TMs (Ag, Au, Cd, Co, Cr, Cu, Fe, Ir, Mn, Mo, Nb, Ni, Pd, Pt, Rh, Ru, V, W, and Zn) were adsorbed on MgO(100) surfaces to generate our DFT training sets. The metal adsorption energy (ΔE ads ) is calculated using Eq. 1, where E Metal/Surface is the total DFT energy of the metal atom adsorbed on the MgO(100) surface at its most stable site, E Metal is the total DFT energy of the isolated metal atom, and E Surface is the total DFT energy of the clean MgO(100) surface.
There are four unique adsorption sites on the MgO(100) surface: an atop site on O, an atop site on Mg, a bridge site between O and Mg, and a hollow site. Adsorption at all four sites was tested to find the most favorable adsorption site for each metal. The atop site above the oxygen anion was found to be the most stable adsorption site for every metal, in agreement with previous reports 52, 69,70 . A representative structure of Au adsorbed on MgO (100) is shown in Fig. 2; similar geometries were obtained for all other metals.
We modified the MgO(100) surface by introducing dopants and adsorbates to enhance the reducibility of the support, which we find generally increases metal adsorption strengths. We first considered H and OH adsorbates, which are commonly encountered species under catalytic reaction conditions. Adsorption of H and OH on four unique surface sites was tested to identify the most stable site, which we found was the atop site on surface O for H and the surface hollow site for OH. The metal atoms were then placed on the optimized structures at the atop O site and then the entire system was optimized to calculate the metal binding energies. A representative structure of Au adsorbed on the OH-MgO(100) surface is shown in Fig. 2; other adsorbatemodified surfaces are shown in Supplementary Fig. 1. F and NO 2 were also introduced to expand the DFT training data, where F has a higher electron affinity (EA) (3.40 eV) than OH (1.83 eV) and thus is expected to yield to a stronger metal binding enhancement compared to OH. NO 2 was included because it can readily act as a neutral, positive, or negative species based on the chemical environment, thus diversifying the training set. Each metal atom was placed as far from the adsorbate as possible within the cell to prevent direct interaction between the metal and the adsorbate. This isolates the adsorbate's impact on the properties of the support from effects related to direct metal-adsorbate interactions. Ag, Nb, and Ru formed direct bonds to NO 2 during all optimization attempts, so their binding energies were excluded from the data set so that only surface-mediated effects were considered. Doping the oxide surface alters the electronic structure of the oxide in a manner similar to the adsorbates 37,39,[44][45][46] . Li, Na, B, and Al were chosen as dopants in this study, as they contain either one fewer or one additional valance electron compared to Mg. The dopants replaced one Mg atom in the first layer of the oxide (Fig. 2c, f).
All modifications of the MgO(100) surface enhanced charge transfer between the surface and the supported metal, as shown by the charge distributions in Fig. 3. When H is adsorbed on Au/ MgO(100), one electron is transferred from H to the metal adatom (Fig. 3b). Conversely, OH is electron withdrawing causing charge to be drawn away from Au (Fig. 3c). For surface dopants, Al has one more valence electron than Mg and donates charge to the adsorbed Au atom (Fig. 3d). Na reverses this trend, causing Au to donate electrons to balance the charge depletion in the surface (Fig. 3e). There is a clear similarity in the effects of surface dopants and adsorbates, and therefore we broadly classify surfaces with additional electrons compared to pristine MgO as electron-rich and surfaces with fewer electrons as electron-poor.
Metal adatom oxide formation enthalpy (ΔH f.ox,atom ) is a widely used descriptor for predicting metal adsorption on different oxides 27,30,47,52 . However, we found that this descriptor was not sufficient for predicting changes in adsorption energy on the  Fig. 3, as it only captures how easily the metal can be oxidized by donating its electrons to the surface and fails to represent situations where electrons are donated to the metal (i.e., as seen in Fig. 3b, d for Au on electronrich surfaces). The enhancement in metal adsorption energy in such situations is largely controlled by the ability of the adsorbed metal to accept excess charge from the modified surface, where ΔH f.ox,atom is no longer a suitable descriptor because it only captures the ability of the metal to donate electrons. Therefore, we propose that the ionization potential (i.e., IE 1 ) and EA can be used as broad descriptors for predicting the change in metal adsorption energy (Δ(ΔE ads ), as defined by Eq. 2) caused by bidirectional charge transfer.
We plot the EA and IE 1 against Δ(ΔE ads ) for electron-rich and electron-poor surfaces in Fig. 4. The correlation is separated into two categories: a negative correlation for electron-rich surfaces SL for identifying physical descriptors Although useful for predicting broad trends, EA and IE 1 are limited in their ability to quantitatively describe metal binding on  modified MgO. This is evident in the scatter seen in Fig. 4. Quantitative descriptors that can unravel the effects of multiple charge transfer phenomena simultaneously will likely require more complex functional forms than what is captured by the simple electronic properties. In this section, we identify such descriptors for MgO using FS via LASSO, Horseshoe prior, and Dirichlet-Laplace prior. The feature spaces were also tested for transferability to CaO(100), BaO(100), and ZnO(100) in the next section, where DFT data for the additional oxide surfaces were not included in any training data used for feature selection. We find that Dirichlet-Laplace prior yields the best models, both in terms of lowest error and highest transferability.
We show the performance of the selected descriptor sets in Figs 5 and 6 using 50 trials of randomly separated training/validation data sets for dopant-and adsorbate-modified MgO, respectively. We use the l 0 norm method to refine each model by systematically decreasing the number of descriptors in the model (i.e., the model dimension corresponds to the number of descriptors in the model, nD). The root mean square error (RMSE) is calculated by Eq. 3 to quantify the prediction accuracy, where y i is the ith predicted value, b y i is the ith estimated value, and n is the number of predicted values.
The model with trained coefficients is then applied to the validation set, and the dimension that provides the lowest validation RMSE is used to determine the final predictive model. RMSE for the final models derived from 50 randomly chosen training set points is shown in Fig. 5a-c and Fig. 6a-c for dopantand adsorbate-modified MgO, respectively. Although LASSO tends to select the most descriptors, which will be shown in the "Methods" section, the derived models are less accurate than the ones built from the Bayesian methods. In addition, the total data set derived from Horseshoe prior contains less than five descriptors on average using either dopant-modified or adsorbate-modified MgO data, which is much smaller in size than the total data sets derived by the other two methods. Although Horseshoe prior constructs more accurate models than LASSO, it is highly aggressive in its selection and generates a very small pool of candidate features that are available for refining the model. Dirichlet-Laplace prior exhibits the best performance, as it is a balance between the stringent selection tendencies of Horseshoe prior and the permissive selection tendencies of LASSO. Although Horseshoe usually performs similarly to Dirichlet-Laplace (Supplementary Fig. 4) at a given model dimension, the model cannot be further improved because we rapidly exhaust the pool of candidate descriptors. Conversely, Dirichlet-Laplace prior can be systematically improved by adding more descriptors until the validation error of the test set increases, thus allowing us to balance model accuracy with model simplicity. The predictive performance of these models is shown by the parity plots in Figs 5d-f and 6d-f, where each plot was generated with a representative training set that yields a training RMSE close to the median of each method (i.e., the same training set was used for all three methods to ensure comparability). The corresponding training and validation RMSE of these plots is labeled as an orange diamond in Figs 5a-c and 6a-c. The performance of the models is improved compared to our previous work on clean oxides using LASSO + l 0 52 , where in that work we achieved an RMSE of 0.41 eV on the training set using five descriptors to predict 92 data points (i.e., compared to errors of 0.26 and 0.25 eV on training sets with five descriptors for the dopant and adsorbate data in Supplementary Fig. 4, respectively). Tables 1 and 2  Although some terms in these models may be interpreted, this is in general a difficult task given the fact that interrelated phenomena controlling charge transfer cannot be described by a simple physical framework 71 .
Robustness and transferability of the selected descriptors The descriptors identified in the previous section were applied to predict Δ(ΔE ads ) on various surfaces to test feature transferability and robustness. The metals were placed on CaO(100), BaO(100), and ZnO(100) doped with the same dopants as used for MgO (Al, B, Li, and Na). We used the SL-derived descriptor sets from the MgO training data to predict the behavior on the abovementioned modified surfaces, where we use l 0 norm regression to refine the model dimensionality (Fig. 7). Note that this entails refitting the coefficients in front of each descriptor, so here we are only testing the transferability of the descriptors and not the transferability of the entire model. The performance of features derived from 50 trials of randomly separated MgO training sets are used to verify feature transferability, as shown by the box plots in Fig. 7. Since Horseshoe typically selects less than five descriptors, we restrict all of the predictive models to have at most five descriptors for comparison in Fig. 7a-c, as indicated by a dashed line in Fig. 7d-f. Comparing RMSE in Fig. 7a- Supplementary Fig. 5, which shows that LASSO has higher average prediction errors than Dirichlet-Laplace when comparing models derived with the same training set and the same model dimension. Thus, Dirichlet-Laplace shows the highest transferability among the SL methods. In summary, we find that descriptors identified using Dirichlet-Laplace prior for one oxide are applicable to other related oxides, demonstrating the robust nature of the selected features and their transferability outside the training set used for feature selection. We note that the RMSE for all of the testing oxides with the same feature space is generally similar to the MgO models.

Stability of modified MgO(100) surfaces
In the previous sections we evaluated the performance of various SL approaches for predicting the behavior of idealized MgO surfaces, where it is assumed that the dopants or adsorbates do not induce accompanying defects to compensate the additional charge and therefore charge transfer to the adsorbed metal atom was forced to occur. In real systems there inevitably will be a tendency for charge-compensating defects to compete for this charge transfer, which will be examined in this section. Numerous experiments demonstrate that the presence of the adsorbed metal can in certain cases suppresses the tendency to form charge-compensating defects in these irreducible oxides [73][74][75] . Shao et al. 44 showed that the charge transfer from Mo dopants in CaO enhances Au binding strength because excess electrons from Mo are transferred to the electronegative Au atoms. This effect was also present, but less pronounced, on Cr-doped MgO(100), where Stavale et al. 45 found that the enhancement of Au binding was inhibited by charge-compensating Mg vacancies. In addition to MgO, Tran et al. 76 induced Fe as dopants in ZnO to modulate the Pt/ZnO interaction. The X ray photoelectron spectroscopy analysis of Pt 4f core levels revealed the change in the oxidation state of Pt, which is direct evidence of the charge transfer between Fe and Pt. The enhanced metal-support interaction in this system stabilizes Pt nanoparticles and suppresses metal sintering during CO oxidation. For instance, the average Pt particle size increased from 2.2 to 5.7 nm on the bare ZnO during CO oxidation but only increased from 2.4 to 3.2 nm on the Fe-doped ZnO. Furthermore, the turnover frequency was raised from 0.60 to 5.37 s −1 with the assistance of the Fe dopants. Thus, the predictive model developed in this work is relevant for screening system compositions where one can expect significant charge transfer to induce an enhancement of the metal binding energy. Once identified, the stability of the required surface modification can be assessed, by computing free energies of formation as demonstrated below, to determine if the candidate system is a viable target for experimental synthesis attempts.
Here we compute phase diagrams for the formation of chargecompensating O or Mg vacancy defects in electron-poor (Nadoped and OH-modified) and electron-rich (Al-doped and Hmodified) surfaces, respectively (Fig. 8). We expanded the simulation cell to twice the size of the original unit cell to accommodate multiple dopants in the surface ( Supplementary  Fig. 6) Fig. 7) and the pristine MgO (100) surface, E surface is the total DFT energy of the modified surface without any charge-compensating defect, T is the temperature, and P O2 is the partial pressure of oxygen molecule. µ O2 is the chemical potential of an oxygen molecule in the gas phase, which is computed by the total DFT energy of the O 2 molecule with enthalpy and entropy corrections provided by the NIST webbook 77 . We used the energy of a small supported MgO cluster with two MgO formula units as the reference for Mg vacancy formation in Eq. 5 because the formation of this small cluster will be the first nucleation step once Mg atoms leave the MgO lattice. The energy of bulk MgO can also be used as the reference, which will systematically shift all Mg vacancy formation energies to be more negative by 2.59 eV. This will not impact the relative difference in vacancy formation energy when comparing the clean surfaces to the surfaces with adsorbed metal atoms.
We use the above equations to compute the boundary in (P O2 , T) space where the defect formation energy is zero, which is used to generate the phase diagrams shown in Fig. 8. In each case, we compute the defect formation boundary on the clean surface without any adsorbed metal, and then we compare it to the boundary on surfaces with the adsorbed metals. We find that the metal atom on modified MgO generally inhibits defect formation because the metal atom acts as an electron source or sink to alleviate the excess surface charge caused by the dopant or adsorbate. The extent of the boundary shift correlates with IE 1 and EA for electron-poor and electron-rich surfaces, respectively, which indicates that it is controlled by the ability of the adsorbed metal to donate or accept excess charge ( Supplementary Fig. 8).
Comparing vacancy formation energy of modified MgO(100) surfaces (the dashed lines in Supplementary Fig. 8) to the pristine surfaces (Supplementary Table 2), it is evident that the surface modification will lead to compensating defects, as discussed above. However, the favorability of compensating defect formation will be suppressed by the adsorbed metals. The suppression of the area of the phase diagram in which a compensating defect would occur, as seen in Fig. 8, suggests that it may thermodynamically be feasible to prepare some of the adsorbed metal/ modified MgO surfaces considered in this work if the surface modification and metal adsorbate are introduced simultaneously before charge-compensating defects can form.

DISCUSSION
We have shown the capability of SL methods for constructing predictive models for computing Δ(ΔE ads ) in oxide-supported SAC systems. The SL methods yield useful relationships between the fundamental chemical properties of dopants/adsorbates and overall metal/support interactions, which are often used to control surface morphology but in many cases are not wellunderstood 31,[36][37][38]41,[44][45][46] . The resulting models provide an estimation tool for quantifying how metal binding energy can Fig. 7 Feature transferability. Box plots illustrate RMSE for a CaO, b BaO, and c ZnO applying the MgO-derived features. We note that RMSE is obtained from models with five descriptors even if the method selects more than five descriptors to have a fair comparison between the SL methods because Horseshoe prior usually selects less than five features (Fig. 10). Comparison of models for describing d CaO, e BaO, and f ZnO data built from features selected by LASSO, Horseshoe prior, and Dirichlet-Laplace prior using the MgO training data in Fig. 5. RMSE is computed using the l 0 norm method to refine the total descriptor set identified with each method to a model containing the number of descriptors shown on the x-axis. The center line, upper bound, lower bound, upper whisker, and lower whisker in box plots represent median, 75th percentile, 25th percentile, maximum, and minimum, respectively. be tuned through surface modification. This in turn can provide guidelines for designing and controlling the particle size of the TMs (e.g., by finding modifications that stabilize single adsorbed metal atoms relative to the bulk metal). Indeed, we predict that Mn may form stable SACs on Li-doped and Na-doped ZnO surfaces because the formation energy of single Mn atoms on these surfaces is negative relative to the formation of bulk Mn (Supplementary Section 1). We also introduced two state-of-theart Bayesian methods that have several advantages over other common FS methods (e.g., PCA, KRR, LASSO, SISSO, etc.), such as flexible prior distributions, providing automatic error estimation on selected features, adaptivity to arbitrary sparsity in the feature space, and the ability to utilize the full data set for inference without requiring cross validation.
The robustness of our methodology was verified by applying the final predictive models to validation data sets, as well as by applying the selected features to CaO, BaO, and ZnO surfaces. The transferability of descriptors reveals the power of the Dirichlet-Laplace prior method, since it can use training data on one particular oxide to identify features that are applicable to similar oxides. We are currently exploring combined FS approaches based on training data derived from multiple oxides to build models that are fully general within oxide families. This will include extensions to more reducible oxides, where it is expected that the parent metal of the oxide will play a more direct role as a center of charge transfer. For example, our previous study 52 showed that metal-metal and redox interactions can occur on highly reducible oxides, such as CeO 2 . The parent metals in the support are expected to interact with dopants and adsorbates as well, which will further complicate the nature of charge transfer. Finding suitable descriptors from chemical intuition alone will be nearly impossible for such systems, but likely will be feasible using the SL tools developed in this study.
Transition metal adsorption on MgO is enhanced by surface doping and adsorbate modification, which can be predicted using physical descriptors identified by SL (namely, LASSO, Horseshoe prior, and Dirichlet-Laplace prior methods). In particular, MgO can be transformed by dopants or adsorbates to exhibit either electron-rich or electron-poor characteristics. The extent of charge transfer between metal and modified MgO surfaces correlates with the ionization potential and the EA of the metal, but scatter in these correlations preclude quantitative prediction. Therefore, we used SL to derive predictive models that go beyond these simple descriptors, which were built from the fundamental chemical properties of the adsorbed metals, Mg, O, dopants, and adsorbates. Multiple FS tools, including state-of-the-art Bayesian methods, were applied together with l 0 norm regression to derive simple and effective models for predicting metal adsorption on modified MgO surfaces. Of the tested SL methods, we found that Dirichlet-Laplace prior yields the most accurate and transferable models. The descriptors identified for MgO(100) by Dirichlet-Laplace prior were also shown to be effective for estimating metal binding on CaO(100), BaO(100), and ZnO(100) surfaces with comparable accuracy, which demonstrates the robust nature of this FS approach and the transferability of our models to related oxides.

DFT calculations
Metal adsorption energies were calculated with DFT using the Vienna ab initio simulation package (VASP 5.4.4) 78 . The Perdew-Burke-Ernzerhof exchange-correlation functional 79 was applied with spin polarization and the projector augmented-wave method 80 was used to treat core electrons with VASP default potentials 81 . The valence electrons treated selfconsistently for each atom type are listed in Supplementary Table 3. Planewave basis sets were expanded to a kinetic energy cutoff of 600 eV. Gaussian smearing was employed with a smearing width of 0.05 eV. A Monkhorst-Pack (MP) 82 k-point mesh was used with 4 × 4 × 4 sampling on the bulk MgO structure and 3 × 3 × 1 sampling on the (2 × 2) MgO(100) surface models. The Grimme empirical dispersion correction was used to treat van der Waals dispersion 83 . Geometries were optimized to a convergence criterion of 0.05 eV Å −1 . Metal binding energies computed with a force convergence criterion of 0.05 eV Å −1 were found to be nearly identical to those computed with a tighter criterion of 0.01 eV Å −1 (Supplementary Table 4).
Surface models contained four layers of MgO(100) with all layers relaxed to avoid spurious surface dipoles that can arise from frozen layers in oxide surfaces 84 . The vacuum distance between layers in the direction perpendicular to the surface was at least 15 Å to avoid interactions between MgO slabs, and a dipole correction 85 was applied perpendicular to the MgO(100) surface. Calculations on CaO(100), BaO(100), and ZnO (100) surfaces followed the same settings as those applied for the MgO (100) surfaces, except the MP k-points sampling is 5 × 5 × 1 for ZnO(100). The ground state energy of single metal atoms is computed in a 15 × 16 × 17 Å 3 unit cell, which was our reference for computing binding energies. The magnetization states and energies of the isolated metal atom are listed in Supplementary Table 5. Multiple magnetization states were tested when considering metal adsorption on the oxide surfaces, examining all probable spin configurations as listed in Supplementary

Statistical learning
Our previous study 52 applied LASSO + l 0 51 to derive descriptors that can predict metal adsorption energies across a wide range of metal/oxide pairs. While successful in that work, in the present study we found that LASSO + l 0 was limited in its ability to handle correlated features when we introduced properties of the surface modifiers in the feature space (vide infra). As such, here we apply two additional state-of-the-art Bayesian methods for feature selection, Horseshoe prior 65 and Dirichlet-Laplace prior 23 , that are implemented alongside the LASSO 68 approach for comparison. Bayesian methods are particularly well-suited for FS because they allow users to incorporate domain knowledge when applicable, they offer uncertainty quantification, and, most notably, they provide a coherent framework to infer model probabilities. Modern Bayesian FS methods can adapt to unknown sparsity levels in the feature space and perform well when handling correlated features 88 . Bayesian methods operate by first specifying an initial distribution of values for each descriptor coefficient (i.e., a prior distribution) and then updating this distribution based on available training data through Bayes' theorem (i.e., solving for the posterior distribution). These methods offer flexibility through the choice of the prior distribution, as different prior distributions will yield different FS characteristics. The DFT training data for FS in this work consisted of metal atom binding energies on MgO surfaces that were modified by either dopants or adsorbates, where FS was separate for "dopant" and "adsorbate" data sets. Details regarding the implementation of each FS method are provided in the Supplementary Information. Primary descriptors. Our feature space was built from a primary descriptor set that contained chemical properties of the adsorbed metals, the parent atoms in the oxide surface (i.e., Ba, Ca, Mg, O, and Zn), and dopants that were available in the CRC Handbook of Chemistry and Physics 89 . These atomic properties include the atomic number (Z), electronegativity in Pauling and Martynov-Batsanov 90 scales (EN P and EN MB ), first and second ionization energy (IE 1 and IE 2 ), EA, standard sublimation enthalpy (ΔH sub ), standard molar enthalpy of oxide formation of the metal adatoms (ΔH f,ox, bulk ), standard molar enthalpy of formation of the adsorbed metal's most stable oxide referenced to the isolated metal atom (ΔH f,ox,atom ), Zunger and Cohen orbital radii of s and p orbitals (zr s and zr p ) 91 , Waber and Cromer orbital radii of s and p orbitals (wr s and wr p ) 92 , number of valence electrons (NVal), Miedema metal alloy formation parameters (η 1/3 and ϕ) 93 , and absolute electronegativity (AEN) 94 . The following data were not available in CRC Handbook of Chemistry and Physics and instead were taken from the provided references: IE 2 of Ir 95 , EA of Cd, Mg, Mn, and Zn 96 , and ΔH f,ox,bulk of Au and Pt 47 . Following our previous notation scheme 52  , and EA d ). For adsorbate-modified systems, we also included EN P , EN MB , IE, EA, NVal, coordination number between adsorbates and the MgO(100) surface (CN), and bond dissociation energy (BD). We define BD as the binding energy between the atom in the adsorbate and the atom in the oxide surface to which the adsorbate binds (e.g., O-H for H*, Mg-O for OH* and NO 2 *, and Mg-F for F*, where * indicates a species adsorbed on the surface). The EN of adsorbates is taken as the EN of the atom in the adsorbate that is attached to the support. We also considered polarization inside the adsorbed molecules by including electronegativity differences (ΔEN P and ΔEN MB ) between the different atoms in multi-atom molecules (e.g., |EN N -EN O | for NO 2 and |EN H -EN O | for OH). All data described above are provided in the Supplementary Information.
Feature selection. We applied LASSO 68 , Horseshoe prior 65 , and Dirichlet-Laplace prior 23 methods, implemented in R version 3.6.0 on Linux 97 to identify correlations between the chemical properties of the system's components and the enhancement in metal binding energy that results from surface modification. There are a total of 76 and 73 calculated binding energies for dopant-modified and adsorbate-modified MgO(100), respectively. A total of 52 data points are randomly selected from this set to build a training set used to develop the predictive model, while the remaining data points are isolated from all SL procedures to use as a validation set. The feature engineering procedures used to generate the feature space of candidate descriptors are summarized in Fig. 9. First, we expanded our feature space of candidate descriptors by applying a series of mathematical operators (described in detail in the Supplementary Information) on the set of primary descriptors described in the previous section. This procedure introduces secondary descriptors that can capture nonlinear correlations between the fundamental properties of each component in the system and the metal binding energy. All categorical Fig. 9 Feature engineering and feature preselection procedures. Descriptor sets are indicated in orange boxes, feature engineering steps are indicated in green boxes, feature preselection steps are indicated in blue boxes, and the size of the feature space at each step is indicated in black text. There were seven and ten categorical descriptors for doped and adsorbate-modified MgO surfaces, respectively. descriptors (i.e., descriptors for which all elements in the feature vector fall in a common category; see Supplementary Information for detailed explanation and definition of categorical descriptors) in the secondary descriptor set were converted into numerical descriptors using dummy variables so that they can be treated with linear methods 98 . The numerical descriptors from the secondary descriptor set were then cross-multiplied to mix different properties, which builds an engineered feature space containing~10 6 descriptors. We prescreened the feature space by ranking the Pearson coefficient of correlation to identify the top 1,000 descriptors. We found that this data preprocessing step stabilizes the subsequent analysis, as well as improves calculation speed. We formed the final ternary feature space by adding the transformed categorical descriptors back into the secondary feature space, where all descriptor sets were normalized. All feature engineering procedures and the resulting feature space subsets are provided in the Supplementary Information. The algorithms of Dirichlet-Laplace prior and Horseshoe prior are summarized in Supplementary Tables 16 and 17, respectively.
The FS process yields a reduced feature space with an order of~10 descriptors. To build the final model, we systematically test the performance of all combinations of features in this reduced space, where the size of the model (i.e., number of features in the combination) is predetermined, and select the combination that yields the lowest RMSE. We then systematically decrease the number of features in the final model until we reach a model with only one descriptor. This strategy, called l 0 norm regression, helps to systematically test the performance of the model with respect to the number of tunable parameters. Given the selected descriptors and the corresponding coefficients, we use the validation set to calculate the validation error to determine whether the predictive model is overfit, which occurs when the error of the validation set begins to rise as more descriptors are added to the model. We choose the final model size by selecting the model that yields the lowest error on the validation data set, thus ensuring that the model is not overfit. To test the FS processes, we applied the l 0 norm regression approach to evaluate feature spaces selected from MgO data on data derived from CaO, BaO, or ZnO modified with the same dopants (Al, B, Li, and Na). Each of these sets contained 52 data points for each oxide.

Evaluation of FS methods using synthetic data
It is crucial to understand the properties of each SL method by evaluating its performance characteristics on simulated data sets, where the ground truth is known a priori. This was done by populating a simulated data set with 100 observations and a feature space of 1,000 candidate descriptors, out of which ten descriptors truly correlate with the response. We repeated the simulation 100 times, each generating a random data set following the same model. We evaluated each method by reporting the average number of true positives (i.e., descriptors that truly correlate with the training set and are selected by the method), true negatives (i.e., descriptors that do not correlate with the training set and are not selected by the method), false positives (i.e., descriptors that do not correlate with the training set but are selected by the method), and false negatives (i.e., descriptors that correlate with the training set but are not selected by the method). This gives us insight into the characteristics of each SL approach. Figure 10a shows the simulation result for a candidate feature space that has an average feature correlation of ρ = 0.5 (i.e., a test case with a high rate of correlation among the candidate features). As seen in the figure, LASSO suffers from a high rate of false positives when candidate features are highly correlated. This issue is largely mitigated by the Bayesian methods, which motivates their use in this work. Although Horseshoe prior shows a low rate of false positive selection, it also misses many true descriptors evident from its high rate of false negatives. Simulation tests demonstrate that the high rate of false positives in LASSO and false negatives in Horseshoe is consistent across various correlation parameters (see Supplementary Information for simulations with ρ = 0, ρ = 0.5, and ρ = 0.9 in Supplementary Tables [18][19][20], which suggests that this behavior will also be present given the expected range of correlation in our feature space. This is verified here by showing the number of features selected by each method on our MgO data set (Fig. 10b). Horseshoe selects the fewest descriptors in both dopant-modified and adsorbate-modified data sets, which suggests that Horseshoe is the most aggressive FS methods. We note that LASSO does not always select the most descriptors, but its performance is mostly worse than Dirichlet-Laplace. The feature spaces selected based on the "real" MgO data are analyzed in the "Results" section. Fig. 10 Behavior and performance of feature selection methods. a Simulated average number of descriptors selected by each statistical learning method over 100 trials with 100 observations, 1000 candidate descriptors, and a correlation of ρ = 0.5 between the descriptors. b The average number of descriptors selected by each method using 50 randomly separated training set of metal binding energies on dopantmodified or adsorbate-modified MgO surfaces. The error bars represent the range of one standard deviation.