## Abstract

Single-atom-alloy catalysts (SAACs) have recently become a frontier in catalysis research. Simultaneous optimization of reactants’ facile dissociation and a balanced strength of intermediates’ binding makes them highly efficient catalysts for several industrially important reactions. However, discovery of new SAACs is hindered by the lack of fast yet reliable prediction of the catalytic properties of the large number of candidates. We address this problem by applying a compressed-sensing data-analytics approach parameterized with density-functional inputs. Besides consistently predicting the efficiency of the experimentally studied SAACs, we identify more than 200 promising candidates that have not yet been reported. Some of these candidates are more stable and efficient than the reported ones. We have also introduced a novel approach to a qualitative analysis of complex symbolic regression models based on the data-mining method subgroup discovery. Our study demonstrates the importance of data analytics for avoiding bias in catalysis design, and provides a recipe for finding the best SAACs for various applications.

## Introduction

Recently, single-atom dispersion has been shown to dramatically reduce the usage of rare and expensive metals in heterogeneous catalysis, at the same time providing unique possibilities for tuning catalytic properties^{1,2}. The pioneering work by Sykes and co-workers^{2} has demonstrated that highly dilute bimetallic alloys, where single atoms of Pt-group are dispersed on the surface of an inert metal host, are highly efficient and selective in numerous catalytic reactions. These alloy catalysts are now extensively used in the hydrogenation-related reactions such as hydrogenation of CO_{2}, water–gas shift reaction, hydrogen separation, and many others^{3,4,5}. The outstanding performance of SAACs is attributed to a balance between efficiency of H_{2} dissociation and binding of H at the surface of metallic alloys^{2,6,7}.

Using desorption measurements in combination with high-resolution scanning tunneling microscopy, Kyriakou et al. have shown that isolated Pd atoms on a Cu surface can substantially reduce the energy barrier for both hydrogen uptake and subsequent desorption from the Cu metal surface^{2}. Lucci and co-workers have observed that isolated Pt atoms on the Cu(111) surface exhibit stable activity and 100% selectivity for the hydrogenation of butadiene to butenes^{8}. Liu et al. have investigated the fundamentals of CO adsorption on Pt/Cu SAAC using a variety of surface science and catalysis techniques. They have found that CO binds more weakly to single Pt atoms in Cu(111), compared to larger Pt ensembles or monometallic Pt. Their results demonstrate that SAACs offer a new approach to design CO-tolerant materials for industrial applications^{9}. To date, Pd/Cu^{10,11,12}, Pt/Cu^{7,8,9,13,14,15}, Pd/Ag^{12,16}, Pd/Au^{12}, Pt/Au^{17}, Pt/Ni^{18}, Au/Ru^{19}, and Ni/Zn^{20} SAACs have been synthesized and found to be active and selective towards different hydrogenation reactions. However, the family of experimentally synthesized SAACs for hydrogenation remains small and comparisons of their catalytic properties are scarce.

Conventional approaches to designing single-atom heterogeneous catalysts for different industrially relevant hydrogenation reactions mainly rely on trial-and-error methods. However, challenges in synthesis and in situ experimental characterization of SAACs impose limitations on these approaches. With advances in first-principles methods and computational resources, theoretical modeling opens new opportunities for rational catalyst design^{6,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48}. A general simple yet powerful approach is the creation of a large database with first-principles based inputs, followed by intelligent interrogation of the database in search of materials with the desired properties^{35,48}. Significant efforts have been made in developing reliable descriptor-based models following the above general approach^{6,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,48}. In catalysis, a descriptor is a parameter (a feature) of the catalytic material that is easy to evaluate and is correlated with a complex target property (e.g., activation energy or turnover frequency of a catalytic reaction). A notable amount of research has been devoted to searching for and using descriptors with a simple (near-linear) relation to target properties^{22,23,24,25,26,27,28,29,30}. For example, the linear relationship between the reaction energies and the activation energies is known as the Brønsted–Evans–Polanyi relationship (BEP) in heterogeneous catalysis^{29,30,45,46,47}. Also, the linear correlation between *d*-band center of a clean transition-metal surface and adsorption energies of molecules on that surface have been studied in great detail and widely applied^{22,23,24,36,44}. In catalysis, near-linear correlations between adsorption energies of different adsorbates are referred to as scaling relations^{26,28,37}. The advantages of such correlations are their simplicity and usually clear physical foundations. 
However, they are not exact, and there is an increasing number of studies focused on overcoming limitations imposed by the corresponding approximations^{6,31,32,33,34,38,39,40,41,48}. The nonlinear and intricate relationship between the catalysts’ properties and surface reactions at realistic conditions^{42,43} has held back the reliable description of catalytic properties. Note that, although the stability of SAACs is no less significant than their catalytic performance in designing a potential catalyst, it has not received the same level of attention.

In this work, combining first-principles calculations and compressed-sensing data-analytics methodology, we address the issues that inhibit the wider use of SAACs in different industrially important reactions. By identifying descriptors based only on properties of the host surfaces and guest single atoms, we predict the binding energies of H (BE_{H}), the dissociation energy barriers of the H_{2} molecule (*E*_{b}), the segregation energies (SE) of the single guest atom at different transition-metal surfaces, and the segregation energies in the presence of adsorbed hydrogen (SE_{H}). The state-of-the-art compressed-sensing based approach employed here for identifying the key descriptive parameters is the recently developed SISSO (sure independence screening and sparsifying operator)^{49}. SISSO enables us to identify the best low-dimensional descriptor from an immense pool of offered candidates. The computational time required for our models to evaluate the catalytic properties of a SAAC is reduced by at least a factor of one thousand compared to first-principles calculations, which enables high-throughput screening of a huge number of SAAC systems.

## Results and discussion

The BE_{H} for more than three hundred SAACs are calculated within the framework of DFT with the RPBE exchange-correlation functional. This large dataset consists of BE_{H} values at different low-index surface facets including fcc(111), fcc(110), fcc(100), hcp(0001), and bcc(110) and three stepped surface facets including fcc(211), fcc(310), and bcc(210) of SAACs with twelve transition-metal (TM) hosts (Cu, Zn, Cr, Pd, Pt, Rh, Ru, Cd, Ag, Ti, Nb, and Ta). On each TM host surface, one of the surface atoms is substituted by a guest atom to construct the SAACs. BE_{H} for pristine surfaces (where the guest atom is the same as the host metal) are also included. The H atom is placed at different non-equivalent high-symmetry sites close to the guest atom (Supplementary Fig. 1), and the BE_{H} for the most favorable site is included in the data set. Complete information on adsorption sites and the corresponding BE_{H} is given in Supplementary Data 1. The BE_{H} are further validated by a comparison with previous calculations^{6,21}.

To better understand the variation in BE_{H} for different guest atoms, we first investigate the correlation between BE_{H} and the *d*-band center of the *d* orbitals that are projected onto the single guest atom for the alloyed systems. We find that this way of calculating the *d*-band center provides better correlation with other properties than *d*-band centers for the *d* orbitals projected on (i) the single guest atom plus its first nearest-neighbor shell or (ii) the whole slab^{50}. The correlation is shown in Fig. 1a (Supplementary Fig. 2) for different SAACs on the Ag(110) host surface [Pt(111) host surface]. According to the *d*-band center theory^{21,23,36,44}, the closer the *d*-band center is to the Fermi level, the stronger the BE_{H} should be. However, it is evident from Fig. 1a (Supplementary Fig. 2) that the expected linear correlation, as predicted by the *d*-band model, is broken for H adsorption on SAACs. This is due to the small size of the atomic H orbitals, leading to a relatively weak coupling between H *s* and the TM *d*-orbitals^{21}. Furthermore, we check the validity of the BEP relation between *E*_{b} and the H_{2} dissociation reaction energy for SAACs (Fig. 1b), which is commonly used to extract kinetic data for a reaction on the basis of the adsorption energies of the reactants and products^{29,45,46,47}. As shown in Fig. 1b, the highlighted SAACs inside the blue dotted circle significantly reduce *E*_{b} while reducing the reaction energy only moderately. As a result, SAACs provide a small reaction energy and a low activation energy barrier, which breaks the BEP relations and thus leads to optimized catalytic performance. The BEP relations are also found to be broken for other reactions catalyzed by SAACs^{6}.

Thus, the standard simple correlations (from *d*-band center theory and the BEP relations) fail for H adsorption on SAACs. Moreover, the calculation of the *d*-band center for each SAAC is highly computationally demanding, considering the very large number of candidates. These facts emphasize the necessity of finding new accurate but low-cost descriptors for computational screening of SAACs. In the SISSO method, a huge pool of more than 10 billion candidate features is first constructed iteratively by combining 19 low-cost primary features listed in Table 1 using a set of mathematical operators. A compressed-sensing based procedure is then used to select one or more of the most relevant candidate features and construct a linear model of the target property (see Supplementary Methods for details on the SISSO procedure). Note that the three primary surface features are properties of the pure host surfaces (elemental metal systems). These are much cheaper to obtain than the properties of SAACs (alloyed metal systems). In the latter case, due to the interaction between the single guest atom and its periodic images, a large supercell of the whole periodic system containing the guest atom and the host surface needs to be computed. By contrast, only the smallest unit cell is needed to compute the pristine surface features.

To test the predictive power of obtained models, we employ 10-fold cross validation (CV10). The dataset is first split into ten subsets, and the descriptor identification along with the model training is performed using nine subsets. Then the error in predicting properties of the systems in the remaining subset is evaluated with the obtained model^{51,52,53}. The CV10 error is defined as the average value of the test errors obtained for each of the ten subsets. In SISSO over-fitting may occur with increasing dimensionality of the descriptor (i.e., the number of complex features that are used in construction of the linear model)^{49}. The descriptor dimension at which the CV10 error starts increasing identifies the optimal dimensionality of the descriptor (details of the validation approach can be found in Supplementary Methods). For the optimal dimensionality, the same set of primary features is found during CV10 in 9, 8, and 8 cases for the SISSO models of BE_{H}, *E*_{b}, and SE, respectively. The root-mean-square errors (RMSE), together with the CV10 errors of the SISSO models for BE_{H}, *E*_{b}, and SE are displayed in Fig. 2a. The obtained optimal descriptor dimensionalities for BE_{H}, *E*_{b}, and SE of the SAACs are 5, 6, and 6, respectively. Distribution of errors for the best models versus RPBE results is displayed in Fig. 2b–d. The RMSE and maximum absolute error (MAE) of the models are also shown. The error distributions for all the lower-dimensional models relative to the best ones are displayed in Supplementary Figs. 4–6.
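The CV10 procedure described above can be sketched in a few lines. This is a minimal illustration only: the one-variable least-squares fit `fit_line` is a hypothetical stand-in for SISSO descriptor identification and training, and the data are synthetic, not taken from this work.

```python
import random
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit y = a*x + b (toy stand-in for SISSO training)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def rmse(model, xs, ys):
    """Root-mean-square error of the fitted line on the given points."""
    a, b = model
    return sqrt(sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def cv10_error(xs, ys, k=10, seed=0):
    """Average test RMSE over k folds; each fold is held out exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Training (and, in the real workflow, descriptor selection) uses
        # only the nine remaining subsets.
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(rmse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(errors) / k

# Synthetic, noise-free data lying exactly on y = 2x + 1
xs = [0.1 * i for i in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
cv_err = cv10_error(xs, ys)
```

In the actual workflow the model is refit inside every fold, so the held-out subset never influences which features are selected; the sketch mirrors that by calling the fit inside the loop.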

From Table 2 one can see that the *d*-band center features DC, DC*, DT, DT*, DS, and DS* appear in every dimension of the descriptors for BE_{H} and *E*_{b}, consistent with the well-established importance of the *d*-band center for adsorption at transition-metal surfaces^{21,23,36,44}. The cohesive energies of guest (EC) and host (EC*) bulk metals are selected in each dimension of the descriptor for SE. This is because segregation is driven by the imbalance of binding energy between host–host and guest–host atoms. Interestingly, most of the descriptor components include only simple mathematical operators (+, −, ·, /, ||), indicating that the primary features already capture most of the complexity of the target properties.

We employ the identified computationally cheap SISSO models to perform high-throughput screening of SAACs to find the best candidates for the hydrogenation reactions. The results for BE_{H}, *E*_{b}, and SE_{H} (the segregation energy when surface H adatom is present, where the H adatom induced segregation energy change is included, see the “Methods” part for details) of the flat surfaces are displayed in Fig. 3a–c (see Supplementary Fig. 7 for the results for the stepped surfaces, the values of BE_{H}, *E*_{b}, and SE_{H} for all the SAACs are given in Supplementary Data 1).

The choice of the screening criteria for the three properties BE_{H}, *E*_{b}, and SE_{H}, which are related to the activity and stability of SAACs, plays the central role in the screening processes and determines the candidates to be chosen. Previous work demonstrates that for the high performance in hydrogenation reactions, SAACs should exhibit weaker binding of H and lower H_{2} dissociation energy barrier simultaneously^{2}. However, different criteria are applicable for different reaction conditions. For example, at low temperatures SAACs can maintain their stability for a longer time. At higher temperatures H atoms will desorb from the surfaces and larger energy barriers can be overcome, resulting in a requirement for stronger binding and higher upper limit of the dissociation barrier *E*_{b}. Keeping this variability in mind, we consider temperature-dependent and pressure-dependent selection criteria (see “Methods” section below for details on the selection criteria). We have screened more than five thousand SAAC candidates (including about the same number of flat and stepped surfaces; the values of the primary features for all the candidates can be found in the Supplementary Data 2) at both low temperature (200 K) and high temperature (700 K) at partial H_{2} pressure *p* = 1 atm. We find 160 flat-surface SAACs (Fig. 3d, in green) and 134 stepped-surface SAACs (Supplementary Fig. 7d, in green) that are both active and stable at a low temperature (200 K). At a higher temperature (700 K), 102 flat-surface SAACs (Fig. 3d, in blue and green) and 136 stepped-surface SAACs (Supplementary Fig. 7d, in blue and green) are classified as promising SAACs for hydrogenation reactions. Moreover, we have identified the SAACs that are promising in a wide range of temperatures (green squares in Fig. 3d for flat surfaces and Supplementary Fig. 7d for stepped surfaces).

Note that, without the stability selection criterion based on SE_{H}, all experimentally established SAACs (Pd/Cu, Pt/Cu, Pd/Ag, Pd/Au, Pt/Au, Pt/Ni, Au/Ru, and Ni/Zn) are predicted to be good catalysts in the temperature range of 200 K < *T* < 700 K, which is further confirmed by DFT calculations. However, some of these systems (Pd/Ag and Pd/Au) are experimentally shown to have low stability^{12,16}. Thus, inclusion of the stability-related property SE_{H} is of immense importance for a reliable prediction of catalytic performance, as is confirmed by our results. We note that a machine-learning study on stability of single-atom metal alloys has recently been reported^{54}. However, our analysis takes into account effects of adsorbates on the segregation energy, which has not been considered previously. For example, the SE for Pd/Ag(110) and Pt/Ag(110) systems are 0.33 eV and 0.46 eV, respectively, implying that the Pd and Pt impurities tend to segregate into the bulk of the Ag(110) systems. However, SE_{H} for Pd/Ag(110) and Pt/Ag(110) systems are −0.10 eV and −0.21 eV, respectively, suggesting Pd and Pt impurities will segregate to the surface in the presence of H adatom. These results are also consistent with the experimental observations that the efficiency of Pd/Ag single-atom catalysts towards the selective hydrogenation of acetylene to ethylene was highly improved with the pretreatment of the samples under H_{2} conditions^{16}.

We define an activity (or efficiency) indicator involving both the free energy of H adsorption (∆*G*) and the energy barrier (*E*_{b}) as \(\sqrt {\Delta G^2 + E_{\mathrm{b}}^2}\) to construct an activity-stability map. As shown in Fig. 4, some of the newly discovered candidates (bottom-left corner of the activity-stability map) are predicted to have both higher stability and efficiency than the reported ones, making them well suited for practical applications (see Supplementary Fig. 8 for the results for the stepped surfaces). As expected, stability and activity are inversely related, which can be seen from the negative slope of the general trend in Supplementary Fig. 8 (showing selected materials) and Supplementary Fig. 9 (showing all explored materials), as well as a cut-off in population of the lower left-hand corner of these plots. Nevertheless, we have found several materials that are predicted to be better SAACs than the so-far reported ones. Considering stability, activity, abundance, and health/safety, the two best discovered candidates, Mn/Ag(111) and Pt/Zn(0001), are highlighted in Fig. 4. The aggregation energies for Mn/Ag(111), Pt/Zn(0001), and the experimentally established SAACs are also tested and displayed in Supplementary Table 9.
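As a small illustration of the activity indicator, the sketch below evaluates \(\sqrt {\Delta G^2 + E_{\mathrm{b}}^2}\) and ranks a few candidates by it. The candidate labels and energy values are hypothetical placeholders, not results from this work.

```python
from math import hypot

def activity_indicator(delta_g, e_b):
    """sqrt(dG^2 + E_b^2) in eV; smaller values indicate higher activity."""
    return hypot(delta_g, e_b)

# Hypothetical (dG, E_b) pairs in eV, for illustration only
candidates = {"A": (0.10, 0.20), "B": (-0.30, 0.40), "C": (0.05, 0.05)}

# Sort from most to least active according to the indicator
ranked = sorted(candidates, key=lambda name: activity_indicator(*candidates[name]))
```

Because ∆*G* enters squared, adsorption that is too strong and adsorption that is too weak are penalized equally, which matches the Sabatier-type reasoning behind the indicator.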

Although the SISSO models are analytic formulas, the corresponding descriptors are complex, reflecting the complexity of the relationship between the primary features and the target properties. While potentially interpretable, the models do not provide a straightforward way of evaluating relative importance of different features in actuating desirable changes in target properties. To facilitate physical understanding of the actuating mechanisms, we apply the subgroup discovery (SGD) approach^{55,56,57,58,59,60}. SGD finds local patterns in the data that maximize a quality function. The patterns are described as an intersection (a selector) of simple inequalities involving provided features, e.g., (feature1 < a1) AND (feature2 > a2) AND… . The quality function is typically chosen such that it is maximized by subgroups balancing the number of data points in the subgroup, deviation of the median of the target property for the subgroup from the median for the whole data set, and the width of the target property distribution within the subgroup^{60}.

Here, we apply SGD in a novel context, namely as an analysis tool for symbolic regression models, including SISSO. The primary features that enter the complex SISSO descriptors of a given target property are used as features for SGD (see Table 2). The data set includes all 5200 materials and surfaces used in the high-throughput screening. The target properties are evaluated with the obtained SISSO models. Six target properties are considered: \(\sqrt {\Delta G^2 + E_{\mathrm{b}}^2}\), SE, SE_{H}, *E*_{b}, |∆*G*|, and BE_{H}. Since we are interested mainly in catalysts that are active at normal conditions, ∆*G* is calculated at *T* = 300 K. Our goal is to find selectors that minimize these properties within the subgroup. Such selectors describe actuating mechanisms for minimization of a given target property. For SE, the following best selector is found: (EC* ≤ −3.85 eV) AND (−3.36 eV < EC ≤ −0.01 eV) AND (IP ≥ 7.45 eV). The corresponding subgroup contains 738 samples (14% of the whole population), and the distribution of SE within the subgroup is shown in Supplementary Fig. 10. Qualitatively, the first two conditions imply that the cohesive energy of the host material is larger in absolute value than the cohesive energy of the guest material. Physically this means that bonding between host atoms is preferred over bonding between guest atoms and therefore over intermediate host–guest binding. This leads to the tendency of maximizing the number of host–host bonds by pushing the guest atom to the surface. We note that this stabilization mechanism has already been discussed in the literature^{61}, and here we confirm it by data analysis. In addition, we find that stability of SAACs requires that the ionization potential of the guest atom is high. This can be explained by the fact that a lower IP results in a more pronounced delocalization of the *s* valence electrons of the guest atom, and partial charge transfer to the surrounding host atoms. 
The charge transfer favors a larger number of neighbors due to the increased Madelung potential, and therefore destabilizes the surface position of the guest atom.
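To make the notion of a selector concrete, the sketch below applies the SE selector quoted above as a conjunction of inequalities to a handful of rows. The rows and their feature values are hypothetical; only the thresholds are taken from the text.

```python
def se_selector(row):
    """Selector for low SE quoted in the text (all energies in eV)."""
    return (row["EC*"] <= -3.85            # strongly cohesive host metal
            and -3.36 < row["EC"] <= -0.01  # weakly cohesive guest metal
            and row["IP"] >= 7.45)          # high guest ionization potential

# Hypothetical feature rows, for illustration only
rows = [
    {"name": "x1", "EC*": -4.50, "EC": -2.00, "IP": 8.00},  # meets all three
    {"name": "x2", "EC*": -3.00, "EC": -2.00, "IP": 8.00},  # host not cohesive enough
    {"name": "x3", "EC*": -4.50, "EC": -5.00, "IP": 8.00},  # guest too cohesive
    {"name": "x4", "EC*": -4.50, "EC": -2.00, "IP": 6.00},  # IP too low
]

subgroup = [r["name"] for r in rows if se_selector(r)]
coverage = len(subgroup) / len(rows)  # fraction of the population selected
```

The same pattern applies to any selector in this section: each condition is a simple threshold on one primary feature, and the subgroup is the set of materials satisfying all of them.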

We calculate SE_{H} using SISSO models for SE and BE_{H} [see Eq. (3) in the “Methods” section]. Therefore, SGD for SE_{H} is performed using primary features present in the descriptors of both SE and BE_{H}. The top subgroup contains features related to binding of H to the host and guest metal atoms, e.g., (EB* < −5.75 eV) AND (EH* ≤ −2.10 eV) AND (EH ≥ −2.88 eV) AND (IP* ≤ 7.94 eV) AND (IP > 8.52 eV) AND (R ≥ 1.29 Å). However, the distribution of SE for this subgroup is very similar to the distribution of SE_{H}, which means that the stability of guest atoms at the surface is weakly affected by H adsorption when guest atoms are already very stable at the surface. The important effect of H adsorption is revealed when we find subgroups directly minimizing SE_{H} − SE (in this case only primary features that appear in the SISSO descriptor of BE_{H} are considered for SGD analysis). The top subgroup we found contains 72 samples (1.4% of the whole population) and is described by several degenerate selectors, in particular (−2.35 eV ≤ EH* ≤ −2.32 eV) AND (EC* > −2.73 eV) AND (EC < −5.98 eV) AND (*H* ≥ −5.12 eV). This is a very interesting and intuitive result. Distributions of SE_{H} and SE for this subgroup are shown in Supplementary Fig. 11. The SE for all materials in the subgroup is above 0 eV. However, SE_{H} is much closer to 0 eV, and is below 0 eV for a significant number of materials in this subgroup. The conditions on the cohesive energy of guest and host metals (very stable bulk guest metal and less stable bulk host metal) are reversed with respect to SE, i.e., adsorption of hydrogen strongly affects the systems where the guest atom is unstable at the surface. This increases the reactivity of the guest atom towards an H atom. The condition (EH* ≥ −2.35 eV) selects materials where interaction of H with a host atom is not too strong, so that H can bind with the guest atom and stabilize it at the surface. 
The condition (EH* ≤ −2.32 eV) makes the subgroup narrower, which further decreases the median difference SE_{H} − SE but has no additional physical meaning. The condition (*H* ≥ −5.12 eV) has a minor effect on the subgroup.

One of the top selectors (among several describing very similar data subsets) for minimizing \(\sqrt {\Delta G^2 + E_{\mathrm{b}}^2}\) (calculated at *T* = 300 K) is: (−2.85 eV ≤ DC ≤ 1.95 eV) AND (DT* ≤ −0.17 eV). The corresponding subgroup contains 1974 samples (38% of the whole population). The distribution of *E*_{b} within the subgroup is shown in Supplementary Fig. 10. The selector implies that systems providing a low barrier for H_{2} dissociation and, at the same time, balanced binding of H atoms to the surface are characterized by (i) a *d*-band center of the bulk guest metal around the Fermi level and (ii) a *d*-band center of the host surface top layer below the Fermi level. This can be understood as follows. Condition (i) implies that there is a significant *d*-electron density that can be donated to the adsorbed H_{2} molecule, facilitating its dissociation. A very similar (apart from slightly different numerical values) condition appears in the selector for the best subgroup for the *E*_{b} target property alone [(−2.05 eV ≤ DC ≤ 1.46 eV) AND (EC* ≥ −6.33 eV)]. Condition (ii) implies that the surface *d*-band is more than half-filled, so that additional electrons are available for transfer to the H_{2} molecule for its activation without causing excessive binding, therefore minimizing |∆*G*| in accordance with the Sabatier principle. Indeed, several subgroups of surfaces binding H atoms strongly (minimizing BE_{H}) are described by selectors including the condition DT* > −0.17 eV, which is exactly opposite to condition (ii). Analysis of BE_{H} and |∆*G*| also shows that the strong and intermediate binding of H atoms to the surface is fully controlled by the features of the host material.

We note that SGD is capable of finding several alternative subgroups, corresponding to different mechanisms of actuating interesting changes in target properties. These subgroups have a lower quality according to the chosen quality function, but they still contain useful information about a particular mechanism. In fact, they can be rigorously defined as top subgroups under the additional constraint of zero overlap (in terms of data points) with previously found top subgroups. Analysis of such subgroups can be a subject of future work. We also note that the quality function used in SGD is a parameter and can affect the subgroups found. It should be chosen based on the physical context of the problem. Exploring the role of different factors in the quality function and taking into account proposition degeneracy (no or minor effect of different conditions in the selectors due to correlation between the features) can significantly improve interpretability of the selectors. The interpretability also depends crucially on our physical understanding of the features and relations between them. Nevertheless, in combination with human knowledge, SGD analysis allows for the development of understanding that would not be possible without the help of artificial intelligence.

In summary, by combining first-principles calculations and the data-analytics approach SISSO, we have identified accurate and reliable models for the description of the hydrogen binding energy, dissociation energy, and guest-atom segregation energy for SAACs, which allow us to make fast yet reliable predictions of the catalytic performance of thousands of SAACs in hydrogenation reactions. The models correctly evaluate the performance of experimentally tested SAACs. By scanning more than five thousand SAACs with our models, we have identified over two hundred new SAACs with both improved stability and performance compared to the existing ones. We have also introduced a novel approach to a qualitative analysis of complex SISSO descriptors using the data-mining method subgroup discovery. It allows us to identify actuating mechanisms for desirable changes in the target properties, e.g., reaction barrier reduction or an increase in the catalyst’s stability, in terms of basic features of the material. Our methodology can be easily adapted to designing new functional materials for various applications.

## Methods

All first-principles calculations are performed with the revised Perdew-Burke-Ernzerhof (RPBE) functional^{62} as implemented in the all-electron full-potential electronic-structure code FHI-aims^{63}. The choice of functional is validated based on a comparison of calculated H_{2} adsorption energies to the available experimental results^{64} (see Supplementary Table 1). Nevertheless, it is expected that, because of the large set of systems inspected and the small variations introduced by the functional choice, the main trends will hold even when using another functional (see Supporting Information for more details on the computational setup). The climbing-image nudged elastic band (CI-NEB) algorithm is employed to identify the transition state structures^{65}.

BE_{H} are calculated using Eq. (1), where *E*_{H/support} is the energy of the total H/support system, *E*_{support} is the energy of the metal alloy support, and *E*_{H} is the energy of an isolated H atom.
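With the three energies defined above, the binding energy takes the standard form shown here. The sign convention (more negative values meaning stronger binding) is an assumption, chosen because it agrees with the way strong binding is described as minimizing BE_{H} elsewhere in the text:

\[
\mathrm{BE}_{\mathrm{H}} = E_{\mathrm{H/support}} - E_{\mathrm{support}} - E_{\mathrm{H}} \qquad (1)
\]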

The surface segregation energy in the dilute limit, SE, is defined as the energy difference of moving the single impurity from the bulk to the surface. In this work, it is calculated using Eq. (2), where *E*_{top-layer} and *E*_{nth-layer} correspond to the total RPBE energies of the slab with the impurity in the top and *n*th surface layer, respectively. The value of *n* is chosen so that the energy difference between *E*_{nth-layer} and *E*_{(n−1)th-layer} is less than 0.05 eV.
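From this definition, Eq. (2) can be written as the difference below. The ordering of the two terms is inferred from the discussion of Pd/Ag(110) and Pt/Ag(110), where a positive SE corresponds to the impurity preferring the bulk, and should be read as a reconstruction under that assumption:

\[
\mathrm{SE} = E_{\text{top-layer}} - E_{n\text{th-layer}} \qquad (2)
\]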

The surface segregation energy when surface H adatom is present (the H is put at the most stable adsorption site for each system), SE_{H}, is calculated using Eq. (3).

where Δ*E*_{H} = BE_{H-top-layer} − BE_{H-pure} is the H adatom-induced segregation energy change.

Here BE_{H-top-layer} and BE_{H-pure} are the hydrogen adatom binding energies with the impurity in the top layer and the BE_{H} of the pure system without impurity. Thus, the SE_{H} can be derived from the models of SE and BE_{H}.
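Read together, these definitions suggest an additive form for Eq. (3); the expression below is a reconstruction consistent with Δ*E*_{H} being described as the H adatom-induced change of the segregation energy, not a quotation:

\[
\mathrm{SE}_{\mathrm{H}} = \mathrm{SE} + \Delta E_{\mathrm{H}}, \qquad \Delta E_{\mathrm{H}} = \mathrm{BE}_{\mathrm{H\text{-}top\text{-}layer}} - \mathrm{BE}_{\mathrm{H\text{-}pure}} \qquad (3)
\]

Under this form, the values reported for Pd/Ag(110) (SE = 0.33 eV and SE_{H} = −0.10 eV) would correspond to Δ*E*_{H} = −0.43 eV, i.e., H binding more strongly when the Pd atom sits in the top layer than on the pure Ag(110) surface.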

Using first-principles inputs as training data, we have employed SISSO to single out a physically interpretable descriptor from a huge number of potential candidates. In practice, a huge pool of more than 10 billion candidate descriptors is first constructed iteratively by combining user-defined primary features with a set of mathematical operators. The number of times the operators are applied determines the complexity of the resulting descriptors. We consider up to three levels of complexity (feature spaces) Φ_{1}, Φ_{2}, and Φ_{3}. Note that a given feature space Φ_{n} also contains all of the lower rung (i.e., *n* − 1) feature spaces. Subsequently, the desired low-dimensional representation is obtained from this pool^{49}. The details of the feature space (Φ_{n}) construction and the descriptor identification processes can be found in the Supplementary Methods. The proper selection of primary features is crucial for the performance of SISSO-identified descriptors. Inspired by previous studies^{31,38}, we consider three classes of primary features (see Table 1) related to the metal atom, bulk, and surface. The more detailed description and values of all the primary features are given in the Supplementary Table 2, Supplementary Table 3, Supplementary Data 1, and Supplementary Data 2.
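The iterative construction of the feature spaces can be illustrated with a toy sketch. This is not the SISSO implementation: the operator set is reduced, no unit checking is done (so physically meaningless sums such as an energy plus a radius would not be filtered), each feature is a single number rather than a vector over all materials, and the primary feature values are hypothetical.

```python
from itertools import combinations

def next_rung(features):
    """Apply each operator once to every feature (unary) and pair (binary)."""
    new = dict(features)  # a rung always contains all lower-rung features
    for name, value in features.items():
        new[f"|{name}|"] = abs(value)
    for (n1, v1), (n2, v2) in combinations(features.items(), 2):
        new[f"({n1}+{n2})"] = v1 + v2
        new[f"({n1}-{n2})"] = v1 - v2
        new[f"({n1}*{n2})"] = v1 * v2
        if v2 != 0:  # skip undefined quotients
            new[f"({n1}/{n2})"] = v1 / v2
    return new

# Hypothetical primary feature values for a single material
phi0 = {"EC": -2.9, "EC*": -3.5, "IP": 7.6}
phi1 = next_rung(phi0)  # one application of the operators
phi2 = next_rung(phi1)  # two applications
```

Even with three primary features and five operators the pool grows rapidly from one rung to the next, which gives a sense of how 19 primary features and a richer operator set can reach billions of candidates at Φ_{3}.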

The selection of the promising candidates at various temperatures and hydrogen partial pressures is performed based on ab initio atomistic thermodynamics^{66}. H adsorption/desorption on SAAC surfaces as a function of temperature and H_{2} partial pressure (*T*, *p*) is characterized by the free energy of adsorption ∆*G*:

with the chemical potential of hydrogen \(\mu _{\mathrm{H}} = \frac{1}{2}\mu _{{\mathrm{H}}_2}\) obtained from:

where \({{\Delta }}\mu _{{\mathrm{H}}_2}\left( {T,p} \right) = \mu _{{\mathrm{H}}_2}\left( {T,p^0} \right) - \mu _{{\mathrm{H}}_2}\left( {T^0,p^0} \right) + k_{\rm{B}}T\,{\mathrm{ln}}(\frac{p}{{p^0}})\).

Here *T*^{0} = 298 K and *p*^{0} = 1 atm. The first two terms are taken from JANAF thermochemical tables^{67}. In the following, we set *p* = 1 atm.
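A minimal sketch of how these quantities might be evaluated is given below. Several things here are assumptions rather than statements from the text: the relation \(\mu _{{\mathrm{H}}_2} = E_{{\mathrm{H}}_2} + \Delta \mu _{{\mathrm{H}}_2}\), the resulting expression for ∆*G* in terms of BE_{H}, and the function names. The tabulated thermal term must be supplied by the caller; no table values are reproduced here. The default H_{2} binding energy of −4.52 eV is the experimental value used in this work.

```python
from math import log

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def delta_mu_h2(thermal_shift, temperature, pressure, p0=1.0):
    """Delta mu_H2(T, p) in eV.

    thermal_shift is mu_H2(T, p0) - mu_H2(T0, p0) in eV, looked up by the
    caller in thermochemical tables; pressures are in atm.
    """
    return thermal_shift + K_B * temperature * log(pressure / p0)

def delta_g(be_h, dmu_h2, e_bind_h2=-4.52):
    """Assumed form: dG = BE_H - (E_H2 - 2 E_H)/2 - dmu_H2/2, all in eV."""
    return be_h - 0.5 * e_bind_h2 - 0.5 * dmu_h2
```

At *p* = *p*^{0} the logarithmic term vanishes, so the temperature dependence of ∆*G* comes entirely from the tabulated thermal term.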

According to the Sabatier principle, the optimum heterogeneous catalyst should bind the reactants strongly enough to allow for adsorption, but weakly enough to allow for subsequent desorption^{25}. In this work, a BE_{H} range is defined by the conditions:

where \(E_{{\rm{H}}_2} - 2E_{\mathrm{H}}\) is the binding energy of the hydrogen molecule. The experimental value of −4.52 eV^{68} was used in this work.
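The link between BE_{H} and ∆*G* can be sketched numerically. The sketch assumes that BE_{H} is referenced to a free H atom, so that ∆*G* = BE_{H} − ½(E_{H2} − 2E_{H}) − ½∆μ_{H2}; the function names are hypothetical, and the thermal term must be supplied by the caller from thermochemical tables (no tabulated values are reproduced here).

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K
E_BIND_H2 = -4.52     # E_H2 - 2 E_H in eV, experimental value quoted in the text


def delta_mu_h2(thermal_term, temperature, pressure, p0=1.0):
    """Delta mu_H2(T, p) in eV.

    thermal_term is mu_H2(T, p0) - mu_H2(T0, p0) in eV, taken from tables
    by the caller; pressure and p0 are in the same units (atm in the text).
    """
    return thermal_term + K_B * temperature * math.log(pressure / p0)


def delta_g_from_be(be_h, dmu_h2):
    """Delta G in eV, assuming BE_H is referenced to a free H atom."""
    return be_h - 0.5 * E_BIND_H2 - 0.5 * dmu_h2


# At p = p0 the pressure term vanishes and only the thermal term remains.
dmu = delta_mu_h2(thermal_term=0.0, temperature=298.0, pressure=1.0)
print(delta_g_from_be(-2.26, dmu))
```

Under this assumption a BE_{H} of −2.26 eV corresponds to thermoneutral adsorption at *T*^{0} and *p*^{0}.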

The above conditions correspond to the free-energy bounds:

Conditions on the energy barrier (*E*_{b}) are defined by considering the Arrhenius-type dependence of the reaction rate on *E*_{b} and *T*. Assuming that acceptable barriers are below 0.3 eV for *T*^{0} = 298 K, we estimate the acceptable barrier at any temperature as:
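The Arrhenius argument can be worked through explicitly. The sketch assumes a temperature-independent prefactor and takes "acceptable" to mean that the rate at *T* is at least that of a 0.3 eV barrier at *T*^{0}; the symbol \(E_{\mathrm{b}}^{\max}\) is introduced here for illustration.

```latex
k(T) \propto \exp\!\left(-\frac{E_{\mathrm{b}}}{k_{\mathrm{B}}T}\right),
\qquad
\exp\!\left(-\frac{E_{\mathrm{b}}^{\max}(T)}{k_{\mathrm{B}}T}\right)
  = \exp\!\left(-\frac{0.3\ \mathrm{eV}}{k_{\mathrm{B}}T^{0}}\right)
\;\Longrightarrow\;
E_{\mathrm{b}} < E_{\mathrm{b}}^{\max}(T) = 0.3\ \mathrm{eV}\,\frac{T}{T^{0}}
```

Under these assumptions the acceptable barrier scales linearly with temperature, for example 0.6 eV at *T* = 596 K.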

Similarly, the bounds for SE_{H} are determined by requiring a minimum ratio of 10% between the top-layer and subsurface-layer dopant concentrations, assuming an Arrhenius-type relation with SE_{H} interpreted as an activation energy:
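Both temperature-dependent thresholds can be evaluated with a few lines of code. The sketch assumes that the concentration ratio follows exp(−SE_{H}/*k*_{B}*T*) ≥ 0.1, which gives SE_{H} ≤ *k*_{B}*T* ln 10, and that a positive SE_{H} disfavours the top layer; the function names are hypothetical.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K
T0 = 298.0            # reference temperature in K


def e_b_max(temperature, e_b_ref=0.3):
    """Largest acceptable barrier in eV, scaled linearly from e_b_ref at T0."""
    return e_b_ref * (temperature / T0)


def se_h_max(temperature, min_ratio=0.1):
    """Largest SE_H in eV that keeps exp(-SE_H / kT) >= min_ratio (assumed form)."""
    return K_B * temperature * math.log(1.0 / min_ratio)


for t in (200.0, 298.0, 500.0):
    print(t, round(e_b_max(t), 3), round(se_h_max(t), 3))
```

Both bounds loosen with increasing temperature, so a candidate rejected at room temperature may become acceptable at elevated temperature.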

The subgroup discovery was performed using the RealKD package (https://bitbucket.org/realKD/realkd/). Each feature was split into 15 subsets using the *k*-means clustering algorithm with *k* = 15. The borders between adjacent data clusters (a1, a2,…) are then used to construct inequalities (feature1 < a1), (feature2 ≥ a2), etc. While the final result might depend on the number of considered clusters, in our previous study we found that relatively high numbers of considered clusters provide essentially the same result^{60}. The candidate subgroups are built as conjunctions of the obtained simple inequalities. The main idea of SGD is that a subgroup is unique if the distribution of the data in it is as different as possible from the data distribution in the whole sampling. Here the data distribution is the distribution of a target property (\(\sqrt {\Delta G^2 + E_{\mathrm{b}}^2}\), SE, *E*_{b}, |∆*G*|, and BE_{H}). The uniqueness is evaluated with a quality function. In this study we used the following function:

where S is the subgroup, P is the whole sampling, s is the size, med and min are the median and minimal values of a target property, and amd is the absolute average deviation of the data around the median of the target property. With this function the algorithm searches for subgroups with lower values of the target properties. The search was performed with a Monte Carlo algorithm adapted for this purpose^{59}, in which a certain number of trial conjunctions (seeds) is first generated. The quality function is then calculated for each seed, accompanied by pruning of inequalities. We tested several numbers of initial seeds: 10,000, 30,000, 50,000, and 100,000. The subgroups with the highest overall quality-function values were selected.
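The two building blocks of this procedure, cut points from clustering and a quality function, can be sketched in a few lines. This is not the RealKD implementation. The quality function below is an assumed form built only from the ingredients named above (relative size, median shift normalized by med(P) − min(P), and reduction of the deviation around the median), in the spirit of dispersion-corrected subgroup discovery^{58}. The deterministic *k*-means start and the midpoint rule for cluster borders are also assumptions.

```python
from statistics import median


def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd k-means in one dimension; returns non-empty clusters, sorted."""
    xs = sorted(values)
    # Deterministic start: evenly spaced order statistics (an assumption).
    centers = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        updated = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted((c for c in clusters if c), key=min)


def cut_points(values, k):
    """Borders a1, a2, ... placed midway between adjacent clusters (assumed)."""
    clusters = kmeans_1d(values, k)
    return [(max(lo) + min(hi)) / 2 for lo, hi in zip(clusters, clusters[1:])]


def amd(values):
    """Mean absolute deviation of the data around their median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)


def quality(subgroup, population):
    """Assumed quality function: relative size x median shift x dispersion gain."""
    coverage = len(subgroup) / len(population)
    shift = (median(population) - median(subgroup)) / (
        median(population) - min(population))
    dispersion = 1 - amd(subgroup) / amd(population)
    return coverage * shift * dispersion


feature = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
print(cut_points(feature, 3))
target = list(range(1, 11))
print(quality([1, 2, 3], target))
```

Each cut point a_i yields the two propositions (feature < a_i) and (feature ≥ a_i), and a function of this form rewards subgroups that are large, shifted toward low target values, and narrowly distributed.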

## Data availability

All relevant data are available from the corresponding authors upon reasonable request.

## Code availability

FHI-aims: https://aimsclub.fhi-berlin.mpg.de.

## References

1. Qiao, B. et al. Single-atom catalysis of CO oxidation using Pt_{1}/FeO_{x}. *Nat. Chem.* **3**, 634 (2011).
2. Kyriakou, G. et al. Isolated metal atom geometries as a strategy for selective heterogeneous hydrogenations. *Science* **335**, 1209–1212 (2012).
3. Choi, K. I. & Vannice, M. A. CO oxidation over Pd and Cu catalysts V. Al_{2}O_{3}-supported bimetallic Pd–Cu particles. *J. Catal.* **131**, 36–50 (1991).
4. Greeley, J., Nørskov, J. K., Kibler, L. A., El‐Aziz, A. M. & Kolb, D. M. Hydrogen evolution over bimetallic systems: understanding the trends. *ChemPhysChem* **7**, 1032–1035 (2006).
5. Kamakoti, P. et al. Prediction of hydrogen flux through sulfur-tolerant binary alloy membranes. *Science* **307**, 569–573 (2005).
6. Darby, M. T., Réocreux, R., Sykes, E. C. H., Michaelides, A. & Stamatakis, M. Elucidating the stability and reactivity of surface intermediates on single-atom alloy catalysts. *ACS Catal.* **8**, 5038–5050 (2018).
7. Sun, G. et al. Breaking the scaling relationship via thermally stable Pt/Cu single atom alloys for catalytic dehydrogenation. *Nat. Commun.* **9**, 1–9 (2018).
8. Lucci, F. R. et al. Selective hydrogenation of 1,3-butadiene on platinum–copper alloys at the single-atom limit. *Nat. Commun.* **6**, 1–8 (2015).
9. Liu, J. et al. Tackling CO poisoning with single-atom alloy catalysts. *J. Am. Chem. Soc.* **138**, 6396–6399 (2016).
10. Tierney, H. L., Baber, A. E. & Sykes, E. C. H. Atomic-scale imaging and electronic structure determination of catalytic sites on Pd/Cu near surface alloys. *J. Phys. Chem. C* **113**, 7246–7250 (2009).
11. Boucher, M. B. et al. Single atom alloy surface analogs in Pd_{0.18}Cu_{15} nanoparticles for selective hydrogenation reactions. *Phys. Chem. Chem. Phys.* **15**, 12187–12196 (2013).
12. Pei, G. X. et al. Performance of Cu-alloyed Pd single-atom catalyst for semihydrogenation of acetylene under simulated front-end conditions. *ACS Catal.* **7**, 1491–1500 (2017).
13. Marcinkowski, M. D. et al. Selective formic acid dehydrogenation on Pt-Cu single-atom alloys. *ACS Catal.* **7**, 413–420 (2017).
14. Simonovis, J. P., Hunt, A., Palomino, R. M., Senanayake, S. D. & Waluyo, I. Enhanced stability of Pt-Cu single-atom alloy catalysts: in situ characterization of the Pt/Cu(111) surface in an ambient pressure of CO. *J. Phys. Chem. C* **122**, 4488–4495 (2018).
15. Marcinkowski, M. D. et al. Pt/Cu single-atom alloys as coke-resistant catalysts for efficient C–H activation. *Nat. Chem.* **10**, 325 (2018).
16. Pei, G. X. et al. Ag alloyed Pd single-atom catalysts for efficient selective hydrogenation of acetylene to ethylene in excess ethylene. *ACS Catal.* **5**, 3717–3725 (2015).
17. Duchesne, P. N. et al. Golden single-atomic-site platinum electrocatalysts. *Nat. Mater.* **17**, 1033–1039 (2018).
18. Li, Z. et al. Atomically dispersed Pt on the surface of Ni particles: synthesis and catalytic function in hydrogen generation from aqueous ammonia–borane. *ACS Catal.* **7**, 6762–6769 (2017).
19. Chen, C. H. et al. Ruthenium‐based single‐atom alloy with high electrocatalytic activity for hydrogen evolution. *Adv. Energy Mater.* **9**, 1803913 (2019).
20. Studt, F. et al. Identification of non-precious metal alloy catalysts for selective hydrogenation of acetylene. *Science* **320**, 1320–1322 (2008).
21. Greeley, J. & Mavrikakis, M. Alloy catalysts designed from first principles. *Nat. Mater.* **3**, 810–815 (2004).
22. Pallassana, V., Neurock, M., Hansen, L. B., Hammer, B. & Nørskov, J. K. Theoretical analysis of hydrogen chemisorption on Pd(111), Re(0001) and Pd_{ML}/Re(0001), Re_{ML}/Pd(111) pseudomorphic overlayers. *Phys. Rev. B* **60**, 6146 (1999).
23. Mavrikakis, M., Hammer, B. & Nørskov, J. K. Effect of strain on the reactivity of metal surfaces. *Phys. Rev. Lett.* **81**, 2819 (1998).
24. Xin, H., Vojvodic, A., Voss, J., Nørskov, J. K. & Abild-Pedersen, F. Effects of d-band shape on the surface reactivity of transition-metal alloys. *Phys. Rev. B* **89**, 115114 (2014).
25. Nørskov, J. K., Bligaard, T., Rossmeisl, J. & Christensen, C. H. Towards the computational design of solid catalysts. *Nat. Chem.* **1**, 37 (2009).
26. Montemore, M. M. & Medlin, J. W. Scaling relations between adsorption energies for computational screening and design of catalysts. *Catal. Sci. Technol.* **4**, 3748–3761 (2014).
27. Greeley, J. Theoretical heterogeneous catalysis: scaling relationships and computational catalyst design. *Annu. Rev. Chem. Biomol. Eng.* **7**, 605–635 (2016).
28. Abild-Pedersen, F. et al. Scaling properties of adsorption energies for hydrogen-containing molecules on transition-metal surfaces. *Phys. Rev. Lett.* **99**, 016105 (2007).
29. Michaelides, A. et al. Identification of general linear relationships between activation energies and enthalpy changes for dissociation reactions at surfaces. *J. Am. Chem. Soc.* **125**, 3704–3705 (2003).
30. Logadottir, A. et al. The Brønsted–Evans–Polanyi relation and the volcano plot for ammonia synthesis over transition metal catalysts. *J. Catal.* **197**, 229–231 (2001).
31. Andersen, M., Levchenko, S. V., Scheffler, M. & Reuter, K. Beyond scaling relations for the description of catalytic materials. *ACS Catal.* **9**, 2752–2759 (2019).
32. Calle-Vallejo, F., Loffreda, D., Koper, M. T. & Sautet, P. Introducing structural sensitivity into adsorption–energy scaling relations by means of coordination numbers. *Nat. Chem.* **7**, 403 (2015).
33. O’Connor, N. J., Jonayat, A., Janik, M. J. & Senftle, T. P. Interaction trends between single metal atoms and oxide supports identified with density functional theory and statistical learning. *Nat. Catal.* **1**, 531–539 (2018).
34. Xu, H., Cheng, D., Cao, D. & Zeng, X. C. A universal principle for a rational design of single-atom electrocatalysts. *Nat. Catal.* **1**, 339–348 (2018).
35. Curtarolo, S. et al. The high-throughput highway to computational materials design. *Nat. Mater.* **12**, 191–201 (2013).
36. Hammer, B. & Nørskov, J. K. Why gold is the noblest of all the metals. *Nature* **376**, 238–240 (1995).
37. Roling, L. T. & Abild‐Pedersen, F. Structure‐sensitive scaling relations: adsorption energies from surface site stability. *ChemCatChem* **10**, 1643–1650 (2018).
38. Li, Z., Wang, S., Chin, W. S., Achenie, L. E. & Xin, H. High-throughput screening of bimetallic catalysts enabled by machine learning. *J. Mater. Chem. A* **5**, 24131–24138 (2017).
39. Tran, K. & Ulissi, Z. W. Active learning across intermetallics to guide discovery of electrocatalysts for CO_{2} reduction and H_{2} evolution. *Nat. Catal.* **1**, 696–703 (2018).
40. Kitchin, J. R. Machine learning in catalysis. *Nat. Catal.* **1**, 230–232 (2018).
41. Li, Z., Wang, S. & Xin, H. Toward artificial intelligence in catalysis. *Nat. Catal.* **1**, 641–642 (2018).
42. Reuter, K. Ab initio thermodynamics and first-principles microkinetics for surface catalysis. *Catal. Lett.* **146**, 541–563 (2016).
43. Reuter, K., Frenkel, D. & Scheffler, M. The steady state of heterogeneous catalysis, studied by first-principles statistical mechanics. *Phys. Rev. Lett.* **93**, 116105 (2004).
44. Nørskov, J. Catalysis – calculations and concepts. *Adv. Catal.* **45**, 71 (2001).
45. Van Santen, R. A., Neurock, M. & Shetty, S. G. Reactivity theory of transition-metal surfaces: a Brønsted–Evans–Polanyi linear activation energy–free-energy analysis. *Chem. Rev.* **110**, 2005–2048 (2009).
46. Fajín, J. L., Cordeiro, M. N. D., Illas, F. & Gomes, J. R. Generalized Brønsted–Evans–Polanyi relationships and descriptors for O–H bond cleavage of organic molecules on transition metal surfaces. *J. Catal.* **313**, 24–33 (2014).
47. Viñes, F., Vojvodic, A., Abild-Pedersen, F. & Illas, F. Brønsted–Evans–Polanyi relationship for transition metal carbide and transition metal oxide surfaces. *J. Phys. Chem. C* **117**, 4168–4171 (2013).
48. Zhong, M. et al. Accelerated discovery of CO_{2} electrocatalysts using active machine learning. *Nature* **581**, 178–183 (2020).
49. Ouyang, R., Curtarolo, S., Ahmetcik, E., Scheffler, M. & Ghiringhelli, L. M. SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. *Phys. Rev. Mater.* **2**, 083802 (2018).
50. Thirumalai, H. & Kitchin, J. R. Investigating the reactivity of single atom alloys using density functional theory. *Top. Catal.* **61**, 462–474 (2018).
51. Bengio, Y. & Grandvalet, Y. No unbiased estimator of the variance of k-fold cross-validation. *J. Mach. Learn. Res.* **5**, 1089–1105 (2004).
52. Rodriguez, J. D., Perez, A. & Lozano, J. A. Sensitivity analysis of k-fold cross validation in prediction error estimation. *IEEE Trans. Pattern Anal. Mach. Intell.* **32**, 569–575 (2009).
53. Wong, T.-T. Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. *Pattern Recognit.* **48**, 2839–2846 (2015).
54. Rao, K. K., Do, Q. K., Pham, K., Maiti, D. & Grabow, L. C. Extendable machine learning model for the stability of single atom alloys. *Top. Catal.* **63**, 728–741 (2020).
55. Wrobel, S. in *European Symposium on Principles of Data Mining and Knowledge Discovery* 78–87 (Springer, 1997).
56. Friedman, J. H. & Fisher, N. I. Bump hunting in high-dimensional data. *Stat. Comput.* **9**, 123–143 (1999).
57. Atzmueller, M. Subgroup discovery. *Wiley Interdiscip. Rev.* **5**, 35–49 (2015).
58. Boley, M., Goldsmith, B. R., Ghiringhelli, L. M. & Vreeken, J. Identifying consistent statements about numerical data with dispersion-corrected subgroup discovery. *Data Min. Knowl. Discov.* **31**, 1391–1418 (2017).
59. Goldsmith, B. R., Boley, M., Vreeken, J., Scheffler, M. & Ghiringhelli, L. M. Uncovering structure-property relationships of materials by subgroup discovery. *N. J. Phys.* **19**, 013031 (2017).
60. Mazheika, A. et al. Ab initio data-analytics study of carbon-dioxide activation on semiconductor oxide surfaces. https://arXiv.org/1912.06515 (2019).
61. Chelikowsky, J. R. Predictions for surface segregation in intermetallic alloys. *Surf. Sci.* **139**, L197–L203 (1984).
62. Hammer, B., Hansen, L. B. & Nørskov, J. K. Improved adsorption energetics within density-functional theory using revised Perdew-Burke-Ernzerhof functionals. *Phys. Rev. B* **59**, 7413 (1999).
63. Blum, V. et al. Ab initio molecular simulations with numeric atom-centered orbitals. *Comput. Phys. Commun.* **180**, 2175–2196 (2009).
64. Silbaugh, T. L. & Campbell, C. T. Energies of formation reactions measured for adsorbates on late transition metal surfaces. *J. Phys. Chem. C* **120**, 25161–25172 (2016).
65. Henkelman, G., Uberuaga, B. P. & Jónsson, H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. *J. Chem. Phys.* **113**, 9901–9904 (2000).
66. Reuter, K., Stampfl, C. & Scheffler, M. in *Handbook of Materials Modeling* 149–194 (Springer, 2005).
67. *JANAF Thermochemical Tables* (eds Stull, D. R. & Prophet, H.) (National Bureau of Standards Publication, 1971).
68. Lide, D. R. *CRC Handbook of Chemistry and Physics: A Ready-Reference Book of Chemical and Physical Data* (CRC Press, 1995).

## Acknowledgements

S.V.L. is supported by Skolkovo Foundation Grant. The machine-learning methodology development was funded by RFBR and INSF, project number 20-53-56065. Y.G. is supported by the National Natural Science Foundation of China (11604357, 11574340). R.O. is supported by the National Key Research and Development Program of China (2018YFB0704400) and the Program of Shanghai Youth Oriental Scholars.

## Author information

### Contributions

S.V.L. created the idea and conceived the work. S.V.L. and Y.G. designed and supervised the project. S.V.L. and A.M. supervised the SGD analysis. R.O. supervised the SISSO analysis. Z.-K.H. performed all the calculations. Z.-K.H. and D.S. wrote the manuscript with inputs from all the authors. All authors contributed to the analysis and interpretation of the results. All the authors commented on the manuscript and have given approval to the final version of the manuscript.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

**Peer review information** *Nature Communications* thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Han, Z.-K., Sarker, D., Ouyang, R. *et al.* Single-atom alloy catalysts designed by first-principles calculations and artificial intelligence. *Nat Commun* **12**, 1833 (2021). https://doi.org/10.1038/s41467-021-22048-9
