Statistical learning goes beyond the d-band model providing the thermochemistry of adsorbates on transition metals


The rational design of heterogeneous catalysts relies on the efficient survey of mechanisms by density functional theory (DFT). However, massive reaction networks cannot be sampled effectively as they grow exponentially with the size of reactants. Here we present a statistical principal component analysis and regression applied to the DFT thermochemical data of 71 C\({}_{1}\)–C\({}_{2}\) species on 12 close-packed metal surfaces. Adsorption is controlled by covalent (\(d\)-band center) and ionic terms (reduction potential), modulated by conjugation and conformational contributions. All formation energies can be reproduced from only three key intermediates (predictors) calculated with DFT. The results agree with accurate experimental measurements having error bars comparable to those of DFT. The procedure can be extended to single-atom and near-surface alloys, reducing the number of explicit DFT calculations needed by a factor of 20, thus paving the way for a rapid and accurate survey of whole reaction networks on multimetallic surfaces.


Heterogeneous catalysis holds the key to solving fundamental sustainability issues by introducing renewable compounds as a source of chemicals and energy vectors1,2,3. The rational search for new catalysts benefits from the extensive use of density functional theory (DFT) and kinetic models derived from it3,4,5,6,7,8,9,10. This procedure requires sampling the reaction network that links reactants, intermediates, and products through transition states. For large molecules, such as those involved in biomass valorization processes, the number of intermediates and transition states grows exponentially with the molecular size, rendering the computational screening of new catalysts impractical. For instance, the decomposition of medium-size C\({}_{5}\)–C\({}_{6}\) sugar alcohols encompasses more than \(1{0}^{4}\) species and \(1{0}^{5}\) transition states in its reaction network11. The positive aspect is that kinetic barriers are linked to the formation energies of different intermediates4,12. Thus, the screening may be reduced to obtaining the thermodynamics of the adsorbed species and coupling them with linear-scaling relationships and microkinetic models to simulate operation conditions. The upgrade of biomass-derived molecules is often done by metals and alloys1,13, and lately attention has been drawn to the versatile properties of single-atom alloys (SAAs) and near-surface alloys (NSAs)9,14,15,16,17,18,19,20,21,22. The number of possible combinations is again vast and some have shown an almost continuous range of adsorption strengths21. To this end, new thermochemical models based on statistical learning may allow a rapid survey of the energies of adsorbed species for faster screening8,18,23,24,25,26,27.

The pioneering work by Benson established the basis for thermochemical scaling relationships of gas-phase molecules as early as the 1960s28. In this formulation, the formation energy of a hydrocarbon or oxygenated molecule is obtained as the sum of the energies stored in C–C, C–O, C–H, and O–H bonds, considering also the contribution from rings, unsaturations, and radicals28. Despite its simplicity, Benson’s model has an impressive accuracy for small molecules such as hydrocarbons, alcohols, and ethers, the formation energy of which is predicted with errors lower than 0.05 eV29.

When molecules adsorb on metal surfaces, the interaction has covalent, ionic, and dispersion contributions. The most studied term is the covalency arising from the coupling of the metal \(sp\)- and \(d\)-states with the adsorbate. The \(sp\) part depends on the species but is rather constant across the metals. The \(d\) part gives the metal-to-metal variability and comes from the \(d\)-band center and filling30,31. As a consequence, the adsorption energy of a molecular fragment AH\({}_{x}\) is a linear function of that of its heteroatom, A, and the slope accounts for the valence of AH\({}_{x}\)32. Further linear dependencies have been identified for heteroatoms belonging to the same group in the periodic table, e.g., P* scales with N*33. In addition, the accuracy of the above models can be improved by using site-specific adsorption rules22,32 and the dependence on the local coordination of the adsorption sites22,34,35. In the particular case of very small nanoparticles, the activity modulation depends linearly on the local electrostatic potential36.

These adsorption energy models have been extended to multifunctionalized molecules by combining the heteroatom scalings and the Benson model28,37,38. The combination can be centered either on individual bond energies37 or on the coordination environment of each heteroatom38. Attempts to generalize simplified thermochemical models to other materials are less frequent, but for perovskites and transition metal oxides39,40,41, electronic parameters such as the occupancy of e\({}_{g}\) orbitals and the covalency for the oxygen-transition metal bond were deemed descriptors for their catalytic activity.

Still, the reactivity on metals has provided the largest amount of DFT data, and benchmarks on the thermochemistry demonstrate the robustness of the results42. Thus, large FAIR databases43 open alternative paths to rationally design heterogeneous catalysts10, by improving existing thermochemical models and generating new ones through statistical learning. For instance, the formation energies of large molecules in gas phase can be retrieved from neural networks with an accuracy comparable to DFT, 0.04 eV MAE44, thus beyond Benson’s model29. An alternative approach is to predict the formation energy of a few reactivity descriptors from geometric and electronic features, such as atomic radius, local electronegativity, ionic potential, and the coordination of the active site9,18,22,45. In particular, a recent SISSO study considers features from the adsorbate, the metal, and the adsorption site45, providing excellent predictions with one DFT evaluation for each metal. However, the method does not belong to the explanatory class of machine-learning techniques and thus its results are difficult to interpret. A key step to generalize statistical learning models is to extract physical insights from them. For instance, a feature importance analysis18 rediscovered the differentiated roles of \(d\)- and \(sp\)-band contributions31,46. Other studies have highlighted the role of electronic and redox descriptors for the thermochemistry of transition metal complexes47 and oxide-supported single-metal atoms25. Yet, the potential of statistical learning in heterogeneous catalysis remains largely unexplored9,26.

In the present work, we applied principal component analysis and regression (PCA, PCR) to a set of formation energies obtained by DFT. From the descriptors so obtained, we retrieved the \(d\)-band center and the redox ability of the metal as the main controllers of the thermochemistry, along with conjugation and conformational effects. With these descriptors and a minimum set of DFT energy evaluations (around two thousand), we predicted a full thermochemical database of 31,000 species adsorbed on pure metals, SAAs, and NSAs. The methodology reduces the number of explicit DFT evaluations by a factor of 20 while keeping its accuracy. As the procedure is modular, it can be adapted or extended to other systems in heterogeneous catalysis.


Interpretation of thermochemical data by PCA

The first step consists in the generation of a well-converged database of formation energies on late transition metals: Cu, Ag, Au, Ni, Pd, Pt, Rh, Ir, Ru, Os, Zn, and Cd. This was done using the PBE-D2 functional following the gold standard in DFT42. The formation energies, \({E}_{{{\rm{C}}}_{x}{{\rm{H}}}_{y}{{\rm{O}}}_{z}* }\), are referred to gas-phase reservoirs of methane, hydrogen, and water, Eqs. (1)–(2). In all cases, the lowest energy conformation was employed to ensure that the PCA includes the information corresponding to conjugation and conformational changes. The data are arranged in a matrix E in which the rows span the metals \(i\) and the columns correspond to the adsorbates \(j\). As the data matrix is complete, meaning that it does not have any missing points, it is suitable for PCA. PCA is a statistical technique that reduces the dimensionality of E by projecting it along the directions of greatest variability, thus reducing its noise and aiding its interpretability48. The process, summarized in Fig. 1 and detailed in the Supplementary Methods, proceeds as follows: the average adsorption energy for each intermediate, \({\mu }_{j}\), is employed to center the adsorption matrix E and get X. We kept the units of X as eV. This matrix is multiplied at the left by its own transpose to get the covariance matrix C, which is then diagonalized. The eigenvalues of the diagonal matrix D are all positive and are placed in decreasing order, consistently with the eigenvector matrix V. Afterwards, V is truncated to \({k}_{\max }\) principal components and multiplied at the left by X to get W and T. These matrices contain the descriptors for the adsorbates and metals, \({w}_{kj}\) and \({t}_{ik}\), respectively. The adsorption energies can then be retrieved from Eq. (3).

$$x{{\rm{CH}}}_{4}+\left(-2x+\frac{1}{2}y-z\right){{\rm{H}}}_{2}+z{{\rm{H}}}_{2}{\rm{O}}+{* }\, \longrightarrow \, {{\rm{C}}}_{x}{{\rm{H}}}_{y}{{\rm{O}}}_{z}^{* } \tag{1}$$
$${E}_{{{\rm{C}}}_{x}{{\rm{H}}}_{y}{{\rm{O}}}_{z}^{* }}={E}_{{{\rm{C}}}_{x}{{\rm{H}}}_{y}{{\rm{O}}}_{z}^{* }}^{{\rm{vasp}}}-x{E}_{{{\rm{CH}}}_{4}}^{{\rm{vasp}}}+\left(2x-\frac{1}{2}y+z\right){E}_{{{\rm{H}}}_{2}}^{{\rm{vasp}}}-z{E}_{{{\rm{H}}}_{2}{\rm{O}}}^{{\rm{vasp}}}-{E}_{* }^{{\rm{vasp}}} \tag{2}$$
$${\hat{E}}_{ij}={t}_{i1}{w}_{1j}+{t}_{i2}{w}_{2j}+\cdots +{t}_{ik}{w}_{kj}+\cdots +{t}_{i{k}_{\max }}{w}_{{k}_{\max }j}+{\mu }_{j} \tag{3}$$
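The referencing in Eqs. (1)–(2) can be sketched as a short Python helper. This is an illustrative sketch only: the function name `formation_energy` and all numerical values are placeholders invented for the example and are not DFT results from this work.

```python
def formation_energy(e_ads, e_slab, e_ch4, e_h2, e_h2o, x, y, z):
    """Formation energy of adsorbed CxHyOz* relative to CH4, H2 and H2O, Eq. (2)."""
    # Number of H2 molecules released when CxHyOz* is built from the reservoirs.
    n_h2 = 2 * x - 0.5 * y + z
    return e_ads - x * e_ch4 + n_h2 * e_h2 - z * e_h2o - e_slab


# Placeholder total energies in eV (made up for illustration, not DFT data).
E_SLAB, E_CH4, E_H2, E_H2O = -200.0, -24.0, -6.8, -14.2

# O* (x=0, y=0, z=1): H2O + * -> O* + H2
e_o = formation_energy(-205.0, E_SLAB, E_CH4, E_H2, E_H2O, x=0, y=0, z=1)

# *CH3 (x=1, y=3, z=0): CH4 + * -> CH3* + 1/2 H2
e_ch3 = formation_energy(-220.0, E_SLAB, E_CH4, E_H2, E_H2O, x=1, y=3, z=0)

print(round(e_o, 3), round(e_ch3, 3))
```

Because every species is referred to the same three gas-phase reservoirs, the resulting energies can be placed side by side in a single metal-by-adsorbate matrix.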
Fig. 1

Principal component analysis (PCA) and regression (PCR). The formation energy of species \(j\) on metal \(i\) is obtained from DFT and Eqs. (1)-(2). These energies are grouped in thermochemistry matrices, in red, and are approximated following PCA or PCR, in green. PCR was validated by leaving-one-metal-out of the data matrix (L1O). Variables associated with metals and species are shown in blue and orange, respectively. In black, variables associated with mathematical procedures. A data flow diagram, including the sizes of all matrices in this study, is shown in Supplementary Fig. 3
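The PCA steps described above (centering, covariance, diagonalization, truncation, and the reconstruction of Eq. (3)) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the covariance is left unnormalized (a constant factor would not change the eigenvectors), the function names are hypothetical, and the input is a random synthetic matrix with the same 12 × 71 shape as E, not the DFT data.

```python
import numpy as np


def pca_descriptors(E, k_max):
    """Return metal scores T, adsorbate weights W and column means mu."""
    mu = E.mean(axis=0)                 # average energy of each adsorbate j
    X = E - mu                          # centered matrix, still in eV
    C = X.T @ X                         # (unnormalized) covariance matrix
    eigvals, V = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]   # reorder to decreasing variance
    V_k = V[:, order[:k_max]]           # keep the leading k_max eigenvectors
    T = X @ V_k                         # t_ik: one row of descriptors per metal
    W = V_k.T                           # w_kj: one column of descriptors per adsorbate
    return T, W, mu


def reconstruct(T, W, mu):
    """Eq. (3): E_hat_ij = sum_k t_ik * w_kj + mu_j."""
    return T @ W + mu


# Synthetic stand-in for the 12 x 71 matrix (random numbers, not DFT data).
rng = np.random.default_rng(0)
E = rng.normal(size=(12, 2)) @ rng.normal(size=(2, 71)) + rng.normal(size=71)
E += 0.05 * rng.normal(size=E.shape)    # small scatter beyond two components

T, W, mu = pca_descriptors(E, k_max=2)
E_hat = reconstruct(T, W, mu)
mae = np.abs(E_hat - E).mean()
print(T.shape, W.shape, round(mae, 3))
```

Since the rows of W are orthonormal eigenvectors, the descriptors of a new metal can later be obtained by projecting its centered energies onto the same axes.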

The accuracy of Eq. (3) is given by the number of principal components selected; that is, the number of \({t}_{ik}{w}_{kj}\) terms. To assess the minimum number of terms in the expansion, \({k}_{\max }\), we have evaluated two criteria: the MAE and the variance, see Supplementary Table 3. In the first case, the MAE stagnates at two terms, at a value that falls within the DFT accuracy. At that point, 98.1% of the variance is already captured. As a result, only two principal components, \({k}_{\max }=2\), will be used from now on and only two descriptors are needed for metals \(\{({t}_{i1},{t}_{i2})\}\) and adsorbates \(\{({w}_{1j},{w}_{2j})\}\), Fig. 2a, b. These descriptors wrap up the causes of variability in metal-adsorbate bond energies.
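The two criteria used to choose \({k}_{\max }\) can be scanned in a few lines. The sketch below is an assumption-laden illustration: it uses a singular value decomposition, which yields the same principal axes as diagonalizing the covariance matrix, the function name `scan_components` is hypothetical, and the data are synthetic random numbers rather than the DFT matrix.

```python
import numpy as np


def scan_components(E):
    """Cumulative explained variance and reconstruction MAE for k = 1..rank."""
    mu = E.mean(axis=0)
    X = E - mu
    # The SVD of X gives the same principal axes as diagonalizing X^T X.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    maes = []
    for k in range(1, len(s) + 1):
        E_hat = (U[:, :k] * s[:k]) @ Vt[:k] + mu   # rank-k version of Eq. (3)
        maes.append(np.abs(E_hat - E).mean())
    return explained, np.array(maes)


# Synthetic 12 x 71 matrix with two dominant components (not DFT data).
rng = np.random.default_rng(0)
E = rng.normal(size=(12, 2)) @ rng.normal(size=(2, 71)) + rng.normal(size=71)
E += 0.05 * rng.normal(size=E.shape)

explained, maes = scan_components(E)
for k in range(1, 5):
    print(k, round(explained[k - 1], 4), round(maes[k - 1], 4))
```

In such a scan, \({k}_{\max }\) would be taken as the smallest k beyond which the MAE stops improving appreciably, which is the stagnation criterion described in the text.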

Fig. 2

Descriptors from principal component analysis. Descriptors for a metals \(\{({t}_{i1},{t}_{i2})\}\) and b adsorbates \(\{({w}_{1j},{w}_{2j})\}\), obtained by PCA. The color scale in b measures the robustness of each species as a predictor (\({\iota }_{j}\)). Those marked in brown are more suitable predictors, as they have large projections in at least one \({w}_{kj}\) and a low SD, Supplementary Eq. 12. Those marked in yellow are the least suitable. Relevant species are labeled

The first metal descriptor, \({t}_{i1}\), spans the largest variability, 17 eV, whereas the second one, \({t}_{i2}\), accounts for only one-third of this energy span, Fig. 2a. The metals tend to be ordered left-to-right by their position in each period of the periodic table. Those more resistant to oxidation, such as Pt and Au, appear in the topmost region.

All adsorbates appear on the right side of Fig. 2b, meaning that their first weight, \({w}_{1j}\), is always positive. The largest terms belong to highly unsaturated carbonaceous species. Therefore, the first term, \({t}_{i1}{w}_{1j}\), can be interpreted as the affinity of metal \(i\) to form covalent bonds with intermediate \(j\). For late transition metals (groups 8–11), this characteristic can be mapped to the \(d\)-band center, Fig. 3a, which modulates the adsorption strength on such surfaces30,31. However, the \(d\)-band model cannot describe several systems that are relevant for catalysis: it does not apply to adsorbates with an almost completely filled valence shell, such as *OH, adsorbing on alloys with almost-filled \(d\)-bands46. Besides, the \(d\)-band model cannot describe metals from group 12 (Zn, Cd) and above, which can be components in high-entropy alloys suited for electrocatalysis21. The second weight for the adsorbates, \({w}_{2j}\), is positive for those that bind through an oxygen atom, such as O* and *OCH\({}_{2}\)CH\({}_{2}\)O*, and negative for species that bind through a *COH center. Therefore, the second term, \({t}_{i2}{w}_{2j}\), can be interpreted as the ionicity of the metal-adsorbate bond. As such, \({t}_{i2}\) can be mapped to the reduction potential, Fig. 3b. Cross-correlations do not appear, Supplementary Fig. 1, meaning that \({t}_{i1}{w}_{1j}\) and \({t}_{i2}{w}_{2j}\) summarize rather independently the covalent and ionic contributions, respectively.

Fig. 3

Interpretation of metal descriptors in physical terms. First and second descriptors for the metals, \({t}_{i1}\) and \({t}_{i2}\), plotted against a the \(d\)-band center and b the reduction potential50, respectively. The inset shows that \(d\)-band center controls the adsorption energy on late transition metals (groups 8–11)30,31, but it cannot describe the behavior of Zn and Cd. Additional data and plots are provided in Supplementary Table 2 and Supplementary Figs. 1 and 5

The fact that 98.1% of the variance is captured by the aforementioned descriptors implies that other contributions to the thermochemistry, such as conjugation, conformational changes (different adsorption sites), and dispersion (van der Waals), are already included, as they are related to the two major covalent and ionic terms. For instance, CHCH\({}_{2}\) can adsorb as monodentate (*CH=CH\({}_{2}\)), tridentate (**CH–*CH\({}_{2}\)), or intermediate structures. The most stable conformation depends on the metal affinity to carbon, Supplementary Fig. 4. This means that the adsorbate valence is not necessarily an integer, as is normally assumed in heteroatom scaling relationships32. Also, small molecules such as *OH can adsorb on fcc, hcp, bridge, and top-tilted sites49, and the preferred site can differ even for chemically similar metals, such as Rh (fcc sites) and Ir (bridge), or Ru (hcp) and Os (bridge). In other words, the most stable conformation of the adsorbate is defined by the metal, thus highlighting the interplay within the metal-adsorbate system.

To provide a rapid survey on the adsorption energies of surface species, it is desirable to calculate by DFT only a small subset of intermediates, called predictors. Their number should be at least equal to the number of principal components \({k}_{\max }\)27,48. Choosing the predictors is not evident from Fig. 2b alone. For instance, the simplest set would contain only the heteroatoms C* and O*27, but \({E}_{{\rm{O}}}\) and \({E}_{{\rm{C}}}\) are mildly codependent. This codependence appears in many pairs of adsorbates, as can be seen in Supplementary Fig. 2, although its origin was unknown33,51. Therefore, the predictor set can be expanded to ensure that the full DFT database, (E in Fig. 1), is properly estimated (\(\hat{{\bf{E}}}\)).

To select the proper predictor set, we calculated the error matrix \((\hat{{\bf{E}}}-{\bf{E}})\) that compares the potential energies obtained by DFT to the ones estimated by Eq. (3). Then, the robustness of each intermediate as a predictor, \({\iota }_{j}\), was measured following the procedure detailed in Supplementary Methods. This \({\iota }_{j}\) is indicated by the color scale in Fig. 2b, in which dark brown indicates at least one large \({w}_{kj}\) and a low SD. Therefore, such species are better at predicting the thermochemistry of others. Interestingly, we noticed that when the prediction error of O* is positive, that of *OH tends to be negative and vice versa. We also chose *CCHOH as predictor, as it has a pair (\({w}_{1j},{w}_{2j}\)) that is almost orthogonal to O* and *OH, Fig. 2b. In summary, the full thermochemistry of a given metal can be estimated from two principal components obtained from the formation energies of three27 predictors (O*, *OH, and *CCHOH) that capture most of the variability of the original data matrix. The principal components define two descriptors for both metals (\({t}_{i1}\), \({t}_{i2}\)) and molecules (\({w}_{1j}\), \({w}_{2j}\)).

Fast and accurate prediction of thermochemistry via PCR

To assess the accuracy of the statistical learning tools, we performed two sets of tests, using PCA and PCR-L1O, respectively (Fig. 4). In both cases, we compared the results of the estimated formation energies from Eq. (3) (\(\hat{{\bf{E}}}\), Fig. 1) with those obtained by DFT (E, Fig. 1). For PCA, the training and prediction sets are equal, as they contain the full DFT set: 71 molecules and 12 metals. PCA estimates all the energies within ±0.50 eV and 98% of them lie within ±0.30 eV, with a MAE of 0.08 eV (Fig. 4a, b). The test on the PCR is stricter as the training set is reduced to only three molecules as predictors. We followed a leave-one-out (L1O) validation in which we took a subset of 11 metals (training set) to predict the thermochemistry of the 12\({}^{{\rm{th}}}\) one (validation set) (Fig. 1). This matrix of energies is split into two submatrices containing just the three predictors (\({\bf{E}}^{\prime}\)) and the remaining species (\({\bf{E}}^{\prime\prime}\)). In total, 792 DFT evaluations are required to get \({\bf{E}}^{\prime}\), corresponding to 11 metals \(\times\) (71 species plus the empty surface). For the validation set (\({{\bf{E}}}_{{\rm{val}}}\)), only four DFT evaluations are needed and correspond to the clean surface and the three predictors. The PCR-L1O starts by applying PCA on the \({\bf{E}}^{\prime}\) submatrix to obtain \({\bf{T}}^{\prime}\) and \({\bf{W}}^{\prime}\). Then the descriptors for the metals in the validation set, \({t}_{1,{\rm{val}}}\) and \({t}_{2,{\rm{val}}}\), are estimated from the DFT formation energies of the three predictors. The descriptors for the remaining species (\({w}_{1j,{\rm{val}}}\), \({w}_{2j,{\rm{val}}}\), and \({\mu }_{j,{\rm{val}}}\)) are then found via linear regression of Eq. (4) on the training set. Finally, the thermochemistry of the validation set is predicted from Eq. (5). This procedure is sequentially run to consider every metal independently as a validation set. Fig. 4c, d compares the results from PCR-L1O with DFT data, showing that the MAE increases to 0.12 eV and the population in the central bars is about 25% smaller than for PCA. Still, 98% of the estimated energies lie within ±0.40 eV. Thus, the PCR-L1O methodology only increases the error span by 0.10 eV. If C* and O* were used as predictors instead of O*, *OH, and *CCHOH, the MAE would have risen to 0.16 eV and the maximum error to ±1.00 eV. Writing the formation energies as a linear regression of the \(d\)-band center and the oxidation potential (excluding Os, Cd, and Zn) would increase the MAE to 0.18 eV and the maximum error to ±0.80 eV. Thus, higher accuracies can be obtained from predictors calculated by DFT than by using tabulated data.

$${E}_{ij}^{{\prime\prime} }={t}_{i1}^{{\prime} }{w}_{1j,{\rm{val}}}+{t}_{i2}^{{\prime} }{w}_{2j,{\rm{val}}}+\cdots +{t}_{i{k}_{\max }}^{{\prime} }{w}_{{k}_{\max }j,{\rm{val}}}+{\mu }_{j,{\rm{val}}} \tag{4}$$
$${\hat{E}}_{ij,{\rm{val}}}={t}_{i1,{\rm{val}}}{w}_{1j,{\rm{val}}}+{t}_{i2,{\rm{val}}}{w}_{2j,{\rm{val}}}+\cdots +{t}_{i{k}_{\max },{\rm{val}}}{w}_{{k}_{\max }j,{\rm{val}}}+{\mu }_{j,{\rm{val}}} \tag{5}$$
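The PCR-L1O workflow around Eqs. (4)–(5) can be sketched as below. This is one possible reading of the procedure, not the authors' implementation: the projection of the validation metal onto the training axes, the least-squares fit with an intercept, the function names, and the column indices of the predictors are all assumptions, and the data are synthetic random numbers built from exactly two components.

```python
import numpy as np


def pca(E, k_max):
    """PCA by diagonalizing X^T X; returns scores T, weights W and means mu."""
    mu = E.mean(axis=0)
    X = E - mu
    eigvals, V = np.linalg.eigh(X.T @ X)
    V_k = V[:, np.argsort(eigvals)[::-1][:k_max]]
    return X @ V_k, V_k.T, mu


def pcr_predict(E_train, e_val_pred, pred_idx, k_max=2):
    """Predict all species on one new metal from its predictor energies only."""
    n_species = E_train.shape[1]
    rest_idx = [j for j in range(n_species) if j not in pred_idx]
    E1 = E_train[:, pred_idx]           # E'  : predictors on the training metals
    E2 = E_train[:, rest_idx]           # E'' : remaining species
    T1, W1, mu1 = pca(E1, k_max)        # T' and W'
    t_val = (e_val_pred - mu1) @ W1.T   # project the new metal on the same axes
    # Eq. (4): least-squares fit of w_{kj,val} and mu_{j,val} for each species j.
    A = np.hstack([T1, np.ones((T1.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, E2, rcond=None)
    # Eq. (5): apply the fitted coefficients to the new metal's descriptors.
    e_hat = np.empty(n_species)
    e_hat[rest_idx] = np.append(t_val, 1.0) @ coef
    e_hat[pred_idx] = e_val_pred        # predictors are taken directly from DFT
    return e_hat


def leave_one_metal_out(E, pred_idx, k_max=2):
    """Use each metal in turn as the validation set."""
    E_hat = np.empty_like(E)
    for i in range(E.shape[0]):
        train = np.delete(E, i, axis=0)
        E_hat[i] = pcr_predict(train, E[i, pred_idx], pred_idx, k_max)
    return E_hat


# Synthetic, exactly two-component data (random numbers, not DFT energies).
rng = np.random.default_rng(1)
E = rng.normal(size=(12, 2)) @ rng.normal(size=(2, 71)) + rng.normal(size=71)
pred_idx = [0, 1, 2]                    # hypothetical columns of the three predictors
E_hat = leave_one_metal_out(E, pred_idx)
max_err = np.abs(E_hat - E).max()
print(max_err)
```

On data that follow a two-component model exactly, this scheme recovers the left-out metal to numerical precision; with real DFT energies the residual reflects whatever the two components do not capture.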
Fig. 4

Accuracy assessment of PCA and PCR. a Error distribution and b cumulative errors for the PCA, taking the DFT data of the 71 molecules on the 12 metals. c, d The corresponding values for PCR-L1O, using only O*, *OH, and *CCHOH as predictors, and leaving-one-metal-out of the pool (L1O). The inset shows the size of the corresponding input (red) and output (green) energy matrices

The best-predicted metals are Rh and Ir, with MAEs of 0.05 and 0.08 eV, whereas the worst are Zn and Ag, with MAEs of 0.18 and 0.17 eV. Zn (Ag) has the most exothermic (endothermic) average formation energy. Therefore, the binding energy of their adsorbates is normally underestimated (overestimated). In general, the predicted values are least accurate for species that have highly endothermic formation energies, such as *CHOHCHOH and *CCH. However, these species would play a minor role when large reaction networks are taken as a whole, particularly when different competitive paths are wrapped through microkinetic models7. The PCR-L1O method was then benchmarked against experimental and DFT thermochemical data (Fig. 5a)52,53. Our estimates, shown in orange, lie close to the 1:1 line. The error bars lie within the \(\pm\!0.25\) eV typical of DFT predictions, as shown by PBE and BEEF-vdW values obtained by other groups (in dark and light gray, respectively)52,53.

Fig. 5

Prediction of adsorption energies on metal alloys through PCR. a PCR-L1O estimates of formation energies plotted against experimental data of Supplementary Table 4 (orange circles)53. Open circles stand for the predicted values when coverage effects are not considered. DFT data from ref. 52 are provided as benchmark for PBE and BEEF-vdW density functionals, in dark and light gray, respectively. b Single-atom (SAA), c sub-surface (NSA-SS), and d overlayer (NSA-OL) alloys. Taking the DFT thermochemistry of 71 adsorbates on 12 pure metals and 3 predictors for each alloy (*O, *OH, and *CCHOH), the adsorption energies for the remaining 68 adsorbates were predicted. Then, 1% of these species were randomly selected as validation set, calculated by DFT, and benchmarked in b–d

Finally, we have used PCR to generate the first full thermochemical database containing SAAs and NSAs. The SAAs were obtained by replacing one atom with the guest element, whereas the NSAs14 were generated by substituting either the overlayer or the sub-surface layer (Fig. 5b–d). The host elements were those listed in Fig. 4 and the guest elements also included Fe, Co, and Re, for a total of \(12\times (15-1)=168\) SAAs and 336 NSAs. For each alloy, the prediction of the thermochemistry required four DFT evaluations, corresponding to the clean surface and three predictors: O*, *OH, and *CCHOH. The alloys whose structures did not converge are listed in the Supplementary Methods and were removed from the pool, leaving 165 SAAs and 278 NSAs. These results, collected in matrix \({{\bf{E}}}_{{\rm{val}}}\), required 1772 DFT evaluations. With these data, and taking the pure metals as a training set (12 \(\times\) 71 = 852 DFT evaluations distributed between matrices E and E′), a PCR was applied to predict the thermochemistry for the remaining 68 species on each alloy. As a result, a database of 31,453 formation energies was generated (Supplementary File: alloys-prediction.csv). To benchmark this database, 1% of these species were randomly selected, calculated by DFT, and compared with the predictions of PCR. The benchmark, shown in Fig. 5b–d, excluded the species that belong to the predictor set. In all cases, the estimates are within the DFT accuracy, with a MAE around 0.19 eV. This shows the high predictive power of PCR, as the number of explicit DFT calculations is reduced 20 times for each new alloy. Moreover, the PCA/PCR employed above can be extended to account for other effects, likely appearing as new components in the expansion in Eq. (5). Examples of these effects are coverage contributions54, transition state search18, site coordination22,35,45, and solvation7,55.
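The bookkeeping behind these counts can be checked with a few lines of arithmetic. The sketch only uses numbers quoted in the text; reading the 336 NSAs as two variants (overlayer and sub-surface) per host-guest pair, and counting a full DFT treatment as 71 species plus the clean surface per alloy, are assumptions made for the illustration.

```python
hosts, elements = 12, 15             # 12 hosts; the guests add Fe, Co and Re
n_saa = hosts * (elements - 1)       # one guest element per host, excluding itself
n_nsa = 2 * n_saa                    # assumed: overlayer and sub-surface variants

converged = 165 + 278                # SAAs + NSAs kept after convergence checks
species = 71
dft_per_alloy = 1 + 3                # clean surface + O*, *OH and *CCHOH

explicit = converged * dft_per_alloy     # DFT evaluations actually run on alloys
database = converged * species           # formation energies in the database
full_dft = converged * (species + 1)     # assumed cost of treating every species by DFT

print(n_saa, n_nsa, explicit, database, round(full_dft / explicit, 1))
```

Under these assumptions the per-alloy saving is 72/4 = 18, which is of the order of the 20-fold reduction quoted in the text.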


Statistical learning provides a robust toolbox to go beyond the traditional interpretation of the energetics of intermediates on transition metal surfaces. By applying PCA to a set of formation energies obtained by DFT, we extracted only two descriptors for the reactivity of metals and alloys. A close analysis of these descriptors revealed their nature as covalent and ionic contributions, which can be traced back to the well-known \(d\)-band model, the reduction potential of the metal, as well as conjugation and conformational effects. Then, PCR enabled us to do a fast survey of the adsorption on SAAs and NSAs, using just three adsorbates as predictors: *O, *OH, and *CCHOH. A minimum of DFT energy evaluations (around 1800) is required to predict the full set of 31,000 formation energies with high accuracy, as the error bars are comparable to DFT ones. This approach is modular and can be extended to other materials and systems, such as the prediction of activation energies for elementary steps or the quantification of solvent and coverage effects, thus paving the way for reliable thermochemical models suited to heterogeneous catalysis.


Computational details

We performed DFT calculations with the Vienna Ab-initio Simulation Package, VASP56,57, for 71 species derived from C\({}_{1}\) and C\({}_{2}\) alcohol decomposition7,49 on close-packed surfaces of Cu, Ag, Au, Ni, Pd, Pt, Rh, Ir, Ru, Os, Zn, and Cd. The structures can be retrieved from ioChem-BD58,59, which is a FAIR (Findability, Accessibility, Interoperability, and Reusability) database43. Instructions about data management are provided in Supplementary Methods. The functional of choice was PBE60 with the D2 dispersion corrections of Grimme and our reparameterized values for metals61,62. The structural and electronic parameters for the metals can be found in Supplementary Tables 1–2. The present setup follows the current gold standard in DFT calculations42. Core electrons were represented by Projector Augmented Wave pseudopotentials63,64 and valence electrons were represented by plane waves with a kinetic energy cutoff of 450 eV. The calculated lattice parameters for the metals show good agreement with experimental values, as detailed in the Supplementary Information. Metal surfaces were modeled by four-layer slabs, where the two uppermost layers were fully relaxed and the bottom ones were fixed to the bulk distances. We selected the (111) surfaces for the fcc metals and the (0001) for the hcp ones. The adsorption was studied on \(2\sqrt{3}\times 2\sqrt{3}-R3{0}^{\circ }\) supercells. The vacuum between the slabs was set larger than 13 Å and the dipole correction was applied in the \(z\) direction65. The Brillouin zone was sampled by a \(\Gamma\)-centered \(3\times 3\times 1\) k-point mesh generated through the Monkhorst–Pack method66. For each species, several conformations were calculated by DFT7,49, but only the most stable ones were taken for subsequent analysis. The gas-phase molecules were relaxed in a cubic box with 20 Å sides. For PCA and PCR, the diagonalizations were done with Maple using double precision.

Data availability

All relevant data are available from the authors. The matrices E, E′, and E″ from the training set are uploaded as Supplementary Files matrix-E.csv, matrix-E-prime.csv, and matrix-E-second.csv, respectively. The matrix E′ for SAAs and NSAs is presented on matrix-E-prime-alloys.csv and the 31,453 structures predicted are listed on alloys-prediction.csv. The matrix files are labeled using a succinct notation detailed in Supplementary Methods and Supplementary Table 5. The structures of all species can be downloaded from ioChem-BD58 following ref. 59.

Code availability

The LibreOffice Calc spreadsheets, Maple worksheets, and Python scripts are available from the authors. The data from pure metals and alloys are processed in pca-regressions.ods and pca-alloys.ods spreadsheets, respectively. The Maple worksheet used for diagonalizations is The python script that generates all thermochemical data for alloys is


  1. 1.

    Besson, M., Gallezot, P. & Pinel, C. Conversion of biomass into chemicals over metal catalysts. Chem. Rev. 114, 1827–1870 (2013).

    PubMed  Article  CAS  Google Scholar 

  2. 2.

    Resasco, D. E., Wang, B. & Sabatini, D. Distributed processes for biomass conversion could aid UN Sustainable Development Goals. Nat. Catal. 1, 731 (2018).

    Article  Google Scholar 

  3. 3.

    Jones, G. Industrial computational catalysis and its relation to the digital revolution. Nat. Catal. 1, 311 (2018).

    Article  Google Scholar 

  4. 4.

    Nørskov, J. K., Bligaard, T., Rossmeisl, J. & Christensen, C. H. Towards the computational design of solid catalysts. Nat. Chem. 1, 37–46 (2009).

    PubMed  Article  CAS  Google Scholar 

  5. 5.

    Sutton, J. E., Guo, W., Katsoulakis, M. A. & Vlachos, D. G. Effects of correlated parameters and uncertainty in electronic-structure-based chemical kinetic modelling. Nat. Chem. 8, 331–337 (2016).

    CAS  PubMed  Article  Google Scholar 

  6. 6.

    Seh, Z. W. et al. Combining theory and experiment in electrocatalysis: insights into materials design. Science 355, eaad4998 (2017).

    PubMed  Article  Google Scholar 

  7. 7.

    Li, Q., García-Muelas, R. & López, N. Microkinetics of alcohol reforming for H\({}_{2}\) production from a FAIR density functional theory database. Nat. Commun. 9, 526 (2018).

    ADS  PubMed  PubMed Central  Article  CAS  Google Scholar 

  8. 8.

    Kitchin, J. R. Machine learning in catalysis. Nat. Catal. 1, 230 (2018).

    Article  Google Scholar 

  9. 9.

    Tran, K. & Ulissi, Z. W. Active learning across intermetallics to guide discovery of electrocatalysts for CO\({}_{2}\) reduction and H\({}_{2}\) evolution. Nat. Catal. 1, 696 (2018).

    CAS  Article  Google Scholar 

  10. 10.

    Singh, A. K., Montoya, J. H., Gregoire, J. M. & Persson, K. A. Robust and synthesizable photocatalysts for CO\({}_{2}\) reduction: a data-driven materials discovery. Nat. Commun. 10, 443 (2019).

    ADS  PubMed  PubMed Central  Article  CAS  Google Scholar 

  11. 11.

    Sutton, J. E. & Vlachos, D. G. Building large microkinetic models with first-principles’ accuracy at reduced computational cost. Chem. Eng. Sci. 121, 190–199 (2015).

    CAS  Article  Google Scholar 

  12. 12.

    Zaffran, J., Michel, C., Auneau, F., Delbecq, F. & Sautet, P. Linear energy relations as predictive tools for polyalcohol catalytic reactivity. ACS Catal. 4, 464–468 (2014).

    CAS  Article  Google Scholar 

  13. 13.

    Alonso, D. M., Wettstein, S. G. & Dumesic, J. A. Bimetallic catalysts for upgrading of biomass to fuels and chemicals. Chem. Soc. Rev. 41, 8075–8098 (2012).

    CAS  PubMed  Article  Google Scholar 

  14. 14.

    Greeley, J. & Mavrikakis, M. Alloy catalysts designed from first principles. Nat. Mater. 3, 810–815 (2004).

    ADS  CAS  PubMed  Article  Google Scholar 

  15. 15.

    Nikolla, E., Schwank, J. & Linic, S. Measuring and relating the electronic structures of nonmodel supported catalytic materials to their performance. J. Am. Chem. Soc. 131, 2747–2754 (2009).

    CAS  PubMed  Article  Google Scholar 

  16. Marcinkowski, M. D. et al. Pt/Cu single-atom alloys as coke-resistant catalysts for efficient C-H activation. Nat. Chem. 10, 325 (2018).

  17. Lucci, F. R. et al. Controlling hydrogen activation, spillover, and desorption with Pd-Au single-atom alloys. J. Phys. Chem. Lett. 7, 480–485 (2016).

  18. Li, Z., Wang, S., Chin, W. S., Achenie, L. E. & Xin, H. High-throughput screening of bimetallic catalysts enabled by machine learning. J. Mater. Chem. A 5, 24131–24138 (2017).

  19. Duchesne, P. N. et al. Golden single-atomic-site platinum electrocatalysts. Nat. Mater. 17, 1033 (2018).

  20. Greiner, M. T. et al. Free-atom-like \(d\) states in single-atom alloy catalysts. Nat. Chem. 10, 1008 (2018).

  21. Batchelor, T. A. A. et al. High-entropy alloys as a discovery platform for electrocatalysis. Joule 3, 834–845 (2019).

  22. Choksi, T. S., Roling, L. T., Streibel, V. & Abild-Pedersen, F. Predicting adsorption properties of catalytic descriptors on bimetallic nanoalloys with site-specific precision. J. Phys. Chem. Lett. 10, 1852–1859 (2019).

  23. Wei, J. N., Duvenaud, D. & Aspuru-Guzik, A. Neural networks for the prediction of organic chemistry reactions. ACS Cent. Sci. 2, 725–732 (2016).

  24. Ulissi, Z. W., Medford, A. J., Bligaard, T. & Nørskov, J. K. To address surface reaction network complexity using scaling relations machine learning and DFT calculations. Nat. Commun. 8, 14621 (2017).

  25. O’Connor, N. J., Jonayat, A. S. M., Janik, M. J. & Senftle, T. P. Interaction trends between single metal atoms and oxide supports identified with density functional theory and statistical learning. Nat. Catal. 1, 531 (2018).

  26. Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547 (2018).

  27. Chowdhury, A. J. et al. Prediction of adsorption energies for chemical species on metal catalyst surfaces using machine learning. J. Phys. Chem. C 122, 28142–28150 (2018).

  28. Benson, S. W. Thermochemical Kinetics (Wiley, 1976).

  29. Benson, S. W. et al. Additivity rules for the estimation of thermochemical properties. Chem. Rev. 69, 279–324 (1969).

  30. Hammer, B. & Nørskov, J. K. Why gold is the noblest of all the metals. Nature 376, 238–240 (1995).

  31. Hammer, B., Morikawa, Y. & Nørskov, J. K. CO chemisorption at metal surfaces and overlayers. Phys. Rev. Lett. 76, 2141 (1996).

  32. Abild-Pedersen, F. et al. Scaling properties of adsorption energies for hydrogen-containing molecules on transition-metal surfaces. Phys. Rev. Lett. 99, 016105 (2007).

  33. Calle-Vallejo, F., Martínez, J. I., García-Lastra, J. M., Rossmeisl, J. & Koper, M. T. M. Physical and chemical nature of the scaling relations between adsorption energies of atoms on metal surfaces. Phys. Rev. Lett. 108, 116103 (2012).

  34. Montemore, M. M. & Medlin, J. W. Site-specific scaling relations for hydrocarbon adsorption on hexagonal transition metal surfaces. J. Phys. Chem. C 117, 20078–20088 (2013).

  35. Calle-Vallejo, F. et al. Finding optimal surface sites on heterogeneous catalysts by counting nearest neighbors. Science 350, 185–189 (2015).

  36. Stenlid, J. H. & Brinck, T. Extending the \(\sigma\)-hole concept to metals: An electrostatic interpretation of the nanostructural effects in gold and platinum catalysis. J. Am. Chem. Soc. 139, 11012–11015 (2017).

  37. Salciccioli, M., Chen, Y. & Vlachos, D. G. Density functional theory-derived group additivity and linear scaling methods for prediction of oxygenate stability on metal catalysts: adsorption of open-ring alcohol and polyol dehydrogenation intermediates on Pt-based metals. J. Phys. Chem. C 114, 20155–20166 (2010).

  38. Salciccioli, M., Edie, S. M. & Vlachos, D. G. Adsorption of acid, ester, and ether functional groups on Pt: fast prediction of thermochemical properties of adsorbed oxygenates via DFT-based group additivity methods. J. Phys. Chem. C 116, 1873–1886 (2012).

  39. Suntivich, J. et al. Design principles for oxygen-reduction activity on perovskite oxide catalysts for fuel cells and metal-air batteries. Nat. Chem. 3, 546–550 (2011).

  40. Suntivich, J., May, K. J., Gasteiger, H. A., Goodenough, J. B. & Shao-Horn, Y. A perovskite oxide optimized for oxygen evolution catalysis from molecular orbital principles. Science 334, 1383–1385 (2011).

  41. Hong, W. T. et al. Toward the rational design of non-precious transition metal oxides for oxygen electrocatalysis. Energy Environ. Sci. 8, 1404–1427 (2015).

  42. Lejaeghere, K. et al. Reproducibility in density functional theory calculations of solids. Science 351, 1415 (2016).

  43. Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).

  44. Yao, K., Herr, J. E., Brown, S. N. & Parkhill, J. Intrinsic bond energies from a bonds-in-molecules neural network. J. Phys. Chem. Lett. 8, 2689–2694 (2017).

  45. Andersen, M., Levchenko, S., Scheffler, M. & Reuter, K. Beyond scaling relations for the description of catalytic materials. ACS Catal. 9, 2752–2759 (2019).

  46. Xin, H. & Linic, S. Communications: exceptions to the \(d\)-band model of chemisorption on metal surfaces: the dominant role of repulsion between adsorbate states and metal \(d\)-states. J. Chem. Phys. 132, 221101 (2010).

  47. Janet, J. P. & Kulik, H. J. Resolving transition metal chemical space: Feature selection for machine learning and structure-property relationships. J. Phys. Chem. A 121, 8939–8954 (2017).

  48. Friedman, J., Hastie, T. & Tibshirani, R. The Elements of Statistical Learning 2nd edn, pp. xii, 79–80, 408–409, 534–541 (Springer Series in Statistics, 2001).

  49. García-Muelas, R., Li, Q. & López, N. Density functional theory comparison of methanol decomposition and reverse reactions on metal surfaces. ACS Catal. 5, 1027–1036 (2015).

  50. Lide, D. CRC Handbook of Chemistry and Physics 84th edn (CRC Press LLC, 2003–2004).

  51. Montemore, M. M. & Medlin, J. W. A unified picture of adsorption on transition metals through different atoms. J. Am. Chem. Soc. 136, 9272–9275 (2014).

  52. Wellendorff, J. et al. A benchmark database for adsorption bond energies to transition metal surfaces and comparison to selected DFT functionals. Surf. Sci. 640, 36–44 (2015).

  53. Silbaugh, T. L. & Campbell, C. T. Energies of formation reactions measured for adsorbates on late transition metal surfaces. J. Phys. Chem. C 120, 25161–25172 (2016).

  54. Xu, Z. & Kitchin, J. R. Probing the coverage dependence of site and adsorbate configurational correlations on (111) surfaces of late transition metals. J. Phys. Chem. C 118, 25597–25602 (2014).

  55. Garcia-Ratés, M., García-Muelas, R. & López, N. Solvation effects on methanol decomposition on Pd(111), Pt(111), and Ru(0001). J. Phys. Chem. C 121, 13803–13809 (2017).

  56. Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50 (1996).

  57. Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).

  58. Álvarez-Moreno, M. et al. Managing the computational chemistry big data problem: The ioChem-BD platform. J. Chem. Inf. Model. 55, 95–103 (2015).

  59. García-Muelas, R. Statistical learning goes beyond the \(d\)-band model providing the thermochemistry of adsorbates on transition metals: Dataset. Stored in ioChem-BD, Ref. [58] (2019).

  60. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).

  61. Grimme, S. Semiempirical GGA-type density functional constructed with a long-range dispersion correction. J. Comput. Chem. 27, 1787–1799 (2006).

  62. Almora-Barrios, N., Carchini, G., Błoński, P. & López, N. Costless derivation of dispersion coefficients for metal surfaces. J. Chem. Theory Comput. 10, 5002–5009 (2014).

  63. Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).

  64. Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999).

  65. Makov, G. & Payne, M. C. Periodic boundary conditions in ab initio calculations. Phys. Rev. B 51, 4014–4022 (1995).

  66. Monkhorst, H. J. & Pack, J. D. Special points for Brillouin-zone integrations. Phys. Rev. B 13, 5188–5192 (1976).



Acknowledgements

We thank the AGAUR (2017 SGR 90) and MCIU (RTI2018-101394-B-I00) projects for financial support. We acknowledge the Barcelona Supercomputing Center – MareNostrum (BSC-RES) for providing generous computer resources. We thank Professors C. Bo, R. Guimerà, M. Sales-Pardo, F. Abild-Pedersen, and N. Almora-Barrios for fruitful discussions.

Author information




R.G.-M. performed the numerical calculations. R.G.-M. and N.L. analyzed the data and wrote the manuscript.

Corresponding authors

Correspondence to Rodrigo García-Muelas or Núria López.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit


About this article


Cite this article

García-Muelas, R., López, N. Statistical learning goes beyond the d-band model providing the thermochemistry of adsorbates on transition metals. Nat Commun 10, 4687 (2019).

