Abstract
Building upon the d-band reactivity theory in surface chemistry and catalysis, we develop a Bayesian learning approach to probing chemisorption processes at atomically tailored metal sites. With representative species, e.g., *O and *OH, Bayesian models trained with ab initio adsorption properties of transition metals predict site reactivity at a diverse range of intermetallics and near-surface alloys while naturally providing uncertainty quantification from posterior sampling. More importantly, this conceptual framework sheds light on the orbitalwise nature of chemical bonding at adsorption sites with d-states characteristics ranging from bulk-like semi-elliptic bands to free-atom-like discrete energy levels, bridging the complexity of electronic descriptors for the prediction of novel catalytic materials.
Introduction
Adsorption of molecules or their fragments at transition-metal surfaces is a fundamental process for many technological applications, such as chemical sensing, molecular self-assembly, and heterogeneous catalysis. Because of the convoluted interplay between electron transfer and orbital coupling, chemical bonding can be formidably complex. Recent decades have brought major advances in spectroscopic tools^{1,2}, which reveal orbitalwise information on chemisorbed systems, and concurrently in predicting chemical reactivity at sites of interest via electronic factors, e.g., the number of valence d-electrons^{3}, density of d-states at the Fermi level^{4}, d-band center^{5}, and d-band upper edge^{6,7}. Compared with a full quantum-mechanics treatment of many-body systems, the simplicity of physics-inspired descriptors comes at the cost of limited generalization, particularly for high-throughput materials screening. Incorporation of multi-fidelity site features into reactivity models with machine learning (ML) algorithms has shown early promise for the prediction of adsorption energies, with an accuracy comparable to the typical error (~0.1−0.2 eV) of density functional theory (DFT) calculations^{8,9,10,11,12,13,14,15,16}. However, the approach is largely black-box in nature, which hinders physical interpretation. Developing a theory-based, generalizable model of chemisorption that bridges the complexity of electronic descriptors and predicts the binding affinity of active sites to key reaction intermediates with uncertainty quantification represents one of the biggest challenges in fundamental catalysis.
Here, we present a Bayesian inference approach to probing chemisorption processes at metal sites by learning from ab initio datasets. The model is built upon the basic framework of the d-band reactivity theory^{5}, while employing a Newns–Anderson-type Hamiltonian^{17,18} to capture essential physics of adsorbate–substrate interactions. Such types of simplified Hamiltonians were originally used for describing magnetic properties of impurities in a bulk metallic host^{17}, and later extended with success by Newns and Grimley to chemisorption at surfaces^{18,19}. A basis set of orbitals consisting of the adsorbate and substrate states was used for solving the hybridization problem within a self-consistent Hartree–Fock scheme^{18}. Despite a remarkable success in advancing the basic understanding of adsorption phenomena at surfaces, particularly for d-block metals^{6}, its application in materials design remains limited due to the lack of accurate model parameters and meaningful error estimates. Bayesian inference produces the posterior probability distribution of model parameters under the influence of observations and prior knowledge^{20}. With representative species, e.g., *O and *OH, we demonstrate the predictive performance and physical interpretability of Bayesian models for chemical bonding at a diverse range of intermetallics and near-surface alloys, bridging the complexity of electronic descriptors in search of novel catalytic materials.
Results
The d-band reactivity theory
Within the basic framework of the d-band reactivity theory for transition-metal surfaces, the formation of the adsorbate–metal bond conceptually takes place in two consecutive steps^{5}, as illustrated in Fig. 1. First, the adsorbate frontier orbital (or orbitals) \(\left|a\right\rangle\) at \({\epsilon }_{{\mathrm{a}}}^{0}\) couples to the delocalized, free-electron-like sp-states of the metal substrate, leading to a Lorentzian-shaped resonance state at ϵ_{a}. Second, the adsorbate resonance state interacts with the localized, narrowly distributed metal d-states, shifting up in energy due to the orthogonalization penalty for satisfying the Pauli principle, and then splitting into bonding and antibonding states. The first-step interaction contributes a constant ΔE_{0}, albeit often the largest part of chemical bonding. The variation in adsorption energies from one metal to another is determined by the metal d-states. This part of the interaction energy, ΔE_{d}, can be further partitioned into orbital orthogonalization and orbital hybridization contributions^{21}. To a first approximation, the orbital hybridization energy can be evaluated from the changes of integrated one-electron energies. The orbital orthogonalization cost is considered simply as proportional to the product of the interatomic coupling matrix and overlap matrix, VS, or equivalently αV^{2}, where α is the orbital overlap coefficient. The absolute value of V^{2} can be written as \(\beta {V}_{{\mathrm{ad}}}^{2}\), in which the standard values of \({V}_{{\mathrm{ad}}}^{2}\) relative to Cu are readily available on the Solid State Table^{22}. The overall adsorption energy ΔE can then be written as the sum of the energy contributions from the sp-states ΔE_{0} and the d-states ΔE_{d}, with the latter depending on the symmetry and degeneracy of adsorbate frontier orbitals. Another important piece of information from this framework is the evolving density of states projected onto the adsorbate orbital(s) upon adsorption, ρ_{a}.
A full account of the theoretical framework is presented in the “Methods” section.
There are a number of unknown parameters within the basic framework of the d-band reactivity theory as discussed above and detailed in the “Methods” section, including the energy contribution from the sp-band ΔE_{0}, adsorbate resonance energy ϵ_{a} relative to the Fermi level, sp-band chemisorption function Δ_{0}, orbital overlap coefficient α, and orbital coupling coefficient β. By least-squares fitting of the adsorbate density of states and the integrated one-electron energy changes to those from DFT calculations^{23,24}, the Schmickler model of electron transfer has been developed to understand H_{2} evolution/oxidation and OH^{−} adsorption at metal–electrolyte interfaces. However, the deterministic fitting of adsorption properties from a single surface is prone to overfitting or to becoming trapped in a locally optimal region, limiting its application in catalysis.
Bayesian learning
We instead employ Bayesian learning to infer the vector of model parameters \(\overrightarrow{\theta }={(\Delta {E}_{0},{\epsilon }_{{\mathrm{a}}},{\Delta }_{0},\alpha ,\beta )}^{\prime}\) from the evidence, i.e., ab initio adsorption properties, along with prior knowledge if available^{20}. In Bayes’ view, those parameters are not deterministic point values, but rather probabilistic distributions reflecting the uncertainty of physical variables. The use of parameter distributions as opposed to computationally derived point values has obvious advantages for uncertainty quantification. In the chemical sciences, Bayesian learning has been used for calibration and validation of thermodynamic models for the uptake of CO_{2} in mesoporous silica-supported amines^{25}, designing the Bayesian error estimation functional with van der Waals correlations^{26}, and identifying potentially active sites and mechanisms of catalytic reactions^{27}, just to name a few. The Bayesian approach allows one to infer the posterior probability distribution \(P(\overrightarrow{\theta }\mid {\mathcal{D}})\) for latent variables based on the prior \(P(\overrightarrow{\theta })\) as well as the likelihood function \(P({\mathcal{D}}\mid \overrightarrow{\theta })\) subject to the observation \({\mathcal{D}}\). The mathematical relationship between the prior, observation, and posterior is given by Bayes’ theorem^{20}, \(P(\overrightarrow{\theta }\mid {\mathcal{D}})=P({\mathcal{D}}\mid \overrightarrow{\theta })P(\overrightarrow{\theta })/P({\mathcal{D}})\). Our initial belief about likely parameter values is provided by weakly informative priors to minimize potential bias. For example, ΔE_{0} and ϵ_{a} can be estimated from DFT calculations of the adsorbate on a simple metal, e.g., sodium (Na) in the face-centered cubic (fcc) phase.
Specifically, we took Normal for floating-point variables unrestricted in sign, LogNormal for nonnegative parameters, and Uniform for others (see the details of Bayesian learning and parameter choices in the “Methods” section). Computing the normalizing constant \(P({\mathcal{D}})\), the denominator of the posterior distribution, is intractable in most practical scenarios. To avoid this complication, we use the Markov chain Monte Carlo (MCMC) method^{28}, whose sampling criterion depends only on the relative posterior density of the newly explored point and its preceding point. To compute the transition probability of each MCMC step, we define the sum of the (negative) logarithm of the likelihood functions corresponding to binding energies and projected density of states onto each adsorbate orbital, with a hyperparameter λ adjusting the weight of the two contributing metrics (see details in the “Methods” section). After a large number of MCMC samplings, burn-in (discarding the first half of the trajectory) and then thinning (keeping 1 out of 5 samplings) were performed before extracting converged values from the joint posterior distributions. The convergence of the MCMC sampling is checked by using parallel chains with different starting parameter sets, such that the variance of interchain samplings is close to, or within 1.2–1.5 times, that of intrachain samplings^{28}. The complete code, named Bayeschem, is available for public access at the GitHub repository https://github.com/hlxin/bayeschem.
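The sampling workflow described above can be sketched in a few lines. This is a minimal illustration, not the Bayeschem implementation: the one-parameter Gaussian `log_posterior` is a hypothetical stand-in for the chemisorption posterior, the symmetric random-walk proposal is the simplest Metropolis special case of Metropolis–Hastings, and `gelman_rubin` is a standard potential-scale-reduction statistic assumed here to play the role of the inter-chain versus intra-chain variance check.

```python
import math
import random
import statistics


def log_posterior(theta):
    """Toy unnormalized log-posterior, a Gaussian N(2, 0.5^2) (hypothetical)."""
    return -0.5 * ((theta - 2.0) / 0.5) ** 2


def metropolis(log_p, start, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a single parameter."""
    rng = random.Random(seed)
    x, lp = start, log_p(start)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_p(proposal)
        # Only the posterior ratio enters, so the constant P(D) cancels.
        # 1 - random() lies in (0, 1], which keeps log() finite.
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
        chain.append(x)
    return chain


def burn_and_thin(chain, thin=5):
    """Discard the first half of the chain, then keep every `thin`-th sample."""
    return chain[len(chain) // 2::thin]


def gelman_rubin(chains):
    """Potential scale reduction factor from equal-length parallel chains."""
    n = len(chains[0])
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Four parallel chains started from deliberately different points.
starts = [-3.0, 0.0, 5.0, 10.0]
chains = [burn_and_thin(metropolis(log_posterior, s, 20000, seed=i))
          for i, s in enumerate(starts)]
r_hat = gelman_rubin(chains)
posterior_mean = statistics.fmean([x for c in chains for x in c])
```

A value of `r_hat` near 1 indicates that the chains have forgotten their starting points; values well above roughly 1.2 would suggest running longer chains.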
Model development
Figure 2a shows the covariance of the joint posterior distribution for each parameter pair and the 1D histogram of model parameters (ΔE_{0}, ϵ_{a}, Δ_{0}, α, and β) from MCMC simulations for *O adsorption at the fcc-hollow site of the {111}-terminated transition-metal surfaces (Cu, Ag, Au, Ni, Pd, Pt, Co, Rh, Ir, and Ru). We assume three degenerate O_{2p} orbitals, as used before^{29}, to demonstrate the approach, and later extend it to multi-orbital models. To attain converged posterior distributions, 200k MCMC sampling steps with the Metropolis–Hastings algorithm were performed in a multidimensional parameter space, illustrated in Fig. 2b. In Fig. 2, the approximate contours for the 68, 95, and 99% confidence regions are shown in the lower triangle, indicating little to no correlation between latent-variable pairs.
With the converged Bayesian sampling, Fig. 3a shows the model-predicted adsorption energies of *O at the fcc-hollow site of transition-metal surfaces, with a mean absolute error (MAE) of ~0.17 eV compared to DFT calculations. The standard deviation of model prediction using the posterior distribution of model parameters (\(\overrightarrow{\theta },\,\overrightarrow{\sigma }\)) is overlaid, providing for the first time uncertainty quantification of adsorption energies within the d-band reactivity theory. Figure 3b shows DFT-calculated and model-constructed projected density of states onto the O_{2p} orbital using the posterior means of model parameters, taking Pt(111) as an example (see all the surfaces in Supplementary Fig. 1). The chemisorption function Δ(ϵ) and its Hilbert transform Λ(ϵ), along with the straight adsorbate line (ϵ − ϵ_{a}), are shown for the graphical solution of the Newns–Anderson model^{18}. The intersections indicated by solid circles in Fig. 3b represent the O_{2p}–Pt_{5d} bonding and antibonding states, with the latter above the Fermi level, suggesting a strong covalent interaction of *O at Pt(111). Given the simplicity of the model, the clearly captured electronic structure of the adsorbate–substrate system and the reactivity trend are satisfying.
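To make the graphical solution concrete, the sketch below evaluates the Newns–Anderson adsorbate density of states for an idealized semi-elliptic d-band, whose Hilbert transform is known in closed form. Everything numerical here is an assumption for illustration: the parameter values are hypothetical (they are not fitted posterior means), the band shape replaces the DFT-projected d-states, and the Hilbert transform of the constant Δ_{0} is taken as zero (wide-band limit).

```python
import math


def semi_elliptic(eps, eps_d, w, v2):
    """Return (Delta_d, Lambda_d) for a semi-elliptic d-band.

    Delta_d = pi * V^2 * rho_d with rho_d normalized to one state;
    Lambda_d is its Hilbert transform, known in closed form for this shape.
    """
    x = eps - eps_d
    c = 2.0 * v2 / (w * w)
    if abs(x) <= w:
        return c * math.sqrt(max(0.0, w * w - x * x)), c * x
    root = math.sqrt(x * x - w * w)
    return 0.0, c * (x - math.copysign(root, x))


def adsorbate_dos(eps, eps_a, delta_0, eps_d, w, v2):
    """Projected density of states and the adsorbate-line mismatch g."""
    delta_d, lam = semi_elliptic(eps, eps_d, w, v2)
    delta = delta_0 + delta_d
    g = eps - eps_a - lam  # zero where the line (eps - eps_a) crosses Lambda
    return delta / (math.pi * (g * g + delta * delta)), g


# Hypothetical parameters in eV, chosen only to show a split-off pair of states.
params = dict(eps_a=-2.87, delta_0=0.5, eps_d=-2.5, w=3.0, v2=5.5)
grid = [-30.0 + 0.01 * i for i in range(6001)]
values = [adsorbate_dos(e, **params) for e in grid]
rho = [v[0] for v in values]
g = [v[1] for v in values]

# Trapezoidal integral: close to 1 apart from the truncated Lorentzian tails.
norm = sum(0.5 * (rho[i] + rho[i + 1]) * 0.01 for i in range(len(rho) - 1))

# Graphical solution: bracket every sign change of g on the grid.
crossings = [0.5 * (grid[i] + grid[i + 1])
             for i in range(len(g) - 1) if (g[i] < 0) != (g[i + 1] < 0)]
```

With these illustrative values the line crosses Λ(ϵ) three times: once below the band (bonding), once above it and above the Fermi level at 0 (antibonding), and once inside the band, where Δ(ϵ) is large and no sharp state forms.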
To extend the approach to adsorbates with multiple valence orbitals that possibly contribute to bonding, we have explicitly treated O_{2p} states with the doubly degenerate p_{xy} orbitals and the single p_{z} orbital in Bayesian learning. We infer model parameters (ϵ_{a}, Δ_{0}, and β) corresponding to each nonequivalent adsorbate orbital together with an orbital-independent α^{29} and a global parameter ΔE_{0}. The posterior parameter distributions are shown in Supplementary Fig. 2. From the posterior means of model parameters, we can see that the orbital coupling coefficient β of p_{xy} (1.67 eV^{−1}) is smaller than that of p_{z} (1.77 eV^{−1}), consistent with the symmetry analysis: the p_{xy} orbitals, which are parallel to the surface, form π bonds with the d-states, while the p_{z} orbital can interact through a stronger σ bond. A weaker coupling manifests itself in a narrower orbital splitting of π/π^{*} than that of σ/σ^{*}, which has been previously observed using angle-resolved photoemission spectroscopy on Cu and Ni^{30}. Supplementary Figs. 3 and 4 show that the model-constructed projected density of states onto symmetry-resolved orbitals closely resembles the DFT-calculated distributions and that the predicted values of *O adsorption energies have an MAE of ~0.17 eV. To demonstrate the robustness and generalizability of the approach, we have also optimized the Bayeschem model of *O at the atop configuration (see Supplementary Figs. 5–7). In this model scheme, an individual set of parameters is obtained for the adsorbate at a given site. Compared to the linear adsorption-energy scaling relations^{31} that link adsorption energies of different adsorbates, Bayeschem creates the connection between the electronic structure of a surface site and the adsorption energy.
To test the prediction capability of the Bayeschem model for unseen systems, we took the *OH species at the atop adsorption configuration as a case study because of its fundamental importance in understanding the nature of chemical bonding^{32} and its practical interest as a key reactivity descriptor in transition-metal catalysis^{33,34,35}. Three frontier molecular orbitals, i.e., 3σ, 1π, and 4σ*, are assumed to be involved in chemical bonding^{32}. Symmetry-resolved, molecular-orbital density of states projected onto OH along with adsorption energies are used as the DFT ground truth Y in Eq. (6). With the Bayeschem model developed here (see Supplementary Figs. 8–10 for posterior parameter distributions, model-predicted adsorption energies, and projected density of states on training samples), we predict *OH binding energies at a diverse range of intermetallics and near-surface alloys. Specifically, we included A_{3}B, A′@A_{ML}, AB@A_{ML}, A_{3}B@A_{ML}, A@A_{3}B, and A@AB_{3}, where A (A′) represents the ten fcc/hcp metals used in the model development and B covers d-metals across the periodic table (see ref. ^{36} for structural details and tabulated data). The coupling matrix element V_{ad} for alloys is assumed to be constant, taken from the Solid State Table^{22}. Its dependence on the local chemical environment can be incorporated into the model using the tight-binding approximation^{33}. The A sites of the above-mentioned surfaces exhibit diverse characteristics of the metal d-states, ranging from bulk-like semi-elliptic bands to free-atom-like discrete energy levels^{37}, as illustrated in Fig. 4a using Pt and Ag_{3}Pt as examples. Similar to previous observations of single-atom alloys with coinage-metal hosts^{37,38}, a reactive guest metal often exhibits peaky signatures within the d-band due to the energy misalignment of coupling d–d orbitals^{7}.
A direct consequence of such diverse electronic properties of adsorption sites is that no single electronic descriptor can capture the local chemical reactivity accurately. Encouragingly, the Bayeschem model, parameterized using data from ten pristine transition metals, predicts *OH adsorption energies on 512 alloy surfaces with an MAE of 0.16 eV (see Fig. 4b). The standard deviation of predicted *OH adsorption energies from the posterior distribution of model parameters is marked for uncertainty quantification. The model shows a similar performance to data-driven ML models^{8,9,10,11} while outperforming the state-of-the-art electronic descriptors, e.g., the d-band center ϵ_{d} (MAE: 0.20 eV) and upper edge ϵ_{u} (MAE: 0.23 eV). The approach can be easily extended to more complex adsorbates than *O and *OH, e.g., *OOH, without losing its generalizability in the development workflow.
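The uncertainty estimate and the MAE quoted above both follow from simple bookkeeping over posterior samples. The sketch below shows that bookkeeping under loudly stated assumptions: `toy_model` is a hypothetical linear stand-in (not the Bayeschem chemisorption model), and the sample values and site dictionary are made up for illustration.

```python
import statistics


def toy_model(theta, site):
    """Hypothetical linear stand-in for the chemisorption model."""
    offset, slope = theta
    return offset + slope * site["eps_d"]


def predict_with_uncertainty(model, posterior_samples, site):
    """Mean and standard deviation of predictions over posterior samples."""
    preds = [model(theta, site) for theta in posterior_samples]
    return statistics.fmean(preds), statistics.stdev(preds)


def mean_absolute_error(predicted, reference):
    """Average absolute deviation between predictions and reference values."""
    return statistics.fmean([abs(p - r) for p, r in zip(predicted, reference)])


# Three made-up posterior samples of (offset, slope) and one made-up site.
samples = [(-1.0, 0.5), (-1.2, 0.5), (-0.8, 0.5)]
mean, std = predict_with_uncertainty(toy_model, samples, {"eps_d": -2.0})
mae = mean_absolute_error([-2.0, -1.0], [-1.8, -1.1])
```

In practice the thinned MCMC samples would replace `samples`, and each sample would be pushed through the full model for every test surface.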
Orbitalwise interpretation of chemical bonding
More importantly, the Bayesian framework with built-in physics allows us to quantitatively interrogate the underlying mechanism of chemical bonding, which is difficult to obtain from purely data-driven regression models. Taking *OH adsorption at the M (10 fcc/hcp metals) site of {111}-terminated Ag_{3}M intermetallics as examples, Fig. 4c shows the partition of *OH adsorption energies resulting from the 2^{nd} step interaction (ΔE_{d}) into orbital orthogonalization and hybridization. As we can see, for the 3d, 4d, and 5d series of the guest metal M, the orthogonalization and hybridization contributions decrease in magnitude from left to right across the periodic table, while the hybridization dominates the reactivity trends. The changes in \(\Delta {E}_{{\mathrm{d}}}^{{\mathrm{hyb}}}\) can be understood from the simplified d-band model, with the position and occupancy of adsorbate–substrate antibonding states tracking with the d-band center or upper edge. The orthogonalization energy is proportional to the filling f and \({V}_{{\mathrm{ad}}}^{2}\) (see Eq. (4)), which offset each other to a certain extent (\({V}_{{\mathrm{ad}}}^{2}\) decreases while f increases across the 3d, 4d, and 5d series), leading to a less dominant role than the hybridization. The orbitalwise contributions shown in Fig. 4c with different fill patterns suggest that the sole contribution of *OH adsorption at d-metal surfaces is from the 1π orbital, while those from 3σ and 4σ^{*} are too small to be visible. This is supported by the projected molecular-orbital density of states in Supplementary Fig. 7, which shows that 3σ and 4σ^{*} form resonance states after their interactions with the sp-states of the metal site without noticeable splitting due to d-states. Thus, they do not contribute to the observed trend of *OH adsorption. The Bayesian-optimized orbital coupling coefficients of 3σ and 4σ^{*} are rather small (0.12 and 0.001, respectively, as shown in Supplementary Fig. 5), supporting unfavorable orbital overlaps with the d-states. This rationalizes the observation that *OH prefers the nearly parallel adsorption geometry on most of the d-metals to maximize the interaction of the 1π orbital with metal d-states, while *OH on Na(111) adsorbs more strongly in an upright orientation because of a lack of such directional interactions. This orbitalwise insight into chemical bonding could provide guidance in tailoring orbital-specific characteristics of the metal d-band for desired catalytic properties through site engineering. Although the discussion here focuses exclusively on the d-metals, it is possible to extend the Bayeschem framework to p-block metals and alloys (see Supplementary Fig. 11), unifying the reactivity theory of metal surfaces.
To conclude, we present the first Bayesian model of chemisorption by learning from ab initio adsorption properties. The model leverages the well-established d-band reactivity theory and a Newns–Anderson-type Hamiltonian for capturing essential physics of chemisorption processes. We demonstrated that the Bayeschem models of descriptor species, e.g., *O and *OH, optimized with pristine transition-metal data predict adsorption energies at a diverse range of atomically tailored metal sites with an MAE of ~0.1–0.2 eV while providing uncertainty quantification. Incorporation of physics-based models into data-driven ML algorithms, e.g., deep learning, might hold promise for developing highly accurate yet interpretable reactivity models. Furthermore, this conceptual framework can be broadly applied to unravel orbital-specific factors governing adsorbate–substrate interactions, paving the path toward design strategies that go beyond adsorption-energy scaling limitations in catalysis.
Methods
DFT calculations
Spin-polarized DFT calculations were performed through Quantum ESPRESSO^{39} with ultrasoft pseudopotentials. The exchange-correlation was approximated within the generalized gradient approximation (GGA) with Perdew–Burke–Ernzerhof (PBE)^{40}. {111}-terminated metal surfaces were modeled using (2 × 2) supercells with four layers and a vacuum of 15 Å between two images. The bottom two layers were fixed while the top two layers and adsorbates were allowed to relax until a force criterion of 0.1 eV/Å was met. A plane-wave energy cutoff of 500 eV was used. A Monkhorst–Pack mesh of 6 × 6 × 1 was used to sample the Brillouin zone, while for molecules and radicals only the Gamma point was used. Gas-phase species of O and OH were used as the reference for adsorption energies of *O and *OH, respectively. The projected atomic and molecular density of states were obtained by projecting the eigenvectors of the full system at a denser k-point sampling (12 × 12 × 1) with an energy spacing of 0.01 eV onto the ones of the part, as determined by gas-phase calculations. The convergence of DFT calculations was thoroughly tested to be within 0.05 eV. Further details and tabulated data can be found in ref. ^{9}.
The d-band reactivity theory
To revisit the d-band theory of chemisorption along with new developments, consider a metal substrate M in which electrons occupy a set of continuous states with one-electron wavefunctions \(\left|k\right\rangle\) and eigenenergies ϵ_{k}, and an isolated adsorbate species A with a valence electron described by an atomic wavefunction \(\left|a\right\rangle\) at \({\epsilon }_{{\mathrm{a}}}^{0}\), see Fig. 1. When the adsorbate is brought close to the substrate, the two sets of states will overlap and hybridize with each other. The strength of such interactions is determined by the coupling integral \({V}_{{\mathrm{ak}}}\,=\,\langle a|\hat{{\mathcal{H}}}|k\rangle\), where \(\hat{{\mathcal{H}}}\) is the system Hamiltonian. Within the Newns–Anderson model of chemisorption^{17,18,19}, \(\hat{{\mathcal{H}}}\) is defined as,
\[\hat{{\mathcal{H}}}=\sum_{\sigma }{\epsilon }_{{\mathrm{a}}}^{0}\,{n}_{{\mathrm{a}}\sigma }+\sum_{k,\sigma }{\epsilon }_{k}\,{n}_{k\sigma }+\sum_{k,\sigma }\left({V}_{{\mathrm{ak}}}\,{c}_{{\mathrm{a}}\sigma }^{\dagger }{c}_{k\sigma }+{V}_{{\mathrm{ak}}}^{* }\,{c}_{k\sigma }^{\dagger }{c}_{{\mathrm{a}}\sigma }\right) \tag{1}\]
where σ denotes the electron spin, n is the orbital occupancy operator, and c^{†} and c represent the creation and annihilation operators, respectively. The first two terms in Eq. (1) are the one-electron energies from the adsorbate and the substrate when they are infinitely separated in space. The last term captures the coupling, or intuitively electron hopping, between the adsorbate orbital \(\left|a\right\rangle\) and a continuum of substrate states \(\left|k\right\rangle\). If the one-electron states of the whole system can be described as a linear combination of the unperturbed adsorbate and substrate states, the one-electron Schrödinger equation can be solved using the Green’s function approach^{18}. In Fig. 1, we illustrate the chemisorption process of a simple adsorbate onto a d-block metal site characterized by delocalized sp-states and localized d-states^{21}. The interaction of the adsorbate state at \({\epsilon }_{{\mathrm{a}}}^{0}\) with the structureless sp-states, typically accompanied by electron transfer from/to the Fermi sea, results in a broadened resonance (the so-called renormalized adsorbate state) at an effective energy level ϵ_{a}. Conceptually viewing chemical bonding as consecutive steps in Fig. 1, the renormalized adsorbate state then couples with the narrowly distributed d-states, shifting up in energy due to orbital orthogonalization, which increases the kinetic energy of electrons, and splitting into bonding and antibonding states. One important piece of information from this framework is the evolving density of states projected onto the adsorbate orbital \(\left|a\right\rangle\) upon adsorption,
\[{\rho }_{{\mathrm{a}}}(\epsilon )=\frac{1}{\pi }\,\frac{\Delta (\epsilon )}{{[\epsilon -{\epsilon }_{{\mathrm{a}}}-\Lambda (\epsilon )]}^{2}+\Delta {(\epsilon )}^{2}} \tag{2}\]
in which spin is neglected for simplicity. The effective adsorbate energy level, ϵ_{a}, is determined by the image potential of a charged particle in front of conducting surfaces and the Coulomb repulsion between electrons in the same orbital^{18}. The chemisorption function Δ(ϵ) includes contributions from the sp-states and the d-states.
To simplify the matter, only the 2^{nd} step interaction, i.e., the coupling of the renormalized adsorbate state with the substrate d-states, is explicitly considered in Eq. (2). As a new development in our approach, we include an energy-independent constant Δ_{0} along with Δ_{d} as the chemisorption function Δ(ϵ). The inclusion of Δ_{0} provides a lifetime broadening of the adsorbate state, serving as a mathematical trick to avoid burdensome sampling of the resonance, i.e., the Lorentzian distribution \({\tilde{\rho }}_{{\mathrm{a}}}\) from the 1^{st} step interaction in Fig. 1. Accordingly, ϵ_{a} represents the renormalized adsorbate state. Owing to the narrowness of a typical metal d-band, Δ_{d} can be simplified as the projected density of d-states onto the metal site ρ_{d}(ϵ) modulated by an effective coupling integral squared V^{2}, i.e., Δ_{d} ≃ πV^{2}ρ_{d}(ϵ). Λ(ϵ) is the Hilbert transform of Δ(ϵ). In this framework, the interaction energy between the adsorbate and the substrate can be partitioned into two contributions, i.e., ΔE_{0} and ΔE_{d}. ΔE_{0} is the energy change due to the interaction of the unperturbed adsorbate orbital(s) with the delocalized sp-states, while ΔE_{d} is the energy contribution from further interactions with the localized d-states of the substrate. Since all d-block metals have a similar, free-electron-like sp-band, ΔE_{0} can be approximated as a surface-independent constant, albeit the largest contribution to bonding^{21}. To calculate ΔE_{d}, we include both the attractive orbital hybridization \(\Delta {E}_{{\mathrm{d}}}^{{\mathrm{hyb}}}\) and repulsive orbital orthogonalization \(\Delta {E}_{{\mathrm{d}}}^{{\mathrm{orth}}}\)^{29,41}:
The constant 2 accounts for the spin degeneracy of the orbital, \(\langle {\tilde{n}}_{{\mathrm{a}}}\rangle\) is the occupancy of the renormalized adsorbate state obtained by integrating the Lorentzian distribution \({\tilde{\rho }}_{{\mathrm{a}}}\) up to the Fermi level ϵ_{F} (taken as 0), and f is the idealized d-band filling of the metal atom. The \({\tan }^{-1}\) is defined to lie between −π and 0 since Δ_{0} is a nonzero constant across the energy scale [−15, 15] eV. Thus there is no need to explicitly include localized states even if present below or above the d-band. In Eq. (4), α is termed the orbital overlap coefficient, i.e., S ≈ α∣V∣, in which the overlap integral S is linearly proportional to the coupling integral V for a given orbital. Similarly, the effective coupling integral squared V^{2} can be written as \(\beta {V}_{{\mathrm{ad}}}^{2}\), where β denotes the orbital coupling coefficient and \({V}_{{\mathrm{ad}}}^{2}\) characterizes the interorbital coupling strength when the bonding atoms are aligned along the z-axis at a given distance^{42}. Its values for d-block metals relative to that of Cu are readily available on the Solid State Table^{22}. It is important to note that β enters the chemisorption function, which determines both the adsorption energy and adsorbate density of states, whereas α only affects the orbital orthogonalization energy since overlap was not explicitly considered.
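As a rough numerical illustration of the quantities defined above, the sketch below evaluates a hybridization and an orthogonalization energy. The two expressions used, ΔE_hyb = (2/π)∫ tan^{−1}[Δ/(ϵ − ϵ_{a} − Λ)] dϵ − 2⟨ñ_a⟩ϵ_{a} and ΔE_orth = 2(⟨ñ_a⟩ + f)αβV_ad², are an assumed Newns–Anderson/Hammer–Nørskov-style form chosen to be consistent with the symbols in the text; they are not a transcription of Eq. (4). The semi-elliptic d-band, the default band position and width, and all parameter values are likewise hypothetical.

```python
import math


def chemisorption(eps, eps_d, w, v2, delta_0):
    """Delta and Lambda for a hypothetical semi-elliptic d-band plus constant Delta_0."""
    x = eps - eps_d
    c = 2.0 * v2 / (w * w)
    if abs(x) <= w:
        return delta_0 + c * math.sqrt(max(0.0, w * w - x * x)), c * x
    root = math.sqrt(x * x - w * w)
    return delta_0, c * (x - math.copysign(root, x))


def d_band_energies(eps_a, delta_0, alpha, beta, vad2, filling,
                    eps_d=-2.5, w=3.0, e_min=-15.0, de=0.01):
    """Return (dE_hyb, dE_orth, n_a) with the Fermi level at 0 (assumed forms)."""
    v2 = beta * vad2  # effective coupling squared, V^2 = beta * V_ad^2
    # Occupancy of the Lorentzian resonance (width Delta_0) below the Fermi level.
    n_a = 0.5 + math.atan(-eps_a / delta_0) / math.pi
    n = int(round(-e_min / de))
    phase = []
    for i in range(n + 1):
        eps = e_min + i * de
        delta, lam = chemisorption(eps, eps_d, w, v2, delta_0)
        # atan2(...) - pi selects the branch of tan^-1 between -pi and 0.
        phase.append(math.atan2(delta, eps - eps_a - lam) - math.pi)
    integral = sum(0.5 * (phase[i] + phase[i + 1]) * de for i in range(n))
    de_hyb = 2.0 / math.pi * integral - 2.0 * n_a * eps_a
    de_orth = 2.0 * (n_a + filling) * alpha * v2
    return de_hyb, de_orth, n_a


# Made-up inputs in eV; they are not fitted values from the paper.
de_hyb, de_orth, n_a = d_band_energies(
    eps_a=-5.0, delta_0=1.0, alpha=0.05, beta=2.0, vad2=1.0, filling=0.9)
```

Because Δ_{0} is strictly positive, `math.atan2` never hits the branch cut, which is the practical reason a constant broadening keeps the phase continuous over the whole integration window.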
Bayesian learning
Due to the computationally intensive nature of the MCMC algorithm, there is a need for a more efficient implementation of the Newns–Anderson model than what is obtained with Python and standard libraries like SciPy and NumPy. We make extensive use of Cython, a compiled superset of Python that generates C/C++ extension modules, to speed up (by 10–1000 times) some CPU-intensive functions in the model, e.g., the Hilbert transform. To perform MCMC sampling, we use PyMC, a flexible and extensible Python package which includes a wide selection of built-in statistical distributions and sampling algorithms^{43}, e.g., Metropolis–Hastings. A “burn-in” of the first half of the samplings and then thinning (1 out of 5 samplings) was performed to ensure that the retained samples are representative of the posterior distribution. Convergence of our MCMC-based sampling was verified using parallel chains^{28}. The MCMC sampling results can be directly visualized using corner, an open-source Python module. We took Normal for floating-point variables unrestricted in sign, LogNormal for nonnegative parameters, and Uniform for others. ΔE_{0} and ϵ_{a} can be estimated from DFT calculations of the adsorbate on a simple metal, e.g., sodium (Na) in the face-centered cubic (fcc) phase. Specifically, for *O, we used ΔE_{0} ~ N(−5.0, 1), ϵ_{a} ~ N(−5, 1), Δ_{0} ~ LN(1, 0.25), β ~ LN(2, 1), and α ~ U(0, 1). For *OH, we used ΔE_{0} ~ N(−3.0, 1), \({\epsilon }_{{\mathrm{a}}}^{3\sigma } \sim N(-6,1)\), \({\epsilon }_{{\mathrm{a}}}^{1\pi } \sim N(-2,1)\), and \({\epsilon }_{{\mathrm{a}}}^{4{\sigma }^{* }} \sim N(4,1)\). We assume that the predicted adsorption properties from Eqs. (2) and (4) are subject to independent normal errors. Specifically, for the property Y and the surface i we have
\[{Y}_{{\mathrm{i}}}={\hat{Y}}_{{\mathrm{i}}}(\overrightarrow{\theta })+\sigma {\epsilon }_{{\mathrm{i}}} \tag{5}\]
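where ϵ_{i} is an independent, standard normal random variable and σ is the standard deviation, allowing for a mismatch between the model prediction \({\hat{Y}}_{{\mathrm{i}}}(\overrightarrow{\theta })\) and the DFT ground truth Y_{i}. In this approach, we define the likelihood function of the property Y from n observations^{44}
\[P(Y\mid \overrightarrow{\theta },\sigma )={(2\pi {\sigma }^{2})}^{-n/2}\exp \left(-\frac{1}{2{\sigma }^{2}}\sum_{i=1}^{n}{\{{Y}_{{\mathrm{i}}}-{\hat{Y}}_{{\mathrm{i}}}(\overrightarrow{\theta })\}}^{2}\right) \tag{6}\]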
where the sum runs over n training samples for the property Y, which is either the projected density of states onto an adsorbate orbital or adsorption energies. For adsorption energies, Y_{i} and \({\hat{Y}}_{{\mathrm{i}}}\) are scalar values with no ambiguity. For projected density of states, it is a vector of paired values, i.e., the one-electron energy of a state and its probability density, thus deserving a clarification. The mean squared residual of the model prediction from Eq. (2) for the surface i is used as \({\{{Y}_{{\mathrm{i}}}-{\hat{Y}}_{{\mathrm{i}}}(\overrightarrow{\theta })\}}^{2}\) in Eq. (6). To compute the transition probability of each MCMC step, we define the sum of the (negative) logarithm of the likelihood functions corresponding to projected density of states onto each adsorbate orbital and binding energies, with a hyperparameter λ adjusting the weight of the two contributing metrics, i.e., \(-{\mathrm{ln}}\,({P}_{\Delta {\mathrm{E}}})-\lambda \sum {\mathrm{ln}}\,({P}_{{\rho }_{{\mathrm{a}}}})\). To optimize this parameter, we varied it on a grid of 1.0e−3, 1.0e−2, 1.0e−1, and 1, and found that 1.0e−2 is the optimal value to obtain the best performance in adsorption energy prediction.
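The weighted objective described above can be written compactly. The sketch below is an illustration under assumptions rather than the Bayeschem code: it uses a Gaussian log-likelihood with the error standard deviations `sigma_e` and `sigma_dos` passed in as plain numbers (how they are treated in the actual sampler is not specified here), and the function names and dictionary layout are hypothetical.

```python
import math


def gaussian_log_likelihood(squared_residuals, sigma):
    """ln P for independent normal errors with standard deviation sigma."""
    n = len(squared_residuals)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sum(squared_residuals) / (2.0 * sigma ** 2))


def dos_squared_residual(dos_dft, dos_model):
    """Mean squared residual between two densities of states on the same grid."""
    return sum((a - b) ** 2 for a, b in zip(dos_dft, dos_model)) / len(dos_dft)


def objective(e_dft, e_model, dos_dft, dos_model, sigma_e, sigma_dos, lam=1.0e-2):
    """Evaluate -ln(P_dE) - lam * sum over orbitals of ln(P_rho_a)."""
    ln_p_e = gaussian_log_likelihood(
        [(y - yhat) ** 2 for y, yhat in zip(e_dft, e_model)], sigma_e)
    ln_p_dos = 0.0
    for orbital in dos_dft:  # one entry per adsorbate orbital
        residuals = [dos_squared_residual(d, m)
                     for d, m in zip(dos_dft[orbital], dos_model[orbital])]
        ln_p_dos += gaussian_log_likelihood(residuals, sigma_dos)
    return -ln_p_e - lam * ln_p_dos


# Made-up data: two surfaces, one orbital, a two-point energy grid.
e_ref = [-1.0, -2.0]
dos_ref = {"2p": [[0.1, 0.2], [0.3, 0.4]]}
perfect = objective(e_ref, e_ref, dos_ref, dos_ref, 1.0, 1.0)
shifted = objective(e_ref, [-1.2, -2.1], dos_ref, dos_ref, 1.0, 1.0)
```

A small λ such as 1.0e−2 lets the density of states regularize the fit while the adsorption energies dominate the acceptance ratio, which matches the stated goal of optimizing energy prediction.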
Data availability
The training data of metal surfaces used for model development are available at the GitHub repository https://github.com/hlxin/bayeschem, while the test data are from the article https://doi.org/10.1039/C7TA01812F.
Code availability
The complete code of Bayeschem is available for public access at the GitHub repository https://github.com/hlxin/bayeschem.
References
Nilsson, A., Pettersson, L. & Nørskov, J. K. Chemical Bonding at Surfaces and Interfaces. (Elsevier, Amsterdam, Oxford, 2008).
Somorjai, G. A. & Li, Y. Introduction to Surface Chemistry and Catalysis (Wiley, Hoboken, 2010).
Calle-Vallejo, F. et al. Number of outer electrons as descriptor for adsorption processes on transition metals and their oxides. Chem. Sci. 4, 1245–1249 (2013).
Tong, Y. Y., Renouprez, A. J., Martin, G. A. & van der Klink, J. J. In Studies in Surface Science and Catalysis (eds Hightower, J. W. et al.) Vol. 101, 901–910 (Elsevier, Amsterdam, 1996).
Hammer, B. & Nørskov, J. K. Electronic factors determining the reactivity of metal surfaces. Surf. Sci. 343, 211–220 (1995).
Vojvodic, A., Nørskov, J. K. & Abild-Pedersen, F. Electronic structure effects in transition metal surface chemistry. Top. Catal. 57, 25–32 (2014).
Xin, H., Vojvodic, A., Voss, J., Nørskov, J. K. & Abild-Pedersen, F. Effects of d-band shape on the surface reactivity of transition-metal alloys. Phys. Rev. B Condens. Matter 89, 115114 (2014).
Ma, X., Li, Z., Achenie, L. E. K. & Xin, H. Machine-Learning-Augmented chemisorption model for CO_{2} electroreduction catalyst screening. J. Phys. Chem. Lett. 6, 3528–3533 (2015).
Li, Z., Wang, S., Chin, W. S., Achenie, L. E. & Xin, H. High-throughput screening of bimetallic catalysts enabled by machine learning. J. Mater. Chem. A 5, 24131–24138 (2017).
Tran, K. & Ulissi, Z. W. Active learning across intermetallics to guide discovery of electrocatalysts for CO_{2} reduction and H_{2} evolution. Nat. Catal. 1, 696–703 (2018).
Palizhati, A., Zhong, W., Tran, K., Back, S. & Ulissi, Z. W. Towards predicting intermetallics surface properties with high-throughput DFT and convolutional neural networks. J. Chem. Inf. Model. 59, 4742–4749 (2019).
Back, S. et al. Convolutional neural network of atomic surface structures to predict binding energies for high-throughput screening of catalysts. J. Phys. Chem. Lett. 10, 4401–4408 (2019).
Andersen, M., Levchenko, S. V., Scheffler, M. & Reuter, K. Beyond scaling relations for the description of catalytic materials. ACS Catal. 9, 2752–2759 (2019).
Gu, G. H. et al. Practical deep-learning representation for fast heterogeneous catalyst screening. J. Phys. Chem. Lett. 11, 3185–3191 (2020).
Montemore, M. M., Nwaokorie, C. F. & Kayode, G. O. General screening of surface alloys for catalysis. Catal. Sci. Technol. 10, 4467–4476 (2020).
Esterhuizen, J. A., Goldsmith, B. R. & Linic, S. Theory-Guided Machine Learning Finds Geometric Structure-Property Relationships for Chemisorption on Subsurface Alloys. Chem 6, 3100–3117 (2020).
Anderson, P. W. Localized magnetic states in metals. Phys. Rev. 124, 41 (1961).
Edwards, D. M. & Newns, D. M. Electron interaction in the band theory of chemisorption. Phys. Lett. A 24, 236–237 (1967).
Grimley, T. B. The indirect interaction between atoms or molecules adsorbed on metals. Proc. Phys. Soc. Lond. 90, 751 (1967).
Bayes, T. & Price, N. LII. an essay towards solving a problem in the doctrine of chances. by the late rev. mr. bayes, f. r. s. communicated by mr. price, in a letter to john canton, a. m. f. r. S. Philos. Trans. R. Soc. Lond. 53, 370–418 (1763).
Hammer, B., Morikawa, Y. & Nørskov, J. K. CO chemisorption at metal surfaces and overlayers. Phys. Rev. Lett. 76, 2141 (1996).
Harrison, W. A. Electronic Structure and the Properties of Solids: The Physics of the Chemical Bond (Dover Publications, New York, 1989).
Santos, E., Quaino, P. & Schmickler, W. Theory of electrocatalysis: hydrogen evolution and more. Phys. Chem. Chem. Phys. 14, 11224–11233 (2012).
Román, A. M., Dudoff, J., Baz, A. & Holewinski, A. Identifying “optimal” electrocatalysts: impact of operating potential and charge transfer model. ACS Catal. 7, 8641–8652 (2017).
Mebane, D. S. et al. Bayesian calibration of thermodynamic models for the uptake of CO2 in supported amine sorbents using ab initio priors. Phys. Chem. Chem. Phys. 15, 4355–4366 (2013).
Wellendorff, J. et al. Density functionals for surface science: Exchange-correlation model development with Bayesian error estimation. Phys. Rev. B Condens. Matter 85, 235149 (2012).
Walker, E. A., Mitchell, D., Terejanu, G. A. & Heyden, A. Identifying active sites of the Water–Gas shift reaction over titania supported platinum catalysts under uncertainty. ACS Catal. 8, 3990–3998 (2018).
Gamerman, D. & Lopes, H. F. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference 2nd Edn (Chapman and Hall/CRC, London, 2006).
Hammer, B. & Nørskov, J. K. In Chemisorption and Reactivity on Supported Clusters and Thin Films: Towards an Understanding of Microscopic Processes in Catalysis (eds Lambert, R. M. & Pacchioni, G.), 285–351 (Springer, Dordrecht, 1997).
Wandelt, K. Photoemission studies of adsorbed oxygen and oxide layers. Surf. Sci. Rep. 2, 1–121 (1982).
Abild-Pedersen, F. et al. Scaling properties of adsorption energies for hydrogen-containing molecules on transition-metal surfaces. Phys. Rev. Lett. 99, 016105 (2007).
Xin, H. & Linic, S. Communications: exceptions to the d-band model of chemisorption on metal surfaces: the dominant role of repulsion between adsorbate states and metal d-states. J. Chem. Phys. 132, 221101–221104 (2010).
Xin, H., Holewinski, A. & Linic, S. Predictive structure–reactivity models for rapid screening of Pt-based multimetallic electrocatalysts for the oxygen reduction reaction. ACS Catal. 2, 12–16 (2012).
Tang, M. T., Peng, H., Lamoureux, P. S., Bajdich, M. & Abild-Pedersen, F. From electricity to fuels: descriptors for C1 selectivity in electrochemical CO_{2} reduction. Appl. Catal. B 279, 119384 (2020).
Strmcnik, D. et al. Improving the hydrogen oxidation reaction rate by promotion of hydroxyl adsorption. Nat. Chem. 5, 300–306 (2013).
Li, Z., Ma, X. & Xin, H. Feature engineering of machine-learning chemisorption models for catalyst design. Catal. Today 280, 232–238 (2017).
Thirumalai, H. & Kitchin, J. R. Investigating the reactivity of single atom alloys using density functional theory. Top. Catal. 61, 462–474 (2018).
Greiner, M. T. et al. Free-atom-like d states in single-atom alloy catalysts. Nat. Chem. 10, 1008–1015 (2018).
Giannozzi, P. et al. QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys. Condens. Matter 21, 395502 (2009).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
Vojvodic, A., Nørskov, J. K. & Abild-Pedersen, F. Electronic structure effects in transition metal surface chemistry. Top. Catal. 57, 25–32 (2014).
Ma, X. & Xin, H. Orbitalwise coordination number for predicting adsorption properties of metal nanocatalysts. Phys. Rev. Lett. 118, 036101 (2017).
Patil, A., Huard, D. & Fonnesbeck, C. J. PyMC: Bayesian stochastic modelling in Python. J. Stat. Softw. 35, 1–81 (2010).
Baggaley, A. W., Sarson, G. R., Shukurov, A., Boys, R. J. & Golightly, A. Bayesian inference for a wavefront model of the neolithization of Europe. Phys. Rev. E 86, 016105 (2012).
Acknowledgements
S.W., H.S.P., and H.X. acknowledge financial support from the NSF CAREER program (CBET-1845531). The computational resources used in this work were provided by Advanced Research Computing at Virginia Polytechnic Institute and State University. H.X. acknowledges insightful discussions with Prof. John Kitchin of Carnegie Mellon University that inspired this work.
Author information
Contributions
S.W. and H.S.P. contributed equally to the work. H.X. supervised the research. S.W. and H.X. conceived the idea and designed the general approach. S.W. and H.S.P. conducted the DFT calculations and coding and performed the detailed analysis. All authors revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks Christopher Bartel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, S., Pillai, H.S. & Xin, H. Bayesian learning of chemisorption for bridging the complexity of electronic descriptors. Nat Commun 11, 6132 (2020). https://doi.org/10.1038/s41467-020-19524-z
DOI: https://doi.org/10.1038/s41467-020-19524-z