Infusing theory into deep learning for interpretable reactivity prediction

Despite recent advances in data acquisition and algorithm development, machine learning (ML) faces tremendous challenges in being adopted for practical catalyst design, largely due to its limited generalizability and poor explainability. Herein, we develop a theory-infused neural network (TinNet) approach that integrates deep learning algorithms with the well-established d-band theory of chemisorption for reactivity prediction of transition-metal surfaces. With simple adsorbates (e.g., *OH, *O, and *N) at active site ensembles as representative descriptor species, we demonstrate that the TinNet is on par with purely data-driven ML methods in prediction performance while being inherently interpretable. Incorporating scientific knowledge of physical interactions into learning from data sheds further light on the nature of chemical bonding and opens up new avenues for ML discovery of novel motifs with desired catalytic properties.

Adsorption energies of simple molecules or their fragments at solid surfaces often serve as reactivity descriptors in heterogeneous catalysis [1]. Rapid discovery of structural motifs with kinetically favorable descriptor values, for example using quantum-chemical calculations, is appealing but remains a daunting task due to the formidable computational cost of accurately solving the many-electron Schrödinger equation. In this respect, the d-band theory of chemisorption pioneered by Hammer and Nørskov [2][3][4][5][6] has been widely used for understanding reactivity trends of d-block metals [7,8] and, to some extent, their compounds [9]. However, its quantitative prediction accuracy using individual d-band characteristics, e.g., the number of d-electrons [10], the d-band center [2], and the d-band upper edge [6,11], is limited due to the perturbative nature of the theoretical framework [12] and the large variation of site properties in high-throughput catalyst screening.
In recent years, machine learning (ML) has emerged as an alternative approach to predicting the chemical reactivity of catalytic sites with either hand-crafted [13][14][15][16][17][18][19][20] or algorithm-derived features [21][22][23][24][25]. By learning correlated interactions of atoms, ions, or molecules with a substrate from a sufficient amount of ab initio data, it is possible to compute adsorption properties orders of magnitude faster than traditional practices and to narrow down candidate materials prior to experimental tests [13, 14, 16-18, 22, 25-28]. A major limitation of black-box ML models, particularly with the resurgent deep learning algorithms [29], is that they can easily learn correlations that look deceptively good on both training and test samples but do not generalize well outside the labeled data. To alleviate this issue, active learning workflows guided by key performance indicators [17,30] and/or model uncertainties [16] have been used to accelerate the exploration of the enormous, essentially infinite, accessible design space. Nevertheless, the need for a very large number of data samples for model development and the difficulty of interpreting model predictions pose tremendous challenges to the adoption of ML for automated search of high-performance catalytic materials.
Herein, we present a theory-infused neural network (TinNet) approach to predicting the chemical reactivity of transition-metal surfaces and, more importantly, to extracting physical insights into the nature of chemical bonding that can be translated into catalyst design strategies. Incorporating scientific knowledge of physical interactions into data-driven ML methods is an emerging area of research in catalysis science [13,18,19,23,24,31,32]. To the best of our knowledge, no hybrid surrogate models of chemisorption have been developed within a fully integrated ML framework that are both reasonably accurate (∼0.1−0.2 eV error) and transferable across diverse samples. By learning from ab initio adsorption properties with deep learning algorithms, e.g., convolutional neural networks, while respecting the well-established d-band theory of chemisorption in the architecture design, the TinNet can be applied to a broad range of d-block metal sites and naturally encodes physical aspects of bonding interactions, inheriting the merits of both worlds. We demonstrate the approach using adsorbed hydroxyl (*OH), a representative descriptor species, at {111}-terminated intermetallics and near-surface alloys, e.g., for finding efficient electrocatalysts for metal-catalyzed O₂ reduction [33], CO₂ reduction [34], and H₂ oxidation in alkaline electrolytes [35]. This framework can be straightforwardly applied to other adsorbates (e.g., *O) or to active site ensembles of multiple bonding atoms, as shown for *N adsorption at {100}-terminated metal surfaces. The TinNet not only achieves prediction performance on par with purely regression-based ML methods, especially for out-of-sample systems with unseen structural and electronic features, but also enables physical interpretation, paving the path toward ML discovery of novel motifs with desired catalytic properties.

Results
Deep network architecture. As illustrated in Fig. 1, the TinNet framework contains two sequential components: a regression module and a theory module. The input to the regression module, built with convolutional neural networks, is the feature representation of the adsorbate-substrate system, which encodes the atomic information and the bonding interactions of each atom with its neighboring environment. The output units from the regression module then serve as unknown parameters in the theory module, which is built upon the d-band theory of chemisorption for predicting adsorption properties of a d-metal site. To ensure model transferability, easily accessible graph features were used, see Fig. 1. At the atom level, each atom or node is represented by a binary vector encoding 9 properties of the atom, e.g., electron affinity, atomic volume, and electronegativity [26,36]. Similarly, each connection or edge encodes the pair interaction between neighboring atoms, including the solid angles swept out by the shared face of Voronoi polyhedra [22] and the kernelized distances [36]. A surface at the optimized bulk geometry with the adsorbate attached to the site of interest is used [37], thus avoiding time-consuming structural optimization when exploring new systems [22]. Neural networks with m convolution-pooling layers are connected to the feature representation sub-module. Within the convolutional layers, multidimensional feature arrays are iteratively updated by convolution (i.e., feature mapping) to extract high-level patterns and by pooling for feature subsampling. The resulting 2D array is flattened into a vector, which is fed into a fully-connected network with k hidden layers and a given number of hidden neurons per layer to capture the complex mapping between the extracted features and output targets. Finally, the output vector from the regression module is passed to the theory module as local parameters, along with user-defined global parameters, if any, that are independent of input features.
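To make the graph-convolution step concrete, below is a minimal NumPy sketch of one convolution pass over a toy adsorbate-substrate graph. It assumes a CGCNN-style gated convolution [36] and Gaussian-kernelized edge distances; the TinNet's own convolution-pooling layers may differ in detail. The function names (`gaussian_expand`, `graph_conv`, `mean_pool`), the 0–8 Å distance grid, and the random weights are illustrative assumptions. Batch normalization and the Voronoi solid-angle edge features are omitted, and a global mean pooling stands in for the pooling step.

```python
import numpy as np

def gaussian_expand(distances, d_min=0.0, d_max=8.0, step=0.2):
    """Kernelize distances on a grid of Gaussians, exp(-(d - c_k)^2 / step^2) (assumed grid)."""
    centers = np.arange(d_min, d_max + step, step)
    d = np.asarray(distances, dtype=float)[..., None]
    return np.exp(-((d - centers) ** 2) / step ** 2)

def softplus(x):
    return np.logaddexp(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_conv(atom_fea, nbr_fea, nbr_idx, W, b):
    """One gated convolution: v_i <- softplus(v_i + sum_j sigmoid(z_ij W_f) * softplus(z_ij W_s)).

    atom_fea: (N, F) node vectors; nbr_fea: (N, M, E) edge vectors;
    nbr_idx: (N, M) neighbor indices; W: (2F + E, 2F); b: (2F,).
    """
    n_nbr = nbr_idx.shape[1]
    F = atom_fea.shape[1]
    center = np.repeat(atom_fea[:, None, :], n_nbr, axis=1)          # (N, M, F)
    z = np.concatenate([center, atom_fea[nbr_idx], nbr_fea], axis=2)  # (N, M, 2F + E)
    gated = z @ W + b                                                 # (N, M, 2F)
    filt, core = gated[..., :F], gated[..., F:]
    messages = (sigmoid(filt) * softplus(core)).sum(axis=1)           # (N, F)
    return softplus(atom_fea + messages)

def mean_pool(atom_fea):
    """Average node vectors into one fixed-length vector for the fully-connected layers."""
    return atom_fea.mean(axis=0)

# Toy graph: 4 atoms, 3 neighbors each, 8 features per atom (random, for shape checking only).
rng = np.random.default_rng(0)
n_atoms, n_nbr, n_fea = 4, 3, 8
nbr_idx = rng.integers(0, n_atoms, size=(n_atoms, n_nbr))
nbr_fea = gaussian_expand(rng.uniform(2.0, 3.5, size=(n_atoms, n_nbr)))
atom_fea = rng.normal(size=(n_atoms, n_fea))
W = rng.normal(scale=0.1, size=(2 * n_fea + nbr_fea.shape[2], 2 * n_fea))
b = np.zeros(2 * n_fea)
h = graph_conv(atom_fea, nbr_fea, nbr_idx, W, b)
site_vector = mean_pool(h)
print(h.shape, site_vector.shape)
```

The sigmoid gate acts as a learned filter on each neighbor's message, so stacking several such layers lets information from farther coordination shells reach the site atom before the vector is passed to the fully-connected layers.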
The physical meaning of each output unit from the regression module is pre-assigned in the TinNet framework. Historically, many factors have been used to correlate with the chemical reactivity of d-block metals, e.g., atomic or bulk properties [10,38], coordination numbers [39,40], and d-band characteristics [2,6]. Mapping physically relevant factors onto adsorption energies with ML algorithms has been explored previously with some success [13-15, 17-19, 21, 25, 31, 32, 41]. Besides the ambiguity of physical interpretation inherent to highly non-linear regression techniques, another major criticism is that some hand-crafted features require fully optimized geometric and/or electronic structures of the clean adsorption site, adding computational overhead to reactivity prediction for new materials. Instead of purely mathematical regression, we resort to the d-band theory of chemisorption with Newns-Anderson-type Hamiltonians [31,42,43] for computing adsorption properties of metal sites. The central idea of the approach is to use the activation outputs from the regression module as unknown, albeit trainable, parameters in the theory module, see Fig. 1. According to the d-band theory of chemisorption, chemical bonding at transition-metal surfaces can be conceptually separated into two consecutive steps [2]. First, the gas-phase adsorbate species, characterized by an orbital $|a\rangle$ at energy $\epsilon_a^0$, is embedded into the delocalized sp-states of the substrate, leading to a resonance state at $\epsilon_a$ with a Lorentzian line shape. Second, the adsorbate resonance interacts with a distribution of localized d-states $\rho_d$, shifting up in energy due to the orbital orthogonalization penalty for satisfying the Pauli exclusion principle (termed Pauli repulsion) and then hybridizing into bonding and antibonding states. The first-step interaction with the sp-band contributes the largest part of chemical bonding, albeit as a constant $\Delta E_0$ for a given adsorbate and site type. The adsorption energy difference from one metal to the next is governed by the second-step contribution $\Delta E_d$, which consists of Pauli repulsion and orbital hybridization [44], as illustrated in Fig. 1. The orthogonalization cost of interacting orbitals, $\Delta E_d^{\mathrm{orth}}$, can be quantified simply as proportional to the coupling integral $V$ and overlap integral $S$, which are related through $S \approx \alpha|V|$ ($\alpha$ being the overlap coefficient) [44]. $V^2$ can be conveniently written as $\beta V_{ad}^2$, in which $\beta$ denotes the coupling coefficient. $V_{ad}^2$ represents the interatomic coupling integral squared when the atoms are aligned along the z-axis; its value for a d-metal relative to Cu has been estimated from linear muffin-tin orbital (LMTO) theory and is readily available in the solid state table [45]. To a first approximation, the d-band hybridization contribution $\Delta E_d^{\mathrm{hyb}}$ can be obtained from one-electron eigenenergies using the Green's function approach [43], with the parameterized Hamiltonian and the density of d-states $\rho_d$ as input. The total adsorption energy is the sum of the energy contributions from the metal sp-states and d-states, $\Delta E = \Delta E_0 + \Delta E_d$, with $\Delta E_d = \Delta E_d^{\mathrm{orth}} + \Delta E_d^{\mathrm{hyb}}$. Another important output of the d-band theory with the Newns-Anderson model is the density of states projected onto the adsorbate orbital, $\rho_a$. Multiple frontier orbitals $1 \cdots i$ of an adsorbate, with their degeneracies, can be included by stacking fully-connected network sub-modules, see Fig. 1. A full account of the theoretical framework was recently presented to bridge the complexity of electronic descriptors in understanding reactivity trends of pristine transition-metal surfaces and their alloys [31].
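The Newns-Anderson quantities named above can be sketched numerically for one adsorbate orbital. This is a minimal illustration, not the TinNet implementation: it assumes a unit-normalized semi-elliptic $\rho_d$ (center $\mu_1$, full width $4\sqrt{\mu_2}$), the chemisorption function $\Delta(\epsilon) = \pi V^2 \rho_d(\epsilon)$, its Hilbert transform $\Lambda(\epsilon)$ in closed form for the semi-ellipse, and a constant sp-band broadening `delta0` added to $\Delta$. The parameter values are made up, and the energy terms $\Delta E_d^{\mathrm{orth}}$ and $\Delta E_d^{\mathrm{hyb}}$ are not computed here.

```python
import numpy as np

def semi_ellipse(eps, mu1, mu2):
    """Unit-normalized semi-elliptic d-band: center eps_d = mu1, full width W_d = 4*sqrt(mu2)."""
    half = 2.0 * np.sqrt(mu2)  # half width W_d / 2
    inside = np.clip(half ** 2 - (eps - mu1) ** 2, 0.0, None)
    return 2.0 / (np.pi * half ** 2) * np.sqrt(inside)

def hilbert_semi_ellipse(eps, mu1, mu2):
    """Closed-form principal value of the integral of rho_d(e') / (eps - e') over e'."""
    half = 2.0 * np.sqrt(mu2)
    x = eps - mu1
    outside = np.sqrt(np.clip(x ** 2 - half ** 2, 0.0, None))  # zero inside the band
    return 2.0 / half ** 2 * (x - np.sign(x) * outside)

def newns_anderson(eps, eps_a, beta, V2_ad, delta0, mu1, mu2):
    """Chemisorption functions Delta, Lambda and adsorbate-projected DOS rho_a for one orbital."""
    V2 = beta * V2_ad  # coupling integral squared, V^2 = beta * V_ad^2
    Delta = np.pi * V2 * semi_ellipse(eps, mu1, mu2)
    Lam = V2 * hilbert_semi_ellipse(eps, mu1, mu2)
    width = Delta + delta0  # delta0: assumed constant sp-band broadening (Lorentzian)
    rho_a = width / np.pi / ((eps - eps_a - Lam) ** 2 + width ** 2)
    return Delta, Lam, rho_a

# Illustrative (not fitted) parameters; energies in eV relative to the Fermi level.
eps = np.linspace(-40.0, 30.0, 14001)
de = eps[1] - eps[0]
mu1, mu2 = -2.0, 1.0  # eps_d = -2 eV, W_d = 4 eV
rho_d = semi_ellipse(eps, mu1, mu2)
Delta, Lam, rho_a = newns_anderson(eps, eps_a=-5.0, beta=1.0, V2_ad=2.0,
                                   delta0=0.5, mu1=mu1, mu2=mu2)
print(f"integral of rho_d: {rho_d.sum() * de:.3f}, integral of rho_a: {rho_a.sum() * de:.3f}")
```

Because $\Delta$ and $\Lambda$ come from the same analytic d-band Green's function, $\rho_a$ keeps unit spectral weight (up to the Lorentzian tails cut off by the finite grid), which is a convenient sanity check on any implementation.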
A TinNet model using the architecture in Fig. 1 can be considered a complex function mapping the graph feature representation of an adsorbate-substrate system to adsorption properties, i.e., the adsorption energy $\Delta E$, the density of states projected onto the adsorbate frontier orbital(s) $\rho_a^1 \cdots \rho_a^i$, and the d-band moments $\mu_1 \cdots \mu_j$ of the adsorption site. This mapping is parameterized by the learnable weights of convolutional filters and neural connections in the regression module, which is subsequently regularized by the theory module. TinNet models are trained by minimizing the sum-of-squares error loss function $J$ between model-predicted properties and DFT-calculated ground truths in the output layer, see Fig. 1. In the current TinNet implementation, two low-order moments ($\mu_1$, $\mu_2$) are embedded in the network for constructing a semi-elliptic $\rho_d$ centered at $\epsilon_d$ ($\mu_1$, the first moment of the distribution relative to the Fermi level) with full width $W_d = 4\sqrt{\mu_2}$ ($\mu_2$ being the second moment of the distribution relative to its center). This simplified distribution is sufficient for computing orbital hybridization energies compared with self-consistent, DFT-calculated densities of d-states for transition-metal surfaces [11]. Higher-order moments of a distribution can be included using moment methods, if necessary [6,46]. The constrained optimization is performed using backpropagation and stochastic gradient descent (SGD). The hierarchical neural networks [26,36] in Fig. 1 are implemented in the PyTorch framework. During optimization, the output activations from the fully-connected layers in the regression module are passed directly into the theory module as a vector. The vector elements are partitioned into different parts and assigned to the d-band moments of the site atoms and to the interaction parameters of individual adsorbate frontier orbitals with the metal sp- and d-states. The binding energy of the adsorbate and the density of states projected onto adsorbate orbitals can then be computed from the theory module. For comparison purposes, the fully-connected neural network (FCNN) and crystal graph convolutional neural network (CGCNN) [26,36] [...] and near-surface alloys ($A'@A_{\mathrm{ML}}$, $A$-$B@A_{\mathrm{ML}}$, $A_3B@A_{\mathrm{ML}}$, $A@A_2B_2$, and $A@AB_3$), where $A$ (or $A'$) represents 10 fcc/hcp metals and $B$ covers 26 d-metals across the periodic table; see the "Methods" section for computational details.
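The partitioning of the regression output and the composite loss can be sketched as follows. The vector layout (site moments $\mu_1, \mu_2$ first, then four parameters $\Delta_0$, $\epsilon_a$, $\alpha$, $\beta$ per frontier orbital) and the softplus used to keep $\mu_2$, $\Delta_0$, $\alpha$, and $\beta$ positive are hypothetical choices for illustration. The loss follows the formula given in the Methods, MSE(∆E) + MSE(µ1) + MSE(2√µ2) + λ[MSE(ρ3σ) + MSE(ρ1π) + MSE(ρ4σ*)] with λ = 0.01; the predicted $\Delta E$ and $\rho_a$ are assumed to have already been computed by the theory module.

```python
import numpy as np

ORBITALS = ("3sigma", "1pi", "4sigma*")            # *OH frontier orbitals used in the text
PARAMS = ("delta0", "eps_a", "alpha", "beta")      # per-orbital interaction parameters

def softplus(x):
    return np.logaddexp(0.0, x)

def split_output(out):
    """Map one raw output vector to named theory-module parameters (hypothetical layout)."""
    out = np.asarray(out, dtype=float)
    expected = 2 + len(ORBITALS) * len(PARAMS)
    if out.shape != (expected,):
        raise ValueError(f"expected {expected} outputs, got shape {out.shape}")
    params = {"mu1": out[0], "mu2": softplus(out[1])}  # second moment must be positive
    per_orbital = out[2:].reshape(len(ORBITALS), len(PARAMS))
    for name, row in zip(ORBITALS, per_orbital):
        p = dict(zip(PARAMS, row))
        for key in ("delta0", "alpha", "beta"):
            p[key] = softplus(p[key])
        params[name] = p
    return params

def mse(pred, true):
    return float(np.mean((np.asarray(pred) - np.asarray(true)) ** 2))

def tinnet_loss(pred, true, lam=0.01):
    """MSE(dE) + MSE(mu1) + MSE(2*sqrt(mu2)) + lam * sum over orbitals of MSE(rho_a)."""
    loss = mse(pred["dE"], true["dE"]) + mse(pred["mu1"], true["mu1"])
    loss += mse(2 * np.sqrt(pred["mu2"]), 2 * np.sqrt(true["mu2"]))
    loss += lam * sum(mse(pred["rho"][k], true["rho"][k]) for k in ORBITALS)
    return loss

raw = np.zeros(2 + len(ORBITALS) * len(PARAMS))
params = split_output(raw)
true = {"dE": np.array([-0.5, 0.2]), "mu1": np.array([-2.1, -1.5]),
        "mu2": np.array([1.0, 0.8]), "rho": {k: np.zeros(5) for k in ORBITALS}}
pred = {**true, "dE": true["dE"] + 0.1}
print(tinnet_loss(pred, true))  # only the adsorption-energy term is nonzero here
```

The small weight λ keeps the density-of-states terms, which are sampled on many energy points, from dominating the energy and moment terms.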
*OH is adsorbed at the atop site with the O−H bond tilted toward the bridge site. The straight-up *OH adsorption configuration is less favorable than tilted ones on transition metals because of the directional 1π-orbital interactions with metal d-states. In this study, we did not include other local minima of tilted *OH adsorption configurations. Bonding angles are also not included in the CGCNN feature representation. Note that other frameworks built upon CGCNN, e.g., iCGCNN [47] and ALIGNN [48], have implemented angle features, which would be useful if multiple local minima exist in the dataset. Compared with previous studies that include different surface terminations and adsorption sites [17,26], we focus on a relatively small but representative dataset [14,49,50]. For *OH, we explicitly included the 3σ, 1π, and 4σ* frontier molecular orbitals in the network design. To rigorously evaluate the prediction performance of ML models with a balanced bias/variance trade-off, we adopted k-fold cross-validation (k = 10) to optimize hyper-parameters, including the learning rate, number of atomic features, number of convolution-pooling layers, number of hidden layers, and number of hidden neurons in each layer [51]. A validation set (10%) is randomly split off the training set for early stopping of the optimization as a form of regularization to avoid overfitting. In Fig. 2(a), we present the learning curves of the FCNN, CGCNN, and TinNet models, in which the mean absolute error (MAE) of prediction and its standard deviation are estimated by the nested 10-fold cross-validation approach [52] (see Supplementary Table 1 for the hyper-parameters of each model scheme). A diagram of the TinNet model architecture and hyper-parameters for *OH is included in Supplementary Fig. 2 to further clarify the flow of graph features to target properties. In the data-scarce region, the FCNN showed relatively accurate and stable prediction of *OH adsorption energies compared with the CGCNN and TinNet models because it employs physics-based features (e.g., orbital-wise coordination numbers [13]) rather than low-level graph features.
As the number of training samples increases, the TinNet attains an MAE of 0.118 eV with a standard deviation of 0.022 eV, outperforming the FCNN (0.152 ± 0.015 eV) and on par with the CGCNN (0.114 ± 0.025 eV). To make a clear benchmark comparison of the TinNet/CGCNN/FCNN models in this work and previously published ML models of *OH chemisorption on alloy surfaces, we have tabulated their feature type, learning algorithm, number of tuning parameters, number of samples, data range, and prediction errors (MAE and RMSE) in Table 1. Among these methods, the FCNN and CGCNN models rely on data to learn the underlying correlations between a site structure and the *OH adsorption energy in a purely regression fashion, while the TinNet embeds well-established physics, i.e., the Newns-Anderson model within the d-band theory of chemisorption, into the network architecture. Compared to the Bayeschem model [31] trained with pristine transition-metal data (Supplementary Fig. S7), the significant improvement in prediction accuracy (MAE: 0.27 eV for Bayeschem vs. 0.118 eV for TinNet) can be attributed to the design of the TinNet architecture, which allows the algorithms to learn local interaction parameters of individual adsorbate frontier orbitals with the metal sp- and d-states from data samples of diverse site coordination environments. In contrast to ML models with hand-crafted features [13,14,21,25,31,41], the TinNet does not need the electronic structure of test samples for prediction. This design of the network architecture, as seen in Fig. 1, further improves the transferability of the TinNet framework and signifies its potential as a robust ML approach for guiding catalyst design beyond labeled material structures.
Model validation with single-atom alloys. To test the prediction performance of the final models on unseen data, we chose single-atom alloys (SAAs) [53] as an out-of-sample material system that was not used in model training and cross-validation. This emerging class of materials has received substantial interest in recent years because its structural simplicity allows catalytic properties to be controlled at the atomic level. Here, we calculated *OH adsorption at the atop site of SAAs with Cu, Ag, or Au as the host and 26 d-metals as the single-atom active site. Because of the limited overlap between the d-state wavefunctions of the active d-metal and those of the inert host, most of these SAAs exhibit previously unseen free-atom-like d-states [54,55], resembling the localized electronic structure of homogeneous molecular catalysts. For the specific case of the Cu₁/Ag(111) single-atom alloy, recent spectroscopic measurements validated the formation of such peaky d-states and their effect on the surface reactivity of Cu₁ sites [55]. Using the TinNet-predicted interaction parameters ($\Delta_0^i$, $\epsilon_a^i$, $\alpha_i$, and $\beta_i$, where $i$ denotes an adsorbate frontier orbital) of Cu₁/Ag(111) from the regression module, Fig. 3(a) compares the TinNet-predicted density of states projected onto the OH 3σ, 1π, and 4σ* orbitals with DFT-calculated distributions. The d-state distribution $\rho_d$ of a Cu₁ site and its Hilbert transform, along with the adsorbate line $y = (\epsilon - \epsilon_a)/(\pi \beta V_{ad}^2)$ for each orbital, are plotted for the graphical solution of the Newns-Anderson model [43]. The intersections in Fig. 3(a) represent either the adsorbate-substrate bonding and antibonding states (2 localized roots) for 1π or the resonance state (1 localized root) for 3σ and 4σ*. Given the simplicity of the model, the clearly captured strong-coupling and weak-coupling signatures for the 1π and 3σ/4σ* orbitals, respectively, justify the use of the TinNet for qualitatively predicting the electronic structure of an adsorbate-substrate system. In addition, the performance of FCNN, CGCNN, and TinNet in predicting *OH adsorption energies is compared in Fig. 3(b) and Supplementary Fig. 4. Using the 10-fold cross-validated final models, the TinNet (MAE: 0.161 ± 0.008 eV) has a lower prediction error than the FCNN (MAE: 0.193 ± 0.026 eV) and CGCNN (MAE: 0.179 ± 0.029 eV), particularly in the region involving highly active early transition metals. Supplementary Fig. 5 shows the DFT-calculated vs. model-predicted d-band center $\epsilon_d$ and full width $W_d$ (MAE: 0.13 eV and 0.37 eV, respectively) that were used to construct the semi-ellipse representing the d-state distribution $\rho_d$ projected onto a metal site. As an additional metric of model performance, the MAEs of the TinNet-predicted projected densities of states $\rho_a^i$ are 0.0205, 0.0166, and 0.0187 eV⁻¹ for the OH 3σ, 1π, and 4σ* orbitals, respectively. To better understand the origin of the improved generalization performance, we re-trained the FCNN and CGCNN models using multi-task learning (MTL), i.e., including both the adsorption energy and the d-band moments of the adsorption site in the loss function. We found that the generalization error of the adsorption energy prediction for SAAs remains similar or slightly worsens for the FCNN (MAE: 0.198 ± 0.039 eV) and CGCNN (MAE: 0.185 ± 0.029 eV). The improved generalization performance can therefore be attributed to the solid physical basis of the TinNet framework for property prediction of out-of-sample systems with unseen structural and electronic features, rather than to access to more electronic structure information.
It is important to note that optimizing hyper-parameters in deep learning architectures and training deployable models with a rigorous validation procedure is quite expensive even with current GPU architectures (10²−10³ GPU hours). Future development of the TinNet framework should enable transfer learning of trained model parameters to other adsorbate systems. For adsorbates with an identical set of frontier orbitals, e.g., the atomic p_x, p_y, and p_z orbitals of C, N, and O adatoms, it is natural to start from past fittings, since the output vectors from the regression module have the same length and the same physical meaning of individual adsorbate frontier orbitals interacting with the metal sp- and d-states. For adsorbates with distinct sets of frontier orbitals, e.g., O, OH, and OOH, it is generally accepted that the underlying physics or factors governing the interaction strength of those adsorbates with alloy surfaces are universal. In that scenario, convolution filter parameters that extract high-level feature representations of adsorption sites can be preloaded to speed up optimization.
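The graphical solution of the Newns-Anderson model used for the Cu₁/Ag(111) case can be mimicked by counting localized states. Outside the d-band, $\Delta(\epsilon) = 0$, so localized states sit at the roots of $\epsilon - \epsilon_a - \Lambda(\epsilon) = 0$; dividing by $\pi V^2$ turns this into the intersection of the adsorbate line with the Hilbert transform of $\rho_d$. The sketch below assumes a unit semi-elliptic $\rho_d$ and uses made-up parameters, not the fitted Cu₁/Ag(111) values; `localized_states` is a hypothetical helper name.

```python
import numpy as np

def hilbert_semi_ellipse(eps, eps_d, w_d):
    """Closed-form principal-value integral of rho_d(e') / (eps - e') for a unit semi-ellipse."""
    half = w_d / 2.0
    x = eps - eps_d
    outside = np.sqrt(np.clip(x ** 2 - half ** 2, 0.0, None))  # zero inside the band
    return 2.0 / half ** 2 * (x - np.sign(x) * outside)

def localized_states(eps_a, V2, eps_d, w_d, e_min=-20.0, e_max=20.0, n=200001):
    """Energies where eps - eps_a - Lambda(eps) changes sign outside the d-band (Delta = 0 there)."""
    eps = np.linspace(e_min, e_max, n)
    f = eps - eps_a - V2 * hilbert_semi_ellipse(eps, eps_d, w_d)
    outside = np.abs(eps - eps_d) > w_d / 2.0
    crossing = (np.sign(f[:-1]) != np.sign(f[1:])) & outside[:-1] & outside[1:]
    return eps[:-1][crossing]

# Illustrative parameters (eV), not fitted values for Cu1/Ag(111).
strong = localized_states(eps_a=-2.5, V2=1.0, eps_d=-2.0, w_d=0.5)  # narrow, free-atom-like d-band
weak = localized_states(eps_a=-8.0, V2=0.5, eps_d=-2.0, w_d=4.0)    # broad d-band, level far below
print(f"strong coupling: {len(strong)} localized roots; weak coupling: {len(weak)}")
```

With a narrow d-band close to the adsorbate level, two roots split off (bonding below and antibonding above the band), whereas a weakly coupled level far below a broad band yields a single localized root, matching the two signatures described for 1π and 3σ/4σ*.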

Discussion
Model interpretability. A significant advantage of the TinNet framework is the model interpretability empowered by the theory module. To provide physical insights into the reactivity trend of *OH at transition-metal surfaces, we decompose the d-contributed adsorption energy $\Delta E_d$ into Pauli repulsion and orbital hybridization, see Fig. 4(a). Not surprisingly, orbital hybridization dominates the overall trend of *OH adsorption energies, in agreement with the Bayesian chemisorption model developed for pure metals [31]. In the strong-binding region, the Pauli repulsion due to orbital orthogonalization involving less than half-filled d-shells is expected to be negligible, which is well captured by the TinNet. However, it becomes prominent for late transition metals with completely or nearly filled d-states [3,33]. Although this phenomenon has been recognized, leveraging this physical aspect of chemical bonding for catalyst design, in addition to strain [5] and ligand [4] effects, has not been realized. For the diverse sites considered here, neither the d-band center nor the upper edge is linearly correlated with the *OH adsorption energy (R² = 0.64 and 0.49, respectively), see Supplementary Fig. 6. We argue that a linear descriptor of this kind might not exist for such a diverse dataset. Interestingly, the TinNet-predicted coupling integral squared $V^2$, i.e., $\beta V_{ad}^2$, correlates very well with the orbital hybridization energies for the 3σ (R² ≈ 0.93), 1π (R² ≈ 0.87), and 4σ* (R² ≈ 0.89) orbitals, see Fig. 4(b). This result showcases the ability of the TinNet framework to provide a detailed physical interpretation of the reactivity trend of metal sites that is inaccessible with purely regression-based models.
TinNet models for other adsorbates/facets. To demonstrate the approach for other adsorbates and facets, we developed TinNet models for *O at the atop site of {111}-terminated bimetallic alloy surfaces and *N at the hollow site of {100}-terminated ternary alloy surfaces. The 10-fold cross-validated MAEs are 0.147 eV and 0.116 eV for *O and *N, respectively. We use the same set of alloy surfaces for *O as for the *OH models (748 in total). For *N adsorbed at the four-fold hollow site, we used 329 {100}-terminated Pt-based ternary alloy surfaces (Pt₃M and Pt₂M₂ intermetallics with M dopants at different positions in the top two layers; see "Methods" for details). *N adsorption at metal sites represents an important reactivity descriptor for ammonia electro-oxidation as the anode reaction in direct ammonia fuel cells [56][57][58]. We note that the surface has a coadsorbed *OH spectator species in all samples. Our previous study has shown that *OH plays a crucial role in stabilizing *NHx species under relevant operating conditions [59]. This dataset therefore showcases the inclusion of adsorbate-adsorbate interactions in developing machine learning models. In the current TinNet implementation, for an N-atom site ensemble, the regression module automatically allocates 2N output neurons for the first and second moments of the d-state distributions of the site atoms. The d-state distribution of the adsorption site is then represented by a superposition of the individual d-DOS constructs, e.g., semi-elliptic functions. The other output neurons, representing interaction parameters of the adsorbate frontier orbitals with the metal sp- and d-states, have the same dimension and physical meaning for adsorption sites of different atom ensembles.
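The superposition of per-atom d-DOS constructs for an N-atom ensemble can be sketched as below. Two choices here are assumptions rather than details stated in the text: the 2N outputs are ordered as (µ1, µ2) pairs per atom, and the semi-ellipses are averaged (equal weights) so that the site DOS stays unit-normalized. The four-atom moment values are hypothetical.

```python
import numpy as np

def semi_ellipse(eps, center, mu2):
    """Unit-normalized semi-ellipse with full width 4*sqrt(mu2)."""
    half = 2.0 * np.sqrt(mu2)
    inside = np.clip(half ** 2 - (eps - center) ** 2, 0.0, None)
    return 2.0 / (np.pi * half ** 2) * np.sqrt(inside)

def ensemble_d_dos(eps, outputs):
    """Superpose per-atom semi-ellipses from 2N outputs ordered (mu1_1, mu2_1, ..., mu1_N, mu2_N)."""
    moments = np.asarray(outputs, dtype=float).reshape(-1, 2)
    total = sum(semi_ellipse(eps, m1, m2) for m1, m2 in moments)
    return total / len(moments), moments  # averaged so the site DOS integrates to 1

# Hypothetical outputs for a four-fold hollow site (N = 4 -> 8 output neurons).
outputs = [-2.5, 0.5, -2.0, 0.6, -1.8, 0.4, -3.0, 0.8]
eps = np.linspace(-10.0, 5.0, 15001)
de = eps[1] - eps[0]
rho_site, moments = ensemble_d_dos(eps, outputs)
center = (eps * rho_site).sum() * de
print(f"site d-band center: {center:.3f} eV")
```

With equal weights, the site d-band center is the mean of the atomic centers and the site second moment is the mean atomic $\mu_2$ plus the spread of the atomic centers, so ensemble heterogeneity broadens the effective d-band.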
This study highlights the importance of frontier molecular orbital theory, electronic structure methods, and deep learning algorithms in developing interpretable ML models of chemical bonding. Infusing theory into ML fueled with ab initio adsorption properties will eventually help us better understand the fundamentals of linear energy relationships [60,61] and devise strategies to overcome such constraints in catalysis [62]. For example, electrolyte molecules or ions can exert an additional coupling term on the adsorbate energy level $\epsilon_a$, often via hydrogen bonding [63,64], which could be leveraged to break the adsorption-energy scaling relations for hydrogen-containing species. Indeed, there is evidence that adding a co-solvent or ionic species to the bulk electrolyte has a positive effect on stabilizing charge-transfer intermediates in metal-air batteries [65], ammonia synthesis [66], CO₂ reduction [67], and oxygen evolution [68]. This physical aspect of chemical bonding can be built into the TinNet for screening improved catalytic systems with consideration of electrolyte choices. Relatedly, all the structures used in this study are DFT-optimized local minima. If forces were accessible in the TinNet framework, informing the learning algorithms of this physical constraint (i.e., that the forces are below a threshold) could further constrain deep learning models and improve their transferability. Beyond better estimation of adsorption energetics, which are extensively explored in catalysis, activation barriers, adsorbate-adsorbate interactions, and surface segregation energies are also important for predicting reaction kinetics and site stability prior to catalyst screening. The framework proposed here is a step in that direction.
To conclude, the theory-infused neural network (TinNet) proposed herein represents a generalized ML approach to predicting the chemical reactivity of solid surfaces with atomically tailored active sites. Importantly, physical insights from learning data come naturally with the TinNet and cannot be obtained with purely regression-based methods, irrespective of feature representations. We demonstrate the approach using simple adsorbates (e.g., *OH, *O, and *N) at active site ensembles as specific cases; it can also be transferred directly to other descriptor species and to nanostructures of different site geometries or electronic complexity, e.g., metal compounds with strongly correlated d electrons, paving the path toward interpretable ML discovery of novel motifs with desired catalytic properties. This study encapsulates all of the important ingredients of the ML approach and can be straightforwardly extended to generic models or principles, where the parameters represented by the output neurons should be treated on a case-by-case basis.

Methods
DFT calculations. Spin-polarized DFT calculations of *OH and *O adsorption systems were performed with Quantum ESPRESSO [69] using ultrasoft pseudopotentials. Exchange-correlation was approximated within the generalized gradient approximation (GGA) with the Perdew-Burke-Ernzerhof (PBE) functional [70]. {111}-terminated metal surfaces were simulated using (2 × 2) supercells with 4 layers and a vacuum of 15 Å between periodic images. The bottom two layers were fixed, while the top two layers and adsorbates were allowed to relax until the forces fell below 0.1 eV/Å. A plane-wave energy cutoff of 500 eV was used. The *N adsorption systems consist of {100}-terminated Pt-based bimetallic surfaces doped with a third element. These include Pt3M and PtM bimetallics, where M can be any of the transition metals, while the dopants cover 15 elements: Fe, Zn, Cu, Co, Ni, Rh, Pd, Ag, Ir, Pt, Au, Ru, Mo, Cr, and W. Spin-polarized DFT calculations were performed with the Vienna Ab initio Simulation Package (VASP) using projector augmented wave pseudopotentials. Exchange-correlation was approximated within the GGA with the revised Perdew-Burke-Ernzerhof (RPBE) functional [71]. A plane-wave energy cutoff of 450 eV was used. The {100}-terminated alloy surfaces were modeled using (2 × 2) supercells with 4 layers and a vacuum of 15 Å between periodic images. The bottom two layers were fixed, while the top two layers and adsorbates were allowed to relax until the forces fell below 0.05 eV/Å. To account for the effect of aqueous solvation on adsorption energies, an implicit solvation model was employed through the VASPsol package [72]. All of the Pt-based alloy surfaces have coadsorbed *OH ($\theta_{\mathrm{OH}}$ = 1/4 ML) as a spectator species. Doping is simulated by replacing one of the metal atoms in the top two layers with a dopant metal. For both {111} and {100} terminations, a 6×6×1 Monkhorst-Pack mesh was used to sample the Brillouin zone, while only the Gamma point was used for molecules and radicals. The Methfessel-Paxton smearing scheme was used with a smearing parameter of 0.1 eV for adsorbate systems and 0.001 eV for molecules. Electronic energies are extrapolated to $k_BT$ = 0 eV. The projected atomic and molecular densities of states were obtained by projecting the eigenvectors of the full system, computed at a denser k-point sampling (12×12×1) with an energy spacing of 0.01 eV, onto those of the constituent subsystems, as determined by gas-phase calculations.
FCNN models. The fully-connected neural network (FCNN) is the simplest type of artificial neural network, with no cycles between node connections. The input features of the FCNN include atomic, surface, and bulk features, which represent characteristics of the adsorption site, the environment of the adsorption site, and properties of the entire crystal, respectively. The "BulkFingerprintGenerator.bulk_average" module of the CatLearn package [37] is used to extract properties of the adsorption site, the first two surface layers, and the bulk as atomic, surface, and bulk features, respectively. All properties missing in the module are set to zero. In addition to these properties, the atomic features also contain the Pauling electronegativity (χ0), $V_{ad}^2$, and atomic radius (r0), while the surface features include the local Pauling electronegativity (χ) and orbital-wise coordination numbers (CN_s and CN_d) [40].
Hyper-parameter optimization.
In this study, five hyper-parameters, namely the learning rate (lr), the number of hidden layers (n_h), the number of neurons in each hidden layer (h_fea_len), the number of convolutional layers (n_conv), and the length of the atomic feature vector entering the convolution (atom_fea_len), were tuned by random search using the Ray package [51]. lr is sampled from a log-uniform distribution between 0.0001 and 1, while atom_fea_len, n_conv, h_fea_len, and n_h are random integers between 16 and 112, 1 and 10, 32 and 224, and 1 and 10, respectively. For each model, 150 randomly sampled combinations are used as hyper-parameter sets for training, and each set is evaluated with regular 10-fold cross-validation (CV). The data set is first divided into 10 folds, each of which serves once as the test set. The remaining 90% of the data set is divided into 10 folds again, and one randomly chosen fold is used as the validation set for early stopping of the training procedure. Supplementary Fig. 1 illustrates the hyper-parameter optimization procedure.
The AdamW optimizer, the mean squared error (MSE) loss, and Softplus, Sigmoid, and ReLU activation functions are used in training, with a batch size of 64 and a weight decay of 0.0001. If the validation loss does not improve within 1,000 epochs, the model with the minimal validation loss is selected as the final model for that fold. For FCNN and CGCNN, the loss function contains only MSE(∆E), whereas for TinNet it is constructed as
$$\mathcal{L} = \mathrm{MSE}(\Delta E) + \mathrm{MSE}(\mu_1) + \mathrm{MSE}(2\sqrt{\mu_2}) + \lambda\left[\mathrm{MSE}(\rho_{3\sigma}) + \mathrm{MSE}(\rho_{1\pi}) + \mathrm{MSE}(\rho_{4\sigma^*})\right].$$
The energy contribution from the sp-electrons ($\Delta E_0$) and the weight of the density-of-states terms ($\lambda$) are set to −2.69 eV and 0.01, respectively, as derived from Bayesian learning models [31]. The final loss of a hyper-parameter set is the average of its 10 test losses. The optimized hyper-parameter set with the minimal loss for each algorithm is shown in Supplementary Table 1, and these sets are used for all subsequent ML model training. Details of the CGCNN model settings can be found in refs [22,36].
Learning curve. Nested 10-fold cross-validation on different proportions of the dataset (from 5% to 100% in 5% increments) was used to evaluate model performance. For each proportion, the dataset is divided into 10 folds: one fold is used as the test set, another fold as the validation set, and the remaining eight folds as the training set. Supplementary Fig. 3 illustrates the procedure for generating the learning curve with the nested 10-fold cross-validation approach. The 90 models whose test set differs from the validation set are used to evaluate model performance, while the 10 models whose test set coincides with the validation set are used as final models for predicting unknown systems. The average wall time required by each method to train a model for a given data split is shown in Supplementary Table 2.
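The fold bookkeeping behind the nested 10-fold learning curve can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `learning_curve_splits`, the random subsampling for each proportion, and the strided assignment of indices to folds are all hypothetical. Each ordered pair (test fold, validation fold) defines one model, so the 10 × 10 pairs split into 90 evaluation models and 10 final models; in this sketch the final models train on the nine folds other than their shared test/validation fold.

```python
import random


def learning_curve_splits(n_samples, fraction, k=10, seed=0):
    """Enumerate (train, val, test) index lists for one learning-curve proportion.

    Hypothetical helper: a random subset of size fraction * n_samples is split
    into k folds, and every ordered (test fold, validation fold) pair is one model.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    subset = indices[: round(fraction * n_samples)]
    folds = [subset[i::k] for i in range(k)]  # strided fold assignment (assumption)

    eval_models, final_models = [], []
    for t in range(k):  # test fold
        for v in range(k):  # validation fold used for early stopping
            train = [j for f, fold in enumerate(folds) if f not in (t, v) for j in fold]
            split = (train, folds[v], folds[t])
            if t == v:
                final_models.append(split)  # kept for predicting unknown systems
            else:
                eval_models.append(split)  # one of the k * (k - 1) scoring models
    return eval_models, final_models


# Proportions from 5% to 100% in 5% steps.
fractions = [round(0.05 * m, 2) for m in range(1, 21)]
eval_models, final_models = learning_curve_splits(200, 0.5)
print(len(fractions), len(eval_models), len(final_models))  # 20 90 10
```

Calling the helper once per entry in `fractions` gives the 20 points of a learning curve, each averaged over its 90 evaluation models.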

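The random search described under Hyper-parameter optimization can be sketched with the standard library alone. Each draw samples lr log-uniformly from [0.0001, 1] and the four integer hyper-parameters uniformly from the quoted ranges, and `cv_splits` generates the regular 10-fold CV splits with one randomly chosen inner fold held out for early stopping. This is a stand-in for the Ray-based search [51], not a reproduction of it; the function names, the inclusive integer bounds, and the strided fold assignment are assumptions.

```python
import math
import random

# Integer search ranges quoted in the text (bounds assumed inclusive).
INT_RANGES = {"atom_fea_len": (16, 112), "n_conv": (1, 10), "h_fea_len": (32, 224), "n_h": (1, 10)}
FIXED = {"batch_size": 64, "weight_decay": 1e-4}  # AdamW settings held constant


def sample_config(rng):
    """Draw one hyper-parameter set: log-uniform lr in [1e-4, 1], uniform integers otherwise."""
    config = {"lr": math.exp(rng.uniform(math.log(1e-4), math.log(1.0)))}
    for name, (low, high) in INT_RANGES.items():
        config[name] = rng.randint(low, high)  # randint includes both ends
    config.update(FIXED)
    return config


def cv_splits(n_samples, rng, k=10):
    """Yield (train, val, test) for regular k-fold CV with a random inner validation fold."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    outer = [indices[i::k] for i in range(k)]
    for t in range(k):
        rest = [j for f, fold in enumerate(outer) if f != t for j in fold]  # remaining ~90%
        inner = [rest[m::k] for m in range(k)]
        v = rng.randrange(k)  # randomly chosen inner fold for early stopping
        train = [j for m, fold in enumerate(inner) if m != v for j in fold]
        yield train, inner[v], outer[t]


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(150)]
splits = list(cv_splits(500, rng))
```

Sampling lr in log space spreads the 150 trials evenly across four orders of magnitude instead of concentrating them near 1.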
Figure 1. Schematic illustration of the theory-infused neural network (TinNet) for interpretable reactivity prediction of transition-metal surfaces. The information flows from the graph representation of a given adsorbate-substrate system to the adsorption energy ∆E, the projected densities of states onto the adsorbate frontier orbital(s) $\rho_a^1, \dots, \rho_a^i$, and the d-band moments $\mu_1, \dots, \mu_j$ of the adsorption site.
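The outputs named in this caption are the quantities supervised by the TinNet loss given in the Methods. A minimal NumPy sketch of that composite loss follows, together with a helper that extracts the d-band center and second moment from a d-projected DOS. It assumes that µ2 is the second central moment, that each projected DOS is sampled on a common uniform energy grid, and that predictions and targets are passed as dictionaries with hypothetical keys; λ = 0.01 follows the text.

```python
import numpy as np


def d_band_moments(energies, rho_d):
    """Center (first moment) and second central moment of a d-projected DOS on a uniform grid."""
    weight = rho_d / rho_d.sum()  # the uniform grid spacing cancels in these ratios
    mu1 = float(np.sum(energies * weight))
    mu2 = float(np.sum((energies - mu1) ** 2 * weight))
    return mu1, mu2


def mse(pred, target):
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))


def tinnet_loss(pred, target, lam=0.01):
    """MSE(dE) + MSE(mu1) + MSE(2 sqrt(mu2)) + lam * [MSE(rho_3sigma) + MSE(rho_1pi) + MSE(rho_4sigma*)]."""
    loss = mse(pred["dE"], target["dE"]) + mse(pred["mu1"], target["mu1"])
    loss += mse(2 * np.sqrt(pred["mu2"]), 2 * np.sqrt(target["mu2"]))
    dos_keys = ("rho_3sigma", "rho_1pi", "rho_4sigma_star")  # hypothetical key names
    loss += lam * sum(mse(pred[key], target[key]) for key in dos_keys)
    return loss


# Semi-elliptic test band centered at -2 eV with half-width 1 eV.
e = np.linspace(-5.0, 1.0, 1201)
rho = np.sqrt(np.clip(1.0 - (e + 2.0) ** 2, 0.0, None))
mu1, mu2 = d_band_moments(e, rho)
```

For a semi-elliptic band the second central moment is a quarter of the squared half-width, so 2√µ2 recovers the half-width; the small weight λ keeps the DOS terms from dominating the energy and moment terms.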

Figure 2. Model development. a Learning curves of FCNN, CGCNN, and TinNet (this work) models of *OH adsorption energies on {111}-terminated intermetallics and near-surface alloys with respect to the number of available data samples. Error bars correspond to the standard deviation of the error estimates from 10-fold cross-validation. b DFT-calculated vs. TinNet-predicted *OH adsorption energies for all 10-fold test sets, along with a histogram of data sampling.

Figure 2(b) shows a 2D histogram of the TinNet-predicted *OH adsorption energies for all 10-fold test sets against the DFT-calculated values. In the graph representation, the strain and ligand effects on site reactivity can be captured by atomic features and neighboring information. In the TinNet framework, the graph representation of the local coordination environment is naturally reflected in the output activations of the regression module, including 1) the d-band center (1st moment) and width (2nd moment) of the site atoms and 2) the interaction parameters of individual adsorbate frontier orbitals with the metal sp- and d-states, such as the orbital overlap and coupling coefficients, which depend on d-orbital radii, interatomic distances, and local electron densities according to tight-binding theory [45]. To make a clear benchmark comparison of the

Figure 3. Out-of-sample validation of the TinNet model. a Projected densities of states $\rho_a$ onto the OH 3σ, 1π, and 4σ* orbitals from DFT calculations (solid) and TinNet models (dashed), taking Cu1/Ag(111) as an example. The graphical solution to the Newns-Anderson model is also shown, in which the intersections of the adsorbate line $y = (\epsilon - \epsilon_a)/(\pi\beta V_{ad}^2)$ for each orbital with the Hilbert transform $\Lambda(\epsilon)$ of the density of d-states $\rho_d$ represent the adsorbate-substrate bonding and anti-bonding states (2 localized roots) for 1π and the resonance state (1 localized root) for 3σ and 4σ*. b DFT-calculated vs. TinNet-predicted *OH adsorption energies for out-of-sample single-atom alloys. A broad range of transition-metal atoms (26 in total) were used as the single-site substitute in the coinage-metal hosts, i.e., Cu, Ag, and Au.
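The graphical Newns-Anderson solution in panel a can be reproduced for a model band. The NumPy sketch below assumes a semi-elliptic d-band on [−1, 1] in reduced units (not the DFT ρ_d of Cu1/Ag(111)), takes Λ(ε) as the Hilbert transform (1/π) P∫ ρ_d(ε′)/(ε − ε′) dε′ evaluated by skipping the singular grid point, and locates crossings with the adsorbate line through sign changes. Roots where ρ_d vanishes are classified as localized states; the function names and the argument `beta_v2` (standing for βV_ad^2) are illustrative.

```python
import numpy as np


def semi_elliptic_dos(energies):
    """Model d-band: semi-ellipse on [-1, 1] normalized to unit area (illustrative, not DFT data)."""
    return (2.0 / np.pi) * np.sqrt(np.clip(1.0 - energies**2, 0.0, None))


def hilbert_transform(energies, dos):
    """(1/pi) * principal-value integral of dos(t) / (e - t) dt on a uniform grid."""
    step = energies[1] - energies[0]
    diff = energies[:, None] - energies[None, :]
    np.fill_diagonal(diff, np.inf)  # 1 / inf = 0 drops the singular t = e term
    return (1.0 / diff) @ dos * step / np.pi


def newns_anderson_roots(energies, dos, eps_a, beta_v2):
    """Crossings of y = (e - eps_a) / (pi * beta_v2) with the Hilbert transform of dos."""
    g = (energies - eps_a) / (np.pi * beta_v2) - hilbert_transform(energies, dos)
    sign = np.signbit(g)
    idx = np.nonzero(sign[1:] != sign[:-1])[0]  # grid intervals where g changes sign
    roots = 0.5 * (energies[idx] + energies[idx + 1])
    outside_band = (dos[idx] == 0.0) & (dos[idx + 1] == 0.0)  # rho_d = 0: localized state
    return roots[outside_band], roots[~outside_band]


e = np.linspace(-3.0, 3.0, 1201)
rho = semi_elliptic_dos(e)
localized, in_band = newns_anderson_roots(e, rho, eps_a=0.0, beta_v2=1.0)
```

With ε_a at the band center, strong coupling (beta_v2 = 1.0) yields two localized roots flanking the band, analogous to the bonding and anti-bonding states described for 1π, whereas weak coupling (beta_v2 = 0.1) leaves only a single in-band crossing.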

Figure 4. Physical insights into chemical bonding. a Orbital hybridization and Pauli repulsion contributions from the metal d-states to the *OH adsorption energies on all 10-fold test sets, deconvoluted by the TinNet models. b The TinNet-predicted squared coupling integral $\beta V_{ad}^2$ for the 3σ, 1π, and 4σ* orbitals correlates linearly with the corresponding orbital hybridization energy ($R^2$: 0.93, 0.87, and 0.89, respectively). Regression lines with zero intercept are shown. To avoid overlap, the 1π data are plotted in the inset. All markers are color-coded according to the theoretical d-band filling f of the *OH adsorption site.
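The zero-intercept regression lines in panel b correspond to the least-squares slope k = Σxy/Σx². A short NumPy sketch follows; which quantity is treated as the regressor and the use of a mean-centered R² (rather than the uncentered variant sometimes reported for fits through the origin) are assumptions, since the caption does not specify them.

```python
import numpy as np


def fit_through_origin(x, y):
    """Least-squares slope of y = k * x (zero intercept) and R^2 = 1 - SS_res / SS_tot."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = float(np.dot(x, y) / np.dot(x, x))
    residual = y - slope * x
    # SS_tot is taken about the mean of y (centered R^2, an assumption).
    r2 = 1.0 - float(np.sum(residual**2) / np.sum((y - y.mean()) ** 2))
    return slope, r2


slope, r2 = fit_through_origin([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```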

Figure 5. TinNet models for other adsorbates/facets. DFT-calculated vs. TinNet-predicted (a) *O adsorption energies at the atop site of {111}-terminated alloy surfaces and (b) *N adsorption energies at the hollow site of {100}-terminated alloy surfaces for all 10-fold test sets, along with a histogram of data sampling. Error bars correspond to the standard deviation of the error estimates from 10-fold cross-validation.

Table I. Benchmark comparison of ML models of *OH chemisorption on alloy surfaces.
ᵃArtificial neural network. ᵇGaussian process regression. ᶜConvolutional neural network. ᵈDensity of states.