Abstract
Mechanistic understanding of large-molecule conversion and the discovery of suitable heterogeneous catalysts have lagged because of the combinatorial inventory of intermediates and the inability of humans to enumerate all structures. Here, we introduce an automated framework to predict stable configurations on transition metal surfaces and demonstrate its validity for adsorbates with up to 6 carbon and oxygen atoms on 11 metals, enabling the exploration of ~10^{8} potential configurations. It combines a graph enumeration platform, a force field, multifidelity DFT calculations, and machine learning trained on first-principles data. Clusters in the data reveal groups of catalysts that stabilize different structures and expose selective catalysts for showcase transformations, such as ethylene epoxidation on Ag and Cu and the lack of C–C scission chemistry on Au. Deviations from the commonly assumed atom valency rule for small adsorbates are also revealed. This library can be leveraged to identify catalysts for converting large molecules computationally.
Introduction
Advances in density functional theory (DFT) have enabled mechanism development and in silico catalyst design^{1}. DFT calculations have been performed for several small-molecule chemistries, including hydrogen evolution and oxidation reactions^{2,3}, oxygen reduction and evolution reactions^{4,5,6,7,8}, CO_{2} reduction, N_{2} reduction^{9,10}, and CH_{4} activation^{11}. Computing species configurations and thermochemistry is essential, as correlated uncertainty quantification reveals that more thermodynamic parameters than activation energy parameters affect the kinetics^{12}. Adsorbate configurations are also prerequisites for computing the activation energies of elementary reactions. While manual DFT calculations have been adequate for small molecules, they are impractical for large molecules because of the combinatorial size of the reaction network that includes all intermediates^{13}. As a result, the extension of computations to large molecules on transition-metal catalysts has lagged. A framework for modeling large molecules would therefore significantly accelerate mechanistic and discovery studies, for example, in renewable energy applications such as biomass pyrolysis and gasification^{14,15}, biomass upgrading via hydrogenation^{16,17,18} and hydrodeoxygenation^{19,20}, hydrogen production via biomass reforming^{21}, and the recycling of plastics.
Surprisingly, the challenge in DFT calculations of adsorbates is not merely the computational cost: databases on the order of 10^{6} entries are becoming commonplace^{22,23,24,25}. Rather, a key challenge is the automated generation of stable adsorbate configurations on surfaces. The adsorption configurations of large molecules are combinatorially intractable to enumerate in practice because of the multiple adsorption sites and the several surface-binding atoms^{26}. Each stable configuration can undergo different chemistry, so the reaction network depends critically on identifying all (or at least most) stable configurations. This task turns out to escape intuition.
Several tools can ease the generation of stable configurations. Peterson^{27} developed a global adsorbate configuration optimization method using constrained minima hopping, but its scalability is limited because DFT-based annealing is used. Medford and coworkers used minima hopping with the faster density functional tight-binding (DFTB) method for bidentate adsorbates, but obtaining reliable DFTB parameters is not trivial^{28}. Bligaard and coworkers implemented graph-based enumeration of bidentate adsorbate configurations^{29}, and Greeley and coworkers developed a Python-based graph theory package that encodes adsorption structures as graphs to identify unique adsorption structures and generate high-coverage configurations^{30}. Currently, no general strategy systematically identifies stable adsorbate configurations with three or more surface-binding atoms, which are needed to adequately describe the chemical reactions of large molecules on metal surfaces.
Here, we introduce a general framework to predict a nearly complete set of stable adsorbate configurations on metal surfaces. We introduce expert knowledge-based enumeration rules to generate a configuration space containing most, if not all, stable configurations. The configurations are optimized using a force field, and strained configurations are removed. For configurations with ≤3 heteroatoms (non-hydrogen atoms of the organic adsorbate), we perform multifidelity DFT calculations to assess configuration stability. With these data, we train a machine learning (ML) model and use it as a screening tool to predict the stability of larger adsorbates before performing DFT calculations. The workflow is summarized in Fig. 1. We apply the framework to the close-packed surfaces of Ag, Au, Co, Cu, Ir, Ni, Pd, Pt, Re, Rh, and Ru and discover 4,979 stable configurations. The predictive ability of the ML-based screening is further demonstrated for 1650 configurations with 4–6 heteroatoms that are also computed via DFT. We find that distinct trends in stable configurations among catalysts explain the selectivity observed in experimental systems, and that the clustering in the adsorbate data is rationalized by d-band/adsorbate interactions. We propose that stable intermediates are essential for a catalyst to carry out a specific reaction, and the extensive library created here can be leveraged to prescreen catalysts for all commonly metal-catalyzed chemistries. This work lays the foundation for mechanistic insights into, and design principles for, large-molecule conversion.
Results
Skeleton enumeration
We introduce graph transformation rules, inspired by Ruddigkeit et al.^{31}, to enumerate “skeleton” configurations, which contain carbons and their connectivity patterns to the surface. The initial pool of configurations is built by placing a carbon on the top, bridge, and hollow sites of a large surface lattice graph. Then, rules that each add exactly one carbon atom are applied repeatedly to build larger configurations. Hydrogen addition and electronic effects are considered later.
Four types of rules can comprehensively enumerate all possible adsorbate configurations. The first type adds an adsorbed carbon to an adsorbed carbon (surface propagation rules). These rules can be constructed systematically using the following steps. First, find all possible one-atom binding sites on close-packed surfaces (top, bridge, and hollow sites; inset 1 in Fig. 2a). Second, enumerate two-atom configurations by exhaustively varying (1) the number of metal atoms shared by the two binding sites, e.g., a metal atom involved in two bridge sites, and (2) the total number of adsorbate–surface bonds (1, 2, and 3 for top, bridge, and hollow sites, respectively; Fig. 2a). Third, remove configurations with unrealistic bond distances. Fourth, convert the two-atom configurations (e.g., green box in inset 2 of Fig. 2a) to graph transformation rules (e.g., blue box in inset 2 of Fig. 2a). A rule consists of a pattern graph (left-hand side of the blue box) and a replacement graph (right-hand side of the blue box). A graph transformation is applied to a configuration by searching for an occurrence of the pattern graph in the configuration and replacing the found occurrence with the replacement graph. The two-atom configuration (the green box) becomes the replacement graph, and the pattern graph is obtained by removing one atom from it. The key postulates are (1) that all possible two-atom configurations can be enumerated systematically and (2) that larger configurations are composed of two-atom configurations (e.g., a six-atom skeleton can be decomposed into two-atom configurations). This framework applies to other planar surfaces, such as fcc(100), hcp(10\bar{1}0), and bcc(110).
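A surface propagation rule can be illustrated with a toy encoding. The sketch below is a simplified, hypothetical version and not the authors' implementation: the lattice is a periodic triangular (close-packed) graph without a subsurface layer, so fcc and hcp hollows are not distinguished; a site is the set of metal atoms an adsorbed carbon binds to; and a rule is the hypothetical triple `(old_size, new_size, shared)`, i.e., the site size matched by the pattern graph, the site size of the added carbon, and the number of metal atoms the two sites share. The helper `touches` is an assumed stand-in for the step that removes two-atom configurations with unrealistic bond distances.

```python
from itertools import combinations

# Axial offsets of the six nearest neighbours on a close-packed (triangular) layer.
OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def neighbors(atom, n):
    """Nearest neighbours of metal atom (i, j) on an n x n periodic lattice."""
    i, j = atom
    return {((i + di) % n, (j + dj) % n) for di, dj in OFFSETS}


def all_sites(n):
    """Top, bridge and hollow sites, each a frozenset of the metal atoms it binds."""
    atoms = [(i, j) for i in range(n) for j in range(n)]
    sites = {frozenset([a]) for a in atoms}                               # top
    sites |= {frozenset([a, b]) for a in atoms for b in neighbors(a, n)}  # bridge
    sites |= {frozenset([a, b, c]) for a in atoms                         # hollow
              for b, c in combinations(sorted(neighbors(a, n)), 2)
              if c in neighbors(b, n)}
    return sites


def touches(site, cand, n):
    """Assumed distance proxy: every metal of `cand` is in or next to `site`."""
    return all(any(c == s or c in neighbors(s, n) for s in site) for c in cand)


def apply_rule(config, rule, sites, n):
    """Yield every configuration produced by one application of `rule`.

    config = (atom_sites, bonds): atom_sites[k] is the site of adsorbed carbon k,
    bonds is a frozenset of frozenset({k, l}) carbon-carbon bonds.
    rule = (old_size, new_size, shared): hypothetical surface propagation rule.
    """
    atom_sites, bonds = config
    old_size, new_size, shared = rule
    for k, site in enumerate(atom_sites):
        if len(site) != old_size:
            continue  # the pattern graph does not match carbon k
        for cand in sites:
            if (len(cand) == new_size and len(cand & site) == shared
                    and cand not in atom_sites and touches(site, cand, n)):
                new_k = len(atom_sites)
                # Replacement graph: a new carbon on `cand`, bonded to carbon k.
                yield atom_sites + (cand,), bonds | {frozenset((k, new_k))}


n = 6
sites = all_sites(n)
start = ((frozenset([(0, 0)]),), frozenset())  # one carbon on a top site
print(len(list(apply_rule(start, (1, 1, 0), sites, n))))  # prints 6
```

Starting from a single carbon on a top site, the top-to-top rule yields the six neighbouring placements; applying a rule set repeatedly to every generated configuration grows larger skeletons, many of them equivalent, which is why duplicate removal follows enumeration.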
The second type of rule accounts for non-surface-bonding carbons (e.g., CH_{2} and CH_{3}). A non-surface-bonding carbon can be added to an adsorbed carbon on a top, bridge, or hollow site, or to another non-surface-bonding carbon to increase the chain length. We call these rules vacuum propagation rules (Fig. 2b).
As shown in Fig. 2c, adsorbates can form an “arc” consisting of a chain of non-surface-bonding atoms and two anchoring adsorbed atoms (e.g., (CH_{2})_{x}). Rules that add an adsorbed carbon to a non-surface-bonding atom can be used to construct arcs, but the two anchoring atoms cannot be too far apart. We therefore introduce two distances, shown in Fig. 2c: d_{surface}, the distance between the two anchoring surface atoms, and d_{nearest neighbor}, the distance between two nearest-neighbor surface atoms. Their ratio, d_{surface}/d_{nearest neighbor}, defines a normalized length threshold for the arc to be stable (Fig. 2d), which we estimate from DFT calculations of (CH_{2})_{x} on Pt(111). The line separating the stable and unstable data is the decision boundary we used to decide arc stability. Figure 2e shows a rule for anchoring an arc (an anchoring rule), whose pattern graph must respect the distance constraint of Fig. 2d.
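The normalized distance d_{surface}/d_{nearest neighbor} is simple to compute on a close-packed lattice. In the sketch below, surface atoms are given in hypothetical axial lattice coordinates, for which the normalized distance is sqrt(Δi² + ΔiΔj + Δj²). The decision boundary of Fig. 2d is not reproduced: the `MAX_RATIO` table holds placeholder values chosen only for illustration, not fitted thresholds.

```python
import math


def normalized_distance(a, b):
    """d_surface / d_nearest_neighbor for metal atoms in axial lattice coordinates."""
    di, dj = b[0] - a[0], b[1] - a[1]
    return math.sqrt(di * di + di * dj + dj * dj)


# PLACEHOLDER thresholds: maximum normalized anchor distance for an arc with
# x non-surface-bonding atoms. Illustrative numbers only, not fitted to DFT.
MAX_RATIO = {1: 1.8, 2: 2.7, 3: 3.5}


def arc_is_allowed(anchor_a, anchor_b, n_chain_atoms, max_ratio=MAX_RATIO):
    """True if the two anchoring metal atoms are close enough for the arc."""
    if n_chain_atoms not in max_ratio:
        raise ValueError(f"no threshold for chain length {n_chain_atoms}")
    return normalized_distance(anchor_a, anchor_b) <= max_ratio[n_chain_atoms]


# Second-nearest neighbours (ratio sqrt(3)) bridged by a single CH2 unit.
print(arc_is_allowed((0, 0), (1, 1), 1))
```

An anchoring rule would consult such a check before matching its pattern graph, so that arcs whose anchors lie beyond the boundary are never generated.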
The last type of rule adds an adsorbed carbon to two adsorbed carbons, forming a ring (ring rules). Figure 2f shows the ring rules, developed by enumerating chains of three adsorbed carbons and building the pattern graph by removing the central atom.
After enumeration, the surface atoms in each configuration are systematically pruned to build a unique, unambiguous graph (see Supplementary Fig. 1). Duplicate configurations are removed by comparing a canonical hash of each graph, such as a SMILES string.
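The duplicate check only requires a key that is identical for isomorphic graphs. As a stand-in for a string hash such as canonical SMILES, the sketch below computes an exact canonical form by brute force over node orderings, which is practical only for small graphs. The node labels (`"C"` for carbon, `"M"` for a surface metal atom) are a hypothetical encoding; for larger adsorbates, a canonical string generator such as RDKit's canonical SMILES output would be the practical choice.

```python
from itertools import permutations


def canonical_form(labels, edges):
    """Exact canonical form of a small labeled graph, by brute force.

    labels: node labels, e.g. "C" for carbon and "M" for a surface metal atom.
    edges: (i, j) index pairs. Cost grows as n!, so keep graphs small.
    """
    best = None
    for order in permutations(range(len(labels))):
        pos = {node: rank for rank, node in enumerate(order)}
        key = (tuple(labels[node] for node in order),
               tuple(sorted(tuple(sorted((pos[i], pos[j]))) for i, j in edges)))
        if best is None or key < best:
            best = key
    return best


def deduplicate(graphs):
    """Keep the first graph of each isomorphism class."""
    seen, unique = set(), []
    for labels, edges in graphs:
        key = canonical_form(labels, edges)
        if key not in seen:
            seen.add(key)
            unique.append((labels, edges))
    return unique


# C bound to M, written in two atom orders: the second copy is dropped.
print(deduplicate([(["C", "M"], [(0, 1)]), (["M", "C"], [(0, 1)])]))
```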
Force field screening
We remove strained configurations by optimizing the skeleton structures with the universal force field (UFF)^{32}, augmented with adsorbate–surface interactions that use heuristic parameters (see Methods for details). Structures with C–C bond lengths outside the range 0.8–1.65 Å are removed; this broad window is based on the covalent radii of carbon and oxygen.
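The strain filter amounts to a bond-length window check after force-field optimization. A minimal sketch, assuming positions in Å and a list of adsorbate–adsorbate bonds taken from the skeleton graph:

```python
import math


def is_unstrained(positions, bonds, lo=0.8, hi=1.65):
    """True if every adsorbate-adsorbate bond length (in Å) lies in [lo, hi]."""
    return all(lo <= math.dist(positions[i], positions[j]) <= hi for i, j in bonds)


def remove_strained(structures, lo=0.8, hi=1.65):
    """Keep only (positions, bonds) pairs that pass the bond-length window."""
    return [(pos, bonds) for pos, bonds in structures if is_unstrained(pos, bonds, lo, hi)]


ok = ([(0.0, 0.0, 0.0), (1.54, 0.0, 0.0)], [(0, 1)])         # typical C-C single bond
stretched = ([(0.0, 0.0, 0.0), (2.10, 0.0, 0.0)], [(0, 1)])  # distorted by the surface
print(len(remove_strained([ok, stretched])))  # prints 1
```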
Transformation to an adsorbate
From the unstrained skeleton configurations, we generate realistic adsorbates by substituting carbon with oxygen at all possible positions and adding hydrogens to carbons and oxygens while respecting the valency rule. A varying number of hydrogens is added to each skeleton to represent all possible degrees of saturation, so the number of configurations increases significantly in this step.
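The substitution and saturation step can be written as a product over per-atom choices. The sketch below is a hypothetical simplification: it assumes valences of 4 (C) and 2 (O), counts each skeleton bond and each surface bond as one unit of valence, and lets each atom carry any number of hydrogens from zero up to its remaining valence; valence left unfilled stands for a radical or multiple-bond site.

```python
from itertools import product

VALENCE = {"C": 4, "O": 2}  # assumed valences of the skeleton elements


def adsorbates_from_skeleton(n_surface, edges):
    """Enumerate (element, n_hydrogen) assignments for one skeleton.

    n_surface[k]: number of metal atoms bound by skeleton atom k.
    edges: adsorbate-adsorbate bonds (i, j), each counted as one valence unit.
    """
    n = len(n_surface)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    for elements in product(VALENCE, repeat=n):
        free = [VALENCE[e] - degree[k] - n_surface[k] for k, e in enumerate(elements)]
        if min(free) < 0:
            continue  # this element choice would exceed a valency (typically O)
        # Every hydrogen count from 0 (most unsaturated) to `free` (saturated).
        for hydrogens in product(*(range(f + 1) for f in free)):
            yield tuple(zip(elements, hydrogens))


# One skeleton atom on a top site: C with 0-3 H, or O with 0-1 H.
print(sorted(adsorbates_from_skeleton([1], [])))
```

Even for this two-element, single-bond model the count grows multiplicatively with skeleton size, which mirrors the increase in configurations described above.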
Multifidelity DFT screening
We perform low-fidelity DFT calculations of configurations with ≤3 heteroatoms, with an early stopping criterion triggered when the configuration diverges, to assess stability. The low-fidelity parameters make the calculations less accurate but more efficient, while still achieving decent accuracy relative to standard DFT relaxations (see Methods). The connectivity of each DFT-calculated structure is determined by connecting atoms i and j when d_{ij} < t(r_{cov,i} + r_{cov,j}), where d_{ij} is the distance between atoms i and j, t is a tolerance factor (1.18 here), and r_{cov,i} is the covalent radius of atom i. The stable configurations are further refined using high-fidelity DFT calculations.
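The connectivity criterion and the early stopping check can be sketched together. The covalent radii below are approximate literature values (Cordero et al., 2008) for a few elements only, and the trajectory format (a list of frames, each a list of Cartesian positions in Å) is an assumption; in practice the positions would come from the DFT relaxation.

```python
import math
from itertools import combinations

# Approximate covalent radii in Å (Cordero et al., 2008); extend as needed.
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "O": 0.66, "Pt": 1.36}


def connectivity(symbols, positions, t=1.18):
    """Set of bonded pairs (i, j), i < j, with d_ij < t * (r_cov,i + r_cov,j)."""
    edges = set()
    for i, j in combinations(range(len(symbols)), 2):
        cutoff = t * (COVALENT_RADIUS[symbols[i]] + COVALENT_RADIUS[symbols[j]])
        if math.dist(positions[i], positions[j]) < cutoff:
            edges.add((i, j))
    return edges


def first_divergence(symbols, trajectory, t=1.18):
    """Index of the first frame whose graph differs from frame 0, else None.

    An early-stopping relaxation would terminate at this frame.
    """
    reference = connectivity(symbols, trajectory[0], t)
    for step, frame in enumerate(trajectory[1:], start=1):
        if connectivity(symbols, frame, t) != reference:
            return step
    return None


symbols = ["C", "H"]
traj = [[(0.0, 0.0, 0.0), (1.09, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.10, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (2.00, 0.0, 0.0)]]  # the C-H bond breaks in frame 2
print(first_divergence(symbols, traj))  # prints 2
```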
MLbased stability prediction
We rapidly screen the stability of configurations with >3 heteroatoms by introducing a fingerprint-like descriptor-based logistic regression (FLDLR), shown in Fig. 3a, which uses fingerprint-like descriptors as input features^{33}. In this method, all possible subgraphs of the adsorbate are enumerated, and, for each subgraph, the surface atoms connected to the adsorbate are added. The output feature vector contains the number of occurrences of each fingerprint. The training data set is obtained from DFT calculations of configurations with ≤3 heteroatoms, where stability is labeled 1 (stable) or 0 (unstable). A configuration is labeled stable if its connectivity does not change during DFT relaxation (i.e., the configuration represents a local or global minimum on the potential energy surface). If the connectivity pattern changes upon DFT relaxation, the configuration is labeled unstable, as the initial configuration is not a minimum on the potential energy surface. Because the model will primarily be used to predict configurations of larger adsorbates, we devise an analogous extrapolation test: we train the model on adsorbates with ≤2 heteroatoms and assess its error on adsorbates with three heteroatoms. Logistic regression calculates the probability (a continuous value between zero and one) that a configuration is stable, and the probability threshold is a tunable screening parameter. Its effect on model performance is assessed by the test-set recall, precision, F_{1} score, selectivity, and accuracy in Fig. 3b–e and Supplementary Fig. 2. Because we seek a comprehensive database containing nearly all stable configurations, a high recall, TP/(TP + FN), is desired, where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively. A low threshold of 0.2 (Fig. 3b) ensures that 95% of all stable configurations are sampled (high recall). However, a low threshold also selects unstable configurations (undesired).
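The fingerprint counting can be sketched with standard-library Python. This is a simplified, hypothetical variant of the descriptor: instead of adding the surface atoms as explicit subgraph nodes, each heavy adsorbate atom is labeled by its element and its number of surface bonds (e.g., `"C2"` for a bridge-bonded carbon), hydrogens are omitted, and subgraphs are limited to `max_size` atoms. Canonical keys are computed by brute force, which is adequate for small subgraphs.

```python
from collections import Counter
from itertools import permutations


def connected_subsets(adj, max_size):
    """All connected node subsets with at most max_size nodes (as frozensets)."""
    found = set()
    frontier = {frozenset([v]) for v in adj}
    while frontier:
        found |= frontier
        grown = set()
        for s in frontier:
            if len(s) < max_size:
                for v in s:
                    grown.update(s | {w} for w in adj[v] if w not in s)
        frontier = grown - found
    return found


def canonical_key(nodes, labels, adj):
    """Exact canonical form of a small labeled subgraph (brute force)."""
    best = None
    for order in permutations(nodes):
        pos = {v: k for k, v in enumerate(order)}
        edges = sorted((pos[a], pos[b]) for a in nodes for b in adj[a]
                       if b in pos and pos[a] < pos[b])
        key = (tuple(labels[v] for v in order), tuple(edges))
        if best is None or key < best:
            best = key
    return best


def fingerprint_counts(labels, adj, max_size=3):
    """Occurrence count of every labeled connected subgraph (the feature vector)."""
    return Counter(canonical_key(tuple(s), labels, adj)
                   for s in connected_subsets(adj, max_size))


# Hypothetical adsorbate: bridge-bonded C, free C, and top-bonded O in a chain.
labels = {0: "C2", 1: "C0", 2: "O1"}
adj = {0: {1}, 1: {0, 2}, 2: {1}}
print(fingerprint_counts(labels, adj))
```

Collecting the keys over all training configurations defines the feature columns; each configuration's `Counter` then supplies its row of occurrence counts.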
The precision, TP/(TP + FP), in Fig. 3c shows that at the 0.2 threshold only 10% of the selected configurations are stable (low precision). The F_{1} score in Fig. 3d is the harmonic mean of precision and recall. A threshold of 0.76 samples the stable configurations most efficiently, at the cost of missing some stable configurations. The selectivity, TN/(TN + FP), in Fig. 3e indicates the DFT cost saving from ML screening: at a threshold of 0.2, ML screening removes 44% of the unstable configurations. Supplementary Figure 2 shows that accuracy is high at higher thresholds, because most enumerated configurations are unstable.
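The screening metrics follow directly from the confusion matrix at a given probability threshold. A minimal sketch, assuming predicted probabilities and DFT labels (1 = stable, 0 = unstable) are available as parallel lists; undefined ratios are returned as NaN:

```python
def screening_metrics(probabilities, labels, threshold):
    """Metrics when configurations with probability >= threshold are kept for DFT."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, labels):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,                   # TP / (TP + FN)
        "precision": precision,             # TP / (TP + FP)
        "f1": ratio(2 * precision * recall, precision + recall),
        "selectivity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


print(screening_metrics([0.9, 0.6, 0.3, 0.1], [1, 0, 1, 0], 0.2))
```

Sweeping `threshold` from 0 to 1 traces curves of the kind shown in Fig. 3b–e.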
Incorporating FLDLR as a screening tool before performing DFT calculations can significantly reduce the computational cost for larger adsorbates. We retrained the model on configurations with ≤3 heteroatoms, randomly sampled 50 configurations each with 4, 5, and 6 heteroatoms on each of the 11 metals, drawing uniformly across the stability score, and performed DFT calculations on them. The FLDLR scores and the DFT-inferred stability are compared in Supplementary Fig. 3. We find that 99% of the configurations with low scores (<0.05) are unstable. Since low-score configurations make up most of the large-molecule configuration space (84%, 95%, and 99% for 4, 5, and 6 heteroatoms, respectively), using the low score as a screening criterion is expected to reduce the number of DFT calculations roughly 6-, 20-, and 100-fold, respectively, i.e., by up to two orders of magnitude. We expect ML predictions in the low-score region to extrapolate well to larger adsorbates, because the fingerprints that cause instability in small adsorbate configurations are also present in, and also destabilize, larger adsorbate configurations. Some of the converged structures with 6 heteroatoms are shown in Fig. 4.
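A minimal logistic-regression screen can be sketched without external libraries. The text does not specify the solver or regularization, so plain batch gradient descent on the mean log-loss with a small L2 penalty is an assumption here, as are the learning rate and epoch count. The helper `dft_reduction_factor` reproduces the arithmetic behind the expected savings: discarding a fraction f of the candidates leaves 1 − f of them, a 1/(1 − f)-fold reduction.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Predicted probability that the configuration with features x is stable."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=1e-3):
    """Batch gradient descent on the mean log-loss plus an L2 penalty on w."""
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi  # derivative of the log-loss w.r.t. the logit
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * (g / m + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def screen(candidates, w, b, cutoff=0.05):
    """Keep only candidates whose score is at least `cutoff`; the rest skip DFT."""
    return [x for x in candidates if score(w, b, x) >= cutoff]


def dft_reduction_factor(fraction_discarded):
    """Fold reduction in DFT calculations when that fraction is screened out."""
    return 1.0 / (1.0 - fraction_discarded)


# Toy one-feature data set: larger feature values correlate with stability.
w, b = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
print([round(dft_reduction_factor(f), 1) for f in (0.84, 0.95, 0.99)])
```

In a real screen, `X` would hold the fingerprint count vectors of the ≤3-heteroatom training set and `candidates` those of the larger enumerated adsorbates.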
Enumerated data distribution
The number of configurations at the various methodological stages is shown in Fig. 5. It increases exponentially with the number of heteroatoms, reaching ~10^{8} configurations for six heteroatoms. The number of DFT-calculated stable structures (green points) grows less steeply than the number of enumerated ones. ML screening (with a threshold of 0.2) reduces the number of calculations by two orders of magnitude for adsorbates with 6 heteroatoms.
Figure 6 shows the distribution of the stable configurations assessed using DFT. Intuitively, the atom valency, defined here as (the atom's standard valence, i.e., 4 for C and 2 for O) − (number of neighboring adsorbate atoms), generally follows the number of coordinated surface atoms (1, 2, and 3 for top, bridge, and hollow sites), as shown in Fig. 6a. However, many configurations violate this traditional rule, demonstrating the importance of exhaustive enumeration over simple intuition. For complex multidentate adsorption, the valency of a single heteroatom is not the only governing principle; strain from stretching bonds to accommodate the metal lattice and the adsorption characteristics of the other atoms collectively matter. Minimizing the energy of the entire species is the overarching principle. Figure 6b shows the principal component analysis of the stable configurations. A binary matrix of dimension (number of metals) × (number of configurations) is constructed, with each element set to 1 if the given configuration is observed on the metal and 0 otherwise (configuration stability matrix in Fig. 6b). We observe that the metals form clusters. Pt, Pd, Re, Ru, and Ir favor intuitive valency-based configurations: the adsorbate heteroatom valency matches the number of adsorbate–surface bonds (e.g., top, bridge, and hollow sites for CH_{3}, CH_{2}, and CH, respectively). Adsorbates fulfill their valency by forming the necessary number of bonds with the metal atoms. For Ag and Cu, the number of adsorbate–surface bonds is less than or equal to the valency. The weakly binding Au has the fewest stable configurations. Ni and Co contain structures where the number of adsorbate–surface bonds exceeds the adsorbate atom valency.
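The comparison between residual valency and surface coordination can be expressed as a small classifier. A sketch, assuming standard valences of 4 for C and 2 for O and counting hydrogens as adsorbate neighbors; the category names are hypothetical:

```python
VALENCE = {"C": 4, "O": 2}  # assumed standard valences


def residual_valency(element, n_adsorbate_neighbors):
    """Valence left for surface bonds; H and heavy-atom neighbours both count."""
    return VALENCE[element] - n_adsorbate_neighbors


def classify(element, n_adsorbate_neighbors, n_surface_bonds):
    """'matches', 'below' or 'exceeds': surface bonds versus residual valency."""
    v = residual_valency(element, n_adsorbate_neighbors)
    if n_surface_bonds == v:
        return "matches"
    return "below" if n_surface_bonds < v else "exceeds"


print(classify("C", 3, 1), classify("C", 3, 3))  # prints matches exceeds
```

Tallying these categories per metal gives the kind of comparison made above: valency-matching binding on Pt, Pd, Re, Ru, and Ir, "below" cases on Ag and Cu, and "exceeds" cases on Ni and Co.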
Adsorbates exceeding their valency have been reported before: several theoretical and experimental investigations found that a methyl radical on a hollow site forms three adsorbate–surface bonds, exceeding its valency of one^{34,35,36}. The hollow-site adsorption is attributed to coupling between the d-band and the adsorbate orbitals^{34}. This interpretation is further supported by the stronger adsorption on metals whose d-band center lies closer to the Fermi level. Similarly, the d-band center has been shown to correlate with the energies of adsorbates with varying numbers of adsorbate–surface bonds^{37}. We therefore calculated the d-band center relative to the Fermi level for the metals considered here, which ranks as Co > Ni > Rh > Ru > Pd > Re > Cu > Pt > Ir > Au > Ag (Supplementary Table 1). The excess adsorbate–surface bonds on Rh, Ni, and Co are attributed to their stronger d-band interaction. Finally, Pt, Pd, and especially Au disfavor η-mode interactions between π orbitals and the metal atoms. As a result, C=O and C=C substructures are observed less frequently on these metals.
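The d-band center is the first moment of the d-projected density of states, ε_{d} = ∫ E ρ_{d}(E) dE / ∫ ρ_{d}(E) dE, here measured relative to the Fermi level. The sketch below evaluates it with the trapezoidal rule on a tabulated DOS; integrating over the whole supplied energy window is an assumption, since the integration limits are not stated, and the Gaussian test DOS is synthetic.

```python
import math


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y(x)."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))


def d_band_center(energies, dos_d, e_fermi=0.0):
    """First moment of the d-projected DOS, in eV relative to the Fermi level."""
    shifted = [e - e_fermi for e in energies]
    weighted = [e * g for e, g in zip(shifted, dos_d)]
    return trapezoid(weighted, shifted) / trapezoid(dos_d, shifted)


# Synthetic d band: a Gaussian centred 2 eV below the Fermi level.
grid = [-10.0 + 0.01 * k for k in range(1601)]
dos = [math.exp(-0.5 * (e + 2.0) ** 2) for e in grid]
print(round(d_band_center(grid, dos), 3))
```

With real data, `energies` and `dos_d` would be the d-projected DOS of a surface atom from the slab calculation and `e_fermi` the corresponding Fermi energy.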
Some molecules do not adsorb on some metals. For example, ethylene (CH_{2}CH_{2}) does not adsorb on Au(111) or Ag(111) but adsorbs on Cu(111) in the η adsorption mode, in agreement with previous DFT calculations^{38}. We therefore also perform a principal component analysis of a binary matrix of dimension (number of metals) × (number of molecules), with each element set to 1 if the given molecule adsorbs on the metal and 0 otherwise (molecule adsorption stability matrix in Fig. 6c). Compared with the previous matrix, the second dimension runs over molecules rather than configurations. There are essentially three clusters. The first contains mostly strongly binding metals (Pt, Ni, Rh, Co, Ir, Re, Pd, and Ru), on which most molecules have multiple stable adsorbed configurations. The second is Au, on which several multidentate molecules and molecules with high valency do not adsorb. This explains the poor performance of Au for C–C scission (encountered, for example, in steam and dry reforming of larger fuels such as ethanol) and isomerization, as important dehydrogenated reaction intermediates, such as CH_{3}CHO, CH_{2}CH_{2}, and CH_{2}C, do not adsorb on Au^{39,40}. Similarly, Au is a poor catalyst for Fischer–Tropsch synthesis, as important intermediates for C–C coupling typically have high valency^{41,42,43}. The third cluster contains Ag and Cu, which can adsorb three-atom-ring structures that are unstable on the other metals, where they typically dissociate. Some of these molecules are dehydrogenated derivatives of ethylene oxide (an epoxide). Ag and Cu have long been used for the selective production of ethylene oxide^{44,45}. Hence, the affinity of these metals for stable ethylene oxide derivatives may be key to their high selectivity.
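The principal component analysis of such a binary metals × items matrix can be sketched with the standard library. How the PCA was computed is not described; this version mean-centers each column, finds the leading eigenvectors of the metal–metal Gram matrix by power iteration with deflation, and returns each metal's score u_{k}√λ_{k}, which matches standard PCA scores up to the sign of each component. The toy matrix is hypothetical.

```python
import math
import random


def pca_scores(matrix, n_components=2, iters=500, seed=0):
    """PCA scores of the rows (metals) of a metals x items 0/1 matrix.

    Columns are mean-centred; leading eigenpairs of the row Gram matrix are found
    by power iteration with deflation; scores are u_k * sqrt(lambda_k).
    """
    m, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / m for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in matrix]
    gram = [[sum(a * b for a, b in zip(xc[i], xc[k])) for k in range(m)]
            for i in range(m)]
    rng = random.Random(seed)
    scores = [[0.0] * n_components for _ in range(m)]
    for c in range(n_components):
        v = [rng.random() for _ in range(m)]
        for _ in range(iters):
            w = [sum(gram[i][k] * v[k] for k in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left along any remaining direction
            v = [x / norm for x in w]
        lam = sum(v[i] * gram[i][k] * v[k] for i in range(m) for k in range(m))
        for i in range(m):
            scores[i][c] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate: remove the component just found before the next pass.
        gram = [[gram[i][k] - lam * v[i] * v[k] for k in range(m)] for i in range(m)]
    return scores


toy = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]  # hypothetical 3 metals x 4 molecules
for row in pca_scores(toy):
    print([round(x, 3) for x in row])
```

Metals with identical adsorption patterns receive identical scores, which is what makes clusters such as {Ag, Cu} visible in a scatter plot of the first two components.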
Predicting selective catalysts
Building on the premise that adsorbate stability is crucial for selectivity, we predict selective catalysts for producing four-heteroatom closed-shell molecules from ethylene oxide. We enumerate all possible reaction paths between ethylene oxide and four-heteroatom closed-shell molecules by adding and removing C, H, and O in the enumeration rules. For each metal, the shortest reaction paths containing only stable intermediates were extracted. The stability of adsorbates was assessed using DFT for adsorbates with ≤3 heteroatoms and using FLDLR with a threshold of 0.95 (high probability of stability) for adsorbates with >3 heteroatoms. The paths to closed-shell molecules with fewer than five viable metals are shown in Fig. 7 as examples of selective catalysts. Thermochemistry and kinetics were not assessed, so realizing these chemistries requires further investigation. We find that Au(111) is selective for all nine molecules, whereas seven other surfaces are selective for a few. Notably, eight of the nine molecules contain rings, which are typically produced by homogeneous organic reactions. Specifically, homogeneous gold catalysts produce small rings with fewer than six atoms^{46}, and some of this cyclization chemistry carries over to gold nanoparticles^{47}. These facts suggest that the discovered pathways could be experimentally viable.
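Extracting a shortest path that uses only stable intermediates is a breadth-first search on the enumerated reaction network, restricted to species judged stable on a given metal. A sketch, assuming the network is stored as an adjacency dictionary and stability is supplied as a predicate (for example, a DFT label for small adsorbates or an FLDLR score ≥ 0.95 for larger ones); the species names are placeholders:

```python
from collections import deque


def shortest_stable_path(reactions, start, goal, is_stable):
    """Fewest-step path from start to goal through stable species only (BFS).

    reactions: dict mapping a species to species reachable in one elementary step.
    is_stable: predicate giving the stability of a species on the metal considered.
    Returns the list of species along one shortest path, or None.
    """
    if not (is_stable(start) and is_stable(goal)):
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        species = queue.popleft()
        if species == goal:
            path = []
            while species is not None:
                path.append(species)
                species = parent[species]
            return path[::-1]
        for nxt in reactions.get(species, ()):
            if nxt not in parent and is_stable(nxt):
                parent[nxt] = species
                queue.append(nxt)
    return None


net = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}  # placeholder species
print(shortest_stable_path(net, "A", "D", lambda s: s != "B"))  # prints ['A', 'C', 'D']
```

This returns one shortest path per metal; listing all tied paths, as needed to compare metals fully, would require keeping every parent at the same BFS depth.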
Discussion
The conversion of large molecules is poorly understood because of the large size of the reaction network and the lack of automation for initializing DFT calculations of large adsorbates. This, in turn, stems from the combinatorial explosion of complex adsorbate configurations that dictate thermochemistry and reaction pathways. Intuitive binding of adsorbates based on heteroatom valency has long been assumed. We find that it can fail, yet distinct clusters of data emerge that correlate with the d-band/adsorbate orbital interaction. To the best of our knowledge, this work presents the first systematic enumeration of multidentate adsorbate configurations with arbitrary binding motifs. We observe that the stability of intermediates is essential for highly selective catalysis, as demonstrated by the correlation between intermediate stability and selectivity for ethylene epoxidation and Fischer–Tropsch synthesis. More generally, a catalyst cannot produce a molecule if its reaction intermediates are not stable on it, and the library we built can be leveraged to determine whether a metal catalyst can conduct a specific chemistry. Potentially, highly selective catalysts can be made by designing catalytic sites that selectively adsorb the desired adsorbates. Furthermore, volcano curves for materials discovery often assume that the reaction pathway, intermediates, and rate-determining step are the same on all catalysts. Our results identify clusters of materials for which this holds but expose profound differences among clusters. The developed database could aid theoretical investigations of large molecules by predicting adsorbate thermodynamic properties and by enabling databases for lateral interaction models^{30}, Brønsted–Evans–Polanyi relations^{48} (linear relationships between reaction energy and activation energy), and transition state structures.
These investigations could enable microkinetic model development toward elucidating catalyst design principles. We emphasize that, while we focused on the widely studied close-packed surfaces, the framework can be extended to other surfaces, such as fcc(100), stepped surfaces, and alloys, by constructing an appropriate surface lattice and differentiating surface atoms by element and location (e.g., step edge, corner, terrace). Other heteroatoms relevant to pharmaceutical applications, such as nitrogen and sulfur, can readily be considered.
The number of enumerated configurations becomes vast, reaching 10^{8} for adsorbates with six C and O atoms, which poses a significant challenge for studying large molecules. The notable difference between the slopes of the enumerated and the DFT-calculated stable configurations suggests that a more efficient enumeration algorithm could be developed. We expect the performance of the ML model to improve significantly by adding structures with four C and O atoms: because an adsorbed carbon has at most three adsorbate neighbors, four-atom structures are the smallest that contain an adsorbed carbon whose three neighbors are all heteroatoms. We are expanding the database to improve the ML model.
Our scheme can be further improved in several directions. Lateral interactions between adsorbates are well known to affect adsorption energies and can change the preferred site^{49}. Although we used a relatively low coverage, the effect of lateral interactions on configuration stability remains unclear. We also did not assess the vibrational modes of the adsorbates, so some adsorbates may sit on saddle points rather than minima of the potential energy surface. Our scheme faces an additional challenge for larger biomass molecules, such as glucose, which contains 12 C and O atoms and would require >10^{6} DFT calculations. Online learning, in which data sampling and model training are repeated in cycles, could improve model accuracy and continuously reduce the number of candidates on the fly. Our scheme has similarities with global optimization techniques that aim to identify all minima in a high-dimensional space, and integration with advanced global optimization algorithms^{50,51,52} could also improve scalability. Because we focused on enumerating adsorbate connectivity patterns, our scheme does not distinguish cis/trans isomers that share the same connectivity pattern (Supplementary Fig. 4). Assessing data quality is critical. While we addressed the enumeration of connectivity patterns, future work should include data curation, which can involve manual curation and the use of statistics to identify faulty data.
Methods
Force field optimization
The universal force field (UFF) as implemented in RDKit (rdkit.org) is modified to generate the structures. In addition to the standard UFF parameters, distance and angle constraints are added using the quadratic relations

E_{r} = k(r − r_{eq})^{2},
E_{θ} = k(θ − θ_{eq})^{2},
where E_{r} and E_{θ} are the distance and angle energies, k is the force constant, r and θ are the interatomic distance and bond angle, and the subscript eq denotes the equilibrium value. Forces that hold surface atoms in their lattice positions and describe adsorbate atom–surface atom bonds are added, as shown in Table 1. Various angle-constraining forces are also added to generate reasonable structures, as shown in Table 2. These heuristic forces provide a plausible initial guess structure for DFT calculations, typically better than manually guessed structures. Because a strong force constant is used for the adsorbate–surface bonds, strain manifests as distorted adsorbate–adsorbate bonds, which we use to identify strained, unstable configurations.
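The effect of the quadratic distance terms can be illustrated with a small steepest-descent minimizer. This is a toy stand-in, not RDKit's UFF implementation: it includes only distance restraints of the form E_{r} = k(r − r_{eq})^{2}, keeps the atoms listed in `fixed` in place (mimicking surface atoms held at their lattice positions), and its step size must stay below 1/k for the iteration to converge. Angle terms E_{θ} would be added analogously.

```python
import math


def restraint_energy_and_forces(pos, restraints):
    """E = sum k (r - r_eq)^2 over restraints (i, j, k, r_eq), and forces -dE/dx."""
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for i, j, k, r_eq in restraints:
        d = [pos[i][a] - pos[j][a] for a in range(3)]
        r = math.sqrt(sum(x * x for x in d))  # assumes atoms i and j do not overlap
        energy += k * (r - r_eq) ** 2
        de_dr = 2.0 * k * (r - r_eq)
        for a in range(3):
            f = -de_dr * d[a] / r  # force on atom i; atom j feels the opposite
            forces[i][a] += f
            forces[j][a] -= f
    return energy, forces


def minimize(pos, restraints, fixed=(), step=0.01, n_steps=5000, ftol=1e-6):
    """Steepest descent; atoms listed in `fixed` stay at their positions."""
    pos = [list(p) for p in pos]
    for _ in range(n_steps):
        _, forces = restraint_energy_and_forces(pos, restraints)
        fmax = max((abs(f) for i, fi in enumerate(forces) if i not in fixed for f in fi),
                   default=0.0)
        if fmax < ftol:
            break
        for i, fi in enumerate(forces):
            if i not in fixed:
                for a in range(3):
                    pos[i][a] += step * fi[a]
    energy, _ = restraint_energy_and_forces(pos, restraints)
    return pos, energy


# A fixed "surface" atom and a mobile atom restrained to 2.0 Å (k = 1).
pos, energy = minimize([(0.0, 0.0, 0.0), (2.5, 0.0, 0.0)], [(0, 1, 1.0, 2.0)], fixed={0})
print(round(math.dist(pos[0], pos[1]), 4))
```

When several restraints compete, as with stiff adsorbate–surface terms and softer adsorbate–adsorbate terms, the residual strain ends up in the softer bonds, which is what the subsequent bond-length filter detects.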
DFT calculations
We performed DFT calculations using the Vienna ab initio Simulation Package (VASP)^{53}. Electron exchange and correlation energies were computed using the PBE functional^{54}. Our previous study found that the choice of functional and dispersion correction does not significantly affect the geometry of a large molecule, namely furan^{55}. Core electrons were described with projector augmented-wave (PAW) pseudopotentials^{56}. Partial occupancies were treated with Methfessel–Paxton smearing of 0.1 eV^{57}.
To construct the slabs, the lattice constants of the metals were optimized using a 15 × 15 × 15 Monkhorst–Pack k-point mesh with the Blöchl correction^{58,59}, D3 dispersion correction^{60}, and a plane-wave cutoff energy of 500 eV. Close-packed surfaces (fcc(111) and hcp(0001)) were modeled with a four-layer 4 × 4 unit cell and 20 Å of vacuum, with the bottom two layers fixed.
For assessing configuration stability, we used low-fidelity parameters: a cutoff energy of 300 eV, non-spin-polarized calculations, and Γ-point sampling of the Brillouin zone. The quasi-Newton algorithm was used to relax each structure toward the nearest local minimum, and the calculation was stopped if the configuration diverged to another configuration. Molecular graphs are constructed by adding an edge between two atoms if their distance is less than the sum of the two elements' covalent radii multiplied by 1.18^{33}. If a calculation did not converge after 200–800 ionic steps, we relaxed the structure with the conjugate-gradient algorithm; in this stage, early stopping was not used, so that the final structure could be observed. To test the reliability of the low-fidelity stability assessment, we compared the stability of adsorbates with ≤2 C and O atoms on Pd(111) obtained with the low-fidelity and the standard (high-fidelity) DFT parameters. The high-fidelity calculations use a cutoff energy of 400 eV, a 3 × 3 × 1 Monkhorst–Pack k-point mesh^{58}, spin polarization, and D3 dispersion correction^{60}. Of the 52 configurations that are stable in the standard DFT calculations, 9 diverged in the low-fidelity calculations (see the confusion matrix in Supplementary Table 2). Of these, four have binding energies >0.5 eV higher than the ground-state binding energy of the respective adsorbate, and four are low-lying local minima (0.13, 0.19, 0.12, and 0.01 eV above the respective molecule's ground-state configuration). Only one ground-state configuration was not predicted to be stable in the low-fidelity calculations, because early stopping terminated the calculation prematurely before convergence.
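For reproducibility, the settings described above can be collected as VASP input tags. This is a hypothetical mapping of the prose onto INCAR tags, not the authors' input files: GGA = PE selects PBE, ISMEAR = 1 with SIGMA = 0.1 is first-order Methfessel–Paxton smearing, IBRION = 1 and 2 select the quasi-Newton and conjugate-gradient optimizers, ISPIN = 1 and 2 switch spin polarization off and on, and IVDW = 11 (D3 with zero damping) is an assumed choice of D3 variant. Convergence criteria such as EDIFF and EDIFFG are not given in the text and are left out.

```python
COMMON = {"GGA": "PE", "ISMEAR": 1, "SIGMA": 0.1}  # PBE, Methfessel-Paxton, 0.1 eV

LOW_FIDELITY = {**COMMON, "ENCUT": 300, "ISPIN": 1, "IBRION": 1}  # quasi-Newton
LOW_FIDELITY_CG = {**LOW_FIDELITY, "IBRION": 2}  # conjugate-gradient fallback
HIGH_FIDELITY = {**COMMON, "ENCUT": 400, "ISPIN": 2, "IVDW": 11}  # spin + D3 (assumed variant)


def to_incar(tags):
    """Render tags as INCAR text, one 'TAG = value' line each."""
    return "\n".join(f"{key} = {value}" for key, value in tags.items()) + "\n"


def to_kpoints(mesh, gamma_centred):
    """Automatic-mesh KPOINTS file, e.g. (1, 1, 1) Gamma-only or (3, 3, 1) MP."""
    scheme = "Gamma" if gamma_centred else "Monkhorst-Pack"
    return f"Automatic mesh\n0\n{scheme}\n{mesh[0]} {mesh[1]} {mesh[2]}\n0 0 0\n"


print(to_incar(LOW_FIDELITY))
print(to_kpoints((1, 1, 1), True))   # low fidelity: Gamma point only
print(to_kpoints((3, 3, 1), False))  # high fidelity: 3 x 3 x 1 Monkhorst-Pack
```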
Data availability
The enumerated configurations, their stability, and energetics are available at our GitHub repository^{61}.
Code availability
The enumeration and machine learning code with an example output is available at our GitHub repository^{61}.
References
Nørskov, J. K., Bligaard, T., Rossmeisl, J. & Christensen, C. H. Towards the computational design of solid catalysts. Nat. Chem. 1, 37 (2009).
Nørskov, J. K. et al. Trends in the exchange current for hydrogen evolution. J. Electrochem. Soc. 152, J23 (2005).
Greeley, J., Jaramillo, T. F., Bonde, J., Chorkendorff, I. & Nørskov, J. K. Computational high-throughput screening of electrocatalytic materials for hydrogen evolution. Nat. Mater. 5, 909 (2006).
Kulkarni, A., Siahrostami, S., Patel, A. & Nørskov, J. K. Understanding catalytic activity trends in the oxygen reduction reaction. Chem. Rev. 118, 2302 (2018).
Nørskov, J. K. et al. Origin of the overpotential for oxygen reduction at a fuel-cell cathode. J. Phys. Chem. B 108, 17886 (2004).
Greeley, J. et al. Alloys of platinum and early transition metals as oxygen reduction electrocatalysts. Nat. Chem. 1, 552 (2009).
Man, I. C. et al. Universality in oxygen evolution electrocatalysis on oxide surfaces. ChemCatChem 3, 1159 (2011).
Rossmeisl, J., Qu, Z.-W., Zhu, H., Kroes, G.-J. & Nørskov, J. K. Electrolysis of water on oxide surfaces. J. Electroanal. Chem. 607, 83 (2007).
Jacobsen, C. J. H. et al. Catalyst design by interpolation in the periodic table: bimetallic ammonia synthesis catalysts. J. Am. Chem. Soc. 123, 8404 (2001).
Skúlason, E. et al. A theoretical evaluation of possible transition metal electrocatalysts for N_{2} reduction. Phys. Chem. Chem. Phys. 14, 1235 (2012).
Latimer, A. A. et al. Understanding trends in C–H bond activation in heterogeneous catalysis. Nat. Mater. 16, 225 (2017).
Sutton, J. E., Guo, W., Katsoulakis, M. A. & Vlachos, D. G. Effects of correlated parameters and uncertainty in electronic-structure-based chemical kinetic modelling. Nat. Chem. 8, 331 (2016).
Sutton, J. E. & Vlachos, D. G. Building large microkinetic models with first-principles' accuracy at reduced computational cost. Chem. Eng. Sci. 121, 190 (2015).
Edye, L. A., Richards, G. N. & Zheng, G. Clean Energy from Waste and Coal Ch. 8 (American Chemical Society, 1992).
Samolada, M. C., Papafotica, A. & Vasalos, I. A. Catalyst evaluation for catalytic biomass pyrolysis. Energy Fuels 14, 1161 (2000).
Yan, Z.p, Lin, L. & Liu, S. Synthesis of γvalerolactone by hydrogenation of biomassderived levulinic acid over Ru/C catalyst. Energy Fuels 23, 3853 (2009).
Gilkey, M. J. & Xu, B. Heterogeneous catalytic transfer hydrogenation as an effective pathway in biomass upgrading. ACS Catal. 6, 1420 (2016).
Alamillo, R., Tucker, M., Chia, M., PagánTorres, Y. & Dumesic, J. The selective hydrogenation of biomassderived 5hydroxymethylfurfural using heterogeneous catalysts. Green. Chem. 14, 1413 (2012).
Lee, J., Kim, Y. T. & Huber, G. W. Aqueousphase hydrogenation and hydrodeoxygenation of biomassderived oxygenates with bimetallic catalysts. Green. Chem. 16, 708 (2014).
Laskar, D. D., Tucker, M. P., Chen, X., Helms, G. L. & Yang, B. Noblemetal catalyzed hydrodeoxygenation of biomassderived lignin to aromatic hydrocarbons. Green. Chem. 16, 897 (2014).
Cortright, R. D., Davda, R. R. & Dumesic, J. A. Hydrogen from catalytic reforming of biomassderived hydrocarbons in liquid water. Nature 418, 964 (2002).
Jain, A. et al. Commentary: the materials project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Curtarolo, S. et al. AFLOW: an automatic framework for highthroughput materials discovery. Comput. Mater. Sci. 58, 218 (2012).
Kirklin, S. et al. The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies. Npj Comput. Mater. 1, 15010 (2015).
Winther, K. T. et al. CatalysisHub.org, an open electronic structure database for surface reactions. Sci. Data 6, 75 (2019).
Morin, C., Simon, D. & Sautet, P. Intermediates in the hydrogenation of benzene to cyclohexene on Pt(111) and Pd(111): a comparison from DFT calculations. Surf. Sci. 600, 1339 (2006).
Peterson, A. A. Global optimization of adsorbate–surface structures while preserving molecular identity. Top. Catal. 57, 40 (2014).
Chang, C. & Medford, A. J. Application of density functional tight binding and machine learning to evaluate the stability of biomass intermediates on the Rh(111) surface. J. Phys. Chem. C (2021).
Boes, J. R., Mamun, O., Winther, K. & Bligaard, T. Graph theory approach to highthroughput surface adsorption structure generation. J. Phys. Chem. A 123, 2281 (2019).
Deshpande, S., Maxson, T. & Greeley, J. Graph theory approach to determine configurations of multidentate and high coverage adsorbates for heterogeneous catalysis. npj Comput. Mater. 6, 79 (2020).
Ruddigkeit, L., van Deursen, R., Blum, L. C. & Reymond, J.L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB17. J. Chem. Inf. Model. 52, 2864 (2012).
Rappe, A. K., Casewit, C. J., Colwell, K. S., Goddard, W. A. & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024 (1992).
Gu, G. H., Plechac, P. & Vlachos, D. G. Thermochemistry of gasphase and surface species via LASSOassisted subgraph selection. React. Chem. Eng. 3, 454 (2018).
Wang, G.C., Li, J., Xu, X.F., Li, R.F. & Nakamura, J. The relationship between adsorption energies of methyl on metals and the metallic electronic properties: a firstprinciples DFT study. J. Comput. Chem. 26, 871 (2005).
Pascal, M. et al. Methyl on Cu(111)––structural determination including influence of coadsorbed iodine. Surf. Sci. 512, 173 (2002).
Yang, Q. Y., Maynard, K. J., Johnson, A. D. & Ceyer, S. T. The structure and chemistry of CH_{3} and CH radicals adsorbed on Ni(111). J. Chem. Phys. 102, 7734 (1995).
GarcíaMuelas, R. & López, N. Statistical learning goes beyond the dband model providing the thermochemistry of adsorbates on transition metals. Nat. Commun. 10, 4687 (2019).
Vorotnikov, V. & Vlachos, D. G. Group additivity and modified linear scaling relations for estimating surface thermochemistry on transition metal surfaces: application to furanics. J. Phys. Chem. C. 119, 10417 (2015).
Sutton, J. E., Panagiotopoulou, P., Verykios, X. E. & Vlachos, D. G. Combined DFT, microkinetic, and experimental study of ethanol steam reforming on Pt. J. Phys. Chem. C. 117, 4691 (2013).
Salciccioli, M., Chen, Y. & Vlachos, D. G. Microkinetic modeling and reduced rate expressions of ethylene hydrogenation and ethane hydrogenolysis on platinum. Ind. Eng. Chem. Res. 50, 28 (2011).
Filot, I. A. W., van Santen, R. A. & Hensen, E. J. M. The optimally performing Fischer–Tropsch catalyst. Angew. Chem. Int. Ed. 53, 12746 (2014).
Cheng, J. et al. Some understanding of Fischer–Tropsch synthesis from density functional theory calculations. Top. Catal. 53, 326 (2010).
Schumann, J. et al. Selectivity of synthesis gas conversion to C2+ oxygenates on fcc(111) transitionmetal surfaces. ACS Catal. 8, 3447 (2018).
Pu, T., Tian, H., Ford, M. E., Rangarajan, S. & Wachs, I. E. Overview of selective oxidation of ethylene to ethylene oxide by ag catalysts. ACS Catal. 9, 10727 (2019).
Dellamorte, J. C., Lauterbach, J. & Barteau, M. A. Rhenium promotion of Ag and Cu–Ag bimetallic catalysts for ethylene epoxidation. Catal. Today 120, 182 (2007).
Mato, M., Franchino, A., Garcı́aMorales, C. & Echavarren, A. M. Goldcatalyzed synthesis of small rings. Chem. Rev. 121, 8613 (2021).
Corma, A. & Garcia, H. Supported gold nanoparticles as catalysts for organic reactions. Chem. Soc. Rev. 37, 2096 (2008).
Gu, G. H., Mullen, C. A., Boateng, A. A. & Vlachos, D. G. Mechanism of dehydration of phenols on noble metals via firstprinciples microkinetic modeling. ACS Catal. 6, 3047 (2016).
Xu, Z. & Kitchin, J. R. Probing the coverage dependence of site and adsorbate configurational correlations on (111) surfaces of late transition metals. J. Phys. Chem. C. 118, 25597 (2014).
Zhang, J., Glezakou, V.A., Rousseau, R. & Nguyen, M.T. NWPEsSe: an adaptivelearning global optimization algorithm for nanosized cluster systems. J. Chem. Theory Comput. 16, 3947 (2020).
Janet, J. P., Ramesh, S., Duan, C. & Kulik, H. J. Accurate multiobjective design in a space of millions of transition metal complexes with neuralnetworkdriven efficient global optimization. ACS Cent. Sci. 6, 513 (2020).
Bisbo, M. K. & Hammer, B. Efficient global structure optimization with a machinelearned surrogate model. Phys. Rev. Lett. 124, 086102 (2020).
Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio totalenergy calculations using a planewave basis set. Phys. Rev. B 54, 11169 (1996).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865 (1996).
Vorotnikov, V., Mpourmpakis, G. & Vlachos, D. G. DFT study of furfural conversion to furan, furfuryl alcohol, and 2methylfuran on Pd(111). ACS Catal. 2, 2496 (2012).
Blöchl, P. E. Projector augmentedwave method. Phys. Rev. B 50, 17953 (1994).
Methfessel, M. & Paxton, A. T. Highprecision sampling for Brillouinzone integration in metals. Phys. Rev. B 40, 3616 (1989).
Monkhorst, H. J. & Pack, J. D. Special points for Brillouinzone integrations. Phys. Rev. B 13, 5188 (1976).
Blöchl, P. E., Jepsen, O. & Andersen, O. K. Improved tetrahedron method for Brillouinzone integrations. Phys. Rev. B 49, 16223 (1994).
Grimme, S., Antony, J., Ehrlich, S. & Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFTD) for the 94 elements HPu. J. Chem. Phys. 132, 154104 (2010).
Gu, G., Lee, M., Jung, Y., & Vlachos D. G. Automated Exploitation of the Big Configuration Space of Large Adsorbates on Transition Metals Reveals Chemistry Feasibility, AdsorptionConfiguration_MS2021, https://doi.org/10.5281/zenodo.6343921, 2022.
Acknowledgements
This work was supported by the National Research Foundation of Korea, the Ministry of Science and ICT under award numbers 2021R1C1C2094407 (G.G.) and 2019M3D3A1A01069099 (Y.J.), and as part of the Catalysis Center for Energy Innovation, an Energy Frontier Research Center funded by the US Department of Energy, Office of Science, Office of Basic Energy Sciences under award number DE-SC0001004 (D.G.V. and M.L.). We acknowledge the Korea Institute of Science and Technology Information (KISTI) for the computational resources provided for this research.
Author information
Contributions
G.G. conceived the project, developed the enumeration algorithm and the ML model, and analyzed the configuration space. G.G. and M.L. performed the DFT calculations. G.G., Y.J., and D.G.V. discussed the results and assisted with manuscript preparation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Nature Communications thanks Rodrigo García-Muelas, Sergey Levchenko, and the other, anonymous, reviewer for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gu, G. H., Lee, M., Jung, Y. et al. Automated exploitation of the big configuration space of large adsorbates on transition metals reveals chemistry feasibility. Nat Commun 13, 2087 (2022). https://doi.org/10.1038/s41467-022-29705-7