Machine-Learning Driven Global Optimization of Surface Adsorbate Geometries

The adsorption energies of molecular adsorbates on catalyst surfaces are key descriptors in computational catalysis research. For the relatively large reaction intermediates frequently encountered, e.g., in syngas conversion, a multitude of possible binding motifs leads to complex potential energy surfaces (PES), however. This implies that ﬁnding the optimal structure is a diﬃcult global optimization problem, which leads to signiﬁcant uncertainty about the stability of many intermediates. To tackle this issue, we present a global optimization protocol for surface adsorbate geometries which trains a surrogate machine learning potential on-the-ﬂy. The approach is applicable to arbitrary surface models and adsorbates and minimizes both human intervention and the number of required DFT calculations by iteratively updating the training set with conﬁgurations explored by the algorithm. We demonstrate the eﬃciency of this approach for a diverse set of adsorbates on the Rh(111) and (211) surfaces.


Introduction
The adsorption energies of key reactants are the most commonly used descriptors for the activity of heterogeneous catalysts, and thus play an important role in understanding and designing catalysts. [1,2] In the case of small adsorbates (e.g. OH x , CH x , NH x ) and ideal metal surfaces, adsorption energies are relatively straightforward to calculate with first-principles methods. In this case, the adsorbates feature simple monodentate binding motives and the number of symmetry inequivalent surface sites on low-index crystalline facets is small. Indeed, one can even benefit from simple linear scaling relations to estimate adsorption energies across different surfaces. [3] The situation is different when larger reactants are involved in the process. A prime example of this is synthesis gas (syngas) conversion, which is an important industrial process that can be used to produce valuable chemicals and fuels such as ethanol and higher hydrocarbons. Here, the reaction network features a wide range of differently sized intermediates. [4][5][6][7] Understanding the selectivity of syngas conversion on a given catalyst therefore inevitably requires at least an approximate notion of the adsorption energies of these larger molecules.
The main problem with this is that there are typically many local minima on the adsorbate binding energy surface for larger adsorbates. This is due to the fact that the possible binding motives on a catalyst surface are numerous in this case, including multidentate geometries. Furthermore, the adsorbates may display significant internal flexibility, so that the convergence of a local geometry optimization will strongly depend on the details of the initial geometry used to set up the calculation. These issues are exacerbated when working with non-trivial surface models (e.g. high-index, defected or amorphous surfaces). In combination, these factors presently hinder the robust and reproducible prediction of the adsorption energies of large, flexible molecules on catalyst surfaces.
The most common way of addressing this issue is so-called "brute-intuition". [8] This basically means constructing a number of reasonable starting geometries by hand and carrying out local relaxation for each in order to find the most stable adsorbate geometry. However, this approach is clearly biased by the intuition of the user. To avoid this, a full "brute-force" search can be performed, which amounts to the high-throughput screening of all viable candidate geometries. [9,10] This requires an unbiased and efficient strategy for enumerating the viable candidates, e.g. via a graph-based approach (as in CatKit) [11] or by sampling in internal coordinate ranges (as in DockOnSurf) [12]. Unfortunately, the brute-force strategy quickly becomes computationally prohibitive due to the large number of candidates that are typically obtained. Furthermore, both CatKit and DockOnSurf still require the definition of molecular conformers, surface binding sites and anchor-points on the molecule, all of which are themselves potential sources of bias.
Ultimately, these issues are common to all global optimization problems. We can therefore draw from the wealth of algorithms developed for this purpose, such as simulated annealing [13], basin hopping [14] or minima hopping [15]. These approaches can in principle be used to perform an unbiased search of the potential energy surface (PES) without enumerating candidates, but they are typically prohibitively expensive in conjunction with first-principles methods like density functional theory (DFT). To overcome this, semiempirical methods or empirical potentials have been used, [5,8,16] but their availability and quality across the periodic table is often lacking, particularly for the description of molecule/surface interactions. This makes the use of global optimization algorithms like minima hopping still rather uncommon in computational catalysis research.
With the development of machine learning (ML) techniques over the past decades, there is promise that the computational bottleneck towards the unbiased search for ground-state adsorbate geometries can be overcome with a data-driven approach. Specifically, ML surrogate models can replace DFT calculations with orders of magnitude lower computational cost. Importantly, this can be achieved with little sacrifice of accuracy, provided that the training configurations are sufficiently representative of the PES. Indeed, there are already ML models which accurately predict adsorption energies, mostly focusing on simple adsorbates on different surface models. [11,[17][18][19] Recently, these efforts have been expanded toward larger adsorbates with promising results [4,[20][21][22], though these approaches still rely on a static representation of the adsorbate/surface systems and avoid a full global optimization of the geometric structure. Since ML surrogate models have already shown great potential in local [23][24][25][26][27][28] and global [29][30][31][32][33][34][35][36][37][38] geometry optimization problems, we herein want to expand this approach to obtain a efficient and universal algorithm for finding the global minimum geometries of adsorbate/surface systems.
Specifically, we present an active learning workflow based on Gaussian Approximation Potentials (GAP) [39], which constructs its own training set on-the-fly during constrained minima hopping simulations. Only a small number of single-point DFT calculations are used for training, while the GAP PES is used to widely the configuration space, yielding a diverse set of promising candidates. These can be further refined with local DFT relaxations. Importantly, the necessary hyperparameters of the ML model are automatically determined by robust heuristics, leading to a method with high data-efficiency and requiring minimal human intervention.

Global Optimization Workflow
The global optimization protocol presented herein follows a similar philosophy as the GAP-driven random structure search (RSS) proposed by Deringer, Pickard and Csányi [40] in two aspects. First, it requires no assumptions about the adsorbate geometry or site a priori. Secondly, the global Hookean Fix 2 Layers

Fix All Slab (b)
Step (N hop ) E (c)  structure search and GAP fitting are performed simultaneously, since a pool of candidate structures is iteratively generated by Minima Hopping (MH) and used to select new training samples. Nonetheless, the specific setting of searching for surface-adsorbate geometries requires a specialized workflow, since RSS is ill-suited for maintaining the molecular identity of the adsorbates.
The full protocol is composed of three parts. In the first part, iterative GAP fitting is performed, using a series of MH searches to generate new training samples. Once this training process is converged, an extensive MH production run is performed, using parallel simulations that contribute to a shared pool of minima. From this pool, a diverse set of promising structures is selected, using Kernel Principal Component Analysis (PCA) and k-means clustering. These structures are subsequently locally relaxed at the DFT level. The full workflow is illustrated in Fig. 1 and the individual steps will be described in detail in the following.

Iterative Training
The workflow is initiated by providing a SMILES string [41] representing the adsorbate and the relaxed geometry of the clean surface as input. A rough gas-phase geometry of the adsorbate is obtained using the Merck Molecular Force Field (MMFF) [42] as implemented in RDkit. [43] It should be noted that empirical force fields like MMFF are not well suited for describing (poly-)radical adsorbates that frequently occur in heterogeneous catalysis. Nevertheless, the obtained geometries are sufficient for our purpose, as they avoid unphysical contacts, preserve the bonding topology of the adsorbate and yield reasonable bond lengths, which are in turn used for defining so-called Hookean constraints. [8] These ensure that the molecular identity of the molecule of interest is maintained throughout the simulation (see Methods section). Additionally, the metal surface atoms are constrained for all GAP simulations, whereas only the lower layers of the surface are constrained in the final DFT relaxation (see Fig. 1 The initial geometry is then randomly placed onto the catalyst surface and the energy and forces of the full system are evaluated with DFT. This single configuration represents the initial training set of the GAP. Not surprisingly, the quality of the corresponding potential is low and results in rather unphysical structures for the first MH run (see Fig. 2). While these structures are therefore not very useful from a global optimization perspective, they nevertheless help to substantially improve the GAP, marking high energy regions of the PES. Note that subsequent iterations begin with newly randomized initial structures to further aid the MH runs in exploring as different region of the PES as possible.
Based on these structures, a small number of DFT calculations is performed. These serve the dual purpose of validating the quality of the GAP model and generating new training data for the next iteration. This leads to an important design choice for the workflow, namely how to select which structures are added to the training set. Our primary goal is to find the global minimum geometry for a given combination of adsorbate and surface. To this end, the GAP should provide accurate geometries of all local minima and reliably rank their relative stability. Clearly, minimum structures will thus form an important part of the training set. However, training only on putative minima would preclude any infomation about energetic barriers on the PES and potentially lead to numerical instabilities during the molecular dynamics (MD) runs required for MH (see Fig.  1(c)). After some experimentation, we therefore settled on the simple strategy of adding the global minimum of a given MH run, two random local minima, and two random snapshots from the MD trajectories.
Note that potentially more sophisticated algorithms such as estimated uncertainties could be used to this end. [44][45][46] Alternatively, farthestpoint heuristics based on the Smooth Overlap of Atomic Positions (SOAP) representation [47] have also been found to be useful for selecting diverse training samples. [7,48,49] However, both of these approaches are not ideal for the current setting. Uncertainties require calibration and the optimal threshold for when to perform a DFT calculation is system dependent. [50] Farthest-point heuristics, on the other hand, tend to only be beneficial when starting from reasonably large datasets, and actually lead to larger errors than random sampling for the kinds of very small datasets used herein. [7,48] In contrast to this, the simple selection scheme used herein is universally applicable and robust. Additional analysis of training set diversity is provided in the SI.

Model Convergence
In addition to providing training data for the next GAP model, the DFT calculations are also used to estimate the out-of-sample error of the current GAP. In principle, the iterative training procedure can be considered as converged once the predicted energies and forces for the minima are sufficiently accurate (while the accuracy for the high temperature MD structures is less important). However, due to the small number of calculations performed at each iteration, the Root Mean Square Deviation (RMSD) with respect to the DFT reference only provides a noisy estimate of the true outof-sample error. This is shown in Fig. 3 for the specific case of CH 3 OO on Rh(211), where the RMSD oscillates strongly in the initial iterations. More importantly, the RMSD can be very low for some iterations and subsequently spike, meaning that it is not a robust measure of convergence.
To circumvent this issue, we use the Exponential Moving Average (EMA) of the RMSD to estimate the convergence of the training procedure: (1) Here, the hyperparameter α determines how quickly the weights of previous RMSDs decay in the average, with α = 1.0 recovering the current RMSD at each iteration i. As shown in Fig. 3, the EMA displays a slower and smoother decay than the RMSD. We use 0.3 for α throughout the workflow as it exhibits a reasonable balance between providing a conservative error estimate without unduly inflating the number of iterations. We consider the GAP to be converged  when the EMA falls below 8 meV/atom for the energy or 0.15 eV/Å for the forces. As usual, these criteria are somewhat arbitrary and have been found empirically to yield a good balance between data-efficiency and accuracy for the systems investigated here.

Parallel Minima Hopping
Upon convergence, the training process has already yielded an extensive set of putative minimum structures. However, the quality of these structures is rather low for the initial iterations. The converged GAP model is therefore used in an additional extensive MH production run. To this end, we use a parallel minima hopping scheme where a number of independent MH simulations are spawned and explore distinct regions of the PES simultaneously, sharing the same history of visited minima. [8] This avoids the danger of a single MH run spending a significant amount of time trapped in a PES region far from the global minimum due to high barriers. [51] Initial structures are diversified by random rotations of the rigid molecule with respect to its center of mass and random translations along the metal surface. Forty parallel processes are executed.
Since the global minimum structure is generally not known a priori, deciding when to terminate the parallel MH run is non-trivial. In previous work, the temperature of the MD simulations has been used as a stopping criterion, since the algorithm increases the temperature by a certain factor whenever it revisits known minima. [52] We also adopt this strategy herein, with each process performing an independent simulation with its own temperature and terminating when the temperature reaches twice the initial temperature (i.e. 4000 K) or a maximum number of iterations is exceeded (80). Note that convergence is strongly accelerated in the parallel MH approach, as several processes commonly fall into overlapping regions of the PES, rediscovering minima previously found by a nearby walker.

Candidate Selection and DFT Refinement
The parallel MH production run attempts an exhaustive exploration of the adsorbate conformer and binding site space, which typically results in a large ensemble of minimum adsorption structures. However, these are minima on the GAP PES (and additionally subject to the Hookean constraints on the adsorbate geometry), whereas our goal is to ultimately obtain the unconstrained minima on the DFT PES. To this end, a subset of the GAP minima (five structures in the current work) is further refined at the DFT level (using the revised Perdew-Burke-Ernzerhof functional, revPBE, and the vdW surf dispersion correction, see Methods section). [53,54] Naively, one can select the five lowest energy structures from the GAP ensemble, given that we are looking for the global minimum. However, this turns out to be a poor sampling strategy, because it often yields a set of minima with very similar geometries. Furthermore, molecular dissociation is sometimes observed during DFT relaxation, e.g. when the candidate structure is only a minimum on the constrained PES or because of inconsistencies between the GAP and DFT PES.
In light of these issues, it is helpful to investigate the structural diversity within the candidate ensemble prior to selecting configurations to refine. To this end, we map the structures from the conformer ensemble into a 2D space using Kernel PCA, with the averaged SOAP vector as a representation of each system (see Fig. 4). [55,56] This allows the visualization of how the candidate structures are distributed in terms of their structural similarity. As we are assuming a fixed computational budget of five DFT relaxations, we apply k-means clustering to partition the Kernel PCA space into k = 5 distinct regions. Finally, the structure with the lowest formation energy, E form , (according to GAP) from each cluster is selected as a representative candidate for DFT relaxation. (For definition of E form , see Methods section) As mentioned above, the candidate structures produced by the GAP are subject both to Hookean constraints within the adsorbate and the frozen surface layers. Abrupt removal of all constraints for the DFT relaxation can sometimes lead to rather large initial forces on the atoms, possibly causing a spurious dissociation of the adsorbate. To prevent such repercussion, DFT relaxations are mildly performed in a stepwise manner. Initially, only the constraints for the upper metal layers are removed, while the Hookean constraints are retained until the maximum force component falls below 0.2 eV/Å. When this condition is satisfied, the Hookean constraints are also removed and the structure is fully relaxed until the maximum force norm is lower than 0.05 eV/Å.

Application to Molecular Adsorbates on Rhodium
To demonstrate the applicability of the proposed workflow in the context of heterogeneous catalysis, we considered a set of thirteen small to midsized adsorbates on Rh(111) and five on Rh(211). These were previously studied in detail by Yang et al. (Ref. 5), while investigating the selectivity of CO hydrogenation on Rh. Importantly, these authors also used MH simulations to determine putative global minima, using approximate energies and forces from the density functional tight binding (DFTB) as implemented in Hotbit. [57] This allows us to benchmark the accuracy of the present workflow for a set of complex adsorbates. It should be noted that the computational setup used by Yang et al. is somewhat different from ours, mainly because they use a different Generalized Gradient Approximation (GGA) functional, dispersion correction (BEEF-vdW) and DFT code (QuantumEspresso). [58] Due to the different setups, their reported structures were re-optimized at the revPBE+vdW surf level with FHI-aims. The formation energies obtained from the present workflow are compared to the ones of Yang et al. in Fig. 5. In all cases, the global minima suggested by the proposed workflow display comparable or lower formation energies than the previously reported ones, indicating that both the quality of the GAP potential and the MH search are sufficient to predict good starting points for DFT relaxation. More precisely, the here reported formation energies are on average 0.3 eV and 0.7 eV lower for Rh(111) and (211), respectively. In the most extreme case, we find a geometry for CHCO on Rh(211) that is more stable by 1.39 eV. Clearly, such large energy differences can have significant implications in catalysis.
Given these discrepancies, it is instructive to explore the differences between the new and previously reported structures in more detail, as shown on the bottom of Fig. 5. This reveals that in some cases, large energy differences can be attributed to different adsorption sites, as is the case for CH 2 CO on Rh(111) or CH 3 on Rh(111). In other cases, however, the decisive factor appears to be the orientation and conformation of the molecule on the site, as can be seen for CH 3 CHCOH on Rh(211) and CHCO on Rh(211). Here, the proposed GAP/MH workflow potentially is potentially advantageous compared to discrete graphbased algorithms, since it simultaneously explores binding sites and molecular conformations.
Since both the current workflow and the work of Yang et al. combine MH on an approximate PES with local DFT relaxations, the differences shown in Fig. 5 are strikingly large. Importantly, all energies reported therein were obtained with the same functional and code (revPBE+vdW surf ), so that all shown configurations represent local minima on the same PES. The observed differences therefore likely stem from the use of a DFTB model (fitted to BEEF+vdW energies) for the MH search in Ref. 5. Indeed, the authors of Ref. 5 report that the energetics of their DFTB model are generally poor and therefore only use it as a structure generator. However, the current results indicate that reasonable agreement between the approximate PES used for MH and the target PES is essential for reliably predicting low-energy adsorbate geometries, even if the candidate structures are ultimately refined with the target method.
It should also not be discounted that there are intrinsic differences between BEEF-vdW and revPBE+vdW surf . This can be seen most prominently for the case of CHO on Rh(111). This molecule consistently decomposes into CO and H during the final DFT relaxation step of our workflow, whereas Yang et al. report it as a local minimum. Interestingly, their structure can indeed be reoptimized into a (shallow) local minimum at the revPBE+vdW surf level, with a dissociation  barrier of 0.13 eV (see SI). The high-temperature MD based strategy of MH is clearly not well suited for the discovery of such shallow minima.

Computational Cost
To put the computational benefit of the current workflow into perspective, we briefly discuss the timings for a representative system (CH 3 CHOH on Rh(211), see SI). In terms of core-hours, the dominating factors are the DFT single-point calculations used for generating the training set (21%) and the final DFT relaxations (77%), whereas the cost of fitting the GAP models and running the MH simulations is overall almost negligible. Importantly, the cost for single-point calculations depends on the complexity of the PES and is thus somewhat system-dependent. In this context, CH 3 CHOH on Rh(211) represents the worst-case scenario among the adsorbates we studied, requiring 26 iterations to achieve convergence. In contrast, simpler adsorbates like H 2 O and CH 3 only required 7-8 iterations. Nevertheless, even for a complex adsorbate like CH 3 CHOH, the full workflow is executed in less than 3000 core-hours (on a 40 core Intel Skylake 6148 node). Given that the bulk of this time is taken up by the local DFT relaxations, our workflow thus provides a full global optimization at a cost on the order of a local optimization. For comparison, performing the full parallel minima hopping run at the DFT level would entail approximately the 130-fold cost.

Discussion
In this manuscript, a global optimization workflow for surface adsorbates was presented and tested on important reaction intermediates for ethanol synthesis on Rh (111) and (211) surfaces. The workflow is applicable to any kind of adsorbatesurface system, since no assumption about surface sites or binding motifs are made. To achieve high computational efficiency, a GAP surrogate model is used to explore the potential energy surface. Importantly, this model is iteratively trained during the MH simulations, achieving high-data efficiency. Furthermore, the fitting procedure is fully automated, requiring minimum human intervention thanks to the use of robust heuristics for training set and hyperparameter selection. Note that we have focused on the low-coverage regime herein by considering single adsorbates in supercells. Extension to larger coverage and/or multiple adsorbates is relatively straightforward. Since the cost of generating the training data is one of the main contributions to the overall effort, it would be desirable to further increase the data efficiency of the models in the future. Here, the use of well-parameterized baselines models is a promising route. [23,59] Finally, in order to be universally applicable, this workflow builds the surrogate model for each adsorbate from scratch. Clearly, reusing data and/or models for different adsorbates would be more efficient in many use cases. This is the subject of ongoing work.

Gaussian Approximation Potentials
Gaussian Approximation Potentials are ML-based interatomic potentials that provide accurate representations of high-dimensional PESs. [39] Since the GAP methodology has recently been extensively reviewed, [60] we only provide a brief overview of the aspects which are most pertinent to the current paper.
In this work, we use GAPs that contain a twobody term and a many-body term based on the SOAP representation. [47] In order to improve the stability of the potentials in the initial training iterations (when very little data is available), an additional purely repulsive two-body baseline potential is added, which prevents unphysically close contacts between atoms and is zero otherwise (see SI). Apart from these basic design choices, there are a number of hyperparameters that need to be set in order to fit a GAP. These pertain to the basis expansions used in the two-and manybody representations, element dependent cutoff radii and switching functions, regularization of the fit and the prior weightings of two-and many-body terms.
Generally, the hyperparameters in ML models can simply be optimized, e.g. with respect to the cross-validation error of the potential. However, crossvalidation and similar approaches require a sufficiently large dataset, in order to obtain a robust estimate of the generalization error. In the present work, we aim to develop an active learning workflow which starts from scratch, so that this is clearly not the case. We therefore use a mix of robust defaults and simple heuristics to set and update the hyperparameters, as detailed in the SI.
Briefly, physically motivated lengthscale parameters for the SOAP representation are available for all elements and used without modification. [56] Additionally, robust defaults are used for the remaining hyperparameters necessary for defining the representations and kernels. This leaves a small number of system-dependent hyperparameters, namely the regularization strength for energies and forces (σ E and σ F ) and the prior weights of the two and many-body terms (δ 2b and δ MB ). These are determined according to the heuristics discussed in Ref. 60, see SI for details.

Constrained Minima Hopping
MH is a global optimization algorithm, which has been extensively applied to surface and bulk structure searching problems. [15,51] The basic idea is to use short, high-temperature MD runs to escape a given minimum on the PES, followed by local relaxations into the next local minimum. Importantly, MH keeps track of previously visited minima making it more efficient in finding new structures. Moreover, the algorithm drives towards lower energy structures by adaptively adjusting the temperature and energy threshold parameters, which determine the intensity of the hopping moves and the acceptance criteria for new minima. For a detailed description, we refer readers to the original MH publication. [15] In the context of adsobate optimization, a common problem with MH is that the high temperature MD often leads to the dissociation of the adsorbate. To address this issue, Peterson proposed using so-called Hookean constraints on bond distances, which add a harmonic energy penalty to the total energy when covalent bonds in the adsorbate are stretched beyond a certain length, thus preserving the molecular identity of the adsorbate. [8] Throughout this work, a spring constant (k) of 20 eV/Å 2 is used. The threshold distances were set individually for each bond, using 1.05 times the bond distance in the gas-phase molecular geometry as optimized with the MMFF. In some cases the adsorbate detaches and remains floating above the metal surface owing to the high temperature in the MD. To avoid this, an additional Hookean constraint can be imposed, which pushes the adsorbate back towards the surface when it moves too far away from it.

Computational Details
As representative catalytic surface models, surface slabs were constructed from (3 × 3) Rhodium (Rh) surface supercells with a thickness of four metal layers. Both the low-index (111) and stepped (211) facets were used. Due to their relevance in catalysis modeling, we report formation energies (E form ) with respect to gas-phase H 2 O, CO, and H 2 : where E slab+mol is the energy of the combined surface and adsorbate system, E slab is the energy of the clean slab, ε i is the gas-phase reference atomic energy of element i, and n i denotes the number of occurrences of element i in the adsorbate molecule. All DFT calculations were carried out using the revised Perdew-Burke-Ernzerhof (revPBE) functional [53] with default light integration grids and "tier-1" numerical atomic orbital basis set as implemented in FHI-aims. [61] These settings represent a good compromise between accuracy and cost, and are commonly used for structure searching. [62] Nevertheless, it should be noted that the reported values of E form are not fully converged with respect to the basis set size. Van der Waals (vdW) interactions were corrected with the Tkatchenko-Scheffler method [63] using optimized Rh parameters for molecule-surface interactions (termed vdW surf ). [54]. The Brillouin zone was sampled with a 4 × 4 × 1 k-grid. Selfconsistency convergence criteria were set to 1 × 10 −4 eV/Å (forces), 1 × 10 −5 eV (total energy), 1 × 10 −3 eV (sum of eigenvalues), and 1 × 10 −4 e/a 3 0 (charge density). For DFT geometry optimization, the Fast Inertial Relaxation Engine (FIRE) algorithm [64] was used as implemented in the Atomic Simulation Environment (ASE). [65] Supplementary information. Further details on hyperparameters for GAP, heuristics on GAP hyperparameter selection, and parameters for minima hopping, energetic data in Fig. 5 are provided.