High-throughput calculations of catalytic properties of bimetallic alloy surfaces

A comprehensive database of chemical properties on a vast set of transition metal surfaces has the potential to accelerate the discovery of novel catalytic materials for energy and industrial applications. In this data descriptor, we present such an extensive study of chemisorption properties of important adsorbates - e.g., C, O, N, H, S, CHx, OH, NH, and SH - on 2,035 bimetallic alloy surfaces in 5 different stoichiometric ratios, i.e., 0%, 25%, 50%, 75%, and 100%. To our knowledge, it is the first systematic study to compile the adsorption properties of such a well-defined, large chemical space of catalytic interest. We propose that a collection of catalytic properties of this magnitude can assist with the development of machine learning enabled surrogate models in theoretical catalysis research to design robust catalysts with high activity for challenging chemical transformations. This database is made publicly available through the platform www.Catalysis-hub.org for easy retrieval of the data for further scientific analysis.

www.nature.com/scientificdata
This gives a total of 96,015 unique surface, adsorbate and site combinations (including the empty slabs), of which roughly 65,000 calculations have been completed. In addition, the adsorption of the hydrogenated species CH, NH, CH₂, CH₃, SH, OH and H₂O has been studied for a smaller subset of alloy surfaces, formed from 16 metals of particular interest for catalysis, with approximately 25,000 calculations completed. We note that, due to reorientation of adsorbates during structure relaxation, the number of unique surface structures is lower than the number of initially sampled configurations. More than 90,000 adsorption and reaction energies have been parsed from the dataset, of which approximately 30,000 adsorption energies stem from the monoatomic adsorbates (H, C, N, O and S) and 10,000 adsorption and reaction energies stem from the hydrogenated adsorbates. The remaining reaction energies are generated by decomposing a set of gas phase molecules of interest for catalytic applications, such as CH₄(g), NH₃(g), CO₂(g), CH₂CH₂(g), CH₃OH(g), H₂O₂(g) and CH₃COOH(g), into their atomic constituents on the surfaces.
The dataset is made available from the open repository Catalysis-Hub.org 17, where reaction energies and barriers from more than 50 publications and datasets can be accessed.
Examples of calculated adsorption energies are given in Fig. 2, showing the results for the most stable adsorption sites for atomic carbon (C), oxygen (O), and nitrogen (N). In Fig. 2(a,b) the adsorption energies are plotted as a function of metal A and B, which are arranged on an improved Pettifor scale 18,19, with small adjustments for magnetic elements; this ordering ensures a smooth variation of the adsorption energies with composition. Grey areas in the figure mark structures for which converged adsorption energies could not be obtained due to surface reconstruction, a mismatch between the magnetic structure of the slab and that of the adsorbate-slab structure, or convergence problems in the electronic structure calculation.
Another approach for visualizing adsorption energy trends is to plot the adsorption energies of two adsorbates against each other, which often gives rise to linear scaling relationships for similar surface geometries. Utilizing scaling relationships is a well-established approach in theoretical catalysis to model and understand catalytic activity and selectivity 6,20. In the lower panel of Fig. 2 the correlation between the adsorption of carbon and that of (c) oxygen and (d) nitrogen is shown. Metals with a partially filled d-band and metals with a filled or empty d-band are labeled as d and non-d metals, respectively. All alloys containing a non-d metal are labeled as non-d alloys. While a close to linear relationship between the adsorption of C and O can be seen for the d and non-d pure metals separately, the correlation between the atomic adsorption energies on the alloys is more complicated, emphasizing the need for more sophisticated methods for modelling these systems, such as data-driven approaches. A link to the script used to plot Fig. 2 by fetching the data directly with the Catalysis-Hub Python API is provided in the Methods section.
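As a rough illustration of how a linear scaling relation between two adsorbates can be extracted, the sketch below fits a straight line E_O ≈ slope · E_C + intercept by ordinary least squares. The function name fit_linear_scaling and the energies in the usage example are placeholders invented for illustration; they are not values from the dataset.

```python
def fit_linear_scaling(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with >= 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SS_res / SS_tot; taken as 1.0 when y is constant.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 if syy == 0 else 1.0 - ss_res / syy
    return slope, intercept, r_squared


# Placeholder adsorption energies in eV (NOT taken from the dataset).
e_carbon = [0.0, 1.0, 2.0, 3.0]
e_oxygen = [-1.0, -0.5, 0.0, 0.5]
slope, intercept, r2 = fit_linear_scaling(e_carbon, e_oxygen)
print(f"E_O ~ {slope:.2f} * E_C + {intercept:.2f} eV (R^2 = {r2:.3f})")
```

A low R² for a given family of surfaces would be one simple indicator that a single linear relation does not describe it well, which is the situation described above for the alloys.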

Methods
Adsorption energies were calculated with DFT and obtained from the equation:
$$\Delta E_{\mathrm{ads}} = E_{\mathrm{slab+ads}} - E_{\mathrm{slab}} - E_{\mathrm{gas}}$$
where E_gas is the energy of the gas phase reference species, chosen among CH₄(g), H₂(g), N₂(g), H₂O(g) and H₂S(g). A full list of the studied adsorbates and the references used is given in Table 1.
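As a worked instance, assume the common convention in which the adsorption energy is the total energy of the slab with the adsorbate minus the energies of the clean slab and of the gas phase references, and assume that atomic carbon is referenced to CH₄(g) and H₂(g). The symbols below are notation introduced here for illustration; the actual reference for each adsorbate is the one listed in Table 1.

```latex
\begin{align}
\mathrm{CH_4(g)} + {*} &\rightarrow \mathrm{C^*} + 2\,\mathrm{H_2(g)} \\
\Delta E_{\mathrm{ads}}(\mathrm{C}) &= E_{\mathrm{slab+C}} - E_{\mathrm{slab}}
  - \left[ E_{\mathrm{CH_4(g)}} - 2\,E_{\mathrm{H_2(g)}} \right]
\end{align}
```

The bracketed term plays the role of the gas phase reference energy: it is the energy of one carbon atom expressed through the chosen reference molecules, so that the reaction is balanced in both C and H.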
The Catalysis Kit (CatKit) 21 software was used to generate the slab structures from optimized bulk systems and to systematically enumerate all high-symmetry adsorption sites. The generated structures were stored and handled with the ASE database 22.
DFT calculations were performed with the Quantum ESPRESSO (QE) electronic structure code 23, using the BEEF-vdW exchange-correlation functional 24, a 500 eV plane-wave cutoff, and a 5,000 eV density cutoff. Monkhorst-Pack k-point grids of (12, 12, 12) for bulk and (6, 6, 1) for slab calculations were used, with a Fermi smearing of 0.15 eV. Spin-polarized calculations were performed only for alloys containing Fe, Ni, Co, or Mn, with the magnetic moments allowed to converge during the electronic structure optimization. Initial magnetic moments of 3, 3, 2, and 1 μB were chosen for Fe, Mn, Co, and Ni, respectively, and set to zero for all other elements. For the A1 and L1₂ structures, lattice constants were obtained from bulk alloy calculations with an equation of state combined with an energy minimization in QE. For L1₀ structures we used a variable-cell optimization in QE with a high plane-wave cutoff (800 eV) and then used the resulting lattice constants as the initial guess for the final energy minimization with respect to the lattice parameters, i.e., 'a' and 'c', using the SciPy fmin optimizer 25. Slab geometries were optimized by fixing the two bottom layers and allowing the top layer and adsorbates to relax. Due to the large number of calculations, job submissions were handled with FireWorks 26 and the CatFlow submodule of CatKit, which provides a FireWorks interface to QE and other electronic structure calculators supported by ASE.
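The spin-polarization rule and the initial magnetic moments stated above can be written down as a small lookup. This is a minimal sketch and not the authors' workflow code; the names INITIAL_MAGMOM, needs_spin_polarization and initial_magmoms are hypothetical, and the example cells are arbitrary.

```python
# Initial magnetic moments in Bohr magnetons, as stated in the text.
INITIAL_MAGMOM = {"Fe": 3.0, "Mn": 3.0, "Co": 2.0, "Ni": 1.0}


def needs_spin_polarization(symbols):
    """True if the structure contains any of Fe, Mn, Co or Ni."""
    return any(s in INITIAL_MAGMOM for s in symbols)


def initial_magmoms(symbols):
    """Per-atom starting moments; 0.0 for every other element."""
    return [INITIAL_MAGMOM.get(s, 0.0) for s in symbols]


# Usage: an arbitrary Fe/Pt cell and a cell without magnetic elements.
fept = ["Fe", "Pt", "Fe", "Pt"]
print(needs_spin_polarization(fept), initial_magmoms(fept))
print(needs_spin_polarization(["Pt", "Au"]), initial_magmoms(["Pt", "Au"]))
```

With ASE, a list of this form could be passed to Atoms.set_initial_magnetic_moments before attaching a calculator; whether the original workflow did so in this way is an assumption.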
Upon relaxation we found that reconstructions of the metal surfaces, e.g., horizontal sliding or dissociation of the top layer from the slab, are quite common. We also found that the adsorbates often reorient into other sites. The relaxed geometries were therefore post-processed with a tailored classification method to label reconstructed surfaces and reclassify the adsorption sites. Only non-reconstructed surfaces have been used to generate adsorption energies; however, since the reconstructed structures could be of interest for model generation, their atomic structures are still made available via the web and Python APIs discussed in the Usage Notes section.

Data Records
All the DFT calculations stem from one dataset, generated by O. Mamun, K. Winther, and J. Boes in the group of Thomas Bligaard at the SUNCAT Center for Interface Science and Catalysis. The data are made available from two platforms, where the open electronic structure database Catalysis-Hub.org 17 is the main resource for easy access to parsed adsorption energies. The dataset has been assigned its own permanent link at https://www.catalysis-hub.org/publications/MamunHighT2019. The Catalysis-Hub web interface enables in-browser search for reactions and chemical compositions, with a visualization of atomic geometries, which can be downloaded in several formats including CIF, JSON, VASP POSCAR and Quantum ESPRESSO input. A description of how to download reaction energies and atomic structures with the Catalysis-Hub (CatHub) Python API, available from the Zenodo repository 27, is provided in the Usage Notes section.

Technical Validation
To ensure the quality of the adsorption properties reported herein, the convergence with respect to all calculation parameters has been carefully checked. Adsorption and reaction energies have only been included for surface structures that do not undergo reconstruction upon relaxation. In the case of magnetic surface structures, we have only parsed adsorption and reaction energies if the discrepancy in total magnetization between the empty surface and the surface with the adsorbate is less than 4 in atomic units.
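The two inclusion criteria above amount to a simple filter. The sketch below is an illustration under stated assumptions, not the parsing code used for the dataset: the function and argument names are hypothetical, and treating a missing magnetization as "non-magnetic, no check needed" is an assumption made here.

```python
MAX_MAGNETIZATION_MISMATCH = 4.0  # threshold quoted in the text


def accept_energy(reconstructed, mag_slab=None, mag_adslab=None):
    """Return True if an adsorption or reaction energy should be kept.

    reconstructed: whether the surface reconstructed upon relaxation.
    mag_slab / mag_adslab: total magnetization of the empty slab and of the
    slab with adsorbate; None for non-spin-polarized calculations.
    """
    if reconstructed:
        return False
    if mag_slab is None or mag_adslab is None:
        # Assumption: no magnetization check for non-magnetic systems.
        return True
    return abs(mag_adslab - mag_slab) < MAX_MAGNETIZATION_MISMATCH


# Usage with arbitrary illustrative magnetization values.
print(accept_energy(False))               # non-magnetic, not reconstructed
print(accept_energy(True, 10.0, 10.5))    # reconstructed surface
print(accept_energy(False, 10.0, 15.0))   # magnetization mismatch too large
```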
To illustrate the validity of the data, we compare the computed lattice constants to those reported in published journal articles. We found excellent agreement between our results and previously computed lattice constants, which are presented in Tables 2, 3 and 4. In Table 5, we also compare previously computed adsorption energies to those reported in this article. We again see good agreement, with slight deviations that may be an artifact of differences in calculation setup and/or system size, i.e., pseudopotential, smearing scheme, number of layers and lateral size of the slab. For example, the differences in adsorption energies between this work and ref. 8, which is also available at https://www.catalysis-hub.org/publications/SchumannSelectivity2018, can be attributed to the use of a 4-layer (3 × 3) repeated surface slab model in that work, compared to the 3-layer (2 × 2) slab used in this study.

Usage Notes
The CatHub software module, which is available from the Zenodo repository 27, provides a Python API that is better suited for fetching large amounts of data from the Catalysis-Hub repository. A small script for obtaining pre-parsed adsorption energies in Python is provided below:
Note that each data entry is given as a 'node' in a list of 'edges', utilizing the graph-theory-based query language GraphQL (https://graphql.org/). Since the Catalysis-Hub repository contains several datasets from different publications, the "pubId = 'MamunHighT2019'" argument must be set in the script above in order to query only this dataset. The script above queries for entries with hollow-site adsorption of C (product) with respect to the relevant gas phase species (reactants), on surfaces containing both Mo and Ru. The reactant and product entries must be chosen (and matched) among the adsorbates and gas phase references in Table 1. A more specific query for adsorption site can be made by using the site names specified in Table 6.
Table 6. Names of sampled adsorption sites, where A and B refer to the choice of metals. The sites marked with '*' have not been sampled with the initial configuration shown in Fig. 1, but stem from deviation from the hexagonal surface structure for some of the L1₀ alloys or reorientation of the adsorbate into a subsurface site.
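The 'node' in a list of 'edges' layout mentioned above can be unpacked with a few lines of plain Python. The sketch below does not call the CatHub API; it works on a hand-written mock response. The outer keys "data" and "reactions", the field name "reactionEnergy" and all values are assumptions and placeholders chosen for illustration, not real entries from the repository.

```python
def nodes_from_response(response, collection="reactions"):
    """Flatten a GraphQL-style response into a plain list of node dicts.

    Expected (assumed) layout:
    {"data": {collection: {"edges": [{"node": {...}}, ...]}}}
    """
    edges = response.get("data", {}).get(collection, {}).get("edges", [])
    return [edge["node"] for edge in edges if "node" in edge]


# Hand-written mock response with placeholder values (NOT real data).
mock_response = {
    "data": {
        "reactions": {
            "edges": [
                {"node": {"pubId": "MamunHighT2019", "reactionEnergy": 0.0}},
                {"node": {"pubId": "OtherDataset", "reactionEnergy": 0.0}},
            ]
        }
    }
}

nodes = nodes_from_response(mock_response)
# Keep only entries that belong to this dataset.
own = [n for n in nodes if n["pubId"] == "MamunHighT2019"]
print(len(nodes), len(own))
```

Filtering on the publication identifier after the fact is shown only to make the role of pubId concrete; when querying the repository it is more economical to pass the identifier in the query itself, as the text describes.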
Furthermore, all the atomic structures, calculation results and parameters in the study can be accessed through the ASE database interface 22; the CatHub module features a convenient wrapper around the ASE db command line interface (CLI), which is used directly from a terminal. For example, the query:
cathub ase 'pub_id=MamunHighT2019,relaxed=1'
will return a list with the first 20 results (out of approximately 90,000) for the relaxed configurations in the study. The initial geometries can be queried by setting 'relaxed=0'. The atomic structures are labeled with several key-value pairs of metadata that can be used to query the dataset. For example:
cathub ase 'Pt,pub_id=MamunHighT2019,relaxed=1,reconstructed=0,SB_symbol=L10 -c formula,energy,adsorbate,site,site_type -L 100' --gui