Machine learning hydrogen adsorption on nanoclusters through structural descriptors

Catalytic activity of the hydrogen evolution reaction on nanoclusters depends on diverse adsorption site structures. Machine learning reduces the cost of modelling those sites with the aid of descriptors. We analysed the performance of the state-of-the-art structural descriptors Smooth Overlap of Atomic Positions, Many-Body Tensor Representation and Atom-Centered Symmetry Functions in predicting the hydrogen adsorption (free) energy on the surface of nanoclusters. The 2D-material molybdenum disulphide and the alloy copper-gold served as test systems. Potential energy scans of hydrogen on the cluster surfaces were conducted to compare the accuracy of the descriptors in kernel ridge regression. Using data sets of 91 molybdenum disulphide clusters and 24 copper-gold clusters, we found that the mean absolute error could be reduced by machine learning on different clusters simultaneously rather than separately. The adsorption energy was explained by the local descriptor Smooth Overlap of Atomic Positions; combining it with the global descriptor Many-Body Tensor Representation did not improve the overall accuracy. We concluded that fitting of potential energy surfaces could be reduced significantly by merging data from different nanoclusters.


INTRODUCTION
5][6][7] For example, gold clusters with a diameter of a few nanometres exhibit non-metallic properties due to quantum size effects. 80][11][12] These developments mean a tremendous combinatorial and structural space has opened up for rational catalyst design, where nanoscale experiments and computational screening can be used to optimize catalysts. 135][16] At the cathode of electrolytic water splitting into hydrogen and oxygen, the hydrogen evolution reaction (HER) takes place. As part of the process, the currently used expensive noble metals, especially platinum group metals (PGM), categorised as critical by the European Commission, 17 need to be replaced to make the production of hydrogen competitive with other energy storage technologies. Some bimetallic alloy nanoclusters, such as copper-titanium, 18 exhibit catalytic activity towards HER; thus binary combinations of metals are of high interest, particularly if the fraction of PGMs can be significantly reduced. 19 Beyond metals, one candidate to replace PGMs is MoS2 nanoclusters. Recent studies of single-layer MoS2 have shown that its electronic band structure can be fine-tuned at the nanoscale.
20][22] The configurational space offered by the wide variety of nanocluster materials, active sites and environmental conditions means that a conventional approach to catalyst optimization, using ab initio methods, is particularly challenging. ][25][26][27][28] In this work, we begin by considering the latest developments in descriptors for ML in materials science, as yet untested in nanocatalytic systems, and compare them in terms of accuracy and efficiency for characterizing a particular catalytic reaction, HER. This stands out as a relatively simple reaction with one intermediate state: adsorbed hydrogen on the catalyst surface. The rate of the reaction on a catalyst surface (denoted below as *) is determined by the hydrogen adsorption free energy ΔG_H of the elementary Volmer step:

$$\mathrm{H}^{+} + e^{-} + {*} \rightarrow \mathrm{H}^{*}$$

According to the Sabatier principle, hydrogen should bind neither too weakly nor too strongly. This general principle explains why ΔG_H can reasonably describe catalytic activity. Optimally, nanoclusters should have adsorption sites with ΔG_H ≈ 0 to be considered catalyst candidates. 29,30 Since this quantity is accessible by ab initio methods, directly from the adsorption energy of hydrogen, materials can be pre-screened computationally. Our approach is to build a large data set of hydrogen adsorption energies on a variety of nanoclusters, characterize this with appropriate structural descriptors, and then train a model to predict these energies for an arbitrary site based on its description.

Potential energy scan of sample clusters
As initial data sets, we started by mapping out the energy landscape of hydrogen adsorbed on the surface of one sample cluster for each system, MoS2 and AuCu.
The two nanoclusters were fully scanned with respect to the hydrogen position and are depicted in Fig. 1. Figure 1a shows a potential energy scan of a triangular-shaped sample MoS2 cluster with molybdenum-terminated edges (Fig. 1b). The cluster Au40Cu40-H had a flatter potential energy surface (Fig. 1c) than MoS2-H and no patterns were clearly apparent. On the other hand, MoS2-H had three distinct global minima at the edges where hydrogen bound to molybdenum. Since the cluster had a near-C3 symmetry, the local environments of the three minima were equivalent. When hydrogen was bound at corner sites, ΔE_H increased, while the highest-energy positions were observed on the surface sulphur atoms. Even though the C3 symmetry of the cluster was broken, ΔE_H remained similar at different edges and corners.

Machine learning on single clusters
The data sets MoS2(single) and Au40Cu40(single) contained 10,000 DFT-based ΔE_H single-point calculations of hydrogen positioned on the surface of the same cluster. We were interested in how many points were needed to predict the potential energy surface by interpolation. However, we did not conduct this interpolation in real space, but in feature space with kernel ridge regression (KRR). Thus, two points far away from each other in real space were close in feature space if the structures were similar. The feature space was spanned by the descriptors Atom-Centered Symmetry Functions (ACSF), Many-Body Tensor Representation (MBTR) or Smooth Overlap of Atomic Positions (SOAP). The goal was to reach an accuracy of 0.1 eV, which would allow us to make reasonable predictions of ΔE_H for an arbitrary system.
Figure 2a shows learning curves predicting ΔE_H at random positions around the triangular MoS2 cluster (Fig. 1b). In this example only, we included the results for the Coulomb Matrix (CM) descriptor in order to see how it fares with respect to adsorption energy prediction. As we transformed the global descriptor into a local CM, we observed an improved accuracy. This was due to the strong dependence of ΔE_H on the local environment. In general, the CM had a significantly higher mean absolute error (MAE), which might be due to its values ranging over many orders of magnitude, 31 see also Fig. 3. To do justice to CM, it is possible to increase the accuracy a bit by randomly sorting it, and thus smoothing the feature space. 32 ACSF performed comparably to ACSF_H and MBTR with a training set larger than 3000, and reached the threshold of 0.1 eV at about 900 training points. ACSF_H required only about 400 training points. SOAP and MBTR, on the other hand, had a MAE of 0.1 eV with only 300 training points, while SOAP also performed best at large training set sizes.
Figure 2b shows learning curves predicting ΔE_H at random positions around a medium-sized AuCu cluster. SOAP and MBTR again performed equally well, reaching the threshold of 0.1 eV with about 300 training points. Remarkably, ACSF_H reached 0.1 eV MAE with only 100 training points, but it exhibited a shallow learning curve. Although ACSF had a lower accuracy with small training set sizes, it overtook ACSF_H and MBTR with a training set larger than 3000. The low error with a large training set makes ACSF an excellent choice for molecular dynamics simulations where high accuracy is needed, for example simulations over many time steps where even small errors can propagate rapidly. A machine learning potential fitted to a large DFT data set provides energies close to the reference method. 31 SOAP showed a similarly steep learning curve compared to ACSF, but was offset to a lower accuracy at all training set sizes.
To summarise the results for both test systems, ACSF needed a large training set, but then it was as good as or even better than MBTR. This was due to the many symmetry functions used. If symmetry functions were eliminated by feature selection, the performance of ACSF at lower training set sizes would likely be better.
Indeed, a principal component analysis revealed that 130 components for both data sets could explain 99% of the variance. A sensible choice was to restrict the features to ACSF_H, the local version of ACSF. As expected, ACSF_H performed better than global ACSF for smaller training set sizes. Systematic feature selection using, e.g., mutual information could further reduce the MAE for small training set sizes. In the end, ACSF_H, MBTR and SOAP showed comparable MAE with smaller training set sizes.
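The number of components needed for a given share of the variance can be read from the cumulative explained-variance ratio. The following is a minimal sketch with NumPy, assuming a feature matrix of shape (samples, features); the helper name `n_components_for_variance` and the synthetic matrix are illustrative and are not taken from the original analysis.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.99):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold` (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                       # centre every feature
    s = np.linalg.svd(Xc, compute_uv=False)       # singular values, descending
    ratio = s**2 / np.sum(s**2)                   # explained-variance ratios
    cumulative = np.cumsum(ratio)
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, len(s))                         # guard against round-off

# Synthetic stand-in for a descriptor matrix: 50 features, 3 latent directions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))
print(n_components_for_variance(X_demo))  # at most 3 for this rank-3 matrix
```

Applied to an ACSF feature matrix, the same function would return the component count at the 99% level discussed above.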

Machine learning on multiple clusters
In the next step, we were interested in whether it was possible to interpolate between hydrogen adsorption sites on different clusters. Figure 2c shows the learning curve predicting ΔE_H at random positions around multiple MoS2 clusters. The descriptor SOAP reached a MAE of 0.1 eV with a training set size of 4000 (or 44 per cluster). It was estimated before that learning on the potential energy surface of a single cluster required 300 training points (MoS2(single)). This comparison clarified that learning on different clusters simultaneously was beneficial and that interpolation in compound space was possible with similar nanoclusters. MBTR reached a MAE as low as 0.13 eV with a training set size of 9000. The size of ACSF depended on the number of atoms in the system. Since the nanoclusters had different sizes and different compositions, it did not make sense to compare atoms other than hydrogen with each other. Hence, the local version of ACSF, ACSF_H, was taken. Similar to MBTR, it did not reach the threshold of 0.1 eV, but got as close as 0.11 eV with 9000 training points. Since SOAP (here a local descriptor) and MBTR (here a global descriptor) were designed in such a way that they might contain information which the other did not, we tried to combine both. In this case, however, the combined and equally weighted features of MBTR and SOAP did not improve the overall accuracy.
To verify that the results were independent of the system, we repeated the analysis with the data set AuCu(multi) containing 24 small copper-gold clusters with a fixed size of 13 atoms, but different compositions. A total of 420 hydrogen positions were randomly chosen on the surface of each cluster.
For AuCu(multi) (Fig. 2d), the comparison with single-cluster learning again confirmed that learning on different clusters was possible, which indicated that it should be possible on any nanocluster system. Furthermore, the fact that MBTR and SOAP combined did not improve the overall accuracy strongly suggests that the relevant information is contained around the adsorption site. Since SOAP outperformed the other descriptors even though it only contained information about the local environment around hydrogen, it became apparent that size effects of nanoclusters play a minor role (<0.1 eV in our model) in defining ΔE_H.
The log-log plots of Fig. 2 emphasize the empirical linear relationship log(MAE) = a − b log(N) for large N, in agreement with ref. 33. The linear relationship of our data sets started at around N = 500-2000, where different error decay rates became apparent. The global descriptor ACSF and SOAP displayed their superiority over ACSF_H and MBTR in this regime.
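The linear relationship on the log-log scale can be fitted by ordinary least squares to estimate the decay rate b and to extrapolate the training set size needed for a target error. A minimal sketch follows, assuming base-10 logarithms; the function names and the learning-curve values are synthetic illustrations, not data from this work.

```python
import numpy as np

def fit_learning_curve(n_train, mae):
    """Least-squares fit of log10(MAE) = a - b * log10(N); returns (a, b)."""
    slope, intercept = np.polyfit(np.log10(n_train), np.log10(mae), 1)
    return intercept, -slope

def points_needed(a, b, target_mae):
    """Training set size at which the fitted line reaches target_mae."""
    return 10 ** ((a - np.log10(target_mae)) / b)

# Synthetic learning curve (illustrative only): MAE = 2.0 * N**-0.4
n_train = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])
mae = 2.0 * n_train ** -0.4
a, b = fit_learning_curve(n_train, mae)
print(f"a = {a:.3f}, b = {b:.3f}, N(0.1 eV) = {points_needed(a, b, 0.1):.0f}")
```

Only the points in the linear regime (large N) should enter the fit; including the small-N part of a learning curve would bias the estimated decay rate.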
The purpose of the above data sets was to compare descriptors as well as to investigate the benefit of merging data from diverse structures. The generalization error of the best-performing descriptor should be estimated somewhat higher, though only slightly, since the test sets acted as validation sets to pick the best descriptor. An estimate of the generalization error will be presented for MoS2 in Fig. 5.
To visualise that similar local environments indeed do not give vastly different ΔE_H, 1000 data point pairs were selected with the lowest (dis)similarity d = ‖ΔDescriptor‖_2, the Euclidean norm of the difference between the descriptor vectors of a pair, with the descriptor being SOAP, MBTR or ACSF. In Fig. 3, a histogram plot shows pairs of local environments at a certain (dis)similarity d (taken from the data set MoS2(multi)) and the mean of their difference in energy Δ(ΔE_H). The mean difference in ΔE_H increased monotonically with d. As depicted by the increasing standard deviation, the more dissimilar the data points were, the wider the spread of ΔE_H, which indicated that the property changed smoothly in feature space. On average, MBTR had a slightly higher Δ(ΔE_H) than SOAP or ACSF. For comparison, CM exhibits a much less smooth feature space. In summary, SOAP outperformed MBTR and ACSF_H, and the information to explain adsorption energies is contained in the local environment. The property of interest, ΔE_H, changed smoothly in the feature space spanned by SOAP even though clusters of different sizes were present.
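The pair analysis can be sketched in a few lines: compute the Euclidean distance between descriptor vectors for all pairs, keep the closest pairs, and collect the mean and standard deviation of the energy difference per distance bin. This is a sketch under the assumption that d is the Euclidean norm of the descriptor difference; the function name, the toy features and the toy energies are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def binned_energy_differences(features, energies, n_pairs=1000, bin_width=0.1):
    """Return {bin index: (mean, std)} of |ΔE_i - ΔE_j| for the n_pairs
    most similar pairs, binned by descriptor distance d."""
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        d = math.dist(features[i], features[j])       # (dis)similarity d
        pairs.append((d, abs(energies[i] - energies[j])))
    pairs.sort(key=lambda p: p[0])                     # most similar first
    bins = {}
    for d, de in pairs[:n_pairs]:
        bins.setdefault(int(d // bin_width), []).append(de)
    return {b: (mean(v), pstdev(v)) for b, v in sorted(bins.items())}

# Toy one-dimensional "descriptors" and energies in eV (illustrative only).
toy_features = [[0.0], [0.05], [0.27]]
toy_energies = [0.0, 0.1, 0.5]
print(binned_energy_differences(toy_features, toy_energies))
```

Enumerating all pairs scales quadratically with the number of data points, so for data sets of around 10,000 points a subsample or a neighbour search would be the more practical choice.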
As depicted in Fig. 3, similar adsorption sites have similar ΔE_H. In order to achieve predictive power with as few training points as possible, clustered data points should be avoided; instead, training points should be selected such that they are approximately evenly spaced. The data set MoS2(single) is a good example to show that the accuracy depends on whether the training points are chosen randomly or are evenly distributed. Since significantly more data points were sampled on the sulphur surface of MoS2 than on its Mo-terminated edges, we suspected a biased data set. Descriptors can be used to select an evenly distributed data set with respect to feature space (spanned by the descriptor).
The greedy algorithm farthest point sampling (FPS) was used to get a set of the most dissimilar training points. 34 In Fig. 4, the MAE of random training and test sets was plotted and contrasted against FPS-sampled training and test sets. Using FPS improved the overall accuracy significantly at smaller training set sizes, but the effect soon became less apparent. The choice of the test set did not significantly affect the MAE. At a large enough training set size of 500-1500, selecting training points did not make a difference any more. However, when the training set size was in the range of interest (MAE around 0.1 eV), the difference was significant. We interpreted this result to mean that the randomly selected data set was biased and not evenly distributed. In order to reduce data set size, descriptors could be used to scan local environments and represent them evenly without bias towards more abundant structural patterns.
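Farthest point sampling can be written compactly: starting from one point, repeatedly add the point whose distance to the already selected set is largest. The sketch below assumes Euclidean distance in descriptor space and an arbitrary starting index; the function name and the toy data are illustrative.

```python
import math

def farthest_point_sampling(features, n_select, start=0):
    """Greedy selection of mutually dissimilar points (indices into features)."""
    n_select = min(n_select, len(features))
    selected = [start]
    # Distance of every point to its nearest already-selected point.
    min_dist = [math.dist(f, features[start]) for f in features]
    while len(selected) < n_select:
        nxt = max(range(len(features)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[nxt]))
    return selected

# Toy feature vectors: three clustered points and two outliers.
toy = [[0.0], [0.1], [0.2], [5.0], [10.0]]
print(farthest_point_sampling(toy, 3))
```

On the toy data the two outliers are picked before any further member of the dense cluster, which is the behaviour that counteracts a bias towards abundant structural patterns.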

Prediction of energy distribution of potential energy scan
Next, we investigated to what degree the potential energy surface of a single cluster can be inferred from a data set of multiple clusters. The data set MoS2(multi) was used as a training set to predict ΔE_H on the surface of the sample cluster MoS2(single), where a large test set was available. It should be mentioned that the sample MoS2 cluster was part of the data set MoS2(multi) with 110 data points.
Figure 5a shows the parity plot of ΔE_H of the test set MoS2(multi). Here, SOAP was chosen as the descriptor. An overall MAE of 0.13 eV was reached. In the sparsely sampled high-energy region, the error was significantly higher than average. In the sparsely sampled low-energy region, however, the error was much lower. Since stable adsorption sites will not be found in the high-energy region, the accuracy of predictions could further be improved by sampling more in the low-energy region. As can be seen from the dashed line, the errors introduced by predicting ΔE_H with descriptors were statistical and not systematic, since the predictions were centred around y = x. Figure 5a also shows the distribution of ΔE_H of the test set MoS2(multi). When focusing on global rather than local properties, the MAE does not have to be as low as 0.1 eV; rather, the energy distribution should be predicted accurately. The predicted energy distribution was in good agreement with the DFT energy distribution. Depending on the desired accuracy, smaller data sets than the ones we used might be enough to reliably predict the energy distribution.
Finally, we tested whether ΔG_H of local minima on the potential energy surface could be predicted accurately from single-point calculations only, going from ΔE_H to ΔG_H by adding a constant shift. Hydrogen on top of around 1000 MoS2 surface atoms of the data set MoS2(multi) was relaxed while the cluster itself was kept frozen. SOAP descriptors were created at the relaxed positions. The data set MoS2(multi) was used as a training set to predict ΔG_H of the relaxed hydrogen adsorption sites. Figure 5b shows the resulting parity plot. Again, an overall MAE of 0.12 eV was reached. However, it showed several outliers. This was probably due to the fact that local environments of the low-energy region were underrepresented in the data set MoS2(multi). Higher sampling in the region of interest could reduce the probability of outliers and further reduce the MAE.
Figure 5b also shows the distribution of ΔG_H of the sampled hydrogen adsorption sites. The predicted energy distribution was in good agreement with the DFT energy distribution. There seemed to be no systematic over- or under-estimation of the property. KRR failed to predict the lowest-energy adsorption sites below ΔG_H = −0.4 eV. This was again due to poor sampling in the low-energy region. Even though only random positions were taken on the surface of several nanoclusters, a combined database could extrapolate to the local minima with a satisfactory accuracy. A smarter selection of points in feature space spanned by a descriptor opens up a new way of finding adsorption sites on similar systems.

Fig. 4 The data set MoS2(single) was sampled randomly or with FPS in SOAP feature space, and the mean absolute error compared. Random training and testing is shown in red, whereas FPS-sampled training with FPS-sampled or random testing is shown in green or blue, respectively
To show the limitation of this method, we greedily extrapolated from the data set AuCu(multi), containing 13-atom clusters, to predict ΔE_H on the surface of the sample cluster Au40Cu40. Figure 6 shows a parity plot using the previously best performing descriptor SOAP.
SOAP showed a learning tendency with a slight under-estimation. However, the MAE of 0.25 eV was too high, especially due to the under-estimation of the high-energy regime. Also, it can be noted that the parity plot featured two clusters, which indicated that only part of the local environments of Au40Cu40 were represented in the training set.

DISCUSSION
We analysed the performance of state-of-the-art atomic structural descriptors (SOAP, MBTR and ACSF) when used to predict the hydrogen adsorption (free) energy on the surface of nanoclusters. As expected, we found that none of the descriptors, which had been designed for molecules and crystals, is optimized for nanoclusters. In general, we observed that learning on one cluster at a time required unnecessarily large training sets to achieve good accuracy; this can be improved by merging data from many different nanoclusters in the training set. Since SOAP performed significantly better, we deem it a good choice for adsorption energy predictions. Our data sets did not make it necessary to include global information, as could be seen upon the combination of SOAP and MBTR, so the local environment dominates the influence on the adsorption energy. It is, however, possible that a global addition improves the learning when, e.g., dopants or defects are added. Descriptor improvements might be possible by combining other descriptors, optimising the weighting functions or other parameters of MBTR and SOAP, or even by constructing a new descriptor encompassing the special structural features of nanoclusters like size, shape and surface morphology. Recently, a multi-scale SOAP kernel has been developed which could incorporate missing information while still retaining the local nature of the descriptor. 34 This new approach will be subject to future work. Nevertheless, given sufficient training, all descriptors except CM performed satisfactorily when used as features in KRR.
We identified a few shortcomings of ACSF, MBTR and SOAP with respect to the description of nanoclusters. SOAP in the implementation used here only considers the local environment of hydrogen within a certain cutoff. There are, however, global SOAP descriptors which take into account local environments of all atoms; their performance on nanoclusters will be investigated in the future. ACSF, in order to be size-consistent, was feature-selected to be a local descriptor, ACSF_H, and the accuracy improves slowly with increasing training set size. Better performance with smaller training set sizes could be achieved by feature-selecting symmetry functions. MBTR as a global, size-consistent descriptor could not exhibit its conceptual advantage over the local descriptors, because the local environment mostly determined ΔE_H.
Many interesting studies could build upon the presented results. In the future, we plan to make more complex databases where the compound space is enlarged by defects or dopants. Ternary metallic clusters, with increased compositional space, are particularly challenging for conventional ab initio approaches and could be systems of interest for ML optimization. In terms of the DFT data generation itself, by including information about local similarity encoded in the descriptors it should also be possible to reduce the number of relaxation steps needed to find the local minimum. In conclusion, our results demonstrate that the approach of predicting properties based on descriptors alleviates redundancy in a batch of similar nanocluster calculations: the near-symmetric structures with repeating patterns offer many similar local environments perfectly suited to descriptor methods.

Density functional theory calculations
All electronic calculations were performed with the CP2K package 35 at the density functional theory (DFT) level, where orbitals and electron density were represented by Gaussian and plane-wave (GPW) basis sets. The exchange-correlation energy was approximated using the spin-polarized GGA-functional by Perdew-Burke-Ernzerhof (PBE). 36 Short-ranged double-ζ valence plus polarization molecularly optimized basis sets (MOLOPT-SR-DZVP) 37 and norm-conserving Goedecker-Teter-Hutter (GTH) pseudopotentials [38][39][40] were assigned to all atom types. Van der Waals interactions were taken into account with the D3 method of Grimme et al. with Becke-Johnson damping (DFT-D3(BJ)). 41,42 The energy cutoff for the auxiliary PW basis was set to 550 Ry and the cutoff of the reference grid was set to 60 Ry. Atomic positions were optimised using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm until the maximum force component reached 0.02 eV/Å. A gap of at least 8 Å vacuum was added in all Cartesian directions of the simulation box. The crystal structures of bulk gold, copper and MoS2 obtained at this DFT level were in good agreement with experiments. In Supplementary Information, it is shown that the double-ζ basis set performs in good agreement with TZV2P.
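Regarding relaxed hydrogen structures, we calculated the Gibbs free energy of adsorbed hydrogen ΔG_H as

$$\Delta G_\mathrm{H} = E_\mathrm{Cluster+H} - E_\mathrm{Cluster} - \tfrac{1}{2} E_\mathrm{H_2} + E_\mathrm{BSSE} + \Delta E_\mathrm{ZPE} - T \Delta S_\mathrm{H},$$

where E_Cluster+H, E_Cluster and E_H2 denote the total energy of the cluster with adsorbed hydrogen, the solitary cluster and molecular hydrogen in the gas phase.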
The term E_BSSE corrected for basis-set-superposition error. The term ΔE_ZPE − TΔS_H was approximated by values from literature at standard conditions; in the case of MoS2, the zero-point energy minus the entropic term was estimated as 0.29 eV in ref. 43. For the system AuCu, ΔS_H was approximated by ½ ΔS⁰_H2, where ΔS⁰_H2 refers to the entropy of H2 in the gas phase at standard conditions, as in ref. 43. The zero-point energies of copper (0.17) and gold (0.14) from ref. 44 differed only a little and were averaged as an approximation, which resulted in ΔE_ZPE − (298 K) ΔS_H ≈ 0.22 eV. This approximation resulted in a constant shift in adsorption energy.
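Because the zero-point and entropy terms enter as a constant, converting a predicted adsorption energy into a free energy and screening for ΔG_H ≈ 0 takes only a few lines. The sketch below uses the two shifts quoted above; the function names, the tolerance of 0.1 eV and the example energies are assumptions made for illustration.

```python
# Constant shifts ΔE_ZPE − TΔS_H in eV, as quoted in the text.
SHIFT_EV = {"MoS2": 0.29, "AuCu": 0.22}

def free_energy(delta_e_h, system):
    """ΔG_H from ΔE_H by adding the system-specific constant shift."""
    return delta_e_h + SHIFT_EV[system]

def candidate_sites(delta_e_values, system, tolerance=0.1):
    """Indices of sites with |ΔG_H| <= tolerance (tolerance is an assumption)."""
    return [i for i, de in enumerate(delta_e_values)
            if abs(free_energy(de, system)) <= tolerance]

# Hypothetical adsorption energies in eV for three sites on a MoS2 cluster.
example_energies = [-0.30, 0.50, -0.25]
print(candidate_sites(example_energies, "MoS2"))
```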

Nanocluster data sets
We created several DFT data sets based on nanoclusters of the 2D-material MoS2 and the metal alloy AuCu. Two nanoclusters were fully scanned with respect to the hydrogen position: a triangular MoS2 cluster and a near-spherical Au40Cu40 cluster. The structures are depicted in Fig. 1. The single-cluster data sets, named hereafter MoS2(single) and Au40Cu40(single), comprised 10,000 single-point calculations of single hydrogen atoms adsorbed on the surface. Hydrogen was positioned randomly at a distance of 130-220 pm (1.3-2.2 Å) from the cluster, where the random points were at least 0.1 Å from each other. Furthermore, data sets containing hydrogen adsorbed on different nanoclusters were produced in a similar fashion. The small AuCu clusters contained 13 atoms, of which 4 to 9 were gold. We wanted to analyse clusters of the same size, but with different compositions. For each of those 24 clusters, we calculated 420 data points of adsorbed hydrogen. The combined data set, named hereafter AuCu(multi), had a size of around 10,000 points. Analogously, the data set MoS2(multi) comprised 91 different MoS2 nanoclusters, so that it also contained around 10,000 data points. MoS2 clusters of different size (ranging from 4 to 11 Mo atoms at the edge), shape and edge-termination were chosen based on ref. 22. In order to create clusters of different shapes, ranging from triangular to hexagonal, corners were capped, leaving behind 3 additional sulphur-terminated edges. First, one Mo atom was capped, then 3, then 6, until the cluster had a hexagonal shape. Different edge types were also present in the data set, with sulphur coverages of 0, 25, 50 and 100% equally represented. A few examples are shown in Fig. 7; further edge structures can be found in ref. 22.
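One way to generate such hydrogen positions is rejection sampling. The sketch below assumes that "distance from the cluster" means the distance to the nearest cluster atom, and it draws trial points uniformly from a padded bounding box; both choices, as well as the function name and the two-atom toy cluster, are assumptions and not the procedure documented for the original data sets.

```python
import math
import random

def sample_hydrogen_positions(atoms, n_points, d_min=1.3, d_max=2.2,
                              min_sep=0.1, seed=0, max_tries=100000):
    """Random points whose nearest-atom distance lies in [d_min, d_max] (Å)
    and which are at least min_sep (Å) apart from each other."""
    rng = random.Random(seed)
    lo = [min(a[k] for a in atoms) - d_max for k in range(3)]
    hi = [max(a[k] for a in atoms) + d_max for k in range(3)]
    points = []
    for _ in range(max_tries):
        if len(points) == n_points:
            break
        p = tuple(rng.uniform(lo[k], hi[k]) for k in range(3))
        nearest = min(math.dist(p, a) for a in atoms)
        if not d_min <= nearest <= d_max:
            continue                      # too close to or too far from the cluster
        if any(math.dist(p, q) < min_sep for q in points):
            continue                      # too close to an accepted point
        points.append(p)
    return points

# Toy "cluster" of two atoms (coordinates in Å, illustrative only).
toy_atoms = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
print(len(sample_hydrogen_positions(toy_atoms, 20)))
```

The pairwise separation check is quadratic in the number of accepted points; for 10,000 points per cluster a cell list or k-d tree would be the natural replacement.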

Structural descriptors
In general, with a large enough data set containing nanocluster structures, the location of the hydrogen adsorption site and their corresponding ΔG_H, it is fairly straightforward to develop a predictive model with the help of ML. Ab initio calculations require only atomic types and relative positions of atoms as input. Hence, Cartesian coordinate or Z-matrix formats contain all information needed to calculate the total energy of a nanocluster and then derive ΔG_H. Those formats, however, have a disadvantage when it comes to interpolation of data or ML. The same structure can be constructed in many different ways; as a result, similar structures might not be treated as similar by the ML algorithm, and discontinuities appear. ML in general requires the input data to be in compact form and in a smooth feature space.
Another structural representation (descriptor) is needed which fulfils several criteria, summarised here. 45 A good structural descriptor is invariant, unique, non-degenerate and continuous. Efforts to develop efficient descriptors in materials science have led to a family of approaches successfully applied to molecules and crystals. 46,47 In particular, we consider the following popular descriptors (a detailed description of each of the descriptors is available in Supplementary Information):
• CM is a global descriptor based on pairwise Coulomb repulsion of the nuclei. 48
• ACSF 49 - for each atom in a system, ACSF express distance and angular interactions with neighbour atoms in symmetry functions.
• MBTR 51 - MBTR is a global descriptor which groups interactions by atomic type and puts them into a tensor.
Descriptor hyper-parameters. The structural descriptors CM, ACSF, MBTR and SOAP have method-specific parameters which can be fitted to the investigated system. A few performance tests showed that the mean absolute error (MAE) was sensitive to a few of those hyper-parameters. The radial cutoff of the local CM was optimised to 6 Å. The rows and columns of the matrix were sorted with respect to the L2-norm. Regarding ACSF, only the radial cutoff R_c was optimised. For other parameters, all combinations of sensible values inspired by Behler 49 were used to construct symmetry functions. Table 1 shows the values used for the parameters ζ, κ, η, λ and R_s, which in combination formed symmetry functions from Supplementary Eqs. (S2)-(S5). ACSF_H denotes the symmetry functions with hydrogen as the centre atom.
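Sorting a Coulomb matrix by row norm makes it independent of the atom ordering. The sketch below uses the commonly cited Coulomb matrix definition (diagonal 0.5 Z^2.4, off-diagonal Z_i Z_j / |R_i − R_j|) as an assumption, since the exact form used in this work is given only in Supplementary Information; the radial cutoff of the local CM is not implemented, and the function name and toy geometry are illustrative.

```python
import numpy as np

def sorted_coulomb_matrix(Z, R):
    """Coulomb matrix with rows and columns sorted by descending row L2-norm.
    Z: nuclear charges, R: positions (any consistent length unit)."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)              # placeholder, diagonal is overwritten
    M = np.outer(Z, Z) / dist                # pairwise nuclear repulsion
    np.fill_diagonal(M, 0.5 * Z ** 2.4)      # assumed standard diagonal term
    order = np.argsort(-np.linalg.norm(M, axis=1))
    return M[order][:, order]

# Toy three-atom geometry (H, O, C); coordinates are illustrative only.
Z_toy = [1, 8, 6]
R_toy = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [0.0, 1.5, 0.3]]
print(np.round(sorted_coulomb_matrix(Z_toy, R_toy), 2))
```

If two rows have exactly the same norm, the order between them is ambiguous, which is one source of the discontinuities that motivate randomly sorted variants.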
The performance of MBTR depended on several hyper-parameters, namely the Gaussian broadening parameters σ(k2), σ(k3) as well as the decay exponent d. The other parameters, such as σ(k1) = 5 Å and the grid fineness n(k1) = 100, n(k2) = 900, n(k3) = 360, were kept constant for all data sets. SOAP can in principle be made global by matching local environments with each other, but we used it only locally in this work. The performance of the SOAP descriptor was to a small degree sensitive to the radial cutoff R_c. Other parameters, such as the highest angular contribution l_max = 9 and the highest radial contribution n_max = 10, were kept constant. The aforementioned descriptor parameters were scanned and evaluated on around 1000 data points, a subset of the training set. The optimal parameters are listed in Table 2.

Kernel ridge regression
For medium-sized data sets (1000-10,000 data points) kernel ridge regression (KRR) is a fast and accurate ML method. In ref. 52, KRR performed best with the descriptor HDAD (histograms of distances, angles and dihedrals) at predicting atomization energies; HDAD is conceptually similar to the descriptors we used, which supported our choice of KRR. Of the ML models in ref. 52, graph convolution neural networks were not applicable to the descriptors, hence random forest regression was the only other sensible choice. However, as shown in Supplementary Information, its performance is significantly worse than KRR in our case. In order to predict the properties of new data points, the descriptor features of the training set x are compressed into the kernel matrix K, with entries K_ij = K(x_i, x_j) for each pair of training points. The method benefits from a continuous feature space and a unique descriptor-property relation. It is worth mentioning that it works well even with large descriptor sizes and small training sets. The computational cost, however, scales with O(N³), which makes it computationally expensive or infeasible for large data sets (>10,000).
The calculated adsorption energies of the training sets were interpolated by kernel ridge regression using the radial basis function (RBF) kernel

$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^{2}\right).$$

Based on a comparison of different kernels in Supplementary Information, the RBF kernel performs on par with the SOAP-kernel. 50 The resulting kernel matrices were used to predict the (free) adsorption energies of the test sets. The exponent γ of the radial basis function and the regularization parameter α were optimised by fivefold cross-validation.
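The regression and the fivefold cross-validation can be sketched with NumPy alone. This is a minimal illustration, not the code used for the reported results: it assumes the squared Euclidean distance in the RBF kernel, uses `alpha` for the regularization strength, and fits a synthetic two-dimensional function; all function names and grid values are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(X, y, gamma, alpha):
    """Solve (K + alpha * I) c = y for the regression coefficients c."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + alpha * np.eye(len(X)), y)

def krr_predict(X_train, coef, X_new, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ coef

def cv_mae(X, y, gamma, alpha, n_folds=5, seed=0):
    """Mean absolute error averaged over n_folds cross-validation folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), n_folds)
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        coef = krr_fit(X[train], y[train], gamma, alpha)
        pred = krr_predict(X[train], coef, X[test], gamma)
        errors.append(np.mean(np.abs(pred - y[test])))
    return float(np.mean(errors))

def grid_search(X, y, gammas, alphas):
    """Return (best MAE, gamma, alpha) over the given grid."""
    return min((cv_mae(X, y, g, a), g, a) for g in gammas for a in alphas)

# Synthetic stand-in for descriptor features and energies.
rng = np.random.default_rng(1)
X_demo = rng.uniform(-1.0, 1.0, size=(200, 2))
y_demo = np.sin(3.0 * X_demo[:, 0]) + X_demo[:, 1] ** 2
print(grid_search(X_demo, y_demo, [0.1, 1.0, 10.0], [1e-6, 1e-3, 1e-1]))
```

The linear solve is the O(N³) step mentioned in the text, which is why this approach becomes expensive beyond roughly 10,000 training points.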
When the features of MBTR and SOAP were combined into a new descriptor, they were weighted within the kernel:

$$K(x, x') = \exp\left[-\gamma\left(\lVert x_\mathrm{MBTR} - x'_\mathrm{MBTR}\rVert^{2} + q\,\lVert x_\mathrm{SOAP} - x'_\mathrm{SOAP}\rVert^{2}\right)\right],$$

where $q = n_\mathrm{MBTR}/n_\mathrm{SOAP}$ is the quotient of the number of features in MBTR and SOAP. This accounted for different descriptor sizes and thus ensured equal weighting of the descriptors.
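The weighted combination can be expressed directly as a kernel function. The sketch assumes that γ multiplies both distance terms and that the distances are squared Euclidean norms; the function names and the tiny feature arrays are illustrative only.

```python
import numpy as np

def squared_distances(A, B):
    """Matrix of squared Euclidean distances between rows of A and rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff**2, axis=-1)

def combined_kernel(mbtr_a, soap_a, mbtr_b, soap_b, gamma):
    """RBF kernel on MBTR and SOAP features, with SOAP weighted by
    q = n_MBTR / n_SOAP to balance the different descriptor sizes."""
    q = mbtr_a.shape[1] / soap_a.shape[1]
    d2 = squared_distances(mbtr_a, mbtr_b) + q * squared_distances(soap_a, soap_b)
    return np.exp(-gamma * d2)

# Tiny hypothetical feature arrays: 4 MBTR features, 2 SOAP features.
mbtr_1, soap_1 = np.zeros((1, 4)), np.zeros((1, 2))
mbtr_2, soap_2 = np.array([[1.0, 0.0, 0.0, 0.0]]), np.array([[0.0, 1.0]])
print(combined_kernel(mbtr_1, soap_1, mbtr_2, soap_2, gamma=0.5))
```

Because the exponent is a sum, the combined kernel equals the product of an RBF kernel on MBTR and an RBF kernel on SOAP with a rescaled width.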

Fig. 1 a Hydrogen position scan on the surface of a triangular-shaped MoS2 cluster (b). c Hydrogen position scan on the surface of a Au40Cu40 cluster (d)

Figure 2d shows the learning curve predicting ΔE_H at random positions around multiple AuCu clusters. A MAE of around 0.11 eV was reached at 9000 training points with MBTR and ACSF_H. With SOAP, only 2000 training points or 80 per cluster were needed to achieve a MAE lower than 0.1 eV. It was estimated before that learning on the potential energy surface of a single copper-gold cluster required around 300 training points.

Fig. 3 Mean of data point pairs on the axes of Δ(ΔE_H) and (dis)similarity defined by d = ‖ΔDescriptor‖_2 within bins of size 0.1. The coloured area highlights the standard deviation in those bins. The data set MoS2(multi) was used to compare the descriptors CM (cyan, offset 1.0 eV), SOAP (red, offset 0.7 eV), MBTR (blue, offset 0.3 eV) and ACSF (green)

Fig. 6 Parity plot of predicted against calculated ΔE_H together with a histogram of predicted (red) and calculated (black) energy distributions. The data set of multiple clusters AuCu(multi) was used as a training set and the single-cluster data set Au40Cu40(single) was used as the displayed test set

The two fully scanned nanoclusters:
• a triangular MoS2 cluster with Mo-terminated edges
• a medium-sized near-spherical Au40Cu40 cluster

Criteria for a good structural descriptor:
• invariant with respect to rotation, translation and homo-nuclear permutation
• unique: there should be only one way to construct a descriptor for any given structure
• non-degenerate: no two sets of descriptor features are identical for structures with different relevant properties
• continuous in the spanned feature space

Fig. 7 Four example MoS2 clusters illustrate different sizes, shapes and edge-terminations: a small triangular cluster, b hexagonal cluster with a sulphur coverage of 50%, c triangular cluster with capped corners, terminated by 100% sulphur, and d triangular and Mo-terminated (sulphur coverage 0%)

In the kernel matrix K, x_1, …, x_N are feature vectors of N training points and K(x_i, x_j) is a symmetric positive semi-definite kernel function (e.g., Gaussian kernel). The property y of a new data point x_pred is predicted by inverting the kernel matrix, regularised by λ:

$$y(x_\mathrm{pred}) = k_\mathrm{pred}^{T} (K + \lambda I)^{-1} y_\mathrm{train}.$$

The vector y_train consists of the properties y_1, …, y_N of the training set. The kernel vector k_pred is defined as:

$$k_\mathrm{pred} = \left(K(x_1, x_\mathrm{pred}), \ldots, K(x_N, x_\mathrm{pred})\right)^{T}.$$

Table 1. List of parameters of ACSF

Table 2. Optimised descriptor hyper-parameters for different data