In recent years, two new techniques have emerged that have changed the way that atomistic simulations at electrochemical interfaces are approached. First, atomistic machine-learning methods have been developed1,2,3,4,5,6 that use machine-learning surrogate models to quickly emulate and predict the results of expensive electronic structure calculations. These methods offer numerous advantages, including fast evaluation times and linear scaling with system size, at least when the number of atomic elements does not change. This approach can allow systematic exploration at much larger length- and time-scales, while maintaining accuracies similar to those of electronic structure calculations. Crucial to their successful use is the intelligent choice of training data and the ability to estimate the uncertainty associated with each predicted quantity.

Second, grand-canonical electronic structure methodologies have been developed that allow electrochemical simulations to be run at constant simulated voltage7,8,9,10,11. This is accomplished by adding fractional electrons to, or removing them from, the simulation, while maintaining charge neutrality by various approaches that typically involve a screened countercharge. For example, in the solvated jellium method (SJM)7, the countercharge is placed in a jellium slab above both the slab and the explicit solvent, an implicit solvent is used to screen the otherwise large field, and a dipole correction ensures that the excess charge localizes on the correct side of the slab, as shown schematically in Fig. 1. Over the course of an elementary reaction, the number of electrons in a simulation changes in order to hold the work function of the simulated electrode constant. These calculations are thus electronically grand-canonical: they control the potential by allowing the number of electrons to be a free variable. These simulation approaches have allowed, for example, the calculation of reaction barriers at a systematic series of electrode potentials, leading to an understanding of the potential dependence of reaction barriers and free energy diagrams.

Fig. 1: Schematic of the Solvated Jellium Method.

The unit cell is charge-neutral with excess charge localizing on the metal surface, balanced by charge of opposite sign in the jellium. The total number of electrons is adjusted until the work function meets the target value.

The system sizes that are studied with electrochemical simulations are typically too small to fully capture the role of the environment in a particular reaction. Solvent molecules and ions at the interface and within the double-layer region can make significant contributions to the rates and mechanisms of electrochemical processes. In order to properly identify and characterize these contributions, the simulated system size must exceed the correlation length and time scales that emerge at the interface. These scales are vastly larger for solution-phase reactions than they are in the gas phase, often spanning nanometer length scales and nanosecond time scales12. Electronic structure-based simulations, which are well-suited for addressing gas-phase systems, thus tend to fall short in describing reactions at solid-liquid interfaces.

While atomistic machine-learning approaches can accelerate traditional, canonical electronic structure calculations, they do not natively operate in the electronically grand-canonical ensemble. That is, in typical atomistic learning approaches, the energy is considered to be a unique function of the nuclear positions. In the electronically grand-canonical ensemble, the energy is a function of both the atomic positions and the electrode potential. It is the purpose of this work to develop a practical approach to atomistic learning in the electronically grand-canonical ensemble.



In 2007, Behler and Parrinello1 proposed a machine-learning scheme to model the potential energy surface of atomistic systems, based on an ansatz in which the total system energy is decomposed into per-atom terms. In this scheme, the atomic coordinates are first transformed into feature vectors, known as atomic fingerprints or symmetry functions, which describe the local environment of each central atom in terms of its neighboring atoms. A neural network for each element takes the descriptor as input and returns the corresponding atomic energy as output, and the total energy is the sum of all atomic energies. All atoms of a given element share the same network, and the parameters of all networks are tuned simultaneously by minimizing the difference between the predicted and true energies of a training set of images. Once trained on appropriate data, such a model can reduce computational cost by orders of magnitude and scales linearly with the number of atoms being modeled (whereas density functional theory, DFT, typically scales cubically with the number of atoms, although efforts are underway to reduce this13). Recent work has shown that atomic energies localized by the Behler–Parrinello scheme agree well with those decomposed from DFT, in which the energy density is first decomposed into the kinetic contribution, the classical Coulomb energy, the exchange–correlation interactions, and the nonlocal pseudopotential contribution, and then integrated over all space to recover the DFT total energy14,15. In the Behler–Parrinello scheme and methods inspired by it, the energy is a unique function of the atomic positions. These schemes therefore cannot intrinsically describe electronically grand-canonical calculations, in which the energy is also a function of the electric potential of the system.
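To make the decomposition concrete, here is a minimal sketch in Python with NumPy. The per-element networks are small, randomly initialized tanh networks standing in for trained models, the fingerprints are random placeholders, and all function names are hypothetical; this is not AMP's implementation. It illustrates the two properties used above: atoms of one element share a network, and the total energy is a sum that does not depend on atom ordering.

```python
import numpy as np

def init_network(n_in, hidden, rng):
    """Random weights for a feed-forward network n_in -> hidden... -> 1 (placeholder for a trained model)."""
    sizes = [n_in, *hidden, 1]
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """tanh hidden layers, linear output; returns a scalar atomic energy."""
    h = x
    for W, b in weights[:-1]:
        h = np.tanh(h @ W + b)
    W, b = weights[-1]
    return (h @ W + b).item()

def total_energy(symbols, fingerprints, networks):
    """Behler-Parrinello ansatz: E = sum_i E_i, each E_i from the network of atom i's element."""
    return sum(mlp(fp, networks[s]) for s, fp in zip(symbols, fingerprints))

rng = np.random.default_rng(0)
networks = {"Au": init_network(4, (5, 5), rng), "H": init_network(4, (5, 5), rng)}
fingerprints = rng.normal(size=(3, 4))  # placeholder fingerprints for three atoms
E = total_energy(["Au", "Au", "H"], fingerprints, networks)
```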

Goedecker and colleagues introduced a machine-learning scheme that allows long-range charge transfer, in which a neural-network model predicts the per-atom electronegativity rather than the per-atom energy3. In their scheme, the total charge of the system is constrained through a Lagrange multiplier, and the charge distribution (composed of per-atom charges) is found by solving a linear system that minimizes the system energy, expressed through a charge–electronegativity expansion. This system energy is used as a training target to adjust the neural-network parameters that predict atomic electronegativities.
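The constrained minimization can be sketched as a single bordered linear solve. As simplifying assumptions, this sketch uses bare 1/r_ij Coulomb couplings in atomic units (Goedecker and colleagues use Gaussian charge distributions, which are not reproduced here) and places the hardness J_i on the diagonal; the function name is hypothetical. Setting the gradient of E(q) = Σχ_i q_i + ½Σ A_ij q_i q_j equal to the multiplier λ for every atom, together with Σq_i = Q, gives the system solved below.

```python
import numpy as np

def equilibrate_charges(chi, J, positions, total_charge):
    """Minimize sum_i chi_i q_i + 1/2 sum_ij A_ij q_i q_j subject to sum_i q_i = Q.

    A_ii = J_i (hardness), A_ij = 1 / r_ij (bare Coulomb, atomic units; simplified).
    Returns the per-atom charges and the Lagrange multiplier lambda.
    """
    pos = np.asarray(positions, dtype=float)
    n = len(chi)
    r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    A = np.zeros((n, n))
    off = ~np.eye(n, dtype=bool)
    A[off] = 1.0 / r[off]
    A[np.diag_indices(n)] = J
    # Bordered system: chi_i + sum_j A_ij q_j = lambda, and sum_i q_i = Q
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = A
    M[:n, n] = -1.0
    M[n, :n] = 1.0
    rhs = np.concatenate([-np.asarray(chi, dtype=float), [total_charge]])
    solution = np.linalg.solve(M, rhs)
    return solution[:n], solution[n]
```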

Inspired by these previous works, we introduce the following dual-learning scheme in the electronically grand-canonical ensemble.

Dual-learning scheme

We consider an image (that is, a single configuration of atoms at a specified potential) with N atoms, calculated in the electronically grand-canonical ensemble. Its potential is set to a value ϕ by varying the net number of electrons in the simulation, Ne, which we define relative to a neutral system. The net system charge, in units of the elementary charge, is then Q = −Ne. Since Q is an observable of an electronic structure calculation, it can be learned as a function of the atomic positions \(\{{\overrightarrow{R}}_{i}\}\) and the potential ϕ. Analogous to the Behler–Parrinello approach, in which the system energy is learned by decomposing it into per-atom energies that are each predicted by a regression model, we decompose the predicted system charge into per-atom charges,

$$\hat{Q}=\mathop{\sum }\limits_{i}^{N}{\hat{q}}_{i}\left(\{{\overrightarrow{R}}_{i}\},\phi \right)$$

where the per-atom charges \({\hat{q}}_{i}\) are predicted from the local atomic environment and the target potential. (Throughout this work, we use a hat, ^, to denote a predicted quantity.) We note that per-atom charges (\(q_i\)) are not a direct output of an electronic structure calculation, so in our scheme training proceeds by minimizing the difference between \(\hat{Q}\) and Q for each image of the training set within a combined loss function, described later. This requires a new feature vector that incorporates the potential, which we describe in the next section.

Now that we have a framework to predict the per-atom charges, \({\hat{q}}_{i}\), we set out to predict the image’s grand-canonical energy, \(\Omega \equiv E - N_{\mathrm{e}}\mu_{\mathrm{e}}\), where E is the canonical energy, \(N_{\mathrm{e}}\) is the net number of electrons (relative to a neutral system), and \(\mu_{\mathrm{e}}\) is the electron chemical potential, which in grand-canonical electronic structure treatments is typically taken to be equivalent to the work function16; we adopt this convention here. Ω is the energy returned by many grand-canonical calculators, such as the solvated jellium method7 implemented in GPAW17,18. In the grand-canonical ensemble, the forces are consistent with Ω, not E16; that is, they are the negative gradient of Ω. We use an ansatz in which \(\hat{E}\) is a sum of atomic terms \({\hat{E}}_{i}\), each obtained from a truncated charge expansion similar to that of Rappe and Goddard19. Converting to \(\hat{\Omega }\) gives

$$\begin{array}{ll}\hat{\Omega }=&\left(\mathop{\sum }\limits_{i=1}^{N}{\hat{E}}_{i}\right)-{N}_{{{{\rm{e}}}}}{\mu }_{{{{\rm{e}}}}}=\mathop{\sum }\limits_{i=1}^{N}({E}^{0,Z(i)}+{\hat{\chi }}_{i}{\hat{q}}_{i}+\frac{1}{2}{J}^{Z(i)}{\hat{q}}_{i}^{2}+\cdots \,)+\mathop{\sum }\limits_{i=1}^{N}{\hat{q}}_{i}{\mu }_{{{{\rm{e}}}}}\\ &=\mathop{\sum }\limits_{i=1}^{N}({E}^{0,Z(i)}+({\hat{\chi }}_{i}+{\mu }_{{{{\rm{e}}}}}){\hat{q}}_{i}+\frac{1}{2}{J}^{Z(i)}{\hat{q}}_{i}^{2}+\cdots \,)\end{array}$$

where Z(i) is the element of atom i, \(E^{0,Z(i)}\) is a reference energy for element Z(i), \({\hat{\chi }}_{i}\) is the electronegativity of atom i, and \(J^{Z(i)}\) is the hardness of element Z(i). A full derivation of Eq. (2) is given in Supplementary Note 1.1.1.

In our implementation, we take both \(E^{0,Z(i)}\) and \(J^{Z(i)}\) to be element-specific trainable parameters, while \({\hat{\chi }}_{i}\) is an environment-dependent per-atom electronegativity predicted by a machine-learning model, as proposed by Goedecker and colleagues3; that is,

$${\hat{\chi }}_{i}={\hat{\chi }}_{i}\left(\{{\overrightarrow{R}}_{i}\}\right)$$

The specific structure of the \({\hat{\chi }}_{i}\) machine-learning model is described in the section “Implementation”. The per-atom charges \({\hat{q}}_{i}\) in Eq. (2) are those predicted by the charge-learning scheme of Eq. (1). In this way, both per-atom charges and per-atom electronegativities are learned in a dual-learning scheme, yielding predictions of the per-image charge and the per-image energy.
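Once per-atom charges and electronegativities are available, Eq. (2) truncated at second order is a direct sum. The sketch below uses hypothetical function and parameter names, element-keyed dictionaries for \(E^{0,Z}\) and \(J^{Z}\), and placeholder numbers chosen only for illustration. It also checks the step used in the derivation: with \(Q = -N_{\mathrm{e}} = \sum_i \hat{q}_i\), the term \(-N_{\mathrm{e}}\mu_{\mathrm{e}}\) equals \(\sum_i \hat{q}_i \mu_{\mathrm{e}}\).

```python
import numpy as np

def grand_canonical_energy(symbols, chi, q, E0, J, mu_e):
    """Second-order truncation of Eq. (2):
    Omega = sum_i [E0_Z(i) + (chi_i + mu_e) q_i + 0.5 J_Z(i) q_i^2]."""
    E0_i = np.array([E0[s] for s in symbols], dtype=float)  # element reference energies
    J_i = np.array([J[s] for s in symbols], dtype=float)    # element hardnesses
    chi = np.asarray(chi, dtype=float)                     # per-atom electronegativities
    q = np.asarray(q, dtype=float)                         # per-atom charges
    return float(np.sum(E0_i + (chi + mu_e) * q + 0.5 * J_i * q**2))

# Hypothetical values, for illustration only
symbols = ["Au", "Au", "H"]
chi = [0.2, -0.1, 0.5]
q = [-0.01, -0.02, 0.005]
E0 = {"Au": -3.0, "H": -1.1}
J = {"Au": 2.0, "H": 5.0}
mu_e = 4.4

omega = grand_canonical_energy(symbols, chi, q, E0, J, mu_e)

# Same value via the canonical form Omega = sum_i E_i - N_e * mu_e, with N_e = -sum_i q_i
E_canonical = sum(E0[s] + c * qi + 0.5 * J[s] * qi**2 for s, c, qi in zip(symbols, chi, q))
N_e = -sum(q)
assert abs(omega - (E_canonical - N_e * mu_e)) < 1e-12
```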

One notable limitation of this scheme is that it neglects explicit interatomic Coulomb interactions in the learned per-image energy. Any contribution of these interactions to the system energy is therefore absorbed implicitly into the learned electronegativities, \({\hat{\chi }}_{i}\). Since \({\hat{\chi }}_{i}\) only sees atoms within a specified cutoff radius (typically 5–8 Å), this can account only for short-range charge-interaction effects. Hence, this scheme may be best suited to systems with minimal long-range charge-interaction effects, such as homogeneous systems composed of neutral, non-polar constituents. For all of the test systems in this work, we found excellent emulation using only the charge–electronegativity formalism, and thus did not develop this aspect further. If necessary, our scheme can be extended to include interatomic Coulomb interactions. For finite systems, \(\hat{\Omega }\) can be formulated to include the Coulomb energy, \(\sum_{i<j} q_i q_j / r_{ij}\), as described in the literature3,20,21,22,23. For periodically replicated systems, the Coulomb interactions could be formulated as an Ewald summation24. However, care would be needed to account for the net charge in each unit cell, which, if treated naïvely, would lead to infinite energies. Such an approach, while less practical, would ensure that the learned energetics are physically interpretable.
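For the finite-system extension, the pairwise Coulomb sum could be computed as below and added to \(\hat{\Omega}\). This NumPy sketch (hypothetical function name) works in eV with positions in Å and charges in units of e, using \(e^2/4\pi\varepsilon_0 \approx 14.3996\) eV Å. It is not valid for periodic cells, which need an Ewald-type treatment and handling of the net cell charge, and it is not part of the implementation described in this work.

```python
import numpy as np

COULOMB_EV_ANGSTROM = 14.399645  # e^2 / (4 pi eps0) in eV * Angstrom

def pairwise_coulomb_energy(q, positions):
    """Finite (non-periodic) Coulomb energy sum_{i<j} q_i q_j / r_ij, in eV."""
    q = np.asarray(q, dtype=float)
    pos = np.asarray(positions, dtype=float)
    i, j = np.triu_indices(len(q), k=1)          # each unordered pair once
    r = np.linalg.norm(pos[i] - pos[j], axis=1)
    return float(COULOMB_EV_ANGSTROM * np.sum(q[i] * q[j] / r))
```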

Charge-predicting fingerprints

As described in Eq. (1), per-atom charges are predicted as a function of the positions \(\{{\overrightarrow{R}}_{i}\}\) and the potential ϕ. Thus, any feature vector that enters a regression model (e.g., a neural network) must contain these two quantities. Standard feature vectors, such as the symmetry functions of Behler1, depend only on \(\{{\overrightarrow{R}}_{i}\}\). There are many conceivable ways to add potential dependence to such vectors. Here, we simply extend a standard feature vector, \(\{G_k\}\), as

$$\{{G}_{k}^{{{{\rm{charge}}}}}\}=\{{G}_{k}\}\cup \{{G}_{k}^{\phi }\}$$

where we give each of the fingerprints in \(\{{G}_{k}^{\phi }\}\) an exponential form:

$${G}_{i,k}^{\phi }=\left\{\begin{array}{ll}0,\quad &{{{\rm{if}}}}\,{z}_{i}\le {z}_{{{{\rm{surface}}}}}\\ \phi \,{e}^{-{\eta }_{k}\cdot ({z}_{i}-{z}_{{{{\rm{surface}}}}})},\quad &{{{\rm{if}}}}\,{z}_{i} > {z}_{{{{\rm{surface}}}}}\end{array}\right.$$

where ϕ is the electric potential (in practice, we use the work function of the simulation), \(z_i\) is the vertical position of the atom i being fingerprinted, aligned with the direction of the electrostatic field, and \(z_{\mathrm{surface}}\) marks the location of the electrode–electrolyte interface. In the current implementation, the surface is defined as the z coordinate of the top-most metal atom plus the van der Waals radius, with a custom-defined correction.

\(\eta_k\) is a hyper-parameter specific to each component of the fingerprint; in this work, we use \(\eta_k \in \{2, 1, 0.5, 0.1\}\).
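Eqs. (4) and (5) can be sketched in NumPy as follows. The function names, the `r_vdw` argument, and the omission of the custom surface correction are assumptions of this sketch; the η values default to those listed above.

```python
import numpy as np

def surface_height(z_metal, r_vdw):
    """z_surface = z of the top-most metal atom + van der Waals radius (custom correction omitted)."""
    return float(np.max(z_metal)) + r_vdw

def potential_fingerprints(z, z_surface, phi, etas=(2.0, 1.0, 0.5, 0.1)):
    """Eq. (5): zero at or below the surface, phi * exp(-eta_k (z_i - z_surface)) above it."""
    z = np.asarray(z, dtype=float)[:, None]        # (n_atoms, 1)
    eta = np.asarray(etas, dtype=float)[None, :]   # (1, n_eta)
    decayed = phi * np.exp(-eta * (z - z_surface))
    return np.where(z > z_surface, decayed, 0.0)   # (n_atoms, n_eta)

def charge_fingerprints(standard_fps, z, z_surface, phi, etas=(2.0, 1.0, 0.5, 0.1)):
    """Eq. (4): append the potential-dependent block to a standard fingerprint."""
    return np.hstack([np.asarray(standard_fps, dtype=float),
                      potential_fingerprints(z, z_surface, phi, etas)])
```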

In the electrochemical simulations we emulate, periodic boundary conditions along the x and y directions model the infinite slab and solvent. The electrode is modeled as a finite-depth slab, as shown in Fig. 1 for a three-layer slab, but this is a practical representation of a semi-infinite system. That is, the back side of this slab represents bulk-like atoms, and in properly constructed electrostatic simulations such as the SJ method7, excess charge must not localize at this surface. Accordingly, our charge fingerprints set the potential-dependent terms to zero for metal atoms below the surface, reflecting that the back side of the electrode is not affected by the external electrostatic field; this is also the case in reality, where the back side of the electrode is field-free7. For atoms above the metal surface, the fingerprints decay exponentially with the distance \(z_i - z_{\mathrm{surface}}\), mimicking the decay of the electric field described by the Gouy–Chapman model. These charge fingerprints can therefore capture interactions not only between atoms but also between atoms and the electric field. As such, we anticipate that this approach can be applied to any surface that can be approximated as a plane, including stepped surfaces; we show data for a stepped (211) surface in the Supplementary Figures. More elaborate surface geometries may require the fingerprints to be redeveloped in future implementations.

The first part of the charge-predicting fingerprint (that is, \({\{{G}_{k}\}}_{i}\) in Eq. (4)) can be any fingerprint that describes chemical environments. In this paper, we used Gaussian symmetry functions, including GII and GV types as suggested in ref. 2. Specific values of the fingerprint hyperparameters are contained in the section “Implementation”.
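For reference, a GII-type (radial) Gaussian symmetry function with a cosine cutoff can be sketched as follows. This assumes the form \(G^{\mathrm{II}}_i=\sum_{j} e^{-\eta r_{ij}^2/R_c^2} f_c(r_{ij})\), summed over neighbors of one element, which is how we understand AMP's Gaussian descriptor to make the dimensionless η values in the section “Implementation” meaningful; check the AMP source for its exact convention. Periodic images are ignored, and the function names are hypothetical.

```python
import numpy as np

def cosine_cutoff(r, rc):
    """Behler cosine cutoff: 0.5 (cos(pi r / rc) + 1) for r <= rc, else 0."""
    return np.where(r <= rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def g2(positions, symbols, i, neighbor_element, etas, rc=6.5):
    """Radial fingerprints of atom i from neighbors of one element (no periodic images)."""
    pos = np.asarray(positions, dtype=float)
    idx = [j for j, s in enumerate(symbols) if j != i and s == neighbor_element]
    if not idx:
        return np.zeros(len(etas))
    r = np.linalg.norm(pos[idx] - pos[i], axis=1)
    fc = cosine_cutoff(r, rc)
    return np.array([np.sum(np.exp(-eta * r**2 / rc**2) * fc) for eta in etas])
```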

These charge-predicting fingerprints bypass the problem that the number of electrons in a simulation is not known a priori but is instead determined over the course of an electronic structure calculation from the atomic positions and the work function.

Loss functions and forces

The scheme is thus split into two sub-regression models, as shown in Fig. 2, to predict the charge \(\hat{Q}\) and energy \(\hat{\Omega }\) of each image. The first model takes the atomic positions, \(\{{\overrightarrow{R}}_{i}\}\), and the electrode potential, ϕ, as inputs and predicts per-atom charges \(\{{\hat{q}}_{i}\}\). Their sum, \(\hat{Q}={\sum }_{i}{\hat{q}}_{i}\), is compared to the total charge Q of the parent calculation. The second model takes the atomic positions \(\{{\overrightarrow{R}}_{i}\}\) as input and predicts per-atom, environment-dependent electronegativities, \(\{{\hat{\chi }}_{i}\}\). These electronegativities are combined with the charges \(\{{\hat{q}}_{i}\}\) predicted by the first model to give the system energy \(\hat{\Omega }\) via Eq. (2), which is then compared to the Ω calculated by the parent calculator for each image. A combined loss function simultaneously optimizes the parameters of the two ML models, along with the element-specific parameters \(E^{0,Z}\) and \(J^{Z}\):

$$L=\frac{1}{2}\mathop{\sum }\limits_{j = 1}^{M}\left[{\left(\frac{{{{\Omega }}}_{j}}{{N}_{j}}-\frac{{\hat{\Omega }}_{j}}{{N}_{j}}\right)}^{2}+{\alpha }_{{{{\rm{charge}}}}}\cdot {\left(\frac{{Q}_{j}}{{N}_{j}}-\frac{{\hat{Q}}_{j}}{{N}_{j}}\right)}^{2}\right]$$

where the summation is over the M images in the training set and \(N_j\) is the number of atoms in image j. \(\alpha_{\mathrm{charge}}\) is a hyper-parameter that sets the relative importance of predicting charges versus energies. An alternative strategy is to first train the charge-predicting model and then use the resulting per-atom charges to train the energy-predicting model. In practice, we found both strategies to work well, but we implemented the combined loss function so that training could be accomplished in a single step.
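Eq. (6) is straightforward to evaluate once per-image predictions are available. A minimal NumPy sketch with hypothetical names, in which each \(\hat{Q}_j\) would be obtained by summing the predicted per-atom charges of image j as in Eq. (1):

```python
import numpy as np

def combined_loss(omega, omega_hat, Q, Q_hat, n_atoms, alpha_charge=1.0):
    """Eq. (6): 0.5 * sum_j [((Omega_j - Omega_hat_j)/N_j)^2 + alpha_charge ((Q_j - Q_hat_j)/N_j)^2]."""
    omega, omega_hat, Q, Q_hat, N = (np.asarray(a, dtype=float)
                                     for a in (omega, omega_hat, Q, Q_hat, n_atoms))
    energy_term = ((omega - omega_hat) / N) ** 2          # per-atom energy error, squared
    charge_term = alpha_charge * ((Q - Q_hat) / N) ** 2   # per-atom charge error, squared
    return 0.5 * float(np.sum(energy_term + charge_term))
```

In an actual training loop this loss would be minimized with respect to the network weights and the element parameters, typically with automatic differentiation; that part is omitted here.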

Fig. 2: Scheme for grand-canonical machine learning.

Atoms of the same element share the same neural-network structure and parameters. Two sub-networks, one predicting atomic charges from the geometry and electrode potential and the other predicting atomic electronegativities from the geometry alone, together yield the total energy and total charge at a fixed electrode potential.

The loss function above trains against the calculated charge and energy of each image. However, we can also fit to the atomic forces of a simulation, which provide much more information than the energy or charge alone. Since forces are the negative gradient of Ω in the grand-canonical formalism16, the force on each atom j follows from the chain rule:

$${\hat{{{{\rm{F}}}}}}_{j}=-\mathop{\sum }\limits_{i}^{N}\left({\hat{q}}_{i}\frac{\partial {\hat{\chi }}_{i}}{\partial {{{{\bf{G}}}}}_{i}(\{{{{\bf{R}}}}\})}\cdot \frac{\partial {{{{\bf{G}}}}}_{i}(\{{{{\bf{R}}}}\})}{\partial {{{{\bf{R}}}}}_{j}}+({\hat{\chi }}_{i}+{J}_{i}{\hat{q}}_{i}+{\mu }_{e})\frac{\partial {\hat{q}}_{i}}{\partial {{{{\bf{G}}}}}_{i}(\{{{{\bf{R}}}}\})}\cdot \frac{\partial {{{{\bf{G}}}}}_{i}(\{{{{\bf{R}}}}\})}{\partial {{{{\bf{R}}}}}_{j}}\right)$$
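Eq. (7) can be checked numerically. In the sketch below, simple analytic functions of the interatomic distances stand in for the two networks (they are hypothetical, not the GC-ML models), so the chain rule through the fingerprints G collapses into a direct dependence on R; \(\mu_{\mathrm{e}}\) is held fixed, as at constant potential. The analytic forces can then be compared against central finite differences of \(\hat{\Omega}\).

```python
import numpy as np

def pair_sum_and_jacobian(R, c):
    """s_i = sum_{k != i} exp(-c r_ik) and jac[i, j] = d s_i / d R_j (shape n x n x 3)."""
    n = len(R)
    d = R[:, None, :] - R[None, :, :]        # d[i, k] = R_i - R_k
    r = np.linalg.norm(d, axis=-1)
    np.fill_diagonal(r, np.inf)               # exclude self-interaction
    w = np.exp(-c * r)                        # zero on the diagonal
    g = (-c * w / r)[:, :, None] * d          # g[i, k] = d w_ik / d R_i
    jac = -g                                  # d w_ij / d R_j = -g[i, j] for j != i
    idx = np.arange(n)
    jac[idx, idx] += g.sum(axis=1)            # d s_i / d R_i = sum_k g[i, k]
    return w.sum(axis=1), jac

def toy_models(R, phi, a=0.5, b=0.02):
    """Hypothetical stand-ins for chi_i(R) and q_i(R, phi), with their Jacobians."""
    s1, ds1 = pair_sum_and_jacobian(R, 1.0)
    s2, ds2 = pair_sum_and_jacobian(R, 0.5)
    return a * s1, a * ds1, b * phi * s2, b * phi * ds2

def omega_and_forces(R, phi, E0, J, mu_e):
    chi, dchi, q, dq = toy_models(R, phi)
    omega = float(np.sum(E0 + (chi + mu_e) * q + 0.5 * J * q**2))
    # Eq. (7): F_j = -sum_i [q_i dchi_i/dR_j + (chi_i + J_i q_i + mu_e) dq_i/dR_j]
    forces = -(np.einsum("i,ijx->jx", q, dchi)
               + np.einsum("i,ijx->jx", chi + J * q + mu_e, dq))
    return omega, forces
```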

This can be added to our loss function, so in the case of force training it becomes:

$$L=\frac{1}{2}\mathop{\sum }\limits_{j=1}^{M}\left[{\left(\frac{{{{\Omega }}}_{j}}{{N}_{j}}-\frac{{\hat{\Omega }}_{j}}{{N}_{j}}\right)}^{2}+{\alpha }_{{{{\rm{charge}}}}}\cdot {\left(\frac{{Q}_{j}}{{N}_{j}}-\frac{{\hat{Q}}_{j}}{{N}_{j}}\right)}^{2}+\frac{{\alpha }_{{{{\rm{force}}}}}}{3{N}_{j}}\mathop{\sum }\limits_{k=1}^{3}\mathop{\sum }\limits_{i=1}^{{N}_{j}}{({F}_{ijk}-{\hat{F}}_{ijk})}^{2}\right]$$

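where \(F_{ijk}\) is the kth Cartesian component of the force on atom i of image j, and \(\alpha_{\mathrm{force}}\) is a hyper-parameter, analogous to \(\alpha_{\mathrm{charge}}\), that weights the force term.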
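Because Eq. (8) only adds a per-image force term inside the bracket of Eq. (6), that term can be computed separately and added to the energy-and-charge loss. A minimal sketch with a hypothetical function name, taking forces as one \((N_j, 3)\) array per image so that images may differ in size, and showing the \(1/(3N_j)\) normalization:

```python
import numpy as np

def force_loss_contribution(forces, forces_hat, alpha_force=1.0):
    """Force part of Eq. (8): 0.5 * sum_j alpha_force / (3 N_j) * sum_{i,k} (F_ijk - F_hat_ijk)^2."""
    total = 0.0
    for F, F_hat in zip(forces, forces_hat):
        F = np.asarray(F, dtype=float)          # (N_j, 3) reference forces
        F_hat = np.asarray(F_hat, dtype=float)  # (N_j, 3) predicted forces
        n_atoms = F.shape[0]
        total += alpha_force / (3 * n_atoms) * float(np.sum((F - F_hat) ** 2))
    return 0.5 * total
```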


Implementation

We implemented this scheme in our open-source machine-learning software AMP, the Atomistic Machine-Learning Package25,26. An example script demonstrating the method is provided in the Supporting Information.

The machine-learning models of Eqs. (1) and (3) can, in principle, be any regression model; for generality and convenience, we implemented both as basic neural networks in the current work. All neural networks were trained with the same structure and hyper-parameter settings. The machine-learning descriptors were constructed following the suggestions of Behler and colleagues1,2, with a cutoff radius of 6.5 Å and a total of 36 symmetry functions. The η values were set to {0.005, 4, 20, 80} for the GII-type symmetry functions, and η = 0.005, ζ = {1, 4}, and γ = {+1, −1} were chosen for the GV-type symmetry functions. The set of \(\{\eta_k\}\) used in the charge-predicting fingerprints was {0.1, 0.2, 0.5, 2}. The hidden-layer structures were (20, 10) and (5, 5) for the charge and electronegativity networks, respectively. The convergence criteria were a charge root-mean-square error (RMSE) of 5 × 10⁻⁵ electrons per atom and an energy RMSE of 0.2 meV per atom.
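The count of 36 symmetry functions can be reproduced under assumptions not stated in the text: that the elements present are Au, O, and H (gold slab, water, protons), that each GII function is resolved by neighbor element, and that each GV function is resolved by an unordered pair of neighbor elements. A short enumeration in Python:

```python
from itertools import combinations_with_replacement, product

elements = ["Au", "O", "H"]                            # assumed element set
g2_etas = [0.005, 4, 20, 80]                           # GII-type eta values
g5_params = list(product([0.005], [1, 4], [+1, -1]))   # GV-type (eta, zeta, gamma)

# GII: one function per (neighbor element, eta)
g2_functions = [(el, eta) for el in elements for eta in g2_etas]
# GV: one function per (unordered neighbor-element pair, parameter set)
g5_functions = [(pair, p) for pair in combinations_with_replacement(elements, 2)
                for p in g5_params]

print(len(g2_functions), len(g5_functions), len(g2_functions) + len(g5_functions))
# 12 24 36
```

Under the same assumptions, appending the four potential-dependent components of Eq. (4) would give a 40-dimensional input for the charge network.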

Training and testing data

The bulk of the testing data reported in this work used a 2 × 3 × 3 gold slab electrode that is periodic in the lateral (x, y) directions, with the z direction normal to the electrode surface. Explicit water molecules were present near the surface (with implicit water above them), and extra protons or molecules were present in some simulations, as described in the results. The bottom layer of the Au slab was fixed during structure optimizations to represent the bulk of the electrode. All grand-canonical DFT calculations were performed with the solvated jellium method (SJM)7 in GPAW17,18. A Monkhorst–Pack k-point grid of 4 × 6 × 1 was employed, and PBE27 was used as the exchange–correlation functional. When structural optimizations were performed, force criteria of 0.03 eV Å⁻¹ for local optimizations and 0.05 eV Å⁻¹ for nudged elastic band (NEB)28,29 optimizations were used.

The model was fit with a combination of images from both Volmer and Heyrovsky reactions. The convergence criterion for all systems in energy training was 0.2 meV RMSE per atom. All images in the training and testing sets were chosen randomly.


Here, we demonstrate the application of the grand-canonical machine-learning scheme to both replicate and predict the results of DFT calculations of electrified surfaces. We first report the model’s performance on both training and testing data, and combine it with a bootstrap ensemble method4 to assess the usefulness of its uncertainty predictions. We assess the model’s ability to replicate both the energy and the charge of these simulations using the energy/charge-predicting version of the model; that is, we did not train to forces in this first test.

Afterward, we test the scheme in a real-world application: the acceleration of nudged elastic band (NEB) calculations for saddle-point searches30 in electrochemical reactions at fixed electrode potentials. In this application, we employed the model with force training.

Finally, we test the ability of the method to extrapolate to larger-sized systems: specifically, we look at the ability of the model trained on fewer layers of water to predict energies and charges for more layers of water.

Accuracy of predictions, and uncertainty bounds

We tested two systems for both the accuracy of the predictions and the reliability of the uncertainty bounds produced by a bootstrapping technique. The training and testing images were randomly chosen from DFT-calculated NEB iterations of the Volmer and Heyrovsky steps of the hydrogen evolution reaction (shown in Fig. 3), respectively:

$${{{{\rm{H}}}}}^{+}+{{{{\rm{e}}}}}^{-}+{}{* }\to {{{{\rm{H}}}}}{* }$$
$${{{{\rm{H}}}}}^{+}+{{{{\rm{e}}}}}^{-}+{{{{\rm{H}}}}}{* }\to {{{{\rm{H}}}}}_{2}+{}{* }$$
Fig. 3: Reaction geometries.

Atomistic snapshots of the Volmer (right) and Heyrovsky (left) reactions.

Note that we chose the Heyrovsky reaction as the second reaction instead of the Tafel reaction (2H* → H₂ + 2*), because its electron transfer makes it a greater test of constant-potential machine-learning methods. In the Heyrovsky step in particular, the electron transfers rapidly near the saddle point even though the geometry changes only slightly; the steep barrier also leads to large differences in potential energy for small geometric changes. Mixing Volmer and Heyrovsky processes also makes the training images more diverse.

The DFT reference data were calculated with the solvated jellium method7 as described earlier. The calculations were run on a Au fcc(111) surface at absolute electrode potentials ranging from 3.8 V to 4.6 V (about −0.6 V to 0.2 V on the standard hydrogen electrode (SHE) scale31) in 0.2 V increments. Sixty percent of the images were chosen randomly for training, with the rest reserved for testing.
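The conversion to the SHE scale and the random 60/40 split can be sketched in Python. The absolute SHE potential of 4.4 V is an assumption chosen to match the text (4.4 eV corresponds to about 0 V vs SHE); Trasatti’s commonly cited value is 4.44 V. Function names are hypothetical.

```python
import random

ABSOLUTE_SHE_V = 4.4  # assumed absolute SHE potential, in V

def work_function_to_she(work_function_eV):
    """Absolute potential (V) = work function (eV) / e, shifted to the SHE scale."""
    return work_function_eV - ABSOLUTE_SHE_V

def split_images(images, train_fraction=0.6, seed=0):
    """Random train/test split (60/40 by default), reproducible via the seed."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

work_functions = [3.8, 4.0, 4.2, 4.4, 4.6]
print([round(work_function_to_she(w), 2) for w in work_functions])
# [-0.6, -0.4, -0.2, 0.0, 0.2]
```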

Figure 4 compares the learned predictions to the electronic structure calculations; the top-left panel shows an excellent fit of the energies (\(\Omega \equiv E - N_{\mathrm{e}}\phi\)) for both the training and testing images. The root-mean-square error (RMSE) is on the order of 0.1 meV/atom, which for a 50-atom system means the model replicates and predicts the GC-DFT results to about 0.005 eV. For these systems, the precision of the learned representation (~10⁻³ eV) is well within the accuracy of typical DFT calculations (~10⁻¹ eV)32. Charges (that is, the number of excess electrons) were replicated to an RMSE of 5 × 10⁻⁵ electrons per atom, as shown in the bottom-left panel. The charge RMSEs for the testing sets were in the range of 5 × 10⁻⁵ to 7 × 10⁻⁵ electrons.

Fig. 4: Parity and uncertainty plots.

The left panels show the training and testing RMSE of the grand-canonical energy per atom (top) and the number of excess surface electrons per atom (bottom) for the HER reactions. The energy parity plot is referenced to the minimum energy per atom for plotting purposes. The right panels plot the ensemble half-spread (a predictor of uncertainty) versus the median error of the predictions from the bootstrap ensembles for the HER reactions, for the grand-canonical energy per atom (top) and the number of electrons per atom (bottom).

The convergence criterion for all systems in energy training was 0.2 meV RMSE per atom, resulting in testing-set RMSEs of 0.25 to 0.50 meV atom⁻¹. All images in the training and testing sets were chosen randomly, and the training sets contained around 1500 images. We emphasize that the training and testing sets were randomly drawn from saddle-point searches for both the Volmer and Heyrovsky reactions over a wide range of work functions: 3.8, 4.0, 4.2, 4.4, and 4.6 eV (roughly −0.6 to 0.2 V vs SHE). Notably, a single model could be trained to fit all of these potentials, and predictions could be made at any potential.

To assess the generality of the method across metal surfaces, we also tested Pt, Cu, and Ag slabs; the results are similar and are reported in the Supporting Information.

In previous works with (canonical) atomistic learning4,33, a bootstrap ensemble was shown to give a good estimate of the uncertainty inherent in predictions. Here, we test whether this bootstrap-ensemble approach can also bracket the prediction uncertainty in the electronically grand-canonical ensemble. The right column of Fig. 4 compares the half-spread (one-half the spread between the 5th and 95th percentile values of the ensemble predictions) with the prediction error for these two systems; we expect the majority of points to lie below the parity line. This is what we observe for both the training and the test data, and for predictions of both charge and energy. Thus, the bootstrap approach appears to continue to work well in the grand-canonical framework.
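A NumPy sketch of the two ingredients, with hypothetical names: bootstrap resampling of the training indices (each ensemble member would be trained on one resample), and the half-spread and error for each image. Here the error is taken as the absolute deviation of the ensemble median from the reference value, which is our reading of the Fig. 4 caption rather than a confirmed definition.

```python
import numpy as np

def bootstrap_indices(n_train, n_models, seed=0):
    """One resample (with replacement) of the training-set indices per ensemble member."""
    rng = np.random.default_rng(seed)
    return [rng.integers(0, n_train, size=n_train) for _ in range(n_models)]

def half_spread_and_error(ensemble_predictions, truth):
    """ensemble_predictions has shape (n_models, n_images)."""
    p5, p50, p95 = np.percentile(ensemble_predictions, [5, 50, 95], axis=0)
    half_spread = 0.5 * (p95 - p5)               # uncertainty estimate per image
    error = np.abs(p50 - np.asarray(truth))      # deviation of the ensemble median
    return half_spread, error

def fraction_within(half_spread, error):
    """Fraction of images whose error is bracketed by the half-spread."""
    return float(np.mean(error <= half_spread))
```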

For these systems, training the ML models took roughly 1/100 (with force training) and 1/800 (without force training) of the CPU time of the parent DFT calculations. The resources required to evaluate a trained model were negligible. Like most atomistic machine-learning approaches, the evaluation time increases approximately linearly with system size, in contrast to DFT, which scales approximately cubically.

Saddle-point searches

The search for reaction barriers—first-order saddle points on the potential energy surface—is one of the most computationally demanding tasks in routine catalysis studies, and the ability to do so at a constant potential is one of the key advantages of electronically grand-canonical codes34,35. We therefore expect one of the most common and demanding uses of a grand-canonical learning approach will be in saddle-point searches.

We first describe our best practices for searching for electrochemical reaction barriers efficiently without machine learning, so that we can fairly assess the ability of machine learning to further speed up such calculations. Our goal is to find the reaction barrier (that is, the peak of the minimum-energy pathway, MEP) for a single reaction over a range of potentials, for example the proton-deposition (or Volmer) reaction on gold. We begin with a single potential; in this example, a work function of 4.4 eV (~0 V vs SHE). Our initial guess of the minimum-energy path consists of a linear interpolation of all atomic positions between the initial and final states. We run the simulation with the DyNEB36 approach, which is particularly efficient for serial NEB calculations. Once this converges, we use the resulting MEP as the initial guess for DyNEB runs at the neighboring potentials of 4.2 and 4.6 eV; the good initial guess drastically reduces the number of NEB iterations required. We then use the 4.2 and 4.6 eV MEPs as initial guesses for the 4.0 and 4.8 eV runs, respectively. The number of band force calls in this approach is shown in Fig. 6: while the first potential takes 141 iterations, each subsequent potential takes only 5–9 iterations.
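The seeding order can be written as a small scheduling rule: each new potential starts from the already-converged MEP at the nearest potential, working outward from 4.4 eV. A Python sketch with a hypothetical function name; potentials are rounded in the sort keys to avoid floating-point ties, and the DyNEB runs themselves are not shown.

```python
def cascade_schedule(start, potentials):
    """Return (target, source) pairs: each potential is seeded from the nearest converged one."""
    done = [start]
    schedule = [(start, None)]  # the first potential starts from a linear interpolation
    remaining = [p for p in potentials if p != start]
    for target in sorted(remaining, key=lambda p: (round(abs(p - start), 6), p)):
        source = min(done, key=lambda p: (round(abs(p - target), 6), p))
        schedule.append((target, source))
        done.append(target)
    return schedule

print(cascade_schedule(4.4, [4.0, 4.2, 4.4, 4.6, 4.8]))
# [(4.4, None), (4.2, 4.4), (4.6, 4.4), (4.0, 4.2), (4.8, 4.6)]
```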

To test the ability of atomistic learning to accelerate these grand-canonical barrier searches, we employ a simple method we previously published30 for accelerating saddle-point searches in canonical calculations. We briefly describe the scheme here; full details are in the reference. We start with an initial guess of the MEP, generated in the same way as in our conventional approach (either a linear interpolation or a converged NEB at a nearby potential). We calculate each of these images in SJM-DFT; these ~7 images form a minimal training set on which we train a GC-ML calculator. We then use the trained model as a surrogate calculator to run an NEB calculation to convergence, and we validate the predicted MEP by calculating each image in SJM-DFT. If the SJM-calculated forces indicate that the NEB is converged, the algorithm terminates. If not, we add these new images to the training set, retrain the calculator, and repeat until convergence.
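The loop just described can be sketched with the expensive, problem-specific pieces passed in as callables. All names are hypothetical placeholders (for example, `dft_calculate` would wrap an SJM-DFT single point and `relax_band` an NEB run on the surrogate); the sketch only fixes the control flow and counts DFT band evaluations. Passing images from a neighboring potential as `training_data` corresponds to the reuse strategy described in the following paragraph.

```python
def ml_accelerated_neb(initial_band, dft_calculate, train_surrogate, relax_band,
                       is_converged, training_data=None, max_rounds=20):
    """Surrogate-accelerated NEB loop (hypothetical interface).

    dft_calculate(image) -> labeled image (e.g. energy, charge, forces from DFT)
    train_surrogate(data) -> surrogate model
    relax_band(model, band) -> band relaxed to convergence on the surrogate
    is_converged(labeled_band) -> True if the DFT forces meet the NEB tolerance
    """
    training = list(training_data or [])
    band = list(initial_band)
    training.extend(dft_calculate(image) for image in band)  # minimal initial training set
    n_dft_bands = 1
    for _ in range(max_rounds):
        model = train_surrogate(training)
        band = relax_band(model, band)                   # NEB on the cheap surrogate
        labeled = [dft_calculate(image) for image in band]  # validate in DFT
        n_dft_bands += 1
        if is_converged(labeled):
            return band, training, n_dft_bands
        training.extend(labeled)                         # add new images and retrain
    raise RuntimeError("NEB did not converge within max_rounds")
```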

To fairly assess the impact of machine learning, we adopt a similar strategy in the machine-learning scheme. We first converge the simulation at a work function of 4.4 eV, and use this converged MEP as the initial guess for the neighboring potentials, and so on. We further use the previous SJM-DFT images (for example, from 4.4 eV) as additional training data for the subsequent potentials; this is possible as our GC-ML approach is valid across differing potentials.

Figure 5 compares the output of the ML-assisted scheme with the DFT-only scheme, in both energy and charge coordinates. We see that the ML-assisted scheme provides an excellent prediction of the MEPs on both metrics. We note that a slight deviation can be seen between the ML and pure-DFT approaches at some potentials. Within the tolerance of DFT-calculated NEBs (0.05 eV/Å), both are converged, so the difference is purely within the tolerance of the NEB method; that is, the machine-learning calculations are as valid as the DFT-alone calculations. From Fig. 4, we see that the GC-ML replicates the DFT results to within about 0.03 eV, much less than the difference in NEB tolerances.

Fig. 5: Barrier-search acceleration.

Comparison of reaction pathway (left) and charge transfer (right) along the band between pure DFT calculations and ML-assisted NEB for potentials (work functions) from 4.0 V to 4.8 V (from bottom to top).

The computational savings are shown in Fig. 6. We see very large savings at the first potential (4.4 eV), where the system starts far from the optimal reaction path. Since this potential accounts for the bulk of the computational effort, it is the most important place for savings, and here the number of DFT-calculated band force calls drops from 141 to 15. The reduction is much less dramatic for the subsequent potentials, largely because the initial guess for these bands is already very good. On average over these subsequent potentials, the standard DFT approach takes 6.75 band force calls per potential, while the ML-assisted approach is slightly faster at 5.5.
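
The savings reduce to simple arithmetic, sketched below. The per-potential counts come from the text; the number of subsequent potentials (eight) is a hypothetical value chosen only for illustration, since this section does not state it.

```python
def total_band_calls(first: float, mean_subsequent: float, n_subsequent: int) -> float:
    """Total DFT band force calls: the first potential plus n_subsequent others."""
    return first + mean_subsequent * n_subsequent


# Counts quoted in the text for the first potential (4.4 eV).
dft_first, ml_first = 141, 15
# Per-potential averages quoted in the text for the subsequent potentials.
dft_mean, ml_mean = 6.75, 5.5

first_speedup = dft_first / ml_first  # about 9.4x fewer DFT band calls

# Hypothetical: eight further potentials (not stated in the text).
n_other = 8
dft_total = total_band_calls(dft_first, dft_mean, n_other)  # 141 + 54 = 195
ml_total = total_band_calls(ml_first, ml_mean, n_other)     # 15 + 44 = 59
overall_speedup = dft_total / ml_total                      # about 3.3x
```

Under this assumption the first potential dominates both totals, which is why the savings there matter most.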

Fig. 6: Barrier-search acceleration.

Comparison of the number of DFT band force calls required by DFT-only and ML-assisted NEB simulations.

Charge partitioning

Although the partitioning of charge among atoms in an electronic structure calculation is arbitrary (that is, it requires assumptions about how to assign electron density to each atom), it is nevertheless useful to check whether the ML and ab initio approaches assign charges in a similar manner. In the training scheme, we only require that the total charge, or number of excess electrons, of a system match the training data. Here, we examine whether the charge partitioning in the ML scheme is reasonable.

To do so, we partition the charge into two regions: (1) the slab, or metal atoms, and (2) the water layer, including the excess proton if present. In the case of the ab initio data, we use a sum of per-atom Bader charges37, although other partitioning schemes would also be valid. For the machine-learned implementation, per-atom charges are predicted directly, which we sum for each subsystem. Since atomic charge-partitioning schemes are inherently arbitrary, here we focus solely on the differences in per-atom charges as the potential is changed, which we expect to be less sensitive to the particular choice of charge-partitioning scheme. Examination of these charge differences can indicate if the machine-learning model and the ab initio model are localizing the excess electrons into similar locations within the simulation.
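
The subsystem bookkeeping can be sketched as follows, assuming per-atom electron populations (Bader-derived for DFT, predicted directly by the ML model) and a per-atom label marking each atom as slab or water. The helper names and toy numbers are hypothetical. Subtracting the totals at the reference work function is intended to cancel much of the scheme-dependent offset.

```python
from typing import Dict, Mapping, Sequence


def subsystem_electrons(electrons: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Sum per-atom electron counts into subsystems (e.g. 'slab' and 'water')."""
    if len(electrons) != len(labels):
        raise ValueError("need one label per atom")
    totals: Dict[str, float] = {}
    for n_e, label in zip(electrons, labels):
        totals[label] = totals.get(label, 0.0) + n_e
    return totals


def change_vs_reference(by_potential: Mapping[float, Mapping[str, float]],
                        reference: float) -> Dict[float, Dict[str, float]]:
    """Delta N_e of each subsystem relative to the reference work function."""
    ref = by_potential[reference]
    return {phi: {name: totals[name] - ref[name] for name in ref}
            for phi, totals in by_potential.items()}


# Toy four-atom system (hypothetical numbers): two slab atoms, two water atoms.
labels = ["slab", "slab", "water", "water"]
totals = {
    4.4: subsystem_electrons([1.0, 1.0, 0.5, 0.25], labels),
    4.0: subsystem_electrons([1.25, 1.25, 0.5, 0.25], labels),
}
delta = change_vs_reference(totals, reference=4.4)
# delta[4.0] -> {'slab': 0.5, 'water': 0.0}
```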

The initial state of the Volmer reaction on Au(111) was chosen for this test and evaluated at several electrode potentials. All results are reported in Fig. 7, referenced to the charge partitioning at a work function of 4.4 eV. The machine-learned partitioning of excess electrons between the slab and the water layer matches the ab initio partitioning within reasonable deviations, suggesting that the machine-learning model localizes the charge in a similar manner to the ab initio model.

Fig. 7: Charge localization.

Comparison between Bader analysis and ML of the change in electron count in response to varying electrode potential on the Au fcc(111) surface with one layer of water and one extra proton. All electron changes are referenced to a work function of 4.4 eV. Left: ΔNe comparison between DFT (solid red) and ML (open black). Upper triangles represent the excess electrons summed over the slab, and lower triangles represent those summed over the water. Right: parity comparison of the ΔNe of the water (blue) and slab (green) calculated from DFT (x-axis) and ML (y-axis).

Extrapolation to larger solvent structures

A well-known limitation of DFT methods is system size; typically only one or two layers of explicit water are used near the surface, although in the liquid phase many layers of explicit solvent may contribute to the observed kinetics. This is due both to the computational expense of DFT calculations (\(\sim\mathcal{O}(N^3)\) scaling, where \(N\) is the number of electrons) and to the many degrees of freedom of such systems, which make thorough exploration of the potential energy surface prohibitively expensive. Thus, an intriguing use of GC-ML methods is extending length scales: allowing both more layers of explicit water and greater exploration of the configurational space of those layers. Here, we systematically explore the ability of the GC-ML method to accommodate multiple water layers, and its ability to extrapolate.
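
A quick way to see what the scaling exponents imply is to compare cost ratios for the same growth in system size. The sketch below uses idealized asymptotic exponents only (cubic for DFT in the electron count, linear for the ML surrogate as noted in the introduction) and ignores prefactors, SCF iteration counts, and other practical costs.

```python
def relative_cost(n_small: float, n_large: float, exponent: float) -> float:
    """Cost ratio between two system sizes for a method whose cost grows as N**exponent."""
    return (n_large / n_small) ** exponent


# Doubling the system size:
dft_ratio = relative_cost(100, 200, exponent=3)  # O(N^3): 8x more expensive
ml_ratio = relative_cost(100, 200, exponent=1)   # linear scaling: 2x more expensive
```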

Since electrochemical atomistic models are typically periodic along the slab plane (the x and y directions of the unit cell), it is natural to expect the ML calculator to be transferable when systems are enlarged by repetition along x and y, because corresponding atoms in the repeated unit cells experience similar chemical and electrostatic environments. However, when systems are enlarged by adding more explicit water molecules, that is, by extending the solvent in the z direction, the ML representation must be thoroughly tested.

We systematically explored the ability of GC-ML to make extrapolative predictions for additional water layers. We created a data set of 3647 images containing 1, 2, 3, 4, and 5 water layers, at work functions of 3.8, 4.0, 4.2, 4.4, and 4.6 eV. These images additionally included geometrical variations of the different water layers and protonation of an arbitrary water molecule. Moreover, the water-layer configurations were also relaxed to obtain more diverse geometries. For the systems with 1, 2, 3, 4, and 5 water layers, respectively, 124, 138, 197, 173, and 30 training images were sampled randomly, with the remaining 250, 456, 741, 672, and 896 images reserved for testing.
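
A per-layer random split of this kind can be sketched as below. The function name and the toy image IDs are hypothetical; for the data set described above, `groups` would be keyed by the number of water layers and `n_train` would hold the per-layer training counts just given. A fixed seed keeps the split reproducible.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_per_group(groups: Dict[int, Sequence[T]], n_train: Dict[int, int],
                    seed: int = 0) -> Tuple[Dict[int, List[T]], Dict[int, List[T]]]:
    """Randomly draw n_train[k] training items from group k; the rest of the group is test data."""
    rng = random.Random(seed)
    train: Dict[int, List[T]] = {}
    test: Dict[int, List[T]] = {}
    for k, items in groups.items():
        if n_train[k] > len(items):
            raise ValueError(f"group {k} has only {len(items)} items")
        shuffled = list(items)
        rng.shuffle(shuffled)
        train[k], test[k] = shuffled[: n_train[k]], shuffled[n_train[k]:]
    return train, test


# Toy usage: hypothetical image IDs keyed by number of water layers.
groups = {1: [f"L1-{i}" for i in range(10)], 2: [f"L2-{i}" for i in range(20)]}
train, test = split_per_group(groups, n_train={1: 3, 2: 5})
```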

We first trained the model on images containing only a single water layer, and tested its ability to make predictions on systems with more water layers. The results are shown in the top row of Fig. 8, which uses the bootstrap-uncertainty approach to produce runeplots of the data. The one-layer test structures are fit about as well as the one-layer training structures. However, the bootstrap approach predicts a large uncertainty for the 2- to 5-layer structures, and the predictions are indeed far from the true SJM-DFT results.
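
The behavior of a bootstrap ensemble under extrapolation can be illustrated with a deliberately simple toy. The models in this work are neural-network ensembles; here a straight-line least-squares fit stands in for each ensemble member, and the data are synthetic. Each member is trained on a resample drawn with replacement, and the ensemble spread serves as the uncertainty estimate: it stays small inside the training domain and grows far outside it.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def bootstrap_ensemble(xs: Sequence[float], ys: Sequence[float],
                       n_models: int = 50, seed: int = 0) -> List[Line]:
    """Fit one model per bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models: List[Line] = []
    while len(models) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in idx}) < 2:  # degenerate resample: no unique line
            continue
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models: Sequence[Line], x: float) -> Tuple[float, float]:
    """Ensemble mean and spread (standard deviation) at x."""
    preds = [slope * x + intercept for slope, intercept in models]
    return statistics.fmean(preds), statistics.stdev(preds)


# Toy data: noisy samples of y = x on the "training domain" 0 <= x <= 1.
noise = random.Random(1)
xs = [i / 19 for i in range(20)]
ys = [x + noise.gauss(0.0, 0.05) for x in xs]
models = bootstrap_ensemble(xs, ys)
_, spread_inside = predict(models, 0.5)   # interpolation
_, spread_outside = predict(models, 5.0)  # far extrapolation
```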

Fig. 8: Size extrapolation.

Bootstrap uncertainty analysis as training images with successively more water layers are added.

Next, we add training images from the two-layer system to our model and use it to make predictions. As shown in the second row of Fig. 8, the one- and two-layer structures are now fit well, but deviations remain for the three- to five-layer structures.

As we repeat this systematic process, we find that once four-layer data are included in training, the five-layer structures are predicted with reasonable certainty. This is understandable, since the water molecules take on three characteristic geometries: near the electrode surface, bulk-like, and near the top surface bordering the implicit solvent. As the number of layers grows, examples of all three types are present, and the model can reasonably be expected to extrapolate to larger numbers of water layers.
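
The cumulative protocol behind the rows of Fig. 8 can be written as a small harness. This is a sketch with hypothetical names: `fit` and `error` stand in for GC-ML training and an error metric on a test set, and the stubs in the usage example merely "remember" which layer counts they have seen, so they illustrate the bookkeeping rather than any real extrapolation.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Model = object


def extrapolation_study(
    train_sets: Dict[int, Sequence[T]],
    test_sets: Dict[int, Sequence[T]],
    fit: Callable[[List[T]], Model],
    error: Callable[[Model, Sequence[T]], float],
) -> Dict[Tuple[int, int], float]:
    """Train on 1..k layers for each k in turn and score every test set.

    Returns {(k_max_trained, n_layers_tested): error}, i.e. one row of the
    layer-by-layer grid per training stage.
    """
    results: Dict[Tuple[int, int], float] = {}
    pool: List[T] = []
    for k in sorted(train_sets):
        pool.extend(train_sets[k])  # cumulative: earlier layers stay in training
        model = fit(list(pool))
        for n in sorted(test_sets):
            results[(k, n)] = error(model, test_sets[n])
    return results


# Toy stubs (hypothetical): each "image" is (n_layers, energy); the stub model
# only "knows" the layer counts it has seen, so it fails on unseen ones.
def fit_stub(images):
    return {n for n, _ in images}


def error_stub(seen, images):
    return 0.0 if all(n in seen for n, _ in images) else 1.0


data = {n: [(n, 0.0)] for n in (1, 2, 3)}
grid = extrapolation_study(data, data, fit_stub, error_stub)
```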


In the current work, we introduced a scheme for the machine-learning emulation of electronically grand-canonical electronic-structure calculations, of the type employed in studies of electrochemical interfaces. The framework employs a dual-learning scheme in which the per-atom charge and the per-atom electronegativity are used together to predict both the system charge and the system (grand-potential) energy. We showed that a single well-trained model can accurately predict both the charge and energy of a variety of structures from two different reactions across a potential range spanning 0.8 V. We also showed that a drop-in use of the existing bootstrap ensemble approach gives a good, if conservative, estimate of the uncertainty in each output quantity, as compared with the true deviations from grand-canonical density functional theory.

We further showed that this method effectively accelerates saddle-point searches within the grand-canonical framework, with the largest benefit arising where the system starts far from the optimal pathway. We also studied the ability to extrapolate to larger sizes, specifically more layers of water, and found that as more layers were added to the training data the extrapolation became more reliable; in all cases, the uncertainty estimates identified unreliable predictions.

Like most machine-learning models, the current model has physical limitations compared with true electronic-structure calculations and can be expected to fail in certain situations. In particular, the current model does not contain long-range Coulombic interactions, so it may not capture genuine long-range charge transfer. It is conceivable that the model could be extended to account for such effects, but care must be taken with the divergence of Coulomb sums over periodic cells carrying a net charge.
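
To see why a net-charged periodic system needs special treatment, consider the bare Coulomb sum of one unit point charge over its periodic images on a simple cubic lattice. The toy below (unit charge and unit box length, both assumptions for illustration) truncates the sum at a spherical cutoff R. Because the number of images grows as R³ while each term decays only as 1/R, the partial sum grows roughly as 2πR² and never converges. Periodic codes therefore commonly add a neutralizing background (as in Ewald summation) or, as in SJM, an explicit countercharge.

```python
import math


def bare_coulomb_image_sum(cutoff: float, box: float = 1.0) -> float:
    """Sum 1/|r| over periodic images of a unit point charge on a cubic lattice.

    The origin (the charge itself) is excluded, and only images with
    |r| <= cutoff are kept.
    """
    n_max = int(cutoff // box)
    total = 0.0
    for i in range(-n_max, n_max + 1):
        for j in range(-n_max, n_max + 1):
            for k in range(-n_max, n_max + 1):
                if i == j == k == 0:
                    continue
                r = box * math.sqrt(i * i + j * j + k * k)
                if r <= cutoff:
                    total += 1.0 / r
    return total


sums = [bare_coulomb_image_sum(rc) for rc in (5.0, 10.0, 20.0)]
# Each doubling of the cutoff roughly quadruples the partial sum, so the
# interaction energy of a net-charged periodic cell has no finite limit.
```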