Introduction

Geometry optimisation and transition state search are fundamental procedures to identify important stationary points of molecules, molecular crystals and material systems in computational chemistry. Since the evaluation of chemically accurate ab initio potential energies and gradients are computationally demanding, several techniques have been developed over the last three decades to enhance the convergence of optimisation methods.

Among the most widely used are quasi-Newton methods, in particular BFGS or its limited memory version1, which start with a (scaled) identity as their guess for the Hessian and update it at each iteration based on the gradient information collected from previous steps. Initialising with a Hessian guess that includes more molecule specific geometrical information can improve the convergence. For instance, just introducing some connectivity information about a molecule can lead to surprisingly good results2,3.

Utilising internal coordinates in the construction of Hessian has also been intensively investigated4,5,6,7,8,9. These approaches include (1) methods that build the Hessian matrix in the space of internal coordinates in the beginning of the optimisation and then transform it to Cartesian coordinates, (2) methods that recompute and transform the internal Hessian every step and (3) methods that carry out even the optimisation step in the space of internal coordinates. However, this latter scheme requires to carry out both the projection of Cartesian gradients to internal coordinates gradients and the transformation of the internal coordinates to Cartesian coordinates. None of these steps is straightforward and the gradient conversion can be accomplished only by an iterative solution due to the curvilinear nature of the transformation8.

More sophisticated methods estimate the initial Hessian guess from a surrogate potential (e.g. force field or semiempirical potential) whose second derivative can be obtained at low computational cost. The update of the approximate Hessian can then be achieved either by using quasi-Newton methods or other techniques such as DIIS10,11. Such strategies significantly improve the speed of convergence in either Cartesian or internal coordinates6.

If the model Hessian is cheap to calculate (e.g., as obtained from a surrogate model) and provides a reasonable approximation of the quantum Hessian, it may be advantageous to recompute it at every optimisation step. Such a scheme was introduced by Lindh et al.12, where a model potential is constructed consisting of quadratic terms for all distances, angles and dihedrals in the molecule. At each geometry optimisation step the force field is constructed such that the current conformation is its local minimum, and its “Hessian” is computed, neglecting the dependence of the force field parameters on the geometry. With this construction, the model “Hessian” is therefore not the Hessian of any potential. Nevertheless, this approach yields excellent performance, which led to its wide implementation in quantum chemistry program packages (e.g. in MOLPRO13, ORCA14, DALTON15, CRYSTAL16).

The model “Hessian” of Lindh12 can be considered as a preconditioner or metric that effects a transformation to a new coordinate system where the optimisation problem is better conditioned, hence algorithms converge more rapidly and tend to be more robust. Geometrically, the shape of the energy landscape becomes more isotropic. In general it is desirable that both the construction and inversion of the preconditioner matrix are inexpensive (at least compared to the computation of the energy and gradient). This can be achieved by building a sparse preconditioner from simple analytical functions. Since the preconditioner defines a metric in configuration space it needs to be positive definite. This requirement is automatically fulfilled in the Lindh approach12 by the use of quadratic molecular mechanical terms with equilibrium points corresponding to the actual geometry at each step.

We recently introduced a general and effective preconditioner for geometry optimisation and saddle point search for material systems3. This preconditioner is determined by the local connectivity structure of atoms making both the construction and inversion computationally inexpensive. Especially for larger systems we observe an order of magnitude or larger reduction of the number of optimisation steps for the preconditioned LBFGS method compared to the same without preconditioning. Here we expand our previous work to molecules and molecular crystals by combining it with a force field based preconditioner inspired by the approach of Lindh et al.12.

Methods

Enhancing geometry optimisation by using preconditioners

We briefly review the methodology for preconditioning geometry optimisation and the dimer saddle point search method for material systems3. For a system with N particles let \({x}_{k}\in {{\mathbb{R}}}^{3N}\) denote the configuration at the k th iterate of an optimisation algorithm. The corresponding energy, gradient and preconditioner are denoted by fk = f(xk), gk = f(xk) and \({P}_{k}\in {{\mathbb{R}}}^{3N\times 3N}\approx {\nabla }^{2}f({x}_{k})\), respectively. A preconditioned steepest-descent step is then given by

$${x}_{k+1}={x}_{k}-{\alpha }_{k}{P}_{k}^{-1}{g}_{k}$$
(1)

where αk is the step size obtained from some line search procedure at the k th iteration. If Pk = I, then (1) becomes the standard steepest descent scheme and if Pk = 2f(xk), then it becomes a quadratically convergent Newton scheme. In general, different choices of Pk may “interpolate” between these extremes.

From an alternative point of view preconditioning can be thought of as a coordinate transformation, where a new set of coordinates is defined as \({y}_{k}:={P}^{\mathrm{1/2}}{x}_{k}\). The advantage of this framework is that once an appropriate preconditioner matrix is available then any optimisation algorithm can be modified by applying the original algorithm on the transformed coordinates. To obtain the final form of the modified algorithm we need to transform the variables back to the original coordinate system. As a simple example, applying the coordinate transformation on the gradient descent equation immediately leads to the equation of quasi Newton schemes (eq. 1):

$$\begin{array}{rcl}{y}_{k+1} & = & {y}_{k}-{\alpha }_{k}{\nabla }_{y}F({y}_{k})\\ {P}_{k}^{\mathrm{1/2}}{x}_{k+1} & = & {P}_{k}^{\mathrm{1/2}}{x}_{k}-{\alpha }_{k}{P}_{k}^{-\mathrm{1/2}}{\nabla }_{x}\,f({x}_{k})\\ {x}_{k+1} & = & {x}_{k}-{\alpha }_{k}{P}_{k}^{-1}{\nabla }_{x}\,f({x}_{k})\end{array}$$
(2)

where we used the fact that \({\nabla }_{y}F({y}_{k})={P}_{k}^{-\mathrm{1/2}}{\nabla }_{x}\,f({x}_{k})\). Preconditioning popular optimisation methods like LBFGS, conjugate gradients, FIRE17 etc. is similarly possible.

A simple preconditioner that is effective for a wide range of materials systems is based on the following N × N matrix3

$${L}_{ij}=\{\begin{array}{cc}-\mu \exp (\,\,-\,A(\frac{{r}_{ij}}{{r}_{{\rm{n}}{\rm{n}}}}-1)), & i\ne j\,\text{and}\,|{r}_{ij}| < {r}_{{\rm{c}}{\rm{u}}{\rm{t}}}\\ 0, & i\ne j\,\text{and}\,|{r}_{ij}|\ge {r}_{{\rm{c}}{\rm{u}}{\rm{t}}}\\ -\sum _{i^{\prime} \ne j}{L}_{i^{\prime} j}, & i=j\end{array}$$
(3)

where i and j denote atomic indices and μ, A, rcut and rnn are parameters that can be user-specified or estimated numerically. We note that (3) is a generalisation of the Laplacian matrix used to represent undirected graphs. Given a specific connectivity defined by rcut and setting A = 0 and μ = 1, Lij reduces exactly to the Laplacian matrix. The actual 3N × 3N preconditioner is simply obtained from the corresponding Lij element in an isotropic manner:

$$[{P}_{{\rm{E}}{\rm{x}}{\rm{p}}}{]}_{k+3(i-1),l+3(j-1)}=\{\begin{array}{cc}{L}_{ij}, & k=l\\ 0, & k\ne l\end{array}$$
(4)

where the k and l indices denote Cartesian components. Despite its simplicity in capturing only geometric connectivity but no specific material information, PExp provides a good model for the local curvature of the potential energy landscape, which effectively controls ill-conditioning in large systems. The application of PExp resulted in a significant reduction of the number of optimisation steps required for several material systems compared to the unpreconditioned LBFGS3.

FF-based preconditioners

Preliminary tests showed that for molecular systems such as molecules in gas phase or molecular crystals using PExp still leads to a speed-up, but a much more modest one than for material systems. The explanation for this is that molecular systems contain a wide range of different interactions (e.g., pair, angle, dihedral, electrostatic, dispersive) of vastly varying stiffness which in addition are more loosely coupled, and this creates a second source of ill-conditioning distinct from ill-conditioning due to large system size. Inspired by the use of internal coordinates in molecular optimisation techniques4 and the model Hessian of Lindh et al.12 we therefore propose a generalisation of PExp that is effective also for molecular systems.

The construction of our FF-based preconditioner begins with a surrogate potential energy function, given by a sum over internal coordinates each describing a short-range bond in the system (distance, angle, or dihedral),

$${V}_{{\rm{FF}}}=\sum _{\alpha }{V}_{\alpha }({\xi }_{\alpha }(x)).$$
(5)

The individual potential energy terms are in general simple functions of the internal coordinates. Some examples of most typical forms are the quadratic, Morse or torsional potentials, respectively given by

$${V}_{{\rm{Quadratic}}}(q)=\frac{1}{2}k{(q-{q}_{0})}^{2},$$
(6)
$${V}_{{\rm{Morse}}}(d)={D}_{0}{(1-\exp (-\alpha (d-{d}_{0})))}^{2},$$
(7)

and

$${V}_{{\rm{Torsion}}}(\varphi )=\frac{1}{2}{k}_{\varphi }(1+\,\cos (n\varphi -{\varphi }_{0}))$$
(8)

where the corresponding parameters can be taken from standard force field libraries.

Due to its simple functional form \({H}_{{\rm{FF}}}\,:\,=\,{\nabla }^{2}{V}_{{\rm{FF}}}\) is cheap to compute. If only short-range bonds are taken into account then it is also sparse, hence it is cheap to store and invert. Moreover, we expect that VFF gives a good qualitative approximation to the quantum potential energy landscape, hence HFF is a good qualitative approximation to the quantum Hessian 2f. Therefore, HFF satisfies all the conditions required for a preconditioner except that it will in general be indefinite. A conceptually straightforward but computationally expensive approach to overcome this limitation is to enforce positivity by replacing all eigenvalues of HFF with their absolute values. Instead, we propose to analytically modify the local Hessian contributions to ensure overall positivity, resulting in a further reduction in computational cost.

The Hessian contribution from Vα is given by

$${H}_{\alpha }=\frac{{\partial }^{2}{V}_{\alpha }}{\partial {x}^{2}}=\mathop{\underbrace{\frac{\partial {\xi }_{\alpha }}{\partial x}\otimes \frac{\partial {\xi }_{\alpha }}{\partial x}\frac{{\partial }^{2}{V}_{\alpha }}{\partial {\xi }_{\alpha }^{2}}}}\limits_{{H}_{\alpha }^{(1)}}+\mathop{\underbrace{\frac{{\partial }^{2}{\xi }_{\alpha }}{\partial {x}^{2}}\frac{\partial {V}_{\alpha }}{\partial {\xi }_{\alpha }}}}\limits_{{H}_{\alpha }^{(2)}},$$
(9)

where we have decomposed Hα into two terms, \({H}_{\alpha }^{\mathrm{(1)}}\) and \({H}_{\alpha }^{\mathrm{(2)}}\). If Vα is quadratic then \(\frac{{\partial }^{2}{V}_{{\rm{Quadratic}}}}{\partial {\xi }^{2}}=k > 0\), hence \({H}_{\alpha }^{\mathrm{(1)}}\) is positive semi-definite, while the sign of \({H}_{\alpha }^{\mathrm{(2)}}\) is ambiguous. Note however, that if the system is at equilibrium of Vα with respect to ξα, i.e., if \(\frac{\partial {V}_{\alpha }}{\partial {\xi }_{\alpha }}=0\), then \({H}_{\alpha }^{\mathrm{(2)}}=0\). This in fact is the case in the Lindh approach12. Instead of adjusting Vα at every step such that the geometry corresponds to its equilibrium, here we simply drop \({H}_{\alpha }^{\mathrm{(2)}}\) and only use \({H}_{\alpha }^{\mathrm{(1)}}\) to construct the preconditioner, thus ensuring that it always stays positive definite.

For non-quadratic contributions we expect that \(\frac{{\partial }^{2}{V}_{\alpha }}{\partial {\xi }_{\alpha }^{2}} > 0\) for most but not all bonds, hence we enforce positivity by replacing it with its absolute value. This leads to the following general preconditioner for molecular systems:

$${P}_{{\rm{F}}{\rm{F}}}=\sum _{\alpha }{\mathop{H}\limits^{ \sim }}_{\alpha }^{(1)}=\sum _{\alpha }\frac{{\rm{\partial }}{\xi }_{\alpha }}{{\rm{\partial }}x}\otimes \frac{{\rm{\partial }}{\xi }_{\alpha }}{{\rm{\partial }}x}|\frac{{{\rm{\partial }}}^{2}{V}_{\alpha }}{{\rm{\partial }}{\xi }_{\alpha }^{2}}|$$
(10)

It is worth noting that \(\frac{\partial {\xi }_{\alpha }}{\partial x}\) is already computed by molecular mechanics force field based MD programs since it is required for the assembly of Vα. Thus, the only new quantity that must be computed is \(\frac{{\partial }^{2}{V}_{\alpha }}{\partial {\xi }_{\alpha }^{2}}\), which represents a negligible additional computational cost.

We note that the final functional form (10) of our preconditioner is very similar to that of Lindh et al.12, however we arrived at it from a fundamentally different perspective, which has several advantages. Lindh et al.‘s method was introduced for quadratic terms only, hence the force constants have to be recomputed after each optimisation step (as the equilibrium bond lengths, angles and dihedrals are set to the actual ones to obtain a positive semidefinite matrix). Our method can be considered as a generalisation of their approach, allowing arbitrary functional forms of internal coordinate dependent terms. In particular this means that the FF parameters need not be adjusted to achieve a positive semidefinite matrix, and incorporating different parameter sets is straightforward. Moreover, as it will be discussed below our perspective makes it easy to extend the preconditioner construction to new situations. Finally we mention that since the FF-based preconditioner is a sparse matrix both its construction and inversion are computationally inexpensive, which gives the possibility to utilise it even for large systems.

Combining FF and Exp preconditioners

For molecular crystals, intermolecular interactions also play an important role. This consideration led us to combine the molecular mechanics based FF preconditioner (describing bonded interactions) with a modified Exp preconditioner (4) (tuned to describe only non-bonded interactions, i.e. interactions already treated by the FF preconditioner are omitted):

$${P}_{\mathrm{Exp}+{\rm{FF}}}={P}_{\mathrm{Exp}}^{{\rm{nb}}}+{P}_{{\rm{FF}}}$$
(11)

PFF is fully specified from the chosen force field VFF. To specify \({P}_{\mathrm{Exp}}^{{\rm{nb}}}\) we first manually choose the parameters A and rcut in (4) to account for the interaction between molecules3. The remaining parameters are computed in a similar automatic manner as described in ref.3, and we keep only those matrix elements of PExp for which the corresponding matrix element in PFF is zero. We note that correct scaling between PFF and PExp is implicitly ensured via the μ parameter in Eq. 3.

Implementation details

We tested the FF and FF + Exp preconditioners on a range of optimisation and saddle point search tasks. For geometry optimisations the form of the preconditioned LBFGS method was identical to the one we describe in ref.3: at each iterate the search direction is given by

$$\begin{array}{c}{\bf{input}}\,q=\nabla f({x}_{k})\\ {s}_{k}={x}_{k}-{x}_{k-1}\\ {y}_{k}=\nabla f({x}_{k})-\nabla f({x}_{k-1})\\ {\rho }_{k}=1/{y}_{k}^{T}{s}_{k}\\ {\rm{for}}\,i=k,\ldots ,k-m\\ \,\,{\alpha }_{i}={\rho }_{i}{s}_{i}^{T}q\\ \,\,q=q-{\alpha }_{i}{y}_{i}\\ \boxed{z={P}_{k}^{-1}q}\\ {\rm{for}}\,i=k-m,\ldots ,k\\ \,\,{\beta }_{i}={\rho }_{i}{y}_{i}^{T}z\\ \,\,z=z+({\alpha }_{i}-{\beta }_{i}){s}_{i}\\ {\bf{output}}\,{p}_{k}=z\end{array}$$
(12)

with initial search direction \(z={P}_{0}^{-1}\nabla f({x}_{0})\). The box in Eq. 12 indicates the single modification of the standard algorithm to obtain preconditioning. Variable m is the maximum number of metric corrections. The step length selection is obtained by a backtracking line-search enforcing only the Armijo condition3. We note that although there exist several techniques (discussed in the Introduction) that use a quasi-Newton method in combination with some approximated Hessian information, their applicability is rather limited for large systems due to the excessive computational cost. Therefore our baseline was the unpreconditioned LBFGS method and where it was necessary we also compared our method to other preconditioning based techniques.

For saddle point search tasks we slightly modified the superlinearly converging dimer method18. Dimer methods use two copies of the system with coordinates x(1) and x(2) and a fixed separation length \(l=\parallel {x}^{\mathrm{(1)}}-{x}^{\mathrm{(2)}}\parallel \) between them. The algorithm is usually split into two alternating steps19: (1) in the rotation step we fix the midpoint and rotate the endpoints to approximately align them with the lowest (negative) eigenmode of the Hessian (vk); (2) in the translation step we shift the dimer to maximise the energy along the dimer direction while minimising energy in all directions perpendicular to it (pk).

In principle, both the rotation and translation steps can be preconditioned, however, we found that in many systems preconditioning the rotation step results in a smaller spectral gap and hence slower convergence. Therefore, we chose to precondition only the translation step. Our implementation employs the conjugate gradient method using the Polak-Ribière formula:

$$\begin{array}{c}{\bf{i}}{\bf{n}}{\bf{p}}{\bf{u}}{\bf{t}}\,{q}_{k}=-\,(I-2{v}_{k}\otimes {v}_{k})\nabla f({x}_{k})\\ \boxed{\beta =\frac{{q}_{k}^{T}{P}_{k}^{-1}({q}_{k}-{q}_{k-1})}{{q}_{k-1}^{T}{P}_{k}^{-1}{q}_{k-1}}}\\ {s}_{k}={q}_{k}+\beta {s}_{k-1}\\ {\bf{o}}{\bf{u}}{\bf{t}}{\bf{p}}{\bf{u}}{\bf{t}}\,\boxed{{P}_{k}=\frac{{P}^{-1}{s}_{k}}{{s}_{k}^{T}{P}^{-1}{s}_{k}}}\end{array}$$
(13)

and using the initial iterate s0 = −(I − 2v0v0)f(x0). Again, boxed steps are the modifications needed to the original method to achieve preconditioning. For computing the step length we used the trust region radius approach suggested by Kästner and Sherwood18, with acceptance criterion based on the projection of the gradient of the actual step.

For molecules in gas phase VFF is invariant under rotations and translations, hence PFF will be at least six fold degenerate with zero eigenvalues for any configuration of the molecule corresponding to the three translational and three rotational degrees of freedom. While these degrees of freedom could in principle be fixed we found that a straightforward solution is to simply regularise the preconditioner by replacing it with \(P\to P+cI\). We found that good generic values for c are 0.1 and 1.0 eV/Å2 for geometry optimisations and saddle point search, respectively.

Model systems and potentials

Organic molecules in gas phase

Three potential energy surfaces were investigated for geometry optimisations: semiempirical PM620, DFT21,22 and MP223. For the DFT potential we used the the BLYP exchange-correlation functional24,25 with the DZVP-MOLOPT basis set26 and plane wave cutoff of 480 Ry within the Gaussian and plane waves method (GPW) approach27 and Goedecker-Teter-Hutter (GTH) pseudopotentials28. For the MP2 calculations the 6–31G** basis set was applied.

In the case of geometry optimisations on the PM6 surface we compared three different force fields from which we constructed the preconditioners: the force field of Lindh et al. (LFF)12, the universal force field (UFF)29 and the generalised Amber force field (GAFF)30.

Initial configurations were taken from 0.5 ns long molecular dynamics (MD) simulations at 300 K.

For transition state search we selected 7 examples (with their initial configurations) from the benchmark of Baker and Chan31 and three additional systems whose initial configurations were taken from 0.5 ns long MD simulations at 300 K. The computations were performed on the semiempirical PM6 surface20.

Molecular crystals

We compared four different optimisation schemes (unpreconditioned, only FF-based, only Exp-based and Exp + FF-based preconditioners) on five organic molecular crystals (systems XVIII to XXII) whose initial geometries were taken from the Organic Crystal Structure Prediction competition of the Cambridge Crystallographic Data Centre32,33. We used a DFT potential energy surface with the PBE exchange-correlation functional34 with a plane wave basis set using a cutoff energy of 800 eV and ultrasoft pseudopotentials35.

Material systems

We also tested how the force field based preconditioner works on two material systems compared to the exponential preconditioner. We examined the unpreconditioned and preconditioned geometry optimisation for bulk silicon and vacancy with varying system size. The potential energy surface was the screened Tersoff potential36,37 and we used the universal force field (UFF)29 for building the FF-based preconditioner matrix. The bulk systems were built by using the bulk function of ASE38 with default lattice constants while for the corresponding vacancy systems a silicon atom was removed. Initial configurations were obtained by applying a random displacement on the atomic positions using normal distribution with standard deviation of 0.05 Å.

Next we considered bulk tungsten and a single interstitial site in bulk tungsten, also using different system sizes. The potential energy surface was a machine learning based Gaussian Approximation Potential (GAP) reproducing the quality of DFT (with PBE functional)39. The preconditioner in this case was based on a simple Embedded Atom Method (EAM) potential40,41. Initial configurations were obtained in a similar way as the bulk silicon ones but this time a standard deviation of 0.15 Å was applied.

Software

For the semiempirical PM6 method AmberTools1642 was used. The MP2 potential surface was generated using MOLPRO13,43. The screened Tersoff potential was provided by Atomistica44. The DFT potentials with the BLYP and PBE functionals were provided by CP2K45 and CASTEP46, respectively, using the QUIP interface47. The GAP model was called via QUIP47.

In all cases the geometry optimisation was performed within ASE38. The other software packages were only used to compute the energy and gradient of the configurations. A Python implementation of the FF and Exp + FF preconditioners with several potential forms of nonbonded terms is available within ASE38,48 (https://gitlab.com/molet/ase).

Initial structures of molecules and molecular crystals as well as the generating Python codes for the starting geometry of material systems are provided as Supplementary Data.

Accession codes

Python implementation of the preconditioners can be found here: [https://gitlab.com/molet/ase].

Results

Organic molecules in gas phase

We investigated several organic molecules’ geometry optimisations with (FF) and without (ID) our new FF-based preconditioner on three potential energy surfaces (PM6, DFT and MP2), using the GAFF force field for building the preconditioner. The results are shown in Table 1 and Fig. 1. The convergence criterion of the geometry optimisations was \(\parallel \nabla E{\parallel }_{\infty }={10}^{-3}\) eV Å−1 for DFT and MP2 surfaces while for the relatively inexpensive PM6 potential we applied a slightly tighter threshold of \(\parallel \nabla E{\parallel }_{\infty }={10}^{-4}\) eV Å−1. Depending on the system and underlying potential we can observe a 4–10 fold decrease in the required number of optimisation steps using our preconditioner.

Table 1 Total number of function/gradient calls of geometry optimisation for organic molecules in gas phase using conventional (ID) and FF-based preconditioned (FF) LBFGS method on three different quantum chemistry surfaces. Convergence threshold was \(\parallel \nabla E{\parallel }_{\infty }={10}^{-4}\) eV Å−1 for PM6 and \(\parallel \nabla E{\parallel }_{\infty }={10}^{-3}\) eV Å−1 for DFT and MP2 potentials, respectively.
Figure 1
figure 1

Computational saving of FF/GAFF preconditioner over unpreconditioned LBFGS optimisations for geometry optimisation of molecular systems in gas phase on three model potentials.

For PM6 only, to highlight the correlation between performance gain and ill-conditioning, we also computed the ratio between the condition numbers for the unpreconditioned and preconditioned Hessians, κI/κP, at the minima, where

$${\kappa }_{P}=\frac{{\lambda }_{P}^{max}}{{\lambda }_{P}^{min}}=\frac{\mathop{max}\limits_{{u}^{T}Pu=1}{u}^{T}Hu}{\mathop{min}\limits_{{u}^{T}Pu=1}{u}^{T}\mathop{H}\limits^{ \sim }u}$$
(14)

where \(\tilde{H}\) is a modified Hessian where the zero eigenvalue due to symmetries are removed. In Fig. 2 we observe that the computational saving is more strongly correlated to the condition number ratio than to the system size.

Figure 2
figure 2

Correlation between the computational saving and the condition number ratio at potential energy minima of molecular systems in gas phase on PM6 potential.

For the three smallest systems and the PM6 surface only we also compared our FF-based preconditioner against using the exact Hessian of the model potential (Hessian/GAFF) and a finite-difference Hessian of PM6 (Hessian/PM6) as preconditioners for LBFGS method. Zero eigenvalues of the Hessian matrices were shifted to a moderate positive number to avoid numerical instabilities. The results are collected in Table 2 and Fig. 3. Our FF-based preconditioner clearly outperforms both of these variants.

Table 2 Comparison of total number of function/gradient calls of geometry optimisation of different optimisation algorithms for minimisation of small organic molecules on PM6 surface: unpreconditioned LBFGS (ID), FF-based preconditioned LBFGS (FF/GAFF), FF-Hessian based preconditioned LBFGS (Hessian/GAFF), PM6-Hessian based preconditioned LBFGS (PM6/GAFF). Convergence threshold was \(\parallel \nabla E{\parallel }_{\infty }={10}^{-4}\) eV Å−1 for all cases.
Figure 3
figure 3

Computational gain of different optimisation algorithms compared to the unpreconditioned LBFGS method for geometry optimisation of molecular systems in gas phase on PM6 potential.

We also examined the effect of using different force fields from which to construct the FF-based preconditioner. Beside the GAFF force field, we investigated two general force fields: the universal force field (UFF)29 and a general force field introduced by Lindh et al. (LFF)12. The results shown in Table 3 and Fig. 4 indicate that there is no significant difference between the three force fields. We only mention that the LFF force-field includes all possible 2, 3 and 4–body interactions12, resulting in a dense preconditioner matrix, which for larger systems and an efficient potential energy surface could become a performance bottleneck. By contrast, the preconditioners based on GAFF or UFF are sparse, hence their cost scales linearly with system size.

Table 3 Comparison of the effect of different force field based preconditioners for the geometry optimisation of organic molecules in gas phase (total number function/gradient calls). Convergence threshold was \(\parallel \nabla E{\parallel }_{\infty }={10}^{-4}\) eV Å−1 for all cases.
Figure 4
figure 4

Computational gain of the FF-based preconditioner using different force fields for geometry optimisation of molecular systems in gas phase on PM6 potential.

Finally, we tested how the different preconditioners perform when applied to transition state search, comparing again against ID (no preconditioning) and against the Exp preconditioner (with default parameter set and μ = 1; unlike LBFGS, CG is invariant under rescaling of μ). The results are collected in Table 4 and Fig. 5. Overall the Exp preconditioner does not improve significantly over ID. We experimented with different parameters, e.g., adding connectivity information up to the 4-body interaction, but observed no improvements. Both FF-based preconditioners are again comparable and yield a much improved convergence even for these relatively small systems. For instance, the gain is already 2–3-fold for dimethyl-phosphate and tyrosine hydrolyses.

Table 4 Number of steps of translations (and total number of function and gradient calls in parentheses) of saddle searches using different preconditioned variants of superlinearly converging dimer method. Convergence threshold was \(\parallel \nabla E{\parallel }_{\infty }={10}^{-4}\) eV Å−1 for all cases.
Figure 5
figure 5

Computational gain of the Exp and FF-based preconditioners compared to the unpreconditioned superlinearly converging dimer method for molecular systems in gas phase on PM6 potential.

Molecular crystals

We compared geometry optimisation with fixed unit cells using LBFGS, preconditioned with ID (unpreconditioned), Exp3 and FF (GAFF force field). For Exp the nearest neighbour distance (rnn) in Eq. 4 was calculated from the initial structure, we specified rcut = 2rnn and A = 3.0. In addition, we also employed the Exp + FF preconditioner as defined in (11). The results for different molecular crystals are shown in Table 5 and Fig. 6. As expected, Exp reduces the number of optimisation steps compared to ID, although the improvement is significantly smaller for molecular crystals that for material systems3. Interestingly, FF alone already leads to a significant speed-up over both ID and Exp even though the inter-molecular interaction is not captured well. This indicates that for molecular crystal optimisations preconditioning based on specific intramolecular information is crucial. The most successful method was the Exp + FF combination, which leads to a 3-7 fold speed up even for these relatively small test systems.

Table 5 Total number of function/gradient calls of geometry optimisation of molecular crystals using different preconditioning strategies. Convergence criterion was \(\parallel \nabla E{\parallel }_{\infty }={1.0}^{-3}\) eV Å−1.
Figure 6
figure 6

Computational gain of the Exp, FF and Exp + FF preconditioned over unpreconditioned LBFGS for geometry optimisation of molecular crystals on a DFT potential.

Material systems

Finally, it is also interesting to investigate how an FF-based preconditioner compares against the Exp preconditioner for material systems, where Exp performs very well3. We tested geometry optimisation of bulk silicon and a vacancy in bulk silicon with perturbed initial conditions, with increasing system size. The screened Tersoff potential was used as the potential energy, while the FF preconditioner was constructed from UFF. The results are shown in Table 6. In both cases, the FF-based preconditioner yields a clear further speed-up over Exp for both systems.

Table 6 Total number of function/gradient calls geometry optimisation steps of bulk silicon and a bulk silicon vacancy using different preconditioning strategies. Convergence criterion was \(\parallel \nabla E{\parallel }_{\infty }={1.0}^{-3}\) eV Å−1.

Another test system was bulk tungsten and a single interstitial site in bulk tungsten with perturbed initial conditions and different system sizes. The potential energy surface was provided by a GAP model that was trained on DFT data. The preconditioner was based on a simple EAM potential:

$${V}_{{\rm{E}}{\rm{A}}{\rm{M}}}=\sum _{i}[\frac{1}{2}\sum _{j\ne i}{\rm{\Phi }}({r}_{ij})+F(\sum _{j\ne i}\rho ({r}_{ij}))]$$
(15)

where Φ(rij) is a pair potential, F is the embedding function and ρ(rij) is the electron charge density contribution from atom j to atom i. Based on Eq. 10 our FF-based preconditioner was defined as \({P}_{{\rm{F}}{\rm{F}}}={\sum }_{\alpha }\frac{{\rm{\partial }}{r}_{\alpha }}{{\rm{\partial }}x}\otimes \frac{{\rm{\partial }}{r}_{\alpha }}{{\rm{\partial }}x}|\frac{{{\rm{\partial }}}^{2}{V}_{{\rm{E}}{\rm{A}}{\rm{M}}}}{{\rm{\partial }}{r}_{\alpha }^{2}}|\), where α runs over all ij pairs. In the actual implementation Φ, F and ρ functions are represented by splines so computing the corresponding curvature is fairly straightforward.

The results are presented in Table 7. For both the bulk and interstitial systems the number of function/gradient calls of the unpreconditioned optimisation increases with system size while the preconditioned optimisations require almost the same number of optimisation steps to achieve the same convergence criterion.

Table 7 Total number of function/gradient calls for geometry optimisation of bulk tungsten and interstitial defect in bulk tungsten using different preconditioning strategies. Convergence criterion was \(\parallel \nabla E{\parallel }_{\infty }={1.0}^{-3}\) eV Å−1.

Conclusion

We introduced a flexible preconditioner for molecular simulation based on empirical potentials that are widely implemented in popular molecular mechanical program packages. Our method, which can be considered a generalisation of Lindh et al.12, decomposes the analytic Hessian of the empirical potential and modifies individual components to ensure their positivity. An advantage of this procedure is that it avoids the computation of second derivatives of the collective variables (or internal coordinates). The preconditioner yields significant improvements (at least 2 fold, and typically 5 fold decrease in function/gradient calls compared to unpreconditioned techniques), demonstrated thoroughly on a wide range of systems including molecules in gas phase, molecular crystals and materials, using different target potential energy surfaces (empirical, semiempirical and ab initio) as well as different optimisation tasks (geometry optimisations and saddle point searches).