Preconditioners for the geometry optimisation and saddle point search of molecular systems

A class of preconditioners is introduced to enhance geometry optimisation and transition state search of molecular systems. We start from the Hessian of molecular mechanical terms, decompose it and retain only its positive definite part to construct a sparse preconditioner matrix. The construction requires only the computation of the gradient of the corresponding molecular mechanical terms that are already available in popular force field software packages. For molecular crystals, the preconditioner can be combined straightforwardly with the exponential preconditioner recently introduced for periodic systems. The efficiency is demonstrated on several systems using empirical, semiempirical and ab initio potential energy surfaces.

Geometry optimisation and transition state search are fundamental procedures to identify important stationary points of molecules, molecular crystals and material systems in computational chemistry. Since the evaluation of chemically accurate ab initio potential energies and gradients are computationally demanding, several techniques have been developed over the last three decades to enhance the convergence of optimisation methods.
Among the most widely used are quasi-Newton methods, in particular BFGS or its limited memory version 1 , which start with a (scaled) identity as their guess for the Hessian and update it at each iteration based on the gradient information collected from previous steps. Initialising with a Hessian guess that includes more molecule specific geometrical information can improve the convergence. For instance, just introducing some connectivity information about a molecule can lead to surprisingly good results 2,3 .
Utilising internal coordinates in the construction of Hessian has also been intensively investigated [4][5][6][7][8][9] . These approaches include (1) methods that build the Hessian matrix in the space of internal coordinates in the beginning of the optimisation and then transform it to Cartesian coordinates, (2) methods that recompute and transform the internal Hessian every step and (3) methods that carry out even the optimisation step in the space of internal coordinates. However, this latter scheme requires to carry out both the projection of Cartesian gradients to internal coordinates gradients and the transformation of the internal coordinates to Cartesian coordinates. None of these steps is straightforward and the gradient conversion can be accomplished only by an iterative solution due to the curvilinear nature of the transformation 8 .
More sophisticated methods estimate the initial Hessian guess from a surrogate potential (e.g. force field or semiempirical potential) whose second derivative can be obtained at low computational cost. The update of the approximate Hessian can then be achieved either by using quasi-Newton methods or other techniques such as DIIS 10,11 . Such strategies significantly improve the speed of convergence in either Cartesian or internal coordinates 6 .
If the model Hessian is cheap to calculate (e.g., as obtained from a surrogate model) and provides a reasonable approximation of the quantum Hessian, it may be advantageous to recompute it at every optimisation step. Such a scheme was introduced by Lindh et al. 12 , where a model potential is constructed consisting of quadratic terms for all distances, angles and dihedrals in the molecule. At each geometry optimisation step the force field is constructed such that the current conformation is its local minimum, and its "Hessian" is computed, neglecting the dependence of the force field parameters on the geometry. With this construction, the model "Hessian" is therefore not the Hessian of any potential. Nevertheless, this approach yields excellent performance, which led to its wide implementation in quantum chemistry program packages (e.g. in MOLPRO 13 , ORCA 14 , DALTON 15 , CRYSTAL 16 ).
The model "Hessian" of Lindh 12 can be considered as a preconditioner or metric that effects a transformation to a new coordinate system where the optimisation problem is better conditioned, hence algorithms converge more rapidly and tend to be more robust. Geometrically, the shape of the energy landscape becomes more isotropic. In general it is desirable that both the construction and inversion of the preconditioner matrix are inexpensive (at least compared to the computation of the energy and gradient). This can be achieved by building a sparse preconditioner from simple analytical functions. Since the preconditioner defines a metric in configuration space it needs to be positive definite. This requirement is automatically fulfilled in the Lindh approach 12 by the use of quadratic molecular mechanical terms with equilibrium points corresponding to the actual geometry at each step.
We recently introduced a general and effective preconditioner for geometry optimisation and saddle point search for material systems 3 . This preconditioner is determined by the local connectivity structure of atoms making both the construction and inversion computationally inexpensive. Especially for larger systems we observe an order of magnitude or larger reduction of the number of optimisation steps for the preconditioned LBFGS method compared to the same without preconditioning. Here we expand our previous work to molecules and molecular crystals by combining it with a force field based preconditioner inspired by the approach of Lindh et al. 12 .

Methods
Enhancing geometry optimisation by using preconditioners. We briefly review the methodology for preconditioning geometry optimisation and the dimer saddle point search method for material systems 3 . For a system with N particles let  ∈ x k N 3 denote the configuration at the k th iterate of an optimisation algorithm. The corresponding energy, gradient and preconditioner are denoted by , respectively. A preconditioned steepest-descent step is then given by where α k is the step size obtained from some line search procedure at the k th iteration. If P k = I, then (1) becomes the standard steepest descent scheme and if P k = ∇ 2 f(x k ), then it becomes a quadratically convergent Newton scheme. In general, different choices of P k may "interpolate" between these extremes. From an alternative point of view preconditioning can be thought of as a coordinate transformation, where a new set of coordinates is defined as = y P x : k k 1/2 . The advantage of this framework is that once an appropriate preconditioner matrix is available then any optimisation algorithm can be modified by applying the original algorithm on the transformed coordinates. To obtain the final form of the modified algorithm we need to transform the variables back to the original coordinate system. As a simple example, applying the coordinate transformation on the gradient descent equation immediately leads to the equation of quasi Newton schemes (eq. 1): where i and j denote atomic indices and μ, A, r cut and r nn are parameters that can be user-specified or estimated numerically. We note that (3) is a generalisation of the Laplacian matrix used to represent undirected graphs. Given a specific connectivity defined by r cut and setting A = 0 and μ = 1, L ij reduces exactly to the Laplacian matrix. The actual 3N × 3N preconditioner is simply obtained from the corresponding L ij element in an isotropic manner: where the k and l indices denote Cartesian components. Despite its simplicity in capturing only geometric connectivity but no specific material information, P Exp provides a good model for the local curvature of the potential energy landscape, which effectively controls ill-conditioning in large systems. The application of P Exp resulted in a significant reduction of the number of optimisation steps required for several material systems compared to the unpreconditioned LBFGS 3 .

FF-based preconditioners.
Preliminary tests showed that for molecular systems such as molecules in gas phase or molecular crystals using P Exp still leads to a speed-up, but a much more modest one than for material systems. The explanation for this is that molecular systems contain a wide range of different interactions (e.g., pair, angle, dihedral, electrostatic, dispersive) of vastly varying stiffness which in addition are more loosely coupled, and this creates a second source of ill-conditioning distinct from ill-conditioning due to large system size. Inspired by the use of internal coordinates in molecular optimisation techniques 4 and the model Hessian of Lindh et al. 12 we therefore propose a generalisation of P Exp that is effective also for molecular systems. The construction of our FF-based preconditioner begins with a surrogate potential energy function, given by a sum over internal coordinates each describing a short-range bond in the system (distance, angle, or dihedral), The individual potential energy terms are in general simple functions of the internal coordinates. Some examples of most typical forms are the quadratic, Morse or torsional potentials, respectively given by Quadratic 0 Morse 0 0 2 Torsion 0 where the corresponding parameters can be taken from standard force field libraries. Due to its simple functional form FF is cheap to compute. If only short-range bonds are taken into account then it is also sparse, hence it is cheap to store and invert. Moreover, we expect that V FF gives a good qualitative approximation to the quantum potential energy landscape, hence H FF is a good qualitative approximation to the quantum Hessian ∇ 2 f. Therefore, H FF satisfies all the conditions required for a preconditioner except that it will in general be indefinite. A conceptually straightforward but computationally expensive approach to overcome this limitation is to enforce positivity by replacing all eigenvalues of H FF with their absolute values. Instead, we propose to analytically modify the local Hessian contributions to ensure overall positivity, resulting in a further reduction in computational cost.
The Hessian contribution from V α is given by where we have decomposed H α into two terms, α H (1) and α H (2) . If V α is quadratic then . This in fact is the case in the Lindh approach 12 .
Instead of adjusting V α at every step such that the geometry corresponds to its equilibrium, here we simply drop α H (2) and only use α H (1) to construct the preconditioner, thus ensuring that it always stays positive definite. For non-quadratic contributions we expect that > for most but not all bonds, hence we enforce positivity by replacing it with its absolute value. This leads to the following general preconditioner for molecular systems: is already computed by molecular mechanics force field based MD programs since it is required for the assembly of ∇V α . Thus, the only new quantity that must be computed is , which represents a negligible additional computational cost.
We note that the final functional form (10) of our preconditioner is very similar to that of Lindh et al. 12 , however we arrived at it from a fundamentally different perspective, which has several advantages. Lindh et al.'s method was introduced for quadratic terms only, hence the force constants have to be recomputed after each optimisation step (as the equilibrium bond lengths, angles and dihedrals are set to the actual ones to obtain a positive semidefinite matrix). Our method can be considered as a generalisation of their approach, allowing arbitrary functional forms of internal coordinate dependent terms. In particular this means that the FF parameters need not be adjusted to achieve a positive semidefinite matrix, and incorporating different parameter sets is straightforward. Moreover, as it will be discussed below our perspective makes it easy to extend the preconditioner construction to new situations. Finally we mention that since the FF-based preconditioner is a sparse matrix both its construction and inversion are computationally inexpensive, which gives the possibility to utilise it even for large systems.
Combining FF and Exp preconditioners. For molecular crystals, intermolecular interactions also play an important role. This consideration led us to combine the molecular mechanics based FF preconditioner (describing bonded interactions) with a modified Exp preconditioner (4) (tuned to describe only non-bonded interactions, i.e. interactions already treated by the FF preconditioner are omitted): Exp FF E xp nb FF P FF is fully specified from the chosen force field V FF . To specify P Exp nb we first manually choose the parameters A and r cut in (4) to account for the interaction between molecules 3 . The remaining parameters are computed in a similar automatic manner as described in ref. 3 , and we keep only those matrix elements of P Exp for which the corresponding matrix element in P FF is zero. We note that correct scaling between P FF and P Exp is implicitly ensured via the μ parameter in Eq. 3. Implementation details. We tested the FF and FF + Exp preconditioners on a range of optimisation and saddle point search tasks. For geometry optimisations the form of the preconditioned LBFGS method was identical to the one we describe in ref. 3 : at each iterate the search direction is given by The box in Eq. 12 indicates the single modification of the standard algorithm to obtain preconditioning. Variable m is the maximum number of metric corrections. The step length selection is obtained by a backtracking line-search enforcing only the Armijo condition 3 . We note that although there exist several techniques (discussed in the Introduction) that use a quasi-Newton method in combination with some approximated Hessian information, their applicability is rather limited for large systems due to the excessive computational cost. Therefore our baseline was the unpreconditioned LBFGS method and where it was necessary we also compared our method to other preconditioning based techniques.
For saddle point search tasks we slightly modified the superlinearly converging dimer method 18 . Dimer methods use two copies of the system with coordinates x (1) and x (2) and a fixed separation length = − l x x (1) between them. The algorithm is usually split into two alternating steps 19 : (1) in the rotation step we fix the midpoint and rotate the endpoints to approximately align them with the lowest (negative) eigenmode of the Hessian (v k ); (2) in the translation step we shift the dimer to maximise the energy along the dimer direction while minimising energy in all directions perpendicular to it (p k ). In principle, both the rotation and translation steps can be preconditioned, however, we found that in many systems preconditioning the rotation step results in a smaller spectral gap and hence slower convergence. Therefore, we chose to precondition only the translation step. Our implementation employs the conjugate gradient method using the Polak-Ribière formula: For molecules in gas phase V FF is invariant under rotations and translations, hence P FF will be at least six fold degenerate with zero eigenvalues for any configuration of the molecule corresponding to the three translational and three rotational degrees of freedom. While these degrees of freedom could in principle be fixed we found that a straightforward solution is to simply regularise the preconditioner by replacing it with → + P P cI. We found In the case of geometry optimisations on the PM6 surface we compared three different force fields from which we constructed the preconditioners: the force field of Lindh et al. (LFF) 12 , the universal force field (UFF) 29 and the generalised Amber force field (GAFF) 30 .
Initial configurations were taken from 0.5 ns long molecular dynamics (MD) simulations at 300 K. For transition state search we selected 7 examples (with their initial configurations) from the benchmark of Baker and Chan 31 and three additional systems whose initial configurations were taken from 0.5 ns long MD simulations at 300 K. The computations were performed on the semiempirical PM6 surface 20 .

Molecular crystals.
We compared four different optimisation schemes (unpreconditioned, only FF-based, only Exp-based and Exp + FF-based preconditioners) on five organic molecular crystals (systems XVIII to XXII) whose initial geometries were taken from the Organic Crystal Structure Prediction competition of the Cambridge Crystallographic Data Centre 32,33 . We used a DFT potential energy surface with the PBE exchange-correlation functional 34 with a plane wave basis set using a cutoff energy of 800 eV and ultrasoft pseudopotentials 35 .
Material systems. We also tested how the force field based preconditioner works on two material systems compared to the exponential preconditioner. We examined the unpreconditioned and preconditioned geometry optimisation for bulk silicon and vacancy with varying system size. The potential energy surface was the screened Tersoff potential 36,37 and we used the universal force field (UFF) 29 for building the FF-based preconditioner matrix. The bulk systems were built by using the bulk function of ASE 38 with default lattice constants while for the corresponding vacancy systems a silicon atom was removed. Initial configurations were obtained by applying a random displacement on the atomic positions using normal distribution with standard deviation of 0.05 Å.
Next we considered bulk tungsten and a single interstitial site in bulk tungsten, also using different system sizes. The potential energy surface was a machine learning based Gaussian Approximation Potential (GAP) reproducing the quality of DFT (with PBE functional) 39 . The preconditioner in this case was based on a simple Embedded Atom Method (EAM) potential 40,41 . Initial configurations were obtained in a similar way as the bulk silicon ones but this time a standard deviation of 0.15 Å was applied.
Software. For the semiempirical PM6 method AmberTools16 42 was used. The MP2 potential surface was generated using MOLPRO 13,43 . The screened Tersoff potential was provided by Atomistica 44 . The DFT potentials with the BLYP and PBE functionals were provided by CP2K 45 and CASTEP 46 , respectively, using the QUIP interface 47 . The GAP model was called via QUIP 47 .
In all cases the geometry optimisation was performed within ASE 38 . The other software packages were only used to compute the energy and gradient of the configurations. A Python implementation of the FF and Exp + FF preconditioners with several potential forms of nonbonded terms is available within ASE 38,48

Results
Organic molecules in gas phase. We investigated several organic molecules' geometry optimisations with (FF) and without (ID) our new FF-based preconditioner on three potential energy surfaces (PM6, DFT and MP2), using the GAFF force field for building the preconditioner. The results are shown in Table 1 and Fig. 1. The convergence criterion of the geometry optimisations was ∇ = ∞ − E 10 3 eV Å −1 for DFT and MP2 surfaces while for the relatively inexpensive PM6 potential we applied a slightly tighter threshold of ∇ = ∞ − E 10 4 eV Å −1 . Depending on the system and underlying potential we can observe a 4-10 fold decrease in the required number of optimisation steps using our preconditioner.
For PM6 only, to highlight the correlation between performance gain and ill-conditioning, we also computed the ratio between the condition numbers for the unpreconditioned and preconditioned Hessians, κ I /κ P , at the minima, where  where ∼ H is a modified Hessian where the zero eigenvalue due to symmetries are removed. In Fig. 2 we observe that the computational saving is more strongly correlated to the condition number ratio than to the system size.
For the three smallest systems and the PM6 surface only we also compared our FF-based preconditioner against using the exact Hessian of the model potential (Hessian/GAFF) and a finite-difference Hessian of PM6 (Hessian/PM6) as preconditioners for LBFGS method. Zero eigenvalues of the Hessian matrices were shifted to a moderate positive number to avoid numerical instabilities. The results are collected in Table 2 and Fig. 3. Our FF-based preconditioner clearly outperforms both of these variants.
We also examined the effect of using different force fields from which to construct the FF-based preconditioner. Beside the GAFF force field, we investigated two general force fields: the universal force field (UFF) 29 and a general force field introduced by Lindh et al. (LFF) 12 . The results shown in Table 3 and Fig. 4 indicate that there is no significant difference between the three force fields. We only mention that the LFF force-field includes all possible 2, 3 and 4-body interactions 12 , resulting in a dense preconditioner matrix, which for larger systems and an efficient potential energy surface could become a performance bottleneck. By contrast, the preconditioners based on GAFF or UFF are sparse, hence their cost scales linearly with system size.   Table 3. Comparison of the effect of different force field based preconditioners for the geometry optimisation of organic molecules in gas phase (total number function/gradient calls). Convergence threshold was ∇ = ∞ − E 10 4 eV Å −1 for all cases. Finally, we tested how the different preconditioners perform when applied to transition state search, comparing again against ID (no preconditioning) and against the Exp preconditioner (with default parameter set and μ = 1; unlike LBFGS, CG is invariant under rescaling of μ). The results are collected in Table 4 and Fig. 5. Overall the Exp preconditioner does not improve significantly over ID. We experimented with different parameters, e.g., adding connectivity information up to the 4-body interaction, but observed no improvements. Both FF-based preconditioners are again comparable and yield a much improved convergence even for these relatively small systems. For instance, the gain is already 2-3-fold for dimethyl-phosphate and tyrosine hydrolyses.

Molecular crystals.
We compared geometry optimisation with fixed unit cells using LBFGS, preconditioned with ID (unpreconditioned), Exp 3 and FF (GAFF force field). For Exp the nearest neighbour distance (r nn ) in Eq. 4 was calculated from the initial structure, we specified r cut = 2r nn and A = 3.0. In addition, we also employed the Exp + FF preconditioner as defined in (11). The results for different molecular crystals are shown in Table 5 and Fig. 6. As expected, Exp reduces the number of optimisation steps compared to ID, although the improvement is significantly smaller for molecular crystals that for material systems 3 . Interestingly, FF alone already leads to a significant speed-up over both ID and Exp even though the inter-molecular interaction is not captured well. This indicates that for molecular crystal optimisations preconditioning based on specific intramolecular information is crucial. The most successful method was the Exp + FF combination, which leads to a 3-7 fold speed up even for these relatively small test systems.    3 . We tested geometry optimisation of bulk silicon and a vacancy in bulk silicon with perturbed initial conditions, with increasing system size. The screened Tersoff potential was used as the potential energy, while the FF preconditioner was constructed from UFF. The results are shown in Table 6. In both cases, the FF-based preconditioner yields a clear further speed-up over Exp for both systems.
Another test system was bulk tungsten and a single interstitial site in bulk tungsten with perturbed initial conditions and different system sizes. The potential energy surface was provided by a GAP model that was trained on DFT data. The preconditioner was based on a simple EAM potential: where Φ(r ij ) is a pair potential, F is the embedding function and ρ(r ij ) is the electron charge density contribution from atom j to atom i. Based on Eq. 10 our FF-based preconditioner was defined as where α runs over all ij pairs. In the actual implementation Φ, F and ρ functions are represented by splines so computing the corresponding curvature is fairly straightforward. The results are presented in Table 7. For both the bulk and interstitial systems the number of function/gradient calls of the unpreconditioned optimisation increases with system size while the preconditioned optimisations require almost the same number of optimisation steps to achieve the same convergence criterion.   Table 6. Total number of function/gradient calls geometry optimisation steps of bulk silicon and a bulk silicon vacancy using different preconditioning strategies. Convergence criterion was ∇ = .

System (# of atoms)
Bulk geometry optimisation  Table 7. Total number of function/gradient calls for geometry optimisation of bulk tungsten and interstitial defect in bulk tungsten using different preconditioning strategies. Convergence criterion was ∇ = .

Conclusion
We introduced a flexible preconditioner for molecular simulation based on empirical potentials that are widely implemented in popular molecular mechanical program packages. Our method, which can be considered a generalisation of Lindh et al. 12 , decomposes the analytic Hessian of the empirical potential and modifies individual components to ensure their positivity. An advantage of this procedure is that it avoids the computation of second derivatives of the collective variables (or internal coordinates). The preconditioner yields significant improvements (at least 2 fold, and typically 5 fold decrease in function/gradient calls compared to unpreconditioned techniques), demonstrated thoroughly on a wide range of systems including molecules in gas phase, molecular crystals and materials, using different target potential energy surfaces (empirical, semiempirical and ab initio) as well as different optimisation tasks (geometry optimisations and saddle point searches).