Introduction

Understanding the energetics of solvation is a fundamental part of describing biophysical processes. The liquid state properties are important in their own right, play a key role in battery design, and are a major part of more structured biological ensembles: e.g., protein shape and behavior, protein–ligand complexes and cell membranes. Because of the overwhelming complexity of ab initio calculations the underlying quantum mechanics must be represented by Newtonian models. The art and science of simulating these systems have been in development since the 1960’s1,2 and many force fields that describe proteins and other functional groups have been created and are widely used. However, state-of-the-art wide-coverage molecular force fields3,4,5,6,7,8,9 in simulation packages that enable free energy computations derive some or all of their parameters by fitting to empirical observables. There are at least two drawbacks to this approach. First, even available experimental data (e.g., densities, heats of vaporization) are insufficient to produce models that describe existing compounds precisely; and there will always be molecules (that, for example, haven’t been synthesized) that will require more precise description than is available from existing inference. Second, if an empirical model’s prediction is erroneous, it is exceedingly difficult to decide exactly which parameter(s) to remove, add, correct or adjust. A major advantage of Quantum Mechanical (QM)-parametrized physics-based molecular models (force fields)10,11 is that, with some caveats for molecular size, QM calculations12 can be obtained for arbitrary molecules. Another advantage is that prediction errors can be traced to the imprecise description of the interaction energies and rectified in the model. It is therefore highly desirable to create models parameterized entirely from first-principles (ab initio) Quantum Mechanical calculations.

The value of ±0.5 kcal/mol for the desired (“chemical”) accuracy of free energy predictions arises from several considerations. First and foremost, 0.59 kcal/mol is the thermal noise at ambient conditions (room temperature and pressure). This is the inherent fuzziness of our everyday biological world. Additionally, for example, in ligand–protein lead optimization the definition of “incremental lead improvement” is about 0.5 kcal/mol or ~2–3-fold increase in binding affinity.

We have implemented a QM-parametrized force field in a simulation stack that covers arbitrary organic molecules and predicts solvation free energies of molecular systems to accuracy of ~0.3 kcal/mol for neutral species. The predictions in the liquid phase are satisfyingly accurate, and it is also satisfying that the model is created solely from ab initio computational methods without fitting to any experimental data. We demonstrate the predictive ability of the model and simulation machinery by computing solvation free energies for a wide range of chemical functional groups in water and cyclohexane.

Results

QM-FF agreement

We start by creating a model that represents the QM energies of the ensemble accurately enough. A description of the intermolecular functional form, the component decomposition, and the parametrization procedure is in Supplementary methods (Quantum mechanical details, force field description, force field functional form of ARROW FF, and parameter fitting), Supplementary Fig. 1 and in references8,13. Though models of isolated chemical species with exquisite agreement to QM energies do exist14,15, the complexity required by such precision has prevented researchers from describing arbitrary functional groups simultaneously. One of the contributions of this work is determining the degree of faithfulness that is sufficient for modeling the liquid phase of arbitrary organic molecules and mixtures while keeping the model complexity manageable.

The first step is choosing the level and accuracy of the underlying QM computations. We fit the intermolecular interactions to dimer and select multimer QM energies at the highest level of theory practical for large-scale parameterization. This “silver-like standard”16 is commonly used as a benchmark in the computational chemistry community, and is within 0.05 kcal/mol from the “gold standard”16. More details can be found in Supplementary methods (Quantum Mechanical details).

The next step is encapsulating the QM interaction energies in a physics-based analytical model8,13. The required faithfulness demands a significant level of complexity from the functional form: polarizability terms enable proper transferability from dimer to bulk energies17; multipole descriptions of both the electrostatic18 and exchange-repulsion interactions permit a precise fit of the potential energy surface for all dimer orientations8,19; a fairly detailed typification accounts for the difference in interaction properties of identical atoms in diverse chemical environments. The force field description including the functional form, and the parametrization workflow and pseudo-code, are discussed in detail in the Supplementary methods. The deviation (MAE) between Quantum mechanical (QM) and force field (FF) energies for all the benchmark dimers and multimers in our training set is 0.17 kcal/mol and the error distribution is centered around zero (Fig. 1a, b, e, f, g). In Fig. 1c, dwe illustrate the QM-FF agreement for a single representative system, a strongly interacting ethanol-water dimer. Additionally, the FF:QM errors for ethanol-water dimers as a function of closest distance are shown in Supplementary Fig. 2. Both the total energies as well as their individual components for this system agree to within 0.1 kcal/mol to their ab initio counterparts. To aid transferability, in addition to reproducing the total energy, we also match the individual components to their corresponding QM counterparts (Fig. 1c, d). To investigate the training-test convergence dependence of dimer space on our force field parameters we conducted this test on a subset of molecules and convergence plots are presented in Supplementary Fig. 4.

Fig. 1: QM: FF energies’ correspondences and deviations.
figure 1

a FF vs. QM energies for all the dimers in our training sets. The functional form reproduces the lower energy conformations very well and is designed to permit a larger error in less important high-energy high electron overlap regions. b the distribution of errors for our training dimer sets. The MAE of errors are 0.17 kcal/mol for all, 0.19 kcal/mol for dimers with water (total number of dimers = 36,309), and 0.16 kcal/mol for dimers with alkanes (total number of dimers = 25,986): A specific system (ethanol-water) provides a more detailed illustration of model energies and their correspondence with QM. c, d dissociation curves for primary (c) and secondary (d) minima of the ethanol-water dimer. QM energies are solid lines and FF values are filled circles. The colors designate the energy components: electrostatics (ES), exchange-repulsion (EX), dispersion (DS) and induction (IND). The agreements for the total energy and for each component are excellent. eg Error distributions for the ethanol-water dimer: e is analogous to a; f is a difference plot offering a more detailed view and is projected onto (g) the error distribution. The MAE for the errors in this system is 0.08 kcal/mol.

Molecular deformations (“bonded interactions”), especially torsions, are critical for correct solvation results because they determine the proper solvent accessibility. A variety of accurate models long established in the field3,5,6 as well as brilliant recent developments20,21 provide excellent reproduction of the intramolecular energies. We take the functional form of the bonded interactions from MMFF943), with force constants and equilibrium values fitted to QM energies at the MP2/aug-cc-pVTZ level of theory.

Solvents

We selected water and cyclohexane as our solvents for this benchmark study. Water, of course, is the most ubiquitous molecule in any biophysical model. We chose cyclohexane because it is nonpolar, it equilibrates relatively quickly, and because there is ample reliable experimental data for both cyclohexane (CHEX) solvation free energies, as well as for the cyclohexane/water (CHEX/H2O) partition coefficients. Though the two molecules were parameterized with exactly the same procedure as every other functional group, they participate in bulk and thus warrant extra examination of their liquid-state properties.

The liquid phase must properly model not only the molecular 2-body interactions described in the previous section, but also the many-body contributions. For water, which is small, polar, and polarizable, the many-body energies are estimated to be a sizable 27% of the total22. Figure 2a shows the non-additive energies of select optimized water multimers. Additionally, we also show the non-additive behavior in the case of ethanol–water multimers, see Supplementary Fig. 3. They are in excellent agreement with their reference QM values, confirming that the energy partitioning and the induction terms of our polarizable model capture the non-additive fraction properly.

Fig. 2: Properties of the ARROW water model.
figure 2

a The non-additive many-body error for water multimers vs. their total QM intermolecular energy. All the many-body errors are below 0.5 kcal/mol or 1% of total energy, and below 3% of the many-body contributions. b The radial distribution function for the O–O distance in water. The MD RDF (dotted green) is over-structured compared with the experimental curve, and the presence of NQE (solid green) brings the structure of ARROW H2O in excellent agreement with the experimental one.

Biological systems exist mostly at room temperature and pressure, where the shifting interplay between enthalpy and entropy enables the immense variety of biological phenomena. Therefore, it is the free energies of ensembles that are both the most useful and interesting and also the most difficult to predict correctly, and what we focus on here. For solvation, in addition to capturing the enthalpy of interaction with itself and the solute, a solvent model must also reproduce the entropic effects of pushing aside and reordering molecules to create a cavity for placing the solute. This is especially important for water as it is small, highly polar, and, though called a liquid, is highly structured at room temperature and pressure. In Table 1 we list three bulk properties of our solvents: density, heat of vaporization (Hvap) and the highly informative self-solvation. The values for water agree with experimental values to within 3% or better. Additional proof that our model has captured the free energy of cavity creation in water accurately is that the hydration of anthracene, a large, non-polar molecule, is correct to within 2% (0.1 kcal/mol) (Supplementary Data 1a). The derivative of the system Hamiltonian with respect to the alchemical reaction coordinate (<dH/dλ) for desolvation of anthracene in water and its accumulated statistical errors are shown in Supplementary Fig. 5. The cyclohexane predictions are slightly less accurate for two reasons: 1) it is a larger molecule so per heavy atoms the energetics are actually very good and 2) we designated its atoms to be the same atom type(s) as linear alkanes (unlike those of smaller, strained, cyclic alkanes), which introduces a slight discrepancy with QM energies. Nonetheless, the bulk energetics of cyclohexane are well within our target accuracy of 0.5 kcal/mol.

Table 1 Neat properties and hydration/solvation of water and cyclohexane.

Finally, an excellent measure of how well liquid structure is captured by a model is the radial distribution function (RDF). In Fig. 2b we demonstrate that the ARROW FF reproduces the experimental water oxygen–oxygen (O–O) RDF and, therefore, describes the order of water very well. Additionally, we show that employing eight beads reaches sufficient convergence for the free energies and structural properties (see Supplementary Figs. 7 and 8 and Supplementary Table 4). The agreements for both neat properties (Table 1) and water structure (Fig. 2b) are significantly improved by including NQE13,15,23. Satisfyingly, the small errors in initial model parameterization are not amplified through the chain of model construction and simulation machinery.

Solutes and solvation predictions

We chose representative solutes containing all of the common neutral chemical functional groups: carboxylic acids, alkanes, alkenes, aromatics, aldehydes, ketones, alcohols, amides, esters, thiols, sulfides, disulfides, and heterocycles24. The simulations were performed independently by four groups using their own respective computational resources and architectures, and then averaged. The graphical summaries of the solvation and hydration free energies predictions’ are in Fig. 3a, b, and we list the results for each molecule in Supplementary Data 1a. We also provide the free energy results as reproduced by our collaborators in Supplementary Data 1d. Because aqueous protein and protein–ligand systems are of special importance, and because accurate prediction of solvation and desolvation of amino acids is critical for modeling of these systems25, we highlight the results for neutral amino-acid analogs separately (Fig. 3a inset), see Supplementary Data 1b for raw data. The partition coefficient is a valuable measure of the model’s simultaneous compatibility with both polar (e.g., aqueous) and non-polar (e.g., membranes and proteins) environments which is crucial for describing bio-molecular systems, and we show it in Fig. 3c.

Fig. 3: ARROW force field solvation predictions.
figure 3

a Predicted vs. experimental free energy of hydration for a diverse set of compounds. The straight line is a line of perfect agreement between experimental and theoretical values, and the gray bar is the range of chemical accuracy. The predicted and experimental free energies of hydration for neutral amino acid analogs are inset. b Predicted vs. experimental free energy of solvation in cyclohexane. The molecules here are a subset of those in a because only those with experimental values for CHEX solvation can be included. c H2O/CHEX partition coefficient for the same set as b. The free energy predictions are well within chemical accuracy.

The proper art of simulation26,27 is also essential for obtaining accurate predictions. Accurate treatment of long range electrostatic (e.g., Particle Mesh Ewald28,29) and dispersion27 interactions, proper thermodynamic modeling (thermostats and barostats)30,31, enhanced sampling techniques and the Path Integral formulation of nuclear motion32,33,34 all help to translate the FF-QM agreements to correct free-energy values. Further details are provided in Supplementary methods (Simulation details and protocols). We also provide computational performance of the ARROW FF stack for CPU and CPU+GPU implementations for both classical and path-integral simulations in Supplementary Table 2.

The error (MAE) for the free energies of hydration is 0.2 kcal/mol and for the neutral amino-acid subset is 0.23 kcal/mol. The largest hydration errors seen for o-cresol and 3-methyl-indole are only ~1 kcal/mol. For solvation in cyclohexane the MAE is 0.3 kcal/mol, and for the partition coefficient it is 0.22 log units. These predictions are very good: most are within experimental and simulation uncertainty, and are uniformly correct across a diverse range of chemical groups of varying sizes and interaction strengths.

We recently highlighted the importance of including NQE when modeling alkanes13,35. The results presented in this manuscript suggest that NQE must be taken into account for precision calculations for all molecular systems. We illustrate this in Fig. 4a where we plot the hydration predictions of classical simulations alongside those performed with PIMD. Proper accounting of the quantum nature of nuclear motion systematically shifts the predictions towards the experimental values and improves the prediction error from MAE of 0.78 to 0.2 kcal/mol.

Fig. 4: NQE effect and comparison of hydration predictions.
figure 4

a A visual comparison of the hydration predictions vs. experimental values for PIMD8 vs. classical MD values. The inclusion of NQE systematically improves the predictions and decreases the overall error (MAE) from 0.78 to 0.2 kcal/mol. b A comparison of the hydration free energies to state-of-the art wide coverage Force Fields. The molecules shown include the major functional groups that have been parametrized by all three models and are therefore a subset of those shown in Fig. 3a. The errors (MAE) for GAFF, AMOEBA, and ARROW FF are 0.88, 0.76, and 0.22 kcal/mol, respectively.

Comparison with other force fields

The main advance reported in this paper is three-fold: our model is a wide-coverage force field and simulation stack parameterized exclusively from QM data which produces accurate predictions. It is of interest to gauge the relative performance of ARROW FF to existing wide-coverage state-of-the-art models for prediction accuracy. Most of the QM-parameterized FF’s10,36 are not currently enabled in a simulation stack which produces free energy predictions, so we selected two widely-used empirical models to compare with. One is GAFF6, a representative of the many fixed-charge models, and the other is a polarizable model AMOEBA9,18. To avoid reproduction discrepancies the comparison is made on the available published subset of functional groups and is plotted in Fig. 4b. The MAE’s for this subset are, respectively, 0.88 (GAFF AM1-BCC)37, 0.76 (AMOEBA)9 and 0.22 (ARROW) kcal/mol. A list of molecules and their predicted hydration values for each model is in Supplementary Data 1c. Additionally, in Supplementary Data 1e, Supplementary Table 3, Supplementary Fig. 6, and Supplementary methods (Comparison to Implicit solvent models and Machine learning models) we summarize and discuss the comparative performance of several excellent tools from a variety of methodologies that focus specifically on prediction of solvation energies38,39. In Supplementary Data 1f we also show the QM-FF agreement of ARROW FF on the S22 and S66 datasets as well as a comparison with the same for geometry, frequency, non-covalent force field (GFN-FF)11,39, the MAE’s for such datasets can be found in Supplementary Table 1.

We have shown that a QM-parametrized, physics-based force field embedded in a simulation and analysis stack predicts the free energies of solvation of arbitrary organic molecules to an accuracy better than thermal noise at room temperature (±0.5 kcal/mol). The correspondence from quantum mechanics to ensemble predictions is established via several important links. First, the benchmark QM calculations need to be of sufficient accuracy. Second, the model should provide a faithful description of the QM potential energy surface (PES), which imposes a significant yet computationally manageable level of complexity on the functional form. Third, the established art of molecular ensemble averaging must be performed with care. Finally, the dynamics of sampling the system should account for nuclear quantum effects. The ARROW FF is likely at the limit of complexity feasible for a wide-coverage analytical force field, and so it is satisfying that this model results in excellent prediction of properties in the liquid phase.