The performance of ensemble-based free energy protocols in computing binding affinities to ROS1 kinase

Optimization of binding affinities for compounds to their target protein is a primary objective in drug discovery. Herein we report on a collaborative study that evaluates a set of compounds binding to ROS1 kinase. We use ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent) and TIES (thermodynamic integration with enhanced sampling) protocols to rank the binding free energies. The predicted binding free energies from ESMACS simulations show good correlations with experimental data for subsets of the compounds. Consistent binding free energy differences are generated for TIES and ESMACS. Although an unexplained overestimation exists, we obtain excellent statistical rankings across the set of compounds from the TIES protocol, with a Pearson correlation coefficient of 0.90 between calculated and experimental activities.

In recent years, a lot of effort has been invested to develop workflows that simplify and automate the process of free energy calculations 7 , which includes steps to plan, set up and execute simulations, and to analyse the final results. A few automated pipelines have been developed for free energy calculations, including commercial ones like FEP+ 8 and Molecular Operating Environment (MOE) 9 , and non-commercial ones like Amber free energy workflow (FEW) 10 , FESetup 11 , FEPrepare 12 , CHARMM-GUI FEP calculator 13 , to name just a few. Our group also developed an automatic, high throughput workflow to run free energy simulations, known as the binding affinity calculator (BAC) 23 . The workflows have found promising applications in academia as well as industry. FEP+ 8 , for example, is the free energy calculation suite from Schrödinger Inc. that has made a significant impact in the pharmaceutical industry. To obtain reliable and reproducible free energy estimation, we have developed two ensemble-based protocols in the last few years, named ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent) 7,14 and TIES (thermodynamic integration with enhanced sampling) 7,15 . ESMACS is an end-point approach based on the molecular mechanics Poisson-Boltzmann surface area method (MMPBSA) 16 , and TIES is an alchemical approach centred on thermodynamic integration (TI). Our group has recently publicly released the comprehensive TIES toolkit (https:// www. ties-servi ce. org/) to automatically setup, execute, and analyse such calculations. We recently undertook a systematic uncertainty quantification (UQ) analysis of molecular simulations and binding free energy estimations, and showed that ensembles are required to acquire robust statistical measures of uncertainty, which address both parametric and stochastic components of MD simulations 17 . The names of our methods emphasise the central importance of the ensemble-based nature of the protocols employed 7,14,18 . Ensemble approaches lead to increased reliability and reproducibility, with tighter control of standard errors revealing the distribution of results that can be obtained 7,[19][20][21][22] . ESMACS and TIES are performed using a binding affinity calculator (BAC) 23 , a computational pipeline designed to automate the end-to-end execution of free energy calculations, and to handle ensemble calculations.
The purpose of the present study is to evaluate the ability of ESMACS and TIES to estimate binding affinities of a set of 150 compounds to the ROS1 protein target. Like many protein kinases, the binding site of ROS1 is well structured, and the binding mode for the scaffold of the congeneric series is clearly defined in the crystal structure. Under such circumstances, it is reasonable to expect that good predictions can be achieved, although rotatable bonds in some of the compounds may pose a challenge for conformational sampling and hence convergence of the predictions.

Methods and simulations
In this section, we first describe the dataset of the compounds and the preparation of the molecular models before explaining the two methodologies used to predict binding affinity values.
Compound set. A set of kinase inhibitor candidates for ROS1 was available from Janssen, which comprises 150 compounds from an internal ROS1 inhibition lead optimization program. The compounds are numbered as per the order in the file provided by Janssen. They are all based on the same chemical scaffold, a tetrahydropyrazolopyrazinone (Fig. 1). The ROS1 activity is provided from a biochemical substrate inhibition assay and a cell proliferation assay. The former is preferred here for comparing with calculated binding free energies. The pIC 50 of the compounds ranged from 5.03 to 8.37, a total of 3.34 log units (no activity was detected for five of the compounds at the highest concentration (10 μM) tested; a pIC 50 of 5.00 was used hereafter). To capture the experimental variability, we extracted all data for repeat bioactivity measurements of the same compound in the same assay. The average standard deviation in pIC 50 for compounds tested at least twice in the ROS1 biochemical  www.nature.com/scientificreports/ assay was 0.12 log units. This is considered an underestimate of the true experimental error which should also include considerations of new compound samples, different assay conditions and laboratories etc. The compounds differ with functional groups added at different positions of the scaffold (Fig. 1), which interact with different regions of the binding pocket 24 . The hinge (hg) position interacts with the adenine region of the pocket, where one hydrogen bond forms with a hinge residue. The functional groups at the hydrophobic (hy) position binds in a hydrophobic pocket. The solvent (s) position interacts with a solvent exposure region, and also partial exposes to the bulk solvent. The atoms at bridge (b) position sit in the middle of the compounds, and interact with a buried region at the bottom of the binding pocket. Modifications at these positions can change the binding affinity and/or the selectivity of the compounds. The compounds can be grouped based on their electrostatic properties (charged and non-charged), or the positions where functional groups are added. The DFG kinase motif is a conserved Asp-Phe-Gly tripeptide at the proximal end of a flexible activation loop (A-loop). Two distinct conformations, DFG-in and DFG-out, are often observed in the kinase structures, which indicate the catalytic states of the kinases. There are two ROS1 X-ray structures in the protein data bank (PDB IDs: 4UXL and 3ZBF), both in an active DFG-in conformation. Janssen have obtained four crystallographic structures. All of them have an inactive DFG-out conformation and are complexed with compounds that share the same scaffold ( Fig. 1).
Model preparation and molecular simulation. Chain A of one of the Janssen structures has been used for the current study, in which ROS1 adopts the DFG-out conformation (Fig. 2). The missing residues in the A-loop are constructed by ModLoop 25 . All water molecules in the pdb file are retained. There are three phosphorylated tyrosine residues in the X-ray structures; they are phosphorylated as well in the simulations. The compounds were docked with the protein using the OpenEye Posit method 26 , drawing on information from the bound ligands to improve pose prediction. All the compounds share the same binding pose at the ROS1 binding site, forming one hydrogen bond with the hinge residue (Fig. 2). The docked structures for all of the compounds have the scaffold well aligned with that in the X-ray structure. They are used as the initial structures for ESMACS simulations. For each of the hybrid compounds in the TIES simulations, the docked position for one of the two compounds is used while additional atoms from another compound are placed accordingly by TIES20 27 (https:// ccs-ties. org) to retain their original conformation. No obvious steric clashes are detected between the hybrid compounds and the protein in any of the initial TIES models.
Preparation and setup of the simulations were implemented using the BAC (binding affinity calculator) 23 , including parametrization of the compounds, solvation of the complexes, electrostatic neutralization of the systems by adding counterions and generation of configurations files for the simulations. Protein parameters were taken from the standard Amber force field for bioorganic systems (ff14SB) 28 . Ligand parameterizations were produced using the general Amber force field 2 (GAFF2). The partial charges of the compounds were generated using AM1-BCC method which has been suggested to be preferable compared with the RESP method 22 .
All ligand-protein complexes were solvated in orthorhombic water boxes with a minimum extension from the protein of 14 Å. The protonation states of the residues were assigned using the "reduce" module of AmberTools 28 . Counterions were added to electrostatically neutralize the systems.

ESMACS.
We use the ESMACS 14 protocol for the free energy calculations. We also used TIES 15 for the prediction of relative free energies, described later. The protocols use ensembles of replicas to obtain reproducible binding affinity estimates with robust uncertainty estimates. ESMACS is based on MMPBSA calculations but incorporates a variety of sampling approaches and entropic calculations 29,30 . Here conformations of the complex, receptor and compound are all extracted from simulation of the complex, a protocol termed as 1traj 31 . This www.nature.com/scientificreports/ is the commonly used approach for the end-point free energy methods. It achieves good convergence because of the cancellation between the noisy terms of the internal energies of the ligand, receptor and complex. Upon binding, however, conformational changes occur for both protein and ligands, which associate with adaptation free energies 14 . They are the energy differences between the free state and the bound state. Studies have shown that the inclusion of adaptation energies clearly improves the predictions of binding free energy ranking for some molecular systems 14,29,32 , while the situation may be more complicated for others 30,31 . The entropic contributions are commonly calculated by normal mode analysis. The computational cost and numerical instability of normal mode approach 33 , however, have motivated alternatives for the entropy calculations, including interaction entropy 34 , weighted solvent accessible surface area 35 , and external entropy correction 33 . Although the inclusion of entropic contribution can bring the estimated free energies closer to more physically realistic values, it fails to improve correlations in most cases 30,31,36,37 . For a rational drug development project, the correct ranking of binding affinities is more important for the selection of compounds for further investigation. In the current study, the entropic contribution is omitted in the ESMACS free energy calculations. The binding free energies of multiple compounds are investigated to the same protein receptor. The adaptation energy can be calculated in relative terms using the average energy of the protein 29 , which is then incorporated into 1traj, a protocol designated as 1traj-ar 31 .
Ensembles of NAMD 38 simulations with 25 replicas were performed, with different initial velocities drawn independently from a Maxwell-Boltzmann distribution. A series of equilibration runs, totalling 2 ns, were conducted, with the restraints on heavy atoms gradually removed. Finally, 4 ns production simulations were run for each replica for all ESMACS and TIES simulations. For the ESMACS free energy calculations, the conformations of the protein and the ligands were extracted from the complex simulations.

TIES.
We use TIES 15 to predict free energy differences for two groups of compound pairs, using compounds L30 and L2 as the reference structures. The two compounds differ by an isopropyl group which extends into a hydrophobic binding pocket (Fig. 2b). TIES20 27 is used to identify the maximum common substructure and to generate the topology and coordinate files for the hybrid compounds. A total of 103 compound pairs have been constructed using the two reference compounds. The compound pairs are chosen so that the atoms in the reference compounds are largely in the common region, with no more than 5 atoms in the alchemical region from each of the compounds. The standard TIES protocol 15, 27 is used, in which an ensemble of 5 replicas is used for each pair of the compounds. The TIES toolkit (https:// ccs-ties. org/) has been used to automate all of these steps for this study.
While the first group mainly focuses on the differences at the hydrophobic site, the compounds in the second group share the same isopropyl group at the hydrophobic site but differ at other functional site(s) (Fig. 1). Compound L2 is in both of the groups, and is one of the best binders in the first group. The grouping indeed represents the common lead optimisation steps: optimising one functional position of the lead to find more promising compounds, and then adding/changing functional groups at other positions.
The majority of TIES simulations-74 out of 103-are performed with NAMD. The remaining 29 pairs are studied with OpenMM 39 . The two MD engines were used to take advantage of the architectures of different supercomputers. We use NAMD on CPU-based ARCHER2 (https:// www. arche r2. ac. uk/) and SuperMUC-NG (https:// doku. lrz. de/ displ ay/ PUBLIC/ Super MUC-NG), and use OpenMM on the GPU partition of Summit (https:// www. olcf. ornl. gov/ summit/). Our previous studies have shown that, for the same molecular systems, consistent free energies are obtained not only from different supercomputers using the same MD engine-NAMD in this case 32 , but also from different MD engines on different supercomputers 21, 40 -NAMD 38 running on CPUs and OpenMM 39 or pmemdGTI 41 running on GPUs.

ESMACS results.
A moderate correlation, r = 0.33, is obtained for the entire set of the compounds (Fig. 3).
The predicted binding free energies appear to be more negative for the charged compounds. These compounds have no obvious differences with the electrostatically neutral ones in their structural and dynamic behaviours. The inclusion of receptor adaptation energy in the 1traj-ar approach causes the calculated binding free energies to be scattered more widely, with a reduced correlation coefficient (r = 0.22).
The scaffold is aligned well for all the compounds in their initial structures; it has the same pose as shown in Fig. 2. For the compounds with similar experimental binding affinities, the free energies of the charged ones are more favourably predicted than the electrostatically neutral ones (Fig. 3). A good correlation is obtained for the charged compounds, with a correlation coefficient of 0.84. This is largely due to the correct predictions of the ranking for the two compounds with less favourable binding affinities (Fig. 3). The correlation coefficient exhibits no improvements when the charged compounds are excluded, having the same correlation coefficient of 0.33 but a reduced covariance from 1.37 to 1.11 kcal/mol. The weak correlation is due to the wide spread of the predictions for the compounds with less favourable binding affinities. More negative predictions are made for some of these compounds (Fig. 3), which are caused by more negative van der Waals contributions. It should be noted that, because of the exclusion of entropic contributions, the predicted binding free energies are much more negative than the experimental measurements. The inclusion of entropy terms typically reduces errors but does not improve the correlations, as noted in previous studies 30,31,36,37 .
Based on the positions and the chemical properties of the modifications made to the scaffold (Fig. 1, Table 1), the compounds are clustered into different groups. The grouping is based on the following categories: (1) ether variation at position hy in Fig. 1, (2 Table 1), indicating that long simulations may be needed to fully sample the conformational changes. The performance of the predictions for different subgroups is in agreement with the criteria we have proposed 42 for the quality of predictions, including how well a compound fits into the binding pocket and how many rotamers a compound may manifest. While the inclusion of adaptation energies does not improve the overall correlation (Fig. 3b) (1)(2)(3)(4) or combinations of these categories. 14, for example, is a subgroup of compounds with modifications of (1) ether variation and (4) lactam substituent variation. "1 + 14" is a subset of compounds from subgroups 1 and 14. L30 is labelled as the only member in subgroup 0. Only electrostatically neutral compounds are included. www.nature.com/scientificreports/ the ensemble approach is employed. Here we evaluate the performance of TIES protocols in reproducing the experimental data for the dataset.
Comparison of predictions with experimental data. A very good correlation, with a correlation coefficient of 0.90, is obtained between the calculated free energy differences and the experimental data (Fig. 5a). In the subgroup with compound L30 as the reference, most of the predicted binding free energy differences are negative, indicating that the modifications to the compound are likely to improve the binding potencies (ΔΔG < 0). The calculations correctly predict compound L2 (Fig. 1) as one of the best binders. As L2 already has a very favourable binding free energy (−10.49 kcal/mol), it is not surprising that most of the modifications from it deteriorate the binding affinities (ΔΔG > 0). For all of the 103 compound pairs, 94% of the TIES predictions agree directionally with the experimental observations, that is, the calculated binding free energy differences have the same sign as those from experimental measurements. The relative binding free energy differences from TIES agree well with those from ESMACS (Fig. 5c), with a correlation coefficient of 0.62. The predicted ΔΔG values for the two subgroups, paired with L30 and L2, are clearly distinct from both of the protocols. The predicted ΔΔG results from ESMACS also have a good correlation  www.nature.com/scientificreports/ with the experimental data, with a correlation coefficient of 0.67 (Fig. 5b). We should again point out that because of the nature of the approximations used, such as use of an implicit solvent model and the neglect of the conformational entropy contributions, the ESMACS approach does not provide accurate absolute (Fig. 3) or relative (Fig. 5b) binding free energies. However, it often yields precise binding affinity rankings. As an alchemical approach, TIES is in principle both accurate and precise in its domain of applicability. The calculated binding free energies are therefore directly comparable with those from experimental measurements, as evidenced by the same scales of the axes in Fig. 5a. It should be noted, however, that an apparent overestimation is observed for some of the compound pairs involving L30, making the predicted free energy differences larger than those from experiments. An MSE of −1.36 kcal/mol is obtained for the L30 subgroup, compared with an MSE of 0.16 kcal/mol for the subgroup with compound L2 as the reference. The apparent overestimations are investigated further in the following sections.
Overestimation from closed cycle mutations. There are 17 compounds which are paired with both L30 and L2 (Fig. 6). This makes it possible to check the performance of the cycle closure method. For the transformations starting from L2 (L2 → L* in Fig. 6a), the average free energy change from TIES calculations agrees well with the experimental data (Fig. 6b). The transformation L30 → L2, however, overestimates the difference by 1.28 kcal/ mol. The two-step transformation for all 17 compounds (L30 → L2 → L*) generates an averaged overestimation of 1.17 kcal/mol, similar to that observed in L30 → L2 transformation. Compounds L* and L2 (Fig. 6) have the same/similar alkyl isopropyl substituent at the hydrophobic position (Fig. 1), while L30 does not; transformations L30 → L* therefore share the same change as that in L30 → L2 transformation. There is a similar overestimation in L30 → L* (−0.99 kcal/mol) as that in L30 → L2, which is likely to be rooted in the same alchemical change of growing an alkyl group from L30.
The legs L30 → L2, L2 → L* and L* → L30 generate an average difference of 0.17 kcal/mol within the closed cycle (L30 → L2 → L* → L30, Fig. 6) between the calculations and the experiments. As we have stated previously 40 , a hysteresis value of 0 is a necessary but not sufficient condition for convergence of predictions. An overall hysteresis near 0 for a closed cycle is not necessarily an indication of convergence for each leg; it may be a result of cancellation from individual legs. It is understandable that extending an alkyl group into a hydrophobic binding pocket (Fig. 2b) makes the binding more favourable, about which TIES and experiment agree with each other directionally. It is likely that the force field and/or the sampling contribute to the differences between calculations and experimental measurements. All simulations were initiated from the same X-ray structure in which an isopropyl group is present in the bound ligand. The removal of the group (Fig. 1) would be expected to induce some conformational collapse or solvation of the hydrophobic subpocket (Fig. 2). The subpocket is formed by a few hydrophobic residues including Leu2000, Phe2004, Ile2009, Leu2070, Phe2075, His2077 and Ile2100. The root-mean-square deviations of these residues from the X-ray structure show no overall differences for the simulations of L2 and L30. Therefore, despite the variability in the ligand structure, no significant changes are observed in the shape and size of the subpocket from the simulations of the two ligands. This indicates that the current sampling regimen does not permit protein conformational adaptations for ligands without the isopropyl group in the ESMACS and TIES simulations. This could explain the predicted less favourable binding and hence larger differences in the binding free energies (Figs. 5a, 6) between the ligands with and without the isopropyl group. It is unlikely though that ROS1 needs to adopt a DFG-in conformation to accommodate the compounds containing the -CH group, as the DFG motif directly interacts with the scaffold of the compounds and retains them in the same binding mode (Fig. 2).
Ligand pairs sharing the common change. 50 pairs have been identified which involve a common variation between the two compounds in the pairs, changing from a -CH group to -C-CH-(CH 3 ) 2 at the hydrophobic site. This is the same change as that between L30 to L2 (Fig. 1). Simulations for the pair get an overestimation of 1.28 kcal/mol for the binding free energy difference: −4.16 kcal/mol from TIES vs −2.88 kcal/mol from experiment. For these 50 pairs, an averaged binding free energy difference of −3.82 kcal/mol is obtained from TIES calculations, while an average of −2.42 kcal/mol from experimental data. A recent study has proposed a combi- www.nature.com/scientificreports/ nation of machine learning (ML) and alchemical free energy calculation to improve the accuracy of free energy predictions 43 . The ML method is used to learn the differences between the calculations and the experimental data and derives a correct term which brings the two closer. Such an approach may identify and subsequently correct the discrepancies as observed here. It should be noted that the ML approach used in the study 43 only learns the differences between the calculations and experiments, with no attempt to identify the source of the discrepancies. Studies have shown that a large set of typical chemical modifications lead to a nearly normal distribution of binding energy differences 44 . For individual modifications, however, a wide and non-Gaussian distribution has been observed for experimental binding free energies from a large number of independent measurements 42 . Such non-normality has also been reported in numerous ensemble simulations when large sample sizes are used 20,22,27 . These observations indicate that the assumption of Gaussian statistics may not be justified 20,42 . If, on average, these 50 compound pairs have the same discrepancy as that in L30 → L2 transformation, the systematic differences might be collectively "corrected". If we apply an offset and shift up all of the TIES results by 1.28 kcal/ mol, the calculations agree well with the experimental data, with an MSE reducing from −1.40 to −0.12 kcal/mol (Fig. 7). As stated above, this offset could be associated with incomplete sampling of the protein conformational movement that is required to fully accommodate the change from a -C-CH-(CH 3 ) 2 group, which is present in the initial X-ray structures, to a -CH group in the region of the A-loop.

Conclusion
Using the TIES and ESMACS protocols, we have computed the binding free energies of a series of ligands interacting with ROS1 kinase. Good correlations are obtained from our ESMACS study for subsets of the compounds. An excellent statistical ranking is generated from the TIES study, with a correlation of 0.90 between the predictions and the experimental measurements. Moreover, TIES and ESMACS agree with one another in the prediction of free energy differences. An unexplained overestimation is observed in the TIES study for one subset of compound pairs which share a common difference between the pairs. For the transformations to which cycle closure methods can be applied for the error estimation, a cancellation has been observed as the errors are similar for specific legs within the cycle. This results in an overall hysteresis near 0 for a closed cycle, but significant errors remain for individual legs. While continuing improvements in the force fields and the application of statistically robust ensemble methods are essential to generate increased accuracy for the physics-based alchemical approaches, machine learning methods may help to identify discrepancies between the predictions and experimental data and potentially reduce those differences, providing sufficient quantities of reliable training data are available that incorporate information on the way such data are distributed. Despite the existence of this systematic discrepancy, an excellent correlation is obtained in the free energy differences between the calculations and the experiments, showing once again that the TIES approach can be used in prospective drug discovery projects to predict the changes of binding free energies in the lead optimisation process 29, 32, 37 .

Data availability
The structures for the ROS1 protein and the compounds, and Amber format topologies and PDB-format coordinates for the compound pairs, are freely available at http:// doi. org/ 10. 23728/ b2sha re. 89da4 77f82 8048d d8332 72b51 360d1 3a.  Original and "corrected" binding free energies from TIES calculations, compared with the experimental data. The correction is made by assuming a systematic overestimation is introduced by -CH to -C-CH-(CH 3 ) 2 transformation which is involved in all of these pairs.