Introduction

Classical molecular dynamics (MD) simulations in principle allow us to describe biomolecular processes in atomistic detail1. Prime examples include the study of protein complex formation2 and protein–ligand binding and unbinding3,4, which constitute key steps in biomolecular function. Apart from structural analysis, the prediction of kinetic properties has recently become of interest, since optimized ligand binding and unbinding kinetics have been linked to an improved drug efficacy5,6,7,8,9. Since these processes typically occur on timescales from milliseconds to hours, however, they are out of reach for unbiased all-atom MD simulations which currently reach microsecond timescales. To account for rare biomolecular processes, a number of enhanced sampling techniques10,11,12,13,14,15,16,17,18 have been proposed. These approaches all entail the application of a bias to the system in order to enforce motion along a usually one-dimensional reaction coordinate x, such as the protein–ligand distance.

While the majority of the above methods focuses on the calculation of the stationary free energy profile ΔG(x), several approaches have recently been suggested that combine enhanced sampling with a reconstruction of the dynamics of the process19,20,21. In this vein, we recently proposed dissipation-corrected targeted MD (dcTMD), which exerts a pulling force on the system along reaction coordinate x via a moving distance constraint22. By combining a Langevin equation analysis with a cumulant expansion of Jarzynski’s equality23, dcTMD yields both ΔG(x) and the friction field Γ(x). Reflecting interactions with degrees of freedom orthogonal to those which define the free energy, the friction accounts for the dynamical aspects of the considered process. In this work, we go one step further and use ΔG(x) and Γ(x) to run Langevin simulations, which describe the coarse-grained dynamics along the reaction coordinate and reveal timescales and mechanisms of the considered process. Moreover, we introduce the concept of “temperature boosting” of the Langevin equation, which allows us to speed up the calculations by several orders of magnitude in order to reach biologically relevant timescales.

Results

Dissipation-corrected targeted molecular dynamics

To set the stage, we briefly review the working equations of dcTMD derived in22. TMD as developed by Schlitter et al.24 uses a constraint force fc that results in a moving distance constraint x = x0 + vct with a constant velocity vc. The main assumption underlying dcTMD is that this nonequilibrium process can be described by a memory-free Langevin equation1,

$$m\ddot{x}(t)=-\frac{{\rm{d}}G}{{\rm{d}}x}-\varGamma (x)\dot{x}+\sqrt{2k_{\rm{B}}T\varGamma (x)}\ \xi (t)+{f}_{{\rm{c}}}(t),$$
(1)

which contains the Newtonian force  −dG/dx, the friction force \(-\varGamma (x)\dot{x}\), as well as a stochastic force with white noise ξ(t), that is assumed to be of zero mean, 〈ξ〉 = 0, delta-correlated, \(\langle \xi (t)\xi (t^{\prime} )\rangle =\delta (t-t^{\prime} )\), and Gaussian distributed. Since the constraint force fc imposes a constant velocity on the system (\(\dot{x}={v}_{{\rm{c}}}\)), the total force \(m\ddot{x}\) vanishes. Performing an ensemble average 〈…〉 of Eq. (1) over many TMD runs, we thus obtain the relation22

$$\Delta G(x)=\langle W(x)\rangle -{v}_{{\rm{c}}}\int_{{x}_{0}}^{x}\varGamma (x^{\prime} )\ {\rm{d}}x^{\prime} .$$
(2)

Here the first term \(\langle W(x)\rangle =\int_{{x}_{0}}^{x}\langle {f}_{{\rm{c}}}(x^{\prime} )\rangle \ {\rm{d}}x^{\prime}\) represents the averaged external work performed on the system, and the second term corresponds to the dissipated work Wdiss(x) of the process expressed in terms of the friction Γ(x).

While the friction in principle can be calculated in various ways25,26, it proves advantageous to invoke Jarzynski’s identity23, \(e^{-\Delta G(x)/k_{\mathrm{B}}T}=\langle e^{-W(x)/k_{\mathrm{B}}T}\rangle\), which allows us to calculate Γ(x) directly from TMD simulations. To circumvent convergence problems associated with the above exponential average27, we perform a second-order cumulant expansion which gives Eq. (2) with \({W}_{{\rm{diss}}}(x)=\left\langle \delta {W}^{2}(x)\right\rangle /k_{\rm{B}}T\). Expressing work fluctuations δW in terms of the fluctuating force δfc, we obtain for the friction22

$$\varGamma (x)=\frac{1}{k_{\rm{B}}T}\int_{{t}_{0}}^{t(x)}\left\langle \delta {f}_{c}(t)\delta {f}_{c}(t^{\prime} )\right\rangle {\rm{d}}t^{\prime} ,$$
(3)

which is readily evaluated directly from the TMD simulations.

As discussed in ref. 22, the derivation of Langevin Eq. (1) assumes that the pulling speed vc is slow compared to the timescale of the bath fluctuations, such that the effect of fc can be considered as a slow adiabatic change28. This means that the free energy Eq. (2) and the friction Eq. (3) determined by the nonequilibrium TMD simulations correspond to their equilibrium results. As a consequence, we can use ΔG(x) and Γ(x) to describe the unbiased motion of the system via Langevin Eq. (1) for fc = 0. Numerical propagation of the unbiased Langevin equation then accounts for the coarse-grained dynamics of the system. In this way, calculations of ΔG(x) and Γ(x) as well as dynamical calculations are based on the same theoretical footing (i.e., the Langevin equation), and are therefore expected to yield a consistent estimation of the timescales of the considered process. Moreover, the exact solution of the Langevin equation allows us to directly use the computed fields ΔG(x) and Γ(x) and thus to avoid further approximations29.

The theory developed above rests on two main assumptions. For one, we have assumed that the Langevin Eq. (1) provides an appropriate description of nonequilibrium TMD simulations, and applies as well to the unbiased motion (fc = 0) of the system. This means that, due to a timescale separation of slow pulling speed and fast bath fluctuations, the constraint force fc enters this equation merely as an additive term. Secondly, to ensure rapid convergence of Jarzynski’s identity, we have invoked a cumulant expansion to derive the friction coefficient in Eq. (3), which is valid under the assumption that the distribution of the work is Gaussian within the ensemble. While this assumption may break down if the system of interest follows multiple reaction paths, we have recently shown that we can systematically perform a separation of dcTMD trajectories according to pathways by a nonequilibrium principal component analysis of protein–ligand contacts30. This approach bears similarities with the work of Tiwary et al. for the construction of path collective variables31. Alternatively, path separation can be based on geometric distances between individual trajectories, making use of the NeighborNet algorithm32. Details on the convergence of the free energy and friction estimates, the path separation, and the choice of the pulling velocity are given in the Supplementary Methods and in Supplementary Figs. 14.

T-boosting

The speed-up of Langevin Eq. (1) compared to an unbiased all-atom MD simulation is due to the drastic coarse graining of the Langevin model (one instead of 3N degrees of freedom, N being the number of all atoms). Since the numerical integration of the Langevin equation typically requires a time step of a few femtoseconds (see Supplementary Table 1), however, we still need to propagate Eq. (1) for 100 × 1015 steps to sufficiently sample a process occurring on a timescale of seconds, which is prohibitive for standard computing resources.

As a further way to speed up calculations, we note that the temperature T enters Eq. (1) via the stochastic force, indicating that temperature is the driving force of the Langevin dynamics. That is, when we consider a process described by a transition rate k and increase the temperature from T1 to T2, the corresponding rates k1 and k2 are related by the Kramers-type expression29

$${k}_{2}={k}_{1}{{\rm{e}}}^{-\Delta {G}^{\ne }({\beta }_{2}-{\beta }_{1})},$$
(4)

where ΔG denotes the transition state energy and βi = 1/kBTi is the inverse temperature. Hence, by increasing the temperature we also increase the number n of observed transition events according to n2/n1 = k2/k1.

To exploit this relationship for dcTMD, we proceed as follows. First we employ dcTMD to calculate the Langevin fields ΔG(x) and Γ(x) at a temperature of interest T1. Using these fields, we then run a Langevin simulation at some higher temperature T2, which results in an increased transition rate k2 and number of events n2. In particular, we choose a temperature high enough to sample a sufficient number of events (N 100) for some given simulation length. In the final step, we use Eq. (4) to calculate the transition rate k1 at the desired temperature T1.

As Eq. (4) arises as a consequence29 of Langevin Eq. (1), the above described procedure, henceforth termed T-boosting, involves no further approximations. It exploits the fact that we calculate fields ΔG(x) and Γ(x) at the same temperature for which we eventually want to calculate the rate. We wish to stress that this virtue represents a crucial difference to temperature accelerated MD33. In the latter method the free energy ΔG(x) is first calculated at a high temperature and subsequently rescaled to a desired low temperature, whereupon ΔG(x) in general does change. T-boosting avoids this, because by using dcTMD we calculate ΔG(x) right away at the desired temperature. We note in passing that a Langevin simulation run at T2 using fields obtained at T1 in general does not reflect the coarse-grained dynamics of an MD simulation run at T2, but can only be used to recover k1 from k2.

In practice, we perform T-boosting calculations at several temperatures T2 in increments of 25 K to 50 K and choose the smallest T2 such that N 100 transitions occur. In the Supporting Methods we derive an analytic expression of the extrapolation error as a function of boosting temperatures and achieved number of transitions, from which the necessary length of the individual Langevin simulations can be estimated, in order to achieve a desired extrapolation error. One-dimensional Langevin simulations require little computational effort (1 ms of simulation time at a 5 fs time step take ~6 h of wall-clock time on a single CPU) and are trivial to parallelize in the form of independent short runs. Hence the extrapolation error due to boosting can easily be pushed below 10% and is thus negligible in comparison to systematic errors coming from the dcTMD field estimates. As shown in Supplementary Table 1, a further increase in efficiency can be achieved if the considered dynamics is overdamped, which is the case for both protein–ligand systems. Since overdamped dynamics neglects the inertia term \(m\ddot{x}\) and therefore does not depend on the mass m, we may artificially enhance the mass in the Langevin simulations. For the protein–ligand systems, this allows us to increase the integration time step from 1 to 10 fs, i.e., a speed-up of an order of magnitude.

Ion dissociation of NaCl in water

To illustrate the above developed theoretical concepts and test the validity of the underlying approximations, we first consider sodium chloride in water as a simple yet nontrivial model system. For this system, detailed dcTMD as well as long unbiased MD simulations are available22, making it a suitable benchmark system for our approach. Fig. 1a shows the free energy profiles ΔG(x) along the interionic distance x, whose first maximum at x ≈ 0.4 nm corresponds to the binding-unbinding transition of the two ions. The second smaller maximum at x ≈ 0.6 nm reflects the transition from a common to two separate hydration shells34. We find that results for ΔG(x) obtained from a 1 μs long unbiased MD trajectory and from dcTMD simulations (1000 × 1 ns runs with vc = 1 m/s) match perfectly. Since the average work 〈W(x)〉 of the nonequilibrium simulations is seen to significantly overestimate the free energy at large distances, the dissipation correction Wdiss in Eq. (2) is obviously of importance. Fig. 1b shows the underlying friction profile Γ(x) obtained from dcTMD, which in part deviates from the lineshape of the free energy. While we also find a maximum at x ≈ 0.4 nm, the behavior of Γ(x) is remarkably different for larger distances 0.5 x 0.7 nm, where a region of elevated friction can be found before dropping to lower values. Interestingly, these features of Γ(x) match well the changes of the average number of water molecules bridging both ions34. This indicates that the increased friction in Eq. (3) is mainly caused by force fluctuations associated with the build-up of a hydration shell22. For x 0.8 nm, the friction is constant within our signal-to-noise resolution. The dynamics of ion dissociation and association can be described by their mean waiting times and corresponding rates shown in Fig. 2a and Table 1. For the chosen force field, ion concentration and resulting effective simulation box size, the unbiased MD simulation at 293 K yields mean dissociation and association times of τD = 1/kD = 120 ps and \({\tau }_{{\rm{A}}}=1/\left({k}_{{\rm{A}}}C\right)=850\) ps, respectively, where C denotes a reference concentration (see the Supplementary Methods for details). Using fields ΔG(x) and Γ(x) obtained from TMD, the numerical integration of Langevin Eq. (1) for 1 μs results in τD = 420 ps and τA = 3040 ps. While the dissociation constants KD = kD/kA = 1.5 M from Langevin and MD simulations match perfectly, we find that the Langevin predictions overestimate the correct rates by a factor of ~3.4. The latter may be caused by various issues. For one, to be of practical use, the Langevin model was deliberately kept quite simple. For example, it does not include an explicit solvent coordinate34,35, but accounts for the complex dynamics of the solvent merely through the friction field Γ(x). Moreover, we note that the calculation of Γ(x) via Eq. (3) uses constraints, which have the effect of increasing the effective friction36. This finding is supported by calculations using the data-driven Langevin approach37,38, which estimates friction coefficients based on unbiased MD simulations that are consistantly smaller than the ones obtained from dcTMD (Supplementary Fig. 5). Considering the simplicity of the Langevin model and the approximate calculation of the friction coefficient by dcTMD, overall we are content with a factor ~3 deviation of the predicted kinetics.

Fig. 1: Dissociation of NaCl in water.
figure 1

a Free energy profiles ΔG(x) along the interionic distance x, obtained from a 1 μs long unbiased MD trajectory at 293 K (orange line) and 1000 × 1 ns TMD runs (blue line). Error bars are given in Supplementary Fig. 2. Also shown is the average work 〈W(x)〉 calculated from the TMD simulations (dashed black line). b Friction profile Γ(x) (red) obtained from dcTMD after Gaussian smoothing together with the average number of water molecules (black), that connect the Na+ and Cl ions in a common hydration shell34.

Fig. 2: Prediction of binding and unbinding times.
figure 2

Mean binding (red) and unbinding (blue) times, drawn as a function of the inverse temperature, obtained from T-boosted Langevin simulations of a solvated NaCl, b the trypsin-benzamidine complex, and c the Hsp90-inhibitor complex. Dashed lines represent fits (R2 = 0.90−0.99) to Eq. (4), crosses (binding in grey, unbinding in black) indicate reference results from a unbiased MD simulation22 and b, c experiment39,48.

Table 1 Predicted binding and unbinding kinetics.

To illustrate the validity of the T-boosting approach suggested above, we performed a series of Langevin simulations for eight temperatures ranging from 290 to 420 K and plotted the resulting dissociation and association times as a function of the inverse temperature (Fig. 2a and Table 1). Checking the consistency of our approach, a fit to Eq. (4) yields transition state free energies ΔG of 13 and 12 kJ/mol for ion dissociation and association, respectively, which agree well with barrier heights of the free energy profile in Fig. 1a. Moreover, dissociation and association times obtained from the extrapolated T-boosted Langevin simulations (τD = 370 ps, τA = 3050 ps) agree excellently with the directly calculated values. This indicates that high-temperature Langevin simulations can indeed be extrapolated to obtain low-temperature transition rates.

Trypsin-benzamidine

Let us now consider the prediction of free energies, friction profiles and kinetics in protein–ligand systems. The first system we focus on is the inhibitor benzamidine bound to trypsin39,40,41, which represents a well-established model problem to test enhanced sampling techniques21,31,42,43,44,45. The slowest dynamics in this system is found in the unbinding process, which occurs on a scale of milliseconds39. To capture the kinetics of the unbinding process, so far Markov state models42,43, metadynamics31, Brownian dynamics44 and adaptive enhanced sampling methods21,45 have been employed.

Here we combined dcTMD simulations and a subsequent nonequilibrium principal component analysis30 to identify the dominant dissociation pathways of ligands during unbinding from their host proteins (see Supplementary Methods). Fig. 3 shows TMD snapshots of the structural evolution along this pathway, its free energy profile ΔG(x), and the associated friction Γ(x). Starting from the bound state (x1 = 0 nm), ΔG(x) exhibits a single maximum at x2 ≈ 0.46 nm, before it reaches the dissociated state for xx4 = 0.75 nm. In line with the findings of Tiwary et al.31, the maximum of ΔG(x) reflects the rupture of the Asp189-benzamidine salt bridge, which represents the most important contact of the bound ligand. Following right after, the friction profile Γ(x) reaches its maximum at x3 ≈ 0.54 nm, where the charged side chain of benzamidine becomes hydrated with water molecules. Similarly to NaCl, the friction peak coincides with the increase in the average number of hydrogen bonds between benzamidine and bulk water. The peak in friction is slightly shifted to higher x, because the ligand acts as a plug for the binding site, and first needs to be (at least partially) removed in order to allow water flowing in. As for the dissociation of NaCl in water, enhanced friction during unbinding appears to be directly linked to a rearrangement of the protein–ligand hydration shell, which is in agreement with recent results from neutron crystallography41.

Fig. 3: Unbinding of benzamidine from trypsin.
figure 3

a TMD snapshots of the structural evolution in trypsin along the dominant dissociation pathway, showing protein surface in gray, benzamidine as van der Waals spheres, Asp189 and water molecules as sticks. Benzamidine is bound to the protein in a cleft of the protein surface via a bidental salt bridge to Asp189. dcTMD calculations of b free energy ΔG(x), and c (Gaussian smoothed) friction Γ(x) together with the mean number of hydrogen bonds between benzamidine and water. Highlighted are the bound state 1, transition state 2, the state with maximal friction 3 and the unbound state 4. Error bars of free energy and friction estimates are given in Supplementary Fig. 2.

To calculate rates kon and koff describing the binding and unbinding of benzamidine from trypsin, we performed 10 ms long Langevin simulations along the dominant pathways at thirteen temperatures ranging from 380–900 K. As shown in Fig. 2b and Table 1, the resulting rates are well fitted (R2 ≥ 0.90) by the T-boosting expression in Eq. (4). Representing the resulting number of transitions as a function of the inverse temperature, we find that at 380 K only ~9 events happen during a millisecond. That is, to obtain statistically converged rates at 290 K would require Langevin simulations at 290 K on a timescale of seconds. Using temperature boosting with Eq. (4), on the other hand, our high-temperature millisecond Langevin simulations readily yield converged transition rates at 290 K (see Fig. 2b and Table 1), that is, kon = 8.7 × 106 s−1 M−1 and koff = 2.7 × 102 s−1, which underestimate the experimental values39kon = 2.9 × 107 s−1 M−1 and koff = 6.0 × 102 s−1 by a factor of 2–3. Similarly, the calculated KD overestimates the experimental result39 of KD = 2.1 × 10−5 M by a factor of ~1.5. As indicated by a recent review3 comparing numerous computational methods to calculate (un)binding rates of trypsin-benzamidine, our approach compares quite favorably regarding accuracy and computational effort.

As the extrapolation error due to T-boosting is negligible (see Supplementary Methods), the observed error is mainly caused by the approximate calculation of free energy and friction fields by dcTMD. In the case of NaCl, we have shown that reliable estimates of the fields (with errors 1 kBT) require an ensemble of at least 500 simulations (see ref. 22 and Supplementary Fig. 2), although the means of ΔG and Γ appear to converge already for  ~100 trajectories. In a similar vein, by performing a Jackknife “leave-one-out” analysis46, for trypsin-benzamidine we obtain an error of  ~2 kBT for 150 trajectories (Supplementary Fig. 2). Interestingly, the error of the main free energy barrier is typically comparatively small, because the friction and thus variance of W increase directly after the barrier. As a consequence, the sampling error of koff is small compared to that of kon and the binding free energy. We note that if the experimental binding affinity KD is known, it can be used as a further constraint on the error of the free energy and friction fields.

Hsp90-inhibitor

The second investigated protein complex is the N-terminal domain of heat shock protein 90 (Hsp90) bound to a resorcinol scaffold-based inhibitor (1j in ref. 47). This protein has recently been established as a test system for investigating the molecular effects influencing binding kinetics47,48,49,50, and the selected inhibitor unbinds on a scale of half a minute. From the overall appearance of free energy and friction profiles (Fig. 4), we observe clear similarities to the case of trypsin-benzamidine. That is, the main transition barrier is also found at x2 ≈ 0.5 nm, which stems from the ligand pushing between two helices at this point in order to escape the binding site. Moreover, the friction peaks at x2 ≈ 0.5 nm, as well, but with an additional shoulder at x3 ≈ 0.8 nm, which again coincides with changes of the ligand’s hydration shell. The unbound state is reached after x 1.0 nm. We note that the ligand is again bound to the protein via a hydrogen bond to an aspartate (Asp93) and at a position that is open to the bulk water.

Fig. 4: Unbinding of an inhibitor from the N-terminal domain of Hsp90.
figure 4

a Structural evolution along the dissociation pathway in Hsp90, showing protein surface in gray, inhibitor as van der Waals spheres, Asp93 and water molecules as sticks. The inhibitor is bound to the protein in a cleft of the protein surface via a hydrogen bond to Asp93. dcTMD calculations of b free energy ΔG(x) and, c (Gaussian-smoothed) friction Γ(x) together with the mean number of hydrogen bonds between inhibitor and water. Highlighted are the bound state 1, transition state and state with maximal friction 2, an additional state with increased friction 3 and the unbound state 4. Error bars of free energy and friction estimates are given in Supplementary Fig. 2. Fluctuations of Γ(x) for x 1 nm are due to noise. Color code as in Fig. 3.

To calculate rates kon and koff, we again performed 5 ms long Langevin simulations along the dissociation pathway at fourteen different temperatures ranging from 700–1350 K. Rate prediction (see Fig. 2c and Table 1) yields kon =  9.0 × 104 s−1 M−1 and koff = 1.6 × 10−3 s−1, and underestimates the experimental48 values kon = 4.8 ± 0.2 × 105 s−1 M−1 and koff = 3.4 ± 0.2 × 10−2 s−1 by a factor of 5–20. The resulting value for KD = 1.8 × 10−8 M underestimates the experimental value48 7.1 × 10−8 M by a factor of ~4. Considering that we attempt to predict unbinding times on a time scale of half a minute from sub-μs MD simulations, and that a factor 20 corresponds to a free energy difference of about 3 kBT (i.e., 15 % of the barrier height in Hsp90), we find this agreement remarkable for a first principles approach which implies many uncertainties of the physical model51. We attribute the larger deviation in comparison to trypsin to issues with the sampling of the correct unbinding pathways: especially unbinding rates in the range of minutes to hours fall into the same timescale as slow conformational dynamics of host proteins48, requiring a sufficient sampling of the conformational space of the protein as a prerequisite for dcTMD pulling simulations.

Discussion

Using free energy and friction profiles obtained from dcTMD, we have shown that T-boosted Langevin simulations yield binding and unbinding rates which are well comparable to results from atomistic equilibrium MD and experiments. That is, rates are underestimated by an order of magnitude or less which, in comparison to other methods that have been applied to the trypsin-benzamidne and Hsp90 complexes (see refs. 3,52 for recent reviews), is within the top accuracy currently achievable. At the same time, the few other methods that aim at predicting absolute rates (such as Markov state models42,43 and infrequent metadynamics31,53) require substantial more MD simulation time, while dcTMD only requires sub-μs MD runs, that is, at least an order of magnitude less computational time. As the extrapolation error due to T-boosting is negligible, the error is mainly caused by the approximate calculation of free energy and friction fields by dcTMD. We have shown that friction profiles, which correspond to the dynamical aspect of ligand binding and unbinding, may yield additional insight into molecular mechanisms of unbinding processes, which are not reflected in the free energies. Although the three investigated molecular systems differ significantly, in all cases friction was found to be governed by the dynamics of solvation shells.

Methods

MD simulations

All simulations employed Gromacs v2018 (ref. 54) in a CPU/GPU hybrid implementation, using the Amber99SB* force field55,56 and the TIP3P water model57. For each system, 102–103 dcTMD calculations22 at pulling velocity vc = 1 m/s were performed to calculate free energy ΔG(x) and friction Γ(x). For the NaCl-water system, dcTMD as well as unbiased MD simulations were taken from ref. 22. Trypsin-benzamidin complex simulations are based on the 1.7 Å X-ray crystal structure with PDB ID 3PTB40. Simulation systems of the Hsp90-inhibitor complex were taken from ref. 47. Detailed information on system preparation, ligand parameterization, MD simulations and pathway separation can be found in the Supplementary Methods.

Langevin simulations

Langevin simulations employed the integration scheme by Bussi and Parrinello58. Details on the performance of this method with respect to the employed integration time step and system mass can be found in the Supplementary Methods.