Abstract
We introduce the UWHAM (binless weighted histogram analysis method) and SWHAM (stochastic UWHAM) software package, which can be used to estimate the density of states and free energy differences from the data generated by multistate simulations. The programs that solve the UWHAM equations are written in C++ and operated via the command-line interface. In this paper, we first review the theoretical bases of UWHAM and its stochastic solvers RESWHAM (replica exchange-like SWHAM) and STSWHAM (serial tempering-like SWHAM). We then provide a tutorial with examples that explain how to apply the UWHAM program package to analyze the data generated by different types of multistate simulations: umbrella sampling, replica exchange, free energy perturbation simulations, etc. The tutorial examples also show that the UWHAM equations can be solved stochastically by applying the RESWHAM and STSWHAM programs when the data ensemble is large. If the simulations at some states are far from equilibrium, the Stratified RESWHAM program can be applied to obtain the equilibrium distribution of the state of interest. All the source code and the tutorial examples are available from our group’s web page: https://ronlevygroup.cst.temple.edu/software/UWHAM_and_SWHAM_webpage/index.html.
Introduction
The weighted histogram analysis method (WHAM) algorithm^{1,2} is widely applied to estimate the density of states and free energy differences from the data generated by multistate simulations. Multistate simulations are popular advanced sampling algorithms in computational biophysics and computational chemistry. For example, the temperature replica exchange method is extensively applied to explore the configurational space of biomolecules; the umbrella sampling method is applied to construct the free energy landscape of a system along chosen reaction coordinates; the free energy perturbation and Hamiltonian replica exchange methods are powerful tools used to estimate the binding affinities of ligands and proteins for small-molecule drug discovery^{3,4,5}. The WHAM algorithm is the standard tool to analyze the data generated by these multistate simulations. If the simulation at each state is regarded as a measurement of the density of states, the WHAM algorithm answers the question of what the best estimate of the density of states is when measurements have been taken at multiple states.
Since its introduction in 1992, the WHAM algorithm has been examined and studied by several research groups^{6,7,8,9,10,11}. The most important improvement to WHAM was the introduction of an unbinned version, named the multistate Bennett acceptance ratio (MBAR) or the binless WHAM (UWHAM)^{12,13,14}. Compared with the original WHAM, which coarse-grains observations into the bins of a histogram, the binless WHAM provides an estimate of the density of states for each data point, which increases the statistical precision. Importantly, estimating the density of states also provides a connection with the potential distribution theorem^{15,16}.
Complementary to the study of WHAM itself, how to solve the WHAM equations efficiently in practice has also been an object of research^{17,18,19,20}. This topic became more challenging and more urgent after the introduction of binless WHAM because, without coarse-graining, the number of variables increases dramatically. In ref.^{14}, Tan et al. proposed to solve the UWHAM equations by minimizing a convex function. To further remove the computational bottleneck in scaling up UWHAM, we developed methods called stochastic UWHAM (SWHAM), which solve the UWHAM equations stochastically by using generalized ensemble algorithms to resample the data collected at multiple states^{21,22}. One important assumption in applying WHAM is that the data obtained from each state have already reached global equilibrium. However, this assumption sometimes does not hold if the barriers between free energy basins are high at some of the states and the simulation times are not long enough. We developed a method called Stratified UWHAM^{23} to analyze the data generated by multistate simulations when the simulations at some states are far from equilibrium.
The purpose of this paper is to introduce the UWHAM and SWHAM software package developed by our group. The programs that solve the UWHAM equations are written in C++ and operated via the command-line interface. The basic solver solves the UWHAM equations by either a direct iteration method or minimization of a convex function. When the data ensemble is large, we show that the multistate free energies can be obtained directly by running serial tempering-like SWHAM (STSWHAM), which resamples the raw data by applying the serial tempering (ST) protocol; the multistate distributions can be obtained directly by running replica exchange-like SWHAM (RESWHAM), which resamples the raw data by applying the replica exchange (RE) protocol. If the simulations at some states are far from convergence, the multistate distributions can be estimated by Stratified RESWHAM. Local WHAM^{22}, a variant of STSWHAM that couples adjacent states by a stochastic resampling procedure, is also included in this software package. The remainder of the paper proceeds as follows. First, we briefly review the theoretical basis of UWHAM and SWHAM. Then we introduce the tutorial examples on the web page of the UWHAM and SWHAM software package.
Methods and Discussion
UWHAM
Suppose M parallel (independent or coupled) simulations in the canonical ensemble are run at M states. Each state is characterized by a specific combination of thermodynamic parameters and potential energy functions. These states are referred to as λ-states in the remainder of this paper to avoid confusion with terms such as conformational states and microstates. Suppose X_{αi} is the ith microstate observed at the αth λ-state; the probability of observing X_{αi} at the γth λ-state is
\[{p}_{\gamma }({X}_{\alpha i})=\frac{{q}_{\gamma }({x}_{\alpha i})}{{Z}_{\gamma }}\qquad (1)\]
where q_{γ}({x}_{αi}) = exp{−β_{γ}E_{γ}({x}_{αi})} is the Boltzmann factor of X_{αi} at the γth λ-state; {x}_{αi} denotes the coordinates of the microstate X_{αi}; β_{γ} is the inverse temperature of the γth λ-state; E_{γ}({x}_{αi}) is the potential energy of the microstate X_{αi} at the γth λ-state; and Z_{γ} is the partition function of the γth λ-state. The likelihood of the observed data is proportional to^{14}
\[\prod _{\alpha =1}^{M}\,\prod _{i=1}^{{N}_{\alpha }}\frac{{\rm{\Omega }}({u}_{\alpha i})\,{q}_{\alpha }({u}_{\alpha i})}{{Z}_{\alpha }}\qquad (2)\]
where u_{αi} is the energy coordinate of the microstate X_{αi}, which in general may be written as the sum of a reference energy plus perturbations (see ref.^{14}); N_{α} is the total number of observations at the αth λ-state; and Ω(u_{αi}) is the density of states. Let \({\hat{Z}}_{\alpha }\) and \(\hat{{\rm{\Omega }}}({u}_{\gamma i})\) denote the estimates of the partition function of the αth λ-state and of the density of states at u_{γi}, respectively. These two estimates satisfy the equation
\[{\hat{Z}}_{\alpha }=\sum _{\gamma =1}^{M}\,\sum _{i=1}^{{N}_{\gamma }}\hat{{\rm{\Omega }}}({u}_{\gamma i})\,{q}_{\alpha }({u}_{\gamma i})\qquad (3)\]
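Maximizing the log-likelihood function yields
\[\hat{{\rm{\Omega }}}({u}_{\gamma i})={\left[\sum _{\kappa =1}^{M}\frac{{N}_{\kappa }\,{q}_{\kappa }({u}_{\gamma i})}{{\hat{Z}}_{\kappa }}\right]}^{-1}\qquad (4)\]
Eqs (3) and (4) are the UWHAM (or MBAR) equations^{13,14}. Note that the UWHAM estimates do not depend on the λ-state at which each observation was originally observed. Therefore the UWHAM equations can be simplified as
\[\hat{{\rm{\Omega }}}({u}_{i})={\left[\sum _{\kappa =1}^{M}\frac{{N}_{\kappa }\,{q}_{\kappa }({u}_{i})}{{\hat{Z}}_{\kappa }}\right]}^{-1},\qquad {\hat{Z}}_{\alpha }=\sum _{i=1}^{N}\hat{{\rm{\Omega }}}({u}_{i})\,{q}_{\alpha }({u}_{i})\qquad (5)\]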
where \(N={\sum }_{\alpha =1}^{M}\,{N}_{\alpha }\) is the total number of observations.
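The UWHAM estimate of the probability of observing the observation u_{i} at the αth λ-state is
\[{\hat{p}}_{\alpha }({u}_{i})=\frac{{\hat{w}}_{\alpha }({u}_{i})}{{\sum }_{j=1}^{N}\,{\hat{w}}_{\alpha }({u}_{j})}\qquad (6)\]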
where \({\hat{w}}_{\alpha }({u}_{i})=\hat{{\rm{\Omega }}}({u}_{i}){q}_{\alpha }({u}_{i})\) is the unnormalized probability. We can define one of the λ-states as the reference state, and the normalized probability of observing the observation u_{i} at the reference state is
\[{\hat{p}}_{0}({u}_{i})=\frac{{\hat{w}}_{0}({u}_{i})}{{\sum }_{j=1}^{N}\,{\hat{w}}_{0}({u}_{j})}\qquad (7)\]
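and \({\hat{w}}_{\alpha }({u}_{i})={\hat{w}}_{0}({u}_{i}){\rm{\Delta }}{q}_{\alpha }({u}_{i})\), where Δq_{α}(u_{i}) = q_{α}(u_{i})/q_{0}(u_{i}) = exp{−[β_{α}E_{α}(u_{i}) − β_{0}E_{0}(u_{i})]} is the biasing factor. Then the equation array Eq. (5) can be rewritten as
\[{\hat{w}}_{0}({u}_{i})={\left[\sum _{\kappa =1}^{M}\frac{{N}_{\kappa }\,{\rm{\Delta }}{q}_{\kappa }({u}_{i})}{{\hat{Z}}_{\kappa }}\right]}^{-1},\qquad {\hat{Z}}_{\alpha }=\sum _{i=1}^{N}{\hat{w}}_{0}({u}_{i})\,{\rm{\Delta }}{q}_{\alpha }({u}_{i})\qquad (8)\]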
In practice, the UWHAM program solves Eq. (8) instead of Eq. (5). Suppose A is a property of interest of the system. According to Eq. (6), the expectation value of the property A at the αth λ-state is calculated by the weighted average
\[{\langle A\rangle }_{\alpha }=\sum _{i=1}^{N}{\hat{p}}_{\alpha }({u}_{i})\,A({u}_{i})\qquad (9)\]
where A(u_{i}) is the property A measured by using the ith observation.
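As a small illustration of the weighted average described above, the following Python sketch computes an expectation value from per-observation log weights. It assumes that the weight of observation u_{i} at the αth λ-state is proportional to Ω̂(u_{i})q_{α}(u_{i}), as stated in the text; the function name `reweighted_average` and the three-point toy data are hypothetical and are not part of the UWHAM package.

```python
import math

def reweighted_average(values, log_omega, log_q_alpha):
    """Weighted average of A over pooled observations.

    values[i]      : A(u_i)
    log_omega[i]   : ln of the density-of-states estimate for u_i
    log_q_alpha[i] : ln q_alpha(u_i) for the target state alpha
    """
    log_w = [lo + lq for lo, lq in zip(log_omega, log_q_alpha)]
    shift = max(log_w)  # subtract the maximum before exponentiating to avoid overflow
    w = [math.exp(lw - shift) for lw in log_w]
    norm = sum(w)       # plays the role of the partition function estimate, up to the shift
    return sum(wi * a for wi, a in zip(w, values)) / norm

# Toy data (made up): the third observation carries twice the weight of the others.
values = [1.0, 2.0, 3.0]
log_omega = [0.0, 0.0, 0.0]
log_q_alpha = [0.0, 0.0, math.log(2.0)]
mean_A = reweighted_average(values, log_omega, log_q_alpha)
```

Working in log space is a design choice for numerical stability; Boltzmann factors of real simulation data can easily span hundreds of orders of magnitude.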
Currently, a self-consistent iteration solver and a solver that optimizes a convex function by using the Newton-Raphson algorithm^{14} have been implemented in the UWHAM program to solve the UWHAM equations.
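The self-consistent iteration can be sketched in a few lines of Python. This is only an illustration, not the package's C++ solver. It assumes the usual binless form of Eq. (5), in which Ω̂(u_{i}) is the reciprocal of Σ_{κ} N_{κ}q_{κ}(u_{i})/Ẑ_{κ} and Ẑ_{α} = Σ_{i} Ω̂(u_{i})q_{α}(u_{i}). The names `uwham_iterate` and `logsumexp`, the tolerance, the convention of fixing the first partition function to 1, and the toy input are all assumptions made for this example.

```python
import math

def logsumexp(xs):
    """Numerically stable ln(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_omega_from(log_q, log_n, log_z):
    # Omega(u_i) = 1 / sum_k N_k q_k(u_i) / Z_k, evaluated in log space
    k_range = range(len(log_z))
    return [-logsumexp([log_n[k] + row[k] - log_z[k] for k in k_range]) for row in log_q]

def uwham_iterate(log_q, n_obs, tol=1e-10, max_iter=10000):
    """log_q[i][a] = ln q_a(u_i) for pooled observation i; n_obs[a] = N_a.

    Returns (log_z, log_omega), with log_z[0] fixed to 0 (an arbitrary gauge).
    """
    n_states = len(n_obs)
    log_n = [math.log(n) for n in n_obs]
    log_z = [0.0] * n_states
    for _ in range(max_iter):
        log_omega = log_omega_from(log_q, log_n, log_z)
        # Z_a = sum_i Omega(u_i) q_a(u_i)
        new_log_z = [logsumexp([lo + row[a] for lo, row in zip(log_omega, log_q)])
                     for a in range(n_states)]
        new_log_z = [z - new_log_z[0] for z in new_log_z]  # only ratios are determined
        change = max(abs(a - b) for a, b in zip(new_log_z, log_z))
        log_z = new_log_z
        if change < tol:
            break
    return log_z, log_omega_from(log_q, log_n, log_z)

# Toy input (made up): four pooled observations, two states, two observations per state.
log_q = [[0.0, 0.0], [0.0, -1.0], [0.0, -0.5], [0.0, -1.5]]
log_z, log_omega = uwham_iterate(log_q, [2, 2])
free_energy_difference = -(log_z[1] - log_z[0])  # in units of k_B T
```

Direct iteration is simple but can converge slowly when neighboring states overlap poorly, which is one reason a Newton-Raphson solver is usually offered alongside it.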
SWHAM
Suppose the raw data were generated from simulations at M λ-states, and the total number of observations is N. At the beginning of a UWHAM analysis, the program needs to evaluate M biasing factors (or Boltzmann factors) for each observation. Namely, the UWHAM program evaluates a biasing matrix that contains N × M = n × M^{2} elements, where n = N/M is the average number of observations at each λ-state. Then the UWHAM equations are solved by minimization of a convex function, which involves multiplication of matrices that contain M × N elements (as large as the biasing matrix) and diagonalization of matrices that contain M × M elements. For a fixed n, the memory and computational time required to run UWHAM therefore scale as the square of the number of λ-states M. To remove this computational bottleneck in scaling up UWHAM, we developed methods that solve the UWHAM equations stochastically by using generalized ensemble algorithms.
RESWHAM
RESWHAM is an algorithm that we developed to solve the UWHAM equations stochastically by applying the replica exchange (RE) protocol to resample the raw data generated by multistate simulations^{21}. As shown in Fig. 1, the observations at each λ-state are collected beforehand as the database for that λ-state. RESWHAM analyses are then run by performing cycles of RE simulations. Each cycle consists of a “move” procedure and an “exchange” procedure. During the move procedure, an observation in the database of a λ-state is chosen at random with equal probability and associated with the replica at that λ-state. During the exchange procedure, we attempt to swap two random replicas based on the Metropolis criterion. If the swap is accepted, then in addition to swapping the replicas, the observation associated with each replica is also moved to the database of the other λ-state^{21}. The exchange step should be repeated multiple times to approach the infinite swapping limit for the best sampling efficiency^{24}. At the end of the exchange procedure, the observation associated with the replica at each λ-state is recorded as the output of that λ-state. Note that the direct outputs of RESWHAM are the estimates of the equilibrium distribution at each λ-state.
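The move and exchange procedures described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the RESWHAM program itself: replicas are identified with their λ-state slots, a swap is accepted with the standard Metropolis probability min{1, q_{α}(u_{n})q_{γ}(u_{m})/[q_{α}(u_{m})q_{γ}(u_{n})]}, and the function name `reswham_cycle`, the number of swap attempts per cycle, and the toy energies and inverse temperatures are hypothetical.

```python
import math
import random

def reswham_cycle(databases, log_q, rng, n_swaps=10):
    """Run one cycle and return the observation id recorded for each state.

    databases[a] : list of observation ids currently stored for state a
    log_q(a, i)  : ln q_a(u_i) for state a and observation id i
    """
    n_states = len(databases)
    # Move: every replica picks one observation from its own database, uniformly.
    picks = [rng.randrange(len(db)) for db in databases]
    # Exchange: repeated Metropolis swap attempts between two random states.
    for _ in range(n_swaps):
        a, g = rng.sample(range(n_states), 2)
        m, n = databases[a][picks[a]], databases[g][picks[g]]
        log_ratio = log_q(a, n) + log_q(g, m) - log_q(a, m) - log_q(g, n)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            # Accepted: the two observations trade databases as well.
            databases[a][picks[a]], databases[g][picks[g]] = n, m
    return [databases[a][picks[a]] for a in range(n_states)]

# Toy setup (made up): nine reduced energies resampled at three inverse temperatures.
energies = [0.1, 0.4, 0.9, 1.2, 1.8, 2.5, 3.1, 3.6, 4.4]
betas = [1.0, 0.5, 0.25]
databases = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
rng = random.Random(0)
outputs = [reswham_cycle(databases, lambda a, i: -betas[a] * energies[i], rng)
           for _ in range(200)]
```

Because an accepted swap always trades one observation for one, the size of each database stays fixed and every observation remains stored in exactly one database throughout the run.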
In ref.^{21}, by treating RESWHAM as a random walk in the space of the weight arrays of observations, we proved that the distribution of the outputs of RESWHAM at each λ-state is asymptotic to the UWHAM estimate when the number of observations at each λ-state is large. Here we provide an alternative proof. Consider a trial exchange in RESWHAM that swaps one observation u_{m} at the αth λ-state and another observation u_{n} at the γth λ-state. The probability that this trial exchange is accepted is
\[{P}_{ex}={\tilde{p}}_{\alpha }({u}_{m})\,{\tilde{p}}_{\gamma }({u}_{n})\,{\rm{\Psi }}\left(\mathrm{ln}\,\frac{{q}_{\alpha }({u}_{m})\,{q}_{\gamma }({u}_{n})}{{q}_{\alpha }({u}_{n})\,{q}_{\gamma }({u}_{m})}\right)\qquad (10)\]
where \({\tilde{p}}_{X}({u}_{Y})\) is the normalized time-average probability of choosing the observation u_{Y} to associate with the replica at the Xth λ-state, and Ψ is the Metropolis function^{25}
\[{\rm{\Psi }}(x)=\,\min \{1,\exp (-x)\}\qquad (11)\]
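which has the property Ψ(x)/Ψ(−x) = exp{−x}. Consider the reverse trial exchange that swaps the observation u_{n} at the αth λ-state and the observation u_{m} at the γth λ-state. The probability that this trial exchange is accepted is
\[{P^{\prime} }_{ex}={\tilde{p}}_{\alpha }({u}_{n})\,{\tilde{p}}_{\gamma }({u}_{m})\,{\rm{\Psi }}\left(-\mathrm{ln}\,\frac{{q}_{\alpha }({u}_{m})\,{q}_{\gamma }({u}_{n})}{{q}_{\alpha }({u}_{n})\,{q}_{\gamma }({u}_{m})}\right)\qquad (12)\]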
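If the RESWHAM resampling procedure converges, P_{ex} and P′_{ex} will agree with each other for each pair of observations (u_{m}, u_{n}) and each pair of λ-states (α, γ), which leads to the detailed balance relation of RESWHAM:
\[\frac{{\tilde{p}}_{\alpha }({u}_{m})\,{\tilde{p}}_{\gamma }({u}_{n})}{{\tilde{p}}_{\alpha }({u}_{n})\,{\tilde{p}}_{\gamma }({u}_{m})}=\frac{{q}_{\alpha }({u}_{m})\,{q}_{\gamma }({u}_{n})}{{q}_{\alpha }({u}_{n})\,{q}_{\gamma }({u}_{m})}\qquad (13)\]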
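Eq. (13) can be rewritten as
\[\frac{{\tilde{p}}_{\alpha }({u}_{m})}{{\tilde{p}}_{\alpha }({u}_{n})}=\frac{[{\tilde{p}}_{0}({u}_{m})/{q}_{0}({u}_{m})]\,{q}_{\alpha }({u}_{m})}{[{\tilde{p}}_{0}({u}_{n})/{q}_{0}({u}_{n})]\,{q}_{\alpha }({u}_{n})}\qquad (14)\]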
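where subscript 0 denotes the reference state. Then the probability \({\tilde{p}}_{\alpha }({u}_{m})\) can be expressed as
\[{\tilde{p}}_{\alpha }({u}_{m})=\frac{\hat{{\rm{\Omega }}}({u}_{m})\,{q}_{\alpha }({u}_{m})}{{\hat{Z}}_{\alpha }}\qquad (15)\]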
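where \(\hat{{\rm{\Omega }}}({u}_{m})={\tilde{p}}_{0}({u}_{m})/{q}_{0}({u}_{m})\) and \({\hat{Z}}_{\alpha }\) is the normalization constant of the αth λ-state. Summing both sides of Eq. (15) over all the observations and applying the relationship \({\sum }_{m=1}^{N}\,{\tilde{p}}_{\alpha }({u}_{m})=1\) at each λ-state yields
\[{\hat{Z}}_{\alpha }=\sum _{m=1}^{N}\hat{{\rm{\Omega }}}({u}_{m})\,{q}_{\alpha }({u}_{m})\qquad (16)\]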
Note that the probability of finding the observation u_{m} in the database of the αth λ-state is \({N}_{\alpha }\,{\tilde{p}}_{\alpha }({u}_{m})\) and that there is exactly one copy of each observation across the databases of all λ-states, namely, \({\sum }_{\alpha =1}^{M}\,{N}_{\alpha }\,{\tilde{p}}_{\alpha }({u}_{m})=1\). Multiplying both sides of Eq. (15) by N_{α} and summing over all the λ-states yields
\[\hat{{\rm{\Omega }}}({u}_{m})={\left[\sum _{\alpha =1}^{M}\frac{{N}_{\alpha }\,{q}_{\alpha }({u}_{m})}{{\hat{Z}}_{\alpha }}\right]}^{-1}\qquad (17)\]
Thus, the RESWHAM estimates \({\hat{Z}}_{\alpha }\) and \(\hat{{\rm{\Omega }}}({u}_{m})\) satisfy Eqs (16) and (17), which are equivalent to the UWHAM equations (Eq. (5)).
STSWHAM
STSWHAM is an algorithm that we developed to solve the UWHAM equations stochastically by applying the serial tempering (ST) protocol to resample the raw data generated by multistate simulations^{22}. The procedure is illustrated in Fig. 2. As in the RESWHAM analysis, the observations at each λ-state are collected beforehand as the database for that λ-state. However, unlike resampling with replica exchange, there is only one “simulation run” in the serial tempering resampling algorithm. For the sake of comparison and convenience, we still refer to this single simulation as a replica in this paper. Serial tempering simulations are also run in cycles, and each cycle consists of a “move” procedure and a “jump” procedure. During the move procedure of STSWHAM, an observation in the database of the λ-state sampled by the replica is chosen at random with equal probability and associated with the replica. During the jump procedure, the replica jumps to the αth λ-state according to the probability^{22}
\[p(\alpha |{u}_{i};\zeta ,{\pi }^{0})=\frac{{\pi }_{\alpha }^{0}\,\exp ({\zeta }_{\alpha })\,{q}_{\alpha }({u}_{i})}{{\sum }_{\kappa =1}^{M}\,{\pi }_{\kappa }^{0}\,\exp ({\zeta }_{\kappa })\,{q}_{\kappa }({u}_{i})}\qquad (18)\]
where u_{i} is the ith observation, which is associated with the replica; ζ_{κ} = −lnZ_{κ} is the unitless free energy of the κth λ-state; \({\pi }_{\kappa }^{0}={N}_{\kappa }/N\) is the proportion of the κth λ-state in the raw data generated by the multistate simulations; and q_{κ}(u_{i}) is the biasing factor of the ith observation at the κth λ-state. Suppose π_{κ} is the observed proportion of the κth λ-state sampled by the replica during the STSWHAM analysis. The values of {ζ_{κ}} are adjusted during the STSWHAM analysis until the observed proportion π_{κ} of the replica being at the κth λ-state agrees with \({\pi }_{\kappa }^{0}\)^{22}. Note that the direct outputs of STSWHAM are the estimates of the free energies of the different λ-states, {ζ_{κ}}.
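For illustration, the jump probabilities can be evaluated as a normalized set of weights. The sketch below assumes that the probability of jumping to the κth λ-state is proportional to π^{0}_{κ} exp(ζ_{κ}) q_{κ}(u_{i}), which follows from the definitions above with ζ_{κ} = −lnZ_{κ}; the function name `jump_probabilities` and the two-state toy input are hypothetical.

```python
import math

def jump_probabilities(log_q_i, zeta, pi0):
    """Probability of jumping to each state, given the current observation u_i.

    log_q_i[k] : ln q_k(u_i)
    zeta[k]    : current unitless free energy estimate, zeta_k = -ln Z_k
    pi0[k]     : proportion N_k / N of state k in the raw data
    """
    log_w = [math.log(p) + z + lq for p, z, lq in zip(pi0, zeta, log_q_i)]
    shift = max(log_w)  # guard against overflow in exp
    w = [math.exp(x - shift) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

# Toy input (made up): two equally populated states; u_i is three times as likely in state 1.
probs = jump_probabilities([0.0, math.log(3.0)], zeta=[0.0, 0.0], pi0=[0.5, 0.5])
```

Each call evaluates one exponential per λ-state, which is the cost per jump that motivates the local jump proposal discussed later in this section.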
It can be shown that ζ_{κ} is the UWHAM estimate of the free energy of the κth λ-state when π_{κ} and \({\pi }_{\kappa }^{0}\) agree with each other for all λ-states. The details of the proof that STSWHAM solves the UWHAM equations stochastically can be found in ref.^{22}. A brief rationale is as follows. First, if π_{κ} equals \({\pi }_{\kappa }^{0}\), the probability of each observation being chosen to associate with the replica during the STSWHAM analysis is 1/N, where N is the total number of observations. Therefore, the observed proportion of the αth λ-state sampled by the replica during the STSWHAM analysis is
\[{\pi }_{\alpha }=\frac{1}{N}\sum _{i=1}^{N}p(\alpha |{u}_{i};\zeta ,{\pi }^{0})=\frac{1}{N}\sum _{i=1}^{N}\frac{{\pi }_{\alpha }^{0}\,\exp ({\zeta }_{\alpha })\,{q}_{\alpha }({u}_{i})}{{\sum }_{\kappa =1}^{M}\,{\pi }_{\kappa }^{0}\,\exp ({\zeta }_{\kappa })\,{q}_{\kappa }({u}_{i})}\qquad (19)\]
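On the other hand, note that
\[\sum _{i=1}^{N}{\hat{p}}_{\alpha }({u}_{i})=\sum _{i=1}^{N}\frac{\exp ({\hat{\zeta }}_{\alpha })\,{q}_{\alpha }({u}_{i})}{{\sum }_{\kappa =1}^{M}\,{N}_{\kappa }\,\exp ({\hat{\zeta }}_{\kappa })\,{q}_{\kappa }({u}_{i})}=1\qquad (20)\]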
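Then \({\pi }_{\alpha }^{0}\) can be rewritten as
\[{\pi }_{\alpha }^{0}=\frac{1}{N}\sum _{i=1}^{N}\frac{{\pi }_{\alpha }^{0}\,\exp ({\hat{\zeta }}_{\alpha })\,{q}_{\alpha }({u}_{i})}{{\sum }_{\kappa =1}^{M}\,{\pi }_{\kappa }^{0}\,\exp ({\hat{\zeta }}_{\kappa })\,{q}_{\kappa }({u}_{i})}\qquad (21)\]
Comparison between Eqs (19) and (21) leads to the conclusion that π_{α} and \({\pi }_{\alpha }^{0}\) agree with each other if \({\zeta }_{\alpha }={\hat{\zeta }}_{\alpha }\) for each λ-state.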
The jump of the replica following Eq. (18) was referred to as the global jump proposal in ref.^{22} because the replica can reach any λ-state of the system in one jump. According to Eq. (18), every jump of the replica requires the calculation of M exponential functions, where M is the total number of λ-states. When the total number of states is large, STSWHAM analyses using the global jump proposal take a long time to converge. In our software package, we implemented a much faster approximate solver of UWHAM: STSWHAM using a local jump proposal. This algorithm was referred to as local WHAM in ref.^{22} because, if the number of jumps per cycle is finite, the replica can only be at λ-states that are local neighbors of the initial λ-state at the end of the jump procedure. Suppose the replica that is associated with the observation u_{i} is initially at the γth λ-state. The procedure for performing one jump in local WHAM is as follows^{22}:

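1. Select a trial λ-state with uniform probability from the nearest neighbors of the γth λ-state; suppose the chosen λ-state is the αth λ-state.

2. Accept the αth λ-state as the new λ-state to jump to according to the Metropolis probability
\[\min \left\{1,\frac{{\rm{\Gamma }}(\alpha ,\gamma )\,p(\alpha |{u}_{i};\zeta ,{\pi }^{0})}{{\rm{\Gamma }}(\gamma ,\alpha )\,p(\gamma |{u}_{i};\zeta ,{\pi }^{0})}\right\}\qquad (22)\]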
where p(α|u_{i}; ζ, π^{0}) and p(γ|u_{i}; ζ, π^{0}) are defined by Eq. (18), and Γ(γ, α) is the probability of choosing the αth λ-state as the trial λ-state when the replica is originally at the γth λ-state. Namely, Γ(γ, α) = 1/n_{γ} if the αth λ-state is one of the nearest neighbors of the γth λ-state, where n_{γ} is the total number of nearest neighbors of the γth λ-state; Γ(γ, α) = 0 otherwise.
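The two steps of a local jump can be sketched as below. The sketch assumes that the λ-states form a one-dimensional chain, so that each state has one or two nearest neighbors, and that the acceptance probability is the Metropolis-Hastings ratio min{1, Γ(α, γ)p(α|u_{i})/[Γ(γ, α)p(γ|u_{i})]}. The chain topology, the function names `neighbors` and `local_jump`, and the toy weights are hypothetical choices for illustration.

```python
import math
import random

def neighbors(state, n_states):
    """Nearest neighbors of a state on a one-dimensional chain (assumed topology)."""
    return [s for s in (state - 1, state + 1) if 0 <= s < n_states]

def local_jump(gamma, log_w, rng):
    """Attempt one local jump from state gamma and return the new state.

    log_w[k] = ln(pi0_k) + zeta_k + ln q_k(u_i), the unnormalized log of p(k | u_i).
    Only two entries of log_w are needed per jump, not all M.
    """
    n_states = len(log_w)
    nbrs = neighbors(gamma, n_states)
    alpha = rng.choice(nbrs)  # step 1: uniform proposal among nearest neighbors
    # step 2: Gamma(gamma, alpha) = 1/n_gamma and Gamma(alpha, gamma) = 1/n_alpha
    log_ratio = (log_w[alpha] - log_w[gamma]
                 + math.log(len(nbrs)) - math.log(len(neighbors(alpha, n_states))))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return alpha
    return gamma

# Toy usage (made-up weights): repeat the jump several times within one cycle.
rng = random.Random(1)
state = 0
for _ in range(5):
    state = local_jump(state, [0.0, 0.3, -0.2, 0.1], rng)
```

The normalization of p(·|u_{i}) cancels in the ratio, which is why a local jump avoids the sum over all M λ-states that the global proposal requires.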
As can be seen, the replica can only be at the original λ-state or one of its nearest neighbors after one jump. However, the replica can diffuse further away from the original λ-state by repeating this one-jump procedure multiple times. As the number of jumps per cycle increases, the results of local WHAM converge asymptotically to the results of STSWHAM with the global jump proposal^{22}. Therefore, the jump of the replica following Eq. (18) corresponds to the infinite jump limit in serial tempering simulations, which is analogous to the infinite swapping limit in replica exchange simulations^{24}.
In STSWHAM, the free energy estimates are adjusted during the analysis until the observed proportion π_{κ} of the replica being at the κth λ-state agrees with the proportion \({\pi }_{\kappa }^{0}\) of the κth λ-state in the raw data generated by the multistate simulations. At present, a variant of the updating algorithm discussed in ref.^{22} is implemented in the STSWHAM program.
Stratified RESWHAM
When applying UWHAM and its stochastic solvers, the basic assumption is that the simulation at each λ-state is “approximately” equilibrated. However, this assumption might not always hold. To handle such situations, we developed an analysis tool called Stratified UWHAM and its stochastic solver Stratified RESWHAM to compute free energies and expectations from a multistate ensemble when the simulations at a subset of λ-states are far from global equilibrium^{23}. In ref.^{23}, we showed that the Stratified UWHAM equations can be solved in the form of the original UWHAM equations (Eq. (5)) with an expanded set of λ-states. The stochastic solver, Stratified RESWHAM, has been included in the UWHAM and SWHAM software package. See the Supporting Information for a brief review and discussion of Stratified UWHAM and Stratified RESWHAM.
Illustrative applications
So far, the tutorial examples show how to analyze the data generated by “one dimensional umbrella sampling”, “two dimensional umbrella sampling”, “temperature replica exchange”, “Hamiltonian replica exchange”, “two dimensional replica exchange” and “free energy perturbation” simulations. The tutorials provide the raw data generated by the different types of multistate simulations and explain the corresponding analysis procedures and outputs in detail.
One Dimensional Umbrella Sampling
We explain how to apply UWHAM or STSWHAM to analyze the raw data generated by one dimensional umbrella sampling simulations. The potential function of the system studied in this example is a one dimensional double well potential^{26}
where H = 20 k_{B}T is the height of the barrier between the two wells; k_{B} is Boltzmann’s constant; T is the temperature; and W = 1 is half the distance between the two minima of the potential. This one dimensional potential can be explored by a Brownian particle simulated with overdamped Langevin dynamics^{26}. Here we applied 31 parabolic potentials in the region between x = −3 and x = 3 to perform the umbrella sampling simulations. UWHAM and STSWHAM are then used to analyze the data and construct the potential energy profile.
Umbrella sampling simulations are usually applied to construct free energy profiles for systems with multiple degrees of freedom. Although the example that we used here is a Brownian particle governed by a one dimensional potential function, the analysis procedure is the same for applying UWHAM or STSWHAM to construct one dimensional free energy profiles of complex systems. In such cases, the position of the complex system projected on the chosen reaction coordinate is analogous to the position of the Brownian particle in this tutorial.
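To make the input to the analysis concrete, the sketch below builds one row of log biasing factors for a single observed position under 31 harmonic umbrella windows. The text only states that 31 parabolic potentials were applied between x = −3 and x = 3; the even spacing with windows at both end points (a spacing of 0.2), the spring constant, and the function names are assumptions made for this example and may differ from the tutorial's actual settings.

```python
def window_centers(x_min, x_max, n_windows):
    """Evenly spaced window centers, end points included (an assumption)."""
    step = (x_max - x_min) / (n_windows - 1)
    return [x_min + k * step for k in range(n_windows)]

def log_bias_factors(x, centers, k_spring):
    """ln of the biasing factor of position x in every window.

    The bias energy is 0.5 * k_spring * (x - center)**2, with k_spring expressed
    in units of k_B T per unit length squared, so the result is already unitless.
    The unbiased potential serves as the reference state.
    """
    return [-0.5 * k_spring * (x - c) ** 2 for c in centers]

centers = window_centers(-3.0, 3.0, 31)
row = log_bias_factors(0.05, centers, k_spring=50.0)  # hypothetical spring constant
```

Stacking one such row per observation gives the N × M biasing matrix that the solvers take as input; for two dimensional umbrella sampling the same construction applies with the sum of the two harmonic bias energies.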
Two Dimensional Umbrella Sampling
This example explains how to apply UWHAM or STSWHAM to raw data generated by two dimensional umbrella sampling simulations (of ~100 degrees of freedom) to construct the free energy profile. The system studied in this example is an alanine dipeptide (AlaD) molecule in implicit solvent at 300 K. The simulations were performed by using the GROMACS 5.1.2 simulation package with the Amber99SB force field and the OBC GB model^{27,28}. To explore the two dimensional free energy surface (the Ramachandran plot of AlaD), we applied 24 × 24 parabolic potentials by using the PLUMED plugin^{29} to perform the umbrella sampling simulations. The Ramachandran plots of AlaD are constructed by using the UWHAM and STSWHAM estimates.
Temperature Replica Exchange
We explain how to apply UWHAM or RESWHAM to raw data generated by temperature replica exchange simulations to obtain estimates of the equilibrium distribution at the λ-state of interest. The system studied in this example is the same as in the previous example, an alanine dipeptide (AlaD) molecule in implicit solvent. The RE simulations were performed by using the GROMACS 5.1.2 simulation package with the Amber99SB force field and the OBC GB model^{27,28}. The coupled simulations were run at 10 temperatures (300 K, 317.52 K, 336.063 K, 355.689 K, 376.462 K, 398.447 K, 421.716 K, 446.345 K, 472.411 K, 500 K). The Ramachandran plots of AlaD in implicit solvent at 300 K are constructed by using the UWHAM and RESWHAM estimates.
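The ten temperatures listed above are numerically consistent with a geometric ladder between 300 K and 500 K, that is, a constant ratio between neighboring temperatures. The short check below reproduces them; this is an observation about the listed values rather than a statement of how the temperatures were chosen, and the function name `geometric_ladder` is hypothetical.

```python
def geometric_ladder(t_min, t_max, n):
    """n temperatures from t_min to t_max with a constant ratio between neighbors."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** k for k in range(n)]

listed = [300.0, 317.52, 336.063, 355.689, 376.462,
          398.447, 421.716, 446.345, 472.411, 500.0]
ladder = geometric_ladder(300.0, 500.0, 10)
# Largest deviation between the computed ladder and the listed temperatures, in K.
max_deviation = max(abs(a - b) for a, b in zip(ladder, listed))
```

For the analysis itself, each temperature enters only through the inverse temperature β_{γ} = 1/(k_{B}T_{γ}) in the Boltzmann factor of every observation.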
Free Energy Perturbation
This example shows how to analyze the data generated by free energy perturbation (FEP) simulations. Here we calculate the solvation free energy of a water molecule in pure solvent (TIP3P) at 300 K by using the slow-growth method. The simulations were performed by using GROMACS 5.1.2^{27} and the TIP3P water model. In this example, we ran 11 independent parallel simulations of a box of pure solvent with a fixed tagged water molecule inside. The interactions between the tagged water molecule and the environment were gradually turned off through 11 λ-states^{30}. The chosen λ values are 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0. The UWHAM estimate of the solvation free energy of a water molecule in pure solvent is −6.18 kcal/mol. One can obtain the same result by using STSWHAM. More details and discussion about measuring the excess chemical potential of water molecules in solution using UWHAM can be found in ref.^{30}.
BEDAM: Hamiltonian Replica Exchange
We explain how to use UWHAM or RESWHAM to analyze the data generated by Hamiltonian replica exchange simulations. In this example, we study the binding affinity of a guest molecule (heptanoate) to a host molecule (β-cyclodextrin) in implicit solvent (OPLS-AA/AGBNP2)^{31,32}. Here we apply the binding energy distribution analysis method (BEDAM)^{33} to obtain the binding free energy and binding energy distributions of this complex. BEDAM is a free energy method based on the Hamiltonian replica exchange algorithm. Suppose there are M parallel simulations in BEDAM; the Hamiltonian (potential) function of the ith λ-state is
\[{V}_{i}={V}_{0}+{\lambda }_{i}\,u\qquad (24)\]
where V_{0} is the effective potential energy of the complex without the direct and solvent-mediated ligand-receptor interactions, and u is the binding energy^{33}. Namely, the λ factor in BEDAM simulations linearly scales the interaction between the ligand and the receptor. We ran BEDAM simulations at 300 K by using 16 λ-states. The chosen λ values are 0.0, 0.001, 0.002, 0.004, 0.01, 0.04, 0.07, 0.1, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0^{21}. We applied UWHAM to estimate the binding free energy of the β-cyclodextrin heptanoate complex: −0.603 kcal/mol + G_{vsite}, where G_{vsite} is a correction for the restraint applied to the ligand during the BEDAM simulation. One can obtain the same result by applying STSWHAM to the raw data. We also show how to apply UWHAM or RESWHAM to estimate the equilibrium distribution of the binding energy at the λ = 1 state (full interaction state).
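Because λ scales the binding energy linearly, the biasing factor of an observation relative to the λ = 0 state depends only on its binding energy u: at a common temperature, the energy difference between the kth λ-state and the reference is λ_{k}u. The sketch below builds one row of log biasing factors on that basis. The value of Boltzmann's constant in kcal/(mol K), the function name `log_bias_row`, and the example binding energy are assumptions made for illustration.

```python
K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

LAMBDAS = [0.0, 0.001, 0.002, 0.004, 0.01, 0.04, 0.07, 0.1,
           0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0]

def log_bias_row(binding_energy, temperature=300.0, lambdas=LAMBDAS):
    """ln of the biasing factors of one observation at every lambda value.

    binding_energy is u in kcal/mol; the lambda = 0 state is the reference,
    so the entry for state k is -beta * lambda_k * u.
    """
    beta = 1.0 / (K_B * temperature)
    return [-beta * lam * binding_energy for lam in lambdas]

row = log_bias_row(-10.0)  # hypothetical binding energy of -10 kcal/mol
```

Only the binding energy of each frame therefore needs to be stored as input, rather than the full potential energy at every λ value.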
Two Dimensional (Temperature and Hamiltonian) Replica Exchange
This example shows how to use SWHAM to analyze the data generated by two dimensional (Hamiltonian and temperature) replica exchange simulations. We study the binding affinity of a guest molecule (heptanoate) to a host molecule (β-cyclodextrin) in implicit solvent (OPLS-AA/AGBNP2)^{32} at different temperatures. The raw data used in this example were generated by 15 separate BEDAM^{33} simulations at temperatures of 200 K, 206 K, 212 K, 218 K, 225 K, 231 K, 238 K, 245 K, 252 K, 260 K, 267 K, 275 K, 283 K, 291 K and 300 K. The chosen λ values are the same as in the previous example: 0.0, 0.001, 0.002, 0.004, 0.01, 0.04, 0.07, 0.1, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0. There are 16 × 15 = 240 states in total, and each state has 144,000 data points^{21,22}. Although there are no exchanges between replicas at different temperatures, the procedure described in this tutorial can be applied to two dimensional (Hamiltonian and temperature) replica exchange simulations without any alteration. The goal of this exercise is to obtain the best estimate of the binding affinity at 200 K, the temperature at which it is most difficult for the BEDAM simulation to converge. Because the raw data ensemble is large, UWHAM is not suitable for analyzing it directly. Here, we applied RESWHAM to estimate the equilibrium distribution of binding energies of each λ-state at 200 K. The RESWHAM results are compared with the corresponding distributions calculated from the raw data. See refs^{21} and ^{22} for more discussion of this tutorial example. The equilibrium distributions constructed from the RESWHAM output can be used as the input for UWHAM to estimate the binding free energy at the temperature of interest. The binding free energy of the β-cyclodextrin heptanoate complex is about −6.3 kcal/mol + G_{vsite} at 200 K, which is much stronger than its binding free energy at 300 K. This result can also be obtained by applying STSWHAM with the local jump algorithm directly to the raw data.
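A rough estimate shows why this data set is too large for a direct UWHAM analysis. The biasing matrix has one entry per observation per state; the calculation below uses the 240 states and 144,000 data points per state quoted above and assumes that each entry is stored as an 8-byte double-precision number, which is an assumption about the storage format.

```python
def bias_matrix_bytes(n_states, n_per_state, bytes_per_value=8):
    """Storage needed for the N x M biasing matrix, with N = n_states * n_per_state."""
    n_total = n_states * n_per_state
    return n_total * n_states * bytes_per_value

n_states = 16 * 15                        # 16 lambda values at 15 temperatures
size_bytes = bias_matrix_bytes(n_states, 144_000)
size_gb = size_bytes / 1e9                # about 66 GB
```

The stochastic solvers avoid this cost because they only evaluate the biasing factors of the observations that are actually selected during resampling.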
Two Binding Modes of the β-cyclodextrin Heptanoate Complex
The β-cyclodextrin heptanoate complex has two binding modes, depending on the orientation of the heptanoate molecule^{23}. The two binding modes are referred to as the UP and DOWN macrostates. We ran two sets of independent MD simulations of the β-cyclodextrin heptanoate complex in implicit solvent (AGBNP GB model^{32}) at 300 K and at 16 λ-states. The initial structures of the complex in the first and second sets of simulations were chosen from the UP and DOWN macrostates, respectively. The interaction between the ligand and the receptor was scaled by a λ factor as in BEDAM^{33}. However, all the simulations are independent. The chosen λ values are 0.0, 0.001, 0.002, 0.004, 0.01, 0.04, 0.07, 0.1, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0. In this example, the λ-states with the seven largest λ values (λ = 1.0, 0.95, 0.9, 0.8, 0.7, 0.6, 0.4) are considered partially connected states because it is difficult for the complex to switch its binding mode and the simulations have not converged at these λ-states; the other nine λ-states are the fully connected states^{23}. This tutorial shows how to apply Stratified RESWHAM to estimate the equilibrium distribution at the λ = 1 state (full interaction state) when some simulations are far from convergence. See ref.^{23} for more discussion of this tutorial example.
Data Availability
The UWHAM and SWHAM software package and its tutorials are available from the web page: https://ronlevygroup.cst.temple.edu/software/UWHAM_and_SWHAM_webpage/index.html. The UWHAM and SWHAM software package is distributed using the MIT license. In the future, we will keep adding more examples of the application of UWHAM and SWHAM to the web page. For instance, free energy perturbation (FEP) is one popular method that is applied to measure the relative ligand binding potency^{34,35}. Currently we are applying UWHAM to analyze the FEP data and extract a density of states that can be used to estimate the relative binding free energy differences for multiple ligands simultaneously to solve the cycle closure challenge^{34}. We will continue optimizing the code and plan to introduce parallelism to the software package.
References
 1.
Kumar, S., Rosenberg, J. M., Bouzida, D., Swendsen, R. H. & Kollman, P. A. The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J. Comput. Chem. 13, 1011–1021, https://doi.org/10.1002/jcc.540130812 (1992).
 2.
Kumar, S., Rosenberg, J. M., Bouzida, D., Swendsen, R. H. & Kollman, P. A. Multidimensional free-energy calculations using the weighted histogram analysis method. J. Comput. Chem. 16, 1339–1350, https://doi.org/10.1002/jcc.540161104 (1995).
 3.
Zuckerman, D. M. Equilibrium sampling in biomolecular simulations. Annu. Rev. Biophys. 40, 41–62, https://doi.org/10.1146/annurevbiophys042910155255 (2011).
 4.
Gallicchio, E. & Levy, R. M. Advances in all atom sampling methods for modeling protein-ligand binding affinities. Curr. Opin. Struct. Biol. 21, 161–166, https://doi.org/10.1016/j.sbi.2011.01.010 (2011).
 5.
Maximova, T., Moffatt, R., Ma, B., Nussinov, R. & Shehu, A. Principles and overview of sampling methods for modelling macromolecular structure and dynamics. PLoS Comput. Biol. 12, e1004619, https://doi.org/10.1371/journal.pcbi.1004619 (2016).
 6.
Roux, B. The calculation of the potential of mean force using computer simulations. Comput. Phys. Commun. 91, 275–282, https://doi.org/10.1016/00104655(95)00053I (1995).
 7.
Bartels, C. & Karplus, M. Multidimensional adaptive umbrella sampling: Applications to main chain and side chain peptide conformations. J. Comput. Chem. 18, 1450–1462, https://doi.org/10.1002/(sici)1096987x (1997).
 8.
Bartels, C. Analyzing biased Monte Carlo and molecular dynamics simulations. Chem. Phys. Lett. 331, 446–454, https://doi.org/10.1016/s00092614(00)01215x (2000).
 9.
Souaille, M. & Roux, B. Extension to the weighted histogram analysis method: combining umbrella sampling with free energy calculations. Comput. Phys. Commun. 135, 40–57, https://doi.org/10.1016/s00104655(00)002150 (2001).
 10.
Gallicchio, E., Andrec, M., Felts, A. K. & Levy, R. M. Temperature weighted histogram analysis method, replica exchange, and transition paths. J. Phys. Chem. B 109, 6722–6731, https://doi.org/10.1021/jp045294f (2005).
 11.
Chodera, J. D., Swope, W. C., Pitera, J. W., Seok, C. & Dill, K. A. Use of the weighted histogram analysis method for the analysis of simulated and parallel tempering simulations. J. Chem. Theory Comput. 3, 26–41, https://doi.org/10.1021/ct0502864 (2007).
 12.
Tan, Z. On a likelihood approach for Monte Carlo integration. J. Am. Stat. Assoc. 99, 1027–1036, https://doi.org/10.1198/016214504000001664 (2004).
 13.
Shirts, M. R. & Chodera, J. D. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129, 124105, https://doi.org/10.1063/1.2978177 (2008).
 14.
Tan, Z., Gallicchio, E., Lapelosa, M. & Levy, R. M. Theory of binless multistate free energy estimation with applications to protein-ligand binding. J. Chem. Phys. 136, 144102, https://doi.org/10.1063/1.3701175 (2012).
 15.
Widom, B. Some topics in the theory of fluids. J. Chem. Phys. 39, 2808, https://doi.org/10.1063/1.1734110 (1963).
 16.
Beck, T. L., Paulaitis, M. E. & Pratt, L. R. The Potential Distribution Theorem and Models of Molecular Solutions (Cambridge University Press, 2006).
 17.
Bereau, T. & Swendsen, R. H. Optimized convergence for multiple histogram analysis. J. Comput. Phys. 228, 6119–6129, https://doi.org/10.1016/j.jcp.2009.05.011 (2009).
 18.
Kim, J., Keyes, T. & Straub, J. E. Communication: Iteration-free, weighted histogram analysis method in terms of intensive variables. J. Chem. Phys. 135, 061103, https://doi.org/10.1063/1.3626150 (2011).
 19.
Zhu, F. & Hummer, G. Convergence and error estimation in free energy calculations using the weighted histogram analysis method. J. Comput. Chem. 33, 453–465, https://doi.org/10.1002/jcc.21989 (2012).
 20.
Zhang, C., Lai, C.-L. & Pettitt, B. M. Accelerating the weighted histogram analysis method by direct inversion in the iterative subspace. Mol. Simul. 42, 1079–1089, https://doi.org/10.1080/08927022.2015.1110583 (2016).
 21.
Zhang, B. W., Xia, J., Tan, Z. & Levy, R. M. A stochastic solution to the unbinned WHAM equations. J. Phys. Chem. Lett. 6, 3834–3840, https://doi.org/10.1021/acs.jpclett.5b01771 (2015).
 22.
Tan, Z., Xia, J., Zhang, B. W. & Levy, R. M. Locally weighted histogram analysis and stochastic solution for large-scale multistate free energy estimation. J. Chem. Phys. 144, 034107, https://doi.org/10.1063/1.4939768 (2016).
 23.
Zhang, B. W., Deng, N., Tan, Z. & Levy, R. M. Stratified UWHAM and its stochastic approximation for multicanonical simulations which are far from equilibrium. J. Chem. Theory Comput. 13, 4660–4674, https://doi.org/10.1021/acs.jctc.7b00651 (2017).
 24.
Zhang, B. W. et al. Simulating replica exchange: Markov state models, proposal schemes, and the infinite swapping limit. J. Phys. Chem. B 120, 8289–8301, https://doi.org/10.1021/acs.jpcb.6b02015 (2016).
 25.
Bennett, C. H. Efficient estimation of free energy differences from Monte Carlo data. J. Comput. Phys. 22, 245–268, https://doi.org/10.1016/0021-9991(76)90078-4 (1976).
 26.
Zhang, B. W., Jasnow, D. & Zuckerman, D. M. Transition-event durations in one-dimensional activated processes. J. Chem. Phys. 126, 074504, https://doi.org/10.1063/1.2434966 (2007).
 27.
Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25, https://doi.org/10.1016/j.softx.2015.06.001 (2015).
 28.
Onufriev, A., Bashford, D. & Case, D. A. Exploring protein native states and large-scale conformational changes with a modified generalized Born model. Proteins: Struct., Funct., Bioinf. 55, 383–394, https://doi.org/10.1002/prot.20033 (2004).
 29.
Tribello, G. A., Bonomi, M., Branduardi, D., Camilloni, C. & Bussi, G. PLUMED 2: New feathers for an old bird. Comput. Phys. Commun. 185, 604–613, https://doi.org/10.1016/j.cpc.2013.09.018 (2014).
 30.
Zhang, B. W., Cui, D., Matubayasi, N. & Levy, R. M. The excess chemical potential of water at the interface with a protein from end point simulations. J. Phys. Chem. B 122, 4700–4707, https://doi.org/10.1021/acs.jpcb.8b02666 (2018).
 31.
Wickstrom, L., He, P., Gallicchio, E. & Levy, R. M. Large scale affinity calculations of cyclodextrin host-guest complexes: Understanding the role of reorganization in the molecular recognition process. J. Chem. Theory Comput. 9, 3136–3150, https://doi.org/10.1021/ct400003r (2013).
 32.
Gallicchio, E., Paris, K. & Levy, R. M. The AGBNP2 implicit solvation model. J. Chem. Theory Comput. 5, 2544–2564, https://doi.org/10.1021/ct900234u (2009).
 33.
Gallicchio, E., Lapelosa, M. & Levy, R. M. The binding energy distribution analysis method (BEDAM) for the estimation of protein-ligand binding affinities. J. Chem. Theory Comput. 6, 2961–2977, https://doi.org/10.1021/ct1002913 (2010).
 34.
Wang, L. et al. Modeling local structural rearrangements using FEP/REST: Application to relative binding affinity predictions of CDK2 inhibitors. J. Chem. Theory Comput. 9, 1282–1293, https://doi.org/10.1021/ct300911a (2013).
 35.
Wang, L. et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 137, 2695–2703, https://doi.org/10.1021/ja512751q (2015).
Acknowledgements
This work was supported by an NIH grant (GM30580), an NSF grant (1665032), and an NIH computer equipment grant (OD020095). This work also used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation (ACI-1053575). The authors acknowledge invaluable discussions with Zhiqiang Tan at Rutgers University.
Author information
Contributions
B.W.Z. and R.M.L. wrote the manuscript. B.W.Z. wrote the program codes and prepared the tutorial examples. B.W.Z. and S.A. constructed the web page. All authors reviewed the manuscript.
Competing Interests
The authors declare no competing interests.
Corresponding author
Correspondence to Bin W. Zhang.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.