A Bayesian approach to extracting free-energy profiles from cryo-electron microscopy experiments

Cryo-electron microscopy (cryo-EM) extracts single-particle density projections of individual biomolecules. Although cryo-EM is widely used for 3D reconstruction, due to its single-particle nature it has the potential to provide information about a biomolecule’s conformational variability and underlying free-energy landscape. However, treating cryo-EM as a single-molecule technique is challenging because of the low signal-to-noise ratio (SNR) in individual particles. In this work, we propose the cryo-BIFE method (cryo-EM Bayesian Inference of Free-Energy profiles), which uses a path collective variable to extract free-energy profiles and their uncertainties from cryo-EM images. We test the framework on several synthetic systems where the imaging parameters and conditions were controlled. We found that for realistic cryo-EM environments and relevant biomolecular systems, it is possible to recover the underlying free energy, with the pose accuracy and SNR as crucial determinants. We then use the method to study the conformational transitions of a calcium-activated channel with real cryo-EM particles. Interestingly, we recover not only the most probable conformation (used to generate a high-resolution reconstruction of the calcium-bound state) but also a metastable state that corresponds to the calcium-unbound conformation. As expected for turnover transitions within the same sample, the activation barriers are on the order of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k_BT$$\end{document}kBT. We expect our tool for extracting free-energy profiles from cryo-EM images to enable more complete characterization of the thermodynamic ensemble of biomolecules.

www.nature.com/scientificreports/ Let a predetermined smooth 1D path X in configuration space be parameterized by 0 ≤ s ≤ 1 , so that x = X(s) is a particular configuration chosen to be on the path. This path should span the relevant conformational changes of the system, and thermal motion should be relatively small in all directions transverse to the path. In Fig. 1, we show a schematic representation of the path X (white curve) that connects the relevant metastable states (basins) in the conformational space. At each configuration x = X(s) one sets up transverse coordinates z ∈ R 3N−1 , so that any configuration x in a tubular neighborhood of the path may be written uniquely via a map x = X (s, z) , where X(s) = X (s, 0) . This means that inverse functions S(x) and Z(x) exist such that X (S(x), Z(x)) = x for all x in this neighborhood. Our CV is defined by S(x), i.e. the parameter value s of the unique point on the path nearest to a given thermally-accessible configuration x. For all points X(s) on the path, S(X(s)) = s extracts their CV parameter.
In practice, one must discretize integrals (e.g., for the Bayesian analysis presented below) over the parameter 0 ≤ s ≤ 1 . For this we use a simple M-node equispaced rule, which applies to smooth functions f, the parameter nodes being s m := (m − 1)/(M − 1) . This defines a discrete set of 3D conformations (which we refer to as nodes) x m := X(s m ) , that take the system from a starting conformation x 1 to a final one x M . Note that M is a numerical convergence parameter (the results are expected to converge as M → ∞ ), and should be chosen large enough so that conformational changes are small between adjacent nodes. Ideally, the parameterization of the path should also have roughly uniform "speed" |X ′ (s)| , so that discrete conformations x m are approximately evenly spaced in R 3N , although satisfying this condition may be challenging in many applications. If the path is well chosen, then the assumption that the cryo-EM images come from conformations near the path is justified by the Laplace approximation in the low-temperature limit, as in path-based algorithms for MD simulations 36,37 .
The CV defined in Ref. 36 compares 3D conformations (e.g. from an MD trajectory) to the set of nodes belonging to the path X. Inspired by this, we develop the cryo-BIFE method, a Bayesian formalism to infer the free-energy profile along the predetermined path, given an ensemble of raw cryo-EM images from the same biomolecule.
The free-energy profile along the path. Here, we consider the biomolecule at thermal equilibrium.
From Boltzmann statistics, the probability density at configuration x ∈ R 3N is given by where H(x) is the system's Hamiltonian (potential energy of conformation x), and Z 0 = e −βH(x) dx is the full partition function. We now project this down to the CV. One may choose the map X (s, z) so that, at each point on the path, ∂x ∂z j for the transverse coordinates z j , j = 1, . . . , 3N − 1 , are mutually orthonormal, and orthogonal  Figure 1. Schematic representation of the path collective variable and Bayesian formalism for cryo-BIFE. The main goal of our methodology is to determine the posterior probability distribution of free-energy profiles G(s) over a given configuration space path X(s), given a set of noisy cryo-EM particle (projection) images w = {w i } from i = 1, . . . , I . The green graphs on the right show independent samples drawn from this posterior, and the blue curve their mean. The black curve represents the true free-energy profile. Variation between sampled free energy surfaces arises from a detailed Bayesian model of imaging noise. The path 0 ≤ s ≤ 1 is discretized using M nodes. www.nature.com/scientificreports/ to the path tangent vector X ′ (s) . Then, near to the path, the Jacobian of the map is the "speed" |X ′ (s)| (note that |z| 2 then matches the squared-distance variable preferred in Ref. 36 ). A change of variables gives the marginalized probability density as where δ is the 1D Dirac delta distribution, and in the last step we used Eq. (2) and theJacobian. Since only conformations near to the path are assumed relevant, for simplicity the Jacobian here was approximated as constant with respect to z. Note that the final integral in Eq. (3) is a partition function restricted to the "slice" transverse to X at s. It is then standard to interpret this ρ(s) as the equilibrium density due to an effective 1D free-energy profile (or potential of mean force) G(s) defined by a 1D analog of Eq. (2) with Z 1 = 1 0 e −βG(s) ds . Our goal is to infer the function G from a large set of 2D cryo-EM images in a statistically rigorous fashion, up to an additive offset. Note that, by Eq. (4), this is equivalent to inferring the population density ρ G . cryo-BIFE: a Bayesian approach for extracting the free-energy profile using cryo-EM images. In general, the underlying free energy for a system is unknown. However, in cryo-EM, we have access to a collection of (noisy) raw images w : The model for each image w i is a noisy unknown projection of the biomolecule with an unknown configuration x taken to be independently distributed following Eq. (2). In the CV approach sketched above we restrict this to the 1D configuration path x = X(s) , where s is a Boltzmann-distributed random variable as in Eq. (4).
For simplicity of notation, we use the symbol G to represent the profile, i.e., function G(s) over 0 ≤ s ≤ 1 , keeping in mind that in all numerical computations it will be represented by its vector of values at the nodes, {G(s m )} M m=1 (see the Methods). In the Bayesian approach, uncertainty about G is encoded by a posterior density over the space of functions. Then, by Bayes' rule, where p(G|w) is the desired posterior density over free-energy profiles induced by the observed data. p(w|G) is the sampling density (or likelihood) of the set of all observed images w, assuming a specific free-energy profile function G. The term p(G) encodes any prior knowledge about the free-energy profile. In this work, we will impose only a weak-smoothness prior, whose functional form is given in the Methods section. The normalizing constant p(w), also known as the evidence, will be ignored since it is not needed for inference of G. Note that in Eq. (5), and many subsequent formulae, each term is of course conditioned on the path X, and thus one could write p(G|w, X), etc. However, since X is fixed, for notational simplicity we leave this dependence implied.
We assume that the cryo-EM images are conditionally independent given G, where p(w i |G) is the sampling density (likelihood) of the single image w i given G. Our imaging model, encoded by p(w i |G) , may be interpreted as having two steps: first we draw s randomly according to ρ G in Eq. (4), then we draw a noisy image of the 3D molecular configuration x = X(s) according to the full random set of imaging parameters (orientation, translation, noise, etc). Because s is an unobserved (a.k.a. latent) variable, the likelihood of an image can be computed by marginalizing over s, where the second step applies the quadrature, Eq. (1), and our assumption that images come from conformations near the path. The second factor in this sum is, under the Boltzmann assumption, the normalized equilibrium density (4) evaluated at the mth parameter node, The first factor p(w i |x m ) in the sum (7) is interpreted as the likelihood function of image w i conditioned on a known conformation x m . The cryo-EM imaging process is quite well understood, and considerable work has gone into evaluating such likelihoods 10,11,38 . Here, we will use the BioEM formalism from Ref. 39 , which uses a set of numerical marginalizations over all imaging parameters, analogous to (but much larger in scale than) the above one over s. See the Methods, and Refs. 39,40 , for details about the BioEM calculations. We note that the present method is not limited to the use of BioEM: any other likelihood formalism (e.g., those used for 3D reconstruction 10 ) could be inserted.
Plugging Eqs. (6)-(8) into Bayes's rule, p(G|w) ∝ p(G)p(w|G) , and dropping irrelevant normalization factors, the posterior becomes www.nature.com/scientificreports/ Given a set of particles, the cryo-BIFE algorithm consists of three main steps: (1) define a path X and discretize it with M nodes x m = X(s m ) , (2) pre-calculate the BioEM likelihoods p(w i |x m ) for all nodes m = 1, . . . , M , for every image w i , then (3) use a Markov chain Monte Carlo (MCMC) method to sample from the posterior, Eq. (9), and from these samples-each a possible profile G(s)-estimate the expected value of the free-energy profile, G(s) , and also its uncertainty. Steps (2) and (3) are described in the Methods.
Step (1), defining the path, is challenging because it depends on the particular system of interest. In practice, we select a set of conformations x m that go from one relevant state of the system to another, as is done with the CV from Ref. 36 . In future work, we hope to adapt algorithms from the molecular-simulation community, such as the String method 37,41 and Nudged Elastic Band 42 , to let us determine optimal path-CVs directly from the cryo-EM data.
In the following, we validate and test cryo-BIFE over a diverse set of systems, from a conformational change along one dimension, using synthetic images, to a membrane channel's calcium bound/unbound transition, using real cryo-EM data.

Results
To understand the effects of the physical parameters (e.g., those involved in the image formation process) for recovering free-energy profiles with cryo-BIFE, we designed several control systems where the projections are generated synthetically following the ideas of Ref. 43 . The first system consists of conformations of the Hsp90 chaperone representing a low-dimensional (1D-2D) conformational space. The analysis is then extended to more realistic ensembles from MD simulations. Lastly, we apply cryo-BIFE to experimental cryo-EM data. To this end, we chose raw images of TMEM16F, a membrane channel and lipid scramblase 44 available at the EMPIAR databank 45 .
Free-energy profile recovery over controlled datasets. Hsp90 chaperone. Hsp90 (a heat shock protein) is a chaperone involved in the folding process of several kinases, transcription factors, and steroid hormone receptors 46 . This protein consists of two chains (A and B, containing 677 residues each) forming a V-like shape. Although Hsp90 is flexible, in the presence of certain ligands (e.g., ATP) its conformational space can be reduced to a few degrees of freedom that go from an open to a closed state of the chains. Following the ideas described in Ref. 43 , we reduced the open-closed dynamics of the Hsp90 into a one (1D) and two (2D) dimensional phase space where both chains are rotated in mutual, normal directions and perpendicular to the axis of symmetry (see the Methods).
Free-energy profile recovery for a 1D conformational change. In Fig. 2A, we show a 1D conformational change of Hsp90, where chain B is fixed and chain A is rotated from the closed state to the open state (denoted by CMA). We define the path using twenty conformations, equally spaced by 1 • in the rotation angle. The underlying synthetic free-energy profile (i.e. ground truth) along the path is shown as a black line in Fig. 2C. We generated around 13,300 synthetic images from the predetermined population of the twenty conformations (given by the Boltzmann factor of the ground truth free energy). The synthetic images have a uniform random signal-tonoise-ratio (SNR) log 10 ([0.001, 0.1]) , defocus [0.5,3] µ m and orientation angles (see the Methods). Examples of the synthetic particles are shown in Fig. 2B.
To apply cryo-BIFE, we first precalculated the BioEM probabilities for the nodes along the path and all synthetic images for two BioEM rounds of orientation estimation (see the Methods). The MCMC sampling strategy described in the Methods was applied to extract the expected G(s) and the credible interval at 5% and 95% of the empirical quantile at each node. Figure 2C, shows the results of G(s) using all particles for the first and second BioEM rounds of orientation estimation. Note that the second round was more accurate than the first. This was also reflected in the recovery of the free-energy profile G(s) , where the second round had a much better performance. This suggests that the pose accuracy of the particles is crucial for extracting an adequate free-energy estimate. The results from BioEM round 2 show that cryo-BIFE was able to recover the free-energy profile for a wide range of SNRs and defocus. Interestingly, the credible intervals widen for higher free-energy values, i.e., near the barrier, where there are fewer particles and the error is expected to be larger. Extracting the credible intervals is the main advantage of using the full posterior in comparison to a maximum a posteriori estimation (see Supplementary Fig. 1).
The performance of the method for different cryo-EM conditions was then studied. In Fig. 3A, the particle set was divided in two: high SNRs from [0.01, 0.1] and low SNRs from [0.001, 0.01], each with an equal number of particles ( ∼ 6600 each). The expected free energy calculated from cryo-BIFE is shown for the high and low SNRs sets (light blue and green, respectively) for the second BioEM orientation round. The expected free energy was also compared to G(s) using the entire set (blue line). We observed a poor recovery for the low SNR set [0.001, 0.01] and large errors, whereas the high SNR set behaved well. Interestingly, the free-energy estimate for the entire particle set (SNR [0.001, 0.1]) was slightly worse than for the high SNR set but much better than the low SNR set. The reason for this is that the Bayesian posterior (Eq. (9)) naturally weighs the contribution of each particle and particles with high SNR contribute much more weight to the posterior. If particles with even higher SNR are added (see Supplementary Fig. 2), the free-energy profile recovery is better, and for example, artifacts like the shoulder around s = 0.55 vanish.  [2,3] µ m (red line) were analyzed. The results for the large defocus were slightly better, but these have large errors around the barrier. The number of particles needed to recover the free-energy profile was also studied. In Fig. 3C, the results are shown for sets with 3300 (pink line) and 6600 (purple line) particles. In agreement with previous results for 3D map validation 47 , just a small set of particles ( ≥ 3000 ) randomly picked from the entire set is able to reproduce the underlying statistics. Contrary to 3D refinement, where large numbers of particles are required, our results indicate that conformational variability can be captured from a small set of particles.
Cryo-BIFE has several advantages over standard particle-classification methods for calculating the populations (or equivalently the free-energy profile). These classification methods treat each particle equally, whereas cryo-BIFE weighs them differently (e.g., depending on their SNR). Moreover, most methods assign each particle to a single node along the path and calculate a histogram over all particles to extract the populations. In Supplementary Fig. 3, this analysis (using the BioEM likelihood) was compared to the cryo-BIFE results for the 1D Hsp90 data with a wide range of SNR [0.001, 0.1]. These results show that cryo-BIFE outperforms standard The ground truth free-energy profile is shown in black. The expected free energy profile using cryo-BIFE is shown for BioEM orientation rounds 1 and 2 in orange and blue, respectively. The R-hat test for the MCMC stationarity yielded 1.000 and 1.001 for BioEM round 1 and 2, respectively. The bars show the credible interval at 5% and 95% of the empirical quantile at each node. A cubic spline is used to fit the expected free-energy profile, providing a smooth profile. www.nature.com/scientificreports/ classification because individual particle-contributions are weighted by the posterior and are not assigned to a single node.
2D conformational change of Hsp90. As described in Ref. 43 , Hsp90 is also characterized by a second degree of freedom; the rotation of chain B relative to the 1D rotation of chain A (see Fig. 4A, and the Methods). A synthetic 2D underlying free-energy surface was generated, shown in Fig. 4B, with an energy barrier of around 2 k B T . Given the imagining conditions in cryo-EM experiments, free-energy barriers around this range are expected. We generated 6800 synthetic particles, using the population given by the Boltzmann factor of ground truth free energy, with SNR [0.01, 0.1], defocus [0.5, 3] µ m and random orientations in SO(3) (see the Methods).
To study the effects of the path-CV, we defined three paths. The black dashed line (CV1) in Fig. 4B shows a good path-CV that passes along the relevant basins and the transition state of the system. In contrast, the orange and green dashed lines in Fig. 4B (CV2 and CV3, respectively) are able to discriminate between the states (i.e., good order parameters) but are not ideal reaction coordinates because they underestimate the barrier. In Fig. 4C, we compare the expected free-energy profile extracted with cryo-BIFE to the ground truth (given by Eq. (4)) along each path. Relatively good agreement between the underlying profile and the extracted free energy using the cryo-EM images along the three paths was observed. However, using only CV1, the metastable states of the system, the transition state, and true barrier height were recovered. Conversely, using non-ideal CVs, e.g., CV2 and CV3, the barrier can be underestimated. In extreme cases, the identification of the metastable states could also be lost. We note that these are artifacts caused by choosing a poor projection direction, and are not the result of using 2D images. This highlights the importance of choosing an adequate path-CV.
Cryo-BIFE over conformational ensembles. MD simulations of the VGVAPG hexapeptide have been extensively used to test methods, such as Girsanov reweighting 48 . In the Supplementary Information, we present a video showing an example of the hexapeptide MD simulations performed for this work (see the Methods). The peptide has opposite charges at its extremes and exhibits a conformational change between an open state and a closed state. Here, we will compare the free energy extracted from the 3D ensemble to one estimated by cryo-BIFE using 2D particles with the same path (Fig. 5A). The path was created by selecting ten conformations from the MD with equally spaced end-to-end distances between successive nodes (see the Methods). To calculate the free energy from the 3D conformations, we used the path-CV proposed by Branduardi et al. 36 with the RMSD as a metric. This path-CV was evaluated for each MD conformation, then a histogram was taken and the free energy was calculated via Boltzmann's factor and the population of each histogram bin. For cryo-BIFE, we used a set of 5688 synthetic images generated from the MD ensemble. The synthetic images had uniformly distributed random SNR, defocus and orientations (see the Methods). Cryo-BIFE was applied to extract the expected G(s) along the same path used for the 3D conformations. In Fig. 5B, the free-energy profiles from cryo-BIFE and the The free-energy profiles along these three path CVs, extracted with cryo-BIFE using synthetic particle images (dashed lines), are compared to the ground truth projected profiles (solid lines). The R-hat test for the MCMC yielded values < 1.003 for all cases. The bars show the credible interval at 5% and 95% of the empirical quantile at each node. The results are for the second BioEM round of orientation estimate. www.nature.com/scientificreports/ path-CV 36 were compared. The difference is that cryo-BIFE extracts the FE profile from 2D cryo-EM images, whereas the path-CV uses 3D conformations (Fig. 5A).
To investigate whether cryo-BIFE is able to resolve the free-energy profile of membrane proteins with nanodisk belts (as in the cryo-EM experiment), and small conformational changes ( < 4 Å), we attempted to recover a free-energy profile from synthetic images of the semiSWEET transporter generated from MD configurations. Our results are given in the Supplementary Text and Supplementary Figs. 4 and 5. In conjunction with our results on the VGVAPG hexapeptide, they demonstrate that cryo-BIFE is able to recover the free-energy profile from 2D cryo-EM projections for a realistic ensemble.
Real cryo-EM data: TMEM16F ion channel. TMEM16F is a membrane channel and lipid scramblase that is activated by calcium binding. In Ref. 44 , cryo-EM experiments using different Ca +2 conditions and membrane/detergent compositions were performed to resolve TMEM16F's Ca +2 bound and unbound states. The cryo-EM particles under different conditions are available at the EMPIAR 45 . In this work, we focus on the EMPIAR dataset with around 1.2 million particles that was used to generate the Ca +2 -bound state in digitonin (EMPIAR code 10278). Since around 13% of these particles are used to generate the final reconstruction (all other particles are classified out), we wanted to investigate (1) if there could be a small population of the Ca +2 -unbound state in this set, and (2) if a free-energy profile from the Ca +2 -bound to the Ca +2 -unbound states can be extracted. Starting from the PDB structures (Fig. 6A), steered MD simulations were used, which included a lipid membrane and explicit solvent (see the Methods), to generate a path connecting both states. The C α -RMSD of the nodes for both states is shown in Fig. 6B. We randomly selected around 15,000 particles from the entire set, not only those used for the final reconstruction. In Fig. 6C, the free energy along the path using the same cryo-BIFE setup as for the previous systems is shown. It was observed that both the Ca +2 -bound and the Ca +2 -unbound states correspond to metastable basins of the system. Because the cryo-EM data set was prepared with Ca +2 , it is expected that the Ca +2 -bound state corresponds to the lowest free-energy minimum. However, it is interesting that not all the particles belong to this state, and that the Ca +2 -unbound state also has metastability. The highest barrier is around 2.2 k B T , consistent with what is expected for turnover conditions in cryo-EM samples. These results show that it is possible to extract a free-energy profile from real cryo-EM particles that agrees with the biophysical setup and expectations of the system.

Discussion
In this work, we have developed cryo-BIFE, a methodology for extracting free-energy profiles from cryo-EM experiments using a Bayesian approach with a path collective variable. The method was tested and validated over diverse systems covering a range of complexities. Using controlled parameters, we found that the particle orientation accuracy and the SNR are important for adequately recovering the free-energy profile. This work is a proof of principle, demonstrating that under reasonable cryo-EM conditions it is possible to extract free-energy profiles using individual cryo-EM particles. www.nature.com/scientificreports/ Primary focus has been given to extracting the expectation of the free-energy profile G(s). However, this method produces (in the form of independent MCMC draws) the full posterior for such profiles, which contains much more information than just an average. In particular it quantifies the degree of certainty with which G(s) can be extracted given the noise in particle images. Credible intervals can be placed on any function of G, such as downstream predictions (reaction rates, etc), simply by evaluating them for all G values in a set of MCMC samples.
The cryo-BIFE analysis should be performed on a raw, unbiased cryoEM-particle set. For cryo-BIFE, particles can be picked, polished, and motion corrected. However, 3D-classification methods, which group particles with respect to conformational states, should not be performed before cryo-BIFE because these artificially modify the distribution of conformations. In other words, free-energy profiles extracted from classified-subsets of particles will be biased, and these will not represent the true thermodynamic ensemble.
Here, we have focused on developing, understanding and validating cryo-BIFE for a predetermined path. We have shown that under realistic cryo-EM-imaging conditions the extracted profile coincides with the free-energy profile of the true conformational ensemble along that path. A demanding aspect is how to generate a conformational path for experimental cases. If the metastable states of the system have been resolved using standard cryo-EM 3D classification or from X-ray crystallography, then one could create a path by simply interpolating the maps (or structures) or by using steered MD (as done for the TMEM16F system). If metastable states are not available, then, one could generate conformational paths by directly analyzing the variability of the 2D images, for example, using the covariance matrix or spatialvariational autoencoder (VAE) 49 .
A major challenge remains in determining if the path-CV is optimal. From a thermodynamic perspective, an optimal CV should separate the metastable states of the system, identify the transition states, and activation barriers, corresponding to those of the multidimensional landscape. The lowest free-energy path in the multidimensional space can be considered as an adequate CV. For simulations, several methods have been developed to measure the quality of a CV using transition state theory 50 or committor analysis 51 , and algorithms exist to find optimal path-CVs 37,41,42 that can be shown to converge stably 52 . Recently, additional developments have standardized CV design 53,54 . Nonetheless, a method to determine the optimal path-CV using cryo-EM images is still to be developed. Moreover, for some systems, a single degree of freedom may be insufficient and extending the CV to multiple dimensions would be advantageous.
It is important to note that the temperature plays a crucial role in extracting free energies. In principle, the flash-cooling process 7 is done rapidly enough that the cryo-EM sample is trapped in the ensemble just before freezing. Consequently, the extracted free-energy profile should be a representation of the system at that temperature. However, freezing takes on the order of µs 55 to complete, so all relaxation processes faster than this timescale are lost. Since vitrification is not instantaneous, cooling might depopulate the barrier and cause the estimated barrier to be artificially large. Other experimental considerations, such as icesheet buckling during vitrification, can cause further perturbations to the observed structural ensemble. It remains to be fully assessed (B) C α RMSD of the nodes along the path to the Ca +2 -bound and Ca +2 -unbound states (purple and green, respectively). (C) Free-energy profile extracted along the path CV from real cryo-EM particles from the dataset used to generate the Ca +2 -bound reconstruction in digitonin 44  www.nature.com/scientificreports/ how much the freezing process affects the extracted free energy 56 . On the other hand, to obtain high-resolution reconstructions, it is common to set the system at temperatures below the ambient one for over stabilizing a single state. We hope that these methods to extract free energies will motivate the field to measure more at ambient temperature, and moreover, use all particles (i.e., without having to discard large percentages). In summary, extracting free energies from cryo-EM experiments opens the field to the assessment of conformational dynamics from a biophysical perspective. By measuring the populations along relevant degrees of freedom, the results go beyond the discussion of discrete versus continuous, and the biophysical mechanisms are truly revealed. Additional clues to biomolecular function are unraveled by the information of the metastable states (e.g., the size and shape of the free energy basins), of the activation barriers and of the location of the transition states of the system, as is common in single-molecule experiments.

Methods
BioEM analysis. The likelihoods p(w i |x m ) in Eq. (9) were calculated using the BioEM algorithm 39 , as follows. Given an image w i and a 3D conformation (from a density map or atomic model) x m , BioEM computes the probability density p(w i |x m ) that w i is a projection of x m . This probability was calculated by integrating the likelihood function L(w i |�, x m ) (see the Supplementary Text), weighted by prior probabilities p(�) , over all relevant physical parameters for image formation (rotation angles, displacements, CTF parameters, noise variance, normalization factor and offset 39,40 ), The integrals over the noise variance, offset and normalization were performed analytically, and all others were computed numerically, as described in Ref. 40 . The prior densities of the orientation angles and the displacements were taken to be uniform over the integration interval. The prior for the CTF defocus parameter was a Gaussian distribution whose center and width depended on the BioEM rounds described below. The normalization constant in Eq. (10) requires some care, since for Bayes' rule, hence Eq. (9), to be correct, the likelihood p(w i |x m ) must be normalized over the space of 2D images w i . It suffices that that the normalization factor is merely independent of configuration x m .
The BioEM orientational integral was divided into two stages referred to as Round 1 and Round 2, respectively. In BioEM round 1, p(w i |x m ) was calculated by integrating over a uniform orientation grid of 36864 quaternions, which was constructed following the method described in Ref. 57 . The BioEM integration ranges and number of grid points for round 1 are presented in the Supplementary Text for each system. In BioEM round 2, a finer quaternion grid of 125 points was created around the ten best orientations (i.e., with the highest probability) selected from BioEM round 1. In total, a 1250 quaternion grid were used for the second BioEM orientation round. For this round, the Gaussian prior for the defocus was centered at the synthetic/experimental value of each particle and its scale was 0.3 µ m. This procedure is similar to that described in Refs. 47,58 ; however, here we calculated BioEM rounds 1 and 2 independently for each node of the path. We used the BioEM code from Ref. 40 with CPU and GPU acceleration. For one node along with the path and 10000 particles of 128 × 128 size, BioEM round 1 takes ∼ 6 h on 24 CPU cores + 2 GPUs, and BioEM round 2 takes ∼ 3 h on 24 CPU cores.
Recalling Eq. (9), one needs to evaluate Eq. (10) for every image-node pair, i.e., MI distinct evaluations. Then, to estimate the free-energy profile, we used the MCMC algorithm described below to draw samples from its posterior, Eq. (9).
Markov chain Monte Carlo. We used a Markov chain Monte Carlo (MCMC) method to draw a correlated sample of the free-energy profile G(s) from the posterior defined in Eq. (9). Such a set of samples captures the full posterior in a much more practical fashion than trying to represent it as a function in the high-dimensional space R M . We found that a standard random-walk Metropolis algorithm, sampling the unknown vector of values {G(s m )} M m=1 at the discrete quadrature nodes, was adequate for our needs. Initial values G 0 (s m ) were chosen independently and uniformly at random in [−2, 2] , for each m = 1, . . . , M . Then, each MCMC step i = 1, 2, . . . , N MC comprised the following sub-steps.
• We randomly selected a node m ∈ [1, M] with uniform probability.
• We randomly displaced the free-energy profile at the selected node G i (s m ) = G i−1 (s m ) + δg where δg was uniformly randomly chosen in [−0.5, 0.5]k B T. • We shifted the free-energy profile so that m G i (s m ) = 0 . Note that the particular choice of shift here is irrelevant. • We evaluated the posterior in Eq. (9) using the samples G i (s m ) of this free energy, and the pre-calculated values of log(p(w i |x m )) (described above by Eq. (10)) for all images and all nodes m = 1, . . . , M . For the prior in Eq. (9), we used p(G) = e − G d = 1/G 2 , where G = M−1 m=1 (G(s m+1 ) − G(s m )) 2 , which is a standard normal prior on the discrete differences, marginalized over the precision parameter .
• From this, the log-acceptance probability of the proposal was computed (here we omit s for notational simplicity, so that G may be thought of as a vector in R M ): • We chose a uniform random number u ∈ [0, 1] . Then, if log(u) ≤ A(G i , G i−1 ) , the move was accepted, otherwise it was rejected (in which case G i = G i−1 ). www.nature.com/scientificreports/ This procedure was iterated well beyond the time by which the distribution over samples has reached stationarity. For the systems analyzed in this work, we ran R = 8 independent MCMC chains each with a total of N MC = 200,000 steps. The expected value of the free energy at each node was calculated using all samples i = 1, . . . , R N MC , that is, Finally, since it is assumed that the nodes adequately discretize a continuous path, to recover a continuous function G(s) , we fitted a cubic spline through the values {G(s m )} M m=1 with knots being the nodes s m . Because only free-energy differences are relevant, we shifted G such that its minimum was zero. The credible interval for each node was calculated at 5% and 95% of the resulting empirical distribution. We performed the R-hat diagnostic 59 , which compares the inter-chain variance to the variance within each chain to monitor convergence of the MCMC using the arviz package 60 . R-hat values ≤ 1.1 indicate convergence of the sampling.
The MCMC code was written in Python3.5. It was optimized with the Numba compiler, taking approximately 2 h on 24 CPU cores for I = 13,000 particles, M = 20 nodes, and R = 8 replicas each with N MC = 200,000 MCMC iterations.
Synthetic particles. We used a modification of the BioEM program 40 to generate the synthetic cryo-EM particles following similar ideas to those described in Ref. 43 . Each image was created by coarse-graining the molecular configuration (e.g. one taken from an MD simulation) on the residue level. Each residue was represented as a sphere with a corresponding radius and number of electrons 39 . The contrast transfer function (CTF) was modeled on top of the ideal image given a defocus, amplitude and B-factor (for details see the SI of Ref. 39 ). For the synthetic particles, the amplitude was 0.1 and the B-factor was 1 Å. Gaussian noise was added on top of the CTF convoluted image. The standard deviation of the noise was determined (as in Ref. 43 ) using the SNR and variance of the image without noise (calculated within a circle of radius 40 pixels centered at the box center). All synthetic images were 128 × 128 pixels, however, the pixel size varied for each system. Benchmark systems. Hsp90 system. The Hsp90 chaperone is a flexible protein involved in several biological processes related to protein folding 46 . When bound to certain ligands, its conformational landscape can be approximated by two relative motions of its chains (A and B) 43 . The Hsp90 dynamics was reduced to a 2D dimensional phase space, where both chains are rotated in mutual normal directions and perpendicular to the axis of symmetry. In this work, we first assessed conformations from just one degree of freedom (1D analysis), and then we assessed images from conformations belonging to the 2D conformational space (2D analysis).
To generate the conformations for the first degree of freedom (1D case), we started from the closed state (PDB ID 2cg9 61 ), removed the ATP ligand and residues 1-11 to avoid overlapping crashes. Chain B was fixed and chain A was rotated at 1 • steps around the center of mass of residues LEU674-ASN677, up to 20 • from the starting position, generating 20 conformations along this degree of freedom (denominated CMA motion 43 ). These 20 conformations were used to define the path for the 1D analysis ( Fig. 2A). Along this reaction coordinate, we proposed a synthetic free energy (which determines the population occupancy) given by For the 2D conformational landscape, we add a new rotation. Starting from each rotated chain A from the 1D case, residues ILE12-LEU442 of chain B were rotated in 2 • steps around the center of mass of residues LEU442-LEU443, in the normal direction to the plane generated by the 1D movement of chain A and the axis of symmetry. This normal motion mode was referred to as CMB 43 . In total, 400 conformations were generated corresponding to 20 × 20 rotations. We proposed a 2D synthetic free energy given by exp(−βG true (u, v)) = exp(−(u−6) 2 /18−(v−6) 2 /10)+exp(−(u−15) 2 /18−(v−15) 2 /10) where u is the CMA motion and v the CMB motion. This density is characterized by two minima localized at models (6,6) and (15,15) separated by a barrier of around 2 k B T . We generated 6800 synthetic images of pixel size 2.2 Åwith uniformly distributed random orientations in SO (3), SNR in log 10 [0.01, 0.1] and defocus in [0.5, 3] µ m. For this case, we defined three paths: CV1 is a good reaction coordinate that passes through the minima and transition state following the function u = v (black dashed line Fig. 4B), CV2 has model u = 10 fixed and v varying (orange dashed line Fig. 4B) and CV3 has u varying and model v = 10 fixed (green dashed line Fig. 4B).
3D ensemble of the hexapeptide VGVAPG. We used the conformational ensemble of the hexapeptide VGVAPG from a long all-atom MD simulation in explicit solvent. GROMACS 62 was used to perform a 230 ns MD simulation. The initial conformation was extracted from the crystal structure of the Ca6 site mutant of Pro-SA-subtilisin 63 with PBD code 3VHQ (residues 171-176) 48 . The peptide was solvated with a cubic water box, centered at the geometric center of the complex with at least 2.0 nm between any two periodic images. The AMBER99SB-ILDN 64 force field and TIP3P water model were used 65 . Minimization was done with the steepest descent algorithm and stopped when the maximum force was ≤ 1000 kJ/mol nm. Periodic boundary conditions were used. We performed a 100 ps equilibration in an NVT ensemble using the velocity rescaling thermostat 66 followed by a 100 ps equilibration in an NPT ensemble using Parrinello-Rahman barostat 67 . The MD production run was performed without restraints, with a time step of 2 fs in an NPT ensemble at 300.15 K and 1 atm. We extracted MD snapshots (or frames) every 40 ps, obtaining 5688 conformations (shown in Supplementary video 1). www.nature.com/scientificreports/ We selected ten conformations to create the path such that the nodes covered the relevant conformational changes of the system. To do so, we use the end-to-end distance of the peptide, i.e., the distance between the nitrogen atom of the N-terminus, and the carboxyl carbon of the C-terminus 48 . The path was created by selecting ten conformations from the MD with equally spaced end-to-end distances between successive nodes of 1.8Å. The path is shown at the bottom of Fig. 5A, and it was used both with the path-CV 36 and cryo-BIFE. The path-CV was calculated using the RMSD between all the MD frames and the ten nodes belonging to the path with parameter = 50 Å -2 [using Eq. (8) of Ref. 36 ]. To calculate the free-energy profile, we computed the value of each CV for all MD conformations, summarized with a histogram (with a number of bins equal to the number of nodes along the path), and then estimated the free energy using the Boltzmann factor and the histogram bin populations.
From each MD conformation, we generated a synthetic image with pixel size of 0.3 Å and with uniformly distributed random orientations in SO (3), SNR in log 10 [0.01, 0.1] and defocus in [0.1, 1.0] µ m. Using the 5688 synthetic images and the same ten nodes of the path, we performed the cryo-BIFE analysis.
TMEM16F: experimental cryo-EM data. Cryo-EM particles. The cryo-EM particles of the TMEM16F membrane channel used to generate the calcium bound state 44 from the EMPIAR dataset 45 with code EMPI-AR-10278 were used. See Ref. 44 , for information about the experimental conditions. The images were recorded with a pixel size of 1.059 Å box size of 256 × 256 pixels, with defocus values within the interval [0.5, 2.7] µm . For this work, we randomly selected 15,000 images from this Ca +2 -bound (Digitonin_Ca) set. Note that these images represent the entire set and not only those used for the final reconstruction. Since only 13% of the particles from the EMPIAR-10278 set are used to create the Ca +2 -bound reconstruction 44 , our hypothesis is that not all imaged particles belong to this state. Our aim was to extract a free-energy profile from the Ca +2 -bound to the Ca +2 -unbound states using only the cryo-EM particles from the Ca +2 -added set.
Steered MD for creating the TMEM16F path. To generate the path, we used steered MD simulations from the Ca +2 -bound to the Ca +2 -unbound state. The simulations were performed as follows. We started from the Ca +2 -bound structure (PDB ID 6p46). Since the structure has atoms missing, we added these using the Swiss model webserver 68 . We note that because some residues have to accommodate to fit the missing residues the full atom structure was not identical to the PDB. Starting from the full atom model of 6p46, we added the membrane using CHARMM-GUI 69 , in a 3:1:1 ratio of 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1-palmitoyl-2-oleoylsn-glycero-3-phosphoethanolamine (POPE), and 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphol-serine (POPS), respectively. A box size of 16.8076 × 16.8076 × 17.2012 nm was used with periodic boundary conditions and 122923 TIP3P water molecules were inserted. We used the GROMACS program 62 with the CHARMM36M force field 70 . The temperature was controlled in the simulation with the Berendsen thermostat at 300 K, whereas the pressure was controlled with the Berendsen barostat at 1.0 atm 71 . The energy was then minimized using the steepest descent algorithm and stopped when the maximum force was ≤ 1000 kJ/mol nm. We used the leapfrog algorithm to propagate the equations of motion. The long-range electrostatic interactions are calculated using a PME scheme with a 1.2 nm cutoff. We performed two consecutive equilibrations, of 125 ps each, in an NVT ensemble with a time step of 1 fs. Then, we performed two equilibrations in an NPT ensemble, where the first was of 125 ps and time step of 1 fs, and the last was of 1.5 ns, with a time step of 2 fs. For the equilibration in the NPT ensemble, the pressure coupling was of semi-isotropic type. The backbone atoms of the protein were restrained throughout the equilibration runs.
After the MD equilibration, we performed steered MD simulations 72 using the GROMACS program 62 patched with the PLUMED 2.5 library 73 . The first target structure for the steered MD was the Ca +2 -unbound state (PDB ID 6p47). We used the RMSD of the C α atoms to steer the dynamics between the initial structure and the target structure. The steering harmonic potential had an initial force constant of 5000 and ending at 260,000 kJ/mol/ nm 2 . We noticed that a threshold of 0.2 Å in RMSD to the Ca +2 -unbound reference was reached very quickly, in less than 1 ns (Supplementary Fig. 6). A second steered MD simulation was needed to go from the initial system (all-atom system) to the 6p46 PDB structure. This steered MD used the same parameters mentioned before. We also ran two short (1 ns) unbiased MD simulations starting from each state (i.e., closest conformation to PDB 6p47 and 6p46). These trajectories allowed us to build a path from the Ca +2 -bound to the Ca +2 -unbound states. We used the C α -RMSD to the Ca +2 -bound state to select 19 nodes, where successive nodes are as equidistant as possible (see Fig. 6B). To mimic the detergent in the cryo-EM images, we included a membrane nanodisk surrounding each node. It was taken from the lipids from the MD simulations, centered at the center of mass of the protein and of 50 Åradius. The nanodisk was modeled in a coarse-grained manner, similarly to the SemiSWEET transporter (see Supplementary Text and Supplementary Fig. 4).

Data availability
The BioEM code is available at https:// github. com/ bio-phys/ BioEM. For the MCMC Python code please contact the corresponding author.